Adding extensions for generating md version of docs #2173
Something more to be done:
Right now there is no html_meta description; we might want to try to fix that.
After fixing the links in the Markdown, I would also like to add them to a
Now the documentation is built in Markdown in addition to the standard HTML docs, and the llms.txt file correctly points to the location of the Markdown files. For now, the md files live in generated/doc/sources, which is not the same folder as the HTML docs: this means that I can't go from StringEncoder.html to StringEncoder.md to get the Markdown version. I'm not sure whether that would be a problem in practice. Additionally, the Markdown files are also added to skrub/data/docs so that they are bundled with the wheel, making everything available with the package. I still need to add a pixi command that does that.
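A minimal sketch of the copy step that such a pixi command could run, assuming the Markdown build ends up in generated/doc/sources and the target is skrub/data/docs, as described above. The helper name `copy_markdown_docs` is hypothetical, and copying only `.md` files (leaving out `.py` and `.css`) is an assumption about what should ship.

```python
import shutil
from pathlib import Path


def copy_markdown_docs(src: Path, dst: Path) -> list[Path]:
    """Copy every .md file under ``src`` into ``dst``, keeping the folder layout.

    Hypothetical helper: the actual pixi task may be implemented differently.
    """
    copied = []
    for md_file in sorted(src.rglob("*.md")):
        target = dst / md_file.relative_to(src)
        # Create intermediate folders so nested pages keep their relative paths
        target.parent.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves file metadata such as modification time
        shutil.copy2(md_file, target)
        copied.append(target)
    return copied
```

For example, `copy_markdown_docs(Path("generated/doc/sources"), Path("skrub/data/docs"))` would run right after the Markdown build, and the pixi task would only need to call a script wrapping it. Keeping the relative layout means relative links between pages still resolve after the copy.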
Tests are failing for unrelated reasons (#2178).
We might want to avoid copying some of the files to the skrub/docs folder. A big part of the docs is just the Markdown version of what's already in the docstrings of the modules, so that part can probably be skipped.
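Skipping those pages could be a simple path filter applied during the copy. This sketch assumes, hypothetically, that pages rendered from docstrings (for example autosummary output) sit in a folder named `generated`; both `DOCSTRING_DIRS` and `should_copy` are illustrative names, and the folder name must be adjusted to the real build layout.

```python
from pathlib import PurePosixPath

# Hypothetical: folder names that only hold pages rendered from module
# docstrings. Adjust to the actual layout of the Markdown build.
DOCSTRING_DIRS = frozenset({"generated"})


def should_copy(relative_path: str) -> bool:
    """Return False for pages whose parent folders mark them as docstring copies."""
    parents = PurePosixPath(relative_path).parts[:-1]  # drop the file name
    return not any(part in DOCSTRING_DIRS for part in parents)


pages = [
    "index.md",
    "modules/encoders.md",
    "reference/generated/skrub.StringEncoder.md",
]
print([p for p in pages if should_copy(p)])
# ['index.md', 'modules/encoders.md']
```

Only folder names are checked, so a top-level page that happens to be called `generated.md` is still copied.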
```toml
skrub = [
    "_docs/**/*.md",
    "_docs/**/*.py",
    "_docs/**/*.css",
    # ...
]
```
If these are plain-text docs for LLMs to read, we probably don't need the JavaScript or CSS.
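Once the `.md` files are declared as package data, they can be read back from an installed package with `importlib.resources`. A minimal sketch, assuming the docs ship under `skrub/_docs` as in the snippet above; `iter_bundled_files` is a hypothetical helper, not part of skrub. It walks the resources through the `Traversable` API (`iterdir`, `is_dir`, `name`) instead of assuming a plain filesystem path.

```python
from importlib.resources import files


def iter_bundled_files(package, subdir="", suffixes=(".md",)):
    """Yield (relative_path, resource) pairs for data files shipped in a package."""
    root = files(package)
    if subdir:
        root = root.joinpath(subdir)

    def walk(node, prefix):
        for child in node.iterdir():
            rel = f"{prefix}{child.name}"
            if child.is_dir():
                # Recurse into sub-folders, extending the relative path
                yield from walk(child, rel + "/")
            elif child.name.endswith(suffixes):
                yield rel, child

    yield from walk(root, "")
```

Usage, under the same assumption about the folder name: `for rel, res in iter_bundled_files("skrub", "_docs"): text = res.read_text(encoding="utf-8")`.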
The PR is ready for review. Some sticking points:
Short summary of an IRL conversation: it would be great if having the source user guide files inside the package and listed as package data were enough, because:
@rcap107 is going to experiment with this approach. There are probably several ways we can make the files easier to navigate, both for editors and consumers, such as more explicit filenames, more explicit cross-reference targets, numbers at the start of filenames to make the reading order visible, etc.
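If numbered filenames are used, tools that list the files should sort on the number rather than on the raw string, otherwise `10_...` lands before `2_...`. A sketch assuming a hypothetical `<number>_<slug>.md` naming scheme; zero-padding the numbers (`01_`, `02_`) would make a plain alphabetical sort enough as well.

```python
import re

# Hypothetical naming scheme: "<number>_<slug>.md" or "<number>-<slug>.md"
_PREFIX = re.compile(r"^(\d+)[_-]")


def reading_order(filenames):
    """Sort by numeric prefix; files without a prefix go last, alphabetically."""
    def key(name):
        match = _PREFIX.match(name)
        if match:
            return (0, int(match.group(1)), name)
        return (1, 0, name)

    return sorted(filenames, key=key)


print(reading_order(["10_pipeline.md", "2_encoders.md", "faq.md", "1_intro.md"]))
# ['1_intro.md', '2_encoders.md', '10_pipeline.md', 'faq.md']
```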
Adding the files copied from the skrub/_docs folder to the .gitignore so they don't get counted twice (like the changelog).
This ignores all the .py files stored in the _docs folder, which would otherwise be executed every time test collection runs.
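The thread does not show how the exclusion is implemented; one way to get this behaviour in pytest is the `pytest_ignore_collect` hook in a `conftest.py`. The sketch below is an assumption, not the PR's change, and the file location is hypothetical. The `collect_ignore_glob` list, set in the same file, is a simpler alternative.

```python
# conftest.py (hypothetical location: the skrub package root)
from pathlib import Path


def pytest_ignore_collect(collection_path: Path, config):
    """Skip collection of .py files stored under any ``_docs`` folder.

    Returning True tells pytest to ignore the path; returning None leaves
    the decision to pytest's default rules and to other plugins.
    """
    if collection_path.suffix == ".py" and "_docs" in collection_path.parts:
        return True
    return None
```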
This PR has become too large and mixes several different changes, which have now been spun out into #2190 and #2191. Yet another separate PR will deal with moving the documentation to the skrub/_docs folder, but it needs to wait until the other two PRs are merged. This PR will eventually be closed.
This is a WIP PR where I'm trying to generate the Markdown version of the documentation.
I had to add two Sphinx extensions (sphinx_llm and sphinx_markdown_builder).
So far I have been able to generate the md files, but there are some issues.
In some places, the HTML styling is bleeding into the Markdown, which adds a lot of unnecessary clutter.
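Until the builder output is cleaner, one possible workaround is a post-processing pass over the generated files. This sketch drops lines that consist only of a single raw HTML tag (such as a stray `<div class="...">` wrapper) while leaving fenced code blocks untouched. The helper name and the regular expression are assumptions; real output may need more rules, for example for inline tags or tags spanning several lines.

```python
import re

# A line holding nothing but one opening or closing HTML tag, e.g. '</div>'
_TAG_ONLY_LINE = re.compile(r"^\s*</?[A-Za-z][^>]*>\s*$")


def strip_html_lines(markdown: str) -> str:
    """Remove tag-only lines outside fenced code blocks."""
    out = []
    in_fence = False
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith("```"):
            # Toggle on opening/closing fences so code examples stay intact
            in_fence = not in_fence
            out.append(line)
            continue
        if not in_fence and _TAG_ONLY_LINE.match(line):
            continue
        out.append(line)
    return "".join(out)
```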
Overall, the PR: