A README is typically the front page of a project and should contain relevant information for current and prospective users. To keep documentation consistent across a project, imagine if we could include this README, the front page of our project, both in the repository and in the documentation. This post goes into how to set this workflow up. Find a live example of this being implemented at: https://github.com/JackMcKew/pandas_alive.
A good starting structure for a project's README is:
Intro - A short description & output (if applicable) of the project.
Usage - A section on how the project is to be used (if applicable).
Documentation - Link to documentation for the project.
Contributing Guidelines - If this is an open source project, a note on whether contributions are welcome & instructions on how to get involved are well received.
Changelog - Keeping a changelog of what is changing as the project evolves.
Other useful sections when applicable are requirements, future plans and inspiration.
Inspiration for This Post
The inspiration for this post also comes from Pandas_Alive, wherein there are working examples with output hosted on the README. Initially, these were contained in a generate_examples.py file, and as the package evolved, the code behind the examples was copied over into code blocks in README.md. If you can see where this is going, whenever new examples were made, copying the generating code over to the README was often forgotten. This is very frustrating for new users of the package, as the examples simply don't work. Thus the workflow we go into in this post was adopted.
In projects, it's typically best practice not to repeat yourself in multiple places (this is the DRY principle). In the README, it's nice to have working examples of how a user may use the project. If we could tie the original README to live code that generates the examples, that would be ideal. Enter README.ipynb.
Jupyter supports markdown & code cells, thus all the current documentation in README.md can be copied into markdown cells. Similarly, the code used to generate examples or demonstrate usage can be placed in code cells. This allows the author to run the entire notebook, generating the new examples & verifying that the examples are working code. Fantastic, this is exactly where we want to go.
Now if you only have README.ipynb in the repository, GitHub will represent the file in its raw form, JSON. The front page would then be hundreds of lines of raw JSON rather than readable documentation.
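To make the point concrete, here is a minimal sketch of what a notebook looks like on disk, assuming the nbformat 4 layout. The cell contents are placeholders made up for illustration, not taken from the Pandas_Alive README.

```python
import json

# A tiny notebook in the nbformat 4 layout: one markdown cell, one code cell.
# The cell contents are placeholders, not the real README.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Example project\n", "A short description."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["import pandas as pd"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# This is roughly what a raw view of the .ipynb file shows.
raw = json.dumps(notebook, indent=1)
print(raw)
print(f"{len(raw.splitlines())} lines of JSON for two short cells")
```

Even two short cells expand into a couple of dozen lines of JSON, so a full README with outputs quickly becomes unreadable in raw form.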
This is not ideal whatsoever; it is nowhere near as attractive as the nicely rendered README.md.
README.ipynb -> README.md with nbconvert
nbconvert is a package built to convert Jupyter notebooks to other formats and can be installed in the same way as jupyter (eg, pip install jupyter, pip install nbconvert). See the documentation at: https://nbconvert.readthedocs.io/en/latest/.
Now let's check the supported output types for nbconvert:
- HTML,
- LaTeX,
- PDF,
- Reveal.js HTML slideshow,
- Markdown,
- Ascii,
- reStructuredText,
- executable script,
- notebook.
nbconvert supports Markdown! Fantastic, we can add this step into our CI process (eg, GitHub Action). This will allow us to generate a new README.md whenever our README.ipynb changes.
In Pandas_Alive, we clear the output of the cells in README.ipynb with the flags:
jupyter nbconvert --ClearMetadataPreprocessor.enabled=True --ClearOutputPreprocessor.enabled=True --to markdown README.ipynb
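To illustrate what clearing outputs and metadata means at the file level, here is a rough sketch in plain Python that operates on the notebook's JSON. The function name clear_notebook is hypothetical, the nbformat 4 layout is assumed, and nbconvert's own preprocessors handle more cases than this.

```python
import copy


def clear_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict with outputs and metadata removed.

    Hypothetical helper for illustration only; it approximates the effect
    of nbconvert's clearing preprocessors, it is not their implementation.
    """
    cleaned = copy.deepcopy(nb)  # leave the caller's notebook untouched
    cleaned["metadata"] = {}
    for cell in cleaned.get("cells", []):
        cell["metadata"] = {}
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered results
            cell["execution_count"] = None  # drop the In [n] counter
    return cleaned


# A one-cell notebook with a stored output, made up for this example.
example = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {"scrolled": True},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
            "source": ["print('hi')"],
        }
    ],
    "metadata": {"kernelspec": {"name": "python3"}},
    "nbformat": 4,
    "nbformat_minor": 4,
}

cleaned = clear_notebook(example)
print(cleaned["cells"][0]["outputs"])
```

Only the source of each cell survives, which is what ends up in the generated README.md.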
Python Highlighting in Output
When first run, it was noticed that nbconvert wasn't marking the code blocks with the language (python). This is required for language-specific highlighting of the code blocks in README.md. The workaround was to use nbconvert's support for custom templates. See the docs at: https://nbconvert.readthedocs.io/en/latest/customizing.html#Custom-Templates.
The resulting template, "pythoncodeblocks.tpl", could then be used with nbconvert's --template option.
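As an alternative to a custom template, the same effect can be sketched as a post-processing step over the generated markdown. This is a different route from the template approach used in this post: the helper tag_code_fences is hypothetical, and it assumes every unlabelled fenced block in README.md is Python code.

```python
FENCE = "`" * 3  # a markdown code fence, built here to keep this example readable


def tag_code_fences(markdown: str, language: str = "python") -> str:
    """Add a language label to every opening code fence that lacks one."""
    out = []
    inside = False  # True while we are between an opening and a closing fence
    for line in markdown.splitlines():
        stripped = line.strip()
        if not inside and stripped.startswith(FENCE):
            if stripped == FENCE:
                # Bare opening fence: label it. Labelled fences are left alone.
                line = line.rstrip() + language
            inside = True
        elif inside and stripped == FENCE:
            inside = False  # closing fence stays unchanged
        out.append(line)
    trailing = "\n" if markdown.endswith("\n") else ""
    return "\n".join(out) + trailing


sample = (
    "Intro\n\n"
    f"{FENCE}\nimport pandas as pd\n{FENCE}\n\n"
    f"{FENCE}bash\npip install m2r\n{FENCE}\n"
)
tagged = tag_code_fences(sample)
print(tagged)
```

In a CI job this could run on README.md straight after the nbconvert step, for example by reading the file, passing its text through tag_code_fences, and writing it back.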
Integration into Documentation with Sphinx
If you haven't already, check out my previous post Automatically Generate Documentation with Sphinx. The post goes into detail on how to implement Sphinx to generate all of the documentation for a project from docstrings automatically.
Before going on, the live site of the documentation in reference can be reached at: https://jackmckew.github.io/pandas_alive/
So far we have:
- Stored the working code & documentation for our project's front page in a Jupyter notebook
- Converted README.ipynb into markdown format with nbconvert
- Inserted the language (python) into the code blocks within the markdown
The next step is to make the README content also live in the documentation.
Since Sphinx relies on the reStructuredText format, we'll need to convert README.md to README.rst. For this we use m2r, a markdown to reStructuredText converter.
nbconvert could be used in this step instead of m2r. That said, this step was originally developed before README.ipynb was created, when only README.md existed. Please drop a comment with your results if you try using nbconvert for this step!
m2r can be installed with pip (pip install m2r), and we can convert README.md with the command m2r README.md, which will generate README.rst in the same directory.
Now we need to include our README.rst in the documentation. After much tweaking, the documentation structure landed upon for Pandas_Alive uses autosummary to automatically generate documentation from docstrings. The autosummary-generated documentation is included within a separate rst file (developer.rst), so that everything generated by autosummary nests under one heading in the ReadTheDocs theme. The full configuration lives in the Sphinx configuration files linked at the end of this post.
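As an illustration of the kind of Sphinx configuration this structure implies, here is a minimal conf.py sketch. It is not the project's actual file: the project value is a placeholder, and the choice of the sphinx_rtd_theme package for the ReadTheDocs theme is an assumption.

```python
# docs/conf.py (illustrative sketch, not the Pandas_Alive configuration)

# Placeholder project name for this example.
project = "example_project"

extensions = [
    "sphinx.ext.autodoc",      # pull documentation out of docstrings
    "sphinx.ext.autosummary",  # build summary tables and stub pages
]

# Ask autosummary to generate its stub .rst files during the build.
autosummary_generate = True

# The ReadTheDocs theme, assumed here to come from the sphinx_rtd_theme package.
html_theme = "sphinx_rtd_theme"
```

One common way to pull README.rst into a page such as index.rst is the reStructuredText include directive. Whether Pandas_Alive does exactly this is best checked against its docs folder.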
Integration with GitHub Actions
All the steps mentioned above are currently being used to maintain the project Pandas_Alive.
Find the GitHub Action yml files at: https://github.com/JackMcKew/pandas_alive/tree/master/.github/workflows
Find the Sphinx configuration files at: https://github.com/JackMcKew/pandas_alive/tree/master/docs
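For a rough picture of how the steps chain together, here is a sketch of a script that a CI job could call. It is not the project's workflow, which is defined in the yml files linked above. The function names and the docs and docs/_build/html paths are assumptions, and the script defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List


def build_steps(notebook: str = "README.ipynb") -> List[List[str]]:
    """List the commands for notebook -> markdown -> rst -> HTML docs."""
    markdown = str(Path(notebook).with_suffix(".md"))
    return [
        # 1. Convert the notebook to markdown.
        ["jupyter", "nbconvert", "--to", "markdown", notebook],
        # 2. Convert the markdown to reStructuredText (writes README.rst).
        ["m2r", markdown],
        # 3. Build the HTML documentation; both paths are assumed.
        ["sphinx-build", "-b", "html", "docs", "docs/_build/html"],
    ]


def run_steps(steps: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is set."""
    for step in steps:
        print("$ " + " ".join(step))
        if not dry_run:
            # check=True stops the pipeline at the first failing command.
            subprocess.run(step, check=True)


steps = build_steps()
run_steps(steps, dry_run=True)
```

Calling run_steps(steps, dry_run=False) would execute the commands for real, which requires jupyter, nbconvert, m2r and Sphinx to be installed.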