Documenting projects

Oh no, documentation… everyone seems to hate writing it, and everyone seems to hate reading it (because look how poorly it is written, of course). Yet, often it can be the only way to preserve crucial knowledge for future generations. This effect is particularly pronounced in organizations like CERN, where many people come on relatively short fixed-term contracts, and face-to-face knowledge transfer doesn’t always work out.

But documentation is so often misunderstood as a mere vehicle for information. That couldn’t be further from the truth. For one, writing documentation leads you to explain the reasoning behind your choices. It challenges the implicit assumptions you made along the way, as any holes in your chain of reasoning will become plainly visible. More than once have I witnessed someone reach an Aha! moment, which caused them to rethink fundamental assumptions in their approach to a problem, during the course of “just writing some docs for completeness”. How they wished they had done that earlier!

I believe that there are two keys to breaking the vicious circle of poor documentation leading to low engagement leading to low investment in documentation leading to… It starts with acquiring The Right Mindset, which I’d argue boils down to asking yourself who will need what information and when, and using this to select the most appropriate format (or, usually, multiple formats).

The second step is to select Tools Which Will Empower You on your mission, rather than hinder your efforts. In this post, whose primary purpose is to serve as a cheatsheet to myself, I attempt to formulate my views on writing docs, and give tips about the tools that help me every day.

General observations

Projects evolve, so any documentation that cannot be trivially updated is doomed to be obsolete
Your best chance of keeping docs up to date is to make them part of the source code – just like your changelog
API docs should be extracted from the source code and incorporated within a human-designed overall structure. Sphinx does a good job of enabling this.
It is okay to have some redundancy between different documents when it helps readability or clarifies the context
- For example, a glossary will most likely contain some information that is repeated in main text
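
To make the point about API docs living in the source code concrete, here is a minimal sketch of a Python function whose docstring uses reST field lists, the format that Sphinx's autodoc extension can extract. The function `scale` and the module name `mylib` mentioned in the comment are hypothetical, chosen purely for illustration.

```python
def scale(values, factor=2.0):
    """Multiply every element of *values* by *factor*.

    :param values: iterable of numbers to scale
    :param factor: multiplier applied to each element
    :returns: a new list with the scaled values
    """
    return [v * factor for v in values]


# In a hand-written .rst page, the docstring above could then be pulled in
# with autodoc's directive (assuming the function lives in a module `mylib`):
#
#     .. autofunction:: mylib.scale
```

The human-designed structure stays in the .rst files, while the details that change with the code stay next to the code.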

Types of documentation

This cannot be emphasized enough – use the right format for the job at hand!

User guide & Developer’s guide

two different audiences, but very similar process
should be aggressively kept up-to-date
- version-controlled together with the code
deployment should be automated, so that it does not require ongoing effort (e.g. GitHub Pages)

Recommended tool: Sphinx

Advantages:

natively supports Python API docs
C/C++ via Breathe extension
reST language is powerful – although this comes at the cost of a learning curve
high-quality themes freely available
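
As a rough sketch of how these pieces fit together, a Sphinx project is configured through a `conf.py` file, which is itself plain Python. The project name and the Doxygen XML path below are placeholders, not taken from any real project; Breathe has to be installed separately and expects XML output generated by Doxygen.

```python
# conf.py: minimal Sphinx configuration (illustrative sketch only)

project = "example-project"  # placeholder name

extensions = [
    "sphinx.ext.autodoc",  # extracts API docs from Python docstrings
    "breathe",             # renders C/C++ docs from Doxygen XML (separate package)
]

# Breathe needs to know where Doxygen wrote its XML output.
# The path is an assumption about the build layout; adjust it to yours.
breathe_projects = {"example-project": "../build/doxygen/xml"}
breathe_default_project = "example-project"

# Built-in default theme; third-party themes are selected the same way.
html_theme = "alabaster"
```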

Drawbacks:

not interconnected between projects organization-wide, usually no global search
the syntax for tables in reST is pretty clunky
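
One possible workaround for the table syntax is to generate tables rather than align them by hand. The helper below is a small sketch (the name `rst_simple_table` is made up for this example) that renders rows as a reST "simple table". It assumes single-line cells and a non-empty first column, since the simple-table syntax treats a blank first cell as a continuation line.

```python
def rst_simple_table(headers, rows):
    """Render headers and rows as a reStructuredText simple table."""
    table = [[str(c) for c in headers]] + [[str(c) for c in r] for r in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    border = "  ".join("=" * w for w in widths)

    def fmt(row):
        # Pad cells to the column width; drop trailing spaces on the line.
        return "  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip()

    lines = [border, fmt(table[0]), border]
    lines += [fmt(r) for r in table[1:]]
    lines.append(border)
    return "\n".join(lines)


print(rst_simple_table(["Tool", "Output"], [["Sphinx", "HTML"], ["LaTeX", "PDF"]]))
```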

Examples:

Honorable mention: Jekyll

I wouldn’t recommend it for this use case because it’s less structured by default, but the systemd project has done spectacular work using it (source code here)

Wiki (DokuWiki, Confluence…)

excellent for planning, writing down meeting minutes, drafting new ideas…
less suitable for documenting the product itself

Advantages:

global search
collaborative editing, WYSIWYG
cross-links between projects less likely to break
attachments

Drawbacks:

since it is not part of source code, it’s easier to forget to update it when implementing a change
- unless it is part of the source code
it can be tricky to propagate information from source code on push/release
not always easy to reorganize and move pages around (since they are not plain files)
‘live information’ can only be embedded via specific plug-ins (e.g. JIRA)
if using a proprietary solution: vendor lock-in, difficult to migrate content elsewhere

Technical report

written once and usually not updated
excellent place to discuss the rationale behind the decisions made, as well as the design process in general (although these would usually deserve to be included in Developer’s guide as well)

Recommended tool: LaTeX

Advantages:

mature feature set
- cross-references
- powerful tables
- equations
- bibliography
outputs to PDF which is easy to pass around
Overleaf
high-quality typesetting

Drawbacks:

learning curve
LaTeX code very difficult to read and maintain

Examples:

CERN Accelerators & Technology Sector Notes, for example this one
Technical design of the Bmboot monitor

Article / blog post

written once and usually not updated
good way to present the project to a new audience, or just inform about recent progress
in practice, this is often the only place documenting the architecture and internals
- this is bad – it will be difficult to find this information later, and it might become obsolete

Recommended tool: LaTeX (for journal/conference articles)

Recommended tool: Jekyll (for blog posts)

Talk / Presentation / Lecture

There is a certain warmth to human speech which cannot be replicated in writing, no matter how skilled. A talk can be inspiring, funny, yet also strangely comforting and reassuring… It is a great medium for conveying information that is stable – keeping it up-to-date can be a challenge otherwise.

Strengths:

uniqueness of the human touch
can be consumed somewhat passively

Limitations:

very difficult to keep a recorded talk up-to-date in face of change
not searchable with current technology, difficult to skim a video

You can find more of my opinions on this topic in this post.

Figures

In LaTeX, the TikZ package is essentially legendary… but personally, I always got too frustrated trying to use it (beyond just copying snippets of already working code from SO). My go-to is diagrams.net (formerly draw.io). It’s a remarkable tool, extremely easy to get started with, but surprisingly powerful even for complex diagrams.

I have also enjoyed using yEd (but I’m never sure about its license terms). Its automatic layout features are particularly handy for mind-mapping, or exploring design spaces.

Additional resources

Opinions of other people that I might or might not agree with

Exhibit A

General Guidance on Technical Writing

While we do not provide specific guidance on writing, in general, follow the “four Cs” of technical documentation:

Complete: the document must completely describe the subject matter

Correct: the document must be factually correct

Concise: the document must include only what needs to be there

Consistent: the document’s content and tone must be uniform within the document itself as well as with other related documents

Exhibit B

Documentation should include:

Description of what the program/library is supposed to do. What is it expected to be used for.

What is the user API, how does one use the program/library - including tutorials and examples.

High level description of implementation strategy

What are the design decisions

What is the rationale/motivation for these decisions

Documentation should not include:

Implementation details. These should be in code itself. If the code is not obvious, then either the code should be changed or comments added.

Reference to any private implementation details.

Documentation preparation should be seen as an aid to building a coherent design rather than some afterthought to try and fix something that has been made over complicated in the first place.

– reddit.com/r/cpp/, emphasis mine

Exhibit C

I very much despise the trend to write sharepoint or wiki/Confluence pages as a means for software documentation. I want my documentation to reside next to my source code, not at some obscure corporate URL.

Exhibit D

I find it hard to get people to write down anything. And if they do, they often times don’t put enough effort into it, like writing down some bullet points without context or screenshots without adding text. This is a personal and cultural problem at companies and in teams. A lot of people don’t know how to generate value from taking notes/documentation and aren’t good writers in general. I’m still thinking about how you can teach people what’s important when it comes to writing and when to take notes.

(…) Search is probably even more important than taking notes in the first place. I find myself searching through my notes multiple times a day, even if it’s something you can find at Google at the same speed. But my notes are mine, meaning I know what I can’t remember or in which form I need some piece of information. Sometimes, I write a long article about something and other times some bullet points are good enough. When I share my notes on a team/organization I try to make sure the context is clear. It’s a bit like code: it’s read by an order of magnitude more often than written/changed.

– Hacker News, trimmed, emphasis mine

Exhibit E

There are two main problems that documentation authors may have with API docs:

The generated API documentation lives in isolation, separate from the rest of your document content. This can lead to difficulties in maintaining a cohesive and integrated document that provides a holistic view of your product or service.

When trying to integrate API documentation with the rest of your content, manual editing is often required. However, often there is no ability to enhance autogenerated reference with manually written parts.

– API documentation | Writerside

Exhibit F

i keep detailed journals of every project i do, these days. partly because i’m so scattered lately that a project might sit for years so being able to pick up where i left off is the only way it will ever progress. but also because writing it down helps me think about it.

Exhibit G

Documentation is extraordinarily difficult to create. You need to anticipate the potential questions and answer them and anticipate all the potential perspectives and answer them from that perspective.

– Hacker News

Exhibit H

For fully-fledged documentation of a large (or small) software project, I’d definitely go with reST, most likely using Sphinx. (…) For a simple markup system use in things like forum comments, or marking up chat messages, the decision is trickier. Markdown is a good choice because more users would be familiar with it.

– reStructuredText vs. Markdown for technical documentation

Originally published on 2024-03-26.