I've been thinking about documentation a lot recently. Namely, how to maintain internal documentation in a large-scale project where multiple teams are involved and whose lifetime is measured in decades rather than years. The challenges in such an environment are very different from what you experience when writing the documentation for your personal hobby project. In the latter case it's just a question of writing the document. In the former case, though, there's a whole bunch of different problems: People join the project and leave it all the time. Sometimes they die and take all their knowledge with them to the grave. Documents contradict each other. They are forgotten and not updated for years. When trying to use them you often find out that they are more of an obstacle to understanding than a help. Documentation is a read-once affair. It's only read by newbies who don't have sufficient knowledge to improve it. Senior people, who can, at least in theory, fix it, never use it. And the list goes on and on.
So, here's a list of principles that could alleviate the problem:
- Acknowledge that brute force doesn't work.
- Make documentation a first class citizen.
- Make documentation executable.
- Track the intent.
- Measure it.
Acknowledge that brute force doesn't work
This is an observation from the wild: Asking developers to write and maintain documentation isn't enough. It never worked and it never will. Unless the whole system of dealing with documentation, from both the technical and the organisational perspective, is changed, things won't get any better.
Accept that. Stop sweeping the problem under the carpet. Do something about it.
Make documentation a first class citizen
Documentation should have exactly the same status as code and should be treated in exactly the same way. If it's not, you are sending a message to the developers that you don't care. And if you don't care, why should they?
- Is your code stored in source control and your documentation in Word files? Fail!
- Does your IDE allow for rapid one-click browsing through code but not through documentation? Fail!
- Is it easy for developers to change the code, but hard to change the documentation? Do you store the diagrams as JPEGs and require developers to use Photoshop to change them? Fail!
- Is all the code related to a component stored in a single directory but its documentation is elsewhere? Fail!
- Is the code owned by developers and documentation by technical writers? Fail!
- Does every line of code have a person responsible for it but documentation doesn't? Fail!
- Do you do code reviews but no documentation reviews? Fail!
- Is passing the test suite a prerequisite for release, but getting documentation in order is not? Fail!
- Is developers' performance evaluated based on lines of code, but lines of documentation are not taken into account? Fail!
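Some of the checks above can be automated. Here is a minimal sketch of a release gate that fails when a component's documentation does not live next to its code. The layout is an assumption made for illustration: each component is a direct subdirectory of the source root and keeps its documentation in a `README.md`. The names `components_missing_docs` and `release_gate` are hypothetical.

```python
from pathlib import Path

# Assumption: each component is a direct subdirectory of the source root
# and keeps its documentation in a README.md next to the code.
DOC_FILENAME = "README.md"


def components_missing_docs(source_root: Path) -> list[str]:
    """Return the names of component directories that have no documentation file."""
    missing = []
    for entry in sorted(source_root.iterdir()):
        # Skip plain files and hidden directories such as .git.
        if entry.is_dir() and not entry.name.startswith("."):
            if not (entry / DOC_FILENAME).is_file():
                missing.append(entry.name)
    return missing


def release_gate(source_root: Path) -> int:
    """Return a process exit code: 0 if every component is documented, else 1."""
    missing = components_missing_docs(source_root)
    for name in missing:
        print(f"missing {DOC_FILENAME} in component: {name}")
    return 1 if missing else 0
```

Run next to the test suite, a non-zero result would block the release the same way a failing test does.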
Make documentation executable
This point is meant to solve the 'read-once' syndrome. The goal is to make developers revisit the documentation on a regular basis.
In the devops world, it's easy. Just merge real-time statistics and diagnostics of the system with the documentation, be it dashboards, playbooks or whatever. It's 2015 after all and documentation doesn't have to be a static web page.
In the pure software development world it's a bit more tricky but not impossible. Just take into account that a large codebase is a living and breathing system of its own. Consider the Travis CI widget that has started to appear in READMEs on GitHub lately. Does it make you revisit the README more often? Heck, yes!
Still, much more can be done. Documentation of a component can show recent changes made, rate of test failures over time, contact info of the person responsible for it. In some cases, documentation can even be scraped to drive the automation of the development process.
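To make the idea concrete, here is a minimal sketch of a function that renders such a live status section for a component's documentation. The function name, its parameters and the output format are assumptions for illustration. Where the data comes from (source control, CI) is left out.

```python
def render_status(component, owner, recent_changes, test_results):
    """Render a plain-text status section for a component's documentation.

    recent_changes: list of one-line change summaries, newest first.
    test_results: list of booleans, True for a passing test run.
    """
    lines = [f"Status of {component}", f"Owner: {owner}"]
    if test_results:
        failures = test_results.count(False)
        rate = 100.0 * failures / len(test_results)
        lines.append(f"Test failure rate: {rate:.1f}% over {len(test_results)} runs")
    else:
        lines.append("Test failure rate: no runs recorded")
    lines.append("Recent changes:")
    # Show only the five newest changes to keep the section short.
    lines.extend(f"- {change}" for change in recent_changes[:5])
    return "\n".join(lines)
```

Regenerating this section on every build is what would give a developer a reason to open the page again.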
Track the intent
The biggest problem with long-lived software systems is that nobody knows why they were built the way they were any more. Nobody knows about the use cases any more. Original developers have, one by one, left the company or retired and new ones have no idea. The system cannot be changed because nobody knows what it is supposed to do in the first place.
The sad thing is that most of that was actually documented in the past but the documentation was lost. Most big companies have the practice of writing design documents where use cases are explained and a solution is proposed. These, however, are often treated as throw-away documents and a couple of years later it is almost impossible to get one's hands on them. Even worse, if you are looking for an explanation for a particular phenomenon in the code you have no idea which design document describes it. The only way is to get as many design documents as you can and skim them for anything that may relate.
And it's really not a hard problem to solve. Just add the design document to the feature patchset, and then keep it in the codebase forever, an immutable witness of past intent. Additionally, if the design document shares the source control repository with the code it's trivial to cross-reference the two.
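As a sketch of what such cross-referencing could look like, assume a made-up convention where a source comment of the form `design: docs/design/some-feature.md` points at the design document that explains the surrounding code. The marker format and the function names below are hypothetical. The check itself is just a regular expression plus a file lookup.

```python
import re
from pathlib import Path

# Assumption: a source comment such as
#     # design: docs/design/2015-08-queueing.md
# names the design document that explains the surrounding code.
DESIGN_REF = re.compile(r"design:\s*(\S+)")


def design_refs(source_text: str) -> list[str]:
    """Extract every design-document path referenced in a piece of source text."""
    return DESIGN_REF.findall(source_text)


def broken_refs(repo_root: Path, source_file: Path) -> list[str]:
    """Return referenced design documents that do not exist in the repository."""
    text = source_file.read_text(encoding="utf-8")
    return [ref for ref in design_refs(text) if not (repo_root / ref).is_file()]
```

Because the references are repository-relative paths, a design document that gets deleted or moved shows up as a broken reference instead of silently disappearing.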
Measure it
Last but not least, understand how the documentation is used. If it's a web page it's easy to track the number of views. If it lives in source control, measuring the number of edits isn't hard either.
If a page isn't used, it's a strong signal that there's a problem. Maybe it's so outdated that nobody bothers looking at it any more? Maybe the code it refers to isn't used any more? Maybe the documentation doesn't have a dynamic, "executable" aspect as discussed above so that only newcomers read the docs?
If a component shows a lot of change code-wise but no change in documentation something is probably wrong. It's time to trigger an alarm.
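Such an alarm can be sketched in a few lines. The function below takes a list of changed file paths, one entry per file touched per commit, and reports components with a lot of code change and no documentation change. The path layout (`<component>/...`), the documentation suffixes and the threshold are assumptions for illustration.

```python
from collections import defaultdict

# Assumption: paths look like "<component>/..." and documentation files
# are recognised by their extension.
DOC_SUFFIXES = (".md", ".rst", ".txt")


def stale_doc_components(changed_paths, min_code_changes=10):
    """Return components with many code changes and no documentation changes.

    changed_paths: iterable of repository-relative paths, one entry per
    file touched per commit.
    """
    code = defaultdict(int)
    docs = defaultdict(int)
    for path in changed_paths:
        component, sep, _rest = path.partition("/")
        if not sep:
            continue  # top-level file, belongs to no component
        if path.endswith(DOC_SUFFIXES):
            docs[component] += 1
        else:
            code[component] += 1
    return sorted(
        name for name, count in code.items()
        if count >= min_code_changes and docs[name] == 0
    )
```

The input could be produced from source control, for example from the output of `git log --name-only --pretty=format:` restricted to the period of interest.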
Does one department have significantly different documentation usage than another one? What's going on?
Martin Sústrik, August 9th, 2015
Is a developer's performance evaluated based on lines of code? Fail!
Ha. I made a bet with myself that this comment would show up early in the discussion :)
Evaluation based on LoC is terrible, but the point is that evaluation should treat documentation exactly the same way as code.
See http://www.folklore.org/StoryView.py?story=Negative_2000_Lines_Of_Code.txt
That's one of my favourite stories. When I grow up I want to make a living by cutting out code :)
One thing I particularly enjoy is cutting out scaffolding from Java code, i.e. ~90% of the codebase. But, for some reason, it makes Java devs angry.
I feel like just trying to have a pleasant conversation about the data path makes them angry.
This is so true. Anyone can read code, but not everyone can read intent. This is why having a playful sense of empathy is really really important, and why code itself (minus comments) is not documentation. I ask myself "What are we trying to do here?" If I don't know, the details mean nothing to me…
Two words: Literate Programming.
Literate programming doesn't seem to fly in a corporate environment.
But if you are interested in that kind of stuff, also look here: http://250bpm.com/blog:54
In a corporate environment you need to hire an Editor-in-Chief. That person attends code reviews to make sure that the new section or paragraph fits the documentation standards and accurately reflects "why this code was written". They manage the repository and ensure that the "project book" is up to date and builds properly. They make sure that the book "reads like a book" and not some random jumble of words. The gold standard is the book: "Physically Based Rendering" by Pharr and Humphreys.
Imagine how much easier it would be to hire people. They have to pass the "Hawaii test". Give them the project book, send them to Hawaii for 2 weeks, and when they return they should be able to maintain, modify, and extend the project as well as any current member of the team.
You wrote (in your linked article)
"I can't imagine engineer documenting the yearly balance program in TeX"
Companies wrote millions of lines of Cobol. The Railroad Retirement organization has multi-million lines of legacy code. Nobody knows how to change it or maintain it, likely because there are constants like "7.353" which was the marginal tax rate in Oklahoma in 1967. No-one bothered to write down "why" that constant exists and there is no obvious way to reverse engineer it.
Companies now write millions of lines of Java (the new Cobol). They are inserting constants like "1.333" (the aspect ratio of 1024x768 monitors) into web-based code but don't say "why". If you've only worked on mobile screens you might never see that aspect ratio and have no clue how to reverse engineer that number.
The point is that mundane code contains a LOT of "why?" issues and literate programming is all about "why", not "how". The Editor-in-Chief (ideally a Language major) should constantly ask "why?" about everything and make sure the answers are in the book.
New programmers hired to maintain and modify the "yearly balance program" likely have never heard of PBIT, EBIT, or EBITDA. Those concepts need to be in the paragraphs surrounding the code to compute those values.
Projects outlive programmers. If your corporation depends on a piece of code in order to continue to function as a corporation you really need literate programming.
That's all well and nice, but have you seen it actually work anywhere? I'm really curious. If you did, I would love to know where.