Previous: A case for unstructured programming
Everyone has heard that Donald Knuth invented something called "literate programming". Everybody thinks that it's something like commenting your code very heavily, but maybe not; maybe it's different in some way. Wikipedia isn't much help. Nobody is quite sure.
To understand what's going on, you have to remember what Mr. Knuth does for a living: He's an academic. He writes papers and textbooks.
And what does a paper about computer science look like? It's nicely formatted text, with chapters and figures and equations and snippets of code.
One writes such papers in TeX.
But there's a problem: How would you know that the program you are presenting would actually compile and run? It's a paper, after all. You can't just take it and compile it.
Enter literate programming.
You write the document in a language that makes it easy to extract the snippets of code from the paper and reassemble them to form a full-blown program.
That's why there are two seemingly unrelated aspects of literate programming. First, there's a way to distinguish the document (TeX source) from the code (C, for example). Second, there's a simple template language to specify how the individual snippets of code fit together.
Easy. Almost trivial. And yet unexpected and beautiful in its own way.
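To make the two aspects concrete, here is a minimal sketch of the extraction step (traditionally called "tangling"). It is not Knuth's tool. The chunk notation is an assumption borrowed loosely from noweb: `<<name>>=` opens a named snippet, a lone `@` closes it, and `<<name>>` inside a snippet refers to another one. The function names `parse_chunks` and `tangle` are hypothetical, and the sketch does no cycle detection.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")      # a line that opens a named code chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")  # a line that refers to another chunk


def parse_chunks(source):
    """Collect the named code chunks; everything outside them is prose and is skipped."""
    chunks = {}
    current = None
    for line in source.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            current = definition.group(1)
            chunks.setdefault(current, [])
        elif line == "@" or line.startswith("@ "):
            current = None  # back to prose
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name="*", indent=""):
    """Expand chunk `name`, recursively substituting references to other chunks."""
    lines = []
    for line in chunks[name]:
        reference = CHUNK_REF.match(line)
        if reference:
            # Keep the indentation of the referring line for the inserted snippet.
            lines.extend(tangle(chunks, reference.group(2), indent + reference.group(1)))
        else:
            lines.append(indent + line if line else line)
    return lines


DOCUMENT = """\
The program prints a greeting. Its overall shape is:

<<*>>=
def main():
    <<say hello>>

main()
@

The greeting itself is a single call:

<<say hello>>=
print("Hello, world")
@
"""

program = "\n".join(tangle(parse_chunks(DOCUMENT)))
print(program)
```

The prose and the snippets live in one file, in whatever order suits the explanation, and the tool reassembles the snippets into the order the compiler or interpreter needs.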
And it's also obvious why literate programming hasn't gained much traction: Only a minority of programmers work on complex algorithms and publish papers and textbooks about them. Most programmers write CRUD applications. Or whatever the latest and shiniest counterpart of CRUD is nowadays. When writing CRUD, there's no need for extensive explanation of what the code does. It does CRUD. As simple as that.
Actually, there's an influential school of thought among programmers which argues that extensive documentation in the code is bound to bitrot and turn into a maintainability problem. Programmers of that disposition try to keep the comments in the code to a minimum and focus on writing self-documenting code (using descriptive variable names and such).
In any case, I've recently realised that there's a different niche outside of academia where literate programming would make sense.
It's the niche of complex processes which are not very exact or well understood, processes which change often but are executed rarely.
Consider, for example, a program that produces the yearly report for a big IT department.
IT changes a lot. The program is run on Jan 1st and after a year, when it is supposed to be run again, it no longer works. All the systems it interfaced with have changed. The API is different and the program doesn't even compile. It references database tables that were either heavily refactored or don't even exist anymore. The tools it used to perform its task no longer work. Licenses may have expired or the vendor has simply stopped supporting them. And so on.
In short, it can be said that the program is functional only once a year, on January 1st. The rest of the year it is broken.
Or you can put it a different way: You can say that program execution and debugging blend to such an extent that it's hard to tell the difference between the two.
How would you go about writing such a program?
Writing neat and perfectly polished code doesn't make sense. The program will bitrot almost immediately after it is executed anyway. What does make sense, though, is documenting the intent of the code in great detail. The programmer tasked with running the program next year will not make much sense of the broken code. References to APIs and tables that don't exist anymore will confuse him more than help him. However, he will greatly benefit from a lengthy description of what the code is intended to do.
That's why literate programming may help in this case.
However, I don't believe that the tools Mr. Knuth and his associates have written would be of much help here. The use case is sufficiently different that, although it can still be recognised as literate programming, it needs either extended or completely different tools. I can't imagine an engineer documenting the yearly report program in TeX. Markdown seems to be a much more realistic option. Also, tying the tool to a single programming language probably won't work. Complex processes in most cases need to use a large number of disparate technologies and even hand execution over to a human being at times (e.g. "press the power button") and we have to account for that.
I am still thinking about how to solve this problem. It's extremely interesting because it requires redefining the very interface between the machine and the programmer. Instead of the programmer writing the program and the machine executing it afterwards, we now have them participating in a single continuous process, the programmer providing the ability to deal with the unexpected, the machine supplying the raw computational power. Such systems are not unheard of (Prolog, anyone?) but they've rarely gone beyond being an academic exercise. And maybe it's time to change that.
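As a rough sketch of what such a tool might look like, and nothing more than that, the code below treats a Markdown document as the program. The prose before each fenced block is the intent, the fence tag picks the technology, and a made-up `manual` tag hands the step to a human. Everything here is an assumption for illustration: the `INTERPRETERS` table, the `manual` tag, and the function names are hypothetical, and the sample uses `~~~` fences only so that it can be embedded in a string.

```python
import re
import subprocess
import sys

# Hypothetical mapping from a fence tag to the command that receives the block on stdin.
INTERPRETERS = {
    "python": [sys.executable, "-"],
    "sh": ["sh"],
}

FENCE = re.compile(r"^~~~(\w+)\n(.*?)^~~~[ \t]*$", re.MULTILINE | re.DOTALL)


def parse_steps(markdown):
    """Return (intent, tag, body) triples: the prose before each fenced block, then the block."""
    steps, position = [], 0
    for match in FENCE.finditer(markdown):
        intent = markdown[position:match.start()].strip()
        steps.append((intent, match.group(1), match.group(2)))
        position = match.end()
    return steps


def run_in_subprocess(tag, body):
    """Pipe the block into the interpreter registered for its tag."""
    subprocess.run(INTERPRETERS[tag], input=body, text=True, check=True)


def run_runbook(markdown, ask=input, run=run_in_subprocess):
    """Walk the document top to bottom, showing the intent before each step."""
    for intent, tag, body in parse_steps(markdown):
        print(f"\n# {intent}")
        if tag == "manual":
            # The machine cannot do this one; hand over to the human and wait.
            ask(f"{body.strip()}\nPress Enter when done: ")
            continue
        try:
            run(tag, body)
        except Exception as error:
            # A broken step is the expected case here, so fall back to the intent.
            print(f"Step failed ({error}). Intent was: {intent}")
            ask("Fix it by hand, then press Enter to continue: ")


RUNBOOK = """\
Count the servers we are supposed to report on.

~~~python
print(len(["db1", "db2", "web1"]))
~~~

The badge reader in the server room has no API, so a person has to check it.

~~~manual
Walk to the server room and note the number shown on the badge reader.
~~~
"""

# List the plan without executing anything.
for intent, tag, _ in parse_steps(RUNBOOK):
    print(f"[{tag}] {intent}")
```

Calling `run_runbook(RUNBOOK)` would execute the steps interactively. When a step fails, the sketch does not abort. It shows the documented intent and waits for the human, which is the blend of execution and debugging described above.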
Martin Sústrik, July 20th, 2015
I've spent some time thinking about the subject. Critique: http://akkartik.name/post/literate-programming. My current improvement on it is a notion of layers: http://akkartik.name/post/wart-layers. My current project is at 9kLoC programmed entirely using layers.
Are you familiar with orgmode and babel? http://orgmode.org/worg/org-contrib/babel/
If you want to see literate programming done right, look at the book "Physically Based Rendering" by Pharr and Humphreys. Now imagine that you just joined a project. They give you the book, send you to Hawaii for a couple of weeks, and then see how well you can maintain and modify the project.
If real projects were maintained with the same level of human-to-human communication there would be a lot fewer "legacy" projects. The U.S. is still using air traffic code from the 1960s. The Railroad Retirement Board has a huge legacy code base they can't maintain. Many other government agencies have the same problem.
In fact, you might run into the same problem if you ever get a chance to read code you wrote 10 years ago. You'll know WHAT it does but not WHY it does it. Literate programming is about writing down the WHY, not the HOW.
Another application of literate programming is writing verifiable documentation. I like using Dredd. You write documentation for your REST API in Markdown and the document itself can be validated. It saves a lot of time during development, especially on communication.
Link: https://github.com/apiaryio/dredd
Looking forward to the answer to that question. I am interested in the interaction between software and its users, and in how they can adapt it.
I've written a couple of literate programming plugins for Pandoc ( http://pandoc.org ), which I've written about at http://chriswarbo.net/essays/activecode/index.html
I tried to use Babel, as mentioned in another comment, but found it too bloated and complex. My approach is pretty simple in comparison: code blocks can be annotated with a shell command, which they're piped into, e.g.
```{pipe="sh"}
echo 'Hello world'
```
Most actions can then be achieved in a regular Unix way (eg. by reading/writing files, calling programs from scripts, etc.)
I've tried to define a similar language (although a bit more structured). Have a look here, but take it with a grain of salt:
https://github.com/sustrik/litmark
Addressing this problem is the basic idea behind this research: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6344472&filter%3DAND%28p_IS_Number%3A6344456%29
It is generally about documenting and sharing problem-solving knowledge. As you note, this means one sort of thing for a computer scientist writing a paper, but something more pragmatic to an end user or business.
The software evaluated in this research was more of a traditional spreadsheet and GUI design, but other work looked at a test-based approach with something like Markdown. Something similar is going on in the data science community with "notebook" tools like Jupyter and Beaker.
I think the article is behind a paywall :(
Your problem seems to belong to the 5th world of Joel's classification joelonsoftware com/articles/FiveWorlds.html
My first thoughts are rich interactive components like Morphic github com/jmoenig/morphic.js for the "UI" part, and convenient REPLs like Jupyter ipython.org (quite helpful tools) and Mathematica-like environments for the "logic" part.
So I think your literate program would be better off as a Jupyter or Mathematica/Sage/Mathics auto-evaluating notebook, perhaps.
I used noweb many years back to produce a series of fully documented Drupal 6 modules to great effect. Admittedly I am a big fan of LP and fortunately the client only wanted a solution that worked. Armed with the Leo editor (an awesome tool) and the notion of LP I wrote and delivered a dozen custom Drupal modules that integrated with external systems.
The beauty of LP in this case was, as the author states, in being able to describe the intention of each part of the code. I also highlighted key areas where external changes might affect things, e.g. a different URL endpoint or database or suchlike.
I think there *is* a future for LP but not in the mainstream. Having worked in that stream for many years, and trying not to be disrespectful, the "main stream" doesn't care about a beautiful thing like LP, all it wants is a solution on time and on budget and it doesn't care how it got there.
If you have done LP you know that it can be very slow BUT you don't get issues (well not many) later. You could say that current BDD/TDD/CI tools make LP redundant but I am not sure. If there is one thing I truly hate it is "docblock comments" that are out of date or just plain wrong. This is not to say that somebody couldn't just hack the code part of an LP file and not update the text, or, worse, edit the code directly because they didn't understand how to use tangle and weave and in that one second they broke the entire thing because the LP source is now no longer accurately describing the live code base.
There are so many issues to resolve, there are no tools to support LP in the sense that PHPStorm or RubyMine support development.
A long way to go… but I think LP will live on… at least in my heart anyway!