The cost of duplicating code, of accumulating technical debt, of not having tests and so on is often discussed and written about. The cost of abstracting things is almost never mentioned though, despite it being a major factor in keeping any project maintainable.
As an example, imagine that your program often increases two variables in sequence:
i++;
j++;
So you decide that the functionality deserves a dedicated function:
template<typename T> void inc_pair(T &i, T &j) {
    i++;
    j++;
}
By doing so you've removed some duplicated code but also — and that's a thing that's rarely mentioned — you've added a new abstraction called "inc_pair".
In this particular case, almost everybody will agree that adding the abstraction was not worth it. But why? It was a tradeoff between code duplication and an increased level of abstraction. Why would one decide that the well-known cost of code duplication is lower than the somewhat fuzzy "cost of abstraction"?
To answer that question we have to look at what "cost of abstraction" really means.
One obvious cost is that an abstraction is adding to the cognitive load of whoever works with the code: They will have to keep one more fact in mind, namely, that inc_pair is a function that increases both of its arguments by one.
However, the main cost of abstraction is in separating the implementation from the specification, or, to put it in a different way, the letter of the function from the spirit of the function. The former being what the function does, the latter being what everybody believes it should do.
The important word in the above sentence is "everybody". Once you make an abstraction, it's no longer about what YOU believe it should do. You are entering the domain of social consensus. It's about what EVERYBODY involved thinks it does. And, as everybody knows, social consensus is hard.
Let's look at a practical example: What if type T for our inc_pair function is "Duration"? What will the function do? Will it increase the duration by one second? One day? One nanosecond? Individual users may disagree.
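The question can be made concrete with a minimal sketch. It assumes that std::chrono durations stand in for the hypothetical "Duration" type; step_in_ns and show_steps are helper names invented for this illustration. In std::chrono, ++ adds one tick of whatever unit the type counts in, so the same inc_pair call means something different for each T:

```cpp
#include <cassert>
#include <chrono>

// The function from the article, with the return type spelled out.
template<typename T> void inc_pair(T &i, T &j) {
    i++;
    j++;
}

// Hypothetical helper: how many nanoseconds does inc_pair add to a
// duration of type D?
template<typename D> long long step_in_ns() {
    D a{0}, b{0};
    inc_pair(a, b);
    return std::chrono::duration_cast<std::chrono::nanoseconds>(a).count();
}

// One call site, three different meanings of "increase by one".
inline void show_steps() {
    assert(step_in_ns<std::chrono::nanoseconds>() == 1);
    assert(step_in_ns<std::chrono::seconds>() == 1000000000LL);
    assert(step_in_ns<std::chrono::hours>() == 3600000000000LL);
}
```

Nothing in the name inc_pair tells the caller which of these steps to expect.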
Another example: What if operator ++ on type T throws an exception while doing j++? Should the function leave i and j in an inconsistent state? Or should it try to keep the change atomic by doing i--? And what if i-- throws an exception while doing that? Maybe the function is supposed to copy the old values of the variables and set them back in case of an error? Nobody really knows.
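Two of these readings can be sketched side by side. This is only an illustration under stated assumptions: Fragile is a made-up type whose increment throws at a limit, inc_pair_atomic, throws and show_difference are invented names, and the "atomic" variant assumes that copying and swapping T never throws.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical type whose increment throws once the value has reached a limit.
struct Fragile {
    int value;
    int limit;
    Fragile &operator++() {
        if (value >= limit) throw std::runtime_error("limit reached");
        ++value;
        return *this;
    }
    Fragile operator++(int) {
        Fragile old = *this;
        ++*this;
        return old;
    }
};

// Reading 1: the article's version. If j++ throws, i stays incremented.
template<typename T> void inc_pair(T &i, T &j) {
    i++;
    j++;
}

// Reading 2: increment copies first and commit only if both succeed.
// Assumes that copying and swapping T cannot throw.
template<typename T> void inc_pair_atomic(T &i, T &j) {
    T new_i = i;
    T new_j = j;
    new_i++;
    new_j++;
    using std::swap;
    swap(i, new_i);
    swap(j, new_j);
}

// Returns true if calling f throws std::runtime_error.
template<typename F> bool throws(F f) {
    try { f(); } catch (const std::runtime_error &) { return true; }
    return false;
}

inline void show_difference() {
    Fragile a{0, 10}, b{5, 5};             // b is already at its limit
    bool first = throws([&] { inc_pair(a, b); });
    assert(first);
    assert(a.value == 1 && b.value == 5);  // inconsistent: only i moved

    Fragile c{0, 10}, d{5, 5};
    bool second = throws([&] { inc_pair_atomic(c, d); });
    assert(second);
    assert(c.value == 0 && d.value == 5);  // unchanged: all or nothing
}
```

Both variants are defensible, which is exactly the problem: the name alone does not say which one the callers are relying on.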
In short, the decision about creating an abstraction should not be taken lightly. There's a large social cost to every abstraction and if you are churning them out without even thinking about it you are on the way towards making the project unmaintainable. If the abstractions leak to the user you are also making it unusable.
That being said, most of the projects out there are already so abstraction heavy that they are almost unmaintainable. Thus, the programmers are accustomed to the state of unmaintainability, consider it the normal state of affairs and they happily contribute to the mess by adding more and more abstraction.
I've already shown one example of adding a new abstraction for no particularly good reason: A programmer notices that two pieces of code are somewhat similar and creates a function to de-duplicate them. The cost of doing so is rarely taken into account.
Another example, a systemic one rather than an accidental one, is the use of mocking in tests. While tests are definitely useful, they often require a certain piece of code to be abstracted so that it can be mocked, for example by creating an interface and then having both an actual and a mock implementation of that interface. The creation of an additional abstraction is just collateral damage.
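A minimal sketch of that pattern, with all names (Clock, SystemClock, MockClock, is_expired) invented for the illustration. The code under test only needs the current time, but to make it testable an interface appears that did not exist before:

```cpp
#include <cassert>
#include <ctime>

// Interface that exists only so that a test can substitute a fake.
struct Clock {
    virtual ~Clock() = default;
    virtual long now_seconds() const = 0;
};

// The actual implementation: whatever std::time reports (on typical
// platforms, seconds since the epoch).
struct SystemClock : Clock {
    long now_seconds() const override {
        return static_cast<long>(std::time(nullptr));
    }
};

// The mock implementation: always reports the time it was given.
struct MockClock : Clock {
    explicit MockClock(long t) : fixed(t) {}
    long now_seconds() const override { return fixed; }
    long fixed;
};

// Code under test. It now depends on Clock instead of calling std::time.
inline bool is_expired(const Clock &clock, long deadline) {
    return clock.now_seconds() >= deadline;
}

inline void test_is_expired() {
    MockClock clock(100);
    assert(is_expired(clock, 100));
    assert(!is_expired(clock, 101));
}
```

The test is short, but Clock is now something everybody has to agree on: Is it wall-clock or monotonic time? May it go backwards? Seconds since when?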
Yet another example is inheritance hierarchies. Say we want 4 classes: egg-laying fish, live-bearing fish, egg-laying mammals (echidna) and live-bearing mammals. The intuitive reaction is to create an inheritance hierarchy topped by class Animal, with intermediary nodes Fish and Mammal. It is often done that way even if there's no use case for directly working with the Animal, Fish or Mammal class. Thus, future maintainers are left to scratch their heads about how those abstract classes should behave.
In the end I would like to look at what anti-abstraction tools we have at our disposal.
First of all, we have scoping. If a C function is declared as "static" it is visible only within that file. That limits the amount of damage it can cause. It is still an abstraction but it is prevented from leaking to the wider audience. The same can be said of Java's "private" modifier. Of course, it's not a panacea. If a source file is 10,000 lines long, the scope of a static function is greatly extended and it becomes almost as dangerous as a non-static function.
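A small sketch of the scoping idea; "static" gives a function internal linkage in C++ just as it does in C. The function names are hypothetical:

```cpp
#include <cassert>

// Internal linkage: visible only inside this source file, so only people
// editing this file have to agree on what it does with out-of-range input.
static int clamp_percent(int v) {
    return v < 0 ? 0 : (v > 100 ? 100 : v);
}

// The only function here that other files can call.
int progress_percent(int done, int total) {
    if (total <= 0) return 0;
    int p = clamp_percent(done * 100 / total);
    assert(p >= 0 && p <= 100);
    return p;
}
```

Whether clamp_percent should clamp, wrap or fail is still a decision somebody made, but the audience for that decision is one file rather than the whole project.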
We also have unnamed objects (e.g. lambdas). It turns out it's hard to treat something with no name as an abstraction. (I wonder what Plato would say about that!) If the only way to refer to a function is by writing it down, the letter and the spirit of the function become much more entangled.
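As a rough illustration, here the increment step is written as an unnamed lambda at the point of use. bump_all and show_bump_all are names made up for the sketch, and the two lambdas are duplicated on purpose:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Increments every element of both vectors. The increment step has no name,
// so everything it does is visible right where it is used.
inline void bump_all(std::vector<int> &xs, std::vector<int> &ys) {
    std::for_each(xs.begin(), xs.end(), [](int &x) { ++x; });
    std::for_each(ys.begin(), ys.end(), [](int &y) { ++y; });
}

inline void show_bump_all() {
    std::vector<int> xs{1, 2}, ys{10};
    bump_all(xs, ys);
    assert(xs[0] == 2 && xs[1] == 3 && ys[0] == 11);
}
```

There is no separate "increment" function whose intended meaning anyone could disagree about; the reader sees ++x and nothing else.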
It also seems that Go's implicit interfaces were designed to avoid unnecessary abstraction. Given that interfaces are not defined when an object is implemented but can rather be created on the fly as needed by the user, their scope can be significantly limited, thus keeping the number of people affected by the abstraction lower.
My final question is whether we are doing enough to limit the amount of abstraction we have to deal with. And do we have sufficient tools to do such limiting? While in many cases it seems that it's only a question of programmer's attitude, at least in the case of mocking it's the tooling itself that contributes to the problem.
Martin Sústrik, Nov 7th, 2016
Social consensus can, though, be enabled with technical means.
Consider Bitcoin. Their social consensus is mediated with the use of the blockchain and mining.
In dependently typed programming languages like Coq or Idris or Agda, the Type of the function is its specification. Here again the typechecker reduces the cost of Social consensus on abstractions.
Since you work on network libraries, it is worth mentioning that session types do the same thing for network protocol specifications and their implementations.
Yes, true.
I guess the thing here is that widely accepted abstractions like Bitcoin, TCP or the socket API are "worth it", meaning that the cost of seeking consensus is lower than the cost of not having consensus. As for a my_hacky_helper_foo() function, it's the other way round.
Dependently typed languages trade the cost of social consensus for the cost of creating a very strict specification and proving that your implementation abides by it.
So even in these languages, you can decide to be sloppy as in any other language because you do not want to pay the upfront cost.
On the other hand, in these languages, the specification acts as an input when you program. In every step, you know whether you are doing something wrong or not. It can even generate part of the code.
In general, I think that the ability to avoid the social consensus and its cost if you define a good specification outweighs the cost of defining the specification.
(Keep in mind that the (for ex. TCP) specification is a document that is interpreted by the human brain. In Idris, the interpretation happens by the typechecker. )
Agreed with Apostolis. Costs of abstractions depend a lot on the context.
In my opinion, social consensus is necessary only due to insufficiently good tooling - mostly type systems and compilers for them. If a function's specification is completely defined by its type and validated by a compiler, then this cost of abstractions goes away. Coq, Idris, Agda and, to a slightly smaller degree, Haskell have strong enough type systems to avoid this cost in the majority of cases.
Another aspect of abstractions is their reusability and composability. If an abstraction can be learned once and reused over and over again, its benefits outweigh its cost. If an abstraction composes well with existing abstractions (i.e. it fits well into the existing ecosystem), its benefits are much higher than those of ad-hoc abstractions.
In my experience, the only abstractions which satisfy all these properties and are net positive in the majority of cases are abstractions based on mathematics, used in strongly typed functional languages. Such abstractions are general and apply to many areas (semigroups, monoids, foldables, monads, arrows are used by almost all software developers without even knowing them) - i.e. they are reusable. These abstractions are all about composition (e.g. pure total functions, monoids, monads, arrows, categories) between their instances and also between different abstractions (e.g. it is well understood how to lift a pure function into a monadic function, or how to fold over values which form a monoid, etc.).
So, the cost of abstractions goes away for languages that almost nobody uses and that 99% of programmers won't ever touch.
Isn't that like saying, "the inner city violence problem can be solved by moving to Tokyo which has almost no violent crime"? Sure, but how's that a solution that the average person can apply?
Very insightful. I like thinking about software complexity, and one of the concerns when dealing with complex software is that the design and intentions should be communicated (which means they are either documented or exist as a common understanding of purpose and function).
From your perspective, it means that there is also a need to establish agreements on the levels, the depth and the ways in which abstractions in the code are formed. Indeed, I worked with software where the functions and operations weren't implemented in a messy way per se, but the many levels of indirection, abstraction (and obscurement) made things really difficult to read and a real tail-chaser when it came to maintenance.
Those levels can also make it much more difficult to understand the flow and the operations that are happening, because in many languages you pass references to data objects, so data gets changed in many ways.
Nice article, puts me into thinking mode again! :)
I think the cost/benefit is primarily determined by your group EVERYONE. Unlike yourself, I haven't worked on OSS projects, so haven't encountered the social and political project issues at that scale. In such contexts, where folk turn up from everywhere, with wildly different levels of skill, competence, experience … more formalism is what's required to keep the motley group together. But because we get enough of that in our day jobs and just want to get things done, I expect that's the last thing people want.
It's human to use abstractions. But abstractions are cultural. English in England uses the word Binge. But that word has no real meaning or use outside of British English.
Similarly, some C++ cultures clearly know what they expect to happen in the face of an exception, while others know and expect a different model, while some never really think too much about it up front.
Culture. Abstractions without culture. I think that's where things begin to break down. I think abstractions are defined within a culture.
Your inheritance hierarchy, for me, demonstrates a different problem. Using the wrong tools, abstractions included, creates problems of its own.
A common unhelpful abstraction is the implicit polymorphism in dynamically-typed OOP languages - when in the majority of cases it's not even needed.
Consider a function like this pseudocode:
func foo(s):
…
do something with s.length()
…
This is polymorphic: which length() method is called depends on the type of 's' at runtime. You may have a lot of trouble locating the correct length() method in the source, especially if this code calling point is deeply nested and you have to work backwards to try to infer the type of 's'.
But in many cases, func foo may be written only to make use of objects of class Bar. It could have called the one possible function explicitly:
func foo(s):
…
do something with Bar.length(s)
…
Bar.length is exactly one function and can be quickly located in the source code. (Its implementation *might* even do different things depending on the type of s, but this would be localised)
In statically-typed languages you write func foo(Bar s) of course, restricting the type of s. What I'm saying is, you *could* have made the intent clear even in the dynamically-typed language as well; but most people don't, relying on dynamic method dispatch to save a few characters of typing.
The OOP approach also leads to unnecessary design quandaries. If you have code which acts on an instance of a and b, should it be a.foo(b) or b.foo(a)? I would rather just write foo(a,b) and the problem goes away. foo is still an abstraction, but at least its implementation (and hopefully documentation) can be easily located.
I think it should be the other way around actually. If there's social consensus about something, this warrants an abstraction. Even if this makes code technically more complicated. Maintainers then know where to start analysis when there's a bug, and new features tie in more easily. I wish they had done that with sql, html, javascript, etc. rather than putting everything into a string. I cannot even imagine the cost of all the injection vulnerabilities resulting from -not- creating an abstraction for all those things.