Hard Things in Computer Science: Naming things
Previous: A Microstory
- In natural languages we use an existing dictionary to express our ideas. We almost never invent new words. That makes it easy for the listener to understand what we are saying.
- In programming languages we are inventing new names all the time. To solve a problem you invent a new language, then use that language to describe the solution. Often this is done in multiple layers: Language A is constructed to describe language B which in turn describes the solution. This makes it super hard for another person to understand a program.
- Natural languages have tens of thousands of words which we learn as kids when the brain is still malleable. Learning a new language at a later age is extremely hard. We can't mimic natural languages in computer science unless we are able to reduce the number of words to a manageable number.
- Enter Semitic roots. Almost every Semitic word is based on a root of three consonants. The root conveys the basic meaning. So, for instance, in Arabic, the root KTB has to do with writing. Then there are different "augmentations" of the root. Type I "kataba" means "to write". Type II "kattaba" means "make someone write something". Type III "kaataba" means "correspond with someone". Type IV "aktaba" means "dictate". And so on. More examples can be found here. Each of these types can also be changed to its passive version. Also, for each type there are derived nouns. From nouns you can derive adjectives. From verbs you can derive adverbs. Even prepositions are mostly derived from the three-letter roots.
- Apply the above to programming languages. With a standardized system of name derivations one would, when writing a program, have to invent only the names corresponding to the core concepts of the problem domain. All the other names (function names, object names, argument names, and so on) could be derived from those in a relatively deterministic manner.
- When reading a program you would have to internalize the core concepts, say two or three of them, but after that you would be able to read the program without having to figure out what individual internally-used names mean. No more "How the hell does the 'parse' function differ from the 'parse2' function?"
- It should be said that this is already used to some small extent. 'parse' is a function, 'parser' is an object. The relationship is relatively clear. Unfortunately, dictionaries of programs are typically based on English, which is not very good at forming derived words.
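To make the idea a bit more concrete, here is a minimal sketch of what a "standardized system of name derivations" could look like. Everything in it is made up for illustration: the role names, the affixes and the `derive` helper are assumptions, not an existing convention. The only point is that once you know the root, every other name follows mechanically.

```python
# Hypothetical derivation patterns. Each role maps a core concept (the
# "root") to a derived name with a fixed, predictable meaning.
PATTERNS = {
    "action": "{root}",            # function that performs the core operation
    "agent": "{root}_agent",       # object that performs the operation
    "input": "{root}_input",       # argument the operation consumes
    "result": "{root}_result",     # value the operation produces
    "causative": "make_{root}",    # function that makes something else do it
}


def derive(root: str, role: str) -> str:
    """Derive a name for `role` from `root` (raises KeyError for unknown roles)."""
    return PATTERNS[role].format(root=root)


def vocabulary(root: str) -> dict[str, str]:
    """All derived names for a single core concept."""
    return {role: derive(root, role) for role in PATTERNS}


if __name__ == "__main__":
    for role, name in vocabulary("parse").items():
        print(f"{role:10} {name}")
```

A reader who has learned the five patterns once only has to learn the roots of each new program. Whether a fixed table of affixes like this would scale to real codebases is an open question.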
Martin Sústrik, October 30th, 2017
"Learning a new language at later age is extremely hard"
I'm not convinced. It really doesn't seem very difficult. There's a lot of it, but it's not difficult.
The naming of things is super important. It was one of the main theses behind John Day's book "Patterns in Network Architecture: A Return to Fundamentals." It's priced like a college textbook but well worth the money. It changed how I think of networks and protocols.
Essentially, the name of something should map to its identity, which maps to its location (address), which maps to the underlying routing to get you there. From this architecture multi-homing and mobility fit quite naturally. With a better understanding of these fundamentals we wouldn't have many of the issues (problems) we currently have with IPv4, IPv6, Ethernet broadcast domains, DHCP, security, DNS, etc.
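A toy sketch of that chain of mappings, just to show why multi-homing and mobility fall out of it. This is not code from the book; the `Directory` class, the node and address names, and the hop lists are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Three separate mappings: name -> identity -> addresses -> route."""
    name_to_identity: dict[str, str] = field(default_factory=dict)
    identity_to_addresses: dict[str, set[str]] = field(default_factory=dict)
    address_to_route: dict[str, list[str]] = field(default_factory=dict)

    def resolve(self, name: str) -> dict[str, list[str]]:
        """Return every address the name currently maps to, with its route."""
        identity = self.name_to_identity[name]
        addresses = sorted(self.identity_to_addresses[identity])
        return {addr: self.address_to_route[addr] for addr in addresses}


directory = Directory()
directory.name_to_identity["printer"] = "node-7"
# Multi-homing: one identity, two addresses at the same time.
directory.identity_to_addresses["node-7"] = {"addr-a", "addr-b"}
directory.address_to_route["addr-a"] = ["hop-1", "hop-2"]
directory.address_to_route["addr-b"] = ["hop-3"]
directory.address_to_route["addr-c"] = ["hop-4", "hop-5"]

before_move = directory.resolve("printer")

# Mobility: only the identity -> address mapping changes.
# The name and the identity stay exactly as they were.
directory.identity_to_addresses["node-7"] = {"addr-c"}
after_move = directory.resolve("printer")
```

Because each layer only knows about the next one, moving the node touches a single mapping and nothing that refers to "printer" has to change.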
An awesome read.
P.S. Really enjoy the blog!
Yes, I have the book in my library :)
Oh, it's fantastic how it reorganizes your thoughts about layers. Except, it does ruin your life because now everything sucks. IPv4 is a mess. IPv6 is a mess. UDP and TCP are a mess. DHCP is a mess. Ethernet is a mess! All the minor protocols that we don't really have to deal with! :-)
This applies to anything else really. If you are a perfectionist, everything seems to suck because nothing is perfect. It's better to adopt the "better than nothing" mindset, and try to think how to make things better. Now you at least know what the "ideal" way to do things is, so you know which way to go. You are no longer in the dark. I would say, it's easy to claim "everything sucks" and not do anything to improve it, but once you start actually putting effort into it, you realize it's not easy and you start to appreciate, or at least understand, the "state of the art", which is often not ideal.
I'll just leave this here:
https://en.wikipedia.org/wiki/Hungarian_notation
In practice though I've always seen Hungarian notation degenerate into type annotations. (Even in statically-typed languages. Yuck!) Even in the Wikipedia article, if you look at the examples section, it's all about types. Simonyi's original idea of semantic annotations very much failed. No idea what lesson to learn from that.
I guess the lesson to learn is to reify everything, inducing semantics. Semantic annotations depend very much on the context and have a very vague meaning, as opposed to type annotations, which have a very clear and well-defined meaning (based on the semantics of the programming language at hand). Similarly to how the type is declared explicitly in the variable declaration instead of just being a prefix of the variable name, the semantics should also be made concrete and explicit. This would, however, require adopting some formalization of semantics, which 1) requires a common semantic framework and a consensus among those using it, and 2) is difficult for most people to grasp and become comfortable with, I'm afraid.
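One modest way to read "made concrete and explicit" is to move the semantic prefix out of the name and into a distinct type. The sketch below uses the classic row/column case and Python's `typing.NewType`. The `Row`, `Col` and `cell_label` names are made up for this example, and `NewType` is only enforced by a static type checker such as mypy, not at runtime, so this is far from a full formalization of semantics.

```python
from typing import NewType

# Semantic-prefix style would name plain ints rw_first and col_first and
# rely on the reader to notice when a row is passed where a column belongs.
# Here the same distinction is declared explicitly, like a type.
Row = NewType("Row", int)
Col = NewType("Col", int)


def cell_label(row: Row, col: Col) -> str:
    """Spreadsheet-style label for zero-based indices (columns A..Z only)."""
    return f"{chr(ord('A') + col)}{row + 1}"


first_row = Row(0)
third_col = Col(2)
label = cell_label(first_row, third_col)

# A static type checker would flag the swapped call below;
# at runtime Row and Col are both plain ints.
# cell_label(third_col, first_row)
```

The meaning of "row" versus "column" is still a shared convention, but the checker now enforces that the two are not mixed up, which a name prefix alone cannot do.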
A commenter in the lobste.rs thread gives a good example: In Java, if you see an entity called FooFactory it's likely to be an object factory that produces objects of type Foo. In that case, the -Factory suffix has acquired a pretty strict semantic meaning and, importantly, the understanding of the suffix is shared by the entire Java community.
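For readers who haven't met the convention, here is a minimal sketch of what the name promises. It is written in Python rather than Java, and `Foo`, its `label` field and the `create` method are placeholders chosen for this example, not taken from any real library.

```python
from dataclasses import dataclass


@dataclass
class Foo:
    label: str


class FooFactory:
    """The -Factory suffix signals: ask this object for new Foo instances."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._count = 0

    def create(self) -> Foo:
        # Each call returns a fresh Foo with a sequential label.
        self._count += 1
        return Foo(label=f"{self._prefix}-{self._count}")


factory = FooFactory("foo")
first = factory.create()
second = factory.create()
```

Nothing in the language enforces that a class named FooFactory behaves this way. The guarantee comes entirely from the shared naming convention.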
This is a good example of context-specific semantics. You only understand what it is because it's so common within the context of OOP. If it weren't for the GoF and their popularization of "Design Patterns", you would have pretty much no idea what a "Factory" is. So we need a common framework in which we think about things, and consequently name them. OOP design patterns are an example of such a framework. Obviously such an approach has vast advantages but can be quite limiting, especially when not thought out carefully.