I was asked today why I often use magic numbers in my code.
It's a small issue, but I feel passionately about it: the current trend of replacing every constant with a symbolic name is extremely annoying, and the insistence of some code-analysis tools on treating all magic numbers as coding-style errors drives me crazy.
Just to give you the context, here's an example of a magic number:
i = i + 4;
And here's a "fixed" version of the code:
#define FOO 4
...
i = i + FOO;
There are certainly cases where using symbolic names is good style. Two of them come immediately to mind:
First, the case where the numeric value of the constant is irrelevant. The number is just a placeholder for a symbolic name:
#define STATE_START 1
#define STATE_ACTIVE 2
#define STATE_SHUTTING_DOWN 3
#define STATE_DONE 4
This is basically the use case for C enums or Ruby symbols (":foo" and such).
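In C, the same set of states is more idiomatically written as an enum. A minimal sketch, equivalent to the defines above (the compiler picks the numeric values, which is exactly the point):

```c
/* The numeric values are irrelevant; only the names carry meaning. */
enum state {
    STATE_START,
    STATE_ACTIVE,
    STATE_SHUTTING_DOWN,
    STATE_DONE
};
```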
Second, the case where a constant is used as a compile-time configuration parameter:
#define BUFFER_SIZE 4096
The idea here is to define the value at a single point and then use the symbolic name elsewhere. That way, if the need arises, we can change the value by modifying a single line of code.
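To make the single point of change concrete, here's a minimal sketch (the buffer and the helper function are purely illustrative):

```c
#include <stddef.h>

#define BUFFER_SIZE 4096

/* Every use refers back to the single definition above;
   resizing the buffer means editing exactly one line. */
static char buffer[BUFFER_SIZE];

static size_t buffer_capacity(void) {
    return sizeof buffer;  /* stays in sync with BUFFER_SIZE */
}
```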
All that being said, what about other use cases? Consider this code (you often encounter this kind of thing when parsing network protocols):
uint8_t version = *((uint8_t*) ptr);
ptr += 1;
uint32_t seq = *((uint32_t*) ptr);
ptr += 4;
uint16_t id = *((uint16_t*) ptr);
ptr += 2;
What about the magic numbers being added to 'ptr' in the example? Should we rewrite it in the following way?
#define VERSION_SIZE 1
#define SEQ_SIZE 4
#define ID_SIZE 2
...
uint8_t version = *((uint8_t*) ptr);
ptr += VERSION_SIZE;
uint32_t seq = *((uint32_t*) ptr);
ptr += SEQ_SIZE;
uint16_t id = *((uint16_t*) ptr);
ptr += ID_SIZE;
Does that make sense? Let's think about it.
Does it make the code more readable? Well, no. It actually makes it less readable. If you see a line like this:
ptr += SEQ_SIZE;
you have to find the definition of SEQ_SIZE at a different place in the file, or even in a different file. Thus, smooth reading of the source code is interrupted by scrolling up and down, by grepping and so on.
So maybe the use of the symbolic constant makes the code more maintainable? Maybe you can adjust the behaviour of the program by simply changing a single line of code? Well, no.
This change:
#define SEQ_SIZE 3
doesn't produce a modified program. It produces a broken program.
The value of SEQ_SIZE is tightly bound to the program logic (namely to the fact that the field in question is of type uint32_t) and cannot be changed at will.
You can still get rid of the magic number like this:
ptr += sizeof (uint32_t);
But I would argue that it makes the code less readable compared to simply using the number 4, and I am not even speaking of the extra effort you need to spend when typing the code.
As a conclusion: Don't automatically treat magic numbers as bad coding style. In some cases, replacing magic numbers with symbolic constants makes the code pretty awful to read and maintain.
Martin Sústrik, November 27th, 2014
This code that reads from `ptr` is probably already broken anyway, unless the initial address is 3 mod 4 (maybe even 7 mod 8). This kind of unaligned read works on x86 but probably won't on some ARM-based platforms such as Android, causing a SIGBUS.
Yes, same thing for SPARC and other RISC platforms. However, doing it in a portable way would just make the article less clear.
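For the record, the read can be made portable without obscuring the logic too much by going through memcpy, which sidesteps both the alignment trap and strict-aliasing issues; compilers typically turn it into a single load on platforms where that is safe. A sketch (the function name is just illustrative):

```c
#include <stdint.h>
#include <string.h>

/* Read a uint32_t from a possibly unaligned buffer position. */
static uint32_t read_u32(const uint8_t *ptr) {
    uint32_t value;
    memcpy(&value, ptr, sizeof value);
    return value;
}
```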
Your network protocol example is not convincing. In fact, it is just broken as it relies on pointer conversions between unrelated types. This kind of code would not pass any safety audit, but even if you don't plan for verification, using such crappy code to justify that magic numbers are good for readability is just nonsense. If you start with crap, you will end up with crappy conclusions.
Now, try rewriting your network protocol parser with some real, well… parsing. You know, by consuming bytes and using them to compute higher-level values. Use some real coding standard as a guideline (MISRA-C, perhaps?) and then you will discover that you don't need any magic numbers to keep it floating. And, as a side-effect, you might even end up with code that can defend itself at safety reviews.
(And no, it will not be any slower. Pointer conversions between single bytes and words don't buy any performance over properly indexed buffer iteration.)
Well, this is the experience from parsing 3GPP protocols: they are so irregular that whatever abstraction you use (uint16_t state = read_short (stream); or whatever), you eventually end up looking directly at the underlying buffer and messing with individual bytes.
Also, even if the above were not true, the problem is just shifted to the parsing tool. It has to extract 2 bytes from the buffer. Should it use a symbolic name or the constant 2? Another option, as mentioned in the article, is sizeof(uint16_t), which is just a fancy and somewhat less readable way of writing the constant 2.
Do you need to extract a half-word (two bytes) from the buffer? Try this:
const uint8_t lsb = buffer[current_position++];
const uint8_t msb = buffer[current_position++];
const uint16_t value = ((uint16_t)msb << 8) | (uint16_t)lsb;
I leave it up to you to decide whether the magic number 8 deserves to be named, but there is certainly no need to mess with the number 2 - it is implicit in the code as a natural consequence of *parsing* two consecutive bytes. Obviously, if you parse two bytes, then you are two bytes further down the stream, aren't you? Note also that if you change the protocol to read a full word (4 bytes) instead, there is no need to mess with that number or with sizeof(T). But what is most important, endianness is covered by means of proper computation, so the code will work correctly no matter what the CPU architecture is, without any preprocessor conditional shit. Certainly this is more robust and more portable.
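Extending the same pattern to a full word might look like this (a sketch, assuming the same least-significant-byte-first wire order as the two-byte example; the cursor advances as a natural consequence of parsing):

```c
#include <stddef.h>
#include <stdint.h>

/* Consume four consecutive bytes, least significant first,
   advancing the cursor as a side effect. */
static uint32_t read_u32_le(const uint8_t *buffer, size_t *pos) {
    uint32_t value = 0;
    for (int shift = 0; shift < 32; shift += 8)
        value |= (uint32_t)buffer[(*pos)++] << shift;
    return value;
}
```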
And as a bonus there are no pointer conversions between unrelated types; it's all just, well… parsing.
If you want to see a complete non-trivial serializer/deserializer written this way, see here:
http://www.inspirel.com/yami4/misra.html
In any case, I recommend a good and established coding standard before doing such work. Low-level programming does not have to mean crappy programming.
Yes, but try reading multiple such fields, then try to pad the whole thing to a 4-byte boundary. Oops, suddenly you need to know how many bytes were written up to now. The stream abstraction is suddenly broken.
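Padding to a 4-byte boundary indeed requires knowing the current offset. For reference, assuming the parser keeps an explicit byte cursor (as current_position does above), the usual rounding computation is:

```c
#include <stddef.h>

/* Round a byte offset up to the next 4-byte boundary. */
static size_t align_up4(size_t pos) {
    return (pos + 3u) & ~(size_t)3u;
}
```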
Anyway, the whole parser discussion is irrelevant. We are dealing with computers and computers deal with numbers. At some point down the stack you just have to start using numbers (e.g. number 8 in your example). So, when reaching that point, what are you going to do? Use literals, symbolic names or what?
That will depend on the number itself. Some of them are self-descriptive or are heavily coupled to foundational concepts, like 0, 1, 8 (num of bits in a byte), but some are not (2, 100, 1000, etc.). I tend to use literal values for the former and named values for the latter group.
In both cases the idea is that the reader should have no additional questions when reading the code and the meaning of each number (whether named or not) should be clear from the place where it is used.
Different language, same conclusions:
http://wiki.laptop.org/go/Forth_Lesson_6 (see Stylistic Point - Too Many Constants)
A much safer version is:
static_assert(sizeof (seq) == SEQ_SIZE);
ptr += sizeof (seq);
also auto and decltype are your good friends.
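In C (as opposed to C++), the equivalent guard is C11's _Static_assert, which takes a message. A sketch, reusing the article's SEQ_SIZE and seq as file-scope examples:

```c
#include <stdint.h>

#define SEQ_SIZE 4

static uint32_t seq;

/* Fails at compile time if the type and the documented size diverge. */
_Static_assert(sizeof(seq) == SEQ_SIZE, "seq must be 4 bytes");
```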