Saturday, June 20, 2015

Riding the Rough C

C is probably the closest thing there is to a "standard" programming language.  You would be hard pressed to find any modern computing system that doesn't have C available.  Although it has somewhat fallen out of favor for large computing systems, like PCs (don't tell the Linux kernel developers that!) it is almost universally used for embedded systems.  In the embedded world it is the runaway winner, although C++ is gaining ground.

But C (and C++) has a downside.  I will write about C here, but keep in mind that pretty much everything I say applies to C++ as well.  One of the characteristics of C that made it so popular is flexibility: you are allowed to do just about anything in any way you please.  But that flexibility is its Achilles' heel.  I am going to show a few examples to make the point.  If you haven't been bitten hard by at least one of these, you can't call yourself a C programmer.

So, you know I'm going to show some examples of things that can get you in C.  These will be isolated code fragments that you know have a problem.  Even considering that, notice how difficult it is to find them all.  Then consider that you have one of these bugs somewhere in your thousand or so lines of code, but you don't know what or where.  Consider how hard that will be to find.  C gives you enough rope to shoot yourself in the foot.

Let's start simple:

if( my_array[n]=1)
   printf("true\n");
else
   printf("false\n");

What does that do?  Hint: there is only one answer, with two parts, and it probably isn't what you think.  First, it will set the array element "my_array[n]" to 1, even if that element doesn't exist,  and then it will print "true."

It's likely you would have said something like "it depends on what is in my_array[n]."  But it doesn't.  The programmer probably meant to write "if( my_array[n] == 1)."  The operator "==" is the equality comparison operator, and the operator "=" is the assignment operator.  The "if" statement looks at the value of the expression within the parentheses to decide which branch to take.  Writing "my_array[n] == 1" compares the value stored in my_array[n] to 1 and, if they are equal, executes the "true" part; otherwise it executes the "false" part.  But writing "my_array[n] = 1" assigns the value 1 to the array element, and an assignment yields the value assigned as the value of the expression.  So, since 1 is always assigned, 1 will always be the value inside the parentheses, which is considered true, and the "true" branch will always run.  Plus, you just put a 1 into your array.  This is perfectly legal C and you might sometime want to do an assignment that way, but not likely in an "if" statement like that.  A standard C compiler must compile it as is, and many won't give a warning unless you ask for one (gcc, for example, flags it when run with -Wall).  This one gets even experienced, professional programmers all the time.
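One defensive habit some programmers use against this slip is sketched here.  The function name "element_is_one" is made up for illustration and is not from any library.  It puts the constant on the left of the comparison and takes the array through a pointer to const, so that typing "=" instead of "==" becomes a compile error rather than a silent bug.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reports whether element n holds 1.
   The const qualifier means the function cannot modify the array. */
bool element_is_one(const int *values, size_t n)
{
    /* Constant on the left: "1 = values[n]" would not compile,
       because you cannot assign to a constant. */
    return 1 == values[n];
}
```

The caller is still responsible for making sure n is a valid index.  This only guards against the "=" versus "==" slip.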

Remember the other part of the answer?  "Even if it doesn't exist?"  What if you declare the array like this:
int my_array[20];
and when the "if" statement above runs n is equal to 20?  The declaration "my_array[20]" means to allocate an array of 20 elements numbered 0 to 19.  There is no my_array[20].  It doesn't exist.  But the C compiler will happily produce code that writes the 1 to element 20.  Since there is no element 20, it will write over whatever is stored just past the end of the array!  Formally this is undefined behavior, so anything can happen, but chances are that some other variable is allocated that chunk of memory and it will get changed without you knowing it.  That is a really nasty bug to find.
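C will not check the index for you, so if you want a check you have to write it yourself.  Here is a minimal sketch of one way to do that.  The names "MY_ARRAY_LEN" and "store_checked" are my own inventions for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MY_ARRAY_LEN 20          /* hypothetical name for the array size */

static int my_array[MY_ARRAY_LEN];

/* Hypothetical helper: stores value at index n only when n is valid.
   Returns false, and writes nothing, when n is out of range. */
bool store_checked(size_t n, int value)
{
    if (n >= MY_ARRAY_LEN)
        return false;            /* index 20 and above do not exist */
    my_array[n] = value;
    return true;
}
```

With this, store_checked(19, 1) succeeds, while store_checked(20, 1) is refused instead of trampling a neighboring variable.  The cost is an extra comparison on every write, which may or may not be acceptable on a small microcontroller.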

Here is another that is quite common and hard to find.  It shows up in a lot of different forms, and it is not always easy to even notice there is a problem.

int x = 10;
while( x>0);
{
   printf(" x = %d, x^2 = %d\n", x, x*x);
   --x;
}

What will it print? Nothing.  The program will stop when it gets to the "while" statement.  It goes into an endless loop, never getting to the next statement.  In C, the while statement is defined to be like this:
 "while ( expression ) statement"
A statement is terminated with a semicolon.  Usually a statement does something, and a statement can be a compound statement, which is a series of statements inside curly braces ("{ }").  C also allows the "null statement," which is nothing terminated with a semicolon.  Look at the while statement in the example above.  The condition is "x>0" and the statement is the null statement ";".  After that is a compound statement ("{ printf ....}").  That means the null statement finishes off the while statement, so no code gets executed as part of the "while" loop.  The variable x never changes.  It just keeps looping.  Since the code below it ("{ printf....}") is a perfectly legal compound statement, the compiler will merrily compile it, often without any warning.  But the program will effectively halt at the "while" statement.
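For comparison, here is a sketch of the same loop with the stray semicolon removed.  I have wrapped it in a function, "print_squares", and added a line counter; both are my additions so the behavior can be checked, and neither is part of the original fragment.

```c
#include <stdio.h>

/* Hypothetical helper: prints x and x squared for x = start down to 1
   and returns the number of lines printed. */
int print_squares(int start)
{
    int lines = 0;
    int x = start;
    while (x > 0)    /* no semicolon: the block below is the loop body */
    {
        printf(" x = %d, x^2 = %d\n", x, x * x);
        --x;
        ++lines;
    }
    return lines;
}
```

Calling print_squares(10) prints ten lines, from x = 10 down to x = 1, and then returns.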

Here is another variation of that one:

int x = 10;
while( x> 0)
   printf( "x = %d\n", x);
  --x;

What does this print?  An infinite series of "x = 10" lines.  Look again at the definition of the "while" statement.  The "while ( expression )" is followed by a single statement.  Often, that will be a compound statement, but it can be a "normal" statement.  In this example, the "printf" statement gets executed as the loop body.  But since it is not a compound statement enclosed in curly braces, the next line ("--x") is not part of the loop body, no matter how it is indented.  It's all perfectly legal C and the compiler may not even warn you.
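One way to make this mistake harder to commit is to write the countdown as a "for" loop with braces, so the decrement sits in the loop header and cannot drift out of the body.  This is only a sketch of that idea.  The function name "count_down" and the line counter are assumptions of mine, added to make the result easy to check.

```c
#include <stdio.h>

/* Hypothetical helper: prints x for x = start down to 1
   and returns the number of lines printed. */
int count_down(int start)
{
    int lines = 0;
    /* Initialization, test and decrement all live in the header. */
    for (int x = start; x > 0; --x) {
        printf("x = %d\n", x);
        ++lines;
    }
    return lines;
}
```

Always using braces, even around a one-line body, is a style choice rather than a language rule, but it removes the question of which statements belong to the loop.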

As I mentioned at the start, C is available everywhere.  It is standardized.  You would think this means it works the same everywhere, but that isn't true.  A common development method, and one I recommend, is to develop C code on a PC that will eventually run on a microcontroller.  The PC has much greater resources for development and debugging, which makes it much easier to write much of the code.  But you have to watch out for differing behaviors.  Take a look:

int x = 0;
while ( x < 40000)
{
   // do something useful here
   ++x;
}

If you run that on a PC, say with Visual Studio or gcc, it will work just fine and "do something" 40000 times.  But when you transfer that code to a small microcontroller, say an 8-bit Arduino such as the Uno, it will hang at the "while" statement in an endless loop.

The C standard leaves the size of an "int" up to the implementation; it is meant to be the "natural" size of integers for a given machine, and is only guaranteed to be at least 16 bits.  On a PC, whether 32 or 64 bit, an "int" is typically 32 bits and has a range of more than plus or minus 2 billion.  But on an 8 or 16 bit microcontroller an "int" will usually be 16 bits.  A 16 bit integer has a range of -32768 to +32767.  Strictly speaking, overflowing a signed integer is undefined behavior in C, but because of the way computer hardware works, what typically happens is that when the variable "x" reaches 32767 and adds one, it "rolls over" from binary 0111111111111111 (32767) to binary 1000000000000000 (-32768) and starts counting up again.  But it can never reach 40,000.  Your compiler may or may not give an error or a warning with this one, but there are plenty of variations that certainly will not.
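A common way around the size problem is to stop relying on plain "int" and use the fixed-width types from the standard header stdint.h.  The sketch below assumes a C99 or later compiler that provides int32_t.  The function name "count_to" is hypothetical, and the loop is wrapped in a function only so the result can be checked.

```c
#include <stdint.h>

/* Hypothetical helper: runs the loop body `limit` times and
   returns the final counter value. */
int32_t count_to(int32_t limit)
{
    int32_t x = 0;   /* 32 bits wide wherever int32_t is provided */
    while (x < limit)
    {
        /* do something useful here */
        ++x;
    }
    return x;
}
```

Because the counter is 32 bits wide on the PC and on the microcontroller alike, count_to(40000) behaves the same in both places.  On an 8 bit part the 32 bit arithmetic costs more instructions, so this is a trade of speed and code size for predictability.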

C is a great language.  It's very versatile and very popular.  You can run it on just about anything and write any kind of code you want.  But it is full of traps.  I have only scratched the surface.  It would be easy to fill a large book with examples.  If you plan to use C (or C++), put in the effort to learn enough of the language to at least be able to recognize these pitfalls when you encounter them.  There are plenty, and no list of examples could ever be complete.  It will be up to you to watch for them and find them when, not if, they happen to you.

1 comment:


  1. A guy is standing on the corner of the street smoking one cigarette after another. A lady walking by notices him and says
    "Hey, don't you know that those things can kill you? I mean, didn't you see the giant warning on the box?!"

    "That's OK" says the guy, puffing casually "I'm a computer programmer"

    "So? What's that got to do with anything?"

    "We don't care about warnings. We only care about errors."
