• Errors due to out-of-bounds array references, stray pointer references, and other forms of references to undefined regions of memory can be very difficult to track down.
  • This web page presents a program that exhibits undefined behavior, but the example behaviors listed here are far from the only kinds of undefined behaviors that are possible.
  • When you write a program in a given language, you enter into a sort of "contract" with the language implementation. If you violate the terms of that contract by referencing an undefined region of memory, the implementation is released from its side of the contract as well. In other words, in most languages with potentially unsafe memory references (like C and Fortran), the implementation becomes free to do any oddball, seemingly random thing it wants to, without doing anything "illegal" from the standpoint of the contract you've violated. To make matters worse, the oddball behavior may not occur until far later in the program, at a seemingly unrelated spot in the code.
  • The example program presented here, which demonstrates some of the things that might happen, is written in Fortran, but essentially the same thing applies in C.
  • The example program follows. It declares three arrays (array1, array2, and array3) and then writes too much data to the middle one, array2. Note that the compiler gives no guarantee that array2 will actually sit in memory between array1 and array3, though that is what happens in the examples below:
       program main
    
       character array1(5)
       character array2(5)
       character array3(5)
       integer i
       integer j
    
       do i=1,25
          do j=1,5
             array1(j) = 'a'
             array2(j) = 'a'
             array3(j) = 'a'
          end do
          call sub(array2,i)
          print *,i,' ',array1,' ',array2,' ',array3
       end do
    
       end
    
       subroutine sub(chars,maxind)
       character chars(*)
       integer maxind
    
       integer i
    
       do i=1,maxind
          chars(i) = 'b'
       end do
    
       return
    
       end
    
  • Here are some sample outputs from the program above. Note how much the output varies with the CPU and compiler used. Four failure modes appear here, but many other kinds of failure are possible. A summary follows the raw data.
    1. With g77 3.4.3 on 32 bit Fedora Core 3 on a 64 bit AMD CPU, generating a 32 bit executable:
         1 aaaaa baaaa aaaaa
         2 aaaaa bbaaa aaaaa
         3 aaaaa bbbaa aaaaa
         4 aaaaa bbbba aaaaa
         5 aaaaa bbbbb aaaaa
         6 aaaaa bbbbb aaaaa
         7 aaaaa bbbbb aaaaa
         8 aaaaa bbbbb aaaaa
         9 aaaaa bbbbb aaaaa
         10 aaaaa bbbbb aaaaa
         11 aaaaa bbbbb aaaaa
         12 aaaaa bbbbb aaaaa
         13 aaaaa bbbbb aaaaa
         14 aaaaa bbbbb aaaaa
         15 aaaaa bbbbb aaaaa
         16 aaaaa bbbbb aaaaa
         17 baaaa bbbbb aaaaa
         18 bbaaa bbbbb aaaaa
         19 bbbaa bbbbb aaaaa
         20 bbbba bbbbb aaaaa
         21 bbbbb bbbbb aaaaa
         22 bbbbb bbbbb aaaaa
         23 bbbbb bbbbb aaaaa
         24 bbbbb bbbbb aaaaa
         25 bbbbb bbbbb aaaaa
        
    2. With gcc version 4.0.0 20050129 (experimental) (g95!) Feb 17 2005 on Fedora Core 3 on a 32 bit Intel Xeon, generating a 32 bit executable:
         1  aaaaa baaaa aaaaa
         2  aaaaa bbaaa aaaaa
         3  aaaaa bbbaa aaaaa
         4  aaaaa bbbba aaaaa
         5  aaaaa bbbbb aaaaa
         6  ~B bbbbb aaaaa
         7   bbbbb aaaaa
        make: *** [go] Segmentation fault
        
    3. With g77 -maix64 3.3.6 on AIX 5.1 (a Fortran 77 compiler on a PowerPC CPU, generating a 64 bit executable):
         1 aaaaa baaaa aaaaa
         2 aaaaa bbaaa aaaaa
         3 aaaaa bbbaa aaaaa
         4 aaaaa bbbba aaaaa
         5 aaaaa bbbbb aaaaa
         6 aaaaa bbbbb aaaaa
         7 aaaaa bbbbb aaaaa
         8 aaaaa bbbbb aaaaa
         9 aaaaa bbbbb aaaaa
         10 aaaaa bbbbb aaaaa
         11 aaaaa bbbbb aaaaa
         12 aaaaa bbbbb aaaaa
         13 aaaaa bbbbb aaaaa
         14 aaaaa bbbbb aaaaa
         15 aaaaa bbbbb aaaaa
         16 aaaaa bbbbb aaaaa
         17 aaaaa bbbbb baaaa
         18 aaaaa bbbbb bbaaa
         19 aaaaa bbbbb bbbaa
         20 aaaaa bbbbb bbbba
         21 aaaaa bbbbb bbbbb
         22 aaaaa bbbbb bbbbb
         23 aaaaa bbbbb bbbbb
         24 aaaaa bbbbb bbbbb
         25 aaaaa bbbbb bbbbb
        
    4. With xlf95 8.1.1.5 on AIX 5.1 (a Fortran 95 compiler generating a 64 bit executable):
         1 aaaaa baaaa aaaaa
         2 aaaaa bbaaa aaaaa
         3 aaaaa bbbaa aaaaa
         4 aaaaa bbbba aaaaa
         5 aaaaa bbbbb aaaaa
         6 aaaaa bbbbb aaaaa
         7 aaaaa bbbbb aaaaa
         8 aaaaa bbbbb aaaaa
         9 aaaaa bbbbb aaaaa
         10 aaaaa bbbbb aaaaa
         11 aaaaa bbbbb aaaaa
         12 aaaaa bbbbb aaaaa
         13 aaaaa bbbbb aaaaa
         14 aaaaa bbbbb aaaaa
         15 aaaaa bbbbb aaaaa
         16 aaaaa bbbbb aaaaa
         17 aaaaa bbbbb baaaa
         18 aaaaa bbbbb bbaaa
         19 aaaaa bbbbb bbbaa
         20 aaaaa bbbbb bbbba
         21 aaaaa bbbbb bbbbb
         22 aaaaa bbbbb bbbbb
         23 aaaaa bbbbb bbbbb
         24 aaaaa bbbbb bbbbb
         25 aaaaa bbbbb bbbbb
        
    5. With xlf 8.1.1.5 on AIX 5.1 (a Fortran 77 compiler generating a 64 bit executable):
         1  aaaaa baaaa aaaaa
         2  aaaaa bbaaa aaaaa
         3  aaaaa bbbaa aaaaa
         4  aaaaa bbbba aaaaa
         5  aaaaa bbbbb aaaaa
         6  aaaaa bbbbb aaaaa
         7  aaaaa bbbbb aaaaa
         8  aaaaa bbbbb aaaaa
         9  aaaaa bbbbb baaaa
         10  aaaaa bbbbb bbaaa
         11  aaaaa bbbbb bbbaa
         12  aaaaa bbbbb bbbba
         13  aaaaa bbbbb bbbbb
         14  aaaaa bbbbb bbbbb
         15  aaaaa bbbbb bbbbb
         16  aaaaa bbbbb bbbbb
         17  aaaaa bbbbb bbbbb
        make: *** [go] Segmentation fault (core dumped)
        
  • To summarize, the runs above show four failure modes, and there is a fifth important failure mode that isn't shown but deserves mention. Recall that we are writing to array2.
    1. Sometimes array1 gets overwritten with unintended data
    2. Sometimes array3 gets overwritten with unintended data
    3. Sometimes the program segfaults
    4. Sometimes an array gets filled with completely bizarre values, even when using homogeneous types as in this example (this is also very likely to happen when the types are not homogeneous)
    5. The fifth, which unfortunately isn't shown above, is that a subroutine's return address may get overwritten. In that case, everything may seem fine until a subroutine or function attempts to return to its caller. Because the address pulled off the stack has been overwritten with garbage, the subroutine returns to the wrong address. This usually causes a segfault, but in a pathological scenario even stranger things could result, like suddenly executing code that shouldn't be called at all.
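  • Failure mode 4 is easier to picture with a small, well-defined experiment. The sketch below does not overrun anything; it uses the standard transfer intrinsic (Fortran 90 and later) to show what an integer would contain if four stray 'b' characters landed on top of it. The names overrun and victim are illustrative only, and the sketch assumes a 4-byte default integer and ASCII characters, which is common but not guaranteed.

```fortran
       program bytes

       ! Hypothetical names: overrun stands for the stray data,
       ! victim for a neighboring integer variable.
       character(len=4) overrun
       integer victim

       overrun = 'bbbb'

       ! Reinterpret the four character bytes as an integer.
       ! Assumes a 4-byte default integer.
       victim = transfer(overrun, victim)

       print *, 'integer seen after the overwrite: ', victim

       end
```

    Under those assumptions the program prints 1650614882 (hexadecimal 62626262, since 'b' is ASCII 98, or hexadecimal 62). The value looks like garbage unless you recognize the byte pattern, which is why overruns across non-homogeneous types tend to produce such bizarre data.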
  • One last comment on the program. If we change the declaration of the array in sub from size "*" (meaning any length is legal) to size 5 and compile again with -C (the option that enables run-time bounds checking on compilers such as xlf), the program can stop with an immediate error at the illegal array reference. This makes it relatively easy to fire up a debugger like dbx or gdb, feeding it your program and the resulting "core" file, to see where the error is. It should then be far simpler to determine the ultimate source of the error by looking at the surrounding code, because the debugger shows where the error occurred, not merely where things finally got so haywire that the runtime could not continue, which may be in a completely unrelated region of code.
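  • As a complement to compiler bounds checking, here is a minimal sketch of a defensive variant of the example program. It assumes a changed interface that is not in the original: sub takes an extra argument n (a hypothetical name) giving the declared size of chars, declares chars(n) so that the real extent is visible inside the subroutine, and clamps the loop to n. This is one possible way to keep the writes in bounds, not the only one.

```fortran
       program main

       character array1(5)
       character array2(5)
       character array3(5)
       integer i
       integer j

       do i=1,25
          do j=1,5
             array1(j) = 'a'
             array2(j) = 'a'
             array3(j) = 'a'
          end do
          ! Pass the declared size of array2 along with it.
          call sub(array2,5,i)
          print *,i,' ',array1,' ',array2,' ',array3
       end do

       end

       subroutine sub(chars,n,maxind)
       ! n is an assumed extra argument: the size of chars.
       integer n
       character chars(n)
       integer maxind

       integer i

       ! Never write past element n, whatever maxind says.
       do i=1,min(maxind,n)
          chars(i) = 'b'
       end do

       return

       end
```

    With this change every write stays inside array2, so array1 and array3 should print as aaaaa on every iteration, and array2 as bbbbb from i=5 onward. Declaring chars(n) rather than chars(*) is also what gives a bounds-checking option (-C on xlf, -fbounds-check on the GNU compilers) a declared extent to check an index against inside sub.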
  • To further expand on this topic, here's an e-mail (slightly modified) I sent to a professor who had an oddball Fortran problem. It may serve both as an example of how a system call tracer might relate to this and as a simple taxonomy of the chain of causality behind such an error.


    Timestamp: 2024-03-28 22:03:43 PDT
