Dubious coding styles

I want to show you two examples of what I consider dubious coding practices. You, the reader, may not consider them that bad, so feel free to leave your comments.

I'll start with easier example (a bit obfuscated to hide the true origin):
if(strncmp(buf, "action", 6) == 0) { ... }
strncpy(dst, src, strlen(src));
These are two unrelated code snippets from the same program.
As I know the author, I have asked him why is he using the 'n' versions of string functions. He argued that it costs him nothing to add the "extra check" to make the code a bit more secure. Not only that the code is not more secure, the extra arguments make it less readable. Plus, there are two additional pitfalls:
  1. If you change the string in the first line, you have to remember to change its length.
  2. The second line is effectively just plain old strcpy with all its problems. Only that it runs at half the speed of strcmp because of extra length calculation.
Both examples badly broken w.r.t. the problem they are supposed to solve: possible buffer overflows. The problem is that they are taking the wrong length into account. strncmp should look at the actual length of data received over the network into the buf, and then you should use strncmp only if there exists a possibility that the string in buf is not null-terminated. Otherwise, plain strcmp works just fine. The error in strcpy is more serious: it should look at the destination buffer length, not the source buffer length.

Another practice is concerned with memory allocation and deallocation. I've argued about this with another friend. He maintains that manually deallocating acquired memory before program termination is a good coding practice in all circumstances. I maintain that it is a wasted effort for heap memory acquired by brk or mmap because it is unconditionally deallocated by the OS on program exit.

Admittedly, manual memory deallocation on exit might be neccessary under some embedded systems (it was certainly neccessary under DOS!), but the discussion was in the context of UNIX and Windows.

Why is it wasted effort? Because the operating system is much better than the programmer in tracking memory. Memory management is one of the main tasks of the OS. Also, in a larger program, there is no good place to deallocate all memory. There are other people sharing the same opinion. Look for example at this short article.

Friend's counter-argument to this is SYSV SHM, which persists after the program termination. His argument is flawed in a fundamental way: I am arguing that manual memory deallocation is wasted effort only in those cases when it is done by the OS itself. SYSV SHM doesn't fall into that category.

With SYSV SHM you might actually want it to persist in between program runs. One useful application that comes to mind is acting as a cache of most recently used data.


Xen: The platform for OS research

A quote from the Xen homepage:
Xen is a virtual machine monitor for x86 that supports execution of multiple guest operating systems with unprecedented levels of performance and resource isolation. I assume that readers know what virtualization is so I'm not going to explain it further.
There are many interesting research operating systems, for example SPIN or Exokernel. Major weakness of these and other non-mainstream operating systems is hardware support: often they support little more but the hardware available at the time of development. Since writing device drivers is not an interesting part of OS research, they quickly get behind modern hardware.

Xen has the potential to change this situation. It can greatly lower the initial effort of writing much support OS code (boot-loaders, device-drivers, etc.) and prolong life time of the new OS.

It exposes a simple system interface to the guest OS. This interface hides much ugliness of the x86 architecture and to successfully run under Xen, your OS must use it instead of directly invoking privileged instructions.

But more importantly, Xen exposes very simple interface to the virtual disk and network "devices". Being simple, a driver for the guest OS is easy to code. This driver forwards the guest OS operations to the host OS. Currently supported host OS-es are Linux and NetBSD. Which means that if there is a driver for the physical disk or network card the host OS is using, your OS within Xen can also run on that hardware.

Linux and NetBSD are actively maintained and the OS has more time to gain popularity. Its authors are relieved of keeping pace with HW development, thus having more time to concentrate on the development of the core OS. And users don't have to fear that some time in the future they won't be able to run the OS on new hardware.

Some readers may object that the OS has much more HW to deal with than disk and network (e.g. sound and video). But you don't need to develop the core of the OS.

Added bonuses of using Xen for OS development:
  • The virtualized OS runs almost at the same speed as on bare hardware. According to this benchmark, the virtualized OS runs within 10% speed on bare hardware. Bochs or Qemu are much slower. The comparison is not quite fair because they emulate all of typical PC hardware. But as I've argued above, you rarely need this in OS development.
  • No slow reboots when the OS crashes in its development phase.
  • No need for writing much boiler-plate code such as boot loaders or switching to protected mode. Your "virtual CPU" starts in flat, paged protected mode.
  • Another project has developed a remote GDB stub which makes it possible to debug Xen and guest operating systems.
This article would not be complete without mentioning OSKit (quote)
...the OSKit's goal is to lower the barrier to entry to OS R&D and to lower its costs. The OSKit makes it vastly easier to create a new OS, port an existing OS to the x86...
Its development seems to have ceased in 2002 and seems to have the same problems as other research OS I've mentioned. It provides glue code to make it possible to use drivers from Linux and FreeBSD kernels. However, Linux and FreeBSD driver architecture is changing and it is questionable how many drivers for the modern hardware can be used.

Tags: OS Virtualization


Performance scaling of sorting algorithms

It is a well-known fact that the lowest theoretical bound on complexity of comparison-based sorting is O(n log n). I was very surprised to find out that, for sufficiently large number of items, the time it takes to sort k*n items (with k fixed) is only k times the time to sort n items. For deeper analysis you can have a look at this article.

So, it actually turns out that the approach of buying faster hardware can sometimes give expected performance gains.



Python lacks sparse arrays

Last days I'm writing many python script to postprocess some calculation results and format them into a form suitable for importing into Mathematica.

In Perl I could write $array[$x]=0 and not worry if index $x actually exists in the array. If not, then the "hole" between the last valid index and $x is automatically filled with undef.

Not so in Python. There you get an exception if you try to access non-existant index in the list. It will not grow the list for you automatically. Of course, you can write the code to test whether the index you're trying to access is larger than the current list size and manually extend the list. This kind of coding defeats the purpose of having a high-level language. It is the same way I would code it in C using realloc().

Another possibility is to use a dictionary with integer indices as keys. However, access time to individual elements might not be constant and keys are traversed in an undefined order when iterating over the dictionary. Unlike Perl, you do not get None value for non-existing keys because the set of all possible keys is open and not restricted to numeric values up to the last key with value not equal to None as it would be with an automatically-growing list.

I have found several workarounds, but the mere fact of having to work around the problem annoys me. Not enough to go back to Perl, but it is annoying having to repeat the same hack over and over again.


I/O revisited

In a private e-mail a friend has suggested to use mmap() to gain performance when doing I/O. I've just shuddered at the idea because my data file is 6GB large and I'm working on a 32-bit machine, which gives me about 2.5GB of contiguous virtual address space. So I would have to deal with the following problems:
  • Map another portion of the file when I'm close to the end of the current chunk.
  • Mapping is only page-granular so I would have to be careful not to read memory that is actually beyond the end of the file.
Much of tricky bookkeeping details that are easy to get wrong and hard to find.

The problem is even worse with text files, as the data is not fixed-width, and sscanf() doesn't return the place in the string where it has stopped scanning. So I would have to write additional code that rescans the whole input item again just to find the beginning of the next item. This defeats the very purpose of using mmap() to avoid copying of the data from kernel to user buffer. All of that for a questionable gain in speed.

Even if I would get 2x speed gain in I/O, it wouldn't justify the programming effort to debug it. I've come to value program correctness more than performance. Of what use is the toolkit if even its author questions its correctness? And using mmap would make it not portable.

Some tests indicate that performance can indeed by gained by using mmap(), however this example is using a very small file which can be completely mapped into memory. It doesn't reflect the programming complexity when this is not the case. Of course, there are no address space limitations on 64-bit platforms.

Also, I want to comment on the results: they are mostly incorrect. What the article is measuring is the ratio (r_read + s) / (r_mmap + s) where r_ is the reading time and s is the scaling time which is the same, independent of the read time. This is definetly not the ratio they had intended to measure: r_read / r_mmap.

The most relevant figure is for the largest scaling factor = smallest image = least processing time - the s variable in the equation is closest to 0.

Thinking about sscanf, I've just looked up the C++ stringstream documentation - it would solve the problem sscanf() has. Out of the box, it is unusable for the amount of data I'm dealing with because the std::basic_stringbuf class makes a copy of the string passed to it.


A hidden gem: Ocaml

Recently I have discovered a hidden gem among programming languages: Objective Caml, or Ocaml. The next paragraph should have been an overview its qualities, but I've erased all of it. The web site and the documentation speak for themselves. Instead, I will describe my personal experience in the few days I have been playing with it.

In one of the earlier posts I was looking for a language that makes it easy to write higher-order programs, is reasonably easy to interface with C data types and has an efficient native-code compiler. Ocaml fits this description. If I had learned it sooner, I wouldn't have even considered writing my data analysis programs in C++. So, although I do not have an immediate use for Ocaml, I'm nevertheless learning it now so that I can apply it to future problems.

Another interesting fact is that Microsoft is developing its .NET variant of Ocaml - F#. Integration with .NET will make it possible for programmers to use large number of existing .NET libraries, something that Ocaml is still missing. OK, there is a code repository, but the number of available libraries is very small compared to "mainstream" languages like C, C++, Python or Java.

I'm learning Ocaml by solving problems from Croatian competitions in programming. These are non-trivial algorithmic problems (harder than problems usually found in introductory texts to any programming language), and they only use file I/O in their specifications, so I don't have to learn a bunch of external libraries, not directly related to the language itself, to become productive.

What follows are some observations about my programming style in Ocaml:
  • I tend to solve problems bottom-up.
  • Thorough testing.
  • I think at higher levels of abstraction and writing more general and reusable code.
  • Less time to getting correct code.
I to do bottom-up design in C and C++, but it's neither that safe (e.g. in C I use many void pointers) or convenient (templates and structs in C++). I can quickly test my code in the interactive interpreter.

Some of Ocaml's features can be simulated in C++ by function objects and templates (traits and policies come to mind), but with horrible and too verbose syntax and even more horrible error messages if you do mess something up. Not to mention long type declarations of nested templates if you want to store a function in a variable (actually, you can't do that with a real function but only with a function object).

The overloaded "auto" keyword is only a proposal for the next C++ standard (it is something akin to Ocaml's type inference) and the "typeof" keyword is still an extension, although provided by at least gcc. Maybe even lambda gets it into the language.

The only minor annoyance I have come across until now are some inconsistencies in the order of function arguments in the standard library. For example, most of the time the data structure operated upon is the last argument to the function except in a few cases where it is the first.

I have more than 10 years of experience in C programming and haven't given up on it because of its inconsistencies. I know almost all of the standard C library by heart, yet I'm unable to memorize the prototypes of three often used functions: fgets, fread and fwrite. They are inconsistent with respect to all other file functions in that that they take their FILE* argument last. No matter how often have I used those functions, I still have to look them up in the man page.

Consistency in the standard library is important. It makes life easier to the programmer. But it is not a fatal flaw which would make me disregard the whole programming language.

C++ is going to get more suitable for writing functional-style programs. But Ocaml is here right now. And unlike some other attempts to produce "languages at least fast as and 'better' than C++" which are usually limited only to x86 and/or Linux operating systems, Ocaml is truly portable: it supports nine processor architectures and many operating systems, including MS Windows and commercial Unixes.


The case for binary I/O

In C and C++ related usenet groups I have lately seen many advices by the programmers, to other programmers, that they should avoid binary I/O and use text I/O with arguments such as:
  • it is portable,
  • can be easily viewed and edited with an ordinary text editor,
  • is easier to debug.
But text I/O involves a lot of unnecessary formatting and parsing compared to binary I/O. Lately I have chosen to use C++ for a programming project, although it is inferior to LISP for the task at hand, just because I couldn't find any open-source LISP implementation that can easily deal with C binary data types (or other 'raw' machine data types for that matter).

Today I have told to myself: maybe you are wrong. Maybe you are again wrongly assuming that it would be slow. Maybe you could have used text I/O and LISP. So I've decided to set up an experiment which has shown that I didn't assume wrong this time. Read on.

I have written four (two for text and two for binary I/O) trivial C programs that
  1. Write the file of 25 million single-precision (4-byte IEEE) floating-point random numbers between 0 and 1 (both programs were started with the default random seed, so they generate identical sequences). In the text file, the delimiter was a single newline character.
  2. Read back those numbers, calculating an average along the way.
The platform on which the experiment was conducted is a dual 2GHz Xeon with 3G of RAM, running NetBSD 2.0.2. Results may vary on other stdio implementations, but I think that they give a good overall picture.

The results are most interesting; all timings are averages based on three measurements. Timing deviation was noticable only on the 2nd decimal place:
  1. Writing to text file=43.7 seconds; binary file=5.2 seconds; binary I/O comes out about 8.4 times faster.
  2. Reading from text file=13.4 seconds; binary file=3.0 seconds; binary I/O comes out about 4.4 times faster.
  3. File size ratio: binary comes out 2.25 times smaller (100M vs. 225M)
I was surprised by the large difference in ratios in writing between binary and text I/O, so I have written a third pair of programs that just writes constants in the output file, so there is no time spent in the rand function. This time the binary I/O came out 5.6 times faster. This is a much fairer comparison, because it doesn't include rand overhead.

Overall, I think that "use text I/O" cannot be given as a general recommendation. For sure, I don't want to use it because I have files with gigabytes of binary data. Text I/O would be in this case even slower due to more complicated (more than one field) input record parsing.

As I see it, the only thing in favor of text I/O is its portability. The other two arguments in the beginning of this text are just a consequence of lack of adequate tools for binary file viewing/editing. There is a bunch of hex-editors, but none of them will interpret the data for you (i.e. you can't tell it "display these 4 bytes as IEEE floating-point number) or let you define your own structures composed of different data typed.

There is GNU Data Workshop. I can't comment on it as it is written in Java, and therefore do not want to (i.e., on NetBSD, can't) use it. Not to mention that it is GUI and I'm doing all my programing on the server via ssh.

After some searching, I've come across a hex editor that offers some of the capabilities I need and is console-based: bed. However, it has an unintuitive user interface although it is menu-driven.

If you know of a good, non-Java, console-based, flexible binary editor, please drop me a note. Thanks :)


Published in Phrack!

Finally, Phrack#63 is out! And my article on runtime binary encryption is inside!

I have to admit that I didn't believe that what I had described in the article can really be done. For quite some time I was intrigued by the idea, but every time I was thinking too much about the code design. You know, sit down to think about register allocation, how to organize the program etc. Every such attempt was entirely fruitless.

After few such futile attempts, one day I sat down and told to myself "now you are going to write code". I cannot describe my thought process at that point. I guess I was just trying to reproduce the steps CPU does when executing code: fetch, decode, execute, go to fetch. I had the image of what I want to program and I was just typing code... Correct code. It took me only a week to code and debug the main decryption routine. If you look at the code, there are many comments.. more than the actual code. They were very helpful. Whenever I discovered a discrepancy between the comment and the code, I have closely examined it and found a bug.. sometimes in the code, and sometimes in the comment.

In total, the initial version had only 5 bugs in total for which I had to run the program in the debugger. One was in XDE disassembler (not mine code), one was a typo (loading from address 0 instead constant 0), two of them were related to context switching. One led me to realize that indirect jump targets have to be calculated with traced registers... I don't remember the another. The last bug was forgetting to adjust the C stack before leaving the decryption function. I was amazed at myself how could I write such complicated code with so few errors.

Unfortunately, it has come to my mind a bit too late that I could ask the Phrack staff for an author copy of the hardcover edition. All 1000 of them have been handed out on What The Hack, and few of them will be available at Ruxcon in Australia which, unfortunately, I can't attend. Oh well.

The text published in Phrack is updated quite a bit from the (now outdated) version available on my site. Soon I'm going to update the PDF there.

In the meantime, another weird idea related to code encryption has come to my mind. In short, the decryptor can decrypt another decryptor which, in turn, decrypts the real code. Soon I'm going to vacation so I hope to catch some time to experiment with it. The results will be of course published :)