2005-08-11

I/O revisited

In a private e-mail, a friend suggested using mmap() to speed up I/O. I shuddered at the idea because my data file is 6GB and I'm working on a 32-bit machine, which gives me about 2.5GB of contiguous virtual address space. So I would have to deal with the following problems:
  • Map another portion of the file when I'm close to the end of the current chunk.
  • Mapping is only page-granular so I would have to be careful not to read memory that is actually beyond the end of the file.
That is a lot of tricky bookkeeping, where mistakes are easy to make and hard to find.
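To make the bookkeeping concrete, here is a minimal sketch of window-by-window mapping, assuming a POSIX system. The function name count_newlines_windowed and the window_pages parameter are hypothetical, chosen only for illustration. Each window starts at a multiple of the page size, and the last one is clipped to the file size so that nothing past the end of the file is read. On a 32-bit build, a 6GB file would also need a 64-bit off_t (on glibc, for example, by compiling with -D_FILE_OFFSET_BITS=64). The sketch only counts single bytes; a real parser would additionally have to handle items that straddle two windows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Counts '\n' bytes in a file by mapping it one window at a time.
// window_pages (pages per window) is a hypothetical tuning parameter.
// Returns -1 on any error.
long long count_newlines_windowed(const char* path, std::size_t window_pages) {
    assert(window_pages > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }

    const off_t file_size = st.st_size;
    // A window that is a whole number of pages keeps every offset page-aligned.
    const off_t window = static_cast<off_t>(window_pages) * sysconf(_SC_PAGESIZE);

    long long count = 0;
    for (off_t offset = 0; offset < file_size; offset += window) {
        // Clip the last window so no byte beyond the end of the file is touched.
        const std::size_t len =
            static_cast<std::size_t>(std::min(window, file_size - offset));
        void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, offset);
        if (p == MAP_FAILED) { close(fd); return -1; }

        const char* bytes = static_cast<const char*>(p);
        count += std::count(bytes, bytes + len, '\n');
        munmap(p, len);
    }
    close(fd);
    return count;
}
```

Calling count_newlines_windowed(path, 1) maps one page at a time, while a larger window_pages trades address space for fewer mmap() calls.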

The problem is even worse with text files, because the data is not fixed-width and sscanf()'s return value doesn't tell me where in the string it stopped scanning. So I would have to write additional code that rescans the whole input item just to find the beginning of the next item. This defeats the very purpose of using mmap(), which is to avoid copying the data from a kernel buffer to a user buffer. All of that for a questionable gain in speed.
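One possible way around the rescanning, not something tried in this post, is the strto* family from the C library: std::strtod() reports through its second argument where it stopped, so the next item can be parsed from there. The sketch below assumes whitespace-separated numbers in a NUL-terminated buffer; the function name parse_doubles is hypothetical. Note the catch for mmap(): a mapped file is not guaranteed to end in a NUL byte, so the last item would still need special care.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Parses whitespace-separated numbers from a NUL-terminated buffer.
// Assumption: the buffer really is NUL-terminated (not true of raw mmap() data).
std::vector<double> parse_doubles(const char* text) {
    assert(text != nullptr);
    std::vector<double> values;
    const char* cursor = text;
    for (;;) {
        char* end = nullptr;
        const double value = std::strtod(cursor, &end);
        if (end == cursor) break;  // nothing converted: end of input or bad item
        values.push_back(value);
        cursor = end;              // continue right where the scan stopped
    }
    return values;
}
```

Each item is scanned exactly once, because the cursor is simply moved to the position strtod() hands back.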

Even if I got a 2x speedup in I/O, it wouldn't justify the effort of debugging it. I've come to value program correctness more than performance. Of what use is the toolkit if even its author questions its correctness? And using mmap() would make it non-portable.

Some tests indicate that performance can indeed be gained by using mmap(); however, the example uses a very small file which can be mapped into memory completely. It doesn't reflect the programming complexity when this is not the case. Of course, there are no such address space limitations on 64-bit platforms.

Also, I want to comment on the results: they are mostly incorrect. What the article is measuring is the ratio (r_read + s) / (r_mmap + s), where r_ is the reading time and s is the scaling time, which is the same regardless of how the data is read. This is definitely not the ratio they had intended to measure: r_read / r_mmap.

The most relevant figure is the one for the largest scaling factor, which means the smallest image and the least processing time: there the s term in the equation is closest to 0.
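The effect of the shared s term is easy to check numerically. The sketch below just evaluates the ratio from the text; the function name measured_ratio and the numbers in the note after it are hypothetical and are not taken from the article's measurements.

```cpp
#include <cassert>

// Ratio the benchmark ends up reporting when the same processing
// time s is added to both the read() time and the mmap() time.
double measured_ratio(double r_read, double r_mmap, double s) {
    assert(r_mmap + s > 0.0);
    return (r_read + s) / (r_mmap + s);
}
```

With made-up times r_read = 2 and r_mmap = 1, the intended ratio is 2. Adding s = 8 to both gives 10 / 9, or about 1.11, and the larger s gets, the closer the reported ratio moves to 1. Only for s = 0 does the measured ratio equal r_read / r_mmap.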

Thinking about sscanf(), I've just looked up the C++ stringstream documentation: it would solve the problem sscanf() has. Out of the box, however, it is unusable for the amount of data I'm dealing with, because the std::basic_stringbuf class makes a copy of the string passed to it.
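A common workaround for the copy, sketched here as an assumption rather than something tested on the 6GB data set, is a small stream buffer that points at memory it does not own. The class name MemoryBuf and its consumed() helper are hypothetical; std::streambuf, setg(), gptr() and eback() are standard.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>

// Read-only stream buffer over existing memory; nothing is copied.
// The caller must keep the memory alive while the buffer is in use.
class MemoryBuf : public std::streambuf {
public:
    MemoryBuf(const char* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        // setg() wants non-const pointers, but a get-only buffer never writes.
        char* begin = const_cast<char*>(data);
        setg(begin, begin, begin + size);
    }

    // Number of bytes the stream has consumed so far.
    std::size_t consumed() const {
        return static_cast<std::size_t>(gptr() - eback());
    }
};
```

A std::istream constructed with a pointer to such a buffer can then be used with the usual >> operators, and consumed() tells where in the memory block the last extraction stopped.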
