Anyway, that was just an experiment. I'll be using flat files for mass data analysis.
So why would one use Berkeley DB recno databases instead of flat files? I can think of several reasons:
- automatic management of concurrent access,
- logical record numbering and quick access to record by number,
- support for variable-length records.
Quick access by record number is hard to achieve with flat files once the records are variable-length.
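To illustrate why record numbering and variable-length records clash in a flat file, here is a minimal Python sketch. It assumes newline-terminated records for the variable-length case; the function names are made up for this example and have nothing to do with the Berkeley DB API. With fixed-length records the offset is simple arithmetic, while variable-length records need a separately built (and maintained) offset index.

```python
import io

def read_fixed(f, n, size):
    """Fixed-length records: record n starts at byte n * size."""
    f.seek(n * size)
    return f.read(size)

def build_offset_index(f):
    """Variable-length records (assumed newline-terminated here):
    a full pass over the file is needed to learn where each record starts."""
    offsets = []
    f.seek(0)
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            break
        offsets.append(pos)
    return offsets

def read_variable(f, offsets, n):
    """Look up record n through the offset index built above."""
    f.seek(offsets[n])
    return f.readline().rstrip(b"\n")

# In-memory stand-ins for real files.
fixed = io.BytesIO(b"aaaabbbbcccc")        # three 4-byte records
variable = io.BytesIO(b"a\nbbb\ncc\n")     # three records of different length
index = build_offset_index(variable)
```

Note that the index has to be rebuilt or patched whenever a record is inserted, deleted, or changes length, which is the bookkeeping a recno database takes care of.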
Another note: Knuth, in one of his volumes (I don't remember which), mentions that balanced (AVL) trees can be used to represent linear lists, with much better time complexity for random access to an item.
Current update: I'm testing Python's performance when reading fixed-length records from a raw binary data file. It manages to read the data at ~14 MB/s. When the data is also unpacked into an internal representation with the struct module, the speed drops to ~3 MB/s. Using Psyco doesn't help either.
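For reference, a rough sketch of the two kinds of read being compared: raw chunked reads versus reads followed by struct unpacking. The record layout is hypothetical (the actual one isn't given here), and the code uses modern Python 3 features (a precompiled `struct.Struct` and `iter_unpack`), so it is not the code behind the numbers above.

```python
import io
import struct

# Hypothetical record layout: little-endian uint32 id, float64 value, int16 flag.
RECORD = struct.Struct("<Idh")

def read_raw(f, chunk_records=4096):
    """Read the file in chunks without decoding; return total bytes read."""
    total = 0
    while True:
        chunk = f.read(RECORD.size * chunk_records)
        if not chunk:
            break
        total += len(chunk)
    return total

def read_unpacked(f, chunk_records=4096):
    """Read the file in chunks and decode every record into a tuple."""
    records = []
    while True:
        chunk = f.read(RECORD.size * chunk_records)
        if not chunk:
            break
        # Ignore a trailing partial record, if any.
        usable = len(chunk) - len(chunk) % RECORD.size
        records.extend(RECORD.iter_unpack(chunk[:usable]))
    return records

# Small in-memory sample instead of a real data file.
sample = b"".join(RECORD.pack(i, i * 0.5, i % 3) for i in range(10))
```

Wrapping each function in `time.perf_counter()` calls on a large file would give the two throughput figures; the gap between them is the cost of building Python objects for every field.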