Kakuro solver.. and solvability by logic

I have added a kakuro solver to my collection of sudoku solvers. An interesting thing about this solver is that it generates a small table of possible sums before the search for a solution begins. During the search it uses only bit operations, and it is extremely fast: 0.025s is the longest time taken on any of the puzzles from the sample puzzles page.
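The solver itself isn't reproduced here, but the idea of a precomputed sum table maps naturally onto bitmasks. A minimal sketch (my own reconstruction, not the actual solver's code): represent a set of digits as a 9-bit mask, and for each run length record which digits can occur in some combination achieving a given sum.

```python
from itertools import combinations

def possible_sums(run_length):
    """For each achievable sum of `run_length` distinct digits 1-9, record a
    bitmask of the digits (bit 0 = digit 1, ..., bit 8 = digit 9) that can
    appear in at least one combination producing that sum."""
    table = {}
    for combo in combinations(range(1, 10), run_length):
        mask = 0
        for d in combo:
            mask |= 1 << (d - 1)
        table[sum(combo)] = table.get(sum(combo), 0) | mask
    return table

# A two-cell run summing to 4 can only be {1, 3}: bits 0 and 2 are set.
print(bin(possible_sums(2)[4]))  # -> 0b101
```

During the search, intersecting such a mask with the digits still free in the crossing run prunes candidates with a single AND, which is presumably why the whole thing stays in bit operations.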

All sudoku-like puzzles are advertised as solvable by logic alone, with no guessing necessary. What does this mean, exactly? I don't know what the problem-setters mean by it, but to me it means that the puzzle must be solvable with zero lookahead. At any stage of solving, from the very beginning, there must exist at least one cell whose contents are uniquely determined by the current state of the grid alone. At no point should you have to think about the consequences of the move you are about to make.

For example, if your reasoning is "putting 3 at this position will make it impossible to put 7 at some other position afterwards", that is not zero lookahead and it is not logic; it is guessing. Logic is reasoning of the form "I must put 3 in this cell because...", where the argument is based only on the current state of the grid, without looking into the future.

Of course, guessing is sometimes easier for the unskilled, but the fundamental question is: can the puzzle be solved with zero lookahead, i.e. by pure deductive logic based on the rules of the game and the current contents of the grid? If the answer is negative, then sudoku (and its variants) are falsely advertised, in my view.




In one of my previous posts I wrote about the Lua programming language and its possible use as a standardized data exchange format instead of XML.

Lua has its own JIT, dubbed LuaJIT. I like its architecture, especially its DynASM part, which can be used for other dynamic code-generation purposes.



"Killer sudoku" solver

I have known about the Sudoku puzzle for a long time. I've never been drawn to it, although I did occasionally think it would be an interesting programming exercise to implement a solver for it. A few days ago I learned about the variant called "killer sudoku", which is supposed to be harder than plain sudoku. So I set out to write a solver for that variant.

My first idea was to write the program in GNU Prolog, using its finite domain solver. But then I thought, heck! Let's see whether the brute-force method is feasible. And indeed, it turns out that it is. You can download the solver, coded in C++, here. The user interface is lacking, but the program does what it is supposed to. On a 2GHz Pentium M, the easiest puzzle (as classified by the problem setter) is solved in 5ms, and the hardest in 400ms.
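The brute-force skeleton in question, guess a digit, recurse, undo the guess on failure, is tiny. Here is a sketch of the same "dumb" strategy on a plain 4x4 sudoku (my own illustration in Python, not the linked C++ killer sudoku solver):

```python
def valid(grid, r, c, d):
    """May digit d go at (r, c) without repeating in its row, column, or 2x2 box?"""
    if d in grid[r]:
        return False
    if any(grid[i][c] == d for i in range(4)):
        return False
    br, bc = r - r % 2, c - c % 2          # top-left corner of the 2x2 box
    return all(grid[br + i][bc + j] != d for i in range(2) for j in range(2))

def solve(grid):
    """Brute force: find an empty cell (0), try each digit, recurse, backtrack."""
    for r in range(4):
        for c in range(4):
            if grid[r][c] == 0:
                for d in range(1, 5):
                    if valid(grid, r, c, d):
                        grid[r][c] = d
                        if solve(grid):
                            return True
                        grid[r][c] = 0     # undo the guess
                return False               # dead end: no digit fits here
    return True                            # no empty cells left: solved
```

A killer sudoku solver adds one more constraint per guess (the cage sum), checked the same way; the structure of the search does not change.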

One thing is bugging me though. Sudoku puzzles are supposed to be solvable by logic, without guessing. How do you explain, then, that the brute-force solver, which always employs the same "dumb" strategy of guessing and backtracking, takes time proportional to the puzzle's difficulty level? A "harder" puzzle should imply harder logical reasoning, yet I fail to see the connection between harder logical reasoning and the amount of work the brute-force solver has to perform. Somehow I think the two are independent.



The ugliness of C++ locales

I'm very happy with how C++ handles most things, both in the language and in the standard library. One monstrosity, though, is the handling of locales. Look at the following piece of code, which I came across on the boost users mailing list:

// time_facet here is boost::posix_time::time_facet from Boost.Date_Time
time_facet* of = new time_facet();
std::cout.imbue(std::locale(std::locale::classic(), of));

In my opinion, this is horrible. I believe it is very flexible, like everything else in the standard library, but still... I won't even try to imagine how you would set the locale to some particular language's collation sequence and character encoding.

I think there must be a better way of handling locales in programs. Today it's simply a mess throughout the computer world (e.g. I shiver when I remember the time I had to convince Oracle to chew up data in some particular encoding). As for the above code, I could write date/time formatting code from scratch faster than I could figure out what that piece of code does.

Localization, as it is today, simply sucks big time. It should be transparent to both programmers and users. It should work out of the box. The way it is today, it just creates headaches for everyone: increased effort for programmers (look at the above code!) and a plethora of confusing settings for users (character sets, keyboard layouts, date/time formats, decimal points, etc.). Yes, it's flexible, and all those settings should stay under some advanced options, but the system should work seamlessly with mixed locales and character sets. The user should not be forced to use a single locale system-wide (as in Windows) or within a single application (as in Unix).

I see many open problems with mixed locales, but I don't think they are unsolvable. They are just very hard. Some "locale grand unification framework" is missing. And until it appears, I'm afraid the state of affairs in I18N will remain the mess it is now. Maybe (or should I say, probably!), with a bit of bad luck, it'll just get worse.


Linus Torvalds on GNOME

Here he is encouraging people to switch to KDE, and here he's calling some people "fucking idiots". What struck me in the latter post, and what I would like to single out, is the often-neglected logical fallacy in "majority" arguments:

In other words: your "majority" argument is total and utter BULLSHIT. It can be true for any particular feature, but it's simply not true in general.

To put it in mathematical terms: "The Intersection of all Majorities is the empty set", or its corollary: "The Union of even the smallest minorities is the universal set".

It's a total logical fallacy to think that the intersection of two majorities would still be a majority. It is pretty damn rare, in fact, because these things are absolutely not correlated.
Read the provided links for the complete context.

Personally, I use neither GNOME nor KDE. Everything I do, I do in fvwm2. Its configuration is trivial to copy across different machines, and it starts up blazingly fast compared to other "desktop systems". What's more, it isn't a constant resource hog with a bunch of background processes, as GNOME and KDE are.



The nonsense of XML

In his blog post, Rick Jelliffe proposes "Unicode-aware CPUs" instead of binary XML. Personally, I think this is nonsense. With the recent talk about binary XML, I'm even more convinced that XML as a data format, including all of its companions such as XML Schema and XPath, should be abandoned.

With XML, data is transformed from machine ("binary") representation into XML, greatly inflating its size in the process, transmitted to the other end, possibly run through some XSLT rules, checked against some schema, and then parsed back into binary form. XSLT and XML Schema are programming languages in themselves, just with a hideous syntax. XML was supposed to be editable by hand without any special tools, but this is, IMHO, nearly impossible with XSLT and XML Schema. So their existence defeats the purpose of XML.

Other proponents of XML claim that XML is self-describing. I claim that this is also nonsense. Actually, there are two levels of self-description. The first is the description of data types (e.g. the height tag must be a real number greater than or equal to 0). This purpose is fulfilled by the accompanying XML Schema. The second level is the interpretation of the data. This is done exclusively by the programs manipulating that data. Nobody can say (without actually inspecting the underlying program code) that the height tag doesn't actually specify the age of some item. The second level of self-description cannot be achieved by any means.

Now, binary XML is, IMHO, superfluous in itself. There already exists a perfectly good data description format with binary encoding rules: ASN.1 with BER, DER and PER (the Basic, Distinguished and Packed Encoding Rules), along with a good solution for namespace problems (the OID hierarchy). Why invent another one?

I propose using standardized virtual machine code as a means of data exchange. One good VM, with an accompanying language, for this purpose is Lua, because of its simplicity, portability and power as a programming language.

Data rarely exists in a vacuum. It is often manipulated in many ways. So why not transmit a VM program that creates the data? Instead of transmitting, e.g., the XML chunk <height>20</height>, why not transmit a piece of VM code that, when executed, stores the value 20 in the field height of some table? In the same way that programs currently expect specific XML tags in an XML document, they would expect certain names of tables and fields in the VM. Transformation and validation of the data would be other VM programs.
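As a toy illustration of the idea (in Python rather than Lua, and emphatically not a real sandbox: restricting exec's builtins does not make untrusted code safe), the "wire format" becomes a tiny program instead of markup:

```python
# Hypothetical sketch: the chunk below plays the role of <height>20</height>.
def load_data(chunk):
    item = {}
    # Hand the chunk an empty builtins table so it can only touch the
    # names we give it. (Illustration only, NOT a secure sandbox.)
    exec(chunk, {"__builtins__": {}}, {"item": item})
    return item

data = load_data("item['height'] = 20")
print(data)  # -> {'height': 20}
```

The receiving program expects a table named `item` with a field `height`, exactly as it would otherwise expect a specific XML tag.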

Lua has a distinct advantage here: a simple declarative syntax for the textual representation of data and programs, and a portable binary VM code representing the same program. For debugging purposes, one would send the textual representation of the data; in production, one would send the binary representation. Lua offers a unified interface for loading both VM and textual chunks of code. As far as I remember, the binary format is also portable across architectures.

With restricted VMs it is also easy to address security concerns. The only remaining concern is denial of service: one can, e.g., construct an infinite loop that represents a piece of "data". Detecting infinite loops before executing the program is a provably undecidable problem, the halting problem, but it can easily be worked around by setting a maximum time limit on the execution of the program (and discarding the data as invalid if the limit is exceeded), or by forbidding control-transfer instructions in the data description chunks of the code.
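The time-limit workaround can be sketched like this (Unix-only, main-thread-only, and again just an illustration of the idea; the restricted-builtins trick is not real sandboxing):

```python
import signal

def load_with_timeout(chunk, seconds=1):
    """Run an untrusted 'data chunk', rejecting it if it runs too long."""
    def on_timeout(signum, frame):
        raise TimeoutError("chunk exceeded time limit; data rejected")

    item = {}
    old_handler = signal.signal(signal.SIGALRM, on_timeout)
    signal.alarm(seconds)            # deliver SIGALRM after `seconds`
    try:
        exec(chunk, {"__builtins__": {}}, {"item": item})
    finally:
        signal.alarm(0)              # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
    return item
```

A well-behaved chunk loads normally; a chunk consisting of `while True: pass` is interrupted by the alarm and rejected as invalid data.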

To conclude: XML is far too heavyweight, ugly and overrated. Wake up, people; look around and search for alternatives.

Tags: XML


TCP/IP checksum revisited

RFC 1071 gives efficient ways of computing it, along with the design rationale.


IP checksumming code

A quote from RFC 791, the Internet Protocol definition:

"The checksum field is the 16 bit one's complement of the one's complement sum of all 16 bit words in the header."

And this is the only mention of the checksum algorithm in the whole RFC! How someone comes up with working code based on such a description, I have no idea. I guess everyone just copy-pastes code from an existing IP implementation without actually thinking about what it is supposed to do.
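To be fair, that one sentence does pin the algorithm down once you know what a one's complement sum is: add the 16-bit words, fold any carry out of the top back into the low bits (the "end-around carry", spelled out in RFC 1071), and complement the result. A sketch in Python:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 791/1071 checksum: the 16-bit one's complement of the one's
    complement sum of all 16-bit (big-endian) words, zero-padding odd data."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold the end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

Verification is the pleasant part: summing data that already contains its own checksum yields 0xFFFF, i.e. a final checksum of 0.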

I'm surprised that today's Internet has its foundations in definitions like the above.

Google code

Yesterday I realized that Google has put some of its software into the open-source world. They have a "Sparse Hash" implementation here, which is particularly interesting for me and the toolkit I'm about to release soon. A number of other projects can be found here.


Monitors are broken

This post is more of a rant...

I think that the current widespread programming "abstractions" for concurrency (locks, condition variables) are flawed, insufficient, error-prone, whatever you like. It's easy to get into deadlock, the programmer has to be meticulous in deciding which functions should be synchronized with respect to which objects, etc. No, I don't view Java's synchronized keyword as something good and well-designed. It's basically still the same paradigm, just with more details hidden away.

My biggest objection is to monitors; the name is most often mentioned in conjunction with condition variables. Simple-looking on the surface, they can have surprising semantics: for example, pthread_cond_wait atomically unlocks the mutex and blocks the thread, but when the thread is woken up, the mutex associated with the condition variable is reacquired - not atomically with the wakeup. So you have to put your condition wait in a while loop and re-test on each awakening whether the condition is true. And pthread_cond_signal delivers the signal to an unspecified thread, so it's very easy to get starvation.
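The same wait-in-a-loop discipline applies to every monitor-style API. For instance, with Python's threading.Condition (a monitor in the same tradition), the waiting side must re-test its predicate on every wakeup:

```python
import threading

queue = []                       # shared state guarded by the monitor
cond = threading.Condition()     # one lock plus wait/notify

def consumer(results):
    with cond:
        # Re-test the predicate in a loop: the lock is reacquired only
        # after the thread unblocks, so the state may already have
        # changed again by the time we get to run.
        while not queue:
            cond.wait()
        results.append(queue.pop(0))

def producer(item):
    with cond:
        queue.append(item)
        cond.notify()            # like pthread_cond_signal: wakes one waiter
```

A bare `if not queue: cond.wait()` is exactly the bug described above: it works until a wakeup arrives with the predicate already false again.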

Is it possible to make a paradigm shift in this area? Does concurrent programming really have to be so error-prone? At least some people are trying; take a look at another Microsoft Research project for a possible design of novel concurrency primitives: Polyphonic C#. A quote from the online introduction: "... polyphonic methods do not lock the whole object and are not executed with 'monitor semantics'". Yaay, no monitors! :)


Python scopes, again

A friend drew my attention to the following links: this, and this. It seems that I'm not the only one annoyed by this misfeature. The second link is really worth reading, as it describes the fundamental problem: the = operator being used both for creating a new binding and for mutating an existing one.


Python global stupidity

Consider the following seemingly trivial piece of Python code, pasted directly from the interpreter:

>>> x = 0
>>> def f():
...     x += 3
...     print x
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in f
UnboundLocalError: local variable 'x' referenced before assignment

Yes, thank you, I know that global variables are "evil". I don't care when I'm writing a throw-away program.

My biggest objection to Python while I was learning it was its strange and unintuitive scoping rule. And even now, when I'm pretty experienced in Python, I still get bitten by it sometimes. Here's how to fix the above piece of code: add global x just before the augmented assignment. Strangely, the following piece of code works:

>>> def g():
...     print x
...
>>> g()
0

Personally, I think this is both unexpected and broken behaviour. This is not how lexical scoping (present in all modern programming languages) works. And the more I follow Python's development, the more hackish it becomes.
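For completeness, here is the fixed version of f (shown in Python 3 syntax, with print as a function):

```python
x = 0

def f():
    global x   # assignments to x now rebind the module-level name
    x += 3
    print(x)

f()  # prints 3
```

Without the global declaration, the mere presence of an assignment anywhere in the function body makes x local to the whole function, which is why the read in `x += 3` fails.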

If you want a nice, small, and elegant language to embed in your application for scripting, take a look at Lua.


Eclipse IDE

I'm trying out Eclipse as an IDE for C and C++ development. So far, I'm happy. There are several reasons why I've given up on the emacs+gdb+xterm combination, in order of importance:
  1. I am tired of manual build management. Make, SCons, whatever - they are all equally tedious to set up and use properly. I'm not doing it anymore.

  2. There are other C/C++ IDEs, such as Anjuta2[1]. I did not bother looking into them because I have another reason for trying out Eclipse: to become productive with it. I have a project in mind whose development is most easily done in Eclipse. Yes, in the (not-so-)short future, I'm going to program in Java, although it sucks big time. What the project is - that's a secret for now.

  3. emacs is a pain to set up and use properly. A brief example: many of emacs's key combinations are too verbose for my taste, especially for a feature I use frequently: marks (in vi: mx and 'x in command mode). So I use viper, the vi emulation mode. However, I never managed to convince emacs to a) turn viper on automatically at startup, and b) set autoindent mode (:set ai). Everything is set in the config files, but it just has no effect. And I know it's possible to accomplish, as I've stumbled upon it in the past.

Basically, I've gotten tired of wasting time on trivia like makefiles and programming the emacs editor, which, in the end, I never did learn how to do.

So far, I'm happy with Eclipse. Although written in Java, it is pretty responsive. It does basic project management for me, has Subversion integration (not perfect; there are cases where I have to resort to the console), and at last I'm experiencing the benefits of automatic struct field completion on Linux[2] (it's a great help not having to refer to documentation, or my own header files, to remember what some field is called).

[1] As for Anjuta2, I got discouraged by this web page. Quote: "It currently provides an editor with syntax hilighting and a rudimentary project management system." It doesn't seem worth trying out; this short summary sounds like emacs/vi plus some project management. How did I stumble upon that page instead of the project's home page? It's the first page listed on Google when you search for "anjuta2".

[2] I once wrote a console program for Win32 as a part-time job. I used Visual Studio .NET then, and I'm happy with that IDE too. VS6 is, IMO, crap, but VS.NET is an excellent IDE. Everything that annoyed me in VS6 has been fixed in VS.NET, and they added features that really made my life easy. I felt that the IDE was actually helping me, as opposed to VS6, which felt like it was fighting against me.


Security, Microsoft, WinNT, Java, .NET, etc.

A bit of everything in this post. Yesterday and today I gave lectures at the university about file systems and computer security. In the FS lecture I mentioned NTFS, while in the security lectures I talked a bit about the Windows NT security model and the NT kernel. I openly said in front of the students that, as much as I don't like Microsoft, I think the NT kernel and NTFS are well-designed. The NT kernel has a VMS heritage, and VMS is known as one of the most secure systems. So what happened to Windows?

The Win32 API happened. Not many people know that the Win32 API is just a layer over the largely undocumented NT kernel. Applications call into the Win32 API, which in turn makes a series of calls into the NT kernel. I have read a bit about the NT kernel in Tanenbaum's operating systems book, and a bit on the internet, and I have to admit that it has a really good design. You can run a UNIX environment on top of it, and it runs an OS/2 emulation layer. Microsoft now gives away their SFU (Services for UNIX) package for free: a high-quality POSIX API implementation, the Korn shell, utilities, etc. You even have the option to install gcc! OK, there are drawbacks as well, such as the lack of proper support for position-independent code of the kind ELF binaries have, but that could be fixed. It's a "feature" of the Win32 PECOFF loader: a DLL must be relocated if it can't be loaded at its preferred virtual address, and relocation means patching machine instructions, so you lose the advantage of sharing code pages. Suddenly you have a new, very similar copy of code that is already in memory.

Although I'm a UNIX user, I have to admit that I admire the NT kernel, its security model, its object-based design and the NTFS file system. I was unjust when I told the students that I don't like Microsoft. What I actually don't like is:
  • From the programmer's perspective, the brain-damaged Win32 API.

  • From the user's perspective, the idiotic GUI which makes it next to impossible to perform tasks efficiently.

For these reasons I currently avoid Windows in my professional and hobby work as much as possible. Currently, it is an operating system for lamers, not for scientists and programmers.

It would be fun for me to get the raw NT kernel, a few basic device drivers, a decent shell and documentation, and then start exploring its capabilities - to build a novel operating system on top of it. I think it is general enough to support any kind of application area.

But things are changing in the Win32 world. I'm closely following what is happening with .NET, and I like what I'm seeing. Look, for example, at the preview of features for C# 3.0. Then there is Monad, their novel shell based on objects. It seems they are striving to take over the UNIX sysadmin base, and I think they will eventually succeed. Heck, they managed to warm me up.

Now, Java. It is a brain-damaged language and platform. I sincerely hope that MS manages to kill it in the long run. I wonder why they still support running Java applications on their OS at all, since Java is their direct competitor. My prediction is that Java will run terribly slowly on Longhorn, even more so than now: they will never refuse to run Java programs and the JVM outright, but Java will run very slowly compared to .NET. Even now it is slower, according to some benchmarks.

There are other interesting projects from Microsoft Research in programming languages, operating systems, etc. Check out their web site. Too bad that MS has already created an "evil" perception of itself, so that people are automatically skeptical of the good things coming out of MS Research.


DRM could help privacy

Most people think that DRM is evil, and I mostly agree. The main argument against DRM is that it takes control out of the computer owner's hands. But I think there is one good thing that could come out of it: the creator of the content decides in what ways it may or may not be used. The "only" problem with this is where control over the use of the content stops and control over the computer begins.

So what happened to make me wish I had DRM? I sent a mail to someone, and that person put another one on CC: when replying. I didn't want the mail to be CC'd or BCC'd; it was supposed to be a private mail. But there is no way I can enforce that. I didn't even dare ask the original recipient NOT to put anyone on CC: when replying, because I know him. He's pretty sloppy, and he could have forgotten to delete the relevant parts of the original message (where I ask for it not to be CC'd) while still CC:-ing the reply, which would have made things very bad.

OK, this time no damage was done, because I half-expected up front that the recipient would CC his reply, so I was careful about what I wrote. But the question remains: how can you, with current technology, enforce such policies and make sure that private communication remains private? Currently, written communication in any form (especially digital, like email) rests, most often, on a silent agreement between the parties. The fact is that once a document leaves the creator's supervision, the receiving parties can do anything they want with it.

Most of the time, silent agreements about privacy work well. I think this is because of the mutual interest all parties have in keeping the communication private. Sometimes, as happened to me, they fail: there is a discrepancy between the communicating parties in what should be held private, or a difference in perceived mutual interest.

If I had some kind of DRM, I could just mark my mail "for your eyes only" and forget about the problem. I'm interested in other opinions: how do you (try to) solve this problem?

The opera problem is fixed!

Yaay! I can now go back to using Opera for posting to Blogger.


Disk encryption

I've set up my gentoo linux to use disk encryption in the following cases:
  • encrypted swap using random key on each boot
  • encrypted /tmp using random key on each boot
  • encrypted disk partition for sensitive data
I'm trying out BestCrypt from Jetico, a Finnish company. At first I thought of using cryptoloop or dm-crypt, but they have some security weaknesses. Kernels >= 2.6.10 include some stronger IV modes, according to the author of the patch. Does anyone know of any other analysis of this new encryption mode?

On the other hand, I didn't find any formal analysis of BestCrypt either. But it also doesn't have any published weaknesses. It is a commercial piece of software, not very expensive, with two bonus features:
  • hidden encrypted containers, and
  • encrypted containers are readable on Win32.
The latter feature might come in handy. BestCrypt is also very user-friendly, flexible and almost trivial to use. I think it is worth its price.

Ah yes, my threat model, and why I have decided to use disk encryption now: soon I'm going to travel around with my laptop, and I don't want my work and personal data available to strangers if it gets stolen. So I'm moving all of my mail, work and private stuff to the encrypted partition. And all of the dot-files in my home directory - it's surprising how much data can be found there.


Blogger opera problems: update

I just received a reply from user support. They say that the problem is known and that they are working on a fix. So, I'll wait a little longer and see what happens.


Blogger ate my post!

Yesterday I was typing a post, and Blogger ate it! Well, almost. I had to stop typing, so I saved the text as a draft. Usually I copy the text from the browser into some local scratch file before submitting; yesterday I forgot to do that. When I returned to continue editing the draft - no text was there! Today I tried to make a test post, with the same result. All of this was with the Opera browser.

So I tried to publish something using Firefox - and it worked. This wouldn't be so surprising if I hadn't published 3-4 posts with Opera before.

The conclusion: Google/Blogger made some changes in the meantime, and those changes made Opera unusable for publishing blogs! Time to contact them, if possible.


Another NetBSD kernel crash

It's a bit ironic to start getting frequent crashes right after donating some money to the NetBSD project :) Seriously, I'm very happy with NetBSD (despite its rare crashes in weird circumstances) and I'm eagerly awaiting the 3.0 release, so... what the heck. I seriously think that NetBSD is doing an excellent job, and they deserve support.

Now, about the crash. Find the details here.


Linguistics part 2: tenses in indirect speech

I've learned another easy-to-learn, but to me unintuitive, "feature" of the Norwegian language. It concerns indirect speech. I've been told that it also works this way in English, so I'm going to describe it with an English example. Consider the following sentences:

  1. Each time I ask her something, she answers: "I don't know."

  2. Yesterday I asked her something and she answered: "I don't know."

  3. Yesterday I asked her something and she answered: "I didn't know."
and the following indirect speech sentences:

  1. Each time I ask her something, she answers that she doesn't know.

  2. Yesterday I asked her something, and she answered that she didn't know.

  3. Yesterday I asked her something, and she answered that she doesn't know.

The 1st and 2nd sentences in direct speech translate to the 1st and 2nd in indirect speech. So where is the stumbling block? Well, according to "my" logic, the correspondence should be 1 (direct) to 1 (indirect), 2 to 3, and 3 to 2. In Croatian, changing the tense in reported speech changes the semantics of what the person really said.

My thinking is not quite in line with the "correct" 1-1 and 2-2 mapping, with 3 (indirect) being nonsense as currently written (to make it correct, it should be changed to some "more past" tense - "hadn't known"?). This is yet another easy rule that I can accept, but it isn't logical to me (i.e. it is not in line with how it works in Croatian).

I don't know whether I'll write on more linguistic themes, but I've learned the following: your first spoken language deeply shapes the way you think. I might become a very good user of some foreign language not related to the Slavic languages, like Norwegian or English, but I strongly doubt that I will ever really understand it.

And a small joke at the end: What is the most spoken language in the world? Answer: English, as spoken by foreigners.


mutt 1.5.10i

So I've tried to compile it. I am not satisfied with the 1.4.something shipped with RHEL4 Linux, because I have to mess with some ugly macros to get PGP/inline working (too many people are still using broken mail clients).

Take 1: It complains that the system iconv routines are not good enough, and suggests that I should use libiconv.

Take 2: I just passed the --disable-iconv option to the configure script. I never use I18N features, always use just the ASCII character set, and in fact always give --disable-nls to the configure script. Now it compiled, but when I tried to open my IMAP folder, I received a very bizarre and uninformative error: "Bad IDN".

I googled around and found out that GNU libidn has something to do with UTF-8 and other character conversions (I didn't want to investigate further; the whole issue of I18N and localization disgusts me). So, Take 3: I find GNU libiconv somewhere on the system, give its location to the configure script, compile the program, and - voila! Now it works with IMAP.

The developers could have at least either:

  1. Document that the IMAP support needs iconv,

  2. Make a better error text instead of "Bad IDN",

  3. Check for dependencies in the configure script,

  4. Make the IMAP code independent of libidn.

Preferably, all of it. Instead, it took me about an hour and a half of experimenting with various compilation options to make it work.

NetBSD hard crash

Hm, I seem to have a talent for crashing programs and OSes with seemingly innocent stuff. I reported the course of events on the tech-kern NetBSD mailing list. You can read the details here.


James Bond sucks

I was chatting a little with my friends on IRC, and someone mentioned James Bond. I said that it is the only movie I've watched in a cinema and wanted my money back. I mean, I've watched many movies that were basically trash, but JB is really the worst. I don't think any movie can get lower than that.

Just wanted to share my thoughts with the world.


Classic buffer overflow bug; another YES for segmentation

Today I helped debug mysterious crashes in code roughly like the following:

void f() {
    char buf[SIZE], *ptr = buf, logmsg[128];

    while (some_condition) {
        /* do something with ptr */
        sprintf(logmsg, "something", ...);
        /* ... */
    }
}

This code crashed. It was part of a multithreaded program; GDB showed a very dubious stack trace (some function calls with many unrecognized addresses in between), and a very strange address in ptr. An illuminating piece of information, provided by the author of the code, was that the loop runs fine the first time through.

Since all uses of ptr in the loop are pretty innocent and couldn't have set it to the really wild value that GDB showed, my suspicion fell on the sprintf. The "something" in the sprintf was a long text which by itself (even without the interpolated values) overflowed the logmsg buffer. Since the stack grows downwards on the AMD64 architecture, the overflowing buffer overwrote the ptr pointer.

Two things made it relatively hard to debug:

  1. GDB showing a dubious stack trace (understandable after the fact, since the whole stack was trashed), and

  2. the error manifesting itself (as a program crash) at a different point in the program (the next time the ptr pointer is used) rather than at the point where it was caused (the sprintf overwriting the buffer).

Such buffer overflows could be prevented by using the segmentation mechanism present in the IA32 architecture. All memory accesses to segments smaller than 1MB are limit-checked in hardware, with byte granularity. However, nobody uses these features, for several reasons:

  • They are not portable (i.e. they are tied to the IA32 architecture),

  • No widely-used 32-bit compiler supports FAR pointers. MSVC ignores the far keyword, and gcc never supported it. AFAIK, the only compiler that honours it is the Watcom compiler.

  • Loading segment registers is a relatively costly operation, and, what's worse, Win32 reserves the FS register for itself, and the other segment registers must not be changed. So if you have more than one buffer to access, you need to reload a segment register all the time.

The final blow to segmentation came when AMD decided to effectively kill it. The descriptors are still present for system management purposes, but segment bases and limits are forced to a flat 64-bit address space in hardware.

IMO, it's a shame to see a good technology killed by a lack of imagination on the part of programmers, and by some "holy grail" of portability. If it were used properly, and OSes were designed the way they should have been, the IA32 architecture could have been almost 100% immune to the buffer overflows that are to blame for so many insecurities in applications. And without all of the "NX", "virus protection", etc. hype.


Linguistics (long time - no update)

The last post doesn't really count as an update. These days I'm pretty busy studying for the "Mathematical optimization" and Norwegian language courses. In between, I'm finishing the documentation for a software project that I'm planning to release soon.

While learning Norwegian, I've realized that I don't really understand the concept of definiteness ("a" vs. "the" in English). I'm using it by "feeling" based on experience and a large number of memorized use-cases. Most of the time (>50%) I'm right, but still too often wrong for my personal taste. I see how much I'm wrong when I give my texts to someone else for proofreading. My first language is Croatian, and we simply don't have the concept of definiteness.

The usual rule taught in schools is that the definite form is used when it is known what you're talking about, or when it has been mentioned before. Here are two counterexamples where this "rule" fails.

When talking about certain things, you might want to ask "What's THE difference?". But why "THE" instead of "A"? The difference might not exist, might not be known to the other party, and most probably has not been mentioned before in the conversation.

Another case: I'm currently a student and if I were to introduce myself to someone in the administration, I would say "I'm A student that.." But why A, not THE? I'm something very concrete and known to myself (obviously) and also known to the other person, as I'm standing right in front of him/her.

The most plausible short explanation that I can accept was given to me by someone over IRC: definiteness is not about known/mentioned vs. unknown/unmentioned, but about a notion vs. something concrete. In the first example, "the difference" is definite because the conversation is about the difference in one concrete case, and in the second example, "a student" is indefinite because it is only a notion that describes me in more detail. Still, I could find exceptions to this "rule" too.

I've discussed the issue with many Norwegians and almost drove them crazy. I've shown them some examples from the course book and asked why a given example uses the (in)definite form, and what it would mean if it were changed the other way. Nobody could give me a satisfactory answer, except that the alternative would "sound strange". They were unsure about how the meaning changes, if at all. The teacher also failed to explain the definiteness issue to me.

The problem, when learning a new language, is that I don't have this feeling about "sounding right" that I've acquired (rather than learned) in English. In more complicated cases when there are no clear rules to apply, I'm just guessing with the expected result - being wrong about 50% of the time.

A lesser problem is when to use preteritum (sometimes also called imperfect; simple past tense in English) vs. perfect (present perfect in English). Again, the concept of definiteness creeps into the "short rules": preteritum is used when talking about a determinate point in past time, and perfect is used when the exact time is not important. In Croatian we have 4 past tenses. Of them, perfect is used roughly 95% of the time, plusquamperfect 4% of the time, and the other two (imperfect and aorist) 1% of the time, mostly in "artistic" literature. In the spoken language, you will hear aorist or imperfect in a few phrases. For some reason, it is easier for me to grasp this "time" definiteness; it is always somehow implied in the broader context. I still sometimes make mistakes, but not nearly as often as with articles.

To conclude, the article usage (definiteness) is still a mystery to me, while I've almost grasped the usage of preteritum vs. perfect. Here are some interesting links:

If you follow other links on these pages, you can find quite an interesting read. Until very recently, I thought that I could never get interested in linguistics. Never say never!


Switched to Opera

About a month ago I switched from Firefox+Thunderbird to Opera+mutt. I got fed up with the all too frequent security updates for Firefox. They get even more troublesome when you have to compile it yourself, as on FreeBSD. As for ditching Thunderbird: I got fed up with trying to move my existing emails from one version of TB + enigmail to totally different versions. I spent 3 hours making it work!

Overall, I'm pretty satisfied with the switch. Opera has some stability problems - it crashes now and then. The good thing is that, unlike FF, it remembers all pages that were open before the crash. The restart takes just a few seconds, and Opera resumes seamlessly where it left off, so it's not even annoying.

Opera also has integrated RSS, NNTP and mail readers. As I'm reading less and less news every day due to time constraints, I didn't even look at the NNTP reader. The RSS reader works, and I'm using it.

I'm not using the mail client because it currently does not support PGP. I didn't investigate whether it supports S/MIME. Another reason for not using it is my experience with Thunderbird when moving archived mails around and between upgrades of TB. The option to import/export mail, assuming that it exists, is not present in the most obvious places. I've learned my lesson regarding archived mails, and I'm not going to repeat it.

With mutt, moving mails is straightforward - just move the mail folders directory. No stale setup in obscure places that needs to be cleaned up after an upgrade, as is the case with thunderbird.


Secure programming

Today I had an interesting talk with a student. He was interested in the new features of the AMD64 architecture. When I mentioned the non-executable page support, he started to think about writing secure programs. And then he got me thinking... and what follows is the result of this thinking.

First I'd like to clarify the difference between an incorrect and an invalid program. [What I'm using may not be the standard naming, but I was too lazy to look it up on the web]. The following simple code demonstrates both points:

float average(int *a, int n)
{
    int sum = 0, i;
    for(i = 0; i <= n; i++)
        sum += a[i];
    return sum;
}

The program is incorrect because it does not return the average at all - it just returns the sum of the elements in the array. What makes this program invalid is that it is accessing the array a beyond its supposed limit. There are also examples of invalid programs that behave correctly in a certain runtime environment, but break in another - e.g. ones that access free()d memory.

There is no way that any programming language or runtime environment can identify an incorrect program. Doing so would require both:

  1. An understanding of the programmer's intentions (some kind of AI?), and

  2. detecting that the code does not do what the programmer intended. This is provably impossible - it is equivalent to the halting problem.

Many of today's exploits are due to invalid programs (e.g. web servers, ftp servers, public forums, etc.), and there exist many mechanisms to defend against them: the processor's hardware protection features, basic UNIX security model and its extensions like ACLs, various security patches like grsecurity, SELinux (integrated into the Linux 2.6 kernel) or TrustedBSD.

Given proper support from a programming language, invalid programs may become nearly (if not completely) impossible to produce. Examples of such languages are Python, Java, Haskell, OCaml, Lisp, the .NET runtime, etc. All of them are slower than C in raw performance, to varying degrees (check out this page for exact numbers); some of them come very close to C. But in many real-world scenarios (moderately loaded web and ftp servers), raw performance is of little importance, while security and data integrity are of paramount importance.

Now, I'm not saying that grsecurity or SELinux are useless and can be replaced altogether by choosing a better programming language. I think that they should be the second line of defense against security breaches, not the first, as they are today. Reasons why the second line of defense is needed at all:

  1. The compiler and the runtime system can be buggy themselves, and

  2. There are always authors of malicious programs that do their deeds mostly in unsafe languages and assembly.

What follows are, in my view, some of the reasons for still using an unsafe language today:

  • Portability. I claim that C is the most portable programming language in existence today. (If someone gives a "counter-example" along the lines of compiling Win32 source on a Linux system, they fail to see the difference between an API and a programming language.) Because of that,

  • C is the main language for interfacing with foreign libraries and the operating system itself. On every platform it has a well-defined ABI, something that all other languages either lack or express via C interfaces.

  • Sometimes it is easier to code the whole program in C than to make bindings for your particular language of choice. (Although tools like swig help considerably with the task.)

  • Raw performance.

  • Not wanting to use Java, while managers consider other languages too exotic.

C is the lingua franca of the computing world. IMHO, it is here to stay until a feasible alternative appears. Personally, I don't see one on the horizon yet.


a pleasant surprise - gentoo

It's night and I'm sleepy, so this will be short. I've replaced Ubuntu with Gentoo on my laptop. I have to say that Gentoo is a pleasant surprise. No behind-the-scenes "magic" - you have to do everything by yourself. All important aspects of the system are well documented. Compiling stuff is not that much of a chore with emerge, their packaging system. I also like the non-SYSV init, because it is close to BSD-style init.

I also have to say that I have a very good experience with their support. The only package that failed to compile was bochs. I decided to report the bug and received an answer within 15 minutes. Impressive.

So far - no frustrations and a feeling of a well-designed system that gives great freedom to its users.


apt-get remove ubuntu

I am seriously considering this step. Replacement - Slackware or back to FreeBSD.

Some packages in the distribution are too old (e.g. tetex-2 when there is tetex-3 available), and some programs are configured at compile-time in a way that does not fit my needs (e.g. bochs without the internal debugger).

The worst thing is that the default installation does not install most of the needed development packages (e.g. X11 headers), so you have to download them from the net. They are not even present on the distribution CD! On top of it all, the Ubuntu repositories have been down since early this morning, my local time, so I can't get the packages I need.

Most of the other software I can compile myself, but I'm missing the critical part which I'd really rather not install on my own - the X11 headers. Other software that I need to compile on my own (or rather must, because of inadequate configuration) depends on the X11 headers.

I don't want to keep ranting on. I can just conclude that I've never experienced so much frustration with any system on my desktop (I've tried FreeBSD and several flavors of Linux) as with Ubuntu. And the things I've mentioned here can't really be blamed on the beta.


Java exceptions

Recently there has been a hot discussion about Java vs. C (well, these discussions always seem hot) on sci.crypt. I have also given my contribution to the discussion.

The discussion concentrates on exceptions vs. error codes for error handling. I think that, in most cases, error codes are the better solution. Read the article for the explanation.

However, in that thread I floated an idea on which I'd like to elaborate in more detail here. There are many problems with the naive use of exceptions, and this article gives an introduction to exception safety and exception guarantees.

To summarize the article shortly, one of the bigger problems is that throwing an exception can leave your object in an inconsistent state. This is not only a C++ problem; it is a problem in any language that provides exceptions, including Java and Python. So why not borrow transactions from RDBMSes and introduce them into the programming language?

Consider the following artificial example (in some pseudo, C-like language):

x = 2;
begin {
    // some mutating code
    throw some_exception;
}
The transaction consists of everything within begin{}. When execution "cleanly" falls over the closing brace, the transaction is committed. If an exception is thrown within the begin{} block, then all changes to any member variables of the object done within the transaction are rolled back. In the above example, x will be reverted to the value 2.

Some open questions:
  • What to do with static variables within transaction blocks,
  • should nested transactions be allowed,
  • implementation issues,
  • performance considerations.


Consumer clubs and privacy

I was recently shopping in "Turbo Sport", a sports equipment shop in Croatia. A young girl (maybe 14-15 years old) in the queue before me was offered a membership in the shop's "customer club". With each purchase you collect some points, and when you collect enough points, you get some kind of discount or a gift. In fact, every customer is offered the membership, which I declined.

The point is that the membership application asks you for a bunch of personal data like your date of birth, full name, home address and so on (it took her a few minutes to fill out the complete membership form). Although I'm not keen on giving away my personal data, another thing bothered me more at the moment.

It is the fact that each time someone uses their membership card, the shop has an opportunity to tie together all their past purchases and build their profile. That kind of information is much more valuable to the shop than the few euros of discount it gives. Somebody may object that the shop can build a profile every time someone uses a credit card, but note that Croatian citizens most often (in more than 90% of cases) pay in cash.

Maybe I'm paranoid, but.. if they don't build your profile, why not just use a "stamp" scheme like the one the Subway fast-food chain has? There is no single piece of personal information on that small piece of paper on which the stamps are collected. When you collect 10 stamps, you get a discount and a new piece of paper.

Why does "Turbo Sport" make you fill out a form that takes a few minutes to complete (i.e. they ask you for a lot of data)? What's worse, you have no idea what they are using your data for, or how safely it is stored. At the time I didn't bother to look, but I don't think there was any kind of privacy policy written on the membership form.

Croatia does have some regulations about securing personal data (I'm not acquainted with them at all), but I don't think that any law can forbid organizations from collecting personal data. In the end, everyone gives it out voluntarily when asked to.

And oh, BTW, I really like Subway's sandwiches. Much better (both in the terms of taste and ingredients) than McDonald's products :)


Ubuntu criticism

Yesterday I installed Ubuntu 5.10 on a friend's recommendation. Some things I want to work on are simply not compilable at all under FreeBSD. My friend really praised Ubuntu 5.10 (currently in preview), so I said, what the heck, I might as well try it. Note that I was warned upfront that this is a preview version, but he said that everything was working perfectly on his laptop. I have a Dell Latitude D800 with a Broadcom wireless card.

During the installation, I set my location to Oslo. This resulted in the Norwegian Ubuntu mirrors being used in /etc/apt/sources.list. These repositories were not usable, for some reason I didn't want to investigate. On my friend's suggestion, I pointed the APT sources file at the US mirrors and things started working OK.

I'm disappointed that, although they boast about fresh software packages, they only have tetex 2 in their .deb packages. I had to compile tetex 3 on my own.

Another serious criticism goes to the default settings of applications. I installed postfix in the "smart-host" mode. I was really surprised to see that it was listening on all network interfaces by default! On FreeBSD, sendmail is installed and enabled by default (to allow local mail delivery), but it listens only on localhost. It took me 5 minutes to install the postfix docs, find the option and fix it, but still - I don't think that this is a reasonable, secure default.

The most serious criticism goes to stability. It froze on me twice in two days. Even with Windows XP I haven't had such an experience in quite a while. I don't remember the first freeze, but the second went like this: I'm working in GNOME (trying it out), lock the screen, return, and I can't do anything. The processor starts heating up (I hear the fan turning up to its max); after a long wait (~30 sec) the password entry box appears, but I can't type anything in. I can see that the thing isn't really frozen (i.e. there is interrupt processing - e.g. the mouse pointer moves), but I can't even switch to a console to log in and kill the X session. Ctrl-Alt-Backspace doesn't kill it either. So I do Ctrl-Alt-Del and succeed in cleanly rebooting the machine.

OK, it is a beta, a preview, call it whatever you like. I would expect some applications to crash, and some packages not to be set up quite the way they should be. But falling apart in a way that forces me to reboot the machine - sadly, comparable to Windows. Which has, in contrast, become remarkably more stable as of XP (at my former job I used to run it, without any trouble, for over two weeks without a reboot).

My last, application-related problem was getting Thunderbird to work properly after restoring the backup of my home directory. To add to the trouble, my accounts were enigmail-enabled with an older version of enigmail. It took me 2 hours of trial and error to figure out how to properly move my Thunderbird profile over to the new OS: start Thunderbird with an "empty" profile, don't create any accounts or profiles, and copy from the old profile just the prefs.js file and the Mail directory. Just in case, I deleted all references to enigmail in the copied prefs.js, and everything worked fine. OK, this is hardly Ubuntu's fault.

The lack of something like "Archive accounts" and "Restore from archive" in the GUI menus to ease account and mail folder migration, I blame on the Thunderbird team. Because unpacking the existing profile and using it with new Thunderbird and plugin versions simply does not work.

On the positive side, almost all the hardware that I've had a chance to test at home is working perfectly. I didn't bother with 3D acceleration because I don't need it. Sound, USB stick hotplugging, even the wireless card (using ndiswrapper) are working OK. I haven't had the chance to try out the Ethernet card yet, but it is readily recognized on boot, so there should be no problems. I guess there should be no problems with other HW, either.

The only exception is that small rubber thing between the G, H and B keys that serves as a mouse replacement. It simply stops working in X after a soft reboot. It works after a cold start, i.e. power-on.

Another nice surprise is Acrobat Reader 7 in the packages. I was really amazed by that! The software packages seem really fresh overall, save for the unfortunate tetex: Firefox 1.0.6 and Thunderbird 1.0.6. I hadn't moved from 1.0.2 on FreeBSD because I was too lazy to compile them, and fixes were appearing too frequently. I simply did not have the nerves to keep pace.

To conclude: I migrated from FreeBSD 5.3, which is far from an OS targeted at desktop users, but I had no problems setting it up, and absolutely no problems (stability or otherwise) running it. Although, to be fair, I was using only fvwm2 on FreeBSD, and I'm trying out the new GNOME on Ubuntu. As can be expected - more features, more bugs.

Overall, it's a nice system with a familiar Debian flavor. However, I would recommend that you wait for the release. This preview version is too much trouble in my experience.


GPL no more - part 2

Part 1 continued.. with a slight change in the title :)

Create a version of (BSD-licensed!) ld.so that will execute the program in a separate address space from its libraries. So the "procedure call" will call the stub in the BSD licensed ld.so which will just "pass a message" to the real shared library and return a result code to the application.

There are at least two interesting problems to be addressed:
  1. passing pointers, and
  2. implementing functions that manipulate the whole process state (e.g. setuid, execve, fork, etc.)
The problem of passing pointers can be easily solved in the case of data pointers - just map the application memory into the dynamic linker's memory. Sharing run-time data cannot, IMHO, be considered a derivative work.

A more interesting problem is passing code pointers. Excluding self-modifying code, the dynamic linker does not make the code segments of the dynamic library available to the executable or vice versa. The dynamic library and the executable share code only with the dynamic linker, not with each other. The first time a callback is called in the "other" module, a SIGSEGV is received. A special handler can then "patch" the pointer to point into some trampoline code within the dynamic linker, and that trampoline code forwards the call to its real destination.

The second class of problems could be solved partly within the kernel and partly within the dynamic loader. E.g. by making the dynamic linker, the executable and dynamic libraries share the same process-structure except for memory maps.

Beware! This is just a wild idea. It may or may not be feasible in practice and it may or may not circumvent GPL. Nevertheless, it is an interesting technical problem.

This scheme may or may not have found a loophole in the GPL. Quoting Werner Koch:
Just for the record: Linking is only one indication that the whole is a derived work. There is no one to one relation ship and in particular even two separate processes might make up a derived work.
IMHO, this statement just illustrates the problem with GPL: there is no clear definition of derived work.

GPL once more - part 1

This text is divided into two parts. This (the first part) discusses the issue, and in the second part I will describe a method that will (possibly) make it possible to get rid of GPL problems related to dynamic linking - once and for all :)

On the GnuPG -devel and -users mailing lists there is again a vigorous discussion on Werner's decision not to support PKCS#11 in GnuPG. One of the major arguments is again the licensing issue.

It seems that the problem is the linking of the (proprietary) PKCS#11 shared library into the same address space as the GnuPG binary. E.g. executing a GPL'd binary that uses the Win32 API does not violate the GPL when calling the kernel because:
  1. They are not linked together in the same address space but communicate by well-defined messages.
  2. There is also an exception in the GPL for the software delivered as a "part of the operating system".
Some participants distinguish between a protocol and an API (the library for which eventually gets dynamically, or even statically, linked into your application).

I maintain that, in many cases, these two things are the same. Consider two possible implementations of PKCS#11:
  1. As a shared library, and
  2. as a daemon listening on a local ("UNIX") socket.
In both cases the amount of data sharing between the applications is exactly the same. The only difference is in the mechanism of sharing.

But for some strange reason, using the first implementation of PKCS#11 would make the whole program, under the GPL interpretation, a "derivative work". The user is (legally) not allowed even to dynamically link a proprietary PKCS#11 DLL with a GPL-incompatible license.

However, the second case is a perfectly OK use-case: a GPL application communicates with a proprietary daemon using a pre-established "protocol" and everything is fine. Legally, what happens if UNIX sockets are implemented via shared memory? What does the GPL say in that case?

I maintain that in both cases the amount of sharing and coupling between the application and the PKCS#11 module is the same. The only difference is in the mechanism used to accomplish this sharing. In both cases the application shares only data with the service provider (PKCS#11 - in either form).

In my opinion, data sharing does not and cannot (in any common-sense interpretation) constitute a "derivative work". Furthermore, GPL is only about code sharing. Then again, I'm not a lawyer - comments are welcome.

Then again, you can register callback functions with PKCS#11 driver, creating an even more interesting situation. These functions can be translated into asynchronous messages from some daemon back to the application. So in my view, even in this case there is no code sharing. There is as much code sharing as with registering a ubiquitous WindowProc callback with Win32 API.

Paradoxically, it seems that GnuPG would be allowed to use closed-source MS CAPI because it is delivered as a "part of the operating system". The way CAPI works is:

your application -> CAPI -> back-end driver

So your application interacts with CAPI (delivered as a part of the operating system - an exception permitted by the GPL), and CAPI interacts with the back-end driver for the particular hardware device.

More interested readers can look into the gnupg-users and gnupg-devel archives for the thread named "OpenPGP card". (For the future record: around 6th Sep, 2005.)

Based on these things (also read my previous posts on GPG and Bitlbee), I personally find the GPL license flawed in many respects. A good open-source license should not allow for "paradoxes" (like the above example), and it also should not prohibit mixing the code (in any way - source form, static or dynamic linking) with other, essentially "free" or standards-conforming code, like OpenSSL[1] or PKCS#11[2] drivers.

[1] free as in both open-source and no charge
[2] free as in no charge

During these discussions on the list I had a most interesting idea. Read about it soon in the next blog entry.


Bitlbee encrypted storage patch

This patch fixes a security problem with the Bitlbee server. Namely, users' accounts, passwords and contacts are written to disk in plaintext or in a weakly obfuscated form that can't withstand serious cryptanalysis. If a public Bitlbee server gets compromised, much of your personal data can be stolen.

This patch fixes that. It employs the CAST5-CBC encryption algorithm and PKCS#5 password-based key derivation with 2^16 iterations to slow down password-guessing attacks. The patch depends on OpenSSL. It also adds more extensive logging to make it possible to track, for example, repeated login attempts.

It also fixes a file descriptor leak, which could be used for a remote DoS attack, and some memory leaks. Namely, in the original Bitlbee, one file descriptor was leaked for each invalid login attempt. If spawned from inetd (the recommended way of running Bitlbee), it is trivial to overflow the system's file descriptor table. And then everything halts.

The file descriptor leak fix made it into the mainstream (latest patches). The memory leaks I didn't report, so no official patch is available.

And I uncovered these bugs after looking into only two functions in the whole source! And I'm not intimately familiar with the code.

Will the "encrypted storage" get merged into mainstream Bitlbee? I've contacted the authors, and they aren't keen on incorporating it. Guess why - the OpenSSL license isn't "free enough", i.e. not compatible with the GPL. They prefer one of:
  • GnuTLS, which, judging from the manual, is inferior to OpenSSL in many ways and lacks features, some of which are necessary for the functionality of this patch (among other things, PKCS#5 key derivation).
  • Mozilla NSS which is both an overkill and even less documented than OpenSSL.
After exchanging quite a few messages with the development team on the issues of encryption and file descriptor leaks, they don't strike me as a serious development team. Criticizing them more (and I do have more to say) would not be fair, since they have done much work and are keeping the project alive.

My motivation for writing the patch? I'm a user of a certain public server and I'm concerned about my data. I have convinced the administrator of that server to apply the encrypted storage patch. He likes the idea.

Bitlbee is also a really great program, although, IMHO, not yet quite ready for public servers. But it's perfectly OK for personal use.

I'm an IRC user and finally I can chat with my friends on IM networks who can't use IRC, without messing around with several different client programs. Thanks to the Bitlbee team.


Dubious coding styles

I want to show you two examples of what I consider dubious coding practices. You, the reader, may not consider them that bad, so feel free to leave your comments.

I'll start with the easier example (a bit obfuscated to hide the true origin):
if(strncmp(buf, "action", 6) == 0) { ... }
strncpy(dst, src, strlen(src));
These are two unrelated code snippets from the same program.
As I know the author, I asked him why he is using the 'n' versions of the string functions. He argued that it costs him nothing to add the "extra check" to make the code a bit more secure. Not only is the code not more secure, the extra arguments make it less readable. Plus, there are two additional pitfalls:
  1. If you change the string in the first line, you have to remember to change its length.
  2. The second line is effectively just plain old strcpy with all its problems. Only that it runs at half the speed of strcpy because of the extra length calculation.
Both examples are badly broken w.r.t. the problem they are supposed to solve: possible buffer overflows. The problem is that they take the wrong length into account. The strncmp should take into account the actual length of the data received over the network into buf, and you need strncmp at all only if there is a possibility that the string in buf is not null-terminated; otherwise, plain strcmp works just fine. The error in the strncpy is more serious: it should use the destination buffer's length, not the source's.

Another practice is concerned with memory allocation and deallocation. I've argued about this with another friend. He maintains that manually deallocating acquired memory before program termination is a good coding practice in all circumstances. I maintain that it is a wasted effort for heap memory acquired by brk or mmap because it is unconditionally deallocated by the OS on program exit.

Admittedly, manual memory deallocation on exit might be necessary on some embedded systems (it was certainly necessary under DOS!), but the discussion was in the context of UNIX and Windows.

Why is it wasted effort? Because the operating system is much better than the programmer in tracking memory. Memory management is one of the main tasks of the OS. Also, in a larger program, there is no good place to deallocate all memory. There are other people sharing the same opinion. Look for example at this short article.

My friend's counter-argument to this is SYSV SHM, which persists after program termination. His argument is flawed in a fundamental way: I am arguing that manual memory deallocation is wasted effort only in those cases where it is done by the OS itself. SYSV SHM doesn't fall into that category.

With SYSV SHM you might actually want it to persist in between program runs. One useful application that comes to mind is acting as a cache of most recently used data.


Xen: The platform for OS research

A quote from the Xen homepage:
Xen is a virtual machine monitor for x86 that supports execution of multiple guest operating systems with unprecedented levels of performance and resource isolation.
I assume that readers know what virtualization is, so I'm not going to explain it further.
There are many interesting research operating systems, for example SPIN or Exokernel. A major weakness of these and other non-mainstream operating systems is hardware support: they often support little more than the hardware available at the time of development. Since writing device drivers is not an interesting part of OS research, they quickly fall behind modern hardware.

Xen has the potential to change this situation. It can greatly lower the initial effort of writing supporting OS code (boot loaders, device drivers, etc.) and prolong the lifetime of a new OS.

It exposes a simple system interface to the guest OS. This interface hides much ugliness of the x86 architecture and to successfully run under Xen, your OS must use it instead of directly invoking privileged instructions.

But more importantly, Xen exposes a very simple interface to the virtual disk and network "devices". Because the interface is simple, a driver for the guest OS is easy to code. This driver forwards the guest OS operations to the host OS (the currently supported host OSes are Linux and NetBSD). This means that if the host OS has a driver for the physical disk or network card, your OS within Xen can also run on that hardware.

Linux and NetBSD are actively maintained, so the new OS has more time to gain popularity. Its authors are relieved of keeping pace with hardware development and thus have more time to concentrate on the development of the core OS. And users don't have to fear that some time in the future they won't be able to run the OS on new hardware.

Some readers may object that an OS has much more hardware to deal with than disk and network (e.g. sound and video). But you don't need those to develop the core of an OS.

Added bonuses of using Xen for OS development:
  • The virtualized OS runs at almost the same speed as on bare hardware. According to this benchmark, it runs within 10% of its bare-hardware speed. Bochs or Qemu are much slower. The comparison is not quite fair because they emulate all of the typical PC hardware, but as I've argued above, you rarely need this in OS development.
  • No slow reboots when the OS crashes in its development phase.
  • No need for writing much boiler-plate code such as boot loaders or switching to protected mode. Your "virtual CPU" starts in flat, paged protected mode.
  • Another project has developed a remote GDB stub which makes it possible to debug Xen and guest operating systems.
This article would not be complete without mentioning OSKit (quote)
...the OSKit's goal is to lower the barrier to entry to OS R&D and to lower its costs. The OSKit makes it vastly easier to create a new OS, port an existing OS to the x86...
Its development seems to have ceased in 2002, and it has the same problems as the other research OSes I've mentioned. It provides glue code that makes it possible to use drivers from the Linux and FreeBSD kernels. However, the Linux and FreeBSD driver architectures keep changing, and it is questionable how many drivers for modern hardware can still be used.

Tags: OS Virtualization


Performance scaling of sorting algorithms

It is a well-known fact that the theoretical lower bound on the complexity of comparison-based sorting is Ω(n log n). I was very surprised to find out that, for a sufficiently large number of items, the time it takes to sort k·n items (with k fixed) is only about k times the time to sort n items. For a deeper analysis you can have a look at this article.

So, it actually turns out that the approach of buying faster hardware can sometimes give expected performance gains.



Python lacks sparse arrays

These days I'm writing many Python scripts to postprocess some calculation results and format them into a form suitable for importing into Mathematica.

In Perl I could write $array[$x]=0 and not worry if index $x actually exists in the array. If not, then the "hole" between the last valid index and $x is automatically filled with undef.

Not so in Python. There you get an exception if you try to access a non-existent index in a list; it will not grow the list for you automatically. Of course, you can write code that tests whether the index you're trying to access is beyond the current list size and manually extend the list, but this kind of coding defeats the purpose of having a high-level language. It is the same way I would code it in C using realloc().

Another possibility is to use a dictionary with integer indices as keys. However, access time to individual elements might not be constant, and keys are traversed in an undefined order when iterating over the dictionary. Unlike in Perl, you do not get a None value for non-existing keys, because the set of all possible keys is open and not restricted to the numeric indices up to the last one with a value other than None, as it would be with an automatically-growing list.

I have found several workarounds, but the mere fact of having to work around the problem annoys me. Not enough to go back to Perl, but it is annoying having to repeat the same hack over and over again.
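For the record, the hack I keep repeating boils down to a tiny helper that pads the list on demand (a sketch of my workaround, not anything from the standard library):

```python
def sparse_set(lst, index, value, fill=None):
    """Assign lst[index] = value, padding with `fill`
    the way Perl autovivifies the hole with undef."""
    if index >= len(lst):
        lst.extend([fill] * (index + 1 - len(lst)))
    lst[index] = value

a = [1, 2]
sparse_set(a, 5, 0)
print(a)   # [1, 2, None, None, None, 0]
```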


I/O revisited

In a private e-mail a friend suggested using mmap() to gain performance when doing I/O. I just shuddered at the idea because my data file is 6GB large and I'm working on a 32-bit machine, which gives me about 2.5GB of contiguous virtual address space. So I would have to deal with the following problems:
  • Map another portion of the file when I'm close to the end of the current chunk.
  • Mapping is only page-granular so I would have to be careful not to read memory that is actually beyond the end of the file.
A lot of tricky bookkeeping details that are easy to get wrong and hard to debug.

The problem is even worse with text files, as the data is not fixed-width, and sscanf() doesn't return the place in the string where it stopped scanning (unless you resort to the %n conversion). Without that, I would have to write additional code that rescans the whole input item again just to find the beginning of the next item, which defeats the very purpose of using mmap(): avoiding the copying of data from kernel to user buffers. All of that for a questionable gain in speed.

Even if I got a 2x speed gain in I/O, it wouldn't justify the programming effort needed to debug it. I've come to value program correctness more than performance. Of what use is a toolkit if even its author questions its correctness? And using mmap would make it less portable.

Some tests indicate that performance can indeed be gained by using mmap(); however, that example uses a very small file which can be completely mapped into memory, so it doesn't reflect the programming complexity when this is not the case. Of course, there are no address space limitations on 64-bit platforms.

Also, I want to comment on the results: they are mostly incorrect. What the article is measuring is the ratio (r_read + s) / (r_mmap + s), where r_ is the reading time and s is the scaling time, which is the same regardless of the read time. This is definitely not the ratio they had intended to measure: r_read / r_mmap.

The most relevant figure is therefore the one for the largest scaling factor = smallest image = least processing time, where the s variable in the equation is closest to 0.

Thinking about sscanf, I've just looked up the C++ stringstream documentation; it would solve the problem sscanf() has. Out of the box, though, it is unusable for the amount of data I'm dealing with, because the std::basic_stringbuf class makes a copy of the string passed to it.


A hidden gem: Ocaml

Recently I have discovered a hidden gem among programming languages: Objective Caml, or Ocaml. The next paragraph should have been an overview of its qualities, but I've erased all of it; the web site and the documentation speak for themselves. Instead, I will describe my personal experience in the few days I have been playing with it.

In one of the earlier posts I was looking for a language that makes it easy to write higher-order programs, is reasonably easy to interface with C data types and has an efficient native-code compiler. Ocaml fits this description. If I had learned it sooner, I wouldn't have even considered writing my data analysis programs in C++. So, although I do not have an immediate use for Ocaml, I'm nevertheless learning it now so that I can apply it to future problems.

Another interesting fact is that Microsoft is developing its .NET variant of Ocaml: F#. Integration with .NET will make it possible for programmers to use the large number of existing .NET libraries, something that Ocaml is still missing. OK, there is a code repository, but the number of available libraries is very small compared to "mainstream" languages like C, C++, Python or Java.

I'm learning Ocaml by solving problems from Croatian competitions in programming. These are non-trivial algorithmic problems (harder than problems usually found in introductory texts to any programming language), and they only use file I/O in their specifications, so I don't have to learn a bunch of external libraries, not directly related to the language itself, to become productive.

What follows are some observations about my programming style in Ocaml:
  • I tend to solve problems bottom-up.
  • I test more thoroughly.
  • I think at higher levels of abstraction and write more general and reusable code.
  • It takes me less time to arrive at correct code.
I do bottom-up design in C and C++ too, but there it's neither as safe (e.g. in C I use many void pointers) nor as convenient (templates and structs in C++). In Ocaml I can quickly test my code in the interactive interpreter.

Some of Ocaml's features can be simulated in C++ by function objects and templates (traits and policies come to mind), but with horrible and too verbose syntax and even more horrible error messages if you do mess something up. Not to mention long type declarations of nested templates if you want to store a function in a variable (actually, you can't do that with a real function but only with a function object).

The overloaded "auto" keyword (something akin to Ocaml's type inference) is only a proposal for the next C++ standard, and the "typeof" keyword is still an extension, although provided by at least gcc. Maybe even lambda makes it into the language.

The only minor annoyance I have come across so far is some inconsistency in the order of function arguments in the standard library. For example, most of the time the data structure operated upon is the last argument to the function, except in a few cases where it is the first.

I have more than 10 years of experience in C programming and haven't given up on it because of its inconsistencies. I know almost all of the standard C library by heart, yet I'm unable to memorize the prototypes of three often-used functions: fgets, fread and fwrite. They are inconsistent with all other file functions in that they take their FILE* argument last. No matter how often I have used those functions, I still have to look them up in the man page.

Consistency in the standard library is important: it makes life easier for the programmer. But it is not a fatal flaw that would make me disregard the whole programming language.

C++ is going to become more suitable for writing functional-style programs. But Ocaml is here right now. And unlike some other attempts to produce languages "at least as fast as and 'better' than C++", which are usually limited to x86 and/or Linux, Ocaml is truly portable: it supports nine processor architectures and many operating systems, including MS Windows and commercial Unixes.


The case for binary I/O

In C and C++ related usenet groups I have lately seen much advice from programmers to other programmers that they should avoid binary I/O and use text I/O, with arguments such as:
  • it is portable,
  • can be easily viewed and edited with an ordinary text editor,
  • is easier to debug.
But text I/O involves a lot of unnecessary formatting and parsing compared to binary I/O. Lately I have chosen to use C++ for a programming project, although it is inferior to LISP for the task at hand, just because I couldn't find any open-source LISP implementation that can easily deal with C binary data types (or other 'raw' machine data types for that matter).

Today I told myself: maybe you are wrong. Maybe you are again wrongly assuming that it would be slow. Maybe you could have used text I/O and LISP. So I've decided to set up an experiment, which showed that I didn't assume wrong this time. Read on.

I have written four (two for text and two for binary I/O) trivial C programs that
  1. Write the file of 25 million single-precision (4-byte IEEE) floating-point random numbers between 0 and 1 (both programs were started with the default random seed, so they generate identical sequences). In the text file, the delimiter was a single newline character.
  2. Read back those numbers, calculating an average along the way.
The platform on which the experiment was conducted is a dual 2GHz Xeon with 3G of RAM, running NetBSD 2.0.2. Results may vary on other stdio implementations, but I think that they give a good overall picture.

The results are most interesting; all timings are averages over three measurements. Timing deviation was noticeable only in the 2nd decimal place:
  1. Writing to text file=43.7 seconds; binary file=5.2 seconds; binary I/O comes out about 8.4 times faster.
  2. Reading from text file=13.4 seconds; binary file=3.0 seconds; binary I/O comes out about 4.4 times faster.
  3. File size ratio: binary comes out 2.25 times smaller (100M vs. 225M)
I was surprised by the large difference in write ratios between binary and text I/O, so I have written a third pair of programs that just write constants to the output file, so no time is spent in the rand function. This time binary I/O came out 5.6 times faster, which is a much fairer comparison because it doesn't include the rand overhead.

Overall, I think that "use text I/O" cannot be given as a general recommendation. For sure, I don't want to use it because I have files with gigabytes of binary data. Text I/O would be in this case even slower due to more complicated (more than one field) input record parsing.

As I see it, the only thing in favor of text I/O is its portability. The other two arguments at the beginning of this text are just a consequence of the lack of adequate tools for binary file viewing/editing. There is a bunch of hex editors, but none of them will interpret the data for you (i.e. you can't tell it "display these 4 bytes as an IEEE floating-point number") or let you define your own structures composed of different data types.

There is GNU Data Workshop. I can't comment on it: it is written in Java, so I do not want to (and, on NetBSD, can't) use it. Not to mention that it is a GUI application and I'm doing all my programming on the server via ssh.

After some searching, I've come across a hex editor that offers some of the capabilities I need and is console-based: bed. However, although it is menu-driven, it has an unintuitive user interface.

If you know of a good, non-Java, console-based, flexible binary editor, please drop me a note. Thanks :)


Published in Phrack!

Finally, Phrack#63 is out! And my article on runtime binary encryption is inside!

I have to admit that I didn't believe that what I had described in the article could really be done. For quite some time I was intrigued by the idea, but every time I would think too much about the code design. You know, sit down to think about register allocation, how to organize the program, etc. Every such attempt was entirely fruitless.

After a few such futile attempts, one day I sat down and told myself "now you are going to write code". I cannot describe my thought process at that point. I guess I was just trying to reproduce the steps a CPU performs when executing code: fetch, decode, execute, go to fetch. I had an image of what I wanted to program and I was just typing code... correct code. It took me only a week to code and debug the main decryption routine. If you look at the code, there are many comments... more than actual code. They were very helpful: whenever I discovered a discrepancy between a comment and the code, I examined it closely and found a bug, sometimes in the code and sometimes in the comment.

The initial version had only 5 bugs in total for which I had to run the program in the debugger. One was in the XDE disassembler (not my code), one was a typo (loading from address 0 instead of the constant 0), and two were related to context switching. One of those led me to realize that indirect jump targets have to be calculated with traced registers... I don't remember the other. The last bug was forgetting to adjust the C stack before leaving the decryption function. I was amazed that I could write such complicated code with so few errors.

Unfortunately, it came to my mind a bit too late that I could ask the Phrack staff for an author copy of the hardcover edition. All 1000 of them were handed out at What The Hack, and a few will be available at Ruxcon in Australia, which, unfortunately, I can't attend. Oh well.

The text published in Phrack is updated quite a bit from the (now outdated) version available on my site. Soon I'm going to update the PDF there.

In the meantime, another weird idea related to code encryption has come to my mind. In short, the decryptor can decrypt another decryptor which, in turn, decrypts the real code. Soon I'm going to vacation so I hope to catch some time to experiment with it. The results will be of course published :)


A GSM routing experiment

Once, a few years ago, two of my colleagues and I (I'll call us A, B, C) deliberately set up a circular call diversion on our GSM phones: A -> B -> C -> A. We were interested in what would happen if any of us got called. All of us could still place calls without a problem, so we asked a 4th colleague to call one of us from a fixed line. The result: an automatic message saying that the person is temporarily unavailable.

Since we were allowed to create circular diversions in the first place, I think that there is no central database of all diversions; they are resolved only when needed. So, how smart are these GSM switches at loop detection, and what are their limits?

Consider the same experiment, extended in several ways:
  1. There also exists an 'unavailable' diversion which takes effect only if the person is, well, not available. This makes it possible to create more complicated graph topologies than simple closed rings. The more complicated the topology, the longer it takes a DFS to find a 'leaf' node (one not involved in any cycle).
  2. Invent a suitably complicated topology.
  3. Involve more, many more, people. On the order of a few hundred to a few thousand. Instruct them how to arrange their call diversions. Explain to them that the experiment will cost them nothing.
  4. Get those people spatially close, in groups of about 40. I think that a single base station cannot handle more than about 30 users. So all of them will (maybe) overload a single base station. Depending on the software.
  5. Maybe involve people from different countries and/or providers in the same country. The providers can't share data about the (un)availability of a user.
  6. Perform a call from the fixed line to a well-chosen user in the arranged 'diversion graph' and observe what happens.
I can envision several outcomes:
  • The software simply does not respect the 'not available' diversion if it detects a loop. This leaves us with a simple ring topology and nothing interesting will happen. The caller will simply get the 'user not available' message.
  • The software has a maximum limit on the depth of DFS search for the final divert destination. So at some point, the cycle will be artificially broken by the built-in limitation and the phone of some user that is in the cycle will ring.
  • The software is really smart and traverses the whole graph before concluding that it is cyclic and replying with 'user not available'. A lot of the operator's bandwidth could be consumed in the process, maybe temporarily disrupting the service.
  • The operator service totally breaks down, either due to software failure or due to the overloading of base stations (too busy communicating trying to find the end user and not servicing 'real' calls).
If this is read by some engineer who has designed a GSM switch and/or software... please, do not hesitate to leave comment about its inner workings!

On the other hand, if someone seriously decides to carry out the experiment I have just described... I volunteer to invent an 'interesting' topology :)


Strange NetBSD behaviour, C++ allocators and containers

I guess I'm one of the few people who have stumbled upon a very unpleasant implementation detail (some even call it a bug) of the NetBSD (2.0.2) buffer cache. The symptom is that the ioflush daemon (kernel thread actually) that is responsible for flushing dirty pages back to disk sometimes goes crazy.

I have done some internet searches before posting to the tech-kern NetBSD mailing list, but didn't come up with anything useful. You can read my complaint and the discussion that it has generated here.

The scenario in short: I have a data-processing application that uses BerkeleyDB. I have set up the BDB environment so that it uses an mmap-allocated cache (512MB) backed by a real file on the file system. Pages belonging to this file get dirty quite quickly, and ioflush gets active relatively often. When that happens, my program stops (in the 'vnlock' wait channel) and does no useful work. I can't even access the file system where these files reside because of some locking issue (also mentioned in the thread above).

I recognized that the problem might be in locking, and at first I blamed the in-kernel NFS server. So I recompiled the kernel, but the problem persisted.

As you can read in the referenced thread, I was so desperate about the problem that I was even considering installing another OS. Fortunately, in a moment of inspiration (after a good night's sleep) I remembered that I had seen SYSV SHM mentioned in the BDB documentation I browsed some time ago. Recompiling the kernel to allow 1GB of SYSV shared memory and reconfiguring the BDB 'environment' to use SYSV SHM instead of mmap()ed file-backed storage worked around the kernel problem.

You might ask why this didn't happen before: because I hadn't been using the BDB 'environment' until a few days ago. Without it, BDB allocates its cache memory with malloc(), which eventually uses sbrk(). Unfortunately, in NetBSD you can't have more than 1GB of sbrk() heap (but you can mmap() almost all of the 3GB available to user applications), and BDB competed for the heap memory with my memory-hungry application.

One day maybe I'll implement a C++ allocator on top of AT&T's vmalloc, with a corresponding discipline to allocate memory by anonymous mmap(). In C++, every allocator is bound to an object type of fixed size. This is an ideal use for vmalloc's vmpool method, which allocates objects of the same size within a memory region. This has several advantages:
  • no need to maintain free block lists
  • does not produce internal fragmentation (so it wastes less memory than 'ordinary' malloc implementation)
  • it is faster
Writing a corresponding C++ wrapper around Cdt might also be a good idea. When I was developing the hashed text utilities (you can find them on freshmeat) I found that Cdt is both faster and has less memory overhead than the C++ STL.

Oh well. Now I have everything working reasonably efficiently (both in time and memory) so that I can do my work. These plans I'm leaving for later. Or I'll do it while waiting, if I'm forced to switch to disk-based hashes.

It's a shame not to be able to use the full 3GB of virtual address space in NetBSD with the standard C and C++ libraries.