mutt 1.5.10i

So I've tried to compile it. I'm not satisfied with the 1.4.x version shipped with RHEL4 Linux because I have to mess with some ugly macros to get PGP/inline working (too many people are still using broken mail clients).

Take 1: It complains that the system iconv routines are not good enough, and suggests that I should use libiconv.

Take 2: I just passed the --disable-iconv option to the configure script. I never use I18N features, always use just the ASCII character set, and in fact always pass --disable-nls to the configure script. Now it compiled, but when I tried to open my IMAP folder, I received a very bizarre and uninformative error: "Bad IDN".

I google around and find out that GNU libidn has something to do with UTF-8 and other character conversions (I didn't want to investigate further; the whole issue with I18N and localization disgusts me). So, Take 3: I find GNU libiconv somewhere on the system, give its location to the configure script, compile the program, and - voila! Now it works with IMAP.

The developers could at least have:

  1. Documented that the IMAP support needs iconv,

  2. Provided a better error message than "Bad IDN",

  3. Checked for dependencies in the configure script,

  4. Made the IMAP code independent of libidn.

Preferably, all of them. Instead, it took me about an hour and a half of experimenting with various compilation options to make it work.

NetBSD hard crash

Hm, I seem to have a talent for crashing programs and OSes with seemingly innocent stuff. I reported the course of events on the tech-kern NetBSD mailing list. You can read about the details here.


James Bond sucks

I've been chatting a little with my friends on IRC. Someone mentioned James Bond. I said that it is the only movie I've watched in a cinema and wanted my money back. I mean, I've watched many basically-trash movies, but JB is really the worst. I don't think any movie can get lower than that.

Just wanted to share my thoughts with the world.


Classic buffer overflow bug; another YES for segmentation

Today I helped debug mysterious crashes in code roughly like the following:

void f() {
    char buf[SIZE], *ptr = buf, logmsg[128];

    while (some_condition) {
        /* do something with ptr */
        sprintf(logmsg, "something", ...);
    }
}

This code crashed. It was part of a multithreaded program, and GDB showed a very dubious stack trace (some function calls with many unrecognized addresses in between) and a very strange address in ptr. An illuminating piece of information, provided by the author of the code, was that the loop runs fine the first time through.

Since all uses of ptr in that loop are pretty innocent and couldn't set it to the wild value GDB showed, my suspicion fell on the sprintf. The "something" in the sprintf was a long text which by itself (without the interpolated values) overflowed the logmsg buffer. Since the stack grows downwards on the AMD64 architecture, the overflowing buffer overwrote the ptr pointer.

What made it relatively hard to debug were two things:

  1. GDB showing a dubious stack trace (understandable after the fact, since the whole stack was trashed), and

  2. the error manifesting itself (program crash) at a different point in the program (the next time the ptr pointer is used) rather than at the point where it was caused (the sprintf overwriting the buffer).

Such buffer overflows could be prevented by using the segmentation mechanism present on the IA32 architecture. All memory accesses to segments smaller than 1MB are limit-checked in hardware, with byte granularity. However, nobody uses these features, for several reasons:

  • They are not portable (i.e. tied only to IA32 architecture),

  • No widely-used 32-bit compiler supports FAR pointers. MSVC ignores the far keyword, and gcc never did support it. AFAIK, the only compiler that honours it is the Watcom compiler.

  • Loading a segment register is a relatively costly operation, and, what's worse, Win32 reserves the FS register for itself and the other segment registers must not be changed. So if you have more than one buffer to access, you need to reload a segment register all the time.

The final blow to segmentation came when AMD decided to effectively kill it in 64-bit mode. The descriptors are still present for system-management purposes, but segment bases and limits are forced to a flat 64-bit address space in hardware (only the FS and GS bases survive).

IMO, it's a shame to see a good technology killed by a lack of imagination on the part of programmers, and by some "holy grail" of portability. If it had been used properly, and OSes designed the way they should have been, the IA32 architecture could have been almost 100% immune to the buffer overflows that are to blame for so many insecurities in applications - and without all of the "NX", "virus protection", etc. hype.


Linguistics (long time - no update)

The last post doesn't really count as an update. These days I'm pretty busy studying for the "Mathematical optimization" and Norwegian language courses. In between, I'm finishing the documentation for a software project that I'm planning to release soon.

While learning Norwegian, I've realized that I don't really understand the concept of definiteness ("a" vs. "the" in English). I use it by "feel", based on experience and a large number of memorized use-cases. Most of the time (>50%) I'm right, but still wrong too often for my personal taste. I see how often I'm wrong when I give my texts to someone else for proofreading. My first language is Croatian, and it simply doesn't have the concept of definiteness.

The usual rule taught in schools is that the definite form is used when it is known what you're talking about, or when it has been mentioned before. So, here are two counterexamples where this "rule" fails.

When talking about certain things, you might want to ask "What's THE difference?". But why "THE" instead of "A"? The difference might not exist, might not be known to the other party, and most probably has not been mentioned before in the conversation.

Another case: I'm currently a student, and if I were to introduce myself to someone in the administration, I would say "I'm A student that..." But why A, not THE? I'm something very concrete and known to myself (obviously) and also to the other person, as I'm standing right in front of him/her.

The most plausible short explanation that I can accept was given to me by someone over IRC: definiteness is not about known/mentioned vs. unknown/not mentioned, but about notion vs. concrete instance. In the first example, "the difference" is definite because the conversation is about the difference in one concrete case; in the second, "a student" is indefinite because it is only a notion that describes me in more detail. Still, I could find exceptions to this "rule" too.

I've discussed the issue with many Norwegians and almost drove them crazy. I showed them some examples from the course book and asked why a given example uses the (in)definite form and what it would mean if it were changed the other way. Nobody could give me a satisfactory answer, except that it would "sound strange". They were unsure about how the meaning changes, if at all. The teacher also failed to explain the definiteness issue to me.

The problem when learning a new language is that I don't have the feeling of what "sounds right" that I've acquired (rather than learned) in English. In more complicated cases, where there are no clear rules to apply, I'm just guessing - with the expected result of being wrong about 50% of the time.

A lesser problem is when to use the preteritum (sometimes also called the imperfect; the simple past tense in English) vs. the perfect (the present perfect in English). Again, the concept of definiteness creeps into the "short rules": the preteritum is used when talking about a determined point in past time, and the perfect is used when the exact time is not important. In Croatian we have 4 past tenses. Of them, the perfect is used roughly 95% of the time, the plusquamperfect 4% of the time, and the other two (the imperfect and the aorist) 1% of the time, mostly in "artistic" literature. In the spoken language, you will hear the aorist or imperfect in a few phrases. For some reason, it is easier for me to grasp "time" definiteness; it is always somehow implied in the broader context. I still sometimes make mistakes, but not nearly as often as with articles.

To conclude, the article usage (definiteness) is still a mystery to me, while I've almost grasped the usage of preteritum vs. perfect. Here are some interesting links:

If you follow other links on these pages, you can find some quite interesting reading. Until very recently, I thought that I could never get interested in linguistics. Never say never!


Switched to Opera

About a month ago, I switched from Firefox+Thunderbird to Opera+mutt. I got fed up with the all-too-frequent security updates for Firefox. They get even more troublesome when you have to compile it yourself, as on FreeBSD. As for ditching Thunderbird - I got fed up with it when trying to move my existing emails from one version of TB + enigmail to totally different versions. I spent 3 hours making it work!

Overall, I'm pretty satisfied with the switch. Opera has some stability problems - it crashes now and then. The good thing is that, unlike FF, it remembers all pages that were open before the crash. The restart takes just a few seconds, and Opera resumes seamlessly where it left off, so it's not even annoying.

Opera also has integrated RSS, NNTP and mail readers. As I'm reading less and less news every day due to time constraints, I didn't even look at the NNTP reader. The RSS reader works, and I'm using it.

I'm not using the mail client because it currently does not support PGP. I didn't investigate whether it supports S/MIME. Another reason for not using it is my experience with Thunderbird when moving archived mails around and between upgrades of TB. The "import/export mail" option, assuming it exists, is not in the most obvious places. I've learned my lesson regarding archived mails, and I'm not going to repeat it.

With mutt, moving mails is straightforward - just move the mail folder directory. No stale setup in obscure places that needs to be cleaned up after an upgrade, as is the case with Thunderbird.


Secure programming

Today I had an interesting talk with a student. He was interested in the new features of the AMD64 architecture. When I mentioned the non-executable page support, he started to think about writing secure programs. And then he got me thinking... and what follows is the result of this thinking.

First I'd like to clarify the difference between an incorrect and an invalid program. [What I'm using may not be the standard naming, but I was too lazy to look it up on the web.] The following simple code demonstrates both points:

float average(int *a, int n)
{
    int sum = 0, i;
    for (i = 0; i <= n; i++)
        sum += a[i];
    return sum;
}

The program is incorrect because it does not return the average at all - it just returns the sum of the elements in the array. What makes it invalid is that it accesses the array a beyond its supposed limit: the loop condition should be i < n, not i <= n. There are also invalid programs that behave correctly in one runtime environment but break in another - e.g. ones that access free()d memory.

There is no way that any programming language or runtime environment can identify an incorrect program. Doing so would require both:

  1. An understanding of the programmer's intentions (some kind of AI?), and

  2. Detecting that the code does not do what the programmer intended. This is provably impossible - it is equivalent to the halting problem.

Many of today's exploits are due to invalid programs (e.g. web servers, ftp servers, public forums, etc.), and there exist many mechanisms to defend against them: the processor's hardware protection features, basic UNIX security model and its extensions like ACLs, various security patches like grsecurity, SELinux (integrated into the Linux 2.6 kernel) or TrustedBSD.

Given proper support from a programming language, invalid programs may become nearly (if not completely) impossible to produce. Examples of such languages are Python, Java, Haskell, OCaml, Lisp, the .NET runtime, etc. All of them are slower than C in raw performance by varying amounts (check out this page for exact numbers); some come very close to C. But in many real-world scenarios (moderately loaded web and ftp servers), raw performance is of little importance, while security and data integrity are paramount.

Now, I'm not saying that grsecurity or SELinux are useless and can be replaced altogether by choosing a better programming language. I think they should be the second line of defense against security breaches, not the first as they are today. Reasons why a second line of defense is needed at all:

  1. The compiler and the runtime system can be buggy themselves, and

  2. There are always authors of malicious programs that do their deeds mostly in unsafe languages and assembly.

What follows are, in my view, some of the reasons for still using an unsafe language today:

  • Portability. I claim that C is the most portable programming language in existence today. (If someone gives a "counter-example" along the lines of compiling Win32 source on a Linux system, they fail to see the difference between an API and a programming language.) Because of that,

  • C is the main language for interfacing with foreign libraries and the operating system itself. On every platform it has a well-defined ABI, something that all other languages either lack or express through C interfaces.

  • Sometimes it is easier to code the whole program in C than to write bindings for your particular language of choice. (Although tools like SWIG help considerably with the task.)

  • Raw performance.

  • Not wanting to use Java, while managers consider other languages too exotic.

C is the lingua franca of the computing world. IMHO, it is here to stay until a feasible alternative appears. Personally, I don't see one on the horizon yet.