How do committees invent?

This article is still a fascinating read, although it dates back to 1968. It analyzes some fundamental aspects of large-system development and offers some nice insights. Although I haven't read the book "The Mythical Man-Month", the article reminded me of its title.



In defense of Tcl

I found a link to this article on reddit and, following the discussion, a link to this article. If you're short on time, I recommend reading the second one.


Category theory

I have just finished reading the book "Conceptual Mathematics: A First Introduction to Categories". It is one of the best mathematics books I've read, not only because the theory it presents is very interesting (indeed, I've learned the deeper meaning of some things that I had taken for granted until now), but also because of its exceptional presentation style: concepts are explained through many illustrated examples in an accessible way, without delving into deep abstractions.

If you want to read an accessible introduction to category theory, I can heartily recommend this book.



How the questions shape the answers

Many things today are advertised based on evidence obtained by polls.
This article [PDF], although a bit old (dating to 1999), is a fascinating read. Basically, polls can end up with dramatically different results depending on the way the questions are asked. I wonder whether marketeers use the tricks described in the article when constructing polls that should come out in their favor.



A critical view on upstart

There has been much fuss lately about upstart, which is supposed to be a single replacement for several daemons: SysV-init, cron, inetd, hotplug... This article is a commercial trying to sell upstart, but somehow it hasn't convinced me.

The first reason I'm not comfortable with the idea is that UNIX is built on the philosophy "one tool for one job". Every tool should do one job and do it well. Merging several different tasks into one program just feels "yucky". It feels "windows-way".

The second reason is security and stability. Take cron, for example. Even though it has a seemingly simple task, a very popular implementation, vixie-cron, had some security bugs in the past. Now it's going to be reimplemented again. Not to mention that upstart then becomes a single point of failure. Imagine, e.g., a remotely induced reboot or kernel panic caused by triggering some bug in upstart's networking code and making it crash. (And since it's running instead of init, it'll bring the whole system down.)

The rest of this post is a dissection of the article cited above.

The first part of the article is what I call "Problem setting." In trying to explain why SysV init doesn't work today, the author says "The simple answer is that our computer has become far more flexible." and enumerates certain situations which do not really pose a problem. Most of them are related to hotplugged hardware, which is already handled (I see it working nicely on RH and SuSE). He concludes with "We've been able to hack the existing system to make much of this possible, however the result is chock-full of race conditions and bugs." While I admit that there may be some problems, saying "chock-full" is a blatant exaggeration.

Question 1: Why replace everything instead of sticking with the UNIX philosophy and making the current system better?

The second part is "Design". On the surface it seems sane and well-designed, but take a look at the example list of events; the most striking one for me is "the root filesystem is now writable". He doesn't say who is supposed to generate these events! This is a shift of responsibility from getting the startup script ordering right to generating the right events at the right time. Currently we have a small, well-controlled set of dedicated processes, and the upstart system seems to lead towards an explosion of possibilities along at least two dimensions: kinds of events and when they are generated.

Question 2: Who is generating events? Who is writing event handlers? If the event handling system is extendible, how is the system integrity guaranteed (so that a faulty handler doesn't bring the whole upstart process down)? What happens when an event isn't handled because a handler is missing? Is it an error (and if so, how is it reported and to whom), or is it simply ignored?

The third part is "showing off" or FUD-ing: showing existing tools in a bad light in order to sell "upstart" better. This is the funniest part! Namely, the author doesn't seem to find good arguments against initng, a dependency-based system, so he resorts to ridiculous argumentation: "However this means that you need to have goals in mind when you boot the system, you need to have decided that you want gdm to be started in order for it, and its dependencies, to be started.", continuing with "[..initng] It can reorder a fixed set of jobs, but cannot dynamically determine the set of jobs needed for that particular boot." and finishing with "initng starts with a list of goals and works out how to get there, upstart starts with nothing and finds out where it gets to."

Question 3: How is the computer supposed to figure out, even before it is turned on, what the user has in mind and what should be the target configuration? How could it know that the user on a particular boot wants e.g. xdm to boot, without any user input (e.g. without being given a goal)?

upstart seems like a solution to an invented (or, to say the least, exaggerated) problem. I hope the author does a better job of coding than of arguing for its usefulness.

[From personal experience: a dependency-based system is used on FreeBSD, NetBSD, and Gentoo Linux. It's very easy to maintain, and I like it better than the SysV-init style boot process.]
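To make the "list of goals" idea concrete, here is a minimal sketch in Python (3.9 or later, for graphlib). It is not how initng or any real init system is implemented. The service names and their dependencies are made up, and start_order is a hypothetical helper: given a goal such as xdm, it collects only the services that goal needs and orders them so that every service starts after its dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical services and their dependencies, for illustration only.
DEPENDS_ON = {
    "localmount": [],
    "syslog": ["localmount"],
    "network": ["localmount"],
    "sshd": ["network", "syslog"],
    "xdm": ["localmount", "syslog"],
}


def start_order(goals, depends_on):
    """Return the services needed for `goals`, each listed after its dependencies."""
    needed = {}
    stack = list(goals)
    while stack:
        service = stack.pop()
        if service not in needed:
            # Remember this service and queue its dependencies for inspection.
            needed[service] = depends_on[service]
            stack.extend(depends_on[service])
    # TopologicalSorter takes a mapping of node -> predecessors and yields
    # predecessors first, which is exactly "dependencies before dependents".
    return list(TopologicalSorter(needed).static_order())


order = start_order(["xdm"], DEPENDS_ON)
print(order)  # ['localmount', 'syslog', 'xdm']
```

Asking for xdm pulls in only localmount and syslog. network and sshd are not started on that boot, because no goal asked for them.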

[Another note: One should distinguish between the init program and the SysV-init style boot scripts. It is possible to use the (SysV-)init program with a dependency-based system. And that's exactly what Gentoo does.]



Deliberate bad engineering

Today I discussed a simple problem with a colleague. He wants to design a simple format for representing graphs (nodes and arcs). He said that it's probably going to be XML-based, to which I replied that it is a very bad engineering decision (see below for a short explanation of why, and for other choices). He agreed, but he's going with XML anyway. He said that today's IT industry is full of people "falling" for 3-letter acronyms and that he just wants them off his back. So, from the engineering viewpoint, the better solution has lost because of XML's "psychological effect". I imagine something like "It uses XML, therefore it must be good." Bullshit.

The problem with XML is that it's not very human-friendly and it's complicated to parse. Yes, there are ready-made parsers, but you still have to walk the parsed tree. I suggested embedding an interpreted language such as Lua or Tcl. The syntax is definitely more readable than XML, the parser is already there, the user gets additional power (e.g. programmatically constructing the graph instead of tediously enumerating nodes and arcs), and there is no "walking the tree". "Tags" in the scripting language can be bound to C functions and made directly executable, thus constructing the internal graph representation as the graph description is read, without subsequently walking a tree.
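A minimal sketch of the idea, with one substitution: instead of embedding Lua or Tcl into a C program, it uses Python as both the host and the description language, so the "tags" are bound to Python functions rather than C functions. The names node, arc and load_graph are hypothetical, chosen only for this illustration. The graph is built as a side effect of running the description, and there is no tree to walk afterwards.

```python
# A graph description written in the embedded language (here: plain Python).
# A loop builds a 4-node ring instead of enumerating every node and arc.
DESCRIPTION = """
for i in range(4):
    node(f"n{i}")
for i in range(3):
    arc(f"n{i}", f"n{i + 1}")
arc("n3", "n0")
"""


def load_graph(source):
    """Run a graph description and return (nodes, arcs) built while it executes."""
    nodes, arcs = set(), []

    def node(name):
        nodes.add(name)

    def arc(src, dst):
        # Reject arcs between nodes that have not been declared yet.
        if src not in nodes or dst not in nodes:
            raise ValueError(f"arc refers to unknown node: {src} -> {dst}")
        arcs.append((src, dst))

    # Besides Python's builtins, "node" and "arc" are the only names the
    # description can see. Each call updates the graph directly.
    exec(source, {"node": node, "arc": arc})
    return nodes, arcs


nodes, arcs = load_graph(DESCRIPTION)
print(sorted(nodes))
print(arcs)
```

Note that exec runs arbitrary code, so this is only suitable for descriptions you trust. Restricting what an embedded interpreter exposes to the description is a separate matter that this sketch does not address.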

The better solution is obvious, and he agrees that it's better, but he's still not going to use it because of the "buzzword effect". How many projects have ended up taking the buzzword route, sometimes to their own detriment? (I have participated in one such project myself. I suggested otherwise, but the management decided to go the "3-letter way", and the project was more than half a year late.)



An idea for gmail

How about revoking emails? Namely, the user sending something to @gmail.com could also send a special "mail revoke" message to revoke his sent mail. If the sender is also a gmail user, he could be offered a simple GUI option. To eliminate potential privacy concerns, the feedback would be either "revoke received" or "trying to revoke invalid message". Any other kind of message would let the sender know whether his mail was already read or not.

What would be the safe contents of such a message? Something like SHA1(sender_address || SHA1(mail_body)) would be sufficient for a first iteration, although crude.
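As a sketch, the formula above could be computed like this in Python. Several details are my assumptions, since the formula doesn't fix them: "||" is taken as plain byte concatenation, the inner SHA1 is used as raw digest bytes, and the address is encoded as UTF-8. The function name revoke_token and the example address are made up.

```python
import hashlib


def revoke_token(sender_address: str, mail_body: bytes) -> str:
    """Compute SHA1(sender_address || SHA1(mail_body)) as a hex string."""
    # Inner hash: raw 20-byte SHA1 digest of the mail body.
    body_digest = hashlib.sha1(mail_body).digest()
    # Outer hash: sender address bytes concatenated with the inner digest.
    return hashlib.sha1(sender_address.encode("utf-8") + body_digest).hexdigest()


token = revoke_token("alice@example.com", b"Hello Bob,\nplease ignore my last mail.\n")
print(token)
```

The same sender and body always give the same token, while a different sender address gives a different one for the same body.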

More importantly, mechanisms that let a sender securely "revoke" only his own messages are available, so why isn't such an option already implemented?