2005-12-30

Kakuro solver.. and solvability by logic

I have added kakuro solver to my collection of sudoku solvers. An interesting thing about this solver is that it generates a small table of possible sums before the search for the solution begins. During the search it uses only bit operations, and it is extremely fast: 0.025s is the longest time taken for the puzzles from the sample puzzles page.

All of sudoku-like puzzles are advertised to be solvable by logic only and that no guessing is neccessary. What does this mean, exactly? I don't know what do the problem-setters mean by this, but to me this means that the game must be solvable with zero lookahead. At any stage while solving the puzzle, from the very beginning, there must exist at least one cell whose contents is uniquely determined only by the current state of the grid. At no point should you have to think about consequences of the move you are about to make.

For example, if your reasoning is "putting 3 at this position will make it impossible to put 7 at some other position afterwards", this is not zero lookahead and it is not logic; it is guessing. Logic is the following reasoning "i must put 3 in this cell because..", where the argument is based only on the current state of the grid, without looking in the future.

Of course, guessing is sometimes easier to the unskilled, but the fundamental question is: can the puzzle be solved with zero lookahead, i.e. by pure deductive logic based on the rules of the game and the current contents of the grid? If the answer to this question is negative, then sudoku (and its variants) are falsely advertised in my view.

Tags:

2005-12-28

Luajit

In one of my previous posts I've written about the Lua programming language and its possible usage as a standardized data exchange format instead of XML.

Lua has its own JIT, dubbed luajit. I like its architecture, especially its DynASM part which can be used for other dynamic code-generation purposes.

Tags:

2005-12-27

"Killer sudoku" solver

It's a long time since I know about the Sudoku puzzle. I've never been drawn to it, although I did think occasionaly that it would be an interesting programming exercise to implement a solver for that. Few days ago I learned about the variant called "killer sudoku" which is supposed to be harder than plain sudoku. So I set out to write a solver for that variant.

My first idea was to write the program in GNU prolog, using its finite domain solver. But then I said, heck! Let's see whether the brute-force method is feasible. And indeed, it turns out that it is. You can download the solver coded in C++ here. The user interface is lacking, but the program does what it is supposed to. On 2GHz Pentium M, the easiest (as classified by the problem setter) is solved in 5ms, and the hardest in 400ms.

One thing is bugging me though. Sudoku puzzles are supposed to be solved by logic, without guessing. How do you explain that the brute-force solver, which is employing always the same "dumb" strategy of guessing and backtracking, takes the time proportional to the puzzle level? "Harder" puzzle should imply harder logical reasoning. I fail to see the connection between "harder" logical reasoning and the amount of work the brute-force solver has to perform. Somehow I think that these two are independent.

Tags:

2005-12-23

The uglyness of C++ locale

I'm very happy how C++ handles most of the things, both in the language and in the standard library. One monstrosity is the handling of locale. Look at the following piece of code, that I came across on the boost users mailing list:

time_facet* of = new time_facet();
of->set_iso_extended_format();
std::cout.imbue(std::locale(std::locale::classic(), of));

In my opinion, this is horrible. I believe that it's very flexible, as everything else in the standard library, but still... I won't even try to imagine how would you set locale to some particular language's collation sequence and character encoding.

I think that there must be a better way of handling locales in programs. Today, it's simply a mess throughout the computer world (e.g. I shiver when I remember the time I had to convince Oracle to chew up data in some particular encoding). As for the above code, I could write date/time formatting code much faster than to figure out the above piece of code.

Localization, as it is today, simply sucks big time. It should be transparent both to programmers and users. It should work out of the box. The way it is today, it is just creating headaches for everyone. Incresed effort on programmers (look at the above code!), a plethora of confusing settings to users (character sets, keyboard layouts, date/time formats, decimal points, etc). Yes, it's flexible, and all those settings should stay under some advanced options, but the system should work seamlessly with mixed locales and character sets. The user should not be forced to use a single locale system-wide (as it is in Windows) or within a signle application (as it is in Unix).

I see many open problems in mixed locales, but I don't think that they are unsolvable. They are just very hard. Some "locale grand unification framework" is missing. And until it appears, I'm afraid that the state of affairs in I18N will remain a mess it is now. Maybe (or should I say, probably!), with a bit of bad luck, it'll just get worse.

2005-12-13

Linus Torvalds on GNOME

Here he is encouraging people to switch to KDE and here he's calling some people "fucking idiots". What stroke me in the latter post and what I would like to single out is the often-neglected logical fallacy with "majority" arguments:

==QUOTE
In other words: your "majority" argument is total and utter BULLSHIT. It can be true for any particular feature, but it's simply not true in general.

To put it in mathematical terms: "The Intersection of all Majorities is the empty set", or its corollary: "The Union of even the smallest minorities is the universal set".

It's a total logical fallacy to think that the intersection of two majorities would still be a majority. It is pretty damn rare, in fact, because these things are absolutely not correlated.
==
Read the provided links for the complete context.

Personally, I don't use neither GNOME or KDE. Everything that I'm doing, I'm doing in fvwm2. Configuration is trivial to copy across different machines, and it starts up blazingly fast, compared to other "desktop systems". What's more, it isn't a constant resource hog with a bunch of background-running processes as GNOE and KDE are.

Tags:

2005-12-10

The nonsense of XML

Rick Jelliffe is proposing in his blog post "Unicode-aware CPUs" instead of Binary XML. Personally, I think that this is nonsense. With the recent talks about binary xml I'm even more convinced that XML as a data format as such including all of its companions, such as the XML Schema and XPath, should be abandoned.

With XML the data is transformed from machine ("binary") representation into XML greatly inflating its size in the process, transmitted to the other end, possibly some XSLT rules are applied, the resulting data checked against some schema and then reparsed back into the binary form. XSLT and XML Schema are programming languages in themselves, just with a hideous syntax. XML was supposed to be editable by hand without any special tools, but this is, IMHO, nearly impossible with XSLT and XML Schema. So their existence defeat the purpose of XML.

Other proponents of XML claim that the XML is self-describing. I claim that this is also nonsense. Actually, there are two levels of self-description. The 1st is the description of data types (e.g. the height tag must be a real number greater or equal to 0). This purpose is fulfilled by the accompanying XML Schema. The 2nd level is the interpretation of the data. This is done exclusively by the programs manipulating that data. Nobody can say (without actually inspecting the underlying program code) that the height tag doesn't actually specify the age of some item. The 2nd level of self-description cannot be achieved by any means.

Now, the binary XML is, IMHO, superfluous in itself. There already exists a perfectly good data description format and its binary encoding rules: the ASN.1 and BER, DER and PER (Basic, Distinguished and Packed Encoding Rules), with a good solution for namespace problems (the OID hierarchy). Why invent another one?

I propose to use a standardized virtual machine code as a means of data exchange. One good VM and accompanying language for such purpose is Lua because of its simplicity, portability and power as a programming language.

The data rarely exists in some vacuum. It is often manipulated in many ways. So why not transmit a VM program that creates the data? Instead of transmitting e.g. the XML chunk <height>20</height>, why not transmit a piece of VM code that, when executed, stores value 20 in the field height of some table? In the same way that programs currently expect specific XML tags in an XML document, they would expect certain names of tables and fields in the VM. Transformation and validation of the data would be other VM programs.

Lua has a distinct advantage here: simple declarative syntax for textual representation of data and programs, and a portable binary VM code for representing the same program. For the purposes of debugging, one would send textual representation of the data, and in the production one would send binary representation. Lua offers unified interface for loading both the VM and textual chunks of code. As far as I remember, the binary format is also portable to various architectures.

With restricted VMs it is also easy to address security concerns. The only remaining concern is the denial of service: one can e.g. construct an infinite loop that represents a piece of "data". Detecting infinite loops before executing the program is provably undecidable problem - the Halting problem. This problem can be easily solved by setting a maximum time limit on the execution time of the program (and discarding the data as invalid if it is exceeded), or by forbidding control-transfer instructions in the data description chunks of the code.

To conclude - XML is far too heavyweight, ugly and overrated. Wake up people, look around and search for alternatives.

Tags: XML

2005-12-04

TCP/IP checksum revisited

RFC 1071 gives efficient ways on its computation along with the design rationale.

2005-12-03

IP checksumming code

Quote from the RFC791, the Internet Protocol definition:

"The checksum field is the 16 bit one's complement of the one's complement sum of all 16 bit words in the header."

And this is the only mentioning of the checksum algorithm in the whole RFC! How does someone come up with a code like this based on such description, i have no idea. I guess that everyone just copy-pastes code from an existing IP implementation without actually thinking what it was actually supposed to do.

I'm surprised that the today's Internet has its foundations in a definition as the above.

Google code

Yesterday I've realized that Google put some of its software into the open-source world. They have "Sparse Hash" implementation here which is particularly interesting for me and the toolkit I'm soon about to release. There is a number of other projects found here.