2006-11-27

C, C++ and width of integer types

A perpetual problem with C and C++ programming is the very loose specification of integer types (e.g. int must be at least 16 bits, with no upper limit; long has to be at least 32 bits and must not be smaller than int). C99 solved this problem by introducing the <stdint.h> header.
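
For reference, a minimal sketch of what <stdint.h> gives you (the variable names are mine, just for illustration):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int32_t a = 100000;          /* exactly 32 bits on every conforming platform */
        uint8_t b = 200;             /* exactly 8 bits, unsigned */
        int64_t c = (int64_t)a * a;  /* guaranteed wide enough for the product */

        printf("%u %lld\n", (unsigned)b, (long long)c);
        return 0;
    }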

The problem could have been solved by redefining the register keyword to mean:
  • register short: half of target architecture's register width
  • register int: register width
  • register long: double the register width

I wonder how many programs would break, especially if the register keyword were a no-op whenever its use would violate the minimum requirements for integer types (short, int: at least 16 bits; long: at least 32 bits).

I don't know how useful this redefinition would be. After all, C99 does include "fast" integer types. Hm. Any opinions?
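
To illustrate what those "fast" types already buy you (a sketch; the loop is just filler):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* "At least 16/32 bits, in whatever width is fastest on this target" --
           roughly what the register short/int/long idea above asks for. */
        int_fast16_t i;
        int_fast32_t sum = 0;

        for (i = 0; i < 1000; ++i)
            sum += i;

        printf("%ld\n", (long)sum);
        return 0;
    }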

2006-11-25

Spammers have gotten smarter

Today I've actually read some text that got through the spam filter. Total nonsense that kinda makes sense. Here's an excerpt:

Popularity of blogs helped also to popularize concept web content mechanisms is such as or Atom have. Xml am perform operations instead using or Feeditem Item want or cannot changed exception readunread property in attached. System time or downloaded via Http Https parsed is normalized unified Identifies updated is Merges reflect last. Those tricky a details platform shields even of supports upcoming Support or Whether is implement innovative scenarios basically deal with Common Feed List. [etc]

This reminded me of computer-generated text, so I went looking for Markov chain text generators. Here's one, for example. Study its output (links are near the bottom of the page) and it's the same kind of "nonsense making sense".

Bayesian filtering is a kind of "inverse" of Markov chain text generation - both methods are based on statistics. The problem with Markov-generated text is that its statistical properties closely match those of real text, so the Bayesian filter doesn't classify it as spam.

Generating garbage with the required statistical properties is relatively easy; it just requires a list of words and a good Markov model. Once generated, it takes real human understanding to classify it.
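
To make the idea concrete, here is a minimal sketch of an order-1, word-level Markov chain generator. The tiny inline corpus is made up for illustration; a real spammer would train it on large amounts of genuine text:

    // Minimal order-1 word-level Markov chain text generator.
    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <vector>

    int main()
    {
        // Hypothetical training corpus; a real generator would read a large text file.
        const std::string corpus =
            "the filter reads the text and the filter scores the text "
            "and the text looks like real text to the filter";

        // For each word, record the words observed to follow it.
        std::map<std::string, std::vector<std::string> > followers;
        std::istringstream in(corpus);
        std::string prev, word;
        while (in >> word) {
            if (!prev.empty())
                followers[prev].push_back(word);
            prev = word;
        }

        // Walk the chain: repeatedly pick a random observed successor.
        std::srand((unsigned)std::time(0));
        std::string current = "the";
        for (int i = 0; i < 20; ++i) {
            std::cout << current << ' ';
            std::vector<std::string>& next = followers[current];
            if (next.empty())
                break;
            current = next[std::rand() % next.size()];
        }
        std::cout << '\n';
        return 0;
    }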

I haven't studied the theory behind Markov processes and Bayesian filtering deeply, so I might be talking half-rubbish :) But given the amount and kind of spam that gets through the filter, I have a feeling that the spammers are slowly winning the battle.

2006-11-20

10 Immutable Laws of Security

This is a nice essay.

New C++

This article describes the new features of the upcoming C++09 standard. (Read through it; there are other good links hidden inside.) My favorite addition is automatic type deduction (unfortunately, not described in the article).
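
Since the article doesn't show it, here is a sketch of what automatic type deduction is expected to look like (the repurposed auto keyword from the current proposals; the exact details may still change before the standard is finished):

    #include <iostream>
    #include <map>
    #include <string>

    int main()
    {
        std::map<std::string, int> counts;
        counts["spam"] = 42;
        counts["ham"] = 7;

        // Instead of spelling out std::map<std::string, int>::iterator,
        // 'auto' lets the compiler deduce the type from the initializer.
        for (auto it = counts.begin(); it != counts.end(); ++it)
            std::cout << it->first << ": " << it->second << '\n';

        return 0;
    }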

2006-11-15

NILFS for Linux

Today I've seen a reference to NILFS, a log-structured filesystem for Linux. It's interesting how they've put the most important "feature" at the bottom of the "Current status" page. Looking at the page, the first important thing one notices is that work on the garbage collector is ongoing. Doesn't sound good. At the bottom of the page, under known bugs, they conclude with "The system hangs on a disk full condition." How nice :)

On a more serious note, I think it's great that someone is working on alternative filesystems. NILFS can, for example, support "time travel": since a log-structured filesystem appends new data instead of overwriting old data in place, earlier states of the filesystem remain available as snapshots.

2006-11-08

XML sucks (again!)

This isn't yet another of my anti-XML rants. There actually exists a web site with that very name - xmlsucks.org.

2006-11-06

LinkedIn

I've registered myself with LinkedIn. It is a service to help maintain professional contacts. Doesn't cost anything and might be useful in the future.

2006-11-02

Makefile madness

Whenever I start a new project, there is one single thing I hate the most: maintaining a makefile. A (relatively) long time ago I found the article "Recursive Make Considered Harmful". It is both a critique of recursive make and a guide on how to write a good makefile.

I bit the bullet and applied the recipes given there (it didn't even take much time), and so far so good: the build automatically discovers newly added files, and the dependencies maintain themselves. All without resorting to auto* madness :)
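
For illustration, here's a minimal sketch of the kind of single, non-recursive makefile this approach leads to. The wildcard source discovery and the compiler-generated .d dependency files (-MMD) are my assumptions about a typical setup, not the article's exact recipe (and remember that the recipe lines must be indented with tabs):

    # One makefile for the whole tree; no recursive sub-makes.
    CXX      := g++
    CXXFLAGS := -Wall -O2 -MMD -MP   # -MMD/-MP make the compiler emit .d dependency files

    # Newly added .cpp files are picked up automatically.
    SRCS := $(wildcard src/*.cpp)
    OBJS := $(SRCS:.cpp=.o)
    DEPS := $(OBJS:.o=.d)

    prog: $(OBJS)
    	$(CXX) -o $@ $(OBJS)

    # Header dependencies maintain themselves via the generated .d files.
    -include $(DEPS)

    clean:
    	rm -f prog $(OBJS) $(DEPS)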