Saturday, October 30, 2004

Do you run?

Hmm, running (or jogging if you prefer). It has been a year, more or less, since I started practicing this sport. I must really like it because I keep running as a hobby after all this time. While running, you can hardly think of anything other than yourself and the road, so you can easily disconnect from any problems you have. And maybe the best thing about it is that you can practice it anytime, anywhere, with very little equipment.

I still remember the first day I went for a run: after doing a bit less than a kilometer, I was completely exhausted; I could hardly breathe and my heart felt like it was going to explode. The way back home was worse: I had to stop three times to rest. And, after all that, I had only spent around 15 minutes.

Nowadays, I always do between 6 and 8 kilometers every time I go for a run, spending around an hour and keeping my breathing under control. I would like to do this every day (as I used to), but it's impossible due to my current timetable. So I only run when I have time to, which is on some weekday evenings and on weekends.

If you don't practice any sport, consider changing your habits. Running can be a good choice since it has very few requirements, and you'll probably get addicted after noticing your improvements!

Tuesday, October 26, 2004

GNOME 2.8.1 released

The 2.8.1 version of the GNOME Desktop was released today. This is the first minor release of the 2.8 branch, providing lots of bug fixes and minor improvements, such as new and updated translations. 2.8.2 will be published next month, if everything goes well.

Time to start working on updating the packages in pkgsrc ;-)

Monday, October 25, 2004

Portability: unsetenv("FOO") vs. putenv("FOO")

(This happened last Friday, but I haven't had enough time to write about it.) After fixing the Evolution Data Server crashes (let's call it E-D-S for simplicity), I noticed a strange problem caused by it. The GNOME Clock applet showed the right local time before it was clicked, but, after the calendar was shown (by clicking on the text), the time changed to UTC and there was no way to revert it (other than killing the applet).

How strange. My first action was to verify that I had selected the right timezone in Evolution's configuration; no problems here. So "it must be another problem in E-D-S", I thought. Ok... "let's debug it; it's going to be fun".

I started by looking at the applet's source code to see how it gathered the local time: it uses the localtime(3) function provided by libc; "Hmm, interesting; E-D-S is corrupting the results of a libc function". Took a quick look at its manpage, and saw that its behavior was affected by the tzname global variable and by the TZ environment variable. So it was fair to think that E-D-S was modifying one of these two variables in an unexpected way. The problem was to locate where this was happening, especially because GNU GDB doesn't work very well with threaded applications on NetBSD.

I tried to grep E-D-S sources to look for a tzname variable (but I didn't bother to look for TZ; stupid me). I didn't find what I was looking for, though I saw multiple functions in the libical library that dealt with timezones. Hmm... the GNOME Clock applet was using them. So, I took its sources and added several printf(3)s before and after calls to the timezone-related functions, showing the value returned by localtime(3). After multiple tries, I located the function with side effects: icaltime_as_timet_with_zone.

Went to E-D-S sources again, searched for this function and found some suspicious code:

/* Set TZ to UTC and use mktime to convert to a time_t. */
old_tz = set_tz ("UTC");
t = mktime (&stm);
unset_tz (old_tz);

The next step was to look for the set_tz function in the same file and check what it was doing. Indeed, it was changing the TZ environment variable using the putenv(3) function. Looking at the unset_tz function confirmed my suspicions. This simple line of code:

putenv("TZ"); /* Delete from environment */

was the guilty one (I had met this problem before in some other program that I can't remember now, which is why I knew without further tests). "Aha! I've got you, little bug!", I thought (locating and fixing a bug is a good feeling ;-). Let's see why this was a problem.

putenv("FOO") deletes a variable from the environment on some systems (like Linux). However, on others (such as NetBSD), it does nothing, leaving the environment unmodified.

The solution is to use unsetenv("FOO") on systems that have it, because it behaves in a deterministic way. However, if this function does not exist, putenv(3) is probably the only way to go, hoping that it will do the right thing.
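
As a rough illustration of that advice, here is a minimal sketch of a helper that prefers unsetenv(3) and only falls back to putenv(3) when it is missing. The remove_env_var name and the HAVE_UNSETENV macro are made up for this example (the macro stands in for whatever a configure check would define), and the sketch assumes the POSIX flavor of unsetenv(3) that returns an int:

```c
#define _POSIX_C_SOURCE 200809L  /* expose unsetenv() under strict C modes */
#include <stdlib.h>

/* HAVE_UNSETENV is a hypothetical macro, normally set by a configure
 * check; default to "available" so the sketch builds on its own. */
#ifndef HAVE_UNSETENV
#define HAVE_UNSETENV 1
#endif

/* Remove NAME from the environment; returns 0 on success. */
int
remove_env_var(const char *name)
{
#if HAVE_UNSETENV
    /* Well-defined behavior: the variable is removed if present. */
    return unsetenv(name);
#else
    /* Last resort: some systems treat putenv("NAME") as a removal,
     * others ignore it, so this may silently do nothing. */
    return putenv((char *)name);
#endif
}
```

A function like unset_tz could then call remove_env_var("TZ") when there is no previous value to restore; this is only an illustration, not the actual fix applied to E-D-S.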

This bug is reported here.

Sunday, October 24, 2004

The libexec and libdata directories

Some time ago, a Linux user asked me what the libexec and libdata directories present on a BSD system (placed under /, /usr, or other top-level hierarchies) are, because he had never seen them before on his Linux box. So here is a detailed explanation.

libexec is a directory that contains daemons and utilities that can't be used directly by the user. Simply put, they rely on other programs to launch them. For example:

  • The dynamic linker (/libexec/ld.elf_so in NetBSD) is placed in this directory because it is executed by the kernel whenever a new program is started. The user can't launch this program directly.
  • Simple daemons launched by inetd(8) are here because they can't be launched as standalone daemons. Consider telnetd(8), identd(8), etc.
  • Several parts of GCC (such as cc1, cc1obj, cc1plus, etc.) are here because there is no direct use for them.
  • All GNOME Panel applets are here because they rely on the Panel to launch them (so that they can be attached to the right bar).
  • Component providers, such as evolution-data-server, are also here because other applications have to launch them through Bonobo (or whatever).
  • And a large etcetera...

On the other hand, libdata holds static data files, which are used by other programs or libraries. This data is usually in binary format, not shareable across computers and/or not intended for direct use. Otherwise, it could simply go into share. For example, in pkgsrc, we use this directory to store static databases, such as the one used by ScrollKeeper to index documentation, or the one used by GStreamer to keep track of registered plugins.

But where does Linux store these files, if it does not use these hierarchies? It depends on the distribution. The most typical place is somewhere under the lib tree, in a subdirectory named after the package name. (It may be that some distributions use these directories, though.)

Ah! Before saying that BSDs use strange trees... keep in mind that all configure scripts created by GNU Autoconf use a libexec directory by default (libexecdir defaults to ${exec_prefix}/libexec)!

Wednesday, October 20, 2004

Fixing Evolution Data Server crashes

Yesterday, I packaged evolution-webcal (which was a trivial task), but, as I expected, it didn't work. In fact, I realised that neither the contacts view nor the calendar view of Evolution 2.0 was working at all. I could see the components, but I couldn't interact with them. So I started to debug the problem.

In the console, there were several warnings printed out by Evolution saying that it couldn't activate some Bonobo component coming from Evolution Data Server. Uhm... ok; I searched through the code for that message, noted which function it was in and launched Evolution through gdb, adding a breakpoint in that function.

When the breakpoint was triggered, I saw strange things: the backtrace only contained two frames. Of these, a string parameter in frame 0 was null. "Oh, that must be the problem", I thought. Stupid me. Switched to frame 1 and saw that the string was, in fact, correct. "Ew, looks like something is going wrong in gdb". So I added several printf(3)'s in the code to check the pointer's value. All of them were correct; no null pointers anywhere.

My next thought was that gdb 5.3 (the version that comes with NetBSD) does not handle threads correctly, which made me install gdb 6.2.1 from pkgsrc. Hmm, this one has some nice features, like setting a breakpoint in a function that is still not loaded (which will be resolved when the shared library that contains it is opened). But it still showed me incorrect traces and parameters. Unfortunately, after many attempts, I had to forget about gdb. I don't know if I'm missing something or it doesn't support NetBSD threading correctly yet.

So I kept debugging with printf(3)'s, trying to understand why the call to bonobo_activation_activate_from_id (which I had already isolated) was returning an error. Well, better said, it returned no objects, because it was not handling errors at all (something that'd have saved me a lot of trouble).

Did some more tests but got bored quickly: rebuilding Evolution and running it from the source tree is not fun. Solution: create a small test case that calls the failing function with the same parameters. This took a bit of time because I had to do some research about the bonobo-activation and libbonobo APIs. But in the end, I got it. And fortunately, it behaved as expected: it failed to load the components! Throwing OAFIID:GNOME_Evolution_DataServer_InterfaceCheck to the function made it fail, while picking a non-evolution-related component worked properly (I tried with OAFIID:Fontilus_Context_Menu_Factory). Ok, now, to look for differences between these two to see why one failed but not the other.

After several stupid tests, I added better error control to my test case and got a message that said something like: Cannot read from child process. Yay! This gave me the final clue.

Executed /usr/pkg/libexec/evolution-data-server-1.0 by hand and could see it dumping core. ktrace(1)'d it and saw it was calling a NetBSD 1.3 compatibility function. Huh? Tried with gdb, which this time was useful: I could see that the last call before the segfault was related to sigaction(2) and could get a useful call trace.

Inspected the code, tried to disable the signalling stuff and it ran properly! "Wow, I'm really close to the bug", I thought. Afterwards, I reenabled it, and while compiling the affected file, I could see a related warning that I'd never noticed before:

server.o(.text+0x109): In function `main':
/home/jmmv/NetBSD/pkgsrc/mail/evolution-data-server/work/evolution-data-server-1.0.2/src/server.c:129: warning: reference to compatibility sigemptyset(); include <signal.h> for correct reference

And here is the solution: added #include <signal.h> to the server.c file and everything worked properly. Evolution Data Server does not dump core any more and Evolution 2.0 works quite well. Isn't this one of the most stupid fixes you can think of?

I'm still surprised that the lack of a header file caused this kind of problem at run time. Guess I'll have to investigate a bit more why this happens. But not now; this took me more than two hours to discover and fix!
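
For reference, this is a minimal sketch of signal setup code with the header in place. It is not the code from server.c: the on_signal and install_handler names are invented for the example, and it only shows sigemptyset(3) and sigaction(2) being used after including <signal.h>, so that the compiler sees their proper declarations:

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() under strict C modes */
#include <signal.h>   /* declares sigemptyset(), sigaction(), sig_atomic_t */
#include <string.h>

/* Set from the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t got_signal = 0;

static void
on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install on_signal() for SIGNO; returns 0 on success, -1 on error. */
int
install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);   /* block no extra signals while handling */
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}
```

Building with warnings enabled (for example, -Wall) makes a missing declaration much harder to overlook than the linker message above.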

Monday, October 18, 2004

An example of kqueue

The documentation of kqueue is quite decent but it lacks some examples. After reading its main manual pages (kqueue(2) and kevent(2)), I wasn't sure about how it worked, so I had to write a test program to verify its behavior.

Let's start by analyzing the test program to later see its full code. The program will monitor changes to the /tmp/foo file and will print messages whenever it is deleted, modified or its attributes change. The program finishes when the file being monitored is deleted.

The steps to use kqueue are the following:

  1. Call kqueue(2) to create a new kernel event queue. The descriptor it returns will be later used by kevent(2).
  2. Open the file to monitor and keep its descriptor around. We'll need this to attach an event monitor to it.
  3. Initialize a vector of struct kevent elements that describes the changes to monitor. Since we are only monitoring a single file, we need a one-element vector. This vector is filled up with calls to the EV_SET macro. This macro takes: a pointer to the struct kevent entry to fill in, the descriptor of the file to monitor (ident), the filter to apply to it, several flags and optional arguments to the filter. Note that an entry in this table is identified by its ident/filter pair.
  4. Call the kevent(2) function. This system call takes the list of changes to monitor we constructed before and does not return until at least one event is received (or when an associated timeout is exhausted). The function returns the number of changes received and stores information about them in another vector of struct kevent elements (we'll only get notifications of one event at a time, hence we don't use a vector, but a simple variable).
  5. Interpret the results. If kevent(2) returned a number greater than 0, we have to inspect the output vector and see which events were received. Each filter has its semantics about the results. For example, we are using the EVFILT_VNODE filter, which takes a list of conditions to monitor in the fflags field and modifies it to include only the conditions that triggered the filter.

With these concepts clear and with help of the manual pages, you should be able to interpret the following code easily:

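#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    int f, kq, nev;
    struct kevent change;
    struct kevent event;

    /* Create the kernel event queue. */
    kq = kqueue();
    if (kq == -1) {
        perror("kqueue");
        return EXIT_FAILURE;
    }

    /* Open the file to monitor; kqueue works on open descriptors. */
    f = open("/tmp/foo", O_RDONLY);
    if (f == -1) {
        perror("open");
        return EXIT_FAILURE;
    }

    /* Describe what to monitor.  EV_ONESHOT drops the entry once it
     * fires, so the loop below registers it again on every call. */
    EV_SET(&change, f, EVFILT_VNODE,
        EV_ADD | EV_ENABLE | EV_ONESHOT,
        NOTE_DELETE | NOTE_EXTEND | NOTE_WRITE | NOTE_ATTRIB,
        0, 0);

    for (;;) {
        /* Block until an event arrives (NULL means no timeout). */
        nev = kevent(kq, &change, 1, &event, 1, NULL);
        if (nev == -1) {
            perror("kevent");
            return EXIT_FAILURE;
        } else if (nev > 0) {
            /* fflags holds only the conditions that triggered. */
            if (event.fflags & NOTE_DELETE) {
                printf("File deleted\n");
                break;
            }
            if (event.fflags & NOTE_EXTEND ||
                event.fflags & NOTE_WRITE)
                printf("File modified\n");
            if (event.fflags & NOTE_ATTRIB)
                printf("File attributes modified\n");
        }
    }

    close(kq);
    close(f);
    return EXIT_SUCCESS;
}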

Now compile and run the program in one terminal. In another one, modify the /tmp/foo file and see how our test program shows the events! If you delete the file, the program will terminate. (Note that we are not monitoring all possible events; we could watch for file renames, as well as other conditions, if we needed to.)

Sunday, October 17, 2004

FAM and kqueue

The File Alteration Monitor, or FAM for short, is a utility that monitors changes made to files and directories and delivers asynchronous notifications to applications interested in them. GNOME uses it to keep Nautilus windows in sync with the on-disk contents, among other uses. For example, if you have your home folder open, and you do touch ~/foo from a terminal, you can see how the folder immediately updates its status to show the new file.

FAM uses imon internally, a kernel facility found in IRIX and Linux that provides notifications of changes to files and directories asynchronously (the kernel sends you an event when a change happens). If imon is not supported, it falls back to manual polling: the daemon scans the files being monitored every 6 seconds to see if there have been any changes. Polling is the only way to go if the kernel does not provide you with better ways to receive notifications, but it is a suboptimal solution (there is a small time frame where the data held by the application and the data on disk differ). And NetBSD, up until now, was using polling.

However, NetBSD (as well as OpenBSD) has the kqueue framework, which is similar to imon in functionality. What are the differences between kqueue and imon? On one hand, kqueue monitors open file descriptors, while imon monitors inodes (this is a problem, as you'll see later). On the other hand, kqueue is handled entirely with system calls, while imon is managed through a pseudo-device (/dev/imon). There are probably many other differences, but these are the main ones AFAICT.

Unfortunately, FAM did not use this interface in these operating systems. This was not acceptable to me (Nautilus with the polling backend is very annoying), so I've been adding support to FAM to work with kqueue. It has not been a trivial task: first, I had to read the kqueue documentation to understand how it worked; then, I had to understand how the FAM code worked (more or less); and, finally, I had to implement the functionality (which I've done twice).

The main problem in implementing kqueue support is that FAM is coded with imon and polling in mind, which makes the use of other event notification mechanisms quite difficult. Not to mention that the code is not very easy to understand (in fact, some comments in the code tell you that it's messy).

So, what I have done is simulate an imon interface using kqueue. The IMonKQueue.c++ file, which you can find here, contains a good description of how the "emulation" works (it's not worth repeating here). I'd suggest you read it, test it, and report any errors you find ;-)

Thursday, October 07, 2004

Trying Bogofilter...

A few days ago I did some maintenance of the software installed on my small server: among other things, the packages in it were outdated and I wanted to get the Libtool changes in (something that happened in pkgsrc...). So, I seized this opportunity to give Bogofilter a try, because SpamAssassin brought the machine to its knees.

I configured Bogofilter to parse all my incoming mail, fed to it by Procmail, following the examples given in the manual page; a painless process. The filter adds the X-Bogosity header to all mails, indicating if they are spam or not (non-spam is called ham, for those that don't know), so that you can later classify them with a simple Procmail rule.

After this little setup, it was frustrating: it caught no spam... obviously, because the words database was empty. So I started classifying all new mails in an "Archive" folder (i.e., "Trash") and in a "Spam" folder by hand, and set up a cron job to scan all mails in those folders periodically to make Bogofilter learn about my spam.

Up until now, I've fed it around 150 spams and more than 1600 hams... which is starting to have some effects: it is able to detect some spam, although there are still a lot of false negatives. I'll keep manually classifying them for some days, hoping that the situation improves (I have almost no doubts about this).

Even so, SpamAssassin caught spam out of the box, without having learned anything. And after learning from more than 15,000 mails, it produced very, very, very few false negatives. I know, this program does a lot more checks than Bogofilter (which is just a Bayesian filter), so it can detect spam without training. But... as my server does not swap any more, I'll try to get the best out of Bogofilter. Do you use either of these two? If so, which one, and what are your experiences?

Tuesday, October 05, 2004

Pipes over SSH

Today I had to copy a bunch of files and symlinks to a remote machine. My first attempt was to use scp directly:

$ scp -r directory host:/some/directory

But that went wrong: the symlinks were not preserved, which resulted in several files being transferred multiple times.

A simple solution could have been to create a tarball of the whole directory, copy it to the remote host and unpack it there. However, I thought... using tar on both ends through a pipe over SSH should work... so I tried... and it worked! So how is this done?

$ cd directory ; tar cvf - . | ssh host tar xf - -C /some/directory

Yes, it's that easy! Hmm, I wonder why I didn't try this before... BTW, remember that the target directory should exist; if it doesn't, adjust the command line a bit ;-)

Saturday, October 02, 2004

New versioning scheme for NetBSD

Although this has not been announced publicly yet, it is not a secret anymore because the version changes are visible. NetBSD has changed its versioning scheme to a less confusing one.

Up until now, major releases increased the number of the previous version by one; i.e., after 1.4 we had 1.5, and then 1.6... all of these being releases with lots of new features. There were minor releases too, used to provide enhancements to major releases (mostly bug fixes); their name was the same as the major version they enhanced, but with a third digit in them, like 1.5.3, 1.6.2, etc. All this looks reasonable, you may think.

But then we had the versioning scheme for the development versions (or current), which was confusing. They were named like the latest major release but with a letter appended to them, increased with every kernel interface change. That is, we had 1.6A, 1.6B, 1.6C etc. which could eventually lead to 1.7 (next major release). Note that all of these were "greater" than 1.6 and 1.6.x.

I said 1.7 a while ago, but as you may already know, the next public major version will be 2.0. This number was chosen because this release has lots of new features in it (more specifically, multiple development branches were merged in). Well, we could still apply the previous scheme if we didn't touch anything else... but we did.

Future major versions will increase the major number... uhmm... that sounds reasonable, doesn't it? Of course! ;-) And minor releases will increase the minor number, obviously. So, all future versions will consist of two numbers. After 2.0, there will be a 2.1; concurrently, we will be developing 3.0. Here is where problems come in. With the previous scheme, any development version would be treated as being "less" than a minor release; for example, 2.0H looks like an older release than 2.1. This is very confusing, and causes problems with some system macros used to check the current version of the system (you couldn't compare __NetBSD_Version__ directly any more).

So we changed it: from now on, the current branch will have the same major number as the latest version published and a minor number of 99. That is, we are at 2.99.9 at the moment (9 comes from H's position in the alphabet plus 1).
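
To see why the new numbers compare cleanly, here is a small sketch that packs a version into a single integer. The encode_version helper is hypothetical, and the MMmm00pp00 layout it uses (major, minor, an unused pair, patch level) is only an assumption about how __NetBSD_Version__ might be laid out, not something taken from the announcement:

```c
/* Hypothetical helper: pack major.minor.patch into one number using an
 * assumed MMmm00pp00 layout, so that versions can be compared as integers. */
unsigned long
encode_version(unsigned int major, unsigned int minor, unsigned int patch)
{
    return major * 100000000UL + minor * 1000000UL + patch * 100UL;
}
```

Under this layout 2.99.9 becomes 299000900, which sorts after 2.1 (201000000) and before 3.0 (300000000), so a plain numeric comparison gives the expected order.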

Humm... not clear yet? Let's summarize all this in a table:

Kind of release    Old scheme    New scheme
Stable (Major)     2.0           2.0
Stable (Minor)     2.0.1         2.1
Stable (Major)     2.1           3.0
Current            2.0A          2.99.1
Current            2.0B          2.99.2

The official announcement, due this week (hopefully), will have more information than this, as well as the rationale behind all changes; I have omitted some details (like patch releases).

Edit (4th October): This has been officially announced.

Friday, October 01, 2004

The AM_GCONF_SOURCE_2 macro

GConf comes with an m4 file to ease its usage from third party configure scripts; it provides a macro, known as AM_GCONF_SOURCE_2, which provides many features (and most importantly, encapsulates all GConf related stuff). Among these, it is used to determine the directory where .schemas files should be installed, a setting that can be fine-tuned by the end user through the --with-gconf-schema-file-dir argument.

However, many configure scripts use this macro incorrectly. That is, they call it from the configure script to detect the presence of gconftool-2, but later, in Makefile.am files, they don't use the variables defined by the macro.

For example, I've found lots of packages that have this in their Makefile.am:

schemasdir = $(sysconfdir)/gconf/schemas

To (almost) all Linux users, this will look completely sane, as that is the usual directory where schemas get installed. But this will fail miserably if the user has chosen another location (for example, pkgsrc uses /usr/pkg/share/gconf/schemas to store them). The solution is very simple, because all you have to do is to use a variable defined by the macro; that is, the Makefile.am has to be modified to say:

schemasdir = $(GCONF_SCHEMA_FILE_DIR)

Another problem I often find comes from the usage of the GCONF_SCHEMAS_INSTALL conditional, also defined by that macro. This conditional is provided to determine whether gconftool-2 has to be executed during a make install to register the schemas into GConf's system-wide database or not. Some Makefile.am files don't use this feature at all; others fail to define the "false" case, leading to problems with the BSD Make utility ("missing target").

Consider that you have the following code:

install-data-local: install-schemas
install-schemas:
	GCONF_CONFIG_SOURCE=$(GCONF_SCHEMA_CONFIG_SOURCE) $(GCONFTOOL) --makefile-install-rule foobar.schemas

You should rewrite it to look like the following code; i.e., surround it with an Automake conditional, and define the install-data-local in the "false" case to do nothing. This fixes the two problems mentioned previously:

if GCONF_SCHEMAS_INSTALL
install-data-local: install-schemas
install-schemas:
	GCONF_CONFIG_SOURCE=$(GCONF_SCHEMA_CONFIG_SOURCE) $(GCONFTOOL) --makefile-install-rule foobar.schemas
else
install-data-local:
endif