Tuesday, June 28, 2005

Monotone's CVS gateway

After a long time, I've finally decided to give Monotone's net.venge.monotone.cvssync branch a try. The code in it implements a bidirectional gateway between Monotone and CVS. What this means is that Monotone can be used for private development while working on a project that already uses CVS (doing the inverse could be... stupid?).

The way it works is basically the following: first of all, you synchronize your local Monotone database with a remote CVS repository using cvs_pull, which imports the whole revision tree into it. Secondly, you commit to your local Monotone tree as much as you want. Finally, when you want to publish your changes, you push them to the CVS repository using cvs_push and they get integrated nicely (each revision in your local database is translated into a single CVS commit). There are some small problems, though: during a push, all the new CVS revisions get the same date, but I think this is unsolvable.

Also, if you think that importing the whole CVS tree into your local database is not worth it, you can import from a given point of the development instead: i.e., start from a working copy using the cvs_takeover command.

All in all, this is amazing. It will let me work on tmpfs while I'm away during July, being able to push my changes whenever I get Internet access without losing history :-)

NetBSD conference at Partyzip@

I just realized I haven't posted about this before... A month ago or so, I was invited by the Partyzip@ organizational team to give a conference about NetBSD. For those that don't know it, Partyzip@ is a party held at Monzón, Spain: it holds several technological conferences and workshops as well as gaming contests. This is its second year.

My conference will be given on July 10th in the morning. I will give an introduction to the NetBSD operating system covering topics such as the release scheme, project goals, existing technologies, future projects, cross compilation, etc. Of course, I'll also talk about pkgsrc in detail. And why not, I can also talk about NetBSD-SoC! ;-)

If you are interested in coming, note that you must register first.

Monday, June 27, 2005

SoC: Project page ready

I've just set up the NetBSD-SoC: Efficient memory file-system project page. At the moment, it includes a list of the project goals, a copy of the original proposal text (in case you would like to read it) and a list of existing documentation.

This page will be extended to hold technical information as well as installation instructions (that is, how to merge the code in that page with NetBSD's source tree) when the project matures during the summer.

Sunday, June 26, 2005

SoC: The NetBSD-SoC project

The NetBSD Project has set up a project at the SourceForge site that aims to centralize the development of the eight projects chosen for the Summer of Code program. Its name is NetBSD-SoC and its page contains information about all the selected projects, information about mailing lists of interest and a CVS repository for the students and their mentors. Read the official announcement for more information.

As regards my project, I'll start filling in its page on the site when I've got access to it (probably tomorrow). BTW, all the other seven projects are really nice and I hope all of them will be "finished" (or mostly working) by the end of the summer.

From here, I'd like to thank the NetBSD developers who made this possible, especially Hubert Feyrer, who has been working intensively on setting up the NetBSD-SoC project.

Saturday, June 25, 2005

SoC: Accepted!

After a very long delay, Google has finally chosen the projects that will be part of the Summer of Code program. There seems to be no official announcement on the page yet, but I already received a mail... and... my project is accepted! :-)

I briefly outlined my project some days ago, but I'll explain it in more detail now (copying some paragraphs from the application form verbatim).

At the moment, NetBSD includes a memory-based file-system called mfs. mfs is just an implementation of the regular ffs - designed for persistent storage - on top of the (volatile) virtual memory system. This means that it uses the same data structures as the on-disk implementation, resulting in less than optimal performance and memory usage. As regards the latter, and in the words of another NetBSD developer, the physical memory and swap space needed to back these pages constantly grows.

The NetBSD OS is in need of an efficient memory file-system that uses its own data structures to manage the stored files. The main design goal is to make it use just the amount of memory it needs to work correctly and efficiently; no more, no less.

Having said this, the visible goals of the project are:

  • Implement this memory efficient file-system under the NetBSD OS using a 3-clause BSD license (without the advertising clause). I'll call it tmpfs throughout this post, as I think it'll take this name.
  • Document tmpfs in detail, describing its data structures, the algorithms used, and the rationales that led to the decisions taken.
  • Write a "file-system how-to" document explaining how to write a file-system driver for NetBSD from scratch. This will be similar in spirit to the Device Driver Writing Guide and will probably be merged into it.

Of course, another goal is to learn about kernel internals. I would love to be able to contribute more to this part of the system, but I'm currently not capable of doing much. I hope to learn the details of the VFS layer during this summer, aiming for other possible contributions in the future.

How will this project be driven? Here is a preview of the development plan:

  1. Read the file-system chapter in the Design and Implementation of the 4.4BSD Operating System book. I've been told it includes a few notes about how tmpfs could be done (or what mfs' limitations are).
  2. Read the code of some existing simple file-systems to get an idea of the whole picture (what the call traces look like, how data is handled, what the main entry points are, etc.). This includes reading code from ptyfs, kernfs and maybe procfs, all of which are memory-based systems.
  3. Probably read code from mfs. The systems described in the previous point are very special as regards write support (i.e., it is almost lacking, because it makes no sense), so I feel they won't be especially useful to understand how this functionality works (but I don't know yet). On the other hand, mfs is a complete file-system, so it will be helpful in this area.
  4. Design the necessary in-memory data structures to hold directories, files and all the associated meta-data. This will be done with memory constraints in mind, but, of course, also trying to be fast. A correct balance between speed and memory usage needs to be achieved, hence a design is required beforehand. To make things simpler, the design could be made in two iterations: the first using very simple data structures to speed up development, and the second improving these to achieve maximum performance.
  5. Implement the file-system itself according to the decisions taken in 4. and the knowledge acquired in 2. and 3.
  6. Debug (although this will be heavily overlapped with 5.).
  7. Optimize the code, if possible, and more debugging as this goes on.
  8. Write the document describing how to write file-systems and the document explaining the internal layout of tmpfs, based on all the notes I'll have taken during the previous points.
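To make the "first iteration" of step 4 a bit more concrete, here is a minimal sketch of what very simple in-memory structures could look like. It is plain user-space C++, not kernel code, and it is not the actual tmpfs design (which doesn't exist yet); every name in it (Node, NodeType, add_child, lookup) is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node types; a real file-system has more (symlinks, devices...).
enum class NodeType { File, Directory };

// One node per file or directory, holding its own metadata and contents.
// There are no superblocks and no block placement: memory is only used
// for what is actually stored.
struct Node {
    NodeType type;
    std::uint32_t mode = 0;                                 // permission bits
    std::vector<std::uint8_t> data;                         // contents (files only)
    std::map<std::string, std::shared_ptr<Node>> children;  // entries (directories only)

    explicit Node(NodeType t) : type(t) {}
};

// Creates a new entry called `name` inside `dir`.
// Returns nullptr if the name is already taken.
std::shared_ptr<Node> add_child(Node& dir, const std::string& name, NodeType type) {
    assert(dir.type == NodeType::Directory);
    auto node = std::make_shared<Node>(type);
    auto inserted = dir.children.emplace(name, node);
    return inserted.second ? node : nullptr;
}

// Looks up `name` in `dir`; returns nullptr when there is no such entry.
std::shared_ptr<Node> lookup(const Node& dir, const std::string& name) {
    auto it = dir.children.find(name);
    return it == dir.children.end() ? nullptr : it->second;
}
```

A sorted map and a growable byte vector are the kind of "simple first, fast later" choices step 4 talks about: easy to get right, and replaceable in a second iteration once the rest works.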

I'm eager to start working on this. However, there will be some timing constraints I hadn't planned for at first: we are having some work done at home, which will take us, at the very least, another two weeks. Furthermore, I have my last final exam on the 28th, so it will be difficult to start any work before that.

Finally, I think I'll be mostly away from the Internet during July, though I will surely have intermittent access. This will make things a bit more difficult as regards asking questions or contacting developers, but I'll have to manage it. Fortunately, August will be completely dedicated to the project :-)

Thursday, June 23, 2005

EasyTAG

A few days ago, I finally decided to sort all my MP3s. I started erasing lots of cruft that I never listen to and then renamed some files I had with weird names. I ended up with this layout: Artist/Album year - Album name/Song number - Song name.mp3.

When finished, I was faced with the task of setting up the ID3 tags for all of them, something I did in the past using custom shell scripts together with the id3v2 utility. However, this was a very error-prone task, mainly due to string quoting in the shell.

So I decided to give the EasyTAG utility a try. Man... what a wonderful tool! I just needed a few minutes to discover how it worked, and some minutes later, I had all my songs with correct ID3 tags in them. Certainly, give it a try.
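With a layout as regular as the one above, the tag fields can be read straight off the path. The following is only a sketch of that idea, not how EasyTAG does it internally; the Tags struct and the parse_path and split_once helpers are hypothetical names, and the parser assumes " - " separates the year from the album name and the song number from the song name.

```cpp
#include <cassert>
#include <string>

// Tag fields recoverable from a path of the form
// "Artist/Year - Album/Track - Title.mp3".
struct Tags {
    std::string artist, year, album, track, title;
};

// Splits `s` at the first occurrence of `sep`.
// Returns false (leaving the outputs untouched) if `sep` is absent.
static bool split_once(const std::string& s, const std::string& sep,
                       std::string& left, std::string& right) {
    const std::string::size_type pos = s.find(sep);
    if (pos == std::string::npos) return false;
    left = s.substr(0, pos);
    right = s.substr(pos + sep.size());
    return true;
}

// Parses a relative path into `t`.
// Returns false if the path doesn't match the expected layout.
bool parse_path(const std::string& path, Tags& t) {
    std::string rest, album_dir, file;
    if (!split_once(path, "/", t.artist, rest)) return false;
    if (!split_once(rest, "/", album_dir, file)) return false;
    if (!split_once(album_dir, " - ", t.year, t.album)) return false;

    // Require and strip the ".mp3" extension.
    const std::string ext = ".mp3";
    if (file.size() <= ext.size() ||
        file.compare(file.size() - ext.size(), ext.size(), ext) != 0)
        return false;
    file.erase(file.size() - ext.size());

    return split_once(file, " - ", t.track, t.title);
}
```

Since the strings never pass through a shell, the quoting problems of the script-based approach don't come up here.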

Hmm, now it is a pleasure to use Rhythmbox, as all the songs appear correctly classified. Unfortunately, as it doesn't have an equalizer, I'll stick to Beep Media Player.

Wednesday, June 22, 2005

MSDNAA

So... my faculty became part of the MSDNAA program a few months ago. This program is meant to provide students with Microsoft's software (basically operating systems and development tools) for free. The intention (I guess) is to let us, the students, learn how these software packages work so that we use them in the future in our professional careers.

Well, the thing is that, thanks to this, I can finally have a legal copy of Windows on my machine, and I also have the opportunity to try programs like Visual Studio .NET 2003. It feels good; at least I can use them without regrets.

But why do I want Windows, you may ask? I already outlined it two days ago: to check whether my programs are portable to Windows using typical compilers (the ones I can get legally, such as Borland's or Microsoft's). Oh well, and of course, to play a bit too ;)

Tuesday, June 21, 2005

SoC: My project proposal

In yesterday's post, I mentioned that I applied for Google's Summer of Code — a program designed to introduce students to the world of open source software development — and I realized I had not blogged anything about it yet.

Although I'm already in the open source software development world, I thought this was a good opportunity to learn new stuff and, why not, earn some money doing something I like. So I looked for a project and sent a proposal.

My project aims to develop an efficient, memory-based file-system for NetBSD. I can agree that this is not very innovative because we already have mfs, but it is still a good project. First of all, mfs is flawed, as it's just ffs over memory pages; this means that it uses lots of data structures and algorithms designed for on-disk data when they could simply be avoided. Think about superblocks, superblock copies, smart placement of data blocks, etc. It also has some problems as regards memory consumption, as some developers told me.

On the other hand, this project is nice because I'll seize the opportunity to write some documentation about the VFS layer, something that'll benefit other developers who may want to get involved in this area (just as I'll be doing). Furthermore, I think it's correctly sized for a two-month project, given that I'll be working on it full-time.

So... let's hope I get picked! :) But if not, I think I'll work on this idea anyway, although not as intensively as I otherwise would.

Edit (12:45): Oh, BTW. Hubert Feyrer is representing NetBSD towards Google (together with Jan Schaumann); you can find some information about how things are progressing in his NetBSD blog (e.g., here and here).

Monday, June 20, 2005

MFC: Developing for Windows

During the past two days, I've been working (again) on my Boost Process library idea. While doing so, I realized that I don't know anything at all about coding for Windows using the MFC. I must learn how to handle processes under this platform to be able to design a correct abstraction layer for process management.

It's time to do so. I booted Windows XP, downloaded Borland's C++ Builder 5 command line tools (that is, the free C++ compiler) and installed it following the instructions (not a "trivial" task). Man, it's damn fast compared to GNU g++, as seen while building some of Boost's code.

After this, I was faced with the task of writing a little Windows program. The only thing I knew was that they use a WinMain function rather than a main one, so this was the starting point. I looked for some documentation and finally came up with an extremely simple, useless and stupid command line program:

#include <windows.h>

// Entry point for Windows programs, used instead of main().
int WINAPI
WinMain(HINSTANCE hInstance,
        HINSTANCE hPrevInstance,
        LPSTR lpCmdLine,
        int nCmdShow)
{
    // Do nothing and report success.
    return 0;
}

You see, all it does is return success. Building it was simple after some digging through the compiler options, but it took some time too. Doing bcc32.exe -tW test.cpp resulted in a test.exe file ready to be executed. I also tried some CreateProcess example I found, but I won't comment on it yet since I still don't understand it perfectly.
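For reference, this is roughly the kind of abstraction a process library has to provide: one function, two platform-specific backends. It is only a sketch under my own assumptions, not the CreateProcess example mentioned above and not the Boost.Process design; run_and_wait is a hypothetical name, and the Windows branch joins the arguments naively, so arguments containing spaces or quotes would need proper quoting.

```cpp
#include <cassert>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#endif

// Hypothetical helper: runs args[0] with the given arguments, waits for it
// to finish and returns its exit status, or -1 if it could not be launched
// or did not exit normally.
int run_and_wait(const std::vector<std::string>& args) {
    assert(!args.empty());
#ifdef _WIN32
    // CreateProcess takes a single, writable command line string.
    std::string cmdline;
    for (const auto& a : args) {
        if (!cmdline.empty()) cmdline += ' ';
        cmdline += a;
    }
    std::vector<char> buf(cmdline.begin(), cmdline.end());
    buf.push_back('\0');

    STARTUPINFOA si = {};
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi = {};
    if (!CreateProcessA(NULL, buf.data(), NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi))
        return -1;
    WaitForSingleObject(pi.hProcess, INFINITE);
    DWORD code = 0;
    const BOOL ok = GetExitCodeProcess(pi.hProcess, &code);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return ok ? static_cast<int>(code) : -1;
#else
    // POSIX wants a NULL-terminated array of C strings.
    std::vector<char*> argv;
    for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    const pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        execvp(argv[0], argv.data());
        _exit(127);  // only reached if execvp failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
#endif
}
```

The two branches show where the platforms differ: Windows creates the child in one call from a single command line string, while POSIX splits the job into fork and exec and takes the arguments as an array.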

I'll keep learning stuff during the following days, so expect more posts along the lines of this one. However, I don't know how much time I'll have to devote to Boost.Process: I applied for Google's Summer of Code, so in case I get picked, I'll have to work on my project rather than on anything else ;-)

Sunday, June 12, 2005

Fragmentation in Unix file systems

Back in the days when I started to use Unix-like systems (Linux), I learned that their file-systems barely suffer from fragmentation. Nobody ever told me the reason behind that statement, and I never bothered to look for it, since there was no choice in the file-systems area (ext2 at the time under Linux) and there were no defragmentation utilities. Therefore, I assumed it was true... but it's not! (At least not as I understood it.)

However, I recently started to worry about this issue because I felt that some typical tasks, such as CVS updates, were becoming slower and slower every day. As I learned in an operating systems course at university, the file system does not try to (or at least it's not required to) prevent fragmentation. Even so, some of its basic data structures (fragments and blocks) mitigate the problem of internal fragmentation of small files.

So I did some empirical tests. I unpacked a copy of pkgsrc onto my fastest disk (/home, FFSv2) and the operation took around 300 seconds to complete; that is five minutes. Note that this disk hasn't been formatted for a long time and holds more than 100 GB of data (including lots of small files), so one can expect that the files are widely spread around the disk.

Then I did the same operation on a clean partition of the slower disk (the speed difference is barely noticeable between the two disks, I may add). It took less than 90 seconds... that is a speed gain of 233%! Keep in mind that this was just a very specific test and the results cannot be extrapolated to other uses (don't do that!), but at least it confirmed my suspicions.
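The 233% figure measures how much faster the second run was, which is not the same as the share of time saved. A small sketch of both calculations, using the rounded timings from above (300 and 90 seconds); the function names are made up for illustration.

```cpp
#include <cassert>

// Speed gain, in percent, of a run taking `new_secs` over one taking
// `old_secs`: (old / new - 1) * 100.
double speed_gain_percent(double old_secs, double new_secs) {
    assert(new_secs > 0.0);
    return (old_secs / new_secs - 1.0) * 100.0;
}

// Time saved, in percent of the original run: (1 - new / old) * 100.
double time_saved_percent(double old_secs, double new_secs) {
    assert(old_secs > 0.0);
    return (1.0 - new_secs / old_secs) * 100.0;
}
```

With these numbers, speed_gain_percent(300, 90) is about 233, while time_saved_percent(300, 90) is 70: the clean partition did the same work in 30% of the time.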

So what did I do? Repartitioned my NetBSD installation. Instead of having a single root file system for everything except /home, I created three partitions: one for the system itself, one for the sources (pkgsrc, src and xsrc) and one for temporary object files. This way I expect, at least, that the system binaries will be kept together, giving better program startup times, and that expensive operations over the source trees (the CVS updates I mentioned before) will remain quick.

Friday, June 03, 2005

pkgsrc: Documentation about pkginstall

Two days ago I started touching the FAQ chapter in the pkgsrc Guide and I literally opened a can of worms. The thing is that I started to rewrite the question about configuration file placement (because most of the answer was just internal details of pkgsrc) and ended up writing a whole new chapter about the pkginstall framework. Hope it's an interesting read for you ;)

Now... I'm afraid to touch anything else, as I know what will happen — more heavy rewrites (I like writing). It's not that I don't want to do it, but I must do some other things first...

Wednesday, June 01, 2005

pkgsrc: mplayer switches to the options framework

My little project these last two days has been to convert the mplayer packages to use the options framework and deprecate the old-style variables to tune its behavior. This is the "new" (has been around for a while already) way to go in pkgsrc, as it's cleaner, more homogeneous and more flexible.

During the conversion, I've added a lot more optional features to the package so that users can build a smaller package if they want. Have fun ;)