Tuesday, July 19, 2005

SoC: Thoughts about tmpfs data representation

The text below is a message I just sent to NetBSD's tech-kern mailing list. I'm reproducing it here with better formatting and with some e-mail specific sentences removed.

The tmpfs code is up to the point where I have to start implementing the vnode operations. To do this, I have to decide how to organize the file data in memory as well as all other information needed to manage it.

After thinking about this for a while, it seems that the best way to do this is to follow a layout similar to the one used for existing on-disk file systems. That is, I need:

  • A set of nodes that describe files. These could be like regular inodes.
  • A set of blocks that store file contents. These could store directories as well (i.e., the "file" representing the directory contents.).

Suppressing the former (merging it with directory contents, thus becoming something like the FAT) means that implementing hard links becomes very complex, so the separation is worth it.

Now, to store these two data structures in memory. Before I started coding, I thought I could use malloc/free to allocate and deallocate blocks of memory on the fly. I.e., when a new block is needed, I do a malloc and use it. When it is useless, I do a free and its associated memory is released.

This is inappropriate because malloc/free operate on wired memory. Furthermore, I guess malloc is an expensive operation and should not be used intensively to avoid its overhead (think about how tmpfs could abuse it).

Therefore, I started reading a bit about UVM, and although I don't understand it very well yet, it seems to me that it is possible to map a region of virtual (unwired) memory and use it at will (just as if you had done a malloc). So, for now, I'm using malloc/free to allocate some (few) memory blocks, and I hope that changing this to use unwired memory will be as easy as doing some call to UVM to request a memory region and use pointers inside it for our own use.

With this in mind, it seems that a good way to map the data structures said before into these memory blocks could be:

  • Allocate a memory block of no. nodes * sizeof node. This could be used to hold all nodes. It could be handled with a bitmap.
  • Allocate a memory block of no. blocks * sizeof blocks. As before, with an associated bitmap describing its contents.

(Note that the number of blocks and nodes is configured by the user at mount time, and is already implemented.)

The advantages of this structure are:

  • Access to blocks and nodes is constant time, as they are located in known memory positions (it's a vector).
  • Allocation of blocks and nodes is of linear time due to the location of empty holes in the bitmap. Can be solved with a pool of available entries (at a later time).
  • Deallocation of blocks and nodes is constant time.
  • It's easy to grow (and to some extent, shrink) the file system at will.
  • The implementation is straightforward.
  • It is very similar to traditional Unix file systems, so it will be easy to understand.

The code in the CVS did exactly this some revisions before HEAD. In a later change, I undid the separation explained here and used blocks for everything. But I'm starting to regret that decision because it will be very complex to store nodes if I don't want to waste much memory.

If you have any better ides, suggestions or you see something wrong in this rationale, please say it! :-)

Edit (Jul 20th, 00:00): Check out the original mail and all its follow ups. They include very interesting suggestions (such as this one) and expose that my proposal is just wrong (frankly, I could imagine this ;).

Monday, July 18, 2005

SoC: Status report 2

After several days since the previous status report, it's time for a new one. During the past week, I've improved several aspects of the existing code, without adding many new stuff.

First of all, I added several other VFS hooks needed to avoid crashes due to null pointers and completed the code up to the point where the file system can be mounted and unmounted (the latter was more difficult than I thought).

Then I also added code to manage the root node of the file system so that that information can be used where needed and completed the statvfs hook. This required the addition of multiple configurable parameters (ownership, size of the filesystem, etc.) into the mount arguments structure, which resulted in many different changes in the mount_tmpfs(8) utility. I feel some of the code implemented in this utility could go into libutil, but I won't touch that (yet) ;-)

Furthermore, and as the code is starting to be "usable", I added some regression tests following the existing structure in NetBSD. I guess they will need to be refactored soon because of the regress NetBSD-SoC project, but they do the job for now. The project's page explains how to run them.

At last, I've read a bit about UVM and memory management to see how to allocate unwired memory blocks. I still don't know yet how that works, so I've tried to organize the code to minimize the amount of changes needed when I learn the correct way to do it (i.e., I'm using malloc/free now to request a single memory block).

The next step is to design how to organize the file data and meta data into the memory region allocated for the file system. This is the same problem as the design of an on-disk file system, although it's simpler: several details can be omitted, such as redundancy or data placement (well, this last point is questionable, since "fragmentation" can cause more cache/page faults than expected). Doing this is a must now, because it is needed to implement vnode operations.

Saturday, July 09, 2005

SoC: Status report 1

During this past week, I've been working a bit on my SoC project; only a bit because I had to prepare the slides for the NetBSD presentation I'm giving tomorrow at Partyzip@. Fortunately, from now on I won't have anything else to do, so I'll be able to devote all my time to tmpfs :-)

So... here is a little status report: I started the week reading the I/O chapter from the Design and Implementation of the 4.4BSD Operating System. It was a really interesting read, as I could understand some common concepts in detail.

After that, I started creating a extremely simple skeleton for my new file system. Before doing that, I was tempted to copy a whole existing file system, such as ptyfs, to later modify its code to suit my needs. But I quickly threw away this idea because this could hide some of the learning process.

So I started with some empty files defining the basic structures for tmpfs: a mount structure and a vnode structure. I set all of its members to useless/NULL functions. After that, I got the kernel to build with the code inside it, but it failed miserably to boot (NULL pointers to functions are not good).

To fix this, I had to define some VFS hooks, such as the vfs_init one. This solved the panics during boot, but obviously, more panics appeared while trying to mount the file system using a simple mount_tmpfs(8) utility. I did the same process: define empty functions for the affected operations rather than using NULL pointers. And I could call mount(2) without a crash.

The next logical step was to fill the mount functions with something useful; I based the code structure on the one I read in ptyfs; however, I still doubt about its correctness (it surely lacks stuff).

Now, I can mount the file system (to the point that it appears as mounted when running mount(8)). However, any sync(2) operation causes a crash, surely because of the lack of other VFS hooks.

The task next week will be to continue this process, filling functions as needed and possibly starting real work on the file system itself (i.e., how to store the files and directories into memory, etc.).

All the above is explained in detail, although with very compact sentences, into the notes I'm taking.

Oh BTW, Monotone's CVS gateway has caused me some trouble, such as pushing only half of the revisions... I had to finish the process manually, and worst of all, I don't know why it happened. Next time, I'll try to do a manual test before the real operation on a local CVS repository (if I can get one)...

Wednesday, July 06, 2005

Monotone's CVS gateway, part 2

After explaining what is the Monotone's CVS gateway, I've been asked to post a little step by step tutorial about it. I'll focus the example towards pkgsrc. Here it goes:

The first step is to create a local database for Monotone and a key for personal use:

$ monotone --db=~/pkgsrc.db db init
$ monotone --db=~/pkgsrc.db genkey user@example.com

Once this is done, we can proceed to import the CVS repository into the database. We can do this in two different ways:

  • Import the files from the repository, thus getting all the history in our local Monotone database. This might be appropriate in some cases, but may take a very long time and may not be worth it. This is done by doing:

    $ monotone --db=~/pkgsrc.db --branch=org.pkgsrc cvs_pull    :ext:anoncvs@anoncvs.NetBSD.org:/cvsroot pkgsrc
  • Import the files from a local working copy. This has the advantage of being a very fast operation, but you won't have old revision history in your database. This can be achieved issuing:

    $ cd /usr/pkgsrc
    $ monotone --db=~/pkgsrc.db --branch=org.pkgsrc cvs_takeover

After any of these two steps, you'll get an org.pkgsrc branch in your local Monotone database. So the next thing to do is to check out a copy of it (just as you'd do in CVS after an import operation):

$ cd
$ monotone --db=~/pkgsrc.db --branch=org.pkgsrc checkout pkgsrc

This will create a working copy under ~/pkgsrc; you can change the name of the directory by changing the last word of the command.

And now it's time to start working on your copy. Edit the files you want and do as many commits as you want (monotone commit). You can examine your changes by viewing the log (monotone log), examining annotated changes (monotone annotate) or any other thing you want. The interface is very similar to that of CVS.

At last, when you are ready to publish your changes to the original CVS repository, you can do so using the cvs_push command, as in:

$ cd ~/pkgsrc
$ monotone --db=~/pkgsrc.db cvs_push

Keep in mind that your Monotone repository is completely independent from the CVS one. Therefore, you may want to periodically repeat the cvs_pull operation to bring new CVS revisions into your Monotone tree, and use cvs_push to send your custom revisions to the CVS repository. I'm not sure about what will happen in case of conflicts, but I'd expect a new head in the branch that you must manually merge with the other one (using monotone merge).

Friday, July 01, 2005

Going on vacation

I'm going on vacation tomorrow and will be back by the end of the month. I'll bring my laptop with me, but I won't have regular Internet access (unless there is an open AP close to the place, that is). This will make my SoC's project development a bit more difficult, as I won't have my workstation (fast build machine) nor a test machine (very useful for kernel development).

I hope I've collected enough information to start working on the project and will continue to devour it in the next days; until now, I've read a bit of the I/O chapter from the 4.4BSD book and it's very, very interesting. I have also configured NetBSD appropriately in the laptop to be able to work in the project easily.

Despite this, I hope to be able to connect to the Internet, at the very least, once per week. When I do this, I'll try to reply to any urgent mail, synchronize my local development tree with the one on NetBSD-SoC (remember Monotone's CVS gateway? ;-) )and post an status report to this blog.

See you soon!