Category Archives: Linux

GStreamer’s playbin, threads and queueing

I’ve been working on a project that uses GStreamer to play back audio files in an automatically-determined order. My implementation uses a playbin, which is nice and easy to use. I had some issues getting it to continue playback on reaching the end of a file, though.

According to the documentation for the about-to-finish signal,

This signal is emitted when the current uri is about to finish. You can set the uri and suburi to make sure that playback continues.

This signal is emitted from the context of a GStreamer streaming thread.

Because I wanted to avoid blocking a streaming thread under the theory that doing so might interrupt playback (the logic in determining what to play next hits external resources so may take some time), my program simply forwarded that message out to be handled in the application’s main thread by posting a message to the pipeline’s bus.

Now, this approach appeared to work, except it didn’t start playing the next URI, and the pipeline never changed state; it was simply wedged. It turns out that you must assign to the uri property from the same streaming thread that emits the signal, otherwise the assignment doesn’t do anything.

Fortunately, it turns out that blocking that streaming thread while waiting for data isn’t an issue (determined by experiment, by simply blocking the thread for a while before setting the uri).
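As a rough illustration of the approach that ended up working, here is a minimal sketch assuming the PyGObject bindings for GStreamer 1.0. The `choose_next_uri` function and the URI it returns are hypothetical placeholders for the slow lookup described above, not the project's actual code.

```python
import time


def choose_next_uri():
    """Hypothetical stand-in for the slow lookup that decides what plays next."""
    time.sleep(0.1)  # pretend to hit an external resource
    return "file:///tmp/next-track.ogg"  # placeholder URI


def on_about_to_finish(playbin):
    # Runs in a GStreamer streaming thread. The slow lookup happens right
    # here, and the uri property is assigned from this same thread.
    playbin.set_property("uri", choose_next_uri())


def main(first_uri):
    # Imported here so the handler above can be exercised without GStreamer.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import GLib, Gst

    Gst.init(None)
    playbin = Gst.ElementFactory.make("playbin", None)
    playbin.set_property("uri", first_uri)
    playbin.connect("about-to-finish", on_about_to_finish)
    playbin.set_state(Gst.State.PLAYING)
    GLib.MainLoop().run()
```

Calling `main("file:///path/to/first.ogg")` would start playback; the point of the sketch is only that nothing is forwarded to the main thread before the uri is set.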

Chainloading Truecrypt

I recently purchased a new laptop computer (a Lenovo Thinkpad T520), and wanted to configure it to dual-boot between Windows and Linux.  Since this machine is to be used “on the go”, I also wanted to have full encryption of any operating systems on the device. My choices of tools for this are Truecrypt on the Windows side, and dm_crypt with LUKS on Linux. Mainly due to rather troublesome design on the Windows side of this setup, it was not as easy as I might have hoped. I did eventually get it working, however.

Admonishment

Truecrypt was “Discontinued” in 2014, but still works okay. VeraCrypt is substantially a drop-in replacement if you’re looking for a piece of software that is still actively maintained. As of this update (early 2017) the only non-commercial option for an encrypted Windows system booted from UEFI is Windows’ native BitLocker (with which dual-booting is possible but it won’t be possible to read the encrypted Windows partition from Linux), but if you’re booting via legacy BIOS these instructions should still work for TrueCrypt or VeraCrypt.

Windows

Installing Windows on the machine was easy enough, following the usual installation procedure. I created a new partition to install Windows to filling half of the disk, and let it do its thing. Downloading and installing Truecrypt is similarly easy. From there, I simply chose the relevant menu entry to turn on system encryption.

The first snag appeared when the system encryption wizard refused to continue until I had burned an optical disk containing the recovery information (in case the volume headers were to get corrupted). I opted to copy the iso file to another location, with the ability to boot it via grub4dos if necessary in the future (or merely burn a disc as necessary). The solution to this was to re-invoke the volume creation wizard with the noisocheck option:

C:\Program Files\TrueCrypt>"TrueCrypt Format.exe" /noisocheck

One reboot followed, and I was able to let TrueCrypt go through and encrypt the system. It was then time to set up Linux.

Linux

Basic setup of my Linux system was straightforward. Arch (my distribution of choice) offers good support for LUKS encryption of the full system, so most of the installation went smoothly.

On reaching the bootloader installation phase, I let it install and configure syslinux (my loader of choice simply because it is easier to configure than GRUB), but did not install it to the MBR. With the installation complete, I had to do some work to manually back up the MBR installed by Truecrypt, then install a non-default MBR for Syslinux.

First up was backing up the Truecrypt MBR to a file:

# dd if=/dev/sda of=/mnt/boot/tc.bs count=1

That copies the first sector of the disk (512 bytes, containing the MBR and partition table) to a file (tc.bs) on my new /boot partition.

Before installing a Syslinux MBR, I wanted to ensure that chainloading the MBR from a file would work. To that end, I used the installer to chainload to my new installation, and used that to attempt loading Windows. The following incantation (entered manually from the syslinux prompt) eventually worked:

.com32 chain.c32 hd0 1 file=/tc.bs

Pulling that line apart, I use the chainloader to boot the file tc.bs in the base of my /boot partition, and load the first partition on my first hard drive (that is, where Windows is installed). This worked, so I booted once more into the installer to install the Syslinux MBR:

# dd if=/usr/lib/syslinux/mbr.bin of=/dev/sda bs=1 count=440 conv=notrunc

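This copies 440 bytes from the given file to my hard drive, where 440 bytes is the size of the MBR’s boot code area (the remainder of the 512-byte sector holds the disk signature and the partition table, which must be left alone). The input file is already that size so the count parameter should not be necessary, but one cannot be too careful when doing such modification to the MBR.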

Rebooting, that, sadly, did not work. It turns out that the Syslinux MBR merely scans the current hard drive for partitions that are marked bootable, and boots the first one. The Truecrypt MBR does the same thing, which is troublesome– in order for Truecrypt to work the Windows partition must be marked bootable, but Syslinux is unable to find its configuration when this is the case.
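To make the "marked bootable" scan concrete, the following is a small sketch of reading the classic MBR partition table from a saved sector such as tc.bs. It assumes the standard layout (boot code in the first 440 bytes, four 16-byte entries starting at offset 446, signature bytes 0x55 0xAA at the end); the function and constant names are illustrative only.

```python
import struct

BOOT_CODE_SIZE = 440          # bootstrap code area that the dd commands overwrite
PARTITION_TABLE_OFFSET = 446  # four 16-byte entries follow
ENTRY_SIZE = 16
ACTIVE_FLAG = 0x80            # status byte of a partition marked bootable


def active_partitions(sector):
    """Return the 1-based numbers of primary partitions flagged bootable."""
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    active = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        # Status byte, three CHS bytes (skipped), then the partition type.
        status, part_type = struct.unpack_from("<B3xB", sector, start)
        if part_type != 0 and status == ACTIVE_FLAG:
            active.append(index + 1)
    return active
```

Reading the backup with `open("tc.bs", "rb").read()` and passing it to `active_partitions` would show which primary partition a flag-scanning MBR would try to boot.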

Enter altmbr.bin. Syslinux ships several different MBRs, and the alternate one does not scan for bootable partitions. Instead, the last byte of the MBR is set to a value indicating which partition to boot from. Following the example from the Syslinux wiki (linked above), then, I booted once more from my installer and copied the altmbr into position:

# printf '\x5' | cat /usr/lib/syslinux/altmbr.bin - | dd bs=1 count=440 conv=notrunc of=/dev/sda

This shell pipeline echoes a single byte of value 5, appends it to the contents of altmbr.bin, and writes the resulting 440 bytes to the MBR on sda. The 5 comes from the partition Syslinux was installed on, in this case the first logical partition on the disk (/dev/sda5).
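The same byte-level operation can be sketched in Python, which makes the sizes explicit. This assumes altmbr.bin is 439 bytes long, so that the appended partition byte becomes the 440th; the helper names are hypothetical, and the code works on in-memory bytes rather than touching a real disk.

```python
ALTMBR_CODE_SIZE = 439  # assumed size of altmbr.bin; byte 440 selects the partition


def build_altmbr(code, partition):
    """Append the partition-number byte to the altmbr boot code."""
    if len(code) != ALTMBR_CODE_SIZE:
        raise ValueError("expected %d bytes of boot code" % ALTMBR_CODE_SIZE)
    if not 1 <= partition <= 255:
        raise ValueError("partition number must fit in one byte")
    return code + bytes([partition])


def patch_boot_code(sector, boot_code):
    """Return a copy of a 512-byte sector with only its first 440 bytes replaced."""
    if len(sector) != 512 or len(boot_code) != 440:
        raise ValueError("unexpected sizes")
    # Everything from offset 440 on (disk signature, partition table,
    # boot signature) is carried over unchanged.
    return boot_code + sector[440:]
```

Keeping the tail of the sector intact is the in-memory equivalent of `conv=notrunc` with `count=440` in the dd invocation.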

With that, I was able to boot Syslinux properly and it was a simple matter to modify the configuration to boot either Windows or Linux on demand. Selected parts of my syslinux.cfg file follow:

UI menu.c32

LABEL arch
    MENU LABEL Arch Linux
    LINUX /vmlinuz-linux
    APPEND root=/dev/mapper/Homura-root cryptdevice=/dev/sda6:HomuHomu ro
    INITRD /initramfs-linux.img

LABEL windows
    MENU LABEL Windows 7
    COM32 chain.c32
    APPEND hd0 1 file=/tc.bs

Further resources

For all things Syslinux, the documentation wiki offers documentation sufficient for most purposes, although it can be somewhat difficult to navigate. A message from the Syslinux mailing list gave me the key to making Syslinux work from the MBR. The Truecrypt documentation offered some interesting information, but was surprisingly useless in the quest for a successful chainload (indeed, the volume creation wizard very clearly states that using a non-truecrypt MBR is not supported).

High-availability /home revisited

About a month ago, I wrote about my experiments in ways to keep my home directory consistently available. I ended up concluding that DRBD is a neat solution for true high-availability systems, but it’s not really worth the trouble for what I want to do, which is keeping my home directory available and in-sync across several systems.

Considering the problem more, I determined that I really value a simple setup. Specifically, I want something that uses very common software, and is resistant to network failures. My local network going down is an extremely rare occurrence, but it’s possible that my primary workstation will become a portable machine at some point in the future- if that happens, anything that depends on a constant network connection becomes hard to work with.

If an always-online option is out of the question, I can also consider solutions which can handle concurrent modification (which DRBD can do, but requires using OCFS, making that solution a no-go).

Rsync

rsync is many users’ first choice for moving files between computers, and for good reason: it’s efficient and easy to use.  The downside in this case is that rsync tends to be destructive: because the source of a copy operation is taken to be the canonical version, any modifications made in the destination will be wiped out.  I already have regular cron jobs running incremental backups of my entire /home, so the risk of rsync permanently destroying valuable data is low.  However, being forced to recover from backup in case of accidental deletions is a hassle, and increases the danger of actual data loss.

In that light, a dumb rsync from the NAS at boot-time and back to it at shutdown could make sense, but carries undesirable risk.  It would be possible to instruct rsync to never delete files, but the convenience factor is reduced, since any file deletions would have to be done manually after boot-up.  What else is there?

Unison

I eventually decided to just use Unison, another well-known file synchronization utility.  Unison is able to handle non-conflicting changes between destinations as well as intelligently detect which end of a transfer has been modified.  Put simply, it solves the problems of rsync, although there are still situations where it requires manual intervention.  Those are handled with reasonable grace, however, with prompting for which copy to take, or the ability to preserve both and manually resolve the conflict.
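The idea of detecting which end has changed can be illustrated with a toy decision function. This is only a sketch of the general technique (comparing both copies against a record of the last synchronized state), not Unison's actual implementation; the fingerprint values and return strings are made up for the example.

```python
def reconcile(archive, left, right):
    """Decide what to do with one path.

    Each argument is a content fingerprint (for example a hash string), or
    None if the file does not exist in that place. `archive` is the state
    recorded at the end of the previous sync.
    """
    if left == right:
        return "in sync"
    if left == archive:
        return "propagate right to left"   # only the right side changed
    if right == archive:
        return "propagate left to right"   # only the left side changed
    return "conflict"                       # both sides changed, ask the user
```

A deletion is just another change here: `reconcile("abc", "abc", None)` reports that the right-hand deletion should be propagated, which is exactly the case a one-way rsync cannot distinguish from a file that was newly created on the left.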

Knowing Unison can do what I want and with acceptable amounts of automation (mostly only requiring intervention on conflicting changes), it became a simple matter of configuration.  Observing that all the important files in my home directory which are not already covered by some other synchronization scheme (such as configuration files managed with Mercurial) are only in a few subdirectories, I quickly arrived at the following profile:

root = /home/tari
root = /media/Caring/sync/tari

path = incoming
path = pictures
path = projects
path = wallpapers

The function here is fairly obvious: the two sync roots are /home/tari (my home directory) and /media/Caring/sync/tari (the NAS is mounted via NFS at /media/Caring), and only the four listed directories will be synchronized. An easy and robust solution.

I have yet to configure the system for automatic synchronization, but I’ll probably end up simply installing a few scripts to run unison at boot and when shutting down, observing that other copies of the data are unlikely to change while my workstation is active.  Some additional hooks may be desired, but I don’t expect configuration to be difficult.  If it ends up being more complex, I’ll just have to post another update on how I did it.

Update Jan. 30: I ended up adding a line to my rc.local and rc.shutdown scripts that invokes unison:

su tari -c "unison -auto home"

Note that the Unison profile above is stored as ~/.unison/home.prf, so this handles syncing everything I listed above.

Experiments with a high-availability /home

I was recently experimenting with ways to configure my computing setup for high availability of my personal data, which is stored in a Btrfs-formatted partition on my SSD. When my workstation is booted into Windows, however, I want to be able to access my data with minimal effort. Since there’s no way to access a Btrfs volume natively from within Windows, I had to find another approach. It seemed like automatically syncing files out to my NAS was the best solution, since that’s always available and independent of most other things I would be doing at any time.

Candidates

The obvious first option for syncing files to the NAS is the ever-common rsync. It’s great at periodic file transfers, but real-time syncing of modifications is rather beyond the ken of rsync.  lsyncd provides a workable way to keep things reasonably in-sync, but it’s far from an elegant solution.  Were I so motivated, it would be reasonable to devise a similar rsync wrapper using inotify (or similar mechanisms) to only handle modified files and possibly even postpone syncing changes until some change threshold is exceeded.  With existing software, however, rsync is a rather suboptimal solution.

From a cursory scan, cluster filesystems such as ceph or lustre seem like good options for tackling this problem.  The main disadvantage of the cluster filesystem approach, however, is rather high complexity. Most cluster filesystem implementations require a few layers of software, generally both a metadata server and storage server. In large deployments that software stack makes sense, but it’s needless complexity for me.  In addition, ensuring that data is correctly duplicated across both systems at any given time may be a challenge.  I didn’t end up trying this route so ensuring data duplication may be easier than it seems, but a cluster filesystem ultimately seemed like needless complexity for what I wanted to do.

While researching cluster filesystems, I discovered xtreemfs, which has a number of unique features, such as good support for wide-area storage networks, and is capable of operating securely even over the internet.  Downsides of xtreemfs are mostly related to the technology it’s built on, since the filesystem itself is implemented with Linux’s FUSE (Filesystem in USErspace) layer and is implemented in Java.  Both those properties make it rather clunky to work with and configure, so I ended up looking for another solution after a little time spent attempting to build and configure xtreemfs.

The solution I ultimately settled upon was DRBD, which is a block-level replication tool.  Unlike the other approaches, DRBD sits at the block level (rather than the filesystem level), so any desired filesystem can be run on top of it.  This was a major advantage to me, because Btrfs provides a few features that I find important (checksums for data, and copy-on-write snapshotting). Handling block-level syncing is necessarily somewhat more network-intensive than running at the file level, but since I was targeting use over a gigabit LAN, network usage was a peripheral concern.

Implementation

From the perspective of normal operation, a DRBD volume looks like RAID 1 running over a network.  One host is marked as the primary, and any changes to the volume on that host are propagated to the secondary host.  If the primary goes offline for whatever reason, the secondary system can be promoted to the new primary, and the resource stays available. In my planned use of DRBD, the workstation would be the primary in order to achieve normal I/O performance while still replicating changes to the NAS. Upon taking the workstation down for whatever reason (usually booting it into another OS), all changes should be on the NAS, which remains active as a lone secondary.

DRBD doesn’t allow secondary volumes to be used at all (mainly since that would introduce additional concerns to ensure data integrity), so in order to mount the secondary and make it accessible (such as via a Samba share) the first step is to mark the volume as primary. I was initially cautious about how bringing the original primary back online would affect synchronization, but it turned out to handle such a situation gracefully. When the initial primary (workstation) comes back online following promotion of the secondary (NAS), the former primary is demoted back to secondary status, which also ensures that any changes while the workstation was offline are correctly mirrored back. While the two stores are resyncing, it is possible to mark the workstation as primary once more and continue normal operation while the NAS’ modifications sync back.

Given that both my NAS and workstation machines run Arch Linux, setup of DRBD for this scheme was fairly simple. First order of business was to create a volume to base DRBD on. The actual DRBD driver is part of mainline Linux since version 2.6.33, so having the requisite kernel module loaded was easy. The userspace utilities are available in the AUR, so it was easy to get those configured and installed. Finally, I created a resource configuration file as follows:

resource home {
  device /dev/drbd0;
  meta-disk internal;

  protocol A;
  startup {
    become-primary-on Nakamura;
  }

  on Nakamura {
    disk /dev/Nakamura/home;
    address ipv4 192.168.1.10:7789;
  }
  on Nero {
    disk /dev/loop0;
    address ipv4 192.168.1.8:7789;
  }

}

The device option specifies what name the DRBD block device should be created with, and meta-disk internal specifies that the DRBD metadata (which contains such things as the dirty bitmap for syncing modified blocks) should be stored within the backing device, rather than in some external file. The protocol line specifies asynchronous operation (don’t wait for a response from the secondary before returning saying a write is complete), which helps performance but makes the system less robust in the case of a sudden failure. Since my use case is less concerned with robustness and more with simple availability and maintaining performance as much as possible, I opted for the asynchronous protocol. The startup block specifies that Nakamura (the workstation) should be promoted to primary when it comes online.

The two on blocks specify the two hosts of the cluster. Nakamura’s volume is backed by a Linux logical volume (in the volume group ‘Nakamura’), while Nero’s is hosted on a loop device. I chose to use a loop device on Nero simply because the machine has a large amount of storage (6TB in RAID5) but no unallocated space. In using a loop device I ended up ignoring a warning in the DRBD manual about running it over loop block devices causing deadlocks– this ended up being a poor choice, as described later.

It was a fairly simple matter of bringing the volumes online once I had written the configuration. Load the relevant kernel module, and use the userland utilities to set up the backing device. Finally, bring the volume up. Repeat this series of steps again on the other host.

# modprobe drbd
# drbdadm create-md home
# drbdadm up home

With the module loaded and a volume online, status information is visible in /proc/drbd, looking something like the following (shamelessly taken from the DRBD manual):

$ cat /proc/drbd
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by buildsystem@linbit, 2008-12-18 16:02:26
 0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r---
    ns:0 nr:8 dw:8 dr:0 al:0 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

The first few lines provide version information, and the two lines starting at ‘0:’ describe the state of a DRBD volume. Of the rest of the information, we can see that both hosts are online and communicating (Connected), both are currently marked as secondaries (Secondary/Secondary), and both have the latest version of all data (UpToDate/UpToDate). The last step in creating the volume is to mark one host as primary. Since this is a newly-created volume, marking one host as primary requires invalidation of the other, prompting resynchronization of the entire device. I execute drbdadm primary --force home on Nakamura to mark that host as having the canonical version of the data, and the devices begin to synchronize.
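For scripting around this status output (for instance, to check roles before promoting a host), the fields can be pulled out with a short parser. This sketch assumes the DRBD 8.3-style /proc/drbd format shown above; the function name and the shape of the returned dictionary are illustrative choices.

```python
import re

STATUS_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_drbd_status(text):
    """Map each DRBD minor number to its connection, role and disk states."""
    resources = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line)
        if match is None:
            continue  # version lines and counter lines are ignored
        local_role, _, peer_role = match["ro"].partition("/")
        local_disk, _, peer_disk = match["ds"].partition("/")
        resources[int(match["minor"])] = {
            "connection": match["cs"],
            "roles": (local_role, peer_role),
            "disks": (local_disk, peer_disk),
        }
    return resources
```

In each pair the first element describes the local host and the second the peer, matching the order in which the status line prints them.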

Once everything is set, it becomes possible to use the DRBD block device (/dev/drbd0 in my configuration) like any other block device- create filesystems, mount it, or write random data to it. With a little work to invoke the DRBD initscripts at boot time, I was able to get everything working as expected. There were a few small issues with the setup, though:

  • Nero (the NAS) required manual intervention to be promoted to the primary role. This could be improved by adding some sort of hooks on access to promote it to primary and mount the volume. This could probably be implemented with autofs for a truly transparent function, or even a simple web page hosted by the NAS which prompts promotion when it is visited.
  • Deadlocks! I mentioned earlier that I chose to ignore the warning in the manual about deadlocks when running DRBD on top of loop devices, and I did start seeing some on Nero. All I/O on the volume hosting the loop device on Nero would stall, and the only way out was by rebooting the machine.

Conclusion

DRBD works for keeping data in sync between two machines in a transparent fashion, at the cost of a few more software requirements and a slight performance hit. The kernelspace tools are in mainline Linux so should be available in any reasonably recent kernel, but availability of the userspace utilities is questionable. Fortunately, building them for oneself is fairly easy. Provided the drbd module is loaded, it is not necessary to use the userspace utilities to bring the volume online- the backing block device can be mounted without DRBD, but the secondary device will need to be manually invalidated upon reconnect. That’s useful for ensuring that it’s difficult for data to be rendered inaccessible, since the userspace utilities are not strictly needed to get at the data.

I ultimately didn’t continue running this scheme for long, mainly due to the deadlock issues I had on the NAS, which could have been resolved with some time spent reorganizing the storage on that host. I decided that wasn’t worth the effort, however. To achieve a similar effect, I ended up configuring a virtual machine on my Windows installation that has direct access to the disks which have Linux-hosted data, so I can boot the physical Linux installation in a virtual machine. By modifying the initscripts a little, I configured it to start Samba at boot time when running virtualized in order to give access to the data. The virtualized solution is a bit more of a hack than DRBD and is somewhat less robust (in case of unexpected shutdown, this makes two operating systems coming down hard), but I think the relative simplicity and absence of a network tether are a reasonable compromise.

Were I to go back to a DRBD-backed solution at some time, I might want to look into using DRBD in dual-primary mode. In most applications only a single primary can be used since most filesystems are designed without the locking required to allow multiple drivers to operate on them at the same time (this is why NFS and similar network filesystems require lock managers). Using a shared-disk filesystem such as OCFS (or OCFS2), DRBD is capable of having both hosts in primary mode, so the filesystem can be mounted and modified on both hosts at once. Using dual primaries would simplify the promotion scheme (each host must simply be promoted to primary when it comes online), but would also require care to avoid split-brain situations (in which communications are lost but both hosts are still online and processing I/O requests, so they desync and require manual intervention to resolve conflicts). I didn’t try OCFS2 at all during this experiment mainly because I didn’t want to stop using btrfs as my primary filesystem.

To conclude, DRBD works for what I wanted to do, but deadlocks while running it on a loop device kept me from using it for long. The virtual machine-based version of this scheme performs well enough for my needs, despite being rather clunky to work with. I will keep DRBD in mind for similar uses in the future, though, and may revisit the issue at a later date when my network layout changes.

Update 26.1.2012: I’ve revisited this concept in a simpler (and less automatic) fashion.

How not to distribute software

I recently acquired a TI eZ430-Chronos watch/development platform. It’s a pretty fancy piece of kit just running the stock firmware, but I got it with hacking in mind, so of course that’s what I set out to do. Little did I know that TI’s packaging of some of the related tools is a good lesson in what not to do when packaging software for users of any system that isn’t Windows.

The first thing to do when working with a new platform is usually to try out the sample applications, and indeed in this case I did exactly that. TI helpfully provide a distribution of the PC-side software for communicating with the Chronos that runs on Linux, but things cannot be that easy. What follows is a loose transcript of my session to get slac388a unpacked so I could look at the provided code.

$ unzip slac388a.zip
$ ls
Chronos-Setup
$ chmod +x Chronos-Setup
$ ./Chronos-Setup
$

Oh, it did nothing. Maybe it segfaulted silently because it’s poorly written?

$ dmesg | tail
[snip]
[2591.111811] [drm] force priority to high
[2591.111811] [drm] force priority to high
$ file Chronos-Setup
Chronos-Setup: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), statically linked, stripped
$ gdb Chronos-Setup
GNU gdb (GDB) 7.3
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/tari/workspace/chronos-tests/Chronos-Setup...
warning: no loadable sections found in added symbol-file /home/tari/workspace/chronos-tests/Chronos-Setup
(no debugging symbols found)...done.
(gdb) r
Starting program: /home/tari/workspace/chronos-tests/Chronos-Setup
[Inferior 1 (process 9214) exited with code 0177]

Great. It runs and exits with code 127. How useful.</sarcasm>
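Two details in that transcript are easy to misread: gdb prints the exit code in octal, and the word size reported by file comes from a single byte of the ELF header. A small sketch of both checks follows; the function names are made up for illustration, and only the first five bytes of the file are inspected.

```python
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}  # values of the EI_CLASS byte


def elf_class(header):
    """Report the word size recorded in the first bytes of an ELF file."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    return ELF_CLASSES.get(header[4], "unknown")


def exit_status_from_gdb(code):
    """Convert the octal exit code gdb prints (such as '0177') to decimal."""
    return int(code, 8)
```

Reading the first few bytes of Chronos-Setup with `open(path, "rb").read(5)` and passing them to `elf_class` gives the same 32-bit answer as file does.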

[Image caption: A Windows-style installer, "InstallJammer Wizard". On Linux. This is stupid.]

I moved the program over to a 32-bit system, and of course it worked fine, although that revealed a stunningly brain-dead design decision. The image (to the right) says everything.

To recap, this was a Windows-style self-extracting installer packed in a zip archive upon initial download, designed to run on a 32-bit Linux system, which failed silently when run on a 64-bit system. I am simply stunned by the bad design.

Bonus tidbit: it unpacked an uninstaller in the directory of source code and compiled demo applications, as if whoever packaged it decided the users (remember, this is an embedded development demo board so it’s logical to assume the users are fairly tech-savvy) were too clueless to delete a single directory when the contents were no longer wanted. I think the only possible reaction is a hearty :facepalm:.

Pointless Linux Hacks

I nearly always find it interesting to muck about in someone else’s code, often to add simple features or to make it do something silly, and the Linux kernel is no exception to that. What follows is my own first adventure into patching Linux to do my evil bidding.

Aside from mucking about in code for fun, digging through public source code such as that provided by Linux can be very useful when developing something new.

A short story

I was doing nothing of particular importance yesterday afternoon when I was booting up my previously mentioned netbook. The machine usually runs on a straight framebuffer powered by KMS on i915 hardware, and my kernel is configured to show the famous Tux logo while booting.

Readers familiar with the logo behaviour might already see where I’m going with this: the kernel typically displays one copy of the logo for each processor in the system (so a uniprocessor machine shows one Tux, a quad-core shows four, etc.). As a bit of a joke, a friend suggested, why not patch my kernel to make it look like a much more powerful machine than it really is? Of course, that’s exactly what I did, and here’s the patch for Linux 2.6.38.

--- drivers/video/fbmem.c.orig	2011-04-14 07:26:34.865849376 -0400
+++ drivers/video/fbmem.c	2011-04-13 13:06:28.706011678 -0400
@@ -635,7 +635,7 @@
 	int y;

 	y = fb_show_logo_line(info, rotate, fb_logo.logo, 0,
-			      num_online_cpus());
+			      4 * num_online_cpus());
 	y = fb_show_extra_logos(info, y, rotate);

 	return y;

Quite simply, my netbook now pretends to have an eight-core processor (the Atom with SMT reports two logical cores) as far as the visual indications go while booting up.

Source-diving

Thus we come to source-diving, a term I’ve borrowed from the community of Nethack players to describe the process of searching for the location of a particular piece of code in some larger project.

Diving in someone else’s source is frequently useful, although I don’t have any specific examples of it in my own work at the moment. For an outside example, have a look at musca, a tiling window manager for X that was written from scratch but used ratpoison and dwm (two other X window managers) as models:

Musca’s code is actually written from scratch, but a lot of useful stuff was gleaned from reading the source code of those two excellent projects.

A personal recommendation for anyone seeking to go source-diving: become good friends with grep. In the case of my patch above, the process went something like this:

  • grep -R LOGO_LINUX linux-2.6.38/ to find all references to LOGO_LINUX in the source tree.
  • Examine the related files, find drivers/video/fbmem.c, which contains the logo display code.
  • Find the part which controls the number of logos to display by searching that file for ‘cpu’, assuming (correctly) that it must call some outside function to get the number of CPUs active in the system.
  • Patch line 638 (for great justice).
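For anyone without grep at hand, the first step of that process can be approximated in a few lines of Python. This is just a sketch of a recursive substring search; it is plain substring matching rather than regular expressions, and the function name is arbitrary.

```python
import os


def find_references(root, needle):
    """Yield (path, line number, line) for each line under root containing needle."""
    for directory, _, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(directory, name)
            try:
                with open(path, errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it and keep searching
```

Something like `for hit in find_references("linux-2.6.38", "LOGO_LINUX"): print(*hit)` would list candidate files to examine, in the same spirit as the grep -R step.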

Next up in my source-diving adventures will be finding the code which controls what happens when the user presses control+alt+delete, in anticipation of sometime rewriting fb-hitler into a standalone kernel rather than a program running on top of Linux.

Btrfs

I recently converted the root filesystem on my netbook, a now rather old Acer Aspire One with an incredibly slow 1.8″ Flash SSD, from the ext3 I had been using for quite a while to the shiny new btrfs, which becomes more stable every time the Linux kernel gets updated. As I don’t keep any data of particular importance on there, I had no problem with running an experimental filesystem on it.

Not only was the conversion relatively painless, but the system now performs better than it ever did with ext3/4.

Conversion

Btrfs supports a nearly painless conversion from ext2/3/4 due to its flexible design. Because btrfs has almost no fixed locations for metadata on the disc, it is actually possible to allocate btrfs metadata inside the free space in an ext filesystem. Given that, all that’s required to convert a filesystem is to run btrfs-convert on it- the only requirement is that the filesystem not be mounted.

As the test subject of this experiment was just my netbook, this was easy, since I keep a rather simple partition layout on that machine. In fact, before the conversion, I had a single 8GB ext4 partition on the system’s rather pathetic SSD, and that was the extent of available storage. After backing up the contents of my home directory to another machine, I proceeded to decimate the contents of my home directory and drop the amount of storage in-use from about 6GB to more like 3GB, a healthy gain.

Linux kernel

To run a system on Btrfs, there must, of course, be support for it in the kernel. Because I customarily build my own kernels on my netbook, it was a simple matter of enabling Btrfs support and rebuilding my kernel image. Most distribution kernels probably won’t have such support enabled since the filesystem is still under rather heavy development, so it was fortunate that my setup made it so easy.

GRUB

The system under consideration runs GRUB 2, currently version 1.97, which has no native btrfs support. That’s a problem, as I was hoping to only have a single partition. With a little research, it was easy to find that no version of GRUB currently supports booting from btrfs, although there is an experimental patchset which provides basic btrfs support in a module. Unfortunately, to load a module, GRUB needs to be able to read the partition in which the module resides. If my /boot is on btrfs, that’s a bit troublesome. Thus, the only option is for me to create a separate partition for /boot, containing GRUB’s files and my Linux kernel image to boot, formatted with some other file system. The obvious choice was the tried-and-true ext3.

This presents a small problem, in that I need to resize my existing root partition to make room on the disc for a small /boot partition. Easily remedied, however, with application of the Ultimate Boot CD, which includes the wonderful Parted Magic. GParted, included in Parted Magic, made short work of resizing the existing partition and its filesystem, as well as moving that partition to the end of the disc, which eventually left me with a shiny new ext3 partition filling the first 64MB of the disc.

Repartitioning

After creating my new /boot partition, it was a simple matter of copying the contents of /boot on the old partition to the new one, adjusting the fstab, and changing my kernel command line in the GRUB config file to mount /dev/sda2 as root rather than sda1.

Move the contents of /boot:

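$ mkdir -p /mnt/boot          # temporary mount point for the new partition
$ mount /dev/sda1 /mnt/boot
$ cp -a /boot/. /mnt/boot/    # copy the contents, not the directory itself
$ rm -rf /boot/*              # keep the empty /boot directory as a mount point
$ umount /mnt/boot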

Updated fstab:

/dev/sda1       /boot   ext3    defaults    0 1
/dev/sda2       /       btrfs   defaults    0 1
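The kernel command line change mentioned above (root on /dev/sda2 rather than sda1) was a manual edit, but it can be sketched as a small sed wrapper. The function name is hypothetical, and the config path in the example assumes GRUB 2's usual /boot/grub/grub.cfg; a .bak copy is kept in case the edit goes wrong.

```shell
# Point every root= kernel argument at a new device, keeping a .bak copy.
set_kernel_root() {
    cfg="$1"; old="$2"; new="$3"
    cp "$cfg" "$cfg.bak" || return 1
    # Two expressions so that root=/dev/sda1 does not also match /dev/sda10:
    # one for an argument followed by a space, one for end of line.
    sed -e "s|root=$old |root=$new |g" \
        -e "s|root=$old\$|root=$new|" "$cfg.bak" > "$cfg"
}

# Example (assumed config location):
#   set_kernel_root /boot/grub/grub.cfg /dev/sda1 /dev/sda2
```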

Finishing up

Finally, it was time to actually run btrfs-convert. I booted the system into the Arch Linux installer (mostly an arbitrary choice, since I had that image lying around) and installed the btrfs utilities package (btrfs-progs-unstable) in the live environment. Then it was a simple matter of running btrfs-convert on /dev/sda2 and waiting about 15 minutes, during which time the disc was being hit pretty hard. Finally, a reboot.

...following which the system failed to come back up, with GRUB complaining loudly about being unable to find its files. I booted the system from the Arch installer once again and ran grub-install on sda1 in order to reconfigure GRUB to handle the changed disc layout. With another reboot, everything was fine.

With my new file system in place, I took some time to tweak the mount options for the new partition. Btrfs is able to tune itself for solid-state devices, and will set those options automatically. From the Btrfs FAQ:

There are some optimizations for SSD drives, and you can enable them by mounting with -o ssd. As of 2.6.31-rc1, this mount option will be enabled if Btrfs is able to detect non-rotating storage.
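To see what the kernel itself thinks of a device, the rotational flag in sysfs can be read directly. I'm assuming here that this flag reflects the same information the automatic detection relies on; the function name and the `SYSFS_ROOT` override are illustrative only.

```shell
# Print "ssd" if the kernel reports the block device as non-rotating,
# "rotational" if it does not, and "unknown" if the flag cannot be read.
# SYSFS_ROOT is overridable only so the function can be tried without
# real hardware.
storage_kind() {
    dev="$1"
    flag="${SYSFS_ROOT:-/sys}/block/$dev/queue/rotational"
    [ -r "$flag" ] || { echo unknown; return 1; }
    if [ "$(cat "$flag")" = "0" ]; then
        echo ssd
    else
        echo rotational
    fi
}

# Example: storage_kind sda
```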

However, there’s also an ssd_spread option:

Mount -o ssd_spread is more strict about finding a large unused region of the disk for new allocations, which tends to fragment the free space more over time. Mount -o ssd_spread is often faster on the less expensive SSD devices

That sounds exactly like my situation: a less expensive SSD device which is very slow when doing extensive writes to ext3/4. In addition to ssd_spread, I turned on the noatime option for the filesystem, which cuts down on writes at the expense of not recording access times for files and directories on the file system. As I’m seldom, if ever, concerned with access times, and especially so on my netbook, I lose nothing from such a change and gain (hopefully) increased performance.

Thus, my final (optimized) fstab line for the root filesystem:

/dev/sda2       /       btrfs   defaults,noatime,ssd_spread    0
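After rebooting with a line like this, it's worth confirming that the options actually took effect. The sketch below looks up a mount point in /proc/mounts and tests for an exact option name; the function name and the optional third argument (an alternative mounts table) are my own invention.

```shell
# Succeed if mount point $1 lists option $2 in the given mounts table
# (defaults to /proc/mounts). If the mount point appears more than once,
# the last entry wins.
has_mount_option() {
    mnt="$1"; opt="$2"; table="${3:-/proc/mounts}"
    awk -v m="$mnt" -v o="$opt" '
        $2 == m { n = split($4, opts, ","); found = 0
                  for (i = 1; i <= n; i++) if (opts[i] == o) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$table"
}

# Example:
#   has_mount_option / ssd_spread && echo "ssd_spread is active"
```

To try the options without a reboot, something like mount -o remount,noatime,ssd_spread / should also work.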

Results

After running with the new setup for about a week and working on normal tasks with it, I can safely say that on my AA1, Btrfs with ssd_spread is significantly more responsive than ext4 ever was. Under ext4, while running Firefox, for example, the system would sometimes stop responding to input while hitting the disc fairly hard.

With Btrfs, I no longer have any such problem; everything remains responsive even under fairly high I/O load (such as while Firefox is downloading data from Firefox Sync, or when I’m applying updates).