ext3 write barriers and write caching

I was reading on ext4 articles just now and read its advantages compared to ext3, which is the most popular Linux file system today. I read that performance-wise, ext4 is faster at handling large files but does not provide significant improvements over real world tasks. ext4 is also backward-compatible to be mounted as ext3 as long as extents are not used in that particular file system.

Out of the few advantages ext4 has over ext3, there is one feature which is very useful to ensure integrity of the file system, journal checksumming.

Quoting Wikipedia’s article on ext3:

Ext3 does not do checksumming when writing to the journal. If barrier=1 is not enabled as a mount option (in /etc/fstab), and if the hardware is doing out-of-order write caching, one runs the risk of severe filesystem corruption during a crash.

Consider the following scenario: If hard disk writes are done out-of-order (due to modern hard disks caching writes in order to amortize write speeds), it is likely that one will write a commit block of a transaction before the other relevant blocks are written. If a power failure or unrecoverable crash should occur before the other blocks get written, the system will have to be rebooted. Upon reboot, the file system will replay the log as normal, and replay the “winners” (transactions with a commit block, including the invalid transaction above which happened to be tagged with a valid commit block). The unfinished disk write above will thus proceed, but using corrupt journal data. The file system will thus mistakenly overwrite normal data with corrupt data while replaying the journal. There is a test program available to trigger the problematic behavior. If checksums had been used, where the blocks of the “fake winner” transaction were tagged with a mutual checksum, the file system could have known better and not replayed the corrupt data onto the disk. Journal checksumming has been added to EXT4.

The ext3 barrier option is not enabled by default on almost all popular Linux distributions, and thus most distributions are at risk. In addition, filesystems going through the device mapper interface (including software RAID and LVM implementations) may not support barriers, and will issue a warning if that mount option is used. There are also some disks that do not properly implement the write cache flushing extension necessary for barriers to work, which causes a similar warning. In these situations, where barriers are not supported or practical, reliable write ordering is possible by turning off the disk’s write cache and using the data=journal mount option.

ext3 apparently has a feature called write barriers which can help maintain integrity without journal checksumming, but there is a conflicting report over its performance impact.

Quoted from Andreas Dilger in a mail to ext3-users mailing list in May 2007:

Ideally, the jbd layer could be notified when the transaction blocks are flushed from device cache before writing the commit block, but the current linux mechanism to do this (write barriers) sucks perforance-wise (it sent throughput from 180MB/s to 7MB/s when enabled in our test systems). It was better to just turn off write cache entirely than to use barriers.

contradicts this post made by

Well, I think I see where ext3 gets its reputation for slow deletes. With the write cache off the delete performance is terrible, nearly 70% lower. It’s clear that enabling write barriers does something as the numbers are lower on a number of items (though not all). However it’s clear that write barriers is minor loss of performance compared to turning the write cache off. I think this leads me to consider how many servers I can run with just ext3 and md raid1 so as to keep the write cache enabled and the filesystem safe. I’ll have to weigh the performance gains against the benefits of using LVM (especially snapshots) and dm-crypt (which might have limited benefits on a server anyway).

Maybe some improvements had been done at the write barriers feature of ext3 since Andreas Dilger’s post, but I couldn’t find anything on Google about it. I couldn’t even find a website which has proper documentation on ext3’s write barriers. At least now I have specialj’s Bonnie++ test results from 2 months ago and it’s good enough to convince me enabling write barriers on all of my servers which run md raid1.

Thanks, specialj! 🙂


Leave a Reply

Your email address will not be published. Required fields are marked *

Anti-Spam by WP-SpamShield