Linux Source-based Routing
7 12 2008 Comments : No Comments »Categories : Linux, Networking, Technical
I was reading on ext4 articles just now and read its advantages compared to ext3, which is the most popular Linux file system today. I read that performance-wise, ext4 is faster at handling large files but does not provide significant improvements over real world tasks. ext4 is also backward-compatible to be mounted as ext3 as long as extents are not used in that particular file system.
Out of the few advantages ext4 has over ext3, there is one feature which is very useful to ensure integrity of the file system, journal checksumming.
Quoting Wikipedia’s article on ext3:
Ext3 does not do checksumming when writing to the journal. If barrier=1 is not enabled as a mount option (in /etc/fstab), and if the hardware is doing out-of-order write caching, one runs the risk of severe filesystem corruption during a crash.
Consider the following scenario: If hard disk writes are done out-of-order (due to modern hard disks caching writes in order to amortize write speeds), it is likely that one will write a commit block of a transaction before the other relevant blocks are written. If a power failure or unrecoverable crash should occur before the other blocks get written, the system will have to be rebooted. Upon reboot, the file system will replay the log as normal, and replay the “winners” (transactions with a commit block, including the invalid transaction above which happened to be tagged with a valid commit block). The unfinished disk write above will thus proceed, but using corrupt journal data. The file system will thus mistakenly overwrite normal data with corrupt data while replaying the journal. There is a test program available to trigger the problematic behavior. If checksums had been used, where the blocks of the “fake winner” transaction were tagged with a mutual checksum, the file system could have known better and not replayed the corrupt data onto the disk. Journal checksumming has been added to EXT4.
The ext3 barrier option is not enabled by default on almost all popular Linux distributions, and thus most distributions are at risk. In addition, filesystems going through the device mapper interface (including software RAID and LVM implementations) may not support barriers, and will issue a warning if that mount option is used. There are also some disks that do not properly implement the write cache flushing extension necessary for barriers to work, which causes a similar warning. In these situations, where barriers are not supported or practical, reliable write ordering is possible by turning off the disk’s write cache and using the data=journal mount option.
ext3 apparently has a feature called write barriers which can help maintain integrity without journal checksumming, but there is a conflicting report over its performance impact.
Quoted from Andreas Dilger in a mail to ext3-users mailing list in May 2007:
Ideally, the jbd layer could be notified when the transaction blocks are flushed from device cache before writing the commit block, but the current linux mechanism to do this (write barriers) sucks perforance-wise (it sent throughput from 180MB/s to 7MB/s when enabled in our test systems). It was better to just turn off write cache entirely than to use barriers.
contradicts this post made by specialj in October 2008:
Well, I think I see where ext3 gets its reputation for slow deletes. With the write cache off the delete performance is terrible, nearly 70% lower. It’s clear that enabling write barriers does something as the numbers are lower on a number of items (though not all). However it’s clear that write barriers is minor loss of performance compared to turning the write cache off. I think this leads me to consider how many servers I can run with just ext3 and md raid1 so as to keep the write cache enabled and the filesystem safe. I’ll have to weigh the performance gains against the benefits of using LVM (especially snapshots) and dm-crypt (which might have limited benefits on a server anyway).
Maybe some improvements had been done at the write barriers feature of ext3 since Andreas Dilger’s post, but I couldn’t find anything on Google about it. I couldn’t even find a website which has proper documentation on ext3’s write barriers. At least now I have specialj’s Bonnie++ test results from 2 months ago and it’s good enough to convince me enabling write barriers on all of my servers which run md raid1.
Thanks, specialj!
References:
http://hightechsorcery.com/2008/10/evaluating-performance-ext3-using-write-barriers-and-write-caching
http://hightechsorcery.com/2008/06/linux-write-barriers-write-caching-lvm-and-filesystems
http://archives.free.net.ph/message/20070518.190346.7a4c0f9f.en.html
Today I was installing CentOS 5.2 to a Tyan barebone server with Intel Xeon X3220 processor, 2GB of RAM, and 2 Western Digital SATA II 250GB HDD. I chose to use software RAID 1 for my /boot and /. I didn’t create RAID 1 for swap partition because it is useless and in fact it might slow down performance. It took more than 2 hours to complete the whole installation process even when I used bare minimum packages selection. The format process of / partition itself took quite a while to complete.
Once the installation was complete, it rebooted and I found a painfully slow newly-installed CentOS system. The keyboard response was slow as if I were connected to a remote server with 1 second latency. I wasn’t happy because this is a quad-core processor server with acceptable amount of RAM. When I looked at /proc/mdstat to see how the software RAID is doing, I noticed that the sync speed is only around 1000KB/s. I tried looking for solutions on Google and found this and this. The solution I got didn’t help increase the sync speed, nor the slow response I was getting typing on the console or remotely.
I then somehow managed to find another possible solution. I can’t remember how I finally got this solution (thanks to nixCraft/CyberCiti). I didn’t notice that the hard drives were actually detected as hdX (IDE/PATA) instead of sdX (SATA). Now this makes sense because PATA is much slower than SATA. It turned out to be an issue with the hardware detection process. Somehow CentOS detected the HDDs and used PATA driver instead of SATA driver, so the devices were named hdX and treated as PATA hence the slow sync speed. I made the BIOS changes as mentioned in the solution and attempted another install, it was REALLY fast this time and the software RAID 1 sync speed was over 70000KB/s.
References:
http://www.centos.org/modules/newbb/viewtopic.php?topic_id=16178
http://lists.centos.org/pipermail/centos/2005-February/002068.html
http://www.ducea.com/2006/06/25/increase-the-speed-of-linux-software-raid-reconstruction/
http://www.cyberciti.biz/faq/linux-sata-drive-displayed-as-devhda/
http://www.nodeofcrash.com/?p=57
Recent Comments