SATA drives, benchmarks, booting from USB

Published on Monday, 27 February 2006 14:00
Written by Jeff Layton

Wisdom of the ages ...

The Beowulf mailing list provides detailed discussions about issues concerning Linux HPC clusters. In this column I report on Serial ATA (SATA) drives, I/O benchmarks, cluster benchmarks, and booting from solid-state USB drives. You can consult the Beowulf archives for the actual conversations.

Beowulf: SATA or SCSI drives - Multiple Read/Write Speeds

There was an interesting discussion on the Beowulf mailing list that started on December 8, 2003 with a posting from Robin Laing, who asked about SATA (Serial ATA) drives versus IDE drives (also called Parallel ATA or PATA drives) versus SCSI drives. In particular, he wanted to know which one was better for multiple drive read and write operations. While the resulting discussion wasn't about clusters per se, disk I/O (Input/Output) performance can have a great impact on many cluster applications. Bill Broadley responded that there was some bad information and bias floating around about drive performance (e.g., IDE versus SCSI) and strongly suggested benchmarking your own code or a disk benchmark, such as Bonnie++ or Postmark, that is close to your application. He pointed out that there are many factors that can be adjusted to affect I/O performance. The discussion then broke into two parts. The first part discussed opinions and test results for PATA and SCSI drives and controllers.
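Bill's advice to "benchmark it yourself" can be followed even without a full suite. The sketch below is a minimal sequential read/write timing in Python; the file name and size are illustrative assumptions, and it is no substitute for Bonnie++ or Postmark, which exercise far more access patterns (in real tests, use a file larger than RAM so the page cache doesn't mask the drive):

```python
import os
import time

FILE_NAME = "testfile.bin"   # assumed scratch file on the disk under test
SIZE_MB = 64                 # kept small here; use a file larger than RAM for real tests
CHUNK = 1024 * 1024          # 1 MB per write

def write_test():
    """Sequentially write SIZE_MB megabytes and return throughput in MB/s."""
    buf = b"\0" * CHUNK
    start = time.time()
    with open(FILE_NAME, "wb") as f:
        for _ in range(SIZE_MB):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # force data to the drive so the timing is honest
    return SIZE_MB / (time.time() - start)

def read_test():
    """Sequentially read the file back and return throughput in MB/s."""
    start = time.time()
    with open(FILE_NAME, "rb") as f:
        while f.read(CHUNK):
            pass
    return SIZE_MB / (time.time() - start)

if __name__ == "__main__":
    print("write: %.1f MB/s" % write_test())
    print("read:  %.1f MB/s" % read_test())
    os.remove(FILE_NAME)
```

Note the `fsync` in the write test: without it, the kernel may report the write as complete while the data is still sitting in the page cache, which is exactly the kind of factor Bill warned can skew naive comparisons.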

The ever-present Mark Hahn followed up with some general rules of thumb for different types of drives and pointed out that Jeff Garzik was writing all-new SATA drivers for the 2.6 kernel that should greatly improve the performance of SATA drives under Linux. Robert Brown asked Mark his general opinion about SATA. Mark answered Robert's questions and pointed out that his next server will have SATA controllers in it (hint, hint). David Lombard followed up that he personally liked SCSI drives because in his experience he could get much higher I/O rates regardless of the CPU load (PATA drives involve more CPU usage than SCSI drives). David mentioned that he had seen an I/O rate of 290 MB/second on x86 systems using SCSI drives. Bill Broadley followed up with a large number of questions and offered that he has seen PATA drive arrays reach speeds of 300-400 MB/second even under fairly high CPU load. David and Bill discussed some technical details, including the fact that these tests were done using multiple controllers and running RAID-0 (striping). Bill finished with the comment that he saw greater I/O rates using XFS as opposed to ext3.

To reinforce Bill's comments, there was a recent posting on the Linux-IDE-Arrays mailing list from Dan Yocum with some test results for a SATA drive array that uses three 3ware 8506-8 SATA RAID cards. The cards were configured with hardware RAID-5, then striped with software RAID-0 across the three cards. Using Bonnie++ on a 125 GB file with 64 KB chunks, Dan was able to achieve about 230 MB/sec for block writes and 520 MB/sec for block reads.


The second branch of the discussion dealt with some observations about the Linux kernel in relation to I/O performance. Robin Laing responded to the initial discussion by stating that his application used one or two files that were much larger than memory while his code was running. He noticed that his machine 'stutters' for a few seconds every time there is a disk access. Mark Hahn responded that he thought the 'stutter' was not a drive problem but rather a memory-management problem within the kernel. He offered the observation that Linux seemed to over-cache and can reach a point where it is running scavenging scans (looking for cached data it is unlikely to need so that it can be dropped, only to re-cache the data later). Robert Brown followed this with some comments, which Mark had sent him off-list, that too much memory seems to confuse the caching system of some kernels. Robert also mentioned that when he used a 1.8 GHz P4 system as a server he also saw some 'mini-delays.' When he took the exact same drives and put them in a 400 MHz Celeron system he got better performance.

This discussion was very useful in pointing out the need to test the entire system, from the exact hard drives, to the RAID configuration to be used (if one is to be used), to the exact kernel and kernel configuration, to determine the I/O performance. Using multiple controllers (whether SCSI or PATA or SATA), RAID-0, and XFS seems to provide the best I/O performance. However, you need to pay attention to the kernel and kernel configuration to extract the best performance possible.

Beowulf: New Cluster Benchmark

Bill Broadley posted to the Beowulf mailing list on November 17, 2003 about a better benchmark for clusters than the Top500 benchmark. This post grew out of a discussion about Virginia Tech's fantastic Top500 result on their new Apple G5 cluster. Bill was interested in the performance of larger clusters since they are starting to dominate the Top500 benchmark. In particular, he thought the big difficulty for larger clusters was scaling, which is usually an interconnect issue. Jakob Oestergaard responded that he thought the Top500 was a fine benchmark for what it was. But it's definitely not a benchmark that measures the true power of a cluster for one's particular application. He thought that developing a series of benchmarks to quantify a cluster's performance would render the benchmarks useless (he was also the first person in this conversation to use the famous paraphrase, "There are lies, damn lies, and statistics..."). Robert Brown joined in, stating that the one true benchmark was one's application. Robert brought up the point that he thought microbenchmarks (a microbenchmark tests only a single, small aspect of a system or a cluster) were more appropriate for benchmarking machines. He suggested something like Larry McVoy's lmbench benchmark suite. Moreover, he thought Larry's insistence that lmbench results could only be published if all of the results are published was a very good idea (chip companies are notorious for only publishing certain benchmark results that make their chips look good). He then stated that in his opinion he would like to see a full suite of microbenchmarks to test core functions "that are building blocks of real programs." These would include some microbenchmarks to test clusters. He finally finished with a typical Brownian comment that the Top500 benchmark was really intended to measure the size of one's, umm, cluster and nothing else.
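To make the microbenchmark idea concrete, here is a toy example in the spirit of lmbench (but not part of it): time one tiny operation in isolation so its cost isn't buried inside a larger workload. The operations chosen here (a trivial system call and a small memory copy) and the iteration count are illustrative assumptions:

```python
import os
import time

def time_per_call(func, iterations=100_000):
    """Return the average wall-clock time of one call to func, in seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    return (time.perf_counter() - start) / iterations

# Latency of a near-trivial system call, lmbench's "null syscall" idea.
syscall_latency = time_per_call(os.getpid)

# Cost of copying a small (4 KB) buffer in memory.
src = bytearray(4096)
copy_latency = time_per_call(lambda: bytes(src))

print("getpid:    %.0f ns/call" % (syscall_latency * 1e9))
print("4 KB copy: %.0f ns/call" % (copy_latency * 1e9))
```

Each number by itself says little about application performance, which is exactly Robert's point: microbenchmarks characterize the building blocks, and it is up to you to know which blocks your application actually stresses.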

John Hearns pointed out that the old Paralogic website has a link (since moved to here) to a set of tools called the Beowulf Performance Suite (BPS). Robert Brown followed up that Doug Eadline (previous editor of ClusterWorld Magazine and now head monkey at Clustermonkey.net) had done a good job putting together BPS and perhaps in the future a gathering of cluster experts could extend it and define a good and useful series of cluster benchmarks. Doug replied that BPS is called a Performance Suite, not a Benchmark Suite because it should be used to generate a baseline to measure changes (good or bad) to the cluster. Felix Rauch also chimed in with some very good comments about measuring network performance in clusters. Robert Brown really liked Felix's comments and went on to talk about a network microbenchmark that would watch the performance of the system and switch algorithms at the appropriate time to improve performance.
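In the spirit of Felix's comments on measuring network performance, here is a minimal point-to-point TCP bandwidth test. It runs over loopback purely for illustration; on a cluster you would run the sink on one node and the sender on another. The transfer size and chunk size are arbitrary choices, and a real tool would also sweep message sizes and measure latency separately:

```python
import socket
import threading
import time

TOTAL_MB = 32          # total data to push through the socket
CHUNK = 64 * 1024      # 64 KB per send

def sink(server_sock):
    """Accept one connection and drain it until the sender closes."""
    conn, _ = server_sock.accept()
    while conn.recv(CHUNK):
        pass
    conn.close()

def measure():
    """Return achieved bandwidth in MB/s over a local TCP connection."""
    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 0))          # let the OS pick a free port
    port = server.getsockname()[1]
    server.listen(1)
    t = threading.Thread(target=sink, args=(server,))
    t.start()

    client = socket.socket()
    client.connect(("127.0.0.1", port))
    buf = b"\0" * CHUNK
    start = time.time()
    for _ in range(TOTAL_MB * 1024 * 1024 // CHUNK):
        client.sendall(buf)
    client.close()
    t.join()
    server.close()
    return TOTAL_MB / (time.time() - start)

if __name__ == "__main__":
    print("bandwidth: %.0f MB/s" % measure())
```

A single number like this is a baseline in Doug's sense: rerun it after a driver, switch, or kernel change and you learn whether the change helped or hurt, without claiming anything about application performance.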

The Top500 benchmark is a simple benchmark with a long history. It has provided useful information about the general trend in high performance computing, that is, the increasing dominance of clusters. However, using it to say my cluster is faster than yours is a bit like using the heights of basketball players to indicate how good they are. The heights are not an accurate indication of how good a team is, and the Top500 is not a measure of how useful a cluster is (although it is fun to play with). The discussion was very useful in providing good suggestions about how benchmarking for clusters should proceed.


Beowulf: Booting from USB Pen Drive

While a bit old, there was an interesting discussion on the Beowulf mailing list that started with a posting by p.pennaz on 21 November, 2003, asking about booting a Linux system via a USB cartridge (USB solid-state storage device). USB storage, or any solid-state storage for that matter, is very interesting because there are no moving parts, and if the power goes out you don't lose your data. There was an immediate response that you should be able to boot from a USB storage device if your motherboard has a BIOS option to support it. Mark Hahn provided some simple ideas about what it would take to boot from a USB storage device. Donald Becker responded that just because a motherboard can boot from a USB storage device doesn't mean it's that easy. Many USB storage devices cannot be used for booting.

There are several Linux distributions that can fit onto a USB storage device and allow systems to boot directly from them. In fact, John Hearns pointed out that he has routinely booted systems from a USB memory stick that had StressLinux loaded in it.

Jim Lux also pointed out that there are simple IDE-to-CF (CF = Compact Flash) adapters that allow you to use CF cards as though they are disks. In a later post, Jim also pointed out how nice it could be to boot a diskless cluster node from a CF card using the adapter. This capability would help improve reliability (no moving parts) and reduce heat generation inside a node. Jim's intent is to use these kinds of devices on nodes that only have a wireless network (he doesn't want to ship a kernel and associated parts over a wireless network because of the low bandwidth). Andy Cater reminded everyone that Compact Flash supports only a limited number of rewrites, so he suggested using the CF card only for the read-only portions of the operating system and a small ramdisk for the portions that change frequently (e.g., /var and /tmp).
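Andy's suggestion translates into a straightforward mount layout. The fragment below is a sketch of what the relevant /etc/fstab lines might look like; the device name, file system type, and tmpfs sizes are illustrative assumptions, not a tested configuration:

```
# Root file system on the CF card, mounted read-only to limit flash wear.
/dev/hda1   /      ext2    ro,noatime   0 0

# Frequently written directories live in RAM-backed tmpfs instead of flash.
tmpfs       /tmp   tmpfs   size=64m     0 0
tmpfs       /var   tmpfs   size=64m     0 0
```

Anything under /var that must survive a reboot (logs, for example) would then need to be shipped off the node, e.g., to a central log server, since tmpfs contents vanish at power-off.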

Solid-state storage is fast becoming very inexpensive thanks to commodity uses (cameras, MP3 players, cell phones, etc.). These devices offer increased reliability and lower power usage and heat generation compared to hard drives. However, they are more expensive and slower (perhaps not an issue for read-only file systems) than hard drives. Overall, solid-state storage has much to offer and may be very useful for clusters.


Sidebar One: Links Mentioned in Column

Beowulf archives

linux-ide-arrays mailing list

linux-ide-arrays archive

Bonnie++

Postmark

lmbench

Stress Linux

Top500

BPS


This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.

Jeff Layton has been a cluster enthusiast since 1997 and spends far too much time reading mailing lists. He has been to 38 countries and hopes to see all 192 some day.