Autonegotiation, Diskless, PVFS, Multicast Discussions

PVFS-Users: Performance issue with PVFS

PVFS (Parallel Virtual FileSystem) combines the hard drive space on compute nodes or dedicated I/O nodes to form a single filesystem. PVFS is built on top of any standard Linux filesystem. There are various ways to access the filesystem, including the traditional Linux filesystem commands and ROMIO (I/O for MPI). More importantly, PVFS can greatly increase I/O performance compared to traditional filesystems.

While a bit old, there was a discussion on the PVFS-Users mailing list, started by Craig Tierney on the 16th of July 2003, about performance issues he was having on two PVFS systems where read performance was significantly slower than write performance. The initial discussion covered tuning PVFS parameters, such as the stripe size, and tuning some parameters on the nodes themselves. These changes had only a small effect on read performance. Then Craig and Rob Ross discovered that switching the client code from mmap() calls to sendfile() calls for reading data greatly increased performance. These two functions, mmap() and sendfile(), are low-level Linux functions that can be used for reading or writing data, and PVFS uses them. The performance boost was very good: about a 50% improvement for one cluster and almost a 300% improvement for the other.

In fact, the PVFS team made this an option in the current version of PVFS so people can tune their PVFS setup for maximum performance. The team is also looking at adding some tuning suggestions to their FAQ (Frequently Asked Questions) and perhaps writing a code that can be used to measure PVFS performance as a sort of "baseline" to help people tune PVFS. Of course, the ultimate code for tuning is the user's application.
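
To make the difference between the two calls concrete, here is a minimal sketch of both ways of pushing a file's contents into a socket. It is not PVFS code (PVFS is written in C); it is a Python illustration that assumes a Linux system, and the helper names send_with_mmap, send_with_sendfile, and roundtrip are hypothetical. With mmap() the file is mapped into the process and then written to the socket, while sendfile() asks the kernel to copy from the file descriptor to the socket directly.

```python
import mmap
import os
import socket
import tempfile
import threading


def send_with_mmap(path, sock):
    """Map the file into memory, then write the mapped bytes to the socket."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as mapped:
            sock.sendall(mapped)
    return size


def send_with_sendfile(path, sock):
    """Let the kernel copy file data to the socket via sendfile()."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # Linux form of the call: out_fd, in_fd, offset, count.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break
            offset += sent
    return offset


def roundtrip(sender, path):
    """Hypothetical helper: send a file through a local socket pair and
    return the bytes that arrive on the other end."""
    a, b = socket.socketpair()
    chunks = []

    def drain():
        while True:
            data = b.recv(65536)
            if not data:
                break
            chunks.append(data)

    reader = threading.Thread(target=drain)
    reader.start()
    try:
        sender(path, a)
    finally:
        a.close()      # signals end-of-stream to the reader
        reader.join()
        b.close()
    return b"".join(chunks)


if __name__ == "__main__":
    payload = os.urandom(256 * 1024)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        for sender in (send_with_mmap, send_with_sendfile):
            received = roundtrip(sender, tmp.name)
            print(sender.__name__, len(received), received == payload)
    finally:
        os.unlink(tmp.name)
```

Both functions deliver identical bytes; any speed difference depends on the kernel, the hardware, and the access pattern, so a sketch like this is only useful for seeing the shape of the two approaches, not for reproducing the numbers from the mailing list.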

An important point about open cluster software like PVFS is the fact that it can be tuned for maximum performance based on the user's application.

PVFS-Users: Size of PVFS Clusters

On August 11th 2003, Nathan Poznick asked the PVFS-Users mailing list what the largest PVFS (Parallel Virtual FileSystem) cluster was, how it was configured, and how it performed. Rob Ross responded that he knew of systems with tens of servers, hundreds of clients, and tens of terabytes (TB) of storage that have performed quite well. He indicated that over Myrinet, they were able to achieve about 3.5 gigabytes (GB) per second aggregate bandwidth using IDE (Integrated Drive Electronics) disks. Troy Baer responded that their current configuration has 16 I/O nodes, with just a little under 10 TB of raw disk. They have achieved sustained performance of about 1.4 to 1.6 GB/sec for simple tests (e.g. ROMIO perf) and about 100-400 megabytes (MB) per second for real applications such as the ASCI Flash I/O code or the NAS BTIO code. He included a link to a paper discussing the results. Craig Tierney also discussed a system he was sizing for a bid on a project that used multiple FC (Fibre Channel) arrays to keep the number of IOD (Input/Output Daemon) nodes to a minimum. He said that he has seen performance of 150 MB/sec with nodes that have a reasonably fast disk. In his opinion, there was no reason that a PVFS filesystem could not be designed for 10 GB/sec performance.
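
As a rough back-of-the-envelope check on these figures, the short sketch below works out the per-node bandwidth implied by Troy Baer's numbers and the number of I/O nodes a 10 GB/sec design might need. It rests on several assumptions that do not come from the thread: aggregate bandwidth scales linearly with the number of I/O nodes, 1 GB is treated as 1000 MB, and the 150 MB/sec figure is read as a per-node rate. The function names are hypothetical.

```python
import math


def per_server_mb_s(aggregate_mb_s, servers):
    """Average bandwidth per I/O node, assuming an even split."""
    return aggregate_mb_s / servers


def servers_needed(target_mb_s, per_server_rate_mb_s):
    """I/O nodes needed for a target, assuming perfectly linear scaling."""
    return math.ceil(target_mb_s / per_server_rate_mb_s)


if __name__ == "__main__":
    # 1.4 to 1.6 GB/sec across 16 I/O nodes (1 GB taken as 1000 MB).
    print(per_server_mb_s(1400, 16))    # 87.5
    print(per_server_mb_s(1600, 16))    # 100.0
    # 10 GB/sec target at an assumed 150 MB/sec per I/O node.
    print(servers_needed(10_000, 150))  # 67
```

Real systems rarely scale perfectly, since the interconnect and the clients also limit throughput, so the node count here is best read as a lower bound rather than a sizing recommendation.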

Keep in mind that the numbers mentioned are how fast parallel programs read or write data to a PVFS filesystem, and they depend on the I/O servers, the interconnect, and the cluster nodes. Searching for articles on the Internet will show how PVFS was used to achieve over 1 GB/sec performance on simple clusters. If I/O performance is a crucial part of your application or if you just want to test PVFS, then visit the website. The code is open source and has been used in production at several sites.

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.

Jeff Layton has been a cluster enthusiast since 1997 and spends far too much time reading mailing lists.

    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.