A Benchmark for Parallel File Systems

Access Patterns

For the purposes of the benchmark, an Access Pattern refers to the specific mapping between an MPI process and some set of bytes within a file. In essence, how the data gets to and/or from the file system via an MPI process. This pattern is described with a spatial distribution (i.e. how the data is distributed on the file system or in memory) and temporal distribution (i.e. how the data is accessed as the code is run).

The benchmark as currently written has a number of common spatial access patterns. These patterns are simple strided, nested strided, random strided, sequential access, segmented access, tiled, flash I/O kernel, and unstructured mesh.

A simple strided access pattern divides sections of a file into logical strips. Each participating process accesses the data at some displacement relative to a stripe. Most of the time, the displacement is related to the process number (rank) in an MPI communicator group. This access pattern has been found to be very common in many applications.

A nested strided access pattern combines several simple strided access into each stripe. The I/O access may then consist of simple stride accesses of a single stripe inside a simple stride access of another stripe and so on. So a simple nested stride access can be used for accessing data in a two-dimensional array. Consequently, it can be easily extended for accessing data in three dimensional arrays. This type of access is the second most common and can be found in many codes such as linear algebra codes.

A random strided access pattern has each process read or write an amount of data in a round-robin pattern. The size of the stripe can vary depending upon how much data is accessed by each process in turn. A good example of this kind of access pattern is encoding of media where each frame size of an image is variable.

In the sequential access pattern each process opens the file and issues requests (read or write) that access a fixed size part of the file with linearly increasing offsets for successive requests. This access pattern is common for file systems that allow regions of a file to be partitioned on separate I/O servers. In particular, it maximizes reading performance (all processes can simply read the same file). However, write I/O is a different matter because you don't want multiple writes to occur for the same part of the file. The way to get around this is to have each process write a different file. This type of access is common in older codes.

A segmented access pattern is a fairly simple one. It divides the file into logical segments for as many processes as there are performing I/O. For example if a file is 4096 bytes long and there are four processes, a segmented access pattern will just assign the first 4096/4 bytes to the first process, the next 4096/4 bytes to the next process and so on. This type of access pattern is used by many different codes.

A tiled access pattern takes a two-dimensional set of data and breaks it into small rectangular regions. Depending upon how the tiles are laid out each tile access can be decomposed into a segmented access pattern or a simple strided access pattern. One example that uses this type of I/O is a large virtual screen on multiple video devices. Another example is storing two-dimensional data that is larger than memory. This occurs in out-of-core solvers.

The next access pattern is called the Flash I/O Kernel access pattern. Flash is a code that is used to study the effects of thermonuclear explosions on the surfaces of many types of stars. Specifically, a heavy dependence on I/O performance is required by simulations related to X-ray bursts of Type 1 supernovae. This access pattern is non-contiguous in memory and non-contiguous within the file. In essence, the access pattern is a sequence of 'memory blocks' where each block contains a three dimensional cube of data. Inside this three dimensional cube there are areas that need to be accessed from the file and those areas that are skipped over.

The last spatial access pattern is the unstructured mesh access pattern. The data that are written for this access pattern is typically composed of a set of points with x,y,z coordinates. At each point there are a number of properties that can range from just a couple to hundreds. Also, included is how the points relate to one another to form polygons. Typically, this type of data can be decomposed into one of the previously mentioned access patterns (simple strided or segmented access patterns) because each process has a subset of the total data. The unstructured mesh access pattern is very common in engineering applications such as Computational Fluid Dynamics, Finite Elements, terrain mapping, etc.

The benchmark suite can also accommodate temporal access patterns. It covers five patterns: read once, write once, read-modify-write, re-read, and re-write.

For the read once access pattern, data from a file is accessed and copied to some location in memory. The file access and the memory access may be contiguous or non-contiguous. The benchmark assumes that all read access is blocking. That is, all data must be in memory before the read function returns.

The write once access pattern is similar to read once but there is more care taken if the data is flushed to the storage devices or not.

The read-modify-write access is a very intensive temporal access pattern. It involves reading data, operating on it, and then writing the data back to the original location in the file. One common application of this type of access is an out-of-core simulation where there is more data than fits into memory.

A re-read access pattern is very simple. All or part of a data file is retrieved and then later, it is retrieved again. Only the second read is timed in the benchmark since file systems can take advantage of caching to speed up this operation.

A re-write is similar to the re-read pattern expect it involves a write operation, not a read. This access pattern allows the write-behind caching of a file system to be tested.


As part of his research, Mr. Shorter has written a useful benchmark suite for testing parallel file system performance. This benchmark is perhaps the most robust of all the parallel file system codes I have encountered. Here is your homework, how many different combinations of access and temporal patterns can the benchmark test?

{mosgoogle right}

Sidebar One: Links Mentioned in Column



Frank Shorter's Thesis "Design and Analysis of a Performance Evaluation Standard for Parallel File"

This article was originall published in ClusterWorld Magazine. It has been updated and formated for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.

Dr. Jeff Layton hopes to someday have a 20 TB file system in his home computer. He can sometimes be found lounging at a nearby Fry's, dreaming of hardware and drinking coffee (but never during working hours).



    Login Form

    Share The Bananas

    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.