Cluster Programming: You Can't Always Get What You Want

But it does not stop me from asking

Fifteen years ago I wrote a short article in a now-defunct parallel computing magazine (Parallelogram) entitled "How Will You Program 1000 Processors?" Back then it was a good question that had no easy answer. Today, it is still a good question that still has no easy answer. Except now it seems a bit more urgent as we step into the "multi-core" era. Indeed, when I originally wrote the article, using 1000 processors was a far-off but real possibility. Today, 1000 processors are a reality for many practitioners of HPC. As dual cores hit the server rooms, effectively doubling the processor counts, many more people will be joining the 1000P club very soon.

So let's get adventurous and ask, "How will you program 10,000 processors?" As I realized fifteen years ago, such a question may never really have a complete answer. In the history of computers, no one has ever answered the question to my liking -- even when considering ten processors. Of course there are plenty of methods and ideas like threads, messages, barrier synchronization, etc., but when I have to think more about the computer than about my problem, something is wrong.

Having spent many a night trying to program parallel computers (the most recent incarnation being clusters), I have come up with a list of qualities that I want in a parallel programming language. Since I am wishing for the moon, I may be asking for the impossible, but I believe some of the features I describe below are going to be necessary before using large numbers of processors becomes feasible for the unwashed masses of potential HPC users.

Failure Is an Option

It is said that the Buddha's last words were "decay is inherent in all complex/component things." And Buddha was not even a system administrator. Clusters are complex/component things. The bigger the cluster, the more things that can decay. A program that routinely uses over 1000 processors will experience component failures at some point. As a hypothetical example, if you have 1000 cluster nodes, each with an MTBF (Mean Time Between Failures) of 10,000 hours (1.1 years), you can expect one node to fail every ten hours on average. Given that the MTBF is fixed for most computer hardware, using more and more processors for your program ultimately becomes a losing proposition.
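The arithmetic behind that example can be sketched in a few lines of Python. This is only a back-of-the-envelope model: it assumes nodes fail independently and at a constant rate (an exponential failure model, which is a simplifying assumption and not something the hardware promises), and the function names are made up for illustration.

```python
import math


def cluster_hours_between_failures(node_mtbf_hours: float, node_count: int) -> float:
    """Average time between failures anywhere in the cluster.

    Assumes identical nodes that fail independently, so failure rates add up
    and the cluster-wide interval is the per-node MTBF divided by node count.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return node_mtbf_hours / node_count


def chance_of_clean_run(node_mtbf_hours: float, node_count: int, run_hours: float) -> float:
    """Probability that a run sees no node failure at all.

    Assumes exponentially distributed failures (constant failure rate).
    """
    cluster_rate = node_count / node_mtbf_hours  # failures per hour, whole cluster
    return math.exp(-cluster_rate * run_hours)


# Same 10,000-hour node MTBF as in the text, with growing node counts.
for nodes in (1, 16, 1000, 10_000):
    interval = cluster_hours_between_failures(10_000, nodes)
    clean = chance_of_clean_run(10_000, nodes, run_hours=10)
    print(f"{nodes:>6} nodes: a failure every {interval:g} hours, "
          f"chance of a clean 10-hour run {clean:.3f}")
```

For the 1000-node case the first function returns 10.0 hours, matching the figure above. Under the same assumptions, a ten-hour run on those 1000 nodes has only about a 37% chance of finishing without losing a node.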

In the future, I expect clusters to have constant (and expected) failures. Furthermore, the cost to increase the MTBF will probably be prohibitive and adapting to failure will be an easier solution.

I then have to ask, "How the heck do you write a program for hardware you know is going to fail at some point?" The answer is quite simple: the program will have to tolerate hardware failures. In other words, software must become fault tolerant. And, here is the important part, I the programmer should not have to write this into my program.

Dynamic Scalability

One way to make a program fault tolerant is to make it dynamically scalable. That is, it can change the number of processors it is using on the fly. Adding fault tolerance means redoing work, so some mechanism will be needed to dynamically assign processors. Dynamic scalability is, therefore, the next thing I want in my program. The idea is quite simple: I want one program that can run on 10,000 processors as well as one processor. Of course, large problem sizes may not be feasible on one processor. After all, if a large problem requires 10,000 processors for an hour, it would take one processor 10,000 hours (assuming there was enough memory). I should, however, be able to run a small data set on one processor and then scale the same binary up to a maximal number of processors for that given problem size (and everything in between).
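To illustrate the "redo the work on whatever processors remain" idea, here is a toy, single-process simulation in Python. It is a sketch under loud assumptions, not a real runtime: the function run_with_requeue, the worker names, and the "doomed" set (workers that die the first time they are handed a task) are all hypothetical, and in the world I am wishing for this logic would live below my program, not in it.

```python
from collections import deque


def run_with_requeue(tasks, workers, doomed, work):
    """Run every task, handing it to another worker if the first one fails.

    tasks   -- iterable of hashable task descriptions
    workers -- iterable of worker names (any number, including one)
    doomed  -- hypothetical set of workers that fail on their first task
    work    -- function applied to each task to produce its result
    """
    pending = deque(tasks)
    alive = deque(workers)
    results = {}
    while pending:
        if not alive:
            raise RuntimeError("all workers failed; cannot finish")
        task = pending.popleft()
        worker = alive.popleft()
        if worker in doomed:
            # The worker died mid-task: drop it for good and redo the task later.
            pending.append(task)
            continue
        results[task] = work(task)
        alive.append(worker)  # a healthy worker goes to the back of the line
    return results, list(alive)


def square(x):
    return x * x


# The same call works with one worker or many; only the worker list changes.
results, survivors = run_with_requeue(range(4), ["a", "b", "c"], {"b"}, square)
print(results, survivors)
```

The point of the sketch is that the caller never mentions failure: tasks lost with worker "b" are simply redone elsewhere, and the same function runs unchanged with a single worker.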

For example, I should be able to develop a program on my laptop, move the same binary over to a sixteen-processor cluster, and run it without any modification. If the cluster is running other programs at the same time and there are only four idle processors, then my program should start using these four. As other processors become available, it should grow only to the point that adding more processors does not help performance. At a later point in time, if I want to run my program with a larger problem size on 1000 processors, I should be able to run the same program.

No More Standing in Line

Because my program is now dynamically scalable, I assume yours is as well. In this case, our programs should be able to co-operate with one another. If we both have a program to run at the same time, we should share resources optimally. In many cases, there will be no need to schedule or queue jobs because the programs will manage themselves. My program will constantly negotiate with the other running programs to get the best set of cluster resources. For instance, my program might negotiate to wait while others run, if it can then get exclusive access to 100 processors for one hour. I don't care how the programs do it. I just want them to behave this way, and I don't want to have to write such behavior into my program.

Additionally, as part of this dynamic scheme there should be no central point of control. Programs should behave independently and not have to rely on a single resource. Indeed, within the programs themselves, subparts should be as autonomous as possible. Centrally managing sixteen processors seems reasonable; managing sixteen hundred and still having some time to do computation is a real challenge.

And Then Some

Finally, I want an expressive language that is free of any artifacts due to the underlying hardware. I want to be as close to the application I am coding as possible. Thinking in "my problem space" is where I want to live. Concerning myself with memory management, numbers of processors, and other such details takes me away from my problem domain. In short, that is my wish list: fault tolerant, dynamically scalable, co-operative, and expressive. A simple wish, but a tall order. Realizing that I seldom get what I want, I have set my expectations high so maybe, just maybe, I'll get what I need. How are we going to get to this software nirvana? I thought you would never ask. I have some ideas, but first, let's address a few issues that always seem to come up when I talk about this topic.

    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.