Cluster Programming: Explicit Implications of Cluster Computing

Article Index

I Do Declare

This is the part of the column where we need to call up those few remaining neurons who have stubbornly remained outside the box of conventional wisdom. To achieve my goal of moving the programming experience closer to problem and further away from the computer we will need to lift the programmer to a higher level of abstraction. That is, the programmer must not be allowed to think about how the program will be run on the computer. Just like Fortran lifted the original HPC programmer above the minutia of assembly language programming and closer to the problem they wanted to program. (recall that Fortran was short for FORmula TRANslation language).

In general, approaches to programming can be broken into two main groups, Imperative and Declarative. Imperative programming is what most people have done in the past. Fortran, C, Basic, Java, Perl, Python, all fall into the imperative camp where the computation retains knowledge of the system state. In an imperative program, you are telling the computer "how to solve" your problem. For instance, you assign data to a variable and continue until you reach a limit. The programmer explicitly tells the computer how to solve the problem. Program flow is assumed to be from one line to the next.

Declarative programming, on the other hand, allows the program to say what is to be done, but does not say how. The declarative compiler/interpreter uses implied knowledge from the program to figure out how to solve the problem. The program assumes nothing about the system state when writing the program. Instead of moving from one line to the next, declarative programs often resolve or satisfy the current line of the program, by using other lines of the program. This distinction allows the underlying language to determine "how" to resolve the current line.

In a way, declarative programming insulates the programmer from the internals of the machine (the "how" to solve it part). For all the declarative programmer knows, his computer could be little monkeys and typewriters. The insulation has tremendous advantages. It lets the compiler figure out the best way to optimize for a given computing environment. The ZPL language is great example of this method because the programmer has not expressed "how" to solve a matrix operation, but described what they want to do, the compiler can optimize for a particular environment. If they had explicitly told the computer how to do the matrix operation, the compiler is constrained to work within the context of implied (implicit) program flow.

Sidebar: The Tale of Patent Number 5471622

As you may guessed, I have spent a fair amount of time thinking about these ideas. Actually over ten years ago, I thought so hard about it I manages go get a patent entitled: "Run-time system having nodes for identifying parallel tasks in a logic program and searching for available nodes to execute the parallel tasks." Basically the patent outlined a dynamic method of running a Prolog programs on directly connected processors (For those that remember the Inmos Transputer.) The method is applicable to clusters and in the my last column (April 2006) I outlined the method of the patent with some obvious (and not so obvious) refinements.

Contrary to the patent system of today, the process of obtaining the patent spanned six years (1989-1995) and required that I argue and demonstrate the "non obvious nature" of the invention. The original patent has been abandoned due to my belief that the current patent system in the US is quite broken and does very little to protect unique inventions because anything is patentable in this point in time. Sharing these ideas and creating "prior art" now seems the best way to ensure that they will not get stolen by yet another obvious patent. For those interested in reading the patent, a small amount of googling will reward you with enough lawyer speak to last a year or two.

Declarative approaches sound nice. In fact they are very elegant. Some examples you may have encountered before are are SQL, Lisp, Scheme, Prolog, Haskell, and others. Declarative applications can often be written in less lines of code than their imperative counterparts as you are not expending resources to control the machine. If you use SQL you will notice that from the programmers point of view, you do not tell the computer how to resolve queries, which are often done in parallel. Less lines of code results in less bugs and better productivity. The problem with declarative approaches is often efficiency. The abstraction layer has a cost. To be fair, some declarative approaches often have minimal imperative features to aid in programming (we are not going to venture down that path today).

The Explicit Implications

Now that I have breezed through enough computer since topics to make our head spin, lets try and tie this up in neat little package.

My experience has shown that parallel computing includes a large cost for application development. In addition, as clusters get bigger, the traditional way we create and run applications may not work well in "fault expected" environments. Existing applications are probably not going to be re-written any time soon, so my remaining comments have to do with those killer applications that have yet to be written.

In order to solve the problems I have outlined, at a reasonable cost, I propose that a declarative approach to parallel computing is one of the promising avenues in which to proceed. As distasteful as it sounds, I see no other way out of our current conundrum of hard to program cantankerous beasts we call clusters. If programmers can create and test ideas quickly on any computer (one node or ten thousand nodes) and not worry about how the program gets executed, then clusters will become more useful to everyone.

There is no doubt that declarative approaches will hurt efficiency and lead to many dismissing this approach to parallel computing. But, let's take a look at the economics for a minute. The cost of computing power is going down. Dual cores now cost as much as single cores did last year. The effective computing power has doubled in the last year. This trend will continue. Now consider what costs will increase. The cost of people will always rise hence the cost to create software will as well. Where should the effort to enhance efficiency be focused?

Two columns ago I started with my cluster wish list. My solution to this list looks like a sandwich with three distinct parts many of which do not exist just yet. We start off with a layer of inexpensive and massively parallel hardware, slap on a layer of dynamic execution methods, and top it off with a declarative programming interface. Goes great with side of Linux. Now get in the kitchen and get to work.

Note: This article is the last of three part series. The first and second parts are:

{mosgoogle right}
Sidebar: Resources
OpenMP

HPF - High Performance Fortran

LINDA - dynamic tuple space

UPC - Unified Parallel C

ZPL - parallel matrix language

Declarative Programming

This article was originally published in Linux Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.

Douglas Eadline can be found swinging randomly through the trees at here at ClusterMonkey

    Search

    Feedburner

    Login Form

    Share The Bananas


    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.