Here are some important steps to consider
Building a Grid is no simple task: It takes planning and coordination. This column discusses some rules of thumb to consider while setting up your own Grid. It is important to remember that Grids are defined by three criteria:
- A Grid must coordinate resources that are not subject to centralized control and that cross organizational boundaries.
- A Grid must use standard, open, general-purpose protocols and interfaces.
- A Grid must deliver nontrivial qualities of service.
Meeting the first criterion will be a recurring theme of this column. Grids have an added layer of complexity on top of "simple" clusters, and sites will have existing policies that must be worked around instead of relying on a centralized control that simply rewrites local policies.
We will start by addressing multisite policies and team issues to be resolved before you begin construction of your Grid. Then we'll give some guidelines on how to define a software stack, what to do about security infrastructure, how to verify your Grid is fully up and running, and how to address user support for your organization.
Before You Begin
Before starting to set up a Grid, everyone involved should be clear about the goal for resource usage. For example, an organization may intend to use the resources for running a specific application, or as a platform for testing software scalability issues, or both. By defining success metrics at the outset, the people involved can avoid "mission creep" and misunderstanding of expectations.
With multiple sites, the problems of setting up a single machine or cluster within one administrative domain can be multiplied tenfold. The number of people who need to be involved in every decision increases, the policy for each site can be different, and global and local policies must be reconciled.
As everyone has experienced, communication is hard. And when you start to consider communication among different sites, possibly from different parts of the United States or even from different countries, these issues can be multiplied by large differences in communication styles. Recognizing that what you think is "chiming in with a helpful suggestion" may be regarded by one of your colleagues as a "rude interruption" is vital to maintaining a good working relationship.
Critical to the success of a Grid is establishing teams to address specific issues regarding Grid creation and maintenance. Johnston suggests the formation of at least two teams: an engineering working group to implement the deployment at each site, and an application specialist team. The engineering team should contain members from each site involved in the Grid and should have well-defined liaisons with the local system administrators and network administrators for all the resources to be used in the Grid. The application specialist team should contain people familiar with both Grid middleware and end-user applications; this team will act as an interface between the users and the administrators (who often speak very different languages).
Due to the cross-organizational nature of Grids, it is frequently impossible to maintain central control. Individual sites may be required to maintain local autonomy and control over their resources. This situation means that Grid middleware must support reconciling local and global policies across a Grid. For example, a site may define user names based on some local mapping. At one site, a username for a researcher might be the alphanumeric u11270, while at a second it could be the researcher's last name. In this case, the different local policies are reconciled by having a gridmap file that matches a certificate to a specific local mapping.
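The username example above can be sketched in a few lines of Python. This is only an illustration: it assumes a gridmap-style file in which each line holds a quoted certificate subject followed by a local account name, and the certificate subject, the second site's account name `doe`, and the function name `parse_gridmap` are all hypothetical.

```python
import shlex

# Hypothetical certificate subject shared by both sites.
SUBJECT = "/O=Grid/OU=Example/CN=Jane Doe"

# Hypothetical gridmap-style contents: one quoted subject and one local
# account per line. Each site maps the same certificate differently.
SITE_A_GRIDMAP = f'# site A uses alphanumeric IDs\n"{SUBJECT}" u11270\n'
SITE_B_GRIDMAP = f'# site B uses last names\n"{SUBJECT}" doe\n'


def parse_gridmap(text):
    """Return {certificate subject: local username} from gridmap-style text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honors the quotes around the subject
        if len(parts) != 2:
            raise ValueError(f"malformed gridmap line: {line!r}")
        subject, local_user = parts
        mapping[subject] = local_user
    return mapping


if __name__ == "__main__":
    for site, text in (("site A", SITE_A_GRIDMAP), ("site B", SITE_B_GRIDMAP)):
        print(site, "->", parse_gridmap(text)[SUBJECT])
```

The researcher presents one certificate everywhere, and each site keeps full control over which local account that certificate resolves to.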
Other policies that will need to be reconciled include usage agreements, cross-site charging policies, security policies, and information-sharing policies. These should be addressed as early as possible in Grid creation to avoid conflicts later on.
Setting Up Your Grid
Once initial policy concerns have been addressed, it is time to define a software stack, establish a security infrastructure, verify your Grid functionality, and address user support for your organization.
Common Software Stack: A usable Grid requires that a common environment be defined for the users. One important element is the definition of a software stack, that is, the list of software and versions a user can expect to find on a resource, in three different categories: middleware components, user software, and environment variables for consistency.
Middleware components must address four basic functionalities: resource management, information services, data management, and security. Many Grids use the Globus Toolkit® to provide these functionalities. Other software packages, such as Condor-G for easier resource management interfaces and Ganglia for more detailed monitoring, have well-defined interfaces to the Globus Toolkit and are commonly co-deployed.
User software includes software for development environments (compilers, debuggers), application-specific libraries (for example, the BLAS and GMP mathematical libraries), and system tools and libraries (OpenSSH, glibc, etc.). This software should be defined down to the version needed for compatibility between resources.
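One way to picture a version-level stack definition is as a table that each site can be checked against. The following is a minimal sketch under stated assumptions: the package names and version strings in `COMMON_STACK` are made up for illustration, and `stack_mismatches` is a hypothetical helper, not part of any Grid toolkit.

```python
# Hypothetical common stack: package name -> required version string.
# The versions here are placeholders, not a recommended configuration.
COMMON_STACK = {"gcc": "3.2.3", "glibc": "2.3.2", "openssh": "3.6.1"}


def stack_mismatches(required, installed):
    """Return {package: (required_version, installed_version_or_None)}.

    A package appears in the result when the site lacks it entirely
    (installed version is None) or has a different version.
    """
    problems = {}
    for package, version in required.items():
        found = installed.get(package)
        if found != version:
            problems[package] = (version, found)
    return problems


if __name__ == "__main__":
    # Hypothetical inventory reported by one site.
    site_inventory = {"gcc": "3.2.3", "glibc": "2.3.1"}
    for package, (want, have) in stack_mismatches(COMMON_STACK, site_inventory).items():
        print(f"{package}: required {want}, found {have}")
```

An exact string match is the strictest possible policy; a real deployment might instead accept a range of compatible versions.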
Users will have a much easier time switching between resources if their home environments are set up in such a way that differences in paths are hidden from them. Common ways to do this include using softenv, using modules, or publishing software locations in a Grid information system such as the Globus Toolkit Monitoring and Discovery System. Each of these approaches enables a local site to add a level of indirection to permit local control over where packages are installed, while at the same time allowing for a consistent global policy at the user level.
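The level of indirection described above can be sketched as a lookup from a logical package name to a site-specific path. This is not the syntax of softenv, modules, or the Monitoring and Discovery System; the site names, install paths, and the `<PACKAGE>_HOME` variable convention are all assumptions made for the example.

```python
# Hypothetical per-site install locations, each chosen by the local site.
SITE_LOCATIONS = {
    "site_a": {"blas": "/usr/local/lib/blas", "gmp": "/opt/gmp"},
    "site_b": {"blas": "/soft/apps/blas", "gmp": "/soft/apps/gmp"},
}


def resolve(site, package):
    """Return the local install path of a logical package name at a site."""
    try:
        return SITE_LOCATIONS[site][package]
    except KeyError:
        raise LookupError(f"{package!r} is not published for {site!r}") from None


def build_environment(site, packages):
    """Return environment variables such as BLAS_HOME for the given site.

    Users refer only to the variable names, which are the same everywhere;
    the values differ from site to site.
    """
    return {f"{name.upper()}_HOME": resolve(site, name) for name in packages}


if __name__ == "__main__":
    for site in SITE_LOCATIONS:
        print(site, build_environment(site, ["blas", "gmp"]))
```

A user script that reads `BLAS_HOME` runs unchanged at both sites, while each site remains free to move its installation by editing only its own table.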
Several projects have started grouping together sets of software as a first approach to defining common software stacks. The Globus Toolkit distribution contains the main Grid middleware components. Several projects, such as the GriPhyN Virtual Data Toolkit (VDT) and the GRIDS Center NMI release, distribute binaries of Globus, Condor, the Network Weather Service, MyProxy, and some other related tools that have undergone additional compatibility testing and receive additional support. TeraGrid is also defining a software stack, but for all levels of the hierarchy, not just middleware tools.
 
  
