Table of Contents
README for Installing Red Hat HPC Solution - Beta 1
What is the Red Hat HPC Solution?
Installation Prerequisites
Installation Procedure
Preparing To Install
Starting the Install
Installing Additional Red Hat HPC Kits
Verifying the Red Hat HPC install
Adding Nodes to the Cluster
Managing Node Groups
Adding RPM Packages in RHEL to Node Groups
Adding RPM Packages not in RHEL to Node Groups
Adding Fedora Repository to the Installer Node
Associating a Repository to Node Groups
Adding Kit Components to Node Groups
Synchronizing Files in the Cluster
Updating the Installer node and the Compute Node Repository
The Red Hat HPC Solution is a fully integrated software stack that enables the creation, management and usage of a high performance computing cluster running Red Hat® Enterprise Linux. The cluster management tools provided with the Red Hat HPC Solution are based on Platform OCS from Platform Computing Corporation.
For more information about Platform OCS, visit http://www.platform.com/Products/platform-open-cluster-stack
Installing Red Hat HPC Solution - Beta (Red Hat HPC) requires one system to be designated as the installer node. The installer node is responsible for installing the rest of the machines. Before installing Red Hat HPC, confirm that the designated machine has Red Hat Enterprise Linux 5.1 installed and meets the following requirements:
SELinux must be disabled.
One or more network interfaces configured with statically defined IP addresses rather than DHCP. These interfaces should be connected to the networks where the machines will be provisioned.
Installed with Red Hat Enterprise Linux version 5 update 1.
A partition with at least 10 GB free.
Red Hat Enterprise Linux Version 5 installation media.
A valid subscription to Red Hat Network, including an entitlement to the Red Hat HPC channel.
The firewall (iptables) must be configured to permit the services needed for installation on all networks used to provision nodes (HTTP, HTTPS, TFTP, DNS, NTP, BOOTPS, etc.).
Red Hat HPC will create a private DNS zone for all machines under its control. The name of this zone must NOT be the same as any other DNS zone within the organization where the cluster is installed.
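Some of these prerequisites can be checked from a shell before starting. The following is a minimal sketch rather than a tool shipped with Red Hat HPC: the helper names, the TARGET_DIR variable and the choice of / as the directory to measure are assumptions for illustration. It relies on the standard RHEL interface files under /etc/sysconfig/network-scripts, where BOOTPROTO=dhcp marks a DHCP-configured interface.

```shell
#!/bin/sh
# Sketch of a prerequisite check for the installer node.
# Assumptions (not from this README): helper names, TARGET_DIR, MIN_FREE_GB.

MIN_FREE_GB=10
TARGET_DIR=${TARGET_DIR:-/}   # directory on the partition that should have free space

# Succeeds if SELinux reports "Disabled", or if getenforce is not installed.
selinux_disabled() {
    if command -v getenforce >/dev/null 2>&1; then
        [ "$(getenforce)" = "Disabled" ]
    else
        return 0
    fi
}

# Succeeds if the filesystem holding $1 has at least $2 GB available.
has_free_space() {
    free_kb=$(df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$free_kb" ] && [ "$free_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Succeeds if the interface file $1 is readable and does not request DHCP.
is_static_nic() {
    [ -r "$1" ] && ! grep -Eqi '^BOOTPROTO="?dhcp' "$1"
}

# Print PASS or FAIL for one check.
report() {
    if "$@"; then echo "PASS: $*"; else echo "FAIL: $*"; fi
}

report selinux_disabled
report has_free_space "$TARGET_DIR" "$MIN_FREE_GB"
for cfg in /etc/sysconfig/network-scripts/ifcfg-eth*; do
    [ -e "$cfg" ] || continue
    report is_static_nic "$cfg"
done
```

The Red Hat Network subscription, the firewall rules and the DNS zone name are not covered by this sketch and still need to be confirmed by hand.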
Verify that the installer node meets the prerequisites above.
Register on Red Hat Network and subscribe to the appropriate channels.
Log into the machine as root and install the Red Hat HPC bootstrap RPM by running the following:
# yum install ocs
# source /etc/profile.d/kusuenv.sh
The Red Hat HPC bootstrap package (called ocs) will be downloaded and installed. It provides a tool for completing the installation and setup of Red Hat HPC. Run the following:
# /opt/kusu/sbin/ocs-setup
The script will detect your network settings and provide a summary per NIC like:
NIC: eth0
============================================================
Device = eth0 IP = 172.25.243.44
Network = 172.25.243.0
Subnet = 255.255.255.0
mac = 00:0C:29:C4:61:06 Gateway = 172.25.243.2
dhcp = False boot = 1
Red Hat HPC cannot set up provisioning on DHCP-configured networks, only on statically configured networks. The setup script will ask whether you want to provision on all networks and, if not, which ones to provision on.
Red Hat HPC creates a separate DNS zone for the nodes it installs. The tool will prompt for this zone.
Warning: Do not use the same DNS zone as any other in your organization. Using an existing zone will cause DNS name resolution problems.
Red Hat HPC needs to store a copy of the Operating System media, and installation images. The setup script will prompt for the location of the directory to store the Operating System. The default is /depot. If another location is used a symbolic link to /depot will be created.
The setup script needs to copy the Red Hat Enterprise Linux 5.1 media onto the local disk in order to build the repository for installing nodes. It will ask for the DVD/CDs, a directory containing the contents of the OS media, or an ISO file providing the media.
The setup script will copy the Operating System to /depot. This will take some time (5-10 minutes from a CD/DVD). Once completed, you will see something like:
Congratulations! You should be able to install compute nodes on:
Network 1.2.3.4 on interface ethX
The installer node is now ready to begin installing other nodes in the cluster.
Additional software tools such as Nagios® and Cacti™ are packaged as software kits. Software packaged as a kit is much easier to install onto a Red Hat HPC cluster. A kit contains rpms for the software, and rpms for meta-data and configuration files. To install Nagios and Cacti onto the Red Hat HPC cluster, use the following commands:
# yum install ocs-kit-cacti
# /opt/kusu/sbin/install-kit-cacti
# yum install ocs-kit-nagios
# /opt/kusu/sbin/install-kit-nagios
The yum command above downloads the kit from Red Hat Network. Included in the kit is an installation script that adds the kit to the Red Hat HPC cluster repository and rebuilds the cluster repository. Every kit that is downloaded from Red Hat Network has a corresponding script used to install the kit into the cluster repository.
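When several kits are needed, the two-step pattern above can be wrapped in a loop. This is only a sketch: the DRY_RUN variable and the function names are assumptions for illustration, and it assumes that other kits follow the same ocs-kit-<name> and install-kit-<name> naming as the Cacti and Nagios examples. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: install several kits in one pass, following the two-step pattern above.
# DRY_RUN, run and install_kits are hypothetical names, not Red Hat HPC tools.

DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands, anything else = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# usage: install_kits <kit-name>...
install_kits() {
    for kit in "$@"; do
        run yum install "ocs-kit-$kit"            || return 1   # download the kit
        run "/opt/kusu/sbin/install-kit-$kit"     || return 1   # add it to the cluster repository
    done
}

install_kits cacti nagios
```

Setting DRY_RUN=0 on the installer node would execute the same commands for real, stopping at the first one that fails.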
Once the installer node is successfully configured, the next step is to verify that all software components are installed and working correctly. The following steps can be used to verify the Red Hat HPC install:
1. Start the web browser. The cluster homepage should display.
2. Check for any hardware issues by using the dmesg command.
3. Check all network interfaces to see if they are configured and up.
a. # ifconfig | more
4. Verify the routing table is correct
a. # route
b. Make sure the following system services are running:

5. Run some basic Red Hat HPC commands
a. List the installed repositories
i. # repoman -l
b. List the installed kits
i. # kitops -l
c. Run the Node Group Editor
i. # ngedit
d. Run the Add Host tool
i. # addhost
e. Check that cacti is installed
i. From the Web browser enter the following URL:
ii. Login to Cacti with username: admin, password: admin
f. Check that Nagios is installed
i. From the Web browser enter the following URL:
ii. Login to Nagios with username: admin, password: admin
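The non-interactive checks in the list above can be run in one pass. The sketch below is an illustration only: run_check and verify_install are hypothetical helper names, route is called with -n to avoid name lookups, and the sketch only reports whether each command exits successfully. It does not inspect the output, so the output of dmesg, ifconfig and route should still be read by hand. It assumes /etc/profile.d/kusuenv.sh has been sourced so that repoman and kitops are on the PATH; commands that are not found are skipped.

```shell
#!/bin/sh
# Sketch: run the non-interactive verification commands and summarize the results.
# run_check and verify_install are hypothetical helper names.

# usage: run_check <label> <command> [args...]
run_check() {
    label=$1; shift
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "SKIP $label ($1 not found)"
    elif "$@" >/dev/null 2>&1; then
        echo "OK   $label"
    else
        echo "FAIL $label"
    fi
}

verify_install() {
    run_check "kernel messages"    dmesg
    run_check "network interfaces" ifconfig
    run_check "routing table"      route -n
    run_check "repositories"       repoman -l
    run_check "installed kits"     kitops -l
}

verify_install
```

The interactive tools (ngedit, addhost) and the Cacti and Nagios web pages are left out because they need an operator.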
Adding nodes to a Red Hat HPC cluster is accomplished by running the addhost tool. Addhost listens on a network interface for nodes that are PXE booting and adds them to a specified node group. Node groups are templates that define common characteristics such as network, partitioning, operating system and kits for all nodes in a node group. To add nodes, open a terminal window or log in to the installer node as root:
1. # addhost
2. Select the node group for the new nodes. Normally compute nodes are added to compute-rhel

3. Select the network interface to listen on for newly PXE-booted nodes

4. Indicate the rack number where the nodes are located

5. Addhost will now wait for the nodes to boot

6. Boot the nodes you want to add to the cluster

7. When a node is successfully detected by addhost a line will appear in the ‘installing node status’ window.

8. Exit addhost when Red Hat HPC has detected all nodes.
Red Hat HPC cluster management is built around the concept of node groups. Node groups are a powerful template mechanism that allows the cluster administrator to define common shared characteristics among a group of nodes. Red Hat HPC ships with a default set of node groups for installer nodes, package-installed compute nodes, diskless compute nodes and imaged compute nodes. The default node groups can be modified, or new node groups can be created from the default node groups. All of the nodes in a node group share the following:
Node Name format
Operating System Repository
Kernel parameters
Kits and components
Network Configuration and available networks
Additional rpm packages
Custom scripts (for automated configuration of tools)
Partitioning
A typical HPC cluster is created from a single installer node and many compute nodes. Normally, compute nodes are identical to each other apart from a few exceptions, such as the node name or other host-specific configuration files. A node group for compute nodes makes it easy to configure and manage 1 or 100 nodes from the same template.
The ngedit command is a TUI (Text User Interface) that the cluster administrator runs to create, delete and modify node groups. The ngedit tool modifies cluster information in the Red Hat HPC database and automatically calls other tools and plugins to perform actions or update configuration files. For example, modifying the set of packages associated with a node group in ngedit automatically calls cfm (Configuration File Manager) to synchronize the nodes, using yum to install the newly selected packages. Modifying the partitioning of a node group instead notifies the administrator that a re-install must be performed on the nodes before the new partitioning takes effect.
The Red Hat HPC database keeps track of the node group state, so several changes can be made to a node group at once, and the physical nodes in the group can be updated immediately or at a later time using the cfmsync command.
Open a Terminal and run the node group editor as root.
# ngedit
Select the compute-rhel node group and move through the Text User Interface screens by pressing F8 or by choosing next on the screen. Stop at the Optional Packages screen.

Additional rpm packages are added by selecting the package in the tree list. Pressing the space bar expands or contracts the list to display all of the available packages. By default, packages are sorted alphabetically. To sort the list by Red Hat groups instead, choose Toggle View. Select the additional packages using the space bar; when a package is selected, an asterisk is displayed beside the package name. Package dependencies are handled automatically by yum, so any packages required by a selected package are included when it is installed on the cluster nodes. Ngedit automatically calls cfm to synchronize the nodes and install new packages, but it does not automatically remove packages from nodes in the cluster (this is by design). If required, pdsh and rpm can be used to completely remove packages from the rpm database on each node in the cluster.
Red Hat HPC maintains a repository containing all of the rpm packages that ship with Red Hat Enterprise Linux. For most customers this repository is sufficient. Rpm packages that are not in Red Hat Enterprise Linux can also be added to a Red Hat HPC repository by placing the rpms into the appropriate contrib directory under /depot. For example:
1. Start with the rpms that are not in Red Hat Enterprise Linux or in a Red Hat HPC Kit
2. Create the appropriate subdirectories in /depot/contrib:
# cd /depot
# mkdir -p contrib/rhel/5/x86_64
# cp foo.rpm /depot/contrib/rhel/5/x86_64/foo.rpm
3. Rebuild the Red Hat HPC repository with repoman:
# repoman -u -r rhel5_x86_64
4. It will take some time to rebuild the repository and associated images.
5. Run ngedit and navigate to the Optional Packages screen.
6. Select the new package by navigating within the package tree and using the spacebar to select.
7. Continue through the ngedit screens and either allow ngedit to synchronize the nodes immediately or perform the node synchronization manually with cfmsync -p at a later time.
Example: selecting an rpm that is not included in Red Hat Enterprise Linux

The contrib directory may not exist in /depot. If it does not exist, create it. Contributions can be added to more than one Red Hat HPC repository. The directory structure is as follows:
/depot/contrib/<os_name>/<version>/<architecture>
For example, adding contributions to a Fedora Core 6 on x86 repository requires the following directory structure in /depot/contrib:
/depot/contrib/fedora/6/i386
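The directory layout can be computed rather than typed by hand. The following sketch assumes nothing beyond the /depot/contrib/<os_name>/<version>/<architecture> structure described above; the DEPOT variable and the contrib_dir and stage_rpm helpers are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: compute the contrib directory for a repository and stage an rpm there.
# DEPOT, contrib_dir and stage_rpm are illustrative names, not Red Hat HPC tools.

DEPOT=${DEPOT:-/depot}

# usage: contrib_dir <os_name> <version> <architecture>
contrib_dir() {
    printf '%s/contrib/%s/%s/%s\n' "$DEPOT" "$1" "$2" "$3"
}

# usage: stage_rpm <rpm-file> <os_name> <version> <architecture>
stage_rpm() {
    dest=$(contrib_dir "$2" "$3" "$4")
    # Create the directory tree if it is missing, then copy the package in.
    mkdir -p "$dest" && cp "$1" "$dest/"
}

# With the default DEPOT these print the two paths used in this section.
contrib_dir rhel 5 x86_64
contrib_dir fedora 6 i386
```

After staging a package with stage_rpm, the repository still has to be rebuilt with repoman and the package selected in ngedit, as in the numbered steps above.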
Adding other Red Hat based operating systems such as Fedora to Red Hat HPC is straightforward but requires a few steps. To add Fedora to the installer node, you need a copy of the Fedora media or a Fedora ISO file. Once you have it, add Fedora to Red Hat HPC using the kitops command. Type the following to add Fedora, mounted on /media/CDROM, to Red Hat HPC:
# kitops -a -m /media/CDROM/ --kit=fedora
Adding a kit to Red Hat HPC makes the software available for use in a repository, so the next step is to create a Fedora repository:
# repoman -n -r Fedora-6-i386
Now add the required Operating System kit to the repository:
# repoman -a -r Fedora-6-i386 --kit=fedora
Add the Red Hat HPC base kit to the repository. The base kit contains all of the tools required by Red Hat HPC for managing the cluster.
# repoman -a -r Fedora-6-i386 --kit=base
The Operating System and base kits are always required in a repository. At this point the repository can be used to install nodes, or you can add more kits to the repository. One final step must be performed to rebuild the repository with the new Operating System and base kit:
# repoman -u -r Fedora-6-i386
Congratulations, you should now have a new repository added to your cluster. You can view the available repositories with the following command:
# repoman -l
A single Red Hat HPC installer node can contain more than one Red Hat Operating System Repository. Adding a new Operating System such as Fedora to Red Hat HPC involves several steps:
1. Add Fedora Operating System CDs/DVD/iso as a Red Hat HPC Kit using kitops
2. Create a new repository for Fedora using repoman -n
3. Add the Fedora kit to the new repository with repoman -a
4. Add the Red Hat HPC base kit to the repository with repoman -a
5. Update the repository with repoman -u. This assembles all of the kits into a complete repository
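The five steps can be collected into one function so that the kit name and repository name are typed only once. This is a sketch under stated assumptions: DRY_RUN, run and add_os_repo are hypothetical names, the repoman --kit= spelling is taken to match the kitops example, and by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: the five repository steps as one function. Prints the commands by default.
# DRY_RUN, run and add_os_repo are hypothetical names for illustration.

DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands, anything else = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# usage: add_os_repo <media-mount-point> <os-kit-name> <repository-name>
add_os_repo() {
    media=$1 kit=$2 repo=$3
    run kitops -a -m "$media" --kit="$kit"  || return 1   # 1. add the OS media as a kit
    run repoman -n -r "$repo"               || return 1   # 2. create the new repository
    run repoman -a -r "$repo" --kit="$kit"  || return 1   # 3. add the OS kit to it
    run repoman -a -r "$repo" --kit=base    || return 1   # 4. add the base kit to it
    run repoman -u -r "$repo"                             # 5. rebuild the repository
}

add_os_repo /media/CDROM/ fedora Fedora-6-i386
```

Stopping at the first failing step avoids rebuilding a repository that is missing its Operating System or base kit.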
Once steps 1-5 are completed the new repository can be added to Node Groups with the ngedit tool. Run ngedit from a terminal, and create a copy of an existing node group. In our example we will copy the compute-rhel node group.

Edit the newly created node group then on the Repository screen change the repository to Fedora (or your snapshot repository)

By changing the repository to your new repository, you have effectively added this new node group to your new repository. Continue moving through the rest of the ngedit screens, selecting or modifying settings as needed. Upon exit, ngedit will automatically update the database.
Adding kit components to nodes in a node group is very similar to adding additional rpm packages. Open a terminal and start the ngedit tool, choose the compute-rhel node group, then press F8 or choose Next until you reach the Components screen.
Each Red Hat HPC kit installs an application or a set of applications. The kit also contains components, which are meta-rpm packages designed for installing and configuring applications on a cluster. Choosing the appropriate components makes it easy to configure all nodes in a node group. For example, the cacti kit contains two components, component-cacti and component-cacti-monitored-node. The component-cacti component installs and configures cacti and sets up the web pages and the connection to the database; it is normally installed on the cluster installer node or on any other node (or set of nodes) designated as the management node. The other component, component-cacti-monitored-node, contains the cacti agent code that runs on the compute nodes in the cluster.
Most Red Hat HPC kits come configured with automatic node group association and component selection, which makes adding kits to node groups much easier than selecting components manually in ngedit. For example, the Platform Lava kit automatically associates the Lava master with the installer node group and the Lava compute nodes with the compute-rhel node group.

HPC clusters are built from many individual compute nodes, and all of these nodes must have copies of common files such as /etc/passwd, /etc/shadow, /etc/group, and others. Red Hat HPC contains a file synchronization service called cfm (Configuration File Manager). Cfm runs on each compute node in the cluster. When new files are available on the installer node, a message is sent to all of the nodes notifying them that files are available. Each compute node then connects to the installer node and copies the new files using the httpd daemon on the installer node. All of the files to be synchronized by cfm are located in the directory tree /etc/cfm/<nodegroup>. Cfm organizes file synchronization trees by node group. A directory exists for each node group under /etc/cfm, and below the node group name is a tree that replicates the file structure of the machines in the node group, for example:

In the screenshot above, the /etc/cfm directory contains several node group directories such as compute-diskless and compute-rhel. Each of those directories holds a directory tree in which /etc/cfm/<nodegroup> represents the root of the tree. The /etc/cfm/compute-rhel/etc directory contains several files or symbolic links to system files. These system files are synchronized across all of the nodes in the node group automatically by cfm. Creating symbolic links for the files in cfm allows the compute nodes to be automatically synchronized with system files on the installer node.
Adding files to cfm is simple: create the new file in the appropriate directory under /etc/cfm/<nodegroup>. You must create all of the directories and subdirectories leading to the file, then place the file in the correct location. Existing files can also have a <filename>.append file. The contents of a <filename>.append file are automatically appended to the existing <filename> file on all nodes in the node group.
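Placing a file in the cfm tree amounts to mirroring its path on the nodes under /etc/cfm/<nodegroup>. The sketch below illustrates this; CFM_ROOT, cfm_add and cfm_append are hypothetical helpers and are not part of Red Hat HPC.

```shell
#!/bin/sh
# Sketch: place a file, or an append fragment, into a node group's cfm tree.
# CFM_ROOT, cfm_add and cfm_append are hypothetical helpers for illustration.

CFM_ROOT=${CFM_ROOT:-/etc/cfm}

# usage: cfm_add <nodegroup> <source-file> <absolute-path-on-nodes>
cfm_add() {
    dest="$CFM_ROOT/$1$3"
    # Create every missing directory below the node group root, then copy the file.
    mkdir -p "$(dirname "$dest")" && cp "$2" "$dest"
}

# usage: cfm_append <nodegroup> <absolute-path-on-nodes> <line-of-text>
cfm_append() {
    dest="$CFM_ROOT/$1$2.append"
    # Add one line to <filename>.append, creating the file if needed.
    mkdir -p "$(dirname "$dest")" && printf '%s\n' "$3" >> "$dest"
}
```

For instance, cfm_add compute-rhel ./motd /etc/motd would create /etc/cfm/compute-rhel/etc/motd. The nodes only pick up the change once they are notified with cfmsync.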
To notify all of the nodes in all node groups, or only the nodes in a single node group, use the cfmsync command. For example:
# cfmsync -f -n compute-rhel
Synchronizes all files in the compute-rhel node group.
# cfmsync -f
Synchronizes all files in all node groups.
For more information on cfmsync, see its man page.
Red Hat HPC manages updates to the installer node differently from all other nodes in the cluster. The rpm packages and updates in the Operating System repository for all nodes provisioned by the installer (including compute nodes and diskless nodes) are managed independently of updates to the installer node. To update the installer node, use the following command:
# yum update
The yum tool will download all of the required updates for the operating system and install them on the installer node. Because the installer node and the compute nodes are updated separately, you can update the installer node alone and decide independently whether to update the compute nodes.
To update the compute nodes in a Red Hat HPC cluster the following command must be used:
# repopatch -r rhel5_x86_64
The repopatch tool will download all of the required updates for the operating system and install them into the repository for the compute nodes. Repopatch may display an error if it is not properly configured, for example:
# repopatch -r rhel5_x86_64
Getting updates for rhel-5-x86_64. This may take awhile…
Unable to get updates. Reason: Please configure /opt/kusu/etc/updates.conf
Edit the /opt/kusu/etc/updates.conf file adding your username and password for Red Hat Network to the [rhel] section of the file, for example:
[fedora]
url=http://download.fedora.redhat.com/pub/fedora/linux/
[rhel]
username=
password=
url=https://rhn.redhat.com/XMLRPC
yumrhn=https://rhn.redhat.com/rpc/api
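Before running repopatch, it can be useful to confirm that the credentials have actually been filled in. The sketch below assumes only the simple section and key=value layout shown in the example above; UPDATES_CONF and rhel_credentials_set are hypothetical names, and the check does not validate the credentials against Red Hat Network.

```shell
#!/bin/sh
# Sketch: check that the [rhel] section of updates.conf has non-empty credentials.
# UPDATES_CONF and rhel_credentials_set are hypothetical names for illustration.

UPDATES_CONF=${UPDATES_CONF:-/opt/kusu/etc/updates.conf}

# Succeeds only if both username= and password= in [rhel] have a value.
rhel_credentials_set() {
    awk -F= '
        /^\[/                        { in_rhel = ($0 == "[rhel]") }
        in_rhel && $1 == "username"  { user = $2 }
        in_rhel && $1 == "password"  { pass = $2 }
        END { if (user != "" && pass != "") exit 0; exit 1 }
    ' "$1"
}

if rhel_credentials_set "$UPDATES_CONF" 2>/dev/null; then
    echo "updates.conf: [rhel] credentials are set"
else
    echo "updates.conf: [rhel] username or password is missing"
fi
```

A misspelled key is treated the same as a missing one, so the check also catches typing mistakes in the key names.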
After the /opt/kusu/etc/updates.conf file is configured, repopatch should download all of the updates from Red Hat Network and create an update kit, which is then associated with the rhel-5-x86_64 repository using ngedit. Repopatch should automatically associate the update kit with the correct repository. You can view the list of update kit components on the Components screen in ngedit and list the available update kits with the kitops command, for example:

Note: Remember that yum is used to update the installer node directly from Red Hat Network or other yum repositories. The repopatch command updates the compute nodes or other nodes provisioned by the installer node.