Getting Started with InfiniBand

The first step in using a new InfiniBand-based network is to install the right packages. These are the InfiniBand-related packages we ship and what each one is for (note: not all of the Fedora packages have been built or pushed to the repositories yet, so they are mentioned here as "coming soon" rather than as already available):

Once you have the right packages installed, the next step is to get a basic system setup in place. The InfiniBand stack has both a kernel component and a user space component. To use any given piece of hardware, you need both the kernel driver and the user space driver for that hardware. In the case of mlx4 hardware, which has a two-part kernel driver, that means you need the core mlx4 kernel driver (mlx4_core) and also the InfiniBand mlx4 driver (mlx4_ib). When using InfiniBand, it is best to make sure you have the openib package installed. These drivers are then loaded into the kernel automatically if you enable the openibd service. That takes care of the kernel space component (you don't need to tell openibd what hardware you have; it figures that out for itself). You can adjust which kernel modules are loaded by editing /etc/ofed/openib.conf to suit your needs.
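As a rough illustration of checking the kernel side, the sketch below tests whether a set of modules appears in lsmod-style output. The function name ib_modules_loaded is hypothetical, and the module names passed to it are the mlx4 ones mentioned above; substitute the ones for your own hardware.

```shell
#!/bin/sh
# Hypothetical helper: report whether every named kernel module appears
# in lsmod-style text read from stdin (module name is the first column).
ib_modules_loaded() {
    listing=$(cat)
    for mod in "$@"; do
        if ! printf '%s\n' "$listing" |
            awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }'; then
            echo "missing: $mod"
            return 1
        fi
    done
    echo "all loaded"
}
```

On a live system this would be called as `lsmod | ib_modules_loaded mlx4_core mlx4_ib` once the openibd service is running. On a Red Hat style init system of this era, the service is typically enabled with `chkconfig openibd on` and started with `service openibd start`.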

If you attempt to run an InfiniBand application and it says you have no hardware, then either you haven't enabled the openibd service or you really don't have any hardware ;-) However, if it says it can't find a driver for your hardware, it is talking about the user space driver, not the kernel space driver.

For the user space component, you need the core InfiniBand hardware library, libibverbs, installed, along with the hardware-specific library packages. The common hardware-specific library packages are: libmthca, libmlx4, libipathverbs, libcxgb3, libehca, and libnes. Once these packages are installed, libibverbs automatically loads whichever ones are needed to support the hardware the kernel detected in your machine.

Once you have the proper kernel space and user space hardware drivers loaded, you are almost ready to start rudimentary testing. The next thing you need is to select one machine to be the subnet manager for your InfiniBand network (you can select more than one, but that requires manual editing of settings that I don't cover here). This machine needs to have the opensm package installed and its service script, opensmd, enabled. The default settings will work for most people. However, if you have multiple, redundant InfiniBand fabrics, you will need to configure more than one machine to start opensm, because it attaches to only one fabric each time it is run. You will also need to configure the additional instances of opensm to bind to the proper port, so that they manage the redundant fabric instead of the default fabric. You can edit /etc/ofed/opensm.conf to change which port opensm binds to (NOTE: the opensm.conf file in RHEL 5.3/4.8 is changing format compared to the one in RHEL 5.2/4.7, so hand modifications will need to be forward ported to the new config file when these updates are released).

Once you have selected the opensm machine and started it with both the openibd and opensmd services enabled, you should have a functional InfiniBand fabric. An easy way to test this is to make sure that the libibverbs-utils package is installed, then run ibv_devices and ibv_devinfo to see which InfiniBand/iWARP devices the system thinks are present. If your devices are found and ibv_devinfo shows your port state as active, you are ready to run programs on the InfiniBand fabric.
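The port state check can be scripted by counting the ports that ibv_devinfo reports as active. This is only a sketch: it assumes the state appears on a line of the form `state: PORT_ACTIVE (4)`, so compare against the output on your own system first. The function name count_active_ports is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: count ports in the active state in ibv_devinfo
# output read from stdin. Assumes lines like "state: PORT_ACTIVE (4)".
count_active_ports() {
    # The pattern allows only whitespace before "state:" so that a
    # "phys_state:" line (verbose output) is never counted.
    # "|| true" keeps the exit status zero when the count is 0.
    grep -c '^[[:space:]]*state:[[:space:]]*PORT_ACTIVE' || true
}
```

A typical call would be `ibv_devinfo | count_active_ports`; a result of 0 suggests that either the link is down or no subnet manager has brought the port up yet.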

In addition to this, you can create TCP/IP interfaces over the InfiniBand network (IPoIB). To do so, you need to create the ifcfg-ib0 (and possibly ifcfg-ib1) file in /etc/sysconfig/network-scripts. IPoIB interface types have not been added to our system-config-network tool, hence the need to create the files manually. In addition, IPoIB interfaces do not support DHCP, so they must be statically configured. A sample ifcfg-ib0 file looks like this:

    DEVICE=ib0
    TYPE=Infiniband
    BOOTPROTO=static
    BROADCAST=192.168.0.255
    IPADDR=192.168.0.1
    NETMASK=255.255.255.0
    NETWORK=192.168.0.0
    ONBOOT=yes
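If you need the same file for a second port, a small generator saves some typing. The sketch below is an illustration only: the function name make_ifcfg_ib is hypothetical, and it assumes a /24 network so that the broadcast and network addresses can be derived from the first three octets.

```shell
#!/bin/sh
# Hypothetical helper: print an ifcfg file for an IPoIB device.
# Arguments: device name (e.g. ib1) and IPv4 address.
# Assumption: the interface sits on a /24 network.
make_ifcfg_ib() {
    dev=$1
    addr=$2
    prefix=${addr%.*}    # first three octets, e.g. 192.168.1
    cat <<EOF
DEVICE=$dev
TYPE=Infiniband
BOOTPROTO=static
BROADCAST=$prefix.255
IPADDR=$addr
NETMASK=255.255.255.0
NETWORK=$prefix.0
ONBOOT=yes
EOF
}
```

For example, `make_ifcfg_ib ib1 192.168.1.1 > /etc/sysconfig/network-scripts/ifcfg-ib1` would write a file for the second port; the address here is just a sample value.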
    


If you have two IB ports plugged into the same InfiniBand fabric (that is, both on the same subnet rather than each port on its own subnet) and you have IPoIB enabled on both ports, connections initiated using IPoIB interface addresses can sometimes work and sometimes fail. To avoid that confusion, it is best to add the following lines to your /etc/sysctl.conf file:

    net.ipv4.conf.all.arp_ignore=1
    net.ipv4.conf.ib0.arp_ignore=1
    net.ipv4.conf.ib1.arp_ignore=1
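For background, with arp_ignore set to 1 the kernel answers an ARP request only if the target address is configured on the interface the request arrived on; with the default of 0, either port may answer for the other's address. If you have more IPoIB interfaces than the two shown, the lines can be generated rather than typed. The function name arp_ignore_lines below is hypothetical and the sketch is only meant to show the pattern.

```shell
#!/bin/sh
# Hypothetical helper: print arp_ignore settings for "all" plus each
# IPoIB interface named on the command line.
arp_ignore_lines() {
    for ifname in all "$@"; do
        echo "net.ipv4.conf.$ifname.arp_ignore=1"
    done
}
```

Running `arp_ignore_lines ib0 ib1` prints the three lines shown above. After adding them to /etc/sysctl.conf, `sysctl -p` reloads the file so the settings take effect without a reboot.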
    


If you intend to run InfiniBand applications as any user other than root, you will also need to adjust the maximum locked memory for the system. This is done by modifying the /etc/security/limits.conf file. Depending on whether you want to raise the limit for a specific group that is allowed to run InfiniBand applications or for all logins, your change should look something like this:

    @ib_user - memlock 8192
    


or

    * - memlock 8192
    


The value used above is a sample value (limits.conf expresses memlock in KB). You can set the limit to -1 to remove it entirely. The actual amount of locked memory your application needs depends on how many connections it opens and how large a message queue it allocates for each connection, plus the memory for the actual read/write buffers it sends. All RDMA memory must be locked into physical memory so that the InfiniBand/iWARP hardware can safely access it via DMA.
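To confirm that a new limit is actually in effect for a user, you can compare what the shell reports against what the application needs. The sketch below is a minimal illustration; memlock_sufficient is a hypothetical name, and it assumes the limit is given the way `ulimit -l` prints it in bash, as a number of KB or the word "unlimited".

```shell
#!/bin/sh
# Hypothetical helper: decide whether a locked-memory limit covers a
# required size. Arguments: the limit as printed by "ulimit -l" (KB or
# the word "unlimited") and the required amount in KB.
memlock_sufficient() {
    limit=$1
    required_kb=$2
    if [ "$limit" = "unlimited" ] || [ "$limit" -ge "$required_kb" ]; then
        echo "ok"
    else
        echo "too low"
        return 1
    fi
}
```

A typical check would be `memlock_sufficient "$(ulimit -l)" 8192`, run as the user in question. Since limits.conf is applied when a session is opened, log out and back in before testing a changed value.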

Once you have reached this point, you should have at least a functional network of InfiniBand-connected machines. You should be able to use the various tools listed in the package list above to test both basic functionality and performance of your InfiniBand network. Now, putting that network to use is another matter ;-)