GlusterFS Overview

Author: Bowe Strickland, Curriculum Manager
Contact: bowe@redhat.com
Copyright: Copyright © 2012, Red Hat, Inc. All rights reserved.

Red Hat Storage Software Appliance

Goal

Provide scalable, redundant, network-available storage with minimal administrative headaches.

Overview: Nouns

  • Volume

    A network-available filesystem that can be mounted using native GlusterFS, NFS, CIFS, ...

  • Trusted Storage Pool

    A collection of peered nodes that together provide volumes.

  • Node

    A host running the glusterd daemon, with locally attached bricks.

  • Brick

    A locally attached xfs filesystem.

[Figure: gluster.png]

Overview: Adjectives

Volumes can be...

  • Distributed
  • Replicated
  • (Striped)

Some Details: Installation

  • Officially supported hardware

    • Specific Dell, HP, and SuperMicro configurations
    • Check User Guide, Knowledge base
  • RHSSA distributed as installation ISO

    • http://access.redhat.com

    • rhssa-3.2-dvd.iso

      ...
      |-- images              <-- installer images
      |   `-- pxeboot
      |-- isolinux            <-- installer bootloader
      |-- Packages            <-- reduced RHEL Server 6.1
      |-- repodata
      |-- SRPMS
      `-- SSA                 <-- GlusterFS, XFS filesystem utils
          `-- repodata
      
  • Install via DVD, USB, PXE

    • Minimal configuration: partitioning, root password

    • Embedded Kickstart File:

      %packages --nobase
      @core
      @ssa-tools
      @glusterfs-all
      @scalable-file-systems
      %end
      
      %post --nochroot
      sed -i -e '/ONBOOT/c ONBOOT=yes\nBOOTPROTO=dhcp' \
          /mnt/sysimage/etc/sysconfig/network-scripts/ifcfg-eth0
      %end
      
      %post
      /usr/sbin/fix-grub.sh
      %end
      

Preparing Bricks

  • Create and Mount XFS filesystem:

    [root@gnode1 ~]# pvcreate /dev/vdb
      Physical volume "/dev/vdb" successfully created
    
    [root@gnode1 ~]# vgcreate vg_node /dev/vdb
      Volume group "vg_node" successfully created
    
    [root@gnode1 ~]# lvcreate -n gnode1a -L 1g vg_node
      Logical volume "gnode1a" created
    
    [root@gnode1 ~]# mkfs.xfs -i size=512 /dev/vg_node/gnode1a
    meta-data=/dev/vg_node/gnode1a   isize=512    agcount=4, agsize=65536 blks
    ...
    
  • Define persistent mount in /etc/fstab file

  • ext3/ext4 filesystem support deprecated

  • Mount points must be unique throughout the entire storage pool.
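The persistent mount mentioned above can be sketched as a small helper that prints an /etc/fstab entry for a brick. The device and mount point are taken from the other examples in this deck; the helper name brick_fstab_line and the "defaults 0 0" mount options are assumptions made for illustration.

```shell
# Print an /etc/fstab entry for one XFS brick.
# "defaults 0 0" is an assumed choice of options, not taken from the slides.
brick_fstab_line() {
    device=$1
    mount_point=$2
    printf '%s %s xfs defaults 0 0\n' "$device" "$mount_point"
}

# Possible usage as root on gnode1 (left as comments so nothing is modified):
#   mkdir -p /mnt/bricks/gnode1a
#   brick_fstab_line /dev/vg_node/gnode1a /mnt/bricks/gnode1a >> /etc/fstab
#   mount /mnt/bricks/gnode1a
```

Printing the line first, rather than appending it directly, makes it easy to review the entry before it reaches /etc/fstab.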

Verbs: Nodes

  • Nodes

    • probe:

      [root@gnode1 ~]# gluster peer status
      No peers present
      
      [root@gnode1 ~]# gluster peer probe gnode2
      Probe successful
      
      [root@gnode1 ~]# gluster peer status
      Number of Peers: 1
      
      Hostname: gnode2
      Uuid: eaf84d73-78c0-4758-a49f-c2cba7768b18
      State: Peer in Cluster (Connected)
      
    • detach, status
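When several nodes need to be peered, the probe shown above can be repeated in a loop. The following is a minimal sketch; the function name probe_peers and the GLUSTER variable (which allows a dry run with echo) are illustrative assumptions, while gluster peer probe and gluster peer status are the commands shown above.

```shell
# Command used to talk to glusterd; set GLUSTER=echo for a dry run.
GLUSTER=${GLUSTER:-gluster}

# Probe each host given as an argument, stopping at the first failure.
probe_peers() {
    for host in "$@"; do
        "$GLUSTER" peer probe "$host" || return 1
    done
}

# Possible usage on gnode1:
#   probe_peers gnode2 gnode3 gnode4
#   gluster peer status
```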

Verbs: Volumes

  • Volumes

    • create:

      [root@gnode1 ~]# gluster volume create gvola \
              gnode1:/mnt/bricks/gnode1a gnode2:/mnt/bricks/gnode2a
      Creation of volume gvola has been successful. Please start the volume to
      access data.
      
      [root@gnode1 ~]# gluster volume info gvola
      
      Volume Name: gvola
      Type: Distribute
      Status: Created
      Number of Bricks: 2
      Transport-type: tcp
      Bricks:
      Brick1: gnode1:/mnt/bricks/gnode1a
      Brick2: gnode2:/mnt/bricks/gnode2a
      
    • start:

      [root@gnode1 ~]# gluster volume start gvola
      Starting volume gvola has been successful
      
      [root@gnode1 ~]# gluster volume info gvola
      
      Volume Name: gvola
      ...
      Status: Started
      ...
      
    • stop, info
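The create and start steps above can be combined in a short helper. This sketch assumes every brick follows the NODE:/mnt/bricks/NODEa pattern used in the examples; brick_list, create_and_start, and the GLUSTER dry-run variable are hypothetical names introduced here.

```shell
# Command used to talk to glusterd; set GLUSTER=echo for a dry run.
GLUSTER=${GLUSTER:-gluster}

# Turn node names into brick specifications: gnode1 -> gnode1:/mnt/bricks/gnode1a
brick_list() {
    list=""
    for node in "$@"; do
        list="${list:+$list }$node:/mnt/bricks/${node}a"
    done
    printf '%s\n' "$list"
}

# create_and_start VOLUME NODE...
create_and_start() {
    volume=$1
    shift
    # The unquoted substitution is intentional: each brick becomes one argument.
    "$GLUSTER" volume create "$volume" $(brick_list "$@") &&
        "$GLUSTER" volume start "$volume"
}

# Possible usage on gnode1:
#   create_and_start gvola gnode1 gnode2
```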

More Interesting Verbs: Volumes

  • Volumes

    • {add,remove,replace}-brick

      Volumes can be resized by adding or removing bricks, and failed equipment can be replaced. These changes can be performed live on active filesystems.

    • rebalance

      For distributed volumes, new data will begin using newly attached storage immediately. Existing data must be manually rebalanced to take advantage of the new configuration.

    • (heal)

      For replicated systems, if access to a particular node is interrupted, data reconstruction (healing) will occur when access is resumed.
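As an illustration of add-brick followed by rebalance, the sketch below grows a volume and then starts and checks a rebalance. It assumes the new brick lives on an already peered node and is not used by another volume; grow_volume and the GLUSTER dry-run variable are hypothetical names introduced here.

```shell
# Command used to talk to glusterd; set GLUSTER=echo for a dry run.
GLUSTER=${GLUSTER:-gluster}

# grow_volume VOLUME BRICK...
# Adds the bricks, then starts a rebalance and reports its status.
grow_volume() {
    volume=$1
    shift
    "$GLUSTER" volume add-brick "$volume" "$@" || return 1
    "$GLUSTER" volume rebalance "$volume" start || return 1
    "$GLUSTER" volume rebalance "$volume" status
}

# Possible usage on gnode1:
#   grow_volume gvola gnode3:/mnt/bricks/gnode3a
```

The status call only reports a snapshot; on a large volume it would need to be repeated until the rebalance completes.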

Clients

  • Native GlusterFS

    • Mounting:

      [root@gnode4 ~]# mkdir -p /mnt/volumes/gvola
      [root@gnode4 ~]# mount -t glusterfs gnode1:gvola /mnt/volumes/gvola
      
    • (Obviously, this is a bit artificial. Volumes would be mounted on remote clients, not storage software appliances.)

    • Any peer in the storage pool can be named in the mount command

    • requires glusterfs-fuse-<version>.el6.x86_64

    • implemented in userspace using FUSE

  • NFSv3 (tcp)

    • Mounting:

      [root@gnode4 ~]# showmount -e gnode2
      Export list for gnode2:
      /gvola *
      
      [root@gnode4 ~]# mkdir -p /mnt/volumes/gvola_nfs
      [root@gnode4 ~]# mount -t nfs -o tcp,vers=3 gnode2:/gvola /mnt/volumes/gvola_nfs
      
  • For default distributed storage, storage space aggregates:

    [root@gnode4 ~]# df -h
    Filesystem            Size  Used Avail Use% Mounted on
    ...
    gnode1:gvola          2.0G   65M  2.0G   4% /mnt/volumes/gvola
    gnode2:/gvola         2.0G   65M  2.0G   4% /mnt/volumes/gvola_nfs
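The aggregation shown by df can be checked with simple arithmetic. The sketch below assumes equally sized bricks and takes an optional replica count, since each replica stores a full copy of the data and therefore divides the usable space; usable_gib is a hypothetical helper name.

```shell
# usable_gib BRICK_GIB BRICK_COUNT [REPLICA_COUNT]
# Expected usable capacity in GiB, ignoring filesystem overhead.
usable_gib() {
    brick_gib=$1
    brick_count=$2
    replica=${3:-1}
    echo $(( brick_gib * brick_count / replica ))
}

usable_gib 1 2    # two 1 GiB bricks, distributed: prints 2
```

The result for two 1 GiB bricks matches the 2.0G reported for gvola above.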
    

Labs

Starting Environment

  • 4 RHSSA nodes as virtualized guests: gnode[1234]
  • root password: redhat
  • 4GiB of available storage (/dev/vdb)
  • Access to sudo /usr/bin/virt-manager

Step 1: Configure bricks

On each of the 4 guests, create and mount a 1GiB partition formatted with an xfs filesystem. The partition should be mounted on the /mnt/bricks/gnode<NODENUMBER>a directory. For flexibility, you are encouraged to create logical volumes.

As a convenience, the following script is available to your guests. It can be run on each of your nodes, with the node number as a single argument. The following would be appropriate for node gnode2:

SUMMIT=http://192.168.0.254/pub/materials/summit
curl -O $SUMMIT/init_storage.sh
bash init_storage.sh 2
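The contents of init_storage.sh are not reproduced here. For anyone preparing a brick by hand, the following hypothetical helper prints, without running them, the commands from the Preparing Bricks section adapted to a given node number. It assumes /dev/vdb is the spare disk and that bricks follow the gnode<N>a naming used in the volume create examples.

```shell
# Print (do not run) the brick preparation commands for node number N.
# This is an illustration only, not the actual init_storage.sh.
brick_setup_commands() {
    n=$1
    lv="gnode${n}a"
    cat <<EOF
pvcreate /dev/vdb
vgcreate vg_node /dev/vdb
lvcreate -n $lv -L 1g vg_node
mkfs.xfs -i size=512 /dev/vg_node/$lv
mkdir -p /mnt/bricks/$lv
mount /dev/vg_node/$lv /mnt/bricks/$lv
EOF
}

# Possible usage on gnode2, reviewing the commands before running them as root:
#   brick_setup_commands 2
```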

Step 2: Add peer nodes

Choose one node to operate from. On that node, use the gluster peer probe command to peer with the remaining 3 nodes. Use the gluster peer status command to confirm your progress.

Step 3: Create and start a distributed volume

On one of your nodes, use the gluster volume create command to create a distributed volume named gvola which consists of the bricks provided by gnode1 and gnode2.

Use the gluster volume info command to monitor your progress.

Step 4: Mount the distributed volume

  1. On gnode4, create the /mnt/volumes/gvola mount point. Use the mount command to mount the volume using the native glusterfs filesystem.
  2. Once mounted, copy an arbitrary file into the volume.
  3. Examine the brick directories on gnode1 and gnode2. Is there evidence of the file in one brick, or in both?
  4. How does the capacity of the mounted volume compare to the capacities of the relevant bricks?

Alternative Lab A: Create a replicated volume

  1. On one of your nodes, use the gluster volume create command to create a 2-way replicated volume named gvolrepl which consists of the bricks provided by gnode3 and gnode4.

    Your gluster volume create command will probably look similar to the following:

    [root@gnode4 ~]# gluster volume create gvolrepl replica 2 \
            gnode3:/mnt/bricks/gnode3a gnode4:/mnt/bricks/gnode4a
    
  2. Start your volume.

  3. On gnode4, create the mount point /mnt/volumes/gvolrepl, and mount the volume using the glusterfs native filesystem.

  4. How does the capacity of the mounted volume compare to the capacities of the relevant bricks?

  5. Copy an arbitrary file into the mounted volume.

  6. Examine the brick directories on gnode3 and gnode4. Is there evidence of the file in one brick, or in both?

  7. Using virt-manager, use "Force Stop" to immediately terminate the gnode3 guest. Is the mounted volume still available on gnode4?

  8. While gnode3 is stopped, copy new arbitrary files into the mounted volume.

  9. Using virt-manager, restart the gnode3 guest. Upon rebooting, monitor the contents of its brick. Can you observe the self-healing as the files copied in the previous step are reconstructed?
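One way to watch for a healed file in the last step is to poll the brick directory until the file appears. The following is a minimal sketch; wait_for_file is a hypothetical helper, and the file name in the usage comment is a placeholder for whatever file was copied while gnode3 was stopped.

```shell
# wait_for_file PATH [TIMEOUT_SECONDS]
# Poll once per second until PATH exists; fail when the timeout expires.
wait_for_file() {
    path=$1
    remaining=${2:-60}
    while [ ! -e "$path" ]; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    echo "found: $path"
}

# Possible usage on gnode3 after it reboots (the file name is a placeholder):
#   wait_for_file /mnt/bricks/gnode3a/newfile 120
```

If the file does not appear within the timeout, reading the files through the mounted volume on gnode4 and then polling again may be worth trying.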