Overview

OpenShift Origin clusters can be provisioned with persistent storage using Ceph RBD.

Persistent volumes (PVs) and persistent volume claims (PVCs) can share volumes across a single project. While the Ceph RBD-specific information contained in a PV definition could also be defined directly in a pod definition, doing so does not create the volume as a distinct cluster resource, making the volume more susceptible to conflicts.

This topic presumes some familiarity with OpenShift Origin and Ceph RBD. See the Persistent Storage concept topic for details on the OpenShift Origin persistent volume (PV) framework in general.

Project and namespace are used interchangeably throughout this document. See Projects and Users for details on the relationship.

High-availability of storage in the infrastructure is left to the underlying storage provider.

Provisioning

To provision Ceph volumes, the following are required:

  • An existing storage device in your underlying infrastructure.

  • The Ceph key to be used in an OpenShift Origin secret object.

  • The Ceph image name.

  • The file system type on top of the block storage (e.g., ext4).

  • ceph-common installed on each schedulable OpenShift Origin node in your cluster:

    # yum install ceph-common

Creating the Ceph Secret

Define the authorization key in a secret configuration, which is then converted to base64 for use by OpenShift Origin.

In order to use Ceph storage to back a persistent volume, the secret must be created in the same project as the PVC and pod. The secret cannot simply be in the default project.

  1. Run ceph auth get-key on a Ceph MON node to display the key value for the client.admin user:

    apiVersion: v1
    kind: Secret
    metadata:
      name: ceph-secret
    data:
      key: QVFBOFF2SlZheUJQRVJBQWgvS2cwT1laQUhPQno3akZwekxxdGc9PQ==
  2. Save the secret definition to a file, for example ceph-secret.yaml, then create the secret:

    $ oc create -f ceph-secret.yaml
  3. Verify that the secret was created:

    # oc get secret ceph-secret
    NAME          TYPE      DATA      AGE
    ceph-secret   Opaque    1         23d

Creating the Persistent Volume

Ceph RBD does not support the 'Recycle' recycling policy.

Developers request Ceph RBD storage by referencing either a PVC, or the Gluster volume plug-in directly in the volumes section of a pod specification. A PVC exists only in the user’s namespace and can be referenced only by pods within that same namespace. Any attempt to access a PV from a different namespace causes the pod to fail.

  1. Define the PV in an object definition before creating it in OpenShift Origin:

    Example 1. Persistent Volume Object Definition Using Ceph RBD
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: ceph-pv (1)
    spec:
      capacity:
        storage: 2Gi (2)
      accessModes:
        - ReadWriteOnce (3)
      rbd: (4)
        monitors: (5)
          - 192.168.122.133:6789
        pool: rbd
        image: ceph-image
        user: admin
        secretRef:
          name: ceph-secret (6)
        fsType: ext4 (7)
        readOnly: false
      persistentVolumeReclaimPolicy: Retain
    1 The name of the PV that is referenced in pod definitions or displayed in various oc volume commands.
    2 The amount of storage allocated to this volume.
    3 accessModes are used as labels to match a PV and a PVC. They currently do not define any form of access control. All block storage is defined to be single user (non-shared storage).
    4 The volume type being used, in this case the rbd plug-in.
    5 An array of Ceph monitor IP addresses and ports.
    6 The Ceph secret used to create a secure connection from OpenShift Origin to the Ceph server.
    7 The file system type mounted on the Ceph RBD block device.

    Changing the value of the fstype parameter after the volume has been formatted and provisioned can result in data loss and pod failure.

  2. Save your definition to a file, for example ceph-pv.yaml, and create the PV:

    # oc create -f ceph-pv.yaml
  3. Verify that the persistent volume was created:

    # oc get pv
    NAME                     LABELS    CAPACITY     ACCESSMODES   STATUS      CLAIM     REASON    AGE
    ceph-pv                  <none>    2147483648   RWO           Available                       2s
  4. Create a PVC that will bind to the new PV:

    Example 2. PVC Object Definition
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: ceph-claim
    spec:
      accessModes: (1)
        - ReadWriteOnce
      resources:
        requests:
          storage: 2Gi  (2)
    1 The accessModes do not enforce access right, but instead act as labels to match a PV to a PVC.
    2 This claim looks for PVs offering 2Gi or greater capacity.
  5. Save the definition to a file, for example ceph-claim.yaml, and create the PVC:

    # oc create -f ceph-claim.yaml

Ceph Volume Security

See the full Volume Security topic before implementing Ceph RBD volumes.

A significant difference between shared volumes (NFS and GlusterFS) and block volumes (Ceph RBD, iSCSI, and most cloud storage), is that the user and group IDs defined in the pod definition or container image are applied to the target physical storage. This is referred to as managing ownership of the block device. For example, if the Ceph RBD mount has its owner set to 123 and its group ID set to 567, and if the pod defines its runAsUser set to 222 and its fsGroup to be 7777, then the Ceph RBD physical mount’s ownership will be changed to 222:7777.

Even if the user and group IDs are not defined in the pod specification, the resulting pod may have defaults defined for these IDs based on its matching SCC, or its project. See the full Volume Security topic which covers storage aspects of SCCs and defaults in greater detail.

A pod defines the group ownership of a Ceph RBD volume using the fsGroup stanza under the pod’s securityContext definition:

spec:
  containers:
    - name:
    ...
  securityContext: (1)
    fsGroup: 7777  (2)
1 The securityContext must be defined at the pod level, not under a specific container.
2 All containers in the pod will have the same fsGroup ID.