name: inverse
class: left, middle, bg
layout: true
---
layout: true
class: top, left, bg
---
template: inverse
class: center, middle

# SELinux in the containerized world

Jan Šafránek, jsafrane@redhat.com
jsafrane
---
# Agenda

* Brief SELinux intro
* Containers
* Kubernetes
* OpenShift
---
template: inverse
class: center, middle

# Linux without SELinux
---
# Example: Apache web server without SELinux
---
# Example: Apache web server without SELinux

* ... until a CVE like "*[CVE-2023-4911](https://nvd.nist.gov/vuln/detail/CVE-2023-4911) could allow a local attacker [...] to execute code with elevated privileges*."
---
template: inverse
class: center, middle

# Security-Enhanced Linux (SELinux)
---
# Example: Apache web server with SELinux
Policy:

```
allow httpd_t httpd_sys_content_t:file { getattr ioctl lock map open read };
```
---
template: inverse
class: center, middle

# Containers
---
# Containers

* Run processes in an isolated environment.
--
* ... until they don't:
  * CVE-2015-3629
  * CVE-2015-3627
  * CVE-2015-3630
  * CVE-2015-3631
  * CVE-2016-9962
  * ...
---
template: inverse
class: center, middle

# SELinux in containers
---
# SELinux in containers
## Default: Multi-Category Security (MCS)
*What happens in a container stays in the container.*
---
# SELinux in containers
## `podman`, `cri-o`

* SELinux support enabled by default.

## `docker`, `containerd`

* SELinux support must be explicitly enabled.
---
# Container volumes

* Containers are ephemeral.

```
$ podman run mongo
```
--
* ... except for *volumes*.

```
$ mkdir /mnt/my-mongo
$ mount /dev/sdb /mnt/my-mongo
$ podman run -v /mnt/my-mongo:/data/db mongo
```
---
# SELinux in container volumes

```
$ podman run -v /mnt/my-mongo:/data/db mongo
find: '/data/db': Permission denied
chown: changing ownership of '/data/db': Permission denied
```
---
# SELinux Container volumes relabeling

`docker / podman run -v HOST-DIR:CONTAINER-DIR:OPTIONS`

* Empty options: no relabeling at all.
* `:Z` - apply a *private* label, with all categories (`container_file_t:s0:c634,c767`).
  * Only a process with the same categories can access it (= the same container).
* `:z` - apply a *shared* label, without any categories (`container_file_t:s0`).
  * Any container can access it.

```
$ podman run -v /mnt/my-mongo:/data/db:Z mongo
... "msg":"MongoDB starting" ...
```
---
# SELinux Container volumes relabeling

```
$ podman run -v /mnt/my-mongo:/data/db:Z mongo
... "msg":"MongoDB starting" ...
```

1. Podman allocates random + unique SELinux categories, say `c1,c2`.
2. Podman unpacks the `mongo` image + relabels it to `container_file_t:s0:c1,c2`.
3. Podman re-labels `/mnt/my-mongo` to `container_file_t:s0:c1,c2`.
4. Podman runs the main mongo process as `container_t:s0:c1,c2`.
--
*Every time the container starts.*
---
# SELinux container pitfalls

* `:z`/`:Z` relabels the whole volume.
  * Can be slow.
--
* `:z`/`:Z` relabels the whole volume.
  * `docker run -v /etc/shadow:/host/shadow:Z busybox sh`
--
  * `docker run -v /:/host:Z busybox sh`
--
* Use rootless `docker` / `podman`.
--
* Avoid using `:z`/`:Z` by providing an SELinux policy for your container.
  * *`mongo` container can access `/mnt/my-mongo` on the host.*
  * `podman`: `udica`, [Using SELinux with container runtimes](https://www.youtube.com/watch?v=FOny29a31ls).
  * Kubernetes/OpenShift: [Security Profiles Operator](https://github.com/kubernetes-sigs/security-profiles-operator).
    * `HostPath` volumes only.
---
template: inverse
class: center, middle

# SELinux in Kubernetes
---
# SELinux in Kubernetes

* Simple optional API to set SELinux label.

```
...
containers:
- name: web
  image: nginx:latest
  securityContext:
    seLinuxOptions:
      type: "my_label_t"
      level: "s0:c1,c2"
```
---
# SELinux in Kubernetes

* Defaults to container runtime defaults.
  * -> Random pair of MCS categories.
--
* All\* container volumes have *private* label.
  * No API to set shared label.
  * No API to skip relabeling.
* Sharing volumes among containers: **hard**.
  * All containers that need to access the same volume must have the same SELinux label.
  * Policy engine *could* help (Gatekeeper, Kyverno, ...)

*) Except for HostPath volumes.
--
* Constant relabeling on container startup.
---
# SELinux in OpenShift

* Only `cri-o`, SELinux `Enforcing`.
* Security Context Constraints, SCC, as a "policy engine".
---
# Security Context Constraints, SCC

* Defaults in namespace annotations.
  * Generated automatically.
  * Each namespace has unique MCS categories.

```
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/sa.scc.mcs: s0:c55,c20
```
---
# Security Context Constraints, SCC

```
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/sa.scc.mcs: s0:c55,c20
```

* SCC `restricted-v2` (and most of the others):
  * Pod *must* run with the label from namespace.
  * All pods in the same namespace have the same SELinux label.
  * *What happens in a namespace stays in the namespace.*
  * Sharing volumes within a namespace: **easy**.
  * Sharing volumes among namespaces: **hard**.
--
* SCC `privileged`:
  * No SELinux label injection.
  * Pod author must explicitly provide one.
  * `cri-o` will allocate a random one if not provided!
---
# Kubernetes vs OpenShift
## Kubernetes

* Sharing volumes within namespace: **hard**.
* Sharing volumes among namespaces: **hard**.

## OpenShift

* Sharing volumes within namespace: **easy**.
* Sharing volumes among namespaces: **hard**.
---
## `podman` / `docker`

User can choose:

* No volume relabeling.
  * *I will label files myself and/or I will provide my own policy.*
* Apply shared label.
  * *Any container* can access the volume.
* Apply private label.
  * *No other container* can access the volume.

## Kubernetes / OpenShift

* Always applies private label on volumes.
  * Except for `HostPath`, [Security Profiles Operator](https://github.com/kubernetes-sigs/security-profiles-operator).
* See later for workarounds.
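---
# Private vs. shared labels: a sketch

The difference between a private and a shared volume label comes down to comparing MCS categories. The sketch below models only that comparison, in Python. The helper names `categories` and `can_access` are made up for illustration and are not part of any SELinux library. It assumes sensitivity `s0`, as on the slides, and ignores both category ranges and type enforcement.

```python
def categories(label: str) -> frozenset:
    """Return the MCS categories of an SELinux label.

    Accepts the short form used on the slides ('container_file_t:s0:c1,c2')
    as well as a full context. Simplifications: the sensitivity must be
    's0' and category ranges such as 'c0.c1023' are not expanded.
    """
    parts = label.split(":")
    after_s0 = parts[parts.index("s0") + 1:]
    cats = after_s0[0] if after_s0 else ""
    return frozenset(c for c in cats.split(",") if c)


def can_access(process_label: str, file_label: str) -> bool:
    """MCS part of the check only: the process must hold every category
    of the file. Type enforcement (container_t vs. container_file_t) is
    a separate check and is not modelled here."""
    return categories(process_label) >= categories(file_label)


if __name__ == "__main__":
    private = "container_file_t:s0:c634,c767"  # label applied by :Z
    shared = "container_file_t:s0"             # label applied by :z
    owner = "container_t:s0:c634,c767"
    other = "container_t:s0:c1,c2"

    print(can_access(owner, private))  # same categories
    print(can_access(other, private))  # different categories
    print(can_access(other, shared))   # file has no categories
```

In this model only the owner can access the private label, while any process can access the shared one. It also shows why OpenShift pods in one namespace can share a volume: they all run with the namespace's categories (for example `s0:c55,c20`).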
---
# Constant relabeling
## Use case: Jenkins

1. Alice installs Jenkins with (empty) PersistentVolume for test artifacts (logs, configs, dumps ...).
   * -> Instant relabeling.
2. Every Jenkins job leaves 20 test artifacts.
   * -> They get the right SELinux label on creation.
3. Jenkins runs 20 test jobs / hour.
4. After half a year, Alice updates Jenkins to a new version.
   * -> cri-o needs to relabel 1.7 million files!
   * -> Jenkins Pod is not running.
   * -> Alice calls support.
   * -> /me is not happy.
---
# Constant relabeling workaround
## Use `spc_t`

* Must be set explicitly in Pod.
  * Kubernetes: allowed by default.
  * OpenShift: needs elevated privileges.
* No relabeling.
* No SELinux protection.
---
# Constant relabeling workaround
## Skip relabeling if already done

* `cri-o` only.
* Requires custom `RuntimeClass`.
* Requires custom cri-o config file.
* Requires annotation `io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: "true"` on a Pod.
* Does the volume root dir have the right label?
  * No -> Relabel everything.
  * Yes -> NOOP.
---
# Constant relabeling solution
## [Mount with SELinux label](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1710-selinux-relabeling)

`mount -o context="container_file_t:s0:c1,c99"`

* Requires Kubernetes with the feature implemented.
  * 1.29: `SELinuxMountReadWriteOncePod` Beta (OpenShift 4.16).
  * 1.30: `SELinuxMount` Alpha (OpenShift 4.17).
* Requires compatible CSI driver.
  * No changes needed for block-based volumes (ext4, xfs, btrfs, ...)
  * Some changes needed for shared filesystems (NFS, SMB)
* Requires SELinux label set in the Pod.
  * Kubernetes: label needs to be set explicitly.
  * OpenShift: default with `restricted` SCC.
* May break some use cases.
  * Two pods sharing the same volume, but different subpaths of it.
* Testers wanted!
---
# Summary

* *What happens in a container stays in the container.*
  * *Or in a namespace.*
* Kubernetes: needs policy engine (or manual configuration).
* OpenShift: SCCs are useful!
* Beware of relabeling!
* Testers of SELinuxMount wanted!
---
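# Relabeling cost: a sketch

A back-of-the-envelope check of the Jenkins use case, as a small Python model. The artifact and job counts come from the slides. The 182-day "half a year" and the function names are assumptions made for illustration. `files_to_relabel` only mirrors the decision described for `TrySkipVolumeSELinuxLabel` (compare the label of the volume root directory); it does not call cri-o or SELinux.

```python
ARTIFACTS_PER_JOB = 20   # from the Jenkins use case
JOBS_PER_HOUR = 20       # from the Jenkins use case
DAYS = 182               # assumption: "half a year"


def files_on_volume(days: int = DAYS) -> int:
    """Files accumulated on the volume after `days` of continuous runs."""
    return ARTIFACTS_PER_JOB * JOBS_PER_HOUR * 24 * days


def files_to_relabel(root_label: str, wanted_label: str,
                     total_files: int, try_skip: bool) -> int:
    """Model of the relabel decision on container start.

    Without the skip option every file is relabeled on each start.
    With it, only the label of the volume root directory is compared:
    a match means no work, a mismatch means relabeling everything.
    """
    if try_skip and root_label == wanted_label:
        return 0
    return total_files


if __name__ == "__main__":
    total = files_on_volume()
    label = "container_file_t:s0:c1,c2"
    print(total)                                                  # 1747200
    print(files_to_relabel(label, label, total, try_skip=False))  # 1747200
    print(files_to_relabel(label, label, total, try_skip=True))   # 0
```

Under these assumptions the volume holds about 1.7 million files, all of which are relabeled on every start unless the root directory label already matches and the skip option is enabled.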