Demystifying systemd - Hands On


Introduction


Welcome! This lab consists of several mini-labs that each focus on a particular area or component of systemd. Overall, systemd is relatively simple to learn and, thanks to the project’s excellent documentation, even advanced functionality is easy to look up when needed.

Outline

NOTE: There are several options to configure this lab

  1. Local VM running your favorite distro.

Note: The systemd-specific parts of this lab will work on any distribution that ships a recent version of systemd. The parts of this lab that are *not* systemd-specific were written using Fedora 24. You are welcome to use any distro, but if you do, you’re responsible for translating things like rpm/dnf, package names, etc.

     1. Fedora cloud images are available here: https://getfedora.org/en/cloud/download/
     2. Simple script to create a VM w/ libvirt (see the Appendix)

  2. Use Vagrant to accomplish the same thing
     1. Fedora libvirt/KVM Image: https://download.fedoraproject.org/pub/fedora/linux/releases/24/CloudImages/x86_64/images/Fedora-Cloud-Base-Vagrant-24-1.2.x86_64.vagrant-libvirt.box
     2. Fedora VirtualBox Image: https://download.fedoraproject.org/pub/fedora/linux/releases/24/CloudImages/x86_64/images/Fedora-Cloud-Base-Vagrant-24-1.2.x86_64.vagrant-virtualbox.box

  3. Use a systemd-nspawn container (note that a few sections of this lab won’t be possible w/ this option)
     1. Debian
        1. debootstrap --arch=amd64 stable ~/DebianJessie
        2. systemd-nspawn -bD ~/DebianJessie
     2. Fedora
        1. sudo dnf --releasever=24 --installroot=/var/lib/machines/f24 install systemd passwd dnf fedora-release
        2. systemd-nspawn -D /var/lib/machines/f24 passwd
        3. systemd-nspawn -bD /var/lib/machines/f24

Initial Setup:

Once you’ve chosen your environment, complete the following:

  1. Add your non-root account to /etc/sudoers
  2. Verify that you are running systemd v208 or later.

        ps -p1; systemctl --version
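
The check above can also be scripted. This is a minimal sketch, assuming the first line of `systemctl --version` has the form `systemd NNN ...`; the function names are illustrative and not part of systemd.

```shell
# Minimum systemd version this lab expects.
min_version=208

# Print the version number from `systemctl --version` output read on stdin.
# Assumes the first line looks like "systemd 229" or "systemd 252 (252.4-1)".
parse_systemd_version() {
    awk 'NR == 1 { print $2 }'
}

# Succeed if PID 1 is systemd and its version is at least $min_version.
check_systemd() {
    init=$(ps -p 1 -o comm=)
    ver=$(systemctl --version | parse_systemd_version)
    if [ "$init" = "systemd" ] && [ "${ver:-0}" -ge "$min_version" ]; then
        echo "OK: systemd $ver is running as PID 1"
    else
        echo "Check failed: PID 1 is '$init', version '${ver:-unknown}'" >&2
        return 1
    fi
}
```

Paste the functions into your shell and run `check_systemd`. Pre-release version strings that are not plain integers will be reported as a failed check.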

Lab 1: Basic Concepts & Components


systemctl is the primary tool for interacting with systemd. Although the `service` and `chkconfig` commands still function, it is highly recommended to switch to systemctl to take advantage of its full capabilities.

        Scenario: Discover how this system is set up and configure a basic LAMP server.

  1. First, let’s make sure bash-completion is installed. This will give us a much nicer environment for working with systemctl. Check with `rpm -qa | grep bash`; if it is not present, install it via `sudo dnf -y install bash-completion`. If you had to install it, log out and back in to your terminal.

  1. With the shell configured, let’s take a look at the system boot stats.

  1. Let’s identify the default target, which is similar to the concept of a runlevel.
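
The commands for the two steps above are not shown in this copy of the lab. A likely candidate set, offered as a sketch rather than the author’s exact commands, is `systemd-analyze` for boot statistics and `systemctl get-default` for the default target. The wrapper name `boot_overview` is made up for illustration.

```shell
# Summarize boot performance and the default target.
boot_overview() {
    systemd-analyze                       # total boot time: kernel, initrd, userspace
    systemd-analyze blame | head -n 10    # the ten units that took longest to start
    systemctl get-default                 # default target, e.g. multi-user.target
}
```

Run `boot_overview`, or run the three commands one at a time to read each result before moving on.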

  1. Let’s look at what’s installed and running on the system.
  1. Open a terminal and run `systemctl status`. This shows a nice summary of the system.
  2. Just running `systemctl` w/ no verbs or arguments will show the state of every unit on the system. This can be a bit much to look at on some systems.
  3. Let’s filter it by services: `systemctl -t service`. This is similar to `chkconfig --list`, except that it only shows loaded units: basically the services that are running, have exited, or have failed.

  1. Run `systemctl list-unit-files -t service` for a more complete list of installed unit files. This will include unit files that are not currently loaded. Compare the output of this command with the previous one.

  1. Now that we have a good idea of what’s installed on our systems, let’s get a basic LAMP stack running.

  1. Next, let’s set these to start on boot.
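
The install and enable commands for the two steps above are missing from this copy of the lab. The sketch below shows one way to do it on Fedora. The package list is an assumption (later labs also expect `ab` and the sql-bench scripts, which may live in separate packages on your distro), and `setup_lamp` is a hypothetical helper name.

```shell
# Package set is an assumption for Fedora; adjust for your distribution.
lamp_packages="httpd mariadb-server php"

setup_lamp() {
    # $lamp_packages is intentionally unquoted so it splits into words.
    sudo dnf -y install $lamp_packages &&
    sudo systemctl enable httpd mariadb
}
```

After running `setup_lamp`, `systemctl is-enabled httpd mariadb` should report both units as enabled.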

  1. Now start the services: `sudo systemctl start httpd mariadb` and view their status: `systemctl status httpd mariadb`. 

Note: The status output shows whether the service is “enabled” to start on boot, its current status, the main and worker PIDs, the cgroup hierarchy, and the last 10 log messages.

Note: Passing -H [hostname] will execute these commands on remote systems via SSH, e.g. `systemctl -H dns1.mydomain.org status named`

  1. Run `systemctl -t service` again. Both httpd & mariadb now appear in the list.

  1. Let’s check the dependencies of httpd: `systemctl list-dependencies httpd`
  1. Now compare that w/ `systemctl list-dependencies mariadb`
  2. diff <(systemctl list-dependencies httpd) <(systemctl list-dependencies mariadb)
  3. Why is this interesting?
  4. Let’s examine the [Unit] & [Install] sections of these units `systemctl cat httpd mariadb`
  5. Which unit will be started first?

  1. Now let’s learn the easy way to enable/disable services. Let’s set these units back to their original state. `sudo systemctl preset httpd mariadb`
  1. Now take a look at `systemctl status httpd mariadb`, under “Loaded” compare the status vs the vendor preset.

  1. Another nice trick systemctl gives us is the ability to start/stop & enable/disable in a single command. First stop the running services: `sudo systemctl stop httpd mariadb` and now verify they are not running: `systemctl status httpd mariadb`.
  1. Run `sudo systemctl enable --now httpd mariadb`
  2. Check out the status w/ `systemctl status httpd mariadb`
  3. This can be reversed with `sudo systemctl disable --now httpd mariadb`. Go ahead and run this so the services are prepped for the next lab.


Lab 2: Unit Files


systemd unit files are declarative config files that follow a syntax similar to Windows ini files. They’re simple to create and are incredibly powerful.

Creating Unit files

Scenario: We want to create a few unit files. We’ll first create a simple example unit file followed by a few more that we’ll use in later labs for load testing.

  1. Create a unit file under /etc/systemd/system called helloworld.service

[Unit]
Description=The most impressive unit file of all time.

[Service]
ExecStart=/usr/bin/echo "hello world"

  1. Notify systemd of the unit via `systemctl daemon-reload`.

Note: This is safe to run on production systems. There are fringe cases, when many thousands of unit files are in use, where it can cause a brief performance hit, but for the vast majority of systems it can safely be run at any time.

  1. Start the unit with `systemctl start helloworld`. Pretty neat, eh? What did or did not happen?
  2. Next, let’s check the logs to see the output: `journalctl -u helloworld`. We’ll come back to journalctl in a later lab.
  3. Take a look at the unit defaults via `systemctl show helloworld`

Add socket activation for helloworld.service

In this section, we’ll do something slightly more interesting w/ our unit file.

  1. Create /etc/systemd/system/helloworld.socket

[Unit]
Description=activate hello world
[Socket]
ListenStream=8080
[Install]
WantedBy=sockets.target

  1. Notice that both units share the name helloworld; that is how systemd pairs the socket with the service. This unit tells our system to listen on port 8080 and to activate helloworld.service when a connection arrives.
  2. Notify systemd of the changes via `systemctl daemon-reload`
  3. Run `systemctl start helloworld.socket` (note: if this fails, put SELinux into permissive mode via `setenforce 0`)
  4. In a separate terminal tail the journal via `journalctl -f -u helloworld` (do you think you can remember those journalctl options?)
  5. In the original terminal run `curl http://localhost:8080` and watch the output from the second terminal. Why did it stop? Hint: the answer can be found in `systemctl show helloworld`. Don’t cheat yourself!! Find the answer before continuing!!

Now that you have a renewed sense of self-worth from finding the answer w/ out looking ahead, please continue.

  1. Now pretend you didn’t cheat and raise the StartLimitBurst to something bigger than the default. Experiment with 50 or 500. Note, you will cause a large load on your system if you try w/ 5 million or something so resist the urge for now.
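  1. Edit /etc/systemd/system/helloworld.service and under [Service] add StartLimitBurst=50. (On systemd v230 and later this setting belongs under [Unit]; the [Service] location is still accepted for compatibility.)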
  2. Notify systemd of the changes. Remember that command?
  1. Run the curl command again. What happened? Why?
  1. Hint: run `systemctl status helloworld.socket`
  1. Restart helloworld.socket and run the curl command again. Check the logs in the other terminal. How many times was helloworld.service started? Pretty slick.

Create the benchmark units

Unit files also have a powerful syntax for handling start order and dependencies. We’ll leverage some of this for our benchmarks.

        

  1. Create ab-bench.service

[Unit]
Description=Benching httpd test.
After=httpd.service
Wants=httpd.service

[Service]
Type=forking
ExecStart=/usr/bin/sh -c "/usr/bin/ab -c 100 -n 999999 http://localhost/index.html &"

  1. Create mariadb-bench.service

[Unit]
Description=Benching mariadb test.
After=mariadb.service
Wants=mariadb.service

[Service]
Type=forking
ExecStart=/usr/bin/sh -c 'cd /usr/share/sql-bench; ./run-all-tests --server=mysql --cmp=mysql,pg,solid --user=root &'

  1. Let’s look at the dependencies of these units for a minute.
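  1. “Wants” tells systemd to also start httpd & mariadb, if they are installed, whenever these units are started. If a wanted unit is missing or fails to start, the bench unit is still started.
  2. “After” only controls ordering: these units are started after httpd & mariadb have finished starting. It does not pull those units in by itself. Let’s test this.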
  3. The first time mariadb starts it takes a few seconds to start while it initializes. Let’s get that out of the way via `systemctl start mariadb; sleep 20 ; systemctl stop mariadb`
  4. After notifying systemd of our new unit files, run `systemctl start mariadb-bench` and check the status of mariadb & mariadb-bench
  5. Do the same for httpd & ab-bench. What was the result? What was different? Why did this happen? If it fails to start the first time, run `systemctl start ab-bench` a second time and see if it runs.
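
To see how systemd interpreted the Wants= and After= lines discussed above, you can query the loaded properties directly. A small sketch; `show_deps` is a made-up helper name, while `systemctl show -p` is the same property filter used elsewhere in this lab.

```shell
# Print the requirement (Wants=) and ordering (After=) properties of unit $1.
show_deps() {
    systemctl show -p Wants -p After "$1"
}
```

For example, `show_deps ab-bench` and `show_deps mariadb-bench`. Expect After= to list more units than you wrote, since systemd adds default ordering dependencies of its own.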

  1. Create userjob1.service  (adjust the User to match a non-root user on your system)

[Unit]
Description=Lame example user workload.

[Service]
Type=forking
User=cloud-user
Slice=user.slice
ExecStart=/usr/bin/sh -c "/usr/bin/cat /dev/zero > /dev/null &"


Systemd tracks a wide range of options that can be customized per unit. Using drop-in configuration files it’s simple to customize and tune services.

Customizing Unit files

Scenario: We want to configure our VM so it is resilient and monitors the status of key services.

  1. It’s important to understand a few basics about file locations:

  1. Let’s look at a unit file to get an idea of what they look like:
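
The file locations and the unit file listing for these two steps are not included in this copy of the lab. As a reference sketch: systemd reads system units from the directories below, highest precedence first (see systemd.unit(5)), so a file under /etc overrides the vendor copy of the same name. The last path is /lib/systemd/system on some distributions. The helper `find_unit_file` is hypothetical; `systemctl cat httpd` prints the unit file together with any drop-ins.

```shell
# System unit directories, highest precedence first (see systemd.unit(5)).
unit_dirs="/etc/systemd/system /run/systemd/system /usr/lib/systemd/system"

# Print the path of the unit file that wins for unit $1.
# $2 is an optional root prefix, handy for inspecting a container tree.
find_unit_file() {
    for dir in $unit_dirs; do
        if [ -e "${2:-}$dir/$1" ]; then
            echo "${2:-}$dir/$1"
            return 0
        fi
    done
    echo "no unit file found for $1" >&2
    return 1
}
```

Try `find_unit_file httpd.service`, then compare with the path shown in the first comment line of `systemctl cat httpd`.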

  1. The above unit file for httpd instructs systemd when to start it in the boot process, the type of service, commands to start, stop, and reload, as well as the target to install the unit in. This is the type of information needed to write your own unit files, but systemd provides many more options.

  1. While the defaults for unit files are sensible and won’t need to be altered most of the time, there will be circumstances where tweaking these is quite beneficial. These could include hardware or software watchdog monitoring, tunings, resource management, or many other reasons.

  1. We’ll modify the httpd unit with a drop-in configuration file.

  1. Create the directory /etc/systemd/system/httpd.service.d if it does not exist, then open /etc/systemd/system/httpd.service.d/50-httpd.conf in your favorite editor (which is obviously vi, rather than emacs ;) and add:

[Service]
Restart=always
OOMScoreAdjust=-1000
StartLimitAction=reboot
Nice=-10
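
If you would rather script the drop-in than use an editor, the sketch below writes the same settings shown above. The function name and the optional directory argument are illustrative additions; the argument exists so you can preview the file in a scratch directory before touching /etc.

```shell
# Write the httpd drop-in shown above.
# $1 is the unit configuration root; defaults to /etc/systemd/system.
write_httpd_dropin() {
    dropin_dir="${1:-/etc/systemd/system}/httpd.service.d"
    mkdir -p "$dropin_dir" &&
    cat > "$dropin_dir/50-httpd.conf" <<'EOF'
[Service]
Restart=always
OOMScoreAdjust=-1000
StartLimitAction=reboot
Nice=-10
EOF
}
```

Run `write_httpd_dropin /tmp/preview` to inspect the result, or call it with no argument from a root shell to install it for real.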

  1. After making these changes we need to notify systemd via `systemctl daemon-reload`
  2. Verify that systemd recognizes our “Drop-in” via `systemctl status httpd`

  1. The changes will also be reflected in `systemctl show httpd` but we can easily filter these via: `systemctl show -p Restart httpd` or `systemctl show -p StartLimitAction httpd`

  1. Open a second terminal and run `watch -n 1 systemctl status httpd`. Note the uptime under “Active”, then from your original terminal run `sudo pkill httpd`. Pretty neat, huh?

  1. Remember we also set StartLimitAction=reboot. If the service hits its start limit, systemd reboots the machine instead of letting the service loop through restarts. Use the up arrow to quickly run `sudo pkill httpd` 7 times. What happened?

  1. Now we need to do something similar for mariadb.
  1. First quickly examine the unit file with `systemctl cat mariadb`
  2. Now run `sudo systemctl edit mariadb`
  3. In the editor add:

[Service]

Restart=always

  1. Save and exit the editor.
  1. Verify that systemd recognizes the settings with `systemctl show -p Restart mariadb`
  2. Start mariadb if it’s not already running and test the policy using `sudo pkill mysqld`

Lab 3: Leveraging cgroups with systemd


systemd sets up a single root cgroup hierarchy under /sys/fs/cgroup/systemd. This is used to keep track of services and their child processes. We’ll need to understand the following unit types: slices, scopes, and services.

By default, the system will have three slices: system, user, and machine. System.slice is the default location for service units. The services that ship with RHEL, e.g. httpd and mariadb, will be located in this slice by default. User.slice contains all user sessions and processes. The machine.slice contains VMs and containers, and isn’t created until one or more of these are started. Each of these slices is directly under the root slice and gets equal scheduler time while the system is under contention. The default configuration prevents processes from any slice from monopolizing the entire system. It also goes a step further and prevents the same thing within each slice and scope. Previously, scheduling was done per thread, and services with a disproportionate number of processes received a disproportionate amount of CPU time.

Slices, scopes, and services all default to CPUShares=1024. This will work perfectly for most workloads out of the box.

For more information see: http://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/

Scenario: Become familiar with systemd’s tools for viewing and allocating resources to a service.

  1. Run systemd-cgls to view the default hierarchy on our VM.

  1. Open a second terminal and run systemd-cgtop. Leave this running for the remainder of this section of the lab. This is always a good idea and will help prevent SFTHD, aka shooting from the hip disorder.

  1. Let’s put some load on apache, and monitor it. From the original terminal run `systemctl start ab-bench`

  1. We need to enable accounting for the cgroup controllers that we want to tune or monitor. The available high-level controllers, as of recent versions of systemd, are CPUAccounting, MemoryAccounting, TasksAccounting and BlockIOAccounting. Let’s enable three for httpd.

[Service]
CPUAccounting=1
MemoryAccounting=1
BlockIOAccounting=1
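
This copy of the lab does not say how the settings above are applied. Two options that should both work are adding them to a drop-in (for example via `sudo systemctl edit httpd`) or setting them on the running unit with `systemctl set-property`. The sketch below takes the second route; the function name is hypothetical. Note that newer systemd releases document BlockIOAccounting as deprecated in favor of IOAccounting.

```shell
# Turn on CPU, memory and block I/O accounting for unit $1.
# --runtime keeps the change only until the next reboot.
enable_accounting() {
    sudo systemctl set-property --runtime "$1" \
        CPUAccounting=1 MemoryAccounting=1 BlockIOAccounting=1
}
```

For example, `enable_accounting httpd.service`. The same call with `mariadb.service` covers the mariadb step later in this lab.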

  1. Now let’s run our benchmark again on the system. Run `systemctl start ab-bench` and monitor the output of the other terminal.
  1. Note: The number of CPUs on the system will vary the percentage CPU split between the user & system slice. Why?
  2. Play around with enabling/disabling CPUs: echo [0|1] >

  1. Let’s enable the same controllers for mariadb.

[Service]
CPUAccounting=1
MemoryAccounting=1
BlockIOAccounting=1

  1. Now we’re going to put the whole system under load. Run `systemctl start mariadb-bench ab-bench userjob1`

  1. With the system under load and cgroup accounting enabled, we can now start tweaking the cgroups.

***TIP*** In step 8 we’re making drastic changes to show how this works. It’s recommended to make smaller, incremental changes when doing this in real life. Instead of jumping all the way to 3000, start with something closer to 1200 and monitor the environment. If you need more CPU time on the scheduler, move to 1400, etc. Also note that cgroup accounting is expensive and should be disabled after the tuning is complete.
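
The tuning commands themselves are not shown in this copy of the lab. One way to make the kind of incremental change the tip describes, assuming `systemctl set-property` is the intended tool, is sketched below. The helper name is made up; the 2 to 262144 range and the default of 1024 come from systemd.resource-control(5). On newer, cgroup-v2-based systems CPUWeight is the documented replacement for CPUShares.

```shell
# Set CPUShares for unit $1 to $2 for the current boot only.
# systemd accepts values from 2 to 262144; the default is 1024.
set_cpu_shares() {
    if [ "$2" -lt 2 ] || [ "$2" -gt 262144 ]; then
        echo "CPUShares must be between 2 and 262144" >&2
        return 1
    fi
    sudo systemctl set-property --runtime "$1" CPUShares="$2"
}
```

Start with `set_cpu_shares httpd.service 1200`, watch systemd-cgtop in the other terminal, and step up from there if needed.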

  1. Let’s promote httpd to its own slice. This is useful if you have an application or service that you want to guarantee gets significantly more scheduler time than other user or system processes/services.

# Remember that the ‘S’ in Slice is case sensitive.
Slice=apache.slice

  1. Did that do what you expected it would?

  1. Once the settings are configured appropriately for your use case and workload, drop the --runtime option to persist the settings.

Lab 4: The Journal

Logging in the 21st Century


systemd includes a powerful logging component called the Journal. The systemd-journald service captures all logging messages with metadata, indexes them, and exposes them to the user via a utility called journalctl. In Red Hat Enterprise Linux 7 the journal is not persistent by default and forwards all messages to rsyslog. Rsyslog is the default logging daemon because of the rich enterprise features it provides. That said, the journal provides an amazing toolset and can be leveraged in parallel with or in place of rsyslog.

Scenario: Become familiar w/ the journal and learn how to filter it.

  1. Run `journalctl` and page through using the spacebar. Notice that any warning messages (technically warning & notice, priorities 4 & 5) are bold and errors are red. The formatted text will instantly draw your eyes to messages at these priorities.

  1. Experiment with the following filters on the journal. Run journalctl followed by each of the following options (or a mixture of options):

  -r              Reverse the journal; newest entries first
  -e              Show last 1000 lines
  -b              View messages since last boot (-b -1 for the previous boot)
  -u              Filter by a unit file, e.g. journalctl -u httpd.service
  -k              Show kernel messages, basically dmesg
  --since=DATE    Show entries from DATE onward, format YYYY-MM-DD 00:00:00
  --until=DATE    Show entries up to DATE, format YYYY-MM-DD 00:00:00

Note: By default the output is “paged” with less. Don’t forget that you can use the right and left arrows to scroll horizontally on smaller terminals. Paging can be disabled by passing the --no-pager option.

  1. Non-root users will need to be in the systemd-journal group to access the system’s journal.

  1. On some distributions the journal writes to a ring buffer in memory. To enable persistence simply run `sudo mkdir /var/log/journal` and restart systemd-journald.
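
The two steps above can be scripted as follows. This is a sketch with made-up function names; it follows the lab’s approach of creating the directory and restarting journald, and assumes your account has sudo rights.

```shell
# Let user $1 read the system journal without sudo.
# Group membership takes effect at the user's next login.
grant_journal_access() {
    sudo usermod -aG systemd-journal "$1"
}

# Create the persistent journal directory and restart journald to use it.
enable_persistent_journal() {
    sudo mkdir -p /var/log/journal &&
    sudo systemctl restart systemd-journald
}
```

For example, `grant_journal_access cloud-user` followed by `enable_persistent_journal`. Afterwards `journalctl --disk-usage` should report space used under /var/log/journal.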

  1. The journal stores metadata around each entry. Let’s take a look at what’s stored and indexed. Run `journalctl -n 1 -o verbose`

  1. Next let’s use shell completion to help us filter the journal.

  1. Step 6 helped us quickly identify what’s currently in the journal. If there’s a binary you want to filter by simply append it. Just like with systemctl we can “glob” files and the output will be interleaved. This is incredibly useful for seeing how services, applications, users, etc interact with each other.

  1. We can put these filters together to quickly view specific & relevant logs.
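
The example commands for the last three steps are missing from this copy of the lab, so here is a sketch of what combining filters can look like. The function names are illustrative; the journalctl options used (`-F`, `-u`, `-b`, `-p`, `--since`, `--no-pager`) are all standard.

```shell
# List every value the journal has recorded for field $1 (e.g. _SYSTEMD_UNIT).
journal_field_values() {
    journalctl --no-pager -F "$1"
}

# Interleave today's entries of priority "err" or worse for the given units.
todays_errors() {
    args=""
    for unit in "$@"; do
        args="$args -u $unit"
    done
    # $args is intentionally unquoted so each "-u unit" pair splits into words.
    journalctl --no-pager -b -p err --since today $args
}
```

For example, `journal_field_values _COMM` shows which command names have logged anything, and `todays_errors httpd mariadb` shows today’s errors from both services in one timeline. To filter by binary instead, append its path, e.g. `journalctl /usr/sbin/httpd`.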

Sealing the journal.

Forward Secure Sealing (FSS) is used to ensure the integrity of the journal. “Sealing” keys are automatically regenerated (every 15 minutes by default) and used to cryptographically seal the logs. A separate verification key is used to validate the contents. This prevents an attacker from altering the log’s history on a system without detection. For details on FSS please see: https://eprint.iacr.org/2013/397.pdf

  1. Run `journalctl --setup-keys`. As the name implies, this command will create the key pair used to seal the journal. It is important not to store the verification key on the local system. The QR code makes it simple to transfer.


  1. We can now verify the journal’s integrity using `journalctl --verify --verify-key=[secret-verification-key]`

For more information on using the journal please visit: http://0pointer.de/blog/projects/journalctl.html

Lab 5: (optional) Self-paced Learning


With the remainder of the time, pick the component of systemd that you’re least familiar with and do some self-study. This could be anything from reading man pages or blog posts to experimenting with a live system. Anything goes!


References:

Systemd project page:

http://www.freedesktop.org/wiki/Software/systemd

Lennart Poettering's systemd blog entries (read them all)

http://0pointer.de/blog/projects/systemd-for-admins-1.html

Systemd FAQ

http://www.freedesktop.org/wiki/Software/systemd/FrequentlyAskedQuestions/

Tips & Tricks

http://www.freedesktop.org/wiki/Software/systemd/TipsAndTricks/

Appendix

Libvirt create VM script: (please tweak for your environment)

IMAGE=./Fedora-Cloud-Base-24-1.2.x86_64.qcow2

#create meta-data
{ echo instance-id: node0; echo local-hostname: node0.example.com; } > meta-data


#create simple user-data w/ passwords fedora
printf '#cloud-config\nusers: \n  - default\n  - name: cloud-user\n    sudo: ["ALL=(ALL) NOPASSWD:ALL"]\n    groups: wheel,adm,systemd-journal\n    ssh_pwauth: True\nchpasswd:\n  list: |\n    root:fedora\n    cloud-user:fedora\n  expire: False\n' > user-data

#create iso
genisoimage -output cloudinit.iso -volid cidata -joliet -rock user-data meta-data

#create snapshot
sudo qemu-img create -f qcow2 -F qcow2 -b "$IMAGE" "$IMAGE.snap.qcow2"

#create VM
sudo virt-install -v \
--name f24-systemd \
--os-variant fedora23 \
--ram 2048 \
--vcpus 2 \
--disk bus=virtio,path=$IMAGE.snap.qcow2 \
--disk device=cdrom,path=cloudinit.iso \
-w network=default,model=virtio \
--noautoconsole \
--boot hd