These are my internal ramblings from learning how to deploy a Red Hat OpenStack Platform 17.0 cluster for NFV integration testing.

Big picture

Here is what I will be working with:

  • 1G access to an external/management network.
  • 10G top of rack switch.
  • 3 Dell r640 servers (labelled as hypervisor, compute1 and compute2).
  • Each server with an Intel X710 4x10G NIC.
  • The first port of each NIC is connected to the 1G management network. The BMCs of compute1 and compute2 are also connected to that network.
  • The other three ports are connected to a private 10G switch.
  • Each port is configured as a trunk with VLANs 400-409.
  • The eno2 ports are also configured with VLAN 400 as native. These ports will be used for the control plane network (PXE boot provisioning, OpenStack services internal traffic, etc.)

Physical topology

This represents the physical layout of machines and switches.

raw-topology.svg

OpenStack roles topology

After reading through the docs and asking the same questions over and over, here is what I ended up with.

  • The undercloud will be a VM running on the hypervisor machine.
  • The controller node will also be a VM running on the same machine. Raw performance is not a concern in this context. There will not be high traffic going through the controller.
  • The other two physical machines will be used as bare-metal compute nodes.
  • The BIOS of both physical machines must be reconfigured to PXE boot on their eno2 interface.

osp-topology.svg

Each of the physical networks will carry multiple logical networks on separate VLANs:

  • management
    • External access (no VLAN)
    • IPMI management (no VLAN)
  • control plane
    • PXE boot provisioning (“flat”, no VLAN or native VLAN 400)
    • Internal API for OpenStack services (trunk VLAN 401)
    • Storage: VM disk access (trunk VLAN 402)
    • Storage “management”: Ceph replication (trunk VLAN 403), not used but still required
  • user plane
    • Tenant networks (trunk VLAN 404)
    • Provider networks (trunk VLANs 405-409)

There will not be any DCN configuration here. This simplifies the network configuration since there is only one subnet per logical network.

Hypervisor preparation

First, a virtual machine must be started to serve as the undercloud (or “director”). This VM must be connected to the management network and to the control plane network. I’ll create three virtual networks: management, ctlplane and user. The management network will replace libvirt’s default network.

# disable the avahi daemon as it messes with NetworkManager somehow
systemctl disable --now avahi-daemon.socket
systemctl disable --now avahi-daemon.service

# prepare bridges for libvirt networks
nmcli connection del eno2 eno3 br-ctrl br-user
for br in br-ctrl br-user; do
    nmcli connection add con-name $br type bridge ifname $br \
        ipv4.method disabled ipv6.method disabled
done
nmcli connection add con-name eno2 type ethernet ifname eno2 master br-ctrl
nmcli connection add con-name eno3 type ethernet ifname eno3 master br-user
for c in eno2 eno3 br-ctrl br-user; do
    nmcli connection up $c
done

dnf install libvirt
systemctl enable --now libvirtd.socket

# no DHCP on the management network
# we need to have a static IP for the undercloud
cat > /tmp/management.xml <<EOF
<network>
  <name>management</name>
  <ip address="192.168.122.1" prefix="24"/>
  <bridge name="br-mgmt"/>
  <forward mode="nat" dev="eno1"/>
</network>
EOF
cat > /tmp/ctlplane.xml <<EOF
<network>
  <name>ctlplane</name>
  <bridge name="br-ctrl"/>
  <forward mode="bridge"/>
</network>
EOF
cat > /tmp/user.xml <<EOF
<network>
  <name>user</name>
  <bridge name="br-user"/>
  <forward mode="bridge"/>
</network>
EOF

# remove libvirt default network
virsh net-destroy --network default
virsh net-undefine --network default

for net in management ctlplane user; do
    virsh net-define "/tmp/$net.xml"
    virsh net-autostart $net
    virsh net-start $net
done
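
At this point, it does not hurt to confirm that all three networks are active and set to autostart:

# all three networks should be listed as active with autostart enabled
virsh net-list --all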

Now is the time to start the undercloud VM. We’ll use a default RHEL 9 guest image as a starting point.

dnf install -y guestfs-tools virt-install

# download the latest RHEL 9.0 image
url="http://download.eng.brq.redhat.com/rhel-9/rel-eng/RHEL-9/latest-RHEL-9.0/compose/BaseOS/x86_64/images"
curl -LO "$url/SHA256SUM"
qcow2=$(sed -nre 's/^SHA256 \((rhel-guest-image.+\.qcow2)\) =.*/\1/p' SHA256SUM)
curl -LO "$url/$qcow2"
sha256sum -c --ignore-missing SHA256SUM
mv -v "$qcow2" /var/lib/libvirt/images/rhel-guest-image-9.0.qcow2

# create an empty image for the undercloud
undercloud_img=/var/lib/libvirt/images/undercloud.qcow2
qemu-img create -f qcow2 $undercloud_img 80G
# copy the default RHEL image into it (expanding the main partition)
virt-resize --expand /dev/sda4 \
    /var/lib/libvirt/images/rhel-guest-image-9.0.qcow2 $undercloud_img

# assign a static IP address to the interface connected to br-mgmt
undercloud_net="nmcli connection add type ethernet ifname eth0 con-name mgmt"
undercloud_net="$undercloud_net ipv4.method static"
undercloud_net="$undercloud_net ipv4.address 192.168.122.2/24"
undercloud_net="$undercloud_net ipv4.gateway 192.168.122.1"
dns_servers=$(sed -nre 's/^nameserver //p' /etc/resolv.conf | xargs echo)
undercloud_net="$undercloud_net ipv4.dns '$dns_servers'"

# customize the image
# the VM must *NOT* be called "director" or "director.*": the undercloud deployment
# alters /etc/hosts on the undercloud VM, and a conflicting hostname causes issues
# during deployment.
virt-customize -a $undercloud_img --smp 4 --memsize 4096 \
    --hostname dirlo.redhat.local --timezone UTC \
    --uninstall cloud-init \
    --run-command "useradd -s /bin/bash -m stack" \
    --write "/etc/sudoers.d/stack:stack ALL=(root) NOPASSWD:ALL" \
    --chmod "0440:/etc/sudoers.d/stack" \
    --password stack:password:stack \
    --ssh-inject "stack:string:$(curl -L https://meta.sr.ht/~rjarry.keys)" \
    --firstboot-command "$undercloud_net" \
    --firstboot-command "nmcli connection up mgmt" \
    --selinux-relabel

# define and start the virtual machine
virt-install --ram 32768 --vcpus 8 --cpu host --os-variant rhel9.0 --import \
    --graphics none --autoconsole none \
    --disk "path=$undercloud_img,device=disk,bus=virtio,format=qcow2" \
    --network network=management \
    --network network=ctlplane \
    --name undercloud-director
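
Once the firstboot commands have run, the VM should be listed as running and answer on the static management address configured above (a quick sanity check, nothing more):

# the undercloud VM should be running and reachable on the management network
virsh list --state-running
ping -c 3 192.168.122.2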

Next, we need to define (not start) the controller VM. Note that we specify a fixed MAC address for the ctlplane network as this will be required for PXE boot provisioning later on.

qemu-img create -f qcow2 /var/lib/libvirt/images/controller.qcow2 60G
virt-install --ram 16384 --vcpus 2 --cpu host --os-variant rhel9.0 \
    --graphics none --autoconsole none \
    --disk "path=/var/lib/libvirt/images/controller.qcow2,device=disk,bus=virtio,format=qcow2" \
    --network network=ctlplane,mac=52:54:00:ca:ca:01 \
    --network network=user \
    --name overcloud-controller \
    --dry-run --print-xml > controller.xml
virsh define --file controller.xml

Since the controller will be a virtual machine, we need to start a virtual BMC for the undercloud to control. The vbmc name must match the virtual machine name that we created previously. This VM will be started via IPMI by the undercloud when actually running the OpenStack deployment.

dnf install python3-pip
pip3 install virtualbmc

# start the virtual bmc daemon
vbmcd
# define and start one virtual BMC on port 6230
vbmc add overcloud-controller --port 6230 --username admin --password admin
vbmc start overcloud-controller
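
Before handing control to the undercloud, it is worth checking that the virtual BMC answers over IPMI with the same credentials and port (ipmitool may need to be installed on the hypervisor first):

dnf install -y ipmitool
# the chassis power should be reported as off since the VM is not started yet
ipmitool -I lanplus -H 192.168.122.1 -p 6230 -U admin -P admin power status
vbmc list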

Undercloud installation

The undercloud VM should now be running and accessible via ssh on 192.168.122.2. Since I installed my public key for the stack user, I can log in without a password.

[root@hypervisor ~]# ssh stack@192.168.122.2
The authenticity of host '192.168.122.2 (192.168.122.2)' can't be established.
ED25519 key fingerprint is SHA256:FvbUQg7EtQofWyZDzfJXmw1Lm/ZT3m4FjZarngrvUtE.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.122.2' (ED25519) to the list of known hosts.
Register this system with Red Hat Insights: insights-client --register
Create an account or view all your systems at https://red.ht/insights-dashboard
[stack@dirlo ~]$

Once connected, the first thing to do is enable the repositories from which packages will be downloaded, update the system and install some utilities. From now on, everything will be done as the stack user to emphasize which commands need elevated privileges.

# as stack@dirlo

# Install red hat internal certificate. This is required to use the internal container
# images registry without declaring it as "insecure".
sudo curl -Lo /etc/pki/ca-trust/source/anchors/RH-IT-Root-CA.crt \
    https://password.corp.redhat.com/RH-IT-Root-CA.crt
sudo update-ca-trust extract

# enable OSP repositories
sudo dnf install -y \
    http://download.eng.brq.redhat.com/rcm-guest/puddles/OpenStack/rhos-release/rhos-release-latest.noarch.rpm
sudo rhos-release 17.0
sudo dnf update -y
sudo dnf install -y vim

# only required if there was a kernel upgrade
# after reboot is complete, ssh stack@192.168.122.2 back in the undercloud vm
sudo systemctl reboot

Now, we need to install the TripleO client (this takes a bit of time).

# as stack@dirlo
sudo dnf install -y python3-tripleoclient

Next, we should prepare a configuration file that describes where to find the container images and what their names and versions are:

# as stack@dirlo
openstack tripleo container image prepare default \
    --local-push-destination \
    --output-env-file ~/containers-prepare-parameter.yaml

Now the ~/containers-prepare-parameter.yaml file must be edited by hand to set the exact OSP version and image registries that we will use. The default file points to the official GA version and to the external registry servers. Since we are running tests on development versions in Red Hat’s internal network, I adjusted it as follows (provided for reference):

# ~/containers-prepare-parameter.yaml
---
parameter_defaults:
  ContainerImagePrepare:
    - push_destination: true  # download images in the undercloud local registry
      set:
        # red hat internal registry
        namespace: registry-proxy.engineering.redhat.com/rh-osbs
        # development container images have different names
        name_prefix: rhosp17-openstack-
        name_suffix: ''
        tag: '17.0_20220908.1'
        # the rest is only used for overcloud deployment, the same file is reused
        neutron_driver: ovn
        rhel_containers: false
        ceph_namespace: registry-proxy.engineering.redhat.com/rh-osbs
        ceph_image: rhceph
        ceph_tag: 5-287
        ceph_alertmanager_image: openshift-ose-prometheus-alertmanager
        ceph_alertmanager_namespace: registry-proxy.engineering.redhat.com/rh-osbs
        ceph_alertmanager_tag: v4.10
        ceph_grafana_image: grafana
        ceph_grafana_namespace: registry-proxy.engineering.redhat.com/rh-osbs
        ceph_grafana_tag: latest
        ceph_node_exporter_image: openshift-ose-prometheus-node-exporter
        ceph_node_exporter_namespace: registry-proxy.engineering.redhat.com/rh-osbs
        ceph_node_exporter_tag: v4.10
        ceph_prometheus_image: openshift-ose-prometheus
        ceph_prometheus_namespace: registry-proxy.engineering.redhat.com/rh-osbs
        ceph_prometheus_tag: v4.10
      tag_from_label: '{version}-{release}'

Next, we need to copy the undercloud.conf template file containing all settings that are used to initialize the undercloud system.

[stack@dirlo ~]$ cp /usr/share/python-tripleoclient/undercloud.conf.sample ~/undercloud.conf

The template file contains a lot of settings. Here is a stripped-down version with only the ones we are concerned about:

# ~/undercloud.conf
[DEFAULT]

# If not specified, these settings will default to the system's current values.
# I only repeated them here for clarity.
undercloud_hostname = dirlo.redhat.local
undercloud_timezone = UTC

# Network interface on the Undercloud that will be handling the PXE
# boots and DHCP for Overcloud instances. (string value)
# XXX: this is the interface connected to br-ctrl
local_interface = eth1

# IP address of eth1 along with network mask
local_ip = 172.16.0.1/24

# Virtual IP or DNS address to use for the public endpoints of
# Undercloud services. Only used with SSL. (string value)
# (not used, only to avoid install failure)
undercloud_public_host = 172.16.0.10

# Virtual IP or DNS address to use for the admin endpoints of
# Undercloud services. Only used with SSL. (string value)
# (not used, only to avoid install failure)
undercloud_admin_host = 172.16.0.11

# these are using Red Hat internal servers, adjust to taste
undercloud_nameservers = 10.38.5.26,10.11.5.19
undercloud_ntp_servers = clock.redhat.com,clock2.redhat.com

# DNS domain name to use when deploying the overcloud. The overcloud
# parameter "CloudDomain" must be set to a matching value. (string
# value)
overcloud_domain_name = localdomain

# REQUIRED if authentication is needed to fetch containers. This file
# should contain values for "ContainerImagePrepare" and
# "ContainerImageRegistryCredentials" that will be used to fetch the
# containers for the undercloud installation. `openstack tripleo
# container image prepare default` can be used to provide a sample
# "ContainerImagePrepare" value. Alternatively this file can contain
# all the required Heat parameters for the containers for advanced
# configurations. (string value)
# XXX: this is the file we generated in the previous step
container_images_file = /home/stack/containers-prepare-parameter.yaml

# List of routed network subnets for provisioning and introspection.
# Comma separated list of names/tags. For each network a section/group
# needs to be added to the configuration file with these parameters
# set: cidr, dhcp_start, dhcp_end, inspection_iprange, gateway and
# masquerade_network. Note: The section/group must be placed before or
# after any other section. (See the example section [ctlplane-subnet]
# in the sample configuration file.) (list value)
subnets = ctlplane-subnet

# Name of the local subnet, where the PXE boot and DHCP interfaces for
# overcloud instances is located. The IP address of the
# local_ip/local_interface should reside in this subnet. (string
# value)
local_subnet = ctlplane-subnet

[ctlplane-subnet]

# Network CIDR for the Neutron-managed subnet for Overcloud instances.
# (string value)
cidr = 172.16.0.0/24

# Network gateway for the Neutron-managed network for Overcloud
# instances on this network. (string value)
gateway = 172.16.0.1

# Start of DHCP allocation range for PXE and DHCP of Overcloud
# instances on this network. (list value)
dhcp_start = 172.16.0.20

# End of DHCP allocation range for PXE and DHCP of Overcloud instances
# on this network. (list value)
dhcp_end = 172.16.0.120

# Temporary IP range that will be given to nodes on this network
# during the inspection process. Should not overlap with the range
# defined by dhcp_start and dhcp_end, but should be in the same ip
# subnet. (string value)
inspection_iprange = 172.16.0.150,172.16.0.180

# The network will be masqueraded for external access. (boolean value)
masquerade = true

Now that these files are available, the undercloud installation can be started. This is a long process which should take around 20 minutes.

[stack@dirlo ~]$ openstack undercloud install
...
The Undercloud has been successfully installed.

Useful files:

Password file is at /home/stack/tripleo-undercloud-passwords.yaml
The stackrc file is at ~/stackrc

Use these files to interact with OpenStack services, and
ensure they are secured.

Let’s check what containers are running for the undercloud services:

[stack@dirlo ~]$ sudo podman ps --format 'table {{.Names}}\t{{.Status}}'
NAMES                                                       STATUS
memcached                                                   Up 3 hours ago (healthy)
haproxy                                                     Up 3 hours ago
rabbitmq                                                    Up 3 hours ago (healthy)
mysql                                                       Up 3 hours ago (healthy)
iscsid                                                      Up 3 hours ago (healthy)
keystone                                                    Up 3 hours ago (healthy)
keystone_cron                                               Up 3 hours ago (healthy)
logrotate_crond                                             Up 3 hours ago (healthy)
neutron_api                                                 Up 3 hours ago (healthy)
ironic_api                                                  Up 3 hours ago (healthy)
neutron_ovs_agent                                           Up 3 hours ago (healthy)
neutron_l3_agent                                            Up 3 hours ago (healthy)
neutron_dhcp                                                Up 3 hours ago (healthy)
ironic_neutron_agent                                        Up 3 hours ago (healthy)
ironic_conductor                                            Up 3 hours ago (healthy)
ironic_pxe_tftp                                             Up 3 hours ago (healthy)
ironic_pxe_http                                             Up 3 hours ago (healthy)
ironic_inspector                                            Up 3 hours ago (healthy)
ironic_inspector_dnsmasq                                    Up 3 hours ago (healthy)
neutron-dnsmasq-qdhcp-0bddcc19-d0bb-41a9-84d2-af8935e0cd37  Up 2 hours ago

With the undercloud installation now finished, the next step is to actually deploy the overcloud.

Overcloud planning

This is the most confusing part so far. The official documentation contains a lot of information and I had to cherry-pick some essential parts (which are often buried in a sub-sub chapter) to come up with a coherent and usable setup.

Also, the order in which the planning steps are described seems backwards. It causes some notions to be introduced early with no explanation and only explained later in a different chapter. I changed the order of the planning steps on purpose to make the process easier to understand without any lookahead.

Ironic images

Before importing the nodes and running hardware introspection, we need to extract the ironic images and place them in the correct location:

[stack@dirlo ~]$ sudo dnf install -y rhosp-director-images-x86_64
...
[stack@dirlo ~]$ mkdir -p ~/images
[stack@dirlo ~]$ cd ~/images
[stack@dirlo images]$ tar -xf /usr/share/rhosp-director-images/ironic-python-agent-latest.tar
[stack@dirlo images]$ tar -xf /usr/share/rhosp-director-images/overcloud-full-latest.tar
[stack@dirlo images]$ source ~/stackrc
(undercloud)$ openstack overcloud image upload
Image "file:///var/lib/ironic/images/overcloud-full.vmlinuz" was copied.
+------------------------------------------------------+----------------+----------+
|                         Path                         |      Name      |   Size   |
+------------------------------------------------------+----------------+----------+
| file:///var/lib/ironic/images/overcloud-full.vmlinuz | overcloud-full | 11173488 |
+------------------------------------------------------+----------------+----------+
Image "file:///var/lib/ironic/images/overcloud-full.initrd" was copied.
+-----------------------------------------------------+----------------+----------+
|                         Path                        |      Name      |   Size   |
+-----------------------------------------------------+----------------+----------+
| file:///var/lib/ironic/images/overcloud-full.initrd | overcloud-full | 65063256 |
+-----------------------------------------------------+----------------+----------+
Image "file:///var/lib/ironic/images/overcloud-full.raw" was copied.
+--------------------------------------------------+----------------+------------+
|                       Path                       |      Name      |    Size    |
+--------------------------------------------------+----------------+------------+
| file:///var/lib/ironic/images/overcloud-full.raw | overcloud-full | 3295805440 |
+--------------------------------------------------+----------------+------------+
(undercloud)$ ls -lhF /var/lib/ironic/httpboot /var/lib/ironic/images
/var/lib/ironic/httpboot:
total 406M
-rwxr-xr-x. 1 root  42422  11M Nov  4 13:44 agent.kernel*
-rw-r--r--. 1 root  42422 396M Nov  4 13:44 agent.ramdisk
-rw-r--r--. 1 42422 42422  758 Oct 27 11:40 boot.ipxe
-rw-r--r--. 1 42422 42422  464 Oct 27 11:33 inspector.ipxe

/var/lib/ironic/images:
total 2.0G
-rw-r--r--. 1 root 42422  63M Nov  4 13:44 overcloud-full.initrd
-rw-r--r--. 1 root 42422 3.1G Nov  4 13:44 overcloud-full.raw
-rw-r--r--. 1 root 42422  11M Nov  4 13:44 overcloud-full.vmlinuz

The BIOS ironic images are required because we have a virtual machine as the controller and the UEFI image does not work with the iPXE implementation of QEMU yet. The physical machines NEED to be configured to boot as UEFI. If they are configured to boot as legacy BIOS, the provisioning will not work since the ironic iPXE image only handles UEFI in OSP 17.0. Surprisingly, hardware introspection will work, but the provisioning step will fail with obscure timeout errors. The only way I found to debug the issue was to tcpdump the traffic and see that the wrong boot image was sent after the DHCP negotiation.
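
For reference, this is roughly the kind of capture that showed the problem, run on the undercloud’s provisioning interface during a provisioning attempt (illustrative only; adjust the interface name, and note that 8088 is the usual iPXE HTTP port on the undercloud):

# watch DHCP, TFTP and the iPXE HTTP image download during PXE boot
sudo tcpdump -ni eth1 'port 67 or port 68 or port 69 or port 8088'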

In the case of mixed BIOS and UEFI machines, we need both kinds of images. That requires extracting the UEFI image second and running the upload command again; otherwise the UEFI image takes precedence and the BIOS image is ignored:

[stack@dirlo ~]$ sudo dnf install -y rhosp-director-images-uefi-x86_64
...
[stack@dirlo ~]$ mkdir -p ~/images
[stack@dirlo ~]$ cd ~/images
[stack@dirlo images]$ tar -xf /usr/share/rhosp-director-images/overcloud-hardened-uefi-full-latest.tar
[stack@dirlo images]$ source ~/stackrc
(undercloud)$ openstack overcloud image upload --update-existing
Image "file:///var/lib/ironic/images/overcloud-hardened-uefi-full.raw" was copied.
+----------------------------------------------------------------+------------------------------+------------+
|                              Path                              |             Name             |    Size    |
+----------------------------------------------------------------+------------------------------+------------+
| file:///var/lib/ironic/images/overcloud-hardened-uefi-full.raw | overcloud-hardened-uefi-full | 6442450944 |
+----------------------------------------------------------------+------------------------------+------------+
Image file "/var/lib/ironic/httpboot/agent.kernel" is up-to-date, skipping.
Image file "/var/lib/ironic/httpboot/agent.ramdisk" is up-to-date, skipping.
(undercloud)$ ls -lhF /var/lib/ironic/images
total 3.9G
-rw-r--r--. 1 root 42422  63M Nov  4 13:44 overcloud-full.initrd
-rw-r--r--. 1 root 42422 3.1G Nov  4 13:44 overcloud-full.raw
-rw-r--r--. 1 root 42422  11M Nov  4 13:44 overcloud-full.vmlinuz
-rw-r--r--. 1 root 42422 6.0G Nov  4 13:53 overcloud-hardened-uefi-full.raw

Hosts inventory

Now, we can make an inventory of all the physical/virtual machines that will make up our overcloud. This is quite straightforward: all we need to do is create a nodes.yaml file listing each machine along with the Ethernet MAC address of the interface used for PXE boot and the power management connection details.

# ~/nodes.yaml
---
nodes:
  - name: overcloud-controller  # the name is required but does not matter
    # vbmc that we started earlier
    pm_addr: 192.168.122.1
    pm_port: 6230
    pm_user: admin
    pm_password: admin
    pm_type: ipmi
    ports:
      - address: "52:54:00:ca:ca:01"  # fixed mac address defined earlier
        physical_network: ctlplane

  - name: dell-r640-oss-11
    pm_addr: dell-r640-oss-11-mm.lab.eng.brq2.redhat.com
    pm_port: 623
    pm_user: root
    pm_password: calvin
    pm_type: ipmi
    ports:
      - address: "e4:43:4b:5c:96:71"  # eno2
        physical_network: ctlplane

  - name: dell-r640-oss-12
    pm_addr: dell-r640-oss-12-mm.lab.eng.brq2.redhat.com
    pm_port: 623
    pm_user: root
    pm_password: calvin
    pm_type: ipmi
    ports:
      - address: "e4:43:4b:5c:97:c1"  # eno2
        physical_network: ctlplane
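
Before importing this file, it is worth checking from the undercloud that the physical BMCs answer with the listed credentials (the virtual BMC was already verified from the hypervisor); a failure here is much easier to diagnose than an ironic registration error later:

sudo dnf install -y ipmitool
ipmitool -I lanplus -H dell-r640-oss-11-mm.lab.eng.brq2.redhat.com -U root -P calvin power status
ipmitool -I lanplus -H dell-r640-oss-12-mm.lab.eng.brq2.redhat.com -U root -P calvin power status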

Once this file is defined, the nodes can be imported:

[stack@dirlo ~]$ source ~/stackrc
(undercloud)$ openstack overcloud node import ~/nodes.yaml
Successfully registered node UUID c51e6af7-fb0f-41a3-a727-e4280a12d2dd
Successfully registered node UUID 40c0c3b3-1564-46f7-9fa3-90a17ad3af8f
Successfully registered node UUID a8172746-0e60-4f36-aab5-bffd467d370d

And the hardware introspection can be started:

[stack@dirlo ~]$ source ~/stackrc
(undercloud)$ openstack overcloud node introspect --all-manageable --provide
...
PLAY [Baremetal Introspection for multiple Ironic Nodes] ***********************
...
Successfully introspected nodes: ['c51e6af7-fb0f-41a3-a727-e4280a12d2dd', '40c0c3b3-1564-46f7-9fa3-90a17ad3af8f', 'a8172746-0e60-4f36-aab5-bffd467d370d']
...
PLAY [Overcloud Node Provide] **************************************************
....
Successfully provided nodes: ['c51e6af7-fb0f-41a3-a727-e4280a12d2dd', '40c0c3b3-1564-46f7-9fa3-90a17ad3af8f', 'a8172746-0e60-4f36-aab5-bffd467d370d']
(undercloud)$ openstack baremetal node list
+--------------------------------------+----------------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name                 | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------------------+---------------+-------------+--------------------+-------------+
| c51e6af7-fb0f-41a3-a727-e4280a12d2dd | overcloud-controller | None          | power off   | available          | False       |
| 40c0c3b3-1564-46f7-9fa3-90a17ad3af8f | dell-r640-oss-11     | None          | power off   | available          | False       |
| a8172746-0e60-4f36-aab5-bffd467d370d | dell-r640-oss-12     | None          | power off   | available          | False       |
+--------------------------------------+----------------------+---------------+-------------+--------------------+-------------+
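
The data collected during introspection is kept by ironic-inspector and can be dumped afterwards; this is convenient to double-check NIC names, MAC addresses and the NUMA layout before writing the NIC configuration templates:

source ~/stackrc
openstack baremetal introspection data save dell-r640-oss-11 > introspection-11.json
# the inventory section contains the NIC names/MACs and the CPU/NUMA details
python3 -m json.tool introspection-11.json | less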

Network layout

There are multiple network layout samples provided in /usr/share/openstack-tripleo-heat-templates/network-data-samples. We will use the default IPv4 network isolation and adjust it to our needs:

[stack@dirlo ~]$ SAMPLES=/usr/share/openstack-tripleo-heat-templates/network-data-samples
[stack@dirlo ~]$ cp $SAMPLES/default-network-isolation.yaml network_data.yaml

Since I do not have any external network, I removed that section. I also adjusted the tenant network prefix to 172.16.4.0/24 to avoid a conflict with the ctlplane network:

diff -u /usr/share/openstack-tripleo-heat-templates/network-data-samples/default-network-isolation.yaml network_data.yaml
--- /usr/share/openstack-tripleo-heat-templates/network-data-samples/default-network-isolation.yaml
+++ network_data.yaml
@@ -1,3 +1,4 @@
+---
 - name: Storage
   name_lower: storage
   vip: true
@@ -8,7 +9,7 @@
       allocation_pools:
         - start: 172.16.1.4
           end: 172.16.1.250
-      vlan: 30
+      vlan: 402
 - name: StorageMgmt
   name_lower: storage_mgmt
   vip: true
@@ -19,7 +20,7 @@
       allocation_pools:
         - start: 172.16.3.4
           end: 172.16.3.250
-      vlan: 40
+      vlan: 403
 - name: InternalApi
   name_lower: internal_api
   vip: true
@@ -30,27 +31,15 @@
       allocation_pools:
         - start: 172.16.2.4
           end: 172.16.2.250
-      vlan: 20
+      vlan: 401
 - name: Tenant
   vip: false  # Tenant network does not use VIPs
   mtu: 1500
   name_lower: tenant
   subnets:
     tenant_subnet:
-      ip_subnet: 172.16.0.0/24
+      ip_subnet: 172.16.4.0/24
       allocation_pools:
-        - start: 172.16.0.4
-          end: 172.16.0.250
-      vlan: 50
-- name: External
-  name_lower: external
-  vip: true
-  mtu: 1500
-  subnets:
-    external_subnet:
-      ip_subnet: 10.0.0.0/24
-      allocation_pools:
-        - start: 10.0.0.4
-          end: 10.0.0.250
-      gateway_ip: 10.0.0.1
-      vlan: 10
+        - start: 172.16.4.4
+          end: 172.16.4.250
+      vlan: 404

Also, we need a vip_data.yaml file which we can almost take as-is:

[stack@dirlo ~]$ SAMPLES=/usr/share/openstack-tripleo-heat-templates/network-data-samples
[stack@dirlo ~]$ cp $SAMPLES/vip-data-default-network-isolation.yaml vip_data.yaml
diff -u /usr/share/openstack-tripleo-heat-templates/network-data-samples/vip-data-default-network-isolation.yaml vip_data.yaml
--- /usr/share/openstack-tripleo-heat-templates/network-data-samples/vip-data-default-network-isolation.yaml
+++ vip_data.yaml
@@ -33,7 +33,5 @@
   dns_name: overcloud
 - network: storage
   dns_name: overcloud
-- network: external
-  dns_name: overcloud
 - network: ctlplane
   dns_name: overcloud

Roles definitions

Next, we need to determine what roles will be used in our deployment. The overcloud consists of nodes (bare metal and/or virtual machines) that have predefined roles such as Controller, Compute, CephStorage, etc. Each of these roles contains:

  • A short description
  • Some user defined tags (optional)
  • The list of networks they have access to (defined in the previous section)
  • Some default parameters (optional)
  • The set of OpenStack services that will be running on machines assigned to this role

There are some built-in role definitions shipped in /usr/share/openstack-tripleo-heat-templates/roles on the undercloud node. They can be listed with the following command:

[stack@dirlo ~]$ openstack overcloud role list
BlockStorage
CephStorage
Compute
ComputeHCI
ComputeOvsDpdk
Controller
...

The overcloud deployment needs a roles_data.yaml file containing all roles that will be used. In my case, I will only be using Controller and ComputeOvsDpdk. Here are the commands I used to generate and tweak the file:

[stack@dirlo ~]$ openstack overcloud role generate -o ~/roles_data.yaml.orig Controller ComputeOvsDpdk
[stack@dirlo ~]$ cp ~/roles_data.yaml.orig ~/roles_data.yaml
[stack@dirlo ~]$ vim ~/roles_data.yaml

The built-in role definitions contain a lot of settings, which are described here. The defaults should be suitable for standard deployments. In our case, we do not have an External network: we need to remove it from the networks section and replace the default_route_networks setting with ctlplane, which goes through the undercloud machine via NAT to reach the external network (see diagram above).

diff -u roles_data.yaml.orig roles_data.yaml
--- roles_data.yaml.orig
+++ roles_data.yaml
@@ -16,8 +16,6 @@
     # ML2/OVS without DVR)
     - external_bridge
   networks:
-    External:
-      subnet: external_subnet
     InternalApi:
       subnet: internal_api_subnet
     Storage:
@@ -28,7 +26,7 @@
       subnet: tenant_subnet
   # For systems with both IPv4 and IPv6, you may specify a gateway network for
   # each, such as ['ControlPlane', 'External']
-  default_route_networks: ['External']
+  default_route_networks: ['ctlplane']
   HostnameFormatDefault: '%stackname%-controller-%index%'
   RoleParametersDefault:
     OVNCMSOptions: "enable-chassis-as-gw"

Network interfaces configuration

After defining a network layout, we need to map each network to actual network interfaces. This is done via Jinja2 Ansible templates. There are multiple examples shipped with TripleO but, unfortunately, I did not find any that matches our network topology, so I will reuse the ones that are somewhat close to what I need. Details about all available fields in these templates are available in the docs.

[stack@dirlo ~]$ templates=/usr/share/ansible/roles/tripleo_network_config/templates
[stack@dirlo ~]$ cp $templates/multiple_nics_vlans/multiple_nics_vlans_dpdk.j2 .
[stack@dirlo ~]$ cp $templates/single_nic_vlans/controller_no_external.j2 .

ATTENTION: if there is a Jinja syntax error in these files, the deployment engine will not report where the error is located (nor in which file):

2023-01-13 16:04:32.299771 | 52540097-fea0-e819-b418-000000000127 |       TASK | Render network_config from template
An exception occurred during task execution. To see the full traceback, use -vvv. The error was:     use_dhcp: false
2023-01-13 16:04:33.396767 | 52540097-fea0-e819-b418-000000000127 |      FATAL | Render network_config from template | controller-0 | error={"changed": false, "msg": "AnsibleError: template error while templating string: expected token ',', got 'string'. String: # vim: ft=yaml\n---\n{% set mtu_list = [ctlplane_mtu] %}\n{% for network in role_networks %}\n{{ mtu_list.append(lookup('vars', networks_lower[network] ~ '_mtu')) }}\n{%- endfor %}\n{% set min_viable_mtu = mtu_list | max %}\nnetwork_config:\n# First interface holds the control plane network \"flat\" (or native vlan)\n- type: interface\n  name: eth0\n  mtu: {{ min_viable_mtu }}\n  use_dhcp: false\n  dns_servers: {{ ctlplane_dns_nameservers }}\n  domain: {{ dns_search_domains }}\n  addresses:\n  - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_subnet_cidr }}\n  routes: {{ ctlplane_host_routes }}\n\n# We do not have any External network. The first interface gets every\n# OSP service networks. I chose to exclude these VLANs from OvS to avoid\n# connectivity interruption when the OvS daemon is restarted.\n{% for network in role_networks if network != \"Tenant\" %}\n{% set net = networks_lower[network] %}\n- type: vlan\n  device: eth0\n  mtu: {{ lookup('vars', net '_mtu') }}\n  use_dhcp: false\n  vlan_id: {{ lookup('vars', net ~ '_vlan_id') }}\n  addresses:\n  - ip_netmask: {{ lookup('vars', net ~ '_ip') }}/{{ lookup('vars', net ~ '_cidr') }}\n  routes: {{ lookup('vars', net ~ '_host_routes') }}\n{% endfor %}\n\n# Tenant is confined on the second interface through an OvS bridge.\n- type: ovs_bridge\n  name: br-eth1\n  mtu: {{ min_viable_mtu }}\n  use_dhcp: false\n  # Add the tenant vlan tag directly on the bridge to avoid creation of a linux\n  # vlan interface by os-net-config. This allows provider networks (other VLAN\n  # ids) to be accepted on the physical interface and not interfere with the\n  # tenant VLAN id.\n  ovs_extra: set port br-eth1 tag={{ lookup('vars', 'tenant_vlan_id') }}\n  addresses:\n  - ip_netmask: {{ lookup('vars', 'tenant_ip') }}/{{ lookup('vars', 'tenant_cidr') }}\n  routes: {{ lookup('vars', 'tenant_host_routes') }}\n  members:\n  - type: interface\n    name: eth1\n    mtu: {{ min_viable_mtu }}\n    use_dhcp: false\n\n"}

The only way to ensure that the templates are correct is to try to parse them in a Python shell:

>>> import jinja2
>>> s = open("controller_no_external.j2").read()
>>> t = jinja2.Template(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 1031, in __new__
    return env.from_string(source, template_class=cls)
  File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 941, in from_string
    return cls.from_code(self, self.compile(source), globals, None)
  File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 638, in compile
    self.handle_exception(source=source_hint)
  File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/usr/lib/python3.9/site-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "<unknown>", line 27, in template
jinja2.exceptions.TemplateSyntaxError: expected token ',', got 'string'

The issue actually had nothing to do with a missing comma:

--- a/controller_no_external.j2
+++ b/controller_no_external.j2
@@ -24,7 +24,7 @@
 {% set net = networks_lower[network] %}
 - type: vlan
   device: eth0
-  mtu: {{ lookup('vars', net '_mtu') }}
+  mtu: {{ lookup('vars', net ~ '_mtu') }}
   use_dhcp: false
   vlan_id: {{ lookup('vars', net ~ '_vlan_id') }}
   addresses:

After extensive modifications, here is what I ended up with:

Controller node

# ~/controller_no_external.j2
---
{% set mtu_list = [ctlplane_mtu] %}
{% for network in role_networks %}
{{ mtu_list.append(lookup('vars', networks_lower[network] ~ '_mtu')) }}
{% endfor %}
{% set min_viable_mtu = mtu_list | max %}
network_config:
# First interface holds the control plane network "flat" (or native vlan)
- type: interface
  name: enp1s0
  mtu: {{ min_viable_mtu }}
  use_dhcp: false
  dns_servers: {{ ctlplane_dns_nameservers }}
  domain: {{ dns_search_domains }}
  addresses:
  - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_subnet_cidr }}
  routes: {{ ctlplane_host_routes }}

# We do not have any External network. The first interface gets every
# OSP service networks. I chose to exclude these VLANs from OvS to avoid
# connectivity interruption when the OvS daemon is restarted.
{% for network in role_networks if network != "Tenant" %}
{% set net = networks_lower[network] %}
- type: vlan
  device: enp1s0
  mtu: {{ lookup('vars', net ~ '_mtu') }}
  use_dhcp: false
  vlan_id: {{ lookup('vars', net ~ '_vlan_id') }}
  addresses:
  - ip_netmask: {{ lookup('vars', net ~ '_ip') }}/{{ lookup('vars', net ~ '_cidr') }}
  routes: {{ lookup('vars', net ~ '_host_routes') }}
{% endfor %}

# Tenant is confined on the second interface through an OvS bridge.
- type: ovs_bridge
  name: br-enp2s0
  mtu: {{ min_viable_mtu }}
  use_dhcp: false
  # Add the tenant vlan tag directly on the bridge to avoid creation of a linux
  # vlan interface by os-net-config. This allows provider networks (other VLAN
  # ids) to be accepted on the physical interface and not interfere with the
  # tenant VLAN id.
  ovs_extra: set port br-enp2s0 tag={{ lookup('vars', 'tenant_vlan_id') }}
  addresses:
  - ip_netmask: {{ lookup('vars', 'tenant_ip') }}/{{ lookup('vars', 'tenant_cidr') }}
  routes: {{ lookup('vars', 'tenant_host_routes') }}
  members:
  - type: interface
    name: enp2s0
    mtu: {{ min_viable_mtu }}
    use_dhcp: false

Compute nodes

# ~/multiple_nics_vlans_dpdk.j2
---
network_config:
- type: interface
  name: eno1
  use_dhcp: false
  defroute: false  # eno1 is completely disabled in our setup

- type: interface
  name: eno2
  mtu: {{ ctlplane_mtu }}
  dns_servers: {{ ctlplane_dns_nameservers }}
  domain: {{ dns_search_domains }}
  routes: {{ ctlplane_host_routes }}
  use_dhcp: false
  addresses:
  - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_subnet_cidr }}

{% for network in networks_all if network not in networks_skip_config %}
{% if network not in ["External", "Tenant"] and network in role_networks %}
- type: vlan
  device: eno2
  mtu: {{ lookup('vars', networks_lower[network] ~ '_mtu') }}
  use_dhcp: false
  vlan_id: {{ lookup('vars', networks_lower[network] ~ '_vlan_id') }}
  addresses:
  - ip_netmask:
      {{ lookup('vars', networks_lower[network] ~ '_ip') }}/{{ lookup('vars', networks_lower[network] ~ '_cidr') }}
  routes: {{ lookup('vars', networks_lower[network] ~ '_host_routes') }}
{% endif %}
{% endfor %}

- type: ovs_user_bridge
  name: br-eno3
  use_dhcp: false
  mtu: 9000
  # Add the tenant vlan tag directly on the bridge to avoid creation of a linux
  # vlan interface by os-net-config. This allows provider networks (other VLAN
  # ids) to be accepted on the physical interface and not interfere with the
  # tenant VLAN id.
  ovs_extra: set port br-eno3 tag={{ lookup('vars', 'tenant_vlan_id') }}
  addresses:
  - ip_netmask:
      {{ lookup('vars', 'tenant_ip') }}/{{ lookup('vars', 'tenant_cidr') }}
  routes: {{ lookup('vars', 'tenant_host_routes') }}
  members:
  - type: ovs_dpdk_port
    name: dpdk-eno3
    rx_queue: 4
    members:
    - type: interface
      name: eno3
      mtu: 9000
      primary: true  # use the same MAC address in the bridge
      use_dhcp: false

- type: interface
  name: eno4
  use_dhcp: false
  defroute: false  # eno4 is completely disabled in our setup

Putting it all together

Before running the provisioning and deployment, we need to create another file that aggregates all the ones we have defined so far. Its main purpose is to declare how many nodes of each role we need, but it can also be used to add node and/or role-specific parameters.

# ~/baremetal_deployment.yaml
---
- name: Controller  # must match a role name in ~/roles_data.yaml
  count: 1
  defaults:
    # by default, the image is uefi based, force the bios kind
    image:
      href: file:///var/lib/ironic/images/overcloud-full.raw
      kernel: file:///var/lib/ironic/images/overcloud-full.vmlinuz
      ramdisk: file:///var/lib/ironic/images/overcloud-full.initrd
    # We are using "network isolation". The list of networks *MUST* be listed *AGAIN*
    # here otherwise the baremetal provisioning will fail with obscure timeouts.
    networks:
      - network: ctlplane
        vif: true
      - network: internal_api
        subnet: internal_api_subnet
      - network: tenant
        subnet: tenant_subnet
      - network: storage
        subnet: storage_subnet
      - network: storage_mgmt
        subnet: storage_mgmt_subnet
    network_config:
      # need to use absolute paths
      template: /home/stack/controller_no_external.j2
      default_route_network:
        - ctlplane
    config_drive:
      # cloud init config to allow ssh connections with a password
      cloud_config:
        ssh_pwauth: true
        disable_root: false
        chpasswd:
          list: 'root:redhat'
          expire: false
  instances:
    - hostname: controller-0
      name: overcloud-controller  # must match a node in ~/nodes.yaml

- name: ComputeOvsDpdk  # must match a role name in ~/roles_data.yaml
  count: 2
  defaults:
    # by default, the image is uefi based, force the bios kind
    image:
      href: file:///var/lib/ironic/images/overcloud-full.raw
      kernel: file:///var/lib/ironic/images/overcloud-full.vmlinuz
      ramdisk: file:///var/lib/ironic/images/overcloud-full.initrd
    # We are using "network isolation". The list of networks *MUST* be listed *AGAIN*
    # here otherwise the baremetal provisioning will fail with obscure timeouts.
    networks:
      - network: ctlplane
        vif: true
      - network: internal_api
        subnet: internal_api_subnet
      - network: tenant
        subnet: tenant_subnet
      - network: storage
        subnet: storage_subnet
    network_config:
      template: /home/stack/multiple_nics_vlans_dpdk.j2
      default_route_network:
        - ctlplane
    config_drive:
      cloud_config:
        ssh_pwauth: true
        disable_root: false
        chpasswd:
          list: 'root:redhat'
          expire: false
  instances:
    - hostname: compute-0
      name: dell-r640-oss-11  # must match a node in ~/nodes.yaml
    - hostname: compute-1
      name: dell-r640-oss-12  # must match a node in ~/nodes.yaml
  ansible_playbooks:
    # Hugepages and cpu isolation need to be configured via kernel boot parameters.
    # These settings are applied via ansible playbooks during the provision
    # phase (see below) to avoid an additional reboot during the deploy phase.
    - playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-kernelargs.yaml
      extra_vars:
        kernel_args: 'default_hugepagesz=1GB hugepagesz=1G hugepages=160 intel_iommu=on iommu=pt isolcpus=4-23,28-47'
        tuned_isolated_cores: '4-23,28-47'
        tuned_profile: 'cpu-partitioning'
        reboot_wait_timeout: 1800
    # OvS DPDK must also be configured during the provision phase to avoid additional
    # reboots. Even if the physical NIC is only on NUMA node 0, we need to allocate
    # PMD cores in both sockets.
    - playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-openvswitch-dpdk.yaml
      extra_vars:
        pmd: '4-7,28-31'
        socket_mem: '2048,2048'
        # if memory_channels is omitted, the default is an empty string, which is invalid
        # and makes the ovs dpdk initialization fail with an obscure timeout error
        memory_channels: 4
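
Since a typo in one of the template or playbook paths only surfaces late in the provision step, a quick existence check on every file referenced above can save a full provisioning round trip (a small convenience helper, not part of the official workflow):

for f in /home/stack/controller_no_external.j2 \
        /home/stack/multiple_nics_vlans_dpdk.j2 \
        /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-kernelargs.yaml \
        /usr/share/ansible/tripleo-playbooks/cli-overcloud-openvswitch-dpdk.yaml; do
    if [ -f "$f" ]; then echo "OK       $f"; else echo "MISSING  $f"; fi
done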

Overcloud deployment

Now that we have all these files created, it is time to do the actual deployment. This is done in two phases.

Provision

These commands take the files that we created and generate other files that will be required for the actual deploy command. This is also where the bare-metal nodes actually get installed (day 1). This step only needs to be done once.

source ~/stackrc

openstack overcloud network provision ~/network_data.yaml \
    --stack overcloud --output ~/networks-deployed-environment.yaml -y

openstack overcloud network vip provision ~/vip_data.yaml \
    --stack overcloud --output ~/vip-deployed-environment.yaml -y

openstack overcloud node provision ~/baremetal_deployment.yaml \
    --stack overcloud --output ~/overcloud-baremetal-deployed.yaml -y --network-config
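
When the provision step completes, the three nodes should be reported as active and the generated environment files should exist; a quick check before moving on to the deploy phase:

openstack baremetal node list -c Name -c "Power State" -c "Provisioning State"
ls -l ~/networks-deployed-environment.yaml \
    ~/vip-deployed-environment.yaml \
    ~/overcloud-baremetal-deployed.yaml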

Deploy

This step does the actual OpenStack installation on the already provisioned bare-metal nodes. It requires some additional environment files to override the default settings. I will only need two:

# ~/global-config.yaml
---
parameter_defaults:
  # repeat the cloud-init parameters from ~/baremetal_deployment.yaml
  PasswordAuthentication: 'yes'
  SshFirewallAllowAll: true
  NodeRootPassword: 'redhat'
# ~/dpdk-config.yaml
---
parameter_defaults:
  ComputeOvsDpdkParameters:
    # These settings are copied from ~/baremetal_deployment.yaml
    # They need to be duplicated here to avoid reboots and also to be able to update
    # them if needed.
    KernelArgs: default_hugepagesz=1GB hugepagesz=1G hugepages=160 intel_iommu=on iommu=pt isolcpus=4-23,28-47
    IsolCpusList: "4-23,28-47"
    OvsDpdkSocketMemory: "2048,2048"
    OvsDpdkMemoryChannels: 4
    OvsPmdCoreList: "4-7,28-31"
    # Configure nova to allow scheduling vms on isolated cores that are used neither
    # by the kernel nor by ovs.
    NovaReservedHostMemory: "4096"
    NovaComputeCpuSharedSet: "0-3,24-27"
    NovaComputeCpuDedicatedSet: "8-23,32-47"
    # An array of filters used by Nova to filter a node. These filters will be
    # applied in the order they are listed. By default, there is no filter at all
    # and any vm can be scheduled on any compute node.
    NovaSchedulerEnabledFilters:
      - AvailabilityZoneFilter
      - ComputeFilter
      - ComputeCapabilitiesFilter
      - ImagePropertiesFilter
      - ServerGroupAntiAffinityFilter
      - ServerGroupAffinityFilter
      - PciPassthroughFilter
      - NUMATopologyFilter

We now have everything ready, time to run the deployment:

source ~/stackrc

BUILTINS=/usr/share/openstack-tripleo-heat-templates/environments

set --

# Use the default templates; without this argument, the builtin environments will not be found
set -- "$@" --templates

# These need to be specified here
set -- "$@" --networks-file ~/network_data.yaml
set -- "$@" --roles-file ~/roles_data.yaml

# Use pre-provisioned overcloud nodes (`openstack overcloud node provision`)
# This is not described in the docs, sadly.
set -- "$@" --deployed-server

# generated by `openstack overcloud network provision`
set -- "$@" -e ~/networks-deployed-environment.yaml
# generated by `openstack overcloud network vip provision`
set -- "$@" -e ~/vip-deployed-environment.yaml

# enable debug logs
set -- "$@" -e $BUILTINS/debug.yaml
# location of overcloud container registries
set -- "$@" -e ~/containers-prepare-parameter.yaml

# OVN is the default networking mechanism driver in OSP 17.0. However, to have the
# deployment working, this environment file MUST be added to the openstack overcloud
# deploy command. This is only said in a small note a few paragraphs before the actual
# command invocation.
set -- "$@" -e $BUILTINS/services/neutron-ovn-ha.yaml
# Since we also have DPDK compute nodes, we need this extra file. It is not referenced
# anywhere in the OSP deployment guide.
set -- "$@" -e $BUILTINS/services/neutron-ovn-dpdk.yaml

# Override default settings from roles. These files can be modified and the command
# executed again to update the cluster configuration.
set -- "$@" -e ~/global-config.yaml
set -- "$@" -e ~/dpdk-config.yaml

# generated by `openstack overcloud node provision`
set -- "$@" -e ~/overcloud-baremetal-deployed.yaml

# run the actual deployment using all args
openstack overcloud deploy "$@"

After more than one hour, you should see this message indicating that the deployment is done.

Overcloud Deployed successfully
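
A quick way to confirm that the overcloud actually answers is to source the generated overcloudrc and list the registered services (host names will depend on the HostnameFormat settings in roles_data.yaml):

source ~/overcloudrc
# both compute nodes should be listed with state "up"
openstack compute service list
# the OVN controller agents should be alive on the controller and compute nodes
openstack network agent list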

Basic operation

What now?

VM images creation

Let’s create some VM images first. I will create a traffic generator image based on TRex and another one to simulate a VNF based on testpmd. Since virt-customize is not available on the undercloud machine and I don’t want to mess with the cluster, I will prepare these images on the hypervisor (the machine running the undercloud and controller VMs).

# use RHEL 8 since trex does not work on rhel 9
base_url="http://download.eng.brq.redhat.com/rhel-8/rel-eng/RHEL-8/latest-RHEL-8.6"
url="$base_url/compose/BaseOS/x86_64/images"
curl -LO "$url/SHA256SUM"
qcow2=$(sed -nre 's/^SHA256 \((rhel-guest-image.+\.qcow2)\) =.*/\1/p' SHA256SUM)
curl -LO "$url/$qcow2"
sha256sum -c --ignore-missing SHA256SUM
mv -v "$qcow2" rhel-guest-image-8.6.qcow2

cp rhel-guest-image-8.6.qcow2 trex.qcow2
cp rhel-guest-image-8.6.qcow2 testpmd.qcow2

# set common options for both trex and testpmd
set --      --smp 8 --memsize 8192
set -- "$@" --run-command "rm -f /etc/yum.repos.d/*.repo"
set -- "$@" --run-command "curl -L $base_url/repofile.repo > /etc/yum.repos.d/rhel.repo"

export LIBGUESTFS_BACKEND=direct

virt-customize -a trex.qcow2 "$@" \
    --install pciutils,driverctl,tmux,vim,python3,tuned-profiles-cpu-partitioning \
    --run-command "curl -L https://content.mellanox.com/ofed/MLNX_OFED-5.7-1.0.2.0/MLNX_OFED_LINUX-5.7-1.0.2.0-rhel8.6-x86_64.tgz | tar -C /root -zx" \
    --run-command "curl -L https://trex-tgn.cisco.com/trex/release/v3.02.tar.gz | tar -C /root -zx && mv /root/v3.02 /root/trex" \
    --run-command "cd /root/MLNX_OFED* && ./mlnxofetinstall --without-fw-update" \
    --selinux-relabel

virt-customize -a testpmd.qcow2 "$@" \
    --install dpdk,dpdk-tools,tuned-profiles-cpu-partitioning,driverctl,vim,tmux \
    --selinux-relabel

scp trex.qcow2 testpmd.qcow2 stack@192.168.122.2:

After SSHing back into the undercloud VM, we can import both images:

source ~/overcloudrc

openstack image create --min-disk 10 --min-ram 2048 --disk-format qcow2 \
    --container-format bare --file ~/trex.qcow2 --public trex

openstack image create --min-disk 10 --min-ram 2048 --disk-format qcow2 \
    --container-format bare --file ~/testpmd.qcow2 --public testpmd