These helper scripts provide a virtualized infrastructure for use with OpenShift baremetal IPI deployment, and then use OpenShift Baremetal Deploy Ansible Installer to deploy a cluster on that virtualized infrastructure. They do the following:
- Prepare the provisioning host for OCP deployment (required packages, firewall, etc)
- Start DHCP and DNS containers for the OCP baremetal network
- Set up NAT forwarding and masquerading to allow the baremetal network to reach an external routable network (a rough sketch of the rules involved follows this list)
- Create VMs to serve as the cluster's masters and workers
- Create virtual BMC endpoints for the VMs
- Clone the OpenShift Baremetal Deploy Ansible Installer and prepare it for use with the virtualized infrastructure
- Execute the aforementioned Ansible playbook
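For context, the NAT forwarding/masquerading step amounts to rules of roughly the following shape. The interface names (`baremetal` bridge, `eno1` external NIC) are placeholders, and the scripts may implement this differently (for example via firewalld), so treat this purely as a sketch:

```
# Illustrative only -- interface names are assumptions, not taken from the scripts.
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
sudo iptables -A FORWARD -i baremetal -o eno1 -j ACCEPT
sudo iptables -A FORWARD -i eno1 -o baremetal -m state --state RELATED,ESTABLISHED -j ACCEPT
```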
- Provisioning host machine must have an externally-facing NIC on a separate VLAN if you wish the cluster to have Internet connectivity
- Provisioning host machine must have externally-facing NICs on a separate VLAN for the provisioning and baremetal networks if you wish for the VMs or DHCP/DNS services to be reachable by nodes outside the host
- Provisioning host machine must be RHEL 8.1 or CentOS 8.1
- If RHEL 8.1, an active subscription is required
- A non-root user must be available to execute the scripts and the Ansible playbook. You could add one like so:

  ```
  sudo useradd kni
  echo "kni ALL=(root) NOPASSWD:ALL" | sudo tee -a /etc/sudoers.d/kni
  sudo chmod 0440 /etc/sudoers.d/kni
  sudo su - kni -c "ssh-keygen -t rsa -f /home/kni/.ssh/id_rsa -N ''"
  ```
- `make` and `git` must be installed:

  ```
  sudo dnf install -y make git
  ```
- Copy your OpenShift pull secret to your non-root user's home directory (e.g. `/home/kni`) and call it `pull-secret.txt` (this location is ultimately configurable, however -- see below)
- As your non-root user (such as `kni`), clone the repo to your provisioning host machine and go to the directory:

  ```
  git clone https://github.com/redhat-nfvpe/kni-ipi-virt.git
  cd kni-ipi-virt
  ```
- Set your environment variables in `common.sh`. These values and their purpose are described in the file (an example snippet follows this list).
- Execute the full workflow:

  ```
  make all
  ```

- To remove the VMs and the DNS and DHCP containers, use:

  ```
  make clean
  ```
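As a rough illustration, the relevant part of `common.sh` might end up looking like the snippet below. Every value shown is a made-up placeholder; `common.sh` itself remains the authoritative list of variables and their meanings:

```
# Hypothetical example values only -- consult common.sh for the real variable list.
CLUSTER_NAME="mycluster"
CLUSTER_DOMAIN="example.com"
PROV_INTF="eno2"            # externally-facing NIC for the provisioning network
PROV_BRIDGE="provisioning"
BM_INTF="eno3"              # externally-facing NIC for the baremetal network
BM_BRIDGE="baremetal"
BM_GW_IP="192.168.111.1"
DNS_IP="192.168.111.2"
NUM_MASTERS="3"
NUM_WORKERS="1"
```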
Alternatively, the individual scripts can be used on their own rather than through `make`:

- Clone the repo to your provisioning host machine and go to the directory:

  ```
  git clone https://github.com/redhat-nfvpe/kni-ipi-virt.git
  cd kni-ipi-virt
  ```
- Set your environment variables in `common.sh`. These values and their purpose are described in the file.
- Execute `prep_host.sh`, which requires the following variables to be set in `common.sh` (a short usage sketch follows this list):
  - `BM_BRIDGE`
  - `BM_GW_IP`
  - `DNS_IP`
  - `PROV_BRIDGE`
- If you wish external nodes to be able to reach the services/VMs listed below, you will also need:
  - `BM_INTF`
  - `PROV_INTF`
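With those variables in place, a minimal sequence might look like the following; the script path and the bridge names are assumptions based on the variables above rather than anything the repo guarantees:

```
# Run the host preparation (packages, firewall, bridges, NAT).
./prep_host.sh

# Illustrative checks -- the bridge names depend on your BM_BRIDGE/PROV_BRIDGE values.
ip link show baremetal
ip link show provisioning
```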
- Assuming the steps above have been completed, the individual DNS, DHCP and VM bash scripts can be utilized on their own to make use of their atomic functionality.
Create a CoreDNS container to provide DNS on your baremetal network. The following variables are required to be set in `common.sh`:

- `API_VIP`
- `BM_GW_IP`
- `BM_INTF` (if you want external nodes to be able to reach this service)
- `CLUSTER_DOMAIN`
- `CLUSTER_NAME`
- `DNS_IP`
- `DNS_VIP`
- `EXT_DNS_IP`
- `INGRESS_VIP`
- `PROJECT_DIR`
Create and start the CoreDNS container:

```
./dns/start.sh
```

Stop and remove the CoreDNS container:

```
./dns/stop.sh
```
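Once the CoreDNS container is running, you can sanity-check name resolution from the provisioning host with a query such as the one below; the hostname and address are placeholders built from the `CLUSTER_NAME`, `CLUSTER_DOMAIN` and `DNS_IP` values described above:

```
# Placeholder values -- substitute your own CLUSTER_NAME, CLUSTER_DOMAIN and DNS_IP.
dig +short api.mycluster.example.com @192.168.111.2
```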
Create a Dnsmasq container to provide DHCP on your baremetal network. The following variables are required to be set in `common.sh`:

- `BM_GW_IP`
- `BM_INTF` (if you want external nodes to be able to reach this service)
- `CLUSTER_DOMAIN`
- `CLUSTER_NAME`
- `DHCP_BM_MACS`
- `DNS_IP`
- `PROJECT_DIR`
If using the DHCP container with existing machines, you will need to set `DHCP_BM_MACS`. `DHCP_BM_MACS` should list your master and worker baremetal network MACs like so: `<master0>,..,<masterN>,<worker0>,..,<workerN>`. If you do not set this variable, `MASTER_BM_MAC_PREFIX` and `WORKER_BM_MAC_PREFIX` will be used (as they would in the bundled `make all` usage above), which will most likely result in an incorrect Dnsmasq configuration (unless you happen to be using the Dnsmasq container with VMs generated by this tool's VM-generation scripts).
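For example, with three existing masters and one worker, `DHCP_BM_MACS` might be set along these lines (the MAC addresses below are invented placeholders):

```
# Placeholder MACs -- replace with the real baremetal-network MACs of your machines,
# ordered masters first, then workers.
DHCP_BM_MACS="52:54:00:aa:00:01,52:54:00:aa:00:02,52:54:00:aa:00:03,52:54:00:bb:00:01"
```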
Create and start the Dnsmasq container:

```
./dhcp/start.sh
```

Stop and remove the Dnsmasq container:

```
./dhcp/stop.sh
```
Create a configurable number of VMs for use with an OCP deployment. The following variables are required to be set in `common.sh`:

- `CLUSTER_NAME`
- `LIBVIRT_STORAGE_POOL`
- `MASTER_BM_MAC_PREFIX`
- `MASTER_CPUS`
- `MASTER_MEM`
- `MASTER_PROV_MAC_PREFIX`
- `MASTER_VBMC_PORT_PREFIX`
- `NUM_MASTERS`
- `NUM_WORKERS`
- `PROJECT_DIR`
- `WORKER_BM_MAC_PREFIX`
- `WORKER_CPUS`
- `WORKER_MEM`
- `WORKER_PROV_MAC_PREFIX`
- `WORKER_VBMC_PORT_PREFIX`
Create the VMs and their vBMCs:

```
./vms/prov-vms.sh
```

Destroy the VMs and their vBMCs:

```
./vms/clean-vms.sh
```
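After provisioning, you can confirm what was created by listing the libvirt domains and the virtual BMC endpoints. The `vbmc` command below assumes the endpoints are backed by VirtualBMC, which is an assumption here rather than something this README states:

```
# List the newly created VMs (libvirt domains).
sudo virsh list --all

# List the virtual BMC endpoints (assumes they are provided by VirtualBMC's vbmc CLI).
vbmc list
```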
- If you are unable to start the DNS container because of an error message like so...

  ```
  Error: error from slirp4netns while setting up port redirection: map[desc:bad request: add_hostfwd: slirp_add_hostfwd failed]
  ```

  ...try stopping/removing all containers and killing all remaining `slirp4netns` processes, and then try to start the container again. Sometimes `podman` fails to clean up the `slirp4netns` forwarding processes when it stops/removes the DNS container.
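  A minimal recovery sequence, assuming no other containers on the host need to keep running, might look like this:

  ```
  # Stop and remove all podman containers, then kill any leftover
  # slirp4netns port-forwarding processes. Only do this if nothing
  # else on the host depends on the running containers.
  podman stop -a
  podman rm -a
  pkill -f slirp4netns
  ```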
- Sometimes the Ironic Python Agent used by the underlying Metal3 components (which are themselves part of the IPI installation process) gets stuck while cleaning the VMs' disks. Using a `vncviewer` such as TigerVNC, you can view the console of the VM and see if the agent's heartbeat is looping continuously (for more than 10 minutes or so). If so, a simple option is to just try the deployment again, but you of course run the risk of hitting a cleaning issue again. A better option is to use the OpenStack CLI tool to talk to Ironic and attempt cleaning the problematic nodes manually. The tool can be installed like so:

  ```
  sudo pip3 install python-openstackclient
  sudo pip3 install python-ironicclient
  sudo pip3 install python-ironic-inspector-client
  mkdir -p ~/.config/openstack/
  tee "$HOME/.config/openstack/clouds.yaml" > /dev/null << EOF
  clouds:
    metal3-bootstrap:
      auth_type: none
      baremetal_endpoint_override: http://172.22.0.2:6385
      baremetal_introspection_endpoint_override: http://172.22.0.2:5050
    metal3:
      auth_type: none
      baremetal_endpoint_override: http://172.22.0.3:6385
      baremetal_introspection_endpoint_override: http://172.22.0.3:5050
  EOF
  ```
  If it's a master node that is stuck:

  ```
  export OS_CLOUD=metal3-bootstrap
  ```

  Otherwise, if it's a worker node:

  ```
  export OS_CLOUD=metal3
  ```

  You can then see the nodes like so:

  ```
  openstack baremetal node list
  ```

  Find the node(s) stuck in the `clean wait` state, then do the following to abort the current cleaning:

  ```
  openstack baremetal node abort <node UUID>
  openstack baremetal node maintenance set <node UUID>
  openstack baremetal node power off <node UUID>
  openstack baremetal node manage <node UUID>
  openstack baremetal node maintenance unset <node UUID>
  ```
  Now the node should be in a state where you can execute manual cleaning, as described in the Ironic documentation.
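  For reference, manual cleaning is driven with `openstack baremetal node clean`; the single clean step shown below (erasing disk metadata) is only an illustrative example and may not match what your nodes need:

  ```
  # Example only: run a metadata-erase clean step on the node, which must be
  # in the manageable state (as it is after the sequence above).
  openstack baremetal node clean <node UUID> \
      --clean-steps '[{"interface": "deploy", "step": "erase_devices_metadata"}]'
  ```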