These helper scripts provide a virtualized infrastructure for use with OpenShift baremetal IPI deployment, and then use OpenShift Baremetal Deploy Ansible Installer to deploy a cluster on that virtualized infrastructure. They do the following:
- Prepare the provisioning host for OCP deployment (required packages, firewall, etc)
- Start DHCP and DNS containers for the OCP baremetal network
- Set up NAT forwarding and masquerading to allow the baremetal network to reach an external routable network (a rough sketch of the rules involved follows this list)
- Create VMs to serve as the cluster's masters and workers
- Create virtual BMC endpoints for the VMs
- Clone the OpenShift Baremetal Deploy Ansible Installer and prepare it for use with the virtualized infrastructure
- Execute the aforementioned Ansible playbook
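For context, the NAT forwarding/masquerading step amounts to rules of roughly the following shape. The interface names (`baremetal` bridge, `eno1` external NIC) are placeholders, and the scripts may implement this differently (for example via firewalld), so treat this purely as a sketch:

```
# Illustrative only -- interface names are assumptions, not taken from the scripts.
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
sudo iptables -A FORWARD -i baremetal -o eno1 -j ACCEPT
sudo iptables -A FORWARD -i eno1 -o baremetal -m state --state RELATED,ESTABLISHED -j ACCEPT
```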
- Provisioning host machine must have an externally-facing NIC on a separate VLAN if you wish the cluster to have Internet connectivity
- Provisioning host machine must have externally-facing NICs on a separate VLAN for the provisioning and baremetal networks if you wish for the VMs or DHCP/DNS services to be reachable by nodes outside the host
- Provisioning host machine must be RHEL 8.1 or CentOS 8.1
- If RHEL 8.1, an active subscription is required
- A non-root user must be available to execute the scripts and the Ansible playbook. You could add one like so:

  ```
  sudo useradd kni
  echo "kni ALL=(root) NOPASSWD:ALL" | sudo tee -a /etc/sudoers.d/kni
  sudo chmod 0440 /etc/sudoers.d/kni
  sudo su - kni -c "ssh-keygen -t rsa -f /home/kni/.ssh/id_rsa -N ''"
  ```
- `make` and `git` must be installed:

  ```
  sudo dnf install -y make git
  ```
- Copy your OpenShift pull secret to your non-root user's home directory (e.g. `/home/kni`) and call it `pull-secret.txt` (this location is ultimately configurable, however -- see below)
- As your non-root user (such as `kni`), clone the repo to your provisioning host machine and go to the directory:

  ```
  git clone https://github.com/redhat-nfvpe/kni-ipi-virt.git
  cd kni-ipi-virt
  ```
- Set your environment variables in `common.sh`. These values and their purpose are described in the file (an example snippet follows this list).
- Execute the full workflow:

  ```
  make all
  ```

- To remove the VMs and the DNS and DHCP containers, use:

  ```
  make clean
  ```
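As a rough illustration, the relevant part of `common.sh` might end up looking like the snippet below. Every value shown is a made-up placeholder; `common.sh` itself remains the authoritative list of variables and their meanings:

```
# Hypothetical example values only -- consult common.sh for the real variable list.
CLUSTER_NAME="mycluster"
CLUSTER_DOMAIN="example.com"
PROV_INTF="eno2"            # externally-facing NIC for the provisioning network
PROV_BRIDGE="provisioning"
BM_INTF="eno3"              # externally-facing NIC for the baremetal network
BM_BRIDGE="baremetal"
BM_GW_IP="192.168.111.1"
DNS_IP="192.168.111.2"
NUM_MASTERS="3"
NUM_WORKERS="1"
```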
Alternatively, the individual scripts can be used on their own rather than through `make`:

- Clone the repo to your provisioning host machine and go to the directory:

  ```
  git clone https://github.com/redhat-nfvpe/kni-ipi-virt.git
  cd kni-ipi-virt
  ```
- Set your environment variables in `common.sh`. These values and their purpose are described in the file.
- Execute `prep_host.sh`, which requires the following variables to be set in `common.sh` (a short usage sketch follows this list):
  - `BM_BRIDGE`
  - `BM_GW_IP`
  - `DNS_IP`
  - `PROV_BRIDGE`
- If you wish external nodes to be able to reach the services/VMs listed below, you will also need:
  - `BM_INTF`
  - `PROV_INTF`
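With those variables in place, a minimal sequence might look like the following; the script path and the bridge names are assumptions based on the variables above rather than anything the repo guarantees:

```
# Run the host preparation (packages, firewall, bridges, NAT).
./prep_host.sh

# Illustrative checks -- the bridge names depend on your BM_BRIDGE/PROV_BRIDGE values.
ip link show baremetal
ip link show provisioning
```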
- Assuming the steps above have been completed, the individual DNS, DHCP and VM bash scripts can be utilized on their own to make use of their atomic functionality.
Create a CoreDNS container to provide DNS on your baremetal network. The following variables are required to be set in `common.sh`:

- `API_VIP`
- `BM_GW_IP`
- `BM_INTF` (if you want external nodes to be able to reach this service)
- `CLUSTER_DOMAIN`
- `CLUSTER_NAME`
- `DNS_IP`
- `DNS_VIP`
- `EXT_DNS_IP`
- `INGRESS_VIP`
- `PROJECT_DIR`
Create and start the CoreDNS container:

```
./dns/start.sh
```

Stop and remove the CoreDNS container:

```
./dns/stop.sh
```
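Once the CoreDNS container is running, you can sanity-check name resolution from the provisioning host with a query such as the one below; the hostname and address are placeholders built from the `CLUSTER_NAME`, `CLUSTER_DOMAIN` and `DNS_IP` values described above:

```
# Placeholder values -- substitute your own CLUSTER_NAME, CLUSTER_DOMAIN and DNS_IP.
dig +short api.mycluster.example.com @192.168.111.2
```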
Create a Dnsmasq container to provide DHCP on your baremetal network. The following variables are required to be set in `common.sh`:

- `BM_GW_IP`
- `BM_INTF` (if you want external nodes to be able to reach this service)
- `CLUSTER_DOMAIN`
- `CLUSTER_NAME`
- `DHCP_BM_MACS`
- `DNS_IP`
- `PROJECT_DIR`
If using the DHCP container with existing machines, you will need to set `DHCP_BM_MACS`. `DHCP_BM_MACS` should list your master and worker baremetal network MACs like so: `<master0>,..,<masterN>,<worker0>,..,<workerN>`. If you do not set this variable, `MASTER_BM_MAC_PREFIX` and `WORKER_BM_MAC_PREFIX` will be used (as they would in the bundled `make all` usage above), which will most likely result in an incorrect Dnsmasq configuration (unless you happen to be using the Dnsmasq container with VMs generated by this tool's VM-generation scripts).
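For example, with three existing masters and one worker, `DHCP_BM_MACS` might be set along these lines (the MAC addresses below are invented placeholders):

```
# Placeholder MACs -- replace with the real baremetal-network MACs of your machines,
# ordered masters first, then workers.
DHCP_BM_MACS="52:54:00:aa:00:01,52:54:00:aa:00:02,52:54:00:aa:00:03,52:54:00:bb:00:01"
```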
Create and start the Dnsmasq container:

```
./dhcp/start.sh
```

Stop and remove the Dnsmasq container:

```
./dhcp/stop.sh
```
Create a configurable number of VMs for use with an OCP deployment. The following variables are required to be set in `common.sh`:

- `CLUSTER_NAME`
- `LIBVIRT_STORAGE_POOL`
- `MASTER_BM_MAC_PREFIX`
- `MASTER_CPUS`
- `MASTER_MEM`
- `MASTER_PROV_MAC_PREFIX`
- `MASTER_VBMC_PORT_PREFIX`
- `NUM_MASTERS`
- `NUM_WORKERS`
- `PROJECT_DIR`
- `WORKER_BM_MAC_PREFIX`
- `WORKER_CPUS`
- `WORKER_MEM`
- `WORKER_PROV_MAC_PREFIX`
- `WORKER_VBMC_PORT_PREFIX`
Create the VMs and their vBMCs:

```
./vms/prov-vms.sh
```

Destroy the VMs and their vBMCs:

```
./vms/clean-vms.sh
```
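After provisioning, you can confirm what was created by listing the libvirt domains and the virtual BMC endpoints. The `vbmc` command below assumes the endpoints are backed by VirtualBMC, which is an assumption here rather than something this README states:

```
# List the newly created VMs (libvirt domains).
sudo virsh list --all

# List the virtual BMC endpoints (assumes they are provided by VirtualBMC's vbmc CLI).
vbmc list
```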
- If you are unable to start the DNS container because of an error message like so...

  ```
  Error: error from slirp4netns while setting up port redirection: map[desc:bad request: add_hostfwd: slirp_add_hostfwd failed]
  ```

  ...try stopping/removing all containers and killing all remaining `slirp4netns` processes, and then try to start the container again. Sometimes `podman` fails to clean up the `slirp4netns` forwarding processes when it stops/removes the DNS container.
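  A minimal recovery sequence, assuming no other containers on the host need to keep running, might look like this:

  ```
  # Stop and remove all podman containers, then kill any leftover
  # slirp4netns port-forwarding processes. Only do this if nothing
  # else on the host depends on the running containers.
  podman stop -a
  podman rm -a
  pkill -f slirp4netns
  ```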
- Sometimes the Ironic Python Agent used by the underlying Metal3 components (which are themselves part of the IPI installation process) gets stuck while cleaning the VMs' disks. Using a `vncviewer` such as TigerVNC, you can view the console of the VM and see if the agent's heartbeat is looping continuously (for more than 10 minutes or so). If so, a simple option is to just try the deployment again, but you of course run the risk of hitting a cleaning issue again. A better option is to use the OpenStack CLI tool to talk to Ironic and attempt cleaning the problematic nodes manually. The tool can be installed like so:

  ```
  sudo pip3 install python-openstackclient
  sudo pip3 install python-ironicclient
  sudo pip3 install python-ironic-inspector-client
  mkdir -p ~/.config/openstack/
  tee "$HOME/.config/openstack/clouds.yaml" > /dev/null << EOF
  clouds:
    metal3-bootstrap:
      auth_type: none
      baremetal_endpoint_override: http://172.22.0.2:6385
      baremetal_introspection_endpoint_override: http://172.22.0.2:5050
    metal3:
      auth_type: none
      baremetal_endpoint_override: http://172.22.0.3:6385
      baremetal_introspection_endpoint_override: http://172.22.0.3:5050
  EOF
  ```
  If it's a master node that is stuck:

  ```
  export OS_CLOUD=metal3-bootstrap
  ```

  Otherwise, if it's a worker node:

  ```
  export OS_CLOUD=metal3
  ```

  You can then see the nodes like so:

  ```
  openstack baremetal node list
  ```

  Find the node(s) stuck in the `clean wait` state, then do the following to abort the current cleaning:

  ```
  openstack baremetal node abort <node UUID>
  openstack baremetal node maintenance set <node UUID>
  openstack baremetal node power off <node UUID>
  openstack baremetal node manage <node UUID>
  openstack baremetal node maintenance unset <node UUID>
  ```
  Now the node should be in a state where you can execute manual cleaning, as described in the Ironic documentation.
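  For reference, manual cleaning is driven with `openstack baremetal node clean`; the single clean step shown below (erasing disk metadata) is only an illustrative example and may not match what your nodes need:

  ```
  # Example only: run a metadata-erase clean step on the node, which must be
  # in the manageable state (as it is after the sequence above).
  openstack baremetal node clean <node UUID> \
      --clean-steps '[{"interface": "deploy", "step": "erase_devices_metadata"}]'
  ```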