Designing Red Hat OpenShift with Hosted Control Planes
One of the most requested features in Red Hat OpenShift (OCP) recently has been Hosted Control Planes (HCP). Based on the upstream HyperShift project, Red Hat released general availability support for HCP in OpenShift version 4.16. Since then, newer versions of HCP have added capabilities that support a broad range of ways to deploy and manage OpenShift clusters, including autoscaling clusters to meet demand.
Hosted Control Planes Explained
Before we jump into an actual deployment of HCP running within the Global Service Provider Experience Center (GSPx) of the Advanced Technology Center (ATC), let's talk through some of the benefits, drawbacks and architecture behind the solution.
Architecture
Hosted, hosting, management and hub. What does all of this mean? And what are hosted control planes in the first place? Let's start from the beginning to simplify the concepts and design decisions behind the technology.
In a standalone cluster deployment of Red Hat OpenShift Container Platform (OCP), both the control plane and worker node functions reside within the same cluster. If there are only three physical servers available for the cluster, a typical design will have all 3 nodes serving both control plane and worker node functions in order to provide high availability (HA).
Red Hat's open design architecture doesn't end there though. In some cases, the control plane is decoupled from the 3-node hardware and built as virtual machines on another platform. This can ease the load on the physical servers and reduce the overall pod count, but now there are two solutions to maintain instead of just the one. Scale this up to more than one cluster and now you have multiple VMs running on a third-party hypervisor for control plane functions in addition to the physical nodes for each cluster build.
This is where Hosted Control Planes come in. Under HCP, an Administrator can design the control planes for new clusters in a multi-site solution to deploy as pods in a centralized management cluster. The management cluster is often called a hub cluster as well, due to its hub-and-spoke design. For simplicity, we'll use the terms management cluster and hosted cluster, where the hosted cluster is the cluster that has its control plane running on the management cluster. Here's an example of what a multi-site deployment looks like at a high level.
When HCP is used to build new clusters, the management cluster takes on the control plane services for the API, etcd, and other control plane components by deploying pods in a separate project. Generally, if you deploy a cluster called 'clustera', the project containing its control plane will be called 'clustera-clustera'. All API calls to the hosted cluster now run through the management cluster, so if the management cluster is offline, new API calls cannot be made to the hosted cluster, but any workloads running on the hosted cluster will continue to work. This is similar to other hypervisor platforms that use a centralized management server for maintaining clusters.
Another option besides HCP running pods on the management cluster is to take advantage of OpenShift Virtualization (OCPv). By using OpenShift Virtualization, we can move the control plane to VMs running inside the management cluster as another design choice. This provides further isolation for multi-tenancy since the virtual machines would operate as dedicated control plane nodes as opposed to pods running on the management cluster instance of OpenShift.
Benefits
- Cost Reduction
- Reduce physical server costs by eliminating dedicated control plane nodes for each managed cluster, and avoid combining control plane and worker functions on the same hosts, which can also increase host count requirements.
- Infrastructure Automation
- Deploy more efficiently and at-scale using the Advanced Cluster Management (ACM) user interface in OpenShift.
- Not all clusters in ACM need to run Hosted Control Planes. If some sites require standalone clusters due to latency, security, tenant isolation, etc., an Administrator can build and manage both types from ACM.
- Simplified Management
- Manage clusters across both on-premises and cloud environments from a single pane of glass with ACM. Red Hat OpenShift on AWS (ROSA) also provides Hosted Control Plane support.
- Support multiple versions of OpenShift to manage compatibility for applications and the Ecosystem Software Catalog. In ACM, the general rule on release support is the current version and two versions back. So, with 4.21, you can deploy new versions of 4.19 and 4.20.
Drawbacks
Management Cluster Dependency
Let's address the largest concern: putting all of your control planes on one cluster. The most common question is what happens if the management cluster goes offline. As briefly touched on earlier, workloads on the hosted clusters will continue to function. However, there are other considerations around recovery, and around how the management cluster got into that state. Here are some design questions to think about.
- What's the recovery point objective (RPO) and recovery time objective (RTO) of the site?
- If the RPO/RTO demands high availability of the management components, is site replication using asymmetric or symmetric replication being considered?
Three-node Kubernetes clusters are designed to tolerate a single node failure. This means that an Administrator can recover a failed node while the environment continues to function and provide workload access. A multi-node or datacenter failure would warrant designing for datacenter replication, but not everyone requires that to meet their Service Level Agreement (SLA).
If you're designing for replication, consider the increased cost for more infrastructure components whether the DR environment is in the same Datacenter vs. a remote Datacenter. Perhaps a standalone cluster is a better option, or just having a disaster recovery plan in place.
Size Calculation
For each hosted cluster, the hub cluster adds approximately 78 pods, 5 vCPUs, and 18 GiB of memory to run the control plane functions for that hosted cluster. Given Red Hat's current recommendation of a 500 pods per node limit, we need to do a bit of math and include high availability in the mix, in case a node in the hub cluster goes down.
Example: 3-node Hub Cluster (Modified to 500 Pods per node)
- After installing OpenShift 4.21 and operators from the ecosystem software catalog, there are 414 pods in use. 94 of those pods were for OpenShift Data Foundation (ODF), so we can safely assume it takes ~320 pods for an average deployment.
- Operators deployed include NMState, Node Feature Discovery, Advanced Cluster Management, Multi-Cluster Engine, OpenShift Virtualization (For Fleet Virtualization), Cluster Observability and Local Storage. Your deployment may differ, which means it could be more or less pods running. For example, deploying applications, GitOps or CSI operators from the Ecosystem Software Catalog will all consume more resources and require additional pods.
- On to the equation. Assuming one node goes down, the maximum number of pods that can run in a 3-node cluster is 1,000. This doesn't take into account any node-specific pods that may not restart on the healthy nodes during an HA event, but it gives a good estimate for design planning.
1,000-320 = 680 pods
680 / 78 pods = ~8.7 hosted clusters
3-Node Cluster = local-cluster + 8 hosted clusters
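The arithmetic above can be wrapped in a short shell sketch for capacity planning. The variable names are illustrative, and the figures are the assumptions from this article (~78 control plane pods per hosted cluster, ~320 base pods, 500 pods per node):

```shell
# Estimate hosted cluster capacity for a hub, planning for one node
# failure. Adjust the figures below for your own environment.
NODES=3
PODS_PER_NODE=500
BASE_PODS=320        # pods the hub itself consumes (operators, ODF, etc.)
PODS_PER_HCP=78      # control plane pods added per hosted cluster

usable=$(( (NODES - 1) * PODS_PER_NODE ))   # capacity with one node down
headroom=$(( usable - BASE_PODS ))
max_hcp=$(( headroom / PODS_PER_HCP ))

echo "Usable pods (N-1 nodes): ${usable}"
echo "Headroom after base load: ${headroom}"
echo "Max hosted clusters: ${max_hcp}"
```

Plugging in other operator footprints or node counts gives a quick first-pass answer before committing hardware.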
Additional HCP Sizing Considerations
- Assign the label 'hypershift.openshift.io/control-plane: true' to dedicate one or more nodes to running HCP workloads. This is useful for clusters running a lot of pods for other services, where HCP services need to be kept separate.
- Assign the label 'hypershift.openshift.io/cluster: ${HostedControlPlane Namespace}' to dedicate node(s) to a single hosted cluster. One use case for this label is dedicating one or more nodes in a multi-hosted cluster environment to separate the HCP services of hosted cluster A from those of hosted cluster B.
- Infrastructure nodes can also be used for deploying hosted control plane services, so the nodes utilized don't count towards the Red Hat OpenShift subscription. Add the label 'node-role.kubernetes.io/infra' to nodes that will operate as Infrastructure nodes.
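As a sketch, the first and third labels above would appear on a node object like this. The node name is illustrative, and in practice the labels would normally be applied with oc label rather than by editing the Node resource:

```yaml
# Illustrative Node metadata fragment showing HCP-related labels
apiVersion: v1
kind: Node
metadata:
  name: worker-1        # placeholder node name
  labels:
    hypershift.openshift.io/control-plane: "true"
    node-role.kubernetes.io/infra: ""
```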
Getting Started
Below is a summary of steps to deploy a hosted cluster within a hub cluster environment using bare-metal hosts. For this environment, we used Dell PowerEdge servers for the hosted cluster, which also have GPUs installed. Other compute platforms may vary in their configuration steps for BMC connectivity, such as HPE iLO versus Dell iDRAC.
High-Level Requirements
- 500 pods per node limit set
- OpenShift 4.16+ installed (GSPx lab used is running 4.21)
- Multi-Cluster Engine Operator
- Advanced Cluster Management Operator (optional)
- This operator also deploys the multi-cluster engine for Kubernetes operator
- At least (1) bare-metal node available for the hosted cluster deployment
- Create the hosted control plane zones for even distribution of HCP components
Modifying the Pod Maximum
When first setting up an OpenShift cluster to run hosted control planes, the nodes running the pods need to be reconfigured from the default maximum of 250 pods per node. Red Hat provides a simple YAML file in their documentation to modify the nodes, as shown below. In this example, if the hub cluster nodes have more than 50 cores, the maxPods count will be the limiting factor rather than podsPerCore. If more pods are needed in the hub cluster, do not raise the pod count beyond the tested maximum; add more nodes to the cluster instead.
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""
  kubeletConfig:
    podsPerCore: 10
    maxPods: 500
- Create a YAML file with the code above and apply it by running the command oc create -f <filename>.yaml.
- Monitor the progress of the node change by running oc get mcp. While the change is rolling out to the machine config pool, the UPDATING status will show 'True'. This process will take some time to complete, including a reboot of the nodes. Additional monitoring through oc get nodes will show which node the change is currently being applied to.
- Once the oc get mcp command shows UPDATING as 'False' and the ReadyMachineCount is correct, one final check can be done by running oc get node <nodename> -o jsonpath='{.status.capacity.pods}', which should return the value of 500.
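The interaction between podsPerCore and maxPods described above can be sanity-checked with a quick shell calculation. The kubelet enforces both limits, so the effective per-node limit is the lower of the two; the core counts below are illustrative:

```shell
# Effective pod limit is the lower of podsPerCore * cores and maxPods.
# With podsPerCore=10 and maxPods=500, maxPods becomes the limiting
# factor once a node has more than 50 cores.
PODS_PER_CORE=10
MAX_PODS=500
for cores in 32 50 64; do
  by_core=$(( PODS_PER_CORE * cores ))
  limit=$(( by_core < MAX_PODS ? by_core : MAX_PODS ))
  echo "cores=${cores} effective_limit=${limit}"
done
```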
Creating Zones for HCP
A zone label configured on the management nodes is necessary in a hosted control plane environment so that control plane pods are spread across the multiple nodes of the cluster. Without it, when a new hosted cluster is deployed, all of its pods would run on a single host.
So, for a 3-node management cluster, the following examples would provide the unique zone requirements for the pods to be spread across the nodes. Replace the node name 'ocphubX' with the node name in the cluster.
oc label node/ocphub01 topology.kubernetes.io/zone=zone1
oc label node/ocphub02 topology.kubernetes.io/zone=zone2
oc label node/ocphub03 topology.kubernetes.io/zone=zone3
Adding the Host Inventory
Once the pod maximum has been changed, the next procedure takes place in the Advanced Cluster Management interface.
- From the Fleet Management view of the Red Hat OpenShift console, expand Infrastructure > Host Inventory. If this is the first-time setup, click the Configure host inventory settings link and modify the storage size for the image repository and database. This will take some time to complete, but once finished an Administrator can create a new infrastructure environment.
- Click the Create Infrastructure Environment button.
- The wizard will appear, asking a series of questions for the following:
- Environment name
- Network type
- CPU Architecture
- Location
- Infrastructure Provider Credentials
- Stores the pull secret and public key for quick access
- Pull Secret (Gathered from console.redhat.com)
- SSH Public Key (Key used to access nodes after deployment)
- After completion of the wizard, the host inventory details appear.
- Click the Add Hosts button. The context menu appears with options for adding hosts. In this example, we chose With BMC form.
- The BMC Form appears, prompting for the following:
- Name
- Hostname
- BMC Address
- Boot NIC MAC Address
- The first MAC to boot
- BMC Username
- BMC Password
- NMState Config (YAML)
- MAC to NIC mapping for RHCOS
- For a Dell PowerEdge host, specifying the iDRAC virtual media path will configure the iDRAC to use a remote ISO during the host standup process. The default path is
idrac-virtualmedia://<iDRACIP>/redfish/v1/Systems/System.Embedded.1
- For a static IP network configuration, the NMState config will need to define each Ethernet port being configured as well as the IP address. The NMState example below creates an LACP-bonded network on a specific VLAN for the host. (Note: additional bonds can be added later via the OpenShift console.)
interfaces:
  - name: eno1np0
    type: ethernet
    state: up
  - name: eno2np1
    type: ethernet
    state: up
  - name: bond0
    type: bond
    state: up
    link-aggregation:
      mode: 802.3ad
      options:
        miimon: '100'
      port:
        - eno1np0
        - eno2np1
  - name: bond0.100
    type: vlan
    state: up
    ipv4:
      address:
        - ip: 10.20.20.101
          prefix-length: 25
      enabled: true
    vlan:
      base-iface: bond0
      id: 100
dns-resolver:
  config:
    server:
      - <DNS1 IP>
      - <DNS2 IP>
routes:
  config:
    - destination: 0.0.0.0/0
      next-hop-address: 10.20.20.1
      next-hop-interface: bond0.100
- After clicking Create in the BMC form, click the Hosts tab in the Host Inventory window. This shows the current status of the host, which should change from Registering to Provisioning, and finally to Available. During the process, if the iDRAC console is open, you can monitor the virtual console to see how far along the host is in the provisioning process.
- Once the host is available, click the arrow next to the hostname to view the hardware details of the host including CPU, memory, disks, network cards, GPUs, etc.
- Repeat the add hosts process to add any additional hosts that will be part of this host inventory collection.
Creating the DNS Records
Just like the management cluster, a hosted cluster needs similar DNS records created before provisioning the nodes. A minimum of two IP addresses is needed, covering the API server and the ingress for applications running on the cluster (i.e. the OpenShift console). Each A record created should have a reverse PTR record associated with it.
- Base DNS Domain: <example.com>
- Cluster Domain: <clusterA.example.com> Also referred to as a subdomain
- Apps Subdomain in Cluster Domain: <apps.clusterA.example.com>
- Internal API A Record: <api-int.clusterA.example.com> Use first IP address
- External API A Record: <api.clusterA.example.com> Use first IP address
- Apps Ingress: <*.apps.clusterA.example.com> Use second IP address for wildcard A record
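As a sketch, the records above might look like this in a BIND-style zone file. The IP addresses are illustrative placeholders, chosen here to line up with the MetalLB address pools used later in this walkthrough (first address for the API records, second for the apps wildcard):

```
; Illustrative zone fragment for clusterA.example.com
api-int.clusterA.example.com.   IN  A  10.10.10.70
api.clusterA.example.com.       IN  A  10.10.10.70
*.apps.clusterA.example.com.    IN  A  10.10.10.75
```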
Deploy MetalLB on the Management Cluster
A hosted cluster defaults to using NodePort for ingress to its API service. Instead of relying on a single node, the MetalLB operator will be deployed to provide ingress load balancing, giving high availability and resiliency when accessing the hosted cluster environment once it's deployed.
- From the OpenShift console, switch to the Core Platform view to manage the local-cluster instance of the management cluster. Click Ecosystem > Software Catalog in the tree pane followed by typing metal in the search field. Click the tile for the MetalLB operator.
- Click Install to add the subscription to the cluster.
- Review the namespace path as metallb-system and any other details required. Click Install to complete the operator installation.
- Once installation has completed, click View Operator.
- Details of the operator are shown, and the option to configure the MetalLB operand is provided in the UI. Locate the MetalLB tab and then click Create MetalLB.
- Default settings work for the operand, so just click Create.
- Once installed, the MetalLB service will show as Available.
- Next, create the IP Address Pool followed by the Layer 2 Advertisement of the new IP pool. Here are sample YAML files that can be applied via the UI under the related tab in the operator, or via the 'oc' command line.
Example IP Pool
IP addresses can be either IPv4 or IPv6. A range can be specified, or a full subnet like 10.30.30.0/24, and IPs can also be excluded from ranges and subnets. With autoAssign set to true, the first available IP is chosen for the API address, so make sure the first IP in the range matches the DNS A record for api.clusterA.example.com.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: hcp-ip-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.10.10.70-10.10.10.72
  autoAssign: true
  avoidBuggyIPs: false
Example Layer 2 Advertisement
The layer 2 advertisement can be applied across all interfaces by omitting the interfaces section under spec, if required.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: hcp-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - hcp-ip-pool
  interfaces:
    - br-ex
    - bond0.100
Deploy the Hosted Cluster
- From the OpenShift console, click on Core Platform and change the view to Fleet Management.
- In Fleet Management, expand Infrastructure > Clusters. Click Create Cluster.
- Click Host Inventory.
- Click Hosted. Note: Administrators can also deploy a full standalone cluster here instead of using Hosted Control Planes.
- The Create Cluster wizard appears. Select the credentials that were created previously, then input the cluster name <clusterA> and the base domain <example.com>, and select the OpenShift version to deploy. The pull secret should be auto-populated from the saved credentials. Click Next.
- In Node Pools, select the node pool created previously from the host inventory. Here autoscaling can be enabled so that additional hosts are added to the cluster as needed by defining the minimum and maximum host count. If the cluster is being built from multiple node pools, add additional pools as needed, otherwise click Next.
- In Networking, select LoadBalancer and click Next. This enables the use of MetalLB for the API address.
- Lastly, review the details and click Create. The wizard closes and the new cluster installation status appears. This will take some time to complete, as the bare metal hosts need to power on and boot the remote ISO image from the management cluster to begin the installation. If you're eager to see what's happening beyond the cluster status page, the Core Platform view will also show details such as pods starting under the project <clusterA-clusterA>. Under Compute > Bare Metal Hosts, switch to the project matching the host inventory name to see the current provisioning state.
- When the installation is complete, not all of the conditions will show green. That's because access to the console UI still needs to be established. To do this, we need to install the MetalLB operator again, but this time on the hosted cluster instead of the management cluster.
- To begin administering the new cluster, download the kubeconfig file and copy the contents to ~/.kube/config on a Linux-based jump host that can connect to the environment. Run the command oc get nodes to verify that the API service is running. The kubeadmin account for the cluster is also displayed towards the bottom of the cluster inventory page.
Deploy MetalLB on the Hosted Cluster
As mentioned earlier, the MetalLB operator on the management cluster handles API ingress for all of the hosted clusters connected to it. However, the console pods run on the hosted cluster itself. Here are the steps to deploy MetalLB on the hosted cluster, which has to be done via the CLI, since the console is not available yet.
- Using the kubeconfig for the hosted cluster, create each of the following files and run the command oc create -f <filename>.yaml for each of them in order.
MetalLB Subscription Example
This YAML file creates the namespace, operator group, and subscription needed to install the operator.
apiVersion: v1
kind: Namespace
metadata:
  name: metallb-system
  labels:
    openshift.io/cluster-monitoring: "true"
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: metallb-operator-operatorgroup
  namespace: metallb-system
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: metallb-operator
  namespace: metallb-system
spec:
  channel: "stable"
  name: metallb-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
MetalLB Installation Example
Once the subscription is active, MetalLB can be installed using the YAML below.
apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
To verify everything is created, run oc get metallb -n metallb-system. The deployment should appear in the list.
MetalLB IP Pool Example
For console ingress services, a single IP should be used, which needs to match the IP address created for the wildcard DNS A record <*.apps.clusterA.example.com> that was created previously.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: hcp-ip-pool
  namespace: metallb-system
spec:
  autoAssign: false
  addresses:
    - 10.10.10.75-10.10.10.75
Verification of the IP pool can be done by running oc get ipaddresspool -n metallb-system. The output should look similar to this:
NAME AUTO ASSIGN AVOID BUGGY IPS ADDRESSES
hcp-ip-pool false false ["10.10.10.75-10.10.10.75"]
MetalLB Layer 2 Advertisement Example
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: hcp-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - hcp-ip-pool
  interfaces:
    - br-ex
    - bond0.100
Verification of the layer 2 advertisement can be done by running oc get l2advertisement -n metallb-system. The output should look similar to this:
NAME IPADDRESSPOOLS IPADDRESSPOOL SELECTORS INTERFACES
hcp-l2 ["hcp-ip-pool"] ["br-ex","bond0.100"]
MetalLB Service Example
The last step is to create the service file to map the IP address to the OpenShift console.
kind: Service
apiVersion: v1
metadata:
  annotations:
    metallb.io/address-pool: hcp-ip-pool
  name: metallb-ingress
  namespace: openshift-ingress
spec:
  type: LoadBalancer
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
    - name: https
      protocol: TCP
      port: 443
      targetPort: 443
  selector:
    ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
Verification of the service can be completed by running oc get svc -n openshift-ingress. The output should look similar to this:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
metallb-ingress LoadBalancer 172.31.73.52 10.10.10.75 80:32363/TCP,443:30315/TCP 106s
router-internal-default   ClusterIP      172.31.110.78   <none>        80/TCP,443/TCP,1936/TCP      157m
After all the files are applied to the cluster, the console should be accessible. Test connectivity using the URL from the Cluster Inventory page in ACM, which will look similar to https://console-openshift-console.apps.clusterA.example.com. The OCP login page should appear, allowing login with the kubeadmin account from the cluster inventory screen.
Next steps
That wraps up a hosted cluster deployment. There's a lot more to cover here on setting up LDAP, installing more operators, exploring cluster pools for on-demand scalable solutions, and importing existing clusters to the Advanced Cluster Management (ACM) interface. For details on this and what other solutions we're showcasing in the GSP Experience Center, contact your account team and ask for a demo!