
A modern cloud native (and self serve) way to manage Virtual Machines

Really?! Is there a cloud native way to deploy and lifecycle (LCM) VMs, and add self serve on top?

In this post I will describe the art of the possible using the tools listed below:

All of these projects can be run on Red Hat OpenShift (open source project: OKD), on another Kubernetes distribution, or on plain VMs (you pick your underlying infra). For this post I have used OpenShift, for simplicity of deployment and integrated tooling, and to focus narrowly on the use cases instead of on deploying the tools.


The main goal here is to easily deploy and lifecycle applications (and their dependencies) in VMs.

Here's the purpose of each tool:

  • RHDH as the self serve portal for operations teams (VM management)
  • RHDH as the self serve portal for application teams (releasing VM based applications and their dependencies)
  • OCP Virtualization and BootC to treat VMs as if they were containers, at least from a DevOps perspective, without compromising VM style operations and security (please don't quote me here as I am not the SME on this matter)
  • RHEL BootC is used for:
    • packaging applications with immutability
    • working really well with OCP Virtualization
    • significantly simplifying the LCM of the application (and the underlying dependencies and OS layer) without compromising security posture
  • GitOps to tie actions together with RHDH (eg: VM deployment, triggering AAP etc)



Now that the what and the why are out of the way, I will describe the how.

Demo Video:


How do we put an application in a RHEL BootC VM:


Step 1:

Just like we would package an application in a container, we will package it in a RHEL BootC container.

Below is the Dockerfile where I package a LAMP stack in a BootC container:


# Start from the RHEL 9 bootc (image mode) base image
FROM registry.redhat.io/rhel9/rhel-bootc:9.4

RUN mkdir -p /run/httpd /run/mariadb /var/lib/php/opcache

# Register the build so dnf can reach RHEL content
RUN subscription-manager register --activationkey=xxxx --org=xxxx
RUN subscription-manager status


# Install the LAMP stack
RUN dnf module enable -y php:8.2 nginx:1.22 && dnf install -y httpd mariadb mariadb-server php-fpm php-mysqlnd && dnf clean all

# cloud-init so OCP Virtualization can inject credentials/SSH keys at boot
RUN dnf -y install cloud-init && \
    ln -s ../cloud-init.target /usr/lib/systemd/system/default.target.wants && \
    dnf clean all


# Enable the services in the image; systemd starts them when the VM boots
RUN systemctl enable httpd mariadb php-fpm

RUN mkdir -p /etc/firewalld/services
RUN systemctl enable firewalld
RUN firewall-offline-cmd --add-service=http

# Templated landing page ({{ appconfig }} is substituted by the automation later)
RUN echo '<h1 style="text-align:center;">{{ appconfig }}</h1> <?php phpinfo(); ?>' >> /usr/local/index.php
RUN ln -s /usr/local/index.php /var/www/html/index.php

Note: We can also fit this into an existing SOI process. For example, a SOI engineer can create and release the base image from BootC, and an application engineer can then base their image off it.
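
To make that layering concrete, here is a minimal, hypothetical application Containerfile that bases off an SOI-released bootc golden image (the registry path, tag and packages below are placeholders, not from this post):

# Hypothetical application Containerfile layered on an SOI-released bootc base
FROM registry.example.com/soi/rhel9-bootc-base:2024.10

# Application-specific packages on top of the hardened base
RUN dnf install -y httpd php-fpm && dnf clean all
RUN systemctl enable httpd php-fpm

# Application content owned by the application team
COPY index.php /var/www/html/index.php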


Step 2:

This is our base container, or golden image, from which we can generate disk images for different hypervisors. For this post, because I am going to deploy it on OCP Virtualization, I am going to generate a qcow2 disk out of it.

podman run --privileged \
    -v /var/lib/containers/storage:/var/lib/containers/storage \
    -v .:/output \
    quay.io/centos-bootc/bootc-image-builder:latest \
    --type qcow2 \
    <the-container-image>

Step 3:

To make it a bootable container (VM bootable, that is) we will wrap the disk in another container based on the UBI image. Here's the Dockerfile for that:

# Stage the qcow2 disk with the ownership/permissions OCP Virtualization expects
FROM registry.redhat.io/{{ ubi }}/ubi:{{ ubiversion }} AS builder
ADD --chown=107:107 disk.qcow2 /disk/
RUN chmod 0440 /disk/*

# The final image contains only the disk under /disk/
FROM scratch
COPY --from=builder /disk/* /disk/


This is the container image we can use as a bootable container for OCP Virtualization. It will create a RHEL VM (just like any other RHEL VM), but from a container image. When I first read about it, it also sounded to me like "that's nuts! How the hell??", but really the credit here goes to OCP Virtualization and the UBI image.

At this point, if you feel like "I could have done that with any Linux distro, what's so special about bootc?", you would be right. The real benefit of RHEL BootC is lifecycling the container way. It will make sense in the "How do we LCM it" section.


How do we automate the process:

It is a three-part process. We will handle the first two parts here; the third part deserves its own section.

Part 1: Automating the application release process: 

We already know how to do it (as described in the "How do we put an application in a RHEL BootC VM" section). We can simply automate the steps in an AAP playbook, with a few additional steps to templatise it and upload the result to a container registry. Below is the playbook:


---
- name: Hardcoded BootC to UBI Workflow with Registry Login
  hosts: localhost
  connection: local
  gather_facts: false

  environment:
    # STORAGE_DRIVER: overlay
    STORAGE_OPTS: "ignore_chown_errors=true"

  tasks:

    # --- PHASE -1: CLEANUP ---
    - name: Reset Podman storage to fix driver mismatch
      ansible.builtin.command: podman system reset --force
      ignore_errors: true # In case it's already empty


    # --- PHASE 0: AUTHENTICATION ---
    - name: Login to Red Hat Registry
      ansible.builtin.command: >
        podman login registry.redhat.io 
        -u "xxx" 
        -p "xxxxxxxxxxxxxxx"

    - name: Login to Quay.io
      ansible.builtin.command: >
        podman login quay.io 
        -u "xxxxxxxx" 
        -p "xxxxxxxxxxx"

    # --- PHASE 1: SETUP WORKSPACE ---
    - name: Create build directories
      ansible.builtin.file:
        path: "{{ item }}"
        state: directory
      loop: ["./bootc", "./ubi9", "./output"]

    
    # --- PHASE 1.2: INLINE DOCKERFILES (Parameterized) ---
    - name: Create Inline BootC Dockerfile
      ansible.builtin.copy:
        dest: "./bootc/Dockerfile"
        content: |-
          FROM registry.redhat.io/rhel9/rhel-bootc:9.4
          
          RUN mkdir -p /run/httpd /run/mariadb /var/lib/php/opcache

          RUN subscription-manager register --activationkey=xxxxxxx --org=xxxxxxxxx
          RUN subscription-manager status

          
          RUN dnf module enable -y php:8.2 nginx:1.22 && dnf install -y httpd mariadb mariadb-server php-fpm php-mysqlnd && dnf clean all

          RUN dnf -y install cloud-init && \
              ln -s ../cloud-init.target /usr/lib/systemd/system/default.target.wants && \
              dnf clean all

          
          RUN systemctl enable httpd mariadb php-fpm

          RUN mkdir -p /etc/firewalld/services
          RUN systemctl enable firewalld
          RUN firewall-offline-cmd --add-service=http
          
          RUN echo '<h1 style="text-align:center;">{{ appconfig }}</h1> <?php phpinfo(); ?>' >> /usr/local/index.php
          RUN ln -s /usr/local/index.php /var/www/html/index.php

    - name: Create Inline UBI9 Wrapper Dockerfile
      ansible.builtin.copy:
        dest: "./ubi9/Dockerfile"
        content: |-
          FROM registry.redhat.io/{{ ubi }}/ubi:{{ ubiversion }} AS builder
          ADD --chown=107:107 disk.qcow2 /disk/ 
          RUN chmod 0440 /disk/*

          FROM scratch
          COPY --from=builder /disk/* /disk/

    
    
    # --- PHASE 2: BOOTC BUILD ---
    # We use --userns=host to fix the 'invalid argument' lchown error
    - name: 1. Build the BootC Image
      ansible.builtin.command: >
        podman build 
        --userns=host 
        --network=host
        -t localhost/rhel9-bootc-lamp:latest 
        ./bootc/
      
    # --- PHASE 3: QCOW2 EXTRACTION ---
    - name: Run BootC Image Builder (Privileged)
      vars:
        # The actual current working directory at runtime
        current_dir: "{{ lookup('pipe', 'pwd') }}"
      ansible.builtin.command: >
        podman run --rm --privileged
        --network host
        -v /var/lib/containers/storage:/var/lib/containers/storage:Z
        -v {{ current_dir }}/output:/output:Z
        registry.redhat.io/rhel9/bootc-image-builder:latest
        --type qcow2
        --local
        localhost/rhel9-bootc-lamp:latest


    # --- PHASE 4: PREPARE UBI BUILD CONTEXT ---
    - name: Prepare UBI Build Context
      ansible.builtin.copy:
        src: "./output/qcow2/disk.qcow2"
        dest: "./ubi9/disk.qcow2"
        remote_src: true

    # --- PHASE 5: UBI WRAPPER BUILD ---
    - name: 3. Build Final UBI9 Wrapper Image
      ansible.builtin.command: >
        podman build 
        --network=host
        --userns=host 
        -t quay.io/coolproject/my-rh9-bootc:{{ imageversion }}
        ./ubi9/

    # --- PHASE 6: PUSH ---
    - name: Push Golden Image to Registry
      ansible.builtin.command: >
        podman push quay.io/coolproject/my-rh9-bootc:{{ imageversion }}

Notice the templatisation, and ignore the redacted usernames and passwords (in a real setup these should come from a credential store rather than being hardcoded).

A good question here is "why templatise?". Well, first, it is good practice, and second, we will leverage the template to add "Self Serve" on top of it.
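
For example, the self serve flow can pass the templated values to the playbook as extra vars when the AAP job is launched. A minimal sketch (the variable names match the playbook above; the values are purely illustrative):

# Illustrative extra_vars for the AAP job template
appconfig: "Hello from the self serve portal"
ubi: ubi9
ubiversion: "9.4"
imageversion: "v1.0.1"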

Here's a screenshot of the container registry hosting the built golden image:

Part 2: Automating the VM deployment process:

OK, in this example we are not automating it; we are "self serve"-ing it. But the process remains the same, and it can easily be automated in the same way.

Note: I have used AAP to automate things here, but the good thing about this approach is that we can use literally any CI tool (eg: Tekton, GitHub Actions, Azure DevOps etc) to achieve the same.
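
As an illustration only, a minimal GitHub Actions workflow covering the build-and-push portion of the same flow might look like this (the secrets and tag are placeholders; the qcow2/UBI wrapping steps would follow the same pattern on a runner that allows privileged podman):

# Hypothetical GitHub Actions sketch of the image build and push
name: build-bootc-golden-image
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Log in to the registry
        run: podman login quay.io -u "${{ secrets.QUAY_USER }}" -p "${{ secrets.QUAY_TOKEN }}"
      - name: Build the BootC image
        run: podman build -t quay.io/coolproject/my-rh9-bootc:${{ github.sha }} ./bootc/
      - name: Push the image
        run: podman push quay.io/coolproject/my-rh9-bootc:${{ github.sha }}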


How do we add Self Serve to it:

We are using RHDH as the self serve portal. We need a few plugins for the RHDH templates, which is more a topic of "deploying and enabling RHDH on OpenShift". So, rather than describing it in this post, here is the link to the GitHub repo where the relevant YAMLs exist.

Once RHDH has the relevant plugins enabled, we will implement two templates for self serving "the release" and "the deployment" of the applications/VMs. Of course, the CI and CD could be triggered through GitOps. For this example, let's make it self serve on demand (rather than auto triggered on git commit; this example has enough detail to implement the auto trigger using tools like Tekton, GitHub Actions, Azure DevOps etc).
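
To give a feel for what such a template looks like, here is a trimmed, hypothetical RHDH (Backstage) scaffolder template for the "release" flow; the parameter names are illustrative and the repo URL is a placeholder, not the one used in this post:

apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: bootc-golden-image-release
  title: Release a BootC golden image
spec:
  parameters:
    - title: Image details
      properties:
        appconfig:
          type: string
          description: Text baked into the packaged application
        imageversion:
          type: string
          description: Tag for the golden image
  steps:
    - id: render
      name: Render manifests
      action: fetch:template
      input:
        url: ./skeleton
        values:
          appconfig: ${{ parameters.appconfig }}
          imageversion: ${{ parameters.imageversion }}
    - id: pr
      name: Open pull request
      action: publish:github:pull-request
      input:
        repoUrl: github.com?owner=example&repo=vm-gitops
        branchName: release-${{ parameters.imageversion }}
        title: Release golden image ${{ parameters.imageversion }}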

Here are a few screenshots from my RHDH for this:




Template # Self Serve the "release" of the package:

This GitHub repo contains the YAMLs for the template.



The thing to highlight in this template is that we have templated an "app config" block through which we can change the configuration of the application being packaged. Imagine a similar concept to change what we install as dependencies or other apps on the VM, or having different templates for different categories of golden images for different VMs (really, this is limited only by imagination).


Template # Self Serve deploy VMs from the golden image:

Once golden images for the VMs are created (as described above), we have the image ready to be deployed; in this case the bootable container is sitting in our container registry. All we need to do is create a deployment definition for an OpenShift Virtualization VirtualMachine CRD. Of course we can easily templatise this using RHDH. Through this template we can control which golden image to deploy, the VM configuration (eg: CPU, memory, storage etc) and anything else that we (as the ops team) want to hand over to the self serving users (the app team or any other downstream teams).
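
As a rough sketch of how those knobs could map into the templated VirtualMachine skeleton (the parameter names below are illustrative, not taken from the repo):

# Hypothetical fragment of the templated VirtualMachine skeleton
spec:
  dataVolumeTemplates:
    - spec:
        source:
          registry:
            url: "docker://${{ values.golden_image }}"   # which golden image to deploy
        storage:
          resources:
            requests:
              storage: ${{ values.disk_size }}
  template:
    spec:
      domain:
        cpu:
          cores: ${{ values.cpu_cores }}
        memory:
          guest: ${{ values.memory }}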

This GitHub repo contains the YAMLs for the template.

Here's the screenshot of the template in action:



Here's the VirtualMachine definition that gets generated:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: bootc-hello-world
  namespace: myvms1
  finalizers:
    - kubevirt.io/virtualMachineControllerFinalize
spec:
  dataVolumeTemplates:
    - metadata:
        name: bootc-hello-world
      spec:
        source:
          registry:
            url: "docker://quay.io/alitestseverything/my-rh9-bootc:hw1"
            secretRef: quay-registry-secret 
        storage:
          resources:
            requests:
              storage: 15Gi
  running: true
  template:
    metadata:
      annotations:
        vm.kubevirt.io/flavor: medium
        vm.kubevirt.io/os: rhel.9
        vm.kubevirt.io/workload: server
      creationTimestamp: null
      labels:
        network.kubevirt.io/headlessService: headless
        kubevirt.io/domain: bootc-hello-world
        kubevirt.io/size: medium
    spec:
      accessCredentials:
        - sshPublicKey:
            propagationMethod:
              noCloud: {}
            source:
              secret:
                secretName: common-vm-ssh-key
      architecture: amd64
      domain:
        cpu:
          cores: 2
          sockets: 1
          threads: 1
        memory:
          guest: 8Gi
        devices:
          disks:
            - disk:
                bus: virtio
              name: rootdisk
            - disk:
                bus: virtio
              name: cloudinitdisk
          interfaces:
            - masquerade: {}
              model: virtio
              name: default
        features:
          acpi: {}
          smm:
            enabled: true
        firmware:
          bootloader:
            efi: {}
        machine:
          type: q35
        resources: {}
      networks:
        - name: default
          pod: {}
      terminationGracePeriodSeconds: 180
      volumes:
        - dataVolume:
            name: bootc-hello-world
          name: rootdisk
        - cloudInitNoCloud:
            userData: |
              #cloud-config
              chpasswd:
                expire: false
              password: xxxx
              user: xxxx
          name: cloudinitdisk


Here's the screenshot of VM deployed:



Right now, the way the self serve pieces (eg: triggering the golden image build process, deploying the VM from a bootable container etc) work together is via GitOps. The RHDH action creates a pull request with the relevant K8s objects, such as:

  • Kind: AnsibleJob to trigger the Ansible job template that creates the golden image (see the sketch after this list). Enhancement note: I think it is possible to create a RHDH/Backstage custom action to deploy the object directly in OpenShift, bypassing the GitOps process in the middle, because these AnsibleJob definitions are basically throwaway objects that are not needed after the Ansible job is triggered.
  • Kind: VirtualMachine to create VMs in the OpenShift cluster. GitOps should remain here, as this is also IaC (and K8s native).
  • etc
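
For reference, here is a hedged sketch of what such an AnsibleJob object could look like (assuming the AAP resource operator is installed; the namespace, secret, job template name and extra vars are placeholders):

# Hypothetical AnsibleJob that triggers the golden image job template in AAP
apiVersion: tower.ansible.com/v1alpha1
kind: AnsibleJob
metadata:
  generateName: build-golden-image-
  namespace: aap
spec:
  tower_auth_secret: aap-access-token          # secret holding the AAP host and token
  job_template_name: bootc-golden-image-build  # job template created from the playbook above
  extra_vars:
    appconfig: "Hello from the self serve portal"
    imageversion: "v1.0.1"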

Upon approval of the pull request, GitOps (Argo CD) deploys the artifacts in the cluster, and the actual objects (VMs, jobs etc) are handled by K8s itself as the orchestrator.
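
A minimal sketch of the Argo CD Application that watches the repository path where those pull requests land (the repo URL, path and namespaces are placeholders):

# Hypothetical Argo CD Application syncing the merged VM/job manifests
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: vm-fleet
  namespace: openshift-gitops
spec:
  project: default
  source:
    repoURL: https://github.com/example/vm-gitops.git
    targetRevision: main
    path: vms
  destination:
    server: https://kubernetes.default.svc
    namespace: myvms1
  syncPolicy:
    automated:
      prune: true
      selfHeal: true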


How do we LCM it:

OK, now that we have created a very cool automated process to release golden images and deploy them as VMs, let's focus on the LCM of the VM or package. This is where the power of BootC comes in handy. To describe it (to the best of my ability without confusing people), it is a two-part process.

The process is:

Part 1: Create the bootc container:

This is the release of a new version of the package / golden image. The new version could be for a new release of the application, or an upgrade of the OS or dependency tools. Regardless, we need to create the bootc container with the updated contents. This is described in Step 1 of the "How do we put an application in a RHEL BootC VM" section (we don't need Step 2 or Step 3 to create the release).

Part 2: Upgrade or deploy with the new release:

We have 2 choices here. 

Choice 1: We can perform an upgrade inside the existing VM. This is very cool, because it works slightly differently than a traditional VM upgrade. When the upgrade command is executed in the VM, it downloads the updated layers of the container image and applies them. This process is faster and simpler, and when built into operations the right way it can save a lot of time.

Here are the commands to perform a bootc upgrade in the VM:

# Download the updated image layers
sudo bootc upgrade

# Reboot to apply the upgrade
sudo reboot

More details here and here.

We coded another AAP playbook to automate this manual upgrade process, which could be linked to RHDH for a self serve upgrade trigger.
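
That playbook is not reproduced here, but a minimal sketch of the idea (the host group and timeout are assumptions) could look like:

---
# Hypothetical playbook automating the in-place bootc upgrade across a group of VMs
- name: Upgrade bootc based VMs to the latest golden image
  hosts: bootc_vms
  become: true

  tasks:
    - name: Stage the new image layers
      ansible.builtin.command: bootc upgrade

    - name: Reboot to switch to the new deployment
      ansible.builtin.reboot:
        reboot_timeout: 600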

Note: the upgrade could be tied to a new release of the app as well (because it's not a traditional VM image; everything is container layers).



Choice 2: Perform Steps 1, 2 and 3 from the "How do we put an application in a RHEL BootC VM" section and, rather than upgrading in place, create and deploy a new bootable container image (just like a container).


Conclusion:

Hopefully, I was able to articulate it. The question is: can an organisation simplify the management of its VM fleet this way? I would imagine the answer is not that simple, but it is a good start.




