
Managing devices using Edge Manager

Managing edge devices has long been a complex process: traditional IT ops tools fall short in distributed, low-connectivity environments when managing huge quantities of devices. Red Hat Edge Manager (based on the open source Flight Control project, GA'd by Red Hat in late January 2026) solves these challenges by providing streamlined management of edge devices and applications through a declarative approach. Now, there's a fair bit to unpack here, but for simplicity this is how I am going to map those three things:

  • Management of edge devices: I am mapping this to LCM (including upgrade, patch, etc.) of the underlying OS (in this case RHEL of the bootc flavour, or at least UBI-based RHEL).
  • Managing applications: mapping this to deploying applications and LCM of the application stack on the OS.
  • Declarative approach: this one is super interesting. To me this is very K8s-y, but in the world of edge devices running Linux (RHEL, as of today).
And then this thing also has MCP: this is my next probe. It is still fairly new, but from what I have found so far, it certainly packs a lot of potential. Keep an eye here for my take on this (coming soon...).

The Setup:

Before we get onto its use cases (the three points mentioned above), let's quickly look at "the setup".

I have an OpenShift Hub cluster (OCP 4.20.15) running RHACM 2.15, and I manage my edge servers (SNOs) from this RHACM. I deployed Red Hat Edge Manager on this Hub cluster and exposed its UI, integrated into the Fleet Management UI that comes with RHACM.

Note: The RHEM (Red Hat Edge Manager) helm chart is only available from OCP 4.19 onwards. It also requires RHACM 2.15 for UI integration.

Below is a diagram to visualise it.


Demo

RHEM (+bootc) or AAP (+bootc)

To be honest, I am not sure if there's a generic "one over the other" answer here. It may depend on factors like operational process, team size, team skills, number of devices, upgrade cycle, release processes, etc. Here's a diagram to visualise the differences:


In my opinion:
  • RHEM + bootc is more like continuous delivery, whereas AAP + bootc is more like releases in cycles.
  • RHEM + bootc introduces separation of concerns by design, where OS releases, config changes and application releases are independent, whereas AAP + bootc needs to package applications, configurations, etc. into the bootc container image and release them as part of the OS release.
  • Continuous config drift management is baked in by design in RHEM, whereas with AAP it needs to be implemented. That said, bootc makes this technical difference much less impactful.
  • When bootc is used, the difference becomes more of an architectural principle (of course technical differences exist, but both may perform the same). This is because bootc packages everything anyway, so separation of concerns becomes a design choice rather than a technical limitation.

Ok, with that out of the way, let's get into the technical bits and bobs.

RHEM Setup:

Deploying RHEM onto the Hub cluster and exposing its UI in the Fleet Management interface was easy enough. The official documentation is pretty good (for the most part; I will list a few gotchas in this post). I followed it to deploy RHEM 1.0 on my OCP 4.20.

RHEM deployment on OCP:


Step 1: 
Create the namespace and extract the cert to be used.

kubectl create ns redhat-rhem

kubectl get configmap default-ingress-cert \
    -n openshift-config-managed \
    -o jsonpath='{.data.ca-bundle\.crt}' > /tmp/ingress-ca.crt


Step 2:
Select the project and install RHEM by browsing Ecosystem > Software Catalog. Select the right offering (at the time of this post, only the helm chart was available). Just before you deploy, go to "YAML view" and edit the YAML to add the ingress-ca.crt extracted previously. Finally, click Deploy. You can check the pods coming up in the created namespace (eg: redhat-rhem). Here's mine:


## most of it is default


global:
  auth:
    k8s:
      apiUrl: 'https://kubernetes.default.svc'
      createAdminUser: true
    oidc:
      organizationAssignment:
        organizationName: default
        type: static
      usernameClaim:
        - preferred_username
      roleAssignment:
        claimPath:
          - groups
        type: dynamic
      clientId: flightctl-client
    openshift:
      clusterControlPlaneUrl: 'https://kubernetes.default.svc'
      createAdminUser: true
    caCert: |
      -----BEGIN CERTIFICATE-----
      extracted cert content
      -----END CERTIFICATE-----    
    insecureSkipTlsVerify: false
  gateway:
    ports:
      http: 80
      tls: 443
  enableMulticlusterExtensions: auto
  enableOpenShiftExtensions: auto
  exposeServicesMethod: auto
  generateCertificates: auto
  imagePullPolicy: IfNotPresent
  multiclusterEngineNamespace: multicluster-engine
db:
  builtin:
    image:
      image: registry.redhat.io/rhel9/postgresql-16
      tag: 9.6-1752571367
    storage:
      size: 60Gi
    resources:
      requests:
        cpu: 512m
        memory: 512Mi
    maxConnections: 200
  external:
    port: 5432
  name: flightctl
  type: builtin
kv:
  image:
    image: registry.redhat.io/rhel9/redis-7
    tag: 9.6-1752567986
  loglevel: warning
  maxmemory: 1gb
  maxmemoryPolicy: allkeys-lru
alertmanager:
  image:
    image: registry.redhat.io/rhacm2/prometheus-alertmanager-rhel9
    tag: v2.13.2
  enabled: true
api:
  image:
    image: registry.redhat.io/rhem/flightctl-api-rhel9
  rateLimit:
    trustedProxies:
      - 10.0.0.0/8
      - 172.16.0.0/12
      - 192.168.0.0/16
    authRequests: 20
    authWindow: 1h
    enabled: true
    requests: 300
    window: 1m
cliArtifacts:
  image:
    image: registry.redhat.io/rhem/flightctl-cli-artifacts-rhel9
  enabled: true
worker:
  image:
    image: registry.redhat.io/rhem/flightctl-worker-rhel9
  clusterLevelSecretAccess: false
periodic:
  image:
    image: registry.redhat.io/rhem/flightctl-periodic-rhel9
  consumers: 5
alertExporter:
  image:
    image: registry.redhat.io/rhem/flightctl-alert-exporter-rhel9
  enabled: true
alertmanagerProxy:
  image:
    image: registry.redhat.io/rhem/flightctl-alertmanager-proxy-rhel9
  enabled: true
telemetryGateway:
  image:
    image: registry.redhat.io/rhem/flightctl-telemetry-gateway-rhel9
ui:
  image:
    image: registry.redhat.io/rhem/flightctl-ui-rhel9
    pluginImage: registry.redhat.io/rhem/flightctl-ui-ocp-rhel9
  auth:
    insecureSkipTlsVerify: false
  enabled: true
clusterCli:
  image:
    image: registry.redhat.io/openshift4/ose-cli-rhel9
    tag: v4.20.0
dbSetup:
  image:
    image: registry.redhat.io/rhem/flightctl-db-setup-rhel9
  wait:
    sleep: 2
    timeout: 60
  migration:
    activeDeadlineSeconds: 0
    backoffLimit: 2147483647
upgradeHooks:
  scaleDown:
    deployments:
      - flightctl-periodic
      - flightctl-worker
    condition: chart
    timeoutSeconds: 120
  databaseMigrationDryRun: true
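If you prefer the CLI over the console, the same values file can be fed to a helm install. This is only a sketch: the OCI chart reference below comes from the upstream Flight Control project docs, and the Red Hat-supported RHEM chart delivered through the OCP software catalog may be published at a different location, so verify the chart source for your environment.

```shell
# Hedged sketch: install/upgrade via the helm CLI instead of the console.
# Chart reference is the upstream Flight Control OCI chart (assumption --
# the Red Hat-supported RHEM chart may live elsewhere).
helm upgrade --install flightctl \
  oci://quay.io/flightctl/charts/flightctl \
  --namespace redhat-rhem --create-namespace \
  --values values.yaml    # the values shown above, saved locally

# Watch the pods come up
kubectl get pods -n redhat-rhem -w
```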


Step 3:
In my case (RHEM version 1.0), I had to deploy the RHEM console plugin manually (this may not be required in future versions; the helm deployment should be able to auto-detect RHACM presence and integrate the console plugin accordingly).

cat <<EOF | kubectl apply -f -
apiVersion: console.openshift.io/v1
kind: ConsolePlugin
metadata:
  name: flightctl-plugin
spec:
  displayName: 'Red Hat Edge Manager'
  backend:
    type: Service
    service:
      name: flightctl-ui
      namespace: redhat-rhem
      port: 8080
      basePath: '/'
  proxy:
    - alias: api-proxy
      authorization: UserToken
      endpoint:
        type: Service
        service:
          name: flightctl-ui        #---> pay attention here. point to UI instead of API.
          namespace: redhat-rhem
          port: 8080                #---> pay attention here
EOF

kubectl patch cm flightctl-ui -n redhat-rhem --type=merge -p '{
  "data": {
    "FLIGHTCTL_SERVER": "/api/proxy/plugin/flightctl-plugin/api-proxy/",
    "FLIGHTCTL_SERVER_EXTERNAL": "/api/proxy/plugin/flightctl-plugin/api-proxy/"
  }
}'

kubectl patch console.operator.openshift.io cluster \
--type=json -p='[{"op": "add", "path": "/spec/plugins/-", "value": "flightctl-plugin"}]'


Step 4:
Create the RHEM organization.

kubectl create ns rhem-default-org
oc label namespace rhem-default-org io.flightctl/instance=redhat-rhem


Note: the documentation says:
  • Namespace-to-Organization Mapping: Red Hat Edge Manager uses a 1:1 mapping between OpenShift namespaces and Organizations.
  • Automatic Discovery: labeling a namespace with io.flightctl/instance=<helm-release-name> triggers the automatic discovery and initialization of that namespace as a Red Hat Edge Manager Organization. Here, helm-release-name=redhat-rhem.
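Taken together, these two points mean that adding another organization is just a matter of creating and labeling another namespace. A minimal sketch (the namespace name rhem-apac-org is hypothetical):

```shell
# Hypothetical second organization: per the 1:1 mapping, a new labeled
# namespace is discovered and initialized as another RHEM Organization.
kubectl create ns rhem-apac-org
kubectl label ns rhem-apac-org io.flightctl/instance=redhat-rhem
```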
Step 5:
Configure authorization for this org:

oc adm policy add-role-to-user view admin -n rhem-default-org
oc adm policy add-role-to-user flightctl-admin-redhat-rhem admin -n rhem-default-org
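To sanity-check that the bindings took effect, oc can impersonate the user and list the role bindings (standard oc commands; the user name "admin" matches the commands above, so adjust to yours):

```shell
# Verify the RBAC for the org namespace
oc auth can-i get pods -n rhem-default-org --as=admin   # expect: yes (via "view")
oc get rolebindings -n rhem-default-org
```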



That's it. RHEM should be up and running and accessible via UI and API. See the screenshots below:




RHEM integration with RHEL:

In order to integrate a device with RHEM (ie: bring the device under RHEM's management), we need to do two things on the RHEL side. In my case, since I was using bootc, I added the code below to my bootc Dockerfile:

Install flightctl-agent

I added the below lines to my Dockerfile:
RUN dnf -y install dnf-plugins-core && dnf clean all
RUN dnf config-manager --add-repo https://rpm.flightctl.io/flightctl-epel.repo
RUN dnf -y install flightctl-agent && dnf clean all

Add the flightctl config to the OS 

This is so that the agent knows where to report to and where to get the desired state from.
To do this, I first needed to install the flightctl CLI on my local machine. In my case, I spun up a container for it (Dockerfile). Then I executed the commands below:

flightctl login https://api.redhat-rhem.apps.myhub.xyz -k --token=$(oc whoami -t)

flightctl certificate request \
--signer=enrollment \
--expiration=365d --output=embedded > config.yaml

Then I used this config.yaml file like below in my Dockerfile:
RUN mkdir -p /etc/flightctl/
ADD config.yaml /etc/flightctl/config.yaml
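Putting the two pieces together, the relevant portion of the bootc Containerfile looks roughly like this. The base image tag and the systemctl enable line are assumptions on my part (the upstream Flight Control docs enable the agent unit this way); adjust to your image:

```dockerfile
# Sketch: bootc Containerfile fragment combining agent install + early binding.
# Base image tag is an assumption -- match it to your RHEL release.
FROM registry.redhat.io/rhel9/rhel-bootc:9.7

# Install the flightctl agent from the upstream repo
RUN dnf -y install dnf-plugins-core && dnf clean all
RUN dnf config-manager --add-repo https://rpm.flightctl.io/flightctl-epel.repo
RUN dnf -y install flightctl-agent && dnf clean all

# Early binding: bake the enrollment config into the image
RUN mkdir -p /etc/flightctl/
ADD config.yaml /etc/flightctl/config.yaml

# Make sure the agent starts at boot (per upstream docs)
RUN systemctl enable flightctl-agent.service
```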

Note: Since I am using BootC, managing the config.yaml is a critical part of the device identity lifecycle. The way the "bind" is done determines how the device first communicates with the management server. 
  • Early Binding (Baked into the Image): This is what I have done. 
  • Late Binding (Injected at Provisioning): In this scenario, the OS image is "generic," and the config.yaml is provided to the VM at the very first boot. This is done using cloud-init (NoCloud/ConfigDrive) or Ignition to write the file to /etc/flightctl/config.yaml during the hardware provisioning phase.
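For the late-binding route, a minimal cloud-init user-data sketch might look like the following (NoCloud datasource assumed; the write_files keys are standard cloud-init):

```yaml
#cloud-config
# Late-binding sketch: write the enrollment config at first boot.
write_files:
  - path: /etc/flightctl/config.yaml
    owner: root:root
    permissions: '0600'
    # paste the contents of the config.yaml generated with
    # "flightctl certificate request" here, indented under content
    content: |
      # ...enrollment endpoint, certs, etc...
```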
That's it for the setup of RHEM.
Summarising what has been done so far:
  • deployed/enabled RHEM on OCP 
  • exposed its UI integrated with RHACM 
  • established an early-binding pattern for registering a bootc RHEL 9.7 device (eg: a kiosk simulation) by adding the flightctl config.yaml in the bootc definition (ie: Dockerfile)

Setup declarative desired state:

Below is a visualisation of how this works:


Describing the diagram:
  1. I created a Git repo to store the declarative state definition files.
  2. I created a Repository object in RHEM containing a reference to the Git repo and authentication details (no, k8s git auth creds won't work here, at least for now).
  3. I created a ResourceSync object that contains references such as which Git repo, which path, which branch, etc. This is the object that syncs the source code from the Git repo into RHEM. It is almost GitOps-like behaviour, but it is limited to the Fleet object (at least for now).
  4. The Fleet object is the declarative state definition of the devices. It lives in the Git repo and gets created, updated, deleted, etc. in RHEM.
  5. When a device/OS (containing the config.yaml from the RHEM server) starts, it registers itself with RHEM. The system is represented as a Device object. These Device objects need to be approved. The devices can also be tagged with labels matching a Fleet's selector.
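The fleet-to-device matching in step 5 works like Kubernetes label selectors. Here is a hypothetical Python sketch (not the actual RHEM code) of that matching logic; the device names and labels are made up:

```python
# Toy illustration of Fleet spec.selector.matchLabels picking up Device
# objects by label, in the same spirit as Kubernetes label selectors.

def matches_fleet(device_labels: dict, match_labels: dict) -> bool:
    """A device matches when every selector label is present with the same value."""
    return all(device_labels.get(k) == v for k, v in match_labels.items())

fleet_selector = {"fleet": "default"}

devices = [
    {"name": "kiosk-01", "labels": {"fleet": "default", "region": "east"}},
    {"name": "kiosk-02", "labels": {"region": "west"}},  # not yet tagged
]

matched = [d["name"] for d in devices if matches_fleet(d["labels"], fleet_selector)]
print(matched)  # ['kiosk-01']
```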

Note: all these objects can be created via the RHEM UI without any Git repo or config-as-code. But config-as-code is my preferred method.

Repository OBJ:

apiVersion: flightctl.io/v1beta1
kind: Repository
metadata:
  name: rhem-default
spec:
  url: https://github.com/xxxxxx/rhem-repo.git
  type: git
  httpConfig:
    username: xxxxxxxx
    password: xxxxxxxxx
    skipServerVerification: true


ResourceSync OBJ:

apiVersion: flightctl.io/v1beta1
kind: ResourceSync
metadata:
  name: fleet
spec:
  repository: rhem-default
  path: /rhem/fleets/default-fleet.yaml
  targetRevision: main

Note: although these objects look like K8s objects, (I learned the hard way that) the RHEM deployment does not create CRDs for them (at least, as of the time this post is written). Meaning, kubectl create/apply won't work; it needs to be done using the flightctl CLI.
If these become CRDs, then we could apply them using GitOps or RHACM policies.

flightctl apply -f repository.yaml
flightctl apply -f resourcesync.yaml

Fleet OBJ:

Now the cool part begins. The Fleet object contains the declarative definition for the devices that will be or are under its management. It specifies:
  • OS image
  • OS configs
  • Applications
The above are great for:
  • mitigating config drift
  • lowering operating cost


apiVersion: flightctl.io/v1beta1
kind: Fleet
metadata:
  name: default
spec:
  selector:
    matchLabels:
      fleet: default
  template:
    metadata:
      labels:
        fleet: default
    spec:
      os:
        image: quay.io/alitestseverything/my-rhel9-bootc:kioskv2
      applications:
        - appType: container
          envVars: {}
          image: quay.io/alitestseverything/product-api:v1
          name: product-api
          ports:
            - '3000:3000'
      config:
        - name: "silence-audit-logs"
          inline:
            - path: "/etc/sysctl.d/20-quiet-printk.conf"
              content: "kernel.printk = 3 4 1 3"
              mode: 0644
        # STEP 1: Force the creation of the writable directory via tmpfiles
        # - name: prep-writable-storage
        #   inline:
        #     - path: "/etc/tmpfiles.d/kiosk.conf"
        #       content: |
        #         d /var/www 0755 root root -
        #         d /var/www/html 0755 root root -
        #         d /etc/flightctl/hooks.d/afterupdating 0755 root root -
        #       mode: 0644
        - name: prep-writable-storage
          inline:
            - path: "/etc/tmpfiles.d/kiosk.conf"
              content: |
                d /etc/flightctl/hooks.d/afterupdating 0755 root root -
              mode: 0644
        - name: kiosk-refresh-hook
          inline:
            - path: /etc/flightctl/hooks.d/afterupdating/10-refresh-browser.yaml
              content: |
                - run: /usr/local/bin/kiosk-refresh.sh
                  if:
                    - path: /var/lib/kiosk/html
                      op: [created, updated, deleted]
        # - name: gdm-restart-hook
        #   inline:
        #     - path: /etc/flightctl/hooks.d/afterupdating/11-restart-gdm.yaml
        #       content: |-
        #         - run: /usr/bin/systemctl restart gdm
        #           timeout: 10s
        #           if:
        #             - path: /var/lib/kiosk/html
        #               op: [created, updated, deleted]
        - name: motd-update
          inline:
            - path: "/etc/motd"
              content: "This system is managed by flightctl."
              mode: 0644
        - name: kiosk-frontend
          configType: GitConfigProviderSpec
          gitRef:
            repository: rhem-default
            targetRevision: main
            path: /rhem/kiosk
      resources:
        - monitorType: CPU
          alertRules:
            - severity: Warning
              duration: 10m
              percentage: 70
              description: 'CPU load is above 50% for more than 10 minutes'
              samplingInterval: 30s
      systemd:
        matchPatterns:
          - chronyd.service

Let's re-read the Fleet object definition. There's a lot happening here.  

To make sense of the importance of this Fleet object, we need to take a step back to the OS SOE; in my case this was a bootc RHEL 9.7 image defined using a Dockerfile. The way I set up the SOE image release is:
  • My bootc image contains a lean OS layer with essential, generic dependencies (eg: chromium for the kiosk display).
  • It contains dependencies for the apps (eg: podman, installed).
  • It has an initial placeholder for the front-end app (really just cosmetic; it wasn't strictly needed).
  • It does not contain any apps.
  • It does not contain edge-site-specific configuration.
Then, once the device starts, registers itself for the first time with RHEM, is approved and is tagged with the fleet's label, the device inherits the desired state definition from the Fleet object. In this case, here is what happens:
  • An application named product-api is deployed from the container image quay.io/alitestseverything/product-api:v1.
  • A front-end application (really a set of files) maintained in the Git dir /rhem/kiosk is also deployed. (Under the hood, the front-end connects to the backend product-api app to provide the application functionality, and the browser displays the front-end app.)
  • I added two sample configurations at the OS layer, called "motd-update" and "silence-audit-logs". Imagine these are site-specific configs (eg: different printers, different networks, wifi settings, etc.).
  • I have a RHEM-specific hook implementation called "kiosk-refresh-hook" to refresh the browser after the front-end app is applied.
  • There is a templating feature as well (eg: {{ .metadata.labels.region }}) to make these configurations a bit more dynamic. But, at least in version 1.0, it seems limited. For example: I wanted to point to a local mirror registry dynamically (a common edge deployment pattern is to cache content closer to the site to cater for intermittent connectivity), but found that templating the registry URL is not possible as of yet.
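To show what that templating feature buys you, here is a toy Python simulation (not the real Go-template engine RHEM uses) of resolving a per-device placeholder from the device's labels; the template string and labels are made up:

```python
# Toy simulation of fleet templating: resolve placeholders like
# {{ .metadata.labels.region }} from a device's labels.
import re

def render(template: str, device: dict) -> str:
    def resolve(match):
        # Look up the captured label name; missing labels resolve to ""
        return device["metadata"]["labels"].get(match.group(1), "")
    return re.sub(r"\{\{\s*\.metadata\.labels\.(\w+)\s*\}\}", resolve, template)

device = {"metadata": {"labels": {"region": "apac", "fleet": "default"}}}
print(render("site-config-{{ .metadata.labels.region }}.conf", device))
# site-config-apac.conf
```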
The main things I will highlight here, in terms of what was in the RHEL OS before versus what was applied via the Fleet object, are:
  • The application layer is deliberately segregated from the OS layer. This is standard DevOps. 
  • Site-specific configs are segregated, keeping the OS lean and generic. 
  • By segregating these functionalities we keep the separation of concerns among teams intact, reduce team dependency and keep the release processes independent. 
  • RHEM's Fleet object also provides a dashboard view of the "state of devices" across the fleet. This is like looking at the herd of cattle holistically (instead of each animal individually). 
In my opinion, these are super powerful things to have when managing devices in bulk, and they are the advantages of RHEM over traditional automated processes.


Note: spec.config[name: kiosk-frontend].path is a tricky one. The way this "path" works is that the flightctl agent on the OS/device maps the full path from there. For example: my GitHub dir structure looks like below:

rh-gitops.git
-rhem
--kiosk
---var
----lib
-----kiosk
------html
-------index.html, main.js, main.css
-------images
--------health.png, logo.png etc

The flightctl agent maps this and deploys to the local device (the one it is installed on; ie: the kiosk machine) like this: it deploys index.html, main.js, main.css and images/* to the /var/lib/kiosk/html/ dir, because I specified path=/rhem/kiosk in the spec.config[name: kiosk-frontend].path field. (Read this a few times to understand how the path field works.)
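In other words, the configured path acts as a prefix that gets stripped, and whatever remains becomes the absolute path on the device. A hypothetical Python illustration (not the agent's real code) of that mapping:

```python
# Toy illustration: the GitConfigProviderSpec "path" is a prefix that is
# stripped from each repo file path; the remainder is the on-device path.
from pathlib import PurePosixPath

def device_path(repo_file: str, spec_path: str) -> str:
    """Strip the configured spec path prefix; what's left is the on-device path."""
    rel = PurePosixPath(repo_file).relative_to(PurePosixPath(spec_path))
    return str(PurePosixPath("/") / rel)

repo_files = [
    "/rhem/kiosk/var/lib/kiosk/html/index.html",
    "/rhem/kiosk/var/lib/kiosk/html/images/logo.png",
]

mapped = [device_path(f, "/rhem/kiosk") for f in repo_files]
print(mapped[0])  # /var/lib/kiosk/html/index.html
print(mapped[1])  # /var/lib/kiosk/html/images/logo.png
```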

Device OBJ:

Because I added the config.yaml file as part of the bootc image build (in the Dockerfile), when the VM/device starts it auto-registers itself with the RHEM server, but as "pending approval". This shows up as a Device. Here we can do either of two things:
  • We can simply approve it (and give it a friendly name) and make changes in the UI or via the flightctl CLI.
  • We can approve it (and give it a friendly name), and during approval assign labels matching a fleet definition. In my case, I labeled it "fleet=default", matching it to the Fleet OBJ created earlier. This makes the device inherit the configs defined in the Fleet object, and that's how things are done in bulk (20K, 100K devices). This is really powerful when an ops team wants to manage tens of thousands of devices across different sites (could be at the edge, could be not so at the edge).
Here's a screenshot of a device in the Edge Manager UI with pending approval:


In this case the device has just finished starting and registered in RHEM. Here's a screenshot of the kiosk simulation device (running on a low-powered qemu VM):


Notice the front-end app simply displays a placeholder, "initial bootc rhel".

Upon approval and tagging to add it to a Fleet object, the device should pull the configs and images (apps and OS) as per the fleet definition. 


The device starts "updating" as soon as it is approved.


It takes a few minutes to complete the "update" process; meaning pulling the configs and applying them to the device, pulling images from the image registry and running them as containers, and executing the hooks.


This also means, as per my fleet/desired state definition, the device is also "refreshed" with a running/functioning app (instead of the initial placeholder). 




Conclusion:

RHEM at its 1.0 GA in Jan 2026 (although the open source Flight Control project has been around for a few years prior, and RHEM had been in tech preview before that) comes packed with very cool features and functionality, and it certainly feels "made for edge". I also think it might be useful for fleet management even when the devices are not edge devices. It does have some limitations (eg: no templating for the container registry, no K8s CRDs for pure GitOps, no OSes beyond RHEL, etc.), but I am hopeful those may be on the roadmap.

I cannot form a generic opinion here on whether to use automation or RHEM for fleet management, because bootc narrows the gap. RHEM is a new tool and introduces changes to the ops process, and there may already be existing automated processes in place (using pipelines, AAP, etc.). But it certainly makes a very strong case for fleet management.

 
