I wrote about the key concepts of service mesh and how to evaluate the requirements for one in my previous post here: Deciphering the hype of Service Mesh. This post is a follow-up from there covering the technical aspects.
Part 1: Deciphering the hype of Service Mesh
Part 2: Understanding The Ingress and The Mesh components of Service Mesh.
Part 3: Understanding the observability component of Service Mesh (TBD in another post).
Almost all popular service mesh technologies/tools (eg: Istio, Linkerd) have both ingress and mesh capabilities. Conceptually, I see them as two mutually exclusive domains (integrated nicely by the underlying tool). Understanding the ingress and the mesh components individually, such as what they offer and what I can do with them, was the basic building block of my understanding of service mesh technology as a whole. This is arguably the most misrepresented topic on the internet. So I thought I would share my point of view.
Note: The observability component of a Service Mesh is not described in this post, but I will attempt to cover it in a "part 3".
- The Ingress (based on Istio)
- The Mesh (based on Istio)
- Conclusion
Note: Although the sample code and technical details I mention here are based on Istio, it may be similar for most OSS Service Mesh technologies as of today.
For the sample code in this post, I used Istio from Tanzu Service Mesh for the implementation of the service mesh and ingress functionalities. Here's a partial, crude diagram of my application:
Let's dive into it.
The Ingress:
The ingress component controls how traffic from outside the cluster enters the mesh. In Istio, this is expressed with two objects: a Gateway and one or more VirtualServices.
Gateway:
The Gateway object configures the pre-deployed ingress proxy, declaring which ports, protocols and hosts it should accept traffic for. The definition below accepts plain HTTP traffic on port 80 for test.anahidapps.xyz:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-frontend-gw
spec:
  selector:
    istio: ingressgateway # --> selecting the preconfigured proxy deployment istio-ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - test.anahidapps.xyz
If the use case requires serving the application over HTTPS, the definition below terminates TLS at the gateway using the certificate stored in the my-cert Kubernetes secret:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-frontend-gw
spec:
  selector:
    istio: ingressgateway # --> selecting the preconfigured proxy deployment istio-ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - test.anahidapps.xyz
    tls:
      mode: SIMPLE
      credentialName: my-cert
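If the use case instead insists on handling TLS at the application level, so that encrypted traffic is sent to the workload untouched, Istio's Gateway supports a PASSTHROUGH TLS mode. A minimal sketch (the name my-frontend-gw-passthrough is mine; the matching VirtualService would then use a tls route matching on SNI rather than an http route):
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-frontend-gw-passthrough
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: tls
      protocol: TLS
    hosts:
    - test.anahidapps.xyz
    tls:
      mode: PASSTHROUGH # --> the gateway does not terminate TLS; the workload does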
Virtual Services:
A Gateway only lets traffic in; a VirtualService binds to the gateway and routes that traffic to the actual workloads. The definition below routes traffic for test.anahidapps.xyz arriving on port 443 to the myapp service:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-vs
spec:
  gateways:
  - my-frontend-gw
  hosts:
  - test.anahidapps.xyz
  http:
  - match:
    - port: 443
    route:
    - destination:
        host: myapp
        port:
          number: 443
Path-based routing works the same way; here /app1 and /app2 are routed to two different services:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp-vs
spec:
  gateways:
  - my-frontend-gw
  hosts:
  - test.anahidapps.xyz
  http:
  - match:
    - uri:
        prefix: /app1
    route:
    - destination:
        host: myapp1
  - match:
    - uri:
        prefix: /app2
    route:
    - destination:
        host: myapp2
Requests can also be matched on headers. For example, the snippet below (added to the http section above) sends any request carrying the header test: true to a separate myapp2-test service:
- match:
  - headers:
      test:
        exact: "true"
  route:
  - destination:
      host: myapp2-test
Destination Rule:
A DestinationRule groups a service's pods into named subsets (eg: by version label), which a VirtualService can then route to with weights. Here is how we used it to A/B test two versions of myapp:
- The initial state: v1 of the myapp application (name: myapp-v1, labels: app=myapp, version: v1) was deployed and exposed via the myapp-svc K8s service. The Istio VirtualService myapp-vs was pointing to the myapp-svc K8s service.
- We deployed the myapp-v2 deployment for v2 of the myapp application (name: myapp-v2, labels: app=myapp, version: v2), exposed via the myapp-v2-svc K8s service selecting the v2 pods (eg: spec.selector: app=myapp & version=v2).
- In order to perform A/B testing between v1 and v2 of myapp (see the manifests after this list):
- we deployed the myapp-v1-svc K8s service selecting the v1 pods (eg: spec.selector: app=myapp & version=v1).
- we modified the myapp-svc K8s service to remove the version selector, so the new spec.selector became app=myapp (instead of spec.selector: app=myapp & version=v1). Thus myapp-svc became a placeholder service to work with Istio's VirtualService. This is important.
- we deployed Istio's DestinationRule definition/object and configured the existing VirtualService myapp-vs with the traffic distribution for A/B testing accordingly (eg: 45% to v1 and 55% to v2).
- We gradually added more weight (traffic) to v2 of myapp until all traffic was going to v2. We kept v1 dormant for some time in case we needed to roll back.
- Finally, once we were satisfied with the A/B testing results, we
- modified the myapp-svc K8s service again to select v2 of myapp (eg: spec.selector: app=myapp & version=v2).
- deleted the myapp-v1-svc and myapp-v2-svc K8s services and the myapp-v1 deployment.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: myapp
    version: v1
  name: myappv1
  namespace: ns2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: v1
  template:
    metadata:
      labels:
        app: myapp
        version: v1
    spec:
      containers:
      - name: myapp
        image: my.repo.io/workload-myapp:1.0
        ...
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
  namespace: ns2
spec:
  ports:
  - name: http-web
    port: 80
    protocol: TCP
    targetPort: 3030
  selector:
    app: myapp
    version: v1
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-vs
  namespace: ns2
spec:
  gateways:
  - myapp-gw
  hosts:
  - myapp.anahidapps.xyz
  http:
  - match:
    - port: 80
    route:
    - destination:
        host: myapp-svc
        port:
          number: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: myappv2
  namespace: ns2
spec:
  template:
    metadata:
      labels:
        app: myapp
        version: v2
  ...
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-v2-svc
  namespace: ns2
spec:
  selector:
    app: myapp
    version: v2
  ...
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-v1-svc
  namespace: ns2
spec:
  selector:
    app: myapp
    version: v1
  ...
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
  namespace: ns2
spec:
  ...
  selector:
    app: myapp # --> version selector removed; myapp-svc is now the placeholder service
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: myapp-ab-testing
  namespace: ns2
spec:
  host: myapp-svc
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp-vs
  namespace: ns2
spec:
  hosts:
  - myapp.anahidapps.xyz
  http:
  - route:
    - destination:
        host: myapp-svc # Route traffic to the Pods that match the labels defined in the DestinationRule v1 subset
        subset: v1
      weight: 45
    - destination:
        host: myapp-svc
        subset: v2
      weight: 55
---
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: myappv2
  namespace: ns2
spec:
  selector:
    matchLabels:
      app: myapp
      version: v2
  ...
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
  namespace: ns2
spec:
  ...
  selector:
    app: myapp
    version: v2
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-vs
  namespace: ns2
spec:
  gateways:
  - myapp-gw
  hosts:
  - myapp.anahidapps.xyz
  http:
  - match:
    - port: 80
    route:
    - destination:
        host: myapp-svc
        port:
          number: 80
The Mesh:
- There are 3 K8s clusters being used here to strategically place micro-services where it makes sense. For example:
- Services handling user requests (eg: user login, load balancer etc) are placed in AWS.
- The warehouse management system (and hence, the data store) exists in GCP, so the services needing inventory data are placed in a K8s cluster in GCP for close proximity.
- For PCI-DSS compliance, services handling and processing payment data (eg: payment profile, pre-authorisation token etc) are placed in a private cloud.
- Tanzu Service Mesh (and its Global Namespace capability) is used to create a service mesh (using Istio) across the 3 K8s clusters for the services. The external services for Payment Service Providers are included in the mesh.
- I chose to use a service mesh here for:
- service discovery,
- circuit breaking,
- security,
- SLOs of the services, and
- telemetry for day-2 ops.
mTLS (aka mesh security):
Istio's PeerAuthentication object controls mTLS between workloads. Deployed in the istio-system root namespace with STRICT mode, it becomes the mesh-wide default:
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system # note: applying to istio-system makes it mesh wide default settings
spec:
  mtls:
    mode: STRICT
Individual workloads can override the mesh-wide default. For example, the inventory-system workload accepts both mTLS and plain-text traffic via PERMISSIVE mode:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: inventory-system-permissive
  namespace: inventory-space
spec:
  selector:
    matchLabels:
      app: inventory-system # note: applies only to pods with this label, overriding the mesh-wide default
  mtls:
    mode: PERMISSIVE
Service Discovery:
Within the mesh, services discover each other simply by name: the control plane pushes the mesh's service registry to every sidecar, and with Tanzu Service Mesh's Global Namespace that registry spans all three clusters. A service in the AWS cluster can call the inventory service in the GCP cluster by its service name, with no client-side discovery code or separate registry (eg: Eureka, Consul) to maintain.
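As a minimal sketch of what this buys the developer (the names checkout, INVENTORY_URL and the Global Namespace domain acme.gns are hypothetical, not from my actual app): the consuming workload just reads a plain URL, and the sidecar resolves and routes it, wherever the target happens to run.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
  namespace: order-space
spec:
  ...
  template:
    spec:
      containers:
      - name: checkout
        env:
        - name: INVENTORY_URL # the app only knows a URL; no discovery client library needed
          value: http://inventory-system.acme.gns # GNS-provided name, resolvable across clusters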
Circuit Breaker:
The DestinationRule below applies a circuit breaker to calls leaving the payment services for the external PayPal API:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: paypal-circuit-breaker
  namespace: payment-system
spec:
  host: paypal.apis.com
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 5 # cap concurrent TCP connections to the host
      http:
        http1MaxPendingRequests: 2 # queue at most 2 pending HTTP requests
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutiveErrors: 2 # eject the host after 2 consecutive errors
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100
With this in place, the sidecars stop sending traffic to paypal.apis.com for at least 3 minutes once it returns 2 consecutive errors, failing fast instead of piling up requests.
Service Entry:
Hosts outside the cluster are unknown to the mesh by default. The ServiceEntry below adds the external PayPal API to Istio's internal service registry, which is what allows the circuit breaker above (and the mesh's telemetry) to apply to it:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: paypal
spec:
  hosts:
  - paypal.apis.com
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: DNS
Service Mesh for Mono to Micro transformation projects:
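The conclusion below maps this scenario to the Service Entry, so here is a minimal sketch of the idea (the host legacy-erp.mycorp.internal and the service inventory-system are hypothetical): while strangling a monolith, a ServiceEntry brings the not-yet-migrated legacy system into the mesh, and a weighted VirtualService gradually shifts callers from the legacy endpoint to the new micro-service without the callers changing anything.
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: legacy-erp
spec:
  hosts:
  - legacy-erp.mycorp.internal # hypothetical legacy monolith endpoint outside the cluster
  location: MESH_EXTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: inventory-migration
spec:
  hosts:
  - legacy-erp.mycorp.internal # callers keep using the legacy name
  http:
  - route:
    - destination:
        host: legacy-erp.mycorp.internal # monolith still serves most traffic
      weight: 80
    - destination:
        host: inventory-system # the carved-out micro-service takes a growing share
      weight: 20
As the new service proves itself, the weights move toward it until the monolith endpoint can be retired.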
Conclusion:
- I used a commercial enterprise product called Tanzu Service Mesh
- This made rolling out Istio super simple -- just onboard a cluster. The rest is taken care of.
- It has a "thing" called Global Namespace --- with this I was able to establish a mesh across multiple clusters without breaking a sweat. Without this Global Namespace I probably wouldn't have chosen service mesh for my application, because the micro-services of my app needed to be spread across multiple clusters, and implementing a cross-cluster mesh is veeeryyy difficult with just Istio.
- The Global Namespace also makes isolation and connectivity among meshes (not to be confused with workloads in a mesh) super easy.
- The Istio.io documentation is comprehensive. I really just followed the documentation to understand service mesh.
- Understanding service mesh through an application architecture lens rather than a networking or security lens, and mapping the app's requirements to Istio features, was the key. The factors below were my considerations for my micro-service strategy:
- How to access the application: Service mesh's Gateway
- How to release the micro-services: Service mesh's A/B testing functionality
- How to find and use micro-services: Service mesh's Service discovery
- How to secure the application: Service mesh's mTLS, Tanzu Service Mesh's threat detection (eg: PII detection, Data leakage, Code injection, Protocol attack, Mesh isolation etc), Tanzu Service Mesh's access policy etc.
- How to increase application resiliency: Service Mesh's Circuit Breaker, Tanzu Service Mesh's SLO policies.
- How to go mono-to-micro while keeping the legacy system functional: Service mesh's Service Entry
- To Mesh or Not to Mesh, aka API Gateway vs Service Mesh (more on this below).
- Multi cluster mesh: Tanzu Service Mesh's Global Namespace
- Installing/Deploying Service mesh: Tanzu Service Mesh's global control plane.
- Reduce development overhead: Service Mesh's Service Discovery, mTLS
- To mesh or not to mesh: I weighed the development and operational effort for my micro-services (which were being carved out of their monolith predecessor) in the context of the Service Mesh pattern vs the API Gateway pattern, and Service Mesh was the winner. The major reasons were:
- with an API Gateway I would have had to document the APIs and register them. With service mesh I could simply access them.
- with an API Gateway I would have had to create unnecessary APIs and register them just so I could use them from internal services. This is an anti-pattern. With Service Mesh I stayed on course and created APIs that were meant to be APIs; the rest fell under the service-to-service communication umbrella. At the same time, I eliminated the security concerns as well.
- with the API Gateway pattern I would have had to version-control and release my services and APIs individually in parallel. With Service Mesh it was just releasing versions of the workloads.