
Understanding The Ingress and The Mesh components of Service Mesh

I wrote about the key concepts of service mesh and how to evaluate the requirements for a service mesh in my previous post: Deciphering the hype of Service Mesh. This post is a follow-up covering the technical aspects.


Part 1: Deciphering the hype of Service Mesh

Part 2: Understanding The Ingress and The Mesh components of Service Mesh.

Part 3: Understanding the observability component of Service Mesh (TBD in another post).


Almost all popular service mesh technologies/tools (eg: Istio, Linkerd) have both ingress and mesh capabilities. Conceptually, I see them as two mutually exclusive domains (integrated nicely by the underlying tool). Understanding the ingress and the mesh components individually (what they offer, what I can do with them, etc.) was the basic building block of my understanding of service mesh technology as a whole. This is arguably one of the most misrepresented topics on the internet, so I thought I would share my point of view.


Note: The observability component of a Service Mesh is not described in this post, but I will attempt to cover it in a "part 3".


Below is how I understood Istio's ingress and mesh capabilities, features and functionalities:

Note: Although the sample code and technical details provided here are based on Istio, they should be similar for most OSS service mesh technologies as of today.


For the sample code in this post, I used Istio from Tanzu Service Mesh to implement the service mesh and ingress functionalities. Here's a partial (crude) diagram of my application:

Let's dive into it.

The Ingress: 

In this context, the Ingress is how a user request gets into the mesh. It can also be used to expose services that have nothing to do with the mesh. In my opinion, the Mesh (or Service Mesh) is the core capability of service mesh tech (eg: FOSS: Istio; Enterprise: Tanzu Service Mesh) and the Ingress is an added bonus. Istio's ingress features go beyond just basic ingress. Below are the features:

Gateway: 

This powers the L4 - L6 capabilities such as exposing ports, TLS (not mTLS) for ingress requests, host grouping and mapping for L7 (via virtual services), ingress and egress traffic management etc. Here's an example of a Gateway object.

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-frontend-gw
spec:
  selector:
    istio: ingressgateway # --> selecting the preconfigured proxy deployment istio-ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - test.anahidapps.xyz

This gateway configuration lets HTTP traffic for test.anahidapps.xyz, received by the istio-ingressgateway on port 80, into a service or a mesh. This is applicable when the LoadBalancer in front offloads TLS and, in my opinion, should be the most common gateway definition: I prefer to offload the computation of TLS decryption to a dedicated layer such as the LoadBalancer.

If the use case insists on handling TLS inside the cluster rather than at the LoadBalancer, the definition below terminates TLS-encrypted traffic at the ingress gateway.

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-frontend-gw
spec:
  selector:
    istio: ingressgateway # --> selecting the preconfigured proxy deployment istio-ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - test.anahidapps.xyz
    tls:
      mode: SIMPLE
      credentialName: my-cert

This gateway configuration lets HTTPS traffic for test.anahidapps.xyz, received by the default istio-ingressgateway on port 443, into a service or a mesh. Note: with tls.mode: SIMPLE, the gateway itself terminates TLS using the my-cert K8s secret (created from tls.key and tls.crt files in the namespace where the gateway runs), and the workload behind it receives decrypted traffic.
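With tls.mode: PASSTHROUGH, by contrast, the gateway forwards the encrypted stream untouched so the workload terminates TLS itself. A hedged sketch, assuming the same hostname (the matching VirtualService would then need a tls match block rather than http):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-frontend-gw-passthrough  # hypothetical name for this variant
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: tls
      protocol: TLS      # raw TLS, not terminated at the gateway
    hosts:
    - test.anahidapps.xyz
    tls:
      mode: PASSTHROUGH  # gateway forwards the encrypted bytes; the workload holds the cert and terminates TLS
```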

Virtual Services: 

It can be considered the building block of Istio's traffic management capabilities. At a basic level, it configures how requests are routed to a service. This is the component that operates at L7, meaning we can configure routing rules based on request headers, path etc. Advanced functionality can then be added on top, such as timeouts, retries, circuit breakers etc. Below is an example of a VirtualService definition for the above Gateway.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-vs
spec:
  gateways:
  - my-frontend-gw
  hosts:
  - test.anahidapps.xyz
  http:
  - match:
    - port: 443
    route:
    - destination:
        host: myapp
        port:
          number: 443

This VirtualService targets all traffic on port 443 ingressing through the Gateway my-frontend-gw and directs it to a single K8s service: myapp.
Since the VirtualService operates at L7, we can also direct traffic to multiple K8s services based on URI path, request headers etc. For example:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp-vs
spec:
  gateways:
  - my-frontend-gw
  hosts:
    - test.anahidapps.xyz
  http:
  - match:
    - uri:
        prefix: /app1
    route:
    - destination:
        host: myapp1
  - match:
    - uri:
        prefix: /app2
    route:
    - destination:
        host: myapp2
  - match:
    - headers:
        test:
          exact: "true"
    route:
    - destination:
        host: myapp2-test
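The timeouts and retries mentioned earlier are also declared on the VirtualService. A minimal sketch (the service name and all values are illustrative, not recommendations):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-resilience-vs  # hypothetical
spec:
  hosts:
  - myapp
  http:
  - route:
    - destination:
        host: myapp
    timeout: 5s              # fail the request if no response within 5s overall
    retries:
      attempts: 3            # retry up to 3 times
      perTryTimeout: 2s      # each attempt gets at most 2s
      retryOn: 5xx,connect-failure
```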


Destination Rule:

The way I understood DestinationRule is through how it is used for implementing A/B testing, Blue-Green deployments, Canary deployments etc. Of course, A/B testing is just one of its many uses.
Let's understand A/B testing in its basic form. Here's a diagram to help with visualisation:



Description of the above diagram:
  • The initial state: the v1 version of the myapp application (name: myapp-v1, labels: app=myapp, version: v1) was deployed and exposed via the myapp-svc K8s service. The Istio virtual service myapp-vs was pointing to the myapp-svc K8s service.
  • We deployed the myapp-v2 deployment for v2 of the myapp application (name: myapp-v2, labels: app=myapp, version: v2), exposed via the myapp-v2-svc K8s service selecting the v2 pods (eg: spec.selector: app=myapp & version=v2).
  • In order to perform A/B testing between the v1 and v2 versions of myapp:
    • we deployed the myapp-v1-svc K8s service selecting the v1 pods (eg: spec.selector: app=myapp & version=v1).
    • we modified the myapp-svc K8s service to remove the version selector, so the new spec.selector is app=myapp (instead of app=myapp & version=v1). Thus myapp-svc becomes a placeholder service to work with Istio's VirtualService. This is important.
    • we deployed Istio's DestinationRule object and configured the existing VirtualService myapp-vs with the traffic distribution for A/B testing (eg: 45% to v1 and 55% to v2).
    • we gradually added more weight (traffic) to v2 of myapp until all traffic was going to v2. We kept v1 dormant for some time in case we needed to roll back.
  • Finally, once we were satisfied with the A/B testing result, we
    • modified the myapp-svc K8s service again to select v2 of myapp (eg: spec.selector: app=myapp & version=v2).
    • deleted the myapp-v1-svc and myapp-v2-svc K8s services and the myapp-v1 deployment.

Below are the YAMLs implementing the steps described above:

Initial state:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: myapp
    version: v1
  name: myapp-v1
  namespace: ns2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: v1
  template:
    metadata:
      labels:
        app: myapp
        version: v1
    spec:
      containers:
        - name: myapp
          image: my.repo.io/workload-myapp:1.0
          ...
          
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
  namespace: ns2
spec:
  ports:
    - name: http-web
      port: 80
      protocol: TCP
      targetPort: 3030
  selector:
    app: myapp
    version: v1
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-vs
  namespace: ns2
spec:
  gateways:
  - myapp-gw
  hosts:
  - myapp.anahidapps.xyz
  http:
  - match:
    - port: 80
    route:
    - destination:
        host: myapp-svc
        port:
          number: 80


AB testing state:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: myapp-v2
  namespace: ns2
spec:
  template:
    metadata:
      labels:
        app: myapp
        version: v2
...

---
apiVersion: v1
kind: Service
metadata:
  name: myapp-v2-svc
  namespace: ns2
spec:
  selector:
    app: myapp
    version: v2
...

---
apiVersion: v1
kind: Service
metadata:
  name: myapp-v1-svc
  namespace: ns2
spec:
  selector:
    app: myapp
    version: v1
...

---
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
  namespace: ns2
spec:
  ...
  selector:
    app: myapp

---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: myapp-ab-testing
  namespace: ns2
spec:
  host: myapp-svc
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp-vs
  namespace: ns2
spec:
  hosts:
  - myapp.anahidapps.xyz
  http:
  - route:
    - destination:
        host: myapp-svc # Route traffic to the Pods that match the labels defined in the DestinationRule v1 subset
        subset: v1
      weight: 45
    - destination:
        host: myapp-svc
        subset: v2
      weight: 55


Testing completed state:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: myapp-v1
  namespace: ns2
spec:
  selector:
    matchLabels:
      app: myapp
      version: v1
  ...
          
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
  namespace: ns2
spec:
  ...
  selector:
    app: myapp
    version: v2
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-vs
  namespace: ns2
spec:
  gateways:
  - myapp-gw
  hosts:
  - myapp.anahidapps.xyz
  http:
  - match:
    - port: 80
    route:
    - destination:
        host: myapp-svc
        port:
          number: 80

By adopting this A/B testing methodology we can implement deployment strategies such as Blue-Green and Canary.
However, manually implementing this for a large suite of micro-services with an agile release cycle is not very practical. That's why tools like Flagger exist to automate the process.
Flagger automates the canary deployment process; see its documentation to understand how the diagram below is implemented.


At a high level, this is what Flagger does:

image source: Flagger Introduction

Note: Canary or Blue-Green deployment capabilities are not limited to Istio service mesh. Most modern ingress controllers (eg: Contour) nowadays have capabilities that can be leveraged to implement these deployment strategies.

The Mesh:

In my opinion, a service mesh at its base level "is a dedicated layer for facilitating service-to-service communications among the micro-services (of an application)", and that layer is the mesh. The mesh is the core component, yet probably the simplest component to understand, of a service mesh technology (eg: Istio).

To explain the above, let's look at the crude diagram below:



There's a lot to unpack here:
  • There are 3 K8s clusters used here to strategically place micro-services where they make sense. For example:
    • Services handling user requests (eg: user login, load balancer etc) are placed in AWS.
    • The warehouse management system (and hence the data store) exists in GCP, so the services needing inventory data are placed in a K8s cluster in GCP for close proximity.
    • For PCI-DSS compliance, services handling and processing payment data (eg: payment profile, pre-authorisation token etc) are placed in a private cloud.
  • Tanzu Service Mesh (and its Global Namespace capability) is used to create a service mesh (using Istio) across the 3 K8s clusters for the services. The external services for Payment Service Providers are included in the mesh.
  • I have chosen to use service mesh here for 
    • service discovery, 
    • circuit breaking, 
    • security
    • SLOs of the services and
    • telemetry for day2 ops
Now let's dive into the mesh features used here.

mTLS (aka mesh security):

This is an obvious one, and an out-of-the-box feature of the mesh. Whenever workloads or pods are placed in the mesh (namespace label: istio-injection=enabled), Istio auto-injects the envoy-proxy sidecar, which encrypts and decrypts outgoing and incoming traffic among the workloads, offloading this from the application, with Istio itself acting as the certificate authority.
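The injection label referred to above is set on the namespace itself. A minimal sketch (the namespace name is hypothetical):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payment-system        # hypothetical namespace
  labels:
    istio-injection: enabled  # tells Istio to auto-inject the Envoy sidecar into new pods here
```

Note that the label only affects pods created after it is applied; existing pods must be restarted to pick up the sidecar.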



In this setup I used mTLS mode STRICT as the mesh-wide default and, where needed, mode PERMISSIVE. For example, the Warehouse system is not part of the mesh, yet it needs to access the Inventory system. Below are the Istio definitions for the mTLS settings described.

---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system # note: applying to istio-system makes it mesh wide default settings
spec:
  mtls:
    mode: STRICT          
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: inventory-system-permissive
  namespace: inventory-space # note: scoped to this namespace and, via the selector below, to the inventory-system workloads only
spec:
  selector:
    matchLabels:
      app: inventory-system
  mtls:
    mode: PERMISSIVE


Service Discovery:

This is an out-of-the-box feature of the mesh. For my apps, I just declared the services and the mesh took care of creating VirtualService entries for them across namespaces and clusters (a Tanzu Service Mesh capability). Thus, whenever I want to call the process-order service, I just call it by its service name (following a naming convention) like http://process-order/{orderobj}.
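In practice, "just declaring the service" means nothing more than a standard Kubernetes Service; the mesh discovers it automatically. A sketch with hypothetical names and ports:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: process-order       # other services in the mesh can call http://process-order/...
  namespace: order-space    # hypothetical namespace
spec:
  selector:
    app: process-order
  ports:
  - name: http              # the port-name prefix (http, grpc, tcp...) hints the protocol to Istio
    port: 80
    targetPort: 8080
```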

Circuit Breaker:

I configured a circuit breaker to add an extra layer of application resiliency by reducing the impact of cascading failures.
I added the below definition, leveraging DestinationRule, to implement circuit breaker functionality for the necessary services. For example, the external PSPs are outside my application's domain, so I added a circuit breaker rule matching the SLO of the external PSPs.

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: paypal-circuit-breaker
  namespace: payment-system
spec:
  host: paypal.apis.com
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 5
      http:
        http1MaxPendingRequests: 2
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutiveErrors: 2
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100



Service Entry:

The Payment Service Providers are external to my application, but I wanted to treat them as part of my mesh for reasons such as service discovery, circuit breaking, telemetry etc. The ServiceEntry definition below achieves this:

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: paypal
spec:
  hosts:
  - paypal.apis.com
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: DNS

Service Mesh for Mono to Micro transformation projects: 

As you can see, the ServiceEntry component can generally be used to connect any available service into the mesh. You can take advantage of this to bring legacy services into the mesh: as you componentise the monolithic legacy application into micro-services, the new micro-services can keep connecting to the "yet-to-be-converted" services in the monolith. Not bad :).
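As a sketch of that pattern, a legacy service still running on a VM could be brought into the mesh with a ServiceEntry like the one below (the hostname, port, and IP address are all hypothetical):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: legacy-billing
spec:
  hosts:
  - legacy-billing.internal   # hypothetical internal name for the monolith's billing module
  location: MESH_EXTERNAL     # the workload itself has no sidecar
  ports:
  - number: 8080
    name: http
    protocol: HTTP
  resolution: STATIC          # no DNS for this host; use the fixed endpoint below
  endpoints:
  - address: 10.0.0.25        # hypothetical VM address
```

Mesh workloads can then call legacy-billing.internal:8080 like any other mesh service, and traffic policies such as circuit breakers can be layered on via a DestinationRule.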

Note: I am intentionally leaving out the telemetry part of Service Mesh as it is neither ingress nor mesh and I covered it in my previous post.

Conclusion:

After understanding these 2 core aspects (the ingress and the mesh) of a service mesh, implementing it for my application (comprised of several micro-services) was easy. I think these are the reasons why:
  • I used a commercial enterprise product called Tanzu Service Mesh
    • This made rolling out Istio super simple -- just onboard a cluster. The rest is taken care of.
    • It has a "thing" called Global Namespace, with which I was able to establish a mesh across multiple clusters without breaking a sweat. Without Global Namespace I probably wouldn't have chosen service mesh for my application, because my app's micro-services needed to be spread across multiple clusters, and implementing a cross-cluster mesh is very difficult with just Istio.
    • Global Namespace also makes isolation and connectivity among meshes (not to be confused with workloads in a mesh) super easy.
  • The Istio.io documentation is comprehensive. I really just followed the documentation to understand service mesh. 
  • Understanding service mesh through an application architecture lens rather than a networking or security lens, and mapping the app's requirements to Istio features, was the key. The below factors were my considerations for my micro-service strategy:
    • How to access the application: Service mesh's Gateway
    • How to release the micro-services: Service mesh's A/B testing functionality
    • How to find and use micro-services: Service mesh's Service discovery 
    • How to secure the application: Service mesh's mTLS, Tanzu Service Mesh's threat detection (eg: PII detection, Data leakage, Code injection, Protocol attack, Mesh isolation etc), Tanzu Service Mesh's access policy etc.
    • How to increase application resiliency: Service Mesh's Circuit Breaker, Tanzu Service Mesh's SLO policies. 
    • How mono-to-micro while keeping the legacy functional: Service mesh's Service Entry
    • To Mesh or Not to Mesh, aka API Gateway vs Service Mesh (more on this is described below).
    • Multi cluster mesh: Tanzu Service Mesh's Global Namespace
    • Installing/Deploying Service mesh: Tanzu Service Mesh's global control plane.
    • Reduce development overhead: Service Mesh's Service Discovery, mTLS
  • To mesh or not to mesh: I weighed the development and operational effort for my micro-services (which were being segmented out from their monolith predecessor) in the context of the Service Mesh pattern vs the API Gateway pattern, and Service Mesh was the winner. Major reasons were:
    • with an API Gateway I would have had to document the APIs and register them. With Service Mesh I could simply access the services.
    • with an API Gateway I would have had to create unnecessary APIs and register them just so I could use them from internal services, which is an anti-pattern. With Service Mesh I stayed on course and created APIs that were meant to be APIs; the rest fell under the service-to-service communication umbrella. At the same time, I eliminated the security concerns as well.
    • with the API Gateway pattern I would have had to version-control and release my services and APIs individually, in parallel. With Service Mesh it was just releasing versions of the workloads.
The features I described in this post are, in my opinion, the main ones and the building blocks to get you started with service mesh, and should cover 70% of the use cases. There are several other Istio functionalities I did not discuss here for brevity. Please see the Istio documentation and Tanzu Service Mesh documentation.

Thanks for reading.
