I wrote about the key concepts of service mesh and how to evaluate the requirements for one in my previous post here: Deciphering the hype of Service Mesh. This post is a follow-up from there covering the technical aspects.
Part 1: Deciphering the hype of Service Mesh
Part 2: Understanding The Ingress and The Mesh components of Service Mesh.
Part 3: Understanding the observability component of Service Mesh (TBD in another post).
Almost all popular service mesh technologies/tools (eg: Istio, Linkerd) have both ingress and mesh capabilities. Conceptually, I see them as two mutually exclusive domains (integrated nicely by the underlying tool). Understanding the ingress and the mesh components individually, such as what they offer and what I can do with them, was the basic building block of my understanding of service mesh technology as a whole. This is arguably the most misrepresented topic on the internet. So I thought I would share my point of view.
Note: The observability component of a Service Mesh is not described in this post, but I will attempt to cover it in a "part 3".
- The Ingress (based on Istio)
- The Mesh (based on Istio)
- Conclusion
Note: Although the sample code and technical details I mention here are based on Istio, it may be similar for most OSS Service Mesh technologies as of today.
For the sample code in this post, I used Istio from Tanzu Service Mesh for the implementation of the service mesh and ingress functionalities. Here's a partial, crude diagram of my application:
Let's dive into it.
The Ingress:
The ingress component controls how traffic from outside the cluster enters the mesh. In Istio, this is expressed with two objects: a Gateway and one or more VirtualServices.
Gateway:
The Gateway object configures the pre-deployed ingress proxy, declaring which ports, protocols and hosts it should accept traffic for. The definition below accepts plain HTTP traffic on port 80 for test.anahidapps.xyz:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-frontend-gw
spec:
  selector:
    istio: ingressgateway # --> selecting the preconfigured proxy deployment istio-ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - test.anahidapps.xyz
If the use case requires serving the application over HTTPS, the definition below terminates TLS at the gateway using the certificate stored in the my-cert Kubernetes secret:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-frontend-gw
spec:
  selector:
    istio: ingressgateway # --> selecting the preconfigured proxy deployment istio-ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - test.anahidapps.xyz
    tls:
      mode: SIMPLE
      credentialName: my-cert
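If the use case instead insists on handling TLS at the application level, so that encrypted traffic is sent to the workload untouched, Istio's Gateway supports a PASSTHROUGH TLS mode. A minimal sketch (the name my-frontend-gw-passthrough is mine; the matching VirtualService would then use a tls route matching on SNI rather than an http route):
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-frontend-gw-passthrough
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: tls
      protocol: TLS
    hosts:
    - test.anahidapps.xyz
    tls:
      mode: PASSTHROUGH # --> the gateway does not terminate TLS; the workload does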
Virtual Services:
A Gateway only lets traffic in; a VirtualService binds to the gateway and routes that traffic to the actual workloads. The definition below routes traffic for test.anahidapps.xyz arriving on port 443 to the myapp service:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-vs
spec:
  gateways:
  - my-frontend-gw
  hosts:
  - test.anahidapps.xyz
  http:
  - match:
    - port: 443
    route:
    - destination:
        host: myapp
        port:
          number: 443
Path-based routing works the same way; here /app1 and /app2 are routed to two different services:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp-vs
spec:
  gateways:
  - my-frontend-gw
  hosts:
  - test.anahidapps.xyz
  http:
  - match:
    - uri:
        prefix: /app1
    route:
    - destination:
        host: myapp1
  - match:
    - uri:
        prefix: /app2
    route:
    - destination:
        host: myapp2
Requests can also be matched on headers. For example, the snippet below (added to the http section above) sends any request carrying the header test: true to a separate myapp2-test service:
- match:
  - headers:
      test:
        exact: "true"
  route:
  - destination:
      host: myapp2-test
Destination Rule:
A DestinationRule groups a service's pods into named subsets (eg: by version label), which a VirtualService can then route to with weights. Here is how we used it to A/B test two versions of myapp:
- The initial state: v1 of the myapp application (name: myapp-v1, labels: app=myapp, version: v1) was deployed and exposed via the myapp-svc K8s service. The Istio VirtualService myapp-vs was pointing to the myapp-svc K8s service.
- We deployed the myapp-v2 deployment for v2 of the myapp application (name: myapp-v2, labels: app=myapp, version: v2), exposed via the myapp-v2-svc K8s service selecting the v2 pods (eg: spec.selector: app=myapp & version=v2).
- In order to perform A/B testing between v1 and v2 of myapp (see the manifests after this list):
- we deployed the myapp-v1-svc K8s service selecting the v1 pods (eg: spec.selector: app=myapp & version=v1).
- we modified the myapp-svc K8s service to remove the version selector, so the new spec.selector became app=myapp (instead of spec.selector: app=myapp & version=v1). Thus myapp-svc became a placeholder service to work with Istio's VirtualService. This is important.
- we deployed Istio's DestinationRule definition/object and configured the existing VirtualService myapp-vs with the traffic distribution for A/B testing accordingly (eg: 45% to v1 and 55% to v2).
- We gradually added more weight (traffic) to v2 of myapp until all traffic was going to v2. We kept v1 dormant for some time in case we needed to roll back.
- Finally, once we were satisfied with the A/B testing results, we
- modified the myapp-svc K8s service again to select v2 of myapp (eg: spec.selector: app=myapp & version=v2).
- deleted the myapp-v1-svc and myapp-v2-svc K8s services and the myapp-v1 deployment.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: myapp
    version: v1
  name: myappv1
  namespace: ns2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: v1
  template:
    metadata:
      labels:
        app: myapp
        version: v1
    spec:
      containers:
      - name: myapp
        image: my.repo.io/workload-myapp:1.0
        ...
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
  namespace: ns2
spec:
  ports:
  - name: http-web
    port: 80
    protocol: TCP
    targetPort: 3030
  selector:
    app: myapp
    version: v1
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-vs
  namespace: ns2
spec:
  gateways:
  - myapp-gw
  hosts:
  - myapp.anahidapps.xyz
  http:
  - match:
    - port: 80
    route:
    - destination:
        host: myapp-svc
        port:
          number: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: myappv2
  namespace: ns2
spec:
  template:
    metadata:
      labels:
        app: myapp
        version: v2
  ...
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-v2-svc
  namespace: ns2
spec:
  selector:
    app: myapp
    version: v2
  ...
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-v1-svc
  namespace: ns2
spec:
  selector:
    app: myapp
    version: v1
  ...
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
  namespace: ns2
spec:
  ...
  selector:
    app: myapp # --> version selector removed; myapp-svc is now the placeholder service
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: myapp-ab-testing
  namespace: ns2
spec:
  host: myapp-svc
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp-vs
  namespace: ns2
spec:
  hosts:
  - myapp.anahidapps.xyz
  http:
  - route:
    - destination:
        host: myapp-svc # Route traffic to the Pods that match the labels defined in the DestinationRule v1 subset
        subset: v1
      weight: 45
    - destination:
        host: myapp-svc
        subset: v2
      weight: 55
---
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  name: myappv2
  namespace: ns2
spec:
  selector:
    matchLabels:
      app: myapp
      version: v2
  ...
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
  namespace: ns2
spec:
  ...
  selector:
    app: myapp
    version: v2
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-vs
  namespace: ns2
spec:
  gateways:
  - myapp-gw
  hosts:
  - myapp.anahidapps.xyz
  http:
  - match:
    - port: 80
    route:
    - destination:
        host: myapp-svc
        port:
          number: 80
The Mesh:
- There are 3 K8s clusters being used here to strategically place micro-services where it makes sense. For example:
- Services handling user requests (eg: user login, load balancer etc) are placed in AWS.
- The warehouse management system (and hence, the data store) exists in GCP, so the services needing inventory data are placed in a K8s cluster in GCP for close proximity.
- For PCI-DSS compliance, services handling and processing payment data (eg: payment profile, pre-authorisation token etc) are placed in a private cloud.
- Tanzu Service Mesh (and its Global Namespace capability) is used to create a service mesh (using Istio) across the 3 K8s clusters for the services. The external services for Payment Service Providers are included in the mesh.
- I chose to use a service mesh here for:
- service discovery,
- circuit breaking,
- security,
- SLOs of the services, and
- telemetry for day-2 ops.
mTLS (aka mesh security):
Istio's PeerAuthentication object controls mTLS between workloads. Deployed in the istio-system root namespace with STRICT mode, it becomes the mesh-wide default:
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system # note: applying to istio-system makes it mesh wide default settings
spec:
  mtls:
    mode: STRICT
Individual workloads can override the mesh-wide default. For example, the inventory-system workload accepts both mTLS and plain-text traffic via PERMISSIVE mode:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: inventory-system-permissive
  namespace: inventory-space
spec:
  selector:
    matchLabels:
      app: inventory-system # note: applies only to pods with this label, overriding the mesh-wide default
  mtls:
    mode: PERMISSIVE
Service Discovery:
Within the mesh, services discover each other simply by name: the control plane pushes the mesh's service registry to every sidecar, and with Tanzu Service Mesh's Global Namespace that registry spans all three clusters. A service in the AWS cluster can call the inventory service in the GCP cluster by its service name, with no client-side discovery code or separate registry (eg: Eureka, Consul) to maintain.
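As a minimal sketch of what this buys the developer (the names checkout, INVENTORY_URL and the Global Namespace domain acme.gns are hypothetical, not from my actual app): the consuming workload just reads a plain URL, and the sidecar resolves and routes it, wherever the target happens to run.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
  namespace: order-space
spec:
  ...
  template:
    spec:
      containers:
      - name: checkout
        env:
        - name: INVENTORY_URL # the app only knows a URL; no discovery client library needed
          value: http://inventory-system.acme.gns # GNS-provided name, resolvable across clusters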
Circuit Breaker:
The DestinationRule below applies a circuit breaker to calls leaving the payment services for the external PayPal API:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: paypal-circuit-breaker
  namespace: payment-system
spec:
  host: paypal.apis.com
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 5 # cap concurrent TCP connections to the host
      http:
        http1MaxPendingRequests: 2 # queue at most 2 pending HTTP requests
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutiveErrors: 2 # eject the host after 2 consecutive errors
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100
With this in place, the sidecars stop sending traffic to paypal.apis.com for at least 3 minutes once it returns 2 consecutive errors, failing fast instead of piling up requests.
Service Entry:
Hosts outside the cluster are unknown to the mesh by default. The ServiceEntry below adds the external PayPal API to Istio's internal service registry, which is what allows the circuit breaker above (and the mesh's telemetry) to apply to it:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: paypal
spec:
  hosts:
  - paypal.apis.com
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: DNS
Service Mesh for Mono to Micro transformation projects:
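The conclusion below maps this scenario to the Service Entry, so here is a minimal sketch of the idea (the host legacy-erp.mycorp.internal and the service inventory-system are hypothetical): while strangling a monolith, a ServiceEntry brings the not-yet-migrated legacy system into the mesh, and a weighted VirtualService gradually shifts callers from the legacy endpoint to the new micro-service without the callers changing anything.
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: legacy-erp
spec:
  hosts:
  - legacy-erp.mycorp.internal # hypothetical legacy monolith endpoint outside the cluster
  location: MESH_EXTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: inventory-migration
spec:
  hosts:
  - legacy-erp.mycorp.internal # callers keep using the legacy name
  http:
  - route:
    - destination:
        host: legacy-erp.mycorp.internal # monolith still serves most traffic
      weight: 80
    - destination:
        host: inventory-system # the carved-out micro-service takes a growing share
      weight: 20
As the new service proves itself, the weights move toward it until the monolith endpoint can be retired.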
Conclusion:
- I used a commercial enterprise product called Tanzu Service Mesh
- This made rolling out Istio super simple -- just onboard a cluster. The rest is taken care of.
- It has a "thing" called Global Namespace --- with this I was able to establish a mesh across multiple clusters without breaking a sweat. Without this Global Namespace I probably wouldn't have chosen service mesh for my application, because the micro-services of my app needed to be spread across multiple clusters, and implementing a cross-cluster mesh is veeeryyy difficult with just Istio.
- The Global Namespace also makes isolation and connectivity among meshes (not to be confused with workloads in a mesh) super easy.
- The Istio.io documentation is comprehensive. I really just followed the documentation to understand service mesh.
- Understanding service mesh through an application architecture lens rather than a networking or security lens, and mapping the app's requirements to Istio features, was the key. The factors below were my considerations for my micro-service strategy:
- How to access the application: Service mesh's Gateway
- How to release the micro-services: Service mesh's A/B testing functionality
- How to find and use micro-services: Service mesh's Service discovery
- How to secure the application: Service mesh's mTLS, Tanzu Service Mesh's threat detection (eg: PII detection, Data leakage, Code injection, Protocol attack, Mesh isolation etc), Tanzu Service Mesh's access policy etc.
- How to increase application resiliency: Service Mesh's Circuit Breaker, Tanzu Service Mesh's SLO policies.
- How to go mono-to-micro while keeping the legacy system functional: Service mesh's Service Entry
- To Mesh or Not to Mesh, aka API Gateway vs Service Mesh (more on this below).
- Multi cluster mesh: Tanzu Service Mesh's Global Namespace
- Installing/Deploying Service mesh: Tanzu Service Mesh's global control plane.
- Reduce development overhead: Service Mesh's Service Discovery, mTLS
- To mesh or not to mesh: I weighed the development and operational effort for my micro-services (which were being carved out of their monolith predecessor) in the context of the Service Mesh pattern vs the API Gateway pattern, and Service Mesh was the winner. The major reasons were:
- with an API Gateway I would have had to document the APIs and register them. With service mesh I could simply access them.
- with an API Gateway I would have had to create unnecessary APIs and register them just so I could use them from internal services. This is an anti-pattern. With Service Mesh I stayed on course and created APIs that were meant to be APIs; the rest fell under the service-to-service communication umbrella. At the same time, I eliminated the security concerns as well.
- with the API Gateway pattern I would have had to version-control and release my services and APIs individually in parallel. With Service Mesh it was just releasing versions of the workloads.