Service to service communication within GKE cluster

In my last blog, I covered options to access GKE services from the external world. In this blog, I will cover service-to-service communication options within a GKE cluster. Specifically, I will cover the following options:

  • Cluster IP
  • Internal load balancer (ILB)
  • HTTP internal load balancer
  • Istio
  • Traffic Director

In the end, I will also compare these options and suggest which option matches which requirement. For each of the options, I will deploy a helloworld service with 2 versions and then have a client access the hello service. The code, including the manifest files for all the options, is available in my GitHub project here.

Prerequisites

Create a VPC-native GKE cluster with 4 nodes, with the Istio and HttpLoadBalancing add-ons enabled.

gcloud beta container clusters create demo-cluster   --zone us-central1-b   --scopes=https://www.googleapis.com/auth/cloud-platform   --num-nodes=4   --enable-ip-alias   --addons=HttpLoadBalancing,Istio --istio-config=auth=MTLS_PERMISSIVE

The additional scope is needed for the cluster nodes to access the Traffic Director API. This can also be achieved through a dedicated service account instead of broadening the default scopes.
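As a sketch of the service-account alternative (the account name "gke-td-nodes" is assumed for illustration, and in practice the node service account also needs the usual logging and monitoring roles), the nodes can run as a dedicated service account that is granted the required role:

# create a dedicated node service account (name assumed for illustration)
gcloud iam service-accounts create gke-td-nodes --display-name "GKE node SA"

# grant it the network viewer role that Traffic Director needs later
PROJECT=`gcloud config get-value project`
gcloud projects add-iam-policy-binding ${PROJECT} \
  --member serviceAccount:gke-td-nodes@${PROJECT}.iam.gserviceaccount.com \
  --role roles/compute.networkViewer

# create the cluster with that service account instead of the extra scope
gcloud beta container clusters create demo-cluster \
  --zone us-central1-b \
  --num-nodes=4 \
  --enable-ip-alias \
  --addons=HttpLoadBalancing,Istio --istio-config=auth=MTLS_PERMISSIVE \
  --service-account=gke-td-nodes@${PROJECT}.iam.gserviceaccount.com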

Cluster IP

“clusterIP” is the default option for services to talk to each other. Each service exposes a VIP, and kube-dns is used to map the service name to the IP address.

ClusterIP flow

The manifest file consists of 2 deployments for the 2 hello versions and 2 services exposing the 2 versions. There is also a “client” deployment that accesses the “hello” services.
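The exact names and labels are in the repo; a minimal sketch of one version (the “hello” deployment and service, with the selector label assumed) looks roughly like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 2
  selector:
    matchLabels:
      run: hello                                    # label assumed for illustration
  template:
    metadata:
      labels:
        run: hello
    spec:
      containers:
      - name: hello
        image: gcr.io/google-samples/hello-app:1.0  # sample image used in this post (2.0 for hello2)
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  type: ClusterIP
  selector:
    run: hello
  ports:
  - port: 8080          # matches the 8080/TCP seen in the service output below
    targetPort: 8080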

The following command deploys the application with all deployments and services:

kubectl apply -f clusterip

The following output shows the services, deployments and pods:

$ kubectl get services
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
hello        ClusterIP   10.113.9.117    <none>        8080/TCP   107m
hello2       ClusterIP   10.113.14.114   <none>        8080/TCP   107m
kubernetes   ClusterIP   10.113.0.1      <none>        443/TCP    15d
$ kubectl get deployments
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
client   1/1     1            1           108m
hello    2/2     2            2           108m
hello2   2/2     2            2           108m
$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
client-77c749d7f8-x4khg   1/1     Running   0          108m
hello-bc787595d-55925     1/1     Running   0          108m
hello-bc787595d-d5b95     1/1     Running   0          108m
hello2-7494666cc7-k4lj7   1/1     Running   0          108m
hello2-7494666cc7-rnllc   1/1     Running   0          108m

The following output shows the “client” pod accessing the “hello” and “hello2” services:

$ client=$(kubectl get pods -l run=client -o=jsonpath='{.items[0].metadata.name}')

$ kubectl exec $client -- curl -s hello
Hello, world!
Version: 1.0.0
Hostname: hello-bc787595d-fdhnq

$ kubectl exec $client -- curl -s hello2
Hello, world!
Version: 2.0.0
Hostname: hello2-7494666cc7-dv4rj
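Since kube-dns does the name-to-VIP mapping, the resolution can also be checked from the client pod (assuming the client image ships with busybox nslookup):

$ kubectl exec $client -- nslookup hello
# should resolve "hello.default.svc.cluster.local" to the ClusterIP shown above (10.113.9.117)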

Internal load balancer (ILB)

The primary use case for the ILB is for applications residing outside the GKE cluster to access GKE services in the same network. The ILB operates at L4 and is a regional service; global access from outside the region is currently possible as a “beta” option. This is very similar to the GKE network load balancer, the only difference being that the IPs exposed here are internal to the VPC. In the Kubernetes service manifest, we set the type to “LoadBalancer” with an additional annotation stating that the load balancer type is “internal”. This takes care of creating the load balancer as well as setting up the backends.

ILB flow

The following annotation specifies that the load balancer type is “internal”:

annotations:
    cloud.google.com/load-balancer-type: "Internal"
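In context, the “hello” service manifest for the ILB case looks roughly like this (a sketch; the selector label is assumed):

apiVersion: v1
kind: Service
metadata:
  name: hello
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
    # for the beta global access mentioned above, GKE also supports:
    # networking.gke.io/internal-load-balancer-allow-global-access: "true"
spec:
  type: LoadBalancer
  selector:
    run: hello          # label assumed for illustration
  ports:
  - port: 80            # matches the 80:31503/TCP mapping in the output below
    targetPort: 8080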

The following command deploys the application with all deployments and services:

kubectl apply -f ilb

The following output shows the services, deployments and pods:

$ kubectl get services
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
hello        LoadBalancer   10.113.12.203   <pending>     80:31503/TCP   17s
hello2       LoadBalancer   10.113.0.238    <pending>     80:32343/TCP   18s
kubernetes   ClusterIP      10.113.0.1      <none>        443/TCP        16d

$ kubectl get deployments
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
client   1/1     1            1           37s
hello    2/2     2            2           33s
hello2   2/2     2            2           35s

$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
client-77c749d7f8-4sp57   1/1     Running   0          42s
hello-bc787595d-jkk4g     1/1     Running   0          38s
hello-bc787595d-sl2wx     1/1     Running   0          38s
hello2-7494666cc7-cph5m   1/1     Running   0          40s
hello2-7494666cc7-p6jst   1/1     Running   0          40s

The following command shows the 2 internal load balancers that are created for the 2 versions of the “hello” service:

$ gcloud compute forwarding-rules list
NAME                              REGION        IP_ADDRESS      IP_PROTOCOL  TARGET
a8bb317531c3a11eab92442010a80009  us-central1   10.128.0.51     TCP          us-central1/backendServices/a8bb317531c3a11eab92442010a80009
a8c71e8cb1c3a11eab92442010a80009  us-central1   10.128.0.52     TCP          us-central1/backendServices/a8c71e8cb1c3a11eab92442010a80009

The following output shows the “client” pod accessing the “hello” and “hello2” services through the ILBs:

$ client=$(kubectl get pods -l run=client -o=jsonpath='{.items[0].metadata.name}')

$ kubectl exec $client -- curl -s 10.128.0.51
Hello, world!
Version: 2.0.0
Hostname: hello2-7494666cc7-p6jst

$ kubectl exec $client -- curl -s 10.128.0.52
Hello, world!
Version: 1.0.0
Hostname: hello-bc787595d-jkk4g

Istio

Istio provides the control plane for the service mesh and Envoy provides the data plane. Istio provides a lot of features around traffic redirection, telemetry and encryption. The best part of Istio is that these features can be achieved without changing the source application. In this example, we will use Istio to connect the client service with the hello service. When the cluster was created, Istio was enabled as an add-on. The first step is to enable default proxy injection:

Istio flow

kubectl label namespace default istio-injection=enabled --overwrite
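To verify that the label took effect, the namespace labels can be listed:

kubectl get namespace -L istio-injection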

The following command deploys the application with all deployments and services:

kubectl apply -f istio

The following output shows the services, deployments and pods:

$ kubectl get services
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
hello        ClusterIP   10.113.4.179   <none>        80/TCP    11s
kubernetes   ClusterIP   10.113.0.1     <none>        443/TCP   16d

$ kubectl get deployments
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
client   1/1     1            1           16s
hello    2/2     2            2           16s
hello2   2/2     2            2           15s

$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
client-77c749d7f8-lt8n5   2/2     Running   0          23s
hello-b6db8c6bf-lq6cp     2/2     Running   0          23s
hello-b6db8c6bf-xd5q8     2/2     Running   0          23s
hello2-68c4445dfc-d6slt   2/2     Running   0          22s
hello2-68c4445dfc-p87w9   2/2     Running   0          22s

As we can see in the pods output, all the pods have 2 containers: one is the main application container and the other is the Envoy proxy container.

The following output shows the containers of a single pod. There are actually 3 containers: the first is the init container, which sets up some networking constructs and exits; the second is the application container; the third is the proxy container.

$ kubectl describe pod hello-b6db8c6bf-lq6cp | grep -A1 "Container ID"
    Container ID:  docker://d080a0ffb61dc9efe437774c822b7296b8c18c732c8a0609334d100ed7fdd835
    Image:         gke.gcr.io/istio/proxy_init:1.1.16-gke.0
--
    Container ID:   docker://7eab7155a12e7652ad0886a84b9b164b71008a4d64c3ac38b423bd19408f7740
    Image:          gcr.io/google-samples/hello-app:1.0
--
    Container ID:  docker://8461e711321349d02665d8835542a5f56d9be0586b8d737481813657ab755967
    Image:         gke.gcr.io/istio/proxyv2:1.1.16-gke.0

Initially, I could not get this to work, and after struggling for half a day, I found that the root cause was a subtle requirement in Istio about port naming conventions. This requirement is captured here: the service port name should follow the convention “name: <protocol>[-<suffix>]”. I renamed my service port from “hello” to “http-hello” and things started working fine. I learnt this tip from this NEXT19 video on Istio debugging, which is very helpful for debugging Istio issues.
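As an example of the convention, the relevant part of the “hello” service ends up looking something like this (a sketch; the 80 to 8080 mapping matches the outputs in this section):

  ports:
  - name: http-hello    # "<protocol>[-<suffix>]" naming so Istio treats the port as HTTP
    port: 80
    targetPort: 8080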

The following output shows the client accessing the 2 hello versions. The Istio VirtualService rules are set up so that “/v1” lands on version 1, “/v2” lands on version 2, and the default route goes to version 2.
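The actual rules are in the repo; a minimal sketch of what they look like is below, assuming the “hello” service selects the pods of both deployments and the pods carry a “version” label that the subsets can match on:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: hello
spec:
  host: hello
  subsets:
  - name: v1
    labels:
      version: v1       # assumed pod label on the "hello" deployment
  - name: v2
    labels:
      version: v2       # assumed pod label on the "hello2" deployment
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: hello
spec:
  hosts:
  - hello
  http:
  - match:
    - uri:
        prefix: /v1
    route:
    - destination:
        host: hello
        subset: v1
  - match:
    - uri:
        prefix: /v2
    route:
    - destination:
        host: hello
        subset: v2
  - route:                # default route goes to version 2
    - destination:
        host: hello
        subset: v2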

$ client=$(kubectl get pods -l run=client -o=jsonpath='{.items[0].metadata.name}')

$ kubectl exec $client -c client -- curl -s hello/v1
Hello, world!
Version: 1.0.0
Hostname: hello-b6db8c6bf-xd5q8

$ kubectl exec $client -c client -- curl -s hello/v2
Hello, world!
Version: 2.0.0
Hostname: hello2-68c4445dfc-d6slt

$ kubectl exec $client -c client -- curl -s hello
Hello, world!
Version: 2.0.0
Hostname: hello2-68c4445dfc-p87w9

HTTP internal load balancer

The HTTP internal load balancer is a regional L7 load balancer that is implemented underneath using Envoy proxies. This feature is currently in “beta”. The external HTTP load balancer is well integrated with the Kubernetes “Ingress” type, and all the GCP load balancer configurations are created automatically; for the HTTP ILB, the configuration has to be done manually. A proxy subnet needs to be created in every region where the HTTP load balancer is placed.

HTTP ILB flow

The following command deploys the application with all deployments and services:

kubectl apply -f http-ilb

The following annotation in the service file tells GKE to create NEGs (network endpoint groups) associated with the services:

annotations:
    cloud.google.com/neg: '{"exposed_ports":{"80":{}}}'

The following command shows the 2 NEGs that get created. Since each deployment contains 2 replicas, there are 2 endpoints associated with each NEG.

$ gcloud compute network-endpoint-groups list
NAME                                        LOCATION       ENDPOINT_TYPE   SIZE
k8s1-757bee5a-default-hello-80-bdd7955e     us-central1-b  GCE_VM_IP_PORT  2
k8s1-757bee5a-default-hello2-80-a882c49f    us-central1-b  GCE_VM_IP_PORT  2
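To see the individual endpoints (pod IP and port pairs) behind a NEG, the NEG can be inspected directly:

gcloud compute network-endpoint-groups list-network-endpoints \
    k8s1-757bee5a-default-hello-80-bdd7955e --zone=us-central1-b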

Let's store the NEG names in variables that can be used later when we associate the backend services with the NEGs:

$ neg1=k8s1-757bee5a-default-hello-80-bdd7955e
$ neg2=k8s1-757bee5a-default-hello2-80-a882c49f

The following are the steps to create the HTTP ILB:

  • Create the proxy subnet
  • Create firewall rules for communication within the backend subnet as well as from the proxy subnet to the backends
  • Create backend services and the HTTP load balancer components (URL map, target proxy, forwarding rule) and tie them together

Create proxy subnet:

The following command creates the proxy subnet. The cluster is in the “default” subnet of the “default” network in the us-central1 region, and the proxy subnet is created in the same region.

gcloud beta compute networks subnets create proxy-subnet \
  --purpose=INTERNAL_HTTPS_LOAD_BALANCER \
  --role=ACTIVE \
  --region=us-central1 \
  --network=default \
  --range=10.129.0.0/26

Create firewall rules:

The following set of commands creates the necessary firewall rules. One thing that I found out the hard way is that the proxy firewall rule has to allow all ports that the container exposes. In my case, the container exposes port 8080, so I needed to add port 8080 to the proxy firewall rule.

#firewall rule to communicate within the backend (default) subnet
gcloud compute firewall-rules create fw-allow-backend-subnet \
    --network=default \
    --action=allow \
    --direction=ingress \
    --source-ranges=10.128.0.0/20 \
    --rules=tcp,udp,icmp

#allow ssh
gcloud compute firewall-rules create fw-allow-ssh \
    --network=default \
    --action=allow \
    --direction=ingress \
    --target-tags=allow-ssh \
    --rules=tcp:22

#allow health check
gcloud compute firewall-rules create fw-allow-health-check \
    --network=default \
    --action=allow \
    --direction=ingress \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --target-tags=load-balanced-backend \
    --rules=tcp

# proxies to connect to backend
gcloud compute firewall-rules create fw-allow-proxies \
  --network=default \
  --action=allow \
  --direction=ingress \
  --source-ranges=10.129.0.0/26 \
  --target-tags=load-balanced-backend \
  --rules=tcp:80,tcp:443,tcp:8080

Create backend services and load balancer components:

The following set of commands creates the health check and backend services, associates the backend services with the NEGs, and creates the URL map, target HTTP proxy and forwarding rule.

# create health check
gcloud beta compute health-checks create http l7-ilb-gke-basic-check \
--region=us-central1 \
--use-serving-port

# create backend service1
gcloud beta compute backend-services create l7-ilb-gke-backend-service1 \
--load-balancing-scheme=INTERNAL_MANAGED \
--protocol=HTTP \
--health-checks=l7-ilb-gke-basic-check \
--health-checks-region=us-central1 \
--region=us-central1

# create backend service2
gcloud beta compute backend-services create l7-ilb-gke-backend-service2 \
--load-balancing-scheme=INTERNAL_MANAGED \
--protocol=HTTP \
--health-checks=l7-ilb-gke-basic-check \
--health-checks-region=us-central1 \
--region=us-central1

# add neg associated with backend service 1
gcloud beta compute backend-services add-backend l7-ilb-gke-backend-service1 \
 --network-endpoint-group=$neg1 \
 --network-endpoint-group-zone=us-central1-b \
 --region=us-central1 \
 --balancing-mode=RATE \
 --max-rate-per-endpoint=5

# add neg associated with backend service 2
gcloud beta compute backend-services add-backend l7-ilb-gke-backend-service2 \
 --network-endpoint-group=$neg2 \
 --network-endpoint-group-zone=us-central1-b \
 --region=us-central1 \
 --balancing-mode=RATE \
 --max-rate-per-endpoint=5

# create url map for /hello1 and /hello2
gcloud beta compute url-maps create hello-map \
  --default-service=l7-ilb-gke-backend-service1 --region=us-central1

gcloud beta compute url-maps add-path-matcher hello-map \
  --default-service l7-ilb-gke-backend-service1 \
  --path-matcher-name pathmap-port \
  --path-rules=/hello1=l7-ilb-gke-backend-service1,/hello2=l7-ilb-gke-backend-service2 \
  --new-hosts="*" --region=us-central1

# create target proxy
gcloud beta compute target-http-proxies create l7-ilb-gke-proxy \
--url-map=hello-map \
--url-map-region=us-central1 \
--region=us-central1

# create forwarding rule
gcloud beta compute forwarding-rules create l7-ilb-gke-forwarding-rule \
--load-balancing-scheme=INTERNAL_MANAGED \
--network=default \
--subnet=default \
--address=10.128.0.199 \
--ports=80 \
--region=us-central1 \
--target-http-proxy=l7-ilb-gke-proxy \
--target-http-proxy-region=us-central1

The following output shows the services, deployments and pods:

$ kubectl get services
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
hello        ClusterIP   10.113.6.88    <none>        80/TCP    40m
hello2       ClusterIP   10.113.6.139   <none>        80/TCP    40m
kubernetes   ClusterIP   10.113.0.1     <none>        443/TCP   43m
$ kubectl get deployments
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
client   1/1     1            1           40m
hello    2/2     2            2           40m
hello2   2/2     2            2           40m
$ kubectl get pods
NAME                        READY   STATUS        RESTARTS   AGE
client-77c749d7f8-xjqmw     1/1     Running       0          40m
hello-bc787595d-8bcrt       1/1     Running       0          40m
hello-bc787595d-pddsz       1/1     Running       0          40m
hello2-7494666cc7-rm788     1/1     Running       0          40m
hello2-7494666cc7-rqhk6     1/1     Running       0          40m

The following output shows the client accessing the 2 hello versions:

$ client=$(kubectl get pods -l run=client -o=jsonpath='{.items[0].metadata.name}')

$ kubectl exec $client -- curl -s 10.128.0.199
Hello, world!
Version: 1.0.0
Hostname: hello-bc787595d-pddsz

$ kubectl exec $client -- curl -s 10.128.0.199/hello1
Hello, world!
Version: 1.0.0
Hostname: hello-bc787595d-8bcrt

$ kubectl exec $client -- curl -s 10.128.0.199/hello2
Hello, world!
Version: 2.0.0
Hostname: hello2-7494666cc7-rm788

“10.128.0.199” is the IP address in the “default” subnet that was assigned through the forwarding rule. Traffic to this address is handled by Envoy proxies whose addresses come from the proxy subnet, and is then forwarded to the backend services.
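To confirm the address and target that the forwarding rule ended up with, it can be described (using the names and region from the commands above):

gcloud compute forwarding-rules describe l7-ilb-gke-forwarding-rule --region=us-central1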

Traffic Director

In the Istio service mesh world, Pilot, along with Mixer and Citadel, provides the control plane and Envoy provides the data plane. Traffic Director is a GCP-managed Pilot service that programs Envoy using the open standard xDS API. Traffic Director provides global load balancing for internal service-to-service communication.

Traffic Director flow

The following command deploys the application with all deployments and services:

kubectl apply -f traffic-director

The following output shows the NEGs that get created (the services carry the same NEG annotation as in the HTTP ILB case):

$ gcloud compute network-endpoint-groups list
NAME                                        LOCATION       ENDPOINT_TYPE   SIZE
k8s1-757bee5a-default-hello-80-bdd7955e     us-central1-b  GCE_VM_IP_PORT  2
k8s1-757bee5a-default-hello2-80-a882c49f    us-central1-b  GCE_VM_IP_PORT  2

Let's store the NEG names in variables that can be used later when we associate the backend services with the NEGs:

$ neg1=k8s1-757bee5a-default-hello-80-bdd7955e
$ neg2=k8s1-757bee5a-default-hello2-80-a882c49f

The following are the steps to set up Traffic Director:

  • Enable the Traffic Director API
  • Enable the cluster service account to access the Traffic Director API. In my case, this is done by adding the corresponding access scope when creating the cluster.
  • Give the cluster service account the “network viewer” role. Since I have used the default service account, I have added the “roles/compute.networkViewer” role to it. This is not a good practice, though.
  • Create the load balancer components: health check, firewall rules, backend services, NEG associations, HTTP proxy, forwarding rule etc. This step is similar to the HTTP load balancer scenario.
  • Update the client pod with a manually injected Envoy proxy.

Enable the Traffic Director API

The following command enables the Traffic Director API:

gcloud services enable trafficdirector.googleapis.com

Cluster service account additions

The following commands set up the network viewer role. Access to the Traffic Director API was already taken care of as part of creating the cluster.

PROJECT=`gcloud config get-value project`
SERVICE_ACCOUNT_EMAIL=`gcloud iam service-accounts list \
  --format='value(email)' \
  --filter='displayName:Compute Engine default service account'`
gcloud projects add-iam-policy-binding ${PROJECT} \
  --member serviceAccount:${SERVICE_ACCOUNT_EMAIL} \
  --role roles/compute.networkViewer

Creating load balancer components

The following commands create the firewall rule, health check, backend services, URL map, proxy and forwarding rule.

# create firewall rules for health check
gcloud compute firewall-rules create fw-allow-health-checks \
    --network default \
    --action ALLOW \
    --direction INGRESS \
    --source-ranges 35.191.0.0/16,130.211.0.0/22 \
    --rules tcp

# create health check
gcloud compute health-checks create http td-gke-health-check \
    --use-serving-port

# create backend service 1
gcloud compute backend-services create td-gke-service \
    --global \
    --health-checks td-gke-health-check \
    --load-balancing-scheme INTERNAL_SELF_MANAGED

# add NEG to backend service 1
gcloud compute backend-services add-backend td-gke-service \
    --global \
    --network-endpoint-group $neg1 \
    --network-endpoint-group-zone us-central1-b \
    --balancing-mode RATE \
    --max-rate-per-endpoint 5

# create backend service 2
gcloud compute backend-services create td-gke-service1 \
    --global \
    --health-checks td-gke-health-check \
    --load-balancing-scheme INTERNAL_SELF_MANAGED

# add NEG to backend service 2
gcloud compute backend-services add-backend td-gke-service1 \
    --global \
    --network-endpoint-group $neg2 \
    --network-endpoint-group-zone us-central1-b \
    --balancing-mode RATE \
    --max-rate-per-endpoint 5

# url map
gcloud compute url-maps create td-gke-url-map --default-service td-gke-service

# path matcher
gcloud compute url-maps add-path-matcher td-gke-url-map \
    --default-service td-gke-service \
    --path-matcher-name pathmap-port \
    --path-rules=/hello1=td-gke-service,/hello2=td-gke-service1 \
    --new-hosts="hello"

# create proxy
gcloud compute target-http-proxies create td-gke-proxy \
   --url-map td-gke-url-map

# create forwarding rule
gcloud compute forwarding-rules create td-gke-forwarding-rule \
  --global \
  --load-balancing-scheme=INTERNAL_SELF_MANAGED \
  --address=0.0.0.0 \
  --target-http-proxy=td-gke-proxy \
  --ports 80 --network default

Setting the “address” field of the forwarding rule to 0.0.0.0 says that the actual destination address does not matter; the URL map (with host “hello”) is used to forward requests to the right backend service.
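Because the VIP is 0.0.0.0, the Envoy sidecar matches outgoing requests on the Host header and port rather than on a destination IP. As an illustration (not from the original post), a request to an arbitrary IP on port 80 with the host set to “hello” should be routed the same way by the mesh:

/ # curl -s -H "Host: hello" 10.0.0.1/hello1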

Proxy injection

Currently, the proxy injection has to be done manually on the client pod. The following output shows the 3 containers that are part of the client pod: the init container sets up the networking redirection and exits, and the other 2 containers are the app container and the Envoy proxy container.

$ kubectl describe pods $client | grep -i -A1 "container id"
    Container ID:  docker://958051684b134d4e606f18620dcc426830bc3fd805cd1ab34580b6e389a35e58
    Image:         docker.io/istio/proxy_init:1.2.4
--
    Container ID:  docker://71a0fd151546fc08fdac1195d468ae091880d8dc445d1fc8589f3a19602c69bc
    Image:         byrnedo/alpine-curl
--
    Container ID:  docker://00db2ae069083a8ab04f3206c5cc4486e84b92595c97d152704d76db46ee6cc6
    Image:         docker.io/istio/proxyv2:1.2.4
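A quicker way to list just the images running in the client pod (a convenience check, not part of the original outputs):

$ kubectl get pod $client -o jsonpath='{.spec.initContainers[*].image} {.spec.containers[*].image}'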

The following output shows the client accessing the 2 hello versions, from a shell inside the client container:

$ client=$(kubectl get pods -l run=client -o=jsonpath='{.items[0].metadata.name}')

/ # curl hello/hello1
Hello, world!
Version: 1.0.0
Hostname: hello-bc787595d-s5dt2

/ # curl hello/hello2
Hello, world!
Version: 2.0.0
Hostname: hello2-7494666cc7-wj2ft

How to decide?

Now that we have covered different approaches for Kubernetes services to talk to each other within a GKE cluster, let's talk about how to decide the right approach for a specific use case.

The following table shows a comparison between the different options:

Option                        Layer  Scope                Notes
Cluster IP                    L4     Within the cluster   Default option; per-service VIP resolved by kube-dns
Internal load balancer (ILB)  L4     VPC, regional        Mainly for clients outside the cluster in the same network; global access in beta
HTTP internal load balancer   L7     VPC, regional        Envoy-based, beta; NEGs, backend services, URL map and proxy configured manually
Istio                         L7     Within the cluster   Traffic management, telemetry and encryption without changing the application
Traffic Director              L7     Global               GCP-managed control plane programming Envoy over xDS; manual proxy injection for now
