Techgig Kubernetes Webinar

Recently, I did a webinar with Techgig on “Kubernetes design principles, patterns and ecosystem”. In this webinar, I covered the following topics:

  • Key design principles behind Kubernetes
  • Common design patterns with pods
  • Day-2 Kubernetes best practices
  • Kubernetes ecosystem

The following link has the recording of the webinar. I have also added a link to the slides below.

Kubernetes and GKE – Day 2 operations

For many folks working with containers and Kubernetes, the journey begins with trying a few sample container applications and then deploying applications into production on a managed Kubernetes service like GKE. GKE, like any managed Kubernetes service, provides a lot of features and controls, and it is up to the user to leverage them the right way. Based on my experience, best practices are typically not followed, which results in post-production issues. Some parameters cannot be changed after cluster creation, which makes these problems even more difficult to handle. In this blog, I will share a set of resources that cover the best practices around Kubernetes and GKE. If these are evaluated before cluster creation and a proper design is done beforehand, a lot of post-production issues can be prevented.

This link talks about best practices for building containers and this link talks about best practices for operating containers. These will create a strong foundation around containers.

Following collection of links talks about the best practices listed below for Kubernetes. These are useful from both a developer and an operator perspective.

  • Building small container images
  • Organizing with namespaces
  • Using healthchecks
  • Setting up resource limits for containers
  • Handling termination requests gracefully
  • Talking to external services outside Kubernetes
  • Upgrading clusters with zero downtime

Once we understand the best practices around Docker and Kubernetes, we need to understand the best practices around GKE. The following set of links covers these well:

If you are looking for GKE samples to try, this is a good collection of GKE samples. These are useful for playing around with Kubernetes without writing a bunch of YAML.

GCP provides Qwiklabs to try out a particular GCP concept/feature in a sandboxed environment. The following Qwiklabs quests around Kubernetes and GKE are very useful for getting hands-on experience. Each quest below has a set of labs associated with that topic.

For folks looking for free Kubernetes books to learn from, I found the following 3 books to be extremely useful. The best part is that they are free to download.

Please feel free to ping me if you find other useful resources around getting your containers to production.

Kubernetes DR

Recently, we had a customer issue where a production GKE cluster was deleted accidentally, which caused an outage until the cluster recovery was completed. Recovering the cluster was not straightforward as the customer did not have any automated backup/restore mechanism, and the presence of stateful workloads complicated this further. I started looking at some of the ways in which a cluster can be restored to a previous state, and this blog is a result of that work.

Following are some of the reasons why we need DR for a Kubernetes cluster:

  • Cluster is deleted accidentally.
  • Cluster master has gotten into a bad state. Having redundant masters would avoid this problem.
  • Need to move from one cluster type to another. For example, a GKE legacy network to VPC-native network migration.
  • Move to a different Kubernetes distribution. This can include moving from on-prem to cloud.

The focus of this blog is on cold DR, not on having multiple clusters working together to provide high availability. I will talk about multiple clusters and hot DR in a later blog.

There are 4 kinds of data to back up in a Kubernetes cluster:

  1. Cluster configuration. These are parameters like node configuration, networking and security constructs for the cluster etc.
  2. Common kubernetes configurations. Examples are namespaces, rbac policies, pod security policies, quotas, etc.
  3. Application manifests. This is based on the specific application that is getting deployed to the cluster.
  4. Stateful configurations. These are the persistent volumes that are attached to pods.

For item 1, we can use any infrastructure automation tool like Terraform or, in the case of GCP, Deployment Manager. The focus of this blog is on items 2, 3 and 4.

Following are some of the options possible for 2, 3 and 4:

  1. Use a Kubernetes backup tool like Velero. This takes care of backing up both Kubernetes resources as well as persistent volumes. It covers items 2, 3 and 4, so it’s pretty complete from a feature perspective. Velero is covered in detail in this blog.
  2. Use the GCP “Config sync” feature. This can cover 2 and 3. This approach is more aligned with the Kubernetes declarative model, and config sync tries to recreate the cluster state from the stored manifest files. The config sync approach is covered in detail in this blog.
  3. Use a CI/CD pipeline. This can cover 2 and 3. A CI/CD pipeline typically does a whole bunch of other things, so it is a roundabout way to do DR. An alternative could be to create a separate DR pipeline in CI/CD.
  4. The Kubernetes volume snapshot and restore feature was introduced in beta in the 1.17 release. This is targeted towards item 4. It will get integrated into Kubernetes distributions soon. This approach uses the Kubernetes API itself to do the volume snapshot and restore (see the sketch after this list).
  5. A manual approach can be taken to backup and restore snapshots as described here. This is targeted towards item 4. The example described here for GCP talks about using a cloud provider tool to take a volume snapshot, create a disk from the snapshot, and then manually create a PV and attach the disk to it. The Kubernetes deployment can then use the new PV.
  6. Use a backup and restore tool like Stash. This is targeted towards item 4. Stash is a pretty comprehensive tool to back up Kubernetes stateful resources. It provides a Kubernetes operator on top of restic, along with add-ons to back up common stateful databases like Postgres, MySQL, Mongo, etc.
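
As a rough illustration of option 4, snapshotting a PVC and restoring it through the Kubernetes API looks something like the sketch below. This assumes a CSI driver with snapshot support is installed; the VolumeSnapshotClass name is an assumption for this example.

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: mysql-snapshot
  namespace: myapp
spec:
  volumeSnapshotClassName: csi-gce-pd-snapshot-class   # assumed class name
  source:
    persistentVolumeClaimName: mysql-volumeclaim
---
# restore: a new PVC that uses the snapshot as its data source
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-volumeclaim-restored
  namespace: myapp
spec:
  dataSource:
    name: mysql-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi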

I will focus on Velero and Config sync in this blog.

Following is the structure of the content below. The examples were tried on a GKE cluster.

Velero

Velero was previously known as Heptio Ark. Velero provides the following functionality:

  • Manual as well as periodic backups can be scheduled (see the sketch after this list). Velero can back up and restore both Kubernetes resources and persistent volumes.
  • It integrates natively with Amazon EBS volumes, Azure Managed Disks and Google Persistent Disks using plugins. For some storage systems like Portworx, there is a community-supported provider. Velero also integrates with the Restic open source project, which allows integration with any provider. This link provides the complete list of supported providers.
  • It can handle the snapshot consistency problem by providing pre and post hooks to flush the data before a snapshot is taken.
  • Backups can be done for the complete cluster or for part of the cluster, for example at the individual namespace level.
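
For example, a periodic backup of a namespace can be declared as a Velero Schedule resource. The sketch below is an illustration rather than something taken from the setup in this blog; the schedule name, cron expression and retention period are assumptions.

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: myapp-daily            # assumed name
  namespace: velero
spec:
  schedule: "0 2 * * *"        # run daily at 02:00
  template:
    includedNamespaces:
    - myapp
    ttl: 720h0m0s              # keep backups for 30 days

The equivalent CLI form would be something like "velero schedule create myapp-daily --schedule='0 2 * * *' --include-namespaces myapp --ttl 720h".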

Velero follows a client-server model. The server needs to be installed in the GKE cluster; the client can be installed as a standalone binary. Following are the installation steps for the server:

  1. Create a storage bucket
  2. Create a service account. The service account needs to have enough permissions to create snapshots and also needs access to the storage bucket
  3. Install the Velero server

Velero Installation

For the client component, I installed it on my Mac using brew.

brew install velero

For the server component, I followed the steps here.

Create GKE cluster

gcloud beta container --project "sreemakam-test" clusters create "prodcluster" --zone "us-central1-c" --enable-ip-alias

Create storage bucket

BUCKET=sreemakam-test-velero-backup
gsutil mb gs://$BUCKET/

Create service account with right permissions

gcloud iam service-accounts create velero \
    --display-name "Velero service account"
SERVICE_ACCOUNT_EMAIL=$(gcloud iam service-accounts list \
  --filter="displayName:Velero service account" \
  --format 'value(email)')
ROLE_PERMISSIONS=(
    compute.disks.get
    compute.disks.create
    compute.disks.createSnapshot
    compute.snapshots.get
    compute.snapshots.create
    compute.snapshots.useReadOnly
    compute.snapshots.delete
    compute.zones.get
)
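
# Note: PROJECT_ID is assumed to be set beforehand, e.g. PROJECT_ID=$(gcloud config get-value project)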

gcloud iam roles create velero.server \
    --project $PROJECT_ID \
    --title "Velero Server" \
    --permissions "$(IFS=","; echo "${ROLE_PERMISSIONS[*]}")"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$SERVICE_ACCOUNT_EMAIL \
    --role projects/$PROJECT_ID/roles/velero.server

Download service account locally
This service account is needed for the installation of Velero server.

gcloud iam service-accounts keys create credentials-velero \
    --iam-account $SERVICE_ACCOUNT_EMAIL

Set appropriate bucket permission

gsutil iam ch serviceAccount:$SERVICE_ACCOUNT_EMAIL:objectAdmin gs://${BUCKET}

Install Velero server
This has to be done after setting the right Kubernetes context.

velero install \
    --provider gcp \
    --plugins velero/velero-plugin-for-gcp:v1.0.1 \
    --bucket $BUCKET \
    --secret-file ./credentials-velero

After this, we can check that Velero is successfully installed:

$ velero version
Client:
	Version: v1.3.1
	Git commit: -
Server:
	Version: v1.3.1

Install application

To test the backup and restore feature, I installed 2 Kubernetes applications: the first is a stateless Go-based hello application and the second is a stateful WordPress application. I have forked the GKE examples repository and made some changes for this use case.

Install hello application

kubectl create ns myapp
kubectl apply -f hello-app/manifests -n myapp

Install wordpress application

First, we need to create the MySQL secret and then we can apply the k8s manifests:

SQL_PASSWORD=$(openssl rand -base64 18)
kubectl create secret generic mysql -n myapp \
    --from-literal password=$SQL_PASSWORD

kubectl apply -f wordpress-persistent-disks -n myapp

This application has 2 stateful resources, one persistent disk for MySQL and another for WordPress. To validate the backup, open the WordPress page, complete the basic installation and create a test blog. This can be validated as part of the restore.

Resources created

$ kubectl get secrets -n myapp
NAME                  TYPE                                  DATA   AGE
default-token-cghvt   kubernetes.io/service-account-token   3      22h
mysql                 Opaque                                1      22h

$ kubectl get deployments -n myapp
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
helloweb    1/1     1            1           22h
mysql       1/1     1            1           21h
wordpress   1/1     1            1           21h

$ kubectl get services -n myapp
NAME               TYPE           CLUSTER-IP    EXTERNAL-IP      PORT(S)          AGE
helloweb           LoadBalancer   10.44.6.6     34.68.231.47     80:31198/TCP     22h
helloweb-backend   NodePort       10.44.4.178   <none>           8080:31221/TCP   22h
mysql              ClusterIP      10.44.15.55   <none>           3306/TCP         21h
wordpress          LoadBalancer   10.44.2.154   35.232.197.168   80:31095/TCP     21h

$ kubectl get ingress -n myapp
NAME       HOSTS   ADDRESS        PORTS   AGE
helloweb   *       34.96.67.172   80      22h

$ kubectl get pvc -n myapp
NAME                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mysql-volumeclaim       Bound    pvc-3ebf86a0-8162-11ea-9370-42010a800047   200Gi      RWO            standard       21h
wordpress-volumeclaim   Bound    pvc-4017a2ab-8162-11ea-9370-42010a800047   200Gi      RWO            standard       21h

$ kubectl get pv -n myapp
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                         STORAGECLASS   REASON   AGE
pvc-3ebf86a0-8162-11ea-9370-42010a800047   200Gi      RWO            Delete           Bound    myapp/mysql-volumeclaim       standard                21h
pvc-4017a2ab-8162-11ea-9370-42010a800047   200Gi      RWO            Delete           Bound    myapp/wordpress-volumeclaim   standard

Backup kubernetes cluster

The backup can be done at the complete cluster level or for individual namespaces. I will create a namespace backup now.

$ velero backup create myapp-ns-backup --include-namespaces myapp
Backup request "myapp-ns-backup" submitted successfully.
Run `velero backup describe myapp-ns-backup` or `velero backup logs myapp-ns-backup` for more details.

We can use commands like “velero backup describe”, “velero backup logs” and “velero get backup” to check the status of the backup. The following output shows that the backup is completed.

$ velero get backup
NAME              STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
myapp-ns-backup   Completed   2020-04-19 14:35:51 +0530 IST   29d       default            <none>

Let’s look at the snapshots created in GCP.

$ gcloud compute snapshots list
NAME                                                             DISK_SIZE_GB  SRC_DISK                                                                             STATUS
gke-prodcluster-96c83f-pvc-7a72c7dc-74ff-4301-b64d-0551b7d98db3  200           us-central1-c/disks/gke-prodcluster-96c83f-pvc-3ebf86a0-8162-11ea-9370-42010a800047  READY
gke-prodcluster-96c83f-pvc-c9c93573-666b-44d8-98d9-129ecc9ace50  200           us-central1-c/disks/gke-prodcluster-96c83f-pvc-4017a2ab-8162-11ea-9370-42010a800047  READY

Let’s look at the contents of velero storage bucket:

gsutil ls gs://sreemakam-test-velero-backup/backups/
gs://sreemakam-test-velero-backup/backups/myapp-ns-backup/

When creating snapshots, it is necessary that the snapshots are taken in a consistent state even when writes are in flight. Velero achieves this with backup hooks and a sidecar container: the backup hook freezes the filesystem while the backup is running and unfreezes it after the backup is completed.
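
As an illustration, backup hooks are configured as annotations on the pod template of the workload being backed up. The sketch below is an assumption for this setup (the sidecar container name and mount path are made up for the example); the annotation keys are the standard Velero pre/post hook annotations.

# annotations on the pod template of the mysql deployment (sketch)
metadata:
  annotations:
    pre.hook.backup.velero.io/container: fsfreeze
    pre.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--freeze", "/var/lib/mysql"]'
    post.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--unfreeze", "/var/lib/mysql"]'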

Restore Kubernetes cluster

For this example, we will create a new cluster and restore the contents of namespace “myapp” to this cluster. We expect that both the kubernetes manifests as well as persistent volumes are restored.

Create new cluster

$ gcloud beta container --project "sreemakam-test" clusters create "prodcluster-backup" --zone "us-central1-c" --enable-ip-alias

Install Velero

velero install \
    --provider gcp \
    --plugins velero/velero-plugin-for-gcp:v1.0.1 \
    --bucket $BUCKET \
    --secret-file ./credentials-velero \
    --restore-only

I noticed a bug: even though we did the installation with the “restore-only” flag, the storage bucket is mounted as read-write. Ideally, it should be “read-only” so that both clusters don’t write to the same backup location.

$ velero backup-location get
NAME      PROVIDER   BUCKET/PREFIX                  ACCESS MODE
default   gcp        sreemakam-test-velero-backup   ReadWrite

Let’s look at the backups available in this bucket:

$ velero get backup
NAME              STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
myapp-ns-backup   Completed   2020-04-19 14:35:51 +0530 IST   29d       default            <none>

Now, let’s restore this backup in the current cluster. This cluster is new and does not have any kubernetes manifests or PVs.

$ velero restore create --from-backup myapp-ns-backup
Restore request "myapp-ns-backup-20200419151242" submitted successfully.
Run `velero restore describe myapp-ns-backup-20200419151242` or `velero restore logs myapp-ns-backup-20200419151242` for more details.

Let’s make sure that the restore is completed successfully:

$ velero restore get
NAME                             BACKUP            STATUS      WARNINGS   ERRORS   CREATED                         SELECTOR
myapp-ns-backup-20200419151242   myapp-ns-backup   Completed   1          0        2020-04-19 15:12:44 +0530 IST   <none>

The restore command above creates all the manifests including namespaces, deployments and services. It also creates the PVs and attaches them to the appropriate pods.

Let’s look at some of the resources created:

$ kubectl get services -n myapp
NAME               TYPE           CLUSTER-IP     EXTERNAL-IP       PORT(S)          AGE
helloweb           LoadBalancer   10.95.13.226   162.222.177.146   80:30693/TCP     95s
helloweb-backend   NodePort       10.95.13.175   <none>            8080:31555/TCP   95s
mysql              ClusterIP      10.95.13.129   <none>            3306/TCP         95s
wordpress          LoadBalancer   10.95.7.154    34.70.240.159     80:30127/TCP     95s

$ kubectl get pv -n myapp
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                         STORAGECLASS   REASON   AGE
pvc-3ebf86a0-8162-11ea-9370-42010a800047   200Gi      RWO            Delete           Bound    myapp/mysql-volumeclaim       standard                5m18s
pvc-4017a2ab-8162-11ea-9370-42010a800047   200Gi      RWO            Delete           Bound    myapp/wordpress-volumeclaim   standard                5m17s

We can access the “hello” application service and “wordpress”, and they should work fine. We can also check that the test blog created earlier was restored correctly.

Config sync

GCP’s “Config sync” feature provides GitOps functionality for Kubernetes manifests. “Config sync” is installed as a Kubernetes operator. When the Config sync operator is installed in a Kubernetes cluster, it points to a repository that holds the Kubernetes manifests, and the operator makes sure that the state of the cluster reflects what is described in the repository. Any change to the local cluster or to the repository triggers a reconfiguration of the cluster to sync from the repository. “Config sync” is a subset of the Anthos Config Management (ACM) feature, and it can be used without an Anthos license. ACM provides uniform configuration and security policies across multiple Kubernetes clusters. In addition to config sync functionality, ACM also includes a policy controller piece that is based on the open source Gatekeeper project.

Config sync feature can be used for 2 purposes:

  1. Maintain security policies of a k8s cluster. The application manifests can be maintained through a CI/CD system.
  2. Maintain all k8s manifests including security policies and application manifests. This approach allows us to restore cluster configuration in a DR scenario. The application manifests can still be maintained through a CI/CD system, but using CI/CD for DR might be time consuming.

In this example, we will use “Config sync” for DR purposes. Following are the components of “Config sync” feature:

  1. “nomos” CLI to manage configuration sync. It is possible that this can be integrated with kubectl later.
  2. “Config sync” operator installed in the kubernetes cluster.

Following are the features that “Config sync” provides:

  1. Config sync works with GCP CSR (Cloud Source Repositories), Bitbucket, GitHub and GitLab
  2. With namespace inheritance, common configs can be put in an abstract namespace that applies to multiple namespaces. This is useful if we want to share some Kubernetes manifests across multiple clusters.
  3. Configs for specific clusters can be specified using a cluster selector
  4. Default sync period is 15 seconds and it can be changed.

The repository follows the structure below. The example shows a sample repo with the following folders:

  • cluster -> cluster-scoped resources like quotas, RBAC, security policies, etc.
  • clusterregistry -> policies specific to each cluster
  • namespaces -> application manifests under each namespace
  • system -> operator-related configs

Repository structure (from this link)

Following are the steps we will go through:

  1. Install “nomos” CLI
  2. Check in configs to a repository. We will use GitHub for this example.
  3. Create a GKE cluster and make the current user a cluster admin (see the command after this list). To access a private git repo, we can set up Kubernetes secrets; for this example, we will use a public repository.
  4. Install the config management CRD in the cluster
  5. Check the nomos status using “nomos status” to validate that the cluster has synced to the repository.
  6. Apply Kubernetes configuration changes to the repo as well as to the cluster and check that the sync feature is working. This step is optional.
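
For step 3, granting the current user cluster admin rights is typically done with a command like the one below (a sketch; for a private repo, a git credentials secret would also be needed, which is not required here since the repo is public):

kubectl create clusterrolebinding cluster-admin-binding \
  --clusterrole cluster-admin \
  --user $(gcloud config get-value account)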

Installation

I installed nomos on my Mac using the steps mentioned here.

Following commands download and install the operator in the GKE cluster. I have used the same cluster created in the “Velero” example.

gsutil cp gs://config-management-release/released/latest/config-sync-operator.yaml config-sync-operator.yaml
kubectl apply -f config-sync-operator.yaml

To verify that “Config sync” is running correctly, please check the following output. We should see that the pod is running successfully.

$ kubectl -n kube-system get pods | grep config-management
config-management-operator-5d4864869d-mfrd6                 1/1     Running   0          60s

Let’s check the nomos status now. As we can see below, we have not setup the repo sync yet.

$ nomos status --contexts gke_sreemakam-test_us-central1-c_prodcluster
Connecting to clusters...
Failed to retrieve syncBranch for "gke_sreemakam-test_us-central1-c_prodcluster": configmanagements.configmanagement.gke.io "config-management" not found
Failed to retrieve repos for "gke_sreemakam-test_us-central1-c_prodcluster": the server could not find the requested resource (get repos.configmanagement.gke.io)
Current   Context                                        Status           Last Synced Token   Sync Branch
-------   -------                                        ------           -----------------   -----------
*         gke_sreemakam-test_us-central1-c_prodcluster   NOT CONFIGURED                          


Config Management Errors:
gke_sreemakam-test_us-central1-c_prodcluster   ConfigManagement resource is missing

Repository

I have used the repository here and plan to sync up the “gohellorepo” folder. Following is the structure of the “gohellorepo” folder.

$ tree .
.
├── namespaces
│   └── go-hello
│       ├── helloweb-deployment.yaml
│       ├── helloweb-ingress.yaml
│       ├── helloweb-service.yaml
│       └── namespace.yaml
└── system
    └── repo.yaml

The repository describes a namespace “go-hello” and the “go-hello” directory contains kubernetes manifests for a go application.

The repository also has a “config-management.yaml” file that describes the repo that we want to sync to. Following is the content of the file:

# config-management.yaml

apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  # clusterName is required and must be unique among all managed clusters
  clusterName: my-cluster
  git:
    syncRepo: https://github.com/smakam/csp-config-management.git
    syncBranch: 1.0.0
    secretType: none
    policyDir: "gohellorepo"

As we can see, we want to sync to the “gohellorepo” folder in the git repo “https://github.com/smakam/csp-config-management.git”

Syncing to the repo

Following command syncs the cluster to the github repository:

kubectl apply -f config-management.yaml

Now, we can look at “nomos status” to check if the sync is successful. As we can see from “SYNCED” status, the sync is successful.

$ nomos status --contexts gke_sreemakam-test_us-central1-c_prodcluster
Connecting to clusters...
Current   Context                                        Status           Last Synced Token   Sync Branch
-------   -------                                        ------           -----------------   -----------
*         gke_sreemakam-test_us-central1-c_prodcluster   SYNCED           020ab642            1.0.0 

Let’s look at the kubernetes resources to make sure that the sync is successful. As we can see below, the namespace and the appropriate resources got created in the namespace “go-hello”

$ kubectl get ns
NAME                       STATUS   AGE
config-management-system   Active   23m
default                    Active   27h
go-hello                   Active   6m26s

$ kubectl get services -n go-hello
NAME               TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
helloweb           LoadBalancer   10.44.6.41     34.69.102.8   80:32509/TCP     3m25s
helloweb-backend   NodePort       10.44.13.224   <none>        8080:31731/TCP   3m25s

$ kubectl get deployments -n go-hello
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
helloweb   1/1     1            1           7m

As a next step, we can make changes to the repo and check that the changes are pushed to the cluster. If we make any manual changes to the cluster, the config sync operator will check with the repo and push the repository configuration back to the cluster. For example, if we delete the namespace “go-hello” manually in the cluster, we will see that after 30 seconds or so, the namespace configuration is pushed back and the namespace is recreated in the cluster.

References

Service to service communication within GKE cluster

In my last blog, I covered options to access GKE services from the external world. In this blog, I will cover service-to-service communication options within a GKE cluster. Specifically, I will cover the following options:

  • Cluster IP
  • Internal load balancer(ILB)
  • Http internal load balancer
  • Istio
  • Traffic director

In the end, I will also compare these options and suggest which option matches which requirement. For each of the options, I will deploy a helloworld service with 2 versions and then have a client access the hello service. The code, which includes the manifest files for all the options, is available in my github project here.

Pre-requisites

Create a VPC-native GKE cluster with 4 nodes, with the Istio and HTTP load balancing add-ons enabled.

gcloud beta container clusters create demo-cluster   --zone us-central1-b   --scopes=https://www.googleapis.com/auth/cloud-platform   --num-nodes=4   --enable-ip-alias   --addons=HttpLoadBalancing,Istio --istio-config=auth=MTLS_PERMISSIVE

The additional scope is needed for the cluster to access the traffic director API. This can also be achieved through a service account.

Cluster IP

“clusterIP” is the default option for services to talk to each other. Each service exposes a VIP, and kube-dns is used to map the service name to an IP address.

ClusterIP flow

The manifest file consists of 2 deployments for the 2 hello versions and 2 services exposing them. There is also a “client” deployment that accesses the “hello” services.
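
As a rough sketch, one of the “hello” Services looks something like the manifest below (the actual manifests are in the linked repo; the selector label here is an assumption):

apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  type: ClusterIP
  selector:
    app: hello          # assumed label on the hello v1 pods
  ports:
  - port: 8080
    targetPort: 8080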

Following command deploys the application with all deployments and services:

kubectl apply -f clusterip

Following output shows the services, deployments and pods:

$ kubectl get services
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
hello        ClusterIP   10.113.9.117    <none>        8080/TCP   107m
hello2       ClusterIP   10.113.14.114   <none>        8080/TCP   107m
kubernetes   ClusterIP   10.113.0.1      <none>        443/TCP    15d
$ kubectl get deployments
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
client   1/1     1            1           108m
hello    2/2     2            2           108m
hello2   2/2     2            2           108m
$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
client-77c749d7f8-x4khg   1/1     Running   0          108m
hello-bc787595d-55925     1/1     Running   0          108m
hello-bc787595d-d5b95     1/1     Running   0          108m
hello2-7494666cc7-k4lj7   1/1     Running   0          108m
hello2-7494666cc7-rnllc   1/1     Running   0          108m

Following output shows the “client” pod accessing the “hello” and “hello2” services:

$ client=$(kubectl get pods -l run=client -o=jsonpath='{.items[0].metadata.name}')

$ kubectl exec $client -- curl -s hello
Hello, world!
Version: 1.0.0
Hostname: hello-bc787595d-fdhnq

$ kubectl exec $client -- curl -s hello2
Hello, world!
Version: 2.0.0
Hostname: hello2-7494666cc7-dv4rj

Internal load balancer(ILB)

The primary use case for ILB is for applications residing outside the GKE cluster to access GKE services in the same network. ILB operates at L4 and is a regional service. Global access from outside the region is currently possible as a “beta” option. This is very similar to the GKE network load balancer, the only difference being that the IPs exposed here are internal to the VPC. In the Kubernetes service manifest, we set the type to “LoadBalancer” with an additional annotation stating that the load balancer type is “internal”. This takes care of creating the load balancer as well as setting up the backends.

ILB flow

Following annotation specifies that the load balancer type is “internal”:

annotations:
    cloud.google.com/load-balancer-type: "Internal"
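
Put together, an internal LoadBalancer Service looks roughly like the sketch below (the selector label is an assumption; the actual manifest is in the linked repo):

apiVersion: v1
kind: Service
metadata:
  name: hello
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: hello          # assumed label
  ports:
  - port: 80
    targetPort: 8080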

Following command deploys the application with all deployments and services:

kubectl apply -f ilb

Following output shows the services, deployments and pods:

$ kubectl get services
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
hello        LoadBalancer   10.113.12.203   <pending>     80:31503/TCP   17s
hello2       LoadBalancer   10.113.0.238    <pending>     80:32343/TCP   18s
kubernetes   ClusterIP      10.113.0.1      <none>        443/TCP        16d

$ kubectl get deployments
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
client   1/1     1            1           37s
hello    2/2     2            2           33s
hello2   2/2     2            2           35s

$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
client-77c749d7f8-4sp57   1/1     Running   0          42s
hello-bc787595d-jkk4g     1/1     Running   0          38s
hello-bc787595d-sl2wx     1/1     Running   0          38s
hello2-7494666cc7-cph5m   1/1     Running   0          40s
hello2-7494666cc7-p6jst   1/1     Running   0          40s

Following command shows the 2 internal load balancers that are created for the 2 versions of “hello” service:

$ gcloud compute forwarding-rules list
NAME                              REGION        IP_ADDRESS      IP_PROTOCOL  TARGET
a8bb317531c3a11eab92442010a80009  us-central1   10.128.0.51     TCP          us-central1/backendServices/a8bb317531c3a11eab92442010a80009
a8c71e8cb1c3a11eab92442010a80009  us-central1   10.128.0.52     TCP          us-central1/backendServices/a8c71e8cb1c3a11eab92442010a80009

Following output shows the “client” pod accessing the “hello” and “hello2” services through the ILB:

$ client=$(kubectl get pods -l run=client -o=jsonpath='{.items[0].metadata.name}')

$ kubectl exec $client -- curl -s 10.128.0.51
Hello, world!
Version: 2.0.0
Hostname: hello2-7494666cc7-p6jst

$ kubectl exec $client -- curl -s 10.128.0.52
Hello, world!
Version: 1.0.0
Hostname: hello-bc787595d-jkk4g

Istio

Istio provides the control plane for the service mesh, and Envoy provides the data plane. Istio provides a lot of features around traffic redirection, telemetry and encryption. The best part of Istio is that these features can be achieved without changing the source application. In this example, we will use Istio to connect the client service with the hello service. When the cluster was created, Istio was enabled as an add-on. The first step is to enable default proxy injection:

Istio flow

kubectl label namespace default istio-injection=enabled --overwrite

Following command deploys the application with all deployments and services:

kubectl apply -f istio

Following output shows the services, deployments and pods:

$ kubectl get services
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
hello        ClusterIP   10.113.4.179   <none>        80/TCP    11s
kubernetes   ClusterIP   10.113.0.1     <none>        443/TCP   16d

$ kubectl get deployments
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
client   1/1     1            1           16s
hello    2/2     2            2           16s
hello2   2/2     2            2           15

$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
client-77c749d7f8-lt8n5   2/2     Running   0          23s
hello-b6db8c6bf-lq6cp     2/2     Running   0          23s
hello-b6db8c6bf-xd5q8     2/2     Running   0          23s
hello2-68c4445dfc-d6slt   2/2     Running   0          22s
hello2-68c4445dfc-p87w9   2/2     Running   0          22s

As we can see in the pods output, all the pods have 2 containers. One of them is the main container and the other is the Envoy proxy container.

Following output shows the containers of a single pod. We can see that there are 3 containers: the first is the init container, which sets up some networking constructs and goes away after that; the second is the application container; and the third is the proxy container.

$ kubectl describe pod hello-b6db8c6bf-lq6cp | grep -A1 "Container ID"
    Container ID:  docker://d080a0ffb61dc9efe437774c822b7296b8c18c732c8a0609334d100ed7fdd835
    Image:         gke.gcr.io/istio/proxy_init:1.1.16-gke.0
--
    Container ID:   docker://7eab7155a12e7652ad0886a84b9b164b71008a4d64c3ac38b423bd19408f7740
    Image:          gcr.io/google-samples/hello-app:1.0
--
    Container ID:  docker://8461e711321349d02665d8835542a5f56d9be0586b8d737481813657ab755967
    Image:         gke.gcr.io/istio/proxyv2:1.1.16-gke.0

Initially, I could not get this to work, and after struggling for half a day, I found that the root cause was a peculiar requirement in Istio about the port naming convention. This requirement is captured here: the service port name should follow the convention “name: <protocol>[-<suffix>]”. I renamed my service port from “hello” to “http-hello” and things started working fine. I learnt this tip from this NEXT19 video on Istio debugging, which is very helpful for debugging Istio issues.
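
For reference, the fix amounts to renaming the service port so that Istio can infer the protocol (a sketch):

ports:
- name: hello        # before: Istio cannot infer the protocol
  port: 80
  targetPort: 8080

ports:
- name: http-hello   # after: the "http" prefix tells Istio the protocol
  port: 80
  targetPort: 8080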

Following output shows the client accessing the 2 hello versions. We have set up the Istio VirtualService rules so that “/v1” lands on version 1, “/v2” lands on version 2, and the default goes to version 2 (see the sketch below).
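
The routing rules are roughly of the following shape (a sketch only; the actual rules are in the linked repo, and the subset labels are assumptions):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: hello
spec:
  host: hello
  subsets:
  - name: v1
    labels:
      version: "1.0.0"   # assumed pod label
  - name: v2
    labels:
      version: "2.0.0"   # assumed pod label
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: hello
spec:
  hosts:
  - hello
  http:
  - match:
    - uri:
        prefix: /v1
    route:
    - destination:
        host: hello
        subset: v1
  - match:
    - uri:
        prefix: /v2
    route:
    - destination:
        host: hello
        subset: v2
  - route:
    - destination:
        host: hello
        subset: v2       # default route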

$ client=$(kubectl get pods -l run=client -o=jsonpath='{.items[0].metadata.name}')

$ kubectl exec $client -c client -- curl -s hello/v1
Hello, world!
Version: 1.0.0
Hostname: hello-b6db8c6bf-xd5q8

$ kubectl exec $client -c client -- curl -s hello/v2
Hello, world!
Version: 2.0.0
Hostname: hello2-68c4445dfc-d6slt

$ kubectl exec $client -c client -- curl -s hello
Hello, world!
Version: 2.0.0
Hostname: hello2-68c4445dfc-p87w9

Http Internal load balancer

The HTTP internal load balancer is a regional L7 load balancer that is implemented underneath using the Envoy proxy. This feature is currently in “beta”. The external HTTP load balancer is integrated well with the Kubernetes “Ingress” type, and all the GCP load balancer configurations are created automatically. In the case of the HTTP ILB, the configuration has to be done manually. A proxy subnet needs to be created in every region where the HTTP load balancer needs to be placed.

HTTP ILB flow

Following command deploys the application with all deployments and services:

kubectl apply -f http-ilb

Following annotation in the service file tells GKE to create NEGs associated with the services:

annotations:
    cloud.google.com/neg: '{"exposed_ports":{"80":{}}}'

Following command shows the 2 NEGs that get created. Since each deployment contains 2 replicas, we see that there are 2 endpoints associated with each NEG.

$ gcloud compute network-endpoint-groups list
NAME                                        LOCATION       ENDPOINT_TYPE   SIZE
k8s1-757bee5a-default-hello-80-bdd7955e     us-central1-b  GCE_VM_IP_PORT  2
k8s1-757bee5a-default-hello2-80-a882c49f    us-central1-b  GCE_VM_IP_PORT  2

Let’s store the NEG names in variables that can be used later when we associate the backend services with the NEGs:

$ neg1=k8s1-757bee5a-default-hello-80-bdd7955e
$ neg2=k8s1-757bee5a-default-hello2-80-a882c49f

Following are the steps to create the HTTP ILB:

  • Create a proxy subnet
  • Create firewall rules for communication within the subnet as well as communication from the proxy subnet to the backend subnet
  • Create backend services and HTTP load balancer components and tie them to the NEGs.

Create proxy subnet:

Following command creates the proxy subnet. My cluster is in the “default” subnet in the us-central1 region, and the proxy subnet should be created in the same region as the cluster. Note that the commands in this section follow the GCP documentation example, which uses the “lb-network” network and the “us-west1” region; adjust the network and region to match your own cluster.

gcloud beta compute networks subnets create proxy-subnet \
  --purpose=INTERNAL_HTTPS_LOAD_BALANCER \
  --role=ACTIVE \
  --region=us-west1 \
  --network=lb-network \
  --range=10.129.0.0/26

Create firewall rules:

Following set of commands creates the necessary firewall rules. One thing that I found out the hard way is that the proxy firewall rule has to allow all the ports that the container exposes. In my case, the container exposes port 8080, so I needed to add that port to the proxy firewall rule.

#firewall rule to communicate within backend subnet
gcloud compute firewall-rules create fw-allow-backend-subnet \
    --network=lb-network \
    --action=allow \
    --direction=ingress \
    --source-ranges=10.1.2.0/24 \
    --rules=tcp,udp,icmp

#allow ssh
gcloud compute firewall-rules create fw-allow-ssh \
    --network=lb-network \
    --action=allow \
    --direction=ingress \
    --target-tags=allow-ssh \
    --rules=tcp:22

#allow health check
gcloud compute firewall-rules create fw-allow-health-check \
    --network=lb-network \
    --action=allow \
    --direction=ingress \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --target-tags=load-balanced-backend \
    --rules=tcp

# proxies to connect to backend
gcloud compute firewall-rules create fw-allow-proxies \
  --network=lb-network \
  --action=allow \
  --direction=ingress \
  --source-ranges=10.129.0.0/26 \
  --target-tags=load-balanced-backend \
  --rules=tcp:80,tcp:443,tcp:8080

Create backend services and load balancer components:

Following set of commands creates the health checks, backend services, HTTP proxies and forwarding rules, and associates the backend services with the NEGs.

#health check create
gcloud beta compute health-checks create http l7-ilb-gke-basic-check \
--region=us-west1 \
--use-serving-port

# create backend service1
gcloud beta compute backend-services create l7-ilb-gke-backend-service1 \
--load-balancing-scheme=INTERNAL_MANAGED \
--protocol=HTTP \
--health-checks=l7-ilb-gke-basic-check \
--health-checks-region=us-west1 \
--region=us-west1

# create backend service2
gcloud beta compute backend-services create l7-ilb-gke-backend-service2 \
--load-balancing-scheme=INTERNAL_MANAGED \
--protocol=HTTP \
--health-checks=l7-ilb-gke-basic-check \
--health-checks-region=us-west1 \
--region=us-west1

# add neg associated with backend service 1
gcloud beta compute backend-services add-backend l7-ilb-gke-backend-service1 \
 --network-endpoint-group=$neg1 \
 --network-endpoint-group-zone=us-west1-b \
 --region=us-west1 \
 --balancing-mode=RATE \
 --max-rate-per-endpoint=5

# add neg associated with backend service 2
gcloud beta compute backend-services add-backend l7-ilb-gke-backend-service2 \
 --network-endpoint-group=$neg2 \
 --network-endpoint-group-zone=us-west1-b \
 --region=us-west1 \
 --balancing-mode=RATE \
 --max-rate-per-endpoint=5

# create url map for /hello1 and /hello2
gcloud beta compute url-maps create hello-map \
  --default-service=l7-ilb-gke-backend-service1 \
  --region=us-west1

gcloud beta compute url-maps add-path-matcher hello-map \
  --default-service l7-ilb-gke-backend-service1 \
  --path-matcher-name pathmap-port \
  --path-rules=/hello1=l7-ilb-gke-backend-service1,/hello2=l7-ilb-gke-backend-service2 \
  --new-hosts="*" --region=us-west1

# create target proxy
gcloud beta compute target-http-proxies create l7-ilb-gke-proxy \
--url-map=hello-map \
--url-map-region=us-west1 \
--region=us-west1

# create forwarding rule
gcloud beta compute forwarding-rules create l7-ilb-gke-forwarding-rule \
--load-balancing-scheme=INTERNAL_MANAGED \
--network=lb-network \
--subnet=backend-subnet \
--address=10.1.2.199 \
--ports=80 \
--region=us-west1 \
--target-http-proxy=l7-ilb-gke-proxy \
--target-http-proxy-region=us-west1

Following set of outputs shows the services, deployments and pods:

$ kubectl get services
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
hello        ClusterIP   10.113.6.88    <none>        80/TCP    40m
hello2       ClusterIP   10.113.6.139   <none>        80/TCP    40m
kubernetes   ClusterIP   10.113.0.1     <none>        443/TCP   43m
$ kubectl get deployments
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
client   1/1     1            1           40m
hello    2/2     2            2           40m
hello2   2/2     2            2           40m
$ kubectl get pods
NAME                        READY   STATUS        RESTARTS   AGE
client-77c749d7f8-xjqmw     1/1     Running       0          40m
hello-bc787595d-8bcrt       1/1     Running       0          40m
hello-bc787595d-pddsz       1/1     Running       0          40m
hello2-7494666cc7-rm788     1/1     Running       0          40m
hello2-7494666cc7-rqhk6     1/1     Running       0          40m

Following output shows the client accessing the 2 hello versions:

$ client=$(kubectl get pods -l run=client -o=jsonpath='{.items[0].metadata.name}')

$ kubectl exec $client -- curl -s 10.128.0.199
Hello, world!
Version: 1.0.0
Hostname: hello-bc787595d-pddsz

$ kubectl exec $client -- curl -s 10.128.0.199/hello1
Hello, world!
Version: 1.0.0
Hostname: hello-bc787595d-8bcrt

$ kubectl exec $client -- curl -s 10.128.0.199/hello2
Hello, world!
Version: 2.0.0
Hostname: hello2-7494666cc7-rm788

“10.128.0.199” is the IP address in the “default” subnet that is created via the forwarding rule. Traffic to it gets intercepted by the proxy IP address in the proxy subnet and forwarded to the backend services.

Traffic director

In the Istio service mesh world, Pilot along with Mixer and Citadel provides the control plane, and Envoy provides the data plane. Traffic director provides a GCP-managed Pilot service that programs Envoy using the open standard xDS API. Traffic director provides global load balancing for internal service-to-service communication.

Traffic Director flow

Following command deploys the application with all deployments and services:

kubectl apply -f traffic-director

Following command shows the NEGs that get created:

$ gcloud compute network-endpoint-groups list
NAME                                        LOCATION       ENDPOINT_TYPE   SIZE
k8s1-757bee5a-default-hello-80-bdd7955e     us-central1-b  GCE_VM_IP_PORT  2
k8s1-757bee5a-default-hello2-80-a882c49f    us-central1-b  GCE_VM_IP_PORT  2

Let’s store the NEG names in variables that can be used later when we associate the backend services with the NEGs:

$ neg1=k8s1-757bee5a-default-hello-80-bdd7955e
$ neg2=k8s1-757bee5a-default-hello2-80-a882c49f

Following are the steps to create Traffic director setup:

  • Enable traffic director api
  • Enable the cluster service account to access traffic director api. In my case, this is done by adding the corresponding access scope when creating the cluster
  • Give the cluster service account the “network viewer” role. Since I have used the default service account, I added the role to it; this is not a good practice though.
  • Create load balancer components that includes creating healthchecks, firewall rules, backend services, associating NEG’s to backend services, http proxies, forwarding rules etc. This step is similar to HTTP load balancer scenario.
  • Update client pod with manual envoy proxy.

Enable traffic director api

Following command enables the traffic director API:

gcloud services enable trafficdirector.googleapis.com

Cluster service account additions

Following commands set up the network viewer role. Access to the traffic director API was taken care of as part of creating the cluster (through the access scope).

PROJECT=`gcloud config get-value project`
SERVICE_ACCOUNT_EMAIL=`gcloud iam service-accounts list \
  --format='value(email)' \
  --filter='displayName:Compute Engine default service account'`
gcloud projects add-iam-policy-binding ${PROJECT} \
  --member serviceAccount:${SERVICE_ACCOUNT_EMAIL} \
  --role roles/compute.networkViewer

Creating load balancer components

Following commands create the firewall rules, health checks, backend services and forwarding rules.

# create firewall rules for health check
gcloud compute firewall-rules create fw-allow-health-checks \
    --network NETWORK_NAME \
    --action ALLOW \
    --direction INGRESS \
    --source-ranges 35.191.0.0/16,130.211.0.0/22 \
    --rules tcp

# create health check
gcloud compute health-checks create http td-gke-health-check \
    --use-serving-port

# create backend service 1
gcloud compute backend-services create td-gke-service \
    --global \
    --health-checks td-gke-health-check \
    --load-balancing-scheme INTERNAL_SELF_MANAGED

# add NEG to backend service 1
gcloud compute backend-services add-backend td-gke-service \
    --global \
    --network-endpoint-group $neg1 \
    --network-endpoint-group-zone us-central1-b \
    --balancing-mode RATE \
    --max-rate-per-endpoint 5

# create backend service 2
gcloud compute backend-services create td-gke-service1 \
    --global \
    --health-checks td-gke-health-check \
    --load-balancing-scheme INTERNAL_SELF_MANAGED

# add NEG to backend service 2
gcloud compute backend-services add-backend td-gke-service1 \
    --global \
    --network-endpoint-group $neg2 \
    --network-endpoint-group-zone us-central1-b \
    --balancing-mode RATE \
    --max-rate-per-endpoint 5

# url map
gcloud compute url-maps create td-gke-url-map --default-service td-gke-service

# path matcher
gcloud compute url-maps add-path-matcher td-gke-url-map \
    --default-service td-gke-service \
    --path-matcher-name pathmap-port \
    --path-rules=/hello1=td-gke-service,/hello2=td-gke-service1 \
    --new-hosts="hello"

# create proxy
gcloud compute target-http-proxies create td-gke-proxy \
   --url-map td-gke-url-map

# create forwarding rule
gcloud compute forwarding-rules create td-gke-forwarding-rule \
  --global \
  --load-balancing-scheme=INTERNAL_SELF_MANAGED \
  --address=0.0.0.0 \
  --target-http-proxy=td-gke-proxy \
  --ports 80 --network default

The “address” field of 0.0.0.0 in the forwarding rule indicates that the actual address does not matter; the URL map is used to forward requests to the right backend service.

Proxy injection

Currently, the proxy injection has to be done manually on the client pod. Following command shows the 3 containers that are part of the client pod: the init container sets up the networking and then exits, and the other 2 containers are the app container and the proxy container.

$ kubectl describe pods $client | grep -i -A1 "container id"
    Container ID:  docker://958051684b134d4e606f18620dcc426830bc3fd805cd1ab34580b6e389a35e58
    Image:         docker.io/istio/proxy_init:1.2.4
--
    Container ID:  docker://71a0fd151546fc08fdac1195d468ae091880d8dc445d1fc8589f3a19602c69bc
    Image:         byrnedo/alpine-curl
--
    Container ID:  docker://00db2ae069083a8ab04f3206c5cc4486e84b92595c97d152704d76db46ee6cc6
    Image:         docker.io/istio/proxyv2:1.2.4

Following output shows the client accessing the 2 hello versions (the curl commands are run from a shell inside the client pod):

$ client=$(kubectl get pods -l run=client -o=jsonpath='{.items[0].metadata.name}')

# curl hello/hello1
Hello, world!
Version: 1.0.0
Hostname: hello-bc787595d-s5dt2

/ # curl hello/hello2
Hello, world!
Version: 2.0.0
Hostname: hello2-7494666cc7-wj2ft

How to decide?

Now that we have covered different approaches for Kubernetes services in a GKE cluster to talk to each other, let’s talk about how to decide the right approach for a specific use case.

Following table shows a comparison between the different options:

References

Ingress into GKE Cluster

In this blog, I will talk about different options for getting traffic from the external world into a GKE cluster. The options described are:

  • Network load balancer(NLB)
  • Http load balancer with ingress
  • Http load balancer with Network endpoint groups(NEG)
  • nginx Ingress controller
  • Istio ingress gateway

For each of the above options, I will deploy a simple helloworld service with 2 versions and show access from the outside world to the 2 versions of the application. The code is available in my github project here. You can clone it and play with it if needed. In the end, I will also discuss choosing the right option based on the requirement.

Pre-requisites

For all the examples below, we need to create a GKE cluster with some basic configuration. Following command sets up a 3-node GKE cluster in VPC-native mode:

gcloud container clusters create demo-cluster --num-nodes=3 --zone=us-central1-b --enable-ip-alias

Following command installs the Istio add-on on the cluster. This is needed for the Istio ingress gateway option. Istio can also be installed as a Helm package.

gcloud beta container clusters update demo-cluster --project sreemakam-demo --zone us-central1-b\
    --update-addons=Istio=ENABLED --istio-config=auth=MTLS_PERMISSIVE

Following command installs the nginx ingress controller using Helm. Before doing this step, we need to install the Helm client and the server component (Tiller).

Install helm:
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller

Install nginx controller:
helm install --name nginx-ingress stable/nginx-ingress --set rbac.create=true --set controller.publishService.enabled=true

Network load balancer(NLB)

NLB works at L4. To expose multiple versions, we need to deploy multiple NLBs. Each NLB exposes a public IP address.
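
A Service of type LoadBalancer is all that is needed to get an NLB on GKE; a sketch of one of the services is below (the selector label is an assumption; the actual manifests are in the linked repo):

apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  type: LoadBalancer
  selector:
    app: hello          # assumed label
  ports:
  - port: 8080
    targetPort: 8080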

Following command deploys the application:

kubectl apply -f nlb

Following command shows the 2 NLBs that are created:

gcloud compute forwarding-rules list
NAME                              REGION        IP_ADDRESS       IP_PROTOCOL  TARGET
a209565130eda11ea90fe42010a80019  us-central1   104.154.216.240  TCP          us-central1/targetPools/a209565130eda11ea90fe42010a80019
a20ec964c0eda11ea90fe42010a80019  us-central1   34.69.21.178     TCP          us-central1/targetPools/a20ec964c0eda11ea90fe42010a80019

Following output shows the services, deployments and pods:

kubectl get services
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP       PORT(S)          AGE
hello        LoadBalancer   10.113.1.198   34.69.21.178      8080:31482/TCP   59m
hello2       LoadBalancer   10.113.3.7     104.154.216.240   8080:31550/TCP   59m
kubernetes   ClusterIP      10.113.0.1     <none>            443/TCP          8d

kubectl get deployments
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
hello    2/2     2            2           59m
hello2   2/2     2            2           59m

kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
hello-bc787595d-4k27t     1/1     Running   0          59m
hello-bc787595d-smvdl     1/1     Running   0          94m
hello2-7494666cc7-sk2sv   1/1     Running   0          59m
hello2-7494666cc7-kl2pq   1/1     Running   0          59m

To access hello v1 of the service, we need to do:

curl 34.69.21.178:8080

To access hello v2 of the service, we need to do:

curl 104.154.216.240:8080

Http load balancer with ingress

The HTTP load balancer works at L7. The ingress object in Kubernetes creates a global HTTP(S) load balancer in GCP.
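
A sketch of what such a fanout ingress might look like is shown below (the actual manifest is in the linked repo; the API version and path patterns are assumptions for illustration):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: fanout-ingress
spec:
  rules:
  - http:
      paths:
      - path: /v1/*
        backend:
          serviceName: hello
          servicePort: 8080
      - path: /v2/*
        backend:
          serviceName: hello2
          servicePort: 8080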

Following command deploys the application:

kubectl apply -f ingress

Following command shows the forwarding rules for HTTP load balancer:

$ gcloud compute forwarding-rules list

NAME                                             REGION        IP_ADDRESS      IP_PROTOCOL  TARGET
k8s-fw-default-fanout-ingress--fcdc22763fbe5b3a                34.102.181.115  TCP          k8s-tp-default-fanout-ingress--fcdc22763fbe5b3a

Following output shows the services, deployments, pods and ingress:

$ kubectl get services
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
hello        NodePort    10.113.1.0      <none>        8080:31555/TCP   11h
hello2       NodePort    10.113.13.246   <none>        8080:30233/TCP   11h
kubernetes   ClusterIP   10.113.0.1      <none>        443/TCP          8d

$ kubectl get deployments
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
hello    2/2     2            2           11h
hello2   2/2     2            2           11h

$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
hello-bc787595d-qjrmr     1/1     Running   0          11h
hello-bc787595d-ra3ar     1/1     Running   0          11h
hello2-7494666cc7-5h588   1/1     Running   0          11h
hello2-7494666cc7-faeaw   1/1     Running   0          11h

$ kubectl get ingress
NAME             HOSTS   ADDRESS          PORTS   AGE
fanout-ingress   *       34.102.181.115   80      20m

To access the hello v1 of the service, we need to do:

curl 34.102.181.115/v1

To access the hello v2 of the service, we need to do:

curl 34.102.181.115/v2

HTTP Load balancer with NEG

Network endpoint groups (NEGs) are groups of network endpoints that can be tied to a load balancer as backends. NEGs are useful for container-native load balancing, where each container can be represented as an endpoint to the load balancer. The advantages of container-native load balancing are better load distribution, removal of an extra hop (which reduces latency) and better health checks. Without container-native load balancing, the load balancer distributes packets to each node, and iptables on each node further distributes the packets to each pod/container, which adds the extra hop.

Following command deploys the application with HTTP load balancer and NEG:

kubectl apply -f ingress-neg

The only additional configuration needed to set up Ingress with NEGs is to add the following annotation to the service manifest file:

annotations:
    cloud.google.com/neg: '{"ingress": true}'

Following command shows the forwarding rules for HTTP load balancer:

$ gcloud compute forwarding-rules list
NAME                                             REGION        IP_ADDRESS      IP_PROTOCOL  TARGET
k8s-fw-default-fanout-ingress--fcdc22763fbe5b3a                34.102.181.115  TCP          k8s-tp-default-fanout-ingress--fcdc22763fbe5b3a

Following output shows the services, deployments, pods and ingress:

$ kubectl get services
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
hello        NodePort    10.113.10.187   <none>        8080:32046/TCP   2m38s
hello2       NodePort    10.113.4.7      <none>        8080:32420/TCP   2m38s
kubernetes   ClusterIP   10.113.0.1      <none>        443/TCP          8d

$ kubectl get deployments
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
hello    2/2     2            2           2m45s
hello2   2/2     2            2           2m46s

$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
hello-bc787595d-64k8f     1/1     Running   0          2m50s
hello-bc787595d-fsw31     1/1     Running   0          2m50s
hello2-7494666cc7-ldvz8   1/1     Running   0          2m51s
hello2-7494666cc7-s314f   1/1     Running   0          2m51s

$ kubectl get ingress
NAME             HOSTS   ADDRESS          PORTS   AGE
fanout-ingress   *       34.102.181.115   80      19m

Following output shows the network endpoint groups and the endpoints associated with them. There are 2 network endpoint groups, one for the hello v1 service and another for the hello v2 service. Each NEG has as its endpoints the 2 pods that are part of the corresponding service.

gcloud compute network-endpoint-groups list
NAME                                        LOCATION       ENDPOINT_TYPE   SIZE
k8s1-fcdc2276-default-hello-8080-ae85b431   us-central1-b  GCE_VM_IP_PORT  2
k8s1-fcdc2276-default-hello2-8080-6ef9b812  us-central1-b  GCE_VM_IP_PORT  2

$ gcloud compute network-endpoint-groups list-network-endpoints --zone us-central1-b k8s1-fcdc2276-default-hello-8080-ae85b431
INSTANCE                                     IP_ADDRESS  PORT
gke-demo-cluster-default-pool-cf9e9717-vx2r  10.48.0.45  8080
gke-demo-cluster-default-pool-cf9e9717-bdfk  10.48.1.29  8080

$ gcloud compute network-endpoint-groups list-network-endpoints --zone us-central1-b k8s1-fcdc2276-default-hello2-8080-6ef9b812
INSTANCE                                     IP_ADDRESS  PORT
gke-demo-cluster-default-pool-cf9e9717-vx2r  10.48.0.44  8080
gke-demo-cluster-default-pool-cf9e9717-g02w  10.48.2.30  8080

To access hello version v1, we need to do:

curl 34.102.181.115/v1

To access hello version v2, we need to do:

curl 34.102.181.115/v2

Istio Ingress gateway

Istio provides service mesh functionality. In addition to providing a mesh between services, Istio also provides an Ingress gateway that takes care of traffic control and routing for traffic entering the mesh from the outside world.

Istio injects an Envoy proxy container into each pod, which takes care of traffic management and routing without the individual applications being aware of it. The injection can be done either automatically or manually. Following command sets up automatic proxy injection in the default namespace:

kubectl label namespace default istio-injection=enabled

I have used 2 approaches to deploy the service using Istio.

Istio Approach 1:

In this approach, I have used a VirtualService to demux the 2 versions of helloworld to separate services. This is not an optimal approach as we cannot leverage the traffic-management advantages of the Istio service mesh; I have included it just for reference.

In the pre-requisites section, I have mentioned the steps to add Istio to the GKE cluster. Following command deploys the application with the Istio ingress gateway:

kubectl apply -f istio
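
For reference, the manifests in the istio directory would contain a Gateway and a VirtualService roughly along the lines of the sketch below. The resource names hello-gateway and hello-vs are assumptions; the hello/hello2 services and port 8080 come from the outputs shown later.

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: hello-gateway        # assumed name
spec:
  selector:
    istio: ingressgateway    # bind to the default Istio ingress gateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: hello-vs             # assumed name
spec:
  hosts:
  - "*"
  gateways:
  - hello-gateway
  http:
  - match:
    - uri:
        prefix: /v1
    route:
    - destination:
        host: hello          # /v1 goes to the hello (v1) service
        port:
          number: 8080
  - match:
    - uri:
        prefix: /v2
    route:
    - destination:
        host: hello2         # /v2 goes to the hello2 (v2) service
        port:
          number: 8080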

The Istio ingress gateway is exposed through a GCP NLB. The NLB provides connectivity from the external world and the Istio ingress gateway takes care of the HTTP routing rules. Following command shows the Istio deployments in the istio-system namespace:

$ kubectl get deployments --namespace istio-system
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
istio-citadel            1/1     1            1           8d
istio-galley             1/1     1            1           8d
istio-ingressgateway     1/1     1            1           8d
istio-pilot              1/1     1            1           8d
istio-policy             1/1     1            1           8d
istio-sidecar-injector   1/1     1            1           8d
istio-telemetry          1/1     1            1           8d
promsd                   1/1     1            1           8d

Following command shows the NLB created by Istio ingress gateway:

$ gcloud compute forwarding-rules list

NAME                              REGION        IP_ADDRESS      IP_PROTOCOL  TARGET
af95f4a780fa711ea9ae142010a80011  us-central1   35.232.71.209   TCP          us-central1/targetPools/af95f4a780fa711ea9ae142010a80011

Following outputs show the services, deployments, pods and ingress:

$ kubectl get services
NAME                            TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE
hello                           NodePort       10.113.3.95     <none>           8080:32133/TCP               5h58m
hello2                          NodePort       10.113.2.177    <none>           8080:31628/TCP               5h58m
kubernetes                      ClusterIP      10.113.0.1      <none>           443/TCP                      9d

$ kubectl get deployments
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
hello    2/2     2            2           5h50m
hello2   2/2     2            2           5h50m

$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
hello-bc787595d-h7h6j     2/2     Running   0          5h50m
hello-bc787595d-sq7z4     2/2     Running   0          5h50m
hello2-7494666cc7-t7bnq   2/2     Running   0          5h50m
hello2-7494666cc7-tx9q2   2/2     Running   0          5h50m

In the above output, each pod has 2 containers: the first is the application container and the second is the Envoy proxy container injected by Istio.

To access the application, do the following:

export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
export GATEWAY_URL=$INGRESS_HOST:$INGRESS_PORT
curl http://$GATEWAY_URL/v1
curl http://$GATEWAY_URL/v2

In my case, I have specified the “hosts” field as “*” in the Gateway and VirtualService, so I did not specify a host when doing the curl. If you have specified a hostname, use curl's -H option to set the Host header when accessing the service.

Istio Approach 2

In this approach, the 2 versions of the hello service are represented as subsets in the VirtualService manifest, and a DestinationRule maps each subset to the right pods in the Kubernetes cluster.

The VirtualService has 2 subsets, one for version v1 and another for version v2. All traffic management configuration happens through the VirtualService. The “host” field in the VirtualService maps to the Kubernetes service name. The VirtualService directs traffic to the right destination, and the DestinationRule maps each subset to the appropriate pod labels and also sets up the load balancing logic.

Following command sets up the application with a VirtualService that implements the following mapping (/v1 -> hello version v1, /v2 -> hello version v2, default -> hello version v2):

kubectl apply -f istio1

Following output shows the routing rules in the VirtualService:

Http:
    Match:
      Uri:
        Prefix:  /v1
    Route:
      Destination:
        Host:    hello
        Subset:  v1
    Match:
      Uri:
        Prefix:  /v2
    Route:
      Destination:
        Host:    hello
        Subset:  v2
    Route:
      Destination:
        Host:    hello
        Subset:  v2
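
For reference, a VirtualService and DestinationRule along the following lines would produce the rules shown above. This is only a sketch: the gateway name and the version labels on the pods are assumptions, and the actual manifests in the istio1 directory may differ.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: hello
spec:
  hosts:
  - "*"
  gateways:
  - hello-gateway            # assumed gateway name
  http:
  - match:
    - uri:
        prefix: /v1
    route:
    - destination:
        host: hello          # "host" maps to the Kubernetes service name
        subset: v1
  - match:
    - uri:
        prefix: /v2
    route:
    - destination:
        host: hello
        subset: v2
  - route:                   # default route
    - destination:
        host: hello
        subset: v2
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: hello
spec:
  host: hello
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN    # load balancing logic set by the destination rule
  subsets:
  - name: v1
    labels:
      version: v1            # assumed pod label for version v1
  - name: v2
    labels:
      version: v2            # assumed pod label for version v2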

Now, let's apply another VirtualService that maps all traffic to hello version v1:

kubectl apply -f istio1/traffic-mgmt/istio-service-v1.yaml

Following output shows the routing rules in the VirtualService:

Http:
    Route:
      Destination:
        Host:    hello
        Subset:  v1

Now, let's apply another VirtualService that maps 20% of the traffic to hello version v1 and 80% to hello version v2:

kubectl apply -f istio1/traffic-mgmt/istio-service-v1-20-v2-80.yaml

Following output shows the routing rules in the VirtualService for the above configuration:

Http:
    Route:
      Destination:
        Host:    hello
        Subset:  v1
      Weight:    20
      Destination:
        Host:    hello
        Subset:  v2
      Weight:    80
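
For reference, the weighted route in istio-service-v1-20-v2-80.yaml would look roughly like the sketch below; the gateway name is an assumption.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: hello
spec:
  hosts:
  - "*"
  gateways:
  - hello-gateway            # assumed gateway name
  http:
  - route:
    - destination:
        host: hello
        subset: v1
      weight: 20             # 20% of the traffic to version v1
    - destination:
        host: hello
        subset: v2
      weight: 80             # 80% of the traffic to version v2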

Istio with GCP http load balancer

By default, the Istio ingress gateway does not use the GCP HTTP load balancer. There are scenarios where the GCP HTTP load balancer is an advantage, such as needing GCP-managed certificates or integrating with Cloud Armor, Cloud CDN, etc. For these cases, it makes more sense to use the GCP global HTTP load balancer and point it to the Istio ingress gateway. The GCP global load balancer then does the HTTPS termination, while service-level routing is done by the Istio ingress gateway. The flow would look something like this:

Istio with GCP load balancer

I referred to this example to get it working, but I was not able to get it working completely. Following are the steps that I used:

  • Modify the default Istio ingress gateway service to use type “NodePort” and NEG.
  • Write a custom health check. The default GCP LB health check probes “/” on port 80, while the Istio readiness check is at a different endpoint (port 15020, path /healthz/ready), so we need a VirtualService to do the remapping. This VirtualService is specified in healthcheck.yaml; a rough sketch of such a VirtualService follows this list.
  • Deploy the Istio Gateway, VirtualService and the Kubernetes ingress, services, deployments and pods.
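
As a rough sketch of the health check remapping, a VirtualService along the following lines can redirect the load balancer's health probes (GET / with a GoogleHC user agent) to the gateway's readiness endpoint on port 15020. This is only an illustration of the idea; the resource name and the gateway it binds to are assumptions, and the actual healthcheck.yaml may be structured differently.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: health-check-remap   # assumed name
spec:
  hosts:
  - "*"
  gateways:
  - hello-gateway            # assumed: the same Gateway that exposes the application on port 80
  http:
  - match:
    - headers:
        user-agent:
          prefix: GoogleHC   # only remap requests from the GCP LB health checker
      method:
        exact: GET
      uri:
        exact: /
    rewrite:
      uri: /healthz/ready
      authority: istio-ingressgateway.istio-system.svc.cluster.local:15020
    route:
    - destination:
        host: istio-ingressgateway.istio-system.svc.cluster.local
        port:
          number: 15020      # Istio readiness (status) port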

Modify Istio ingress controller

Apply the following patch from the istio-http-lb/gateway-patch directory to the Istio ingress gateway service:

kubectl -n istio-system patch svc istio-ingressgateway     --type=json -p="$(cat istio-ingressgateway-patch.json)"     --dry-run=true -o yaml | kubectl apply -f -

Deploy services

kubectl apply -f istio-http-lb

For some reason, the health check was still failing. If someone has got this working, please ping me.

nginx Ingress controller

The Kubernetes Ingress manifest defaults to the GCP global HTTP load balancer in GKE. GKE also supports integration with third-party ingress controllers like nginx through an annotation in the Ingress manifest. The first step is to install the nginx ingress controller in the GKE cluster. As mentioned in the pre-requisites, this can be done using helm.

Following command shows the nginx services running after deploying the nginx controller using helm. nginx-ingress-controller is the controller service, and nginx-ingress-default-backend handles requests that do not match any Ingress rule as well as health checks.

$ kubectl get services
NAME                            TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                      AGE
nginx-ingress-controller        LoadBalancer   10.113.7.70    34.70.249.19   80:31545/TCP,443:31234/TCP   28h
nginx-ingress-default-backend   ClusterIP      10.113.1.89    <none>         80/TCP                       28h

The following annotation in the Ingress manifest selects the nginx ingress controller:

annotations:
    kubernetes.io/ingress.class: nginx
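
For reference, the Ingress manifest in the nginx directory would look roughly like the sketch below. The ingress name, service names and port come from the outputs that follow; the API version may vary with cluster version.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-resource
  annotations:
    kubernetes.io/ingress.class: nginx   # hand this Ingress to the nginx controller instead of the GCP LB
spec:
  rules:
  - http:
      paths:
      - path: /v1
        backend:
          serviceName: hello
          servicePort: 8080
      - path: /v2
        backend:
          serviceName: hello2
          servicePort: 8080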

Following command deploys the application with nginx ingress controller:

kubectl apply -f nginx

Following outputs show the services, deployments, pods and ingress:

$ kubectl get services
NAME                            TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                      AGE
hello                           NodePort       10.113.3.175   <none>         8080:30157/TCP               29h
hello2                          NodePort       10.113.1.11    <none>         8080:30693/TCP               29h
kubernetes                      ClusterIP      10.113.0.1     <none>         443/TCP                      46h
nginx-ingress-controller        LoadBalancer   10.113.7.70    34.70.249.19   80:31545/TCP,443:31234/TCP   28h
nginx-ingress-default-backend   ClusterIP      10.113.1.89    <none>         80/TCP                       28h

$ kubectl get deployments
NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
hello                           2/2     2            2           29h
hello2                          2/2     2            2           29h
nginx-ingress-controller        1/1     1            1           29h
nginx-ingress-default-backend   1/1     1            1           29h

$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
hello-bc787595d-cwm52     1/1     Running   0          94m
hello-bc787595d-smvdl     1/1     Running   0          94m
hello2-7494666cc7-nkl8w   1/1     Running   0          94m
hello2-7494666cc7-sb9hm   1/1     Running   0          94m

$ kubectl get ingress
NAME               HOSTS   ADDRESS        PORTS   AGE
ingress-resource   *       34.70.249.19   80      29h

Following command shows the NLB created by nginx ingress controller:

$ gcloud compute forwarding-rules list
NAME                              REGION        IP_ADDRESS      IP_PROTOCOL  TARGET
aa83c60c4103d11ea9ae142010a80011  us-central1   34.70.249.19    TCP          us-central1/targetPools/aa83c60c4103d11ea9ae142010a80011

To access hello version v1, we need to do:

curl 34.70.249.19/v1

To access hello version v2, we need to do:

curl 34.70.249.19/v2

How to choose?

Now that we have covered different ways to get traffic into the cluster, let's talk about when to use each type. NLB is typically used for L4 services. In the example application above, we deployed 2 NLBs since an NLB cannot route based on URL. The HTTP load balancer with Ingress is used for typical L7 services. Container-native load balancing is built on top of the HTTP load balancer; the HTTP load balancer with NEG provides better distribution and health checks, so it is preferred. If Istio is used as the service mesh, it makes sense to use the Istio ingress gateway, since similar traffic rules can be applied to traffic entering the mesh as well as traffic within the mesh. Typically, Istio is front-ended by a GCP NLB; in cases where we need to front-end it with the GCP HTTP LB, that is also possible as described above. Finally, there are scenarios where a customer is more comfortable with a third-party load balancer like nginx, either because of familiarity or because of the complex routing/rewrite rules that nginx supports.

In a future blog, I will talk about internal communication between services inside a GKE cluster. In that blog, I will discuss ClusterIP, Istio service mesh, Traffic Director and ILB.

References

GKE with VPN – Networking options

While working on a recent hybrid GCP plus on-premise customer architecture, we had a need to connect a GKE cluster running in GCP to a service running on-premise through a VPN. There were a few unique requirements, like needing to expose only a small IP range to on-premise and having full control over the IP addresses exposed. In this blog, I will talk about the different approaches possible from a networking perspective when connecting a GKE cluster to an on-premise service. Following options are covered in this blog:

  • Flexible pod addressing scheme
  • Connecting using NAT service running on VM
  • Using IP masquerading at the GKE node level

I did not explore the Cloud NAT managed service as it works only with private clusters and does not work through VPN. I have used VPC-native clusters as that has become the default networking scheme and is more straightforward than route-based clusters. For more information on VPC-native clusters and IP aliasing, please refer to my earlier blog series here.

Requirements

Following was the high level architecture:

Architecture diagram

The on-premise application exposes its service on a specific TCP port that we needed to access from the pods running in the GKE cluster. We had a need to expose only a few specific GCP IP addresses to on-premise.
For this use case, I have used VPN with dynamic routing. The on-premise firewall needs to be opened up for the source IP addresses that are accessed from GCP. To try this example when you don't have an on-premise network, you can set up 2 VPCs and use one of them to simulate on-premise.

Flexible pod addressing scheme

In VPC-native clusters, there are separate IP address ranges allocated for GKE nodes, pods and services. The node IP addresses are allocated from the VPC subnet range. There are 2 ways to allocate IP addresses to pods and services.

GKE managed secondary address

In this scheme, GKE manages the secondary address ranges. When the cluster is created, GKE automatically creates 2 IP alias ranges, one for the pods and another for the services. The user has a choice to enter the IP address ranges for the pods and services or let GKE pick the address ranges. Following are the default, largest and smallest subnet range sizes for the pods and services:

             Default                    Largest range              Smallest range
Pods         /14 (2^18 pod IPs)         /9 (2^23 pod IPs)          /21 (2^11 pod IPs)
Services     /20 (2^12 service IPs)     /16 (2^16 service IPs)     /27 (2^5 service IPs)

There is another important parameter: the number of pods per node. By default, GKE reserves a /24 block, or 256 IP addresses, per node. Considering IP address reuse as pods get rescheduled and autoscaled, 110 pods share the 256 IP addresses, so the default number of pods per node is 110. This number can be configured by the user.

For example, taking /21 for pods, we can have a total of 2048 pod addresses. Assuming the default of 110 pods per node (a /24 address range per node), we can have a maximum of 2^(24-21) = 8 nodes. This limit is irrespective of the subnet range reserved for the nodes. If we reduce the number of pod IP addresses per node to 64 (a /26 range), then we can have a maximum of 2^(26-21) = 32 nodes.

User managed secondary address

For my use case, GKE managed secondary addresses did not help since the smallest pod IP range allowed is /21, and the customer was not willing to expose a /21 IP range in their on-premise firewall. The customer was willing to provide a /25 or /27 IP range for the pods. We settled on the configuration below:

  • /25 range for the pods, 8 pods per node. A /25 range gives us 128 pod addresses. 8 pods per node need 16 IP addresses (4 bits) per node, which gives us a maximum of 2^(7-4) = 8 nodes in the cluster.
  • /27 range for the services. It was not necessary to expose the service IP range to on-premise, since service IP addresses are relevant only for traffic initiated from on-premise towards the cluster.
  • /27 range for the nodes. Even though we could create 32 nodes with this range, we are limited to 8 nodes because of the first point above.

Following are the steps to create a cluster with user managed secondary address:

  • Create the secondary (alias) IP ranges for pods and services from the VPC section of the console.
  • When creating the cluster, disable the option “automatically create secondary range” and select the pre-created alias IP ranges for the pods and services.
  • Set the maximum number of pods per node to 8.

Following picture shows the 2 ip alias addresses created along with the VPC subnet:

VPC subnet with 2 alias IP ranges

Following picture shows the networking section of cluster creation, where we have specified the primary range and the secondary IP ranges for the pod and service addresses.

Cluster with custom IP ranges for pods and services

Connecting using NAT service running on VM

Rather than exposing individual pod IP addresses to the on-premise service, we can expose a single IP address using a NAT service running in GCP. With this approach, all the pod IP addresses get translated to the single NAT IP address, so we only need to expose the single NAT IP address to the on-premise firewall.

Following picture shows how the architecture would look:

Connecting to on-premise using NAT

Following are the steps needed:

  • Create a NAT instance on Compute Engine. As mentioned earlier, the Cloud NAT managed service could not be used as it is not integrated with VPN. We can either create a standalone NAT instance or an HA NAT instance as described here.
    I used the following command to create the NAT instance:
 gcloud compute instances create nat-gateway --network gcp-vpc \
     --subnet subnet-a \
     --can-ip-forward \
     --zone us-east1-b \
     --image-family debian-9 \
     --image-project debian-cloud \
     --tags nat 
  • Log in to the NAT instance and set up the iptables rules for the NAT:
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
  • Create the GKE cluster with a network tag. I was not able to specify a network tag when creating a GKE cluster from the console; the only way is to use the gcloud CLI. The network tag is needed so that the route entry that forwards to the NAT applies only to the instances that are part of the GKE cluster.
gcloud container clusters create mygcpr-cluster --tags use-nat \
 --zone us-east1-b \
 --network gcp-vpc --subnetwork subnet-a --enable-ip-alias 
  • Create a route entry to forward traffic from the GKE cluster destined to the on-premise service through the NAT gateway. Please make sure that the priority of this route entry supersedes other route entries (note that a lower number means higher priority).
gcloud compute routes create nat-vpn-route1 \
     --network gcp-vpc \
     --destination-range 192.168.0.0/16 \
     --next-hop-instance nat-gateway \
     --next-hop-instance-zone us-east1-b \
     --tags use-nat --priority 50 

To test this, I created a pod on the GKE cluster and pinged an on-premise instance; using tcpdump, I verified that the source IP address of the ping request was not a pod IP but the NAT gateway IP address.

Using masquerading at the GKE node

An alternative to the NAT gateway is to do masquerading at the node level. This translates the pod IP address to the node IP address when packets egress from the GKE node. In this case, only the node IP addresses need to be exposed to on-premise, not the pod IP addresses. A masquerading agent runs on each GKE node to achieve this.

Following are the steps to setup masquerading:

  • The 2 basic requirements for the masquerading agent to run on each node are enabling network policy and having the pod IP addresses outside the RFC 1918 range 10.0.0.0/8. Network policy can be enabled when creating the GKE cluster.
  • By default, masquerading is set up to not masquerade the RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) as well as the link-local range (169.254.0.0/16). This can be overridden with a config file applied at the cluster level.
  • When the config file changes, the agent on each node periodically re-reads it and updates the node's masquerading rules. The resync interval is also configurable.

Following is the config file I used:

nonMasqueradeCIDRs:
resyncInterval: 60s
masqLinkLocal: true

In my case, since I used RFC 1918 addresses for my pods, I wanted those IP addresses to be masqueraded as well. The config file works in a negative direction with respect to specifying IP addresses: ranges listed under nonMasqueradeCIDRs are excluded from masquerading. Since I have not specified any ranges, all RFC 1918 addresses get masqueraded with this configuration. You can add specific ranges that you do not want to be masqueraded.
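
As a hypothetical example, if you wanted traffic destined to 10.0.0.0/8 to keep the pod source IP while everything else gets masqueraded, the config would look like this:

nonMasqueradeCIDRs:
  - 10.0.0.0/8        # traffic to this range keeps the pod source IP address
resyncInterval: 60s
masqLinkLocal: true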

To apply the config, we can use the following kubectl command:

kubectl create configmap ip-masq-agent --from-file config --namespace kube-system

In the customer scenario, we went with user managed secondary addresses and that worked fine. The other 2 options described above would also have worked. We hit a few other issues with GCP VPN working with a Cisco ASA, which we were eventually able to overcome; more details in a different blog…

References

Migrate for Anthos

Anthos is a hybrid/multi-cloud platform from GCP. Anthos allows customers to build their application once and run it in GCP or in any other private or public cloud. Anthos unifies the control, management and data planes when running a container-based application across on-premise and multiple clouds. Anthos was launched at last year's NEXT18 conference and was made generally available recently. VMware integration is available now; integration with other clouds is planned in the roadmap. One of the components of Anthos is called “Migrate for Anthos”, which allows direct migration of VMs into containers running on GKE. This blog will focus on “Migrate for Anthos”. I will cover the need for “Migrate for Anthos”, the platform architecture, and the migration of a simple application from a GCP VM into a GKE container. Please note that “Migrate for Anthos” is in BETA now and is not ready for production.

Need for “Migrate for Anthos”

Modern application development typically uses microservices and containers to improve the application's agility. Containers, Docker and Kubernetes provide the benefits of agility and portability to applications. It is easier to build a greenfield application using microservices and containers, but what should we do with applications that already exist as monoliths? Enterprises typically spend a lot of effort modernizing their applications, which can be a long journey for many of them. What if we had an automatic way to convert VMs to containers? Does this sound like magic? Yes, “Migrate for Anthos” (earlier called V2K) does quite a bit of magic underneath to automatically convert VMs to containers.

Following diagram shows the different approaches that enterprises take in their modernization and cloud journey. The X-axis shows classic and cloud-native applications; the Y-axis shows on-prem and cloud.

Picture borrowed from “Migrate for Anthos” presentations to customers

Migrate and Modernize:
In this approach, we first do a lift and shift of the VMs to the cloud and then modernize the application into containers. Velostrata is GCP's tool for lift-and-shift VM migration.

Modernize and Migrate:
In this approach, we first modernize the application on-prem and then migrate the modernized application to the cloud. If the on-prem application is modernized using Docker and Kubernetes, then it can be migrated easily to GKE.

Migrate for Anthos:
Both the above approaches are 2-step approaches. With “Migrate for Anthos”, migration and modernization happen in the same step. The modernization is not fully complete in this approach: even though the VM is migrated to containers, the monolith application is not broken down into microservices.

You might be wondering why migrate to containers if the monolith application is not converted to microservices. There are some basic advantages to containerizing the monolith application, including portability, better packing and integration with other container services like Istio. As a next step, the monolith container application can be broken down into microservices. There are roadmap items in “Migrate for Anthos” that will facilitate this.

For some legacy applications, it might not make sense to break them down into microservices, and they can live as a single monolithic container for a long time using this approach. In a typical VM environment, we need to worry about patching, security, networking, monitoring, logging and other infrastructure components, which come out of the box with GKE and Kubernetes after migrating to containers. This is another advantage of “Migrate for Anthos”.

“Migrate for Anthos” Architecture

“Migrate for Anthos” converts the source VMs to system containers running in GKE. System containers, when compared to application containers, run multiple processes and applications in a single container. Initial support for “Migrate for Anthos” is available for VMware VMs or GCE VMs as the source. Following changes are done to convert a VM to a container:

  • The VM operating system is converted into a kernel supported by GKE.
  • VM system disks are mounted inside the container using a persistent volume (PV) and a StatefulSet.
  • Networking, logging and monitoring use GKE constructs.
  • Applications that run inside the VM using systemd scripts run in the container user space.
  • During the initial migration phase, storage is streamed to the container using CSI. The storage can then be migrated to any storage class supported by GKE.

Following are the components of “Migrate for Anthos”:

  • “Migrate for Compute Engine” (formerly Velostrata) – the Velostrata team has enhanced the VM migration tool to also convert VMs to containers and then do the migration. The fundamentals of Velostrata, including its agentless and streaming technologies, remain the same for “Migrate for Anthos”. The Velostrata manager and cloud extensions need to be installed in the GCP environment to do the migration. Because Velostrata uses streaming technology, the complete VM storage does not need to be migrated before the container can run in GKE, which speeds up the entire migration process.
  • GKE cluster – “Migrate for Anthos” runs in the GKE cluster as application containers and can be installed from the GKE marketplace.
  • Source VM – the source VM can be in GCE or in a VMware environment. In a VMware environment, a “Migrate for Anthos” component needs to be installed in VMware as well.

Following picture shows the different components in the VM and how they will look when migrated.

Picture borrowed from “Migrate for Anthos” presentations to customers

The second column in the picture shows what exists currently when the VM is migrated to a GKE container. The only option currently is to scale vertically when capacity is reached. The yellow components leverage Kubernetes and the green components run inside containers. The third column in the picture shows how the future would look, where we can have multiple containers with horizontal pod autoscaling.

“Migrate for Anthos” hands-on

I did a migration of a GCE VM to a container running in GKE using “Migrate for Anthos”. The GCE VM has a base Debian OS with the nginx web server installed.

Following are a summary of the steps to do the migration:

  • Create service account for Velostrata manager and cloud extension.
  • Install Velostrata manager from marketplace with the service accounts created in previous step.
  • Create cloud extension from Velostrata manager.
  • Create GKE cluster.
  • Install “Migrate for Anthos” from GKE marketplace on the GKE cluster created in previous step.
  • Create source VM in GCE and install needed application in the source VM.
  • Create the YAML configuration file (persistent volume, persistent volume claim, StatefulSet) from the source VM.
  • Stop source VM.
  • Apply the YAML configuration on top of the GKE cluster.
  • Create Kubernetes service configuration files to expose the container services.

Service account creation:
I created service accounts for Velostrata manager and cloud extension using steps listed here. I used the single project configuration example.

Velostrata manager installation:
I used the steps listed here to install Velostrata manager from marketplace and to do the initial configuration. Velostrata manager provides the management interface for Velostrata where all migrations can be managed. I have used the “default” network for my setup. We need to remember the api password for future steps.

Create cloud extension:
I used the steps here to install the cloud extension from Velostrata manager. The cloud extension takes care of storage caching in GCP.

Create GKE cluster:
I used the steps here to create the GKE cluster. The GKE nodes and the source VM need to be in the same zone. Because of this restriction, it is better to create a regional cluster so that we have a GKE node in every zone of the region. When I first tried the migration, I got an error like the one below:

Events:
  Type     Reason             Age                 From                Message
  ----     ------             ----                ----                -------
  Normal   NotTriggerScaleUp  1m (x300 over 51m)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added):
  Warning  FailedScheduling   1m (x70 over 51m)   default-scheduler   0/9 nodes are available: 9 node(s) had volume node affinity conflict.

Based on a discussion with the Velostrata engineering team, I understood that the pod could not be scheduled since none of the GKE nodes were in the same zone as the source VM. In my case, I created a regional cluster in us-central1, but it created nodes in only 3 of the 4 zones available in us-central1. My source VM unfortunately resided in the 4th zone, where no GKE node was present. This looks like a bug in GKE regional cluster creation where GKE nodes are not created in all zones. After I recreated the source VM in one of the zones where GKE nodes were present, the problem was resolved.

Install “MIgrate for Anthos”:
I used the steps here to install “Migrate for Anthos” in the GKE cluster. We need to provide the Velostrata manager IP address and the cloud extension name that we created in the previous steps.

Create source VM:
I created a Debian VM and installed the nginx web server:

sudo apt-get update 
sudo apt-get install -y nginx 
sudo service nginx start 
sudo sed -i -- 's/nginx/Google Cloud Platform - '"\$HOSTNAME"'/' /var/www/html/index.nginx-debian.html

Create YAML configuration from source VM:
I used the steps here. This is the command I used to create the Kubernetes configuration. The configuration contains details to create a persistent volume claim (PVC), a persistent volume (PV) and a StatefulSet.

python3 /google/migrate/anthos/gce-to-gke/clone_vm_disks.py \
-p sreemakam-anthos `#Your GCP project name` \
-z us-central1-b `#GCP Zone that hosts the VM, for example us-central1-a` \
-i webserver `#Name of the VM. For example, myapp-vm` \
-A webserver `#Name of workload that will be launched in GKE` \
-o webserver.yaml `#Filename of resulting YAML configuration`

Apply YAML configuration:
Before applying the YAML config, we need to stop the source VM; this creates a consistent snapshot. I used the following command, as in this link, to create the persistent volume claim (PVC), persistent volume (PV) and StatefulSet. The volume uses a GCE persistent disk.

kubectl apply -f webserver.yaml

Create Kubernetes service configuration:
To expose the container service running on port 80, we can create a Kubernetes service as shown below.

kind: Service
apiVersion: v1
metadata:
  name: webserver
spec:
  type: LoadBalancer
  selector:
    app: webserver
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80

After applying the service, a load balancer with an external IP address is created, through which we can access the nginx web service.

The above example shows the migration of a simple VM to a container. The link here talks about how to migrate a two-tier application involving an application and a database. Examples of applications that can be migrated include web applications, middleware frameworks and any applications built on Linux. Supported operating systems are mentioned here.

I want to convey special thanks to Alon Pildus from “Migrate for Anthos” team who helped to review and suggest improvements to this blog.

References