
GKE with VPN – Networking options

While working on a recent hybrid GCP plus on-premise customer architecture, we needed to connect a GKE cluster running in GCP to a service running on-premise through a VPN. There were a few unique requirements, like needing to expose only a small IP range to the on-premise network and having full control over the IP addresses exposed. In this blog, I will talk about the different approaches possible from a networking perspective when connecting a GKE cluster to an on-premise service. The following options are covered in this blog:

  • Flexible pod addressing scheme
  • Connecting using NAT service running on VM
  • Using IP masquerading at the GKE node level

I did not explore the Cloud NAT managed service as it works only with private clusters and does not work through VPN. I have used VPC native clusters as that has become the default networking scheme and is more straightforward to use than route-based clusters. For more information on VPC native clusters and IP aliasing, please refer to my earlier blog series here.

Requirements

Following was the high level architecture:

Architecture diagram

The on-premise application exposes its service on a specific TCP port that we needed to access from the pods running in the GKE cluster. We needed to expose only a few specific GCP IP addresses to on-premise.
For this use case, I have used a VPN with dynamic routing. The on-premise firewall needs to be opened up for the source IP addresses coming from GCP. To try this example when you don't have an on-premise network, you can set up two VPCs and use one of them to simulate on-premise.

Flexible pod addressing scheme

In VPC native clusters, there are separate IP address ranges allocated for GKE nodes, pods and services. The node IP address is allocated from the VPC subnet range. There are two ways to allocate IP addresses to pods and services.

GKE managed secondary address

In this scheme, GKE manages the secondary address ranges. When the cluster is created, GKE automatically creates two IP alias ranges, one for the pods and another for the services. The user has a choice to either enter the IP address ranges for the pods and services or let GKE pick the address ranges. Following are the default, minimum and maximum subnet range sizes for the pods and services.

           Default                  Minimum                  Maximum
Pods       /14 (2^18 pod IPs)       /9 (2^23 pod IPs)        /21 (2^11 pod IPs)
Services   /20 (2^12 service IPs)   /16 (2^16 service IPs)   /27 (2^5 service IPs)

(The minimum and maximum columns refer to the prefix length, so /9 is the largest range allowed for pods and /21 the smallest.)

There is another important parameter: the number of pods per node. By default, GKE reserves a /24 block, or 256 IP addresses, per node. To allow IP addresses to be reused as pods get created and deleted during autoscaling, 110 pods share the 256 IP addresses, so the number of pods per node defaults to 110. This number can be user configured.

For example, taking /21 for pods, we can have a total of 2048 pod addresses. Assuming the default of 110 pods per node (a /24 address range per node), we can have a maximum of 2^(24-21) = 8 nodes. This limit is irrespective of the subnet range reserved for the nodes. If we reduce the number of IP addresses reserved for pods per node to 64 (a /26 range), then we can have a maximum of 2^(26-21) = 32 nodes.
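As a sketch of how these two knobs are set at cluster creation time (the cluster name and CIDR below are made-up values, and --default-max-pods-per-node may need the beta gcloud track depending on version): 32 pods per node makes GKE reserve a /26 (64 pod IPs) per node, which with a /21 pod range allows up to 2^(26-21) = 32 nodes.

# Hypothetical example: GKE-managed /21 pod range with 32 pods per node
gcloud container clusters create demo-cluster \
    --zone us-east1-b \
    --enable-ip-alias \
    --cluster-ipv4-cidr 10.100.0.0/21 \
    --default-max-pods-per-node 32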

User managed secondary address

For my use case, the GKE managed secondary address scheme did not help since the smallest pod range it allows is /21, and the customer was not willing to expose a /21 range in their on-premise firewall. The customer was willing to provide a /25 or /27 range for the pods. We settled on the configuration below:

  • /25 range for the pods, with 8 pods per node. A /25 range gives us 128 pod addresses. With 8 pods per node, GKE reserves 16 IP addresses (4 bits) per node, which gives a maximum of 2^(7-4) = 8 nodes in the cluster.
  • /27 range for the services. It was not necessary to expose the service IP range to on-premise, as service IP addresses are used more for egress from on-premise.
  • /27 range for the nodes. Even though we could have created 32 nodes with this range, we are limited to 8 nodes because of the first point above.

Following are the steps to create a cluster with user managed secondary ranges (a gcloud sketch of the same setup follows the list):

  • Create IP alias ranges for the pods and services from the VPC section in the console.
  • When creating the cluster, disable the option “automatically create secondary ranges” and select the pre-created IP alias ranges for the pods and services.
  • Set the maximum number of pods per node to 8.
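The same setup can be scripted with gcloud. This is only a sketch with assumed names and CIDRs (VPC gcp-vpc, subnet subnet-a in us-east1, pod range 10.10.0.0/25, service range 10.10.1.0/27); adjust to your environment.

# Add user managed secondary ranges to the existing subnet
gcloud compute networks subnets update subnet-a \
    --region us-east1 \
    --add-secondary-ranges pod-range=10.10.0.0/25,svc-range=10.10.1.0/27

# Create the cluster using the pre-created ranges and 8 pods per node
gcloud container clusters create user-range-cluster \
    --zone us-east1-b \
    --network gcp-vpc --subnetwork subnet-a \
    --enable-ip-alias \
    --cluster-secondary-range-name pod-range \
    --services-secondary-range-name svc-range \
    --default-max-pods-per-node 8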

Following picture shows the two IP alias ranges created along with the VPC subnet:

VPC subnet with 2 alias IP ranges

Following picture shows the networking section of the cluster creation page, where we have specified the primary range and the secondary ranges for pod and service IP addresses.

Cluster with custom IP ranges for pods and services

Connecting using NAT service running on VM

Rather than exposing individual pod IP addresses to the on-premise service, we can expose a single IP address using a NAT service running in GCP. With this approach, all the pod IP addresses get translated to the single NAT IP address, and we only need to expose that single NAT IP address to the on-premise firewall.

Following picture shows how the architecture would look:

Connecting to on-premise using NAT

Following are the steps needed:

  • Create a NAT instance on Compute Engine. As I mentioned earlier, the Cloud NAT managed service could not be used as it's not integrated with VPN. We can either create a standalone NAT instance or an HA NAT instance as described here.
    I used the following command to create NAT instance:
 gcloud compute instances create nat-gateway --network gcp-vpc \
     --subnet subnet-a \
     --can-ip-forward \
     --zone us-east1-b \
     --image-family debian-9 \
     --image-project debian-cloud \
     --tags nat 
  • Log in to the NAT instance and set up iptables rules for the NAT.
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
  • Create the GKE cluster with a network tag. I was not able to specify a network tag when creating the GKE cluster from the console; the only way is to use the gcloud CLI. The network tag is needed so that the route entry forwarding to the NAT applies only to the instances that are part of the GKE cluster.
gcloud container clusters create mygcpr-cluster --tags use-nat \
 --zone us-east1-b \
 --network gcp-vpc --subnetwork subnet-a --enable-ip-alias 
  • Create a route entry to forward traffic from the GKE cluster destined to the on-premise service through the NAT gateway. Make sure that the priority of this route entry supersedes other route entries (note that priority works in reverse: a lower number means higher priority).
gcloud compute routes create nat-vpn-route1 \
     --network gcp-vpc \
     --destination-range 192.168.0.0/16 \
     --next-hop-instance nat-gateway \
     --next-hop-instance-zone us-east1-b \
     --tags use-nat --priority 50 

To test this, I created a pod on the GKE cluster, pinged an on-premise instance, and verified with tcpdump that the source IP address of the ping request is not a pod IP but the NAT gateway IP address.
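A rough sketch of that check (the on-premise address 192.168.1.10 below is a made-up example):

# From the GKE cluster: run a throwaway pod and ping a host on the on-premise network
kubectl run nat-test --rm -it --image=busybox --restart=Never -- ping -c 3 192.168.1.10

# On the NAT gateway: confirm the ICMP packets leave with the gateway IP, not a pod IP
sudo tcpdump -n -i eth0 icmp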

Using masquerading at the GKE node

The alternative to using a NAT gateway is to do masquerading at the node level. This translates the pod IP address to the node IP address when packets egress from the GKE node. In this case, only the node IP addresses need to be exposed to on-premise; the pod IP addresses do not. A masquerading agent runs on each GKE node to achieve this.

Following are the steps to setup masquerading:

  • The two basic requirements for the masquerading agent to run on each node are enabling network policy and having the pod IP addresses outside the RFC 1918 range 10.0.0.0/8. Network policy can be enabled when creating the GKE cluster.
  • By default, masquerading is set up to avoid masquerading RFC 1918 addresses (10.0.0.0/8, 192.168.0.0/16, 172.16.0.0/12) as well as link-local addresses (169.254.0.0/16). This can be overridden with a config file at the cluster level.
  • When the config file changes, the agent on each node periodically re-reads the config and updates the node. This interval can also be configured.

Following is the config file I used:

nonMasqueradeCIDRs:
resyncInterval: 60s
masqLinkLocal: true

In my case, since I used RFC 1918 addresses for my pods, I wanted those addresses to be masqueraded as well. The config file works in the negative direction with respect to specifying addresses: you list the CIDRs that should not be masqueraded. Since I have not specified any addresses, all RFC 1918 addresses get masqueraded with this configuration. You can add specific ranges that you do not want to be masqueraded.
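For example, a hypothetical variant of the config that keeps the on-premise range 192.168.0.0/16 un-masqueraded while everything else still gets masqueraded, written to a local file named config as expected by the ConfigMap command below:

cat > config <<'EOF'
# CIDRs listed here are NOT masqueraded; everything else is
nonMasqueradeCIDRs:
  - 192.168.0.0/16
resyncInterval: 60s
masqLinkLocal: true
EOF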

To apply the config, we can use the following kubectl command:

kubectl create configmap ip-masq-agent --from-file config --namespace kube-system

In the customer scenario, we went with user managed secondary addresses and that worked fine. The other two options described above would also have worked. We hit a few other issues with GCP VPN working with Cisco ASA, which we were eventually able to overcome; more details in a different blog…

Migrate for Anthos

Anthos is a hybrid/multi-cloud platform from GCP. Anthos allows customers to build their application once and run it in GCP or in any other private or public cloud. Anthos unifies the control, management and data planes when running a container-based application across on-premise and multiple clouds. Anthos was launched at last year's NEXT18 conference and was made generally available recently. VMWare integration is available now; integration with other clouds is planned on the roadmap. One of the components of Anthos is “Migrate for Anthos”, which allows direct migration of VMs into containers running on GKE. This blog will focus on “Migrate for Anthos”. I will cover the need for “Migrate for Anthos”, the platform architecture, and the steps to move a simple application from a GCP VM into a GKE container. Please note that “Migrate for Anthos” is in BETA now and is not ready for production.

Need for “Migrate for Anthos”

Modern application development typically uses microservices and containers to improve the application's agility. Containers, Docker and Kubernetes provide the benefits of agility and portability to applications. It is easier to build a greenfield application using microservices and containers, but what should we do with applications that already exist as monoliths? Enterprises typically spend a lot of effort modernizing their applications, which can mean a long journey for many of them. What if we had an automatic way to convert VMs to containers? Does this sound like magic? Yes, “Migrate for Anthos” (earlier called V2K) does quite a bit of magic underneath to automatically convert VMs to containers.

Following diagram shows the different approaches that enterprises take in their modernization and cloud journey. The X-axis shows classic and cloud native applications, and the Y-axis shows on-prem and cloud.

Picture borrowed from “Migrate for Anthos” presentations to customers

Migrate and Modernize:
In this approach, we first do a lift and shift of the VMs to cloud and we then modernize the application to Containers. Velostrata is GCP’s tool to do lift and shift VM migration.

Modernize and Migrate:
In this approach, we first modernize the application on-prem and then migrate the modernized application to the cloud. If the on-prem application is modernized using Docker and Kubernetes, then it can be migrated easily to GKE.

Migrate for Anthos:
Both the above approaches are two-step approaches. With “Migrate for Anthos”, migration and modernization happen in the same step. The modernization is not fully complete in this approach: even though the VM is migrated to containers, the monolithic application is not broken down into microservices.

You might be wondering why migrate to containers if the monolithic application is not converted to microservices. There are some basic advantages that we get by containerizing the monolithic application, including portability, better packing and integration with other container services like Istio. As a next step, the monolithic container application can be broken down into microservices. There are some roadmap items in “Migrate for Anthos” that will facilitate this.

For some legacy applications, it might not make sense to break them down into microservices, and they can live as a single monolithic container for a long time using this approach. In a typical VM environment, we need to worry about patching, security, networking, monitoring, logging and other infrastructure components, which come out of the box with GKE and Kubernetes after migrating to containers. This is another advantage of “Migrate for Anthos”.

“Migrate for Anthos” Architecture

“Migrate for Anthos” converts the source VMs to system containers running in GKE. System containers, compared to application containers, run multiple processes and applications in a single container. Initial support for “Migrate for Anthos” is available for VMWare VMs or GCE VMs as the source. The following changes are done to convert a VM to a container.

  • The VM operating system is converted into a kernel supported by GKE.
  • VM system disks are mounted inside the container using a persistent volume (PV) and a StatefulSet.
  • Networking, logging and monitoring use GKE constructs.
  • Applications running inside the VM using systemd scripts run in the container user space.
  • During the initial migration phase, storage is streamed to the container using CSI. The storage can then be migrated to any storage class supported by GKE.

Following are the components of “Migrate for Anthos”:

  • “Migrate for Compute Engine” (formerly Velostrata) – the Velostrata team has enhanced the VM migration tool to also convert VMs to containers and then do the migration. The fundamentals of Velostrata, including its agentless and streaming technologies, remain the same for “Migrate for Anthos”. The Velostrata manager and cloud extensions need to be installed in the GCP environment to do the migration. Because Velostrata uses streaming technology, the complete VM storage does not need to be migrated before the container can run in GKE, which speeds up the entire migration process.
  • GKE cluster – “Migrate for Anthos” runs in the GKE cluster as application containers and can be installed from the GKE marketplace.
  • Source VM – the source VM can be in GCE or in a VMWare environment. In a VMWare environment, a “Migrate for Anthos” component needs to be installed in VMWare as well.

Following picture shows the different components in the VM and how it will look when they are migrated.

Picture borrowed from “Migrate for Anthos” presentations to customers

The second column in the picture is what exists today when a VM is migrated to a GKE container; the only option currently is to scale vertically when capacity is reached. The yellow components leverage Kubernetes and the green components run inside containers. The third column in the picture is how the future would look, where we can have multiple containers with horizontal pod autoscaling.

“Migrate for Anthos” hands-on

I did a migration of GCE VM to Container running in GKE using “Migrate for Anthos”. The GCE VM has a base Debian OS with nginx web server installed.

Following is a summary of the steps to do the migration:

  • Create service account for Velostrata manager and cloud extension.
  • Install Velostrata manager from marketplace with the service accounts created in previous step.
  • Create cloud extension from Velostrata manager.
  • Create GKE cluster.
  • Install “Migrate for Anthos” from GKE marketplace on the GKE cluster created in previous step.
  • Create source VM in GCE and install needed application in the source VM.
  • Create the YAML configuration (persistent volume, persistent volume claim, StatefulSet) from the source VM.
  • Stop source VM.
  • Apply the YAML configuration on top of the GKE cluster.
  • Create Kubernetes service configuration files to expose the container services.

Service account creation:
I created service accounts for Velostrata manager and cloud extension using steps listed here. I used the single project configuration example.

Velostrata manager installation:
I used the steps listed here to install Velostrata manager from the marketplace and to do the initial configuration. Velostrata manager provides the management interface where all migrations can be managed. I have used the “default” network for my setup. We need to remember the API password for future steps.

Create cloud extension:
I used the steps here to install the cloud extension from Velostrata manager. The cloud extension takes care of storage caching in GCP.

Create GKE cluster:
I used the steps here to create the GKE cluster. The GKE nodes and the source VM need to be in the same zone. Because of this restriction, it is better to create a regional cluster so that we have a GKE node in every zone of the region. When I first tried the migration, I got an error like the one below:

Events:
  Type     Reason             Age                 From                Message
  ----     ------             ----                ----                -------
  Normal   NotTriggerScaleUp  1m (x300 over 51m)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added):
  Warning  FailedScheduling   1m (x70 over 51m)   default-scheduler   0/9 nodes are available: 9 node(s) had volume node affinity conflict.

Based on a discussion with the Velostrata engineering team, I understood that the problem was that the pod could not be scheduled since none of the GKE nodes were in the same zone as the source VM. In my case, I created a regional cluster in us-central1, but it created nodes in only 3 of the 4 zones available in us-central1. My source VM unfortunately resided in the 4th zone, where no GKE node was present. This looks like a bug in GKE regional cluster creation where GKE nodes are not created in all zones. After I recreated the source VM in one of the zones where GKE nodes were present, the problem got resolved.
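One way to avoid this, sketched below with a made-up cluster name, is to pin the regional cluster's node locations explicitly so that a node exists in the zone that hosts the source VM:

# Hypothetical example: regional cluster with one node in each of the four us-central1 zones
gcloud container clusters create migrate-cluster \
    --region us-central1 \
    --node-locations us-central1-a,us-central1-b,us-central1-c,us-central1-f \
    --num-nodes 1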

Install “Migrate for Anthos”:
I used the steps here to install “Migrate for Anthos” in the GKE cluster. You need to provide the Velostrata manager IP address and the cloud extension name created in the previous steps.

Create source VM:
I created a debian VM and installed nginx webserver.

sudo apt-get update 
sudo apt-get install -y nginx 
sudo service nginx start 
sudo sed -i -- 's/nginx/Google Cloud Platform - '"$HOSTNAME"'/' /var/www/html/index.nginx-debian.html

Create YAML configuration from source VM:
I used the steps here. This is the command I used to create the Kubernetes configuration. The configuration contains the details to create a persistent volume claim (PVC), a persistent volume (PV) and a StatefulSet.

python3 /google/migrate/anthos/gce-to-gke/clone_vm_disks.py \
-p sreemakam-anthos `#Your GCP project name` \
-z us-central1-b `#GCP Zone that hosts the VM, for example us-central1-a` \
-i webserver `#Name of the VM. For example, myapp-vm` \
-A webserver `#Name of workload that will be launched in GKE` \
-o webserver.yaml `#Filename of resulting YAML configuration`

Apply YAML configuration:
Before applying the YAML config, we need to stop the source VM; this creates a consistent snapshot. I used the following command, as in this link, to create the persistent volume claim (PVC), persistent volume (PV) and StatefulSet. The volume uses a GCE persistent disk.

kubectl apply -f webserver.yaml

Create Kubernetes service configuration:
To expose the container service running on port 80, we can create a Kubernetes service as shown below.

kind: Service
apiVersion: v1
metadata:
  name: webserver
spec:
  type: LoadBalancer
  selector:
    app: webserver
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80

After applying the service, a load balancer with an external IP address is created, through which we can access the nginx web service.
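To find that external IP and check the migrated web server, something along these lines works (replace the placeholder with the EXTERNAL-IP column from the first command):

# Wait for the load balancer to get an external IP, then fetch the nginx page
kubectl get service webserver --watch
curl http://<EXTERNAL-IP>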

The above example shows the migration of a simple VM to a container. The link here talks about how to migrate a two-tier application involving an application and a database. Examples of applications that can be migrated include web applications, middleware frameworks and any applications built on Linux. Supported operating systems are mentioned here.

I want to convey special thanks to Alon Pildus from “Migrate for Anthos” team who helped to review and suggest improvements to this blog.

VPC native GKE clusters – Container native LB

This blog is the last in the series on VPC native GKE clusters. In this blog, I will cover Network endpoint groups (NEG) and Container native load balancing. For the first part on GKE IP addressing, please refer here, and for the second part on VPC native clusters, please refer here.

Container load balancing and Network endpoint groups(NEG)

Following diagram shows how default Container load balancing works (without NEG). This is applicable to both HTTP and network load balancers.

Traffic flows like this:
Load balancer -> VM -> iptables -> Pods

The load balancer distributes traffic to the instances in an instance group. iptables rules in each node further distribute the traffic among pods. If the pod is not on the same node, packets get routed to another node. This causes a double-hop problem, as illustrated by the path below for a specific case:

Load balancer -> VM1(iptables) -> VM2(Pod3)

The double hop increases latency. To avoid it, Kubernetes provides a service annotation, “onlyLocal”, which restricts the iptables rules in each node to distributing traffic only to the pods on that specific node. This is how the path would look for different flows:

Load balancer -> VM1(iptables) -> Pod1
Load balancer -> VM2(iptables) -> Pod3
Load balancer -> VM2(iptables) -> Pod4

The disadvantage with “onlyLocal” is that the traffic distribution between pods will not be equal. In the above example, pod1 would get more traffic compared to pods 3 and 4.

Network endpoint groups(NEG):

A NEG can be a backend for an HTTP load balancer, similar to managed instance groups. NEGs are zonal resources, and each endpoint in a NEG is a combination of an IP address and a port. The IP address can be the primary IP address of a VM or one of its alias IP addresses.

Container load balancing with NEG:

NEG allows containers to be represented as first-class citizens from the load balancer's perspective. Before NEG, load balancers were not aware of containers. With NEG, the pod IP address and port are exposed as endpoints to the load balancer, which provides the following advantages:

  • Compared to regular load balancing, this approach provides better performance because it prevents the double hop. Compared to the “onlyLocal” approach, it provides an equal traffic distribution between the pods.
  • This approach provides better visibility and easier troubleshooting, as the iptables layer is removed from the load balancing path.

The Container native load balancing feature is currently in beta. It is supported only with the HTTP load balancer.

Following diagram illustrates Container native load balancing:

Traffic flow would look like this:
Load balancer-> NEG -> Pod1, Pod 3, Pod 4

To compare the 3 load balancing approaches (native load balancing, native load balancing with onlyLocal, and Container native load balancing with NEG), I created a Kubernetes cluster with 3 nodes and IP aliasing enabled. I used the sample application illustrated here and deployed it with all 3 load balancing approaches. The goal was to show that native load balancing and Container native load balancing with NEG distribute traffic equally between pods, while the “onlyLocal” approach does an uneven distribution.

Container based load balancing:

As a first step, I created a GKE cluster with IP alias enabled.

gcloud container clusters create cluster-vpc --zone us-central1-b --enable-ip-alias 

The Service’s annotation, cloud.google.com/neg: '{"ingress": true}', enables container-native load balancing. 
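For reference, a minimal sketch of what neg-demo-svc.yaml looks like with that annotation (the selector and target port are assumed to match the sample app referenced above):

cat > neg-demo-svc.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: neg-demo-svc
  annotations:
    cloud.google.com/neg: '{"ingress": true}'   # enables container-native load balancing
spec:
  # type defaults to ClusterIP; with NEGs the load balancer reaches pod endpoints directly
  selector:
    run: neg-demo-app   # selects pods labelled run: neg-demo-app
  ports:
  - port: 80
    protocol: TCP
    targetPort: 9376
EOF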

Then create the deployment, service and ingress. Create 4 replicas so that one node has more than one pod while the other nodes have one pod each. This application prints the details of the node serving the traffic.

kubectl apply -f neg-demo-app.yaml
kubectl apply -f neg-demo-svc.yaml
kubectl apply -f neg-demo-ing.yaml
kubectl scale deployment neg-demo-app --replicas 4

Following command output shows the pod distribution. As we can see, 2 pods are allocated on one node, and the other 2 are distributed across the 2 other nodes.

 neg-demo-app-7bbd69746c-bwcl4    1/1       Running   0          1d        10.64.0.6   gke-cluster-vpc-default-pool-db114d00-1l9m
 neg-demo-app-7bbd69746c-drvhf    1/1       Running   0          1d        10.64.1.7   gke-cluster-vpc-default-pool-db114d00-2mz8
 neg-demo-app-7bbd69746c-qv8qx    1/1       Running   0          1d        10.64.2.7   gke-cluster-vpc-default-pool-db114d00-j9km
 neg-demo-app-7bbd69746c-tv5ll    1/1       Running   0          1d        10.64.0.5   gke-cluster-vpc-default-pool-db114d00-1l9m 

Following diagram shows how the pods are allocated between nodes:

Native load balancing with “onlyLocal”:

For this case, we need to modify the service to remove the “neg” annotation and add the “onlyLocal” setting (externalTrafficPolicy: Local) as shown below.

Following is the service yaml for “onlyLocal” configuration:

 apiVersion: v1
 kind: Service
 metadata:
   name: neg-demo-svc # Name of Service
 spec: # Service's specification
   type: NodePort
   externalTrafficPolicy: Local
   selector:
     run: neg-demo-app # Selects Pods labelled run: neg-demo-app
   ports:
   - port: 80 # Service's port
     protocol: TCP
     targetPort: 9376 

Native load balancing:

For this case, we need to remove “neg” annotation as well as “onlyLocal” annotation.

Traffic test:

For all 3 approaches above, I sent a request to the load balancer IP 100 times and measured the distribution of traffic between the pods.

 for ((i=1;i<=100;i++)); do curl "34.96.89.199"; done 
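To tally how many responses each pod served, the same loop can be piped through sort and uniq (a sketch, assuming each response is a single line identifying the serving pod/node, as the sample app does):

for ((i=1;i<=100;i++)); do curl -s "34.96.89.199"; done | sort | uniq -c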

What I observed is that with native load balancing and Container based load balancing, the traffic distribution between pods is equal: each pod receives approximately 25% of the traffic. With the “onlyLocal” approach, only a third of the traffic reaches the node that has both pods scheduled; the traffic distribution looks like pod1-33%, pod2-33%, pod3-17%, pod4-16%, with pod3 and pod4 scheduled on the same node.

Among the 3 approaches mentioned above, Container load balancing with NEG provides the best performance, traffic distribution and visibility.

VPC native GKE clusters – IP aliasing

This blog is the second in the series on VPC native GKE clusters. In this blog, I will cover an overview of IP aliasing, IP alias creation options and creating VPC native clusters using alias IPs. For the first blog on GKE IP addressing, please refer here.

Overview
IP aliasing allows a single VM to have multiple internal IP addresses. This is not the same as having multiple interfaces, each with a different IP address. Each of the multiple internal IP addresses can be allocated to a different service running in the VM. When the node is running containers, alias IPs can be allocated to container pods. The GCP VPC network is aware of alias IPs, so routing is taken care of by the VPC. Alias IPs have significant advantages with GKE containers: containers have pod and service IPs to manage in addition to the node IP, and IP aliasing makes these addresses native to the VPC, allowing tight integration with GCP services.

Alias IP Advantages

Alias IPs provide many advantages in the container space, as having VPC-aware separate address ranges carved out for pods and services gives better flexibility.

  • Alias IP addresses are VPC aware and routing is taken care of by the VPC. Without IP aliasing, pod routing was the responsibility of the container orchestrator, which had to add those routes manually. With IP aliasing, pod routing is handled by the VPC itself and there is no need to add manual routes to reach pod IPs.
  • Without IP aliases, anti-spoofing checks had to be disabled on nodes that are part of a GKE cluster. Traffic from the pods has the pod IP as its source, and since the VPC was not aware of pod IPs, anti-spoofing checks had to be disabled to allow this traffic to pass through. With IP aliases, the VPC is aware of pod IPs, so we can keep the anti-spoofing check enabled. This prevents traffic with arbitrary source IPs from originating at the node.
  • IP aliasing prevents IP address conflicts between node IP addresses and cluster IP addresses, since the VPC is aware of both.
  • IP aliases help in scenarios where the pod, but not the node, needs to be reachable from the external world. Since the VPC is aware of alias addresses, we can set up different BGP and firewall rules for node and pod IP addresses.

Creating Alias IP

There are 3 ways to create alias IPs.

  1. When creating the VM, specify the alias IP addresses that will be used by the VM.
  2. When creating the subnet, create a primary range and one or more secondary ranges. During VM creation, use those ranges to create alias IPs. Alias IPs can be created for subnets in either “auto” or “custom” mode networks.
  3. When a GKE cluster is created with IP aliasing enabled, 2 subnet secondary ranges are created implicitly, one for pod IPs and another for service IPs. Container pods and services get IPs allocated from these ranges. This approach is used by GKE.

Creating Alias IP – Approach 1:

In this approach, we will specify alias IP address when creating VM. The steps are mentioned below.

Create network:

gcloud compute networks create aliasnet \
    --subnet-mode=auto

Create instance with alias ip address from the subnet:

 
 gcloud compute instances create vm5     --zone us-central1-b     --network-interface "subnet=s1,aliases=10.65.61.128/28" 

Instance network detail:

Below output shows internal IP, external IP and alias IP.

Creating Alias IP – Approach 2:

In this approach, we will first create primary and secondary range in the subnet and specify IP alias within the primary and secondary range when creating the VM.

Create network:

gcloud compute networks create aliasnet \
    --subnet-mode=auto

Create subnets with primary and secondary range:

gcloud compute networks subnets create s1 \
    --network  aliasnet \
    --region us-central1 \
    --range 10.65.61.0/24 \
    --secondary-range range1=172.16.0.0/16,range2=10.0.0.0/16

Create instance with ip address from primary and secondary range:
For some reason, I could not get the CLI command to work and was only able to do this from the console. Following is the VM created with the 2 alias IPs (172.16.1.0/24, 10.0.0.1/32). These alias IPs are within the secondary ranges specified in the previous command.

Creating Alias IP – Approach 3:

In this approach, we will create alias IP for Containers.

Create cluster with IP aliasing enabled:

 gcloud container clusters create cluster-vpc --zone us-central1-b --enable-ip-alias 

Because we enabled IP aliasing and used the default network, the above command automatically creates 2 secondary ranges as shown below: “10.64.0.0/14” is the first secondary range and “10.0.0.0/20” is the second. These ranges are used for the cluster IP range and the service IP range.

We also have the option of manually specifying cluster IP range and service IP range like below:

 gcloud container clusters create cluster-vpc --zone us-central1-b --enable-ip-alias --cluster-ipv4-cidr=<clusteriprange> --services-ipv4-cidr=<serviceiprange>

The above command will create secondary ranges on the subnet using the addresses specified and use them as alias IP address for cluster and service IPs.

Following command shows the IP ranges for the “default” network after creating a cluster with IP aliasing enabled. “10.128.0.0/20” is the subnet range for internal IPs, “10.64.0.0/14” is the cluster IP range, and “10.0.0.0/20” is the service IP range.

 gcloud compute networks subnets describe default | grep -i ipcidrrange
 ipCidrRange: 10.128.0.0/20
 - ipCidrRange: 10.64.0.0/14
 - ipCidrRange: 10.0.0.0/20 

From the cluster, let’s check the cluster IP range and service IP range. This will match the default ranges.

 gcloud container clusters describe cluster-vpc | grep -e servicesIpv4Cidr -e clusterIpv4Cidr
 clusterIpv4Cidr: 10.64.0.0/14
   servicesIpv4Cidr: 10.0.0.0/20 

Let’s deploy a sample application:

kubectl run web --image=gcr.io/google-samples/hello-app:1.0 --replicas=3 --port=8080
kubectl expose deployment web --target-port=8080 --type=NodePort

Let’s look at pod and service IP addresses:

 kubectl get pods -o wide
 NAME                   READY     STATUS    RESTARTS   AGE       IP          NODE
 web-6d695d4565-g7lbb   1/1       Running   0          1h        10.64.1.6   gke-cluster-vpc-default-pool-db114d00-2mz8
 web-6d695d4565-nzdnx   1/1       Running   0          1h        10.64.0.4   gke-cluster-vpc-default-pool-db114d00-1l9m
 web-6d695d4565-ttkr8   1/1       Running   0          1h        10.64.1.5   gke-cluster-vpc-default-pool-db114d00-2mz8 

 
 kubectl get services
 NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
 kubernetes   ClusterIP   10.0.0.1     <none>        443/TCP          1h
 web          NodePort    10.0.0.253   <none>        8080:30017/TCP   1h 

As we can see from the above output, the pod IP addresses (10.64.0.x, 10.64.1.x) come out of the cluster IP range and the service IP address (10.0.0.253) comes out of the service IP range. If we look at the routing table, we do not see routes specific to pod IP addresses, as the VPC is already aware of them.
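One way to confirm this is to list the routes for the network; with a VPC native cluster there should be no GKE-created per-node pod routes (a sketch, assuming route names are prefixed with the cluster name, as GKE normally does):

gcloud compute routes list --filter="name~gke-cluster-vpc"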

In the next blog, I will talk about Network endpoint group and Container native load balancing.

VPC native GKE clusters – IP address management

I am writing this blog after a long gap of close to 7 months. Many reasons, including a busy work schedule, some health issues in the middle and a little bit of laziness, contributed to this. I hope to be a more active blogger going forward.

In this blog series, I will cover the following topics:

  • GKE default IP address management
  • VPC native GKE clusters – IP aliasing
  • VPC native GKE clusters – Container native load balancing

The first blog in this series will talk about GKE default IP address management.

Following are the Kubernetes abstractions that need IP addresses:

  • Node IP address – assigned to individual nodes. The node IP address is allocated from the VPC subnet range.
  • Pod IP address – assigned to individual pods. All containers within a single pod share the same IP address.
  • Service IP address – assigned to individual services.

By default, a “/14” range gets allocated as the cluster IP range, and pod and service IP addresses come out of this pool. A “/24” range carved out of the cluster IP range is assigned to each individual node and is used for pod IP allocation. A “/20” range out of the cluster IP range is assigned for Kubernetes services. The user has a choice to select the cluster IP range when creating the cluster.

To illustrate some of the above points, I created a 3-node Kubernetes cluster with IP aliasing disabled. By default, VPC native clusters (IP aliasing enabled) are disabled and have to be enabled manually. In a future GKE release, VPC native clusters will become the default mechanism.
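A sketch of how such a cluster can be created (the exact flags used originally are not shown in the post; the CIDR below is the range this cluster ended up with, written here as if it had been specified explicitly):

# Routes-based (non VPC native) cluster with an explicit cluster IP range
gcloud container clusters create cluster-default \
    --zone us-central1-b \
    --no-enable-ip-alias \
    --cluster-ipv4-cidr 10.60.0.0/14 \
    --num-nodes 3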

Cluster output:

Following output shows the 3 node GKE cluster created using default IP options in the “default” subnet.

 $ kubectl get nodes -o wide
 NAME                                             STATUS    ROLES     AGE       VERSION          EXTERNAL-IP      OS-IMAGE                             KERNEL-VERSION   CONTAINER-RUNTIME
 gke-cluster-default-default-pool-0f6611b6-k68m   Ready     <none>    9d        v1.12.7-gke.10   35.225.186.179   Container-Optimized OS from Google   4.14.106+        docker://17.3.2
 gke-cluster-default-default-pool-0f6611b6-vrdc   Ready     <none>    9d        v1.12.7-gke.10   130.211.227.64   Container-Optimized OS from Google   4.14.106+        docker://17.3.2
 gke-cluster-default-default-pool-0f6611b6-xbql   Ready     <none>    9d        v1.12.7-gke.10   35.222.130.158   Container-Optimized OS from Google   4.14.106+        docker://17.3.2  

Cluster IP address:
Following output shows the allocated cluster IP range (10.60.0.0/14). Pod and service IP addresses are allocated out of this range. If we do not specify the cluster IP range, GKE automatically assigns a “/14” range. We can manually specify it using the “--cluster-ipv4-cidr” option when we create the cluster.

 $ gcloud container clusters describe cluster-default | grep Ip
 clusterIpv4Cidr: 10.60.0.0/14
 nodeIpv4CidrSize: 24
   podIpv4CidrSize: 24
 servicesIpv4Cidr: 10.63.240.0/20 

Node IP address:
Following output shows each node's external and internal IP addresses and the pod IP range allocated to it. The node internal IP address (e.g. 10.128.0.40) is assigned from the VPC subnet (10.128.0.0/20). Each node is allocated a /24 pod IP range, which allows a maximum of 256 pod IPs per node; 10.60.2.0/24 is the pod IP range allocated for node 1. To allow for pods going up and down, we need a buffer of pod IPs, which is why only 110 pods (instead of 256) are allowed on each node.

 $ kubectl describe nodes | grep -i -e ip -e podcidr
   InternalIP:   10.128.0.40
   ExternalIP:   35.225.186.179
 PodCIDR:                     10.60.2.0/24
   InternalIP:   10.128.0.42
   ExternalIP:   130.211.227.64
 PodCIDR:                     10.60.0.0/24
   InternalIP:   10.128.0.41
   ExternalIP:   35.222.130.158
 PodCIDR:                     10.60.1.0/24 

Let’s deploy a sample application. The following commands create 3 pods and a “NodePort” service on top.

kubectl run web --image=gcr.io/google-samples/hello-app:1.0 --replicas=3 --port=8080
kubectl expose deployment web --target-port=8080 --type=NodePort

Let’s look at the pod IP addresses allocated. Two pods are allocated on a single node, so they get IP addresses out of the “10.60.2.x/24” range, and the other pod gets its IP out of the “10.60.1.x/24” range.

$ kubectl get pods -o wide
NAME                   READY     STATUS    RESTARTS   AGE       IP          NODE
web-6d695d4565-4cg6t   1/1       Running   0          1m        10.60.1.3   gke-cluster-default-default-pool-0f6611b6-xbql
web-6d695d4565-6dj79   1/1       Running   0          1m        10.60.2.4   gke-cluster-default-default-pool-0f6611b6-k68m
web-6d695d4565-dhfqn   1/1       Running   0          1m        10.60.2.5   gke-cluster-default-default-pool-0f6611b6-k68m

Let’s look at the service IP. The service IP (10.63.243.144) is allocated out of the range 10.63.240.0/20, which in turn comes out of the cluster IP range.

$ kubectl get services
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
kubernetes   ClusterIP   10.63.240.1     <none>        443/TCP          17m
web          NodePort    10.63.243.144   <none>        8080:32668/TCP   2m

For pods to talk across nodes, GKE needs to add explicit routes to the routing table, as shown below. Since we do not have IP aliasing enabled, the GCP VPC is not aware of these pod and service IP addresses, which is why GKE explicitly adds these routes.
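The per-node pod routes shown below can be listed with a filter on the route name (an assumption based on GKE prefixing route names with the cluster name, which matches the output):

gcloud compute routes list --filter="name~gke-cluster-default"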

gke-cluster-default-3db044-19175121-779c-11e9-b601-42010a800159  default    10.60.0.0/24    us-central1-b/instances/gke-cluster-default-default-pool-0f6611b6-vrdc  1000
gke-cluster-default-3db044-192c50f1-779c-11e9-b601-42010a800159  default    10.60.1.0/24    us-central1-b/instances/gke-cluster-default-default-pool-0f6611b6-xbql  1000
gke-cluster-default-3db044-193c1da9-779c-11e9-b601-42010a800159  default    10.60.2.0/24    us-central1-b/instances/gke-cluster-default-default-pool-0f6611b6-k68m  1000

In the next 2 blogs, I will cover VPC native clusters using IP aliasing and Container native load balancing.

NEXT 100 Webinar – Top 3 reasons why you should run your enterprise workloads on GKE

I presented this webinar, “Top 3 reasons why you should run your enterprise workloads on GKE”, at the NEXT100 CIO forum earlier this week. Businesses are increasingly moving to containers and Kubernetes to simplify and speed up their application development and deployment. The slides and demo cover the top reasons why Google Kubernetes Engine (GKE) is one of the best container management platforms for enterprises to deploy their containerized workloads.

Following are the slides and recording:

Recording link

 

 

Container Conference Presentation

This week, I did a presentation at the Container Conference, Bangalore. The conference was well conducted and was attended by 400+ quality attendees. I enjoyed some of the sessions and also had fun talking to attendees. The topic I presented was “Deep dive into Kubernetes Networking”. Other than covering Kubernetes networking basics, I also touched on network policy, the Istio service mesh, hybrid cloud and best practices.

Slides:

Recording:

Demo code and Instructions:

Github link

Recording of the Istio section of the demo (this recording was not made at the conference):

As always, feedback is welcome.

I was out of blogging action for the last 9 months as I was settling into my new job at Google and I also had to take care of some personal stuff. Things are getting a little clearer now and I am hoping to restart my blogging soon…