Category Archives: Containers

Kubernetes CRI and Minikube

Kubernetes CRI (Container Runtime Interface) was introduced in experimental mode in the Kubernetes 1.5 release. CRI introduces a common container runtime layer that allows the Kubernetes orchestrator to work with multiple container runtimes like Docker, Rkt, Runc and Hypernetes. CRI makes it easy to plug a new container runtime into Kubernetes. The Minikube project simplifies Kubernetes installation for development and testing purposes. Minikube runs the Kubernetes master and worker components in a single VM, which makes it easy for developers and users to try out Kubernetes. In this blog, I will cover the basics of Minikube usage, an overview of CRI and the steps to try out CRI with Minikube.

Minikube

Kubernetes is composed of multiple components, and beginners normally get overwhelmed by the installation steps. It also helps to have a lightweight Kubernetes environment for development and testing. Minikube packages all Kubernetes components into a single VM that runs on the local laptop, with both master and worker functionality combined in that single VM.

Following are some major features present in Minikube:

  • Capability to run multiple Kubernetes versions
  • Supports both CNI and Kubenet networking modes
  • Web dashboard for configuration and monitoring
  • Support for the Docker and Rkt container runtimes
  • CRI support in experimental mode
  • Minimal volume support
  • Supported on Linux, Mac and Windows

Installation

Minikube can be downloaded from here. kubectl is the CLI utility for managing Kubernetes and needs to be installed separately.

Minikube installation in Linux

curl -Lo minikube https://storage.googleapis.com/minikube/releases/v0.16.0/minikube-linux-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/

Kubectl installation in Linux

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl && chmod +x kubectl && sudo mv kubectl /usr/local/bin/

Installation in Windows
For Windows, we can download “minikube-windows-amd64.exe” executable. kubectl can be downloaded using:

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/windows/amd64/kubectl.exe

By default, Minikube uses the VirtualBox hypervisor for starting the VM. Using the “--vm-driver” option, we can use other hypervisors like VMware Fusion and Hyper-V.
For this blog, I have used Minikube running on Windows with VirtualBox.
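
As an illustration (the driver names here are taken from the Minikube documentation of this era, so treat the exact names as an assumption), a non-default hypervisor can be selected at start time:

minikube start --vm-driver=hyperv

Omitting the option falls back to the VirtualBox driver.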

Let's check the Minikube version after the installation is complete.

$ minikube version
minikube version: v0.15.0

Let's check the Kubernetes versions supported by Minikube:

$ minikube get-k8s-versions
The following Kubernetes versions are available: 
	- v1.6.0-alpha.0
	- v1.5.2
	- v1.5.1
	- v1.4.5
	- v1.4.3
	- v1.4.2
	- v1.4.1
	- v1.4.0
	- v1.3.7
	- v1.3.6
	- v1.3.5
	- v1.3.4
	- v1.3.3
	- v1.3.0

Kubernetes with Docker

Let's start Minikube with Kubernetes version 1.5.2. By default, Docker is used as the container runtime.

minikube start --kubernetes-version=v1.5.2

Let's list the cluster details:

$ kubectl cluster-info
Kubernetes master is running at https://192.168.99.115:8443
KubeDNS is running at https://192.168.99.115:8443/api/v1/proxy/namespaces/kube-system/
services/kube-dns
kubernetes-dashboard is running at https://192.168.99.115:8443/api/v1/proxy/namespaces
/kube-system/services/kubernetes-dashboard

Let's list the node details:

$ kubectl get nodes
NAME       STATUS    AGE
minikube   Ready     3h

As we can see in the above output, there is only one node, which serves as both master and worker.

To ssh into the node, we can do “minikube ssh”.

Let's start a simple nginx webserver.

kubectl run webserver --image=nginx --port=80
kubectl expose deployment webserver --type=NodePort

The above set of commands starts a Kubernetes deployment, which in turn triggers the creation of a pod and a replica set. The deployment is exposed as a service. Kubernetes creates an IP for each service that can be accessed within the cluster. In the above example, we have used the service type “NodePort”, which allows the service to be accessed from the external world.

Let's look at the deployment, replica set and pod created:

$ kubectl get deployment
NAME        DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
webserver   1         1         1            1           1m

$ kubectl get rs
NAME                   DESIRED   CURRENT   READY     AGE
webserver-1505803560   1         1         1         1m

$ kubectl get pods
NAME                         READY     STATUS    RESTARTS   AGE
webserver-1505803560-wf3sh   1/1       Running   0          1m

Let's look at the service list:

$ kubectl get services
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
kubernetes   10.0.0.1     <none>        443/TCP        5h
webserver    10.0.0.93    <nodes>       80:30195/TCP   1m

In the above output, we can see that the service is exposed using the cluster IP “10.0.0.93” as well as port 30195 on the Kubernetes node.
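
For a quick check from outside the cluster, we can combine the node IP reported by Minikube with the NodePort shown above (port 30195 is specific to this example run, and the “minikube service” helper is assumed to be available in this Minikube version):

curl http://$(minikube ip):30195
minikube service webserver --url

The second command simply prints the node URL of the service without having to look up the port manually.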

If we ssh to the node, we can see the Docker container running:

$ docker ps | grep nginx
fb6a462a7095        nginx                                                        "nginx -g 'daemon off"   4 minutes ago       Up 4 minutes                            k8s_webserver.a9c43075_webserver-1505803560-wf3sh_default_075c2375-f059-11e6-90bf-080027d244a1_a4843218

We can access the Kubernetes dashboard at “nodeip:30000”
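
Instead of building the URL by hand, Minikube can also open the dashboard for us; this is a convenience command, assuming it is available in the installed Minikube version:

minikube dashboard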

As part of cleanup, we can delete the deployment and service.

kubectl delete service webserver
kubectl delete deployment webserver

To delete the cluster, we can do the following:

minikube delete

Kubernetes with Rkt

Rkt is a container runtime from CoreOS. Unlike Docker, Rkt does not have a daemon; systemd manages the container processes.

Now, let's start Minikube with Rkt as the container runtime.

minikube start --kubernetes-version=v1.5.2 --container-runtime=rkt 

To confirm that Rkt is the runtime, we can check the Minikube logs:

minikube logs | grep -i rkt

We can start the nginx service using the same commands that we used earlier:

kubectl run webserver --image=nginx --port=80
kubectl expose deployment webserver --type=NodePort

Now, we can see that a Rkt container is running rather than a Docker container:

$ rkt list | grep nginx
80f54c9e        webserver               registry-1.docker.io/library/nginx:latest    running  52 seconds ago  51 seconds ago  rkt.kubernetes.io:ip4=10.1.0.4, default-restricted:ip4=172.16.28.4

In the above example, the Docker nginx image was used to start the Rkt container; the image format conversion happens automatically.

Kubernetes CRI

Before Kubernetes 1.3, Docker was the only container runtime supported in Kubernetes. As part of the Rktnetes project, Rkt was added as a container runtime in the Kubernetes 1.3 release. The example illustrated in the previous section used Rktnetes.

In the Kubernetes architecture, the apiserver and scheduler run on the master node and the kubelet runs on every worker node. The kubelet talks to the container runtime engine on the worker node to start the containers. With this architecture, there was close interaction between the kubelet and the container runtime, which required the kubelet to be aware of each specific container runtime engine. Changes were needed in the kubelet for every new container runtime, which made code maintenance difficult. In the Kubernetes 1.5 release, CRI (Container Runtime Interface) was introduced to provide a clean interface between the kubelet and the container runtime, allowing any new container runtime to be plugged into Kubernetes easily. A gRPC mechanism was introduced between the kubelet and the container runtime shim layer. gRPC provides two services: the ImageService provides RPCs to pull, inspect and remove images, while the RuntimeService contains RPCs to manage the lifecycle of pods and containers. CRI is available as an experimental feature in Kubernetes 1.5.

The following picture from the CRI release blog shows the interaction between the kubelet and the CRI shim layer:

Following are the container runtimes that are in the process of being integrated with Kubernetes CRI:

  • Docker – This is for the Docker container runtime and is available as an experimental feature in Kubernetes 1.5.
  • Rktlet – This will supersede Rktnetes and will support the Rkt container runtime.
  • OCI – This integrates with the OCI-compatible runc runtime.
  • Hypernetes – This container runtime runs containers inside lightweight VMs.

The following picture from the Hypernetes GitHub page shows how the different container runtimes interact with the kubelet:


To start Minikube with CRI and Docker shim, we can use the corresponding CRI config flag:

$ minikube start --kubernetes-version=v1.5.2 --extra-config=kubelet.EnableCRI=true

To check that CRI is enabled, we can do the following:

$  minikube logs | grep EnableCRI
Feb 11 13:54:44 minikube localkube[3114]: I0211 13:54:44.908881    3114 localkube.go:117] Setting EnableCRI to true on kubelet.

Summary

Having the container runtime as a pluggable model inside Kubernetes is a very important step that allows many container runtimes to thrive. For example, the Hypernetes container runtime runs containers inside small isolated VMs, which might be useful for container applications needing better hardware isolation. The beauty of it lies in the fact that the interface from Kubernetes to start containers remains the same irrespective of the container runtime.

Minikube is a very easy way to get started with Kubernetes. The fact that it runs on Windows, Linux and Mac makes it universally useful. One missing feature is the capability to have multiple nodes. I see that a few folks have made the same request, so this might be on Minikube’s roadmap.

Docker 1.13 Experimental features

Docker 1.13 was released last week. Some of the significant new features include Compose support to deploy Swarm mode services, backward compatibility between Docker client and server versions, Docker system commands to manage the Docker host, and a restructured Docker CLI. In addition to these major features, Docker introduced a bunch of experimental features in the 1.13 release. In every release, Docker introduces a few new experimental features. These are features that are not yet ready for production use. Docker puts out these features in experimental mode so that it can collect feedback from users and make modifications before the feature gets officially released in a later release. In this blog, I will cover the experimental features introduced in Docker 1.13.

Following are the regular features introduced in Docker 1.13:

  • Deploying a Docker stack on a Swarm cluster with Docker Compose.
  • Docker CLI backward compatibility with the Docker daemon. This allows a newer Docker CLI to talk to older Docker daemons.
  • New Docker CLI groupings like “docker container” and “docker image” that collect related commands under a sub-command.
  • Docker system details using “docker system” – this helps in maintaining the Docker host, cleaning up unused resources and getting container usage details.
  • Docker secret management.
  • “docker build” with a compress option for slow connections.

Following are the 5 features introduced in experimental mode in Docker 1.13:

  • An experimental daemon flag to enable experimental features instead of having a separate experimental build.
  • Docker service logs command to view logs for a Docker service. This is needed in Swarm mode.
  • Option to squash image layers to the base image after successful builds.
  • Checkpoint and restore support for Containers.
  • Metrics (Prometheus) output for basic container, image, and daemon operations.

Experimental Daemon flag

Docker released experimental features prior to the 1.13 release as well. In earlier releases, users needed to download a separate experimental build to try out experimental features. To avoid the unnecessary overhead of maintaining different builds, Docker introduced an experimental flag, or option, to the Docker daemon so that users can start the daemon with or without experimental features. With the Docker 1.13 release, the experimental flag is itself in experimental mode.

By default, the experimental flag is turned off. To see the experimental flag, check the “docker version” output:

Server:
 Version:      1.13.0
 API version:  1.25 (minimum version 1.12)
 Go version:   go1.7.3
 Git commit:   49bf474
 Built:        Tue Jan 17 09:50:17 2017
 OS/Arch:      linux/amd64
 Experimental: false
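
A quicker way to check just this field is to use the Go template support in the CLI; the field path below is an assumption based on the structure of the “docker version” output:

docker version --format '{{.Server.Experimental}}'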

To turn on experimental mode, the Docker daemon needs to be restarted with the experimental flag enabled.

Experimental flag in Ubuntu 14.04:
For Ubuntu 14.04, Docker daemon options are specified as part of Upstart system manager. This is how I enabled experimental mode in Ubuntu 14.04:
Change /etc/default/docker:

/etc/default/docker:
DOCKER_OPTS="--experimental=true"

Restart Docker daemon:

sudo service docker restart

Check that experimental mode is turned on by executing “docker version”:

Client:
 Version:      1.13.0
 API version:  1.25
 Go version:   go1.7.3
 Git commit:   49bf474
 Built:        Tue Jan 17 09:50:17 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.0
 API version:  1.25 (minimum version 1.12)
 Go version:   go1.7.3
 Git commit:   49bf474
 Built:        Tue Jan 17 09:50:17 2017
 OS/Arch:      linux/amd64
 Experimental: true

Experimental flag in Ubuntu 16.04:
For Ubuntu 16.04, Docker daemon options are specified as part of systemd system manager. This is how I enabled experimental mode in Ubuntu 16.04:
Edit docker.conf:

# cat /etc/systemd/system/docker.service.d/docker.conf 
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --experimental=true

Restart Docker daemon:

sudo systemctl daemon-reload
sudo systemctl restart docker

Experimental mode with Docker machine:
There are instances where we need to create Docker hosts using Docker machine. Docker machine can be used to create Swarm clusters for development as well as to create Docker hosts in any cloud provider. To set experimental mode using docker-machine, we can use the experimental option as shown below:

docker-machine create --driver virtualbox --engine-opt experimental=true test

It is very nice to have experimental mode present in the default Docker build. The part I am not sure about is whether the presence of experimental features can destabilize base Docker features even when experimental mode is turned off.

Docker service logs

Container debugging starts with looking at the “docker logs” output for the specific container. Docker Swarm mode, along with the Docker service abstraction, was introduced in Docker 1.12. A single Docker service with its associated containers can be spread across multiple nodes. With Docker 1.12, there was no logging at the service level, which made it difficult to debug problems at that level. It is also painful to look at container logs spread over the multiple nodes of a single service. “docker service logs”, introduced in 1.13, provides service-level logging.
Following is my two-node Docker Swarm mode cluster:

$ docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
mpa2rgbjb0b7ijqve3cvo74w3    worker    Ready   Active        
sx035ztm94x9naml9t7pqdm8g *  manager   Ready   Active        Leader

For this example, I have used the sample voting application described here. The application is deployed using Docker compose as shown below.

docker stack deploy --compose-file voting_stack.yml vote

Following are the services running:

$ docker service ls
ID            NAME             MODE        REPLICAS  IMAGE
391v9t5hub74  vote_redis       replicated  2/2       redis:alpine
e5v71i26ah0y  vote_result      replicated  2/2       dockersamples/examplevotingapp_result:after
masj0y4xp90a  vote_visualizer  replicated  1/1       dockersamples/visualizer:stable
q2pip3fudbgb  vote_worker      replicated  0/1       dockersamples/examplevotingapp_worker:latest
tgt7tx6sorje  vote_db          replicated  1/1       postgres:9.4
tmhk9k6ubjz0  vote_vote        replicated  2/2       dockersamples/examplevotingapp_vote:after

Let's check the status of the service “vote_vote”:

$ docker service ps vote_vote
ID            NAME         IMAGE                                      NODE     DESIRED STATE  CURRENT STATE          ERROR  PORTS
cjvzrzc18nta  vote_vote.1  dockersamples/examplevotingapp_vote:after  worker   Running        Running 4 minutes ago         
aqw5yysav42y  vote_vote.2  dockersamples/examplevotingapp_vote:after  manager  Running        Running 4 minutes ago

In the above output, we can see that the service is composed of two containers, with one container running on the manager node and the other on the worker node.

Let's look at the service logs associated with vote_vote:

$ docker service logs vote_vote
vote_vote.1.xsn0m3al4jfz@worker    | [2017-01-21 08:17:47 +0000] [1] [INFO] Starting gunicorn 19.6.0
vote_vote.1.xsn0m3al4jfz@worker    | [2017-01-21 08:17:47 +0000] [1] [INFO] Listening at: http://0.0.0.0:80 (1)
vote_vote.1.xsn0m3al4jfz@worker    | [2017-01-21 08:17:47 +0000] [1] [INFO] Using worker: sync
vote_vote.1.xsn0m3al4jfz@worker    | [2017-01-21 08:17:47 +0000] [9] [INFO] Booting worker with pid: 9
vote_vote.1.xsn0m3al4jfz@worker    | [2017-01-21 08:17:47 +0000] [10] [INFO] Booting worker with pid: 10
vote_vote.1.xsn0m3al4jfz@worker    | [2017-01-21 08:17:47 +0000] [12] [INFO] Booting worker with pid: 12
vote_vote.1.xsn0m3al4jfz@worker    | [2017-01-21 08:17:47 +0000] [11] [INFO] Booting worker with pid: 11
vote_vote.2.tpr51pvw9211@manager    | [2017-01-21 08:17:50 +0000] [1] [INFO] Starting gunicorn 19.6.0
vote_vote.2.tpr51pvw9211@manager    | [2017-01-21 08:17:50 +0000] [1] [INFO] Listening at: http://0.0.0.0:80 (1)
vote_vote.2.tpr51pvw9211@manager    | [2017-01-21 08:17:50 +0000] [1] [INFO] Using worker: sync
vote_vote.2.tpr51pvw9211@manager    | [2017-01-21 08:17:50 +0000] [9] [INFO] Booting worker with pid: 9
vote_vote.2.tpr51pvw9211@manager    | [2017-01-21 08:17:50 +0000] [10] [INFO] Booting worker with pid: 10
vote_vote.2.tpr51pvw9211@manager    | [2017-01-21 08:17:50 +0000] [11] [INFO] Booting worker with pid: 11
vote_vote.2.tpr51pvw9211@manager    | [2017-01-21 08:17:50 +0000] [12] [INFO] Booting worker with pid: 12

In the above output, we can see the logs associated with both containers of the service.

Docker squash image layers

A Docker container image consists of multiple layers that are combined into a single image using a union filesystem. Each instruction in the Dockerfile results in a separate image layer. Sharing image layers between different container images provides storage efficiency. In certain scenarios, however, the presence of multiple image layers adds unnecessary overhead. Another use case for squashing is that some users prefer not to expose the layers for security reasons. With the Docker squash option, all new image layers are combined with the parent to reduce the size of the image. There was a discussion about squashing to the parent versus squashing to a scratch image; the current decision is to squash to the parent to allow for base image reuse. The image layers are still preserved in the cache to keep Docker image builds fast on the build machine.

Let's take a simple container image built from busybox and illustrate how squash works.

Dockerfile:

$ cat Dockerfile 
FROM busybox
RUN echo hello > /hello
RUN echo world >> /hello
RUN touch remove_me /remove_me
ENV HELLO world
RUN rm /remove_me

Let's first build a container image with the default options, which do not enable squashing. We can also save the image into a tar file.

docker build -t nosquash .
docker save nosquash -o nosquashimage.tar

Let's look at the image layers:

$ docker history nosquash
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
0b3e1b58bdfa        3 weeks ago         /bin/sh -c rm /remove_me                        0 B                 
2c8090cbf777        3 weeks ago         /bin/sh -c #(nop)  ENV HELLO=world              0 B                 
5e6bc1925f7d        3 weeks ago         /bin/sh -c touch remove_me /remove_me           0 B                 
a90740ad3307        3 weeks ago         /bin/sh -c echo world >> /hello                 12 B                
5b4e51667cd1        3 weeks ago         /bin/sh -c echo hello > /hello                  6 B                 
7968321274dc        4 weeks ago         /bin/sh -c #(nop)  CMD ["sh"]                   0 B                 
<missing>           4 weeks ago         /bin/sh -c #(nop) ADD file:707e63805c0be1a...   1.11 MB 

As we can see in the above output, each line in the Dockerfile is represented by a Docker image layer.

Following are the contents of “manifest.json”, which shows the image layers. We get manifest.json after we untar “nosquashimage.tar”.

$ cat manifest.json  | jq .
[
  {
    "Layers": [
      "59c553be1ded32f51e74244c9c54ca27050fb6b843a08b8b1edc8d7205690b7f/layer.tar",
      "8c034db2f83411ec1a40efaba29b3239845e98cd1a0d7380d7b2c71a2c9a9947/layer.tar",
      "436c6ca87d2cf07e3825b7508c4703bb8f5db8f85c60b1cef1b4677b68856021/layer.tar",
      "4c5cb32265ceca6d71152063e0de14dc915fdeddd2882f3c1171ca06327d70da/layer.tar",
      "3702c68ae35ec62ab6e2cc3bb4748a6a05511f0f6492e03361a0038af315a8b6/layer.tar"
    ],
    "RepoTags": [
      "nosquash:latest"
    ],
    "Config": "0b3e1b58bdfa5d72b218b25d01c05b718680f80a346a0d32067150bf256dc47a.json"
  }
]

We can look at the layers by inspecting the image as well.

$ docker inspect nosquash | grep -A 6 Layers
            "Layers": [
                "sha256:38ac8d0f5bb30c8b742ad97a328b77870afaec92b33faf7e121161bc78a3fec8",
                "sha256:6fad774884880a017688b2595c0f262451fd411eab78e3055bfb4f9ec2b647b2",
                "sha256:c552187b79dc6cb16254ea03a9bb1da4555d224958a8a84c390fa1271ba818d1",
                "sha256:0b33cdff4f88daba608841eb711f3aca00dd14bdce17f83ef87e7f8dc38cdc67",
                "sha256:1ef2f5783216dcaf10da9c2041002155f74044e62c6aaf150eb36e819881b776"
            ]

Let's look at the layers in the parent busybox image:

$ docker inspect busybox:latest | grep -A 5 Layers
            "Layers": [
                "sha256:38ac8d0f5bb30c8b742ad97a328b77870afaec92b33faf7e121161bc78a3fec8"
            ]

From the above output, we can see that the parent busybox image has one layer; the remaining four layers are from the new image we created.

To illustrate the difference when squash is enabled, let's build the image with the squash option:

docker build --squash -t squash .
docker save squash -o squashimage.tar

If we look at the container image layers, we can see that all the new layers have been combined with the parent, as shown below:

$ docker history squash
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
71219000b4b3        3 weeks ago                                                         12 B                merge sha256:0b3e1b58bdfa5d72b218b25d01c05b718680f80a346a0d32067150bf256dc47a to sha256:7968321274dc6b6171697c33df7815310468e694ac5be0ec03ff053bb135e768
<missing>           3 weeks ago         /bin/sh -c rm /remove_me                        0 B                 
<missing>           3 weeks ago         /bin/sh -c #(nop)  ENV HELLO=world              0 B                 
<missing>           3 weeks ago         /bin/sh -c touch remove_me /remove_me           0 B                 
<missing>           3 weeks ago         /bin/sh -c echo world >> /hello                 0 B                 
<missing>           3 weeks ago         /bin/sh -c echo hello > /hello                  0 B                 
<missing>           4 weeks ago         /bin/sh -c #(nop)  CMD ["sh"]                   0 B                 
<missing>           4 weeks ago         /bin/sh -c #(nop) ADD file:707e63805c0be1a...   1.11 MB           

If we look at the size of the squashed image, we can see that it is smaller than the unsquashed image:

$ ls -l */*.tar
-rw------- 1 sreeni sreeni 1341952 Jan 22 22:09 nosquash/nosquashimage.tar
-rw------- 1 sreeni sreeni 1327616 Jan 22 22:09 squash/squashimage.tar
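
The same comparison can be made on the images themselves rather than on the saved tar files, using the tags from the build commands above:

docker images | grep squash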

If we look at the layer output, we can see that there are only two layers present:

$ cat manifest.json |jq .
[
  {
    "Layers": [
      "4a34e9cef720c233c8b544b494dbb553536a6b5bbf4441fb62b30b5cf2bad895/layer.tar",
      "a1cf4d557ddf29e39f0d269bb93b0132cf1a258087aa73d27daa9db06207bd7a/layer.tar"
    ],
    "RepoTags": [
      "squash:latest"
    ],
    "Config": "71219000b4b3155ed6544b2b83d0b881f91d1b14372b4f61b8674d8c1b65b6aa.json"
  }
]

We can also look at the layers by inspecting the image.

$ docker inspect squash | grep -A 3 Layers
            "Layers": [
                "sha256:38ac8d0f5bb30c8b742ad97a328b77870afaec92b33faf7e121161bc78a3fec8",
                "sha256:c552187b79dc6cb16254ea03a9bb1da4555d224958a8a84c390fa1271ba818d1"
            ]

From the above example, we can see that the nosquash image has five layers, of which one is the parent image. The squash image has two layers, of which one is the parent image. We have compressed the four new layers into one using the squash option.

The “docker history” output and “docker inspect” show a different number of layers, since history shows all commands as separate layers while some of the commands get combined into a single layer. To check the actual number of layers, we need to use “docker inspect” or look at “manifest.json”. In the above example, we have seen the “<missing>” tag in the docker history output. The “<missing>” tag is not an issue. It appears because, post Docker 1.10, image IDs and layer IDs mean different things, and the image ID is not preserved if the image is not built locally. There is a detailed blog that explains this very clearly.

Checkpoint and restore

The checkpoint and restore feature allows Docker container runtime state to be persisted. This is different from Docker persistent storage using volumes: volumes are used for persisting files and databases, while checkpoint and restore persists the process state within the container. This allows the container state to be preserved when the host is rebooted, when the container is moved across hosts, or when the container is stopped and restarted.
The checkpoint and restore experimental feature uses a tool called CRIU (Checkpoint/Restore In Userspace). CRIU needs to be installed to try out this feature.

I used the following steps to install CRIU in Ubuntu 16.04:

apt-get update
apt-get install libnet1-dev
git clone https://github.com/xemul/criu
sudo apt-get install --no-install-recommends git build-essential libprotobuf-dev libprotobuf-c0-dev protobuf-c-compiler protobuf-compiler python-protobuf libnl-3-dev libpth-dev pkg-config libcap-dev asciidoc
apt-get install asciidoc xmlto
cd criu
make
make install

To check that CRIU is installed correctly, we can try the following:

# criu check
Warn  (criu/autofs.c:79): Failed to find pipe_ino option (old kernel?)
Looks good.

To illustrate the feature, we can start a busybox container that runs a loop printing numbers. When we use checkpoint and restore, we can see that the container resumes from the state where it left off:

docker run --security-opt=seccomp:unconfined --name cr -d busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
docker checkpoint create cr checkpoint1
docker start --checkpoint checkpoint1 cr

We can use docker logs to confirm that the restarted container starts from the saved state.
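
A minimal way to verify this, using the container name from the commands above:

docker logs cr | tail -5

If the restore worked, the counter continues from the value saved in checkpoint1 instead of restarting at 0.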

For VMs, vMotion is a very important feature that allows VM movement without stopping the VM. It is not clear if seamless container movement across hosts is as important, since applications using containers are expected to spawn new containers to handle failures rather than preserve runtime state within a container. There are still scenarios where checkpoint and restore functionality for containers can be useful.

Docker metrics in Prometheus format

Prometheus is an open source monitoring solution. In this experimental feature, Docker has added Prometheus metrics output for basic container, image and daemon operations. Many more container metrics will be exposed in the future.

There are two components to getting Prometheus working. The first is an exporter, which runs on each node and exposes metrics in Prometheus format. The second is the Prometheus server, which scrapes the metrics from each node and crunches the data into meaningful content. Prometheus can also integrate with monitoring visualizers like Grafana, which can read the data exported by Prometheus servers.

I used the following systemd conf file to enable Prometheus in my Ubuntu 16.04 system:

# cat /etc/systemd/system/docker.service.d/docker.conf 
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --experimental=true --metrics-addr=0.0.0.0:4999

Docker daemon has to be restarted after this.

sudo systemctl daemon-reload
sudo systemctl restart docker

Following is sample output from the metrics endpoint that is exposed on port 4999 of the host machine:

# curl localhost:4999/metrics | more
# HELP engine_daemon_container_actions_seconds The number of seconds it takes to process each container action
# TYPE engine_daemon_container_actions_seconds histogram
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.005"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.01"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.025"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.05"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.1"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.25"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.5"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="1"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="2.5"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="5"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="10"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="+Inf"} 1

To export this data into Prometheus, let's start the Prometheus container with the following configuration file. In the config file below, we specify the targets that Prometheus needs to scrape along with some options. The first target is Prometheus itself and the second is the Docker metrics endpoint that we exposed above.

# A scrape configuration scraping a Node Exporter and the Prometheus server
# itself.
scrape_configs:
  # Scrape Prometheus itself every 5 seconds.
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  # Scrape the Node Exporter every 5 seconds.
  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['139.59.56.66:4999']

At this point, we can start the Prometheus container using the above config file:

docker run -d -p 9090:9090 -v ~/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus -config.file=/etc/prometheus/prometheus.yml -storage.local.path=/prometheus -storage.local.memory-chunks=10000

The following picture shows the targets seen from the Prometheus endpoint on port 9090. One of the targets is Prometheus itself and the other is the Docker metrics endpoint.


The following picture shows the count of Docker daemon events like container create and delete, in console format.


The following picture shows the same count of Docker daemon events in graph format.


For fancier dashboards, we can use Grafana and connect it to Prometheus. In the above example, we have used Prometheus to monitor a standalone Docker node. Prometheus can also be used with Docker Swarm clusters. The following blog covers an approach to integrating Prometheus with Docker Swarm, where Prometheus monitors all nodes in the Swarm cluster.
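
As a rough sketch of that integration (the image name and port are the Grafana defaults; the data source URL assumes Prometheus is reachable on port 9090 as configured above):

docker run -d --name grafana -p 3000:3000 grafana/grafana

Grafana is then available on port 3000, and Prometheus can be added as a data source pointing at http://<host-ip>:9090.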

At this point, the Prometheus metrics supported by Docker are minimal. There is a plan to support metrics for all Docker subsystems in future.

Docker in Docker and play-with-docker

For folks who want to get started with Docker, there is the initial hurdle of installing Docker. Even though Docker has made it extremely simple to install on operating systems like Linux, Windows and Mac, the installation step still keeps some folks from getting started. With Play with Docker, that problem goes away. Play with Docker provides a web-based interface to create multiple Docker hosts and run containers. The project was started by Docker captain Marcos Nils and is open source. Users can run regular containers or build a Swarm cluster between the Docker hosts and create container services on the Swarm cluster. The application can also be installed on a local machine. This project got me interested in understanding the internals of the Docker hosts used within the application; it turns out they are implemented as Docker in Docker (Dind) containers. In this blog, I have tried to cover some details on Dind and Play with Docker.

Docker in Docker(Dind)

Docker in Docker (Dind) allows the Docker engine to run as a container inside Docker. This link is the official repository for Dind. When a new Docker version is released, a corresponding Dind version also gets released. This link from Jerome is an excellent reference on Docker in Docker that explains the issues with Dind, the cases where Dind can be used and the cases where it should not be used.
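
As a quick sketch of what running Dind looks like (the image tag and flags follow the conventions in the official dind repository, so treat the exact tag as an assumption):

docker run --privileged -d --name dind-test docker:dind
docker exec dind-test docker version

The --privileged flag is required because the inner Docker engine needs extended capabilities; the second command runs the Docker client that ships inside the dind container against the inner daemon.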

Following are the two primary scenarios where Dind can be needed:

Continue reading Docker in Docker and play-with-docker

Vault – Use cases

This blog is a continuation of my previous blog on Vault. In the first blog, I covered an overview of Vault. In this blog, I will cover some Vault use cases that I tried out.

Pre-requisites:

Install and start Vault

I have used Vault version 0.6 for the examples here. Vault can be run either in development or production mode. In development mode, Vault is unsealed by default and secrets are stored only in memory. Vault in production mode needs manual unsealing and supports backends like Consul and S3.

Start Vault server:

The following command starts the Vault server in development mode. We need to note down the root token that will be used later.
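
For reference, Vault's dev-mode server is typically started with the command below; treat this as the standard invocation rather than the exact one from the original post, and note that the output prints the unseal key and root token:

vault server -dev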

Continue reading Vault – Use cases

Service Discovery and Load balancing Internals in Docker 1.12

The Docker 1.12 release revamped its support for service discovery and load balancing. Prior to the 1.12 release, support for service discovery and load balancing was pretty primitive in Docker. In this blog, I cover the internals of service discovery and load balancing in Docker 1.12, including DNS-based load balancing, VIP-based load balancing and the routing mesh.
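
To set the stage, the two load-balancing modes can be selected when a service is created; the commands below are a sketch using a plain nginx image, and the service names are illustrative:

# VIP based load balancing (default), with the routing mesh via a published port
docker service create --name web-vip --replicas 2 -p 8080:80 nginx

# DNS round robin based load balancing; no VIP is allocated in this mode
docker service create --name web-dnsrr --endpoint-mode dnsrr --replicas 2 nginx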

Technology used

Docker service discovery and load balancing use the iptables and IPVS features of the Linux kernel. iptables is a packet filtering technology available in the Linux kernel that can be used to classify, modify and make decisions based on packet content. IPVS is a transport-level load balancer available in the Linux kernel.

Sample application

Following is the sample application used in this blog:

Continue reading Service Discovery and Load balancing Internals in Docker 1.12

Comparing Swarm, Swarmkit and Swarm Mode

One of the big features in the Docker 1.12 release is Swarm mode. Docker has had Swarm available for container orchestration since the 1.6 release. Docker released Swarmkit as an open source project for orchestrating distributed systems a few weeks before the Docker 1.12 (RC) release. I had some confusion between these three projects. In this blog, I have tried to put my perspective on the similarities and differences between these three software components. I have also created a sample application and deployed it using the three approaches, which makes them easier to compare.

Docker Swarm mode is fundamentally different from Swarm, and it is confusing to use the same name. It would have been good if Docker had given it a different name. Another point adding to the confusion is that native Swarm functionality continues to be supported in the Docker 1.12 release to preserve backward compatibility. In this blog, I use the term “Swarm” to refer to the traditional Swarm functionality, “SwarmNext” for the new Swarm mode added in 1.12, and “Swarmkit” for the plumbing open source orchestration project.
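
As a tiny illustration of the SwarmNext workflow, Swarm mode is enabled directly from the Docker CLI with no separate Swarm containers or external KV store; the service name below is illustrative:

docker swarm init
docker service create --name web --replicas 2 -p 8080:80 nginx
docker service ls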

Swarm, SwarmNext and Swarmkit

Following table compares Swarm and SwarmNext:

Continue reading Comparing Swarm, Swarmkit and Swarm Mode