Service Discovery and Load Balancing Internals in Docker 1.12

The Docker 1.12 release has revamped its support for service discovery and load balancing. Prior to 1.12, Docker's support for service discovery and load balancing was fairly primitive. In this blog, I cover the internals of service discovery and load balancing in Docker 1.12: DNS based load balancing, VIP based load balancing and routing mesh.

Technology used

Docker service discovery and load balancing use the iptables and ipvs features of the Linux kernel. iptables is a packet filtering technology available in the Linux kernel; it can be used to classify, modify and make decisions based on packet content. ipvs (IP Virtual Server) is a transport-level load balancer available in the Linux kernel.
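
As a quick sanity check (a hedged aside, not part of the original setup), you can verify that the host kernel exposes both features before going further:

# Check that the ip_vs kernel module is loaded
lsmod | grep ip_vs
# Confirm iptables is usable by listing the mangle table, which Docker uses for packet marking
sudo iptables -t mangle -nL | head
# ipvsadm is the userspace tool used later in this post to inspect ipvs state
sudo ipvsadm -Ln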

Sample application

Following is the sample application used in this blog:

[Image: sample application - a single-task "client" service accessing a multi-task "vote" service]

The “client” service has 1 client container task and the “vote” service has multiple vote container tasks. The client service is used to access the multi-container voting service. We will deploy this application in a multi-node Swarm cluster, access the voting application from the client container, and also expose the voting server to the host machine for external load balancing.
I have used “smakam/myubuntu:v4” as the client container; it is a base Ubuntu container with tools like dig and curl installed on top to demonstrate the networking internals.
I have used Docker’s “instavote/vote” container as the voting server. It shows the container ID in its output when accessed from the client, which makes it easier to tell which specific voting server container responded to the client request.

Pre-requisites

I have used a custom boot2docker image with ipvsadm and a specific Docker version (1.12 RC4) installed. To build the custom boot2docker image, please follow the procedure here in my earlier blog.

The following output shows the 2-node cluster running in Swarm mode. Node1 is acting as both master and worker; node2 is acting only as a worker.

$ docker node ls
ID                           HOSTNAME  MEMBERSHIP  STATUS  AVAILABILITY  MANAGER STATUS
259ebbe62k8t9qd9ttbiyq31h *  node1     Accepted    Ready   Active        Leader
631w24ql3bahaiypblrgp4fa4    node2     Accepted    Ready   Active  

We need to install nsenter in the boot2docker image. nsenter is a tool that allows us to enter network namespaces and inspect them in more detail, which is useful for debugging. I have used the procedure here to install nsenter using a container on the Docker nodes.
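
One common way to get nsenter onto a boot2docker host is via the jpetazzo/nsenter image, which copies a static nsenter binary into a host directory mounted at /target (this is that project's documented usage and an assumption on my part, not necessarily the exact procedure linked above):

# Copy the nsenter binary from the container into /usr/local/bin on the host
docker run --rm -v /usr/local/bin:/target jpetazzo/nsenter
# Verify the install
nsenter --version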

Create overlay network:

Use the following command on the master node to create the overlay network “overlay1”:

docker network create --driver overlay overlay1

DNS based load balancing

Following picture describes how DNS based load balancing works for this application.

[Image: DNS based load balancing - the embedded DNS returns the list of "vote" container IPs to the client]

A DNS server is embedded inside the Docker engine. Docker DNS resolves the service name “vote” and returns the list of container IP addresses in random order. Clients normally pick the first IP, so load balancing can happen between the different server instances.
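
Inside a container attached to a user-defined network, the embedded DNS server listens on 127.0.0.11 (a Docker convention I am assuming here; the outputs below simply use the service name), so it can also be queried explicitly:

# Ask Docker's embedded DNS server directly for the "vote" service records
dig @127.0.0.11 vote +short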

Following commands create 2 services with DNS based load balancing.

docker service create --endpoint-mode dnsrr --replicas 1 --name client --network overlay1 smakam/myubuntu:v4 ping docker.com
docker service create --endpoint-mode dnsrr --name vote --network overlay1 --replicas 2 instavote/vote

Following output shows the services and tasks running:

$ docker service ls
ID            NAME    REPLICAS  IMAGE               COMMAND
4g2szdso0n2b  client  1/1       smakam/myubuntu:v4  ping docker.com
bnh59o28ckkl  vote    2/2       instavote/vote      
docker@node1:~$ docker service tasks client
ID                         NAME      SERVICE  IMAGE               LAST STATE           DESIRED STATE  NODE
d8m0snl88oc5b4r0b4tbriewi  client.1  client   smakam/myubuntu:v4  Running 2 hours ago  Running        node1
docker@node1:~$ docker service tasks vote
ID                         NAME    SERVICE  IMAGE           LAST STATE           DESIRED STATE  NODE
7irycowzbryx0rglin0wfa670  vote.1  vote     instavote/vote  Running 2 hours ago  Running        node2
aqwopxnc8eg4qx9nqwqj099es  vote.2  vote     instavote/vote  Running 2 hours ago  Running        node2

Following output shows Containers running in node1 and node2:

docker@node1:~$ docker ps
CONTAINER ID        IMAGE                COMMAND             CREATED             STATUS              PORTS               NAMES
11725c3bfec0        smakam/myubuntu:v4   "ping docker.com"   2 hours ago         Up 2 hours                              client.1.d8m0snl88oc5b4r0b4tbriewi
docker@node2:~$ docker ps
CONTAINER ID        IMAGE                   COMMAND                  CREATED             STATUS              PORTS               NAMES
8d9b93f6847a        instavote/vote:latest   "gunicorn app:app -b "   2 hours ago         Up 2 hours          80/tcp              vote.1.7irycowzbryx0rglin0wfa670
6be7dc65655b        instavote/vote:latest   "gunicorn app:app -b "   2 hours ago         Up 2 hours          80/tcp              vote.2.aqwopxnc8eg4qx9nqwqj099es

Let’s log in to the client container and resolve the service name “vote” using dig. As we can see below, “vote” resolves to 10.0.0.3 and 10.0.0.4.

# dig vote
.
.
;; ANSWER SECTION:
vote.			600	IN	A	10.0.0.3
vote.			600	IN	A	10.0.0.4
.
.

The following example shows that pings to the “vote” service resolve to 10.0.0.3 and 10.0.0.4 alternately:

# ping -c1 vote
PING vote (10.0.0.4) 56(84) bytes of data.
64 bytes from 10.0.0.4: icmp_seq=1 ttl=64 time=24.4 ms

--- vote ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 24.481/24.481/24.481/0.000 ms
root@11725c3bfec0:/# ping -c1 vote
PING vote (10.0.0.3) 56(84) bytes of data.
64 bytes from 10.0.0.3: icmp_seq=1 ttl=64 time=44.9 ms

--- vote ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 44.924/44.924/44.924/0.000 ms

If we use curl, however, we can see that the request always goes to the same single container.

# curl vote  | grep -i "container id"
          Processed by container ID 6be7dc65655b
root@11725c3bfec0:/# curl vote  | grep -i "container id"
          Processed by container ID 6be7dc65655b

This is because of the DNS client implementation, as mentioned in this blog and in RFC 3484. In this example, the client container IP is 10.0.0.2 and the 2 server container IPs are 10.0.0.3 and 10.0.0.4.
The following output shows the IPs of the 2 containers along with their container IDs.

docker@node2:~$ docker inspect 8d9b93f6847a | grep IPv4
                        "IPv4Address": "10.0.0.4"
docker@node2:~$ docker inspect 6be7dc65655b | grep IPv4
                        "IPv4Address": "10.0.0.3"

Following are the binary representations:

10.0.0.2 - "..010"
10.0.0.3 - "..011"
10.0.0.4 - "..100"

Compared to 10.0.0.4, 10.0.0.3 shares a longer common prefix with the client IP 10.0.0.2, so the curl request always resolves to 10.0.0.3: the DNS client used by curl reorders the response received from the DNS server according to RFC 3484's longest-prefix-match rule. The 10.0.0.3 container IP maps to container ID 6be7dc65655b. Because of this, load balancing does not work properly.
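
One way to see this reordering is to compare the raw answer from dig with the order returned by the glibc resolver, which applies the RFC 3484 sorting that curl picks up via getaddrinfo(). This is a hedged illustration; the exact ordering depends on the resolver implementation in the image:

# Raw answer order from the embedded DNS server (shuffled on every query)
dig vote +short
# Address order after glibc sorting, which is what curl actually connects to
getent ahosts vote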

DNS based load balancing has the following issues:

  • Some applications cache the DNS hostname-to-IP mapping. This causes applications to time out when the mapping changes.
  • A non-zero DNS TTL causes a delay before DNS entries reflect the latest state.
  • Depending on the client implementation, DNS based load balancing does not balance load properly. This is explained in this blog and in the example in the section above.

VIP based Load balancing

VIP based load balancing overcomes some of the issues of DNS based load balancing. In this approach, each service has a virtual IP address, and this IP address maps to the multiple container IP addresses associated with that service. The service IP does not change even when containers associated with the service die and restart.

Following picture shows how VIP based load balancing works for this application.

[Image: VIP based load balancing - the service name resolves to a VIP, which iptables/ipvs load balance to the "vote" containers]

DNS resolves the service name “vote” to the service IP (VIP). Using iptables and ipvs, traffic to the VIP gets load balanced to the 2 backend “vote” containers.
The following commands start the 2 services in VIP mode:

docker service create --replicas 1 --name client --network overlay1 smakam/myubuntu:v4 ping docker.com
docker service create --name vote --network overlay1 --replicas 2 instavote/vote

The following commands show the 2 services and their service IPs:

docker@node1:~$ docker service inspect --format {{.Endpoint.VirtualIPs}}  vote
[{bhijn66xu7jagzzhfjdsg68ba 10.0.0.4/24}]
docker@node1:~$ docker service inspect --format {{.Endpoint.VirtualIPs}}  client
[{bhijn66xu7jagzzhfjdsg68ba 10.0.0.2/24}]

The following command shows the DNS mapping of service name to IP. In the output below, we see that the service name “vote” is mapped to the VIP 10.0.0.4.

root@9b7e8d8ce531:/# dig vote

;; ANSWER SECTION:
vote.			600	IN	A	10.0.0.4

Traffic to the service IP 10.0.0.4 gets load balanced to the 2 containers using the Linux kernel's iptables and ipvs: iptables marks the packets and ipvs does the load balancing.
To demonstrate this, we need to enter the container's network namespace using nsenter. For that, we first need to find the network namespace.
Following are the network namespaces in node1:

root@node1:/home/docker# cd /var/run/docker/netns/
root@node1:/var/run/docker/netns# ls
1-1uwhvu7c4f  1-bhijn66xu7  934e0fdc377d  f15968a2d0e4

The first 2 namespaces (prefixed with “1-”) are for the overlay networks and the remaining ones are for the containers. The following command sequence helps find the client container's network namespace:

root@node1:/var/run/docker/netns# docker ps
CONTAINER ID        IMAGE                COMMAND             CREATED             STATUS              PORTS               NAMES
9b7e8d8ce531        smakam/myubuntu:v4   "ping docker.com"   37 minutes ago      Up 37 minutes                           client.1.cqso2lzjvkymjdi8xp0zg6zne
root@node1:/var/run/docker/netns# docker inspect 9b7e8d8ce531 | grep -i sandbox
            "SandboxID": "934e0fdc377d7a6f597beb3e43f06b9d68b74b2e7746df8be70693664c03cc28",

The SandboxID identifies the client container's network namespace; note that the namespace file name (934e0fdc377d) matches the first 12 characters of the SandboxID.

Using the following command, we can enter the client container's network namespace:

root@node1:/var/run/docker/netns# nsenter --net=934e0fdc377d sh

Now we can see the iptables mangle rules and the ipvs output. I have pasted the relevant sections of the iptables output below.

root@node1:/var/run/docker/netns# iptables -nvL -t mangle
Chain OUTPUT (policy ACCEPT 2265 packets, 189K bytes)
 pkts bytes target     prot opt in     out     source               destination         
   14   880 MARK       all  --  *      *       0.0.0.0/0            10.0.0.4             MARK set 0x117
    0     0 MARK       all  --  *      *       0.0.0.0/0            10.0.0.2             MARK set 0x10b
    0     0 MARK       all  --  *      *       0.0.0.0/0            10.0.0.2             MARK set 0x116

Chain POSTROUTING (policy ACCEPT 2251 packets, 188K bytes)
 pkts bytes target     prot opt in     out     source               destination         
root@node1:/var/run/docker/netns# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  267 rr
  -> 10.0.0.3:0                   Masq    1      0          0         
FWM  278 rr
  -> 10.0.0.3:0                   Masq    1      0          0         
FWM  279 rr
  -> 10.0.0.5:0                   Masq    1      0          0         
  -> 10.0.0.6:0                   Masq    1      0          0         

Packets destined to the service IP 10.0.0.4 get a firewall mark of 0x117 (decimal 279) via the iptables OUTPUT chain in the mangle table. ipvs matches on this mark (FWM 279) and load balances the traffic to the containers 10.0.0.5 and 10.0.0.6, as shown above.
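
The link between the two outputs is simply the same mark value in different bases; a quick hedged check from inside the same namespace:

# 0x117 from the iptables MARK rule is 279 decimal, the FWM value ipvs matches on
printf '%d\n' 0x117
# List the ipvs firewall-mark services and their real servers
ipvsadm -L -n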

The following output inside the client container shows access to the service name “vote” getting load balanced between the 2 containers. In this case, we don't see the issue seen in the DNS endpoint example.

root@9b7e8d8ce531:/# curl vote  | grep -i "container id"
          Processed by container ID a53dbd51d90a
root@9b7e8d8ce531:/# curl vote  | grep -i "container id"
          Processed by container ID 495f97274126

Following output shows the IP address corresponding to the 2 container IDs mentioned above:

docker@node2:~$ docker inspect 495f97274126 | grep IPv4
                        "IPv4Address": "10.0.0.6"
docker@node2:~$ docker inspect a53dbd51d90a | grep IPv4
                        "IPv4Address": "10.0.0.5"

Routing mesh

With routing mesh, a published service port gets exposed on all the nodes in the Swarm cluster. Docker 1.12 creates the “ingress” overlay network to achieve this. All nodes become part of the “ingress” overlay network by default, via an ingress sandbox network namespace created on each node.
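
The ingress network shows up next to the other networks on every node; a hedged illustration (IDs will obviously differ):

# "ingress" (overlay) and "docker_gwbridge" are created automatically in Swarm mode
docker network ls
# Inspect it to see the subnet (10.255.0.0/16 in this post's outputs) and attached containers
docker network inspect ingress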

Following picture describes how Routing mesh does load balancing:

[Image: routing mesh - traffic to host port 8080 on any node is sent to the ingress sandbox, which load balances it to the "vote" containers]

The first step is mapping the host IP and published port to the ingress sandbox IP. iptables rules along with ipvs inside the sandbox take care of load balancing the service between the 2 voting containers. The ingress sandbox network namespace resides on every node of the cluster and assists the routing mesh feature by load balancing the host-mapped port to the backend containers.
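
The ingress sandbox has its own namespace under /var/run/docker/netns on each node. The name “ingress_sbox” below is what Docker 1.12 typically uses and is an assumption on my part rather than something shown in the outputs of this post:

# The ingress sandbox namespace lives alongside the container namespaces
ls /var/run/docker/netns/
# It can be entered the same way we entered the client container's namespace earlier
nsenter --net=/var/run/docker/netns/ingress_sbox sh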

Following picture shows the mapping between Sandbox, Containers and the networks of each node:

[Image: per-node view of the sandbox, the containers, and the "ingress", "overlay1" and "docker_gwbridge" networks]

In the picture above, we can see that the sandboxes and the “vote” containers are part of the “ingress” network, which is used for the routing mesh. The “client” and “vote” containers are part of the “overlay1” network, which is used for internal load balancing. All containers are part of the default “docker_gwbridge” network.
The following commands create the 2 services, with the host port of the “vote” service getting exposed across all Docker nodes using the routing mesh.

docker service create --replicas 1 --name client --network overlay1 smakam/myubuntu:v4 ping docker.com
docker service create --name vote --network overlay1 --replicas 2 -p 8080:80 instavote/vote

The following commands show where the tasks of the two services are running:

docker@node1:~$ docker service tasks vote
ID                         NAME    SERVICE  IMAGE           LAST STATE             DESIRED STATE  NODE
ahacxgihns230k0cceeh44rla  vote.1  vote     instavote/vote  Running 8 minutes ago  Running        node2
0x1zqm8vioh3v3yxkkwiwy57k  vote.2  vote     instavote/vote  Running 8 minutes ago  Running        node2
docker@node1:~$ docker service tasks client
ID                         NAME      SERVICE  IMAGE               LAST STATE             DESIRED STATE  NODE
98ssczk1tenrq0pgg5jvqh0i7  client.1  client   smakam/myubuntu:v4  Running 8 minutes ago  Running        node1

The following iptables NAT rule shows that host traffic incoming on port 8080 is DNAT'ed to the ingress sandbox IP (172.18.0.2) inside node1:

root@node1:/home/docker# iptables -nvL -t nat
Chain DOCKER-INGRESS (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8080 to:172.18.0.2:8080
   74  4440 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0           

We need to enter the sandbox to check the iptables and ipvs rules there, which can again be done using nsenter. The following outputs are from inside the sandbox of node1.

Following is the iptables mangle rule that sets a mark of 0x101 (decimal 257) for packets destined to port 8080.

root@node1:/var/run/docker/netns# iptables -nvL -t mangle
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 MARK       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8080 MARK set 0x101

Following is the ipvsadm output showing that any packet with a firewall mark of 257 is load balanced to the containers with IP addresses 10.255.0.6 and 10.255.0.7. ipvs masquerade mode is used here, whereby the destination IP address gets rewritten.

root@node1:/var/run/docker/netns# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  257 rr
  -> 10.255.0.6:0                 Masq    1      0          0         
  -> 10.255.0.7:0                 Masq    1      0          0         

The following output shows load balancing working when the published port 8080 is accessed from node1:

docker@node1:~$ curl 192.168.99.102:8080  | grep -i "container id"
          Processed by container ID d05886b1dfcd
docker@node1:~$ curl 192.168.99.102:8080  | grep -i "container id"
          Processed by container ID ee7647df2b14

The following output shows load balancing working when the published port 8080 is accessed from node2:

docker@node2:~$ curl 192.168.99.102:8080  | grep -i "container id"
          Processed by container ID d05886b1dfcd
docker@node2:~$ curl 192.168.99.102:8080  | grep -i "container id"
          Processed by container ID ee7647df2b14

Issues found

I was not able to get routing mesh working with a DNS based service endpoint. I tried to create the services with a DNS service endpoint and a published port using the commands below:

docker service create --endpoint-mode dnsrr --replicas 1 --name client --network overlay1 smakam/myubuntu:v4 ping docker.com
docker service create --endpoint-mode dnsrr --name vote --network overlay1 --replicas 2 -p 8080:80 instavote/vote

The service got stuck in the Init state. Based on a discussion with the Docker team, I understood that this is not yet supported and that it will be blocked in the official Docker 1.12 release. There are plans to add this feature in a later release.

Comparing Swarm, Swarmkit and Swarm Mode

One of the big features in the Docker 1.12 release is Swarm mode. Docker has had Swarm available for container orchestration since the 1.6 release. Docker released Swarmkit as an open source project for orchestrating distributed systems a few weeks before the Docker 1.12 (RC) release. I had some confusion between these three projects. In this blog, I have tried to give my perspective on the similarities and differences between these three software components. I have also created a sample application and deployed it using the three approaches, which makes them easier to compare.

Docker Swarm mode is fundamentally different from Swarm, and using the same name Swarm is confusing. It would have been good if Docker had given it a different name. Another point adding to the confusion is that the older Swarm functionality will continue to be supported in the Docker 1.12 release in order to preserve backward compatibility. In this blog, I use the term “Swarm” to refer to traditional Swarm functionality, “SwarmNext” for the new Swarm mode added in 1.12, and “Swarmkit” for the plumbing open source orchestration project.

Swarm, SwarmNext and Swarmkit

Following table compares Swarm and SwarmNext:

Swarm | SwarmNext
----- | ---------
Separate from the Docker engine and can run as a container | Integrated inside the Docker engine
Needs an external KV store like Consul or etcd | No need for a separate external KV store
Service model not available | Service model available; provides features like scaling, rolling update, service discovery, load balancing and routing mesh
Communication not secure | Both control and data plane are secure
Integrated with Machine and Compose | Not yet integrated with Machine and Compose as of release 1.12; will be integrated in upcoming releases

Following table compares Swarmkit and SwarmNext:

Swarmkit | SwarmNext
-------- | ---------
Plumbing open source project | Swarmkit is used within SwarmNext and is tightly integrated with the Docker engine
Swarmkit needs to be built and run separately | Docker 1.12 comes with SwarmNext integrated
No service discovery, load balancing or routing mesh | Service discovery, load balancing and routing mesh available
Uses the swarmctl CLI | Uses the regular Docker CLI

Sample application:

Following is a very simple application with a highly available voting web server that can be accessed from a client. The client's requests get load balanced between the available web servers. The application will be created in a custom overlay network. We will deploy it using Swarm, SwarmNext and Swarmkit.

[Image: sample application - a client container load balanced across multiple voting web server containers]

Pre-requisites:

  • I have used docker-machine version 0.8.0-rc1 and Docker engine version 1.12.0-rc3.
  • The “smakam/myubuntu” container is regular Ubuntu plus some additional utilities like curl, used to illustrate load balancing.

Deployment using Swarm:

Following is a summary of the steps:

  • Create a KV store. In this example, I have used Consul.
  • Create Docker instances using the created KV store. In this example, I have created the Docker instances using Docker Machine.
  • Create an overlay network.
  • Create multiple instances of the voting web server and a single instance of the client. All web servers need to share the same network alias so that requests from the client can get load balanced between the web servers.

Create KV store:

docker-machine create -d virtualbox mh-keystore
eval "$(docker-machine env mh-keystore)"
docker run -d \
    -p "8500:8500" \
    -h "consul" \
    progrium/consul -server -bootstrap
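
Before pointing the Swarm nodes at the KV store, it is worth confirming that Consul is reachable. A hedged check using Consul's standard HTTP status endpoint (not part of the original write-up):

# Should print the address of the Consul leader if the KV store is up
curl -s "http://$(docker-machine ip mh-keystore):8500/v1/status/leader"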

Create 2 Docker Swarm instances pointing to KV store:

docker-machine create \
-d virtualbox \
--swarm --swarm-master \
--swarm-discovery="consul://$(docker-machine ip mh-keystore):8500" \
--engine-opt="cluster-store=consul://$(docker-machine ip mh-keystore):8500" \
--engine-opt="cluster-advertise=eth1:2376" \
mhs-demo0

docker-machine create -d virtualbox \
    --swarm \
    --swarm-discovery="consul://$(docker-machine ip mh-keystore):8500" \
    --engine-opt="cluster-store=consul://$(docker-machine ip mh-keystore):8500" \
    --engine-opt="cluster-advertise=eth1:2376" \
  mhs-demo1

Create overlay network:

eval $(docker-machine env --swarm mhs-demo0)
docker network create --driver overlay overlay1

Create the services:
Both instances of the voting container have the same alias “vote” so that they can be accessed as a single service.

docker run -d --name=vote1 --net=overlay1 --net-alias=vote instavote/vote
docker run -d --name=vote2 --net=overlay1 --net-alias=vote instavote/vote
docker run -ti --name client --net=overlay1 smakam/myubuntu:v4 bash

Let's connect to the voting web server from the client container:

root@abb7ec6c67fc:/# curl vote  | grep "container ID"
          Processed by container ID a9c05cd4ee15
root@abb7ec6c67fc:/# curl -i vote  | grep "container ID"
          Processed by container ID ce94f38fc958

As we can see from the above output, requests to the “vote” service get load balanced between “vote1” and “vote2”, each of which has a different container ID.
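
Under the hood this is the embedded-DNS round robin described in the previous post: both containers registered the alias “vote”, so DNS returns two A records. A hedged check from inside the client container:

# The shared network alias resolves to both voting containers
dig vote +short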

Deploying using SwarmNext:

Following is a summary of the steps:

  • Create 2 Docker instances using Docker Machine with the 1.12 RC3 Docker image. Start 1 node as master and the other as worker.
  • Create an overlay network.
  • Create the voting web service with 2 replicas and the client service with 1 replica in the overlay network created above.

Create 2 Docker instances:

docker-machine create -d virtualbox node1
docker-machine create -d virtualbox node2

Set up node1 as master:

docker swarm init --listen-addr 192.168.99.100:2377

Node1 will also serve as worker in addition to being master.

Set up node2 as worker:

docker swarm join 192.168.99.100:2377

Let's look at the running nodes:

$ docker node ls
ID                           HOSTNAME  MEMBERSHIP  STATUS  AVAILABILITY  MANAGER STATUS
b7jhf7zddv2w2evze1bz44ukx *  node1     Accepted    Ready   Active        Leader
ca4jgzcnyz70ry4h5enh701fv    node2     Accepted    Ready   Active    

Create overlay network:

docker network create --driver overlay overlay1

Create services:

docker service create --replicas 1 --name client --network overlay1 smakam/myubuntu:v4 ping docker.com
docker service create --name vote --network overlay1 --replicas 2 -p 8080:80 instavote/vote

For this example, it is not necessary to expose a port to the host, but I have done it anyway. Port 8080 gets exposed on both “node1” and “node2” using the routing mesh feature in Docker 1.12.

Let's look at the running services:

$ docker service ls
ID            NAME    REPLICAS  IMAGE               COMMAND
2rm1svgfxzzw  client  1/1       smakam/myubuntu:v4  ping docker.com
af6lg0cq66bl  vote    2/2       instavote/vote 

Let's connect to the voting web server from the client container:

# curl vote | grep "container ID"
          Processed by container ID c831f88b217f
# curl vote | grep "container ID"
          Processed by container ID fe4cc375291b

From the above output, we can see load balancing happening from the client to the 2 web server containers.

Deploying using Swarmkit:

Following are the steps:

  • Create a 2-node cluster using Docker Machine. I was able to create the Swarm cluster and use it without a KV store, but for some reason the overlay network did not work without one, so I had to use a KV store for this example.
  • Build Swarmkit and copy the binaries to the Swarm nodes.
  • Create the Swarm cluster with the 2 nodes.
  • Create an overlay network and create the services in it.

Building Swarmkit:

Here, Swarmkit is built inside a Go container.

git clone https://github.com/docker/swarmkit.git
eval $(docker-machine env swarm-01)
docker run -it --name swarmkitbuilder -v `pwd`/swarmkit:/go/src/github.com/docker/swarmkit golang:1.6 bash
cd /go/src/github.com/docker/swarmkit
make binaries

Create Docker instances with KV store:

docker-machine create \
-d virtualbox \
--engine-opt="cluster-store=consul://$(docker-machine ip mh-keystore):8500" \
--engine-opt="cluster-advertise=eth1:2376" \
swarm-01
docker-machine create -d virtualbox \
--engine-opt="cluster-store=consul://$(docker-machine ip mh-keystore):8500" \
--engine-opt="cluster-advertise=eth1:2376" \
swarm-02

Export Swarmkit binaries to the nodes:

docker-machine scp bin/swarmd swarm-01:/tmp
docker-machine scp bin/swarmctl swarm-01:/tmp
docker-machine ssh swarm-01 sudo cp /tmp/swarmd /tmp/swarmctl /usr/local/bin/
docker-machine scp bin/swarmd swarm-02:/tmp
docker-machine scp bin/swarmctl swarm-02:/tmp
docker-machine ssh swarm-02 sudo cp /tmp/swarmd /tmp/swarmctl /usr/local/bin/

Create Swarm cluster:

Master:
docker-machine ssh swarm-01
swarmd -d /tmp/swarm-01 \
--listen-control-api /tmp/swarm-01/swarm.sock \
--listen-remote-api 192.168.99.101:4242 \
--hostname swarm-01 &

Worker:
swarmd -d /tmp/swarm-02 \
--hostname swarm-02 \
--listen-remote-api 192.168.99.102:4242 \
--join-addr 192.168.99.101:4242 &

Create overlay network and services:

swarmctl network create --driver overlay --name overlay1
swarmctl service create --name vote --network overlay1 --replicas 2 --image instavote/vote
swarmctl service create --name client --network overlay1 --image smakam/myubuntu:v4 --command ping,docker.com

Following command shows the 2 node cluster:

export SWARM_SOCKET=/tmp/swarm-01/swarm.sock
swarmctl node ls

ID                         Name      Membership  Status   Availability  Manager Status
--                         ----      ----------  ------   ------------  --------------
5uh132h0acqebetsom1z1nntm  swarm-01  ACCEPTED    READY    ACTIVE        REACHABLE *
5z8z6gq36maryzrsy0cmk7f51            ACCEPTED    UNKNOWN  ACTIVE        

The following output shows a successful connection from the client to the voting web servers:

# curl 10.0.0.3   | grep "container ID"
          Processed by container ID 78a3e9b06b7f
# curl 10.0.0.4   | grep "container ID"
          Processed by container ID 04e02b1731a0

In the above output, we have accessed the web servers by container IP address, since service discovery and load balancing are not integrated in Swarmkit.

Issues:

I have raised an issue about the need for a KV store with overlay networks in Swarmkit. This looks like a bug to me, or I might be missing some option.

Summary

SwarmNext (Swarm mode) is a huge improvement over the previous Docker Swarm. Having the service object in Docker makes it easier to provide features like scaling, rolling updates, service discovery, load balancing and routing mesh. This also helps Swarm catch up on some Kubernetes-like features. Docker supports both SwarmNext and Swarm in release 1.12 so that production users who have deployed Swarm won't be affected by the upgrade. SwarmNext does not have all functionality at this point, including integration with Compose and storage plugins; this will be added soon. In the long run, I feel that Swarm will get deprecated and SwarmNext will become the only orchestration mode in Docker. Having Swarmkit as an open source project allows its independent development, and anyone building an orchestration system for distributed applications can use it as a standalone module.

Mesos DC/OS Hands-On

In this blog, I will cover some of the hands-on work I did with open source DC/OS. I created a DC/OS cluster using Vagrant and deployed a multi-instance nginx web server using Marathon. For a Mesos FAQ, please refer to my previous blog.

I followed the instructions here to create DC/OS Vagrant cluster.

Pre-requisites

I tried the DC/OS Vagrant cluster on my Windows machine. VirtualBox and Vagrant need to be installed beforehand.

Setting up cluster

Following are the instructions I used to set up the cluster:

Continue reading Mesos DC/OS Hands-On

Mesos and Mesosphere – FAQ

The most popular container orchestration solutions on the market are Kubernetes, Swarm and Mesos. I have used Kubernetes and Swarm, but never got a chance to use Mesos or DC/OS. I had a bunch of questions about Mesos and DC/OS and never found the time to explore them. Recently, I saw the announcement about Mesosphere open sourcing DC/OS, which seemed like the perfect opportunity to try out open source DC/OS. In this blog, I have captured the answers to the questions I had regarding Mesos and DC/OS. In the next blog, I will cover some hands-on work that I did with open source DC/OS.

What is the relationship between Apache Mesos, Opensource DC/OS and Enterprise DC/OS?

Apache Mesos is an open source distributed orchestrator for container as well as non-container workloads. Both open source DC/OS and Enterprise DC/OS are built on top of Apache Mesos. Open source DC/OS adds service discovery, the Universe package repository for different frameworks, CLI and GUI support for management, and volume support for persistent storage. Enterprise DC/OS adds enterprise features around security, performance, networking, compliance, monitoring and multi-team support that the open source DC/OS project does not include. The complete list of differences between open source and Enterprise DC/OS is captured here.

What does Mesosphere do and how is it related to Apache Mesos?

Mesosphere is a company whose products are built on top of Apache Mesos. Many folks working at Mesosphere contribute to both Apache Mesos and open source DC/OS. Mesosphere currently has the following products:

  • DC/OS Enterprise – Orchestration solution
  • Velocity – CI/CD solution
  • Infinity – Big data solution

Why is DC/OS called an OS?

Sometimes folks get confused and think Mesos is a container-optimized OS like CoreOS or Atomic. It is not. Similar to the way a desktop OS provides resource management on a single host, DC/OS provides resource management across an entire cluster. The Mesos master (including the first-level scheduler) and agent are perceived as kernel components, while user space components include frameworks, user applications, DNS and load balancers. The kernel provides primitives for the frameworks.

What are Mesos frameworks and why are they needed?

Continue reading Mesos and Mesosphere – FAQ

Looking inside Container Images

This blog is a continuation of my previous blog on container standards. In this blog, we will look inside a container image to understand the filesystem and manifest files that describe the container. We will cover container images in Docker, APPC and OCI formats. As mentioned in the previous blog, these container image formats will converge into the OCI format in the long run.

I have picked two containers for this blog: “nginx”, which is a standard web server, and “smakam/hellocounter”, which is a Python web application.

Docker format:

To see Container content in Docker format, do the following:

docker save nginx > nginx.tar
tar -xvf nginx.tar

Following files are present:

  • manifest.json – Describes the filesystem layers and the name of the JSON file that has the container properties.
  • <id>.json – Container properties.
  • <layer directory> – Each “layerid” directory contains a JSON file describing the layer properties and the filesystem associated with that layer. Docker stores container images as layers to optimize storage space by reusing layers across images.
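
For a quick look at these files, a hedged example (assuming python is available for pretty-printing; jq would work just as well):

# List the files packed into the saved image
tar -tf nginx.tar
# Pretty-print the manifest to see the layer ordering and the name of the config JSON
python -m json.tool manifest.json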

Following are some important Container properties that we can see in the JSON file:

Continue reading Looking inside Container Images

Container Standards

In this blog, I will cover some of the standardization efforts happening in the containers area. I will cover some history, the current status, and what the future looks like. In the next blog, we will look inside ACI and OCI container images.

Container Standards

A lot of development in the container area happens as open source projects. That still does not automatically mean that these projects will become standards. Following are the areas where container standardization is important:

  • Container image format – Describes how an application is packaged into a container. The application can be an executable from any programming language. As you would know, a container packages an application along with all of its dependencies.
  • Container runtime – Describes the environment (namespaces, cgroups etc.) necessary to run the container and the APIs that a container runtime should support.
  • Image signing – Describes how to create container image digests and sign them so that container images can be trusted.
  • Image discovery – Describes alternate approaches to discovering container images other than using a registry.
  • Container networking – This is a pretty complex area; it describes ways to network containers on the same host and across hosts. There are different implementations based on the use case.

Having common Container standards would allow things like this:

Continue reading Container Standards