Mesos DC/OS Hands-On

In this blog, I will cover some of the hands-on work I did with open source DC/OS. I created a DC/OS cluster using Vagrant and deployed a multi-instance nginx web server using Marathon. For a Mesos FAQ, please refer to my previous blog.

I followed the instructions here to create a DC/OS Vagrant cluster.


I tried the DC/OS Vagrant cluster on my Windows machine. VirtualBox and Vagrant need to be installed beforehand.

Setting up the cluster

Following are the commands that I used to set up the cluster:

git clone
VBoxManage list hostonlyifs | grep vboxnet0 -q || VBoxManage hostonlyif create
VBoxManage hostonlyif ipconfig vboxnet0 --ip
vagrant plugin install vagrant-hostmanager
cd dcos-vagrant
curl -O
export DCOS_CONFIG_PATH=etc/config-1.7.yaml
cp VagrantConfig.yaml.example VagrantConfig.yaml

The following command starts a cluster with 1 master, 2 private agent nodes, 1 public agent node, and 1 bootstrap node that runs the installer:

vagrant up m1 a1 a2 p1 boot

I had authentication issues, which were resolved using the procedure here. The web interface can be accessed using:


Installing DC/OS CLI

I used the procedure here to install the DC/OS CLI. One additional step was required: after installing pip, I had to set PATH manually to include the pip path.

The only framework that gets installed by default is Marathon.

The following command output shows the agent nodes available to run user workloads:

$ dcos node
   HOSTNAME           IP                           ID
                                                   c2a4ad47-c4b3-4fb6-bb3f-57902951fb8e-S0
                                                   c2a4ad47-c4b3-4fb6-bb3f-57902951fb8e-S1
                                                   c2a4ad47-c4b3-4fb6-bb3f-57902951fb8e-S2

The following command shows the running services. Only Marathon is running in our case:

$ dcos service
marathon   True     4    4.0  512.0  0.0   9a77ac7a-5c23-4852-a76

Marathon nginx service

Following is a sample “nginx.json” file for creating a web service with an nginx container using Marathon:

{
  "id": "mynginxserver",
  "cmd": null,
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "instances": 4,
  "container": {
    "docker": {
      "network": "BRIDGE",
      "image": "nginx",
      "portMappings": [
        {
          "containerPort": 80,
          "protocol": "tcp",
          "name": "nginxserver",
          "labels": {
            "VIP_0": ""
          }
        }
      ]
    },
    "type": "DOCKER"
  },
  "env": {},
  "labels": {},
  "healthChecks": []
}

Following are some notes on the above service:

  • The service name is “mynginxserver”, and it has 4 running instances.
  • Container port 80 is exposed through a bridge-mode port mapping to a port on the host machine, which makes it reachable from outside the container.
  • The VIP for the service is “”. The service can be accessed using the VIP address.
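Hand-editing nested JSON like this is error-prone. The same app definition can also be generated programmatically. Below is a minimal Python sketch that builds the definition above as a dictionary and writes it to “nginx.json”. The helper names `build_nginx_app` and `write_app` are hypothetical, and the VIP value is left empty because the original address is not shown.

```python
import json


def build_nginx_app(app_id="mynginxserver", instances=4, vip=""):
    """Build a Marathon app definition matching the nginx.json above.

    Hypothetical helper; vip is a placeholder since the real VIP is not shown.
    """
    return {
        "id": app_id,
        "cmd": None,  # serialized as JSON null: the image's default command runs
        "cpus": 1,
        "mem": 128,
        "disk": 0,
        "instances": instances,
        "container": {
            "docker": {
                "network": "BRIDGE",
                "image": "nginx",
                "portMappings": [
                    {
                        "containerPort": 80,
                        "protocol": "tcp",
                        "name": "nginxserver",
                        "labels": {"VIP_0": vip},
                    }
                ],
            },
            "type": "DOCKER",
        },
        "env": {},
        "labels": {},
        "healthChecks": [],
    }


def write_app(path="nginx.json"):
    """Write the definition to disk for "dcos marathon app add nginx.json"."""
    with open(path, "w") as f:
        json.dump(build_nginx_app(), f, indent=2)
```

Generating the file this way guarantees balanced braces and brackets. The instance count or image can then be changed through parameters.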

To start the service, do the following:

dcos marathon app add nginx.json
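The CLI command submits the app definition to Marathon. Marathon's REST API accepts the same JSON through POST /v2/apps, so the following is a rough equivalent. This sketch assumes the following:

  • Marathon is reachable through the DC/OS admin router at the hypothetical base URL `http://m1.dcos/marathon`.
  • No authentication is required. On a cluster with authentication enabled, an authorization header would also be needed.

The function names are hypothetical.

```python
import json
import urllib.request

# Hypothetical base URL: Marathon behind the DC/OS admin router on master m1.
MARATHON_URL = "http://m1.dcos/marathon"


def make_create_app_request(app_definition, base_url=MARATHON_URL):
    """Build (but do not send) a POST /v2/apps request for an app definition."""
    body = json.dumps(app_definition).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/apps",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_app(path="nginx.json", base_url=MARATHON_URL):
    """Load an app definition from disk and submit it to Marathon."""
    with open(path) as f:
        app_definition = json.load(f)
    request = make_create_app_request(app_definition, base_url)
    # urlopen raises urllib.error.HTTPError if Marathon rejects the request.
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before touching the cluster.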

Let's look at the running tasks:

$ dcos marathon task list
APP             HEALTHY          STARTED                HOST       ID

/mynginxserver    True   2016-06-19T10:19:54.468Z  mynginxserver
/mynginxserver    True   2016-06-19T10:19:54.484Z  mynginxserver
/mynginxserver    True   2016-06-19T10:19:54.885Z  mynginxserver
/mynginxserver    True   2016-06-19T10:19:55.082Z  mynginxserver

In the above output, we can see that the 4 nginx instances were spread across nodes “” and “”, with each node running 2 instances.
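The spread can also be checked programmatically. Marathon's REST API lists an app's tasks at /v2/apps/<app-id>/tasks as a JSON object with a "tasks" list, and each task carries a "host" field. The Python sketch below only counts tasks per host from an already-parsed response. The sample hostnames `a1.dcos` and `a2.dcos` are hypothetical stand-ins for the agent addresses, which are not shown above.

```python
from collections import Counter


def tasks_per_host(tasks_response):
    """Count tasks per agent host.

    tasks_response is the parsed JSON of GET /v2/apps/<app-id>/tasks:
    a dict with a "tasks" list whose entries have a "host" field.
    """
    return Counter(task["host"] for task in tasks_response.get("tasks", []))


# Hypothetical sample response; hostnames are invented for illustration.
sample = {
    "tasks": [
        {"appId": "/mynginxserver", "host": "a1.dcos"},
        {"appId": "/mynginxserver", "host": "a2.dcos"},
        {"appId": "/mynginxserver", "host": "a1.dcos"},
        {"appId": "/mynginxserver", "host": "a2.dcos"},
    ]
}
print(dict(tasks_per_host(sample)))  # {'a1.dcos': 2, 'a2.dcos': 2}
```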

The default approach for service discovery in DC/OS is Mesos DNS. Mesos DNS queries the Mesos master, gets the details of running services along with their IP addresses and port numbers, and updates its database. The DNS database has a “Type A” record for the service->IP mapping and a “Type SRV” record for the service->(IP, port) mapping. Services running on Mesos agent nodes can query the DNS to get these mappings.

“Type A” records are available at “servicename.frameworkname.mesos”, and “Type SRV” records are available at “_servicename._protocol.frameworkname.mesos”. DNS records can be retrieved either using “dig” or through the HTTP interface.
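The naming scheme is simple enough to capture in a couple of helpers. The Python sketch below builds both record names and queries the Mesos-DNS HTTP interface at master.mesos:8123, the same endpoints used with curl in the following outputs. The function names are hypothetical, and `lookup` assumes it runs somewhere that can resolve master.mesos, such as a cluster node.

```python
import json
import urllib.request

# Mesos-DNS HTTP API on the master, as used with curl in this post.
MESOS_DNS_API = "http://master.mesos:8123"


def a_record_name(service, framework, domain="mesos"):
    """A record name: servicename.frameworkname.mesos"""
    return f"{service}.{framework}.{domain}"


def srv_record_name(service, protocol, framework, domain="mesos"):
    """SRV record name: _servicename._protocol.frameworkname.mesos"""
    return f"_{service}._{protocol}.{framework}.{domain}"


def lookup(kind, record_name, api=MESOS_DNS_API):
    """Query Mesos-DNS over HTTP; kind is "hosts" (A) or "services" (SRV)."""
    url = f"{api}/v1/{kind}/{record_name}"
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)
```

For example, `lookup("services", srv_record_name("mynginxserver", "tcp", "marathon"))` corresponds to the second curl command.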

The following output shows the “Type A” records, which give the IP addresses of the 2 nodes corresponding to “mynginxserver.marathon.mesos”:

$ curl http://master.mesos:8123/v1/hosts/mynginxserver.marathon.mesos
[
  {
    "host": "mynginxserver.marathon.mesos.",
    "ip": ""
  },
  {
    "host": "mynginxserver.marathon.mesos.",
    "ip": ""
  }
]

The following output shows the “Type SRV” records, which give the IP addresses and port numbers of the 4 instances corresponding to “_mynginxserver._tcp.marathon.mesos”:

$ curl http://master.mesos:8123/v1/services/_mynginxserver._tcp.marathon.mesos
[
  {
    "service": "_mynginxserver._tcp.marathon.mesos",
    "host": "mynginxserver-k1jze-s0.marathon.mesos.",
    "ip": "",
    "port": "2658"
  },
  {
    "service": "_mynginxserver._tcp.marathon.mesos",
    "host": "mynginxserver-pepgr-s1.marathon.mesos.",
    "ip": "",
    "port": "27964"
  },
  {
    "service": "_mynginxserver._tcp.marathon.mesos",
    "host": "mynginxserver-f936m-s1.marathon.mesos.",
    "ip": "",
    "port": "12615"
  },
  {
    "service": "_mynginxserver._tcp.marathon.mesos",
    "host": "mynginxserver-fwy6z-s0.marathon.mesos.",
    "ip": "",
    "port": "11668"
  }
]

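The SRV records carry the dynamically assigned host ports, so a client has to parse them before it can connect to a specific instance. Here is a minimal Python sketch that parses such a response. It reuses the output above verbatim, including the blank IP fields, and the helper name is hypothetical.

```python
import json


def parse_srv_records(raw_json):
    """Turn a Mesos-DNS /v1/services response into (host, ip, port) tuples.

    Mesos-DNS returns the port as a string, so it is converted to int here.
    """
    return [(rec["host"], rec["ip"], int(rec["port"])) for rec in json.loads(raw_json)]


# The SRV response shown above (IP addresses blank, as in the post).
response = """[
  {"service": "_mynginxserver._tcp.marathon.mesos", "host": "mynginxserver-k1jze-s0.marathon.mesos.", "ip": "", "port": "2658"},
  {"service": "_mynginxserver._tcp.marathon.mesos", "host": "mynginxserver-pepgr-s1.marathon.mesos.", "ip": "", "port": "27964"},
  {"service": "_mynginxserver._tcp.marathon.mesos", "host": "mynginxserver-f936m-s1.marathon.mesos.", "ip": "", "port": "12615"},
  {"service": "_mynginxserver._tcp.marathon.mesos", "host": "mynginxserver-fwy6z-s0.marathon.mesos.", "ip": "", "port": "11668"}
]"""

for host, ip, port in parse_srv_records(response):
    print(host, port)
```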
Mesos DNS does rudimentary load balancing. The following output shows pings to the service name being load balanced between the 2 agent nodes:

$ ping -c1 mynginxserver.marathon.mesos
PING mynginxserver.marathon.mesos ( 56(84) bytes of data.
64 bytes from a1.dcos ( icmp_seq=1 ttl=64 time=1.15 ms

--- mynginxserver.marathon.mesos ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.154/1.154/1.154/0.000 ms
[vagrant@m1 ~]$ ping -c1 mynginxserver.marathon.mesos
PING mynginxserver.marathon.mesos ( 56(84) bytes of data.
64 bytes from a2.dcos ( icmp_seq=1 ttl=64 time=1.54 ms

--- mynginxserver.marathon.mesos ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.549/1.549/1.549/0.000 ms
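Ping only shows which address the resolver returned for that query. A client that wants to spread its own requests can resolve all A records and pick among them. The sketch below does this with random selection using Python's standard `socket` module. This is a swapped-in client-side technique, not a description of how Mesos DNS itself balances. The function names are hypothetical.

```python
import random
import socket


def resolve_ipv4(name, port=80):
    """Return the distinct IPv4 addresses that DNS returns for name."""
    infos = socket.getaddrinfo(name, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def pick_backend(name, port=80, rng=random):
    """Choose one resolved address at random (client-side load spreading)."""
    return rng.choice(resolve_ipv4(name, port))
```

On a cluster node, `pick_backend("mynginxserver.marathon.mesos")` would return one of the two agent IPs.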

The following output shows the service being accessed successfully using the VIP address:

$ curl
<!DOCTYPE html>
<title>Welcome to nginx!</title>

Mesos DNS is primitive in its approach and has some disadvantages:

  • Applications need modification to look up SRV records.
  • Updates to the DNS database take time.

Marathon-lb is an alternative approach to service discovery and load balancing, and it is more scalable than Mesos DNS. I have not tried the Marathon-lb approach.

Following are some issues I faced:

  • There are 2 agent types: private and public. Even though I had a public agent node, services were getting scheduled only on private agent nodes and not on the public agent node. I think I might be missing some configuration that allows services to be scheduled on public agent nodes.
  • I was not able to access the Universe repository to install new packages. I have opened a Jira case for this issue.
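For the public agent issue, Marathon app definitions support an `acceptedResourceRoles` field, and in DC/OS, public agents offer their resources under the `slave_public` role. The following is an untested Python sketch that marks an app definition, such as the nginx.json above, as runnable only on public agents. The helper names are hypothetical.

```python
import json


def place_on_public_agents(app_definition):
    """Return a copy of a Marathon app definition restricted to public agents.

    Marathon only uses offered resources whose role appears in
    acceptedResourceRoles; DC/OS public agents use the "slave_public" role.
    """
    updated = dict(app_definition)
    updated["acceptedResourceRoles"] = ["slave_public"]
    return updated


def rewrite_file(path="nginx.json"):
    """Add acceptedResourceRoles to an app definition file in place."""
    with open(path) as f:
        app_definition = json.load(f)
    with open(path, "w") as f:
        json.dump(place_on_public_agents(app_definition), f, indent=2)
```

Returning a copy leaves the original definition untouched, so the private-agent and public-agent variants can be compared side by side.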


4 thoughts on “Mesos DC/OS Hands-On”

  1. The public agent is used by Mesos when you have a service that needs to be exposed to public networks. By default, all applications/tasks run on private slaves. If you deploy the Tweeter app (a Twitter clone), all the services like Kafka, Cassandra and Spark go on private nodes, and the load balancer goes on the public node.

  2. You can use

    "acceptedResourceRoles": ["slave_public"]

    in the app definition.

    Or use an HAProxy load balancer as below:

    "labels": {
      "HAPROXY_0_VHOST": "",
      "HAPROXY_GROUP": "external",
      "VIP_0": ""
    }

    "portMappings": [
      {
        "containerPort": 3000,
        "hostPort": 0,
        "servicePort": 10000,
        "protocol": "tcp",
        "labels": {
          "VIP_0": ""
        }
      }
    ]
