Docker Networking Tip – Troubleshooting

Debugging container and Docker networking issues can be daunting at first, since containers typically do not ship with any debug tools inside them. I see a lot of questions around Docker networking issues in the Docker and Stack Overflow forums. All the usual networking tools can be used to debug Docker networking; it is just that the approach taken is slightly different. I have captured my troubleshooting steps in a video and a presentation.

Following are the video and presentation of my Docker Networking troubleshooting tips.

I would appreciate your feedback on whether the Networking tip videos were useful to you. Also, if there are any other Docker Networking topics that you would like to see covered as a tip video, please let me know.

For completeness, I have also included below a few Docker Networking videos and presentations that I did over the last 3 months.

Following are the 2 previous Networking tip videos.

Following are 2 Docker Networking deep dive presentations:


9 thoughts on “Docker Networking Tip – Troubleshooting”

  1. Awesome work. Can you please help me understand why all containers are linked to docker_gwbridge? I thought only containers connected to the vxlan network should be attached to the docker_gwbridge network, to give them external world connectivity.

    1. Hi Vikrant
      The overlay network is the vxlan network and is used for containers to talk to each other; it’s not used for external connectivity. docker_gwbridge is used for external connectivity when the publish mode is host. Otherwise, the ingress network is used for external connectivity using the routing mesh.
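
      A quick way to see these networks and which containers are attached to each of them (a minimal sketch; the network names are the swarm defaults):

      ~~~
      # list the networks on this node
      docker network ls

      # see which containers are attached to docker_gwbridge and to the ingress network
      docker network inspect docker_gwbridge --format '{{json .Containers}}'
      docker network inspect ingress --format '{{json .Containers}}'
      ~~~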

      Sreenivas

      1. Thanks for your quick reply, Sreenivas. I come from an OpenStack background, so if I am understanding correctly, 1:1 NAT happens between the ingress network and docker_gwbridge when an external host tries to reach a container. In the case of docker_gwbridge, port mapping (PNAT) happens between the container and the host.

      1. Thanks for your reply. Today I thought of spending some time on Docker swarm to understand its workings; it had been on my TODO list for a long time. I created a cluster with one manager and two worker nodes; docker-machine was used with the virtualbox driver on a Windows machine to create this cluster (roughly as sketched after the output below).

        ~~~
        root@manager1:/var/run/docker/netns# docker node ls
        ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
        bkh5p9y54gl8qo9wgi04k25ge * manager1 Ready Active Leader
        sepumehq2ftggts6t1vbl13k5 worker1 Ready Active
        xcf7qwkj2az0ui76ftthvow9u worker2 Ready Active
        ~~~
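
        Roughly, the cluster was built with commands along these lines (just a sketch; the join token printed by ‘swarm init’ is elided):

        ~~~
        # create three VirtualBox VMs
        docker-machine create --driver virtualbox manager1
        docker-machine create --driver virtualbox worker1
        docker-machine create --driver virtualbox worker2

        # initialize the swarm on the manager
        docker-machine ssh manager1 "docker swarm init --advertise-addr 192.168.99.101"

        # join the workers with the token from 'swarm init' (elided here)
        docker-machine ssh worker1 "docker swarm join --token <worker-token> 192.168.99.101:2377"
        docker-machine ssh worker2 "docker swarm join --token <worker-token> 192.168.99.101:2377"
        ~~~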

        Started one nginx service with two replicas (a sketch of the command follows the output below). One task is running on manager1 and the second one got spun up on worker1.

        ~~~
        root@manager1:/var/run/docker/netns# docker service ps web
        ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
        j02nxp3024c4 web.1 nginx:latest worker1 Running Running 5 hours ago
        5e0yewexp2z6 web.2 nginx:latest manager1 Running Running 5 hours ago
        ~~~
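
        The service itself was created with something like this (a sketch; the exact published port is an assumption):

        ~~~
        # create an nginx service with two replicas, published through the routing mesh
        docker service create --name web --replicas 2 --publish 80:80 nginx
        ~~~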

        I am able to access the nginx service using the IP address of any node (the manager and both workers). I followed your approach to dig into the networking details.

        Each container has two interfaces: one from the vxlan network (10.255.0.0/16) and a second from the docker_gwbridge network (172.18.0.0/16).

        While inspecting the service I found that the VIP assigned to the service is 10.255.0.5, and the container IP addresses are 10.255.0.7 and 10.255.0.6. I tried to access nginx from my Windows laptop browser using the manager1 node IP, i.e. 192.168.99.101, and I was able to reach the nginx start page. I started the network troubleshooting container to check which interface the HTTP traffic comes in on; since we are accessing the service from the external network, the traffic went, as expected, to the interface with IP 10.255.0.7 via the VIP.

        But how can I find where the VIP is present? I tried the sandbox approach which you followed for the client; I followed the same one for the nginx container, and when issuing “ifconfig -a” in the sandbox it only shows me the IP address of the container itself. How can I find where the virtual IP address is assigned?
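
        One way to check this (a sketch; <sandbox-id> is a placeholder for the task’s sandbox namespace, and note that ifconfig does not list secondary addresses, so ‘ip addr’ is needed to see the VIP on lo):

        ~~~
        # list the VIPs assigned to the service (one per attached network)
        docker service inspect web --format '{{json .Endpoint.VirtualIPs}}'

        # inside the task's sandbox namespace, the VIP appears as a /32 on lo
        nsenter --net=/var/run/docker/netns/<sandbox-id> ip addr show lo
        ~~~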

        One more strange thing: in the host machine NAT rules I don’t see any rule which would redirect traffic from the host machine IP to the vxlan network, but I do see the DNAT rule for the docker_gwbridge network. While collecting outputs for this comment, I found one namespace with the name “ingress_sbox” which I was not able to find via the inspect command on any Docker container. How can I find where this namespace comes from?

        ~~~
        root@manager1:/var/run/docker/netns# ls
        1-tpfa5gq8gh 3383edb4dd3f 990e641bf5a9 ingress_sbox
        ~~~

        I attached the network troubleshooting container to this ingress_sbox namespace and was able to connect some dots: I can see the VIP in the iptables rules, but I still can’t find the interface on which this IP address is assigned.

        ~~~
        root@manager1:/var/run/docker/netns# docker run -it --rm -v /var/run/docker/netns:/var/run/docker/netns --privileged=true nicolaka/netshoot
        / # nsenter --net=/var/run/docker/netns/ingress_sbox sh

        / # ip a
        1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
        9: eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
        link/ether 02:42:0a:ff:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 10.255.0.2/16 scope global eth0
        valid_lft forever preferred_lft forever
        12: eth1@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 1
        inet 172.18.0.2/16 scope global eth1
        valid_lft forever preferred_lft forever
        / # iptables -t nat -L
        Chain PREROUTING (policy ACCEPT)
        target prot opt source destination

        Chain INPUT (policy ACCEPT)
        target prot opt source destination

        Chain OUTPUT (policy ACCEPT)
        target prot opt source destination
        DOCKER_OUTPUT all -- anywhere 127.0.0.11
        DNAT icmp -- anywhere 10.255.0.5 icmp echo-request to:127.0.0.1

        Chain POSTROUTING (policy ACCEPT)
        target prot opt source destination
        DOCKER_POSTROUTING all -- anywhere 127.0.0.11
        SNAT all -- anywhere 10.255.0.0/16 ipvs to:10.255.0.2

        Chain DOCKER_OUTPUT (1 references)
        target prot opt source destination
        DNAT tcp -- anywhere 127.0.0.11 tcp dpt:domain to:127.0.0.11:39103
        DNAT udp -- anywhere 127.0.0.11 udp dpt:domain to:127.0.0.11:54599

        Chain DOCKER_POSTROUTING (1 references)
        target prot opt source destination
        SNAT tcp -- 127.0.0.11 anywhere tcp spt:39103 to::53
        SNAT udp -- 127.0.0.11 anywhere udp spt:54599 to::53
        / # ipvsadm
        IP Virtual Server version 1.2.1 (size=4096)
        Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port Forward Weight ActiveConn InActConn
        FWM 256 rr
        -> 10.255.0.6:0 Masq 1 0 0
        -> 10.255.0.7:0 Masq 1 0 0

        ~~~
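
        (For reference, the FWM value in ipvsadm is the decimal form of the firewall mark set in the mangle table of the same namespace, so FWM 256 corresponds to mark 0x100. A quick way to check, sketched:)

        ~~~
        # packets for the service VIP get a firewall mark in the mangle table;
        # ipvsadm then load-balances on that mark (FWM is the same mark in decimal)
        iptables -t mangle -L -n | grep MARK
        ipvsadm -L -n
        ~~~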

  2. Sorry to bug you again, but I am still facing some challenges in understanding how networking works in swarm. Maybe you can ignore the previous stuff; this time I am using your vote application and the client which you used.

    1) First mode of access:

    client -> DNS -> service IP -> iptables + ipvs -> web server 1 & web server 2

    Created the client using the docker service command; it assigned the following IP addresses to the client:

    eth0 (10.0.0.6) overlay vxlan based
    eth1 (172.18.0.4) docker_gwbridge
    VIP (10.0.0.5) from the overlay network.

    Created the application using the docker service command:

    vote running on manager1

    eth0 (10.255.0.6) overlay ingress network
    eth1 (172.18.0.3) docker_gwbridge
    eth2 (10.0.0.8) overlay vxlan based.

    vote running on worker1

    eth0 (10.255.0.7) overlay ingress network
    eth1 (172.18.0.3) docker_gwbridge; the same IP address is used by both containers {not sure why?}
    eth2 (10.0.0.9) overlay vxlan based.

    Two VIPs are assigned to the vote application:

    10.255.0.5 (external access)
    10.0.0.7 (client access)

    Procedure 1: Tracing the flow from the client to the application and taking a tcpdump on the interface.

    client (10.0.0.6) -> DNS (vote) -> service VIP (10.0.0.7) -> iptables + ipvs (10.0.0.8 & 10.0.0.9)
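
    To see the first hop of this flow, the service name can be resolved from inside the client; the embedded DNS server (127.0.0.11) returns the service VIP rather than the task IPs (a sketch; <client-container> is a placeholder, and it assumes nslookup is available in that image):

    ~~~
    # the service name resolves to the VIP (10.0.0.7)
    docker exec -it <client-container> nslookup vote

    # tasks.<service> resolves to the individual task IPs instead
    docker exec -it <client-container> nslookup tasks.vote
    ~~~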

    a) Checking the sandbox associated with the vote container running on manager1:

    ~~~
    root@manager1:/var/run/docker/netns# docker inspect 3b69a62151d2 | grep -i sandbox
    "SandboxID": "ea0c9272fe430d386cd6b738d4bea5c1ecf1a28413f51cf9b28c937a54d7d3ba",
    "SandboxKey": "/var/run/docker/netns/ea0c9272fe43",

    ~~~

    b) Checking the corresponding network namespace. I can see both VIPs of the vote service assigned to the loopback interface of the namespace.

    ~~~
    root@manager1:/var/run/docker/netns# docker run -it --rm -v /var/run/docker/netns:/var/run/docker/netns --privileged=true nicolaka/netshoot
    / # nsenter --net=/var/run/docker/netns/ea0c9272fe43 sh
    / # ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    inet 10.255.0.5/32 scope global lo
    valid_lft forever preferred_lft forever
    inet 10.0.0.7/32 scope global lo
    valid_lft forever preferred_lft forever
    35: eth0@if36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:ff:00:06 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.255.0.6/16 scope global eth0
    valid_lft forever preferred_lft forever
    37: eth1@if38: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 172.18.0.3/16 scope global eth1
    valid_lft forever preferred_lft forever
    39: eth2@if40: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:00:00:08 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet 10.0.0.8/24 scope global eth2
    valid_lft forever preferred_lft forever
    ~~~

    c) Tried to curl vote from the client container and started taking a tcpdump on the loopback interface, but I don’t see any traffic hitting the VIP. I was expecting the traffic to reach here.

    d) Checked the mangle table and ipvsadm; I understand that for the client I am inside the right namespace.

    ~~~
    / # iptables -t mangle -L | grep '10.0.0.7'
    MARK all -- anywhere 10.0.0.7 MARK set 0x103
    / # ipvsadm
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
    -> RemoteAddress:Port Forward Weight ActiveConn InActConn
    FWM 257 rr
    -> 10.0.0.6:0 Masq 1 0 0
    FWM 259 rr
    -> 10.0.0.8:0 Masq 1 0 0
    -> 10.0.0.9:0 Masq 1 0 0
    ~~~

    Question: Why is no traffic reported on the lo interface of the namespace when hitting the VIP using DNS?

    2) Second mode of access:

    Going step by step:

    Hostname/port (http://192.168.99.101:8080/) -> sandbox IP (172.18.0.2 on docker_gwbridge)

    ~~~
    docker@manager1:~$ sudo iptables -t nat -L | grep '8080'
    DNAT tcp -- anywhere anywhere tcp dpt:webcache to:172.18.0.2:8080
    ~~~
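
    The same check can be repeated on the worker nodes (a sketch, using the docker-machine node names):

    ~~~
    # the ingress DNAT rule should exist on every node, not only where a task runs
    docker-machine ssh worker1 "sudo iptables -t nat -L -n | grep 8080"
    docker-machine ssh worker2 "sudo iptables -t nat -L -n | grep 8080"
    ~~~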

    The same rule is added on all hosts, irrespective of whether a vote container is running on them, which is expected because I can access the service from any host. This IP, 172.18.0.2, is assigned inside the ingress_sbox namespace:

    ~~~
    / # ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    9: eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:ff:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.255.0.2/16 scope global eth0
    valid_lft forever preferred_lft forever
    12: eth1@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 172.18.0.2/16 scope global eth1
    valid_lft forever preferred_lft forever
    ~~~

    Let’s look at the mangle table; it marks traffic destined for the VIP (10.255.0.5), which is assigned to the lo interface of namespace “ea0c9272fe43”.

    ~~~
    / # iptables -t mangle -L | grep MARK
    MARK tcp -- anywhere anywhere tcp dpt:http-alt MARK set 0x102
    MARK all -- anywhere 10.255.0.5 MARK set 0x102

    / # ipvsadm
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
    -> RemoteAddress:Port Forward Weight ActiveConn InActConn
    FWM 258 rr
    -> 10.255.0.6:0 Masq 1 0 0
    -> 10.255.0.7:0 Masq 1 0 0
    ~~~

    Question: Again, taking a tcpdump on the lo interface of namespace “ea0c9272fe43” didn’t give me anything in the output.

    I am wondering how the traffic is hitting the VIPs in both cases.

    Extremely sorry for such a big comment, but I really want to understand the deep networking internals of swarm.

    1. I am not sure where exactly the iptables interception is done with those rules; that could be why packets are not seen on loopback. I have never tried intercepting at the loopback interface. I assume that you are able to capture packets fine at the ingress and overlay interfaces (see the sketch below).
      Regarding your other question about the overlapping IP address (172.18.0.3), bridge addresses are scoped at the node level.
      It’s good that you are trying to go into the depths here…
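
      Something along these lines should show the traffic on the overlay side (a sketch, reusing the netshoot container; <sandbox-id> is the SandboxKey suffix from ‘docker inspect’):

      ~~~
      docker run -it --rm -v /var/run/docker/netns:/var/run/docker/netns --privileged=true nicolaka/netshoot
      # then, inside the netshoot container, capture on the overlay interface (eth2) instead of lo
      nsenter --net=/var/run/docker/netns/<sandbox-id> tcpdump -ni eth2
      ~~~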

      1. Thanks much for your reply. Yes, docker_gwbridge has host-level scope only; initially I thought it was merely a coincidence, but even after multiple attempts they keep picking up the same IP address, which looks strange to me.
