Update 03/08/2015: There is now an official Weave guide to using Mesos and Marathon, as many things have changed with the Weave 1.0 release, and the use of the Docker bridge is now discouraged in favour of the proxy and IP allocator.

Disclaimer: The following article is based on my own experiments with Weave and Docker. I am not associated with Weaveworks and this is not an official guide. If you get stuck, please contact the super nice and helpful weave guys at help@weave.works or ask in the #weavenetwork channel on Freenode.

Update 16/07/15: since writing this article Weave has evolved a lot. The setup described below is not officially recommended by Weaveworks. Please read this blog post by Weaveworks for more information.

Update 02/03/15: assign the IP directly instead of using weave expose, to prevent a loop in the routing of packets (see what changed).

Update 25/03/15: use 10.2.0.0/16 addresses to match Weave's documentation and to avoid IP clashes in many environments.

In the Docker world one always feels like being at the bleeding edge of development. Now that orchestration is slowly settling down with tools like Mesos, fleet or Kubernetes, the next big topic is networking.

Networking – the next big thing in the Docker world

Docker started out with a simple single-host networking solution. Unfortunately, this does not really scale to multiple hosts. A number of projects attack this problem, to name a few: socketplane, weave, pipework, flannel.

What they have in common is that they try to get rid of the port management one usually has to do when running multiple apps in a Docker environment: if you have two webservers, they cannot both use port 80 on the same host. Hence, in many earlier setups (like with Mesos and Marathon) one had to choose (or let the orchestrator choose) different host ports which were then mapped to port 80 of the webserver inside the Docker containers.

Now, with these new networking solutions each container gets its own IP address. Consequently, every webserver inside a container can not only listen on port 80, but other services can also reach it on port 80 via its own IP.
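
As a small illustration (nginx and the addresses are just placeholders): with classic port mapping two webservers on one host need two different host ports, while with per-container IPs both simply listen on port 80:

# classic port mapping: distinct host ports per container
$ docker run -d -p 8080:80 nginx
$ docker run -d -p 8081:80 nginx

# per-container IPs: both listen on port 80, reachable e.g. as 10.2.1.1:80 and 10.2.1.2:80
$ docker run -d nginx
$ docker run -d nginx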

Towards container IPs with Mesos Marathon

In the following I will describe how to use Weave to assign each Mesos Marathon container its own IP address. All Mesos tasks will live in a Weave overlay network 10.2.0.0/16 which is shared among all hosts running the Mesos slaves.

With Weave it's possible to define different networks for different apps. For the sake of simplicity we will not implement any network separation here. Instead we choose one overlay network for all tasks. This way we get IP management for free from the Docker daemon; you will read in a minute how.

With Weave one usually runs containers using the weave run wrapper script, which runs docker run and then attaches the weave bridge to the container. We will go another route and use the weave bridge directly as Docker bridge, i.e. as a replacement of the docker0 bridge. This way we get rid of the problems around the wrapper hack (i.e. that the container process starts before the weave bridge is attached). In the future, with the integration of network plugins into Docker, this wrapper will go away. But in the current state using the wrapper is not an option.

0. Setting up weave

Using Weave is pretty simple:

  • download the weave shell script:
$ sudo wget -O /usr/local/bin/weave \
  https://github.com/zettio/weave/releases/download/latest_release/weave
$ sudo chmod a+x /usr/local/bin/weave
  • start the router container:
$ weave launch
  • put the host into the 10.2.0.0/16 network:
$ weave expose 10.2.0.1/16
$ ifconfig weave
weave     Link encap:Ethernet  HWaddr 7a:d2:8d:b1:26:7b
          inet addr:10.2.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:65535  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
$ brctl show
bridge name     bridge id               STP enabled     interfaces
weave           8000.7ad28db1267b       no
  • start a container:
$ weave run 10.2.0.2/16 -itd sttts/python-ubuntu python -m SimpleHTTPServer 80
$ curl 10.2.0.2

1. Patching the weave shell script

We want to use the weave bridge directly as Docker bridge (using the -b weave parameter for the Docker daemon). The weave script creates the weave bridge on launch. But in order to run weave launch, Docker must already be running. Here the snake bites its own tail.

It is easy to patch the weave script such that the weave bridge can be created without a running Docker daemon. A pull request exists to add this to weave: https://github.com/zettio/weave/pull/357. Until it is merged, a patched version is available here: https://raw.githubusercontent.com/sttts/weave/888edd69f3344006dac7865a0afe567ebfaa5967/weave

With this in place in /usr/local/bin/weave we create the weave bridge. Recent weave versions have a create-bridge command which does exactly what we want:

$ weave create-bridge

Then we assign an IP to the weave bridge for the local host. This could be done with the weave expose command. Unfortunately, expose requires a running Docker daemon. A pull request exists to remove this requirement from weave: https://github.com/zettio/weave/pull/377. Until it is merged, a patched version is available here: https://github.com/sttts/weave/blob/d77e3a4c226ab7803a947927a61da084de5162ca/weave.

Instead, we assign the IP directly to the weave bridge (see the update below for why weave expose is not used):

$ ip addr add dev weave 10.2.0.1/16

Update: In a previous version of this article, weave expose was used to assign the IP. As it turned out in this issue in weave's Github project, this can lead to loops: weave expose adds additional iptables masquerading rules which can cause the tunneled data packets to be routed again via the weave interface. Hence, in the command above the IP address is assigned directly.
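
If you want to inspect whether any masquerading/NAT rules touch the weave bridge on a host (the ingredient of the loop described above), a generic check is:

$ sudo iptables -t nat -S | grep -i weave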

2. Configuring Docker

Then we change the Docker daemon parameters accordingly. On Ubuntu this is done in /etc/default/docker:

DOCKER_OPTS="--bridge=weave --fixed-cidr=10.2.1.0/24"

We have also set the fixed-cidr value, which defines the range the containers of this host will live in. The given network is a subnet of 10.2.0.0/16 and must be exclusive to this host. In a bigger Mesos cluster you of course have to configure disjoint subnets, one per host.
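
For example, a simple (purely illustrative) scheme is to give every host its own /24 out of 10.2.0.0/16:

# /etc/default/docker on host 1
DOCKER_OPTS="--bridge=weave --fixed-cidr=10.2.1.0/24"

# /etc/default/docker on host 2
DOCKER_OPTS="--bridge=weave --fixed-cidr=10.2.2.0/24"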

Now start the Docker daemon:

$ start docker

New containers will get an IP address from the fixed-cidr range. Notice that we offload the per-host IP management to Docker. Because the subnets above are chosen properly, we get cluster-wide unique IP addresses. Fantastic!

$ docker run -it ubuntu /bin/bash
root@3ce56d73fc18:/# ping 10.2.0.1
PING 10.2.0.1 (10.2.0.1) 56(84) bytes of data.
64 bytes from 10.2.0.1: icmp_seq=1 ttl=64 time=0.152 ms

3. Automatic startup of the weave bridge

Everything done above can easily be scripted using Debian/Ubuntu's interface definitions in /etc/network/interfaces:

auto weave
iface weave inet manual
        pre-up /usr/local/bin/weave create-bridge
        post-up ip addr add dev weave 10.2.0.1/16
        pre-down ifconfig weave down
        post-down brctl delbr weave

Update: compare the update remark above; the IP is assigned directly with ip addr add instead of weave expose.

We can use ifup and ifdown to start and stop the script:

$ ifup weave
$ ifdown weave

As the interface is defined as auto it will come up by itself after a reboot. Because /etc/init/docker.conf defines a dependency on the network interfaces, the Docker daemon will properly wait for the weave bridge as well.

4. Launching the router

To connect multiple hosts in one giant 10.2.0.0/16 network we have to start the weave router container:

$ weave launch

Then you can connect multiple hosts, e.g.

$ weave connect 192.168.0.42
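
On the second host the router is launched the same way; connecting from one side should be enough for weave to establish the peering. A quick way to verify the connection on either host is:

$ weave status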

Note the following:

  • All containers and hosts can "see" each other via the weave network.
  • All ports that container processes listen to on the weave network interface are accessible to all containers.
  • Those ports are not exposed outside of the weave network. That can be done with the usual Docker means, e.g. docker run -itd -p 12345:80 sttts/python-ubuntu python -m SimpleHTTPServer 80. The webserver is then accessible via its weave network address, e.g. 10.2.1.1:80, and via 192.168.0.41:12345 if the latter is the host interface IP. Via 10.2.1.1:80 it can even be reached from every node in the cluster because we assigned the host IP 10.2.0.1 to the weave interface.
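
A quick check of both access paths (with the example addresses from the list above) could look like this:

$ curl http://10.2.1.1:80        # weave address, reachable from every node in the weave network
$ curl http://192.168.0.41:12345 # mapped host port on the host's own interface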

5. Service discovery with Consul and progrium/registrator

In other setups I have used the combination of Consul and Jeff Lindsay's great registrator tool to get service discovery via normal DNS.

They are pretty simple to get up and running. You start Consul as usual, e.g. in a one-host test setup like

$ consul agent -server -node=goedel -bootstrap -data-dir ./data -client=0.0.0.0

and then you can point the registrator container to Consul's address, e.g. 192.168.0.41, with

$ docker run -it -v /var/run/docker.sock:/tmp/docker.sock sttts/registrator consul://192.168.0.41:8500
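
This is a one-host test setup; in a cluster you would run a Consul agent (and a registrator container) on every Mesos slave and join it to the server. A minimal sketch, assuming the server from above runs on 192.168.0.41 and the slave is named turing:

$ consul agent -node=turing -data-dir ./data -client=0.0.0.0 -join 192.168.0.41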

Now, Consul's DNS server can be queried to find out about the running containers, e.g. the SimpleHTTPServer container from above:

$ dig @192.168.0.41 -p 8600 python-ubuntu.service.dc1.consul. ANY
; <<>> DiG 9.8.3-P1 <<>> @192.168.0.41 -p 8600 python-ubuntu.service.dc1.consul. ANY
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45546
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;python-ubuntu.service.dc1.consul. IN   ANY

;; ANSWER SECTION:
python-ubuntu.service.dc1.consul. 0 IN  A       192.168.0.41

;; Query time: 3 msec
;; SERVER: 192.168.0.41#8600(127.0.0.1)
;; WHEN: Thu Jan 22 17:30:09 2015
;; MSG SIZE  rcvd: 146

As you can see, the problem is that Consul returns the node's public IP, more precisely the IP advertised by the Consul agent. This is not what we want.

Luckily, a new feature for custom service addresses was merged into Consul's git master branch two weeks ago: https://github.com/hashicorp/consul/pull/570. It is now available in the Consul 0.5.0rc1 pre-release.
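
With that feature a service can be registered with an explicit address via the agent HTTP API, roughly like this (a hand-rolled registration just for illustration; registrator will do this for us below):

$ curl -X PUT -d '{"Name": "python-ubuntu", "Port": 80, "Address": "10.2.1.1"}' \
    http://localhost:8500/v1/agent/service/register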

But the very first sentence of this post strikes again: the feature was merged, but the Golang API in the same repository (used by registrator) does not support the Address extension of the service definition yet.

But as Consul is open source, it is easy to create a pull request for this: https://github.com/hashicorp/consul/pull/627.

With the new address feature of Consul we can extend registrator with a few lines of code to support the -internal flag for Consul as well (it turned out that registrator already knows about internal container addresses, but due to the lack of support in Consul this was only implemented for etcd). So, here is another pull request, this time against registrator: https://github.com/progrium/registrator/pull/91.

With this in place we can use the extended registrator and get proper, cluster-wide routable weave network addresses:

$ docker run -it -v /var/run/docker.sock:/tmp/docker.sock sttts/registrator -internal consul://192.168.0.41:8500

and then the DNS response is much better:

$ dig @192.168.0.41 -p 8600 python-ubuntu.service.dc1.consul. ANY
; <<>> DiG 9.8.3-P1 <<>> @192.168.0.41 -p 8600 python-ubuntu.service.dc1.consul. ANY
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41271
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;python-ubuntu.service.dc1.consul. IN   ANY

;; ANSWER SECTION:
python-ubuntu.service.dc1.consul. 0 IN  A       10.2.1.1

;; Query time: 0 msec
;; SERVER: 192.168.0.41#8600(127.0.0.1)
;; WHEN: Thu Jan 22 17:32:40 2015
;; MSG SIZE  rcvd: 146

6. Mesos and Marathon

Actually, there is hardly anything to be done for Mesos and Marathon support.

A mesos-slave with the Docker containerizer calls docker run without any additional magic. Due to our work above, Docker automatically assigns a cluster-wide unique IP to each container. The daemon inside the container listens on that IP. For container-to-container communication via the weave IPs, no ports even have to be exposed.
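
For reference, the Docker containerizer is enabled on the mesos-slave with the usual flags (the ZooKeeper address is just an example; with the Mesosphere packages the same values go into files under /etc/mesos-slave):

$ mesos-slave --master=zk://192.168.0.41:2181/mesos \
    --containerizers=docker,mesos \
    --executor_registration_timeout=5mins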

For Mesos and Marathon health checks the mesos-slave IP is used; Mesos doesn't know about the internal container IPs (yet). This is certainly worth a pull request, but I haven't done it yet.

Instead, for now it works to expose the health check ports as usual via Marathon's app definition. In other words: we use the internal IPs for container-to-container communication via Consul service discovery, and the port mappings for everything else.

Another nice thing would be to transform Marathon's app id (e.g. /web/customer1/site1) into a Consul service name. Then the app would automatically be available as site1.customer1.web.service.dc1.consul. This would require either setting the SERVICE_NAME and SERVICE_TAGS variables, which are then read by progrium/registrator, or teaching progrium/registrator to understand the MARATHON_APP_ID variable which Marathon exports into the container environment. Either way would be nice. For now we can work with explicit SERVICE_NAME/SERVICE_TAGS definitions within the Marathon app definitions.
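
A minimal Marathon app definition along those lines might look like the following sketch (assuming Marathon listens on 192.168.0.41:8080; app id, image and numbers are placeholders). The env block is what progrium/registrator picks up, and the port mapping plus health check follow the pattern described above:

$ curl -X POST -H "Content-Type: application/json" http://192.168.0.41:8080/v2/apps -d '{
    "id": "/web/customer1/site1",
    "cmd": "python -m SimpleHTTPServer 80",
    "instances": 1,
    "cpus": 0.25,
    "mem": 64,
    "container": {
      "type": "DOCKER",
      "docker": {
        "image": "sttts/python-ubuntu",
        "network": "BRIDGE",
        "portMappings": [{"containerPort": 80, "hostPort": 0}]
      }
    },
    "env": {"SERVICE_NAME": "site1", "SERVICE_TAGS": "customer1,web"},
    "healthChecks": [{"protocol": "HTTP", "path": "/", "portIndex": 0, "intervalSeconds": 10}]
  }'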

7. Outlook

We have a Mesos cluster running with cluster-wide unique IPs for each container. All this is completely transparent for Mesos and Marathon. Service discovery works with Consul and progrium/registrator (with one open pull request: https://github.com/hashicorp/consul/pull/627).

All this is quite feasible with the current state of the Docker art and all its components. I am pretty sure it will not take long until container IPs are simply the way to use Docker. Using an IP plus the standard port makes the system engineering life so much easier, because normal DNS lookups suddenly work again as they should (and always did before containers came up). No haproxies, no ambassadors, no SRV record logic in the apps.

Anybody who likes to see all this in a production like setup should take a look at my Ansible code base for a Mesos cluster: https://github.com/sttts/compute-platform. It includes all the weave bits and pieces in a weave role.