Vagrant, Docker and Ansible
For a high-end customer, I had to delve into the world of Vagrant, VirtualBox, Libvirt and Docker. The request was to create a single Vagrantfile that could provision a multi-host application both to virtual machines running under VirtualBox or Libvirt and to Docker containers.
Now, first off, at the time of writing, Vagrant, Docker and Ansible all seem rather twitchy application stacks. It might be me, but they all suffer from a badly defined application domain and a largely ‘hackable’ rather than ‘pluggable’ architecture. It just does not feel mature, but that might be the years biting at me…
Anyways, getting VirtualBox and Libvirt up and running was quite easy. Vagrant has a concept of ‘Provider’ as an application that provides a virtualised environment and VirtualBox and Libvirt match that concept nicely.
Vagrant also has a concept of ‘Provisioner’, which is the tool that deploys and configures your environment so you can get busy developing your application. As you develop, you can have the Provisioner deploy parts of or the whole of your application, so you end up with a one-stop solution: Vagrant, a provider, a provisioner and the required configurations to deploy your stack. Sounds nice.
Well, it is, in a way, if you work a lot with virtualisation. On the other hand, if you work a lot with virtualisation, you are probably working with Docker as well. And that is where the issues start. Docker is not a Provider. It is also not a Provisioner. It is something in between, and beyond.
Docker provides ‘containers’ instead of ‘hosts’. A container is in essence only a configured environment in which you are supposed to run a single application. Docker has no concept of a ‘box’ as a basic environment, although you could treat a minimally configured image as such. At the same time, Docker allows you to provision those images and deliver fully configured containers at the end. That is, however, not what we need. Also, a Docker container only lives as long as its main process: if no application is running and the container is just waiting for input, Docker stops the container, and Vagrant does not play nice with that. If you have a multi-host application and you first want all containers up and running so you can provision them afterwards, some containers need to keep waiting for a while.
Lastly, Docker and networking… no good. Docker has all kinds of nifty features, but assigning static IPs and controlling them through an easy interface that hooks into the existing Vagrant networking options is not (yet) an option, unfortunately.
So, after some weeks of trying and refining, I solved the issues in a way.
1. Docker as Provider
As I said, Docker is not a provider in the sense of VirtualBox and Libvirt. It is not a virtualisation tool, but uses the container facilities of the Linux kernel. This poses a few problems, as Vagrant expects the Docker container to behave as a virtualised environment. If you work your way through the documentation, you can easily get the impression that ‘Docker as a Provider’ was never meant to function in the same way as the other providers, but only as a way of spinning up a host machine (not a Docker container) that supports Docker containers (i.e. a Linux host) and then provisioning on top of that. So it seems we are going to try something slightly different.
2. Docker and Networking
Docker does not allow you to set static IPs, but provides an opaque set of networking options, supposedly for our own good. The only way of changing the networking seems to be to use docker-compose, a tool to deploy multi-container applications with all kinds of nifty load-balancing options and producer-consumer variations. Very cool, but way out of scope. Sometimes you’d wish they would first solve the easy stuff (static IPs) before venturing into the world of complexity.
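To give you an idea of what that means in practice: a fixed address requires a user-defined bridge network with an explicit subnet, plus an ipv4_address per service. The sketch below is purely hypothetical (the network name, subnet and addresses simply mirror the ones used later in this post), not a file we actually ship:

# hypothetical docker-compose.yml sketch: static addressing needs a user-defined network
version: "2.4"
services:
  machine1:
    image: debian:stretch
    command: ["sleep", "infinity"]   # keep the container alive for the example
    networks:
      scznet:
        ipv4_address: 172.20.1.20
networks:
  scznet:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.1.0/24
          gateway: 172.20.1.1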
3. Docker as Provisioner
I delved into the possibility of using Docker as a Provisioner, because that Vagrant plugin has more features and is better described in the various howtos on the internet. But I’ll bring it to you straight away: this is not going to fly in this case. Docker as a Provisioner would solve the use case for which we already have Ansible. And Ansible being what it is, it is still far better at being a Provisioner than Docker will ever be. In the Vagrant environment, Docker as a Provisioner will not spin up the containers you require for provisioning with Ansible. It also does not mix well with VirtualBox and Libvirt, which both provide the required machines. To actually use Docker as a Provisioner in this case (and run Ansible afterwards), the Vagrantfile needs to be refactored in such a way that it can no longer be combined with VirtualBox, Libvirt and the other sensible Providers.
TL;DR: here are the relevant code structures you need in your Vagrantfile
1. First define a list of the machines you require. For each machine, you need a name, an IP and the FQDN (within the context of the parent host):
N = 2

machines = {
  "m1" => {
    "name"     => "machine1",
    "ip"       => "172.20.1.20",
    "hostname" => "machine1.local"
  },
  "m2" => {
    "name"     => "machine2",
    "ip"       => "172.20.1.21",
    "hostname" => "machine2.local"
  }
}
2. You can configure VirtualBox and Libvirt as usual inside the Vagrant.configure block. Note the use of the override parameter, which allows us to set the VM box to be used. Because the Docker Provider does not have a concept of ‘box’, configuring a box in the higher scope will cause an error for the Docker Provider plugin. You can add additional code for the VM box (for example checksum checks, etc.) on this override parameter:
Vagrant.configure("2") do |config|
  config.vm.synced_folder ".", "/vagrant", disabled: true

  config.vm.provider "virtualbox" do |vb, override|
    override.vm.box = "debian/stretch64"
    vb.cpus = "1"
    vb.memory = "512"
  end

  config.vm.provider "libvirt" do |lv, override|
    override.vm.box = "debian/stretch64"
    lv.cpus = "1"
    lv.memory = "512"
    lv.graphics_type = "spice"
    lv.video_type = "qxl"
  end

  <More configuration code goes here later on>
end
3. We don’t want to use the Vagrant public key, but at the same time Ansible wants to use a single key to access all hosts, so we need to generate a key at the top of the Vagrantfile:
require 'vagrant/util/keypair'

env = Vagrant::Environment.new()
sshkeypriv = Pathname.new(env.local_data_path) + 'id_rsa'
sshkeypub  = Pathname.new(env.local_data_path) + 'id_rsa.pub'

if ARGV[0] == "up" and ( !sshkeypriv.exist? or !sshkeypub.exist? )
  # see https://github.com/mitchellh/vagrant/blob/master/plugins/communicators/ssh/communicator.rb#L183-L193
  puts "Generating new ssh key to use"
  pub, priv, openssh = Vagrant::Util::Keypair.create
  sshkeypriv.open("w+").write(priv)
  sshkeypub.open("w+").write(openssh)
  File.chmod(0600, sshkeypriv)
end
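As an aside: env.local_data_path resolves to the project’s .vagrant directory by default, so the generated key ends up as .vagrant/id_rsa. That is exactly the IdentityFile the Ansible provisioner is pointed at in step 7 below.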
4. We want to add this key to the authorized_keys for the vagrant user of our containers and boxes:
# we add the key to authorized_keys instead of provisioning the entire file, to allow
# vagrant to reprovision running boxes. In that case, both the vagrant key and the
# generated key need to be allowed
config.vm.provision "file", source: sshkeypub, destination: "~/.ssh/provision_key.pub"

config.vm.provision :shell do |shell|
  shell.inline = "cat /home/vagrant/.ssh/provision_key.pub >> \
    /home/vagrant/.ssh/authorized_keys; \
    echo '' >> /home/vagrant/.ssh/authorized_keys"
end
5. Now we start providing and provisioning the various boxes by looping over our list of machines. This sets a name for our VirtualBox provider, configures the network for non-Docker instances and lets the Docker Provider provide us with something that we need in a transient fashion. Note that the Docker Provider points at a build_dir that should contain a Dockerfile with a minimal configuration:
(1..N).each do |machine_id|
  machine = machines["m#{machine_id}"]

  config.vm.define machine["name"], autostart: true do |m|
    m.vm.network :private_network, ip: machine["ip"]
    m.vm.hostname = machine["hostname"]

    m.vm.provider "virtualbox" do |v|
      v.name = "#{machine['name']}"
    end

    m.vm.provider "docker" do |dk|
      dk.name = machine['name']
      dk.build_dir = "./docker/#{machine['name']}"
      dk.build_args = ["-t", "scz:#{machine['name']}"]
      dk.remains_running = false
      dk.has_ssh = false
    end

    <We'll be adding more code here>
  end
end
6. The Dockerfile we point to needs to build a simple image with two main characteristics: it runs SSHD so Vagrant can connect to it, and it has Python so Ansible can provision it. The Dockerfile looks similar to:
FROM debian:stretch

# Install base tools: sshd, python
RUN apt-get update
RUN apt-get install -y openssh-server python2.7 python3 python sudo
EXPOSE 22
RUN mkdir -p /var/run/sshd
RUN chmod 0755 /var/run/sshd

# Create and configure vagrant user
RUN useradd --create-home -s /bin/bash vagrant
WORKDIR /home/vagrant

# Configure SSH access
RUN mkdir -p /home/vagrant/.ssh
RUN echo "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA6NF8iallvQVp22WDkTkyrtvp9eWW6A8YVr+kz4TjGYe7gHzIw+niNltGEFHzD8+v1I2YJ6oXevct1YeS0o9HZyN1Q9qgCgzUFtdOKLv6IedplqoPkcmF0aYet2PkEDo3MlTBckFXPITAMzF8dJSIFo9D8HfdOV0IAdx4O7PtixWKn5y2hMNG0zQPyUecp4pzC6kivAIhyfHilFR61RGL+GPXQ2MWZWFYbAGjyiYJnAmCP3NOTd0jMZEnDkbUvxhMmBYSdETk1rRgm+R4LOzFUGaHqHDLKLX+FIPKcF96hrucXzcWyLbIbEgE98OHlnVYCzRdK8jlqm8tehUc9c9WhQ== vagrant insecure public key" > /home/vagrant/.ssh/authorized_keys
RUN chown -R vagrant: /home/vagrant/.ssh
RUN echo -n 'vagrant:vagrant' | chpasswd

# Enable passwordless sudo for the "vagrant" user
RUN mkdir -p /etc/sudoers.d
RUN install -b -m 0440 /dev/null /etc/sudoers.d/vagrant
RUN echo 'vagrant ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers.d/vagrant

# Clean up APT when done.
RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
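If you want to sanity-check this image outside of Vagrant first, you can build and run it by hand. The tag mirrors the build_args from step 5; the container name below is just a hypothetical one for this test:

# build the image for machine1 outside of Vagrant
docker build -t scz:machine1 ./docker/machine1

# run it the way Vagrant will in step 8: sshd in the foreground keeps the container alive
docker run -d --name machine1-test scz:machine1 /usr/sbin/sshd -D
docker ps --filter name=machine1-test

# clean up the test container afterwards
docker rm -f machine1-test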
7. So, at this point we have Docker containers that stop (do not keep running) and, because we indicated that to Vagrant, we do not get an error situation. But that basically means we cannot provision them, so we need to spin them up again, and we do that while providing the last box or container. After we have done that, we can provision all our running boxes/containers in parallel and have the system behave ‘normally’. In the place marked above inside the Vagrant loop, we add the following code:
if machine_id == N
  <Docker provider code goes here in a moment>

  m.ssh.insert_key = false

  m.vm.provision :ansible do |ansible|
    ansible.playbook = "provision.yml"
    ansible.inventory_path = "./environments/vm/inventory"
    ansible.raw_ssh_args = ["-o IdentityFile=.vagrant/id_rsa"]
    ansible.extra_vars = {
      user: "vagrant"
    }
  end
end
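The inventory_path above refers to ./environments/vm/inventory, which is not shown in this post. A minimal, hypothetical inventory that matches the machines and IPs from step 1 could look like this:

# ./environments/vm/inventory (hypothetical example)
machine1 ansible_host=172.20.1.20 ansible_user=vagrant
machine2 ansible_host=172.20.1.21 ansible_user=vagrant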
8. Almost there. This Vagrantfile has a list of machines and invokes the correct Provider for each. Then, when the last machine is Provided, we call Ansible to Provision all the running machines simultaneously. The only thing biting us at the moment is that the Docker containers have all stopped, so there is nothing to Provision to. We work around that by defining a docker-compose configuration for our machines. You need to match this manually with the configured machines, but it is fairly straightforward. To ensure the containers keep running, we spin up the SSHD daemon in the foreground, which coincidentally also gives us SSH access:
m.vm.provider "docker" do |dk|
  dk.cmd = ["/usr/sbin/sshd", "-D"]
  dk.remains_running = true
  dk.has_ssh = true
  dk.compose = true
  dk.compose_configuration = {
    "services" => {
      machines["m1"]["name"] => {
        "build"    => { "context" => "../../docker/#{machines['m1']['name']}" },
        "command"  => ["/usr/sbin/sshd", "-D"],
        "image"    => "myproject:#{machines['m1']['name']}",
        "hostname" => "#{machines['m1']['hostname']}",
        "networks" => {
          "scznet" => { "ipv4_address" => machines["m1"]["ip"] }
        }
      },
      machines["m2"]["name"] => {
        "build"    => { "context" => "../../docker/#{machines['m2']['name']}" },
        "command"  => ["/usr/sbin/sshd", "-D"],
        "image"    => "myproject:#{machines['m2']['name']}",
        "hostname" => "#{machines['m2']['hostname']}",
        "networks" => {
          "scznet" => { "ipv4_address" => machines["m2"]["ip"] }
        }
      }
    },
    "networks" => {
      "scznet" => {
        "driver" => "bridge",
        "ipam" => {
          "config" => [{ "subnet" => "172.20.1.0/24", "gateway" => "172.20.1.1" }]
        }
      }
    }
  }
end
This final trick allows us to define the static IP network for Docker to match the networking options of VirtualBox and Libvirt. As mentioned before, we have SSH access to each container, and the containers keep running because of the SSHD daemon. Now Ansible can access these containers with the IP addresses as defined in the inventory list. Due to the bridged interface, each container has access to the internet through the gateway, allowing updates over the network.
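With all of this in place, bringing the stack up is the usual Vagrant workflow; only the provider differs, selected with the --provider flag (or the VAGRANT_DEFAULT_PROVIDER environment variable):

# spin up and provision on VirtualBox, Libvirt or Docker respectively
vagrant up --provider=virtualbox
vagrant up --provider=libvirt
vagrant up --provider=docker

# re-run only the Ansible provisioning against the running machines/containers
vagrant provision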
There is a slew of issues with this setup, but I feel that this whole undertaking is a mix of incomplete and poorly compatible tools using hacks to stay alive. Mileage may vary, no guarantees.