Multi-host Volumes

Blockbridge volumes work in both single-host and multi-host Docker environments. In this post we’ll describe how they work and the key semantics you need to be aware of in Docker 1.9.

Docker itself is not multi-host aware. While orchestration tools like Docker Swarm facilitate multi-host scheduling and management, each Docker daemon is responsible only for managing its local containers. If you have a containerized application that maintains persistent state and you need to move it between hosts, then you have a basic use case for multi-host volumes.

A multi-host volume is a persistent data volume that can be accessed by any Docker host and its containers. This definition covers both shared-access volumes and volumes that require exclusive access.

The key technical challenges in implementing multi-host volumes are coordinating access to data sources that permit only a single writer, and tracking container references that Docker manages independently on each host. The Blockbridge volume driver, purpose-built for multi-host awareness, addresses these challenges and others.

Multi-host Volume Create

When a volume is created in Docker 1.9, Docker records host-local metadata to track its existence. This metadata includes the volume's name and the volume driver responsible for managing it. Docker assumes that a volume does not exist if it cannot find the corresponding metadata.

For multi-host volumes in Docker 1.9, this means every host that needs access to a volume must perform its own create operation. The create may be implicit (e.g., via docker run -v) or explicit (e.g., via docker volume create).
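The per-host nature of this metadata can be illustrated with a deliberately simplified model. The `DockerHost` class below is hypothetical and is not how Docker is implemented; it only mirrors the behavior described above, where each host keeps its own table of volume names and drivers and never consults another host.

```python
class DockerHost:
    """Hypothetical, simplified model of one Docker 1.9 daemon's volume metadata.

    Illustrative only; this is not Docker's internal implementation.
    """

    def __init__(self, hostname: str) -> None:
        self.hostname = hostname
        # Host-local metadata: volume name -> responsible driver name.
        self.volumes: dict[str, str] = {}

    def volume_create(self, name: str, driver: str) -> None:
        # Explicit (docker volume create) and implicit (docker run -v)
        # creates both end up recording local metadata for the volume.
        self.volumes.setdefault(name, driver)

    def volume_exists(self, name: str) -> bool:
        # Only this host's metadata is consulted; other hosts are never asked.
        return name in self.volumes


if __name__ == "__main__":
    host_a, host_b = DockerHost("host-a"), DockerHost("host-b")
    host_a.volume_create("pgdata", "blockbridge")
    print(host_a.volume_exists("pgdata"))  # True
    print(host_b.volume_exists("pgdata"))  # False until host-b creates it too
    host_b.volume_create("pgdata", "blockbridge")
    print(host_b.volume_exists("pgdata"))  # True
```

In this model, a volume that already lives in shared storage is still invisible to host-b until host-b issues its own create, which is why the driver must handle repeated creates gracefully.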

When a volume is created on any host via the Blockbridge volume driver (i.e., with --driver blockbridge), the driver first checks whether the volume already exists. Docker expects the volume Create API method to be idempotent: if the volume already exists, the volume plugin must return success.

Blockbridge tracks per-host metadata using an extensible metadata service implemented by external management software. For each host that creates a volume, the Blockbridge driver registers per-host metadata that includes the hostname. This metadata acts as a per-host reference count and ensures that the volume will not be removed until all hosts have released their references. A similar metadata-based approach implements mutual exclusion for volumes that require exclusive access.
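A per-host reference can be modeled as membership in a set of hostnames. `HostReferenceRegistry` below is a hypothetical stand-in for the external metadata service, not its actual interface; it assumes only what is described above, namely that each creating host registers a record containing its hostname.

```python
from collections import defaultdict


class HostReferenceRegistry:
    """Hypothetical stand-in for the external metadata service.

    Each volume maps to the set of hostnames that have created it; the
    size of that set acts as a per-host reference count.
    """

    def __init__(self) -> None:
        self._refs: defaultdict[str, set[str]] = defaultdict(set)

    def register(self, volume: str, hostname: str) -> int:
        # A set makes registration idempotent per host: repeated creates
        # from the same host (implicit or explicit) add one reference.
        self._refs[volume].add(hostname)
        return len(self._refs[volume])

    def holders(self, volume: str) -> set[str]:
        # .get() avoids creating an empty entry for unknown volumes.
        return set(self._refs.get(volume, set()))


if __name__ == "__main__":
    registry = HostReferenceRegistry()
    print(registry.register("pgdata", "host-a"))  # 1
    print(registry.register("pgdata", "host-a"))  # 1: same host, no new reference
    print(registry.register("pgdata", "host-b"))  # 2
```

Keying references by hostname rather than incrementing a bare integer matters because Docker may issue several creates for the same volume on one host; a counter would over-count and the volume would never be reclaimed.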

Multi-host Volume Remove

Docker internally reference-counts volumes. This mechanism prevents a volume from being removed while a container actively references it. When a volume's reference count drops to zero, the volume can be removed.

Volumes are removed explicitly (e.g., docker volume rm) or implicitly when their container is removed (e.g., docker rm -v). Docker sends a Remove request to the volume driver only when its internal reference count drops to zero. Because Docker's reference counting is not multi-host aware, the volume driver must be.

When Docker invokes the Remove API method, the Blockbridge volume driver deletes that host's metadata by communicating with the metadata service. A volume persists as long as metadata exists for any host. As with Docker's own reference counting, when the last host's metadata reference is removed, Blockbridge removes the volume and reclaims its resources.
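The remove path can be sketched as releasing one host's reference and reclaiming storage only when no references remain. `VolumeMetadataService` and `volume_driver_remove` are hypothetical names for illustration; only the request shape (`{"Name": ...}`) and the `{"Err": ""}` reply come from the volume plugin protocol. The hostname is passed in explicitly here, whereas a real driver would know its own host.

```python
import json


class VolumeMetadataService:
    """Hypothetical metadata service: volume name -> hostnames holding a reference."""

    def __init__(self) -> None:
        self.refs: dict[str, set[str]] = {}
        self.reclaimed: list[str] = []  # volumes whose storage was released

    def register(self, volume: str, hostname: str) -> None:
        self.refs.setdefault(volume, set()).add(hostname)

    def release(self, volume: str, hostname: str) -> bool:
        """Drop hostname's reference; return True if the volume was reclaimed."""
        hosts = self.refs.get(volume)
        if hosts is None:
            return False  # assumption: a repeat remove is treated as a no-op
        hosts.discard(hostname)
        if hosts:
            return False  # other hosts still reference the volume
        del self.refs[volume]
        self.reclaimed.append(volume)  # stand-in for freeing backend storage
        return True


def volume_driver_remove(svc: VolumeMetadataService, hostname: str, body: str) -> str:
    """Handle a VolumeDriver.Remove request body for the given host."""
    name = json.loads(body)["Name"]
    svc.release(name, hostname)
    return json.dumps({"Err": ""})


if __name__ == "__main__":
    svc = VolumeMetadataService()
    svc.register("pgdata", "host-a")
    svc.register("pgdata", "host-b")
    volume_driver_remove(svc, "host-a", json.dumps({"Name": "pgdata"}))
    print(svc.refs)       # {'pgdata': {'host-b'}}: the volume persists
    volume_driver_remove(svc, "host-b", json.dumps({"Name": "pgdata"}))
    print(svc.reclaimed)  # ['pgdata']
```

Docker only calls Remove after its own per-host count reaches zero, so this function only has to handle the cross-host half of the problem.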

Multi-host Volume Usage

Once a volume has been created on a host, it can be mounted inside a container. The volume driver's role is to attach the volume and make it available at a specific path on the host. In a multi-host environment this requires strict synchronization. Again, Docker is not multi-host aware, so the burden falls on the volume driver. Any synchronization or virtualization that takes place must be transparent to the Mount/Unmount API methods.

When a container starts, Docker issues a Mount request to the volume plugin for each volume it references. When a container stops, Docker issues an Unmount request. Because the volume plugin API carries no additional identifying information, such as the container ID, the Blockbridge volume driver keeps a local count of mount requests for each volume. This keeps the volume mounted on the host as long as any container is using it.
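Because Mount and Unmount requests identify only the volume, the driver can count them per volume name and act on the transitions from zero and back to zero. `MountRefCounter` below is a hypothetical sketch of that bookkeeping, not Blockbridge's implementation; the lock is there because requests for containers starting or stopping in parallel may arrive concurrently.

```python
import threading


class MountRefCounter:
    """Hypothetical per-host count of outstanding Mount requests per volume."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # requests may arrive concurrently
        self._counts: dict[str, int] = {}

    def mount(self, name: str) -> bool:
        """Record a Mount; return True if this is the first reference."""
        with self._lock:
            count = self._counts.get(name, 0)
            self._counts[name] = count + 1
            return count == 0

    def unmount(self, name: str) -> bool:
        """Record an Unmount; return True if this was the last reference."""
        with self._lock:
            count = self._counts.get(name, 0)
            if count == 0:
                raise KeyError(f"volume {name!r} is not mounted on this host")
            if count == 1:
                del self._counts[name]
                return True
            self._counts[name] = count - 1
            return False


if __name__ == "__main__":
    counter = MountRefCounter()
    print(counter.mount("pgdata"))    # True: first container, attach and mount
    print(counter.mount("pgdata"))    # False: already mounted on this host
    print(counter.unmount("pgdata"))  # False: one container still uses it
    print(counter.unmount("pgdata"))  # True: last user gone, unmount and detach
```

Since a stop from one container is indistinguishable from a stop from another, the count is the only safe signal for when the host-level mount can be torn down.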

On the first mount request, Blockbridge attaches the volume to the host and mounts it on a directory in the host filesystem. When the last container using the volume stops, the volume is unmounted and detached from the host. With default options, only one host may access a Blockbridge volume at a time (the volume is a normal XFS filesystem layered on a smart virtual block device). To access the volume from a second host, first stop the container on the first host, then start the container on the second host.
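The default single-host rule can be sketched as an exclusive lock that a host takes on its first Mount of a volume and gives up on its last Unmount. `ExclusiveAttachLock` is hypothetical: we only know that a metadata-based approach similar to per-host references implements mutual exclusion, so the details here are assumptions. In a real deployment the check-and-set in `acquire` would have to be atomic in the shared metadata service; a plain dict stands in for it.

```python
from typing import Optional


class ExclusiveAttachLock:
    """Hypothetical cluster-wide lock: at most one host attaches a volume."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}  # volume -> hostname holding it

    def acquire(self, volume: str, hostname: str) -> None:
        """Called on a host's first Mount of the volume."""
        owner = self._owner.get(volume)
        if owner is not None and owner != hostname:
            raise RuntimeError(
                f"{volume!r} is attached to {owner!r}; stop its containers first"
            )
        self._owner[volume] = hostname  # re-acquire by the owner is a no-op

    def release(self, volume: str, hostname: str) -> None:
        """Called on a host's last Unmount; non-owners cannot release."""
        if self._owner.get(volume) == hostname:
            del self._owner[volume]

    def owner(self, volume: str) -> Optional[str]:
        return self._owner.get(volume)


if __name__ == "__main__":
    lock = ExclusiveAttachLock()
    lock.acquire("pgdata", "host-a")      # first Mount on host-a
    try:
        lock.acquire("pgdata", "host-b")  # refused while host-a holds it
    except RuntimeError as err:
        print(err)
    lock.release("pgdata", "host-a")      # last Unmount on host-a
    lock.acquire("pgdata", "host-b")      # host-b can now attach
    print(lock.owner("pgdata"))           # host-b
```

This matches the workflow described above: stopping the container on the first host releases the volume, after which the container can start on the second host.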

Happy Swarming!