Docker volume backup

A brief introduction to Docker volumes and their automated backups.

The structure of a Docker image

Docker images are immutable files built from layers, starting with the layers of the base image and followed by package installation and application setup layers. Images can be stored locally or in container registries, and they act as templates for creating Docker containers. A Docker container is an instance of an image with an extra read-write layer on top. This layer holds the data created or modified after the container was instantiated from the image; this may include internal process information, application logs, uploaded files, database files, etc.
By default, a container’s read-write layer is tied to the lifetime of the container, i.e. information in this layer is lost if you remove the container and recreate it. This isn’t an issue for some applications, but in most cases we need to persist some sort of data: database files, uploaded files, logs (even in the case of stateless microservices), etc.

Docker provides two main options for persisting data outside a container’s read-write layer:
  1. Bind mounts - These map specific folders inside the container to folders on the host. This option is very good for developing an application directly in the Docker container without having a complete development environment installed on the development machine. However, it sometimes doesn’t work for certain applications when a Windows host is running Linux containers. It also ties the container to a folder layout on the host, which usually has to be set up before starting the container.
  2. Volumes - These link folders inside the container to a storage area on the host that is managed by Docker. This is a very flexible option, especially as volumes can be created automatically when docker run or docker-compose is run. However, since this storage is not meant to be accessed directly through the host file system, it poses slight challenges for backup.
More details: https://docs.docker.com/storage/volumes/
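
To make the difference between the two options concrete, here is a minimal sketch of how each looks on the command line. The helper name New-MountArgument, the volume name appdata, the host path and the container names are illustrative assumptions; only the --mount syntax itself comes from Docker.

```powershell
# Hypothetical helper: builds the value for docker's --mount flag.
function New-MountArgument {
    param(
        [Parameter(Mandatory)][ValidateSet('bind', 'volume')][string]$Type,
        [Parameter(Mandatory)][string]$Source,   # host path (bind) or volume name (volume)
        [Parameter(Mandatory)][string]$Target    # path inside the container
    )
    "type=$Type,source=$Source,target=$Target"
}

# Same target folder in the container, two different kinds of storage behind it.
$bindMount   = New-MountArgument -Type bind   -Source '/home/user/appdata' -Target '/var/lib/app'
$volumeMount = New-MountArgument -Type volume -Source 'appdata'            -Target '/var/lib/app'

Write-Host "bind mount : --mount $bindMount"
Write-Host "volume     : --mount $volumeMount"

# Uncomment to start a container with either option (requires a running Docker daemon):
# docker run -d --name app-bind   --mount $bindMount   ubuntu:latest sleep infinity
# docker run -d --name app-volume --mount $volumeMount ubuntu:latest sleep infinity
```

In the bind mount case the source is a path you choose on the host; in the volume case it is only a name, and Docker decides where the data lives.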

Why backing up volumes is important

As previously mentioned, Docker volumes can contain business-critical data, e.g. uploaded files, database files and logs. It is therefore important to have regular backups of these volumes and an easy way to restore them, in order to achieve a better mean time to recovery (MTTR) in disaster recovery. Volume backups are also required to move Docker containers from one host to another. You may also want to take a volume backup in your DevOps pipelines before recreating a container from an updated Docker image of your application.

If you are running the container in the cloud, e.g. in Azure Container Apps, there is an option to mount the volumes on file shares in an Azure storage account, which can then be backed up easily and automatically. This also simplifies restoring in the case of disaster recovery or rollback, as that is equivalent to restoring or rolling back the Azure storage. NB: There are different options, e.g. blob versioning, snapshots, etc., for storing the backups and restoring them.

Here, I am discussing the problem of volume backup for containers running inside VMs (on-premises or in the cloud) with automated deployment using CI/CD pipelines.

Docker volume backup using a PowerShell Core script

Here I have created a PowerShell script that backs up each Docker volume as a .tar file. It is built upon two key facts:
  1. The same Docker volume can be mounted in different containers.
  2. The same container can have both a volume and a bind mount, each at its own path.

Let’s assume that you want to take a backup of volumeA, which is currently used in containerA. The steps to do so are as follows:
  1. Spin up a basic ubuntu container (containerB) with volumeA mounted at its /var/temp folder.
  2. Bind mount the host folder /home/temp into containerB, e.g. at /backup.
  3. From inside containerB, archive /var/temp into /backup, so that the archive lands in /home/temp on the host, then stop containerB.
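
The steps above can be sketched as a single one-off docker run call. This is a minimal sketch, not the full script: the helper name Get-VolumeBackupArguments is hypothetical, the /backup mount point and the read-only (:ro) flag on the volume are my own choices, and volumeA, /var/temp and /home/temp are the example names from the steps.

```powershell
# Hypothetical helper: returns the docker arguments for a one-off backup container.
function Get-VolumeBackupArguments {
    param(
        [Parameter(Mandatory)][string]$VolumeName,      # e.g. volumeA
        [Parameter(Mandatory)][string]$HostBackupDir,   # e.g. /home/temp
        [string]$MountPoint = '/var/temp'               # where the volume appears in containerB
    )
    @(
        'run', '--rm',                                  # --rm removes containerB when tar exits
        '-v', "${VolumeName}:${MountPoint}:ro",         # step 1: the volume, mounted read-only
        '-v', "${HostBackupDir}:/backup",               # step 2: bind mount of the host folder
        'ubuntu:latest',
        'tar', 'cf', "/backup/${VolumeName}.tar", '-C', $MountPoint, '.'   # step 3: archive
    )
}

$dockerArgs = Get-VolumeBackupArguments -VolumeName 'volumeA' -HostBackupDir '/home/temp'
Write-Host "docker $($dockerArgs -join ' ')"

# Uncomment to run it (requires a running Docker daemon and an existing volumeA):
# docker @dockerArgs
```

Building the argument list separately from the call makes it easy to print or log the exact command before anything touches the volume.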
The following script does the full backup in three steps:
  1. For each container passed to the script, look up its volumes and stop the container if it is running. (You can skip the stop if you want to back up a running container and file locking is not a problem.)
  2. Take the volume backup.
  3. Restart the containers that were stopped in step 1.
param(
    [string]$backupPath = '',   # folder in which the timestamped backup folder is created
    [string]$conNames = ''      # comma-separated list of container names
)

if (($backupPath -eq '') -or ($conNames -eq ''))
{
    Write-Host "backup parameters missing"
    exit 1
}

# Create a timestamped folder for this backup run and resolve its full path.
$currentBackupFolder = New-Item -Path $backupPath -ItemType Directory -Name "Backup$(Get-Date -Format 'MM-dd-yyyy_HH_mm_ss')"
$target = $currentBackupFolder.FullName
$containerNames = $conNames.Split(",")
# Example values:
#$containerNames = "elevenplus_nginx","djangoapp","solr","elevenplus_postgresql"
#$target = "c:/temp/dockerbackup"

docker pull ubuntu:latest

# Names of all containers (running or stopped); @() keeps this an array even for one result.
$allNames = @(docker ps -a --format '{{.Names}}')

foreach ($containerName in $containerNames)
{
    $containerName = $containerName.Trim()
    Write-Host "Checking for container $containerName."

    # Match the exact name; a substring match would also hit e.g. "app" in "djangoapp".
    if ($allNames -contains $containerName)
    {
        Write-Host "Found container $containerName."
        # docker inspect prints the string "true" or "false".
        $running = (docker inspect -f '{{.State.Running}}' $containerName) -eq 'true'
        $mounts = docker inspect --format '{{json .Mounts}}' $containerName | ConvertFrom-Json

        if ($running)
        {
            Write-Host "Stopping container $containerName."
            docker stop $containerName
        }
        foreach ($mount in $mounts)
        {
            if ($mount.Type -eq 'volume')
            {
                $volname = $mount.Name
                $destination = $mount.Destination
                Write-Host "Trying volume backup for $containerName volume: $volname."
                # The helper container sees the volume at its original path (--volumes-from)
                # and the host backup folder at /backup (bind mount).
                docker run --rm --volumes-from $containerName -v "${target}:/backup" ubuntu:latest tar cf "/backup/${volname}.tar" -C $destination .
            }
        }
        if ($running)
        {
            Write-Host "Starting container $containerName."
            docker start $containerName
        }
    }
}
The script above can be adapted to take backups in specific scenarios e.g. scheduled backups using cron jobs, backups in the DevOps pipeline or manual backups.
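
As an illustration of such an adaptation, here is a minimal sketch of how a pipeline step or scheduled job might call the script. The file name Backup-DockerVolumes.ps1, the backup folder and the crontab line are assumptions made for this example; the parameter names and the container names are taken from the script above.

```powershell
# Assumed file name for the backup script above; adjust to wherever you saved it.
$scriptPath = './Backup-DockerVolumes.ps1'

# Parameters as the script expects them: a target folder and a comma-separated container list.
$backupParams = @{
    backupPath = '/var/backups/docker'
    conNames   = 'djangoapp,solr'
}

if (Test-Path $scriptPath) {
    # Splat the hashtable so each key is passed as a named parameter.
    & $scriptPath @backupParams
} else {
    Write-Warning "Backup script not found at $scriptPath"
}

# A crontab entry for a nightly run at 02:00 could look like this (assumes pwsh is on the PATH):
# 0 2 * * * pwsh -File /opt/scripts/Backup-DockerVolumes.ps1 -backupPath /var/backups/docker -conNames "djangoapp,solr"
```

Because the script creates a new timestamped folder on every run, a scheduled job will accumulate backups, so you will probably want some form of clean-up alongside it.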

Restoring backup

Restoring the backup uses the same technique as above. Below is the script for doing this automatically. NB: this script assumes that the backup file names are the same as the volume names, as generated by the volume backup script above.
$defaultTarget = 'C:\temp\volumebackup'
$target = $null

# Ask for the backup folder until an existing directory is entered.
while ($null -eq $target)
{
    $target = Read-Host "Enter backup folder path ($defaultTarget)"
    if ([string]::IsNullOrWhiteSpace($target))
    {
        $target = $defaultTarget
    }
    if (-not (Test-Path $target))
    {
        Write-Host "Invalid directory path, re-enter."
        $target = $null
    }
    elseif (-not (Get-Item $target).PSIsContainer)
    {
        Write-Host "Target must be a directory, re-enter."
        $target = $null
    }
}

# Wrap in @() so the result is an array even with zero or one volume.
$volumes = @(docker volume ls --format '{{.Name}}')

$list = @(Get-ChildItem -Path $target -Recurse -File | Where-Object { $_.Extension -eq '.tar' })
Write-Host "`nTotal: $($list.Count) backups to restore`n"

foreach ($n in $list)
{
    $backupName = $n.Name             # e.g. volumeA.tar
    $backupDir  = $n.DirectoryName    # folder that holds this archive
    $volumename = $n.BaseName         # file name without the .tar extension

    if ($volumes -contains $volumename)
    {
        Write-Host "Volume $volumename already exists."
    }
    else
    {
        Write-Host "Creating volume $volumename."
        docker volume create $volumename
    }

    # Stop every running container that uses the volume before touching its data.
    $containernames = @(docker ps -a --filter "volume=$volumename" --format '{{.Names}}')
    [System.Collections.ArrayList]$containertostart = @()
    foreach ($cn in $containernames)
    {
        $running = docker inspect -f '{{.State.Running}}' $cn
        if ($running -eq 'true')
        {
            Write-Host "Stopping container $cn."
            docker stop $cn
            [void]$containertostart.Add($cn)
        }
    }

    # Restore once per volume: empty it (including dotfiles), then unpack the archive.
    # Doing this outside the container loop also covers volumes that no container uses yet.
    $restoreCommand = "find /recover -mindepth 1 -delete && tar xvf '/backup/$backupName' -C /recover"
    Write-Host "Invoking: $restoreCommand"
    docker run --rm -v "${volumename}:/recover" -v "${backupDir}:/backup" ubuntu:latest bash -c $restoreCommand

    Write-Host "Restored volume $volumename."
    foreach ($cstart in $containertostart)
    {
        Write-Host "Starting container $cstart."
        docker start $cstart
    }
}
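
The naming assumption mentioned above (backup file name equals volume name plus .tar) can be sketched in isolation as follows. The helper name Get-VolumeNameFromBackup and the sample path are hypothetical; the point is only that stripping the final extension, rather than every occurrence of ".tar", keeps volume names and folder paths that happen to contain ".tar" intact.

```powershell
# Hypothetical helper: derives the volume name from a backup file path,
# following the "<volume name>.tar" convention used by the backup script.
function Get-VolumeNameFromBackup {
    param([Parameter(Mandatory)][string]$BackupFile)
    # Only the last extension is removed; the directory part is ignored.
    [System.IO.Path]::GetFileNameWithoutExtension($BackupFile)
}

$volumeName = Get-VolumeNameFromBackup -BackupFile '/home/temp/Backup01-31-2024_02_00_00/my.tar.data.tar'
Write-Host "Volume to restore: $volumeName"
```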