Backups with Duplicati and Docker: Data Protection

A tutorial on backups with Duplicati and Docker: protect your data from drive failure or ransomware with reliable, automated backups using a flexible, open-source tool.


You’re sitting comfortably at your computer, sipping your coffee, and suddenly poof - hard drive dead, or even worse: ransomware! Now what? I’ve lost data myself because I thought, “Ah, nothing will happen.”

Key Takeaways

  • Hard drive failures are real: The Annual Failure Rate (AFR) shows that hard drives have a limited lifespan.
  • RAID is not a backup: It increases resilience against hardware defects but doesn’t protect against data loss from deletion, ransomware, or disasters.
  • The 3-2-1 rule is fundamental: Three copies, two types of media, one copy offsite. This is the standard.
  • Duplicati + Docker: An open-source tool, super flexible, runs great in a container.
  • Databases require special handling: Tools like pg_dump or mysqldump are necessary for consistent backups.
  • Automate! Nobody likes doing backups manually. Automated scripts or backup containers are more reliable.
  • Restore tests are essential: A backup you’ve never restored might not be a backup at all.

Why Backups are Indispensable

The reliability of data storage is a central issue, even if it’s not something we like to think about. A key metric is the “Annual Failure Rate” (AFR), which indicates the probability of a hard drive failing within a year. Current statistics, for example from Backblaze, show an average AFR of around 1.41%. For certain models, this rate is significantly higher.
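To put that figure into perspective: the more drives you run, the more likely it is that at least one of them fails in a given year. A rough back-of-the-envelope estimate (assuming independent failures and the average AFR of 1.41% quoted above):

awk 'BEGIN { afr = 0.0141; n = 4; printf "Chance that at least one of %d drives fails: %.1f%% per year\n", n, (1 - (1 - afr)^n) * 100 }'
# -> roughly 5.5% per year for a small 4-drive setup

Real-world failures are not fully independent (same batch, same enclosure, same power event), so treat this as a rough illustration rather than an exact number.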

RAID vs. Backup: Redundancy Doesn’t Equal Security

Using a second hard drive to mirror data, known as RAID 1, is a common method to increase data availability. One drive dies? No problem, the second one still has everything.

Despite this improvement, it’s crucial to understand: A RAID system does not replace a backup. The reason is that RAID only protects against a specific problem – the physical failure of a hard drive. It offers no protection against:

  • Accidental Deletion or Modification: You delete a file on drive A? Poof, it’s gone on drive B too. RAID just mirrors.
  • Ransomware / Viruses: If the system gets infected and data is encrypted, it affects both mirrored drives.
  • On-site Disasters: Lightning strike, water damage, fire? If both drives are in the same location, both are likely toast.
  • Software Errors or Filesystem Corruption: Sometimes the operating system or an application messes up and corrupts data. RAID faithfully mirrors the corrupted data too, so it doesn’t help here either (and RAID 0, which only stripes data across drives for speed, offers no redundancy at all).

RAID provides redundancy against hardware failure, but not security against the many other potential causes of data loss. Studies suggest that data loss due to human error, software problems, or cyberattacks is even more common than pure hardware defects. Therefore, a RAID system alone does not offer comprehensive protection. Without a true, separate backup, a significant residual risk remains.

The 3-2-1 Rule: A Solid Foundation

If RAID isn’t the solution for everything, what is? This is where the 3-2-1 Backup Rule comes in. And the name pretty much says it all:

  • 3 Copies of your data: This means: Your original data plus two separate backups. Why three? If one copy fails, you still have two. If the second one also fails (unlikely, but possible), you still have one.
  • 2 Different types of media: Don’t store all copies on the same type of medium. Not just internal hard drives:
    • An internal hard drive AND an external USB hard drive.
    • A local hard drive AND cloud storage.
    • A NAS (Network Attached Storage) AND a tape drive (okay, that’s more for businesses).
    The idea is to spread the risk: if a specific technology or batch is prone to failure, the data on the other media type remains unaffected.
  • 1 Copy offsite: At least one of the backup copies should be stored geographically separate from the original data. This protects against local events like fire, burglary, or natural disasters. Options include:
    • Cloud storage (Google Drive, OneDrive, Dropbox, specialized backup providers).
    • A server at a friend’s or family member’s place.
    • A rented server (VPS or Dedicated Server) in a data center.
    • An external hard drive that is regularly taken to another location (e.g., workplace, bank safe deposit box, relatives).

Implementing the 3-2-1 rule naturally requires additional effort: more storage space is needed, costs for cloud storage or extra hardware may arise, electricity is consumed, and you need to set up and maintain the whole thing. This effort must be weighed against the value of the data. More advanced concepts like the 3-2-1-1-0 rule further increase data security: the additional 1 stands for an offline copy (one not constantly connected, e.g., an external drive in a drawer, protection against ransomware) and the 0 for zero errors during regular restore tests.
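The offline copy from the 3-2-1-1-0 rule doesn’t have to be fancy. An external USB drive that is only connected for the sync and then unplugged already does the job. A minimal sketch (device name, mount point, and paths are placeholders for your setup):

sudo mount /dev/sdX1 /mnt/offline-backup
rsync -a --delete ~/docker/duplicati/backups/ /mnt/offline-backup/duplicati/
sudo umount /mnt/offline-backup   # then physically disconnect the drive

Because the drive is offline most of the time, ransomware that hits the server later cannot touch this copy.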

Duplicati in a Docker Container: Installation and Configuration

For the practical implementation of a backup, especially in a Docker environment, the open-source tool Duplicati is well-suited. It’s flexible, supports many destinations and encryption, and runs nicely as a Docker container. Pre-built images, for example from linuxserver.io, simplify deployment.

Installation is best done via Docker Compose. Create a folder for Duplicati (e.g., ~/docker/duplicati) and inside it, create a file docker-compose.yml with the following content (adjust paths and IDs):

mkdir -p ~/docker/duplicati/{backups,config}
nano ~/docker/duplicati/docker-compose.yml

services:
    duplicati:
        image: lscr.io/linuxserver/duplicati:latest
        container_name: duplicati
        environment:
            - PUID=1000 # Your User ID here! Find with 'id -u'
            - PGID=1000 # Your Group ID here! Find with 'id -g'
            - TZ=Europe/Berlin # Your Timezone
            - SETTINGS_ENCRYPTION_KEY=abcd1234 # Encryption key for settings (important!) - CHANGE THIS!
            # - CLI_ARGS= # Optional: Extra command-line arguments
        volumes:
            - ./config:/config # Where Duplicati stores its settings
            - ./backups:/backups # Where backups land (local, changeable!)
            - /path/to/your/data:/source/data:ro # IMPORTANT: Map your data here!
            - /path/to/host/data2:/source/data2:ro # More data to back up
        ports:
            - "8200:8200" # Port for accessing the web interface
        restart: unless-stopped

Important Adjustments:

  • PUID and PGID: Very important! These are the user and group IDs on your host system. Enter id -u and id -g in the terminal to find yours. Otherwise, Duplicati might not be able to access your folders.
  • TZ: Set your timezone so schedules are correct. Europe/Berlin works for Germany. Adjust as needed.
  • Volumes:
    • ./config:/config: A local config folder is mapped into the container. Duplicati stores important stuff here. Must exist!
    • ./backups:/backups: A local backups folder is mapped. Your backups can land here if you back up locally. Must also exist!
    • /path/to/your/data:/source/data:ro: Crucial! You need to specify the path on your host system where the data you want to back up resides (e.g., the folder containing Docker volumes of your other containers). /source/data is then the path inside the Duplicati container that you’ll select later when setting up the backup. You can map multiple volumes here. The :ro (read-only) suffix is recommended so Duplicati cannot accidentally modify the original data.
  • Ports: 8200:8200 means you can access the Duplicati web interface via http://YOUR_SERVER_IP:8200.

Once the docker-compose.yml is ready, simply enter sudo docker compose up -d in the terminal within that folder. Docker will download the image and start the container. You can then access the web interface in your browser. Depending on the image version, you’ll either be asked to set a password on first access or find a default one in the image’s documentation. Either way, set a strong password in the settings right away! Better safe than sorry.
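For reference, the whole start-and-check sequence could look like this (run inside the folder with the docker-compose.yml; the status and log commands are just a quick sanity check):

cd ~/docker/duplicati
sudo docker compose up -d                 # pulls the image and starts the container
sudo docker compose ps                    # the duplicati container should show up as running
sudo docker compose logs -f duplicati     # watch the startup output, Ctrl+C to stop following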

Your First Duplicati Backup: Securing Folders Made Easy

Is Duplicati running? Great! Now let’s set up the first backup. Go to the web interface (http://YOUR_SERVER_IP:8200) and click “Add backup” > “Configure a new backup”.

  1. General: Give the backup a meaningful name, e.g., “Docker Volumes App XY” or “Paperless App”. You can also add a description. The Encryption setting is important here.
    • No encryption: Only if you are absolutely sure that no unauthorized person can access the backup destination (e.g., purely local backup on an encrypted disk).
    • AES-256 encryption (recommended): If you back up to the cloud, to a third-party server, or even just to an external drive that might be lying around, it’s best to always encrypt! Choose a strong password (passphrase; a quick way to generate one is shown right after this list) and write it down securely! Without this passphrase, restoring the data is impossible. Losing it means total data loss.
  2. Destination: Where should the backup go? Duplicati offers numerous options:
    • Local folder or drive: Simply specify the path you mapped under /backups in docker-compose.yml (e.g., /backups).
    • SFTP (SSH): Backup to another server via SSH. Requires server address, port (default 22), username, and password or SSH key.
    • WebDAV: Many NAS systems or cloud providers offer this.
    • Cloud providers: Google Drive, OneDrive, Dropbox, Amazon S3, Backblaze B2, and many more are built-in. You usually need to authenticate via an “AuthID” button.
    • Rclone: If your destination isn’t directly supported but is by Rclone (another cool tool), you can configure it here.
    For the 3-2-1 rule, you need at least one destination that is not on the same server/drive! So SFTP or cloud storage is a good choice for the offsite copy. Use the “Test connection” button to check whether the destination is reachable.
  3. Source Data: What should be backed up? Click through the folder structure inside the Duplicati container. Find the folder you mapped earlier as a source in docker-compose.yml (e.g., /source/data). Select the folders and files to include in the backup. You can also define filters to exclude certain file types or folders (e.g., log files, cache folders).
  4. Schedule: When should the backup run? Daily? Weekly? At a specific time (nighttime is popular)? Set that here. Regular, automatic backups are invaluable.
  5. Options: This section is important again:
    • Backup retention: How many old backups do you want to keep? “Keep all backups” consumes a lot of storage. Options like “Smart backup retention” (e.g., keeps daily backups for the last 7 days, weekly for the last 4 weeks, etc.) or custom rules are often sensible.
    • Remote volume size: Defines how large the data blocks should be before uploading. The default value (e.g., 50 MB) is usually a good choice.
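As mentioned under Encryption, the passphrase should be strong and random. One simple way to generate one (works on practically every Linux system, and is also handy for SETTINGS_ENCRYPTION_KEY in the compose file):

openssl rand -base64 32   # prints a random, base64-encoded passphrase

Store it in a password manager or another safe place; without it, the backup cannot be decrypted.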

When everything looks right, click “Save”. Duplicati might ask for the passphrase again (if you set one). Then you can click “Run now” to manually trigger the first backup. This takes the longest the first time because everything needs to be transferred. Subsequent backups are incremental, meaning Duplicati only saves the changes since the last run. This is much faster and saves storage space.
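If you’re curious what actually ends up at the destination: Duplicati doesn’t copy your files 1:1 but packs them into encrypted volumes. A listing of the local destination looks roughly like this (file names are illustrative):

ls ~/docker/duplicati/backups/
# duplicati-20240501T020000Z.dlist.zip.aes   <- the file list for one backup version
# duplicati-b....dblock.zip.aes              <- the actual (encrypted) data blocks
# duplicati-i....dindex.zip.aes              <- index files pointing into the blocks

The practical consequence: you get your data back through Duplicati plus your passphrase, not by simply copying files off the destination.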

Backing Up Databases Correctly: The Challenge with Dumps (PostgreSQL & MySQL)

Backing up simple files and folders with Duplicati is straightforward. But what about databases running in your Docker containers, like PostgreSQL or MySQL? This gets a bit trickier. Why? A database constantly writes data to its files on disk (i.e., in the Docker volume). If Duplicati simply copies these files while the database is active, there’s a high risk of backing up an inconsistent state. This means the backup might be completely useless because data was being written mid-copy, and the files don’t match up.

What’s the solution? Almost every database comes with its own tools to create a consistent snapshot, known as a dump. These tools communicate with the database to ensure all data is captured in a coherent state and written to a single file. This dump file can then be safely backed up by Duplicati.

The main tools for common databases are:

  • PostgreSQL: The command-line program pg_dump. It needs information like the database user (-U), the database name (-d or as the last argument), and optionally the host (-h). The output is usually redirected to a file (typically .sql for plain-text dumps or .dump for the custom format).
    # Example: Dump the database 'my_app_db' as user 'app_user'
    pg_dump -h db_host -U app_user my_app_db > /path/to/backup/backup.sql
  • MySQL/MariaDB: The corresponding tool is mysqldump. It requires similar parameters: user (-u), password (-p), database name, and optionally host (-h).
    # Example: Dump the database 'my_website_db' as user 'web_user'
    mysqldump -u web_user -pYOUR_PASSWORD my_website_db > /path/to/backup/backup.sql

The challenge lies in running these dump commands regularly and automatically, and making the resulting dump files accessible to Duplicati. A clean solution is automation using scripts or dedicated backup containers.
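For a quick one-off dump, you don’t even need anything extra: you can run the dump tool inside the running database container and redirect the output to the host. A sketch, assuming your database runs as a compose service (service names and credentials are placeholders; run it from the folder containing your docker-compose.yml):

# PostgreSQL service called "postgres"
sudo docker compose exec -T postgres pg_dump -U app_user my_app_db > ./backup.sql

# MySQL/MariaDB service called "mysql"
sudo docker compose exec -T mysql mysqldump -u web_user -pYOUR_PASSWORD my_website_db > ./backup.sql

The -T flag disables the pseudo-terminal so the redirect produces a clean file. For regular backups, though, this should be automated - which is exactly what the next section is about.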

Automated Database Dumps: Scripts and Docker Services

Who wants to manually run pg_dump or mysqldump every day? Nobody! It needs to be automated. In a Docker environment, this can often be done directly in your docker-compose.yml. The idea: For each database we want to back up, we create an additional service (container) whose sole purpose is to regularly create a dump and place it in a shared location.

Example for PostgreSQL and MySQL:

mkdir -p ~/docker/app/backup/{postgres,mysql}

Then add something like this to your docker-compose.yml:

services:
  # ... other services here ...
  [...]

  # MySQL Database
  mysql:
    image: mysql:8.4
    container_name: demo-mysql
    environment:
      MYSQL_ROOT_PASSWORD: example
      MYSQL_DATABASE: demo
      MYSQL_USER: demouser
      MYSQL_PASSWORD: demopass
    volumes:
      - mysql-data:/var/lib/mysql
    networks:
      - demo-network
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "root", "-p$$MYSQL_ROOT_PASSWORD"]
      interval: 5s
      timeout: 5s
      retries: 5

  mysql-backup:
    networks:
      - demo-network # Must be in the same network as the DB
    image: mysql:8.4 # Same major version as the database!
    container_name: mysql-backup
    environment:
      MYSQL_HOST: mysql
      MYSQL_DATABASE: demo
      MYSQL_USER: demouser
      MYSQL_PASSWORD: demopass
      BACKUP_NUM_KEEP: 7 # Number of dumps to keep
      BACKUP_FREQUENCY: 1d # How often to run (e.g., 1d=daily, 12h=twice a day)
    entrypoint: |
      bash -c 'bash -s <<EOF
      trap "break;exit" SIGHUP SIGINT SIGTERM
      sleep 2m # Initial delay to allow DB to start fully
      while /bin/true; do
        echo "Creating MySQL dump..."
        mysqldump -h $$MYSQL_HOST -u $$MYSQL_USER -p$$MYSQL_PASSWORD \
          --no-tablespaces \
          --single-transaction --quick --lock-tables=false $$MYSQL_DATABASE | gzip > /dump/mysql_backup_`date +%d-%m-%Y"_"%H_%M_%S`.sql.gz
        echo "Cleaning up old MySQL dumps (keeping $$BACKUP_NUM_KEEP)..."
        # List all dumps, keep the newest N, list all again, find unique (old) ones, delete them
        (ls -t /dump/mysql_backup_*.sql.gz | head -n $$BACKUP_NUM_KEEP; ls /dump/mysql_backup_*.sql.gz) | sort | uniq -u | xargs --no-run-if-empty rm -f
        echo "MySQL dump complete. Sleeping for $$BACKUP_FREQUENCY..."
        sleep $$BACKUP_FREQUENCY
      done
      EOF'
    depends_on:
      mysql:
        condition: service_healthy
    restart: unless-stopped
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ./backup/mysql:/dump # Host directory for the dumps

  # PostgreSQL Database
  postgres:
    image: postgres:17-alpine
    container_name: demo-postgres
    environment:
      POSTGRES_PASSWORD: example
      POSTGRES_USER: demouser
      POSTGRES_DB: demo
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - demo-network
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U demouser -d demo"]
      interval: 5s
      timeout: 5s
      retries: 5

  postgres-backup:
    container_name: postgres-backup
    networks:
      - demo-network # Must be in the same network as the DB
    image: postgres:17-alpine # Same major version as the database!
    environment:
      PGHOST: postgres
      PGDATABASE: demo
      PGUSER: demouser
      PGPASSWORD: example
      BACKUP_NUM_KEEP: 7 # Number of dumps to keep
      BACKUP_FREQUENCY: 1d # How often to run
    entrypoint: |
      bash -c 'bash -s <<EOF
      trap "break;exit" SIGHUP SIGINT SIGTERM
      sleep 2m # Initial delay
      while /bin/true; do
        echo "Creating PostgreSQL dump..."
        # Use custom format (-Fc) which is generally preferred for pg_restore
        pg_dump -h $$PGHOST -U $$PGUSER -d $$PGDATABASE -Fc > /dump/pg_backup_`date +%d-%m-%Y"_"%H_%M_%S`.dump
        echo "Cleaning up old PostgreSQL dumps (keeping $$BACKUP_NUM_KEEP)..."
        (ls -t /dump/pg_backup_*.dump | head -n $$BACKUP_NUM_KEEP; ls /dump/pg_backup_*.dump) | sort | uniq -u | xargs --no-run-if-empty rm -f
        echo "PostgreSQL dump complete. Sleeping for $$BACKUP_FREQUENCY..."
        sleep $$BACKUP_FREQUENCY
      done
      EOF'
    depends_on:
      postgres:
        condition: service_healthy
    restart: unless-stopped
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ./backup/postgres:/dump # Host directory for the dumps

  # Optional Adminer for database management
  adminer:
    image: adminer:latest
    container_name: demo-adminer
    ports:
      - "8080:8080"
    environment:
      ADMINER_DEFAULT_SERVER: mysql # Pre-fill server field
    networks:
      - demo-network
    depends_on:
      - mysql
      - postgres

  # Optional pgAdmin for PostgreSQL management
  pgadmin:
    image: dpage/pgadmin4:latest
    container_name: demo-pgadmin
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@example.com
      PGADMIN_DEFAULT_PASSWORD: admin
    ports:
      - "8081:80"
    networks:
      - demo-network
    depends_on:
      - postgres

volumes:
  mysql-data:
  postgres-data:

networks:
  demo-network:
    driver: bridge

What’s happening here?

  • We use the same image base as the database containers.
  • We map a host folder to /dump in the container. This is where the backups land. This host folder must then also be mapped into the Duplicati container!
  • We set environment variables for the database connection.
  • The entrypoint is a small shell script:
    • It starts an infinite loop (while /bin/true; do ... done).
    • Inside the loop, pg_dump or mysqldump is executed. The output is written to the /dump folder with a timestamp and compressed (gzip for MySQL, custom format for PostgreSQL).
    • A cleanup command removes older backups, keeping only the number specified by BACKUP_NUM_KEEP.
    • Then the script sleeps for the duration specified by BACKUP_FREQUENCY (e.g., 1d for one day, 12h for 12 hours) and starts over.
  • Important: The backup container must be in the same Docker network as the database and needs depends_on with condition: service_healthy so it doesn’t start too early.

The key takeaway: You now have a host folder containing the latest (and possibly a few older) database dumps. Simply include this folder as a source in your Duplicati backup job! Problem solved. Once set up, it runs fully automatically.
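A quick way to confirm the whole chain works (paths and container names match the example above):

ls -lh ~/docker/app/backup/postgres/        # should fill up with pg_backup_*.dump files
ls -lh ~/docker/app/backup/mysql/           # ... and mysql_backup_*.sql.gz files here
sudo docker logs --tail 20 postgres-backup  # the backup containers log every run
sudo docker logs --tail 20 mysql-backup

Then map ~/docker/app/backup into the Duplicati container as another read-only source volume, just like /source/data earlier, and add it to the backup job.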

Data Restoration: The Practice Test for Your Backup

Having a backup is good. Knowing how to restore it is better. A backup is only as good as its successful restoration. Therefore, it’s essential to regularly test the restore process. This is the only way to be sure that in an emergency – whether due to hardware failure, ransomware, or human error – the data is actually recoverable.

1. Restoring Files/Folders with Duplicati:

This is usually the easy part. Go to the Duplicati web interface, click on “Restore”.

  • Select the backup job you want to restore from.
  • You’ll see a list of available backup versions (points in time). Choose the version you need.
  • You can then browse the folder structure of the backup and select the files or folders you want back.
  • Destination: You can either choose “Restore to original location” (only works if the path exists and Duplicati has write permissions – careful, might overwrite existing newer files!) or “Restore to different location” and specify an empty folder where Duplicati should copy the items. The latter is often safer, allowing you to check the data calmly and then manually move it where it belongs.
  • Click “Restore”. If you used encryption, you’ll be asked for the passphrase now if it’s not saved in the settings.
  • Duplicati works for a bit and restores the files. Done. A quick sanity check of the restored data (see below) never hurts, though.
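When you restore to a different location, a quick comparison against the live data (or a manual spot check of a few files) gives extra confidence. For example, assuming the original data lives under /path/to/your/data and you restored into /tmp/restore-test (both placeholders):

diff -rq /path/to/your/data /tmp/restore-test                        # lists files that differ or are missing
sha256sum /path/to/your/data/some.file /tmp/restore-test/some.file   # spot-check a single file

Differences are of course expected if you restored an older version; the point is to confirm that the restore produced complete, readable files.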

2. Restoring a Database (from the Dump):

This requires a few extra steps, as the dump file needs to be imported back into the database.

  • Get the Dump File: First, restore the required .sql, .sql.gz, or .dump file from your Duplicati backup (see step 1), preferably to a temporary folder on your host. If the file is compressed (.gz), you’ll need to uncompress it first (e.g., with gunzip backup.sql.gz).
  • Prepare the Database Container: Ensure your database container is running. Often, it’s best if the database is empty before importing the dump. You might stop the container, remove the associated Docker volume (docker volume rm ...), and restart the container (it will then create an empty database). Warning: This step permanently deletes all current data! Only do this if you are certain you want the old state from the dump.
  • Copy or Mount the Dump into the Container: You need to get the dump file into the running database container.
    • Temporarily mount a volume: Stop the container, add a temporary volume mapping in docker-compose.yml that maps your host folder with the dump into the container, start the container.
    • Use docker cp: sudo docker cp /path/on/host/backup.sql container_name:/tmp/backup.sql
  • Execute the Restore Command: Now you need the counterpart to the dump command. You execute this inside the container (e.g., sudo docker compose exec container_name sh). Alternatively, you can pipe the dump into the container straight from the host; a sketch follows after this list.
    • PostgreSQL: The tool is pg_restore (if you used -Fc format for dump) or psql (if plain SQL dump).
      # Inside the PostgreSQL container:
      # For custom format (.dump):
      pg_restore -U your_db_user -d your_db_name --clean --if-exists /tmp/pg_backup.dump
      # For plain SQL format (.sql):
      # psql -U your_db_user -d your_db_name < /tmp/backup.sql
      (Adjust paths, filenames, user, db name! The --clean --if-exists options help ensure a clean restore into an existing DB.)
    • MySQL/MariaDB: The tool is mysql.
      # Inside the MySQL/MariaDB container:
      # If the dump is gzipped:
      # gunzip < /tmp/mysql_backup.sql.gz | mysql -u your_db_user -pYOUR_PASSWORD your_db_name
      # If the dump is plain SQL:
      # mysql -u your_db_user -pYOUR_PASSWORD your_db_name < /tmp/backup.sql
  • Verify: After the command finishes (can take time!), check in your application or with database tools if the data is back.
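As mentioned above, you can also skip the copy step entirely and pipe the dump straight from the host into the container. A sketch using the service names and credentials from the compose example in this post (file names are just examples; run from the compose folder):

# PostgreSQL, custom-format dump (-Fc), fed to pg_restore via stdin
sudo docker compose exec -T postgres pg_restore -U demouser -d demo --clean --if-exists < ./backup/postgres/pg_backup_01-05-2025_03_00_00.dump

# MySQL, gzipped SQL dump, decompressed on the host and piped into mysql
gunzip -c ./backup/mysql/mysql_backup_01-05-2025_03_00_00.sql.gz | sudo docker compose exec -T mysql mysql -u demouser -pdemopass demo

If you wiped the volume beforehand, both images recreate an empty database from their environment variables (POSTGRES_DB / MYSQL_DATABASE) on startup, so the restore has a clean target.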

The restore process, especially for databases, requires care. Regular tests are therefore essential. Simulate data loss in a test environment. Once you’ve successfully performed a restore, you know your backup strategy works, and you’ll be much calmer in a real emergency. Backups aren’t optional; they are absolutely essential if you care about your data.


Frequently Asked Questions (FAQ)

  • How often should I run backups? It depends on how often your data changes and how much loss you can tolerate. For important things: Daily is often good. For less critical data, weekly might suffice. Databases might need backing up even more frequently than regular files.
  • Which backup destinations are recommended? According to the 3-2-1 rule, a combination is wise: At least one copy locally (for faster restores) and one externally (protection against local disasters). For example, backup to an external USB drive AND to the cloud or a second server via SFTP.
  • Is Duplicati free? Yes, Duplicati is open-source software under the LGPL license and can be used freely.
  • Do I really need to encrypt? If the backup leaves your house/server (cloud, SFTP, external drive you take with you), strong encryption is highly recommended. Remember the password!
  • What’s the best way to test my restore? Set up a test environment (e.g., on your laptop or a test server). Restore the backup there and check if everything works as expected. Do this regularly, e.g., every few months.
  • What about other databases (e.g., MongoDB, Redis)? They usually have their own backup tools (mongodump, Redis has snapshots or AOF). The principle is similar: Find the right tool, create a consistent dump/snapshot, automate it, and include the dump file in your Duplicati backup. Read the database documentation! (A small MongoDB example follows right after this FAQ.)
  • Is backing up the Docker Volume enough? For normal application data (like uploads, config files), yes. For running databases: No, due to the risk of inconsistency. Always use the database’s native dump tools!
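As promised in the FAQ, a MongoDB dump follows the same pattern: create a consistent dump and let Duplicati pick up the file. A minimal sketch, assuming a compose service called mongo and a ./backup/mongo folder on the host (both placeholders; add credentials if your instance requires authentication):

# mongodump writes a gzipped archive of all databases to stdout when --archive has no filename
sudo docker compose exec -T mongo mongodump --archive --gzip > ./backup/mongo/mongo_backup_$(date +%F).archive.gz

Restoring works the other way around, with mongorestore --archive --gzip reading the archive from stdin.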

