Usually, people only think about backups right after a critical system failure or a ransomware attack! So, let's avoid that and make backups simple. Well, somewhat simple...
Borg backup (https://borgbackup.readthedocs.io/en/stable/) is a great backup tool. It does compression and de-duplication on its own and is perfect for automatic backups.
I've had to deploy a large backup policy at work, so I prefer not to have to run manual commands on each system - that's why I invested time in polishing an ansible role and some playbooks that can help anyone deploy a borg backup server (or several) with automatic scripts that push backups from distributed systems to a centralized server.
The way backups will work is the following:
- client systems will run borg via cron to back up certain directories (or the full filesystem)
- they connect to a central server and push the differences via ssh in a borg repository on the server. Each client will have a dedicated user on the backup server and will have its own repository, to keep things separated.
- if all backups are done correctly, the clients run prune to delete backups older than x days/months/years
The client can access the borg repo via ssh (with keys) and can push backups, prune, or mount backups in order to restore data.
Sounds complicated? It is! But, thankfully it's all automated.
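To make the flow above a bit more concrete, here is a rough Python sketch of what a client does on each run. It is not the script the role generates: backup_cycle, archive_name and the injected run callable are names I made up for illustration, and treating any non-zero exit status as a failure is a simplification. The borg create and borg prune subcommands and the --stats, --keep-* and --glob-archives options do exist in borg 1.x, and the archive naming mimics the archive names that show up in the borg output further down.

```python
from datetime import datetime
from typing import Callable, Dict, List, Sequence


def archive_name(path: str, when: datetime) -> str:
    """Build a name like _etc-20220331-1113 from /etc and a timestamp."""
    return f"{path.replace('/', '_')}-{when:%Y%m%d-%H%M}"


def backup_cycle(
    repo: str,
    paths: Sequence[str],
    keep: Dict[str, int],
    when: datetime,
    run: Callable[[List[str]], int],
) -> bool:
    """Create one archive per path, then prune only if every create succeeded."""
    results = []
    for path in paths:
        target = f"{repo}::{archive_name(path, when)}"
        results.append(run(["borg", "create", "--stats", target, path]) == 0)
    if not all(results):
        return False  # something failed: leave the old archives alone

    keep_flags: List[str] = []
    for period, count in keep.items():
        keep_flags += [f"--keep-{period}", str(count)]

    for path in paths:
        # Only prune the archives that belong to this path.
        pattern = path.replace("/", "_") + "-*"
        if run(["borg", "prune", *keep_flags, "--glob-archives", pattern, repo]) != 0:
            return False
    return True
```

Passing the command runner in as a parameter keeps the sketch testable without a real repository; in real use it could be something like lambda cmd: subprocess.call(cmd).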
Managing your hosts with ansible
If you're not into ansible already, you may be reluctant to learn another way to manage your systems. Maybe you only have one client and one server - why complicate things? Well, it's not that difficult to set up, and can be easily extended, so why not jump in?
This is not going to be a full ansible tutorial, but just enough to get you on your feet. To use ansible you need (on your clients and backup server(s)):
- python 2.6 or later
- ssh
Unless you're still running RHEL4, that's not very difficult, is it?
We're going to do all the configuration on a single host, which I'm going to call "laptop". This is where you will run the ansible commands and playbooks, and this laptop will connect to your clients and backup server. Here, in addition to the requirements I already stated, you also need to have "ansible" installed. If you're on Ubuntu, it's as simple as sudo apt-get install ansible.
Next, we'll need a directory to hold our configuration. Let's call it "ansible-playground". All the commands and files we're going to discuss will be inside this directory, so it's time to create it and cd into it:
$ mkdir ansible-playground
$ cd ansible-playground
The first thing you're going to need is to set up the inventory (or the ansible hosts file). This means building a list of systems you want to manage via ansible. You'll need to specify a name for the client, an IP and optionally some parameters to help access the system. Here are some examples:
$ cat hosts
# host definition
[hosts_to_backup]
stingray ansible_host=192.168.1.11 ansible_user=root
aldebaran ansible_host=192.168.1.5 ansible_user=root
hc4 ansible_host=192.168.1.17 ansible_user=root
The first item is the name that you're going to be using later. I prefer to use the system's hostname. The variable ansible_host points to the IP address of the system, and ansible_user tells ansible to try to connect via ssh with that user. There are lots of other variables you can specify, such as ssh port or the path to the python interpreter (https://docs.ansible.com/ansible/latest/reference_appendices/special_variables.html).
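If it helps to see how such an inventory line breaks down, here is a small Python sketch. It is only an illustration and not ansible's real parser (which also understands groups, ranges and more quoting rules); parse_host_line and ssh_target are hypothetical helper names. The one ansible behaviour it relies on is that ansible_host falls back to the inventory name when it isn't set.

```python
import shlex
from typing import Dict, Optional, Tuple


def parse_host_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split 'name key=value key=value' into the name and a dict of variables."""
    name, *pairs = shlex.split(line)
    variables = dict(pair.split("=", 1) for pair in pairs)
    return name, variables


def ssh_target(name: str, variables: Dict[str, str]) -> str:
    """Return the user@host string ssh would be asked to connect to."""
    host = variables.get("ansible_host", name)
    user: Optional[str] = variables.get("ansible_user")
    return f"{user}@{host}" if user else host


name, variables = parse_host_line("stingray ansible_host=192.168.1.11 ansible_user=root")
print(name, ssh_target(name, variables))
```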
In principle, ansible will try to connect with the ansible_user username via ssh to ansible_host and will try to log in with an ssh key. You can use a password, but it's annoying, so it's best to set up ssh authentication with keys between your "laptop" and all systems managed by ansible. There are lots of tutorials on how to do this, but normally you:
- Create an ssh key pair if you don't have one already on your laptop:
$ ssh-keygen -t rsa
- Copy the public key to your client's ~/.ssh/authorized_keys. The simplest way to do it is with:
$ ssh-copy-id ansible_user@server
Replace ansible_user and server with the correct information. ssh-copy-id has the benefit of asking to accept the fingerprint of the server, so this will help later on.
Ok, this is the most time-consuming step, but you're actually preparing your systems to be managed by ansible, and it only needs to be done once. For large deployments, if you don't want to use root login, you can create a special user to be used by ansible, and you can enable sudo for it so that ansible can run commands as root. If you want to manage your "laptop" as well, the simplest way to do it is to connect to it via ssh (ssh ansible_user@127.0.0.1), to keep things consistent.
So, how can you be sure that ansible is set up correctly? You can test it with:
$ ansible -i hosts -m ping hosts_to_backup
hc4 | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: root@192.168.1.17: Permission denied (publickey,password).",
    "unreachable": true
}
stingray | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": false,
    "ping": "pong"
}
aldebaran | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": false,
    "ping": "pong"
}
The "-i hosts" specifies the path to the hosts file we've just defined, "-m ping" runs the ping module that checks connectivity and "hosts_to_backup" specifies a group of hosts in the inventory. If ssh with keys fails for some reason, it's best to run ssh -vvv user@server and also check /var/log/auth.log on the server. My problem was caused by permissions not being tight enough for a directory:
Mar 22 18:25:03 localhost sshd[3337162]: message repeated 2 times: [ Authentication refused: bad ownership or modes for directory /root]
Once you fix all the connection issues, you should be set to use ansible to manage your hosts.
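If you hit the same "bad ownership or modes" message, a quick check like the one below can point at the offending path. This is a minimal sketch of my understanding of sshd's strict mode checks (the path must be owned by the user or by root, and must not be writable by group or others); insecure_paths is a hypothetical helper, and the real sshd logic covers more cases than this.

```python
import os
from pathlib import Path
from typing import List


def insecure_paths(home: Path, uid: int) -> List[Path]:
    """Return the paths sshd would likely complain about for this account."""
    candidates = [home, home / ".ssh", home / ".ssh" / "authorized_keys"]
    bad = []
    for path in candidates:
        if not path.exists():
            continue
        info = path.stat()
        wrong_owner = info.st_uid not in (0, uid)
        # 0o022 covers the group-write and other-write permission bits
        writable_by_others = bool(info.st_mode & 0o022)
        if wrong_owner or writable_by_others:
            bad.append(path)
    return bad


if __name__ == "__main__":
    for path in insecure_paths(Path.home(), os.getuid()):
        print(f"check ownership and mode of {path}")
```

Run it on the system you are trying to log in to, as the user ansible connects with. The usual fix is something like chmod go-w on the reported path.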
Downloading the borg-backup playbook and role
The first thing you're going to need is to download and install the ansible-role-borgbackup from here: https://github.com/mad-ady/ansible-role-borgbackup. This role handles most of the details - distributing ssh keys between client and server, creating the repo, adding a cron job and also creating a customized backup script. You only need to install it on your "laptop", with this command:
$ ansible-galaxy install git+https://github.com/mad-ady/ansible-role-borgbackup.git
- extracting ansible-role-borgbackup to /home/adrianp/.ansible/roles/ansible-role-borgbackup
- ansible-role-borgbackup was installed successfully
Now comes the boring part - you should take the time to go through the README from https://github.com/mad-ady/ansible-role-borgbackup/README.md to understand what variables you need to set. Go on, I'll wait...
We're going to write a playbook that makes use of this role. You can get it from here:
$ wget https://raw.githubusercontent.com/mad-ady/ansible-odroid-config/master/deploy-borg-backup.yaml
Before we run the playbook to deploy the backups, we need to talk about configuration. We need to tell borg backup which directories to back up from each client, how long to keep those backups and, in the case of special backups (e.g. a mysql database), supply the credentials.
The variables
The role comes with a set of default variables that you can consult here: https://github.com/mad-ady/ansible-role-borgbackup/blob/master/defaults/main.yml. So, by default, borg will back up just /etc, /home and /root, will run the backup job daily sometime between 0:00 and 4:59 (it gets set randomly on first deployment, but it doesn't jump around from day to day) and keeps 1 yearly backup, 6 backups in the last 6 months, 4 backups in the last month and 7 backups in the last week. The backup retention policy is best explained here: https://borgbackup.readthedocs.io/en/stable/usage/prune.html
In order to override these variables, the most elegant way is to do so in a file called group_vars/all. Note that the file syntax is YAML, so indentation is important and must be consistent:
$ mkdir group_vars
$ touch group_vars/all
$ cat group_vars/all
borgbackup_encryption_mode: none
borgbackup_include:
  - /etc
  - /var/spool/cron
borgbackup_exclude: []
borgbackup_pre_commands: []
borgbackup_post_commands: []
borgbackup_servers:
  - fqdn: 192.168.1.5
    shortname: aldebaran
    user: "borg_{{inventory_hostname}}"
    type: ssh
    home: "/media/wdc/storage3TB/backup/borg_{{inventory_hostname}}"
    pool: "borg_{{inventory_hostname}}"
    options: ""
borgbackup_passphrase: ""
borgbackup_retention:
  hourly: -1
  daily: 7
  weekly: 4
  monthly: 6
  yearly: -1
With the settings above I'm instructing borg to keep unencrypted backups, without any passphrase, to back up /etc and /var/spool/cron unless overridden, and to keep backups for up to 6 months. I've also defined a backup server with IP 192.168.1.5 where it will create ssh users named "borg_$hostname", and for each user it will keep backups in /media/wdc/storage3TB/backup/borg_$hostname.
If you want to have a replicated backup server, you can simply add it to the borgbackup_servers list.
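To get a feeling for what the borgbackup_retention values above mean in practice, here is a simplified Python model of the daily/weekly/monthly/yearly rules described in the borg prune documentation. It is a sketch and not borg's own code: select_kept and PERIOD_KEYS are my names, it works on whole days instead of timestamps, and treating -1 (or 0) as "rule not used" is my assumption about how the role's values are meant.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterable, Set, Tuple

# How each rule groups archives into periods, applied from finest to coarsest.
PERIOD_KEYS: Dict[str, Callable[[date], Tuple]] = {
    "daily": lambda d: (d.year, d.month, d.day),
    "weekly": lambda d: tuple(d.isocalendar())[:2],  # ISO year and week number
    "monthly": lambda d: (d.year, d.month),
    "yearly": lambda d: (d.year,),
}


def select_kept(archive_dates: Iterable[date], retention: Dict[str, int]) -> Set[date]:
    """Return the archive dates that survive pruning in this simplified model."""
    newest_first = sorted(set(archive_dates), reverse=True)
    kept: Set[date] = set()
    for rule, period_of in PERIOD_KEYS.items():
        wanted = retention.get(rule, 0)
        if wanted <= 0:  # assumption: zero or negative disables the rule
            continue
        count = 0
        last_period = None
        for day in newest_first:
            period = period_of(day)
            if period == last_period:
                continue  # only the newest archive of each period is considered
            last_period = period
            if day in kept:
                continue  # already kept by a finer rule, does not count again
            kept.add(day)
            count += 1
            if count == wanted:
                break
    return kept


# 400 daily backups ending on 2022-03-31, pruned with the settings above
history = [date(2022, 3, 31) - timedelta(days=i) for i in range(400)]
policy = {"hourly": -1, "daily": 7, "weekly": 4, "monthly": 6, "yearly": -1}
kept = select_kept(history, policy)
print(len(kept), "archives kept, oldest is", min(kept))
```

With this input the model keeps 17 archives (7 daily, 4 weekly, 6 monthly), and the oldest survivor is the end-of-month backup from 2021-09-30, which matches the "up to 6 months" intuition.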
Now - you may want different settings per host, and this is where we're going to set them. To set variables per host, ansible uses a host_vars directory where variables are stored in a file with the name of the host (according to the inventory):
$ mkdir host_vars
$ touch host_vars/stingray host_vars/aldebaran host_vars/hc4
Inside a host_vars/$hostname file we can customize borg settings for that host. Here's an example. I'd like my aldebaran server to back up its mysql database (without a couple of tables), and also back up some other directories. I'd also like to skip some directories which are too big/not important:
$ cat host_vars/aldebaran
borgbackup_exclude:
  - /home/adrianp/.vscode-server-insiders
  - /home/adrianp/go
  - /home/adrianp/development/kernel-n1-mainline
  - /home/adrianp/developmnet/linux-hardkernel
  - /home/adrianp/development/linux-hardkernel-5.2
  - /root/development/linux
  - /root/development/HandBrake
  - /root/go
  - /usr/share/hassio/homeassistant/home-assistant.log
  - /usr/share/hassio/homeassistant/home-assistant.log.1
borgbackup_include:
  - /etc
  - /home
  - /root
  - /usr/local
  - /var/www/html
  - /usr/share/hassio
  - /var/spool/cron
mysqluser: root
mysqlpass: "super_secret_stuff"
mysqlskip:
  - homeassistant.events
  - homeassistant.states
borgbackup_cron_hour: 2
borgbackup_cron_minute: 10
Because there are just a few systems, I'd like to schedule the backups to run at a specific time in the night, so that they don't keep the HDDs spun up for too long. That's why I'm specifying the cron hour and cron minute.
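As for the mysqlskip list above, I haven't shown how the generated script consumes it, so take the following as a guess at the general idea rather than the role's actual code. mysqldump does have an --ignore-table=db.table option (as well as --user and --all-databases), and a list of db.table entries maps onto it naturally; mysqldump_args is a hypothetical helper, and the password is deliberately left out of the command line.

```python
from typing import List, Sequence


def mysqldump_args(user: str, skip_tables: Sequence[str]) -> List[str]:
    """Build a mysqldump command line that leaves out the listed db.table entries."""
    args = ["mysqldump", f"--user={user}", "--all-databases"]
    args += [f"--ignore-table={table}" for table in skip_tables]
    return args


print(" ".join(mysqldump_args("root", ["homeassistant.events", "homeassistant.states"])))
```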
A much simpler config file for a simpler server looks like this, for example:
$ cat host_vars/hc4
borgbackup_include:
  - /etc
  - /var/spool/cron
borgbackup_cron_hour: 2
borgbackup_cron_minute: 09
Deploying the playbook
The playbook decides on which hosts it will run. By default it runs on all hosts defined in the "hosts_to_backup" section in our inventory. You can change that and run on a specific host if you wish. Before we run, we need to add a few sections in our inventory:
$ cat hosts
# host definition
[hosts_to_backup]
stingray ansible_host=192.168.1.11 ansible_user=root
aldebaran ansible_host=192.168.1.5 ansible_user=root
hc4 ansible_host=192.168.1.17 ansible_user=root
#needed by the borgbackup role -- the servers to backup to
[borgbackup_servers]
192.168.1.5
# borgbackup servers where to create accounts.
[all_borgbackup_servers]
192.168.1.5
[borgbackup_management]
We're going to deploy it on all hosts and see what happens:
$ ansible-playbook -i hosts deploy-borg-backup.yaml
PLAY [hosts_to_backup] *******************************************************************
TASK [Gathering Facts] *******************************************************************
ok: [stingray]
ok: [hc4]
ok: [aldebaran]
... output omitted ...
PLAY RECAP *******************************************************************************
aldebaran : ok=19 changed=1 unreachable=0 failed=0 skipped=10 rescued=0 ignored=0
hc4 : ok=19 changed=1 unreachable=0 failed=0 skipped=10 rescued=0 ignored=0
stingray : ok=21 changed=0 unreachable=0 failed=0 skipped=9 rescued=0 ignored=0
The nice thing about ansible playbooks is that, if done correctly, the tasks executed are idempotent. This means the result of the action is the same no matter how many times you execute it. So, if you run into problems, you can fix them and simply rerun the whole playbook. Whatever was done won't break and things will get updated. If you later add more hosts, or change the directories to be backed up, you'd simply rerun the playbook to apply the changes. This also means that any local changes done manually to the backup script will get overwritten.
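If "idempotent" sounds abstract, this tiny Python sketch shows the idea outside of ansible: an operation that describes the desired end state, reports whether it had to change anything, and is safe to repeat. ensure_line is a made-up helper (loosely in the spirit of what ansible reports as "changed" vs "ok"), and the cron-style line in the example is only illustrative.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure the file contains the line; return True only if it was changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, nothing to do
    path.write_text("\n".join(existing + [line]) + "\n")
    return True


if __name__ == "__main__":
    target = Path("example-crontab.txt")
    entry = "10 2 * * * /usr/local/bin/borg-backup backup"
    print(ensure_line(target, entry))  # True on the first run
    print(ensure_line(target, entry))  # False on every later run
```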
Where are my backups?
Now, the playbook took care to install borg, create the accounts used by it, distribute ssh keys (/root/.ssh/id_rsa_borg), set up cron and create a customized wrapper script for each host. This customized script is /usr/local/bin/borg-backup. If you were to read it, you'd see it calls successive borg commands and may contain passwords for mysql or make references to backup encryption passwords. So, it contains sensitive information.
The playbook did not actually create any backups. You'll need to do it yourself, or wait for cron to call borg-backup. Note that the initial backup will probably take a long time to complete, because it will be a full backup, but follow-up backups should be faster (and smaller!). This is why it's a good idea to run "borg-backup backup" manually on each system, to make sure the backup process works correctly:
# borg-backup backup
Backing up /etc to 192.168.1.5: repo - borg_hc4::_etc-20220331-1113
------------------------------------------------------------------------------
Archive name: _etc-20220331-1113
Archive fingerprint: 1c9aac4c3c89dc2f8fbf076d3c619f5ec441a016217a56634881d14ebb93dba5
Time (start): Thu, 2022-03-31 11:13:37
Time (end): Thu, 2022-03-31 11:13:40
Duration: 3.08 seconds
Number of files: 801
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 2.74 MB 886.97 kB 878.66 kB
All archives: 2.74 MB 886.97 kB 878.66 kB
Unique chunks Total chunks
Chunk index: 773 792
------------------------------------------------------------------------------
Backing up /var/spool/cron to 192.168.1.5: repo - borg_hc4::_var_spool_cron-20220331-1113
------------------------------------------------------------------------------
Archive name: _var_spool_cron-20220331-1113
Archive fingerprint: a70e642a5b1c9fcc634d2d999c74e282c996363a33dce54afc26b519ec8889c8
Time (start): Thu, 2022-03-31 11:13:43
Time (end): Thu, 2022-03-31 11:13:43
Duration: 0.03 seconds
Number of files: 0
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 836 B 696 B 696 B
All archives: 2.74 MB 887.67 kB 879.36 kB
Unique chunks Total chunks
Chunk index: 775 794
------------------------------------------------------------------------------
Pruning old /etc backups
Pruning old /var/spool/cron backups
You will also get logs in /var/log/borg-backup.log that you can consult when there are problems.
You can view the existing backups with:
# borg-backup list
Archives on 192.168.1.5 :
_etc-20220331-1113 Thu, 2022-03-31 11:13:37 [1c9aac4c3c89dc2f8fbf076d3c619f5ec441a016217a56634881d14ebb93dba5]
_var_spool_cron-20220331-1113 Thu, 2022-03-31 11:13:43 [a70e642a5b1c9fcc634d2d999c74e282c996363a33dce54afc26b519ec8889c8]
And get more details with:
# borg-backup info _etc-20220331-1113
Archive name: _etc-20220331-1113
Archive fingerprint: 1c9aac4c3c89dc2f8fbf076d3c619f5ec441a016217a56634881d14ebb93dba5
Comment:
Hostname: hc4
Username: root
Time (start): Thu, 2022-03-31 11:13:37
Time (end): Thu, 2022-03-31 11:13:40
Duration: 3.08 seconds
Number of files: 801
Command line: /usr/local/bin/borg create --compression auto,zlib,6 --stats ssh://borg_hc4@192.168.1.5:22/media/wdc/storage3TB/backup/borg_hc4/borg_hc4::_etc-20220331-1113 /etc --exclude-if-present .borg_exclude
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 2.45 MB 813.68 kB 878.66 kB
All archives: 2.74 MB 887.67 kB 879.36 kB
Unique chunks Total chunks
Chunk index: 775 794
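If you ever want to post-process the listing yourself (for a report, or a quick sanity check), the lines printed by borg-backup list are easy to pick apart. The sketch below is based only on the output format shown above; parse_list_line and the Archive tuple are hypothetical names, and lines that don't look like an archive entry (such as the "Archives on ..." header) are simply skipped.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# name, weekday, timestamp, [fingerprint] as printed in the listing above
LIST_LINE = re.compile(
    r"^(?P<name>\S+)\s+\w{3}, (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r"\s+\[(?P<id>[0-9a-f]+)\]$"
)


class Archive(NamedTuple):
    name: str
    created: datetime
    fingerprint: str


def parse_list_line(line: str) -> Optional[Archive]:
    """Turn one line of the archive listing into an Archive, or None if it isn't one."""
    match = LIST_LINE.match(line.strip())
    if match is None:
        return None
    created = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
    return Archive(match["name"], created, match["id"])
```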
Keeping an eye on backups
As you may know, backups which are unsupervised will end up failing silently, filling up the disk or breaking in some way, which you will find out only when you desperately need to restore some backup. It would be nice to keep an eye on what is backed up and raise an alert if a specific backed-up directory is below a certain size. This should tell you early on when something is wrong.
About a year ago I switched my home monitoring from munin to prometheus and grafana and I'm happy with them. So, my backup checking strategy is built into prometheus. To continue from here you should have prometheus node exporter deployed on your backup server, with a central prometheus instance that gathers the metrics, an alertmanager instance that triggers alerts and a grafana instance to draw some nice dashboards. Setting up prometheus/alertmanager/grafana is a large subject that isn't handled here, so you should search for tutorials elsewhere and return once your environment is set up.
Prometheus node-exporter has a textfile collector that allows you to gather metrics from text files. On the backup server we can run a script in a cron job to check the backup size of today's backups and expose data about the backups to prometheus.
Instructions on how to set it up can be found here: https://github.com/mad-ady/prometheus-borg-exporter. Note that if you're not planning on using prometheus, you can still modify the script to send an email when there's an issue with today's backups.
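For the curious, the textfile collector mechanism itself is very simple: a script writes lines in the Prometheus text format into a file ending in .prom, inside the directory node-exporter was started with (--collector.textfile.directory). Here is a minimal Python sketch of that idea. It is not the exporter linked above; render_metric and write_textfile are my own names, the metric and label names are borrowed from the alert rules further down in this article, and label values are assumed not to contain quotes or backslashes.

```python
import os
from pathlib import Path
from typing import Dict, Iterable


def render_metric(name: str, labels: Dict[str, str], value: float) -> str:
    """Format one sample in the Prometheus text exposition format."""
    label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{name}{{{label_text}}} {value}"


def write_textfile(target: Path, lines: Iterable[str]) -> None:
    """Write the metrics to a temporary file, then move it into place in one step."""
    tmp = target.with_name(target.name + ".tmp")  # not *.prom, so it is ignored
    tmp.write_text("\n".join(lines) + "\n")
    os.replace(tmp, target)


sample = render_metric(
    "borg_archives_count_today", {"host": "hc4", "backupserver": "aldebaran"}, 2
)
print(sample)
```

Writing to a temporary name first and renaming afterwards avoids node-exporter reading a half-written file.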
These metrics are best viewed in a grafana dashboard, customized based on the data exposed by the node-exporter script. You can get the dashboard from here and install it: https://grafana.com/grafana/dashboards/14516. Note that since the borg-exporter script runs once a day, the data in the dashboard will update daily.
In the end your dashboard should look something like this:
In order to get alerts, you'll need to add something like this to the prometheus alert.rules file:
- name: borgbackup
  rules:
  - alert: hosts_without_a_backup_today
    expr: borg_archives_count_today == 0
    labels:
      severity: warning
    annotations:
      summary: "Backups missing for server {{$labels.host}}"
      description: "[{{$labels.host}}] Missing backups for {{$labels.host}} on {{$labels.backupserver}}"
  - alert: errored_mysqldumps
    expr: borg_last_size{archive="mysqldump"} < 10000000
    labels:
      severity: warning
    annotations:
      summary: "mysqldump backup for {{$labels.host}} is too small ({{ humanize $value}})"
      description: "[{{$labels.host}}] mysqldump backup for {{$labels.host}} on {{$labels.backupserver}} is too small ({{ humanize $value}})"
These alerts should then be exported by alertmanager to email/slack or whatever alerting method you set up.
What if I want to restore?
Ah, yes... You may need to restore at some point, right? That's why you've invested so much effort into backups... Ok, fine, let's see how to restore...
There are two cases:
1. You want to restore some old data on the same host from where it was backed up
This is actually simple. The borg-backup script has a mount option that lets you do that.
First, locate the backup you want to restore. Use borg-backup list and grep to get the backup's name:
# borg-backup list | grep _etc | grep 20220331
_etc-20220331-1113 Thu, 2022-03-31 11:13:37 [1c9aac4c3c89dc2f8fbf076d3c619f5ec441a016217a56634881d14ebb93dba5]
Next, you need to specify the backup server, backup and mount point (where to mount the backup) and it should get mounted:
# borg-backup mount 192.168.1.5 _etc-20220331-1113 /media/backup
Backup mounted on /media/backup, do not forget to unmount!
# mount | grep /media/backup
borgfs on /media/backup type fuse (ro,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions)
# ls -l /media/backup/
total 0
drwxr-xr-x 1 root root 0 Mar 30 14:39 etc
# du -sh /media/backup/etc
2.9M /media/backup/etc
2. You want to access the backup on a different system
Step 1 - access the backup. This may be tricky, since access is over ssh, and the account that holds the backup for a host (let's say hc4) accepts only the key of that host. If you ever want to access the backup from a different host (let's say stingray), you'll need to add stingray's ssh key (/root/.ssh/id_rsa_borg.pub) to the borg_hc4 account on the backup server (~borg_hc4/.ssh/authorized_keys). Note that you need to keep the "command=" option that comes before the public key, exactly as it appears on the first line. Simply duplicate the first line and replace only the public key.
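If you'd rather not hand-edit that line, the duplication can be scripted. Below is a minimal Python sketch: it finds where the key starts (an OpenSSH key type followed by base64 data, which always begins with AAAA), keeps everything before it, and appends the new public key. clone_authorized_key is a hypothetical helper, and the command= string used in the example is only an illustration, not necessarily what the role writes on your server.

```python
import re

# key type (ssh-rsa, ssh-ed25519, ecdsa-sha2-..., sk-...) followed by the base64 blob
KEY_START = re.compile(r"(?:^|\s)((?:ssh-|ecdsa-sha2-|sk-)\S+)\s+AAAA")


def clone_authorized_key(first_line: str, new_public_key: str) -> str:
    """Reuse the options of an authorized_keys line with a different public key."""
    match = KEY_START.search(first_line)
    if match is None:
        raise ValueError("no public key found in line")
    options = first_line[: match.start(1)].rstrip()
    new_key = new_public_key.strip()
    return f"{options} {new_key}" if options else new_key


existing = 'command="borg serve --restrict-to-path /some/path",no-pty ssh-rsa AAAAB3old root@hc4'
print(clone_authorized_key(existing, "ssh-rsa AAAAB3new root@stingray\n"))
```

Append the resulting line to ~borg_hc4/.ssh/authorized_keys on the backup server, leaving the original line in place.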
Step 2 - list the backups. Sadly the borg-backup script is hardcoded to access just specific repos, but if you know the repo/backup names, you can do it manually. So, for the repo borg_hc4 we want to access the /etc folder backed up on 20220331 - let's first build the URL needed to access it. Then we'll list all backups in the repo and filter for those that contain _etc or the desired date, to get an idea of what the name is.
Here is my URL - which gets assigned to the special BORG_REPO shell variable:
BORG_REPO="ssh://borg_hc4@192.168.1.5:22/media/wdc/storage3TB/backup/borg_hc4/borg_hc4"
To break down the BORG_REPO variable:
- borg_hc4 - is the ssh user for the repo on the backup server. Its name was chosen based on your ansible configuration
- 192.168.1.5 - is the IP of the backup server
- /media/wdc/storage3TB/backup is the base path on your backup server where the borg backup accounts live
- borg_hc4 is the repo name (same as the username)
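Put differently, the URL is just the pieces of the borgbackup_servers entry from group_vars/all glued together, which is also why borg_hc4 shows up twice in the path: once as the last component of the account's home directory and once as the repo (pool) name. A small Python sketch of that composition (borg_repo_url is a made-up helper name, and port 22 is taken from the URL above):

```python
def borg_repo_url(user: str, host: str, home: str, pool: str, port: int = 22) -> str:
    """Compose the ssh:// repository URL from the server settings."""
    return f"ssh://{user}@{host}:{port}{home.rstrip('/')}/{pool}"


hostname = "hc4"
url = borg_repo_url(
    user=f"borg_{hostname}",
    host="192.168.1.5",
    home=f"/media/wdc/storage3TB/backup/borg_{hostname}",
    pool=f"borg_{hostname}",
)
print(url)
```

Swap hostname for any other client in the inventory to get the URL of its repo.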
In addition to the correct BORG_REPO, there are a couple other shell variables that you need to use as well:
# BORG_RELOCATED_REPO_ACCESS_IS_OK=yes BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK=yes BORG_REPO="ssh://borg_hc4@192.168.1.5:22/media/wdc/storage3TB/backup/borg_hc4/borg_hc4" borg list
Warning: Attempting to access a previously unknown unencrypted repository!
Do you want to continue? [yN] yes (from BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK)
_etc-20220331-1113 Thu, 2022-03-31 11:13:37 [1c9aac4c3c89dc2f8fbf076d3c619f5ec441a016217a56634881d14ebb93dba5]
_var_spool_cron-20220331-1113 Thu, 2022-03-31 11:13:43 [a70e642a5b1c9fcc634d2d999c74e282c996363a33dce54afc26b519ec8889c8]
Step 3 - Once you've located the desired backup, you can mount it:
# BORG_RELOCATED_REPO_ACCESS_IS_OK=yes BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK=yes BORG_REPO="ssh://borg_hc4@192.168.1.5:22/media/wdc/storage3TB/backup/borg_hc4/borg_hc4" borg mount $BORG_REPO::_etc-20220331-1113 /media/backup
# mount | grep /media/backup
borgfs on /media/backup type fuse (ro,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions)
# ls -l /media/backup
total 0
drwxr-xr-x 1 root root 0 Mar 30 14:39 etc
# du -sh /media/backup
2.9M /media/backup
This concludes our epic journey through the wonderful world of backups! May you never need them in the future!