Rsync.net - ZFS Replication from Proxmox using Syncoid

Hello!

It's been a long time since I've posted, but I'm just going to dive right into it.

The back story

Recently I decided to up my backup game and start taking full advantage of ZFS. I use Proxmox as my main hypervisor and have been using ZFS as the back-end file system for a while. Previously I had been using Proxmox's built-in snapshot facility, but mainly just to take manual snapshots prior to updates etc. I decided it was high time to automate this process and took Sanoid for a whirl to see if it would fit my use case. Using Sanoid turned out to be a great decision! Its policy-based snapshotting was exactly what I needed for easy-to-use, automated snapshots.
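To give a feel for what that policy looks like, here's a minimal sanoid.conf sketch; the dataset name and retention numbers are placeholders rather than my real values:

[tank/example]
        use_template = backup
        recursive = yes

[template_backup]
        # how many snapshots of each type to keep
        frequently = 0
        hourly = 24
        daily = 14
        monthly = 3
        yearly = 0
        autosnap = yes
        autoprune = yes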

Getting backups off site

Whilst Sanoid was a great fit for taking snapshots, I still had to solve the issue of creating an off-site (and also off-system) backup.

On my search I looked at a fair few solutions, and whilst many of them were adequate, I really wanted to use ZFS's native send and receive tools. It is not strictly necessary to back up the snapshots this way, but it felt like such a waste not to use this native ZFS functionality.
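For anyone who hasn't used it, the native mechanism boils down to piping a snapshot stream over SSH; the hostnames and dataset names below are just placeholders:

# Full send of a snapshot to a remote pool
zfs snapshot tank/example@base
zfs send tank/example@base | ssh user@remote zfs receive data1/example

# Later, send only the changes between two snapshots
zfs snapshot tank/example@next
zfs send -I tank/example@base tank/example@next | ssh user@remote zfs receive data1/example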

After a short amount of searching I found rsync.net. They currently run all their servers on ZFS, and on top of that they offer native ZFS backup functionality via a lightweight VM running FreeBSD. The price is pretty decent at $0.25/GB; however, in order to use their native ZFS offering you have to order a minimum of 1TB per month. On the plus side, there are no ingress/egress fees.
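A quick way to sanity-check that the remote end really does speak native ZFS is simply to list its datasets over SSH (hostname redacted here, as throughout this post):

ssh root@redacted.rsync.net zfs list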

Now that I had my snapshots being managed automatically and a place to replicate my snapshots to, I just needed to get an automated process in place!

Enter Syncoid - and hours of troubleshooting!

Syncoid is a tool bundled as part of Sanoid, used to make the replication of datasets easier and automatic; no need to write your own wrapper around the send/receive commands, the work has already been done for you! I'm not going to delve into the ins and outs of Syncoid, it has many options and good documentation.
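For context, a basic invocation plus a cron entry to keep it hands-off looks roughly like this; the dataset, key path and target match the command you'll see below, but the hourly schedule is just an illustration:

# Push tank/example (and children) to the remote pool using a dedicated SSH key
syncoid -r --no-privilege-elevation --sshkey=/root/.ssh/sync tank/example root@redacted.rsync.net:data1/data

# Illustrative /etc/cron.d entry to run the replication every hour
0 * * * * root /usr/sbin/syncoid -r --no-privilege-elevation --sshkey=/root/.ssh/sync tank/example root@redacted.rsync.net:data1/data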

Whilst I have now figured out how to get my snapshots replicating incrementally, I can't say this didn't come with turmoil. The first sync I performed took a long time to complete (which is expected, as I was pushing GBs of data over the internet). On the second sync, however, I kept encountering the same issue (I've redacted some of the host names, but the error is the same):

syncoid -r --compress=none --sendoptions=w --no-privilege-elevation --sshkey=/root/.ssh/sync tank/example root@redacted.rsync.net:data1/data

WARN: mbuffer not available on target ssh:-S /tmp/syncoid-root-root@redacted.rsync.net-1636497823 root@redacted.rsync.net - sync will continue without target buffering.
Sending incremental tank/example@syncoid_localzfshost_2021-11-09:22:30:22 ... syncoid_localzfshost_2021-11-09:22:43:45 (~ 4 KB):
2.53KiB 0:00:00 [8.10KiB/s] [====================================================================================> ] 63%
mbuffer: error: outputThread: error writing to <stdout> at offset 0x0: Broken pipe
mbuffer: warning: error during output to <stdout>: Broken pipe
CRITICAL ERROR: zfs send -w -I 'tank/example'@'syncoid_localzfshost_2021-11-09:22:30:22' 'tank/example'@'syncoid_localzfshost_2021-11-09:22:43:45' | pv -s 4096 | mbuffer -q -s 128k -m 16M 2>/dev/null | ssh -i /root/.ssh/sync -S /tmp/syncoid-root-root@redacted.rsync.net-1636497823 root@redacted.rsync.net ' zfs receive -s -F '"'"'data1/data'"'"' 2>&1' failed: 512 at /usr/sbin/syncoid line 786.

As you can see, Syncoid will print the zfs commands that it is trying to run, and running these commands manually produced the same error; it appears that the issue is coming from mbuffer. I attempted to remove mbuffer from the command and to my surprise it worked! The problem I then encountered was how to stop Syncoid from attempting to use mbuffer. I'll save you the hassle: you can't.
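Removing the mbuffer stage from the pipeline Syncoid printed leaves something like this (taken from the CRITICAL ERROR above, with the quoting simplified for readability):

zfs send -w -I 'tank/example@syncoid_localzfshost_2021-11-09:22:30:22' 'tank/example@syncoid_localzfshost_2021-11-09:22:43:45' | pv -s 4096 | ssh -i /root/.ssh/sync root@redacted.rsync.net 'zfs receive -s -F data1/data'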

I googled, and googled, and even duck-duck-go'd, and couldn't pin down what the issue was. The closest I got was this post on /r/zfs which discusses the possibility of the error being that the user on the remote side does not have the correct permissions to receive the snapshots, but as can be seen from the output above, I was connecting as the root user! Hours more googling ensued and I found this issue on the Sanoid GitHub which looked promising, but alas, nothing.

The Solution!

I decided to take it back to basics and went to re-read the readme file on GitHub, but before I did, something caught my eye:

https://github.com/jimsalterjrs/sanoid/blob/master/FREEBSD.readme

A separate readme file for FreeBSD! Rsync.net (proudly) use FreeBSD as the back-end OS for their infrastructure, and also for the VM they provide for you to use with the ZFS pool.

As Jim Salter states in the separate readme, Syncoid requires the user to be using a Bourne-derived shell, or it will error. This was it!
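A quick way to see which shell the remote account will actually get is to check its passwd entry over SSH (again, hostname redacted):

ssh -i /root/.ssh/sync root@redacted.rsync.net grep '^root:' /etc/passwd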

FreeBSD by default sets the root user's shell to csh, which is not Bourne-derived. All that was needed was to create a user that uses /bin/sh and give it ZFS permissions - but it turned out that this was also unnecessary. I looked in /etc/passwd and there was an entry staring me in the face:

toor:*:0:0:Bourne-again Superuser:/root:

It was right there all along! I just needed to adjust Syncoid to use the 'toor' user instead of root and that was it! Incremental snapshots working!
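For completeness, the only change was the user in the target spec. If you'd rather not use toor, the FreeBSD readme's approach of a dedicated /bin/sh user with delegated ZFS permissions should also work - I didn't end up needing it, so the user name and permission list below are only a sketch:

# Working incremental replication, now connecting as toor instead of root
syncoid -r --compress=none --sendoptions=w --no-privilege-elevation --sshkey=/root/.ssh/sync tank/example toor@redacted.rsync.net:data1/data

# Alternative I didn't need: a dedicated user with a Bourne shell and delegated permissions
# pw useradd -n zfsbackup -s /bin/sh -m
# zfs allow zfsbackup receive,create,mount data1/data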

The Conclusion

So the lessons learned:

  1. Read the Documentation thoroughly before using software!
  2. Remember that not all OSes use a Bourne shell by default
  3. READ THE DOCUMENTATION!

Cheers guys!

DrDisgust