May Contain Blueberries

the sometimes journal of Jeremy Beker

Hey friends, I hope this helps someone not spend a few hours tracking down this issue like I did. Due to the recently released CVE-2022-32224, I needed to update our version of Rails to the proper version. However, when I did so, I started getting the following exception all over my code with a very unhelpful stack trace:

Psych::DisallowedClass: Tried to load unspecified class: Symbol

Sadly, searching did not turn up any useful leads. Only when I thought to go look at the commit in the Rails code itself did the solution become apparent.

I needed to add Symbol to the allowed YAML safe load classes in my environment files like this:

ActiveRecord.yaml_column_permitted_classes = [Symbol]

I hope this can help someone else.

These ideas have been bouncing around my head for a bit and resulted in a bunch of random tweets but I was finally prompted to write after getting an inquiry from a recruiter that contained this:

Selling Points: -- This is a Hybrid work situation where you can work 2 days from home.

I have worked remotely for Food52 from Virginia for nearly 7 years. When I started as a remote engineer, this was an uncommon situation. There existed companies that were fully remote, but they were the exception, not the rule. This has slowly shifted, but the pandemic has resulted in a rapid realignment at technology companies that is becoming clear in how recruiting and hiring is done.

I have been seeing this shift from two sides: as a principal-level engineer myself and as a participant in the hiring process inside my company.

As a company that has hired software engineers remotely (sometimes enthusiastically and sometimes reluctantly), I have argued that pre-pandemic this gave us a significant advantage. As the majority of technology companies restricted themselves to candidates in their area or those willing to relocate, we were able to pull talent from anywhere. In addition, it allowed us to offer salaries that were often higher than the local averages in the regions where people lived.

Over the last 6 months the number of recruiters reaching out to me regarding new roles has skyrocketed. I counted recently and in the last 2 months alone I have received over 60. 2 years ago, I might receive 2 or 3 a month.

So what has changed? I have not magically gotten more talented. The market has shifted.

I believe that many companies have realized, after being forced to go remote for all of their staff, that the fears they had about remote work were unfounded. And the smart ones have realized that they can expand the potential market for new engineers beyond the limited borders they had before. Companies based in previously high cost regions (Silicon Valley) especially are able to offer the same salaries but pluck the best talent from regions where those salary levels are unheard of.

This shift has vastly changed the power dynamics. The last time I was looking for new work the employers held most of the cards. I live in a region with more limited employment opportunities. So, an offer like the one above, of working 2 days at home would seem truly like a selling point. But today, I am able to field offers from the entire country (and probably further afield if I wanted) and expect that 100% remote is something I can demand. This is great for me as a software engineer.

For companies, I think it is probably a mixed bag, depending on how much they are willing to embrace these new standards. It will also be hard on companies who counted on only needing to compensate employees based on the local salary expectations. I have seen this in our hiring at Food52 (shameless plug: come work for us). For more senior technical roles, we have found it much more difficult to find exceptional applicants. It is my belief that this is purely because we are now competing on a much larger field.

We are still in the middle of a realignment. It will be interesting to see how it falls out.

Aspirationally, I hope that this change in dynamic will be especially good for underrepresented groups in engineering for whom the additional flexibility will offer them more opportunities to access employment markets they were shut out of. People who may need to stay outside of the tech hot spots because they need to live near family for childcare or so that their spouse can have a job that they love or they can live in a community that is accepting of who they are. We can only hope.

It still surprises me sometimes when I run into a clear parallel between experiences I have had in the real world and those in computing. Years ago in an old job I ran a software team that designed software for running large warehouse systems full of robotics and automated conveyor belts. In these systems we were responsible for routing boxes or pallets around a system efficiently. In order to do so, one had to take into account the capacity of the various conveyor belts as well as how quickly the robotics could move things to their shelves. It was a balancing act of different throughputs.

In my current job I have a similar problem that I have been slowly optimizing that is entirely electronic in nature. The situation can be simplified like this:

  • A large file is generated on a server online once a day (on the order of 100GB)
  • It needs to be restored to a local development server for me to work with
  • This restoration is done when I need new data, so I initiate it when needed
  • Getting the absolute latest backup restored is not important. When I do a restore, if the data is 24-48 hours old, that is fine

A few other items of note:

  • My home internet connection is 100 Mbit/s
  • My internal home network is 1000 Mbit/s
  • The destination machine is writing to an SSD that has a write throughput way faster than everything else
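
For a rough sense of scale, those link speeds put a floor on how long a transfer can take. The sketch below assumes the file really is about 100GB (decimal gigabytes) and ignores protocol overhead; `transfer_minutes` is a hypothetical helper for illustration, not part of the actual restore scripts.

```shell
# Best-case minutes to move SIZE_GB (decimal gigabytes) over a LINK_MBIT link.
# Ignores protocol overhead, so real transfers will be somewhat slower.
transfer_minutes() {
    awk -v gb="$1" -v mbit="$2" 'BEGIN { printf "%.1f\n", gb * 8 * 1000 / mbit / 60 }'
}

transfer_minutes 100 100    # home internet link: prints 133.3
transfer_minutes 100 1000   # home network: prints 13.3
```

Under those assumptions the internet link costs over two hours per copy, while the home network could in principle move the same data in under a quarter of an hour.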

The process that has existed for doing this was created before I came to the company and has evolved over the years with a burst of activity recently (because I was bored/annoyed/curious to make it better).

1: Copy Then Restore

This is the first method that was used and is the simplest. It was a two step process that was not optimized for speed. The process was:

  • [remote] the backup was taken and compressed on the server using xz at some point prior.
  • [local] The script would download the file from the remote host and store it on the local machine
  • [local] This would remove the old data, xz -d the file into tar and extract it into the destination directory

This had the benefit that you could download the big file once and then restore with it as many times as you wanted. It also took advantage of xz’s very high compression ratios so the file transferred over the slowest link was as small as possible. The downside was that it was still slow to get a new file to your machine before you could do the restore. It is highly dependent on your internet speed at the time of getting a new backup file.

2: Stream restore

This was the first major optimization that was made to the system. It took the assumption that doing multiple restores from the same backup was unlikely and that by the time you wanted to restore a second time you also wanted a new, more up to date, backup. It also dealt with the issue that storing the compressed backup file prior to restoring took up disk space that was getting non-trivial.

  • [remote] the backup was taken and compressed on the server using xz at some point prior.
  • [local] This would initiate an ssh session with the remote server, and cat the file across it directly into xz -d and then tar and extract it into the destination directory
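
A minimal sketch of that streaming pipeline follows. The host name, the paths, and the `REMOTE` variable are assumptions made for illustration; the real script is not reproduced here.

```shell
# Command prefix used to reach the machine holding the backup.
# "backup-host" is a placeholder; any ssh destination works.
REMOTE=${REMOTE:-"ssh backup-host"}

# stream_restore REMOTE_FILE DEST_DIR
# Streams the xz-compressed tarball over ssh straight into tar,
# so the compressed file is never stored on the local disk.
stream_restore() {
    mkdir -p "$2" || return 1
    # $REMOTE is intentionally unquoted so "ssh backup-host" splits into words.
    $REMOTE "cat '$1'" | xz -dc | tar -xf - -C "$2"
}

# Example: stream_restore /backups/latest.tar.xz /srv/dev-data
```

Nothing is written locally until tar extracts it, which is what removes the need for scratch space for the compressed file.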

This removed the “store and restore” problem of the first solution and since everything locally being done was faster than my internet connection, the transfer time became the bottleneck.

3: Local copy, stream restore xz compressed data (24 minutes)

Realizing that the bottleneck was now transferring the backup from our remote system to my house, I wondered if I could just get rid of that step from the critical path. I am lucky in that I have a server that is running 24/7 at home that I could schedule things on, so I realized that I could get a copy of the backup to my house overnight before I needed it. This became a combination of #1 & #2.

  • [remote] the backup was taken and compressed on the server using xz at some point prior.
  • [local server] The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
  • [local] This would initiate an ssh session with my local server, and cat the file across it directly into xz -d and then tar and extract it into the destination directory

This had a lot of benefits in that now I could restore as fast as I could move data across my local network. I used this solution for quite a while before trying to improve it. It is also where I started keeping track of how long it took as I made improvements.

4: Local copy, stream restore uncompressed data (16 minutes)

With this solution, I started monitoring the throughput of data through the various pipelines and I was surprised to find out that the bottleneck was not actually my local network. It was decompressing the xz compressed file on the destination server before it could be run through tar. While xz has very high compression ratios, it can’t sustain high data rates even when decompressing.

So, I figured why not add the decompression to the overnight task?

  • [remote] the backup was taken and compressed on the server using xz at some point prior.
  • [local server] The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
  • [local server] decompress the xz file on the local server and just store the raw tar file
  • [local] This would initiate an ssh session with my local server, and cat the file across it directly into tar to extract it into the destination directory

This had a significant benefit and got my restore time down to 16 minutes, which was a nice bump. However, I was now restricted by my home network as I was saturating that network link.

5: Local copy, stream restore zstd compressed data (7 minutes)

Knowing that I could saturate my network link, I saw that I was not utilizing the full write speeds of the SSD. I knew that the only way to get more data into the SSD was to have compressed data over the wire. However as I learned in #3, xz was not fast enough. I had read a few articles about zstd as a compression algorithm that was both CPU efficient and optimized for high throughput. So I figured if I could compress the data across the wire it would expand on the destination system to faster than wire speeds.

  • [remote] the backup was taken and compressed on the server using xz at some point prior.
  • [local server] The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
  • [local server] decompress the xz file on the local server and just store the raw tar file
  • [local] This would initiate an ssh session with my local server, compress the file using zstd (using the fast option) as it was sent over the ssh connection then decompress using zstd on the destination server before piping it into tar and extract it into the destination directory
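
Sketched as a pipeline, this step could look roughly like the following. The host name and paths are placeholders, the zstd options are illustrative rather than the exact ones used, and zstd needs to be installed on both machines.

```shell
REMOTE=${REMOTE:-"ssh home-server"}   # placeholder host name

# inline_zstd_restore REMOTE_TAR DEST_DIR
# Compresses the raw tar on the sending side as it is read, then
# decompresses on the receiving side before handing it to tar.
inline_zstd_restore() {
    mkdir -p "$2" || return 1
    # --fast trades compression ratio for speed; -T0 uses all cores;
    # -c writes the compressed stream to stdout.
    $REMOTE "zstd --fast -T0 -c '$1'" | zstd -dc | tar -xf - -C "$2"
}

# Example: inline_zstd_restore /backups/latest.tar /srv/dev-data
```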

This got me really close and down to 7 minutes. But still I wondered if I could do better. I couldn’t use a very high compression setting for zstd when it was inline and keep the network saturated.

6: Local copy, recompress using zstd, stream restore compressed data (5 minutes)

Given that I wanted to send zstd compressed data over the wire, there was no reason to do that compression at the time of restore. I could do it overnight. This had the benefits of allowing a higher compression level and removing the compression from the time critical path.

  • [remote] the backup was taken and compressed on the server using xz at some point prior.
  • [local server] The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
  • [local server] decompress the xz file on the local server and then recompress it using zstd
  • [local] This would initiate an ssh session with my local server, and cat the compressed file across, run it through zstd and then into tar and extract it into the destination directory
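
The two halves of this final setup might be sketched as below. File names, the host name, and the default compression level of 10 are assumptions; this version also pipes xz straight into zstd instead of storing the raw tar in between, which is a simplification of the steps listed above.

```shell
# Overnight on the home server: swap xz compression for zstd.
# recompress_to_zstd SRC_XZ DEST_ZST [LEVEL]
recompress_to_zstd() {
    # -q quiet, -f overwrite last night's file, -T0 use all cores.
    xz -dc "$1" | zstd -q -f -T0 "-${3:-10}" -o "$2"
}

# At restore time on the development machine.
REMOTE=${REMOTE:-"ssh home-server"}   # placeholder host name

# fast_restore REMOTE_ZST DEST_DIR
fast_restore() {
    mkdir -p "$2" || return 1
    $REMOTE "cat '$1'" | zstd -dc | tar -xf - -C "$2"
}

# Example:
#   recompress_to_zstd latest.tar.xz latest.tar.zst 10   (nightly job)
#   fast_restore /backups/latest.tar.zst /srv/dev-data   (on demand)
```

Because the expensive compression happens in the nightly job, the level can be raised without slowing down the restore itself.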

This is, I think, the best that I can do. I am playing with the compression levels of the overnight zstd run, but higher levels don’t seem to be doing much better (and may be impacting decompression speeds). I am seeing about a 3.5-4x reduction in file size.

I think at this point I have moved the bottleneck all the way down to tar and the SSD itself, so I’m quite happy.

Day to day operations

Ironically, doing these restores very quickly isn’t as big a deal as it used to be. This used to be the only way that we could restore our development systems to a pristine state if we were making large changes. It was a pain and people dreaded having to do this (and would therefore work with messy data).

To alleviate that problem, I changed our local development systems from using ext4 filesystems for the data storage of the uncompressed, un-tarred data to zfs. One of the many awesome things about zfs is filesystem snapshots. So now, once the various restore scripts finish restoring a pristine set of the data, they mark a snapshot of the filesystem at that point in time. Then, whenever we need to reset our local machines, we can just tell the filesystem to roll back to that snapshot. And this takes less than a minute. So on a day-to-day basis when one of our developers needs to clean their system, but doesn’t need to update their data, they can do so very quickly. This has been a game changer for us but I still wanted to make the full restore go faster (for me at any rate).
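
As a rough sketch, the snapshot and rollback steps map onto two zfs commands. The dataset name `tank/devdata` and the snapshot name `pristine` are placeholders, not the names our scripts actually use.

```shell
ZFS=${ZFS:-zfs}                      # override to dry-run, e.g. ZFS=echo
DATASET=${DATASET:-tank/devdata}     # placeholder dataset name

# Run at the end of a restore, once the data is in a pristine state.
mark_pristine() {
    $ZFS snapshot "$DATASET@pristine"
}

# Discard everything written since the snapshot was taken.
# -r also destroys any snapshots newer than the one being rolled back to.
reset_to_pristine() {
    $ZFS rollback -r "$DATASET@pristine"
}
```

Note that `zfs snapshot` fails if a snapshot with that name already exists, so a real restore script would need to destroy or rename the previous one first.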

Conclusions and asides

To be clear, these speed improvements aren’t “free.” I am basically throwing CPU resources (and therefore electricity) at the problem. I am using one algorithm xz to get the file as small as possible for the slowest link, then switching to zstd because of its fast decompression speed. I am also trying to not break compatibility with other people who use the xz compressed file and don’t want/have the infrastructure and setup to run this as a multistep process.

I also found as part of this that my CPU cooler on the server that was storing/recompressing the archives was not properly seated, so I kept overheating the CPU until I fixed that. But once fixed, I was confident it could handle any high loads.

Thanks for coming along on this mostly pointless journey.

I don’t feel that I have any great insights that are not already being shared by people way smarter than me. But as I have heard in many forums: “silence is complicity” and I want my name to publicly be next to the simple statement that black lives matter. While the horrible and preventable deaths of George Floyd and Breonna Taylor in recent weeks have brought this to everyone’s minds, I can’t forget the deaths in recent years of Eric Garner, Trayvon Martin, or the hundreds and thousands of human beings who just happen to have a different skin color than me that have been killed by racism since our country’s founding. All of their lives mattered and were cut short by a system that valued their lives less than it would mine. Racism is and has been present in the United States since its founding.

I am a straight, cisgender, wealthy, white, male, software engineer in my mid 40s; I am a living example of privilege in our society. Have I done enough to help end racism in my community? In my country? I don’t really know what enough is, but I am sure the answer is no. I do some. I will continue to do more.

All I ask is that, if you can, do something to help fix this. Maybe it is going to a march. Maybe it is giving money to a worthwhile organization. Maybe it is just saying publicly what you believe. Whatever you do, please vote. Change needs to be driven by all of us, but our choice of leadership will help.

Nowhere close to exhaustive list of resources:

A few months back as I was getting ready to go to XOXO I was reminded of the early days of computing and the advent of the classic text adventure. Even though I never really played them regularly as a kid, I remember hearing about Zork and enjoying the idea of the simple interface (although it is combined with the frustration, the few times I did play them, of figuring out what to type). However, this simplicity of interface, a typed command followed by a description of an environment or result of an action, brought to mind the back and forth of a Twitter conversation. And that thought spawned this idea: a melding of the old and new.

It seemed an easy enough proposition to hook up one of the older text adventure games to a Twitter bot that would allow one to send commands and receive the output, mapping the conversation thread between a user and the bot to a single adventure with the bot maintaining persistence on the back end.

The original Colossal Cave Adventure has been dutifully reconstructed and the code is available as open source. With some slight modifications to work around some situations where multi-line answers were required and allow saving without penalty, I was able to get the code in a form that matched a back and forth Twitter exchange.

A simple ruby application acted as the go between for the old C program and Twitter. Each conversation was saved off based on the username of the recipient and commands sent back and forth.

And Colossal Cave Bot was born.

It was a fun little exercise that a few people played around with and seemed to have fun with. The code on the ruby side is not pretty but I might at some point clean it up and share it.

(To those who find this in the future, apologies if the bot is no longer running.)

The recent redesign of this blog ended up being a lot more than just a visual redesign. In many ways it has come full circle. In the earliest incarnation of May Contain Blueberries, I hand built HTML pages for each entry. As that became more annoying I switched over to Movable Type in all its Perl glory. But as I was doing my own development in PHP, I eventually transitioned to Wordpress and while that has served me well (and the other users on my server), it was time to move on.

The biggest reason was maintenance driven by security. For a blog that I don’t write on very often (sadly once a year seems about the norm), I was having to apply security patches at least monthly, if not more often. The benefit of a fully dynamic blog platform was its downfall. Wordpress is such a large piece of software that it seems no end of security issues are being discovered. And as a solo system administrator I need to minimize my attack surface as much as I can.

I took two paths. There were quite a few blogs that are no longer being used on my server, so I took static “snapshots” of them. While this did break inbound links, they weren’t highly trafficked so I was ok with that (and search engines will figure it out).

For my blog, I decided to move to a static site generator. This has the benefits that I wanted such as templates, auto-generated index pages and such, but ends up serving plain HTML eliminating that security issue. I don’t need onsite interactivity so why pay the costs for it. I chose Jekyll. It allows me to write posts in Markdown and automatically build the site and push it to my server in one step.

On to the new thing! Who knows, I might write more (probably not).

There is much discussion on the internet about the wisdom of running one’s own mail server and it includes much valid criticism. There are significant security concerns beyond the normal amount of maintenance of any system. For reasons varied and irrelevant here, I have chosen to do so for over 15 years.

The aspect of doing so that is often not discussed when compared to using commercial services such as Gmail is that one has to deal with spam entirely on one’s own. This is difficult for (at least) 2 reasons. The obvious one is that it is a hard problem to deal with; the not so obvious one is that I only have 2 users of my mail server, so I don’t have the ability to let my users provide input in identifying spam.

For many years I have relied on tools such as Spamassassin to try to identify spam once it has reached my mail server. I also make use of various blacklists to identify IP addresses that are known to deliver spam. I use Mailspike and Spamhaus.

This is the situation I was in up until this past weekend. Hundreds of emails a day would slip past the blacklists. Spamassassin is very good but it was still allowing around 5% of those messages to reach my inbox. And the problem seemed to be getting worse.

In the past, I had used greylisting but I eventually stopped given the main side effect of that system; messages from new senders, legitimate or otherwise, would get delayed by at least 5 minutes. This is fine for most emails, but for things like password resets or confirmation codes, was just too much of an inconvenience.

What I wanted was a system where messages that are unlikely to be spam make it right through and all others get greylisted.

My boss mentioned a solution that he implemented once that decided to greylist based on the spam score of the inbound emails. This allowed him to only greylist things that looked like they might be spam. Unfortunately, looking at the emails that were slipping through my existing system, they generally had very low scores (spammers test against Spamassassin).

So, I pursued a different solution. Several services provide not only blacklists but also whitelists that give a reputation score for various IP addresses around the internet. I chose to use the whitelists from Mailspike and DNSWL.

I implemented a hierarchy:

  • Accept messages from hosts we have manually whitelisted
  • Reject messages from hosts on one of the watched blacklists
  • Accept messages from hosts with a high reputation score
  • Greylist everything else
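
The ordering matters, so here is a small model of the decision logic. It is only an illustration of the rule order: `classify_client` and its yes/no arguments are hypothetical, and the real checks are DNS lookups performed by the mail server.

```shell
# classify_client MANUAL_WHITELIST BLACKLISTED HIGH_REPUTATION
# Each argument is "yes" or "no"; the first matching rule wins.
classify_client() {
    if   [ "$1" = yes ]; then echo accept      # manually whitelisted host
    elif [ "$2" = yes ]; then echo reject      # on a watched blacklist
    elif [ "$3" = yes ]; then echo accept      # high reputation score
    else                      echo greylist    # everything else
    fi
}

classify_client no yes yes   # prints "reject": blacklists outrank reputation
classify_client no no  no    # prints "greylist"
```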

When I enabled this ruleset, I thought I had broken things. I stopped getting any email coming into my system. It turns out that I had just stopped all the spam. It was amazing.

In the two days I have been running this system, every legitimate email has made it to my inbox. I have seen 10-15 messages get through the initial screens and be correctly identified as spam by Spamassassin. (In the early stages I had a few messages make it to my inbox but I realized that was because I trusted the whitelists more than the blacklists, i.e. hosts were listed as both trustworthy and sending spam. As the blacklists seem to react faster, I decided to switch the order as shown above.)

You can look at the graphs to see when I turned the system on:

This first graph shows the number of messages that were accepted by my server (per second). You can see that the number dropped considerably when I turned on my hybrid solution. Since messages were getting rejected before they were accepted by the system, there are fewer messages for Spamassassin to investigate.

This can be seen here, where the number of messages identified as spam also went down because they were stopped before Spamassassin even needed to look at them.

If you run postfix and would like to implement a similar system, here is the relevant configuration section from my

smtpd_recipient_restrictions = permit_mynetworks,
    check_client_access hash:/usr/local/etc/postfix/rbl_override,
    permit_dnswl_client[ 18..20 ],
    permit_dnswl_client[ 0..255 ].[ 2..3 ],
    check_policy_service unix:/var/run/postgrey.sock,

So, overall this has been a resounding success. I hope this helps some of you out there with the same challenges.

I like to read. It is my chosen form of escapism. After participating in the Goodreads 2015 Reading Challenge I thought it might be fun to gather some further statistics about how my reading changed throughout the year. My goal had originally been 25 books, which I way exceeded with a count of 36. So, for 2016, I set it to 35 books. Some observations:

  • I started a new job at the beginning of March, so that kept me busy
  • May was high in page count due to rereading (I can reread faster than I can read a fresh book)
  • Tiffany and I went on vacation in September, so lots of books there


And here is a nice collage of the titles taken from Goodreads.


What is your reading goal for 2016?

Welcome to another of my end of the year, oh-god-I-didn’t-blog-all-year posts. I’ve been trying to go through some of the geeky things I did this year that were a challenge for me and document them so that they might be easier for someone else.

Today’s topic: setting up a VPN server using a Cisco router for “road warrior” clients (aka, devices which could be coming from any IP address).

As should come as no surprise to anyone who knows me or who is exposed to my twitter stream, I value privacy and security, both from a philosophical perspective and as fun projects to tackle.

This project arose as an evolution of earlier VPN setups I have had in the past. When I was living in the linux world (and before I purchased my Cisco router), I used a linux server as my internet router. If you are in that situation, I highly recommend using the strongSwan VPN server. It is an enterprise grade VPN server that is also easily configured to handle small situations. I often had multiple VPN tunnels up for fixed connections that were both site to site and for roadwarriors using both pre-shared keys (PSK) and X509 certificates.

But when I upgraded our home network to using a Cisco 2811 router that I bought from a tech company liquidation auction for $11.57, running the strongSwan VPN from behind the NAT router became much more challenging. (Doable, but required some ugly source routing hacks I never liked.)

My requirements were:

  1. IPsec
  2. Capable of supporting iOS and Mac OS X clients
  3. Clients could be behind NATs (NAT-T support)
  4. Pre-shared Key support (I might do certificates again later, but as there are only 2 users of the VPN, seems like overkill.)
  5. All traffic from the clients will be routed through the VPN (no split-tunnels)
  6. Ability to do hairpin routing. (This means that a VPN client can tunnel all of their traffic, including that destined to the rest of the internet, to the VPN server and it will be able to route it back out to the internet. This is critical for protecting your clients on untrusted networks.)

The biggest challenge that I ran into was not the lack of capabilities of the Cisco platform, but the fact that it is designed for much, much larger implementations than I was going to do. In addition, most of the examples were for site-to-site configurations.

I don’t intend to go through all of the steps needed to set up a Cisco router, that is beyond the scope of this post, so I will be making the following assumptions.

  1. You are familiar working in the IOS command line interface
  2. You already have a working network
  3. It has a single external IP address (preferably a static IP)
  4. You have 1 (or more than 1) internal networks
  5. Internal hosts are NAT translated when communicating with the Internet
  6. You are familiar with setting up your ip access-list commands to protect yourself and allow the appropriate traffic in and out of your networks

OK, let’s go!

Note: For my setup, FastEthernet0/0 is my external interface (set up as ip nat outside)

User & IP address setup

Set up a user (or more than one) that will be used to access the VPN.

aaa new-model
aaa authentication login AUTH local
aaa authorization network NET local
username vpn-user password 0 VERY-STRONG-PASSWORD

And set up a pool of IP addresses that will be given out to users who connect to the VPN.

ip local pool VPN-POOL

ISAKMP Key Management

ISAKMP is the protocol that is used to do the initial negotiation and set up keys for the VPN session. First we will set up more general settings such as the fact we will be using 256 bit AES, PSKs, keepalives, etc.

crypto isakmp policy 1
 encr aes 256
 authentication pre-share
 group 2
 lifetime 3600

crypto isakmp keepalive 10

We will then set up the group which represents our clients. This includes setting parameters for your clients, such as the pool of IP addresses they will get (from above), DNS servers, settings for perfect forward secrecy (PFS), etc.

crypto isakmp client configuration group YOUR-VPN-GROUP
 key VERY-STRONG-GROUP-KEY
 pool VPN-POOL

Finally, we will pull these items into a profile, vpn-profile, that can be used to set up a client.

crypto isakmp profile vpn-profile
   match identity group YOUR-VPN-GROUP
   client authentication list AUTH
   isakmp authorization list NET
   client configuration address respond
   client configuration group YOUR-VPN-GROUP
   virtual-template 1

IPSEC Parameters

We set up the parameters that define how IOS transforms (aka, encrypts and HMACs) the traffic on this tunnel and give it a name vpn-transform-set

crypto ipsec transform-set vpn-transform-set esp-aes esp-sha-hmac

Full IPSEC profile

Finally we link both the ISAKMP (vpn-profile) and IPSEC (vpn-transform-set) items together and give them a name ipsecprof that can be attached to a virtual interface (below).

crypto ipsec profile ipsecprof
 set transform-set vpn-transform-set
 set isakmp-profile vpn-profile

Virtual Template Interface

This caused me a bunch of confusion. Because we do not have a static site-to-site tunnel, we can’t define a tunnel interface for our VPN clients. What we do is set up a template interface that IOS will use to create the interfaces for our clients when they connect.

This needs to reference your external interface, which in my case is FastEthernet0/0.

interface Virtual-Template1 type tunnel
 ip unnumbered FastEthernet0/0
 ip nat inside
 ip virtual-reassembly
 tunnel source FastEthernet0/0
 tunnel mode ipsec ipv4
 tunnel protection ipsec profile ipsecprof

Other Notes

It is important that you have the appropriate access controls set up to restrict where in your network a VPN client can send packets. That is really beyond the scope of this post as it is very dependent on your configuration.

However, at a minimum, you need to allow the packets that arrive on your external interface for VPN clients to be handled. These packets will show up in a few forms.

You will need to add rules to handle these packets to your external, border, access lists.

ip access-list extended inBorder
 permit esp any host YOUR-EXTERNAL-IP
 permit udp any host YOUR-EXTERNAL-IP eq isakmp
 permit udp any host YOUR-EXTERNAL-IP eq non500-isakmp

Client Setup

Assuming all of this worked (and I transcribed things properly), you will be all set to configure a client. This should be a relatively easy configuration.

  • VPN Type: IKEv1, in iOS/Mac OS X this is listed as Cisco IPsec or IPsec
  • Server: Your public server IP or hostname
  • Pre shared key: VERY-STRONG-GROUP-KEY
  • User: vpn-user

Final Notes

Even though this setup uses users that are hard coded on your router, you may still want to set up a Radius server to receive accounting information so you can track connections to your VPN. It can also be expanded to do authentication and authorization for your VPN users.

I hope this was helpful to you. If you have any questions, please feel free to contact me via twitter @gothmog

As part of my transition from using a combination of Linux and FreeBSD for our home servers to being exclusively FreeBSD, I wanted to update how I did backups from my public server, bree, to the internal storage server, rivendell. Previously, I had done this with a home grown script which used rsync to transfer updates to the storage server overnight. This solution worked just fine, but was not the most efficient (see: ZFS Replication to the cloud is finally here-and it’s fast). While I didn’t intend to replicate to I wanted to leverage ZFS since I am now going FreeBSD to FreeBSD.

There are numerous articles about using zxfer to perform backups but there was one big hiccup that I couldn’t get over. Quoting the man page:

zxfer -dFkPv -o copies=2,compression=lzjb -T root@ -R storage backup01/pools

Having to open up the root account on my storage server, no matter how I restricted it by IP address, keys, or whatever else, makes me really uncomfortable and was a show-stopper for me. But I thought I could do better. I had some limited experience using restricted shells to limit access to servers, and I knew that ZFS allows delegating permissions to non-root users, so I decided to give it a shot.

TL;DR: It can work.

The configuration had a few phases to it:

  1. Create a new restricted user account on my backup server and configure the commands that zxfer needs access to in the restricted shell
  2. Create the destination zfs filesystem to receive the mirror and configure the delegated permissions for the backup user
  3. Set up access to the backup server from the source server via SSH
  4. Make a slight modification to zxfer so that it runs the zfs command from the PATH instead of using a hard-coded path in the script

Setting up the restricted user

I created a new user on the backup system named zbackup that would be my restricted user for receiving the backups. The goal was for this user to be as limited as possible. It should only be allowed to run the commands necessary for zxfer to do its job. I landed on using rzsh as the restricted shell as it was the first one I got working with the correct environment. I set up a directory to hold binaries that the zbackup user was allowed to use.

root@storage$ mkdir /usr/local/restricted_bin
root@storage$ ln -s /sbin/zfs /usr/local/restricted_bin/zfs
root@storage$ ln -s /usr/bin/uname /usr/local/restricted_bin/uname

I then set up the .zshenv file for the zbackup user to restrict the user to that directory for executables.

export PATH=/usr/local/restricted_bin

Setting up the destination zfs filesystem

I already had a zfs filesystem that was devoted to backups, so I made a new zfs filesystem underneath it to hold these new backups and to serve as the point where I could delegate permissions. Then, through trial and error, I figured out all the permissions I had to delegate to the zbackup user on the filesystem to allow zxfer to work.

root@storage$ zfs create nas/backup/bree-zxfer
root@storage$ chown zbackup:zbackup /nas/backup/bree-zxfer
root@storage$ zfs allow -u zbackup atime,canmount,casesensitivity,checksum,compression,copies,create,snapshot_count,snapshot_limit,sync,userprop,utf8only,volmode nas/backup/bree-zxfer

(I figured out the list of actions and properties that I needed to delegate by having zxfer dump the zfs create command it was trying to run on the backup system when it failed.)

Update: I forgot one thing that is critical to making this work. You need to ensure that non-root users are allowed to mount filesystems. This can be accomplished by adding the following line to your /etc/sysctl.conf and rebooting:
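
On FreeBSD, the sysctl that controls whether unprivileged users may mount filesystems is vfs.usermount, so the entry would presumably look like the fragment below. This is an assumption based on stock FreeBSD rather than something taken from my original notes; check `sysctl -d vfs.usermount` on your own system before relying on it.

```text
# /etc/sysctl.conf on the backup server (assumes FreeBSD)
# Allow non-root users to mount filesystems they have permission for.
vfs.usermount=1
```

The same value can usually be applied to the running system with `sysctl vfs.usermount=1`, which avoids waiting for a reboot to test the setup.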


Remote access to the backup server

Nothing fancy here. On my source server, I created a new SSH keypair for the root user (no problem with running the source zfs command as root). I then copied the public half of that key to the authorized_keys file of the zbackup user on the backup server. At this point, I could ssh from my source server to the backup server as the zbackup user. But once logged in to the backup server, the only commands that can be run are those in the /usr/local/restricted_bin directory (zfs and uname).
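
A quick way to confirm that the account behaves as described is to try one allowed command and one command given by full path from the source server. The function below is only a sketch: the name check_restricted_login is made up for illustration, the host name storage follows the examples in this post, and it assumes the restricted shell refuses any command name containing a slash.

```shell
# Returns 0 only if the allowed command works AND a full-path command is refused.
# check_restricted_login is a hypothetical helper, not part of zxfer or OpenSSH.
check_restricted_login() {
    host=${1:-storage}

    # uname is linked into /usr/local/restricted_bin, so this should succeed.
    ssh -o BatchMode=yes "zbackup@$host" uname || return 1

    # A command specified by full path should be rejected by the restricted shell.
    if ssh -o BatchMode=yes "zbackup@$host" /bin/ls >/dev/null 2>&1; then
        echo "restricted shell did not block /bin/ls" >&2
        return 1
    fi
    return 0
}
```

Run it as root on the source server, for example `check_restricted_login storage && echo "zbackup is locked down"`. BatchMode keeps ssh from falling back to a password prompt if the key is not set up correctly.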

Tweak zxfer script to remove hard coded path in zfs commands

One of the (intentional) limitations of a restricted shell is that the restricted user is not allowed to specify a full pathname for any command. Only commands located in their PATH can be run. Unfortunately, while the zbackup user has the zfs command in their PATH, it is referenced as /sbin/zfs in the zxfer script. To work around this, I modified the zxfer script to call zfs without the path and assume that it will be found in the PATH. The hard-coded path appeared in only two places in the script; a quick search for /sbin/zfs will find them.
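
The same edit can be sketched as a one-line sed substitution. This is only an illustration of the change described above, not a patch from the zxfer project: the function name strip_zfs_path and the install path in the usage comment are assumptions, so review the diff before replacing the real script.

```shell
# Print a copy of a script with the hard-coded /sbin/zfs replaced by plain "zfs",
# so the command is resolved through PATH instead.
# strip_zfs_path is a hypothetical helper name used for this sketch.
strip_zfs_path() {
    sed 's|/sbin/zfs|zfs|g' "$1"
}

# Usage (the location of zxfer is an assumption; check with `which zxfer`):
#   cp /usr/local/sbin/zxfer /usr/local/sbin/zxfer.orig
#   strip_zfs_path /usr/local/sbin/zxfer.orig > /usr/local/sbin/zxfer
#   diff /usr/local/sbin/zxfer.orig /usr/local/sbin/zxfer
```

Keeping the untouched copy around makes it easy to confirm that only the expected lines changed, and to restore the script after a package upgrade overwrites it.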

Moment of truth!

After all this, I was able to run any number of commands to mirror my source server's zfs filesystems (with snapshots) to my backup server.

root@source$ zxfer -dFPv -T zbackup@storage -N zroot/git nas/backup/bree-zxfer
root@source$ zxfer -dFPv -T zbackup@storage -R zroot/var nas/backup/bree-zxfer

And best of all, the storage server does not have SSH enabled for root. Success.