It still surprises me sometimes when I run into a clear parallel between experiences I have had in the real world and those in computing. Years ago, in an old job, I ran a team that designed software for running large warehouse systems full of robotics and automated conveyor belts. In these systems we were responsible for routing boxes or pallets around efficiently. To do so, one had to take into account the capacity of the various conveyor belts as well as how quickly the robotics could move things to their shelves. It was a balancing act of different throughputs.
In my current job I have a similar problem, entirely electronic in nature, that I have been slowly optimizing. The situation can be simplified like this:
- A large file is generated on a server online once a day (on the order of 100GB)
- It needs to be restored to a local development server for me to work with
- This restoration is done when I need new data, so I initiate it when needed
- Getting the absolute latest backup restored is not important. When I do a restore, if the data is 24-48 hours old, that is fine
A few other items of note:
- My home internet connection is 100 Mbit/s
- My internal home network is 1000 Mbit/s
- The destination machine is writing to an SSD that has a write throughput way faster than everything else
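As a rough back-of-envelope check (ignoring compression and protocol overhead), 100 GB is about 800 gigabits, so pulling it over the 100 Mbit/s internet link takes on the order of two and a quarter hours, while moving it across the 1000 Mbit/s home network takes roughly 13-14 minutes. Those two numbers explain most of the optimizations below.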
The process that has existed for doing this was created before I came to the company and has evolved over the years with a burst of activity recently (because I was bored/annoyed/curious to make it better).
1: Copy Then Restore
This is the first method that was used and is the simplest. It was a two-step process, not optimized for speed:
- [remote] the backup was taken and compressed on the server using xz at some point prior.
- [local] download.sh: The script would download the file from the remote host and store it on the local machine
- [local] restore.sh: This would remove the old data, xz -d the file into tar, and extract it into the destination directory
This had the benefit that you could download the big file once and then restore from it as many times as you wanted. It also took advantage of xz’s very high compression ratio, so the file transferred over the slowest link was as small as possible. The downside was that it was still slow to get a new file to your machine before you could do the restore, and highly dependent on your internet speed at the time of fetching a new backup.
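In shell terms, the two local scripts boiled down to something like the following sketch (hostnames and paths here are placeholders, not the real scripts):

```bash
# download.sh (sketch): pull the compressed backup from the remote host
scp backup-server:/backups/latest.tar.xz /tmp/latest.tar.xz

# restore.sh (sketch): wipe the old data, then decompress and unpack
rm -rf /data/dev/*
xz -dc /tmp/latest.tar.xz | tar -xf - -C /data/dev
```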
2: Stream restore
This was the first major optimization made to the system. It started from the assumption that doing multiple restores from the same backup was unlikely, and that by the time you wanted to restore a second time you also wanted a newer, more up-to-date backup. It also dealt with the issue that storing the compressed backup file prior to restoring took up disk space, which was becoming non-trivial.
- [remote] the backup was taken and compressed on the server using xz at some point prior.
- [local] stream_restore.sh: This would initiate an ssh session with the remote server, cat the file across it directly into xz -d and then tar, and extract it into the destination directory
This removed the “store and restore” problem of the first solution, and since everything being done locally was faster than my internet connection, the transfer time became the bottleneck.
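The streaming version collapses the download and restore into a single pipeline, roughly like this (again, hostnames and paths are placeholders):

```bash
# stream_restore.sh (sketch): stream the backup over ssh and unpack it
# without ever storing the compressed file locally
rm -rf /data/dev/*
ssh backup-server 'cat /backups/latest.tar.xz' \
  | xz -dc \
  | tar -xf - -C /data/dev
```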
3: Local copy, stream restore xz compressed data (24 minutes)
Realizing that the bottleneck was now transferring the backup from our remote system to my house, I wondered if I could get rid of that step from the critical path. I am lucky in that I have a server running 24/7 at home that I could schedule things on, so I realized I could get a copy of the backup to my house overnight, before I needed it. This became a combination of #1 & #2.
- [remote] the backup was taken and compressed on the server using xz at some point prior.
- [local server] download.sh: The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
- [local] stream_restore.sh: This would initiate an ssh session with my local server, cat the file across it directly into xz -d and then tar, and extract it into the destination directory
The big benefit was that I could now restore as fast as I could move data across my local network. I used this solution for quite a while before trying to improve it. It is also where I started keeping track of how long a restore took as I made improvements.
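Conceptually it is the same pipeline as #2, but fed from the home server, plus a nightly download on that server. A rough sketch (cron schedule, hostnames, and paths are placeholders):

```bash
# On the home server, run nightly from cron, e.g.:
#   0 3 * * * /opt/backup/download.sh
scp backup-server:/backups/latest.tar.xz /srv/backups/latest.tar.xz

# stream_restore.sh (sketch) on the dev machine now points at the home server
rm -rf /data/dev/*
ssh home-server 'cat /srv/backups/latest.tar.xz' \
  | xz -dc \
  | tar -xf - -C /data/dev
```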
4: Local copy, stream restore uncompressed data (16 minutes)
With this solution, I started monitoring the throughput of data through the various pipelines, and I was surprised to find that the bottleneck was not actually my local network. It was decompressing the xz compressed file on the destination server before it could be run through tar. It turns out that while xz has very high compression ratios, it can’t sustain high data rates even when decompressing.
So, I figured why not add the decompression to the overnight task?
- [remote] the backup was taken and compressed on the server using xz at some point prior.
- [local server] download.sh: The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
- [local server] decompress the xz file on the local server and just store the raw tar file
- [local] stream_restore.sh: This would initiate an ssh session with my local server, cat the file across it directly into tar, and extract it into the destination directory
This had a significant benefit and got my restore time down to 16 minutes, which was a nice bump. However, I was now restricted by my home network as I was saturating that network link.
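With this change the overnight job absorbs the xz decompression cost, so the restore itself is just cat and tar. Something like this (paths and hostnames are placeholders):

```bash
# Nightly on the home server: download, then store the decompressed tar
scp backup-server:/backups/latest.tar.xz /srv/backups/latest.tar.xz
xz -d -f /srv/backups/latest.tar.xz     # leaves /srv/backups/latest.tar

# On the dev machine: no decompression left in the critical path
rm -rf /data/dev/*
ssh home-server 'cat /srv/backups/latest.tar' \
  | tar -xf - -C /data/dev
```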
5: Local copy, stream restore zstd compressed data (7 minutes)
Knowing that I could saturate my network link, I saw that I was not utilizing the full write speed of the SSD. The only way to get more data onto the SSD was to send compressed data over the wire. However, as I learned in #4, xz was not fast enough. I had read a few articles about zstd, a compression algorithm that is both CPU efficient and optimized for high throughput. So I figured that if I compressed the data across the wire, it would expand on the destination system at an effective rate faster than wire speed.
- [remote] the backup was taken and compressed on the server using xz at some point prior.
- [local server] download.sh: The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
- [local server] decompress the xz file on the local server and just store the raw tar file
- [local] stream_restore.sh: This would initiate an ssh session with my local server, compress the file using zstd (with the fast option) as it was sent over the ssh connection, decompress it using zstd on the destination server, then pipe it into tar and extract it into the destination directory
This got me really close and down to 7 minutes. But still I wondered if I could do better. I couldn’t use a very high compression setting for zstd when it was inline and keep the network saturated.
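The restore pipeline for this step looked roughly like the following (a sketch with placeholder names; the exact fast level is tunable):

```bash
# On the dev machine: compress with a fast zstd level on the sending side so
# less data crosses the 1 Gbit/s link, then decompress before tar
rm -rf /data/dev/*
ssh home-server 'zstd --fast -c /srv/backups/latest.tar' \
  | zstd -dc \
  | tar -xf - -C /data/dev
```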
6: Local copy, recompress using zstd, stream restore compressed data (5 minutes)
Given that I wanted to send zstd compressed data over the wire, there was no reason to do that compression at the time of restore. I could do it overnight. This had the benefits of being able to use a higher compression level and removing the compression from the time-critical path.
- [remote] the backup was taken and compressed on the server using xz at some point prior.
- [local server] download.sh: The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
- [local server] decompress the xz file on the local server and then recompress it using zstd
- [local] stream_restore.sh: This would initiate an ssh session with my local server, cat the compressed file across, run it through zstd and then into tar, and extract it into the destination directory
This is, I think, the best that I can do. I am playing with the compression levels of the overnight zstd run, but higher levels don’t seem to be doing much better (and may be impacting decompression speeds). I am seeing about a 3.5-4x reduction in file size.
I think at this point I have moved the bottleneck all the way down to tar and the SSD itself, so I’m quite happy.
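Put together, the overnight job and the restore now look roughly like this (hostnames, paths, and the zstd level are placeholders; the level is what I keep tweaking):

```bash
# Nightly on the home server: download, decompress the xz archive, and
# recompress it with zstd at a higher level, off the critical path
scp backup-server:/backups/latest.tar.xz /srv/backups/latest.tar.xz
xz -dc /srv/backups/latest.tar.xz | zstd -9 -f -o /srv/backups/latest.tar.zst

# On the dev machine: the critical path is just cat, zstd -d, and tar
rm -rf /data/dev/*
ssh home-server 'cat /srv/backups/latest.tar.zst' \
  | zstd -dc \
  | tar -xf - -C /data/dev
```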
Day to day operations
Ironically, doing these restores very quickly isn’t as big a deal as it used to be. This used to be the only way we could restore our development systems to a pristine state if we were making large changes. It was a pain, and people dreaded having to do it (and would therefore work with messy data).
To alleviate that problem, I changed our local development systems from using ext4 filesystems for storing the uncompressed, un-tarred data to zfs. One of the many awesome things about zfs is filesystem snapshots. So now, once the various restore scripts finish restoring a pristine set of the data, they mark a snapshot of the filesystem at that point in time. Then, whenever we need to reset our local machines, we can just tell the filesystem to roll back to that snapshot, and that takes less than a minute. So on a day-to-day basis, when one of our developers needs to clean their system but doesn’t need to update their data, they can do so very quickly. This has been a game changer for us, but I still wanted to make the full restore go faster (for me, at any rate).
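The snapshot and rollback steps are just two zfs commands; a sketch with a placeholder dataset name:

```bash
# Right after a pristine restore finishes, mark the state
zfs snapshot tank/devdata@pristine

# Later, to throw away messy data and return to the pristine state
# (this completes in well under a minute)
zfs rollback tank/devdata@pristine
```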
Conclusions and asides
To be clear, these speed improvements aren’t “free.” I am basically throwing CPU resources (and therefore electricity) at the problem. I am using one algorithm, xz, to get the file as small as possible for the slowest link, then switching to zstd because of its fast decompression speed. I am also trying not to break compatibility with other people who use the xz compressed file and don’t want, or don’t have the infrastructure and setup, to run this as a multi-step process.
I also found as part of this that the CPU cooler on the server storing and recompressing the archives was not properly seated, so I kept overheating the CPU until I fixed that. Once fixed, I was confident it could handle the sustained load.
Thanks for coming along on this mostly pointless journey.