May Contain Blueberries

the sometimes journal of Jeremy Beker


At least the titles work together now.

I have spent more of my work life hiring new engineers than I have spent looking for work myself. One of the observations I have made when looking at hiring is that there is a high drop-off at each stage of the interviewing process. So you can’t think just about how many people you need to hire; you need to think about how large a pipeline you need to support the number of roles you want to fill.

Obviously every company and situation is different, but I expect a 10x drop-off at each major stage:

  • 100 people apply
  • 10 people participate in interviews
  • 1 person is hired

This could probably be broken down further if you include initial phone screens and multiple rounds of interviews, but this rubric has served me well when hiring. So, how has this worked out from the other side, when looking for work?

  • 25 applications
  • 5 (meaningful) interview processes
  • 2 late stage interviews
  • 1 job offer (accepted)

So these numbers are not the same as on the hiring side, but I find it interesting that the drop-off was roughly a factor of 5 at each stage.

Oh, yeah, I guess I buried the lede: I got a new job. The team was great to talk with, and I like the company and the work they do, so I am super excited. They were quite understanding that I needed some time off to clear my head of my last job and the immediate job search, so I don’t start for a few weeks. I will share more later.


I may have set myself up with that title, implying that I will have more thoughts later on in this process. Likely, but not guaranteed.

It is now a little more than 3 weeks since I was let go from Food52. While I very much appreciate the advice I got from friends to take a break and rest before I jumped into a job search, that is not who I am. My anxieties would not let me sit in a state of unemployment without trying to make some progress towards finding a new job. It has become clear that I, like I am sure many people, define a large part of my being based on my career. While I have not enjoyed the emotional rollercoaster of self-worth, it is a good lesson to learn. I look forward to the day when I can retire, and I need to make myself ready to define myself through the activities I love that are not my career.

While I wanted to start making progress, I did not want to open the floodgates right away, so I have a roadmap of sorts for finding roles that I could be interested in and applying for them:

  1. Share my situation with my network and see if anyone has any openings
  2. Find companies that I respect and look for openings there
  3. Reach out to recruiters that have contacted me in the past with interesting roles
  4. Mark myself as “available for work” on LinkedIn
  5. ???

At this point I am still in phases 1 & 2, which pretty much happened simultaneously. My network included friends, former coworkers, and the job boards on a few Slacks that I am a member of. I was not surprised to get compassion and positive feedback from my friends and former coworkers, but what did surprise me was the response from the various Slack communities I am a part of. While I participate in these groups, I am not very active, and some of them are huge, so I had not been expecting the reception my message got.

I was overwhelmed by the positive and supportive response from these communities. I had numerous people reach out with ideas and several with specific opportunities that they encouraged me to apply for. People were willing to schedule calls to talk about roles and give their time to help me, basically a stranger. This was truly touching and I can’t give my thanks enough.

A choice I have to make in my process is about the size of the company I want to work in. A criticism I had of my time at Food52 was that, as the senior-most engineer, I did not have a community to work with, nor did I have a clear path of growth. In the roles I am applying to, I have a choice: larger organizations could bring with them a community of Staff+ engineers to work with, and that idea is very tempting. The other side is to look at smaller companies and know that I could have a large influence on them across a broad set of topics. I am not ruling either out, but it is interesting to me how stark the difference has been among the companies I talk with.

So at this point I am in later stages with a couple of opportunities that I am pretty excited about. My hope is that I do not have to advance to stages 3 and beyond, but I should know more in the coming weeks.


The only constant in life is change, they say. The last few months have definitely been that way. But change is an opportunity for new beginnings and for me, my time at Food52 has ended and I am starting the journey to find a new work home. Over the past few days I have been pondering what I want to do and with whom I would like to work. I’d like to share some of my thoughts here as I work through them. This change is exciting, scary, humbling, and even a bit confusing as I come out of nearly 8 years at Food52 (ok, 7 years, 10 months, 28 days if I am being pedantic).

The coincidence with the new year is not lost on me, a time of reflection on what direction I want to proceed in. Through my career I have worked for small companies and large, public and private, and as an individual contributor and in engineering management. Most recently I was a Principal Software Engineer. I appreciate this kind of role as it kept me hands-on with code while also having some of the benefits of engineering management: mentorship, guiding technical direction, and visibility outside of the engineering organization. However, I have run engineering teams before, and there are aspects of that work that can also be very fulfilling within the right kind of organization.

At this moment, my focus is going to be mostly on Staff/Principal IC roles, with the occasional management role thrown in if the company seems like a good fit. I’ve started writing down my wishlists, which will certainly evolve over time. I am not really focused on the particular technology facets of the companies I am looking at; it is really the people and culture that matter to create a fulfilling environment.

Must haves:

  • remote friendly, preferably remote native
  • a mission that I believe in and is actually practiced day to day
  • solid DEI practices
  • teams that respect all of their engineers regardless of seniority
  • a respect for work/life balance
  • open and communicative leadership

Nice to haves:

  • missions that align with my personal interests: outdoors, travel, cooking
  • the ability to cross train in adjacent technology that I am not great at (yet)
  • small to mid size organizations (or well defined units within a large company)

No thanks:

  • business models based on manipulation of users’ behaviors
  • “hardcore”

If you have made it this far and would like to learn more about me, here are some useful links:

Here are some great resources I have found for aiding in a job search.

  • Angela Riggs’ Job Search Template offers a simple but highly effective template in Trello for keeping track of roles you are considering and where you stand in the application and interview process for each. It took what was already a mess of emails and notes I had been building at this early stage and made it much more manageable.

  • Interviewing is like speed dating: you only have small blocks of time to decide if the company you are talking to is a good fit for you. It reminds me of one of my favorite lines from Neal Stephenson’s Snow Crash: “Condensing fact from the vapor of nuance.” Charity Majors (who you should be reading in general) has a great article with questions to help you find those signals as you interview: How can you tell if the company you’re interviewing with is rotten on the inside?.


Hey friends, I hope this helps someone not spend a few hours tracking down this issue like I did. Due to the recently released CVE-2022-32224, I needed to update our version of Rails to the patched version, 6.1.6.1. However, when I did so, I started getting the following exception all over my code with a very unhelpful stack trace:

Psych::DisallowedClass: Tried to load unspecified class: Symbol

Sadly, searching did not turn up any useful leads. Only when I thought to go look at the commit in the Rails code itself did the solution become apparent.

I needed to add Symbol to the allowed YAML safe load classes in my environment files like this:

ActiveRecord.yaml_column_permitted_classes = [Symbol]

I hope this can help someone else.


These ideas have been bouncing around my head for a bit and resulted in a bunch of random tweets but I was finally prompted to write after getting an inquiry from a recruiter that contained this:

Selling Points: -- This is a Hybrid work situation where you can work 2 days from home.

I have worked remotely for Food52 from Virginia for nearly 7 years. When I started as a remote engineer, this was an uncommon situation. There existed companies that were fully remote, but they were the exception, not the rule. This has slowly shifted, but the pandemic has resulted in a rapid realignment at technology companies that is becoming clear in how recruiting and hiring is done.

I have been seeing this shift from two sides: as a principal-level engineer myself and as a participant in the hiring process inside my company.

At a company that has hired software engineers remotely (sometimes enthusiastically and sometimes reluctantly), I have argued that pre-pandemic this gave us a significant advantage. While the majority of technology companies restricted themselves to candidates in their area or those willing to relocate, we were able to pull talent from anywhere. In addition, it allowed us to offer salaries that were often higher than the local averages in the regions where people lived.

Over the last 6 months, the number of recruiters reaching out to me regarding new roles has skyrocketed. I counted recently, and in the last 2 months alone I have received over 60. Two years ago, I might have received 2 or 3 a month.

So what has changed? I have not magically gotten more talented. The market has shifted.

I believe that many companies, after being forced to go remote for all of their staff, have realized that the fears they had about remote work were unfounded. And the smart ones have realized that they can expand the potential market for new engineers beyond the limited borders they had before. Companies based in previously high-cost regions (Silicon Valley especially) are able to offer the same salaries but pluck the best talent from regions where those salary levels are unheard of.

This shift has vastly changed the power dynamics. The last time I was looking for new work, the employers held most of the cards. I live in a region with more limited employment opportunities, so an offer like the one above, of working 2 days at home, would truly seem like a selling point. But today, I am able to field offers from the entire country (and probably further afield if I wanted) and expect that 100% remote is something I can demand. This is great for me as a software engineer.

For companies, I think it is probably a mixed bag, depending on how much they are willing to embrace these new standards. It will also be hard on companies who counted on only needing to compensate employees based on the local salary expectations. I have seen this in our hiring at Food52 (shameless plug: come work for us). For more senior technical roles, we have found it much more difficult to find exceptional applicants. It is my belief that this is purely because we are now competing on a much larger field.

We are still in the middle of a realignment. It will be interesting to see how it falls out.

Aspirationally, I hope that this change in dynamic will be especially good for underrepresented groups in engineering, for whom the additional flexibility will offer more opportunities to access employment markets they were shut out of: people who may need to stay outside of the tech hot spots because they need to live near family for childcare, or so that their spouse can have a job that they love, or so they can live in a community that is accepting of who they are. We can only hope.


It still surprises me sometimes when I run into a clear parallel between experiences I have had in the real world and those in computing. Years ago, in an old job, I ran a software team that designed software for running large warehouse systems full of robotics and automated conveyor belts. In these systems we were responsible for routing boxes or pallets around efficiently. To do so, one had to take into account the capacity of the various conveyor belts as well as how quickly the robotics could move things to their shelves. It was a balancing act of different throughputs.

In my current job I have a similar problem that I have been slowly optimizing that is entirely electronic in nature. The situation can be simplified like this:

  • A large file is generated on a server online once a day (on the order of 100GB)
  • It needs to be restored to a local development server for me to work with
  • This restoration is done when I need new data, so I initiate it when needed
  • Getting the absolute latest backup restored is not important. When I do a restore, if the data is 24-48 hours old, that is fine

A few other items of note:

  • My home internet connection is 100 Mbit/s
  • My internal home network is 1000 Mbit/s
  • The destination machine is writing to an SSD that has a write throughput way faster than everything else

The process that has existed for doing this was created before I came to the company and has evolved over the years with a burst of activity recently (because I was bored/annoyed/curious to make it better).

1: Copy Then Restore

This is the first method that was used and is the simplest. It was a two-step process (download, then restore) that was not optimized for speed. The process was:

  • [remote] the backup was taken and compressed on the server using xz at some point prior.
  • [local] download.sh: The script would download the file from the remote host and store it on the local machine
  • [local] restore.sh: This would remove the old data, xz -d the file into tar and extract it into the destination directory
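
Roughly, and with made-up hostnames and paths standing in for the real scripts, the flow boiled down to something like this:

# download.sh (placeholder paths): fetch the compressed backup
scp backuphost:/backups/latest.tar.xz /tmp/latest.tar.xz

# restore.sh: wipe the old data, then decompress and extract
rm -rf /data/*
xz -dc /tmp/latest.tar.xz | tar -xf - -C /data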

This had the benefit that you could download the big file once and then restore from it as many times as you wanted. It also took advantage of xz’s very high compression ratios, so the file transferred over the slowest link was as small as possible. The downside was that it was still slow to get a new file to your machine before you could do the restore; it is highly dependent on your internet speed at the time you fetch a new backup file.

2: Stream restore

This was the first major optimization made to the system. It assumed that doing multiple restores from the same backup was unlikely and that by the time you wanted to restore a second time, you also wanted a new, more up-to-date backup. It also dealt with the issue that storing the compressed backup file prior to restoring took up disk space that was becoming non-trivial.

  • [remote] the backup was taken and compressed on the server using xz at some point prior.
  • [local] stream_restore.sh: This would initiate an ssh session with the remote server, and cat the file across it directly into xz -d and then tar and extract it into the destination directory
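
In sketch form (again with placeholder names, not the real script), the whole thing became a single pipeline:

# stream_restore.sh: stream the backup straight off the remote host
ssh backuphost 'cat /backups/latest.tar.xz' | xz -d | tar -xf - -C /data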

This removed the “store and restore” problem of the first solution, and since everything being done locally was faster than my internet connection, the transfer time became the bottleneck.

3: Local copy, stream restore xz compressed data (24 minutes)

Realizing that the bottleneck was now transferring the backup from our remote system to my house, I wondered if I could just get rid of that step from the critical path. I am lucky in that I have a server running 24/7 at home that I could schedule things on, so I realized that I could get a copy of the backup to my house overnight, before I needed it. This became a combination of #1 & #2.

  • [remote] the backup was taken and compressed on the server using xz at some point prior.
  • [local server] download.sh: The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
  • [local] stream_restore.sh: This would initiate an ssh session with my local server, and cat the file across it directly into xz -d and then tar and extract it into the destination directory
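
A rough sketch, with placeholder hostnames and paths standing in for the real ones:

# on the home server, run nightly from cron (download.sh)
scp backuphost:/backups/latest.tar.xz /archive/latest.tar.xz

# on the development machine, when a fresh restore is needed (stream_restore.sh)
ssh homeserver 'cat /archive/latest.tar.xz' | xz -d | tar -xf - -C /data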

This had a lot of benefits in that now I could restore as fast as I could move data across my local network. I used this solution for quite a while before trying to improve it. It is also where I started keeping track of how long it took as I made improvements.

4: Local copy, stream restore uncompressed data (16 minutes)

With this solution, I started monitoring the throughput of data through the various pipelines, and I was surprised to find that the bottleneck was not actually my local network. It was decompressing the xz-compressed file on the destination server before it could be run through tar. It turns out that while xz has very high compression ratios, it can’t sustain high data rates even when decompressing.

So, I figured why not add the decompression to the overnight task?

  • [remote] the backup was taken and compressed on the server using xz at some point prior.
  • [local server] download.sh: The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
  • [local server] decompress the xz file on the local server and just store the raw tar file
  • [local] stream_restore.sh: This would initiate an ssh session with my local server, and cat the file across it directly into tar and extract it into the destination directory
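
Sketched out with the same placeholder names:

# nightly, on the home server: fetch and pre-decompress the backup
scp backuphost:/backups/latest.tar.xz /archive/latest.tar.xz
xz -d -f /archive/latest.tar.xz        # leaves /archive/latest.tar

# on demand, on the development machine: stream the raw tar
ssh homeserver 'cat /archive/latest.tar' | tar -xf - -C /data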

This had a significant benefit and got my restore time down to 16 minutes, which was a nice bump. However, I was now restricted by my home network as I was saturating that network link.

5: Local copy, stream restore zstd compressed data (7 minutes)

Knowing that I could saturate my network link, I saw that I was not utilizing the full write speed of the SSD. The only way to get more data onto the SSD was to send compressed data over the wire. However, as I had learned earlier, xz was not fast enough. I had read a few articles about zstd, a compression algorithm that is both CPU efficient and optimized for high throughput. So I figured that if I compressed the data across the wire, it would expand on the destination system at faster-than-wire speeds.

  • [remote] the backup was taken and compressed on the server using xz at some point prior.
  • [local server] download.sh: The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
  • [local server] decompress the xz file on the local server and just store the raw tar file
  • [local] stream_restore.sh: This would initiate an ssh session with my local server, compress the file using zstd (using the fast option) as it was sent over the ssh connection, then decompress it using zstd on the destination server before piping it into tar and extracting it into the destination directory
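
Roughly, with placeholder names, the restore pipeline became:

# compress on the fly on the sending side, decompress on the receiving side
ssh homeserver 'zstd -c --fast /archive/latest.tar' | zstd -dc | tar -xf - -C /data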

This got me really close and down to 7 minutes. But still I wondered if I could do better. I couldn’t use a very high compression setting for zstd when it was inline and keep the network saturated.

6: Local copy, recompress using zstd, stream restore compressed data (5 minutes)

Given that I wanted to send zstd compressed data over the wire, there was no reason to do that compression at the time of restore. I could do it overnight. This had the benefits of being able to use a higher compression level and removing it from the time-critical path.

  • [remote] the backup was taken and compressed on the server using xz at some point prior.
  • [local server] download.sh: The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
  • [local server] decompress the xz file on the local server and then recompress it using zstd
  • [local] stream_restore.sh: This would initiate an ssh session with my local server, and cat the compressed file across, run it through zstd and then into tar and extract it into the destination directory
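
A sketch of the final setup, again with placeholder names and a placeholder compression level:

# nightly, on the home server: convert the xz archive to zstd at a higher level
xz -dc /archive/latest.tar.xz | zstd -T0 -10 -f -o /archive/latest.tar.zst

# on demand, on the development machine
ssh homeserver 'cat /archive/latest.tar.zst' | zstd -dc | tar -xf - -C /data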

This is, I think, the best that I can do. I am playing with the compression levels of the overnight zstd run, but higher levels don’t seem to be doing much better (and may be impacting decompression speeds). I am seeing about a 3.5-4x reduction in file size.

I think at this point I have moved the bottleneck all the way down to tar and the SSD itself, so I’m quite happy.

Day to day operations

Ironically, doing these restores very quickly isn’t as big a deal as it used to be. This used to be the only way that we could restore our development systems to a pristine state if we were making large changes. It was a pain, and people dreaded having to do it (and would therefore work with messy data).

To alleviate that problem, I changed our local development systems from using ext4 filesystems for the data storage of the uncompressed, un-tarred data to zfs. One of the many awesome things about zfs is filesystem snapshots. So now, once the various restore scripts finish restoring a pristine set of the data, they take a snapshot of the filesystem at that point in time. Then, whenever we need to reset our local machines, we can just tell the filesystem to roll back to that snapshot, and this takes less than a minute. So on a day-to-day basis, when one of our developers needs to clean their system but doesn’t need to update their data, they can do so very quickly. This has been a game changer for us, but I still wanted to make the full restore go faster (for me, at any rate).
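
For anyone curious, the zfs side of this is pleasantly small; a sketch with a hypothetical pool/dataset name looks like this:

# at the end of a successful restore: record the pristine state
zfs snapshot tank/devdata@pristine

# when a developer wants a clean slate (-r discards any newer snapshots)
zfs rollback -r tank/devdata@pristine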

Conclusions and asides

To be clear, these speed improvements aren’t “free.” I am basically throwing CPU resources (and therefore electricity) at the problem. I am using one algorithm, xz, to get the file as small as possible for the slowest link, then switching to zstd because of its fast decompression speed. I am also trying not to break compatibility with other people who use the xz-compressed file and don’t want, or don’t have the infrastructure and setup, to run this as a multistep process.

I also found as part of this that my CPU cooler on the server that was storing/recompressing the archives was not properly seated, so I kept overheating the CPU until I fixed that. But once fixed, I was confident it could handle any high loads.

Thanks for coming along on this mostly pointless journey.


I don’t feel that I have any great insights that are not already being shared by people way smarter than me. But as I have heard in many forums, “silence is complicity,” and I want my name to publicly be next to the simple statement that black lives matter. While the horrible and preventable deaths of George Floyd and Breonna Taylor in recent weeks have brought this to everyone’s minds, I can’t forget the deaths in recent years of Eric Garner, Trayvon Martin, or the hundreds and thousands of human beings who just happen to have a different skin color than me who have been killed by racism since our country’s founding. All of their lives mattered and were cut short by a system that valued their lives less than it would mine. Racism is and has been present in the United States since its founding.

I am a straight, cisgender, wealthy, white, male, software engineer in my mid 40s; I am a living example of privilege in our society. Have I done enough to help end racism in my community? In my country? I don’t really know what enough is, but I am sure the answer is no. I do some. I will continue to do more.

All I ask is that, if you can, do something to help fix this. Maybe it is going to a march. Maybe it is giving money to a worthwhile organization. Maybe it is just saying publicly what you believe. Whatever you do, please vote. Change needs to be driven by all of us, but our choice of leadership will help.

Nowhere close to exhaustive list of resources:


A few months back, as I was getting ready to go to XOXO, I was reminded of the early days of computing and the advent of the classic text adventure. Even though I never really played them regularly as a kid, I remember hearing about Zork and enjoying the idea of the simple interface (although the few times I did play, that simplicity was combined with the frustration of figuring out what to type). This simplicity of interface, a typed command followed by a description of an environment or the result of an action, brought to mind the back and forth of a Twitter conversation. And that thought spawned this idea: a melding of the old and new.

It seemed an easy enough proposition to hook up one of the older text adventure games to a Twitter bot that would allow one to send commands and receive the output, mapping the conversation thread between a user and the bot to a single adventure, with the bot maintaining persistence on the back end.

The original Colossal Cave Adventure has been dutifully reconstructed and the code is available as open source. With some slight modifications to work around some situations where multi-line answers were required and allow saving without penalty, I was able to get the code in a form that matched a back and forth Twitter exchange.

A simple Ruby application acted as the go-between for the old C program and Twitter. Each conversation was saved off based on the username of the recipient, and commands were sent back and forth.

And Colossal Cave Bot was born.

It was a fun little exercise that a few people played around with and seemed to have fun with. The code on the ruby side is not pretty but I might at some point clean it up and share it.

(To those who find this in the future, apologies if the bot is no longer running.)


The recent redesign of this blog ended up being a lot more than just a visual redesign. In many ways it has come full circle. In the earliest incarnation of May Contain Blueberries, I hand-built HTML pages for each entry. As that became more annoying, I switched over to Movable Type in all its Perl glory. But as I was doing my own development in PHP, I eventually transitioned to Wordpress, and while that has served me well (and the other users on my server), it was time to move on.

The biggest reason was maintenance, driven by security. For a blog that I don’t write on very often (sadly, once a year seems about the norm), I was having to apply security patches at least monthly, if not more often. The benefit of a fully dynamic blog platform was also its downfall. Wordpress is such a large piece of software that there seems to be no end of security issues being discovered. And as a solo system administrator, I need to minimize my attack surface as much as I can.

I took two paths. There were quite a few blogs that are no longer being used on my server, so I took static “snapshots” of them. While this did break inbound links, they weren’t highly trafficked, so I was OK with that (and search engines will figure it out).

For my blog, I decided to move to a static site generator. This has the benefits I wanted, such as templates and auto-generated index pages, but ends up serving plain HTML, eliminating that security issue. I don’t need onsite interactivity, so why pay the cost for it? I chose Jekyll. It allows me to write posts in Markdown and automatically build the site and push it to my server in one step.

On to the new thing! Who knows, I might write more (probably not).


There is much discussion on the internet about the wisdom of running one’s own mail server and it includes much valid criticism. There are significant security concerns beyond the normal amount of maintenance of any system. For reasons varied and irrelevant here, I have chosen to do so for over 15 years.

The aspect of doing so that is often not discussed, compared to using commercial services such as Gmail, is that one has to deal with spam entirely on one’s own. This is difficult for (at least) 2 reasons. The obvious one is that it is a hard problem to deal with; the not-so-obvious one is that I only have 2 users on my mail server, so I don’t have the ability to have my users provide input in identifying spam.

For many years I have relied on tools such as Spamassassin to try to identify spam once it has reached my mail server. I also make use of various blacklists to identify IP addresses that are known to deliver spam. I use Mailspike and Spamhaus.

This is the situation I was in up until this past weekend. Hundreds of emails a day would slip past the blacklists. Spamassassin is very good but it was still allowing around 5% of those messages to reach my inbox. And the problem seemed to be getting worse.

In the past, I had used greylisting, but I eventually stopped given the main side effect of that system: messages from new senders, legitimate or otherwise, would get delayed by at least 5 minutes. This is fine for most emails, but for things like password resets or confirmation codes, it was just too much of an inconvenience.

What I wanted was a system where messages that are unlikely to be spam make it right through and all others get greylisted.

My boss mentioned a solution he implemented once that decided whether to greylist based on the spam score of the inbound email. This allowed him to only greylist things that looked like they might be spam. Unfortunately, the emails that were slipping through my existing system generally had very low scores (spammers test against Spamassassin).

So, I pursued a different solution. Several services provide not only blacklists but also whitelists that give a reputation score for various IP addresses around the internet. I chose to use the whitelists from Mailspike and DNSWL.

I implemented a hierarchy:

  • Accept messages from hosts we have manually whitelisted
  • Reject messages from hosts on one of the watched blacklists
  • Accept messages from hosts with a high reputation score
  • Greylist everything else

When I enabled this ruleset, I thought I had broken things. I stopped getting any email coming into my system. It turns out that I had just stopped all the spam. It was amazing.

In the two days I have been running this system, every legitimate email has made it to my inbox. I have seen 10-15 messages get through the initial screens and be correctly identified as spam by Spamassassin. (In the early stages I had a few messages make it to my inbox, but I realized that was because I trusted the whitelists more than the blacklists; i.e., hosts were listed as both trustworthy and sending spam. As the blacklists seem to react faster, I decided to switch to the order shown above.)

You can look at the graphs to see when I turned the system on:

This first graph shows the number of messages that were accepted by my server (per second). You can see that the number dropped considerably when I turned on my hybrid solution. Since messages were getting rejected before they were accepted by the system, there are fewer messages for Spamassassin to investigate.

This can be seen here, where the number of messages identified as spam also went down because they were stopped before Spamassassin even needed to look at them.

If you run postfix and would like to implement a similar system, here is the relevant configuration section from my main.cf.

smtpd_recipient_restrictions = permit_mynetworks,
    permit_sasl_authenticated,
    reject_invalid_hostname,
    reject_non_fqdn_sender,
    reject_non_fqdn_recipient,
    reject_unknown_sender_domain,
    reject_unknown_recipient_domain,
    reject_unauth_pipelining,
    reject_unauth_destination,
    reject_non_fqdn_sender,
    check_client_access hash:/usr/local/etc/postfix/rbl_override,
    reject_rbl_client bl.mailspike.net,
    reject_rbl_client zen.spamhaus.org,
    permit_dnswl_client rep.mailspike.net=127.0.0.[18..20],
    permit_dnswl_client list.dnswl.org=127.0.[0..255].[2..3],
    check_policy_service unix:/var/run/postgrey.sock,
    reject_unverified_recipient,
    permit

So, overall this has been a resounding success. I hope this helps some of you out there with the same challenges.