It still surprises me sometimes when I run into a clear parallel between experiences I have had in the real world and those in computing. Years ago, in an old job, I ran a team that designed software for running large warehouse systems full of robotics and automated conveyor belts. In these systems we were responsible for routing boxes or pallets around efficiently. To do so, one had to take into account the capacity of the various conveyor belts as well as how quickly the robotics could move things to their shelves. It was a balancing act of different throughputs.
In my current job I have a similar problem, entirely electronic in nature, that I have been slowly optimizing. The situation can be simplified like this:
- A large file is generated on a server online once a day (on the order of 100GB)
- It needs to be restored to a local development server for me to work with
- This restoration is done when I need new data, so I initiate it when needed
- Getting the absolute latest backup restored is not important. When I do a restore, if the data is 24-48 hours old, that is fine
A few other items of note:
- My home internet connection is 100 Mbit/s
- My internal home network is 1000 Mbit/s
- The destination machine is writing to an SSD that has a write throughput way faster than everything else
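As a rough back-of-envelope check (ignoring compression and protocol overhead), 100 GB is about 800 gigabits, so pulling it over the 100 Mbit/s internet link takes on the order of two and a quarter hours, while moving it across the 1000 Mbit/s home network takes roughly 13-14 minutes. Those two numbers explain most of the optimizations below.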
The process that has existed for doing this was created before I came to the company and has evolved over the years with a burst of activity recently (because I was bored/annoyed/curious to make it better).
1: Copy Then Restore
This is the first method that was used and is the simplest. It was a two-step process, not optimized for speed:
- [remote] the backup was taken and compressed on the server using xz at some point prior.
- [local] download.sh: The script would download the file from the remote host and store it on the local machine
- [local] restore.sh: This would remove the old data, xz -d the file into tar, and extract it into the destination directory
This had the benefit that you could download the big file once and then restore from it as many times as you wanted. It also took advantage of xz’s very high compression ratio, so the file transferred over the slowest link was as small as possible. The downside was that it was still slow to get a new file to your machine before you could do the restore, and highly dependent on your internet speed at the time of fetching a new backup.
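In shell terms, the two local scripts boiled down to something like the following sketch (hostnames and paths here are placeholders, not the real scripts):

```bash
# download.sh (sketch): pull the compressed backup from the remote host
scp backup-server:/backups/latest.tar.xz /tmp/latest.tar.xz

# restore.sh (sketch): wipe the old data, then decompress and unpack
rm -rf /data/dev/*
xz -dc /tmp/latest.tar.xz | tar -xf - -C /data/dev
```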
2: Stream restore
This was the first major optimization made to the system. It started from the assumption that doing multiple restores from the same backup was unlikely, and that by the time you wanted to restore a second time you also wanted a newer, more up-to-date backup. It also dealt with the issue that storing the compressed backup file prior to restoring took up disk space, which was becoming non-trivial.
- [remote] the backup was taken and compressed on the server using xz at some point prior.
- [local] stream_restore.sh: This would initiate an ssh session with the remote server, cat the file across it directly into xz -d and then tar, and extract it into the destination directory
This removed the “store and restore” problem of the first solution, and since everything being done locally was faster than my internet connection, the transfer time became the bottleneck.
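The streaming version collapses the download and restore into a single pipeline, roughly like this (again, hostnames and paths are placeholders):

```bash
# stream_restore.sh (sketch): stream the backup over ssh and unpack it
# without ever storing the compressed file locally
rm -rf /data/dev/*
ssh backup-server 'cat /backups/latest.tar.xz' \
  | xz -dc \
  | tar -xf - -C /data/dev
```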
3: Local copy, stream restore xz compressed data (24 minutes)
Realizing that the bottleneck was now transferring the backup from our remote system to my house, I wondered if I could get rid of that step from the critical path. I am lucky in that I have a server running 24/7 at home that I could schedule things on, so I realized I could get a copy of the backup to my house overnight, before I needed it. This became a combination of #1 & #2.
- [remote] the backup was taken and compressed on the server using xz at some point prior.
- [local server] download.sh: The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
- [local] stream_restore.sh: This would initiate an ssh session with my local server, cat the file across it directly into xz -d and then tar, and extract it into the destination directory
The big benefit was that I could now restore as fast as I could move data across my local network. I used this solution for quite a while before trying to improve it. It is also where I started keeping track of how long a restore took as I made improvements.
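Conceptually it is the same pipeline as #2, but fed from the home server, plus a nightly download on that server. A rough sketch (cron schedule, hostnames, and paths are placeholders):

```bash
# On the home server, run nightly from cron, e.g.:
#   0 3 * * * /opt/backup/download.sh
scp backup-server:/backups/latest.tar.xz /srv/backups/latest.tar.xz

# stream_restore.sh (sketch) on the dev machine now points at the home server
rm -rf /data/dev/*
ssh home-server 'cat /srv/backups/latest.tar.xz' \
  | xz -dc \
  | tar -xf - -C /data/dev
```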
4: Local copy, stream restore uncompressed data (16 minutes)
With this solution, I started monitoring the throughput of data through the various pipelines, and I was surprised to find that the bottleneck was not actually my local network. It was decompressing the xz compressed file on the destination server before it could be run through tar. It turns out that while xz has very high compression ratios, it can’t sustain high data rates even when decompressing.
So, I figured why not add the decompression to the overnight task?
- [remote] the backup was taken and compressed on the server using xz at some point prior.
- [local server] download.sh: The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
- [local server] decompress the xz file on the local server and just store the raw tar file
- [local] stream_restore.sh: This would initiate an ssh session with my local server, cat the file across it directly into tar, and extract it into the destination directory
This had a significant benefit and got my restore time down to 16 minutes, which was a nice bump. However, I was now restricted by my home network as I was saturating that network link.
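With this change the overnight job absorbs the xz decompression cost, so the restore itself is just cat and tar. Something like this (paths and hostnames are placeholders):

```bash
# Nightly on the home server: download, then store the decompressed tar
scp backup-server:/backups/latest.tar.xz /srv/backups/latest.tar.xz
xz -d -f /srv/backups/latest.tar.xz     # leaves /srv/backups/latest.tar

# On the dev machine: no decompression left in the critical path
rm -rf /data/dev/*
ssh home-server 'cat /srv/backups/latest.tar' \
  | tar -xf - -C /data/dev
```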
5: Local copy, stream restore zstd compressed data (7 minutes)
Knowing that I could saturate my network link, I saw that I was not utilizing the full write speed of the SSD. The only way to get more data onto the SSD was to send compressed data over the wire. However, as I learned in #4, xz was not fast enough. I had read a few articles about zstd, a compression algorithm that is both CPU efficient and optimized for high throughput. So I figured that if I compressed the data across the wire, it would expand on the destination system at an effective rate faster than wire speed.
- [remote] the backup was taken and compressed on the server using xz at some point prior.
- [local server] download.sh: The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
- [local server] decompress the xz file on the local server and just store the raw tar file
- [local] stream_restore.sh: This would initiate an ssh session with my local server, compress the file using zstd (with the fast option) as it was sent over the ssh connection, decompress it using zstd on the destination server, then pipe it into tar and extract it into the destination directory
This got me really close and down to 7 minutes. But still I wondered if I could do better. I couldn’t use a very high compression setting for zstd when it was inline and keep the network saturated.
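The restore pipeline for this step looked roughly like the following (a sketch with placeholder names; the exact fast level is tunable):

```bash
# On the dev machine: compress with a fast zstd level on the sending side so
# less data crosses the 1 Gbit/s link, then decompress before tar
rm -rf /data/dev/*
ssh home-server 'zstd --fast -c /srv/backups/latest.tar' \
  | zstd -dc \
  | tar -xf - -C /data/dev
```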
6: Local copy, recompress using zstd, stream restore compressed data (5 minutes)
Given that I wanted to send zstd compressed data over the wire, there was no reason to do that compression at the time of restore. I could do it overnight. This had the benefits of being able to use a higher compression level and removing the compression from the time-critical path.
- [remote] the backup was taken and compressed on the server using xz at some point prior.
- [local server] download.sh: The script would download the file from the remote host and store it on a server at my house. This was scheduled to run every night so I always have a recent copy available
- [local server] decompress the xz file on the local server and then recompress it using zstd
- [local] stream_restore.sh: This would initiate an ssh session with my local server, cat the compressed file across, run it through zstd and then into tar, and extract it into the destination directory
This is, I think, the best that I can do. I am playing with the compression levels of the overnight zstd run, but higher levels don’t seem to be doing much better (and may be impacting decompression speeds). I am seeing about a 3.5-4x reduction in file size.
I think at this point I have moved the bottleneck all the way down to tar and the SSD itself, so I’m quite happy.
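Put together, the overnight job and the restore now look roughly like this (hostnames, paths, and the zstd level are placeholders; the level is what I keep tweaking):

```bash
# Nightly on the home server: download, decompress the xz archive, and
# recompress it with zstd at a higher level, off the critical path
scp backup-server:/backups/latest.tar.xz /srv/backups/latest.tar.xz
xz -dc /srv/backups/latest.tar.xz | zstd -9 -f -o /srv/backups/latest.tar.zst

# On the dev machine: the critical path is just cat, zstd -d, and tar
rm -rf /data/dev/*
ssh home-server 'cat /srv/backups/latest.tar.zst' \
  | zstd -dc \
  | tar -xf - -C /data/dev
```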
Day to day operations
Ironically, doing these restores very quickly isn’t as big a deal as it used to be. This used to be the only way we could restore our development systems to a pristine state if we were making large changes. It was a pain, and people dreaded having to do it (and would therefore work with messy data).
To alleviate that problem, I changed our local development systems from using ext4 filesystems for storing the uncompressed, un-tarred data to zfs. One of the many awesome things about zfs is filesystem snapshots. So now, once the various restore scripts finish restoring a pristine set of the data, they mark a snapshot of the filesystem at that point in time. Then, whenever we need to reset our local machines, we can just tell the filesystem to roll back to that snapshot, and that takes less than a minute. So on a day-to-day basis, when one of our developers needs to clean their system but doesn’t need to update their data, they can do so very quickly. This has been a game changer for us, but I still wanted to make the full restore go faster (for me, at any rate).
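The snapshot and rollback steps are just two zfs commands; a sketch with a placeholder dataset name:

```bash
# Right after a pristine restore finishes, mark the state
zfs snapshot tank/devdata@pristine

# Later, to throw away messy data and return to the pristine state
# (this completes in well under a minute)
zfs rollback tank/devdata@pristine
```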
Conclusions and asides
To be clear, these speed improvements aren’t “free.” I am basically throwing CPU resources (and therefore electricity) at the problem. I am using one algorithm, xz, to get the file as small as possible for the slowest link, then switching to zstd because of its fast decompression speed. I am also trying not to break compatibility with other people who use the xz compressed file and don’t want, or don’t have the infrastructure and setup, to run this as a multi-step process.
I also found as part of this that the CPU cooler on the server storing and recompressing the archives was not properly seated, so I kept overheating the CPU until I fixed that. Once fixed, I was confident it could handle the sustained load.
Thanks for coming along on this mostly pointless journey.