Bus Factor 1

A friend at a prominent tech company took last month off using their new covid policy. When he got back to work last week, he was horrified. Not to see that things had fallen apart in his absence, but to realize that they hadn’t. Overwhelmingly, life without him had continued as normal.

This is not a coincidence. A company dependent on a single person’s work has a single point of failure. In tech jargon, it has a low Bus Factor: the number of people who would have to get hit by a bus for the project to fail. The higher your Bus Factor, the greater your institutional resilience.

But it’s hard to reconcile this experience with faith in an impactful career. If a firing squad executes a prisoner, we can say that each shooter is sufficient but not necessary. The success of the project depends on the shooters in aggregate, but not on any person in particular.

It’s easy to understand why an organization might want this kind of redundancy. Sure, it’s wasteful and inefficient, but it prevents the greatest threat to any company: giving employees leverage.

The Misery of Redundancy

If you’re reading this, you’re probably well versed in sublinear returns to headcount. You know about the Mythical Man-Month, and understand that communication overhead scales quadratically with team size. You shudder at the thought of growing complexity.
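For anyone who wants the quadratic claim spelled out, here is a minimal sketch. It assumes the usual reading of the Mythical Man-Month point, that every pair of teammates is a potential communication channel; the function name is made up for illustration.

```python
def communication_channels(people: int) -> int:
    """Count distinct pairs among `people` teammates: n * (n - 1) / 2."""
    if people < 0:
        raise ValueError("people must be non-negative")
    return people * (people - 1) // 2


if __name__ == "__main__":
    # Channels grow much faster than headcount.
    for n in (1, 2, 5, 12):
        print(n, communication_channels(n))
    # 1 0
    # 2 1
    # 5 10
    # 12 66
```

Going from 2 people to 12 multiplies headcount by 6 and the number of pairwise channels by 66.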

But the untold story isn’t about efficiency, it’s about negative returns to engagement. One person on a project is personally responsible. Two people might manage to coordinate. Beyond that, you’re in the endless hellscape of team projects.

Remember the last time you were assigned partners in school? A team project is a game of chicken. Each participant signals their willingness to fail through procrastination, until finally at the last minute someone defects and does all the work. The more you care, the more you lose.

(At least those projects ended. As a friend once put it, the weirdest thing about working at large companies is that you’ll wake up one morning, have a new teammate randomly assigned to you, and proceed to see them 8 hours a day every day until one of you quits or dies.)

This is the world of high Bus Factor. The greater a project’s supposed “resilience”, the more each person becomes dispensable. Sufficient, but not necessary.

(Even if you’re working in a mission-driven company, your job is meaningless unless you’re personally responsible for the success of that mission. The US Government might be the single most important organization in the world. How do you think it feels to work as an entry level bureaucrat?)

Have you ever wondered why you spend so much time writing design docs and detailed PRs? Or why you’re constantly stuck in meetings explaining what you do to various groups of people? Or why your project has a separate engineer, EM, PM, Designer and Tech Lead? Maybe these are best practices, maybe we’ve all settled on the right way of doing things.

But they’re also ways of making sure that if you get hit by a bus, no one has to care.

Self-Reinforcing Churn

Every large tech company runs performance reviews twice a year. Don’t ask me why, it’s the way things have always been done.

Simultaneously, the average tenure at these companies hovers at around 2 years.

Two years is four review cycles, so at any given point, fully one quarter of your teammates are in their last cycle and totally checked out.

On the high end, count everyone who knows they’re leaving within the next year and has absolutely no reason to care. At the beginning of a new review cycle, the worst case is that you’re 6 months away from getting PIP’d, which means you’re 12 months away from getting fired.

Odds are, you’re going to lose the game of chicken. You might not have the most to lose, but unless you’re also quitting next cycle, you certainly don’t have the least.

For the employer, the upshot is that anyone can quit at any time. If you’re on a 12-person team, you might lose 3 people each half. The only viable response is to up the Bus Factor, layer on more redundancy, and limit the importance of each individual.
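The arithmetic behind these numbers is short enough to check. This is a rough sketch, assuming departures are spread evenly across review cycles (a simplification of mine, not a measured fact); the constants come from the figures above and the function names are made up for illustration.

```python
REVIEWS_PER_YEAR = 2      # performance reviews twice a year
AVERAGE_TENURE_YEARS = 2  # average tenure of around 2 years


def fraction_in_final_cycle(
    reviews_per_year: int = REVIEWS_PER_YEAR,
    tenure_years: float = AVERAGE_TENURE_YEARS,
) -> float:
    """Share of teammates currently in their last review cycle.

    Assumes departures are spread evenly over the cycles in a tenure.
    """
    cycles_per_tenure = reviews_per_year * tenure_years
    return 1 / cycles_per_tenure


def expected_departures_per_cycle(team_size: int) -> float:
    """Expected number of people a team loses in one review cycle."""
    return team_size * fraction_in_final_cycle()


if __name__ == "__main__":
    print(fraction_in_final_cycle())          # 0.25
    print(expected_departures_per_cycle(12))  # 3.0
```

With a longer tenure or fewer reviews per year, the checked-out fraction shrinks. The one-quarter figure depends entirely on those two inputs.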

Of course, the irony is that it’s precisely this sense of replaceability that drives churn in the first place! The more employees are alienated, the more willing they are to leave as soon as their equity vests.

So this ends up being a vicious cycle where higher churn forces a higher Bus Factor, which causes alienation, which increases churn, and so on until you either die or become Oracle.

Escaping the Labyrinth

The flip side is that there’s still hope.

Since the cycle is self-reinforcing, we can break it in the middle and slay the ouroboros. There’s historical path dependence, but no fundamental reason we have to live this way.

The alternative to redundancy isn’t fragility, it’s personal responsibility. It’s keeping your Bus Factor low.

In this world, you can develop specific mastery instead of a broad assortment of skills. You can be a craftsman instead of a code monkey, understand systems deeply instead of cargo culting best practices.

Imagine working at a company where every single one of your coworkers gives a shit. Imagine knowing that your contributions are important, that the world is different because of your existence.

That’s Bus Factor 1.

As far as I can tell, this is basically how SpaceX works. It’s not a coincidence that they’re ranked #1 for both stress and sense of meaning. If you hear that employees are overworked and underpaid, don’t cry abuse; ask how they’re getting compensated instead.

FAQ

Why exactly does leverage even matter?
A priori, it shouldn’t. If all employees have leverage, there’s nothing to negotiate for. You could ask for more money, but so could everyone else, and eventually the company falls apart and you all lose.

But a posteriori, we can work backwards from behavior. If companies are willing to sacrifice efficiency, that is itself evidence of exploitation.

That sounds crazy. If engineers are actually underpaid, why don’t they just go somewhere else? Why doesn’t the market raise salaries?
I’ve screened resumes from entry level engineers who claim to have saved their companies millions of dollars. The crazy part is that I believe it! If you can make even the smallest change to a ranking algorithm, or improve the performance of a costly computation, or run an experiment that marginally increases click-through rate, the effects multiply out across a huge user base.

And yet, I don’t know a single entry level engineer getting paid commensurately.

First, there’s a lack of counterfactual impact. Sure, you saved the company a million dollars, but if you didn’t take the job someone else would, and they would have done the same work.

Your impact looks huge, but in a high Bus Factor world, your actual impact is just the delta between your ability and the next best engineer’s. You might be able to negotiate another $10k, but not two orders of magnitude more.
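To put numbers on that delta, here is a toy sketch. The million dollars is the figure from above; the replacement's $990,000 is a hypothetical I picked so the gap lands on the $10k mentioned, and the function name is made up for illustration.

```python
def counterfactual_impact(your_value: float, replacement_value: float) -> float:
    """Value you add beyond what the next best hire would have produced."""
    return your_value - replacement_value


if __name__ == "__main__":
    gross_savings = 1_000_000      # what you saved the company
    replacement_savings = 990_000  # hypothetical: what the next best engineer would have saved

    delta = counterfactual_impact(gross_savings, replacement_savings)
    print(delta)                  # 10000
    print(gross_savings / delta)  # 100.0, i.e. two orders of magnitude
```

The gross number is what goes on the resume. The delta is what you can actually bargain with.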

Second, you don’t own the means of production. Your skills are only worth millions of dollars in the particular context of this company. You can’t walk away and produce that value on your own. You depend on the company much more than it depends on you.

Hmm, maybe, but if firms have so much market power, why don’t they just keep Bus Factor low but set a strong precedent against adversarial negotiation?
Great question. Since companies are good at internal coordination, and employees are bad at collective bargaining, firms ought to be able to keep salaries low even without increasing the Bus Factor.

You could just wait for someone to speak up, immediately fire them, take whatever financial hit comes from their project’s failure, but set the precedent for any future dissidents that negotiation is unacceptable.

The problem is, firms don’t handle negotiations, managers do, and their incentives are just as screwed up as yours. A manager stands to lose their career if too many projects fail, but it costs them nothing to advocate for your promotion. So in a low Bus Factor world, the tendency is always to default to generosity.

Why doesn’t this happen at SpaceX? Remember, none of this is actually about salaries or promotions, it’s about sociopathically pursuing leverage as an alternative to fulfilment when you have no other choices. This is all conditional on not being treated like a human, and not having a meaningful job.

In a Bus Factor 1 world, you have too much personal responsibility to coast on momentum or free ride on a teammate’s accomplishments. More importantly, you might actually personally care.

Wouldn’t running a company as Bus Factor 1 invite risk?
Absolutely. SpaceX has blown up rockets. Tesla has missed deadlines. Neuralink is doing god knows what.

But management science, financial engineering and organizational design are filled with countless other methods for deferring risk.

Okay fine, I’m sold, where do I sign up?
Pretty much nowhere!

I’ve asked every friend, accepted every cold call from recruiters, and as far as I can tell, almost no companies or even individual teams are run with Bus Factor 1. Again, maybe SpaceX, but I’m relaying that secondhand.

If you find out, let me know. Or better yet, start it yourself and tell me!