Most complex systems (humans, computers, pirate ships) that didn’t break apart seem to have passed through a point where things got “bad before they got better”. More often than not, that point shows up in areas where there’s “shared ownership” of a thing.

If you think about it, no one really has an incentive to spend their own money to improve a “shared thing” unless there’s some kind of payoff at the end. (Altruism doesn’t scale.) But for economies of scale to work, there have to be shared things. Shared things often fall into the tragedy of the commons, where everyone has figured out that they can exhaust the resource without paying the cost of upkeep.

The collective is worse off in the long run; the constituents are better off in the short run.

In startups, this tends to be a shared database that everyone writes to and no one cleans up (setting up indexes, deleting unused indexes, refactoring the data model). This continues until one day everything comes to a grinding halt because database queries are timing out.

In bigger companies, service-to-service communication has always been a bit of a shit-show. Everyone starts out working in their own silo, and later figures out that they need to talk to the rest of the world. In the absence of someone with strong opinions on how the service topology must evolve, suddenly there are n systems each talking to n-1 others.
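To put a rough number on that: if each of n services calls every other one directly, the count of directed integrations grows as n(n-1). The sketch below compares that with a hypothetical setup where every service speaks to one shared contract instead. The hub is an assumption used only for contrast, not something this post prescribes, and both function names are made up for illustration.

```python
def point_to_point_links(n: int) -> int:
    """Directed integrations when each of n services calls the other n - 1."""
    return n * (n - 1)


def hub_links(n: int) -> int:
    """Integrations when every service talks to one shared contract.

    The hub is a hypothetical comparison point, not a recommendation.
    """
    return n


for n in (3, 10, 50):
    print(n, point_to_point_links(n), hub_links(n))
# prints:
# 3 6 3
# 10 90 10
# 50 2450 50
```

At 50 services, that is 2450 integrations that nobody in particular owns.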

These areas live in a perpetual crisis, where improvements are made only after a threshold is crossed, often via a crisis. Hence: Crisis Thresholds.

There are a few ways out of this:

Externalise and centralise costs

Hire a database administrator, a support team, etc. Inevitably, you’d need to do this when you hit some scale. But we must avoid it for as long as it can be avoided.

Sure, it keeps the database running, but it removes all incentives from the individuals writing database queries to write good queries. Since there’s someone paid to absorb the costs, it’s easy enough to throw stuff over the fence and let someone else take care of it.

Or worse, a malicious or incompetent actor can easily make sure everyone has a bad time by poisoning the village well.

Collective punishment

The most effective way to stop a sailor stealing food is to impose half rations on the crew until the thief is caught. Org-wide feature freezes in tech companies are a form of this as well.

This works, but it also leads to mutinies on the high seas. And these days, we can’t really get away with paying Software Engineers half wages until uptime improves.

And perhaps more importantly, if you have a system like this, you’re catering to the lowest common denominator.

Eliminate Shared Ownership (Single-threaded ownership)

Or we can eliminate shared ownership by removing, as much as we can, the things that require it. What remains are small things with well-defined contracts (API contracts, SLAs), each owned by someone with a seat at the table.

This seems expensive at first, since everyone pays a small upkeep for what they own. But it eliminates a whole lot of n:n conversations about roadmap alignment, “it’s not my responsibility”-isms, and an exhausted on-call team.

At a certain point of scale, it makes sense to externalise the costs, but only those costs that you can externalise without falling into the tragedy of the commons. In other words, the group that you form to absorb these costs (let’s say database administrators) should have fences high enough that it’s impossible for developers to fling bad queries over them.

For example, database administrators can take care of the general health of the database, and optimise it globally. But they also set SLOs for each consumer of the database in a way that costs the consumers something when they breach the SLO.
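One way to picture an SLO that “costs the consumer something” is a per-consumer budget of slow queries, where going over budget gets that consumer throttled. This is only a sketch under assumptions: the `ConsumerSLO` class, the 200 ms threshold, and the halved concurrency are all hypothetical choices for illustration, not a description of any real database tooling.

```python
from dataclasses import dataclass


@dataclass
class ConsumerSLO:
    """Hypothetical per-consumer agreement: a budget of slow queries per window."""
    name: str
    max_slow_queries: int      # slow queries tolerated per window (assumed unit)
    slow_threshold_ms: float   # what counts as "slow" (assumed value)
    slow_queries: int = 0

    def record(self, duration_ms: float) -> None:
        """Count a query against the budget if it exceeds the threshold."""
        if duration_ms > self.slow_threshold_ms:
            self.slow_queries += 1

    @property
    def breached(self) -> bool:
        return self.slow_queries > self.max_slow_queries

    def allowed_concurrency(self, default: int) -> int:
        """The cost of a breach: this consumer alone is cut to half concurrency."""
        return max(1, default // 2) if self.breached else default


billing = ConsumerSLO("billing", max_slow_queries=2, slow_threshold_ms=200.0)
for ms in (50, 250, 300, 900):
    billing.record(ms)
print(billing.breached, billing.allowed_concurrency(default=10))  # True 5
```

The point of the design is that the penalty lands on the consumer who wrote the bad queries, not on everyone sharing the database.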

Sadly, a problem of scale is that you need something resembling the Code of Hammurabi, and you have to stop depending on each actor to have the best interests of the collective at heart.