Originally published at Inferable Blog

When building distributed systems that need to process jobs or tasks concurrently, one of the most challenging problems is ensuring that work is distributed efficiently without duplication. PostgreSQL’s SELECT FOR UPDATE SKIP LOCKED feature provides an elegant and performant solution to this problem that almost feels too good to be true. We’re big believers in the just use postgres ethos, as our source code might tell you.

The Problem: Distributed Job Processing

Imagine you have a queue of jobs that need to be processed by multiple workers. The core challenge is: how do you ensure that each job is processed exactly once, while maximizing throughput and minimizing contention?

The naive approach might look something like this:

BEGIN;
SELECT * FROM jobs WHERE status = 'pending' LIMIT 1;
-- Process job
UPDATE jobs SET status = 'completed' WHERE id = ?;
COMMIT;

This has an obvious race condition - two workers could select the same job before either marks it as completed.

You might try to fix this with SELECT FOR UPDATE:

BEGIN;
SELECT * FROM jobs WHERE status = 'pending' FOR UPDATE LIMIT 1;
-- Process job
UPDATE jobs SET status = 'completed' WHERE id = ?;
COMMIT;

With FOR UPDATE, the behavior changes but creates a bottleneck where workers wait for locks to be released.

Enter SKIP LOCKED

This is where SKIP LOCKED comes in. By adding this clause, we tell PostgreSQL to skip any rows that are already locked by other transactions, rather than waiting for the locks to be released:

BEGIN;
SELECT * FROM jobs WHERE status = 'pending' FOR UPDATE SKIP LOCKED LIMIT 1;
-- Process job
UPDATE jobs SET status = 'completed' WHERE id = ?;
COMMIT;

Now, transactions can claim jobs without ever blocking each other. This works for us because our goal is to get through the queue as quickly as possible.

Real World Usage

Postgres-based Job Queues

Solid Queue by 37signals showcases the power of this pattern. Solid Queue is a PostgreSQL-backed job queue for Ruby on Rails that relies heavily on SKIP LOCKED to handle job distribution.

In fact many other postgres backed job queues use this pattern, including PG Boss.

Inferable’s Job Processing System

At Inferable, we use SKIP LOCKED in our job processing system to distribute work across machines that long poll the control plane.

When the agent runtime determines that an action needs to be peformed (a function/tool call), it will insert a new job into the jobs table with a status of pending. Then we wait until a suitable machine (worker) polls the jobs table for a job. Since at a given point, there could be hundreds of mahines competing for thousands of jobs concurrently, we use SKIP LOCKED to ensure that we don’t block each other.

Here’s a snippet from our codebase that shows how we claim jobs:

const results = await db.execute(sql`
  UPDATE jobs SET
    status = 'running',
    remaining_attempts = remaining_attempts - 1,
    last_retrieved_at = now(),
    executing_machine_id = ${machineId}
  WHERE id IN (
    SELECT id FROM jobs
    WHERE status = 'pending'
      AND cluster_id = ${clusterId}
      AND service = ${service}
    LIMIT ${limit}
    FOR UPDATE SKIP LOCKED
  )
  RETURNING id, target_fn, target_args...`);

Technical Details

When using SELECT FOR UPDATE SKIP LOCKED, there are a few important technical considerations from the PostgreSQL documentation:

  1. The SKIP LOCKED clause only applies to row-level locks - the required ROW SHARE table-level lock is still taken normally.

  2. Using SKIP LOCKED provides an inconsistent view of the data by design. This is why it’s perfect for queue-like tables where we want to distribute work, but not suitable for general purpose work where consistency is required.

  3. SKIP LOCKED can be combined with LIMIT to grab batches of jobs. The locking will stop once enough rows have been returned to satisfy the limit. In fact, this is how we implement our long polling logic.