Inferzo
Backend Systems
Jobs & Pipelines

Every slow request is usually work that had no business blocking the user.

We move the heavy, slow, and scheduled work out of the request and onto queues and pipelines, so the user gets an instant response and the real work happens reliably in the background.

The queue

Jobs come in, workers pick them up, the work happens off to the side. The user already got their answer and moved on.

The problem

Doing the work while the user waits is the most expensive place to do it.

Plenty of work does not need the user standing there for it. Sending the welcome email. Generating the export. Resizing the upload. Embedding a thousand documents. Calling a slow third-party API. When that work happens inside the web request, the user watches a spinner for something they did not need to wait for, and if it fails, their whole action fails with it.

It gets worse under load. Every slow request holds a connection open, so a burst of traffic or one slow dependency can tie up the whole server doing work that could have happened later. A third-party API having a bad day becomes your product having a bad day. A nightly report that grew too big quietly starts timing out, and nobody notices until the numbers stop arriving.

We move that work where it belongs: onto queues and pipelines that run in the background, out of the request. The user gets an instant response, the work happens reliably with retries and visibility, and a spike or a slow dependency becomes a slightly longer queue instead of an outage. This is the unglamorous machinery that keeps the fast things fast and stops the slow things from taking everything down with them.

Here is the version you might recognize. A customer signs up at 3am. The job that provisions their account throws once on a flaky network call and gives up. No spinner spins, because nobody is sitting there watching. No alert fires, because the job was fire-and-forget and failure was never wired to anything. The dashboard says signups are up. Everything looks fine. Then at 9am a support ticket lands: "I paid and I still cannot log in." That is when you learn the work failed six hours ago, in silence, in front of no one. The whole point of background work is that no user is watching it. Which is exactly why it has to watch itself.

The part nobody warns you about

Your job will run twice, and you should plan for it.

Almost every queue you can actually buy or run gives you at-least-once delivery. That phrase sounds like a feature. It is really a warning. It means a job can be delivered to a worker more than once: the worker finished the work, then crashed before it could acknowledge it, so the queue, having heard nothing back, hands the same job to someone else. The work was done. The queue does not know that. So it runs again. Send the welcome email twice. Charge the card twice. Provision the same account twice.
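A toy sketch of that failure, assuming a tiny in-memory queue. `AtLeastOnceQueue`, `run_worker`, and `redeliver_unacked` are hypothetical names for illustration, not any real broker's API; the last one stands in for whatever timeout makes a real queue give up waiting for an acknowledgement.

```python
from collections import deque


class AtLeastOnceQueue:
    """Toy queue: a job stays owed until a worker acknowledges it."""

    def __init__(self):
        self._ready = deque()
        self._in_flight = {}

    def send(self, job_id, payload):
        self._ready.append((job_id, payload))

    def receive(self):
        if not self._ready:
            return None
        job_id, payload = self._ready.popleft()
        self._in_flight[job_id] = payload  # handed out, not yet confirmed
        return job_id, payload

    def ack(self, job_id):
        self._in_flight.pop(job_id, None)

    def redeliver_unacked(self):
        """Stand-in for a timeout expiring: unconfirmed jobs become ready again."""
        for job_id, payload in self._in_flight.items():
            self._ready.append((job_id, payload))
        self._in_flight.clear()


emails_sent = []


def run_worker(queue, crash_before_ack=False):
    message = queue.receive()
    if message is None:
        return
    job_id, payload = message
    emails_sent.append(payload)  # the side effect happens here
    if crash_before_ack:
        return  # simulated crash: work done, acknowledgement never sent
    queue.ack(job_id)


queue = AtLeastOnceQueue()
queue.send("job-1", "welcome:ada@example.com")
run_worker(queue, crash_before_ack=True)  # first worker dies after doing the work
queue.redeliver_unacked()                 # the queue heard nothing, so it hands the job out again
run_worker(queue)                         # second worker repeats the side effect
print(emails_sent)  # the same welcome email appears twice
```

Nothing in this sketch is misbehaving. The queue did what it promised, which is why the handler has to cope with the repeat.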

You will hear about exactly-once delivery as the fix. Mostly it is a myth you design around, not a setting you switch on. Across a network that can drop a message at any moment, between sending a job and confirming it was processed, there is no honest way to promise it happened once and only once. Even the queues that advertise exactly-once are doing deduplication with limits: Amazon SQS FIFO, for example, only remembers a deduplication ID for about five minutes, so the same message sent twice ten minutes apart sails right through as two. The real answer is older and simpler. Make the job idempotent: design it so running it twice leaves the same result as running it once. Check whether this charge already went through before charging. Use the order ID as a key so the second attempt is a no-op. Then at-least-once stops being a bug and starts being a safety net.
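To see why deduplication with a time limit is not the same as exactly-once, here is a minimal model of a bounded deduplication window. It is a sketch under assumptions, not the SQS API: `WindowedDeduplicator` is a hypothetical class, and the 300-second default only mirrors the "about five minutes" figure above.

```python
class WindowedDeduplicator:
    """Remembers each deduplication ID for a limited time, then forgets it."""

    def __init__(self, window_seconds=300.0):
        self.window_seconds = window_seconds
        self._seen = {}  # dedup_id -> time it was first accepted

    def accept(self, dedup_id, now):
        # Forget every ID whose window has expired.
        self._seen = {
            key: seen_at
            for key, seen_at in self._seen.items()
            if now - seen_at < self.window_seconds
        }
        if dedup_id in self._seen:
            return False  # duplicate inside the window: dropped
        self._seen[dedup_id] = now
        return True


dedup = WindowedDeduplicator()
first = dedup.accept("order-42", now=0)         # True: never seen before
soon_after = dedup.accept("order-42", now=60)   # False: still remembered
much_later = dedup.accept("order-42", now=600)  # True: the window forgot it
```

The third call is the one that matters. Once the window has passed, the only thing standing between the duplicate and a second charge is the job's own idempotency.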

Ordering bites the same way. People assume jobs come out in the order they went in. Standard queues do not promise that. Under load or after a retry, the "account created" job can land after the "send the welcome email" job that depended on it, and now you are emailing a user who, for a few seconds, does not exist yet. If order actually matters, you either reach for a queue that guarantees it, which costs you throughput, or you build the work so it does not care about order. Knowing which one you need is the whole job. Assuming an ordering you never had is how the rare, impossible-to-reproduce bug gets born.
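One way to build work that does not care about order is to have the dependent job check its own precondition and ask to be retried when it is not met. The sketch below assumes that approach; `NotReadyYet`, `drain`, and the in-memory `accounts` set are hypothetical stand-ins for a retryable error, a worker loop, and a database.

```python
from collections import deque


class NotReadyYet(Exception):
    """The prerequisite has not landed yet; put the job back and try later."""


accounts = set()
welcomed = []


def create_account(user_id):
    accounts.add(user_id)


def send_welcome_email(user_id):
    if user_id not in accounts:
        raise NotReadyYet(f"account {user_id} does not exist yet")
    welcomed.append(user_id)


def drain(jobs, max_passes=3):
    """Run jobs in arrival order, re-queueing any that are not ready yet."""
    pending = deque(jobs)
    attempts_left = len(pending) * max_passes  # bound the loop so it cannot spin forever
    while pending and attempts_left > 0:
        attempts_left -= 1
        handler, arg = pending.popleft()
        try:
            handler(arg)
        except NotReadyYet:
            pending.append((handler, arg))
    return list(pending)  # anything still here never became ready


# The email job arrives before the account job it depends on.
leftover = drain([(send_welcome_email, "u1"), (create_account, "u1")])
```

Here the out-of-order arrival costs one extra attempt instead of an email to a user who does not exist. In a real system the re-queue would go through the retry path with a delay, not straight back onto the queue.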

And then there is the failure you cannot see, because the absence of work makes no noise. A retry that keeps failing should not loop forever in silence; it should land in a dead-letter queue, which is not where jobs go to die but where failed jobs go to be seen, parked somewhere a human can read the error and decide. Worse than the job that failed loudly is the job that should have run and did not: the nightly sync that quietly stopped the day the server restarted, the cron that died back on some Tuesday nobody remembers. You cannot alert on an error that never happened. So you alert on the heartbeat instead, on the expected run that did not arrive. Monitoring the absence of work is the part people skip, and it is the part that saves you.

How we build it

Out of the request, onto the queue, done reliably.

Background work is easy to do badly: fire-and-forget jobs that vanish, retries that hammer a failing service, a queue nobody can see into. These are the habits that make async work you can actually trust.

If it can wait, it does not block

Anything that does not need the user standing there (slow calls, heavy processing, third-party requests, batch work) comes out of the request and onto a queue. The request does one cheap thing: it writes the job down and returns. The user gets an immediate response, and the work happens right after, out of band, picked up by a worker the user never sees. The request stays fast because it stopped doing work that was never urgent in the first place, and because it no longer holds a connection open while a slow API decides whether to answer.
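A minimal sketch of "write the job down and return", assuming a SQLite table as the queue. The table layout, `enqueue`, `handle_export_request`, and `work_one` are all hypothetical names for illustration, and a production worker would need to claim jobs atomically, which this single-process sketch skips.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE jobs ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " kind TEXT NOT NULL,"
    " payload TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'queued')"
)


def enqueue(kind, payload):
    """The only work the request does: record the job durably."""
    with db:  # commits on success, rolls back on error
        cursor = db.execute(
            "INSERT INTO jobs (kind, payload) VALUES (?, ?)",
            (kind, json.dumps(payload)),
        )
    return cursor.lastrowid


def handle_export_request(user_id):
    """Request handler: enqueue and answer immediately with 202 Accepted."""
    job_id = enqueue("generate_export", {"user_id": user_id})
    return {"status": 202, "job_id": job_id}


def work_one(handlers):
    """Worker side: pick up the oldest queued job and run it."""
    row = db.execute(
        "SELECT id, kind, payload FROM jobs"
        " WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    job_id, kind, payload = row
    handlers[kind](json.loads(payload))
    with db:
        db.execute("UPDATE jobs SET status = 'succeeded' WHERE id = ?", (job_id,))
    return True


response = handle_export_request(user_id=7)  # returns before any export exists
exports = []
work_one({"generate_export": exports.append})  # the slow part, out of band
```

The response carries a job ID, not a result, so the client has something to poll or be notified about later.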

Make the same job safe to run twice

Almost every real queue delivers at-least-once, which means a job can arrive again: a worker finishes the work, then dies before it can say so, and the queue, hearing nothing, hands it to the next worker. Exactly-once delivery is mostly a promise nobody can keep across a network that drops messages. So we make the work idempotent instead. Each job carries a key (the order ID, the user ID, or the upload hash), and before it acts it checks whether that key was already handled. Charging twice becomes charging once and skipping the second time. Running twice leaves the same result as running once, on purpose, by design, so at-least-once stops being a hazard and becomes a safety net.
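The key check can be as small as a unique constraint. This is a sketch under assumptions: `processed` is a hypothetical table, `charge_card` is a stand-in for a payment call, and SQLite is used only so the example runs on its own.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY)")

charges = []


def charge_card(order_id, amount_cents):
    """Hypothetical stand-in for the real payment call."""
    charges.append((order_id, amount_cents))


def handle_charge_job(order_id, amount_cents):
    key = f"charge:{order_id}"
    with db:  # one transaction: commits on success, rolls back if the charge raises
        try:
            # The primary key makes "claim this key" atomic.
            db.execute("INSERT INTO processed (idempotency_key) VALUES (?)", (key,))
        except sqlite3.IntegrityError:
            return "skipped"  # this order was already handled
        charge_card(order_id, amount_cents)
    return "charged"


first = handle_charge_job("order-42", 1999)   # "charged"
second = handle_charge_job("order-42", 1999)  # "skipped": the redelivery is a no-op
```

If the charge raises, the transaction rolls back and the key is released, so a retry can try again. A crash between the charge and the commit is still a gap in this sketch; closing it usually means passing the same key to the payment provider so that it deduplicates on its side too.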

Retries that back off instead of pile on

Background jobs fail, usually because something they depend on is having a moment. We retry with exponential backoff and a little jitter, so a failing service gets breathing room instead of a thundering herd: clients all retrying at the same instant and turning a brief blip into a stampede that keeps it down. The wait grows with every attempt, and the jitter keeps the retries from lining up. A job that keeps failing after its retries are spent lands in a dead-letter queue for a human to look at, instead of retrying forever in silence and burning the downstream service while it is already on the floor. Failure is handled, not ignored, and not amplified.
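A sketch of that retry loop, assuming the "full jitter" variant of exponential backoff, where each wait is random between zero and a ceiling that doubles per attempt. The function names, the one-second base, the sixty-second cap, and the list used as a dead-letter queue are illustrative assumptions.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Full jitter: a random wait between 0 and min(cap, base * 2**attempt)."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


def run_with_retries(job, handler, dead_letters, max_attempts=5,
                     sleep=time.sleep, rng=random):
    """Try the job up to max_attempts times, then park it for a human."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return handler(job)
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts - 1:  # no point waiting after the final try
                sleep(backoff_delay(attempt, rng=rng))
    dead_letters.append(
        {"job": job, "error": repr(last_error), "attempts": max_attempts}
    )
    return None


def always_fails(job):
    raise ConnectionError("downstream is down")


dead_letters = []
waits = []  # record the waits instead of sleeping, so the example runs instantly
run_with_retries({"id": 1}, always_fails, dead_letters,
                 max_attempts=4, sleep=waits.append, rng=random.Random(0))
```

With jitter, an individual wait can be shorter than the one before it; what grows is the ceiling. That trade is deliberate, since spreading clients apart matters more than each one waiting strictly longer.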

Nothing vanishes silently

A job that fails without anyone knowing is worse than no job at all, because you go on trusting work that is not happening. Every job is tracked through its whole life: queued, running, succeeded, or failed, with the reason and the stack trace, not just a red dot. You can see what is in flight, what is stuck, what landed in the dead-letter queue, and what needs attention, instead of finding out from a customer that the email never sent. And because the scariest failure is the job that should have run and never did, we alert on the missing heartbeat too, on the expected run that did not arrive, not only on the errors that do.
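One way to keep the reason and the stack trace attached to the job is to record them as part of the state change. A minimal sketch, assuming an in-memory record; `Status`, `JobRecord`, and `run_tracked` are hypothetical names, and a real system would persist the record instead of holding it in a Python object.

```python
import traceback
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class JobRecord:
    job_id: str
    status: Status = Status.QUEUED
    error: Optional[str] = None
    stack_trace: Optional[str] = None
    history: list = field(default_factory=lambda: [Status.QUEUED])

    def move_to(self, status):
        self.status = status
        self.history.append(status)


def run_tracked(record, handler):
    """Run the handler and record every state change, including why it failed."""
    record.move_to(Status.RUNNING)
    try:
        handler()
    except Exception as exc:
        record.error = f"{type(exc).__name__}: {exc}"
        record.stack_trace = traceback.format_exc()
        record.move_to(Status.FAILED)
    else:
        record.move_to(Status.SUCCEEDED)
    return record


def send_email():
    raise ValueError("smtp refused the recipient")


failed = run_tracked(JobRecord("email-1"), send_email)
print(failed.status.value, "-", failed.error)
```

The failed record carries enough to act on without re-running the job: which state it reached, the error message, and where in the code it came from.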

Pipelines for work that moves in stages

Some work is a chain: extract, transform, load; or fetch, embed, index. We build those as pipelines where each stage is independent, checkpointed, and restartable, so a failure in step four does not mean redoing steps one through three. Each stage validates the shape of what the stage before it produced, because the pipeline that ran fine for a year will break the Tuesday an upstream feed quietly adds a column or renames a field, and you want it to stop loudly at the seam instead of writing garbage downstream. Data flows through reliably, and a broken run resumes from where it stopped instead of starting over from scratch.
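A small sketch of a checkpointed pipeline that validates at each seam. Everything here is an assumption for illustration: the `checkpoints` dict stands in for durable storage, `require_fields` is a deliberately simple shape check, and the fetch, embed, and index stages are toys.

```python
def require_fields(records, fields):
    """Stop loudly if the previous stage's output is missing an expected field."""
    for position, record in enumerate(records):
        missing = fields - set(record)
        if missing:
            raise ValueError(f"record {position} is missing {sorted(missing)}")


def run_pipeline(stages, source, checkpoints):
    """Run stages in order, skipping any that already have a saved checkpoint."""
    data = source
    for name, required_input, stage in stages:
        if name in checkpoints:
            data = checkpoints[name]  # resume: reuse the saved output
            continue
        require_fields(data, required_input)
        data = stage(data)
        checkpoints[name] = data  # save before moving on
    return data


runs = {"fetch": 0, "embed": 0}
index_is_down = True


def fetch(rows):
    runs["fetch"] += 1
    return [{"id": r["id"], "text": r["text"].strip()} for r in rows]


def embed(rows):
    runs["embed"] += 1
    return [{"id": r["id"], "vector": [float(len(r["text"]))]} for r in rows]


def index(rows):
    if index_is_down:
        raise ConnectionError("index unavailable")
    return {r["id"]: r["vector"] for r in rows}


stages = [
    ("fetch", {"id", "text"}, fetch),
    ("embed", {"id", "text"}, embed),
    ("index", {"id", "vector"}, index),
]
checkpoints = {}
source = [{"id": 1, "text": " hello "}]

try:
    run_pipeline(stages, source, checkpoints)  # fails at the last stage
except ConnectionError:
    pass

index_is_down = False
result = run_pipeline(stages, source, checkpoints)  # resumes at "index" only
```

On the second run, fetch and embed are not called again: their outputs come from the checkpoints. A renamed upstream field would fail in `require_fields` before any stage wrote anything downstream.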

Scheduled work that actually runs

The nightly report, the weekly cleanup, the hourly sync. We set up scheduled jobs that run on time, alert when they do not, and do not silently stop the day the server restarts or the timezone shifts under them. The trick is that you cannot alert on an error a dead job never throws, so we watch for the run that was supposed to happen and did not, a dead-man's switch that goes off on silence. A cron job nobody is watching is a cron job that quietly died back in March; we build the kind you hear about the same day it fails, not the day a customer does.
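The dead-man's switch can be sketched as a monitor that knows when each job is expected and reports the ones that have gone quiet. `HeartbeatMonitor` and its methods are hypothetical names; the timestamps are timezone-aware UTC values so a local clock change does not move the deadline.

```python
from datetime import datetime, timedelta, timezone


class HeartbeatMonitor:
    """Alerts on silence: a job is overdue when its expected ping has not arrived."""

    def __init__(self):
        self._expected = {}   # name -> (interval, grace period)
        self._last_seen = {}  # name -> time of the last ping

    def expect(self, name, every, grace, now):
        self._expected[name] = (every, grace)
        self._last_seen[name] = now  # start the clock at registration

    def ping(self, name, now):
        """Called by the job itself each time it finishes a run."""
        self._last_seen[name] = now

    def overdue(self, now):
        return sorted(
            name
            for name, (every, grace) in self._expected.items()
            if now - self._last_seen[name] > every + grace
        )


start = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
monitor = HeartbeatMonitor()
monitor.expect("nightly-report", every=timedelta(hours=24),
               grace=timedelta(minutes=30), now=start)

on_time = monitor.overdue(now=start + timedelta(hours=24))  # []: still inside the grace period
missed = monitor.overdue(now=start + timedelta(hours=25))   # ["nightly-report"]: no ping arrived
```

The job never has to fail for this to fire. The alert comes from the checker noticing that nothing happened, which is the only signal a dead scheduler can give.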

"The fastest request is the one that stopped doing work it never needed to do. We move that work to where the user is not waiting and nobody is guessing whether it ran."

Inferzo · Bending binaries to behave

What you get

Work that happens out of the way, reliably.

The queues, workers, and pipelines that take the slow work off your critical path, built so you can always see what ran and what did not.

  • A job queue and workers that move slow work out of the request and run it in the background
  • Retries with backoff and a dead-letter queue, so failures are handled instead of amplified or lost
  • Visibility into every job: queued, running, succeeded, or failed, with the reason it failed
  • Multi-stage pipelines where each step is independent and a failure resumes instead of restarting
  • Scheduled jobs that run on time and alert when they do not
  • Backpressure and concurrency limits, so a burst becomes a longer queue, not a meltdown
  • The full repository and documentation, so your team can add new jobs safely

Have a request that is slow because it is doing work the user should not have to wait for? Tell us what it does, and we will tell you what belongs on a queue.

Invoke us

Is this the right call

When this fits.

Good fit

  • Requests in your app are slow because they do work the user does not need to wait for
  • You send emails, generate exports, process uploads, or call slow APIs inside the request today
  • You run AI work like batch embeddings or long generations that should not block anyone
  • You have scheduled jobs and you are not fully sure they are all still running

Wrong call

  • Your app is simple, fast, and does nothing slow enough to move off the request. Do not add a queue you do not need.
  • You need a single one-off batch run, not standing background infrastructure. That is a script, not this.
  • You have not built the thing yet and have no slow work to move. Come back when there is something to offload.

Deployment and scale

Scales with the work, visible the whole way.

Workers scale with the queue. When the backlog grows, more workers pick up the slack; when it is quiet, they scale back down. A flood of jobs becomes a queue that drains a little slower, not a server that falls over, and you are not paying for idle workers when there is nothing to do.
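As a rough illustration of scaling with the backlog, the sketch below sizes the worker pool so the current queue would drain within a target time. The formula, the function name, and the default limits are assumptions for the example, not a description of any particular autoscaler.

```python
import math


def desired_workers(backlog, avg_job_seconds, target_drain_seconds,
                    min_workers=0, max_workers=20):
    """Workers needed to drain the backlog in the target time, within limits."""
    if backlog <= 0:
        return min_workers  # quiet queue: scale back down
    needed = math.ceil(backlog * avg_job_seconds / target_drain_seconds)
    return max(min_workers, min(max_workers, needed))


quiet = desired_workers(backlog=0, avg_job_seconds=2.0, target_drain_seconds=60)      # 0
busy = desired_workers(backlog=300, avg_job_seconds=2.0, target_drain_seconds=60)     # 10
flood = desired_workers(backlog=10000, avg_job_seconds=2.0, target_drain_seconds=60)  # 20, capped
```

The cap is what turns a flood into a longer queue. Past `max_workers`, extra jobs wait instead of adding load.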

The whole system is observable. You can see queue depth, how long jobs are waiting, which ones are failing and why, and whether the backlog is growing or shrinking. When something is wrong, the queue tells you, instead of the silence of work that simply stopped happening.

It degrades gracefully under pressure. Concurrency limits stop the workers from overwhelming a fragile downstream service, backpressure keeps a flood from consuming everything, and priorities make sure the urgent jobs do not get stuck behind a giant batch. Busy looks like a slightly longer wait, not a cascading failure.
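Those three controls can be sketched with the standard library alone. `BoundedPriorityQueue` and `ConcurrencyLimit` are hypothetical names; the depth of three and the limit of two are arbitrary values chosen to keep the example short.

```python
import itertools
import queue
import threading


class BoundedPriorityQueue:
    """Backpressure plus priorities: a fixed depth, urgent jobs first."""

    def __init__(self, max_depth):
        self._queue = queue.PriorityQueue(maxsize=max_depth)
        self._order = itertools.count()  # tie-breaker keeps equal priorities first-in, first-out

    def submit(self, job, priority):
        """Lower numbers run first. Returns False when full instead of growing."""
        try:
            self._queue.put_nowait((priority, next(self._order), job))
        except queue.Full:
            return False  # the caller must slow down or retry later
        return True

    def next_job(self):
        try:
            _, _, job = self._queue.get_nowait()
        except queue.Empty:
            return None
        return job


class ConcurrencyLimit:
    """Caps how many jobs may call a fragile downstream service at once."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def try_acquire(self):
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()


jobs = BoundedPriorityQueue(max_depth=3)
jobs.submit("batch-embed-1", priority=10)
jobs.submit("batch-embed-2", priority=10)
jobs.submit("password-reset", priority=0)
accepted = jobs.submit("batch-embed-3", priority=10)  # False: the queue is full
first_out = jobs.next_job()                           # "password-reset" jumps the batch

limit = ConcurrencyLimit(2)
slots = [limit.try_acquire() for _ in range(3)]       # [True, True, False]
```

A rejected submit and a refused slot are both signals to wait. Handing that signal back to the producer is what keeps a burst from becoming a cascade.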

What we settle before we begin: which work can move off the request, how quickly each kind of job has to finish, and what has to happen when one fails. Everything else follows from those three.

Ready to start

Tell us what is making your requests wait.

Describe the product, the slow or scheduled work it does, and where it bites: the spinners, the timeouts, the jobs you are not sure still run. We will tell you what belongs on a queue or a pipeline, and the shortest honest path to getting it off your users' critical path.