Durable
Durable workflows for NestJS — write a workflow as plain code; every step is checkpointed, so it survives crashes and deploys. Steps can run across apps and languages, with a built-in control plane.
@dudousxd/nestjs-durable brings durable execution to NestJS. You write a workflow as ordinary async code — call a step, use its result, call the next — and the engine records every step's output. If the process crashes or you deploy mid-run, the workflow resumes from the last checkpoint instead of starting over. Steps can run locally in NestJS or on a remote worker (even a Python one), but it stays one workflow, with one source of truth, and one end-to-end timeline.
The one rule
The engine recovers a run by replaying the workflow function from the top. Completed steps return their saved result instead of executing again, so your run body must be deterministic — no Date.now(), Math.random(), or direct I/O outside a step. See Durability & replay.
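To make the rule concrete, here is a minimal toy model of checkpoint-and-replay. It is a sketch for illustration only, not the nestjs-durable implementation: `ToyCtx`, `Checkpoints`, `checkout`, and `demo` are hypothetical names, and a plain `Map` stands in for the durable store. It shows why a non-deterministic value such as `Date.now()` belongs inside a step: the second attempt reads the saved value instead of computing a new one.

```typescript
// Toy model of checkpoint-and-replay. NOT the nestjs-durable implementation.
type Checkpoints = Map<string, unknown>;

class ToyCtx {
  // Names of the steps whose body actually ran during this attempt.
  readonly executed: string[] = [];
  private readonly checkpoints: Checkpoints;

  constructor(checkpoints: Checkpoints) {
    this.checkpoints = checkpoints;
  }

  async step<T>(name: string, fn: () => Promise<T>): Promise<T> {
    if (this.checkpoints.has(name)) {
      // Replay path: return the saved result without running the body.
      return this.checkpoints.get(name) as T;
    }
    const result = await fn();
    this.checkpoints.set(name, result); // checkpoint before moving on
    this.executed.push(name);
    return result;
  }
}

// `crashBeforeShip` only simulates a process dying mid-run.
async function checkout(ctx: ToyCtx, crashBeforeShip: boolean) {
  // The clock is read inside a step, so a replay sees the same value.
  const startedAt = await ctx.step('clock', async () => Date.now());
  await ctx.step('reserveStock', async () => ({ reserved: true }));
  if (crashBeforeShip) throw new Error('simulated crash');
  await ctx.step('ship', async () => ({ shipped: true }));
  return { status: 'shipped', startedAt };
}

async function demo() {
  const store: Checkpoints = new Map(); // stands in for a durable store

  const first = new ToyCtx(store);
  await checkout(first, true).catch(() => undefined); // attempt 1 dies after reserveStock

  const second = new ToyCtx(store);
  const result = await checkout(second, false); // recovery replays from the top

  return {
    firstExecuted: first.executed,
    secondExecuted: second.executed,
    result,
    savedClock: store.get('clock'),
  };
}
```

In this sketch the first attempt executes `clock` and `reserveStock`, and the second attempt executes only `ship`. Had `Date.now()` been called directly in the workflow body, the two attempts would have disagreed about `startedAt`.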
The problem it solves
Today multi-service flows are scattered: a queue here, a queue there, a piece in Python, and no single place to read or watch the whole thing. When a process dies halfway through, you are left reconstructing which steps already ran. nestjs-durable collapses that into three guarantees:
- The flow becomes code, in one place. Read the workflow function and you understand the whole sequence — even when steps execute in different apps or languages.
- Durability by replay. A crash or deploy never re-runs completed work. Each step is checkpointed; on recovery, finished steps replay their saved result and only unfinished work executes.
- End-to-end visibility. Because one orchestrator owns the state, it knows about every step — including the remote ones — so a full-flow trace, dashboard, and Telescope view come almost for free.
Quickstart
This is the minimal loop (install, write a workflow, register the module, start a run) with zero infrastructure: the in-process event-emitter transport and an in-memory store. Swap those for BullMQ and an ORM store when you go to production. For the full walkthrough, see Getting Started.
Install the core packages, the zero-infra transport, and its peers:
pnpm add @dudousxd/nestjs-durable @dudousxd/nestjs-durable-core @dudousxd/nestjs-durable-transport-event-emitter @nestjs/event-emitter zod

Write a workflow as plain async code. Every ctx.step is checkpointed; ctx.waitForSignal suspends the run until a signal arrives:
import { Workflow } from '@dudousxd/nestjs-durable';
import type { WorkflowCtx } from '@dudousxd/nestjs-durable-core';

@Workflow({ name: 'checkout', version: '1' })
export class CheckoutWorkflow {
  async run(ctx: WorkflowCtx, order: { id: string; total: number }) {
    await ctx.step('reserveStock', async () => ({ reserved: true }));
    const approval = await ctx.waitForSignal<{ approved: boolean }>(`approve:${order.id}`);
    if (!approval.approved) return { status: 'rejected' };
    await ctx.step('ship', async () => ({ shipped: true }));
    return { status: 'shipped' };
  }
}

Register the module with the event-emitter transport and an in-memory store, then provide the workflow:
import { DurableModule } from '@dudousxd/nestjs-durable';
import { InMemoryStateStore } from '@dudousxd/nestjs-durable-core';
import { EventEmitterTransport } from '@dudousxd/nestjs-durable-transport-event-emitter';
import { Module } from '@nestjs/common';
import { EventEmitter2, EventEmitterModule } from '@nestjs/event-emitter';
import { CheckoutWorkflow } from './checkout.workflow';

@Module({
  imports: [
    EventEmitterModule.forRoot(),
    DurableModule.forRootAsync({
      inject: [EventEmitter2],
      useFactory: (emitter: EventEmitter2) => ({
        store: new InMemoryStateStore(),
        transport: new EventEmitterTransport(emitter),
      }),
    }),
  ],
  providers: [CheckoutWorkflow],
})
export class AppModule {}

Start a run and resume it later. start enqueues the run and returns immediately, so the HTTP handler never blocks on workflow logic:
// Inside a controller or service that has WorkflowService injected:
constructor(private readonly workflows: WorkflowService) {}

async checkout(order: { id: string; total: number }) {
  const { runId } = await this.workflows.start('checkout', order); // → { status: 'pending' }
  return runId; // respond now; a worker runs the workflow body
}

// Later, from your approval webhook: the signal resumes the run, which then ships.
async approve(orderId: string) {
  await this.workflows.signal(`approve:${orderId}`, { approved: true });
}

Need the outcome inline instead? await this.workflows.waitForRun(runId) resolves once the run settles.
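The suspend-and-resume behaviour of ctx.waitForSignal in the quickstart can be pictured with a small toy model. This is a sketch under stated assumptions, not how nestjs-durable is implemented: `ToyRun`, `Suspended`, `advance`, and `demoSignal` are hypothetical names, the log is an in-memory `Map`, and suspension is modelled by throwing a sentinel so that nothing stays resident while the run waits.

```typescript
// Toy model of a durable signal wait. NOT the nestjs-durable implementation.
class Suspended {
  readonly signalName: string;
  constructor(signalName: string) {
    this.signalName = signalName;
  }
}

class ToyRun<R> {
  // Stands in for the persisted log of step results and signal payloads.
  private readonly log = new Map<string, unknown>();
  private readonly body: (ctx: ToyRun<R>) => Promise<R>;
  status: 'suspended' | 'completed' = 'suspended';
  result: R | undefined;
  stepExecutions = 0; // counts step bodies that actually ran

  constructor(body: (ctx: ToyRun<R>) => Promise<R>) {
    this.body = body;
  }

  async step<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const key = `step:${name}`;
    if (this.log.has(key)) return this.log.get(key) as T; // replayed
    const value = await fn();
    this.log.set(key, value);
    this.stepExecutions += 1;
    return value;
  }

  async waitForSignal<T>(name: string): Promise<T> {
    const key = `signal:${name}`;
    // No payload yet: unwind the whole body instead of holding it in memory.
    if (!this.log.has(key)) throw new Suspended(name);
    return this.log.get(key) as T;
  }

  // Replays the body from the top until it completes or suspends again.
  async advance(): Promise<void> {
    try {
      this.result = await this.body(this);
      this.status = 'completed';
    } catch (err) {
      if (!(err instanceof Suspended)) throw err;
      this.status = 'suspended';
    }
  }

  // Delivering a signal records the payload, then replays the run.
  async signal(name: string, payload: unknown): Promise<void> {
    this.log.set(`signal:${name}`, payload);
    await this.advance();
  }
}

async function demoSignal() {
  const run = new ToyRun<{ status: string }>(async (ctx) => {
    await ctx.step('reserveStock', async () => ({ reserved: true }));
    const approval = await ctx.waitForSignal<{ approved: boolean }>('approve:42');
    if (!approval.approved) return { status: 'rejected' };
    await ctx.step('ship', async () => ({ shipped: true }));
    return { status: 'shipped' };
  });

  await run.advance(); // runs reserveStock, then suspends at the signal
  const afterStart = run.status;
  await run.signal('approve:42', { approved: true }); // replays and finishes
  return { afterStart, finalStatus: run.status, result: run.result, stepExecutions: run.stepExecutions };
}
```

Because the wait is represented only by the absence of a log entry, a restart between `advance` and `signal` would lose nothing in this model, provided the log itself is durable.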
Steps across apps and languages
A step does not have to run in NestJS. Declare a remote step with a typed, validated contract and call it with ctx.remote — the engine dispatches it over a pluggable transport to wherever its handler lives:
import { remoteStep } from '@dudousxd/nestjs-durable-core';
import { z } from 'zod';

export const chargeCard = remoteStep({
  name: 'payments.charge-card',
  input: z.object({ orderId: z.string(), amountCents: z.number().int() }),
  output: z.object({ chargeId: z.string() }),
  retries: 3,
});

The handler is just a provider method decorated with @DurableStep, decoupled from the workflow. With the event-emitter transport it runs in the same process; swap the transport for BullMQ to move it to a separate process, or implement it in a Python worker, without changing the workflow code.
The split goes both ways: a remote worker can implement a step the NestJS workflow calls, or author the whole workflow itself and call back into NestJS. Either way, the engine stays the single owner of durable state.
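One way to picture why a shared contract lets the handler move between processes or languages is the toy model below. It is a sketch, not the nestjs-durable API: `Contract`, `InProcessTransport`, `callRemote`, `serve`, and `demoRemote` are hypothetical names, and hand-written validators stand in for the zod schemas so the example has no dependencies. The caller and the worker share only the step name and the JSON shape.

```typescript
// Toy model of a typed remote-step contract. NOT the nestjs-durable API.
type Validator<T> = (value: unknown) => T; // throws on invalid data

interface Contract<I, O> {
  name: string;
  input: Validator<I>;
  output: Validator<O>;
}

// The transport only moves JSON strings by step name; it knows no types.
type Handler = (payload: string) => Promise<string>;

class InProcessTransport {
  private readonly handlers = new Map<string, Handler>();

  register(name: string, handler: Handler): void {
    this.handlers.set(name, handler);
  }

  async dispatch(name: string, payload: string): Promise<string> {
    const handler = this.handlers.get(name);
    if (!handler) throw new Error(`no handler registered for ${name}`);
    return handler(payload);
  }
}

const chargeCard: Contract<{ orderId: string; amountCents: number }, { chargeId: string }> = {
  name: 'payments.charge-card',
  input: (v) => {
    const { orderId, amountCents } = (v ?? {}) as { orderId?: unknown; amountCents?: unknown };
    if (typeof orderId !== 'string' || typeof amountCents !== 'number' || !Number.isInteger(amountCents)) {
      throw new Error('invalid charge-card input');
    }
    return { orderId, amountCents };
  },
  output: (v) => {
    const { chargeId } = (v ?? {}) as { chargeId?: unknown };
    if (typeof chargeId !== 'string') throw new Error('invalid charge-card output');
    return { chargeId };
  },
};

// Caller side: validate the input, ship JSON, validate what comes back.
async function callRemote<I, O>(t: InProcessTransport, c: Contract<I, O>, input: I): Promise<O> {
  const raw = await t.dispatch(c.name, JSON.stringify(c.input(input)));
  return c.output(JSON.parse(raw));
}

// Worker side: validates again, because it may live in another process.
function serve<I, O>(t: InProcessTransport, c: Contract<I, O>, fn: (input: I) => Promise<O>): void {
  t.register(c.name, async (payload) => {
    const result = await fn(c.input(JSON.parse(payload)));
    return JSON.stringify(c.output(result));
  });
}

async function demoRemote() {
  const transport = new InProcessTransport();
  serve(transport, chargeCard, async (input) => ({ chargeId: `ch_${input.orderId}` }));

  const ok = await callRemote(transport, chargeCard, { orderId: 'o1', amountCents: 1999 });
  // A non-integer amount fails validation before anything is dispatched.
  const bad = await callRemote(transport, chargeCard, { orderId: 'o1', amountCents: 19.99 }).then(
    () => 'accepted',
    () => 'rejected',
  );
  return { ok, bad };
}
```

Replacing `InProcessTransport` with a queue-backed class that exposes the same two methods would leave `callRemote` and the calling code unchanged, which is the property the section describes.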
What you get
- Crash-proof by replay. Each step is checkpointed and runs exactly once; only unfinished work executes after a restart.
- Durable sleep and signals. Pause for minutes or months with ctx.sleep (no compute while waiting), or wait on a human approval or webhook with ctx.waitForSignal. Both survive restarts.
- Bring your ORM, any SQL database. State lives in Postgres, MySQL, or SQLite through a StateStore interface, with MikroORM, TypeORM, Prisma, and Drizzle adapters and auto-schema on boot.
- See the whole flow. A built-in control plane renders each run as a graph; OpenTelemetry and a Telescope watcher give you two more views of the same event log.
Where to go next
- Getting Started. Install, write a workflow, register the module, and start your first run, step by step.
- Concepts. Durability, deterministic replay, and the mental model behind the engine.
- Authoring. Workflows, steps, ctx.step / ctx.remote, sleep, and signals in depth.
- Reliability. Retries, fatal errors, recovery, and exactly-once execution.
- Transports. The event-emitter default and BullMQ for cross-process or Python steps.
- State stores. Persist runs to Postgres, MySQL, or SQLite with your ORM of choice.
- Observability. The control-plane dashboard, OpenTelemetry traces, and the Telescope watcher.
- Python. Implement steps and author workflows from a Python worker.
- Tooling. The CLI, the test harness, and developer ergonomics.