Sagas & compensation
Undo the side effects of a partially-completed run with per-step compensate callbacks that run in reverse on failure, compensationRetries for transient undos, compensate:<step> events, and compensating cancellation via engine.cancel(runId, { compensate: true }).
A durable run often performs several irreversible side effects in sequence — reserve inventory, charge a
card, allocate a shipment. If a later step fails, the earlier effects are still out there in the world, and
"the run failed" is not an acceptable end state when money has moved. The saga pattern handles this:
alongside each step that does something, you register how to undo it, and when the run fails the engine
runs those undos in reverse order. nestjs-durable builds this in via a compensate option on ctx.step.
Registering a compensation
Attach a compensate callback to a local step. The callback is registered when the step completes; if the
run later fails, the engine runs every registered compensation in reverse order (last completed step
undone first), restoring the world before failing the run.
@Workflow({ name: 'checkout', version: '1' })
export class CheckoutWorkflow {
  constructor(
    private readonly inventory: InventoryService,
    private readonly payments: PaymentsService,
    private readonly shipping: ShippingService,
  ) {}

  async run(ctx: WorkflowCtx, order: Order) {
    const reservation = await ctx.step(
      'reserve-inventory',
      () => this.inventory.reserve(order.items),
      { compensate: () => this.inventory.release(order.items) },
    );
    const charge = await ctx.step(
      'charge-card',
      () => this.payments.charge(order.customerId, order.totalCents),
      { compensate: () => this.payments.refund(order.customerId, order.totalCents) },
    );
    // If this step throws, the engine runs the two compensations above in reverse:
    // refund the card, then release the inventory, and only then fails the run.
    const label = await ctx.step('allocate-shipment', () =>
      this.shipping.allocate(order, reservation.id),
    );
    return { chargeId: charge.id, tracking: label.tracking };
  }
}
If allocate-shipment throws (and exhausts its retries, or throws a FatalError), the run fails, but
before it does, the engine refunds the card and releases the inventory. The saga is reconstructed from the
run's history on replay, so it works correctly even after a crash: the steps that completed are the ones
whose compensations get registered.
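The unwinding order described above can be modelled in a few lines. This is a minimal sketch, not the engine's implementation: SagaLog and runWithSaga are hypothetical names introduced for illustration, and the model keeps its state in memory instead of rebuilding it from persisted history.

```typescript
// Illustrative model only: SagaLog and runWithSaga are hypothetical names,
// not nestjs-durable exports.
type Compensation = () => Promise<void> | void;

class SagaLog {
  private readonly undos: { step: string; undo: Compensation }[] = [];

  // Run a step and register its compensation only after the step has succeeded.
  async step<T>(name: string, run: () => Promise<T> | T, undo?: Compensation): Promise<T> {
    const result = await run();
    if (undo) this.undos.push({ step: name, undo });
    return result;
  }

  // Unwind in reverse: the last completed step is undone first.
  // Returns the names of the undone steps in the order they were undone.
  async unwind(): Promise<string[]> {
    const undone: string[] = [];
    while (this.undos.length > 0) {
      const entry = this.undos.pop()!;
      await entry.undo();
      undone.push(entry.step);
    }
    return undone;
  }
}

// Run a workflow body; on failure, unwind the saga and then rethrow the original error.
async function runWithSaga<T>(body: (saga: SagaLog) => Promise<T>): Promise<T> {
  const saga = new SagaLog();
  try {
    return await body(saga);
  } catch (err) {
    await saga.unwind();
    throw err;
  }
}
```

A step that throws never reaches the push, so only completed steps contribute an undo, which matches the registration rule stated above.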
Compensations are for local steps. A remote step is already deduplicated by its deterministic
stepId(runId:seq), which workers can use as an idempotency key, but it does not take a compensate callback. To compensate remote work, wrap the orchestration in a local step, or model the undo as its own remote step invoked from a compensating local step.
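As a rough illustration of using the step ID as an idempotency key on the worker side, the sketch below caches results by ID so a redelivered step does not repeat its side effect. IdempotentWorker and the stepId helper are hypothetical, the runId:seq shape is taken from the text above, and a real worker would keep the results in a durable store, not an in-memory Map.

```typescript
// Hypothetical worker-side dedupe, not part of nestjs-durable.
// The `${runId}:${seq}` shape follows the stepId format described above.
function stepId(runId: string, seq: number): string {
  return `${runId}:${seq}`;
}

class IdempotentWorker<I, O> {
  private readonly results = new Map<string, O>();
  private readonly handler: (input: I) => O;
  executions = 0; // how many times the real handler actually ran

  constructor(handler: (input: I) => O) {
    this.handler = handler;
  }

  // Execute the handler at most once per step ID; repeats return the stored result.
  handle(id: string, input: I): O {
    if (this.results.has(id)) {
      return this.results.get(id) as O;
    }
    this.executions++;
    const output = this.handler(input);
    this.results.set(id, output);
    return output;
  }
}
```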
Retrying a transient undo
A compensation can itself fail transiently: the refund API might be momentarily unreachable. The engine
attempts each compensation up to compensationRetries times. This is an engine/module-level option (it
applies to every compensation), and it defaults to 1, i.e. a single attempt with no retry:
DurableModule.forRoot({
  store,
  transport,
  compensationRetries: 5, // attempt each saga undo up to 5 times before giving up on it
});
Because a compensation may run more than once, compensations should be idempotent: releasing an
already-released reservation or refunding an already-refunded charge must be a no-op. A compensation that
keeps failing past compensationRetries is skipped rather than allowed to throw: a permanently-failing
undo must not mask the original failure or strand the remaining compensations.
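The retry-then-skip behaviour can be sketched as follows. This is an illustrative model under stated assumptions: runCompensation, unwindAll, and makeRelease are hypothetical helpers, and maxAttempts stands in for compensationRetries read as a total attempt count (so 1 means a single attempt), which is how the default is described above.

```typescript
// Hypothetical helpers, not nestjs-durable APIs. `maxAttempts` plays the role
// the text gives to compensationRetries (1 = a single attempt, no retry).
type Outcome = { step: string; status: 'completed' | 'failed'; attempts: number };
type UndoFn = () => Promise<void> | void;

async function runCompensation(step: string, undo: UndoFn, maxAttempts: number): Promise<Outcome> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await undo();
      return { step, status: 'completed', attempts: attempt };
    } catch {
      // Swallow the error and try again; a real engine might log or back off here.
    }
  }
  // Attempts exhausted: report the failure instead of throwing,
  // so the remaining compensations still run.
  return { step, status: 'failed', attempts: maxAttempts };
}

// Unwind every registered undo in reverse order, never letting one failure stop the rest.
async function unwindAll(
  undos: { step: string; undo: UndoFn }[],
  maxAttempts: number,
): Promise<Outcome[]> {
  const outcomes: Outcome[] = [];
  for (const { step, undo } of [...undos].reverse()) {
    outcomes.push(await runCompensation(step, undo, maxAttempts));
  }
  return outcomes;
}

// An idempotent undo: releasing an already-released reservation is a no-op,
// because deleting a missing entry from a Set changes nothing.
function makeRelease(held: Set<string>, reservationId: string): () => void {
  return () => {
    held.delete(reservationId);
  };
}
```

Returning an outcome in place of throwing is what keeps a permanently failing undo from masking the original error or blocking the undos queued behind it.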
Compensations are visible
Every compensation surfaces as a compensate:<step> event, emitted as a step.completed (the undo ran) or
step.failed (it exhausted its retries) lifecycle event. The dashboard and the Telescope integration render
these, so a stranded undo is visible rather than silently swallowed. For the checkout above you'd see
compensate:charge-card and compensate:reserve-inventory appear in the timeline as the saga unwinds.
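A consumer of these events could pick out stranded undos along the lines below. The LifecycleEvent shape here is an assumption made for illustration (a type plus a step name); the payload the engine actually emits may carry different or additional fields.

```typescript
// Assumed event shape for illustration only; the real payload may differ.
interface LifecycleEvent {
  type: 'step.completed' | 'step.failed';
  step: string; // e.g. 'charge-card' or 'compensate:charge-card'
}

const COMPENSATE_PREFIX = 'compensate:';

// True for events that describe an undo rather than a forward step.
function isCompensation(event: LifecycleEvent): boolean {
  return event.step.startsWith(COMPENSATE_PREFIX);
}

// Names of the steps whose undo exhausted its attempts and was skipped.
function strandedUndos(events: LifecycleEvent[]): string[] {
  return events
    .filter((e) => isCompensation(e) && e.type === 'step.failed')
    .map((e) => e.step.slice(COMPENSATE_PREFIX.length));
}
```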
Compensating cancellation
The saga also runs when you deliberately cancel a run with compensation. A plain engine.cancel(runId) is
immediate: it marks the run cancelled right away and broadcasts the cancellation so a worker actually
running it can abort cooperatively — but it does not undo completed steps. Passing { compensate: true }
instead runs the saga first:
// Immediate cancel: mark cancelled, abort in-flight work, but leave completed side effects in place.
await engine.cancel(runId);
// Compensating cancel: undo the completed steps in reverse, THEN mark the run cancelled.
await engine.cancel(runId, { compensate: true });
A compensating cancel works by resuming the run with a cancellation pending: the replay re-registers the
saga from history, and at the run's suspension point the engine runs the compensations in reverse and marks
the run cancelled (rather than re-suspending). For the checkout example, passing { compensate: true } when
cancelling a run that had already reserved inventory and charged the card issues the refund and releases the
reservation before the run becomes cancelled, leaving the world clean, exactly as a failure would.
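The difference between the two cancel modes can be captured in a toy model. Everything here is hypothetical (RunRecord, resume, cancel, and the undoFor lookup are illustration-only names, not the engine's API), and the undos are synchronous for brevity where real compensations would be awaited.

```typescript
// Toy model of the flow described above; none of these names are nestjs-durable APIs.
type Status = 'suspended' | 'cancelled';
type Undo = () => void;

interface RunRecord {
  status: Status;
  completedSteps: string[]; // persisted history, in completion order
  cancelRequested: boolean; // set by a compensating cancel
}

// Replay: walk the history, re-register the compensation of every completed step,
// then decide what to do at the suspension point.
function resume(run: RunRecord, undoFor: Record<string, Undo | undefined>): string[] {
  const saga: { step: string; undo: Undo }[] = [];
  for (const step of run.completedSteps) {
    const undo = undoFor[step];
    if (undo) saga.push({ step, undo });
  }
  const undone: string[] = [];
  if (run.cancelRequested) {
    // Cancellation pending: unwind in reverse, then mark cancelled instead of re-suspending.
    for (const { step, undo } of saga.reverse()) {
      undo();
      undone.push(step);
    }
    run.status = 'cancelled';
  }
  return undone; // empty when the run simply re-suspends
}

function cancel(
  run: RunRecord,
  undoFor: Record<string, Undo | undefined>,
  options: { compensate?: boolean } = {},
): string[] {
  if (!options.compensate) {
    run.status = 'cancelled'; // immediate: completed side effects stay in place
    return [];
  }
  run.cancelRequested = true;
  return resume(run, undoFor);
}
```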