
Capture & correlation

How watchers, batches, and AsyncLocalStorage turn scattered events into one navigable flow — the request and everything it caused, in capture order.

The unit of value in Telescope is the batch — one entry-point (a request, a queue job, a scheduled tick) and everything it caused. Correlation is the differentiator: it's the view metrics and logs can't give you, where a single request expands into the exact queries, jobs, exceptions, and mails it produced, in the order they happened.

Watchers

A watcher captures one kind of activity and records it as an Entry. Watchers come in two flavours:

  • Entry-point watchers open a batch: the HTTP request watcher, the queue job watcher, the schedule watcher, and manual Telescope.batch().
  • Sub-watchers record into whatever batch is already active: query, mail, cache, event, log, Redis-command, model, and outbound-HTTP watchers.

Request and exception capture are wired automatically by TelescopeModule.forRoot(). Every other watcher is a value you add to the watchers array (most live in their own package). The same Watcher SPI backs the built-ins and your own watchers — community watchers are first-class, not second-class hooks.

interface Watcher {
  readonly type: string;                                // entry type it produces
  register(ctx: WatcherContext): void | Promise<void>;  // wire NestJS hooks
  shouldRecord?(candidate: unknown): boolean;           // cheap pre-filter
}

Watchers never touch storage and never block. They call ctx.record(...), which returns immediately.
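As an illustration of the SPI, a custom sub-watcher might look like the sketch below. The Watcher interface is copied from above; the WatcherContext shape (a single record method taking type, content, and durationMs) and the FlagClient being watched are hypothetical stand-ins, since this page does not show either.

```typescript
// Assumed shape: this page only says watchers call ctx.record(...).
interface WatcherContext {
  record(input: { type: string; content: unknown; durationMs?: number }): void;
}

interface Watcher {
  readonly type: string;
  register(ctx: WatcherContext): void | Promise<void>;
  shouldRecord?(candidate: unknown): boolean;
}

// Hypothetical activity source: a feature-flag client with a subscribe hook.
type FlagEvaluation = { flag: string; value: boolean; tookMs: number };

class FlagClient {
  private listeners: Array<(e: FlagEvaluation) => void> = [];

  onEvaluate(listener: (e: FlagEvaluation) => void): void {
    this.listeners.push(listener);
  }

  evaluate(flag: string, value: boolean): boolean {
    for (const listener of this.listeners) listener({ flag, value, tookMs: 1 });
    return value;
  }
}

class FeatureFlagWatcher implements Watcher {
  readonly type = "feature-flag";
  private readonly client: FlagClient;

  constructor(client: FlagClient) {
    this.client = client;
  }

  // Cheap pre-filter: skip internal flags before building any payload.
  shouldRecord(candidate: unknown): boolean {
    return !(candidate as FlagEvaluation).flag.startsWith("internal.");
  }

  // Wire the hook once; each evaluation becomes one record() call.
  register(ctx: WatcherContext): void {
    this.client.onEvaluate((e) => {
      if (!this.shouldRecord(e)) return;
      ctx.record({
        type: this.type,
        content: { flag: e.flag, value: e.value },
        durationMs: e.tookMs,
      });
    });
  }
}
```

The watcher holds no reference to storage: it only forwards a small payload to ctx.record and returns.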

Batches and AsyncLocalStorage

A TelescopeContext lives in AsyncLocalStorage. When an entry-point watcher calls beginBatch(origin), it seeds a Batch { id, origin, startedAt } into the ALS store for the duration of that async flow. Every record() made inside that flow then inherits the batch's id and gets a monotonic sequence — so entries reassemble in capture order with no manual plumbing.

This is why adapters like MikroORM and TypeORM correlate each query to its request: their loggers run inside the query's async context, so the active batch is already there. (Prisma is the exception — its query events fire detached from the caller's context, so Prisma queries are captured but orphaned. See the Prisma package.)

Non-HTTP entry points get their own batch:

  • a queue worker opens a batch per job,
  • @nestjs/schedule opens one per cron / interval / timeout tick,
  • Telescope.batch(origin, fn) wraps an arbitrary script or CLI run.

Entries recorded outside any batch get a synthetic per-entry batch, so nothing is ever lost.
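The mechanism can be sketched with Node's AsyncLocalStorage directly. This is a minimal model, not Telescope's implementation: beginBatch taking a callback, the nextSequence counter stored on the batch, and the CapturedEntry shape are all assumptions made for the example.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

type BatchOrigin = "http" | "queue" | "schedule" | "cli" | "manual";

interface Batch {
  id: string;
  origin: BatchOrigin;
  startedAt: Date;
  nextSequence: number; // assumption: the sequence counter lives on the batch
}

interface CapturedEntry {
  batchId: string;
  sequence: number;
  type: string;
}

const batchStore = new AsyncLocalStorage<Batch>();

// Entry-point side: seed a batch for the duration of fn's async flow.
function beginBatch<T>(origin: BatchOrigin, fn: () => T): T {
  const batch: Batch = {
    id: randomUUID(),
    origin,
    startedAt: new Date(),
    nextSequence: 0,
  };
  return batchStore.run(batch, fn);
}

// Sub-watcher side: inherit the active batch, or fall back to a
// synthetic one-entry batch when nothing is active.
function record(type: string): CapturedEntry {
  const active = batchStore.getStore();
  if (active) {
    return { batchId: active.id, sequence: active.nextSequence++, type };
  }
  return { batchId: randomUUID(), sequence: 0, type };
}

// The store survives awaits and timers inside the flow.
async function handleRequest(): Promise<CapturedEntry[]> {
  const entries = [record("request")];
  await new Promise<void>((resolve) => setTimeout(resolve, 1));
  entries.push(record("query"));
  return entries;
}
```

Calling beginBatch("http", handleRequest) gives both entries the same batchId with sequences 0 and 1, while a record() call made outside any beginBatch gets its own fresh batchId.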

traceId / spanId from an active OpenTelemetry span are stamped onto entries (via the -otel provider), so a Telescope batch maps 1:1 to a trace and the correlation survives across the OTel bridge.

The Entry

Every watcher produces the same universal record. Type-specific data lives in content; everything else is uniform, so the API, dashboard, pruner, and OTel bridge treat all entry types identically:

interface Entry<TContent = unknown> {
  id: string;            // uuid v7 — time-sortable, globally unique
  batchId: string;       // correlation key; all entries in one batch share it
  type: string;          // 'request' | 'query' | 'job' | 'exception' | 'mail' | <custom>
  familyHash: string | null; // groups "the same thing" (query template, exception class+message)
  content: TContent;     // redacted, type-specific payload
  tags: string[];        // cross-cutting filters: 'status:500', 'user:42', 'slow'
  sequence: number;      // order within the batch (capture order)
  durationMs: number | null;
  origin: BatchOrigin;   // 'http' | 'queue' | 'schedule' | 'cli' | 'manual'
  instanceId: string;    // hostname / pod id — multi-instance aggregation
  createdAt: Date;
}

familyHash is what powers "show me every occurrence of this exception" and the slow-query / duplicate-query views without scanning content. Queries normalize to a SQL template before hashing; exceptions hash on class + message (the Exception capture section notes that the top stack frame is part of the key as well).
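To make the query case concrete, here is a rough sketch of template normalization followed by hashing. The normalization rules (string and numeric literals become ?, whitespace collapses) and the choice of SHA-1 are assumptions for illustration; the page does not specify either.

```typescript
import { createHash } from "node:crypto";

// Assumed normalization: replace literals with placeholders so that
// queries differing only in their values share one template.
function toSqlTemplate(sql: string): string {
  return sql
    .replace(/'(?:[^']|'')*'/g, "?") // quoted string literals
    .replace(/\b\d+(?:\.\d+)?\b/g, "?") // standalone numeric literals
    .replace(/\s+/g, " ")
    .trim();
}

// Hash the parts that define a family; the separator keeps
// ["ab", "c"] and ["a", "bc"] from colliding.
function familyHash(parts: string[]): string {
  return createHash("sha1").update(parts.join("\u0000")).digest("hex");
}

const orderById42 = familyHash(["query", toSqlTemplate("SELECT * FROM orders WHERE id = 42")]);
const orderById7 = familyHash(["query", toSqlTemplate("SELECT *  FROM orders WHERE id = 7")]);
const userByName = familyHash(["query", toSqlTemplate("SELECT * FROM users WHERE name = 'ada'")]);
```

Here orderById42 and orderById7 are equal, because both queries reduce to the same template, while userByName differs. Grouping by the stored hash is then an equality lookup rather than a scan of content.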

Exception capture

Exceptions thrown out of a route handler are captured automatically (no watcher to register) and recorded as exception entries — which is what opens an error family, drives the new-exception alert, and feeds AI diagnosis.

By default, expected 4xx control flow is not recorded as an exception. A NestJS HttpException whose status is a 4xx — a 403 ForbiddenException, a 404 NotFoundException, a 400 from the validation pipe — is the framework doing its job (permission denied, resource missing, bad input), not an incident. Recording each one would open a new exception family (the family hash keys on class + message + top frame, so every call site is distinct), fire the new-exception alert, and — in AI auto-mode — spend model tokens diagnosing intended behaviour. In production every permission denial would page on-call and burn a diagnosis. (This default changed after exactly that incident: Telescope's own client-errors authorize gate threw a 403, which was captured as a brand-new family and paged Slack.)

The 4xx is not lost — the request watcher still records the 4xx statusCode (and a status:NNN tag, e.g. status:404) on its own request entry. You still see the 4xx in the dashboard and in error-rate metrics; it just doesn't spawn an exception family, can't fire new-exception, and can't trigger diagnosis.

Always recorded: 5xx HttpExceptions (real server errors) and any non-HttpException error (a plain Error, TypeError, etc.). Untouched: browser-reported client_exception entries — those are deliberate reports recorded directly by the ingestion endpoint, never through this filter.

To opt 4xx back in (restore the pre-change behaviour), set exceptions.captureHttp4xx:

TelescopeModule.forRoot({
  exceptions: { captureHttp4xx: true }, // default false — 4xx is control flow, not an incident
});
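The decision described in this section reduces to a small predicate. The sketch below is an assumption about its shape, not Telescope's source: HttpExceptionLike stands in for NestJS's HttpException so the example runs without the framework, and shouldRecordException is a hypothetical name.

```typescript
// Stand-in for NestJS's HttpException (which exposes getStatus()).
class HttpExceptionLike extends Error {
  private readonly status: number;

  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }

  getStatus(): number {
    return this.status;
  }
}

interface ExceptionOptions {
  captureHttp4xx?: boolean; // default false, as documented above
}

function shouldRecordException(error: unknown, options: ExceptionOptions = {}): boolean {
  if (error instanceof HttpExceptionLike) {
    const status = error.getStatus();
    if (status >= 400 && status < 500) {
      // Expected control flow: recorded only when explicitly opted in.
      return options.captureHttp4xx === true;
    }
    // 5xx is always recorded. Other statuses are not covered by the
    // text above; recording them here is an assumption.
    return true;
  }
  // Plain Error, TypeError, and so on are always recorded.
  return true;
}
```

With the default options, a 404 returns false while a 500 or a TypeError returns true; passing { captureHttp4xx: true } flips the 404 case.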

The Recorder pipeline

ctx.record(input) hands the entry to the Recorder — a bounded, async, backpressure-safe pipe between watchers and storage. The application thread never waits on it:

record(input)
  → enrich   (attach batchId from ALS, instanceId, sequence, createdAt)
  → tag      (run registered Taggers; built-in + user-provided)
  → redact   (deep-redact configured paths + default sensitive keys)
  → sample   (per-type sampling + filter() hook; drop early)
  → buffer   (push to a bounded ring buffer)
  → flush    (drain in batches on a timer / size threshold → StorageProvider.store)

The guarantees that make this safe to run in production:

  • Non-blocking. record() is synchronous and never performs I/O: the buffer push is O(1), and all storage I/O is deferred to the flush (timer or size threshold).
  • Bounded memory. The ring buffer has a hard cap; on overflow it drops the oldest entry and increments a dropped counter — it never grows unboundedly and never blocks the app.
  • Batched writes. Flushes coalesce many entries into one store() call.
  • Graceful shutdown. onApplicationShutdown drains the buffer with a timeout.
  • Failure isolation. A storage error is logged and the pending flush batch is dropped, so a broken Telescope never breaks the host app.
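The bounded-memory and failure-isolation guarantees can be modelled with a fixed-size ring buffer. This is a simplified sketch under assumptions: the class name BoundedRecorder, the public dropped counter, and the flush return value are invented for the example, and the timer and shutdown wiring are left out.

```typescript
interface StorageProvider<T> {
  store(entries: T[]): Promise<void>;
}

class BoundedRecorder<T> {
  private readonly slots: Array<T | undefined>;
  private readonly capacity: number;
  private readonly storage: StorageProvider<T>;
  private head = 0; // index of the oldest buffered entry
  private size = 0;
  dropped = 0;

  constructor(capacity: number, storage: StorageProvider<T>) {
    this.capacity = capacity;
    this.storage = storage;
    this.slots = new Array<T | undefined>(capacity);
  }

  // O(1), no I/O: when full, overwrite the oldest slot and count the drop.
  record(entry: T): void {
    if (this.size === this.capacity) {
      this.slots[this.head] = entry;
      this.head = (this.head + 1) % this.capacity;
      this.dropped++;
      return;
    }
    this.slots[(this.head + this.size) % this.capacity] = entry;
    this.size++;
  }

  // Take everything out in capture order and reset the ring.
  private drain(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.size; i++) {
      const index = (this.head + i) % this.capacity;
      out.push(this.slots[index] as T);
      this.slots[index] = undefined;
    }
    this.head = 0;
    this.size = 0;
    return out;
  }

  // One store() call per flush; a storage failure is logged and the
  // drained entries are discarded instead of propagating to the app.
  async flush(): Promise<number> {
    const pending = this.drain();
    if (pending.length === 0) return 0;
    try {
      await this.storage.store(pending);
      return pending.length;
    } catch (error) {
      console.error("flush failed; dropping", pending.length, "entries", error);
      return 0;
    }
  }
}
```

With a capacity of 3, recording the values 1 through 5 leaves 3, 4, 5 in the buffer and sets dropped to 2. A real recorder would call flush from a timer and a size threshold, and once more with a timeout during shutdown.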

Redaction is load-bearing

The synchronous redact() step is not just for privacy: it snapshots each entry's content into a plain, reference-free object at record() time. That releases live object graphs (e.g. a hydrated ORM entity captured off req.user, which references its EntityManager and identity map). Keeping redaction synchronous doubles as a detach that bounds memory — deferring it retains those graphs until flush and can OOM the host. See Performance.
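A sketch of what such a snapshotting redactor could look like follows. The sensitive-key list, the marker strings, and the depth cap are assumptions for the example, not Telescope's defaults.

```typescript
// Assumed defaults, for illustration only.
const SENSITIVE_KEYS = new Set(["password", "authorization", "token", "secret"]);
const MAX_DEPTH = 8;

// Returns a plain copy that shares no object references with the input,
// so the original graph can be garbage-collected once the caller lets go.
function redactSnapshot(
  value: unknown,
  ancestors: WeakSet<object> = new WeakSet<object>(),
  depth = 0,
): unknown {
  if (typeof value === "function") return "[Function]";
  if (value === null || typeof value !== "object") return value;
  if (value instanceof Date) return value.toISOString();
  if (ancestors.has(value)) return "[Circular]";
  if (depth >= MAX_DEPTH) return "[Truncated]";

  ancestors.add(value);
  try {
    if (Array.isArray(value)) {
      return value.map((item) => redactSnapshot(item, ancestors, depth + 1));
    }
    const copy: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      copy[key] = SENSITIVE_KEYS.has(key.toLowerCase())
        ? "[REDACTED]"
        : redactSnapshot(inner, ancestors, depth + 1);
    }
    return copy;
  } finally {
    // Track only the current path, so shared (non-cyclic) references
    // are copied rather than reported as cycles.
    ancestors.delete(value);
  }
}
```

Given a user object whose entity manager points back at it through an identity map, the snapshot keeps the scalar fields, masks password, and replaces the back-reference with a marker. The result is safe to hold in the buffer until flush because nothing in it points at the live entity.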

The request flow, end to end:

GET /orders/42 → 5 queries (1 flagged slow, 2 duplicates) → 1 job dispatched (SendReceipt) → 1 outbound HTTP call → 1 exception — all sharing one batchId, reassembled in sequence order when you open the request in the dashboard.
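Reassembly on the read side is then a filter and a sort. The sketch below assumes entries have already been loaded into memory and uses a trimmed-down FlowEntry type; a real storage provider would more likely push both steps into its query.

```typescript
interface FlowEntry {
  batchId: string;
  sequence: number;
  type: string;
}

// Select one batch and restore capture order. filter() returns a new
// array, so sorting does not mutate the caller's list.
function reassemble(entries: FlowEntry[], batchId: string): FlowEntry[] {
  return entries
    .filter((entry) => entry.batchId === batchId)
    .sort((left, right) => left.sequence - right.sequence);
}

// Count entries per type, e.g. to render "5 queries, 1 job".
function summarize(flow: FlowEntry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of flow) {
    counts[entry.type] = (counts[entry.type] ?? 0) + 1;
  }
  return counts;
}
```

Because sequence is assigned at record() time, the sort recovers the order in which things happened even when entries reach storage in different flushes.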
