Performance

Why capture doesn't slow your app — request capture off the response path, ~microsecond query capture, rollup-backed reads, and the /health endpoint that surfaces Telescope's own cost.

A core promise: you install Telescope, it works, and it does not impact the performance of what already runs. Capture is robust-or-nothing on the hot path — and Telescope makes its own cost visible, so you can see exactly what it spends.

Request capture is off the response path

The request watcher records inside response.once('finish', …) — it fires after the response is flushed to the client. Host TTFB impact is effectively zero: nothing the watcher does happens before the user gets their bytes.
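A minimal sketch of that ordering, assuming a hand-rolled FakeResponse in place of a real Node response so the example stays self-contained. FakeResponse, watchRequest, and the timeline array are hypothetical names for illustration, not Telescope's API. It only shows that work registered through once('finish', …) runs after the body has been handed off.

```typescript
// Hypothetical stand-in for the slice of a Node response the watcher needs.
interface FinishSource {
  once(event: 'finish', listener: () => void): void;
}

// Fake response that logs the order of events so it can be inspected.
class FakeResponse implements FinishSource {
  readonly timeline: string[] = [];
  private listeners: Array<() => void> = [];

  once(_event: 'finish', listener: () => void): void {
    this.listeners.push(listener);
  }

  // Simulates flushing the body first, then emitting 'finish'.
  end(body: string): void {
    this.timeline.push(`flushed:${body}`);
    const pending = this.listeners;
    this.listeners = []; // once-semantics: each listener fires a single time
    for (const listener of pending) listener();
  }
}

// The watcher does nothing up front; it only schedules the capture.
function watchRequest(
  res: FinishSource,
  path: string,
  record: (entry: { path: string }) => void,
): void {
  res.once('finish', () => record({ path }));
}

const res = new FakeResponse();
watchRequest(res, '/users', (entry) => res.timeline.push(`recorded:${entry.path}`));
res.end('ok');
// res.timeline is now ['flushed:ok', 'recorded:/users']
```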

Query capture is the only thing on the hot path — and it's ~microseconds

Query watchers call record() synchronously, in the query's async context (that's what enables correlation). record() runs enrich() — the redaction object-walk plus a trace lookup, taggers, and the sampling filter — then a ring-buffer push. No I/O; the flush is an async timer. So the per-query host cost is just enrich(), dominated by the redaction walk over the small { sql, bindings } payload.

Measured live against a production app: query capture is ~8.7µs, against millisecond-scale query I/O. Request capture is 0ms (it's a finish-callback). Memory stays flat under sustained load.

Why redaction stays synchronous

An attempt to defer redaction to the flush (so record() only stashed raw content) dropped capture cost to ~1µs but OOM'd the host — a JS heap blowup under load. Root cause: watcher content holds live object graphs. The request watcher captures req.user, a hydrated ORM entity that references its EntityManager, identity map, and relations. Synchronous redact() was load-bearing: it snapshots content into a plain, reference-free object at record() time, releasing the ORM graph. Deferring it retained up to bufferSize live entity graphs until flush. Lesson: redaction must stay synchronous; it doubles as a detach that bounds memory. If capture cost ever matters, the safe levers are capturing lighter content (project req.user → id/email) or making redact() itself faster — never deferral.
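The "capture lighter content" lever can be sketched as a plain projection. The UserEntity and EntityManagerLike shapes below are hypothetical stand-ins for a hydrated ORM entity, and projectUser is an illustrative helper, not a Telescope function. The point is that the captured value holds no reference back into the entity graph.

```typescript
// Hypothetical shapes standing in for a hydrated ORM entity and its manager.
interface EntityManagerLike {
  identityMap: Map<number, unknown>;
}
interface UserEntity {
  id: number;
  email: string;
  em: EntityManagerLike;
}

// Copy only the primitive fields worth recording.
function projectUser(user: UserEntity): { id: number; email: string } {
  return { id: user.id, email: user.email };
}

const em: EntityManagerLike = { identityMap: new Map<number, unknown>() };
const user: UserEntity = { id: 7, email: 'ada@example.com', em };
em.identityMap.set(user.id, user); // the entity and its manager reference each other

// The projection is a fresh object with no link to `em` or the identity map.
const captured = projectUser(user);
```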

Reads scale on rollups, not raw scans

The read path uses a Pulse-style rollup model. A telescope_rollups table holds per-minute count / sum / max per metric, plus fixed-size latency histograms for percentile estimation. So:

  • timeseries reads rollups → flat cost regardless of raw row count.
  • pulse / stats estimate p50 / p95 / p99 from the histogram buckets — O(buckets), with a raw-row fallback — instead of scanning every entry. The histograms are fixed-size integer arrays with no retention, so the read path stays memory-safe even at production volume.
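To make the O(buckets) estimate concrete, here is a minimal sketch of percentile estimation over fixed-size histograms. The bucket boundaries, the function names, and the choice to report a bucket's upper bound are all assumptions for illustration; the actual telescope_rollups layout may differ.

```typescript
// Hypothetical bucket upper bounds in ms; the final bucket is open-ended.
const UPPER_MS: number[] = [1, 5, 10, 50, 100, 500, Infinity];

type Histogram = number[]; // one integer counter per bucket

function emptyHistogram(): Histogram {
  return UPPER_MS.map(() => 0);
}

// Increment the first bucket whose upper bound covers the sample.
function observe(h: Histogram, ms: number): void {
  const found = UPPER_MS.findIndex((upper) => ms <= upper);
  const idx = found === -1 ? UPPER_MS.length - 1 : found;
  h[idx] = (h[idx] ?? 0) + 1;
}

// Per-minute histograms combine by element-wise addition.
function merge(a: Histogram, b: Histogram): Histogram {
  return a.map((count, i) => count + (b[i] ?? 0));
}

// Walk the buckets until the cumulative count reaches the requested rank.
// Returns the upper bound of that bucket, or null for an empty histogram.
function percentile(h: Histogram, pct: number): number | null {
  const total = h.reduce((sum, count) => sum + count, 0);
  if (total === 0) return null;
  const rank = Math.ceil((pct * total) / 100);
  let seen = 0;
  for (let i = 0; i < h.length; i++) {
    seen += h[i] ?? 0;
    if (seen >= rank) return UPPER_MS[i] ?? Infinity;
  }
  return Infinity;
}

// Two "minutes" of made-up latencies.
const minuteA = emptyHistogram();
const minuteB = emptyHistogram();
for (let n = 0; n < 90; n++) observe(minuteA, 3);
for (let n = 0; n < 8; n++) observe(minuteB, 40);
for (let n = 0; n < 2; n++) observe(minuteB, 400);

const merged = merge(minuteA, minuteB);
const p50 = percentile(merged, 50); // 5
const p95 = percentile(merged, 95); // 50
const p99 = percentile(merged, 99); // 500
```

The cost depends only on the number of buckets and the number of minutes merged, not on how many raw entries were observed.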

Any residual floor on a remote database is network round-trip time (one RTT per query), not payload size — the levers are round-trip count and avoiding raw scans, both of which the rollup model already addresses.

Bounded memory under backpressure

The Recorder's ring buffer has a hard cap. Under a flood it drops the oldest entry and increments a dropped counter — it never grows unboundedly and never blocks the app. On flush it nulls the drained slots, so a flushed (potentially fat) entry isn't retained in the ring afterward. Sampling and the filter() hook let you down-sample high-volume request floods; because requests are already off the response path, that's a store-volume lever, not a latency one.
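The drop-oldest behaviour described above can be sketched as a small fixed-capacity ring. This is an illustrative implementation, not the Recorder's actual code; the class and member names (RingBuffer, dropped, drain, retained) are assumptions.

```typescript
class RingBuffer<T> {
  dropped = 0;
  private readonly capacity: number;
  private readonly slots: Array<T | null>;
  private head = 0; // index of the oldest entry
  private size = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
    this.slots = new Array<T | null>(capacity).fill(null);
  }

  // Never grows and never blocks: when full, the oldest entry is overwritten.
  push(entry: T): void {
    if (this.size === this.capacity) {
      this.slots[this.head] = entry;
      this.head = (this.head + 1) % this.capacity;
      this.dropped += 1;
      return;
    }
    this.slots[(this.head + this.size) % this.capacity] = entry;
    this.size += 1;
  }

  // Returns entries oldest-first and nulls each slot so nothing stays pinned.
  drain(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.size; i++) {
      const idx = (this.head + i) % this.capacity;
      const entry = this.slots[idx];
      if (entry !== null && entry !== undefined) out.push(entry);
      this.slots[idx] = null;
    }
    this.head = 0;
    this.size = 0;
    return out;
  }

  // Number of slots still holding a reference (for inspection only).
  retained(): number {
    return this.slots.filter((slot) => slot !== null).length;
  }
}

const ring = new RingBuffer<number>(3);
for (const n of [1, 2, 3, 4, 5]) ring.push(n);
const droppedBeforeFlush = ring.dropped; // 2 (entries 1 and 2 were overwritten)
const flushed = ring.drain();            // [3, 4, 5]
const retainedAfterFlush = ring.retained(); // 0
```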

Overload protection — pause under event-loop pressure

Bounded memory keeps Telescope from amplifying an incident through the heap; overload protection does the same for CPU. A guard samples the process event-loop delay histogram (perf_hooks.monitorEventLoopDelay) on a 1s interval and, when the p99 lag crosses a threshold, pauses the Recorder: record() calls become no-ops until the lag recovers, at which point capture resumes. So if the host is already drowning, Telescope steps out of the way rather than adding to the pile.

It's on by default at a 200ms p99 threshold. Set overloadProtection to false to disable it, or pass { maxEventLoopLagMs } to tune the threshold:

TelescopeModule.forRoot({
  overloadProtection: { maxEventLoopLagMs: 100 }, // pause sooner; default 200
});

The sampling timer is unref'd so it never keeps the host's event loop alive, and the guard degrades to a no-op when perf_hooks.monitorEventLoopDelay is unavailable. Pause/resume transitions are logged once each (a warn on pause, a log on resume).
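The pause/resume logic can be sketched independently of the timer. In the sketch below the lag reading is injected as a function so the example runs anywhere; in a real guard that function would presumably wrap the perf_hooks.monitorEventLoopDelay histogram (which reports nanoseconds). OverloadGuard, GuardedRecorder, and the use of the same threshold for pausing and resuming are assumptions, not Telescope's actual internals.

```typescript
// Returns the p99 event-loop lag in ms for the last sampling window.
type LagReader = () => number;

class OverloadGuard {
  paused = false;
  readonly transitions: string[] = [];
  private readonly readP99Ms: LagReader;
  private readonly maxLagMs: number;

  constructor(readP99Ms: LagReader, maxLagMs = 200) {
    this.readP99Ms = readP99Ms;
    this.maxLagMs = maxLagMs;
  }

  // Called once per sampling interval (1s in the text above).
  tick(): void {
    const overloaded = this.readP99Ms() > this.maxLagMs;
    if (overloaded && !this.paused) {
      this.paused = true;
      this.transitions.push('warn: paused'); // logged once per transition
    } else if (!overloaded && this.paused) {
      this.paused = false;
      this.transitions.push('log: resumed');
    }
  }
}

class GuardedRecorder {
  readonly entries: string[] = [];
  private readonly guard: OverloadGuard;

  constructor(guard: OverloadGuard) {
    this.guard = guard;
  }

  // While the guard is paused, record() is a no-op.
  record(entry: string): void {
    if (this.guard.paused) return;
    this.entries.push(entry);
  }
}

// Made-up lag samples: healthy, overloaded twice, then recovered.
const lags = [20, 350, 400, 50];
let cursor = 0;
const guard = new OverloadGuard(() => lags[cursor++] ?? 0, 200);
const recorder = new GuardedRecorder(guard);

for (const entry of ['a', 'b', 'c', 'd']) {
  guard.tick();
  recorder.record(entry);
}
// recorder.entries is ['a', 'd']; guard.transitions is ['warn: paused', 'log: resumed']
```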

Memory bounds

Telescope holds a bounded working set, by design — there's no unbounded leak. The heap it retains is roughly:

heap ≈ prune.after × ingest rate × bytes per entry

Those factors multiply, so a high rate of fat entries against a generous retention window can still pin real heap. Telescope bounds each factor at capture time.
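A worked instance of the formula, with made-up numbers: a 1h retention window, 50 entries per second, and 2 KiB per entry. The function name and all inputs are hypothetical; plug in your own measurements.

```typescript
// heap ≈ prune.after × ingest rate × bytes per entry
function estimateHeapBytes(
  retentionSeconds: number,
  entriesPerSecond: number,
  bytesPerEntry: number,
): number {
  return retentionSeconds * entriesPerSecond * bytesPerEntry;
}

const toMiB = (bytes: number): number => bytes / (1024 * 1024);

// 3600 s × 50 entries/s × 2048 bytes
const unsampledBytes = estimateHeapBytes(3600, 50, 2048); // 368_640_000
const unsampledMiB = toMiB(unsampledBytes);               // 351.5625

// Same window and entry size, with the ingest rate cut to 10% (5 entries/s).
const sampledBytes = estimateHeapBytes(3600, 5, 2048);    // 36_864_000
const sampledMiB = toMiB(sampledBytes);                   // 35.15625
```

Because the factors multiply, cutting any one of them by 10× cuts the estimate by 10×.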

Bounded redaction. redact() runs synchronously on the hot path; besides masking, it's the detach that snapshots content into a reference-free clone. That clone is hard-bounded so a fat object graph (e.g. a hydrated ORM req.user) can't become a fat entry — all on by default, tunable per app via redact:

Bound            Default   Beyond it
maxDepth         8         subtree → '[Truncated: depth]'
maxStringLength  8_192     string → first N chars + '…[truncated]'
maxArrayLength   200       first N items + '[Truncated: N of M items]'
maxNodes         5_000     remaining subtrees → '[Truncated: size]'

The defaults are generous: a normal request / query / cache entry is cloned byte-identically and never trips a bound — the bounds only bite on pathological mega-graphs.
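A minimal sketch of a bounded, reference-free clone using the four bounds and markers from the table. This is an illustration under assumptions, not Telescope's redact(): the MASKED_KEYS list, the '[REDACTED]' placeholder, the way nodes are counted (objects and arrays only), and the function name boundedRedact are all hypothetical.

```typescript
interface Bounds {
  maxDepth: number;
  maxStringLength: number;
  maxArrayLength: number;
  maxNodes: number;
}

const DEFAULTS: Bounds = { maxDepth: 8, maxStringLength: 8192, maxArrayLength: 200, maxNodes: 5000 };
const MASKED_KEYS = new Set<string>(['password', 'token']); // hypothetical key list

function boundedRedact(
  input: unknown,
  overrides: Partial<Bounds> = {},
): { value: unknown; truncated: boolean } {
  const b: Bounds = { ...DEFAULTS, ...overrides };
  let nodes = 0;
  let truncated = false;
  const clip = (marker: string): string => {
    truncated = true;
    return marker;
  };

  const walk = (v: unknown, depth: number): unknown => {
    if (typeof v === 'string') {
      if (v.length <= b.maxStringLength) return v;
      return clip(v.slice(0, b.maxStringLength) + '…[truncated]');
    }
    // In this sketch, anything that is not an object or array passes through as-is.
    if (v === null || typeof v !== 'object') return v;
    if (depth >= b.maxDepth) return clip('[Truncated: depth]');
    if (++nodes > b.maxNodes) return clip('[Truncated: size]');

    if (Array.isArray(v)) {
      const out: unknown[] = v.slice(0, b.maxArrayLength).map((item) => walk(item, depth + 1));
      if (v.length > b.maxArrayLength) {
        out.push(clip(`[Truncated: ${b.maxArrayLength} of ${v.length} items]`));
      }
      return out;
    }

    // Plain copy of own enumerable keys: the result shares no object with the input.
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(v)) {
      out[key] = MASKED_KEYS.has(key) ? '[REDACTED]' : walk(child, depth + 1);
    }
    return out;
  };

  const value = walk(input, 0);
  return { value, truncated };
}

const source = {
  user: { id: 1, password: 'hunter2', bio: 'aaaaaaaaaa' },
  tags: [1, 2, 3, 4],
};
const clipped = boundedRedact(source, { maxStringLength: 4, maxArrayLength: 2 });
const deep = boundedRedact({ a: { b: { c: 1 } } }, { maxDepth: 2 });
const untouched = boundedRedact({ sql: 'select 1', bindings: [1] });
```

Because depth and node count are both capped, the walk also terminates on cyclic graphs such as an entity that points back at its manager.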

TelescopeModule.forRoot({
  prune: { after: '1h' },
  redact: { maxNodes: 2_000 },     // tighter cap for fat content
  sampling: { cache: 0.1 },        // keep 10% of high-volume cache hits
});

Sampling for high-volume streams. Sampling is a store-volume lever (requests are already off the response path). Cache hits especially dominate the working-set product — keep a fraction. With prune set but sampling empty, Telescope logs a one-line INFO at boot pointing at this recipe.

The truncated counter

GET /telescope/api/health reports truncatedCount: the number of entries whose content hit a redaction bound and was clipped. A climbing value means a watcher is capturing fat content (often an ORM entity); the fix is to project lighter content (req.user → { id, email }) or tighten redact / add sampling. The Overview health card surfaces it as a Truncated stat.

The /health endpoint — Telescope's own cost, surfaced

Telescope instruments itself with counters only (zero added host cost) and exposes them:

GET /telescope/api/health

It reports the buffered entry count, the ring-buffer high-water mark, flush durations and frequency, the dropped count, and an on-demand captureCostNanos. The dashboard's Overview renders this as a Telescope health card: host-path µs per capture, buffer pressure, flush p95, and drops.

So you can see, on your own data, that Telescope costs ~X µs per query and ~0ms per request — and that it's keeping up.
