Instrument once.
We handle the rest.

Wrap your workflows with the SDK. Upkeel's platform detects every silent failure, fires the alerts, escalates to the right people, and shows you exactly what happened, all without you touching your application code again.

SDK Instrumentation

Seconds to install. Nothing else to configure.

The SDK is pure telemetry. It records what happened and when. That's the entire job. No alerting logic, no retry handling, no dashboards to wire up in your application. Install it once and forget it. Everything else happens on our side.

Zero production dependencies. Can't crash your application
Fully async, non-blocking. Under 1ms overhead on your hot path
Works in Node.js, edge runtimes, and browser environments
Local dev mode: structured console output, no network calls
Test mode: virtual clock, fully synchronous, no external I/O
terminal
npm install @upkeel/sdk
payment-flow.ts
import { Upkeel } from '@upkeel/sdk'

const keel = new Upkeel({ apiKey: process.env.UPKEEL_API_KEY })

// Your existing checkout code, unchanged
const session = await stripe.checkout.sessions.create({
  line_items: order.items,
  metadata: { orderId: order.id },
})

// One line: Stripe webhook must arrive within 30s
keel.expect('payment.succeeded', {
  within: '30s',
  correlationId: session.id,
  meta: { orderId: order.id },
})

// In your webhook handler, confirm it
app.post('/webhooks/stripe', (req, res) => {
  // sig is the Stripe-Signature request header; secret is your webhook signing secret
  const event = stripe.webhooks.constructEvent(req.body, sig, secret)
  // For checkout.session.* events, event.data.object is the Checkout Session,
  // so its id matches the correlationId registered above
  keel.fulfill('payment.succeeded', {
    correlationId: event.data.object.id,
  })
  res.sendStatus(200)
})
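The `within` values above are short duration strings such as '30s'. This page does not say how the SDK parses them, so the following is only a minimal sketch of one way to turn such a string into milliseconds. The helper name `parseWindow`, the supported units (s, m, h, d), and the bounds check are assumptions; the bounds mirror the "5 seconds to 30 days" range listed under Detection Engine.

```typescript
// Hypothetical helper, not part of @upkeel/sdk: converts a window string
// such as '30s' or '4h' into milliseconds.
const UNIT_MS: Record<string, number> = {
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
}

// Assumed bounds, taken from the "5 seconds to 30 days" range on this page.
const MIN_WINDOW_MS = 5 * UNIT_MS.s
const MAX_WINDOW_MS = 30 * UNIT_MS.d

function parseWindow(input: string): number {
  const match = /^(\d+)(s|m|h|d)$/.exec(input.trim())
  if (!match) {
    throw new Error(`Unrecognized window: "${input}"`)
  }
  const ms = Number(match[1]) * UNIT_MS[match[2]]
  if (ms < MIN_WINDOW_MS || ms > MAX_WINDOW_MS) {
    throw new Error(`Window out of range (5s to 30d): "${input}"`)
  }
  return ms
}
```

Under these assumptions, `parseWindow('30s')` returns 30000 and `parseWindow('2s')` throws, because two seconds is below the assumed minimum.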
Detection Engine

We watch your expectations so you don't have to.

Every expectation you register runs through our server-side detection pipeline. A worker sweeps continuously and marks overdue expectations as missed within seconds of the deadline lapsing. Your escalation rules then decide whether a single miss is noise or a scope-wide incident.
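To make the sweep concrete, here is a minimal in-memory sketch of a single pass. It is a conceptual model only: the record shape and the `sweep` function are assumptions for illustration, and a durable engine would run the same check against persistent storage rather than an array.

```typescript
type ExpectationStatus = 'pending' | 'fulfilled' | 'missed'

// Hypothetical record shape for a registered expectation.
interface PendingExpectation {
  eventName: string
  correlationId: string
  deadlineMs: number // absolute deadline, in epoch milliseconds
  status: ExpectationStatus
}

// One sweep pass: mark every pending expectation whose deadline has been
// reached as missed, and return only the ones that changed on this pass.
function sweep(expectations: PendingExpectation[], nowMs: number): PendingExpectation[] {
  const newlyMissed: PendingExpectation[] = []
  for (const exp of expectations) {
    if (exp.status === 'pending' && nowMs >= exp.deadlineMs) {
      exp.status = 'missed'
      newlyMissed.push(exp)
    }
  }
  return newlyMissed
}
```

Because the pass only touches expectations that are still pending, running it twice at the same instant reports each miss once.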

Detection windows from 5 seconds to 30 days, set per-expectation
Continuous sweep. Misses surface within seconds of the deadline
Rate-over-window escalation rules promote bursts to degraded or critical
One alert per incident edge. No per-event spam during a burst
Crash-safe and durable. Postgres-backed, survives any restart
Idempotent event delivery. No duplicate alerts from SDK retries
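The rate-over-window and "one alert per incident edge" ideas in the list above can be sketched together. This is an illustrative model, not the engine's implementation: `ScopeMonitor`, `EscalationRule`, and the thresholds used in the usage note are all hypothetical.

```typescript
type Health = 'healthy' | 'degraded' | 'critical'

// Hypothetical rule shape: how many misses inside the window change the state.
interface EscalationRule {
  windowMs: number
  degradedAt: number
  criticalAt: number
}

class ScopeMonitor {
  private rule: EscalationRule
  private missTimes: number[] = []
  private current: Health = 'healthy'

  constructor(rule: EscalationRule) {
    this.rule = rule
  }

  // Record a miss. Returns the new state only when it changes (an "edge"),
  // and null while the state stays the same, so a burst yields one alert.
  recordMiss(atMs: number): Health | null {
    this.missTimes.push(atMs)
    return this.evaluate(atMs)
  }

  // Re-evaluate without a new miss (for example on a timer) so that a
  // scope can recover once old misses fall out of the window.
  evaluate(nowMs: number): Health | null {
    const cutoff = nowMs - this.rule.windowMs
    this.missTimes = this.missTimes.filter((t) => t > cutoff)
    const count = this.missTimes.length
    const next: Health =
      count >= this.rule.criticalAt ? 'critical'
      : count >= this.rule.degradedAt ? 'degraded'
      : 'healthy'
    if (next === this.current) return null
    this.current = next
    return next
  }
}
```

With an assumed rule of `{ windowMs: 60000, degradedAt: 2, criticalAt: 4 }`, four misses one second apart produce two state changes (degraded on the second miss, critical on the fourth), and the remaining calls return null.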
Dashboard preview: upkeel.dev / flows
Flows tracked: 14 · Checks / day: 48k · Missed (30d): 7
Flow health, last 30 days:
payment-processing: 99.2%
email-confirmation: 94.1%
order-fulfillment: 99.8%
research-agent: 97.4%
kyc-verification: 100%
Alerting & Escalation

We send the alerts. You write zero notification code.

When Upkeel detects a missing event or a degrading flow, we handle the entire notification chain: initial alert, escalation, incident ticket. No webhooks to configure in your application, no Slack bots to build, no on-call routing logic to maintain. Configure your channels once in the dashboard and you're done.

Email alerts with full flow context, timeline, and affected run IDs
Slack messages to any channel with formatted diagnostic detail
PagerDuty pages with severity routing and automatic escalation
Jira and Linear tickets created automatically on detection
Multi-channel routing. Critical failures page, degraded flows email
Alert deduplication. One notification per incident, not one per check
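Routing and deduplication happen on the Upkeel side, so none of this is code you would write. Purely as a mental model of the last two bullets, the sketch below routes by severity and notifies once per incident and severity. The routing table, type names, and `AlertRouter` class are hypothetical.

```typescript
type Severity = 'degraded' | 'critical'
type Channel = 'email' | 'slack' | 'pagerduty' | 'ticket'

// Assumed routing table for illustration: critical failures page,
// degraded flows email.
const ROUTES: Record<Severity, Channel[]> = {
  degraded: ['email'],
  critical: ['pagerduty', 'slack', 'ticket'],
}

interface IncidentAlert {
  incidentId: string
  severity: Severity
}

class AlertRouter {
  private alreadySent = new Set<string>()

  // Returns the channels to notify, or an empty list when this incident
  // has already been announced at this severity.
  route(alert: IncidentAlert): Channel[] {
    const key = `${alert.incidentId}:${alert.severity}`
    if (this.alreadySent.has(key)) return []
    this.alreadySent.add(key)
    return ROUTES[alert.severity]
  }
}
```

Keying on incident plus severity means repeated checks during the same incident stay quiet, while a later change in severity for that incident still goes out.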
Slack · #alerts-payments · just now
🚨 Missing event: payment.succeeded
Expected within 30s window for flow payment-processing (run abc-4829). Time since last event: 4m 12s. 3 customers potentially affected.
Critical · payment-processing

PagerDuty · payments-oncall · just now
TRIGGERED: Payment pipeline critical
payment.succeeded missing for 3 consecutive runs. Auto-escalated to P1. Acknowledge in PagerDuty to stop escalation.
P1 · Auto-escalated

Jira · OPS-1847 created · just now
[Critical] payment.succeeded missing · payment-processing
Auto-created with full timeline, 3 affected run IDs, and link to Upkeel incident view. Assigned to payments-team.
Explore

See when things happened. Drag to zoom. Filter to drill in.

A live histogram of every event flowing through your system, stacked by type or severity. Drag across the chart to brush-select a window. The table below filters to the same range instantly. Stack the bars by event type to see what changed, or by severity to see when things got noisy. The chart and table share filter state, so a query you build is shareable as a URL.

Stacked event-volume histogram across the time range you pick
Drag-to-brush time selection. Chart and table update together
Stack by type (event / fulfillment / miss / cancellation) or by severity
Filter by project, environment, scope glob, name glob, severity floor
Health rollup beside it: fulfillment rate + recent activity stream
Filter state lives in the URL. Every view is a shareable link
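"Filter state lives in the URL" is a simple pattern worth seeing in code. The sketch below round-trips a filter object through a query string using the standard `URLSearchParams` API. The `ExploreFilter` shape and its parameter names are assumptions; the page does not document the real query parameters.

```typescript
// Hypothetical filter shape; field names are illustrative only.
interface ExploreFilter {
  project?: string
  environment?: string
  scope?: string // glob, e.g. 'payments.*'
  eventName?: string // glob
  minSeverity?: string
  from?: string // start of the brushed time range
  to?: string // end of the brushed time range
}

const FILTER_KEYS = [
  'project', 'environment', 'scope', 'eventName', 'minSeverity', 'from', 'to',
] as const

// Serialize only the fields that are set, in a stable key order.
function filterToQuery(filter: ExploreFilter): string {
  const params = new URLSearchParams()
  for (const key of FILTER_KEYS) {
    const value = filter[key]
    if (value !== undefined && value !== '') params.set(key, value)
  }
  return params.toString()
}

// Rebuild the filter from a query string, ignoring unknown parameters.
function queryToFilter(query: string): ExploreFilter {
  const params = new URLSearchParams(query)
  const filter: ExploreFilter = {}
  for (const key of FILTER_KEYS) {
    const value = params.get(key)
    if (value !== null) filter[key] = value
  }
  return filter
}
```

Because the chart and the table both read from the same serialized state, pasting the URL into another browser reproduces the same view.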
Dashboard preview: upkeel.dev / insights / payment-processing
payment-processing · last 14 days · axis: Apr 20 · Apr 26 · May 2 · ⚠ · May 8
Brushed selection: May 2, 14:18 → 14:42
payment.succeeded missed · run abc-4829 · May 2, 14:22 · alerts: Slack, PagerDuty, Jira
Resolved automatically · 18m incident · May 2, 14:40
Flow Tracking

Multi-step workflows, visualized and verified.

Flows turn scattered expect/fulfill pairs into a coherent timeline. When an expectation is missed, the alert doesn't just say "event X didn't arrive". It says "step Y of this workflow didn't complete, here's the last thing that happened, and here's who's affected." Context that makes alerts actionable, not just noisy.

Track multi-step business processes as named flows
flow.expect() and flow.fulfill() scope expectations to the flow
Alerts include flow name, last step, and full timeline
Standalone expect/fulfill still works. Flows are optional
Link flows across services with correlationId
order-flow.ts
import { Upkeel } from '@upkeel/sdk'

const keel = new Upkeel({ apiKey: process.env.UPKEEL_API_KEY })

async function processOrder(order) {
  const flow = keel.flow('order-fulfillment')
  flow.step('payment-confirmed')
  flow.expect('warehouse.picked', {
    within: '4h',
    correlationId: order.id,
    description: 'Warehouse must confirm pick within 4 hours',
  })
  // When expectation is missed, the alert includes:
  // - Flow name: "order-fulfillment"
  // - Last step: "payment-confirmed"
  // - Description: "Warehouse must confirm pick within 4 hours"
  // - All metadata you attached
}
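The comments at the end of the example list what a missed-expectation alert carries. As a rough illustration of where that context comes from, here is a self-contained model of a flow that remembers its steps. `FlowModel`, `describeMiss`, and the interfaces are hypothetical stand-ins, not the SDK's types.

```typescript
// Hypothetical shapes for illustration only.
interface FlowExpectation {
  eventName: string
  description?: string
  meta?: Record<string, unknown>
}

interface MissAlertContext {
  flowName: string
  lastStep: string | null
  timeline: string[]
  eventName: string
  description?: string
  meta?: Record<string, unknown>
}

class FlowModel {
  private flowName: string
  private steps: string[] = []

  constructor(flowName: string) {
    this.flowName = flowName
  }

  // Record a completed step in order.
  step(stepName: string): void {
    this.steps.push(stepName)
  }

  // Build the context an alert could carry if `expectation` is missed.
  describeMiss(expectation: FlowExpectation): MissAlertContext {
    const lastIndex = this.steps.length - 1
    return {
      flowName: this.flowName,
      lastStep: lastIndex >= 0 ? this.steps[lastIndex] : null,
      timeline: [...this.steps],
      eventName: expectation.eventName,
      description: expectation.description,
      meta: expectation.meta,
    }
  }
}
```

The point of the model is that the alert context is assembled from what the flow already recorded, which is why scoping an expectation to a flow yields richer alerts than a standalone expect/fulfill pair.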
Testing

Test that your flows do what you think they do.

@upkeel/testing brings Upkeel's entire detection model into your test suite. Assert on expectations, simulate known failure scenarios, and advance virtual time in a single line. No real waiting, no network calls, no flakiness.

Jest and Vitest compatible out of the box
Virtual clock. Advance 24 hours in microseconds
Fluent assertion API: .wasRegistered().withTimeoutOf().andWasFulfilled()
Pre-built scenarios via @upkeel/scenarios (Stripe, SendGrid, OpenAI)
Snapshot testing catches unintended flow shape changes in CI
payment.test.ts
import { createTestKit } from '@upkeel/testing'

it('detects a missing Stripe webhook', async () => {
  const { keel, clock, expectations } = createTestKit()
  // Run your checkout code with the test SDK
  keel.expect('payment.succeeded', { within: '30s' })
  // Simulate 31 seconds passing. Webhook never arrived
  clock.advance('31s')
  // Assert the expectation was missed
  expect(expectations.missed('payment.succeeded')).toBe(true)
})

it('fulfillment completes within SLA', async () => {
  const { keel, clock, expectations } = createTestKit()
  keel.expect('warehouse.picked', { within: '4h' })
  clock.advance('2h')
  keel.fulfill('warehouse.picked')
  expect(expectations.met('warehouse.picked')).toBe(true)
})
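If you are wondering how a test can "advance 24 hours in microseconds", the general technique is a virtual clock: time is just a number, and advancing it runs any due timers synchronously. The sketch below shows that technique in isolation. `VirtualClock` is a hypothetical class written for this example and takes milliseconds rather than the duration strings used above; it is not the implementation inside @upkeel/testing.

```typescript
type TimerCallback = () => void

class VirtualClock {
  private nowMs = 0
  private timers: { dueMs: number; callback: TimerCallback }[] = []

  now(): number {
    return this.nowMs
  }

  // Schedule a callback to run once virtual time reaches now + delayMs.
  schedule(delayMs: number, callback: TimerCallback): void {
    this.timers.push({ dueMs: this.nowMs + delayMs, callback })
  }

  // Move virtual time forward and synchronously run every timer that
  // falls due, earliest first. Timers scheduled by a callback are also
  // honored if they fall inside the advanced range.
  advance(byMs: number): void {
    const target = this.nowMs + byMs
    for (;;) {
      this.timers.sort((a, b) => a.dueMs - b.dueMs)
      const next = this.timers[0]
      if (!next || next.dueMs > target) break
      this.timers.shift()
      this.nowMs = next.dueMs
      next.callback()
    }
    this.nowMs = target
  }
}
```

A deadline scheduled 30 seconds out does not fire after `advance(29000)` and does fire during a following `advance(2000)`, all without any real waiting.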

Frequently asked questions

Does Upkeel replace my existing monitoring?
No, and you shouldn't want it to. The class of tools you probably already run watches the inside of your service: requests per second, latency, errors, CPU. Keep using them. We watch for outcomes: the webhook that was supposed to arrive, the job that was supposed to run, the step that was supposed to complete. When your API returned 200 but nothing actually happened downstream, that's our beat. Run both.

Do I have to build any alerting or notification logic?
No. That's the whole point. Instrument your flows with the SDK, configure your alert channels once in the Upkeel dashboard, and we handle everything from there. No webhooks to build in your application, no notification logic, no on-call routing to maintain. We send the Slack messages, emails, PagerDuty pages, and tickets.

Does the SDK add latency to my application?
No meaningful latency. The SDK's hot path is fully asynchronous and non-blocking. Telemetry is queued in-process and flushed to our API without interrupting your application. Measured overhead is under 1ms. The SDK fails silently if it can't reach our API. It never crashes your application.

How long is my data retained?
Every paid plan (Solo, Team, Enterprise) gets 365-day retention on events; Enterprise contracts can extend to 7 years for audit logs on request. The audit log is immutable, and its retention period can never be shortened ahead of schedule. See the pricing page for the full breakdown.

Is Upkeel open source?
Three packages are MIT licensed: @upkeel/sdk, @upkeel/testing, and @upkeel/scenarios. The backend detection engine, alerting pipeline, and dashboard are proprietary. The SDK being open source means you can audit exactly what runs in your production application before shipping it.
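The latency answer above describes telemetry that is queued in-process and flushed without interrupting your application. A minimal sketch of that pattern follows. `TelemetryQueue` and its `Sender` callback are hypothetical names, and this version simply drops a batch on failure, whereas a real client would more likely retry.

```typescript
type TelemetryRecord = Record<string, unknown>
type Sender = (batch: TelemetryRecord[]) => Promise<void>

class TelemetryQueue {
  private buffer: TelemetryRecord[] = []
  private send: Sender

  constructor(send: Sender) {
    this.send = send
  }

  // Hot path: a synchronous in-memory push with no I/O.
  enqueue(record: TelemetryRecord): void {
    this.buffer.push(record)
  }

  // Background path: send whatever is buffered and report how many
  // records were delivered. Any failure is swallowed so that it never
  // reaches the host application.
  async flush(): Promise<number> {
    if (this.buffer.length === 0) return 0
    const batch = this.buffer
    this.buffer = []
    try {
      await this.send(batch)
      return batch.length
    } catch {
      // Fail silently: in this sketch the batch is dropped.
      return 0
    }
  }
}
```

The design choice to notice is the split between `enqueue`, which the application calls inline, and `flush`, which can run on a timer or at idle time, so network problems only ever affect the background path.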

Ready to see what you've been missing?

Solo is $9/mo, Team is $39, and the first 14 days are free.