Owlmetry

Issues

Automatic error detection, deduplication, and tracking with resolution workflows and notifications.

Issues are error events automatically grouped by fingerprint into trackable units. A background job scans for error-level events every hour, deduplicates them, and creates issues that can be investigated, resolved, silenced, snoozed, or merged.

How Issues Are Created

The issue scan job runs hourly (system job, issue_scan). It:

  1. Queries all error-level events (level = 'error') received since the last scan.
  2. Normalizes each error message by stripping variable parts (UUIDs, numbers, quoted strings) and lowercasing.
  3. Generates a SHA-256 fingerprint from the normalized message + source module + an optional per-event-name discriminator (see Fingerprinting below).
  4. Groups events by session and clusters them into 5-second bursts (see below).
  5. For each burst: looks up each fingerprint in issue_fingerprints; aliases new fingerprints onto a co-occurring issue when one exists, or creates a single new issue for the whole burst when none do.
  6. Records a new occurrence (one per session per issue).

Because the scan runs as a separate background job over events that are already stored, it adds no work to the ingest path, so the ingest endpoint is never slowed down.

Fingerprinting and Deduplication

Two error events are considered the same issue when they produce the same fingerprint. The fingerprint is computed from:

  • Normalized message: Variable parts stripped — "User 123 not found" and "User 456 not found" both normalize to "user <n> not found".
  • Source module: The source_module field from the event, if present.
  • Discriminator (optional): The fingerprint can include an extra discriminator so semantically distinct errors don't collapse onto a single issue. Two cases use one today: sdk:network_request events (see below) and any error event carrying an _error_type reserved attribute (set automatically when you call Owl.error(error) with an Error/Exception value — different runtime types stay on separate issues even when the message is identical).
  • App + dev mode: Fingerprints are scoped per app and per dev/prod mode.

The normalization strips:

  • UUIDs (e.g., 550e8400-... becomes <uuid>)
  • Numbers (e.g., 404 becomes <n>)
  • Quoted strings (e.g., "username" becomes "<s>")
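The normalization and hashing steps can be sketched as follows. This is a minimal illustration, not Owlmetry's actual implementation: the regular expressions, the field separator, the order of the hashed fields, and the function names normalize_message and fingerprint are all assumptions made for the example.

```python
import hashlib
import re

# Assumed patterns; the real scan job may recognise more shapes.
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
QUOTED_RE = re.compile(r'"[^"]*"')
NUMBER_RE = re.compile(r"\d+")


def normalize_message(message: str) -> str:
    """Lowercase the message and replace variable parts with placeholders."""
    text = message.lower()
    text = UUID_RE.sub("<uuid>", text)   # UUIDs first, so their digits are not split up
    text = QUOTED_RE.sub('"<s>"', text)  # double-quoted strings only (assumption)
    text = NUMBER_RE.sub("<n>", text)
    return text


def fingerprint(message, source_module=None, discriminator=None,
                app_id="", is_dev=False) -> str:
    """SHA-256 over the normalized message plus the scoping fields."""
    parts = [
        app_id,
        "dev" if is_dev else "prod",
        normalize_message(message),
        source_module or "",
        discriminator or "",
    ]
    # Unit separator keeps adjacent fields from running together (assumption).
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
```

With this sketch, "User 123 not found" and "User 456 not found" hash to the same value, while the same message with a different discriminator or a different dev/prod flag does not.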

Network request errors split by endpoint

sdk:network_request errors (connection failures emitted by the Swift SDK's URLSession instrumentation) discriminate on ${method} ${host}${templated_path} from the event's _http_url and _http_method custom attributes. So a connection failure to api.revenuecat.com/v1/subscribers/<id>/offerings becomes a different issue from one to your own backend, instead of all SDK-tracked network errors collapsing onto a single issue per source module.

Path templating runs over each /-separated segment to keep per-user IDs from fragmenting the issue:

  • UUIDs become <uuid> (e.g. /v1/sessions/550e8400-... → /v1/sessions/<uuid>)
  • Pure-numeric segments become <n> (e.g. /users/123 → /users/<n>)
  • 12+ char segments containing at least one digit become <id> — generic across Firebase Auth UIDs, Stripe IDs (cus_*, sub_*, pi_*), MongoDB ObjectIds, Cuid/Cuid2, Nanoid, KSUID, ULID, Auth0 sub claims (URL-decoded so auth0%7C... works), and did:plc:* DIDs. The "must contain a digit" guard prevents real endpoint names like /metrics-aggregator/ from being templated.
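A rough sketch of the three templating rules, applied segment by segment. The function names (template_segment, template_path, network_discriminator) are hypothetical, and the exact UUID pattern and decoding step are assumptions; only the rules themselves come from the list above.

```python
import re
from urllib.parse import unquote

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def template_segment(segment: str) -> str:
    """Replace one path segment with a placeholder when it looks like an ID."""
    decoded = unquote(segment)  # so auth0%7C... is judged as auth0|...
    if UUID_RE.match(decoded):
        return "<uuid>"
    if decoded.isascii() and decoded.isdigit():
        return "<n>"
    # Long segment with at least one digit: treat as an opaque identifier.
    if len(decoded) >= 12 and any(ch.isdigit() for ch in decoded):
        return "<id>"
    return segment


def template_path(path: str) -> str:
    return "/".join(template_segment(s) for s in path.split("/"))


def network_discriminator(method: str, host: str, path: str) -> str:
    """Builds the '${method} ${host}${templated_path}' string."""
    return f"{method} {host}{template_path(path)}"
```

Under these assumptions, a long segment with no digits such as metrics-aggregator is left untouched, while a Stripe-style cus_... ID collapses to <id>.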

Network issues get a human-readable title Network error: METHOD host/path (e.g. Network error: GET api.revenuecat.com/v1/subscribers/<id>/offerings).

Session-burst aliasing

A single logical failure often produces multiple error events — a loader throws, a caller catches and logs, an OwlOperation.fail(error:) emits metric:X:fail. Each has its own (message, source_module), so each hashes to a distinct fingerprint. Without aliasing, a single failure would create many separate issues.

To avoid that, the scan clusters events by session and time. Within one session, any set of error events whose timestamps all fall within 5 seconds of the burst's first event is treated as one logical failure.

Rules:

  • No existing issue in the burst → create one issue for the entire burst; alias every distinct fingerprint in the burst onto it. The title is picked from the first event whose message does not begin with metric:, step:, or track: (lifecycle prefixes are less human-readable as titles).
  • Some fingerprints already have an issue → pick the oldest existing issue as the alias target and attach only previously-unseen fingerprints to it. Pre-existing issues are never merged by the scan — if two already-distinct issues happen to co-occur in a burst, each keeps its own fingerprints and receives only its own events.
  • Dev and prod never cross-alias: is_dev=true and is_dev=false are partitioned independently inside a burst.

Once a fingerprint is aliased onto an issue, future events of that fingerprint route to that issue regardless of whether they appear in a burst.

Known limitation: the burst is computed only over events returned by the current scan's received_at > last_scan_time filter. If a session's burst spans a scan boundary (e.g., a delayed retry flush causes one event to arrive in a later scan), those events may not be aliased together. In practice the scan is hourly and the burst window is 5s, so this is rare.
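The clustering step can be illustrated with a small in-memory sketch. The ErrorEvent type, the cluster_bursts function, and the use of plain float timestamps are assumptions for the example; the 5-second window anchored on the burst's first event and the per-session, per-dev/prod partitioning follow the description above.

```python
from collections import defaultdict
from dataclasses import dataclass

BURST_WINDOW_SECONDS = 5.0


@dataclass(frozen=True)
class ErrorEvent:
    session_id: str
    timestamp: float  # seconds; any monotonic unit works for the sketch
    fingerprint: str
    is_dev: bool = False


def cluster_bursts(events):
    """Group events into bursts per (session, dev/prod) partition."""
    partitions = defaultdict(list)
    for event in events:
        partitions[(event.session_id, event.is_dev)].append(event)

    bursts = []
    for key in sorted(partitions):
        ordered = sorted(partitions[key], key=lambda e: e.timestamp)
        current = []
        for event in ordered:
            # The window is measured from the burst's first event, not the previous one.
            if current and event.timestamp - current[0].timestamp > BURST_WINDOW_SECONDS:
                bursts.append(current)
                current = []
            current.append(event)
        if current:
            bursts.append(current)
    return bursts
```

Each returned burst would then be resolved against issue_fingerprints according to the rules above. Because the sketch only sees the events it is given, it has the same scan-boundary limitation: events from one burst that arrive in different scans end up in different calls.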

Occurrences

Each issue tracks occurrences — one per unique session where the error happened. An occurrence records:

  • Session ID: Links back to the session's full event stream for investigation.
  • User ID: Which user was affected (if known).
  • App version: Which version the error occurred in.
  • SDK name and version: Which Owlmetry SDK (e.g. owlmetry-swift) and SDK version produced the originating event. Auto-stamped by official SDKs, nullable.
  • Environment: iOS, Android, web, backend, etc.
  • Timestamp: When the error occurred.

The issue's occurrence count and unique user count are denormalized for fast querying and sorting by severity. Issue rows also carry first_seen_app_version and last_seen_app_version, denormalized by the hourly scan from the underlying occurrences. Pair last_seen_app_version with the app's latest_app_version to tell whether an issue is still happening on the current release; see Latest Version Detection for the comparison rules and how the green/amber badge is rendered.

Each occurrence also captures sdk_name and sdk_version from the originating event, and the issue rolls these up as first_seen_sdk_version and last_seen_sdk_version — useful for answering "is this issue specific to a particular SDK release?" without scanning every occurrence.

Status Lifecycle

Issues follow a status lifecycle:

new → in_progress → resolved
                  → silenced
                  → snoozed

resolved → regressed (automatic, via scan job)
         → new (manual reopen)
         → snoozed

regressed → in_progress → resolved
                        → silenced
                        → snoozed

silenced → new (manual reopen)
         → snoozed

snoozed → new (automatic, on next occurrence — see Auto-revert below)
        → in_progress, resolved, silenced (manual)

Status       Meaning
new          Detected by the scan job, not yet investigated.
in_progress  Claimed by a user or agent for investigation.
resolved     Fixed. Tagged with the app version where the fix was applied (required; see Regression Detection below).
silenced     Known issue, notifications suppressed. Occurrences still tracked. Stays silenced even if it keeps happening. Use when there's nothing to fix and you don't want to hear about it again (e.g. a transient infra blip you've decided to live with).
snoozed      Suspected one-off. Like silenced (no notifications, no fix version) but automatically reverts to new and re-fires the issue.new push the next time it recurs. Use when you think an error was a one-off and only want to be alerted if the assumption turns out wrong. The transition to snoozed records snoozed_at; the auto-revert clears it.
regressed    Was resolved, but the error reappeared in a newer app version.
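One way to encode the lifecycle is a transition table plus a small validator. This sketch lists only the transitions drawn in the diagram above, read literally; the real API may accept others. ALLOWED_TRANSITIONS and validate_transition are hypothetical names.

```python
# Transitions taken literally from the lifecycle diagram (assumption:
# the diagram is exhaustive). resolved -> regressed and snoozed -> new
# are performed automatically by the scan job.
ALLOWED_TRANSITIONS = {
    "new": {"in_progress"},
    "in_progress": {"resolved", "silenced", "snoozed"},
    "resolved": {"regressed", "new", "snoozed"},
    "regressed": {"in_progress"},
    "silenced": {"new", "snoozed"},
    "snoozed": {"new", "in_progress", "resolved", "silenced"},
}


def validate_transition(current, target, resolved_at_version=None):
    """Raise ValueError if the status change is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move issue from {current!r} to {target!r}")
    # Resolving requires the version that contains the fix.
    if target == "resolved" and not resolved_at_version:
        raise ValueError("resolved_at_version is required when resolving")
```

For example, validate_transition("in_progress", "resolved", resolved_at_version="1.4.0") passes, while the same call without a version raises.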

Auto-revert from snoozed

When the issue-scan job processes a new occurrence for an issue currently in snoozed, the job flips the issue's status to new, clears snoozed_at, and adds it to the per-team issue.new push the same way a brand-new prod issue would. There's no version comparison (snooze carries no fix claim) and no extra notification type — the team gets the same instant push they'd get if the issue had just been created. Once back in new, the next issue.digest cycle picks the issue up too.

Regression Detection

When the scan job finds an error matching a resolved issue, it compares the event's app_version against the issue's resolved_at_version. If the event version is newer, the issue is automatically set to regressed.

For this reason, resolved_at_version is required when transitioning an issue to resolved (across the dashboard, iOS app, CLI --version, MCP version, and the REST API). If you don't have a fix version, pick the option that matches your intent: use silenced for a known issue you've decided to live with (e.g. a transient infrastructure blip with nothing to fix — won't bother you again), or snoozed if you suspect it was a one-off and want to be re-alerted only if it actually recurs. Resolving without a version would silently disable regression detection for that issue, which is rarely what you want.

The comparison is semver-aware: 1.10.0 correctly regresses past 1.9.0, and standard build-number suffixes like 1.2.3 (456) and date-style versions like 2024.10.15 are handled. See Latest Version Detection for the full comparator rules.

If the incoming event itself has no app_version (e.g. backend events that don't tag versions), the scan does not trigger a regression — there's nothing to compare against.
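The regression check can be approximated with a numeric version comparison. This is a simplified stand-in for the real comparator described in Latest Version Detection: it ignores a trailing build number in parentheses and does not handle pre-release tags, both of which are assumptions of the sketch. The names version_key and is_regression are hypothetical.

```python
import re
from typing import Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.10.0', '1.2.3 (456)' or '2024.10.15' into a comparable tuple."""
    # Drop a trailing build-number suffix such as ' (456)' (assumption).
    core = re.sub(r"\s*\(.*\)\s*$", "", version.strip())
    parts = [int(n) for n in re.findall(r"\d+", core)]
    # Strip trailing zeros so '1.2' and '1.2.0' compare equal.
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def is_regression(event_version: Optional[str], resolved_at_version: str) -> bool:
    """True when the event comes from a version newer than the fix version."""
    if not event_version:
        return False  # nothing to compare against
    return version_key(event_version) > version_key(resolved_at_version)
```

Comparing numeric tuples rather than strings is what makes 1.10.0 sort after 1.9.0; a plain string comparison would put it before.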

Comments

Issues support comments for investigation notes. Both users and agents can add comments:

  • User comments (author_type: "user"): Posted via the dashboard or API with JWT auth.
  • Agent comments (author_type: "agent"): Posted via CLI or MCP with agent key auth.

Comments support markdown. Deleting a comment is a soft delete: the comment is excluded from queries immediately and hard-deleted after 7 days.

Merging

If two issues turn out to be the same underlying problem (e.g., different normalization paths), they can be merged:

  1. All fingerprints from the source issue are reassigned to the target.
  2. All occurrences are moved (duplicates by session are skipped).
  3. All comments are transferred.
  4. The source issue is deleted.
  5. Future events matching any of the merged fingerprints automatically route to the surviving issue.
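The merge steps can be sketched against a simple in-memory model. The Issue dataclass and merge_issues function are illustrative only; the real merge operates on database rows, and the field names here are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    id: str
    fingerprints: set = field(default_factory=set)
    occurrences: dict = field(default_factory=dict)  # session_id -> occurrence
    comments: list = field(default_factory=list)


def merge_issues(issues: dict, source_id: str, target_id: str) -> Issue:
    """Fold the source issue into the target and delete the source."""
    if source_id == target_id:
        raise ValueError("cannot merge an issue into itself")
    source, target = issues[source_id], issues[target_id]

    # 1. Reassign fingerprints, so future events route to the target.
    target.fingerprints |= source.fingerprints
    # 2. Move occurrences, skipping sessions the target already has.
    for session_id, occurrence in source.occurrences.items():
        target.occurrences.setdefault(session_id, occurrence)
    # 3. Transfer comments.
    target.comments.extend(source.comments)
    # 4. Delete the source issue.
    del issues[source_id]
    return target
```

Keying occurrences by session ID is what makes step 2 skip duplicates: an occurrence for a session the target already recorded is simply not copied.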

Dev vs Production

Both dev (is_dev: true) and production events create issues, tracked separately via the is_dev flag on each issue. Dev issues:

  • Are visible in the dashboard with a dev badge.
  • Are never included in notifications (email, in-app, or push).
  • Can be filtered with is_dev=true in the API.

Notifications

Two notification paths fire off issue activity, both production-only (dev issues are never surfaced to a team's inbox or push):

  • issue.new — instant fan-out. The issue_scan job dispatches one notification per team at the end of every run summarizing all production issues created or regressed during that scan. Bypasses the per-project digest cadence so push lands in close to real time. Defaults to in-app + mobile push on, email off.
  • issue.digest — per-project rollup. The issue_notify job assembles a periodic digest at each project's configured cadence and sends it to every team member over their enabled channels. Defaults to email only (in-app + mobile push off so the digest doesn't double up with the instant issue.new push).

Both types route through the unified notifications dispatcher, so each recipient's per-channel toggles at /dashboard/profile/notifications decide whether a given delivery becomes an inbox row, an email, a mobile push, or some combination.

Per-Project Alert Frequency

The issue_alert_frequency setting controls only the digest cadence; the issue.new push fires on every scan regardless.

Frequency  Interval
none       Digests disabled
hourly     Every hour
6_hourly   Every 6 hours
daily      Every 24 hours (default)
weekly     Every 7 days

The issue_notify job runs hourly at five minutes past the hour (:05) and checks each project's frequency. When a digest is due, it lists new and regressed issues with activity since the last notification. The job sends nothing when nothing has changed.
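The "is a digest due?" check can be sketched as a lookup from frequency to interval. The function name is_digest_due, the last_notified_at parameter, and the choice to treat a project that has never been notified as due are assumptions; the interval values come from the table above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DIGEST_INTERVALS = {
    "hourly": timedelta(hours=1),
    "6_hourly": timedelta(hours=6),
    "daily": timedelta(hours=24),
    "weekly": timedelta(days=7),
}


def is_digest_due(frequency: str, last_notified_at: Optional[datetime],
                  now: datetime) -> bool:
    """Decide whether a project's digest should be assembled on this run."""
    if frequency == "none":
        return False  # digests disabled
    interval = DIGEST_INTERVALS[frequency]  # KeyError on an unknown value
    if last_notified_at is None:
        return True  # never notified before (assumption)
    return now - last_notified_at >= interval
```

Since the job itself only runs once an hour, a digest is effectively sent on the first hourly run after its interval has elapsed.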

Configure the frequency via the project settings in the dashboard or the API (PATCH /v1/projects/:id with issue_alert_frequency). Setting it to none only mutes the digest — instant issue.new notifications still fire and can be muted per-user under notification preferences.
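As an illustration, the API call can be built with the Python standard library. Only the PATCH /v1/projects/:id route and the issue_alert_frequency field come from the text above; the base URL, the bearer-token header, and the build_frequency_request helper are placeholders and assumptions, so check the API reference for the actual host and auth scheme.

```python
import json
import urllib.request

VALID_FREQUENCIES = {"none", "hourly", "6_hourly", "daily", "weekly"}


def build_frequency_request(base_url: str, project_id: str,
                            frequency: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that sets the digest cadence."""
    if frequency not in VALID_FREQUENCIES:
        raise ValueError(f"unknown frequency: {frequency!r}")
    body = json.dumps({"issue_alert_frequency": frequency}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/projects/{project_id}",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; replace with whatever your deployment uses.
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. Validating the value client-side just avoids a round trip for a typo such as "6hourly".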

Permissions

Permission    Description
issues:read   List issues, view details, list comments
issues:write  Update status, merge, add/edit/delete comments

Both permissions are included in the default agent key permissions.
