Issues
Automatic error detection, deduplication, and tracking with resolution workflows and notifications.
Issues are error events automatically grouped by fingerprint into trackable units. A background job scans for error-level events every hour, deduplicates them, and creates issues that can be investigated, resolved, silenced, snoozed, or merged.
How Issues Are Created
The issue scan job runs hourly (system job, issue_scan). It:
- Queries all error-level events (level = 'error') received since the last scan.
- Normalizes each error message by stripping variable parts (UUIDs, numbers, quoted strings) and lowercasing.
- Generates a SHA-256 fingerprint from the normalized message + source module + an optional per-event-name discriminator (see Fingerprinting below).
- Groups events by session and clusters them into 5-second bursts (see below).
- For each burst: looks up each fingerprint in issue_fingerprints; aliases new fingerprints onto a co-occurring issue when one exists, or creates a single new issue for the whole burst when none exists.
- Records a new occurrence (one per session per issue).
Because the scan runs as a background job, separate from ingestion, it never slows down the ingest endpoint.
Fingerprinting and Deduplication
Two error events are considered the same issue when they produce the same fingerprint. The fingerprint is computed from:
- Normalized message: Variable parts stripped. "User 123 not found" and "User 456 not found" both normalize to "user <n> not found".
- Source module: The source_module field from the event, if present.
- Discriminator (optional): The fingerprint can include an extra discriminator so semantically distinct errors don't collapse onto a single issue. Two cases use one today: sdk:network_request events (see below) and any error event carrying an _error_type reserved attribute (set automatically when you call Owl.error(error) with an Error/Exception value; different runtime types stay on separate issues even when the message is identical).
- App + dev mode: Fingerprints are scoped per app and per dev/prod mode.
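As an illustration, the fingerprint computation might be sketched as follows. This is a minimal sketch under assumptions, not the actual implementation: the regular expressions, the separator byte, and the function names normalize_message and fingerprint are hypothetical, and per-app and dev/prod scoping is assumed to be handled by the lookup key rather than by the hash itself.

```python
import hashlib
import re

# Hypothetical patterns for the variable parts that get stripped.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)
QUOTED_RE = re.compile(r'"[^"]*"')
NUMBER_RE = re.compile(r"\d+")


def normalize_message(message: str) -> str:
    """Replace UUIDs, quoted strings, and numbers with placeholders, then lowercase."""
    text = UUID_RE.sub("<uuid>", message)  # run before numbers: UUIDs contain digits
    text = QUOTED_RE.sub('"<s>"', text)
    text = NUMBER_RE.sub("<n>", text)
    return text.lower()


def fingerprint(message: str, source_module: str = "", discriminator: str = "") -> str:
    """SHA-256 over normalized message + source module + optional discriminator."""
    # The unit-separator byte keeps ("ab", "c") distinct from ("a", "bc"); it is an assumption.
    payload = "\x1f".join([normalize_message(message), source_module, discriminator])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this sketch, "User 123 not found" and "User 456 not found" produce the same fingerprint, while passing a different discriminator (for example a different _error_type) produces a different one.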
The normalization strips:
- UUIDs (e.g., 550e8400-... becomes <uuid>)
- Numbers (e.g., 404 becomes <n>)
- Quoted strings (e.g., "username" becomes "<s>")
Network request errors split by endpoint
sdk:network_request errors (connection failures emitted by the Swift SDK's URLSession instrumentation) discriminate on ${method} ${host}${templated_path} from the event's _http_url and _http_method custom attributes. So a connection failure to api.revenuecat.com/v1/subscribers/<id>/offerings becomes a different issue from one to your own backend, instead of all SDK-tracked network errors collapsing onto a single issue per source module.
Path templating runs over each /-separated segment to keep per-user IDs from fragmenting the issue:
- UUIDs become <uuid> (e.g. /v1/sessions/550e8400-... → /v1/sessions/<uuid>)
- Pure-numeric segments become <n> (e.g. /users/123 → /users/<n>)
- 12+ char segments containing at least one digit become <id>. This is generic across Firebase Auth UIDs, Stripe IDs (cus_*, sub_*, pi_*), MongoDB ObjectIds, Cuid/Cuid2, Nanoid, KSUID, ULID, Auth0 sub claims (URL-decoded so auth0%7C... works), and did:plc:* DIDs. The "must contain a digit" guard prevents real endpoint names like /metrics-aggregator/ from being templated.
Network issues get a human-readable title Network error: METHOD host/path (e.g. Network error: GET api.revenuecat.com/v1/subscribers/<id>/offerings).
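The templating rules above could be sketched roughly like this. The helper names template_segment and network_discriminator are hypothetical, the sketch assumes an absolute URL in _http_url, and the real implementation may differ in edge cases.

```python
import re
from urllib.parse import unquote, urlsplit

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def template_segment(segment: str) -> str:
    """Apply the three templating rules to one path segment."""
    decoded = unquote(segment)  # URL-decode so auth0%7C... is seen as auth0|...
    if UUID_RE.match(decoded):
        return "<uuid>"
    if decoded.isascii() and decoded.isdigit():
        return "<n>"
    if len(decoded) >= 12 and any(ch.isdigit() for ch in decoded):
        return "<id>"
    return segment  # real endpoint names pass through unchanged


def network_discriminator(method: str, url: str) -> str:
    """Build the 'METHOD host/templated_path' discriminator from an absolute URL."""
    parts = urlsplit(url)
    path = "/".join(template_segment(s) for s in parts.path.split("/"))
    return f"{method.upper()} {parts.hostname}{path}"


def network_title(method: str, url: str) -> str:
    """Human-readable issue title in the 'Network error: METHOD host/path' shape."""
    return f"Network error: {network_discriminator(method, url)}"
```

For instance, network_title("get", "https://api.revenuecat.com/v1/subscribers/abc123def456ghi/offerings") yields the title shown above, with the subscriber ID collapsed to <id>.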
Session-burst aliasing
A single logical failure often produces multiple error events — a loader throws, a caller catches and logs, an OwlOperation.fail(error:) emits metric:X:fail. Each has its own (message, source_module), so each hashes to a distinct fingerprint. Without aliasing, a single failure would create many separate issues.
To avoid that, the scan clusters events by session and time. Within one session, any set of error events whose timestamps all fall within 5 seconds of the burst's first event is treated as one logical failure.
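A rough sketch of this clustering, assuming events carry a session ID and a numeric timestamp in seconds. The ErrorEvent type and cluster_bursts function are hypothetical names, and treating the 5-second boundary as inclusive is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass

BURST_WINDOW_SECONDS = 5.0


@dataclass
class ErrorEvent:
    session_id: str
    timestamp: float  # seconds; hypothetical representation
    fingerprint: str


def cluster_bursts(events: list) -> list:
    """Group events by session, then split each session into bursts.

    A burst is anchored at its first event: later events join it only while
    they fall within the window measured from that first event.
    """
    by_session = defaultdict(list)
    for event in events:
        by_session[event.session_id].append(event)

    bursts = []
    for session_events in by_session.values():
        ordered = sorted(session_events, key=lambda e: e.timestamp)
        current = [ordered[0]]
        for event in ordered[1:]:
            if event.timestamp - current[0].timestamp <= BURST_WINDOW_SECONDS:
                current.append(event)
            else:
                bursts.append(current)
                current = [event]  # this event anchors the next burst
        bursts.append(current)
    return bursts
```

Note that the window is measured from the first event of the burst, not from the previous event, so a slow trickle of errors does not extend a burst indefinitely.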
Rules:
- No existing issue in the burst → create one issue for the entire burst; alias every distinct fingerprint in the burst onto it. The title is picked from the first event whose message does not begin with metric:, step:, or track: (lifecycle prefixes are less human-readable as titles).
- Some fingerprints already have an issue → pick the oldest existing issue as the alias target and attach only previously-unseen fingerprints to it. Pre-existing issues are never merged by the scan: if two already-distinct issues happen to co-occur in a burst, each keeps its own fingerprints and receives only its own events.
- Dev and prod never cross-alias: is_dev=true and is_dev=false are partitioned independently inside a burst.
Once a fingerprint is aliased onto an issue, future events of that fingerprint route to that issue regardless of whether they appear in a burst.
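The aliasing rules can be illustrated with a small in-memory sketch. Everything here is hypothetical: the Issue type, the resolve_burst function, the dictionaries standing in for database tables, and the fallback to the first message when every message has a lifecycle prefix.

```python
from dataclasses import dataclass

LIFECYCLE_PREFIXES = ("metric:", "step:", "track:")


@dataclass
class Issue:
    id: int
    title: str
    created_at: float


def pick_title(messages: list) -> str:
    """First message without a lifecycle prefix; falls back to the first message (assumption)."""
    for message in messages:
        if not message.startswith(LIFECYCLE_PREFIXES):
            return message
    return messages[0]


def resolve_burst(burst: list, fingerprint_to_issue: dict, issues: dict, now: float) -> int:
    """Alias a burst's fingerprints and return the alias target's issue ID.

    burst is a list of (fingerprint, message) pairs in timestamp order.
    fingerprint_to_issue maps fingerprint -> issue ID; issues maps ID -> Issue.
    """
    existing_ids = {fingerprint_to_issue[fp] for fp, _ in burst if fp in fingerprint_to_issue}
    if existing_ids:
        # Oldest existing issue becomes the alias target.
        target = min((issues[i] for i in existing_ids), key=lambda issue: issue.created_at)
    else:
        new_id = max(issues, default=0) + 1
        target = Issue(new_id, pick_title([msg for _, msg in burst]), now)
        issues[new_id] = target
    for fp, _ in burst:
        # setdefault attaches only unseen fingerprints; existing aliases are never moved.
        fingerprint_to_issue.setdefault(fp, target.id)
    return target.id
```

Because existing aliases are left untouched, two pre-existing issues that co-occur in a burst stay separate, and only the previously unseen fingerprints land on the older one.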
Known limitation: the burst is computed only over events returned by the current scan's received_at > last_scan_time filter. If a session's burst spans a scan boundary (e.g., a delayed retry flush causes one event to arrive in a later scan), those events may not be aliased together. In practice the scan is hourly and the burst window is 5s, so this is rare.
Occurrences
Each issue tracks occurrences — one per unique session where the error happened. An occurrence records:
- Session ID: Links back to the session's full event stream for investigation.
- User ID: Which user was affected (if known).
- App version: Which version the error occurred in.
- SDK name and version: Which Owlmetry SDK (e.g. owlmetry-swift) and SDK version produced the originating event. Auto-stamped by official SDKs, nullable.
- Environment: iOS, Android, web, backend, etc.
- Timestamp: When the error occurred.
The issue's occurrence count and unique user count are denormalized for fast querying and sorting by severity. Issue rows also carry first_seen_app_version and last_seen_app_version, denormalized by the hourly scan from the underlying occurrences. Pair last_seen_app_version with the app's latest_app_version to tell whether an issue is still happening on the current release. See Latest Version Detection for the comparison rules and how the green/amber badge is rendered.
Each occurrence also captures sdk_name and sdk_version from the originating event, and the issue rolls these up as first_seen_sdk_version and last_seen_sdk_version — useful for answering "is this issue specific to a particular SDK release?" without scanning every occurrence.
Status Lifecycle
Issues follow a status lifecycle:

    new       → in_progress → resolved
              → silenced
              → snoozed
    resolved  → regressed (automatic, via scan job)
              → new (manual reopen)
              → snoozed
    regressed → in_progress → resolved
              → silenced
              → snoozed
    silenced  → new (manual reopen)
              → snoozed
    snoozed   → new (automatic, on next occurrence; see Auto-revert below)
              → in_progress, resolved, silenced (manual)

| Status | Meaning |
|---|---|
| new | Detected by the scan job, not yet investigated. |
| in_progress | Claimed by a user or agent for investigation. |
| resolved | Fixed. Tagged with the app version where the fix was applied (required; see Regression Detection below). |
| silenced | Known issue, notifications suppressed. Occurrences still tracked. Stays silenced even if it keeps happening. Use when there's nothing to fix and you don't want to hear about it again (e.g. transient infra blip you've decided to live with). |
| snoozed | Suspected one-off. Like silenced (no notifications, no fix version) but automatically reverts to new and re-fires the issue.new push the next time it recurs. Use when you think an error was a one-off and only want to be alerted if the assumption turns out wrong. The transition to snoozed records snoozed_at; the auto-revert clears it. |
| regressed | Was resolved, but the error reappeared in a newer app version. |
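One way to picture the lifecycle is as a transition table. The sketch below is a literal reading of the diagram above and nothing more: the MANUAL_TRANSITIONS and AUTOMATIC_TRANSITIONS names and the can_transition helper are hypothetical, and the actual API may accept transitions the diagram does not spell out.

```python
# Transitions a user or agent can request, read literally from the diagram.
MANUAL_TRANSITIONS = {
    "new": {"in_progress", "silenced", "snoozed"},
    "in_progress": {"resolved"},
    "resolved": {"new", "snoozed"},
    "regressed": {"in_progress", "silenced", "snoozed"},
    "silenced": {"new", "snoozed"},
    "snoozed": {"in_progress", "resolved", "silenced"},
}

# Transitions only the scan job performs.
AUTOMATIC_TRANSITIONS = {
    "resolved": {"regressed"},
    "snoozed": {"new"},
}


def can_transition(current: str, target: str, automatic: bool = False) -> bool:
    """Return True if the diagram lists current -> target for the given actor."""
    table = AUTOMATIC_TRANSITIONS if automatic else MANUAL_TRANSITIONS
    return target in table.get(current, set())
```

Keeping the automatic transitions in a separate table makes it explicit that regressed is never set by hand and that snoozed issues return to new only through the scan.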
Auto-revert from snoozed
When the issue-scan job processes a new occurrence for an issue currently in snoozed, the job flips the issue's status to new, clears snoozed_at, and adds it to the per-team issue.new push the same way a brand-new prod issue would. There's no version comparison (snooze carries no fix claim) and no extra notification type — the team gets the same instant push they'd get if the issue had just been created. Once back in new, the next issue.digest cycle picks the issue up too.
Regression Detection
When the scan job finds an error matching a resolved issue, it compares the event's app_version against the issue's resolved_at_version. If the event version is newer, the issue is automatically set to regressed.
For this reason, resolved_at_version is required when transitioning an issue to resolved (across the dashboard, iOS app, CLI --version, MCP version, and the REST API). If you don't have a fix version, pick the option that matches your intent: use silenced for a known issue you've decided to live with (e.g. a transient infrastructure blip with nothing to fix — won't bother you again), or snoozed if you suspect it was a one-off and want to be re-alerted only if it actually recurs. Resolving without a version would silently disable regression detection for that issue, which is rarely what you want.
The comparison is semver-aware — 1.10.0 correctly regresses past 1.9.0 (and standard build-number suffixes like 1.2.3 (456) plus date-style versions like 2024.10.15 are handled). See Latest Version Detection for the full comparator rules.
If the incoming event itself has no app_version (e.g. backend events that don't tag versions), the scan does not trigger a regression — there's nothing to compare against.
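The two automatic transitions (auto-revert from snoozed and regression from resolved) might be sketched as a single decision function. This is a simplified illustration: version_key compares only the integer runs in a version string, which handles cases like 1.10.0 vs 1.9.0 and 2024.10.15 but is not the full comparator described under Latest Version Detection, and the function names are hypothetical.

```python
import re
from typing import Optional


def version_key(version: str) -> tuple:
    """Extract integer components, e.g. '1.2.3 (456)' -> (1, 2, 3, 456)."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def is_newer(candidate: str, baseline: str) -> bool:
    """Numeric comparison with zero-padding so '1.2' equals '1.2.0'."""
    a, b = version_key(candidate), version_key(baseline)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a > b


def status_after_occurrence(
    status: str,
    event_version: Optional[str],
    resolved_at_version: Optional[str],
) -> str:
    """Status an issue ends up in after the scan records a new occurrence."""
    if status == "snoozed":
        return "new"  # no version comparison: snooze carries no fix claim
    if status == "resolved" and event_version and resolved_at_version:
        if is_newer(event_version, resolved_at_version):
            return "regressed"
    return status  # includes resolved issues hit by events without app_version
```

An event on the same or an older version than resolved_at_version leaves the issue resolved, which matches the idea that the fix has simply not reached that user yet.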
Comments
Issues support comments for investigation notes. Both users and agents can add comments:
- User comments (author_type: "user"): Posted via the dashboard or API with JWT auth.
- Agent comments (author_type: "agent"): Posted via CLI or MCP with agent key auth.
Comments support markdown and are soft-deleted (excluded from queries, hard-deleted after 7 days).
Merging
If two issues turn out to be the same underlying problem (e.g., different normalization paths), they can be merged:
- All fingerprints from the source issue are reassigned to the target.
- All occurrences are moved (duplicates by session are skipped).
- All comments are transferred.
- The source issue is deleted.
- Future events matching any of the merged fingerprints automatically route to the surviving issue.
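The merge steps above can be sketched with in-memory records. The IssueRecord type and merge_issues function are hypothetical stand-ins for the real storage layer, and keeping the target's occurrence when both issues share a session is an assumption about what "skipped" means.

```python
from dataclasses import dataclass, field


@dataclass
class IssueRecord:
    id: int
    fingerprints: set = field(default_factory=set)
    occurrences: dict = field(default_factory=dict)  # session_id -> occurrence data
    comments: list = field(default_factory=list)


def merge_issues(issues: dict, source_id: int, target_id: int) -> IssueRecord:
    """Fold the source issue into the target and delete the source."""
    if source_id == target_id:
        raise ValueError("cannot merge an issue into itself")
    target = issues[target_id]
    source = issues.pop(source_id)  # the source issue is deleted

    target.fingerprints |= source.fingerprints  # future events route to the target
    for session_id, occurrence in source.occurrences.items():
        # One occurrence per session: keep the target's if both have one.
        target.occurrences.setdefault(session_id, occurrence)
    target.comments.extend(source.comments)
    return target
```

Since routing is driven by fingerprints, reassigning them is what makes later events for either original issue land on the surviving one.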
Dev vs Production
Both dev (is_dev: true) and production events create issues, tracked separately via the is_dev flag on each issue. Dev issues:
- Are visible in the dashboard with a dev badge.
- Are never included in notifications (email, in-app, or push).
- Can be filtered with is_dev=true in the API.
Notifications
Two notification paths fire off issue activity, both production-only (dev issues are never surfaced to a team's inbox or push):
- issue.new: instant fan-out. The issue_scan job dispatches one notification per team at the end of every run summarizing all production issues created or regressed during that scan. Bypasses the per-project digest cadence so push lands in close to real time. Defaults to in-app + mobile push on, email off.
- issue.digest: per-project rollup. The issue_notify job assembles a periodic digest at each project's configured cadence and sends it to every team member over their enabled channels. Defaults to email only (in-app + mobile push off so the digest doesn't double up with the instant issue.new push).
Both types route through the unified notifications dispatcher, so each recipient's per-channel toggles at /dashboard/profile/notifications decide whether a given delivery becomes an inbox row, an email, a mobile push, or some combination.
Per-Project Alert Frequency
The issue_alert_frequency setting controls only the digest cadence — issue.new push fires every scan regardless.
| Frequency | Interval |
|---|---|
| none | Digests disabled |
| hourly | Every hour |
| 6_hourly | Every 6 hours |
| daily | Every 24 hours (default) |
| weekly | Every 7 days |
The issue_notify job runs hourly at five minutes past the hour (:05) and checks each project's frequency. When a digest is due, it lists new and regressed issues with activity since the last notification. The job is silent when nothing has changed.
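A minimal sketch of the "is a digest due?" check, using the intervals from the table above. The digest_due function and the last_notified_at parameter are hypothetical, as is sending a first digest immediately when a project has never been notified; the caller would still skip the send when no issue has new activity.

```python
from datetime import datetime, timedelta
from typing import Optional

# Intervals taken from the frequency table; None means digests are disabled.
DIGEST_INTERVALS = {
    "none": None,
    "hourly": timedelta(hours=1),
    "6_hourly": timedelta(hours=6),
    "daily": timedelta(hours=24),
    "weekly": timedelta(days=7),
}


def digest_due(frequency: str, last_notified_at: Optional[datetime], now: datetime) -> bool:
    """Return True when the project's digest interval has elapsed."""
    interval = DIGEST_INTERVALS[frequency]
    if interval is None:
        return False  # "none" mutes the digest only; issue.new is unaffected
    if last_notified_at is None:
        return True  # assumption: never-notified projects are due immediately
    return now - last_notified_at >= interval
```

Because the job itself runs hourly, a project's digest goes out on the first run at or after its interval has elapsed.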
Configure the frequency via the project settings in the dashboard or the API (PATCH /v1/projects/:id with issue_alert_frequency). Setting it to none only mutes the digest — instant issue.new notifications still fire and can be muted per-user under notification preferences.
Permissions
| Permission | Description |
|---|---|
| issues:read | List issues, view details, list comments |
| issues:write | Update status, merge, add/edit/delete comments |
Both permissions are included in the default agent key permissions.
