
Admin Alerts

UTM writes structured error rows to the audit log whenever a critical trading function fails. The audit log is the forensic record. Admin alerts are the page: when an error row of a known shape lands, UTM emails the admin recipient list and posts a webhook so an operator hears about the problem before the next daily audit.

This page describes what fires an alert, how to configure the recipients, and how the dedup window stops a burst of failures from paging repeatedly.

When an alert fires

Four alert types map to four call sites. Each alert carries a type, severity (HIGH or CRITICAL), a one-line title, a longer message, and a details object with the entity ids relevant to the failure.

| Type | Severity | Trigger |
| --- | --- | --- |
| AUTO_CLOSE_FAILURE | CRITICAL | The auto-close worker could not place a close order for a tracked trade at market close, or the post-close discrepancy check found a position that did not flatten after the settle window. |
| SYNC_FAILURE | HIGH | The periodic sync worker failed an entire user iteration (not a transient single-account flake). |
| ORDER_FAILURE | HIGH | A non-retryable broker rejection landed an order in FAILED. Normal CANCELLED and EXPIRED order lifecycle does not trigger this. |
| CREDENTIAL_ERROR | HIGH | A broker account transitioned to credentialStatus=invalid after the circuit-breaker threshold (three consecutive 401s). |

The details object is always populated with the relevant ids (tradeId, strategyId, accountId, brokerId, userId, orderId) so a webhook recipient can deep-link straight to the affected entity.
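As a rough illustration of the alert shape described above, here is a minimal Python sketch. The class and constant names (AdminAlert, Severity, ALERT_SEVERITY) are hypothetical and are not taken from the UTM codebase; only the field names and the type-to-severity mapping come from this page.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Severity(str, Enum):
    HIGH = "HIGH"
    CRITICAL = "CRITICAL"


# Severity per alert type, copied from the table above.
ALERT_SEVERITY: Dict[str, Severity] = {
    "AUTO_CLOSE_FAILURE": Severity.CRITICAL,
    "SYNC_FAILURE": Severity.HIGH,
    "ORDER_FAILURE": Severity.HIGH,
    "CREDENTIAL_ERROR": Severity.HIGH,
}


@dataclass(frozen=True)
class AdminAlert:
    """Hypothetical container mirroring the documented alert fields."""

    type: str
    title: str      # one-line title
    message: str    # longer human-readable message
    # Entity ids such as tradeId, strategyId, accountId, brokerId, userId, orderId.
    details: Dict[str, str] = field(default_factory=dict)

    @property
    def severity(self) -> Severity:
        # Raises KeyError for a type that is not one of the four documented ones.
        return ALERT_SEVERITY[self.type]
```

Deriving severity from the type keeps the two from drifting apart; whether UTM stores severity separately is not stated here.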

Configuration

Recipients and the webhook URL live in SystemSetting so they can be changed from the admin Monitoring UI without a redeploy. They are not environment variables.

Three keys drive the system:

| Key | Type | Default | Purpose |
| --- | --- | --- | --- |
| admin_alert.enabled | boolean | false | Master switch. When false, dispatch is a silent no-op regardless of the values below. |
| admin_alert.email_recipients | string | "" | Comma-separated email addresses. Leave empty to disable email dispatch. |
| admin_alert.webhook_url | string | "" | Single webhook URL. Slack, Discord, and PagerDuty all accept the JSON body that UTM posts. Leave empty to disable webhook dispatch. |

Set the recipients first, then flip admin_alert.enabled=true. The default false prevents the system from blasting an unconfigured mailbox the moment the migration runs.
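The way the three keys combine can be sketched as follows. This is an illustrative Python sketch, not UTM's implementation: AlertConfig and load_alert_config are hypothetical names, and the settings store is modeled as a plain mapping from key to value.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class AlertConfig:
    enabled: bool
    email_recipients: List[str]
    webhook_url: str

    @property
    def can_dispatch(self) -> bool:
        # The master switch must be on AND at least one channel must be configured.
        return self.enabled and bool(self.email_recipients or self.webhook_url)


def load_alert_config(settings: Mapping[str, object]) -> AlertConfig:
    """Build the config from a key/value view of SystemSetting (assumed shape)."""
    raw = str(settings.get("admin_alert.email_recipients", "") or "")
    # Split the comma-separated list and drop blanks and surrounding whitespace.
    recipients = [addr.strip() for addr in raw.split(",") if addr.strip()]
    return AlertConfig(
        # Missing key falls back to the documented default of false.
        enabled=settings.get("admin_alert.enabled", False) is True,
        email_recipients=recipients,
        webhook_url=str(settings.get("admin_alert.webhook_url", "") or "").strip(),
    )
```

For example, a settings mapping with recipients saved but admin_alert.enabled still false yields can_dispatch == False, which matches the recommended order of setting recipients before flipping the switch.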

Email dispatch

Email goes through the same notification email path used elsewhere in UTM (SMTP or Resend, whichever is configured under Email Configuration). Each recipient gets one email per alert. The subject is prefixed with [UTM CRITICAL] or [UTM HIGH], followed by the alert type and title.
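A minimal sketch of the one-email-per-recipient rule and the subject prefix, using Python's standard email.message module. The function names, the sender address, and the ": " separator between alert type and title are assumptions made for illustration; this page only fixes the [UTM SEVERITY] prefix and the order of type and title.

```python
from email.message import EmailMessage
from typing import List


def build_subject(severity: str, alert_type: str, title: str) -> str:
    # Prefix is documented; the ": " separator is an assumption.
    return f"[UTM {severity}] {alert_type}: {title}"


def build_alert_emails(
    recipients: List[str],
    severity: str,
    alert_type: str,
    title: str,
    body: str,
    sender: str = "utm-alerts@example.invalid",  # hypothetical sender address
) -> List[EmailMessage]:
    """Return one message per recipient rather than one message with many To addresses."""
    messages = []
    for recipient in recipients:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient
        msg["Subject"] = build_subject(severity, alert_type, title)
        msg.set_content(body)
        messages.append(msg)
    return messages
```

Sending each message is left to whichever provider is active; the sketch stops at message construction.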

Webhook dispatch

Body is a JSON object:

{
  "type": "AUTO_CLOSE_FAILURE",
  "severity": "CRITICAL",
  "title": "Auto-close failed: AAPL",
  "message": "Worker could not place close order for trade ...",
  "details": {
    "tradeId": "...",
    "strategyId": "...",
    "errorMessage": "..."
  },
  "timestamp": "2026-05-19T20:00:00.000Z"
}

A 5xx response triggers one retry. A 4xx response is treated as non-retryable. When both attempts fail (or a 4xx lands), UTM writes an alert.webhook_failed row to the audit log at severity=error with the full payload so the alert is never fully lost.
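The retry rule above can be sketched as a small loop. This is an illustration under stated assumptions, not UTM's code: dispatch_webhook is a hypothetical name, the HTTP call is injected as a post callable that returns a status code so the logic can be exercised without a network, and the audit row field names are guesses. Network errors and timeouts are not covered by this page and are not modeled here.

```python
import json
from typing import Any, Callable, Dict


def dispatch_webhook(
    payload: Dict[str, Any],
    post: Callable[[str], int],                      # sends the JSON body, returns HTTP status
    write_audit: Callable[[Dict[str, Any]], None],   # persists one audit row
) -> bool:
    body = json.dumps(payload)
    status = 0
    for _attempt in range(2):  # the first attempt plus at most one retry
        status = post(body)
        if 200 <= status < 300:
            return True
        if not 500 <= status < 600:
            # 4xx is non-retryable. Treating other non-2xx codes the same way
            # is an assumption of this sketch.
            break
    # Both attempts failed, or a non-retryable status landed: keep the payload
    # in the audit log so the alert is not lost.
    write_audit({
        "action": "alert.webhook_failed",
        "category": "admin",
        "severity": "error",
        "metadata": {"payload": payload, "lastStatus": status},
    })
    return False
```

Passing the transport in as a callable keeps the retry policy separate from the HTTP client, which makes the policy easy to test in isolation.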

Send test email

The Admin Alerts card has a "Send test email" button next to the email recipients field. It exercises the same email path a live page uses, so you can confirm deliverability for a specific address without waiting for a real failure to fire.

  • Type a single address into the recipients box and click "Send test email" to target just that address. Leave the box on the saved list (or type a comma-separated list and save it first) to test every configured recipient.
  • The result shows per recipient: delivered with the provider message id, or failed with the provider error. The message id is the handle you use to look the send up in the Resend dashboard when chasing a provider-side drop.
  • Every attempt persists one alert.test_email row to the audit log: severity=info on success, severity=error on failure, with the recipient, message id, and error in metadata. This is the in-product delivery evidence, so a one-off miss is reproducible after the fact.

The endpoint is POST /api/v1/admin/alerts/test and requires the admin:write scope. The body is an optional { "email": "..." }; when omitted the configured admin_alert.email_recipients list is used. The response is a recipients array of per-recipient results. Email delivery uses the active provider (SMTP or Resend) configured under Email Configuration, so configure a provider first.
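For scripted checks, a request to this endpoint could be assembled along these lines. The sketch only builds the request and does not send it. The base URL, the bearer-token Authorization header, and the choice to send an empty JSON object when no address is given are all assumptions; this page specifies only the path, the method, the admin:write scope, and the optional email field.

```python
import json
import urllib.request
from typing import Optional


def build_test_email_request(
    base_url: str,
    token: str,                    # credential carrying the admin:write scope (assumed bearer auth)
    email: Optional[str] = None,   # omit to test the saved recipient list
) -> urllib.request.Request:
    body = {"email": email} if email else {}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/admin/alerts/test",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would perform the call; the response body should then contain the recipients array described above.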

Dispatched audit row

Every successful dispatch writes one alert.dispatched row to the audit log at severity=error under category=admin. This is the error-tier trail the daily audit grades, so an out-of-band page is no longer invisible to the audit's error sections. The row carries the alert type, severity, title, dedupKey, and full details in its metadata, and links the affected userId when one is present. The write has its own guard, so a logging failure can never break the page itself.
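The guarded write can be pictured as follows. This is a sketch with hypothetical names (write_dispatched_row, write_audit) and an assumed row layout; only the action name, category, severity, and the list of metadata fields come from this page.

```python
from typing import Any, Callable, Dict


def write_dispatched_row(
    alert: Dict[str, Any],
    dedup_key: str,
    write_audit: Callable[[Dict[str, Any]], None],
) -> bool:
    """Record a dispatched alert. Returns False instead of raising if the write fails."""
    details = alert.get("details", {})
    row = {
        "action": "alert.dispatched",
        "category": "admin",
        "severity": "error",
        "userId": details.get("userId"),  # linked only when present
        "metadata": {
            "type": alert["type"],
            "severity": alert["severity"],
            "title": alert["title"],
            "dedupKey": dedup_key,
            "details": details,
        },
    }
    try:
        write_audit(row)
        return True
    except Exception:
        # The guard: an audit-log failure must not propagate into the paging path.
        return False
```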

Naming the affected user

When the details carry a userId, UTM resolves it to an email and folds an ownerEmail field into the details before dispatch, so the email body, the webhook payload, and the dispatched audit row all name the person rather than showing a bare UUID. The lookup is best-effort: a missing user or a database blip leaves the alert untouched and the page still fires.
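A best-effort enrichment of this kind might look like the sketch below. The function name with_owner_email and the injected lookup_email callable are hypothetical; the point is that every failure mode returns the original details unchanged.

```python
from typing import Callable, Dict, Optional


def with_owner_email(
    details: Dict[str, str],
    lookup_email: Callable[[str], Optional[str]],  # userId -> email, may raise or return None
) -> Dict[str, str]:
    user_id = details.get("userId")
    if not user_id:
        return details  # nothing to resolve
    try:
        email = lookup_email(user_id)
    except Exception:
        return details  # database blip: leave the alert untouched
    if not email:
        return details  # missing user: leave the alert untouched
    # Return a copy so the caller's dict is not mutated.
    return {**details, "ownerEmail": email}
```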

Dedup

A repeat alert for the same (type, entityId) pair is suppressed for 15 minutes after a successful dispatch. In practice, a single stuck trade pages once at 4:01 pm rather than 100 times as the auto-close worker keeps retrying.

Dedup state is per-process and in-memory by design. A process restart clears the window, which is desirable: a restart often means an operator is already paying attention, and one extra page on restart-with-still-stuck-state is safer than swallowing the signal.
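A per-process, in-memory dedup window of this kind can be sketched in a few lines. AlertDeduper and its method names are hypothetical, and treating an alert at exactly the 15-minute mark as sendable is an assumption; the clock is injectable so the window can be tested without waiting.

```python
import time
from typing import Callable, Dict, Tuple

DEDUP_WINDOW_SECONDS = 15 * 60


class AlertDeduper:
    """Tracks the last successful dispatch per (type, entityId) in process memory."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Plain dict: a process restart discards it, which clears the window.
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def should_send(self, alert_type: str, entity_id: str) -> bool:
        last = self._last_sent.get((alert_type, entity_id))
        return last is None or self._clock() - last >= DEDUP_WINDOW_SECONDS

    def record_success(self, alert_type: str, entity_id: str) -> None:
        # Called only after a successful dispatch, so a failed send
        # does not start the suppression window.
        self._last_sent[(alert_type, entity_id)] = self._clock()
```

Usage would be to check should_send before dispatching and call record_success afterwards, so a stuck trade pages again only once the window has elapsed or the process has restarted.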

Disabled or unconfigured

When admin_alert.enabled=false, or when both email_recipients and webhook_url are empty, dispatch is a no-op. UTM writes one alert.skipped row to the audit log per process so the silence is visible during triage. Repeated skips in the same process do not re-emit the row.
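The once-per-process behavior amounts to a latch around the audit write. The sketch below uses hypothetical names (SkipReporter, report_skip), and the reason field in metadata is an assumption added for illustration.

```python
from typing import Any, Callable, Dict


class SkipReporter:
    """Emits at most one alert.skipped row for the lifetime of the process."""

    def __init__(self, write_audit: Callable[[Dict[str, Any]], None]) -> None:
        self._write_audit = write_audit
        self._reported = False

    def report_skip(self, reason: str) -> None:
        if self._reported:
            return  # later skips in the same process stay silent
        self._reported = True
        self._write_audit({
            "action": "alert.skipped",
            "category": "admin",
            "metadata": {"reason": reason},  # hypothetical field
        })
```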

Reading the audit log

Both the dispatch path and the fallbacks land in the audit log under category=admin. Useful filters:

| Action | Meaning |
| --- | --- |
| alert.dispatched | An alert was sent. Written at severity=error with the full alert metadata. |
| alert.skipped | The system was disabled or unconfigured when an alert tried to dispatch. |
| alert.webhook_failed | The webhook failed both attempts or returned a non-retryable 4xx. The payload is in metadata. |
| alert.test_email | A test email was sent from the admin tool. Written at severity=info on success and severity=error on failure; recipient, message id, and error are in metadata. |

What this is not

This system is the operator page for trading-critical failures. It is not a replacement for:

  • Per-user notifications (those still fire to the affected user via the in-app and push channels).
  • Production observability (Sentry, OpenTelemetry traces, log aggregation).
  • An on-call rotation. The webhook URL is a single endpoint; PagerDuty or similar handles rotation downstream.