Operator Runbook

This runbook describes production workflows for checking health, syncing Alpha trades, running sync diagnostics, triggering snapshots, dispatching alerts, and triaging missing numbers.

Production URLs

Service     Canonical URL
Frontend    https://vol-frontend-244916812493.us-east4.run.app
Backend     https://vol-backend-244916812493.us-east4.run.app

Use the canonical project-number URLs in docs, tickets, and handoff notes. Treat *.a.run.app URLs as Cloud Run internal output only.

Health Check

Backend:

curl -s -o /dev/null -w '%{http_code}\n' \
  https://vol-backend-244916812493.us-east4.run.app/api/health

Expected:

200

Frontend:

curl -s -o /dev/null -w '%{http_code}\n' \
  https://vol-frontend-244916812493.us-east4.run.app/

Expected:

200
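The two checks above can be wrapped in a small helper for handoff scripts. A minimal sketch, assuming curl is on PATH (the `check_health` name is illustrative, not an existing tool):

```shell
# Minimal health-check helper; prints PASS/FAIL per endpoint.
check_health() {
  # $1 = URL; prints PASS for HTTP 200, FAIL (with the code) otherwise.
  code="$(curl -s -o /dev/null -w '%{http_code}' "$1")"
  if [ "$code" = "200" ]; then
    echo "PASS $1"
  else
    echo "FAIL ($code) $1"
  fi
}

# Usage:
#   check_health https://vol-backend-244916812493.us-east4.run.app/api/health
#   check_health https://vol-frontend-244916812493.us-east4.run.app/
```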

Alpha Trade Sync Workflow

Use /trades in the frontend.

1. Check Sync Scope

The OMS/PMS Sync card should show:

  • Portfolios: IMST, ICOI
  • Include pending: no by default
  • Last successful sync timestamp
  • Watermark

If the card shows a configuration error, check backend env vars and Secret Manager references.

2. Incremental Preview

Keep Dry run checked and leave the full-scope diagnostic unchecked.

Expected if no new Alpha trades exist:

  • Fetched = 0
  • Normalized = 0
  • Rejected = 0
  • Premium delta 0.00
  • Quantity delta 0.00
  • Sync diagnostic scope is incremental delta

Use incremental preview to check whether there are new or changed source rows.

3. Full Sync Diagnostic Preview

Check Dry run and the full-scope diagnostic.

Use this when you need to confirm the selected Alpha scope is fully loaded, or after any sync logic change.

Expected for a clean loaded scope:

  • Source rows equals cached rows
  • Source quantity equals cached quantity
  • Source premium USD equals cached premium USD
  • Quantity delta is 0.0
  • Premium delta USD is 0.0
  • No unmatched source IDs
  • No unmatched local IDs

4. Real Sync

Only turn Dry run off after the preview is acceptable.

After a real sync:

  • Last successful sync time should update.
  • Watermark should advance if new source rows were processed.
  • Trade table should show Alpha rows with portfolio, status, recon status, price per contract, multiplier, premium, and fees.

Current Verified Alpha Scope

The current production scope is:

portfolio_names=ICOI,IMST|include_pending=false

Known clean full-scope sync diagnostic evidence:

source_row_count: 749
local_scope_row_count: 749
source_total_quantity: 269267.0
local_total_quantity: 269267.0
quantity_delta: 0.0
source_total_premium_usd: 466630651.75
local_total_premium_usd: 466630651.75
premium_delta_usd: 0.0
unmatched_source_trade_ids: []
unmatched_local_trade_ids: []
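When reading this evidence by hand it is easy to miss a one-row drift. A sketch of the clean-scope decision, assuming the four headline values have been extracted from the diagnostic output (`diag_clean` is an illustrative helper, not an existing tool):

```shell
# Returns "clean" only when row counts match and both deltas are exactly 0.0.
diag_clean() {
  # $1 source_row_count  $2 local_scope_row_count
  # $3 quantity_delta    $4 premium_delta_usd
  if [ "$1" = "$2" ] && [ "$3" = "0.0" ] && [ "$4" = "0.0" ]; then
    echo clean
  else
    echo mismatch
  fi
}

# Usage (values from the evidence above):
#   diag_clean 749 749 0.0 0.0   # prints: clean
```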

Trade Analytics Backfill And Freshness

Real Alpha syncs from /trades now run this enrichment pipeline automatically after the Alpha rows are written. Use the manual checks below after deploying trade-performance analytics changes, when investigating a partial post-sync analytics result, or when the /performance page reports stale or missing enrichment.

The pipeline enriches synced Alpha executions with:

  • trade-time dashboard market context
  • execution benchmark quality
  • persisted forward outcome marks
  • quality counts for missing or stale enrichment
  • freshness state for the last successful analytics run
  • derived-on-read opportunity alignment

The analytics backfill reuses internal_trade_sync_state under the trade_analytics_pipeline source. It is safe to re-run because the underlying enrichment tables upsert by source trade ID and context key.

1. Check Trade Analytics Status

curl -sS \
  'https://vol-backend-244916812493.us-east4.run.app/api/internal/trades/analytics/status?source=alpha_sync'

Optional portfolio-scoped status:

curl -sS \
  'https://vol-backend-244916812493.us-east4.run.app/api/internal/trades/analytics/status?source=alpha_sync&portfolio=IMST'

Read the response as:

  • quality.total_trades: Alpha rows in the selected scope.
  • quality.missing_context: rows without trade-time market context.
  • quality.stale_context: rows where nearest market context was outside tolerance.
  • quality.unmapped_context: rows whose Alpha underlying does not map to a dashboard market proxy.
  • quality.missing_benchmark: rows without an execution benchmark record.
  • quality.unavailable_benchmark: rows where no defensible quote/model mark exists.
  • quality.available_outcomes: rows with a sourced 7d forward mark and computable P&L.
  • quality.pending_outcomes: rows where the 7d horizon has not matured yet.
  • quality.unavailable_outcomes: rows without sourced outcome marks.
  • freshness.last_successful_sync_at: last completed analytics backfill state write.
  • freshness.last_error: last non-dry-run pipeline error, if any.

Unavailable counts are quality states, not zero-valued metrics. Do not interpret missing context, unavailable benchmarks, or unavailable outcomes as neutral performance.
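One way to keep that rule mechanical is a gate that refuses to call a scope clean while any quality counter is nonzero. A sketch, assuming the `quality.*` counts have already been extracted from the JSON response, e.g. with jq (`analytics_clean` is illustrative, not an existing tool):

```shell
# Treats any nonzero quality counter as "investigate", never as zero-valued data.
analytics_clean() {
  # Args: one or more quality.* counts, e.g.
  #   missing_context stale_context missing_benchmark unavailable_outcomes
  for n in "$@"; do
    if [ "$n" -ne 0 ]; then
      echo investigate
      return 1
    fi
  done
  echo clean
}

# Usage:
#   analytics_clean 0 0 0 0    # prints: clean
#   analytics_clean 0 3 0 12   # prints: investigate
```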

2. Dry-Run The Backfill

ADMIN_SECRET="$(gcloud secrets versions access latest --secret=vol-admin-secret)"

curl -sS -X POST \
  https://vol-backend-244916812493.us-east4.run.app/api/internal/trades/analytics/backfill \
  -H "x-admin-secret: ${ADMIN_SECRET}" \
  -H 'Content-Type: application/json' \
  -d '{
    "source": "alpha_sync",
    "limit": 500,
    "dry_run": true,
    "include_context": true,
    "include_execution": true,
    "include_outcomes": false
  }'

Expected:

  • status is ok or partial_error.
  • rows_processed shows rows processed by persisted enrichment steps.
  • steps lists market_context, execution_benchmarks, opportunity_alignment, and outcome_attribution.
  • opportunity_alignment is derived_on_read; it recomputes from persisted context rows when the API is requested.
  • outcome_attribution reads persisted marks when present. It remains blank/unavailable until sourced marks are backfilled.
  • Dry run does not update freshness state.
  • If one step fails, the others still run and the failed step is listed explicitly in errors.

3. Run The Backfill

Only run this after the dry-run output is acceptable:

curl -sS -X POST \
  https://vol-backend-244916812493.us-east4.run.app/api/internal/trades/analytics/backfill \
  -H "x-admin-secret: ${ADMIN_SECRET}" \
  -H 'Content-Type: application/json' \
  -d '{
    "source": "alpha_sync",
    "limit": 500,
    "dry_run": false,
    "include_context": true,
    "include_execution": true,
    "include_outcomes": false
  }'

After the real run, repeat the status check and verify:

  • freshness.last_successful_sync_at updated when there were no pipeline errors.
  • freshness.last_error is empty for a clean run.
  • freshness.last_run_summary.rows_processed reflects the latest non-dry-run enrichment batch.
  • missing/stale/unavailable counts are understood and documented before relying on /performance.
  • /performance loads and discloses quality states rather than converting unavailable values to zero.

4. Backfill Forward Outcome Marks

Forward marks are source-intensive because the backend looks up TradFi option quotes for the actual MSTR/COIN contract around each horizon. Run this as its own bounded batch, starting with a dry run:

curl -sS -X POST \
  https://vol-backend-244916812493.us-east4.run.app/api/internal/trades/outcomes/backfill \
  -H "x-admin-secret: ${ADMIN_SECRET}" \
  -H 'Content-Type: application/json' \
  -d '{
    "source": "alpha_sync",
    "limit": 100,
    "lookback_days": 365,
    "horizons": [1, 7, 30],
    "dry_run": true,
    "quote_window_minutes": 30,
    "max_mark_staleness_minutes": 60
  }'

If the dry run shows acceptable source coverage, rerun with "dry_run": false.

Interpretation:

  • unrealized_mark means sourced mark and P&L are available.
  • pending means the horizon has not matured.
  • unavailable means no fresh sourced mark or required economic input exists.
  • stale mark quality means a quote existed but was outside tolerance and is excluded from P&L.
  • Missing or stale marks are excluded from averages; they are not zero P&L.
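The exclusion rule can be made concrete: an average that skips unavailable or stale marks instead of counting them as zero. A sketch using integer P&L values and "NA" for excluded marks (`avg_pnl` is illustrative, not an existing tool):

```shell
# Averages only sourced marks; "NA" (unavailable or stale) rows are skipped,
# and an all-NA batch is reported as unavailable rather than 0.
avg_pnl() {
  sum=0; n=0
  for v in "$@"; do
    [ "$v" = "NA" ] && continue
    sum=$((sum + v)); n=$((n + 1))
  done
  if [ "$n" -gt 0 ]; then
    echo $((sum / n))
  else
    echo unavailable
  fi
}

# Usage:
#   avg_pnl 10 NA 20   # prints 15, not 10 (NA is excluded, not zero)
#   avg_pnl NA NA      # prints: unavailable
```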

5. Operational Sequence

Standard production sequence:

  1. Run Alpha full-scope sync diagnostic preview on /trades.
  2. Run Alpha real sync if the preview is clean.
  3. Review the post-sync analytics summary shown on /trades.
  4. Verify /performance.
  5. If the post-sync analytics summary is partial or failed, check analytics status.
  6. Dry-run analytics backfill.
  7. Run analytics backfill if the dry run is acceptable.
  8. Re-check analytics status.
  9. Verify /performance again.

Portfolio filtering scopes execution and outcome backfills. Use the global source=alpha_sync run for production refreshes unless you are deliberately refreshing one portfolio.

Trigger Daily Snapshot Manually

Snapshots are normally scheduled through Cloud Scheduler with an OIDC token from the compute service account. The scheduled/default run targets the latest complete UTC date (today - 1). To trigger manually, use the admin endpoint with the backend admin secret.

ADMIN_SECRET="$(gcloud secrets versions access latest --secret=vol-admin-secret)"

curl -sS -X POST \
  https://vol-backend-244916812493.us-east4.run.app/api/admin/snapshot \
  -H "x-admin-secret: ${ADMIN_SECRET}" \
  -H 'Content-Type: application/json' \
  -d '{}'

Pass target_date=YYYY-MM-DD only when you intentionally want a specific UTC snapshot date. Avoid broad historical backfills through the generic admin route unless the snapshot type is known to be historical/as-of safe.
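If you do pass target_date, deriving it the same way the scheduler does avoids off-by-one dates. A sketch assuming GNU date (on macOS, substitute `date -u -v-1d +%Y-%m-%d`):

```shell
# Derive the scheduled default: the latest complete UTC date (today - 1).
target_date="$(date -u -d 'yesterday' +%Y-%m-%d)"
echo "snapshot target_date=${target_date}"

# Then pass it through the admin route only when that specific date is intended:
#   curl -sS -X POST https://vol-backend-244916812493.us-east4.run.app/api/admin/snapshot \
#     -H "x-admin-secret: ${ADMIN_SECRET}" \
#     -H 'Content-Type: application/json' \
#     -d "{\"target_date\": \"${target_date}\"}"
```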

Do not print the admin secret.

Verify Snapshot-Backed Pages

Start with /data. It is the first-class snapshot observability page and should show:

  • Latest status for gex, iv_surface, vol_metrics, spot, and oi by currency.
  • Freshness, source timestamp, record count, created time, and suspicious count/freshness warnings.
  • Latest status is evaluated against the latest complete UTC snapshot date (today - 1). Current-day partial rows can still appear in the coverage matrix, but should not make the latest-health cards look stale or suspicious.
  • Recent coverage matrix by UTC snapshot date so missing or empty rows are visible without running SQL.
  • Probability readiness for BTC/ETH/SOL 7d and 30d, including density_quality.state, area, input points, surface timestamp, and spot source.
  • A remediation-history panel. In the first pass this is derived from snapshot tables only; until a persisted remediation-event table is added, durable manual-backfill audit history must live in deployment/runbook notes.

After snapshot changes or backfills, check:

  • /data: snapshot capture health, coverage gaps, and probability readiness.
  • /analytics: gamma profile, live GEX magnet/repeller levels with freshness state, thresholds, spot-vol, smile tracking, regime stationarity.
  • /probability: implied density, conditional density, touch probabilities, and the experimental portfolio candidate screen. The regime overlay uses stored gex_hourly history, not Model Diagnostics live strike-level GEX.
  • /flow: GEX time series and conditional returns.

No-data states usually mean one of:

  • Not enough snapshot history.
  • Latest snapshot missing required detail rows.
  • Spot history missing from gex_hourly.
  • Amberdata endpoint returned no payload for the selected currency/period.

Alert Signals And Slack Dispatch

On /analytics, the Alert Signals card can refresh current alert candidates and dispatch to Slack through an admin route.

Safe workflow:

  1. Click refresh.
  2. Review alert text and severity.
  3. Use dry run first where available.
  4. Dispatch only if alert text is acceptable.

If Slack dispatch fails:

  • Check backend Slack webhook configuration.
  • Check admin proxy path allowlist includes /api/alerts/dispatch.
  • Check backend logs for upstream HTTP errors.

Missing Number Triage

Overview price missing

Check:

  • /api/reference-rates?asset=btc
  • If reference rates return forbidden/unavailable, the backend should fall back to the derivatives-metrics underlying price where possible.

Volume missing

Check:

  • /api/volume-24h?currency=BTC
  • The backend aggregates volumeUSD and contract volume from Amberdata level-1 quotes.
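Both triage endpoints above can be spot-checked from a terminal. A sketch (`triage_endpoint` is an illustrative helper, not an existing tool; it only wraps the endpoints named above):

```shell
# Prints the first bytes of the body plus the HTTP status for quick eyeballing.
triage_endpoint() {
  base=https://vol-backend-244916812493.us-east4.run.app
  curl -sS -w '\nHTTP %{http_code}\n' "${base}$1" | head -c 400
}

# Usage:
#   triage_endpoint '/api/reference-rates?asset=btc'
#   triage_endpoint '/api/volume-24h?currency=BTC'
```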

IV/RV or probability no-data

Check:

  • Snapshot history exists for iv_surface.
  • gex_hourly has recent spot rows.
  • Requested DTE has enough surface points.
  • Probability density quality passed. Degraded density usually means the selected IV surface is sparse, malformed, or still backed by old mixed-timestamp rows.

For IV surface remediation after the timestamp-preservation fix:

  1. Back up or export Cloud SQL before changing historical rows.
  2. Deploy the backend migration and snapshotter fix.
  3. Re-run iv_surface snapshots for affected currencies/dates, for example through the admin snapshot route for recent dates or back-end/backfill.py --types iv_surface for a controlled historical range.
  4. Confirm iv_surface_snapshots has distinct source timestamps per day and that probability API responses report density_quality.state = ok.
  5. If Amberdata retention prevents rebuilding older dates, treat those historical probability outputs as unavailable/degraded rather than using mixed surfaces.

Trade P&L blank

This may be correct. P&L should remain blank when required valuation inputs are missing or untrustworthy.

For Alpha rows, current known missing inputs include:

  • traded_iv
  • spot_at_trade
  • sourced execution-time bid/ask quote for the exact MSTR/COIN option contract
  • sourced forward mark for the exact MSTR/COIN option contract and horizon
  • authoritative realized P&L or validated lifecycle accounting

Production Incident Rule

If the dashboard shows a number that appears wrong, prioritize data accuracy over availability:

  1. Identify source endpoint/table.
  2. Check raw payload or DB row.
  3. Check normalization and unit conversion.
  4. Check UI formatting.
  5. If source or calculation cannot be verified, hide/blank the number rather than guessing.