Trade Copilot Evaluation Specification

This document defines the preparatory contract for accident-avoidance warnings, trade-action taxonomy, restructuring comparison, and candidate-screen diagnostics. It covers ENG-5246, ENG-5247, ENG-5248, and ENG-5250.

It is not an implementation and it does not authorize live recommendations. It defines the product and data contract that should be reviewed before code changes.

Purpose

The first valuable copilot behavior is accident avoidance:

Do not let the trader add exposure in a part of the surface where the current
regime and conditional distribution say the trade is poorly compensated.

The later behavior is restructuring evaluation:

Given the current book and regime, compare credible alternatives such as
buying back May, selling July, using June, trading a vertical, selling a 1x2,
or doing nothing.

Ticket Coverage

Ticket | Product concern | Preparatory output
ENG-5246 | Accident-avoidance warnings for short upside vol restructurings. | Warning triggers, severity, evidence, and wording policy.
ENG-5247 | Trade intent and action taxonomy. | Bounded action menu and intent-source policy.
ENG-5248 | V1 restructuring evaluator. | Comparison dimensions and output contract.
ENG-5250 | Candidate-screen diagnostics and filter provenance. | Filter counts, row reasons, and empty-state explanations.

Accident-Avoidance Warning Contract

Gordon's example:

  • trader is short May upside calls
  • spot rallies
  • trader buys back one strike and sells more of a higher strike, such as a 1x2-style roll
  • in a spot-up/vol-up regime, this can increase short vol in exactly the area where vol may reprice higher

The system should eventually detect this type of structure and explain the risk.

Minimum warning output:

accident_warning = {
  trade_or_scenario_id,
  warning_type,
  warning_severity,
  surface_region,
  exposure_change,
  regime_evidence,
  conditional_distribution_evidence,
  term_structure_evidence,
  liquidity_evidence,
  trust_state,
  source_quality,
  message,
  caveats
}
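As a minimal sketch of how the accident_warning record could be typed, the Python dataclass below mirrors the field names one-for-one. The types (free-text surface_region, a float exposure_change, None standing for missing evidence) and the missing_evidence helper are assumptions for illustration, not part of the contract.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Severity tiers from the Severity table in this specification.
SEVERITIES = ("info", "caution", "danger", "blocked")


@dataclass
class AccidentWarning:
    """One accident-avoidance warning; field names mirror accident_warning."""
    trade_or_scenario_id: str
    warning_type: str                 # e.g. "front_month_gamma_reintroduced"
    warning_severity: str             # must be one of SEVERITIES
    surface_region: str               # free-text region label (assumption)
    exposure_change: float            # signed change; units are an assumption
    regime_evidence: Optional[str]    # None means the evidence is missing
    conditional_distribution_evidence: Optional[str]
    term_structure_evidence: Optional[str]
    liquidity_evidence: Optional[str]
    trust_state: str
    source_quality: str
    message: str                      # risk-control wording, not trade advice
    caveats: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject severities outside the four tiers the spec defines.
        if self.warning_severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.warning_severity!r}")

    def missing_evidence(self) -> List[str]:
        """Names of evidence fields that are absent (hypothetical helper)."""
        names = ("regime_evidence", "conditional_distribution_evidence",
                 "term_structure_evidence", "liquidity_evidence")
        return [n for n in names if getattr(self, n) is None]
```

Keeping missing evidence as an explicit None, rather than an empty string, makes it easy to turn gaps into caveats and to keep weak-evidence warnings at a low severity.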

Warning types:

Warning type | Meaning
short_upside_vol_in_positive_spot_vol_region | Proposed action adds short upside vol where spot-up/vol-up behavior is active.
short_band_underpriced_by_conditional | Proposed short option lies in a strike band where conditional probability is materially above vanilla.
front_month_gamma_reintroduced | Proposed action adds front-month gamma/short-vol exposure after the thesis says front-month sensitivity is dangerous.
term_structure_wrong_bucket | Proposed action sells the tenor most sensitive to the current regime when a later tenor appears less sensitive.
low_confidence_no_strong_warning | Possible issue exists but trust/source quality is too weak for assertive wording.
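The first trigger can be sketched for Gordon's 1x2-style roll. This assumes each proposed leg carries a per-contract vega and that the regime is summarized by a single hypothetical spot_vol_beta, where a positive value means spot-up/vol-up. Neither input is defined by this spec, and the vega numbers in the usage are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Leg:
    expiry: str
    strike: float
    right: str        # "C" for call, "P" for put
    quantity: int     # signed: positive = buy, negative = sell
    vega: float       # per-contract vega of a long position (assumed input)


def upside_vega_change(legs: List[Leg], spot: float) -> float:
    """Net vega the proposed legs add in calls struck above spot.

    A negative result means the action adds short upside vol.
    """
    return sum(leg.quantity * leg.vega
               for leg in legs if leg.right == "C" and leg.strike > spot)


def upside_vol_trigger(legs: List[Leg], spot: float, spot_vol_beta: float,
                       trust_ok: bool) -> Optional[str]:
    """Return a warning_type, or None if this trigger does not fire."""
    adds_short_upside = upside_vega_change(legs, spot) < 0
    spot_up_vol_up = spot_vol_beta > 0   # hypothetical regime signal
    if not (adds_short_upside and spot_up_vol_up):
        return None
    if not trust_ok:
        # A possible issue exists, but the wording must stay non-assertive.
        return "low_confidence_no_strong_warning"
    return "short_upside_vol_in_positive_spot_vol_region"
```

For example, buying back one May 105 call (vega 0.12) and selling two May 110 calls (vega 0.10 each) with spot at 102 nets about -0.08 vega above spot, so the trigger fires when the regime is spot-up/vol-up. Buying back the 105 call alone does not fire it.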

Severity:

Severity | Use
info | Evidence is weak, missing, or exploratory.
caution | Evidence suggests risk, but trust is discounted or source coverage is incomplete.
danger | Evidence, trust, and exposure change all support a clear accident-avoidance warning.
blocked | Identity/source state is too poor to evaluate the action.
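The severity tiers can be read as an ordered decision. The sketch below assumes evidence strength and trust are normalized to [0, 1]; min_evidence and danger_trust_min are hypothetical, parameterized placeholders, since the danger threshold is still an open decision.

```python
from typing import Optional


def assign_severity(*, identity_ok: bool, source_usable: bool,
                    evidence: Optional[float], trust: float,
                    coverage_complete: bool, adds_exposure: bool,
                    min_evidence: float = 0.5,
                    danger_trust_min: float = 0.8) -> str:
    """Map evidence, trust, and exposure change to a severity tier.

    Threshold defaults are placeholders, not approved values.
    """
    # blocked: identity or source state is too poor to evaluate the action.
    if not identity_ok or not source_usable:
        return "blocked"
    # info: evidence is missing or below the minimum strength.
    if evidence is None or evidence < min_evidence:
        return "info"
    # danger: evidence, trust, and exposure change must all agree.
    if trust >= danger_trust_min and coverage_complete and adds_exposure:
        return "danger"
    # caution: evidence suggests risk, but trust, coverage, or the
    # exposure change falls short of a danger warning.
    return "caution"
```

Checking blocked first keeps it a first-class outcome: a poor identity state is never masked by strong-looking evidence.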

Warning wording must be risk-control language, not formal trade advice.

Trade Intent And Action Taxonomy

The current shortcut that "buy means long vol" and "sell means short vol" is useful for a first pass, but it is not enough for a copilot: buy/sell direction alone cannot distinguish closes, rolls, spreads, or hedges. The system needs explicit action and intent labels.

Action taxonomy:

Action | Meaning
open_long_vol | Buy option exposure intended to own volatility/convexity.
open_short_vol | Sell option exposure intended to harvest volatility premium.
close_or_reduce | Buy back short exposure or sell long exposure to reduce risk.
roll_strike | Move exposure from one strike to another in the same tenor.
roll_tenor | Move exposure from one expiry/tenor to another.
vertical_spread | Buy one strike and sell another in the same expiry.
calendar_or_diagonal | Trade same or related strike exposure across expiries.
ratio_spread | Trade unequal quantities across strikes, including 1x2 structures.
hedge_overlay | Add exposure intended to hedge existing book risk.
do_nothing | Explicitly leave exposure unchanged.
unknown | Source data cannot determine the action.
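Some of these labels can be inferred from leg structure plus the existing position, which is roughly what the inferred_lifecycle intent source implies. The sketch below is a hypothetical heuristic, assuming positions are keyed by (expiry, strike, right) with signed quantities. It never returns hedge_overlay, because hedging intent cannot be read from legs alone, and it falls back to unknown for anything it cannot classify.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Position = Dict[Tuple[str, float, str], int]   # signed quantity held


@dataclass(frozen=True)
class Leg:
    expiry: str
    strike: float
    right: str       # "C" or "P"
    quantity: int    # signed: positive = buy, negative = sell


def _closes(leg: Leg, position: Position) -> bool:
    """True if the leg trades against an existing position in the same contract."""
    held = position.get((leg.expiry, leg.strike, leg.right), 0)
    return held != 0 and (held > 0) != (leg.quantity > 0)


def classify_action(legs: List[Leg], position: Position) -> str:
    legs = [leg for leg in legs if leg.quantity != 0]
    if not legs:
        return "do_nothing"
    if len(legs) == 1:
        leg = legs[0]
        if _closes(leg, position):
            return "close_or_reduce"
        # Direction fallback: the weakest inference the taxonomy allows.
        return "open_long_vol" if leg.quantity > 0 else "open_short_vol"
    if len(legs) == 2:
        a, b = legs
        opposite = (a.quantity > 0) != (b.quantity > 0)
        if not opposite or a.right != b.right:
            return "unknown"
        if a.expiry == b.expiry and a.strike == b.strike:
            return "unknown"
        rolls = _closes(a, position) or _closes(b, position)
        if a.expiry == b.expiry:
            if abs(a.quantity) != abs(b.quantity):
                return "ratio_spread"       # includes Gordon's 1x2-style roll
            return "roll_strike" if rolls else "vertical_spread"
        return "roll_tenor" if rolls else "calendar_or_diagonal"
    # Three or more legs need an explicit intent source.
    return "unknown"
```

Under this heuristic, a 1x2-style roll is labeled ratio_spread even though it also closes a leg, which is one reason the intent source should be reported next to the label.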

Intent source:

Source | Meaning
explicit_source | Trade feed or user input supplies strategy intent.
manual_tag | Operator manually tags the trade.
inferred_lifecycle | System infers action from position lifecycle and nearby trades.
direction_fallback | System falls back to buy/sell long-vol/short-vol assumption.
unknown | No defensible intent is available.

Every score should report the intent source. If intent is only inferred or direction-based, confidence should be lower.
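One way to make "confidence should be lower" concrete is a per-source multiplier applied to a base confidence. The multiplier values below are hypothetical placeholders; only their ordering (explicit intent discounted least, unknown intent zeroed) follows the policy above.

```python
from enum import Enum
from typing import Dict


class IntentSource(Enum):
    EXPLICIT_SOURCE = "explicit_source"
    MANUAL_TAG = "manual_tag"
    INFERRED_LIFECYCLE = "inferred_lifecycle"
    DIRECTION_FALLBACK = "direction_fallback"
    UNKNOWN = "unknown"


# Placeholder multipliers; real values are a policy decision.
INTENT_CONFIDENCE_MULTIPLIER: Dict[IntentSource, float] = {
    IntentSource.EXPLICIT_SOURCE: 1.0,
    IntentSource.MANUAL_TAG: 0.9,
    IntentSource.INFERRED_LIFECYCLE: 0.7,
    IntentSource.DIRECTION_FALLBACK: 0.5,
    IntentSource.UNKNOWN: 0.0,
}


def discounted_confidence(base: float, source: IntentSource) -> float:
    """Scale a base confidence in [0, 1] by how defensible the intent is."""
    if not 0.0 <= base <= 1.0:
        raise ValueError("base confidence must lie in [0, 1]")
    return base * INTENT_CONFIDENCE_MULTIPLIER[source]


def score_with_intent(score: float, base_confidence: float,
                      source: IntentSource) -> dict:
    """Every score carries its intent source alongside the discounted confidence."""
    return {"score": score,
            "intent_source": source.value,
            "confidence": discounted_confidence(base_confidence, source)}
```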

V1 Restructuring Evaluator

The first evaluator should be a bounded comparator, not a full optimizer. It should compare the realistic actions Gordon mentioned:

  • buy back May and do nothing else
  • buy back May and sell June
  • buy back May and sell July
  • trade a vertical spread
  • trade a calendar or diagonal
  • sell a 1x2 or similar ratio structure
  • do nothing

Scoring dimensions:

Dimension | Question
Regime fit | Does the action add or remove exposure in the current spot-vol regime?
Surface edge | Is the relevant strike/tenor rich or cheap after smile and conditional adjustments?
Conditional distribution impact | Does the action sell options where conditional probability exceeds vanilla?
Term structure | Does the action move exposure into a more or less sensitive tenor?
Liquidity | Is the contract tradable enough to support the comparison?
Source quality | Are marks, identity, and surface inputs direct, proxy, estimated, stale, or unavailable?
Portfolio exposure | Does the action reduce or increase existing concentrated risk?
Confidence | Do trust-engine and source states support the conclusion?

Minimum evaluator output:

restructuring_candidate = {
  scenario_id,
  action_type,
  legs,
  regime_fit_score,
  surface_edge_score,
  conditional_distribution_score,
  term_structure_score,
  liquidity_score,
  source_quality_score,
  portfolio_exposure_score,
  confidence,
  warning_flags,
  rank,
  explanation
}
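A bounded comparator can rank the candidates by a weighted sum of the dimension scores scaled by confidence. This sketch assumes every dimension score lies in [-1, 1] with positive meaning favorable; equal default weights and the confidence scaling are illustrative choices, not the spec's scoring rule. "do nothing" is scored like any other candidate so it can serve as the baseline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

DIMENSIONS = ("regime_fit_score", "surface_edge_score",
              "conditional_distribution_score", "term_structure_score",
              "liquidity_score", "source_quality_score",
              "portfolio_exposure_score")


@dataclass
class RestructuringCandidate:
    scenario_id: str
    action_type: str
    legs: list
    regime_fit_score: float
    surface_edge_score: float
    conditional_distribution_score: float
    term_structure_score: float
    liquidity_score: float
    source_quality_score: float
    portfolio_exposure_score: float
    confidence: float                      # assumed in [0, 1]
    warning_flags: List[str] = field(default_factory=list)
    rank: Optional[int] = None
    explanation: str = ""


def composite(c: RestructuringCandidate, weights: Dict[str, float]) -> float:
    """Weighted dimension sum, scaled by confidence (hypothetical rule)."""
    raw = sum(weights.get(d, 1.0) * getattr(c, d) for d in DIMENSIONS)
    return raw * c.confidence


def rank_candidates(cands: List[RestructuringCandidate],
                    weights: Optional[Dict[str, float]] = None
                    ) -> List[RestructuringCandidate]:
    """Sort best-first and write 1-based ranks back onto each candidate."""
    weights = weights or {}
    ordered = sorted(cands, key=lambda c: composite(c, weights), reverse=True)
    for i, c in enumerate(ordered, start=1):
        c.rank = i
    return ordered
```

Scaling by confidence pulls low-trust candidates toward zero, so a weakly supported alternative cannot outrank "do nothing" on a thin positive edge.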

The explanation should be direct. Example:

Selling more May upside improves premium but increases short exposure in a
front-month region where spot-vol sensitivity is positive and conditional
probability exceeds vanilla. Confidence is discounted because execution
liquidity is partial.
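The example follows a benefit, risk, confidence-caveat shape. A minimal sketch of assembling that wording from clauses is below; the function names are hypothetical, and the clauses would come from the scored dimensions and warning flags.

```python
from typing import List


def join_clauses(clauses: List[str]) -> str:
    """Join clauses as 'a', 'a and b', or 'a, b and c'."""
    if not clauses:
        return ""
    if len(clauses) == 1:
        return clauses[0]
    return ", ".join(clauses[:-1]) + " and " + clauses[-1]


def build_explanation(benefit: str, risks: List[str],
                      confidence_caveats: List[str]) -> str:
    """Benefit first, then risks, then why confidence is discounted."""
    sentence = benefit
    if risks:
        sentence += " but " + join_clauses(risks)
    parts = [sentence + "."]
    if confidence_caveats:
        parts.append("Confidence is discounted because "
                     + join_clauses(confidence_caveats) + ".")
    return " ".join(parts)
```

Fed the clauses from the example above, this reproduces its two sentences exactly.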

Candidate-Screen Diagnostics

The candidate screen ranks live option quotes against expected payoff under the conditional density and applies filters. Gordon and operators need to know why candidates appear or disappear.

Every candidate view should report:

  • starting candidate count
  • count removed by DTE filter
  • count removed by open-interest filter
  • count removed by premium filter
  • count removed by liquidity/source-quality filter
  • count removed by conditional-edge threshold
  • final candidate count
  • reason for empty results
  • source timestamps and trust state

Filter provenance output:

candidate_filter_summary = {
  starting_count,
  after_dte_count,
  after_open_interest_count,
  after_premium_count,
  after_liquidity_count,
  after_conditional_edge_count,
  final_count,
  removed_counts_by_filter,
  empty_state_reason,
  filter_settings,
  source_quality
}
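The summary can be produced by applying the filters in a fixed order and recording the survivors after each step. This sketch assumes hypothetical quote fields (dte, open_interest, premium, liquidity_ok, conditional_edge) and hypothetical setting keys; only the summary keys come from the contract above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Quote:
    contract_id: str
    dte: int
    open_interest: int
    premium: float
    liquidity_ok: bool        # liquidity/source-quality check (assumed flag)
    conditional_edge: float


FILTER_ORDER = ("dte", "open_interest", "premium", "liquidity", "conditional_edge")


def summarize_filters(quotes: List[Quote], settings: Dict[str, float],
                      source_quality: str) -> Tuple[dict, List[Quote]]:
    predicates = {
        "dte": lambda q: settings["min_dte"] <= q.dte <= settings["max_dte"],
        "open_interest": lambda q: q.open_interest >= settings["min_open_interest"],
        "premium": lambda q: q.premium >= settings["min_premium"],
        "liquidity": lambda q: q.liquidity_ok,
        "conditional_edge": lambda q: q.conditional_edge >= settings["min_conditional_edge"],
    }
    summary: dict = {"starting_count": len(quotes)}
    removed: Dict[str, int] = {}
    remaining = list(quotes)
    empty_reason = "no quotes received" if not quotes else None
    for name in FILTER_ORDER:
        kept = [q for q in remaining if predicates[name](q)]
        removed[name] = len(remaining) - len(kept)
        # Attribute an empty result to the first filter that emptied the set.
        if remaining and not kept and empty_reason is None:
            empty_reason = f"all remaining candidates removed by {name} filter"
        remaining = kept
        summary[f"after_{name}_count"] = len(remaining)
    summary.update(final_count=len(remaining), removed_counts_by_filter=removed,
                   empty_state_reason=empty_reason, filter_settings=dict(settings),
                   source_quality=source_quality)
    return summary, remaining
```

Because removals are attributed sequentially, a quote failing several filters is counted only under the first one it fails; reporting filter_settings alongside the counts keeps that order dependence visible.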

Candidate rows should be described as construction context, not recommendations, until the warning and recommendation policy is approved.

Decision Points

Decision | Why it matters | Can proceed now?
Which action alternatives should V1 compare? | Keeps the evaluator bounded and trader-relevant. | Define menu now; final menu needs Gordon's review.
What warning severity can appear before P&L data is complete? | Accident avoidance may be useful before outcome attribution is complete. | Define severity tiers now; strong warnings need approval.
What confidence threshold is required for danger warnings? | Prevents overconfident warnings from weak data. | Parameterize and document now.
Who supplies trade intent? | Buy/sell alone cannot identify hedges, closes, rolls, or spreads. | Add intent-source policy now; source integration later.
Should candidate rows be described as recommendations? | Affects legal/business interpretation and trader behavior. | Keep as diagnostics until approved.

Preparatory Acceptance

This specification is complete when:

  • accident warnings have named triggers and severity states
  • trade actions and intent-source labels are defined
  • restructuring comparison dimensions are explicit
  • candidate filters have provenance requirements
  • warning language is separated from formal recommendations
  • low-confidence and blocked states remain first-class outcomes