Trade Copilot Evaluation Specification

This document defines the preparatory contract for accident-avoidance warnings, trade-action taxonomy, restructuring comparison, and candidate-screen diagnostics. It covers ENG-5246, ENG-5247, ENG-5248, and ENG-5250.

It is not an implementation and it does not authorize live recommendations. It defines the product and data contract that should be reviewed before code changes.

Purpose

The first valuable copilot behavior is accident avoidance:

Do not let the trader add exposure in a part of the surface where the current
regime and conditional distribution say the trade is poorly compensated.

The later behavior is restructuring evaluation:

Given the current book and regime, compare credible alternatives such as
buying back May, selling July, using June, trading a vertical, selling a 1x2,
or doing nothing.

Ticket Coverage

Ticket	Product concern	Preparatory output
`ENG-5246`	Accident-avoidance warnings for short upside vol restructurings.	Warning triggers, severity, evidence, and wording policy.
`ENG-5247`	Trade intent and action taxonomy.	Bounded action menu and intent-source policy.
`ENG-5248`	V1 restructuring evaluator.	Comparison dimensions and output contract.
`ENG-5250`	Candidate-screen diagnostics and filter provenance.	Filter counts, row reasons, and empty-state explanations.

Accident-Avoidance Warning Contract

Gordon's example:

trader is short May upside calls
spot rallies
trader buys back one strike and sells more of a higher strike, such as a 1x2-style roll
in a spot-up/vol-up regime, this can increase short vol exposure in exactly the area where vol may reprice higher

The system should eventually detect this type of structure and explain the risk.

Minimum warning output:

accident_warning = {
  trade_or_scenario_id,
  warning_type,
  warning_severity,
  surface_region,
  exposure_change,
  regime_evidence,
  conditional_distribution_evidence,
  term_structure_evidence,
  liquidity_evidence,
  trust_state,
  source_quality,
  message,
  caveats
}

Warning types:

Warning type	Meaning
`short_upside_vol_in_positive_spot_vol_region`	Proposed action adds short upside vol where spot-up/vol-up behavior is active.
`short_band_underpriced_by_conditional`	Proposed short option lies in a strike band where conditional probability is materially above vanilla.
`front_month_gamma_reintroduced`	Proposed action adds front-month gamma/short-vol exposure after the thesis says front-month sensitivity is dangerous.
`term_structure_wrong_bucket`	Proposed action sells the tenor most sensitive to the current regime when a later tenor appears less sensitive.
`low_confidence_no_strong_warning`	Possible issue exists but trust/source quality is too weak for assertive wording.
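
As an illustration of how one of these triggers might be evaluated, the sketch below checks the `short_band_underpriced_by_conditional` condition for a single proposed leg. The function name, its parameters, and the `min_ratio` default of 1.25 are hypothetical; this specification does not define what "materially above" means numerically.

```python
def short_band_underpriced(
    conditional_prob: float,
    vanilla_prob: float,
    leg_quantity: int,
    min_ratio: float = 1.25,  # hypothetical stand-in for "materially above"
) -> bool:
    """Return True when a proposed short leg sits in an underpriced band.

    Probabilities are for the strike band the leg covers. A negative
    leg_quantity means the leg is sold (signed-quantity convention is an
    assumption of this sketch).
    """
    # Only proposed short legs can raise this warning type.
    if leg_quantity >= 0:
        return False
    # Guard against division by zero: any positive conditional mass
    # against zero vanilla mass is treated as underpriced.
    if vanilla_prob <= 0:
        return conditional_prob > 0
    return conditional_prob / vanilla_prob >= min_ratio
```

A caller would still need to consult trust state and source quality before choosing the wording and severity of the resulting warning.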

Severity:

Severity	Use
`info`	Evidence is weak, missing, or exploratory.
`caution`	Evidence suggests risk, but trust is discounted or source coverage is incomplete.
`danger`	Evidence, trust, and exposure change all support a clear accident-avoidance warning.
`blocked`	Identity/source state is too poor to evaluate the action.
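
The severity tiers above could be assigned by a small rule function. The sketch below is illustrative only: the four boolean inputs are hypothetical names, and the thresholds that would produce them are an open decision point. Capping the result at `caution` when exposure does not increase is also an assumption of the sketch.

```python
from enum import Enum


class Severity(str, Enum):
    INFO = "info"
    CAUTION = "caution"
    DANGER = "danger"
    BLOCKED = "blocked"


def assign_severity(
    identity_ok: bool,
    evidence_supports_risk: bool,
    trust_is_full: bool,
    exposure_increases: bool,
) -> Severity:
    """Map hypothetical boolean inputs to the four severity tiers."""
    # Identity/source state is too poor to evaluate the action at all.
    if not identity_ok:
        return Severity.BLOCKED
    # Weak, missing, or exploratory evidence never escalates past info.
    if not evidence_supports_risk:
        return Severity.INFO
    # Evidence, trust, and exposure change must all agree for danger.
    if trust_is_full and exposure_increases:
        return Severity.DANGER
    # Anything short of full agreement is capped at caution (assumption).
    return Severity.CAUTION
```

Checking `blocked` first keeps an unevaluable action from ever being reported as merely `info`.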

Warning wording must be risk-control language, not formal trade advice.

Trade Intent And Action Taxonomy

The current shortcut that "buy means long vol" and "sell means short vol" is useful for a first pass but is not enough for a copilot. The system needs action and intent labels.

Action taxonomy:

Action	Meaning
`open_long_vol`	Buy option exposure intended to own volatility/convexity.
`open_short_vol`	Sell option exposure intended to harvest volatility premium.
`close_or_reduce`	Buy back short exposure or sell long exposure to reduce risk.
`roll_strike`	Move exposure from one strike to another in the same tenor.
`roll_tenor`	Move exposure from one expiry/tenor to another.
`vertical_spread`	Buy one strike and sell another in same expiry.
`calendar_or_diagonal`	Trade same or related strike exposure across expiries.
`ratio_spread`	Trade unequal quantities across strikes, including 1x2 structures.
`hedge_overlay`	Add exposure intended to hedge existing book risk.
`do_nothing`	Explicitly leave exposure unchanged.
`unknown`	Source data cannot determine the action.
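
A rough sketch of how far package shape alone can go in assigning these labels follows. The `Leg` type, the signed-quantity convention, and the rule that an empty package maps to `do_nothing` are all assumptions. The sketch deliberately ignores the existing book, so it cannot tell a `roll_strike` or `close_or_reduce` from a fresh spread; that gap is what the intent-source policy is meant to cover.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Leg:
    expiry: str    # e.g. "May"; the format is an assumption
    strike: float
    quantity: int  # signed: positive = buy, negative = sell


def classify_structure(legs: Sequence[Leg]) -> str:
    """Label a two-leg package by shape alone; anything else is unknown."""
    if not legs:
        return "do_nothing"
    if len(legs) != 2:
        return "unknown"
    first, second = legs
    # One leg must be bought and the other sold.
    if first.quantity * second.quantity >= 0:
        return "unknown"
    if first.expiry != second.expiry:
        return "calendar_or_diagonal"
    # Same expiry and same strike is an offsetting trade, not a spread.
    if first.strike == second.strike:
        return "unknown"
    if abs(first.quantity) == abs(second.quantity):
        return "vertical_spread"
    return "ratio_spread"
```

For example, buying one May call and selling two at a higher strike would be labeled `ratio_spread`, matching the 1x2 case in the table.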

Intent source:

Source	Meaning
`explicit_source`	Trade feed or user input supplies strategy intent.
`manual_tag`	Operator manually tags the trade.
`inferred_lifecycle`	System infers action from position lifecycle and nearby trades.
`direction_fallback`	System falls back to the assumption that a buy is long vol and a sell is short vol.
`unknown`	No defensible intent is available.

Every score should report its intent source. If intent is only inferred (`inferred_lifecycle`) or direction-based (`direction_fallback`), confidence should be lower than for explicit or manually tagged intent.
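
One way to express this rule is a per-source multiplier applied to a base confidence. The multiplier values below are hypothetical placeholders; the only property taken from this specification is that inferred and direction-based intent score lower than explicit or manually tagged intent.

```python
# Hypothetical multipliers, keyed by the intent-source labels above.
INTENT_SOURCE_DISCOUNT = {
    "explicit_source": 1.0,
    "manual_tag": 1.0,
    "inferred_lifecycle": 0.7,
    "direction_fallback": 0.4,
    "unknown": 0.0,
}


def discounted_confidence(base_confidence: float, intent_source: str) -> dict:
    """Return a score record that always reports its intent source."""
    # Unrecognized labels are treated as "unknown" rather than raising.
    source = intent_source if intent_source in INTENT_SOURCE_DISCOUNT else "unknown"
    factor = INTENT_SOURCE_DISCOUNT[source]
    return {
        "intent_source": source,
        "confidence": round(base_confidence * factor, 4),
    }
```

Returning the source alongside the number keeps the discount visible to whoever reads the score.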

V1 Restructuring Evaluator

The first evaluator should be a bounded comparator, not a full optimizer. It should compare the realistic actions Gordon mentioned:

buy back May and do nothing else
buy back May and sell June
buy back May and sell July
trade a vertical spread
trade a calendar or diagonal
sell a 1x2 or similar ratio structure
do nothing

Scoring dimensions:

Dimension	Question
Regime fit	Does the action add or remove exposure in the current spot-vol regime?
Surface edge	Is the relevant strike/tenor rich or cheap after smile and conditional adjustments?
Conditional distribution impact	Does the action sell options where conditional probability exceeds vanilla?
Term structure	Does the action move exposure into a more or less sensitive tenor?
Liquidity	Is the contract tradable enough to support the comparison?
Source quality	Are marks, identity, and surface inputs direct, proxy, estimated, stale, or unavailable?
Portfolio exposure	Does the action reduce or increase existing concentrated risk?
Confidence	Do trust-engine and source states support the conclusion?

Minimum evaluator output:

restructuring_candidate = {
  scenario_id,
  action_type,
  legs,
  regime_fit_score,
  surface_edge_score,
  conditional_distribution_score,
  term_structure_score,
  liquidity_score,
  source_quality_score,
  portfolio_exposure_score,
  confidence,
  warning_flags,
  rank,
  explanation
}
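
The output contract above might be carried by a small record type with a ranking helper, as sketched below. Several things here are assumptions: scores and confidence are taken to lie in [0, 1], the composite is an unweighted mean of the seven dimension scores multiplied by confidence, and `legs` and `explanation` are omitted for brevity. The specification does not prescribe a ranking formula.

```python
from dataclasses import dataclass, field

SCORE_FIELDS = (
    "regime_fit_score",
    "surface_edge_score",
    "conditional_distribution_score",
    "term_structure_score",
    "liquidity_score",
    "source_quality_score",
    "portfolio_exposure_score",
)


@dataclass
class RestructuringCandidate:
    scenario_id: str
    action_type: str
    scores: dict       # keyed by SCORE_FIELDS, each assumed in [0, 1]
    confidence: float  # assumed in [0, 1]
    warning_flags: list = field(default_factory=list)
    rank: int = 0      # 0 means "not yet ranked"


def rank_candidates(candidates):
    """Assign rank 1..n by confidence-weighted mean score (an assumption)."""

    def composite(candidate):
        total = sum(candidate.scores[name] for name in SCORE_FIELDS)
        return (total / len(SCORE_FIELDS)) * candidate.confidence

    ordered = sorted(candidates, key=composite, reverse=True)
    for position, candidate in enumerate(ordered, start=1):
        candidate.rank = position
    return ordered
```

Because `do_nothing` is scored like any other candidate, it can rank first when the alternatives carry low confidence.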

The explanation should be direct. Example:

Selling more May upside improves premium but increases short exposure in a
front-month region where spot-vol sensitivity is positive and conditional
probability exceeds vanilla. Confidence is discounted because execution
liquidity is partial.

Candidate-Screen Diagnostics

The candidate screen ranks live option quotes against expected payoff under the conditional density and applies filters. Gordon and operators need to know why candidates appear or disappear.

Every candidate view should report:

starting candidate count
count removed by DTE filter
count removed by open-interest filter
count removed by premium filter
count removed by liquidity/source-quality filter
count removed by conditional-edge threshold
final candidate count
reason for empty results
source timestamps and trust state

Filter provenance output:

candidate_filter_summary = {
  starting_count,
  after_dte_count,
  after_open_interest_count,
  after_premium_count,
  after_liquidity_count,
  after_conditional_edge_count,
  final_count,
  removed_counts_by_filter,
  empty_state_reason,
  filter_settings,
  source_quality
}
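
A minimal sketch of how the count fields could be produced follows. It assumes filters are applied sequentially and are supplied as ordered (name, predicate) pairs, where a row survives when the predicate returns True; the function name and the wording of the empty-state reasons are hypothetical. `filter_settings` and `source_quality` are left out and would be attached by the caller.

```python
def summarize_filters(candidates, filters):
    """Apply filters in order and record how many rows each one removes."""
    summary = {"starting_count": len(candidates), "removed_counts_by_filter": {}}
    remaining = list(candidates)
    for name, keep in filters:
        survivors = [row for row in remaining if keep(row)]
        summary["removed_counts_by_filter"][name] = len(remaining) - len(survivors)
        summary[f"after_{name}_count"] = len(survivors)
        # Remember the first filter that emptied the screen.
        if remaining and not survivors and "empty_state_reason" not in summary:
            summary["empty_state_reason"] = f"all remaining rows removed by {name} filter"
        remaining = survivors
    summary["final_count"] = len(remaining)
    # No filter emptied the screen: either nothing came in, or rows survive.
    summary.setdefault(
        "empty_state_reason",
        "no starting candidates" if not candidates else None,
    )
    return summary, remaining
```

With filter names such as `dte` and `open_interest`, the generated keys line up with `after_dte_count` and `after_open_interest_count` in the contract above.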

Candidate rows should be described as construction context, not recommendations, until the warning and recommendation policy is approved.

Decision Points

Decision	Why it matters	Can proceed now?
Which action alternatives should V1 compare?	Keeps the evaluator bounded and trader-relevant.	Define menu now; final menu needs Gordon review.
What warning severity can appear before P&L data is complete?	Accident avoidance may be useful before outcome attribution is complete.	Define severity tiers now; strong warnings need approval.
What confidence threshold is required for `danger` warnings?	Prevents overconfident warnings from weak data.	Parameterize and document now.
Who supplies trade intent?	Buy/sell alone cannot identify hedges, closes, rolls, or spreads.	Add intent-source policy now; source integration later.
Should candidate rows be described as recommendations?	Affects legal/business interpretation and trader behavior.	Keep as diagnostics until approved.

Preparatory Acceptance

This specification is complete when:

accident warnings have named triggers and severity states
trade actions and intent-source labels are defined
restructuring comparison dimensions are explicit
candidate filters have provenance requirements
warning language is separated from formal recommendations
low-confidence and blocked states remain first-class outcomes