How the Engine Computes Historical Context

Overview

Konseki compares a symbol's recent price behaviour against 10+ years of historical conditions across hundreds of S&P 500 symbols, scoring candidates on structural similarity rather than identical price levels. The matches it finds are returned with their forward outcomes, not as a prediction but as a documented historical distribution.

Every output is structured around four blocks: meta (identity and freshness), benchmark (the current symbol's condition), analysis (aggregate statistics across the match set), and matches (the individual historical analogs themselves, unordered). This page explains how each of those blocks is actually computed.

Block: `matches`

What defines a historical match

A match is a historical period — for a possibly different symbol — whose price behaviour over the lookback window was structurally similar to the benchmark symbol's current behaviour. Similarity is computed across seven independent dimensions, captured in matches[].score_components:

Component	What it measures
`normalized_price_correlation`	How closely the price path's shape correlates with the benchmark's, independent of absolute price level.
`shape_distance`	Geometric distance between the two price paths after normalisation — the core structural similarity measure.
`volatility_distance`	Difference in realised volatility over the lookback window.
`trend_distance`	Difference in directional slope over the lookback window.
`range_position_distance`	Difference in where price sits within its own recent high/low range.
`volume_distance`	Difference in volume behaviour over the lookback window.
`risk_distance`	Difference in downside/drawdown character over the lookback window.

These seven components combine into a single matches[].similarity_score, where lower means closer. The match search runs across the full universe, not just the benchmark symbol's own history. This is the cross-market premise the entire engine is built on: structurally similar conditions can appear in a different symbol, a different sector, or a different period than the one being analysed, and the search treats all of those as valid evidence.

Matches are returned unordered. rank is deliberately not a field in the response — a fixed rank would only ever be correct for one sort order. Sort or filter the match array client-side by whichever component matters most for your use case.
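Because ordering is left to the client, sorting is usually the first thing done with the match array. The sketch below is a minimal Python illustration, assuming the response has already been parsed into a dict. The sample values and the `symbol` field are invented for the example; `similarity_score` and `score_components` are the fields described above.

```python
# Hypothetical, trimmed response: only the fields needed for sorting are shown.
response = {
    "matches": [
        {"symbol": "MSFT", "similarity_score": 0.42,
         "score_components": {"shape_distance": 0.30, "volatility_distance": 0.10}},
        {"symbol": "XOM", "similarity_score": 0.35,
         "score_components": {"shape_distance": 0.45, "volatility_distance": 0.05}},
        {"symbol": "AAPL", "similarity_score": 0.51,
         "score_components": {"shape_distance": 0.20, "volatility_distance": 0.40}},
    ]
}


def sort_matches(matches, component=None, descending=False):
    """Return a new list of matches ordered by one key.

    With no component, sort on the combined similarity_score (lower is closer).
    Otherwise sort on one entry of score_components. Distances sort ascending;
    pass descending=True for a measure where larger means closer.
    """
    if component is None:
        key = lambda m: m["similarity_score"]
    else:
        key = lambda m: m["score_components"][component]
    return sorted(matches, key=key, reverse=descending)


closest_overall = sort_matches(response["matches"])
closest_by_shape = sort_matches(response["matches"], "shape_distance")
```

`sorted` returns a new list, so the original unordered array stays available for a second sort on a different component.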

Block: `matches`, `analysis`

Match quality scoring

Separate from the similarity score, every match also receives a match quality score on a 1–5 scale, broken into six component scores in matches[].match_quality.scores:

shape_similarity — how closely the price path's geometric shape matches
trend_similarity — how closely the directional slope matches
volatility_similarity — how closely realised volatility matches
range_position_similarity — how closely the position within the recent range matches
volume_similarity — how closely volume behaviour matches
risk_similarity — how closely downside character matches

Each component score also exposes its raw distance value, the scoring scale, and whether higher values are better — in matches[].match_quality.components — so you can see exactly how a 1–5 score was derived from the underlying distance metric, not just trust the final number.
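The scale for each component is published in the response, so the mapping never has to be guessed. Purely to illustrate how a banded 1–5 scale of this kind can work, here is a small sketch. The threshold values, the function name, and the `higher_is_better` parameter name are assumptions made for the example, not the engine's actual values or field names.

```python
from bisect import bisect_right


def score_from_value(value, thresholds, higher_is_better=False):
    """Map a raw metric onto a 1-5 score using four ascending cut points.

    thresholds: four ascending numbers that split the range into five bands.
    For a distance (higher_is_better=False) the lowest band scores 5.
    """
    if len(thresholds) != 4 or list(thresholds) != sorted(thresholds):
        raise ValueError("expected four ascending thresholds")
    band = bisect_right(thresholds, value)  # 0..4, counted from the low end
    return band + 1 if higher_is_better else 5 - band


# Illustrative cut points only.
example_scale = [0.1, 0.2, 0.4, 0.8]
tight_match = score_from_value(0.05, example_scale)
loose_match = score_from_value(1.0, example_scale)
```

Reading the published scale alongside the raw distance in this way shows how close a given match was to the next band up or down, which the 1–5 score alone does not reveal.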

At the aggregate level, analysis.match_quality reports the median overall score and median shape score across the full match set, plus a quality_tag of strong, moderate, or weak. This is computed once per file, not per forward window — match quality describes how well the setups matched at entry, which doesn't change depending on which forward window you're looking at.
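If you filter the match array client-side, the file-level medians no longer describe your subset, and recomputing them is straightforward. The sketch below assumes each match exposes its overall score under a key named `overall` inside `match_quality`; that key name is a guess, while `scores.shape_similarity` is taken from the list above. It does not attempt to reproduce `quality_tag`, since the cut-offs behind strong, moderate, and weak are not stated here.

```python
from statistics import median


def median_match_quality(matches):
    """Return the median overall and median shape score for a set of matches."""
    if not matches:
        raise ValueError("no matches to aggregate")
    overall = [m["match_quality"]["overall"] for m in matches]  # assumed key
    shape = [m["match_quality"]["scores"]["shape_similarity"] for m in matches]
    return {"median_overall": median(overall), "median_shape": median(shape)}


# Invented sample values for illustration.
sample_matches = [
    {"match_quality": {"overall": 4, "scores": {"shape_similarity": 5}}},
    {"match_quality": {"overall": 3, "scores": {"shape_similarity": 2}}},
    {"match_quality": {"overall": 5, "scores": {"shape_similarity": 4}}},
]
summary = median_match_quality(sample_matches)
```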

Block: `benchmark`

Why benchmark.context spans 8 lookbacks

Each analysis file is scoped to a single lookback period — matches are searched against one specific window (meta.lookback_periods). But benchmark.context reports the symbol's volatility, slope, range position, volume z-score, and max drawdown across all 8 lookback periods — 5, 10, 15, 20, 25, 30, 40, and 50 days — regardless of which one the file's matches were searched on.

This is intentional, not redundant. It gives you cross-horizon regime awareness in a single fetch — looking at the LOOKBACK_15 file, you can still see whether the 50-day volatility regime looks calm or stressed, without making a second API call. The matches themselves are scoped to one lookback; the benchmark's own condition is reported across all of them.
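As a rough sketch of reading one metric across horizons from a single file: the layout below, with `benchmark.context` keyed by lookback length as a string and a `volatility` entry per lookback, is an assumption made for illustration. The actual key names may differ, so check a real response before relying on it.

```python
LOOKBACKS = (5, 10, 15, 20, 25, 30, 40, 50)


def metric_by_lookback(context, metric):
    """Return [(lookback_days, value), ...] in ascending lookback order.

    Lookbacks missing from the context are skipped rather than raising.
    """
    return [
        (lb, context[str(lb)][metric])
        for lb in LOOKBACKS
        if str(lb) in context and metric in context[str(lb)]
    ]


# Invented values; only three of the eight lookbacks are shown for brevity.
sample_context = {
    "15": {"volatility": 0.012},
    "5": {"volatility": 0.009},
    "50": {"volatility": 0.021},
}
vol_curve = metric_by_lookback(sample_context, "volatility")
```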

Block: `analysis`

How seasonality is computed

Seasonality in Konseki is cross-symbol, never single-symbol calendar history. This is a deliberate methodological choice, not an oversight — the entire engine's premise is structural similarity across the universe, and a single-symbol seasonal stat (e.g. "AAPL tends to do X in June") would be the one feature computed on a fundamentally different basis than everything else in the output.

Instead, analysis.seasonality takes the existing match set — the same matches already found via structural similarity — and splits it into two groups: matches whose entry date fell in the same calendar month as the benchmark's current date, and matches from all other months. It then compares forward outcomes between the two groups.

// seasonality.same_month and .other_months each report:
"count": 8, // sample size for this group
"returns": { "3": { ... }, "5": { ... }, ... }

Read the count before the percentage. same_month.count is frequently small — a handful of matches, sometimes fewer. A 75% positive rate on 4 same-month matches is a much weaker signal than the same rate on 40. Always check the count fields before drawing conclusions from the seasonality comparison.
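The split described above can be sketched in a few lines. This is an illustration of the idea rather than the engine's implementation: the `entry_date` and `forward_returns` field names are hypothetical, and the sample returns are invented. The summary deliberately reports `count` next to the rate so the two are always read together.

```python
from datetime import date


def summarise(returns):
    """Sample size and share of positive outcomes for one group."""
    n = len(returns)
    rate = sum(r > 0 for r in returns) / n if n else None
    return {"count": n, "positive_rate": rate}


def split_by_month(matches, benchmark_date, window):
    """Split forward returns into same-calendar-month and other-month groups."""
    same, other = [], []
    for m in matches:
        entry = date.fromisoformat(m["entry_date"])      # assumed field name
        outcome = m["forward_returns"][str(window)]      # assumed field name
        (same if entry.month == benchmark_date.month else other).append(outcome)
    return {"same_month": summarise(same), "other_months": summarise(other)}


sample = [
    {"entry_date": "2015-06-03", "forward_returns": {"5": 0.02}},
    {"entry_date": "2018-06-20", "forward_returns": {"5": -0.01}},
    {"entry_date": "2019-11-05", "forward_returns": {"5": 0.03}},
    {"entry_date": "2021-02-10", "forward_returns": {"5": 0.01}},
    {"entry_date": "2012-09-14", "forward_returns": {"5": -0.02}},
]
seasonality = split_by_month(sample, date(2026, 6, 15), window=5)
```

In this invented sample the same-month group holds only two matches, which is exactly the situation where the rate should be treated with caution.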

Block: `analysis.forward_outcome.{N}.tags`

What each tag means

Every forward window carries five machine-readable tags, computed independently per window — they can genuinely differ across horizons. A setup can be tagged bullish_strong at 3 days and mixed at 15 days as the evidence thins out or the historical pattern decays.

direction

The sign and strength of the historical skew for this window, derived from positive return rate and median return together, not either alone.

consistency

How tightly clustered historical outcomes were around the median. Wide dispersion across matches lowers consistency even if the average outcome looks clean.

reliability

Reflects evidence count and match diversity together — few matches, or matches concentrated in one symbol or time period, lower reliability regardless of how clean the outcome looks.

risk

Derived from intra-window drawdown behaviour (max adverse excursion) across the match set — a setup can have positive historical outcomes and still carry elevated intra-window risk.

outlier

Flags whether the aggregate statistics are being meaningfully driven by one or two extreme historical cases rather than the broader match set.

Tags are a triage layer, not a verdict. They're designed to let you filter across hundreds of symbols programmatically before reading the full distribution — not to replace reading the full distribution.
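A triage pass of this kind might look like the sketch below. It assumes you have already fetched and parsed one file per symbol into a dict keyed by symbol, and that forward windows are keyed by a string such as "3". The `direction` values come from the example above; the `reliability` values shown are placeholders, since the allowed values for that tag are not listed here.

```python
def filter_by_tags(files, window, **required):
    """Return symbols whose tags for one forward window match every required value."""
    hits = []
    for symbol, doc in files.items():
        tags = doc["analysis"]["forward_outcome"][str(window)]["tags"]
        if all(tags.get(name) == value for name, value in required.items()):
            hits.append(symbol)
    return hits


def _doc(tags_by_window):
    """Build a minimal parsed file for the example."""
    return {"analysis": {"forward_outcome": {
        w: {"tags": t} for w, t in tags_by_window.items()}}}


# Invented sample; "high" and "low" are placeholder reliability values.
files = {
    "AAPL": _doc({"3": {"direction": "bullish_strong", "reliability": "high"},
                  "15": {"direction": "mixed", "reliability": "high"}}),
    "XOM": _doc({"3": {"direction": "bullish_strong", "reliability": "low"},
                 "15": {"direction": "mixed", "reliability": "low"}}),
    "MSFT": _doc({"3": {"direction": "mixed", "reliability": "high"},
                  "15": {"direction": "mixed", "reliability": "high"}}),
}
shortlist = filter_by_tags(files, 3, direction="bullish_strong", reliability="high")
```

The shortlist is only a starting point: each symbol it returns still needs its full distribution read before any conclusion is drawn.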

Block: `meta`

Dates and historical snapshots

Every day's complete output is stored as an independent, immutable snapshot; nothing is overwritten in place. meta.data_through reports the market date the analysis reflects. It is distinct from meta.generated_at, the time the file was produced, because the pipeline runs after market close.

By default, the API resolves to the most recent available date. To query a specific historical date instead, pass it explicitly:

GET /v1/analysis/AAPL-NASDAQ?lookback=15&date=2026-01-02

This resolves directly to that date's stored snapshot — there's no reconstruction or recomputation involved, so historical queries return exactly what the engine actually output on that day, not a retroactively adjusted view.
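For client code, the request path can be assembled with the standard library. In this sketch the host in `BASE_URL` is a placeholder rather than the real API host, and the helper name is made up; the path and the `lookback` and `date` parameters follow the request shown above.

```python
from datetime import date
from urllib.parse import quote, urlencode

BASE_URL = "https://api.example.com"  # placeholder host, replace with the real one


def analysis_url(symbol, lookback, as_of=None):
    """Build the analysis URL; omit as_of to get the most recent snapshot."""
    params = {"lookback": lookback}
    if as_of is not None:
        params["date"] = as_of.isoformat()  # YYYY-MM-DD
    return f"{BASE_URL}/v1/analysis/{quote(symbol, safe='')}?{urlencode(params)}"


latest = analysis_url("AAPL-NASDAQ", 15)
historical = analysis_url("AAPL-NASDAQ", 15, as_of=date(2026, 1, 2))
```

Passing a `date` object rather than a raw string keeps the formatting consistent and rejects impossible dates before a request is ever sent.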

Known limitations

The engine is built to be honest about the strength of its own evidence. These are the specific places where that honesty matters most:

Small sample sizes are common, and the output says so

Some symbol/lookback combinations return very few historical matches, sometimes in the single digits. analysis.evidence_count and the reliability tag exist specifically so this is visible, not hidden behind a clean-looking percentage. A 100% positive rate on 2 matches is not strong evidence, and the commentary is written to say so explicitly.
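One way to see why a small count weakens a percentage is to put an interval around it. The sketch below uses the Wilson score interval, a standard statistical tool chosen here for illustration; it is not something the engine is stated to compute, and it treats matches as independent, which overlapping historical periods may not be.

```python
from math import sqrt


def wilson_interval(positives, n, z=1.96):
    """Approximate 95% Wilson score interval for a positive rate."""
    if n <= 0 or not 0 <= positives <= n:
        raise ValueError("need n > 0 and 0 <= positives <= n")
    p = positives / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


low_small, high_small = wilson_interval(2, 2)    # 100% positive on 2 matches
low_large, high_large = wilson_interval(40, 40)  # 100% positive on 40 matches
```

With 2 of 2 positive, the lower bound comes out at roughly 0.34, so the data are compatible with a true rate well below one half. With 40 of 40 it rises above 0.9.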

Commentary uses confidence-hedged language throughout

Every natural-language field in commentary is written to describe what historically happened, never to assert what will happen. You'll see "historically leaned," "tended to," "in similar past conditions" — never "will," "should," or unqualified directional claims.

The engine does not claim to predict outcomes

Konseki surfaces what happened in structurally similar historical conditions. It does not model causality, does not account for information not present in price/volume data, and does not claim that historical patterns will repeat. The output is evidence to weigh, not a forecast.

Total price history is 20+ years; producible historical snapshots are concentrated in the more recent ~10 years

Computing a valid match set for any given date, including a historical snapshot from years ago, requires enough prior data before that date to search against. With roughly 20 years of total price history available, only dates in the more recent ~10 years have enough prior history behind them to be searched in full. This is a data availability constraint, not a deliberate recency preference: as more historical data is added over time, the range of dates that can be fully computed will extend further back.
