Scoring Explorer

Scoring ranks pattern matches by how surprising they are. This page walks through how the three scorers work with concrete numbers.

The scoring module is a post-processing layer. The engine finds matches; the scorers rank them. Each scorer answers a different question:

| Scorer | Question | Unit |
|---|---|---|
| SurpriseScorer | How often does this pattern fire relative to my expectations? | bits (higher = rarer) |
| StuScorer | How rare are the properties of this particular match? | frequency (lower = rarer, except TfIdf) |
| SequentialScorer | How unexpected is this pattern given what just happened? | bits (higher = rarer) |

1. Pattern-Level Surprise (SurpriseScorer)

Shannon surprise compares observed frequency against a baseline expectation. The formula:

p        = (match_count + 1) / (total_rounds + 1)    # Laplace smoothing
surprise = -log2(p / baseline)                       # bits

The +1 terms are Laplace smoothing -- they prevent division by zero and give novel patterns a small nonzero probability rather than undefined surprise.
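The formula is a one-liner in Python (a sketch; the function name `shannon_surprise` is illustrative, not the library's API):

```python
import math

def shannon_surprise(match_count: int, total_rounds: int, baseline: float) -> float:
    """Pattern-level surprise in bits; positive means rarer than expected."""
    p = (match_count + 1) / (total_rounds + 1)  # Laplace smoothing
    return -math.log2(p / baseline)
```

With 3 matches in 10 rounds against a 0.5 baseline, this returns roughly 0.459 bits.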

Worked example

You set a baseline of 0.5 for the "betrayal" pattern (you expect it to fire about half the time). After 10 observation rounds, betrayal matched in 3 of them.

p = (3 + 1) / (10 + 1) = 4/11 = 0.364
surprise = -log2(0.364 / 0.5) = -log2(0.727) = 0.459 bits

Positive surprise: the pattern fires less often than expected.

Building intuition

This table shows how surprise changes as the match count varies across 10 rounds, with baseline = 0.5:

| Matches (of 10) | p = (n+1)/11 | p / baseline | Surprise (bits) | Interpretation |
|---|---|---|---|---|
| 0 | 0.091 | 0.182 | 2.46 | Much rarer than expected |
| 1 | 0.182 | 0.364 | 1.46 | Notably rare |
| 2 | 0.273 | 0.545 | 0.87 | Somewhat rare |
| 3 | 0.364 | 0.727 | 0.46 | Slightly rare |
| 5 | 0.545 | 1.091 | -0.13 | About as expected |
| 7 | 0.727 | 1.455 | -0.54 | Slightly common |
| 10 | 1.000 | 2.000 | -1.00 | Fires every round |

Key observations:

  • Surprise is zero when the smoothed probability p equals the baseline. Because of the Laplace smoothing, that point falls between 4 and 5 raw matches here, not exactly at 5 of 10.
  • Positive surprise means rarer than expected. Negative means more common.
  • The scale is logarithmic: 1 bit of surprise means the pattern fires at half the expected rate.

What the baseline means

The baseline is your prior expectation, not a frequency computed from data. It encodes domain knowledge:

  • A common social interaction in a simulation might get baseline 0.5.
  • A betrayal event might get baseline 0.1.
  • A once-per-playthrough climax might get baseline 0.01.

If you set baseline = 0.1 for betrayal and it fires 3 out of 10 rounds:

p = (3 + 1) / (10 + 1) = 0.364
surprise = -log2(0.364 / 0.1) = -log2(3.636) = -1.86 bits

Negative surprise -- betrayal is firing more than expected. The pattern is unsurprising relative to this baseline.


2. Property-Level Surprise (StuScorer)

The StU heuristic (Kreminski et al., ICIDS 2022) goes deeper than pattern identity. Two matches of the same "betrayal" pattern might differ in who is involved. A betrayal by a loyal character is more surprising than one by a known schemer.

The scorer tracks properties -- categorical attributes like "actor_trait=ambitious" or "target_role=king". The frequency of each property within a pattern's match history determines how surprising it is.

Frequency formula

freq(property) = (count + 1) / (total_matches + V)   # Laplace smoothing

Where V is the vocabulary size (number of distinct properties observed for this pattern). The vocabulary-scaled denominator prevents novel properties from collapsing to zero probability.
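As a minimal Python sketch (names are illustrative):

```python
def property_freq(count: int, total_matches: int, vocab_size: int) -> float:
    """Smoothed frequency of one property within a pattern's match history."""
    return (count + 1) / (total_matches + vocab_size)
```

A property never seen before still gets `1 / (total_matches + V)` rather than zero.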

Worked example

20 matches of the "betrayal" pattern have been observed. Two properties to track:

| Property | Matches containing it |
|---|---|
| actor_trait=ambitious | 3 |
| actor_trait=loyal | 15 |

The vocabulary size V = 2 (two distinct properties observed).

freq(ambitious) = (3 + 1) / (20 + 2)  = 4/22  = 0.182
freq(loyal)     = (15 + 1) / (20 + 2) = 16/22 = 0.727

Now score a match with both properties, using ArithmeticMean (the default):

raw_score = (0.182 + 0.727) / 2 = 0.455

Lower = more surprising. A match carrying one rare property (ambitious, 0.182) and one common one (loyal, 0.727) lands in the middle.

Aggregation modes

The same property frequencies can be combined four ways. Using the values above (ambitious = 0.182, loyal = 0.727):

| Mode | Formula | Result | Polarity |
|---|---|---|---|
| ArithmeticMean | (0.182 + 0.727) / 2 | 0.455 | Lower = more surprising |
| GeometricMean | exp((ln(0.182) + ln(0.727)) / 2) | 0.364 | Lower = more surprising |
| Min | min(0.182, 0.727) | 0.182 | Lower = more surprising |
| TfIdf | -log2(0.182) + (-log2(0.727)) | 2.919 | Higher = more surprising |

The modes encode different "theories of surprise":

  • ArithmeticMean -- the original StU heuristic. Average rarity.
  • GeometricMean -- a single rare property pulls the score down multiplicatively. More sensitive to outliers than arithmetic mean.
  • Min -- only the single rarest property matters. If any one property is unusual, the whole match is surprising.
  • TfIdf -- total information content. Sums the self-information of each property. Reversed polarity: higher values mean more surprise.
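The four modes can be sketched in a few lines of Python (the `aggregate` function and mode strings are illustrative, not the library's API):

```python
import math

def aggregate(freqs: list[float], mode: str = "ArithmeticMean") -> float:
    """Combine per-property frequencies into one match score (sketch)."""
    if mode == "ArithmeticMean":
        return sum(freqs) / len(freqs)
    if mode == "GeometricMean":
        return math.exp(sum(math.log(f) for f in freqs) / len(freqs))
    if mode == "Min":
        return min(freqs)
    if mode == "TfIdf":  # reversed polarity: higher = more surprising
        return sum(-math.log2(f) for f in freqs)
    raise ValueError(f"unknown mode: {mode}")
```

Feeding in the example frequencies `[4/22, 16/22]` reproduces the table above: 0.455, 0.364, 0.182, and 2.919 respectively.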

Cold-start confidence

With only a few observations, the frequency estimates are noisy. The scorer attenuates scores toward "unsurprising" when data is sparse.

confidence = 1 - 1 / (total_matches + 1)

| Matches observed | Confidence | Effect |
|---|---|---|
| 1 | 0.500 | Heavy attenuation -- scores halfway to unsurprising |
| 3 | 0.750 | Moderate attenuation |
| 10 | 0.909 | Mild attenuation |
| 50 | 0.980 | Negligible |
| 100 | 0.990 | Near-transparent |

For ArithmeticMean/GeometricMean/Min (lower = more surprising), the lerp pushes toward 1.0:

final_score = 1.0 - (1.0 - raw_score) * confidence

For TfIdf (higher = more surprising), it pushes toward 0.0:

final_score = raw_score * confidence

Concrete example: using the ArithmeticMean raw score of 0.455 from above:

| Observations | Confidence | Final score | vs. raw 0.455 |
|---|---|---|---|
| 3 | 0.750 | 1.0 - (1.0 - 0.455) * 0.750 = 0.591 | Pushed toward 1.0 (less surprising) |
| 10 | 0.909 | 1.0 - (1.0 - 0.455) * 0.909 = 0.505 | Closer to raw |
| 100 | 0.990 | 1.0 - (1.0 - 0.455) * 0.990 = 0.460 | Nearly raw |

The intuition: with 3 observations you do not yet know whether "ambitious" is truly rare or just hasn't shown up yet. The confidence weight hedges against premature conclusions.
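Both attenuation rules fit in one helper (a Python sketch under the formulas above; `attenuate` is an illustrative name):

```python
def confidence(total_matches: int) -> float:
    """Cold-start weight: approaches 1.0 as observations accumulate."""
    return 1.0 - 1.0 / (total_matches + 1)

def attenuate(raw_score: float, total_matches: int,
              higher_is_surprising: bool = False) -> float:
    """Lerp the raw score toward the 'unsurprising' end of its scale."""
    c = confidence(total_matches)
    if higher_is_surprising:                # TfIdf: unsurprising end is 0.0
        return raw_score * c
    return 1.0 - (1.0 - raw_score) * c     # mean/min modes: unsurprising end is 1.0
```

For the 0.455 raw score, this gives 0.591 at 3 observations and 0.460 at 100, as in the table.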


3. Sequential Surprise (SequentialScorer)

Sequential surprise uses a bigram model: given that pattern A just completed, how unexpected is it that pattern B completes next?

P(current | prev) = (count + 1) / (total + V)    # Laplace smoothing
surprise          = -log2(P(current | prev))     # bits

Where V is the number of distinct successors observed after prev, and total is the total number of transitions observed from prev.

Worked example

Over a simulation run, you observe these transitions after "betrayal" completes:

| Successor | Count |
|---|---|
| betrayal | 8 |
| reconciliation | 2 |
| exile | 1 |

Total transitions from "betrayal" = 11. Vocabulary V = 3 (three distinct successors).

Compute each transition probability with Laplace smoothing:

P(betrayal | betrayal)       = (8 + 1) / (11 + 3) = 9/14 = 0.643
P(reconciliation | betrayal) = (2 + 1) / (11 + 3) = 3/14 = 0.214
P(exile | betrayal)          = (1 + 1) / (11 + 3) = 2/14 = 0.143

Now compute sequential surprise:

| Transition | P(next \| betrayal) | Surprise (bits) | Interpretation |
|---|---|---|---|
| betrayal -> betrayal | 0.643 | 0.64 | Common follow-up, low surprise |
| betrayal -> reconciliation | 0.214 | 2.22 | Uncommon, narratively interesting |
| betrayal -> exile | 0.143 | 2.81 | Rare, potentially dramatic |

A novel successor (never seen before) also gets nonzero probability via Laplace smoothing:

P(forgiveness | betrayal) = (0 + 1) / (11 + 3) = 1/14 = 0.071
surprise = -log2(0.071) = 3.81 bits

The never-before-seen transition is the most surprising of all.
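The bigram bookkeeping fits in a small class (a Python sketch; the class name and methods are illustrative, not the library's API):

```python
import math
from collections import Counter, defaultdict

class BigramSurprise:
    """Tracks pattern-completion transitions and scores sequential surprise."""

    def __init__(self) -> None:
        self._successors: defaultdict[str, Counter] = defaultdict(Counter)

    def observe(self, prev: str, current: str) -> None:
        self._successors[prev][current] += 1

    def surprise(self, prev: str, current: str) -> float:
        counts = self._successors.get(prev)
        if not counts:
            raise ValueError(f"no transitions observed from {prev!r}")
        total = sum(counts.values())
        vocab = len(counts)  # distinct successors seen after prev
        p = (counts[current] + 1) / (total + vocab)  # Laplace smoothing
        return -math.log2(p)
```

Replaying the worked-example counts yields 0.64 bits for betrayal -> betrayal and 3.81 bits for the never-seen forgiveness successor.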

PMI correction

When two properties frequently co-occur, counting both independently inflates the surprise score. Pointwise Mutual Information (PMI) detects this correlation and corrects for it.

Worked example. 20 matches observed. Three properties tracked:

| Property | Matches containing it |
|---|---|
| actor_trait=warrior | 8 of 20 |
| actor_trait=aggressive | 6 of 20 |
| target_role=king | 4 of 20 |

"warrior" and "aggressive" co-occur in 5 of those 20 matches. Are they correlated?

P(warrior)             = 8/20 = 0.40
P(aggressive)          = 6/20 = 0.30
P(warrior, aggressive) = 5/20 = 0.25

Expected P(warrior AND aggressive) if independent = 0.40 * 0.30 = 0.12
PMI = log2(0.25 / 0.12) = log2(2.08) = 1.06 bits

PMI exceeds the 1-bit threshold, so the scorer flags this pair as correlated.

Before correction: Both warrior (freq = 0.40) and aggressive (freq = 0.30) contribute their individual rarity to the match score. A match with both properties gets "double surprise" from two seemingly rare traits.

After correction: The scorer replaces the marginal frequency of "aggressive" with the conditional frequency P(aggressive | warrior) = 5/8 = 0.625. The "aggressive" property is now much less rare in context -- if you are already a warrior, being aggressive is not surprising.

Effect: The match score decreases because the correlated surprise is removed. The remaining property ("target_role=king", freq = 0.20) still contributes its full rarity. The correction ensures that genuinely independent rare properties drive the score, not redundant co-occurrences.
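The detection step is a direct translation of the formulas (a Python sketch; `pmi_bits` is an illustrative name):

```python
import math

def pmi_bits(p_a: float, p_b: float, p_ab: float) -> float:
    """Pointwise mutual information between two properties, in bits."""
    return math.log2(p_ab / (p_a * p_b))

# Marginals and joint from the worked example above
p_warrior, p_aggressive, p_joint = 8 / 20, 6 / 20, 5 / 20
correlated = pmi_bits(p_warrior, p_aggressive, p_joint) > 1.0  # 1-bit threshold
p_aggr_given_warrior = p_joint / p_warrior  # conditional used after correction
```

This flags the pair as correlated (about 1.06 bits) and yields the conditional frequency 0.625 used in place of the marginal.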


Combining with pattern-level surprise

Sequential surprise and pattern-level surprise are independent measurements. A pattern can be common overall but surprising in context (or vice versa). Composing them is up to the caller -- for example, summing bits from SurpriseScorer and SequentialScorer gives joint information content.
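Because both scorers report bits, one caller-side composition is simple addition (a hypothetical combination, not a built-in; values taken from the worked examples above):

```python
pattern_bits = 0.46      # SurpriseScorer: betrayal fired 3/10 vs. baseline 0.5
sequential_bits = 2.22   # SequentialScorer: betrayal -> reconciliation
joint_bits = pattern_bits + sequential_bits  # 2.68 bits of joint information
```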


Putting it together

A typical scoring pipeline:

  1. Engine produces matches (batch or incremental).
  2. SurpriseScorer ranks by pattern-level rarity: "betrayal is rare this run."
  3. StuScorer ranks by property-level rarity: "this betrayal is unusual because of who is involved."
  4. SequentialScorer ranks by contextual surprise: "a betrayal right after reconciliation is unexpected."

Each scorer operates independently. The caller decides how to weight and combine them -- there is no built-in composite score.


Further reading