Methodology

How the simulation works, from plate appearances to AI decisions.

Simulation Engine

Deep Dugout simulates baseball at the plate appearance level. Each plate appearance resolves through a chained binomial decision tree: HBP, walk, strikeout, error, home run, hit type, or out type. The probabilities come from the Odds Ratio Method, which combines batter rates, pitcher rates, and league averages to produce matchup-specific outcomes.
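
The Odds Ratio Method fits in a few lines. A sketch (the function name and call shape are mine, not the project's): convert each rate to odds, multiply the batter's and pitcher's odds, divide out the league odds, and convert back to a probability.

```python
def odds_ratio(batter_rate, pitcher_rate, league_rate):
    """Combine batter, pitcher, and league rates into a matchup probability.

    When both the batter and pitcher match the league rate, the result
    is the league rate itself.
    """
    odds = ((batter_rate / (1 - batter_rate))
            * (pitcher_rate / (1 - pitcher_rate))
            / (league_rate / (1 - league_rate)))
    return odds / (1 + odds)
```

For example, a .200 strikeout-rate batter facing a .300 strikeout-rate pitcher in a .220 league comes out to roughly .275 for that matchup.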

The engine handles left/right splits, switch hitters, pitcher fatigue (times through the order + pitch count), and home field advantage. Baserunning uses speed-dependent advancement probabilities. All randomness flows through a seeded random number generator, making every game fully reproducible.
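
One way to implement the chained tree with a seeded generator (outcome names and rates here are illustrative, not the project's actual values):

```python
import random

def resolve_pa(outcomes, rng):
    """Resolve one plate appearance as a chain of binary trials.

    Each step asks "does this outcome happen, given that none of the
    earlier ones did?", so probabilities renormalize as mass is consumed.
    """
    remaining = 1.0
    for outcome, p in outcomes:
        if rng.random() < p / remaining:
            return outcome
        remaining -= p
    return "out_in_play"

# A seeded generator makes every game fully reproducible.
rng = random.Random(20_250_001)
pa = resolve_pa([("hbp", 0.01), ("walk", 0.08), ("strikeout", 0.22),
                 ("home_run", 0.03), ("single", 0.16)], rng)
```

Because every draw flows through one `random.Random` instance, replaying a game with the same seed replays the exact sequence of outcomes.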

Statistical Foundation

Player stats come from FanGraphs (2025 season actuals via the pybaseball library). Rosters come from the MLB Stats API (40-man rosters). Player IDs are cross-referenced using the Chadwick Baseball Bureau register, with a name+team fallback for newer players not yet in the Chadwick database.
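
The cross-referencing logic reduces to a lookup with a fallback. A simplified in-memory sketch (the real pipeline reads the Chadwick register, e.g. via pybaseball; the field names and id values here are assumptions for illustration):

```python
def resolve_fangraphs_id(register, mlbam_id, name, team):
    """Map an MLB Stats API (MLBAM) id to a FanGraphs id.

    Primary path: a Chadwick-style register keyed by MLBAM id.
    Fallback: name+team match for players not yet in the register.
    """
    entry = register.get(mlbam_id)
    if entry is not None:
        return entry["fangraphs_id"]
    for entry in register.values():
        if entry["name"] == name and entry["team"] == team:
            return entry["fangraphs_id"]
    return None  # caller falls back to league-average rates
```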

Players without FanGraphs data (fringe 40-man roster players with zero or minimal MLB innings) fall back to league averages. The simulation engine handles this gracefully — these players would not realistically make a 26-man World Series roster.

AI Manager Design

Each AI manager implements a ManagerInterface Protocol with four decision hooks:

  • set_lineup() — Set the batting order before each game
  • select_starting_pitcher() — Choose the starter
  • should_pull_pitcher() — Evaluate whether to make a pitching change
  • select_relief_pitcher() — Choose the replacement
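
The four hooks above can be expressed as a `typing.Protocol`; the method names come from the list, while the signatures are assumed for illustration:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class ManagerInterface(Protocol):
    """The four decision hooks every manager must implement."""

    def set_lineup(self, game_state): ...
    def select_starting_pitcher(self, game_state): ...
    def should_pull_pitcher(self, game_state) -> bool: ...
    def select_relief_pitcher(self, game_state): ...
```

Because the AI manager and the heuristic manager satisfy the same Protocol, the engine can swap one for the other without caring which is behind the interface.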

The AI manager calls the Claude API for each decision, with the full game state, roster context, and managerial personality as the prompt. A smart query gate reduces API calls by only consulting the AI in high-leverage situations (leverage index ≥ 1.5, high pitch counts, multiple runs allowed, etc.). Routine situations fall back to a heuristic manager silently.
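
The gate itself is a cheap predicate evaluated before any API call. A sketch, where the LI ≥ 1.5 threshold comes from the design above and the pitch-count and runs-allowed cutoffs are illustrative stand-ins:

```python
def should_query_ai(leverage_index, pitch_count, runs_allowed):
    """Smart query gate: only pay for an API call in high-leverage spots.

    Anything that fails the gate routes to the heuristic manager instead.
    """
    return (leverage_index >= 1.5      # high-leverage situation
            or pitch_count >= 90       # starter deep into his outing
            or runs_allowed >= 3)      # pitcher in trouble
```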

Prompt Caching

The system prompt (personality + full roster context, ~1500 tokens) uses Anthropic's prompt caching. The first call per session pays full price; subsequent calls get a 90% discount on cached input tokens. This kept costs to $1.56 for a full 7-game Claude Sonnet 4.6 vs Sonnet 4.6 series.
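
The caching hinges on marking the stable system block with `cache_control`. A sketch of the request body (the `max_tokens` value is illustrative; the dict would be passed to `anthropic.Anthropic().messages.create(**request)`):

```python
def build_decision_request(model, system_prompt, decision_prompt):
    """Anthropic Messages API request with prompt caching enabled.

    The system block (personality + roster context, ~1500 tokens) is
    marked ephemeral so subsequent calls in the session reuse the cache.
    """
    return {
        "model": model,
        "max_tokens": 512,
        "system": [{
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{"role": "user", "content": decision_prompt}],
    }
```

Only the system block is cached; the per-decision game state goes in the user message, which changes every call.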

Graceful Degradation

Every API call is wrapped in error handling. On any failure — network error, rate limit, malformed JSON, invalid player IDs — the manager falls back to the heuristic manager and logs the reason. A game will always complete, even if the API goes down mid-simulation. Across both series (13 games, 418 API calls), there were 19 fallbacks — all handled silently with zero impact on gameplay.
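
The wrapper pattern is simple. A minimal sketch, assuming each decision is a callable taking the game state (the tuple return distinguishing AI from heuristic decisions is my addition for logging):

```python
import logging

def decide_with_fallback(ai_decide, heuristic_decide, game_state):
    """Wrap an AI decision so the game always completes.

    Any failure (network error, rate limit, malformed JSON, invalid
    player id) drops to the heuristic manager and logs the reason.
    """
    try:
        return ai_decide(game_state), "ai"
    except Exception as exc:
        logging.warning("AI manager fallback: %s", exc)
        return heuristic_decide(game_state), "heuristic"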

Personality System

Each team's AI manager operates under an ~800-word system prompt expressing a distinct managerial philosophy. These prompts aren't just flavor text — they create observable differences in decision patterns.

The Optimizer (Dodgers) quotes leverage indexes, tracks times-through-the-order penalties, and makes pitching changes based on probability thresholds. The Skipper (Mariners) talks about "feel" and "trust," keeps starters in longer, and manages the bullpen across the arc of a series rather than optimizing each individual game.

Reproducibility

Every game uses a deterministic seed (base_seed + game_number). Given the same seed, the same rosters, and the same AI decisions, a game will produce identical results. The AI manager's decisions are the only source of non-determinism — and every decision is logged with the full prompt, response, token usage, and latency.
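
The seed derivation is a one-liner; a sketch of how each game's generator might be constructed:

```python
import random

def game_rng(base_seed, game_number):
    """Derive each game's RNG deterministically: base_seed + game_number."""
    return random.Random(base_seed + game_number)
```

Two runs with the same `base_seed` and game number draw identical random sequences, so a replay with the same rosters and AI decisions reproduces the game exactly.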

All game logs, decision logs, and content are serialized as JSON. The entire project — simulation engine, AI manager, content pipeline, and this website — is open source.

Content Pipeline

Game recaps, press conferences, beat writer analysis, and series narratives are generated by Claude from the raw game logs and decision data. Each content type has its own system prompt establishing voice, style, and structure guidelines. Audio podcasts are generated via ElevenLabs text-to-speech.

Total content generation cost across both series: $1.41 for 57 markdown files.