- Python 100%
DeepSeek (and any provider with loose structured-output support) can return a
bare JSON value like `42` instead of `{"answer": "42"}`. parse_json_content then
returned an int and `data["answer"]` raised TypeError, killing the whole run.
parse_json_content now only accepts a JSON object: a top-level non-dict falls
through to the brace scan and ultimately raises ValueError, which every phase
already catches and turns into a wrong answer / failed generation — matching the
"schema violation counts as wrong" rule instead of aborting.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
||
|---|---|---|
| ai_battlequiz | ||
| tests | ||
| .gitignore | ||
| CLAUDE.md | ||
| config.example.toml | ||
| pyproject.toml | ||
| README.md | ||
AI Battlequiz
A CLI game where two or more LLMs challenge each other. Each round every model invents a quiz task with a reference solution, then every model — including the author, in a fresh session that never sees its own solution — answers every task. The author's own fresh-session answer is the built-in quality gate: no neutral judge model.
Runs against the OpenRouter API. Everything is logged losslessly to a single run file, which a separate render step turns into a website.
Scoring
- Correct answer to an opponent's task: +1, wrong: −1.
- The author's self-answer is the quality gate: correct → 0 points but unlocks the stump bonus; wrong → −1 and no bonus.
- Stump bonus: +1 to the author for every opponent who gets the task wrong — only if the author answered its own task correctly.
Grading is a strict normalized exact match (lowercased, whitespace collapsed). Models are pushed to over-specify the answer format, otherwise they fail their own task.
Install
pip install -e .
export OPENROUTER_API_KEY="sk-or-..."
Configure
Copy the example config and pick your models:
cp config.example.toml config.toml
Each [[models]] entry needs an OpenRouter id; name is an optional display label.
At least two models are required.
Play
ai-battlequiz run # uses config.toml
ai-battlequiz run --config foo.toml --rounds 5
ai-battlequiz run --skip-model-check # skip the OpenRouter catalog lookup
On startup every configured model id is verified against the OpenRouter catalog; the run aborts if any is missing. Progress is shown live in a modern CLI style.
Render a run to a website
ai-battlequiz render runs/run-20260610-153000.jsonl
# -> runs/run-20260610-153000.html
Module layout
| Module | Responsibility |
|---|---|
config.py |
Load & validate the TOML config |
client.py |
Thin async OpenRouter wrapper (structured output, retry) |
prompts.py |
Prompt builders + JSON schemas (generation vs answering) |
engine.py |
Round orchestration: generate → answer → grade → score |
grader.py |
Normalized exact-match grading & scoring |
logger.py |
Append-only lossless JSON-lines run log |
console.py |
Rich-based terminal UI |
renderer.py |
Run log → self-contained HTML site |
cli.py |
Argument parsing, startup checks, run loop |
Tests
pytest