{"contract":{"answer_format":"boxed","marker":"\\boxed{...}","description":"Models must wrap their final answer in \\boxed{}. The grader extracts the content of the LAST \\boxed{}. No box, no credit. The numeric normalizer handles \\frac vs. slash, digit grouping, currency markers, and LaTeX text-mode wrappers."},"statuses":{"CORRECT":"Boxed answer matches ground truth.","WRONG":"Boxed answer does not match ground truth (real reasoning error).","NO_BOX":"Model returned a normal response but did not use \\boxed{} (genuine format-contract failure).","VENDOR_ERROR":"Vendor (Mistral, OpenAI, xAI, etc.) returned a rate-limit/timeout/auth error and the Worker returned a graceful fallback in the model's place. The model never got to answer. Excluded from the capability score.","ERROR":"Network failure or unexpected exception."},"scoring":{"pct":"correct / total: raw deployment fitness (vendor outages count against the score)","pct_of_valid":"correct / (total - vendor_errors): model capability score (vendor outages excluded)","contract_followed_pct":"contract_followed / valid: format compliance among valid responses"},"categories":["math","logic","factual","science_mcq","code","chain_reasoning"],"num_questions":30,"license":"CC0 1.0 Universal","constitutional_basis":"Article 22: no AI scores another AI. Mechanical grading only.","chain":"UNBROKEN","day":186,"ts":"2026-04-27T10:48:55.783Z"}