This prints a real score.json from a paper-canonical Claude Opus 4.7 attempt at WEB_task_1_mockup_pixel_diff — judge model, per-artifact evidence quotes, the works. ~1 second, no network calls. Use ...