Benchmark evidence
Phonton benchmark pages should separate measured evidence from product positioning. Planner estimates are useful release evidence, but they are neither provider invoices nor competitor comparisons.
Current benchmark artifacts include:
- static/benchmark-results/plan-benchmark-20260503-192834.json
- static/benchmark-results/plan-benchmark-20260503-192834.md
Open the public benchmark page:
https://phonton.dev/benchmarks/
What a benchmark should include
- fixture repository and pinned commit;
- exact prompt or goal;
- Phonton version and model route;
- raw logs;
- final diff or plan output;
- verification output;
- provider-reported usage when available;
- known gaps and failed runs.
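The checklist above could be enforced mechanically before a benchmark is published. The following is a minimal sketch, not Phonton's actual tooling: the field names (`fixture_repo`, `model_route`, and so on) are hypothetical and are not taken from the benchmark artifact schema. Provider-reported usage is treated as optional because the checklist only asks for it when available.

```python
# Hypothetical field names; the real benchmark artifact schema may differ.
REQUIRED_FIELDS = (
    "fixture_repo",
    "fixture_commit",
    "prompt",
    "phonton_version",
    "model_route",
    "raw_logs",
    "final_output",         # final diff or plan output
    "verification_output",
    "known_gaps",           # known gaps and failed runs
)

# Requested only "when available", so it is never reported as missing.
OPTIONAL_FIELDS = ("provider_usage",)


def missing_fields(record: dict) -> list[str]:
    """Return required fields that are absent, None, or blank strings."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing
```

An empty list for `known_gaps` passes this check on purpose: it records that gaps were considered and none were found, which is different from omitting the field.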
Wording rule
Use "designed for context efficiency" until raw provider usage, final diffs, and verification output support a stronger claim.
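As a rough illustration, the wording rule can be expressed as a gate on the evidence attached to a benchmark. This is a sketch under stated assumptions: the evidence keys are hypothetical, and the check only confirms that all three kinds of evidence are present. Whether that evidence actually supports a stronger claim still needs human review.

```python
# Hypothetical evidence keys; illustrative only, not a Phonton schema.
STRONGER_CLAIM_EVIDENCE = ("provider_usage", "final_diff", "verification_output")

DEFAULT_WORDING = "designed for context efficiency"


def stronger_claim_allowed(evidence: dict) -> bool:
    """True only when every kind of required evidence is present and non-empty.

    This is a necessary condition, not a sufficient one: a reviewer must
    still judge whether the evidence supports the stronger wording.
    """
    return all(evidence.get(key) for key in STRONGER_CLAIM_EVIDENCE)
```

When the gate returns `False`, the page should keep `DEFAULT_WORDING`.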