Ranks models on more than raw speed.
Traditional benchmarks show speed. LlamaMon leaderboards show performance, cost, and compliance — all from one measured run.
View public entries and submit your own runs from the free core edition.
Adds richer workflow tooling on top of the same open benchmark submission lane.
All three lanes, including legal trust, the VEMR badge, and SHA-verified entries.
NVML-sourced, sampled per second during the measured session.
Actual VRAM usage versus recommended context ceiling per hardware tier.
Log-level metrics confirming stable startup and first-token responsiveness.
Session-level cost from electricity rate, token throughput, and kWh consumed.
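Taken together, these sources reduce to a small amount of arithmetic. Below is a minimal sketch of per-second NVML power sampling feeding a session cost estimate, assuming the `pynvml` Python bindings; the electricity rate, session length, and token count are illustrative placeholders, not LlamaMon's actual pipeline.

```python
import time
import pynvml  # NVIDIA Management Library bindings (pip install nvidia-ml-py)

# Hypothetical inputs -- placeholders for illustration, not measured values.
ELECTRICITY_RATE_USD_PER_KWH = 0.30  # local electricity rate
SESSION_SECONDS = 60                 # length of the measured session
TOKENS_GENERATED = 12_000            # reported by the inference runtime

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Sample board power once per second over the measured session.
samples_mw = []
for _ in range(SESSION_SECONDS):
    samples_mw.append(pynvml.nvmlDeviceGetPowerUsage(handle))  # milliwatts
    time.sleep(1)
pynvml.nvmlShutdown()

avg_watts = sum(samples_mw) / len(samples_mw) / 1000
kwh = avg_watts * SESSION_SECONDS / 3600 / 1000   # W * s -> kWh
session_cost = kwh * ELECTRICITY_RATE_USD_PER_KWH
cost_per_1m_tokens = session_cost / TOKENS_GENERATED * 1_000_000

print(f"avg draw: {avg_watts:.1f} W, energy: {kwh:.4f} kWh")
print(f"session cost: ${session_cost:.4f}, per 1M tokens: ${cost_per_1m_tokens:.2f}")
```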
Certified runs, filterable by real enterprise constraints
This view treats a leaderboard entry as a reproducible run package: model, quantization, hardware, cost profile, legal posture, and trust state all move together.
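As a rough illustration, such a package can be modeled as a single record whose fields travel together. The field names and sample values below are invented for the sketch, not LlamaMon's published schema.

```python
from dataclasses import dataclass

# Illustrative field names only -- not LlamaMon's published schema.
@dataclass(frozen=True)
class RunPackage:
    model: str                  # e.g. "llama-3.1-8b-instruct"
    quantization: str           # e.g. "Q4_K_M"
    hardware_fingerprint: str   # GPU model, VRAM, driver version
    power_profile_w: float      # average board power during the run
    tokens_per_second: float
    ttft_ms: float
    cost_per_1m_tokens_usd: float
    license_class: str          # legal posture of the model weights
    sha256: str                 # checksum of the measured model artifact
    certified: bool             # trust state after verification

run = RunPackage(
    model="llama-3.1-8b-instruct",
    quantization="Q4_K_M",
    hardware_fingerprint="RTX 4090 24GB / driver 550.54",
    power_profile_w=310.0,
    tokens_per_second=92.4,
    ttft_ms=180.0,
    cost_per_1m_tokens_usd=0.21,
    license_class="llama-3.1-community",
    sha256="0" * 64,  # placeholder; real digest recorded at submission
    certified=True,
)
```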
Every entry is a certified run recipe
A useful leaderboard needs model, quantization, hardware fingerprint, power profile, and trust status — not just a number.
Tokens/s, time to first token (TTFT), p95 stability, tested context size, and quantization-aware runtime behavior.
Tokens per watt, VRAM load, power curve, and hardware fit on common tiers like 8GB, 24GB, or multi-GPU.
Local cost per 1M tokens, cloud price comparison, amortization-aware TCO, and payback estimates.
Commercial-use risk, attribution requirements, checksum verification, and region-aware ESG reporting.
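Checksum verification is the mechanical part of that trust story. Here is a minimal sketch using Python's standard `hashlib`; the file path and expected digest are hypothetical stand-ins for values a real leaderboard entry would carry.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large model weights don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical path and digest -- in practice the expected value would come
# from the leaderboard entry being reproduced.
expected = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
actual = sha256_of("models/llama-3.1-8b-q4_k_m.gguf")
print("verified" if actual == expected else "checksum mismatch")
```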
One measured run · three audience views
Performance
- Throughput (tok/s)
- TTFT (ms)
- p95 stability
- VRAM fit by tier
- Tokens per watt
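The first three of these fall straight out of per-token timestamps. A minimal sketch, using invented timestamps in place of a real generation log:

```python
# Hypothetical per-token arrival times (seconds after the request was sent).
token_times = [0.18, 0.21, 0.24, 0.28, 0.31, 0.34, 0.38, 0.41, 0.45, 0.48]

ttft_ms = token_times[0] * 1000
tokens_per_second = (len(token_times) - 1) / (token_times[-1] - token_times[0])

# p95 inter-token gap: the "stability" number -- how slow the slow tokens get.
gaps = sorted(b - a for a, b in zip(token_times, token_times[1:]))
p95_gap_ms = gaps[int(0.95 * (len(gaps) - 1))] * 1000

print(f"TTFT: {ttft_ms:.0f} ms, throughput: {tokens_per_second:.1f} tok/s, "
      f"p95 gap: {p95_gap_ms:.0f} ms")
```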
Cost
- Local cost per 1M tokens
- Cloud price comparison
- Breakeven horizon
- Amortization-aware TCO
- Hardware payback
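Breakeven and payback are simple arithmetic once local and cloud cost per million tokens are known. All figures below are illustrative placeholders, not measured data:

```python
# Illustrative figures only -- plug in your own measured run and local prices.
hardware_cost_usd = 1800.0   # GPU purchase price to amortize
local_cost_per_1m = 0.25     # from the measured run (power draw * electricity rate)
cloud_cost_per_1m = 2.50     # comparable hosted-API price
monthly_tokens_m = 300.0     # expected workload, millions of tokens per month

savings_per_1m = cloud_cost_per_1m - local_cost_per_1m
monthly_savings = savings_per_1m * monthly_tokens_m
payback_months = hardware_cost_usd / monthly_savings

# Amortization-aware TCO over a planning horizon:
horizon_months = 24
local_tco = hardware_cost_usd + local_cost_per_1m * monthly_tokens_m * horizon_months
cloud_tco = cloud_cost_per_1m * monthly_tokens_m * horizon_months

print(f"payback: {payback_months:.1f} months")
print(f"24-month TCO -- local: ${local_tco:,.0f}, cloud: ${cloud_tco:,.0f}")
```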
Compliance
- License class
- Commercial-use safety
- SHA checksum state
- Attribution requirements
- Security posture
Benchmark submission is part of the free core edition. Start with Community and publish your measured runs.