Metrics · Leaderboard

Ranks more than raw speed.

Traditional benchmarks show speed. LlamaMon leaderboards show performance, cost, and compliance — all from one measured run.

🏆 Operator Lane · Which setup performs best?
💰 Procurement Lane · Which saves the most money?
⚖️ Legal Lane · Which meets compliance requirements?

Who can do what
Community · Free
Read + submit benchmarks

View public entries and submit your own runs from the free core edition.

Professional · Paid
Advanced analysis

Adds richer workflow tooling on top of the same open benchmark submission lane.

Enterprise · Paid
Full certification

All three lanes, including legal trust, the VEMR badge, and SHA-verified entries.

What gets measured
GPU temperature & utilization

NVML-sourced, sampled per second during the measured session.
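A minimal sketch of per-second NVML sampling, assuming the `pynvml` bindings (the `nvidia-ml-py` package) are installed. The function names `make_nvml_reader` and `sample_session` and the sample keys are illustrative assumptions, not LlamaMon's actual collector.

```python
import time
from typing import Callable, Dict, List

Sample = Dict[str, float]


def make_nvml_reader(gpu_index: int = 0) -> Callable[[], Sample]:
    """Return a function that reads GPU temperature and utilization via NVML."""
    import pynvml  # imported lazily: only needed on a machine with an NVIDIA GPU

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)

    def read() -> Sample:
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        return {"temp_c": float(temp), "gpu_util_pct": float(util.gpu)}

    return read


def sample_session(
    read: Callable[[], Sample],
    duration_s: float,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Sample]:
    """Collect one sample per interval for the length of the measured session."""
    samples: List[Sample] = []
    for _ in range(int(duration_s / interval_s)):
        samples.append(read())
        sleep(interval_s)
    return samples
```

On a GPU host, `sample_session(make_nvml_reader(0), duration_s=60)` would collect roughly one reading per second for a minute. Passing the reader in as a parameter also lets the sampling loop be exercised without hardware.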

VRAM pressure & context fit

Actual VRAM usage versus recommended context ceiling per hardware tier.

Server logs & runtime readiness

Log-level metrics confirming stable startup and first-token responsiveness.

Token usage, cost & energy

Session-level cost from electricity rate, token throughput, and kWh consumed.
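As a rough illustration of how these three inputs combine, the sketch below derives a session cost and a cost per 1M tokens. It assumes the simplest possible model (energy times rate, divided by tokens); the function names are hypothetical and the actual LlamaMon calculation may include other terms.

```python
def session_cost(kwh_consumed: float, rate_per_kwh: float) -> float:
    """Electricity cost of one measured session, in the currency of the rate."""
    return kwh_consumed * rate_per_kwh


def cost_per_million_tokens(
    kwh_consumed: float, rate_per_kwh: float, tokens_generated: int
) -> float:
    """Scale the session cost to a per-1M-token figure."""
    if tokens_generated <= 0:
        raise ValueError("tokens_generated must be positive")
    return session_cost(kwh_consumed, rate_per_kwh) / tokens_generated * 1_000_000
```

For example, a session that used 0.5 kWh at a rate of 0.30 per kWh and produced 250,000 tokens works out to about 0.60 per 1M tokens. These numbers are illustrative, not measured results.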

Interactive Leaderboard UI

Certified runs, filterable by enterprise reality

This view treats a leaderboard entry as a reproducible run package: model, quantization, hardware, cost profile, legal posture, and trust state all move together.

Columns: Rank · Run · Trust + legal · Speed · Economics · Fit
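One way to picture a run package that "moves together" is an immutable record with a simple filter over its fields. This is a sketch only: the class name, field names, and example values below are hypothetical and do not describe LlamaMon's actual schema.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunPackage:
    """All attributes of one leaderboard entry, kept together and immutable."""
    model: str
    quantization: str
    hardware: str
    cost_profile: str
    legal_posture: str
    trust_state: str


def matches(entry: RunPackage, **filters: str) -> bool:
    """True when every requested field equals the entry's value."""
    record = asdict(entry)
    return all(record.get(key) == value for key, value in filters.items())


# Placeholder values for illustration only.
entry = RunPackage(
    model="example-model-7b",
    quantization="Q4_K_M",
    hardware="single-gpu-24gb",
    cost_profile="eu-residential",
    legal_posture="commercial-ok",
    trust_state="sha-verified",
)
```

A filter such as `matches(entry, hardware="single-gpu-24gb", trust_state="sha-verified")` then selects entries without ever separating a score from the configuration that produced it.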
Score composition

Every entry is a certified run recipe

A useful leaderboard needs model, quantization, hardware fingerprint, power profile, and trust status — not just a number.

Performance

Tokens/s, TTFT, p95 stability, context-size tested, and quantization-aware runtime behavior.
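A small sketch of how throughput and tail latency might be summarized from one run, assuming "p95" refers to the 95th percentile of per-request TTFT samples and using the nearest-rank percentile definition. Both the interpretation and the function names are assumptions.

```python
import math
from typing import Dict, Sequence


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_run(
    ttft_ms: Sequence[float], tokens_generated: int, wall_seconds: float
) -> Dict[str, float]:
    """Throughput plus median and p95 time-to-first-token for one measured run."""
    return {
        "tokens_per_s": tokens_generated / wall_seconds,
        "ttft_p50_ms": percentile_nearest_rank(ttft_ms, 50),
        "ttft_p95_ms": percentile_nearest_rank(ttft_ms, 95),
    }
```

A wide gap between the median and the p95 value suggests unstable first-token latency even when average throughput looks healthy.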

Efficiency

Tokens per watt, VRAM load, power curve, and hardware fit on common tiers like 8GB, 24GB, or multi-GPU.

Economics

Local cost per 1M tokens, cloud price comparison, amortization-aware TCO, and payback estimates.
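The payback estimate can be sketched as hardware cost divided by monthly savings versus cloud pricing. This is a deliberately simple model with a hypothetical function name; it ignores depreciation schedules, maintenance, and financing, which an amortization-aware TCO would add.

```python
from typing import Optional


def payback_months(
    hardware_cost: float,
    local_cost_per_m: float,
    cloud_price_per_m: float,
    million_tokens_per_month: float,
) -> Optional[float]:
    """Months until local savings cover the hardware cost, or None if they never do."""
    saving_per_m = cloud_price_per_m - local_cost_per_m
    if saving_per_m <= 0 or million_tokens_per_month <= 0:
        return None  # local is not cheaper, or there is no usage to save on
    return hardware_cost / (saving_per_m * million_tokens_per_month)
```

With illustrative inputs (hardware at 2000, local cost 0.50 per 1M tokens, cloud price 2.50 per 1M tokens, 100M tokens per month), the monthly saving is 200 and the payback horizon is 10 months. These figures are made up for the example.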

Compliance

Commercial-use risk, attribution requirements, checksum verification, and region-aware ESG reporting.
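Checksum verification can be illustrated with Python's standard `hashlib`. The sketch assumes SHA-256 (the page says only "SHA") and uses hypothetical function names and state labels.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_state(path: str, expected_hex: str) -> str:
    """Return 'verified' when the file hash equals the published digest, else 'mismatch'."""
    actual = sha256_of_file(path)
    expected = expected_hex.strip().lower()
    return "verified" if hmac.compare_digest(actual, expected) else "mismatch"
```

Comparing against a digest published alongside the model file is what lets an entry claim that the weights benchmarked are the weights distributed.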

Three lanes

One measured run · three audience views

Operator · Community read · Community submit
Performance & efficiency
  • Throughput (tok/s)
  • TTFT in ms
  • p95 stability
  • VRAM fit by tier
  • Tokens / watt
Procurement · Community read · Community submit
Cost & ROI
  • Local cost / 1M tokens
  • Cloud price comparison
  • Breakeven horizon
  • Amortization-aware TCO
  • Hardware payback
Legal · Enterprise only
Trust & compliance
  • License class
  • Commercial-use safety
  • SHA checksum state
  • Attribution requirements
  • Security posture
Want to submit your own runs?

Benchmark submission is part of the free core edition. Start with Community and publish your measured runs.

Download LlamaMon →