Ranks models on more than raw speed.
Traditional benchmarks show speed. LlamaMon leaderboards show performance, cost, and compliance — all from one measured run.
View public entries and submit your own runs from the free core edition.
Adds richer workflow tooling on top of the same open benchmark submission lane.
All three lanes, including legal trust, the VEMR badge, and SHA-verified entries.
NVML-sourced, sampled per second during the measured session.
Actual VRAM usage versus recommended context ceiling per hardware tier.
Log-level metrics confirming stable startup and first-token responsiveness.
Session-level cost from electricity rate, token throughput, and kWh consumed.
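Taken together, these sources reduce to a small amount of arithmetic. Below is a minimal sketch of per-second NVML power sampling feeding a session cost estimate, assuming the `pynvml` Python bindings; the electricity rate, session length, and token count are illustrative placeholders, not LlamaMon's actual pipeline.

```python
import time
import pynvml  # NVIDIA Management Library bindings (pip install nvidia-ml-py)

# Hypothetical inputs -- placeholders for illustration, not measured values.
ELECTRICITY_RATE_USD_PER_KWH = 0.30  # local electricity rate
SESSION_SECONDS = 60                 # length of the measured session
TOKENS_GENERATED = 12_000            # reported by the inference runtime

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Sample board power once per second over the measured session.
samples_mw = []
for _ in range(SESSION_SECONDS):
    samples_mw.append(pynvml.nvmlDeviceGetPowerUsage(handle))  # milliwatts
    time.sleep(1)
pynvml.nvmlShutdown()

avg_watts = sum(samples_mw) / len(samples_mw) / 1000
kwh = avg_watts * SESSION_SECONDS / 3600 / 1000   # W * s -> kWh
session_cost = kwh * ELECTRICITY_RATE_USD_PER_KWH
cost_per_1m_tokens = session_cost / TOKENS_GENERATED * 1_000_000

print(f"avg draw: {avg_watts:.1f} W, energy: {kwh:.4f} kWh")
print(f"session cost: ${session_cost:.4f}, per 1M tokens: ${cost_per_1m_tokens:.2f}")
```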
Certified runs, filterable by real enterprise constraints
This view treats a leaderboard entry as a reproducible run package: model, quantization, hardware, cost profile, legal posture, and trust state all move together.
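As a rough illustration, such a package can be modeled as a single record whose fields travel together. The field names and sample values below are invented for the sketch, not LlamaMon's published schema.

```python
from dataclasses import dataclass

# Illustrative field names only -- not LlamaMon's published schema.
@dataclass(frozen=True)
class RunPackage:
    model: str                  # e.g. "llama-3.1-8b-instruct"
    quantization: str           # e.g. "Q4_K_M"
    hardware_fingerprint: str   # GPU model, VRAM, driver version
    power_profile_w: float      # average board power during the run
    tokens_per_second: float
    ttft_ms: float
    cost_per_1m_tokens_usd: float
    license_class: str          # legal posture of the model weights
    sha256: str                 # checksum of the measured model artifact
    certified: bool             # trust state after verification

run = RunPackage(
    model="llama-3.1-8b-instruct",
    quantization="Q4_K_M",
    hardware_fingerprint="RTX 4090 24GB / driver 550.54",
    power_profile_w=310.0,
    tokens_per_second=92.4,
    ttft_ms=180.0,
    cost_per_1m_tokens_usd=0.21,
    license_class="llama-3.1-community",
    sha256="0" * 64,  # placeholder; real digest recorded at submission
    certified=True,
)
```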
Every entry is a certified run recipe
A useful leaderboard needs model, quantization, hardware fingerprint, power profile, and trust status — not just a number.
Tokens/s, time to first token (TTFT), p95 stability, tested context size, and quantization-aware runtime behavior.
Tokens per watt, VRAM load, power curve, and hardware fit on common tiers like 8GB, 24GB, or multi-GPU.
Local cost per 1M tokens, cloud price comparison, amortization-aware TCO, and payback estimates.
Commercial-use risk, attribution requirements, checksum verification, and region-aware ESG reporting.
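Checksum verification is the mechanical part of that trust story. Here is a minimal sketch using Python's standard `hashlib`; the file path and expected digest are hypothetical stand-ins for values a real leaderboard entry would carry.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large model weights don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical path and digest -- in practice the expected value would come
# from the leaderboard entry being reproduced.
expected = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
actual = sha256_of("models/llama-3.1-8b-q4_k_m.gguf")
print("verified" if actual == expected else "checksum mismatch")
```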
One measured run · three audience views
Performance
- Throughput (tok/s)
- TTFT (ms)
- p95 stability
- VRAM fit by tier
- Tokens per watt
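The first three of these fall straight out of per-token timestamps. A minimal sketch, using invented timestamps in place of a real generation log:

```python
# Hypothetical per-token arrival times (seconds after the request was sent).
token_times = [0.18, 0.21, 0.24, 0.28, 0.31, 0.34, 0.38, 0.41, 0.45, 0.48]

ttft_ms = token_times[0] * 1000
tokens_per_second = (len(token_times) - 1) / (token_times[-1] - token_times[0])

# p95 inter-token gap: the "stability" number -- how slow the slow tokens get.
gaps = sorted(b - a for a, b in zip(token_times, token_times[1:]))
p95_gap_ms = gaps[int(0.95 * (len(gaps) - 1))] * 1000

print(f"TTFT: {ttft_ms:.0f} ms, throughput: {tokens_per_second:.1f} tok/s, "
      f"p95 gap: {p95_gap_ms:.0f} ms")
```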
Cost
- Local cost per 1M tokens
- Cloud price comparison
- Breakeven horizon
- Amortization-aware TCO
- Hardware payback
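Breakeven and payback are simple arithmetic once local and cloud cost per million tokens are known. All figures below are illustrative placeholders, not measured data:

```python
# Illustrative figures only -- plug in your own measured run and local prices.
hardware_cost_usd = 1800.0   # GPU purchase price to amortize
local_cost_per_1m = 0.25     # from the measured run (power draw * electricity rate)
cloud_cost_per_1m = 2.50     # comparable hosted-API price
monthly_tokens_m = 300.0     # expected workload, millions of tokens per month

savings_per_1m = cloud_cost_per_1m - local_cost_per_1m
monthly_savings = savings_per_1m * monthly_tokens_m
payback_months = hardware_cost_usd / monthly_savings

# Amortization-aware TCO over a planning horizon:
horizon_months = 24
local_tco = hardware_cost_usd + local_cost_per_1m * monthly_tokens_m * horizon_months
cloud_tco = cloud_cost_per_1m * monthly_tokens_m * horizon_months

print(f"payback: {payback_months:.1f} months")
print(f"24-month TCO -- local: ${local_tco:,.0f}, cloud: ${cloud_tco:,.0f}")
```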
Compliance
- License class
- Commercial-use safety
- SHA checksum state
- Attribution requirements
- Security posture
Benchmark submission is part of the free core edition. Start with Community and publish your measured runs.