Everything you need to run llamamon
Real documentation for the real product — covering hardware setup, model selection, configuration, and monitoring.
Quick Start
Install from signed release binaries and walk through the first-run configuration screen.
Open quick start →
Hardware Guide
Compare GPU tiers, VRAM budgets, and realistic local-inference expectations across hardware classes.
Read hardware guide →
Hardware Configuration
Walk through the hardware wizard, auto-detection, and every editable field in the config screen.
Review config guide →
VRAM & Quantization
Use model size and quantization guidance to avoid trial-and-error launches. Includes VRAM formulas.
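The rule of thumb behind such formulas can be sketched as follows — a rough, illustrative estimate (the function name, the ~4.5 effective bits per weight for a typical 4-bit quant, and the ~20% runtime overhead are assumptions for illustration, not llamamon's actual formula):

```python
def estimate_weight_vram_gb(params_billions: float, bits_per_weight: float,
                            overhead: float = 1.2) -> float:
    """Weight memory ~= parameter count x bytes per weight, padded ~20%
    for runtime buffers. Illustrative only; KV cache is extra."""
    weight_bytes = params_billions * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1024**3

# A 7B model at ~4.5 effective bits per weight (typical 4-bit quant):
print(round(estimate_weight_vram_gb(7, 4.5), 1))  # ~4.4 GB, before KV cache
```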
Explore VRAM notes →
Context Windows
Choose context sizes that fit your hardware and use case. Tradeoff analysis for common model sizes.
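The dominant context-size cost is the KV cache, which grows linearly with context length. A minimal sketch of the standard sizing formula (the 7B-class shape used below — 32 layers, 32 KV heads, head dim 128 — is an illustrative assumption):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    """Two tensors (K and V) per layer, each kv_heads x head_dim per token,
    stored for every token in the context window (2 bytes/elem for fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# 4096 tokens of fp16 context on a 7B-class shape:
print(kv_cache_bytes(32, 32, 128, 4096) / 1024**3)  # 2.0 (GiB)
```

Doubling the context doubles this figure, which is why a context that "fits" on paper can still push a launch out of VRAM.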
See context guide →
HuggingFace Browser
Use the built-in model browser to find and configure HF models directly from within the TUI.
View browser guide →
Find and launch models without leaving the TUI
Search HuggingFace, view quantization variants, and configure model files — all from within llamamon.
Intelligence layers beyond the core TUI
These modules extend llamamon beyond monitoring into cost intelligence, compliance, and security — Professional is available now, while select Enterprise workflows roll out via Design Partner tracks.
FinOps Engine
Hardware Projection Engine, cloud price-book (50+ providers), ROI calculations, and breakeven analysis. Translates tok/s into dollar savings vs GPT-4o.
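Breakeven analysis of this kind reduces to simple arithmetic; a hedged sketch (the function and every number below — hardware price, daily token volume, per-1K API price, power cost — are hypothetical illustration values, not llamamon's price-book data):

```python
def breakeven_days(hw_cost_usd: float, tokens_per_day: float,
                   api_usd_per_1k_tokens: float,
                   power_usd_per_day: float = 0.0) -> float:
    """Days until avoided API spend repays the hardware outlay,
    net of daily electricity cost. Ignores depreciation and labor."""
    daily_api_spend = tokens_per_day / 1000 * api_usd_per_1k_tokens
    daily_saving = daily_api_spend - power_usd_per_day
    return hw_cost_usd / daily_saving

# Hypothetical: $1600 GPU, 2M tokens/day, $0.01 per 1K tokens, $1/day power:
print(round(breakeven_days(1600, 2_000_000, 0.01, 1.0), 1))  # ~84.2 days
```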
ESG Auditor
Session-level carbon intensity (Our World in Data, 200+ countries), ISO 14064 / EU CSRD-compliant PDF exports, and energy efficiency reporting.
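Session carbon math is essentially energy multiplied by grid intensity; a minimal sketch (the 400 g CO₂/kWh grid figure is an illustrative placeholder, not an Our World in Data value for any particular country):

```python
def session_co2_grams(avg_power_watts: float, hours: float,
                      grid_g_co2_per_kwh: float) -> float:
    """Energy used (kWh) x grid carbon intensity (g CO2 per kWh)."""
    kwh = avg_power_watts * hours / 1000
    return kwh * grid_g_co2_per_kwh

# Hypothetical 2-hour session at 300 W on a 400 g/kWh grid:
print(session_co2_grams(300, 2, 400))  # 240.0 g CO2
```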
Security Sentry
Real-time network exfiltration monitoring for the llama-server process, with enterprise-focused trust indicators and policy workflows for CISOs.
VEMR Registry
Verified Enterprise Model Registry — license class, hardware fit score, SHA-256 checksums, and commercial-use safety, curated and maintained.
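Verifying a downloaded model against a registry checksum is straightforward with Python's standard library; a sketch (the function name and chunked-read pattern are illustrative, not llamamon's internals):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-GB model files
    never need to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the registry's published checksum, e.g.:
# ok = sha256_of("model.gguf") == expected_checksum
```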