Everything you need to run llamamon
Real documentation for the real product — covering hardware setup, model selection, configuration, and monitoring.
Quick Start
Install from signed release binaries and walk through the first-run configuration screen.
Open quick start →
Hardware Guide
Compare GPU tiers, VRAM budgets, and realistic local-inference expectations across hardware classes.
Read hardware guide →
Hardware Configuration
Walk through the hardware wizard, auto-detection, and every editable field in the config screen.
Review config guide →
VRAM & Quantization
Use model size and quantization guidance to avoid trial-and-error launches. Includes VRAM formulas.
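The rule of thumb behind such formulas can be sketched as follows — a rough, illustrative estimate (the function name, the ~4.5 effective bits per weight for a typical 4-bit quant, and the ~20% runtime overhead are assumptions for illustration, not llamamon's actual formula):

```python
def estimate_weight_vram_gb(params_billions: float, bits_per_weight: float,
                            overhead: float = 1.2) -> float:
    """Weight memory ~= parameter count x bytes per weight, padded ~20%
    for runtime buffers. Illustrative only; KV cache is extra."""
    weight_bytes = params_billions * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1024**3

# A 7B model at ~4.5 effective bits per weight (typical 4-bit quant):
print(round(estimate_weight_vram_gb(7, 4.5), 1))  # ~4.4 GB, before KV cache
```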
Explore VRAM notes →
Context Windows
Choose context sizes that fit your hardware and use case. Tradeoff analysis for common model sizes.
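The dominant context-size cost is the KV cache, which grows linearly with context length. A minimal sketch of the standard sizing formula (the 7B-class shape used below — 32 layers, 32 KV heads, head dim 128 — is an illustrative assumption):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    """Two tensors (K and V) per layer, each kv_heads x head_dim per token,
    stored for every token in the context window (2 bytes/elem for fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# 4096 tokens of fp16 context on a 7B-class shape:
print(kv_cache_bytes(32, 32, 128, 4096) / 1024**3)  # 2.0 (GiB)
```

Doubling the context doubles this figure, which is why a context that "fits" on paper can still push a launch out of VRAM.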
See context guide →
HuggingFace Browser
Use the built-in model browser to find and configure HF models directly from within the TUI.
View browser guide →
Find and launch models without leaving the TUI
Search HuggingFace, view quantization variants, and configure model files — all from within llamamon.
Intelligence layers beyond the core TUI
These modules extend llamamon beyond monitoring into cost intelligence, compliance, and security — Professional is available now, while select Enterprise workflows roll out via Design Partner tracks.
FinOps Engine
Hardware Projection Engine, cloud price-book (50+ providers), ROI calculations, and breakeven analysis. Translates tok/s into dollar savings vs GPT-4o.
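Breakeven analysis of this kind reduces to simple arithmetic; a hedged sketch (the function and every number below — hardware price, daily token volume, per-1K API price, power cost — are hypothetical illustration values, not llamamon's price-book data):

```python
def breakeven_days(hw_cost_usd: float, tokens_per_day: float,
                   api_usd_per_1k_tokens: float,
                   power_usd_per_day: float = 0.0) -> float:
    """Days until avoided API spend repays the hardware outlay,
    net of daily electricity cost. Ignores depreciation and labor."""
    daily_api_spend = tokens_per_day / 1000 * api_usd_per_1k_tokens
    daily_saving = daily_api_spend - power_usd_per_day
    return hw_cost_usd / daily_saving

# Hypothetical: $1600 GPU, 2M tokens/day, $0.01 per 1K tokens, $1/day power:
print(round(breakeven_days(1600, 2_000_000, 0.01, 1.0), 1))  # ~84.2 days
```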
ESG Auditor
Session-level carbon intensity (Our World in Data, 200+ countries), ISO 14064 / EU CSRD-compliant PDF exports, and energy efficiency reporting.
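Session carbon math is essentially energy multiplied by grid intensity; a minimal sketch (the 400 g CO₂/kWh grid figure is an illustrative placeholder, not an Our World in Data value for any particular country):

```python
def session_co2_grams(avg_power_watts: float, hours: float,
                      grid_g_co2_per_kwh: float) -> float:
    """Energy used (kWh) x grid carbon intensity (g CO2 per kWh)."""
    kwh = avg_power_watts * hours / 1000
    return kwh * grid_g_co2_per_kwh

# Hypothetical 2-hour session at 300 W on a 400 g/kWh grid:
print(session_co2_grams(300, 2, 400))  # 240.0 g CO2
```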
Security Sentry
Real-time network exfiltration monitoring for the llama-server process, with enterprise-focused trust indicators and policy workflows for CISOs.
VEMR Registry
Verified Enterprise Model Registry — license class, hardware fit score, SHA-256 checksums, and commercial-use safety, curated and maintained.
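Verifying a downloaded model against a registry checksum is straightforward with Python's standard library; a sketch (the function name and chunked-read pattern are illustrative, not llamamon's internals):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-GB model files
    never need to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the registry's published checksum, e.g.:
# ok = sha256_of("model.gguf") == expected_checksum
```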