How to Keep Your Self-Hosted AI Assistant's Bill in Single Digits
The number one thing that scares people off running their own AI assistant isn’t the setup — it’s the imagined bill. “If it’s calling a frontier model all day, won’t that cost a fortune?”
It won’t, as long as you stop sending every task to your most expensive model. The trick that keeps a self-hosted assistant’s model bill in single-digit dollars per month is delegation: matching each job to the cheapest model that can do it well.
Most of what an assistant does is cheap work
Watch what your assistant actually handles in a day and you’ll notice most of it is small:
- “Add milk to the shopping list.”
- “What’s on my calendar tomorrow?”
- Summarizing a note you just dropped in.
- Formatting, extraction, classification, routing.
None of that needs your smartest, priciest model. A small, cheap model handles it quickly, typically for a fraction of a cent per request. You only need the expensive model for the genuinely hard asks: reasoning over a big document, drafting something nuanced, multi-step planning.
The delegation pattern
The architecture is simple: a router in front, deciding which model handles each request.
- A default cheap model handles the bulk — quick commands, lookups, short replies.
- An escalation path to a stronger model for the small fraction of requests that need it.
- A rule for when to escalate — length, complexity, an explicit “think hard about this,” or a confidence threshold.
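Here is a minimal sketch of such a router in Python, assuming a hypothetical setup with two models named `cheap-model` and `strong-model` (placeholders for whatever your provider calls them). The length threshold, the escalation phrases, and the idea that the cheap model reports a confidence score are all illustrative assumptions. The function makes no API calls; it only decides which model should run.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical model identifiers: substitute your provider's real model names.
CHEAP_MODEL = "cheap-model"
STRONG_MODEL = "strong-model"


@dataclass
class RoutingRules:
    max_cheap_chars: int = 2000  # longer inputs go straight to the strong model
    escalation_phrases: tuple[str, ...] = ("think hard", "think carefully", "step by step")
    min_confidence: float = 0.7  # cheap answers below this score get escalated


def route(request: str, rules: RoutingRules = RoutingRules(),
          cheap_confidence: float | None = None) -> str:
    """Return the model that should handle `request`."""
    text = request.lower()
    # Rule 1: length. Big inputs usually mean big reasoning.
    if len(request) > rules.max_cheap_chars:
        return STRONG_MODEL
    # Rule 2: the user explicitly asked for more effort.
    if any(phrase in text for phrase in rules.escalation_phrases):
        return STRONG_MODEL
    # Rule 3: the cheap model already answered but reported low confidence.
    if cheap_confidence is not None and cheap_confidence < rules.min_confidence:
        return STRONG_MODEL
    # Default: the cheap model handles the bulk of requests.
    return CHEAP_MODEL


print(route("Add milk to the shopping list."))
print(route("Think hard about this contract and list the risks."))
print(route("What's on my calendar tomorrow?", cheap_confidence=0.4))
```

A typical flow is to route the request, call the chosen model, and, if the cheap model comes back with a low confidence score, call `route` again with that score to escalate. Keeping the rules as plain data makes them easy to tune once you see which requests actually need the stronger model.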
The result: your average cost per interaction collapses, because the expensive model only runs when it earns its price. In a typical month the model bill comes to a few dollars, and you control both the box (a $0–10/mo home server or VPS) and the model spend.
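To see why the average collapses, treat the monthly bill as a weighted blend of the two models’ per-request costs. The sketch below uses made-up prices and volumes purely for illustration; the function name and every number in it are hypothetical, so plug in your provider’s real prices and your own request counts.

```python
def monthly_model_cost(requests_per_day: float, escalation_rate: float,
                       cheap_cost_usd: float, strong_cost_usd: float,
                       days: int = 30) -> float:
    """Estimate monthly model spend when a fraction of requests escalate.

    Average cost per request = (1 - r) * cheap + r * strong,
    where r is the share of requests routed to the strong model.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    avg_cost = (1 - escalation_rate) * cheap_cost_usd + escalation_rate * strong_cost_usd
    return requests_per_day * days * avg_cost


# Hypothetical inputs: 200 requests/day, 5% escalated,
# $0.0005 per cheap request, $0.02 per strong request.
delegated = monthly_model_cost(200, 0.05, 0.0005, 0.02)
everything_strong = monthly_model_cost(200, 1.0, 0.0005, 0.02)
print(f"delegated: ${delegated:.2f}/mo, all strong: ${everything_strong:.2f}/mo")
```

With those placeholder numbers, delegation works out to about $8.85 a month versus $120 if every request went to the strong model. The bill is linear in the escalation rate, and the strong model still dominates it: $6.00 of the $8.85 comes from the 5% of requests that escalate, which is why the escalation rule matters more than the cheap model’s exact price.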
The two cost traps to avoid
Delegation saves money only if you also avoid the two ways costs quietly balloon:
- The wrong model wired into a busy job. A cron job that runs hourly against your priciest model is how pennies become a surprise on your statement. Choose the model for each scheduled task deliberately, usually the cheapest one that does the job.
- Loops and retries. A retry storm on a paid endpoint can spend real money fast. Cap the number of retries, and add an early-warning spend check so you find out the same day, not at month’s end.
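The two guardrails for retries can be sketched in Python as a capped retry loop plus a per-day spend counter that warns early and refuses further calls once a limit is reached. This is a minimal sketch under stated assumptions: `call` is a hypothetical function that returns a `(result, cost_usd)` pair, `SpendGuard` and `call_with_retries` are illustrative names rather than any library’s API, the alert hook defaults to `print` where you would wire in email or chat, and the limits are placeholders.

```python
import time
from datetime import date
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


class DailyBudgetExceeded(RuntimeError):
    """Raised when today's recorded model spend has reached the daily limit."""


class SpendGuard:
    """Tracks model spend per calendar day, warns early, and blocks calls past the limit."""

    def __init__(self, daily_limit_usd: float, warn_fraction: float = 0.8,
                 alert: Callable[[str], None] = print):
        self.daily_limit_usd = daily_limit_usd
        self.warn_fraction = warn_fraction
        self.alert = alert  # hypothetical hook: replace print with email/chat
        self._day = date.today()
        self._spent = 0.0
        self._warned = False

    def _roll_over(self) -> None:
        # Reset the counter when the calendar day changes.
        today = date.today()
        if today != self._day:
            self._day, self._spent, self._warned = today, 0.0, False

    def check(self) -> None:
        self._roll_over()
        if self._spent >= self.daily_limit_usd:
            raise DailyBudgetExceeded(
                f"spent ${self._spent:.2f} today, limit ${self.daily_limit_usd:.2f}")

    def record(self, cost_usd: float) -> None:
        self._roll_over()
        self._spent += cost_usd
        # Early warning: alert once per day when spend crosses warn_fraction of the limit.
        if not self._warned and self._spent >= self.warn_fraction * self.daily_limit_usd:
            self._warned = True
            self.alert(f"Model spend today: ${self._spent:.2f} "
                       f"of ${self.daily_limit_usd:.2f} limit")


def call_with_retries(call: Callable[[], Tuple[T, float]], guard: SpendGuard,
                      max_attempts: int = 3, base_delay_s: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call` at most max_attempts times, backing off exponentially between failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        guard.check()  # refuse to spend past today's limit, even mid-retry
        try:
            result, cost_usd = call()
        except Exception:  # in practice, catch only your client's transient errors
            if attempt == max_attempts:
                raise  # retry cap reached: surface the error instead of looping
            sleep(base_delay_s * 2 ** (attempt - 1))
            continue
        guard.record(cost_usd)
        return result
    raise AssertionError("not reached: the loop either returns or re-raises")
```

In a scheduled job you would create one `SpendGuard` for the process and send every model call through `call_with_retries`. Only successful calls are counted here; if your provider bills for failed or timed-out requests, record an estimate for those too.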
Owning the dial
This is the quiet superpower of self-hosting: you hold the cost dial. A cloud assistant charges you its margin on every token whether the task was trivial or hard. Yours charges you actual cost, and delegation lets you push most of that toward zero.
The full delegation setup — the router, the escalation rules, model selection per task, and the cost guardrails — is Module 7 of Build Your Own Self-Hosted AI Assistant ($29). The guide walks the whole build end to end: a private assistant on your own box, wired into your files, calendar, and email, that works while you sleep — for a few dollars a month.