Configure models for stable inference and predictable costs
Models in QuantenRam are provided via alias IDs that decouple your code from upstream provider changes. This abstraction is only valuable when alias and upstream IDs stay consistent, pricing fields are maintained, and release states are intentionally set.
The most common source of unexpected costs is model drift: an alias suddenly points to a different upstream model with different pricing. In QuantenRam, models have three key properties: the alias ID (what you use in requests), the upstream ID (where the request actually goes), and the pricing (what you pay).
Keep alias and upstream IDs consistent
When you configure quantenram-start/glm-5, you expect it to keep resolving to the same upstream model. In QuantenRam, alias mappings are versioned and changes are announced. Don't change an upstream mapping without also updating the configurations that depend on it.
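One way to notice an unexpected mapping change is to pin the alias-to-upstream pairs you depend on and compare them with the mapping you currently observe. The sketch below is a minimal illustration; `find_drift`, the `PINNED` table, and the upstream ID `example-provider/glm-5` are hypothetical names rather than part of QuantenRam, and how you obtain the current mapping is left open.

```python
from typing import Dict, Optional, Tuple

# Hypothetical pinned mapping: alias ID -> upstream ID you expect it to resolve to.
PINNED: Dict[str, str] = {
    "quantenram-start/glm-5": "example-provider/glm-5",
}


def find_drift(
    pinned: Dict[str, str], current: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {alias: (expected_upstream, actual_upstream)} for every mismatch.

    An alias that is missing from `current` is reported with an actual value of None.
    """
    drift: Dict[str, Tuple[str, Optional[str]]] = {}
    for alias, expected in pinned.items():
        actual = current.get(alias)
        if actual != expected:
            drift[alias] = (expected, actual)
    return drift
```

Running a check like this in CI or at service start-up turns a silent cost change into a visible failure. An empty result means every pinned alias still resolves as expected.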
Maintain input/output/cached prices
Pricing isn't just input and output tokens. Cached tokens, reasoning tokens, and special feature tokens all have different rates. Keeping these fields current ensures your cost predictions stay accurate.
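To see why the cached rate matters for cost predictions, here is a rough per-request estimate. It is a sketch under stated assumptions: the `Pricing` fields, the per-million-token unit, the placeholder rates, and the rule that cached tokens are a subset of input tokens billed at the cached rate are all illustrative, not QuantenRam's actual pricing schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical pricing record, in USD per one million tokens."""
    input_per_mtok: float
    output_per_mtok: float
    cached_per_mtok: float


def estimate_cost(
    pricing: Pricing, input_tokens: int, output_tokens: int, cached_tokens: int = 0
) -> float:
    """Estimate the cost of one request in USD.

    Assumption: cached tokens are part of the input and are billed at the
    cached rate instead of the regular input rate.
    """
    if cached_tokens < 0 or cached_tokens > input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * pricing.input_per_mtok
        + cached_tokens * pricing.cached_per_mtok
        + output_tokens * pricing.output_per_mtok
    )
    return total / 1_000_000


# Placeholder rates for illustration only.
EXAMPLE = Pricing(input_per_mtok=1.00, output_per_mtok=4.00, cached_per_mtok=0.25)
```

With these placeholder rates, a request with 10,000 input tokens (4,000 of them cached) and 2,000 output tokens comes to about $0.015. If the cached field were missing and all input were billed at the full rate, the same request would be estimated at $0.018.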
Set release state intentionally
A model is in one of four release states: active, preview, deprecated, or disabled. Only active models should be used in production. Preview models are for testing. Deprecated models are scheduled for removal, so plan to migrate away from them. Disabled models are temporarily unavailable.
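The release-state rules can be enforced with a small guard before a model is wired into an environment. This is a minimal sketch: the `ReleaseState` enum mirrors the four states named above, while `allowed_in` and the environment names are hypothetical. Blocking deprecated models outside production is an extra assumption made here, not something the rules above require.

```python
from enum import Enum


class ReleaseState(Enum):
    ACTIVE = "active"
    PREVIEW = "preview"
    DEPRECATED = "deprecated"
    DISABLED = "disabled"


def allowed_in(environment: str, state: ReleaseState) -> bool:
    """Return True if a model in `state` may be used in `environment`.

    Production accepts only active models. Any other environment is treated
    as a test environment and also accepts preview models (assumption:
    deprecated and disabled models are rejected everywhere else).
    """
    if environment == "production":
        return state is ReleaseState.ACTIVE
    return state in (ReleaseState.ACTIVE, ReleaseState.PREVIEW)
```

Because the states are string-valued, `ReleaseState("preview")` converts a configuration value directly, and an unknown state raises `ValueError` instead of passing silently.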
Model selection by tier
QuantenRam organizes models into tiers that reflect their capabilities and costs. Selecting the right tier for the task is crucial for cost-effective operation.
// Model configuration with tier-aware selection
{
  "models": {
    "routine": {
      "tier": "start",
      "model": "quantenram-start/glm-5",
      "use_for": ["summarization", "simple_qa", "formatting"]
    },
    "coding": {
      "tier": "coding",
      "model": "quantenram-coding/qwen3codernext",
      "use_for": ["code_generation", "refactoring", "code_review"]
    },
    "reasoning": {
      "tier": "zenmaster",
      "model": "quantenram-zenmaster/gpt-5.4",
      "use_for": ["architecture", "complex_analysis", "planning"],
      "requires_approval": true
    }
  }
}
The tier-aware selection ensures you don't overpay for simple tasks or underpower complex ones. Start tier handles volume work efficiently. Coding tier provides specialized capabilities for development tasks. Zenmaster tier is reserved for high-value reasoning where quality justifies the cost.
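A configuration shaped like the one above can drive model selection with a simple lookup from task type to model. The following is a sketch, not QuantenRam's own resolver: `model_for_task` is a hypothetical helper, and it assumes the configuration has already been loaded as plain JSON (without the leading `//` caption line, which is not valid JSON).

```python
import json
from typing import Any, Dict, Tuple

CONFIG: Dict[str, Any] = json.loads("""
{
  "models": {
    "routine": {
      "tier": "start",
      "model": "quantenram-start/glm-5",
      "use_for": ["summarization", "simple_qa", "formatting"]
    },
    "coding": {
      "tier": "coding",
      "model": "quantenram-coding/qwen3codernext",
      "use_for": ["code_generation", "refactoring", "code_review"]
    },
    "reasoning": {
      "tier": "zenmaster",
      "model": "quantenram-zenmaster/gpt-5.4",
      "use_for": ["architecture", "complex_analysis", "planning"],
      "requires_approval": true
    }
  }
}
""")


def model_for_task(config: Dict[str, Any], task: str) -> Tuple[str, bool]:
    """Return (model alias, requires_approval) for the first entry listing `task`."""
    for entry in config["models"].values():
        if task in entry["use_for"]:
            return entry["model"], entry.get("requires_approval", False)
    raise KeyError(f"no model configured for task {task!r}")
```

Raising on an unknown task is a deliberate choice in this sketch: falling back silently to the most capable tier would hide exactly the overspending the tiers are meant to prevent.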
Model fallback and resilience
Production systems need fallback strategies when primary models are unavailable. QuantenRam supports automatic fallback to alternative models within the same tier or across tiers.
// Fallback configuration
{
  "primary": "quantenram-coding/qwen3codernext",
  "fallbacks": [
    {
      "model": "quantenram-coding/qwen3.5-9b",
      "trigger": "primary_unavailable",
      "notify": true
    },
    {
      "model": "quantenram-start/glm-5",
      "trigger": "cost_threshold_exceeded",
      "threshold": "$0.10/request"
    }
  ]
}
Fallbacks can be triggered by availability, cost thresholds, or latency requirements. The key is that your code continues working even when the primary model has issues.
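To make the trigger logic concrete, the sketch below walks a fallback list in order and returns the first model whose trigger fires. It illustrates one possible interpretation, not QuantenRam's actual routing: `choose_model` and `parse_threshold` are hypothetical helpers, the caller is assumed to supply the availability flag and a per-request cost estimate, and only the two triggers from the example configuration are handled.

```python
from typing import Any, Dict

FALLBACK_CONFIG: Dict[str, Any] = {
    "primary": "quantenram-coding/qwen3codernext",
    "fallbacks": [
        {"model": "quantenram-coding/qwen3.5-9b",
         "trigger": "primary_unavailable", "notify": True},
        {"model": "quantenram-start/glm-5",
         "trigger": "cost_threshold_exceeded", "threshold": "$0.10/request"},
    ],
}


def parse_threshold(text: str) -> float:
    """Parse a string such as "$0.10/request" into a USD amount per request."""
    amount, _, unit = text.partition("/")
    if not amount.startswith("$") or unit != "request":
        raise ValueError(f"unsupported threshold format: {text!r}")
    return float(amount[1:])


def choose_model(
    config: Dict[str, Any], primary_available: bool, estimated_cost: float
) -> str:
    """Return the model to use, checking fallbacks in their listed order."""
    for fallback in config["fallbacks"]:
        trigger = fallback["trigger"]
        if trigger == "primary_unavailable" and not primary_available:
            return fallback["model"]
        if (trigger == "cost_threshold_exceeded"
                and estimated_cost > parse_threshold(fallback["threshold"])):
            return fallback["model"]
    if not primary_available:
        # No fallback matched and the primary cannot serve the request.
        raise RuntimeError("primary unavailable and no fallback matched")
    return config["primary"]
```

Because the list is evaluated in order, an unavailable primary always resolves to the same-tier fallback first, and the cross-tier downgrade only applies when the cost estimate exceeds the threshold.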
Model configuration is about predictability. When aliases are stable, pricing is transparent, and fallbacks are in place, you can build reliable systems on top of LLMs.