
Troubleshooting for QuantenRam Operations

When an integration suddenly stops responding, frantic trial and error rarely helps. In QuantenRam, almost every disruption can be traced back to one of four layers: connection, authentication, model authorization, or billing status. This page walks through that exact sequence, turning a vague error pattern into a cleanly verifiable operational case.

The most important practical principle is therefore: check the contract first, not the prompt. As long as it's unclear whether the request reaches QuantenRam at all, whether the key used is valid, or whether the requested model is visible for this specific access, deeper debugging brings little value. Those who start diagnosis with /v1/models, HTTP status, and the same runtime environment resolve most incidents much faster.

401 and 403 are almost never model problems

A 401 in practice usually means the Bearer header is missing, or the API key is outdated or was never correctly set in the environment. A 403 indicates that the key is recognized, but your account or tier lacks authorization for the requested path or model family.

402 relates to hybrid billing in the Start tier

Since the hybrid billing rollout, a 402 is not a classic network problem but a plan or credit signal. Typical situations include an exhausted Start budget or an activated prepaid overflow without available credit.

429 and 5xx need calm, not rushed action

A 429 indicates temporary protection mechanisms or load spikes and should be handled with backoff, retries, and proper client logic. Treat temporary 5xx errors similarly: first test with a small retry window, then capture the request context for support.

Narrow down API connection problems first with a minimal test

When nothing works anymore, don't start with a large chat payload but with a very small read test. GET /v1/models is ideal for this because it simultaneously checks network, TLS, authentication, and the visible model list. If no response comes through at all, the problem is almost always before the actual inference. If a response comes, you can subsequently narrow the search to model selection or billing.

curl -i https://quantenram.net/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

A clean test delivers an HTTP status and a JSON response. If the request hangs in DNS, proxy, or TLS, it's not a model problem. In enterprise networks, you often see timeouts, certificate warnings, or blocked CONNECT tunnels at this point. In local everyday use, the most common causes are a wrongly exported key, a shell restart without new environment variables, or a swapped base URL.

Systematically read authentication errors

For auth problems, it's worth strictly separating two questions. First: was a valid Bearer key sent at all? Second: is this key allowed to use the targeted model and access path? This exact separation makes the difference between 401 and 403.

If you work with shell variables, first check the effective value in exactly the shell where the request runs. Especially with multiple terminal profiles, CI jobs, or Windows-WSL combinations, the API key is often set in only one context. Then make the header explicitly visible in the test request so that no wrapper or library silently substitutes a different key.
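A small sketch of that first check, run in the exact shell that sends the request. The variable name QUANTENRAM_API_KEY is an assumption here; use whatever name your setup actually exports.

```shell
# Report whether a key value is set, without ever printing the secret.
key_status() {
  if [ -n "${1:-}" ]; then
    # Only the length is shown, never the key itself.
    echo "key set (${#1} chars)"
  else
    echo "key NOT set in this shell"
  fi
}

# Check the effective value in this shell (variable name is an assumption).
key_status "${QUANTENRAM_API_KEY:-}"
```

Run this separately in every context that matters: interactive terminal, CI job, WSL. A key visible in one profile says nothing about another.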

curl https://quantenram.net/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "quantenram-start/glm-5",
    "messages": [
      {
        "role": "user",
        "content": "Reply only with the word ok."
      }
    ]
  }'

If this minimal request fails with 401, you don't need to think about model names yet. If it fails with 403, your next look isn't at the prompt but at the authorization logic of your plan or team. Especially in multi-user environments, it's then worth calling /v1/models with the same key. If the desired alias family is missing there, the cause is almost always a missing authorization rather than a technical defect.
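To make that /v1/models check mechanical, the model list can be filtered for one alias family. This is a hedged sketch: it assumes the response body contains model ids as plain JSON strings like "quantenram-start/glm-5", and uses QUANTENRAM_API_KEY as a placeholder variable name.

```shell
# Filter a /v1/models JSON body (stdin) for one alias family.
visible_aliases() {
  grep -o "\"$1/[^\"]*\"" || true
}

# With a live key exported, feed the real model list through the filter.
if [ -n "${QUANTENRAM_API_KEY:-}" ]; then
  curl -sS https://quantenram.net/v1/models \
    -H "Authorization: Bearer $QUANTENRAM_API_KEY" \
    | visible_aliases quantenram-start
fi
```

An empty result for the family you need points at plan or team authorization, not at a typo in the model name.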

Cleanly distinguish rate limiting from budget exhaustion

In old RPM thinking, almost every rejection was prematurely read as a rate limit. In current QuantenRam operations, that equation no longer holds. A 429 describes a short-term protection reaction in ongoing traffic. A 402 describes, in the hybrid billing context, an economic state: your included Start cycle budget is reached, or your activated prepaid overflow cannot continue for lack of credit.

This is important because both cases need completely different solutions. A 429 is answered by a robust client with backoff, some patience, and possibly a smaller fallback model. A 402 is not answered with more aggressive retries but with a look at the Plan tab. There you can see whether the Active Work budget or the 14-day Fair-Use budget is exhausted, whether overflow is activated, and whether prepaid balance is still available.

# Capture status code, response headers, and body separately.
status=$(curl -sS -o response.json -D headers.txt -w "%{http_code}" \
  https://quantenram.net/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d @payload.json)

if [ "$status" = "429" ]; then
  # A Retry-After header, if present, says how long to back off.
  grep -i "retry-after" headers.txt || true
elif [ "$status" = "402" ]; then
  # The body explains which budget or credit state blocked the request.
  cat response.json
fi

Practically, this means: a 429 is a topic for the client and its load behavior. A 402 is a topic for plan, budget, and possibly credit. This exact distinction prevents teams from investing hours in retry logic when they actually only need to check the Start cycle or prepaid window.
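The same distinction can be built into the client as a retry policy: back off only on statuses a retry can actually fix, and stop immediately on billing or auth states. A hedged sketch, reusing the endpoint and payload conventions from the examples above; the variable name QUANTENRAM_API_KEY is an assumption.

```shell
# Decide whether a retry makes sense for a given HTTP status.
should_retry() {
  case "$1" in
    429|500|502|503|504) return 0 ;;  # transient: backoff helps
    *) return 1 ;;                    # 402, 401, 403: retrying cannot help
  esac
}

# Retry loop with exponential backoff, run only when a key is exported.
if [ -n "${QUANTENRAM_API_KEY:-}" ]; then
  attempt=0
  while [ "$attempt" -lt 4 ]; do
    status=$(curl -sS -o response.json -w "%{http_code}" \
      https://quantenram.net/v1/chat/completions \
      -H "Authorization: Bearer $QUANTENRAM_API_KEY" \
      -H "Content-Type: application/json" \
      -d @payload.json)
    should_retry "$status" || break
    sleep $((1 << attempt))   # 1s, 2s, 4s backoff
    attempt=$((attempt + 1))
  done
  echo "final status: $status"
fi
```

The important property is the early break: a 402 leaves the loop after one attempt and sends you to the Plan tab instead of burning retries.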

When a model suddenly becomes unavailable

Models in QuantenRam are rolled out, authorized, and adjusted via alias families. Therefore, the question should never be only whether a name looks syntactically correct, but whether it is publicly visible for your exact access. The truth in operations always lies in /v1/models, not in old screenshots, local config files, or a prompt that worked two weeks ago.

This applies especially when a team switches between quantenram-start/*, quantenram-zenmaster/*, and the Coder family. A model can technically exist but be invisible to your key if a different plan, team setup, or environment is in play. Therefore, every incident note should always document the actually used model name, the key context, and the output of /v1/models together.

Correctly classify dashboard, activity, and billing questions

Many apparent billing errors are actually visibility questions. Requests are processed in ongoing operations and only afterwards aggregated into Activity and Plan. Shortly after a request, there can therefore be a small delay before usage, costs, and budget status are visible consistently across all views. This is not a contradiction in the system but the normal consequence of inference and telemetry not being the same processing step.

If you see a deviation, first check three things in exactly this order: whether the request was successful, whether it appears in Activity, and whether the Plan tab considers the same time period. In the Start tier, it's additionally important whether your request was still within the included cycle budget or already running via prepaid overflow. Especially there, a rejection sometimes feels like a rate limit when it's actually a budget state.

For questions like "Why wasn't a request executed anymore?" or "Why is the bar dropping despite few chats?" it's worth thinking not in requests but in cost paths. A single request with a large context or deeper reasoning can weigh more economically than several short dialogues. That's why QuantenRam's hybrid billing shows not only activity but also a dedicated cost and budget view.
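Purely illustrative arithmetic makes the cost-path point concrete. The prices below are invented placeholders, not QuantenRam's real rates; only the proportions matter.

```shell
# Hypothetical cost model: credits = tokens * price_per_1k / 1000.
tokens_to_cost() {
  # $1 = tokens, $2 = hypothetical credits per 1k tokens
  echo $(( $1 * $2 / 1000 ))
}

big=$(tokens_to_cost 120000 2)    # one large-context request
small=$(tokens_to_cost 1500 2)    # one short dialogue turn
echo "large request: $big credits, small request: $small credits"
echo "one large request equals $((big / small)) small ones"
```

With these placeholder numbers, a single large-context call consumes as much budget as dozens of short turns, which is exactly why a bar can drop sharply "despite few chats".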

When and how to contact support

If you still have no clear cause after the minimal test, first collect the reproducible core of the problem and then contact support. For QuantenRam, the shortest direct route is currently a message to @itlerhilfe on Instagram. The more precise your message, the faster the incident can be classified. A minimal report might look like this:

Time of error: 2026-03-20 14:37 CET
Affected endpoint: /v1/chat/completions
Used model: quantenram-start/glm-5
HTTP status: 402
Request ID or log hint: if available
Short description: Start budget according to plan reached, overflow off

Especially helpful are timestamp, model ID, HTTP status, affected environment, and information on whether the problem also occurs with /v1/models. Never send complete secrets or unfiltered productive data if a shortened description suffices. For most operational cases, request time, model name, and a short error text are enough to bring the right team to the right place.

The fastest diagnostic sequence in everyday life is almost always the same: first /v1/models, then read HTTP status, then distinguish between auth, model authorization, hybrid billing, and real load situation. Those who follow this sequence usually find the cause faster than with any large debug setup.
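That sequence can be sketched as a single helper: run the minimal /v1/models test, then map the HTTP status to the layer to inspect next. A sketch under the same assumptions as above (QUANTENRAM_API_KEY as placeholder variable name); the messages paraphrase this page, not an official error catalog.

```shell
# Map the status of the minimal /v1/models test to the next diagnostic layer.
diagnose() {
  case "$1" in
    2*)  echo "connection and auth OK; check model visibility and billing next" ;;
    401) echo "auth: Bearer header missing or key invalid" ;;
    403) echo "authorization: key recognized, but plan or tier lacks access" ;;
    402) echo "billing: Start budget reached or prepaid overflow without credit" ;;
    429) echo "load: back off and retry with patience" ;;
    5*)  echo "server: small retry window, then collect context for support" ;;
    000) echo "network: DNS, proxy, or TLS fails before QuantenRam is reached" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# With a live key exported, classify the real endpoint's answer.
if [ -n "${QUANTENRAM_API_KEY:-}" ]; then
  diagnose "$(curl -sS -o /dev/null -w '%{http_code}' \
    https://quantenram.net/v1/models \
    -H "Authorization: Bearer $QUANTENRAM_API_KEY")"
fi
```

The 000 branch covers curl's own "no HTTP status at all" case, which is exactly the "problem before the actual inference" situation described above.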