Understanding the QuantenRam API before you integrate it
The QuantenRam API is deliberately built to be OpenAI-compatible so that you do not have to learn a new ecosystem just to address multiple model families cleanly. The key idea behind it is simplicity: a familiar request shape, a familiar response structure, but a model and hosting strategy tailored to QuantenRam.
OpenAI compatibility here is not a marketing label, but an integration decision. Many teams already have SDKs, helper functions, proxies, or internal wrappers built around OpenAI-style payloads. When QuantenRam adopts that same baseline contract, the switching barrier drops sharply: what would otherwise be a large migration often becomes only a new base URL, a new bearer key, and a decision about an alias model. That is exactly why QuantenRam feels familiar to developers even though multiple providers, hosting forms, and product lines are converging behind the scenes.
Base URL and authentication
The public entry point for the API is https://quantenram.net/v1. Every request is authenticated with a bearer token. This is intentionally kept simple: everyday API access should be predictable, not complicated. When an integration fails, you want to be able to check authentication, model ID, and payload first instead of fighting through several proprietary handshakes.
Authorization: Bearer YOUR_API_KEY
In practice, that means you store your key once in the runtime environment and send it with every request. If you work across multiple environments, keep test, staging, and production keys strictly separated so it stays clear which models and permissions are available where.
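A minimal sketch of that pattern, assuming the key lives in an environment variable named QUANTENRAM_API_KEY (the variable name is a naming choice for this example, not part of the API):

```python
import os

def auth_header() -> dict:
    """Build the Authorization header from the runtime environment.

    Reads the key once and fails fast if it is missing, so a broken
    deployment surfaces immediately instead of as a 401 later.
    """
    key = os.environ.get("QUANTENRAM_API_KEY")
    if not key:
        raise RuntimeError("QUANTENRAM_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}
```

Keeping the header construction in one place also makes it trivial to swap keys between test, staging, and production environments.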
The two most important entry points
POST /v1/chat/completions
This endpoint is the standard path for text-based inference. Here you send the model ID, the message history, and optional parameters such as temperature, top_p, max_tokens, or stream. The endpoint is central because almost every productive use of QuantenRam starts here.
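A request body for this endpoint can be assembled like the following sketch; the model alias and parameter values are illustrative:

```python
def build_chat_request(model: str, user_prompt: str, **params) -> dict:
    """Assemble an OpenAI-style payload for POST /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
    }
    # Optional sampling parameters such as temperature, top_p,
    # max_tokens, or stream are passed through unchanged.
    payload.update(params)
    return payload

req = build_chat_request(
    "quantenram-start/deepseek-chat",
    "Summarize the benefit of a unified LLM API.",
    temperature=0.3,
    max_tokens=300,
    stream=False,
)
```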
GET /v1/models
The model list is the point of orientation for available alias models. It matters because QuantenRam deliberately works with product aliases. Instead of needing to know internal provider details, you ask the platform which models are visible for your access and align your application accordingly.
curl https://quantenram.net/v1/models -H "Authorization: Bearer YOUR_API_KEY"
For day-to-day API work, this is a remarkably simple diagnostic path. Before second-guessing model identifiers or speculating about tier issues, use exactly this request to verify which alias list your current access actually sees.
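That diagnostic check can be scripted as well. The sketch below assumes the response follows the OpenAI models-list convention (an object with a "data" array of model entries); the exact fields may differ:

```python
def visible_aliases(models_response: dict) -> list:
    """Extract the model IDs from a GET /v1/models response body."""
    return sorted(m["id"] for m in models_response.get("data", []))

def check_alias(models_response: dict, wanted: str) -> bool:
    """Verify an alias is visible for this key before using it in requests."""
    return wanted in visible_aliases(models_response)

# Illustrative response body in the OpenAI list format.
sample = {
    "object": "list",
    "data": [{"id": "quantenram-start/deepseek-chat", "object": "model"}],
}
```

Running this check at startup turns a confusing 404 later into a clear configuration error up front.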
How a request is structured
A typical request mainly needs two things: the desired alias model and a message list in the familiar role-based format. This shape is not only compatible, but also intentionally easy to read. That allows developers to debug, prompt, and version payloads quickly without learning a new syntax.
{
"model": "quantenram-start/deepseek-chat",
"messages": [
{
"role": "system",
"content": "You are a precise technical assistant."
},
{
"role": "user",
"content": "Summarize the benefit of a unified LLM API in three sentences."
}
],
"temperature": 0.3,
"max_tokens": 300,
"stream": false
}
Why this format matters becomes especially clear in larger projects. As soon as prompts, system rules, and tooling are used across multiple services, you do not just want to send a request, you want to treat it as a readable contract object. That is exactly where the OpenAI-compatible structure helps: it is widely used, easy to document, and technically straightforward to pass along.
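One way to treat the payload as such a contract object is to give it a typed shape that can be documented and versioned; a sketch using a dataclass (the class and field names are illustrative, not part of the API):

```python
from dataclasses import dataclass, asdict

@dataclass
class ChatRequest:
    """A typed, reviewable view of the OpenAI-compatible request shape."""
    model: str
    messages: list
    temperature: float = 0.3
    max_tokens: int = 300
    stream: bool = False

    def to_payload(self) -> dict:
        # Serializes to the same JSON body shown above.
        return asdict(self)

req = ChatRequest(
    model="quantenram-start/deepseek-chat",
    messages=[{"role": "user", "content": "Hello"}],
)
```

Because the defaults live in one class, prompt and parameter changes show up in code review instead of being scattered across ad-hoc dictionaries.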
What responses typically look like
A successful response contains the generated message as usual under choices[0].message.content as well as a usage object for token visibility. This matters because QuantenRam does not just return text; it also makes usage traceable. Especially for teams and production applications, a good answer only becomes good operations when resource consumption and model choice remain visible as well.
{
"id": "chatcmpl_...",
"object": "chat.completion",
"created": 1712345678,
"model": "quantenram-start/deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "A unified LLM API reduces integration effort and makes model switches easier. Teams can adopt better models without re-adapting their application for every provider. At the same time, privacy and cost decisions become easier to steer."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 43,
"completion_tokens": 41,
"total_tokens": 84
}
}
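Pulling the answer and the token usage out of such a response body is straightforward; a sketch:

```python
def extract(response: dict) -> tuple:
    """Return (text, total_tokens) from an OpenAI-style completion body."""
    text = response["choices"][0]["message"]["content"]
    # usage may be absent on some paths, so default to 0 rather than fail.
    tokens = response.get("usage", {}).get("total_tokens", 0)
    return text, tokens

sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 43, "completion_tokens": 41, "total_tokens": 84},
}
```

Logging the token count next to the model alias on every call is what turns a good answer into good operations.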
Error handling: why the basics matter more than edge cases
Most API problems in daily work are not exotic edge cases, but recurring baseline situations. A 401 response usually means the bearer token is missing or invalid. A 403 response often indicates that your access is not enabled for a specific model or tier. A 404 response or a corresponding error message often shows that the requested alias model is not available for the key you are using. 429 and temporary 5xx errors, by contrast, signal that your client should work with backoff, retry logic, or a fallback model.
401 / 403
Check the key first and only then the tier or model enablement. That is almost always faster than searching through complex payload details.
404
Compare the model name you used with the visible alias list. In QuantenRam, the request syntax is usually not the problem; the model choice for the active access is.
429 / 5xx
Build a graceful failure path with retry and backoff. Anyone who wants production robustness should treat these responses as a normal part of operations, not as exceptional events.
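A minimal backoff loop for the 429/5xx case, sketched with the standard library; the retryable status set and the wait times are illustrative choices, not API requirements:

```python
import random
import time

RETRYABLE = {429, 500, 502, 503, 504}

def call_with_backoff(send, max_attempts: int = 4, base_delay: float = 1.0):
    """Retry a request on 429/5xx with exponential backoff and jitter.

    `send` is any zero-argument callable returning (status_code, body),
    so the loop stays independent of the HTTP client in use.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            # 1s, 2s, 4s ... plus jitter; treated as normal operations.
            time.sleep(base_delay * 2 ** attempt + random.random() * base_delay)
    return status, body
```

After the final attempt the last response is returned unchanged, which is the natural point to switch to a fallback model.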
Code examples for getting started
The following examples show the same API contract in three common forms. That is where the value of OpenAI compatibility becomes especially visible: the logic stays the same even when the language or client changes.
curl
curl https://quantenram.net/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "quantenram-start/deepseek-chat",
"messages": [
{
"role": "user",
"content": "Explain the difference between Start and Zenmaster."
}
]
}'
Python
from openai import OpenAI
client = OpenAI(
base_url="https://quantenram.net/v1",
api_key="YOUR_API_KEY",
)
response = client.chat.completions.create(
model="quantenram-start/deepseek-chat",
messages=[
{
"role": "user",
"content": "Explain the difference between Start and Zenmaster.",
}
],
)
print(response.choices[0].message.content)
JavaScript
const response = await fetch("https://quantenram.net/v1/chat/completions", {
method: "POST",
headers: {
"Authorization": "Bearer YOUR_API_KEY",
"Content-Type": "application/json"
},
body: JSON.stringify({
model: "quantenram-start/deepseek-chat",
messages: [
{
role: "user",
content: "Explain the difference between Start and Zenmaster."
}
]
})
});
if (!response.ok) {
throw new Error(`QuantenRam request failed with ${response.status}`);
}
const data = await response.json();
console.log(data.choices[0].message.content);
The most important integration idea is therefore not "How do I talk to this one model?" but "How do I build my application so that the same contract can carry multiple model and hosting options?" That is exactly what the QuantenRam API is designed for.