POST /api/v1/chat/completions
cURL
curl --request POST \
  --url http://studio.premai.io/api/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "messages": [
    {
      "role": "system",
      "content": "<string>",
      "name": "<string>",
      "tool_calls": [
        {
          "id": "<string>",
          "type": "function",
          "function": {
            "name": "<string>",
            "arguments": "<string>"
          }
        }
      ],
      "tool_call_id": "<string>"
    }
  ],
  "model": "<string>"
}'
{
  "id": "<string>",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 1,
      "message": {
        "content": "<string>",
        "role": "assistant",
        "name": "<string>",
        "tool_calls": [
          {
            "id": "<string>",
            "type": "function",
            "function": {
              "name": "<string>",
              "arguments": "<string>"
            }
          }
        ],
        "tool_call_id": "<string>"
      }
    }
  ],
  "created": 123,
  "model": "<string>",
  "system_fingerprint": "<string>",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 1,
    "prompt_tokens": 1,
    "total_tokens": 1
  }
}
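The fields in the example response above can be read with a few lines of Python; the literal values below are stand-ins mirroring the schema, not real API output:

```python
import json

# Example response body with the documented shape (placeholder values).
raw = '''
{
  "id": "cmpl-123",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {"content": "Hello!", "role": "assistant"}
    }
  ],
  "created": 1700000000,
  "model": "example-model",
  "object": "chat.completion",
  "usage": {"completion_tokens": 1, "prompt_tokens": 1, "total_tokens": 2}
}
'''

resp = json.loads(raw)
message = resp["choices"][0]["message"]  # first (typically only) choice
print(message["role"], message["content"], resp["choices"][0]["finish_reason"])
```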

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
messages
object[]
required

An array of messages comprising the conversation so far. Must contain at least one message. System messages are only allowed as the first message.

Minimum length: 1
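Both constraints above (at least one message, system messages only first) can be checked client-side before sending a request; this helper is a sketch, not part of the API:

```python
def validate_messages(messages):
    """Enforce the documented constraints: the array must contain at
    least one message, and a system message may only appear first."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for i, msg in enumerate(messages):
        if msg.get("role") == "system" and i != 0:
            raise ValueError("system messages are only allowed as the first message")

validate_messages([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
])  # passes
```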
model
string
required

The identifier of the model to use for generating completions. This can be a model ID or an alias.

frequency_penalty
number
default:0

A value between -2.0 and 2.0 that penalizes new tokens based on their frequency in the text so far. Higher values decrease the likelihood of the model repeating the same tokens.

Required range: -2 <= x <= 2
max_completion_tokens
integer | null

The maximum number of tokens to generate in the completion. If null, the model's maximum context length is used.

Required range: x > 0
presence_penalty
number
default:0

A value between -2.0 and 2.0 that penalizes new tokens based on whether they appear in the text so far. Higher values increase the likelihood of the model talking about new topics.

Required range: -2 <= x <= 2
seed
integer

A seed value for deterministic sampling. Using the same seed with the same parameters will generate the same completion.

stop

One or more sequences where the API will stop generating further tokens. Can be a single string or an array of strings.
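Because stop accepts either a single string or an array, a client may want to normalize the value before building the payload; a minimal sketch:

```python
def normalize_stop(stop):
    """Accept a single string or a list of strings, as the stop
    parameter does, and always return a list (or None)."""
    if stop is None:
        return None
    if isinstance(stop, str):
        return [stop]
    return list(stop)

print(normalize_stop("\n"))            # ['\n']
print(normalize_stop(["END", "###"]))  # ['END', '###']
```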

stream
boolean
default:false

If true, partial message deltas will be sent as server-sent events. Useful for showing progressive generation in real time.
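With stream set to true, the body arrives as server-sent events: a client reads the `data:` lines and stops at a terminator. The `[DONE]` sentinel and the `delta` chunk shape below follow the common OpenAI-style streaming convention and are assumptions, not confirmed by this page:

```python
import json

def parse_sse_chunks(lines):
    """Yield the JSON payload of each 'data:' line, stopping at the
    (assumed) '[DONE]' sentinel used by OpenAI-style streaming APIs."""
    for line in lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

# Simulated stream (placeholder chunks, not real API output).
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_chunks(stream))
print(text)  # Hello
```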

temperature
number | null
default:0.7

Controls randomness in the model's output. Values between 0 and 2. Lower values make the output more focused and deterministic; higher values make it more random and creative.

Required range: 0 <= x <= 2
top_p
number | null
default:1

An alternative to temperature for controlling randomness: the model samples only from the tokens whose cumulative probability is within top_p. Lower values make output more focused.

Required range: 0 <= x <= 1
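The documented ranges for the sampling parameters can be validated client-side before sending a request; this sketch uses only the ranges stated above:

```python
# Documented ranges for the sampling parameters of this endpoint.
RANGES = {
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
}

def check_sampling_params(params):
    """Raise if any sampling parameter falls outside its documented range."""
    for name, (lo, hi) in RANGES.items():
        value = params.get(name)
        if value is not None and not (lo <= value <= hi):
            raise ValueError(f"{name}={value} outside [{lo}, {hi}]")

check_sampling_params({"temperature": 0.7, "top_p": 1.0})  # within range
```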
response_format
object

Specifies the format of the model's output. Use "json_schema" to constrain responses to valid JSON matching the provided schema.
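The exact shape of the response_format object is not spelled out on this page; the payload below follows the common OpenAI-style json_schema convention and should be treated as an assumption, not a confirmed schema:

```python
import json

# Hypothetical response_format payload (OpenAI-style shape, assumed).
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",  # hypothetical schema name
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temp_c": {"type": "number"},
            },
            "required": ["city", "temp_c"],
        },
    },
}

# The object must serialize cleanly into the request body.
body_fragment = json.dumps({"response_format": response_format})
```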

tools
(any | null)[]

A list of tools the model may call. Each tool describes a function the model can invoke to accomplish specific tasks.

tool_choice

Controls how the model uses tools. "none" disables tools, "auto" lets the model decide, or specify a particular tool configuration.

Available options:
none
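When the model calls a tool, the tool_calls entries shown in the message schema above carry the function name and a JSON-encoded arguments string; decoding one looks like this (placeholder values):

```python
import json

# A tool_calls entry with the shape shown in the message schema above.
tool_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}

name = tool_call["function"]["name"]
# arguments arrive as a JSON string, not a parsed object
args = json.loads(tool_call["function"]["arguments"])
print(name, args["city"])  # get_weather Paris
```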

Response

chat completion response

id
string
required

A unique identifier for this chat completion response. Can be used for tracking or debugging.

choices
object[]
required

An array of completion choices. Each choice represents a possible completion for the input prompt, though currently only one choice is typically returned.

created
integer
required

The Unix timestamp (in seconds) indicating when this completion was generated by the API.

model
string
required

The specific model used to generate this completion. This will be the model's full identifier string.

object
enum<string>
required

The type of object returned, always "chat.completion" for chat completion responses.

Available options:
chat.completion
system_fingerprint
string | null

A unique identifier for the system state that generated this response. Useful for tracking model behavior across requests.

usage
object

Statistics about token usage for this request and response. May be omitted in error cases or when not available.