Streaming
Stream responses token by token using server-sent events (SSE). Your users see text appear as it is generated instead of waiting for the full response.
Enable streaming
Set `stream` to `true` in your request (`stream=True` in Python):
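```python
from openai import OpenAI

# Assumes the OpenAI Python SDK, pointed at the base URL used in the curl example below.
client = OpenAI(base_url="https://sovereigneg.com/v1", api_key="sk-...")

stream = client.chat.completions.create(
    model="...",
    messages=[{"role": "user", "content": "Write a poem about the Nile"}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue  # skip chunks that carry no choices
    content = chunk.choices[0].delta.content  # None on role-only and final chunks
    if content:
        print(content, end="", flush=True)
```

SSE format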
On the wire, each event is a `data: ` line carrying one JSON chunk, and events are separated by a blank line:
```text
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
The stream ends with `data: [DONE]`. That sentinel is not JSON, so check for it before parsing.
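If you consume the stream without an SDK, the format above is simple to parse. The following Python sketch assumes one `data:` line per event (as in the sample) and a single choice (`n=1`). It skips blank separators and other SSE fields, stops at the `[DONE]` sentinel, and returns the concatenated text plus the last `finish_reason`. The name `parse_sse_stream` is illustrative, not part of any SDK, and the function works on already-split lines, so when reading from a socket you still need to buffer partial lines.

```python
import json
from typing import Iterable, Optional, Tuple


def parse_sse_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Return (full_text, finish_reason) from chat.completion.chunk events."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, ": comments", other SSE fields
        data = line[len("data:"):]
        if data.startswith(" "):
            data = data[1:]  # SSE strips exactly one leading space
        if data == "[DONE]":
            break  # sentinel, not JSON
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # guard against chunks that carry no choices
        choice = choices[0]  # assumes a single completion (n=1)
        content = choice.get("delta", {}).get("content")
        if content:
            parts.append(content)
        if choice.get("finish_reason") is not None:
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


sample = """\
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
"""
text, reason = parse_sse_stream(sample.splitlines())
print(text, reason)  # Hello world stop
```

Keeping `finish_reason` lets you tell a normal `"stop"` apart from other endings without inspecting the raw events.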
Node.js streaming
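```javascript
// Assumes the official `openai` npm package, pointed at this API's base URL.
// Top-level await requires an ES module (.mjs file or "type": "module").
import OpenAI from 'openai';

const client = new OpenAI({ baseURL: 'https://sovereigneg.com/v1', apiKey: 'sk-...' });

const stream = await client.chat.completions.create({
  model: '...',
  messages: [{ role: 'user', content: 'Write a haiku about Cairo' }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```

Raw HTTP streaming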
```bash
curl -N https://sovereigneg.com/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "...",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
```

The `-N` flag disables curl's output buffering, so you see tokens as they arrive.
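The same request can be made from Python's standard library. This is a sketch, not an official client: `open_stream` and `iter_sse_data` are hypothetical helper names, and the 1 KB read size is arbitrary. The important part is the buffering. A single network read can stop in the middle of a line or a multi-byte UTF-8 character, so the helper decodes incrementally and only parses complete lines.

```python
import codecs
import io
import json
import urllib.request
from typing import Iterator

# Base URL taken from the curl example; adjust for your deployment.
API_URL = "https://sovereigneg.com/v1/chat/completions"


def open_stream(api_key: str, model: str, prompt: str):
    """Send a streaming chat request; returns the open HTTP response."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    return urllib.request.urlopen(req)  # raises HTTPError on a non-2xx status


def iter_sse_data(resp: io.BufferedIOBase, chunk_size: int = 1024) -> Iterator[str]:
    """Yield the payload of each `data: ` line until the [DONE] sentinel."""
    decoder = codecs.getincrementaldecoder("utf-8")()  # holds partial characters
    buffer = ""
    while True:
        raw = resp.read1(chunk_size)  # returns as soon as some bytes are available
        if not raw:
            break  # connection closed
        buffer += decoder.decode(raw)
        *lines, buffer = buffer.split("\n")  # keep the trailing partial line
        for line in lines:
            line = line.rstrip("\r")
            if not line.startswith("data: "):
                continue  # blank separators and other SSE fields
            data = line[len("data: "):]
            if data == "[DONE]":
                return
            yield data


# Real use (needs network access and a valid key):
# with open_stream("sk-...", "...", "Hello") as resp:
#     for data in iter_sse_data(resp):
#         print(json.loads(data)["choices"][0]["delta"].get("content") or "", end="", flush=True)

# Offline demo: 5-byte reads make lines and multi-byte characters span read boundaries.
sample = (
    'data: {"choices":[{"index":0,"delta":{"content":"Nile "},"finish_reason":null}]}\n\n'
    'data: {"choices":[{"index":0,"delta":{"content":"النيل"},"finish_reason":null}]}\n\n'
    "data: [DONE]\n\n"
).encode("utf-8")
payloads = list(iter_sse_data(io.BytesIO(sample), chunk_size=5))
text = "".join(json.loads(p)["choices"][0]["delta"]["content"] for p in payloads)
# text is now "Nile النيل"
```

Reading with `read1` returns whatever bytes are already available instead of waiting for a full 1 KB, which keeps per-token latency low.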
Frontend integration
For web applications, use the fetch API with ReadableStream:
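```javascript
// A key embedded in browser code is visible to every user. In production,
// send this request from your backend (or a proxy) instead.
const response = await fetch('https://sovereigneg.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-...',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '...',
    messages: [{ role: 'user', content: 'Hello' }],
    stream: true,
  }),
});
if (!response.ok) throw new Error(`HTTP ${response.status}`);

const reader = response.body.getReader();
const decoder = new TextDecoder();
const output = document.getElementById('output');
let buffer = '';

readLoop: while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // { stream: true } keeps multi-byte characters that span two reads intact.
  buffer += decoder.decode(value, { stream: true });
  // A read can end mid-line: process complete lines, keep the remainder.
  const lines = buffer.split('\n');
  buffer = lines.pop();
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice('data: '.length).trim();
    if (data === '[DONE]') break readLoop; // leave both loops
    const content = JSON.parse(data).choices[0]?.delta?.content;
    if (content) output.textContent += content; // append to your UI
  }
}
```

Performance notes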
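- Time to first token (TTFT): varies by model, prompt size, and routing; measure it on your own workload.
- Token throughput: varies by model size and load.
- Streaming adds little overhead: the model generates tokens one at a time regardless, so streaming changes when you see them, not how fast they are produced. The extra cost is the per-chunk `data:` framing.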
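To put numbers on TTFT and throughput for your own workload, you can time a stream yourself. A minimal Python sketch, assuming you pass in the non-empty `delta.content` strings and record a start time just before sending the request. It counts streamed chunks rather than tokens, since a chunk is not guaranteed to be exactly one token; `measure_stream` is a hypothetical helper, and the clock is injectable so the demo is deterministic.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(
    pieces: Iterable[str],
    start: Optional[float] = None,
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Time a stream of non-empty content pieces.

    `start` should come from the same clock, read just before the request is
    sent; otherwise timing begins when iteration starts.
    """
    if start is None:
        start = clock()
    first = None
    count = 0
    for _ in pieces:
        if first is None:
            first = clock()  # arrival of the first content chunk
        count += 1
    end = clock()
    gen_time = None if first is None else end - first
    rate = (count - 1) / gen_time if count > 1 and gen_time else None
    return {
        "ttft_s": None if first is None else first - start,
        "total_s": end - start,
        "chunks": count,
        "chunks_per_s": rate,  # measured after the first chunk
    }


# Real use with the Python client from "Enable streaming":
# start = time.perf_counter()
# stream = client.chat.completions.create(model="...", messages=[...], stream=True)
# stats = measure_stream(
#     (c.choices[0].delta.content for c in stream
#      if c.choices and c.choices[0].delta.content),
#     start=start,
# )

# Deterministic demo with a fake clock.
ticks = iter([0.5, 1.5])
stats = measure_stream(["Hel", "lo", " world"], start=0.0, clock=lambda: next(ticks))
print(stats)  # {'ttft_s': 0.5, 'total_s': 1.5, 'chunks': 3, 'chunks_per_s': 2.0}
```

Filtering out empty `content` matters: the first chunk often carries only the role, and counting it would understate TTFT.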