Embeddings
Generate vector representations of text. Use embeddings for semantic search, RAG pipelines, clustering, and classification.
Endpoint
POST https://sovereigneg.com/v1/embeddings
Request
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Embedding model ID |
| input | string/array | Yes | Text to embed (string or array of strings) |
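import os
from openai import OpenAI

# Assumption: the OpenAI Python SDK, configured for this API's base URL.
client = OpenAI(base_url="https://sovereigneg.com/v1", api_key=os.environ["OPENAI_API_KEY"])
response = client.embeddings.create(
    model="Qwen/Qwen3-Embedding-0.6B",
    input="What is the meaning of life?",
)
embedding = response.data[0].embedding
print(f"Dimension: {len(embedding)}")  # 1024

Response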
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0152, ...]
    }
  ],
  "model": "Qwen/Qwen3-Embedding-0.6B",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

Batch embeddings
Embed multiple texts in one call for efficiency:
response = client.embeddings.create(
    model="Qwen/Qwen3-Embedding-0.6B",
    input=[
        "First document about AI in Egypt",
        "Second document about renewable energy",
        "Third document about ancient history",
    ],
)
for item in response.data:
    print(f"Text {item.index}: {len(item.embedding)} dimensions")

Available models
The catalog of currently live embedding models is available at GET /v1/models?modality=embed (or filter the model list in the dashboard by Modality = embed). The set changes over time; the table below is a recent snapshot.
| Model | Dimensions | Best for |
|---|---|---|
| Qwen/Qwen3-Embedding-0.6B | 1024 | English text, RAG, search |
| sentence-transformers/all-MiniLM-L6-v2 | 384 | Lightweight, latency-sensitive |
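Because the vector size differs per model, it can help to check the dimension before writing embeddings to an index. The following is a minimal sketch, assuming the dimensions in the snapshot table above are still current; `EXPECTED_DIMS` and `check_dimension` are hypothetical names and not part of the API.

```python
# Hypothetical helper: dimensions copied from the snapshot table above.
EXPECTED_DIMS = {
    "Qwen/Qwen3-Embedding-0.6B": 1024,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
}


def check_dimension(model: str, embedding: list[float]) -> int:
    """Return the expected vector length, raising if the embedding differs."""
    expected = EXPECTED_DIMS.get(model)
    if expected is None:
        raise KeyError(f"No known dimension for model {model!r}")
    if len(embedding) != expected:
        raise ValueError(
            f"{model} returned {len(embedding)} dimensions, expected {expected}"
        )
    return expected
```

Since the catalog evolves, a mapping like this would need to be refreshed whenever the model list changes.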
cURL quick-test
The fastest way to confirm an embedding model is live and your API key is wired correctly:
curl https://sovereigneg.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Embedding-0.6B",
    "input": "Cairo is the capital of Egypt.",
    "encoding_format": "float"
  }' | jq '{dim: (.data[0].embedding | length), tokens: .usage.prompt_tokens}'
# expect: { "dim": 1024, "tokens": <small int> }

A non-embedding model ID returns HTTP 400 with error.code = "model_not_supported_on_endpoint". That is the catalog contract: chat models cannot be called against /v1/embeddings, and vice versa.
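Client code can detect this case explicitly instead of treating every 400 the same way. The sketch below assumes the error body is JSON shaped like `{"error": {"code": ...}}`, which is inferred from the `error.code` path above; any other fields are undocumented here, and `is_wrong_endpoint_error` is a hypothetical name.

```python
import json

WRONG_ENDPOINT_CODE = "model_not_supported_on_endpoint"


def is_wrong_endpoint_error(status: int, body: str) -> bool:
    """True if the response is the HTTP 400 'model not supported here' error."""
    if status != 400:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    # Assumed shape: {"error": {"code": "..."}}
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("code") == WRONG_ENDPOINT_CODE


# Hand-written sample body, not captured from a live call.
sample = '{"error": {"code": "model_not_supported_on_endpoint"}}'
print(is_wrong_endpoint_error(400, sample))  # True
```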
Use cases
- Semantic search: Find documents by meaning, not just keywords
- RAG pipelines: Retrieve relevant context for chat completions
- Classification: Cluster or categorize text by similarity
- Deduplication: Find near-duplicate content in large datasets
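To illustrate the semantic search use case, embeddings are commonly compared with cosine similarity. The sketch below ranks documents against a query using only the standard library. The three-dimensional vectors are toy values standing in for real embeddings, and `cosine_similarity` and `rank` are hypothetical helper names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank(query: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors; in practice these would come from response.data[i].embedding.
docs = {
    "ai": [0.9, 0.1, 0.0],
    "energy": [0.1, 0.9, 0.0],
    "history": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]
results = rank(query, docs)
print(results[0][0])  # ai
```

The same score could drive deduplication by flagging pairs above a chosen threshold; a suitable threshold depends on the model and the data.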