API Documentation
Base URL:
OpenAI Format
POST /v1/chat/completions
OpenAI-compatible chat completions endpoint with streaming support.
Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | - | Model name, e.g. llama3.2, kimi-k2.6:cloud |
| messages | array | Yes | - | Conversation history; each item has role and content |
| temperature | float | No | 0.5 | Temperature (0-2), controls randomness |
| max_tokens | int | No | unlimited | Maximum number of tokens to generate |
| stream | bool | No | false | Whether to stream the response |
| top_p | float | No | - | Top-p sampling (0-1) |
| frequency_penalty | float | No | - | Frequency penalty (-2 to 2) |
| presence_penalty | float | No | - | Presence penalty (-2 to 2) |
| repetition_penalty | float | No | 1.2 | Repetition penalty (Ollama-specific) |
| stop | array | No | - | List of stop sequences |
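The less common sampling knobs from the table above can be combined in one request body; a sketch with illustrative values (not recommendations):

```python
# Illustrative /v1/chat/completions body exercising the optional sampling parameters.
payload = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Write a haiku."}],
    "temperature": 0.8,        # 0-2
    "top_p": 0.9,              # 0-1
    "frequency_penalty": 0.5,  # -2 to 2
    "presence_penalty": 0.3,   # -2 to 2
    "repetition_penalty": 1.2, # Ollama-specific
    "stop": ["\n\n"],
    "stream": False,
}
```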
Request Example
curl ${location.origin}/v1/chat/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.2",
"messages": [{"role": "user", "content": "Hello!"}],
"temperature": 0.7,
"max_tokens": 1024,
"stream": false
}'
Response Format
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "model": "llama3.2",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 5, "completion_tokens": 10, "total_tokens": 15}
}
Streaming Response (stream: true)
data: {"choices":[{"delta":{"role":"assistant"}}]}
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":"!"}}]}
data: {"choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Extension: models that support reasoning also emit a reasoning_content field:
data: {"choices":[{"delta":{"reasoning_content":"Let me think..."}}]}
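If you consume the stream without an SDK, each SSE `data:` line carries one JSON chunk and the stream ends with `data: [DONE]`. A minimal sketch of accumulating the deltas (function name is my own):

```python
import json

def parse_openai_sse(lines):
    """Accumulate assistant text and any reasoning_content from raw SSE data lines."""
    text, reasoning = [], []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0].get("delta", {})
        if delta.get("content"):
            text.append(delta["content"])
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
    return "".join(text), "".join(reasoning)
```

Feeding it the sample lines above yields the full reply `"Hello!"` plus any reasoning text.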
GET /v1/models
curl ${location.origin}/v1/models \
-H "Authorization: Bearer sk-your-api-key"
{
  "object": "list",
  "data": [
    {"id": "llama3.2", "object": "model", "owned_by": "ollama"},
    {"id": "kimi-k2.6:cloud", "object": "model", "owned_by": "cloud"}
  ]
}
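A client typically only needs the ids from this response; extracting them from the sample body above:

```python
# Sample /v1/models response body, as shown above.
models_response = {
    "object": "list",
    "data": [
        {"id": "llama3.2", "object": "model", "owned_by": "ollama"},
        {"id": "kimi-k2.6:cloud", "object": "model", "owned_by": "cloud"},
    ],
}
model_ids = [m["id"] for m in models_response["data"]]
```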
Python SDK Example

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="${location.origin}/v1",
)

# Non-streaming
resp = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=1024,
)
print(resp.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")
    # reasoning_content is a non-standard field; read it defensively
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="")
Anthropic Format
POST /v1/messages
Anthropic Messages API-compatible endpoint with support for streaming and extended thinking.
Request Headers
Authorization: Bearer sk-your-api-key
Content-Type: application/json
anthropic-version: 2023-06-01
Alternatively, use x-api-key: sk-your-api-key instead of the Authorization header.
Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | - | Model name |
| messages | array | Yes | - | Conversation history; each item has role and content |
| max_tokens | int | Yes | 1024 | Maximum number of tokens to generate |
| temperature | float | No | 0.5 | Temperature (0-1), controls randomness |
| top_p | float | No | - | Top-p sampling (0-1) |
| top_k | int | No | - | Top-k sampling |
| stream | bool | No | false | Whether to stream the response |
| stop_sequences | array | No | - | List of stop sequences |
| system | string | No | - | System prompt |
| thinking | object | No | - | Extended thinking config, e.g. {"type": "enabled", "budget_tokens": 10000} |
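For reference, a request body with extended thinking enabled might look like the sketch below. The model name is illustrative; note that the upstream Anthropic API requires budget_tokens to be less than max_tokens, which this endpoint presumably mirrors:

```python
# Illustrative /v1/messages body with extended thinking enabled.
payload = {
    "model": "claude-3-sonnet",
    "max_tokens": 16000,
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    "thinking": {"type": "enabled", "budget_tokens": 10000},
}
```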
Request Example
curl ${location.origin}/v1/messages \
-H "Authorization: Bearer sk-your-api-key" \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-3-sonnet",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello!"}],
"system": "You are a friendly assistant",
"temperature": 0.7
}'
Response Format
{
  "id": "msg-xxx",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-sonnet",
  "content": [{"type": "text", "text": "..."}],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 10, "output_tokens": 15}
}
Streaming Response (stream: true)
The complete SSE event sequence:
event: message_start
data: {"type":"message_start","message":{"id":"msg-xxx","role":"assistant","content":[],"usage":{"input_tokens":10}}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":15}}
event: message_stop
data: {"type":"message_stop"}
Extended Thinking (thinking_delta)
Models that support reasoning emit the thinking block first:
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"thinking","thinking":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"Let me analyze..."}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"text_delta","text":"The answer..."}}
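When consuming these events without an SDK, the key is to group deltas by their block index, since thinking and text arrive as separate content blocks. A minimal sketch (function name is my own), fed pre-parsed event dicts:

```python
def collect_anthropic_blocks(events):
    """Group thinking_delta / text_delta fragments by content block index."""
    blocks = {}
    for ev in events:
        if ev.get("type") == "content_block_start":
            # Remember each block's type (thinking or text) as it opens.
            blocks[ev["index"]] = {"type": ev["content_block"]["type"], "text": ""}
        elif ev.get("type") == "content_block_delta":
            d = ev["delta"]
            # thinking_delta carries "thinking", text_delta carries "text".
            blocks[ev["index"]]["text"] += d.get("thinking", d.get("text", ""))
    return blocks
```

Applied to the sequence above, block 0 holds the reasoning and block 1 holds the final answer.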
Python SDK Example

import anthropic

client = anthropic.Anthropic(
    api_key="sk-your-api-key",
    base_url="${location.origin}",
)

# Non-streaming
msg = client.messages.create(
    model="claude-3-sonnet",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
    system="You are a friendly assistant",
)
print(msg.content[0].text)

# Streaming
with client.messages.stream(
    model="claude-3-sonnet",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for event in stream:
        if event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="")
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="")