diff --git a/docs/api.md b/docs/api.md
index 4a8e2f17..486b54c0 100644
--- a/docs/api.md
+++ b/docs/api.md
@@ -20,6 +20,10 @@ Model names follow a `model:tag` format. Some examples are `orca-mini:3b-q4_1` a
 
 All durations are returned in nanoseconds.
 
+### Streams
+
+Many API responses are streamed as a series of JSON objects, each describing the current status. For examples of consuming these streams in various languages, see [streaming.md](./streaming.md).
+
 ## Generate a completion
 
 ```
diff --git a/docs/streaming.md b/docs/streaming.md
new file mode 100644
index 00000000..2b627709
--- /dev/null
+++ b/docs/streaming.md
@@ -0,0 +1,39 @@
+# Streaming responses in the Ollama Client API
+
+## JavaScript / TypeScript / Deno
+
+```javascript
+const pull = async () => {
+  const response = await fetch("http://localhost:11434/api/pull", {
+    method: "POST",
+    body: JSON.stringify({ name: "llama2:7b-q5_0" }),
+  });
+
+  // pipeThrough() is synchronous; the decoded stream is itself async-iterable
+  const stream = response.body?.pipeThrough(new TextDecoderStream());
+  if (!stream) throw new Error("No response body");
+  for await (const chunk of stream) {
+    // one chunk may carry several newline-delimited JSON objects
+    for (const line of chunk.split("\n").filter((l) => l.trim())) {
+      const out = JSON.parse(line);
+      if (out.status.startsWith("downloading")) {
+        console.log(`${out.status} - ${(out.completed / out.total) * 100}%`);
+      }
+    }
+  }
+};
+
+pull();
+```
+
+## Python
+
+```python
+import requests
+import json
+response = requests.post("http://localhost:11434/api/pull", json={"name": "llama2:7b-q5_0"}, stream=True)
+for data in response.iter_lines():
+    out = json.loads(data)
+    if "completed" in out:
+        print(out["completed"] / out["total"] * 100)
+```
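The pull examples in this diff track download progress, but the same newline-delimited JSON shape applies to other streaming endpoints such as `/api/generate`, whose chunks carry a `response` text fragment and a final object with `"done": true`. A minimal sketch of accumulating such a stream, with a hypothetical `parse_stream` helper and simulated chunks in place of a live server:

```python
import json

def parse_stream(lines):
    # Each line is one JSON object; generate-style responses carry a
    # "response" token fragment and end with an object where "done" is true.
    text = []
    for line in lines:
        if not line:  # iter_lines() can yield keep-alive empty lines
            continue
        out = json.loads(line)
        if out.get("done"):
            break
        text.append(out.get("response", ""))
    return "".join(text)

# Simulated chunks in the shape a streaming endpoint sends back:
chunks = [
    b'{"response": "Hello", "done": false}',
    b'{"response": " world", "done": false}',
    b'{"done": true}',
]
print(parse_stream(chunks))  # Hello world
```

With a real request, the `chunks` list would be replaced by `response.iter_lines()` on a `requests.post(..., stream=True)` response, as in the Python example above.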