Matt Williams e2389b63aa add examples of streaming in python and node
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-09-14 07:12:09 -07:00


API

Endpoints

Conventions

Model names

Model names follow a model:tag format. Some examples are orca-mini:3b-q4_1 and llama2:70b. The tag is optional; if not provided, it defaults to latest. The tag identifies a specific version of the model.

Durations

All durations are returned in nanoseconds.

Streams

Many API responses are streams of JSON objects showing the current status. For examples of working with streams in various languages, see streaming.md.

Generate a completion

POST /api/generate

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so the reply is a series of responses. The final response object includes statistics and additional data from the request.

Parameters

  • model: (required) the model name
  • prompt: the prompt to generate a response for

Advanced parameters:

  • options: additional model parameters listed in the documentation for the Modelfile such as temperature
  • system: system prompt to use (overrides what is defined in the Modelfile)
  • template: the full prompt or prompt template (overrides what is defined in the Modelfile)
  • context: the context parameter returned from a previous request to /generate; this can be used to keep a short conversational memory

Request

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Why is the sky blue?"
}'

Response

A stream of JSON objects:

{
  "model": "llama2:7b",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "response": "The",
  "done": false
}
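The streamed body is newline-delimited JSON, so a client can parse it line by line and concatenate the partial response fields. A minimal sketch (the function name is illustrative; with a real server the input would be the lines of the HTTP response body, e.g. from a streaming HTTP client):

```python
import json

def collect_response(lines):
    """Concatenate the partial "response" fields from a stream of
    newline-delimited JSON objects, stopping at the object with
    "done": true."""
    parts = []
    for line in lines:
        obj = json.loads(line)
        if obj.get("response"):
            parts.append(obj["response"])
        if obj.get("done"):
            break
    return "".join(parts)
```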

The final response in the stream also includes additional data about the generation:

  • total_duration: time spent in nanoseconds generating the response
  • load_duration: time spent in nanoseconds loading the model
  • sample_count: number of samples generated
  • sample_duration: time spent in nanoseconds generating samples
  • prompt_eval_count: number of tokens in the prompt
  • prompt_eval_duration: time spent in nanoseconds evaluating the prompt
  • eval_count: number of tokens in the response
  • eval_duration: time spent in nanoseconds generating the response
  • context: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory

To calculate how fast the response is generated in tokens per second (token/s), divide eval_count by eval_duration and multiply by 10^9 (durations are in nanoseconds).

{
  "model": "llama2:7b",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "context": [1, 2, 3],
  "done": true,
  "total_duration": 5589157167,
  "load_duration": 3013701500,
  "sample_count": 114,
  "sample_duration": 81442000,
  "prompt_eval_count": 46,
  "prompt_eval_duration": 1160282000,
  "eval_count": 113,
  "eval_duration": 1325948000
}
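Using the eval_count and eval_duration values from the final response above, the calculation works out as:

```python
# Values taken from the example final response in this document.
eval_count = 113            # tokens generated
eval_duration = 1325948000  # nanoseconds spent generating them

# Durations are in nanoseconds, so scale by 1e9 to get tokens per second.
tokens_per_second = eval_count / eval_duration * 1e9
print(round(tokens_per_second, 1))
```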

Create a Model

POST /api/create

Create a model from a Modelfile

Parameters

  • name: name of the model to create
  • path: path to the Modelfile

Request

curl -X POST http://localhost:11434/api/create -d '{
  "name": "mario",
  "path": "~/Modelfile"
}'

Response

A stream of JSON objects. When finished, status is success.

{
  "status": "parsing modelfile"
}
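A client can drain these status objects until the final success arrives. A minimal sketch, assuming one JSON object per line of the response body (the function name is illustrative):

```python
import json

def wait_for_success(lines):
    """Read streamed status objects until the final "success" status,
    returning the list of statuses seen along the way."""
    statuses = []
    for line in lines:
        status = json.loads(line)["status"]
        statuses.append(status)
        if status == "success":
            break
    return statuses
```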

List Local Models

GET /api/tags

List models that are available locally.

Request

curl http://localhost:11434/api/tags

Response

{
  "models": [
    {
      "name": "llama2:7b",
      "modified_at": "2023-08-02T17:02:23.713454393-07:00",
      "size": 3791730596
    },
    {
      "name": "llama2:13b",
      "modified_at": "2023-08-08T12:08:38.093596297-07:00",
      "size": 7323310500
    }
  ]
}
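Since this is a plain GET returning a single JSON object, a few lines of standard-library Python suffice. A sketch assuming an Ollama server on the default port (list_models and model_names are hypothetical helper names, not part of the API):

```python
import json
import urllib.request

def model_names(payload):
    """Extract the model names from a parsed /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]

def list_models(base_url="http://localhost:11434"):
    # Assumes a local Ollama server on the default port.
    with urllib.request.urlopen(base_url + "/api/tags") as resp:
        return model_names(json.load(resp))
```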

Copy a Model

POST /api/copy

Copy a model. Creates a model with another name from an existing model.

Request

curl http://localhost:11434/api/copy -d '{
  "source": "llama2:7b",
  "destination": "llama2-backup"
}'

Delete a Model

DELETE /api/delete

Delete a model and its data.

Parameters

  • name: name of the model to delete

Request

curl -X DELETE http://localhost:11434/api/delete -d '{
  "name": "llama2:13b"
}'

Pull a Model

POST /api/pull

Download a model from the model registry. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.

Parameters

  • name: name of the model to pull

Request

curl -X POST http://localhost:11434/api/pull -d '{
  "name": "llama2:7b"
}'

Response

{
  "status": "downloading digestname",
  "digest": "digestname",
  "total": 2142590208
}
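The total field is a raw byte count, so a client displaying pull progress will usually want to format it. A small display helper, not part of the API (the function name is illustrative):

```python
def human_size(n):
    """Format a byte count like the "total" field above for display."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024 or unit == "GiB":
            return "%.1f %s" % (n, unit)
        n /= 1024
```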

Generate Embeddings

POST /api/embeddings

Generate embeddings from a model

Parameters

  • model: name of model to generate embeddings from
  • prompt: text to generate embeddings for

Advanced parameters:

  • options: additional model parameters listed in the documentation for the Modelfile such as temperature

Request

curl -X POST http://localhost:11434/api/embeddings -d '{
  "model": "llama2:7b",
  "prompt": "Here is an article about llamas..."
}'

Response

{
  "embeddings": [
    0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
    0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
  ]
}
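Embedding vectors like the one above are typically compared with cosine similarity; two texts with similar meaning should produce vectors with a similarity close to 1. A minimal sketch, not part of the API:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```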