API
Endpoints
- Generate a completion
- Create a model
- List local models
- Copy a model
- Delete a model
- Pull a model
- Generate embeddings
Conventions
Model names
Model names follow a model:tag format. Some examples are orca-mini:3b-q4_1 and llama2:70b. The tag is optional and, if not provided, defaults to latest. The tag is used to identify a specific version.
Durations
All durations are returned in nanoseconds.
Streams
Many API responses are streams of JSON objects showing the current status. For examples of working with streams in various languages, see streaming.md
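For a concrete picture of what consuming a stream looks like, here is a minimal Python sketch that reads the /api/generate stream (documented below) line by line, where each line is one JSON object. It assumes the requests library and a server on the default local port.

import json
import requests

with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2:7b", "prompt": "Why is the sky blue?"},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Intermediate objects carry a piece of the response; the last has "done": true.
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()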
Generate a completion
POST /api/generate
Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.
Parameters
- model: (required) the model name
- prompt: the prompt to generate a response for
Advanced parameters:
- options: additional model parameters listed in the documentation for the Modelfile, such as temperature
- system: system prompt to use (overrides what is defined in the Modelfile)
- template: the full prompt or prompt template (overrides what is defined in the Modelfile)
- context: the context parameter returned from a previous request to /generate; this can be used to keep a short conversational memory (see the sketch after this list)
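To illustrate how context carries a short conversational memory across requests, here is a minimal sketch of a two-turn exchange in Python. It assumes the requests library and the default local port; the field names (prompt, context, response, done) are the ones documented in this section.

import json
import requests

def generate(prompt, context=None):
    # Call /api/generate and return (full_text, context) from the final object.
    body = {"model": "llama2:7b", "prompt": prompt}
    if context is not None:
        body["context"] = context  # carry over the previous exchange
    text = ""
    with requests.post("http://localhost:11434/api/generate", json=body, stream=True) as resp:
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            text += chunk.get("response", "")
            if chunk.get("done"):
                return text, chunk.get("context")

first_answer, ctx = generate("Why is the sky blue?")
followup_answer, ctx = generate("Explain that to a five year old.", context=ctx)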
Request
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama2:7b",
"prompt": "Why is the sky blue?"
}'
Response
A stream of JSON objects:
{
"model": "llama2:7b",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
}
The final response in the stream also includes additional data about the generation:
- total_duration: time spent generating the response
- load_duration: time spent in nanoseconds loading the model
- sample_count: number of samples generated
- sample_duration: time spent generating samples
- prompt_eval_count: number of tokens in the prompt
- prompt_eval_duration: time spent in nanoseconds evaluating the prompt
- eval_count: number of tokens in the response
- eval_duration: time in nanoseconds spent generating the response
- context: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory
To calculate how fast the response is generated in tokens per second (token/s), divide eval_count by eval_duration and multiply by 10^9, since eval_duration is in nanoseconds; a worked example follows the response below.
{
"model": "llama2:7b",
"created_at": "2023-08-04T19:22:45.499127Z",
"context": [1, 2, 3],
"done": true,
"total_duration": 5589157167,
"load_duration": 3013701500,
"sample_count": 114,
"sample_duration": 81442000,
"prompt_eval_count": 46,
"prompt_eval_duration": 1160282000,
"eval_count": 113,
"eval_duration": 1325948000
}
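As a worked check of the formula above, using eval_count and eval_duration from this example response:

eval_count = 113
eval_duration = 1_325_948_000  # nanoseconds, from the final response above
tokens_per_second = eval_count / eval_duration * 1e9
print(f"{tokens_per_second:.1f} token/s")  # ~85.2 token/s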
Create a Model
POST /api/create
Create a model from a Modelfile
Parameters
- name: name of the model to create
- path: path to the Modelfile
Request
curl -X POST http://localhost:11434/api/create -d '{
"name": "mario",
"path": "~/Modelfile"
}'
Response
A stream of JSON objects. When finished, status is success.
{
"status": "parsing modelfile"
}
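Since the create endpoint streams status objects the same way as /api/generate, a small sketch of waiting for it to finish might look like this in Python (assuming the requests library and the ~/Modelfile path from the request above):

import json
import requests

with requests.post(
    "http://localhost:11434/api/create",
    json={"name": "mario", "path": "~/Modelfile"},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line:
            status = json.loads(line)["status"]
            print(status)
            if status == "success":
                break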
List Local Models
GET /api/tags
List models that are available locally.
Request
curl http://localhost:11434/api/tags
Response
{
"models": [
{
"name": "llama2:7b",
"modified_at": "2023-08-02T17:02:23.713454393-07:00",
"size": 3791730596
},
{
"name": "llama2:13b",
"modified_at": "2023-08-08T12:08:38.093596297-07:00",
"size": 7323310500
}
]
}
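Judging by the example above (llama2:7b at 3791730596, roughly 3.8 GB on disk), size appears to be in bytes; that is an assumption in this short Python sketch, which lists local models with human-readable sizes:

import requests

models = requests.get("http://localhost:11434/api/tags").json()["models"]
for m in models:
    # size is assumed to be in bytes; shown here in gigabytes
    print(f"{m['name']}: {m['size'] / 1e9:.1f} GB")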
Copy a Model
POST /api/copy
Copy a model. Creates a model with another name from an existing model.
Request
curl http://localhost:11434/api/copy -d '{
"source": "llama2:7b",
"destination": "llama2-backup"
}'
Delete a Model
DELETE /api/delete
Delete a model and its data.
Parameters
- name: model name to delete
Request
curl -X DELETE http://localhost:11434/api/delete -d '{
"name": "llama2:13b"
}'
Pull a Model
POST /api/pull
Download a model from the model registry. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.
Parameters
- name: name of the model to pull
Request
curl -X POST http://localhost:11434/api/pull -d '{
"name": "llama2:7b"
}'
Response
{
"status": "downloading digestname",
"digest": "digestname",
"total": 2142590208
}
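A sketch of following pull progress in Python, using only the fields shown above (status, digest, total); note this assumes that not every status object in the stream necessarily carries all three:

import json
import requests

with requests.post(
    "http://localhost:11434/api/pull",
    json={"name": "llama2:7b"},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line:
            progress = json.loads(line)
            total = progress.get("total")
            print(progress["status"], f"({total} bytes total)" if total else "")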
Generate Embeddings
POST /api/embeddings
Generate embeddings from a model
Parameters
- model: name of model to generate embeddings from
- prompt: text to generate embeddings for
Advanced parameters:
- options: additional model parameters listed in the documentation for the Modelfile, such as temperature
Request
curl -X POST http://localhost:11434/api/embeddings -d '{
"model": "llama2:7b",
"prompt": "Here is an article about llamas..."
}'
Response
{
"embeddings": [
0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
]
}
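A common use of embeddings is similarity search. As a hedged sketch, the Python below embeds two texts and compares them with cosine similarity; the second prompt is made up for illustration, and the response field name follows the example above.

import math
import requests

def embed(text):
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "llama2:7b", "prompt": text},
    )
    return resp.json()["embedding"]  # field name as in the response above

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a = embed("Here is an article about llamas...")
b = embed("Llamas are members of the camelid family.")
print(cosine(a, b))  # closer to 1.0 means more similar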