Jesse Gross 312d9de1d1 llama: Improve error handling

Check for NULL return values from llama.cpp in more places and
convert them into Go errors, which should make debugging easier
in the future rather than having hidden surprises in our data
structures.

2024-11-02 13:37:55 -07:00

| File | Last commit | Date |
|---|---|---|
| cache_test.go | runner.go: Better abstract vision model integration | 2024-10-30 14:53:43 -07:00 |
| cache.go | runner.go: Better abstract vision model integration | 2024-10-30 14:53:43 -07:00 |
| image_test.go | runner.go: Better abstract vision model integration | 2024-10-30 14:53:43 -07:00 |
| image.go | llama: Improve error handling | 2024-11-02 13:37:55 -07:00 |
| README.md | Re-introduce the llama package (#5034) | 2024-10-08 08:53:54 -07:00 |
| requirements.go | Re-introduce the llama package (#5034) | 2024-10-08 08:53:54 -07:00 |
| runner.go | llama: Improve error handling | 2024-11-02 13:37:55 -07:00 |
| stop_test.go | runner.go: Handle truncation of tokens for stop sequences | 2024-10-09 20:39:04 -07:00 |
| stop.go | runner.go: Handle truncation of tokens for stop sequences | 2024-10-09 20:39:04 -07:00 |

README.md

`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embeddings