History

Jesse Gross 8db94469e0 runner.go: Support GGUF LoRAs

The current CGo bindings for loading LoRAs only support the older
GGLA file format, which is no longer supported. This switches to
functions that load the newer GGUF LoRAs.

2024-09-03 21:15:14 -04:00

README.md

fix issues with runner

2024-09-03 21:15:13 -04:00

runner.go

runner.go: Support GGUF LoRAs

2024-09-03 21:15:14 -04:00

stop_test.go

cleanup stop code

2024-09-03 21:15:13 -04:00

stop.go

cleanup stop code

2024-09-03 21:15:13 -04:00

README.md

`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embeddings
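
The same two requests can also be sent from Go. The following is a minimal sketch, not part of the runner itself: `promptRequest` and `newPromptRequest` are hypothetical names introduced here, and the base URL is taken from the curl examples above. The response format is not documented in this README, so the sketch copies the raw response body to stdout instead of decoding it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// promptRequest mirrors the JSON body used in the curl examples.
// (Hypothetical type; the runner's own request struct may differ.)
type promptRequest struct {
	Prompt string `json:"prompt"`
}

// newPromptRequest builds a POST request with a {"prompt": ...} body
// for the given endpoint path ("/completion" or "/embeddings").
func newPromptRequest(baseURL, path, prompt string) (*http.Request, error) {
	body, err := json.Marshal(promptRequest{Prompt: prompt})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+path, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// run sends one prompt to the completion endpoint and prints the raw reply.
func run() error {
	// Assumes the runner is listening on localhost:8080, as in the curl examples.
	req, err := newPromptRequest("http://localhost:8080", "/completion", "hi")
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	// The response schema is not specified here, so pass it through unchanged.
	_, err = io.Copy(os.Stdout, resp.Body)
	return err
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

To request an embedding instead, pass `"/embeddings"` as the path and the text to embed as the prompt.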

TODO

Parallelization
More tests