# runner

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

```
./runner -model <model binary>
```

If the number of input tokens exceeds the batch size, the input is split across multiple batches, each containing the next set of tokens, so the full prompt is processed as expected.
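For orientation, here is a minimal sketch of that chunking behavior. The names (`splitIntoBatches`, plain integer token IDs) are illustrative, not the runner's actual internals:

```go
package main

import "fmt"

// splitIntoBatches splits a token sequence into batch-size chunks so that
// each submitted batch carries the next slice of the prompt.
func splitIntoBatches(tokens []int, batchSize int) [][]int {
	var batches [][]int
	for start := 0; start < len(tokens); start += batchSize {
		end := start + batchSize
		if end > len(tokens) {
			end = len(tokens)
		}
		batches = append(batches, tokens[start:end])
	}
	return batches
}

func main() {
	tokens := []int{1, 2, 3, 4, 5, 6, 7}
	// With a batch size of 3, the prompt is processed as [1 2 3], [4 5 6], [7].
	for i, b := range splitIntoBatches(tokens, 3) {
		fmt.Printf("batch %d: %v\n", i, b)
	}
}
```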
## Completion

```
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion
```
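As a rough sketch, a handler for this endpoint could be wired up with Go's `net/http` as below. Only the `prompt` request field comes from the curl example above; the `content` response field and the `generate` helper are assumptions, not the runner's actual implementation:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

type completionRequest struct {
	Prompt string `json:"prompt"`
}

func completionHandler(w http.ResponseWriter, r *http.Request) {
	// Decode the JSON body, e.g. {"prompt": "hi"}.
	var req completionRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// generate is a hypothetical stand-in for running inference on the model.
	out := generate(req.Prompt)
	json.NewEncoder(w).Encode(map[string]string{"content": out})
}

// Placeholder; the real runner would run the loaded model here.
func generate(prompt string) string { return "..." }

func main() {
	http.HandleFunc("/completion", completionHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```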
## Embeddings

```
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embeddings
```
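The same request can be made from Go. In this small client sketch the response body is printed raw, since its exact JSON shape is not documented here:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Request body matching the curl example above.
	body := []byte(`{"prompt": "turn me into an embedding"}`)
	resp, err := http.Post("http://localhost:8080/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Print the raw JSON response.
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```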
## TODO

- Parallelization
- More tests