Jesse Gross fcbf5f5e51 runner.go: Use stable llama.cpp sampling interface
Currently for sampling we are using an internal interface for the
llama.cpp examples, which tends to change from release to release.
This is the only such interface used for text models, though llava
and clip are also used for image processing.

This switches to use the stable interfaces, reducing the amount of
work needed for future llama.cpp bumps. It also significantly
reduces the amount of code that we need to vendor (much of it is
unused and only pulled in as a dependency).

The sampling logic is the same as it is now for the parameters that
we support and is done at the CGo layer. However, in the future if
there are benefits to reconfiguring it then we can expose the
primitives to native Go code.
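
As a rough illustration of what exposing sampling primitives to native Go
could look like, here is a minimal pure-Go sketch of composable sampling
stages. It is not the code in this commit and does not call llama.cpp; the
names Candidate, Stage, TopK, Temperature, Chain and Greedy are all
hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate pairs a token ID with its logit (hypothetical type).
type Candidate struct {
	ID    int
	Logit float64
}

// Stage transforms a candidate list and returns the result (hypothetical).
type Stage func([]Candidate) []Candidate

// TopK keeps the k candidates with the highest logits.
// It works on a copy so the caller's slice is not reordered.
func TopK(k int) Stage {
	return func(in []Candidate) []Candidate {
		out := append([]Candidate(nil), in...)
		sort.SliceStable(out, func(i, j int) bool { return out[i].Logit > out[j].Logit })
		if k > 0 && k < len(out) {
			out = out[:k]
		}
		return out
	}
}

// Temperature divides every logit by t. This sketch assumes t > 0.
func Temperature(t float64) Stage {
	return func(in []Candidate) []Candidate {
		out := append([]Candidate(nil), in...)
		for i := range out {
			out[i].Logit /= t
		}
		return out
	}
}

// Chain applies the given stages in order.
func Chain(stages ...Stage) Stage {
	return func(c []Candidate) []Candidate {
		for _, s := range stages {
			c = s(c)
		}
		return c
	}
}

// Greedy returns the ID of the highest-logit candidate, or -1 if empty.
func Greedy(c []Candidate) int {
	if len(c) == 0 {
		return -1
	}
	best := c[0]
	for _, cand := range c[1:] {
		if cand.Logit > best.Logit {
			best = cand
		}
	}
	return best.ID
}

func main() {
	cands := []Candidate{{ID: 0, Logit: 1.0}, {ID: 1, Logit: 3.0}, {ID: 2, Logit: 2.0}}
	pipeline := Chain(TopK(2), Temperature(0.5))
	fmt.Println(Greedy(pipeline(cands))) // prints 1
}
```

Building the pipeline from small stages is one way the order and set of
samplers could be reconfigured from Go without touching the CGo layer.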
2024-11-04 14:14:41 -08:00