ollama

Daniel Hiltgen 7784ca33ce Tighten up memory prediction logging

Prior to this change, we logged the memory prediction multiple times
as the scheduler iterated to find a suitable configuration, which could be
confusing since only the last log before the server starts is actually valid.
The prediction is now logged once, just before starting the server on the final configuration.
The log also reports which library is in use instead of always saying "offloading to gpu",
even when using CPU.

2024-06-18 09:15:35 -07:00

ext_server

Fix server.cpp for the new cuda build macros

2024-06-14 14:51:40 -07:00

generate

Add back lower level parallel flags

2024-06-17 13:44:46 -07:00

llama.cpp @ 7c26775adb

llm: update llama.cpp commit to 7c26775 (#4896)

2024-06-17 15:56:16 -04:00

patches

llm: update llama.cpp commit to 7c26775 (#4896)