This commit aims to provide the Ollama maintainers with maximum control
of the distribution build process by creating a cross-platform shim.
Currently, we have no flexibility, or control of the process (pre and
post) or even the quality of the build.
By introducing a shim, and propagating it out to Homebrew, et al., we
can soon after ensure that the build process is consistent, and
reliable.
This also happens to remove the requirement for go generate and the
build tag hacks, but it does still support go generate in the flow, at
least until we can remove it after the major distribution use the new
build process.
About the script:
Beyond giving the Ollama maintainers drastically more control over the
build process, the script also provides a few other benefits:
- It is cross-platform, and can be run on any platform that supports Go
(a hard requirement for building Ollama anyway).
- It can can check for correct versions of cmake, and other dependencies
before starting the build process, and provide helpful error messages
to the user if they are not met.
- It can be used to build the distribution for any platform,
architecture, or build type (debug, release, etc.) with a single
command. Currently, it is two commands.
- It can skip parts of the build process if they are already done, such
as build the C dependencies. Of course there is a -f flag to force
rebuild.
- So much more!
We update the PATH on windows to get the CLI mapped, but this has
an unintended side effect of causing other apps that may use our bundled
DLLs to get terminated when we upgrade.
Now that the llm runner is an executable and not just a dll, more users are facing
problems with security policy configurations on windows that prevent users
writing to directories and then executing binaries from the same location.
This change removes payloads from the main executable on windows and shifts them
over to be packaged in the installer and discovered based on the executables location.
This also adds a new zip file for people who want to "roll their own" installation model.
This commit introduces a more friendly way to build Ollama dependencies
and the binary without abusing `go generate` and removing the
unnecessary extra steps it brings with it.
This script also provides nicer feedback to the user about what is
happening during the build process.
At the end, it prints a helpful message to the user about what to do
next (e.g. run the new local Ollama).
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems. This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already
idempotenent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.
We now build only a single ROCm version (latest major) on both windows
and linux. Given the large size of ROCms tensor files, we split the
dependency out. It's bundled into the installer on windows, and a
separate download on windows. The linux install script is now smart and
detects the presence of AMD GPUs and looks to see if rocm v6 is already
present, and if not, then downloads our dependency tar file.
For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.
Even though we weren't setting it to on, somewhere in the cmake config
it was getting toggled on. By explicitly setting it to off, we get `/arch:AVX`
as intended.