ollama

Daniel Hiltgen 359b15a597 Handle models with divergent layer sizes

The recent refactoring of the memory prediction assumed all layers
are the same size, but for some models (like deepseek-coder-v2) this
is not the case, so our predictions were significantly off.

2024-06-18 11:05:34 -07:00
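
The commit message above can be illustrated with a small sketch. This is not the code from memory.go: the function names `uniformFit` and `perLayerFit` and the layer sizes are hypothetical, chosen only to show why assuming equal layer sizes can overestimate how many layers fit in a fixed memory budget when the real layers diverge in size.

```go
package main

import "fmt"

// uniformFit estimates how many layers fit in budget, assuming every layer
// is as large as the first one. This mirrors the "all layers are the same
// size" simplification in spirit only; it is a hypothetical helper.
func uniformFit(layerSizes []uint64, budget uint64) int {
	if len(layerSizes) == 0 || layerSizes[0] == 0 {
		return 0
	}
	n := int(budget / layerSizes[0])
	if n > len(layerSizes) {
		n = len(layerSizes)
	}
	return n
}

// perLayerFit walks the layers in order, accumulating each layer's actual
// size, and returns how many fit before the budget would be exceeded.
func perLayerFit(layerSizes []uint64, budget uint64) int {
	var used uint64
	for i, size := range layerSizes {
		if used+size > budget {
			return i
		}
		used += size
	}
	return len(layerSizes)
}

func main() {
	// Hypothetical sizes (arbitrary units): a small first layer followed
	// by larger ones, standing in for a model with divergent layer sizes.
	sizes := []uint64{100, 400, 400, 400}
	budget := uint64(1000)

	fmt.Println("uniform estimate:", uniformFit(sizes, budget))
	fmt.Println("per-layer estimate:", perLayerFit(sizes, budget))
}
```

With these made-up numbers the uniform estimate claims all four layers fit, while summing the actual sizes shows that only three do.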

ext_server

Fix server.cpp for the new cuda build macros

2024-06-14 14:51:40 -07:00

generate

Add back lower level parallel flags

2024-06-17 13:44:46 -07:00

llama.cpp @ 7c26775adb

llm: update llama.cpp commit to 7c26775 (#4896)

2024-06-17 15:56:16 -04:00

patches

llm: update llama.cpp commit to 7c26775 (#4896)

2024-06-17 15:56:16 -04:00

filetype.go

Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL (#4322)

2024-05-23 13:21:49 -07:00

ggla.go

simplify safetensors reading

2024-05-21 11:28:22 -07:00

ggml.go

Improve multi-gpu handling at the limit

2024-06-14 14:51:40 -07:00

gguf.go

Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"

2024-06-11 15:56:17 -07:00

llm_darwin_amd64.go

Switch back to subprocessing for llama.cpp

2024-04-01 16:48:18 -07:00

llm_darwin_arm64.go

Switch back to subprocessing for llama.cpp

2024-04-01 16:48:18 -07:00

llm_linux.go

Switch back to subprocessing for llama.cpp

2024-04-01 16:48:18 -07:00

llm_windows.go

Move nested payloads to installer and zip file on windows

2024-04-23 16:14:47 -07:00

llm.go

revert tokenize ffi (#4761)

2024-05-31 18:54:21 -07:00

memory_test.go

review comments and coverage

2024-06-14 14:55:50 -07:00

memory.go

Handle models with divergent layer sizes

2024-06-18 11:05:34 -07:00

payload.go

review comments and coverage

2024-06-14 14:55:50 -07:00

server.go

Tighten up memory prediction logging

2024-06-18 09:15:35 -07:00

status.go

Switch back to subprocessing for llama.cpp

2024-04-01 16:48:18 -07:00