ollama/llm

Latest commit: a897e833b8 by Bruce MacDonald, 2024-01-16 13:48:05 -05:00
    do not cache prompt (#2018)
    - prompt cache causes inference to hang after some time
Name                     Last commit                                                               Date
ext_server               Disable mmap with lora layers (#1985)                                     2024-01-13 23:36:31 -05:00
generate                 Merge pull request #1966 from fpreiss/fpreiss/gen_linux_cuda_detection   2024-01-14 18:00:11 -08:00
llama.cpp @ 328b83de23   revert submodule back to 328b83de23b33240e28f4e74900d1d06726f5eb1        2024-01-10 18:42:39 -05:00
dyn_ext_server.c         Always dynamically load the llm server library                           2024-01-11 08:42:47 -08:00
dyn_ext_server.go        do not cache prompt (#2018)                                              2024-01-16 13:48:05 -05:00
dyn_ext_server.h         Always dynamically load the llm server library                           2024-01-11 08:42:47 -08:00
ggml.go                  add max context length check                                             2024-01-12 14:54:07 -08:00
gguf.go                  add max context length check                                             2024-01-12 14:54:07 -08:00
llama.go                 remove unused fields and functions                                       2024-01-09 09:37:40 -08:00
llm.go                   add max context length check                                             2024-01-12 14:54:07 -08:00
payload_common.go        Merge pull request #1935 from dhiltgen/cpu_fallback                      2024-01-11 15:52:32 -08:00
payload_darwin.go        Always dynamically load the llm server library                           2024-01-11 08:42:47 -08:00
payload_linux.go         Always dynamically load the llm server library                           2024-01-11 08:42:47 -08:00
payload_test.go          Fix up the CPU fallback selection                                        2024-01-11 15:27:06 -08:00
payload_windows.go       Always dynamically load the llm server library                           2024-01-11 08:42:47 -08:00
utils.go                 partial decode ggml bin for more info                                    2023-08-10 09:23:10 -07:00
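The commit messages above hint at the mechanics of this directory; the sketches below illustrate a few of them under stated assumptions. The head commit, "do not cache prompt (#2018)", stops setting a prompt-cache option in dyn_ext_server.go because a cached prompt could cause inference to hang over time. A minimal sketch of what that amounts to, assuming llama.cpp-server-style field names; predictRequest and its JSON tags here are illustrative, not ollama's actual types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sketch of the completion request body sent to the embedded llama.cpp
// server. The #2018 fix amounts to no longer setting a cache_prompt
// option, so the server re-evaluates the full prompt on every call
// instead of reusing a cached prefix that could wedge inference.
type predictRequest struct {
	Prompt      string  `json:"prompt"`
	Temperature float32 `json:"temperature"`
	NPredict    int     `json:"n_predict"`
	// A cache_prompt field used to be set to true here; after #2018
	// it is omitted entirely. (Field names are assumptions.)
}

func main() {
	body, _ := json.Marshal(predictRequest{
		Prompt:      "Why is the sky blue?",
		Temperature: 0.8,
		NPredict:    128,
	})
	fmt.Println(string(body))
}
```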
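Several entries (dyn_ext_server.c/.go/.h and the payload_*.go files) share the message "Always dynamically load the llm server library": the GPU/CPU-variant server code ships as shared libraries and the matching one is loaded at runtime. A sketch of the idea, assuming a POSIX dlopen-based loader on Linux; the library path and function names are hypothetical:

```go
package main

/*
#cgo linux LDFLAGS: -ldl
#include <dlfcn.h>
#include <stdlib.h>
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// loadServerLib opens a shared library at runtime, mirroring the
// "always dynamically load" approach: pick the variant library that
// matches the detected hardware, then resolve its entry points.
func loadServerLib(path string) (unsafe.Pointer, error) {
	cPath := C.CString(path)
	defer C.free(unsafe.Pointer(cPath))

	handle := C.dlopen(cPath, C.RTLD_NOW)
	if handle == nil {
		return nil, fmt.Errorf("dlopen %s: %s", path, C.GoString(C.dlerror()))
	}
	return handle, nil
}

func main() {
	h, err := loadServerLib("/tmp/libext_server.so") // hypothetical variant library
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	defer C.dlclose(h)
	fmt.Println("server library loaded")
}
```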
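ggml.go, gguf.go, and llm.go all carry "add max context length check". A sketch of such a check, assuming the model's trained context length is available from GGML/GGUF metadata; the constant and function names are illustrative, not ollama's actual API:

```go
package main

import "fmt"

// Assumed trained context length, as read from model metadata.
const maxSupportedCtx = 4096

// checkContextLength validates a requested context size against the
// model's maximum, clamping rather than failing hard. An alternative
// design would return an error instead of clamping.
func checkContextLength(requested int) (int, error) {
	if requested <= 0 {
		return 0, fmt.Errorf("context length must be positive, got %d", requested)
	}
	if requested > maxSupportedCtx {
		fmt.Printf("warning: requested num_ctx %d exceeds model maximum %d, clamping\n",
			requested, maxSupportedCtx)
		return maxSupportedCtx, nil
	}
	return requested, nil
}

func main() {
	ctx, _ := checkContextLength(8192)
	fmt.Println("using context length:", ctx)
}
```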
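Finally, utils.go's "partial decode ggml bin for more info" points at header-only decoding: reading just the start of a model file to learn its format without loading tensor data. A sketch that inspects only the leading magic (plus the version for GGUF), assuming little-endian files; the file path is hypothetical:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

// sniffModel partially decodes a model file: it reads the 4-byte magic
// (and, for GGUF, the version) and stops there, leaving tensors unread.
func sniffModel(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	var magic uint32
	if err := binary.Read(f, binary.LittleEndian, &magic); err != nil {
		return err
	}
	switch magic {
	case 0x46554747: // "GGUF" read as a little-endian uint32
		var version uint32
		if err := binary.Read(f, binary.LittleEndian, &version); err != nil {
			return err
		}
		fmt.Println("GGUF model, version", version)
	case 0x67676d6c: // legacy "ggml" magic
		fmt.Println("legacy GGML model")
	default:
		fmt.Printf("unknown magic 0x%08x\n", magic)
	}
	return nil
}

func main() {
	if err := sniffModel("model.bin"); err != nil { // hypothetical path
		fmt.Println(err)
	}
}
```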