ollama

Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261)

When CUDA peer access is enabled, multi-GPU inference produces
garbage output. This is a known bug in llama.cpp (or possibly in NVIDIA's
software). Until the bug is fixed upstream, we disable CUDA peer access
as a temporary workaround to ensure correct output.

See #961.

2023-11-24 14:05:57 -05:00

ggml @ 9e232f0234

subprocess llama.cpp server (#401)

2023-08-30 16:35:03 -04:00

gguf @ 0b871f1a04

update llama.cpp

2023-11-21 09:50:02 -08:00

patches

update llama.cpp

2023-11-21 09:50:02 -08:00

generate_darwin_amd64.go

consistent cpu instructions on macos and linux

2023-11-22 16:26:46 -05:00

generate_darwin_arm64.go

update llama.cpp

2023-11-21 09:50:02 -08:00

generate_linux.go

Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261)

2023-11-24 14:05:57 -05:00

generate_windows.go

restore building runner with AVX on by default (#900)

2023-10-27 12:13:44 -07:00