ollama/llm
Latest commit: a50a87a7b8 by Jeffrey Morgan, 2024-05-30 16:58:01 -07:00
partial offloading: allow flash attention and disable mmap (#4734)

* partial offloading: allow flash attention and disable mmap
* allow mmap with num_gpu=0
Name                    Last commit                                                          Date
ext_server              rm unused infill                                                     2024-05-29 11:26:47 -07:00
generate                Merge pull request #3278 from zhewang1-intc/rebase_ollama_main      2024-05-28 16:30:50 -07:00
llama.cpp @ 5921b8f089  Update llama.cpp submodule to 5921b8f0 (#4731)                       2024-05-30 16:20:22 -07:00
patches                 Update llama.cpp submodule to 5921b8f0 (#4731)                       2024-05-30 16:20:22 -07:00
filetype.go             Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL (#4322)          2024-05-23 13:21:49 -07:00
ggla.go                 simplify safetensors reading                                         2024-05-21 11:28:22 -07:00
ggml.go                 Update llm/ggml.go                                                   2024-05-24 16:10:43 -07:00
gguf.go                 simplify safetensors reading                                         2024-05-21 11:28:22 -07:00
llm_darwin_amd64.go     …
llm_darwin_arm64.go     …
llm_linux.go            …
llm_windows.go          Move nested payloads to installer and zip file on windows           2024-04-23 16:14:47 -07:00
llm.go                  use ffi for tokenizing/detokenizing                                  2024-05-29 11:26:47 -07:00
memory.go               Move envconfig and consolidate env vars (#4608)                      2024-05-24 14:57:15 -07:00
payload.go              Move nested payloads to installer and zip file on windows           2024-04-23 16:14:47 -07:00
server.go               partial offloading: allow flash attention and disable mmap (#4734)  2024-05-30 16:58:01 -07:00
status.go               …