2334 Commits

Author SHA1 Message Date
Blake Mizerany
d54e0fb3b2 ... 2024-04-03 16:14:22 -07:00
Blake Mizerany
bdd05e0ae0 x/registry: skip ref test 2024-04-03 15:59:23 -07:00
Blake Mizerany
1a346640db x/registry: work on getting basic test passing 2024-04-03 15:58:04 -07:00
Blake Mizerany
f5883070f8 x/registry: upload smoke test passing 2024-04-03 14:30:58 -07:00
Blake Mizerany
adc23d5f96 Add 'x/' from commit 'a10a11b9d371f36b7c3510da32a1d70b74e27bd1'
git-subtree-dir: x
git-subtree-mainline: 7d05a6ee8f44b314fa697a427439e5fa4d78c3d7
git-subtree-split: a10a11b9d371f36b7c3510da32a1d70b74e27bd1
2024-04-03 10:40:23 -07:00
Blake Mizerany
a10a11b9d3 registry: initial work on multipart pushes 2024-04-03 10:39:30 -07:00
Blake Mizerany
7d05a6ee8f
cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)
This also moves the checkServerHeartbeat call out of the "RunE" Cobra
stuff (that's the only word I have for that) to on-site where it's after
the check for OLLAMA_MODELS, which allows the helpful error message to
be printed before the server heartbeat check. This also arguably makes
the code more readable without the magic/superfluous "pre" function
caller.
2024-04-02 22:11:13 -07:00
Daniel Hiltgen
464d817824
Merge pull request #3464 from dhiltgen/subprocess
Fix numgpu opt miscomparison
2024-04-02 20:10:17 -07:00
Pier Francesco Contino
531324a9be
feat: add OLLAMA_DEBUG in ollama server help message (#3461)
Co-authored-by: Pier Francesco Contino <pfcontino@gmail.com>
2024-04-02 18:20:03 -07:00
Daniel Hiltgen
6589eb8a8c Revert options as a ref in the server 2024-04-02 16:44:10 -07:00
Michael Yang
a039e383cd
Merge pull request #3465 from ollama/mxyng/fix-metal
fix metal gpu
2024-04-02 16:29:58 -07:00
Michael Yang
80163ebcb5 fix metal gpu 2024-04-02 16:06:45 -07:00
Daniel Hiltgen
a57818d93e
Merge pull request #3343 from dhiltgen/bump_more2
Bump llama.cpp to b2581
2024-04-02 15:08:26 -07:00
Blake Mizerany
94befe366a ... 2024-04-02 14:28:06 -07:00
Blake Mizerany
c95f97689b utils/upload: init 2024-04-02 14:15:21 -07:00
Blake Mizerany
618eb5b909 registry: multipart push 2024-04-02 13:40:23 -07:00
Daniel Hiltgen
841adda157 Fix windows lint CI flakiness 2024-04-02 12:22:16 -07:00
Daniel Hiltgen
0035e31af8 Bump to b2581 2024-04-02 11:53:07 -07:00
Blake Mizerany
eb75418be9 build/blob: test ParseRef round-trip 2024-04-02 11:45:01 -07:00
Blake Mizerany
9959da05de build/blob: break out test refs for other tests/fuzzing 2024-04-02 11:38:10 -07:00
Daniel Hiltgen
c863c6a96d
Merge pull request #3218 from dhiltgen/subprocess
Switch back to subprocessing for llama.cpp
2024-04-02 10:49:44 -07:00
Blake Mizerany
aff7970628 build: remove superfluous parseCompleteRef 2024-04-01 23:41:42 -07:00
Blake Mizerany
628f1feb36 build: back to taking manifests as []byte
Its nicer to have the manifests be an opaque []byte, rather than a
struct. This way users of the build package don't need to know about the
internal structure of the manifests. The registry can interpret the
manifests as it sees fit, while letting build keep its own Go type of
manifest which is easier to work with in the build package.
2024-04-01 23:18:58 -07:00
Blake Mizerany
ce3125afd5 registry: add New and take a minio client as argument 2024-04-01 22:53:49 -07:00
Blake Mizerany
f488652ba7 build: make Build accept only refs without builds 2024-04-01 22:12:43 -07:00
Blake Mizerany
2318ed2919 build: remove unused manifest() 2024-04-01 21:59:38 -07:00
Blake Mizerany
b1b8be33d9 build: cleanup error names and other things 2024-04-01 21:57:34 -07:00
Blake Mizerany
876f7eab81 build: move Manifest from internal/blobstore to build
It was getting confusing to have the arbirary handling of manifests in
the blobstore. It also prevented us from using model.Ref in the
blobstore because of cyclic dependencies.

This is much easier to grok now.
2024-04-01 21:43:30 -07:00
Blake Mizerany
7cfc8a0838 build/blob: fix awkward Ref type 2024-04-01 21:25:18 -07:00
Daniel Hiltgen
1f11b52511 Refined min memory from testing 2024-04-01 16:48:33 -07:00
Daniel Hiltgen
526d4eb204 Release gpu discovery library after use
Leaving the cudart library loaded kept ~30m of memory
pinned in the GPU in the main process.  This change ensures
we don't hold GPU resources when idle.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
0a74cb31d5 Safeguard for noexec
We may have users that run into problems with our current
payload model, so this gives us an escape valve.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
10ed1b6292 Detect too-old cuda driver
"cudart init failure: 35" isn't particularly helpful in the logs.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
4fec5816d6 Integration test improvements
Cleaner shutdown logic, a bit of response hardening
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
0a0e9f3e0f Apply 01-cache.diff 2024-04-01 16:48:18 -07:00
Daniel Hiltgen
58d95cc9bd Switch back to subprocessing for llama.cpp
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems.  This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Patrick Devine
3b6a9154dd
Simplify model conversion (#3422) 2024-04-01 16:14:53 -07:00
Michael Yang
d6dd2ff839
Merge pull request #3241 from ollama/mxyng/mem
update memory estimations for gpu offloading
2024-04-01 13:59:14 -07:00
Michael Yang
e57a6ba89f
Merge pull request #2926 from ollama/mxyng/decode-ggml-v2
refactor model parsing
2024-04-01 13:58:13 -07:00
Michael Yang
12ec2346ef
Merge pull request #3442 from ollama/mxyng/generate-output
fix generate output
2024-04-01 13:56:09 -07:00
Michael Yang
1ec0df1069 fix generate output 2024-04-01 13:47:34 -07:00
Michael Yang
91b3e4d282 update memory calcualtions
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang
d338d70492 refactor model parsing 2024-04-01 13:16:15 -07:00
Philipp Gillé
011bb67351
Add chromem-go to community integrations (#3437) 2024-04-01 11:17:37 -04:00
Saifeddine ALOUI
d124627202
Update README.md (#3436) 2024-04-01 11:16:31 -04:00
Jesse Zhang
b0a8246a69
Community Integration: CRAG Ollama Chat (#3423)
Corrective Retrieval Augmented Generation Demo, powered by Langgraph and Streamlit 🤗

Support: 
- Ollama
- OpenAI APIs
2024-04-01 11:16:14 -04:00
Blake Mizerany
fd411b3cf6 registry: commit Manifest 2024-03-31 18:20:19 -07:00
Blake Mizerany
04f38cf3f4 registry: commit manifest on successful /v1/push 2024-03-31 15:09:24 -07:00
Blake Mizerany
c0eddb10fd registry: use exact match on path 2024-03-31 15:01:26 -07:00
Blake Mizerany
60ef0e6b4a oweb: remove Fault
Also, fix typo in the comment.
2024-03-31 15:00:25 -07:00