Blake Mizerany
d54e0fb3b2
...
2024-04-03 16:14:22 -07:00
Blake Mizerany
bdd05e0ae0
x/registry: skip ref test
2024-04-03 15:59:23 -07:00
Blake Mizerany
1a346640db
x/registry: work on getting basic test passing
2024-04-03 15:58:04 -07:00
Blake Mizerany
f5883070f8
x/registry: upload smoke test passing
2024-04-03 14:30:58 -07:00
Blake Mizerany
adc23d5f96
Add 'x/' from commit 'a10a11b9d371f36b7c3510da32a1d70b74e27bd1'
...
git-subtree-dir: x
git-subtree-mainline: 7d05a6ee8f44b314fa697a427439e5fa4d78c3d7
git-subtree-split: a10a11b9d371f36b7c3510da32a1d70b74e27bd1
2024-04-03 10:40:23 -07:00
Blake Mizerany
a10a11b9d3
registry: initial work on multipart pushes
2024-04-03 10:39:30 -07:00
Blake Mizerany
7d05a6ee8f
cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)
...
This also moves the checkServerHeartbeat call out of the Cobra "RunE"
machinery and into the call site, after the check for OLLAMA_MODELS, so
the helpful error message is printed before the server heartbeat check.
This also arguably makes the code more readable without the
magic/superfluous "pre" function caller.
2024-04-02 22:11:13 -07:00
Daniel Hiltgen
464d817824
Merge pull request #3464 from dhiltgen/subprocess
...
Fix numgpu opt miscomparison
2024-04-02 20:10:17 -07:00
Pier Francesco Contino
531324a9be
feat: add OLLAMA_DEBUG in ollama server help message (#3461)
...
Co-authored-by: Pier Francesco Contino <pfcontino@gmail.com>
2024-04-02 18:20:03 -07:00
Daniel Hiltgen
6589eb8a8c
Revert options as a ref in the server
2024-04-02 16:44:10 -07:00
Michael Yang
a039e383cd
Merge pull request #3465 from ollama/mxyng/fix-metal
...
fix metal gpu
2024-04-02 16:29:58 -07:00
Michael Yang
80163ebcb5
fix metal gpu
2024-04-02 16:06:45 -07:00
Daniel Hiltgen
a57818d93e
Merge pull request #3343 from dhiltgen/bump_more2
...
Bump llama.cpp to b2581
2024-04-02 15:08:26 -07:00
Blake Mizerany
94befe366a
...
2024-04-02 14:28:06 -07:00
Blake Mizerany
c95f97689b
utils/upload: init
2024-04-02 14:15:21 -07:00
Blake Mizerany
618eb5b909
registry: multipart push
2024-04-02 13:40:23 -07:00
Daniel Hiltgen
841adda157
Fix windows lint CI flakiness
2024-04-02 12:22:16 -07:00
Daniel Hiltgen
0035e31af8
Bump to b2581
2024-04-02 11:53:07 -07:00
Blake Mizerany
eb75418be9
build/blob: test ParseRef round-trip
2024-04-02 11:45:01 -07:00
Blake Mizerany
9959da05de
build/blob: break out test refs for other tests/fuzzing
2024-04-02 11:38:10 -07:00
Daniel Hiltgen
c863c6a96d
Merge pull request #3218 from dhiltgen/subprocess
...
Switch back to subprocessing for llama.cpp
2024-04-02 10:49:44 -07:00
Blake Mizerany
aff7970628
build: remove superfluous parseCompleteRef
2024-04-01 23:41:42 -07:00
Blake Mizerany
628f1feb36
build: back to taking manifests as []byte
...
It's nicer to have the manifests be an opaque []byte, rather than a
struct. This way users of the build package don't need to know about the
internal structure of the manifests. The registry can interpret the
manifests as it sees fit, while letting build keep its own Go type of
manifest which is easier to work with in the build package.
2024-04-01 23:18:58 -07:00
Blake Mizerany
ce3125afd5
registry: add New and take a minio client as argument
2024-04-01 22:53:49 -07:00
Blake Mizerany
f488652ba7
build: make Build accept only refs without builds
2024-04-01 22:12:43 -07:00
Blake Mizerany
2318ed2919
build: remove unused manifest()
2024-04-01 21:59:38 -07:00
Blake Mizerany
b1b8be33d9
build: cleanup error names and other things
2024-04-01 21:57:34 -07:00
Blake Mizerany
876f7eab81
build: move Manifest from internal/blobstore to build
...
It was getting confusing to have the arbitrary handling of manifests in
the blobstore. It also prevented us from using model.Ref in the
blobstore because of cyclic dependencies.
This is much easier to grok now.
2024-04-01 21:43:30 -07:00
Blake Mizerany
7cfc8a0838
build/blob: fix awkward Ref type
2024-04-01 21:25:18 -07:00
Daniel Hiltgen
1f11b52511
Refined min memory from testing
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
526d4eb204
Release gpu discovery library after use
...
Leaving the cudart library loaded kept ~30m of memory
pinned in the GPU in the main process. This change ensures
we don't hold GPU resources when idle.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
0a74cb31d5
Safeguard for noexec
...
We may have users that run into problems with our current
payload model, so this gives us an escape valve.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
10ed1b6292
Detect too-old cuda driver
...
"cudart init failure: 35" isn't particularly helpful in the logs.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
4fec5816d6
Integration test improvements
...
Cleaner shutdown logic, a bit of response hardening
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
0a0e9f3e0f
Apply 01-cache.diff
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
58d95cc9bd
Switch back to subprocessing for llama.cpp
...
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process, shut it down when idle, and
gracefully restart it if it has problems. This also serves as a first step toward
running multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Patrick Devine
3b6a9154dd
Simplify model conversion (#3422)
2024-04-01 16:14:53 -07:00
Michael Yang
d6dd2ff839
Merge pull request #3241 from ollama/mxyng/mem
...
update memory estimations for gpu offloading
2024-04-01 13:59:14 -07:00
Michael Yang
e57a6ba89f
Merge pull request #2926 from ollama/mxyng/decode-ggml-v2
...
refactor model parsing
2024-04-01 13:58:13 -07:00
Michael Yang
12ec2346ef
Merge pull request #3442 from ollama/mxyng/generate-output
...
fix generate output
2024-04-01 13:56:09 -07:00
Michael Yang
1ec0df1069
fix generate output
2024-04-01 13:47:34 -07:00
Michael Yang
91b3e4d282
update memory calculations
...
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang
d338d70492
refactor model parsing
2024-04-01 13:16:15 -07:00
Philipp Gillé
011bb67351
Add chromem-go to community integrations (#3437)
2024-04-01 11:17:37 -04:00
Saifeddine ALOUI
d124627202
Update README.md (#3436)
2024-04-01 11:16:31 -04:00
Jesse Zhang
b0a8246a69
Community Integration: CRAG Ollama Chat (#3423)
...
Corrective Retrieval Augmented Generation Demo, powered by Langgraph and Streamlit 🤗
Support:
- Ollama
- OpenAI APIs
2024-04-01 11:16:14 -04:00
Blake Mizerany
fd411b3cf6
registry: commit Manifest
2024-03-31 18:20:19 -07:00
Blake Mizerany
04f38cf3f4
registry: commit manifest on successful /v1/push
2024-03-31 15:09:24 -07:00
Blake Mizerany
c0eddb10fd
registry: use exact match on path
2024-03-31 15:01:26 -07:00
Blake Mizerany
60ef0e6b4a
oweb: remove Fault
...
Also, fix typo in the comment.
2024-03-31 15:00:25 -07:00