ollama

Author	SHA1	Message	Date
Blake Mizerany	d54e0fb3b2	...	2024-04-03 16:14:22 -07:00
Blake Mizerany	bdd05e0ae0	x/registry: skip ref test	2024-04-03 15:59:23 -07:00
Blake Mizerany	1a346640db	x/registry: work on getting basic test passing	2024-04-03 15:58:04 -07:00
Blake Mizerany	f5883070f8	x/registry: upload smoke test passing	2024-04-03 14:30:58 -07:00
Blake Mizerany	adc23d5f96	Add 'x/' from commit 'a10a11b9d371f36b7c3510da32a1d70b74e27bd1' git-subtree-dir: x git-subtree-mainline: 7d05a6ee8f44b314fa697a427439e5fa4d78c3d7 git-subtree-split: a10a11b9d371f36b7c3510da32a1d70b74e27bd1	2024-04-03 10:40:23 -07:00
Blake Mizerany	a10a11b9d3	registry: initial work on multipart pushes	2024-04-03 10:39:30 -07:00
Blake Mizerany	7d05a6ee8f	cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470) This also moves the checkServerHeartbeat call out of the "RunE" Cobra stuff (that's the only word I have for that) to on-site where it's after the check for OLLAMA_MODELS, which allows the helpful error message to be printed before the server heartbeat check. This also arguably makes the code more readable without the magic/superfluous "pre" function caller.	2024-04-02 22:11:13 -07:00
Daniel Hiltgen	464d817824	Merge pull request #3464 from dhiltgen/subprocess Fix numgpu opt miscomparison	2024-04-02 20:10:17 -07:00
Pier Francesco Contino	531324a9be	feat: add OLLAMA_DEBUG in ollama server help message (#3461) Co-authored-by: Pier Francesco Contino <pfcontino@gmail.com>	2024-04-02 18:20:03 -07:00
Daniel Hiltgen	6589eb8a8c	Revert options as a ref in the server	2024-04-02 16:44:10 -07:00
Michael Yang	a039e383cd	Merge pull request #3465 from ollama/mxyng/fix-metal fix metal gpu	2024-04-02 16:29:58 -07:00
Michael Yang	80163ebcb5	fix metal gpu	2024-04-02 16:06:45 -07:00
Daniel Hiltgen	a57818d93e	Merge pull request #3343 from dhiltgen/bump_more2 Bump llama.cpp to b2581	2024-04-02 15:08:26 -07:00
Blake Mizerany	94befe366a	...	2024-04-02 14:28:06 -07:00
Blake Mizerany	c95f97689b	utils/upload: init	2024-04-02 14:15:21 -07:00
Blake Mizerany	618eb5b909	registry: multipart push	2024-04-02 13:40:23 -07:00
Daniel Hiltgen	841adda157	Fix windows lint CI flakiness	2024-04-02 12:22:16 -07:00
Daniel Hiltgen	0035e31af8	Bump to b2581	2024-04-02 11:53:07 -07:00
Blake Mizerany	eb75418be9	build/blob: test ParseRef round-trip	2024-04-02 11:45:01 -07:00
Blake Mizerany	9959da05de	build/blob: break out test refs for other tests/fuzzing	2024-04-02 11:38:10 -07:00
Daniel Hiltgen	c863c6a96d	Merge pull request #3218 from dhiltgen/subprocess Switch back to subprocessing for llama.cpp	2024-04-02 10:49:44 -07:00
Blake Mizerany	aff7970628	build: remove superfluous parseCompleteRef	2024-04-01 23:41:42 -07:00
Blake Mizerany	628f1feb36	build: back to taking manifests as []byte It's nicer to have the manifests be an opaque []byte, rather than a struct. This way users of the build package don't need to know about the internal structure of the manifests. The registry can interpret the manifests as it sees fit, while letting build keep its own Go type of manifest which is easier to work with in the build package.	2024-04-01 23:18:58 -07:00
Blake Mizerany	ce3125afd5	registry: add New and take a minio client as argument	2024-04-01 22:53:49 -07:00
Blake Mizerany	f488652ba7	build: make Build accept only refs without builds	2024-04-01 22:12:43 -07:00
Blake Mizerany	2318ed2919	build: remove unused manifest()	2024-04-01 21:59:38 -07:00
Blake Mizerany	b1b8be33d9	build: cleanup error names and other things	2024-04-01 21:57:34 -07:00
Blake Mizerany	876f7eab81	build: move Manifest from internal/blobstore to build It was getting confusing to have the arbitrary handling of manifests in the blobstore. It also prevented us from using model.Ref in the blobstore because of cyclic dependencies. This is much easier to grok now.	2024-04-01 21:43:30 -07:00
Blake Mizerany	7cfc8a0838	build/blob: fix awkward Ref type	2024-04-01 21:25:18 -07:00
Daniel Hiltgen	1f11b52511	Refined min memory from testing	2024-04-01 16:48:33 -07:00
Daniel Hiltgen	526d4eb204	Release gpu discovery library after use Leaving the cudart library loaded kept ~30m of memory pinned in the GPU in the main process. This change ensures we don't hold GPU resources when idle.	2024-04-01 16:48:33 -07:00
Daniel Hiltgen	0a74cb31d5	Safeguard for noexec We may have users that run into problems with our current payload model, so this gives us an escape valve.	2024-04-01 16:48:33 -07:00
Daniel Hiltgen	10ed1b6292	Detect too-old cuda driver "cudart init failure: 35" isn't particularly helpful in the logs.	2024-04-01 16:48:33 -07:00
Daniel Hiltgen	4fec5816d6	Integration test improvements Cleaner shutdown logic, a bit of response hardening	2024-04-01 16:48:18 -07:00
Daniel Hiltgen	0a0e9f3e0f	Apply 01-cache.diff	2024-04-01 16:48:18 -07:00
Daniel Hiltgen	58d95cc9bd	Switch back to subprocessing for llama.cpp This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process and shutdown when idle, and gracefully restart if it has problems. This also serves as a first step to be able to run multiple copies to support multiple models concurrently.	2024-04-01 16:48:18 -07:00
Patrick Devine	3b6a9154dd	Simplify model conversion (#3422)	2024-04-01 16:14:53 -07:00
Michael Yang	d6dd2ff839	Merge pull request #3241 from ollama/mxyng/mem update memory estimations for gpu offloading	2024-04-01 13:59:14 -07:00
Michael Yang	e57a6ba89f	Merge pull request #2926 from ollama/mxyng/decode-ggml-v2 refactor model parsing	2024-04-01 13:58:13 -07:00
Michael Yang	12ec2346ef	Merge pull request #3442 from ollama/mxyng/generate-output fix generate output	2024-04-01 13:56:09 -07:00
Michael Yang	1ec0df1069	fix generate output	2024-04-01 13:47:34 -07:00
Michael Yang	91b3e4d282	update memory calculations count each layer independently when deciding gpu offloading	2024-04-01 13:16:32 -07:00
Michael Yang	d338d70492	refactor model parsing	2024-04-01 13:16:15 -07:00
Philipp Gillé	011bb67351	Add chromem-go to community integrations (#3437)	2024-04-01 11:17:37 -04:00
Saifeddine ALOUI	d124627202	Update README.md (#3436)	2024-04-01 11:16:31 -04:00
Jesse Zhang	b0a8246a69	Community Integration: CRAG Ollama Chat (#3423) Corrective Retrieval Augmented Generation Demo, powered by Langgraph and Streamlit 🤗 Support: - Ollama - OpenAI APIs	2024-04-01 11:16:14 -04:00
Blake Mizerany	fd411b3cf6	registry: commit Manifest	2024-03-31 18:20:19 -07:00
Blake Mizerany	04f38cf3f4	registry: commit manifest on successful /v1/push	2024-03-31 15:09:24 -07:00
Blake Mizerany	c0eddb10fd	registry: use exact match on path	2024-03-31 15:01:26 -07:00
Blake Mizerany	60ef0e6b4a	oweb: remove Fault Also, fix typo in the comment.	2024-03-31 15:00:25 -07:00

2334 Commits