Josh Yan
309307c8f9
update test, remove comments
2024-07-17 10:46:50 -07:00
Josh Yan
f378058b51
whitespace
2024-07-16 16:45:41 -07:00
Josh
d069cf753b
Merge branch 'main' into jyan/reord-g
2024-07-16 16:42:49 -07:00
Josh Yan
64405525b4
clean up
2024-07-16 16:40:38 -07:00
Josh Yan
dea2204b82
rmv comments
2024-07-16 16:37:50 -07:00
Josh Yan
6ee22d5080
clean
2024-07-16 16:35:15 -07:00
Josh Yan
703ecccc6b
clean
2024-07-16 14:17:44 -07:00
Josh Yan
873f334783
IT WORKS
2024-07-16 14:12:07 -07:00
Josh Yan
fa49bfc0bd
FIXED TESTS
2024-07-16 12:14:10 -07:00
Josh Yan
fc1b3ee9bf
test
2024-07-16 11:21:13 -07:00
Michael Yang
4a565cbf94
add chat and generate tests with mock runner
2024-07-16 09:39:31 -07:00
Josh Yan
25be20949c
test
2024-07-15 15:08:24 -07:00
royjhan
b9f5e16c80
Introduce /api/embed endpoint supporting batch embedding (#5127)
* Initial Batch Embedding
* Revert "Initial Batch Embedding"
This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29.
* Initial Draft
* mock up notes
* api/embed draft
* add server function
* check normalization
* clean up
* normalization
* playing around with truncate stuff
* Truncation
* Truncation
* move normalization to go
* Integration Test Template
* Truncation Integration Tests
* Clean up
* use float32
* move normalize
* move normalize test
* refactoring
* integration float32
* input handling and handler testing
* Refactoring of legacy and new
* clear comments
* merge conflicts
* touches
* embedding type 64
* merge conflicts
* fix hanging on single string
* refactoring
* test values
* set context length
* clean up
* testing clean up
* testing clean up
* remove function closure
* Revert "remove function closure"
This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787.
* remove function closure
* remove redundant error check
* clean up
* more clean up
* clean up
2024-07-15 12:14:24 -07:00
Josh Yan
903e9df46f
test
2024-07-15 11:46:49 -07:00
Josh Yan
40c0f9612e
unneccesary
2024-07-14 18:41:16 -07:00
Jeffrey Morgan
ef98803d63
llm: looser checks for minimum memory (#5677)
2024-07-13 09:20:05 -07:00
Josh Yan
15a0215203
running
2024-07-12 16:49:57 -07:00
Josh Yan
faa3c937cf
writeto
2024-07-12 15:37:27 -07:00
Josh Yan
cf57246aba
write
2024-07-12 12:59:51 -07:00
Josh Yan
6fafe4f753
gguf
2024-07-12 12:58:00 -07:00
Josh Yan
d7c8d4f3f4
ggufwritekv
2024-07-12 12:25:13 -07:00
Josh Yan
3d0fd31f0e
TensorWriter
2024-07-12 12:18:46 -07:00
Josh Yan
e75fb73839
types
2024-07-12 09:42:10 -07:00
Josh Yan
2fdebffc8d
sawp
2024-07-11 18:18:26 -07:00
Josh Yan
29ecfe493b
write
2024-07-11 17:56:51 -07:00
Josh
10e768826c
fix: quant err message (#5616)
2024-07-11 17:24:29 -07:00
Jeffrey Morgan
c4cf8ad559
llm: avoid loading model if system memory is too small (#5637)
* llm: avoid loading model if system memory is too small
* update log
* Instrument swap free space
On linux and windows, expose how much swap space is available
so we can take that into consideration when scheduling models
* use `systemSwapFreeMemory` in check
---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2024-07-11 16:42:57 -07:00
Jeffrey Morgan
791650ddef
sched: only error when over-allocating system memory (#5626)
2024-07-11 00:53:12 -07:00
Jeffrey Morgan
efbf41ed81
llm: dont link cuda with compat libs (#5621)
2024-07-10 20:01:52 -07:00
Michael Yang
37a570f962
Merge pull request #5612 from ollama/mxyng/mem
chatglm graph
2024-07-10 14:18:33 -07:00
Michael Yang
5a739ff4cb
chatglm graph
2024-07-10 13:43:47 -07:00
Jeffrey Morgan
4e262eb2a8
remove GGML_CUDA_FORCE_MMQ=on from build (#5588)
2024-07-10 13:17:13 -07:00
Daniel Hiltgen
b50c818623
Merge pull request #5607 from dhiltgen/win_rocm_v6
Bump ROCm on windows to 6.1.2
2024-07-10 12:47:10 -07:00
Daniel Hiltgen
1f50356e8e
Bump ROCm on windows to 6.1.2
This also adjusts our algorithm to favor our bundled ROCm.
I've confirmed VRAM reporting still doesn't work properly so we
can't yet enable concurrency by default.
2024-07-10 11:01:22 -07:00
Daniel Hiltgen
22c81f62ec
Remove duplicate merge glitch
2024-07-10 09:01:33 -07:00
Daniel Hiltgen
2d1e3c3229
Merge pull request #5503 from dhiltgen/dual_rocm
Workaround broken ROCm p2p copy
2024-07-09 15:44:16 -07:00
Daniel Hiltgen
b51e3b63ac
Statically link c++ and thread lib
This makes sure we statically link the c++ and thread library on windows
to avoid unnecessary runtime dependencies on non-standard DLLs
2024-07-09 11:34:30 -07:00
Michael Yang
9bbddc37a7
Merge pull request #5126 from ollama/mxyng/messages
update message processing
2024-07-09 09:20:44 -07:00
Daniel Hiltgen
0bacb30007
Workaround broken ROCm p2p copy
Enable the build flag for llama.cpp to use CPU copy for multi-GPU scenarios.
2024-07-08 09:40:52 -07:00
Jeffrey Morgan
53da2c6965
llm: remove ambiguous comment when putting upper limit on predictions to avoid infinite generation (#5535)
2024-07-07 14:32:05 -04:00
Jeffrey Morgan
d8def1ff94
llm: allow gemma 2 to context shift (#5534)
2024-07-07 13:41:51 -04:00
Jeffrey Morgan
571dc61955
Update llama.cpp submodule to a8db2a9c (#5530)
2024-07-07 13:03:09 -04:00
Jeffrey Morgan
0e09c380fc
llm: print caching notices in debug only (#5533)
2024-07-07 12:38:04 -04:00
Jeffrey Morgan
4607c70641
llm: add -DBUILD_SHARED_LIBS=off to common cpu cmake flags (#5520)
2024-07-06 18:58:16 -04:00
jmorganca
a08f20d910
release: remove unwanted mingw dll.a files
2024-07-06 15:21:15 -04:00
jmorganca
6cea036027
Revert "llm: only statically link libstdc++"
This reverts commit 5796bfc4013f4ebe26cdbf13554332a25c405027.
2024-07-06 15:10:48 -04:00
jmorganca
5796bfc401
llm: only statically link libstdc++
2024-07-06 14:06:20 -04:00
jmorganca
f1a379aa56
llm: statically link pthread and stdc++ dependencies in windows build
2024-07-06 12:54:02 -04:00
jmorganca
9ae146993e
llm: add GGML_STATIC flag to windows static lib
2024-07-06 03:27:05 -04:00
Jeffrey Morgan
e0348d3fe8
llm: add COMMON_DARWIN_DEFS to arm static build (#5513)
2024-07-05 22:42:42 -04:00