Josh Yan
a548eb6003
a8db2a9
2024-07-10 13:10:58 -07:00
Josh Yan
f92818d90d
patch again
2024-07-10 13:06:40 -07:00
Josh Yan
1ef59057d0
patch llama.cpp
2024-07-10 13:02:37 -07:00
Josh Yan
106fe6b4ae
patch
2024-07-10 12:46:45 -07:00
Josh Yan
5fd359d117
added patch
2024-07-10 12:46:45 -07:00
Josh Yan
b0e4e8d76c
change
2024-07-10 12:46:44 -07:00
Josh Yan
e59453982d
logs
2024-07-10 12:46:22 -07:00
Josh Yan
369113970a
wooh
2024-07-10 12:46:14 -07:00
Josh Yan
26ed829415
test
2024-07-10 12:46:14 -07:00
Josh Yan
542134bf50
new
2024-07-10 12:46:09 -07:00
Josh Yan
9e0b8f1fe2
another change
2024-07-10 12:46:06 -07:00
Josh Yan
c498609ba3
cast
2024-07-10 12:45:47 -07:00
Josh Yan
c800a67f1b
cast
2024-07-10 12:45:12 -07:00
Josh Yan
dfc62648f3
cast
2024-07-10 12:45:12 -07:00
Josh Yan
24e8292e94
new changes
2024-07-10 12:45:10 -07:00
Josh Yan
c63b4ecbf7
quantize
2024-07-10 12:44:40 -07:00
Daniel Hiltgen
2d1e3c3229
Merge pull request #5503 from dhiltgen/dual_rocm
...
Workaround broken ROCm p2p copy
2024-07-09 15:44:16 -07:00
Daniel Hiltgen
b51e3b63ac
Statically link c++ and thread lib
...
This makes sure we statically link the C++ and thread libraries on Windows
to avoid unnecessary runtime dependencies on non-standard DLLs.
2024-07-09 11:34:30 -07:00
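For illustration, a minimal sketch of how the static linking described in the commit above could be expressed as cgo linker directives; the exact flags and their location in the repository are assumptions, not the committed change:

```go
package llm

/*
// Illustrative only: statically link the C++ runtime, the GCC runtime,
// and (via -static on MinGW) the winpthread library on Windows, so the
// built binary does not depend on non-standard DLLs at runtime.
#cgo windows LDFLAGS: -static-libstdc++ -static-libgcc -static
*/
import "C"
```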
Michael Yang
9bbddc37a7
Merge pull request #5126 from ollama/mxyng/messages
...
update message processing
2024-07-09 09:20:44 -07:00
Daniel Hiltgen
0bacb30007
Workaround broken ROCm p2p copy
...
Enable the llama.cpp build flag that uses a CPU copy for multi-GPU scenarios.
2024-07-08 09:40:52 -07:00
Jeffrey Morgan
53da2c6965
llm: remove ambiguous comment when putting upper limit on predictions to avoid infinite generation (#5535)
2024-07-07 14:32:05 -04:00
Jeffrey Morgan
d8def1ff94
llm: allow gemma 2 to context shift (#5534)
2024-07-07 13:41:51 -04:00
Jeffrey Morgan
571dc61955
Update llama.cpp submodule to a8db2a9c (#5530)
2024-07-07 13:03:09 -04:00
Jeffrey Morgan
0e09c380fc
llm: print caching notices in debug only (#5533)
2024-07-07 12:38:04 -04:00
Jeffrey Morgan
4607c70641
llm: add -DBUILD_SHARED_LIBS=off to common cpu cmake flags (#5520)
2024-07-06 18:58:16 -04:00
jmorganca
a08f20d910
release: remove unwanted mingw dll.a files
2024-07-06 15:21:15 -04:00
jmorganca
6cea036027
Revert "llm: only statically link libstdc++"
...
This reverts commit 5796bfc4013f4ebe26cdbf13554332a25c405027.
2024-07-06 15:10:48 -04:00
jmorganca
5796bfc401
llm: only statically link libstdc++
2024-07-06 14:06:20 -04:00
jmorganca
f1a379aa56
llm: statically link pthread and stdc++ dependencies in windows build
2024-07-06 12:54:02 -04:00
jmorganca
9ae146993e
llm: add GGML_STATIC flag to windows static lib
2024-07-06 03:27:05 -04:00
Jeffrey Morgan
e0348d3fe8
llm: add COMMON_DARWIN_DEFS to arm static build (#5513)
2024-07-05 22:42:42 -04:00
Jeffrey Morgan
2cc854f8cb
llm: fix missing dylibs by restoring old build behavior on Linux and macOS (#5511)
...
* Revert "fix cmake build (#5505 )"
This reverts commit 4fd5f3526a116d05cd74cfcc7217d4e6326e1bea.
* llm: fix missing dylibs by restoring old build behavior
* crlf -> lf
2024-07-05 21:48:31 -04:00
Jeffrey Morgan
5304b765b2
llm: put back old include dir (#5507)
...
* llm: put back old include dir
* llm: update link paths for old submodule commits
2024-07-05 19:34:21 -04:00
Jeffrey Morgan
4fd5f3526a
fix cmake build (#5505)
2024-07-05 19:07:01 -04:00
Michael Yang
ac7a842e55
fix model reloading
...
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
Jeffrey Morgan
78fb33dd07
fix typo in cgo directives in llm.go (#5501)
2024-07-05 15:18:36 -04:00
Jeffrey Morgan
8f8e736b13
update llama.cpp submodule to d7fd29f (#5475)
2024-07-05 13:25:58 -04:00
Jeffrey Morgan
d89454de80
Use slot with cached prompt instead of least recently used (#5492)
...
* Use common prefix to select slot
* actually report `longest`
2024-07-05 12:32:47 -04:00
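The commit above replaces least-recently-used slot selection with picking the slot whose cached prompt shares the longest common prefix with the incoming request. A minimal Go sketch of that selection logic, using a hypothetical slot type (the actual change lives in the llama.cpp server code):

```go
package sketch

// slot is a hypothetical stand-in for a server slot and its cached prompt,
// used only to illustrate prefix-based selection.
type slot struct {
	id     int
	cached []int // token IDs already processed in this slot
}

// commonPrefix returns the number of leading tokens shared by a and b.
func commonPrefix(a, b []int) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// pickSlot chooses the slot whose cache shares the longest common prefix
// with the incoming prompt, rather than the least recently used slot.
func pickSlot(slots []slot, prompt []int) *slot {
	best, longest := -1, -1
	for i := range slots {
		if l := commonPrefix(slots[i].cached, prompt); l > longest {
			best, longest = i, l
		}
	}
	if best < 0 {
		return nil
	}
	return &slots[best]
}
```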
Jeffrey Morgan
e9188e971a
Fix assert on small embedding inputs (#5491)
...
* Fix assert on small embedding inputs
* Update llm/patches/09-pooling.diff
2024-07-05 11:20:57 -04:00
Daniel Hiltgen
02c24d3d01
Merge pull request #5466 from dhiltgen/fix_clip_unicode
...
Fix clip model loading with unicode paths
2024-07-05 08:16:58 -07:00
Jeffrey Morgan
4d71c559b2
fix error detection by limiting model loading error parsing (#5472)
2024-07-03 20:04:30 -04:00
Daniel Hiltgen
ccd7785859
Merge pull request #5243 from dhiltgen/modelfile_use_mmap
...
Fix use_mmap for modelfiles
2024-07-03 13:59:42 -07:00
royjhan
3b5a4a77f3
Return Correct Prompt Eval Count Regardless of Cache Prompt (#5371)
...
* openai compatibility
* Revert "openai compatibility"
This reverts commit d3f98a811e00fc497d889c8c45b0cfec5b64690c.
* remove erroneous subtraction of prompt cache
2024-07-03 13:46:23 -07:00
Daniel Hiltgen
0e982bc1f4
Fix corner cases on tmp cleaner on mac
...
When ollama has been running for a long time, tmp cleaners can remove the
runners. This tightens up a few corner cases on ARM Macs where we failed
with "server cpu not listed in available servers map[]".
2024-07-03 13:10:14 -07:00
Daniel Hiltgen
6298f49816
Fix clip model loading with unicode paths
...
On Windows, if the model directory contained Unicode characters, clip
models would fail to load. This fixes the file name handling in clip.cpp
to support UTF-16 on Windows.
2024-07-03 12:46:36 -07:00
Josh Yan
33a65e3ba3
error
2024-07-01 16:04:13 -07:00
Daniel Hiltgen
97c9e11768
Switch use_mmap to a pointer type
...
This uses nil as undefined for a cleaner implementation.
2024-07-01 08:44:59 -07:00
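A minimal Go sketch of the pointer-typed option described in the commit above, where nil means "not specified"; the struct and field names here are illustrative, not necessarily the ones in the repository:

```go
package sketch

// Options illustrates a tri-state mmap setting using a *bool:
//   nil   -> not specified (let the runtime decide)
//   true  -> force mmap on
//   false -> force mmap off
type Options struct {
	UseMMap *bool `json:"use_mmap,omitempty"`
}

// useMMapOrDefault resolves the tri-state flag against a default value.
func useMMapOrDefault(o Options, def bool) bool {
	if o.UseMMap == nil {
		return def
	}
	return *o.UseMMap
}
```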
Daniel Hiltgen
3518aaef33
Merge pull request #4218 from dhiltgen/auto_parallel
...
Enable concurrency by default
2024-07-01 08:32:29 -07:00
Jeffrey Morgan
717f7229eb
Do not shift context for sliding window models (#5368)
...
* Do not shift context for sliding window models
* truncate prompt > 2/3 tokens
* only target gemma2
2024-06-28 19:39:31 -07:00
Michael Yang
de2163dafd
gemma2 graph
2024-06-27 13:34:52 -07:00