156 Commits

Author SHA1 Message Date
Jesse Gross
523d84c563 llama.go: Use dynamic buffer for TokenToPiece
The cgo binding for llama_token_to_piece uses a fixed 12-byte buffer,
which is usually but not always enough to hold a token. This increases
the buffer size if needed, similar to what llama.cpp does internally.
2024-09-03 21:15:14 -04:00
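A minimal sketch of the retry pattern this commit describes, not the actual llama.go code. It assumes the underlying call reports a too-small buffer by returning the negated required length; `vocab`, `tokenToPiece`, and the Go-level `TokenToPiece` wrapper are hypothetical stand-ins written in pure Go so the example runs without cgo.

```go
package main

import "fmt"

// vocab is a hypothetical stand-in for a model vocabulary.
var vocab = map[int]string{
	1: "hi",
	2: " supercalifragilistic",
}

// tokenToPiece is a hypothetical stand-in for the C call. It copies the
// piece into buf and returns its length, or the negated required length
// when buf is too small (assumed convention).
func tokenToPiece(token int, buf []byte) int {
	piece := vocab[token]
	if len(piece) > len(buf) {
		return -len(piece)
	}
	return copy(buf, piece)
}

// TokenToPiece tries a small buffer first and retries once with the
// exact size reported by the first call.
func TokenToPiece(token int) string {
	buf := make([]byte, 12)
	n := tokenToPiece(token, buf)
	if n < 0 {
		buf = make([]byte, -n)
		n = tokenToPiece(token, buf)
	}
	return string(buf[:n])
}

func main() {
	fmt.Printf("%q\n", TokenToPiece(1))
	fmt.Printf("%q\n", TokenToPiece(2))
}
```

The small initial buffer keeps the common case to a single call; only pieces longer than 12 bytes pay for a second allocation.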
Jesse Gross
ed19fad862 llama.go: Make batch memory allocation match configuration
Batch size defaults to 512 but is configurable. However, llama.go uses
a fixed-size buffer, causing crashes if the batch size is increased.
This changes the array size to follow the configuration.
2024-09-03 21:15:14 -04:00
Jesse Gross
5d34320b7c runner.go: Fix off by one in batch size check
When adding tokens to a batch, the index is zero based but is
checked against being greater than the max batch size. This results
in an out-of-bounds access when the final token is added.
2024-09-03 21:15:14 -04:00
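The bounds check described above can be illustrated with a small sketch. This is not the runner.go code; `Batch`, `NewBatch`, `Add`, and `ErrBatchFull` are hypothetical names chosen for the example. The point is that a zero-based index must be rejected once it equals the capacity, not only once it exceeds it.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBatchFull is a hypothetical error returned when no slot is left.
var ErrBatchFull = errors.New("batch full")

// Batch is a hypothetical fixed-capacity token batch.
type Batch struct {
	tokens []int
	n      int // zero-based index of the next free slot
}

// NewBatch allocates storage for exactly size tokens.
func NewBatch(size int) *Batch {
	return &Batch{tokens: make([]int, size)}
}

// Add stores a token in the next free slot. Valid indices are
// 0..len(b.tokens)-1, so the check must use >=. Writing
// `b.n > len(b.tokens)` instead would let b.n == len(b.tokens)
// through and index one past the end.
func (b *Batch) Add(token int) error {
	if b.n >= len(b.tokens) {
		return ErrBatchFull
	}
	b.tokens[b.n] = token
	b.n++
	return nil
}

func main() {
	b := NewBatch(2)
	for _, t := range []int{10, 11, 12} {
		fmt.Println(t, b.Add(t))
	}
}
```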
Jesse Gross
0c2f95f3de runner: Initialize numPredict
numPredict is used to enforce a limit on the number of tokens to
generate. It is passed in from Ollama but it is never stored to
be checked.
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
8fe30d161c Fix filename for non darwin arm builds 2024-09-03 21:15:14 -04:00
jmorganca
a483a4c4ed lint 2024-09-03 21:15:14 -04:00
Daniel Hiltgen
b267ab92b0 Add missing vendor headers to ggml sync 2024-09-03 21:15:14 -04:00
Daniel Hiltgen
189ca38f1d Wire up native source file dependencies
This should make sure incremental builds correctly identify
when to rebuild components based on which native files
are modified.
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
80db43b7b4 Bump llama sync to 1e6f65 2024-09-03 21:15:14 -04:00
Daniel Hiltgen
47b0e81219 fix dolphin-mistral 2024-09-03 21:15:14 -04:00
Daniel Hiltgen
751009a5d7 Runtime selection of new or old runners
This adjusts the new runners to comingle with existing runners so we can use an
env var to toggle the new runners on.
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
8527028bf4 Implement timings response in Go server
This implements the fields necessary for `run --verbose`
to generate timing information.
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
e0241118d0 Get embeddings working
Truncation doesn't pass, but the other embeddings tests pass
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
f97ee8c506 Fix parallel requests 2024-09-03 21:15:13 -04:00
Daniel Hiltgen
e9dd656ff5 Update sync with latest llama.cpp layout, and run against b3485 2024-09-03 21:15:13 -04:00
Daniel Hiltgen
6c0d892498 Prefix all build artifacts with an OS/ARCH dir
This will help keep incremental builds from stomping on each other and make it
easier to stitch together the final runner payloads
2024-09-03 21:15:13 -04:00
Daniel Hiltgen
13348e3629 Get linux building
Still needs a bit more refinement to (auto)detect cuda/hip and fall back
gracefully if not detected.
2024-09-03 21:15:13 -04:00
jmorganca
3d5a08c315 add note in readme 2024-09-03 21:15:13 -04:00
jmorganca
a29851bc9b clean up metal code 2024-09-03 21:15:13 -04:00
jmorganca
8dda9293fa fix Makefile on windows 2024-09-03 21:15:13 -04:00
jmorganca
b3c62dcafd remove printing 2024-09-03 21:15:13 -04:00
jmorganca
9b8b7cd9b5 dont apply license to stb_image.h and json.hpp 2024-09-03 21:15:13 -04:00
jmorganca
1da6c40f4f lint 2024-09-03 21:15:13 -04:00
jmorganca
76ca2de06e update sync header 2024-09-03 21:15:13 -04:00
jmorganca
dded27dcfa fix metal 2024-09-03 21:15:13 -04:00
jmorganca
080b600865 add header to not edit 2024-09-03 21:15:13 -04:00
jmorganca
d6b6de9a5a add header to not edit 2024-09-03 21:15:13 -04:00
jmorganca
24a741424f fix build on windows 2024-09-03 21:15:13 -04:00
jmorganca
4d476d894e fix Makefile 2024-09-03 21:15:13 -04:00
jmorganca
bd94ddfc56 fix README.md 2024-09-03 21:15:13 -04:00
jmorganca
f1f54c5bd5 fix README.md 2024-09-03 21:15:13 -04:00
jmorganca
18662d1180 consistent whitespace 2024-09-03 21:15:13 -04:00
jmorganca
083a9e9b4e link metal 2024-09-03 21:15:13 -04:00
jmorganca
d0703eaf44 wip 2024-09-03 21:15:13 -04:00
jmorganca
ce00e387c3 wip meta 2024-09-03 21:15:13 -04:00
jmorganca
763d7b601c sync 2024-09-03 21:15:13 -04:00
jmorganca
4d0e6c55b0 remove perl docs 2024-09-03 21:15:13 -04:00
jmorganca
3375b82c56 remove build scripts 2024-09-03 21:15:13 -04:00
jmorganca
b8c1065ab6 remove need for perl 2024-09-03 21:15:13 -04:00
jmorganca
a632a04426 fix output 2024-09-03 21:15:13 -04:00
jmorganca
110f37ffb0 arch build 2024-09-03 21:15:13 -04:00
jmorganca
f2f03ff7f2 add temporary makefile 2024-09-03 21:15:13 -04:00
jmorganca
ba0ff1c46a fix cuda and rocm builds 2024-09-03 21:15:13 -04:00
jmorganca
9966a055e5 fix cgo flags for darwin amd64 2024-09-03 21:15:13 -04:00
jmorganca
7aa7a3c1e5 remove -fPIC from build_hipblas.sh 2024-09-03 21:15:13 -04:00
jmorganca
de634b7fd7 fix issues with runner 2024-09-03 21:15:13 -04:00
jmorganca
795753be7e move sync script back in for now 2024-09-03 21:15:13 -04:00
jmorganca
0eed68fed4 llama: sync 2024-09-03 21:15:13 -04:00
jmorganca
783134a3bb update to d5c938cd 2024-09-03 21:15:13 -04:00
jmorganca
74a158a79e add patches 2024-09-03 21:15:13 -04:00