Jesse Gross
523d84c563
llama.go: Use dynamic buffer for TokenToPiece
...
The cgo binding for llama_token_to_piece uses a fixed 12-byte buffer,
which is usually but not always enough to hold a token. This increases
the buffer size if needed, similar to what llama.cpp does internally.
2024-09-03 21:15:14 -04:00
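The grow-on-demand pattern described in the TokenToPiece commit above can be sketched in plain Go. This is an illustration only, not the code from the commit: `writePiece` is a hypothetical stand-in for a C-style call, and the convention that it returns the negated required size when the buffer is too small is an assumption made for this sketch.

```go
package main

import "fmt"

// writePiece is a hypothetical stand-in for a C-style API (an assumption,
// not the real binding): it copies piece into buf and returns the number of
// bytes written, or the negated required size when buf is too small.
func writePiece(piece string, buf []byte) int {
	if len(piece) > len(buf) {
		return -len(piece)
	}
	return copy(buf, piece)
}

// tokenToPiece tries a small fixed-size buffer first and retries once with
// the exact size reported by the callee if the first attempt did not fit.
func tokenToPiece(piece string) string {
	buf := make([]byte, 12)
	n := writePiece(piece, buf)
	if n < 0 {
		buf = make([]byte, -n)
		n = writePiece(piece, buf)
	}
	return string(buf[:n])
}

func main() {
	fmt.Println(tokenToPiece("hello"))
	fmt.Println(tokenToPiece("a piece longer than twelve bytes"))
}
```

Starting small keeps the common case cheap; the retry only costs a second call for the rare long piece.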
Jesse Gross
ed19fad862
llama.go: Make batch memory allocation match configuration
...
Batch size defaults to 512 but is configurable. However, llama.go uses
a fixed-size buffer, causing crashes if the batch size is increased.
This changes the array size to follow the configuration.
2024-09-03 21:15:14 -04:00
Jesse Gross
5d34320b7c
runner.go: Fix off by one in batch size check
...
When adding tokens to a batch, the index is zero based but is
checked against being greater than the max batch size. This results
in an out-of-bounds access when the final token is added.
2024-09-03 21:15:14 -04:00
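The off-by-one described in the batch size commit above is easy to reproduce in isolation. The sketch below uses a hypothetical `batch` type (not the one in runner.go) to show why a zero-based index has to be rejected with `>=` rather than `>`.

```go
package main

import (
	"errors"
	"fmt"
)

// batch is a hypothetical fixed-capacity token batch used for illustration.
type batch struct {
	tokens []int
	n      int // zero-based index of the next free slot
}

var errBatchFull = errors.New("batch full")

func newBatch(size int) *batch {
	return &batch{tokens: make([]int, size)}
}

// add appends a token, refusing once the batch is full. Valid indices are
// 0..len(tokens)-1, so the guard must be >=. With > instead, the call where
// b.n == len(b.tokens) would pass the check and index out of bounds.
func (b *batch) add(tok int) error {
	if b.n >= len(b.tokens) {
		return errBatchFull
	}
	b.tokens[b.n] = tok
	b.n++
	return nil
}

func main() {
	b := newBatch(2)
	for tok := 0; tok < 3; tok++ {
		if err := b.add(tok); err != nil {
			fmt.Println("token", tok, "rejected:", err)
		}
	}
}
```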
Jesse Gross
0c2f95f3de
runner: Initialize numPredict
...
numPredict is used to enforce a limit on the number of tokens to
generate. It is passed in from Ollama but is never stored, so it
cannot be checked.
2024-09-03 21:15:14 -04:00
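The numPredict commit above describes a limit that was received but never stored. A minimal sketch of that bug class follows. The `sequence` type, its field names, and the convention that a non-positive limit means "no limit" are all assumptions for illustration, not the runner's actual code.

```go
package main

import "fmt"

// sequence is a hypothetical per-request state holder.
type sequence struct {
	numPredict   int // token limit; <= 0 means "no limit" in this sketch
	numPredicted int // tokens generated so far
}

// newSequence stores the requested limit. Leaving out this assignment keeps
// numPredict at its zero value, which silently disables the limit.
func newSequence(numPredict int) *sequence {
	return &sequence{numPredict: numPredict}
}

// done reports whether the configured limit has been reached.
func (s *sequence) done() bool {
	return s.numPredict > 0 && s.numPredicted >= s.numPredict
}

// generate counts tokens until the limit is hit. The hard cap of 1000 only
// keeps the sketch finite when no limit is set.
func generate(s *sequence) int {
	for !s.done() && s.numPredicted < 1000 {
		s.numPredicted++
	}
	return s.numPredicted
}

func main() {
	fmt.Println(generate(newSequence(5))) // limit stored and enforced
	fmt.Println(generate(&sequence{}))    // limit never stored: runs to the cap
}
```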
Daniel Hiltgen
8fe30d161c
Fix filename for non darwin arm builds
2024-09-03 21:15:14 -04:00
jmorganca
a483a4c4ed
lint
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
b267ab92b0
Add missing vendor headers to ggml sync
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
189ca38f1d
Wire up native source file dependencies
...
This should make sure incremental builds correctly identify
when to rebuild components based on which native files
are modified.
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
80db43b7b4
Bump llama sync to 1e6f65
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
47b0e81219
fix dolphin-mistral
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
751009a5d7
Runtime selection of new or old runners
...
This adjusts the new runners to commingle with existing runners so we can use an
env var to toggle the new runners on.
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
8527028bf4
Implement timings response in Go server
...
This implements the fields necessary for `run --verbose`
to generate timing information.
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
e0241118d0
Get embeddings working
...
Truncation doesn't pass, but the other embeddings tests pass
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
f97ee8c506
Fix parallel requests
2024-09-03 21:15:13 -04:00
Daniel Hiltgen
e9dd656ff5
Update sync with latest llama.cpp layout, and run against b3485
2024-09-03 21:15:13 -04:00
Daniel Hiltgen
6c0d892498
Prefix all build artifacts with an OS/ARCH dir
...
This will help keep incremental builds from stomping on each other and make it
easier to stitch together the final runner payloads
2024-09-03 21:15:13 -04:00
Daniel Hiltgen
13348e3629
Get linux building
...
Still needs a bit more refinement to (auto)detect cuda/hip and fall back
gracefully if not detected.
2024-09-03 21:15:13 -04:00
jmorganca
3d5a08c315
add note in readme
2024-09-03 21:15:13 -04:00
jmorganca
a29851bc9b
clean up metal code
2024-09-03 21:15:13 -04:00
jmorganca
8dda9293fa
fix Makefile
on windows
2024-09-03 21:15:13 -04:00
jmorganca
b3c62dcafd
remove printing
2024-09-03 21:15:13 -04:00
jmorganca
9b8b7cd9b5
don't apply license to stb_image.h
and json.hpp
2024-09-03 21:15:13 -04:00
jmorganca
1da6c40f4f
lint
2024-09-03 21:15:13 -04:00
jmorganca
76ca2de06e
update sync header
2024-09-03 21:15:13 -04:00
jmorganca
dded27dcfa
fix metal
2024-09-03 21:15:13 -04:00
jmorganca
080b600865
add header to not edit
2024-09-03 21:15:13 -04:00
jmorganca
d6b6de9a5a
add header to not edit
2024-09-03 21:15:13 -04:00
jmorganca
24a741424f
fix build on windows
2024-09-03 21:15:13 -04:00
jmorganca
4d476d894e
fix Makefile
2024-09-03 21:15:13 -04:00
jmorganca
bd94ddfc56
fix README.md
2024-09-03 21:15:13 -04:00
jmorganca
f1f54c5bd5
fix README.md
2024-09-03 21:15:13 -04:00
jmorganca
18662d1180
consistent whitespace
2024-09-03 21:15:13 -04:00
jmorganca
083a9e9b4e
link metal
2024-09-03 21:15:13 -04:00
jmorganca
d0703eaf44
wip
2024-09-03 21:15:13 -04:00
jmorganca
ce00e387c3
wip meta
2024-09-03 21:15:13 -04:00
jmorganca
763d7b601c
sync
2024-09-03 21:15:13 -04:00
jmorganca
4d0e6c55b0
remove perl docs
2024-09-03 21:15:13 -04:00
jmorganca
3375b82c56
remove build scripts
2024-09-03 21:15:13 -04:00
jmorganca
b8c1065ab6
remove need for perl
2024-09-03 21:15:13 -04:00
jmorganca
a632a04426
fix output
2024-09-03 21:15:13 -04:00
jmorganca
110f37ffb0
arch build
2024-09-03 21:15:13 -04:00
jmorganca
f2f03ff7f2
add temporary makefile
2024-09-03 21:15:13 -04:00
jmorganca
ba0ff1c46a
fix cuda and rocm builds
2024-09-03 21:15:13 -04:00
jmorganca
9966a055e5
fix cgo flags for darwin amd64
2024-09-03 21:15:13 -04:00
jmorganca
7aa7a3c1e5
remove -fPIC
from build_hipblas.sh
2024-09-03 21:15:13 -04:00
jmorganca
de634b7fd7
fix issues with runner
2024-09-03 21:15:13 -04:00
jmorganca
795753be7e
move sync script back in for now
2024-09-03 21:15:13 -04:00
jmorganca
0eed68fed4
llama: sync
2024-09-03 21:15:13 -04:00
jmorganca
783134a3bb
update to d5c938cd
2024-09-03 21:15:13 -04:00
jmorganca
74a158a79e
add patches
2024-09-03 21:15:13 -04:00