Jesse Gross
76718ead40
runner.go: Support MinP parameter
...
MinP is a user-facing parameter that is exposed through the APIs but
is not currently plumbed through.
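For context, min-p filtering keeps only tokens whose probability is at
least MinP times the probability of the most likely token. A minimal
sketch in Go with illustrative names (tokenProb, applyMinP), not the
runner's actual API:

    // applyMinP keeps only candidates whose probability is at least
    // minP times the probability of the most likely token. tokenProb
    // is an illustrative type, not the runner's real structure.
    type tokenProb struct {
        ID   int
        Prob float32
    }

    func applyMinP(cands []tokenProb, minP float32) []tokenProb {
        var maxProb float32
        for _, c := range cands {
            if c.Prob > maxProb {
                maxProb = c.Prob
            }
        }
        kept := cands[:0]
        for _, c := range cands {
            if c.Prob >= minP*maxProb {
                kept = append(kept, c)
            }
        }
        return kept
    }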
2024-09-03 21:15:14 -04:00
Jesse Gross
477f529d26
runner.go: Implement RepeatLastN to penalize repeated tokens
...
RepeatLastN is a user-facing parameter that is exposed through the APIs
but is not currently plumbed through.
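For context, RepeatLastN controls how many of the most recent tokens are
considered when applying the repeat penalty. A hedged sketch of the
classic llama.cpp-style penalty in Go (names are illustrative):

    // applyRepeatPenalty penalizes token IDs that appear in the last
    // repeatLastN tokens of context: positive logits are divided by the
    // penalty and negative logits multiplied, so repeated tokens become
    // less likely. Names are illustrative, not the runner's actual API.
    func applyRepeatPenalty(logits []float32, lastTokens []int, repeatLastN int, penalty float32) {
        if repeatLastN > len(lastTokens) {
            repeatLastN = len(lastTokens)
        }
        recent := make(map[int]bool, repeatLastN)
        for _, t := range lastTokens[len(lastTokens)-repeatLastN:] {
            recent[t] = true
        }
        for id := range logits {
            if !recent[id] {
                continue
            }
            if logits[id] > 0 {
                logits[id] /= penalty
            } else {
                logits[id] *= penalty
            }
        }
    }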
2024-09-03 21:15:14 -04:00
Jesse Gross
69cc5795a7
runner.go: Shift context window when KV cache space is exceeded
...
Currently, once the KV cache is full, text generation stops. Instead,
we should shift out the oldest context so that new generation can
continue based on more recent context.
This uses the algorithm from llama.cpp that Ollama currently uses with
the server.cpp code. Other algorithms exist, but they are never enabled
through Ollama, so this restores parity.
The algorithm is:
- Retain a configurable number of tokens at the beginning (for things
like beginning-of-sequence tokens)
- Drop the oldest half of the remaining tokens
- Shift the remaining new tokens to the back of the cache
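A minimal sketch of that bookkeeping in Go, assuming a flat slice of
cached tokens (the real runner must also shift the KV cache entries
themselves through llama.cpp's KV-cache sequence operations):

    // shiftContext frees space once the cache is full: keep the first
    // numKeep tokens, drop the older half of the rest, and move the
    // newer half back so generation can continue.
    func shiftContext(tokens []int, numKeep int) []int {
        if len(tokens) <= numKeep {
            return tokens
        }
        discard := (len(tokens) - numKeep) / 2
        kept := append([]int(nil), tokens[:numKeep]...)
        return append(kept, tokens[numKeep+discard:]...)
    }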
2024-09-03 21:15:14 -04:00
Jesse Gross
523d84c563
llama.go: Use dynamic buffer for TokenToPiece
...
The cgo binding for llama_token_to_piece uses a fixed 12-byte buffer,
which is usually but not always enough to hold a token. This increases
the buffer size if needed, similar to what llama.cpp does internally.
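A sketch of the retry pattern in Go: llama_token_to_piece returns a
negative value whose magnitude is the required size when the buffer is
too small. tokenToPieceC below stands in for the actual cgo call, whose
exact signature is not shown here:

    // TokenToPiece decodes a token into text with a growable buffer.
    // tokenToPieceC is an assumed thin wrapper around the cgo call to
    // llama_token_to_piece; a negative return is the size required.
    func (m *Model) TokenToPiece(token int) string {
        buf := make([]byte, 12)
        n := m.tokenToPieceC(token, buf)
        if n < 0 {
            buf = make([]byte, -n)
            n = m.tokenToPieceC(token, buf)
        }
        return string(buf[:n])
    }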
2024-09-03 21:15:14 -04:00
Jesse Gross
ed19fad862
llama.go: Make batch memory allocation match configuration
...
Batch size defaults to 512 but is configurable. However, llama.go uses
a fixed-size buffer, causing crashes if the batch size is increased.
This changes the array size to follow the configuration.
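A hedged sketch of sizing the batch storage from configuration rather
than a constant (field and function names are illustrative; the real
code allocates through llama.cpp's batch helpers):

    // newBatch sizes the per-token slots from the configured batch size
    // instead of a hard-coded 512, so larger batch sizes cannot overrun
    // fixed arrays. Field names are illustrative.
    type batch struct {
        tokens []int32
        pos    []int32
        seqIDs []int32
        logits []int8
    }

    func newBatch(nBatch int) *batch {
        return &batch{
            tokens: make([]int32, nBatch),
            pos:    make([]int32, nBatch),
            seqIDs: make([]int32, nBatch),
            logits: make([]int8, nBatch),
        }
    }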
2024-09-03 21:15:14 -04:00
jmorganca
a483a4c4ed
lint
2024-09-03 21:15:14 -04:00
Daniel Hiltgen
e9dd656ff5
Update sync with latest llama.cpp layout, and run against b3485
2024-09-03 21:15:13 -04:00
Daniel Hiltgen
6c0d892498
Prefix all build artifacts with an OS/ARCH dir
...
This will help keep incremental builds from stomping on each other and make it
easier to stitch together the final runner payloads
2024-09-03 21:15:13 -04:00
jmorganca
a29851bc9b
clean up metal code
2024-09-03 21:15:13 -04:00
jmorganca
8dda9293fa
fix Makefile on windows
2024-09-03 21:15:13 -04:00
jmorganca
b3c62dcafd
remove printing
2024-09-03 21:15:13 -04:00
jmorganca
1da6c40f4f
lint
2024-09-03 21:15:13 -04:00
jmorganca
dded27dcfa
fix metal
2024-09-03 21:15:13 -04:00
jmorganca
24a741424f
fix build on windows
2024-09-03 21:15:13 -04:00
jmorganca
083a9e9b4e
link metal
2024-09-03 21:15:13 -04:00
jmorganca
d0703eaf44
wip
2024-09-03 21:15:13 -04:00
jmorganca
ce00e387c3
wip meta
2024-09-03 21:15:13 -04:00
jmorganca
763d7b601c
sync
2024-09-03 21:15:13 -04:00
jmorganca
4d0e6c55b0
remove perl docs
2024-09-03 21:15:13 -04:00
jmorganca
3375b82c56
remove build scripts
2024-09-03 21:15:13 -04:00
jmorganca
a632a04426
fix output
2024-09-03 21:15:13 -04:00
jmorganca
110f37ffb0
arch build
2024-09-03 21:15:13 -04:00
jmorganca
f2f03ff7f2
add temporary makefile
2024-09-03 21:15:13 -04:00
jmorganca
9966a055e5
fix cgo flags for darwin amd64
2024-09-03 21:15:13 -04:00
jmorganca
43efc893d7
basic progress
2024-09-03 21:15:13 -04:00
jmorganca
20afaae020
add more runner params
2024-09-03 21:15:13 -04:00
jmorganca
b2ef3bf490
embeddings
2024-09-03 21:15:12 -04:00
jmorganca
ce15ed6d69
remove dependency on llm
2024-09-03 21:15:12 -04:00
jmorganca
c0b94376b2
grammar
2024-09-03 21:15:12 -04:00
jmorganca
72be8e27c4
sampling
2024-09-03 21:15:12 -04:00
jmorganca
d12db0568e
better example module, add port
2024-09-03 21:15:12 -04:00
jmorganca
ec17359a68
wip
2024-09-03 21:15:12 -04:00
jmorganca
fbc8572859
add llava to runner
2024-09-03 21:15:12 -04:00
jmorganca
28bedcd807
wip
2024-09-03 21:15:12 -04:00
jmorganca
b22d78720e
cuda linux
2024-09-03 21:15:12 -04:00
jmorganca
9547aa53ff
disable log file
2024-09-03 21:15:12 -04:00
jmorganca
a8f91d3cc1
add llava
2024-09-03 21:15:12 -04:00
jmorganca
e86db9381a
avx2
should only add avx2
2024-09-03 21:15:12 -04:00
jmorganca
9fe48978a8
move runner package down
2024-09-03 21:15:12 -04:00
jmorganca
01ccbc07fe
replace static build in llm
2024-09-03 21:15:12 -04:00
jmorganca
0110994d06
Initial llama Go module
2024-09-03 21:15:12 -04:00
jmorganca
2ef3a217d1
add sync of llama.cpp
2024-09-03 21:15:12 -04:00
Michael Yang
fccf8d179f
partial decode ggml bin for more info
2023-08-10 09:23:10 -07:00
Bruce MacDonald
984c9c628c
fix embeddings invalid values
2023-08-09 16:50:53 -04:00
Bruce MacDonald
09d8bf6730
fix build errors
2023-08-09 10:45:57 -04:00
Bruce MacDonald
7a5f3616fd
embed text document in modelfile
2023-08-09 10:26:19 -04:00
Michael Yang
f2074ed4c0
Merge pull request #306 from jmorganca/default-keep-system
...
automatically set num_keep if num_keep < 0
2023-08-08 09:25:34 -07:00
Bruce MacDonald
a6f6d18f83
embed text document in modelfile
2023-08-08 11:27:17 -04:00
Jeffrey Morgan
5eb712f962
trim whitespace before checking stop conditions
...
Fixes #295
2023-08-08 00:29:19 -04:00
Michael Yang
4dc5b117dd
automatically set num_keep if num_keep < 0
...
num_keep defines how many tokens to keep in the context when truncating
inputs. If left at its default value of -1, the server will calculate
num_keep to be the length of the system instructions.
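A minimal sketch of that default in Go (illustrative names, not the
server's actual code):

    // resolveNumKeep fills in num_keep when the client leaves it at its
    // default of -1: keep exactly the tokens that make up the system
    // instructions so they survive truncation.
    func resolveNumKeep(numKeep int, systemTokens []int) int {
        if numKeep >= 0 {
            return numKeep
        }
        return len(systemTokens)
    }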
2023-08-07 16:19:12 -07:00