The embeddings endpoint takes only a single input and provides a
single output, rather than the multiple inputs and outputs that the
current implementation expected. Fixing this also allows the
implementation to be simplified
and a few embedding-specific issues to be addressed.
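A minimal sketch of the single-input shape on the runner side; the struct and field names here are hypothetical, not the actual API:

```go
// Hypothetical request/response types for the embedding endpoint:
// one prompt in, one embedding out, rather than parallel slices.
type EmbeddingRequest struct {
	Content string `json:"content"`
}

type EmbeddingResponse struct {
	Embedding []float32 `json:"embedding"`
}
```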
There are multiple causes and paths that result in a sequence
ending. Not all of these free the sampling context or reset the
pieces slice. This factors out the removal code so that all
paths release resources.
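A minimal sketch of the factored-out cleanup, with illustrative type and field names rather than the runner's real ones; the point is that every sequence-ending path funnels through one function:

```go
// Illustrative types only; the real runner's names may differ.
type SamplingContext struct{ /* C-side state behind a cgo handle */ }

func (c *SamplingContext) Free() { /* releases the underlying C object */ }

type Sequence struct {
	samplingCtx *SamplingContext
	pieces      []string // decoded pieces accumulated for this sequence
}

type Server struct {
	seqs []*Sequence
}

// removeSequence is the one place a sequence is torn down, so no code
// path can skip freeing the sampling context or resetting the pieces.
func (s *Server) removeSequence(seqIndex int, reason string) {
	seq := s.seqs[seqIndex]
	if seq == nil {
		return
	}

	seq.samplingCtx.Free() // free the C-side sampling context
	seq.pieces = nil       // reset the pieces slice

	s.seqs[seqIndex] = nil
}
```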
Currently, once the KV cache is full, text generation stops. Instead,
we should shift out the oldest context so that new generation can
continue based on more recent context.
This uses the algorithm from llama.cpp that Ollama currently relies on
via the server.cpp code. llama.cpp has other strategies as well, but
they are never enabled through Ollama, so this restores parity.
The algorithm is:
- Retain a configurable number of tokens at the beginning (for things
like beginning of sequence tokens)
- Drop the oldest half of the remaining tokens
- Shift the remaining tokens back so they follow the retained prefix,
leaving the freed space at the end of the cache
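A minimal sketch of that shift, assuming hypothetical cgo wrapper methods KvCacheSeqRm and KvCacheSeqAdd around llama.cpp's llama_kv_cache_seq_rm and llama_kv_cache_seq_add; names and parameters are illustrative:

```go
// kvCache captures the two llama.cpp operations the shift needs; the
// method names are assumptions standing in for the real cgo wrappers.
type kvCache interface {
	KvCacheSeqRm(seqID, p0, p1 int)
	KvCacheSeqAdd(seqID, p0, p1, delta int)
}

// shiftContext frees space for one sequence once the cache is full:
// keep numKeep tokens at the start (e.g. BOS), drop the oldest half of
// the rest, and slide the newest tokens back to follow the retained
// prefix. Returns the new number of tokens in the cache.
func shiftContext(lc kvCache, seqID, numPast, numKeep int) int {
	numDiscard := (numPast - numKeep) / 2

	// Remove cache entries at positions [numKeep, numKeep+numDiscard).
	lc.KvCacheSeqRm(seqID, numKeep, numKeep+numDiscard)

	// Shift positions [numKeep+numDiscard, numPast) down by numDiscard.
	lc.KvCacheSeqAdd(seqID, numKeep+numDiscard, numPast, -numDiscard)

	return numPast - numDiscard
}
```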
The cgo binding for llama_token_to_piece uses a fixed 12-byte buffer,
which is usually but not always enough to hold a token. This increases
the buffer size if needed, similar to what llama.cpp does internally.
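A minimal sketch of the retry, relying on the llama.cpp convention that llama_token_to_piece returns the negated required length when the buffer is too small; the exact argument list of the C call is an assumption and varies between llama.cpp versions, and the surrounding Model type is the existing binding's:

```go
func (m *Model) TokenToPiece(token int) string {
	buf := make([]byte, 12) // enough for most tokens, but not all
	for {
		n := int(C.llama_token_to_piece(
			m.c,
			C.llama_token(token),
			(*C.char)(unsafe.Pointer(&buf[0])),
			C.int32_t(len(buf)),
			true,
		))
		if n >= 0 {
			return string(buf[:n])
		}
		// A negative return means the piece did not fit; -n is the
		// size it needs, so grow the buffer and convert again.
		buf = make([]byte, -n)
	}
}
```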
Batch size defaults to 512 but is configurable. However, llama.go uses
a fixed-size buffer, causing crashes if the batch size is increased.
This changes the array size to follow the configuration.
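A minimal sketch of sizing the buffer from the configuration rather than a compile-time constant; the type and field names are illustrative:

```go
// Illustrative only: the buffer that used to be a fixed-size array is
// now allocated from the configured batch size.
type Server struct {
	batchSize int // from the batch-size option; defaults to 512
}

func (s *Server) newTokenBatch() []int32 {
	// Capacity follows the configuration, so raising the batch size
	// above 512 no longer overruns the buffer.
	return make([]int32, 0, s.batchSize)
}
```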