ollama

Author	SHA1	Message	Date
Jesse Gross	5d34320b7c	runner.go: Fix off by one in batch size check When adding tokens to a batch, the index is zero based but is checked against being greater than the max batch size. This results in an out-of-bounds access when the final token is added.	2024-09-03 21:15:14 -04:00
Jesse Gross	1c36f36c41	llm: Fix array out-of-bounds memory access when tokenizing tokenize() passes a string length longer than the actual data into llama_tokenize(). This entire string length gets scanned in the C++ code despite there being a NULL terminator in the correct location (because it gets converted into std::string). The result is read of uninitialized memory, which depending on the contents of that memory fails the check for partial multi-byte UTF8 characters. In addition, if there is not enough space in the passed buffer for token output then llama_tokenize() returns the required space as a negative number. We should convert this to a positive number before reallocing. The first problem results in the following splat: libc++abi: terminating due to uncaught exception of type std::invalid_argument: failed to convert utf8 to codepoint SIGABRT: abort PC=0x193cd55f0 m=11 sigcode=0 signal arrived during cgo execution goroutine 27 gp=0x14000708700 m=11 mp=0x14000584908 [syscall]: runtime.cgocall(0x105549e68, 0x140000c6bf8) /opt/homebrew/Cellar/go/1.22.5/libexec/src/runtime/cgocall.go:157 +0x44 fp=0x140000c6bc0 sp=0x140000c6b80 pc=0x104b372c4 github.com/ollama/ollama/llm._Cfunc_llama_tokenize(0x15180f400, 0x152009a00, 0x5aa, 0x140002e8800, 0x5aa, 0x1, 0x1) _cgo_gotypes.go:270 +0x34 fp=0x140000c6bf0 sp=0x140000c6bc0 pc=0x104ef7664 github.com/ollama/ollama/llm.tokenize.func2(0x140001dd800?, 0x152009a00, 0x5aa, 0x1400012cdc0?) /Users/jesse/ollama/llm/llm.go:74 +0x8c fp=0x140000c6c50 sp=0x140000c6bf0 pc=0x104ef83cc github.com/ollama/ollama/llm.tokenize(0x140003f7da0, {0x140001dd800, 0x5a8}) /Users/jesse/ollama/llm/llm.go:74 +0xb4 fp=0x140000c6d90 sp=0x140000c6c50 pc=0x104ef7f94 github.com/ollama/ollama/llm.(llmServer).Tokenize(0x140000c6df8?, {0x105516574?, 0x5a8?}, {0x140001dd800?, 0x140000c6d00?}) /Users/jesse/ollama/llm/server.go:963 +0x2c fp=0x140000c6dc0 sp=0x140000c6d90 pc=0x104ef6b6c github.com/ollama/ollama/llm.LlamaServer.Tokenize-fm({0x105e876f0?, 0x140001e5c70?}, {0x140001dd800?, 0x140000350e0?}) <autogenerated>:1 +0x50 fp=0x140000c6e00 sp=0x140000c6dc0 pc=0x105532fc0 github.com/ollama/ollama/server.chatPrompt({0x105e876f0, 0x140001e5c70}, 0x14000616480, 0x140000c7508, 0x1400013e000, {0x1400014e008, 0x7, 0x7}, {0x0, 0x0, ...}) /Users/jesse/ollama/server/prompt.go:36 +0x2a0 fp=0x140000c7100 sp=0x140000c6e00 pc=0x1055165a0 github.com/ollama/ollama/server.(Server).ChatHandler(0x1400000e9c0, 0x1400011c100) /Users/jesse/ollama/server/routes.go:1340 +0x478 fp=0x140000c7610 sp=0x140000c7100 pc=0x105523318 github.com/ollama/ollama/server.(*Server).ChatHandler-fm(0x9?) <autogenerated>:1 +0x30 fp=0x140000c7630 sp=0x140000c7610 pc=0x105533130	2024-09-03 21:15:14 -04:00
Jesse Gross	0c2f95f3de	runner: Initialize numPredict numPredict is used to enforce a limit on the number of tokens to generate. Is it passed in from Ollama but it is never stored to be checked.	2024-09-03 21:15:14 -04:00
Jesse Gross	ebdf781397	server: Fix double free on runner subprocess error. If the runner subprocess encounters an error, it will close the HTTP connect, which causes Ollama to free the instance of the model that has open. When Ollama exits, it will again try to free the models for all of the runners that were open, resulting in a double free.	2024-09-03 21:15:14 -04:00
Jesse Gross	23c7c1326e	llm: Fix lint	2024-09-03 21:15:14 -04:00
Daniel Hiltgen	8fe30d161c	Fix filename for non darwin arm builds	2024-09-03 21:15:14 -04:00
jmorganca	a483a4c4ed	lint	2024-09-03 21:15:14 -04:00
Daniel Hiltgen	b267ab92b0	Add missing vendor headers to ggml sync	2024-09-03 21:15:14 -04:00
Daniel Hiltgen	189ca38f1d	Wire up native source file dependencies This should make sure incremental builds correctly identify when to rebuild components based on which native files are modified.	2024-09-03 21:15:14 -04:00
Daniel Hiltgen	80db43b7b4	Bump llama sync to 1e6f65	2024-09-03 21:15:14 -04:00
Daniel Hiltgen	47b0e81219	fix dolphin-mistral	2024-09-03 21:15:14 -04:00
Daniel Hiltgen	21947d5c1b	harden integration tests	2024-09-03 21:15:14 -04:00
Daniel Hiltgen	751009a5d7	Runtime selection of new or old runners This adjusts the new runners to comingle with existing runners so we can use an env var to toggle the new runners on.	2024-09-03 21:15:14 -04:00
Daniel Hiltgen	8527028bf4	Implement timings response in Go server This implements the fields necessary for `run --verbose` to generate timing information.	2024-09-03 21:15:14 -04:00
Daniel Hiltgen	e0241118d0	Get embeddings working Truncation doesn't pass, but the other embeddings tests pass	2024-09-03 21:15:14 -04:00
Daniel Hiltgen	f97ee8c506	Fix parallel requests	2024-09-03 21:15:13 -04:00
Daniel Hiltgen	e9dd656ff5	Update sync with latest llama.cpp layout, and run against b3485	2024-09-03 21:15:13 -04:00
Daniel Hiltgen	6c0d892498	Prefix all build artifacts with an OS/ARCH dir This will help keep incremental builds from stomping on each other and make it easier to stitch together the final runner payloads	2024-09-03 21:15:13 -04:00
Daniel Hiltgen	13348e3629	Get linux building Still needs a bit more refinement to (auto)detect cuda/hip and fallback gracefully if not detected.	2024-09-03 21:15:13 -04:00
jmorganca	3d5a08c315	add note in readme	2024-09-03 21:15:13 -04:00
jmorganca	a29851bc9b	clean up metal code	2024-09-03 21:15:13 -04:00
jmorganca	8dda9293fa	fix `Makefile` on windows	2024-09-03 21:15:13 -04:00
jmorganca	b3c62dcafd	remove printing	2024-09-03 21:15:13 -04:00
jmorganca	9b8b7cd9b5	dont apply license to `stb_image.h` and `json.hpp`	2024-09-03 21:15:13 -04:00
jmorganca	1da6c40f4f	lint	2024-09-03 21:15:13 -04:00
jmorganca	76ca2de06e	update sync header	2024-09-03 21:15:13 -04:00
jmorganca	0eabc2e34d	remove unused script	2024-09-03 21:15:13 -04:00
jmorganca	dded27dcfa	fix metal	2024-09-03 21:15:13 -04:00
jmorganca	080b600865	add header to not edit	2024-09-03 21:15:13 -04:00
jmorganca	d6b6de9a5a	add header to not edit	2024-09-03 21:15:13 -04:00
jmorganca	24a741424f	fix build on windows	2024-09-03 21:15:13 -04:00
jmorganca	4d476d894e	fix `Makefile`	2024-09-03 21:15:13 -04:00
jmorganca	bd94ddfc56	fix `README.md`	2024-09-03 21:15:13 -04:00
jmorganca	f1f54c5bd5	fix `README.md`	2024-09-03 21:15:13 -04:00
jmorganca	18662d1180	consistent whitespace	2024-09-03 21:15:13 -04:00
jmorganca	3d1f3569cf	update `.gitattributes`	2024-09-03 21:15:13 -04:00
jmorganca	083a9e9b4e	link metal	2024-09-03 21:15:13 -04:00
jmorganca	d0703eaf44	wip	2024-09-03 21:15:13 -04:00
jmorganca	ce00e387c3	wip meta	2024-09-03 21:15:13 -04:00
jmorganca	763d7b601c	sync	2024-09-03 21:15:13 -04:00
jmorganca	4d0e6c55b0	remove perl docs	2024-09-03 21:15:13 -04:00
jmorganca	3375b82c56	remove build scripts	2024-09-03 21:15:13 -04:00
jmorganca	b8c1065ab6	remove need for perl	2024-09-03 21:15:13 -04:00
jmorganca	a632a04426	fix output	2024-09-03 21:15:13 -04:00
jmorganca	110f37ffb0	arch build	2024-09-03 21:15:13 -04:00
jmorganca	f2f03ff7f2	add temporary makefile	2024-09-03 21:15:13 -04:00
jmorganca	ba0ff1c46a	fix cuda and rocm builds	2024-09-03 21:15:13 -04:00
jmorganca	9966a055e5	fix cgo flags for darwin amd64	2024-09-03 21:15:13 -04:00
jmorganca	7aa7a3c1e5	remove `-fPIC` from `build_hipblas.sh`	2024-09-03 21:15:13 -04:00
jmorganca	de634b7fd7	fix issues with runner	2024-09-03 21:15:13 -04:00

1 2 3 4 5 ...

3529 Commits