60 Commits

Author SHA1 Message Date
Josh Yan
309307c8f9 update test, remove comments 2024-07-17 10:46:50 -07:00
Josh
d069cf753b Merge branch 'main' into jyan/reord-g 2024-07-16 16:42:49 -07:00
Josh Yan
64405525b4 clean up 2024-07-16 16:40:38 -07:00
Josh Yan
6ee22d5080 clean 2024-07-16 16:35:15 -07:00
Josh Yan
873f334783 IT WORKS 2024-07-16 14:12:07 -07:00
Josh Yan
fa49bfc0bd FIXED TESTS 2024-07-16 12:14:10 -07:00
Michael Yang
4a565cbf94 add chat and generate tests with mock runner 2024-07-16 09:39:31 -07:00
Josh Yan
25be20949c test 2024-07-15 15:08:24 -07:00
Josh Yan
40c0f9612e unnecessary 2024-07-14 18:41:16 -07:00
Josh Yan
15a0215203 running 2024-07-12 16:49:57 -07:00
Josh Yan
faa3c937cf writeto 2024-07-12 15:37:27 -07:00
Josh Yan
cf57246aba write 2024-07-12 12:59:51 -07:00
Josh Yan
6fafe4f753 gguf 2024-07-12 12:58:00 -07:00
Josh Yan
d7c8d4f3f4 ggufwritekv 2024-07-12 12:25:13 -07:00
Josh Yan
3d0fd31f0e TensorWriter 2024-07-12 12:18:46 -07:00
Josh Yan
e75fb73839 types 2024-07-12 09:42:10 -07:00
Josh Yan
2fdebffc8d swap 2024-07-11 18:18:26 -07:00
Josh Yan
29ecfe493b write 2024-07-11 17:56:51 -07:00
Blake Mizerany
cb42e607c5 llm: speed up gguf decoding by a lot (#5246)
Previously, several costly operations made loading GGUF files and their
metadata and tensor information very slow:

  * Too many allocations when decoding strings
  * Hitting disk for each read of each key and value, resulting in an
    excessive number of syscalls and disk I/O

The show API is now down to 33ms from 800ms+ for llama3 on a MacBook
Pro M3.

This commit also adds the option to skip collecting large arrays of
values when decoding GGUFs. When such keys are encountered, their
values are left null and are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.
2024-06-24 21:47:52 -07:00
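Most of the speedup described above comes from buffering reads instead of hitting the disk per field. A minimal sketch of that buffered-reading idea, assuming a GGUF-style length-prefixed string field (illustrative only, not ollama's actual decoder; the file name is hypothetical):

```go
package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// readString decodes a GGUF-style string: a uint64 length followed by
// that many bytes. Reading through a bufio.Reader means most reads are
// served from an in-memory buffer rather than issuing one syscall per
// key and value. A real decoder would also sanity-check n before
// allocating, and could reuse buffers to cut per-string allocations.
func readString(r *bufio.Reader) (string, error) {
	var n uint64
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return "", err
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	f, err := os.Open("model.gguf") // hypothetical path
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()

	// One large buffered reader for the whole metadata section.
	br := bufio.NewReaderSize(f, 1<<20)
	s, err := readString(br) // in real GGUF, magic/version come first
	fmt.Println(s, err)
}
```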
Michael Yang
7bdcd1da94 Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
This reverts commit f5f245cc154580fa7b4052c001d2a7e3d771cfb8, reversing
changes made to 94d37fdcae30ddeb6c9f65c8707004f5ec9eaf33.

This change broke GGUF v2, which was incorrectly detected as big endian.
2024-06-11 15:56:17 -07:00
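For context on the endianness bug reverted above: GGUF's version field is a small integer, so one plausible detection heuristic (a sketch under that assumption, not ollama's actual logic) checks whether the value is sane when read little endian before concluding the file is big endian:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// byteOrder guesses a GGUF file's byte order from its raw version
// field. A real version is a small integer (1, 2, 3, ...), so an
// implausibly large little-endian reading suggests the bytes were
// written big endian. Range-checking like this avoids misclassifying
// ordinary v2 files, the failure mode the revert above describes.
func byteOrder(rawVersion uint32) binary.ByteOrder {
	if rawVersion < 0xFFFF {
		return binary.LittleEndian
	}
	return binary.BigEndian
}

func main() {
	// Version 2 written little endian reads back as 2.
	fmt.Println(byteOrder(2)) // LittleEndian
	// Version 2 written big endian reads as 0x02000000 when decoded
	// little endian, which no real GGUF version would be.
	fmt.Println(byteOrder(0x02000000)) // BigEndian
}
```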
Michael Yang
620d5c569e fix parsing big endian gguf 2024-06-08 12:35:26 -07:00
Michael Yang
030e765e76 fix create model when template detection errors 2024-06-07 10:51:35 -07:00
Michael Yang
e40145a39d lint 2024-06-04 11:13:30 -07:00
Michael Yang
171eb040fc simplify safetensors reading 2024-05-21 11:28:22 -07:00
Michael Yang
bbbd9f20f3 cleanup 2024-05-20 16:13:57 -07:00
Michael Yang
547132e820 bpe pretokenizer 2024-05-20 16:13:57 -07:00
Patrick Devine
c8cf0d94ed llama3 conversion 2024-05-20 16:13:57 -07:00
Patrick Devine
14476d48cc fixes for gguf (#3863) 2024-04-23 20:57:20 -07:00
Michael Yang
e74163af4c fix padding to only return padding 2024-04-16 15:43:26 -07:00
Michael Yang
6d53b67c2c Merge pull request #3663 from ollama/mxyng/fix-padding 2024-04-15 17:44:54 -07:00
Michael Yang
969238b19e fix padding in decode
TODO: update padding() to _only_ return the padding
2024-04-15 17:27:06 -07:00
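The distinction these two padding commits draw: the helper should return only the pad bytes needed to reach the next alignment boundary, not the already-aligned offset. A minimal illustration (a hypothetical helper, not ollama's exact code):

```go
package main

import "fmt"

// padding returns only the number of bytes needed to advance offset to
// the next multiple of align. Returning the aligned offset itself was
// the bug the fix above addresses.
func padding(offset, align int64) int64 {
	return (align - offset%align) % align
}

func main() {
	offset := int64(100)
	offset += padding(offset, 32) // 100 -> 128
	fmt.Println(offset)
}
```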
Patrick Devine
9f8691c6c8 Add llama2 / torch models for ollama create (#3607) 2024-04-15 11:26:42 -07:00
Michael Yang
8b2c10061c refactor tensor query 2024-04-10 11:37:20 -07:00
Michael Yang
d338d70492 refactor model parsing 2024-04-01 13:16:15 -07:00
Patrick Devine
5a5efee46b Add gemma safetensors conversion (#3250)
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-03-28 18:54:01 -07:00
Patrick Devine
1b272d5bcd change github.com/jmorganca/ollama to github.com/ollama/ollama (#3347) 2024-03-26 13:04:17 -07:00
Michael Yang
22f326464e Merge pull request #3083 from ollama/mxyng/refactor-readseeker
refactor readseeker
2024-03-16 12:08:56 -07:00
Blake Mizerany
6ce37e4d96 llm,readline: use errors.Is instead of simple == check (#3161)
This fixes some brittle, simple equality checks to use errors.Is. Since
Go 1.13, errors.Is has been the idiomatic way to check for errors.

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-03-15 07:14:12 -07:00
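The brittleness referred to above: a plain == comparison stops matching as soon as an error is wrapped, while errors.Is walks the wrap chain. A minimal demonstration:

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

func main() {
	// Wrapping with %w preserves the underlying sentinel error...
	err := fmt.Errorf("reading header: %w", io.ErrUnexpectedEOF)

	// ...so a simple equality check no longer matches:
	fmt.Println(err == io.ErrUnexpectedEOF) // false

	// ...but errors.Is unwraps the chain and does:
	fmt.Println(errors.Is(err, io.ErrUnexpectedEOF)) // true
}
```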
Michael Yang
0085297928 refactor readseeker 2024-03-12 12:54:18 -07:00
Michael Yang
76bdebbadf decode ggla 2024-03-08 15:46:25 -08:00
Patrick Devine
2c017ca441 Convert Safetensors to an Ollama model (#2824) 2024-03-06 21:01:51 -08:00
Michael Yang
949d7b1c48 add gguf file types (#2532) 2024-02-20 19:06:29 -05:00
Michael Yang
cd22855ef8 refactor tensor read 2024-01-24 10:48:31 -08:00
Michael Yang
eaed6f8c45 add max context length check 2024-01-12 14:54:07 -08:00
Jeffrey Morgan
08f1e18965 Offload layers to GPU based on new model size estimates (#1850)
* select layers based on estimated model memory usage

* always account for scratch vram

* don't load +1 layers

* better estimation for graph alloc

* Update gpu/gpu_darwin.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

* add overhead for cuda memory

* Update llm/llm.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* fix build error on linux

* address comments

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-01-08 16:42:00 -05:00
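The bullets above list the ingredients of the estimate: per-layer memory, scratch and graph allocations, and a CUDA overhead. A back-of-the-envelope sketch with invented names and numbers (the real estimator in #1850 derives these from the model file; nothing here is ollama's code):

```go
package main

import "fmt"

// layersToOffload estimates how many transformer layers fit in VRAM
// after reserving scratch, graph, and CUDA overhead, mirroring the
// accounting the PR bullets describe. Illustrative only: all inputs
// are hypothetical parameters rather than derived from a model.
func layersToOffload(freeVRAM, perLayer, scratch, graph, cudaOverhead int64, total int) int {
	usable := freeVRAM - scratch - graph - cudaOverhead
	if usable <= 0 || perLayer <= 0 {
		return 0
	}
	n := int(usable / perLayer)
	if n > total {
		n = total // never claim more layers than the model has
	}
	return n
}

func main() {
	const GiB = 1 << 30
	// 8 GiB free, ~200 MiB per layer, 512/256/256 MiB reserved,
	// 32-layer model: prints 32 (everything fits).
	fmt.Println(layersToOffload(8*GiB, 200<<20, 512<<20, 256<<20, 256<<20, 32))
}
```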
Michael Yang
56ffc3023a remove per-model types
Mostly replaced by decoding tensors, except for GGML models, which only
support llama.
2023-12-11 09:40:21 -08:00
Michael Yang
5a5dca13b2 comments 2023-12-04 16:59:23 -08:00
Michael Yang
72e7a49aa9 seek instead of copyn 2023-12-04 16:59:23 -08:00
Michael Yang
2cb0fa7d40 split from into one or more models 2023-12-04 16:59:23 -08:00
Michael Yang
199941cd15 fix: gguf int type 2023-11-22 11:40:30 -08:00