Daniel Hiltgen
359b15a597
Handle models with divergent layer sizes
The recent refactoring of the memory prediction assumed all layers
are the same size, but for some models (like deepseek-coder-v2) this
is not the case, so our predictions were significantly off.
2024-06-18 11:05:34 -07:00
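
The failure mode described in the commit message can be illustrated with a minimal sketch. This is not the project's code: the function names, the layer sizes, and the memory budget below are all hypothetical, chosen only to show how assuming uniform layer sizes can overestimate how many layers fit, while summing each layer's own size does not.

```python
# Hypothetical illustration only: names and numbers are invented for this
# sketch and are not taken from the repository.

def layers_that_fit_uniform(layer_sizes, budget):
    """Estimate the layer count assuming every layer matches the first one."""
    per_layer = layer_sizes[0]
    return min(len(layer_sizes), budget // per_layer)


def layers_that_fit_per_layer(layer_sizes, budget):
    """Count layers that fit when each layer's own size is accumulated in order."""
    used = 0
    count = 0
    for size in layer_sizes:
        if used + size > budget:
            break
        used += size
        count += 1
    return count


# Made-up sizes (in MiB): one small layer followed by larger ones.
sizes = [100, 400, 400, 400]
budget = 1000

uniform_estimate = layers_that_fit_uniform(sizes, budget)
per_layer_estimate = layers_that_fit_per_layer(sizes, budget)
print(uniform_estimate, per_layer_estimate)
```

With these invented numbers the uniform assumption reports that all four layers fit, although together they need 1300 MiB against a 1000 MiB budget; the per-layer sum stops at three.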