ollama

Daniel Hiltgen 17b7186cd7 Enable concurrency by default

This adjusts our default settings to enable multiple models and parallel
requests to a single model.  Users can still override these by the same
env var settings as before.  Parallel has a direct impact on
num_ctx, which in turn can have a significant impact on small VRAM GPUs
so this change also refines the algorithm so that when parallel is not
explicitly set by the user, we try to find a reasonable default that fits
the model on their GPU(s).  As before, multiple models will only load
concurrently if they fully fit in VRAM.
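
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the names pickParallel and estimateBytes, the linear memory estimate, and the candidate range are all assumptions made for the example. The idea is that the total context grows with num_ctx times the parallel count, so when the user has not set a value, candidates are tried from high to low until one fits in free VRAM.

```go
package main

import "fmt"

const (
	MiB uint64 = 1 << 20
	GiB uint64 = 1 << 30
)

// estimateBytes is a hypothetical memory estimate: fixed model weights plus a
// KV cache that grows linearly with the total context (numCtx * parallel).
func estimateBytes(weightBytes, kvBytesPerToken uint64, numCtx, parallel int) uint64 {
	return weightBytes + kvBytesPerToken*uint64(numCtx)*uint64(parallel)
}

// pickParallel is a hypothetical helper. If the user set a value explicitly
// (userParallel > 0) it is returned unchanged. Otherwise candidates from
// maxParallel down to 2 are tried, and the first one whose estimate fits in
// freeVRAM is returned. If none fits, it falls back to 1.
func pickParallel(userParallel, maxParallel int, weightBytes, kvBytesPerToken, freeVRAM uint64, numCtx int) int {
	if userParallel > 0 {
		return userParallel
	}
	for p := maxParallel; p > 1; p-- {
		if estimateBytes(weightBytes, kvBytesPerToken, numCtx, p) <= freeVRAM {
			return p
		}
	}
	return 1
}

func main() {
	weights := 4 * GiB          // assumed model size
	kvPerToken := 128 * 1024    // assumed KV cache bytes per context token
	numCtx := 2048              // per-request context length

	// Plenty of VRAM: the highest candidate fits.
	fmt.Println(pickParallel(0, 4, weights, uint64(kvPerToken), 8*GiB, numCtx))
	// Small GPU: the default is reduced until the model fits.
	fmt.Println(pickParallel(0, 4, weights, uint64(kvPerToken), 4*GiB+600*MiB, numCtx))
	// Explicit user setting is always respected.
	fmt.Println(pickParallel(8, 4, weights, uint64(kvPerToken), 4*GiB+600*MiB, numCtx))
}
```

In this sketch an explicit user setting bypasses the search entirely, which mirrors the commit message's point that the environment variable overrides still apply as before.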

2024-06-21 15:45:05 -07:00

config_test.go

move OLLAMA_HOST to envconfig (#5009)

2024-06-12 18:48:16 -04:00

config.go

Enable concurrency by default

2024-06-21 15:45:05 -07:00