While a model is loading, VRAM metrics fluctuate, so prefer loading onto a GPU that has no model actively loading, or wait for in-flight loads to finish; otherwise a race on the free-memory reading can lead to OOM errors.
github.com/ollama/ollama