> [!NOTE]
> You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

### Quantization Level

Model quantization is a technique used to reduce the size of large neural networks with an acceptable reduction in their capabilities and accuracy. For example, a 7B-parameter model stored as 16-bit floats takes roughly 14 GB, while the same weights quantized to 4 bits take roughly 3.5 GB.

By default, Ollama uses 4-bit quantization. To try other quantization levels, use the other tags:

```
ollama run llama2:70b-text-q2_K   # 2-bit quantization (smallest, least accurate)
ollama run llama2:70b-text-q8_0   # 8-bit quantization (largest, most accurate)
```
The higher the number, the more accurate the model is, but the slower it runs and the more memory it requires.

## Customize a model

### Import from GGUF
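Ollama can also build a model from local GGUF weights using a Modelfile. Below is a minimal sketch; the file name `example.gguf` and the model name `example` are placeholders for your own files. First, create a file named `Modelfile` whose `FROM` instruction points at the local GGUF file:

```
# Modelfile: load weights from a local GGUF file (path is a placeholder)
FROM ./example.gguf
```

Then create the model from the Modelfile and run it:

```
ollama create example -f Modelfile
ollama run example
```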