Update README.md

Updated README with a small sub-paragraph about quantization.
This commit is contained in:
Marco D'Almo 2023-12-25 18:34:17 +01:00 committed by GitHub
parent c5f21f73a4
commit 1aecb9c579


@@ -63,6 +63,17 @@ Here are some example open-source models that can be downloaded:
> Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
### Quantization Level
Model quantization is a technique used to reduce the size of large neural networks with an acceptable reduction in their capabilities and accuracy.
By default, Ollama uses 4-bit quantization. To try other quantization levels, use the corresponding tags.
```
ollama run llama2:70b-text-q2_K
ollama run llama2:70b-text-q8_0
```
The higher the number, the more accurate the model is, but the slower it runs, and the more memory it requires.
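As a rough illustration of the memory trade-off, the weight footprint scales with the effective bits per weight of each quantization format. The sketch below uses approximate bits-per-weight figures for a few GGUF formats (the exact values vary by format version, and it ignores KV-cache and runtime overhead), so treat the numbers as ballpark estimates, not authoritative requirements:

```python
def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the model weights alone, in gigabytes.

    Ignores KV cache, activations, and runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9

# Approximate effective bits per weight (assumption, varies by format)
formats = [("q2_K", 2.6), ("q4_0", 4.5), ("q8_0", 8.5)]

# Estimate for a 70B-parameter model at each quantization level
for tag, bits in formats:
    print(f"{tag}: ~{estimated_size_gb(70e9, bits):.0f} GB")
```

This makes the pattern in the text concrete: moving from q2 to q8 roughly triples the memory needed for the same 70B model, which is why higher-bit tags require substantially more RAM.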
## Customize a model
### Import from GGUF