Update README.md
Updated the README with a small sub-paragraph about quantization.
This commit is contained in:
parent c5f21f73a4
commit 1aecb9c579
README.md (11 additions, 0 deletions)
@@ -63,6 +63,17 @@ Here are some example open-source models that can be downloaded:

> Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

### Quantization Level
Model quantization is a technique used to reduce the size of large neural networks with an acceptable reduction in their capability and accuracy.

By default, Ollama uses 4-bit quantization. To try other quantization levels, use the corresponding model tags:
```
ollama run llama2:70b-text-q2_K
ollama run llama2:70b-text-q8_0
```
The higher the number, the more accurate the model is, but the slower it runs and the more memory it requires.

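For a rough sense of why the quantization level matters for memory, the small Python sketch below estimates weight size as parameters × bits per weight ÷ 8. This is a back-of-the-envelope approximation, not Ollama's actual sizing: the `approx_size_gb` helper is hypothetical, K-quants such as q2_K use slightly more than 2 bits per weight in practice, and runtime overhead such as the KV cache is ignored.

```
# Back-of-the-envelope size of quantized weights, ignoring runtime overhead.
def approx_size_gb(params_billions: float, bits_per_weight: float) -> float:
    # bytes = parameter count * bits per weight / 8, then converted to GB
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# llama2 70B at the tags shown above (treating q2_K as ~2 bits, q8_0 as 8 bits)
print(f"q2_K:         ~{approx_size_gb(70, 2):.0f} GB")  # ~18 GB
print(f"q8_0:         ~{approx_size_gb(70, 8):.0f} GB")  # ~70 GB
print(f"q4 (default): ~{approx_size_gb(70, 4):.0f} GB")  # ~35 GB
```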
## Customize a model
### Import from GGUF