From 1aecb9c579195b0c9f6f729f951a79f4b00eed6b Mon Sep 17 00:00:00 2001
From: Marco D'Almo <97637845+moDal7@users.noreply.github.com>
Date: Mon, 25 Dec 2023 18:34:17 +0100
Subject: [PATCH] Update README.md

Updated the README with a short subsection about quantization.
---
 README.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/README.md b/README.md
index e70efe05..6f7a9730 100644
--- a/README.md
+++ b/README.md
@@ -63,6 +63,19 @@ Here are some example open-source models that can be downloaded:
 
 > Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
 
+### Quantization Level
+
+Model quantization is a technique for reducing the size of a large neural network at the cost of an acceptable reduction in its capability and accuracy.
+
+By default, Ollama uses 4-bit quantization. To try other quantization levels, use the corresponding model tags:
+
+```
+ollama run llama2:70b-text-q2_K
+ollama run llama2:70b-text-q8_0
+```
+
+The higher the number, the more accurate the model is, but the slower it runs and the more memory it requires. As a rough rule of thumb, the weights alone take parameter count × bits ÷ 8 bytes, so a 70B model needs on the order of 17.5 GB of memory at 2-bit quantization versus roughly 70 GB at 8-bit.
+
 ## Customize a model
 
 ### Import from GGUF