DeepSeek-R1 GGUFs + all distilled 2 to 16-bit GGUFs + 2-bit MoE GGUFs
Hey guys, we uploaded GGUFs for DeepSeek-R1's distilled models, including 2, 3, 4, 5, 6, 8 and 16-bit quants.
For now there's also a Q2_K_L 200GB quant for the large R1 MoE and R1 Zero models (more are uploading).
We also uploaded Unsloth 4-bit dynamic quant versions of the models for higher accuracy.
See all versions of the R1 models, including GGUFs, on Hugging Face: huggingface.co/collections/unsloth/deepseek-r1. For example, the Llama 3 R1 distilled GGUFs are here: https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF
GGUFs:
DeepSeek R1 version | GGUF links |
---|---|
R1 (MoE 671B params) | R1 • R1 Zero |
Llama 3 | Llama 8B • Llama 3 (70B) |
Qwen 2.5 | 14B • 32B |
Qwen 2.5 Math | 1.5B • 7B |
4-bit dynamic quants:
DeepSeek R1 version | 4-bit links |
---|---|
Llama 3 | Llama 8B |
Qwen 2.5 | 14B |
Qwen 2.5 Math | 1.5B • 7B |
Once we finish uploading the big R1 model, our blog will have more detailed instructions on running it via llama.cpp: unsloth.ai/blog/deepseek-r1
Some general steps:
Do not forget the `<|User|>` and `<|Assistant|>` tokens, or use a chat template formatter!
Obtain the latest `llama.cpp` at https://github.com/ggerganov/llama.cpp
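Getting to the point where the example command below can run takes two bits of setup: building `llama.cpp` and downloading a quant. Here is a rough sketch of both, not an official install script. It assumes `git`, `cmake`, a C++ toolchain and the `huggingface-cli` tool (from the `huggingface_hub` Python package) are already installed, and the function names `build_llama_cpp` and `download_quant` are made up for illustration. The binary is copied to `llama.cpp/llama-cli` and the quant is saved in a folder named after the repo so the paths line up with the example command.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Clone and build llama.cpp, then copy llama-cli next to the sources
build_llama_cpp() {
  git clone https://github.com/ggerganov/llama.cpp &&
  cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF &&
  cmake --build llama.cpp/build --config Release -j --target llama-cli &&
  cp llama.cpp/build/bin/llama-cli llama.cpp/
}

# Download one quant file ($2) from a Hugging Face repo ($1)
# into a local folder with the same name as the repo
download_quant() {
  huggingface-cli download "$1" "$2" --local-dir "$1"
}
```

Usage would be `build_llama_cpp` followed by `download_quant unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf`. Build options for GPU backends are left out here and depend on your hardware.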
Example:
    ./llama.cpp/llama-cli \
        --model unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf \
        --cache-type-k q8_0 \
        --threads 16 \
        --prompt '<|User|>What is 1+1?<|Assistant|>' \
        -no-cnv
Example output:
<think>
Okay, so I need to figure out what 1 plus 1 is. Hmm, where do I even start? I remember from school that adding numbers is pretty basic, but I want to make sure I understand it properly.
Let me think, 1 plus 1. So, I have one item and I add another one. Maybe like a apple plus another apple. If I have one apple and someone gives me another, I now have two apples. So, 1 plus 1 should be 2. That makes sense.
Wait, but sometimes math can be tricky. Could it be something else? Like, in a different number system maybe? But I think the question is straightforward, using regular numbers, not like binary or hexadecimal or anything.
...
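To avoid retyping the special tokens by hand, the prompt wrapping from the example command can be factored into small shell helpers. This is only a sketch: `format_prompt` and `ask_r1` are hypothetical names, the flags are copied from the example command above, and the default model path assumes the GGUF file sits where that command expects it.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Wrap a single user message in the R1 chat tokens
format_prompt() {
  printf '<|User|>%s<|Assistant|>' "$1"
}

# Send one question to llama-cli with the same flags as the example command.
# MODEL can be set in the environment to point at a different quant file.
ask_r1() {
  ./llama.cpp/llama-cli \
    --model "${MODEL:-unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf}" \
    --cache-type-k q8_0 \
    --threads 16 \
    --prompt "$(format_prompt "$1")" \
    -no-cnv
}
```

With these defined, `ask_r1 'What is 1+1?'` sends the same prompt as the example command. The helper only covers a single-turn prompt; multi-turn chats are better left to a proper chat template formatter.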
PS. hope you guys have an amazing week! :) Also I'm still uploading stuff - some quants might not be there yet!