You can now run DeepSeek-R1 on your own local device!
Hey amazing people! You might know me for fixing bugs in Microsoft & Google’s open-source models - well I'm back again.
I run the open-source project Unsloth with my brother, and I previously worked at NVIDIA, so optimizations are my thing. Recently there's been a misconception that you can't run DeepSeek-R1 locally - but as of yesterday, we made it possible for even potato devices to handle the actual R1 model!
- We shrank R1 (671B parameters) from 720GB to 131GB (80% smaller) while keeping it fully functional and great to use.
- Over the weekend, we studied R1's architecture, then selectively quantized certain layers to 1.58-bit, 2-bit, etc. This vastly outperforms naively quantizing every layer to the same low bit-width, and it needs minimal compute.
- Minimum requirements: a CPU with 20GB of RAM, plus about 140GB of disk space (to hold the downloaded model weights)
- E.g. if you have an RTX 4090 (24GB VRAM), running R1 will give you at least 2-3 tokens/second (there's a rough offload sketch right after this list).
- Optimal requirements: RAM + VRAM adding up to 80GB+ (this will be pretty fast)
- No, you don't need hundreds of GB of RAM + VRAM, but with 2x H100s you can hit 140 tokens/sec of throughput and 14 tokens/sec for single-user inference, which is even faster than DeepSeek's own API.
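For people asking how much of the model actually fits on their GPU: here's a rough back-of-the-envelope sketch in Python. This is not the exact formula from the blog, just the general idea - and the 61-layer count and 131GB file size are assumptions I'm using for illustration.

```python
# Rough back-of-the-envelope sketch (not the exact math from the blog):
# estimate how many of R1's layers you could offload to the GPU for the
# 1.58-bit dynamic quant, given your VRAM. The layer count (61) and the
# 131GB file size are assumptions for illustration only.

def estimate_gpu_offload_layers(vram_gb: float,
                                file_size_gb: float = 131.0,
                                n_layers: int = 61) -> int:
    """Very rough guess: offloadable layers scale roughly with model size."""
    if vram_gb <= 0:
        return 0
    # Keep a little headroom for the KV cache and compute buffers.
    usable_vram_gb = max(vram_gb - 4.0, 0.0)
    layers = int(usable_vram_gb / file_size_gb * n_layers)
    return max(min(layers, n_layers), 0)

# Example: a 24GB RTX 4090 -> roughly 9 layers on the GPU, the rest in CPU RAM.
print(estimate_gpu_offload_layers(24))
```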
And yes, we collabed with the DeepSeek team on some bug fixes - details are on our blog: unsloth.ai/blog/deepseekr1-dynamic
Hundreds of people have tried running the dynamic GGUFs on their potato devices (mine included) & say they work very well.
R1 GGUFs uploaded to Hugging Face: huggingface.co/unsloth/DeepSeek-R1-GGUF
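If you'd rather script the download than click through the repo, here's a minimal sketch using the huggingface_hub library. The "*UD-IQ1_S*" pattern (meant to match the 1.58-bit dynamic quant) is an assumption on my part - check the repo's file listing for the exact folder names you want.

```python
# Minimal download sketch using the huggingface_hub library.
# The "*UD-IQ1_S*" pattern is an assumption; check the repo file listing
# for the exact quant folder you want before running this.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",
    local_dir="DeepSeek-R1-GGUF",
    allow_patterns=["*UD-IQ1_S*"],  # only grab the 1.58-bit split files
)
```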
To run R1 locally yourself, we have instructions + details here: unsloth.ai/blog/deepseekr1-dynamic
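The blog walks through llama.cpp itself; if you'd rather stay in Python, here's a rough sketch with the llama-cpp-python bindings instead. The split-file path, the quant folder name, and the n_gpu_layers value are my assumptions - adjust them to whatever you actually downloaded and whatever fits your VRAM, and check the model card for R1's proper chat template before doing real conversations.

```python
# Rough sketch using the llama-cpp-python bindings (the blog uses llama.cpp
# directly; this is the same idea from Python). The split-file name and the
# n_gpu_layers value below are assumptions - adjust for your download/VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path=(
        "DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/"
        "DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf"
    ),  # point at the first split file; the rest are picked up automatically
    n_gpu_layers=9,   # e.g. roughly what fits on a 24GB RTX 4090 (see the estimate above)
    n_ctx=2048,       # modest context to keep memory usage down
)

out = llm("Explain the Pythagorean theorem.", max_tokens=256)
print(out["choices"][0]["text"])
```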