“Boujee” is just a state of mind. Do the best with what you got.
Mistral Small 3 24b is the first model under 70b I’ve seen pass the “apple” test (even using Q4).
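(For anyone unfamiliar: the "apple" test, as usually run around here, asks the model to write ten sentences that each end with the word "apple", then counts how many actually do. A minimal harness sketch for reproducing it locally; the `ollama` client usage and the model tag are assumptions, swap in whatever you're testing:)

```python
import re
import ollama  # pip install ollama; assumes a local Ollama server is running

PROMPT = 'Write 10 sentences that end with the word "apple".'

resp = ollama.chat(
    model="mistral-small:24b",  # example tag, not the only option
    messages=[{"role": "user", "content": PROMPT}],
)
text = resp["message"]["content"]

# Naive sentence split on terminal punctuation, then check each final word.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
passes = sum(1 for s in sentences if re.search(r"\bapple\W*$", s, re.IGNORECASE))
print(f"{passes}/{len(sentences)} sentences end with 'apple'")
```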
My PC 10 seconds after I typed “ollama run deepseek-r1:671b”:
SOLVED: Running Ollama as a Windows service (for use in server environments)
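(One common way to do this, though not the only one, is to wrap `ollama serve` with NSSM from https://nssm.cc. A minimal sketch; the install path is an assumption, and it must be run from an elevated prompt:)

```python
# Register Ollama as a Windows service via NSSM so it survives logouts/reboots.
# Assumes nssm.exe is on PATH and Ollama is installed at the path below.
import subprocess

def run(*args):
    subprocess.run(args, check=True)

run("nssm", "install", "Ollama", r"C:\Program Files\Ollama\ollama.exe", "serve")
# Optional: expose the API to other machines on the network.
run("nssm", "set", "Ollama", "AppEnvironmentExtra", "OLLAMA_HOST=0.0.0.0")
run("nssm", "start", "Ollama")
```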
Running Deepseek R1 IQ2XXS (200GB) from SSD actually works
1.58bit DeepSeek R1 - 131GB Dynamic GGUF
NVIDIA CUDA Toolkit 12.8 is out now! (they skipped 12.7 🤷‍♂️)
Will Deepseek soon be banned in the US?
Confucius-o1-14B
I benchmarked (almost) every model that can fit in 24GB VRAM (Qwens, R1 distils, Mistrals, even Llama 70b gguf)
DeepSeek-R1 distilled models below 70b still struggle with the “apple test”. Here are my test results:
Open WebUI adds reasoning-focused features in two new releases OUT TODAY!!! 0.5.5 adds "Thinking" tag support to streamline reasoning-model chats (works with R1). 0.5.6 brings a new "reasoning_effort" parameter to control cognitive effort.
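(If you'd rather set it over the API than in the UI, here's a sketch against Open WebUI's OpenAI-compatible endpoint. The port, API key, model tag, and the assumption that `reasoning_effort` passes through as a top-level request field are all mine, not confirmed from the release notes:)

```python
import requests

resp = requests.post(
    "http://localhost:3000/api/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPEN_WEBUI_API_KEY"},
    json={
        "model": "deepseek-r1:14b",        # example model tag
        "reasoning_effort": "high",        # e.g. low / medium / high
        "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```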
I ran the "apple test" benchmark on all DeepSeek-R1 distilled versions so you don't have to. Here are the results:
Better R1 experience in Open WebUI
Actual video of DeepSeek-R1 being trained.
Deepseek just uploaded 6 distilled versions of R1 + R1 "full" is now available on their website.
New Thinking Model: Art (Auto Regressive Thinker)
The “apple” test - Why aren’t newer reasoning models doing better on this basic benchmark? (and yes, I know token prediction mechanics play a role)
AutoGen v0.4.2 released
Pre prep day meal suggestions?
What is the best embedding model for OpenWebUI?
New open source SAEs for model steering, including the first ever SAE for Llama 3.3 70b
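(For anyone who hasn't tried SAE steering: the basic move is to add a scaled SAE feature's decoder direction onto the residual stream at one layer during generation. A minimal sketch with a forward hook; the model ID, layer index, strength, and the precomputed feature vector file are all assumptions, and the 70b target obviously needs serious VRAM:)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.3-70B-Instruct"  # example target
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

layer_idx = 40                                 # which residual stream to steer
direction = torch.load("sae_feature_dir.pt")   # unit-norm SAE decoder row (hypothetical file)
alpha = 8.0                                    # steering strength; tune empirically

def steer(module, inputs, output):
    # Decoder layers return a tuple whose first element is the hidden states.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * direction.to(hidden.device, hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.model.layers[layer_idx].register_forward_hook(steer)
try:
    ids = tok("The most important thing in life is", return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=40)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # always detach the hook so later generations are unsteered
```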
Which Local LLMs know best when to speak and when to STFU in group chat agent-to-agent conversations?
Lime Cucumber Gatorade for the win. 🍋‍🟩 🥒 👍
Prep day tomorrow! LET’S DO THIS!!!