A bunch of LLMs scheduled to launch at the end of January were cancelled or delayed
Mistral Small 3 one-shotting Unsloth's Flappy Bird coding test in 1 min (vs 3 hrs for DeepSeek R1 running off an NVMe drive)
UMbreLLa: Llama 3.3 70B INT4 on an RTX 4070 Ti, achieving up to 9.6 tokens/s! 🚀
Claude’s reasoning model will be scary
R1+Sonnet set a new SOTA on the aider polyglot benchmark, at 14x lower cost than o1
First RTX 5090 LLM results, compared to the 4090 and 6000 Ada
The first performant open-source byte-level model without tokenization has been released. EvaByte is a 6.5B-param model that also has multibyte prediction for faster inference (vs similarly sized tokenized models)
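(Not EvaByte's actual code; just a minimal sketch of what "no tokenization" means here: input ids are simply the raw UTF-8 bytes, so the vocabulary is fixed at 256 and no learned tokenizer is involved.)

```python
# A byte-level model's "tokenizer" is just UTF-8 encoding: every string
# maps to ids in 0..255, with no merges or vocabulary to learn.
text = "héllo"
ids = list(text.encode("utf-8"))
print(ids)        # [104, 195, 169, 108, 108, 111]
print(len(ids))   # 6 positions for 5 characters
```

The cost is more sequence positions per character than a BPE tokenizer would produce, which is presumably why multibyte prediction matters for inference speed.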
I created a single-prompt benchmark (with 5 questions) that anyone can use to easily evaluate LLMs. Mistral-Next somehow vastly outperformed all others. Prompt and more details in the post.
MiniMax-Text-01 - A powerful new MoE language model with 456B total parameters (45.9B activated)
DDR6 RAM and a reasonable GPU should be able to run 70B models at good speed
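(A back-of-envelope check on that claim: single-stream decode is roughly memory-bandwidth bound, so tokens/s is about bandwidth divided by the bytes touched per token. Both numbers below are assumptions, since DDR6 isn't finalized; a sketch, not a measurement.)

```python
# Back-of-envelope decode speed: generation is memory-bandwidth bound,
# so tokens/s ~= effective bandwidth / bytes read per token.
# Both inputs below are assumptions, not measured figures.
model_bytes = 70e9 * 0.5      # 70B params at ~4 bits/weight ≈ 35 GB
bandwidth_bps = 200e9         # speculative dual-channel DDR6, bytes/s
print(f"~{bandwidth_bps / model_bytes:.1f} tokens/s")  # ≈ 5.7 tokens/s
```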
Why I think that NVIDIA Project DIGITS will have 273 GB/s of memory bandwidth
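(The 273 GB/s figure falls out of standard memory math, sketched below assuming a 256-bit LPDDR5X interface at 8533 MT/s; neither is a confirmed spec.)

```python
# Peak bandwidth = (bus width in bytes) * (transfers per second).
# 256-bit LPDDR5X-8533 is an assumption about Project DIGITS, not a spec.
bus_width_bits = 256
transfers_per_s = 8533e6      # LPDDR5X-8533
bandwidth_bps = (bus_width_bits / 8) * transfers_per_s
print(f"{bandwidth_bps / 1e9:.0f} GB/s")  # -> 273 GB/s
```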
I don't get the hype around NVIDIA Project DIGITS?
Llama 4 compute estimates & timeline
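(For rough sizing, the usual estimate is ~6 FLOPs per parameter per training token; the sketch below plugs in purely hypothetical Llama 4 numbers just to show the shape of the calculation.)

```python
# Standard transformer training-compute estimate: FLOPs ≈ 6 * N * D,
# i.e. ~6 FLOPs per parameter per training token.
# All concrete numbers here are hypothetical, for illustration only.
params = 400e9                  # hypothetical parameter count
tokens = 30e12                  # hypothetical training tokens
total_flops = 6 * params * tokens          # 7.2e25 FLOPs
per_gpu = 989e12 * 0.4          # H100 BF16 peak at ~40% utilization
gpu_seconds = total_flops / per_gpu
print(f"{gpu_seconds / 86400 / 1e6:.1f}M GPU-days")  # ≈ 2.1M GPU-days
```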
Are you gonna wait for DIGITS or get the 5090?
Now THIS is interesting
RTX 5000 series official specs
RTX 5090 rumored to have 1.8 TB/s memory bandwidth
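(That rumor is consistent with a 512-bit GDDR7 bus at 28 Gbps per pin; a quick sketch of the arithmetic.)

```python
# Peak bandwidth = (bus width in bits) * (data rate per pin) / 8.
# 512-bit GDDR7 at 28 Gbps/pin matches the rumored configuration.
bus_width_bits = 512
gbps_per_pin = 28e9           # bits per second per pin
bandwidth_bps = bus_width_bits * gbps_per_pin / 8
print(f"{bandwidth_bps / 1e12:.2f} TB/s")  # -> 1.79 TB/s
```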
RTX 5090 Blackwell - Official Price
Killed by LLM – I collected data on AI benchmarks we thought would last years
A new Microsoft paper lists sizes for most of the closed models
For 2025
DeepSeek does not need 5 hours to generate $1 worth of tokens. Due to batching, they can do it in about 1 minute
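(A sketch of that arithmetic, using $2.19 per million output tokens, roughly R1's API price at the time, and an assumed ~25 tokens/s per stream.)

```python
# How long does $1 of output tokens take to generate?
price_per_token = 2.19 / 1e6              # $ per output token (R1 API)
tokens_per_dollar = 1 / price_per_token   # ≈ 457,000 tokens
single_stream_tps = 25                    # assumed per-stream speed
print(f"{tokens_per_dollar / single_stream_tps / 3600:.1f} hours")  # 5.1

# Batching: one server decodes many streams concurrently. At an
# assumed 300 streams, the aggregate rate is 7,500 tokens/s.
batched_tps = 300 * single_stream_tps
print(f"{tokens_per_dollar / batched_tps:.0f} seconds")  # 61
```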
I don't get it.
Aider has released a new, much harder code-editing benchmark since their previous one was saturated. The Polyglot benchmark now tests across 6 languages (C++, Go, Java, JavaScript, Python, and Rust).
Day 10 🙂