DeepSeek-R1 is trash and the "tests" are fake

https://preview.redd.it/ck9qcpnpktfe1.png?width=1228&format=png&auto=webp&s=d083f67215c9b584c277331cb59121ea7a1e0e35

They claim a 1.5b param model beats GPT-4o and Claude Sonnet. Try it. Heck, try their 70b model: run it through a bunch of tests and it gets beaten by phi4, a 14b param model.

Initially excited by the news, I ran my own tests and was surprised by how poorly these DS-R1 models performed (I mostly tried the 70b and 671b models). I can only conclude the tests are rigged, and I'm calling it now: independent benchmarks will show these results are faked or rigged (for example, by pre-training the models on the tests).
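
If anyone wants to poke at the "trained on the tests" angle themselves, here is a rough sketch of one crude check: feed the model the start of a benchmark question and measure how many of the benchmark's word n-grams show up verbatim in what it writes back. The function names and the default `n=8` are my own picks for illustration, not from any benchmark toolkit, and a high score only hints at memorization. It proves nothing on its own.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(model_output, benchmark_text, n=8):
    """Fraction of the benchmark's n-grams that appear verbatim in the model output.

    Hypothetical helper: n=8 is an arbitrary choice, not a standard threshold.
    Returns 0.0 when the benchmark text is shorter than n words.
    """
    reference = ngrams(benchmark_text, n)
    if not reference:
        return 0.0
    produced = ngrams(model_output, n)
    return len(reference & produced) / len(reference)
```

You would call it as `overlap_ratio(model_reply, original_benchmark_item)` for each test item and look at the items with the highest ratios by hand. Compare against a model you trust as a baseline, since common phrasing will produce some overlap anyway.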