Evidence of DeepSeek R1 memorising benchmark answers?

Hi,

All, there is some possible evidence that DeepSeek R1 could have been trained on benchmark answers, rather than using true reasoning.

These are screenshots produced by a team called Valent.

They have produced 1000 pages of analysis of DeepSeek outputs, showing how similar those outputs are to the official benchmark answers.

I have only dipped into a handful, but for some answers there is a 50-90% similarity.
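
For anyone who wants to sanity-check figures like these, here is a minimal sketch of one way a similarity percentage between a model output and a reference answer could be computed. I don't know which metric Valent actually used. The character-level matching from Python's standard `difflib` is my own assumption, and the function name `similarity_percent` and the example strings are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_percent(model_output: str, reference_answer: str) -> float:
    """Return a 0-100 character-level similarity score between two texts."""
    # Normalise case and whitespace so formatting differences do not dominate.
    a = " ".join(model_output.lower().split())
    b = " ".join(reference_answer.lower().split())
    # ratio() is 2 * matches / (len(a) + len(b)), a value between 0 and 1.
    return 100.0 * SequenceMatcher(None, a, b).ratio()


# Hypothetical strings for illustration only, not taken from any benchmark.
reference = "The answer is 42 because 6 times 7 equals 42."
output = "The answer is 42 since 6 times 7 equals 42."
print(f"{similarity_percent(output, reference):.0f}% similar")
```

A high score from a metric like this is not proof of memorisation by itself, since short or formulaic answers can match closely by chance, which is part of why this needs checking further.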

This is just a small sample, so we cannot get carried away here, but it does suggest this needs to be checked further.

You can check the analysis here: