DeepSeek does not need 5 hours to generate $1 worth of tokens. Due to batching, they can generate that much in about 1 minute

I saw this heavily upvoted post and felt it was misleading. All LLM providers use batching during inference, which allows a single instance of an LLM like DeepSeek V3 to serve hundreds of customers at once. If we consider a system such as an 8xH200 node hosting DeepSeek V3, it looks like they can use a batch size of about 256 while still achieving 60 tokens/sec/user. That works out to 256 × 60 ≈ 15,000 tokens/sec for the whole node, which at DeepSeek's output pricing is roughly $1/min or $60/hr. Divide that by the 8 GPUs and it comes to about $7.50/GPU/hr, which is very reasonable.
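Here's the arithmetic as a quick back-of-the-envelope sketch. The batch size and per-user speed are the estimates above; the ~$1.10 per million output tokens price is my assumption (the $1/min figure implies something close to it):

```python
# Back-of-the-envelope check of the throughput/revenue claim.
# Assumed: batch size 256 on an 8xH200 node, 60 tokens/sec/user,
# and an output price of ~$1.10 per million tokens.

batch_size = 256              # concurrent users served by one instance
tokens_per_sec_per_user = 60
price_per_million = 1.10      # USD per 1M output tokens (assumed)
num_gpus = 8

throughput = batch_size * tokens_per_sec_per_user           # node tokens/sec
revenue_per_min = throughput * 60 / 1e6 * price_per_million
revenue_per_hr = revenue_per_min * 60
revenue_per_gpu_hr = revenue_per_hr / num_gpus

print(f"Node throughput: {throughput:,} tokens/sec")        # ~15,360 tokens/sec
print(f"Revenue: ${revenue_per_min:.2f}/min, ${revenue_per_hr:.2f}/hr")  # ~$1/min, ~$61/hr
print(f"Per GPU: ${revenue_per_gpu_hr:.2f}/GPU/hr")         # ~$7.60/GPU/hr
```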

There's a good (but older) post on batching here. Also, note that yes, Sonnet uses batching as well but since we have no idea of the size of the model (it likely has a lot more active params) they have to limit the batch size a lot to still get a reasonable tokens/sec/user which is why it is more expensive. I also think they take higher profit. If any of my calculations seem off please let me know.