Even with temperature set to zero, LLMs may not produce identical answers, because non-determinism arises from the inference process itself rather than from sampling randomness. The usual explanation blames GPU concurrency combined with floating-point non-associativity: rounding depends on the order of operations. Research from Thinking Machines Lab identifies a deeper cause, a lack of batch invariance. Inference servers group concurrent requests into batches, and kernels switch execution strategies depending on the size and composition of each batch. These strategy changes subtly alter numerical results, and the small differences compound token by token into divergent outputs. The effect is most visible in matrix multiplication, attention, and RMSNorm. The fix is to enforce batch-invariant kernels, which makes results reproducible at some cost in throughput.
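To make the floating-point side of this concrete, here is a minimal sketch (assuming PyTorch is installed; the matrix sizes and bfloat16 dtype are illustrative, and the batch effect is typically only visible on a GPU, where different batch sizes can take different kernel paths). It shows that addition order changes rounded results, and that the same row can come out differently when computed alone versus inside a larger batch:

```python
import torch

# Floating-point addition is not associative, so the order in which a
# kernel accumulates partial sums changes the rounded result.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0

# Batch-size sensitivity: the same row, multiplied alone vs. as part of
# a larger batch, may be computed with different reduction strategies.
torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"
A = torch.randn(2048, 2048, dtype=torch.bfloat16, device=device)
B = torch.randn(2048, 2048, dtype=torch.bfloat16, device=device)

row_in_batch = (A @ B)[:1]  # computed as one row of a 2048-row batch
row_alone = A[:1] @ B       # computed as a batch of one
print((row_in_batch - row_alone).abs().max())
# Often nonzero on a GPU; with batch-invariant kernels it would be 0.
```

Batch-invariant kernels close exactly this gap: each row's reduction order is fixed regardless of how many other requests happen to share its batch.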
https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/