Nvidia dominates the training of neural networks, but the inference stage — generating responses in real time — demands different math. Traditional GPUs are overkill there and too costly to run, with power consumption driving up total cost of ownership (TCO). By introducing "deterministic inference," Nvidia aims to kill two birds with one stone: undercut emerging competitors like Groq on their own turf and sharply reduce the cost of API requests for cloud providers. This marks the start of a new phase of price wars in hardware.
Source: WSJ / Reuters
#Hardware #Nvidia #Groq #Inference #Chips