The Hidden Figure Behind GPT-5 Training: Into OpenAI on the Strength of a Single Muon Blog Post

Researcher Keller Jordan reportedly joined OpenAI on the strength of a single blog post about the Muon optimizer, which may be in use for GPT-5 training. Muon is an optimizer for the hidden layers of neural networks that orthogonalizes the update matrix via Newton-Schulz iteration and trains models faster than AdamW. Jordan has criticized the optimizer research literature as being full of methods that have failed to see adoption, and advocates validating new methods on competitive training tasks.
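
The core idea, as described in public write-ups of Muon, is to replace the raw momentum update for a weight matrix with an approximately orthogonalized version of it. Below is a minimal PyTorch sketch of a Newton-Schulz orthogonalization step; the function name, the number of iterations, and the quintic coefficients are assumptions for illustration, not the authoritative implementation.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2D update matrix G.

    Runs a quintic Newton-Schulz iteration of the form
        X <- a*X + (b*A + c*A@A) @ X,   where A = X @ X^T,
    which pushes the singular values of X toward 1. Coefficients are
    the commonly cited values for Muon (assumed here for illustration).
    The reference implementation also runs this in bfloat16 for speed.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.clone()
    # Work with the wide orientation so X @ X^T is the smaller Gram matrix.
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T
    # Normalize so the spectral norm is at most ~1, needed for convergence.
    X = X / (X.norm() + 1e-7)
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    if transposed:
        X = X.T
    return X

# Usage sketch: the orthogonalized momentum replaces the raw update
# in the weight step for each hidden-layer weight matrix.
if __name__ == "__main__":
    momentum = torch.randn(256, 512)
    update = newton_schulz_orthogonalize(momentum)
    print(update.shape)
```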
