Evolution Strategies at Scale (Cognizant 2025): First Full-Parameter ES on Billion-Parameter LLMs
A September 2025 paper from {{Cognizant AI Lab}} demonstrated full-parameter {{evolution strategies}} fine-tuning of billion-parameter {{LLMs}} using a population of just 30 perturbations, breaking the prior assumption that ES could not scale past roughly a million parameters.
"Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" (arXiv 2509.24372, September 29 2025) from Cognizant AI Lab is the first paper to apply evolution strategies to full-parameter fine-tuning of LLMs at billion-parameter scale without dimensionality reduction. Cognizant AI Lab is associated with Risto Miikkulainen, a long-time evolutionary computation researcher at UT Austin. The headline result is a population size of just 30, compared to 10,000+ for prior ES on smaller networks — roughly a 300-fold compute reduction in the population dimension. The paper claims four advantages over RL methods like Reward Hacking Classic Examples: 1. Improved tolerance to long-horizon and delayed rewards. 2. Robustness across diverse base LLMs. 3. Reduced susceptibility to reward hacking. 4. Improved training stability versus RL methods. The theoretical claim for why a population of 30 works is that the useful directions for improving an LLM are concentrated in a low-dimensional subspace of parameter space. With small Gaussian noise perturbations applied to billions of parameters, a handful of the 30 samples will tilt slightly uphill on the loss landscape, and averaging cancels the noise component while the signal accumulates. LLMs also turn out to behave more smoothly under small perturbations than expected — small Gaussian neighborhoods sample sensibly nearby behaviors rather than jumping to random outputs. The paper's significance is structural: since the 2017 OpenAI ES paper, ES had been considered fundamentally non-scalable to modern LLMs because covariance-matrix ES (CMA-ES) requires an N-by-N matrix that becomes infeasible at billion-parameter scale. By replacing learned covariance with simple Gaussian noise (the OpenAI 2017 insight) and demonstrating that a population of 30 suffices at billion-parameter scale, the paper extends the viable range of ES by three orders of magnitude. Code is available at github.com/VsonicV/es-fine-tuning-paper.