Evolution Strategies at the Hyperscale

Best AI papers explained - A podcast by Enoch H. Kang

This paper introduces Evolution Guided General Optimization via Low-rank Learning (EGGROLL), a novel algorithm that scales **Evolution Strategies (ES)** to neural networks with billions of parameters. ES is an optimization method that bypasses gradient backpropagation entirely, offering advantages such as handling non-differentiable objectives and superior parallelization. EGGROLL overcomes the memory and computational bottlenecks of traditional ES by replacing expensive full-rank parameter perturbations with **efficient low-rank matrix perturbations**, significantly increasing training throughput; the authors report a **hundredfold speedup** for large models. The method is evaluated across several domains, including reinforcement learning, large language model (LLM) fine-tuning for reasoning tasks, and the stable pretraining of a custom **purely integer-based recurrent language model (EGG)**, demonstrating its efficiency and flexibility. A **theoretical analysis** also shows that the low-rank update converges rapidly to the full-rank ES update.
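
To make the core idea concrete, here is a minimal sketch of an antithetic ES step that perturbs a weight matrix with a low-rank factorization, in the spirit of EGGROLL. This is an illustrative assumption of how such a step could look, not the paper's actual implementation: the function name `low_rank_es_step` and the hyperparameters (`rank`, `sigma`, `population_size`, `lr`) are hypothetical, and a real hyperscale version would keep only the factors `A` and `B` rather than materializing the full perturbation `E`.

```python
import numpy as np

def low_rank_es_step(W, fitness_fn, population_size=8, rank=4, sigma=0.01, lr=0.1):
    """One antithetic ES update using low-rank perturbations E = A @ B.T.

    Hypothetical sketch: names and defaults are illustrative, not from the paper.
    """
    m, n = W.shape
    updates = np.zeros_like(W)
    for _ in range(population_size):
        # Low-rank noise factors: storing A and B costs O((m + n) * r)
        # memory, versus O(m * n) for a full-rank Gaussian perturbation.
        A = np.random.randn(m, rank)
        B = np.random.randn(n, rank)
        # Materialized here only for clarity; at scale one would work
        # with the factors directly and never form the m x n matrix.
        E = (A @ B.T) / np.sqrt(rank)
        # Antithetic evaluation: score the +/- perturbed parameters.
        f_pos = fitness_fn(W + sigma * E)
        f_neg = fitness_fn(W - sigma * E)
        # Fitness-weighted accumulation approximates the ES gradient,
        # with no backpropagation through fitness_fn required.
        updates += (f_pos - f_neg) * E
    return W + lr * updates / (2 * sigma * population_size)

# Toy usage: maximize a smooth fitness over a small weight matrix.
W = np.zeros((16, 16))
fitness = lambda M: -np.sum((M - 1.0) ** 2)
for _ in range(200):
    W = low_rank_es_step(W, fitness)
print(round(float(W.mean()), 2))  # entries drift toward the optimum at 1.0
```

Note the design trade-off this sketch illustrates: because each worker only needs the thin factors `A` and `B` (or the seed that generated them), communication and memory per perturbation shrink dramatically, which is what enables the throughput gains the paper reports, while the theoretical analysis guarantees the resulting update stays close to full-rank ES.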