PREPRINTS

      
High-dimensional Limit of SGD
for Diagonal Linear Networks
arXiv:2605.17177, 2026   
Begoña García Malaxechebarría, Courtney Paquette,
Maryam Fazel, Dmitriy Drusvyatskiy
Understanding the behavior of stochastic gradient methods is a central problem in modern machine learning.
Recent work has highlighted diagonal linear networks as a simplified yet expressive setting for analyzing
the optimization and generalization properties of neural models. In this work, we show that in the
high-dimensional regime, stochastic gradient descent on diagonal linear networks is well-approximated by
continuous dynamics governed by a stochastic differential equation (SDE), which explicitly decouples the
drift from the gradient noise. We further derive a deterministic partial differential equation whose solution
propagates the relevant state of the iterates and characterizes the time evolution of a broad class of
observable statistics, including the risk, curvature, and other metrics for optimality. Finally, we show that,
under a suitable parametrization, the stochastic dynamics are globally well posed and converge exponentially
fast to zero risk with high probability, yielding a fully explicit non-asymptotic description of their
long-time behavior. Numerical simulations corroborate our theoretical findings.
CONFERENCE PROCEEDINGS

      
The High Line: Exact Risk and Learning Rate Curves
of Stochastic Adaptive Learning Rate Algorithms
Proceedings of NeurIPS, 2024   
Elizabeth Collins-Woodfin, Inbar Seroussi, Begoña García Malaxechebarría,
Andrew W. Mackenzie, Elliot Paquette, Courtney Paquette
We develop a framework for analyzing the training and learning rate dynamics on a large class of
high-dimensional optimization problems, which we call the high line, trained using one-pass stochastic
gradient descent (SGD) with adaptive learning rates. We give exact expressions for the risk and learning
rate curves in terms of a deterministic solution to a system of ODEs. We then investigate in detail
two adaptive learning rates – an idealized exact line search and AdaGrad-Norm – on the least squares
problem. When the data covariance matrix has strictly positive eigenvalues, this idealized exact line
search strategy can exhibit arbitrarily slower convergence when compared to the optimal fixed learning
rate with SGD. Moreover, we exactly characterize the limiting learning rate (as time goes to infinity)
for line search in the setting where the data covariance has only two distinct eigenvalues. For noiseless
targets, we further demonstrate that the AdaGrad-Norm learning rate converges to a deterministic
constant inversely proportional to the average eigenvalue of the data covariance matrix, and identify a
phase transition when the covariance density of eigenvalues follows a power law distribution. We provide
our code for evaluation at
https://github.com/amackenzie1/highline2024.
JOURNAL PUBLICATIONS
Fully-Diverse Lattices from Ramified Cyclic Extensions of Prime Degree
Int. J. Appl. Math. 33(6): 1009-1015, 2020   
J. Carmelo Interlando, Antonio A. Andrade, Begoña García Malaxechebarría, Agnaldo J. Ferrari, Robson R. de Araújo
Let \(p\) be an odd prime. Algebraic lattices of full diversity in
dimension \(p\) are obtained from ramified cyclic extensions of degree \(p\). The \(3\), \(5\),
and \(7\)-dimensional lattices are optimal with respect to sphere packing density
and therefore are isometric to laminated lattices in those dimensions.
THESES
A separable Banach lattice that contains isometric copies of all others
Undergraduate thesis, Universidad de Murcia, 2021
Begoña García Malaxechebarría. Supervised by Antonio Avilés López and José David Rodríguez Abellán; awarded an honors distinction.