Chaoyue Liu

I am an Assistant Professor in the ECE department at Purdue University. I obtained my Ph.D. degree in Computer Science from The Ohio State University in 2021, where I was advised by Dr. Misha Belkin. After that, I spent one year at Meta (a.k.a. Facebook) as a research scientist, and two years at the Halıcıoğlu Data Science Institute (HDSI), UC San Diego, working with Dr. Misha Belkin. I also hold B.S. and M.S. degrees in Physics from Tsinghua University.

I am looking for PhD students to join my group. Please email me if you are interested in working with me!

Research Interests: My research focuses on the (theoretical) foundations of deep learning and its applications. I am enthusiastic about studying fundamental deep learning problems and opening the “black box” of deep learning by theoretically understanding neural network models and the dynamics of neural network training. I am also interested in applying these new findings to solve practical problems.

In the past few years, my research has been devoted to identifying the fundamental properties of neural networks and/or algorithms that are responsible for fast training in practice. By doing so, we were able to establish optimization theories and develop accelerated algorithms for neural networks. Lately, I have also been working on fundamental problems of deep learning, including the properties and training dynamics of attention models, feature learning, the effect of architecture on feature representation, and so on.

My research interests also include: experimentally finding new phenomena in deep learning and understanding/explaining them using mathematical tools, and the connections between optimization and generalization performance of neural networks.

News

  • 2024/08: Joining Purdue University, ECE department, as an Assistant Professor.
  • 2024/05: Our paper on catapult dynamics of SGD was accepted by ICML 2024! (with Libin, Adityanarayanan and Misha) [Preprint]
  • 2024/01: Our paper on quadratic models for understanding neural network catapult dynamics was accepted by ICLR 2024! [Preprint]
  • 2023/09: One paper was accepted by NeurIPS 2023! arXiv version: arXiv:2306.02601
  • 2023/06: New paper showing that spikes in SGD training loss are catapult dynamics, with Libin Zhu, Adityanarayanan Radhakrishnan, Misha Belkin. See arXiv:2306.04815
  • 2023/06: New paper on the large learning rate and fast convergence of SGD for wide neural networks, with Dmitriy Drusvyatskiy, Misha Belkin, Damek Davis and Yi-An Ma. See arXiv:2306.02601
  • 2023/06: New paper studying the mechanism underlying clean-priority learning in the noisy-label scenario, with Amirhesam Abedsoltan and Misha Belkin. See arXiv:2306.02533
  • 2023/05: New paper showing the effect of ReLU non-linear activation on the NTK condition number, with Like Hui. See arXiv:2305.08813
  • 2022/09: I am now a postdoc at the Halıcıoğlu Data Science Institute at UC San Diego.

Publications

On the Predictability of Fine-grained Cellular Network Throughput using Machine Learning Models
Omar Basit*, Phuc Dinh*, Imran Khan*, Z. Jonny Kong*, Y. Charlie Hu, Dimitrios Koutsonikolas, Myungjin Lee, Chaoyue Liu (to appear in IEEE MASS 2024)

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning [pdf]
Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin
International Conference on Machine Learning (ICML), 2024.

Quadratic models for understanding neural network dynamics [pdf]
Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin
International Conference on Learning Representations (ICLR), 2024.

Aiming towards the minimizers: fast convergence of SGD for overparametrized problems [pdf]
Chaoyue Liu, Dmitriy Drusvyatskiy, Mikhail Belkin, Damek Davis, Yi-An Ma
Neural Information Processing Systems (NeurIPS), 2023.

SGD batch saturation for training wide neural networks [pdf]
Chaoyue Liu, Dmitriy Drusvyatskiy, Mikhail Belkin, Damek Davis, Yi-An Ma
NeurIPS Workshop on Optimization for Machine Learning, 2023.

On Emergence of Clean-Priority Learning in Early Stopped Neural Networks [pdf]
Chaoyue Liu*, Amirhesam Abedsoltan* and Mikhail Belkin
arXiv:2306.02533 (In submission)

ReLU soothes the NTK condition number and accelerates optimization for wide neural networks [pdf]
Chaoyue Liu, Like Hui
arXiv:2305.08813 (In submission)

Loss landscapes and optimization in over-parameterized non-linear systems and neural networks [pdf]
Chaoyue Liu, Libin Zhu, Mikhail Belkin
Applied and Computational Harmonic Analysis (ACHA) 2022.

Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models [pdf]
Chaoyue Liu, Libin Zhu, Mikhail Belkin
International Conference on Learning Representations (ICLR), 2022. (spotlight paper, 5.2% of all submissions)

Transition to linearity of general neural networks with directed acyclic graph architecture [pdf]
Libin Zhu, Chaoyue Liu, Mikhail Belkin
Neural Information Processing Systems (NeurIPS), 2022.

Understanding and Accelerating the Optimization of Modern Machine Learning [pdf]
Chaoyue Liu
Ph.D. dissertation, The Ohio State University. 2021.

Two-Sided Wasserstein Procrustes Analysis. [pdf]
Kun Jin, Chaoyue Liu, Cathy Xia
IJCAI, 2021

On the linearity of large non-linear models: when and why the tangent kernel is constant [pdf]
Chaoyue Liu, Libin Zhu, Mikhail Belkin
Neural Information Processing Systems (NeurIPS), 2020. (spotlight paper, 3.0% of all submissions)

Accelerating SGD with momentum for over-parameterized learning [pdf]
Chaoyue Liu, Mikhail Belkin
International Conference on Learning Representations (ICLR), 2020. (spotlight paper, 4.2% of all submissions)

OTDA: a unsupervised optimal transport framework with discriminant analysis for keystroke inference [pdf]
Kun Jin, Chaoyue Liu, Cathy Xia
IEEE Conference on Communications and Network Security (CNS), 2020

Parametrized accelerated methods free of condition number [pdf]
Chaoyue Liu, Mikhail Belkin
arXiv:1802.10235

Clustering with Bregman divergences: an asymptotic analysis [pdf]
Chaoyue Liu, Mikhail Belkin
Neural Information Processing Systems (NeurIPS), 2016.

*: equal contribution

Talks

  • Why does SGD converge so fast on over-parameterized neural networks, CSE AI Seminar, CSE @ UCSD, Apr 2024.
  • Why does SGD converge so fast on over-parameterized neural networks, Information Theory and Applications (ITA) workshop, San Diego, Feb 2024.
  • Transition to Linearity & Optimization Theories of Wide Neural Networks, Control and Pizza (Co-PI) seminar, ECE @ UCSD, Nov 2023.
  • Transition to Linearity of Wide Neural Networks, Math Machine Learning Seminar, Max Planck Institute & UCLA, Apr 2022.
  • Large Non-linear Models: Transition to Linearity & An Optimization Theory, NSF-Simons Journal Club, Jan 2021.
  • Accelerating SGD with Momentum for over-parameterized learning, MoDL workshop, Dec 2020.
  • Clustering with Bregman divergences: an asymptotic analysis, CSE AI seminar, Ohio State University, 2017.

Teaching

  • Purdue University, ECE 57000: Artificial Intelligence, 24 Fall. (Instructor, with Xiaoqian Wang)
  • OSU CSE 5523: Machine Learning, 17’Sp, 18’Sp, 19’Sp. (Teaching assistant)
  • OSU CSE 3421: Intro. to Computer Architecture, 18’Au. (Teaching assistant)
  • OSU CSE 2111: Modeling and Problem Solving with Spreadsheets and Databases, 16’Sp. (Teaching assistant)
  • OSU CSE 2321: Discrete Structure, 15’Au. (Teaching assistant)

Services

Organizer

Reviewer

  • 2024: ICLR, ICML, NeurIPS, TMLR, JMLR
  • 2023: ICLR, NeurIPS, ICML, TMLR, IEEE TNNLS, IMA, NeuroComputing
  • 2022: ICLR, NeurIPS, ICML, TMLR, AAAI, Swiss NSF grant
  • 2021: ICLR, NeurIPS, ICML, JASA, AAAI
  • 2020: NeurIPS, ICML, AAAI, UAI
  • 2019: NeurIPS, ICML, UAI
  • 2018: NeurIPS