
Ruisi Cai

Ph.D. Student, UT Austin

About Me

I’m a third-year Ph.D. student in the VITA Group of the Department of Electrical and Computer Engineering at the University of Texas at Austin, under the supervision of Prof. Zhangyang (Atlas) Wang. Prior to that, I obtained my B.E. degree from the University of Science and Technology of China (USTC).

I’m currently working on machine learning, with a research focus on:

  • Efficient training and inference for large foundation models:
    • Adaptive Frameworks: Elastic Models for Adaptive Deployment, Mixture of Experts (MoE)
    • Long-Context Generation: Long-Context Training & Serving, State Space Models (SSM)
  • AI security and privacy:
    • Trustworthy ML, Robustness of Mixture-of-Experts (MoE), Backdoor Attacks
    • Distributed Training, Task Heterogeneity, Data Scaling

NEWS

  • Dec. 2024. Excited to announce that I have been selected as a recipient of the NVIDIA Fellowship. Thank you, NVIDIA! 💚
  • Sep. 2024. “READ-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design” was accepted to NeurIPS 2024.
  • Sep. 2024. “Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild” was accepted to the NeurIPS 2024 D&B track.
  • May 2024. My intern project at NVIDIA, “Flextron: Many-in-One Flexible Large Language Model,” was accepted for an Oral Presentation at ICML 2024! Check out our work at the [Project Page]!
  • May 2024. “LoCoCo: Dropping In Convolutions for Long Context Compression” was accepted to ICML 2024!
  • Feb. 2024. My teammate, Yeonju Ro, and I were chosen as finalists for the 2024 Qualcomm Innovation Fellowship.
  • Sep. 2023. I’ve just begun my internship at NVIDIA.
  • Sep. 2023. “H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models” was accepted to NeurIPS 2023!
  • Jul. 2023. “Robust Mixture-of-Expert Training for Convolutional Neural Networks” was accepted to ICCV 2023!
  • Apr. 2023. “Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?” was accepted to ICML 2023!
  • Sep. 2022. “Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets” was accepted to NeurIPS 2022!

Publication List

(A superscript * denotes equal contribution)

READ-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Ruisi Cai*, Yeonju Ro*, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Zhangyang Wang
NeurIPS2024: Conference on Neural Information Processing Systems, [Paper] [Code]

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild
Xinyu Zhao*, Guoheng Sun*, Ruisi Cai*, Yukun Zhou*, Pingzhi Li*, Peihao Wang*, Bowen Tan, Yexiao He, Li Chen, Yi Liang, Beidi Chen, Binhang Yuan, Hongyi Wang, Ang Li, Zhangyang Wang, Tianlong Chen
NeurIPS2024 D&B: Datasets and Benchmarks Track, Conference on Neural Information Processing Systems, [Paper] [Code]

Flextron: Many-in-One Flexible Large Language Model
Ruisi Cai, Saurav Muralidharan, Greg Heinrich, Hongxu Yin, Zhangyang Wang, Jan Kautz, Pavlo Molchanov
ICML2024: International Conference on Machine Learning (Oral), [Paper] [Project]

LoCoCo: Dropping In Convolutions for Long Context Compression
Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen
ICML2024: International Conference on Machine Learning, [Paper] [Code]

Robust Mixture-of-Expert Training for Convolutional Neural Networks
Yihua Zhang, Ruisi Cai, Tianlong Chen, Guanhua Zhang, Huan Zhang, Pin-Yu Chen, Shiyu Chang, Zhangyang Wang, Sijia Liu
ICCV2023: International Conference on Computer Vision (Oral), [Paper] [Code]

H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang Wang, Beidi Chen
NeurIPS2023: Conference on Neural Information Processing Systems, [Paper] [Code]

Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?
Ruisi Cai, Zhenyu Zhang, Zhangyang Wang
ICML2023: International Conference on Machine Learning, [Paper] [Code]

Many-Task Federated Learning: A New Problem Setting and a Simple Baseline
Ruisi Cai, Xiaohan Chen, Shiwei Liu, Jayanth Srinivasa, Myungjin Lee, Ramana Kompella, Zhangyang Wang
CVPRW: 2nd Workshop on Federated Learning for Computer Vision, [Paper]

Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets
Ruisi Cai*, Zhenyu Zhang*, Tianlong Chen, Xiaohan Chen, Zhangyang Wang
NeurIPS2022: Conference on Neural Information Processing Systems, [Paper] [Code]

Try everything.

悟已往之不谏，知来者之可追。 (“Realizing the past is beyond mending, I know what is to come can still be pursued.”)