Abstract
The Mixture-of-Experts (MoE) architecture improves the scaling of Large Language Models (LLMs), but its higher parameter counts and memory demands create challenges for deployment. In this paper, we present MoE-SVD, a new decomposition-based compression framework tailored for MoE LLMs that requires no extra training. By harnessing Singular Value Decomposition (SVD), MoE-SVD addresses the critical issues of decomposition collapse and matrix redundancy in MoE architectures. Specifically, we first decompose experts into compact low-rank matrices, reducing memory usage and accelerating inference. In particular, we propose a selective decomposition strategy that measures sensitivity metrics based on weight singular values and activation statistics to automatically identify decomposable expert layers. We then share a single V-matrix across all experts and employ top-k selection for the U-matrices. This low-rank matrix sharing and trimming scheme allows for significant parameter reduction while preserving diversity among experts. Comprehensive experiments on Mixtral, Phi-3.5, DeepSeek, and Qwen2 MoE LLMs show that MoE-SVD outperforms other compression methods, achieving a 60% compression ratio and 1.5× faster inference with minimal performance loss.
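The core factorization the abstract describes — low-rank SVD of expert weights with a single V-matrix shared across experts — can be sketched in NumPy. This is an illustrative sketch only, not the paper's implementation: the shared-V construction (truncated SVD of the stacked expert matrices) and the least-squares recovery of each expert's U factor are assumptions made here for illustration, and the paper's sensitivity-based layer selection and top-k U-matrix trimming are omitted.

```python
import numpy as np

def compress_experts(expert_weights, rank):
    """Compress a list of expert weight matrices with one shared V factor.

    expert_weights: list of (d_out, d_in) arrays, one per expert.
    Returns per-expert U factors of shape (d_out, rank) and a single shared
    V of shape (d_in, rank), so each expert is approximated as U_i @ V.T.
    """
    # Shared right factor: truncated SVD of the experts stacked along the
    # output dimension (an assumed proxy for MoE-SVD's shared V-matrix).
    stacked = np.vstack(expert_weights)           # (n_experts * d_out, d_in)
    _, _, Vt = np.linalg.svd(stacked, full_matrices=False)
    V = Vt[:rank].T                               # (d_in, rank), orthonormal columns

    # Per-expert U: since V has orthonormal columns, U_i = W_i @ V is the
    # least-squares minimizer of ||W_i - U_i V^T||_F for each expert.
    U_list = [W @ V for W in expert_weights]
    return U_list, V

# Toy example with 4 experts of shape (64, 128), truncated to rank 16.
rng = np.random.default_rng(0)
experts = [rng.standard_normal((64, 128)) for _ in range(4)]
U_list, V = compress_experts(experts, rank=16)

orig = sum(W.size for W in experts)                     # original parameter count
comp = sum(U.size for U in U_list) + V.size             # compressed parameter count
print(f"parameter ratio: {comp / orig:.2f}")
```

Sharing one V across experts is what makes the parameter savings compound with the number of experts: the per-expert cost drops from `d_out * d_in` to `d_out * rank`, while the `d_in * rank` cost of V is paid only once per layer.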
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 42nd International Conference on Machine Learning |
| Editors | Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff, Jerry Zhu |
| Publisher | PMLR |
| Pages | 35209-35230 |
| Number of pages | 22 |
| Publication status | Published - 19 Jul 2025 |
| Event | 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada Duration: 13 Jul 2025 → 19 Jul 2025 |
Publication series
| Name | Proceedings of Machine Learning Research |
|---|---|
| Volume | 267 |
| ISSN (Electronic) | 2640-3498 |
Conference
| Conference | 42nd International Conference on Machine Learning, ICML 2025 |
|---|---|
| Country/Territory | Canada |
| City | Vancouver |
| Period | 13/07/25 → 19/07/25 |
Bibliographical note
Publisher Copyright: © 2025, by the authors.
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence
Title: MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition