MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition

  • Wei Li
  • Lujun Li
  • Hao Gu
  • You Liang Huang
  • Mark Lee*
  • Shengjie Sun
  • Wei Xue
  • Yike Guo*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Mixture-of-Experts (MoE) architecture improves the scaling of Large Language Models (LLMs), but its higher parameter counts and memory demands create challenges for deployment. In this paper, we present MoE-SVD, a new decomposition-based compression framework tailored for MoE LLMs that requires no extra training. By harnessing Singular Value Decomposition (SVD), MoE-SVD addresses the critical issues of decomposition collapse and matrix redundancy in MoE architectures. Specifically, we first decompose experts into compact low-rank matrices, yielding accelerated inference and reduced memory use. In particular, we propose a selective decomposition strategy that measures sensitivity metrics based on weight singular values and activation statistics to automatically identify decomposable expert layers. Then, we share a single V-matrix across all experts and employ a top-k selection for U-matrices. This low-rank matrix sharing and trimming scheme allows significant parameter reduction while preserving diversity among experts. Comprehensive experiments on Mixtral, Phi-3.5, DeepSeek, and Qwen2 MoE LLMs show that MoE-SVD outperforms other compression methods, achieving a 60% compression ratio and 1.5× faster inference with minimal performance loss.
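A minimal NumPy sketch of the mechanisms the abstract describes: per-expert low-rank truncation, a sensitivity score built from weight singular values and activation statistics to flag decomposable experts, a single V-matrix shared across the decomposed experts, and top-k trimming of the per-expert U-matrices. All function names, the exact sensitivity formula, the rank/threshold choices, and the top-k ranking rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def svd_truncate(W, rank):
    """Basic per-expert step: W (d_out, d_in) ≈ A @ Vt, folding the
    kept singular values into the left factor A."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]

def sensitivity(W, act_norm, keep_frac=0.25):
    """Hypothetical sensitivity metric: spectral energy outside the kept
    singular values, scaled by the layer's input-activation norm. Experts
    scoring low lose little under truncation; the paper's metric may differ."""
    s = np.linalg.svd(W, compute_uv=False)
    r = max(1, int(len(s) * keep_frac))
    tail = (s[r:] ** 2).sum() / (s ** 2).sum()
    return tail * act_norm

def compress_experts(experts, rank, threshold, act_norms):
    """Decompose only low-sensitivity experts and share one Vt among them."""
    idx = [i for i, W in enumerate(experts)
           if sensitivity(W, act_norms[i]) < threshold]
    if not idx:
        return {}, None, idx
    # Shared right factor: top right singular vectors of the stacked weights.
    stacked = np.vstack([experts[i] for i in idx])
    Vt = np.linalg.svd(stacked, full_matrices=False)[2][:rank, :]  # (rank, d_in)
    # Per-expert left factors: least-squares fit W_i ≈ U_i @ Vt
    # (Vt has orthonormal rows, so U_i = W_i @ Vt.T).
    Us = {i: experts[i] @ Vt.T for i in idx}  # each (d_out, rank)
    return Us, Vt, idx

def trim_top_k(Us, usage, k):
    """Keep only the k most-used U-matrices (ranking by routing frequency
    is an assumption; the paper's selection rule may be different)."""
    keep = sorted(Us, key=lambda i: usage[i], reverse=True)[:k]
    return {i: Us[i] for i in keep}

# Toy usage: 8 random experts, shared-V compression, then top-4 trimming.
rng = np.random.default_rng(0)
experts = [rng.standard_normal((256, 256)) for _ in range(8)]
Us, Vt, idx = compress_experts(experts, rank=32, threshold=0.9,
                               act_norms=[1.0] * 8)
Us = trim_top_k(Us, usage={i: rng.random() for i in idx}, k=4)
```

The storage saving comes from the shared Vt being stored once while each retained expert keeps only a small (d_out, rank) factor, which is the "sharing and trimming" effect the abstract credits for the compression ratio.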

Original language: English
Title of host publication: Proceedings of the 42nd International Conference on Machine Learning
Editors: Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff, Jerry Zhu
Publisher: PMLR
Pages: 35209-35230
Number of pages: 22
Publication status: Published - 19 Jul 2025
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: 13 Jul 2025 → 19 Jul 2025

Publication series

Name: Proceedings of Machine Learning Research
Volume: 267
ISSN (Electronic): 2640-3498

Conference

Conference: 42nd International Conference on Machine Learning, ICML 2025
Country/Territory: Canada
City: Vancouver
Period: 13/07/25 → 19/07/25

Bibliographical note

Publisher Copyright:
© 2025, by the authors.

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
