Abstract
While deep neural networks (DNNs) deliver state-of-the-art accuracy on applications ranging from face recognition to language translation, this comes at the cost of high computational and space complexity, hindering their deployment on edge devices. To enable efficient DNN inference, a novel approach, called Evolutionary Multi-Objective Model Compression (EMOMC), is proposed to optimize energy efficiency (or model size) and accuracy simultaneously. Specifically, the network pruning and quantization space is explored and exploited through architecture population evolution. Furthermore, by taking advantage of the orthogonality between pruning and quantization, a two-stage pruning and quantization co-optimization strategy is developed, which considerably reduces the time cost of the architecture search. Lastly, different dataflow designs and parameter coding schemes are considered in the optimization process, since they have a significant impact on energy consumption and model size. Owing to the cooperative evolution of different architectures in the population, a set of compact DNNs that offer trade-offs among different objectives (e.g., accuracy, energy efficiency, and model size) can be obtained in a single run. Unlike most existing approaches, which are designed to reduce the size of weight parameters with no significant loss of accuracy, the proposed method aims to achieve a trade-off between desirable objectives in order to meet the different requirements of various edge devices. Experimental results demonstrate that the proposed approach can obtain a diverse population of compact DNNs suitable for a broad range of memory usage and energy consumption requirements. With negligible accuracy loss, EMOMC improves the energy efficiency and model compression rate of VGG-16 on CIFAR-10 by a factor of more than 8.9X and 2.4X, respectively.
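The joint search over pruning and quantization configurations described in the abstract can be sketched as a toy multi-objective evolutionary loop. This is a minimal illustration, not the paper's algorithm: the layer sizes, the accuracy-loss proxy, and all function names are hypothetical assumptions, and a real EMOMC run would evaluate actual trained networks.

```python
import random

random.seed(0)

# Hypothetical per-layer parameter counts of a small network.
LAYER_SIZES = [1000, 5000, 2000]

def random_genome():
    # One (pruning ratio, quantization bit-width) pair per layer.
    return [(random.uniform(0.0, 0.9), random.choice([2, 4, 8, 16]))
            for _ in LAYER_SIZES]

def objectives(genome):
    # Objective 1: model size in bits after pruning and quantization.
    size = sum(n * (1 - p) * b for (p, b), n in zip(genome, LAYER_SIZES))
    # Objective 2: toy accuracy-loss proxy -- heavier pruning and
    # coarser quantization are assumed to hurt accuracy more.
    loss = sum(p ** 2 + 1.0 / b for p, b in genome)
    return size, loss  # both minimized

def dominates(a, b):
    # Pareto dominance for minimization: a is no worse everywhere
    # and strictly better somewhere.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(pop):
    scored = [(g, objectives(g)) for g in pop]
    return [g for g, f in scored
            if not any(dominates(f2, f) for _, f2 in scored if f2 != f)]

def evolve(generations=30, pop_size=40):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        front = pareto_front(pop)
        # Mutate each survivor: jitter pruning ratios, occasionally
        # switch a layer's bit-width.
        children = []
        for g in front:
            child = [(min(0.95, max(0.0, p + random.gauss(0, 0.05))),
                      random.choice([2, 4, 8, 16]) if random.random() < 0.2 else b)
                     for p, b in g]
            children.append(child)
        pop = front + children
        # Refill with fresh random genomes to keep diversity.
        while len(pop) < pop_size:
            pop.append(random_genome())
    return pareto_front(pop)

front = evolve()
```

A single run ends with `front` holding a set of mutually non-dominated configurations, mirroring the abstract's claim that one evolutionary run yields a whole family of size/accuracy trade-offs rather than a single compressed model.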
Original language | English
---|---
Article number | 9492169
Pages (from-to) | 10-21
Number of pages | 12
Journal | IEEE Computational Intelligence Magazine
Volume | 16
Issue number | 3
Early online date | 21 Jul 2021
DOIs |
Publication status | Published - Aug 2021
Bibliographical note
Funding Information: This work is partly supported by the Agency for Science, Technology and Research (A*STAR) under its AME Programmatic Funding Scheme (No. A18A1b0045 and No. A1687b0033).
Publisher Copyright:
© 2005-2012 IEEE.
Keywords
- Deep learning
- Energy consumption
- Quantization (signal)
- Computational modeling
- Sociology
- Memory management
- Energy efficiency
- Deep neural networks
ASJC Scopus subject areas
- Artificial Intelligence
- Theoretical Computer Science