Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models

  • Hyundong Jin
  • Hyung Jin Chang
  • Eunwoo Kim*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Continual learning enables pre-trained generative vision-language models (VLMs) to incorporate knowledge from new tasks without retraining on data from previous ones. Recent methods update a visual projector to translate visual information for new tasks, connecting pre-trained vision encoders with large language models. However, such adjustments may cause the models to prioritize visual inputs over language instructions, particularly when learning tasks with repetitive types of textual instructions. To address the neglect of language instructions, we propose a novel framework that grounds the translation of visual information on instructions for language models. We introduce a mixture of visual projectors, each serving as a specialized visual-to-language translation expert based on the given instruction context to adapt to new tasks. To avoid using experts for irrelevant instruction contexts, we propose an expert recommendation strategy that reuses experts for tasks similar to those previously learned. Additionally, we introduce expert pruning to alleviate interference from the use of experts that were cumulatively activated in previous tasks. Extensive experiments on diverse vision-language tasks demonstrate that our method outperforms existing continual learning approaches by generating instruction-following responses.
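The idea of a mixture of visual projectors selected by instruction context can be illustrated with a toy sketch. The abstract does not specify the routing rule, the projector architecture, or the pruning criterion, so everything below is an assumption made for illustration: linear projectors, a linear router over a pooled instruction embedding, top-k softmax gating, and a boolean mask standing in for expert pruning. The class and parameter names are hypothetical, and the expert recommendation strategy is not modeled.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


class InstructionGroundedProjectors:
    """Toy mixture of linear visual projectors routed by an instruction embedding.

    Hypothetical illustration only; not the authors' implementation.
    """

    def __init__(self, num_experts, vis_dim, txt_dim, out_dim, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        # One linear projector (vis_dim -> out_dim) per expert.
        self.experts = rng.normal(0.0, 0.02, size=(num_experts, vis_dim, out_dim))
        # Router maps the instruction embedding to one logit per expert.
        self.router = rng.normal(0.0, 0.02, size=(txt_dim, num_experts))
        self.top_k = top_k
        # Pruning mask: False means the expert may no longer be selected.
        self.active = np.ones(num_experts, dtype=bool)

    def route(self, instr_emb):
        """Return gating weights that are non-zero only for the top-k active experts."""
        k = min(self.top_k, int(self.active.sum()))
        if k == 0:
            raise ValueError("all experts have been pruned")
        logits = np.where(self.active, instr_emb @ self.router, -np.inf)
        top = np.argsort(logits)[-k:]
        weights = np.zeros(len(logits))
        weights[top] = softmax(logits[top])
        return weights

    def project(self, vis_tokens, instr_emb):
        """Project visual tokens (N, vis_dim) to (N, out_dim), conditioned on the instruction."""
        weights = self.route(instr_emb)
        # Weighted sum of expert matrices: (E,) with (E, V, O) gives (V, O).
        mixed = np.tensordot(weights, self.experts, axes=1)
        return vis_tokens @ mixed

    def prune(self, expert_ids):
        """Exclude the given experts from future routing."""
        self.active[list(expert_ids)] = False


rng = np.random.default_rng(1)
model = InstructionGroundedProjectors(num_experts=4, vis_dim=8, txt_dim=6, out_dim=5)
vis_tokens = rng.normal(size=(3, 8))   # three visual tokens from a vision encoder
instr_emb = rng.normal(size=6)         # pooled embedding of the text instruction
projected = model.project(vis_tokens, instr_emb)
gate = model.route(instr_emb)
```

Because the projectors here are linear, mixing the expert matrices before applying them is equivalent to mixing the expert outputs; with non-linear projectors the outputs would have to be combined instead.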
Original language: English
Title of host publication: 2025 IEEE/CVF International Conference on Computer Vision (ICCV)
Publisher: IEEE
Publication status: Accepted/In press - 26 Jun 2025
Event: International Conference on Computer Vision 2025 - Hawaii Convention Center, Honolulu, United States
Duration: 19 Oct 2025 to 23 Oct 2025
https://iccv.thecvf.com/virtual/2025/index.html

Publication series

Name: International Conference on Computer Vision (ICCV)
Publisher: IEEE
ISSN (Print): 1550-5499
ISSN (Electronic): 2380-7504

Conference

Conference: International Conference on Computer Vision 2025
Abbreviated title: ICCV 2025
Country/Territory: United States
City: Honolulu
Period: 19/10/25 to 23/10/25

Bibliographical note

Not yet published as of 07/01/2026.
