
MDMIC: An Augmented Indic Corpus and Joint Multitask Attention-Based Fusion Framework for Cross-Domain, Multi-Intent NLU in LoRes Languages

  • Kathakali Mitra
  • Amit Vishnu Kolasani
  • P. Sai Shruthi
  • Kumarasamy Chelliah
  • Aruna Malapati
  • Mark Lee*
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Advancements in conversational AI have transformed Natural Language Understanding (NLU), enabling systems to interpret user inputs and generate contextually relevant responses, and conversational systems are expanding rapidly into multilingual environments. However, extending these capabilities to low-resource languages remains a critical challenge because annotated datasets are scarce and user utterances often carry information spanning several domains. This work addresses the gap by developing a benchmark dataset, MDMIC, built by applying data augmentation strategies to the MASSIVE dataset. MDMIC captures complex multi-intent, multi-domain user utterances in six low-resource Indic languages from the Dravidian and Indo-Aryan families, and it consistently outperforms MASSIVE on every evaluation metric for Intent Detection (ID), Domain Classification (DC), and Slot Filling (SF). The study then investigates the interdependencies among ID, DC, and SF for complex, multi-structured, cross-domain utterances in low-resource languages. Compared with individual task-specific models, our joint models improve ID by +1.50 pp accuracy and +3.02 pp F1, DC by +1.15 pp accuracy and +3.28 pp F1, and SF by +0.34 pp accuracy and +0.87 pp F1. Building on this relationship, we propose a joint multitask, multilingual model that combines a language-specific XLM-R model with attention-based fusion. Empirical results demonstrate the superiority of this unified architecture in handling linguistic diversity and cross-domain complexity in low-resource NLU.
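The abstract does not spell out MDMIC's augmentation strategies. One plausible way to turn single-intent records into the multi-intent, multi-domain utterances it describes is sketched below, purely as an illustration under stated assumptions. It assumes MASSIVE-style inputs where slots are marked inline as `[slot_type : value]`, converts them to BIO slot tags, and joins two records with a conjunction while merging their intent and domain labels. The names `Example`, `parse_annotated`, and `combine` are hypothetical, and the English default conjunction `"and"` is a placeholder: for an Indic locale it would be that language's own conjunction (e.g. Hindi "और").

```python
import re
from dataclasses import dataclass

# Assumption: slots are annotated inline as "[slot_type : value]",
# loosely modeled on MASSIVE's annotated-utterance markup.
SLOT_PATTERN = re.compile(r"\[(\w+) : ([^\]]+)\]")


@dataclass
class Example:
    tokens: list[str]
    bio_tags: list[str]
    intents: list[str]
    domains: list[str]


def parse_annotated(annot_utt: str, intent: str, domain: str) -> Example:
    """Turn '[slot : value]' markup into whitespace tokens with BIO slot tags."""
    tokens, tags = [], []
    pos = 0
    for match in SLOT_PATTERN.finditer(annot_utt):
        # Plain words before the slot span get the outside tag "O".
        for word in annot_utt[pos:match.start()].split():
            tokens.append(word)
            tags.append("O")
        slot_type, value = match.group(1), match.group(2)
        # First word of a slot value is B-<slot>, the rest are I-<slot>.
        for i, word in enumerate(value.split()):
            tokens.append(word)
            tags.append(("B-" if i == 0 else "I-") + slot_type)
        pos = match.end()
    # Trailing plain words after the last slot span.
    for word in annot_utt[pos:].split():
        tokens.append(word)
        tags.append("O")
    return Example(tokens, tags, [intent], [domain])


def combine(first: Example, second: Example, conjunction: str = "and") -> Example:
    """Hypothetical augmentation: join two examples into one multi-intent,
    multi-domain example; the conjunction token is tagged "O"."""
    tokens = first.tokens + [conjunction] + second.tokens
    tags = first.bio_tags + ["O"] + second.bio_tags
    intents = sorted(set(first.intents + second.intents))
    domains = sorted(set(first.domains + second.domains))
    return Example(tokens, tags, intents, domains)


if __name__ == "__main__":
    a = parse_annotated("wake me up at [time : five am]", "alarm_set", "alarm")
    b = parse_annotated("play [artist_name : adele]", "play_music", "music")
    merged = combine(a, b)
    print(" ".join(merged.tokens))  # wake me up at five am and play adele
    print(merged.bio_tags)
    print(merged.intents, merged.domains)  # ['alarm_set', 'play_music'] ['alarm', 'music']
```

Keeping intents and domains as sorted label sets makes the combined record usable as a multi-label target for ID and DC, while the token-aligned BIO tags remain a standard sequence-labeling target for SF.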

Original language: English
Pages (from-to): 28631-28653
Number of pages: 23
Journal: IEEE Access
Volume: 14
Early online date: 13 Feb 2026
DOIs
Publication status: Published - 25 Feb 2026

Keywords

  • Attention-based fusion
  • data augmentation
  • domain classification
  • intent detection
  • multitask learning
  • slot filling

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
