Abstract
Advancements in conversational AI have revolutionized Natural Language Understanding (NLU), enabling systems to interpret user inputs and generate contextually relevant responses, and conversational systems are expanding rapidly into multilingual environments. However, extending these capabilities to low-resource languages remains a critical challenge because annotated datasets are scarce and user utterances often carry complex information from several domains. This work addresses the gap by developing MDMIC, a benchmark dataset built by applying data augmentation strategies to the MASSIVE dataset. MDMIC captures complex multi-intent, multi-domain user utterances in six low-resource Indic languages spanning the Dravidian and Indo-Aryan families, and it consistently outperforms the MASSIVE dataset across all evaluation metrics for Intent Detection (ID), Domain Classification (DC), and Slot Filling (SF). We investigate the inter-dependencies among ID, DC, and SF for complex, multi-structured user utterances that span multiple domains in low-resource languages. Compared with individual task-specific models, our joint models improve ID by +1.50 pp accuracy and +3.02 pp F1, DC by +1.15 pp accuracy and +3.28 pp F1, and SF by +0.34 pp accuracy and +0.87 pp F1. Leveraging this relationship, we propose a joint multitask, multilingual model that combines a language-specific XLM-R model with attention-based fusion. Empirical results demonstrate the superiority of this unified architecture in handling linguistic diversity and cross-domain complexity in low-resource NLU.
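The abstract does not specify which augmentation strategies produce MDMIC. The sketch below illustrates one plausible strategy, assumed here purely for illustration: joining single-intent MASSIVE records from different scenarios into one utterance, converting MASSIVE's `[slot_type : value]` annotations into BIO tags for SF, and collecting the intents and scenarios as multi-label targets for ID and DC. The helper names `annot_to_bio` and `combine`, the `Example` type, and the `conjunction` parameter are hypothetical, and whitespace tokenization is a simplifying assumption.

```python
import re
from dataclasses import dataclass

# Matches MASSIVE-style slot spans such as "[time : five am]".
SLOT_PATTERN = re.compile(r"\[\s*([^:\]]+?)\s*:\s*([^\]]+?)\s*\]")


@dataclass
class Example:
    tokens: list[str]    # whitespace tokens (simplifying assumption)
    bio_tags: list[str]  # one BIO slot tag per token, for SF
    intents: set[str]    # multi-label target for ID
    domains: set[str]    # multi-label target for DC (MASSIVE "scenario")


def annot_to_bio(annot_utt: str) -> tuple[list[str], list[str]]:
    """Convert a MASSIVE-style annotated utterance into tokens and BIO tags."""
    tokens, tags = [], []
    pos = 0
    for match in SLOT_PATTERN.finditer(annot_utt):
        # Plain text before the slot span is tagged "O".
        for tok in annot_utt[pos:match.start()].split():
            tokens.append(tok)
            tags.append("O")
        slot_type, value = match.group(1), match.group(2)
        # First token of the slot value gets B-, the rest get I-.
        for i, tok in enumerate(value.split()):
            tokens.append(tok)
            tags.append(("B-" if i == 0 else "I-") + slot_type)
        pos = match.end()
    # Trailing text after the last slot span is also tagged "O".
    for tok in annot_utt[pos:].split():
        tokens.append(tok)
        tags.append("O")
    return tokens, tags


def combine(records: list[dict], conjunction: str = "and") -> Example:
    """Join single-intent records into one multi-intent, multi-domain example.

    `conjunction` is a hypothetical, language-specific connective.
    """
    tokens, tags, intents, domains = [], [], set(), set()
    for i, rec in enumerate(records):
        if i > 0:
            # The connective carries no slot, so it is tagged "O".
            tokens.append(conjunction)
            tags.append("O")
        toks, bio = annot_to_bio(rec["annot_utt"])
        tokens.extend(toks)
        tags.extend(bio)
        intents.add(rec["intent"])
        domains.add(rec["scenario"])
    return Example(tokens, tags, intents, domains)


if __name__ == "__main__":
    alarm = {"annot_utt": "wake me up at [time : five am]",
             "intent": "alarm_set", "scenario": "alarm"}
    weather = {"annot_utt": "will it rain in [place_name : mumbai]",
               "intent": "weather_query", "scenario": "weather"}
    ex = combine([alarm, weather])
    print(list(zip(ex.tokens, ex.bio_tags)))
    print(sorted(ex.intents), sorted(ex.domains))
```

For the Indic languages in MDMIC, the connective would be replaced by a language-appropriate word, and a subword tokenizer such as XLM-R's would normally re-align the BIO tags to subword pieces; both steps are outside this sketch.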
| Original language | English |
|---|---|
| Pages (from-to) | 28631-28653 |
| Number of pages | 23 |
| Journal | IEEE Access |
| Volume | 14 |
| Early online date | 13 Feb 2026 |
| DOIs | |
| Publication status | Published - 25 Feb 2026 |
Keywords
- attention-based fusion
- data augmentation
- domain classification
- intent detection
- multitask learning
- slot filling
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Engineering
Title
MDMIC: An Augmented Indic Corpus and Joint Multitask Attention-Based Fusion Framework for Cross-Domain, Multi-Intent NLU in LoRes Languages