TY - GEN
T1 - Multi-class Hierarchical Question Classification for Multiple Choice Science Exams
AU - Xu, Dongfang
AU - Jansen, Peter
AU - Martin, Jaycie
AU - Xie, Zhengnan
AU - Yadav, Vikas
AU - Tayyar Madabushi, Harish
AU - Tafjord, Oyvind
AU - Clark, Peter
PY - 2020/5/13
Y1 - 2020/5/13
N2 - Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately. However, developing strong QC algorithms has been hindered by the limited size and complexity of annotated data available. To address this, we present the largest challenge dataset for QC, containing 7,787 science exam questions paired with detailed classification labels from a fine-grained hierarchical taxonomy of 406 problem domains. We then show that a BERT-based model trained on this dataset achieves a large (+0.12 MAP) gain compared with previous methods, while also achieving state-of-the-art performance on benchmark open-domain and biomedical QC datasets. Finally, we show that using this model’s predictions of question topic significantly improves the accuracy of a question answering system by +1.7% P@1, with substantial future gains possible as QC performance improves.
KW - question answering
KW - question classification
UR - http://www.lrec-conf.org/proceedings/lrec2020/LREC-2020.pdf
M3 - Conference contribution
T3 - Language Resources and Evaluation (LREC) proceedings
SP - 5370
EP - 5382
BT - Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)
A2 - Calzolari, Nicoletta
PB - European Language Resources Association (ELRA)
T2 - 12th Conference on Language Resources and Evaluation (LREC 2020)
Y2 - 13 May 2020 through 16 May 2020
ER -