Authors
Prasanth Yadla, North Carolina State University, USA
Abstract
Question Answering (QA) systems automatically respond to human queries expressed in natural language, drawing on pre-structured databases or collections of unstructured text documents. These systems sit at the intersection of Natural Language Processing (NLP) and Information Retrieval (IR). Despite significant progress, challenges persist in reducing training time on large-scale datasets and in improving model performance across diverse scenarios. This research builds upon a BERT-based QA system, introducing key techniques to address these limitations. We employ Knowledge Distillation, a model compression technique, to transfer the learned representations of a large deep learning model into a smaller, more efficient one. Additionally, we integrate Data Augmentation to enrich the training dataset with diverse linguistic variations, thereby improving the model's robustness. Furthermore, Linguistic Post-Processing is applied to refine predictions, leveraging domain-specific heuristics to reduce false positives and improve reliability. The proposed system is validated on the Stanford Question Answering Dataset (SQuAD 2.0). By combining data augmentation, knowledge distillation, and linguistic knowledge, we aim to optimize the QA pipeline, reducing computational overhead while maintaining high accuracy. These advances have broad applications, including real-time chatbot systems, domain-specific question answering, and efficient information retrieval over large-scale datasets.
Keywords
Natural Language Processing, Question Answering, Knowledge Distillation, Language Heuristics, Deep Learning.