A Survey of Question Answering Evaluation Techniques in the Era of Large Language Models (LLMs)

Authors

Jassir Altheyabi, Khaled Almuteb and Bader Alshemaimry, King Saud University, Saudi Arabia

Abstract

Large language models (LLMs) are increasingly popular in academia and industry owing to their strong performance across a wide range of applications. As LLMs come to play a crucial role in both research and everyday use, their evaluation becomes essential, not only at the task level but also at the societal level, in order to understand potential risks. This article provides a comprehensive review of LLM evaluation methods along three key dimensions: what to evaluate, where to evaluate, and how to evaluate. It covers evaluation tasks in areas such as natural language processing, reasoning, medical applications, ethics, education, the natural and social sciences, and agent applications. The article also discusses evaluation methods and benchmarks, addressing where and how to assess LLM performance. Finally, it summarizes cases in which LLMs succeed or fail across different tasks and highlights important considerations for the evaluation process.

Keywords

Question answering techniques, Large Language Model (LLM), Knowledge base question answering, Open-domain question answering.

Volume 14, Number 22