Managing the Syntactic Blindness of Latent Semantic Analysis


Raja Muhammad Suleman and Ioannis Korkontzelos, Edge Hill University, United Kingdom


Natural Language Processing (NLP) is a sub-field of Artificial Intelligence concerned with automatically analysing and representing human language. NLP has been employed in many applications, such as information retrieval, information processing, and automated answer grading. Several approaches have been developed for understanding the meaning of text, a task commonly known as semantic analysis. Latent Semantic Analysis (LSA) is a widely used corpus-based approach that evaluates the similarity of texts on the basis of semantic relations among their words. LSA has been applied successfully in a variety of language systems for calculating the semantic similarity of texts. However, LSA ignores the structural composition of sentences and therefore suffers from a syntactic blindness problem. It fails to distinguish between sentences that contain semantically similar words but have completely opposite meanings. LSA is also blind to the syntactic structure of a sentence and therefore cannot differentiate between a sentence and a list of keywords; as a result, a comparison between a sentence and an unstructured list of keywords receives a high similarity score. In this research, we propose an algorithmic extension to LSA that focuses on the syntactic composition of sentences to overcome these syntactic blindness problems. We tested our approach on sentence pairs that contain similar words but differ in meaning. Our results show that the extension produces more realistic semantic similarity scores.
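The blindness described above can be illustrated with a minimal sketch (not the authors' implementation, and not full LSA with SVD): like LSA's term vectors, any purely word-based representation discards word order, so sentences built from the same words in a different order receive identical vectors and a perfect similarity score. The bag-of-words cosine below stands in for that word-level representation; the example sentences are illustrative only.

```python
from collections import Counter
import math

def bow_vector(sentence):
    # Term-frequency bag of words: word order is discarded entirely,
    # just as in a term-document matrix before (or after) SVD.
    return Counter(sentence.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

# Same words, opposite meaning: identical vectors, similarity 1.0.
print(cosine(bow_vector("the dog bit the man"),
             bow_vector("the man bit the dog")))  # → 1.0

# A sentence versus a bare keyword list still scores highly.
print(cosine(bow_vector("the dog bit the man"),
             bow_vector("dog man bit")))
```

Because both comparisons ignore syntax, the first pair is scored as identical and the unstructured keyword list still scores above 0.6, which is exactly the behaviour the proposed extension aims to correct.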


Natural Language Processing, Natural Language Understanding, Latent Semantic Analysis, Semantic Similarity.

Volume 10, Number 4