Punjabi Text Classification using Naive Bayes, Centroid and Hybrid Approach

Nidhi and Vishal Gupta, Panjab University, India

Abstract

Punjabi text classification is the process of assigning predefined classes to unlabelled Punjabi text documents. Because of the dramatic increase in the amount of content available in digital form, text classification has become an urgent need for managing digital data efficiently and accurately. Until now, no text classifier has been available for Punjabi text documents. Therefore, in this paper, existing classification algorithms, namely Naïve Bayes and centroid-based classification, are applied to Punjabi text classification. In addition, a new hybrid approach is proposed for Punjabi text documents that combines Naïve Bayes (to extract the relevant features and thereby reduce dimensionality) with ontology-based classification (which acts as the text classifier, using the extracted features). These algorithms are applied to 184 Punjabi sports news articles, classifying them into 7 classes: ਕ੍ਰਿਕਟ (krikaṭ, cricket), ਹਾਕੀ (hākī, hockey), ਕਬੱਡੀ (kabḍḍī, kabaddi), ਫੁਟਬਾਲ (phuṭbāl, football), ਟੈਨਿਸ (ṭainis, tennis), ਬੈਡਮਿੰਟਨ (baiḍmiṇṭan, badminton) and ਓਲੰਪਿਕ (ōlmpik, Olympics).
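The abstract names three classifiers. The Python sketch below illustrates textbook versions of each on a hypothetical toy corpus: multinomial Naïve Bayes with Laplace smoothing, a centroid classifier using cosine similarity, and a hybrid in which Naïve Bayes term statistics select a small feature set per class that a simple term-matching classifier then uses. The corpus, the English placeholder tokens, the ranking rule in the hypothetical `nb_select_features`, and the treatment of the "ontology" as a plain per-class term set are all assumptions for illustration; the paper's domain-specific sports ontology and its exact feature-selection criterion are not described in this abstract.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of pre-tokenised documents: (tokens, label).
# English placeholder tokens stand in for Punjabi words.
TRAIN = [
    (["wicket", "over", "run", "bat"], "cricket"),
    (["run", "bowler", "wicket"], "cricket"),
    (["goal", "stick", "penalty", "corner"], "hockey"),
    (["stick", "goal", "field"], "hockey"),
    (["goal", "striker", "penalty", "kick"], "football"),
    (["kick", "goal", "referee"], "football"),
]


def train_naive_bayes(docs):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing, in log space."""
    class_docs = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    for tokens, label in docs:
        term_counts[label].update(tokens)
    vocab = {t for tokens, _ in docs for t in tokens}
    priors = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    likelihoods = {}
    for c in priors:
        denom = sum(term_counts[c].values()) + len(vocab)
        likelihoods[c] = {t: math.log((term_counts[c][t] + 1) / denom) for t in vocab}
    return priors, likelihoods


def nb_classify(tokens, priors, likelihoods):
    """Pick the class maximising log P(c) + sum of log P(t | c); unseen terms are ignored."""
    scores = {c: priors[c] + sum(likelihoods[c][t] for t in tokens if t in likelihoods[c])
              for c in priors}
    return max(scores, key=scores.get)


def _unit(tokens):
    """Term-frequency vector scaled to unit Euclidean length."""
    tf = Counter(tokens)
    norm = math.sqrt(sum(v * v for v in tf.values()))
    return {t: v / norm for t, v in tf.items()}


def train_centroids(docs):
    """Class centroid = mean of the unit-length TF vectors of that class's documents."""
    sums, counts = defaultdict(Counter), Counter()
    for tokens, label in docs:
        for t, v in _unit(tokens).items():
            sums[label][t] += v
        counts[label] += 1
    return {c: {t: v / counts[c] for t, v in s.items()} for c, s in sums.items()}


def centroid_classify(tokens, centroids):
    """Pick the class whose centroid has the highest cosine similarity to the document."""
    q = _unit(tokens)

    def cosine(centroid):
        dot = sum(v * centroid.get(t, 0.0) for t, v in q.items())
        norm = math.sqrt(sum(v * v for v in centroid.values()))
        return dot / norm if norm else 0.0  # q is already unit length

    return max(centroids, key=lambda c: cosine(centroids[c]))


def nb_select_features(likelihoods, k=2):
    """Assumed NB-based feature selection: per class, keep the k terms whose
    log-likelihood most exceeds their mean log-likelihood in the other classes."""
    selected = {}
    for c in likelihoods:
        others = [o for o in likelihoods if o != c]

        def margin(t):
            return likelihoods[c][t] - sum(likelihoods[o][t] for o in others) / len(others)

        ranked = sorted(likelihoods[c], key=lambda t: (-margin(t), t))  # ties: alphabetical
        selected[c] = set(ranked[:k])
    return selected


def ontology_classify(tokens, ontology):
    """Stand-in for ontology-based classification: the class whose term set
    (here, the NB-selected features) matches the most document tokens wins."""
    hits = {c: sum(t in terms for t in tokens) for c, terms in ontology.items()}
    return max(hits, key=hits.get)


priors, likelihoods = train_naive_bayes(TRAIN)
centroids = train_centroids(TRAIN)
ontology = nb_select_features(likelihoods, k=2)

doc = ["stick", "goal", "corner"]
print(nb_classify(doc, priors, likelihoods))   # hockey
print(centroid_classify(doc, centroids))       # hockey
print(ontology_classify(doc, ontology))        # hockey
```

With `k=2`, each class keeps only its two most distinctive terms (for example `stick` and `corner` for hockey), so the hybrid classifies using 6 terms instead of the full 13-term vocabulary, which mirrors the dimensionality-reduction role the abstract assigns to Naïve Bayes.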

Keywords

Punjabi Text Classification, Hybrid Approach, Naïve Bayes, Centroid-Based Classification, Ontology-Based Classification (Domain-Specific).

CS&IT Conference Proceedings