Jincheng Li, Penghai Zhao, Yusheng Hao, Qiang Lin, Weilan Wang, Northwest Minzu University, China
The first stage of Tibetan-Chinese bilingual scene text detection and recognition is detecting the bilingual scene text. The detection results fall into three categories: correctly detected Tibetan text regions, correctly detected Chinese text regions, and non-text regions produced by failed predictions. If the detected image regions are classified accurately, the non-text images can be filtered out before recognition, and the Tibetan and Chinese text images can be passed to separate recognizers. This procedure avoids the complexity of classifying and recognizing two different scripts with a single recognition model, so accurate classification of Tibetan and Chinese text images matters. This paper therefore studies the classification of Tibetan, Chinese, and non-text images using convolutional neural networks. We explore the classification accuracy of convolutional neural networks of different depths on Tibetan text images, Chinese text images, and non-text images, and we compare and analyze these results against those obtained with transfer learning. The results show that, for classifying Tibetan, Chinese, and non-text images in scenes, accuracy saturates with a 7-layer convolutional neural network and increasing the network depth further does not improve the results, which provides a reference for Tibetan-Chinese text image classification.
Convolutional Neural Network, Tibetan-Chinese scene text image, image classification, transfer learning.
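
The filter-then-route procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the label strings, the function name `route_regions`, and the `classify` and `recognizers` callables are all hypothetical placeholders, with `classify` standing in for the three-class convolutional neural network.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical label names; the paper names the three categories
# but does not specify identifiers.
TIBETAN, CHINESE, NON_TEXT = "tibetan", "chinese", "non_text"


def route_regions(
    regions: Iterable[object],
    classify: Callable[[object], str],
    recognizers: Dict[str, Callable[[object], str]],
) -> List[Tuple[str, str]]:
    """Drop non-text regions and send each text region to the
    recognizer registered for its predicted script."""
    results: List[Tuple[str, str]] = []
    for region in regions:
        label = classify(region)  # assumed: the three-class CNN classifier
        if label == NON_TEXT:
            continue  # failed detection: filter before recognition
        recognizer = recognizers.get(label)
        if recognizer is None:
            raise ValueError(f"no recognizer registered for label {label!r}")
        results.append((label, recognizer(region)))
    return results


if __name__ == "__main__":
    # Stub classifier and recognizers, for illustration only.
    stub_labels = {"crop1": TIBETAN, "crop2": NON_TEXT, "crop3": CHINESE}
    stub_recognizers = {
        TIBETAN: lambda r: f"tibetan-text({r})",
        CHINESE: lambda r: f"chinese-text({r})",
    }
    print(route_regions(stub_labels, stub_labels.get, stub_recognizers))
```

In this sketch each recognizer only ever sees one script, which is the reduction in model complexity the abstract argues for.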