Ming-Jen Huang, Chun-Fang Huang, Chiching Wei, Foxit Software Inc., USA
With the recent advance of the deep neural network, we observe new applications of natural language processing (NLP) and computer vision (CV) technologies. Especaully, when applying them to document processing, NLP and CV tasks are usually treated individually in research work and open source libraries. However, designing a real-world document processing system needs to weave NLP and CV tasks and their generated information together. There is a need to have a unified approach for processing documents containing textual and graphical elements with rich formats, diverse layout arrangement, and distinct semantics. This paper introduces a framework to fulfil this need. The framework includes a representation model definition for holding the generated information and specifications defining the coordination between the NLP and CV tasks.
Document Processing, Framework, Formal definition, Machine Learning.