Augmenting Linguistic Semi-Structured Data for Machine Learning - A Case Study using FrameNet

Breno W. S. R. Carvalho¹, Aline Paes² and Bernardo Gonçalves³; ¹IBM Research, Universidade Federal Fluminense (UFF), Brazil; ²Universidade Federal Fluminense (UFF), Brazil; ³IBM Research, Brazil

Abstract

Semantic Role Labelling (SRL) is the process of automatically identifying the semantic roles of terms in a sentence. It is an essential step towards creating a machine-meaningful representation of textual information. One public linguistic resource commonly used for this task is the FrameNet Project. FrameNet is a human- and machine-readable lexical database containing a considerable number of annotated sentences; these annotations link sentence fragments to semantic frames. However, although the annotations across all the documents in the dataset link to most of the frames, a large group of frames has no annotations pointing to them. In this paper, we present a data augmentation method for FrameNet documents that increases the total number of annotations by over 13%. Our approach relies on lexical, syntactic, and semantic aspects of the sentences to provide additional annotations. We evaluate the proposed augmentation method by comparing the performance of a state-of-the-art semantic role labelling system trained on the dataset with and without augmentation.

Keywords

FrameNet, Frame Semantic Parsing, Semantic Role Labelling, Data Augmentation.

CS&IT Conference Proceedings