Augmenting Linguistic Semi-Structured Data for Machine Learning - A Case Study using Framenet


Breno W. S. R. Carvalho (1), Aline Paes (2) and Bernardo Gonçalves (3); (1) IBM Research, Universidade Federal Fluminense (UFF), Brazil; (2) Universidade Federal Fluminense (UFF), Brazil; (3) IBM Research, Brazil


Semantic Role Labelling (SRL) is the process of automatically finding the semantic roles of terms in a sentence. It is an essential step towards creating a machine-meaningful representation of textual information. One public linguistic resource commonly used for this task is the FrameNet Project. FrameNet is a human- and machine-readable lexical database containing a considerable number of annotated sentences; the annotations link sentence fragments to semantic frames. However, while the annotations across all the documents in the dataset link to most of the frames, a large group of frames has no annotations in the documents pointing to them. In this paper, we present a data augmentation method for FrameNet documents that increases the total number of annotations by over 13%. Our approach relies on lexical, syntactic, and semantic aspects of the sentences to provide additional annotations. We evaluate the proposed augmentation method by comparing the performance of a state-of-the-art semantic role labelling system trained on the dataset with and without augmentation.


FrameNet, Frame Semantic Parsing, Semantic Role Labelling, Data Augmentation.

Volume 10, Number 12