Description: agTextMining provides text mining services to datasets.
Abstract: agTextMining provides text mining services to datasets. Specifically, given a PDF, it returns the title, author, references and keywords. Currently, version 1.0 works with IEEE LOM records serialized as XML files.
The module is divided into four services that can be invoked separately. The keyword extractor uses the KEA algorithm and a statistical model to calculate keywords from the text. The title and author are identified by detecting sudden font size changes. Finally, the references are obtained by parsing numbers between brackets.
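The two heuristics described above (font size changes for the title, bracketed numbers for the references) can be illustrated with a minimal Python sketch. This is not the agTextMining implementation. The function names `split_references` and `guess_title` are hypothetical, the input is assumed to be text and (text, font size) pairs already extracted from the PDF, and the title is assumed to be the first run of text set in the largest font.

```python
import re

# Assumption: each reference entry starts with a marker such as "[12]".
_REF_MARKER = re.compile(r"\[(\d+)\]")


def split_references(text):
    """Return {number: entry_text} for entries introduced by "[n]" markers."""
    matches = list(_REF_MARKER.finditer(text))
    refs = {}
    for i, m in enumerate(matches):
        # An entry runs from the end of its marker to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse line breaks and repeated spaces inside the entry.
        refs[int(m.group(1))] = " ".join(text[m.end():end].split())
    return refs


def guess_title(spans):
    """Return the first run of text set in the largest font size.

    `spans` is a list of (text, font_size) pairs in reading order.
    """
    if not spans:
        return ""
    largest = max(size for _, size in spans)
    title_parts = []
    for text, size in spans:
        if size == largest:
            title_parts.append(text)
        elif title_parts:
            break  # font size changed: the title run has ended
    return " ".join(title_parts)
```

For example, `guess_title` skips a small running header, collects the consecutive spans in the largest font, and stops at the first size change, where the author line would typically begin. A real extractor would need extra rules for citations that appear inside the body text and for documents whose title is not the largest text on the page.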