Issues in Named Entity Recognition on Early Modern English Letters
Author | : Vanessa Woldenga-Racine |
Total Pages | : 47 |
Release | : 2019 |
OCLC | : 1129599604 |
Book excerpt: The influx of digitized historical documents into online collections has made these documents much more accessible to researchers and the general public. The data, however, are frequently raw, and are sometimes obtained through automated methods such as optical character recognition. Without rich metadata, the content of these documents is difficult to search and organize. Tasks commonly undertaken in computational linguistics can aid in this endeavour. Historical documents often present challenges for modern systems, however, because their text frequently differs in many ways from the present-day newswire on which these systems are most often trained. In this thesis I explore the task of Named Entity Recognition on texts written in Early Modern English. I investigate three methodologies for bootstrapping training data to train a character-based neural network model. The results show substantial improvements over all baselines, with the best F-measure at 60.31%.
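The excerpt reports its result as an F-measure but does not say how the score was computed. As a minimal sketch, assuming the common exact-match, entity-level convention (an assumption, not something the excerpt confirms), precision, recall and their harmonic mean could be computed as follows. The `Entity` tuple layout, the `f_measure` function and the toy spans are all hypothetical and are not taken from the thesis.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start_token, end_token, label).
Entity = Tuple[int, int, str]


def f_measure(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match entity scoring.

    A predicted entity counts as correct only if its span boundaries
    and its label both match a gold entity exactly.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data for illustration only; these are not results from the thesis.
gold = {(0, 1, "PER"), (5, 5, "LOC"), (9, 10, "PER")}
predicted = {(0, 1, "PER"), (5, 5, "PER"), (9, 10, "PER"), (12, 12, "LOC")}

precision, recall, f1 = f_measure(gold, predicted)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

In the toy data, the prediction at token 5 has the right span but the wrong label, so it counts against both precision and recall. Other scoring conventions (token-level or partial-match) would give different numbers for the same predictions.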