
Reuters-128 NIF NER Corpus

Dataset Profile

Odm ID
ce089a95-a81d-4599-9d98-818ee0003237
Title
Reuters-128 NIF NER Corpus
Notes
This English corpus is based on the well-known Reuters-21578 corpus, which contains economic news articles. In particular, we chose 128 articles containing at least one named entity (NE). Compared to the News-100 corpus, the documents of Reuters-128 are significantly shorter and thus carry less context.

To annotate NEs with URIs, we implemented a supporting judgement tool. The input for the tool was a subset of more than 150 Reuters-21578 news articles sampled randomly. First, FOX (Ngonga Ngomo et al., 2011) was used to recognize a first set of NEs, which reduced the manual work to a feasible amount for a dataset of this size. Afterwards, domain experts manually corrected FOX's mistakes using the annotation tool. To support this, the tool highlighted the entities in the texts and added initial URI candidates via simple string matching algorithms. Two scientists manually determined the correct URI for each named entity, with an initial voter agreement of 74%. This low initial agreement rate hints at the difficulty of the disambiguation task. In some cases the judges did not agree initially but came to an agreement shortly after reviewing the cases. While annotating, we left out ticker symbols of companies (e.g., GOOG for Google Inc.), abbreviations, and job descriptions because those are always preceded by the full company name or a person's name, respectively.
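
The notes above report an initial voter agreement of 74% but do not say how it was computed. As a minimal sketch, assuming agreement means the plain fraction of entity mentions for which both judges picked the same URI, it could be calculated as follows. The function name `percent_agreement` and the sample URIs are hypothetical illustrations, not part of the corpus or its tooling.

```python
def percent_agreement(labels_a, labels_b):
    """Return the share of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    if not labels_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical URI choices for four entity mentions (not taken from the corpus).
judge_1 = ["dbr:Google", "dbr:IBM", "dbr:Paris", "dbr:Apple_Inc."]
judge_2 = ["dbr:Google", "dbr:IBM", "dbr:Paris,_Texas", "dbr:Apple_Inc."]

print(percent_agreement(judge_1, judge_2))  # 0.75
```

Raw percent agreement does not correct for chance agreement; a chance-corrected measure such as Cohen's kappa would give a different figure on the same data.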
Author
Ricardo Usbeck
Author Email
Catalogue Link
Dataset Link
Metadata Updated
2015-09-15 13:07:00
Tags
Publication Date
Update Date
Update Frequency
Organization
AKSW
Country
Status
Platform
ckan
Language
en
Version
(not specified)