Dataset | OpenDataMonitor

MLSA - A Multi-layered Reference Corpus for German Sentiment Analysis

Dataset Profile

Odm ID	043ad4fd-c827-427d-bf89-277ab7a53cea
Title	MLSA - A Multi-layered Reference Corpus for German Sentiment Analysis
Notes	Sentence-layer annotation is the most coarse-grained annotation in this corpus. We adhere to the definitions of objectivity and subjectivity introduced in (Wiebe et al., 2005). Additionally, we followed guidelines drawn from (Balahur & Steinberger, 2009). Their clarifications proved quite effective, raising inter-annotator agreement in a sentence-layer polarity annotation task from about 50% to >80%. All sentences were annotated in two dimensions. The first dimension covers the factual nature of the sentence, i.e. whether it provides objective information or is intended to express an opinion, belief or subjective argument. Each sentence is therefore either objective or subjective. The second dimension covers the semantic orientation of the sentence, i.e. its polarity. Each sentence is thus either positive, negative or neutral.
In the second layer, we model the contextually interpreted sentiments at the level of words and NP/PP phrases. That is, the annotation decisions are based on the meaning of the words in the context of the sentence.
Word sentiment markers: Sentiments at the level of individual words are expressed by single-character markers added at the end of the words. A word can be positive (+), negative (-), neutral (no marker), a shifter (~), an intensifier (^), or a diminisher (%). If a word ends with a hyphen (e.g., "auf beziehungs-_ bzw. partnerschaftliche Probleme-"), an underscore is added to the word to prevent misinterpretation of the hyphen as a negative marker. Currently, only words that are part of an NP/PP are marked with sentiment markers. Annotated words are nouns, adjectives, negation particles, prepositions and adverbs. The word-level annotation was done by three annotators working individually. The individual results were harmonized into a single reference annotation.
Phrase level markers: Each phrase is marked up textually by brackets, e.g. "[auf beziehungs-_ bzw. partnerschaftliche Probleme-]". The type of a phrase (NP/PP) is not written to the brackets. We largely follow the annotation model of TIGER for structuring embedded NPs and PPs. Currently, the following limitations with regard to TIGER exist: (1) Adjectival phrases are not marked up. (2) Relative or infinitival clauses are not included in NPs/PPs if they appear at the end of a phrase or if they are discontiguous. We annotate not only the phrases that immediately contain words marked up as polar. Any dependent subphrase (NP/PP) is integrated into all its dominating NPs/PPs, e.g. "[Die tieferen Ursachen [der Faszination+]]". Dependent subphrases without any polar words are also included; however, there is no internal bracketing for them, e.g. "[hohe+ Ansprüche an Qualität und Lage]".
At the level of phrases, we distinguish the following markers: positive (+), negative (-), neutral (0), bipolar (#). The category 'bipolar' is used mainly for coordinations where negative and positive sentiments about something are kept in balance by the writer. This is quite common for many binomial constructions such as "Krieg und Frieden".
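The word markers and phrase brackets described in the notes can be read with a small parser. The following is only a minimal sketch, not the corpus authors' tooling: the names `Word`, `Phrase`, `parse` and `polar_words` are hypothetical, tokens are assumed to be separated by whitespace, and the sketch does not attempt to read phrase-level markers (+, -, 0, #), because the notes do not say where those are written relative to the brackets.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Word-level markers as listed in the dataset notes.
WORD_MARKERS = {
    "+": "positive",
    "-": "negative",
    "~": "shifter",
    "^": "intensifier",
    "%": "diminisher",
}


@dataclass
class Word:
    text: str
    sentiment: str  # "neutral" when the word carries no marker


@dataclass
class Phrase:
    children: List[Union["Word", "Phrase"]] = field(default_factory=list)


Node = Union[Word, Phrase]


def parse_word(token: str) -> Word:
    """Split a token into its surface form and its sentiment marker."""
    if token.endswith("-_"):
        # Protected hyphen: "beziehungs-_" is the neutral word "beziehungs-".
        return Word(token[:-1], "neutral")
    if len(token) > 1 and token[-1] in WORD_MARKERS:
        return Word(token[:-1], WORD_MARKERS[token[-1]])
    return Word(token, "neutral")


def parse(annotated: str) -> List[Node]:
    """Parse a bracketed string into nested Phrase and Word nodes."""
    tokens = annotated.replace("[", " [ ").replace("]", " ] ").split()
    root = Phrase()
    stack = [root]
    for token in tokens:
        if token == "[":
            phrase = Phrase()
            stack[-1].children.append(phrase)
            stack.append(phrase)
        elif token == "]":
            if len(stack) == 1:
                raise ValueError("closing bracket without opening bracket")
            stack.pop()
        else:
            stack[-1].children.append(parse_word(token))
    if len(stack) != 1:
        raise ValueError("opening bracket without closing bracket")
    return root.children


def polar_words(nodes: List[Node]) -> List[Tuple[str, str]]:
    """Collect (text, sentiment) for every non-neutral word, depth-first."""
    found: List[Tuple[str, str]] = []
    for node in nodes:
        if isinstance(node, Phrase):
            found.extend(polar_words(node.children))
        elif node.sentiment != "neutral":
            found.append((node.text, node.sentiment))
    return found


tree = parse("[hohe+ Ansprüche an Qualität und Lage]")
print(polar_words(tree))  # [('hohe', 'positive')]
```

Checking for the "-_" suffix before looking for a marker keeps a protected hyphen from being read as a negative marker, which is the case the underscore convention exists for. Punctuation attached to a word (for example a trailing comma) is not handled here and would need a real tokenizer.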
Author	Interest Group on German Sentiment Analysis (IGGSA)
Author Email	iggsa@sentimental.li
Catalogue Url	http://datahub.io/
Dataset Url	http://thedatahub.org/dataset/mlsa
Metadata Updated	2015-09-15 07:10:04
Tags
Date Released
Date Updated
Update Frequency
Organisation	AKSW
Country
State
Platform	ckan
Language	en
Version	(not set)

CO-FUNDED BY THE EC

This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 611988.

LICENSE

Licensed under a Creative Commons Attribution 4.0 International License.
