MLSA - A Multi-layered Reference Corpus for German Sentiment Analysis

Dataset Profile

Odm ID
043ad4fd-c827-427d-bf89-277ab7a53cea
Title
MLSA - A Multi-layered Reference Corpus for German Sentiment Analysis
Notes
Sentence-layer annotation represents the most coarse-grained annotation in this
corpus. We adhere to definitions of objectivity and subjectivity introduced in
(Wiebe et al., 2005). Additionally, we followed guidelines drawn from (Balahur &
Steinberger, 2009). Their clarifications proved to be quite effective, raising
inter-annotator agreement in a sentence-layer polarity annotation task from
about 50% to >80%. All sentences were annotated in two dimensions.

The first dimension covers the factual nature of the sentence, i.e. whether it
provides objective information or is intended to express an opinion, belief or
subjective argument. Each sentence is therefore either objective or subjective.
The second dimension covers the semantic orientation of the sentence, i.e. its
polarity. Each sentence is thus either positive, negative or neutral.
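
The two sentence-layer dimensions can be pictured as a small record type.
This is only an illustrative sketch: the class and field names are
hypothetical and do not reflect the file format in which the corpus is
distributed, and the example sentence is made up.

```python
from dataclasses import dataclass
from enum import Enum


class Factuality(Enum):
    """First dimension: factual nature of the sentence."""
    OBJECTIVE = "objective"
    SUBJECTIVE = "subjective"


class Polarity(Enum):
    """Second dimension: semantic orientation of the sentence."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class SentenceAnnotation:
    # Hypothetical container; the names are not taken from the corpus.
    text: str
    factuality: Factuality
    polarity: Polarity


# Made-up example sentence, not drawn from the corpus.
example = SentenceAnnotation(
    text="Das Hotel war sehr schön.",
    factuality=Factuality.SUBJECTIVE,
    polarity=Polarity.POSITIVE,
)
```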

In the second layer, we model the contextually interpreted sentiments on the
levels of words and NP/PP phrases. That is, the annotation decisions
are based on the meaning of the words in the context of the sentence.

Word sentiment markers:
The sentiments on the level of individual words are expressed by
single character markers added at the end of the words.

A word may be positive (+), negative (-), neutral (no marker), a shifter
(~), an intensifier (^), or a diminisher (%).

If a word ends with a hyphen (e.g., "auf beziehungs-_ bzw.
partnerschaftliche Probleme-"), an underscore is added to the word in
order to prevent misinterpretation of the hyphen as a negative
marker.
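
A minimal sketch of how the word markers could be read back from a token,
including the underscore escape for words that end in a hyphen. The
function name and label strings are hypothetical. It assumes that tokens
are whitespace-separated, that brackets and punctuation have already been
removed, and that a single-character token is never a marker on its own.

```python
# Marker characters as listed above; a word without a marker is neutral.
MARKERS = {
    "+": "positive",
    "-": "negative",
    "~": "shifter",
    "^": "intensifier",
    "%": "diminisher",
}


def parse_word(token: str) -> tuple[str, str]:
    """Split an annotated token into (word, sentiment label)."""
    word, label = token, "neutral"
    # A trailing marker character carries the sentiment label.
    # Assumption: a bare one-character token such as "-" is punctuation.
    if len(word) > 1 and word[-1] in MARKERS:
        label = MARKERS[word[-1]]
        word = word[:-1]
    # "beziehungs-_" is the word "beziehungs-": drop the escaping underscore.
    if word.endswith("-_"):
        word = word[:-1]
    return word, label


for tok in "auf beziehungs-_ bzw. partnerschaftliche Probleme-".split():
    print(parse_word(tok))
```

With this reading, "Probleme-" is a negative word, while "beziehungs-_"
stays neutral and keeps its hyphen.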

Currently, only words that are part of an NP/PP are marked with
sentiment markers. The annotated word classes are nouns, adjectives,
negation particles, prepositions and adverbs.

The word-level annotation was done by three annotators working
independently. The individual results were harmonized into a single
reference annotation.


Phrase level markers:

Each phrase is marked up textually by brackets, e.g. "[auf
beziehungs-_ bzw. partnerschaftliche Probleme-]". The type of a phrase
(NP/PP) is not recorded in the brackets. We largely follow the
annotation model of TIGER for structuring embedded NPs and PPs.

Currently, the following limitations with regard to TIGER exist: (1)
Adjectival phrases are not marked up. (2) Relative or infinitival
clauses are not included in NPs/PPs if they appear at the end of a
phrase or if they are discontiguous.

We annotate not only the phrases that immediately contain words
marked up as polar. Any dependent subphrase (NP/PP) is integrated into
all its dominating NPs/PPs, e.g. "[Die tieferen Ursachen [der
Faszination+]]". Dependent subphrases without any polar words are also
included; however, there is no internal bracketing for them, e.g.
"[hohe+ Ansprüche an Qualität und Lage]".
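
The nested bracketing can be read into a tree with a simple stack. The
sketch below is one possible reader, not a tool shipped with the corpus;
the function name is hypothetical. It assumes that brackets never occur
inside a word and it leaves word markers attached to the tokens.
Phrase-level markers are not handled.

```python
def parse_phrases(text: str) -> list:
    """Parse bracketed text into nested lists of tokens."""
    # Pad brackets with spaces so that each becomes a separate token.
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    root = []
    stack = [root]  # stack[-1] is the phrase currently being filled
    for tok in tokens:
        if tok == "[":
            child = []
            stack[-1].append(child)  # embed the subphrase in its parent
            stack.append(child)
        elif tok == "]":
            if len(stack) == 1:
                raise ValueError("closing bracket without opening bracket")
            stack.pop()
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    return root


print(parse_phrases("[Die tieferen Ursachen [der Faszination+]]"))
```

For the first example above, the inner list holds the embedded phrase
"der Faszination+" and the outer list holds the dominating phrase.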

At the level of phrases, we distinguish the following markers:
positive (+), negative (-), neutral (0), bipolar (#). The category
'bipolar' is used mainly for coordinations in which the writer keeps
negative and positive sentiments in balance. This is quite common in
binomial constructions such as "Krieg und Frieden".
Author
Interest Group on German Sentiment Analysis (IGGSA)
Author Email
Catalogue Url
Dataset Url
Metadata Updated
2015-09-15 07:10:04
Tags
Date Released
Date Updated
Update Frequency
Organisation
AKSW
Country
State
Platform
ckan
Language
en
Version
(not set)