sig_1
A selection of 1 document, created by jhermes
PIII Codebook
ri1109053198
ri-1715658718
ri-158299318
43c3fafe-0d8f-4249-9e1c-a41d5dbb45c8
43c3fafe-0d8f-4249-9e1c-a41d5dbb45c8
de.uni_koeln.spinfo.tesla.component.reader.tika.DefaultTikaReader
de.uni_koeln.spinfo.tesla.roles.expressions.impl.hibernate.data.Url
de.uni_koeln.spinfo.tesla.roles.expressions.impl.hibernate.access.UrlAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter
-1715658718
URL Detector
General information about this role: Detects URLs.
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.hibernate.data.Paragraph
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TParagraphAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff
-158299318
Paragraph Detector
General information about this role: Detects paragraph boundaries.
de.uni_koeln.spinfo.tesla.roles.dc.impl.hibernate.DublinCoreMetaDataImpl
de.uni_koeln.spinfo.tesla.roles.dc.impl.hibernate.DublinCoreMetaDataAccessAdapterImpl
de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter
1109053198
Dublin Core Metadata Generator
General information about this role: Generates Dublin Core metadata annotations.
If enabled (default), the reader will detect URLs and generate corresponding annotations.
true
false
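The URL detection itself is not spelled out in this codebook. As a rough illustration, a regex-based detector producing offset annotations might look like the sketch below; the pattern is an assumption for illustration, not the component's actual one.

```python
import re

# Hypothetical sketch: the component's actual URL pattern is not part of this
# codebook, so this regex is an assumption for illustration only.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')

def detect_urls(text):
    """Return (start, end, url) annotation triples for each URL in text."""
    return [(m.start(), m.end(), m.group()) for m in URL_PATTERN.finditer(text)]
```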
If false, this component will be executed whenever it is used in an experiment. If true, annotations produced by this component in an earlier run will be reused, provided the execution prerequisites did not change.
false
false
Stephan Schwiebert
sschwieb@spinfo.uni-koeln.de
Department of Computational Linguistics, University of Cologne
http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html
A general-purpose reader based on Apache Tika, which supports various formats such as RTF, PDF, ODF, HTML and MS Office. Note, however, that the structure of a document will not be extracted or annotated.
http://tika.apache.org/
java.lang.String
de.uni_koeln.spinfo.tesla.component.spre.SPre2Component
de.uni_koeln.spinfo.tesla.roles.core.impl.hibernate.data.Token
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TTokenizerAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff
-664934613
Tokenizer
General information about this role: Detects linguistic tokens.
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.hibernate.data.Sentence
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TSentenceTokenAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff
-242024622
Sentence Detector
General information about this role: Detects sentence boundaries.
Configurations for the SPre Character parser
<?xml version="1.0" encoding="UTF-8"?>
<spre:characterParser
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xmlns:spre="http://spinfo.uni_koeln.de/spre/xmlSchema/SPreCharacterParser"
xs:schemaLocation="http://spinfo.uni_koeln.de/spre/xmlSchema/SPreCharacterParser SPreCharacterParser.xsd">
<spre:layer>CharacterLayer</spre:layer>
<spre:tokens>
<!-- ++++++++++++ -->
<!-- Letter & digit items will be generated -->
<!-- ++++++++++++ -->
<spre:token name="generateUnicodeTokens">config</spre:token>
<!-- +++++++++++++++++ -->
<!-- Punctuation items -->
<!-- +++++++++++++++++ -->
<spre:token name="Dot">.</spre:token>
<spre:token name="QuestionMark">?</spre:token>
<spre:token name="ExclamationMark">!</spre:token>
<spre:token name="Comma">,</spre:token>
<spre:token name="Colon">:</spre:token>
<spre:token name="SemiColon">;</spre:token>
<spre:token name="Hyphen">-</spre:token>
<spre:token name="ParenthesisOpen">(</spre:token>
<spre:token name="ParenthesisClose">)</spre:token>
<spre:token name="BracketOpen">[</spre:token>
<spre:token name="BracketClose">]</spre:token>
<spre:token name="BraceOpen">{</spre:token>
<spre:token name="BraceClose">}</spre:token>
<spre:token name="LowerThan"><</spre:token>
<spre:token name="GreaterThan">></spre:token>
<spre:token name="Apostroph">'</spre:token>
<spre:token name="Quotation">"</spre:token>
<spre:token name="Slash">/</spre:token>
<spre:token name="Backslash">\</spre:token>
<!-- +++++++++++++++ -->
<!-- Line feed items -->
<!-- +++++++++++++++ -->
<spre:token name="LineFeed">
</spre:token>
<spre:token name="CRLineFeed">
</spre:token>
<!-- +++++++++++++++++ -->
<!-- White space items -->
<!-- +++++++++++++++++ -->
<!-- Character-Tab -->
<spre:token name="Tab">	</spre:token>
<!-- Blank -->
<spre:token name="Space"> </spre:token>
<!-- +++++++++++++ -->
<!-- Further items -->
<!-- +++++++++++++ -->
<spre:token name="Paragraph">§</spre:token>
<spre:token name="Percent">%</spre:token>
<spre:token name="Ampersand">&</spre:token>
<spre:token name="Equals">=</spre:token>
<spre:token name="Asterisk">*</spre:token>
<spre:token name="Plus">+</spre:token>
<spre:token name="Sharp">#</spre:token>
<spre:token name="Underscore">_</spre:token>
<spre:token name="ParagraphSeparator">
</spre:token>
</spre:tokens>
<spre:tokenClasses>
<!-- Letter and Digit classes will be auto-generated! -->
<spre:tokenClass name="Stop">
<spre:item>SemiStop</spre:item>
<spre:item>FullStop</spre:item>
</spre:tokenClass>
<spre:tokenClass name="SemiStop">
<spre:item>Comma</spre:item>
<spre:item>Colon</spre:item>
<spre:item>SemiColon</spre:item>
</spre:tokenClass>
<spre:tokenClass name="FullStop">
<spre:item>Dot</spre:item>
<spre:item>QuestionMark</spre:item>
<spre:item>ExclamationMark</spre:item>
</spre:tokenClass>
<spre:tokenClass name="Punctuation">
<spre:item>Stop</spre:item>
</spre:tokenClass>
<spre:tokenClass name="Parenthetical">
<spre:item>ParentheticalOpen</spre:item>
<spre:item>ParentheticalClose</spre:item>
</spre:tokenClass>
<spre:tokenClass name="ParentheticalOpen">
<spre:item>ParenthesisOpen</spre:item>
<spre:item>BracketOpen</spre:item>
<spre:item>BraceOpen</spre:item>
</spre:tokenClass>
<spre:tokenClass name="ParentheticalClose">
<spre:item>ParenthesisClose</spre:item>
<spre:item>BracketClose</spre:item>
<spre:item>BraceClose</spre:item>
</spre:tokenClass>
<spre:tokenClass name="WhiteSpace">
<spre:item>Tab</spre:item>
<spre:item>Space</spre:item>
</spre:tokenClass>
<!-- This class comprises everything above that clearly divides words, including WhiteSpace -->
<spre:tokenClass name="Divider">
<spre:item>NewLine</spre:item>
<spre:item>WhiteSpace</spre:item>
<spre:item>Plus</spre:item>
<spre:item>Paragraph</spre:item>
</spre:tokenClass>
<spre:tokenClass name="NewLine">
<spre:item>LineFeed</spre:item>
<spre:item>CRLineFeed</spre:item>
<spre:item>ParagraphSeparator</spre:item>
</spre:tokenClass>
<spre:tokenClass name="Other">
<spre:item>Percent</spre:item>
<spre:item>Ampersand</spre:item>
<spre:item>Equals</spre:item>
<spre:item>Asterisk</spre:item>
<spre:item>Sharp</spre:item>
<spre:item>Underscore</spre:item>
</spre:tokenClass>
</spre:tokenClasses>
</spre:characterParser>
Character parser for all Unicode texts
false
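The character-level classification configured above can be sketched as follows. Only a subset of the configured tokens and classes is reproduced, and the lookup logic is an illustrative assumption, not SPre's implementation:

```python
# Subset of the token and class definitions from the XML above; the lookup
# logic is an illustrative assumption, not SPre's actual implementation.
TOKENS = {
    ".": "Dot", "?": "QuestionMark", "!": "ExclamationMark",
    ",": "Comma", ":": "Colon", ";": "SemiColon",
    " ": "Space", "\t": "Tab", "\n": "LineFeed",
}

TOKEN_CLASSES = {
    "FullStop": {"Dot", "QuestionMark", "ExclamationMark"},
    "SemiStop": {"Comma", "Colon", "SemiColon"},
    "WhiteSpace": {"Tab", "Space"},
}

def classify(char):
    """Map a character to its token name; Letter/Digit are auto-generated."""
    name = TOKENS.get(char)
    if name is None and char.isalpha():
        return "Letter"
    if name is None and char.isdigit():
        return "Digit"
    return name

def in_class(token_name, class_name):
    """Check membership of a token name in a configured token class."""
    return token_name in TOKEN_CLASSES.get(class_name, set())
```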
Configurations for the SPre parser based on the character parser
<?xml version="1.0" encoding="UTF-8"?>
<spre:defaultParser
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xmlns:spre="http://spinfo.uni_koeln.de/spre/xmlSchema/SPreDefaultParser"
xs:schemaLocation="http://spinfo.uni_koeln.de/spre/xmlSchema/SPreDefaultParser SPreDefaultParser.xsd">
<!-- ************* -->
<!-- 1. The tokens -->
<!-- ************* -->
<spre:layer>WordLayer</spre:layer>
<spre:tokens>
<!-- The tokenClass Unprocessable will always be generated by the
characterParser. Note that its name is fixed only in the
source code, which is somewhat problematic. -->
<spre:token name="UnprocessableTokenSequence">
<spre:pattern>
<spre:startsWith>Unprocessable</spre:startsWith>
<spre:contains>Unprocessable</spre:contains>
</spre:pattern>
</spre:token>
<spre:token name="UnprocessableToken">
<spre:pattern>
<spre:containsOnly>Unprocessable</spre:containsOnly>
</spre:pattern>
</spre:token>
<!-- ******************** -->
<!-- 1.1 The "Word" token -->
<!-- ******************** -->
<spre:token name="Word">
<spre:pattern>
<spre:startsWith>Letter</spre:startsWith>
<spre:contains>Alphanumeric</spre:contains>
<spre:contains>Dot</spre:contains>
<spre:contains>Hyphen</spre:contains>
<spre:contains>Slash</spre:contains>
<spre:contains>Backslash</spre:contains>
<spre:ambiguity>
<spre:element>Dot</spre:element>
<!-- Merge the ambiguous element with the preceding element's
WordLevel element if the Abbreviations condition is met. -->
<spre:merge type="left">
<spre:sequence>
<spre:element>Letter</spre:element>
<spre:element>Dot</spre:element>
<spre:element>Divider</spre:element>
</spre:sequence>
<!-- The idea is that conditions call methods, defined in <spre:condition ...>,
to check for special properties of the item in question. The parameter
of the respective method is a String, in this case the token "Word" that
results from applying this merge operation. -->
<spre:conditions>
<spre:condition>Abbreviations</spre:condition>
</spre:conditions>
</spre:merge>
<spre:merge type="left">
<spre:sequence>
<spre:element>Letter</spre:element>
<spre:element>Dot</spre:element>
<spre:element>Divider</spre:element>
<spre:element>LowerCaseLetter</spre:element>
</spre:sequence>
</spre:merge>
<!-- Merge all three elements into a single element of the type of the
first element's WordLevel element. -->
<spre:merge type="leftright">
<spre:sequence>
<spre:element>Alphanumeric</spre:element>
<spre:element>Dot</spre:element>
<spre:element>Alphanumeric</spre:element>
</spre:sequence>
</spre:merge>
</spre:ambiguity>
</spre:pattern>
</spre:token>
<!-- ************************* -->
<!-- 1.2 The "Numerical" token -->
<!-- ************************* -->
<spre:token name="Numerical">
<spre:pattern>
<spre:startsWith>Digit</spre:startsWith>
<spre:contains>Digit</spre:contains>
<spre:contains>Digit</spre:contains>
<spre:contains>Dot</spre:contains>
<spre:contains>Comma</spre:contains>
<spre:contains>Slash</spre:contains>
<!-- First ambiguous element: "Dot" -->
<spre:ambiguity>
<spre:element>Dot</spre:element>
<!--
E.g. <spre:merge type="leftright">
... i n 4 3 . 5 % a l l e r F ä l l e ...
Here, "43.5" will be recognized as one token,
and merging "leftright" means that the parts
left and right of the ambiguous element "." will
be joined.
-->
<spre:merge type="leftright">
<spre:sequence>
<spre:element>Digit</spre:element>
<spre:element>Dot</spre:element>
<spre:element>Digit</spre:element>
</spre:sequence>
</spre:merge>
<!--
E.g. <spre:merge type="left">
... b e i d e r 2 0 . ö f f e n t l i c h a u s g e t r a g e n e n ...
Here, "20." will be recognized as one token, and merging
"left" means that the dot will be interpreted as part of the
numerical expression.
-->
<spre:merge type="left">
<spre:sequence>
<spre:element>Digit</spre:element>
<spre:element>Dot</spre:element>
<spre:element>WhiteSpace</spre:element>
<spre:element>LowerCaseLetter</spre:element>
</spre:sequence>
</spre:merge>
</spre:ambiguity>
<!-- Second ambiguous element: "Comma" -->
<spre:ambiguity>
<!--
E.g. <spre:merge type="leftright">
... i n 4 3 , 5 % a l l e r F ä l l e ...
Here, "43,5" will be recognized as one token,
and merging "leftright" means that the parts
left and right of the ambiguous element "," will
be joined.
-->
<spre:element>Comma</spre:element>
<spre:merge type="leftright">
<spre:sequence>
<spre:element>Digit</spre:element>
<spre:element>Comma</spre:element>
<spre:element>Digit</spre:element>
</spre:sequence>
</spre:merge>
</spre:ambiguity>
</spre:pattern>
</spre:token>
<!-- **************** -->
<!-- 1.3 Other tokens -->
<!-- **************** -->
<spre:token name="FullStopSequence">
<spre:pattern>
<spre:startsWith>Dot</spre:startsWith>
<spre:contains>Dot</spre:contains>
</spre:pattern>
</spre:token>
<spre:token name="ExclamationMarkSequence">
<spre:pattern>
<spre:startsWith>ExclamationMark</spre:startsWith>
<spre:contains>ExclamationMark</spre:contains>
</spre:pattern>
</spre:token>
<spre:token name="QuestionMarkSequence">
<spre:pattern>
<spre:startsWith>QuestionMark</spre:startsWith>
<spre:contains>QuestionMark</spre:contains>
</spre:pattern>
</spre:token>
<spre:token name="MixedSequence">
<spre:pattern>
<spre:startsWith>ExclamationMark</spre:startsWith>
<spre:startsWith>QuestionMark</spre:startsWith>
<spre:contains>ExclamationMark</spre:contains>
<spre:contains>QuestionMark</spre:contains>
</spre:pattern>
</spre:token>
<spre:token name="Apostroph">
<spre:pattern>
<spre:containsOnly>Apostroph</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="Slash">
<spre:pattern>
<spre:containsOnly>Slash</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="FullStop">
<spre:pattern>
<spre:containsOnly>Dot</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="ExclamationMark">
<spre:pattern>
<spre:containsOnly>ExclamationMark</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="QuestionMark">
<spre:pattern>
<spre:containsOnly>QuestionMark</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="Comma">
<spre:pattern>
<spre:containsOnly>Comma</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="Colon">
<spre:pattern>
<spre:containsOnly>Colon</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="SemiColon">
<spre:pattern>
<spre:containsOnly>SemiColon</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="SingleQuote">
<spre:pattern>
<spre:containsOnly>Apostroph</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="DoubleQuote">
<spre:pattern>
<spre:containsOnly>Quotation</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="ParentheticalOpen">
<spre:pattern>
<spre:containsOnly>ParentheticalOpen</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="ParentheticalClosed">
<spre:pattern>
<spre:containsOnly>ParentheticalClose</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="NewLine">
<spre:pattern>
<spre:containsOnly>NewLine</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="ParagraphSeparator">
<spre:pattern>
<spre:containsOnly>ParagraphSeparator</spre:containsOnly>
</spre:pattern>
</spre:token>
</spre:tokens>
<!-- ******************* -->
<!-- 2. The tokenClasses -->
<!-- ******************* -->
<spre:tokenClasses>
<spre:tokenClass name="Unprocessable">
<spre:item>UnprocessableToken</spre:item>
<spre:item>UnprocessableTokenSequence</spre:item>
</spre:tokenClass>
<spre:tokenClass name="FullStops">
<spre:item>FullStop</spre:item>
<spre:item>ExclamationMark</spre:item>
<spre:item>QuestionMark</spre:item>
</spre:tokenClass>
</spre:tokenClasses>
</spre:defaultParser>
Word parser for German texts
false
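The "leftright" merge for the Dot ambiguity above (e.g. "43.5" becoming one Numerical token) can be sketched as follows. This is a simplified assumption about the merge semantics, not SPre's actual code, and it handles only the Digit-Dot-Digit case:

```python
# Illustrative sketch of the "leftright" merge for the Dot ambiguity: a Dot
# between two Digit runs is absorbed into one Numerical token. Simplified
# assumption about SPre's merge semantics, not its actual implementation.
def merge_numerical(chars):
    """Group a character list into tokens, joining Digit-Dot-Digit sequences."""
    tokens, i = [], 0
    while i < len(chars):
        if chars[i].isdigit():
            j = i
            while j < len(chars) and chars[j].isdigit():
                j += 1
            # leftright merge: Digit . Digit -> one Numerical token
            if j + 1 < len(chars) and chars[j] == "." and chars[j + 1].isdigit():
                k = j + 1
                while k < len(chars) and chars[k].isdigit():
                    k += 1
                tokens.append(("Numerical", "".join(chars[i:k])))
                i = k
                continue
            tokens.append(("Numerical", "".join(chars[i:j])))
            i = j
        else:
            tokens.append(("Char", chars[i]))
            i += 1
    return tokens
```

When no digit follows the dot (as in the ordinal "20."), the dot is left unmerged here; in the full configuration that case is handled by the separate "left" merge rule.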
Configurations for the SPre parser based on the secondary parser
<?xml version="1.0" encoding="UTF-8"?>
<spre:defaultParser
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xmlns:spre="http://spinfo.uni_koeln.de/spre/xmlSchema/SPreDefaultParser"
xs:schemaLocation="http://spinfo.uni_koeln.de/spre/xmlSchema/SPreDefaultParser SPreDefaultParser.xsd">
<spre:layer>SentenceLayer</spre:layer>
<!-- ************* -->
<!-- 1. The tokens -->
<!-- ************* -->
<spre:tokens>
<!-- The tokenClass Unprocessable will always be generated by the
characterParser. Note that its name is fixed only in the
source code, which is somewhat problematic. -->
<spre:token name="UnprocessableTokenSequence">
<spre:pattern>
<spre:startsWith>Unprocessable</spre:startsWith>
<spre:contains>Unprocessable</spre:contains>
</spre:pattern>
</spre:token>
<spre:token name="UnprocessableToken">
<spre:pattern>
<spre:containsOnly>Unprocessable</spre:containsOnly>
</spre:pattern>
</spre:token>
<!-- ******************** -->
<!-- 1.1 The "Sentence" token -->
<!-- ******************** -->
<spre:token name="Sentence">
<spre:pattern>
<spre:startsWith>Word</spre:startsWith>
<spre:endsWith>FullStops</spre:endsWith>
</spre:pattern>
</spre:token>
<!-- <spre:token name="NoSentence">
<spre:pattern>
<spre:startsWith>Word</spre:startsWith>
<spre:endsWith>NewLine</spre:endsWith>
</spre:pattern>
</spre:token>
-->
</spre:tokens>
<spre:tokenClasses>
<spre:tokenClass name="Sentence">
<spre:item>Sentence</spre:item>
</spre:tokenClass>
<!-- <spre:tokenClass name="NoSentence">
<spre:item>NoSentence</spre:item>
</spre:tokenClass>
-->
</spre:tokenClasses>
</spre:defaultParser>
Sentence parser for German texts
false
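The Sentence pattern above (starts with a Word, ends with a FullStops-class token) can be sketched as a simple grouping pass over word-level tokens. This is a simplified assumption about the pattern semantics; the real parser operates on typed token streams:

```python
# Sketch of the Sentence pattern: accumulate tokens until a token of the
# FullStops class is seen, then close the sentence. Simplified assumption,
# not SPre's actual pattern matching.
FULL_STOPS = {"FullStop", "ExclamationMark", "QuestionMark"}

def group_sentences(tokens):
    """tokens: list of (type, text) pairs. Returns a token list per sentence."""
    sentences, current = [], []
    for typ, text in tokens:
        current.append((typ, text))
        if typ in FULL_STOPS:
            sentences.append(current)
            current = []
    return sentences
```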
List of abbreviations
Abbreviations
A.
A.A.
a.a.
a.a.O.
Abb.
Abbr.
Abg.
Abk.
Abs.
Abt.
A.C.
A.D.
a.d.
a.D.
agr.
allg.
Alt.
a.m.
amerik.
Anm.
a.o.
A.T.
B.
B.c.
B.C.
Bd.
Bev.
Bj.
Bsp.
Btl.
bzw.
C.
ca.
chin.
Co.
Ct.
D.
D.C.
Dez.
d.h.
Di.
Dipl.
Do.
Dr.
dt.
E.
e.
e.G.
engl.
etc.
e.V.
ev.
F.
f.
Fr.
franz.
G.
Gef.
gegr.
gem.
ggf.
GMBl.
Grp.
H.
habil.
Hbf.
hist.
höchst.
Hptm.
I.
i.A.
id.
i.d.F.
I.K.
I.L.
Inc.
incl.
inkl.
i.S.v.
ital.
i.V.
i.V.m.
J.
jap.
Jh.
jmd.
Jt.
K.
kath.
K.O.
Kp.
L.
lat.
Ld.
Lj.
M.
m.
m.b.L.
med.
Mi.
Mill.
Min.
mind.
Mio.
Mo.
Mr.
Mrd.
Mrs.
Msp.
m.W.v.
N.
Nr.
N.T.
N.Y.
o.g.
O.
o.O.
P.
p.a.
Pfd.
pl.
p.m.
P.M.
Prof.
prot.
Q.
Q.b.A.
q.e.d.
Qual.
Quant.
R.
reg.
rer.
S.
Sa.
san.
sgl.
So.
sog.
span.
Std.
Str.
-str.
svw.
T.
Tel.
U.
u.a.
ugs.
urspr.
usw.
u.ä.
u.U.
V.
v.Chr.
Vfg.
Vgl.
v.H.
vs.
W.
Wdh.
Wv.
X.
Y.
Z.
z.A.
z.B.
Zbl.
z.d.A.
z.Hd.
Zi.
z.T.
Ztr.
z.V.
zw.
z.Z.
z.Zt.
zzgl.
ä.
Ä.
ö.
Ö.
ü.
Ü.
List of German abbreviations
false
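The Abbreviations condition referenced in the word parser presumably consults this list so that a trailing dot on a known abbreviation is not treated as sentence-final. A minimal sketch of that check, using only a small subset of the list above:

```python
# Small subset of the abbreviation list above; the check itself is an
# illustrative assumption about how the Abbreviations condition is applied.
ABBREVIATIONS = {"z.B.", "Dr.", "bzw.", "usw.", "ca.", "etc."}

def is_sentence_final(token):
    """True if a dot-terminated token should end a sentence."""
    return token.endswith(".") and token not in ABBREVIATIONS
```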
If false, this component will be executed whenever it is used in an experiment. If true, annotations produced by this component in an earlier run will be reused, provided the execution prerequisites did not change.
false
false
Jürgen Hermes
jhermes@spinfo.uni-koeln.de
Sprachliche Informationsverarbeitung
http://www.phil-fak.uni-koeln.de/spinfo-juergenhermes.html
Christoph Benden
cbenden@spinfo.uni-koeln.de
Sprachliche Informationsverarbeitung
No external URL defined
A configurable layered tokenizer.
No external URL defined
de.uni_koeln.spinfo.formanalysis.teslacomponents.SimpleMorphemizerComponent
de.uni_koeln.spinfo.tesla.roles.core.impl.hibernate.data.Token
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TTokenizerAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter
1622039516
Morphemizer
General information about this role: Detects graphemes using successor/predecessor counts. Provides access to word morphemes.
-
Tokenizer
Detects linguistic tokens.
de.uni_koeln.spinfo.tesla.roles.tokenizer.Tokenizer
f20f9ea3-dca6-452e-8a0e-40386264f43a
-664934613
de.uni_koeln.spinfo.tesla.roles.tokenizer.access.ITokenAccessAdapter
de.uni_koeln.spinfo.tesla.roles.tokenizer.data.IToken
Determines the minimum length of words that should be analysed
3
false
Determines the minimum score for morpheme boundaries
3
false
If true, words will be split into morphemes only at the maximum evidence value
true
false
Number of re-analyses in which detected morphemes are treated as words
1
false
If true, accepted morpheme partitions must have a prefix in first position, infixes in the middle, and a suffix in last position.
true
false
If true, increasing predecessor counts contribute to the score
true
false
If true, increasing successor counts contribute to the score
true
false
If true, local maximum predecessor counts contribute to the score
true
false
If true, maximum predecessor counts contribute to the score
true
false
If true, local maximum successor counts contribute to the score
true
false
If true, maximum successor counts contribute to the score
true
false
If true, local maximum combined (successor + predecessor) counts contribute to the score
true
false
If true, maximum combined (successor + predecessor) counts contribute to the score
true
false
If true, the word labels will be analysed instead of the signal content.
false
false
Determines the minimum number of occurrences a type must have in order to be analyzed
1
false
If false, this component will be executed whenever it is used in an experiment. If true, annotations produced by this component in an earlier run will be reused, provided the execution prerequisites did not change.
false
false
Jürgen Hermes
jhermes@spinfo.uni-koeln.de
Sprachliche Informationsverarbeitung
none
Detects morphemes using successor and predecessor counts.
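The successor/predecessor-count approach (Harris-style successor variety) can be sketched as follows. This shows only the successor direction and a single cut at the global maximum, corresponding to the "maximum evidence" option above; it is a simplified assumption about the component's scoring, not its implementation:

```python
# Sketch of boundary detection via successor counts: a position whose prefix
# allows many distinct following characters in the vocabulary is a morpheme-
# boundary candidate. Simplified illustration of the idea, not the actual
# scoring, which combines several such evidences in both directions.
def successor_counts(word, vocabulary):
    """For each prefix of word, count distinct next characters in vocabulary."""
    counts = []
    for i in range(1, len(word)):
        prefix = word[:i]
        successors = {w[i] for w in vocabulary if len(w) > i and w.startswith(prefix)}
        counts.append(len(successors))
    return counts

def split_at_maximum(word, vocabulary):
    """Split word at the position with the highest successor count."""
    counts = successor_counts(word, vocabulary)
    if not counts:
        return [word]
    cut = counts.index(max(counts)) + 1
    return [word[:cut], word[cut:]]
```

For example, in a vocabulary containing "walked", "walking" and "walks", the prefix "walk" can be followed by "e", "i" or "s", so the successor count peaks after "walk" and the word is cut into "walk" + "ed".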
none
Morpheme Analysis PIII
Experiment Description
jhermes
hermesj@uni-koeln.de
none