sig_1
A selection of 6 documents, created by jhermes
Renaissance Comparison Texts
ri735549461
ri561466935
ri220906638
5554b86b-8a39-4fa5-b0c5-1df92734956d
5554b86b-8a39-4fa5-b0c5-1df92734956d
de.uni_koeln.spinfo.tesla.component.spre.SPre2Component
de.uni_koeln.spinfo.tesla.roles.core.impl.hibernate.data.Token
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TTokenizerAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff
-2102528184
Tokenizer
General information about this role: Detects linguistic tokens.
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.hibernate.data.Sentence
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TSentenceTokenAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff
-1269385762
Sentence Detector
General information about this role: Detects sentence boundaries.
Configurations for the SPre Character parser
<?xml version="1.0" encoding="UTF-8"?>
<spre:characterParser
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xmlns:spre="http://spinfo.uni_koeln.de/spre/xmlSchema/SPreCharacterParser"
xs:schemaLocation="http://spinfo.uni_koeln.de/spre/xmlSchema/SPreCharacterParser SPreCharacterParser.xsd">
<spre:layer>CharacterLayer</spre:layer>
<spre:tokens>
<!-- ++++++++++++ -->
<!-- Letter items -->
<!-- ++++++++++++ -->
<spre:token name="a_min">a</spre:token>
<spre:token name="b_min">b</spre:token>
<spre:token name="c_min">c</spre:token>
<spre:token name="d_min">d</spre:token>
<spre:token name="e_min">e</spre:token>
<spre:token name="f_min">f</spre:token>
<spre:token name="g_min">g</spre:token>
<spre:token name="h_min">h</spre:token>
<spre:token name="i_min">i</spre:token>
<spre:token name="j_min">j</spre:token>
<spre:token name="k_min">k</spre:token>
<spre:token name="l_min">l</spre:token>
<spre:token name="m_min">m</spre:token>
<spre:token name="n_min">n</spre:token>
<spre:token name="o_min">o</spre:token>
<spre:token name="p_min">p</spre:token>
<spre:token name="q_min">q</spre:token>
<spre:token name="r_min">r</spre:token>
<spre:token name="s_min">s</spre:token>
<spre:token name="t_min">t</spre:token>
<spre:token name="u_min">u</spre:token>
<spre:token name="v_min">v</spre:token>
<spre:token name="w_min">w</spre:token>
<spre:token name="x_min">x</spre:token>
<spre:token name="y_min">y</spre:token>
<spre:token name="z_min">z</spre:token>
<spre:token name="a_maj">A</spre:token>
<spre:token name="b_maj">B</spre:token>
<spre:token name="c_maj">C</spre:token>
<spre:token name="d_maj">D</spre:token>
<spre:token name="e_maj">E</spre:token>
<spre:token name="f_maj">F</spre:token>
<spre:token name="g_maj">G</spre:token>
<spre:token name="h_maj">H</spre:token>
<spre:token name="i_maj">I</spre:token>
<spre:token name="j_maj">J</spre:token>
<spre:token name="k_maj">K</spre:token>
<spre:token name="l_maj">L</spre:token>
<spre:token name="m_maj">M</spre:token>
<spre:token name="n_maj">N</spre:token>
<spre:token name="o_maj">O</spre:token>
<spre:token name="p_maj">P</spre:token>
<spre:token name="q_maj">Q</spre:token>
<spre:token name="r_maj">R</spre:token>
<spre:token name="s_maj">S</spre:token>
<spre:token name="t_maj">T</spre:token>
<spre:token name="u_maj">U</spre:token>
<spre:token name="v_maj">V</spre:token>
<spre:token name="w_maj">W</spre:token>
<spre:token name="x_maj">X</spre:token>
<spre:token name="y_maj">Y</spre:token>
<spre:token name="z_maj">Z</spre:token>
<spre:token name="ae_min">ä</spre:token>
<spre:token name="oe_min">ö</spre:token>
<spre:token name="ue_min">ü</spre:token>
<spre:token name="ae_maj">Ä</spre:token>
<spre:token name="oe_maj">Ö</spre:token>
<spre:token name="ue_maj">Ü</spre:token>
<spre:token name="sz">ß</spre:token>
<spre:token name="trith_on">õ</spre:token>
<!-- +++++++++++ -->
<!-- Digit items -->
<!-- +++++++++++ -->
<spre:token name="Null">0</spre:token>
<spre:token name="One">1</spre:token>
<spre:token name="Two">2</spre:token>
<spre:token name="Three">3</spre:token>
<spre:token name="Four">4</spre:token>
<spre:token name="Five">5</spre:token>
<spre:token name="Six">6</spre:token>
<spre:token name="Seven">7</spre:token>
<spre:token name="Eight">8</spre:token>
<spre:token name="Nine">9</spre:token>
<!-- +++++++++++++++++ -->
<!-- Punctuation items -->
<!-- +++++++++++++++++ -->
<spre:token name="Dot">.</spre:token>
<spre:token name="QuestionMark">?</spre:token>
<spre:token name="ExclamationMark">!</spre:token>
<spre:token name="Comma">,</spre:token>
<spre:token name="Colon">:</spre:token>
<spre:token name="SemiColon">;</spre:token>
<spre:token name="Hyphen">-</spre:token>
<spre:token name="ParenthesisOpen">(</spre:token>
<spre:token name="ParenthesisClose">)</spre:token>
<spre:token name="BracketOpen">[</spre:token>
<spre:token name="BracketClose">]</spre:token>
<spre:token name="BraceOpen">{</spre:token>
<spre:token name="BraceClose">}</spre:token>
<spre:token name="LowerThan">&lt;</spre:token>
<spre:token name="GreaterThan">></spre:token>
<spre:token name="Apostroph">'</spre:token>
<spre:token name="Quotation">"</spre:token>
<spre:token name="Slash">/</spre:token>
<spre:token name="Backslash">\</spre:token>
<!-- +++++++++++++++ -->
<!-- Line feed items -->
<!-- +++++++++++++++ -->
<spre:token name="LineFeed">
</spre:token>
<spre:token name="CRLineFeed">
</spre:token>
<!-- +++++++++++++++++ -->
<!-- White space items -->
<!-- +++++++++++++++++ -->
<!-- Character-Tab -->
<spre:token name="Tab">	</spre:token>
<!-- Blank -->
<spre:token name="Space"> </spre:token>
<!-- +++++++++++++ -->
<!-- Further items -->
<!-- +++++++++++++ -->
<spre:token name="Paragraph">§</spre:token>
<spre:token name="Percent">%</spre:token>
<spre:token name="Ampersand">&amp;</spre:token>
<spre:token name="Equals">=</spre:token>
<spre:token name="Asterisk">*</spre:token>
<spre:token name="Plus">+</spre:token>
<spre:token name="Sharp">#</spre:token>
<spre:token name="Underscore">_</spre:token>
<spre:token name="ParagraphSeparator">
</spre:token>
</spre:tokens>
<spre:tokenClasses>
<spre:tokenClass name="Alphanumeric">
<spre:item>Letter</spre:item>
<spre:item>Digit</spre:item>
</spre:tokenClass>
<spre:tokenClass name="LowerCaseLetter">
<spre:item>a_min</spre:item>
<spre:item>b_min</spre:item>
<spre:item>c_min</spre:item>
<spre:item>d_min</spre:item>
<spre:item>e_min</spre:item>
<spre:item>f_min</spre:item>
<spre:item>g_min</spre:item>
<spre:item>h_min</spre:item>
<spre:item>i_min</spre:item>
<spre:item>j_min</spre:item>
<spre:item>k_min</spre:item>
<spre:item>l_min</spre:item>
<spre:item>m_min</spre:item>
<spre:item>n_min</spre:item>
<spre:item>o_min</spre:item>
<spre:item>p_min</spre:item>
<spre:item>q_min</spre:item>
<spre:item>r_min</spre:item>
<spre:item>s_min</spre:item>
<spre:item>t_min</spre:item>
<spre:item>u_min</spre:item>
<spre:item>v_min</spre:item>
<spre:item>w_min</spre:item>
<spre:item>x_min</spre:item>
<spre:item>y_min</spre:item>
<spre:item>z_min</spre:item>
<spre:item>sz</spre:item>
<spre:item>trith_on</spre:item>
<spre:item>ae_min</spre:item>
<spre:item>oe_min</spre:item>
<spre:item>ue_min</spre:item>
</spre:tokenClass>
<spre:tokenClass name="CapitalLetter">
<spre:item>a_maj</spre:item>
<spre:item>b_maj</spre:item>
<spre:item>c_maj</spre:item>
<spre:item>d_maj</spre:item>
<spre:item>e_maj</spre:item>
<spre:item>f_maj</spre:item>
<spre:item>g_maj</spre:item>
<spre:item>h_maj</spre:item>
<spre:item>i_maj</spre:item>
<spre:item>j_maj</spre:item>
<spre:item>k_maj</spre:item>
<spre:item>l_maj</spre:item>
<spre:item>m_maj</spre:item>
<spre:item>n_maj</spre:item>
<spre:item>o_maj</spre:item>
<spre:item>p_maj</spre:item>
<spre:item>q_maj</spre:item>
<spre:item>r_maj</spre:item>
<spre:item>s_maj</spre:item>
<spre:item>t_maj</spre:item>
<spre:item>u_maj</spre:item>
<spre:item>v_maj</spre:item>
<spre:item>w_maj</spre:item>
<spre:item>x_maj</spre:item>
<spre:item>y_maj</spre:item>
<spre:item>z_maj</spre:item>
<spre:item>ae_maj</spre:item>
<spre:item>oe_maj</spre:item>
<spre:item>ue_maj</spre:item>
</spre:tokenClass>
<spre:tokenClass name="Letter">
<spre:item>LowerCaseLetter</spre:item>
<spre:item>CapitalLetter</spre:item>
</spre:tokenClass>
<spre:tokenClass name="Digit">
<spre:item>Null</spre:item>
<spre:item>One</spre:item>
<spre:item>Two</spre:item>
<spre:item>Three</spre:item>
<spre:item>Four</spre:item>
<spre:item>Five</spre:item>
<spre:item>Six</spre:item>
<spre:item>Seven</spre:item>
<spre:item>Eight</spre:item>
<spre:item>Nine</spre:item>
</spre:tokenClass>
<spre:tokenClass name="Stop">
<spre:item>SemiStop</spre:item>
<spre:item>FullStop</spre:item>
</spre:tokenClass>
<spre:tokenClass name="SemiStop">
<spre:item>Comma</spre:item>
<spre:item>Colon</spre:item>
<spre:item>SemiColon</spre:item>
</spre:tokenClass>
<spre:tokenClass name="FullStop">
<spre:item>Dot</spre:item>
<spre:item>QuestionMark</spre:item>
<spre:item>ExclamationMark</spre:item>
</spre:tokenClass>
<spre:tokenClass name="Punctuation">
<spre:item>Stop</spre:item>
</spre:tokenClass>
<spre:tokenClass name="Parenthetical">
<spre:item>ParentheticalOpen</spre:item>
<spre:item>ParentheticalClose</spre:item>
</spre:tokenClass>
<spre:tokenClass name="ParentheticalOpen">
<spre:item>ParenthesisOpen</spre:item>
<spre:item>BracketOpen</spre:item>
<spre:item>BraceOpen</spre:item>
</spre:tokenClass>
<spre:tokenClass name="ParentheticalClose">
<spre:item>ParenthesisClose</spre:item>
<spre:item>BracketClose</spre:item>
<spre:item>BraceClose</spre:item>
</spre:tokenClass>
<spre:tokenClass name="WhiteSpace">
<spre:item>Tab</spre:item>
<spre:item>Space</spre:item>
</spre:tokenClass>
<!-- This class comprises everything defined above that clearly divides words, including WhiteSpace -->
<spre:tokenClass name="Divider">
<spre:item>NewLine</spre:item>
<spre:item>WhiteSpace</spre:item>
<spre:item>Plus</spre:item>
<spre:item>Paragraph</spre:item>
</spre:tokenClass>
<spre:tokenClass name="NewLine">
<spre:item>LineFeed</spre:item>
<spre:item>CRLineFeed</spre:item>
<spre:item>ParagraphSeparator</spre:item>
</spre:tokenClass>
<spre:tokenClass name="Other">
<spre:item>Percent</spre:item>
<spre:item>Ampersand</spre:item>
<spre:item>Equals</spre:item>
<spre:item>Asterisk</spre:item>
<spre:item>Sharp</spre:item>
<spre:item>Underscore</spre:item>
</spre:tokenClass>
</spre:tokenClasses>
</spre:characterParser>
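Outside Tesla, the character-parser configuration above boils down to a lookup from characters to named token classes. The following Python sketch illustrates that idea; it is abridged to a few of the classes defined above, the fallback class name Unprocessable is taken from the comment in the word-parser configuration, and grouping the lowercase umlauts with LowerCaseLetter is an assumption.

```python
# Sketch: classify each character of a text into token classes in the
# spirit of the SPreCharacterParser configuration. Characters not
# listed fall into the class "Unprocessable".
import string

CHAR_CLASSES = {}
for ch in string.ascii_lowercase + "äöüß" + "õ":
    CHAR_CLASSES[ch] = "LowerCaseLetter"   # grouping umlauts here is an assumption
for ch in string.ascii_uppercase + "ÄÖÜ":
    CHAR_CLASSES[ch] = "CapitalLetter"
for ch in string.digits:
    CHAR_CLASSES[ch] = "Digit"
for ch in ".?!":
    CHAR_CLASSES[ch] = "FullStop"
for ch in ",:;":
    CHAR_CLASSES[ch] = "SemiStop"
for ch in " \t":
    CHAR_CLASSES[ch] = "WhiteSpace"
for ch in "\n\r\u2029":
    CHAR_CLASSES[ch] = "NewLine"

def classify(text):
    """Return one (character, class) pair per input character."""
    return [(ch, CHAR_CLASSES.get(ch, "Unprocessable")) for ch in text]
```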
Character parser for Trithemian cipher texts (Value from Template "Cipher Character Parser" of Template Set "Cipher Text Preprocessor")
false
Configurations for the SPre parser based on the character parser
<?xml version="1.0" encoding="UTF-8"?>
<spre:defaultParser
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xmlns:spre="http://spinfo.uni_koeln.de/spre/xmlSchema/SPreDefaultParser"
xs:schemaLocation="http://spinfo.uni_koeln.de/spre/xmlSchema/SPreDefaultParser SPreDefaultParser.xsd">
<!-- ************* -->
<!-- 1. The tokens -->
<!-- ************* -->
<spre:layer>WordLayer</spre:layer>
<spre:tokens>
<!-- The token class Unprocessable will always be generated by the
character parser. It is a bit problematic that this name is only
fixed at the level of the source code. -->
<spre:token name="UnprocessableTokenSequence">
<spre:pattern>
<spre:startsWith>Unprocessable</spre:startsWith>
<spre:contains>Unprocessable</spre:contains>
</spre:pattern>
</spre:token>
<spre:token name="UnprocessableToken">
<spre:pattern>
<spre:containsOnly>Unprocessable</spre:containsOnly>
</spre:pattern>
</spre:token>
<!-- ******************** -->
<!-- 1.1 The "Word" token -->
<!-- ******************** -->
<spre:token name="Word">
<spre:pattern>
<spre:startsWith>Letter</spre:startsWith>
<spre:contains>Letter</spre:contains>
<spre:contains>Unprocessable</spre:contains>
</spre:pattern>
</spre:token>
<!-- **************** -->
<!-- 1.3 Other tokens -->
<!-- **************** -->
<spre:token name="CommentStart">
<spre:pattern>
<spre:containsOnly>BraceOpen</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="CommentEnd">
<spre:pattern>
<spre:containsOnly>BraceClose</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="FullStopSequence">
<spre:pattern>
<spre:startsWith>Dot</spre:startsWith>
<spre:contains>Dot</spre:contains>
</spre:pattern>
</spre:token>
<spre:token name="ExclamationMarkSequence">
<spre:pattern>
<spre:startsWith>ExclamationMark</spre:startsWith>
<spre:contains>ExclamationMark</spre:contains>
</spre:pattern>
</spre:token>
<spre:token name="QuestionMarkSequence">
<spre:pattern>
<spre:startsWith>QuestionMark</spre:startsWith>
<spre:contains>QuestionMark</spre:contains>
</spre:pattern>
</spre:token>
<spre:token name="MixedSequence">
<spre:pattern>
<spre:startsWith>ExclamationMark</spre:startsWith>
<spre:startsWith>QuestionMark</spre:startsWith>
<spre:contains>ExclamationMark</spre:contains>
<spre:contains>QuestionMark</spre:contains>
</spre:pattern>
</spre:token>
<spre:token name="Slash">
<spre:pattern>
<spre:containsOnly>Slash</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="FullStop">
<spre:pattern>
<spre:containsOnly>Dot</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="ExclamationMark">
<spre:pattern>
<spre:containsOnly>ExclamationMark</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="QuestionMark">
<spre:pattern>
<spre:containsOnly>QuestionMark</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="Comma">
<spre:pattern>
<spre:containsOnly>Comma</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="Colon">
<spre:pattern>
<spre:containsOnly>Colon</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="SemiColon">
<spre:pattern>
<spre:containsOnly>SemiColon</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="SingleQuote">
<spre:pattern>
<spre:containsOnly>Apostroph</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="DoubleQuote">
<spre:pattern>
<spre:containsOnly>Quotation</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="ParentheticalOpen">
<spre:pattern>
<spre:containsOnly>ParentheticalOpen</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="ParentheticalClosed">
<spre:pattern>
<spre:containsOnly>ParentheticalClose</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="NewLine">
<spre:pattern>
<spre:containsOnly>NewLine</spre:containsOnly>
</spre:pattern>
</spre:token>
<spre:token name="ParagraphSeparator">
<spre:pattern>
<spre:containsOnly>ParagraphSeparator</spre:containsOnly>
</spre:pattern>
</spre:token>
</spre:tokens>
<!-- ******************* -->
<!-- 2. The tokenClasses -->
<!-- ******************* -->
<spre:tokenClasses>
<spre:tokenClass name="Unprocessable">
<spre:item>UnprocessableToken</spre:item>
<spre:item>UnprocessableTokenSequence</spre:item>
</spre:tokenClass>
<spre:tokenClass name="FullStops">
<spre:item>FullStop</spre:item>
<spre:item>ExclamationMark</spre:item>
<spre:item>QuestionMark</spre:item>
</spre:tokenClass>
</spre:tokenClasses>
</spre:defaultParser>
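The word-level tokens above are defined by patterns over character classes (startsWith, contains, containsOnly, endsWith). The exact SPre matching semantics are not documented here; under the plausible reading that repeated elements list allowed alternatives, a checker might be sketched as follows.

```python
# Hedged sketch of SPre-style pattern matching over a run of
# character-class names; the semantics are inferred from the element
# names, not taken from the SPre source.
def matches(classes, starts_with=(), contains=(), contains_only=None,
            ends_with=()):
    """Check a sequence of character-class names against a pattern.
    Multiple startsWith/contains/endsWith entries are alternatives."""
    if not classes:
        return False
    if contains_only is not None:
        # containsOnly: every element must carry exactly this class
        return all(c == contains_only for c in classes)
    if starts_with and classes[0] not in starts_with:
        return False
    if ends_with and classes[-1] not in ends_with:
        return False
    if contains and not all(c in contains for c in classes):
        return False
    return True

# e.g. the "Word" token: starts with a Letter, contains only
# Letter or Unprocessable classes
def is_word(classes):
    return matches(classes, starts_with=("Letter",),
                   contains=("Letter", "Unprocessable"))
```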
Word parser for Trithemian cipher texts (Value from Template "Cipher Word Parser" of Template Set "Cipher Text Preprocessor")
false
Configurations for the SPre parser based on the secondary parser
<?xml version="1.0" encoding="UTF-8"?>
<spre:defaultParser
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xmlns:spre="http://spinfo.uni_koeln.de/spre/xmlSchema/SPreDefaultParser"
xs:schemaLocation="http://spinfo.uni_koeln.de/spre/xmlSchema/SPreDefaultParser SPreDefaultParser.xsd">
<spre:layer>SentenceLayer</spre:layer>
<!-- ************* -->
<!-- 1. The tokens -->
<!-- ************* -->
<spre:tokens>
<!-- The token class Unprocessable will always be generated by the
character parser. It is a bit problematic that this name is only
fixed at the level of the source code. -->
<spre:token name="UnprocessableTokenSequence">
<spre:pattern>
<spre:startsWith>Unprocessable</spre:startsWith>
<spre:contains>Unprocessable</spre:contains>
</spre:pattern>
</spre:token>
<spre:token name="UnprocessableToken">
<spre:pattern>
<spre:containsOnly>Unprocessable</spre:containsOnly>
</spre:pattern>
</spre:token>
<!-- ******************** -->
<!-- 1.1 The "Comment" token -->
<!-- ******************** -->
<spre:token name="Comment">
<spre:pattern>
<spre:startsWith>CommentStart</spre:startsWith>
<spre:endsWith>CommentEnd</spre:endsWith>
</spre:pattern>
</spre:token>
<!-- ******************** -->
<!-- 1.2 The "Paragraph" token -->
<!-- ******************** -->
<spre:token name="Paragraph">
<spre:pattern>
<spre:startsWith>Word</spre:startsWith>
<spre:endsWith>NewLine</spre:endsWith>
</spre:pattern>
</spre:token>
</spre:tokens>
<spre:tokenClasses>
<spre:tokenClass name="Unprocessable">
<spre:item>UnprocessableToken</spre:item>
<spre:item>UnprocessableTokenSequence</spre:item>
</spre:tokenClass>
</spre:tokenClasses>
</spre:defaultParser>
Paragraph parser for Trithemian texts (Value from Template "Cipher Paragraph Parser" of Template Set "Cipher Text Preprocessor")
false
If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change.
true
false
Jürgen Hermes
jhermes@spinfo.uni-koeln.de
Sprachliche Informationsverarbeitung
http://www.phil-fak.uni-koeln.de/spinfo-juergenhermes.html
Christoph Benden
cbenden@spinfo.uni-koeln.de
Sprachliche Informationsverarbeitung
No external URL defined
A configurable layered tokenizer.
No external URL defined
corpusstatistics.CoincidenceStatisticsComponent
de.uni_koeln.spinfo.tesla.roles.labeler.corpusstats.CoincidenceStatsImpl
de.uni_koeln.spinfo.tesla.roles.labeler.corpusstats.CoincidenceStatsAccessAdapterImpl
de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter
41734626
Coincidence Statistics Calculator
General information about this role: Calculates intra- and inter-signal coincidence statistics (kappa and chi)
-
Tokenizer
Detects linguistic tokens.
de.uni_koeln.spinfo.tesla.roles.tokenizer.Tokenizer
4747ede1-d9e4-4ce8-94da-f6bf397c217f
-2102528184
de.uni_koeln.spinfo.tesla.roles.tokenizer.access.ITokenAccessAdapter
de.uni_koeln.spinfo.tesla.roles.tokenizer.data.IToken
Select if you want to calculate the CI values for the whole selection, for each document, or for document pairs.
Document Values
Selection Values
Document Values
Document Pair Values
false
if true, the statistics will be calculated on the labels of each token, otherwise on the signal contents
false
false
if true, statistics will be calculated per document; if false, per document selection
true
false
if true, all upper case letters of the signals will be replaced by their lower case counterparts
true
false
If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change.
false
false
jhermes
jhermes@spinfo.uni-koeln.de
uni-koeln.ifl.spinfo
http://www.phil-fak.uni-koeln.de/spinfo-juergenhermes.html
Calculates coincidence values of texts: kappa, chi, psi, and phi
No external URL defined
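The kappa, chi, and phi values named above are, in classical cryptanalysis, simple coincidence counts. Assuming the component follows those textbook definitions (the psi test is omitted here, and whether the component uses exactly these formulas is an assumption), they can be sketched as:

```python
# Hedged sketch of classical coincidence statistics in the sense of
# the cryptanalytic kappa/chi/phi tests.
from collections import Counter

def kappa(a, b):
    """Rate of positional coincidences between two aligned texts."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n

def chi(a, b):
    """Cross-text coincidence rate from letter frequencies."""
    fa, fb = Counter(a), Counter(b)
    return sum(fa[c] * fb[c] for c in fa) / (len(a) * len(b))

def phi(a):
    """Index of coincidence within a single text."""
    f = Counter(a)
    n = len(a)
    return sum(k * (k - 1) for k in f.values()) / (n * (n - 1))
```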
corpusstatistics.CorpusStatisticsComponent
de.uni_koeln.spinfo.tesla.roles.labeler.corpusstats.CorpusStatsImpl
de.uni_koeln.spinfo.tesla.roles.labeler.corpusstats.CorpusStatsAccessAdapterImpl
de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter
418017320
Corpus Statistics Calculator
General information about this role: Calculates diverse corpus statistics (Zipf and entropy values, word length distribution, type-token frequency, etc.).
de.uni_koeln.spinfo.tesla.roles.vectorengine.data.impl.hibernate.IntegerArrayVector
de.uni_koeln.spinfo.tesla.roles.vectorengine.access.impl.tunguska.IntegerVectorAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter
-1330984765
Integer Vector Generator
General information about this role: Generates a vector representation of the processed data in which each vector consists of integers.
de.uni_koeln.spinfo.tesla.roles.vectorengine.data.impl.hibernate.IntegerArrayVector
de.uni_koeln.spinfo.tesla.roles.vectorengine.access.impl.tunguska.IntegerVectorAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter
-461859408
Integer Vector Generator
General information about this role: Generates a vector representation of the processed data in which each vector consists of integers.
de.uni_koeln.spinfo.tesla.roles.vectorengine.data.impl.hibernate.DoubleArrayVector
de.uni_koeln.spinfo.tesla.roles.vectorengine.access.impl.tunguska.DoubleVectorAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter
1445352594
Double Vector Generator
General information about this role: Generates a vector representation of the processed data in which each vector consists of floating point numbers.
-
Tokenizer
Detects linguistic tokens.
de.uni_koeln.spinfo.tesla.roles.tokenizer.Tokenizer
4747ede1-d9e4-4ce8-94da-f6bf397c217f
-2102528184
de.uni_koeln.spinfo.tesla.roles.tokenizer.access.ITokenAccessAdapter
de.uni_koeln.spinfo.tesla.roles.tokenizer.data.IToken
Method to calculate the type-token relation. Choose between standard calculation (standardised by the number of tokens per cohort) and position calculation (usual or Koehler-Galle style)
Standardised-1K
Position-Usual
Position-Koehler-Galle
Standardised-1K
Standardised-10K
Standardised-100K
Standardised-1M
false
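Assuming "Standardised-1K" means a type-token ratio averaged over consecutive cohorts of 1,000 tokens (the component's exact cohort and boundary handling is not documented here), the calculation can be sketched as:

```python
# Sketch of a standardised type-token ratio: the mean TTR over
# consecutive fixed-size cohorts; only full cohorts are counted here,
# which is an assumption about boundary handling.
def standardised_ttr(tokens, cohort_size=1000):
    ratios = []
    for i in range(0, len(tokens) - cohort_size + 1, cohort_size):
        cohort = tokens[i:i + cohort_size]
        ratios.append(len(set(cohort)) / cohort_size)
    return sum(ratios) / len(ratios) if ratios else 0.0
```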
Please choose between true and false. The default value is true.
true
false
Please choose between true and false. The default value is true.
true
false
Please choose between true and false. The default value is true.
true
false
Please choose between true and false. The default value is true.
true
false
Please choose between true and false. The default value is true.
true
false
Please choose between true and false. The default value is true.
true
false
Please choose between true and false. The default value is true.
true
false
Please choose between true and false. The default value is true.
true
false
if true, the statistics will be calculated on the labels of each token, otherwise on the signal contents
false
false
if true, statistics will be calculated per document; if false, per document selection
true
false
if true, all upper case letters of the signals will be replaced by their lower case counterparts
true
false
If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change.
false
false
Jürgen Hermes
http://www.phil-fak.uni-koeln.de/spinfo-juergenhermes.html
CorpusStatistics, a Tesla (http://www.spinfo.uni-koeln.de/space/Forschung/Tesla) natural language processing component. Note: The calculation of the entropy values is very memory intensive. If you want to calculate document statistics, the documents should not be too large; if you want to calculate overall statistics, your corpus should not be too large.
No external URL defined
corpusstatistics.RandomWalkComponent
de.uni_koeln.spinfo.tesla.roles.vectorengine.data.impl.hibernate.DoubleArrayVector
de.uni_koeln.spinfo.tesla.roles.vectorengine.access.impl.tunguska.DoubleVectorAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter
1087309178
Double Vector Generator
General information about this role: Generates a vector representation of the processed data in which each vector consists of floating point numbers.
de.uni_koeln.spinfo.tesla.roles.vectorengine.data.impl.hibernate.DoubleArrayVector
de.uni_koeln.spinfo.tesla.roles.vectorengine.access.impl.tunguska.DoubleVectorAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter
2000505860
Double Vector Generator
General information about this role: Generates a vector representation of the processed data in which each vector consists of floating point numbers.
-
Tokenizer
Detects linguistic tokens.
de.uni_koeln.spinfo.tesla.roles.tokenizer.Tokenizer
4747ede1-d9e4-4ce8-94da-f6bf397c217f
-2102528184
de.uni_koeln.spinfo.tesla.roles.tokenizer.access.ITokenAccessAdapter
de.uni_koeln.spinfo.tesla.roles.tokenizer.data.IToken
Alphabet of characters that should be converted to bitsets; order is not significant
abcdefghijklmnopqrstuvwxyz
false
Upper bound of interval calculations relative to the walk length (e.g., with a value of 10, the longest interval calculated has size WalkLength/10)
1000
false
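The interval parameter above can be illustrated with a toy random-walk analysis. The step rule used here (vowels up, everything else down) is purely an assumption for illustration, not the component's actual mapping:

```python
# Sketch: map a text to a one-dimensional walk and measure how its
# mean squared displacement grows with the interval length; deviation
# from linear growth indicates long-range correlations.
def random_walk(text, step_up="aeiou"):
    pos, walk = 0, [0]
    for ch in text:
        pos += 1 if ch in step_up else -1   # assumed step rule
        walk.append(pos)
    return walk

def mean_squared_displacement(walk, interval):
    """Average squared displacement over all windows of a given size."""
    diffs = [(walk[i + interval] - walk[i]) ** 2
             for i in range(len(walk) - interval)]
    return sum(diffs) / len(diffs)
```

Per the parameter above, intervals would range from 1 up to WalkLength divided by the configured bound.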
if true, the statistics will be calculated on the labels of each token, otherwise on the signal contents
false
false
if true, statistics will be calculated per document; if false, per document selection
true
false
if true, all upper case letters of the signals will be replaced by their lower case counterparts
true
false
If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change.
false
false
jhermes
jhermes@spinfo.uni-koeln.de
http://www.phil-fak.uni-koeln.de/spinfo-juergenhermes.html
Performs random walks and calculates long-range correlations
No external URL defined
corpusstatistics.RepeatedWordsDetector
de.uni_koeln.spinfo.tesla.roles.labeler.corpusstats.MultipleTokensStats
de.uni_koeln.spinfo.tesla.roles.labeler.corpusstats.MultipleTokensStatsAccessAdapterImpl
de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter
-416895701
Consecutive Multiples Detector
General information about this role: Detects occurrences of identical or very similar words (Levenshtein distance of at most 1) that directly follow each other.
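A minimal sketch of such a detector, using a plain dynamic-programming Levenshtein distance; this is illustrative only, not the component's implementation:

```python
# Sketch: find directly consecutive identical or near-identical tokens.
def levenshtein(s, t):
    """Classic O(len(s)*len(t)) edit-distance DP with two rows."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]

def consecutive_multiples(tokens, max_distance=1):
    """Index pairs of adjacent tokens within the distance bound."""
    return [(i, i + 1) for i in range(len(tokens) - 1)
            if levenshtein(tokens[i], tokens[i + 1]) <= max_distance]
```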
-
Tokenizer
Detects linguistic tokens.
de.uni_koeln.spinfo.tesla.roles.tokenizer.Tokenizer
4747ede1-d9e4-4ce8-94da-f6bf397c217f
-2102528184
de.uni_koeln.spinfo.tesla.roles.tokenizer.access.ITokenAccessAdapter
de.uni_koeln.spinfo.tesla.roles.tokenizer.data.IToken
if true, the statistics will be calculated on the labels of each token, otherwise on the signal contents
false
false
if true, statistics will be calculated per document; if false, per document selection
true
false
if true, all upper case letters of the signals will be replaced by their lower case counterparts
true
false
If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change.
false
false
jhermes
jhermes@spinfo.uni-koeln.de
uni-koeln.ifl.spinfo
http://www.phil-fak.uni-koeln.de/spinfo-juergenhermes.html
Detects occurrences of repeated words
No external URL defined
de.uni_koeln.spinfo.tesla.component.reader.tika.DefaultTikaReader
de.uni_koeln.spinfo.tesla.roles.expressions.impl.hibernate.data.Url
de.uni_koeln.spinfo.tesla.roles.expressions.impl.hibernate.access.UrlAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter
220906638
URL Detector
General information about this role: Detects URLs.
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.hibernate.data.Paragraph
de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TParagraphAccessAdapter
de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff
561466935
Paragraph Detector
General information about this role: Detects paragraph boundaries.
de.uni_koeln.spinfo.tesla.roles.dc.impl.hibernate.DublinCoreMetaDataImpl
de.uni_koeln.spinfo.tesla.roles.dc.impl.hibernate.DublinCoreMetaDataAccessAdapterImpl
de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter
735549461
Dublin Core Metadata Generator
General information about this role: Generates Dublin Core metadata annotations.
If enabled (default), the reader will detect URLs and generate corresponding annotations.
true
false
If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change.
true
false
Stephan Schwiebert
sschwieb@spinfo.uni-koeln.de
Department of Computational Linguistics, University of Cologne
http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html
A general-purpose reader based on Apache Tika, which supports various formats such as RTF, PDF, ODF, HTML, and MS Office. Note, however, that the structure of a document will not be extracted or annotated.
http://tika.apache.org/
java.lang.String
Compare Texts Statistics
Calculation of various statistics for several Renaissance texts.
jhermes
hermesj@uni-koeln.de
none