sig_1 A selection of 28 documents, created by geduldia wikipedia_articles ri-92067217 ri1043728585 ri-1178423057 63cea67f-09e1-470a-a745-62ffa473e68e 63cea67f-09e1-470a-a745-62ffa473e68e de.uni_koeln.spinfo.tesla.component.simpletokenizer.SimpleTokenizer de.uni_koeln.spinfo.tesla.roles.core.impl.hibernate.data.Token de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TTokenizerAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff 167533922 Tokenizer General information about this role: Detects linguistic tokens. de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.hibernate.data.Sentence de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TSentenceTokenAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff 1746718580 Sentence Detector General information about this role: Detects sentence boundaries. The locale that will be used to determine word and sentence boundaries. For best results, set this value to the language of the texts which are being processed. See http://download.oracle.com/javase/6/docs/api/java/util/Locale.html and http://download.oracle.com/javase/6/docs/api/java/text/BreakIterator.html for technical details. default false If true, the tokenizer will produce annotations for whitespaces. If this option is enabled, nearly twice as many annotations will be generated, which is why it is set to false by default. false false If true, the type id of an annotation will be calculated from the underlying string in lowercase letters. This reduces the overall quantity of types produced (at least if used for texts which contain lots of capital letters); however, it might affect the quality of components which make use of the type ids generated by this component. false false If false, this component will be executed whenever used in an experiment. 
If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change. false false Stephan Schwiebert sschwieb@spinfo.uni-koeln.de Department of Computational Linguistics, University of Cologne http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html A quick and dirty layered tokenizer based on Java's java.text.BreakIterator. The intention of this tokenizer was to test the exchangeability of SPre and to offer a tokenizer which is "failsafe", as it cannot be misconfigured. Note, however, that other tokenizers will probably produce much better results than this one. No external URL defined de.uni_koeln.spinfo.tesla.component.gazetteer.GazetteerComponent de.uni_koeln.spinfo.tesla.roles.categorizer.impl.hibernate.data.MultiValueCategory de.uni_koeln.spinfo.tesla.component.gazetteer.GazetteerAccessAdapter de.uni_koeln.spinfo.tesla.component.gazetteer.GazetteerOutputAdapter 47203239 Multi Value Categorizer General information about this role: Assigns multiple categories to tokens. Sentence Detector Detects sentence boundaries. de.uni_koeln.spinfo.tesla.roles.tokenizer.SentenceDetector 1a02f399-ff20-4c6f-848f-5285386c7f33 1746718580 de.uni_koeln.spinfo.tesla.roles.tokenizer.access.ISentenceAccessAdapter de.uni_koeln.spinfo.tesla.roles.tokenizer.data.ISentence Tokenizer Detects linguistic tokens. de.uni_koeln.spinfo.tesla.roles.tokenizer.Tokenizer 1a02f399-ff20-4c6f-848f-5285386c7f33 167533922 de.uni_koeln.spinfo.tesla.roles.tokenizer.access.ITokenAccessAdapter de.uni_koeln.spinfo.tesla.roles.tokenizer.data.IToken Each occurrence of an element in this list will be annotated with the selected label. 
Stopword ab bei da deshalb ein für haben hier ich ja kann machen muesste nach oder seid sonst und vom wann wenn wie zu bin eines hat manche solches an anderm bis das deinem demselben dir doch einig er eurer hatte ihnen ihre ins jenen keinen manchem meinen nichts seine soll unserm welche werden wollte während alle allem allen aller alles als also am ander andere anderem anderen anderer anderes andern anderr anders auch auf aus bist bsp. daher damit dann dasselbe dazu daß dein deine deinen deiner deines dem den denn denselben der derer derselbe derselben des desselben dessen dich die dies diese dieselbe dieselben diesem diesen dieser dieses dort du durch eine einem einen einer einige einigem einigen einiger einiges einmal es etwas euch euer eure eurem euren eures ganz ganze ganzen ganzer ganzes gegen gemacht gesagt gesehen gewesen gewollt hab habe hatten hin hinter ihm ihn ihr ihrem ihren ihrer ihres im in indem ist jede jedem jeden jeder jedes jene jenem jener jenes jetzt kein keine keinem keiner keines konnte können könnte mache machst macht machte machten man manchen mancher manches mein meine meinem meiner meines mich mir mit muss musste müßt nicht noch nun nur ob ohne sage sagen sagt sagte sagten sagtest sehe sehen sehr seht sein seinem seinen seiner seines selbst sich sicher sie sind so solche solchem solchen solcher sollte sondern um uns unse unsen unser unses unter viel von vor war waren warst was weg weil weiter welchem welchen welcher welches werde wieder will wir wird wirst wo wolle wollen wollt wollten wolltest wolltet würde würden z.B. 
zum zur zwar zwischen über aber abgerufen abgerufene abgerufener abgerufenes acht acute allein allerdings allerlei allg allgemein allmählich allzu alsbald amp and andererseits andernfalls anerkannt anerkannte anerkannter anerkanntes anfangen anfing angefangen angesetze angesetzt angesetzten angesetzter ansetzen anstatt arbeiten aufgehört aufgrund aufhören aufhörte aufzusuchen ausdrücken ausdrückt ausdrückte ausgenommen ausser ausserdem author autor außen außer außerdem außerhalb background bald bearbeite bearbeiten bearbeitete bearbeiteten bedarf bedurfte bedürfen been befragen befragte befragten befragter begann beginnen begonnen behalten behielt beide beiden beiderlei beides beim beinahe beitragen beitrugen bekannt bekannte bekannter bekennen benutzt bereits berichten berichtet berichtete berichteten besonders besser bestehen besteht beträchtlich bevor bezüglich bietet bisher bislang biz bleiben blieb bloss bloß border brachte brachten brauchen braucht bringen bräuchte bzw böden ca ca. collapsed com comment content da? 
dabei dadurch dafür dagegen dahin damals danach daneben dank danke danken dannen daran darauf daraus darf darfst darin darum darunter darüber darüberhinaus dass davon davor demnach denen dennoch derart derartig derem deren derjenige derjenigen derzeit desto deswegen diejenige diesseits dinge direkt direkte direkten direkter doc doppelt dorther dorthin drauf drei dreißig drin dritte drunter drüber dunklen durchaus durfte durften dürfen dürfte eben ebenfalls ebenso ehe eher eigenen eigenes eigentlich einbaün einerseits einfach einführen einführte einführten eingesetzt einigermaßen eins einseitig einseitige einseitigen einseitiger einst einstmals einzig elf ende entsprechend entweder ergänze ergänzen ergänzte ergänzten erhalten erhielt erhielten erhält erneut erst erste ersten erster eröffne eröffnen eröffnet eröffnete eröffnetes etc etliche etwa fall falls fand fast ferner finden findest findet folgende folgenden folgender folgendes folglich for fordern fordert forderte forderten fortsetzen fortsetzt fortsetzte fortsetzten fragte frau frei freie freier freies fuer fünf gab ganzem gar gbr geb geben geblieben gebracht gedurft geehrt geehrte geehrten geehrter gefallen gefiel gefälligst gefällt gegeben gehabt gehen geht gekommen gekonnt gemocht gemäss genommen genug gern gestern gestrige getan geteilt geteilte getragen gewissermaßen geworden ggf gib gibt gleich gleichwohl gleichzeitig glücklicherweise gmbh gratulieren gratuliert gratulierte gute guten gängig gängige gängigen gängiger gängiges gänzlich haette halb hallo hast hattest hattet heraus herein heute heutige hiermit hiesige hinein hinten hinterher hoch html http hundert hätt hätte hätten höchstens igitt image immer immerhin important indessen info infolge innen innerhalb insofern inzwischen irgend irgendeine irgendwas irgendwen irgendwer irgendwie irgendwo je jed jedenfalls jederlei jedoch jemand jenseits jährig jährige jährigen jähriges kam kannst kaum kei nes keinerlei keineswegs klar klare klaren klares klein 
kleinen kleiner kleines koennen koennt koennte koennten komme kommen kommt konkret konkrete konkreten konkreter konkretes konnten könn könnt könnten künftig lag lagen langsam lassen laut lediglich leer legen legte legten leicht leider lesen letze letzten letztendlich letztens letztes letztlich lichten liegt liest links längst längstens mag magst mal mancherorts manchmal mann margin med mehr mehrere meist meiste meisten meta mindestens mithin mochte morgen morgige muessen muesst musst mussten muß mußt möchte möchten möchtest mögen möglich mögliche möglichen möglicher möglicherweise müssen müsste müssten müßte nachdem nacher nachhinein nahm natürlich ncht neben nebenan nehmen nein neu neue neuem neuen neuer neues neun nie niemals niemand nimm nimmer nimmt nirgends nirgendwo nter nutzen nutzt nutzung nächste nämlich nötigenfalls nützt oben oberhalb obgleich obschon obwohl oft online org padding per pfui plötzlich pro reagiere reagieren reagiert reagierte rechts regelmäßig rief rund sang sangen schlechter schließlich schnell schon schreibe schreiben schreibens schreiber schwierig schätzen schätzt schätzte schätzten sechs sect sehrwohl sei seit seitdem seite seiten seither selber senke senken senkt senkte senkten setzen setzt setzte setzten sicherlich sieben siebte siehe sieht singen singt sobald sodaß soeben sofern sofort sog sogar solange solc hen solch sollen sollst sollt sollten solltest somit sonstwo sooft soviel soweit sowie sowohl spielen später startet startete starteten statt stattdessen steht steige steigen steigt stets stieg stiegen such suchen sämtliche tages tat tatsächlich tatsächlichen tatsächlicher tatsächliches tausend teile teilen teilte teilten titel total trage tragen trotzdem trug trägt tun tust tut txt tät ueber umso unbedingt ungefähr unmöglich unmögliche unmöglichen unmöglicher unnötig unsem unser unsere unserem unseren unserer unseres unten unterbrach unterbrechen unterhalb unwichtig usw var vergangen vergangene vergangener vergangenes vermag 
vermutlich vermögen verrate verraten verriet verrieten version versorge versorgen versorgt versorgte versorgten versorgtes veröffentlichen veröffentlicher veröffentlicht veröffentlichte veröffentlichten veröffentlichtes viele vielen vieler vieles vielleicht vielmals vier vollständig voran vorbei vorgestern vorher vorne vorüber völlig während wachen waere warum weder wegen weitere weiterem weiteren weiterer weiteres weiterhin weiß wem wen wenig wenige weniger wenigstens wenngleich wer werdet weshalb wessen wichtig wieso wieviel wiewohl willst wirklich wodurch wogegen woher wohin wohingegen wohl wohlweislich womit woraufhin woraus worin wurde wurden währenddessen wär wäre wären zahlreich zehn zeitweise ziehen zieht zog zogen zudem zuerst zufolge zugleich zuletzt zumal zurück zusammen zuviel zwanzig zwei zwölf ähnlich übel überall überallhin überdies übermorgen übrig übrigens A collection of German stopwords false If enabled, upper/lower case will be ignored. false The text is processed case-sensitively. false If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change. false false Stephan Schwiebert sschwieb@spinfo.uni-koeln.de Department of Computational Linguistics, University of Cologne http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html Fabian Steeg fsteeg@spinfo.uni-koeln.de Department of Computational Linguistics, University of Cologne http://www.spinfo.phil-fak.uni-koeln.de/fsteeg.html A component for tagging single words or word sequences in a document. 
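The gazetteer behaviour described above (every token that occurs in the list is annotated with the selected label, optionally ignoring case) can be illustrated with a minimal sketch. This is not the Tesla GazetteerComponent: the function name `tag_with_gazetteer`, its parameters, and the sample data are hypothetical, and the sketch only covers single words, not word sequences.

```python
def tag_with_gazetteer(tokens, entries, label, ignore_case=False):
    """Return (index, token, label) for every token found in the entry list.

    Hypothetical helper for illustration; only single-word entries are handled.
    """
    # Normalise both the list entries and the tokens the same way.
    normalize = str.lower if ignore_case else (lambda s: s)
    lookup = {normalize(entry) for entry in entries}
    return [
        (index, token, label)
        for index, token in enumerate(tokens)
        if normalize(token) in lookup
    ]


# Sample data (assumed): a tiny excerpt of a stopword list and a tokenized sentence.
stopwords = ["und", "der", "die", "das", "ist"]
tokens = ["Die", "Katze", "ist", "schwarz", "und", "klein"]

case_sensitive = tag_with_gazetteer(tokens, stopwords, "Stopword")
case_insensitive = tag_with_gazetteer(tokens, stopwords, "Stopword", ignore_case=True)
```

With case-sensitive matching the capitalised "Die" is not tagged; with `ignore_case=True` it is.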
No external URL defined de.uni_koeln.spinfo.tesla.component.sttstagger.SttsTagger de.uni_koeln.spinfo.tesla.roles.categorizer.syntax.impl.tunguska.data.SimplePartOfSpeech de.uni_koeln.spinfo.tesla.roles.categorizer.syntax.impl.tunguska.access.SimplePartOfSpeechAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff -456038419 POS Tagger General information about this role: Assigns POS Tags to words de.uni_koeln.spinfo.tesla.roles.categorizer.stemming.impl.hibernate.Lemma de.uni_koeln.spinfo.tesla.roles.categorizer.stemming.impl.tunguska.DefaultSingleValueCategoryAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter -751932714 Stemmer General information about this role: Generates stemmed word forms If produced by 'Tree Tagger Wrapper (non-commercial usage only)': Stems (base form of words) Sentence Detector Detects sentence boundaries. de.uni_koeln.spinfo.tesla.roles.tokenizer.SentenceDetector 1a02f399-ff20-4c6f-848f-5285386c7f33 1746718580 de.uni_koeln.spinfo.tesla.roles.tokenizer.access.ISentenceAccessAdapter de.uni_koeln.spinfo.tesla.roles.tokenizer.data.ISentence Tokenizer Detects linguistic tokens. de.uni_koeln.spinfo.tesla.roles.tokenizer.Tokenizer 1a02f399-ff20-4c6f-848f-5285386c7f33 167533922 de.uni_koeln.spinfo.tesla.roles.tokenizer.access.ITokenAccessAdapter de.uni_koeln.spinfo.tesla.roles.tokenizer.data.IToken The Tree Tagger binary file to use Tree Tagger (Windows) single application true The Tree Tagger model file to use. You can download additional models from http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html . german-par-linux-3.2.bin single config true No Description available. false false No Description available. false false If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change. 
false false Stephan Schwiebert (Tesla Wrapper) sschwieb@spinfo.uni-koeln.de Department of Computational Linguistics, University of Cologne http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html Helmut Schmid (Tree Tagger) FirstName.LastName@ims.uni-stuttgart.de Institute of Natural Language Processing, University of Stuttgart http://www.ims.uni-stuttgart.de/~schmid/ A configurable POS Tagger and Stemmer for various languages http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/ de.uni_koeln.spinfo.tesla.component.filter.GenericRangeFilter de.uni_koeln.spinfo.tesla.roles.filter.impl.tunguska.data.Filter de.uni_koeln.spinfo.tesla.roles.filter.impl.tunguska.access.RangeFilterAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter -751691479 Filter General information about this role: Filters annotations by their type id If produced by 'Generic Range Filter': The generated filter, which can be applied on all annotations Anchored Element Generator General information about this role: Generates anchored elements. If consumed by 'Generic Range Filter': The annotations which define if an annotation from will be filtered. de.uni_koeln.spinfo.tesla.roles.core.AnchoredElementGenerator 9210ac4e-8de6-44c8-ad42-02b31163fd6a 47203239 de.uni_koeln.spinfo.tesla.roles.core.access.IAnchoredElementAccessAdapter de.uni_koeln.spinfo.tesla.roles.core.data.IAnchoredElement If enabled, an annotation is not accepted if it matches an annotation from 'Filter'. If disabled, an annotation will be accepted if it matches an annotation from 'Filter' true false If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change. 
false false Stephan Schwiebert sschwieb@spinfo.uni-koeln.de Department of Computational Linguistics, University of Cologne http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html Filters annotations depending on their range within a signal. Each annotation in 'Reference' is compared to the annotations in 'Filter'; if an annotation with the same range is found, the annotation is accepted, otherwise it is rejected. This behaviour can be inverted by modifying the configuration of this component. No external URL defined de.uni_koeln.spinfo.tesla.component.filter.SimplePOSFilter de.uni_koeln.spinfo.tesla.roles.filter.impl.tunguska.data.Filter de.uni_koeln.spinfo.tesla.roles.filter.impl.tunguska.access.RangeFilterAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter 1157803695 Filter General information about this role: Filters annotations by their type id If produced by 'Simple POS Filter': The generated filter, which can be applied to all annotations POS Tagger General information about this role: Assigns POS Tags to words If consumed by 'Simple POS Filter': The pos tags which define if an annotation from will be filtered. de.uni_koeln.spinfo.tesla.roles.categorizer.syntax.PosTagger 063592f0-5b17-4410-be26-cc130caa5250 -456038419 de.uni_koeln.spinfo.tesla.roles.categorizer.syntax.access.IPartOfSpeechAccessAdapter de.uni_koeln.spinfo.tesla.roles.categorizer.syntax.data.base.IPartOfSpeech If enabled, an annotation is not accepted if it matches an annotation from 'Filter'. If disabled, an annotation will be accepted if it matches an annotation from 'Filter' false false No Description available. N false If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change. 
false false Stephan Schwiebert sschwieb@spinfo.uni-koeln.de Department of Computational Linguistics, University of Cologne http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html No external URL defined Lets only nouns pass that are not stopwords. de.uni_koeln.spinfo.tesla.component.filter.FilterWriter de.uni_koeln.spinfo.tesla.roles.linker.impl.hibernate.data.LinkedAnnotations de.uni_koeln.spinfo.tesla.roles.linker.impl.tunguska.access.DefaultLinkedAnnotationsAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter -1419049393 Linker General information about this role: Generates links from an annotation to a collection of annotations. Filter Filters annotations by their type id de.uni_koeln.spinfo.tesla.roles.filter.Filter f4833aae-0ca0-4c4b-964a-a473268bfdfd -751691479 de.uni_koeln.spinfo.tesla.roles.filter.access.IFilterAccessAdapter de.uni_koeln.spinfo.tesla.roles.filter.data.IFilter Filter Filters annotations by their type id de.uni_koeln.spinfo.tesla.roles.filter.Filter b4bd764a-00b4-4feb-82fb-bed690c68305 1157803695 de.uni_koeln.spinfo.tesla.roles.filter.access.IFilterAccessAdapter de.uni_koeln.spinfo.tesla.roles.filter.data.IFilter Filter Filters annotations by their type id de.uni_koeln.spinfo.tesla.roles.filter.Filter 1855141455 de.uni_koeln.spinfo.tesla.roles.filter.access.IFilterAccessAdapter de.uni_koeln.spinfo.tesla.roles.filter.data.IFilter Anchored Element Generator Generates anchored elements. de.uni_koeln.spinfo.tesla.roles.core.AnchoredElementGenerator 1a02f399-ff20-4c6f-848f-5285386c7f33 167533922 de.uni_koeln.spinfo.tesla.roles.core.access.IAnchoredElementAccessAdapter de.uni_koeln.spinfo.tesla.roles.core.data.IAnchoredElement If disabled (default), an annotation will be written if all filters accepted it. If enabled, it will be rewritten if one or more filters rejected it. false false If false, this component will be executed whenever used in an experiment. 
If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change. false false Stephan Schwiebert sschwieb@spinfo.uni-koeln.de Department of Computational Linguistics, University of Cologne http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html Rewrites annotations based on the result of one or more filters, in form of linked annotations. Useful for visualization or export. No external URL defined de.uni_koeln.spinfo.tesla.component.statistics.TfIdfCalculator de.uni_koeln.spinfo.tesla.roles.document.impl.data.Frequencies de.uni_koeln.spinfo.tesla.roles.document.impl.access.TunguskaStatisticsAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter 1391185792 TF/IDF General information about this role: Provides access to the term frequency/inverse document frequency statistics. Anchored Element Generator Generates anchored elements. de.uni_koeln.spinfo.tesla.roles.core.AnchoredElementGenerator 7f105afc-bc1d-4961-91b2-3094fa73d5a1 -1419049393 de.uni_koeln.spinfo.tesla.roles.core.access.IAnchoredElementAccessAdapter de.uni_koeln.spinfo.tesla.roles.core.data.IAnchoredElement If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change. false false Stephan Schwiebert sschwieb@spinfo.uni-koeln.de Sprachliche Informationsverarbeitung http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html Calculates Term Frequency/Inverse Document Frequency of any data objects, based on their type id. 
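As a rough illustration of what a TF/IDF calculation over type ids involves, the following sketch uses one common weighting variant: relative term frequency multiplied by the natural logarithm of (number of documents / document frequency). The exact formula used by the TfIdfCalculator is not stated here, so this variant, the function name `tf_idf`, and the sample documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of lists of type ids.

    Returns one {type_id: weight} dict per document, using
    tf = count / document length and idf = ln(N / document frequency).
    This weighting variant is an assumption, not Tesla's documented formula.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each type id occur?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            type_id: (count / total) * math.log(n_docs / doc_freq[type_id])
            for type_id, count in counts.items()
        })
    return weights


# Sample data (assumed): three tiny "documents" of already filtered nouns.
docs = [["haus", "garten", "haus"], ["haus", "auto"], ["garten", "baum"]]
weights = tf_idf(docs)
```

A type id that occurs in every document gets weight 0 under this variant, which is why frequent, uninformative terms contribute little to later vector comparisons.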
http://en.wikipedia.org/wiki/Tf%E2%80%93idf documentvectorgenerator.DocumentVectorGeneratorComponent de.uni_koeln.spinfo.tesla.roles.vectorengine.data.impl.hibernate.LabeledDoubleVector de.uni_koeln.spinfo.tesla.roles.vectorengine.access.impl.tunguska.DoubleVectorAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter -69107424 Labeled Double Vector Generator General information about this role: Generates labeled double vectors. TF/IDF Provides access to the term frequency/inverse document frequency statistics. de.uni_koeln.spinfo.tesla.roles.document.statistics.TfIdf 1130ab3c-7b74-4107-b1b5-6e7f507ac137 1391185792 de.uni_koeln.spinfo.tesla.roles.document.access.ITfIdfAccessAdapter de.uni_koeln.spinfo.tesla.roles.document.data.IFrequencies Attribute Assigner Assigns attributes. de.uni_koeln.spinfo.tesla.roles.attributes.AttributeAssigner 86d32aee-84bc-4c0e-9932-b5f23a0f0418 -1178423057 de.uni_koeln.spinfo.tesla.roles.core.access.IAttributeValueMapAccessAdapter de.uni_koeln.spinfo.tesla.roles.core.data.IAttributeValueMap The minimal number of overall occurrences of a single annotation to be represented in a vector. 5 false If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change. false false geduldig none Sprachliche Informationsverarbeitung, Universität zu Köln No external URL defined DocumentVectorGenerator, a Tesla (http://www.tesla.uni-koeln.de) natural language processing component. 
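The minimum-occurrence setting above (5 in this experiment) decides which annotations become vector dimensions at all. A minimal sketch of that idea follows; it is not the DocumentVectorGenerator itself. The function name `build_vectors` and the sample data are hypothetical, and raw counts stand in for the TF/IDF values that the component consumes according to its role list.

```python
from collections import Counter


def build_vectors(documents, min_occurrences=5):
    """documents: {label: list of type ids}.

    Returns (dimensions, {label: vector}). A type id becomes a dimension only
    if it occurs at least `min_occurrences` times across all documents.
    Vector values are raw counts here (an assumption made for brevity).
    """
    totals = Counter()
    for tokens in documents.values():
        totals.update(tokens)
    # Sorted so that every vector uses the same, reproducible dimension order.
    dimensions = sorted(t for t, c in totals.items() if c >= min_occurrences)

    vectors = {}
    for label, tokens in documents.items():
        counts = Counter(tokens)
        vectors[label] = [float(counts[t]) for t in dimensions]
    return dimensions, vectors


# Sample data (assumed): "a" and "c" reach the threshold of 5, "b" does not.
sample = {"d1": ["a"] * 3 + ["b"], "d2": ["a"] * 2 + ["c"] * 5}
dimensions, vectors = build_vectors(sample, min_occurrences=5)
```

Raising the threshold shrinks the vectors and removes rare terms, which usually speeds up clustering at the cost of discarding some distinguishing vocabulary.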
No external URL defined de.uni_koeln.spinfo.tesla.component.kmeans.KMeansClustererComponent de.uni_koeln.spinfo.tesla.roles.cluster.impl.db4o.data.WeightedLabeledCluster de.uni_koeln.spinfo.tesla.roles.cluster.impl.db4o.access.WeightedClusterAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.db4o.DefaultDB4OOutputAdapter -1268185859 Weighted Clusterer General information about this role: Generates a weighted cluster. If produced by 'K-Means++ Clusterer': k different clusters. Each vector from 'Vectors' is assigned to exactly one cluster. Vector Generator General information about this role: Generates a vector representation of the processed data. If consumed by 'K-Means++ Clusterer': The vectors which will be clustered. de.uni_koeln.spinfo.tesla.roles.vectors.VectorGenerator 99e3cf93-9d23-44b7-b2f2-505b268f7ce1 -69107424 de.uni_koeln.spinfo.tesla.roles.vectorengine.access.IVectorAccessAdapter de.uni_koeln.spinfo.tesla.roles.vectorengine.data.IVector Random Seed for the cluster algorithm. Depending on this value, initial cluster centers will be chosen (see http://en.wikipedia.org/wiki/K-means++ for details). If set to -1, a new seed will be used each time the component is executed. Note that you might want to set 'reusable results' to 'false' in this case. 0 false The number of clusters. Choose this value with care, as the optimal number of clusters highly depends on the properties of the clustered data. If set to -1 (which is the default value), the number of clusters will be guessed with a simple rule of thumb: k = sqrt(n/2), where n is the number of vectors to cluster. See http://en.wikipedia.org/wiki/K-means_clustering and http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set for details. 3 false The maximum number of iterations. During each iteration, cluster centers may be modified and each vector may be assigned to a different cluster. If no cluster center was modified during an iteration, the algorithm terminates. 
Increasing this number will usually create better results, but also slows down the calculation. 50 false The distance function to compare two vectors. de.uni_koeln.spinfo.tesla.roles.vectorengine.data.distance.CosinusDistanceCalculator de.uni_koeln.spinfo.tesla.roles.vectorengine.data.distance.IDistanceCalculator single class true Indicates whether members of clusters should be displayed in the report. true false If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change. false false Stephan Schwiebert sschwieb@spinfo.uni-koeln.de Linguistic Data Processing, University of Cologne http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html A simple KMeans++ Clusterer, which assigns vectors to K different clusters. The implementation of this component uses the KMeansPlusPlusClusterer of the apache.commons.math project. http://en.wikipedia.org/wiki/K-means%2B%2B de.uni_koeln.spinfo.tesla.component.reader.SpinfoCorpusReader de.uni_koeln.spinfo.tesla.roles.dc.impl.hibernate.DublinCoreMetaDataImpl de.uni_koeln.spinfo.tesla.roles.dc.impl.hibernate.DublinCoreMetaDataAccessAdapterImpl de.uni_koeln.spinfo.tesla.annotation.adapter.hibernate.DefaultHibernateOutputAdapter 1043728585 Dublin Core Metadata Generator General information about this role: Generates Dublin Core metadata annotations. de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.hibernate.data.Paragraph de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TParagraphAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff -92067217 Paragraph Detector General information about this role: Detects paragraph boundaries. 
de.uni_koeln.spinfo.tesla.roles.core.impl.tunguska.access.AttributeValueMap de.uni_koeln.spinfo.tesla.roles.core.impl.tunguska.access.AttributeValueMapAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff -1178423057 Attribute Assigner General information about this role: Assigns attributes. If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change. true false Jürgen Hermes jhermes@spinfo.uni-koeln.de Sprachliche Informationsverarbeitung No external URL defined Reads files in Spinfo Corpus Format No external URL defined java.lang.String de.uni_koeln.spinfo.tesla.component.groupeval.PurityValidatorComponent de.uni_koeln.spinfo.tesla.roles.core.impl.hibernate.data.Token de.uni_koeln.spinfo.tesla.roles.tokenizer.impl.tunguska.access.TTokenizerAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff 1844926976 Tokenizer General information about this role: Detects linguistic tokens. Group Assigner Assigns annotations to containers (groups). de.uni_koeln.spinfo.tesla.roles.core.GroupAssigner 1835738706 de.uni_koeln.spinfo.tesla.roles.core.access.IContainerAccessAdapter de.uni_koeln.spinfo.tesla.roles.core.data.IContainer Anchored Element Generator Generates anchored elements. de.uni_koeln.spinfo.tesla.roles.core.AnchoredElementGenerator 86d32aee-84bc-4c0e-9932-b5f23a0f0418 -1178423057 de.uni_koeln.spinfo.tesla.roles.core.access.IAnchoredElementAccessAdapter de.uni_koeln.spinfo.tesla.roles.core.data.IAnchoredElement Group Assigner Assigns annotations to containers (groups). 
de.uni_koeln.spinfo.tesla.roles.core.GroupAssigner 5988d5a3-806e-4101-acc7-d781649b6325 -1523333280 de.uni_koeln.spinfo.tesla.roles.core.access.IContainerAccessAdapter de.uni_koeln.spinfo.tesla.roles.core.data.IContainer If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change. true false Alena Geduldig -- Sprachliche Informationsverarbeitung http://www.spinfo.phil-fak.uni-koeln.de Calculates purity and Rand index of a set of groups (clusters or otherwise) No external URL defined de.uni_koeln.spinfo.tesla.component.groupeval.ClusterGroupGenerator de.uni_koeln.spinfo.tesla.component.groupeval.RandomGroup de.uni_koeln.spinfo.tesla.component.groupeval.RandomGroupAccessAdapter de.uni_koeln.spinfo.tesla.annotation.adapter.tunguska.DefaultTunguskaOutputAdapter$ProtoStuff -1523333280 Group Assigner General information about this role: Assigns annotations to containers (groups). Clusterer Generates a cluster. de.uni_koeln.spinfo.tesla.roles.cluster.Clusterer 21a259a6-a646-49bc-bbfb-17d16c2ec391 -1268185859 de.uni_koeln.spinfo.tesla.roles.cluster.access.IClusterAccessAdapter de.uni_koeln.spinfo.tesla.roles.cluster.data.ICluster If false, this component will be executed whenever used in an experiment. If true, the annotations produced by this component earlier will be reused if the execution prerequisites did not change. true false Stephan Schwiebert sschwieb@spinfo.uni-koeln.de Sprachliche Informationsverarbeitung http://www.spinfo.phil-fak.uni-koeln.de/sschwieb.html Test select method No external URL defined TextClustering geduldia a none
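
The purity and Rand index that the PurityValidatorComponent reports can be illustrated with the standard textbook definitions. The sketch below is not the component's code: the function names, the parallel-list input format, and the sample assignment are assumptions chosen for brevity.

```python
from collections import Counter
from itertools import combinations


def purity(clusters, gold):
    """clusters, gold: parallel lists with one cluster id and one
    reference group id per item. Returns the share of items that belong
    to the majority reference group of their cluster."""
    by_cluster = {}
    for cluster_id, gold_id in zip(clusters, gold):
        by_cluster.setdefault(cluster_id, Counter())[gold_id] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(clusters)


def rand_index(clusters, gold):
    """Share of item pairs on which clustering and reference grouping agree
    (both put the pair together, or both keep it apart).
    Requires at least two items."""
    pairs = list(combinations(range(len(clusters)), 2))
    agreements = sum(
        (clusters[i] == clusters[j]) == (gold[i] == gold[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Sample data (assumed): six documents, two clusters, two reference groups.
clusters = [0, 0, 0, 1, 1, 1]
gold = ["x", "x", "y", "y", "y", "x"]
p = purity(clusters, gold)
r = rand_index(clusters, gold)
```

Purity alone rewards many small clusters (one item per cluster is always perfectly pure), so it is usually read together with a pair-based measure such as the Rand index.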