Text preprocessing

Created: 2011-01-07 16:13:58      Last updated: 2011-01-07 16:17:13

The input to this workflow is plain text. The text is preprocessed in three steps: non-alphanumeric symbols are removed, the text is converted to lower case, and stop words are removed.

The workflow first removes the characters from this set: `~!@#$%^&*()_+=-{}|\][":;'?><,./.

Then it converts the text to lower case. The user is prompted to select a stop-word dictionary from a list, and the workflow removes the stop words found in the selected dictionary.
Stop words are words that carry little meaning on their own, such as "the" and "an". The web service for stop-word removal integrates six English stop-word dictionaries and one for the Slovenian language.

The output of the workflow is lower-case text without non-alphanumeric characters and without stop words.
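
The three steps described above can be approximated outside Taverna with a short Python sketch. This is not the workflow's actual implementation: the function name `preprocess` and the tiny `STOP_WORDS` set are hypothetical stand-ins (the real workflow obtains its dictionaries from a web service), and replacing each symbol with a space rather than deleting it outright is an assumption made here so that neighbouring words do not fuse.

```python
# Characters listed in the workflow description, copied verbatim.
SYMBOLS = "`~!@#$%^&*()_+=-{}|\\][\":;'?><,./"

# Hypothetical stop-word list, for illustration only; the real workflow
# lets the user pick one of seven dictionaries served by a web service.
STOP_WORDS = {"the", "a", "an", "and", "of", "is"}

# Translation table mapping every listed symbol to a space (an assumption:
# the workflow may delete the symbols instead of replacing them).
_SYMBOL_TABLE = str.maketrans({ch: " " for ch in SYMBOLS})


def preprocess(text, stop_words=STOP_WORDS):
    """Strip symbols, lower-case the text, and drop stop words."""
    no_symbols = text.translate(_SYMBOL_TABLE)  # step 1: remove symbols
    lowered = no_symbols.lower()                # step 2: lower case
    tokens = lowered.split()                    # split on whitespace
    kept = [t for t in tokens if t not in stop_words]  # step 3: stop words
    return " ".join(kept)
```

With these assumptions, `preprocess("The Quick, brown fox: an example!")` returns `"quick brown fox example"`. Lower-casing before the stop-word lookup matters, because the dictionary entries are assumed to be lower case.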

Run

Run this Workflow in the Taverna Workbench...

Option 1:

Copy and paste this link into File > 'Open workflow location...'
http://myexperiment.org/workflows/1750/download?version=1

Workflow Components

Authors (1)
Titles (1)
Descriptions (1)
Dependencies (0)
Inputs (1)
Processors (10)
Beanshells (1)
Outputs (1)
Datalinks (11)
Coordinations (0)

Workflow Type

Taverna 2

License

All versions of this Workflow are licensed under:

Version 1 (of 1)

Other workflows that use similar services (2)

Lemmatization (3)

The workflow lemmatizes the text on its input port: it takes text as input and returns language-dependent lemmatized text as output. The words in the resulting text appear in the same order as in the original text, but each is transformed to its dictionary form. The workflow asks for the language of lemmatization. Currently, 12 languages are supported: en, sl, ge, bg, cs, et, fr, hu, ro, sr, it, sp.

Created: 2010-12-17 | Last updated: 2010-12-23

Credits: Petra Kralj Novak

Attributions: Select from a list of possible web service parameter values

Select from a list of possible web service... (1)

The workflow for selecting from a list of possible web service parameter values has two input ports: the WSDL address of the web service and the variable name. It parses the web service's WSDL description (using the web service http://ropot.ijs.si/webservices/janez/getvalues.php?wsdl) and then asks the user to select one value from a drop-down menu. This workflow is useful when a web service has an input that expects one value from a list of possible values.

Created: 2010-12-23 | Last updated: 2010-12-23

Credits: Petra Kralj Novak, Janez Kranjc