Text preprocessing
The input to this workflow is plain text. The workflow preprocesses the text in three steps: it removes non-alphanumeric symbols, converts the text to lower case, and removes stop words.
The workflow first removes the characters in this set: `~!@#$%^&*()_+=-{}|\][":;'?><,./.
It then converts the text to lower case. Next, the user is prompted to select a stop word dictionary from a list, and the workflow removes the stop words found in the selected dictionary.
Stop words are words that carry little meaning on their own, such as "the" and "an". The web service for stop word removal integrates six English stop word dictionaries and one for the Slovenian language.
The output of the workflow is lower-case text without non-alphanumeric characters and without stop words.
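The three steps described above can be approximated outside Taverna with a short Python sketch. This is not the workflow's own implementation or its web service. The character set is copied from the description, while the names `PUNCTUATION`, `SAMPLE_STOP_WORDS` and `preprocess` are hypothetical, and the tiny stop word set stands in for the service's dictionaries, which are not reproduced here. The sketch also assumes that the listed characters are deleted outright and that words are separated by whitespace.

```python
# Characters listed in the workflow description as removed in the first step.
PUNCTUATION = "`~!@#$%^&*()_+=-{}|\\][\":;'?><,./"

# Hypothetical stand-in for one of the service's stop word dictionaries;
# the real lists are much longer.
SAMPLE_STOP_WORDS = {"the", "an", "a"}


def preprocess(text, stop_words=SAMPLE_STOP_WORDS):
    """Strip listed characters, lower-case the text, and drop stop words."""
    # Step 1: delete every character in the listed set (assumed: deleted,
    # not replaced by a space).
    table = str.maketrans("", "", PUNCTUATION)
    stripped = text.translate(table)

    # Step 2: convert to lower case.
    lowered = stripped.lower()

    # Step 3: split on whitespace and keep only words that are not stop words.
    tokens = [word for word in lowered.split() if word not in stop_words]
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess("The quick, brown fox saw an owl!"))
```

Because the listed characters are deleted rather than replaced, a word such as "don't" becomes "dont" in this sketch; whether the actual service behaves the same way is not stated in the description.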
Run
Run this Workflow in the Taverna Workbench...
Option 1:
Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/1750/download?version=1
Other workflows that use similar services (2)
Lemmatization (3)
Created: 2010-12-17 | Last updated: 2010-12-23
Credits: Petra Kralj Novak
Attributions: Select from a list of possible web service parameter values
Select from a list of possible web service... (1)
Created: 2010-12-23 | Last updated: 2010-12-23
Credits: Petra Kralj Novak Janez Kranjc