Text preprocessing
The input to this workflow is plain text. Preprocessing removes non-alphanumeric symbols, converts the text to lower case, and removes stop words.
The workflow first removes the characters in this set: `~!@#$%^&*()_+=-{}|\][":;'?><,./.
It then converts the text to lower case. The user is prompted to select a stop-word dictionary from a list, and the workflow removes the stop words found in the selected dictionary.
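The three steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the workflow's actual implementation: the function name `preprocess`, the constant names, and the tiny stop-word set are hypothetical, and the sketch assumes that symbols are deleted outright (not replaced by spaces) and that words are separated by whitespace.

```python
# Characters listed in the workflow description, including the leading backtick.
SYMBOLS = "`~!@#$%^&*()_+=-{}|\\][\":;'?><,./"

# Hypothetical stand-in for the stop-word dictionary the user would select.
STOP_WORDS = {"a", "and", "is", "of", "the", "to"}


def preprocess(text, stop_words=STOP_WORDS):
    """Remove the listed symbols, lower-case the text, and drop stop words."""
    # Step 1: delete every character that appears in SYMBOLS.
    stripped = text.translate(str.maketrans("", "", SYMBOLS))
    # Step 2: convert to lower case.
    lowered = stripped.lower()
    # Step 3: split on whitespace and keep only words not in the stop-word set.
    kept = [word for word in lowered.split() if word not in stop_words]
    return " ".join(kept)


print(preprocess("The Cat, and the DOG!"))  # prints "cat dog"
```

Because the symbols are deleted in this sketch, a hyphenated word such as "non-alphanumeric" becomes "nonalphanumeric"; replacing symbols with spaces instead would split it into two words.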
Stop words are...
Created: 2011-01-07 | Last updated: 2011-01-07
Credits:
Petra Kralj Novak