From PDF to lemmatized text

Created: 2010-09-16 10:09:58      Last updated: 2012-01-18 10:27:27

This workflow uses a web service hosted at the Jožef Stefan Institute (JSI/IJS, Slovenia), which is based on Matjaž Juršič's LemmaGen lemmatization engine.

The workflow accepts a PDF file as input and uses James Eales's workflows to preprocess the data. Because lemmatization is language-specific, the workflow interactively asks the user which language the text is in. The output is a string displayed in the Taverna Workbench.
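
The steps described above can be sketched as a small pipeline. This is only an illustration in Python, not the workflow's actual implementation: the function names, the tiny lemma table, and the language code "en" are hypothetical stand-ins for the PDF preprocessing workflows and the LemmaGen-based web service.

```python
import re

# Hypothetical stand-in for the LemmaGen-based web service: a tiny
# per-language lookup table. The real service does not work this way;
# the table only shows why the language must be chosen first.
TOY_LEMMAS = {
    "en": {"workflows": "workflow", "uses": "use", "files": "file"},
}


def clean_text(text):
    """Collapse runs of whitespace left over from PDF text extraction."""
    return re.sub(r"\s+", " ", text).strip()


def lemmatize(text, language):
    """Replace each word with its lemma when the toy table knows it."""
    if language not in TOY_LEMMAS:
        raise ValueError(f"unsupported language: {language}")
    table = TOY_LEMMAS[language]
    words = clean_text(text).split(" ")
    # Unknown words are passed through unchanged.
    return " ".join(table.get(word.lower(), word) for word in words)


if __name__ == "__main__":
    extracted = "The workflow  uses\nPDF files"
    print(lemmatize(extracted, "en"))
```

The language is an explicit argument here for the same reason the workflow asks for it interactively: the lemma rules differ per language, so the same input text cannot be processed without that choice.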

Run

Run this Workflow in the Taverna Workbench...

Option 1:

Copy and paste this link into File > 'Open workflow location...'
http://myexperiment.org/workflows/1516/download?version=1

Workflow Components

Authors (0)
Titles (0)
Descriptions (0)
Dependencies (0)
Inputs (1)
Processors (13)
Beanshells (1)
Outputs (1)
Datalinks (14)
Coordinations (0)

Workflow Type

Taverna 2

License

All versions of this Workflow are licensed under:

Version 1 (of 1)

Credits (2)

(People/Groups)

Attributions (2)

(Workflows/Files)

Tags (2)

Other workflows that use similar services (7)

Only the first 2 workflows that use similar services are shown.


Terms from collection of PDF files (2)

This workflow will give you a set of candidate terms for each PDF document in a user-specified directory. You can also specify a c-value threshold that will restrict the terms to those with higher scores. This workflow was created using only nested workflows. These workflow components work on their own and can be linked together to form more complex workflows such as this. You can view the text mining workflow components in this pack. If you receive errors when running this workflow t...

Created: 2010-02-19 | Last updated: 2011-12-13

Credits: User James Eales

PDF to plain text (1)

This workflow will extract the plain text content of PDF files supplied to the input port. You can connect the Load PDF from directory workflow to this workflow's input. We recommend you send the output from this workflow to the Clean plain text workflow, because the PDF to text process can add characters into the text that are XML-invalid and therefore cannot be sent to most services as plain text. Another way round this problem is to encode the text as Base64 using the handy loc...

Created: 2010-02-19 | Last updated: 2011-12-13

Credits: User James Eales
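
The PDF to plain text description above mentions two ways of dealing with XML-invalid characters: cleaning the text, or encoding it as Base64. The following is a minimal Python sketch of both ideas, assuming the XML 1.0 definition of valid characters. It is not the code used by the Clean plain text workflow, and the function names are hypothetical.

```python
import base64
import re

# Characters allowed by XML 1.0: tab, line feed, carriage return, and the
# ranges U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF. The pattern
# matches everything outside that set.
_XML_INVALID = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_invalid(text):
    """Remove characters that may not appear in an XML 1.0 document."""
    return _XML_INVALID.sub("", text)


def to_base64(text):
    """Encode text as Base64 (UTF-8 bytes) so any character can be sent."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    raw = "page one\x0cpage two\x00\n"  # form feed and NUL from extraction
    print(repr(strip_xml_invalid(raw)))
    print(to_base64(raw))
```

Cleaning changes the text but keeps it readable. Base64 preserves every character at the cost of requiring the receiving service to decode it.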