Load PDF from directory
Created: 2010-02-19 08:59:01
Last updated: 2011-12-13 15:54:34
This workflow automates reading a set of PDF files stored in a single directory; the path to that directory is supplied as a single input value (pdfDirectoryPathIn).
It is a workflow component, designed to be used as a nested workflow inside a larger text-mining or text-processing workflow.
Workflow Components
Authors (0)
Titles (0)
Descriptions (0)
Dependencies (0)
Inputs (1)
Name | Description
pdfDirectoryPathIn |
Processors (3)
Name | Type | Description
List_Files_by_Extension | localworker | Script:
// Accepts only files whose names end with the given extension.
class FileExtFilter implements FileFilter {
    String ext = null;
    public FileExtFilter(String ext) {
        this.ext = ext;
    }
    public boolean accept(File file) {
        return file.getName().endsWith(ext);
    }
}
// Validate both inputs before touching the file system.
if (extension == void || extension.equals("")) {
    throw new RuntimeException(
        "The 'extension' parameter cannot be null or empty. Please enter a valid file extension.");
}
if (directory == void || directory.equals("")) {
    throw new RuntimeException(
        "The 'directory' parameter cannot be null or empty. Please enter a valid file directory.");
}
File dirObj = new File(directory);
if (!dirObj.isDirectory()) {
    throw new RuntimeException("The 'directory' parameter specified: " + directory
        + " does not exist or is not a directory. Please enter a valid file directory.");
}
// listFiles() returns null if the directory cannot be read.
File[] fileObjList = dirObj.listFiles(new FileExtFilter(extension));
if (fileObjList == null) {
    throw new RuntimeException("Could not list the contents of directory: " + directory);
}
// Output port filelist: absolute paths of the matching files.
List filelist = new ArrayList();
for (int i = 0; i < fileObjList.length; i++) {
    filelist.add(fileObjList[i].getAbsolutePath());
}
extension_value | stringconstant | Value: pdf
binaryFileReader | beanshell | Script:
// Copy every byte from source to sink through a 16 KB buffer,
// closing both streams even if the copy fails.
private void readAndWriteStreamsFully(InputStream source, OutputStream sink) {
    byte[] buffer = new byte[1024 * 16];
    int bytesRead = 0;
    try {
        while ((bytesRead = source.read(buffer)) != -1) {
            sink.write(buffer, 0, bytesRead);
        }
        sink.flush();
    } catch (IOException e) {
        throw new RuntimeException("This binary file reader could not read from file \""
            + absoluteFilePath + "\": " + e.getMessage());
    } finally {
        source.close();
        sink.close();
    }
}
// Input port absoluteFilePath: one path produced by List_Files_by_Extension.
// Output port fileContents: the raw bytes of that file.
try {
    File f = new File(absoluteFilePath);
    FileInputStream fis = new FileInputStream(f);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    readAndWriteStreamsFully(fis, baos);
    fileContents = baos.toByteArray();
} catch (FileNotFoundException ex) {
    throw new RuntimeException("This binary file reader could not find file \"" + absoluteFilePath + "\"");
}
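The List_Files_by_Extension processor can be approximated outside Taverna to test the listing step on its own. This is a sketch under stated assumptions, not the Taverna implementation. The class name ListFilesByExtension, the use of IllegalArgumentException, and the final sort are illustrative additions. The sketch keeps the original suffix test (name.endsWith(extension)), so with extension_value set to pdf it would also match a file named, say, notapdf.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Plain-Java sketch of the List_Files_by_Extension local worker (hypothetical class name). */
public class ListFilesByExtension {

    /** Returns absolute paths of entries in directory whose names end with extension. */
    public static List<String> listFilesByExtension(String directory, String extension) {
        if (extension == null || extension.isEmpty()) {
            throw new IllegalArgumentException("The 'extension' parameter cannot be null or empty.");
        }
        if (directory == null || directory.isEmpty()) {
            throw new IllegalArgumentException("The 'directory' parameter cannot be null or empty.");
        }
        File dirObj = new File(directory);
        if (!dirObj.isDirectory()) {
            throw new IllegalArgumentException("The 'directory' parameter specified: " + directory
                    + " does not exist or is not a directory.");
        }
        // Same suffix test as the original FileExtFilter.
        File[] matches = dirObj.listFiles((File f) -> f.getName().endsWith(extension));
        if (matches == null) { // listFiles() signals an I/O error with null
            throw new IllegalStateException("Could not list directory: " + directory);
        }
        List<String> filelist = new ArrayList<>();
        for (File f : matches) {
            filelist.add(f.getAbsolutePath());
        }
        // listFiles() order is unspecified; sorting (an addition) makes output repeatable.
        Collections.sort(filelist);
        return filelist;
    }

    public static void main(String[] args) {
        String dir = args.length > 0 ? args[0] : ".";
        for (String path : listFilesByExtension(dir, "pdf")) {
            System.out.println(path);
        }
    }
}
```

Running `java ListFilesByExtension /some/dir` prints one absolute path per matching file. This corresponds to the filelist output port.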
Beanshells (1)
Name | Description | Inputs | Outputs
binaryFileReader | | absoluteFilePath | fileContents
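The binaryFileReader beanshell maps one absoluteFilePath to one fileContents byte array. Below is a minimal standalone Java equivalent, assuming Java 7 or later for try-with-resources. The class name BinaryFileReader and the method readFile are hypothetical. The sketch keeps the 16 KB copy loop of the original helper but lets the language close the streams.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Plain-Java sketch of the binaryFileReader beanshell (hypothetical class name). */
public class BinaryFileReader {

    /** Copies all bytes from source to sink through a 16 KB buffer. */
    static void readAndWriteStreamsFully(InputStream source, OutputStream sink) throws IOException {
        byte[] buffer = new byte[1024 * 16];
        int bytesRead;
        while ((bytesRead = source.read(buffer)) != -1) {
            sink.write(buffer, 0, bytesRead);
        }
        sink.flush();
    }

    /** Returns the raw bytes of the file at absoluteFilePath (the fileContents output). */
    public static byte[] readFile(String absoluteFilePath) {
        // try-with-resources closes both streams even if reading fails
        try (InputStream fis = new FileInputStream(absoluteFilePath);
             ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            readAndWriteStreamsFully(fis, baos);
            return baos.toByteArray();
        } catch (IOException e) { // also covers FileNotFoundException
            throw new RuntimeException("This binary file reader could not read from file \""
                    + absoluteFilePath + "\"", e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java BinaryFileReader <absoluteFilePath>");
            return;
        }
        byte[] fileContents = readFile(args[0]);
        System.out.println(fileContents.length + " bytes read");
    }
}
```

On Java 9 or later, `InputStream.transferTo` could replace the manual loop. The explicit buffer is kept here only to mirror the original script.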
Outputs (1)
Name | Description
pdfFileContentsOut |
Datalinks (4)
Source | Sink
pdfDirectoryPathIn | List_Files_by_Extension:directory
extension_value:value | List_Files_by_Extension:extension
List_Files_by_Extension:filelist | binaryFileReader:absoluteFilePath
binaryFileReader:fileContents | pdfFileContentsOut
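Read together, the datalinks describe the whole component. The directory path feeds the lister, each path in filelist feeds binaryFileReader, and the resulting byte arrays leave through pdfFileContentsOut. Assuming absoluteFilePath is declared as a single-value port, Taverna iterates binaryFileReader implicitly over the filelist list and collects the outputs into a list in the same order. The sketch below approximates that flow in plain Java. In place of the two scripts, it swaps in java.nio: a "*.pdf" glob and Files.readAllBytes. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** End-to-end sketch: pdfDirectoryPathIn -> filelist -> pdfFileContentsOut (hypothetical class name). */
public class LoadPdfFromDirectory {

    public static List<byte[]> loadPdfs(String pdfDirectoryPathIn) {
        List<Path> filelist = new ArrayList<>();
        // The "*.pdf" glob stands in for List_Files_by_Extension with extension_value = pdf.
        // Unlike endsWith("pdf"), the glob requires the ".pdf" suffix including the dot.
        try (DirectoryStream<Path> stream =
                     Files.newDirectoryStream(Paths.get(pdfDirectoryPathIn), "*.pdf")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) { // skip directories (an addition to the original)
                    filelist.add(p.toAbsolutePath());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        filelist.sort(null); // natural Path order, for repeatable output
        // One read per list element, collected in order, mimicking implicit iteration.
        List<byte[]> pdfFileContentsOut = new ArrayList<>();
        for (Path p : filelist) {
            try {
                pdfFileContentsOut.add(Files.readAllBytes(p));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return pdfFileContentsOut;
    }

    public static void main(String[] args) {
        List<byte[]> out = loadPdfs(args.length > 0 ? args[0] : ".");
        System.out.println(out.size() + " PDF file(s) loaded");
    }
}
```

Each element of the returned list plays the role of one item in pdfFileContentsOut. In a larger text-mining workflow, that list would be handed to a PDF text-extraction step.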
Uploader
License
All versions of this Workflow are licensed under:
Version 1 (of 1)
Credits (1) (People/Groups)
Attributions (0) (Workflows/Files): None
Shared with Groups (1)
Featured In Packs (1)
Attributed By (0) (Workflows/Files): None