Success-Abandonment-Classification
Created: 2008-02-06 14:35:41
Last updated: 2008-07-02 17:15:25
Retrieves data from FLOSSmole and from the Notre Dame SourceForge repository to compute project statistics based on releases, downloads and project lifespan. Project statistics are then used to classify projects according to the criteria set up in English & Schweik, but comparison criteria are parameterized so that a different set of criterion thresholds can be used to evaluate the project characteristics.
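The parameterized thresholds described above can be pictured as one typed parameter set. The following is only a minimal sketch, not the workflow's own code: the field names mirror the string-constant processors listed under Workflow Components, while the `Thresholds` class, the `parse_thresholds` helper, and every numeric value are assumptions made for illustration (they are not the English & Schweik values).

```python
from dataclasses import dataclass

# Allowed values as stated in the release_rate_type description.
ALLOWED_RATE_TYPES = {"first_last", "average_rate", "recent_density"}


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for the workflow's threshold constants."""
    release_lag_days: int
    release_recency_days: int
    release_count: int
    initiation_age_days: int
    mortality_days: int
    downloads: int
    release_rate_type: str


def parse_thresholds(constants: dict) -> Thresholds:
    """Convert string constants into typed values.

    int() rejects non-integer strings such as "30.5", which matches the
    "integer values only" note on the day-based thresholds.
    """
    rate_type = constants["release_rate_type"]
    if rate_type not in ALLOWED_RATE_TYPES:
        raise ValueError(f"unknown release_rate_type: {rate_type!r}")
    return Thresholds(
        release_lag_days=int(constants["release_lag_threshold"]),
        release_recency_days=int(constants["release_recency_threshold"]),
        release_count=int(constants["release_count_threshold"]),
        initiation_age_days=int(constants["initiation_age_threshold"]),
        mortality_days=int(constants["mortality_threshold"]),
        downloads=int(constants["download_threshold"]),
        release_rate_type=rate_type,
    )


# Placeholder values for illustration only.
example_constants = {
    "release_lag_threshold": "180",
    "release_recency_threshold": "365",
    "release_count_threshold": "3",
    "initiation_age_threshold": "365",
    "mortality_threshold": "365",
    "download_threshold": "10",
    "release_rate_type": "first_last",
}
thresholds = parse_thresholds(example_constants)
```

Swapping in a different dictionary of constants is all it takes to evaluate the same project data against another set of criteria.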
Workflow Components
Inputs (1)
Name | Description |
project_unixname | |
Processors (13)
Name | Type | Description |
Class_Analysis | rshell | Author: Andrea Wiggins. Provides simple proportions for the classes of projects as output from the classification. |
delist_classtypes | local | Converts the classtype output from RShell from list format to CSV. |
release_lag_threshold | stringconstant | Desired time between releases so that the releases are not made "too fast" for a sustainable rate of growth. Unit: days, integer values only. |
release_recency_threshold | stringconstant | Threshold for recency of last release, or how recently the newest release was made, as a signal of project activity. Unit: days, integer values only. |
release_count_threshold | stringconstant | Minimum number of releases to be considered a success. |
Stages_Analysis | rshell | Author: Andrea Wiggins. Provides simple proportions for the stages of projects as output from the classification. |
delist_stages | local | Converts the stage output from RShell from list format to CSV. |
initiation_age_threshold | stringconstant | The threshold for how long a project may remain in the "initiation" stage without having produced a release and still be considered not abandoned. Unit: days, integer values only. |
release_rate_type | stringconstant | Allows switching between three ways of deriving the release rate value that is compared to a threshold to determine whether the releases are too frequent for sustainable growth; values should be first_last, average_rate, or recent_density. |
mortality_threshold | stringconstant | Length of time after which a project is considered abandoned if no releases have been made by then. Unit: days, integer values only. |
download_threshold | stringconstant | |
Classification | rshell | Author: Andrea Wiggins. Performs comparisons between variables and threshold inputs, and then classifies the project according to these comparisons. Makes a null value check for each variable and records "null" for the derived data points; in classification, any project with null values that interfere with classification is returned with a classtype of "other". The classification outputs are data for analysis and a set of classified data that provides the full detail for each project, including all retrieved data, derived data, and the project classification. |
GetData | workflow | Author: James Howison. Subworkflow that takes a project unixname and fetches the appropriate data for classification from two data sources, FLOSSmole and the Notre Dame SourceForge repository. Returns a CSV line of data for classification. |
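The null handling described for the Classification processor can be sketched as follows. This is an illustration under stated assumptions, not the RShell script itself: the field names are borrowed from the Beanshell outputs (`release_count`, `lifespan_days`, `aggregate_downloads`), the rule that a project with no releases is in the "initiation" stage is inferred from the initiation_age_threshold description, and the non-"other" labels are hypothetical placeholders rather than the English & Schweik class names.

```python
# Fields assumed to be needed for classification (names taken from Beanshell outputs).
REQUIRED_FIELDS = ("release_count", "lifespan_days", "aggregate_downloads")


def null_check(record: dict) -> dict:
    """Record the string "null" for every required value that is missing."""
    return {
        field: "null" if record.get(field) is None else record[field]
        for field in REQUIRED_FIELDS
    }


def classify(record: dict, release_count_threshold: int, download_threshold: int) -> dict:
    """Compare a project's values against thresholds.

    Projects with nulls that block the comparisons get classtype "other",
    as the Classification description states. The remaining labels are
    placeholders invented for this sketch.
    """
    checked = null_check(record)
    if "null" in checked.values():
        return {"stage": "null", "classtype": "other"}

    # Assumption: a project that has produced no release is still in initiation.
    stage = "initiation" if checked["release_count"] == 0 else "growth"
    meets = (
        checked["release_count"] >= release_count_threshold
        and checked["aggregate_downloads"] >= download_threshold
    )
    classtype = "meets_thresholds" if meets else "below_thresholds"  # hypothetical labels
    return {"stage": stage, "classtype": classtype}


complete = {"release_count": 5, "lifespan_days": 900, "aggregate_downloads": 1200}
incomplete = {"release_count": 5, "lifespan_days": None, "aggregate_downloads": 1200}
result_complete = classify(complete, release_count_threshold=3, download_threshold=10)
result_incomplete = classify(incomplete, release_count_threshold=3, download_threshold=10)
```

Running the null check before any comparison keeps a missing value from being silently treated as zero.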
Beanshells (17)
Name | Description | Inputs | Outputs |
BuildFLOSSmoleURLQueryString | | sf_unixname | url_query_string |
MatchSFURL | | result_row, sf_unixname | has_sf_url |
ConvertSQLDateToXSDDateTime | | sql_date | xsd_datetime |
split_SQL_results | | result_row | aggregate_downloads, lifespan_days, data_for_date |
buildFLOSSmoleStatisticsQueryString | | sf_unixname | queryString |
TimeBetweenLastAndCutoff | | datetime_1, datetime_2 | seconds_between |
TruncateReleasesList | | release_datetimes, cutoff_date | trunc_release_datetimes |
GetFirstRelease | | datetimes | chosen_datetime |
GetLastRelease | | index_wanted, datetimes | chosen_datetime |
GetReleaseForDensityCalc | | index_wanted, datetimes | chosen_datetime |
CalcTimeBetweenFirstAndLast | | datetime_1, datetime_2 | seconds_between |
CalcDensityLength | | datetime_1, datetime_2 | seconds_between |
count_releases | | releases | release_count |
buildQueryWhere | | sf_unixname | where_clause |
MergeReleasesToYamlArray | | datetimes | releases_xml |
ConvertEpochToXSD | | epoch | xsdDateTime |
GatherCSV | | downloads_list, lifespan_list, sf_unixname_list, release_count_list, time_last_and_cutoff_list, release_density_list, time_first_and_last_list, has_sf_url_list | out_csv |
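The Beanshell scripts carry no descriptions, but the names and ports of the date-handling ones suggest what they do. The sketch below shows one plausible reading in Python; it is not the original Beanshell code. The function names are hypothetical, the SQL date layout `YYYY-MM-DD HH:MM:SS` is assumed, and treating the cutoff as inclusive is also an assumption.

```python
from datetime import datetime, timezone


def sql_date_to_xsd(sql_date: str) -> str:
    """Turn 'YYYY-MM-DD HH:MM:SS' into an xsd:dateTime style string."""
    return datetime.strptime(sql_date, "%Y-%m-%d %H:%M:%S").isoformat()


def epoch_to_xsd(epoch: int) -> str:
    """Turn Unix epoch seconds into an xsd:dateTime style string in UTC."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


def seconds_between(datetime_1: str, datetime_2: str) -> float:
    """Seconds elapsed from datetime_1 to datetime_2.

    Both strings must either carry a UTC offset or both omit it;
    Python refuses to subtract a naive datetime from an aware one.
    """
    start = datetime.fromisoformat(datetime_1)
    end = datetime.fromisoformat(datetime_2)
    return (end - start).total_seconds()


def truncate_releases(release_datetimes: list, cutoff_date: str) -> list:
    """Keep only releases made on or before the cutoff (inclusive by assumption)."""
    cutoff = datetime.fromisoformat(cutoff_date)
    return [d for d in release_datetimes if datetime.fromisoformat(d) <= cutoff]


releases = ["2007-01-15T00:00:00", "2007-09-01T00:00:00", "2008-03-10T00:00:00"]
kept = truncate_releases(releases, "2008-01-01T00:00:00")
release_count = len(kept)
span = seconds_between(kept[0], kept[-1])
```

Dividing a `seconds_between` result by 86400 gives days, the unit in which the thresholds are expressed.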
Outputs (3)
Name | Description |
Classification_Output | |
Analysis_Output | |
Stages_Output | |
Links (17)
Source | Sink |
Classification:classtypes | delist_classtypes:stringlist |
Classification:stages | delist_stages:stringlist |
GetData:data_for_classification_csv | Classification:data_for_classification |
delist_classtypes:concatenated | Class_Analysis:classtypes |
delist_stages:concatenated | Stages_Analysis:stages |
download_threshold:value | Classification:download_threshold |
initiation_age_threshold:value | Classification:initiation_threshold |
project_unixname | GetData:sf_unixname |
mortality_threshold:value | Classification:mortality_threshold |
release_count_threshold:value | Classification:release_count_threshold |
release_count_threshold:value | GetData:num_releases_threshold |
release_lag_threshold:value | Classification:release_lag_threshold |
release_rate_type:value | Classification:release_rate_type |
release_recency_threshold:value | Classification:release_recency_threshold |
Class_Analysis:analysis_output | Analysis_Output |
Classification:classified_data | Classification_Output |
Stages_Analysis:analysis_output | Stages_Output |
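The links above route the classification's list outputs through a "delist" step and then into an analysis step that reports simple proportions. A minimal Python sketch of that chain follows; it is not the workflow's R or local-worker code, the function names are hypothetical, a comma is assumed as the CSV separator, and the class labels in the example are placeholders.

```python
from collections import Counter


def delist(stringlist: list) -> str:
    """Flatten a list of labels into a single comma-separated string."""
    return ",".join(stringlist)


def proportions(concatenated: str) -> dict:
    """Share of each label among all labels in the comma-separated string."""
    labels = [label for label in concatenated.split(",") if label]
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


# Placeholder labels; "other" is the only classtype named in the source.
classtypes = ["class_a", "class_a", "other", "class_b"]
analysis_output = proportions(delist(classtypes))
```

The same two functions would serve both the classtypes path and the stages path, since each follows the identical list-to-CSV-to-proportions route.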
License
All versions of this Workflow are
licensed under:
Version 2
(of 3)
Credits (2)
(People/Groups)
Attributions (0)
(Workflows/Files)
None
Shared with Groups (1)
Featured In Packs (1)
Attributed By (0)
(Workflows/Files)
None
Favourited By (3)
Comments (1)
I quite like the way this workflow is designed. The author used nested workflows to favor modularity, and the data links seem to be carefully designed. On the downside, I couldn't execute the workflow, which may be because some constituent services are no longer available. It would be good if the author (or someone knowledgeable) could repair it and create a new version of this workflow :-)
khalid