This workflow replicates the analysis from Megan Conklin's OSCon 2004 presentation, "Do the Rich Get Richer?", to demonstrate the scale-free distribution of FLOSS developers among projects.
The workflow retrieves the number of active developers per project (from the most recent calculation of that statistic) from the FLOSSmole database. It summarizes and plots the distribution of developers across projects on both a linear and a log-log scale. It also generates a flat list of the developer counts for visualization in other software.
Converts the list of developer counts into a single delimited string.
,
org.embl.ebi.escience.scuflworkers.java.StringListMerge
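The merge step can be sketched in R as follows. This is only an illustration of the list-to-delimited conversion and of how the downstream scripts read the result back; the sample values and the name `dev_list` are made up for the example and do not come from FLOSSmole.

```r
# Hypothetical stand-in for the list of developer counts coming from the database
dev_list <- list("1", "3", "1", "12")

# Merge the list items into one comma-delimited string
dev_nums <- paste(unlist(dev_list), collapse = ",")

# Read the delimited string back as a one-row data frame,
# the same way the R scripts in this workflow do
devnum <- read.csv(textConnection(dev_nums), header = FALSE, row.names = NULL)

# Collapse the one-row data frame into a plain integer vector
dev_num <- as.integer(unlist(devnum, use.names = FALSE))
```

After this runs, `dev_nums` holds a single string and `dev_num` holds the same values as integers, ready for tabulation or plotting.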
SQL query to fetch developer data from the FLOSSmole database. Retrieves the active developer count for every project that has one recorded in the database. To request database access, contact ossmole-discuss@lists.sourceforge.net.
jdbc:mysql://thor.sdsc.edu/ossmole_next
com.mysql.jdbc.Driver
SELECT active_developers FROM ossmole_merged.project_alltime_statistics WHERE active_developers IS NOT NULL
FALSE
net.sourceforge.taverna.scuflworkers.jdbc.SQLQueryWorker
Flattens the nested list returned by the SQL query into a single flat list.
org.embl.ebi.escience.scuflworkers.java.FlattenList
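As a rough illustration of the flattening step, the sketch below assumes the query result arrives as a list of rows, each row itself a list with one item per selected column. That shape, the name `rows`, and the sample values are assumptions made for the example.

```r
# Hypothetical query result: one inner list per row, one item per selected column
rows <- list(list("1"), list("3"), list("1"), list("12"))

# Flatten the nested list into a single flat list of values
flat <- as.list(unlist(rows, recursive = TRUE, use.names = FALSE))
```

The result is a list of four strings with no nesting, which is the form a list-merge step can then join with a delimiter.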
Produces a list of the data points for plotting with other software.
##A. Wiggins 2008
devnum <- read.csv(textConnection(dev_nums),header=FALSE,row.names=NULL)
dev_num <- as.vector(devnum, mode="integer")
points <- as.matrix(summary.factor(dev_num))
point_list <- paste(capture.output(points), collapse="\n")
dev_nums
point_list
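For use in other software, a two-column table is often easier to import than printed matrix output. The sketch below is an alternative formulation, not the workflow's own script: it tabulates made-up developer counts with `table()` and writes them as CSV text. The column names `developers` and `projects` are chosen for the example.

```r
# Made-up developer counts, one value per project
dev_num <- c(1L, 1L, 2L, 1L, 5L, 2L, 1L)

# Count how many projects have each number of developers
counts <- table(dev_num)

# One row per distinct developer count
points <- data.frame(
  developers = as.integer(names(counts)),
  projects = as.vector(counts)
)

# Render the table as CSV text in a single string
point_list <- paste(capture.output(write.csv(points, row.names = FALSE)),
                    collapse = "\n")
```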
Plots the distribution of developers to projects on a linear scale.
## A. Wiggins 7/14/08
## Plot distribution of numbers of developers
## read the data
devnum <- read.csv(textConnection(dev_numbers),header=FALSE,row.names=NULL)
dev_num <- as.vector(devnum, mode="integer")
## plot the data
png(dist_plot, width=800);
#dist_plot <- hist(dev_num,col="red",main="Distribution of Number of Developers in FLOSS Projects",xlab="Number of Developers",ylab="Projects")
datapoints <- as.table(summary.factor(dev_num))
dist_plot <- plot(datapoints,xlim=range(dev_num),col="blue",type="o",main="Distribution of Number of Developers in FLOSS Projects",xlab="Number of Developers (links)",ylab="Number of Projects (nodes)",pch=20)
dev.off()
dev_numbers
dist_plot
Plots the distribution of developers to projects on a log-log scale.
## A. Wiggins 7/14/08
## Plot distribution of numbers of developers
## read the data
devnum <- read.csv(textConnection(dev_numbers),header=FALSE,row.names=NULL)
dev_num <- as.vector(devnum, mode="integer")
## plot the data
png(log_plot, width=800);
## tabulate projects per developer count; plot against the counts themselves, not the table index
datapoints <- table(dev_num)
log_plot <- plot(as.integer(names(datapoints)),as.vector(datapoints),log="xy",col="blue",type="o",xlim=c(1,1000),main="Distribution of Number of Developers in FLOSS Projects",xlab="Number of Developers (links)",ylab="Number of Projects (nodes)",pch=20)
dev.off()
dev_numbers
log_plot
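A scale-free (power-law) distribution appears as an approximately straight line on a log-log plot, and the slope of that line estimates the exponent. The sketch below shows the idea on constructed data that follows an exact power law with exponent -2. The numbers are not FLOSSmole results, and a least-squares fit on log-transformed counts is only a rough check, not a rigorous power-law test.

```r
# Constructed example: projects = 1024 * developers^(-2)
developers <- c(1, 2, 4, 8)
projects <- 1024 / developers^2   # 1024, 256, 64, 16

# Fit a straight line in log-log space
fit <- lm(log10(projects) ~ log10(developers))

# The slope of the fitted line estimates the power-law exponent
slope <- unname(coef(fit)[2])
```

With real data the points scatter around the line, especially in the sparse tail, so the fitted slope should be read as indicative only.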
Calculates summary statistics for the distribution.
## A. Wiggins 7/14/08
## Script to provide a summary of the distribution of developer numbers
## read the data
devnum <- read.csv(textConnection(dev_numbers),header=FALSE,row.names=NULL)
dev_num <- as.vector(devnum, mode="integer")
## summarize the data
dev_dist <-paste(capture.output(summary(dev_num)), collapse="\n")
dev_numbers
dev_dist
Descriptive statistics summary of the developer-project distribution.
PNG plot of the distribution of developers to projects on a linear scale.
PNG plot of the distribution of developers to projects on a log-log scale.
List of points for plotting in other software.