Extract proteins using a gi - output as fasta file


//Note: this BeanShell script keeps only db_xref values that start with "GI:", so GeneID cross-references (and the products paired with them) are ignored

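//Import the data from the input port as a String
	String input = xml_result;

//Initialize variables: parallel lists, filled in document order
	ArrayList gis = new ArrayList();       // GI numbers (from "GI:" db_xref qualifiers)
	ArrayList ids = new ArrayList();       // protein_id accessions
	ArrayList products = new ArrayList();  // product name current when each GI was found
	ArrayList seqns = new ArrayList();     // translation (amino-acid) sequences
	String fastas = "";
	String protein_prod = "";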

//Loop that continues to execute while there exists a GBQualifier_name tag
	while (input.contains("<GBQualifier_name>")) {

		//Finds the qualifier name (18 and 19 are the lengths of the opening and closing GBQualifier_name tags)
		int start = input.indexOf("<GBQualifier_name>") + 18;
		int end = input.indexOf("</GBQualifier_name>");
		String name = input.substring(start, end);
		input = input.substring(end + 19);

		//Skips qualifiers this script does not use; their values are never read
		if (!name.equals("product") && !name.equals("protein_id")
				&& !name.equals("db_xref") && !name.equals("translation")) {
			continue;
		}

		//Reads the value that follows the name (19 and 20 are the lengths of the GBQualifier_value tags)
		start = input.indexOf("<GBQualifier_value>") + 19;
		end = input.indexOf("</GBQualifier_value>");
		String value = input.substring(start, end);
		input = input.substring(end + 20);

		if (name.equals("product")) {
			//Stores the protein product; it is paired with the next GI found
			protein_prod = value;
		} else if (name.equals("protein_id")) {
			//Stores the protein id
			ids.add(value);
		} else if (name.equals("db_xref")) {
			//Stores the gi number; other cross-references (GeneID, taxon, ...) are ignored
			if (value.startsWith("GI:")) {
				gis.add(value.substring(3));
				products.add(protein_prod);
			}
		} else if (name.equals("translation")) {
			//Stores the AA sequence
			seqns.add(value);
		}
	}

//Converts the protein data into FASTA format, wrapping each sequence at 70 residues per line
	for (int n = 0; n < gis.size(); n++) {

		String value = ">gi|" + gis.get(n) + "|ref|" + ids.get(n) + "|" + products.get(n);
		String seq = seqns.get(n).toString();

		//Appends full 70-character lines, then whatever shorter remainder is left
		for (int index = 0; index < seq.length(); index += 70) {
			value = value + "\n" + seq.substring(index, Math.min(index + 70, seq.length()));
		}

		fastas = fastas + value + "\n";
	}

//Outputs the fasta data to the output port
fastas_out = fastas;
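The script above locates tags by string offsets and keeps four parallel lists, so one CDS without a GI or a translation shifts every later record. The following is a minimal sketch of an alternative that swaps in a real DOM parser (`javax.xml.parsers`) and reads all qualifiers of each `CDS` feature together. It is not the workflow's own code. It assumes standard GBSeq element names (`GBFeature`, `GBFeature_key`, `GBQualifier_name`, `GBQualifier_value`), and the class name `GbseqToFasta` is hypothetical. Unlike the script, it pairs each product with its own feature and skips `protein_id` and `translation` values from CDSs that have no GI.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical class: a DOM-based sketch, not the workflow's BeanShell script.
class GbseqToFasta {

    // Text of the first descendant element with the given tag, or "" if there is none.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    // Parses GBSeq XML and emits one FASTA record per CDS feature that has a "GI:" db_xref.
    static String toFasta(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not download the NCBI DTD named in the DOCTYPE of real GBSeq files.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse GBSeq XML", e);
        }

        StringBuilder out = new StringBuilder();
        NodeList features = doc.getElementsByTagName("GBFeature");
        for (int f = 0; f < features.getLength(); f++) {
            Element feature = (Element) features.item(f);
            if (!childText(feature, "GBFeature_key").equals("CDS")) {
                continue;
            }
            // All qualifiers come from the same feature, so GI, id, product and
            // sequence cannot drift out of step the way parallel lists can.
            String gi = "", id = "", product = "", seq = "";
            NodeList quals = feature.getElementsByTagName("GBQualifier");
            for (int q = 0; q < quals.getLength(); q++) {
                Element qual = (Element) quals.item(q);
                String name = childText(qual, "GBQualifier_name");
                String value = childText(qual, "GBQualifier_value");
                if (name.equals("product")) product = value;
                else if (name.equals("protein_id")) id = value;
                else if (name.equals("translation")) seq = value;
                else if (name.equals("db_xref") && value.startsWith("GI:")) gi = value.substring(3);
            }
            if (gi.isEmpty()) {
                continue; // same rule as the script: no GI, no record
            }
            out.append(">gi|").append(gi).append("|ref|").append(id).append('|').append(product);
            // Wrap the sequence at 70 residues per line, as the script does.
            for (int i = 0; i < seq.length(); i += 70) {
                out.append('\n').append(seq, i, Math.min(i + 70, seq.length()));
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Grouping by feature costs a full in-memory parse. For single GenBank entries of the size this workflow fetches, that is usually acceptable.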

Name	Description
fasta_file_path	Path of the file where the workflow saves the FASTA output
id	GI number the workflow uses to retrieve the GenBank entry as XML
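The `id` input is a GI number that is turned into a request for GenBank XML. The following is a minimal sketch of such a request URL. It assumes the retrieval is equivalent to an NCBI E-utilities EFetch call with `db=nucleotide`, `rettype=gb` and `retmode=xml` (which returns GBSeq XML). The Taverna service `Get_Nucleotide_GBSeq_XML` may use a different endpoint. The class and method names are hypothetical, and no network call is made.

```java
// Hypothetical helper: builds an EFetch URL; it does not fetch anything.
class EfetchUrl {

    // Base URL of NCBI's E-utilities EFetch endpoint.
    static final String EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    // URL for one nucleotide record as GBSeq XML (rettype=gb, retmode=xml).
    // GI numbers are plain integers, so anything else is rejected instead of URL-encoded.
    static String gbseqXmlUrl(String gi) {
        String id = gi.trim();
        if (!id.matches("\\d+")) {
            throw new IllegalArgumentException("Not a GI number: " + gi);
        }
        return EFETCH + "?db=nucleotide&id=" + id + "&rettype=gb&retmode=xml";
    }
}
```

Rejecting non-numeric input early keeps a malformed `id` from reaching the remote service at all.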

Source	Sink
Get_Nucleotide_GBSeq_XML:outputText	Extraction_of_info:xml_result
fasta_file_path	Write_Text_File:outputFile
Extraction_of_info:fastas_out	Write_Text_File:filecontents
id	Get_Nucleotide_GBSeq_XML:id
Write_Text_File:outputFile	file_output
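The datalinks above chain three steps: `id` feeds the GenBank fetch, its XML feeds the extraction script, and the FASTA text plus `fasta_file_path` feed `Write_Text_File`, whose path becomes `file_output`. The following is a minimal sketch of that wiring as plain Java calls. The fetch and extraction steps are passed in as functions rather than implemented. The class `WorkflowSketch` and its method names are hypothetical, and `writeTextFile` is only an assumed stand-in for the Taverna `Write_Text_File` service.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.Function;

// Hypothetical class: mirrors the datalink table, not Taverna's own engine.
class WorkflowSketch {

    // Stand-in for Write_Text_File: writes filecontents to outputFile and returns the path.
    static String writeTextFile(String outputFile, String filecontents) {
        try {
            Files.write(Paths.get(outputFile), filecontents.getBytes(StandardCharsets.UTF_8));
            return outputFile;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One call per datalink; fetchXml and extract are supplied by the caller.
    static String run(String id, String fastaFilePath,
                      Function<String, String> fetchXml,
                      Function<String, String> extract) {
        String xmlResult = fetchXml.apply(id);           // id -> Get_Nucleotide_GBSeq_XML:id
        String fastasOut = extract.apply(xmlResult);     // outputText -> Extraction_of_info:xml_result
        return writeTextFile(fastaFilePath, fastasOut);  // fastas_out -> filecontents; outputFile -> file_output
    }
}
```

Passing the steps in as functions keeps the sketch testable offline. A real run would plug in an HTTP fetch and the extraction logic.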
