This workflow performs four steps:
1. It retrieves the documents relevant to the query string.
2. It discovers the entities in those documents; these are considered the relevant entities.
3. It keeps only the proteins among those entities (selected on the tag protein_molecule).
4. It removes all query terms from the list produced by step 3 (the query terms are temporarily treated as proteins).
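Steps 3 and 4 can be sketched in Java as follows. This assumes each NER hit is available as a (text, tag) pair. The Entity and ProteinFilter classes are hypothetical and are not part of the workflow. Step 4 is also assumed to use exact, case-sensitive matching.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical representation of one NER hit: the matched text and its tag.
class Entity {
    final String text;
    final String tag;

    Entity(String text, String tag) {
        this.text = text;
        this.tag = tag;
    }
}

class ProteinFilter {
    // Step 3: keep only the entities tagged as protein_molecule.
    static List<Entity> proteins(List<Entity> entities) {
        List<Entity> result = new ArrayList<>();
        for (Entity e : entities) {
            if ("protein_molecule".equals(e.tag)) {
                result.add(e);
            }
        }
        return result;
    }

    // Step 4: drop every protein whose text is exactly one of the query terms.
    static List<String> withoutQueryTerms(List<Entity> proteins, List<String> queryTerms) {
        Set<String> exclude = new HashSet<>(queryTerms);
        List<String> result = new ArrayList<>();
        for (Entity e : proteins) {
            if (!exclude.contains(e.text)) {
                result.add(e.text);
            }
        }
        return result;
    }
}
```

For example, with the hits p53 and MDM2 (protein_molecule) and "T cell" (cell_type), and the query term p53, only MDM2 remains.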
ToDo
* Replace step 4 with the following procedure:
  1. Remove the query terms from the NER output (probably with a regular expression matching the content of the tag, possibly case-insensitive).
  2. Remove tag_as_protein_molecule, which then becomes obsolete.
* Add a synonym service/workflow.
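The first ToDo procedure could look roughly like the sketch below. It assumes the NER output marks entities with inline tags such as `<protein_molecule>p53</protein_molecule>`; that format and the QueryTermRemover name are hypothetical. Each query term is quoted so it is matched literally, and matching ignores case. Whitespace left behind by a removed entity is not normalized.

```java
import java.util.List;
import java.util.regex.Pattern;

class QueryTermRemover {
    // Removes every tagged entity whose content equals a query term, ignoring case.
    static String removeQueryTerms(String nerOutput, List<String> queryTerms) {
        String result = nerOutput;
        for (String term : queryTerms) {
            // Matches <tag> term </tag>; \1 forces the closing tag to match the opening one.
            Pattern p = Pattern.compile(
                    "<(\\w+)>\\s*" + Pattern.quote(term) + "\\s*</\\1>",
                    Pattern.CASE_INSENSITIVE);
            result = p.matcher(result).replaceAll("");
        }
        return result;
    }
}
```

With the query term "p53", the input `<protein_molecule>P53</protein_molecule> binds <protein_molecule>MDM2</protein_molecule>` keeps only the MDM2 entity.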
Note that Remove_inputquery uses an alternative iteration strategy (dot product instead of cross product). The same holds for 'Join' in 'SplitQuery'.
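Here is how the two iteration strategies differ for a processor with two list inputs. A cross product invokes the processor once per combination of items. A dot product invokes it once per position, pairing the i-th item of one list with the i-th item of the other. The class below is a hypothetical illustration, not Taverna code. It joins each pair with "|" and, as a choice of this sketch only, ignores surplus items when the lists differ in length.

```java
import java.util.ArrayList;
import java.util.List;

class IterationStrategies {
    // Cross product: every item of a is paired with every item of b.
    static List<String> cross(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        for (String x : a) {
            for (String y : b) {
                out.add(x + "|" + y);
            }
        }
        return out;
    }

    // Dot product: items are paired by position (sketch: stops at the shorter list).
    static List<String> dot(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            out.add(a.get(i) + "|" + b.get(i));
        }
        return out;
    }
}
```

With inputs [a1, a2] and [b1, b2], the cross product yields four pairs and the dot product yields two (a1|b1, a2|b2).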
year:(2007^10 2006^9 2005^8 2004^7 2003^5 2002^4 2001^3 2000^2 1999^1)
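// Mark the user query as a required clause ("+") and group it in parentheses.
StringBuffer temp = new StringBuffer();
temp.append("+(");
temp.append(query_string);
temp.append(")");
// Append the priority clause as a second required clause, skipping it when
// empty so that the query does not end with a dangling "+".
if (priority_string != null && priority_string.trim().length() > 0) {
    temp.append(" +");
    temp.append(priority_string);
}
String lucene_query = temp.toString();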
Input ports: query_string, priority_string
Output port: lucene_query (Lucene query string)
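To check the query construction outside Taverna, the script's logic can be wrapped in a plain Java method. The class and method names are hypothetical. The method builds "+(" + query + ") +" + priority as the script above does, and in addition skips an empty priority_string.

```java
class LuceneQueryBuilder {
    // Builds "+(query) +priority"; an empty priority clause is omitted.
    static String build(String queryString, String priorityString) {
        StringBuilder sb = new StringBuilder();
        sb.append("+(").append(queryString).append(")");
        if (priorityString != null && !priorityString.trim().isEmpty()) {
            sb.append(" +").append(priorityString);
        }
        return sb.toString();
    }
}
```

For example, build("p53 AND apoptosis", "year:(2007^10 2006^9)") returns "+(p53 AND apoptosis) +year:(2007^10 2006^9)". In Lucene's classic query syntax, "+" marks a required clause, so a document must match both the user query and at least one of the listed years. The boosts then rank more recent years higher.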