Content-based recommender
This process is a special case of the item-to-item similarity-matrix-based recommender, in which the item-to-item similarity is calculated as the cosine similarity between TF-IDF word vectors obtained from a textual analysis of all available textual data.
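The similarity measure itself can be sketched in a few lines of Python. This is a minimal illustration rather than the RapidMiner implementation: it assumes the classic weighting tf(t, d) · log(N / df(t)), with term frequency normalized by document length. RapidMiner's exact TF-IDF weighting and vector normalization may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn a list of token lists into sparse {term: weight} TF-IDF vectors.

    Assumed weighting: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {term: (c / len(doc)) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [["red", "apple"], ["green", "apple"], ["red", "car"]]
vectors = tf_idf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))  # shares "apple", so between 0 and 1
print(cosine_similarity(vectors[1], vectors[2]))  # no shared terms -> 0.0
```

Terms that occur in every document receive an IDF of log(1) = 0, so they contribute nothing to the similarity; this is the same intuition behind the frequency-based pruning described below.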
The inputs to the process are context-defined macros: %{id} defines the ID of the item for which recommendations are requested, and %{recommender_no} defines the required number of recommendations. Internally, the process uses an example set of items containing the item ID and an arbitrary number of textual attributes.
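To make the input side concrete, the sketch below models the example set as a list of Python dicts and turns each item into one text document. This is an assumption for illustration only: textual attributes are recognized here simply by holding string values, and they are joined with spaces. The `item_id` variable is a hypothetical stand-in for the %{id} macro.

```python
def build_documents(items, id_attribute="id"):
    """Map each item ID to a single text made of all its string-valued attributes.

    Assumption: non-text attributes (numbers, etc.) are dropped and the
    remaining text attributes are concatenated with spaces.
    """
    documents = {}
    for row in items:
        texts = [value for name, value in row.items()
                 if name != id_attribute and isinstance(value, str)]
        documents[row[id_attribute]] = " ".join(texts)
    return documents


# Hypothetical stand-in for the %{id} macro.
item_id = "b2"

items = [
    {"id": "b1", "title": "Red apple", "description": "A sweet fruit", "price": 0.5},
    {"id": "b2", "title": "Green apple", "description": "A sour fruit", "price": 0.4},
]
documents = build_documents(items)
print(documents[item_id])  # Green apple A sour fruit
```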
The process first selects only the textual attributes, which are then used as input for the text mining operator Process Documents from Data. This operator converts the text to lower case, tokenizes it, filters out short and long tokens, removes stopwords and finally applies stemming based on Porter's algorithm. The resulting tokens are then pruned by how often they appear in the data: tokens that appear in more than 30% or in less than 1% of the documents are discarded. The result of the analysis is an example set of TF-IDF word vectors together with a bag of words. The bag of words is used to create a TF-IDF vector for the requested item. Afterwards, using the cosine similarity/distance, the process calculates the distance between the requested item's TF-IDF vector and the vectors of all other items. The first %{recommender_no} items, together with their distance scores, are output as the final result.
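The preprocessing chain and the 1%/30% pruning can be approximated as follows. This is a sketch, not the operator's actual code: the length bounds (3 to 25 characters) and the short stopword list are illustrative assumptions, and stemming defaults to the identity so the example runs with the standard library alone. Passing a Porter stemmer such as `nltk.stem.PorterStemmer().stem` as `stem` would approximate the stemming step.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real setup would use a full English list.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to", "with"}


def preprocess(text, min_len=3, max_len=25, stem=lambda token: token):
    """Lower-case, tokenize on non-letters, filter by length, drop stopwords, stem."""
    # Runs of letters only (Unicode-aware): digits, underscores and punctuation split tokens.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [stem(t) for t in tokens]


def prune_by_document_frequency(docs, min_frac=0.01, max_frac=0.30):
    """Drop tokens occurring in fewer than min_frac or more than max_frac of the documents."""
    n_docs = len(docs)
    df = Counter(token for doc in docs for token in set(doc))
    keep = {tok for tok, count in df.items()
            if min_frac * n_docs <= count <= max_frac * n_docs}
    return [[tok for tok in doc if tok in keep] for doc in docs]


print(preprocess("The Quick brown fox, and THE lazy dog!"))
# ['quick', 'brown', 'fox', 'lazy', 'dog']
```

Pruning runs over the whole corpus at once because document frequency is a corpus-level statistic; a token that appears in every item carries no information for telling items apart.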
The output of the process is an example set consisting of two attributes: the recommendation and its score.
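The final ranking step and the two-attribute output can be sketched as a top-k selection. Assumptions, not confirmed by the workflow: the score reported here is the cosine similarity (cosine distance would be 1 minus this value, and sorting by ascending distance yields the same order), the requested item itself is excluded from its own recommendations, and `recommend` is a hypothetical name. Each returned `(recommendation, score)` tuple corresponds to one row of the output example set.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two sparse {term: weight} vectors (0.0 for zero vectors)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(item_id, vectors, recommender_no):
    """Return up to recommender_no (recommendation, score) pairs, most similar first."""
    query = vectors[item_id]
    scored = ((other, cosine(query, vec))
              for other, vec in vectors.items() if other != item_id)
    # nlargest keeps only the top k without sorting every candidate.
    return heapq.nlargest(recommender_no, scored, key=lambda pair: pair[1])


vectors = {"a": {"x": 1.0}, "b": {"x": 1.0, "y": 1.0}, "c": {"z": 1.0}}
for recommendation, score in recommend("a", vectors, 2):
    print(recommendation, round(score, 3))
# b 0.707
# c 0.0
```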