Tuesday, February 21, 2017

Build your own Recommendation engine (Collaborative Filtering) on Oracle DV using Custom R-Scripts


In this blog we will discuss a custom R-script that creates a Recommendation engine by performing collaborative filtering. Before we get into the details of this R-script, let us understand what Collaborative Filtering and a Recommendation system/engine are.

Collaborative Filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on one issue, then A is more likely to share B's opinion on a different issue/object than the opinion of a randomly chosen person.

So when you design a recommendation engine that recommends items to a user, say A, based on A's past purchases, it can perform collaborative filtering: it checks who else bought the same products as user A, looks at which additional items those users bought, and recommends those additional items to user A based on their ratings. In addition to the recommendation, collaborative filtering can also predict the rating that user A is likely to give to the recommended product.

This custom R-script can be downloaded from the Oracle BI Public Store.

In addition to the R-script, we have provided a sample .dva project that demonstrates how to use it. You can open the project by importing the .dva file in DV Desktop.



How does this script work: This script performs Collaborative Filtering on data about purchases/subscriptions/movies watched, together with their ratings, and returns the top N recommendations for each user along with the rating the user is expected (predicted) to give to each recommended item. The script supports two kinds of collaborative filtering, selected by the user's input, and they work as follows:
1) User Based Collaborative Filtering (UBCF): Look for users who share the same rating patterns as the active user (the user for whom the prediction is made). Then use the ratings from those like-minded users to calculate a prediction for the active user.
2) Item Based Collaborative Filtering (IBCF): "Users who bought x also bought y". Build an item-item matrix that captures the relationships between pairs of items. Then infer the tastes of the current user by examining the matrix and matching that user's data.
Please note that IBCF is a resource-intensive process, so if you are using IBCF we recommend saving and reusing the Recommender model. This can be done by setting the optional parameter reuse_savedmodel to "YES". If you are reusing the model, please make sure that you reuse it on identical data, i.e., the User and Item Names/IDs should be the same as those stored in the model.
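The two approaches above can be illustrated with a minimal sketch in base R. It assumes cosine similarity over co-rated entries and a similarity-weighted average for the prediction; the toy ratings and the helper names (cosine_sim, predict_ubcf, item_sim, predict_ibcf) are made up for illustration. The actual script relies on the recommenderlab package, whose implementation (normalization, neighbourhood size, and so on) may well differ.

```r
# Toy ratings matrix: rows are users, columns are items, NA means "not rated".
ratings <- matrix(
  c( 5,  4, NA,  1,
     4,  5,  3, NA,
     1, NA,  5,  4,
    NA,  2,  4,  5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("i1", "i2", "i3", "i4"))
)

# Cosine similarity between two rating vectors, using only the
# positions where both vectors have a rating.
cosine_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (!any(both)) return(0)
  sum(x[both] * y[both]) / (sqrt(sum(x[both]^2)) * sqrt(sum(y[both]^2)))
}

# UBCF: predict the active user's rating for each unrated item as a
# similarity-weighted average of the other users' ratings.
predict_ubcf <- function(ratings, active) {
  others <- setdiff(rownames(ratings), active)
  sims <- sapply(others, function(u) cosine_sim(ratings[active, ], ratings[u, ]))
  unrated <- colnames(ratings)[is.na(ratings[active, ])]
  sapply(unrated, function(item) {
    r <- ratings[others, item]
    ok <- !is.na(r)
    if (!any(ok) || sum(sims[ok]) == 0) return(NA_real_)
    sum(sims[ok] * r[ok]) / sum(sims[ok])
  })
}

# IBCF, step 1: build the item-item similarity matrix.
item_sim <- function(ratings) {
  items <- colnames(ratings)
  sim <- matrix(0, length(items), length(items), dimnames = list(items, items))
  for (a in items) for (b in items) {
    if (a != b) sim[a, b] <- cosine_sim(ratings[, a], ratings[, b])
  }
  sim
}

# IBCF, step 2: score each unrated item by its similarity to the
# items the active user has already rated.
predict_ibcf <- function(ratings, sim, active) {
  user <- ratings[active, ]
  rated <- names(user)[!is.na(user)]
  unrated <- names(user)[is.na(user)]
  sapply(unrated, function(item) {
    w <- sim[item, rated]
    if (sum(w) == 0) return(NA_real_)
    sum(w * user[rated]) / sum(w)
  })
}

pred_ubcf <- predict_ubcf(ratings, "A")
pred_ibcf <- predict_ibcf(ratings, item_sim(ratings), "A")
print(pred_ubcf)
print(pred_ibcf)
```

Note that the item-item matrix only depends on the ratings data, not on the active user. This is why, in this sketch at least, it is the expensive part of IBCF and the natural thing to compute once and reuse.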

This script also provides the option to save the prediction model and reuse it later. If we reuse the saved model, the data with which the model was created/saved acts as the training data and the current data acts as the test data. The application of this script is not limited to datasets related to Movies/Television; it can also be applied to other product segments such as books, or to products from different categories.

Inputs to the Script:
1) userid: Name/ID of the user.
2) itemid: ID of the item.
3) rating: Rating given by the user for this item.
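To make the expected input shape concrete, here is a small sketch with made-up data: one row per (userid, itemid, rating), pivoted into a user-by-item matrix. It uses base R's tapply so that it runs without extra packages; the script itself lists reshape2 as a dependency and presumably does a similar reshaping with it, but that is an assumption on my part.

```r
# Made-up input in the long format the script expects:
# one row per rating, with the three input columns.
purchases <- data.frame(
  userid = c("A", "A", "B", "B", "C"),
  itemid = c("i1", "i2", "i1", "i3", "i2"),
  rating = c(5, 3, 4, 2, 1),
  stringsAsFactors = FALSE
)

# Pivot to a user x item matrix; combinations with no rating become NA.
# (mean() only matters if the same user rated the same item twice.)
rating_matrix <- tapply(
  purchases$rating,
  list(purchases$userid, purchases$itemid),
  mean
)

print(rating_matrix)
```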

Optional Inputs: 
1) topn: Top N recommendations to be returned for each user.
2) method: The collaborative filtering method to be used. Options are UBCF and IBCF.
3) reuse_savedmodel: Whether to use an already saved model for prediction or to create a new model. If reuse_savedmodel is set to "YES", the currently saved model will be reused; if no model exists yet, a new model will be created. If reuse_savedmodel is "NO", a new model will be created even if a model exists.
4) model_directory: Place where the created model should be saved. Even if you choose not to reuse the saved model, please select a valid directory, as the script requires the model to be saved on disk. I am choosing a temp directory so that I need not worry about cleaning it up manually every time. Make sure you have the correct privileges on the directory.
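The reuse_savedmodel and model_directory behaviour described above can be sketched as follows. This is only an illustration of the decision logic, not the script's actual code: the helper get_model, the file name cf_model.rds, and the stand-in "models" (plain lists) are all hypothetical.

```r
# Hypothetical helper mirroring the reuse_savedmodel / model_directory logic.
# build_model is any function that trains and returns a model object.
get_model <- function(build_model, model_directory, reuse_savedmodel = "NO",
                      file_name = "cf_model.rds") {
  model_path <- file.path(model_directory, file_name)
  if (toupper(reuse_savedmodel) == "YES" && file.exists(model_path)) {
    return(readRDS(model_path))   # reuse the model already on disk
  }
  model <- build_model()          # otherwise train a new model ...
  saveRDS(model, model_path)      # ... and always save it to disk
  model
}

# Use a temp directory so nothing needs to be cleaned up manually.
model_dir <- file.path(tempdir(), "Model_dir")
dir.create(model_dir, showWarnings = FALSE, recursive = TRUE)
unlink(file.path(model_dir, "cf_model.rds"))  # start from a clean state

# "YES" but nothing saved yet: a new model is built and saved.
first  <- get_model(function() list(method = "IBCF", built = 1), model_dir, "YES")
# "YES" and a model exists: the saved model is returned, nothing is rebuilt.
second <- get_model(function() list(method = "IBCF", built = 2), model_dir, "YES")
# "NO": a new model is built even though one exists.
third  <- get_model(function() list(method = "IBCF", built = 3), model_dir, "NO")
```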

Output: 
1) userid: Name/ID of the user
2) recommended_item: ID/name of the item recommended.
3) predicted_rating: Predicted rating for the recommended item.
4) dummy: Dummy output.
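As a rough illustration of the output shape, the sketch below turns a made-up matrix of predicted ratings into one row per (userid, recommended_item, predicted_rating), keeping the top N items per user. The helper top_n and the numbers are hypothetical, and the dummy column is left out.

```r
# Made-up predicted ratings; NA marks items the user has already rated.
predicted <- matrix(
  c( NA, 4.2, 3.1, 4.8,
    3.9,  NA, 4.5, 2.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("A", "B"), c("i1", "i2", "i3", "i4"))
)

# Hypothetical helper: keep the topn highest predictions per user and
# return them in long format, one row per recommendation.
top_n <- function(predicted, topn) {
  rows <- lapply(rownames(predicted), function(u) {
    p <- sort(predicted[u, ], decreasing = TRUE)  # sort() drops NA values
    p <- head(p, topn)
    data.frame(
      userid = rep(u, length(p)),
      recommended_item = names(p),
      predicted_rating = unname(p),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

recs <- top_n(predicted, 2)
print(recs)
```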

R Packages needed:
1) reshape2
2) recommenderlab


Steps to deploy this plugin in your local Oracle DV:

1) Install the Advanced Analytics feature in Oracle DV by clicking on the icon below. This will install the Oracle R deployment. Alternatively, you can install Advanced Analytics by running install_advanced_analytics.cmd, located in <DV_INSTALL_DIRECTORY>.


2) If the reshape2 and recommenderlab packages are not already installed, please install them using the following instructions:
    Open the R console (double-click Rgui.exe in <Advanced_Analytics_Install_Dir>\bin\x64)
    and install the two packages. The R commands are as follows:
     Set Proxy:
        Sys.setenv(http_proxy="<your_proxy_host>:<port_number>")
           Set the proxy appropriate to your network configuration.
     Install Packages (updated instructions):
        install.packages("reshape2")
        install.packages("recommenderlab")
3) Download Collaborative_Filtering_V1.zip from the Oracle BI Public Store and unzip it.
4) Copy R.CollaborativeFiltering.xml to <DV_INSTALL_DIRECTORY>\OracleBI1\bifoundation\advanced_analytics\script_repository
5) Create a directory named Model_dir on the D drive. This is where the model files are saved. If you intend to save the model files in a different directory, please change the value of the model_directory parameter in the inputs to the EVALUATE_SCRIPT function in DV.
6) Import the .dva project to Oracle DV. Password for the .dva is Admin123
