Boolean Data Sets
A Boolean data set is an input data set that doesn't have a preference value, i.e. the input data set would be of the format
UserId1,ItemId1
UserId2,ItemId2
Here the data only records whether a user likes an item or not; there is no preference value associated with the pair.
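To make the two-column format concrete, here is a minimal plain-Java sketch (no Mahout dependency; `BooleanDataParser` is a hypothetical name) that groups `userId,itemId` lines into the set of items each user has a preference for. The presence of a pair is the only signal in such data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of Mahout): groups "userId,itemId" lines
// into the set of items each user has expressed a preference for.
class BooleanDataParser {

    static Map<Long, Set<Long>> parse(List<String> lines) {
        Map<Long, Set<Long>> itemsByUser = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split(",");
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            // No preference value: the existence of the pair is the whole signal.
            itemsByUser.computeIfAbsent(userId, k -> new HashSet<>()).add(itemId);
        }
        return itemsByUser;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("510,101", "510,102", "511,101");
        System.out.println(parse(lines));
    }
}
```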
When we use Boolean data sets, we need to choose the similarity algorithms and recommenders accordingly.
Similarity Algorithms
For Boolean data sets, we can use either TanimotoCoefficientSimilarity or LogLikelihoodSimilarity; both depend only on which preferences exist, not on their values.
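As a rough plain-Java illustration (no Mahout dependency; class and method names are hypothetical): Tanimoto is the size of the intersection over the size of the union, and the log-likelihood ratio below follows the entropy-based formulation I understand Mahout's `LogLikelihood` helper to use, including mapping the ratio into a similarity with 1 - 1/(1 + LLR). Treat it as a sketch for intuition, not a drop-in replacement.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative, Mahout-independent versions of the two boolean similarity measures.
class BooleanSimilarities {

    // Tanimoto coefficient: |A intersect B| / |A union B|.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        int common = intersectionSize(a, b);
        int union = a.size() + b.size() - common;
        return union == 0 ? 0.0 : (double) common / union;
    }

    static int intersectionSize(Set<Long> a, Set<Long> b) {
        Set<Long> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy: xLogX(total) minus the sum of xLogX(count).
    private static double entropy(long... counts) {
        long total = 0;
        double sum = 0.0;
        for (long count : counts) {
            sum += xLogX(count);
            total += count;
        }
        return xLogX(total) - sum;
    }

    // Log-likelihood ratio (G^2) of a 2x2 table: k11 = both, k12 = only B,
    // k21 = only A, k22 = neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // usersA / usersB: users with a preference for item A / item B.
    static double logLikelihoodSimilarity(Set<Long> usersA, Set<Long> usersB, long numUsers) {
        long k11 = intersectionSize(usersA, usersB);
        long k12 = usersB.size() - k11;
        long k21 = usersA.size() - k11;
        long k22 = numUsers - usersA.size() - usersB.size() + k11;
        double llr = logLikelihoodRatio(k11, k12, k21, k22);
        return 1.0 - 1.0 / (1.0 + llr);
    }
}
```

An LLR of 0 means the two items co-occur about as often as chance predicts; the mapping keeps the similarity in [0, 1).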
Recommender
We need to use GenericBooleanPrefUserBasedRecommender or GenericBooleanPrefItemBasedRecommender.
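Why a separate recommender class? As I understand Mahout's implementation, the standard recommenders estimate a preference as a similarity-weighted average of the user's existing preference values, while the GenericBooleanPref* variants use a sum of similarities instead. With Boolean data every existing value is effectively 1.0, so a weighted average comes out as 1.0 for every candidate. The plain-Java sketch below (hypothetical class name, no Mahout dependency, non-negative similarities assumed) contrasts the two estimates.

```java
// Illustration only (not Mahout's code): two ways to turn similarities into a
// preference estimate when every known preference is implicitly 1.0.
class BooleanScoringSketch {

    // Similarity-weighted average of the user's preference values.
    // Each value is 1.0, so the result is 1.0 for any candidate with positive
    // total similarity: candidates cannot be ranked against each other.
    static double weightedAverage(double[] similarities) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (double sim : similarities) {
            weightedSum += sim * 1.0; // implicit preference value of 1.0
            totalWeight += sim;
        }
        return totalWeight == 0.0 ? Double.NaN : weightedSum / totalWeight;
    }

    // Sum of similarities: a candidate similar to more (or more strongly
    // similar) items the user already has gets a higher score.
    static double sumOfSimilarities(double[] similarities) {
        double sum = 0.0;
        for (double sim : similarities) {
            sum += sim;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] candidateA = {0.9, 0.8, 0.7}; // similar to three of the user's items
        double[] candidateB = {0.1};           // weakly similar to one item
        System.out.println(weightedAverage(candidateA) + " vs " + weightedAverage(candidateB));
        System.out.println(sumOfSimilarities(candidateA) + " vs " + sumOfSimilarities(candidateB));
    }
}
```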
Sample code for generating user-based and item-based recommendations is given below.
User Based Recommender for Boolean Data Sets
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
public class UserRecommender {
    public static void main(String[] args) {
        // ID of the user for whom recommendations are generated
        long userId = 510;
        // number of recommendations to generate
        int noOfRecommendations = 5;
        // minimum similarity a user needs to be included in the neighborhood
        double thresholdValue = 0.7;
        try {
            // data model built from the input file (one userId,itemId pair per line)
            FileDataModel dataModel = new FileDataModel(new File("D://input.txt"));
            /* TanimotoCoefficientSimilarity is intended for "binary" data sets,
               where a user either expresses a generic "yes" preference for an item or has no preference. */
            UserSimilarity userSimilarity = new TanimotoCoefficientSimilarity(dataModel);
            /* ThresholdUserNeighborhood includes every user whose similarity meets the
               threshold, rather than a fixed number of nearest neighbors. */
            UserNeighborhood neighborhood = new ThresholdUserNeighborhood(thresholdValue, userSimilarity, dataModel);
            /* GenericBooleanPrefUserBasedRecommender is appropriate for use when no notion
               of preference value exists in the data. */
            Recommender recommender = new GenericBooleanPrefUserBasedRecommender(dataModel, neighborhood, userSimilarity);
            // generate the top-N recommendations for the user
            List<RecommendedItem> recommendations = recommender.recommend(userId, noOfRecommendations);
            // print the IDs of the recommended items
            for (RecommendedItem recommendedItem : recommendations) {
                System.out.println(recommendedItem.getItemID());
            }
        } catch (IOException e) {
            // the input file could not be read
            e.printStackTrace();
        } catch (TasteException e) {
            // Mahout failed while computing recommendations
            e.printStackTrace();
        }
    }
}
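The threshold value in the user-based sample above controls who ends up in the neighborhood. As a rough illustration of the idea (plain Java, hypothetical class and method names, Tanimoto similarity assumed), a threshold neighborhood keeps every other user whose similarity meets or exceeds the threshold, however many users that is:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of threshold-based neighborhood selection (not Mahout's code).
class ThresholdNeighborhoodSketch {

    // Tanimoto coefficient between two users' item sets.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return union == 0 ? 0.0 : (double) common.size() / union;
    }

    // All other users whose similarity to userId is >= threshold.
    static List<Long> neighbors(long userId, Map<Long, Set<Long>> itemsByUser, double threshold) {
        List<Long> result = new ArrayList<>();
        Set<Long> mine = itemsByUser.get(userId);
        if (mine == null) {
            return result; // unknown user: empty neighborhood
        }
        for (Map.Entry<Long, Set<Long>> entry : itemsByUser.entrySet()) {
            if (entry.getKey() == userId) {
                continue; // a user is not their own neighbor
            }
            if (tanimoto(mine, entry.getValue()) >= threshold) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

A high threshold such as 0.7 can leave a user with very few or no neighbors on sparse data, and then no recommendations come back.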
Item Based Recommender for Boolean Data Sets
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
public class ItemRecommender {
    public static void main(String[] args) {
        // ID of the user for whom recommendations are generated
        long userId = 510;
        // number of recommendations to generate
        int noOfRecommendations = 5;
        try {
            // data model built from the input file (one userId,itemId pair per line)
            FileDataModel dataModel = new FileDataModel(new File("D://input.txt"));
            /* LogLikelihoodSimilarity looks only at which users share preferences for
               two items, not at preference values, so it suits Boolean data. */
            ItemSimilarity itemSimilarity = new LogLikelihoodSimilarity(dataModel);
            /* Initializing the item-based recommender */
            ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, itemSimilarity);
            // generate the top-N recommendations for the user
            List<RecommendedItem> recommendations = recommender.recommend(userId, noOfRecommendations);
            // print the IDs of the recommended items
            for (RecommendedItem recommendedItem : recommendations) {
                System.out.println(recommendedItem.getItemID());
            }
        } catch (IOException e) {
            // the input file could not be read
            e.printStackTrace();
        } catch (TasteException e) {
            // Mahout failed while computing recommendations
            e.printStackTrace();
        }
    }
}
Note: For more details on basic recommendations, refer to generating recommendations with mahout.
GenericItemBasedRecommender or GenericBooleanPrefItemBasedRecommender ??
Hi JoNorman
You can, and should, use GenericBooleanPrefItemBasedRecommender, since it is the latest, updated implementation. The code samples on this blog were based on a previous release of Mahout (0.4); I was working on this about a year ago. The earlier version had only GenericItemBasedRecommender, not GenericBooleanPrefItemBasedRecommender.
I tried the above code, but it displays the following error:
java.io.FileNotFoundException: \org\apache\mahout\cf\taste\example\grouplens\input.txt
How do I solve it? I tried these options:
1) File ratingsFile = new File("/org/apache/mahout/cf/taste/example/grouplens/input.txt");
   DataModel dataModel = new FileDataModel(ratingsFile);
2) FileDataModel dataModel = new FileDataModel(new File("/org/apache/mahout/cf/taste/example/grouplens/input1.csv"));
Hi Ashish
It is just a minor issue with the path of your input file. Did you run ls on the path from the CLI to confirm that the file is present at the mentioned location?
ls -l /org/apache/mahout/cf/taste/example/grouplens/
If the above ls command lists the file, the code should run without the FileNotFoundException.
Please get back to me if you still have any issues.
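A quick way to rule out path problems before Mahout is involved is to resolve the path yourself and fail with the absolute path in the message. Note that a path starting with `/` (or `\` on Windows) is resolved from the filesystem root (on Windows, the current drive's root), not from the project or the classpath. The helper below is a hypothetical plain-Java sketch:

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical helper: check the input path before handing it to FileDataModel.
class InputFileCheck {

    static File requireReadable(String path) throws FileNotFoundException {
        File file = new File(path);
        if (!file.isFile() || !file.canRead()) {
            // Report the fully resolved path and the working directory, which
            // usually makes a wrong relative or absolute path obvious.
            throw new FileNotFoundException("Cannot read " + file.getAbsolutePath()
                    + " (working directory: " + System.getProperty("user.dir") + ")");
        }
        return file;
    }

    public static void main(String[] args) throws FileNotFoundException {
        File input = requireReadable(args.length > 0 ? args[0] : "input.txt");
        System.out.println("Using " + input.getAbsolutePath());
        // new FileDataModel(input) would follow here
    }
}
```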
I was just starting with Mahout and found this approach pretty good for what I'm trying to do. My question is this: I have a data set of users liking or disliking an item. What should the input file look like? Currently it is in the form userid,itemid,(1/-1), depending on whether the user is following or not following.
Hi Aditya
Your data set is of Boolean type. The better approach is to use a pre-processor that removes all records with a preference value of -1, then use the result with the recommendation algorithms available in Mahout.
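The pre-processing step described above can be sketched in plain Java as follows. `DislikeFilter` is a hypothetical name, and the input is assumed to be `userid,itemid,pref` lines with `pref` either 1 or -1. Records with -1 are dropped and the preference column is removed, leaving a two-column Boolean file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-processor: turns "userid,itemid,pref" lines (pref = 1 or -1)
// into Boolean "userid,itemid" lines by dropping the -1 (dislike) records.
class DislikeFilter {

    static List<String> keepLikes(List<String> lines) {
        List<String> likes = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length < 3) {
                continue; // skip malformed or blank lines
            }
            double pref = Double.parseDouble(fields[2].trim());
            if (pref == -1.0) {
                continue; // dislike: drop the record entirely
            }
            likes.add(fields[0].trim() + "," + fields[1].trim());
        }
        return likes;
    }
}
```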
Replacing the "-1" with "0" might be a good idea as the zero will ensure that the said user is not recommended that item.
In Mahout we need to save vector elements as double values even though we have binary data, and this makes files very large and uses a lot of memory. I have a binary data set of 20k x 200k. How can I reduce the memory usage of clustering (k-means)? Is there any dimension reduction algorithm? Note that I need Manhattan distance on binary data, and the reduced dimensions should maintain that characteristic.
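On the memory question: for 0/1 vectors, Manhattan (L1) distance equals Hamming distance, the number of positions where the two vectors differ, so rows can be stored as one bit per dimension instead of one double. For a fully dense 20k x 200k matrix that is about 500 MB of bits versus about 32 GB of doubles. This is a plain-Java illustration of that identity using `java.util.BitSet` (class name hypothetical); it is not a Mahout feature, and using it inside Mahout's k-means would presumably need custom vector and distance classes that are not shown here.

```java
import java.util.BitSet;

// Illustration: for 0/1 vectors, Manhattan (L1) distance = Hamming distance.
class BinaryManhattan {

    // |a_i - b_i| is 1 exactly where the bits differ, so the L1 distance
    // is the number of set bits in a XOR b.
    static int manhattan(BitSet a, BitSet b) {
        BitSet diff = (BitSet) a.clone();
        diff.xor(b);
        return diff.cardinality();
    }

    // Builds a bit vector with the given positions set to 1.
    static BitSet fromIndices(int... ones) {
        BitSet bits = new BitSet();
        for (int i : ones) {
            bits.set(i);
        }
        return bits;
    }
}
```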
Thanks for the great article!
But what if we have a large data set (for example, 10^6 rows of user_id item_id)? How can I speed up the recommendation calculations? For me it takes about a minute.
Hi Bejoy
You have mentioned that
input data set would be of the format UserId1,ItemId1
UserId2,ItemId2
Are you saying that the data need not have a binary value (indicating like or dislike)?
Is an input file in the format below correct?
UserId2,ItemId2
1, 10
1, 20
2, 30
2, 40
I am asking because when I set the preference value (as binary 1 or 0) for the user-based recommendation system, I am not getting any recommendations,
whereas for the item-based recommendations I am getting predictions.
I shall experiment with the rest in the meantime.
Thanks for this article; it has helped me a lot.
When I use an attribute with string values in the training data for a Recommender in Mahout, I get a NumberFormatException while the FileDataModel is being built from the data in the file. If the attribute value is "1.0", which is a number represented as a string, the Recommender does not throw the NumberFormatException. But if the attribute value is "Washington", the NumberFormatException is thrown.
Is there any way to pass string attribute values as itemID/userID in the training data for Recommenders in Mahout?
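FileDataModel expects numeric user and item IDs, so a value like "Washington" cannot be parsed. One workaround is to map string IDs to numeric IDs in a pre-processing step, write the numeric file, and map the recommended item IDs back afterwards. The class below is a hypothetical plain-Java sketch of such a mapping (use one instance for users and one for items); Mahout also has an `IDMigrator` abstraction for this purpose, so check your version's API docs before rolling your own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: assigns stable numeric IDs to string IDs and maps them back.
class StringIdMapper {
    private final Map<String, Long> longIds = new HashMap<>();
    private final List<String> stringIds = new ArrayList<>();

    // Returns the existing numeric ID, or assigns the next one (0, 1, 2, ...).
    long toLongId(String id) {
        Long existing = longIds.get(id);
        if (existing != null) {
            return existing;
        }
        long next = stringIds.size();
        longIds.put(id, next);
        stringIds.add(id);
        return next;
    }

    // Reverse lookup, e.g. for the item IDs a recommender returns.
    String toStringId(long id) {
        return stringIds.get((int) id);
    }
}
```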
I've tried to use the `GenericItemBasedRecommender` with several similarity types (LLR, Tanimoto, Euclidean), and it always gave me the SAME results (precision, recall, etc.).
Could it be a bug? Has anyone encountered this problem?
Please help.