Friday, May 27, 2011

Mahout Recommendations with Data Sets Containing Alphanumeric Item IDs


With real-world data we can't always ensure that the input supplied for generating recommendations contains only integer values for user and item IDs. If either of these is not an integer, the default data models that Mahout provides can't process the data. Here, let us consider the case where the item IDs are strings, for which we define a custom data model. In this data model we override one method so that it reads the item ID as a String, converts it to a long, and returns that long value, which then serves as the item's unique ID.
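To make the idea concrete before bringing in Mahout, here is a minimal plain-JDK sketch of the String-to-long conversion with a reverse lookup. The class name StringIdMapper and its register method are hypothetical and exist only for this illustration. Taking the first 8 bytes of an MD5 digest is an assumption about how such a hash can be built; it is not a copy of Mahout's own code, and hash collisions, while very unlikely, are possible in principle.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of Mahout).
public class StringIdMapper {
    // Reverse map so a long ID can be turned back into the original String ID.
    private final Map<Long, String> longToString = new HashMap<>();

    // Assumption: build the long from the first 8 bytes of the MD5 digest.
    static long toLongID(String stringID) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(stringID.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Converts the String ID to a long and remembers the mapping.
    synchronized long register(String stringID) {
        long id = toLongID(stringID);
        longToString.putIfAbsent(id, stringID);
        return id;
    }

    // Returns the original String ID, or null if the long was never registered.
    synchronized String toStringID(long id) {
        return longToString.get(id);
    }

    public static void main(String[] args) {
        StringIdMapper mapper = new StringIdMapper();
        long id = mapper.register("ITEM-42A");
        System.out.println(id + " -> " + mapper.toStringID(id));
    }
}
```

The same String always hashes to the same long, so the conversion needs no stored state. Only the way back, from long to String, requires the map.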

Data Model Class

import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;

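public class AlphaItemFileDataModel extends FileDataModel {
      // Not initialized inline: in Mahout versions where the FileDataModel constructor
      // already loads the file, readItemIDFromString() runs before this class's field
      // initializers, so the migrator is created lazily on first use.
      private ItemMemIDMigrator memIdMigtr;

      public AlphaItemFileDataModel(File dataFile) throws IOException {
            super(dataFile);
      }

      public AlphaItemFileDataModel(File dataFile, boolean transpose) throws IOException {
            super(dataFile, transpose);
      }

      private synchronized ItemMemIDMigrator migrator() {
            if (memIdMigtr == null) {
                  memIdMigtr = new ItemMemIDMigrator();
            }
            return memIdMigtr;
      }

      // Called by FileDataModel for every item ID read from the input file.
      @Override
      protected long readItemIDFromString(String value) {
            ItemMemIDMigrator migrator = migrator();
            long retValue = migrator.toLongID(value);
            // Remember the long -> String mapping the first time an ID is seen.
            if (migrator.toStringID(retValue) == null) {
                  try {
                        migrator.singleInit(value);
                  } catch (TasteException e) {
                        throw new IllegalStateException("Could not store mapping for item ID " + value, e);
                  }
            }
            return retValue;
      }

      // Returns the original String ID for a long item ID, or null if it is unknown.
      String getItemIDAsString(long itemId) {
            return migrator().toStringID(itemId);
      }
}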

Class that maintains the map from the long values back to the original String IDs

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.AbstractIDMigrator;

public class ItemMemIDMigrator extends AbstractIDMigrator {

      // In-memory map from the generated long ID back to the original String ID.
      private final FastByIDMap<String> longToString;

      public ItemMemIDMigrator() {
            this.longToString = new FastByIDMap<String>(100);
      }

      @Override
      public void storeMapping(long longID, String stringID) {
            synchronized (longToString) {
                  longToString.put(longID, stringID);
            }
      }

      // Returns null if no mapping has been stored for this long ID.
      @Override
      public String toStringID(long longID) {
            synchronized (longToString) {
                  return longToString.get(longID);
            }
      }

      // Hashes a single String ID to a long and stores the reverse mapping.
      public void singleInit(String stringID) throws TasteException {
            storeMapping(toLongID(stringID), stringID);
      }

}

In your Recommender implementation you can use this data model class instead of the default file data model to accept input that contains alphanumeric item IDs, and call getItemIDAsString to turn a recommended item's long ID back into its original String. Similarly, you can devise a data model that accommodates alphanumeric user IDs as well, by overriding readUserIDFromString in the same way.
