- Middle East Technical University

CORS: A Hybrid Music Recommender System
Gulfem Demir, Tugba Kaya, Cagatay Ogut, Ali Can Sag
Middle East Technical University, Ankara, Turkey
http://www.neo4j.org/learn/neo4j
Why do we need a recommender system?
• To satisfy the human need for discovery
• To deal with the massive scale of music data, i.e. libraries of over 15
million songs available on demand, for free, on the web
• Because filters and guides are invaluable if music itself is to coexist with
the new ways of getting at it
Why CORS?
• Because CORS uses a graph database that can traverse deeply across the
entire dataset, faster than SQL queries that span many table joins
• Because CORS combines two main recommendation techniques into a hybrid
approach, where supplementary content features are employed to improve the
accuracy of collaborative filtering
• Because CORS is trained with a huge amount of metadata (~300GB)
Introduction
As digital content distribution has grown, access to music collections has
skyrocketed. One million songs would take more than seven years of non-stop
listening, yet commercial music libraries exceed 15 million songs, far more
than a single person could ever listen to. To help the community cope with the
rapidly growing catalogue of readily available music, a wide range of academic
work has sought to automate the search and retrieval of musical content.
Computational recommender systems have come into play to deal with this issue.
They enable people to share their opinions and benefit from each other’s
experience.
2. Approaches
Most recommender systems take one of two basic approaches: collaborative
filtering or content-based filtering, which are compared in Figure 2.
Collaborative filtering arrives at a recommendation based on a model of prior
user behavior, whereas the content-based approach tries to recommend items
similar to those that a user liked in the past. Our project uses a hybrid
approach that combines collaborative and content-based filtering, which
increases the efficiency (and the complexity) of the recommender system.
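The difference between the two approaches, and one way of blending them, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the data, the function names, the use of Jaccard overlap, and the weighted-sum blend are hypothetical and are not taken from the CORS implementation.

```python
from collections import Counter

# Hypothetical toy data: listening histories and per-song content tags.
histories = {
    "ann": {"s1", "s2"},
    "bob": {"s1", "s2", "s3"},
    "cat": {"s4"},
}
tags = {
    "s1": {"rock"},
    "s2": {"rock", "indie"},
    "s3": {"indie"},
    "s4": {"jazz"},
    "s5": {"rock", "indie"},  # nobody has listened to s5 yet
}


def jaccard(a, b):
    """Overlap of two sets, between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def collaborative_scores(user):
    """Score unseen songs by how similar their listeners are to `user`."""
    seen = histories[user]
    scores = Counter()
    for other, songs in histories.items():
        if other == user:
            continue
        weight = jaccard(seen, songs)
        for song in songs - seen:
            scores[song] += weight
    return scores


def content_scores(user):
    """Score unseen songs by tag overlap with the songs `user` already knows."""
    seen = histories[user]
    scores = Counter()
    for song in tags.keys() - seen:
        scores[song] = max(
            (jaccard(tags[song], tags[s]) for s in seen), default=0.0
        )
    return scores


def hybrid_scores(user, weight=0.5):
    """Weighted sum of both scores (the blend itself is an assumption)."""
    cf, cb = collaborative_scores(user), content_scores(user)
    return {s: weight * cf[s] + (1 - weight) * cb[s] for s in cf.keys() | cb.keys()}


if __name__ == "__main__":
    ranked = sorted(hybrid_scores("ann").items(), key=lambda kv: -kv[1])
    print(ranked)
```

In this toy data, song s5 has no listeners, so the collaborative score alone can never surface it, while the content score can. That is the kind of gap a hybrid approach is meant to close.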
Figure 1. Model of Recommendation Process
1. Dataset
The Million Song Dataset Challenge is a large-scale music recommendation
challenge in which the aim is to predict which songs a user will listen to,
given the user's listening history. The challenge is based on the Million Song
Dataset (MSD), which is also used in our project: a freely available
collection of metadata for one million contemporary songs (e.g. song titles,
artists, year of publication, audio features, and much more). The MSD is
complemented by a cluster of datasets contributed by the community, such as
the lyrics provided by the MusiXmatch dataset and the user data provided by
the Taste Profile Subset.
Figure 2. Recommender System Approaches Comparison
CORS
1. Design and Implementation
CORS is a music recommendation system that provides users with real-time
song suggestions based on their listening history. To give instant
recommendations, CORS uses a graph database, which significantly reduces
query time. In addition, to deliver richer content, CORS adopts a hybrid
approach that combines collaborative filtering with content-based
recommendations.
For the parametric similarity measure described under Similarity Metrics,
alpha was set to 0.3 for the user-based recommendation strategy after some
trials, and finalized as 0.15 for the item-based strategy.
3. Evaluation
Dataset: We used two separate datasets for evaluation. The training dataset is
the one CORS uses to generate recommendations. To measure the accuracy of our
recommender system, we used the test dataset provided by the Million Song
Dataset Challenge (MSDC). Statistics for these two collections are given in
Figure 4.
Figure 4. Datasets
Figure 3. Neo4j Graph Database Structure [3]
2. Similarity Metrics
Collaborative filtering depends on a similarity measure between users and
between items. Cosine similarity is one of many similarity metrics available.
It corresponds to alpha = 0.5 in the following formulas and has the nice
property of being symmetric. However, especially in the item case, we are more
interested in computing how likely it is that a user will appreciate an item
when we already know that the same user likes another item. This notion is
clearly not symmetric, which is why alpha should not be equal to 0.5 in our
case.
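A minimal sketch of such a parametric similarity on sets, assuming the asymmetric cosine form described by Aiolli [1]: the size of the intersection divided by |A|^alpha * |B|^(1 - alpha). The function name, the argument order, and the toy listener sets are assumptions made for illustration, not the CORS code.

```python
def asymmetric_cosine(a: set, b: set, alpha: float = 0.5) -> float:
    """Return |a & b| / (|a|**alpha * |b|**(1 - alpha)).

    With alpha = 0.5 this is the ordinary cosine similarity of two
    binary vectors; any other alpha makes the measure asymmetric.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) ** alpha * len(b) ** (1 - alpha))


# Hypothetical data: the users who listened to two songs, x and y.
listeners_x = {"u1", "u2", "u3", "u4"}
listeners_y = {"u1", "u2"}

if __name__ == "__main__":
    for alpha in (0.5, 0.15):
        xy = asymmetric_cosine(listeners_x, listeners_y, alpha)
        yx = asymmetric_cosine(listeners_y, listeners_x, alpha)
        print(f"alpha={alpha}: sim(x, y)={xy:.4f}  sim(y, x)={yx:.4f}")
```

With alpha = 0.5 both directions give the same value. With alpha = 0.15 the two directions differ, because the sizes of the two sets are weighted unequally.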
Evaluation metrics: The challenge asks for 500 recommended items for each user
(this number is the cutoff x in MAP@x). Results are evaluated using the mean
average precision (MAP) metric. CORS also uses MAP@500 so that we can compare
our results with those of the finalists in the challenge.
MAP is simply the mean of the average precision (AP) values over all users. If
we have 1000 users, we sum the AP of each user and divide the sum by 1000. It
is important to underline that order matters in MAP, so it’s better to submit
more certain recommendations first, followed by those we are less sure about.
Results: We evaluated all three techniques implemented for generating
recommendations, based on mean average precision.
I(u): the set of items rated by a generic user u; U(i): the set of users who have rated item i;
alpha: parametrization variable between 0 and 1
As an alternative to cosine similarity, CORS uses a parametric generalization
of the above similarity measure, given by the following formulas. After some
trials, alpha was tuned separately for the user-based and item-based
recommendation strategies.
Acknowledgements
We would like to thank our advisors Dilek Onal, Prof. Dr. Ismail Hakki
Toroslu and Prof. Dr. Veysi Isler for their comments, which helped us
improve this project considerably.
Figure 5. Mean Average Precision Results
References
1. Aiolli F., A Preliminary Study on a Recommender System for the Million Songs Dataset Challenge
2. McFee B., Ellis D. P. W., Bertin-Mahieux T., Lanckriet G. R. G., The Million Song Dataset Challenge
3. http://neo4j.com/blog/musicbrainz-in-neo4j-part-1/