This library should make it easier to find recommendations and similarities between different things. There are a couple of use cases that I developed it for:
- Recommend a list of music albums/artists to a user
- Recommend an article that is similar to the current one that a user is reading
- Find other users that have the same values as another user (think matchmaking ;)
The easiest way to install this in your project is with Composer:
$ composer require stojg/recommend
Assume that we have some data where users have rated music artists on a scale of one to five:
$artistRatings = array(
"Abe" => array(
"Blues Traveler" => 3,
"Broken Bells" => 2,
"Norah Jones" => 4,
"Phoenix" => 5,
"Slightly Stoopid" => 1,
"The Strokes" => 2,
"Vampire Weekend" => 2
),
"Blair" => array(
"Blues Traveler" => 2,
"Broken Bells" => 3,
"Deadmau5" => 4,
"Phoenix" => 2,
"Slightly Stoopid" => 3,
"Vampire Weekend" => 3
),
"Clair" => array(
"Blues Traveler" => 5,
"Broken Bells" => 1,
"Deadmau5" => 1,
"Norah Jones" => 3,
"Phoenix" => 5,
"Slightly Stoopid" => 1
)
);
We then load this data into the Data class:
$data = new \stojg\recommend\Data($artistRatings);
If we want to find artists that Blair might like, we call the recommend method:
$recommendations = $data->recommend('Blair', new \stojg\recommend\strategy\Manhattan());
var_export($recommendations);
The result of that computation would be:
array (
0 => array (
'key' => 'Norah Jones',
'value' => 4,
),
1 => array (
'key' => 'The Strokes',
'value' => 2,
)
)
This means that Blair might like Norah Jones and not like The Strokes.
The recommender works by finding the user in $artistRatings whose ratings are most similar to Blair's. In this case that turns out to be Abe, so it then looks for artists that Abe has rated but Blair has not and returns them as a list of recommendations.
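The steps above can be sketched in plain PHP. This is only an illustration of the idea, not the library's own code: the function names `manhattanDistance` and `nearestNeighbour` are hypothetical, and the sketch assumes the distance is the sum of absolute rating differences over the artists both users have rated. It needs PHP 7.3 or later for `array_key_first`.

```php
<?php
// Illustrative sketch only; the library's internals may differ.
// Same ratings as the $artistRatings example above.
$ratings = [
    'Abe' => [
        'Blues Traveler' => 3, 'Broken Bells' => 2, 'Norah Jones' => 4,
        'Phoenix' => 5, 'Slightly Stoopid' => 1, 'The Strokes' => 2,
        'Vampire Weekend' => 2,
    ],
    'Blair' => [
        'Blues Traveler' => 2, 'Broken Bells' => 3, 'Deadmau5' => 4,
        'Phoenix' => 2, 'Slightly Stoopid' => 3, 'Vampire Weekend' => 3,
    ],
    'Clair' => [
        'Blues Traveler' => 5, 'Broken Bells' => 1, 'Deadmau5' => 1,
        'Norah Jones' => 3, 'Phoenix' => 5, 'Slightly Stoopid' => 1,
    ],
];

/** Sum of absolute rating differences over the artists both users rated. */
function manhattanDistance(array $a, array $b): int
{
    $distance = 0;
    foreach (array_intersect_key($a, $b) as $artist => $rating) {
        $distance += abs($rating - $b[$artist]);
    }
    return $distance;
}

/** Name of the user with the smallest distance to $user. */
function nearestNeighbour(array $ratings, string $user): string
{
    $distances = [];
    foreach ($ratings as $other => $otherRatings) {
        // Skip the user themselves and anyone with no artists in common.
        if ($other === $user || !array_intersect_key($ratings[$user], $otherRatings)) {
            continue;
        }
        $distances[$other] = manhattanDistance($ratings[$user], $otherRatings);
    }
    asort($distances); // smallest distance first, keys preserved
    return array_key_first($distances);
}

$neighbour = nearestNeighbour($ratings, 'Blair');

// Artists the neighbour has rated but Blair has not, best rated first.
$candidates = array_diff_key($ratings[$neighbour], $ratings['Blair']);
arsort($candidates);

var_export($neighbour);  // 'Abe'
var_export($candidates); // Norah Jones => 4, The Strokes => 2
```

With this data Blair's distance to Abe is 8 and to Clair is 13, which is why Abe's unrated-by-Blair artists end up as the recommendations.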
How the 'nearest neighbour' is found depends on which strategy is chosen and on how big and dense the dataset is.
As a general rule, the bigger the dataset, the better. It has to be formatted as an array in the following format:
array(
'uniqueID' => array(
'objectID' => (int)'rating'
)
);
Where, in the case of the previous artist rating example:
* uniqueID = a user, e.g. Blair
* objectID = a music artist
* rating = a numeric value
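If the ratings live in a flat table rather than a nested array, a small loop is enough to reshape them. The sketch below is an assumption about how such data might look: the `$rows` array and its `user`, `artist` and `rating` keys are hypothetical and only stand in for whatever your own storage returns.

```php
<?php
// Hypothetical flat rows, e.g. as fetched from a ratings table.
$rows = [
    ['user' => 'Abe',   'artist' => 'Phoenix',     'rating' => '5'],
    ['user' => 'Abe',   'artist' => 'Norah Jones', 'rating' => '4'],
    ['user' => 'Blair', 'artist' => 'Deadmau5',    'rating' => '4'],
];

$dataset = [];
foreach ($rows as $row) {
    // uniqueID => objectID => integer rating
    $dataset[$row['user']][$row['artist']] = (int) $row['rating'];
}

var_export($dataset);
```

The resulting `$dataset` has the nested shape described above and could then be passed to the Data class.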
There are currently three strategies; which one to pick depends on how the data is organized and populated.
If the data is dense (almost all objectIDs in the full data set have a non-null rating) and the magnitude (rating) of the attribute values is important, this is a good strategy.
For example, all users have rated all music artists and they all agree on the same scale.
Use this strategy if the data is dense but the ratings are subject to grade inflation.
For example, if user A has rated all artists between 2 and 4 and user B has rated artists between 4 and 5, this strategy tries to compensate for the fact that user A's rating of 2 is equivalent to user B's 4.
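One common way to compensate for grade inflation is the Pearson correlation coefficient, which compares how two users' ratings move together rather than their absolute values. That this strategy is implemented exactly this way is an assumption; the function name `pearson` and the sample ratings below are made up for illustration.

```php
<?php
// Illustrative sketch only; not taken from the library.

/** Pearson correlation over the items both users have rated (-1.0 to 1.0). */
function pearson(array $a, array $b): float
{
    $common = array_keys(array_intersect_key($a, $b));
    $n = count($common);
    if ($n === 0) {
        return 0.0; // nothing to compare
    }

    $sumA = $sumB = $sumAB = $sumA2 = $sumB2 = 0;
    foreach ($common as $item) {
        $x = $a[$item];
        $y = $b[$item];
        $sumA  += $x;
        $sumB  += $y;
        $sumAB += $x * $y;
        $sumA2 += $x * $x;
        $sumB2 += $y * $y;
    }

    $numerator   = $sumAB - ($sumA * $sumB) / $n;
    $denominator = sqrt($sumA2 - ($sumA ** 2) / $n) * sqrt($sumB2 - ($sumB ** 2) / $n);

    // A user who gave every item the same rating has no variance to correlate.
    return $denominator == 0.0 ? 0.0 : $numerator / $denominator;
}

// User A rates between 2 and 4, user B between 4 and 5,
// but they rank the artists the same way.
$userA = ['Phoenix' => 2, 'Deadmau5' => 4, 'Norah Jones' => 2, 'The Strokes' => 4];
$userB = ['Phoenix' => 4, 'Deadmau5' => 5, 'Norah Jones' => 4, 'The Strokes' => 5];
// User C has the opposite taste.
$userC = ['Phoenix' => 5, 'Deadmau5' => 4, 'Norah Jones' => 5, 'The Strokes' => 4];

$agree    = pearson($userA, $userB); // 1.0: same taste, different scale
$disagree = pearson($userA, $userC); // -1.0: opposite taste
```

Here A and B never give an artist the same rating, yet the correlation treats them as being in full agreement, which is the behaviour the grade-inflation case calls for.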
This is the strategy to pick if the data is sparse.
For example, if there is a list of ten thousand artists, it is quite likely that each user has only listened to and rated a few of them.
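A measure that is often used for sparse data is cosine similarity, where an unrated item simply counts as zero and therefore does not contribute to the score. Whether the library computes it exactly like this is an assumption; `cosineSimilarity` and the sample users are hypothetical. The arrow functions need PHP 7.4 or later.

```php
<?php
// Illustrative sketch only; not taken from the library.

/** Cosine of the angle between two rating vectors; missing ratings count as 0. */
function cosineSimilarity(array $a, array $b): float
{
    // Only items rated by both users contribute to the dot product.
    $dot = 0;
    foreach (array_intersect_key($a, $b) as $item => $rating) {
        $dot += $rating * $b[$item];
    }

    // Vector lengths use every rating each user has given.
    $normA = sqrt(array_sum(array_map(fn($v) => $v * $v, $a)));
    $normB = sqrt(array_sum(array_map(fn($v) => $v * $v, $b)));

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // a user without ratings is similar to nobody
    }
    return $dot / ($normA * $normB);
}

// Each user has rated only a couple of artists out of a large catalogue.
$dana = ['Phoenix' => 3, 'Norah Jones' => 4];
$eli  = ['Phoenix' => 4, 'Deadmau5' => 3];
$finn = ['The Strokes' => 5];

$overlap   = cosineSimilarity($dana, $eli);  // 0.48: one shared artist
$noOverlap = cosineSimilarity($dana, $finn); // 0.0: nothing in common
```

The thousands of artists that neither user has rated never enter the calculation, so the cost depends only on the handful of ratings each user actually has.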
A helper class is provided for recommending articles that are similar to another article. The implementation is quite naive, but it should give you a hint on how to extend this library with your own datasets.
$articleData = new \stojg\recommend\ArticleData();
$allArticles = getFromDatabase();
foreach($allArticles as $article) {
$articleData->push($article->id, $article->content);
}
$recommendedArticle = $articleData->recommend($articleID = 4);