Recommendation Systems (quick blurb)

Recommendation Systems

Simple Preliminary Solution

Recommendation systems fall generally into 2 categories, content based and collaborative.

Content based recommendation systems look at the actual content of the objects you wish to recommend and suggests similar content. For example, if we wish to recommend movies, and we see that a user tends to enjoy action films, a content based recommendation system will recommend more action films, or films that have some intersection with the set of action films.

In this example here, one would recommend some action/comedy, and some action/horror since those categories bleed into action a little bit.
This same method of recommendation can be done using the actors, directors and other related variables.
This type of recommendation system is easiest to implement as long as there are descriptive and related ways of describing the content you wish to recommend (i.e. not every item is completely unique), however it does not broadly cover an entire space. For example, if a user can potentially enjoy something that is a completely different genre, then this system of recommendation probably would not be able to recommend it.

Collaborative recommendation systems utilize data pulled from other users to make recommendations. So, instead of simply looking at the content, we look at the behaviour of users instead. This method thus does not require the content to be similar to be recommended and instead captures what other users enjoy. Lets go back to our original movie recommendation problem.

A ___A bronx Tale
   \__Dictator
B __/

Consider the example above, where user A enjoys “A Bronx Tale”, and “The Dictator”. Note that “A Bronx Tale” is a mafia/crime/drama movie, and “The Dictator” is comedy. Now suppose user B enjoys “The Dictator”. A collaborative recommendation system would also recommend “A Bronx Tale”, since user A also enjoyed it. Note that the recommendation system did not care about the genre.
This system of recommendation would require many user interactions to be able to soundly recommend anything, especially when the set of items that need to be recommended is very large, since it depends on whether a user has interacted with it.

Taking the factors of both into consideration, its best to start with a content based recommendation when user interaction is not freely and highly available. In the case of merln, a preliminary recommendation system would take place via tags that describe each tile and graph. More on this in the next section.

Initial Solution

The initial solution is to specify numerous tags with each tile and graph that describe the content and its origin. For example, “Display Audience By Age/Gender” would have the following tags: analytics, Audience, Display, Age, Gender. The first tag, analytics describes the source of the data, the rest describe the content. Now, using a content type recommendation system, we would try to find tiles that contain as many tags as the ones specified here. This would give us tiles such as “Display Audience by Geography/Device”, “Search Audience by Age/Gender”, or perhaps even mastercharts such as “Display Impressions” and so on. The best recommendations would be based on what has the most number of similar tags. To be more concrete, suppose we have a set of tags $T_a$ , we want to find a set $T$ such that $\cap T_a|$ is maximized. In python this can be achieved with sets in an algorithm like this:

T_a = set(get_tags(tile_id))
T_best = set(get_tags(tile_id_initial))
score = len(T_a.intersection(T))
for T in tags:
	if len(T_a.intersection(T)) > score:
		T_best = T
		score = len(T_a.intersection(T))

Ideal Solution

Once we begin to extract real user data, we can begin to make smarter recommendations. We can combine both content based and collaborative methods using a method called “Graph Convolutional Matrix Completion” (GCMC). This method essentially takes a graph’s adjacency matrix, as well as some features and tries to “complete” the adjacency matrix, giving us a full graph with potential recomendations within it. As an example, consider the graph below:

0.8

0.1

0.3

0.6

0.1

In the graph above, we know nothing about the interactions between A with F and G, and similarly we know nothing about the interactions between E with B and D. We wish to predict how they would interact, i.e. we wish to find:

In both of these examples, the left side of the graphs represent users, and the right side represents items (tiles, graphs etc.)

The original graph can be described with a matrix as such:

  | B		C		D		F		G
--|-----------------------------------	
A |	0.8		0.1		0.1		0		0
E |	0		0.6		0		0.3		0.1

We wish to fill in the 0s with some values.

Note that each of these items have their own features. For example a user can have information such as their age, gender, and more, and an item can have a genre, tag and more. To make any inference to be able to accurately fill in the 0s, we must use this information. To achieve this, graph encoding and decoding is used.

Graph encoding essentially takes each node of a graph and generates a low dimensional representation of it in $Rn\mathbb{R}^n$ . Decoding takes this representation and tries to remake the original graph.

enter image description here

GCMC utilizes a special encoding and decoding method to achieve the above result. One application of this method goes as follows:

Neural network:

Graph Convolutional Network

Dense

Bilinear Mixture

Output

Here the output is the completed matrix.

There are however, few considerations for this method. One important one being that for the above method to work, we need discrete ratings (e.g. 1 to 5 or 0 to 10). In Merln however, if we don’t ask clients to rate tiles and only look at how they interact with them, we can create our own discrete ratings based on the metric we wish to measure interactions with. For example, if we measure clicks as the metric for interaction, we can organize the clicks into buckets (for example 0-10 clicks, 11-20 clicks, …) and use each bucket’s sorted index value as a rating instead (essentially, lower buckets will yield a lower rating, and higher buckets will yield a higher rating). Another thing to consider is that the side info needs to be vectorized. This becomes a challenge when considering text related values such as tags. This is tackled using any vectorization approach such as bag of words (since the set of words that we wish to use is not large, relatively speaking). Another consideration is training time, and data needed to effectively train the model. Since this is a complicated neural network, quite a bit of data is needed to train the model. This is one of the main reasons why this is not an initial solution, but a solution to consider at a large enough scale with many users.

Sources

https://arxiv.org/pdf/1706.02263.pdf
https://users.cecs.anu.edu.au/~akmenon/papers/autorec/autorec-paper.pdf