Displaying the most relevant advertisements to web users based on their visited pages - algorithm

I am asked to develop a pop-up ad display system for a website. The website will record the URLs that users visit and display the most relevant pop-up ad for them.
The website administrator first needs to define some groups (e.g. "Golfers", "Video game players") and then define some rules such as:
If the user visits the URL pattern http://www.domain.com/golf-clubs/* and stays on that page for more than 10 seconds, he will be assigned to the Golfers group.
Also, the website administrator can create ads and assign them to different groups. For example, he can create a golf club promotion ad for users in the Golfers group. When the user visits the website again, the system will check whether he belongs to any group(s) and display the most relevant ad for him.
For the user identification part, I am going to simply use cookies, that is, assign a unique cookie to every new website visitor.
The difficult part for me is designing the logic that decides which pop-up ad to display when the user belongs to multiple groups, for example both the Golfers and Video game players groups. Rather than randomly choosing one to display, is there a better way to handle such a situation?
I have come up with a solution, but I don't know whether it's any good: when a user is assigned to a group, he also gets a score for that group. For example, if a user belongs to both the Golfers and Video game players groups but has a higher score for the Golfers group, the system will display a Golfers group ad for him as the first priority.
But this creates another difficult problem: how should the group scores be calculated for each user? I also need to account for recent page visits being more important. For example, maybe the user was a golfer and belongs to the Golfers group with a very high score, but he has recently visited a lot of video game web pages and been assigned to the Video game players group. How high a score should he get in this group?
Any thoughts would be appreciated.

Your problem is really close to some operating system problems, for example deciding what to keep in a cache and what to evict. Both the number and the time of visits influence the decision, and of course there are plenty of policies to choose from.
Here I try to make one in order to show how they work. I want to keep it simple and manageable, so I use a weight w for time and a weight v for the number of visits. For each category, keep the number of visits n and the relative time t of each visit (how long ago it happened). Then the score is the sum of w/t over all visits that have not expired, plus the number of visits multiplied by its weight: sum(w/t) + n*v.
A larger t leads to a smaller score, while a larger n (number of visits) improves the score.
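As a rough illustration of the w/t + n*v idea, here is a minimal Python sketch. The function names, the default weights, the 30-day expiry and the choice to count only non-expired visits in n are all assumptions made for the example, not part of the answer above.

```python
def group_score(visit_times, now, w=3600.0, v=1.0, max_age=30 * 24 * 3600.0):
    """Score one group for one user from the timestamps (in seconds) of his visits.

    w, v and max_age are hypothetical tuning values; visits older than max_age
    are treated as expired and ignored entirely (an assumption).
    """
    # Relative time t of each visit; clamp to 1 second to avoid dividing by zero.
    ages = [max(now - t, 1.0) for t in visit_times]
    live = [age for age in ages if age <= max_age]
    # Sum of w/t over live visits, plus n*v with n = number of live visits.
    return sum(w / age for age in live) + v * len(live)


def pick_group(visits_by_group, now):
    """Return the name of the group with the highest score."""
    scores = {group: group_score(times, now) for group, times in visits_by_group.items()}
    return max(scores, key=scores.get)
```

With these defaults, five golf visits from 20 days ago score lower than three video game visits from the last half hour, so the recent interest wins; raising v relative to w shifts the balance back toward the sheer number of visits.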

Related

algorithm: match a user available to join the game to the closest user to him

My project matches two strangers. I have a database containing the status of each user, and when a user becomes available for a match, I match him with another stranger.
I added a feature: if users share their location, the user asking for a match is matched with the closest user to him.
In practice, even though I have more than 600 active users per minute, a user who becomes available is matched with the user closest to him, but since there are no other preferences the queue never holds more than 2 users, so the "closest" user is simply the first one who is still available and unmatched.
Example: I join the game (I am available) and wait for another user. Another user joins the game and he is the closest to me, but only because he is the only one.
I would like to make it more realistic. I was thinking, for example, of matching users only if the distance is less than 200 km, but the problem remains that the user is matched with the first user inside the 200 km range, when one second later other, closer users might have become available.
Example: I join the game, and other users join but they are not within 200 km. Finally a user within 200 km joins and is matched with me, but a user arriving one second later could have been much nearer to me.
How could I make it more realistic? I'm looking for ideas for a better algorithm.
I would introduce some parameters:
match quality: in your case this just seems to be the distance between the two users, but you could also make it depend on the waiting time, e.g. by multiplying it by a function of the elapsed seconds, so that the longer you have to wait, the more acceptable longer distances become (with the significant disadvantage of having to recalculate distances constantly)
quality threshold: if the match quality between two users is below this threshold, they cannot be matched
maximum waiting time: prevents users from waiting too long; if the match quality improves over time, this parameter could be left out (it would be implicitly defined by the maximal distance, the quality formula and the threshold), but the behavior of the queue is more transparent with it
minimum waiting time: allows the queue to fill up (only for the user to be matched, not for the one he is matched with)
queue threshold: if the number of users in the queue is above this threshold, match the first user in the queue immediately, ignoring the minimum waiting time
optional: a second, higher quality threshold above which users are matched immediately to reduce waiting times a little; every time you add a user to the queue, you could calculate the match quality with the first one in the queue and match immediately if it is above the threshold
If your goal is to make it fair/balanced for all the users, then it is probably best to always match the first user in the queue with his best match (once he meets the matching conditions). The disadvantage is that someone from a remote area might block the queue for the "maximum waiting time". But this seems to be the most feasible way, because users are guaranteed to be matched and do not wait forever (which could happen if you always tried to find the best pairings in the queue, not just the best match for the first user in the queue).
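The parameters above could be combined roughly as in the following Python sketch. It is a simplified reading, not a full implementation: match quality is plain distance, the quality threshold is a maximum distance in km, and once the maximum waiting time has passed the head of the queue is matched with his nearest candidate regardless of distance. All names and default values are made up for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Waiting:
    user_id: str
    lat: float
    lon: float
    joined_at: float  # seconds


def distance_km(a, b):
    """Great-circle distance between two waiting users (haversine formula)."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dlat = p2 - p1
    dlon = math.radians(b.lon - a.lon)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def try_match_head(queue, now, min_wait=5.0, max_wait=60.0, max_km=200.0, queue_threshold=50):
    """Try to match the first user in the queue; return (head, partner) or None.

    On success both users are removed from the queue (a list ordered by arrival).
    """
    if len(queue) < 2:
        return None
    head = queue[0]
    waited = now - head.joined_at
    # Minimum waiting time lets the queue fill up, unless the queue is already long.
    if waited < min_wait and len(queue) <= queue_threshold:
        return None
    partner = min(queue[1:], key=lambda u: distance_km(head, u))
    # Too far away: keep waiting until the maximum waiting time is reached.
    if distance_km(head, partner) > max_km and waited < max_wait:
        return None
    queue.remove(head)
    queue.remove(partner)
    return head, partner
```

Calling try_match_head periodically (say once per second) and after every new arrival gives the head of the queue a few seconds to collect candidates before the nearest one is picked.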

How to manage multiple positive implicit feedbacks?

When there are no ratings, a common approach is to use implicit feedback (items bought, pageviews, clicks, ...) to generate recommendations. I'm using a model-based approach and I am wondering how to deal with multiple identical feedback events.
As an example, let's imagine that consumers buy items more than once. Should I treat the number of feedback events (pageviews, items bought, ...) as a rating, or compute a custom value?
To model implicit feedback, we usually have a mapping procedure to map implicit user feedback into the explicit ratings. I guess in most domains, repeated user action against the same item indicates that the user's preference over the item is increasing.
This is certainly true if the domain is music or video recommendation. In a shopping site, such a behavior might indicate the item is consumed periodically, e.g., diapers or printer ink.
One way I am aware of to model this multiple implicit feedback is to create a numeric rating mapping function. When the number of times (k) of implicit feedback increases, the mapped value of rating should increase. At k = 1, you have a minimal rating of positive feedback, for example 0.6; when k increases, it approaches 1. For sure, you don't need to map to [0,1]; you can have integer ratings, 0,1,2,3,4,5.
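One possible shape for such a mapping function, as a small Python sketch: it starts at 0.6 for k = 1 and approaches 1 as k grows. The geometric form and the decay value are assumptions chosen for illustration; any increasing, saturating function would fit the description.

```python
def implicit_rating(k, min_rating=0.6, decay=0.5):
    """Map k repeated implicit feedback events on one item to a rating in [0.6, 1).

    decay is a hypothetical parameter: each extra event closes this fraction
    of the remaining gap between the current rating and 1.
    """
    if k < 1:
        return 0.0  # no positive feedback observed
    return min_rating + (1.0 - min_rating) * (1.0 - decay ** (k - 1))
```

With these defaults k = 1, 2, 3 map to roughly 0.6, 0.8 and 0.9.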
To give you a concrete example of the mapping, here is what they did in a music recommendation domain. In short, they used each user's per-item statistics to define the mapping function.
We assume that the more times the user has listened to an artist the more the user likes that particular artist. Note that user’s listening habits usually present a power law distribution, meaning that a few artists have lots of plays in the users profile, while the rest of the artists have significantly less play counts. Therefore, we compute the complementary cumulative distribution of artist plays in the users’ profile. Artists located in the top 80-100% of the distribution are assigned a score of 5, while artists in the 60-80% range assign a score of 4.
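A simplified Python sketch of this kind of per-user binning is shown below. It ranks each artist by the share of artists in the same profile that have strictly fewer plays and cuts that into five equal bands. This percentile-rank reading and the extension down to scores 3, 2 and 1 are assumptions; the quoted passage only spells out the top two bands.

```python
import bisect


def play_count_scores(plays):
    """Map one user's {artist: play_count} profile to {artist: score in 1..5}."""
    n = len(plays)
    counts = sorted(plays.values())
    scores = {}
    for artist, count in plays.items():
        # Number of artists in this profile with strictly fewer plays.
        below = bisect.bisect_left(counts, count)
        # Five equal bands: the top 20% of artists get 5, the next 20% get 4, ...
        scores[artist] = 1 + (below * 5) // n
    return scores
```

Because the bands are computed per user, a heavy listener and a casual listener both end up with a spread of scores from 1 to 5.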
Another way I have seen in the literature is to create another variable besides a binary rating variable. They call it confidence levels. See here for details.
Probably not that helpful for OP any longer, but it might be for others in the same boat.
Evaluating Various Implicit Factors in E-commerce
Modelling User Preferences from Implicit Preference Indicators via Compensational Aggregations
If anyone knows more papers/methods, please share as I'm currently looking for state of the art approaches to this problem. Thanks in advance.
You typically use a sum of clicks, or some weighted sum of events, as a "score" for each user-item pair in implicit feedback systems. It's not a rating, and that's more than a semantic distinction. You won't get good results if you feed these values into a process that expects rating-like values and tries to minimize a squared-error loss.
You treat 3 clicks as adding 3 times the value of 1 click to the user-item interaction strength. Other events, like a purchase, might be weighted much more highly than a click. But in the end it also adds to a sum.
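A minimal Python sketch of that weighted sum follows; the event names and weights are made-up values for illustration only.

```python
from collections import defaultdict

# Hypothetical weights: a purchase counts as much as ten clicks.
EVENT_WEIGHTS = {"click": 1.0, "purchase": 10.0}


def interaction_strengths(events):
    """Sum weighted events into one strength per (user, item) pair.

    events is an iterable of (user, item, event_type) tuples.
    """
    strengths = defaultdict(float)
    for user, item, event_type in events:
        strengths[(user, item)] += EVENT_WEIGHTS[event_type]
    return dict(strengths)
```

The resulting values are interaction strengths, not ratings, so they are better suited to a model designed for implicit feedback than to one that minimizes squared error against ratings.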

Parse.com: get rank within leaderboard

Actually my question is the same as the one stated here https://parse.com/questions/issue-with-using-countobjects-on-class-with-large-number-of-objects-millions which is a 9-month-old thread that unfortunately never got a real answer.
I have a Parse backend for a mobile game, and I want to query the global rank of the user based on his score.
How can I query this, assuming I have more than 1000 users, and a lot more score entries?
Also, I want to be able to query the all-time rank as well as the last-24-hours rank.
Is this even possible with Parse?
If you need to do large queries like that, you should redesign your solution. Keep a leaderboard class that gets updated with cloud code; either in an afterSave hook, or with a scheduled job that runs regularly. Your primary focus must be on lightning fast lookups. As soon as your lookups start getting large or calculation heavy, you should redesign your backend for scalability.
A solution for both all-time and time-based leaderboards:
Keep every user's score in one entity, say LeaderBoard, and maintain the list of top users for the all-time or time-based leaderboard in a separate entity, say TopScoreUsers. This way you do not need to query the full leaderboard entity (i.e. LeaderBoard) every time you want the top users.
Suppose your app/game has an all-time leaderboard that shows only the top 100 users. When a user submits a score, first check whether the user falls within the top 100 (from TopScoreUsers). If so, add him to TopScoreUsers (if the list already has 100 users, find the current user's place in it and remove the last record) and then update the user's own score in the LeaderBoard entity; otherwise only update the user's score in LeaderBoard.
The same applies to the time-based leaderboard.
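The top-list bookkeeping can be sketched in plain Python as below. This is an in-memory illustration only, not Parse code: the list stands in for the TopScoreUsers entity, the function name is invented, and it assumes a submitted score is always the user's new best.

```python
import bisect


def submit_score(top, user, score, limit=100):
    """Update the top list in place; return True if the user is in it afterwards.

    top is a list of (-score, user) tuples kept sorted, so the best score is first.
    """
    # Drop any previous entry for this user.
    top[:] = [entry for entry in top if entry[1] != user]
    # Insert at the right position, then cut the list back to the limit.
    bisect.insort(top, (-score, user))
    del top[limit:]
    return any(name == user for _, name in top)
```

Reading the leaderboard is then just a read of this short list, whatever the total number of players.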
For the user's rank, the recommended way is to use some calculation or formula to determine an approximate rank; otherwise it is tough to find and maintain each user's actual rank when there are millions of users.
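One way to approximate a rank, sketched in Python, is to keep a count of users per score bucket and add up the buckets above the user's own. The bucket scheme, the names and the bucket size are assumptions for illustration; the answer above does not prescribe a particular formula.

```python
def rank_bounds(bucket_counts, score, bucket_size=100):
    """Return (best, worst) possible rank for a score.

    bucket_counts maps bucket index (score // bucket_size) to the number of
    users whose score falls in that bucket, including the user in question.
    """
    bucket = score // bucket_size
    # Everyone in a higher bucket is certainly ranked above this score.
    higher = sum(count for b, count in bucket_counts.items() if b > bucket)
    same = bucket_counts.get(bucket, 1)
    return higher + 1, higher + max(same, 1)
```

The counters only need an increment and a decrement when a score changes, so the lookup cost depends on the number of buckets rather than the number of users.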
Hopefully this will help you.

User activity algorithm

I am not sure if this is the right place to ask this question. Here it goes, I am looking to calculate a user activity score for a game.
Consider this:
User A wants to attack one of Users B...Z
Users B...Z can attack any of Users A...Z
I need a way to sort Users B...Z for User A, so he can choose whom to attack. We want to make sure that everyone gets to play this game, which means each user gets advertised as a potential opponent to User A, based on a score determined by:
The user B...Z has already been attacked (this user's score should drop, because we want to display someone who hasn't been attacked)
User B...Z responds to the attack (this user's score should increase, since they actually are playing the game)
What is the simplest algorithm to calculate such a score?
The simplest algorithm is a simple formula:
Rank = weight1 * TimesResponded - weight2 * TimesAttacked
In which both weights are positive constants, which you can adapt to change the gameplay a little bit.
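In Python the formula and the resulting ordering might look like this; the weight values and the shape of the stats dictionary are assumptions for the example.

```python
def activity_rank(times_responded, times_attacked, weight1=2.0, weight2=1.0):
    """Rank = weight1 * TimesResponded - weight2 * TimesAttacked."""
    return weight1 * times_responded - weight2 * times_attacked


def order_opponents(stats):
    """Sort user names by rank, best first.

    stats maps user name -> (times_responded, times_attacked).
    """
    return sorted(stats, key=lambda user: activity_rank(*stats[user]), reverse=True)
```

Raising weight2 pushes frequently attacked users further down the list, which spreads attacks more evenly across players.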

Algorithm for Rating Objects Based on Amount of Votes and 5 Star Rating

I'm creating a site whereby people can rate an object of their choice by allotting a star rating (say 5 star rating). Objects are arranged in a series of tags and categories eg. electronics>graphics cards>pci express>... or maintenance>contractor>plumber.
If another user searches for a specific category or tag, the hits must return the highest "rated" object in that category. However, the system would be flawed if 1 person only votes 5 stars for an object whilst 1000 users vote an average of 4.5 stars for another object. Obviously, logic dictates that credibility would be given to the 1000 user rated object as opposed to the object that is evaluated by 1 user even though it has a "lower" score.
Conversely, it is more reliable to trust an object with 500 user ratings and a score of 4.8 than an object with 1000 user ratings and a score of 4.5, for example.
What algorithm can achieve this weighting?
A great answer to this question is here:
http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
You can use the Bayesian average when sorting by recommendation.
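A common form of the Bayesian average pulls each object's mean toward a prior mean, with the prior counting as a fixed number of extra votes. A minimal Python sketch, where the prior mean of 3.0 and prior weight of 50 votes are arbitrary example values:

```python
def bayesian_average(avg_rating, num_votes, prior_mean=3.0, prior_weight=50):
    """Weighted mean of the observed average and a prior mean.

    prior_weight acts like that many imaginary votes at prior_mean, so objects
    with few votes stay close to the prior.
    """
    return (num_votes * avg_rating + prior_weight * prior_mean) / (num_votes + prior_weight)
```

With these values a single 5-star vote scores about 3.04, 1000 votes averaging 4.5 score about 4.43, and 500 votes averaging 4.8 score about 4.64, which matches the ordering asked for in the question.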
I'd be tempted to have a cutoff (say, fifty votes though this is obviously traffic dependent) before which you consider the item as unranked. That would significantly reduce the motivation for spam/idiot rankings (especially if each vote is tied to a user account), and also gets you a simple, quick to implement, and reasonably reliable system.
sigmoid_function(value) = 1/(1+e^(-value));
rating = sigmoid_function(number_of_voters) + sigmoid_function(average_rating);
