Designing Leaderboard Scores in DDD - microservices

There are two entities in this problem:
Leaderboard - holds info such as the type (lowest first/highest first), description, name, etc.
Score - a score value submitted by a player, which holds the player details along with the value.
Use cases:
We need to fetch the top 10 scores.
For a monthly leaderboard, we need to find the top 3.
Domain Rules:
A player can submit any number of scores
The leaderboard ranking must be based on the type defined on the leaderboard (lowest/highest)
For such a system, where
Leaderboard and Score have a one-to-many relationship, and
Score needs the player information (Player is a separate aggregate root in a different Bounded Context),
how should it be designed in DDD?
Scenario 1:
Will Leaderboard be the aggregate root, with every score added through it?
Queries:
Here, a score has no meaning without a leaderboard, yet no domain rule insists that a score be added via the Leaderboard aggregate root. This is in fact a dilemma; how should it be handled?
How do I get the player details to feed into the score? Do I need to fetch them in a domain service and pass them to the Leaderboard aggregate root while adding the score?
Scenario 2:
Leaderboard and LeaderboardScore are two different Aggregate roots.
Queries:
While calculating ranks, do we fetch the scores from the Score aggregate and the type info from the Leaderboard, and fulfil the use case from there?
Should most of the code serving this use case live in a domain service or an application service?

I would approach it with score and leaderboard being their own aggregates. Score changes publish domain events which get fed (asynchronously, since eventual consistency is probably OK) to update the leaderboard.
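For illustration, a minimal sketch of that shape in Python (names like RankingType and ScoreSubmitted, and the eventing mechanics, are my own assumptions, not prescribed by the question):

```python
# Two separate aggregates; the Leaderboard consumes ScoreSubmitted events
# asynchronously, so eventual consistency between them is acceptable.
from dataclasses import dataclass, field
from enum import Enum


class RankingType(Enum):
    HIGHEST_FIRST = "highest_first"
    LOWEST_FIRST = "lowest_first"


@dataclass(frozen=True)
class ScoreSubmitted:  # domain event
    leaderboard_id: str
    player_id: str
    value: int


@dataclass
class Score:  # aggregate root #1
    leaderboard_id: str
    player_id: str
    value: int

    def submit(self) -> ScoreSubmitted:
        # Invariants local to a single score would be enforced here.
        return ScoreSubmitted(self.leaderboard_id, self.player_id, self.value)


@dataclass
class Leaderboard:  # aggregate root #2
    leaderboard_id: str
    ranking_type: RankingType
    entries: list = field(default_factory=list)  # (player_id, value) pairs

    def apply(self, event: ScoreSubmitted) -> None:
        # Invoked by an asynchronous event handler.
        self.entries.append((event.player_id, event.value))
        reverse = self.ranking_type is RankingType.HIGHEST_FIRST
        self.entries.sort(key=lambda e: e[1], reverse=reverse)

    def top(self, n: int):
        return self.entries[:n]
```

With this split, the "top 10" and "monthly top 3" use cases become simple reads of the Leaderboard aggregate, and player details can be carried on the event rather than fetched across the bounded context boundary.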

Related

Displaying the most relevant advertisements to web users based on their visited pages

I have been asked to develop a pop-up ad display system for a website. The website records the URLs that users visit and displays the most relevant pop-up ad for each of them.
The website administrator first needs to define some groups (e.g. "Golfers", "Video game players") and then define some rules such as:
If the user visits the url pattern http://www.domain.com/golf-clubs/* and stays on that page for more than 10 seconds, he will be assigned to the Golfers group.
Also, the website administrator can create ads and assign them to different groups. For example, he can create a golf club promotion ad for users with the Golfers group. When the user visits the website again, the system will check if he belongs to any group(s) and display the most relevant ad for him.
For the user identification part, I am going to simply use cookies, i.e. assign a unique cookie to every new website visitor.
The difficult part for me is designing the logic for which pop-up ad to display when the user belongs to multiple groups, for example both the Golfers and Video game players groups. Rather than randomly choosing one to display, is there a better way to handle such a situation?
I have come up with a solution that I'm not sure is good: when a user is assigned to a group, the assignment also comes with a score for that group. For example, if a user belongs to both the Golfers and Video game players groups but has a higher score for the Golfers group, the system will display a Golfers ad for him as the first priority.
But this creates another difficult problem: how should the group scores be calculated for each user? I also need to account for the fact that recent page visits matter more. For example, maybe the user was a golfer and belongs to the Golfers group with a very high score, but he recently visits a lot of video game pages and gets assigned to the Video game players group; how high should his score in that group be?
Any thoughts would be appreciated.
Your problem is really close to some operating-system problems, for example deciding what to keep in a cache and what to evict. Both the number and the recency of visits influence the decision, and of course there are plenty of policies to choose from.
Here I construct one to show how they work. To keep it simple and manageable, I use a weight w for recency and a weight v per visit. For each category, keep the number of visits n and the relative time t_i of each visit (the time elapsed since it happened). The score is then the sum, over the non-expired visits, of the recency weight divided by the relative time, plus the number of visits multiplied by its weight: score = sum_i(w / t_i) + n * v.
A larger t_i leads to a smaller score, while a larger n (more visits) improves it.
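A small sketch of that policy, assuming t is measured in seconds since each visit and visits beyond an expiry horizon are dropped (the weight values and horizon are illustrative):

```python
# Recency-weighted category score: score = sum_i(w / t_i) + n * v
import time

W_TIME = 100.0                   # recency weight (w)
W_VISIT = 1.0                    # weight per visit (v)
EXPIRY_SECONDS = 30 * 24 * 3600  # drop visits older than ~30 days


def category_score(visit_timestamps, now=None):
    """visit_timestamps: epoch seconds of each visit to this category."""
    now = now or time.time()
    ages = [now - ts for ts in visit_timestamps]
    recent = [t for t in ages if 0 < t < EXPIRY_SECONDS]  # skip expired
    return sum(W_TIME / t for t in recent) + len(recent) * W_VISIT
```

Computing this per group and showing the ad for the highest-scoring group naturally favors recent activity (small t_i) without discarding long-term interest (large n).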

Scoring categories from web logs

I am building a scorer that gives each user an individual score per category on a website.
Input : userid, category
Output : user id, score_cat_1, score_cat_2 etc...
Scores are given out of 10.
My plan is to first count, for each user, the number of clicks in each category; then divide the results into quantiles (maybe a thousand); and finally run a clustering algorithm on each category's quantiles to cluster them into 10 clusters, which will be ordered and mapped to the rating.
The idea is to group quantiles that are close together into the same cluster, giving a more interesting score than simply saying "the best 10% of clickers get a 10, the next 10% get a 9", etc.
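A rough sketch of that pipeline, assuming scikit-learn's KMeans is available (the library choice and parameters are my assumptions, following the plan above):

```python
# Per-user click counts -> quantiles -> 10 ordered clusters -> scores 1..10.
import numpy as np
from sklearn.cluster import KMeans


def score_category(click_counts, n_quantiles=1000, n_scores=10):
    """click_counts: per-user click counts for one category."""
    counts = np.asarray(click_counts, dtype=float)
    # Quantile values of the click distribution.
    qs = np.quantile(counts, np.linspace(0, 1, n_quantiles))
    # Cluster the (1-D) quantile values so nearby quantiles share a score.
    km = KMeans(n_clusters=n_scores, n_init=10).fit(qs.reshape(-1, 1))
    # Order clusters by their centers so the lowest cluster maps to score 1.
    order = np.argsort(km.cluster_centers_.ravel())
    rank_of = {int(label): rank + 1 for rank, label in enumerate(order)}
    labels = km.predict(counts.reshape(-1, 1))
    return np.array([rank_of[int(l)] for l in labels])
```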
My problems are the following:
1. Do you think this is a good idea? Is there a more natural and accurate way to do it?
2. The clusters may be too small, and I can't guarantee the cardinality of each cluster.

How to manage multiple positive implicit feedbacks?

When there are no ratings, a common scenario is to use implicit feedback (items bought, pageviews, clicks, ...) to suggest recommendations. I'm using a model-based approach and I'm wondering how to deal with multiple identical feedbacks.
As an example, let's imagine that consumers buy items more than once. Should I treat the feedback count (pageviews, items bought, ...) as a rating, or compute a custom value?
To model implicit feedback, we usually have a mapping procedure that maps implicit user feedback into explicit ratings. I guess that in most domains, repeated user action on the same item indicates that the user's preference for the item is increasing.
This is certainly true if the domain is music or video recommendation. On a shopping site, such behavior might instead indicate the item is consumed periodically, e.g., diapers or printer ink.
One way I am aware of to model such repeated implicit feedback is to create a numeric rating mapping function. As the number of repetitions k of the implicit feedback increases, the mapped rating should increase. At k = 1 you have the minimal rating for positive feedback, for example 0.6; as k increases, it approaches 1. Of course, you don't need to map into [0, 1]; you can use integer ratings 0, 1, 2, 3, 4, 5.
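For instance, a minimal sketch of one such saturating mapping (the base rating and growth rate are illustrative assumptions):

```python
# Map feedback count k >= 1 to a rating that starts at ~0.6 and -> 1.
import math


def implicit_rating(k, base=0.6, rate=0.5):
    """implicit_rating(1) == base; the rating approaches 1 as k grows."""
    return 1.0 - (1.0 - base) * math.exp(-rate * (k - 1))
```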
To give you a concrete example of such a mapping, here is what was done in a music recommendation domain. In short, they used per-user item statistics to define the mapping function:
We assume that the more times the user has listened to an artist, the more the user likes that particular artist. Note that a user's listening habits usually follow a power-law distribution, meaning that a few artists have lots of plays in the user's profile while the rest have significantly smaller play counts. Therefore, we compute the complementary cumulative distribution of artist plays in the user's profile. Artists located in the top 80-100% of the distribution are assigned a score of 5, while artists in the 60-80% range are assigned a score of 4.
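A simplified sketch of that idea, reading the quote as percentile bucketing of each user's artists by play count (the exact statistic used in the paper may differ):

```python
# Bucket each artist by percentile of plays within one user's profile:
# top quintile -> 5, next -> 4, ..., bottom -> 1.
import numpy as np


def playcount_ratings(play_counts):
    """play_counts: dict artist -> plays for one user; returns artist -> 1..5."""
    artists = list(play_counts)
    counts = np.array([play_counts[a] for a in artists], dtype=float)
    # Percentile rank of each artist within this profile, in [0, 1].
    ranks = counts.argsort().argsort() / max(len(counts) - 1, 1)
    scores = np.minimum((ranks * 5).astype(int) + 1, 5)
    return dict(zip(artists, scores.tolist()))
```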
Another way I have seen in the literature is to create a second variable besides the binary rating variable, called a confidence level. See here for details.
Probably not that helpful for OP any longer, but it might be for others in the same boat.
Evaluating Various Implicit Factors in E-commerce
Modelling User Preferences from Implicit Preference Indicators via Compensational Aggregations
If anyone knows more papers/methods, please share as I'm currently looking for state of the art approaches to this problem. Thanks in advance.
You typically use a sum of clicks, or some weighted sum of events, as a "score" for each user-item pair in implicit-feedback systems. It's not a rating, and that's more than a semantic distinction: you won't get good results if you feed these values into a process that expects rating-like values and tries to minimize a squared-error loss.
You treat 3 clicks as adding 3 times the value of 1 click to the user-item interaction strength. Other events, like a purchase, might be weighted much more heavily than a click, but in the end each event still adds to the sum.
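A minimal sketch of such a weighted event sum (the event types and weights are illustrative):

```python
# Accumulate per-(user, item) interaction strength from raw events.
from collections import defaultdict

EVENT_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "purchase": 10.0}


def interaction_strengths(events):
    """events: iterable of (user_id, item_id, event_type) tuples."""
    strength = defaultdict(float)
    for user, item, kind in events:
        strength[(user, item)] += EVENT_WEIGHTS.get(kind, 0.0)
    return strength  # feed into an implicit-feedback model, e.g. ALS
```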

Parse.com: get rank within leaderboard

My question is the same as the one asked here: https://parse.com/questions/issue-with-using-countobjects-on-class-with-large-number-of-objects-millions, a nine-month-old thread that unfortunately never got a real answer.
I have a parse backend for a mobile game, and I want to query the global rank of the user based on his score.
How can I query this, assuming I have more than 1000 users, and a lot more score entries?
Also, I want to be able to query the all-time rank as well as the last-24-hours rank.
Is this even possible with parse?
If you need to do large queries like that, you should redesign your solution. Keep a leaderboard class that gets updated with cloud code; either in an afterSave hook, or with a scheduled job that runs regularly. Your primary focus must be on lightning fast lookups. As soon as your lookups start getting large or calculation heavy, you should redesign your backend for scalability.
A solution for both all-time and time-based leaderboards:
Keep every user's score in one entity, say LeaderBoard, and maintain the list of top users for each leaderboard (all-time or time-based) in a separate entity, say TopScoreUsers. That way you do not need to query the full LeaderBoard entity every time you want the top users.
Suppose your app/game has an all-time leaderboard that shows only the top 100 users. When a user submits a score, first check whether it falls within the current top 100 (from TopScoreUsers). If it does, add the user to TopScoreUsers (if that list already holds 100 users, find the current user's place in it and remove the last record) and then update the user's own score in LeaderBoard; otherwise, only update the user's score in LeaderBoard.
The same approach applies to time-based leaderboards.
For a user's rank, the recommended way is to use a calculation or formula that gives an approximate rank; it is hard to find and maintain a user's exact rank when there are millions of users.
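A sketch of that top-100 maintenance logic in plain Python rather than Parse cloud code (the entity names follow the answer and are illustrative, not Parse API calls):

```python
# Keep a small, always-sorted top-N list alongside the full score table.
TOP_N = 100


def submit_score(player_id, value, leaderboard, top_scores):
    """leaderboard: dict player_id -> best score (higher is better).
    top_scores: sorted list of (value, player_id), best first."""
    best = max(value, leaderboard.get(player_id, value))
    leaderboard[player_id] = best  # always record the score

    # Does this score belong in the cached top-N list?
    if len(top_scores) < TOP_N or best > top_scores[-1][0]:
        # Drop any stale entry for this player, insert, re-sort, truncate.
        top_scores[:] = [e for e in top_scores if e[1] != player_id]
        top_scores.append((best, player_id))
        top_scores.sort(key=lambda e: e[0], reverse=True)
        del top_scores[TOP_N:]
```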
Hopefully this will help you.

Ranking/ weighing search result

I am trying to build an application with a smart, adaptive search engine (say, for cars). If I search for 4x4, the DB will return all the 4x4 cars I have (100 cars); but as time goes by and I start checking out cars, liking them, commenting on them, etc., the order of the search results should change. That means that one month later, when searching for 4x4, I should get the same result set ordered differently according to my previous interaction with the site. If I was mainly liking and commenting on German cars, a BMW should be at the top and a Land Cruiser further down.
This ranking should be based on attributes that I capture during user interaction (e.g. car origin, user age, user location, car type [4x4, coupe, hatchback], price range). So for each car in the results, I will weigh it based on how well it performs on the five attributes above.
I intend to use the DB just as a repository and do the ranking and the thinking on the server. My question is: what kind of algorithm should I use to weigh/rank my search results?
Thanks.
You're basically saying that you already have several ordering schemes:
Keyword search result
amount of likes for car's category
likely others, such as popularity, some form of date, etc.
What you do then is make up a new scheme, call it relevance:
relevance = W1 * keyword_score + W2 * likes_score + ...
and sort by relevance. Experiment with the weights W1, W2, ..., until you get something you find useful.
From my understanding, search engines work on this principle. It has long been said that Google has on the order of 200 different inputs into its relevance score, PageRank being just one. The beauty of this approach is that it lets you fine-tune the importance of everything (even individually for every query), and it lets you add additional inputs without breaking everything.
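A minimal sketch of that linear relevance scheme (the scheme names and weight values are illustrative placeholders to be tuned):

```python
# relevance = W1 * keyword_score + W2 * likes_score + ...
WEIGHTS = {"keyword": 1.0, "likes": 0.5, "popularity": 0.3, "recency": 0.2}


def relevance(scores):
    """scores: dict mapping each ordering scheme to its normalized score."""
    return sum(w * scores.get(name, 0.0) for name, w in WEIGHTS.items())


def rank(results):
    """Sort search results (each with a 'scores' dict) by relevance."""
    return sorted(results, key=lambda r: relevance(r["scores"]), reverse=True)
```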
