I am working on a site where there are nodes that users can vote on (upvotes/downvotes). Each node has a list of subnodes. Users can vote on each of these subnodes as well. Think of the relationship between posts and comments on reddit.
What ranking algorithms are there that will help me sort nodes based on their own score as well as the score of their subnodes? I've looked at the reddit ranking algorithm for "hot" but unfortunately I don't see how I would factor in subnode ranking.
It depends on what sorting strategy you want.
Using a stable sorting algorithm you could do it in two passes:
- sort the list by subnodes (say, by the best-ranked subnode)
- then stable-sort the list by the nodes' own scores
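The two-pass approach above relies on the second sort being stable, so ties on node score keep the subnode ordering. A minimal sketch (the `Node` structure and field names are illustrative, not from any real API):

```python
from dataclasses import dataclass, field

# Hypothetical node: its own score plus the scores of its subnodes.
@dataclass
class Node:
    score: int
    subnode_scores: list = field(default_factory=list)

def rank(nodes):
    # Pass 1: sort by best-ranked subnode (nodes without subnodes sort last).
    by_sub = sorted(nodes,
                    key=lambda n: max(n.subnode_scores, default=float("-inf")),
                    reverse=True)
    # Pass 2: stable sort by the node's own score; ties keep the pass-1 order.
    return sorted(by_sub, key=lambda n: n.score, reverse=True)

posts = [Node(5, [1, 9]), Node(5, [2]), Node(8, [3])]
print([(n.score, n.subnode_scores) for n in rank(posts)])
# → [(8, [3]), (5, [1, 9]), (5, [2])]
```

Python's `sorted` is guaranteed stable, which is what makes the two-pass trick work.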
Choosing a sorting algorithm itself is quite a different task. You could look at:
- Wikipedia for descriptions of all kinds of algorithms
- any site with algorithm implementations, say algo.pw
I was talking with my dad, he asked me:
"What algorithm can you use so your online store can choose the best warehouse to ship a product from?"
My reply was:
"Oh, that is easy: just use Dijkstra, with the monetary cost of transportation as the edge weights, maybe with time factored in, and the client's house as the starting node."
Then he asked:
"Alright, but the customer's order has a lot of different items, and many warehouses don't have all of them, so you have to choose multiple warehouses to ship the items from."
And then my brain froze.
So what algorithm would I use for that? I know I can brute-force it with Dijkstra, running many possibilities and then combining the results of the individual runs, but is there an algorithm that can calculate multiple paths that produce the best overall result?
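The brute-force idea described above can at least be sketched: run Dijkstra once from the client, then assign each item to the cheapest warehouse that stocks it. The graph, warehouses, and stock data below are made up for illustration, and this greedy per-item assignment ignores the savings from consolidating several items into one shipment, which is what makes the real problem hard:

```python
import heapq

def dijkstra(graph, start):
    """Shortest-path costs from start to every reachable node."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

graph = {  # edge weights = shipping cost between locations (made up)
    "client": [("A", 4), ("B", 1)],
    "A": [("client", 4)],
    "B": [("client", 1), ("A", 2)],
}
stock = {"A": {"tv", "phone"}, "B": {"phone"}}

cost = dijkstra(graph, "client")
order = ["tv", "phone"]
assignment = {
    item: min((w for w in stock if item in stock[w]),
              key=lambda w: cost.get(w, float("inf")))
    for item in order
}
print(assignment)  # → {'tv': 'A', 'phone': 'B'}
```

The tv can only come from warehouse A, while the phone is cheaper from B (cost 1 vs. 3 via B→A).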
A question asked by an interview panel: in a web application, users can select their favorite sports, and one user can have several. E.g. user John has Football, Soccer, and Tennis as favorites; user Alen has Baseball and Basketball.
Considering millions of users, which data structure or algorithm would you use to find all users associated with Football or Soccer?
First I answered HashMap, but the panel told me it causes memory issues. I then suggested a binary search tree, but they were not satisfied with that answer either.
Can anyone explain a good way to get all users with a given favorite sport?
The easiest solution is a HashMap mapping each user (key) to their list of sports (value), as you mentioned. It is very common to offer this solution first in an interview and see whether it satisfies the interviewer.
A better solution is to build a graph of users and sports, with a uni-directional edge from each sport node (e.g. Football) to the user nodes (e.g. foo, bar). For any query on a sport, we traverse the graph with that sport as the source node; every node reached is a user who has that sport among their favorites. This way it is space efficient.
As for time, traversing the graph per query is O(E), where E is all users in the worst case. So we can cache the results of frequent queries in a HashMap, and to keep the cache's space bounded we can make it an LRU cache.
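The sport → users graph described above is effectively an inverted index. A minimal sketch with an LRU-cached lookup (the data and names are illustrative):

```python
from collections import defaultdict
from functools import lru_cache

favorites = {  # user -> favorite sports (made-up sample data)
    "John": ["Football", "Soccer", "Tennis"],
    "Alen": ["Baseball", "Basketball"],
    "Mia": ["Soccer"],
}

# Inverted index: sport -> set of users (one "edge" per sport/user pair).
index = defaultdict(set)
for user, sports in favorites.items():
    for sport in sports:
        index[sport].add(user)

@lru_cache(maxsize=1024)  # caches frequent queries, evicting least recently used
def users_for(sport):
    return frozenset(index.get(sport, ()))

print(sorted(users_for("Soccer")))  # → ['John', 'Mia']
```

Each lookup is then O(1) on the index itself, at the cost of storing every (sport, user) pair once.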
Hope it helps!
I am working on a problem of clustering the results of keyword search on a graph. The results come in the form of trees, and I need to cluster those trees into groups based on their similarities. Every node of a tree has two keys: one is the table name in the SQL database (the semantic form), and the second is the actual values of a record of that table (the label).
I have used the Zhang–Shasha, Klein, Demaine, and RTED algorithms to find the tree edit distance between the trees based on these two keys. All of these algorithms use the number of deletion/insertion/relabel operations needed to make the trees identical.
I want some more metrics for checking the similarity between two trees, e.g. number of nodes, average fan-out, and so on, so that I can take a weighted average of these metrics to arrive at a good similarity measure that accounts for both the structure of the tree (semantic form) and the information contained in it (the labels at the nodes).
Can you suggest an approach, or some literature that could help? Any good papers would be appreciated.
Even if you had the (pseudo-)distances between each pair of possible trees, this is actually not what you're after. You actually want to do unsupervised learning (clustering) in which you combine structure learning with parameter learning. The types of data structures you want to perform inference on are trees. To postulate "some metric space" for your clustering method, you introduce something that is not really necessary. To find the proper distance measure is a very difficult problem. I'll point in different directions in the following paragraphs and hope they can help you on your way.
The following is not the only way to represent this problem... You can see your problem as Bayesian inference over all possible trees with all possible values at the tree nodes. You probably would have some prior knowledge on what kind of trees are more likely than others and/or what kind of values are more likely than others. The Bayesian approach would allow you to define priors for both.
One article you might like to read is "Learning with Mixtures of Trees" by Meila and Jordan, 2000 (pdf). It explains that it is possible to use a decomposable prior: the tree structure has a different prior from the values/parameters (this of course means that there is some assumption of independence at play here).
I know you were hinting at heuristics such as the average fan-out etc., but you might find it worthwhile to check out these new applications of Bayesian inference. Note, for example that within nonparametric Bayesian method it is also feasible to reason about infinite trees, as done e.g. by Hutter, 2004 (pdf)!
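If you do stick with the heuristic metrics the question hints at, they are cheap to compute and combine. A sketch of node count, average fan-out, and an arbitrary weighted similarity over them (the tuple representation and the weights are illustrative assumptions, not from any of the cited papers):

```python
# A tree is represented here as (label, [children]).

def node_count(tree):
    return 1 + sum(node_count(c) for c in tree[1])

def _internal_and_children(tree):
    # Returns (number of internal nodes, total number of child edges).
    internal = 1 if tree[1] else 0
    children = len(tree[1])
    for c in tree[1]:
        i, ch = _internal_and_children(c)
        internal += i
        children += ch
    return internal, children

def avg_fanout(tree):
    internal, children = _internal_and_children(tree)
    return children / internal if internal else 0.0

def similarity(t1, t2, w_size=0.5, w_fan=0.5):
    # Each component is in [0, 1]: ratio of the smaller to the larger value.
    def ratio(a, b):
        hi = max(a, b)
        return 1.0 if hi == 0 else min(a, b) / hi
    return (w_size * ratio(node_count(t1), node_count(t2))
            + w_fan * ratio(avg_fanout(t1), avg_fanout(t2)))

t1 = ("orders", [("lineitem", []), ("customer", [("nation", [])])])
t2 = ("orders", [("customer", [("nation", [])])])
print(round(similarity(t1, t2), 3))  # → 0.708
```

These structural metrics ignore the labels entirely, so in practice they would be one more weighted term alongside the tree edit distance rather than a replacement for it.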
I was wondering how an algorithm might work when you input the search for an item in Amazon. I'm guessing that the more popular seller might have a higher rank, and maybe the more the number of products sold, the higher the rank. But is that it? Surely when a user from a different country types the same query, different results will be observed right? What are the factors that influence such a ranking? What technique or algorithm is used in the implementation of ranking?
Thanks!
I'd like to set up a system that crowd-sources the best 10 items from a set that can vary from 20 to 2000 items (the ranking among the top ten is not important). There is an excellent Stack Overflow post on algorithms for doing the actual sort: "How to rank a million images with a crowdsourced sort". I am leaning toward asking users which of two items they like best and then using the TrueSkill algorithm.
My question is given I am using something like TrueSkill, what is the best algorithm for deciding which pairs of items to show a user to rate? I will have a limited number of opportunities to ask people which items they like best so it is important that the pairs presented will give the system the most valuable information in identifying the top 10. Again, I am mostly interested in finding the top ten, less so how the rest of the items rank amongst themselves or even how the top ten rank amongst themselves.
This problem is very similar to organizing a knock-out tournament where the skills of the players are not well known and the number of players is very high (think school-level tennis tournaments). Since a round robin (O(n^2) matches) is very expensive, but a simple knock-out tournament is too simplistic, the usual option is a k-elimination structure: essentially, every player (in your context, an item) is knocked out of contention after losing k games. Take a look at the double-elimination structure: http://en.wikipedia.org/wiki/Double-elimination_tournament .
Perhaps you can modify it sufficiently to meet your needs.
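As for which pairs to show next, one common heuristic (not the actual TrueSkill match-quality formula, just a simplified sketch over a made-up (mu, sigma) rating model) is to prefer matches between items of similar estimated strength whose ratings are still uncertain, since those comparisons carry the most information:

```python
import itertools

def pick_pair(ratings):
    """ratings: {item: (mu, sigma)} -> the most informative pair to compare."""
    def info(pair):
        (m1, s1), (m2, s2) = ratings[pair[0]], ratings[pair[1]]
        closeness = -abs(m1 - m2)  # similar strength -> uncertain outcome
        uncertainty = s1 + s2      # we still know little about these items
        return closeness + uncertainty
    return max(itertools.combinations(ratings, 2), key=info)

ratings = {"a": (25.0, 8.0), "b": (24.0, 8.0), "c": (10.0, 1.0)}
print(pick_pair(ratings))  # → ('a', 'b'): close in mu, both high sigma
```

Since you only care about the top 10, you could also bias the selection toward items whose rating intervals overlap the current tenth place, and stop comparing items that are clearly out of contention.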
Another well-known algorithm for this was developed to calculate rankings in Go or chess tournaments. Have a look at the MacMahon algorithms, which calculate such pairings and the ranks at the same time. It should be possible to truncate the algorithm so that it only produces the set of the 10 best items.
You can find more details in Christian Gerlach's thesis, where he describes the actual optimization algorithm (unfortunately the thesis is in German).