Data Structure search record - data-structures

An interview panel asked me this question. In a web application, each user can select favourite sports, and one user can have several. For example, user John's favourite sports are Football, Soccer, and Tennis; user Alen's favourite sports are Baseball and Basketball.
With millions of users, which data structure or algorithm should be used to find the users associated with Football or Soccer?
I first answered HashMap, but the panel said it would cause memory issues. I then suggested a binary search tree, but the interviewer was not satisfied with that answer either.
Can anyone explain a good way to get all users with a given favourite sport?

The easiest solution is to use a HashMap, mapping each user (key) to a list of sports (value), as you mentioned. It is very common to offer this solution first in an interview to see whether it satisfies the interviewers.
A better solution can be to build a graph of users and sports, with a directed edge from each sport node (e.g. football) to the user nodes (e.g. foo, bar) that picked it. For any query on a sport, e.g. football, we traverse the graph with football as the source node. All the nodes reached are the users who have football among their favourite sports. This way it will be space efficient.
As for time complexity, each query costs a graph traversal, which is O(E), where E equals the number of users in the worst case. So we can cache the results of frequent queries using a HashMap. Again, to cope with space, we can implement an LRU cache on top of a HashMap.
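A minimal sketch of the sport-to-users graph described above, stored as an adjacency list, with a bounded cache in front of it. The names `users_by_sport`, `users_for_any` and `cached_query` are made up for illustration, and `functools.lru_cache` stands in for the hand-rolled LRU cache:

```python
from collections import defaultdict
from functools import lru_cache

# Sample data taken from the question.
favourites = {
    "John": ["Football", "Soccer", "Tennis"],
    "Alen": ["Baseball", "Basketball"],
}

# Adjacency list: one entry per sport node, pointing at its user nodes.
users_by_sport = defaultdict(set)
for user, sports in favourites.items():
    for sport in sports:
        users_by_sport[sport].add(user)


def users_for_any(*sports):
    """Return the users who picked at least one of the given sports."""
    result = set()
    for sport in sports:
        # .get avoids creating empty entries for unknown sports.
        result |= users_by_sport.get(sport, set())
    return result


@lru_cache(maxsize=1024)  # keeps only the most recently used query results
def cached_query(sports):
    """Cached variant; `sports` must be a tuple so that it is hashable."""
    return frozenset(users_for_any(*sports))
```

Each user appears once per favourite sport, so memory grows with the number of (user, sport) pairs. The cache would have to be cleared whenever favourites change, which this sketch does not handle.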
Hope it helps!

Related

Algorithm for matching people together based on likes and dislikes

I have a group of about 75 people. Each user has liked or disliked the other 74 users. These people need to be divided into about 15 groups of various sizes (4 to 8 people). They need to be grouped so that the groups consist only of people who all liked each other, or at least as much as possible.
I'm not sure what the best algorithm is to tackle this problem. Any pointers or pseudo code much appreciated!
This isn't formed quite well enough to suggest a particular algorithm. I suggest clustering and "clique" algorithms, but you'll still need to define your "best grouping" metric. "as much as possible", in the face of trade-offs and undefined desires, is meaningless. Your clustering algorithm will need this metric to form your groups.
Data representation is simple: you need a directed graph. An edge from A to B means that A likes B; lack of an edge means A doesn't like B. That will encode the "likes" information in a form tractable to your algorithm. You have 75 nodes and one edge for every "like".
Start by researching clique algorithms; a "clique" is a set in which every member likes every other member. These will likely form the basis of your clustering.
Note, however, that you have to define your trade-offs. For instance, consider the case of 13 nodes consisting of two distinct cliques of 4 and 8 people, plus one person who likes one member of the 8-clique. There are no other "likes" in the graph.
How do you place that 13th person? Do you split the 8-clique and add them to the group with the person they like? If so, do you split off 3 or 4 people from the 8? Is it fair to break 15 or 16 "likes" to put that person with the one person they like -- who doesn't like them? Is it better to add the 13th person to the mutually antagonistic clique of 4?
Your eval function must return a well-ordered metric for all of these situations. It will need to support adding to a group, splitting a large group, etc.
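As a rough illustration of such an eval function, the sketch below scores a candidate grouping by counting mutual and one-sided likes inside each group. The function name `score_grouping` and the choice to return a tuple (compared lexicographically, mutual likes first) are assumptions; the real metric depends on the trade-offs you decide on.

```python
from itertools import combinations


def score_grouping(likes, groups):
    """Score a grouping; higher tuples are better under this assumed metric.

    likes:  dict mapping each person to the set of people they like
            (the directed "likes" graph).
    groups: list of groups, each a list of people.
    Returns (mutual_pairs, one_sided_pairs) counted within groups only.
    """
    mutual = one_sided = 0
    for group in groups:
        for a, b in combinations(group, 2):
            a_likes_b = b in likes.get(a, set())
            b_likes_a = a in likes.get(b, set())
            if a_likes_b and b_likes_a:
                mutual += 1
            elif a_likes_b or b_likes_a:
                one_sided += 1
    return (mutual, one_sided)
```

A clustering algorithm can then compare two candidate groupings with `score_grouping(likes, g1) > score_grouping(likes, g2)`.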
It sounds like a clustering problem.
Each user is a node. If two users liked each other, there is an edge between their nodes.
If users disliked each other, or one liked the other but not the other way around, then there is no edge between those nodes.
Once you process the like information into a graph, you will get a graph that may have some isolated nodes (users with no mutual likes). Now the question becomes how to cut that graph into clusters of 4-8 connected nodes, which is a well-studied problem with many possible algorithms:
https://www.google.com/search?q=divide+connected+graph+into+clusters
If you want to differentiate between the case where two people dislike each other and the case where one person likes the other but is disliked in return, then you can also introduce a weight on each edge: each like is +1 and each dislike is -1. Then the question becomes that of partitioning a weighted graph.
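The weighting just described can be sketched as follows. The input shape (`opinions` keyed by ordered pairs) and the name `pair_weights` are assumptions made for the example:

```python
def pair_weights(opinions):
    """Collapse directed likes/dislikes into one weight per unordered pair.

    opinions[(a, b)] is True if a likes b and False if a dislikes b.
    Each like adds +1 and each dislike adds -1, so a pair ends up with
    +2 (mutual like), 0 (mixed) or -2 (mutual dislike).
    """
    weights = {}
    for (a, b), liked in opinions.items():
        key = frozenset((a, b))  # same key for (a, b) and (b, a)
        weights[key] = weights.get(key, 0) + (1 if liked else -1)
    return weights
```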

Binary Search Tree order of insert when we have an estimate ahead of time

I am currently reading "Algorithms Fourth Edition" by Robert Sedgewick and Kevin Wayne. I have a question about this exercise:
3.2.5 Suppose that we have an estimate ahead of time of how often search keys are to be accessed in a BST, and the freedom to insert items in any order that we desire. Should the keys be inserted into the tree in increasing order, decreasing order of likely frequency of access, or some other order? Explain your answer.
Obviously, the perfect situation is to have the most accessed item at the root, then the next items in terms of frequency of access as its direct children, and so on.
However, this is BST and as such we have to insert them according to its specifics.
Shall I consider in this task all the combinations?
For example, if we have items 1 (accessed 1000 times), 2 (999 times) and 3 (999 times), the tree with root 2 and children 1 and 3 is the best solution, I think.
It sounds logical to me to strive for a balanced tree. However, this again depends on the input. If we have the smallest item accessed 10 000 times and the next 10 items accessed only once, then the smallest should be the root and the tree will not be perfectly balanced.
I will also appreciate guidance, not direct answers.
Inserting the keys in neither increasing nor decreasing order of likely frequency of access is the optimal approach.
This problem is known as constructing an optimal binary search tree (Optimal BST).
For more details, refer to https://github.com/reneargento/algorithms-sedgewick-wayne/blob/master/src/chapter3/section2/Exercise5.txt
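For a feel of how an optimal BST is found, here is a sketch of the standard dynamic programme over key ranges. It is not taken from the linked solution, it counts successful searches only (no frequencies for missing keys), and the name `optimal_bst_cost` is made up for the example:

```python
def optimal_bst_cost(freq):
    """freq[i] is the access count of the i-th smallest key.

    Returns (cost, root_index), where cost is the sum over keys of
    frequency * depth (root at depth 1) in an optimal BST, and
    root_index is the position of the key placed at the root.
    """
    n = len(freq)
    prefix = [0]
    for f in freq:
        prefix.append(prefix[-1] + f)

    # cost[i][j] covers keys i..j-1; an empty range (i == j) costs 0.
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    root = [[None] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = prefix[j] - prefix[i]  # every key in range sinks one level
            best = min(range(i, j), key=lambda r: cost[i][r] + cost[r + 1][j])
            cost[i][j] = cost[i][best] + cost[best + 1][j] + total
            root[i][j] = best
    return cost[0][n], root[0][n]
```

With the question's frequencies `[1000, 999, 999]` this picks the middle key as root, and with `[10000] + [1] * 10` it picks the smallest key. Once the optimal shape is known, inserting the keys in pre-order of that tree (root first, then each subtree recursively) reproduces it in a plain BST.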

Algorithm: find friends from a list of users

Scenario: in my app, users can follow a post. They get notified whenever their friends like the post. The problem gets nontrivial when there are thousands of users following and liking a post.
My current approach is simple: when a new user likes a post, iterate through all the users who follow the post and check whether the new user exists in each one's friend list (let's say the average size is N). I indexed the friend list, so the lookup is O(logN), which means that for each new like the computation is O(klogN) if there are k people following the post. Since there can also be k users who like it, the overall complexity becomes O(k^2logN). Can I do better than this?
Note:
- Notification does not have to be instant, nor does it have to happen 100% of the time
- Posts are created by users
- I am using Firestore, a NoSQL database, if that matters
What you need is a hybrid approach. Take advantage of the fact that the user's friend list might be shorter than the list of followers, or vice versa. There are two options:
- Do what you do now, and check every follower against the new user's friends list. The time complexity reflects the number of followers.
- Do the reverse, and check every friend of the user against the followers list for the post. The time complexity reflects the number of friends of the user.
Armed with these tactics, now we design an algorithm to check which of the two will give the better performance.
Keep a running count of each user's friends and of each post's followers. When someone likes a post, if they have fewer friends than the post has followers, it's faster to check whether each friend is in the followers list (use a self-balancing BST or hash table in the implementation). If there are fewer followers than the user has friends, the reverse is faster.
If there are N followers, K users liking the post, and F friends per user, checking friend->follower over all K likes costs O(K*F*log(N)), and checking follower->friend costs O(K*N*log(F)). The worst case remains the same; however, if you are only concerned about theoretical time bounds, you could replace your index with a hash table, which gives O(1) lookups instead of O(log(n)).
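The hybrid check can be sketched in a few lines, assuming both lists are held as in-memory hash sets (the function name `friends_to_notify` is hypothetical, and a Firestore-backed version would need its own lookup calls):

```python
def friends_to_notify(liker_friends, post_followers):
    """Return the followers of the post who are friends of the new liker.

    Both arguments are sets of user ids. The smaller set is iterated and
    the larger one is probed, so the cost is proportional to
    min(len(liker_friends), len(post_followers)) hash lookups.
    """
    small, large = sorted((liker_friends, post_followers), key=len)
    return {user for user in small if user in large}
```

For example, `friends_to_notify({"a", "b", "c"}, {"b", "z"})` iterates the two followers rather than the three friends and returns `{"b"}`.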
I think it can be improved to N^2 + k logN^2 by using more memory. This problem is fundamentally that of finding the intersection of two sets (the set of friends of the newly liking user and the set of followers, OR the set of friends of the followers and the set of users who liked). Since lookup is cheap, we want the set being looked up to be as big as possible. So if we put all the friends of all the followers into one big set (more specifically a map) of size N^2, the cost becomes k logN^2 for k liking users, plus the initial iteration of N^2.
An additional benefit of aggregating friends together is that many users have mutual friends, so the actual size may be smaller than N^2.

Ranking algorithms for nodes with subnodes?

I am working on a site where there are nodes that users can vote on (upvotes/downvotes). Each node has a list of subnodes. Users can vote on each of these subnodes as well. Think of the relationship between posts and comments on reddit.
What ranking algorithms are there that will help me sort nodes based on their own score as well as the score of their subnodes? I've looked at the reddit ranking algorithm for "hot" but unfortunately I don't see how I would factor in subnode ranking.
It depends on what sorting strategy you want.
Using a stable sorting algorithm, you could do as follows:
- sort list by subnodes (say, by best ranked subnode)
- sort list by nodes
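The two passes can be sketched like this in Python, whose built-in sort is stable. The node layout (`score`, `subnode_scores`) and the "best ranked subnode" key are assumptions for the example:

```python
# Hypothetical nodes: own score plus the scores of their subnodes.
nodes = [
    {"id": "a", "score": 10, "subnode_scores": [3, 1]},
    {"id": "b", "score": 10, "subnode_scores": [7]},
    {"id": "c", "score": 25, "subnode_scores": []},
]


def best_subnode(node):
    """Score of the best ranked subnode, or 0 when there are none."""
    return max(node["subnode_scores"], default=0)


# Pass 1: secondary key (subnodes). Pass 2: primary key (node score).
# Because the sort is stable, ties in pass 2 keep the order from pass 1.
nodes.sort(key=best_subnode, reverse=True)
nodes.sort(key=lambda n: n["score"], reverse=True)

ranked_ids = [n["id"] for n in nodes]
```

Here `c` ranks first on its own score, and `b` beats `a` on the tie because its best subnode scores higher.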
Choosing the sorting algorithm itself is quite a different task. You could look at:
- Wikipedia for descriptions of all kinds of algorithms
- Any site with algorithm implementations, say algo.pw

algorithm for solving resource allocation problems

Hi, I am building a program in which students sign up for an exam that is conducted in several cities throughout the country. While signing up, students provide a list of three cities where they would like to take the exam, in order of preference. So a student may say that their first preference for an exam centre is New York, followed by Chicago, followed by Boston.
Keep in mind that the exam centres have limited capacity, so they cannot accommodate every student's first choice. We would, however, try to give as many students as possible either their first or second choice of centre, and as far as possible avoid assigning a student their third-choice centre.
Any ideas for an algorithm that would make this process more efficient? The simple way would be to first go through the students' first choices and allot as many as possible, then go through the second choices and allot. However, this may lead to the students who are first in the list getting their first-choice centre and the last students getting their third choice or, worse, none of their choices. Is there anything that could make this more efficient?
Sounds like a variant of the classic stable marriage problem or the college admission problem. Wikipedia lists a linear-time (in the number of preferences, O(n²) in the number of persons) algorithm for the former; the NRMP describes an efficient algorithm for the latter.
I suspect that if you randomly generate preferences of exam places for students (one Fisher–Yates shuffle per exam place) and then apply the stable marriages algorithm, you'll get a pretty fair and efficient solution.
This problem could be formulated as an instance of minimum cost flow. Let N be the number of students. Let each student be a source vertex with capacity 1. Let each exam center be a sink vertex with capacity, well, its capacity. Make an arc from each student to his first, second, and third choices. Set the cost of first choice arcs to 0; the cost of second choice arcs to 1; and the cost of third choice arcs to N + 1.
Find a minimum-cost flow that moves N units of flow. Assuming that your solver returns an integral solution (it should; flow LPs are totally unimodular), each student flows one unit to his assigned center. The costs minimize the number of third-choice assignments, breaking ties by the number of second-choice assignments.
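A self-contained sketch of this formulation, solved by successive shortest augmenting paths (a queue-based Bellman-Ford). An explicit super-source and super-sink are added, the name `assign_centres` is made up, and the code assumes each student lists three distinct centres; a production system would more likely call an existing flow solver.

```python
from collections import deque


def assign_centres(preferences, capacity):
    """preferences: student -> [first, second, third]; capacity: centre -> seats."""
    students, centres = list(preferences), list(capacity)
    n = len(students)
    # Node ids: 0 = source, 1..n = students, then centres, last = sink.
    source, sink = 0, n + len(centres) + 1
    centre_id = {c: n + 1 + i for i, c in enumerate(centres)}
    # Each edge is [to, residual capacity, cost, index of reverse edge].
    graph = [[] for _ in range(sink + 1)]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    rank_cost = [0, 1, n + 1]  # costs from the answer: 0, 1, N + 1
    for i, s in enumerate(students, start=1):
        add_edge(source, i, 1, 0)
        for rank, c in enumerate(preferences[s]):
            add_edge(i, centre_id[c], 1, rank_cost[rank])
    for c in centres:
        add_edge(centre_id[c], sink, capacity[c], 0)

    while True:
        # Cheapest source -> sink path in the residual graph.
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)
        queued = [False] * len(graph)
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            queued[u] = False
            for idx, (v, cap, cost, _rev) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v] = dist[u] + cost
                    prev[v] = (u, idx)
                    if not queued[v]:
                        queued[v] = True
                        queue.append(v)
        if prev[sink] is None:
            break  # no augmenting path left
        # Every path starts with a capacity-1 student edge, so push 1 unit.
        v = sink
        while v != source:
            u, idx = prev[v]
            edge = graph[u][idx]
            edge[1] -= 1
            graph[v][edge[3]][1] += 1
            v = u

    assignment = {}
    for i, s in enumerate(students, start=1):
        for v, cap, _cost, _rev in graph[i]:
            if v != source and cap == 0:  # saturated student -> centre edge
                assignment[s] = centres[v - n - 1]
    return assignment
```

Students who cannot be seated at any of their three choices are simply absent from the returned dictionary.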
There is a class of algorithms, called auctions, that address this kind of allocation of limited resources. Basically, in this case each student would get a certain amount of money (a number they can spend), and your software would then run bids between those students. You might use a formula based on preferences.
An example would be for tutorial times. If you put down your preferences, then you would effectively bid more for those times and less for the times you don't want. So if you don't get your preferences you have more "money" to bid with for other tutorials.

Resources