Finding n-th combination with highest sum - algorithm

I have a list of players and each player has a salary and a rating (both integer values).
I have to find n-th largest combination of 6 players (largest in terms of sum of their ratings) with a constraint that the sum of their salaries must be less or equal than 50000.
For example, if I have a list of players 1,2,...,m, what I'm currently doing is:
Generate all possible 6 player combinations (m choose 6).
Filter out combinations for which sum of salaries is > 50000
Sort remaining combinations in descending order, ordered by sum of ratings
Pick the n-th from the sorted list.
This is obviously a brute force approach which works fine for smaller number of players. But currently I have 140 players which yield over 9 billion combinations and it takes too much to finish.
Any suggestion on how to do this faster?

Here is how you can avoid getting all the combinations.
prepare a descending sorted map with rank as key and salary as value
This will get your ranks sorted on descending order and the first key in the map will be the highest rank. If you have multiple records having the same rank, consider putting them as a list against the same rank.
pick the first 6 top ranks and check if their total salary <= 50000, you get your result, else move to the next 6 combination.
Here if you have more than one record against a rank, try adding their salaries as well.
This will take some patience and some good testing to translate to a program, but will certainly be an optimal solution.

Related

Sorting/Ranking Tied Scores based on Points Difference

I wanted to use Google Sheets to do a competition ranking which can help me to rank or sort the ranking automatically when I key in the Points.
However, there is a condition where there will be a tied happens. If a tie happens, I will take the Score Differences (SD) into consideration. If the Score Differences is low, then it will be rank higher in the tie condition.
See below table for illustration:
For example: Currently Team A and Team D having the highest PTS, so both of them are currently Rank 1. However, Team D is having a lower SD compare to Team A. So I wanted to have it automatically rank Team D as Rank 1 and Team A as Rank 2.
Is this possible?
One solution might be to create a hidden column with a formula like:
=PTS * 10000 - SD
(Replacing PTS and SD with the actual cell references)
Multiplying PTS by 10000 ensures it has a higher priority than SD.
We want to reward low SDs, so we subtract instead of add.
Finally, in the rank column, we can use a formula like:
=RANK(HiddenScoreCell, HiddenScoreColumnRange, 0)
So, for example, if the HiddenScore column is column K, the actual formula for row 2 might look like
=RANK(K2, K:K, 0)
The third parameter is 0 as we want higher scores to have a lower rank.
To sort, you can just apply a sort on the Rank column.
With sort() you can define multiple sorting criteria (see [documentation][1], e.g.
=sort(A2:I5,8,false,7,false)
So you're going to sort your table (in A2:I5, change accordingly) based first on PTS, descending, then on SD, descending? You can add more criteria with more pairs of parameters (column index, then descending or ascending as a boolean).
Then you need to compare your team name with with the sorted table and find its rank in the sorted list:
=ArrayFormula(match(A2:I5,sort(A2:I5,8,false,7,false),0))
Paste that formula in I2 (assuming your table starts in A1 with its headers, otherwise adjust accordingly).
=ARRAYFORMULA(IF(LEN(A2:A), RANK(H2:H*9^9-G2:G, H2:H*9^9-G2:G), ))

Partitioning N arrays into K groups with constraints

I have been stuck in this problem and can't find the efficient solution for this problem .
I have N (Upto 10 Million ) arrays of say maximum 100 elements. These arrays contain numbers from 1-10000 .
Now my problem is to partition these arrays into K groups such that i minimize the duplicates across all the arrays i.e for an array containing 1, 4, 10 ,100 and another containing 1, 100. I would like them to go into same group because that minimizes duplicity. Two constraints my problem has are as follows -
i don't want to increase size of unique elements more than 110 for a group of arrays. So i have an array of size 100 and there is another array of size 100 which is a 60% match i would rather create new group because this increases no. of unique elements to 140 and this will go on increasing.
The number of vectors in the groups should be uniformly distributed.
Grouping these arrays based on size in decreasing order. Then finding unique vectors unique hashing and applying a greedy algo of maximum match with the constraints but the greedy doesn't seem to be working well because that will entirely depend on the partitions i picked first. I couldn't figure out how DP can be applied because number of combinations given total number of vectors is just huge. I am not sure what methodology should i take.
some of the fail cases of my algo are , say there are two vectors which are mutually exclusive of each other but if i form a group with them i could match 100% with a third vector which otherwise matched just 30% in a group and made that group full following the addition to that group this will increase my duplicity because the third vector should have formed a group with first two vectors.
Simple yet intensive on computing and memory is iterate 10 million times for each array to match maximum numbers match. Now store match numbers in an array and find match of such arrays similarly by iterating with criteria that match should be at least 60%

Assign seats in an Auditorium

I encountered an Interview Question:
There is an event in the auditorium and Given capacity of the auditorium is NxM.
Every group of person booked ticket and all the tickets are booked, Now you have to assign seat number to all of them such that minimum number of group split.
So basically a 2-D array is given and we have some groups of certain size(different groups may be of different size).Array needs to be completely filled with minimum number of groups split.
One Brute force Recursive approach I found is :Place first group ,then second group and so on.Permute this arrangement to find the arrangement with minimum split.
One efficient solution I found was using subset sum problem.
https://en.wikipedia.org/wiki/Subset_sum_problem
I could not understand how subset sum problem can be used to solve this problem.
Please suggest how can I approach this problem.I am not looking for code,just psuedo-code or algorithm will suffice.
Firstly, I'm assuming that "group-split" means that some part of the group is in one row and remaining is in another. If the number of seats in a row are N, and given a set which contains the size of the different groups, you need to find a subset that will sum to N. If such a subset is found, that row will be filled without breaking any groups. If there is no such subset found, then you will need to break at least 1 group. Then there can be multiple strategies here.
1) You can pick a group that will be split across 2 rows. This group can be the largest of the remaining, or the smallest or can be picked at random. Once this group is decided, you have 2 rows with less than N empty seats that need to be filled recursively.
2) The strategy can be to find a subset that sums to 2*N - if found, 1 group will be split. If not found, then find a subset that sums to 3*N with 2 group-splits and so on. The maximum number of group-splits will be M-1 for M rows.
Continue 1) or 2) to fill M number of rows in the theatre.

Algorithms for dividing an array into n parts

In a recent campus Facebook interview i have asked to divide an array into 3 equal parts such that the sum in each array is roughly equal to sum/3.My Approach1. Sort The Array2. Fill the array[k] (k=0) uptil (array[k]<=sum/3)3. After that increment k and repeat the above step for array[k]Is there any better algorithm for this or it is NP Hard Problem
This is a variant of the partition problem (see http://en.wikipedia.org/wiki/Partition_problem for details). In fact a solution to this can solve that one (take an array, pad with 0s, and then solve this problem) so this problem is NP hard.
There is a dynamic programming approach that is pseudo-polynomial. For each i from 0 to the size of the array, you keep track of all possible combinations of current sizes for the sub arrays, and their current sums. As long as there are a limited number possible sums of subsets of the array, this runs acceptably fast.
The solution that I would have suggested is to just go for "good enough" closeness. First let's consider the simpler problem with all values positive. Then sort by value descending. Take that array in threes. Build up the three subsets by always adding the largest of the triple to the one with the smallest sum, the smallest to the one with the largest, and the middle to the middle. You will end up dividing the array evenly, and the difference will be no more than the value of the third smallest element.
For the general case you can divide into positive and negative, use the above approach on each, and then brute force all combinations of a group of positives, a group of negatives, and the few leftover values in the middle that did not divide evenly.
Here are details on a dynamic programming solution if you are interested. The running time and memory usage is O(n*(sum)^2) where n is the size of your array and sum is the sum of absolute values of your array values. For each array index j from 1 to n, store all the possible values you can get for your 3 subset sums when you split the array from index 1 to j into 3 subsets. Also for each possibility, store one possible way to split the array to get the 3 sums. Then to extend this information for 1 to (j+1) given the information from 1 to j, simply take each possible combination of 3 sums for splitting 1 to j and form the 3 combinations of 3 sums you get when you choose to add the (j+1)th array element to any one of the 3 subsets. Finally, when you are done and reach j = n, go through the set of all combinations of 3 subset sums you can get when you split array positions 1 to n into 3 sets, and choose the one whose maximum deviation from sum/3 is minimized. At first this may seem like O(n*(sum)^3) complexity, but for each j and each combination of the first 2 subset sums, the 3rd subset sum is uniquely determined. (because you are not allowed to omit any elements of the array). Thus the complexity really is O(n*(sum)^2).

Big Shot IT company interview puzzle

This past week I attended a couple of interviews at a few big IT companies. one question that left me bit puzzled. below is an exact description of the problem.(from one of the interview questions website)
Given the data set,
A,B,A,C,A,B,A,D,A,B,A,C,A,B,A,E,A,B,A,C,A,B,A,D,A,B,A,C,A,B,A,F
which can be reduced to
(A; 16); (B; 8); (C; 4); (D; 2); (E; 1); (F; 1):
using the (value, frequency) format.
for a total of m of these tuples, stored in no specific order. Devise an O(m) algorithm that returns the kth order statistic of the data set. m is the number of tuples as opposed to n which is total number of elements in the data set.
You can use Quick-Select to solve this problem.
Naively:
Pick an element (called the pivot) from the array
Put things less than or equal to the pivot on the left of the array, those greater on the right.
If the pivot is in position k, then you're done. If it's greater than k, then repeat the algorithm on the left side of the array. If it's less than k, then repeat the algorithm on the right side of the array.
There's a couple of details:
You need to either pick the pivot randomly (if you're happy with expected O(m) as the cost), or use a deterministic median algorithm.
You need to be careful to not take O(m^2) time if there's lots of values equal to the pivot. One simple way to do this is to do a second pass to split the array into 3 parts rather than 2: those less than the pivot, those equal to the pivot, and those greater than the pivot.

Resources