Finding subsets used at most k times - algorithm

Every now and then I read conspiracy theories about Lotto-style games being rigged: a computer supposedly browses through the combinations chosen by the players and picks an unused subset as the draw. It got me thinking: how would such an algorithm have to work to find such subsets really efficiently? Simply finding unused numbers is ruled out, as is finding the least-used ones, because neither necessarily yields a solution. Going deeper, how could an algorithm efficiently choose a subset that was used at most some k times by the players? More formally:
We are given the set of numbers 1 to 50. In the draw, 6 numbers are picked.
INPUT: m subsets, each consisting of 6 distinct numbers from 1 to 50, and an integer k (k >= 0), the maximum number of players allowed to have all 6 of their numbers correct.
OUTPUT: the subsets that make no more than k players win the jackpot ('winning' means all the numbers a player chose were picked in the draw).
Is there an efficient algorithm that could compute this without using a terabyte HDD to store the counts for every one of the 50!/(44!*6!) possible combinations in the pessimistic case? Honestly, I can't think of one.

If I wanted to run such a conspiracy, I would first acquire the list of submissions by players. Then I would generate random lottery selections, see how many winners each such selection would produce, and choose the random selection most attractive to me. There is little point doing anything more sophisticated, because even this is probably already enough to be noticed by statisticians.
If you want to corrupt the lottery, it would probably be easier and safer to select a few competitors you favour and have them win the lottery. In (the book) "1984", I think the state simply announced imaginary lottery winners, with the announcement in each area naming somebody from outside the area. One of the ideas in "The Beckoning Lady" by Margery Allingham is of a gang who attempt to set up a racecourse so they can rig races, allowing them to disguise bribes as winnings.

First of all, the total number of combinations (choosing 6 from 50) is not very large: about 16 million, which can easily be handled.
For each combination, keep a count of the number of people who played it. When declaring a winner, choose a combination that has at most k plays.
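A minimal sketch of that counting step in Python, using a dictionary keyed by the sorted 6-number combination instead of a 16-million-entry array (the function name is illustrative):

    from collections import Counter
    from itertools import combinations

    def draws_with_few_winners(tickets, k):
        """tickets: iterable of 6-number player selections.
        Yields every possible draw that at most k players would hit exactly."""
        plays = Counter(tuple(sorted(t)) for t in tickets)
        for draw in combinations(range(1, 51), 6):   # all C(50, 6) ~ 15.9 million draws
            if plays.get(draw, 0) <= k:
                yield draw

Counting is linear in the number of tickets; enumerating the roughly 16 million possible draws dominates, but that is easily handled in memory.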

If the numbers within each subset are sorted, then you can treat the subsets as strings: sort them in lexicographical order, and it becomes easy to count how many players selected each subset (and which subsets were not selected at all). The time is then proportional to the number of players, not to the number of possible combinations in the lottery.
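A sketch of that approach under the same assumptions (each ticket is normalized to a sorted tuple, so equal selections become adjacent after sorting):

    def plays_per_combination(tickets):
        """Sort the normalized tickets lexicographically and count runs of equal
        selections; combinations that never appear were played zero times."""
        normalized = sorted(tuple(sorted(t)) for t in tickets)
        counts = []
        i = 0
        while i < len(normalized):
            j = i
            while j < len(normalized) and normalized[j] == normalized[i]:
                j += 1
            counts.append((normalized[i], j - i))
            i = j
        return counts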

Related

How to find the winner(s) of Brazil's Mega Sena lottery most efficiently?

Mega Sena is Brazil's most famous lottery. The numbers range from 1 to 60, and a single bet can contain from 6 to 15 selected numbers (the more numbers selected, the more expensive the bet). Prizes are awarded, proportionally, to those who correctly guess 4, 5 or 6 numbers, the latter being the jackpot.
Sometimes the total number of bets reaches the hundreds of millions. Drawings are televised live, so as a player you know almost instantly whether you won or not. However, the lottery takes several hours to announce the number of winners and which cities they are from.
Given the rules above, how would you implement the computational logic that finds the winners? What data structures would you choose to store the data? Which sorting algorithms, if any, would you select? What's the Big O notation of the best solution you can think of?
I would allocate a 64-bit integer to represent which numbers have been chosen.
The first step would be to expand all the 7-15 number tickets into tickets having only 6 numbers set, which could be done offline before the drawing. I don't know whether a 15-number ticket covers all (15 choose 6) = 5005 6-element tickets or whether another system is used, but since it's done offline, the complexity is delegated elsewhere.
There's even an algorithm (or bit hack) called "lexicographically next bit permutation" which can generate all those n-choose-k bit patterns efficiently, if it needs to be done in real time.
Mask all those tickets with the bit pattern of the winning row, and count the bits left. This should be extremely efficient, taking on the order of one second for a billion tickets on a modern computer with a popcount instruction.
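A sketch of those two steps in Python; itertools.combinations stands in for the next-bit-permutation hack, and whether a long ticket really covers every 6-number sub-bet is, as noted, an assumption:

    from itertools import combinations

    def expand_ticket(numbers):
        """Offline step: turn a 6-15 number bet into the 64-bit masks of all of
        its 6-number sub-bets (bit i set means number i was chosen)."""
        masks = []
        for combo in combinations(numbers, 6):
            mask = 0
            for n in combo:
                mask |= 1 << n
            masks.append(mask)
        return masks

    def match_count(ticket_mask, winning_mask):
        """Drawing-time step: AND with the winning row's mask and popcount."""
        return (ticket_mask & winning_mask).bit_count()   # Python 3.10+ popcount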
The other issue is the validation, integrity and confidentiality of the data when associated with the ticket holders. I would guess that this is the real issue, and it is probably addressed by implementing the whole thing as a database query.
You can hold the selection of each person within a single 64-bit word with each bit representing a selected number. The entire dataset could fit in memory in one long integer array.
If the array was sorted in the same order as your database, e.g., by ticket ID, then you could retrieve an associated record simply from knowing that the position in the array would be the same as the rownum of your query.
If you performed a bitwise AND operation of each value with the 64-bit representation of the winning numbers, you could count the bits and store the offsets of any matching 4, 5, or 6 bits into their own respective lists.
The whole operation would be pretty obviously linear O(n).
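A sketch of that pass, assuming the tickets are already stored as 64-bit masks in the same order as the database rows:

    def bucket_winners(ticket_masks, winning_mask):
        """One linear pass: AND each ticket with the winning mask, popcount,
        and record the array offset (== row position) in the 4, 5 or 6 bucket."""
        winners = {4: [], 5: [], 6: []}
        for offset, mask in enumerate(ticket_masks):
            hits = (mask & winning_mask).bit_count()   # Python 3.10+ popcount
            if hits >= 4:
                winners[hits].append(offset)
        return winners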

Probability of winning a special single-elimination tournament using dynamic programming and bitmask

Let's say we have N teams in a tournament, and based on historical data we know the probability of each team beating any other team. Let's put all the probabilities in a matrix called P, where P[a][b] is the probability of team a beating team b. Obviously P[a][a] = 0 and P[a][b] = 1 - P[b][a].
In this tournament, at every round two of the teams compete against each other and the loser is eliminated. The two teams are chosen randomly, with each team equally likely to be picked. So in the first round we have N teams, in the next round N-1 teams, and so on until only one team remains and becomes the champion. What is the probability of each team becoming the champion? (1 <= N <= 18)
At first I didn't know how to approach the problem, but after some reading and searching, and keeping in mind that the maximum N is 18, I figured that dynamic programming with a bitmask is the way to go. However, I couldn't figure out a solution. Here are my problems:
I have a really hard time figuring out what the subproblems are and which subproblems should not be recomputed; basically, I can't find a well-defined recursive (or non-recursive) equation for the problem.
In bitmask+DP problems we usually define something like dp[mask][n] or dp[n][mask]. I tried different approaches to defining the mask, but since the general solution is not clear to me, I had no success.
Some guidance on these two problems would be very helpful.
This is not really a dynamic programming problem.
If you have a vector V that gives the probability of each player being in the game after n rounds, then you can calculate the player probabilities for n+1 rounds by:
V'_i = 2/((18-n)(17-n)) * sum over all j != i of [ V_i * V_j * P[i][j] ]
That first factor is the probability that any given available match will be chosen, which depends on the number of previous rounds, because each successive round has fewer players to match up.
The second part is the probability of the players being available for each match, times the probability that the current player will win.
Just do this calculation 17 times to get the player probabilities after 17 rounds, which is the answer you're looking for. You can even drop that first factor, and fix it at the end by normalizing the vector so that the probabilities sum to 1.
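A small sketch of that iteration, dropping the constant match-selection factor and renormalizing as suggested (P[i][j] is the probability that team i beats team j):

    def champion_probabilities(P):
        """Iterate the survival-probability vector: each round, V'_i is
        proportional to the sum over j != i of V_i * V_j * P[i][j].
        Renormalizing every round only rescales the vector uniformly, so the
        final normalized result is the same as normalizing once at the end."""
        n = len(P)
        v = [1.0] * n                      # every team starts in the game
        for _ in range(n - 1):             # one team is eliminated per round
            v_new = [sum(v[i] * v[j] * P[i][j] for j in range(n) if j != i)
                     for i in range(n)]
            total = sum(v_new)
            v = [x / total for x in v_new]
        return v                           # v[i]: probability that team i is champion

For two teams with P = [[0, 0.6], [0.4, 0]] this returns [0.6, 0.4], as expected.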

Partitioning an ordered list of weights into N sub-lists of approximately equal weight

Suppose I have an ordered list of weights of length M. I want to divide this list into N ordered, non-empty sublists, where the sums of the weights in the sublists are as close to each other as possible. The length of the list will always be greater than or equal to the number of partitions.
For example:
A reader of epic fantasy wants to read the entire Wheel of Time series in N = 90 days. She wants to read approximately the same number of words each day, but she doesn't want to break a single chapter across two days, and obviously she doesn't want to read out of order either. The series has a total of M chapters, and she has a list of the word counts of each.
What algorithm could she use to calculate the optimum reading schedule?
In this example, the weights probably won't vary much, but the algorithm I'm seeking should be general enough to handle weights that vary widely.
As for what I consider optimum: having two or three partitions vary a small amount from the average weight is better than having one partition vary a lot. In other words, she would rather have several days where she reads a few hundred words more or fewer than the average if it means she can avoid having to read a thousand words more or fewer than the average, even once. My thinking is to use something like this to compute the score of any given solution:
Let w_1, w_2, w_3, ..., w_N be the weights of the partitions (each calculated by simply summing the weights of its elements).
Let x be the total weight of the list divided by the number of partitions N, i.e., the average partition weight.
Then the score would be the sum, as i goes from 1 to N, of (x - w_i)^2.
So, I think I know a way to score each solution. The question is, what's the best way to minimize the score, other than brute force?
Any help or pointers in the right direction would be much appreciated!
You are probably looking for a "minimum raggedness word wrap" algorithm.
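Concretely, a minimal dynamic-programming sketch for the scoring function described above (names are illustrative; O(M^2 * N) time, which is fine for book-sized inputs):

    def reading_schedule(weights, n_parts):
        """Split `weights` (kept in order) into `n_parts` contiguous, non-empty
        groups, minimizing the sum of squared deviations of each group's total
        from the average group total."""
        m = len(weights)
        prefix = [0]
        for w in weights:
            prefix.append(prefix[-1] + w)
        target = prefix[-1] / n_parts                  # average weight per partition

        INF = float("inf")
        # cost[k][i]: best score for splitting the first i items into k groups
        cost = [[INF] * (m + 1) for _ in range(n_parts + 1)]
        cut = [[0] * (m + 1) for _ in range(n_parts + 1)]
        cost[0][0] = 0.0

        for k in range(1, n_parts + 1):
            for i in range(k, m + 1):
                for j in range(k - 1, i):              # last group is items j..i-1
                    group = prefix[i] - prefix[j]
                    score = cost[k - 1][j] + (group - target) ** 2
                    if score < cost[k][i]:
                        cost[k][i] = score
                        cut[k][i] = j

        groups, i = [], m                              # walk the cut points back
        for k in range(n_parts, 0, -1):
            groups.append(weights[cut[k][i]:i])
            i = cut[k][i]
        return groups[::-1]

For the reading example, weights would be the list of per-chapter word counts and n_parts would be 90.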

Algorithm to select random pairs, schedule matchups

I'm working in Ruby, but I think this question is best asked agnostic of language. It may be assumed that we have access to basic list/array functions, as well as a "random" number generator. Here's what I'd like to be able to do:
Given a collection of n teams, with n even,
Randomly pair each team with an opponent, such that every team is part of exactly one pair. Call this ROUND 1.
Randomly generate n-2 subsequent rounds (ROUND 2 through ROUND n-1) such that:
each round has the same property as the first (every team is a member of exactly one pair), and
after all the rounds, every team has faced every other team exactly once.
I imagine that algorithms for doing exactly this must be well known, but as a self-taught coder I'm having trouble figuring out how to find them.
I believe you are describing a round robin tournament. The Wikipedia page gives an algorithm.
If you need a way to randomize the schedule, randomize the team order, the round order, etc.
Well, not sure if this is the most efficient algorithm, but:
1. Randomly assign the N teams to two lists of the same length n/2 (List1 and List2).
2. Starting with i = 0:
3. Create pairs: List1[i], List2[i] = a team pair.
4. Repeat for i = 1 to (n/2 - 1).
5. For rounds 2 to (n/2 - 1): rotate List2, so that the first team in List2 is now at the end.
6. Repeat steps 2 through 5 until List2 has been cycled once.
This link was very helpful to me the last time I wrote a round robin scheduling algorithm. It includes a C implementation of a first fit algorithm for round robin pairings.
http://www.devenezia.com/downloads/round-robin/
In addition to the algorithm, he has some helpful links to other aspects of tournament scheduling (balancing home and away games, as well as rotating teams across fields/courts).
Note that you don't necessarily want a "random" order to the pairings in all cases. If, for example, you were scheduling a round robin soccer league for 8 games that only had 6 teams, then each team is going to have to play two other teams twice. If you want to make a more enjoyable season for everyone, you have to start worrying about seeding so that you don't have your top 2 teams clobbering the two weakest teams in their last two games. You'd be better off arranging for the extra games to be paired against teams of similar strength/seeding.
Based on info I found through Maniek's link, I went with the following:
A simple round robin algorithm that
a. Starts with pairings achieved by zipping [0,...,(n-1)/2] and [(n-1)/2 + 1,..., n-1]. (So, if n==10, we have 0 paired with 5, 1 with 6, etc.)
b. Rotates all but one team n-2 times clockwise until all teams have played each other. (So in round 2 we pair 0 with 6, 5 with 7, etc.)
Randomly assigns one of [0,..., n-1] to each of the teams.
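A sketch of that circle method (in Python here, though the question is language-agnostic; names are illustrative):

    import random

    def round_robin_schedule(teams):
        """Zip the two halves for the first round, then rotate every slot except
        the first one to produce each subsequent round. Teams are shuffled up
        front so the slot assignment is random. Assumes an even number of teams."""
        teams = list(teams)
        random.shuffle(teams)
        n = len(teams)
        half = n // 2
        top, bottom = teams[:half], teams[half:]
        rounds = []
        for _ in range(n - 1):
            rounds.append(list(zip(top, bottom)))
            # rotate clockwise, keeping top[0] fixed: bottom[0] moves up to
            # top[1], and top[-1] drops down to the end of bottom
            top, bottom = [top[0], bottom[0]] + top[1:-1], bottom[1:] + [top[-1]]
        return rounds

With ten unshuffled teams 0-9 this reproduces the example pairings: round 1 pairs 0 with 5, 1 with 6, and so on; round 2 pairs 0 with 6, 5 with 7, and so on.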

Select equal-sized number groups from a sequence of random numbers

Say I have a list of 100 numbers. I want to split them into 5 groups so that the sum within each group is as close as possible to the mean.
The simplest solution is to sort the hundred numbers, take the max number, and keep adding the smallest numbers until the sum goes beyond the average.
Obviously that is not going to give the best results. I guess we could use BFS or DFS or some other search algorithm, like A*, to get the best result.
Does anyone have a simple solution? Pseudocode is good enough. Thanks!
This sounds like a variation on the knapsack problem and if I'm interpreting you correctly, it may be the multiple knapsack problem. Can't you come up with an easy problem? :)
An efficient algorithm that can be used here is a variation of the Best Fit bin packing algorithm. However, we have to apply a variation in which we want exactly 5 groups of numbers, rather than trying to use the smallest possible number of groups.
The algorithm begins by finding the mean of the list of all 100 numbers; this mean is used as the maximum capacity of each of the 5 groups (bins) we are trying to fill. We then find the largest number in the list that doesn't exceed that capacity and assign it to the first group (we can find it in log(n) time if the numbers are kept in a self-balancing binary search tree), keeping track of how full the current group is. We keep taking the largest remaining number that still fits into the current group until we either reach capacity or no remaining number fits, at which point we move on to the next group and repeat with the numbers left over; as we leave each group, we record its current sum. We continue until all 5 groups have been filled to capacity. If any numbers remain, we place them in the groups with the lowest total sums (which we know because we kept track of the sums as we went).
This is in fact a greedy algorithm with a Θ(n log n) running time, due to the nature of bin packing.
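A sketch of that greedy in Python. Two hedges: the "capacity" here is taken as the total divided by the number of groups (the target sum per group) rather than the literal mean of the individual numbers, and a plain sorted list with bisect stands in for the self-balancing BST, so removals cost O(n) instead of O(log n):

    import bisect

    def greedy_groups(numbers, k=5):
        """Fill each group in turn with the largest remaining number that still
        fits under the capacity; leftovers go to the groups with the lowest sums."""
        capacity = sum(numbers) / k
        remaining = sorted(numbers)              # ascending, so bisect works
        groups = [[] for _ in range(k)]
        sums = [0.0] * k

        for g in range(k):
            while remaining:
                room = capacity - sums[g]
                idx = bisect.bisect_right(remaining, room) - 1   # largest value <= room
                if idx < 0:
                    break                        # nothing fits; move to the next group
                value = remaining.pop(idx)
                groups[g].append(value)
                sums[g] += value

        for value in remaining:                  # anything left over (values > capacity)
            g = sums.index(min(sums))
            groups[g].append(value)
            sums[g] += value
        return groups, sums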
