Can someone help me solve this?
We recently held an election for 3 positions.
There were 6 candidates.
Each member could cast 3 votes but could not vote for the same person more than once.
134 ballots were cast. (402 total votes)
The final tally was
result = {a:91, b:66, c:63, d:63, e:60, f:59}
I can easily determine the 20 possible unique ballots cast
result.keys.combination(3).to_a
But obviously, given the number of possible combinations, brute force would be impossibly time-consuming, so I am hoping someone can provide an algorithm to solve this practically.
I am trying to find a reasonably efficient way to determine a single possible tally of ballots, but if you can provide multiple (or all) possible tallies, that would be amazing.
Let's think about a simpler thing first, for example canapés. You have N small slices of bread and M ingredients (M_i pieces of each).
You have to create N canapés, each with unique ingredients. Obviously, this is impossible if any M_i > N.
reasonably efficient way to determine a single possible tally of ballots
Arrange the bread slices in a line. Take the first ingredient and spread it all out, starting with the first slice of bread. Take the second ingredient and so on until you reach the last piece of bread. Return to the first slice of bread and continue to lay on top. Continue until you run out of ingredients.
require 'set'

result = {a: 91, b: 66, c: 63, d: 63, e: 60, f: 59}
BALLOTS = 134

if (result.values.any? { |v| v > BALLOTS }) || (result.values.sum % BALLOTS != 0)
  raise 'Unfair election!'
end

ballots = BALLOTS.times.map { Set.new }

i = 0
result.each do |candidate, votes|
  votes.times do
    ballots[i % BALLOTS] << candidate
    i += 1
  end
end

puts "Do all of them have 3 unique votes? #{ballots.all? { |b| b.size == 3 }}"
Obviously, it's O(∑M_i), where ∑M_i is 402 (your "total votes"). I don't think there is a faster way.
but if you can provide multiple or all possible tallies
You can change the order of the ingredients. In your case there are 6! = 720 ways to fill the ballots, and I found that 60 of them produce unique tallies.
The number of unique tallies differs for different results: there are only 10 unique ways for result = {a: 67, b: 67, c: 67, d: 67, e: 67, f: 67}, for example.
Changing the start position (i = start instead of i = 0) does not produce new unique ways.
If anybody has more than one third of the possible votes, or if the number of votes is not a multiple of three, there is no possible answer.
If there are at least three votes left, and nobody has more than one third of the possible votes, decide on a ballot paper that gives the top three candidates one vote each and reduce their totals by one.
This process stops either with all votes accounted for or with somebody having more than one third of the remaining votes. I think the worst case for this is votes (N+1, N, N, N), where you go to (N, N-1, N-1, N): the count that is not decremented gains a little relative share but does not reach one third, so I think you can continue this process to account for all votes cast.
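A minimal Python sketch of that greedy step (greedy_ballots is a hypothetical name; Counter.most_common picks the current top three; the walrus operator needs Python 3.8+):

from collections import Counter

def greedy_ballots(tally):
    remaining = Counter(tally)
    ballots = []
    while (total := sum(remaining.values())) > 0:
        # the feasibility conditions described above
        if total % 3 != 0 or any(v > total // 3 for v in remaining.values()):
            raise ValueError('no valid ballot assignment from this state')
        top3 = [c for c, _ in remaining.most_common(3)]  # the three current leaders
        ballots.append(tuple(sorted(top3)))
        for c in top3:
            remaining[c] -= 1
    return ballots

print(len(greedy_ballots({'a': 91, 'b': 66, 'c': 63, 'd': 63, 'e': 60, 'f': 59})))  # 134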
There are obviously many different equivalent counts. Any pair of actions that do not overlap in two candidates has at least one possible alternative interpretation. One way to generate multiple solutions would be to choose possible ballots at random, subject only to the constraint that no sum ever exceeds one third of the possible votes left. Another would be to rearrange answers at random and follow https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm (although I have not proved that any particular set of small rearrangements makes the set of possible solutions connected).
Although one may be able to find a feasible solution for a given result using a heuristic algorithm, there is no assurance of that, as it is an NP-hard problem. It could be formulated as an integer linear program (ILP). In principle, ILP software will identify a feasible solution or report that no feasible solution exists. I say "in principle" because the problem may well be unsolvable in the time available.
This ILP problem has 20 non-negative, integer-valued variables (one per possible ballot) and 6 equality constraints.
The variables are:
n_i : the number of times ballot i is cast, i = 1,2,...,20
#<Set: {1, 3, 5}> is an example of one of the 20 ballots.
There are two types of known constants:
a_ij : equal to 1 if ballot i contains a vote for candidate j (i = 1,2,...,20; j = 1,...,6), else zero
v_j : the number of votes candidate j is to receive, j = 1,...,6
The constraints are as follows.
∑_i n_i·a_ij = v_j, j = 1,...,6
n_i >= 0 and integer-valued, i = 1,2,...,20
The first set of constraints ensures that each candidate receives the specified number of votes. The second set of constraints is implicit in ILPs.
ILPs also have an objective function to be maximized or minimized, subject to the constraints. Here only a feasible solution is desired, so the objective function might be expressed as
max 0·n_1
or something equivalent.
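For illustration, here is a minimal sketch of this formulation in Python using the PuLP modeling library (the variable names and the zero objective are just one possible encoding):

from itertools import combinations
import pulp  # pip install pulp

votes = {'a': 91, 'b': 66, 'c': 63, 'd': 63, 'e': 60, 'f': 59}
ballots = list(combinations(votes, 3))  # the 20 possible ballots

prob = pulp.LpProblem('ballot_tally', pulp.LpMinimize)
n = {b: pulp.LpVariable('n_' + ''.join(b), lowBound=0, cat='Integer') for b in ballots}
prob += pulp.lpSum([])  # constant (zero) objective: we only want feasibility
for cand, v in votes.items():
    # each candidate's total over all ballots containing them must equal v
    prob += pulp.lpSum(n[b] for b in ballots if cand in b) == v

prob.solve()
print(pulp.LpStatus[prob.status])
for b, var in n.items():
    if var.value():
        print(b, int(var.value()))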
I can't believe I was so stuck on this, but thank you @mcdowella for shaking me free of it.
We can just keep randomly shuffling and shifting the lists until we get a list where each group contains exactly 3 elements. It is definitely brute force, but for a single result it is reasonably efficient (albeit unpredictable).
result = {a: 91, b: 66, c: 63, d: 63, e: 60, f: 59}
votes = result.map { |candidate, count| [candidate] * count }

def build_election(arr)
  134.times.map do |x|
    arr.shuffle.select { |a| !a.empty? }.first(3).map do |s|
      s.pop
    end
  end
end

@a = build_election(votes.map(&:dup)) until @a&.all? { |s| s.size == 3 }
Related
Considering an infinite and perfectly random deck of cards (or, equivalently, returning the drawn card and reshuffling a normal deck after every draw), how do I calculate the probability that exactly x suits will be pulled out in y tries? For example, if we draw five cards from the deck, what's the probability of all four suits being present?
I bruteforced this scenario for 5 cards by generating every possible combination and considering it equally likely (since I'm repositioning the cards after every draw), and found the following values: 4/1024 for exactly one suit, 180/1024 for two, 600/1024 for three and 240/1024 for four suits.
The way I brute-forced it was: I generated all possible combinations with five nested loops, one per card, each iterating over the four suits (0 to 3), appending every combination to a list, then running the following function for every element in the list to check whether a combination has exactly n different suits:
def valid(l, n):
    # collect the distinct elements of l into arr, then check whether
    # there are exactly n of them (equivalent to: len(set(l)) == n)
    arr = [l[0]]
    for i in range(0, len(l)):
        aux = 0
        for j in range(0, len(arr)):
            if arr[j] == l[i]:
                aux = 1
        if aux == 0:
            arr.append(l[i])
    return len(arr) == n
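For reference, the same enumeration can be driven with itertools.product instead of five nested loops; counting distinct suits per outcome reproduces the tallies above (4, 180, 600, and 240 out of 1024):

from itertools import product
from collections import Counter

# 4 suits, 5 draws with replacement: 4**5 = 1024 equally likely outcomes
counts = Counter(len(set(draw)) for draw in product(range(4), repeat=5))
print(counts)  # Counter({3: 600, 4: 240, 2: 180, 1: 4})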
It's a trivial matter for one suit, however when considering four of them I could not reach the expected value: 1/4^4 for a specific combination (considering the fifth card might be anything), times the number of ways to arrange those 4 cards in 5 slots (120) = 120/256, roughly two times my bruteforced result.
What am I missing here?
I have a set of points (x, y), and I need to return the two points with the minimal distance between them.
I am using this:
http://www.cs.ucsb.edu/~suri/cs235/ClosestPair.pdf
but I don't really understand how the algorithm works.
Can someone explain more simply how the algorithm works, or suggest another idea?
Thanks!
If the number of points is small, you can use the brute force approach, i.e.:
for each point, find the closest point among the other points and keep the minimum distance found so far together with the indices of the two points.
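A minimal Python sketch of that brute force (math.dist needs Python 3.8+):

from itertools import combinations
from math import dist

def closest_pair_brute_force(points):
    # O(n^2): compare every pair and keep the closest
    return min(combinations(points, 2), key=lambda pq: dist(*pq))

print(closest_pair_brute_force([(0, 0), (7, 6), (2, 20), (5, 8)]))  # ((7, 6), (5, 8))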
If the number of points is large, I think you may find the answer in this thread:
Shortest distance between points algorithm
The solution to the Closest Pair Problem with the best time complexity, O(n log n), is the divide-and-conquer approach, as mentioned in the document that you have read.
Divide-and-conquer Approach for Closest-Pair Problem
The easiest way to understand this algorithm is to read an implementation of it in a high-level language like Python (sometimes understanding an algorithm or its pseudocode is harder than understanding real code):
# closest pairs by divide and conquer
# David Eppstein, UC Irvine, 7 Mar 2002

from __future__ import generators

def closestpair(L):
    def square(x): return x*x
    def sqdist(p,q): return square(p[0]-q[0])+square(p[1]-q[1])

    # Work around ridiculous Python inability to change variables in outer scopes
    # by storing a list "best", where best[0] = smallest sqdist found so far and
    # best[1] = pair of points giving that value of sqdist. Then best itself is never
    # changed, but its elements best[0] and best[1] can be.
    #
    # We use the pair L[0],L[1] as our initial guess at a small distance.
    best = [sqdist(L[0],L[1]), (L[0],L[1])]

    # check whether pair (p,q) forms a closer pair than one seen already
    def testpair(p,q):
        d = sqdist(p,q)
        if d < best[0]:
            best[0] = d
            best[1] = p,q

    # merge two sorted lists by y-coordinate
    def merge(A,B):
        i = 0
        j = 0
        while i < len(A) or j < len(B):
            if j >= len(B) or (i < len(A) and A[i][1] <= B[j][1]):
                yield A[i]
                i += 1
            else:
                yield B[j]
                j += 1

    # Find closest pair recursively; returns all points sorted by y coordinate
    def recur(L):
        if len(L) < 2:
            return L
        split = len(L)//2
        splitx = L[split][0]
        L = list(merge(recur(L[:split]), recur(L[split:])))

        # Find possible closest pair across split line
        # Note: this is not quite the same as the algorithm described in class, because
        # we use the global minimum distance found so far (best[0]), instead of
        # the best distance found within the recursive calls made by this call to recur().
        # E is the strip of points near the split line (compare squared values,
        # since best[0] is a squared distance)
        E = [p for p in L if square(p[0]-splitx) < best[0]]
        for i in range(len(E)):
            for j in range(1,8):
                if i+j < len(E):
                    testpair(E[i],E[i+j])
        return L

    L.sort()
    recur(L)
    return best[1]

closestpair([(0,0),(7,6),(2,20),(12,5),(16,16),(5,8),
             (19,7),(14,22),(8,19),(7,29),(10,11),(1,13)])
# returns: (7,6),(5,8)
Taken from: https://www.ics.uci.edu/~eppstein/161/python/closestpair.py
Detailed explanation:
First we define a squared Euclidean distance function, to prevent code repetition:
def square(x): return x*x # Define square helper
def sqdist(p,q): return square(p[0]-q[0])+square(p[1]-q[1]) # Define squared Euclidean distance function
Then we are taking the first two points as our initial best guess:
best = [sqdist(L[0],L[1]), (L[0],L[1])]
This is a function definition for comparing Euclidean distances of next pair with our current best pair:
def testpair(p,q):
    d = sqdist(p,q)
    if d < best[0]:
        best[0] = d
        best[1] = p,q
def merge(A,B): is just a helper that merges two sorted lists (sorted by y-coordinate) that were previously divided in half.
def recur(L): is the actual body of the algorithm, so I will explain this function definition in more detail:
if len(L) < 2:
    return L
With this part, the algorithm terminates the recursion if there are fewer than two points left in the list.
Split the list in half: split = len(L)//2
Recurse on each half (the function calls itself), then merge the sorted results: L = list(merge(recur(L[:split]), recur(L[split:])))
Then, lastly, these nested loops test each point in the strip E against its next seven neighbors in y-order:
for i in range(len(E)):
    for j in range(1,8):
        if i+j < len(E):
            testpair(E[i],E[i+j])
As a result, if a better pair is found, the best pair is updated.
So they solve the problem using a divide-and-conquer approach. Binary search or divide-and-conquer is mega fast. Basically, if you can split a dataset into two halves, and keep doing that until you find some info you want, you are doing it as fast as humanly and computerly possible most of the time.
For this question, it means that we divide the data set of points into two sets, S1 and S2.
All the points are numerical, right? So we have to pick some number at which to divide the dataset.
So we pick some number m and say it is the median.
So let's take a look at an example:
(14, 2)
(11, 2)
(5, 2)
(15, 2)
(0, 2)
What's the closest pair?
Well, they all have the same Y coordinate, so we can look at Xs only... X shortest distance is 14 to 15, a distance of 1.
How can we figure that out using divide-and-conquer?
We look at the greatest value of X and the smallest value of X and we choose the median as a dividing line to make our two sets.
Our median is 7.5 in this example.
We then make 2 sets
S1: (0, 2) and (5, 2)
S2: (11, 2) and (14, 2) and (15, 2)
Median: 7.5
We must keep track of the median for every split, because that is actually a vital piece of knowledge in this algorithm. They don't show it very clearly on the slides, but knowing the median value (where you split a set to make two sets) is essential to solving this question quickly.
We keep track of a value they call delta in the algorithm. Ugh, I don't know why most computer scientists absolutely suck at naming variables; you need descriptive names when you code so you don't forget what the f000 you coded 10 years ago. So instead of delta, let's call this value our-shortest-twig-from-the-median-so-far.
Since we have the median value of 7.5 let's go and see what our-shortest-twig-from-the-median-so-far is for Set1 and Set2, respectively:
Set 1: shortest-twig-from-the-median-so-far 2.5 (5 to m, where m is 7.5)
Set 2: shortest-twig-from-the-median-so-far 3.5 (looking at 11 to m)
So I think the key take-away from the algorithm is that this shortest-twig-from-the-median-so-far is something that you're trying to improve upon every time you divide a set.
Since S1 in our case has 2 elements only, we are done with the left set, and we have 3 in the right set, so we continue dividing:
S2 = { (11,2) (14,2) (15,2) }
What do you do? You make a new median, call it S2-median
S2-median is halfway between 15 and 11... or 13, right? My math may be fuzzy, but I think that's right so far.
So let's look at the shortest-twig-so-far-for-our-right-side-with-median-thirteen ...
15 to 13 is... 2
11 to 13 is .... 2
14 to 13 is ... 1 (!!!)
So our m value or shortest-twig-from-the-median-so-far is improved (where we updated our median from before because we're in a new chunk or Set...)
Now that we've found it we know that (14, 2) is one of the points that satisfies the shortest pair equation. You can then check exhaustively against the points in this subset (15, 11, 14) to see which one is the closer one.
Clearly, (15,2) and (14,2) are the winning pair in this case.
Does that make sense? You must keep track of the median when you cut the set, and keep a new median every time you cut the set, until you have only 2 elements remaining on each side (or in our case 3).
The magic is in the median or shortest-twig-from-the-median-so-far
Thanks for asking this question; I went in not knowing how this algorithm worked, but found the right highlighted bullet point on the slide and rolled with it. Do you get it now? I don't know how to explain the median magic other than: binary search is f000ing awesome.
Consider a set of 13 Danish, 11 Japanese, and 8 Polish people. It is well known that the number of different ways of dividing this set of people into groups is the (13+11+8) = 32nd Bell number (the number of set partitions). However, we are asked to find the number of possible set partitions under a given constraint. The question is as follows:
A set partition is said to be good if it has no group consisting of at least two people that only includes a single nationality. How many good partitions there are for this set? (A group may include only one person.)
The brute force approach requires going through about 10^26 partitions and checking which ones are good. This seems pretty unfeasible, especially if the groups are larger or one introduces other nationalities. Is there a smart way instead?
EDIT: As a side note. There probably is no hope for a really nice solution. A highly esteemed expert in combinatorics answered a related question, which, I think, basically says that the related problem, and thus this problem also, is very difficult to solve exactly.
Here's a solution using dynamic programming.
It starts from an empty set, then adds one element at a time and calculates all the valid partitions.
The state space is huge, but notice that to be able to calculate the next step, we only need to know the following things about a partition:
For each nationality, how many sets it contains that consists of only a single member of that nationality. (e.g.: {a})
How many sets it contains with mixed elements. (e.g.: {a, b, c})
For each of these configurations I only store the total count. Example:
[0, 1, 2, 2] -> 3
{a}{b}{c}{mixed}
e.g.: 3 partitions that look like: {b}, {c}, {c}, {a,c}, {b,c}
Here's the code in Python (updated for Python 3: functools.reduce, dict.items(), print()):
import collections
from functools import reduce
from operator import mul
from fractions import Fraction

def nCk(n, k):
    # binomial coefficient: n choose k
    return int(reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1))

def good_partitions(l):
    n = len(l)
    i = 0
    prev = collections.defaultdict(int)
    while l:
        # any more people of this nationality?
        if l[0] == 0:
            l.pop(0)
            i += 1
            continue
        l[0] -= 1
        curr = collections.defaultdict(int)
        for solution, total in prev.items():
            for idx, item in enumerate(solution):
                my_solution = list(solution)
                if idx == i:
                    # add element as a new set
                    my_solution[i] += 1
                    curr[tuple(my_solution)] += total
                elif my_solution[idx]:
                    if idx != n:
                        # add to a set consisting of one element
                        # or merge into multiple sets that consist of one element
                        cnt = my_solution[idx]
                        c = cnt
                        while c > 0:
                            my_solution = list(solution)
                            my_solution[n] += 1
                            my_solution[idx] -= c
                            curr[tuple(my_solution)] += total * nCk(cnt, c)
                            c -= 1
                    else:
                        # add to a mixed set
                        cnt = my_solution[idx]
                        curr[tuple(my_solution)] += total * cnt
        if not prev:
            # one set with one element
            lone = [0] * (n+1)
            lone[i] = 1
            curr[tuple(lone)] = 1
        prev = curr
    return sum(prev.values())

print(good_partitions([1, 1, 1, 1]))       # 15
print(good_partitions([1, 1, 1, 1, 1]))    # 52
print(good_partitions([2, 1]))             # 4
print(good_partitions([13, 11, 8]))        # 29811734589499214658370837
It produces correct values for the test cases. I also tested it against a brute-force solution (for small values), and it produces the same results.
An exact analytic solution is hard, but a polynomial time+space dynamic programming solution is straightforward.
First of all, we need an absolute order on the size of groups. We do that by comparing how many Danes, Japanese, and Poles we have.
Next, the function to write is this one.
m is the maximum group size we can emit
p is the number of people of each nationality that we have left to split
max_good_partitions_of_maximum_size(m, p) is the number of "good partitions"
    we can form from p people, with no group being larger than m
Clearly you can write this as a somewhat complicated recursive function that always selects the next group to use, then calls itself with that group's size as the new maximum, subtracting the group from p. If you had this function, then your answer is simply max_good_partitions_of_maximum_size(p, p) with p = [13, 11, 8]. But that is going to be a brute force search that won't run in reasonable time.
Finally, apply memoization (https://en.wikipedia.org/wiki/Memoization) by caching every call to this function, and it will run in polynomial time. However, you will also have to cache a polynomial number of calls to it.
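This is not the exact recursion described above (which orders groups by size); as a sketch of the same memoize-the-recursion idea, the variant below instead designates one concrete remaining person at each step and sums over the possible compositions of that person's group (count_good_partitions is a hypothetical name; math.comb needs Python 3.8+):

from functools import lru_cache
from itertools import product
from math import comb

def count_good_partitions(counts):
    @lru_cache(maxsize=None)
    def f(remaining):
        if sum(remaining) == 0:
            return 1
        # designate one concrete person from the first non-empty nationality;
        # every partition puts them in exactly one group, so sum over the
        # possible compositions of that group
        first = next(i for i, c in enumerate(remaining) if c)
        ranges = [range(c + 1) for c in remaining]
        ranges[first] = range(1, remaining[first] + 1)  # the group contains them
        total = 0
        for comp in product(*ranges):
            if sum(comp) >= 2 and sum(1 for c in comp if c) < 2:
                continue  # a monochromatic group of 2+ people is not "good"
            ways = 1
            for i, c in enumerate(comp):
                extra = 1 if i == first else 0  # the designated person is fixed
                ways *= comb(remaining[i] - extra, c - extra)
            total += ways * f(tuple(r - c for r, c in zip(remaining, comp)))
        return total
    return f(tuple(counts))

print(count_good_partitions((2, 1)))       # 4, matches good_partitions([2, 1])
print(count_good_partitions((1, 1, 1, 1))) # 15
print(count_good_partitions((13, 11, 8)))  # expected to match the DP result above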
I need to generate a list of numbers (about 120). The numbers range from 1 to X (max 10), both included. The algorithm should use every number an equal amount of times, or at least try to; if some numbers are used once less, that's OK.
This is the first time I have had to make this kind of algorithm. I've created very simple ones before, but I'm stumped on how to do this. I tried googling first, though I don't really know what to call this kind of algorithm, so I couldn't find anything.
Thanks a lot!
It sounds like what you want to do is first fill a list with the numbers you want and then shuffle that list. One way to do this would be to add each of your numbers to the list and then repeat that process until the list has as many items as you want. After that, randomly shuffle the list.
In pseudo-code, generating the initial list might look something like this:
list = []
while length(list) < N
    for i in 1, 2, ..., X
        if length(list) >= N
            break
        end if
        list.append(i)
    end for
end while
I leave the shuffling part as an exercise to the reader.
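(For reference, the standard shuffle is the Fisher-Yates algorithm, which is essentially what Python's random.shuffle does; a minimal sketch:)

import random

def fisher_yates_shuffle(items):
    # classic in-place Fisher-Yates shuffle
    for i in range(len(items) - 1, 0, -1):
        j = random.randint(0, i)  # pick a position up to and including i
        items[i], items[j] = items[j], items[i]
    return items

print(fisher_yates_shuffle(list(range(1, 11))))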
EDIT:
As pointed out in the comments, when N is not a multiple of X the above will always use the smaller numbers one more time than the larger ones. If this isn't what's desired, you could iterate over the possible numbers in a random order. For example:
list = []
numbers = shuffle( [1, 2, ..., X] )
while length(list) < N
    for i in 1, 2, ..., X
        if length(list) >= N
            break
        end if
        list.append( numbers[i] )
    end for
end while
I think this should remove that bias.
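In Python, the whole idea, including the shuffle-first fix, might look like this (balanced_random_list is a hypothetical name):

import random

def balanced_random_list(x, n):
    base = list(range(1, x + 1))
    random.shuffle(base)               # random order, so no value is systematically shorted
    values = (base * -(-n // x))[:n]   # repeat to at least n items, then trim to exactly n
    random.shuffle(values)             # randomize the final order
    return values

print(balanced_random_list(10, 120))   # each of 1..10 appears exactly 12 times here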
What you want is a uniformly distributed random number (wiki). It means that every value between 1 and 10 is equally likely on each draw, so over many draws each number should appear roughly (though not exactly) the same number of times.
The Random class in Java gives a fairly uniform distribution, so just go for it. To test, try this:
Random rand = new Random();
for (int i = 0; i < 10; i++) {
    int rNum = rand.nextInt(10) + 1;  // nextInt(10) returns 0..9, so add 1 to get 1..10
    System.out.println(rNum);
}
Then see whether the results cover all the numbers between 1 and 10.
One more similar discussion that might help: Uniform distribution with Random class
Suppose that you are given three "options", A, B and C.
Your algorithm must pick and return a random one. For this, it is pretty simple to just put them in an array {A,B,C} and generate a random number (0, 1 or 2) which will be the index of the element in the array to be returned.
Now, there is a variation to this algorithm: Suppose that A has a 40% chance of being picked, B a 20%, and C a 40%. If that was the case, you could have a similar approach: generate an array {A,A,B,C,C} and have a random number (0, 1, 2, 3, 4) to pick the element to be returned.
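(A quick Python illustration of that expanded-array idea, with the 40/20/40 weights reduced to the ratio 2:1:2:)

import random

options = ['A', 'A', 'B', 'C', 'C']  # one slot per unit of weight
print(options[random.randrange(len(options))])  # a uniform index gives a weighted pick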
That works. However, I feel that it is very inefficient. Imagine using this algorithm for a large number of options. You would be creating a somewhat big array, maybe with 100 elements representing 1% each. Now, that's still not quite big, but supposing that your algorithm is used many times per second, this could be troublesome.
I've considered making a class called Slot, which has two properties: .value and .size. One slot is created for each option, where the .value property is the value of the option, and .size is the equivalent of the number of occurrences of that option in the array. Then generate a random number from 0 to the total number of occurrences and check which slot the number falls in.
I'm more concerned about the algorithm, but here is my Ruby attempt on this:
class Slot
  attr_accessor :value
  attr_accessor :size

  def initialize(value, size)
    @value = value
    @size = size
  end
end

def picker(options)
  slots = []
  totalSize = 0
  options.each do |value, size|
    slots << Slot.new(value, size)
    totalSize += size
  end
  pick = rand(totalSize) # 0 .. totalSize - 1
  currentStack = 0
  slots.each do |slot|
    if pick < currentStack + slot.size # strict <, since pick is zero-based
      return slot.value
    else
      currentStack += slot.size
    end
  end
  return nil
end

50.times do
  print picker({"A" => 40, "B" => 20, "C" => 40})
end
Which outputs:
CCCCACCCCAAACABAAACACACCCAABACABABACBAAACACCBACAAB
Is there a more efficient way to implement an algorithm that picks a random option, where each option has a different probability of being picked?
The simplest way is probably to write a case statement:
def get_random()
  case rand(100) + 1
  when 1..50 then 'A'
  when 51..75 then 'B'
  when 76..100 then 'C'
  end
end
The problem with that is that you cannot pass any options, so you can write a function like this if you want it to be able to take options. The one below is very much like the one you wrote, but a bit shorter:
def picker(options)
  current, max = 0, options.values.inject(:+)
  random_value = rand(max) + 1
  options.each do |key, val|
    current += val
    return key if random_value <= current
  end
end
# A with 25% prob, B with 75%.
50.times do
  print picker({"A" => 1, "B" => 3})
end
# => BBBBBBBBBABBABABBBBBBBBABBBBABBBBBABBBBBBABBBBBBBA

# If the weights add up to 100, each number represents a percentage.
50.times do
  print picker({"A" => 40, "T" => 30, "C" => 20, "G" => 10})
end
# => GAAAATATTGTACCTCAATCCAGATACCTTAAGACCATTAAATCTTTACT
As a first approximation to a more efficient algorithm, if you compute the cumulative distribution function (which is just one pass over the distribution function, computing a running sum), then you can find the position of the randomly chosen integer using a binary search instead of a linear search. This will help if you have a lot of options, since it reduces the search time from O(#options) to O(log #options).
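A minimal Python sketch of that idea, using bisect over the precomputed running sums (make_picker is a hypothetical name):

import bisect
import random
from itertools import accumulate

def make_picker(options):
    # options: {value: weight}; compute the cumulative sums once
    values = list(options)
    cumulative = list(accumulate(options[v] for v in values))

    def pick():
        r = random.random() * cumulative[-1]               # uniform in [0, total)
        return values[bisect.bisect_right(cumulative, r)]  # O(log #options) lookup
    return pick

pick = make_picker({"A": 40, "B": 20, "C": 40})
print("".join(pick() for _ in range(50)))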
There is an O(1) solution, though. Here's the basic outline.
Let's say we have N options, 1...N, with weights ω_1...ω_N, where all of the ω values are at least 0. For simplicity, we scale the weights so their mean is 1, or in other words, their sum is N. (We just multiply them by N/Σω. We don't actually have to do this, but it makes the next couple of paragraphs easier to type without MathJax.)
Now, create a vector of N elements, where each element has two option identifiers (lo and hi) and a cutoff p. The option identifiers are just integers 1...N, and p will be computed as a real number in the range (0, 1.0].
We proceed to fill in the vector as follows. For each element i in turn:
If some ω_j is exactly 1.0, then we set:
lo_i = j
hi_i = j
p_i = 1.0
And we remove ω_j from the list of weights.
Otherwise, there must be some ω_j < 1.0 and some ω_k > 1.0. (That's because the average weight is 1.0 and none of them have the average value, so some of them must be below the average and some above it; it is impossible for all elements to be greater than the average, or for all to be less than it.) Now, we set:
lo_i = j
hi_i = k
p_i = ω_j
ω_k = ω_k - (1 - ω_j)
And once again, we remove ω_j from the weights.
Note that in both cases, we have removed one weight, and we have reduced the sum of the weights by 1.0. So the average weight is still 1.0.
We continue in this fashion until the entire vector is filled. (The last element will have p = 1.0).
Given this vector, we can select a weighted random option as follows:
Generate a random integer i in the range 1...N and a random floating-point value r in the range (0, 1.0]. If r < p_i then we select option lo_i; otherwise, we select option hi_i.
It should be clear why this works from the construction of the vector. The weights of each above-average-weight option are distributed amongst the various vector elements, while each below-average-weight option is assigned to one part of some vector element with a corresponding probability of selection.
In a real implementation, we would map the range of weights onto integer values, and make the total weights close to the maximum integer (it has to be a multiple of N, so there will be some slosh.) We can then select a slot and select the weight inside the slot from a single random integer. In fact, we can modify the algorithm to avoid the division by forcing the number of slots to be a power of 2 by adding some 0-weighted options. Because the integer arithmetic will not work out perfectly, a bit of fiddling around will be necessary, but the end result can be made to be statistically correct, modulo the characteristics of the PRNG being used, and it will execute almost as fast as a simple unweighted selection of N options (one shift and a couple of comparisons extra), at the cost of a vector occupying less than 6N storage elements (counting the possibility of having to almost double the number of slots).
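The construction described above is the classic alias method (Walker/Vose). Here is a minimal floating-point sketch of it in Python, without the integer-arithmetic refinements just described:

import random

def build_alias_table(weights):
    # O(N) setup; returns (prob, alias) arrays that allow O(1) sampling
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]  # scale so the mean weight is 1.0
    lo = [i for i, w in enumerate(scaled) if w < 1.0]
    hi = [i for i, w in enumerate(scaled) if w >= 1.0]
    prob = [1.0] * n            # slots left over at the end keep p = 1.0
    alias = list(range(n))
    while lo and hi:
        j = lo.pop()
        k = hi.pop()
        prob[j] = scaled[j]     # slot j selects option j with probability scaled[j]...
        alias[j] = k            # ...and falls through to option k otherwise
        scaled[k] -= 1.0 - scaled[j]  # option k donated the remainder of slot j
        (lo if scaled[k] < 1.0 else hi).append(k)
    return prob, alias

def alias_pick(prob, alias):
    i = random.randrange(len(prob))  # pick a slot uniformly
    return i if random.random() < prob[i] else alias[i]

prob, alias = build_alias_table([40, 20, 40])  # options 0, 1, 2
print([alias_pick(prob, alias) for _ in range(20)])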
While this is not a direct answer, I will show you a source that can help you outline this problem: http://www.av8n.com/physics/arbitrary-probability.htm.
Edit:
Just found a nice Ruby gem for this: the pickup gem.
require 'pickup'
headings = {
  A: 40,
  B: 20,
  C: 40,
}
pickup = Pickup.new(headings)
pickup.pick
#=> A
pickup.pick
#=> B
pickup.pick
#=> A
pickup.pick
#=> C
pickup.pick
#=> C