Dividing an array into equally weighted subarrays - ruby

Algorithm question here.
I have an unordered array containing product weights, e.g. [3, 2, 5, 5, 8] which need to be divided up into smaller arrays.
Rules:
REQUIRED: Should return 1 or more arrays.
REQUIRED: No array should sum to more than 12.
REQUIRED: Return minimum possible number of arrays, ex. total weight of example above is 23, which can fit into two arrays.
IDEALLY: Arrays should be weighted as evenly as possible.
In the example above, the ideal return would be [ [3, 8], [2, 5, 5] ]
My current thoughts:
Number of arrays to return will be (sum(input_array) / 12).ceil
A greedy algorithm could work well enough?

This is a combination of the bin packing problem and multiprocessor scheduling problem. Both are NP-hard.
Your three requirements constitute the bin packing problem: find the minimal number of bins of a fixed size (12) that fit all the numbers.
Once you solve that, you have the multiprocessor scheduling problem: given a fixed number of bins, what is the most even way to distribute the numbers among them.
There are a number of well-known approximation algorithms for both problems.
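As an illustration of one such approximation for the bin packing half, here is a minimal sketch of first-fit decreasing in Python. The function name `first_fit_decreasing` and the `capacity` parameter are illustrative assumptions, not something from the question. The heuristic does not guarantee the minimal number of arrays, and it makes no attempt to balance the weights.

```python
def first_fit_decreasing(weights, capacity=12):
    """Pack weights into bins of at most `capacity`, largest item first.

    This is a heuristic: it usually gives few bins, but not always the minimum.
    """
    bins = []
    for w in sorted(weights, reverse=True):
        if w > capacity:
            raise ValueError(f"item {w} does not fit into any bin")
        # put the item into the first bin that still has room
        for b in bins:
            if sum(b) + w <= capacity:
                b.append(w)
                break
        else:
            # no existing bin had room, so open a new one
            bins.append([w])
    return bins


print(first_fit_decreasing([3, 2, 5, 5, 8]))
```

For the example input this happens to produce two arrays, the same grouping the question calls ideal (in a different order).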

How about something with an entirely different take on it. Something really simple. Like this, which is based on common horse sense:
module Splitter
  def self.split(values, max_length = 12)
    # if the sum of all values does not exceed max_length, everything
    # fits into a single array
    return [values] unless self.sum(values) > max_length
    optimized = []
    current = []
    # start off by ordering the values. perhaps it's a good idea
    # to start off with the smallest values first; this will result
    # in gathering as many values as possible in the first array. This
    # in order to conform to the rule "Should return minimum possible
    # number of arrays"
    ordered_values = values.sort
    ordered_values.each do |v|
      if self.sum(current) + v > max_length
        # finish up the current iteration if we've got an optimized pair
        optimized.push(current)
        # reset for the next iteration
        current = []
      end
      current.push(v)
    end
    # push the last iteration
    optimized.push(current)
    return optimized
  end
  # calculates the sum of a collection of numbers (0 for an empty one)
  def self.sum(numbers)
    return numbers.inject(0) { |sum, x| sum + x }
  end
end
Which can be used like so:
product_weights = [3, 2, 5, 5, 8]
p Splitter.split(product_weights)
The output will be:
[[2, 3, 5], [5], [8]]
Now, as said before, this is a really simple sample. And I've excluded the validations for empty or non-numeric values in the array for brevity. But it does seem to conform to your primary requirements:
Splitting the (expected: all numeric) values into arrays
With a ceiling per array, defaulting in the sample to 12
Return the minimum number of arrays by collecting the smallest numbers first. EDIT: after the edit from the comments, this indeed doesn't work
I do have some doubts regarding the comment on "returning minimum possible number of arrays, and balance the weights throughout those as evenly as possible". I'm sure someone else will come up with an implementation of a better and math-proven algorithm that conforms to that requirement, but perhaps this is at least a suitable example for the discussion?

Related

Interview Question: Remove repeating numbers at the end of an array

I got a surprising interview question today at a big Bay Area tech company that I was absolutely stumped by despite seeming so easy. Was wondering if anyone has seen it or can offer a simpler solution as the interviewer didn't want to show me the answer. The solution can be written in any language or pseudocode.
Question:
Given a list of numbers, remove any extraneous repeating suffix sequences of numbers that appear at the end of the list until it has no repeating suffix sequences. The repeating sequence can be cut-off.
For example:
[1,2,3,4,5,6,7,5,6,7,5,6] -> [1,2,3,4,5,6,7]
explanation: [5, 6, 7] were repeating
Also consider the situation
[1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5,] -> [1,2,3,4,5,4,5,1] # not [1,2,3,4,5,4,5,1,4,5,4,5,1]
explanation: [4,5,4,5,1] is a repeating sequence
There are always two ways to approach this topic. Finding any solution and finding an efficient one. It is usually better to start with any and then think on how to optimize it.
Now as we can see in the second example, the problem is complicated by the fact that the repeating pattern is not known. So we could just do it for all the possible patterns at the end. Then we would need to check two things
is it actually repeating
how long is the result
Then we could just take the shortest result. Here is the Python code:
def remove_repeating_tail(a: list) -> list:
    results = []
    for i in range(len(a)):
        tail = a[i:]
        results.append(remove_repeats(a, tail))
    if len(results) == 0:
        return a
    return sorted(results, key=len)[0]
Also we made sure we cover all the cases. Empty list, no repeating pattern. Next we need to write remove_repeats. Also we check the empty repeating pattern, so we need to be aware of that.
def remove_repeats(a: list, tail: list) -> list:
    assert len(tail) <= len(a)
    if len(tail) == 0:
        return a
    remainder = a
    count = 0
    while remainder[-len(tail):] == tail:
        remainder = remainder[:-len(tail)]
        count += 1
    if count <= 1:
        return a
    return remainder
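While the list still ends with the tail, we strip one copy and count it. If the tail occurred fewer than two times, nothing was repeating and we return the list unchanged; otherwise we return what is left, which means every copy of the pattern is removed. Now it's time to test the code to see if it actually works, if that is possible in the interview.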
remove_repeating_tail([1,2,3,4,5,6,7,5,6,7,5,6])
-> [1, 2, 3, 4, 5, 6]
remove_repeating_tail([1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5])
-> [1, 2, 3, 4, 5, 4, 5]
Also good to check some other cases:
remove_repeating_tail([1,2,3,4])
-> [1, 2, 3, 4]
remove_repeating_tail([])
-> []
After quite a bit of fixing we got the above, which I think is correct. In particular I missed:
first I had an infinite loop in remove_repeats for an empty tail
remove_repeats always removed the tail and sometimes everything, as I wasn't checking that there is at least one repeat. I then added the counting.
I made simple mistakes like writing results = res instead of results.append(res) leading to some Exceptions.
Then a lot of simplification. First I used some sentinel None to communicate back that it is not repeating, but we could just return the whole list. Then I checked the repeating with some if before the while loop, but realized it's basically doing the same as the first iteration, so I used counting.
Similarly I don't like the if len(results) == 0: check. I would probably add a to the result in the beginning and remove the check, as now there is always a result. Then we could start the counting from 1 instead of 0. Still I kept it in.
If we want something fast, we first need to analyze the complexity.
So remove repeating tails for a list of size n and tail size k is: O(n / k). Then we call this function n times. And then we sort it. Wait why do we sort it, we could just take the minimum return min(results, key=len). That's better.
In each loop we call remove_repeats starting with k = 1 to n. So we have:
sum(k = 1 .. n) O(n / k). This is n / 1 + n / 2 + n / 3 + .. + n / n = n * H_n, where H_n is the n-th harmonic number (I had to look this up on Wikipedia). We can also just make our life easy and say it's less than O(n^2) for now. Otherwise I found the approximation H_n ≈ ln(n) + 0.58, so the sum is about n ln(n) + 0.58 n. So the number of loop iterations overall is O(n log n). Note that this counts iterations only; each iteration also compares and copies list slices, which adds to the real cost. Not too bad I would say. Is it optimal? Maybe. Here I would compare it to some other similar algorithms (like substring search, etc).
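To sanity-check that arithmetic, a small sketch can compare the exact sum n/1 + n/2 + ... + n/n with the logarithmic approximation. The helper names below are made up for this illustration, and the constant is the Euler-Mascheroni constant (about 0.5772).

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant, rounded


def total_iterations(n):
    # exact value of n/1 + n/2 + ... + n/n, i.e. n times the n-th harmonic number
    return sum(n / k for k in range(1, n + 1))


def approximation(n):
    # n * (ln(n) + gamma), the usual approximation of n * H_n
    return n * math.log(n) + EULER_GAMMA * n


for n in (10, 100, 1000):
    print(n, round(total_iterations(n), 2), round(approximation(n), 2))
```

The two columns stay within about 0.5 of each other, which supports the O(n log n) bound on the iteration count.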
Before going there, at this point, I would check with the interviewer, where he would like to go next. As there are many directions.
This seems a tricky question and there may not be a simple solution. The best solution I can think of would be O(n) time and O(n) space, and that is if I am not missing any edge case.
Let's take as example
[1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5] -> [1,2,3,4,5,4,5,1]
Steps would be as follows:
Iterate over the input array from last index to first and build a dictionary (hashtable) with every number in the array being a key and value: a list of positions where the specific number is found in the array.
Occurrences dictionary will become:
{
    5: [16, 14, 11, 9, 6, 4],
    4: [15, 13, 10, 8, 5, 3],
    1: [12, 7, 0],
    3: [2],
    2: [1]
}
Find the possible suffix lengths by calculating deltas between every position and first position for every number. This way we take into consideration the case in which a specific number repeats in the suffix or in the prefix.
We then add each distinct possible suffix length to a set.
We sort the possible suffix lengths in descending order.
We get following suffix lengths:
[12, 10, 7, 5, 2]
For every possible length l, we test if arr[n-1] == arr[n-1-l]. If l is our suffix's length, it means that the number at last position is repeated at exactly l positions before. We then check the last l elements to respect the same condition. If they do, we found the maximum suffix length. If not, the max suffix length is even smaller, so we check the next possible length.
After finding the correct suffix length, we delete the remaining numbers that repeat at positions pos-l. We then return the slice of array with suffix removed.
def removeRepeatingSuffixes(arr):
    if not arr:
        return []
    n = len(arr)
    occurrences = {}
    for i in range(n - 1, -1, -1):
        c = arr[i]
        if c not in occurrences:
            occurrences[c] = []
        occurrences[c].append(i)
    # treat edge case: no repeating suffix
    if len(occurrences[arr[n-1]]) == 1:
        return arr
    # create a set of possible suffix lengths,
    # based on the differences between the positions of each number.
    possible_suffixes_lengths_set = set()
    for c, olist in occurrences.items():
        if len(olist) >= 2:
            for i in range(len(olist)-1):
                delta = olist[i] - olist[len(olist)-1]
                possible_suffixes_lengths_set.add(delta)
    suff_lengths = sorted(possible_suffixes_lengths_set, reverse=True)
    for l in suff_lengths:
        if arr[n - 1] == arr[n - 1 - l]:
            # possible suffix length, check if last l characters repeat
            ok_length = True
            for j in range(n-2, n-1-l, -1):
                if arr[j] != arr[j-l]:
                    ok_length = False
                    break
            if ok_length:
                last_i = n-1-l
                while last_i > 0 and arr[last_i] == arr[last_i - l]:
                    last_i -= 1
                # return non-repeating slice, from 0 to last_i
                return arr[0:last_i + 1]
    # no candidate length matched, so there is nothing to remove
    return arr
A quick way to remove repeats or dedupe is to change the type to a set() instead of a list, although that drops every duplicate and the ordering, not just a repeating suffix.

How do I solve this question about Pigeonhole Principle (Discrete Mathematics)?

I am not understanding the following question. I mean I want to know the sample input output for this problem question: "The pigeonhole principle states that if a function f has n distinct inputs but less than n distinct outputs,then there exist two inputs a and b such that a!=b and f(a)=f(b). Present an algorithm to find a and b such that f(a)=f(b). Assume that the function inputs are 1,2,......,and n.?"
I am unable to solve this problem as I do not understand the question clearly. Looking for your help.
The pigeonhole principle says that if you have more items than boxes, at least one of the boxes must have multiple items in it.
If you want to find which items a != b have the property f(a) == f(b), a straightforward approach is to use a hashmap data structure. Use the function value f(x) as key to store the item value x. Iterate through the items, x=1,...,n. If there is no entry at f(x), store x. If there is, the current value of x and the value stored at f(x) are a pair of the type you're seeking.
In pseudocode:
h = {} # initialize an empty hashmap
for x in 1,...,n
    if h[f(x)] is empty
        h[f(x)] <- x # store x in the hashmap indexed by f(x)
    else
        (x, h[f(x)]) qualify as a match # do what you want with them
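The pseudocode above could be sketched in Python roughly as follows. The name `find_collision` and its signature are assumptions made for this example; it stops at the first pair it finds.

```python
def find_collision(f, n):
    """Return a pair (a, b) with a != b and f(a) == f(b), for inputs 1..n.

    Returns None if f happens to be injective on 1..n.
    """
    seen = {}  # maps an output value f(x) to the first input x that produced it
    for x in range(1, n + 1):
        y = f(x)
        if y in seen:
            return seen[y], x
        seen[y] = x
    return None


# 5 inputs, but x % 3 has only 3 possible outputs, so a collision must exist
print(find_collision(lambda x: x % 3, 5))
```

Each input is visited once and each dictionary operation is O(1) on average, so the whole search is O(n) expected time.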
If you want to identify all pigeons who have roommates, initialize the hashmap with empty sets. Then iterate through the values and append the current value x to the set indexed by f(x). Finally, iterate through the hashmap and pick out all sets with more than one element.
Since you didn't specify a language, for the fun of it I decided to implement the latter algorithm in Ruby:
N = 10 # number of pigeons
# Create an array of value/function pairs.
# Using N-1 for range of rand guarantees at least one duplicate random
# number, and with the nature of randomness, quite likely more.
value_and_f = Array.new(N) { |index| [index, rand(N-1)]}
h = {} # new hash
puts "Value/function pairs..."
p value_and_f # print the value/function pairs
value_and_f.each do |x, key|
  h[key] = [] unless h[key] # create an array if none exists for this key
  h[key] << x # append the x to the array associated with this key
end
puts "\nConfirm which values share function mapping"
h.keys.each { |key| p h[key] if h[key].length > 1 }
Which produces the following output, for example:
Value/function pairs...
[[0, 0], [1, 3], [2, 1], [3, 6], [4, 7], [5, 4], [6, 0], [7, 1], [8, 0], [9, 3]]
Confirm which values share function mapping
[0, 6, 8]
[1, 9]
[2, 7]
Since this implementation uses randomness, it will produce different results each time you run it.
Well let's go step by step.
I have 2 boxes. My father gave me 3 chocolates....
And I want to put those chocolates in 2 boxes. For our benefit let's name the chocolate a,b,c.
So how many ways we can put them?
[ab][c]
[abc][]
[a][bc]
And you see something strange? There is at least one box with more than 1 chocolate.
So what do you think?
You can try this with any number of boxes and any number of chocolates (more than the number of boxes). You will see that it always holds.
Well, let's make it easier:
I have 5 friends and 3 rooms. We are having a party. And now let's see what happens. (All my friends will sit in one of the rooms.)
I am claiming that there will be at least one room where there will be more than 1 friend.
My friends are quite mischievous, and knowing this they tried to prove me wrong.
Friend-1 selects room-1.
Friend-2 thinks: why room-1? Then I will be correct, so he selects room-2.
Friend-3 thinks the same... he avoids rooms 1 and 2 and gets into room-3.
Friend-4 now comes and he understands that there is no empty room left, so he has to enter an occupied room. And thus I become correct.
So you understand the situation?
There are n friends (function inputs) but unfortunately (or fortunately) their rooms (output values) number fewer than n. So of course there exist 2 friends of mine, a and b, who share the same room (the same value, f(a) = f(b)).
Continuing what https://stackoverflow.com/a/42254627/7256243 said.
Let's say that you map an array A of length N to an array B of length N-1.
Then the result could be an array B where one index holds 2 elements.
A = {1,2,3,4,5,6}
map A -> B
One possible solution could be:
B = {1,2,{3,4},5,6}
The mapping A -> B could be done in any number of ways.
Here in this example, both inputs 3 and 4 in array A map to the same index in array B.
I hope this is useful.

Maximum sum of n intervals in a sequence

I'm doing some programming "kata" which are skill building exercises for programming (and martial arts). I want to learn how to solve for algorithms like these in shorter amounts of time, so I need to develop my knowledge of the patterns. Eventually I want to solve in increasingly efficient time complexities (O(n), O(n^2), etc), but for now I'm fine with figuring out the solution with any efficiency to start.
The problem:
Given arr[10] = [4, 5, 0, 2, 5, 6, 4, 0, 3, 5]
Given various segment lengths, for example one 3-length segment, and two 2-length segments, find the optimal position of (or maximum sum contained by) the segments without overlapping the segments.
For example, the solution for this array and these segments is 32 (9 + 15 + 8), because:
{4 5} 0 2 {5 6 4} 0 {3 5}
What I have tried before posting on stackoverflow.com:
I've read through:
Algorithm to find maximum coverage of non-overlapping sequences. (I.e., the Weighted Interval Scheduling Prob.)
algorithm to find longest non-overlapping sequences
and I've watched MIT opencourseware and read about general steps for solving complex problems with dynamic programming, and completed a dynamic programming tutorial for finding Fibonacci numbers with memoization. I thought I could apply memoization to this problem, but I haven't found a way yet.
The theme of dynamic programming is to break the problem down into sub-problems which can be iterated to find the optimal solution.
What I have come up with (in an OO way) is
foreach (segment) {
    - find the greatest sum interval with length of this segment
}
This produces incorrect results, because not always will the segments fit with this approach. For example:
Given arr[7] = [0, 3, 5, 5, 5, 1, 0] and two 3-length segments,
The first segment will take 5, 5, 5, leaving no room for the second segment. Ideally I should memoize this scenario and try the algorithm again, this time avoiding 5, 5, 5, as a first pick. Is this the right path?
How can I approach this in a "dynamic programming" way?
If you place the first segment, you get two smaller sub-arrays: placing one or both of the two remaining segments into one of these sub-arrays is a sub-problem of just the same form as the original one.
So this suggests a recursion: you place the first segment, then try out the various combinations of assigning remaining segments to sub-arrays, and maximize over those combinations. Then you memoize: the sub-problems all take an array and a list of segment sizes, just like the original problem.
I'm not sure this is the best algorithm but it is the one suggested by a "direct" dynamic programming approach.
EDIT: In more detail:
The arguments to the valuation function should have two parts: one is a pair of numbers which represent the sub-array being analysed (initially [0,6] in this example) and the second is a multi-set of numbers representing the lengths of the segments to be allocated ({3,3} in this example). Then in pseudo-code you do something like this:
valuation( array_ends, the_segments):
    if the_segments is empty:
        return 0
    if sum of the_segments > array_ends[1] - array_ends[0]:
        return -infinity
    segment_length = length of chosen segment from the_segments
    remaining_segments = the_segments with chosen segment removed
    best_option = -infinity
    for segment_placement = array_ends[0] to array_ends[1] - segment_length:
        value1 = value of placing the chosen segment at segment_placement
        new_array1 = [array_ends[0],segment_placement]
        new_array2 = [segment_placement + segment_length,array_ends[1]]
        for each partition of remaining segments into seg1 and seg2:
            sub_value1 = valuation( new_array1, seg1)
            sub_value2 = valuation( new_array2, seg2)
            if value1 + sub_value1 + sub_value2 > best_option:
                best_option = value1 + sub_value1 + sub_value2
    return best_option
This code (modulo off by one errors and typos) calculates the valuation but it calls the valuation function more than once with the same arguments. So the idea of the memoization is to cache those results and avoid re-traversing equivalent parts of the tree. So we can do this just by wrapping the valuation function:
memoized_valuation(args):
    if args in memo_dictionary:
        return memo_dictionary[args]
    else:
        result = valuation(args)
        memo_dictionary[args] = result
        return result
Of course, you need to change the recursive call now to call memoized_valuation.
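For comparison, here is a minimal runnable sketch of a memoized solution in Python. Note that it uses a simpler, related formulation rather than the split-into-two-sub-arrays recursion described above: it scans the array left to right and, at each index, either skips the element or starts one of the remaining segments there. The function name `max_segment_sum` is an assumption for this example, and `functools.lru_cache` plays the role of the memo dictionary.

```python
from functools import lru_cache


def max_segment_sum(arr, segments):
    """Maximum total covered by non-overlapping segments of the given lengths.

    Returns float('-inf') if the segments cannot all be placed.
    """
    n = len(arr)
    # prefix[i] is the sum of arr[0:i], so any window sum is a subtraction
    prefix = [0]
    for value in arr:
        prefix.append(prefix[-1] + value)

    @lru_cache(maxsize=None)
    def best(i, remaining):
        # remaining is a sorted tuple of segment lengths still to be placed
        if not remaining:
            return 0
        if sum(remaining) > n - i:
            return float("-inf")  # not enough room left
        # option 1: leave arr[i] uncovered
        result = best(i + 1, remaining)
        # option 2: start one of the remaining segments at index i
        for length in set(remaining):
            rest = list(remaining)
            rest.remove(length)
            window = prefix[i + length] - prefix[i]
            result = max(result, window + best(i + length, tuple(rest)))
        return result

    return best(0, tuple(sorted(segments)))


print(max_segment_sum([4, 5, 0, 2, 5, 6, 4, 0, 3, 5], [3, 2, 2]))
print(max_segment_sum([0, 3, 5, 5, 5, 1, 0], [3, 3]))
```

The second call is the case where the greedy "take the best window first" approach gets stuck; the memoized search considers placements where neither segment takes 5, 5, 5 on its own.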

Picking a random option, where each option has a different probability of being picked

Suppose that you are given three "options", A, B and C.
Your algorithm must pick and return a random one. For this, it is pretty simple to just put them in an array {A,B,C} and generate a random number (0, 1 or 2) which will be the index of the element in the array to be returned.
Now, there is a variation to this algorithm: Suppose that A has a 40% chance of being picked, B a 20%, and C a 40%. If that was the case, you could have a similar approach: generate an array {A,A,B,C,C} and have a random number (0, 1, 2, 3, 4) to pick the element to be returned.
That works. However, I feel that it is very inefficient. Imagine using this algorithm for a large amount of options. You would be creating a somewhat big array, maybe with 100 elements representing a 1% each. Now, that's still not quite big, but supposing that your algorithm is used many times per second, this could be troublesome.
I've considered making a class called Slot, which has two properties: .value and .size. One slot is created for each option, where the .value property is the value of the option, and the .size one is the equivalent to the amount of occurrences of such option in the array. Then generate a random number from 0 to the total amount of occurrences and check on what slot did the number fall on.
I'm more concerned about the algorithm, but here is my Ruby attempt on this:
class Slot
  attr_accessor :value
  attr_accessor :size
  def initialize(value, size)
    @value = value
    @size = size
  end
end
def picker(options)
  slots = []
  totalSize = 0
  options.each do |value, size|
    slots << Slot.new(value, size)
    totalSize += size
  end
  pick = rand(totalSize)
  currentStack = 0
  slots.each do |slot|
    # pick ranges over 0...totalSize, so the comparison must be strict
    if pick < currentStack + slot.size
      return slot.value
    else
      currentStack += slot.size
    end
  end
  return nil
end
50.times do
  print picker({"A" => 40, "B" => 20, "C" => 40})
end
Which outputs:
CCCCACCCCAAACABAAACACACCCAABACABABACBAAACACCBACAAB
Is there a more efficient way to implement an algorithm that picks a random option, where each option has a different probability of being picked?
The simplest way is probably to write a case statement:
def get_random()
  case rand(100) + 1
  when 1..50 then 'A'
  when 51..75 then 'B'
  when 76..100 then 'C'
  end
end
The problem with that is that you cannot pass any options, so you can write a function like this if you want it to be able to take options. The one below is very much like the one you wrote, but a bit shorter:
def picker(options)
  current, max = 0, options.values.inject(:+)
  random_value = rand(max) + 1
  options.each do |key,val|
    current += val
    return key if random_value <= current
  end
end
# A with 25% prob, B with 75%.
50.times do
  print picker({"A" => 1, "B" => 3})
end
# => BBBBBBBBBABBABABBBBBBBBABBBBABBBBBABBBBBBABBBBBBBA
# If the weights add up to 100, each number represents a percentage.
50.times do
  print picker({"A" => 40, "T" => 30, "C" => 20, "G" => 10})
end
# => GAAAATATTGTACCTCAATCCAGATACCTTAAGACCATTAAATCTTTACT
As a first approximation to a more efficient algorithm, if you compute the cumulative distribution function (which is just one pass over the distribution function, computing a running sum), then you can find the position of the randomly chosen integer using a binary search instead of a linear search. This will help if you have a lot of options, since it reduces the search time from O(#options) to O(log #options).
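A minimal sketch of the cumulative-sum-plus-binary-search idea in Python, assuming positive integer weights. The class name `WeightedPicker` and its methods are made up for this example; `bisect` and `itertools.accumulate` come from the standard library.

```python
import bisect
import itertools
import random


class WeightedPicker:
    def __init__(self, options):
        # options maps each choice to a positive integer weight
        self.keys = list(options)
        # running sums, e.g. weights 40, 20, 40 become [40, 60, 100]
        self.cumulative = list(itertools.accumulate(options[k] for k in self.keys))

    def lookup(self, r):
        # binary search for the first running sum strictly greater than r
        return self.keys[bisect.bisect_right(self.cumulative, r)]

    def pick(self):
        # r is uniform over 0 .. total - 1
        return self.lookup(random.randrange(self.cumulative[-1]))


picker = WeightedPicker({"A": 40, "B": 20, "C": 40})
print("".join(picker.pick() for _ in range(50)))
```

The running sums are built once, so each pick costs one random number and one O(log n) search.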
There is an O(1) solution, though. Here's the basic outline.
Let's say we have N options, 1...N, with weights ω_1...ω_N, where all of the ω values are at least 0. For simplicity, we scale the weights so their mean is 1, or in other words, their sum is N. (We just multiply them by N/Σω. We don't actually have to do this, but it makes the next couple of paragraphs easier to type without MathJax.)
Now, create a vector of N elements, where each element has two option identifiers (lo and hi) and a cutoff p. The option identifiers are just integers 1...N, and p will be computed as a real number in the range 0 to 1.0, inclusive.
We proceed to fill in the vector as follows. For each element i in turn:
If some ω_j is exactly 1.0, then we set:
lo_i = j
hi_i = j
p_i = 1.0
And we remove ω_j from the list of weights.
Otherwise, there must be some ω_j < 1.0 and some ω_k > 1.0. (That's because the average weight is 1.0, and none of them have the average value. So some of them must have less and some of them more, because it is impossible for all elements to be greater than the average or all elements to be less than the average.) Now, we set:
lo_i = j
hi_i = k
p_i = ω_j
ω_k = ω_k - (1 - ω_j)
And once again, we remove ω_j from the weights.
Note that in both cases, we have removed one weight, and we have reduced the sum of the weights by 1.0. So the average weight is still 1.0.
We continue in this fashion until the entire vector is filled. (The last element will have p = 1.0).
Given this vector, we can select a weighted random option as follows:
Generate a random integer i in the range 1...N and a random floating point value r in the range (0, 1.0]. If r < p_i then we select option lo_i; otherwise, we select option hi_i.
It should be clear why this works from the construction of the vector. The weights of each above-average-weight option are distributed amongst the various vector elements, while each below-average-weight option is assigned to one part of some vector element with a corresponding probability of selection.
In a real implementation, we would map the range of weights onto integer values, and make the total weights close to the maximum integer (it has to be a multiple of N, so there will be some slosh.) We can then select a slot and select the weight inside the slot from a single random integer. In fact, we can modify the algorithm to avoid the division by forcing the number of slots to be a power of 2 by adding some 0-weighted options. Because the integer arithmetic will not work out perfectly, a bit of fiddling around will be necessary, but the end result can be made to be statistically correct, modulo the characteristics of the PRNG being used, and it will execute almost as fast as a simple unweighted selection of N options (one shift and a couple of comparisons extra), at the cost of a vector occupying less than 6N storage elements (counting the possibility of having to almost double the number of slots).
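The construction can be sketched in Python as follows. This is a floating-point illustration only, not the integer-based implementation described in the last paragraph. It pairs below-average and above-average weights using two work lists, which is one common way to organise the procedure, and the function names `build_alias_table` and `alias_pick` are assumptions. It also assumes the weights are non-negative with a positive sum.

```python
import random


def build_alias_table(weights):
    """Build a list of (lo, hi, p) slots, one slot per option (0-based)."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]  # rescale so the mean is 1.0
    small = [i for i, w in enumerate(scaled) if w < 1.0]
    large = [i for i, w in enumerate(scaled) if w >= 1.0]
    table = []
    while small and large:
        j = small.pop()  # a below-average option
        k = large.pop()  # an option with weight at least the average
        table.append((j, k, scaled[j]))
        scaled[k] -= 1.0 - scaled[j]  # k donates the rest of this slot
        (small if scaled[k] < 1.0 else large).append(k)
    # whatever is left has weight 1.0 (up to rounding) and fills a whole slot
    for j in small + large:
        table.append((j, j, 1.0))
    return table


def alias_pick(table, rng=random):
    # one uniform slot choice, then one comparison against the cutoff
    lo, hi, p = table[rng.randrange(len(table))]
    return lo if rng.random() < p else hi


table = build_alias_table([40, 20, 40])
print([alias_pick(table) for _ in range(20)])
```

Building the table is O(N); each pick afterwards costs a constant number of operations regardless of N.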
While this is not a direct answer, here is a source that may help you outline this problem: http://www.av8n.com/physics/arbitrary-probability.htm.
Edit:
Just found a nice source in ruby for that, pickup gem.
require 'pickup'
headings = {
A: 40,
B: 20,
C: 40,
}
pickup = Pickup.new(headings)
pickup.pick
#=> A
pickup.pick
#=> B
pickup.pick
#=> A
pickup.pick
#=> C
pickup.pick
#=> C

Randomly sampling unique subsets of an array

If I have an array:
a = [1,2,3]
How do I randomly select subsets of the array, such that the elements of each subset are unique? That is, for a the possible subsets would be:
[]
[1]
[2]
[3]
[1,2]
[1,3]
[2,3]
[1,2,3]
I can't generate all of the possible subsets as the real size of a is very big so there are many, many subsets. At the moment, I am using a 'random walk' idea - for each element of a, I 'flip a coin' and include it if the coin comes up heads - but I am not sure if this actually uniformly samples the space. It feels like it biases towards the middle, but this might just be my mind doing pattern-matching, as there will be more middle sized possibilities.
Am I using the right approach, or how should I be randomly sampling?
(I am aware that this is more of a language agnostic and 'mathsy' question, but I felt it wasn't really Mathoverflow material - I just need a practical answer.)
Just go ahead with your original "coin flipping" idea. It uniformly samples the space of possibilities.
It feels to you like it's biased towards the "middle", but that's because the number of possibilities is largest in the "middle". Think about it: there is only 1 possibility with no elements, and only 1 with all elements. There are N possibilities with 1 element, and N possibilities with (N-1) elements. As the number of elements chosen gets closer to (N/2), the number of possibilities grows very quickly.
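To make the counting argument concrete, here is a small Python sketch. The helper name `subsets_by_size` is made up for this illustration; `math.comb` is from the standard library (Python 3.8 or later).

```python
import math


def subsets_by_size(n):
    # number of distinct subsets of each size 0..n for an n-element array
    return [math.comb(n, k) for k in range(n + 1)]


counts = subsets_by_size(10)
print(counts)
print(sum(counts), 2 ** 10)
```

With independent fair coin flips every one of the 2**N subsets has the same probability, (1/2)**N. Sizes near N/2 show up more often only because far more subsets have those sizes.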
You could generate random numbers, convert them to binary and choose the elements from your original array where the bits were 1. Here is an implementation of this as a monkey-patch for the Array class:
class Array
  def random_subset(n=1)
    raise ArgumentError, "negative argument" if n < 0
    (1..n).map do
      r = rand(2**self.size)
      self.select.with_index { |el, i| r[i] == 1 }
    end
  end
end
Usage:
a.random_subset(3)
#=> [[3, 6, 9], [4, 5, 7, 8, 10], [1, 2, 3, 4, 6, 9]]
Generally this doesn't perform badly: it's O(n*m), where n is the number of subsets you want and m is the length of the array.
I think the coin flipping is fine.
ar = ('a'..'j').to_a
p ar.select{ rand(2) == 0 }
An array with 10 elements has 2**10 possible combinations (including [ ] and all 10 elements), which is nothing more than 10 independent choices of 1 or 0. It does output more arrays of four, five and six elements, because there are a lot more of those in the powerset.
A way to select a random element from the power set is the following:
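my_array = ('a'..'z').to_a
power_set_size = 2 ** my_array.length
random_subset = rand(power_set_size)
subset = []
# to_s(2) gives the binary digits; pad with leading zeros so that
# every element of my_array has a corresponding bit
bits = random_subset.to_s(2).rjust(my_array.length, "0")
bits.chars.each_with_index do |bit, corresponding_element|
  subset << my_array[corresponding_element] if bit == "1"
end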
This makes use of string functions rather than working with real "bits" and bitwise operations, just for my convenience. You can turn it into a faster (I guess) algorithm by using real bits.
What it does is encode the powerset of the array as an integer between 0 and 2 ** array.length - 1 and then pick one of those integers at random (uniformly random, indeed). Then it decodes the integer back into a particular subset of the array using a bitmask (1 = the element is in the subset, 0 = it is not).
In this way you have a uniform distribution over the power set of your array.
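The same encoding can be sketched with real bit operations in Python. The function names `subset_from_mask` and `random_subset` are assumptions for this example.

```python
import random


def subset_from_mask(items, mask):
    # bit i of mask decides whether items[i] is part of the subset
    return [item for i, item in enumerate(items) if (mask >> i) & 1]


def random_subset(items, rng=random):
    # every integer in 0 .. 2**len(items) - 1 is equally likely,
    # so every subset is equally likely
    return subset_from_mask(items, rng.randrange(2 ** len(items)))


print(subset_from_mask([1, 2, 3], 0b101))
print(random_subset(list("abcdefghij")))
```

Because each mask maps to exactly one subset, enumerating the masks 0 to 7 for a 3-element array yields all 8 subsets exactly once.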
a.select {|element| rand(2) == 0 }
For each element, a coin is flipped. If heads ( == 0), then it is selected.

Resources