How do I solve this question about Pigeonhole Principle (Discrete Mathematics)? - algorithm

I am having trouble understanding the following question, and I would like to see sample input and output for it: "The pigeonhole principle states that if a function f has n distinct inputs but less than n distinct outputs, then there exist two inputs a and b such that a != b and f(a) = f(b). Present an algorithm to find a and b such that f(a) = f(b). Assume that the function inputs are 1, 2, ..., and n."
I am unable to solve this problem because I do not understand the question clearly. Looking for your help.

The pigeonhole principle says that if you have more items than boxes, at least one of the boxes must have multiple items in it.
If you want to find two items a != b with the property f(a) == f(b), a straightforward approach is to use a hashmap. Use the function value f(x) as the key under which to store the item value x. Iterate through the items x = 1, ..., n. If there is no entry at f(x), store x there. If there is, the current value of x and the value stored at f(x) are a pair of the kind you're seeking.
In pseudocode:
h = {}   # initialize an empty hashmap
for x in 1, ..., n
    if h[f(x)] is empty
        h[f(x)] <- x                       # store x in the hashmap, indexed by f(x)
    else
        (x, h[f(x)]) qualify as a match    # do what you want with them
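For reference, here is the same pair-finding idea as a small runnable Python sketch (the f used below is just an illustrative stand-in for whatever function you are given):
import random

def find_collision(f, n):
    # Return a pair (a, b) with a != b and f(a) == f(b), or None if f happens to be injective.
    seen = {}                       # maps f(x) -> x
    for x in range(1, n + 1):
        y = f(x)
        if y in seen:
            return seen[y], x       # two distinct inputs with the same output
        seen[y] = x
    return None                     # only possible if f has n distinct outputs

# Example: 5 inputs squeezed into at most 4 distinct outputs guarantees a collision.
print(find_collision(lambda x: x % 4, 5))   # -> (1, 5), since 1 % 4 == 5 % 4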
If you want to identify all pigeons who have roommates, initialize the hashmap with empty sets. Then iterate through the values and append the current value x to the set indexed by f(x). Finally, iterate through the hashmap and pick out all sets with more than one element.
Since you didn't specify a language, for the fun of it I decided to implement the latter algorithm in Ruby:
N = 10 # number of pigeons

# Create an array of value/function pairs.
# Using N-1 for the range of rand guarantees at least one duplicate random
# number, and with the nature of randomness, quite likely more.
value_and_f = Array.new(N) { |index| [index, rand(N - 1)] }

h = {} # new hash

puts "Value/function pairs..."
p value_and_f # print the value/function pairs

value_and_f.each do |x, key|
  h[key] = [] unless h[key] # create an array if none exists for this key
  h[key] << x               # append x to the array associated with this key
end

puts "\nConfirm which values share function mapping"
h.keys.each { |key| p h[key] if h[key].length > 1 }
Which produces the following output, for example:
Value/function pairs...
[[0, 0], [1, 3], [2, 1], [3, 6], [4, 7], [5, 4], [6, 0], [7, 1], [8, 0], [9, 3]]
Confirm which values share function mapping
[0, 6, 8]
[1, 9]
[2, 7]
Since this implementation uses randomness, it will produce different results each time you run it.

Well, let's go step by step.
I have 2 boxes, and my father gave me 3 chocolates.
I want to put those chocolates into the 2 boxes. For convenience, let's name the chocolates a, b, c.
So in how many ways can we put them in? For example:
[ab][c]
[abc][]
[a][bc]
(and so on for the remaining arrangements). Do you see something interesting? There is always at least one box with more than 1 chocolate.
So what do you think?
You can try this with any number of boxes and chocolates (more chocolates than boxes), and you will see that it always holds.
Let's make it even easier:
I have 5 friends and 3 rooms, and we are having a party. Now let's see what happens (every friend will sit in one of the rooms).
I claim that there will be at least one room with more than 1 friend in it.
My friends are quite mischievous, and knowing this, they try to prove me wrong.
Friend 1 selects room 1.
Friend 2 thinks: why room 1? Then I would be correct, so he selects room 2.
Friend 3 thinks the same; he avoids rooms 1 and 2 and goes into room 3.
Friend 4 now arrives and realizes there is no empty room left, so he has to enter a room that is already occupied. And thus I am proven correct.
Do you see the situation now?
There are n friends (function inputs), but unfortunately (or fortunately) the number of rooms (output values) is less than n. So of course there exist two friends of mine, a and b, who share the same room (the same value, f(a) = f(b)).

Continuing from what https://stackoverflow.com/a/42254627/7256243 said:
Let's say that you map an array A of length N to an array B of length N-1.
Then the result would be an array B where, at one index, you have 2 elements.
A = {1,2,3,4,5,6}
map A -> B
where a possible outcome could be:
B = {1, 2, {3,4}, 5, 6}
The mapping of A -> B could be done in any number of ways; in this example the inputs 3 and 4 of array A end up at the same index of array B.
I hope this is useful.
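As a tiny concrete illustration in Python (the particular mapping below is made up):
# A hypothetical f that sends 6 inputs to only 5 distinct outputs;
# the inputs 3 and 4 collide, which is exactly what the pigeonhole principle predicts.
f = {1: 1, 2: 2, 3: 3, 4: 3, 5: 4, 6: 5}
assert f[3] == f[4]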

Related

Interview Question: Remove repeating numbers at the end of an array

I got a surprising interview question today at a big Bay Area tech company that I was absolutely stumped by despite seeming so easy. Was wondering if anyone has seen it or can offer a simpler solution as the interviewer didn't want to show me the answer. The solution can be written in any language or pseudocode.
Question:
Given a list of numbers, remove any extraneous repeating suffix sequences of numbers that appear at the end of the list, until it has no repeating suffix sequences. The final repetition may be cut off partway.
For example:
[1,2,3,4,5,6,7,5,6,7,5,6] -> [1,2,3,4,5,6,7]
explanation: [5, 6, 7] were repeating
Also consider the situation
[1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5] -> [1,2,3,4,5,4,5,1] # not [1,2,3,4,5,4,5,1,4,5,4,5,1]
explanation: [4,5,4,5,1] is a repeating sequence
There are always two ways to approach this kind of problem: finding any solution, and finding an efficient one. It is usually better to start with any solution and then think about how to optimize it.
Now, as we can see in the second example, the problem is complicated by the fact that the repeating pattern is not known. So we could just try every possible pattern at the end. Then we would need to check two things:
is it actually repeating
how long is the result
Then we could just take the shortest result. Here is the Python code:
def remove_repeating_tail(a: list) -> list:
    results = []
    for i in range(len(a)):
        tail = a[i:]
        results.append(remove_repeats(a, tail))
    if len(results) == 0:
        return a
    return sorted(results, key=len)[0]
This way we make sure we cover the edge cases: an empty list and no repeating pattern at all. Next we need to write remove_repeats. It may also be handed an empty repeating pattern, so we need to be aware of that.
def remove_repeats(a: list, tail: list) -> list:
    assert len(tail) <= len(a)
    if len(tail) == 0:
        return a
    remainder = a
    count = 0
    while remainder[-len(tail):] == tail:
        remainder = remainder[:-len(tail)]
        count += 1
    if count <= 1:
        return a
    return remainder
We strip the candidate pattern from the end while it repeats, and only accept the result if it was stripped more than once. Now it's time to test whether the code actually works, if that is possible in the interview.
remove_repeating_tail([1,2,3,4,5,6,7,5,6,7,5,6])
-> [1, 2, 3, 4, 5, 6]
remove_repeating_tail([1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5])
-> [1, 2, 3, 4, 5, 4, 5]
Also good to check some other cases:
remove_repeating_tail([1,2,3,4])
-> [1, 2, 3, 4]
remove_repeating_tail([])
-> []
After quite a bit of fixing we got the above, which I think is correct. In particular I missed:
first I had an infinite loop in remove_repeats for an empty tail
remove_repeats always removed the tail, and sometimes everything, because I wasn't checking that there is more than one repeat. I then added the counting.
I made simple mistakes like writing results = res instead of results.append(res), which led to some exceptions.
Then a lot of simplification. First I used a sentinel None to communicate back that the tail is not repeating, but we can just return the whole list instead. Then I checked for repetition with an if before the while loop, but realized it's basically doing the same thing as the first iteration, so I switched to counting.
Similarly, I don't like the if len(results) == 0: check. I would probably add a to results at the beginning and remove the check, since then there is always a result and the loop could start at index 1 instead of 0. Still, I kept the check in.
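For what it's worth, that simplification might look roughly like this (a sketch only, reusing the remove_repeats helper from above):
def remove_repeating_tail(a: list) -> list:
    results = [a]                      # seed with the original list, so the empty check is not needed
    for i in range(1, len(a)):         # i = 0 would just reproduce `a` itself
        results.append(remove_repeats(a, a[i:]))
    return min(results, key=len)       # take the shortest candidate directly, no sorting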
If we want something fast, we first need to analyze the complexity.
remove_repeats for a list of size n and a tail of size k costs O(n / k). Then we call this function n times. And then we sort the results. Wait, why do we sort at all? We could just take the minimum: return min(results, key=len). That's better.
In the loop we call remove_repeats with k running from 1 to n. So we have:
sum(k = 1 .. n) O(n / k), which is n/1 + n/2 + n/3 + ... + n/n = n * H_n. I had to look this up on Wikipedia: H_n is the n-th harmonic number. We can make our lives easy and just say it is less than O(n^2) for now, but H_n is approximately ln(n) + 0.577, so the total is roughly n ln(n) and the overall complexity is O(n log n). Not too bad, I would say. Is it optimal? Maybe. Here I would compare it to other similar algorithms (like substring search, etc.).
Before going there, at this point I would check with the interviewer where they would like to go next, as there are many directions.
This seems like a tricky question and there may not be a simple solution. The best solution I can think of would be O(n) time and O(n) space, and that is if I am not missing any edge case.
Let's take as example
[1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5] -> [1,2,3,4,5,4,5,1]
Steps would be as follows:
Iterate over the input array from the last index to the first and build a dictionary (hash table) in which every number of the array is a key and the value is a list of positions where that number occurs.
Occurrences dictionary will become:
{
    5: [16, 14, 11, 9, 6, 4],
    4: [15, 13, 10, 8, 5, 3],
    1: [12, 7, 0],
    3: [2],
    2: [1]
}
Find the possible suffix lengths by calculating, for every number, the deltas between each of its positions and its first (leftmost) position. This way we take into consideration the case in which a specific number repeats in the suffix as well as in the prefix.
We then add each distinct possible suffix length to a set.
We sort the possible suffix lengths in descending order.
We get following suffix lengths:
[12, 10, 7, 5, 2]
For every possible length l, we test if arr[n-1] == arr[n-1-l]. If l is our suffix's length, it means that the number at the last position is repeated exactly l positions before. We then check that the last l elements all respect the same condition. If they do, we have found the maximum suffix length. If not, the max suffix length is even smaller, so we check the next possible length.
After finding the correct suffix length, we also drop the remaining trailing numbers that repeat at position pos - l. We then return the slice of the array with the suffix removed.
def removeRepeatingSuffixes(arr):
    if not arr:
        return []
    n = len(arr)
    occurrences = {}
    for i in range(n - 1, -1, -1):
        c = arr[i]
        if c not in occurrences:
            occurrences[c] = []
        occurrences[c].append(i)
    # treat edge case: no repeating suffix
    if len(occurrences[arr[n-1]]) == 1:
        return arr
    # create a set of possible suffix lengths,
    # based on the differences between the positions of each number
    possible_suffixes_lengths_set = set()
    for c, olist in occurrences.items():
        if len(olist) >= 2:
            for i in range(len(olist) - 1):
                delta = olist[i] - olist[len(olist) - 1]
                possible_suffixes_lengths_set.add(delta)
    suff_lengths = sorted(possible_suffixes_lengths_set, reverse=True)
    for l in suff_lengths:
        if arr[n - 1] == arr[n - 1 - l]:
            # possible suffix length, check if the last l elements repeat
            ok_length = True
            for j in range(n - 2, n - 1 - l, -1):
                if arr[j] != arr[j - l]:
                    ok_length = False
                    break
            if ok_length:
                last_i = n - 1 - l
                while last_i > 0 and arr[last_i] == arr[last_i - l]:
                    last_i -= 1
                # return the non-repeating slice, from 0 to last_i
                return arr[0:last_i + 1]
    # fallback: no candidate length turned out to be a real repeating suffix
    return arr
A quick way to remove repeats / dedupe in general is to convert the list to a set() instead of keeping a list.
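For example (note this drops every duplicate and does not preserve order, so it does not actually implement the suffix rule from the question):
print(set([1, 2, 3, 4, 5, 6, 7, 5, 6, 7, 5, 6]))   # e.g. {1, 2, 3, 4, 5, 6, 7}, unordered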

Preallocate or change size of vector

I have a situation where I have a process which needs to "burn-in". This means that I
Start with p values, p relatively small
For n>p, generate the nth value using the most recently generated p values (e.g. value p+1 is generated from values 1 to p, value p+2 from values 2 to p+1, etc.)
Repeat until n=N, where N large
Now, only the most recently generated p values will be useful to me, so there are two ways for me to implement this. I can either
Start with a vector of p initial values. At each iteration, mutate the vector, removing the first element, and replacing the last element with the most recently generated value or,
Preallocate a large array of length N, where first p elements are initial values. At iteration n, mutate nth value with most recently generated value
There are pros and cons to both approaches.
Pros of the first are that we only store the most relevant values. Cons of the first are that we are changing the length of the vector at each iteration.
Pros of the second are that we preallocate all the memory we need. Cons of the second is that we store much more than we need.
What is the best way to proceed? Does it depend on what aspect of performance I most need to care about? What will be the quickest?
Cheers in advance.
edit: approximately, p is usually on the order of low tens and N can be several thousand
The first solution has another huge con: removing the first item of an array takes O(n) time, since the remaining elements have to be moved in memory. This causes the algorithm to run in quadratic time, which is not reasonable. Shifting the items as proposed by @ForceBru has the same quadratic running time (since many items are moved just to add one value each time).
The second solution should be pretty fast compared to the first but, indeed, it can use a lot of memory, so it is likely sub-optimal (it takes time to write values to RAM).
A faster solution is to use a data structure called a deque. Such a data structure enables you to remove the first item in constant time and to append a new value at the end, also in constant time. That being said, it introduces some overhead to be able to do that. Julia provides such data structures (for instance in the DataStructures.jl package).
Since the number of in-flight items appears to be bounded in your algorithm, you can use a rolling buffer instead. Fortunately, Julia also implements this: see CircularBuffer. This solution should be quite simple and fast (since the operations you want are done in O(1) time on it).
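Just to illustrate the rolling-buffer idea outside of Julia, the same pattern in Python is a deque with a fixed maximum length (an illustration only; the Julia options are covered below):
from collections import deque

p = 4
buf = deque([1, 2, 3, 4], maxlen=p)   # always holds the p most recent values
for n in range(5, 11):
    buf.append(n)                     # appending to a full deque drops the oldest value in O(1)
print(list(buf))                      # -> [7, 8, 9, 10]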
It is probably simplest to use CircularArrays.jl for your use case:
julia> using CircularArrays

julia> c = CircularArray([1,2,3,4])
4-element CircularVector(::Vector{Int64}):
 1
 2
 3
 4

julia> for i in 5:10
           c[i] = i
           @show c
       end
c = [5, 2, 3, 4]
c = [5, 6, 3, 4]
c = [5, 6, 7, 4]
c = [5, 6, 7, 8]
c = [9, 6, 7, 8]
c = [9, 10, 7, 8]
In this way, as you can see, you can keep using an increasing index and the array will wrap around internally as needed (discarding old values that are not needed any more). You always store the last p values in the array without having to copy anything or re-allocate memory at each step.
...only the most recently generated p values will be useful to me...
Start with a vector of p initial values. At each iteration, mutate the vector, removing the first element, and replacing the last element with the most recently generated value.
Cons of the first are that we are changing the length of the vector at each iteration.
There's no need to change the length of the vector. Simply shift its elements to the left (overwriting the first element) and write the new data to the_vector[end]:
the_vector = [1,2,3,4,5,6]

function shift_and_add!(vec::AbstractVector, value)
    vec[1:end-1] .= @view vec[2:end]  # shift everything one position to the left
    vec[end] = value                  # replace the last value
    vec
end

@assert shift_and_add!(the_vector, 80) == [2,3,4,5,6,80]
# `the_vector` itself has been mutated
@assert the_vector == [2,3,4,5,6,80]

Dividing an array into equally weighted subarrays

Algorithm question here.
I have an unordered array containing product weights, e.g. [3, 2, 5, 5, 8] which need to be divided up into smaller arrays.
Rules:
REQUIRED: Should return 1 or more arrays.
REQUIRED: No array should sum to more than 12.
REQUIRED: Return the minimum possible number of arrays; e.g. the total weight of the example above is 23, which can fit into two arrays.
IDEALLY: Arrays should be weighted as evenly as possible.
In the example above, the ideal return would be [ [3, 8], [2, 5, 5] ]
My current thoughts:
Number of arrays to return will be (sum(input_array) / 12).ceil
Could a greedy algorithm work well enough?
This is a combination of the bin packing problem and multiprocessor scheduling problem. Both are NP-hard.
Your three requirements constitute the bin packing problem: find the minimal number of bins of a fixed size (12) that fit all the numbers.
Once you solve that, you have the multiprocessor scheduling problem: given a fixed number of bins, what is the most even way to distribute the numbers among them.
There are a number of well-known approximation algorithms for both problems.
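As a rough sketch of the first half, first-fit decreasing is one of the well-known bin-packing approximations (using the cap of 12 from the question; this is a heuristic, not an exact solver, and it does not balance the bins):
def first_fit_decreasing(weights, cap=12):
    # Greedy bin-packing approximation: place each weight, largest first,
    # into the first bin it still fits into, opening a new bin when none fits.
    bins = []
    for w in sorted(weights, reverse=True):
        for b in bins:
            if sum(b) + w <= cap:
                b.append(w)
                break
        else:
            bins.append([w])
    return bins

print(first_fit_decreasing([3, 2, 5, 5, 8]))   # -> [[8, 3], [5, 5, 2]]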
How about something with an entirely different take on it? Something really simple, like this, which is based on common horse sense:
module Splitter
  def self.split(values, max_length = 12)
    # If the sum of all values is lower than max_length there
    # is no point in continuing.
    return values unless self.sum(values) > max_length
    optimized = []
    current = []
    # Start off by ordering the values. Perhaps it's a good idea
    # to start off with the smallest values first; this will result
    # in gathering as many values as possible in the first array,
    # in order to conform to the rule "Should return minimum possible
    # number of arrays".
    ordered_values = values.sort
    ordered_values.each do |v|
      if self.sum(current) + v > max_length
        # finish up the current iteration if we've got an optimized pair
        optimized.push(current)
        # reset for the next iteration
        current = []
      end
      current.push(v)
    end
    # push the last iteration
    optimized.push(current)
    return optimized
  end

  # calculates the sum of a collection of numbers
  def self.sum(numbers)
    if numbers.empty?
      return 0
    else
      return numbers.inject { |sum, x| sum + x }
    end
  end
end
Which can be used like so:
product_weights = [3, 2, 5, 5, 8]
p Splitter.split(product_weights)
The output will be:
[[2, 3, 5], [5], [8]]
Now, as said before, this is a really simple sample. And I've excluded the validations for empty or non-numeric values in the array for brevity. But it does seem to conform to your primary requirements:
Splitting the (expected: all numeric) values into arrays
With a ceiling per array, defaulting in the sample to 12
Return the minimum number of arrays by collecting the smallest numbers first (EDIT: after the clarification in the comments, this indeed doesn't hold)
I do have some doubts regarding the comment on "returning minimum possible number of arrays, and balance the weights throughout those as evenly as possible". I'm sure someone else will come up with an implementation of a better, mathematically proven algorithm that conforms to that requirement, but perhaps this is at least a suitable example for the discussion?

Randomly sampling unique subsets of an array

If I have an array:
a = [1,2,3]
How do I randomly select subsets of the array, such that the elements of each subset are unique? That is, for a the possible subsets would be:
[]
[1]
[2]
[3]
[1,2]
[1,3]
[2,3]
[1,2,3]
I can't generate all of the possible subsets, as the real size of a is very big, so there are many, many subsets. At the moment I am using a 'random walk' idea - for each element of a, I 'flip a coin' and include it if the coin comes up heads - but I am not sure if this actually uniformly samples the space. It feels like it biases towards the middle, but this might just be my mind doing pattern-matching, as there will be more middle-sized possibilities.
Am I using the right approach, or how should I be randomly sampling?
(I am aware that this is more of a language-agnostic and 'mathsy' question, but I felt it wasn't really MathOverflow material - I just need a practical answer.)
Just go ahead with your original "coin flipping" idea. It samples the space of possibilities uniformly: each particular subset is produced with probability exactly (1/2)^N, so all 2^N subsets are equally likely.
It feels to you like it's biased towards the "middle", but that's because the number of possibilities is largest in the "middle". Think about it: there is only 1 possibility with no elements, and only 1 with all elements. There are N possibilities with 1 element, and N possibilities with (N-1) elements. As the number of elements chosen gets closer to N/2, the number of possibilities grows very quickly.
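In Python terms the coin-flip sampler is just the following sketch (any language works the same way):
import random

def random_subset(a):
    # Keep each element independently with probability 1/2;
    # every one of the 2**len(a) subsets is equally likely.
    return [x for x in a if random.random() < 0.5]

print(random_subset([1, 2, 3]))   # e.g. [1, 3]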
You could generate random numbers, convert them to binary and choose the elements from your original array where the bits were 1. Here is an implementation of this as a monkey-patch for the Array class:
class Array
  def random_subset(n = 1)
    raise ArgumentError, "negative argument" if n < 0
    (1..n).map do
      r = rand(2**self.size)
      self.select.with_index { |el, i| r[i] == 1 }
    end
  end
end
Usage:
a.random_subset(3)
#=> [[3, 6, 9], [4, 5, 7, 8, 10], [1, 2, 3, 4, 6, 9]]
Generally this doesn't perform too badly: it's O(n*m), where n is the number of subsets you want and m is the length of the array.
I think the coin flipping is fine.
ar = ('a'..'j').to_a
p ar.select{ rand(2) == 0 }
An array with 10 elements has 2**10 possible combinations (including [] and all 10 elements), which is nothing more than 10 independent (1 or 0) choices. It does output more arrays of four, five and six elements, because there are a lot more of those in the power set.
A way to select a random element from the power set is the following:
my_array = ('a'..'z').to_a
power_set_size = 2 ** my_array.length
random_subset = rand(power_set_size)
subset = []
# Convert the chosen integer to a binary string and pad it to the array length,
# so that each character position corresponds to exactly one element.
random_subset.to_s(2).rjust(my_array.length, "0").chars.each_with_index do |bit, corresponding_element|
  subset << my_array[corresponding_element] if bit == "1"
end
This makes use of string functions instead of working with real "bits" and bitwise operations, just for my convenience. You could turn it into a faster (I guess) algorithm by using real bits.
What it does is encode each subset of array as an integer between 0 and 2 ** array.length - 1, and then pick one of those integers at random (uniformly at random, indeed). Then it decodes the integer back into a particular subset of array using a bitmask (1 = the element is in the subset, 0 = it is not).
In this way you have a uniform distribution over the power set of your array.
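The same bitmask idea written as a Python sketch, using integer bit operations instead of strings:
import random

def random_subset_by_mask(arr):
    # Pick one of the 2**len(arr) masks uniformly; bit i decides whether arr[i] is included.
    mask = random.randrange(2 ** len(arr))
    return [x for i, x in enumerate(arr) if (mask >> i) & 1]

print(random_subset_by_mask(list("abcdef")))   # e.g. ['a', 'c', 'd']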
a.select {|element| rand(2) == 0 }
For each element, a coin is flipped. If heads ( == 0), then it is selected.

Searching through arrays in Ruby using a range

I've seen a lot of other questions touch on the subject, but nothing on-topic enough to provide an answer to my particular problem. Is there a way to search an array and return values within a given range?
For clarity I have one array: array = [0,5,12].
I would like to compare array to another array (array2) using a range of numbers.
Using array[0] as a starting point, how would I return all values from array2 within +/- 4 of array[0]?
In this particular case the returned numbers from array2 will be within the range -4 to 4.
Thanks for the help, ninjas.
Build a Range that is your target ±4 and then use Enumerable#select (remember that Array includes Enumerable) and Range#include?.
For example, let us look for 11±4 in an array that contains the integers between 1 and 100 (inclusive):
a = (1..100).to_a
r = 11-4 .. 11+4
a.select { |i| r.include?(i) }
# [7, 8, 9, 10, 11, 12, 13, 14, 15]
If you don't care about preserving order in your output and you don't have any duplicates in your array you could do it this way:
a & (c-w .. c+w).to_a
Where c is the center of your interval and w is its half-width. Array#& treats the arrays as sets, so it will remove duplicates and is not guaranteed to preserve order.
