Finding continuous number sequence - algorithm

How do I find the longest continuing (non-decreasing) number sequence in an array of number arrays? Each array of numbers contributes one or zero numbers to the resulting sequence.
Example ([] represents an array, as in JavaScript):
[
  [1, 5, 6],
  [7],
  [22, 34],
  [500, 550],
  [60, 1],
  [90, 100],
  [243],
  [250, 110],
  [150],
  [155],
  [160]
]
Correct output would be: [1, 7, 22, 60, 90, 110, 150, 155, 160]
Detailed output:
1, -- index 1: 1, 5 and 6 would all match here; pick the smallest
7, -- index 2
22, -- index 3
-- index 4 skipped: the sequence would end here or wouldn't be the longest possible
60, -- index 5: picked 60, because 1 wouldn't continue the sequence
90, -- index 6
-- index 7 skipped: the sequence would end here or wouldn't be the longest possible
110, -- index 8
150, -- index 9
155, -- index 10
160 -- index 11

A possible approach is dynamic programming, parameterized by the last value chosen and the index of the first sub-array still to consider.
Here is a solution in Python based on recursion with memoization:
data = [[1, 5, 6],
        [7],
        [22, 34],
        [500, 550],
        [60, 1],
        [90, 100],
        [243],
        [250, 110],
        [150],
        [155],
        [160]]
def longest(x0, i, _cache=dict()):
    # x0: last value chosen so far; i: index of the first sub-array to consider
    if i == len(data):
        return []
    try:
        return _cache[x0, i]
    except KeyError:
        best = longest(x0, i + 1)          # option 1: skip sub-array i
        for x in data[i]:                  # option 2: pick x from sub-array i
            if x >= x0:
                L = [x] + longest(x, i + 1)
                if len(L) > len(best):
                    best = L
        _cache[x0, i] = best
        return best

print(longest(0, 0))

Related

How do I reverse individual (and specific) columns in a 2D array (Ruby)

Ruby: the goal is to get the max value from each of the four zones and get their sum.
UPDATE: I came up with a solution; I'm sorry about the mixup. It turned out that the matrix is a 2n x 2n matrix, so it could have been larger or smaller than 4x4. The solution I wrote below worked on all of the test cases. Here is a link to the problem.
I tried doing matrix.transpose, then reversing the specific rows; that didn't work for all edge cases.
Here is the code for that:
def flippingMatrix(matrix)
  2.times do
    matrix = matrix.transpose
    matrix = matrix.map do |array|
      if (array[-1] == array.max) || (array[-2] == array.max)
        array.reverse
      else
        array
      end
    end
  end
  return matrix[0][0] + matrix[0][1] + matrix[1][0] + matrix[1][1]
end
I gave up and tried the code below, which in my mind works; it also works for most edge cases, but not all.
But I'm getting an error: undefined method `[]' for nil:NilClass (NoMethodError).
Keep in mind that when I print the results of spot_1, spot_2, spot_3, or spot_4, I get the correct answer. Does anyone have an idea why this is happening?
Here is a matrix that FAILED:
[
[517, 37, 380, 3727],
[3049, 1181, 2690, 1587],
[3227, 3500, 2665, 383],
[4041, 2013, 384, 965]
]
**expected output: 12,781 (this fails)**
**because 4041 + 2013 + 3227 + 3500 = 12,781**
Here is a matrix that PASSED:
[
[112, 42, 83, 119],
[56, 125, 56, 49],
[15, 78, 101, 43],
[62, 98, 114, 108],
]
**expected output: 414 (this passes)**
Here is the code:
def flippingMatrix(matrix)
  # Write your code here
  spot_1 = [matrix[0][0], matrix[0][3], matrix[3][0], matrix[3][3]].max
  spot_2 = [matrix[0][1], matrix[0][2], matrix[3][1], matrix[3][2]].max
  spot_3 = [matrix[1][0], matrix[1][3], matrix[2][0], matrix[2][3]].max
  spot_4 = [matrix[1][1], matrix[1][2], matrix[2][1], matrix[2][2]].max
  return (spot_1 + spot_2 + spot_3 + spot_4)
end
I will answer your question and at the same time suggest two other ways to obtain the desired sum.
Suppose
arr = [
[ 1, 30, 40, 2],
[300, 4000, 1000, 200],
[400, 3000, 2000, 100],
[ 4, 10, 20, 3]
]
First solution
We see that the desired return value is 4444. This corresponds to
A B B A
C D D C
C D D C
A B B A
First create three helper methods.
Compute the largest value among the four inner elements
def mx(arr)
  [arr[1][1], arr[1][2], arr[2][1], arr[2][2]].max
end
mx(arr)
#=> 4000
This is the largest of the "D" values.
Reverse the first two and last two rows
def row_flip(arr)
  [arr[1], arr[0], arr[3], arr[2]]
end
row_flip(arr)
#=> [[300, 4000, 1000, 200],
# [ 1, 30, 40, 2],
# [ 4, 10, 20, 3],
# [400, 3000, 2000, 100]]
This allows us to use the method mx to obtain the largest of the "B" values.
Reverse the first two and last two columns
def column_flip(arr)
  row_flip(arr.transpose).transpose
end
column_flip(arr)
#=> [[ 30, 1, 2, 40],
# [4000, 300, 200, 1000],
# [3000, 400, 100, 2000],
# [ 10, 4, 3, 20]]
This allows us to use the method mx to obtain the largest of the "C" values.
Lastly, the maximum of the "A" values can be computed as follows.
t = row_flip(column_flip(arr))
#=> [[4000, 300, 200, 1000],
# [ 30, 1, 2, 40],
# [ 10, 4, 3, 20],
# [3000, 400, 100, 2000]]
mx(t)
#=> 4
The sum of the maximum values may therefore be computed as follows.
def sum_max(arr)
  t = column_flip(arr)
  mx(arr) + mx(row_flip(arr)) + mx(t) + mx(row_flip(t))
end
sum_max(arr)
#=> 4444
Second solution
Another way is the following:
[0, 1].product([0, 1]).sum do |i, j|
  [arr[i][j], arr[i][-j-1], arr[-i-1][j], arr[-i-1][-j-1]].max
end
#=> 4444
To see how this works, let me break this into two statements and add a p statement. Note that, for each of the groups A, B, C and D, the block variables i and j are the row and column indices of the top-left element of the group.
top_left_indices = [0, 1].product([0, 1])
#=> [[0, 0], [0, 1], [1, 0], [1, 1]]
top_left_indices.sum do |i, j|
  a = [arr[i][j], arr[i][-j-1], arr[-i-1][j], arr[-i-1][-j-1]]
  p a
  a.max
end
#=> 4444
#=> 4444
This prints the following.
[1, 2, 4, 3]
[30, 40, 10, 20]
[300, 200, 400, 100]
[4000, 1000, 3000, 2000]
Ahhh, I came up with an answer that handles all edge cases. I originally saw something like this in JavaScript and kind of turned it into Ruby. Apparently some of the hidden test cases weren't all 4 by 4; some were smaller and some larger, and that was the cause of the nil error.
Here is the solution:
def flippingMatrix(matrix)
  total = []
  (0...(matrix.length / 2)).each do |idx1|
    (0...(matrix.length / 2)).each do |idx2|
      total << [matrix[idx1][idx2],
                matrix[(matrix.length - 1) - idx1][idx2],
                matrix[idx1][(matrix.length - 1) - idx2],
                matrix[(matrix.length - 1) - idx1][(matrix.length - 1) - idx2]].max
    end
  end
  total.sum
end
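For a quick sanity check, here is the same logic as a self-contained snippet (a snake_case rewrite of the method above, not new logic), run against the two sample matrices from the question:
def flipping_matrix(matrix)
  n = matrix.length
  (0...n / 2).flat_map do |i|
    (0...n / 2).map do |j|
      # each quadrant position competes with its three mirror positions
      [matrix[i][j], matrix[n - 1 - i][j],
       matrix[i][n - 1 - j], matrix[n - 1 - i][n - 1 - j]].max
    end
  end.sum
end

failing = [[517, 37, 380, 3727], [3049, 1181, 2690, 1587],
           [3227, 3500, 2665, 383], [4041, 2013, 384, 965]]
passing = [[112, 42, 83, 119], [56, 125, 56, 49],
           [15, 78, 101, 43], [62, 98, 114, 108]]

p flipping_matrix(failing) #=> 12781 (4041 + 2013 + 3227 + 3500)
p flipping_matrix(passing) #=> 414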
Thank you all for your support! I hope this helps someone in the near future.

Is there a way to specify a multi-step in Ruby?

I'm working with some lazy iteration, and would like to be able to specify a multiple step for this iteration. This means that I want the step to alternate between a and b. So, if I had this as a range (not lazy just for simplification)
(1..20).step(2, 4)
I would want my resulting range to be
1 # + 2 =
3 # + 4 =
7 # + 2 =
9 # + 4 =
13 # + 2 =
15 # + 4 =
19 # + 2 = 21 (out of range, STOP ITERATION)
However, I cannot figure out a way to do this. Is this at all possible in Ruby?
You could use a combination of cycle and Enumerator:
class Range
  def multi_step(*steps)
    a = min
    Enumerator.new do |yielder|
      steps.cycle do |step|
        yielder << a
        a += step
        break if a > max
      end
    end
  end
end
p (1..20).multi_step(2, 4).to_a
#=> [1, 3, 7, 9, 13, 15, 19]
Note that the first element is 1, because the first element of (1..20).step(2) is also 1.
It takes exclude_end? into account:
p (1...19).multi_step(2, 4).to_a
#=> [1, 3, 7, 9, 13, 15]
And it can be lazy:
p (0..2).multi_step(1,-1).first(20)
#=> [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
p (0..Float::INFINITY).multi_step(*(1..100).to_a).lazy.map{|x| x*2}.first(20)
#=> [0, 2, 6, 12, 20, 30, 42, 56, 72, 90, 110, 132, 156, 182, 210, 240, 272, 306, 342, 380]
Here's a variant of FizzBuzz, which generates all the multiples of 3 or 5 but not 15:
p (3..50).multi_step(2,1,3,1,2,6).to_a
#=> [3, 5, 6, 9, 10, 12, 18, 20, 21, 24, 25, 27, 33, 35, 36, 39, 40, 42, 48, 50]
Ruby doesn't have a built-in method for stepping with multiple values. However, if you don't actually need a lazy method, you can use Enumerable#cycle with an accumulator, breaking out once you pass the end of the range. For example:
range = 1..20
accum = range.min
puts accum
[2, 4].cycle { |step| break if (accum += step) > range.max; puts accum }
Alternatively, you could construct your own lazy enumerator with Enumerator::Lazy. That seems like overkill for the given example, but may be useful if you have an extremely large Range object.
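For instance, a minimal sketch of that idea (the method name multi_step_lazy and its signature are my own, not a standard API):
def multi_step_lazy(range, *steps)
  Enumerator.new do |y|
    value = range.min
    steps.cycle do |step|  # alternate through the given steps forever
      y << value
      value += step
    end
  end.lazy.take_while { |v| range.cover?(v) }
end

p multi_step_lazy(1..20, 2, 4).to_a                  #=> [1, 3, 7, 9, 13, 15, 19]
p multi_step_lazy(0..Float::INFINITY, 1, 2).first(5) #=> [0, 1, 3, 4, 6]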

Recursive merge sort in Ruby

I am trying to write a Ruby method which performs a merge sort recursively. I have the method working, but it's one of those times where I accidentally got it working, so I have no idea WHY it works, and would love to understand how the code I have written works. In pseudocode, the steps I followed look like this.
Split the original array of length n until I have n arrays of length 1
Merge and sort 2 arrays of length m at time to return an array of length m*2
Repeat the step above until I have a single now sorted array of length n
Basically what this looks like to me is a large tree branching out into n branches, with each branch containing an array of length 1. Then I need to take these n branches and somehow merge them back into a single branch within the method.
def merge_sort(arr)
  return arr if arr.length == 1
  merge(merge_sort(arr.slice(0, arr.length/2)),
        merge_sort(arr.slice(arr.length/2, arr[-1])))
end

def merge(arr1, arr2)
  sorted = []
  begin
    less_than = arr1[0] <=> arr2[0]
    less_than = (arr1[0] == nil ? 1 : -1) if less_than == nil
    case less_than
    when -1
      sorted << arr1[0]
      arr1 = arr1.drop(1)
    when 0
      sorted << arr1[0]
      sorted << arr2[0]
      arr1 = arr1.drop(1)
      arr2 = arr2.drop(1)
    when 1
      sorted << arr2[0]
      arr2 = arr2.drop(1)
    end
  end until (arr1.length == 0 && arr2.length == 0)
  sorted
end
merge_sort([1,6,3,8,22,3,11,24,54,68,79,80,98,65,46,76,53])
#Returns => [1, 3, 3, 6, 8, 11, 22, 24, 46, 53, 54, 65, 68, 76, 79, 80, 98]
The method I have does correctly sort the list, but I am not totally sure how the method combines each branch and then returns the sorted merged list, rather than just the first two length-one arrays it combines.
Also, if anyone has ideas for how I can make the merge method prettier, to look more like the Ruby code I have grown to love, please let me know.
Here is my implementation of mergesort in Ruby:
def mergesort(array)
  return array if array.length == 1
  middle = array.length / 2
  merge mergesort(array[0...middle]), mergesort(array[middle..-1])
end

def merge(left, right)
  result = []
  until left.length == 0 || right.length == 0 do
    result << (left.first <= right.first ? left.shift : right.shift)
  end
  result + left + right
end
As you can see, the mergesort method is basically the same as yours, and this is where the recursion occurs so that is what I will focus on.
First, you have your base case: return array if array.length == 1. This is what allows the recursion to work and not go on indefinitely.
Next, in my implementation I have defined a variable middle to represent the middle of the array: middle = array.length / 2
Finally, the third line is where all the work occurs: merge mergesort(array[0...middle]), mergesort(array[middle..-1])
What you are doing here is telling the merge method to merge the mergesorted left half with the mergesorted right half.
If you assume your input array is [9, 1, 5, 4] what you are saying is merge mergesort([9, 1]), mergesort([5, 4]).
In order to perform the merge, you first have to mergesort [9, 1] and mergesort [5, 4]. The recursion then becomes
merge((merge mergesort([9]), mergesort([1])), (merge mergesort([5]), mergesort([4])))
When we recurse again, the mergesort([9]) has reached the base case and returns [9]. Similarly, mergesort([1]) has also reached the base case and returns [1]. Now you can merge [9] and [1]. The result of the merge is [1, 9].
Now for the other side of the merge. We have to figure out the result of merge mergesort([5]), mergesort([4]) before we can merge it with [1, 9]. Following the same procedure as the left side, we get to the base case of [5] and [4] and merge those to get [4, 5].
Now we need to merge [1, 9] with [4, 5].
On the first pass, result receives 1 because 1 <= 4.
On the next pass, we are working with result = [1], left = [9], and right = [4, 5]. When we see if left.first <= right.first we see that it is false, so we return right.shift, or 4. Now result = [1, 4].
On the third pass, we are working with result = [1, 4], left = [9], and right = [5]. When we see if left.first <= right.first we see that it is false, so we return right.shift, or 5. Now result = [1, 4, 5].
Here the loop ends because right.length == 0.
We simply concatenate result + left + right or [1, 4, 5] + [9] + [], which results in a sorted array.
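To tie the walkthrough back to the code, running the example input through the mergesort method defined above gives the fully sorted result:
p mergesort([9, 1, 5, 4]) #=> [1, 4, 5, 9]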
Here is my version of a recursive merge_sort method for Ruby, which does the same as above, but slightly differently.
def merge_sort(array)
  array.length <= 1 ? array : merge_helper(merge_sort(array[0...array.length / 2]), merge_sort(array[array.length / 2..-1]))
end

def merge_helper(left, right, merged = [])
  left.first <= right.first ? merged << left.shift : merged << right.shift until left.length < 1 || right.length < 1
  merged + left + right
end
p merge_sort([]) # => []
p merge_sort([20, 8]) # => [8, 20]
p merge_sort([16, 14, 11]) # => [11, 14, 16]
p merge_sort([18, 4, 7, 19, 17]) # => [4, 7, 17, 18, 19]
p merge_sort([10, 12, 15, 13, 16, 7, 19, 2]) # => [2, 7, 10, 12, 13, 15, 16, 19]
p merge_sort([3, 14, 10, 8, 11, 7, 18, 17, 2, 5, 9, 20, 19]) # => [2, 3, 5, 7, 8, 9, 10, 11, 14, 17, 18, 19, 20]

Spread objects evenly over multiple collections

The scenario is that there are n objects, of different sizes, unevenly spread over m buckets. The size of a bucket is the sum of all of the object sizes that it contains. It now happens that the sizes of the buckets vary wildly.
What would be a good algorithm if I want to spread those objects evenly over those buckets so that the total size of each bucket would be about the same? It would be nice if the algorithm leaned towards moving less total size rather than achieving a perfectly even spread.
I have this naïve, ineffective, and buggy solution in Ruby:
buckets = [[10, 4, 3, 3, 2, 1], [5, 5, 3, 2, 1], [3, 1, 1], [2]]
avg_size = buckets.flatten.reduce(:+) / buckets.count + 1
large_buckets = buckets.take_while { |arr| arr.reduce(:+) >= avg_size }.to_a
large_buckets.each do |large|
  smallest = buckets.last
  until ((small_sum = smallest.reduce(:+)) >= avg_size)
    break if small_sum + large.last >= avg_size
    smallest << large.pop
  end
  buckets.insert(0, buckets.pop)
end
=> [[3, 1, 1, 1, 2, 3], [2, 1, 2, 3, 3], [10, 4], [5, 5]]
I believe this is a variant of the bin packing problem, and as such it is NP-hard. Your answer is essentially a variant of the first fit decreasing heuristic, which is a pretty good heuristic. That said, I believe that the following will give better results.
Sort each individual bucket in descending size order, using a balanced binary tree.
Calculate average size.
Sort the buckets with size less than average (the "too-small buckets") in descending size order, using a balanced binary tree.
Sort the buckets with size greater than average (the "too-large buckets") in order of the size of their greatest elements, using a balanced binary tree (so the bucket with {9, 1} would come first and the bucket with {8, 5} would come second).
Pass1: Remove the largest element from the bucket with the largest element; if this reduces its size below the average, then replace the removed element and remove the bucket from the balanced binary tree of "too-large buckets"; else place the element in the smallest bucket, and re-index the two modified buckets to reflect the new smallest bucket and the new "too-large bucket" with the largest element. Continue iterating until you've removed all of the "too-large buckets."
Pass2: Iterate through the "too-small buckets" from smallest to largest, and select the best-fitting elements from the largest "too-large bucket" without causing it to become a "too-small bucket;" iterate through the remaining "too-large buckets" from largest to smallest, removing the best-fitting elements from them without causing them to become "too-small buckets." Do the same for the remaining "too-small buckets." The results of this variant won't be as good as they are for the more complex variant because it won't shift buckets from the "too-large" to the "too-small" category or vice versa (hence the search space will be smaller), but this also means that it has much simpler halting conditions (simply iterate through all of the "too-small" buckets and then halt), whereas the complex variant might cause an infinite loop if you're not careful.
The idea is that by moving the largest elements in Pass1 you make it easier to more precisely match up the buckets' sizes in Pass2. You use balanced binary trees so that you can quickly re-index the buckets or the trees of buckets after removing or adding an element, but you could use linked lists instead (the balanced binary trees would have better worst-case performance but the linked lists might have better average-case performance). By performing a best-fit instead of a first-fit in Pass2 you're less likely to perform useless moves (e.g. moving a size-10 object from a bucket that's 5 greater than average into a bucket that's 5 less than average - first fit would blindly perform the move, best-fit would either query the next "too-large bucket" for a better-sized object or else would remove the "too-small bucket" from the bucket tree).
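To make the best-fit step concrete, here is a tiny Ruby sketch of just the selection rule (the helper name and arguments are mine: deficit is how far the receiving bucket is below average, donor_surplus how far the donor bucket is above it):
# Best fit: pick the object that fills the deficit most closely,
# without pushing the donor bucket below average.
def best_fit(donor, deficit, donor_surplus)
  donor.select { |size| size <= deficit && size <= donor_surplus }
       .min_by { |size| deficit - size }
end

p best_fit([10, 1, 6, 4], 5, 7) #=> 4 (a first-fit scan would have grabbed the 1)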
I ended up with something like this.
Sort the buckets in descending size order.
Sort each individual bucket in descending size order.
Calculate average size.
Iterate over each bucket with a size larger than average size.
Move objects in size order from those buckets to the smallest bucket until either the large bucket is smaller than average size or the target bucket reaches average size.
Ruby code example
require 'pp'
def average_size(buckets)
(buckets.flatten.reduce(:+).to_f / buckets.count + 0.5).to_i
end
def spread_evenly(buckets)
average = average_size(buckets)
large_buckets = buckets.take_while {|arr| arr.reduce(:+) >= average}.to_a
large_buckets.each do |large_bucket|
smallest_bucket = buckets.last
smallest_size = smallest_bucket.reduce(:+)
large_size = large_bucket.reduce(:+)
until (smallest_size >= average)
break if large_size <= average
if smallest_size + large_bucket.last > average and large_size > average
buckets.unshift buckets.pop
smallest_bucket = buckets.last
smallest_size = smallest_bucket.reduce(:+)
end
smallest_size += smallest_object = large_bucket.pop
large_size -= smallest_object
smallest_bucket << smallest_object
end
buckets.unshift buckets.pop if smallest_size >= average
end
buckets
end
test_buckets = [
[ [10, 4, 3, 3, 2, 1], [5, 5, 3, 2, 1], [3, 1, 1], [2] ],
[ [4, 3, 3, 2, 2, 2, 2, 1, 1], [10, 5, 3, 2, 1], [3, 3, 3], [6] ],
[ [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1], [1, 1] ],
[ [10, 9, 8, 7], [6, 5, 4], [3, 2], [1] ],
]
test_buckets.each do |buckets|
puts "Before spread with average of #{average_size(buckets)}:"
pp buckets
result = spread_evenly(buckets)
puts "Result and sum of each bucket:"
pp result
sizes = result.map {|bucket| bucket.reduce :+}
pp sizes
puts
end
Output:
Before spread with average of 12:
[[10, 4, 3, 3, 2, 1], [5, 5, 3, 2, 1], [3, 1, 1], [2]]
Result and sum of each bucket:
[[3, 1, 1, 4, 1, 2], [2, 1, 2, 3, 3], [10], [5, 5, 3]]
[12, 11, 10, 13]
Before spread with average of 14:
[[4, 3, 3, 2, 2, 2, 2, 1, 1], [10, 5, 3, 2, 1], [3, 3, 3], [6]]
Result and sum of each bucket:
[[3, 3, 3, 2, 3], [6, 1, 1, 2, 2, 1], [4, 3, 3, 2, 2], [10, 5]]
[14, 13, 14, 15]
Before spread with average of 4:
[[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1], [1, 1]]
Result and sum of each bucket:
[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
[4, 4, 4, 4, 4]
Before spread with average of 14:
[[10, 9, 8, 7], [6, 5, 4], [3, 2], [1]]
Result and sum of each bucket:
[[1, 7, 9], [10], [6, 5, 4], [3, 2, 8]]
[17, 10, 15, 13]
This isn't bin packing as others have suggested. There the size of bins is fixed and you are trying to minimize the number. Here you are trying to minimize the variance among a fixed number of bins.
It turns out this is equivalent to Multiprocessor Scheduling, and - according to the reference - the algorithm below (known as "Longest Job First" or "Longest Processing Time First") is certain to produce a largest sum no more than 4/3 - 1/(3m) times optimal, where m is the number of buckets. In the test cases shown, we'd have 4/3 - 1/12 = 5/4, or no more than 25% above optimal.
We just start with all bins empty, and put each item, in decreasing order of size, into the currently least full bin. We can track the least full bin efficiently with a min heap. With a heap having O(log m) insert and delete-min, the algorithm runs in O(n log m) time (n and m defined as @Jonas Elfström says). Ruby is very expressive here: only 9 sloc for the algorithm itself.
Here is the code. I am not a Ruby expert, so please feel free to suggest better ways. I am using @Jonas Elfström's test cases.
require 'algorithms'
require 'pp'

test_buckets = [
  [[10, 4, 3, 3, 2, 1], [5, 5, 3, 2, 1], [3, 1, 1], [2]],
  [[4, 3, 3, 2, 2, 2, 2, 1, 1], [10, 5, 3, 2, 1], [3, 3, 3], [6]],
  [[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1], [1, 1]],
  [[10, 9, 8, 7], [6, 5, 4], [3, 2], [1]],
]

def relevel(buckets)
  q = Containers::PriorityQueue.new { |x, y| x < y }
  # Initially all buckets to be returned are empty and so have zero sums.
  rtn = Array.new(buckets.length) { [] }
  buckets.each_index { |i| q.push(i, 0) }
  sums = Array.new(buckets.length, 0)
  # Add to emptiest bucket in descending order.
  # Bang! ops would generate less garbage.
  buckets.flatten.sort.reverse.each do |val|
    i = q.pop                 # Get index of emptiest bucket
    rtn[i] << val             # Append current value to it
    q.push(i, sums[i] += val) # Update sums and min heap
  end
  rtn
end

test_buckets.each { |b| pp relevel(b).map { |a| a.inject(:+) } }
Results:
[12, 11, 11, 12]
[14, 14, 14, 14]
[4, 4, 4, 4, 4]
[13, 13, 15, 14]
You could use my answer to fitting n variable height images into 3 (similar length) column layout.
Mentally map:
Object size to picture height, and
bucket count to bincount
Then the rest of that solution should apply...
The following uses the first_fit algorithm mentioned by Robin Green earlier but then improves on this by greedy swapping.
The swapping routine finds the column that is furthest away from the average column height then systematically looks for a swap between one of its pictures and the first picture in another column that minimizes the maximum deviation from the average.
I used a random sample of 30 pictures with heights in the range five to 50 'units'. The convergence was swift in my case and improved significantly on the first_fit algorithm.
The code (Python 3.2):
def first_fit(items, bincount=3):
    items = sorted(items, reverse=1)  # New - improves first fit.
    bins = [[] for c in range(bincount)]
    binsizes = [0] * bincount
    for item in items:
        minbinindex = binsizes.index(min(binsizes))
        bins[minbinindex].append(item)
        binsizes[minbinindex] += item
    average = sum(binsizes) / float(bincount)
    maxdeviation = max(abs(average - bs) for bs in binsizes)
    return bins, binsizes, average, maxdeviation

def swap1(columns, colsize, average, margin=0):
    'See if you can do a swap to smooth the heights'
    colcount = len(columns)
    maxdeviation, i_a = max((abs(average - cs), i)
                            for i, cs in enumerate(colsize))
    col_a = columns[i_a]
    for pic_a in set(col_a):  # use set as if same height then only do once
        for i_b, col_b in enumerate(columns):
            if i_a != i_b:  # Not same column
                for pic_b in set(col_b):
                    if (abs(pic_a - pic_b) > margin):  # Not same heights
                        # new heights if swapped
                        new_a = colsize[i_a] - pic_a + pic_b
                        new_b = colsize[i_b] - pic_b + pic_a
                        if all(abs(average - new) < maxdeviation
                               for new in (new_a, new_b)):
                            # Better to swap (in-place)
                            colsize[i_a] = new_a
                            colsize[i_b] = new_b
                            columns[i_a].remove(pic_a)
                            columns[i_a].append(pic_b)
                            columns[i_b].remove(pic_b)
                            columns[i_b].append(pic_a)
                            maxdeviation = max(abs(average - cs)
                                               for cs in colsize)
                            return True, maxdeviation
    return False, maxdeviation

def printit(columns, colsize, average, maxdeviation):
    print('columns')
    pp(columns)
    print('colsize:', colsize)
    print('average, maxdeviation:', average, maxdeviation)
    print('deviations:', [abs(average - cs) for cs in colsize])
    print()

if __name__ == '__main__':
    ## Some data
    #import random
    #heights = [random.randint(5, 50) for i in range(30)]
    ## Here's some from the above, but 'fixed'.
    from pprint import pprint as pp
    heights = [45, 7, 46, 34, 12, 12, 34, 19, 17, 41,
               28, 9, 37, 32, 30, 44, 17, 16, 44, 7,
               23, 30, 36, 5, 40, 20, 28, 42, 8, 38]
    columns, colsize, average, maxdeviation = first_fit(heights)
    printit(columns, colsize, average, maxdeviation)
    while 1:
        swapped, maxdeviation = swap1(columns, colsize, average, maxdeviation)
        printit(columns, colsize, average, maxdeviation)
        if not swapped:
            break
        #input('Paused: ')
The output:
columns
[[45, 12, 17, 28, 32, 17, 44, 5, 40, 8, 38],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 34, 9, 37, 44, 30, 20, 28]]
colsize: [286, 267, 248]
average, maxdeviation: 267.0 19.0
deviations: [19.0, 0.0, 19.0]
columns
[[45, 12, 17, 28, 17, 44, 5, 40, 8, 38, 9],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 34, 37, 44, 30, 20, 28, 32]]
colsize: [263, 267, 271]
average, maxdeviation: 267.0 4.0
deviations: [4.0, 0.0, 4.0]
columns
[[45, 12, 17, 17, 44, 5, 40, 8, 38, 9, 34],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 37, 44, 30, 20, 28, 32, 28]]
colsize: [269, 267, 265]
average, maxdeviation: 267.0 2.0
deviations: [2.0, 0.0, 2.0]
columns
[[45, 12, 17, 17, 44, 5, 8, 38, 9, 34, 37],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 44, 30, 20, 28, 32, 28, 40]]
colsize: [266, 267, 268]
average, maxdeviation: 267.0 1.0
deviations: [1.0, 0.0, 1.0]
columns
[[45, 12, 17, 17, 44, 5, 8, 38, 9, 34, 37],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 44, 30, 20, 28, 32, 28, 40]]
colsize: [266, 267, 268]
average, maxdeviation: 267.0 1.0
deviations: [1.0, 0.0, 1.0]
Nice problem.
Here's the info on reverse-sorting mentioned in my separate comment below.
>>> h = sorted(heights, reverse=1)
>>> h
[46, 45, 44, 44, 42, 41, 40, 38, 37, 36, 34, 34, 32, 30, 30, 28, 28, 23, 20, 19, 17, 17, 16, 12, 12, 9, 8, 7, 7, 5]
>>> columns, colsize, average, maxdeviation = first_fit(h)
>>> printit(columns, colsize, average, maxdeviation)
columns
[[46, 41, 40, 34, 30, 28, 19, 12, 12, 5],
[45, 42, 38, 36, 30, 28, 17, 16, 8, 7],
[44, 44, 37, 34, 32, 23, 20, 17, 9, 7]]
colsize: [267, 267, 267]
average, maxdeviation: 267.0 0.0
deviations: [0.0, 0.0, 0.0]
If you have the reverse-sorting, this extra code appended to the bottom of the above code (in the if __name__ == '__main__' section) will do extra trials on random data:
import random  # needed, since the import in the main section above is commented out

for trial in range(2, 11):
    print('\n## Trial %i' % trial)
    heights = [random.randint(5, 50) for i in range(random.randint(5, 50))]
    print('Pictures:', len(heights))
    columns, colsize, average, maxdeviation = first_fit(heights)
    print('average %7.3f' % average, '\nmaxdeviation:')
    print('%5.2f%% = %6.3f' % ((maxdeviation * 100. / average), maxdeviation))
    swapcount = 0
    while maxdeviation:
        swapped, maxdeviation = swap1(columns, colsize, average, maxdeviation)
        if not swapped:
            break
        print('%5.2f%% = %6.3f' % ((maxdeviation * 100. / average), maxdeviation))
        swapcount += 1
    print('swaps:', swapcount)
The extra output shows the effect of the swaps:
## Trial 2
Pictures: 11
average 72.000
maxdeviation:
9.72% = 7.000
swaps: 0
## Trial 3
Pictures: 14
average 118.667
maxdeviation:
6.46% = 7.667
4.78% = 5.667
3.09% = 3.667
0.56% = 0.667
swaps: 3
## Trial 4
Pictures: 46
average 470.333
maxdeviation:
0.57% = 2.667
0.35% = 1.667
0.14% = 0.667
swaps: 2
## Trial 5
Pictures: 40
average 388.667
maxdeviation:
0.43% = 1.667
0.17% = 0.667
swaps: 1
## Trial 6
Pictures: 5
average 44.000
maxdeviation:
4.55% = 2.000
swaps: 0
## Trial 7
Pictures: 30
average 295.000
maxdeviation:
0.34% = 1.000
swaps: 0
## Trial 8
Pictures: 43
average 413.000
maxdeviation:
0.97% = 4.000
0.73% = 3.000
0.48% = 2.000
swaps: 2
## Trial 9
Pictures: 33
average 342.000
maxdeviation:
0.29% = 1.000
swaps: 0
## Trial 10
Pictures: 26
average 233.333
maxdeviation:
2.29% = 5.333
1.86% = 4.333
1.43% = 3.333
1.00% = 2.333
0.57% = 1.333
swaps: 4
Adapt the Knapsack Problem solving algorithms by, for example, specifying the "weight" of every bucket to be roughly equal to the mean of the n objects' sizes (try a Gaussian distribution around the mean value).
http://en.wikipedia.org/wiki/Knapsack_problem#Solving
Sort buckets in size order.
Move an object from the largest bucket into the smallest bucket, re-sorting the array (which is almost-sorted, so we can use "limited insertion sort" in both directions; you can also speed things up by noting where you placed the last two buckets to be sorted. If you have 6-6-6-6-6-6-5... and get one object from the first bucket, you will move it to the sixth position. Then on the next iteration you can start comparing from the fifth. The same goes, right-to-left, for the smallest buckets).
When the difference of the two buckets is one, you can stop.
This moves the minimum number of objects, but is of order n^2 log n for comparisons (the simplest version is n^3 log n). If object moving is expensive while bucket size checking is not, for reasonable n it might still do:
12 7 5 2
11 7 5 3
10 7 5 4
9 7 5 5
8 7 6 5
7 7 6 6
12 7 3 1
11 7 3 2
10 7 3 3
9 7 4 3
8 7 4 4
7 7 5 4
7 6 5 5
6 6 6 5
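A minimal Ruby sketch of the loop traced above, treating each move as shifting one unit of size from the largest bucket to the smallest (sizes only; a real version would move whole objects and re-sort with the limited insertion sort described earlier):
def balance!(sizes)
  moves = 0
  loop do
    sizes.sort!.reverse!                    # largest first
    break if sizes.first - sizes.last <= 1  # stop when the difference is one
    sizes[0] -= 1                           # take a unit from the largest...
    sizes[-1] += 1                          # ...and drop it into the smallest
    moves += 1
  end
  [sizes, moves]
end

p balance!([12, 7, 5, 2]) #=> [[7, 7, 6, 6], 5]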
Another possibility would be to calculate the expected average size for every bucket, and "move along" a bag (or a further bucket) with the excess from the larger buckets to the smaller ones.
Otherwise, strange things may happen:
12 7 3 1, the average is a bit less than 6, so we take 5 as the average.
5 7 3 1 bag = 7 from 1st bucket
5 5 3 1 bag = 9
5 5 5 1 bag = 7
5 5 5 8 which is a bit unbalanced.
By taking 6 (i.e. rounding) it goes better, but again sometimes it won't work:
12 5 3 1
6 5 3 1 bag = 6 from 1st bucket
6 6 3 1 bag = 5
6 6 6 1 bag = 2
6 6 6 3 which again is unbalanced.
You can run two passes, the first with the rounded mean left-to-right, the other with the truncated mean right-to-left:
12 5 3 1 we want to get no more than 6 in each bucket
6 11 3 1
6 6 8 1
6 6 6 3
6 6 6 3 and now we want to get at least 5 in each bucket
6 6 4 5 (we have taken 2 from bucket #3 into bucket #5)
6 5 5 5 (when the difference is 1 we stop).
This will require "n log n" size checks, and no more than 2n object moves.
Another interesting possibility is to reason thus: you have m objects in n buckets. So you need to do an integer mapping of m onto n, and this is Bresenham's linearization algorithm. Run an (n,m) Bresenham on the sorted array, and at step i (i.e. against the i-th bucket) the algorithm will tell you whether to use round(m/n) or floor(m/n) size. Then move objects from or to the "moving bag" according to the i-th bucket's size.
This requires n log n comparisons.
You can further reduce the number of object moves by initially removing all buckets that are either round(m/n) or floor(m/n) in size to two pools of buckets sized R or F. When, running the algorithm, you need the i-th bucket to hold R objects, if the pool of R objects is not empty, swap the i-th bucket with one of the R-sized ones. This way, only buckets that are hopelessly under- or over-sized get balanced; (most of) the others are simply ignored, except for their references being shuffled.
If object access time is huge in proportion to computation time (e.g. some kind of automatic loader magazine), this will yield a magazine that is as balanced as possible, with the absolute minimum of overall object moves.
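As a small illustration of the Bresenham-style sizing (a hypothetical helper, using this answer's convention of m objects in n buckets), each bucket gets either floor(m/n) or ceil(m/n) objects, with the remainder spread evenly:
def target_counts(m, n)
  # integer arithmetic spreads the remainder evenly across the buckets
  (0...n).map { |i| m * (i + 1) / n - m * i / n }
end

p target_counts(14, 4) #=> [3, 4, 3, 4]
p target_counts(10, 3) #=> [3, 3, 4]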
You could use an Integer Programming Package if it's fast enough.
It may be tricky getting your constraints right. Something like the following may do the trick:
Let variable Oij denote Object i being in Bucket j, and let Wi represent the weight or size of Object i.
Constraints:
sum(Oij for all j) == 1 #each object is in only one bucket
Oij = 1 or 0. #object is either in bucket j or not in bucket j
sum(Oij * Wi for all i) <= X + R #restrict weight on buckets.
Objective:
minimize X
Note R is the relaxation constant that you can play with depending on how much movement is required and how much performance is needed.
Now the maximum bucket size is X + R
The next step is to figure out the minimum amount movement possible whilst keeping the bucket size less than X + R
Define a "stay" variable Si that controls whether Object i moves.
If Si is 0 it indicates that Object i stays where it was.
Constraints:
Si = 1 or 0.
Oij = 1 or 0.
Oij <= Si where j != original bucket of Object i
Oij != Si where j == original bucket of Object i
Sum(Oij for all j) == 1
Sum(Oij * Wi for all i) <= X + R
Objective:
minimize Sum(Si for all i)
Here Sum(Si for all i) represents the number of objects that have moved.
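For reference, the two stages above can be written compactly as follows (same symbols as in this answer; orig(i) denotes the original bucket of Object i):
\begin{aligned}
\text{Stage 1:}\quad & \min X \quad\text{s.t.}\quad \sum_j O_{ij} = 1\ \forall i, \qquad \sum_i W_i\,O_{ij} \le X + R\ \forall j, \qquad O_{ij} \in \{0,1\} \\
\text{Stage 2:}\quad & \min \sum_i S_i \quad\text{s.t. the Stage-1 constraints (with $X$ now fixed), plus}\quad O_{ij} \le S_i\ \forall j \ne \mathrm{orig}(i), \qquad O_{i,\mathrm{orig}(i)} = 1 - S_i
\end{aligned}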

What is the best way to find the period of a (repeating) list in Mathematica?

What is the best way to find the period in a repeating list?
For example:
a = {4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2}
has repeat {4, 5, 1, 2, 3} with the remainder {4, 5, 1, 2} matching, but being incomplete.
The algorithm should be fast enough to handle longer cases, like so:
b = RandomInteger[10000, {100}];
a = Join[b, b, b, b, Take[b, 27]]
The algorithm should return $Failed if there is no repeating pattern like above.
Please see the comments interspersed with the code on how it works.
(* True if a has period p *)
testPeriod[p_, a_] := Drop[a, p] === Drop[a, -p]

(* are all the list elements the same? *)
homogeneousQ[list_List] := Length@Tally[list] === 1
homogeneousQ[{}] := Throw[$Failed] (* yes, it's ugly to put this here ... *)

(* auxiliary for findPeriodOfFirstElement[] *)
reduce[a_] := Differences@Flatten@Position[a, First[a], {1}]

(* the first element occurs every ?th position ? *)
findPeriodOfFirstElement[a_] := Module[{nl},
  nl = NestWhileList[reduce, reduce[a], ! homogeneousQ[#] &];
  Fold[Total@Take[#2, #1] &, 1, Reverse[nl]]
]

(* the period must be a multiple of the period of the first element *)
period[a_] := Catch@With[{fp = findPeriodOfFirstElement[a]},
  Do[
    If[testPeriod[p, a], Return[p]],
    {p, fp, Quotient[Length[a], 2], fp}
  ]
]
Please ask if findPeriodOfFirstElement[] is not clear. I did this independently (for fun!), but now I see that the principle is the same as in Verbeia's solution, except the problem pointed out by Brett is fixed.
I was testing with
b = RandomInteger[100, {1000}];
a = Flatten[{ConstantArray[b, 1000], Take[b, 27]}];
(Note the low integer values: there will be lots of repeating elements within the same period.)
EDIT: According to Leonid's comment below, another 2-3x speedup (~2.4x on my machine) is possible by using a custom position function, compiled specifically for lists of integers:
(* Leonid's reduce[] *)
myPosition = Compile[
  {{lst, _Integer, 1}, {val, _Integer}},
  Module[{pos = Table[0, {Length[lst]}], i = 1, ctr = 0},
    For[i = 1, i <= Length[lst], i++,
      If[lst[[i]] == val, pos[[++ctr]] = i]
    ];
    Take[pos, ctr]
  ],
  CompilationTarget -> "C", RuntimeOptions -> "Speed"
]

reduce[a_] := Differences@myPosition[a, First[a]]
Compiling testPeriod gives a further ~20% speedup in a quick test, but I believe this will depend on the input data:
Clear[testPeriod]
testPeriod =
  Compile[{{p, _Integer}, {a, _Integer, 1}},
    Drop[a, p] === Drop[a, -p]]
The above methods are better if you have no noise. If your signal is only approximate, then Fourier transform methods might be useful. I'll illustrate with a "parametrized" setup wherein the length and number of repetitions of the base signal, the length of the trailing part, and a bound on the noise perturbation are all variables one can play with.
noise = 20;
extra = 40;
baselen = 103;
base = RandomInteger[10000, {baselen}];
repeat = 5;
signal = Flatten[Join[ConstantArray[base, repeat], Take[base, extra]]];
noisysignal = signal + RandomInteger[{-noise, noise}, Length[signal]];
We compute the absolute value of the FFT. We adjoin zeros to both ends. The object will be to threshold by comparing to neighbors.
sigfft = Join[{0.}, Abs[Fourier[noisysignal]], {0}];
Now we create two 0-1 vectors. In one we threshold by making a 1 for each element in the fft that is greater than twice the geometric mean of its two neighbors. In the other we use the average (arithmetic mean) but we lower the size bound to 3/4. This was based on some experimentation. We count the number of 1s in each case. Ideally we'd get 100 for each, as that would be the number of nonzeros in a "perfect" case of no noise and no tail part.
In[419]:=
thresh1 =
Table[If[sigfft[[j]]^2 > 2*sigfft[[j - 1]]*sigfft[[j + 1]], 1,
0], {j, 2, Length[sigfft] - 1}];
count1 = Count[thresh1, 1]
thresh2 =
Table[If[sigfft[[j]] > 3/4*(sigfft[[j - 1]] + sigfft[[j + 1]]), 1,
0], {j, 2, Length[sigfft] - 1}];
count2 = Count[thresh2, 1]
Out[420]= 114
Out[422]= 100
Now we get our best guess as to the value of "repeats", by taking the floor of the total length over the average of our counts.
approxrepeats = Floor[2*Length[signal]/(count1 + count2)]
Out[423]= 5
So we have found that the basic signal is repeated 5 times. That can give a start toward refining to estimate the correct length (baselen, above). To that end we might try removing elements at the end and seeing when we get ffts closer to actually having runs of four 0s between nonzero values.
Something else that might work for estimating number of repeats is finding the modal number of zeros in run length encoding of the thresholded ffts. While I have not actually tried that, it looks like it might be robust to bad choices in the details of how one does the thresholding (mine were just experiments that seem to work).
Daniel Lichtblau
The following assumes that the cycle starts on the first element and gives the period length and the cycle.
findCyclingList[a_?VectorQ] :=
  Module[{repeats1, repeats2, cl, cLs, vec},
    repeats1 = Flatten@Differences[Position[a, First[a]]];
    repeats2 = Flatten[Position[repeats1, First[repeats1]]];
    If[Equal @@ Differences[repeats2] && Length[repeats2] > 2 (*
      is potentially cyclic - first element appears cyclically *),
      cl = Plus @@@ Partition[repeats1, First[Differences[repeats2]]];
      cLs = Partition[a, First[cl]];
      If[SameQ @@ cLs (* candidate cycles all actually the same *),
        vec = First[cLs];
        {Length[vec], vec}, $Failed], $Failed]]
Testing
b = RandomInteger[50, {100}];
a = Join[b, b, b, b, Take[b, 27]];
findCyclingList[a]
{100, {47, 15, 42, 10, 14, 29, 12, 29, 11, 37, 6, 19, 14, 50, 4, 38,
23, 3, 41, 39, 41, 17, 32, 8, 18, 37, 5, 45, 38, 8, 39, 9, 26, 33,
40, 50, 0, 45, 1, 48, 32, 37, 15, 37, 49, 16, 27, 36, 11, 16, 4, 28,
31, 46, 30, 24, 30, 3, 32, 31, 31, 0, 32, 35, 47, 44, 7, 21, 1, 22,
43, 13, 44, 35, 29, 38, 31, 31, 17, 37, 49, 22, 15, 28, 21, 8, 31,
42, 26, 33, 1, 47, 26, 1, 37, 22, 40, 27, 27, 16}}
b1 = RandomInteger[10000, {100}];
a1 = Join[b1, b1, b1, b1, Take[b1, 23]];
findCyclingList[a1]
{100, {1281, 5325, 8435, 7505, 1355, 857, 2597, 8807, 1095, 4203,
3718, 3501, 7054, 4620, 6359, 1624, 6115, 8567, 4030, 5029, 6515,
5921, 4875, 2677, 6776, 2468, 7983, 4750, 7609, 9471, 1328, 7830,
2241, 4859, 9289, 6294, 7259, 4693, 7188, 2038, 3994, 1907, 2389,
6622, 4758, 3171, 1746, 2254, 556, 3010, 1814, 4782, 3849, 6695,
4316, 1548, 3824, 5094, 8161, 8423, 8765, 1134, 7442, 8218, 5429,
7255, 4131, 9474, 6016, 2438, 403, 6783, 4217, 7452, 2418, 9744,
6405, 8757, 9666, 4035, 7833, 2657, 7432, 3066, 9081, 9523, 3284,
3661, 1947, 3619, 2550, 4950, 1537, 2772, 5432, 6517, 6142, 9774,
1289, 6352}}
This case should fail because it isn't cyclical.
findCyclingList[Join[b, Take[b, 11], b]]
$Failed
I tried to do something with Repeated, e.g. a /. Repeated[t__, {2, 100}] -> {t}, but it just doesn't work for me.
Does this work for you?
period[a_] :=
  Quiet[Check[
    First[Cases[
      Table[
        {k, Equal @@ Partition[a, k]},
        {k, Floor[Length[a]/2]}],
      {k_, True} :> k
    ]],
    $Failed]]
Strictly speaking, this will fail for things like
a = {1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5}
although this can be fixed by using something like:
(Equal @@ Partition[a, k]) && (Equal @@ Partition[Reverse[a], k])
(probably computing Reverse[a] just once ahead of time.)
I propose this. It borrows from both Verbeia and Brett's answers.
Do[
  If[MatchQ @@ Equal @@ Partition[#, i, i, 1, _], Return @@ i],
  {i, #[[2 ;; Floor[Length@#/2]]] ~Position~ First@#}
] /. Null -> $Failed &
It is not quite as efficient as Verbeia's function on long periods, but it is faster on short ones, and it is simpler as well.
I don't know how to solve it in Mathematica, but the following algorithm (written in Python) should work. It's O(n) so speed should be no concern.
def period(array):
    if len(array) == 0:
        return False
    else:
        s = array[0]
        match = False
        end = 0
        i = 0
        for k in range(1, len(array)):
            c = array[k]
            if not match:
                # look for a new occurrence of the first element
                if c == s:
                    i = 1
                    match = True
                    end = k
            else:
                # check that the candidate period keeps repeating
                if not c == array[i]:
                    match = False
                i += 1
        if match:
            return array[:end]
        else:
            return False
# False
print(period([4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2,1]))
# [4, 5, 1, 2, 3]
print(period([4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2]))
# False
print(period([4]))
# [4, 2]
print(period([4,2,4]))
# False
print(period([4,2,1]))
# False
print(period([]))
Ok, just to show my own work here:
ModifiedTortoiseHare[a_List] := Module[{counter, tortoise, hare},
  Quiet[
    Check[
      counter = 1;
      tortoise = a[[counter]];
      hare = a[[2 counter]];
      While[(tortoise != hare) ||
        (a[[counter ;; 2 counter - 1]] != a[[2 counter ;; 3 counter - 1]]),
        counter++;
        tortoise = a[[counter]];
        hare = a[[2 counter]];
      ];
      counter,
      $Failed]]]
I'm not sure this is 100% correct, especially with cases like {pattern, pattern, different, pattern, pattern}, and it gets slower and slower when there are a lot of repeating elements, like so:
{ 1,2,1,1, 1,2,1,1, 1,2,1,1, ...}
because it is making too many expensive comparisons.
#include <iostream>
#include <vector>
using namespace std;

int period(vector<int> v)
{
    int p = 0; // period 0
    for (int i = p + 1; i < v.size(); i++)
    {
        if (v[i] == v[0])
        {
            p = i; // new potential period
            bool periodical = true;
            for (int i = 0; i < v.size() - p; i++)
            {
                if (v[i] != v[i + p])
                {
                    periodical = false;
                    break;
                }
            }
            if (periodical) return p;
            i = p; // try to find new period
        }
    }
    return 0; // no period
}

int main()
{
    vector<int> v3{1, 2, 3, 1, 2, 3, 1, 2, 3};
    cout << "Period is :\t" << period(v3) << endl;
    vector<int> v0{1, 2, 3, 1, 2, 3, 1, 9, 6};
    cout << "Period is :\t" << period(v0) << endl;
    vector<int> v1{1, 2, 1, 1, 7, 1, 2, 1, 1, 7, 1, 2, 1, 1};
    cout << "Period is :\t" << period(v1) << endl;
    return 0;
}
This sounds like it might relate to sequence alignment. These algorithms are well studied, and might already be implemented in mathematica.
