Compare all elements inside a 2D array with each other - algorithm

I have a perfectly square 64x64 2D array of integers that will never have a value greater than 64. I was wondering if there is a really fast way to compare all of the elements with each other and display the ones that are the same, in a unique way.
At the moment I have this:
2D int array named array

loop from i = 0 to 64
    loop from j = 0 to 64
        loop from k = (j+1) to 64
            loop from z = 0 to 64
                if (array[i][j] == array[k][z])
                    print "element [i][j] is same as [k][z]"
As you can see, four nested loops is something I would rather avoid. Language does not matter at all; I am simply curious to see what kind of cool solutions are possible. Since no value in the array will be greater than 64, I guess each entry only needs 6 bits, so the array could be transformed into something fancier that requires less memory and allows for some really fancy bitwise operations. Alas, I am not quite knowledgeable enough to think in that format, and therefore would like to see what you can come up with.
Thanks to anyone in advance for a really unique solution.

There's no need to sort the array via an O(m log m) algorithm; you can use an O(m) bucket sort. (Letting m = n*n = 64*64).
An easy O(m) method using lists: set up an array H of n+1 integers, initialized to -1, and also allocate an array L of m integers to use as list links. For the i'th array element, with value A[i], set k = A[i], then L[i] = H[k] and H[k] = i. When that's done, each H[k] is the head of a linked list of entries with equal values. For 2D arrays, treat array element A[i,j] as A[i + n*(j-1)].
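A minimal sketch of that H/L scheme (my own illustration, assuming values lie in 1..n):
import random

n = 7
m = n * n
a = [random.randint(1, n) for _ in range(m)]
H = [-1] * (n + 1)   # H[k]: head of the list of indices whose value is k
L = [-1] * m         # L[i]: next index with the same value, or -1
for i in range(m):
    k = a[i]
    L[i] = H[k]
    H[k] = i
# Walk all indices holding some value k:
k = 3
i = H[k]
while i != -1:
    print(i)
    i = L[i]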
Here's a Python example using Python lists, with n=7 for ease of viewing results:
import random

n = 7
m = n * n
a = [random.randint(1, n) for i in range(m)]
h = [[] for i in range(n + 1)]
for i in range(m):
    k = a[i]
    h[k].append(i)
for i in range(1, n + 1):
    print('With value %2d: %s' % (i, h[i]))
Its output looks like:
With value 1: [1, 19, 24, 28, 44, 45]
With value 2: [3, 6, 8, 16, 27, 29, 30, 34, 42]
With value 3: [12, 17, 21, 23, 32, 41, 47]
With value 4: [9, 15, 36]
With value 5: [0, 4, 7, 10, 14, 18, 26, 33, 38]
With value 6: [5, 11, 20, 22, 35, 37, 39, 43, 46, 48]
With value 7: [2, 13, 25, 31, 40]

class temp {
    int i, j;
    int value;
}
Then fill your temp array[64][64] with the values and their coordinates, and sort it by value (in Java you can do this by implementing the Comparable interface). Equal elements will then sit next to each other, and you can read off the i, j of each one.
Sorting the m = 64*64 elements costs O(m log m), a big improvement over the quadratic pairwise comparison, though not as fast as the O(m) bucket sort above.
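A sketch of the same idea in Python (my illustration, not the original Java):
import random

# Sort (value, i, j) records so equal values become adjacent, then
# report neighboring records that share a value.
n = 64
array = [[random.randint(0, 64) for _ in range(n)] for _ in range(n)]
records = sorted((array[i][j], i, j) for i in range(n) for j in range(n))
for (v1, i1, j1), (v2, i2, j2) in zip(records, records[1:]):
    if v1 == v2:
        print(f"element [{i1}][{j1}] is same as [{i2}][{j2}]")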

Use quicksort on the array, then iterate through it, keeping a "cursor" on the previous value and checking whether it equals the current one. In pseudocode:
array[64][64];
quicksort(array);            // sort all 64*64 values, e.g. in row-major order
temp = array[0][0];
for x in rows {
    for y in columns {
        if (x == 0 && y == 0) { continue; }   // nothing precedes the first element
        if (temp == array[x][y]) {
            print "duplicate found at x,y";
        }
        temp = array[x][y];
    }
}

Related

Divide and conquer algorithm to find one element that isn't repeated

If I have a sorted list of integers where every element except one is repeated, how would I find the singleton element in less than O(n) time?
For example: (−2, −2, 5, 5, 5, 67, 67, 72, 80, 80, 80, 80) would return 72.
I'm fairly certain binary search is involved in this, but not sure how to implement it. I'm just looking for the pseudocode here.
I'm thinking of iterating through the list and binary searching for the last occurrence of the current element. If its index is the same as the one we're currently on, that's the singleton element; if not, keep going. That would be O(n log n), I think.
If each integer can be repeated an arbitrary number of times, the best algorithm is O(n), as there is no way to avoid iterating through every integer. Simply iterate through the list and keep a counter of how many copies of the same integer have been seen in a row. If the counter is still one when a new integer is discovered, terminate: we have found the non-repeating integer.
If we know that all numbers are repeated the same number of times (except for the one which is not repeated), we can use binary search to achieve even better time complexity. However, based on your example problem, it looks like this is not the case.
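For completeness, here is a sketch of that binary search under the stronger assumption that every value appears exactly twice except the singleton (my illustration, not part of the original answer). Before the singleton, pairs start at even indices; after it, they start at odd indices:
def find_single(items):
    lo, hi = 0, len(items) - 1
    while lo < hi:
        mid = lo + (hi - lo) // 2
        mid -= mid % 2               # align to the start of a pair
        if items[mid] == items[mid + 1]:
            lo = mid + 2             # singleton is to the right
        else:
            hi = mid                 # singleton is at mid or to the left
    return items[lo]

print(find_single([-2, -2, 5, 5, 67, 67, 72, 80, 80]))  # 72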
An O(n) Python implementation:
def find_singletons(items):
    # Assumes the singleton is not the first element; a leading
    # sentinel would cover that case as well.
    singletons = []
    a, b = items[:2]
    for item in items[2:]:
        if a != b and b != item:
            singletons.append(b)
        a, b = b, item
    if a != b:                  # a singleton in the last position
        singletons.append(b)
    return singletons

items = [-2, -2, 5, 5, 5, 67, 67, 72, 80, 80, 80, 80]
print(find_singletons(items))
# [72]
Another O(n) Python implementation (using a counter):
def find_singletons2(items):
    singletons = []
    count = 1
    last_item = items[0]
    for item in items[1:]:
        if last_item != item:
            if count == 1:
                singletons.append(last_item)
            count = 1
        else:
            count += 1
        last_item = item
    if count == 1:              # a singleton in the last position
        singletons.append(last_item)
    return singletons

items = [-2, -2, 5, 5, 5, 67, 67, 72, 80, 80, 80, 80]
print(find_singletons2(items))
# [72]

Algorithm to efficiently select rows from a matrix such that column totals are equal

The practical application of this problem is group assignment in a psychology study, but the theoretical formulation is this:
I have a matrix (the actual matrix is 27x72, but I'll pick a 4x8 as an example):
1 0 1 0
0 1 0 1
1 1 0 0
0 1 1 0
0 0 1 1
1 0 1 0
1 1 0 0
0 1 0 1
I want to pick half of the rows out of this matrix such that the column totals are equal (thus effectively creating two matrices with equivalent column totals). I cannot rearrange values within the rows.
I have tried some brute force solutions, but my matrix is too large for that to be effective, even having chosen some random restrictions first. It seems to me that the search space could be constrained with a better algorithm, but I haven't been able to think of one thus far. Any ideas? It is also possible that there is no solution, so an algorithm would have to be able to deal with that. I have been working in R, but I could switch to python easily.
Update
Found a solution thanks to ljeabmreosn. Karmarkar-Karp worked great as an algorithm, and converting the rows to base 73 was inspired. I had a surprisingly hard time finding code that would actually give me the subsequences rather than just the final difference (maybe most people are only interested in this problem in the abstract?). Anyway, this was the code:
First I converted my rows into base 73, as the poster suggested. To do this I used the basein package in Python, defining an alphabet with 73 characters and then using the basein.decode function to convert to decimal.
For the algorithm, I just added code to print the sub-sequence indices from this mailing list message from Tim Peters: https://mail.python.org/pipermail/tutor/2001-August/008098.html
import sys
import bisect

class _Num:
    def __init__(self, value, index):
        self.value = value
        self.i = index
    def __lt__(self, other):
        return self.value < other.value

# This implements the Karmarkar-Karp heuristic for partitioning a set
# in two, i.e. into two disjoint subsets s.t. their sums are
# approximately equal. It produces only one result, in O(N*log N)
# time. A remarkable property is that it loves large sets: in
# general, the more numbers you feed it, the better it does.
class Partition:
    def __init__(self, nums):
        self.nums = nums
        sorted_nums = [_Num(nums[i], i) for i in range(len(nums))]
        sorted_nums.sort()
        self.sorted = sorted_nums

    def run(self):
        sorted_nums = self.sorted[:]
        N = len(sorted_nums)
        connections = [[] for i in range(N)]
        while len(sorted_nums) > 1:
            bigger = sorted_nums.pop()
            smaller = sorted_nums.pop()
            # Force these into different sets, by "drawing a
            # line" connecting them.
            i, j = bigger.i, smaller.i
            connections[i].append(j)
            connections[j].append(i)
            diff = bigger.value - smaller.value
            assert diff >= 0
            bisect.insort(sorted_nums, _Num(diff, i))
        # Now sorted_nums contains only 1 element x, and x.value is
        # the difference between the subsets' sums.

        # Theorem: The connections matrix represents a spanning tree
        # on the set of index nodes, and any tree can be 2-colored.
        # 2-color this one (with "colors" 0 and 1).
        index2color = [None] * N
        def color(i, c):
            if index2color[i] is not None:
                assert index2color[i] == c
                return
            index2color[i] = c
            for j in connections[i]:
                color(j, 1 - c)
        color(0, 0)

        # Partition the indices by their colors.
        subsets = [[], []]
        for i in range(N):
            subsets[index2color[i]].append(i)
        return subsets

if len(sys.argv) < 2:
    print("error: no input file provided")
else:
    with open(sys.argv[1], "r") as f:
        x = [int(line.strip()) for line in f]
    p = Partition(x)
    s, t = p.run()
    sum1 = 0
    sum2 = 0
    for i in s:
        sum1 += x[i]
    for i in t:
        sum2 += x[i]
    print("Set 1:")
    print(s)
    print("Set 2:")
    print(t)
    print("Set 1 sum", sum1)
    print("Set 2 sum", sum2)
    print("difference", abs(sum1 - sum2))
This gives the following output:
Set 1:
[0, 3, 5, 6, 9, 10, 12, 15, 17, 19, 21, 22, 24, 26, 28, 31, 32, 34, 36, 38, 41, 43, 45, 47, 48, 51, 53, 54, 56, 59, 61, 62, 65, 66, 68, 71]
Set 2:
[1, 2, 4, 7, 8, 11, 13, 14, 16, 18, 20, 23, 25, 27, 29, 30, 33, 35, 37, 39, 40, 42, 44, 46, 49, 50, 52, 55, 57, 58, 60, 63, 64, 67, 69, 70]
Set 1 sum 30309344369339288555041174435706422018348623853211009172
Set 2 sum 30309344369339288555041174435706422018348623853211009172
difference 0
Which provides the indices of the proper subsets in a few seconds. Thanks everybody!
Assuming each entry in the matrix can either be 0 or 1, this problem seems to be in the same family as the Partition Problem, which only has a pseudo-polynomial time algorithm. Let r be the number of rows in the matrix and c the number of columns. Then encode each row as a c-digit number in base r+1. This ensures that when the encodings are added, there is no carrying between digits, so equal numbers in this base correspond to two sets of rows whose column sums are equal. In your example, you would convert each row into a 4-digit number in base 9. This yields the numbers (converted into base 10):
1010 (base 9) => 738 (base 10)
0101 (base 9) => 82 (base 10)
1100 (base 9) => 810 (base 10)
0110 (base 9) => 90 (base 10)
0011 (base 9) => 10 (base 10)
1010 (base 9) => 738 (base 10)
1100 (base 9) => 810 (base 10)
0101 (base 9) => 82 (base 10)
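For concreteness, here is a small sketch of that encoding (my own illustration; encode_row is a hypothetical helper, not from the original answer):
def encode_row(row, base):
    # Treat a 0/1 row as a c-digit number in base r+1, so column-wise
    # addition never carries between digits.
    n = 0
    for digit in row:
        n = n * base + digit
    return n

rows = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0],
        [0, 0, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 0, 1]]
print([encode_row(r, len(rows) + 1) for r in rows])
# [738, 82, 810, 90, 10, 738, 810, 82]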
Although you probably couldn't use the pseudo-polynomial time algorithm with this method, you could use a simple heuristic with some decision trees to try to speed up the brute force. Using the numbers above, you could try the Karmarkar-Karp heuristic. Implemented below is the first step of the algorithm, in Python 3:
# Sorted (descending) => 810, 810, 738, 738, 90, 82, 82, 10
from queue import PriorityQueue

def karmarkar_karp_partition(arr):
    pqueue = PriorityQueue()
    for e in arr:
        pqueue.put_nowait((-e, e))
    for _ in range(len(arr) - 1):
        _, first = pqueue.get_nowait()
        _, second = pqueue.get_nowait()
        diff = first - second
        pqueue.put_nowait((-diff, diff))
    return pqueue.get_nowait()[1]
Here is the algorithm fully implemented. Note that this method is simply a heuristic and may fail to find the best partition.
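For instance, running it on the encoded rows above (my own check, worth re-verifying) shows both behaviors: it returns only the final difference, and that difference need not be optimal:
print(karmarkar_karp_partition([810, 810, 738, 738, 90, 82, 82, 10]))
# => 64: the residual difference the heuristic ends with,
#    not necessarily the optimal one for these numbers.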

Insert a number into an ordered array

I have an array of numbers sorted either in ascending or descending order, and I want to find the index at which to insert a number while preserving the order of the array. If the array is [1, 5, 7, 11, 51] and the number to insert is 9, I would be expecting 3 so I could do [1, 5, 7, 11, 51].insert(3, 9). If the array is [49, 32, 22, 11, 10, 8, 3, 2] and the number to be inserted is 9, I would be expecting 5 so I could do [49, 32, 22, 11, 10, 8, 3, 2].insert(5, 9).
What would be the best/cleanest way to find the index at which to insert 9 in either of these two arrays while preserving the sorting of the array?
I wrote this code that works, but it's not very pretty:
array = [55, 33, 10, 7, 1]
num_to_insert = 9
index_to_insert = array[0..-2].each_with_index.map do |n, index|
  range = [n, array[index.next]].sort
  index.next if num_to_insert.between?(range[0], range[1])
end.compact.first
index_to_insert # => 3
Wand Maker's answer isn't bad, but it has two problems:
It sorts the entire array to determine whether it's ascending or descending. That's silly when all you have to do is compare the first and last elements. That's O(1) in the worst case instead of O(n log n).
It uses Array#index when it should use bsearch. We can do a binary search instead of iterating over the whole array because it's sorted. That's O(log n) in the worst case instead of O(n).
I found it was clearer to split it into two methods, but you could of course turn it into one:
def search_proc(ary, n)
  case ary.first <=> ary.last
  when 1 then ->(idx) { n > ary[idx] }
  when -1 then ->(idx) { n < ary[idx] }
  else raise "Array neither ascending nor descending"
  end
end

def find_insert_idx(ary, n)
  (0...ary.size).bsearch(&search_proc(ary, n))
end

p find_insert_idx([1, 5, 7, 11, 51], 9)
#=> 3
p find_insert_idx([49, 32, 22, 11, 10, 8, 3, 2], 9)
#=> 5
(I use Range#bsearch here. Array#bsearch works the same, but it was more convenient to use a range to return an index, and more efficient since otherwise we'd have to do each_with_index.to_a or something.)
This is not the most efficient way, but it is perhaps cleaner, since you can call insert_sorted(number) on either an ascending or a descending array without bothering about the index it will be placed at:
module SortedInsert
  def insert_index(number)
    self.each_with_index do |element, index|
      if element > number && ascending?
        return index
      end
      if element < number && descending?
        return index
      end
    end
    length
  end

  def insert_sorted(number)
    insert(insert_index(number), number)
  end

  def ascending?
    first <= last
  end

  def descending?
    !ascending?
  end
end
Use it on an array as follows:
array = [2, 61, 12, 7, 98, 64]
ascending = array.sort
descending = array.sort.reverse
ascending.extend SortedInsert
descending.extend SortedInsert
number_to_insert = 3
puts "Descending: "
p number_to_insert
p descending
p descending.insert_sorted(number_to_insert)
puts "Ascending: "
p number_to_insert
p ascending
p ascending.insert_sorted(number_to_insert)
This will give:
Descending:
3
[98, 64, 61, 12, 7, 2]
[98, 64, 61, 12, 7, 3, 2]
Ascending:
3
[2, 7, 12, 61, 64, 98]
[2, 3, 7, 12, 61, 64, 98]
Notes:
The module defines a few methods that are added only to the specific Array objects you extend.
The new methods give a sorted array (ascending or descending) an insert_sorted(number) method, which inserts the number at its sorted position.
In case the position of insertion is required, there is a method for that too: insert_index(number), which returns the index at which the number needs to be inserted so that the resulting array remains sorted.
Caveat: the module assumes the array being extended is already sorted, either ascending or descending.
Here is the simplest way I can think of doing it.
def find_insert_idx(ary, n)
  is_asc = (ary.sort == ary)
  if is_asc
    idx = ary.index { |i| i > n }
  else
    idx = ary.index { |i| i < n }
  end
  idx || ary.length   # insert at the end when no such element exists
end

p find_insert_idx([1, 5, 7, 11, 51], 9)
#=> 3
p find_insert_idx([49, 32, 22, 11, 10, 8, 3, 2], 9)
#=> 5

Splitting a string of numbers with different digit sizes in Ruby

I'm trying to figure out if there's a way to split a string that contains numbers with different digit counts without having to use if/else statements. Is there an outright method for doing so? Here is an example string:
"123456789101112131415161718192021222324252627282930"
I want it split into an array containing 1-9 and 10-30, without having to first split the string into single digits, find the 9, and then iterate through combining every 2 elements after it.
Here is the current way I would go about doing this to clarify:
single_digits, double_digits = [], []
string = "123456789101112131415161718192021222324252627282930".split('')
single_digits.concat(string.slice!(0, 9))
single_digits.map! { |e| e.to_i }
string.each_slice(2) { |num| double_digits << num.join.to_i }
This would give me:
single_digits = [1,2,3,4,5,6,7,8,9]
double_digits = [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]
As long as you can be sure that every number is greater than its predecessor and greater than zero, and that every number length from a single digit up to the maximum is represented at least once, you could write this:
def split_numbers(str)
  numbers = []
  current = 0
  str.each_char do |ch|
    current = current * 10 + ch.to_i
    if numbers.empty? or current > numbers.last
      numbers << current
      current = 0
    end
  end
  numbers << current if current > 0
  numbers
end

p split_numbers('123456789101112131415161718192021222324252627282930')
output
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
For Anon's example of 192837453572 we get
[1, 9, 28, 37, 45, 357, 2]
Go through each character of the string, collecting single digits, until you find a 9; then switch to collecting two digits at a time, until you find two consecutive 9s; and so on, incrementing the width each time.
This can then be written to handle any sequence of consecutive numbers such as your example string.
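A quick sketch of that width-switching idea (in Python for brevity; split_consecutive is my own illustration):
def split_consecutive(s):
    out, width, i = [], 1, 0
    while i < len(s):
        chunk = s[i:i + width]
        out.append(int(chunk))
        i += width
        if chunk == '9' * width:   # 9, 99, 999, ... marks a width change
            width += 1
    return out

print(split_consecutive("123456789101112131415161718192021222324252627282930"))
# [1, 2, ..., 9, 10, 11, ..., 30]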
You could do this:
str = "123456789101112131415161718192021222324252627282930"
result = str[0..8].split('').map {|e| e.to_i }
result += str[9..-1].scan(/../).map {|e| e.to_i }
It's essentially the same solution as yours, but slightly cleaner (no need to combine the pairs of digits). But yeah, if you want a generalizable solution to an arbitrary length string (including more than just 2 digits), that's a different question than what you seem to be asking.
UPDATE:
Well, I haven't been able to get this question out of my mind, because it seems like there could be a simple, generalizable solution. So here's my attempt. The basic idea is to keep a counter so that you know how many digits the next number to slice out of the string has.
str = "123456789101112131415161718192021222324252627282930"
result = []
i = 1
done = str.length < 1
str_copy = str.dup   # dup, so that slice! doesn't mutate the original string
while !done do
  result << str_copy.slice!(0..i.to_s.size - 1).to_i
  done = true if str_copy.size == 0
  i += 1
end
puts result
This generates the desired output, and is generalizable to a string of consecutive positive integers starting with 1. I'd be very interested to see other people's improvements to this; it's not super succinct.

Efficient data structure for a list of index sets

I am trying to explain by example:
Imagine a list of numbered elements E = [elem0, elem1, elem2, ...].
One index set could now be {42, 66, 128}, referring to elements in E. The ordering in this set is not important, so {42, 66, 128} == {66, 128, 42}, but each element appears at most once in any given index set (so it is an actual set).
What I want now is a space efficient data structure that gives me another ordered list M that contains index sets that refer to elements in E. Each index set in M will only occur once (so M is a set in this regard) but M must be indexable itself (so M is a List in this sense, whereby the precise index is not important). If necessary, index sets can be forced to all contain the same number of elements.
For example, M could look like:
0: {42, 66, 128}
1: {42, 66, 9999}
2: {1, 66, 9999}
I could now do the following:
for(i in M[2]) { element = E[i]; /* do something with E[1],E[66],and E[9999] */ }
You probably see where this is going: You may now have another map M2 that is an ordered list of sets pointing into M which ultimately point to elements in E.
As you can see in this example, index sets can be relatively similar (M[0] and M[1] share the first two entries, M[1] and M[2] share the last two) which makes me think that there must be something more efficient than the naive way of using an array-of-sets. However, I may not be able to come up with a good global ordering of index entries that guarantee good "sharing".
I could think of anything ranging from representing M as a tree (where M's index comes from the depth-first search ordering or something) to hash maps of union-find structures (no idea how that would work though:)
Pointers to any textbook datastructure for something like this are highly welcome (is there anything in the world of databases?) but I also appreciate if you propose a "self-made" solution or only random ideas.
Space efficiency is important for me because E may contain thousands or even a few million elements, (some) index sets are potentially large, similarities between at least some index sets should be substantial, and there may be multiple layers of mappings.
Thanks a ton!
You may combine all numbers from all the M[X], remove duplicates, and call the result UniqueM.
Each M[X] collection can then be converted to a bit mask over UniqueM. For example, an int can store 32 numbers (to support an unlimited count, store an array of ints; with an array of size 10 you can represent 320 different elements), and a long can store 64 bits.
E: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
M[0]: {6, 8, 1}
M[1]: {2, 8, 1}
M[2]: {6, 8, 5}
Will be converted to:
UniqueM: {6, 8, 1, 2, 5}
M[0]: 11100 {this is 7}
M[1]: 01110 {this is 14}
M[2]: 11001 {this is 19}
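A small sketch of that encoding (my own illustration; bit i of the mask records whether UniqueM[i] is in the set):
unique_m = [6, 8, 1, 2, 5]
pos = {v: i for i, v in enumerate(unique_m)}

def to_mask(s):
    mask = 0
    for v in s:
        mask |= 1 << pos[v]
    return mask

print(to_mask({6, 8, 1}))  # 7
print(to_mask({2, 8, 1}))  # 14
print(to_mask({6, 8, 5}))  # 19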
Note:
You can also combine my approach with ring0's: instead of rearranging E, build the new UniqueM and use intervals inside it.
It will be pretty hard to beat an index. You could save some space by using the right data type (e.g. in GNU C, short if there are fewer than 64K elements in E, int if fewer than 4G...).
Besides,
since you say the order in E is not important, you could sort E in a way that maximizes runs of consecutive elements matching the Ms as much as possible.
For instance,
E: { 1,2,3,4,5,6,7,8 }
0: {1,3,5,7}
1: {1,3,5,8}
2: {3,5,7,8}
By re-arranging E
E: { 1,3,5,7,8,2,4,6 }
and using E indexes, not values, you could define the Ms based on subsets of E, giving indexes
0: {0-3} // E[0]: 1, E[1]: 3, E[2]: 5, E[3]: 7 etc...
1: {0-2,4}
2: {1-3,4}
this way
you use indexes instead of the raw numbers (indexes are usually smaller, and never negative),
the Ms are made of sub-ranges, 0-3 meaning 0,1,2,3.
The difficult part is the algorithm that rearranges E so as to maximize the sub-range sizes, i.e. minimize the sizes of the Ms.
E rearrangement algo suggestion:
sort all Ms
process all Ms:

// Build a map which gives, for an element 'x', its list of neighbors 'y',
// along with a score 'z': the number of times 'y' appears just after 'x'.
Map map (x,y) -> z
for m in Ms
    for e,f in m        // e and f are consecutive elements of m
        if ( ! map(e,f) ) map(e,f) = 1
        else map(e,f)++
    rof
rof

// Get E rearranged
ER = {}                  // E rearranged
Map mas = sort_map(map)  // mas(x) -> list(y), 'y' sorted desc by score 'z'
e = get_min_elem(mas)    // init with the lowest element (regardless of its 'z' scores)
while (mas has elements)
    ER += e              // add element e to ER
    if (empty(mas(e)))
        e = get_min_elem(mas)   // get the next lowest remaining element
    else
        f = mas(e)[0]           // most likely neighbor of e: first in the list
        delete mas(e)[0]        // consume it
        e = f
    fi
elihw
The algo (map) should be O(n*m) space, with n elements in E, m elements in all Ms.
Bit arrays may be used. They're arrays of elements a[i] which are 1 if i is in the set and 0 if it is not, so every set occupies exactly size(E) bits even if it contains few or no members. Not so space efficient, but if you compress such an array with some compression algorithm it will shrink considerably (possibly approaching the entropy limit). So you can try a dynamic Markov coder, RLE, or group Huffman, and choose whichever is most efficient for you. The iteration process then becomes on-the-fly decompression followed by a linear scan for 1 bits. For long runs of 0s you could modify the decompression algorithm to detect such cases quickly (RLE is the simplest case of this).
If you find sets with small differences, you may store A and A xor B instead of A and B, saving space on the common parts. In that case, to iterate over B you'll have to unpack both A and A xor B, then xor them.
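A tiny sketch of that xor trick (my own illustration), with sets stored as bit masks:
# When B is similar to A, A ^ B is mostly zeros and compresses well;
# B is recovered as A ^ (A ^ B).
A = 0b1111000011110000
B = 0b1111000011110001   # differs from A in a single bit
delta = A ^ B            # sparse mask, cheap to store compressed
assert A ^ delta == B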
Another useful solution:
E: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
M[0]: {1, 2, 3, 4, 5, 10, 14, 15}
M[1]: {1, 2, 3, 4, 5, 11, 14, 15}
M[2]: {1, 2, 3, 4, 5, 12, 13}
Cache frequently used items:
Cache[1] = {1, 2, 3, 4, 5}
Cache[2] = {14, 15}
Cache[3] = {-2, 7, 8, 9} // Not used here, just an example.
M[0]: {-1, 10, -2}
M[1]: {-1, 11, -2}
M[2]: {-1, 12, 13}
Links to cached lists are marked as negative numbers.
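A sketch of how an M entry would be expanded (my own illustration; names are hypothetical):
cache = {1: [1, 2, 3, 4, 5], 2: [14, 15]}

def expand(entry):
    # Negative values point into the cache; everything else is a plain index.
    out = []
    for v in entry:
        out.extend(cache[-v] if v < 0 else [v])
    return out

print(expand([-1, 10, -2]))  # [1, 2, 3, 4, 5, 10, 14, 15]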
