Find max value 2d array N*N with fewer comparisons - algorithm

I want to find the maximum value in a two-dimensional N*N array in C with fewer comparisons. I can do it simply with an O(N^2) algorithm, but I think that is too slow.
So I thought about another way: loop once and check the rows and columns in the same pass, trying to reduce the work (I guessed something like O(2(N-1))).
I use the same loop to check the contents of the columns and the rows.
What I want to know is: is there anything faster? For example, could I sort the 2D array with O(N log N) complexity? Assume the values are unsorted.

If the 2d array of M x M elements is not sorted in any way, then you're not going to do better than O(M^2).
Keep in mind that the matrix has M^2 elements, so sorting them will have complexity of O(M^2 log M^2), since most decent sorts are O(N log N) and here N = M^2.

Divide it up into [no. of cores] chunks. Get the max of each chunk in parallel. Pick the bones out of the results.
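For example, a hedged sketch of that idea in C with OpenMP (assuming OpenMP 3.1+ for the max reduction and an int matrix laid out contiguously; the reduction does the "pick the bones out" step):

#include <limits.h>

/* Each thread scans its share of the flattened matrix; the reduction
   combines the per-thread maxima into one result. */
int parallel_max(const int *a, int count)
{
    int best = INT_MIN;
    #pragma omp parallel for reduction(max:best)
    for (int i = 0; i < count; i++)
        if (a[i] > best)
            best = a[i];
    return best;
}

Compile with -fopenmp (gcc/clang); without it the pragma is ignored and the loop just runs serially.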

You could probably just cast the array to a 1D array and iterate over the flattened pointer...
I'll explain:
As you probably know, a 2D array is stored flat in memory. The array char c[4][2] looks like this:
| c[0][0] | c[0][1] | c[1][0] | c[1][1] | c[2][0] | ...
|  Byte 1 |  Byte 2 |  Byte 3 |  Byte 4 |  Byte 5 | ...
In this example, c[1][1] == ((char*)c)[3].
For this reason, when all members are of the same type, it's possible to safely cast a 2D array to a 1D array, i.e.
int my_array[20][20];
for (int i = 0; i < 400; i++) {
    ((int *)my_array)[i] = i;   /* write through the flattened view */
}
// my_array[19][0] == 380  (row 19 starts at flat index 19 * 20)
As dbush points out (upvote his answer), if your matrix is M x M elements, then O(M^2) is the best you're going to get, and flattening the array this way simply saves you from copying the memory over before any operations.
EDIT
Someone asked why casting the array to a 1D array might be better.
The idea is to avoid a nested inner loop, making the optimizer's work easier. It is more likely that the compiler will unroll the loop if it's only a single dimension loop and the array's size is fixed.

dbush certainly has the right answer in terms of complexity.
It should also be noted that if you want "faster" in terms of actual run time (not just complexity), you need to consider caching. Going down the rows and columns in parallel is very bad for data locality, and you will incur a cache miss when you iterate down a column if your data has relatively large rows. You have to touch every element at least once in order to find the max, and it would be fastest to touch them in a "row major" ordering.
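As a minimal illustration (assuming an int matrix and C99 variable-length array parameters), a row-major scan touches memory sequentially and is about as cache-friendly as this problem allows:

/* Row-major scan: the inner loop walks consecutive addresses,
   so each cache line is used completely before it is evicted. */
int max_row_major(int n, int a[n][n])
{
    int best = a[0][0];
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (a[i][j] > best)
                best = a[i][j];
    return best;
}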


How to "sort" elements of 2 possible values in place in linear time? [duplicate]

This question already has answers here:
Stable separation for two classes of elements in an array
(3 answers)
Closed 9 years ago.
Suppose I have a function f and an array of elements.
The function returns A or B for any element; you could visualize the elements this way ABBAABABAA.
I need to sort the elements according to the function, so the result is: AAAAAABBBB
The number of A values doesn't have to equal the number of B values. The total number of elements can be arbitrary (not fixed). Note that you don't sort chars, you sort objects that have a single char representation.
A few more things:
the sort should take linear time - O(n),
it should be performed in place,
it should be a stable sort.
Any ideas?
Note: if the above is not possible, do you have ideas for algorithms sacrificing one of the above requirements?
If it has to be linear and in-place, you could do a semi-stable version. By semi-stable I mean that A or B could be stable, but not both. Similar to Dukeling's answer, but you move both iterators from the same side:
a = first A
b = first B
loop while next A exists
    if b < a
        swap a,b elements
        b = next B
        a = next A
    else
        a = next A
With the sample string ABBAABABAA, you get:
ABBAABABAA
AABBABABAA
AAABBBABAA
AAAABBBBAA
AAAAABBBBA
AAAAAABBBB
On each turn, if you make a swap you move both; if not, you just move a. This will keep A stable, but B will lose its ordering. To keep B stable instead, start from the end and work your way left.
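A minimal C sketch of that pass, assuming the elements are plain chars and using a hypothetical next_of() helper (real objects would carry more data, but the index bookkeeping is the same):

/* Returns the index of the next element equal to 'want' at or after 'from',
   or -1 if there is none. */
static int next_of(const char *s, int n, int from, char want)
{
    for (int i = from; i < n; i++)
        if (s[i] == want)
            return i;
    return -1;
}

/* Semi-stable partition: A's keep their relative order, B's may not. */
void semi_stable_partition(char *s, int n)
{
    int a = next_of(s, n, 0, 'A');
    int b = next_of(s, n, 0, 'B');
    while (a != -1 && b != -1) {
        if (b < a) {                      /* a B sits before the next A: swap */
            char tmp = s[a]; s[a] = s[b]; s[b] = tmp;
            b = next_of(s, n, b + 1, 'B');
        }
        a = next_of(s, n, a + 1, 'A');
    }
}

Both indices only ever move forward, so the whole pass is O(n).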
It may be possible to do it with full stability, but I don't see how.
A stable sort might not be possible with the other given constraints, so here's an unstable sort that's similar to the partition step of quick-sort.
1. Have 2 iterators, one starting on the left, one starting on the right.
2. While there's a B at the right iterator, decrement the iterator.
3. While there's an A at the left iterator, increment the iterator.
4. If the iterators haven't crossed each other, swap their elements and repeat from 2 (a C sketch follows below).
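In C, that partition step might look like the sketch below (again treating the elements as plain chars for illustration):

/* Unstable A/B partition, like the quicksort partition step. */
void partition_ab(char *s, int n)
{
    int left = 0, right = n - 1;
    while (left < right) {
        while (left < right && s[right] == 'B') right--;   /* step 2 */
        while (left < right && s[left] == 'A')  left++;    /* step 3 */
        if (left < right) {                                /* step 4 */
            char tmp = s[left]; s[left] = s[right]; s[right] = tmp;
            left++;
            right--;
        }
    }
}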
Let's say:
Object_Array[1...N]
Type_A objs are A1, A2, ..., Ai
Type_B objs are B1, B2, ..., Bj
i + j = N

FOR i = 1 : N
    if Object_Array[i] is of Type_A
        obj_A_count = obj_A_count + 1
    else
        obj_B_count = obj_B_count + 1
LOOP

Then fill the resultant array with the Type_A objects followed by the Type_B objects, according to their respective counts.
The following should work in linear time for a doubly-linked list. Because up to N insertions/deletions are involved, it may take quadratic time for arrays, though.
1. Find the location where the first B should be after "sorting". This can be done in linear time by counting As.
2. Start with 3 iterators: iterA starts from the beginning of the container, iterB starts from the above location where As and Bs should meet, and iterMiddle starts one element prior to iterB.
3. With iterA, skip over As, find the 1st B, and move the object from iterA to the position just before iterB (iterB->previous). Now iterA points to the next element after where the moved element used to be, and the moved element is now just before iterB.
4. Continue with step 3 until you reach iterMiddle. After that, all elements between first() and iterB-1 are As.
5. Now set iterA to iterB-1.
6. Skip over Bs with iterB. When an A is found, move it to just after iterA and increment iterA.
7. Continue step 6 until iterB reaches end().
This would work as a stable sort for any container. The algorithm includes O(N) insertions/deletions, which is linear time for containers with O(1) insertion/deletion but, alas, O(N^2) for arrays. Applicability in your case depends on whether the container is an array rather than a list.
If your data structure is a linked list instead of an array, you should be able to meet all three of your constraints. You just skim through the list; accumulating and moving the "B"s amounts to trivial pointer changes. Pseudo code below:
sort(list) {
    node = list.head, blast = null, bhead = null
    while(node != null) {
        nextnode = node.next
        if(node.val == "a") {
            if(blast != null) {
                // move the 'a' to the front of the 'B' run:
                // unlink it from just after blast...
                blast.next = node.next
                if(node.next != null) node.next.prev = blast
                // ...and relink it just before bhead
                node.prev = bhead.prev
                if(bhead.prev != null) bhead.prev.next = node
                else list.head = node   // the 'B' run started at the head
                node.next = bhead, bhead.prev = node
            }
        }
        else if(node.val == "b") {
            if(blast == null)
                bhead = blast = node
            else // accumulate the "b"s..
                blast = node
        }
        node = nextnode
    }
}
So you can do this with an array, but the memcopies that emulate the list splice will make it quite slow for large arrays.
Firstly, assuming the array of A's and B's is either generated or read in, I wonder why you wouldn't avoid this question entirely by simply applying f as the list is being accumulated into memory, building two lists that are subsequently merged.
Otherwise, we can posit an alternative solution in O(n) time and O(1) space that may be sufficient depending on Sir Bohumil's ultimate needs:
1. Traverse the list and sort each segment of 1,000,000 elements in-place using the permutation cycles of the segment. (Once this step is done, the list could technically be sorted in-place by recursively swapping the inner blocks, e.g., ABB AAB -> AAABBB, but that may be too time-consuming without extra space.)
2. Traverse the list again and use the same constant space to store, in two interval trees, the pointers to each block of A's and B's. For example, with segments of 4:
ABBAABABAA => AABB AABB AA + pointers to blocks of A's and B's
Sequential access to A's or B's would be immediately available, and random access would come from using the interval tree to locate a specific A or B. One option could be to have the intervals number the A's and B's; e.g., to find the 4th A, look for the interval containing 4.
For sorting, an array of 1,000,000 four-byte elements (3.8MB) would suffice to store the indexes, using one bit in each element for recording visited indexes during the swaps; and two temporary variables the size of the largest A or B. For a list of one billion elements, the maximum combined interval trees would number 4000 intervals. Using 128 bits per interval, we can easily store numbered intervals for the A's and B's, and we can use the unused bits as pointers to the block index (10 bits) and offset in the case of B (20 bits). 4000*16 bytes = 62.5KB. We can store an additional array with only the B blocks' offsets in 4KB. Total space under 5MB for a list of one billion elements. (Space is in fact dependent on n but because it is extremely small in relation to n, for all practical purposes, we may consider it O(1).)
Time for sorting the million-element segments would be - one pass to count and index (here we can also accumulate the intervals and B offsets) and one pass to sort. Constructing the interval tree is O(nlogn) but n here is only 4000 (0.00005 of the one-billion list count). Total time O(2n) = O(n)
This should be possible with a bit of dynamic programming.
It works a bit like counting sort, but with a key difference. Make arrays of size n for both a and b, count_a[n] and count_b[n], and fill them with how many As or Bs there have been before index i (also remember total_a, the total number of As).
After just one loop, we can use these arrays to look up the correct final index for any element in O(1). Like this:
int final_index(char id, int pos) {
    if (id == 'A')
        return count_a[pos];               // As keep their relative order
    else
        return total_a + count_b[pos];     // Bs go after all of the As
}
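A one-pass sketch in C of building those lookup tables (names as above; the element array is taken to be plain chars for illustration):

/* count_a[i] / count_b[i] = how many As / Bs occur strictly before index i. */
void build_counts(const char *s, int n, int *count_a, int *count_b, int *total_a)
{
    int a_seen = 0, b_seen = 0;
    for (int i = 0; i < n; i++) {
        count_a[i] = a_seen;
        count_b[i] = b_seen;
        if (s[i] == 'A') a_seen++;
        else             b_seen++;
    }
    *total_a = a_seen;
}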
Finally, to meet the total O(n) requirement, the swapping needs to be done in a smart order. One simple option is to have a recursive swapping procedure that doesn't actually perform any swapping until both elements would be placed in their correct final positions. EDIT: This is actually not true. Even naive swapping needs only O(n) swaps. But doing this recursive strategy will give you the absolute minimum number of required swaps.
Note that in general case this would be very bad sorting algorithm since it has memory requirement of O(n * element value range).

Data structure to find integers within a query range efficiently

There is an arbitrary amount of distinct unsigned integer values within a known range.
The number of integer values is << the number of integers within the range.
I want to build a data structure which allows the following runtime complexities:
Insertion in O(1)
After insertion is done:
Deletion in O(1)
Get all values within a query range in O(k) with k being the number of result values (returned values do not have to be sorted)
Memory complexity is not restricted. However, an astronomically large amount of memory is not available ;-)
Here is an example:
range = [0, 1023]
insert 42
insert 350
insert 729
insert 64
insert 1
insert 680
insert 258
find values in [300;800] ; returns {350, 729, 680}
delete 350
delete 680
find values in [35;1000] ; returns {42, 258, 64, 729}
delete 42
delete 258
find values in [0; 5] ; returns {1}
delete 1
Is such a data structure even possible? (with the aid of look-up tables etc)?
An approximation I thought about would be:
Bin the inserted values into buckets. 0..31 => bucket 0, 32..63 => bucket 1, 64..95 => bucket 2, 96..127 => bucket 3, ...
Insertion: find bucket id using simple shifting arithmetic, then insert it into an array per bucket
Find: find bucket id of start and endpoint using shifting arithmetic. Look through all values in the first and last bucket and check if they are within the range or outside the range. Add all values in all intermediate buckets to the search result
Delete: find bucket id using shifting. Swap value to delete with last value in bucket, then decrement count for this bucket.
Downside: if there are many queries which query a range which has a span of less than 32 values, the whole bucket will be searched every time.
Downside 2: if there are empty buckets within the range, they will also be visited during the search phase.
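A minimal C sketch of that bucket approximation, assuming the fixed range [0, 1023] from the example and bucket width 32 (so value >> 5 is the bucket id; names are hypothetical):

#include <stddef.h>

#define NBUCKETS   32   /* 1024 / 32 */
#define BUCKET_CAP 32   /* values are distinct, so at most 32 per bucket */

static unsigned bucket[NBUCKETS][BUCKET_CAP];
static size_t   bucket_len[NBUCKETS];

void insert_value(unsigned v)            /* O(1) */
{
    unsigned b = v >> 5;
    bucket[b][bucket_len[b]++] = v;
}

void delete_value(unsigned v)            /* expected O(1): buckets are small */
{
    unsigned b = v >> 5;
    for (size_t i = 0; i < bucket_len[b]; i++)
        if (bucket[b][i] == v) {         /* swap with last, shrink the bucket */
            bucket[b][i] = bucket[b][--bucket_len[b]];
            return;
        }
}

/* Report every stored value in [lo, hi]; for brevity this range-tests every
   candidate, while the scheme above would skip the test for interior buckets. */
size_t find_range(unsigned lo, unsigned hi, unsigned *out)
{
    size_t k = 0;
    for (unsigned b = lo >> 5; b <= hi >> 5; b++)
        for (size_t i = 0; i < bucket_len[b]; i++)
            if (bucket[b][i] >= lo && bucket[b][i] <= hi)
                out[k++] = bucket[b][i];
    return k;
}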
Theoretically speaking, a van Emde Boas tree is your best bet, with O(log log M)-time operations where M is the size of the range. The space usage is quite large, though there are more efficient variants.
Actually the theoretical state of the art is described in the paper On Range Reporting in One Dimension, by Mortensen, Pagh, and Patrascu.
I'm not sure if the existing lower bounds rule out O(1), but M won't be large enough to make the distinction matter. Instead of the vEB structure, I would just use a k-ary trie with k a power of two like 32 or 64.
EDIT: here's one way to do range search with a trie.
Let's assume each datum is a bit pattern (easy enough; that's how the CPU think of it). Each subtree consists of all of the nodes with a certain prefix. For example, {0000, 0011, 0101, 1001} is represented by the following 4-ary trie, where X denotes a null pointer.
                  +---+---+---+---+
                  |00\|01\|10\|11X|
                  +--|+--|+--|+---+
                     |   |   |
      +--------------+   |   +--------------+
      |                  |                  |
      v                  v                  v
+---+---+---+---+   +---+---+---+---+   +---+---+---+---+
|00\|01X|10X|11\|   |00X|01\|10X|11X|   |00X|01\|10X|11X|
+--|+---+---+--|+   +---+--|+---+---+   +---+--|+---+---+
   |           |           |                   |
   v           v           v                   v
  0000        0011        0101                1001
A couple optimizations quickly become apparent. First, if all of the bit patterns are the same length, then we don't need to store them at the leaves—they can be reconstructed from the descent path. All we need is the bitmap, which if k is the number of bits in a machine word, fits nicely where the pointer from the previous level used to be.
+--------+--------+--------+--------+
|00(1001)|01(0100)|10(0100)|11(0000)|
+--------+--------+--------+--------+
In order to search the trie for a range like [0001, 1000], we start at the root, determine which subtrees might intersect the range and recurse on them. In this example, the relevant children of the root are 00, 01, and 10. The relevant children of 00 are the subtrees representing the prefixes 0001, 0010, and 0011.
For k fixed, reporting from a k-ary trie is O(log M + s), where M is the size of the range (the number of possible bit patterns) and s is the number of hits. Don't be fooled, though: when k is medium, each node occupies a couple of cache lines, but the trie isn't very high, so the number of cache misses is pretty small.
You could achieve your target (O(1),O(1) and O(k)) if the query operation required that it be told the value of at least one existing member that is already in the relevant range (the lower bound perhaps). Can you provide a guarantee that you will already know at least one member of the range? I guess not. I will expand if you can.
I'll now focus on the problem as specified. Each number in the data structure should form part of a linked list, such that each number knows the next highest number that is in the data structure. In C++
struct Number {
    struct Number *next_highest;
    int value;
};
Obviously, the largest value in the set will have next_highest==NULL, but otherwise this->value < this->next_highest->value
To add or remove or query, we need to be able to find the existing Numbers which are close to a particular lookup value.
set<Number *, specialized_comparator_to_compare_on_value_t >
Insertion and deletion would be O(log(N)), and query would be O(log(N)+k). N is the number of values currently in the set, which as you say will be much less than M (the number of possible values of the relevant datatype). Therefore log(N) < log(M). But in practice, other methods should also be considered, such as tries and such datastructures.

Given a file, find the ten most frequently occurring words as efficiently as possible

This is apparently an interview question (found it in a collection of interview questions), but even if it's not it's pretty cool.
We are told to do this efficiently on all complexity measures. I thought of creating a HashMap that maps the words to their frequency. That would be O(n) in time and space complexity, but since there may be lots of words we cannot assume that we can store everything in memory.
I must add that nothing in the question says that the words cannot be stored in memory, but what if that were the case? If that's not the case, then the question does not seem as challenging.
Optimizing for my own time:
sort file | uniq -c | sort -nr | head -10
Possibly followed by awk '{print $2}' to eliminate the counts.
I think the trie data structure is a good choice.
In the trie, you can record a word count in each node, representing the frequency of the word made up of the characters on the path from the root to that node.
The time complexity to set up the trie is O(Ln) ~ O(n) (where L is the number of characters in the longest word, which we can treat as a constant). To find the top 10 words, we can traverse the trie, which also costs O(n). So it takes O(n) to solve this problem.
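A rough C sketch of such a counting trie (assuming lowercase a-z words and a hypothetical node layout; finding the top 10 is then a traversal that keeps the 10 largest counts):

#include <stdlib.h>

struct trie_node {
    struct trie_node *child[26];   /* one slot per lowercase letter */
    long count;                    /* how many times this exact word occurred */
};

static struct trie_node *new_node(void)
{
    return calloc(1, sizeof(struct trie_node));
}

/* Insert one word and bump its counter; O(L) per word. */
void add_word(struct trie_node *root, const char *w)
{
    struct trie_node *n = root;
    for (; *w; w++) {
        int c = *w - 'a';
        if (n->child[c] == NULL)
            n->child[c] = new_node();
        n = n->child[c];
    }
    n->count++;
}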
A complete solution would be something like this:
Do an external sort O(N log N)
Count the word frequencies in the file O(N)
(An alternative would be to use a trie as #Summer_More_More_Tea suggests to count the frequencies, if you can afford that amount of memory) O(k*N) // for the first two steps
Use a min-heap (a sketch follows below):
Put the first n elements on the heap
For every word left, add it to the heap and delete the new min from the heap
In the end the heap will contain the n most common words O(|words| * log(n))
With the trie the cost would be O(k*N), because the total number of words is generally bigger than the size of the vocabulary. Finally, since k is smaller for most western languages, you could assume a linear complexity.
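For the min-heap step, here is a compact C sketch keeping the k most frequent (word, count) pairs; equivalently to "add then delete the min", a new pair simply replaces the minimum when its count is larger. It assumes the heap array already holds its first k entries in heap order:

struct entry { long count; const char *word; };

/* Restore the min-heap property downward from index i. */
static void sift_down(struct entry *h, int size, int i)
{
    for (;;) {
        int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < size && h[l].count < h[smallest].count) smallest = l;
        if (r < size && h[r].count < h[smallest].count) smallest = r;
        if (smallest == i) break;
        struct entry tmp = h[i]; h[i] = h[smallest]; h[smallest] = tmp;
        i = smallest;
    }
}

/* Offer one (word, count) pair to a heap that already holds k entries. */
void offer(struct entry *heap, int k, struct entry e)
{
    if (e.count > heap[0].count) {   /* beats the current minimum: replace it */
        heap[0] = e;
        sift_down(heap, k, 0);
    }
}

Feeding every (word, count) pair from the counting structure through offer() leaves the k most frequent words in the heap.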
I have done it in C# like this (a sample):
int wordFrequency = 10;
string words = "hello how r u u u u u u u u u u u u u u u u u u ? hello there u u u u ! great to c u there. hello .hello hello hello hello hello .hello hello hello hello hello hello ";
var result = (from word in words.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries)
              group word by word into g
              select new { Word = g.Key, Occurrence = g.Count() })
             .ToList()
             .FindAll(i => i.Occurrence >= wordFrequency);
Let's say we assign a random prime number to each of the 26 letters of the alphabet. Then we scan the file. Whenever we find a word, we calculate its hash value (a formula based on the position and value of the letters making up the word). If we find this value in the hash table, then we know for sure that we are not encountering it for the first time and we increment its key value, and we maintain an array of at most 10. But if we encounter a new hash, then we store the file pointer for that hash value and initialize the key to 0.
I think this is a typical application of counting sort since the sum of occurrences of each word is equal to the total number of words. A hash table with a counting sort should do the job in a time proportional to the number of words.
You could make a time/space tradeoff and go O(n^2) for time and O(1) for (memory) space by counting how many times a word occurs each time you encounter it in a linear pass of the data. If the count is above the top 10 found so far, then keep the word and the count, otherwise ignore it.
The answers at http://www.allinterview.com/showanswers/56657.html say building a hash and sorting the values is best. I'm inclined to agree.
Here is a Bash implementation that does something similar...I think
http://www.commandlinefu.com/commands/view/5994/computes-the-most-frequent-used-words-of-a-text-file
Depending on the size of the input data, it may or may not be a good idea to keep a HashMap. Say for instance, our hash-map is too big to fit into main memory. This can cause a very high number of memory transfers as most hash-map implementations need random access and would not be very good on the cache.
In such cases sorting the input data would be a better solution.
Cycle through the string of words and store each in a dictionary (using Python), with the number of times it occurs as the value.
If the word list will not fit in memory, you can split the file until it will. Generate a histogram of each part (either sequentially or in parallel), and merge the results (the details of which may be a bit fiddly if you want guaranteed correctness for all inputs, but should not compromise the O(n) effort, or the O(n/k) time for k tasks).
A radix tree or one of its variations will generally allow you to save storage space by collapsing common sequences.
Building it will take O(nk), where k is "the maximum length of all strings in the set".
step 1: If the file is very large and can't be sorted in memory, you can split it into chunks that can be sorted in memory.
step 2: For each sorted chunk compute sorted pairs of (word, nr_occurrence); at this point you can discard the chunks because you only need the sorted pairs.
step 3: Iterate over the chunks, merge the sorted pairs, and always keep the top ten appearances.
Example:
Step 1:
a b a ab abb a a b b c c ab ab
split into :
chunk 1: a b a ab
chunk 2: abb a a b b
chunk 3: c c ab ab
Step 2:
chunk 1: a2, b1, ab1
chunk 2: a2, b2, abb1
chunk 3: c2, ab2
Step 3(merge the chunks and keep the top ten appearances):
a4 b3 ab3 c2 abb1
// Naive O(m^2) word count: for each word, count how many times it appears.
string[] stringList = h.Split(" ".ToCharArray(),
                              StringSplitOptions.RemoveEmptyEntries);
int m = stringList.Count();
for (int j = 0; j < m; j++)
{
    int c = 0;
    for (int k = 0; k < m; k++)
    {
        if (string.Compare(stringList[j], stringList[k]) == 0)
        {
            c = c + 1;   // c now holds the frequency of stringList[j]
        }
    }
}
Not the most efficient CPU-wise, and UGLY, but it took only 2 minutes to bang out:
perl -lane '$h{$_}++ for @F; END{for $w (sort {$h{$b}<=>$h{$a}} keys %h) {print "$h{$w}\t$w"}}' file | head
Loop over each line with -n
Split each line into @F words with -a
Each $_ word increments hash %h
Once the END of file has been reached,
sort the hash by the frequency
Print the frequency $h{$w} and the word $w
Use bash head to stop at 10 lines
Using the text of this web page as input:
121 the
77 a
48 in
46 to
44 of
39 at
33 is
30 vote
29 and
25 you
I benchmarked this solution vs the top-rated shell solution (ben jackson) on a 3.3GB text file with 580,000,000 words.
Perl 5.22 completed in 171 seconds, while the shell solution completed in 474 seconds.

Fastest sorting algorithm for a specific situation

What is the fastest sorting algorithm for a large number (tens of thousands) of groups of 9 positive double precision values, where each group must be sorted individually? So it's got to sort fast a small number of possibly repeated double precision values, many times in a row.
The values are in the [0..1] interval. I don't care about space complexity or stability, just about speed.
Sorting each group individually, merge sort would probably be easiest to implement with good results.
A sorting network would probably be the fastest solution:
http://en.wikipedia.org/wiki/Sorting_network
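For doubles in C, a sorting network is just a fixed sequence of compare-exchange steps. As a small hedged illustration, here is the building block plus an optimal 4-element network (the usual 9-element network uses 25 such comparators):

/* Compare-exchange: after this, a <= b. */
#define CSWAP(a, b) do { if ((b) < (a)) { double t = (a); (a) = (b); (b) = t; } } while (0)

/* Optimal 4-element sorting network (5 comparators). */
void sort4(double v[4])
{
    CSWAP(v[0], v[1]); CSWAP(v[2], v[3]);
    CSWAP(v[0], v[2]); CSWAP(v[1], v[3]);
    CSWAP(v[1], v[2]);
}

Since the comparison sequence is fixed and branch-light, networks like this tend to pipeline and vectorize well.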
Good question because this comes down to "Fastest way to sort an array of 9 elements", and most comparisons between and analysis of sorting methods are about large N. I assume the 'groups' are clearly defined and don't play a real role here.
You will probably have to benchmark a few candidates because a lot of factors (locality) come into play here.
In any case, making it parallel sounds like a good idea. Use Parallel.For() if you can use .NET 4.
I think you will need to try out a few examples to see what works best, as you have an unusual set of conditions. My guess that the best will be one of
sorting network
insertion sort
quick sort (one level -- insertion sort below)
merge sort
Given that double precision numbers are relatively long, I suspect you will not do better with a radix sort, but feel free to add it in.
For what it's worth, Java uses quicksort on doubles until the number of items to be sorted drops below 7, at which point it uses insertion sort. The third option mimics that solution.
Also your overall problem is embarrassingly parallel so you want to make use of parallelism when possible. The problem looks too small for a distributed solution (more time would be lost in networking than saved), but if set up right, your problem can make use of multiple cores very effectively.
It looks like you want the most cycle-stingy way to sort 9 values. Since the number of values is limited, I would (as Kathy suggested) first do an unrolled insertion sort on the first 4 elements and the second 5 elements. Then I would merge those two groups.
Here's an unrolled insertion sort of 4 elements:
if (u[1] < u[0]) swap(u[0], u[1]);
if (u[2] < u[0]) swap(u[0], u[2]);
if (u[3] < u[0]) swap(u[0], u[3]);
if (u[2] < u[1]) swap(u[1], u[2]);
if (u[3] < u[1]) swap(u[1], u[3]);
if (u[3] < u[2]) swap(u[2], u[3]);
Here's a merge loop. The first set of 4 elements is in u, and the second set of 5 elements is in v. The result is in r.
i = j = k = 0;
while (i < 4 && j < 5) {
    if (u[i] < v[j]) r[k++] = u[i++];
    else if (v[j] < u[i]) r[k++] = v[j++];
    else {
        r[k++] = u[i++];
        r[k++] = v[j++];
    }
}
while (i < 4) r[k++] = u[i++];
while (j < 5) r[k++] = v[j++];

Create a sorted array out of 2 arrays

There are 2 arrays given, for example:
A = [20, 4, 21, 6, 3]
B = [748, 32, 48, 92, 23, ...]
Assume B is very large and can hold all the elements of array A.
Find a way to get array B into sorted order so that it also contains all the elements of array A.
Design the algorithm in the most efficient way.
This sounds like the merge sort algorithm. You will find tons of examples here. You can then modify one to suit.
Given that your arrays are integer arrays, you can use the radix sort algorithm to sort B in linear time, O(n). Wikipedia has a nice write-up and sample Python code.
Radix sort is linear with respect to the number of elements. While it also has a dependence on the size of the integers, you can take that as a constant, just as you take the comparison operator to be constant. When sorting bignums, for instance, the comparison operator would also depend on the integer size!
Smells like homework. Basically, write into array B starting from the end, keeping track of the place you are reading from in both A and B.
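Assuming both arrays are sorted ascending and B has enough trailing free space for A's elements, that back-to-front merge might look like this in C (lenB counts only B's real elements):

/* Merge sorted A (lenA elements) into sorted B in place. B's buffer holds
   lenA + lenB slots, with the first lenB filled; we write from the back. */
void merge_into_b(const int *A, int lenA, int *B, int lenB)
{
    int i = lenA - 1;          /* last unread element of A          */
    int j = lenB - 1;          /* last unread element of B          */
    int k = lenA + lenB - 1;   /* next write position, from the end */

    while (i >= 0 && j >= 0)
        B[k--] = (A[i] > B[j]) ? A[i--] : B[j--];
    while (i >= 0)             /* leftover A elements; leftover B's are already in place */
        B[k--] = A[i--];
}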
Just try it out:
Merge array A into B.
Then use the quick sort algorithm.
Before adding elements from A to B, check whether doing so would exceed the size of the array; if not, merge A into B,
then do a quick sort.
But if you just want to merge both arrays into a new array whose length is the combined length of both, here is a jump start for you; see if you can go forward from here...
public double[] combineArrays(double[] first, double[] second) {
    int totalLength = first.length + second.length;
    double[] newDoubles = new double[totalLength];
    for (int i = 0; i < first.length; i++) {
        newDoubles[i] = first[i];
    }
    for (int j = first.length; j < newDoubles.length; j++) {
        newDoubles[j] = second[j - first.length];
    }
    return newDoubles;
}
Hope this helps, Good Luck.
You can also modify the insertion sort idea:
0) do all necessary tests: that the arrays are not null and that the bigger array has enough space
1) add the small array at the end of the big array
2) do a normal insertion sort, but start it at the beginning of the small array
Here, if you do quicksort or some other "quicker" O(n log n) sort, the problem is that you are not using the fact that both arrays are sorted. With the insertion sort you are using the fact that array B is sorted (but not the fact that A is sorted, so maybe we should develop the idea and modify insertion sort to use that fact as well).
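A sketch of step 2 in C (the small array has already been copied to the tail of b; start is the index where it begins):

/* Insertion sort that only inserts the elements from index 'start' on;
   everything before 'start' (the original B) is assumed already sorted. */
void insert_tail_block(int *b, int total_len, int start)
{
    for (int i = start; i < total_len; i++) {
        int key = b[i];
        int j = i - 1;
        while (j >= 0 && b[j] > key) {   /* shift larger sorted elements right */
            b[j + 1] = b[j];
            j--;
        }
        b[j + 1] = key;
    }
}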

Resources