From the Wikipedia page on block sort, I gathered that block sort works by dividing the initial array into small subarrays (of length 16, for example), sorting all of those subarrays in O(n) total time, and then merging the blocks in a way I can't understand.
For example, take an array of length 16, divide it into 4 blocks of length 4 each, and sort those blocks:
10 1 8 3 4 19 20 13 14 17 8 9 12 18 7 20
10 1 8 3 ----- 4 19 20 13 ----- 14 17 8 9 ----- 12 18 7 20
1 3 8 10 ----- 4 13 19 20 ----- 8 9 14 17 ----- 7 12 18 20
Can anyone please explain how the merge step works?
Ordinary merge sort goes even further and splits the array into blocks of 2. To merge two blocks, it keeps a pointer to the beginning of each block and compares the values they point to. It picks the smaller value and advances the corresponding pointer.
1 4 5 ...
^
2 3 4 ...
^
Pick 1, because it's smaller, and advance that pointer
1 4 5 ...
  ^
2 3 4 ...
^
Pick 2
1 4 5 ...
  ^
2 3 4 ...
  ^
Pick 3, and so on.
The picked values go into an output array, which in turn is merged with another array built the same way. Merging continues like this until the whole array is sorted. I'm leaving out the many optimizations a real merge implementation would use.
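For reference, here is a Python sketch of the two-pointer merge described above. It is the ordinary merge-sort merge, which writes into an O(n) output list. It is not block sort's in-place merge. The names merge and merge_sort are illustrative.

```python
def merge(left, right):
    """Merge two sorted lists by repeatedly taking the smaller head element."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        # On ties take from `left`, which keeps the merge stable.
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    # One side is exhausted; the remainder of the other side is already sorted.
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(a):
    """Split down to single elements, then merge the pieces back together."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))


print(merge([1, 4, 5], [2, 3, 4]))  # [1, 2, 3, 4, 4, 5]
```

Block sort's contribution is to perform this merge step with only O(1) extra memory. This sketch does not attempt that.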
The first step of block sort's merge is extracting buffers. That is the only part I know well, and it goes like this: compute the square root of the array's length, then find that many unique values near the beginning and that many near the end. Using either rotations or reversals, you can move them all to the very beginning and the very end. After that, I don't know how the rest gets merged.
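The two primitives mentioned here can be sketched as follows. The sketch assumes the range being searched is already sorted, as it is after the block-sorting step. rotate builds a rotation out of three reversals. extract_front_buffer is a hypothetical, simplified helper that uses those rotations to gather the first k distinct values at the front. It does not pull a second buffer from the end, and it does not handle the fallback cases a real block sort needs.

```python
from math import isqrt


def reverse(a, lo, hi):
    """Reverse a[lo:hi] in place."""
    hi -= 1
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo += 1
        hi -= 1


def rotate(a, lo, mid, hi):
    """Rotate a[lo:hi] in place so that a[mid:hi] ends up before a[lo:mid]."""
    reverse(a, lo, mid)
    reverse(a, mid, hi)
    reverse(a, lo, hi)


def extract_front_buffer(a, k):
    """Gather the first k distinct values of sorted list a into a[0:k] via rotations.

    Returns how many distinct values were gathered (fewer than k if a runs out).
    """
    if k <= 0 or not a:
        return 0
    count = 1  # a[0] is trivially the first distinct value
    for i in range(1, len(a)):
        if count == k:
            break
        if a[i] != a[count - 1]:
            # a[count:i] holds the duplicates skipped so far; rotate a[i] in front of them.
            rotate(a, count, i, i + 1)
            count += 1
    return count


a = [1, 1, 2, 2, 2, 3, 4, 4, 5]
found = extract_front_buffer(a, isqrt(len(a)))
print(found, a)  # 3 [1, 2, 3, 1, 2, 2, 4, 4, 5]
```

The skipped duplicates stay sorted behind the buffer. This matters because the rest of the range still has to be merged afterwards.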
Let's say we have an array of size N with values from 1 to N inside it. We want to check if this array has any duplicates. My friend suggested two ways that I showed him were wrong:
Take the sum of the array and check it against the sum 1+2+3+...+N. I gave the example 1,1,4,4 which proves that this way is wrong since 1+1+4+4 = 1+2+3+4 despite there being duplicates in the array.
Next he suggested the same thing with multiplication, i.e., checking whether the product of the elements in the array equals N!. But again this fails with an array like 2,2,3,2, where 2x2x3x2 = 1x2x3x4.
Finally, he suggested doing both checks, and if one of them fails, then there is a duplicate in the array. I can't help but feel that this is still incorrect, but I can't prove it to him by giving him an example of an array with duplicates that passes both checks. I understand that the burden of proof lies with him, not me, but I can't help but want to find an example where this doesn't work.
P.S. I understand there are many more efficient ways to solve such a problem, but we are trying to discuss this particular approach.
Is there a way to prove that doing both checks doesn't necessarily mean there are no duplicates?
Here's a counterexample: 1,3,3,3,4,6,7,8,10,10
Found by looking for a pair of composite numbers with factorizations that change the sum & count by the same amount.
I.e., 9 -> 3, 3 reduces the sum by 3 and increases the count by 1, and 10 -> 2, 5 does the same. So by converting 2,5 to 10 and 9 to 3,3, I leave both the sum and the count unchanged. The product is unchanged too, of course, since I'm only replacing numbers with their factors and vice versa.
Here's a much longer one.
24 -> 2*3*4 increases the count by 2 and decreases the sum by 15
2*11 -> 22 decreases the count by 1 and increases the sum by 9
2*8 -> 16 decreases the count by 1 and increases the sum by 6.
We have a second 2 available because of the factorization of 24.
This gives us:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24
Has the same sum, product, and count of elements as
1,3,3,4,4,5,6,7,9,10,12,13,14,15,16,16,17,18,19,20,21,22,22,23
In general you can find these by finding all factorizations of composite numbers, seeing how they change the sum & count (as above), and choosing changes in both directions (composite <-> factors) that cancel out.
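Here is a quick Python check of the friend's combined test against both counterexamples above. The names passes_both_checks and has_duplicates are illustrative.

```python
from math import factorial, prod


def passes_both_checks(arr):
    """The friend's test: sum equals 1+2+...+N and product equals N!, with N = len(arr)."""
    n = len(arr)
    return sum(arr) == n * (n + 1) // 2 and prod(arr) == factorial(n)


def has_duplicates(arr):
    return len(set(arr)) != len(arr)


short = [1, 3, 3, 3, 4, 6, 7, 8, 10, 10]
long_ = [1, 3, 3, 4, 4, 5, 6, 7, 9, 10, 12, 13, 14, 15,
         16, 16, 17, 18, 19, 20, 21, 22, 22, 23]
for arr in (short, long_):
    # Both lists pass the combined test even though both contain duplicates.
    print(passes_both_checks(arr), has_duplicates(arr))  # True True
```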
I just wrote a simple, not very efficient brute-force function. It shows, for example, that the sequence
1 2 4 4 4 5 7 9 9
has the same sum and product as
1 2 3 4 5 6 7 8 9
For n = 10 there are more such sequences:
1 2 3 4 6 6 6 7 10 10
1 2 4 4 4 5 7 9 9 10
1 3 3 3 4 6 7 8 10 10
1 3 3 4 4 4 7 9 10 10
2 2 2 3 4 6 7 9 10 10
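The linked C++ code is not reproduced here. What follows is an independent Python sketch of the same brute-force idea. It enumerates every sorted length-n multiset over 1..n with itertools.combinations_with_replacement and keeps those whose sum and product match 1..n. There are C(2n-1, n) candidates, so this is practical only for small n. The name impostors is illustrative.

```python
from itertools import combinations_with_replacement
from math import factorial, prod


def impostors(n):
    """Sorted length-n multisets over 1..n, other than 1..n itself,
    whose sum and product equal those of 1..n."""
    target_sum, target_prod = n * (n + 1) // 2, factorial(n)
    identity = tuple(range(1, n + 1))
    return [combo
            for combo in combinations_with_replacement(range(1, n + 1), n)
            if combo != identity
            and sum(combo) == target_sum
            and prod(combo) == target_prod]


for combo in impostors(10):
    print(*combo)
```

Each of the sequences listed above satisfies both conditions, so each one appears in the output for its n.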
My write-only C++ code is here: https://ideone.com/2oRCbh
What is an efficient algorithm to replace the values in an image while minimizing the largest value and maintaining order?
Background
I have an 8.5Gb image, represented as a matrix of rows and columns.
Suppose we have a smaller version (the input contains no duplicates):
4, 5, 9,
2, 3, 7,
8, 6, 1
I need to replace the entry at each pixel with a positive integer (greater than zero), making the largest value in the entire matrix as small as possible while preserving the row-wise and column-wise ordering.
One possible output (duplicates are allowed here) is the following; its maximum value is 5 (I do not believe it can be reduced to 4):
2, 3, 4,
1, 2, 3,
5, 4, 1
The reason it works:
Input: first row 4 < 5 < 9, first column 4 > 2 < 8
Output: first row 2 < 3 < 4, first column 2 > 1 < 5
The orderings are being maintained. The same for the other rows and columns:
5 > 3 < 6 <=> 3 > 2 < 4
...
...
----------------------------------------- Attempt: My wrong algorithm -----------------------------------------
1. Each row and column contains unique elements, so start with the first row and assign integers from 1 to the total number of rows:
1 2 3
x x x
x x x
The maximum in that row is currently at 3.
2. Go to the next row, which is 2,3,7, and again assign numbers from 1 to the total number of rows. When we assign 1, we check all the previous rows for conflicts. In this case 1 is already present in the previous row, and we need a number smaller than 1, so place a zero there (I will offset every entry by one later).
1 2 3
0 1 2
x x x
The maximum in that row is currently 2.
3. Go to the next row and again fill it as above. But 1 has already occurred, and we need a number larger than those in the first and second rows:
So, try 2. The next number needs to be larger than 2 and 1 (column) and smaller than 2 (row). That is a huge problem. I need to change too many cells each time.
For severe clarity, I'll add 10 to each of your values.
Input Ordering
14 15 19 - - -
12 13 17 - - -
18 16 11 - - -
Consider the values in order, smallest to largest. Each element receives an ordering value that is the smallest integer available at that location. "Available" means that the number is larger than any ordering value already assigned in the same row or column.
11 and 12 aren't in the same row or column, so we can assign both of those immediately.
Input Ordering
14 15 19 - - -
12 13 17 1 - -
18 16 11 - - 1
When we consider 13, we see that it is in the same row as a 1, so it must get the next larger value:
Input Ordering
14 15 19 - - -
12 13 17 1 2 -
18 16 11 - - 1
14 has the same problem, being above a 1:
Input Ordering
14 15 19 2 - -
12 13 17 1 2 -
18 16 11 - - 1
Continue this process for each number. Take the maximum of the orderings in that number's row and column. Add 1 and assign that ordering.
Input Ordering
14 15 19 2 3 -
12 13 17 1 2 -
18 16 11 - 4 1
Input Ordering
14 15 19 2 3 4
12 13 17 1 2 3
18 16 11 5 4 1
There's a solution. The "dominance" path 18 > 16 > 15 > [14 or 13] > 12 demonstrates that 5 is the lowest max value.
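The value-order procedure above can be sketched in Python. The sketch assumes all values are distinct, as in the question. It also assumes the whole matrix fits in memory, so on its own it does not address the 8.5Gb case. Cells are visited in increasing order, so the last label assigned in a row or column is also its largest. One running maximum per row and per column is therefore enough. The name relabel_by_value_order is illustrative.

```python
def relabel_by_value_order(matrix):
    """Replace each entry with the smallest label consistent with row/column order.

    Assumes all values are distinct and the matrix fits in memory.
    """
    rows, cols = len(matrix), len(matrix[0])
    row_max = [0] * rows  # largest label assigned so far in each row
    col_max = [0] * cols  # largest label assigned so far in each column
    out = [[0] * cols for _ in range(rows)]
    # Visit cells from smallest value to largest.
    cells = sorted((matrix[r][c], r, c) for r in range(rows) for c in range(cols))
    for _, r, c in cells:
        label = max(row_max[r], col_max[c]) + 1
        out[r][c] = label
        row_max[r] = col_max[c] = label
    return out


print(relabel_by_value_order([[4, 5, 9], [2, 3, 7], [8, 6, 1]]))
# [[2, 3, 4], [1, 2, 3], [5, 4, 1]]
```

The cost is dominated by the sort, O(N log N) for N cells.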
You can also solve this by converting the locations to a directed graph. Nodes in the same row or column have an edge connecting them; the edge is directed from the smaller to the larger. It will be sufficient to order the values and merely connect the adjacent values: given 14->15 and 15->19, we don't need 14->19 as well.
Add a node 0 with label 0 and an edge to each node that has no other input edges.
Now follow a typical labeling iteration: any node with all its inputs labeled receives a label that is one more than the largest of its inputs.
This is the same algorithm as above, but its correctness and minimality are much easier to see.
14 -> 15 -> 19
12 -> 13 -> 17
11 -> 16 -> 18
12 -> 14 -> 18
13 -> 15 -> 16
11 -> 17 -> 19
0 -> 11
0 -> 12
Now, if we shake out the topology of this, starting on the left, we get:
0   11  13  17
    12  14  15  16  18
                19
This makes the numbering obvious: each node is labeled with the length of its longest path from the start node.
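The graph formulation can be sketched the same way. Connect adjacent values in each sorted row and column. Then label the nodes in topological order (Kahn's algorithm) with the length of their longest path. The virtual start node 0 is implicit here: nodes with no incoming edges get label 1, one more than the start node's 0. The name relabel_via_dag is illustrative. Like the previous approach, this sketch assumes distinct values and an in-memory matrix.

```python
from collections import deque


def relabel_via_dag(matrix):
    """Label each cell with the length of its longest path from a virtual start node."""
    rows, cols = len(matrix), len(matrix[0])
    n = rows * cols
    succ = [[] for _ in range(n)]  # edges point from the smaller value to the larger
    indeg = [0] * n

    def add_chain(cells):
        # Sort one row or column by value and connect only adjacent values.
        ordered = sorted(cells, key=lambda rc: matrix[rc[0]][rc[1]])
        for (r1, c1), (r2, c2) in zip(ordered, ordered[1:]):
            u, v = r1 * cols + c1, r2 * cols + c2
            succ[u].append(v)
            indeg[v] += 1

    for r in range(rows):
        add_chain([(r, c) for c in range(cols)])
    for c in range(cols):
        add_chain([(r, c) for r in range(rows)])

    # Nodes without inputs hang off the start node (label 0), so they get label 1.
    label = [1 if indeg[v] == 0 else 0 for v in range(n)]
    queue = deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            label[v] = max(label[v], label[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:  # all inputs are labeled, so label[v] is final
                queue.append(v)
    return [label[r * cols:(r + 1) * cols] for r in range(rows)]


print(relabel_via_dag([[14, 15, 19], [12, 13, 17], [18, 16, 11]]))
# [[2, 3, 4], [1, 2, 3], [5, 4, 1]]
```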
Your memory problem should be edited into your question, or asked as a new question. You have non-trivial dependencies along rows and columns. If your data do not fit into memory, then you may want to make a disk-hosted database to store your pre-processed data. For instance, you could store the graph as a list of edges keyed by dependencies:
11 none
12 none
13 12
14 12
15 13, 14
16 11, 15
17 11, 13
18 14, 16
19 15, 17
You haven't described the shape of your data. At the very worst, you should be able to build this graph database with one pass to do the rows, and then one pass per column (or several columns per pass, depending on how many you can fit into memory at once).
Then you can apply the algorithm to the items in the database. You can speed it up by keeping in memory not only all nodes with no dependencies, but also a list of nodes with few dependencies, where "few" depends on how much memory you have.
For instance, make one pass over the database to grab every cell with 0 or 1 dependencies. Put the independent nodes in your "active" list; as you process those, add nodes from the "1-dependency" list as they're freed up. Once you've exhausted those sub-graphs, make a large pass to (1) update the database and (2) extract the next sets of nodes with 0 or 1 dependency.
Let's look at this with the example you gave. First, we make a couple of lists from the original graph:
0-dep 11, 12
1-dep 13 (on 12), 14 (on 12)
This pass is trivial: we assign 1 to cells 11 and 12; 2 to cells 13 and 14. Now update the graph:
node  dep      done (assigned values)
15    none     2, 2
16    15       1
17    none     1, 2
18    16       2
19    15, 17
Refresh the in-memory lists:
0-dep 15, 17
1-dep 16 (on 15), 18 (on 16)
On this pass, both 15 and 17 depend on a node with value 2, so they are both assigned 3. Resolving 15 frees node 16, which gets value 4. This, in turn, frees up node 18, which gets the value 5.
In one final pass, node 19 now has no outstanding dependencies. Its maximum upstream value is 3, so it gets the value 4.
In the worst case, where you can't even hold all independent nodes in memory at once, you can still grab as many as you can fit, assign their values in an in-memory pass, and return to the disk for more to process.
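The pass structure just walked through can be simulated in Python. A plain dict maps each node to its predecessors and stands in for the hypothetical on-disk table. Each iteration of the outer loop models one large pass that collects the 0- and 1-dependency lists. The inner loop is the in-memory pass. The name batched_labels is illustrative, and the sketch does not model the memory limits that motivate the scheme.

```python
def batched_labels(deps):
    """deps maps node -> list of predecessor nodes (stand-in for the on-disk table).

    Returns (labels, number_of_large_passes).
    """
    label = {}
    passes = 0
    while len(label) < len(deps):
        passes += 1
        # Large pass: collect nodes with 0 or 1 unresolved dependencies.
        active, waiting = [], {}
        for node, preds in deps.items():
            if node in label:
                continue
            pending = [p for p in preds if p not in label]
            if not pending:
                active.append(node)
            elif len(pending) == 1:
                waiting.setdefault(pending[0], []).append(node)
        if not active:
            raise ValueError("dependency cycle: no node can be resolved")
        # In-memory pass: label active nodes, releasing 1-dependency nodes
        # as soon as the node they wait on has been labeled.
        while active:
            node = active.pop()
            label[node] = 1 + max((label[p] for p in deps[node]), default=0)
            active.extend(waiting.pop(node, []))
    return label, passes


deps = {11: [], 12: [], 13: [12], 14: [12], 15: [13, 14],
        16: [11, 15], 17: [11, 13], 18: [14, 16], 19: [15, 17]}
labels, passes = batched_labels(deps)
print(passes, labels)
```

On the example table, the sketch takes three large passes. It assigns 1 to 11 and 12, 2 to 13 and 14, 3 to 15 and 17, 4 to 16 and 19, and 5 to 18, the same values as in the walk-through.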
Can you handle the data manipulations from here?
I am trying to solve the http://www.spoj.com/problems/FIBTWIST/ problem using a linear recurrence. However, since the constraints are large, I have to use matrix exponentiation.
I have read http://zobayer.blogspot.in/2010/11/matrix-exponentiation.html
According to it, the equations are:
ft(n)=ft(n-1)+ft(n-2)+g(n), ft(0)=0, ft(1)=1
g(n)=g(n-1)+1, g(1)=0
But now I am confused about how to form matrices A and B of the form A*M=B. This is given as Type 7 in the blog post linked above, but I am having difficulty understanding it.
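Here is one way to set up the matrices. It is a sketch, not necessarily the exact Type 7 form from the blog post. Carry the state vector s(k) = [ft(k), ft(k-1), k, 1]. Including k and a constant 1 makes the g(k+1) = k term linear. Then s(k+1) = T * s(k), so s(n) = T^(n-1) * s(1). The optional mod parameter is an assumption, for the case where the answer is needed modulo some number. Check the problem statement for what is actually required.

```python
def mat_mul(a, b, mod=None):
    """Multiply matrix a (n x m) by b (m x p); optionally reduce entries mod `mod`."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
           for i in range(rows)]
    if mod is not None:
        out = [[x % mod for x in row] for row in out]
    return out


def mat_pow(m, e, mod=None):
    """Raise square matrix m to the power e by repeated squaring."""
    result = [[int(i == j) for j in range(len(m))] for i in range(len(m))]
    while e > 0:
        if e & 1:
            result = mat_mul(result, m, mod)
        m = mat_mul(m, m, mod)
        e >>= 1
    return result


# s(k) = [ft(k), ft(k-1), k, 1]; since g(k+1) = k, s(k+1) = T * s(k) with:
T = [[1, 1, 1, 0],   # ft(k+1) = ft(k) + ft(k-1) + k
     [1, 0, 0, 0],   # ft(k) is copied down
     [0, 0, 1, 1],   # k + 1
     [0, 0, 0, 1]]   # the constant 1


def ft(n, mod=None):
    if n == 0:
        return 0
    s1 = [[1], [0], [1], [1]]  # s(1) = [ft(1), ft(0), 1, 1]
    return mat_mul(mat_pow(T, n - 1, mod), s1, mod)[0][0]


print([ft(n) for n in range(8)])  # [0, 1, 2, 5, 10, 19, 34, 59]
```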
Define a third sequence, fut, Fibonacci-untwist, as
fut(n)=ft(n)+(n+2).
Then
fut(n)=ft(n)+n+2=ft(n-1)+ft(n-2)+(n-1)+(n+2)=[ft(n-1)+(n+1)]+[ft(n-2)+n]=fut(n-1)+fut(n-2)
So fut is just another solution of the Fibonacci recursion, with fut(0)=ft(0)+2=2 and fut(1)=ft(1)+3=4, and thus
fut(n)=f(n-1)*fut(0)+f(n)*fut(1)=2*f(n-1)+4*f(n)=2*f(n)+2*f(n+1)=2*f(n+2)
and finally
ft(n)=2*f(n+2)-(n+2)
Test:
f(n): 0 1 1 2 3 5 8 13 21 34
2*f(n+2): 2 4 6 10 16 26 42 68
n+2: 2 3 4 5 6 7 8 9
ft(n): 0 1 2 5 10 19 34 59
and really, the last row is the difference of the second and third row.
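With the closed form, each query needs only one Fibonacci number. The sketch below uses fast doubling, F(2k) = F(k)(2F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2, which takes O(log n) steps. The names fib_pair and ft_closed are illustrative. If the result is needed modulo m, reduce inside fib_pair. Then take the final subtraction modulo m so the result stays non-negative.

```python
def fib_pair(n):
    """Return (F(n), F(n+1)) by fast doubling, with F(0) = 0 and F(1) = 1."""
    if n == 0:
        return 0, 1
    a, b = fib_pair(n // 2)   # F(k), F(k+1) with k = n // 2
    c = a * (2 * b - a)       # F(2k)
    d = a * a + b * b         # F(2k+1)
    if n % 2 == 0:
        return c, d
    return d, c + d           # F(2k+1), F(2k+2)


def ft_closed(n):
    """ft(n) = 2*f(n+2) - (n+2)."""
    return 2 * fib_pair(n + 2)[0] - (n + 2)


print([ft_closed(n) for n in range(8)])  # [0, 1, 2, 5, 10, 19, 34, 59]
```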
Quick question about polyphase sorting. When distributing the input, do you write the files sequentially? Starting from:
F1: 10 13 7 8 9 4 3 17 18 2
F2: -
F3: -
F4: -
Sequential distribution gives this:
F1: -
F2: 10 13 7 8
F3: 9 4 3 17
F4: 18 2
Or do you alternate, like this?
F1: -
F2: 10 8 3 2
F3: 13 9 17
F4: 7 4 18
Or does it not matter?
For any particular input, how you split it up will definitely influence the number of passes you need in some cases, but in general it should not matter: some inputs will do slightly better with one method, and some slightly worse. In the end, the average case over all possible inputs will be the same.
However, what the second method has going for it is that you do not have to pre-scan the input to count elements in order to figure out how many elements to put in each bucket.
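The two layouts from the question can be sketched as follows. In a real polyphase sort, each item would be a sorted run, and the number of runs per file follows a Fibonacci-like distribution. This sketch only illustrates the answer's point: round-robin distribution can consume a stream without knowing its length in advance, while sequential chunking needs the total count first. The names split_sequential and split_round_robin are illustrative.

```python
def split_sequential(items, k):
    """Fill file 1, then file 2, ...; needs len(items) up front to size the chunks."""
    size = -(-len(items) // k)  # ceiling division
    return [items[i * size:(i + 1) * size] for i in range(k)]


def split_round_robin(items, k):
    """Deal items to the k files in turn; works on a stream of unknown length."""
    files = [[] for _ in range(k)]
    for idx, item in enumerate(items):
        files[idx % k].append(item)
    return files


data = [10, 13, 7, 8, 9, 4, 3, 17, 18, 2]
print(split_sequential(data, 3))         # [[10, 13, 7, 8], [9, 4, 3, 17], [18, 2]]
print(split_round_robin(iter(data), 3))  # [[10, 8, 3, 2], [13, 9, 17], [7, 4, 18]]
```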
The Build-Heap algorithm given in CLRS
BUILD-MAX-HEAP(A)
  heap-size[A] ← length[A]
  for i ← ⌊length[A]/2⌋ downto 1
    do MAX-HEAPIFY(A, i)
It produces only one of several possible heaps. Are there other algorithms that would yield a different heap from the one the above algorithm produces?
For input array
A={4,1,3,2,16,9,10,14,8,7}
Build-Heap produces A={16,14,10,8,7,9,3,2,4,1}, which satisfies the heap property.
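A direct Python translation of BUILD-MAX-HEAP reproduces that result. It uses 0-based indices, and MAX-HEAPIFY is written iteratively instead of recursively.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down until the subtree rooted at i is a max-heap (0-based indices)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """CLRS BUILD-MAX-HEAP: heapify every internal node, from the last one up to the root."""
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))
    return a


print(build_max_heap([4, 1, 3, 2, 16, 9, 10, 14, 8, 7]))
# [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```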
Maybe this is the most efficient algorithm to build a heap from an array, but there are several other permutations of the array which also have the heap property.
When I generated all permutations of the array and tested each one for the heap property, I got 3360 permutations that had the heap property.
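The figure of 3360 can be cross-checked without enumerating all 10! permutations. In a max-heap of distinct values, the maximum must sit at the root. The remaining n-1 values are split between the two subtrees, whose sizes are fixed by the heap's shape, and each subtree is itself a heap. That gives the recurrence H(n) = C(n-1, L) * H(L) * H(n-1-L), where L is the size of the left subtree. Here is a sketch with a brute-force check for small n; the function names are illustrative.

```python
from functools import lru_cache
from itertools import permutations
from math import comb


def subtree_size(i, n):
    """Number of nodes in the subtree rooted at 0-based index i of an n-element heap."""
    if i >= n:
        return 0
    return 1 + subtree_size(2 * i + 1, n) + subtree_size(2 * i + 2, n)


@lru_cache(maxsize=None)
def count_heaps(n):
    """Number of distinct max-heaps that can be formed from n distinct values."""
    if n <= 1:
        return 1
    left = subtree_size(1, n)
    right = n - 1 - left
    # Choose which values go left; each (complete) subtree is counted independently.
    return comb(n - 1, left) * count_heaps(left) * count_heaps(right)


def is_max_heap(a):
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))


def count_heaps_brute(n):
    return sum(is_max_heap(p) for p in permutations(range(n)))


print(count_heaps(10))  # 3360
```

The count depends only on n when all values are distinct. It does not depend on the values themselves.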
Count1 16 9 14 4 8 10 3 2 1 7
Count2 16 9 14 4 8 10 3 1 2 7
Count3 16 9 14 4 8 10 2 1 3 7
Count4 16 9 14 4 8 10 2 3 1 7
Count5 16 9 14 4 8 10 7 2 1 3
Count6 16 9 14 4 8 10 7 2 3 1
Count7 16 9 14 4 8 10 7 1 3 2
Count8 16 9 14 4 8 10 7 1 2 3
Count9 16 9 14 4 8 10 7 3 1 2
Count10 16 9 14 4 8 10 7 3 2 1
...........................................................
Count3358 16 8 14 7 4 9 10 2 1 3
Count3359 16 8 14 7 4 9 10 3 2 1
Count3360 16 8 14 7 4 9 10 3 1 2
So is there a different build-heap algorithm whose output differs from that of the above algorithm, i.e., one that gives some other of the 3360 possible outcomes?
Once we have used Build-Heap to get an array that satisfies the heap property, how can we generate as many other valid heaps as possible from this array? We can swap the leaf nodes of the heap to generate some of them. Is there any other way to get more of the possible heaps without testing every permutation for the heap property?
Given the range of values in the array, and with all values distinct, can we say anything about the total number of permutations that satisfy the heap property?
Any heap building algorithm will be sensitive to the order in which items are inserted. Even the Build-Heap algorithm will generate a different heap if you give it the same elements, but in a different order.
Remember that when you're building a heap, the partially-built part must maintain the heap property after each insertion. So that's going to limit the different permutations that can be generated by any particular algorithm.
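For example, building a heap by repeated insertion is a different algorithm from BUILD-MAX-HEAP: each element is appended and sifted up. On the same input, it produces a different valid heap. Here is a sketch with illustrative function names.

```python
def heap_insert(heap, value):
    """Append value to a max-heap and sift it up to restore the heap property."""
    heap.append(value)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] >= heap[i]:
            break
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent


def build_heap_by_insertion(items):
    """Build a max-heap by inserting items one at a time (O(n log n) worst case)."""
    heap = []
    for x in items:
        heap_insert(heap, x)
    return heap


print(build_heap_by_insertion([4, 1, 3, 2, 16, 9, 10, 14, 8, 7]))
# [16, 14, 10, 8, 7, 3, 9, 1, 4, 2]
```

BUILD-MAX-HEAP gives [16, 14, 10, 8, 7, 9, 3, 2, 4, 1] for the same input, so the two algorithms reach different members of the 3360 valid heaps.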
Given a heap, it's fairly easy to generate at least some of the permitted permutations.
A node doesn't care about the relative size of its two child nodes. Therefore, you can swap the children of any node, then do a sift-down on the smaller of the two to ensure that the heap property is maintained for that subtree. That is, if it's smaller than one of its children, swap it with the larger child, and keep doing the same down that path. Stop when it reaches a spot where it's larger than both children, or when it has moved far enough toward the end of the array that it's a leaf node.
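The swap-and-repair step can be sketched as follows, using 0-based array indices. After the swap, only the smaller of the two values can end up above a larger descendant, so only it needs to be sifted down. The name swap_children is illustrative.

```python
def sift_down(heap, i):
    """Move heap[i] down, swapping with its larger child, until the heap property holds."""
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


def swap_children(heap, i):
    """Return a copy of `heap` with node i's two children swapped, then repaired."""
    out = list(heap)
    left, right = 2 * i + 1, 2 * i + 2
    if right >= len(out):
        return out  # node i has fewer than two children; nothing to swap
    out[left], out[right] = out[right], out[left]
    # The larger value moved onto a subtree whose entries are all smaller than it,
    # so only the smaller value can violate the heap property below it.
    sift_down(out, left if out[left] < out[right] else right)
    return out


print(swap_children([9, 8, 2, 7, 6, 1], 0))  # [9, 7, 8, 2, 6, 1]
```

Here the swap gives [9, 2, 8, 7, 6, 1], and sifting 2 down then yields the different valid heap [9, 7, 8, 2, 6, 1].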