How to replace entries with smaller values while keeping order? - algorithm

What is an efficient algorithm to replace the values in an image while
minimizing the largest value and maintaining order?
Background
I have an 8.5Gb image, represented as a matrix of rows and columns.
Suppose we have a smaller version (there are no duplicates in input):
4, 5, 9,
2, 3, 7,
8, 6, 1
I need to replace the entry at each pixel with a positive integer (greater than zero) so that the largest value in the entire matrix is as small as possible, while preserving the row-wise and column-wise ordering.
One possible output (duplicates are allowed here) is the following; its maximum value is 5 (I do not believe it can be reduced to 4):
2, 3, 4,
1, 2, 3,
5, 4, 1
The reason it works:
Input: first row: 4 < 5 < 9 and first column: 4 > 2 < 8
Output: first row: 2 < 3 < 4 and first column: 2 > 1 < 5
The orderings are maintained. The same holds for the other rows and columns, e.g. the second column:
5 > 3 < 6 <=> 3 > 2 < 4
...
...
----------------------------------------- Attempt: My wrong algorithm -----------------------------------------
1. Each row and column contains unique elements, so start with the first row and assign integers from the range {1, ..., total number of rows}:
1 2 3
x x x
x x x
The maximum in that row is currently 3.
2. Go to the next row, which is 2, 3, 7, and again assign numbers in the range {1, ..., total number of rows}. When we assign 1, we check all the previous rows for conflicts. In this case 1 is already present in the previous row (in the same column), and we need a number smaller than 1, so place a zero there (I will offset every entry by one later).
1 2 3
0 1 2
* * *
The maximum in that row is currently 2.
3. Go to the next row and again fill as above. But 1 has already occurred, and we need numbers larger than those in the first and second rows.
So, try 2. The next number needs to be larger than 2 and 1 (column) and smaller than 2 (row). That is a huge problem: I need to change too many cells each time.

For clarity, I'll add 10 to each of your values, so that input values can't be confused with the assigned orderings.
Input Ordering
14 15 19 - - -
12 13 17 - - -
18 16 11 - - -
Consider each of the values in order, smallest to largest. Each element receives an ordering value that is the smallest integer available at that location. "Available" means that the assigned number is larger than any ordering already assigned in the same row or column.
11 and 12 aren't in the same row or column, so we can assign both of those immediately.
Input Ordering
14 15 19 - - -
12 13 17 1 - -
18 16 11 - - 1
When we consider 13, we see that it is in the same row with a 1, so it must have the next larger value:
Input Ordering
14 15 19 - - -
12 13 17 1 2 -
18 16 11 - - 1
14 has the same problem, being above a 1:
Input Ordering
14 15 19 2 - -
12 13 17 1 2 -
18 16 11 - - 1
Continue this process for each number. Take the maximum of the orderings in that number's row and column. Add 1 and assign that ordering.
Input Ordering
14 15 19 2 3 -
12 13 17 1 2 -
18 16 11 - 4 1
Input Ordering
14 15 19 2 3 4
12 13 17 1 2 3
18 16 11 5 4 1
That is a valid solution. The "dominance" path 18 > 16 > 15 > [14 or 13] > 12 contains five values, each of which must receive a larger ordering than the next, so no solution can have a maximum below 5.
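The process above can be sketched in Python. Because cells are visited in increasing order, the largest ordering already assigned in a row (or column) is always the most recent one, so this sketch keeps one running maximum per row and per column instead of rescanning them; the cost is dominated by sorting the N cells, O(N log N). It is a minimal in-memory sketch that assumes, as the question states, that all values are distinct (equal values in one row or column would need extra handling); compress_matrix is a hypothetical name.

```python
def compress_matrix(matrix):
    """Assign each cell the smallest positive ordering consistent with the
    row and column order. Assumes all values in the matrix are distinct."""
    rows, cols = len(matrix), len(matrix[0])
    row_max = [0] * rows  # largest ordering assigned so far in each row
    col_max = [0] * cols  # largest ordering assigned so far in each column
    result = [[0] * cols for _ in range(rows)]
    # Visit cells from the smallest value to the largest.
    cells = sorted((matrix[r][c], r, c) for r in range(rows) for c in range(cols))
    for _, r, c in cells:
        label = max(row_max[r], col_max[c]) + 1
        result[r][c] = label
        row_max[r] = label
        col_max[c] = label
    return result


print(compress_matrix([[4, 5, 9], [2, 3, 7], [8, 6, 1]]))
# [[2, 3, 4], [1, 2, 3], [5, 4, 1]]
```

On the question's example this reproduces the orderings in the tables above.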
You can also solve this by converting the locations to a directed graph. Nodes in the same row or column have an edge connecting them, directed from the smaller value to the larger. It is sufficient to sort each row and column and connect only adjacent values: given 14->15 and 15->19, we don't need 14->19 as well.
Add a node 0 with label 0 and an edge to each node that has no other input edges.
Now follow a typical labeling iteration: any node with all its inputs labeled receives a label that is one more than the largest of its inputs.
This is the same algorithm as the above, but the correctness and minimalism are much easier to see.
14 -> 15 -> 19
12 -> 13 -> 17
11 -> 16 -> 18
12 -> 14 -> 18
13 -> 15 -> 16
11 -> 17 -> 19
0 -> 11
0 -> 12
Now, if we shake out the topology of this, starting on the left, we get:
label 0: 0
label 1: 11 12
label 2: 13 14
label 3: 15 17
label 4: 16 19
label 5: 18
This makes the numbering obvious: each node is labeled with the length of its longest path from the start node.
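The labeling iteration described above is a topological traversal (Kahn's algorithm) with a longest-path update: a node is released once all its inputs are labeled, and its label is one more than the largest input label. The sketch below assumes the edges are given as (smaller, larger) pairs and that the extra start node 0 is already connected to every node without other inputs; longest_path_labels is a hypothetical name.

```python
from collections import defaultdict, deque


def longest_path_labels(edges):
    """Label every node with the length of its longest path from a source.
    edges: iterable of (u, v) pairs meaning u must get a smaller label than v.
    Assumes the graph is acyclic; nodes with no inputs get label 0."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    labels = {n: 0 for n in nodes if indeg[n] == 0}
    ready = deque(labels)
    while ready:
        u = ready.popleft()
        for v in succ[u]:
            # v must be larger than every input; keep the largest requirement.
            labels[v] = max(labels.get(v, 0), labels[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:  # all inputs labeled, so v's label is final
                ready.append(v)
    return labels


edges = [(14, 15), (15, 19), (12, 13), (13, 17), (11, 16), (16, 18),
         (12, 14), (14, 18), (13, 15), (15, 16), (11, 17), (17, 19),
         (0, 11), (0, 12)]
print(longest_path_labels(edges))
```

With the edge list from the answer, 11 and 12 get 1, 13 and 14 get 2, 15 and 17 get 3, 16 and 19 get 4, and 18 gets 5.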
Your memory problem should be edited into your question, or asked as a new question. You have non-trivial dependencies along rows and columns. If your data do not fit into memory, you may want to build a disk-hosted database to store your pre-processed data. For instance, you could store the graph as a list of nodes, each with the nodes it depends on:
11 none
12 none
13 12
14 12
15 13, 14
16 11, 15
17 11, 13
18 14, 16
19 15, 17
You haven't described the shape of your data. In the worst case, you should be able to build this graph database with one pass over the rows, and then one pass per column (or per group of columns, depending on how many you can fit into memory at once).
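A sketch of that pass structure, under stated assumptions: read_rows is a hypothetical callable that yields the matrix one row at a time (standing in for reading from disk), values are distinct, and the result is kept in a Python dict rather than a real database. One pass links adjacent values in each sorted row; each further pass loads only cols_per_pass columns, sorts them, and links adjacent values.

```python
def build_dependencies(read_rows, n_cols, cols_per_pass=1):
    """Return {value: sorted list of values it depends on}.
    read_rows: hypothetical callable yielding one row (a list) at a time.
    Assumes all values are distinct."""
    deps = {}
    # Pass 1: rows. Sort each row and link adjacent values (smaller -> larger).
    for row in read_rows():
        ordered = sorted(row)
        deps.setdefault(ordered[0], [])
        for small, large in zip(ordered, ordered[1:]):
            deps.setdefault(large, []).append(small)
    # Further passes: only a chunk of columns is held in memory at a time.
    for start in range(0, n_cols, cols_per_pass):
        chunk = range(start, min(start + cols_per_pass, n_cols))
        columns = {c: [] for c in chunk}
        for row in read_rows():
            for c in chunk:
                columns[c].append(row[c])
        for values in columns.values():
            ordered = sorted(values)
            for small, large in zip(ordered, ordered[1:]):
                deps[large].append(small)
    return {v: sorted(p) for v, p in deps.items()}


matrix = [[14, 15, 19], [12, 13, 17], [18, 16, 11]]
deps = build_dependencies(lambda: iter(matrix), n_cols=3)
for value in sorted(deps):
    print(value, deps[value] or "none")
```

For the example this reproduces the dependency list above (11 and 12 with none, 15 depending on 13 and 14, and so on).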
Then you can apply the algorithm to the items in the database. You can speed it up by keeping in memory not only all the nodes with no dependencies, but also a list of nodes with few dependencies, "few" depending on how much memory you have.
For instance, make one pass over the database to grab every cell with 0 or 1 dependencies. Put the independent nodes in your "active" list; as you process those, add nodes from the "1-dependency" list as they are freed up. Once you've exhausted those sub-graphs, make a large pass to (1) update the database and (2) extract the next sets of nodes with 0 or 1 dependencies.
Let's look at this with the example you gave. First, we make a couple of lists from the original graph:
0-dep 11, 12
1-dep 13 (on 12), 14 (on 12)
This pass is trivial: we assign 1 to cells 11 and 12; 2 to cells 13 and 14. Now update the graph:
node dep done (assigned values)
15 none 2, 2
16 15 1
17 none 1, 2
18 16 2
19 15, 17
Refresh the in-memory lists:
0-dep 15, 17
1-dep 16 (on 15), 18 (on 16)
On this pass, both 15 and 17 depend on a node with value 2, so they are both assigned 3. Resolving 15 frees node 16, which gets value 4. This, in turn, frees up node 18, which gets the value 5.
In one final pass, node 19 now has no outstanding dependencies. Its maximum upstream value is 3, so it gets the value 4.
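The pass structure in this walk-through can be sketched as follows. Assumptions: an in-memory dict stands in for the disk-hosted database, a "pass" means one full scan of it, and the graph is acyclic (guaranteed when values are distinct); batched_labels is a hypothetical name.

```python
def batched_labels(deps):
    """deps: {node: [nodes it depends on]}, standing in for the database.
    Returns (labels, number_of_passes). Assumes the graph is acyclic."""
    labels = {}
    passes = 0
    while len(labels) < len(deps):
        passes += 1
        # Large pass over the "database": collect 0- and 1-dependency nodes.
        active = []
        waiting = {}  # unresolved dependency -> nodes blocked only by it
        for node, preds in deps.items():
            if node in labels:
                continue
            pending = [p for p in preds if p not in labels]
            if not pending:
                active.append(node)
            elif len(pending) == 1:
                waiting.setdefault(pending[0], []).append(node)
        # In-memory pass: label active nodes, freeing 1-dependency nodes.
        while active:
            node = active.pop()
            labels[node] = 1 + max((labels[p] for p in deps[node]), default=0)
            active.extend(waiting.pop(node, []))
    return labels, passes


deps = {11: [], 12: [], 13: [12], 14: [12], 15: [13, 14],
        16: [11, 15], 17: [11, 13], 18: [14, 16], 19: [15, 17]}
labels, passes = batched_labels(deps)
print(passes)  # 3
```

On the example it takes three passes, matching the walk-through: 11 to 14 first, then 15, 17, 16 and 18, and finally 19.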
In the worst case -- you can't even hold all independent nodes in memory at once -- you can still grab as many as you can fit, assign their values in an in-memory pass, and return to the disk for more to process.
Can you handle the data manipulations from here?

Related

Find elements of given matrix

You are given an infinite matrix whose upper-left square starts with 1. Here are the first five rows of the infinite matrix :
1 2 9 10 25
4 3 8 11 24
5 6 7 12 23
16 15 14 13 22
17 18 19 20 21
Your task is to find the number present at row x and column y by observing the pattern in the matrix.
Input Format
The first input line contains an integer t: the number of test cases
After this, there are t lines, each containing integers x and y.
For each test, print the number present at row x and column y.
sample input
3
2 3
1 1
4 2
sample output
8
1
15
Hint: the numbers at the right and bottom border of a left upper square are consecutive (going either down and left, or right and up). First determine in which border your position is, then find out which direction applies, and finally find the correct number at the position (which easy formula gives you the first number in the border?).
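Following the hint: position (x, y) lies on the border of the upper-left m x m square with m = max(x, y), and that border holds the consecutive numbers (m-1)^2 + 1 through m^2. From the rows shown, the border of an even-sized square starts at the top of column m and runs down then left, while an odd-sized border starts at the left of row m and runs right then up. A minimal sketch (spiral_value is a hypothetical name; input parsing is left out):

```python
def spiral_value(x, y):
    """Value at row x, column y (both 1-based) of the infinite matrix."""
    m = max(x, y)              # (x, y) is on the border of the m x m square
    first = (m - 1) ** 2 + 1   # smallest number on that border
    last = m * m               # largest number on that border
    if m % 2 == 0:
        # Even m: down column m from row 1, then left along row m.
        return first + (x - 1) if y == m else last - (y - 1)
    # Odd m: right along row m from column 1, then up column m.
    return first + (y - 1) if x == m else last - (x - 1)


for x, y in [(2, 3), (1, 1), (4, 2)]:
    print(spiral_value(x, y))  # prints 8, 1 and 15 on separate lines
```

Each query is O(1), so t can be large.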

Block Sort Algorithm

From the Wikipedia page for block sort, I figured out that block sort works by dividing the initial array into small subarrays (of length 16, for example), sorting all those subarrays in O(n) time, and then merging all these blocks in a way I can't understand.
For example, take an array of length 16, divide it into 4 blocks of length 4 each, and sort those blocks:
10 1 8 3 4 19 20 13 14 17 8 9 12 18 7 20
10 1 8 3 ----- 4 19 20 13 ----- 14 17 8 9 ----- 12 18 7 20
1 3 8 10 ----- 4 13 19 20 ----- 8 9 14 17 ----- 7 12 18 20
Can anyone please explain how the merge step works?
Usually, merge sort goes even further and splits the array into blocks of 2. To merge, it creates a pointer to the beginning of each of the two blocks and compares their values. It picks the smaller one and increments the corresponding pointer.
1 4 5 ...
^
2 3 4 ...
^
Pick 1, because it's smaller, and advance its pointer
1 4 5 ...
  ^
2 3 4 ...
^
Pick 2
1 4 5 ...
  ^
2 3 4 ...
  ^
Pick 3, and so on...
These values are put into a new array, which is later merged with another array built with the same technique, and the merging continues until all the elements are sorted. I'm not considering the many optimizations that a real merge implementation could do.
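The two-pointer merge described above can be sketched as follows, applied pairwise to the four sorted blocks from the question. Note that this is the ordinary buffered merge from merge sort (it builds a new output list), not block sort's own merge: block sort replaces it with an in-place scheme built on the extracted buffers and rotations, which is not shown here. The function names are hypothetical.

```python
def merge(left, right):
    """Two-pointer merge of two sorted lists into a new list."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal items in order (stable)
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])   # at most one of these two is non-empty
    out.extend(right[j:])
    return out


def merge_blocks(blocks):
    """Merge sorted blocks pairwise, level by level, until one remains."""
    while len(blocks) > 1:
        merged = [merge(blocks[k], blocks[k + 1])
                  for k in range(0, len(blocks) - 1, 2)]
        if len(blocks) % 2:      # odd block out moves up unchanged
            merged.append(blocks[-1])
        blocks = merged
    return blocks[0] if blocks else []


blocks = [[1, 3, 8, 10], [4, 13, 19, 20], [8, 9, 14, 17], [7, 12, 18, 20]]
print(merge_blocks(blocks))
```

For the example, the first level produces 1 3 4 8 10 13 19 20 and 7 8 9 12 14 17 18 20, and the second level produces the fully sorted array.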
The first thing of block sort merging is to extract buffers. That is the only thing I know a lot about, and it starts like this. Find the square root of the array's length, and find that many unique values in the beginning and end. Using either rotations or reversals, you can put them all in the beginning and end. Then, I don't know how to merge the other stuff.

X-Y heuristic function for solving N-puzzle

Can somebody please explain this heuristic function? For example, for the following arrangement of a 4x4 puzzle, what's the X-Y heuristic cost?
1 2 3 4
5 6 7 8
9 10 11 12
0 13 14 15
(0 indicates blank space)
The X-Y heuristic is computed as the sum of the minimum number of column-adjacent blank swaps needed to get all tiles into their destination columns and the minimum number of row-adjacent blank swaps needed to get all tiles into their destination rows.
So in this situation:
1 2 3 4
5 6 7 8
9 10 11 12
0 13 14 15
the only misplaced tiles are 13, 14 and 15, assuming the goal state is
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 0
So in this case we first compute the number of column swaps the blank has to make to get all the tiles into their correct columns. This is 3, since the blank has to move right three times (swapping with 13, 14 and 15 in turn) to put all the tiles in the right columns.
Then we compute the number of row swaps the blank has to make. This is 0, because all the tiles are already in the correct row.
Finally h(n) = 3 + 0 = 3 .
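Each half of the heuristic can be computed by breadth-first search on a relaxed one-dimensional puzzle. This sketch assumes the usual relaxation: for the column half, the blank may swap with any tile in an adjacent column regardless of row (and symmetrically for rows), and the goal puts tile t at row (t-1) // n, column (t-1) % n with the blank last, as in the goal state above. Since only each tile's destination line matters, the search state is the blank's line plus a table counting, for each line, how many tiles there belong to each destination line. The function names are hypothetical.

```python
from collections import deque


def axis_distance(board, axis):
    """Minimum blank swaps in the relaxed 1-D problem.
    axis=0: rows, axis=1: columns. board: list of rows, 0 = blank."""
    n = len(board)
    counts = [[0] * n for _ in range(n)]  # counts[line][goal line]
    blank = None
    for r in range(n):
        for c in range(n):
            tile = board[r][c]
            line = r if axis == 0 else c
            if tile == 0:
                blank = line
            else:
                goal = (tile - 1) // n if axis == 0 else (tile - 1) % n
                counts[line][goal] += 1
    start = (blank, tuple(tuple(row) for row in counts))
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (b, cts), dist = queue.popleft()
        # Solved when every tile sits on its destination line.
        if all(cts[i][g] == 0 for i in range(n) for g in range(n) if i != g):
            return dist
        for nb in (b - 1, b + 1):  # blank moves to an adjacent line
            if not 0 <= nb < n:
                continue
            for g in range(n):     # swap with any tile on that line
                if cts[nb][g]:
                    new = [list(row) for row in cts]
                    new[nb][g] -= 1
                    new[b][g] += 1
                    state = (nb, tuple(tuple(row) for row in new))
                    if state not in seen:
                        seen.add(state)
                        queue.append((state, dist + 1))
    raise ValueError("goal not reachable")


def xy_heuristic(board):
    return axis_distance(board, 0) + axis_distance(board, 1)


board = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [0, 13, 14, 15]]
print(xy_heuristic(board))  # 3
```

For the example, the row half is 0 and the column half is 3, giving h(n) = 3.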

Insertion into a binary heap

If I have an array representing a minimum binary heap that contains the values {2, 8, 3, 10, 16, 7, 18, 13, 15}, what would the array look like after inserting the value of 4? Also, how would I demonstrate this to be correct?
I deduced it would be 2,4,3,10,8,7,18,13,15,16. Is that correct?
To demonstrate that your min heap is correct, you need to show that every parent is less than or equal to its children (which makes the root the minimum).
With 0-based indexing, the children of the node at index n are at indices 2n+1 and 2n+2, so iterate through the array and check that each child is greater than or equal to its parent. If this condition fails anywhere, the heap is invalid.
2
8 3
10 16 7 18
13 15
Push 4 at the end (it becomes a child of 16):
2
8 3
10 **16** 7 18
13 15 4
4 < 16, so swap it with its parent; next, compare with the new parent, 8:
2
**8** 3
10 4 7 18
13 15 16
4 < 8, so swap again; the new parent is 2, and 4 > 2, so no further swap is needed:
**2**
4 3
10 8 7 18
13 15 16
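The insertion traced above (append, then swap with the parent while smaller) and the parent/child check can be sketched in Python; heapq.heappush from the standard library performs the same sift-up and is used as a cross-check. heap_insert and is_min_heap are hypothetical names.

```python
import heapq


def heap_insert(heap, value):
    """Append value and sift it up while it is smaller than its parent."""
    heap.append(value)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent
    return heap


def is_min_heap(heap):
    """Every child (index i >= 1) must be >= its parent at (i - 1) // 2."""
    return all(heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap)))


h = [2, 8, 3, 10, 16, 7, 18, 13, 15]
mine = heap_insert(list(h), 4)
ref = list(h)
heapq.heappush(ref, 4)
print(mine)  # [2, 4, 3, 10, 8, 7, 18, 13, 15, 16]
print(mine == ref, is_min_heap(mine))  # True True
```

So the array deduced in the question, 2, 4, 3, 10, 8, 7, 18, 13, 15, 16, is correct.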

Summation of difference between matrix elements

I am in the process of building a function in MATLAB. As a part of it I have to calculate differences between elements in two matrices and sum them up.
Let me explain considering two matrices,
1 2 3 4 5 6
13 14 15 16 17 18
and
7 8 9 10 11 12
19 20 21 22 23 24
For the first row, four elements of each matrix are considered at a time (0 indicates padding):
(1-8)+(2-9)+(3-10)+(4-11): this replaces 1 in the initial matrix.
(2-9)+(3-10)+(4-11)+(5-12): this replaces 2 in the initial matrix.
(3-10)+(4-11)+(5-12)+(6-0): this replaces 3 in the initial matrix.
(4-11)+(5-12)+(6-0)+(0-0): this replaces 4 in the initial matrix, and so on.
I am unable to decide how to code this in MATLAB. How do I do it?
I use the following equation.
Here i ranges from 1 to n(h), the number of distance pairs, which depends on the chosen lag distance h. So if I choose a lag distance of 1, n(h) will be the number of elements minus 1.
When I use a 7 x 7 window, considering the central value, n(h) = 4 - 1 = 3, which is the case here.
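You may want to look at the circshift() function:
a = [1 2 3 4; 9 10 11 12];
b = [5 6 7 8; 12 14 15 16];
for k = 1:3
    % Shift b one column to the left and zero-pad the last column,
    % so after k iterations a(:, j) is paired with b(:, j + k).
    b = circshift(b, [0 -1]);
    b(:, end) = 0;
    % Row-wise sum of the differences for lag k (named rowDiff to
    % avoid shadowing MATLAB's built-in diff function).
    rowDiff = sum(a - b, 2)
end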
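For comparison with the circshift hint, here is a plain-Python sketch of the windowed sum exactly as spelled out in the question: result(i) is the sum of a(j) - b(j + lag) for j = i .. i + window - 1, where any index past the end of a row reads as 0. The names lagged_window_sums, lag and window are hypothetical; lag = 1 and window = 4 match the example, and both rows are assumed to have the same length.

```python
def lagged_window_sums(a_row, b_row, lag=1, window=4):
    """result[i] = sum(a[j] - b[j + lag] for j in i .. i + window - 1),
    with out-of-range indices treated as 0 (the padding)."""
    n = len(a_row)

    def at(row, k):
        return row[k] if k < n else 0

    # One difference term per position, long enough to cover the last window.
    terms = [at(a_row, j) - at(b_row, j + lag) for j in range(n + window)]
    return [sum(terms[i:i + window]) for i in range(n)]


a = [[1, 2, 3, 4, 5, 6], [13, 14, 15, 16, 17, 18]]
b = [[7, 8, 9, 10, 11, 12], [19, 20, 21, 22, 23, 24]]
for a_row, b_row in zip(a, b):
    print(lagged_window_sums(a_row, b_row))
# [-28, -28, -15, -8, -1, 6]
# [-28, -28, -3, 4, 11, 18]
```

The first three values of the first row reproduce the hand-written sums in the question: -28, -28 and (3-10)+(4-11)+(5-12)+(6-0) = -15.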

Resources