Finding neighbour blocks on a grid - algorithm

I have a grid like this:
  1234567
1 ACBDBAB
2 ABDBABC
3 ABADBAB
4 BABDAAB
5 BABCDBA
6 BDBABCB
7 ABBCBAB
Given a certain coordinate, for example (3:4) (column 3, row 4), I'd like to find all the other
blocks with the same letter that share at least one side with the original
block or with one of those blocks (recursively). In my example, I'd like the following blocks:
  1234567
1 .......
2 .......
3 .......
4 ..B....
5 ..B....
6 ..B....
7 .BB....
My current idea is to check the original column up and down, by incrementing and
decrementing the row number until the letter is different. In my example this
would give me row numbers (4, 5, 6, 7). Then I increment the
column number and check my previous row numbers. In my example, none of them
has the original letter, so I start decrementing instead: I check 4, 5, 6 and 7 at column
2 and see that only 7 matches, so I continue to column 1 and check row 7, and so
on.

I believe you are looking for the flood fill algorithm.
Edit: I gave some thought to your proposed algorithm and realized why it wouldn't work. The problem is that it only detects convex areas. Say you have a grid like this:
BAB
BBB
BAB
And you would like to replace all the B's with C's. If you started your algorithm from the B in the center of the grid, you'd get this:
BAB
CCC
BAB
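As a minimal sketch of the flood fill idea (not the column-scanning approach from the question), the Python function below collects every cell reachable from a start cell through same-letter neighbours that share a side. The name flood_fill and the 0-based (row, col) convention are assumptions made for this illustration, so the question's (3:4) becomes row 3, column 2 here.

```python
def flood_fill(grid, row, col):
    """Return the set of 0-based (row, col) cells that hold the same letter
    as the start cell and are connected to it through shared sides."""
    target = grid[row][col]
    seen = {(row, col)}
    stack = [(row, col)]          # explicit stack instead of recursion
    while stack:
        r, c = stack.pop()
        # Only the four side neighbours count, not the diagonals.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
            if inside and (nr, nc) not in seen and grid[nr][nc] == target:
                seen.add((nr, nc))
                stack.append((nr, nc))
    return seen


question_grid = [
    "ACBDBAB",
    "ABDBABC",
    "ABADBAB",
    "BABDAAB",
    "BABCDBA",
    "BDBABCB",
    "ABBCBAB",
]
region = flood_fill(question_grid, 3, 2)   # column 3, row 4 in the question
small = flood_fill(["BAB", "BBB", "BAB"], 1, 1)
```

On the small BAB/BBB/BAB grid, starting from the centre reaches all seven B cells, including the four corners that the column scan misses.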

Related

How to perform range updates in sqrt(n) time?

I have an array and I have to perform queries and updates on it.
For a query, I have to find the frequency of a particular number in a range from l to r, and for an update, I have to add x to every element in a range from l to r.
How can I do this?
I thought of sqrt(n) optimization, but I don't know how to perform range updates within this time complexity.
Edit: since some people are asking for an example, here is one.
Suppose the array is of size n = 8
and it is
1 3 3 4 5 1 2 3
There are 3 queries that should help explain what I am trying to say.
Here they are:
q 1 5 3 - This means that you have to find the frequency of 3 in the range 1 to 5, which is 2, as 3 appears at the 2nd and 3rd positions.
The second is an update query: u 2 4 6 - This means that you have to add 6 to every element in the range 2 to 4. So the new array becomes
1 9 9 10 5 1 2 3
The last query is the same as the first one, which will now return 0, as there is no longer a 3 in the array from position 1 to 5.
I believe things are clearer now. :)
I developed this algorithm a long time ago (20+ years) for an arithmetic coder.
Both Update and Retrieve are performed in O(log(N)).
I named this algorithm "Method of Intervals". Let me show you an example.
Imagine we have 8 intervals, numbered 0-7:
+--0--+--1--+--2-+--3--+--4--+--5--+--6--+--7--+
Let's create an additional set of intervals, each spanning a pair of the original ones:
+----01-----+----23----+----45-----+----67-----+
Then we create one more layer of intervals, each spanning a pair from the second layer:
+---------0123---------+---------4567----------+
And finally, we create a single interval that covers all 8:
+------------------01234567--------------------+
As you see, in this structure, to retrieve the right border of interval [5], you just need to add together the lengths of intervals [0123] + [45]. To retrieve the left border of interval [5], you need the sum of the lengths of intervals [0123] + [4] (the left border of 5 is the right border of 4).
Of course, the left border of interval [0] is always 0.
If you look at this structure carefully, you will see that the odd elements in each layer aren't needed. That is, you do not need elements 1, 3, 5, 7, 23, 67, 4567, since these elements aren't used during Retrieval or Update.
Let's remove the odd elements and apply the following renumbering:
+--1--+--x--+--3-+--x--+--5--+--x--+--7--+--x--+
+-----2-----+-----x----+-----6-----+-----x-----+
+-----------4----------+-----------x-----------+
+----------------------8-----------------------+
As you see, this renumbering uses the numbers [1-8]. Let them be array indexes. So the memory used is O(N).
To retrieve the right border of interval [7], you need to add the values at indexes 4, 6, 7. To update the length of interval [7], you need to add the difference to every value whose range contains it, here indexes 7 and 8. As a result, both Retrieval and Update are performed in O(log(N)) time.
What we need now is an algorithm that, given the original interval number, computes the set of indexes in this data structure used for retrieval. For instance, how to convert:
1 -> 1
2 -> 2
3 -> 3,2
...
7 -> 7,6,4
This is easy if we look at the binary representation of these numbers:
1 -> 1
10 -> 10
11 -> 11,10
111 -> 111,110,100
As you see, in each chain the next value is the previous value with its rightmost "1" changed to "0". Using the simple bit operation "x & (x - 1)", we can write a simple loop to iterate over the array indexes related to the interval number:
int interval = 7;
do {
    int index = interval;             /* visits 7, then 6, then 4 */
    do_something(index);
} while (interval &= interval - 1);   /* clear the rightmost 1 bit; stop at 0 */
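The layout described above matches what is commonly called a Fenwick tree (binary indexed tree). The Python sketch below is one way to write it, assuming 1-based interval numbers; the class and function names are made up for this illustration. Retrieval follows the x & (x - 1) chain, while in the usual formulation a point update walks the other way, adding the lowest set bit at each step (7, then 8, for interval 7).

```python
class FenwickTree:
    """Prefix sums over n interval lengths, 1-based, O(log n) per operation."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)   # index 0 is unused

    def update(self, i, delta):
        """Add delta to the length of interval i."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i             # move to the next stored range containing i

    def prefix_sum(self, i):
        """Right border of interval i: sum of lengths of intervals 1..i."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i &= i - 1              # clear the rightmost 1 bit
        return total


def retrieval_indexes(i):
    """Array indexes read when retrieving the right border of interval i."""
    indexes = []
    while i > 0:
        indexes.append(i)
        i &= i - 1
    return indexes


lengths = [5, 3, 7, 1, 4, 2, 6, 8]  # made-up interval lengths
ft = FenwickTree(len(lengths))
for position, length in enumerate(lengths, start=1):
    ft.update(position, length)
```

Note that this structure answers prefix-sum queries; it is a sketch of the answer's method, not a full solution to the frequency-in-range question.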

Neighbours of a cell in the borders of a matrix

I'm having trouble finding the neighbours of a cell in a matrix. I'm trying not to put a lot of if statements in the code because I'm pretty sure there's a better way to do it, but I don't know exactly how.
To simplify, let's say we have the following matrix:
1 2 3 4 5
6 7 8 9 6
1 2 3 4 5
2 3 4 6 7
Considering the cell [2,2] = 3, the neighbours would be (i,j-1),(i-1,j),(i+1,j),(i,j+1),(i+1,j+1),(i-1,j-1). I created a "mask" for it using a for-loop like this, where inicio[0] is the i-coordinate of my current element (2 in the example) and inicio[1] is the j-coordinate (also 2 for element 3). Also, I'm considering the element must be in the center of the mask.
for (k = inicio[0] - 1; k <= inicio[0] + 1; k++) {
    for (z = inicio[1] - 1; z <= inicio[1] + 1; z++)
        if (k != inicio[0] || z != inicio[1]) // skip the current cell
However, I don't know how to treat the elements in the borders. If I want to find the neighbours of element [0,0] = 1 for example, considering the element must be in the middle of the mask like this:
x x x
x 1 2
x 6 7
How can I treat those X elements? I thought of initializing the borders to zero, but I think this is not the proper way to do it. So if anyone can explain a better way to do it or an algorithm, I will be glad.
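One common way to handle the borders without padding the matrix is to loop over coordinate offsets and skip any neighbour that falls outside the matrix. The Python sketch below assumes the full 3x3 mask (8 neighbours) and 0-based indexes; the function name neighbours is made up for this illustration, and the offset loops can be trimmed if only some directions are wanted.

```python
def neighbours(matrix, i, j):
    """Return the in-bounds (row, col) neighbours of cell (i, j)."""
    rows, cols = len(matrix), len(matrix[0])
    result = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue                      # skip the current cell
            k, z = i + di, j + dj
            if 0 <= k < rows and 0 <= z < cols:
                result.append((k, z))         # keep only cells inside the matrix
    return result


matrix = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 6],
    [1, 2, 3, 4, 5],
    [2, 3, 4, 6, 7],
]
corner = neighbours(matrix, 0, 0)
inner = neighbours(matrix, 2, 2)
```

With this approach the x cells of the mask are simply never visited, so a single bounds check replaces the per-border if statements.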

Ascending Cardinal Numbers in APL

In the FinnAPL Idiom Library, the 19th item is described as “Ascending cardinal numbers (ranking, all different),” and the code is as follows:
⍋⍋X
I also found a book review of the same library by R. Peschi, in which he said, “'Ascending cardinal numbers (ranking, all different)' How many of us understand why grading the result of Grade Up has that effect?” That's my question too. I searched extensively on the internet and came up with zilch.
Ascending Cardinal Numbers
For the sake of shorthand, I'll call that little code snippet “rank.” It becomes evident what is happening with rank when you start applying it to binary numbers. For example:
X←0 0 1 0 1
⍋⍋X ⍝ output is 1 2 4 3 5
The output indicates the position of the values after sorting. You can see from the output that the two 1s will end up in the last two slots, 4 and 5, and the 0s will end up at positions 1, 2 and 3. Thus, it is assigning rank to each value of the vector. Compare that to grade up:
X←7 8 9 6
⍋X ⍝ output is 4 1 2 3
⍋⍋X ⍝ output is 2 3 4 1
You can think of grade up as “this position gets that number” and of rank as “this number gets that position”:
7 8 9 6 ⍝ values of X
4 1 2 3 ⍝ position 1 gets the number at 4 (6)
⍝ position 2 gets the number at 1 (7) etc.
2 3 4 1 ⍝ 1st number (7) gets the position 2
⍝ 2nd number (8) gets the position 3 etc.
It's interesting to note that grade up and rank are like two sides of the same coin in that you can alternate between the two. In other words, we have the following identities:
⍋X = ⍋⍋⍋X = ⍋⍋⍋⍋⍋X = ...
⍋⍋X = ⍋⍋⍋⍋X = ⍋⍋⍋⍋⍋⍋X = ...
Why?
So far that doesn't really answer Mr Peschi's question as to why it has this effect. If you think in terms of key-value pairs, the answer lies in the fact that the original keys are a set of ascending cardinal numbers: 1 2 3 4. After applying grade up, a new vector is created, whose values are the original keys rearranged as they would be after a sort: 4 1 2 3. Applying grade up a second time is about restoring the original keys to a sequence of ascending cardinal numbers again. However, the values of this third vector aren't the ascending cardinal numbers themselves. Rather they correspond to the keys of the second vector.
It's kind of hard to understand since it's a reference to a reference, but the values of the third vector reference the original set of numbers as they occurred in their original positions:
7 8 9 6
2 3 4 1
In the example, 2 is referencing 7 from 7's original position. Since the value 2 also corresponds to the key of the second vector, which in turn is the second position, the final message is that after the sort, 7 will be in position 2. 8 will be in position 3, 9 in 4 and 6 in the 1st position.
Ranking and Shareable
In the FinnAPL Idiom Library, the 2nd item is described as “Ascending cardinal numbers (ranking, shareable),” and the code is as follows:
⌊.5×(⍋⍋X)+⌽⍋⍋⌽X
The output of this code is the same as its brother, ascending cardinal numbers (ranking, all different) as long as all the values of the input vector are different. However, the shareable version doesn't assign new values for those that are equal:
X←0 0 1 0 1
⌊.5×(⍋⍋X)+⌽⍋⍋⌽X ⍝ output is 2 2 4 2 4
The values of the output should generally be interpreted as relative, i.e. the 2s have a relatively lower rank than the 4s, so they will appear first in the array.
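The shareable idiom can be read as averaging two rankings: one computed left to right and one computed on the reversed vector, so that ties are broken in opposite directions and the average is the same for equal values. A Python sketch of that reading follows; the helper names are assumptions for this illustration, with 1-based indexes as in APL.

```python
def grade_up(xs):
    """1-based indexes that would sort xs in ascending order (stable), like ⍋."""
    return [i + 1 for i in sorted(range(len(xs)), key=lambda i: xs[i])]


def rank(xs):
    """⍋⍋X: the sorted position of each element."""
    return grade_up(grade_up(xs))


def shared_rank(xs):
    """⌊.5×(⍋⍋X)+⌽⍋⍋⌽X: equal values receive the same rank."""
    forward = rank(xs)                   # ties favour the leftmost element
    backward = rank(xs[::-1])[::-1]      # ties favour the rightmost element
    # Integer division equals the floor of half the sum for positive ranks.
    return [(f + b) // 2 for f, b in zip(forward, backward)]


tied = shared_rank([0, 0, 1, 0, 1])
distinct = shared_rank([7, 8, 9, 6])
```

When all values differ, both rankings agree and the result equals the plain rank.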

Minimize maximum absolute difference in pairs of numbers

The problem statement:
Given n variables and k pairs. The variables are made distinct by assigning each one a different value from 1 to n. Each pair p contains 2 variables; let abs(p) be the absolute difference between the 2 variables in p. Define the upper bound of the difference as U = max(abs(p) | every p).
Find an assignment that minimizes U.
Limit:
n<=100
k<=1000
Each variable appears at least 2 times in the list of pairs.
A problem instance:
Input
n=9, k=12
1 2 (meaning pair x1 x2)
1 3
1 4
1 5
2 3
2 6
3 5
3 7
3 8
3 9
6 9
8 9
Output:
1 2 5 4 3 6 7 8 9
(meaning x1=1,x2=2,x3=5,...)
Explanation: An assignment of x1=1,x2=2,x3=3,... will result in U=6 (3 9 has the greatest abs value). The output assignment gets U=4, the minimum value (changed pairs: 3 7 => 5 7, 3 8 => 5 8, etc.; 3 5 isn't changed. In this case, abs(p)<=4 for every pair).
There is an important point: to achieve the best assignment, the variables in the pairs that have the greatest abs must be changed.
Based on this, I have thought of a greedy algorithm:
1) Assign every x its default value (x(i)=i).
2) Locate the pairs that have the largest abs and the x(i)'s contained in them.
3) For every i,j: Calculate U. Swap the values of x(i),x(j). Calculate U'. If U'<U, stop and repeat step 3. If U'>=U for every i,j, end and output the assignment.
However, this method has a major pitfall. If we need an assignment like this:
x(a)<<x(b), x(b)<<x(c), x(c)<<x(a)
we have to swap in 2 steps, like x(a)<=>x(b), then x(b)<=>x(c), and there is a possibility that after the first step x(b)<<x(a) makes some abs larger than U, so the swap fails.
Is there any efficient algorithm to solve this problem?
This looks like the graph bandwidth problem, http://en.wikipedia.org/wiki/Graph_bandwidth (NP-complete, even for special cases). It looks like people run the heuristic described at http://en.wikipedia.org/wiki/Cuthill-McKee_algorithm when they need to do this, to try to turn a sparse matrix into a banded diagonal matrix.
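As a rough illustration, the Python sketch below evaluates U for a given assignment and builds an assignment with a simplified Cuthill-McKee style breadth-first ordering. It is a heuristic sketch only, not an exact solver: the function names are made up for this example, the start vertex is simply the one with the lowest degree, and the result is not guaranteed to reach the minimum U.

```python
from collections import deque


def max_abs_diff(assignment, pairs):
    """U for an assignment, where assignment[i - 1] is the value of x_i."""
    return max(abs(assignment[a - 1] - assignment[b - 1]) for a, b in pairs)


def cuthill_mckee_assignment(n, pairs):
    """Number the variables 1..n in breadth-first order, low degree first."""
    adj = {v: set() for v in range(1, n + 1)}
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    by_degree = lambda v: (len(adj[v]), v)    # vertex number breaks ties
    order, seen = [], set()
    for start in sorted(adj, key=by_degree):  # also covers disconnected parts
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v] - seen, key=by_degree):
                seen.add(w)
                queue.append(w)
    assignment = [0] * n
    for value, v in enumerate(order, start=1):
        assignment[v - 1] = value
    return assignment


pairs = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 6),
         (3, 5), (3, 7), (3, 8), (3, 9), (6, 9), (8, 9)]
u_default = max_abs_diff(list(range(1, 10)), pairs)
u_question = max_abs_diff([1, 2, 5, 4, 3, 6, 7, 8, 9], pairs)
heuristic = cuthill_mckee_assignment(9, pairs)
```

Comparing max_abs_diff(heuristic, pairs) against the question's assignment gives a quick check of how well the heuristic does on a particular instance.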

Array problem using if and do loop

This is my code:
data INDAT8;
    set INDAT6;
    Array myarray{24,27};
    goodgroups=0;
    do i=2 to 24 by 2;
        do j=2 to 27;
            if myarray[i,j] gt 1 then myarray[i+1,j] = 'bad';
            else if myarray[i,j] eq 1 and myarray[i+1,j] = 1 then myarray[i+1,j]= 'good';
        end;
    end;
run;
proc print data=INDAT8;
run;
Problem:
I have the data in this format (this is just an example, with n=2):
X Y info
2 1 good
2 4 bad
3 2 good
4 1 bad
4 4 good
6 2 good
6 3 good
Now, the above data is sorted (7 rows in total). I need to make groups of 2, 3 or 4 rows separately and generate a graph. In the above data, I made groups of 2 rows. The third row is left alone, as there is no other row with X=3 to form a group with. A group can be formed only from rows with the same X value, NOT across different X values.
Now, I will check whether both rows have “good” in the info column. If both rows have “good”, the group formed is also good, otherwise bad. In the above example, the 3rd (last) group is a “good” group. The rest are all bad groups. Once I’m done with all the rows, I will calculate the total no. of good groups formed / total no. of groups.
In the above example, the output will be: total no. of good groups / total no. of groups => 1/3.
This is the case of n=2 (size of group).
Now, for n=3, we make groups of 3 rows, and for n=4, we make groups of 4 rows, and find the good/bad groups in a similar way. If all the rows in a group are “good”, the result is a good group, otherwise bad.
Example: n= 3
2 1 good
2 4 bad
2 6 good
3 2 good
4 1 good
4 4 good
4 6 good
6 2 good
6 3 good
In the above case, I left out the 4th row and the last 2 rows, as I can’t make a group of 3 rows with them. The first group's result is “bad” and the last group's result is “good”.
Output: 1/ 2
For n= 4:
2 1 good
2 4 good
2 6 good
2 7 good
3 2 good
4 1 good
4 4 good
4 6 good
6 2 good
6 3 good
6 4 good
6 5 good
In this case, I make groups of 4 and find the result. The 5th, 6th, 7th and 8th rows are left behind (ignored). I made 2 groups of 4 rows and both are “good” groups.
Output: 2/2
So, after getting 3 output values from n=2, n=3 and n=4, I will plot a graph of these values.
If you can help in any language using arrays, if and do loops, it would be great.
I can change my code accordingly.
Update:
The answer for this doesn't have to be in SAS. Since it is more algorithm-related than anything, I will accept suggestions in any language as long as they show how to accomplish this using arrays and do loops.
I am having trouble understanding your problem statement, but from what I can gather, here is what I can suggest:
Place the data into bins and then process the summary data.
Implementation 1
Assumption: You don't know what the range of the first column will be, or the distribution will be sparse
Create a hash table. The key will be the item you are doing your grouping on. The value will be the count seen so far.
Process each record. If the key already exists, increment the count (value for that key in the hash). Otherwise add the key and set the value to 1.
Continue until you have processed all records
Count the number of keys in the hash table and the number of values that are greater than your threshold.
Implementation 2
Assumption: You know the range of the first column and the distribution is reasonably dense
Create an array of integers with enough elements so the index can match the column value. Initialize all elements to zero. This array will hold your count for each item you are grouping on
Process each record. Examine value of first column. Increment corresponding index in array. (So if you have "2 1 good", do groupCount[2]++)
Continue until you have processed all records
Walk each element in the array. Count how many items are non zero (meaning they appeared at least once) and how many items meet your threshold.
You can use the same approach for gathering the good and bad counts.
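The binning idea can be sketched in Python with a dictionary keyed on the first column. The sketch below makes several assumptions not stated in the thread: rows are (X, Y, info) tuples, groups are consecutive chunks of n rows that share the same X, leftover rows are ignored, and the function name count_good_groups is made up for this illustration.

```python
from collections import defaultdict


def count_good_groups(rows, n):
    """Return (good groups, total groups) for groups of n rows sharing the same X."""
    by_x = defaultdict(list)               # bin the info values by X
    for x, _y, info in rows:
        by_x[x].append(info)
    good = total = 0
    for infos in by_x.values():
        # Take full chunks of n rows; a shorter remainder forms no group.
        for start in range(0, len(infos) - n + 1, n):
            total += 1
            if all(info == "good" for info in infos[start:start + n]):
                good += 1
    return good, total


rows_n2 = [
    (2, 1, "good"), (2, 4, "bad"),
    (3, 2, "good"),
    (4, 1, "bad"), (4, 4, "good"),
    (6, 2, "good"), (6, 3, "good"),
]
result_n2 = count_good_groups(rows_n2, 2)
```

On the n=2 example from the question this yields 1 good group out of 3, matching the stated output of 1/3.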
