count the number of split points - algorithm

I just got a question about counting the split points in a integer array, to ensure there is at least one duplicated integer on the two sides.
ex:
1 1 4 2 4 2 4 1
we can either split it into:
1 1 4 2 | 4 2 4 1
or
1 1 4 2 4 | 2 4 1
so that there is at least one '1', '2' ,and '4' are in both sides.
The integer can range from 1 to 100,000
The complexity requires O(n). How to solve this question?

Make one pass over the array and build count[i] = how many times the value i appears in the array. The problem is only solvable if count[i] >= 2 for all non-zero values. You can use this array to tell how many distinct values you have in your array.
Next, make another pass and using another array count2[i] (or you can reuse the first one), keep track of when you have visited each value at least once. Then use that position as your split point.
Example:
1 1 4 2 4 2 4 1
count = [3, 2, 0, 4] => 3 distinct values
1 1 4 2 4 2 4 1
^ => 1 distinct value so far
^ => 1 distinct value so far
^ => 2 distinct values so far
^ => 3 distinct values so far => this is your split point
There might be cases for which there is no solution, for example if the last 1 was at the beginning as well. To detect this, you can just make another pass over the rest of the array after you have decided on the split point and see if you still have all the values on that side.
You can avoid this last pass by using the count and count2 arrays to detect when you can no longer have a split point. This is left as an exercise.

Related

How to extract vectors from a given condition matrix in Octave

I'm trying to extract a matrix with two columns. The first column is the data that I want to group into a vector, while the second column is information about the group.
A =
1 1
2 1
7 2
9 2
7 3
10 3
13 3
1 4
5 4
17 4
1 5
6 5
the result that i seek are
A1 =
1
2
A2 =
7
9
A3 =
7
10
13
A4=
1
5
17
A5 =
1
6
as an illustration, I used the eval function but it didn't give the results I wanted
Assuming that you don't actually need individually named separated variables, the following will put the values into separate cells of a cell array, each of which can be an arbitrary size and which can be then retrieved using cell index syntax. It makes used of logical indexing so that each iteration of the for loop assigns to that cell in B just the values from the first column of A that have the correct number in the second column of A.
num_cells = max (A(:,2));
B = cell (num_cells,1);
for idx = 1:max(A(:,2))
B(idx) = A((A(:,2)==idx),1);
end
B =
{
[1,1] =
1
2
[2,1] =
7
9
[3,1] =
7
10
13
[4,1] =
1
5
17
[5,1] =
1
6
}
Cell arrays are accessed a bit differently than normal numeric arrays. Array indexing (with ()) will return another cell, e.g.:
>> B(1)
ans =
{
[1,1] =
1
2
}
To get the contents of the cell so that you can work with them like any other variable, index them using {}.
>> B{1}
ans =
1
2
How it works:
Use max(A(:,2)) to find out how many array elements are going to be needed. A(:,2) uses subscript notation to indicate every value of A in column 2.
Create an empty cell array B with the right number of cells to contain the separated parts of A. This isn't strictly necessary, but with large amounts of data, things can slow down a lot if you keep adding on to the end of an array. Pre-allocating is usually better.
For each iteration of the for loop, it determines which elements in the 2nd column of A have the value matching the value of idx. This returns a logical array. For example, for the third time through the for loop, idx = 3, and:
>> A_index3 = A(:,2)==3
A_index3 =
0
0
0
0
1
1
1
0
0
0
0
0
That is a logical array of trues/falses indicating which elements equal 3. You are allowed to mix both logical and subscripts when indexing. So using this we can retrieve just those values from the first column:
A(A_index3, 1)
ans =
7
10
13
we get the same result if we do it in a single line without the A_index3 intermediate placeholder:
>> A(A(:,2)==3, 1)
ans =
7
10
13
Putting it in a for loop where 3 is replaced by the loop variable idx, and we assign the answer to the idx location in B, we get all of the values separated into different cells.

Bipartie Matching to form an Array

I am given a number from 1 to N , and there are M relationship given in the form a and b where we can connect number a and b.
We have to form the valid array , A array is said to be valid if for any two consecutive indexes A[i] and A[i+1] is one of the M relationship
We have to construct a valid Array of Size N, it's always possible to construct that.
Solution: Make A Bipartite Graph of the following , but there is a loophole on this,
let N=6
M=6
1 2
2 3
1 3
4 5
5 6
3 4
So Bipartite Matching gives this:
Match[1]=2
Match[2]=3
Match[3]=1 // Here it form a Loop
Match[4]=5
Match[5]=6
So how to i print a valid Array of size N , since N can be very large so many loops can be formed ? Is there any other solution ?
Another Example:
let N=6
M=6
1 3
3 5
2 5
5 1
4 2
6 4
It's will form a loop 1->3->5->1
1 3 5 2 4 6

Ascending Cardinal Numbers in APL

In the FinnAPL Idiom Library, the 19th item is described as “Ascending cardinal numbers (ranking, all different) ,” and the code is as follows:
⍋⍋X
I also found a book review of the same library by R. Peschi, in which he said, “'Ascending cardinal numbers (ranking, all different)' How many of us understand why grading the result of Grade Up has that effect?” That's my question too. I searched extensively on the internet and came up with zilch.
Ascending Cardinal Numbers
For the sake of shorthand, I'll call that little code snippet “rank.” It becomes evident what is happening with rank when you start applying it to binary numbers. For example:
X←0 0 1 0 1
⍋⍋X ⍝ output is 1 2 4 3 5
The output indicates the position of the values after sorting. You can see from the output that the two 1s will end up in the last two slots, 4 and 5, and the 0s will end up at positions 1, 2 and 3. Thus, it is assigning rank to each value of the vector. Compare that to grade up:
X←7 8 9 6
⍋X ⍝ output is 4 1 2 3
⍋⍋X ⍝ output is 2 3 4 1
You can think of grade up as this position gets that number and, you can think of rank as this number gets that position:
7 8 9 6 ⍝ values of X
4 1 2 3 ⍝ position 1 gets the number at 4 (6)
⍝ position 2 gets the number at 1 (7) etc.
2 3 4 1 ⍝ 1st number (7) gets the position 2
⍝ 2nd number (8) gets the position 3 etc.
It's interesting to note that grade up and rank are like two sides of the same coin in that you can alternate between the two. In other words, we have the following identities:
⍋X = ⍋⍋⍋X = ⍋⍋⍋⍋⍋X = ...
⍋⍋X = ⍋⍋⍋⍋X = ⍋⍋⍋⍋⍋⍋X = ...
Why?
So far that doesn't really answer Mr Peschi's question as to why it has this effect. If you think in terms of key-value pairs, the answer lies in the fact that the original keys are a set of ascending cardinal numbers: 1 2 3 4. After applying grade up, a new vector is created, whose values are the original keys rearranged as they would be after a sort: 4 1 2 3. Applying grade up a second time is about restoring the original keys to a sequence of ascending cardinal numbers again. However, the values of this third vector aren't the ascending cardinal numbers themselves. Rather they correspond to the keys of the second vector.
It's kind of hard to understand since it's a reference to a reference, but the values of the third vector are referencing the orginal set of numbers as they occurred in their original positions:
7 8 9 6
2 3 4 1
In the example, 2 is referencing 7 from 7's original position. Since the value 2 also corresponds to the key of the second vector, which in turn is the second position, the final message is that after the sort, 7 will be in position 2. 8 will be in position 3, 9 in 4 and 6 in the 1st position.
Ranking and Shareable
In the FinnAPL Idiom Library, the 2nd item is described as “Ascending cardinal numbers (ranking, shareable) ,” and the code is as follows:
⌊.5×(⍋⍋X)+⌽⍋⍋⌽X
The output of this code is the same as its brother, ascending cardinal numbers (ranking, all different) as long as all the values of the input vector are different. However, the shareable version doesn't assign new values for those that are equal:
X←0 0 1 0 1
⌊.5×(⍋⍋X)+⌽⍋⍋⌽X ⍝ output is 2 2 4 2 4
The values of the output should generally be interpreted as relative, i.e. The 2s have a relatively lower rank than the 4s, so they will appear first in the array.

Minimize maximum absolute difference in pairs of numbers

The problem statement:
Give n variables and k pairs. The variables can be distinct by assigning a value from 1 to n to each variable. Each pair p contain 2 variables and let the absolute difference between 2 variables in p is abs(p). Define the upper bound of difference is U=max(Abs(p)|every p).
Find an assignment that minimize U.
Limit:
n<=100
k<=1000
Each variable appear at least 2 times in list of pairs.
A problem instance:
Input
n=9, k=12
1 2 (meaning pair x1 x2)
1 3
1 4
1 5
2 3
2 6
3 5
3 7
3 8
3 9
6 9
8 9
Output:
1 2 5 4 3 6 7 8 9
(meaning x1=1,x2=2,x3=5,...)
Explaination: An assignment of x1=1,x2=2,x3=3,... will result in U=6 (3 9 has greastest abs value). The output assignment will get U=4, the minimum value (changed pair: 3 7 => 5 7, 3 8 => 5 8, etc. and 3 5 isn't changed. In this case, abs(p)<=4 for every pair).
There is an important point: To achieve the best assignments, the variables in the pairs that have greatest abs must be change.
Base on this, I have thought of a greedy algorithm:
1)Assign every x to default assignment (x(i)=i)
2)Locate pairs that have largest abs and x(i)'s contained in them.
3)For every i,j: Calculate U. Swap value of x(i),x(j). Calculate U'. If U'<U, stop and repeat step 3. If U'>=U for every i,j, end and output the assignment.
However, this method has a major pitfall, if we need an assignment like this:
x(a)<<x(b), x(b)<<x(c), x(c)<<x(a)
, we have to swap in 2 steps, like: x(a)<=>x(b), then x(b)<=>x(c), then there is a possibility that x(b)<<x(a) in first step has its abs become larger than U and the swap failed.
Is there any efficient algorithm to solve this problem?
This looks like http://en.wikipedia.org/wiki/Graph_bandwidth (NP complete, even for special cases). It looks like people run http://en.wikipedia.org/wiki/Cuthill-McKee_algorithm when they need to do this to try and turn a sparse matrix into a banded diagonal matrix.

Matrix, algorithm interview question

This was one of my interview questions.
We have a matrix containing integers (no range provided). The matrix is randomly populated with integers. We need to devise an algorithm which finds those rows which match exactly with a column(s). We need to return the row number and the column number for the match. The order of of the matching elements is the same. For example, If, i'th row matches with j'th column, and i'th row contains the elements - [1,4,5,6,3]. Then jth column would also contain the elements - [1,4,5,6,3]. Size is n x n.
My solution:
RCEQUAL(A,i1..12,j1..j2)// A is n*n matrix
if(i2-i1==2 && j2-j1==2 && b[n*i1+1..n*i2] has [j1..j2])
use brute force to check if the rows and columns are same.
if (any rows and columns are same)
store the row and column numbers in b[1..n^2].//b[1],b[n+2],b[2n+3].. store row no,
// b[2..n+1] stores columns that
//match with row 1, b[n+3..2n+2]
//those that match with row 2,etc..
else
RCEQUAL(A,1..n/2,1..n/2);
RCEQUAL(A,n/2..n,1..n/2);
RCEQUAL(A,1..n/2,n/2..n);
RCEQUAL(A,n/2..n,n/2..n);
Takes O(n^2). Is this correct? If correct, is there a faster algorithm?
you could build a trie from the data in the rows. then you can compare the columns with the trie.
this would allow to exit as soon as the beginning of a column do not match any row. also this would let you check a column against all rows in one pass.
of course the trie is most interesting when n is big (setting up a trie for a small n is not worth it) and when there are many rows and columns which are quite the same. but even in the worst case where all integers in the matrix are different, the structure allows for a clear algorithm...
You could speed up the average case by calculating the sum of each row/column and narrowing your brute-force comparison (which you have to do eventually) only on rows that match the sums of columns.
This doesn't increase the worst case (all having the same sum) but if your input is truly random that "won't happen" :-)
This might only work on non-singular matrices (not sure), but...
Let A be a square (and possibly non-singular) NxN matrix. Let A' be the transpose of A. If we create matrix B such that it is a horizontal concatenation of A and A' (in other words [A A']) and put it into RREF form, we will get a diagonal on all ones in the left half and some square matrix in the right half.
Example:
A = 1 2
3 4
A'= 1 3
2 4
B = 1 2 1 3
3 4 2 4
rref(B) = 1 0 0 -2
0 1 0.5 2.5
On the other hand, if a column of A were equal to a row of A then column of A would be equal to a column of A'. Then we would get another single 1 in of of the columns of the right half of rref(B).
Example
A=
1 2 3 4 5
2 6 -3 4 6
3 8 -7 6 9
4 1 7 -5 3
5 2 4 -1 -1
A'=
1 2 3 4 5
2 6 8 1 2
3 -3 -7 7 4
4 4 6 -5 -1
5 6 9 3 -1
B =
1 2 3 4 5 1 2 3 4 5
2 6 -3 4 6 2 6 8 1 2
3 8 -7 6 9 3 -3 -7 7 4
4 1 7 -5 3 4 4 6 -5 -1
5 2 4 -1 -1 5 6 9 3 -1
rref(B)=
1 0 0 0 0 1.000 -3.689 -5.921 3.080 0.495
0 1 0 0 0 0 6.054 9.394 -3.097 -1.024
0 0 1 0 0 0 2.378 3.842 -0.961 0.009
0 0 0 1 0 0 -0.565 -0.842 1.823 0.802
0 0 0 0 1 0 -2.258 -3.605 0.540 0.662
1.000 in the top row of the right half means that the first column of A matches on of its rows. The fact that the 1.000 is in the left-most column of the right half means that it is the first row.
Without looking at your algorithm or any of the approaches in the previous answers, but since the matrix has n^2 elements to begin with, I do not think there is a method which does better than that :)
IFF the matrix is truely random...
You could create a list of pointers to the columns sorted by the first element. Then create a similar list of the rows sorted by their first element. This takes O(n*logn).
Next create an index into each sorted list initialized to 0. If the first elements match, you must compare the whole row. If they do not match, increment the index of the one with the lowest starting element (either move to the next row or to the next column). Since each index cycles from 0 to n-1 only once, you have at most 2*n comparisons unless all the rows and columns start with the same number, but we said a matrix of random numbers.
The time for a row/column comparison is n in the worst case, but is expected to be O(1) on average with random data.
So 2 sorts of O(nlogn), and a scan of 2*n*1 gives you an expected run time of O(nlogn). This is of course assuming random data. Worst case is still going to be n**3 for a large matrix with most elements the same value.

Resources