Error correcting codes and minimum distances - algorithm

I was looking at a challenge online (at King's website) and although I understand the general idea behind it I'm slightly lost - maybe the wording is a little off? Here is the problem and I'll state what I don't understand below:
Error correcting codes are used in a wide variety of applications
ranging from satellite communication to music CDs. The idea is to
encode a binary string of length k as a binary string of length n>k,
called a codeword such that even if some bit(s) of the encoding are
corrupted (if you scratch on your CD for instance), the original k-bit
string can still be recovered. There are three important parameters
associated with an error correcting code: the length of codewords (n),
the dimension (k) which is the length of the unencoded strings, and
finally the minimum distance (d) of the code. Distance between two
codewords is measured as hamming distance, i.e., the number of
positions in which the codewords differ: 0010 and 0100 are at distance
2. The minimum distance of the code is the distance between the two different codewords that are closest to each other. Linear codes are a
simple type of error correcting codes with several nice properties.
One of them being that the minimum distance is the smallest distance
any non-zero codeword has to the zero codeword (the codeword
consisting of n zeros always belongs to a linear code of length n).
Another nice property of linear codes of length n and dimension k is
that they can be described by an n×k generator matrix of zeros and
ones. Encoding a k-bit string is done by viewing it as a column vector
and multiplying it by the generator matrix. The example below shows a
generator matrix and how the string 1001 is encoded. [Figure: graph.png] Matrix
multiplication is done as usual except that addition is done modulo 2
(i.e., 0+1=1+0=1 and 0+0=1+1=0). The set of codewords of this code is
then simply all vectors that can be obtained by encoding all k-bit
strings in this way. Write a program to calculate the minimum distance
for several linear error correcting codes of length at most 30 and
dimension at most 15. Each code will be given as a generator matrix.
Input
You will be given several generator matrices as input. The first
line contains an integer T indicating the number of test cases. The
first line of each test case gives the parameters n and k where
1≤n≤30, 1≤k≤15 and n > k, as two integers separated by a single space.
The following n lines describe a generator matrix. Each line is a row
of the matrix and has k space separated entries that are 0 or 1.
Output
For each generator matrix output a single line with the minimum
distance of the corresponding linear code.
Sample Input 1
2
7 4
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
0 1 1 1
1 0 1 1
1 1 0 1
3 2
1 1
0 0
1 1
Sample Output 1
3
0
My assumption is that the question is asking: "Write a program that takes a linear code in matrix form and reports the minimum distance from the all-zero codeword." I just don't understand why the output is 3 for the first input and 0 for the second.
Very confused.
Any ideas?

For first example:
Input binary string: 1000
Resulting code: 1000011
Hamming distance to zero codeword 0000000: 3
For second example:
Input binary string: 11
Resulting code: 000
Hamming distance to zero codeword 000: 0
Your goal is to find a codeword, produced from some non-zero k-bit input string, with the minimal Hamming distance to the zero codeword (in other words, with the fewest ones in its binary representation) and return that distance. As the second example shows, that codeword may itself be all zeros.
Hope that helps, the problem description is indeed a little bit hard to understand.
EDIT. I made a typo in the first example: the actual input should be 1000, not 0001. It also may not be clear what exactly the input string is and how the codeword is calculated. Let's look at the first sample.
Input binary string: 1000
In general, this binary string is not part of the generator matrix. It is just one of all the possible non-zero 4-bit strings. Let's multiply it by the generator matrix:
(1 0 0 0) * (1 0 0 0) = 1
(0 1 0 0) * (1 0 0 0) = 0
(0 0 1 0) * (1 0 0 0) = 0
(0 0 0 1) * (1 0 0 0) = 0
(0 1 1 1) * (1 0 0 0) = 0
(1 0 1 1) * (1 0 0 0) = 1
(1 1 0 1) * (1 0 0 0) = 1
One way to find the input that produces the "minimal" codeword is to iterate over all 2^k-1 non-zero k-bit strings and calculate the codeword for each of them. This is a feasible solution for k <= 15.
Another example for the first test case is 0011 (multiple inputs can produce a "minimal" output):
(1 0 0 0) * (0 0 1 1) = 0
(0 1 0 0) * (0 0 1 1) = 0
(0 0 1 0) * (0 0 1 1) = 1
(0 0 0 1) * (0 0 1 1) = 1
(0 1 1 1) * (0 0 1 1) = 2 = 0 (mod 2)
(1 0 1 1) * (0 0 1 1) = 2 = 0 (mod 2)
(1 1 0 1) * (0 0 1 1) = 1
The resulting code 0011001 also has Hamming distance 3 to the zero codeword. There is no non-zero 4-bit string whose code has fewer than 3 ones in its binary representation. That's why the answer for the first test case is 3.
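The brute-force search described above can be sketched as follows. This is a minimal illustration, not a reference solution: the function name `min_distance` and the list-of-rows input format are assumptions made for the example. Each generator row is packed into a bitmask, and each codeword bit is the parity of the bits shared between that row and the message.

```python
def min_distance(generator):
    """Minimum weight over all codewords produced by non-zero messages.

    `generator` is a list of n rows, each a list of k bits (hypothetical
    input format chosen for this sketch).
    """
    k = len(generator[0])
    # Pack every row into an integer so a dot product mod 2 becomes
    # the parity of (row AND message).
    row_masks = [sum(bit << i for i, bit in enumerate(row)) for row in generator]
    best = len(generator)  # a codeword can never weigh more than n
    for message in range(1, 1 << k):  # all 2^k - 1 non-zero messages
        weight = sum(bin(mask & message).count("1") & 1 for mask in row_masks)
        best = min(best, weight)
    return best


sample_1 = [
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
    [0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1],
]
sample_2 = [[1, 1], [0, 0], [1, 1]]
print(min_distance(sample_1), min_distance(sample_2))
```

For k <= 15 and n <= 30 this is at most about a million parity checks per test case, which matches the feasibility claim above.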

Related

Check if a number is divisible by 3 in logic design

I saw a post on the site about this and didn't understand the answer. Can I get an explanation, please?
question:
Write code to determine if a number is divisible by 3. The input to the function is a single bit, 0 or 1, and the output should be 1 if the number received so far is the binary representation of a number divisible by 3, otherwise zero.
Examples:
input "0": (0) output 1
inputs "1,0,0": (4) output 0
inputs "1,1,0,0": (6) output 1
This is based on an interview question. I ask for a drawing of logic gates but since this is stackoverflow I'll accept any coding language. Bonus points for a hardware implementation (verilog etc).
Part a (easy): First input is the MSB.
Part b (a little harder): First input is the LSB.
Part c (difficult): Which one is faster and smaller, (a) or (b)? (Not theoretically in the Big-O sense, but practically faster/smaller.) Now take the slower/bigger one and make it as fast/small as the faster/smaller one.
answer:
State table for LSB:
S I S' O
0 0 0 1
0 1 1 0
1 0 2 0
1 1 0 1
2 0 1 0
2 1 2 0
Explanation: 0 is divisible by three. (0 << 1) + 0 = 0. Repeat using S = ((S << 1) + I) % 3 and O = 1 if S == 0.
State table for MSB:
S I S' O
0 0 0 1
0 1 2 0
1 0 1 0
1 1 0 1
2 0 2 0
2 1 1 0
Explanation: 0 is divisible by three. 0 >> 1 + 0 = 0. Repeat using S = (S >> 1 + I) % 3 and O = 1 if S == 0.
S' is different from above, but O works the same, since S' is 0 for the same cases (00 and 11). Since O is the same in both cases, O_LSB = O_MSB, so to make MSB as short as LSB, or vice-versa, just use the shortest of both.
Thanks in advance for the answers.
Well, I suppose the question isn't entirely off-topic, since you asked about logic design, but you'll have to do the coding yourself.
You have 3 states in the S column. These track the value of the current full input mod 3. So, S0 means the current input mod 3 is 0, and so it is divisible by 3 (remember also that 0 is divisible by 3). S1 means the remainder is 1, and S2 means the remainder is 2.
The I column gives the current input (0 or 1), and S' gives the next state (in other words, the new number mod 3).
For 'LSB', the new number is the old number << 1, plus either 0 or 1. Write out the table. For starters, if the old modulo was 0, then the new modulo will be 0 if the input bit was 0, and will be 1 if the new input was 1. This gives you the first 2 rows in the first table. Filling in the rest is left as an exercise for you.
Note that the O column is just 1 if the next state is 0, as expected.
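To make the first ('LSB') table concrete, here is a small sketch of the state machine in Python. It assumes the convention described above, where each new bit is appended as the least significant bit, so the new remainder is (2 * S + I) mod 3. The function name `divisible_by_three_stream` is made up for this example.

```python
def divisible_by_three_stream(bits):
    """Feed bits one at a time; return the O column after each bit.

    Assumes each incoming bit becomes the new least significant bit,
    i.e. number = (number << 1) + bit.
    """
    state = 0  # remainder of the number seen so far, mod 3
    outputs = []
    for bit in bits:
        state = (2 * state + bit) % 3  # same as ((S << 1) + I) % 3
        outputs.append(1 if state == 0 else 0)
    return outputs


print(divisible_by_three_stream([1, 1, 0]))  # bits of 6, fed one by one
```

Only the remainder is stored, never the number itself, which is why three states are enough no matter how many bits arrive.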

Distinct values of bitwise and of subarrays

How can I find the number of distinct values of the bitwise AND over all subarrays of an array? (Array size <= 1e5 and array elements <= 1e6.)
For example:
A[]={1,2,3}
There are 4 distinct values (1, 2, 3, 0).
Let's fix the right boundary r of the subarray. Let's imagine the left boundary l moves to the left starting from r. How many times can the value of the and change? At most O(log(MAX_VALUE)). Why? When we add one more element to the left, we've got two options:
The and value of the subarray doesn't change.
It changes. In that case, the number of bits in it gets strictly less (as it's a submask of the previous and value).
Thus, we can consider only those values of l where something changes. Now we just need to find them quickly.
Let's iterate over the array from left to right and store the position of the last element that doesn't have the i-th bit for all valid i (we can update it by iterating over all bits of the current element). This way, we'll be able to find the next position where the value changes quickly (namely, it's the largest value in this array over all bits that are set). If we sort the positions, we can find the next largest one in O(1).
The total time complexity of this solution is O(N * log(MAX_VALUE) * log(log(MAX_VALUE))) (we iterate over all bits of each element of the array, sort the array of positions for each of them, and iterate over it). The space complexity is O(N + MAX_VALUE). It should be good enough for the given constraints.
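A rough Python sketch of this idea follows. The function name, the `max_bits=20` default (enough for values up to 1e6), and the use of a `set` instead of a boolean array are assumptions made for the illustration. For each right boundary it tracks the last position where each bit was unset, then clears bits in order of decreasing position.

```python
def count_distinct_subarray_ands(values, max_bits=20):
    """Count distinct AND values over all contiguous subarrays.

    Assumes every value fits in `max_bits` bits.
    """
    last_unset = [-1] * max_bits  # last index where bit b was 0, or -1
    seen = set()
    for r, v in enumerate(values):
        for b in range(max_bits):
            if not (v >> b) & 1:
                last_unset[b] = r
        seen.add(v)  # the subarray [r, r]
        # Bits set in v, ordered by where they were last unset (nearest first).
        positions = sorted(
            ((last_unset[b], b) for b in range(max_bits)
             if (v >> b) & 1 and last_unset[b] >= 0),
            reverse=True,
        )
        current = v
        i = 0
        while i < len(positions):
            pos = positions[i][0]
            # All bits last unset at the same position disappear together.
            while i < len(positions) and positions[i][0] == pos:
                current &= ~(1 << positions[i][1])
                i += 1
            seen.add(current)
    return len(seen)


print(count_distinct_subarray_ands([1, 2, 3]))  # the example from the question
```

Bits that were never unset (position -1) are skipped, since no left boundary can clear them.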
Imagine the numbers as columns representing their bits. We will have sequences of 1's extending horizontally. For example:
Array index:  0 1 2 3 4 5 6 7
Bit columns:  0 1 1 1 1 1 0 0
              0 0 0 1 1 1 1 1
              0 0 1 1 1 1 1 0
              1 0 0 0 1 1 0 1
              0 1 1 1 1 1 1 0
Looking to the left, the bit-row for any subarray anded after a zero will continue being zero, which means no change after that in that row.
Let's take index 5 for example. Now sorting the horizontal sequences of 1's from index 5 to the left will provide us a simple way to detect a change in the bit configuration (the sorting would have to be done on each iteration):
Index 5 ->
Sorted bit rows:  1 0 0 0 1 1
                  0 0 0 1 1 1
                  0 0 1 1 1 1
                  0 1 1 1 1 1
                  0 1 1 1 1 1
Index 5 to 4, no change
Index 4 to 3, change
Index 3 to 2, change
Index 2 to 1, change
Index 1 to 0, change
To easily examine these changes, kraskevich proposes recording only the last unset bit for each row as we go along, which would indicate the length of the horizontal sequence of 1's, and a boolean array (of 1e6 numbers max) to store the unique bit configurations encountered.
Numbers:  1, 2, 3
Bits:     1  0  1
          0  1  1
As we move from left to right, keep a record of the index of the last unset bit in each row, and also keep a record of any new bit configuration (at most 1e6 of them):
Indexes of last unset bit for each row on each iteration
Numbers: 1, 2, 3
A[0]:  -1       arrayHash = [false,true,false,false], count = 1
        0
A[1]:  -1  1    Now sort the column descending, representing (current - index),
        0  0    the lengths of sequences of 1's extending to the left.
As we move from top to bottom on this column, each value change represents a bit
configuration and a possibly distinct count:
Record present bit configuration b10
=> arrayHash = [false,true,true,false]
1 => 1 - 1 => sequence length 0, ignore sequence length 0
0 => 1 - 0 => sequence length 1,
unset second bit: b10 => b00
=> new bit configuration b00
=> arrayHash = [true,true,true,false]
Third iteration:
Numbers: 1, 2, 3
A[2]:  -1  1  1
        0  0  0
Record present bit configuration b11
=> arrayHash = [true,true,true,true]
(We continue since we don't necessarily know the arrayHash has filled.)
1 => 2 - 1 => sequence length 1
unset first bit: b11 => b10
=> seen bit configuration b10
0 => 2 - 0 => sequence length 2,
unset second bit: b10 => b00
=> seen bit configuration b00

Altering CT Volume voxel values in Matlab

I am trying to alter some voxel values in Matlab.
I am using the following code:
for p=1:100
Vol(Vol(:,:,p) > 0) = 65535; %altering voxel values in the volume to 65535 if value > 0.
end
Unfortunately, I find all the values being altered, as if the condition is not working, although if I write Vol(Vol(:,:,1)>0)= 65535 directly in the command line it works perfectly.
Any clue where the error is?
The reason is that you are not indexing each slice of your volume properly. In this for loop, the Boolean mask passed to Vol ends up modifying only the first channel.
Consider this small example. Let's create a 3 x 3 x 3 matrix of all 1s.
A = ones(3,3,3)
A(:,:,1) =
1 1 1
1 1 1
1 1 1
A(:,:,2) =
1 1 1
1 1 1
1 1 1
A(:,:,3) =
1 1 1
1 1 1
1 1 1
Let's set the first slice all to 65535 according to your condition:
A(A(:,:,1) > 0) = 65535
A(:,:,1) =
65535 65535 65535
65535 65535 65535
65535 65535 65535
A(:,:,2) =
1 1 1
1 1 1
1 1 1
A(:,:,3) =
1 1 1
1 1 1
1 1 1
This certainly works as we expect. Now let's try going to the second channel:
A(A(:,:,2) > 0) = 65535
A(:,:,1) =
65535 65535 65535
65535 65535 65535
65535 65535 65535
A(:,:,2) =
1 1 1
1 1 1
1 1 1
A(:,:,3) =
1 1 1
1 1 1
1 1 1
Oh no! It didn't work! It only worked for the first channel.... why? The reason why is because A(:,:,1) or any other channel provides a 2D matrix. If you provide a single 2D matrix, it only modifies the first slice of the volume. As such, as your loop keeps progressing, only the first channel gets modified (if at all). If you wanted to modify the second channel, you would have to create a 3D matrix, where the first slice would have all logical false, while the second slice contains the Boolean mask from Vol(:,:,2) > 0.
The 3D slicing stuff is probably complicated, especially for someone new to MATLAB. As such, I would recommend you do this to make things simpler. If you want to modify each slice, consider placing each binary mask as a temporary variable, modifying that temporary variable, then manually assigning this back to each slice. In other words:
for p=1:100
    temp = Vol(:,:,p); %// Extract p'th channel
    temp(temp > 0) = 65535; %// Find non-zero pixels and set to 65535
    Vol(:,:,p) = temp; %// Set back to p'th channel.
end
Another recommended suggestion
Instead of using for loops, I would like to recommend this simple one-liner:
Vol(Vol > 0) = 65535;
This will automatically create a 3D Boolean matrix that will index Vol, and it will find those locations that are greater than 0, and set all of those locations to 65535. This avoids the need for any for loops. This one-liner essentially performs what the above for loop is doing, but is much quicker... and I daresay much easier to read.
For your problem, I would just do :
Vol(Vol(:,:,1:100) > 0) = 65535;
No need for loop.

Sorting a binary 2D matrix?

I'm looking for some pointers here as I don't quite know where to start researching this one.
I have a 2D matrix with 0 or 1 in each cell, such as:
1 2 3 4
A 0 1 1 0
B 1 1 1 0
C 0 1 0 0
D 1 1 0 0
And I'd like to sort it so it is as "upper triangular" as possible, like so:
4 3 1 2
B 0 1 1 1
A 0 1 0 1
D 0 0 1 1
C 0 0 0 1
The rows and columns must remain intact, i.e. elements can't be moved individually and can only be swapped "whole".
I understand that there'll probably be pathological cases where a matrix has multiple possible sorted results (i.e. same shape, but differ in the identity of the "original" rows/columns.)
So, can anyone suggest where I might find some starting points for this? An existing library/algorithm would be great, but I'll settle for knowing the name of the problem I'm trying to solve!
I doubt it's a linear algebra problem as such, and maybe there's some kind of image processing technique that's applicable.
Any other ideas aside, my initial guess is just to write a simple insertion sort on the rows, then the columns and iterate that until it stabilises (and hope that detecting the pathological cases isn't too hard.)
More details: Some more information on what I'm trying to do may help clarify. Each row represents a competitor, each column represents a challenge. Each 1 or 0 represents "success" for the competitor on a particular challenge.
By sorting the matrix so all 1s are in the top-right, I hope to then provide a ranking of the intrinsic difficulty of each challenge and a ranking of the competitors (which will take into account the difficulty of the challenges they succeeded at, not just the number of successes.)
Note on accepted answer: I've accepted Simulated Annealing as "the answer" with the caveat that this question doesn't have a right answer. It seems like a good approach, though I haven't actually managed to come up with a scoring function that works for my problem.
An algorithm based upon simulated annealing can handle this sort of thing without too much trouble. Not great if you have small matrices which most likely have a fixed solution, but great if your matrices get to be larger and the problem becomes more difficult.
(However, it also fails your desire that insertions can be done incrementally.)
Preliminaries
Devise a performance function that "scores" a matrix - matrices that are closer to your triangleness should get a better score than those that are less triangle-y.
Devise a set of operations that are allowed on the matrix. Your description was a little ambiguous, but if you can swap rows then one op would be SwapRows(a, b). Another could be SwapCols(a, b).
The Annealing loop
I won't give a full exposition here, but the idea is simple. You perform random transformations on the matrix using your operations. You measure how much "better" the matrix is after the operation (using the performance function before and after the operation). Then you decide whether to commit that transformation. You repeat this process a lot.
Deciding whether to commit the transform is the fun part: you need to decide whether to perform that operation or not. Toward the end of the annealing process, you only accept transformations that improved the score of the matrix. But earlier on, in a more chaotic time, you allow transformations that don't improve the score. In the beginning, the algorithm is "hot" and anything goes. Eventually, the algorithm cools and only good transforms are allowed. If you linearly cool the algorithm, then the choice of whether to accept a transformation is:
public bool ShouldAccept(double cost, double temperature, Random random) {
    return Math.Exp(-cost / temperature) > random.NextDouble();
}
You should read the excellent information contained in Numerical Recipes for more information on this algorithm.
Long story short, you should learn some of these general purpose algorithms. Doing so will allow you to solve large classes of problems that are hard to solve analytically.
Scoring algorithm
This is probably the trickiest part. You will want to devise a scorer that guides the annealing process toward your goal. The scorer should be a continuous function that results in larger numbers as the matrix approaches the ideal solution.
How do you measure the "ideal solution" - triangleness? Here is a naive and easy scorer: For every point, you know whether it should be 1 or 0. Add +1 to the score if the matrix is right, -1 if it's wrong. Here's some code so I can be explicit (not tested! please review!)
int Score(Matrix m) {
    var score = 0;
    for (var r = 0; r < m.NumRows; r++) {
        for (var c = 0; c < m.NumCols; c++) {
            var val = m.At(r, c);
            var shouldBe = (c >= r) ? 1 : 0;
            if (val == shouldBe) {
                score++;
            }
            else {
                score--;
            }
        }
    }
    return score;
}
With this scoring algorithm, a random field of 1s and 0s will give a score of 0. An "opposite" triangle will give the most negative score, and the correct solution will give the most positive score. Diffing two scores will give you the cost.
If this scorer doesn't work for you, then you will need to "tune" it until it produces the matrices you want.
This algorithm is based on the premise that tuning this scorer is much simpler than devising the optimal algorithm for sorting the matrix.
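Putting the pieces together, a complete annealing loop might look like the sketch below. It is written in Python rather than C#, and everything beyond the scorer and the acceptance rule described above is an assumption for illustration: the names `score`, `should_accept` and `anneal`, the linear cooling schedule, the step count, the starting temperature, and the choice to take cost as old score minus new score so that improvements are always accepted.

```python
import math
import random


def score(m):
    """+1 for each cell matching the upper-triangular target, -1 otherwise."""
    return sum(1 if v == (1 if c >= r else 0) else -1
               for r, row in enumerate(m) for c, v in enumerate(row))


def should_accept(cost, temperature, rng):
    # cost = old_score - new_score, so cost <= 0 means "not worse".
    if cost <= 0:
        return True
    return math.exp(-cost / temperature) > rng.random()


def anneal(m, steps=5000, start_temp=2.0, seed=0):
    rng = random.Random(seed)
    cur = [row[:] for row in m]
    cur_score = score(cur)
    best, best_score = [row[:] for row in cur], cur_score
    n_rows, n_cols = len(cur), len(cur[0])
    for step in range(steps):
        # Linear cooling; the tiny offset avoids dividing by zero at the end.
        temperature = start_temp * (1 - step / steps) + 1e-9
        cand = [row[:] for row in cur]
        if rng.random() < 0.5:  # SwapRows(a, b)
            a, b = rng.randrange(n_rows), rng.randrange(n_rows)
            cand[a], cand[b] = cand[b], cand[a]
        else:  # SwapCols(a, b)
            a, b = rng.randrange(n_cols), rng.randrange(n_cols)
            for row in cand:
                row[a], row[b] = row[b], row[a]
        cand_score = score(cand)
        if should_accept(cur_score - cand_score, temperature, rng):
            cur, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = [row[:] for row in cur], cur_score
    return best, best_score


matrix = [[0, 1, 1, 0], [1, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
best, best_score = anneal(matrix)
print(best_score, best)
```

The sketch remembers the best matrix seen so far, so a late unlucky move cannot lose a good arrangement. Labels for rows and columns are not tracked here; a real version would carry them along with each swap.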
I came up with the below algorithm, and it seems to work correctly.
Phase 1: move rows with most 1s up and columns with most 1s right.
First the rows. Sort the rows by counting their 1s. We don't care if 2 rows have the same number of 1s.
Now the columns. Sort the cols by counting their 1s. We don't care if 2 cols have the same number of 1s.
Phase 2: repeat phase 1 but with extra criteria, so that we satisfy the triangular matrix shape.
Criterion for rows: if 2 rows have the same number of 1s, we move up the row that begins with fewer 0s.
Criterion for cols: if 2 cols have the same number of 1s, we move right the col that has fewer 0s at the bottom.
Example:
Phase 1
   1 2 3 4                    1 2 3 4                    4 1 3 2
A  0 1 1 0                 B  1 1 1 0                 B  0 1 1 1
B  1 1 1 0  - sort rows->  A  0 1 1 0  - sort cols->  A  0 0 1 1
C  0 1 0 0                 D  1 1 0 0                 D  0 1 0 1
D  1 1 0 0                 C  0 1 0 0                 C  0 0 0 1
Phase 2
   4 1 3 2                    4 1 3 2
B  0 1 1 1                 B  0 1 1 1
A  0 0 1 1  - sort rows->  D  0 1 0 1  - sort cols->  "completed"
D  0 1 0 1                 A  0 0 1 1
C  0 0 0 1                 C  0 0 0 1
Edit: it turns out that my algorithm doesn't always give proper triangular matrices.
For example:
Phase 1
   1 2 3 4                    1 2 3 4
A  1 0 0 0                 B  0 1 1 1
B  0 1 1 1  - sort rows->  C  0 0 1 1  - sort cols->  "completed"
C  0 0 1 1                 A  1 0 0 0
D  0 0 0 1                 D  0 0 0 1
Phase 2
   1 2 3 4                    1 2 3 4                    2 1 3 4
B  0 1 1 1                 B  0 1 1 1                 B  1 0 1 1
C  0 0 1 1  - sort rows->  C  0 0 1 1  - sort cols->  C  0 0 1 1
A  1 0 0 0                 A  1 0 0 0                 A  0 1 0 0
D  0 0 0 1                 D  0 0 0 1                 D  0 0 0 1
                           (no change)
(*) Perhaps a phase 3 will increase the good results. In that phase we place the rows that start with fewer 0s in the top.
Look for a 1987 paper by Anna Lubiw on "Doubly Lexical Orderings of Matrices".
There is a citation below. The ordering is not identical to what you are looking for, but is pretty close. If nothing else, you should be able to get a pretty good idea from there.
http://dl.acm.org/citation.cfm?id=33385
Here's a starting point:
Convert each row from binary bits into a number
Sort the numbers in descending order.
Then convert each row back to binary.
Basic algorithm:
Determine the row sums and store values. Determine the column sums and store values.
Sort the row sums in ascending order. Sort the column sums in ascending order.
Hopefully, you should have a matrix with as close to an upper-right triangular region as possible.
Treat rows as binary numbers, with the leftmost column as the most significant bit, and sort them in descending order, top to bottom
Treat the columns as binary numbers with the bottommost row as the most significant bit and sort them in ascending order, left to right.
Repeat until you reach a fixed point. Proof that the algorithm terminates is left as an exercise for the reader.
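The two sorting steps above can be sketched in Python as follows. The function name `sort_until_stable` and the `max_rounds` cap are assumptions for this example; the cap is there only because termination is not proven here. Equal-length bit lists compare lexicographically in Python, which is the same order as comparing them as binary numbers with the first element most significant.

```python
def sort_until_stable(matrix, max_rounds=100):
    """Alternate row and column sorts until nothing changes.

    `max_rounds` is a safety cap added for this sketch.
    """
    rows = [list(r) for r in matrix]
    for _ in range(max_rounds):
        before = [r[:] for r in rows]
        # Rows as binary numbers, leftmost column most significant, descending.
        rows.sort(reverse=True)
        # Columns as binary numbers, bottommost row most significant, ascending.
        cols = [list(c) for c in zip(*rows)]
        cols.sort(key=lambda col: col[::-1])
        rows = [list(r) for r in zip(*cols)]
        if rows == before:  # fixed point reached
            break
    return rows


question_matrix = [[0, 1, 1, 0], [1, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
for row in sort_until_stable(question_matrix):
    print(row)
```

On the 4x4 matrix from the question this settles after two rounds on the arrangement shown in the question's desired output. Row and column labels are not tracked in this sketch.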

Monotonically increasing 2-d array

Give an algorithm to find a given element x (give the co-ordinates), in an n by n matrix where the rows and columns are monotonically increasing.
My thoughts:
Reduce problem set size.
In the 1st column, find the largest element <= x. We know x must be in this row or after (lower). In the last column of the matrix, find the smallest element >= x. We know x must be in this row or before. Do the same thing with the first and last rows of the matrix. We have now defined a sub-matrix such that if x is in the matrix at all, it is in this sub-matrix. Now repeat the algo on this sub-matrix... Something along these lines.
[YAAQ: Yet another arrays question.]
I think you cannot hope for more than O(N), which is attainable. (N is the width of the matrix).
Why you cannot hope for more
Imagine a matrix like this:
0 0 0 0 0 0 ... 0 0 x
0 0 0 0 0 0 ... 0 x 2
0 0 0 0 0 0 ... x 2 2
.....................
0 0 0 0 0 x ... 2 2 2
0 0 0 0 x 2 ... 2 2 2
0 0 0 x 2 2 ... 2 2 2
0 0 x 2 2 2 ... 2 2 2
0 x 2 2 2 2 ... 2 2 2
x 2 2 2 2 2 ... 2 2 2
where x is an unknown number (not the same number, ie. it might be a different one in every column). To satisfy the monotonicity of the matrix, you can place any of 0, 1, or 2 in all of the x places. So, to find if there is 1 in the matrix, you have to check all the x places, and there are N of them.
How to make it O(n)
Imagine you have to find, for every row, the first column index holding a number > q (a given number). You start in the upper right corner of the matrix; if the number you see is greater, you go left; else go down. End when you are in the last row. The points where you went down are the places you search for. If any of them have the number you search for, you've found it.
This algorithm is O(n) because in each step you go either left or down. In total, it cannot go more than N times left and N times down. Therefore it's O(n).
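A compact Python sketch of this walk from the upper right corner, adapted to return the coordinates of x directly, is shown below. The function name `find_in_sorted_matrix` and the `(row, col)` return convention are assumptions for the example.

```python
def find_in_sorted_matrix(matrix, x):
    """Return (row, col) of x, or None if x is absent.

    Assumes every row and every column is sorted in non-decreasing order.
    """
    if not matrix or not matrix[0]:
        return None
    r, c = 0, len(matrix[0]) - 1  # start in the upper right corner
    while r < len(matrix) and c >= 0:
        value = matrix[r][c]
        if value == x:
            return (r, c)
        if value > x:
            c -= 1  # everything below in this column is even larger
        else:
            r += 1  # everything to the left in this row is even smaller
    return None


grid = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(find_in_sorted_matrix(grid, 5))
```

Each comparison discards a whole row or a whole column, so the loop runs at most rows + cols times.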
Pick a corner element, one that is greatest in its row and smallest in its column (or the other way). Compare with x. Depending on the result of the comparison, you can exclude the row or the column from further search.
The new matrix has sum of dimensions decreased by 1, compared to the original one. Apply the above iteratively. After 2*n steps you end up with a 1x1 matrix.
If "the rows and columns are monotonically increasing" means that the values in each (row,col) increase such that for any row, (rowM,col1) < (rowM,col2) < ... < (rowM,colN) < (rowM+1,col1) ...
Then you can just treat it as a 1 dimensional array that is sorted from smallest to largest, and do a standard binary search, by sampling the item that is 1/2(rows * cols) from the start, then sampling the element that is 1/4(rows * cols) behind (if the first element sampled is > x) or ahead (if the first element sampled is < x), and so forth.
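Under that stronger reading (each row continues where the previous one ended), the binary search can be sketched like this. The function name `find_row_major_sorted` is hypothetical; `divmod` converts a flat index back into a row and column.

```python
def find_row_major_sorted(matrix, x):
    """Binary search over the matrix viewed as one sorted 1-D array.

    Only valid when the last element of each row is smaller than the
    first element of the next row.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    lo, hi = 0, rows * cols - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        r, c = divmod(mid, cols)  # flat index -> (row, col)
        value = matrix[r][c]
        if value == x:
            return (r, c)
        if value < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


print(find_row_major_sorted([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 6))
```

This takes O(log(rows * cols)) comparisons, but it does not apply to the weaker condition where only individual rows and columns are sorted.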
