Swap dimensions of 4-D image array

I have a 4-D array of images of shape [32 32 3 1000]. In Matlab, how can I change this so that the image number (1000) is the first index instead of the last, i.e. shape [1000 32 32 3]?
I tried A = permute(A, [1000 32 32 3]);, but it says:
Error using permute ORDER contains an invalid permutation index.
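The error arises because the second argument of permute is a dimension order (a permutation of 1..ndims(A)), not the target sizes. In MATLAB, permute(A, [4 1 2 3]) moves the fourth dimension to the front. A minimal Python sketch of the same bookkeeping follows; permuted_shape is a hypothetical helper introduced only for illustration, and it uses 0-based axis indices, unlike MATLAB:

```python
def permuted_shape(shape, order):
    """Return the shape after reordering axes; order holds 0-based axis indices."""
    if sorted(order) != list(range(len(shape))):
        raise ValueError("order must be a permutation of the axis indices")
    return tuple(shape[axis] for axis in order)

shape = (32, 32, 3, 1000)
# 0-based (3, 0, 1, 2) corresponds to MATLAB's permute(A, [4 1 2 3]).
new_shape = permuted_shape(shape, (3, 0, 1, 2))
print(new_shape)
```

Passing the sizes themselves, as in the failing call, is rejected by the permutation check in the same way.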

Debayering bayer encoded Raw Images

I have an image which I need to write a debayer for, but I can't figure out how the data is packed.
The information I have about the image:
original bpp: 64;
PNG bpp: 8;
columns: 242;
rows: 3944;
data size: 7635584 bytes.
PNG https://drive.google.com/file/d/1fr8Tg3OvhavsgYTwjJnUG3vz-kZcRpi9/view?usp=sharing
SRC data: https://drive.google.com/file/d/1O_3tfeln76faqgewAknYKJKCbDq8UjEz/view?usp=sharing
I was told that it should be BGGR, but it doesn't look like any ordinary Bayer BGGR image to me. Also I got the image with a txt file which contains this text:
Camera resolution: 1280x944
Camera type: LVDS
Could the image be compressed somehow?
I'm completely lost here, I would appreciate any help.
Bayer pattern of the image in 8bpp representation
Looks like there are 4 images, and the pixels are stored in some kind of "packed 12" format.
Please note that "reverse engineering" the format is challenging, and the solution probably has a few mistakes.
The 4 images are stored in steps of 4 rows:
aaaaaaaaaaaaa
bbbbbbbbbbbbb
ccccccccccccc
ddddddddddddd
aaaaaaaaaaaaa
bbbbbbbbbbbbb
ccccccccccccc
ddddddddddddd
...
aaa... marks the first image.
bbb... marks the second image.
ccc... marks the third image.
ddd... marks the fourth image.
There are about 168 rows at the top that we have to ignore.
Getting 1280 pixels out of 1936 bytes in each row:
Each row has 16 bytes we have to ignore.
Out of 1936 bytes, only 1920 bytes are relevant (assume we have to remove 8 bytes from each side).
The 1920 bytes represent 1280 pixels.
Every 2 pixels are stored in 3 bytes (every pixel is 12 bits).
The two 12 bits elements in 3 bytes are packed as follows:
8 MSB bits 8 MSB bits 4 LSB and 4 LSB bits
######## ######## #### ####
It's hard to tell how the LSB bits are divided between the two pixels (the LSBs are mainly "noise").
After unpacking the pixels and extracting one image out of the 4, the format looks like a GRBG Bayer pattern (by changing the size of the margins we may get BGGR).
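The row geometry and the "packed 12" layout described above can be sketched in Python. This is only an illustration under the same guesses as the answer: the 8-byte margins and the assignment of the high nibble to the first pixel of each pair are assumptions, and unpack_packed12 is a hypothetical helper name.

```python
BYTES_PER_ROW = 1936   # 242 columns * 8 bytes (64 bpp), from the question
HEADER_ROWS = 168      # rows ignored at the top of the file
IMAGE_ROWS = 944
NUM_IMAGES = 4
MARGIN = 8             # bytes dropped on each side of a row (assumed)

# The geometry accounts for the full data size given in the question.
assert 242 * 8 == BYTES_PER_ROW
assert BYTES_PER_ROW * (HEADER_ROWS + IMAGE_ROWS * NUM_IMAGES) == 7635584

def unpack_packed12(row):
    """Unpack one raw row into 12-bit pixel values.

    Assumes each 3-byte group is [MSB of pixel 0, MSB of pixel 1, LSB nibbles],
    with the high nibble belonging to pixel 0 (this split is a guess).
    """
    payload = row[MARGIN:len(row) - MARGIN]
    pixels = []
    for i in range(0, len(payload), 3):
        msb0, msb1, lsbs = payload[i], payload[i + 1], payload[i + 2]
        pixels.append((msb0 << 4) | (lsbs >> 4))
        pixels.append((msb1 << 4) | (lsbs & 0x0F))
    return pixels

# Synthetic row: margins of zeros around 640 identical 3-byte groups.
row = bytes(MARGIN) + bytes([0xAB, 0xCD, 0xEF]) * 640 + bytes(MARGIN)
pixels = unpack_packed12(row)
print(len(pixels), hex(pixels[0]), hex(pixels[1]))
```

With real data, one of the four interleaved images would be obtained by taking every fourth row after the header rows before unpacking.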
MATLAB code sample for extracting one image:
f = fopen('test.img', 'r'); % Open file (as binary file) for reading
T = fread(f, [1936, 168], 'uint8')'; % Read the first 168 rows (ignored)
I = fread(f, [1936, 944*4], 'uint8')'; % Read 944*4 rows
fclose(f);
% Convert from packed 12 to uint16 (also skip rows in steps of 4, and ignore 8 bytes from each side):
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
A = uint16(I(1:4:end, 8+1:3:end-8)); % MSB of even pixels (convert to uint16)
B = uint16(I(1:4:end, 8+2:3:end-8)); % MSB of odd pixels (convert to uint16)
C = uint16(I(1:4:end, 8+3:3:end-8)); % 4 bits are LSB of even pixels and 4 bits are LSB of odd pixels
I1 = A*16 + bitshift(C, -4); % Add the upper 4 bits of C as LSBs of the even pixels (may be wrong)
I2 = B*16 + bitand(C, 15); % Add the lower 4 bits of C as LSBs of the odd pixels (may be wrong)
I = zeros(size(I1, 1), size(I1, 2)*2, 'uint16'); % Allocate 1280x944 uint16 elements.
I(:, 1:2:end) = I1; % Copy even pixels
I(:, 2:2:end) = I2; % Copy odd pixels
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
J = demosaic(I*16, 'grbg'); % Apply demosaic (multiply by 16, because MATLAB assumes the 12 bits are in the upper bits).
figure;imshow(lin2rgb(J));impixelinfo % Show the output image (lin2rgb applies gamma correction).
Result (converted to 8 bit):

Get X and Y positions of a Pixel given the HEIGHT , WIDTH and index of a pixel in FLATTENED array representing the image

Imagine you have this image
[[1 2 3] [4 5 6] [7 8 90]]
You flatten it into this format -
[1 2 3 4 5 6 7 8 90]
Now you are given the index of Pixel 90 to be 8.
How can you find that pixel 90 is in Row 3 and column 3?
OpenCL, like other programming languages such as C, C++ and Java, uses zero-based indexing. In these terms you are looking for row 2 and column 2.
To calculate the row, divide the index 8 by the number of columns, using integer division:
8 / 3 = 2
So in zero-based indexing that is row 2 (the third row).
To calculate the column, use the modulo operator:
8 % 3 = 2
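The same calculation as a short Python sketch, using the 3x3 example from the question (index_to_row_col is a hypothetical helper name; divmod performs the integer division and the modulo in one step):

```python
def index_to_row_col(index, width):
    """Convert a flat, zero-based index into a zero-based (row, column) pair."""
    row, col = divmod(index, width)
    return row, col

image = [[1, 2, 3], [4, 5, 6], [7, 8, 90]]
flat = [pixel for line in image for pixel in line]

row, col = index_to_row_col(flat.index(90), width=3)
print(row, col)
```

Adding 1 to each value gives the one-based answer from the question, row 3 and column 3.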
In the 2D case, a point (x,y) in a rectangle with the dimensions (sx,sy) can be represented in 1D space by a linear index n as follows:
n = x+y*sx
Converting the 1D index n back to (x,y) works as follows:
x = n%sx
y = n/sx
For the 3D case, a point (x,y,z) in a box with dimensions (sx,sy,sz) can be represented in 1D as
n = x+(y+z*sy)*sx
and converted back to (x,y,z) like this:
z = n/(sx*sy);
temp = n%(sx*sy);
y = temp/sx;
x = temp%sx;
Note that "/" here means integer division (always rounds down the result) and "%" is the modulo operator.
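As a quick check of the 3D formulas, here is a Python sketch that converts every point of a small box to its linear index and back. The function names and the box size 4x3x2 are chosen only for this illustration; Python's // operator and divmod play the role of integer division.

```python
def to_linear(x, y, z, sx, sy):
    """n = x + (y + z*sy)*sx"""
    return x + (y + z * sy) * sx

def from_linear(n, sx, sy):
    """Inverse of to_linear, using integer division and modulo."""
    z, temp = divmod(n, sx * sy)
    y, x = divmod(temp, sx)
    return x, y, z

sx, sy, sz = 4, 3, 2
round_trip_ok = all(
    from_linear(to_linear(x, y, z, sx, sy), sx, sy) == (x, y, z)
    for z in range(sz) for y in range(sy) for x in range(sx)
)
print(round_trip_ok)
```

Note that sz is never needed for the conversion itself; it only bounds the valid range of n.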

what does write_back_intra_pred_mode() function from libavcodec do?

Below is a function from ffmpeg, defined in libavcodec/h264.h:
static av_always_inline void write_back_intra_pred_mode(const H264Context *h,
H264SliceContext *sl)
{
int8_t *i4x4 = sl->intra4x4_pred_mode + h->mb2br_xy[sl->mb_xy];
int8_t *i4x4_cache = sl->intra4x4_pred_mode_cache;
AV_COPY32(i4x4, i4x4_cache + 4 + 8 * 4);
i4x4[4] = i4x4_cache[7 + 8 * 3];
i4x4[5] = i4x4_cache[7 + 8 * 2];
i4x4[6] = i4x4_cache[7 + 8 * 1];
}
What does this function do?
Can you explain the function body too?
The function updates a frame-wide cache of intra prediction modes (at 4x4 block resolution), located in the variable sl->intra4x4_pred_mode per slice or h->intra4x4_pred_mode for the whole frame. This cache is later used in h264_mvpred.h, specifically the function fill_decode_caches() around line 510-528, to set the contextual (left/above neighbour) block info for decoding of subsequent intra4x4 blocks located below or to the right of the current set of 4x4 blocks.
[edit]
OK, some more on the design of variables here. sl->mb_xy is sl->mb_x + sl->mb_y * mb_stride. Think of mb_stride as a padded version of the width (in mbs) of the image. So mb_xy is the raster-ordered index of the current macroblock. Some variables are indexed in block (4x4) instead of macroblock (16x16) resolution, so to convert between units, you use mb2br_xy. That should explain the layout of the frame-wide cache (intra4x4_pred_mode/i4x4).
Now for the local per-macroblock cache: it contains 4x4 entries for the current macroblock, plus the left/above edge entries, so 5x5. However, multiplying something by 5 takes 2 registers in a lea instruction, whereas 8 only takes one, so we prefer 8 (more generally, we prefer powers of 2). So the resolution becomes 8(width)x5(height) for a total of 40 entries, of which the left 3 in each row are unused, the fourth is the left edge, and the right 4 are the actual entries of the current macroblock. The top row is above, and the 4 rows below it are the actual entries of the current macroblock.
Because of that, the backcopy from the local cache to the frame-wide cache uses 8 as stride, 4/3/2/1 as row indices for y=3/2/1/0 and 4-7 as column indices for x=0-3. In the backcopy, you'll notice we don't actually copy the whole 4x4 block, but just the last line (AV_COPY32 copies 4 entries, offset=4[x=0]+8[stride]*4[y=3]) and the right-most entry for each of the other lines (7[x=3]+8[stride]*1-3[y=0-2]). That's because only the right/bottom edges are interesting as top/left context for future macroblock decoding, so the rest is unnecessary.
As an illustration, the layout of intra4x4_pred_mode_cache is:
x x x TL T0 T1 T2 T3
x x x L0 00 01 02 03
x x x L1 10 11 12 13
x x x L2 20 21 22 23
x x x L3 30 31 32 33
x means unused, TL is topleft, Ln is left[n], Tn is top[n] and the numbered entries ab are y=a,x=b for 4x4 blocks in a 16x16 macroblock.
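To make the indexing concrete, here is a small Python model of the write-back. It is not FFmpeg code: the cache is filled with the labels from the layout above instead of real prediction modes, and write_back is a hypothetical stand-in for the C function that uses the same offsets.

```python
STRIDE = 8  # row stride of the local cache, as in the C code

# The 8x5 local cache, filled with the labels from the layout above.
rows = [
    ["x", "x", "x", "TL", "T0", "T1", "T2", "T3"],
    ["x", "x", "x", "L0", "00", "01", "02", "03"],
    ["x", "x", "x", "L1", "10", "11", "12", "13"],
    ["x", "x", "x", "L2", "20", "21", "22", "23"],
    ["x", "x", "x", "L3", "30", "31", "32", "33"],
]
cache = [label for row in rows for label in row]

def write_back(cache):
    """Mirror the offsets used by write_back_intra_pred_mode()."""
    i4x4 = [None] * 7
    start = 4 + STRIDE * 4            # AV_COPY32: 4 entries of the bottom row
    i4x4[0:4] = cache[start:start + 4]
    i4x4[4] = cache[7 + STRIDE * 3]   # right-most entry of row y=2
    i4x4[5] = cache[7 + STRIDE * 2]   # right-most entry of row y=1
    i4x4[6] = cache[7 + STRIDE * 1]   # right-most entry of row y=0
    return i4x4

saved = write_back(cache)
print(saved)
```

The seven saved entries are exactly the bottom row and the right column of the 4x4 block, which is the context needed by the macroblocks below and to the right.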
You may be wondering why TL is placed in [3] instead of [0], i.e. why isn't it TL T0-3 x x x (and so on for the remaining lines). The reason is that in the frame-wide and block-local cache, T0-3 (and 00-03, 10-13, 20-23, 30-33) are 4-byte aligned sets of 4 modes, which means that copying 4 entries in a single instruction (AV_COPY32) is significantly faster on most machines. If we did an unaligned copy, this would add additional overhead and slow down decoding (slightly).

Minimum bits required on a chess board

This is an interview question:
What is the minimum representation in bits of two positions on an 8x8 chessboard?
I found the answer http://www.careercup.com/question?id=4981467352399872
But I am unable to understand what the author is trying to convey when she says:
You can represent 2^n values with n bits. However, you can represent
2^n + 2^(n-1) + 2^(n-2) + ... 1 = 2^(n+1) - 1 values with atmost n
bits. So you can represent 2^11 - 1 = 2047 different values using just
10 bits.
I am not seeking an explanation of what the author is suggesting in his answer, but I am more interested in solving the problem itself. As far as I can think, since there are 64C2 = 2016 ways to represent two pieces on an 8x8 board, the minimum number of bits required should be 11. But someone suggested that one can use just 10 bits to represent the board. How?
The author is saying that you can represent the positions using 5, 6, 7, 8, 9 and 10 bit values.
In binary 2016 is 11111100000 (1024 + 512 + 256 + 128 + 64 + 32)
5 bits (00000 - 11111) represent 32 positions
6 bits (000000 - 111111) represent 64 positions
7 bits (0000000 - 1111111) represent 128 positions
8 bits (00000000 - 11111111) represent 256 positions
9 bits (000000000 - 111111111) represent 512 positions
10 bits (0000000000 - 1111111111) represent 1024 positions
A total of 2016 positions.
This could be implemented in languages with bit collections, e.g. C++ bitset, which has a size function to get the length.
Here's an example for a 2x2 board which will hopefully explain this better.
For a 2x2 board, there are 4C2 (6) positions
.x x. .. xx .x x.
.x x. xx .. x. .x
so you could use 3 bits: 000, 001, 010, 011, 100 and 101
But 6 is binary 110 (4+2) so you could use 1 bit (0-1) for 2 of the positions and 2 bits (00, 01, 10, 11) for the remaining 4. So the positions are:
0, 1, 00, 01, 10, 11.
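The variable-length scheme for the 8x8 board can be sketched in Python as follows. This is an illustrative reading of the answer, not a definitive implementation: encode and decode are hypothetical helpers, and the scheme relies on the length of each code being known separately (for example via a bitset's size), so the length itself carries part of the information.

```python
LENGTHS = range(5, 11)  # code lengths of 5 to 10 bits

def encode(index):
    """Map a position index 0..2015 to a bit string of 5 to 10 bits."""
    for length in LENGTHS:
        capacity = 1 << length
        if index < capacity:
            return format(index, "0{}b".format(length))
        index -= capacity
    raise ValueError("index out of range")

def decode(code):
    """Recover the index from the bit string and its length."""
    offset = sum(1 << length for length in LENGTHS if length < len(code))
    return offset + int(code, 2)

total = sum(1 << length for length in LENGTHS)
print(total, encode(0), encode(32), encode(2015))
```

Here index stands for any fixed numbering of the 2016 ways to choose two squares; how that numbering is defined is left open.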
To answer the question and obtain an integer solution, you must evaluate the following equation:
bits = ceil(log2(combination(64,2)));
bits = ceil(log2(64!/(62!*2!)));
bits = ceil(log2(64*63/2));
bits = ceil(log2(32*63));
bits = ceil(log2(32)+log2(63));
bits = ceil(5+log2(63));
bits = ceil(5+5.97728);
bits = 11;
Deriving the equation requires a working knowledge of combinatorics.
combination(64,2) represents the number of ways to choose 2 of 64 possible unique spaces.
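The arithmetic can be checked with a few lines of Python (math.comb requires Python 3.8 or later; the variable names are only for this sketch):

```python
import math

positions = math.comb(64, 2)            # ways to choose 2 of 64 squares
bits = math.ceil(math.log2(positions))  # fixed-width bits needed

print(positions, bits)
```

An integer-only alternative that avoids floating point is (positions - 1).bit_length(), which gives the same result here.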

Confusion regarding genetic algorithms

My book (Artificial Intelligence: A Modern Approach) says that genetic algorithms begin with a set of k randomly generated states, called the population. Each state is represented as a string over a finite alphabet, most commonly a string of 0s and 1s. For example, an 8-queens state must specify the positions of 8 queens, each in a column of 8 squares, and so requires 8 * log2(8) = 24 bits. Alternatively, the state could be represented as 8 digits, each in the range from 1 to 8.
[ http://en.wikipedia.org/wiki/Eight_queens_puzzle ]
I don't understand the expression 8 * log2(8) = 24 bits. Why log2(8)? And what are these 24 bits supposed to be for?
If we take the first example on the Wikipedia page, the solution can be encoded as [2,4,6,8,3,1,7,5]: the first digit gives the row number for the queen in column A, the second for the queen in column B, and so on. Now instead of starting the row numbering at 1, we will start at 0. The solution is then encoded as [1,3,5,7,2,0,6,4]. Any position can be encoded this way.
We only have digits between 0 and 7; if we write them in binary, 3 bits (= log2(8)) are enough:
000 -> 0
001 -> 1
...
110 -> 6
111 -> 7
A position can be encoded using 8 times 3 bits, e.g. from [1,3,5,7,2,0,6,4] we get [001,011,101,111,010,000,110,100] or more briefly 001011101111010000110100: 24 bits.
In the other direction, the bitstring 000010001011100101111110 decodes as 000.010.001.011.100.101.111.110, then [0,2,1,3,4,5,7,6], and gives [1,3,2,4,5,6,8,7]: the queen in column A is on row 1, the queen in column B is on row 3, etc.
The number of bits needed to store the possible squares (8 possibilities, 0-7) is log2(8) = 3. Note that 111 in binary is 7 in decimal. You have to specify the square for 8 columns, so you need 3 bits 8 times.
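The encoding and decoding described above can be sketched in Python. The function names are hypothetical, and rows are zero-based (0 to 7), as in the examples:

```python
BITS_PER_QUEEN = 3  # log2(8): enough to number the 8 rows 0..7

def encode(rows):
    """Turn 8 zero-based row numbers into a 24-character bit string."""
    return "".join(format(row, "03b") for row in rows)

def decode(bits):
    """Split a bit string into 3-bit groups and read each as a row number."""
    return [int(bits[i:i + BITS_PER_QUEEN], 2)
            for i in range(0, len(bits), BITS_PER_QUEEN)]

state = encode([1, 3, 5, 7, 2, 0, 6, 4])
rows = decode("000010001011100101111110")
print(state, len(state), rows)
```

A genetic algorithm can then apply crossover and mutation directly to such strings, since every 24-bit string decodes to some placement of 8 queens, one per column.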
