MPEG1 motion estimation / compensation

I saw the following explanation for motion estimation / compensation for MPEG-1 and was just wondering whether it is correct:
Why don't we just code the raw difference between the current block and the reference block?
Because the numbers for the residual are usually going to be a lot smaller. For example, say an object accelerates across the image. Its x position over 11 frames was:
12 16 20 25 31 38 48 59 72 84 96
The raw differences would be
x 4 4 5 6 7 10 11 13 12 12
So the predicted values would be
x x 20 24 30 37 45 58 70 85 96
So the residuals are
x x 0 1 1 1 3 1 2 -1 0
Is the prediction for frame[i+1] = (frame[i] - frame[i-1]) + frame[i], i.e. add the motion vector between the previous two reference frames to the most recent reference frame? Then we encode the prediction residual, which is the actual captured frame[i+1] minus the predicted frame[i+1], and send this to the decoder?
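To make the arithmetic above concrete, here is a minimal sketch (Python, purely illustrative; real MPEG-1 predicts blocks with motion vectors, not scalar positions) that reproduces the predicted values and residuals from the numbers above:

# Linear prediction: predict each value as last value + last difference,
# then keep only the residual (actual - predicted).
positions = [12, 16, 20, 25, 31, 38, 48, 59, 72, 84, 96]
predictions = []
residuals = []
for i in range(2, len(positions)):
    predicted = positions[i - 1] + (positions[i - 1] - positions[i - 2])
    predictions.append(predicted)
    residuals.append(positions[i] - predicted)
print(predictions)  # [20, 24, 30, 37, 45, 58, 70, 85, 96]
print(residuals)    # [0, 1, 1, 1, 3, 1, 2, -1, 0]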

MPEG1 decoding (motion compensation) works like this:
The predictions and motion vectors turn a reference frame into the next (current) frame. Here's how you would calculate each pixel of the new frame:
For each macroblock, you have a set of predicted values (differences from the reference frame). The motion vector is an (x, y) offset into the reference frame.
// Each luma and chroma block is 8x8 pixels
for (y = 0; y < 8; y++)
{
    for (x = 0; x < 8; x++)
    {
        NewPixel(x,y) = Prediction(x,y) + RefPixel(x + motion_vector_x, y + motion_vector_y)
    }
}
With MPEG1 you have I, P and B frames. I frames are completely intra coded (e.g. similar to JPEG), with no references to other frames. P frames are coded with predictions from the previous frame (either I or P). B frames are coded with predictions from both directions (the previous and the next frame). B frame processing makes the video player a little more complicated because a B frame may reference the next frame; therefore each frame carries a sequence number, and B frames make the decoding order differ from the display order. In other words, your video decoder needs to hold on to potentially 3 frames while decoding a stream (previous, current and next).

Related

Issue with Lua Random Number Generation in Loops

I have a script for a rock-paper-scissors (RPS) game I am making, and I am trying to generate a random number to determine a series of RPS moves. The logic is as follows:
moves = {}
table.insert(moves, 'rock')
table.insert(moves, 'paper')
table.insert(moves, 'scissors')
currentMoves = {}
math.randomseed(playdate.getSecondsSinceEpoch()) -- game SDK library function that returns seconds since midnight January 1 2000 UTC to initialize new random sequence
math.random(); math.random(); math.random();
-- generates a list of RPS moves to display on the screen
function generateMoves(maxMovesLength) -- I set maxMovesLength to 3
    currentMoves = {}
    for i = 1, maxMovesLength, 1 do
        randomNumber = math.random(1, 3)
        otherRandomNumber = math.random(1, 99) -- even with this (on the presumption that 1~33 is rock, 34~66 is paper and 67~99 is scissors), I get a suspicious number of 3 of the same move
        print(otherRandomNumber)
        table.insert(currentMoves, moves[randomNumber])
    end
    return currentMoves
end
However, I noticed that, using Lua's math.random() function, I seem to be getting a statistically unlikely number of runs of 3 of the same RPS move. The likelihood of getting 3 of the same move in a row (rock rock rock, paper paper paper, or scissors scissors scissors) should be about 11%, but I am getting sets of 3 much more often.
For example, here is what I got when I set maxMovesLength to 15:
36 -paper
41 -paper
60 -paper
22 -rock
1 -rock
2 -rock
91 -scissors
36 -paper
69 -scissors
76 -scissors
35 -paper
18 -rock
22 -rock
22 -rock
92 -scissors
From this sample, it seems that sets of 3 of a kind are happening much more often than they should be. There are 13 series of 3 consecutive moves in this list of 15 moves, and 3 of those 13 are three of a kind, a rate of about 23%, higher than the expected probability of 11%.
Is this just a flaw in the Lua math library?
It seems that this issue doesn't occur when maxMovesLength is set to a very high number, so I will just call math.random() a bunch of times before I actually use it in my game (more than the 3 times I currently do after randomseed()).
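As a rough sanity check of the 11% figure, here is a quick simulation sketch (Python, not Playdate Lua; the 13 overlapping windows mirror the counting described above) that estimates how often runs of three identical moves appear among 15 uniform draws:

import random

trials = 100_000
total_triples = 0
samples_with_3_or_more = 0
for _ in range(trials):
    moves = [random.randint(1, 3) for _ in range(15)]
    # 13 overlapping windows of length 3 in a list of 15 moves
    triples = sum(1 for i in range(13) if moves[i] == moves[i + 1] == moves[i + 2])
    total_triples += triples
    if triples >= 3:
        samples_with_3_or_more += 1

print("average fraction of windows that are triples:", total_triples / (trials * 13))
print("fraction of 15-move samples with 3+ triples:", samples_with_3_or_more / trials)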

what does write_back_intra_pred_mode() function from libavcodec do?

Below is a function from ffmpeg, defined in libavcodec/h264.h:
static av_always_inline void write_back_intra_pred_mode(const H264Context *h,
                                                         H264SliceContext *sl)
{
    int8_t *i4x4       = sl->intra4x4_pred_mode + h->mb2br_xy[sl->mb_xy];
    int8_t *i4x4_cache = sl->intra4x4_pred_mode_cache;

    AV_COPY32(i4x4, i4x4_cache + 4 + 8 * 4);
    i4x4[4] = i4x4_cache[7 + 8 * 3];
    i4x4[5] = i4x4_cache[7 + 8 * 2];
    i4x4[6] = i4x4_cache[7 + 8 * 1];
}
What does this function do?
Can you explain the function body too?
The function updates a frame-wide cache of intra prediction modes (at 4x4 block resolution), located in the variable sl->intra4x4_pred_mode per slice or h->intra4x4_pred_mode for the whole frame. This cache is later used in h264_mvpred.h, specifically the function fill_decode_caches() around line 510-528, to set the contextual (left/above neighbour) block info for decoding of subsequent intra4x4 blocks located below or to the right of the current set of 4x4 blocks.
[edit]
OK, some more on the design of variables here. sl->mb_xy is sl->mb_x + sl->mb_y * mb_stride. Think of mb_stride as a padded version of the width (in mbs) of the image. So mb_xy is the raster-ordered index of the current macroblock. Some variables are indexed in block (4x4) instead of macroblock (16x16) resolution, so to convert between units, you use mb2br_xy. That should explain the layout of the frame-wide cache (intra4x4_pred_mode/i4x4).
Now for the local per-macroblock cache: it contains the 4x4 entries for the current macroblock, plus the left/above edge entries, so 5x5. However, multiplying something by 5 takes 2 registers in a lea instruction, whereas 8 only takes one, so we prefer 8 (more generally, we prefer powers of 2). So the layout becomes 8 (width) x 5 (height) for a total of 40 entries, of which the left 3 in each row are unused and the fourth is the left edge; the top row is the above edge, and the 4 rows below it hold the actual entries of the current macroblock in their right 4 columns.
Because of that, the backcopy from the per-macroblock cache to the frame-wide cache uses 8 as the stride, row indices 4/3/2/1 for y=3/2/1/0 and column indices 4-7 for x=0-3. In the backcopy, you'll notice we don't actually copy the whole 4x4 block, but just the bottom row (AV_COPY32 copies 4 entries starting at offset 4 [x=0] + 8 [stride] * 4 [y=3]) and the right-most entry of each of the other rows (offset 7 [x=3] + 8 [stride] * 1-3 [y=0-2]). That's because only the right/bottom edges are interesting as top/left context for future macroblock decoding, so the rest is unnecessary.
So as illustration, the layout of i4x4_pred_mode_cache is:
x x x TL T0 T1 T2 T3
x x x L0 00 01 02 03
x x x L1 10 11 12 13
x x x L2 20 21 22 23
x x x L3 30 31 32 33
x means unused, TL is topleft, Ln is left[n], Tn is top[n] and the numbered entries ab are y=a,x=b for 4x4 blocks in a 16x16 macroblock.
You may be wondering why TL is placed at index [3] instead of [0], i.e. why the rows aren't TL T0-3 x x x (and so on for the remaining lines); the reason is that in both the frame-wide and the block-local cache, T0-3 (and 00-03, 10-13, 20-23, 30-33) are 4-byte-aligned sets of 4 modes, which means that copying 4 entries in a single instruction (AV_COPY32) is significantly faster on most machines. If we did an unaligned copy, this would add overhead and slow down decoding (slightly).
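Purely as an illustration of those offsets (a Python sketch, not ffmpeg code), the following labels the 8x5 cache and brackets the seven entries that write_back_intra_pred_mode() actually copies back:

STRIDE = 8  # cache row stride (8 rather than 5, as explained above)

labels = {3: "TL"}
for x in range(4):
    labels[4 + x] = "T" + str(x)                         # top edge
for y in range(4):
    labels[3 + STRIDE * (y + 1)] = "L" + str(y)          # left edge
    for x in range(4):
        labels[4 + x + STRIDE * (y + 1)] = str(y) + str(x)  # current macroblock

copied = [4 + 8 * 4 + x for x in range(4)]               # AV_COPY32: entries 30..33
copied += [7 + 8 * row for row in (3, 2, 1)]             # entries 23, 13, 03

for row in range(5):
    cells = []
    for col in range(8):
        off = col + STRIDE * row
        cell = labels.get(off, "x")
        cells.append("[" + cell + "]" if off in copied else " " + cell + " ")
    print(" ".join(cells))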

Slideshow Algorithm

I need to design an algorithm for a photo slideshow that is constantly receiving new images, so that the oldest pictures appear less often in the presentation, until a balance is reached between the old photos and the ones that have just arrived.
My idea is that every image could have a counter of the number of times it has been shown, and the pictures with the lowest value of that counter would be prioritized.
Any other ideas or solutions would be well received.
You can achieve an overall near-uniform distribution (each image appears about the same number of times in the long run), but I wouldn't recommend doing it. Images that were available early would appear very rarely later on. A better user experience would be to simply choose a random image from all the available images at each step.
If you still want a near-uniform distribution in the long run, you should set the probability of any image based on the number of times it has appeared so far. For example:
p(i) = 1 - count(i) / (max_count() + epsilon)
Here is a simple R script that simulates such a process. 37 random images are selected before each new image becomes available; this is repeated 3000 times:
h <- 3000              # total images
eps <- 0.001
t <- integer(length=h) # t[i]: no. of instances of value i in r
r <- c()               # processed vector of indexes of images
m <- 0                 # highest number of appearances for an image
for (i in 1:h) {
  for (j in 1:37) {    # select 37 random images in range 1..i
    v <- sample(1:i, 1, prob=1-t[1:i]/(m+eps)) # select one image, weighting image k by 1-t[k]/(m+eps)
    r <- c(r, v)       # add to output vector
    t[v] <- t[v]+1     # update appearances count
    m <- max(m, t[v])  # update highest number of appearances
  }
}
plot(table(r))
The output plots (one for epsilon = 0.001 and one for epsilon = 0.0001) show the number of times each image appeared.
If we look, for example, at the indexes in the output vector at which, say, image #3 was selected:
> which(r==3)
[1] 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
[21] 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 1189 34767 39377
[41] 70259
Note that if epsilon is very small, the sequence will seem less random (newer images are strongly preferred). In the long run, however, any epsilon will do.
Instead of a view counter, you could also try basing your algorithm on the timestamp that images were uploaded.
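If you go the timestamp route, here is a hedged sketch (Python; the function name, half-life weighting and parameters are my own assumptions, not part of the answer) of picking images with a bias towards recently uploaded ones:

import random
import time

def pick_image(upload_times, now=None, half_life=3600.0):
    # Hypothetical recency weighting: an image half_life seconds old gets half
    # the weight of a brand-new one (exponential decay).
    now = time.time() if now is None else now
    weights = [0.5 ** ((now - t) / half_life) for t in upload_times]
    return random.choices(range(len(upload_times)), weights=weights, k=1)[0]

# Example: three images uploaded 2 hours, 1 hour and 1 minute ago.
now = time.time()
uploads = [now - 7200, now - 3600, now - 60]
print(pick_image(uploads, now=now))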

MATLAB Greyscale 12 bit to 8 bit

I'm trying to create an algorithm to convert a greyscale image from 12 bit to 8 bit.
I have a greyscale like this one:
The scale is represented in a matrix. The problem is that a simple multiplication by 1/16 destroys the first grey columns.
Here is the code example:
in =[
1 1 1 3 3 3 15 15 15 63 63 63;
1 1 1 3 3 3 15 15 15 63 63 63;
1 1 1 3 3 3 15 15 15 63 63 63;
1 1 1 3 3 3 15 15 15 63 63 63
];
[zeilen spalten] = size(in);
eight = round(in/16);
imshow(uint8(eight));
By "destroy" I mean that those columns are black in the new image.
Simply rescale the image so that you divide every single element by the maximum possible intensity that corresponds to a 12-bit (or 2^12 - 1 = 4095) unsigned integer and then multiply by the maximum possible intensity that corresponds to an 8-bit unsigned integer (or 2^8 - 1 = 255).
Therefore:
out = uint8((255.0/4095.0)*(double(in)));
You need to cast to double to ensure that you maintain floating point precision when performing this scaling, and then cast to uint8 so that the image type is guaranteed to be 8-bit. You have cleverly deduced that this scaling factor is roughly 1/16 (since 255.0/4095.0 ~ 1/16). However, the first 6 columns of your test image's output will surely be zero, because intensities of 1 and 3 in a 12-bit image are simply too small to be represented in the equivalent 8-bit form, so they get rounded down to 0. If you think about it, every increase of 16 in intensity in your 12-bit image registers as a single intensity increase in the 8-bit image, or:
12-bit --> 8-bit
0 --> 0
15 --> 1
31 --> 2
47 --> 3
63 --> 4
... --> ...
4095 --> 255
Because your values of 1 and 3 are not high enough to reach the next level, they get rounded down to 0. However, your values of 15 get mapped to 1, and your values of 63 get mapped to 4, which is what we expect when you run the above code on your test input.
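For comparison only (the MATLAB one-liner above is the actual answer), the same rescaling in Python/NumPy reproduces the mapping table, including 1 and 3 collapsing to 0; np.rint is used to mimic the rounding that MATLAB's uint8 cast performs:

import numpy as np

twelve_bit = np.array([0, 1, 3, 15, 31, 47, 63, 4095], dtype=np.uint16)
eight_bit = np.rint((255.0 / 4095.0) * twelve_bit.astype(np.float64)).astype(np.uint8)
print(eight_bit)  # [  0   0   0   1   2   3   4 255]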

How to optimize the layout of rectangles

I have a dynamic number of equally proportioned and sized rectangular objects that I want to optimally display on the screen. I can resize the objects but need to maintain proportion.
I know what the screen dimensions are.
How can I calculate the optimal number of rows and columns to divide the screen into, and what size I will need to scale the objects to?
Thanks,
Jamie.
Assuming that all rectangles have the same dimensions and orientation, and that these should not be changed.
Let's play!
// Proportion of the screen
// w,h width and height of your rectangles
// W,H width and height of the screen
// N number of your rectangles that you would like to fit in
// ratio
r = (w*H) / (h*W)
// This ratio is important since we can define the following relationship
// nbRows and nbColumns are what you are looking for
// nbColumns = nbRows * r (integer rounding will be an issue)
// we are looking for the minimum values of nbRows and nbColumns such that
// N <= nbRows * nbColumns = (nbRows ^ 2) * r
nbRows = ceil ( sqrt ( N / r ) ) // r is positive...
nbColumns = ceil ( N / nbRows )
I hope I got my maths right, but that cannot be far from what you are looking for ;)
EDIT:
There is not much difference between being given a ratio and being given the width and height...
// If ratio = w/h
r = ratio * (H/W)
// If ratio = h/w
r = H / (W * ratio)
And then you're back to using 'r' to find out how many rows and columns to use.
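Here is a runnable sketch of those formulas (Python; the example numbers, 14 rectangles of proportion 5:3 on a 640x480 screen, are mine and chosen to match the example further down):

import math

def grid_for(N, w, h, W, H):
    # w,h: rectangle proportions; W,H: screen size; N: number of rectangles
    r = (w * H) / (h * W)
    nb_rows = math.ceil(math.sqrt(N / r))   # smallest nbRows with N <= nbRows^2 * r
    nb_columns = math.ceil(N / nb_rows)
    return nb_rows, nb_columns

print(grid_for(N=14, w=5, h=3, W=640, H=480))  # (4, 4)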
Jamie, I interpreted "optimal number of rows and columns" to mean "how many rows and columns will provide the largest rectangles, consistent with the required proportions and screen size". Here's a simple approach for that interpretation.
Each possible choice (number of rows and columns of rectangles) results in a maximum possible size of rectangle for the specified proportions. Looping over the possible choices and computing the resulting size implements a simple linear search over the space of possible solutions. Here's a bit of code that does that, using an example screen of 480 x 640 and rectangles in a 3 x 5 proportion.
def min(a, b)
  a < b ? a : b
end

screenh, screenw = 480, 640
recth, rectw = 3.0, 5.0
ratio = recth / rectw
puts ratio
nrect = 14
(1..nrect).each do |nhigh|
  nwide = ((nrect + nhigh - 1) / nhigh).truncate
  maxh, maxw = (screenh / nhigh).truncate, (screenw / nwide).truncate
  relh, relw = (maxw * ratio).truncate, (maxh / ratio).truncate
  acth, actw = min(maxh, relh), min(maxw, relw)
  area = acth * actw
  puts([nhigh, nwide, maxh, maxw, relh, relw, acth, actw, area].join("\t"))
end
Running that code produces the following trace (columns: nhigh, nwide, maxh, maxw, relh, relw, acth, actw, area):
1 14 480 45 27 800 27 45 1215
2 7 240 91 54 400 54 91 4914
3 5 160 128 76 266 76 128 9728
4 4 120 160 96 200 96 160 15360
5 3 96 213 127 160 96 160 15360
6 3 80 213 127 133 80 133 10640
7 2 68 320 192 113 68 113 7684
8 2 60 320 192 100 60 100 6000
9 2 53 320 192 88 53 88 4664
10 2 48 320 192 80 48 80 3840
11 2 43 320 192 71 43 71 3053
12 2 40 320 192 66 40 66 2640
13 2 36 320 192 60 36 60 2160
14 1 34 640 384 56 34 56 1904
From this, it's clear that either a 4x4 or 5x3 layout will produce the largest rectangles. It's also clear that the rectangle size (as a function of row count) is worst (smallest) at the extremes and best (largest) at an intermediate point. Assuming that the number of rectangles is modest, you could simply code the calculation above in your language of choice, but bail out as soon as the resulting area starts to decrease after rising to a maximum.
That's a quick and dirty (but, I hope, fairly obvious) solution. If the number of rectangles became large enough to bother, you could tweak for performance in a variety of ways:
use a more sophisticated search algorithm (partition the space and recursively search the best segment),
if the number of rectangles is growing during the program, keep the previous result and only search nearby solutions,
apply a bit of calculus to get a faster, precise, but less obvious formula.
This is almost exactly like kenneth's question here on SO. He also wrote it up on his blog.
If you scale the proportions in one dimension so that you are packing squares, it becomes the same problem.
One way I like to do that is to use the square root of the area:
Let
r = number of rectangles
w = width of display
h = height of display
Then,
A = (w * h) / r is the area per rectangle
and
L = sqrt(A) is the base length of each rectangle.
If they are not square, then just multiply accordingly to keep the same ratio.
Another way to do a similar thing is to just take the square root of the number of rectangles. That will give you one dimension of your grid (i.e. the number of columns):
C = sqrt(r) is the number of columns in your grid
and
R = r / C is the number of rows.
Note that these will generally not be integers; take the floor of one and the ceiling of the other (round the division up), otherwise truncation might leave you a row short.
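A quick sketch of both square-root heuristics (Python; the example numbers are mine, reusing 14 rectangles on a 640x480 display):

import math

w, h, r = 640.0, 480.0, 14   # display width, display height, number of rectangles

# Area-based: side length of a square cell holding the same area per rectangle.
A = (w * h) / r
L = math.sqrt(A)
print(L)      # ~148.1; stretch/shrink each axis afterwards to restore the real aspect ratio

# Count-based: columns from the square root of the count, rows from the division.
C = math.floor(math.sqrt(r))
R = math.ceil(r / C)
print(C, R)   # 3 columns x 5 rows gives 15 cells >= 14 rectangles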
Your mention of rows and columns suggests that you envisaged arranging the rectangles in a grid, possibly with a few spaces (e.g. some of the bottom row) unfilled. Assuming this is the case:
Suppose you scale the objects such that (an as-yet unknown number) n of them fit across the screen. Then
objectScale=screenWidth/(n*objectWidth)
Now suppose there are N objects, so there will be
nRows = ceil(N/n)
rows of objects (where ceil is the Ceiling function), which will take up
nRows*objectScale*objectHeight
of vertical height. We need to find n, and want to choose the smallest n such that this distance is smaller than screenHeight.
A simple mathematical expression for n is made trickier by the presence of the ceiling function. If the number of columns is going to be fairly small, probably the easiest way to find n is just to loop through increasing n until the inequality is satisfied.
Edit: We can start the loop with the upper bound of
floor(sqrt(N*objectHeight*screenWidth/(screenHeight*objectWidth)))
for n, and work down: the solution is then found in O(sqrt(N)). An O(1) solution is to assume that
nRows = N/n + 1
or to take
n=ceil(sqrt(N*objectHeight*screenWidth/(screenHeight*objectWidth)))
(the solution of Matthieu M.) but these have the disadvantage that the value of n may not be optimal.
Border cases occur when N=0, and when N=1 and the aspect ratio of the objects is such that objectHeight/objectWidth > screenHeight/screenWidth - both of these are easy to deal with.
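A small sketch of the simple search described above (Python; the helper name and example numbers are mine):

import math

def smallest_columns(N, object_w, object_h, screen_w, screen_h):
    # Find the smallest number of columns n such that, after scaling the
    # objects to fill the screen width, all the rows also fit vertically.
    n = 1
    while True:
        scale = screen_w / (n * object_w)
        n_rows = math.ceil(N / n)
        if n_rows * scale * object_h <= screen_h:
            return n, scale
        n += 1

# e.g. 14 objects of proportion 5:3 on a 640x480 screen
print(smallest_columns(N=14, object_w=5, object_h=3, screen_w=640, screen_h=480))  # (4, 32.0)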
