4x4 Matrix (3D) animation - animation

I'm currently trying to animate a 3D object from one transform matrix to another. I know the "origin" and "target" transform matrices (4x4), and would like to compute the "intermediate" matrices between them, driven by a "progress" variable in [0, 1].
For example, I could go from:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
To:
0.70711 -0.70711 0 0
0.70711 0.70711 0 0
0 0 1 0
0 0 0 1
It's a simple 45deg rotation, and I would like to be able to find the intermediate matrix for a progress of 0.5, for example. Of course, here it's simple, and I know perfectly well how to animate it, but I want an algorithm that works for more complex transform matrices (ones that contain translations, rotations on multiple axes and scaling).
I found some articles:
https://research.cs.wisc.edu/graphics/Courses/838-s2002/Papers/polar-decomp.pdf
https://link.springer.com/article/10.1007/s11075-016-0098-7
I even searched the Chrome and Firefox source code, which use this kind of algorithm to animate the CSS transform property: https://github.com/chromium/chromium/blob/2ca8c5037021c9d2ecc00b787d58a31ed8fc8bcb/cc/animation/transform_operation.cc
Sadly, I was not able to find a clear solution. No author provides a complete algorithm (the ones provided are extremely abstract and don't specify their "sub-functions/algorithms"). Only very few papers are available. And finally, they don't provide any worked example (from one matrix to another), so it's almost impossible to verify their method.
Has any of you already faced this problem and implemented a solution? A clear example or pseudo code would be perfect, if someone knows the answer.
In any case, thanks in advance for all your help.

So, I finally found a solution: the W3C provides an algorithm to interpolate transform matrices: https://www.w3.org/TR/css-transforms-2/#matrix-interpolation
It works in 3 steps:
decomposition: splits the matrix into translation, scaling, skewing, perspective and rotation (quaternion) vectors
interpolation: creates new interpolated vectors from the previous ones
recomposition: composes a transform matrix from these vectors
It may not be the fastest, but it does the job.
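The three steps can be sketched in Python as follows. This is a simplified illustration, not the W3C algorithm itself: it assumes row-major 4x4 matrices acting on column vectors, with positive non-zero scale and no skew or perspective, so only translation, scale and rotation are handled. All function names are hypothetical.

```python
import math

def quaternion_from_rotation(r):
    """Convert a pure 3x3 rotation matrix to a unit quaternion (w, x, y, z)."""
    trace = r[0][0] + r[1][1] + r[2][2]
    if trace > 0:
        s = 2.0 * math.sqrt(trace + 1.0)
        return (0.25 * s, (r[2][1] - r[1][2]) / s, (r[0][2] - r[2][0]) / s, (r[1][0] - r[0][1]) / s)
    if r[0][0] > r[1][1] and r[0][0] > r[2][2]:
        s = 2.0 * math.sqrt(1.0 + r[0][0] - r[1][1] - r[2][2])
        return ((r[2][1] - r[1][2]) / s, 0.25 * s, (r[0][1] + r[1][0]) / s, (r[0][2] + r[2][0]) / s)
    if r[1][1] > r[2][2]:
        s = 2.0 * math.sqrt(1.0 + r[1][1] - r[0][0] - r[2][2])
        return ((r[0][2] - r[2][0]) / s, (r[0][1] + r[1][0]) / s, 0.25 * s, (r[1][2] + r[2][1]) / s)
    s = 2.0 * math.sqrt(1.0 + r[2][2] - r[0][0] - r[1][1])
    return ((r[1][0] - r[0][1]) / s, (r[0][2] + r[2][0]) / s, (r[1][2] + r[2][1]) / s, 0.25 * s)

def decompose(m):
    """Step 1: split a matrix into translation, scale and rotation quaternion."""
    translation = [m[0][3], m[1][3], m[2][3]]
    # Each column of the upper-left 3x3 block is a rotation axis times a scale.
    columns = [[m[row][col] for row in range(3)] for col in range(3)]
    scale = [math.sqrt(sum(v * v for v in column)) for column in columns]
    rotation = [[columns[col][row] / scale[col] for col in range(3)] for row in range(3)]
    return translation, scale, quaternion_from_rotation(rotation)

def slerp(a, b, t):
    """Spherical linear interpolation between two unit quaternions."""
    dot = sum(x * y for x, y in zip(a, b))
    if dot < 0.0:  # take the shorter arc
        b, dot = tuple(-v for v in b), -dot
    if dot > 0.9995:  # nearly parallel: fall back to a normalized lerp
        q = [x + (y - x) * t for x, y in zip(a, b)]
    else:
        theta = math.acos(dot)
        wa = math.sin((1.0 - t) * theta) / math.sin(theta)
        wb = math.sin(t * theta) / math.sin(theta)
        q = [wa * x + wb * y for x, y in zip(a, b)]
    norm = math.sqrt(sum(v * v for v in q))
    return tuple(v / norm for v in q)

def recompose(translation, scale, q):
    """Step 3: rebuild a 4x4 matrix from translation, scale and quaternion."""
    w, x, y, z = q
    r = [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
         [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
         [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]
    m = [[r[i][j] * scale[j] for j in range(3)] + [translation[i]] for i in range(3)]
    m.append([0.0, 0.0, 0.0, 1.0])
    return m

def interpolate_matrices(origin, target, progress):
    """Step 2 glued to steps 1 and 3: lerp translation/scale, slerp rotation."""
    ta, sa, qa = decompose(origin)
    tb, sb, qb = decompose(target)
    lerp = lambda u, v: [x + (y - x) * progress for x, y in zip(u, v)]
    return recompose(lerp(ta, tb), lerp(sa, sb), slerp(qa, qb, progress))
```

For the identity and 45deg matrices from the question, a progress of 0.5 should give a 22.5deg rotation about the z axis, which is an easy case to check by hand before trusting the code on harder inputs.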

Related

The sharpening filter and edge detection kernel relationship?

From Wikipedia:
(The sharpening filter) is obtained by taking the identity kernel and subtracting an edge detection kernel
Can someone explain to me how that is the case?
As far as I understand it, to sharpen an image, you take the original image and add high-contrast edges to it.
They even take the example of the matrix:
Shouldn't the matrix
0 -1 0
-1 4 -1
0 -1 0
be the edge detection kernel, according to another Wikipedia article? If so, the math should be adding, not subtracting.
Anyway, I am a bit confused here and could use some help. Thank you!
As is being pointed out, one needs to distinguish between first derivatives (edges) and second derivatives (ridges, peaks).
You don't talk about it but you link to an article on "unsharp masking". That is supposed to use a difference of gaussians... which is close to a laplacian (... of gaussian). Not quite the same, but practically close enough.
That means you don't actually deal with edges but with ridges/peaks.
As for the kernels and their signs... Wikipedia is being mysterious and misleading as usual.
They subtract a laplacian because they have to. The laplacian has a negative response to peaks/ridges. Conceptually, you do add an edge/ridge detection filter... if it were one.
The kernel you see looks like an upside-down mexican hat. It's a "laplacian of gaussian". That means it's the second derivative of a gaussian kernel. As a second derivative, it responds negatively to a positive peak/ridge, e.g. of the gaussian.
Here's a plot of a gaussian and its first and second derivatives:
Since you'd expect a ridge/peak detection filter to have a positive response to a positive ridge/peak, you'd use the negated second derivative, and add that.
Look at these pictures:
1: the [-1 +5 -1] kernel, i.e. identity - laplacian = identity + filter
2: the picture itself
3: the [+1 -3 +1] kernel, i.e. identity + laplacian = identity - filter
You see, #3 looks blurry because some high frequencies were subtracted.
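The sign question can be checked with plain kernel arithmetic. The sketch below assumes the usual 3x3 discrete Laplacian (centre -4) and treats the matrix from the question (centre +4) as its negation; the helper name `combine` is hypothetical.

```python
IDENTITY = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]

# Discrete Laplacian: responds negatively to a positive peak.
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]

def combine(a, b, sign):
    """Return a + sign * b, element by element."""
    return [[x + sign * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

# The matrix from the question is the negated Laplacian (the "filter").
EDGE_FILTER = [[-v for v in row] for row in LAPLACIAN]

# identity - laplacian and identity + filter are the same sharpening kernel.
SHARPEN = combine(IDENTITY, LAPLACIAN, -1)
SHARPEN_VIA_FILTER = combine(IDENTITY, EDGE_FILTER, +1)

# identity + laplacian removes high frequencies instead (centre -3).
SOFTEN = combine(IDENTITY, LAPLACIAN, +1)
```

Both routes give the kernel with centre 5 and four -1 neighbours, so "subtracting the Laplacian" and "adding the edge filter" describe the same operation.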

Speeding up a pre-computed seam carving algorithm

I'm writing a JS seam carving library. It works great, I can rescale a 1024x1024 image very cleanly in real time as fast as I can drag it around. It looks great! But in order to get that performance I need to pre-compute a lot of data and it takes about 10 seconds. I'm trying to remove this bottleneck and am looking for ideas here.
Seam carving works by removing the lowest energy "squiggly" line of pixels from an image. e.g. If you have a 10x4 image a horizontal seam might look like this:
........x.
.x.....x.x
x.xx..x...
....xx....
So if you resize it to 10x3 you remove all the 'X' pixels. The general idea is that the seams go around the things that look visually important to you, so instead of just normal scaling where everything gets squished, you're mostly removing things that look like whitespace, and the important elements in a picture are unaffected.
The process of calculating energy levels, removing them, and re-calculating is rather expensive, so I pre-compute it in node.js and generate a .seam file.
Each seam in the .seam file is basically: starting position, direction, direction, direction, direction, .... So for the above example you'd have:
starting position: 2
seam direction: -1 1 0 1 0 -1 -1 -1 1
This is quite compact and allows me to generate .seam files in ~60-120kb for a 1024x1024 image depending on settings.
Now, in order to get fast rendering I generate a 2D grid that represents the order in which pixels should be removed. So:
(figure A):
........1.
.1.....1.1
1.11..1...
....11....
contains 1 seam of info, then we can add a 2nd seam:
(figure B):
2...2....2
.222.2.22.
......2...
and when merged you get:
2...2...12
.122.2.1.1
1211..122.
....112...
For completeness we can add seams 3 & 4:
(figures C & D):
33.3..3...
..3.33.333
4444444444
and merge them all into:
(figure E):
2343243412
3122424141
1211331224
4434112333
You'll notice that the 2s aren't all connected in this merged version, because the merged version is based on the original pixel positions, whereas the seam is based on the pixel positions at the moment the seam is calculated which, for this 2nd seam, is a 10x3px image.
This allows the front-end renderer to basically just loop over all the pixels in an image and filter them against this grid by number of desired pixels to remove. It runs at 100fps on my computer, meaning that it's perfectly suitable for single resizes on most devices. yay!
Now the problem that I'm trying to solve:
The decoding step from seams that go -1 1 0 1 0 -1 -1 -1 1 to the pre-computed grid of which pixels to remove is slow. The basic reason for this is that whenever one seam is removed, all the seams from there forward get shifted.
The way I'm currently calculating the "shifting" is by splicing each pixel of a seam out of a 1,048,576 element array (for a 1024x1024 px image, where each index is x * height + y for horizontal seams) that stores the original pixel positions. It's veeerrrrrryyyyy slow running .splice a million times...
This seems like a weird leetcode problem, in that perhaps there's a data structure that would allow me to know "how many pixels above this one have already been excluded by a seam" so that I know the "normalized index". But... I can't figure it out, anything I can think of requires too many re-writes to make this any faster.
Or perhaps there might be a better way to encode the seam data, but using 1-2 bits per pixel of the seam is very efficient, and anything else I can come up with would make those files huge.
Thanks for taking the time to read this!
[edit and tl;dr] -- How do I efficiently merge figures A-D into figure E? Alternatively, any ideas that yield figure E efficiently, from any compressed format
If I understand correctly, your current algorithm is:
while there are pixels in Image:
    seam = get_seam(Image)
    save(seam)
    Image = remove_seam_from_image(Image, seam)
You then want to construct an array containing the numbers of each seam.
To do so, you could make a 1024x1024 array where each value is the index of that element of the array (y*width+x). Call this Indices.
A modified algorithm then gives you what you want
Let Indices have the dimensions of Image and be initialized to [0, len(Image))
Let SeamNum have the dimensions of Image and be initialized to -1
seam_num = 0
while there are pixels in Image:
    seam = get_seam(Image)
    Image = remove_seam_from_image(Image, seam)
    Indices = remove_seam_from_image_and_write_seam_num(Indices, seam, SeamNum, seam_num)
    seam_num++
remove_seam_from_image_and_write_seam_num is conceptually identical to remove_seam_from_image except that as it walks seam to remove each pixel from Indices it writes seam_num to the location in SeamNum indicated by the pixel's value in Indices.
The output is the SeamNum array you're looking for.
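A minimal Python sketch of this idea, applied to decoding stored seams rather than computing them. It assumes horizontal seams stored as a starting row plus one direction per following column, as in the question. Instead of one big Indices array it keeps one small list of surviving original rows per column, which is a variation on the answer, not the answer's exact data structure. The name `build_seam_grid` is hypothetical, and seams are numbered from 1 to match the figures.

```python
def build_seam_grid(width, height, seams):
    """Return a height x width grid holding the seam number of every pixel.

    seams is a list of (start_row, directions) pairs; each seam's rows are
    relative to the image as it looked when that seam was computed.
    """
    # remaining[x] lists the ORIGINAL row indices still present in column x.
    remaining = [list(range(height)) for _ in range(width)]
    seam_num = [[0] * width for _ in range(height)]  # 0 means "never removed"

    for number, (start_row, directions) in enumerate(seams, start=1):
        row = start_row
        for x in range(width):
            if x > 0:
                row += directions[x - 1]
            # Popping translates the current row into the original row and
            # shifts every later pixel of this column up by one.
            original_row = remaining[x].pop(row)
            seam_num[original_row][x] = number
    return seam_num

# The four seams from figures A to D of the question (10x4 image).
EXAMPLE_SEAMS = [
    (2, [-1, 1, 0, 1, 0, -1, -1, -1, 1]),
    (0, [1, 0, 0, -1, 1, 1, -1, 0, -1]),
    (0, [0, 1, -1, 1, 0, -1, 1, 0, 0]),
    (0, [0, 0, 0, 0, 0, 0, 0, 0, 0]),
]

grid = build_seam_grid(10, 4, EXAMPLE_SEAMS)
rows_as_text = ["".join(str(n) for n in row) for row in grid]
```

Each pop only shifts a list of at most `height` entries rather than a million-element array, which is where the saving over the single-array splice would be expected to come from.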

How could I judge if two photos are identical with some brightness difference

I have some photo pairs. In some pairs the objects differ in their details, while in others they are identical. However, even the pairs with identical objects show some differences in illumination or photo quality (due to the unstable camera state), though the structure and details of the object are the same.
I need to distinguish the pairs with identical objects from those with changed objects, without being affected by lighting conditions or camera quality. How could I do this?
========
Edit:
Here is a pair that has identical object:
And here is a pair that has object with different detail:
Even the first pair has differences in lighting conditions or other non-content differences, but these should not affect my results. How could I do this, please?
You can use the global Lucas-Kanade algorithm (see the paper "Lucas-Kanade 20 Years On: A Unifying Framework") for matching images without features. Richard Szeliski calls it parametric (global) motion.
It returns a transform matrix A: shift, scale, affine or homography. Some values in this matrix indicate that the pictures are not identical:
Scale: A[0][0] != 1 or A[1][1] != 1
Shift horizontal and vertical: A[0][2] != 0 and A[1][2] != 0
Rotation: A[0][1] != 0 and A[1][0] != 0
There are several implementations of the algorithm available.
Addition: opencv_contrib has a reg module with the same functions.
Normalize all images to the same average brightness.
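A minimal sketch of the brightness normalization idea on grayscale images stored as lists of rows. It assumes the two photos are already aligned and the same size, uses a simple additive shift to equalize the mean, and the threshold of 10 grey levels is an arbitrary, hypothetical value that would need tuning on real data.

```python
def mean_brightness(image):
    """Average pixel value of a grayscale image given as a list of rows."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

def normalize_brightness(image, target_mean=128.0):
    """Shift every pixel so the image average equals target_mean."""
    shift = target_mean - mean_brightness(image)
    return [[p + shift for p in row] for row in image]

def mean_abs_difference(a, b):
    """Mean absolute per-pixel difference of two same-sized images."""
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def same_content(a, b, threshold=10.0):
    """Compare two images after removing their global brightness offset."""
    return mean_abs_difference(normalize_brightness(a),
                               normalize_brightness(b)) < threshold

dark = [[10, 20], [30, 40]]
bright = [[60, 70], [80, 90]]      # same structure, 50 levels brighter
changed = [[10, 20], [30, 200]]    # one pixel differs in content
```

A global offset like this only compensates for uniform lighting changes; uneven lighting or blur would need something stronger, such as comparing local contrast or edges.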

Pixel movement C++

This may or may not be a very stupid question so I do apologise, but I haven't come across this in any books or tutorials as yet. Also I guess it can apply to any language...
Assume you create a window of size: 640x480 and an object/shape inside it of size 32x32 and you're able to move the shape around the window with keyboard inputs.
Does it matter what type (int, float...) you use to control the movement of the shape? Obviously you cannot draw halfway through a pixel, but if you move the shape by 0.1f (for example with a glTranslate function), what happens, as opposed to moving it by an int of 1? Does it move the rendered shape by 1/10 of a pixel, or something else?
I hope I've explained that well enough not to be laughed at.
I only ask this because it can affect the precision of collision detection and other functions of a program or potential game.
glTranslate produces a translation by x y z . The current matrix (glMatrixMode) is multiplied by this translation matrix, with the product replacing the current matrix, as if glMultMatrix were called with the following matrix for its argument:
1 0 0 x
0 1 0 y
0 0 1 z
0 0 0 1
If the matrix mode is either GL_MODELVIEW or GL_PROJECTION, all objects drawn after a call to glTranslate are translated.
Use glPushMatrix and glPopMatrix to save and restore the untranslated coordinate system.
This means that glTranslate gives you a translation, to be combined with the current matrix, resulting in non-decimal numbers. You cannot use half a pixel. glTranslate takes either doubles or floats, so if you need to move by 1 in x, y or z, just pass the function 1 as a float or a double.
http://www.opengl.org/sdk/docs/man2/xhtml/glTranslate.xml
The most important reason for using floats or doubles to represent position is the background calculation. If you keep calculating your position with ints, not only will you probably need conversion steps to get back to ints, you will also lose data every so many steps.
If you want to animate your sprite with anything less than 1 pixel of movement per update, then YES, you need to use floating point; otherwise you will get no movement. Your drawing function would most likely round to the nearest integer, so it's probably not relevant for that. However, you can of course draw to sub-pixel accuracy!
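The "no movement" effect is easy to reproduce. The sketch below is in Python rather than C++ for brevity and uses a hypothetical `simulate` helper: one position is kept as a float and rounded only when drawing, the other is truncated back to an int after every update.

```python
def simulate(steps, speed):
    """Move by `speed` pixels per update, tracking the position two ways."""
    float_x = 0.0  # position kept in floating point
    int_x = 0      # position forced back to an integer every update
    for _ in range(steps):
        float_x += speed
        int_x = int(int_x + speed)  # the fractional part is lost each time
    # Round only at draw time, as a renderer snapping to whole pixels would.
    return round(float_x), int_x

drawn_from_float, drawn_from_int = simulate(100, 0.1)
```

After 100 updates at 0.1 pixels per update the float position draws 10 pixels along, while the integer position has never left 0.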

An algorithm to generate a game map from individual images

I am designing a game to be played in the browser.
The game has a space theme and I need to generate a map of the "Galaxy".
The basic idea of the map is here:
game map http://www.oglehq.com/map.png
The map is a grid, where each grid sector can contain a planet/system, and each of these has links to a number of adjacent sectors.
To generate the maps I figured that I would have a collection of images representing the grid elements. So in the case of the sample above, each of the squares is a separate graphic.
To create a new map I would "weave" the images together.
The map element images would have the planets and their links already on them, and I, therefore, need to stitch the map together in such a way that each image is positioned with its appropriate counterparts => so the image in the bottom corner must have images to the left and diagonal left that link up with it correctly.
How would you go about creating the code to know where to place the images?
Is there a better way than using images?
At the moment performance and/or load should not be a consideration (if I need to generate maps to have preconfigured rather than do it in real-time, I don't mind).
If it makes a difference I will be using HTML, CSS, and JavaScript and backed by a Ruby on Rails app.
There are two very nice browser-based vector / javascript-manipulable graphics packages which, together, are virtually universal: SVG and VML. They generally produce high-quality vector-based images with low bandwidth.
SVG is supported by Firefox, Opera, Safari, and Chrome - technically only part of the specification is supported, but for practical purposes you should be able to do what you need. w3schools has a good reference for learning/using SVG.
VML is Microsoft's answer to SVG, and (surprise) is natively supported by IE, although SVG is not. MSDN has the best reference for VML.
Although it's more work, you could write two similar/somewhat integrated code bases for these two technologies. The real benefit is that users won't have to install anything to play your game online - it'll just work, for 99.9% of all users.
By the way, you say that you're asking for an algorithm, and I'm offering technologies (if that's the right term for SVG/VML). If you could clarify the input/output specification and perhaps what part presents the challenge (e.g. which naive implementation won't work, and why), that would clarify the question and maybe provide more focused answers.
Addendum The canvas tag is becoming more widely supported, with the notable exception of IE. This might be a cleaner way to embed graphic elements in html.
Useful canvas stuff: Opera's canvas tutorial | Mozilla's canvas tutorial | canvas-in-IE partial implementation
Hmm. If each box can only link to its 8 neighbours, then you only have 2^8 = 256 tile types. Fewer if you limit the number of possible links from any one tile.
You can encode which links are present in an image with an 8 char filename:
11000010.jpg
Or save some bytes and convert that to decimal or hex
194.jpg
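As a quick check of the naming scheme, here is a small Python sketch. The function name `tile_filename` and its `style` parameter are hypothetical. The pattern 11000010 is 194 in decimal and c2 in hex.

```python
def tile_filename(links, style="binary", extension="jpg"):
    """Build a tile filename from eight link flags.

    links is a sequence of eight truthy/falsy values, one per neighbour.
    style selects how the bit pattern is written: binary, decimal or hex.
    """
    bits = "".join("1" if link else "0" for link in links)
    if style == "binary":
        name = bits
    elif style == "decimal":
        name = str(int(bits, 2))
    elif style == "hex":
        name = format(int(bits, 2), "02x")
    else:
        raise ValueError("unknown style: " + style)
    return name + "." + extension

links = [1, 1, 0, 0, 0, 0, 1, 0]
```

Whichever style is chosen, the same function has to be used when naming the image files and when choosing which one to display.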
Then the code. There's lots of ways you could choose to represent the map internally. One way is to have an object for each planet. A planet object knows its own position in the grid, and the positions of its linked planets. Hence it has enough information to choose the appropriate file.
Or have a 2D array. To work out which image to show for each array item, look at the 8 neighbouring array items. If you do this, you can avoid coding for boundaries by making the array two bigger in both axes, and having an empty 'border' around the edges. This saves you checking whether a neighbouring array item is off the array.
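The border trick can be sketched like this in Python. It assumes a grid of booleans (True where a planet exists), treats every adjacent planet as linked, and orders the bits NW, N, NE, W, E, SW, S, SE; that order and the function names are assumptions made for the example.

```python
# Row/column offsets of the eight neighbours, in the assumed bit order.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def pad(grid):
    """Surround the grid with an empty border, one cell wide."""
    width = len(grid[0])
    empty_row = [False] * (width + 2)
    return ([empty_row[:]]
            + [[False] + list(row) + [False] for row in grid]
            + [empty_row[:]])

def neighbour_bits(grid):
    """For each planet return its 8-character link pattern, else None."""
    padded = pad(grid)
    result = []
    for r in range(1, len(padded) - 1):
        row_bits = []
        for c in range(1, len(padded[0]) - 1):
            if not padded[r][c]:
                row_bits.append(None)  # empty square, no tile needed
                continue
            # No bounds checks needed: the border guarantees valid indices.
            row_bits.append("".join(
                "1" if padded[r + dr][c + dc] else "0"
                for dr, dc in DIRECTIONS))
        result.append(row_bits)
    return result

example = [[True, False],
           [False, True]]
```

Here the top-left planet only sees the planet to its south-east, and the bottom-right one only sees its north-west neighbour.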
There are two ways to represent your map.
One way is to represent it as a grid of squares, where each square can have a planet/system in it or not. You can then specify that if there is a neighbor one square away in any of the eight directions (NW, N, NE, W, E, SW, S, SE) then there is a connection to that neighbor. Note however that in your sample map the center system is not connected to the system north-east of it, so perhaps this is not the representation you want. But it can be used to build the other representation.
The second way is to represent each square as having eight bits, defining whether or not there is a connection to a neighbor along each of the same eight directions. Presumably if there is even one connection, then the square has a system inside it, otherwise if there are no connections it is blank.
So in your example 3x3 grid, the data would be:
Tile     Connections
         nw  n   ne  w   e   sw  s   se
nw       0   0   0   0   0   0   0   0
n        0   0   0   0   1   0   1   0
ne       0   0   0   1   0   0   0   0
w        0   0   0   0   0   0   0   0
center   0   1   0   0   0   0   1   1
e        0   0   0   0   0   0   0   0
sw       0   0   0   0   0   0   0   0
s        0   1   0   0   1   0   0   0
se       1   0   0   1   0   0   0   0
You could represent these connections as an array of eight boolean values, or much more compactly as an eight bit integer.
It's then easy to use the eight boolean values (or the eight bit integer) to form the filename of the bitmap to load for that grid square. For example, your center tile using this scheme could be called "Bitmap01000011.png" (just using the boolean values), or alternatively "Bitmap43.png" (using the hexadecimal value of the eight bit integer representing that binary pattern for a shorter filename).
Since you have 256 possible combinations, you will need 256 bitmaps.
You could also reduce the data to four booleans/bits per tile, since a "north" connection, for instance, implies that the tile to the north has a "south" connection. That makes selecting the bitmaps a bit harder, but you can work it out if you want.
Alternatively you could layer between zero (empty) and nine (fully connected + system circle) bitmaps together in each square. You would just need to use transparent .png's so that you could combine them together. The downside is that the browser might be slow to draw each square (especially the fully connected ones). The advantage would be less data for you to create, and less data to load from your website.
You would represent the map itself as a table, and add your bitmaps as image links to each cell as needed.
The pseudo-code to draw the map would be:
draw_map(connection_map):
    for each grid_square in connection_map:
        connection_data = connection_map[grid_square]
        filenames = bitmap_filenames_from(connection_data)
        insert_image_references_into_table(grid_square, filenames)
# For each square having one of 256 bitmaps:
bitmap_filenames_from(connection_data):
    filename = "Bitmap"
    for each bit in connection_data:
        filename += bit ? "1" : "0"
    return [filename]
# For each square having zero through nine bitmaps:
bitmap_filenames_from(connection_data):
    # Special case - square is empty
    if 1 not in connection_data:
        return []
    filenames = []
    for i in 0..7:
        if connection_data[i]:
            filenames.append("Bitmap" + i)
    filenames.append("BitmapSystem")
    return filenames
I would recommend using a graphics library to draw the map. If you do you won't have the above problem and you will end up with much cleaner/simpler code. Some options are SVG, Canvas, and flash/flex.
Personally I would just render the links in game, and have the cell graphics only provide a background. This gives you more flexibility, allows you to more easily increase the number of ways cells can link to each other, and generally more scalable.
Otherwise you will need to account for every possible way a cell might be linked, and this is rather a lot even if you take into account rotational and mirror symmetries.
Oh, and you could also just have a small number of tile png files with transparency on them, and overlap these using css-positioned div's to form a picture similar to your example, if that suffices.
Last time I checked, older versions of IE did not have great support for transparency in image files, though. Can anyone edit this to provide better info on transparency support?
As long as links have a maximum length that's not too long, you don't have too many different possible images for each cell. You need to come up with an ordering on the kinds of image cells. For example, an integer where each bit indicates the presence or absence of an image component.
Bit 0 : Has planet
Bit 1 : Has line from planet going north
Bit 2 : Has line from planet going northwest
...
Bit 8 : Has line from planet going northeast
Ok, now create 512 images. Many languages have libraries that let you edit and write images to disk. If you like Ruby, try this: http://raa.ruby-lang.org/project/ruby-gd
I don't know how you plan to store your data structure describing the graph of planets and links. An adjacency matrix might make it easy to generate the map, although it's not the smallest representation by far. Then it's pretty straightforward to spit out html like (for a 2x2 grid):
<table border="0" cellspacing="0" cellpadding="0">
<tr>
<td><img src="cell_X.gif"></td>
<td><img src="cell_X.gif"></td>
</tr>
<tr>
<td><img src="cell_X.gif"></td>
<td><img src="cell_X.gif"></td>
</tr>
</table>
Of course, replace each X with the appropriate number corresponding to the combination of bits describing the appearance of the cell. If you're using an adjacency matrix, putting the bits together is pretty simple--just look at the cells around the "current" cell.
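Generating that markup from a grid of cell numbers could look like the Python sketch below. The function name `map_to_html` is hypothetical, and the `cell_N.gif` naming simply follows the snippet above.

```python
def map_to_html(cell_numbers):
    """Render a grid of cell numbers as an HTML table of tile images.

    cell_numbers is a list of rows; each entry is the integer whose bits
    describe the appearance of that cell.
    """
    lines = ['<table border="0" cellspacing="0" cellpadding="0">']
    for row in cell_numbers:
        lines.append("<tr>")
        for number in row:
            lines.append('<td><img src="cell_%d.gif"></td>' % number)
        lines.append("</tr>")
    lines.append("</table>")
    return "\n".join(lines)

html = map_to_html([[0, 3],
                    [257, 1]])
```

The output has one table row per grid row, so a 2x2 grid yields four image cells in total.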
