I modified some LabVIEW code I found online to use in my program. It works, and I understand nearly all of it, but there's one section that confuses me. This is the program:
This program takes 2 images, subtracts them, and returns the picture plus a percentage difference. What I understand is it takes the pictures, subtracts them, converts the subtracted image into an array of colored pixels, then math happens, and the pixels are compared to the threshold. It adds a 1 for every pixel greater than the threshold, divides the count by the image size, and out comes a percentage. The part I don't understand is the math part: the whole quotient-and-remainder section with a "random" 256. Because I don't understand where these numbers come from, I have a percentage, but I don't understand what it means. Here's a picture of the front panel with 2 different tests.
In the top one, I have a percentage of 15, and in the bottom one a percentage of 96. This tells me that the bottom one is "96 percent different". But is there any way to make sure this is accurate?
The other question I have is about the threshold, as I don't know exactly what that does either. For example, if I change the threshold on the bottom image to 30, my percentage drops to 8%, with the same picture.
I'm sure once I understand the quotient/remainder part, it'll all make sense, but I can't seem to get it. Thank you for your help.
My best guess is that someone tried to characterize the difference between 2 images with a single number. The remainder/quotient part is a "poor man's" approach to splitting each 2D array element of the difference into its 2 lower bytes (the 2 remainders) and the upper 2-byte word. The 2 lower bytes of the difference are then summed, and the result is added to the upper 2 bytes (as a word). Maybe the 3 different bytes each represented a different channel of the camera (e.g. the RGB colors)?
Then the value is compared against the threshold, and the number of pixels above the threshold is counted. This number is divided by the total number of pixels to calculate the % difference. So the result is the % of pixels which differ from the master image by more than the threshold.
E.g. if a certain pixel of your image was 0x00112233 and the corresponding master image pixel had a value of 0x00011122, then the number compared to the threshold is (0x11 - 0x01) + (0x22 - 0x11) + (0x33 - 0x22) = 0x10 + 0x11 + 0x11 = 0x32 = 50 decimal.
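The guessed quotient/remainder logic can be sketched like this (assumption: pixels are 32-bit integers packed as 0x00RRGGBB, as guessed above; the function names are illustrative, not from the LabVIEW diagram):

```python
def split_channels(value):
    """Split a packed 0x00RRGGBB integer into (r, g, b) using
    repeated quotient/remainder by 256."""
    value, b = divmod(value, 256)  # remainder = lowest byte
    r, g = divmod(value, 256)      # next remainder and the upper word
    return r, g, b

def pixel_score(pixel, master):
    """Sum of per-channel differences; this is the number that gets
    compared against the threshold."""
    return sum(p - m for p, m in
               zip(split_channels(pixel), split_channels(master)))

# Reproduces the worked example: 0x10 + 0x11 + 0x11 = 50
print(pixel_score(0x00112233, 0x00011122))  # → 50
```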
Whether this is the best possible comparison/difference criterion is a question well outside of this topic.
I have a lot (millions) of polygons from OpenStreetMap data, mostly (more than 99%) with exactly four coordinates, representing houses.
Example
I currently save the four coordinates of each house explicitly as tuples of floats (latitude and longitude), hence taking 32 bytes of memory.
Is there a way to store this information in a compressed way (fewer than 32 bytes), since the four coordinates of a house only differ in the last few decimals?
If your map patch is not too large, you can store coordinates relative to some base point (for example, the bottom-left corner). Take these differences and normalize them by the map size like this:

uint16_diff = (uint16)(65535 * (lat - latbottom) / (lattop - latbottom))

This approach lets you store each coordinate as a 16-bit integer value.
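A minimal sketch of that quantization and its inverse, assuming all houses fall inside a known bounding box (the names `lat_min`/`lat_max` are illustrative):

```python
def quantize(lat, lat_min, lat_max):
    """Map a latitude inside [lat_min, lat_max] to a 16-bit integer."""
    return round(65535 * (lat - lat_min) / (lat_max - lat_min))

def dequantize(q, lat_min, lat_max):
    """Recover an approximate latitude from the 16-bit value."""
    return lat_min + q * (lat_max - lat_min) / 65535
```

The worst-case error is half a quantization step, i.e. (lat_max - lat_min) / 65535 / 2, so the patch size directly determines the precision you keep.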
For rectangles (you can store them in a separate list) there is a way to store five 16-bit values instead of eight: the coordinates of the top-left corner, the width, the height, and the angle of rotation (other sets of data are possible too, for example including the second corner).
Combining both of these methods, one might get a data size reduction of up to 3.2 times (32 bytes down to 10).
As @MBo said, you can store one corner of each house and compress the other three corners as offsets relative to the first corner.
Also, if many buildings are similar, you can build a "dictionary" of building shapes. For each building you then store only its index in the dictionary plus a few features, like its first corner's coordinates and its rotation.
You are giving no information on the resolution you want to keep.
Assuming 1 m accuracy is enough, 24 bits can cover up to 16000 km. Then 8 bits should also be enough to represent the size information (up to 256 m).
This would make 8 bytes per house.
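The bit budget behind the 8-bytes-per-house figure, under the stated assumption of 1 m resolution over a roughly 16000 km extent:

```python
# 2**24 metres ≈ 16777 km of range per axis at 1 m resolution
lat_bits, lon_bits = 24, 24
# 8 bits per side length covers buildings up to ~256 m
width_bits, height_bits = 8, 8

total_bits = lat_bits + lon_bits + width_bits + height_bits
print(total_bits // 8)  # → 8 bytes per house
```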
More aggressive compression, for instance with Huffman coding, will probably not work on the locations (relatively uniform distribution); it may do a little better on the sizes, but the benefit is marginal.
I'm building a photographic film scanner. The electronic hardware is done; now I have to finish the mechanical advance mechanism, and then I'm almost done.
I'm using a line-scan sensor, so it's one pixel wide by 2000 pixels high. The data stream I will be sending to the PC over USB, via an FTDI FIFO bridge, will be just 1-byte values for the pixels. The scanner will pull through an entire strip of 36 frames, so I will end up scanning the whole strip. For now I'm willing to split the frames up manually in Photoshop, but I would like my program to do this for me. I'm using C++ in VS. So, basically, I need a way for the PC to detect the near-black strips between the images on the film, isolate the images, and save them as individual files.
Could someone give me some advice for this?
That sounds pretty simple compared to the things you've already implemented; you could
calculate an average pixel value per row, and call the resulting signal s(n) (n being the row number).
set a threshold for s(n), setting everything below that threshold to 0 and everything above to 1
Assuming you don't know the exact pixel height of the black bars and the negatives, search for periodicities in s(n). What I describe in the following is total overkill, but that's how I roll:
use FFTW to calculate a discrete Fourier transform of s(n); call it S(f) (f being the frequency, i.e. 1/period).
find argmax(abs(S(f))); the f at that peak corresponds to the bar repetition rate: number of rows / f is the distance between two black bars.
S(f) is complex and thus has a phase; arctan(imag(S(f_max))/real(S(f_max))) gives that phase, and phase/(2π) times the bar distance gives you the offset of the bars along the strip.
To calculate the width of the bars, you could do the same with the second-highest peak of abs(S(f)), but it'll probably be easier to just count the average run length of 0s around the calculated center positions of the black bars.
To get the exact width of the image strip, only take the pixels in which the image border may lie: r_left(x) would be the signal representing the few pixels in which the actual image might border the filmstrip material (x being the coordinate along that row). Now, use a simplistic high-pass filter (e.g. f(x) := r_left(x) - r_left(x-1)) to find the sharpest edge in that region (argmax(abs(f(x)))). Use the average of these edges as the border location.
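That simplistic high-pass edge finder is a one-liner with NumPy (assumption: `r_left` is a 1-D array of pixel values across the candidate border region):

```python
import numpy as np

def sharpest_edge(r_left):
    """Index of the sharpest brightness step, f(x) = r(x) - r(x-1)."""
    d = np.diff(r_left)                   # first difference = crude high-pass
    return int(np.argmax(np.abs(d))) + 1  # +1: diff shortens the array by one

# A soft gradient with one hard jump at index 3:
print(sharpest_edge(np.array([10, 11, 12, 200, 201, 202])))  # → 3
```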
By the way, if you want to write a source block that takes your scanned image as input and outputs a stream of pixel row vectors, using GNU Radio would offer you a nice method of having a flow graph of connected signal processing blocks that does exactly what you want, without you having to care about getting data from A to B.
I forgot to add: use the resulting coordinates with something like OpenCV, or any other library capable of reading images, extracting sub-images by coordinates, and saving them as new images.
I wanted to use the CreateBitmapFromMemory method, and it requires the stride as an input, and this stride confused me.
cbStride [in]
Type: UINT
The number of bytes between successive scanlines in pbBuffer.
and here it says: stride = image width + padding
Why do we need this extra space (padding)? Why not just the image width?
This is how you calculate the stride, right?
lWidthByte = (lWidth * bits + 7) / 8;
lWidth→pixel count
bits→bits per pixel
I suppose dividing by 8 is to convert to bytes, but
what is the +7 doing here?
and finally
cbStride =((lWidthByte + 3) / 4) * 4;
What's going on here? (Why not cbStride = lWidthByte?)
Please help me to clear these up.
The use of padding is due to various (old and current) memory layout optimizations.
Having image pixel rows with a length (in bytes) that is an integral multiple of 4/8/16 bytes can significantly simplify and optimize many image-based operations. The reason is that these sizes allow proper storage and parallel pixel processing in the CPU registers, e.g. with SSE/MMX, without mixing pixels from two consecutive rows.
Without padding, extra code would have to be inserted to handle partial WORD/DWORD pixel data, since two consecutive pixels in memory might straddle rows: one at the right end of one row and the next at the left of the following row.
If your image is a single-channel image with 8-bit depth, i.e. grayscale in the range [0,255], then the stride would be the image width rounded up to the nearest multiple of 4 or 8 bytes. Note that the stride is always specified in bytes, even when a pixel may be more than one byte deep.
For images with more channels and/or more than one byte per pixel/channel, the stride would be the image width in bytes rounded up to the nearest multiple of 4 or 8 bytes.
The +7 and +3 in the formulas you gave just make sure that the numbers are rounded up, since integer math truncates the fractional part of a division.
Just insert some numbers and see how it works. Don't forget to truncate (floor()) the intermediate division results.
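Here is the same arithmetic worked through in code (assumption: DWORD-aligned rows, i.e. the stride is rounded up to a multiple of 4 bytes, as in the question's formula):

```python
def stride(width_px, bits_per_pixel):
    """Bytes between successive scanlines, with DWORD (4-byte) alignment."""
    width_bytes = (width_px * bits_per_pixel + 7) // 8  # +7 rounds UP to whole bytes
    return ((width_bytes + 3) // 4) * 4                 # +3 rounds UP to a multiple of 4

# A 10-pixel-wide, 24 bpp row needs 30 bytes of pixel data, padded to 32:
print(stride(10, 24))  # → 32
```

Without the +7, a 1-pixel-wide 1 bpp row would compute to 0 bytes instead of 1; that's exactly the truncation the rounding terms guard against.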
I applied some operations to a grayscale image, and now I am getting new values, but the problem is that some of the intensity values are less than 0 and some are greater than 255. For values in [0,255] there is no problem, but intensity values < 0 or > 255 are a problem, as these values cannot occur in a grayscale image.
Therefore, I need to normalize the values so that every value, whether negative, greater than 255, or anything else, falls in the range 0 to 255, so that the image can be displayed.
For that I know two methods:
Method #1
newImg = ((255-0)/(max(img(:))-min(img(:))))*(img-min(img(:)))
where min(img(:)) and max(img(:)) are the minimum and maximum values obtained after doing some operations on the input image img. The min can be less than 0 and the max can be greater than 255.
Method #2
I just set all the values less than 0 to 0 and all the values greater than 255 to 255, so:
img(img < 0) = 0;
img(img > 255) = 255;
I tried both methods, but I am getting good results with the second method and not with the first one. Can anyone tell me what the problem is?
That totally depends on the image content itself. Both of those methods are valid to ensure that the range of values is between [0,255]. However, before you decide on what method you're using, you need to ask yourself the following questions:
Question #1 - What is my image?
The first question you need to ask is what does your image represent? If this is the output of an edge detector for example, the method you choose will depend on the dynamic range of the values seen in the result (more below in Question #2). For example, it's preferable that you use the second method if there is a good distribution of pixels and a low variance. However, if the dynamic range is a bit smaller, then you'll want to use the first method to push up the contrast of your result.
If the output is an image subtraction, then it's preferable to use the first method because you want to visualize the exact differences between pixels. Truncating the result will not give you a good visualization of the differences.
Question #2 - What's the dynamic range of the values?
Another thing you need to take note of is how wide the dynamic range of the minimum and maximum values is. For example, if the minimum and maximum are not that far from the limits of [0,255], then you can use the first or second method and you won't notice much of a difference. However, if your values are within a small range inside [0,255], then the first method will increase contrast whereas the second method won't do anything. If your goal is also to increase the contrast of your image and the intensities are within the valid [0,255] range, then you should use the first method.
However, if you have minimum and maximum values that are quite far away from the [0,255] range, like min=-50 and max=350, then doing the first method won't bode very well - especially if the grayscale intensities have huge variance. What I mean by huge variance is that you would have values that are in the high range, values in the low range and nothing else. If you rescaled using the first method, this would mean that the minimum gets pushed to 0, the maximum gets shrunk to 255 and the rest of the intensities get scaled in between so for those values that are lower, they get scaled so that they're visualized as gray.
Question #3 - Do I have a clean or noisy image?
This is something that not many people think about. Is your image very clean, or are there a couple of spurious noisy spots? The first method is very bad when it comes to noisy pixels. If you only had a couple of pixel values that have a very large value but the other pixels are within the range of [0,255], this would make all of the other pixels get rescaled accordingly and would thus decrease the contrast of your image. You probably want to ignore the contribution made by these pixels and so the second method is preferable.
Conclusion
Therefore, there is nothing wrong with either of the methods you have described. You need to be cognizant of what the image is, the dynamic range of the values you see in the output, and whether the image is clean or noisy. You simply have to make a smart choice keeping those factors in mind. So in your case, the first method probably didn't work because you have very large negative values and very large positive values, and perhaps only a few of them. Truncation is probably better for your application.
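For reference, both methods are one-liners with NumPy (the question uses MATLAB; this is an equivalent sketch, assuming `img` is a float array that may hold values outside [0, 255]):

```python
import numpy as np

def rescale(img):
    """Method #1: linearly map [min, max] onto [0, 255]."""
    lo, hi = img.min(), img.max()
    return (255.0 * (img - lo) / (hi - lo)).astype(np.uint8)

def clip(img):
    """Method #2: truncate out-of-range values to 0 and 255."""
    return np.clip(img, 0, 255).astype(np.uint8)

img = np.array([-50.0, 0.0, 350.0])
print(rescale(img))  # the in-range 0.0 becomes mid-gray-ish 31
print(clip(img))     # the in-range 0.0 stays 0
```

Note how one out-of-range outlier drags every other pixel's rescaled value with it, which is exactly the noise sensitivity described in Question #3.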
At the moment I am working on an on-screen display project with black, white and transparent pixels. (This is an open source project: http://code.google.com/p/super-osd; that shows the 256x192 pixel set/clear OSD in development, but I'm migrating to a white/black/clear OSD.)
Since each pixel is black, white or transparent, I can use a simple 2-bit/4-state encoding where I store the black/white selection and the transparency selection. So I would have a truth table like this (x = don't care):
B/W T
x 0 pixel is transparent
0 1 pixel is black
1 1 pixel is white
However, as can clearly be seen, this wastes one bit whenever the pixel is transparent. I'm designing for a memory-constrained microcontroller, so whenever I can save memory, that's good.
So I'm trying to think of a way to pack these 3 states into some larger unit (say, a byte). I am open to using lookup tables to decode and encode the data, so a complex algorithm can be used, but it cannot depend on the states of the pixels before or after the current unit/byte (this rules out any proper data compression algorithm), and the size must be consistent; that is, a scene of all transparent pixels must take the same space as a scene of random noise. I was imagining something on the level of densely packed decimal, which packs 3 x 4-bit (0-9) BCD digits into only 10 bits, with something like 24 states remaining out of the 1024, which is great. So does anyone have any ideas?
Any suggestions? Thanks!
In a byte (256 possible values) you can store five of your three-state values. One way to look at it: three to the fifth power is 243, slightly less than 256. The fact that it's only slightly less also shows that you're wasting hardly any fraction of a bit.
For encoding five of your three-state "digits" into a byte, think of the five "digits" in succession as a number in base 3 -- the resulting value is guaranteed to be less than 243 and therefore fits directly in a byte. Similarly, for decoding, convert the byte's value back out of base 3.
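A sketch of that base-3 packing, with 0 = transparent, 1 = black, 2 = white (the value assignment is an illustrative assumption; on the microcontroller you would likely precompute these as 243-entry lookup tables, which fits the question's openness to table-driven decoding):

```python
def pack5(pixels):
    """Five values in {0, 1, 2} -> one byte (0..242), via base 3."""
    value = 0
    for p in pixels:
        value = value * 3 + p
    return value

def unpack5(value):
    """One byte (0..242) -> the five original three-state values."""
    pixels = []
    for _ in range(5):
        value, p = divmod(value, 3)
        pixels.append(p)
    return pixels[::-1]  # divmod yields digits least-significant first

print(pack5([2, 2, 2, 2, 2]))  # → 242, the largest code used
```

That's 8 bits for 5 pixels (1.6 bits/pixel) versus 10 bits with the 2-bit encoding, a 20% memory saving at a fixed, position-independent size.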