How to keep specific pixels untouched while the rest are trained and generated in GANs?

I want two pixels to stay fixed and unchanged in the fake images produced by the generator.
I have a feeling that masking those specific pixels is needed, but I don't know exactly how to do that in the GAN training procedure. Even if I mask the pixels before the training loop starts, I do not know how to keep the fixed pixels from being used when generating the scene. I wish someone could code up a bit to show how it should be done.
Or do I just need to train the GAN and then replace those two generated pixel locations with the original pixel values I want? But that must distort the result, because the two pixels were generated as well and will have influenced the other pixels, which shouldn't be affected.
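One common way to handle this (a minimal PyTorch sketch, not from the question itself; the generator netG, the mask, and the pixel values are hypothetical placeholders) is to overwrite the constrained pixels in the generator's output before it is passed to the discriminator, so the loss and gradients only act on the unconstrained pixels:

import torch

def apply_pixel_constraint(fake, mask, values):
    # fake:   (N, C, H, W) generator output
    # mask:   (1, 1, H, W) float tensor, 1 at the two fixed pixels, 0 elsewhere
    # values: (1, C, H, W) tensor holding the desired values at those pixels
    # The masked locations are cut out of the generator's output and replaced,
    # so autograd sends no gradient to the generator at the fixed pixels.
    return fake * (1 - mask) + values * mask

# inside the training loop:
# fake = apply_pixel_constraint(netG(noise), mask, values)
# ...feed `fake` to the discriminator as usual...

This way the discriminator always sees images in which the two pixels already hold the required values, and the generator is trained only on everything else.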

Related

Any way to algorithmically remove discolorations from aerial imagery

I don't know much about image processing so please bear with me if this is not possible to implement.
I have several sets of aerial images of the same area originating from different sources. The pictures have been taken during different seasons, under different lighting conditions, etc. Unfortunately some images look patchy, suffer from discolorations, are partially obstructed by clouds, or are pixelated, as for example picture1 and picture2.
I would like to take as input several images of the same area and (by some kind of averaging) produce one picture of improved quality. I know some C/C++, so I could use an image processing library.
Can anybody propose an image processing algorithm to achieve this, or point to any research done in this field?
I would try a "color twist" transform, i.e. a 3x3 matrix applied to the RGB components. To implement it, you need to pick color samples in areas that are split by a border, on both sides. You should find three significantly different reference colors (hence six samples). This will allow you to write the nine linear equations that determine the matrix coefficients.
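For concreteness, a small sketch of solving for that matrix (Python/NumPy for illustration; the sample values below are made up):

import numpy as np

# Three RGB samples from the altered side of a border...
altered = np.array([[120,  95,  70],
                    [ 60, 110,  50],
                    [200, 180, 150]], dtype=float)
# ...and the matching three samples from the unaltered side.
reference = np.array([[140, 110,  80],
                      [ 75, 130,  55],
                      [215, 200, 160]], dtype=float)

# Nine equations, nine unknowns: solve altered @ M = reference.
M, *_ = np.linalg.lstsq(altered, reference, rcond=None)

def color_twist(region):
    # Apply the twist to an (H, W, 3) float image region.
    return np.clip(region @ M, 0, 255)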
Then you will correct the altered areas by means of this color twist. As the geometry of these areas is intertwined with the field patches, I don't see a better way than contouring the regions by hand.
In the case of the second picture, the limits of the regions are blurred so that you will need to blur the region mask as well and perform blending.
In any case, don't expect a perfect repair of those problems, as the transform might be nonlinear and completely erasing the edges will be difficult. I also think that the colors are so washed out in places that restoring them might create ugly artifacts.
For the sake of illustration, a quick attempt with Photoshop using manual HLS adjustment (less powerful than a color twist).
The first thing I thought of was a kernel matrix of sorts.
Do a first pass of the photo and use an edge detection algorithm to determine the borders between the photos. This should be fairly trivial; however, you will need to eliminate any overlap/fading (it looks like there's a bit in picture 2) - you'll see why in a minute.
Do a second pass right along each border you've detected, and assume that the pixels on either side of the border should be the same color. Determine the difference between the red, green and blue values, average it along the entire length of the line, then divide it by two. The image with the lower red, green or blue value gets this amount added; the one with the higher value gets it subtracted.
On either side of this line, every pixel should now be exactly the same. You can remove one of these rows if you'd like, but if the lines don't run the full length of the image this could cause size issues, and the line will likely not be very noticeable.
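The core of that second pass might look like this (a rough NumPy sketch, assuming a perfectly vertical border at a known column b):

import numpy as np

def equalize_vertical_border(img, b):
    # img: (H, W, 3) array; columns [0, b) are one photo, [b, W) the other.
    img = img.astype(float)
    # Per-channel difference along the border, averaged over its length,
    # then halved so each side moves toward the middle.
    half_diff = (img[:, b - 1, :] - img[:, b, :]).mean(axis=0) / 2.0
    img[:, :b, :] -= half_diff   # the signs handle both brighter/darker cases
    img[:, b:, :] += half_diff
    return np.clip(img, 0, 255)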
This could be made far more complicated by generating a filter by passing along this line - I'll leave that to you.
The issue with this could be areas where there was development, fall colors, etc.; these might mess with your algorithm, but there's only one way to find out!

Reading voxel values from binary file into matlab

I have a 16-bit voxel data set from which I need to extract the integer value of each voxel. The data set can be downloaded from here; it is the 'Head Aneuyrism 16Bits' data set (you need to click on the blood-vessels image to download the 16-bit version). Its size is 512x512x512, but I don't know whether it is greyscale or color, nor whether that matters. Looking at the image on the website I'd guess that it is color, but I am not sure whether the image should be taken literally.
A related question on SO is the following: How can I read in a RAW image in MATLAB?
and the following on mathworks: http://www.mathworks.com/matlabcentral/answers/63311-how-to-read-an-n-dimensioned-matrix-from-a-binary-file
Thanks to the information in the answers to these questions, I managed to extract some information from the file with MATLAB as follows:
fileID = fopen('vertebra16.raw', 'r');    % open the raw volume for reading
A = fread(fileID, 512*512*512, 'int16');  % read all voxels as signed 16-bit
fclose(fileID);                           % close the file handle
B = reshape(A, [512 512 512]);            % restore the 512^3 volume
I don't need to visualise the image, I only need to have the integer values for each voxel, but I am not sure whether I am reading the information in the correct way with my script.
The only way I found to try and check whether I have the correct voxel values is to visualise B using the following:
implay(B)
Now, with the code above and then using implay(B), I get a black-and-white movie with a white disc in the center, a black background, and some black pixels moving in the disc (I tried to upload a frame of the movie, but it didn't work). Looking at the image on the website from which I downloaded the file, the movie frames I get seem quite different from that image, so I'd conclude that I do not have the correct voxel values.
Here are some questions related to my problem:
Do I need to know whether the image is in grey scale or color to read the voxel values correctly?
On the data set website it only says that the data set is in 16-bit format, so how do I know whether I am dealing with signed or unsigned integers?
In the SO question linked above they use 'uint8=>uint8'. I could not find this in the MATLAB manual, so I wonder whether 'uint8=>uint8' is obsolete MATLAB notation for 'uint8' or whether it does something different. I suspect that it does something different, since if I use 'int16=>int16' instead of 'int16' in my code above, I get a completely black movie with implay.
It looks like you read the data correctly.
The problem when displaying it is the scale of the values: implay seems to assume the values are in [0, 1] and therefore clamps all values to that range, whereas your data range is [0, 3000].
Simply doing
B = B / max(B(:));
will rescale your data to [0,1] and looking at the data again with
implay(B)
shows you something much more sensible.
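If it helps to sanity-check the raw values outside MATLAB, here is a rough NumPy equivalent (an assumption here: the file is plain little-endian signed 16-bit with no header):

import numpy as np

A = np.fromfile('vertebra16.raw', dtype='<i2')  # little-endian int16
B = A.reshape((512, 512, 512), order='F')       # match MATLAB's column-major reshape
print(B.min(), B.max())                         # expect roughly 0 and 3000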

Matlab - Registration and Cropping of aligned images from two different sources

Good day,
In MATLAB, I have multiple image-pairs of various samples. The images in a pair are taken by different cameras. The images are in differing orientations, though I have created transforms (for each image-pair) that can be applied to correct that. Their bounds contain the same physical area, but one image has smaller dimensions (i.e. 50x50 versus 250x250). Additionally, the smaller image is not in a consistent location within the larger image; however, it is always within the borders of the larger image.
What I'd like to do is as follows: after applying my pre-determined transform to the larger image, I want to crop the part of the larger image that covers the same area as the smaller image.
I know I can specify XData and YData when applying my transforms to output a subset of the transformed image, but I don't know how to relate that to the location of the smaller image. (Note: Transforms were created from control-point structures)
Please let me know if anything is unclear.
Any help is much appreciated.
Seeing how you are specifying control points to get the transformation from one image to another, I'm assuming this is a registration problem. As such, I'm also assuming you are using imtransform to warp one image to another.
imtransform allows you to specify two additional output parameters:
[out, xdata, ydata] = imtransform(in, tform);
Here, in would be the smaller image and tform would be the transformation you created to register the smaller image so that it warps into the larger image. You don't need to specify the XData and YData inputs here; those inputs bound where you want the transformation's output. Usually people specify the dimensions of the image to ensure that the output image is always contained within its borders. However, in your case I don't believe this is necessary.
The output variable out is the warped and transformed image that is dictated by your tform object. The other two output variables xdata and ydata are the minimum and maximum x and y values within your co-ordinate system that will encompass the transformed image fully. As such, you can use these variables to help you locate where exactly in the larger image the transformed smaller image appears. If you want to do a comparison, you can use these to crop out the larger image and see how well the transformation worked.
NB: Sometimes the limits of xdata and ydata will go beyond the dimensions of your image. However, because you said that the smaller image will always be contained within the larger image (I'm assuming fully contained), this shouldn't be a problem. The limits may also be floating point, so you'll need to be careful if you want to use these co-ordinates to crop a minimum spanning bounding box.
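As a rough illustration of that cropping step (a Python/NumPy sketch, assuming xdata/ydata are expressed in the larger image's 1-based pixel coordinates, which is why the indices are shifted):

import numpy as np

def crop_to_bounds(larger, xdata, ydata):
    # xdata = [xmin, xmax] (columns), ydata = [ymin, ymax] (rows), as
    # reported alongside the warped image. Round outward so the crop
    # fully contains the transformed smaller image.
    c0, c1 = int(np.floor(xdata[0])) - 1, int(np.ceil(xdata[1]))
    r0, r1 = int(np.floor(ydata[0])) - 1, int(np.ceil(ydata[1]))
    return larger[r0:r1, c0:c1]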

Invoice / OCR: Detect two important points in invoice image

I am currently working on OCR software and my idea is to use templates to try to recognize data inside invoices.
However, scanned invoices can have several 'flaws':
Not all invoices, based on a single template, are correctly aligned under the scanner.
People can write on invoices
etc.
Example of an invoice: (you'll have to google one; sadly I cannot add a more concrete version, as client data is obviously confidential)
I find my data in the invoices based on the x-values of the text.
However I need to know the scale of the invoice and the offset from left/right, before I can do any real calculations with all data that I have retrieved.
What have I tried so far?
1) Making the image monochrome and using the left and right bounds of the first appearance of a black pixel. This fails because people can write on invoices.
2) Dividing the invoice into vertical sections and using the sections with the highest number of black pixels. This fails because the distribution is not always uniform among similar templates.
I could really use your help on (1) how to identify important points in invoices and (2) what I should focus on as the important points.
I hope the question is clear enough as it is quite hard to explain.
Detecting rotation
I would suggest you start by detecting straight lines.
Look (perhaps randomly) for small areas with high contrast, i.e. mostly white but a fair amount of very black pixels as well. Then try to fit a line to these black pixels, e.g. using least squares method. Drop the outliers, and fit another line to the remaining points. Iterate this as required. Evaluate how good that fit is, i.e. how many of the pixels in the observed area are really close to the line, and how far that line extends beyond the observed area. Do this process for a number of regions, and you should get a weighted list of lines.
For each line, you can compute the direction of the line itself and the direction orthogonal to it. One of these numbers can be chosen from the interval [0°, 90°); the other will be 90° plus that value, so storing one is enough. Take all these directions and find the one angle which best matches all of them. You can do that using a sliding window of e.g. 5°: slide across that (cyclic) range, find the position where the maximal number of lines fall within the window, then compute the average or median of the angles within that window. All of this computation can be done taking the weights of the lines into account.
Once you have found the direction of lines, you can rotate your image so that the lines are perfectly aligned to the coordinate axes.
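As a practical shortcut for the angle-estimation step, a Hough transform gives you the line directions directly (a sketch using OpenCV instead of the manual least-squares fitting described above; the Canny and vote thresholds are arbitrary):

import cv2
import numpy as np

def dominant_angle(gray):
    # gray: 8-bit grayscale scan
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=200)
    if lines is None:
        return 0.0
    # Fold all angles into [0, 90) degrees so horizontal and vertical
    # lines vote for the same value.
    angles = np.mod(lines[:, 0, 1] * 180.0 / np.pi, 90.0)
    # Median as a simple stand-in for the sliding-window vote; beware of
    # wraparound if the true angle is near 0/90.
    return float(np.median(angles))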
Detecting translation
Assuming the image wasn't scaled at any point, you can then try an FFT-based correlation of the image to match it to the template. Convert both images to gray, and pad them with zeros until the originals take up at most 1/2 the edge length of the padded image, which preferably should be a power of two. FFT both images in both directions, multiply them element-wise, and iFFT back. The resulting image encodes how much the two images agree for a given shift relative to one another. Simply find the maximum, and you know how to make them match.
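A sketch of that correlation step (Python/NumPy for illustration; note that one spectrum is conjugated, which is the standard way to turn the element-wise product into a cross-correlation):

import numpy as np

def fft_shift_match(scan, template):
    # Both inputs are 2-D grayscale float arrays.
    # Pad to a power-of-two size at least twice the largest dimension.
    H = 1 << int(np.ceil(np.log2(2 * max(scan.shape[0], template.shape[0]))))
    W = 1 << int(np.ceil(np.log2(2 * max(scan.shape[1], template.shape[1]))))
    corr = np.real(np.fft.ifft2(np.fft.fft2(scan, (H, W)) *
                                np.conj(np.fft.fft2(template, (H, W)))))
    # The peak gives the shift that best aligns the template to the scan.
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the halfway point correspond to negative shifts.
    if dy > H // 2: dy -= H
    if dx > W // 2: dx -= W
    return dy, dx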
Added text will cause no problems at all. This method will work best for large areas, like the company logo and gray background boxes. Thin lines will provide a poorer match, so in those cases you might have to blur the picture before doing the correlation, to broaden the features. You don't have to use the blurred image for further processing; once you know the offset you can return to the rotated but unblurred version.
Now you know both rotation and translation, and assumed no scaling or shearing, so you know exactly which portion of the template corresponds to which portion of the scan. Proceed.
If rotation is solved already, I'd just sum up all pixel color values horizontally and vertically into a single horizontal/vertical "line". This should produce clear spikes where you have horizontal and vertical lines in the form.
p.s. I generated a corresponding horizontal image with Gimp's scaling capabilities, attached below (it's a bit hard to see because it's only one pixel high and may get scaled down because it's > 700 px wide; the url is http://i.stack.imgur.com/Zy8zO.png).

How do I locate black rectangles in a grid and extract the binary code from that

I'm working on a project to recognize a bit code from an image like this, where a black rectangle represents a 0 bit and white space (not visible) a 1 bit.
Does anybody have any idea how to process the image in order to extract this information? My project is written in Java, but any solution is accepted.
Thanks all for the support.
I'm not an expert in image processing. I tried to apply edge detection using a Canny edge detector (a free Java implementation can be found here). I used this complete image [http://img257.imageshack.us/img257/5323/colorimg.png], reduced it (scale factor = 0.4) for faster processing, and this is the result [http://img222.imageshack.us/img222/8255/colorimgout.png]. Now, how can I decode a white rectangle as a 0 bit value, and no rectangle as 1?
The image has 10 lines x 16 columns. I don't use Python, but I can try to convert a solution to Java.
Many thanks for the support.
This is good old OMR (optical mark recognition).
The solution varies depending on the quality and consistency of the data you get, so noise is important.
Using an image processing library will clearly help.
Simple case: No skew in the image and no stretch or shrinkage
Create a horizontal and a vertical profile of the image, i.e. sum up the values in all columns and all rows and store them in arrays. For an image of MxN (width x height) you will have M cells in the horizontal profile and N cells in the vertical profile.
Use thresholding to find out which cells are white (empty) and which are black. This assumes you will get at least a couple of entries in each row or column, so the black cells define the locations of interest (where you expect the marks).
Based on this, you can define the lozenges in the form: you get the coordinates of the lozenges (rectangles where you expect marks), then you just add up the pixel values in each lozenge and, based on that number, decide whether it holds a mark or not.
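A compact sketch of this simple case (Python/NumPy for illustration; the 0.8 threshold is an arbitrary placeholder):

import numpy as np

def profile_bands(img):
    # img: 2-D binarized array, 0 = black mark, 255 = white background.
    col_profile = img.mean(axis=0)   # one value per column (M entries)
    row_profile = img.mean(axis=1)   # one value per row (N entries)
    # Rows/columns whose mean falls well below white contain marks.
    dark_cols = np.where(col_profile < 0.8 * 255)[0]
    dark_rows = np.where(row_profile < 0.8 * 255)[0]
    # The intersections of dark row bands and dark column bands are the
    # lozenges; summing the pixels inside each one tells you if it is marked.
    return dark_rows, dark_cols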
Case 2: Skew (slant in the image)
Use a Fourier transform (FFT) to find the slant value and then correct it.
Case 3: Stretch or shrink
Pretty much the same as case 1, but the noise is higher and the reliability lower.
Aliostad has made some good comments.
This is OMR and you will find it much easier to get good consistent results with a good image processing library. www.leptonica.com is a free open source 'C' library that would be a very good place to start. It could process the skew and thresholding tasks for you. Thresholding to B/W would be a good start.
Another option would be IEvolution - http://www.hi-components.com/nievolution.asp for .NET.
To be successful you will need some type of reference / registration marks to allow for skew and stretch especially if you are using document scanning or capturing from a camera image.
I am not familiar with Java, but in Python you can use the Python Imaging Library (PIL) to open the image. Then get the height and the width, and segment the image into a grid accordingly, by height/rows and width/cols. Then just look for black pixels in those regions, or whatever color PIL registers that black to be. This obviously relies on the grid-like nature of the data.
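For example, a small sketch of that segmentation (assuming PIL/Pillow, the 10x16 grid from the question, and a hypothetical file name):

from PIL import Image

def read_bits(path, rows=10, cols=16, cutoff=128):
    img = Image.open(path).convert('L')   # load as grayscale
    w, h = img.size
    px = img.load()
    bits = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Sample the center of each grid cell: a dark pixel means a
            # black rectangle (0 bit), white space means a 1 bit.
            x = int((c + 0.5) * w / cols)
            y = int((r + 0.5) * h / rows)
            row.append(0 if px[x, y] < cutoff else 1)
        bits.append(row)
    return bits

# bits = read_bits('code.png')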
Edit:
Doing edge detection may also be fruitful. First apply an edge detection method, like something from Wikipedia; I have used the one found at archive.alwaysmovefast.com/basic-edge-detection-in-python.html. Then convert any grayscale value less than 180 into black (if you want the boxes darker, just increase this value) and make everything else completely white. Then create bounding boxes: lines where the pixels are all white. If the data isn't terribly skewed, this should work pretty well; otherwise you may need to do more work. See here for the results: http://imm.io/2BLd
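The thresholding step is nearly a one-liner with PIL (a sketch; 'grid.png' is a placeholder file name):

from PIL import Image

img = Image.open('grid.png').convert('L')
# Everything below the 180 cutoff mentioned above becomes black, the rest white.
bw = img.point(lambda v: 0 if v < 180 else 255)
bw.save('grid_bw.png')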
Edit2:
Denis, how large is your dataset, and how large are the images? If you have thousands of these images, then it is not feasible to remove the borders (the red background and yellow bars) manually. I think this is important to know before proceeding. Also, I think Prewitt edge detection may prove more useful in this case, since there appears to be less noise.
The previous segmentation method may still be applied if you preprocess the image into binary form in this manner; then you need only count the number of black or white pixels in each cell and pick a threshold after a few training samples.
