How to find images with a variable-sized grey rectangle (JPEG corruption) in them?

I had to recover a hard drive and a lot of photos on it came out corrupted. I'm talking about 200,000 photos. I already wrote a script that finds corrupted JPEGs, but some of these images are not corrupted at the file-format level, yet they look like the example I am showing. The grey part, I suspect, is data missing from the file. The size of the grey part varies, and sometimes it has an incomplete line in it.
So I'm thinking I could write or find a script that finds grey rectangles in these images.
How do I do this? Something that opens the image data and looks for this giant grey rectangle? I have no idea where to start. I can code in a bunch of languages.
Any help or examples would be much appreciated.

I was thinking that the grey rectangle is always the same colour, so I created a function that checks whether that grey is one of the top 10 most frequent colours.
If the colour had varied, I would have adjusted the code to instead check whether the top colour is at least 10x more frequent than the second most frequent colour.
Didn't have to learn feature detection this time. Shame. :(
from collections import Counter
from PIL import Image

def has_grey_rectangle(path):
    # Open the image file and convert it to RGB format (if it's not already)
    image = Image.open(path).convert('RGB')
    # Get a list of all the pixels in the image
    pixels = list(image.getdata())
    # Count the number of pixels with each RGB value
    counts = Counter(pixels)
    # Check whether the corruption grey is among the 10 most frequent colours
    most_common_colors = counts.most_common(10)
    return (128, 128, 128) in [color for color, count in most_common_colors]
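If the exact grey value had turned out to vary between files, the same idea could instead measure how dominant the top colour is. A rough sketch of that variant (the function name and the 10x ratio are just the adjustment described above, not code from the original post):

from collections import Counter
from PIL import Image

def has_dominant_color(path, ratio=10):
    # Hypothetical variant: flag an image whose most frequent colour is at least
    # `ratio` times more common than the second most frequent one.
    image = Image.open(path).convert('RGB')
    top_two = Counter(image.getdata()).most_common(2)
    if len(top_two) < 2:
        return True  # the image is a single flat colour
    (_, top_count), (_, second_count) = top_two
    return top_count >= ratio * second_count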

Related

How to whiten the white parts and blacken the black parts of a scanned image in MATLAB or Photoshop

I have a scanned image, scanned from a printed Word (docx) file. I want the scanned image to look like the original Word file, i.e. to remove noise and enhance it: to fully whiten the white parts and fully blacken the black parts without changing the colorful parts of the file.
There are a number of ways you could approach this. The simplest would be to apply a levels filter with the black point raised a bit and the white point lowered a bit. This can be done to all 3 color channels or selectively to a subset. Since you're going for creating pure black and white and there's no color cast on the image, I would apply the same settings to all 3 color channels. It works like this:
destVal = (srcVal - blackPt) / (whitePt - blackPt);
This will slightly change the colored parts of the image, probably resulting in making them slightly more or less saturated.
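A rough NumPy sketch of that levels adjustment (the clamping to [0, 1], the default white point, and the use of Pillow are my own assumptions, not part of the original answer):

import numpy as np
from PIL import Image

def apply_levels(path, black_pt=0.0, white_pt=0.66):
    # Per-channel levels filter: values at or below black_pt map to 0, values at
    # or above white_pt map to 1, and everything in between is stretched linearly.
    src = np.asarray(Image.open(path).convert('RGB'), dtype=np.float64) / 255.0
    dest = np.clip((src - black_pt) / (white_pt - black_pt), 0.0, 1.0)
    return Image.fromarray((dest * 255).astype(np.uint8))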
I tried this in a photo editing app and was disappointed with the results. I was able to remove most of the noise by bringing the white point down to about 66%. However, the logo in the upper left is so wispy that it also ended up turning it very white. The black point didn't really need to be moved.
I think you're going to have a tough time with that logo. You could isolate it from your other changes, though, and that might help. A simple circular area around it where you just ignore any processing would probably do the trick.
But I got to thinking - this was made with Word. Do you have a copy of Word? It probably wouldn't be too difficult to put together a layout that's nearly identical. It still wouldn't help with the logo. But what you could do is lay out the text the same and export it to a PDF or other image format. (Or if you can find the original template, just use it directly.) Then you could write some code to process your scanned copy and, wherever a pixel is grayscale (red = green = blue), use the corresponding pixel from the version you made; otherwise use the pixel from the scan. That would get you all the stamps and signatures, while keeping the text nice and sharp. Perhaps you could even find the organization's logo online. In fact, Wikipedia even has a copy of their logo.
You'd probably need to have some sort of threshold for the grayscale because some pixels might be close but have a slight color cast. One option might be something like this:
if ((fabs(red - green) < threshold) && (fabs(red - blue) < threshold))
{
    destVal = recreationVal; // The output is the same as the copy you made manually
}
else
{
    destVal = scannedVal; // The output is the same as the scan
}
You may find this eats away at some of the colored marks, so you could do a second pass over the output where any pixel that's adjacent to a colored pixel brings in the corresponding pixel from the original scan.
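A rough NumPy sketch of that per-pixel merge (the file names and the threshold value are placeholders I've assumed, and it leaves out the second neighbourhood pass described above):

import numpy as np
from PIL import Image

def merge_scan_with_recreation(scan_path, recreation_path, threshold=20):
    # Where a scanned pixel is (near) grayscale, take the pixel from the clean
    # recreation; where it has a colour cast (stamps, signatures), keep the scan.
    # Assumes both images have the same dimensions.
    scan = np.asarray(Image.open(scan_path).convert('RGB'), dtype=np.int16)
    recreation = np.asarray(Image.open(recreation_path).convert('RGB'), dtype=np.int16)
    r, g, b = scan[..., 0], scan[..., 1], scan[..., 2]
    grayscale = (np.abs(r - g) < threshold) & (np.abs(r - b) < threshold)
    out = np.where(grayscale[..., None], recreation, scan)
    return Image.fromarray(out.astype(np.uint8))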

Applying an image as a mask in MATLAB

I am a new user of image processing in MATLAB. My first aim is to implement the article and compare my results with the authors' results.
The article can be found here: http://arxiv.org/ftp/arxiv/papers/1306/1306.0139.pdf
First problem, image quality: in Figure 7, masks are defined, but I couldn't find the mask data set, so I used a screenshot and the image quality is low. In my view, this can affect the results. Are there any suggestions?
Second problem, merging images: I want to apply mask 1 to Lena, but I don't want to use Paint =) On the other hand, is it possible to merge the images while keeping Lena?
You need to create the mask array. The first step is probably to turn your captured image from Figure 7 into a black and white image:
Mask = im2bw(Figure7, 0.5);
Now the background (white) is all 1 and the black line (or text) is 0.
Let's make sure your image of Lena that you got from imread is actually grayscale:
LenaGray = rgb2gray(Lena);
Finally, apply your mask on Lena:
LenaAndMask = LenaGray.*Mask;
Of course, this last line won't work if Lena and Figure7 don't have the same size, but this should be an easy fix.
First of all, you have to know that this paper was published on arXiv. When a paper is only published on arXiv, it is always a good idea to find out more about the author and/or the university behind it.
Trust me on this: you do not need to waste your time on this paper.
I understand what you want, but it is not a good idea to get the mask from a print screen. The pixel values obtained from a print screen may not be the same as the original values, and the zoom may change the size, so you need to be sure the sizes are the same.
If you still want to try: take the print screen and paste the image,
crop the mask,
convert RGB to grayscale,
threshold the grayscale image to get the binary mask.
If you saved the image as JPEG, distortion around the high-frequency edges will change the edge shape.

Save image in original resolution with imfindcircle plot in Matlab

I have a really long picture on which I use imfindcircles. But I need to check whether the right ones are found. It is a 158708x2560 logical.
So I have:
[centers, radii] = imfindcircles(I,[15 35],'ObjectPolarity','bright','Sensitivity',0.91);
figure(1)
imshow(I)
viscircles(centers,radii);
and I want to save the output you see in the figure window (the binary image with circles drawn over it) to an image file. The file format doesn't matter as long as it keeps the original resolution of 158708x2560 pixels.
Every suggestion I find online alters the resolution or pads the image; for example, when saving the figure directly you get a huge grey border and the resolution goes down.
What would also work is a way to zoom into the figure, but the zoom option in the figure menu does not magnify correctly: it does magnify, but the image stays really thin so you can't see a thing.
Matrix: https://www.dropbox.com/s/rh9wakimc7atfhg/I.mat?dl=0
There are two round spots repeating. I want to find those, not the others. And export the image with the circles plotted over it.
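One way to sidestep the figure entirely is to draw the circles directly into a copy of the image matrix and write that out, which keeps the native resolution. This is a rough Python/OpenCV sketch rather than the MATLAB workflow above; the colour, line thickness, and output path are arbitrary choices:

import cv2
import numpy as np

def save_circles_full_resolution(binary_image, centers, radii, out_path='circles.png'):
    # Convert the logical matrix to an 8-bit BGR image and burn the detected
    # circles into it, so the saved file keeps the original pixel dimensions.
    out = cv2.cvtColor(binary_image.astype(np.uint8) * 255, cv2.COLOR_GRAY2BGR)
    for (x, y), r in zip(np.round(centers).astype(int), np.round(radii).astype(int)):
        cv2.circle(out, (int(x), int(y)), int(r), (0, 0, 255), thickness=4)
    cv2.imwrite(out_path, out)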

Lung segmentation in MATLAB

Trying to segment out the lung region, I am having a lot of trouble. The incoming image is like this (it is essentially a JPG conversion, and each pixel is 8 bits):
I = dicomread('000019.dcm');
I8 = uint8(I / 256);
B = im2bw(I8, 0.007);
segmented = imclearborder(B);
The above script generates:
Q-1
I am interested in the entire inner black part along with the white matter inside it. I started MATLAB a couple of days ago, so I am not quite getting how I can do it. If it is not clear what kind of output I want, let me know and I will upload an image, but I think there is no need.
Q-2
In B = im2bw(I8, 0.007);, why do I need to give such a low threshold? With higher thresholds everything is either white or black. I have read the documentation and, as I understand it, pixels with values less than 0.007 are marked black and everything above is white. Is it because of my 16-to-8-bit conversion?
Another automatic solution that I did quickly using ImageJ (the same algorithms exist in MATLAB):
Automatic thresholding using Huang or Li in the color space of your choice (all of them work).
Opening with a structuring element of type disk (deletes the small components).
Connected-components labeling.
Delete the components that touch the border of the image.
Fill holes.
And you have a clean result (a rough sketch of this pipeline follows below).
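A rough OpenCV/SciPy sketch of that pipeline, with Otsu thresholding standing in for Huang/Li and the structuring-element size picked arbitrarily:

import cv2
import numpy as np
from scipy import ndimage

def segment_lungs(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # 1. Automatic thresholding (the lungs are dark, so invert the binary result)
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # 2. Opening with a disk-shaped structuring element removes small specks
    disk = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, disk)
    # 3. Connected-component labeling
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(opened)
    # 4. Delete the components whose bounding box touches the image border
    h, w = opened.shape
    keep = np.zeros_like(opened)
    for label in range(1, n_labels):
        x, y, bw, bh, _ = stats[label]
        if x > 0 and y > 0 and x + bw < w and y + bh < h:
            keep[labels == label] = 255
    # 5. Fill holes so the white matter inside the lungs is included
    return ndimage.binary_fill_holes(keep > 0).astype(np.uint8) * 255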
Here's a working solution in python using OpenCV:
import cv2 #openCV
import numpy as np
filename = 'zFrkx.jpg' #name of file in quotations here... assumes file is in same dir as .py file
img_gray = cv2.imread(filename, 0) #converts jpg image to grayscale representation
min_val = 100 #try shifting these around to expand or collapse area of interest
max_val = 150
ret, lung_mask = cv2.threshold(img_gray, min_val, max_val, cv2.THRESH_BINARY_INV) #fixed threshold using the values defined above
lung_layer = cv2.bitwise_and(img_gray, img_gray, mask = lung_mask)
cv2.imwrite('cake.tif', lung_layer) #outputs desired layer to current working dir
I tried running the script with threshold values set arbitrarily to 100,150 and got the following result, from which you could select the largest continuous element using dilation and segmentation techniques (http://docs.opencv.org/master/d3/db4/tutorial_py_watershed.html#gsc.tab=0).
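If you prefer something simpler than watershed for that step, a rough sketch that keeps only the largest connected component of the mask (using OpenCV's connected-component statistics) could look like this:

import cv2
import numpy as np

def largest_component(mask):
    # Keep only the largest connected blob in a binary mask (label 0 is background).
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n_labels <= 1:
        return mask  # nothing but background
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    return np.where(labels == largest, 255, 0).astype(np.uint8)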
Also, I suggest you crop the bottom and top X pixels to cut out text since no lung will fill the top or bottom of the picture.
Use tif instead of jpg format to avoid compression related artifact.
I know you noted that you'd like the medullar(?) white matter, too. I'd be glad to help with that, but could you first explain in plain English how your shared MATLAB code works? It seems to work pretty well for the WM.
Hope this helps!

Image Compression Algorithm - Breaking an Image Into Squares By Color

I'm trying to develop a mobile application, and I'm wondering about the easiest way to convert an image into a text file and then be able to recreate the image later in memory from said text. The image(s) in question will contain no more than 16 or so colors, so it should work out fine.
Basically, brute-forcing this solution would require saving each individual pixel's color data to a file. However, this would result in a HUGE file. I know there's a better way: for example, if there's a large portion of the image that consists of the same color, breaking that area up into smaller squares and rectangles and saving their coordinates and sizes to the file.
Here's an example. The image is supposed to be just black/white. The big color boxes represent theoretical 'data points' in the outputted text file. These boxes would really state their origin, size, and what color they should be.
E.g., top box has an origin of 0,0, a size of 359,48, and it represents the color black.
Saved in a text file, the data would be 0,0,359,48,0.
What kind of algorithm would this be?
NOTE: The SDK that I am using cannot return a pixel's color from an X,Y coordinate. However, I can load external information into the program from a text file and manipulate it that way. This data that I need to export to a text file will be from a different utility that will have the capability to get a pixel's color from X,Y coordinates.
EDIT: Added a picture
EDIT2: Added constraints
Could you elaborate on why you want to save an image (or its parts) as plain text? Can't you use a binary representation instead? Also, if your images typically have lots of contiguous runs of pixels of the same color, you may want to use run-length encoding (RLE). Alternatively, one of the Lempel-Ziv family of compression algorithms could be used (LZ77, LZ78, LZW).
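For illustration, a minimal run-length-encoding sketch over a flattened list of palette indices (mapping the 16 colors to indices is assumed to happen elsewhere):

def rle_encode(pixels):
    # Collapse consecutive runs of the same value into [value, count] pairs.
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

def rle_decode(runs):
    # Expand [value, count] pairs back into the original sequence.
    return [value for value, count in runs for _ in range(count)]

# Example: a mostly-black row compresses to just three pairs.
row = [0] * 300 + [1] * 5 + [0] * 55
assert rle_decode(rle_encode(row)) == row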
Encode the image in a compressed format (e.g. JPEG, PNG, GIF, etc.) and then save it as a .txt file or whatever. To recreate the image, just read the file back into your program using whatever library function suits your particular needs.
If it's necessary that the .txt file have some textual meaning, then you may be in some trouble.
In CS there is a spatial-index technique that recursively subdivides a plane into 4 tiles. If the cells all have the same size, it looks like a quadtree. If you want to subdivide a plane into patterns (of colors), you can use this tiling idea and change the cell size dynamically. A good starting point is the Z-order curve or the Hilbert curve.
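A rough sketch of that recursive four-way subdivision over a grid of color indices (it assumes the side length is a power of two, and the uniformity test is my own simplification):

def quadtree(grid, x, y, size):
    # Recursively split a size x size square into 4 tiles until each tile is a
    # single color; return (x, y, size, color) leaves, like the boxes described above.
    colors = {grid[j][i] for j in range(y, y + size) for i in range(x, x + size)}
    if len(colors) == 1 or size == 1:
        return [(x, y, size, colors.pop())]
    half = size // 2
    leaves = []
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        leaves += quadtree(grid, x + dx, y + dy, half)
    return leaves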
