TensorFlow Object Detection Issue

I am trying the TensorFlow faster_rcnn_resnet101 model to detect multiple objects in a 642x481 image. I tried two approaches, but neither gave satisfactory results.
1) I cropped the objects (50x160 rectangles) from the training images (the training images may have different dimensions than the 642x481 test image) and used those crops to train faster_rcnn_resnet101. The model seems to do well when the test set also consists of crops of the same size, but on the full 642x481 test image it could not detect multiple objects with good results.
Then I thought that maybe the model rescales the test image to match 50x160, so the details get lost. With that in mind, I tried another approach.
2) I pasted each cropped image onto a 642x481 white background (padding, basically), so that each training image has the same dimensions as the test image. The position of each crop on the background was deliberately randomized. However, detection on the test image is still poor.
To probe the model's behaviour, I used GIMP to keep windows that contain objects and replace everything outside those windows with white pixels. The result is much better.
If we keep only one object window in the image and make everything else white, the result is excellent.
So my question is: what is happening behind the scenes, and how can I make the model detect multiple objects in the test images successfully? Thanks.

The idea is that you should use training images containing multiple instances of the objects. In other words, you don't have to crop, especially since your images are not very big.
However, you DO have to identify (label) the objects. This means knowing the bounding box of each object in an image. Furthermore, this data should be assembled, together with the original image, into a tfrecord file. There is a good guide here with the whole process. In your case, you would label all the objects in the source image (642x481) instead of just the "raccoon". If your objects have multiple classes, make sure you label them as such!
If you train with these images, which contain the objects in their context, the network will learn to recognize similar images, which also have objects in context.
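For reference, the TF Object Detection API's TFRecord format stores box corners normalized to [0, 1] by image width and height. Here is a minimal stdlib-only sketch of that normalization step (the function name and box tuple order are my own, not from the API):

```python
def normalize_boxes(boxes, width, height):
    """Convert absolute (xmin, ymin, xmax, ymax) pixel boxes to the
    [0, 1]-normalized corners that TFRecord examples store."""
    normalized = []
    for xmin, ymin, xmax, ymax in boxes:
        normalized.append((xmin / width, ymin / height,
                           xmax / width, ymax / height))
    return normalized

# one object occupying the left half of a 642x481 image
print(normalize_boxes([(0, 0, 321, 481)], 642, 481))
# -> [(0.0, 0.0, 0.5, 1.0)]
```

The normalized values then go into the `image/object/bbox/*` features of each `tf.train.Example` alongside the encoded image bytes, as the guide linked above shows.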

Related

Would it be effective to crop images in YOLOv4?

Example Image
For example, as in the image above, the area I need is just the red box; the other sections don't have any labels for classification/object detection.
My thinking is that cropping the image down to the red box will work better, because there is too much useless, unlabeled area in the original image. When mosaic augmentation is used in YOLOv4, it stitches several images together into one, and because there is so much unlabeled area, the data after mosaicking could be less useful than before.
But this is just my guess, and I would need an experiment to confirm it; a lack of computing power is preventing the actual test. So the question is: is it possible to actually improve performance if the original image is cropped in the form of the red box? Did I guess correctly?
Also, my partner said that cropping is not a good choice with YOLO because it can ruin the proportions of the object, but I couldn't understand what "the proportion of the object" means in YOLO. I wonder why the proportions of objects in YOLO are said not to suit cropping.
Thanks for reading, and have a nice day.
Simply put, you shouldn't resize the images. However, if the training/testing data set contains considerable variation in widths and heights, use data augmentation methods. Please follow the link below for more information.
https://github.com/pjreddie/darknet/issues/800
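On the "proportion" point: if the network input is a fixed square and the resize stretches rather than letterboxes (this is implementation-dependent), a non-square image distorts every box's aspect ratio. A small stdlib sketch of the effect (function names and the example sizes are mine):

```python
def resized_box(box, src_size, dst_size):
    """Scale an (xmin, ymin, xmax, ymax) box when an image of
    src_size = (w, h) is resized to dst_size = (w, h)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    xmin, ymin, xmax, ymax = box
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)

def aspect(box):
    """Width / height of an (xmin, ymin, xmax, ymax) box."""
    xmin, ymin, xmax, ymax = box
    return (xmax - xmin) / (ymax - ymin)

# a 100x100 square object in a 1280x720 frame, squashed into 416x416:
# the square object becomes a 9:16 rectangle (aspect 0.5625)
box = (0, 0, 100, 100)
squashed = resized_box(box, (1280, 720), (416, 416))
print(aspect(box), aspect(squashed))
```

Cropping the frame to a different shape changes this distortion, which may be what your partner meant; whether that helps or hurts is exactly what your experiment would have to measure.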

Compare two images, keep different pixels, set same pixels to transparent in ActionScript 3

Ok, the question is pretty much in the title.
What I want to do is filter out the pixels that differ between one image and a default image, then "print" only those pixels onto a transparent layer, if you will, such that if you merged the default image with the transparent layer, you would end up with the other image.
Is that possible with ActionScript 3?
BitmapData.compare() is nearly what you want. It gives you a new BitmapData object in which each pixel is the difference between the corresponding pixels in the two BitmapData objects being compared.
It sounds like you want the actual values of the changed pixels, though. I don't know of a built-in way to do that, so you might have to go through pixel by pixel using BitmapData.getPixel(), or build a Pixel Bender filter to do it. Either route is likely slower than you really want to deal with, honestly (I built a chroma-key demo app for a sales pitch that was barely able to scrape 8 fps using those methods).
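The per-pixel loop described above looks like this in outline (sketched in Python over nested lists of RGBA tuples rather than AS3; in AS3 the same shape would use getPixel32/setPixel32 over a BitmapData):

```python
TRANSPARENT = (0, 0, 0, 0)  # fully transparent RGBA pixel

def diff_layer(default_img, other_img):
    """Return a layer holding other_img's pixels where the two images
    differ and fully transparent pixels where they match, so that
    default_img composited with the layer reproduces other_img."""
    layer = []
    for row_a, row_b in zip(default_img, other_img):
        layer.append([b if a != b else TRANSPARENT
                      for a, b in zip(row_a, row_b)])
    return layer

red, blue = (255, 0, 0, 255), (0, 0, 255, 255)
default_img = [[red, red], [red, red]]
other_img   = [[red, blue], [red, red]]
print(diff_layer(default_img, other_img))
# -> [[(0, 0, 0, 0), (0, 0, 255, 255)], [(0, 0, 0, 0), (0, 0, 0, 0)]]
```

This is the "slow but simple" route; a Pixel Bender filter would evaluate the same per-pixel rule on the GPU.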

Overlay images in MATLAB using predefined points

Basically what I'm trying to do is overlay two images using predefined points on each image.
The images will probably be of two different sizes, or scaled differently; I don't know this for sure yet. But the images are of the same thing. So what I want to do is say: this spot on image 1 is the same as this spot on image 2, do this for multiple spots, and then have MATLAB resize or transform the images so that all those points line up and the two images can be overlaid. The thing that confuses me is having MATLAB automatically adjust the images so that they "fit" together.
I have no idea where to start on this, and was just hoping to get a general idea of what I may be able to do.
Just in case someone knows how to do this, I'll throw in what else I need to do. After the two images are on top of each other, one image will be a region map and the other a real image. What I need MATLAB to do is count the number of dots from the real image that fall in each region of the map.
Thanks for any help.
What you are trying to do is called image registration, which is a very common image-processing task. You won't need to write much code because MATLAB has built-in functions for this. You use cp2tform to create a transform from the first image to the second, and can then apply the transform to the first image using the imtransform function. The code will look something like this, assuming the x,y coordinates of the control points are in m-by-2 matrices called points1 for image1 and points2 for image2.
tform = cp2tform(points1, points2, 'similarity');
registered = imtransform(image1, tform);
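For intuition, a 'similarity' fit estimates a scale/rotation pair (a, b) and a translation (tx, ty) by least squares from the point correspondences. Below is a pure-Python sketch of the standard closed-form solution, for understanding only; it is not MATLAB's actual implementation:

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src points to dst:
    x' = a*x - b*y + tx,  y' = b*x + a*y + ty."""
    n = len(src)
    mx = sum(p[0] for p in src) / n   # centroids of both point sets
    my = sum(p[1] for p in src) / n
    mu = sum(p[0] for p in dst) / n
    mv = sum(p[1] for p in dst) / n
    num_a = num_b = den = 0.0
    for (x, y), (u, v) in zip(src, dst):
        xc, yc, uc, vc = x - mx, y - my, u - mu, v - mv
        num_a += xc * uc + yc * vc   # alignment with zero rotation
        num_b += xc * vc - yc * uc   # alignment with 90-degree rotation
        den += xc * xc + yc * yc
    a, b = num_a / den, num_b / den
    # translation maps the src centroid onto the dst centroid
    return a, b, mu - (a * mx - b * my), mv - (b * mx + a * my)

print(fit_similarity([(0, 0), (1, 0), (0, 1)], [(3, 4), (5, 4), (3, 6)]))
```

For points scaled by 2 and shifted by (3, 4), as in the example call, this recovers approximately (2, 0, 3, 4).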

Detecting Object/Person in an image

I am new to MATLAB. I am working on a project which will take as input an image like this:
As we can see, it has a plain background (blue), and the system will generate a passport-size image from it with the given ratios. First I am working on separating the background from the person. The approach I found is: if there is blue in the combination of the image's RGB matrices, it is background, and the rest is the person. But I am a little confused about whether this approach is correct, and if it is, how can I tell whether the current pixel is blue or not? Can I do it with the MATLAB function find? Any help would be appreciated.
If you want to crop your image based on the person's face, then there is no need to separate the background from the foreground. Nowadays you will easily find ready-made implementations of face detection, so unless you want to implement your own method because the ready one fails, this should be a non-issue. See (Mathematica):
Show[img,
 Graphics[{EdgeForm[{Yellow, Thick}], Opacity[0],
   Rectangle @@@
    FindFaces[img = Import["http://i.stack.imgur.com/cSwzj.jpg"]]}]]
Supposing the face is detected correctly, you can expand/retract its bounding box to match the size you are after.
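If you do still want the colour-threshold idea from your question, the per-pixel test is simply "does the blue channel clearly dominate red and green?". A stdlib Python sketch of that rule (the margin of 50 is an arbitrary assumption to tune; in MATLAB the same test vectorizes over the whole image as a logical comparison of the R, G and B planes, no find needed):

```python
def is_blue(r, g, b, margin=50):
    """Treat a pixel as background if its blue channel exceeds the
    red and green channels by at least `margin` (tunable threshold)."""
    return b > r + margin and b > g + margin

def person_mask(pixels):
    """1 where the pixel belongs to the person, 0 for blue background."""
    return [[0 if is_blue(*p) else 1 for p in row] for row in pixels]

row = [(20, 40, 200), (180, 150, 120)]  # background blue, then skin-ish
print(person_mask([row]))
# -> [[0, 1]]
```

Fixed thresholds like this are fragile under lighting changes, which is one reason the face-detection route above is usually more robust.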

How to extract the layers from an image (jpg,png,etc)

Given an image such as the CakePHP logo, how can this image be converted back into a PSD with layers? As a human, I can easily work out how to translate it back into a layered PSD. I can tell that the background is a circular shape with star edges, the cake image is on top of it, and the words "CakePHP" are over both of these.
I can use Photoshop/GIMP tools to separate these into three images and fill in the areas in between. Then I have three layers.
As a human, it is easy to work out the layering of most logos and images, and many images have multiple layers; the CakePHP logo is just one example. Images of the real world also have layering: there may be a tree layer on top of a grass background. I need a general way to convert from an image back to a layered representation, ideally a software solution.
In the absence of a programmed solution, are there any papers or research that solve this problem or are related to it? I am mostly interested in converting human-constructed images, such as logos or website titles, back to a layered representation.
I want to point out some benefits of doing this: if you can get an image into a layered representation automatically, the image becomes easier to modify. For example, maybe you want to make the cake smaller; if the computer has already layered the cake on top of the red background, you can just scale the cake layer. This allows layer adjustment of images on websites that do not already have layer information.
As already mentioned, this is a non-trivial task. Ultimately, it can be phrased most simply as: given an image (or scene, if a real photo) composed of N pixels, how can those pixels be assigned to M layers?
For segmentation, it's all about the prior knowledge you can bring to bear: which properties of pixels, and of groups of pixels, give "hints" (and I use the word advisedly!) as to the layer they belong to.
Consider even the simplest case of using just the colour in your image. I can generate these 5 "layers" (for hue values 0, 24, 90, 117 and 118):
With this code (in Python/OpenCV):
import cv2

# get original image
orig = cv2.imread('cakephp.png')
# show original
cv2.imshow("orig", orig)
# convert to HSV and take just the hue channel
hsv = cv2.cvtColor(orig, cv2.COLOR_BGR2HSV)
hue, sat, val = cv2.split(hsv)
# loop to find how many different hues are present...
# (8-bit hue in OpenCV spans 0-179)
for i in range(180):
    # mask of the pixels whose hue equals i
    result = cv2.inRange(hue, i, i)
    # if a number of pixels are equal - show where they are
    if cv2.countNonZero(result) > 1000:  # <- what is significant?
        cv2.imshow(str(i), result)
        cv2.imwrite(str(i) + ".png", result)
cv2.waitKey(-1)
But even here we have to decide what is "significant" in terms of the number of pixels that belong to a mask (to the extent that we can miss some colours). We could start to cluster similar colours instead, but at what density does a cluster become significant? And if a layer weren't pure colour but textured, how could we describe that? Or what about inferring that one layer is part of another, or in front of it? Or, ultimately, that some of the layers seem to be what we humans call "letters" and so should probably all be related...
A lot of the research on segmentation in Computer Vision tries to take this problem and improve on it within a framework that can encode and apply such prior knowledge effectively...
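To make the "cluster similar colours" idea concrete, here is a toy 1-D k-means over hue values in pure Python (assumes k >= 2). Real work would use something like cv2.kmeans, and choosing k is itself the open question raised above:

```python
def kmeans_1d(values, k, iters=20):
    """Toy 1-D k-means: cluster scalar hue values around k centres."""
    lo, hi = min(values), max(values)
    # spread the initial centres evenly across the observed range
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # move each centre to its cluster mean (keep empty clusters put)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

# two tight hue populations -> the centres land on them
hues = [10] * 50 + [12] * 50 + [100] * 30
print(kmeans_1d(hues, 2))
# -> [11.0, 100.0]
```

Instead of one mask per exact hue value with a fixed pixel-count cutoff, each cluster centre would then define one candidate "layer", merging near-identical hues like 117 and 118 above.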
When you convert from a layered representation to an image, you lose information. For instance, you don't know the values of the background-layer pixels behind the cake. Additionally, you don't know for sure which part of the image belongs to which layer.
However, it may be possible in some cases to recover or estimate at least part of this information. For instance, you could try to separate an image into "layers" using segmentation algorithms. In your example, a simple segmentation based on colour would probably work.
As for recovering lost pixel values in the background, there are so-called inpainting techniques, which attempt to estimate missing areas of an image from their surroundings.
Lastly, to recover the position and content of text in images you can rely on Optical Character Recognition (OCR) methods.
Keep in mind that there is no simple algorithm that solves your problem, which is more complex than it seems. However, using the above information, you can try to automate it at least partially.
