Find all different objects/closed polygons in a room image

I am a newbie in image processing. I need to find the different objects in an image (for example, a room image), distinguish these objects, and color them with distinct colors. I started with Canny edge detection to find the edges of the objects, but what do I do next in order to distinguish the different objects?

If you want to detect objects, you need to think beyond plain image processing; object detection is a long process.
Let's start with deciding whether an image shows a hotdog or not. Let me summarize the steps (a rough code sketch follows the list):
Take a reference image of the hotdog you want to detect
Perform feature extraction on it
Read the input image and perform feature extraction on it as well
Perform feature matching between the input image and the hotdog image
Use a machine learning classifier to decide whether it is a hotdog or not
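A minimal sketch of steps 2–4, assuming OpenCV's ORB features and brute-force matching (the file names, the distance cut-off of 50 and the "20 good matches" rule are placeholders; a real system would replace the last step with a trained classifier):

    import cv2

    # Load the reference (hotdog) image and the input image in grayscale.
    # The file names are placeholders.
    reference = cv2.imread("hotdog.jpg", cv2.IMREAD_GRAYSCALE)
    scene = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

    # Feature extraction: detect keypoints and compute descriptors.
    orb = cv2.ORB_create()
    kp_ref, des_ref = orb.detectAndCompute(reference, None)
    kp_scene, des_scene = orb.detectAndCompute(scene, None)

    # Feature matching: brute-force Hamming matcher with cross-check.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_ref, des_scene), key=lambda m: m.distance)

    # Crude decision rule: "enough good matches" -> probably a hotdog.
    good = [m for m in matches if m.distance < 50]
    print("hotdog" if len(good) > 20 else "not hotdog")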
If you want to detect a hotdog in an image that contains other objects too (e.g. a chair, a table, a coke, etc.), you should use a sliding-window approach to detect the hotdog and draw a bounding box around it.
The state of the art in object detection is deep learning. You can train a CNN to interpret images, but training one requires a lot of work. Alternatively, you can use pre-trained networks, for example TensorFlow's pre-trained models (there is a demo video about it). You can use them to detect many different kinds of objects in your room image.
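If you go the pre-trained-network route, OpenCV's dnn module can run a detection model exported from TensorFlow. A rough sketch, assuming an SSD-style export with the usual [id, class, score, box] output layout (the model file names, the input image and the 0.5 confidence threshold are placeholders):

    import cv2

    # Placeholder paths to a TensorFlow detection model (frozen graph + config).
    net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb", "graph.pbtxt")

    image = cv2.imread("room.jpg")            # placeholder input image
    h, w = image.shape[:2]

    # Typical SSD preprocessing: resize to 300x300 and swap BGR -> RGB.
    blob = cv2.dnn.blobFromImage(image, size=(300, 300), swapRB=True)
    net.setInput(blob)
    detections = net.forward()                # shape: (1, 1, N, 7)

    for det in detections[0, 0]:
        score = float(det[2])
        if score > 0.5:                       # assumed confidence threshold
            x1, y1, x2, y2 = (det[3:7] * [w, h, w, h]).astype(int)
            cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

    cv2.imwrite("detections.jpg", image)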

Related

Latent space image interpolation

Can someone tell me how (or the name of it, so that I could look it up) I can implement this interpolation effect? https://www.youtube.com/watch?v=36lE9tV9vm0&t=3010s&frags=pl%2Cwn
I tried to use r = r + dr, g = g + dg and b = b + db for the RGB values in each iteration, but it looks way too simple compared to the effect from the video.
"Can someone tell me how I can implement this interpolation effect?
(or the name of it, so that I could look it up)..."
It's not actually a named interpolation effect. It appears to interpolate, but really it's just real-time updated variations of some fictional facial "features" (the hair, eyes, nose, etc. are synthesized pixels taking hints from a library/database of possible matching feature types).
For this technique they used neural networks to do a process similar to DFT image reconstruction. You'll be modifying the image data in the frequency domain (with u, v), not the spatial domain (using x, y).
You can read about it at this PDF: https://research.nvidia.com/sites/default/files/pubs/2017-10_Progressive-Growing-of/karras2018iclr-paper.pdf
The (Python) source code:
https://github.com/tkarras/progressive_growing_of_gans
For ideas, on Youtube you can look up:
DFT image reconstruction (there's a good example with a b/w Nicolas Cage photo reconstructed in stages; loud music warning).
Image synthesis with neural networks (one clip had alternative shoe and handbag designs (item photos) being "synthesized" by a neural network after it analyzed features from other existing catalogue photos as "inspiration").
Image enhancement / super-resolution using neural networks. This method is closest to answering your question. One example has a very low-res, blurry, pixelated image in b/w; you cannot tell if it is a boy or a girl. During a test, the network synthesizes various higher-quality face images that it thinks are the correct match for the test input.
After understanding what they achieve and how, you could think of shortcuts to get a similar effect without needing networks, e.g. only using regular pixel-editing functions.
Found it in another video: it is called "latent space interpolation", and it has to be applied to the compressed (encoded) images. If I have image A and the next image is image B, I first have to encode A and B, apply the interpolation to the encoded data, and finally decode the resulting image.
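A minimal sketch of that idea; the encode/decode calls in the comments stand in for whatever trained encoder/decoder (or GAN mapping) you use and are not real library functions:

    import numpy as np

    def interpolate_latent(z_a, z_b, steps=10):
        """Linearly interpolate between two latent (encoded) vectors."""
        return [(1 - t) * z_a + t * z_b for t in np.linspace(0.0, 1.0, steps)]

    # Hypothetical usage, assuming you have an encoder and a decoder available:
    #   z_a = encode(image_a)                       # compress image A
    #   z_b = encode(image_b)                       # compress image B
    #   frames = [decode(z) for z in interpolate_latent(z_a, z_b, steps=30)]
    # Each decoded frame is one step of the morphing effect.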
As of today, I found out that this kind of interpolation effect can also be implemented easily for 3D image data, provided the data is available in a normalized form centred at the 3D origin, for example inside a unit sphere around the origin with the data of each face image inside that sphere. With the data of two images stored this way, the interpolation can be calculated by taking the differences of rays going through the origin and through each area of the sphere at some desired resolution.
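A minimal sketch of that 3D idea, assuming each face has already been converted to a grid of radii (distance from the origin along a fixed set of ray directions); that conversion is the hard part and is not shown:

    import numpy as np

    def interpolate_radii(radii_a, radii_b, t):
        """Blend two per-ray radius maps; t=0 gives face A, t=1 gives face B."""
        return (1.0 - t) * radii_a + t * radii_b

    # Dummy data: 64x64 rays sampled over the unit sphere (placeholder resolution).
    radii_a = np.random.rand(64, 64)
    radii_b = np.random.rand(64, 64)
    halfway = interpolate_radii(radii_a, radii_b, 0.5)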

Suggestions or methods for automatic TV logo finding/locating/detection

Usually, logo detection means finding the logo and recognizing it. Some common works do the two steps together using SIFT/SURF matching methods, as detailed in:
(1) Logo recognition in images
(2) Logo detection using OpenCV
But if the logo is tiny and blurry, the results are poor and somewhat time consuming. I want to split the two steps: first find where the logo is in the video, then recognize the logo using template matching (a rough sketch follows the links below) or another method, like:
(3) Logo recognition - how to improve performance
(4) OpenCV logo recognition
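For the recognition step, a minimal template-matching sketch with OpenCV (the file names and the 0.8 score threshold are placeholders):

    import cv2

    frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder video frame
    logo = cv2.imread("logo.png", cv2.IMREAD_GRAYSCALE)     # placeholder logo template

    # Slide the template over the frame and score every position.
    result = cv2.matchTemplate(frame, logo, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)

    if max_val > 0.8:                                        # assumed match threshold
        h, w = logo.shape
        print("logo found at", max_loc, "size", (w, h))
    else:
        print("no confident match")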
My problem is mainly focused on finding the logo automatically in video. I tried two methods:
Brightness method. The logo on the TV screen is usually always there while the show goes on. I randomly select a list of frames and compute differences between them; the logo area tends to stay near 0. I then do statistics on the near-zero differences with a threshold to decide whether a pixel belongs to the logo or not. This method usually does well, but it fails when the show has a static background.
Edge method. Similarly, if the logo is there, its border tends to be obvious. I do the same statistical work as in the brightness method, but the edges are sometimes unstable, for example against a very bright background.
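For reference, a minimal sketch of the brightness/frame-differencing idea described above (the video file name, the 30 sampled frame pairs and the thresholds are assumptions):

    import random
    import cv2
    import numpy as np

    cap = cv2.VideoCapture("show.mp4")                       # placeholder video
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    # Accumulate absolute differences between randomly chosen frame pairs.
    acc = None
    for _ in range(30):                                      # assumed sample count
        a, b = random.sample(range(total), 2)
        cap.set(cv2.CAP_PROP_POS_FRAMES, a)
        _, frame_a = cap.read()
        cap.set(cv2.CAP_PROP_POS_FRAMES, b)
        _, frame_b = cap.read()
        diff = cv2.absdiff(cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY),
                           cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY))
        acc = diff.astype(np.float32) if acc is None else acc + diff

    # Pixels whose accumulated difference stays small are static -> logo candidates.
    logo_mask = ((acc / 30) < 5).astype(np.uint8) * 255      # assumed threshold
    cv2.imwrite("logo_mask.png", logo_mask)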
Are there any suggestions or state-of-the-art methods for automatically finding logo areas, and any logo recognition methods other than SIFT or template matching?
Let's assume your list of logos is known beforehand and you have access to examples (video streams/frames) of all logos.
The 2017 answer to your question is to train a logo classifier, most likely a deep neural network.
With sufficient training data, if the logo is identifiable to TV viewers, the network will be able to detect it. It will also handle local blurring and intensity changes (which may thwart "classic" image-processing methods based on brightness and edges).
OpenCV can load and run network models from multiple frameworks like Caffe, Torch and TensorFlow, so you can use one of their pre-trained models or train one yourself.
You could also try TensorFlow's object detection API here: https://github.com/tensorflow/models/tree/master/research/object_detection
The good thing about this API is that it contains state-of-the-art models for object detection & classification. The models that TensorFlow provides are free to train, and some of them promise quite astonishing results. I have already trained a model for the company I work for that does quite an amazing job at logo detection in images & video streams. You can check out more about my work here: https://github.com/kochlisGit/LogoLens
The problem with TV is that the logos will probably not be static and will move across frames. This causes a motion-blur effect, which will probably confuse your classifier or make it miss the logos. However, once you find a logo, you can use an object-tracking algorithm to keep track of it (e.g. DeepSORT).

How do I process an image to produce formatted images subject to some rules?

Imagine a digital picture of a flower. I am looking for an algorithm, and a platform on which to use it, that will generate a series of "derivative images", in which each image shows the flower going mouldy over a time series. The rules for choosing the areas and the colours in the derivative images will be specified by the artist, and the final output must look as if one had actually filmed a similar flower becoming mouldy (e.g. turning green), with the contours of the objects remaining fixed. It should also be based on a randomised algorithm, so that each generated sequence of images is unique.
Judging by the description of the task, the program will have to perform complex image processing, involving estimation of an object's three-dimensional position and orientation from a 2D image, generation of a filter based on that data, and application of the filter to the image. This can be accomplished with the OpenCV library.
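A rough sketch of just the randomized mould-growth part, assuming the flower region has already been segmented into a binary mask (the file names, seed count, growth probability and green tint are all placeholders); the mould is grown by dilation restricted to the mask, so the object contours stay fixed and each run is unique:

    import cv2
    import numpy as np

    flower = cv2.imread("flower.jpg")                              # placeholder photo
    mask = cv2.imread("flower_mask.png", cv2.IMREAD_GRAYSCALE)     # placeholder segmentation

    # Seed a few random mould spots inside the flower region.
    ys, xs = np.where(mask > 0)
    mould = np.zeros_like(mask)
    for i in np.random.choice(len(xs), size=5, replace=False):     # assumed 5 seeds
        cv2.circle(mould, (int(xs[i]), int(ys[i])), 3, 255, -1)

    for step in range(20):                                         # assumed 20 frames
        # Grow by dilation, but accept each new pixel only with some probability,
        # so every generated sequence is unique; never grow outside the mask.
        grown = cv2.dilate(mould, np.ones((5, 5), np.uint8))
        new_pixels = cv2.bitwise_and(grown, cv2.bitwise_not(mould))
        keep = (np.random.rand(*mould.shape) < 0.6).astype(np.uint8) * 255
        mould = cv2.bitwise_and(cv2.bitwise_or(mould, cv2.bitwise_and(new_pixels, keep)), mask)

        # Tint the mouldy pixels green and save one frame of the time series.
        frame = flower.copy()
        frame[mould > 0] = (0.4 * frame[mould > 0] + 0.6 * np.array([0, 160, 0])).astype(np.uint8)
        cv2.imwrite("mouldy_%02d.png" % step, frame)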

Augmented reality like Zookazam

What algorithms are used for augmented reality apps like Zookazam?
I think it analyzes the image and finds planes by contrast, but I don't know how.
What topics should I read before starting with app like this?
[Prologue]
This is an extremely broad topic and mostly off-topic in its current state. I re-edited your question to make it answerable within the rules/possibilities of this site.
You should specify more closely what your augmented reality:
should do
adding 2D/3D objects with known mesh ...
changing light conditions
adding/removing body parts/clothes/hairs ...
a good idea is to provide some example image (sketch) of the input/output you want to achieve.
what input it has
video, static image, 2D, stereo, 3D. For pure 2D input, specify what conditions/markers/illumination/LASER patterns you have to help the reconstruction.
what will be in the input image? an empty room, persons, specific objects, etc.
specify target platform
many algorithms are limited by memory size/bandwidth, CPU power, special HW capabilities, etc., so it is a good idea to add a tag for your platform. The OS and language are also good to add.
[How augmented reality works]
acquire input image
if you are connecting to some device like a camera, you need to use its driver/framework or some common API it supports to obtain the image. This task is OS dependent. My favorite way on Windows is to use the VFW (Video for Windows) API.
I would start with some static file(s) instead, to ease the debugging and incremental building process (so you do not need to wait for the camera and so on during each build). When your app is ready for live video, switch back to the camera...
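If you are not tied to VFW, a cross-platform sketch of the same idea with OpenCV (the file name is a placeholder): read frames from a static video file during development and switch to the live camera later by changing a single line:

    import cv2

    # During development read from a file; later replace with cv2.VideoCapture(0)
    # to grab live frames from the default camera.
    cap = cv2.VideoCapture("test_scene.mp4")   # placeholder file

    while True:
        ok, frame = cap.read()
        if not ok:
            break                              # end of file (or camera error)
        # ... hand `frame` to the reconstruction / rendering steps below ...
        cv2.imshow("input", frame)
        if cv2.waitKey(1) & 0xFF == 27:        # Esc quits
            break

    cap.release()
    cv2.destroyAllWindows()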
reconstruct the scene into 3D mesh
if you use 3D cameras like Kinect then this step is not necessary. Otherwise you need to distinguish the objects by some segmentation process, usually based on edge detection or color homogeneity.
The quality of the 3D mesh depends on what you want to achieve and on your input. For example, if you want realistic shadows and lighting, you need a very good mesh. If the camera is fixed in some room, you can predefine the mesh manually (hard-code it) and compute just the objects in view. The object detection/segmentation can also be done very simply by subtracting the empty-room image from the current view image, so the pixels with a big difference are the objects.
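A minimal sketch of that empty-room subtraction (the file names and the difference threshold of 30 are placeholders):

    import cv2
    import numpy as np

    empty = cv2.imread("empty_room.jpg", cv2.IMREAD_GRAYSCALE)     # placeholder reference
    current = cv2.imread("current_view.jpg", cv2.IMREAD_GRAYSCALE)

    # Pixels that differ a lot from the empty room belong to the objects.
    diff = cv2.absdiff(current, empty)
    _, objects_mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)

    # Clean up the mask a little and extract each object as a separate contour.
    kernel = np.ones((3, 3), np.uint8)
    objects_mask = cv2.morphologyEx(objects_mask, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(objects_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    print("found", len(contours), "object candidates")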
You can also use planes instead of a real 3D mesh, as you suggested in the OP, but then you can forget about the more realistic quality of effects like lighting, shadows and intersections... If you assume the objects are standing upright, you can use the room metrics to obtain their distance from the camera. See:
selection criteria for different projections
estimate measure of photographed things
For pure 2D input you can also use the illumination to estimate the 3D mesh see:
Turn any 2D image into 3D printable sculpture with code
render
Just render the scene back to some image/video/screen... with the added/removed features. If you are not changing the lighting conditions too much, you can also use the original image and render directly onto it. Shadows can be achieved by darkening the pixels... For better results, the illumination/shadows/spots/etc. are usually filtered out from the original image and then added back directly by rendering instead. See:
White balance (Color Suppression) Formula?
Enhancing dynamic range and normalizing illumination
The rendering process itself is also platform dependent (unless you are doing it with low-level graphics in memory). You can use things like GDI, DX, OpenGL, ... see:
Graphics rendering
You also need camera parameters for rendering like:
Transformation of 3D objects related to vanishing points and horizon line
[Basic topics to google/read]
2D
DIP digital image processing
Image Segmentation
3D
Vector math
Homogeneous coordinates
3D scene reconstruction
3D graphics
normal shading
platform dependent
image acquisition
rendering

image background / foreground detection with ccv

I need to programmatically determine the best place to overlay text on an image. In other words, I need to tell the foreground from the background. I have tried ImageMagick: http://www.imagemagick.org/Usage/scripts/bg_removal. Unfortunately this was not good enough. The images can be photographs of pretty much anything, but usually with a blurry background.
I would now like to try liuliu's CCV. Code: https://github.com/liuliu/ccv, Demo: http://liuliu.me/ccv/js/nss/
The demo uses what looks like a json haar cascade to detect faces: https://github.com/liuliu/ccv/blob/unstable/js/face.js
How do I:
1. Convert the xml haar cascade files to be able to be used with CCV
2. Generate the best cascade for my goal (text placement on an image)
3. Find any documentation for CCV
AND, finally, is there a better way to approach this problem?
EDIT: I've asked the broader question here: https://stackoverflow.com/questions/10559262/programmatically-place-text-in-an-image
Convert the xml haar cascade files to be able to be used with CCV
Generate the best cascade for my goal (text placement on an image)
Find any documentation for CCV
I have no idea about 1) (anyway, which XML files? I guess some from OpenCV?) or 3), but here is my take on 2):
To make a Haar cascade à la Viola & Jones, you need a series of small training images that contain only your desired objects, for example faces.
One object per image, with as little background as possible, all in the same orientation and size, normalized so they all have the same average brightness and variance in brightness. You will need a lot of training images.
You also need a series of negative training images, same size/brightness etc as the positive examples, that contain only background.
However, I doubt that this approach will work for you at all:
Haar filters work by recognizing common rectangular light/dark structures in all your foreground objects.
So your desired foreground images need to have a common structure.
An example haar filter cascade works like this (extremely simplified):
is the rectangular region at x1,y1 darker than the region at x2,y2? if no --> not a face, if yes --> continue
is the region at x3,y3 darker than the region at x4,y4? if no --> not a face --> if yes, continue
and so on ....
(To find the position of a face in a larger image, you execute this filter at every possible position in the image. The filter cascade is very fast at rejecting non-faces, so this is doable; a toy code version is sketched below.)
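A toy sketch of that decision chain in Python; the rectangle coordinates and the two stages are made up for illustration, while real cascades use thousands of learned features:

    import numpy as np

    def mean_brightness(img, x, y, w, h):
        """Average intensity of the rectangular region at (x, y) with size (w, h)."""
        return img[y:y + h, x:x + w].mean()

    def looks_like_face(window):
        """Extremely simplified two-stage 'cascade' on a 24x24 grayscale window."""
        # Stage 1: eye strip should be darker than cheek strip (made-up coordinates).
        if mean_brightness(window, 4, 6, 16, 4) >= mean_brightness(window, 4, 12, 16, 4):
            return False               # rejected early, like a real cascade
        # Stage 2: mouth region should be darker than chin region.
        if mean_brightness(window, 8, 16, 8, 3) >= mean_brightness(window, 8, 20, 8, 3):
            return False
        return True                    # survived all stages

    # Slide the 24x24 window over every position of a larger (dummy) image.
    image = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
    hits = [(x, y) for y in range(100 - 24) for x in range(100 - 24)
            if looks_like_face(image[y:y + 24, x:x + 24])]
    print(len(hits), "candidate windows")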
So your foreground objects need to have a common pattern among them.
For faces, the eye region is darker than the cheek region, and the mouth is darker than the chin, and so on.
The same filter for faces will cease to work if you just rotate the faces.
You cannot build a good filter for both trees and faces, and you definitely cannot build one for general foreground objects, because they have no common structure among them. You would need a separate filter for each possible type of object, so unless your pictures only show a very limited number of object types, this will not work.

Resources