I am attempting to create a content based image retrieval system (CBIR) in MATLAB for colour images, and am using a k-means algorithm to extract the feature vectors for images in my database. Each image has four clusters, and each cluster has information about the colour (R,G,B) and position (X,Y).
I am now trying to add a texture feature to my clusters, and need to use grey level co-occurrence matrices (GLCM) for this. I know that GLCM is just an indicator of probability that a certain grey level will appear next to another, and have created the GLCM for my images.
I am unclear about how to map the GLCM to the original image (and thus its clusters), since GLCM talks about pairs of pixels, and I would like each X,Y position to have texture information. How does one go about translating GLCM to pixels?
The output of GLCM seems to be a T-by-T matrix where T is the number of distinct grayscale levels in the image. Therefore, the size of this matrix does not really depend on the size of your image. The matrix also describes the texture of the whole image, so it isn't especially meaningful to associate GLCM data with a single pixel.
It sounds like you could compute GLCM for the individual clusters, since this would describe the texture within that cluster? I think graycomatrix requires a rectangular image, but you could find the bounding box for each cluster and extract GLCM from them separately.
If you wanted to get some more meaningful information out of a GLCM matrix (i.e. something that is appropriate as a 'feature'), you could use graycoprops which returns 4 summary statistics.
Related
I am working on fusing Lidar and Camera images in order to perform a classification object algorithm using CNN.
I want to use the KITTI Dataset which provide synchronized lidar and rgb image data. Lidar are 3D-scanners, so the output is a 3D Point Cloud.
I want to use depth information from point cloud as a channel for the CNN. But I have never work with point cloud so I am asking for some help. Is projecting the point cloud into the camera image plane (using projection matrix provide by Kitti) will give me the depth map that I want? Is Python libray pcl useful or I should move to c++ libraries?
If you have any suggestions, thanks you in advance
I'm not sure what projection matrix provide by Kitti includes, so the answer is it depends. If this projection matrix only contains a transformation matrix, you cannot generate depth map from it. The 2D image has distortion that comes from the 2D camera and the point cloud usually doesn't have distortion, so you cannot "precisely" map point cloud to rgb image without intrinsic and extrinsic parameters.
PCL is not required to do this.
Depth map essentially is mapping depth value to rgb image. You can treat each point in point cloud(each laser of lider) as a pixel of the rgb image. Therefore, I think all you need to do is finding which point in point cloud corresponding to the first pixel(top left corner) of the rgb image. Then read the depth value from point cloud based on rgb image resolution.
You have nothing to do with camera. This is all about point cloud data. Lets say you have 10 million of points and each point has x,y,z in meters. If the data is not in meters first convert it. Then you need the position of the lidar. When you subtract position of car from all the points one by one, you will take the position of lidar to the (0,0,0) point, then you can just print the point on a white image. The rest is simple math, there may be many ways to do it. First that comes to my mind: think rgb as binary numbers. Lets say 1cm is scaled to change in 1 blue, 256cm change equals to change in 1 green and 256x256 which is 65536 cm change equals change in 1 red. We know that cam is (0,0,0) if rgb of the point is 1,0,0 then that means 256x256x1+0x256+0x1=65536 cm away from the camera. This could be done in C++. Also you can use interpolation and closest point algorithms to fill blanks if there are
I have an image with arbitrary regions shape (say objects), let's assume the background pixels are labeled as zeros whereas any object has a unique label (pixels of object 1 are labeled as 1, object 2 pixels are labeled as 2,...). Now for every object, I need to find the best elliptical fit of its pixels. This requires finding the center of the object, the major and minor axis, and the rotation angle. How can I find these?
Thank you;
Principal Component Analysis (PCA) is one way to go. See Wikipedia here.
The centroid is easy enough to find if your shapes are convex - just a weighted average of intensities over the xy positions - and PCA will give you the major and minor axes, hence the orientation.
Once you have the centre and axes, you have the basis for a set of ellipses that cover your shape. Extending the axes - in proportion - and testing each pixel for in/out, you can find the ellipse that just covers your shape. Or if you prefer, you can project each pixel position onto the major and minor axes and find the rough limits in one pass and then test in/out on "corner" cases.
It may help if you post an example image.
As you seem to be using Matlab, you can simply use the regionprops command, given that you have the Image Processing Toolbox.
It can extract all the information you need (and many more properties of image regions) and it will do the PCA for you, if the PCA-based based approach suits your needs.
Doc is here, look for the 'Centroid', 'Orientation', 'MajorAxisLength' and 'MinorAxisLength' parameters specifically.
I got a png image like this:
The blue color is represent transparent. And all the circle is a pixel group. So, I would like to find the biggest one, and remove all the small pixel, which is not group with the biggest one. In this example, the biggest one is red colour circle, and I will retain it. But the green and yellow are to small, so I will remove them. After that, I will have something like this:
Any ideas? Thanks.
If you consider only the size of objects, use the following algorithm: labellize the connex components of the mask image of the objects (all object pixels are white, transparent ones are black). Then compute the areas of the connex components, and filter them. At this step, you have a label map and a list of authorized labels. You can read the label map and overwrite the mask image with setting every pixel to white if it has an authorized label.
OpenCV does not seem to have a labelling function, but cvFloodFill can do the same thing with several calls: for each unlabeled white pixel, call FloodFill with this pixel as marker. Then you can store the result of this step in an array (of the size of the image) by assigning each newly assigned pixel with its label. Repeat this as long as you have unlabellized pixels.
Else you can recode the connex component function for binary images, this algorithm is well known and easy to implement (maybe start with Matlab's bwlabel).
The handiest way to filter objects if you have an a priori knowledge of their size is to use morphological operators. In your case, with opencv, once you've loaded your image (OpenCV supports PNG), you have to do an "openning", that is an erosion followed by a dilation.
The small objects (smaller than the size of the structuring element you chose) will disappear with erosion, while the bigger will remain and be restored with the dilation.
(reference here, cv::morphologyEx).
The shape of the big object might be altered. If you're only doing detection, it is harmless, but if you want your object to avoid transformation you'll need to apply a "top hat" transform.
Basically I was trying to achieve this: impose an arbitrary image to a pre-defined uneven surface. (See examples below).
-->
I do not have a lot of experience with image processing or 3D algorithms, so here is the best method I can think of so far:
Predefine a set of coordinates (say if we have a 10x10 grid, we have 100 coordinates that starts with (0,0), (0,10), (0,20), ... etc. There will be 9x9 = 81 grids.
Record the transformations for each individual coordinate on the t-shirt image e.g. (0,0) becomes (51,31), (0, 10) becomes (51, 35), etc.
Triangulate the original image into 81x2=162 triangles (with 2 triangles for each grid). Transform each triangle of the image based on the coordinate transformations obtained in Step 2 and draw it on the t-shirt image.
Problems/questions I have:
I don't know how to smooth out each triangle so that the image on t-shirt does not look ragged.
Is there a better way to do it? I want to make sure I'm not reinventing the wheels here before I proceed with an implementation.
Thanks!
This is called digital image warping. There was a popular graphics text on it in the 1990s (probably from somebody's thesis). You can also find an article on it from Dr. Dobb's Journal.
Your process is essentially correct. If you work pixel by pixel, rather than trying to use triangles, you'll avoid some of the problems you're facing. Scan across the pixels in target bitmap, and apply the local transformation based on the cell you're in to determine the coordinate of the corresponding pixel in the source bitmap. Copy that pixel over.
For a smoother result, you do your coordinate transformations in floating point and interpolate the pixel values from the source image using something like bilinear interpolation.
It's not really a solution for the problem, it's just a workaround :
If you have the 3D model that represents the T-Shirt.
you can use directX\OpenGL and put your image as a texture of the t-shirt.
Then you can ask it to render the picture you want from any point of view.
Maybe you've noticed but Google Image search now has a feature where you can narrow results by color. Does anyone know how they do this? Obviously, they've indexed information about each image.
I am curious what the best methods of analyzing an image's color data to allow simple color searching.
Thanks for any and all ideas!
Averaging the colours is a great start. Just downscale your image to 10% of the original size using a Bicubic or Bilinear filter (or something advanced anyway). This will vastly reduce the colour noise and give you a result which is closer to how humans perceive the image. I.e. a pixel-raster consisting purely of yellow and blue pixels would become clean green.
If you don't blur or downsize the image, you might still end up with an average of green, but the deviation would be huge.
The Google feature offers 12 colors with which to match images. So I would calculate the Lab coordinate of each of these swatches and plot the (a*, b*) coordinate of each of these colors on a two dimensional space. I'd drop the L* component because luminance (brightness) of the pixel should be ignored. Using the 12 points in the (a*, b*) space, I'd calculate a partitioning using a Voronoi Diagram. Then for a given image, I'd take each pixel, calculate its (a*, b*) coordinate. Do this for every pixel in the image and so build up the histogram of counts in each Voronoi partition. The partition that contains the highest pixel count would then be considered the image's 'color'.
This would form the basis of the algorithm, although there would be refinements related to ignoring black and white background regions which are perceptually not considered to be part of the subject of the image.
Average color of all pixels? Make a histogram and find the average of the 'n' peaks?