For specific reasons, I'm implementing my own font rendering, and I need an algorithm that computes the bounding box of a string from the bounding boxes of the individual glyphs together with their respective advances. For clarification see this illustration:
For each of the N glyphs, the bounding box relative to the glyph origin (xmin, xmax, ymin, ymax) and the advance are given (pseudocode):
int xmin[N], xmax[N], ymin[N], ymax[N], adv[N];
I will give the answer myself if no one bites.
For the general case you won't be able to accurately measure a string using just the glyph bounding boxes and the advance widths. Hinting, kerning, and ligatures can significantly modify the sizes of glyphs and their spacing, depending on their position in the string as well as on the chosen font size.
The naive way of calculating the string length, iterating over the glyphs and adding up the advance width and xMin, will only be correct for certain fonts at certain sizes, and shouldn't be used if at all possible.
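For reference, here is a minimal sketch of that naive accumulation in Python (rather than the question's C-style pseudocode), assuming the pen starts at x = 0 and subject to all the caveats above:
# glyphs: list of (xmin, xmax, ymin, ymax, adv) tuples, one per glyph
def naive_text_bbox(glyphs):
    pen_x = 0
    bx_min = by_min = float("inf")
    bx_max = by_max = float("-inf")
    for xmin, xmax, ymin, ymax, adv in glyphs:
        bx_min = min(bx_min, pen_x + xmin)
        bx_max = max(bx_max, pen_x + xmax)
        by_min = min(by_min, ymin)
        by_max = max(by_max, ymax)
        pen_x += adv  # move the pen to the next glyph origin
    return bx_min, bx_max, by_min, by_max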
The easiest way to get font metrics is to use an established font rendering engine to either directly get the length of the text (if the API exposes that) or render the text to a surface and find the min/max pixel in the X/Y directions.
When using dot to draw a graph, is there a way to specify the desired node label fontsize and a maximum width for the overall graph?
I tried setting the graph size to "2.79,10000000" in order to get an image at most 2.79in wide. The nodes default to a specific fontsize and a margin of "0.0,0.0".
When outputting this, dot produces a pdf that is 2.79in wide, but the fonts have been scaled down. If I then correct the fontsize by the factor by which the font was shrunk, the output looks fine, i.e., my labels appear in the correct fontsize.
Is there a way to achieve the desired maximum width plus the fixed fontsize without having to manually correct the fontsize by a factor?
I don't think there is an easy general solution for this particular problem: the graph first gets laid out, then scaled down (including the text) to fit the size requirement (see also Controlling the size).
If the text of a wide graph were to stay the same size when the graph is scaled down, the final width of the graph would in most cases stay almost the same.
For some simple graphs, you may specify the ratio in addition to the page size and make sure that the graph to generate will not be wider than the size (in order to prevent shrinking).
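As a hedged illustration (the attribute values below are made up), combining size with ratio might look like:
digraph G {
    size="2.79,10000000";    // maximum drawing size in inches
    ratio="compress";        // compress the layout instead of scaling it down
    node [fontsize=10, margin="0.0,0.0"];
    a -> b -> c;
}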
I am looking for algorithms to, given an image containing a crossword:
crop the image to just the crossword
distinguish between regular and barred crosswords
extract the grid size and the positions of the black squares/bars
The crossword itself can be assumed to be regular (i.e. I am interested in crosswords that have been generated by some program and published as an image, rather than scanned paper-based crosswords), and I would like the program to run without needing any inputs other than the image bitmap.
I can think of some brute-force multi-pass ways to do this (essentially using variants of imagemagick's hit-and-miss filter and then looping over the image looking for leftover dots) but I'm hoping for better ideas from people who actually know about image processing.
This is a really broad question, but I will try to give you some pointers.
These are the steps you need to take:
Detect the position of the crossword.
Detect the grid of the crossword. For this, you will need some Computer Vision algorithm (for example the Hough lines detector).
For each cell, you need to determine whether it contains a character or not. To do so, simply analyze the "amount" of white the cell contains.
For the cells containing a character, you need to recognize it. To do so, you need an OCR engine; I recommend Tesseract.
Create your own algorithm for solving the crosswords. You can use this.
And here (1,2,3) you have an example of a Sudoku Solver in Python. The first steps are common to your problem so you can use OpenCV to solve it like this:
import cv2
import numpy as np

# Load the image and convert it to grayscale
img = cv2.imread('sudoku.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)
thresh = cv2.adaptiveThreshold(gray, 255, 1, 1, 11, 2)

# Detect the contours (the lines of the sudoku grid)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Find the outer square of the sudoku: the biggest four-sided contour
biggest = None
max_area = 0
for i in contours:
    area = cv2.contourArea(i)
    if area > 100:
        peri = cv2.arcLength(i, True)
        approx = cv2.approxPolyDP(i, 0.02 * peri, True)
        if area > max_area and len(approx) == 4:
            biggest = approx
            max_area = area
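From there, a rough sketch of step 3 (not part of the linked example) could warp the detected square to a fixed size and measure the amount of white per cell; the 9x9 grid, the 450-pixel side, the corner ordering, and the 0.1 threshold are all illustrative assumptions:
# Warp the biggest quadrilateral to a 450x450 square (this assumes the four
# corners in 'biggest' are already ordered top-left, top-right,
# bottom-right, bottom-left).
side, cells = 450, 9
src = np.float32(biggest.reshape(4, 2))
dst = np.float32([[0, 0], [side, 0], [side, side], [0, side]])
warp = cv2.warpPerspective(thresh, cv2.getPerspectiveTransform(src, dst),
                           (side, side))
# Measure the fraction of white pixels in each cell.
step = side // cells
for row in range(cells):
    for col in range(cells):
        cell = warp[row * step:(row + 1) * step, col * step:(col + 1) * step]
        has_character = cv2.countNonZero(cell) / float(step * step) > 0.1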
Using a screenshot of the linked crossword as an example, I assume that:
the crossword grid is crisp, i.e. the horizontal and vertical grid lines are drawn at exact pixels with a constant dark colour and that there is no noise inside the grid cells,
the crossword is black or another relatively dark colour ("black") on white or light grey ("white"),
the clue numbers are written in the top left corner,
the crossword is rectangular and regular.
You can then scan the image from top to bottom to find horizontal black lines of sufficient length. A line starts with a black pixel and ends with a white pixel. Other pixels are indicators that it is not a line. (This is to weed out text and buttons.) Do the same for vertical lines.
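A rough sketch of that scan, assuming a grayscale numpy array img where "black" means an intensity below some threshold (the minimum length is a guess):
import numpy as np

def horizontal_lines(img, min_len=50, dark=128):
    lines = []
    for y in range(img.shape[0]):
        x = 0
        while x < img.shape[1]:
            if img[y, x] < dark:          # a line starts with a black pixel
                start = x
                while x < img.shape[1] and img[y, x] < dark:
                    x += 1
                if x - start >= min_len:  # keep only sufficiently long runs
                    lines.append((y, start, x - start))
            else:
                x += 1
    return lines  # one (row, start column, length) triple per line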
Ideally, you now have the crossword lines. If your image is not cropped to the crossword, you might have false positives, such as the button borders. To find the crossword lines, sort them by length and look for the largest contiguous block of the same length. These should be your crossword lines unless you have some degenerate cases.
Now do a nested loop over the horizontal and vertical lines, but skip the first line of each. Look two or three pixels to the northwest of the intersection of the lines. If the pixel is dark, that's a blank. If it is light, it's a cell. This heuristic seems to work well. I say dark and light here, because some crosswords use grey cells to save on ink when printing and some cells are highlighted in the screenshot.
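A sketch of that nested loop, assuming hls and vls hold the sorted row/column coordinates of the detected horizontal and vertical lines:
def classify_cells(img, hls, vls, dark=128):
    blanks = []
    for r, y in enumerate(hls[1:]):       # skip the first horizontal line
        for c, x in enumerate(vls[1:]):   # skip the first vertical line
            if img[y - 3, x - 3] < dark:  # sample a few pixels northwest
                blanks.append((r, c))     # dark pixel -> blank, light -> cell
    return blanks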
If you end up with no blanks, you have a barred crossword. You can find the bars by checking whether one of the pixels to the left and right of a cell border is black.
Lastly, a tip: If you want to use your algorithm to find the cells in a crossword generated with the Crossword Compiler, look at the source. You will find a link to a JavaScript file /puzzles/sample/cryptic_demo/cryptic_demo_xml.js, which contains the crossword as an XML string and also gives you the clues as a bonus.
Older versions of the Crossword Compiler, such as the one used for the Independent Cryptic, hide their data in a file loaded from an applet. The format of that file is binary, but not too hard to read if you know the original data.
Try a Hough transform to find the squares; once you have the squares, check whether each one is dark or white by thresholding a histogram of its grayscale values.
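For example, a hedged sketch with OpenCV (the file name and all thresholds are illustrative):
import cv2
import numpy as np

gray = cv2.imread('crossword.png', cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 50, 150)
# Probabilistic Hough transform to find the grid line segments
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                        minLineLength=80, maxLineGap=5)

def is_dark_square(cell, level=128):
    # A crude histogram test: threshold the cell's mean grayscale value
    return cell.mean() < level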
Thinking of an alternative way to do this.
This is similar in many respects to object recognition in computer vision.
One way would be to use a framework like OpenCV which, trained with some samples of what you want to detect, can detect any similar results.
(HAAR.js, of which I am the author, is a JavaScript library for object detection based on the Viola-Jones algorithm, which is also used by OpenCV.)
Apart from this (or a similar alternative), there is the possibility of constructing a "visual" template of the crossword you want to detect (in a scale-invariant way) and scanning the images for correlations of parts of the image with the template (complexity O(N*M), where N is the size of the image and M the size of the template).
Since crossword grids have relatively constant shapes (especially the fixed outputs of crossword compilers), it should be relatively easy to create a prototype template and succeed in matching (and aligning) the detected regions to extract the shape information.
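As a sketch of the template-correlation idea, OpenCV's matchTemplate computes exactly this kind of correlation; note that it is not scale-invariant by itself, so you would repeat it over an image pyramid. File names and the 0.8 threshold are placeholders:
import cv2

scene = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)
tmpl = cv2.imread('crossword_template.png', cv2.IMREAD_GRAYSCALE)
res = cv2.matchTemplate(scene, tmpl, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(res)
if max_val > 0.8:  # correlation threshold, a guess
    print('template matched at', max_loc)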
I have a grid of wells in an image and I'm trying to analyze this in Matlab. I want to create a box around each well to use as a mask. The way I am trying to go about this is to find the offset vectors from the X and Y normal and then use that to make a grid since I know the size of the wells.
I can mask out some of the wells but not all of them---but this doesn't matter since I know that there is a well in every position (see here). I can use regionprops to get the centers but I can't figure out how to move to the next step.
Here is an image with the centers I can extract
Some people have suggested that I do an FFT of the image but I can't get it to work. Any thoughts or suggestions would be greatly appreciated. Thanks in advance!
Edit: Here is the mask with the centers from the centroid feature of regionprops.
Here's a quick and dirty 2 cents:
First blur and invert the image so that the well lines will have high intensity values vs the rest, and further analysis will be less sensitive to noise:
im=double(imread('im.jpg'));
im=conv2(im,fspecial('Gaussian',10,1),'same');
im2=abs(im-max(im(:)));
Then, take a local threshold using the average intensity around a neighborhood of (more or less) a well size (~200 pixels):
im3=imfilter(im2,fspecial('average',200),'replicate');
im4=im2-im3;
bw=im2bw(im4,0);
Fill holes (or wells):
bw2 = imfill(bw,'holes');
Remove objects smaller than some size:
bw3 = bwareaopen(bw2, 2000, 8);
imagesc(bw3);
You can take it from there...
I'm searching for a certain object in my photograph:
Object: Outline of a rectangle with an X in the middle. It looks like a rectangular checkbox. That's all. So, no fill, just lines. The rectangle will have the same ratios of length to width but it could be any size or any rotation in the photograph.
I've looked at a whole bunch of image recognition approaches. But I'm trying to determine the best for this specific task. Most importantly, the object is made of lines and is not a filled shape. Also, there is no perspective distortion, so the rectangular object will always have right angles in the photograph.
Any ideas? I'm hoping for something that I can implement fairly easily.
Thanks all.
You could try using a corner detector (e.g. Harris) to find the corners of the box, the ends and the intersection of the X. That simplifies the problem to finding points in the right configuration.
Edit (response to comment):
I'm assuming you can find the corner points in your image, the 4 corners of the rectangle, the 4 line endings of the X and the center of the X, plus a few other corners in the image due to noise or objects in the background. That simplifies the problem to finding a set of 9 points in the right configuration, out of a given set of points.
My first try would be to look at each corner point A. Then I'd iterate over the points B close to A. Now if I assume that (e.g.) A is the upper left corner of the rectangle and B is the lower right corner, I can easily calculate, where I would expect the other corner points to be in the image. I'd use some nearest-neighbor search (or a library like FLANN) to see if there are corners where I'd expect them. If I can find a set of points that matches these expected positions, I know where the symbol would be, if it is present in the image.
You have to try if that is good enough for your application. If you have too many false positives (sets of corners of other objects that accidentally form a rectangle + X), you could check if there are lines (i.e. high contrast in the right direction) where you would expect them. And you could check if there is low contrast where there are no lines in the pattern. This should be relatively straightforward once you know the points in the image that correspond to the corners/line endings in the object you're looking for.
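A sketch of that search under a simplifying assumption (an axis-aligned box; for arbitrary rotation you would parameterize by two adjacent corners instead), using goodFeaturesToTrack for the corners and SciPy's cKDTree in place of FLANN for the nearest-neighbor test; all thresholds are illustrative:
import cv2
import numpy as np
from scipy.spatial import cKDTree

gray = cv2.imread('photo.png', cv2.IMREAD_GRAYSCALE)
pts = cv2.goodFeaturesToTrack(gray, maxCorners=500, qualityLevel=0.01,
                              minDistance=5).reshape(-1, 2)
tree = cKDTree(pts)

def symbol_between(a, b, tol=3.0):
    # a = hypothesized top-left corner, b = hypothesized bottom-right corner;
    # predict the two remaining rectangle corners and the center of the X,
    # then require a detected corner near each predicted point.
    (ax, ay), (bx, by) = a, b
    expected = [(bx, ay), (ax, by), ((ax + bx) / 2, (ay + by) / 2)]
    dists, _ = tree.query(expected)
    return bool(np.all(dists < tol))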
I'd suggest the Generalized Hough Transform. It seems you have a fairly simple, fixed shape. The generalized Hough transform should be able to detect that shape at any rotation or scale in the image. You may need to threshold the original image, or pre-process it in some way, for this method to be useful though.
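If your OpenCV build exposes it, there is a ready-made implementation; a rough sketch with the Guil variant, which handles rotation and scale (all parameters are illustrative and will need tuning):
import cv2

templ = cv2.imread('symbol.png', cv2.IMREAD_GRAYSCALE)
img = cv2.imread('photo.png', cv2.IMREAD_GRAYSCALE)
ght = cv2.createGeneralizedHoughGuil()
ght.setTemplate(templ)
ght.setMinAngle(0)
ght.setMaxAngle(360)
ght.setMinScale(0.5)
ght.setMaxScale(2.0)
positions, votes = ght.detect(img)  # candidate (x, y, scale, angle) rows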
You can use local features to identify the object in an image. Feature detection wiki
For example, you can calculate features on some reference image which contains only the object you're looking for and save the results, let's say, to a plain text file. After that you can search for the object just by comparing newly calculated features (on images with some complex scenes containing the object) with the reference ones.
Here's some good resource on local features:
Local Invariant Feature Detectors: A Survey
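A minimal sketch of that workflow with ORB features (SIFT or SURF would be analogous; the file names are placeholders):
import cv2

ref = cv2.imread('reference.png', cv2.IMREAD_GRAYSCALE)    # the object only
scene = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)      # a complex scene
orb = cv2.ORB_create()
kp_ref, des_ref = orb.detectAndCompute(ref, None)
kp_scene, des_scene = orb.detectAndCompute(scene, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_ref, des_scene), key=lambda m: m.distance)
# Many low-distance matches suggest the object is present in the scene.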
Let's say I query for
http://images.google.com.sg/images?q=sky&imgcolor=black
and I get all the black sky images. How does the algorithm behind this actually work?
Based on this paper published by Google engineers Henry Rowley, Shumeet Baluja, and Dr. Yushi Jing, it seems the most important implication of your question about recognizing colors in images relates to google's "saferank" algorithm for pictures that can detect flesh-tones without any text around it.
The paper begins by describing the "classical" methods, which are typically based on normalizing color brightness and then using a Gaussian distribution, or on a three-dimensional histogram built up from the RGB values of the pixels (each color channel is an 8-bit integer from 0-255 representing how much of that color is included in the pixel). Methods have also been introduced that rely on properties such as "luminance" (often incorrectly called "luminosity"), which is the density of luminous intensity perceived by the naked eye from a given image.
The google paper mentions that they will need to process roughly 10^9 images with their algorithm so it needs to be as efficient as possible. To achieve this, they perform the majority of their calculations on an ROI (region of interest) which is a rectangle centered in the image and inset by 1/6 of the image dimensions on all sides. Once they've determined the ROI, they have many different algorithms that are then applied to the image including Face-Detection algs, Color Constancy algs, and others, which as a whole find statistical trends in the image's coloring and most importantly find the color shades with the highest frequency in the statistical distribution.
They also use other features such as entropy, edge detection, and texture definitions.
In order to extract lines from the images, they use the OpenCV implementation (Bradski, 2000) of the probabilistic Hough transform (Kiryati et al., 1991), computed on the edges of the skin-color connected components. This allows them to find straight lines that are probably not body parts, and additionally lets them better determine which colors are most important in an image, a key factor in their image color search.
For more on the technicalities of this topic, including the math and equations, read the Google paper linked at the beginning and look at the Research section of their web site.
Very interesting question and subject!
Images are just pixels. Pixels are just RGB values. We know what black is in RGB, so we can look for it in an image.
Well, one method is, in very basic terms:
Given a corpus of images, determine the high concentrations of a given color range (this is actually fairly trivial), store this data, and index the images accordingly. Now you have essentially the same sort of thing as finding documents containing certain words.
This is a very, very basic description of one possible method.
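A toy sketch of that method (the color tolerance and the 30% cut-off are arbitrary illustrations):
import numpy as np
from PIL import Image

def color_fraction(path, target=(0, 0, 0), tol=40):
    # Fraction of pixels whose RGB value lies within tol of the target color
    px = np.asarray(Image.open(path).convert('RGB'), dtype=int)
    return np.all(np.abs(px - np.array(target)) <= tol, axis=-1).mean()

# Index images by their fraction of near-black pixels, then "search" the index.
index = {p: color_fraction(p) for p in ('sky1.jpg', 'sky2.jpg')}
mostly_black = [p for p, frac in index.items() if frac > 0.3]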
There are various ways of extracting color from an image, and I think other answers addressed them (K-Means, distributions, etc).
Assuming you have extracted the colors, there are a few ways to search by color. One slow, but obvious approach would be to calculate the distance between the search color and the dominant colors of the image using some metric (e.g. Color Difference), and then weight the results based on "closeness."
Another, much faster, approach would be to essentially downscale the resolution of your color space. Rather than deal with all possible RGB color values, limit the extraction to a smaller range like Google does (just Blue, Green, Black, Yellow, etc). Then the user can search with a limited set of color swatches and calculating color distance becomes trivial.
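A hedged sketch of that quantization; the palette values are rough guesses, and plain RGB distance is used for brevity (a perceptual metric in Lab space would be better, as noted above):
PALETTE = {'black': (0, 0, 0), 'white': (255, 255, 255),
           'blue': (0, 0, 255), 'green': (0, 128, 0), 'yellow': (255, 255, 0)}

def nearest_swatch(rgb):
    # Snap an RGB pixel to the closest palette entry by squared distance
    return min(PALETTE, key=lambda name: sum((a - b) ** 2
                                             for a, b in zip(PALETTE[name], rgb)))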