I'm sure this is already answered somewhere, but I just don't know the correct terminology to search for.
Context: I'm developing some code to generate a PDF using a fairly low-level library, so I'm having to write some basic text layout and fitting routines that break on word boundaries and fit the text within defined constraints (e.g., in a column or around a fixed block).
I'd like to find a reasonably efficient approach for fitting text around an arbitrary shape; e.g., something like this:
(This example was taken from this blog post: http://blog.amyworrall.com/post/11098565269/text-wrap-with-core-text, that was an answer to this question: Rendering CoreText within an irregular shape)
I'm guessing I need to break the text down into a series of boxes and it then becomes a geometry problem of fitting boxes into the shape, but I'm struggling to find good explanations of suitable algorithms or approaches for this. Delving into browser engine layout code to see how they do it is a case of getting lost in all the detail.
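The simplest version of the box idea I can think of is a scan-line pass: slice the shape into horizontal strips one text line tall and collect the rectangle(s) available on each line. A minimal sketch, assuming the shapely library for the polygon math (the line height would come from your font metrics, and the bounding-box shortcut over-approximates slanted edges):

```python
from shapely.geometry import Polygon, box

def line_rects(shape, line_height):
    """Yield (x, y, width, height) boxes, one strip per line of text."""
    min_x, min_y, max_x, max_y = shape.bounds
    y = min_y
    while y + line_height <= max_y:
        strip = box(min_x, y, max_x, y + line_height)
        region = shape.intersection(strip)
        # The intersection may be one polygon or several (e.g. a strip
        # crossing a hole splits into a left and a right piece).
        for part in getattr(region, "geoms", [region]):
            if part.is_empty or part.area == 0:
                continue
            # Bounding box of the piece; shrink to the largest inscribed
            # rectangle if the shape's sides are far from vertical.
            x0, _, x1, _ = part.bounds
            yield (x0, y, x1 - x0, line_height)
        y += line_height

# Usage: an L-shaped column; feed each box to the text-fitting routine.
l_shape = Polygon([(0, 0), (200, 0), (200, 100), (100, 100), (100, 300), (0, 300)])
for rect in line_rects(l_shape, line_height=14):
    print(rect)
```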
Related
I have an idea for an app that takes a printed page with four squares, one in each corner, and allows you to measure objects on the paper as long as at least two squares are visible. I want a user to be able to take a picture from less-than-perfect angles and still have the objects measured accurately.
I'm unable to figure out exactly how to find information on this subject due to my lack of knowledge in the area. I've been able to find examples of OpenCV code that does some interesting transforms and the like, but I've yet to figure out how to phrase what I'm asking in simpler terms.
Does anyone know of papers or mathematical concepts I can look up to get further into this project?
I'm not quite sure how or who to ask other than people on this forum, sorry for the somewhat vague question.
What you describe is very reminiscent of augmented reality marker tracking. Maybe you can start by searching these words on a search engine of your choice.
A single marker, if designed correctly, can both be identified without being confused with other markers AND be used to determine how the surface is positioned in 3D space in front of the camera.
But that's all very difficult and advanced stuff; I'd strongly advise against trying to implement something like this yourself, as it could take years of research... Your only realistic option is to use a ready-made open source library that outputs the data you need for your app.
One may not even exist; in that case you'll have to buy one. Given how niche your problem is, that would be perfectly plausible.
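For the detection half, OpenCV's contrib "aruco" module is one ready-made building block; whether its printed square markers can stand in for your corner squares is an assumption, and the page layout below is entirely hypothetical. A rough sketch (note the aruco API has moved around a bit across OpenCV versions):

```python
import cv2
import numpy as np

# Hypothetical layout: each marker's four corners on the printed page,
# in millimetres, in the order detectMarkers reports corners
# (top-left, top-right, bottom-right, bottom-left).
PAPER_MM = {
    0: [(10, 10), (40, 10), (40, 40), (10, 40)],
    1: [(170, 10), (200, 10), (200, 40), (170, 40)],
    2: [(170, 257), (200, 257), (200, 287), (170, 287)],
    3: [(10, 257), (40, 257), (40, 287), (10, 287)],
}

frame = cv2.imread("photo.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)

if ids is not None and len(ids) >= 2:
    img_pts = np.concatenate([c.reshape(-1, 2) for c in corners])
    paper_pts = np.float32([p for i in ids.flatten() for p in PAPER_MM[i]])
    # Homography from image pixels to paper millimetres: map any two
    # pixel coordinates through H and the distance between the results
    # is the real on-paper distance.
    H, _ = cv2.findHomography(img_pts, paper_pts, cv2.RANSAC)
```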
Here I cover only the programming aspect; if you want, you can work out the mathematical aspect from these examples. Most of the functions you need are available in OpenCV. Here are some examples in Python:
To detect the printed paper, you can use the cv2.findContours function. The outermost contour is possibly the paper, but you need to test on actual images. https://docs.opencv.org/3.1.0/d4/d73/tutorial_py_contours_begin.html
If the paper is skewed (not photographed at a perfect angle), you can find the angle with cv2.minAreaRect, which returns the angle of the contour you found above. https://docs.opencv.org/3.1.0/dd/d49/tutorial_py_contour_features.html (part 7b).
If you want to rotate the paper, use cv2.warpAffine. https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_geometric_transformations/py_geometric_transformations.html
To detect the objects on the paper, there are several methods. The easiest is to use the contours from above. If the objects are in certain colors, you can detect them using a color filter. https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces.html
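Putting those pieces together, a minimal deskew sketch might look like this (the file name is a placeholder, Otsu thresholding is an assumed binarization choice, and the angle convention of minAreaRect varies across OpenCV versions):

```python
import cv2

img = cv2.imread("scan.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
_, thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# findContours returns 2 or 3 values depending on the OpenCV version;
# [-2] picks out the contour list in either case.
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)[-2]
paper = max(contours, key=cv2.contourArea)  # assume the page is the biggest blob

(cx, cy), (w, h), angle = cv2.minAreaRect(paper)
M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
deskewed = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
```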
I just read Scott Hanselman's post on Guided View Technology in comics,
and I thought this would be awesome if implemented in other media (specifically manga).
I mean, reading right to left can in itself be a little weird to start with, and this would lower the barrier to entry for new readers.
I was wondering if there is an open source project out there in the wild or, if not, a way to get started with something like this, as I am not an image processing guru. In particular, I just need to figure out which lines are panel borders and where to slice the page into smaller pictures. Because comics all have their own preferences for line thickness, I'm not sure there is a simple way to do this that works across many different border thicknesses and styles. Language doesn't matter much here; I'm really asking about concepts and patterns of attack.
You can start by looking at the Duda-Hart implementation of the Hough transform for lines.
http://en.wikipedia.org/wiki/Hough_transform
The Hough algorithm will yield equations for straight lines. From that you can find intersections, identify rectangles, etc.
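A first pass with OpenCV's probabilistic Hough transform might look like the sketch below; the Canny and Hough parameters are assumed starting values, and the 5-pixel tolerance presumes mostly axis-aligned panel borders:

```python
import cv2
import numpy as np

page = cv2.imread("comic_page.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(page, 50, 150)

# Probabilistic Hough returns finite segments as (x1, y1, x2, y2).
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=120,
                        minLineLength=100, maxLineGap=10)

horizontal, vertical = [], []
for x1, y1, x2, y2 in ([] if lines is None else lines[:, 0]):
    if abs(y2 - y1) <= 5:    # near-horizontal border candidate
        horizontal.append((x1, y1, x2, y2))
    elif abs(x2 - x1) <= 5:  # near-vertical border candidate
        vertical.append((x1, y1, x2, y2))
# Crossings of horizontal and vertical segments give candidate panel
# corners; group those into rectangles to get the panels.
```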
You can also use a kernel-based corner detection to find T-, L-, and X-intersections.
http://en.wikipedia.org/wiki/Corner_detection
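For example, Harris corners are one kernel-based detector built into OpenCV (the parameters below are assumed starting values); note it only flags corner-like points, so telling a T from an L or an X still requires inspecting the local edge directions:

```python
import cv2
import numpy as np

page = cv2.imread("comic_page.png", cv2.IMREAD_GRAYSCALE)
response = cv2.cornerHarris(np.float32(page), blockSize=2, ksize=3, k=0.04)
corners = np.argwhere(response > 0.01 * response.max())  # (row, col) pairs
```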
One difficulty is that some panels in comics won't have "hard" edges, or may have edges that are squiggly, circular/elliptical, French-curvy, etc. You can find particular algorithms for particular problems, but it would be hard to generalize these algorithms into a set of rules and programmatic logic that works for all (or even most) samples. Arguably, a hallmark of a good comic is elegant and sometimes surprising panelization, "surprising" being a synonym for unpredictable. Although there are many methods to "segment" an image into different regions, this is still an active area of research.
But if you start with Hough lines you'll have a good start and learn a lot about image processing.
I'm trying to find an efficient way of acceptable complexity to
detect an object in an image so I can isolate it from its surroundings
segment that object into its sub-parts and label them so I can then fetch them at will
It's been 3 weeks since I entered the image processing world, and I've read about so many algorithms (SIFT, snakes, more snakes, Fourier-related methods, etc.) and heuristics that I don't know where to start or which one is "best" for what I'm trying to achieve. Bearing in mind that the image dataset of interest is a pretty large one, I don't even know if I should use an algorithm implemented in OpenCV or implement one on my own.
To summarize:
Which methodology should I focus on? Why?
Should I use OpenCV for that kind of stuff or is there some other 'better' alternative?
Thank you in advance.
EDIT -- More info regarding the datasets
Each dataset consists of 80K images of products sharing the same:
concept (e.g., t-shirts, watches, shoes)
size
orientation (90% of them)
background (95% of them)
All pictures in each dataset look almost identical, apart from the product itself. To make things a little clearer, let's consider only the 'watch dataset':
All the pictures in the set look almost exactly like this:
(again, apart from the watch itself). I want to extract the strap and the dial. The thing is that there are lots of different watch styles, and therefore shapes. From what I've read so far, I think I need a template algorithm that allows bending and stretching, so as to be able to match straps and dials of different styles.
Instead of creating three distinct templates (upper part of strap, lower part of strap, dial), it would be reasonable to create only one and segment it into 3 parts. That way, I could be confident that each part was detected in the intended position relative to the others, e.g. the dial would not be detected below the lower part of the strap.
Of all the algorithms/methodologies I've encountered, active shape/appearance models seem to be the most promising. Unfortunately, I haven't managed to find a decent implementation, and I'm not confident enough that it's the best approach to go ahead and write one myself.
If anyone could point out what I should be really looking for (algorithm/heuristic/library/etc.), I would be more than grateful. If again you think my description was a bit vague, feel free to ask for a more detailed one.
From what you've said, here are a few things that pop up at first glance:
The simplest thing to do is binarize the image and run Connected Components using OpenCV or the cvBlob library. For simple images with a non-complex background, this usually yields the objects.
However, looking at your sample image, texture-based segmentation techniques may work better: the watch dial, the straps, and the background differ widely in texture/roughness, and this could be an ideal way to separate them.
The roughness of a region can easily be found with the Eigen transform (explained a bit on SO; check the link to the research paper provided there), and the Mean Shift filter can then be applied to the output of the Eigen transform. This will give regions clearly separated according to texture. Both the pyramidal Mean Shift and finding eigenvalues by SVD are implemented in OpenCV, so unless you can out-optimize the built-ins, it's better (and easier) to use the inbuilt functions where they exist, as far as speed and efficiency are concerned.
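The Eigen transform itself isn't shipped as an OpenCV function, but the mean shift and connected-components pieces are; a rough sketch of those parts, run here on the raw colors rather than on a roughness map, with sp/sr as assumed values to tune:

```python
import cv2

img = cv2.imread("watch.jpg")
# Pyramidal mean shift merges pixels into homogeneous regions; sp is
# the spatial window and sr the color window (both assumed values).
shifted = cv2.pyrMeanShiftFiltering(img, sp=21, sr=51)

# Binarize and label connected regions, as in the first suggestion.
gray = cv2.cvtColor(shifted, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
# stats[i] holds the bounding box and pixel area of region i
# (region 0 is the background).
```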
I think I would turn the problem around. Instead of hunting for the dial, I would use a set of robust features from the watch to 'stitch' the target image onto a template. The first watch has a set of white squares in the dial; the second watch has a number of white circles. Per type of watch, I would:
Segment out the squares or circles in the dial. Segmentation steps can be tricky as they are usually both scale and light dependent
Estimate the centers or corners of the above found feature areas. These are the new feature points.
Use the Hungarian algorithm to match features between the template watch and the target watch (see the sketch after this list). Alternatively, one can take the surroundings of each feature point in the original image and match these using cross-correlation
Use matching features between the template and the target to estimate scaling, rotation and translation
Stitch the image
As the image is now in a known form, one can extract the regions simply via preset coordinates
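The matching and transform-estimation steps might look like the sketch below, assuming SciPy for the Hungarian algorithm; the feature points are hypothetical stand-ins for the segmented dial markings:

```python
import cv2
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical feature points: centers of the white dial markings in
# the template image and the target image (same count, any order).
template_pts = np.float32([[120, 80], [200, 80], [120, 160], [200, 160]])
target_pts = np.float32([[205, 85], [122, 158], [118, 83], [202, 162]])

# Hungarian matching on a pairwise-distance cost matrix; a patch
# cross-correlation score could be used as the cost instead.
cost = np.linalg.norm(template_pts[:, None] - target_pts[None, :], axis=2)
rows, cols = linear_sum_assignment(cost)

# Estimate scale + rotation + translation from the matched pairs; the
# resulting matrix warps ("stitches") the target onto the template.
M, _ = cv2.estimateAffinePartial2D(target_pts[cols], template_pts[rows])
```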
I'm looking for guidance on implementing a view that renders an NSAttributedString within a polygon with holes, wrapping and reflowing text to fit the geometry. It's not CoreText that's the issue, but the general problem of partitioning an irregular shape into an ordered sequence of squat rectangles.
Similar questions haven't been answered fully:
How to fill a shape with text in Javascript
https://stackoverflow.com/q/3048305
CoreText's CTFramesetter does not support rendering into a CGPath
https://stackoverflow.com/q/3813318
CoreText handles an unbelievable amount of the grunt work associated with text layout and display, so I can't help but suspect that I'm reinventing a wheel. For the purposes of this question, please assume that I can check the substring that fits within a given rectangle, taking into account word wrap and hyphenation.
Edit: I've since decided to just sweep left-to-right drawing as much as fits between boundaries. It looks a bit haphazard even though I'm breaking at natural word boundaries, so I'd still appreciate guidance on how other applications wrap text.
Edit #2: It looks decent now that it supports basic word wrap and avoids rendering very short lines. My question must have been too vague. Thanks for looking.
Edit #3: Amorya points out that CTFramesetter now accepts any CGPath.
I wrote a blog post about achieving text wrap with Core Text:
http://blog.amyworrall.com/post/11098565269/text-wrap-with-core-text
The feature is new in iOS 4.3 and Mac OS X Lion. You can now, firstly, draw inside non-rectangular paths and, secondly, pass in other paths to mask the flow (i.e., be the holes you wrap around).
For my new assignment I am looking for a method to detect the presence of text in an image. The image is a map; it could, for example, be a Google map. The task is to detect where street/city labels are placed.
I know that the OpenCV library has algorithms that can detect features (for example, human faces), such as the Haar classifier or HOG (histogram of oriented gradients), but I've heard that the training process for such algorithms is quite difficult.
Do you know of any algorithm, method, or library that could do that (detect the presence of text in an image)?
Thanks,
John
There is a standard problem in vision called text detection in images. It is quite different from OCR: OCR concerns itself with what the text says, while text detection is about determining whether there is text in the image at all. Adi Shavit's third link is a method to address this problem. You can look on Google Scholar for well-cited articles on text detection.
There are several possible approaches you can take.
Use OCR. A search for OCR on Stack Overflow will show many options. These include Tesseract and OCRopus.
If your text uses a very specific, fixed font, you may get away with simple template matching (see the sketch after this list).
In the more general case you might want to take a look at "Detecting Text in Natural Scenes with Stroke Width Transform"
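For the fixed-font case, a minimal template-matching sketch (the file names and the 0.8 score threshold are assumptions; you would need one template per character or per recurring label):

```python
import cv2
import numpy as np

map_img = cv2.imread("map.png", cv2.IMREAD_GRAYSCALE)
glyph = cv2.imread("glyph.png", cv2.IMREAD_GRAYSCALE)  # one rendered character

scores = cv2.matchTemplate(map_img, glyph, cv2.TM_CCOEFF_NORMED)
h, w = glyph.shape
for y, x in zip(*np.where(scores >= 0.8)):  # threshold to tune
    cv2.rectangle(map_img, (int(x), int(y)), (int(x) + w, int(y) + h), 255, 1)
```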
UPDATE Jan. 2017
The OpenCV 3.2 contrib module now has a text detection module.
It also includes a sample (C++, Python) of how to use it.
You need to tune this to a specific type of map image, or the problem is going to be very difficult (see the previous post for links to articles).
OCR is the way to go, and you should use an existing library. However, OCR is mainly done on text on white backgrounds. To reduce your problem to a regular OCR problem, you should try working in the map's color space. Likely the map text has a very specific color, and this alone may be enough to find the text pixels. You can then filter the detected pixels based on the size of the connected regions they form.
If you literally only want to find the locations of text labels, you can do the above, and pretty much just skip the OCR step. If the labels are not too close, simple clustering algorithms can be used to find their respective positions.
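A rough sketch of that color-filter route; the near-black bounds and the size limits are guesses to tune per map style:

```python
import cv2

map_img = cv2.imread("map.png")
# Assume the labels are printed in near-black (BGR bounds are guesses).
mask = cv2.inRange(map_img, (0, 0, 0), (80, 80, 80))

# Keep only character-sized connected regions.
n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
chars = [centroids[i] for i in range(1, n)
         if 20 < stats[i, cv2.CC_STAT_AREA] < 500]
# Clustering nearby centroids (e.g. by distance) then groups the
# characters into word-label positions.
```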