I am looking for JavaScript libraries that can help me do the following:
1. Given an image, crop it using 4 coordinate points (this may result in a non-rectangular quadrilateral).
2. Transform the cropped region into a regular rectangle.
There is an example here.
I've been looking at libraries like JCrop (and many others), but as far as I can see they only crop using regular rectangles.
The libraries could be for either the client or Node.js.
Bonus points for a computer vision library that can do corner detection.
Many thanks,
As far as I can see, the "transformation" just crops the image anyway.
So first transform the 4 coordinates into a regular rectangle, then use a library like JCrop.
Example: (each coordinate is x,y)
[left-up] | [right-up]
[left-down] | [right-down]
4 coords:
22,11 | 45,13
25,56 | 47,62
Rectangle:
25,13 | 45,13
25,56 | 45,56
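The worked example above can be sketched in Python. This is a minimal sketch, not a library call: `inner_rectangle` is a made-up helper, and taking max/min this way picks the inner axis-aligned rectangle so the crop stays inside the quadrilateral.

```python
# Hedged sketch: inner axis-aligned rectangle of a quadrilateral
# given as four (x, y) corners, per the worked example above.
def inner_rectangle(left_up, right_up, left_down, right_down):
    left = max(left_up[0], left_down[0])      # rightmost of the left edges
    right = min(right_up[0], right_down[0])   # leftmost of the right edges
    top = max(left_up[1], right_up[1])        # lowest of the top edges
    bottom = min(left_down[1], right_down[1]) # highest of the bottom edges
    return (left, top), (right, bottom)

# Reproduces the example corners 22,11 | 45,13 / 25,56 | 47,62:
print(inner_rectangle((22, 11), (45, 13), (25, 56), (47, 62)))
# -> ((25, 13), (45, 56))
```

The resulting rectangle can then be handed to a rectangular cropper such as JCrop.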
I am trying to find a way to determine whether an image needs to be rotated in order for the text to be horizontally aligned. And if it does need to be rotated then by how many degrees?
I am sending the images to tesseract and for tesseract to be effective, the text in the images needs to be horizontally aligned.
I'm looking for a way to do this without depending on the "Orientation" metadata in the image.
I've thought of following ways to do this:
Rotate the image 90 degrees clockwise four times and send all four images to tesseract. This isn't ideal because each image has to be processed four times.
Use a Hough line transform to see whether the lines are vertical or horizontal, and rotate the image if they are vertical. Even then the image might still need to be rotated 180 degrees, so I'm unsure how effective this would be.
I'm wondering if there are other ways to accomplish this using OpenCV, ImageMagick, or any other image processing techniques.
If you have about 1000 images labelled horizontal or vertical, you can resize them to 224x224 and then fine-tune a convolutional neural network, like AlexNet or VGG, for this task. If you want to know how many right rotations to make for an image, set the labels to the number of clockwise rotations: 0, 1, 2, 3.
http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
Attempting OCR on all 4 orientations seems like a reasonable choice, and I doubt you will find a more reliable heuristic.
If speed is an issue, you could OCR a small part of the image first. Select a rectangular region that has the proper amount of edge pixels and white/black ratio for text, then send it to tesseract in different orientations. With a small region you could even try steps smaller than 90°, or combine this with another heuristic like Hough.
If you remember the most likely orientation based on previous images, and stop once an orientation is successfully processed by tesseract, you probably do not even have to try most orientations in most cases.
You can figure this out in a terminal with tesseract's psm option.
tesseract --psm 0 "infile" "outfile" will create outfile.osd which contains the info:
Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 27.93
Script: Latin
Script confidence: 6.55
man tesseract
...
--psm N
Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for N are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR. (not implemented)
...
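The outfile.osd text shown above is plain key/value lines, so the "Rotate" value can be pulled out programmatically. A minimal sketch, assuming that format (`parse_osd` is a made-up helper, not part of tesseract):

```python
# Hedged sketch: parse the key/value lines of an outfile.osd file
# produced by `tesseract --psm 0` into a dict.
def parse_osd(text):
    info = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            info[key.strip()] = value.strip()
    return info

# Sample content, copied from the answer above.
sample = """Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 27.93
Script: Latin
Script confidence: 6.55"""

osd = parse_osd(sample)
print(osd["Rotate"])  # -> 270
```

The "Rotate" value is the rotation tesseract suggests applying to correct the page.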
I have about 6000 aerial images taken by a 3DR drone over vegetation sites.
The images have to overlap to some extent because the drone flights cover the area east-west and then north-south, so the images show the same area from two directions. I need the overlap for extra accuracy.
I don't know how to write code in IDL to combine the images and create that overlap. Can anyone help, please?
Thanks
What you need is something identifiable that occurs in both images. Preferably you would have several such features across the field of view, so that you can recover the correct rotation as well as a simple x-y shift.
The basic steps you will need to follow are:
Source Identification - Identify sources in all images that will later be used to align the images. Make sure the centering of these sources is good so that they will align better later.
Basic alignment. Start with a guess at where the images should align, then try to match the sources.
Match the sources. There are several libraries that can do this for stars (in astronomical images) that could be adapted for this.
Shift and rotate the images. This can be applied to the pixels directly, or recorded in the header so that a program can manipulate the pixels on the fly.
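As a minimal sketch of the shift-matching step, here is phase correlation with plain numpy. This recovers only an integer x-y translation (rotation is not handled), and `phase_correlation_shift` is a made-up name, not an IDL or library routine:

```python
import numpy as np

# Hedged sketch: recover the integer (dy, dx) translation between two
# overlapping images via phase correlation (cross-power spectrum peak).
def phase_correlation_shift(a, b):
    F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    F /= np.abs(F) + 1e-12                       # normalise: cross-power spectrum
    corr = np.fft.ifft2(F).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return int(dy), int(dx)

# Synthetic check: shift a random image and recover the shift.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, (5, -3), axis=(0, 1))
print(phase_correlation_shift(shifted, img))  # -> (5, -3)
```

For real aerial images you would run this on the overlapping strips only, and handle rotation separately (e.g. via the matched sources).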
I drew some quad like this with OpenGL:
_______ _______
| || |
| || |
| || |
|_______||_______|
Currently I draw the second quad using firstQuad.pos.x + width, which is calculated manually. But when I want to scale them in at their center point, I was wondering: is it right to keep using the calculated value, or should I glTranslatef them one by one, then glTranslatef to their center, then use glScalef to scale them in? How do I do this right?
Unless you are updating the quad's position firstQuad according to your transformations, yes, you will have to use the GL matrix manipulation functions as you described. I'm assuming you are using legacy GL here (2.1 and older); modern releases no longer provide matrix manipulation functions.
What you must understand is that a GL transformation acts on the basis and origin used for all further draw calls, until it is reset to a previous state with glPopMatrix().
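The glTranslatef/glScalef sequence described above can be sketched as plain matrix math. This is a sketch of the underlying arithmetic (numpy, column-vector convention as in legacy GL; the helper names and values are made up): scaling about a shared center c is T(c) @ S(s) @ T(-c).

```python
import numpy as np

# 2D homogeneous translation matrix, mirroring glTranslatef.
def translate(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

# 2D homogeneous scale matrix, mirroring glScalef.
def scale(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

center = (10.0, 5.0)  # assumed shared center of the two quads
M = translate(*center) @ scale(2, 2) @ translate(-center[0], -center[1])

print(M @ np.array([10.0, 5.0, 1.0]))  # the center is a fixed point
print(M @ np.array([12.0, 5.0, 1.0]))  # a corner moves away from it
```

In legacy GL the same effect is glTranslatef(cx, cy, 0); glScalef(2, 2, 1); glTranslatef(-cx, -cy, 0) before drawing both quads, wrapped in glPushMatrix()/glPopMatrix().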
Note: This is not for a webpage, it is a simple program that holds a set of images and will randomly pick a number of images and display them on the screen. Imagine working with an image editor and manually positioning imported images on the canvas.
I am having difficulty coming up with a way to position a set of arbitrary images on a screen of fixed dimension (it's just a window)
So for example, if I have one image, I would probably just position it in the center of the screen.
|
If I have two images, I would try to place them in the center of the screen, but then spread them apart horizontally so that they look centered relative to each other and also the screen.
| |
But what if one image is larger than the others? I might have something like
|-----|
| |
Similarly, maybe I have two larger ones and two smaller ones
|-----| |-----|
| |
So that the large one appears "in the back" while the small ones are up front.
It is inevitable that some images will end up covering up parts of other images but the best I can do is try to make it as orderly as possible.
I can quickly grab the dimensions of each image object that is to be drawn, and there is a limit on how many images will be drawn (from 1 to 8 inclusive).
Images can be drawn anywhere on the screen, and if any part of the image is outside of the screen those parts will just be cut off. All images have dimensions smaller than the dimensions of the screen, and are typically no bigger than 1/4 of the entire screen.
What is a good way to approach this problem? Even handling the base cases like having two images (of possibly different sizes) is already pretty confusing.
You could treat this as the 2D bin packing problem, which will optimise for non-overlapping rectangles in a "compact" way, though aesthetics won't be a consideration.
If you want to roll your own, you could try placing all images on the canvas on a grid, with the centre-to-centre spacing being large enough that no images overlap. Then "squash" the images closer together, left to right and top to bottom, to reduce the amount of whitespace.
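As a minimal sketch of the "centered relative to each other and the screen" case for a single row of images (`centered_row`, the screen width, and the gap value are all assumptions, not part of any framework):

```python
# Hedged sketch: lay out N image widths in a horizontally centered row
# with a fixed gap, returning each image's left-edge x coordinate.
def centered_row(screen_w, widths, gap=10):
    total = sum(widths) + gap * (len(widths) - 1)
    x = (screen_w - total) / 2   # left edge of the whole row
    positions = []
    for w in widths:
        positions.append(x)
        x += w + gap
    return positions

print(centered_row(800, [200]))        # -> [300.0]
print(centered_row(800, [200, 100]))   # -> [245.0, 455.0]
```

The same idea applies vertically; the "large one in the back" case could then be handled by drawing rows back-to-front, largest images first.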
HTML tables at 100% width and height (with overflow disabled) are a good starting point IMO. In a first iteration, just order the pictures by size and make 8 templates like:
<tr><td><img></td></tr>
<tr><td><img></td><td><img></td></tr>
2 rows, the first with colspan=2
...
then find the ugly cases and make special rules for them (e.g. for 3 vertical images, use 1 row, ...)
I have a large collection of scanned images, and they are all somewhat skewed, with a white area around them.
So, these images have rectangles of colors, surrounded by a large white area. The problem is that these rectangles of color are not parallel to the image border.
I'm sure there must be a way to programmatically detect these rectangles of color, so that I can rotate the image (thus un-skewing it) and then crop it so that just the interesting part is left. I guess I'm not really sure what this process is called, so I am having trouble searching for a solution on Google.
Does anyone know of an approach that would get me started? Any libraries out there that I should look into? Or the name of an algorithm that would help?
I am planning on using Java for this project, but I haven't really started yet, so I am open to library suggestions in any language.
border detection
Hough transform (if all rectangles in an image have the same skew)
rectangle contour detection (connected-component contour, then minimum-area bounding rectangle)
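One possible sketch of the skew-detection idea without extra libraries: take the (x, y) coordinates of the coloured (non-white) pixels and run PCA on them; the principal axis of an elongated rectangle gives its rotation angle. `estimate_skew_degrees` is a made-up name, and this assumes a single dominant rectangle, unlike the more robust Hough/contour approaches above.

```python
import numpy as np

# Hedged sketch: estimate skew (in degrees) from foreground pixel
# coordinates via PCA on their covariance matrix.
def estimate_skew_degrees(points):
    pts = points - points.mean(axis=0)
    cov = pts.T @ pts
    eigvals, eigvecs = np.linalg.eigh(cov)
    vx, vy = eigvecs[:, np.argmax(eigvals)]   # dominant axis direction
    angle = np.degrees(np.arctan2(vy, vx))
    return ((angle + 90) % 180) - 90          # fold into [-90, 90)

# Synthetic check: a wide rectangle of points rotated by 7 degrees.
xs, ys = np.meshgrid(np.arange(200), np.arange(40))
rect = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
theta = np.radians(7)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated = rect @ R.T
print(round(estimate_skew_degrees(rotated), 1))  # ~7.0
```

The recovered angle can then be used to rotate the scan back and crop to the bounding box of the foreground pixels.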
Alyn is a third-party package to detect and fix skew in images containing text. It uses Canny edge detection and the Hough transform to find the skew.
To detect the skew, just run
./skew_detect.py -i image.jpg
To correct the skew, run
./deskew.py -i image.jpg -o skew_corrected_image.jpg
You might also try scikit-image: http://scikit-image.org/docs/dev/auto_examples/.
It's a great library for the Hough transform, and it also has other methods, such as the Radon transform and geometric transformations, for this kind of task.
It is a Python library.