Performing OCR on gas meter - image

I want to perform OCR on a gas meter so it can read the value. An example of a meter I want to perform OCR on:
The OCR should return 25539144 in this case.
As you can see there is a bit of a problem: There is a lot of text around the meter. So a normal OCR library wouldn't work here since it would return the text around it aswell.
I already tried object detection to detect the meter but the only one that seems to work well (because I have only 50 pictures) is the azure cognitive services. The problem is that later on it should be able to detect it in a live stream so a web service is not possible.
Can anyone help me in the right direction to tackle this problem?

If the comment about using the colour doesn't help you then you could try this approach:
One possible approach could be to train a model (a NN perhaps) to draw a bounding box around the usage numbers.
You're going to have to draw a few boxes by hand to provide training examples.
Once you've run this "bounding-box creation model" you can crop out all the irrelevant stuff and you'll have a new training set consisting of examples that are easier to learn from.
You can then try retraining your ocr model on this new dataset.

Related

Image segmentation for yolo

For a project I am using YOLO to detect phallusia (microbial organisms) that swim into focus in a video. The issue is that I have to train YOLO on my own data. The data needs to be segmented so I can isolate the phallusia. I am not sure how to properly segment/cut-out the phallusia to fit the format that YOLO needs. For example in the picture below I want YOLO to detect when a phallusia is in focus similar to the one I have boxed in red. Do I just cut-out that segment of the image and save it as its own image and feed to that to YOLO? Do all segmented images need to have the same dimensions? Not sure what I am doing and could use some guidance.
It looks like you need to start from basics, ok, no fear. I will try to suggest a simple route to start efficiently to use YOLO techniques. Luckly the web has a lot of examples.
Understand WHAT is a YOLO method.
Andrew NG's YOLO explanation is a good start, but only if you alread know what are classification and detection.
Understand the YOLO Loss function, the heart of the algorithm.
Check the paper YOLO itself, don't be scared. At page #2, in Unified Detection section, you will find the information about the bounding box detection used, but be aware that you can use whatever notation you want (even invent a new one), in order to be compatible with the Loss function, real meaning of this algorithm.
Start to implement an example
As I wrote above, there are plenty of examples. You can check this one if you are familiar with python and tensorflow.
Inside it you will find a way to prepare the dataset, that is your target for this question, I think. In this case a tool named labelImg is used.
I hope it will be useful. Please share your code when it will be ready, I'm curious :). Good luck!
Do I just cut-out that segment of the image and save it as its own image and feed to that to YOLO?
You need as much images as you can get of your microbial organism, in different sizes, positions, etc. It doesn't need to be the only thing on the image, but you need to know the <x> <y> <width> <height> position of it.
Do all segmented images need to have the same dimensions?
No, they can be of any size and Yolo adapts them. See the VOC dataset for examples of images Yolo is normally trained on. A couple examples;
kitchen, dogs
Not sure what I am doing and could use some guidance.
My advice would be to follow the instructions for "Training YOLO on VOC" from the original Yolo website; https://pjreddie.com/darknet/yolo/
Once you have that working, you will have a better idea of the steeps you need to take.
I had similar problems when I wanted to train YOLOv2 for some game cards.
In order to solve the problem I took a picture from every game card with my cellphone and I cut out them. Because I didn't have enough training data I wrote a dataset generator program what generated the training data by using the photos from the cards. This program is able to multiply, rotate, scale the image then to place it on a background.
It can happen that you will have problems if you don't have enough learning data. In this case don't panic, because from several raw images by rotating and scaling you can generate a large dataset.
Here you can find my dataset generator, which is able to generate Pascal VOC style and darknet style training data: https://github.com/szaza/dataset-generator. Feel free to reuse it, if you need something similar.

Image analysis - how can I extract image label out of bottle

I am thinking of using OpenCV library for image analysis. Basically I want to automate in my project the extraction of image label from wine bottle.
This is the sample input image:
This is the sample output:
I am thinking what should be my general strategy to extract the image. I am not asking for direct code. Just want to know the general approach to solve the problem.
Thanks!
Sorry for vage answer but in applied computer vision is no such thing like general approach.
some will disagree of course but in reality
all CV applications are custom made for some specific purpose/task
in your case is the idea to find cylindric and probably standing object (bottle)
and then finding of irregular parts in it
I would do it like this:
1.remove noise as much as possible (smooth/sharpen filters)
2.(optionaly) reduce image data (via (i)FT or (i)DCT for example)
3.segmentate objects (usually by homogenity of color or by edge detection or by booth)
4.identify bottle object (by color,shape,or illumination (glass is transparent))
5.identify objects inside bottle (homogenity,not transparent,usually sharp edges,color is not good some labels are black on dark glass)
6.(optional) project label back from cylindric space to flat texture
[notes]
create app with many scrollbars and checkboxes
to be able to change all tresholds and enable disable filters or their order on the run
all parts will take a lot of tweaking of tresholds and weights
you have to do a lot of trial and error runs to find the best filters and their config for your task

Lightweight 3D animation driven by external data

I'm a structural engineering master student work on a seismic evaluation of a temple structure in Portugal. For the evaluation, I have created a 3D block model of the structure and will use a discrete element code to analyze the behaviour of the structure under a variety of seismic (earthquake) records. The software that I will use for the analysis has the ability to produce snapshots of the structure at regular intervals which can then be put together to make a movie of the response. However, producing the images slows down the analysis. Furthermore, since the pictures are 2D images from a specified angle, there is no possibility to rotate and view the response from other angles without re-running the model (a process that currently takes 3 days of computer time).
I am looking for an alternative method for creating a movie of the response of the structure. What I want is a very lightweight solution, where I can just bring in the block model which I have and then produce the animation by feeding in the location and the three principal axis of each block at regular intervals to produce the animation on the fly. The blocks are described as prisms with the top and bottom planes defining all of the vertices. Since the model is produced as text files, I can modify the output so that it can be read and understood by the animation code. The model is composed of about 180 blocks with 24 vertices per block (so 4320 vertices). The location and three unit vectors describing the block axis are produced by the program and I can write them out in a way that I want.
The main issue is that the quality of the animation should be decent. If the system is vector based and allows for scaling, that would be great. I would like to be able to rotate the model in real time with simple mouse dragging without too much lag or other issues.
I have very limited time (in fact I am already very behind). That is why I wanted to ask the experts here so that I don't waste my time on something that will not work in the end. I have been using Rhino and Grasshopper to generate my model but I don't think it is the right tool for this purpose. I was thinking that Processing might be able to handle this but I don't have any experience with it. Another thing that I would like to be able to do is to maybe have a 3D PDF file for distribution. But I'm not sure if this can be done with 3D PDF.
Any insight or guidance is greatly appreciated.
Don't let the name fool you, but BluffTitler DX9, a commercial software, may be what your looking for.
It's simple interface provides a fast learning curve, may quick tutorials to either watch or dissect. Depending on how fast your GPU is, real-time previews are scalable.
Reference:
Model Layer Page
User Submitted Gallery (3D models)
Jim Merry from tetra4D here. We make the 3D CAD conversion tools for Acrobat X to generate 3D PDFs. Acrobat has a 3D javascript API that enables you to manipulate objects, i.e, you could drive translations, rotations, etc of objects from your animation information after translating your model to 3D PDF. Not sure I would recommend this approach if you are in a hurry however. Also - I don't think there are any commercial 3D PDF generation tools for the formats you are using (Rhino, Grasshopper, Processing).
If you are trying to animate geometric deformations, 3D PDF won't really help you at all. You could capture the animation and encode it as flash video and embed in a PDF, but this a function of the multimedia tool in Acrobat Pro, i.e, is not specific to 3D.

Morphing 2 faces images

I would like some help from the aficionados of openCV here.
I would like to know the direction to take (and some advices or piece of code) on how to morph 2 faces together with a kind of ratio saying 10% of the first and 90% of the second.
I have seen functions like cvWarpAffine and cvMakeScanlines but I am not sure how to use them.
So if somebody could help me here, I'll be very grateful.
Thanks in advance.
Unless the images compared are the exact same images, you would not go very far with this.
This is an artificial intelligence problem and needs to be solved as such. Typical solution involves:
Normalising the data (removing noise, skew, ...) from the images
Feature extraction (turn the image into a smaller set of data)
Use a machine learning (typically classifiers) to train the data with your matches
Test the result
Refine previous processes according to the results until you get good recognition
The choice of OpenCV functions used depends on your feature extraction method. Have a look at Eigenface.

Dilemma about image cropping algorithm - is it possible?

I am building a web application using .NET 3.5 (ASP.NET, SQL Server, C#, WCF, WF, etc) and I have run into a major design dilemma. This is a uni project btw, but it is 100% up to me what I develop.
I need to design a system whereby I can take an image and automatically crop a certain object within it, without user input. So for example, cut out the car in a picture of a road. I've given this a lot of thought, and I can't see any feasible method. I guess this thread is to discuss the issues and feasibility of achieving this goal. Eventually, I would get the dimensions of a car (or whatever it may be), and then pass this into a 3d modelling app (custom) as parameters, to render a 3d model. This last step is a lot more feasible. It's the cropping issue which is an issue. I have thought of all sorts of ideas, like getting the colour of the car and then the outline around that colour. So if the car (example) is yellow, when there is a yellow pixel in the image, trace around it. But this would fail if there are two yellow cars in a photo.
Ideally, I would like the system to be completely automated. But I guess I can't have everything my way. Also, my skills are in what I mentioned above (.NET 3.5, SQL Server, AJAX, web design) as opposed to C++ but I would be open to any solution just to see the feasibility.
I also found this patent: US Patent 7034848 - System and method for automatically cropping graphical images
Thanks
This is one of the problems that needed to be solved to finish the DARPA Grand Challenge. Google video has a great presentation by the project lead from the winning team, where he talks about how they went about their solution, and how some of the other teams approached it. The relevant portion starts around 19:30 of the video, but it's a great talk, and the whole thing is worth a watch. Hopefully it gives you a good starting point for solving your problem.
What you are talking about is an open research problem, or even several research problems. One way to tackle this, is by image segmentation. If you can safely assume that there is one object of interest in the image, you can try a figure-ground segmentation algorithm. There are many such algorithms, and none of them are perfect. They usually output a segmentation mask: a binary image where the figure is white and the background is black. You would then find the bounding box of the figure, and use it to crop. The thing to remember is that none of the existing segmentation algorithm will give you what you want 100% of the time.
Alternatively, if you know ahead of time what specific type of object you need to crop (car, person, motorcycle), then you can try an object detection algorithm. Once again, there are many, and none of them are perfect either. On the other hand, some of them may work better than segmentation if your object of interest is on very cluttered background.
To summarize, if you wish to pursue this, you would have to read a fair number of computer vision papers, and try a fair number of different algorithms. You will also increase your chances of success if you constrain your problem domain as much as possible: for example restrict yourself to a small number of object categories, assume there is only one object of interest in an image, or restrict yourself to a certain type of scenes (nature, sea, etc.). Also keep in mind, that even the accuracy of state-of-the-art approaches to solving this type of problems has a lot of room for improvement.
And by the way, the choice of language or platform for this project is by far the least difficult part.
A method often used for face detection in images is through the use of a Haar classifier cascade. A classifier cascade can be trained to detect any objects, not just faces, but the ability of the classifier is highly dependent on the quality of the training data.
This paper by Viola and Jones explains how it works and how it can be optimised.
Although it is C++ you might want to take a look at the image processing libraries provided by the OpenCV project which include code to both train and use Haar cascades. You will need a set of car and non-car images to train a system!
Some of the best attempts I've see of this is using a large database of images to help understand the image you have. These days you have flickr, which is not only a giant corpus of images, but it's also tagged with meta-information about what the image is.
Some projects that do this are documented here:
http://blogs.zdnet.com/emergingtech/?p=629
Start with analyzing the images yourself. That way you can formulate the criteria on which to match the car. And you get to define what you cannot match.
If all cars have the same background, for example, it need not be that complex. But your example states a car on a street. There may be parked cars. Should they be recognized?
If you have access to MatLab, you could test your pattern recognition filters with specialized software like PRTools.
Wwhen I was studying (a long time ago:) I used Khoros Cantata and found that an edge filter can simplify the image greatly.
But again, first define the conditions on the input. If you don't do that you will not succeed because pattern recognition is really hard (think about how long it took to crack captcha's)
I did say photo, so this could be a black car with a black background. I did think of specifying the colour of the object, and then when that colour is found, trace around it (high level explanation). But, with a black object in a black background (no constrast in other words), it would be a very difficult task.
Better still, I've come across several sites with 3d models of cars. I could always use this, stick it into a 3d model, and render it.
A 3D model would be easier to work with, a real world photo much harder. It does suck :(
If I'm reading this right... This is where AI shines.
I think the "simplest" solution would be to use a neural-network based image recognition algorithm. Unless you know that the car will look the exact same in each picture, then that's pretty much the only way.
If it IS the exact same, then you can just search for the pixel pattern, and get the bounding rectangle, and just set the image border to the inner boundary of the rectangle.
I think that you will never get good results without a real user telling the program what to do. Think of it this way: how should your program decide when there is more than 1 interesting object present (for example: 2 cars)? what if the object you want is actually the mountain in the background? what if nothing of interest is inside the picture, thus nothing to select as the object to crop out? etc, etc...
With that said, if you can make assumptions like: only 1 object will be present, then you can have a go with using image recognition algorithms.
Now that I think of it. I recently got a lecture about artificial intelligence in robots and in robotic research techniques. Their research went on about language interaction, evolution, and language recognition. But in order to do that they also needed some simple image recognition algorithms to process the perceived environment. One of the tricks they used was to make a 3D plot of the image where x and y where the normal x and y axis and the z axis was the brightness of that particular point, then they used the same technique for red-green values, and blue-yellow. And lo and behold they had something (relatively) easy they could use to pick out the objects from the perceived environment.
(I'm terribly sorry, but I can't find a link to the nice charts they had that showed how it all worked).
Anyway, the point is that they were not interested (that much) in image recognition so they created something that worked good enough and used something less advanced and thus less time consuming, so it is possible to create something simple for this complex task.
Also any good image editing program has some kind of magic wand that will select, with the right amount of tweaking, the object of interest you point it on, maybe it's worth your time to look into that as well.
So, it basically will mean that you:
have to make some assumptions, otherwise it will fail terribly
will probably best be served with techniques from AI, and more specifically image recognition
can take a look at paint.NET and their algorithm for their magic wand
try to use the fact that a good photo will have the object of interest somewhere in the middle of the image
.. but i'm not saying that this is the solution for your problem, maybe something simpler can be used.
Oh, and I will continue to look for those links, they hold some really valuable information about this topic, but I can't promise anything.

Resources