I'm curious how I might more reliably recognise the value and the suit of playing-card images. Here are two examples:
There may be some noise in the images, but I have a large dataset I could use for training (roughly 10k PNGs, covering all values and suits).
I can reliably recognise images that I've manually classified, when there is a known exact match, using a hashing method. But since I'm hashing images based on their content, the slightest noise changes the hash and the image is treated as unknown. This is what I'm looking to address reliably with further automation.
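For context, here's a minimal sketch of that exact-match approach (file names are hypothetical). Because the digest covers every pixel, a single noisy pixel produces a different hash and the lookup falls through to "unknown":

import hashlib
from PIL import Image  # pip install pillow

def exact_hash(path):
    # Digest the raw pixel data; one changed pixel changes the whole hash.
    img = Image.open(path).convert("RGB")
    return hashlib.md5(img.tobytes()).hexdigest()

known = {exact_hash("reference/4c.png"): "4c"}  # manually classified references
label = known.get(exact_hash("capture.png"), "unknown")  # any noise -> "unknown"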
I've been reviewing the 3.05 documentation on training Tesseract:
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract#automated-method
Can Tesseract only be trained with images generated from fonts? Or could I use it to recognise the suits of these cards?
I was hoping I could say that all images in this folder correspond to 4c (e.g. the example images above), and that Tesseract would see the similarity in any future instance of that image (regardless of noise) and also read it as 4c. Is this possible? Does anyone here have experience with this?
This has been my non-Tesseract solution, until someone proves there's a better way. I've set up:
Caffe: http://caffe.berkeleyvision.org/install_osx.html
DIGITS: https://github.com/NVIDIA/DIGITS/blob/master/docs/BuildDigits.md
Getting these running was the hardest part. Next, I used my dataset to train a new Caffe network. I prepared my dataset in a single-depth folder structure (a quick sanity-check script follows the listing):
./card
./card/2c
./card/2d
./card/2h
./card/2s
./card/3c
./card/3d
./card/3h
./card/3s
./card/4c
./card/4d
./card/4h
./card/4s
./card/5c
./card/5d
./card/5h
./card/5s
./card/6c
./card/6d
./card/6h
./card/6s
./card/7c
./card/7d
./card/7h
./card/7s
./card/8c
./card/8d
./card/8h
./card/8s
./card/9c
./card/9d
./card/9h
./card/9s
./card/_noise
./card/_table
./card/Ac
./card/Ad
./card/Ah
./card/As
./card/Jc
./card/Jd
./card/Jh
./card/Js
./card/Kc
./card/Kd
./card/Kh
./card/Ks
./card/Qc
./card/Qd
./card/Qh
./card/Qs
./card/Tc
./card/Td
./card/Th
./card/Ts
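As a quick sanity check before training, a small script can count the PNGs per class, so no value/suit folder is accidentally empty or lopsided (the root path is the same placeholder used below):

import os

root = "/path/to/card"  # dataset root from the layout above

for cls in sorted(os.listdir(root)):
    class_dir = os.path.join(root, cls)
    if not os.path.isdir(class_dir):
        continue
    count = sum(1 for name in os.listdir(class_dir)
                if name.lower().endswith(".png"))
    print(f"{cls}: {count} images")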
Within DIGITS, I chose:
the Datasets tab
New Dataset > Images > Classification
I pointed it at my card folder, e.g. /path/to/card
I set the validation split to 13.0%, based on the discussion here: https://stackoverflow.com/a/13612921/880837
After creating the dataset, I opened the Models tab,
chose my new dataset,
chose GoogLeNet under Standard Networks, and left it to train.
I did this several times, each time with new images added to the dataset. Each training session took 6-10 hours, but at this stage I can use my caffemodel to programmatically estimate what each image is expected to be, using the logic from this example:
https://github.com/BVLC/caffe/blob/master/examples/cpp_classification/classification.cpp
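For reference, here is a rough Python (pycaffe) equivalent of that C++ example. The file names are assumptions; a DIGITS/Caffe classification job typically gives you a deploy prototxt, a trained snapshot, a labels file and a mean image (converted here from binaryproto to .npy):

import numpy as np
import caffe

net = caffe.Classifier(
    "deploy.prototxt", "snapshot.caffemodel",
    mean=np.load("mean.npy").mean(1).mean(1),  # per-channel mean
    channel_swap=(2, 1, 0),  # RGB -> BGR, Caffe's channel order
    raw_scale=255,           # load_image returns [0, 1]; the net expects [0, 255]
    image_dims=(256, 256))

labels = [line.strip() for line in open("labels.txt")]
probs = net.predict([caffe.io.load_image("capture.png")])[0]
top = probs.argmax()
# Apply the 90% confidence threshold described below.
print(labels[top] if probs[top] > 0.9 else "unknown", probs[top])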
The results are either a card (2c, 7h, etc.), noise, or table. Any estimate with a confidence above 90% is most likely correct. The latest run correctly recognised 300 out of 400 images, with only 3 mistakes. I'm adding new images to the dataset and retraining the existing model to further improve the accuracy. I hope this is valuable to others!
While I wanted to capture the high-level steps here, this was all done with great thanks to David Humphrey and his GitHub post; I really recommend reading it and trying it out if you're interested in learning more: https://github.com/humphd/have-fun-with-machine-learning
I have an app that creates reports with some data and images (min 1 image, max 6). These reports stay saved in my app until the user sends them to the API (which can be done the same day the report is registered, or a week later).
But my question is: what's the proper way to store these images in Realm, saving the path (URI) or a base64 string? My current version keeps the base64 for these images (500 to 800 KB each), and after my users send their reports to the API, I delete the base64 data.
I was developing a way to save the path to the image and then display it from there. But the URI that image-picker returns is temporary, so I would need to copy the file somewhere else and save that path instead. Doing that, though, leaves the image stored twice on the phone (using memory) for two or three days.
So before I build all of this, I was wondering: will copying the image to another path and saving the path be more performant than storing the base64 string on the phone, or should it not make much difference?
I try to avoid text-only answers; including code is best practice. But the question of storing images comes up frequently, and it's not really covered in the documentation, so I thought it should be addressed at a high level.
Generally speaking, Realm is not a solution for storing blob-type data: images, PDFs, etc. There are a number of technical reasons for that, but most importantly, an image can go well beyond the capacity of a Realm field. Additionally, it can significantly impact performance (especially in a syncing use case).
If this is a local-only app, store the images on disk on the device and keep a reference to where they are stored (their path) in Realm. That will keep the app fast and responsive with a minimal footprint.
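A language-agnostic sketch of that pattern, shown here in Python with hypothetical paths: copy the picker's temporary file into app-owned storage and persist only the resulting path:

import shutil
import uuid
from pathlib import Path

APP_IMAGES = Path("/path/to/app/documents/images")  # hypothetical app-owned dir

def persist_image(tmp_path: str) -> str:
    # Copy the picker's temporary file (a local filesystem path; strip any
    # file:// scheme first) into app storage and return the stable path.
    # Store that string in Realm instead of a base64 blob, and delete the
    # file once the report has been uploaded to the API.
    APP_IMAGES.mkdir(parents=True, exist_ok=True)
    dest = APP_IMAGES / (uuid.uuid4().hex + Path(tmp_path).suffix)
    shutil.copy(tmp_path, dest)
    return str(dest)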
If this is a synced solution where you want to share images across devices or with other users, there are several cloud-based solutions that accommodate image storage; you then store a URL to the image in Realm.
One option, part of the MongoDB family of products (which also includes MongoDB Realm), is GridFS. Another option, a solid product we've leveraged for years, is Firebase Cloud Storage.
Now that I've made those statements, I'll backtrack just a bit and refer you to the article Realm Data and Partitioning Strategy Behind the WildAid O-FISH Mobile Apps, which is a fantastic piece about implementing Realm in a real-world application and, in particular, how to deal with images.
In that article, note that they do store the images in Realm for a short time. However, one thing they left out (which was revealed in a forum post) is that the images are compressed to ensure they don't exceed the Realm field size limit.
I'm not totally on board with general use of that technique, but it works for that specific use case.
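For illustration, here's a hedged sketch of that compression step; the article doesn't state the exact field limit, so it's a parameter here:

from io import BytesIO
from PIL import Image  # pip install pillow

def compress_under_limit(path: str, limit_bytes: int) -> bytes:
    # Re-encode as JPEG, stepping quality down until the payload fits
    # under limit_bytes (standing in for the Realm field size cap).
    img = Image.open(path).convert("RGB")
    for quality in range(90, 0, -10):
        buf = BytesIO()
        img.save(buf, "JPEG", quality=quality)
        if buf.tell() <= limit_bytes:
            return buf.getvalue()  # bytes suitable for a Realm data field
    raise ValueError("cannot compress under the limit; consider resizing")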
One more note: the image sizes mentioned in the question are pretty small (500 to 800 KB), which is a tiny amount of data that would really not have an impact, so storing them in Realm as a data object would work fine. The caveat is future expansion: if you decide later to store larger images, it would require a complete rewrite of that code, so why not plan for that up front?
I'm starting to work with the Google Tango tablet, hopefully to create (basic) 2D/3D maps of scanned areas. But first I would like to read as much about the Tango (sensors/API) as I can, in order to make a plan that is as time-efficient as possible.
I immediately noticed the ability to learn areas, which is a very interesting concept; nevertheless, I couldn't find anything about these so-called Area Description Files (ADF).
I know that ADF files can be geographically referenced, and that they contain metadata and a unique UUID. Furthermore, I know their basic functionality, but that's about it.
In some parts of the modules, ADF files are referred to as 'maps'; in other parts they are just called 'descriptions'.
So what do these files look like? Are they already basic 2D (grid) maps, or are they just descriptions?
I know there are people who have already extracted the ADF files, so any help would be greatly appreciated!
From the Tango ADF documentation:
Important: Saved area descriptions do not directly record images or video of the location, but rather contain descriptions of images of the environment in a very compressed form. While those descriptions can't be directly viewed as images, it is in principle possible to write an algorithm that can reconstruct a viewable image. Therefore, you must ask the user for permission before saving any of their learned areas to the cloud or sharing areas between users to protect the user's privacy, just as you would treat images and video.
Other than that, there doesn't seem to be much info about the file internals. I use a lot of them, but I've never been compelled to look inside; curious, yes, but not compelled.
Without any direct info from the Project Tango folks, anything we provide would be mere speculation. I'm with Mark: not much compelling reason to get the details. My speculation: it probably contains a set of image descriptors, like SIFT, and whatever other known device settings are available, like GPS location, orientation (gravity), time(?), etc.
I got hold of the ADF file; it's binary-encoded and seems difficult to decode.
I will be happy to share the file if anyone is still interested.
I'm using FPDF with PHP and need to print an order manifest. This manifest will have up to 200-300 products with images, which are stored on Amazon S3. Generating it is quite slow at this point. Any idea how this could be sped up?
Right now, with images of about 15×15 mm, it generates a file of about 16 MB and takes 3 1/2 to 4 minutes; without the images the file is only about 52 KB and comes up almost instantly.
Of course, it may just be the downloading of that many images, about which there's not really much I can do.
I suggest you try img2pdf.
While this module offers far fewer options for interacting with PDFs than FPDF, if you are only interested in combining images into a PDF file, it is probably the best module you can use. It is fast and easy to use.
Here is an example code:
import img2pdf

filename = "mypdf.pdf"
images = ["image1.jpg", "image2.jpg"]
with open(filename, "wb") as f:
    f.write(img2pdf.convert(images))
I used it to combine 400 images; it only took a second or so.
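Since the images come from S3, a good share of the time is probably network rather than PDF generation. Here's a hedged sketch (URLs hypothetical, assuming plain HTTPS access to the objects) that downloads them in parallel with a local cache before building the PDF:

import os
from concurrent.futures import ThreadPoolExecutor

import img2pdf
import requests

urls = [
    "https://my-bucket.s3.amazonaws.com/img1.jpg",  # hypothetical
    "https://my-bucket.s3.amazonaws.com/img2.jpg",
]

def fetch(url, cache_dir="cache"):
    # Download one image unless a cached copy already exists.
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(requests.get(url, timeout=30).content)
    return path

with ThreadPoolExecutor(max_workers=8) as pool:
    paths = list(pool.map(fetch, urls))

with open("manifest.pdf", "wb") as f:
    f.write(img2pdf.convert(paths))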
I found the extension I mentioned in my comment above:
http://fpdf.org/en/script/script76.php
This seems to reduce the time a little for me; you may have better results, as your document is much larger than mine.
I am developing a WebGIS application using Symfony with the MapFish plugin: http://www.symfony-project.org/plugins/sfMapFishPlugin
I use the GeoJSON produced by MapFish to render layers through OpenLayers, in a vector layer of course.
When I show layers with up to 3k features, everything works fine. When I try layers with 10k features or more, the application crashes. I don't know the exact threshold, because my layers have either 2-3k features or 10-13k features.
I think the problem is related to Doctrine, because the last entry in the log is something like:
Sep 02 13:22:40 symfony [info] {Doctrine_Connection_Statement} execute :
and then the query to fetch the geographical records.
As I said, I think the problem is the number of features, so I used OpenLayers.Strategy.BBox() to decrease the number of features fetched and shown. The result is the same: the app seems stuck while executing the query.
If I add a limit to the query string used to fetch the features' GeoJSON, the application works. So I don't think this is related to the MapFish plugin, but rather to Doctrine.
Does anyone have some enlightenment?
Thanks!
Even if it's theoretically possible, it's a bad idea to try to show so many vector features on a map.
You'd better change the way features are displayed (e.g. raster for low zoom levels, fetch features on click…).
Even if your service answers in a reasonable time, your browser will be stuck, or at least will have very bad performance…
I'm the author of sfMapFishPlugin, and I have never tried to query so many features, much less tried to show them on an OL map.
Check out the OpenLayers FAQ on this subject: http://trac.osgeo.org/openlayers/wiki/FrequentlyAskedQuestions#WhatisthemaximumnumberofCoordinatesFeaturesIcandrawwithaVectorlayer. It's a bit outdated given recent browser improvements, but 10k vector features on a map is still not reasonable.
HTH,
On one of the applications I am writing, I was asked to provide a "pencil and eraser" feature to allow the user to doodle freely on a document (for proofreading, note-taking, etc.).
What would be the best way to store such data?
I was thinking of using an image with transparency for each doodle (so that I can also support multiple doodle colors), but it seems that would very quickly make any saved project with doodles grow large in file size.
I am looking for a better (existing) alternative (e.g. is there a "DoodleXML" spec out there?), or just any suggestions.
I think the "DoodleXML" spec you're looking for might just be SVG. Simply save the doodles as a series of lines. You don't need a full SVG engine as long as you're only supporting the subset that you generate in the first place.