Update
I asked this question quite a while ago now, and I was curious if anything like this has been developed since I asked the question?
I don't even know if there is a term for this kind of algorithm, and I guess there won't be if nobody has invented it yet. However it also makes googling for this a bit hard. Does anybody know if there is a term for this algorithm/principle yet?
This is an idea I have been thinking about, but I do not quite know how to solve it. I would like to know if any solutions like this exists out there, or if you guys have any idea how this could be implemented.
Steganography
Steganography is basically the art of hiding messages. In modern days we do this digitally by e.g. modifying the least significant bits in a image as the one below. Thus for every pixel and for every colour component of that pixel we might be able to hide a byte or two.
This alternation is not visibly by the naked eye, but analysing the least significant bits might reveal patterns that exposes the existence and possibly content of a hidden message. To counter this we simply encrypt the message before embedding it in the image, which keeps the message safe and also helps preventing discovery of the existence of a hidden message.
Thus, in principle, steganography provides the following:
Hiding encrypted message in any kind of media data. (Images, music, video, etc.)
Complete deniability of the existence of a hidden message without the correct key.
Extraction of the hidden message with the correct key.
(source: cs.vu.nl)
Semacodes
Semacodes are a way of encoding data in a visually representation, that may be printed, copied, and scanned easily. The Data Matrix shown below is a example of a semacode containing the famous Lorem Ipsum text. This is essentially a 2D barcode with a higher capacity that usually barcodes. Programs for generating semacodes are readily available, and ditto for software for reading them, especially for cell phones. Semacodes usually contains error correcting codes, are generally very robust, and can be read in very damaged conditions.
Thus semacodes has the following properties:
Data encoding that may be printed and copied.
May be scanned and interpreted even in damaged (dirty) conditions, and generally a very robust encoding.
Combining it
So my idea is to create something that combines these two, with all of the combined properties. This means it would have to:
Embed a encrypted message in any media, probably a scanned image.
The message should be extractable even if the image is printed and scanned, and even partly damaged.
The existence of a embedded message should be undetectable without the key used for encryption.
So, first of all I would like to know if any solutions, algorithms or research is available on this? Secondly I would like to hear any ideas/thoughts on how this might be done?
I really hope to get a good discussion going on the possibilities and feasibility of implementing something like this, and I am looking forward to reading your answers.
Update
Thanks for all the good input on this. I will probably work a bit more on this idea when I have more time. I am convinced it must be possible. Think about research in embedding watermarks in music and movies.
I imagine part of the robustness of a semacode to damage/dirt/obscuration is the high contrast between the two states of any "cell". The reader can still make a good guess as to the actual state, even with some distortion.
That sort of contrast is not available in a photographic image, and is the very reason why steganography works - the lsb bit-flipping has almost no visual effect on the image itself, while digital fidelity ensures that a non-visual system can still very accurately read the embedded data.
As the two applications are sort of at opposite ends of the analog/digital spectrum (semacodes are all about being decipherable by analog (visual) processing but are on paper, not digital; steganography is all about the bits in the file and cares nothing for the analog representation, whether light or sound or something else), I imagine a combination of the two will extremely difficult, if not impossible.
Essentially what you're thinking of is being able to steganographically embed something in an image, print the image, make a colour photocopy of it, scan it in, and still be able to extract the embedded data.
I'm afraid I can't help, but if anyone achieves this, I'll be DAMN impressed! :)
It's not a complete answer, but you should look at watermarking. This technique solves your first two goals (embedable in a printed image and readable even from partly damaged scan).
Part of watermarking's reliability to distortion and transcription errors (from going from digital to analog and back) come from redundancy (e.g. repeating the data several times). Those would make the watermark detectable even without a key. However, you might be able to use redundancy techniques that are more subtle, maybe something related to erasure coding or secret sharing.
I know that's not a complete answer, but hopefully those leads will point you in the right direction!
What language/environment are you using? It shouldn't be that hard to write code that opens both the image and semacode as a bitmap (the latter as a monochrome), sets the lowest bit(s) of each byte of each pixel in the color image to the value of the corresponding pixel of the monochrome bitmap.
(optionally expand the semacode bitmap first to the same pixel-dimensions extending with white)
Related
The description of my problem is simple, I fear that the problem isn't that simple. I would like to find the copied, duplicated part on an image. Which part of the image is copied and pasted back to the same image to another position(for example by using Photoshop)?
Please check the attached image. The red rectangle containing the value 20 is moved from the price field to the validity field. Please note that the rectangle size and position isn't fixed and unknown, it could vary, just the image is given, no other information.
Could you help me naming a theoretical method, idea, paper, people who are working on the problem above?
I posted my method to here(stackoverflow) instead of Computer Vision to reach as many people I can, because maybe the problem can be transformed. I could think a solution, like looking for the 2 largest rectangle which contain the same values inside a huge matrix(image).
Thanks for your help and time.
Note: I don't want to use the metadata to detect the forgery.
If you have access to the digital version of the forgery, and the forger (or the author of the forger-creation software) is a complete idiot, it can be as simple as looking at the image metadata for signs of 'shopping.
If digital files has been "washed" to remove said signs, or the forgery has been printed and then scanned back to you, it is a MUCH harder problem, again unless the forgers are complete idiots.
In the latter case you can only hope for making the forger's work harder, but there is no way to make it impossible - after all, banknotes can be forged, and they are much better protected than train tickets.
I'd start reading from here: http://www.cs.dartmouth.edu/farid/downloads/publications/spm09.pdf
SHIFT features can be used to identify "similar regions" that might have been copied from a different part of the image. A starting point can be to use OpenCV's SHIFT demo (included in the library) and use parts of the image as input, to see where a rough match is available. Detailed matching can follow to see if the region actually is a copy.
Would OCR Software be able to reliably translate an image such as the following into a list of values?
UPDATE:
In more detail the task is as follows:
We have a client application, where the user can open a report. This report contains a table of values.
But not every report looks the same - different fonts, different spacing, different colors, maybe the report contains many tables with different number of rows/columns...
The user selects an area of the report which contains a table. Using the mouse.
Now we want to convert the selected table into values - using our OCR tool.
At the time when the user selects the rectangular area I can ask for extra information
to help with the OCR process, and ask for confirmation that the values have been correct recognised.
It will initially be an experimental project, and therefore most likely with an OpenSource OCR tool - or at least one that does not cost any money for experimental purposes.
Simple answer is YES, you should just choose right tools.
I don't know if open source can ever get close to 100% accuracy on those images, but based on the answers here probably yes, if you spend some time on training and solve table analisys problem and stuff like that.
When we talk about commertial OCR like ABBYY or other, it will provide you 99%+ accuracy out of the box and it will detect tables automatically. No training, no anything, just works. Drawback is that you have to pay for it $$. Some would object that for open source you pay your time to set it up and mantain - but everyone decides for himself here.
However if we talk about commertial tools, there is more choice actually. And it depends on what you want. Boxed products like FineReader are actually targeting on converting input documents into editable documents like Word or Excell. Since you want actually to get data, not the Word document, you may need to look into different product category - Data Capture, which is essentially OCR plus some additional logic to find necessary data on the page. In case of invoice it could be Company name, Total amount, Due Date, Line items in the table, etc.
Data Capture is complicated subject and requires some learning, but being properly used can give quaranteed accuracy when capturing data from the documents. It is using different rules for data cross-check, database lookups, etc. When necessary it may send datafor manual verification. Enterprises are widely usind Data Capture applicaitons to enter millions of documents every month and heavily rely on data extracted in their every day workflow.
And there are also OCR SDK ofcourse, that will give you API access to recognition results and you will be able to program what to do with the data.
If you describe your task in more detail I can provide you with advice what direction is easier to go.
UPDATE
So what you do is basically Data Capture application, but not fully automated, using so-called "click to index" approach. There is number of applications like that on the market: you scan images and operator clicks on the text on the image (or draws rectangle around it) and then populates fields to database. It is good approach when number of images to process is relatively small, and manual workload is not big enough to justify cost of fully automated application (yes, there are fully automated systems that can do images with different font, spacing, layout, number of rows in the tables and so on).
If you decided to develop stuff and instead of buying, then all you need here is to chose OCR SDK. All UI you are going to write yoursself, right? The big choice is to decide: open source or commercial.
Best Open source is tesseract OCR, as far as I know. It is free, but may have real problems with table analysis, but with manual zoning approach this should not be the problem. As to OCR accuracty - people are often train OCR for font to increase accuracy, but this should not be the case for you, since fonts could be different. So you can just try tesseract out and see what accuracy you will get - this will influence amount of manual work to correct it.
Commertial OCR will give higher accuracy but will cost you money. I think you should anyway take a look to see if it worth it, or tesserack is good enough for you. I think the simplest way would be to download trial version of some box OCR prouct like FineReader. You will get good idea what accuracy would be in OCR SDK then.
If you always have solid borders in your table, you can try this solution:
Locate the horizontal and vertical lines on each page (long runs of
black pixels)
Segment the image into cells using the line coordinates
Clean up each cell (remove borders, threshold to black and white)
Perform OCR on each cell
Assemble results into a 2D array
Else your document have a borderless table, you can try to follow this line:
Optical Character Recognition is pretty amazing stuff, but it isn’t
always perfect. To get the best possible results, it helps to use the
cleanest input you can. In my initial experiments, I found that
performing OCR on the entire document actually worked pretty well as
long as I removed the cell borders (long horizontal and vertical
lines). However, the software compressed all whitespace into a single
empty space. Since my input documents had multiple columns with
several words in each column, the cell boundaries were getting lost.
Retaining the relationship between cells was very important, so one
possible solution was to draw a unique character, like “^” on each
cell boundary – something the OCR would still recognize and that I
could use later to split the resulting strings.
I found all this information in this link, asking Google "OCR to table". The author published a full algorithm using Python and Tesseract, both opensource solutions!
If you want to try the Tesseract power, maybe you should try this site:
http://www.free-ocr.com/
Which OCR you are talking about?
Will you be developing codes based on that OCR or you will be using something off the shelves?
FYI:
Tesseract OCR
it has implemented the document reading executable, so you can feed the whole page in, and it will extract characters for you. It recognizes blank spaces pretty well, it might be able to help with tab-spacing.
I've been OCR'ing scanned documents since '98. This is a recurring problem for scanned docs, specially for those that include rotated and/or skewed pages.
Yes, there are several good commercial systems and some could provide, once well configured, terrific automatic data-mining rate, asking for the operator's help only for those very degraded fields. If I were you, I'd rely on some of them.
If commercial choices threat your budget, OSS can lend a hand. But, "there's no free lunch". So, you'll have to rely on a bunch of tailor-made scripts to scaffold an affordable solution to process your bunch of docs. Fortunately, you are not alone. In fact, past last decades, many people have been dealing with this. So, IMHO, the best and concise answer for this question is provided by this article:
https://datascience.blog.wzb.eu/2017/02/16/data-mining-ocr-pdfs-using-pdftabextract-to-liberate-tabular-data-from-scanned-documents/
Its reading is worth! The author offers useful tools of his own, but the article's conclusion is very important to give you a good mindset about how to solve this kind of problem.
"There is no silver bullet."
(Fred Brooks, The Mitical Man-Month)
It really depends on implementation.
There are a few parameters that affect the OCR's ability to recognize:
1. How well the OCR is trained - the size and quality of the examples database
2. How well it is trained to detect "garbage" (besides knowing what's a letter, you need to know what is NOT a letter).
3. The OCR's design and type
4. If it's a Nerural Network, the Nerural Network structure affects its ability to learn and "decide".
So, if you're not making one of your own, it's just a matter of testing different kinds until you find one that fits.
You could try other approach. With tesseract (or other OCRS) you can get coordinates for each word. Then you can try to group those words by vercital and horizontal coordinates to get rows/columns. For example to tell a difference between a white space and tab space. It takes some practice to get good results but it is possible. With this method you can detect tables even if the tables use invisible separators - no lines. The word coordinates are solid base for table recog
We also have struggled with the issue of recognizing text within tables. There are two solutions which do it out of the box, ABBYY Recognition Server and ABBYY FlexiCapture. Rec Server is a server-based, high volume OCR tool designed for conversion of large volumes of documents to a searchable format. Although it is available with an API for those types of uses we recommend FlexiCapture. FlexiCapture gives low level control over extraction of data from within table formats including automatic detection of table items on a page. It is available in a full API version without a front end, or the off the shelf version that we market. Reach out to me if you want to know more.
Here are the basic steps that have worked for me. Tools needed include Tesseract, Python, OpenCV, and ImageMagick if you need to do any rotation of images to correct skew.
Use Tesseract to detect rotation and ImageMagick mogrify to fix it.
Use OpenCV to find and extract tables.
Use OpenCV to find and extract each cell from the table.
Use OpenCV to crop and clean up each cell so that there is no noise that will confuse OCR software.
Use Tesseract to OCR each cell.
Combine the extracted text of each cell into the format you need.
The code for each of these steps is extensive, but if you want to use a python package, it's as simple as the following.
pip3 install table_ocr
python3 -m table_ocr.demo https://raw.githubusercontent.com/eihli/image-table-ocr/master/resources/test_data/simple.png
That package and demo module will turn the following table into CSV output.
Cell,Format,Formula
B4,Percentage,None
C4,General,None
D4,Accounting,None
E4,Currency,"=PMT(B4/12,C4,D4)"
F4,Currency,=E4*C4
If you need to make any changes to get the code to work for table borders with different widths, there are extensive notes at https://eihli.github.io/image-table-ocr/pdf_table_extraction_and_ocr.html
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I want to develop an app for detecting wind according the audio stream.
I need some expert thoughts here, just to give me guide lines or some links, I know this is not easy task but I am planning to put a lot of effort here.
My plan is to detect some common patterns in the stream, and if the values are close to this common patterns of the wind noise I will notify that match is found, if the values are closer to the known pattern great, I can be sure that the wind is detected, if the values doesn't match with the patterns then I guess there is no so much wind....
That is my plan at first, but I need to learn how this things are done. Is there some open project already doing this ? Or is there someone who is doing research on this topics ?
The reason I write on this forum is because I do not know how to google it, the things I found was not I was looking for. I really do not know how to start developing this kind of algorithm.
EDIT 1 :
I tried to record a wind, and when I open the saved audio file for me it was just a bunch of numbers :). I do not even see in what format should I save this, is wave good enough ? Should I use something else, or what if I convert the wind noise audio file in mp3 : is this gonna help with parsing ?
Well I got many questions, that is because I do not know from where to read more about this kind of topic. I tag my question with guidlines so I hope someone will help me.
There must be something that is detectable, cause the wind noise is so common, there must be somehow to detect this, we need only someone to give me tips, someone who is familiar with this topic.
I just came across this post I have recently made a library which can detect wind noise in recordings.
I made a model of wind noise and created a database of examples and then trained a Machine Learning algorithm to detect and meter the wind level in a perceptually weighted way.
The C++/C code is here if it is of use to anyone!
The science for your problem is called "pattern classification", especially the subfield of "audio pattern classification". The task is abstracted as classifying a sound recording into two classes (wind and not wind). You seem to have no strong background in signal processing yet, so let me insert one central warning:
Pattern classification is not as easy as it looks at first. Humans excel at pattern classification. Computers don't.
A good first approach is often to compute the correlation of the Fourier transform of your signal and a sample. Don't know how much that will depend on wind speed, however.
You might want to have a look at the bag-of-frames approach, it was used successfully to classify ambient noise.
As #thiton mentioned this is an example of audio pattern classification.
Main characteristics for wind: it's a shaped (band/hp filtered) white noise with small semi-random fluctuations in amplitude and pitch. At least that's how most synthesizers reproduce it and it sounds quite convincing.
You have to check the spectral content and change in the wavefile, so you'll need FFT. Input format doesn't really matter, but obviously raw material (wav) is better.
Once you got that you should detect that it's close to some kind of colored noise and then perhaps extract series of pitch and amplitude and try to use classic pattern classification algorithm for that data set. I think supervised learning could work here.
This is actually a hard problem to solve.
Assuming you have only a single microphone data. The raw data you get when you open an audio file (time-domain signal) has some, but not a lot of information for this kind of processing. You need to go into the frequency domain using FFTs and look at the statistics of the the frequency bins and use that to build a classifier using SVM or Random Forests.
With all due respect to #Karoly-Horvath, I would also not use any recordings that has undergone compression, such as mp3. Audio compression algorithms always distorts the higher frequencies, which as it turns out, is an important feature in detecting wind now. If possible, get the raw PCM data from a mic.
You also need to make sure your recording is sampled at at least 24kHz so you have information of the signal up to 12kHz.
Finally - the wind shape in the frequency domain is not a simple filtered white noise. The characteristics is that it usually has high energy in the low frequencies (a rumbling type of sound) with sheering and flapping sounds in the high frequencies. The high frequency energy is quite transient, so if your FFT size is too big, you will miss this important feature.
If you have 2 microphone data, then this gets a little bit easier. Wind, when recorded, is a local phenomenon. Sure, in recordings, you can hear the rustling of leaves or the sound of chimes caused by the wind. But that is not wind-noise and should not be filtered out.
The actual annoying wind noise you hear in a recording is the air hitting the membrane of your microphone. That effect is a local event - and can be exploited if you have 2 microphones. It can be exploited because the event is local to each individual mic and is not correlated with the other mic. Of course, where the 2 mics are placed in relations to each other is also important. They have to be reasonably close to each other (say, within 8 inches).
A time-domain correlation can then be used to determine the presence of wind noise. (All the other recorded sound are correlated with each other because the mics are fairly close to each other, so a high correlation means no wind, low correlation means wind). If you are going with this approach, your input audio file need not be uncompressed. A reasonable compression algorithm won't affect this.
I hope this overview helps.
I recently learned about PDF417 barcodes and I was astonished that I can still read the barcode after I ripped it in half and scanned only a fragment of the original label.
How can the barcode decoding be that robust? Which (types of) algorithms are used during encoding and decoding?
EDIT: I understand the general philosophy of introducing redundancy to create robustness, but I'm interested in more details, i.e. how this is done with PDF417.
the pdf417 format allows for varying levels of duplication/redundancy in its content. the level of redundancy used will affect how much of the barcode can be obscured or removed while still leaving the contents readable
PDF417 does not use anything. It's a specification of encoding of data.
I think there is a confusion between the barcode format and the data it conveys.
The various barcode formats (PDF417, Aztec, DataMatrix) specify a way to encode data, be it numerical, alphabetic or binary... the exact content though is left unspecified.
From what I have seen, Reed-Solomon is often the algorithm used for redundancy. The exact level of redundancy is up to you with this algorithm and there are libraries at least in Java and C from what I've been dealing with.
Now, it is up to you to specify what the exact content of your barcode should be, including the algorithm used for redundancy and the parameters used by this algorithm. And of course you'll need to work hand in hand with those who are going to decode it :)
Note: QR seems slightly different, with explicit zones for redundancy data.
I don't know the PDF417. I know that QR codes use Reed Solomon correction. It is an oversampling technique. To get the concept: suppose you have a polynomial in the power of 6. Technically, you need seven points to describe this polynomial uniquely, so you can perfectly transmit the information about the whole polynomial with just seven points. However, if one of these seven is corrupted, you miss the information whole. To work around this issue, you extract a larger number of points out of the polynomial, and write them down. As long as you have at least seven out of the bunch, it will be enough to reconstruct your original information.
In other words, you trade space for robustness, by introducing more and more redundancy. Nothing new here.
I do not think the concept of trade off between space and robustness is any different here as anywhere else. Think RAID, let's say RAID 5 - you can yank a disk out of the array and the data is still available. The price? - an extra disk. Or in terms of the barcode - extra space the label occupies
This is a fairly broad question; what tools/libraries exist to take two photographs that are not identical, but extremely similar, and identify the specific differences between them?
An example would be to take a picture of my couch on Friday after my girlfriend is done cleaning and before a long weekend of having friends over, drinking, and playing rock band. Two days later I take a second photo of the couch; lighting is identical, the couch hasn't moved a milimeter, and I use a tripod in a fixed location.
What tools could I use to generate a diff of the images, or a third heatmap image of the differences? Are there any tools for .NET?
This depends largely on the image format and compression. But, at the end of the day, you are probably taking two rasters and comparing them pixel by pixel.
Take a look at the Perceptual Image Difference Utility.
The most obvious way to see every tiny, normally nigh-imperceptible difference, would be to XOR the pixel data. If the lighting is even slightly different, though, it might be too much. Differencing (subtracting) the pixel data might be more what you're looking for, depending on how subtle the differences are.
One place to start is with a rich image processing library such as IM. You can dabble with its operators interactively with the IMlab tool, call it directly from C or C++, or use its really decent Lua binding to drive it from Lua. It supports a wide array of operations on bitmaps, as well as an extensible library of file formats.
Even if you haven't deliberately moved anything, you might want to use an algorithm such as SIFT to get good sub-pixel quality alignment between the frames. Unless you want to treat the camera as fixed and detect motion of the couch as well.
I wrote this free .NET application using the toolkit my company makes (DotImage). It has a very simple algorithm, but the code is open source if you want to play with it -- you could adapt the algorithm to .NET Image classes if you don't want to buy a copy of DotImage.
http://www.atalasoft.com/cs/blogs/31appsin31days/archive/2008/05/13/image-difference-utility.aspx
Check out Andrew Kirillov's article on CodeProject. He wrote a C# application using the AForge.NET computer vision library to detect motion. On the AForge.NET website, there's a discussion of two frame differences for motion detection.
It's an interesting question. I can't refer you to any specific libraries, but the process you're asking about is basically a minimal case of motion compensation. This is the way that MPEG (MP4, DIVX, whatever) video manages to compress video so extremely well; you might look into MPEG for some information about the way those motion compensation algorithms are implemented.
One other thing to keep in mind; JPEG compression is a block-based compression; much of the benefit that MPEG brings from things is to actually do a block comparison. If most of your image (say the background) is the same from one image to the next, those blocks will be unchanged. It's a quick way to reduce the amount of data needed to be compared.
just use the .net imaging classes, create a new bitmap() x 2 and look at the R & G & B values of each pixel, you can also look at the A (Alpha/transparency) values if you want to when determining difference.
also a note, using the getPixel(y, x) method can be vastly slow, there is another way to get the entire image (less elegant) and for each ing through it yourself if i remember it was called the getBitmap or something similar, look in the imaging/bitmap classes & read some tutes they really are all you need & aren't that difficult to use, dont go third party unless you have to.