Algorithm to decrypt data with drawn strokes

Let's say I have an encrypted file on an iPhone and every time I want to decrypt it, I want to "draw" a decryption symbol instead of having to use a keyboard to type it in.
If you ask the user to draw a symbol to decrypt a file every time it is needed (e.g. every time they launch your application), they would probably prefer it to typing a 20-character or so password on the tiny keyboard, and they would still get roughly the security a 20-character password gives them (depending on how complicated the shape/symbol they draw is).
The symbol they would draw would most likely be one stroke (i.e. it's over once you lift your finger) but could be very complex, such that it's hard for someone else to repeat even if they see you draw it, kind of like how every person's signature is unique and hard to duplicate. Actually, requiring it to resist duplication may overly complicate things, so for now we can ignore that and assume the symbol will not be seen by anyone else, and thus it doesn't matter whether they could repeat it.
I guess the real question is how you would consistently convert (reasonably) the same stroke to the same key (e.g. hash value). There obviously has to be some threshold of forgiveness in the algorithm, because the user can't be expected to repeat the stroke 100% exactly.
Using the symbol as a decryption method adds a whole other dimension to this problem. You never want to store the generated hash value anywhere in unencrypted form, because then someone might be able to access that part of the storage and get the decryption key without going through the drawing process at all, and decrypt the file manually. You also most likely don't want to store anything about how the shape is drawn.
A good example of a stroke a user might use as their decryption symbol is the "&" symbol. Imagine a user drawing this symbol on their iPhone every time they need to decrypt a file. The size of the symbol might not be the same every time it is drawn, and the rotation may differ depending on how the user holds their device. Ideally, in both cases, because the strokes are relatively the same, the algorithm should generate the same hash value and thus decrypt the file.
Something like shape or character recognition seems like a similar problem: the user draws something (reasonably representing a shape) and it is snapped to the canonical shape, which would have the same hash value every time it is drawn. However, for something like this you would most likely need a database of shapes that can be drawn, and if you choose something like the letters of the alphabet, you only get 26 of them. Assuming the user should only need to draw one symbol to decrypt the file, you have an extremely insecure password with only 26 possibilities.
Another thing I thought of is that you could break up the drawn symbol into tiny segments and then run symbol recognition on those. So imagine you have 4 symbols in a database: a vertical line, a horizontal line, and a diagonal in each direction. Now as the user draws, each segment is recognized as one of these, and then they are all combined to form a hash value. Say the user chose the lowercase letter "r" as their decryption symbol. They would begin by drawing a vertical line down, followed by a vertical line up, followed by a diagonal line up and to the right. One problem with this method is knowing when to split the stroke into individual segments. You would probably also want to take into account roughly how long each segment is (e.g. in increments of 40 pixels), so that if someone drew a deformed "r" where the hump comes out near the bottom, it wouldn't be recognized as the same symbol and thus wouldn't decrypt the file.
A third method might be dividing the screen into a grid (not sure what size yet), seeing which cells the stroke passes through, and using this data somehow to generate a string.
Any other ideas of how this could be implemented? Have you ever heard of something like this? Are there any fundamental flaws that would prevent a system like this from working?
Thanks

I would try a variation of the segmentation approach: recognize simple patterns - I'll stick to straight and diagonal lines for this, but in theory you could also add circles, arcs and perhaps other things.
You can be quite sure when one line ends and another one starts, as there are 8 directions and you can detect a direction change (or, for a simpler approach, just detect pen up and pen down and use them as line delimiters). The first line gives a scale factor, so the length of every other line can be represented as a factor of it (for example, in a usual L shape, the first vertical line would give the "base length" b and the other line would then have a length of roughly 0.5 * b). After the user is finished, you can use the smallest factor s to "round" the lengths, so that you end up with an array of lengths that are integer multiples of s, like [1 * s, 2 * s, 4 * s, 5 * s]. This prevents the system from being too exact, and using the base length makes it robust against scaling.
Now somehow convert this information (lengths and directions) to a string (or a hash value, whatever you like) and it will be the same for the same strokes, even if the symbol is translated or scaled.
Additionally, you can store a 2D offset value (of course "rounded", too) for every line after the second one, so that the lines also have to be at the same relative positions; if you don't do this, L and T will most likely get the same string (one line up-down, one line left-right with length 0.5). So storing positions strengthens the whole thing a bit, but it is optional.
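As a minimal sketch of this in Python (assuming the stroke has already been split into straight segments, e.g. at detected direction changes, and simplifying the rounding by using the shortest segment as the unit s), the directions and rounded lengths could be hashed like this:

import hashlib
import math

DIRECTIONS = 8  # N, NE, E, SE, S, SW, W, NW

def quantize_segment(p0, p1):
    """Return (direction 0-7, length) for one straight segment."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    angle = math.atan2(dy, dx) % (2 * math.pi)
    direction = int(round(angle / (2 * math.pi / DIRECTIONS))) % DIRECTIONS
    return direction, math.hypot(dx, dy)

def stroke_to_key(segments):
    """segments: list of (start_point, end_point) pairs, in drawing order."""
    quantized = [quantize_segment(p0, p1) for p0, p1 in segments]
    base = min(length for _, length in quantized)   # shortest segment = rounding unit s
    tokens = []
    for direction, length in quantized:
        factor = max(1, round(length / base))       # length as an integer multiple of s
        tokens.append(f"{direction}:{factor}")
    # same directions and relative lengths -> same string -> same key,
    # no matter how the symbol is translated or scaled
    return hashlib.sha256("|".join(tokens).encode()).digest()

# example: an "L" drawn as a long line down and a shorter line to the right
key = stroke_to_key([((0, 0), (0, 100)), ((0, 100), (52, 100))])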
EDIT:
If you take the angle of the first line as a base angle, you can even make this robust to rotation.
Please note that this algorithm only gives 3 bits per stroke if all lines are of the same length and a maximum of perhaps up to 6-8 bits per stroke, some more if you store positions, too. This means you'd need a quite complex symbol of about 20-40 strokes to get 128 bits of security.
An easy way to add more variation/security would be to let the user use different colors from a given palette.
To reduce the risk of someone watching you, you could make each line disappear after it has been drawn or change the color to a color with a very low contrast to the background.

The problem of encrypting data with key material that may have small errors has been studied quite extensively.
In particular, there are a number of proposals for protecting data using biometric data (e.g. fingerprints or a retina scan) as a key. A typical approach is to use an appropriate error correction code, take your original key material K, compute its syndrome and store only the syndrome. Once you get a second reading of your key material K', the syndrome can be used to restore K from K' if K and K' are close enough (where "close enough" of course depends on the error correction scheme).
To get you started, here is a paper proposing a fuzzy vault scheme. This is a general proposal for an encryption scheme using a "fuzzy" key. Of course, you still need to examine how to extract characteristics from drawings that are stable enough for using such an error correction scheme. You will also have to examine how much entropy you can extract from such drawings. As bad as passwords are with respect to entropy, they may still be hard to beat.
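To make the syndrome idea concrete, here is a toy Python sketch using a Hamming(7,4)-style parity check that corrects a single flipped bit per 7-bit block of key material. Real biometric schemes use far stronger codes, and note that the stored syndrome itself leaks a few bits of the key, which is one more reason the entropy question matters.

def syndrome(block):
    """block: 7 bits at positions 1..7; returns the 3-bit syndrome as an int."""
    s = 0
    for pos, bit in enumerate(block, start=1):
        if bit:
            s ^= pos            # the parity-check column for position i is binary(i)
    return s

def recover(reading, stored_syndrome):
    """Correct at most one flipped bit in a fresh reading using the stored syndrome."""
    err = syndrome(reading) ^ stored_syndrome
    corrected = list(reading)
    if err:                     # err is the 1-based position of the flipped bit
        corrected[err - 1] ^= 1
    return corrected

K       = [1, 0, 1, 1, 0, 0, 1]   # original key material (one 7-bit block)
stored  = syndrome(K)             # only this small value is stored
K_prime = [1, 0, 1, 0, 0, 0, 1]   # second reading, one bit wrong
assert recover(K_prime, stored) == K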

Handwriting recognition often takes the duration of the stroke into account more than the actual length and such.
While it relates to pressure sensitivity, I think you may be able to see some similar conceptual bits to what you are thinking here.... jdadesign.net/safelock/
That's not exactly the same topic, but it's the closest thing that comes to mind at the moment.

I don't think you could get enough "bits" from a hand-drawn symbol to perform secure encryption. As you note, you have to allow enough slop in the recognition that the natural variations in drawing will be tolerated. In other words, you have to discard the noise in the strokes, smoothing them into a reproducible signal. But noise (high entropy) is what makes for good cryptographic keys.
Think of it this way. If you did decompose the gesture into segments of up, down, left, and right, each segment would represent 2 bits of information. For a 128-bit AES key, the symbol would need 64 such segments. That's a pretty complicated gesture to remember. And if it's simplified by repeating many segments in a row ("right, right, right, ...") it makes a lousy (predictable, non-random) key.

I've had another think about this. I'm not a comp-sci person, but would something like this work?
Whatever symbol or "pattern" someone draws, the only viable thing you're left with to analyze is the set of points generated by the touchBegan, touchMoved and touchEnded events.
So... let's take all the points generated, be it 100 or 1,000,000, it doesn't really matter.
Divide them into groups, as many groups as you want. The more the merrier, I assume, but for this example let's put them in 4 groups. With 100 points, group 1 would contain points 1-25, group 2 contains points 26-50, and so on.
For each group, use all the points to calculate an average position.
It may work better if the canvas space is divided into a grid, and the 'average positions' get plotted onto their nearest coordinate.
Then check the relative distances between all the groups: 1-2, 1-3, 1-4, 2-3, 2-4, 3-4.
You now have a number of distinct points, and enough information about those points to generate a key. The averages and the grid should help smooth out some, if not all, of the noise between attempts.
You may have to ask the user to draw their pattern a few times, and compare each group against the groups from previous attempts. That way, you can identify which groups the user can plot consistently. It has the added benefit of training the user's hand at drawing their pattern.
I suspect that the more points and groups you have, the more accurate this will be.
In fact, I'm going to give it a try myself.
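Here is a rough sketch of that idea in Python, assuming the points collected from the touchBegan/touchMoved/touchEnded events arrive as a list of (x, y) tuples; the group count, grid size and use of grid-cell distances are arbitrary illustrative choices:

import hashlib
from itertools import combinations

GROUPS = 4
GRID = 40   # snap each averaged position to a 40-pixel grid

def pattern_to_key(points):
    """points: at least GROUPS (x, y) samples from one drawn pattern."""
    size = len(points) // GROUPS
    centroids = []
    for g in range(GROUPS):
        chunk = points[g * size:(g + 1) * size]
        cx = sum(p[0] for p in chunk) / len(chunk)
        cy = sum(p[1] for p in chunk) / len(chunk)
        # plot the average onto its nearest grid coordinate
        centroids.append((round(cx / GRID), round(cy / GRID)))
    # relative distances between every pair of groups: 1-2, 1-3, 1-4, 2-3, 2-4, 3-4
    dists = [abs(a[0] - b[0]) + abs(a[1] - b[1])
             for a, b in combinations(centroids, 2)]
    return hashlib.sha256(",".join(map(str, dists)).encode()).digest()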

Gestures.
http://depts.washington.edu/aimgroup/proj/dollar/
You can define your own algorithms for particular gestures, e.g. a circle:
1. Find the start point.
2. Find the left-most, right-most and furthest points and get an approximate radius.
3. Check all the points against the radius, with a margin of error (25%?).
4. If the radius checks out, you have a circle.
Vertical straight line:
1. Check the start point and end point X and Y positions.
2. Compare the in-between points against the X and Y of the start and end.
3. If they're roughly on the same X coordinate, but with ascending or descending Y coordinates, you have a vertical line.
And so on, getting more complex for more complicated gestures.
You can even combine gestures. So let's say you have an algorithm for 6 gestures. You can combine them to form different symbols. The order in which the gestures are created could be important, adding an extra layer of security.
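As a rough illustration, those two checks might look like this in Python (the centre estimate, thresholds and point format are my own guesses):

import math

def is_circle(points, tolerance=0.25):
    """points: list of (x, y); True if they all sit on an approximate circle."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    radius = (max(p[0] for p in points) - min(p[0] for p in points)) / 2
    return all(abs(math.hypot(x - cx, y - cy) - radius) <= tolerance * radius
               for x, y in points)

def is_vertical_line(points, max_x_drift=10):
    """True if all points share roughly the same X and Y changes monotonically."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    same_x = max(xs) - min(xs) <= max_x_drift
    ascending = all(a <= b for a, b in zip(ys, ys[1:]))
    descending = all(a >= b for a, b in zip(ys, ys[1:]))
    return same_x and (ascending or descending)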

What if you took all of the x,y coordinates of the stroke and performed some sort of linear two-way operation on them? You could then compute an 'approximate' hash, and if the number calculated from the drawn stroke is within, say, 10% of your approximation, then you grant access.

It all depends on what kind of attack you are trying to prevent. If you want full-on encryption, where you assume that the attacker has full access to the encrypted file, then you'll need quite a lot of bits of entropy to achieve a decent level of protection. Assuming you get the algorithms right, you can take two to the power of the input entropy in bits (the upper limit for this is the number of different possible inputs), multiply by the amount of time the key setup procedure takes, divide by how much more computing power the attacker has, and get the time the attacker needs to break your encryption by brute force.
For example, something like the 9-cell figure unlock method of Android might get you around 16 bits of entropy. Let's assume you use 5 seconds of CPU time to calculate the encryption key. Then with an average PC, it takes 5*2**16/20 seconds, or about 4.5 hours, to crack. Any loss of entropy in the input or inefficiency in the key setup and encryption will quickly take that down to minutes, not to mention if clusters of computers are used.
To be frank, that won't be much better than just storing the file in an obscure file format and hoping no one figures it out.

Related

How to detect similar objects in this picture?

I want to find patterns in an image. By "find patterns" I mean "detect similar objects", so these patterns shouldn't be some high-frequency info like noise.
For example, on this image I'd like to get pattern "window" with ROI/ellipse of each object:
I've read advice to use Autocorrelation, FFT, or DCT for this problem. As far as I've understood, Autocorrelation and FFT are alternatives, not complementary.
First, I don't know if it is even possible to get such high-level info in the frequency domain?
As I have an FFT implemented, I tried to use it. This is the spectrogram:
Could you suggest how to further analyze this spectrogram to detect the "window" objects and their spatial locations?
Is it needed to find the brightest points/lines on spectrogram?
Should the FFT be done for image chunks instead of whole image?
If it's not possible to find such objects with this approach, what would you advise?
Thanks in advance.
P.S. Sorry for large image size.
Beware, this is not my cup of tea, so read with extreme prejudice. IIRC, SIFT/SURF + RANSAC methods are usually used for such tasks.
Identify key points of interest in the image (SIFT/SURF)
This will get you a list of 2D locations in your image with specific features (which you can handle as an integer hash code). I think SIFT (Scale Invariant Feature Transform) is ideal for this. It works similarly to human vision (identify a specific change in some feature and "ignore" the rest of the image), so instead of matching all the pixels of the image, we cross-match only a few of them.
Sort by occurrence
Each of the found SIFT points has a feature list. If we make a histogram of these features (count how many similar or identical feature points there are), then we can group points with the same occurrence. The idea is that if we have n placings of an object in the image, each of its key points should be duplicated n times in the image.
So if we have many points that each occur n times, it hints that we have n similar objects in the image. We select just these key points for the next step.
Find object placings
Each object can have a different scale, position and orientation. Let's assume they all have the same aspect ratio. Then the corresponding key points in each object should have the same relative properties across objects (like the relative angle between key points, normalized distances, etc.).
So the task is to regroup our key points per object, so that all the objects end up with the same key points and the same relative properties.
This can be done by brute force (testing all combinations and checking the properties), by RANSAC, or by any other method.
Usually we select a first key point (no matter which) and find 2 others that form the same angle and relative distance ratio in all of the objects,
so the angle is the same and |p1-p0| / |p2-p0| is also the same or close. While grouping, remember that key points within an object are usually closer to each other than to points from other objects, so we can use the distance from the first selected key point to decide which object a key point probably belongs to (if we try the closest ones first, there is a high probability we find our combination fast). All the other points pi can be added similarly, one by one (using p0, p1, pi).
So I would start with the closest 2 key points (this can sometimes be fooled by overlapping or touching mirrored objects, as a key point from a neighbouring object can occasionally be closer than one from its own object).
After such regrouping, just check whether all the found objects have the same properties (aspect ratio). To visualize them you can find the OBB (Oriented Bounding Box) of each object's key points (which can also be used for the check).
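As a rough starting point for steps 1-2 (not the full regrouping), OpenCV's SIFT implementation could surface the repeated key points like this; the file name, neighbour count and distance threshold are only guesses:

import cv2

img = cv2.imread("building.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input image

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Match every descriptor against all the others; repeated structures such as
# identical windows should produce several near-identical matches per point.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(descriptors, descriptors, k=6)

repeated = []
for m in matches:
    # m[0] is the trivial self-match; count how many *other* descriptors are
    # nearly as close, which hints that this feature occurs several times.
    close = [x for x in m[1:] if x.distance < 100]        # threshold is a guess
    if len(close) >= 2:
        repeated.append(keypoints[m[0].queryIdx])

out = cv2.drawKeypoints(img, repeated, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("repeated_keypoints.png", out)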

Fast algorithm required to detect changes in pixels

I am looking for an algorithm to detect changes in a pixel array. In the picture below, the red line illustrates the pixel array. I want the algorithm to recognize and trigger once the golf club below passes through the red line.
My original thought was to read the pixels along the line and to detect changes in color (specifically on the lower half of the line) when the color changes above a certain threshold.
Is there a better way to do this or an algorithm I can use which detects once the golf club passes the red line?
Thank you for your thoughts.
Tracking the pixels along the line is a feasible approach.
I recommend not checking the pixels in order from top to bottom or bottom to top; instead, use a kind of interlacing order to increase the chance of detecting a change with only a few pixel checks.
Here is what I mean:
If you index the pixels from the top to the bottom by 0, 1, 2, 3, ..., 63 [just an example],
I would check them in e.g. this order:
0, 32, 16, 48, 8, 24, 40, 56, 4, 12, 20, 28, 36, 44, 52, 60, ...
Or, if you predict a non-uniform probability distribution for which pixels change, you can take even that into account and find an order of pixel checks which detects the change with the lowest average number of checks.
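A tiny sketch that generates that coarse-to-fine check order for any power-of-two pixel count (the function name is mine):

def interleaved_order(n):
    """Yield indices 0..n-1, visiting the midpoints of ever finer subdivisions."""
    seen = set()
    step = n
    while step >= 1:
        for i in range(0, n, step):
            if i not in seen:
                seen.add(i)
                yield i
        step //= 2

print(list(interleaved_order(64))[:16])
# -> [0, 32, 16, 48, 8, 24, 40, 56, 4, 12, 20, 28, 36, 44, 52, 60]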
What kind of video-source do you use? Most video compressions already use differences between images to reduce the amount of data to store.
It might be quite a bit of work, but you might be able to see the change directly in the compressed data without decoding it. Detecting change in a video simply by reading the undecoded data sounds like an interesting challenge to me.

Invoice / OCR: Detect two important points in invoice image

I am currently working on OCR software and my idea is to use templates to try to recognize data inside invoices.
However scanned invoices can have several 'flaws' with them:
Not all invoices, based on a single template, are correctly aligned under the scanner.
People can write on invoices
etc.
Example of invoice: (Have to google it, sadly cannot add a more concrete version as client data is confidential obviously)
I find my data in the invoices based on the x-values of the text.
However I need to know the scale of the invoice and the offset from left/right, before I can do any real calculations with all data that I have retrieved.
What have I tried so far?
1) Making the image monochrome and using the left and right bounds of the first black pixel. This fails because people can write on invoices.
2) Dividing the invoice into vertical sections and using the sections with the highest number of black pixels. This fails because the distribution is not always uniform among similar templates.
I could really use your help on (1) how to identify important points in invoices and (2) on what I should focus as the important points.
I hope the question is clear enough as it is quite hard to explain.
Detecting rotation
I would suggest you start by detecting straight lines.
Look (perhaps randomly) for small areas with high contrast, i.e. mostly white but a fair amount of very black pixels as well. Then try to fit a line to these black pixels, e.g. using least squares method. Drop the outliers, and fit another line to the remaining points. Iterate this as required. Evaluate how good that fit is, i.e. how many of the pixels in the observed area are really close to the line, and how far that line extends beyond the observed area. Do this process for a number of regions, and you should get a weighted list of lines.
For each line, you can compute the direction of the line itself and the direction orthogonal to that. One of these numbers can be chosen from an interval [0°, 90°), the other will be 90° plus that value, so storing one is enough. Take all these directions, and find one angle which best matches all of them. You can do that using a sliding window of e.g. 5°: slide across that (cyclic) region and find a value where the maximal number of lines are within the window, then compute the average or median of the angles within that window. All of this computation can be done taking the weights of the lines into account.
Once you have found the direction of lines, you can rotate your image so that the lines are perfectly aligned to the coordinate axes.
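A small sketch of that sliding-window step, assuming you already have (angle, weight) pairs for the fitted lines with angles normalised to [0, 90); the window and step sizes are illustrative:

import numpy as np

def dominant_angle(lines, window=5.0, step=0.25):
    """Return the weighted average angle of the best-supported window."""
    angles = np.array([a for a, _ in lines])
    weights = np.array([w for _, w in lines])
    best_center, best_support = 0.0, -1.0
    for center in np.arange(0.0, 90.0, step):
        # cyclic distance within the 90-degree period
        d = np.abs((angles - center + 45.0) % 90.0 - 45.0)
        support = weights[d <= window / 2].sum()
        if support > best_support:
            best_support, best_center = support, center
    # weighted mean of the angles inside the winning window, unwrapped around it
    d = (angles - best_center + 45.0) % 90.0 - 45.0
    inside = np.abs(d) <= window / 2
    return (best_center + np.average(d[inside], weights=weights[inside])) % 90.0

# e.g. lines fitted at 1.5, 2.0 and 88.5 degrees (the last is really -1.5)
print(dominant_angle([(1.5, 2.0), (2.0, 1.0), (88.5, 1.0)]))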
Detecting translation
Assuming the image wasn't scaled at any point, you can then try to use an FFT-based correlation of the image to match it to the template. Convert both images to gray, pad them with zeros till the originals take up at most 1/2 the edge length of the padded image, which preferably should be a power of two. FFT both images in both directions, multiply one with the complex conjugate of the other element-wise, and inverse-FFT back. The resulting image will encode how much the two images would agree for a given shift relative to one another. Simply find the maximum, and you know how to make them match.
Added text will cause no problems at all. This method will work best for large areas, like the company logo and gray background boxes. Thin lines will provide a poorer match, so in those cases you might have to blur the picture before doing the correlation, to broaden the features. You don't have to use the blurred image for further processing; once you know the offset you can return to the rotated but unblurred version.
Now you know both rotation and translation, and assumed no scaling or shearing, so you know exactly which portion of the template corresponds to which portion of the scan. Proceed.
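A minimal NumPy sketch of that correlation step; the power-of-two padding and the array names are my own choices:

import numpy as np

def find_shift(scan, template):
    """Return the (dy, dx) shift that best aligns the template to the scan.
    Both arguments are 2D grayscale float arrays, already de-rotated."""
    H = 1 << (2 * max(scan.shape[0], template.shape[0]) - 1).bit_length()
    W = 1 << (2 * max(scan.shape[1], template.shape[1]) - 1).bit_length()
    F1 = np.fft.rfft2(scan, (H, W))                  # zero-padded FFTs
    F2 = np.fft.rfft2(template, (H, W))
    corr = np.fft.irfft2(F1 * np.conj(F2), (H, W))   # cross-correlation surface
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # peaks beyond half the padded size correspond to negative shifts
    if dy > H // 2: dy -= H
    if dx > W // 2: dx -= W
    return int(dy), int(dx)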
If rotation is solved already, I'd just sum up all pixel color values horizontally and vertically to a single horizontal / vertical "line". This should provide clear spikes where you have horizontal and vertical lines in the form.
p.s. Generated a corresponding horizontal image with Gimp's scaling capabilities, attached below (it's a bit hard to see because it's only one pixel high and may get scaled down because it's > 700 px wide; the url is http://i.stack.imgur.com/Zy8zO.png ).

Detecting empty pages in scanned documents

So we need to detect whether an image, created by a scanner, represents an empty page. I'm way out of my depth when it comes to image processing, so I have to run this by the community.
Here's what I have come up with so far:
Empty pages can be glaringly white, gray recycled paper, or yellowed old paper. The current idea is to create a histogram for a page, look for a steep increase of the curve, and get the percentage of pixels that are darker than that point. If that exceeds a threshold, the page is likely not empty.
Since this would likely classify a page containing a single line of text at the top as empty, we would tile the page and gather statistics about each tile.
We would need to detect scanned staples and holes from binding (likely only in certain tiles), but this can be put off to some later stage. However, if you have an idea of what to look out for besides these two, please mention it in a comment.
This needs to be fast. It's part of a document processing workflow that processes (tens of) thousands of pages per day. If processing a page takes ten seconds longer, then our customers will have to tell their customers that they'll have to wait several days longer for their results. (If this results in more false positives, some customers would rather have someone check a few dozen found "empty" pages than have their customers wait one more day.)
So here's my questions:
Is it a good idea to take this route or is there something better?
If we do it this way, how would I do this? What's a good (cheap) algorithm for finding a threshold for a page? Could we gain significant speed by assuming a similar threshold for a batch of documents? To which precision could brightness values be rounded, before getting logged? What quirks could we expect?
If you know that a scanned page is going to fill the image entirely, then calculating the standard deviation might be a good way of doing this.
I would suggest blurring the page slightly to reduce some noise, then calculating the SD for the page. In theory, a page that is more or less all one colour will have a low SD, and one with lots of text will have a higher SD. Then it's a case of 'training' the system to work out when a page is plain and when it is text. You might find that certain pages are hard for it to tell.
You could train it by having it process a vast number of pages: it goes through them all, and you say whether each one is plain or not.
EDIT
OK, a white page with black text, if we have just the page and no surrounding stuff, will have a mean colour of grey, probably a fairly light grey. Getting the average is a for loop through all the pixels, adding up their values and then dividing by the number of pixels. I'm not good with this O(log N) stuff, but suffice to say it will not take that long, unless you have HUGE images.
The SD is a second for loop; this time we count up how different each pixel is from the mean, and then divide by the number of pixels (strictly, the standard deviation squares the differences and takes a square root at the end, but the mean absolute deviation computed below works just as well here). This will take a bit longer than the mean, as we have to do something like
diff = thispixel - mean;   // signed difference from the mean
if (diff < 0) {
    diff = -diff;          // take the absolute value
}
runningTotal += diff;      // divide runningTotal by the pixel count at the end
For a plain coloured page, each pixel will be close to the mean value, thus our SD will be low. If the SD is below a certain value, we can assume that this means the page is all one colour.
This might have problems if there is only a very minimal amount of text, as it will not have a large influence on the SD, so maybe, as you suggested in the question, break the page into sections. I suggest horizontal strips, as text tends to run that way. If we process these strips one at a time, then once one strip suggests it has text we can stop, as we don't care whether the rest is blank or not.
Blurring the page will help reduce noise, as the odd pixel of noise will be reduced in its impact, thus give you a 'tighter' SD. You could also use it to reduce the resolution of your image.
Say your source image is 300 wide by 900 high; you could sample pixels in blocks of nine (3 x 3) and end up with an image that is 100 wide by 300 high. This can actually be used to reduce the number of calculations you need to do, in this case to a ninth!
The main problem is going to be in working out how high an SD can be with just a plain page. Maybe have it find the SD of a load of blank pages.
By the sounds of it, you are probably going to want to have a middle ground that lets it be unsure and ask for human intervention, possibly letting the human value train the system to get better?
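A quick sketch of the blur + per-strip deviation idea using NumPy and Pillow; the blur radius, strip height and threshold are made-up numbers that would need tuning against real blank pages:

import numpy as np
from PIL import Image, ImageFilter

def looks_blank(path, strip_height=64, threshold=4.0):
    img = Image.open(path).convert("L").filter(ImageFilter.GaussianBlur(2))
    page = np.asarray(img, dtype=np.float32)
    for top in range(0, page.shape[0], strip_height):
        strip = page[top:top + strip_height]
        # mean absolute deviation of the strip; text pushes this up sharply
        if np.abs(strip - strip.mean()).mean() > threshold:
            return False     # found a strip with content, stop early
    return True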
Perform some sort of simple edge detection. If the number of pixels constituting edges is below some threshold, then there's going to be a high probability the page is empty. This could be improved by classifying certain edges that correspond with high certainty (by shape and location) to punched holes and staples as trivial and discounting them from the metric.
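For illustration, a minimal version of that edge-count check with OpenCV's Canny detector; the thresholds and the cut-off are guesses to be tuned on real scans:

import cv2

def probably_empty(path, max_edge_pixels=500):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 50, 150)              # binary edge map
    return cv2.countNonZero(edges) < max_edge_pixels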
When I worked for a document processor (~8 years ago), we handled client projects varying from very "clean" only-US-letter-sized pages to cover-/cardstock of irregular shapes mixed with normal pages. Operators fed pre-sorted files into scanning machines and only had to watch for folded corners and similar mechanical problems. Their output was multiple streams of hundreds of images corresponding to a range of files. A single scanner operator could easily scan 15k pieces of paper in a shift (that's only 0.60 pages/sec, while a scanner at speed could handle 3 pages/sec and still scan both sides). Later operators processed those looking for key pages to mark file start and end. (Image recognition can be used here, sometimes, but people also provide a quality check on the first operators.) We had many variables that could be set per client project.
I'm basing the rough outline below on that experience and how it appears that your goals and workflow are similar.
(Terminology: By client I mean our client, e.g. a specific bank. A project or client project is a set of documents from that client that contains many files, e.g. all mortgages handled by a specific branch in a given year. A file is a logical arrangement that would normally be a physical file folder for one of the client's customers, e.g. all mortgage papers for one address.)
Cut off the top, bottom, sides, and corners. Throw these out of your calculations (even though you'll probably store them in the final image). This will cover staple holes, binder holes, but also (tiny) folded corners and very minute torn edges which appear as black spots. Depending on how you're scanning, the last two may be less of a problem.
Vary the sizes of these cuts for each client project, as required. For example, even a very thin edge slice, say 1-2mm, will eliminate most ragged edges without increasing false positive rate.
Convert to black and white, 1 bit per pixel. I suspect you are already doing this for some client projects anyway, so doing this efficiently and effectively, which can be subtle, should be no extra work. (Even if you don't store the 1bpp image as the deliverable result, the conversion will be helpful in empty page detection.) Eliminate noise by dropping any black pixels with none or only one black neighbor (using all surrounding 8 as neighbors).
After cutting the extremities (#1) and this simplistic noise reduction, blank pages will have a very low number of black pixels; most blanks will have no black pixels at all – barring exceptionally poor page quality, inked stamps (when scanning back-sides, mentioned more below), or other circumstances across the whole project, and so forth.
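To make the first and third points concrete, here is a rough NumPy/Pillow sketch that cuts the extremities, converts to 1 bit per pixel, drops isolated black pixels and counts what is left; the margin and threshold values are illustrative only:

import numpy as np
from PIL import Image

def black_pixels_after_cleanup(path, margin=60, threshold=128):
    gray = np.asarray(Image.open(path).convert("L"))
    core = gray[margin:-margin, margin:-margin]      # cut top/bottom/sides
    black = (core < threshold).astype(np.uint8)      # 1 = black pixel
    # count the black neighbours of every pixel (8-neighbourhood)
    padded = np.pad(black, 1)
    neighbours = sum(padded[1 + dy:padded.shape[0] - 1 + dy,
                            1 + dx:padded.shape[1] - 1 + dx]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0))
    # keep only black pixels that have at least two black neighbours
    return int(np.count_nonzero(black & (neighbours >= 2)))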
Depending on client project, you may set hotspots to be watched – the converse of cutting off the sides. For example, watching a 1" strip where a single line at the top of the page would appear may reduce false positives. A low contrast scan or faded hardcopy (perhaps even pencil, which can be common on back-sides) with only one line of text will be distinguished from a blank page this way.
What sections are worth watching depends on each project, but do not try to divide the page up into tiles and then subdivide those tiles into areas of interest. Instead, parallelize this on the page level; e.g. 1 worker per core, each worker handles a full page at a time.
Depending on how you're keying individual files, you may find it helpful to drop blanks (before marking start-of-file pages, which is still often a manual process even at high volume) then watch for blank pages at unexpected points after files have been keyed (e.g. expected would be the last page of the file, without being two blanks in a row, etc.).
For example, if a particular project is only scanning one side of each page, then detecting two blank pages in a row is a good indication that a couple pages in a file were flipped upside-down (clients often hand over hardcopy files like this). Either the sorters (who remove things like staples and paperclips) or the first machine operators should have caught this mistake, but, regardless, it will now need a manual check to verify.
On the other hand, there were projects that had very clean files so sorters could insert (usually colored) blank pages marking file boundaries. In this case, the second set of people still did the keying by file number, but only had to examine the first page of each file. This wasn't rare, but not common either.
Before I start rambling a bit, I hope my main point comes across: you have to decide how to mitigate rates of false positives (= data loss) and false negatives (= annoying blanks and otherwise harmless, but a maximum allowed rate may still be specified in the project contract). That varies drastically by project and the type of files/documents you're handling, but it guides you in how to do the detection. You will get much better results from a tailored approach than trying one-size-fits-all, even if the tailored approaches are 80-98% similar.
If you're delivering 1bpp images to the client, for example, you might not even want/need to eliminate blanks as filesize (and ultimately size of the delivered dataset) won't be an issue. This can be an acceptable trade-off when eliminating most blanks is harder while maintaining a low false positive rate; such as for files with inked stamps ("received on", "accepted", "due date", etc.; they bleed through to the back) or other problems, for example.
My fall class does a bunch of image-processing projects.
Here's what I would try:
Project from color to grayscale
Pour all the pixels into a simple histogram with say 100 buckets between 0 and 1
Find a local minimum in the histogram such that the absolute value of above - below is as small as possible, where above is the number of brighter pixels and below is the number of darker pixels
Force the above pixels to white and the below pixels to black
If you like, as an extra step you could remove black edges
If there are hardly any black pixels, the page is blank
The first two steps should be combined, and they are the only time-consuming steps; on a 600dpi image you may have to touch many millions of pixels. The rest will be lightning fast. I'd be very surprised if you can't classify multiple images per second, especially if you know there will be no black edges.
The only part that requires training or experiment is the last step. It's also possible that you will need to fiddle around with the number of buckets in the histogram; if there are too many buckets, you may have a bad local minimum.
Good luck, and report back to us how you make out!
Check out this line detection algorithm: http://homepages.inf.ed.ac.uk/rbf/HIPR2/linedet.htm. In addition to a detailed explanation of how the algo works there's a demo where you can use your own image and see the results. I tried two images: 1) a B&W scan of a receipt, 2) the B&W, "blank" back side of that same receipt. All of the edge detection algorithms I tried found edges on the "blank" page. But, this line detection algorithm was the only algorithm that correctly found lines on the front page and yet didn't find anything on the "blank" back page.
It looks as if you're trying to convert all paperwork for a company into digital documents. Some of this paper can be really old.
Say your text is black, and any other color is the background. If you take two weighted averages, one consisting of what you think is the text and one consisting of the background, you can compare the two and see if they're distant enough to warrant further evaluation. This removes the effect of any uneven aging of the paper.
Staple holes and punched holes in paper are pretty standard in size, but they'd show up as gray or not at all if you're scanning on a white background. If not, then you can guess where these are and remove them.
Now, we look at areas of high interest, areas where the black pixels are the most dense. Select a portion of that and OCR it. Place the starting top-left closest to an area where text begins. On a typical document, a solid blank linear area going left-to-right and another going top-to-bottom denotes the top and left sides of a paragraph. You can be sure that you got a line of text because below a line of text is another blank left-to-right area. So you don't need to worry about selecting a portion that will chop text in half.
You could take the mean gray level (integer) of each few rows of the scanned image (depending on the resolution and how many rows are required to capture one line of text), then consider the spread of row means. If there is no text on the page, the spread of means should be small (e.g. background ranges from 250-255), and if there is text on the whole page or on part of the page, the spread would be much larger (e.g. 15 for text to 250 for background).
Seems to me like the solution should be computationally simple due to the large number of pages to check. Approaches requiring further processing (edge detection, filtering, etc) seem like overkill, and will take much longer to run.
There is no need to process pixel by pixel; using matrices will make this more efficient. For example, with NumPy you can calculate means, sums, etc. for entire rows, columns or matrices at once. There is also no need to process EVERY pixel; a good sample of rows should accomplish the task with similar accuracy. 8-bit accuracy should be fine, and you could even resample to larger pixels before running this processing algorithm.
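A small NumPy sketch of that row-mean spread check; the band height and the spread threshold are guesses:

import numpy as np

def row_mean_spread(page, band_height=8):
    """page: 2D uint8 grayscale array; returns the spread of band means."""
    h = (page.shape[0] // band_height) * band_height
    bands = page[:h].reshape(-1, band_height, page.shape[1])
    means = bands.mean(axis=(1, 2))      # one mean gray level per band of rows
    return means.max() - means.min()     # small for blank pages, large with text

def is_probably_blank(page, max_spread=10):
    return row_mean_spread(page) < max_spread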
You can do a noisy trim, i.e. blur the image and do an auto-trim (without actually modifying the image). If the width or height of the trim result is below a threshold (e.g. 80 to 100 for a 600 dpi image) then the page is empty.
A proof of concept using the ImageMagick command line front-end:
$ convert scan.png -shave 300x0 -virtual-pixel White -blur 0x15 -fuzz 15% \
-trim info:
The above command assumes a 600 dpi DIN A4 black and white (1 Bit) image. It also ignores a margin of 300 pixels such that artifacts like perforation holes don't yield false negatives.

Cunning ways to draw a starfield

I'm working on a game, and I've come up with a rather interesting problem: clever ways to draw starfields.
It's a 2D game, so the action can scroll in the X and Y directions. In addition, we can adjust the scale to show more or less of the play area. I'd also like the starfield to have fake parallax to give an impression of depth.
Right now I'm doing this in the traditional way, by having a big array of stars, each of which is tagged with a 'depth' factor. To draw, I translate each star according to the camera position multiplied by the 'depth', so some stars move a lot, and some move a little. This all works fine, but of course since I have a finite number of stars in my array I have issues when the camera moves too far or we zoom out too much. This will all work, but involves lots of code and special cases.
This offends my sense of elegance. There has got be a better way of achieving this.
I've considered procedurally generating my stars, which allows me to have an unlimited number: e.g. by using a fixed seed and PRNG to determine the coordinates. I would need to divide the sky up into tiles, generate the seed by hashing the tile coordinates, and then draw, say, 100 stars per tile. This allows me to extend my starfield indefinitely in all directions while still only needing to consider the tiles that are visible --- but this doesn't work with the 'depth' factor, as this allows stars to stray outside their tile. I could simply use multiple layered non-parallax starfields using this algorithm but this strikes me as cheating.
And, of course, I need to do all this every frame, so it's got to be fast.
What do you all reckon?
Have a few layers of stars.
For each layer, use a seeded random number generator (or just an array) to generate the amount of blank space between one star and the next (a Poisson distribution, if you want to be picky about it). You want the stars pretty sparse, so the blank space will often be more than a whole row. The back layers will be more dense than the front ones, obviously.
Use this to give yourself several tiles each (say) two screens wide. Scroll the starfield by keeping track of where that "first" star is for each layer.
The player won't notice the tiling, because you scroll the tiles at different rates for each layer, especially if you use a few layers that are each fairly sparse.
As stars in the background don't move as fast as those in the foreground, you could maybe make multi-layer tiles for the background and replace them with one-layer-ones when you've got time to do that. Oh, and how about repeating patterns in the background layers? This would maybe allow you to pregenerate all background tiles - you could still shift them in height and overlay multiple ones with random offsets or so to make it look random.
Is there anything wrong with wrapping the star field around in X and Y? Because of your depth, the wraparound distance should depend on the depth, but you can do that. Each recorded star at (x,y,depth) should appear at all points
[x + j * S * depth, y + k * S * depth]
for all integers j and k. S is a wraparound parameter. If S is 1 then wraparound happens immediately and all stars are always shown somewhere. If S is higher wraparound doesn't happen immediately and some stars are shown off screen. You'll probably want S big enough to ensure no repeats at maximum zoom out.
Each frame, render the stars on one single bitmap/layer. They are only dots, and so it will be faster than using any algorithm with multiple layers.
Now you need an infinite 2D-grid of 3D-boxes filled with a finite number of stars. For each box, you can define an individual RANDOM_SEED value, using its grid-coordinates. The stars in each box can be generated on-the-fly.
Remember to correct the perspective when you zoom: Each 3D-box has a near-rectangle (front-face) and a far-rectangle. You will see more stars of neighbouring boxes, whenever the far-rectangle or near-rectangle shrinks in your view.
Your far-rectangles should never be smaller than half the width of the near-rectangles, otherwise it might be troublesome: You might have to scan huge lists of stars where most of them are out of bounds. You can realize stars behind the far-rectangles via additional 2D-grids of 3D-boxes with other sizes and depths.
Why not combine the coordinates of the starfield 3D boxes to form the random number seed? Use a global "adjustment" if you want to produce different universes. That way you don't need to track the boxes you can't see because the contents are fixed by their location.
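A rough Python sketch of that seeded-tile idea; the tile size, star count, layer depths and mixing constants are arbitrary illustration values:

import random

TILE = 512             # world units per tile
STARS_PER_TILE = 100

def tile_seed(tx, ty, layer, universe=0):
    # mix the coordinates into one integer seed (constants are arbitrary primes)
    return (tx * 73856093) ^ (ty * 19349663) ^ (layer * 83492791) ^ universe

def stars_in_tile(tx, ty, layer, universe=0):
    """Deterministically regenerate the stars of one tile on one depth layer."""
    rng = random.Random(tile_seed(tx, ty, layer, universe))
    return [(tx * TILE + rng.uniform(0, TILE), ty * TILE + rng.uniform(0, TILE))
            for _ in range(STARS_PER_TILE)]

def visible_stars(cam_x, cam_y, width, height, layer):
    """Stars of one parallax layer, in screen coordinates."""
    depth = 0.2 + 0.8 * layer / 3            # layer 0 (far, slow) .. 3 (near, fast)
    ox, oy = cam_x * depth, cam_y * depth    # this layer's scroll offset
    x0, x1 = int(ox // TILE), int((ox + width) // TILE)
    y0, y1 = int(oy // TILE), int((oy + height) // TILE)
    out = []
    for tx in range(x0, x1 + 1):
        for ty in range(y0, y1 + 1):
            for sx, sy in stars_in_tile(tx, ty, layer):
                out.append((sx - ox, sy - oy))
    return out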
