Image Hash for very similar images [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I am taking screenshots of an application, and trying to detect if the exact image has been seen before. I am looking to detect trivial changes as different - e.g. if there is text in the image, and the spelling changes, that counts as a mismatch.
I've been successfully using an MD5 hash of the contents of a screenshot image to look up in a database of known images, and detect if it has been seen before.
Now, I have ported it to another machine, and despite my attempts to exactly match configurations, I am getting ever-so-slightly different images from those on the older machine. When I say different, the changes are minute - if I blow up the old and new images and flick between them, I can't see a single difference! Nonetheless, ImageMagick's compare command can see a smattering of pixels that are different.
So my MD5 hashes are no longer matching. Rather than a simple MD5 hash, I need an image hash.
Doing my research, I find that most image hashes try to be fairly generous - they accept resized, transformed and watermarked images, with correspondingly more false positive matches. I want an image hash that is far more strict - the only changes permitted are minute changes in colour.
Can anyone recommend an image hash library or algorithm? (Not an application, like dupdetector).
Remember: My requirements are different from the many similar questions in that I don't want a liberal algorithm like shrinking or pHash, and I don't want a comparison tool like structural similarity or ImageMagick's compare.
I want a hash that makes very similar images give the same hash value. Is that even possible?

You can have a look at the following paper called "Spectral hashing". It is an algorithm that is designed to produce hash codes from images in order to group together similar images (see the retrieval examples at the end of the paper). It is a good starting point.
The link: http://www.cs.huji.ac.il/~yweiss/SpectralHashing/
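For comparison, here is a minimal sketch of a much simpler idea than spectral hashing: coarsen each colour channel into buckets before hashing, so that minute colour drift lands in the same bucket. It is not taken from the paper above; the function name `quantized_hash`, the `step` parameter and the sample pixel values are all assumptions made for illustration, and pixels are assumed to be already decoded into `(r, g, b)` tuples.

```python
import hashlib

def quantized_hash(pixels, step=16):
    """MD5 over pixels after coarsening each channel into buckets of width `step`.

    `step` is a hypothetical tolerance knob, not a recommended value.
    """
    digest = hashlib.md5()
    for r, g, b in pixels:
        # Integer division maps nearby channel values to the same bucket.
        digest.update(bytes((r // step, g // step, b // step)))
    return digest.hexdigest()

old = [(200, 100, 50), (10, 20, 30)]
new = [(201, 99, 50), (10, 21, 30)]       # minute colour drift
changed = [(200, 100, 50), (250, 20, 30)]  # a genuinely different pixel

print(quantized_hash(old) == quantized_hash(new))      # same buckets
print(quantized_hash(old) == quantized_hash(changed))  # different bucket
```

The obvious weakness is that two values straddling a bucket boundary (15 and 16 with `step=16`, say) still produce different hashes, so this narrows the problem rather than solving it.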


Approach to speedup DB-centric app [closed]

Closed 10 years ago.
We have an application with 65 GB of data in MSSQL Server, with around 250 tables and 1000 stored procedures and functions.
The application is completely DB-specific, with almost all the logic coded in procedures and functions. Some of the stored procs take as long as 4-5 minutes to execute. We have now been given the task of optimizing/re-engineering these slow-running stored procs.
We don't have much info about the project/schema/design, but we have access to the schema and data, and fortunately we only have to optimize one slow module. (But that module involves many SPs and functions running over 1000 lines, encompassing application logic.)
My question is: how do I get started with such a project? We have been set an unrealistic deadline of coming up with fixes in 2-3 days, and I have already spent a day setting things up!
What should be the approach?
1. Suggest an increase in hardware infrastructure.
2. Re-engineer the app (push some of the computations to the app side) to make it less DB-centric?
3. Ask for more time (how much?) to optimize this? The funny thing is that we are not the original coders and have very little idea about the app, i.e. what's coded in the SPs and functions.
Thanks
You'll need to know the problem areas before you can attempt any fixes.
You say you are just looking at one module to begin with, so I'd suggest using something like SQL Profiler to determine how frequently statements are executed and how long they take. Use this data as a starting point to see if the logic can be optimised.
Look for any operations that use cursors that could possibly benefit from a more set based approach.
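To illustrate the cursor versus set-based distinction, here is a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for MSSQL, and the `orders` table, its columns and the 10% discount rule are all made up for the example; the point is only the shape of the two approaches.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER, discounted INTEGER)"
    )
    conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(100,), (250,), (40,)])
    return conn

def row_by_row(conn):
    # Cursor style: fetch every row, then issue one UPDATE per row.
    rows = conn.execute("SELECT id, amount FROM orders").fetchall()
    for order_id, amount in rows:
        conn.execute(
            "UPDATE orders SET discounted = ? WHERE id = ?",
            (amount * 9 // 10, order_id),
        )

def set_based(conn):
    # Set-based style: a single statement the engine applies to all rows.
    conn.execute("UPDATE orders SET discounted = amount * 9 / 10")

def discounted(conn):
    return [row[0] for row in conn.execute("SELECT discounted FROM orders ORDER BY id")]

cursor_db, set_db = make_db(), make_db()
row_by_row(cursor_db)
set_based(set_db)
print(discounted(cursor_db) == discounted(set_db))
```

Both versions produce the same data; the set-based one simply hands the whole operation to the database in one statement instead of looping over rows.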
As for your three options, I'd say you HAVE to go for (3) because you've stated you don't have a thorough understanding of the app, so you'll need to gain some further exposure in order to establish where to focus your efforts. I don't think (1) is a long-term solution, although it would obviously provide some benefit (how much depends on the current and proposed specs). You'll only have an idea whether (2) is a valid option once you've had a chance to establish the problem areas.
Best of luck.

Visualizing large data sets with Hadoop [closed]

Closed 10 years ago.
I'm looking for a framework, a combination of frameworks, best-practices, or a tutorial about visualizing large data sets with Hadoop.
I am not looking for a framework to visualize the mechanics of running Hadoop jobs or managing disk space on Hadoop. I am looking for an approach or a guideline for visualizing the data contained within HDFS using graphs and charts, etc.
For example, let's say I have a set of data points stored in multiple files in HDFS, and I would like to show a histogram of the data. Is my only option to write a custom map/reduce job that would try and figure out which points fall into which bucket, write the totals to a file, and then use a plotting library to visualize that?
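The custom job described here can be sketched in a few lines. This is plain Python that only mimics the map and reduce phases in memory, not Hadoop API code; `BUCKET_WIDTH`, the function names and the sample `chunks` (standing in for files in HDFS) are assumptions for illustration.

```python
import math
from collections import Counter
from functools import reduce

BUCKET_WIDTH = 10  # hypothetical bin width

def map_chunk(values):
    """Map phase: count how many points of one chunk fall into each bucket."""
    return Counter(math.floor(v / BUCKET_WIDTH) * BUCKET_WIDTH for v in values)

def reduce_counts(left, right):
    """Reduce phase: merge per-chunk bucket totals."""
    return left + right

# Each inner list stands in for the data points of one file.
chunks = [[1, 5, 12, 18], [25, 3, 11], [29, 7]]
histogram = reduce(reduce_counts, map(map_chunk, chunks), Counter())

# The merged totals are small enough to hand to any plotting library;
# here they are just printed as a text histogram.
for bucket in sorted(histogram):
    print(f"[{bucket}, {bucket + BUCKET_WIDTH}) {'#' * histogram[bucket]}")
```

The output of the reduce step is tiny compared with the input, which is why the plotting itself can happen outside the cluster.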
Do I need to roll out a custom solution, or is there anyone else doing this sort of thing out there? I've tried looking online, but I haven't been able to find something that directly relates to this.
Thank you for your help
We do something like this at Datameer. The files would take a few more processing steps to get to our visualizations, but we run natively on Hadoop so the files would not be far away.

How to write a desktop app that filters test questions according to topic [closed]

Closed 10 years ago.
What programming language/method would be best suited to writing a desktop app that
filters question types and displays a listing of those questions to view?
For example, if I have a mix of algebra, geometry, and calculus questions stored in the app,
I should be able to select just the algebra questions to view and print.
I have a little experience with Python/Django but I've never made a desktop app before.
You have lots of options. You will need to make several design decisions before you move forward. Things to consider are:
Which technologies do you feel comfortable with?
How much time/effort do you want to put into the project?
Are you willing to spend money on tools?
Etc.
That being said, the rest of this answer is to give you some options to consider:
You'll need a data structure which can filter the problems for you. From your description, the first thing I thought of was using a database. However, I'm not sure if you are familiar with databases; if not, you'd have to create some classes/structs that would allow you to do the filtering yourself. Some options for databases are SQL Express, Oracle, MySQL, DB2, and many more.
Another thing to consider is that you mentioned several different types of math problems. You'd want to consider how you would be displaying the problems. Mathematica formats math problems nicely, but if you wanted to go down this road, you'd either have to find a tool that would allow you to display math problems in a syntax like Mathematica's, or do exports/screenshots of the problems and have those as part of your program.
Another option would be to try to find a language that has some sort of plugin for TeX or LaTeX. (For example, you can see how Wikipedia allows for nice math formatting here: http://en.wikipedia.org/wiki/Help:Displaying_a_formula)
This sounds like a good pet project to play with to learn different technologies. If that is the intent, great. If not, then you might want to do some googling to see if someone else has already created what you are looking for.
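Since the asker mentions some Python experience, here is a minimal sketch of the "classes/structs that do the filtering yourself" option mentioned above. The `Question` class, the `by_topic` helper and the sample questions are all hypothetical; a real app would load the questions from a file or database and put a GUI on top.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Question:
    topic: str
    text: str

QUESTIONS = [
    Question("algebra", "Solve 2x + 3 = 11."),
    Question("geometry", "Find the area of a circle of radius 3."),
    Question("calculus", "Differentiate x^2."),
    Question("algebra", "Factor x^2 - 9."),
]

def by_topic(questions, topic):
    """Return only the questions whose topic matches (case-insensitive)."""
    wanted = topic.lower()
    return [q for q in questions if q.topic.lower() == wanted]

for q in by_topic(QUESTIONS, "Algebra"):
    print(q.text)
```

With a database the same filter would be a single `WHERE topic = ...` query, which is the main thing the database option buys you as the question bank grows.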

What algorithm to choose [closed]

Closed 9 years ago.
Asked in a recent interview:
What data structure would you use to implement spell correction in a document? The goal is to find if a given word typed by the user is in the dictionary or not (no need to correct it).
What is the complexity?
I would use a "Radix," or "Patricia," tree to index the dictionary. See here, including an example of its use to index dictionary words: https://secure.wikimedia.org/wikipedia/en/wiki/Radix_tree. There is a useful discussion at that link of its complexity.
If I'm understanding the question correctly, you are given a dictionary (or a list of "correct" words), and are asked to specify whether an input word is in the dictionary. So you're looking for data structures with very fast lookup times. I would go with a hash table.
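As a minimal sketch of the hash table approach, Python's built-in `set` is hash-based, so membership tests take constant time on average with respect to the number of dictionary words. The helper names and the word list below are made up for the example.

```python
def build_dictionary(words):
    """Store the known words in a hash-based set, normalised to lower case."""
    return {word.lower() for word in words}

def is_known(dictionary, word):
    """Average O(1) lookup in the number of stored words."""
    return word.lower() in dictionary

dictionary = build_dictionary(["apple", "banana", "cherry"])
print(is_known(dictionary, "Banana"))
print(is_known(dictionary, "bananna"))
```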
I would use a DAWG (Directed Acyclic Word Graph) which is basically a compressed Trie.
These are commonly used in algorithms for Scrabble and other words games, like Boggle.
I've done this before. The TWL06 Scrabble dictionary with 170,000 words fits in a 700 KB structure both on disk and in RAM.
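For readers unfamiliar with the structure, here is a sketch of a plain (uncompressed) trie. A DAWG goes further by also merging shared suffixes, which this example does not attempt; the class names and sample words are assumptions for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a letter to the next TrieNode
        self.is_word = False  # True if a dictionary word ends here

class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.is_word = True

    def __contains__(self, word):
        # Lookup cost is proportional to the word length, not the dictionary size.
        node = self.root
        for letter in word:
            node = node.children.get(letter)
            if node is None:
                return False
        return node.is_word

trie = Trie(["car", "card", "care", "dog"])
print("card" in trie)
print("ca" in trie)  # a prefix of stored words, but not itself a word
```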
The Levenshtein distance tells you how many single-character edits (insertions, deletions or substitutions) you need to get from one string to another. By finding the dictionary word with the fewest edits you are able to suggest correct words (also see the Damerau-Levenshtein distance).
To increase performance you should not calculate the distance against your whole dictionary; constrain it with some heuristic instead, for instance only considering words that start with the same first letter.
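A sketch of the distance and of the first-letter heuristic follows. The `levenshtein` function is the standard dynamic-programming formulation; `suggest` and the sample word list are hypothetical helpers for illustration.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]

def suggest(word, dictionary):
    """Closest dictionary word among those sharing the first letter, or None."""
    candidates = [w for w in dictionary if w[:1] == word[:1]]
    return min(candidates, key=lambda w: levenshtein(word, w), default=None)

print(levenshtein("kitten", "sitting"))
print(suggest("helo", ["hello", "house", "world"]))
```

Note that the first-letter heuristic misses corrections where the typo is in the first letter itself, which is the price of not scanning the whole dictionary.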
Bloom filter. False positives are possible, but false negatives are not. As you know the dictionary in advance, you can eliminate the false positives by using a perfect hash for your input (the dictionary). Or you can use this as an auxiliary data structure in front of your actual dictionary data structure.
Edit: Of course, lookup complexity for a Bloom filter is O(k) for k hash functions, i.e. O(1) with respect to the dictionary size.
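A minimal Bloom filter sketch, to make the "no false negatives" property concrete. The sizes, the number of hash functions and the use of seeded SHA-256 as the hash family are all arbitrary choices for illustration, not tuned values.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, word):
        # Derive num_hashes bit positions by prefixing the word with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(word)
        )

words = ["apple", "banana", "cherry"]
bloom = BloomFilter()
for word in words:
    bloom.add(word)

print(all(word in bloom for word in words))  # added words are always found
```

A word that was never added usually tests as absent, but may occasionally test as present; that is the false positive case the answer refers to.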

Alternatives To The Treeview [closed]

Closed 11 years ago.
In my opinion treeviews are overused, therefore I don't really care for them. Sometimes they're necessary but I can imagine that one could always find a good alternative to the standard treeview.
What are some other innovative ways to display hierarchical information that convey the same information without the drab of a treeview? Which one(s) are the best? Should I just be happy with the treeview because that's what everyone knows how to use?
Take a look at Quince for some UI (they call it UX) inspiration. Search for hierarchical.
Examples include patterns such as Cascading Lists and TreeMap.
From those, you can click the "related" button to see even more ideas.
UPDATE: 2014-Sep-21, Sad news from Infragistics: "Quince Pro - We are officially retiring this product." More on their blog under "Product Status Change Notifications". I hope they retain it for a while as reference!
First off - I don't necessarily agree that TreeViews suck. TreeView is a fairly clean, standard, understandable way for people to work with a hierarchy of items.
That being said, there are many other alternatives. You can have multiple lists, with a way to go up/down in the tree. You can have something like Vista's file browsing, where you have an address area with a list under it, and can drill down. There are many other options.
TreeViews end up being used in many cases, though, because it's one of the more concise ways of displaying a hierarchy, and it's obvious that you're looking at hierarchical data.
What I find works well is a combination of more advanced controls and tree views. For example, take Outlook's explorer bar setup. I think that works well.
