Estimating difficulty of instances of NP problems - complexity-theory

NP problems look like they are suitable for use as trapdoor functions or proofs of work, since they are difficult to solve, but easy to verify. Unfortunately, they seem a little hard to use in adversarial settings where an opponent can control problem selection because while worst-case problems are NP, particular instances can be solved very quickly.
So: is there any algorithm which can take instances and estimate - more efficiently than trying to solve them - how hard or close to worst-case they are?
(The context is musing about a Bitcoin protocol where the proofs-of-work were reusable and not useless hash checks. The obvious approach is to have a central authority issue, for each transaction block, a NP instance which corresponds to a real-world problem. But the central authority could be subverted, and start issuing easy problems which would render the network vulnerable to double-spends. One could accept problems from multiple authorities, or anyone, but the chosen-easy problem remains. If there were some way to estimate the difficulty of any problem presented to the network, then 'too easy' problems could simply be ignored, falling back to the hash race if necessary.)
EDIT: jaxtr links me to "Predicting Satisfiability at the Phase Transition", which gives algorithms which estimate hardness at 70% accuracy - but they don't seem to investigate whether the algorithm can be deliberately fooled. (As well, one can apparently generate SAT problems with specified probabilities of being satisfiable.)

This is the same problem faced by researchers trying to create public-key encryption algorithms based on np-completeness. As far as I know, there are some stabs at it, but it's still an open problem. See the discussion here: Are there public key cryptography algorithms that are provably NP-hard to defeat?
I know I've seen more recent work, but can't find it offhand. I recall a book composed of articles about alternative cryptosystems should factorization suddenly become cheap, and I'll try to dig up the link.
Edit: The comment below points to the book I was thinking of. The website has lots of good references to various relevant papers. See the "code based" section in particular.

Related

Algorithm to do Minimization in Integer Programming

I understand that doing minimization in integer programming is a very complex problem. But what makes this problem so difficult?
If I were to (attempt) to write an algorithm to solve it, what would I need to take into account? I'm only familiar with the branch-and-bound technique for solving it and I'm wondering what sort of roadblocks I will face when attempting to apply this technique programatically.
I'm wondering what sort of roadblocks I will face when attempting to apply this technique programatically.
None in particular (assuming a fairly straightforward implementation without a lot of tricks). The algorithms aren’t complicated – they are complex, that’s a fundamental difference.
Techniques such as branch and bound or branch and cut try to prune the search tree and thus speed up the running time. But the whole problem tree is nevertheless exponentially large, hence the problem.
Like the other said, those problem are very hard and there are no simple solution nor simple algorithm that apply to all classes of problems.
The "classic" way of solving those problem is to do a branch-and-bound and apply the simplex algorithm at each node, as you say in your question. However, I would not recommand implementing this yourself if you are not an expert.
As for a lot of numerical methods, it is very hard to get it right (good parameter values, good optimisations), and a lot have been done (see CPLEX, COIN_OR, etc).
It's not that you can't do it: the branch-and-bound part is pretty straigtforward, but without all the tricks your program will be really slow.
Also, you will need a simplex implementation and this is not something you want to do yourself: you will have to use a third-part lib anyway.
Most likely, wether
if your data set is not that big (try it !), and you are not interested in solving it really fast: use something like COIN-OR or lp_solve with the default method, it will work;
if your data set is really big (and/or you need to find a solution quickly each time), you need to work with an expert in this field.
My main point is that only experienced people will know which algorithm will perform better on your problem, wich form of the model will be the easiest to solve, which method to apply and what kind of optimisations you can try.
If you are interested in those problems, I would recommend this book for an introduction to the math behind all this (with a lot of examples). It is incredibly expansive, so you may want to go to a library instead of buying it: Nemhauser and Wolsey.
Integer programming is NP-hard. That's why it is so difficult.
There is a tutorial that you might be interested.
The first thing you do before you solve any mathematical optimization problem is you categorize it. Except special cases, most of the time, integer programming problems will be np-hard. So instead of using an "algorithm", you will use a "heuristic". The final solution you will find will not be a guaranteed optimum, but it will be a pretty good solution for real life problems.
Your main roadblock will your programming skills. Heuristic programming requires a good level of programming understanding. So instead of programming your own heuristic you are better of using well known package (eg, COIN-OR, free). This way you can focus on your problem instead of the heuristic.

how to get started with TopCoder to update/develop algorithm skills?

at workplace, the work I do is hardly near to challenging and doing that I think I might be losing the skills to look at a completely new problem and think about different ideas to solve it.
A friend suggested TopCoder.com to me, but looking at the overwhelming number of problems I can not decide how to get started?
what I want is to sharpen my techniques ( not particular language or framework ).
The only way to get started would be to pick problems. Division I is the more difficult division, so you will probably find that the division I medium and hard problems will be somewhat interesting and challenging (unless you are quite clever.)
If you check the event calendar, you can see what algorithm competition rounds are coming up in your time zone. The competitions have the added virtue of forcing you to read and analyze other people's code in the challenge phase, so even if you would just as soon practice without a clock, you may find them interesting.
TopCoder algorithm contests are a way to develop your coding speed. Solving any of the problems in the practice arena is difficult unless you already have knowledge of various algorithms.
The problems on Project Euler suffer from the same flaw. You already have to know the algorithms to solve the problems in a reasonable time frame.
What I would suggest is to pick a project that you're interested in, and pursue it as you have time. As an example, I'm currently learning how to work with the open street map tiles in an Eclipse rich client platform.
Try whit http://projecteuler.net Problems difficulty can be assumed by number of solvers.
I prefer this page, because it is language invariant and problems are really challenging
You need the experience of solving 2 problems in any online judge (like http://www.spoj.com, http://www.lightoj.com, http://www.codeforces.com) in any programming language of your choice. That will give you an idea about how are your programs tested online.
Then you can follow this -> http://localboyfrommadurai.blogspot.in/2011/12/new-to-topcoder.html

Relating NP-Complete problems to real world problems

I have a decent grasp of NP Complete problems; that's not the issue. What I don't have is a good sense of where they turn up in "real" programming. Some (like knapsack and traveling salesman) are obvious, but others don't seem obviously connected to "real" problems.
I've had the experience several times of struggling with a difficult problem only to realize it is a well known NP Complete problem that has been researched extensively. If I had recognized the connection more quickly I could have saved quite a bit of time researching existing solutions to my specific problem.
Are there any resources (online or print) that specifically connect NP Complete to real world instances?
Edit:
For example, I was working on a program that tried to divide students into groups based on age, grade, and school of origin, which is essentially a graph partitioning problem. It took me a while to realize the connection.
I have found that Computers and Intractability is the definitive reference on this topic.
Usually the connection you are talking about must be extracted with a so-called reduction, for example you reduce 3-SAT to the problem you are working with and then you can conclude that your problem has the same complexity of it.
This passage is not trivial, since you have to prove that you can turn every problem instance l of a known NP-Hard problem L into an instance c of your problem C using a deterministic polinomyal algorithms.
So, except from learning basical correlations of common NP-Hard problems using your memory, there's no way to be sure if a problem is similar to another NP-Hard without first trying to guessing and then proving it, you have to be smart.
here is a wiki link:
http://wapedia.mobi/en/List_of_NP-complete_problems
Notice it says
This list is in no way comprehensive (there are more than 3000 known NP-complete problems)
probably it would be a great task if anyone could compile such list.
A theorist should try to understand/proof an NP-Complete/Hard problem. But, a programmer doesn't have that time to. He needs a list.
Am I correct?
I think you should google it. And, read through all the links. Add any new problem found in the link to your list.
Hope it helps
PS : Don't forget to post the list when you're finished :P
For developing better intuition the book "The Algorithm Design Manual, Second Edition" by Skiena (excerpts on google books) is simply great.
List in the back with problems
(including hard problems), that
include an illustration and a
discussion (often) with a real world
example.
Covers both the theoretical
and practical side of things, often
talking about actual code.
Read excepts online here (see some examples in chapters 14):
http://books.google.dk/books?id=7XUSn0IKQEgC&printsec=frontcover#v=onepage&q&f=false
Chapter 16 (not online) discusses some hard problems, including graph partition.

Prime Factorization

I have recently been reading about the general use of prime factors within cryptography. Everywhere i read, it states that there is no 'PUBLISHED' algorithm which operates in polynomial time (as opposed to exponential time), to find the prime factors of a key.
If an algorithm was discovered or published which did operate in polynomial time, then how would this impact in the real world computing environment as opposed to the world of theory and computer science. Considering the extent we depend on cryptography would the would suddenly come to halt.
With this in mind if P = NP is true, what might happen, how much do we depend on the fact that it is yet uproved.
I'm a beginner so please forgive any mistakes in my question, but i think you'll get my general gist.
With this in mind if N = NP is true, would they ever tell us.
Who are “they”? If it were true, we would know. The computer scientists? That’s us. The cryptographers and mathematicians? The professionals? The experts? People like us. Users of the Internet, even of Stack Overflow.
We wouldn’t need being told. We’d tell.
Science and research isn’t done behind closed doors. If someone finds out that P = NP, this couldn’t be kept secret, simply because of the way that research is published. In principle, everyone has access to such research.
It depends on who discovers it.
NSA and other organizations that research cryptography under state sponsorship, contrary to Konrad's assertion, do research and science behind closed doors—and guns. And they have "scooped" published academic researchers on some important discoveries. Finally, they have a history of withholding cryptanalytic advances for years after they are independently discovered by academic researchers.
I'm not big into conspiracy theories. But I'd be very surprised if a lot of "black" money hasn't been spent by governments on the factorization problem. And if any results are obtained, they would be kept secret. A lot of criticism has been leveled at agencies in the U.S. for failing to coordinate with each other to avert terrorism. It might be that notifying the FBI of information gathered by the NSA would reveal "too much" about the NSA's capabilities.
You might find the first question posed to Bruce Schneier in this interview interesting. The upshot is that NSA will always have an edge over academia, but that margin is shrinking.
For what it is worth, the NSA recommends the use of elliptic curve Diffie-Hellman key agreement, not RSA encryption. Do they like the smaller keys? Are they looking ahead to quantum computing? Or … ?
Keep in mind that factoring is not known to be (and is conjectured not to be) NP-complete, thus demonstrating a P algorithm for factoring will not imply P=NP. Presumably we could switch the foundation of our encryption algorithms to some NP-complete problem instead.
Here's an article about P = NP from the ACM: http://cacm.acm.org/magazines/2009/9/38904-the-status-of-the-p-versus-np-problem/fulltext
From the link:
Many focus on the negative, that if P
= NP then public-key cryptography becomes impossible. True, but what we
will gain from P = NP will make the
whole Internet look like a footnote in
history.
Since all the NP-complete optimization
problems become easy, everything will
be much more efficient. Transportation
of all forms will be scheduled
optimally to move people and goods
around quicker and cheaper.
Manufacturers can improve their
production to increase speed and
create less waste. And I'm just
scratching the surface.
Given this quote, I'm sure they would tell the world.
I think there were researchers in Canada(?) that were having good luck factoring large numbers with GPUs (or clusters of GPUs). It doesn't mean they were factored in polynomial time but the chip architecture was more favorable to factorization.
If a truly efficient algorithm for factoring composite numbers was discovered, I think the biggest immediate impact would be on e-commerce. Specifically, it would grind to a halt until a form of encryption was developed that doesn't rely on factoring being a one-way function.
There has been a lot of research into cryptography in the private sector for the past four decades. This was a big switch from the previous era, where crypto was largely in the purview of the military and secret government agencies. Those secret agencies definitely tried to resist this change, but once knowledge is discovered, it's very hard to keep it under wraps. With that in mind, I don't think a solution to the P = NP problem would remain a secret for long, despite any ramifications it might have in this one area. The potential benefits would be in a much wider range of applications.
Incidentally, there has been some research into quantum cryptography, which
relies on the foundations of quantum mechanics, in contrast to traditional public key cryptography which relies on the computational difficulty of certain mathematical functions, and cannot provide any indication of eavesdropping or guarantee of key security.
The first practical network using this technology went online in 2008.
As a side note, if you enter into the realm of quantum computing, you can factor in polynomial time. See Rob Pike's notes from his talk on quantum computing, page 25, also known as Shor's algorithm.

How to cultivate algorithm intuition?

When faced with a problem in software I usually see a solution right away. Of course, what I see is usually somewhat off, and I always need to sit down and design (admittedly, I usually don't design enough), but I get a certain intuition right away.
My problem is I don't get that same intuition when it comes to advanced algorithms. I feel much more up to the task of building another Facebook then building another Google search, or a Music Genom project. It's probably because I've been building software for quite some time, but I have little experience with composing algorithms.
I would like the community's advice on what to read and what projects to undertake to be better at composing algorithms.
(This question has nothing to do with Algorithmic composition. Well, almost nothing)
+1 To whoever said experience is the best teacher.
There are several online portals which have a lot of programming problems, that you can submit your own solutions to, and get an automated pass/fail indication.
http://www.spoj.pl/
http://uva.onlinejudge.org/
http://www.topcoder.com/tc
http://code.google.com/codejam/contests.html
http://projecteuler.net/
https://codeforces.com
https://leetcode.com
The USACO training site is the training program that all USA computing olympiad participants go through. It goes step by step, introducing more and more complex algorithms as you go.
You might find it helpful to perform algorithms physically. For example, when you're studying sorting algorithms, practice doing each one with a deck of cards. That will activate different parts of your brain than reading or programming alone will.
Steve Yegge referred to "The Algorithm Design Manual" in one of his rants. I haven't seen it myself, but it sounds like it's just the ticket from his description.
My absolute favorite for this kind of interview preparation is Steven Skiena's The Algorithm Design Manual. More than any other book it helped me understand just how astonishingly commonplace (and important) graph problems are – they should be part of every working programmer's toolkit. The book also covers basic data structures and sorting algorithms, which is a nice bonus. But the gold mine is the second half of the book, which is a sort of encyclopedia of 1-pagers on zillions of useful problems and various ways to solve them, without too much detail. Almost every 1-pager has a simple picture, making it easy to remember. This is a great way to learn how to identify hundreds of problem types.
problem domain
First you must understand the problem domain. An elegant solution to the wrong problem is no good, nor is an inefficient solution to the right problem in most cases. Solution quality, in other words, is often relative. A simple scheduling problem that has a deterministic solution that takes ten minutes to run may be fine if schedules are realculated once per week, but if schedules change several times a day then a genetic algorithm solution that converges in a few seconds may be required.
decomposition and mapping
Second, decompose the problem into sub-problems and known/unknown elements that correspond to elements of the solution. Sometimes this is obvious, e.g. to count widgets you need a way of identifying widgets, an incrementable counter, and a way of storing the count. Sometimes it is not so obvious. Sometimes you have to decompose the problem, the domain, and possible solutions at the same time and try several different mappings between them to find one that leads to the correct results [this is the general method].
model
Model the solution, in your head at least, and walk through it to see if it works correctly. Adjust as necessary (See decomposition and mapping, above).
composition/interfaces
Many times you can find elements of the problem and elements of the solution that map to each other and produce partial results that are useful. This composition and interface construction provides the kernal of the solution, and also serves to reduce the scope of the problem remaining. So then you just loop back to the top with a smaller initial problem, and go through it again.
experience
Experience is the best teacher, of course, but reading about different kinds of problems and solutions will also be helpful. Studying some of the well-known algorithms and their applications is likewise very helpful, e.g. Dijkstra, Bresenham, Unification, and of course, graph theory.
I am not sure intuition can be cultivated, but I think I know what you are asking. The more problems you solve, the more information and experience you have at your disposal for future problems. So, I say just practice. Practice programming real world applications and you run into plenty of problems. Sometimes, solving puzzles can be very educational as well.
I try to find physical analogues when I'm looking at a complex problem.

Resources