Utilizing algorithms in academic papers

I'm unclear about the legal status of using an algorithm from a published academic paper. Is there an implicit patent on that material? What about open source applications? Would it be OK to implement that algorithm in an open source app, under one of the free software licenses?
Let's say I have access to paper A which describes algorithm B. How can I determine if I can use algorithm B in my commercial closed-source app C or open source app D? Is the answer always "no"? Is there an expiration date?

There's no such thing as an "implicit patent".
Unfortunately, I'd think you need to evaluate the IP restrictions for each paper.
One of the more famous situations where the algorithm described in an academic paper was ultimately encumbered by a patent was the RSA asymmetric encryption algorithm. The paper, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems", was published in Communications of the ACM in 1978, describing the algorithm; a patent was awarded in 1983 (US Patent 4,405,829).
I have not read the paper, so I don't know whether the patent application was mentioned. I do know that the algorithm was rather widely implemented, and when the patent was awarded, MIT/RSADSI started enforcing it. The patent became a rather big issue with PGP, for example. MIT/RSADSI ultimately permitted free use of the patent for non-commercial use, I believe. A similar situation occurred with LZW compression.

AFAIK, having something published in a publicly accessible way (e.g., an academic paper) actually eliminates the possibility of a patent in Europe. In the US, there is a one-year grace period from first publication to apply for a patent. Students and professors usually recognize the potential and take it to the university's technology transfer office, which figures out what to do with it. This, for example, was the case with CAPTCHAs. If the paper is by a commercial company (e.g., IBM), it is much more likely to be patented or to have additional protections, because research employees are also evaluated on the number of patent applications they file.
The problem is that there is often no way to know, because the patent lawyers will usually file under a general name that has no obvious connection to the original idea. So contacting the author, just in case, may be prudent. The author may also have an existing implementation available, often open source.

This is what happened with GIF: the compression algorithm was published openly, but without mentioning that it had been patented.
Even if the authors of the paper claim not to have patented it, there is no guarantee that a similar algorithm hasn't already been patented by someone else, and a court might decide that the existing patent covers that work.
Remember as well that inventing it yourself doesn't protect you from patents - you can invent an algorithm that you have never seen in print and later discover it's covered by some patent.

As Michael said, there's no such thing as an implicit patent.
Generally, when an algorithm or any other research is published, the authors want to put it out into the world for other people to use. If you have access to a published paper which describes some algorithm, it's probably going to be okay for you to use it in any program you might create. However, some algorithms may be patented by their inventors before being published, and in that case if you want to use the algorithm you would have to come to some arrangement with the patent holder, regardless of whether your application is open-source or not. Basically to be safe, you need to check the specific restrictions on that algorithm. (The type of people who publish papers about algorithms are often more likely to let you use their algorithms if your application is open source than if it's proprietary.)

Related

Spontaneous emergence of replicators in Artificial Life

One of the cornerstones of The Selfish Gene (Dawkins) is the spontaneous emergence of replicators, i.e. molecules capable of replicating themselves.
Has this been modeled in silico in open-ended evolutionary / artificial life simulations?
Systems like Avida or Tierra explicitly specify the replication mechanisms; other genetic algorithm/genetic programming systems explicitly search for the replication mechanisms (e.g. to simplify the von Neumann universal constructor)
Links to simulations where replicators emerge from a primordial digital soup are welcome.
The software system Amoeba, by Andrew Pargellis, has studying the origins of life as an explicit goal; he saw interesting patterns develop first, which eventually turned into self-replicators.
More recently (well, 2011), Evan Dorn used Avida to study a range of questions to do with the origin of life, with a focus on astrobiology. He wanted to examine how chemical distributions change as abiotic environments shift to biotic ones, so that astronomers would know what to look for.
For a good starting point, take a look at:
https://link.springer.com/article/10.1007/s00239-011-9429-4

Formally verifying the correctness of an algorithm

First of all, is this only possible on algorithms which have no side effects?
Secondly, where could I learn about this process, any good books, articles, etc?
Coq is a proof assistant that can extract verified OCaml code. It's pretty complicated though. I never got around to looking at it, but my coworker started and then stopped using it after two months. It was mostly because he wanted to get things done quicker, but if you need to verify an algorithm this might be a good idea.
Here is a course that uses Coq and talks about proving algorithms.
And here is a tutorial about writing academic papers in Coq.
It's generally a lot easier to verify/prove correctness when no side effects are involved, but it's not an absolute requirement.
You might want to look at some of the documentation for a formal specification language like Z. A formal specification isn't a proof itself, but is often the basis for one.
I think that verifying the correctness of an algorithm would be validating its conformance with a specification. There is a branch of theoretical computer science called Formal Methods which may be what you are looking for if you need to get as close to a proof as you can. From Wikipedia:
"Formal Methods are a particular kind of mathematically-based techniques for the specification, development and verification of software and hardware systems."
You will be able to find many learning resources and tools from the multitude of links on the linked Wikipedia page and from the Formal Methods wiki.
Usually proofs of correctness are very specific to the algorithm at hand.
However, there are several well-known tricks that are used and re-used again. For example, with iterative algorithms you can use loop invariants, and with recursive algorithms you can argue by induction.
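To make the loop-invariant idea concrete, here is a small sketch of my own (in Python, not taken from any particular text): an iterative binary search with its invariant written out and checked with assertions. A real proof would argue that the invariant holds before every iteration and, together with termination, implies the postcondition.

    def binary_search(xs, target):
        """Return an index i with xs[i] == target, or -1 if target is absent.
        Precondition: xs is sorted in ascending order."""
        lo, hi = 0, len(xs)
        # Loop invariant: if target occurs in xs at all, it occurs in xs[lo:hi].
        while lo < hi:
            assert 0 <= lo <= hi <= len(xs)   # bounds part of the invariant
            mid = (lo + hi) // 2
            if xs[mid] < target:
                lo = mid + 1                  # target, if present, lies right of mid
            elif xs[mid] > target:
                hi = mid                      # target, if present, lies left of mid
            else:
                return mid                    # invariant plus xs[mid] == target gives correctness
        return -1                             # interval is empty, so target is absent
        # Termination: hi - lo strictly decreases on every iteration.

    print(binary_search([1, 3, 5, 7, 9], 7))  # 3
    print(binary_search([1, 3, 5, 7, 9], 4))  # -1

The assertions only check the invariant at run time for particular inputs; a formal verification tool or a pencil-and-paper proof establishes it for all inputs.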
Another common trick is reducing the original problem to a problem for which your algorithm's proof of correctness is easier to show, then either generalizing the easier problem or showing that the easier problem can be translated to a solution to the original problem. Here is a description.
If you have a particular algorithm in mind, you may do better in asking how to construct a proof for that algorithm rather than a general answer.
Buy this book: http://www.amazon.com/Science-Programming-Monographs-Computer/dp/0387964800
The Gries book, The Science of Programming, is great stuff. Patient, thorough, complete.
Logic in Computer Science, by Huth and Ryan, gives a reasonably readable overview of modern verification systems. Once upon a time people talked about proving programs correct - with programming languages which may or may not have side effects. The impression I get from this book and elsewhere is that real applications are different - for instance proving that a protocol is correct, or that a chip's floating point unit can divide correctly, or that a lock-free routine for manipulating linked lists is correct.
ACM Computing Surveys Vol 41 Issue 4 (October 2009) is a special issue on software verification. It looks like you can get to at least one of the papers without an ACM account by searching for "Formal Methods: Practice and Experience".
The tool Frama-C, for which Elazar suggests a demo video in the comments, gives you a specification language, ACSL, for writing function contracts and various analyzers for verifying that a C function satisfies its contract and safety properties such as the absence of run-time errors.
An extended tutorial, ACSL by Example, shows examples of actual C algorithms being specified and verified, and separates the side-effect-free functions from the effectful ones (the side-effect-free ones are considered easier and come first in the tutorial). This document is also interesting in that it was not written by the designers of the tools it describes, so it gives a fresher and more didactic look at these techniques.
If you are familiar with LISP then you should definitely check out ACL2: http://www.cs.utexas.edu/~moore/acl2/acl2-doc.html
Dijkstra's A Discipline of Programming and his EWDs lay the foundation for formal verification as a science in programming. A simpler work is Wirth's Systematic Programming, which begins with a simple approach to using verification. Wirth uses pre-ISO Pascal for the language; Dijkstra uses his own Guarded Command Language (GCL). Formal verification has matured since Dijkstra and Hoare, but these older texts may still be a good starting point.
The PVS tool, developed at SRI, is a specification and verification system. I worked with it and found it very useful for theorem proving.
Regarding the first question: you will probably have to create a model of the algorithm in a way that "captures" its side effects in program variables intended to model such state-based effects.

Prime Factorization

I have recently been reading about the general use of prime factors within cryptography. Everywhere I read, it states that there is no 'PUBLISHED' algorithm which operates in polynomial time (as opposed to exponential time) to find the prime factors of a key.
If an algorithm were discovered or published which did operate in polynomial time, how would this impact the real-world computing environment, as opposed to the world of theory and computer science? Considering the extent to which we depend on cryptography, would the world suddenly come to a halt?
With this in mind, if P = NP is true, what might happen? How much do we depend on the fact that it is as yet unproven?
I'm a beginner, so please forgive any mistakes in my question, but I think you'll get my general gist.
With this in mind, if P = NP is true, would they ever tell us?
Who are “they”? If it were true, we would know. The computer scientists? That’s us. The cryptographers and mathematicians? The professionals? The experts? People like us. Users of the Internet, even of Stack Overflow.
We wouldn't need to be told. We'd do the telling.
Science and research isn’t done behind closed doors. If someone finds out that P = NP, this couldn’t be kept secret, simply because of the way that research is published. In principle, everyone has access to such research.
It depends on who discovers it.
NSA and other organizations that research cryptography under state sponsorship, contrary to Konrad's assertion, do research and science behind closed doors—and guns. And they have "scooped" published academic researchers on some important discoveries. Finally, they have a history of withholding cryptanalytic advances for years after they are independently discovered by academic researchers.
I'm not big into conspiracy theories. But I'd be very surprised if a lot of "black" money hasn't been spent by governments on the factorization problem. And if any results are obtained, they would be kept secret. A lot of criticism has been leveled at agencies in the U.S. for failing to coordinate with each other to avert terrorism. It might be that notifying the FBI of information gathered by the NSA would reveal "too much" about the NSA's capabilities.
You might find the first question posed to Bruce Schneier in this interview interesting. The upshot is that NSA will always have an edge over academia, but that margin is shrinking.
For what it is worth, the NSA recommends the use of elliptic curve Diffie-Hellman key agreement, not RSA encryption. Do they like the smaller keys? Are they looking ahead to quantum computing? Or … ?
Keep in mind that factoring is not known to be (and is conjectured not to be) NP-complete, so exhibiting a polynomial-time algorithm for factoring would not imply P = NP. Presumably we could switch the foundation of our encryption algorithms to some NP-complete problem instead.
Here's an article about P = NP from the ACM: http://cacm.acm.org/magazines/2009/9/38904-the-status-of-the-p-versus-np-problem/fulltext
From the link:
Many focus on the negative, that if P = NP then public-key cryptography becomes impossible. True, but what we will gain from P = NP will make the whole Internet look like a footnote in history.
Since all the NP-complete optimization problems become easy, everything will be much more efficient. Transportation of all forms will be scheduled optimally to move people and goods around quicker and cheaper. Manufacturers can improve their production to increase speed and create less waste. And I'm just scratching the surface.
Given this quote, I'm sure they would tell the world.
I think there were researchers in Canada(?) who were having good luck factoring large numbers with GPUs (or clusters of GPUs). It doesn't mean the numbers were factored in polynomial time, but the chip architecture was more favorable to factorization.
If a truly efficient algorithm for factoring composite numbers was discovered, I think the biggest immediate impact would be on e-commerce. Specifically, it would grind to a halt until a form of encryption was developed that doesn't rely on factoring being a one-way function.
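To give a feel for the "one-way function" asymmetry involved, here is a toy sketch of my own in Python (illustrative only: real RSA moduli are hundreds of digits long, and real attacks use far better, but still super-polynomial, algorithms such as the general number field sieve). Multiplying two primes is instant, while recovering them by trial division takes time that grows rapidly with the number of digits.

    import time

    def smallest_factor(n):
        """Return the smallest prime factor of n (for n > 1) by trial division."""
        f = 2
        while f * f <= n:
            if n % f == 0:
                return f
            f += 1
        return n  # n itself is prime

    p, q = 999_983, 1_000_003        # two smallish, well-known primes
    n = p * q                        # the "public" modulus: computed instantly

    start = time.perf_counter()
    print(smallest_factor(n))        # recovering p already takes about a million divisions
    print(f"{time.perf_counter() - start:.3f} s for a 12-digit number")
    # Every two extra digits in n multiply the work by roughly another factor of
    # ten, which is why 600-digit moduli are far out of reach for this method.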
There has been a lot of research into cryptography in the private sector for the past four decades. This was a big switch from the previous era, where crypto was largely in the purview of the military and secret government agencies. Those secret agencies definitely tried to resist this change, but once knowledge is discovered, it's very hard to keep it under wraps. With that in mind, I don't think a solution to the P = NP problem would remain a secret for long, despite any ramifications it might have in this one area. The potential benefits would be in a much wider range of applications.
Incidentally, there has been some research into quantum cryptography, which relies on the foundations of quantum mechanics, in contrast to traditional public-key cryptography, which relies on the computational difficulty of certain mathematical functions and cannot provide any indication of eavesdropping or guarantee of key security.
The first practical network using this technology went online in 2008.
As a side note, if you enter the realm of quantum computing, you can factor in polynomial time using Shor's algorithm. See, for example, Rob Pike's notes from his talk on quantum computing, page 25.

How would I figure out if there are other algorithms similar to mine?

In another question, I asked something similar but I ended up just posting my algorithm there and invalidating several answers. I re-ask it here:
If I "invented" an algorithm, what's the best way for me to figure out if it's already been published about/patented?
You would need to do some searching. Starting with a generic Google search will often be sufficient to reassure you that your algorithm is not novel. If that is not conclusive, then you need to search harder, perhaps searching various patent sites (Google Patents, the USPTO, and others). If you still don't find anything, then maybe your algorithm is novel.
Next questions: is it worth it to you to try and patent it, or get someone else to patent it for you (a company, for example)? Indeed, can you patent it or does your employer already own it? This will depend in part on how likely it is that everyone else will want to use the same algorithm. The chances are, they won't. If you patent it, they will ignore it until the patent expires.
If you do find a way to afford getting the patent filed - and issued (which is not automatic just because you filed) - then you face enforcing your patent. Will you be able to identify and prosecute those who abuse your patent? If not, was it worth chasing it? Maybe, maybe not; but probably not.
Finally, note that you cannot actually patent a pure algorithm. You would have to reduce it to practice. That isn't as hard as it seems, but just be aware that pure mathematical algorithms are inherently non-patentable.
In summary:
You will probably find someone else already thought of it.
If you decide to patent it because it is novel, you need money.
You need money to file for the patent.
You need money to pursue those who abuse your patent.
You would probably be better off just publishing it.
Most often you basically just have to do background research in the given area. This is why, when academics do research projects, they start off by learning about the history (background) of the area, all the way up to the current methods or theories being used. It also helps to ask someone who knows the area and has worked in it for many years.
Well, if it's in a textbook like your algorithm seems to have been (Dijkstra), then it definitely already exists in the public domain and cannot be patented. How you use the algorithm in your application as a whole might be patentable, but most abstract ideas or implementations thereof (such as "finding the shortest path between two nodes") cannot be patented.
Or, you could waste a bunch of money and submit a patent and see what happens :)
In all seriousness though, you might start by searching for existing patents, or read up on some articles like this one to get a better feel for the patent process.
Tracking down every algorithm for a particular problem would be quite daunting. A better process might be to track down the best solutions known for the problem and compare them with yours.
I would start with Wikipedia. I know people say "don't use wiki for research", but it's pretty good at computer science (all those geeks contributing), and it will tell you pretty quickly what the best widely-known algorithms are. If you've got something strictly better than the algorithms you can find in Wikipedia, then it might be worth looking further. If Wikipedia's got something strictly better than your algorithm, then you've invented a curiosity at best and probably won't get rich or famous from it.
Next, check the references at the bottom; they may lead you to papers (which will have more references that you can follow), or to academics' websites (which might have links). Also go to Citeseer and search for key words.
Unfortunately, there's no real replacement for having some basic knowledge. If you've invented (for instance) a graph-theoretic algorithm, but you don't know the language of graph-theory, then you'll struggle to find it because you won't know where to start looking. You might profitably spend your time reading an algorithms textbook -- that will give you an overview of good algorithms and how to speak about them.
If an algorithm or method cannot be found right away (Wikipedia/Google), I find it rewarding to scan academic/engineering websites (Web of Science, IEEE Xplore, the ACM Digital Library, etc.) for 'review' papers. If recent, they can give a solid overview of the field (e.g. graph search), mentioning books, papers and conferences. After that one can focus the search on particular methods.

Have you ever used a genetic algorithm in real-world applications?

I was wondering how common it is to find genetic algorithm approaches in commercial code.
It always seemed to me that some kinds of schedulers could benefit from a GA engine, as a supplement to the main algorithm.
Genetic algorithms have been widely used commercially. Optimizing train routing was an early application. More recently, GAs have been used to optimize fighter plane wing designs. I have used GAs extensively at work to generate solutions to problems that have an extremely large search space.
Many problems are unlikely to benefit from GAs. I disagree with Thomas that they are too hard to understand; a GA is actually very simple. We found that there is a huge amount of knowledge to be gained from tuning the GA to a particular problem, which can be difficult, and, as always, managing large amounts of parallel computation continues to be a problem for many programmers.
A problem that would benefit from a GA is going to have the following characteristics (a minimal sketch follows the list):
A good way to encode potential solutions
A way to compute a numerical score to evaluate the quality of a solution
A large multi-dimensional search space where the answer is non-obvious
A good solution is good enough and a perfect solution is not required
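To make those ingredients concrete, here is a minimal, illustrative sketch (a toy example of my own in Python, not production code): solutions are encoded as bit strings, the score is simply the number of ones, and the loop applies tournament selection, crossover and mutation.

    import random

    GENOME_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 32, 50, 100, 0.02

    def random_genome():                       # encoding: a fixed-length bit string
        return [random.randint(0, 1) for _ in range(GENOME_LEN)]

    def fitness(genome):                       # numerical score: count of 1-bits
        return sum(genome)

    def tournament(population, k=3):           # selection: best of a random sample
        return max(random.sample(population, k), key=fitness)

    def crossover(a, b):                       # recombine two parents at a random cut point
        cut = random.randrange(1, GENOME_LEN)
        return a[:cut] + b[cut:]

    def mutate(genome):                        # flip each bit with small probability
        return [bit ^ 1 if random.random() < MUTATION_RATE else bit for bit in genome]

    population = [random_genome() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        population = [mutate(crossover(tournament(population), tournament(population)))
                      for _ in range(POP_SIZE)]

    best = max(population, key=fitness)
    print(fitness(best), "of a possible", GENOME_LEN)   # usually at or near the optimum

Real applications differ mainly in the encoding and the fitness function; the search loop itself rarely changes much.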
There are many problems that could probably benefit from GAs, and in the future they will probably be more widely deployed. I believe that GAs are used in cutting-edge engineering more than people think; however, most companies (like mine) guard those secrets extremely closely. It is only long after the fact that it is revealed that GAs were used.
Most people that deal with "normal" applications probably don't have much use for them though.
If you want to find an example, look at Postgres's Query Planner. It uses many techniques, and one just so happens to be genetic.
http://developer.postgresql.org/pgdocs/postgres/geqo-pg-intro.html
I used a GA in my Master's thesis, but after that I haven't found anything in my daily work that a GA could solve that I couldn't solve faster with some other algorithm.
I don't think it is particularly common to find genetic algorithms in everyday-commercial code. They are more commonly found in academic/research code where the need to find the "best algorithm" is less important than the need to just find a good solution to a problem.
Nonetheless, I have consulted on a couple of commercial projects that do use GAs (chiefly as a result of my involvement with GAUL). I think the most interesting example was at a Biotech company. They used the GA to optimise scoring functions that were used for virtual screening, as part of their drug discovery application.
Earlier this year, with my current company, I added a new feature to one of our products that uses another GA. I think we might be marketing this from next month. Basically, the GA is used to explore molecules that have the potential for binding to a protein, and could therefore be further investigated as drugs targeting that protein. A competing product that also uses a GA is EA inventor.
As part of my thesis I wrote a generic Java framework for the multi-objective optimisation algorithm mPOEMS (multiobjective prototype optimization with evolved improvement steps), which is a GA using evolutionary concepts. It is generic in the sense that all problem-independent parts have been separated from the problem-dependent parts, and an interface is provided for using the framework by adding only the problem-dependent parts. Thus anyone who wants to use the algorithm does not have to start from zero, and it facilitates the work a lot.
You can find the code here.
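Purely to illustrate that separation of problem-independent and problem-dependent parts (the names below are made up for illustration and are not the actual mPOEMS API, which is written in Java), the shape of such a framework is roughly:

    import random
    from abc import ABC, abstractmethod

    # Hypothetical sketch, not the real mPOEMS interface: the user implements the
    # problem-dependent parts; the framework supplies the problem-independent search.

    class MultiObjectiveProblem(ABC):
        @abstractmethod
        def random_solution(self):
            """Problem-dependent encoding of one candidate solution."""

        @abstractmethod
        def objectives(self, solution):
            """Return a tuple of scores, one per objective (higher is better)."""

        @abstractmethod
        def perturb(self, solution):
            """Problem-dependent variation operator."""

    def dominates(a, b):
        """Problem-independent Pareto dominance test on objective tuples."""
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    def optimise(problem, iterations=1000, archive_size=20):
        """Problem-independent search loop standing in for the framework's engine."""
        archive = [problem.random_solution() for _ in range(archive_size)]
        for _ in range(iterations):
            candidate = problem.perturb(random.choice(archive))
            for i, old in enumerate(archive):
                if dominates(problem.objectives(candidate), problem.objectives(old)):
                    archive[i] = candidate     # replace a dominated solution
                    break
        return archive

The point is simply that a user subclasses the problem interface and never touches the search loop.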
The solutions which you can find with this algorithm have been compared in a scientific work with the state-of-the-art algorithms SPEA-2 and NSGA, and it has been shown that the algorithm performs comparably or even better, depending on the metrics you use to measure performance, and especially depending on the optimization problem you are considering.
You can find it here.
Also as part of my thesis and proof of work, I applied this framework to the project selection problem found in portfolio management. It is about selecting the projects which add the most value to the company, best support the company's strategy, or support any other arbitrary goal, e.g. selecting a certain number of projects from a specific category, or maximizing project synergies.
My thesis which applies this framework to the project selection problem:
http://www.ub.tuwien.ac.at/dipl/2008/AC05038968.pdf
After that I worked in a portfolio management department in one of the fortune 500, where they used a commercial software which also applied a GA to the project selection problem / portfolio optimization.
Further resources:
The documentation of the framework:
http://thomaskremmel.com/mpoems/mpoems_in_java_documentation.pdf
mPOEMS presentation paper:
http://portal.acm.org/citation.cfm?id=1792634.1792653
Actually, with a bit of enthusiasm, anybody could easily adapt the code of the generic framework to an arbitrary multi-objective optimisation problem.
I haven't, but I've heard from a friend of mine about a company (can't remember their name) that uses mutating genetic algorithms to calculate the placements and lengths of antennas (or something like that). According to my friend, they've had huge success with this. I guess GAs are just too complex for the "average Joe developer" to become mainstream. Kind of like MapReduce - spectacularly cool, but way too advanced to hit the "mainstream"...
LibreOffice Calc uses it in its Solver module.

Resources