Kolmogorov complexity - computation-theory

I'll be more than thankful if anyone would be able to explain me how Kolmogorov complexity is related to randomness, and random inputs.
Another thing that I can't understand - we know that calculating Kolmogorov complexity of a given input X isn't decidable. Given that, how can it be a measure of randomness?
thanks

Kolmogorov random is a particular definition of the vague intuitive concept of 'random' - which other definition are you referring to when asking for relationship (for reference http://en.wikipedia.org/wiki/Random_number)?
I don't follow your thought pattern in asking why one would have to be able to determine in the general case which strings are Kolmogorov random in order for the concept to be well-defined. Could you elaborate on what is giving you trouble? If nothing else, allow me to point you to the halting problem - certainly the concept of a program halting is well-defined even though there can be no algorithm for determining, in the general case, whether a particular program exhibits the property.

You should think the other way round. If something is not random, it means it follows some law, and this law can give a simpler description of the information. Think of zip: instead of the file you give a procedure to generate the file that is usually more compact than the source file. This is possible because the source file contained some order: if the source file was a random sequence of characters no compression would have been possible.
If you are interested in this topic, I strongly recommend "Computability and Randomness" by Andre' Nies, Oxford Logic Guides n.51.

Related

What is the difference between Genetic Algorithm and Iterated Local Search Algorithm?

I'm basically trying to use Genetic Algorithm or Iterated Local Search Algorithm to get an optimal solution for a question.Can someone please explain what is the basic difference between these two algorithms and is there any situations where one of them is better than the other?
Let me start from the second question. I believe that there is no way to determine a better algorithm for a given problem without any trials and tests. The behavior of an algorithm heavily depends on problem's properties. If we are talking about complex problems with hundreds and thousands of variables, it's just too difficult to predict anything. I'm not talking about your engineer's intuition, some deep problem understanding, previous experience, etc, they are not really measurable.
The main difference between global and local search is quite straightforward - local search considers just one or a few of possible solutions at a single point of time and it tries to improve them with some modifications. Thus, each iteration it considers just a small portion of a search space (=local neighboorhood). Global search tries to take into account whole problem with all its parameters at the same time. For example, PSO samples huge amount of candidates and tries to move all of them into the global optimum's direction using some simple formula.

Algorithmic solution to Minesweeper

I am trying to make the minesweeper solver. As you know there are 2 ways to determine which fields in minefield are safe to open, or to determine which fields are mined and you need to flag it. First way to determine is trivial and we have something like this:
if (number of mines around X – current number of discovered mines around X) = number of unopened fields around X then
All unopened fields around X are mined
if (number of mines around X == current number of discovered mines around X) then
All unopened fields around X are NOT mined
But my question is: What about situation when we can't find any mined or safe field and we need to look at more than 1 field?
http://img541.imageshack.us/img541/4339/10299095.png
For example this situation. We can't determine anything using previous method. So i need a help with algorithm for these cases.
I have to use A* algorithm to make this. That is why i need all possible safe states for next step in algorithm. When i find all possible safe states i will add them to the current shortest path and depending on heuristic function i will sort list of paths and choose next field that needs to be opened.
Awesome problem, before you get too excited though, please read NP Completeness and Minesweeper, as well as the accompanying presentation which develops some good worst case examples and how a human might solve them. Nevertheless, in expectation we most likely won't hit a time barrier, if we use basic pruning and heuristics.
The question of generating the game is asked here: Minesweeper solving algorithm. There is a very cool post on algebraic methods. You can also give backtracking a try (i.e. take a guess and see if that invalidates things), similar to the case where local information is not enough for something like sudoku. See this great discussion about this technique.
As #tigger said this is not a problem that can be solved with a simple set of rules. Minesweeper is a good example where backtracking algorithms such as DPLL is useful. With something as simple as propositional logic, you can implement a very efficient solver for minesweeper. I am not sure if you are familiar with AI reasoning & logic inference - If not, you might want to have a look at the book "Artificial Intelligence - A Modern Approach" by Stuart Russel and Peter Norvig. For quick reference of DPLL and propositional logic, search "wumpus world propositional logic" on Google.

optimum finding in Genetic algorithms

I am implementing my M.Sc dissertation and in theory aspect of my thesis, i have a big problem.
suppose we want to use genetic algorithms.
we have 2 kind of functions :
a) some functions that have relations like this : ||x1 - x2||>>||f(x1) - f(x2)||
for example : y=(1/10)x^2
b) some functions that have relations like this : ||x1 - x2||<<||f(x1) - f(x2)||
for example : y=x^2
my question is that which of the above kind of functions have more difficulties than other when we want to use genetic algorithms to find optimum ( never mind MINIMUM or MAXIMUM ).
Thank you a lot,
Armin
I don't believe you can answer this question in general without imposing additional constraints.
It's going to depend on the particular type of genetic algorithm you're dealing with. If you use fitness proportional (roulette-wheel) selection, then altering the range of fitness values can matter a great deal. With tournament selection or rank-biased selection, as long as the ordering relations hold between individuals, there will be no effects.
Even if you can say that it does matter, it's still going to be difficult to say which version is harder for the GA. The main effect will be on selection pressure, which causes the algorithm to converge more or less quickly. Is that good or bad? It depends. For a function like f(x)=x^2, converging as fast as possible is probably great, because there's only one optimum, so find it as soon as possible. For a more complex function, slower convergence can be required to find good solutions. So for any given function, scaling and/or translating the fitness values may or may not make a difference, and if it does, the difference may or may not be helpful.
There's probably also a No Free Lunch argument that no single best choice exists over all problems and optimization algorithms.
I'd be happy to be corrected, but I don't believe you can say one way or the other without specifying much more precisely exactly what class of algorithms and problems you're focusing on.

Probability transition matrix

I'm working on Markov Chains and I would like to know of efficient algorithms for constructing probabilistic transition matrices (of order n), given a text file as input.
I am not after one algorithm, but I'd rather like to build a list of such algorithms. Papers on such algorithms are also more than welcome, as any tips on terminology, etc. Notice that this topic bears a strong resemblance with n-gram identification algorithms.
Any help would be much appreciated.
It sounds like there are two possible questions, you should clarify which one:
The 'text file' contains probability values and "n" and you build the matrix directly, but how to code it? This question is trivial, so let's disregard it
The 'text file' contains something like signal data and you want to model it as a Markov Chain.
'Markov Chain' generally refers to a first order stochastic process, so I'm not sure then what you mean by "order", probably the size of the matrix, but that is not typical terminology. Anyway, for 1st-order, n x n matrix, discrete time random process, you should look at Viterbi Algorithm: http://en.wikipedia.org/wiki/Viterbi_algorithm
Whenever dealing with Markov Models, I tend to end up looking at crm114 Discriminator. One, he goes into great detail about what different models there actually are (Markov isn't always the best, depending on what the application is) and provides general links and lots of background information on how probabilistic models work. While crm114 is generally used as some sort of SPAM identification tool, it is actually a more generic probability engine that I have used in other applications.

String search algorithms

I have a project on benchmarking String Matching Algorithms and I would like to know if there is a standard for every algorithm so that I would be able to get fair results with my experimentation. I am planning to use java's system.nanotime in getting the running time of every algorithm. Any comment or reactions regarding my problem is very much appreciated. Thanks!
I am not entirely sure what you're asking. However, I am guessing you are asking how to get the most realistic results. You need to run your algorithm hundreds, or even thousands of iterations to get an average. It is also very important to turn off any caching that your language may do, and don't reuse objects, unless it is part of your algorithm.
I am not entirely sure what you're asking. However, another interpretation of what you are asking can be answered by trying to work out how a given algorithm performs as you increase the size of the problem. Using raw time to compare algorithms at a given string size does not necessarily allow for accurate comparison. Instead, you could try each algorithm with different string sizes and see how the algorithm behaves as string size varies.
And Mark's advice is good too. So you are running repeated trials for many different string lengths to get a picture of how one algorithm works, then repeating that for the next algorithm.
Again, it's not clear what you're asking, but here's another thought in addition to what Tony and Mark said:
Be very careful about testing only "real" input or only "random" input. Some algorithms are tuned to do well on typical input (searching for a word in English text), while others are tuned for working well on pathologically hard cases. You'll need a huge mix of possible inputs of all different types and sizes to do a truly good benchmark.

Resources