My copy of The Design and Analysis of Computer Algorithms arrived today. In the first chapter, the authors introduce Turing machines. I have two other algorithms textbooks, Introduction to Algorithms and The Algorithm Design Manual, but neither of them discusses Turing machines, even though both are well-known references on algorithms and data structures.
I would like to understand the relationship between Turing machines and algorithms/data structures. Is it really important to understand Turing machines to become an expert in algorithms?
Turing machines are just theoretical tools for analyzing computation, i.e. we can specify an algorithm by constructing a Turing machine that computes it. They are very useful in the study of computability, that is, whether it is possible at all to compute a given function. Turing machines and several other formal language constructs are discussed in the classic book by Hopcroft and Ullman. Turing machines also appear when discussing NP-completeness, for instance in this book by Garey and Johnson.
Both books, and Turing machines in general, are pretty theoretical. If you are interested in algorithms in an academic way, I'd recommend them. However, if you want a practical understanding of algorithms implemented on real computers and run on real data, then I'd say that only a cursory understanding of Turing machines is important.
The reason that Turing machines are of importance when describing data structures and algorithms is that they provide a mathematical model in which we can describe what an algorithm is. Most of the time, algorithms are described using a high-level language or pseudocode. For example, I might describe an algorithm to find the maximum value in an array like this:
Set max = -infinity
For each element in the array:
    If that element is greater than max:
        Set max equal to that element.
From this description it's easy to see how the algorithm works, and it would be easy to translate it into source code. However, suppose that I had written out this description:
Guess the index at which the maximum element occurs.
Output the element at that position.
Is this a valid algorithm? That is, can we say "guess the index" and rigorously define what it means? If we do allow this, how long will it take to do this? If we don't allow it, why not? What's different about the first description from the second?
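To make the contrast concrete, here is a minimal Python sketch of the first description. The function name `find_max` is only illustrative. Each line maps onto one step of the pseudocode, whereas "guess the index" has no such direct counterpart on an ordinary deterministic machine.

```python
import math

def find_max(values):
    """Return the largest element, following the pseudocode step by step."""
    best = -math.inf          # "Set max = -infinity"
    for element in values:    # "For each element in the array"
        if element > best:    # "If that element is greater than max"
            best = element    # "Set max equal to that element"
    return best
```

Note that for an empty array this returns -infinity, exactly as the pseudocode would.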
In order to have a mathematically rigorous definition of an algorithm, we need to have some formal model of how a computer works and what it can and cannot do. The Turing machine is one common way to formally define computation, though there are others that can be used as well (register machines, string rewriting systems, Church's lambda calculus, etc.). Once we have defined a mathematical model of computation, we can start talking about what sorts of algorithmic descriptions are valid - namely, those that could be implemented using our model of computation.
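To give a feel for how small such a formal model is, here is a rough sketch of a single-tape deterministic Turing machine simulator in Python. The encoding of the transition table, the state names `"start"` and `"halt"`, the blank symbol `"_"` and the step limit are all assumptions made for this illustration, not a standard notation.

```python
from collections import defaultdict

BLANK = "_"  # assumed blank symbol for this sketch

def run_turing_machine(transitions, tape, state="start", halt="halt", max_steps=10_000):
    """Simulate a single-tape deterministic Turing machine.

    transitions maps (state, symbol) -> (new_state, symbol_to_write, move),
    where move is -1 (head left) or +1 (head right).
    Returns the tape contents with surrounding blanks stripped.
    """
    cells = defaultdict(lambda: BLANK, enumerate(tape))  # unbounded tape
    head = 0
    for _ in range(max_steps):
        if state == halt:
            return "".join(cells[i] for i in sorted(cells)).strip(BLANK)
        symbol = cells[head]
        state, cells[head], move = transitions[(state, symbol)]
        head += move
    raise RuntimeError("step limit reached without halting")

# Example machine: flip every bit, then halt at the first blank.
FLIP_BITS = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
    ("start", BLANK): ("halt", BLANK, +1),
}
```

Calling `run_turing_machine(FLIP_BITS, "1011")` yields `"0100"`. Every step is one table lookup, one write and one head move, which is what makes the cost of a computation easy to define in this model.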
Many modern algorithms depend critically on the properties of the underlying model of computation. For example, cache-oblivious algorithms assume that the model of computation has some memory buffer of an unknown size and a two-tiered memory. Some algorithms require that the underlying machine be transdichotomous, meaning that the size of a machine word must be at least large enough to hold the size of any problem. Randomized algorithms require a formal definition of randomness and how the machine can use random values. Nondeterministic algorithms require a means of specifying a nondeterministic computation. Algorithms based on circuits must know what circuit primitives are and are not allowed. Quantum computers need a formal definition of what operations are and are not allowed, along with what the definition of an algorithm is given that the output is probabilistic. Distributed algorithms need a formal definition of communication across machines.
In short, it's important to be explicit about what is and is not allowed when describing an algorithm. However, to be a good programmer or to have a solid grasp of algorithms, you don't necessarily need to know Turing machines inside and out, nor do you need to know the specific details of how you'd encode particular problems using them. What you should know, though, is what the model of computation can and cannot do, and what the cost is per operation. This way, you can reason about how efficient algorithms are and how much of various resources (time, space, memory, communication, randomness, nondeterminism, etc.) they use. But that said, don't panic if you don't understand the underlying model.
There is one other reason to think about the underlying model of computation - discussing its limitations. Every model of computation has its limits, and in some cases you can prove that certain algorithms cannot possibly exist for certain problems, or that any algorithm that would solve some problem necessarily must use some amount of a given resource. The most common example where this comes up in algorithm design is the notion of NP-hardness. Some problems are conjectured to be extremely "difficult" to solve, but the formal definition of what this difficulty is relies on knowledge of Turing machines and nondeterministic Turing machines. Understanding the model is useful in this case because it allows you to reason about the computational feasibility of certain problems.
Hope this helps!
Related
It seems to be a matter of computer science lore that data and computation (or data and process, whatever you want to call it) are in some vague sense duals of each other: data is generated by computation but also guides future computation, and so the two are, vaguely, two sides of the same coin. This duality is more apparent in programming languages like Lisp which purposefully blur the line between the two.
I'm wondering whether this notion of duality has been studied in complexity theory in a rigorous setting. For instance, are there any computational models in which this duality arises naturally out of some deeper duality intrinsic to the model? For instance--and this is wishful thinking bordering on the nonsensical--if, say, we equated data with the states of a DFA and process with the DFA's transition function, and then the graph-dual of the DFA would yield another DFA related to the original in some meaningful way, then the data/computation duality would emerge naturally from the underlying model.
That sort of thing. Any pointers to research in the area (or even just keywords) are appreciated.
Recently, I was reading these books about algorithms, specifically the section about analysis of algorithms:
Introduction to Algorithms. 3rd ed. TCRC
Algorithm Design Manual. 2nd ed. S. Skiena
Algorithm Design. J. Kleinberg & É. Tardos
Algorithms. 4th ed. R. Sedgewick
Algorithms. S. Dasgupta, C. Papadimitriou & U. Vazirani
a few other books
After that, I got a bit confused because I don't fully understand the origin of counting steps of algorithms.
I mean, in Introduction to Algorithms and Algorithm Design Manual, something called the RAM model of computation is mentioned. In these books, it is said that under that model we count steps, but in the other books a model of computation as such is not mentioned.
The other books talk about counting steps of the path that the algorithm travels, that is, in a common sense way or in a logical way. So, I would appreciate if you guys could help me with these questions:
What's the relationship (or difference) between the step count method (other books) and using a model of computation (TCRC & S. Skiena) to do it?
When someone talks about counting steps to analyze algorithms, may I assume they are referring to using a model of computation (RAM)?
Our common sense is based on a model of computation that can be implicit or explicit. Usually in an introductory course it is left implicit. When it is made explicit, it is usually the RAM model, which is based on the idea of sequential processing where each simple operation takes constant time. So you just count steps.
You can find a formal description of that model at http://people.seas.harvard.edu/~cs125/fall14/lec6.pdf.
Reality is, of course, rather different. As https://gist.github.com/jboner/2841832 shows, operations take wildly different amounts of time. I've personally seen jobs go from 5 days to 1 hour by switching to using a sort instead of hash lookups. Yes, hash lookups are O(1), but with a horrible constant when data is backed by disk. Distributed computing has things operating in parallel. Computing on a GPU gives you a tremendous amount of parallelism... as long as all computation operates in perfect lockstep. We are trying to build quantum computers, which could theoretically give us many, many orders of magnitude more parallelism... at the cost of losing irreversible operations like "if".
We can create models that deal with all of this complexity. But there is no need to consider any of it until you understand the basics, which is the standard "count operations" approach from the RAM model.
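As a small illustration of "counting operations", the sketch below instruments a linear search so that it reports how many unit-cost steps it took. The function name and the choice to charge one step per comparison (ignoring loop bookkeeping) are assumptions of this example; the RAM model only fixes that each simple operation costs constant time.

```python
def count_steps_linear_search(values, target):
    """Linear search that also tallies unit-cost steps, RAM-model style.

    Returns (index, steps); index is -1 if target is absent.
    Assumption: each comparison against target costs exactly one step.
    """
    steps = 0
    for index, value in enumerate(values):
        steps += 1                # one comparison
        if value == target:
            return index, steps
    return -1, steps
```

For a list of n elements the worst case is n steps (target absent or in the last position), which is where the O(n) bound for linear search comes from.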
I have a very basic and general doubt related to algorithm design. I've learnt basic algorithms and am now learning randomized algorithms. Everywhere, I have observed that a professor mostly focuses on designing an algorithm that ultimately tries to reduce the complexity.
The usual way (as far as I have observed) is to learn some basic (or older) algorithm which behaves badly in terms of complexity, so the objective is to replace that older one with a newer algorithm that reduces the complexity without affecting the output.
But in most of the algorithms I've studied, especially distributed algorithms (in distributed operating systems) such as algorithms for distributed mutual exclusion, distributed deadlock detection etc., what I have observed (and what I mostly think) is that the design of an algorithm should not focus only on improving complexity but on the perfection of the algorithm as well.
Let's take the example of distributed mutual exclusion. The most basic algorithm is Lamport's algorithm, and the modified version with improved complexity is the Ricart-Agrawala algorithm. Since communication in a distributed environment is mostly by message passing, distributed mutual exclusion uses three kinds of messages: a) request the critical resource, b) reply to the request, c) release the critical resource. The basic algorithm uses extra release messages (to inform all other sites that this site has released the critical resource, so they can enter). The advanced version discards these release messages by folding them into the reply messages, and so achieves a lower message complexity.
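For reference, the difference in message complexity can be written down directly. The sketch below uses the standard textbook counts per critical-section entry with N sites: 3(N - 1) messages for Lamport's algorithm (request, reply and release to every other site) and 2(N - 1) for Ricart-Agrawala (request and reply only). The function name is illustrative.

```python
def messages_per_entry(num_sites):
    """Messages exchanged for one critical-section entry among num_sites sites."""
    if num_sites < 1:
        raise ValueError("need at least one site")
    others = num_sites - 1
    return {
        "lamport": 3 * others,          # request + reply + release
        "ricart_agrawala": 2 * others,  # request + (possibly deferred) reply
    }
```

With 5 sites this gives 12 messages versus 8, so the saving is exactly the N - 1 release messages discussed above.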
But when I implemented these algorithms in Java, I observed that although the complexity of the basic algorithm was a bit higher, it was more perfect than the advanced one. By reducing the number of messages transferred (in the advanced solution), the local site no longer knows whether the remote site has actually released the resource, because a site updates its local data structures, such as the request queue, only on receipt of a release message. If we don't send an explicit release notification, requests remain pending unnecessarily in the local site's request queue for the entire run.
So my doubt is: if improving complexity is so important, why isn't perfection? I mean, if an algorithm produces perfect results at the cost of slightly higher complexity, why does that matter, as long as I get more perfect output than the lower-complexity solution provides?
Note: By perfection I don't mean correct/incorrect results. Results are always correct. Only the perfection or accuracy of the produced result varies.
In principle, a fair complexity comparison is made between two algorithms that produce exactly the same output, e.g. sorting algorithms.
Your case is different: you describe algorithms with different behaviour.
Many factors decide which algorithm is better suited:
Ease of implementation (in practice very important).
Speed: a faster algorithm that lacks some functionality, as in your case, must be dramatically faster (a factor of 10 on the expected data volume) or easier to implement to be worth choosing.
Robustness: a well-known algorithm that has been used successfully for 10 years, versus a new algorithm from a paper, where chances are high that it works only in the environment the scientist optimized for it. (I know of such a case for a telecom network algorithm.)
Consider any NP-complete problem (e.g. the travelling salesman problem).
There are no known non-exponential exact algorithms for these problems (except in special cases), so it would literally take years (or much longer) to find an exact solution for any reasonably-sized version of these problems.
So, instead we use heuristics and approximations (and possibly some randomness) to get a non-exact solution in a reasonable time-frame.
NP-complete problems are just an extreme example - we can also just have a few seconds to get a solution (for whatever reason), but finding an exact solution will take a few minutes. So it all comes down to balancing out how long we want to run the algorithm for and how good we want the results to be (and development time also certainly plays a role).
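The trade-off can be sketched for the travelling salesman problem in a few lines of Python. `exact_tour` tries all (n - 1)! orderings, while `nearest_neighbour_tour` is one simple greedy heuristic, chosen here purely for illustration; it runs in O(n^2) time but gives no guarantee of optimality. All function names are made up for this example.

```python
import itertools
import math

def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def exact_tour(points):
    """Brute force: try every ordering that starts at city 0."""
    rest = range(1, len(points))
    return min(
        ((0, *perm) for perm in itertools.permutations(rest)),
        key=lambda order: tour_length(points, order),
    )

def nearest_neighbour_tour(points):
    """Greedy heuristic: always move to the closest unvisited city."""
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        here = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(here, points[j]))
        unvisited.remove(nxt)
        order.append(nxt)
    return tuple(order)
```

The brute-force version is only usable for roughly a dozen cities, while the heuristic scales to thousands at the price of a possibly longer tour.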
I hope I understood what you were asking correctly and that this helps.
Instead of "perfection", maybe you should consider "fitness for a particular purpose".
For your example of a distributed mutual exclusion algorithm, consider the "simple" and "improved" algorithms from different viewpoints. As another answer pointed out, the two algorithms behave differently; my point is that different people are interested in different aspects of that behavior.
Someone using an algorithm for a particular purpose probably does not care about all aspects of its behavior. For your example, you are concerned about pending resource locks. However, if the mutual exclusion algorithm is expected to be running all the time, the user might not care, because the locks will be returned promptly anyway, while using fewer messages than the simple version. If you want both efficiency and promptness, there is likely some way to accommodate both -- at the cost of greater complexity -- and if you're looking for practical "perfection", this is the logical endpoint.
A computer scientist does not know how his algorithm might be used. In general, he cannot anticipate all possible variations on a particular technique, and you would not want to read them all if he could! When publishing an algorithm, clarity of expression is the "perfection" you're pursuing -- the idea should be described as simply as possible.
Given a data structure specification such as a purely functional map with known complexity bounds, one has to pick between several implementations. There is some folklore on how to pick the right one; for example, Red-Black trees are considered to be generally faster, but AVL trees have better performance on workloads with many lookups.
Is there a systematic presentation (published paper) of this knowledge (as relates to sets/maps)? Ideally I would like to see statistical analysis performed on actual software. It might conclude, for example, that there are N typical kinds of map usage, and list the input probability distribution for each.
Are there systematic benchmarks that test map and set performance on different distributions of inputs?
Are there implementations that use adaptive algorithms to change representation depending on actual usage?
These are basically research topics, and the results are generally given in the form of conclusions, while the statistical data is hidden. One can have statistical analysis on their own data though.
For benchmarks, it is better to go through the implementation details.
The third part of the question is a very subjective matter, and the actual intentions may never be known at the time of implementation. However, languages like Perl do their best to implement highly optimized solutions for every operation.
The following might be of help:
Purely Functional Data Structures by Chris Okasaki
http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf
I'd like to pose a few abstract questions about computer vision research. I haven't quite been able to answer these questions by searching the web and reading papers.
How does someone know whether a computer vision algorithm is correct?
How do we define "correct" in the context of computer vision?
Do formal proofs play a role in understanding the correctness of computer vision algorithms?
A bit of background: I'm about to start my PhD in Computer Science. I enjoy designing fast parallel algorithms and proving the correctness of these algorithms. I've also used OpenCV for some class projects, though I don't have much formal training in computer vision.
I've been approached by a potential thesis advisor who works on designing faster and more scalable algorithms for computer vision (e.g. fast image segmentation). I'm trying to understand the common practices in solving computer vision problems.
You just don't prove them.
Instead of a formal proof, which is often impossible to do, you can test your algorithm on a set of test cases and compare the output with previously known algorithms or correct answers (for example, when you recognize text, you can generate a set of images where you know what the text says).
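A minimal sketch of this kind of testing, using a 1D median filter as a stand-in for an image operation: a slow but obviously correct reference is compared against a faster candidate on many random inputs. Both implementations, their names, the border handling (windows are truncated at the edges) and the use of the lower median are assumptions of this example.

```python
import bisect
import random
import statistics

def median_filter_reference(signal, radius):
    """Obviously correct version: take the median of each window from scratch."""
    n = len(signal)
    return [
        statistics.median_low(signal[max(0, i - radius): min(n, i + radius + 1)])
        for i in range(n)
    ]

def median_filter_sliding(signal, radius):
    """Candidate version: keep one sorted window and update it incrementally."""
    n = len(signal)
    window = sorted(signal[:min(n, radius + 1)])     # window for position 0
    out = []
    for i in range(n):
        out.append(window[(len(window) - 1) // 2])   # lower median
        leaving = i - radius                         # index dropping out next
        if leaving >= 0:
            del window[bisect.bisect_left(window, signal[leaving])]
        entering = i + radius + 1                    # index coming in next
        if entering < n:
            bisect.insort(window, signal[entering])
    return out

def check_against_reference(trials=200, seed=0):
    """Compare both filters on random integer signals; True if they always agree."""
    rng = random.Random(seed)
    for _ in range(trials):
        signal = [rng.randint(0, 9) for _ in range(rng.randint(0, 30))]
        radius = rng.randint(0, 5)
        if median_filter_sliding(signal, radius) != median_filter_reference(signal, radius):
            return False
    return True
```

Agreement on random inputs is evidence, not a proof, but it catches most indexing and border mistakes quickly.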
In practice, computer vision is more like an empirical science: You gather data, think of simple hypotheses that could explain some aspect of your data, then test those hypotheses. You usually don't have a clear definition of "correct" for high-level CV tasks like face recognition, so you can't prove correctness.
Low-level algorithms are a different matter, though: You usually have a clear, mathematical definition of "correct" here. For example, if you invented an algorithm that can calculate a median filter or a morphological operation more efficiently than known algorithms, or that can be parallelized better, you would of course have to prove its correctness, just like any other algorithm.
It's also common to have certain requirements for a computer vision algorithm that can be formalized: For example, you might want your algorithm to be invariant to rotation and translation - these are properties that can be proven formally. It's also sometimes possible to create mathematical models of signal and noise, and design a filter that has the best possible signal-to-noise ratio (IIRC the Wiener filter or the Canny edge detector were designed that way).
Many image processing/computer vision algorithms have some kind of "repeat until convergence" loop (e.g. snakes or Navier-Stokes inpainting and other PDE-based methods). You would at least try to prove that the algorithm converges for any input.
This is my personal opinion, so take it for what it's worth.
You can't prove the correctness of most of the Computer Vision methods right now. I consider most of the current methods some kind of "recipe" where ingredients are thrown down until the "result" is good enough. Can you prove that a brownie cake is correct?
It is a bit similar in a way to how machine learning evolved. At first, people did neural networks, but it was just a big "soup" that happened to work more or less. It worked sometimes, didn't in other cases, and no one really knew why. Then statistical learning (through Vapnik among others) kicked in, with some real mathematical backup. You could prove that you had the unique hyperplane that minimized a particular loss function, that PCA gives you the closest matrix of fixed rank to a given matrix (considering the Frobenius norm, I believe), etc.
Now, there are still a few things that are "correct" in computer vision, but they are pretty limited. What comes to my mind is wavelets: they are the sparsest representation in an orthogonal basis of functions (i.e. the most compressed way to represent an approximation of an image with minimal error).
Computer vision algorithms are not like theorems that you can prove; they usually try to interpret image data in terms that are more understandable to us humans, as in face recognition, motion detection, video surveillance etc. The exact correctness is not calculable, unlike in the case of image compression algorithms, where you can easily judge the result by the size of the images.
The most common ways to present results for computer vision methods (especially classification problems) are graphs of precision vs. recall or accuracy vs. false positives. These are measured on standard databases available on various sites. Usually, the more aggressively you tune the parameters to catch every true detection, the more false positives you generate. The typical practice is to choose the point on the graph according to your requirement of 'how many false positives are tolerable for the application'.
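As a rough sketch of how such a curve is computed, the function below sweeps a decision threshold over detector scores and reports precision and recall at each setting. The function name, the input format and the convention of reporting precision as 1.0 when nothing is detected are assumptions of this example.

```python
def precision_recall_curve(scores, labels, thresholds):
    """Return a list of (threshold, precision, recall) triples.

    scores: detector confidence per sample.
    labels: True where the sample really is a positive.
    A sample counts as detected when its score >= threshold.
    """
    total_positives = sum(bool(y) for y in labels)
    curve = []
    for t in thresholds:
        detected = [s >= t for s in scores]
        tp = sum(1 for d, y in zip(detected, labels) if d and y)
        fp = sum(1 for d, y in zip(detected, labels) if d and not y)
        precision = tp / (tp + fp) if tp + fp else 1.0   # convention when nothing is detected
        recall = tp / total_positives if total_positives else 0.0
        curve.append((t, precision, recall))
    return curve
```

Lowering the threshold moves along the curve towards higher recall and, typically, more false positives; the operating point is then picked from these triples according to what the application can tolerate.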