Why do we define equivalent turing machines as two turing machines with the same accepted languages? - logic

From many textbooks about computability, I see how we define equivalent turing machines as follows:
Two turing machines TM1 and TM2 are equivalent <=> L(TM1) = (TM2)
where L(TM1) is the languages accpeted by TM1, i.e. L(TM1) = {w | TM1(w) = accept}, and so is L(TM2).
My question is: why does the above definition totally ignore the 'rejected languages' {w | TM1(w) = reject} and 'loop languages'{w | TM1(w) goes into infinite loop}?
Intuitively, TM1 and TM2 are equivalent if and only if their input-output relation are exactly the same, which means they output the same thing, or react in the same manner for any possible input. Say, if L(TM1) = L(TM2), then it is still possible that these exists some w0 such that TM1(w0)=reject while TM2(w0) goes into infinite loop, and thus we conclude that TM1 and TM2 are indeed not equivalent. However, according to the definition on textbooks, they are equivalent, which is not intuitive.
I try to obtain some explanation of the counterintuitive definition of equivalent Turing machines on textbooks.

Generally speaking the basic definition of Turing machines does not have a (explicit) notion of reject (see https://en.wikipedia.org/wiki/Turing_machine).
Then, there are many extensions along different directions that lead to slightly different formalizations. In fact, you can label a particular state as the reject state.
Notice that this extension does not change the expressive power of the computational model (see below).
Each extension is better suited to model different scenarios and serve different goals.
In some scenarios it could make perfect sense to employ a definition of equivalence that explicitly distinguishes between accepted and rejected words as you suggest.
I would like to point out that equivalence of Turing machines is all about the accepted languages. Internally, the two machines may behave in a completely different manner.
In addition, it is not completely clear what the output of a Turing machine is; they simply execute on some input and they either accept it or they don't.
Of course, in the basic case in which we define Turing machines without explicit reject states it makes sense to define the language of the machine as the set of words that the machine accepts.
In addition, this is consistent with the definition used in automata theory.
In a more formal manner you can consider the following reduction sketch to compare the two settings.
Let EM0 and EM1 be two Turing machines with an explicit reject state.
Assume we want to solve the problem of deciding whether the two machines accept and reject the same words.
We can reduce this problem to deciding the equivalence of two Turing machines (without explicit reject state).
Define M0 and M1 from EM0 and EM1 respectively by extending the tape alphabet with two fresh symbols R (for reject) and A (for accept).
M0 accepts the language build by all the words accepted by EM0 extended with an additional A at the beginning and all the words rejected by EM0 extended with an additional R at the end.
L(M0) := {Aw | EM0(w) = accept} U {Rw | EM0(w) = reject}
Define M1 similarly such that:
L(M1) := {Aw | EM1(w) = accept} U {Rw | EM1(w) = reject}
where, using your notation, EM(w) = v means that EM on input w terminates on a v state.
Notice that I am skipping the details about how to actually build the two Turing machines with such language.
Finally, M0 is equivalent to M1 iff EM0 and EM1 accept / reject the same words.
Therefore, the standard definition of language and equivalence of Turing machines is already expressive enough to capture the accept / reject scenario you described and it is also simpler in the sense that it does not require the extra definition of rejecting states.

Related

Precision in Program analysis

According to David Brumley's Control Flow Integrity & Software Fault Isolation (PPT slide),
in the below statements, x is always 8 due to the path to the x=7 is unrealizable even with the path sensitive analysis.
Why is that?
Is it because the analysis cannot determine the values of n, a, b, and c in advance during the analysis? Or is it because there's no solution that can be calculated by a computer?
if(a^n + b^n = c^n && n>2 && a>0 && b>0 && c>0)
x = 7; /unrealizable path/
else
x = 8;
In general, the task to determine which path in the program is taken, and which — not, is undecidable. It is quite possible that a particular expression, as in your example, can be proved to have a specific value. However, the words "in general" and "undecidable" say that you cannot write an algorithm that would be able to compute the value every time.
At this point the analysis algorithm can be optimistic or pessimistic. The optimistic one could pick 8 and be fine — it considers possible that at run-time x would get this value. It could also pick 7 — "who knows, maybe, x would be 7". But if the analysis is required to be sound, and it cannot determine the value of the condition, it should assume that the first branch could be taken during one execution, and the second branch could be taken during another execution, so x could be either 7 or 8.
In other words, there is a trade-off between soundness and precision. Or, actually, between soundness, precision, and decidability. The latter property tells if the analysis always terminates. Now, you have to pick what is needed:
Decidability — this is a common choice for compilers and code analyzers, because you would like to get an answer about your program in finite time. However, proof assistants could start some processes that could run up to the specified time limit, and if the limit is not set, forever: it's up to the user to stop it and to try something else.
Soundness — this is a common choice for compilers, because you would like to get the answer that matches the language specification. Code analyzers are more flexible. Many of them are unsound, but because of that they can find more potential issues in finite time, leaving the interpretation to the developer. I believe the example you mention talks about sound analysis.
Precision — this is a rare property. Compilers and code analyzer should be pessimistic, because otherwise some incorrect code could sneak in. But this might be parameterizable. E.g., if the compiler/analyzer supports constant propagation and folding, and all of the variables in the example are set to some known constants before the condition, it can figure out the exact value of x after it, and be completely precise.

Why are epsilon transitions used in NFA?

I'm trying to understand how to create NFA-s from regular expressions, but I am really confused from epsilon transitions. I have this example in my textbook , but I don't understand why epsilon transitions are used and how does one know when to use them.
In general, espilon-transitions are used when they are convenient. For example, when constructing an NFA from a regular expression, you start by constructing small parts of the automaton corresponding to parts of the expression. To connect them, you need to put a transition. But if there is no symbol to be read there, an epsilon transition is a simple way to do this. They are, however never necessary, you can always find a solution without them.
In your example, just apply the algorithm described in your textbook. It tells you when to use them.
The epsilon transitions
from 1 to 2 probably connects the parts for (a|b)* and for ac
1->5 and 8->1 probably result from the *
5->6 and 5->7 probably result from the alternative in |
Epsilon-transitions in NFAs are a natural representation of choice or disjunction or union in regular expressions. That is, a regular expression like r + s (or r | s or r U s depending on your preferred notation) is naturally represented as an NFA consisting of two independent NFAs, one for r and one for s, joined using e-transitions as follows:
e
----->q0----->(r)
|
| e
|
V
(s)
When used to connect states in more complicated ways, the effect may not be as easy or natural to describe, but essentially these transitions let you choose unconditionally among multiple options. So, if I have seen a part of the input already and there are a few different ways the string could end, I can represent that by using e-transitions to states that handle the different possibilities.
In your example, the e-transitions are not really serving any very useful function and are merely artifacts of the conversion algorithm you have used. That algorithm includes them because, in the general case, they may be useful or necessary. In your specific case this was not true, so they look out of place.

"haltingproblem" Contradiction Proof

I recently came across the halting problem contradiction proof.
In the proof, we have to feed the Turing machine a copy of the program and a copy of the input to decide whether that program halts on the input. In the contradiction, why does it have to be the program as the program and the input? Sorry if it sounds confusing. I can simply feed the machine with a program and a random input and come to the same conclusion.
Can anyone tell me why? Is there a specific reason I didn't think of?
First let me come back on the proof itself.
HALT_TM is undecidable
Assume that any machine has a description which takes the form of a string. Let HALT_TM = {<M, w>| M is a TM and M halts on input w}, and A_TM = {<M,w>| M is a TM and accepts w}. Here I assume that we know that A_TM is undecidable (the proof can be done by diagonalization and realizing that as there are more languages than Turing Machines, and as a given TM only decide one language, then some language are not decided).
Assume by contradiction that HALT_TM is decidable, meaning that we dispose of a decider D for this language. Then we would be able to build a machine M which decides A_TM. On input <M', w>, M does the following:
Run D on input <M',w>
If D reject, reject, otherwise run M' on w (until it halts, which we know because D did not reject!)
If M' accepts, accept, if it rejects, reject.
There we see the contradiction with our assumption
Universal Turing Machines
Now the core of your question: you actually feed M any valid machine description M', not necessarily <M> itself. Remember that a TM and "a program" are actually equivalent: See this nice answer for more details. Quoting this same answer: "A Turing Machine is the formal analogue of an algorithm".
One power of Turing Machines is that they can be encoded as a string, allowing another Turing Machine (called an "Universal Turing Machine") to execute them. Because the given machine is an algorithm, you see that you are actually feeding your "top-level" TM a program, and a input of your choice.

How to recognize variables that don't affect the output of a program?

Sometimes the value of a variable accessed within the control-flow of a program cannot possibly have any effect on a its output. For example:
global var_1
global var_2
start program hello(var_3, var_4)
if (var_2 < 0) then
save-log-to-disk (var_1, var_3, var_4)
end-if
return ("Hello " + var_3 + ", my name is " + var_1)
end program
Here only var_1 and var_3 have any influence on the output, while var_2 and var_4 are only used for side effects.
Do variables such as var_1 and var_3 have a name in dataflow-theory/compiler-theory?
Which static dataflow analysis techniques can be used to discover them?
References to academic literature on the subject would be particularly appreciated.
The problem that you stated is undecidable in general,
even for the following very narrow special case:
Given a single routine P(x), where x is a parameter of type integer. Is the output of P(x) independent of the value of x, i.e., does
P(0) = P(1) = P(2) = ...?
We can reduce the following still undecidable version of the halting problem to the question above: Given a Turing machine M(), does the program
never stop on the empty input?
I assume that we use a (Turing-complete) language in which we can build a "Turing machine simulator":
Given the program M(), construct this routine:
P(x):
if x == 0:
return 0
Run M() for x steps
if M() has terminated then:
return 1
else:
return 0
Now:
P(0) = P(1) = P(2) = ...
=>
M() does not terminate.
M() does terminate
=> P(x) = 1 for a sufficiently large x
=> P(x) != P(0) = 0
So, it is very difficult for a compiler to decide whether a variable actually does not influence the return value of a routine; in your example, the "side effect routine" might manipulate one of its values (or even loop infinitely, which would most definitely change the return value of the routine ;-)
Of course overapproximations are still possible. For example, one might conclude that a variable does not influence the return value if it does not appear in the routine body at all. You can also see some classical compiler analyses (like Expression Simplification, Constant propagation) having the side effect of eliminating appearances of such redundant variables.
Pachelbel has discussed the fact that you cannot do this perfectly. OK, I'm an engineer, I'm willing to accept some dirt in my answer.
The classic way to answer you question is to do dataflow tracing from program outputs back to program inputs. A dataflow is the connection of a program assignment (or sideeffect) to a variable value, to a place in the application that consumes that value.
If there is (transitive) dataflow from a program output that you care about (in your example, the printed text stream) to an input you supplied (var2), then that input "affects" the output. A variable that does not flow from the input to your desired output is useless from your point of view.
If you focus your attention only the computations involved in the dataflows, and display them, you get what is generally called a "program slice" . There are (very few) commercial tools that can show this to you.
Grammatech has a good reputation here for C and C++.
There are standard compiler algorithms for constructing such dataflow graphs; see any competent compiler book.
They all suffer from some limitation due to Turing's impossibility proofs as pointed out by Pachelbel. When you implement such a dataflow algorithm, there will be places that it cannot know the right answer; simply pick one.
If your algorithm chooses to answer "there is no dataflow" in certain places where it is not sure, then it may miss a valid dataflow and it might report that a variable does not affect the answer incorrectly. (This is called a "false negative"). This occasional error may be satisfactory if
the algorithm has some other nice properties, e.g, it runs really fast on a millions of code. (The trivial algorithm simply says "no dataflow" in all places, and it is really fast :)
If your algorithm chooses to answer "yes there is a dataflow", then it may claim that some variable affects the answer when it does not. (This is called a "false positive").
You get to decide which is more important; many people prefer false positives when looking for a problem, because then you have to at least look at possibilities detected by the tool. A false negative means it didn't report something you might care about. YMMV.
Here's a starting reference: http://en.wikipedia.org/wiki/Data-flow_analysis
Any of the books on that page will be pretty good. I have Muchnick's book and like it lot. See also this page: (http://en.wikipedia.org/wiki/Program_slicing)
You will discover that implementing this is pretty big effort, for any real langauge. You are probably better off finding a tool framework that does most or all this for you already.
I use the following algorithm: a variable is used if it is a parameter or it occurs anywhere in an expression, excluding as the LHS of an assignment. First, count the number of uses of all variables. Delete unused variables and assignments to unused variables. Repeat until no variables are deleted.
This algorithm only implements a subset of the OP's requirement, it is horribly inefficient because it requires multiple passes. A garbage collection may be faster but is harder to write: my algorithm only requires a list of variables with usage counts. Each pass is linear in the size of the program. The algorithm effectively does a limited kind of dataflow analysis by elimination of the tail of a flow ending in an assignment.
For my language the elimination of side effects in the RHS of an assignment to an unused variable is mandated by the language specification, it may not be suitable for other languages. Effectiveness is improved by running before inlining to reduce the cost of inlining unused function applications, then running it again afterwards which eliminates parameters of inlined functions.
Just as an example of the utility of the language specification, the library constructs a thread pool and assigns a pointer to it to a global variable. If the thread pool is not used, the assignment is deleted, and hence the construction of the thread pool elided.
IMHO compiler optimisations are almost invariably heuristics whose performance matters more than effectiveness achieving a theoretical goal (like removing unused variables). Simple reductions are useful not only because they're fast and easy to write, but because a programmer using a language who understand basics of the compiler operation can leverage this knowledge to help the compiler. The most well known example of this is probably the refactoring of recursive functions to place the recursion in tail position: a pointless exercise unless the programmer knows the compiler can do tail-recursion optimisation.

Writing a program that writes a program

Its well known in theoretical computer science that the "Hello world tester" program is an undecidable problem.(Here is a link what i mean by hello world tester
My question is since given a program as input we can't say what the program will do,can we solve the reverse problem:
Given set of input and output,is there an algorithm for writing a program that writes a program to achieve a one to one mapping between the given input and output.
I know about metaprogramming but my question is more of theoretical interest. Something which can apply for a generic case.
With these kind of musings one has to be very careful. A lot of confusion arises from not clearly distinguishing about a program x for which proposition P(x) holds or any program x for which proposition P(x) hold. As long as the set of programs for which P(x) holds is finite there always is a proof, of their correctness (although this proof may not be known).
At this point you also have to distinguish between programs, which are and can be known and programs which can only be shown to exist by full enumeration of all posibilities. Let's make an example:
Take 10 Programs, which take no input and may or may not terminate and produce "hello World". Then there is a program which decides exactly which of these programs are correct, and which aren't. Lets call these programs (x_1,...,x_10). Then take the programs (X_0,...,X_{2^10}) where X_i output true for program x_j if the j-th bit in the binary representation of i is set. One of these programs has to be the one which decides correctly for all ten x_i, there just might not be any way to ever figure out which one of these 100 X_js is the correct one (a meta-problem at this point).
This goes to show that considering finite sets of programs and input/output pairs one can always resolve to full enumeration and all halting-problem type of paradoxies instantly disappear. In your case the set of generated programs for each input is of size one and the set of input/output pairs is of finite size (because you have to supply it to the meta-program). Hence full enumeration solves your problem very simple and you can also easily proof both the correctness of the corrected program as well as the correctness of the meta-program.
Note: Since the set of generated programs is infinite, this is one of the few cases where you can proof P(x) for a infinite set of programs (actually you can proof P(x,input,output) for this set). This shows that the set being infinite is only a necessary, not a sufficient condition for this type of paradoxes to appear.
Your question is ambiguously phrased.
How would you specify "what a program should do"?
Any precise, complete, and machine-readable specification of a program's functionality is already a program.
Thus, the answer to your question is, a compiler.
Now, you're asking how to find a function based on a sample of its input and output.
That is a question about statistical analysis that I cannot answer.
Sounds like you want to generate a state machine that learns by being given an input sequence and then updates itself to produce the appropriate output sequence. Assuming your output sequences are always the same for the same input sequence it should be simple enough to write. If your output is not deterministic, such as changing the output depending on the time of day, then you cannot automatically generate a state machine.
Depends on what you mean by "one to one mapping". (And also, I suppose, "input" and "output".)
My guess is that you're asking whether, given an example of inputs and outputs for a given arbitrary program, can one devise an algorithm to write an equivalent program? If so, the answer is no. Eg, you could have a program with the inputs/outputs of 1/1, 2/2, 3/3, 4/4, and yet, if the next input value was 5, the output would be 3782. There's no way to know, from a given set of results, what the next result might be.
The question is underspecified since you did not say how the input and output are presented. For finite lists, the answer is "yes", as in this Python code:
def f(input,output):
print "def g():"
print " x = {" + ",".join(repr(x) + ":" + repr(y) for x,y in zip(input,output)) + "}"
print " print x[raw_input()]"
>>> f(['2','3','4'],['a','b','x'])
def g():
x = {'2':'a','3':'b','4':'x'}
print x[raw_input()]
>>> def g():
... x = {'2':'a','3':'b','4':'x'}
... print x[raw_input()]
...
>>> g()
3
b
for infinite sets how are you going to present them? If you show only a small sample of input this does not specify the whole algorithm. Guessing the best fit is undecidable. If you have a "magic blackbox" then there are continuum many mappings but only a countable number of programs, so it's impossible.
I think I agree with SLaks, but taking things from a different angle, what does a compiler do?
(EDIT: I see SLaks edited his original answer, which was essentially 'you're describing the identity function').
It takes a program in one source language that describes the intended behaviour of a program, and "writes" another program in a target language that exhibits that behaviour.
We could also think of this in terms of things like process refinement --- given an abstract specification, we can construct a refinement mapping to some "more concrete" (read: less non-deterministic, usually) implementation.
But based on your question, it's really very difficult to tell which of these you meant, if any.

Resources