Why does skipping the "standardize variables" step violate only completeness of resolution, not soundness?

I have been reading some notes on converting first-order logic (FOL) sentences to conjunctive normal form (CNF) and then performing resolution.
One of the steps of converting to CNF is standardizing variables apart.
I have been trying to find an explanation of why, if I don't standardize variables, the completeness of the resolution algorithm is violated while its soundness is not.
Could anyone add details on why only completeness is violated while soundness is preserved?

Here is an example that may help you visualize this. Suppose your theory is
(for all X : nice(X)) or (for all X : smart(X)) (1)
which, if you standardize apart, will result in the CNF:
nice(X) or smart(Y)
that is, everybody in a population is nice, or everybody in a population is smart, or both.
Dropping standardize apart will produce a CNF
nice(X) or smart(X)
which is equivalent to
(for all X : nice(X) or smart(X)) (2)
that is, for each individual in a population, that individual is nice, or smart, or both.
This theory (2) is saying less, it is weaker, than the original theory (1), because it admits a situation in which it is not true that everybody is nice, and it is not true that everybody is smart, but it is true that each individual is one or the other or both. In other words, (2) does not imply (1), but (1) implies (2) (if the whole population is nice, then each individual is nice; if the whole population is smart, then each individual is smart; therefore, each individual is either nice or smart). The set of possible worlds accepted by (2) is strictly larger than the set of possible worlds accepted by (1).
What does this say about completeness and soundness?
Using (2) is not complete because I can show you a counterexample: a theorem that is true given (1) but can no longer be proven when using (2). Consider the theorem "If John is not nice but smart, then Liz is smart". This is true given (1) because if the whole population shares one of those two qualities, then it must be "smart" since John is not nice, so Liz (and everyone else) must also be smart. However, given (2) this is not true anymore (it could be that Liz is nice but not smart, and everybody else is one or the other, which would still be fine given (2)). So I can no longer prove a true theorem using (2) (after dropping standardize apart), and therefore my inference is not complete.
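To make this concrete, here is a sketch of the refutation attempt (john and liz stand for the individuals in the example). Negating the theorem gives the clauses ~nice(john), smart(john) and ~smart(liz).
With the standardized clause from (1):
nice(X) or smart(Y)
resolving with ~nice(john) (binding X = john) gives smart(Y)
resolving smart(Y) with ~smart(liz) (binding Y = liz) gives the empty clause, so the theorem is proven.
With the non-standardized clause from (2):
nice(X) or smart(X)
resolving with ~nice(john) (binding X = john) gives smart(john)
resolving with ~smart(liz) (binding X = liz) gives nice(liz)
and no further step produces the empty clause; in fact this clause set is satisfiable (John is smart but not nice, Liz is nice but not smart), so no refutation exists.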
Now suppose I prove some theorem T to be true using (2). This means T holds in every possible world in which (2) holds, including the subset of worlds in which (1) holds (recall from above that the worlds of (1) are a subset of the worlds of (2)). Therefore T is true in (1) as well, so performing inference using (2) is still sound.
In a nutshell: when you do not standardize apart, you "join" statements about entire domains (populations) and make them about individuals, which will be weaker and will not imply some facts that were implied before; they will be lost and your procedure will not be complete.

Related

Precision in Program analysis

According to David Brumley's Control Flow Integrity & Software Fault Isolation (PPT slide),
in the statements below, x is always 8 because the path to x = 7 is unrealizable, even with path-sensitive analysis.
Why is that?
Is it because the analysis cannot determine the values of n, a, b, and c in advance during the analysis? Or is it because there's no solution that can be calculated by a computer?
if(a^n + b^n = c^n && n>2 && a>0 && b>0 && c>0)
    x = 7; /* unrealizable path */
else
    x = 8;
In general, the task of determining which paths in a program can be taken and which cannot is undecidable. It is quite possible that a particular expression, as in your example, can be proved to have a specific value. However, the words "in general" and "undecidable" say that you cannot write an algorithm that would be able to compute the value every time.
At this point the analysis algorithm can be optimistic or pessimistic. An optimistic one could pick 8 and be fine: it considers it possible that at run time x would get this value. It could also pick 7 ("who knows, maybe x would be 7"). But if the analysis is required to be sound, and it cannot determine the value of the condition, it should assume that the first branch could be taken during one execution and the second branch during another execution, so x could be either 7 or 8.
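As a minimal illustration of that last point (the names and the set-of-values domain below are made up for the sketch, not any real tool's API), a sound but path-insensitive treatment of the example keeps the union of the values produced on both branches whenever it cannot decide the condition:

# Minimal sketch of a sound analysis over a set-of-values domain.
UNKNOWN = None  # the analysis cannot decide the branch condition

def possible_values_of_x(condition_value):
    """Return the set of values x may hold after the if/else."""
    if condition_value is True:
        return {7}      # only the then-branch is realizable
    if condition_value is False:
        return {8}      # only the else-branch is realizable
    # Sound choice: the condition is undecided, so join both branches.
    return {7} | {8}

print(possible_values_of_x(UNKNOWN))  # x could be either 7 or 8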
In other words, there is a trade-off between soundness and precision. Or, actually, between soundness, precision, and decidability. The latter property tells whether the analysis always terminates. Now, you have to pick what is needed:
Decidability — this is a common choice for compilers and code analyzers, because you would like to get an answer about your program in finite time. However, proof assistants could start some processes that could run up to the specified time limit, and if the limit is not set, forever: it's up to the user to stop it and to try something else.
Soundness — this is a common choice for compilers, because you would like to get the answer that matches the language specification. Code analyzers are more flexible. Many of them are unsound, but because of that they can find more potential issues in finite time, leaving the interpretation to the developer. I believe the example you mention talks about sound analysis.
Precision — this is a rare property. Compilers and code analyzers should be pessimistic, because otherwise some incorrect code could sneak in. But this might be parameterizable. E.g., if the compiler/analyzer supports constant propagation and folding, and all of the variables in the example are set to known constants before the condition, it can figure out the exact value of x after it and be completely precise (a toy sketch of this follows).
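For instance, a constant-folding pass could handle a variant of the example where the inputs are known constants (the values below are arbitrary) by simply evaluating the condition at analysis time:

# Toy constant folding: all inputs are known constants, so the branch
# condition can be evaluated during analysis rather than at run time.
a, b, c, n = 3, 4, 5, 3
condition = (a**n + b**n == c**n) and n > 2 and a > 0 and b > 0 and c > 0
x = 7 if condition else 8
print(condition, x)  # False 8 -> only the else-branch is realizable here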

Prolog predicate arguments: readability vs. efficiency

I want to ask about the pros and cons of different Prolog representations of predicate arguments.
For example in Exercise 4.3: Write a predicate second(X,List) which checks whether X is the second element of List. The solution can be:
second(X,List):- [_,X|_]=List.
Or,
second(X,[_,X|_]).
Both predicates would behave similarly. The first one would be more readable than the second, at least to me. But the second one uses more stack during execution (I checked this with trace).
A more complicated example is Exercise 3.5: Binary trees are trees where all internal nodes have exactly two children. The smallest binary trees consist of only one leaf node. We will represent leaf nodes as leaf(Label). For instance, leaf(3) and leaf(7) are leaf nodes, and therefore small binary trees. Given two binary trees B1 and B2 we can combine them into one binary tree using the functor tree/2 as follows: tree(B1,B2). So, from the leaves leaf(1) and leaf(2) we can build the binary tree tree(leaf(1),leaf(2)). And from the binary trees tree(leaf(1),leaf(2)) and leaf(4) we can build the binary tree tree(tree(leaf(1),leaf(2)),leaf(4)). Now, define a predicate swap/2, which produces the mirror image of the binary tree that is its first argument. The solution would be:
A2.1:
swap(T1,T2):- T1=tree(leaf(L1),leaf(L2)), T2=tree(leaf(L2),leaf(L1)).
swap(T1,T2):- T1=tree(tree(B1,B2),leaf(L3)), T2=tree(leaf(L3),T3), swap(tree(B1,B2),T3).
swap(T1,T2):- T1=tree(leaf(L1),tree(B2,B3)), T2=tree(T3,leaf(L1)), swap(tree(B2,B3),T3).
swap(T1,T2):- T1=tree(tree(B1,B2),tree(B3,B4)), T2=tree(T4,T3), swap(tree(B1,B2),T3),swap(tree(B3,B4),T4).
Alternatively,
A2.2:
swap(tree(leaf(L1),leaf(L2)), tree(leaf(L2),leaf(L1))).
swap(tree(tree(B1,B2),leaf(L3)), tree(leaf(L3),T3)):- swap(tree(B1,B2),T3).
swap(tree(leaf(L1),tree(B2,B3)), tree(T3,leaf(L1))):- swap(tree(B2,B3),T3).
swap(tree(tree(B1,B2),tree(B3,B4)), tree(T4,T3)):- swap(tree(B1,B2),T3),swap(tree(B3,B4),T4).
The number of steps of the second solution was much smaller than that of the first one (again, I checked with trace). But regarding readability, the first one would be easier to understand, I think.
Probably the readability depends on the level of one's Prolog skill. I am at a beginner level in Prolog, and am used to programming in C++, Python, etc. So I wonder whether skillful Prolog programmers agree with the above assessment of readability.
Also, I wonder if the number of steps can be a good measurement of the computational efficiency.
Could you give me your opinions or guidelines to design predicate arguments?
EDITED.
According to the advice from @coder, I made a third version that consists of a single rule:
A2.3:
swap(T1,T2):-
    ( T1=tree(leaf(L1),leaf(L2)), T2=tree(leaf(L2),leaf(L1)) );
    ( T1=tree(tree(B1,B2),leaf(L3)), T2=tree(leaf(L3),T3), swap(tree(B1,B2),T3) );
    ( T1=tree(leaf(L1),tree(B2,B3)), T2=tree(T3,leaf(L1)), swap(tree(B2,B3),T3) );
    ( T1=tree(tree(B1,B2),tree(B3,B4)), T2=tree(T4,T3), swap(tree(B1,B2),T3), swap(tree(B3,B4),T4) ).
I compared the number of steps in trace of each solution:
A2.1: 36 steps
A2.2: 8 steps
A2.3: 32 steps
A2.3 (readable single-rule version) seems to be better than A2.1 (readable four-rule version), but A2.2 (non-readable four-rule version) still outperforms.
I'm not sure whether the number of steps in the trace reflects the actual computational efficiency.
There are fewer steps in A2.2, but it spends more computation on pattern matching of the arguments.
So, I compared the execution time for 40000 queries (each query is a complicated one: swap(tree(tree(tree(tree(leaf(3),leaf(4)),leaf(5)),tree(tree(tree(tree(leaf(3),leaf(4)),leaf(5)),leaf(4)),leaf(5))),tree(tree(leaf(3),tree(tree(leaf(3),leaf(4)),leaf(5))),tree(tree(tree(tree(leaf(3),leaf(4)),leaf(5)),leaf(4)),leaf(5)))), _). ). The results were almost the same (0.954 sec, 0.944 sec, 0.960 sec respectively). This shows that the three representations A2.1, A2.2, and A2.3 have similar computational efficiency.
Do you agree with this result? (Probably this is case-specific; I need to vary the experimental setup.)
This question is a very good example of a bad question for a forum like Stack Overflow. I am writing an answer because I feel you might use some advice, which, again, is very subjective. I wouldn't be surprised if the question gets closed as "opinion based". But first, an opinion on the exercises and the solutions:
Second element of list
Definitely, second(X, [_,X|_]). is to be preferred. It just looks more familiar. But you should be using the standard library anyway: nth1(2, List, Element).
Mirroring a binary tree
The tree representation that the textbook suggests is a bit... unorthodox? A binary tree is almost invariably represented as a nested term, using two functors, for example:
t/3 which is a non-empty tree, with t(Value_at_node, Left_subtree, Right_subtree)
nil/0 which is an empty tree
Here are some binary trees:
The empty tree: nil
A binary search tree holding {1,2,3}: t(2, t(1, nil, nil), t(3, nil, nil))
A degenerate left-leaning binary tree holding the list [1,2,3] (if you traversed it pre-order): t(1, t(2, t(3, nil, nil), nil), nil)
So, to "mirror" a tree, you would write:
mirror(nil, nil).
mirror(t(X, L, R), t(X, MR, ML)) :-
    mirror(L, ML),
    mirror(R, MR).
The empty tree, mirrored, is the empty tree.
A non-empty tree, mirrored, has its left and right sub-trees swapped, and mirrored.
That's all. No need for swapping, really, or anything else. It is also efficient: for any argument, only one of the two clauses will be evaluated, because the first arguments are different functors, nil/0 and t/3 (look up "first argument indexing" for more information on this). If you had instead written:
mirror_x(T, MT) :-
    (   T = nil
    ->  MT = nil
    ;   T = t(X, L, R),
        MT = t(X, MR, ML),
        mirror_x(L, ML),
        mirror_x(R, MR)
    ).
Then not only is this less readable (well...), but it is probably less efficient, too.
On readability and efficiency
Code is read by people and evaluated by machines. If you want to write readable code, you still might want to address it to other programmers and not to the machines that are going to evaluate it. Prolog implementations have gotten better and better at being efficient at evaluating code that is also more readable to people who have read and written a lot of Prolog code (do you recognize the feedback loop?). You might want to take a look at Coding Guidelines for Prolog if you are really interested in readability.
A first step towards getting used to Prolog is trying to solve the 99 Prolog Problems (there are other sites with the same content). Follow the suggestion to avoid using built-ins. Then, look at the solutions and study them. Then, study the documentation of a Prolog implementation to see how many of these problems have been solved with built-in predicates or standard libraries. Then, study the implementations. You might find some real gems there: one of my favorite examples is the library definition of nth0/3. Just look at this beauty ;-).
There is also a whole book written on the subject of good Prolog code: "The Craft of Prolog" by Richard O'Keefe. The efficiency measurements are quite outdated though. Basically, if you want to know how efficient your code is, you end up with a matrix with at least three dimensions:
Prolog implementation (SWI-Prolog, SICStus, YAP, GNU Prolog...)
Data structure and algorithm used
Facilities provided by the implementation
You will end up having some holes in the matrix. Example: what is the best way to read line-based input, do something with each line, and output it? Read line by line, do the thing, output? Read all at once, do everything in memory, output at once? Use a DCG? In SWI-Prolog, since version 7, you can do:
read_string(In_stream, _, Input),
split_string(Input, "\n", "", Lines),
maplist(do_x, Lines, Xs),
atomics_to_string(Xs, "\n", Output),
format(Out_stream, "~s\n", Output)
This is concise and very efficient. Caveats:
The available memory might be a bottleneck
Strings are not standard Prolog, so you are stuck with implementations that have them
This is a very basic example, but it demonstrates at least the following difficulties in answering your question:
Differences between implementations
Opinions on what is readable or idiomatic Prolog
Opinions on the importance of standards
The example above doesn't even go into details about your problem, as for example what you do with each line. Is it just text? Do you need to parse the lines? Why are you not using a stream of Prolog terms instead? and so on.
On efficiency measurements
Don't use the number of steps in the tracer, or even the reported number of inferences. You really need to measure time, with a realistic input. Sorting with sort/2, for example, always counts as exactly one inference, no matter what is the length of the list being sorted. On the other hand, sort/2 in any Prolog is about as efficient as a sort on your machine would ever get, so is that an issue? You can't know until you have measured the performance.
And of course, as long as you make an informed choice of an algorithm and a data structure, you can at the very least know the complexity of your solution. Doing an efficiency measurement is interesting only if you notice a discrepancy between what you expect and what you measure: obviously, there is a mistake. Either your complexity analysis is wrong, or your implementation is wrong, or even the Prolog implementation you are using is doing something unexpected.
On top of this, there is the inherent problem of high-level libraries. With some of the more complex approaches, you might not be able to easily judge what the complexity of a given solution might be (constraint logic programming, as in CHR and CLPFD, is a prime example). Most real problems that fit nicely to the approach will be much easier to write, and more efficient than you could ever do without considerable effort and very specific code. But get fancy enough, and your CHR program might not even want to compile any more.
Unification in the head of the predicate
This is not opinion-based any more. Just do the unifications in the head if you can. It is more readable to a Prolog programmer, and it is more efficient.
PS
"Learn Prolog Now!" is a good starting point, but nothing more. Just work your way through it and move on.
In the first approach, for example in Exercise 3.5, you write the rule swap(T1,T2) four times, which means that Prolog will examine all four rules and return true or fail for each of these four calls. Because these rules can't all be true together (each time only one of them succeeds), for every input you waste three calls that will not succeed (that's why it demands more steps and more time). The only advantage of the first approach is that it is more readable. In general, when you have such cases of pattern matching, it's better to write the rules in a way that is well defined, so that no two (or more) rules match the same input, if of course you require only one answer, as in the second way of writing the above example.
Finally, one example where it is required that more than one rule matches an input is the predicate member/2, which is written:
member(H,[H|_]).
member(H,[_|T]):- member(H,T).
In this case you require more than one answer.
In the third approach you just write the first approach without pattern matching in the heads. It has the form (condition1);...;(condition4), and if condition1 does not succeed it examines the next condition. Most of the time the fourth condition succeeds, but only after conditions 1-3 have been called and failed. So it is almost the same as the first way of writing the solution, except that in the third solution, if condition1 succeeds, the other conditions will not be tested, so you save some wasted calls (compared to solution 1).
As for the running time, it was expected to be almost the same, because in the worst case solutions 1 and 3 do four times the tests/calls that solution 2 does. So if solution 2 has complexity O(g) (for some function g), then solutions 1 and 3 are O(4g), which is the same O(g) complexity, so the running times will be very close.

Left-recursive Grammar Identification

Often we would like to refactor a context-free grammar to remove left-recursion. There are numerous algorithms to implement such a transformation; for example here or here.
Such algorithms will restructure a grammar regardless of the presence of left-recursion. This has negative side-effects, such as producing different parse trees from the original grammar, possibly with different associativity. Ideally a grammar would only be transformed if it was absolutely necessary.
Is there an algorithm or tool to identify the presence of left recursion within a grammar? Ideally this might also classify subsets of production rules which contain left recursion.
There is a standard algorithm for identifying nullable non-terminals, which runs in time linear in the size of the grammar (see below). Once you've done that, you can construct the relation A potentially-starts-with B over all non-terminals A, B. (In fact, it's more normal to construct that relationship over all grammatical symbols, since it is also used to construct FIRST sets, but in this case we only need the projection onto non-terminals.)
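For illustration, here is a minimal sketch of building that relation. The representation is an assumption made for the sketch: productions as (lhs, rhs) pairs, a predicate telling terminals from non-terminals, and the nullable set computed by the algorithm given below.

def potentially_starts_with(productions, nullable, is_terminal):
    # A potentially-starts-with B iff some production A -> alpha B beta exists
    # where alpha consists only of nullable non-terminals.
    rel = {}
    for lhs, rhs in productions:
        for sym in rhs:
            if is_terminal(sym):
                break                      # a terminal ends the possible left edge
            rel.setdefault(lhs, set()).add(sym)
            if sym not in nullable:
                break                      # a non-nullable non-terminal ends it too
    return rel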
Having done that, left-recursive non-terminals are all A such that A potentially-starts-with+ A, where potentially-starts-with+ is:
potentially-starts-with ∘ potentially-starts-with*
You can use any transitive closure algorithm to compute that relation.
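A straightforward (not particularly optimized) way to do that, continuing the sketch above:

def left_recursive(starts_with):
    # starts_with: dict mapping each non-terminal to the set of non-terminals
    # it potentially starts with. Returns the set of left-recursive ones,
    # i.e. all A with A potentially-starts-with+ A.
    closure = {a: set(bs) for a, bs in starts_with.items()}
    changed = True
    while changed:                          # naive fixed-point transitive closure
        changed = False
        for a in closure:
            reachable = set()
            for b in closure[a]:
                reachable |= closure.get(b, set())
            if not reachable <= closure[a]:
                closure[a] |= reachable
                changed = True
    return {a for a in closure if a in closure[a]}

# Example: A -> B ...  and  B -> A ...  (indirect left recursion)
print(left_recursive({'A': {'B'}, 'B': {'A'}}))   # {'A', 'B'}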
For reference, here is how to detect nullable non-terminals.
1. Remove all useless symbols.
2. Attach a pointer to every production, initially at the first position.
3. Put all the productions into a workqueue.
4. While possible, find a production to which one of the following applies:
   - If the left-hand-side of the production has been marked as an ε-non-terminal, discard the production.
   - If the token immediately to the right of the pointer is a terminal, discard the production.
   - If there is no token immediately to the right of the pointer (i.e., the pointer is at the end), mark the left-hand-side of the production as an ε-non-terminal and discard the production.
   - If the token immediately to the right of the pointer is a non-terminal which has been marked as an ε-non-terminal, advance the pointer one token to the right and return the production to the workqueue.
Once it is no longer possible to select a production from the work queue, all ε-non-terminals have been identified.
Just for fun, a trivial modification of the above algorithm can be used to do step 1. I'll leave it as an exercise (it's also an exercise in the dragon book). Also left as an exercise is the way to make sure the above algorithm executes in linear time.
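As a rough sketch, here is the workqueue procedure in the straightforward fixed-point form (not the linear-time version alluded to above), using the same assumed grammar representation as in the earlier snippets:

from collections import deque

def nullable_nonterminals(productions, is_terminal):
    # productions: list of (lhs, rhs) pairs, rhs being a sequence of symbols.
    # Returns the set of non-terminals that can derive the empty string.
    nullable = set()
    queue = deque((lhs, rhs, 0) for lhs, rhs in productions)  # production plus pointer
    progress = True
    while progress:
        progress = False
        for _ in range(len(queue)):
            lhs, rhs, pos = queue.popleft()
            if lhs in nullable:
                progress = True                    # lhs already marked: discard
            elif pos == len(rhs):
                nullable.add(lhs)                  # pointer at the end: mark lhs
                progress = True
            elif is_terminal(rhs[pos]):
                progress = True                    # a terminal can never be erased: discard
            elif rhs[pos] in nullable:
                queue.append((lhs, rhs, pos + 1))  # step over a nullable non-terminal
                progress = True
            else:
                queue.append((lhs, rhs, pos))      # cannot decide yet; try again later
    return nullable

# Example: S -> A B | 'a' ;  A -> ε ;  B -> A A
prods = [('S', ('A', 'B')), ('S', ('a',)), ('A', ()), ('B', ('A', 'A'))]
print(nullable_nonterminals(prods, lambda s: s == 'a'))   # {'A', 'B', 'S'} (order may vary)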

Validity and satisfiability

I am having problems understanding the difference between validity and satisfiability.
Given the following:
(i) For all F, F is satisfiable or ~F is satisfiable.
(ii) For all F, F is valid or ~F is valid.
How do I prove which is true and which is false?
Statement (i) is true, as for all F, F will either be satisfiable, or ~F will be satisfiable (truth table). However, how do I go about solving for statement (ii)?
Any help is highly appreciated!
Aprilrocks92,
I don't blame you for being confused, because actually logicians, mathematicians, heck even those philosopher types sometimes use the word "validity" differently.
Trying not to overcomplicate, I'll give you a thin definition: a conclusion is valid when it is true whenever the premises are true. We also say, given a suitably defined logic, that the conclusion follows as a "logical consequence" of the premises.
On the other hand, satisfiability means that there exists a valuation of the non-logical symbols in the formula F that makes the formula true in the logic.
So I should probably mention the difference between semantics and syntax to explain. The syntax of your logic is all those logical and non-logical symbols, and the deductive rules that enable you to make "steps" towards a proof in the logic. My definition of satisfiability above mentioned the word "valuation"; now how does that fit? Well, the answer is that you need to supply a semantics: in short, this is the structure that the formulas of the logic are expressions about, usually given in set theory, and a valuation of a given F is a function that maps all the non-logical symbols in F to sets and members of sets, which a given semantics for the logic composes into a truth value.
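To summarize the standard picture in the notation of the question:
F is valid iff F comes out true under every valuation.
F is satisfiable iff F comes out true under at least one valuation.
Consequently, F is valid exactly when ~F is not satisfiable, and F is satisfiable exactly when ~F is not valid.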
Hmm. I'm not sure that's the best explanation, but I don't have much time.
Either way, that should help you understand the difference. To answer your question about the difference between (i) and (ii) without giving too much away, think: what's the relationship between the two? Recall from above that a formula is satisfiable when some valuation sends it to true. So you could "rewrite" my definition of validity as: a conclusion is valid iff every valuation that satisfies the premises also satisfies the conclusion.
Now, with regard to your requirement to prove these things, I strongly suspect you have a lot more context about your logic that you're not telling us, and your teacher or textbook has intimated a context in which to answer them, because taken in the general sense your question doesn't quite make complete sense.

Mathematica's pattern matching poorly optimized?

I recently inquired about why PatternTest was causing a multitude of needless evaluations: PatternTest not optimized? Leonid replied that it is necessary for what seems to me to be a rather questionable method. I can accept that, though I would prefer a more efficient alternative.
I now realize, which I believe Leonid has been saying for some time, that this problem runs much deeper in Mathematica, and I am troubled. I cannot understand why this is not or cannot be better optimized.
Consider this example:
list = RandomReal[9, 20000];
Head /@ list; // Timing
MatchQ[list, {x__Integer, y__}] // Timing
{0., Null}
{1.014, False}
Checking the heads of the list is essentially instantaneous, yet checking the pattern takes over a second. Surely Mathematica could recognize that since the first element of the list is not an Integer, the pattern cannot match, and unlike the case with PatternTest I cannot see how there is any mutability in the pattern. What is the explanation for this?
There appears to be some confusion regarding packed arrays, which as far as I can tell have no bearing on this question. Rather, I am concerned with the O(n^2) time complexity on all lists, packed or unpacked.
MatchQ unpacks for these kinds of tests. The reason is that no special case for this has been implemented. In principle it could contain anything.
On["Packing"]
MatchQ[list, {x_Integer, y__}] // Timing
MatchQ[list, {x__Integer, y__}] // Timing
Improving this is very tricky - if you break the pattern matcher you have a serious problem.
Edit 1:
It is true that the unpacking is not the cause of the O(n^2) complexity. It does, however, show that for the MatchQ[list, {x__Integer, y__}] part the code goes to another part of the algorithm (which needs the lists to be unpacked). Some other things to note: this complexity arises only if both patterns are __; if either one of them is _, the algorithm has a better complexity.
The algorithm then goes through all n*n potential matches, and there seems to be no early bailout, presumably because other patterns could be constructed that would need this complexity. The issue is that the above pattern forces the matcher into a very general algorithm.
I then was hoping for MatchQ[list, {Shortest[x__Integer], __}] and friends but to no avail.
So, my two cents: either use a different pattern (and use On["Packing"] to see if it goes to the general matcher) or do a pre-check Developer`PackedArrayQ[expr] && Head[expr[[1]]]===Integer or some such.
@the author of the first answer: As far as I know from reverse-engineering and reading of available information, it may be due to the different ways the patterns are checked. In fact, as they say, a special hash code is used for pattern matching. This hash (basically an FNV-1 round) makes it very easy to check for particular patterns related to the type of expression involved (a matter of a few xor operations). The hashing algorithm cycles inside the expression, and each subpart is xorred with the output of the previous one. Special xor values are used for each atomic expression: machineInts, machineReals, bigNums, Rationals and so on. Hence, for example, _Integer is easy to check because the hash of any integer is formed with the integer's xor value, so all we need to do is the inverse op and see if it matches, i.e. whether we get some particular value or something like that (sorry if I'm vague on actual implementation details; it's WIP). For general or uncommon patterns the check may not take advantage of this hash stuff and may require something different.
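To make the general idea concrete, here is a generic illustration of FNV-1-style structural hashing with per-type xor constants. It is only a sketch of the scheme described above, not Mathematica's actual implementation; the per-atom-type constants are made up.

FNV_OFFSET = 0xcbf29ce484222325   # standard 64-bit FNV-1 offset basis
FNV_PRIME = 0x100000001b3         # standard 64-bit FNV-1 prime
TYPE_SALT = {int: 0x01, float: 0x02, str: 0x03}   # hypothetical per-atom-type constants

def fnv1_round(h, value):
    return ((h * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF) ^ value

def expr_hash(expr, h=FNV_OFFSET):
    # Fold an FNV-1 round over an expression tree; each atom contributes
    # a constant determined by its type, so the resulting hash carries
    # cheap type information about the expression's parts.
    if isinstance(expr, (list, tuple)):              # compound expression
        for sub in expr:
            h = expr_hash(sub, h)
        return h
    return fnv1_round(h, TYPE_SALT[type(expr)])      # atom: xor in its type constant

print(hex(expr_hash([1, 2, 3])))     # an all-Integer list
print(hex(expr_hash([1, 2.0, 3])))   # differs, because one atom is a Real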
@the OP: Head[] simply acts on the internal expression, taking the value of the first pointer of the expression (expressions are implemented as arrays of pointers). So doing it is as easy as copying and printing a string: very, very fast. The pattern matching engine is not even called in this case.

Resources