When transforming a context-free grammar into Chomsky Normal Form, we first remove null productions, then unit productions, and then useless productions, in that exact order.
I understand that removing null productions can give rise to unit productions; that's why unit productions are removed after null productions.
What I don't understand is what could go wrong if we removed useless productions first and unit productions afterwards.
If you remove the unit production A → B and that was the only place in the grammar where B was referenced, then B will become unreachable as a result of unit-production elimination, and will need to be removed along with its productions.
That condition requires B to be non-recursive (since a recursive non-terminal refers to itself, and presumably not with a unit production), and any non-terminals referenced in the productions of B will still be referenced, having been absorbed into productions for A.
If the grammar does not have a cycle of unit productions allowing A →* A, then unit productions can be topologically sorted and removed in reverse topological order, which guarantees that the unit production elimination doesn't create a new unit production. That makes it possible to remove newly-unreachable non-terminals as you do the unit-production elimination. But I think that textbook algorithms probably don't do that, which is presumably why your textbook wants you to remove useless productions after the grammar has been converted to CNF. (And, of course, there's nothing stopping a grammar from having a cycle of unit productions. Such a grammar would be ambiguous, making it difficult to use in a parser, but this exercise doesn't require that the grammar be useful in a parser.)
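As an illustration of that non-textbook variant, here is a minimal sketch. The names (`Alt`, `Grammar`, `eliminate_units`, `drop_unreachable`) are my own for this example, and instead of an explicit topological sort it simply substitutes until no unit productions remain, which terminates under the same assumption that the unit-production graph is acyclic:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Alt = std::vector<std::string>;                  // one alternative (rhs)
using Grammar = std::map<std::string, std::vector<Alt>>;

// Replace each unit production A -> B with B's alternatives, repeating until
// none remain. Terminates when the unit-production graph is acyclic; any
// duplicate alternatives this introduces are not removed here.
Grammar eliminate_units(Grammar g) {
    for (bool changed = true; changed;) {
        changed = false;
        for (auto& [lhs, alts] : g) {
            std::vector<Alt> out;
            for (const Alt& alt : alts) {
                if (alt.size() == 1 && g.count(alt[0])) {
                    if (alt[0] != lhs) {               // drop useless A -> A
                        for (const Alt& sub : g.at(alt[0])) out.push_back(sub);
                        changed = true;
                    }
                } else {
                    out.push_back(alt);
                }
            }
            alts = std::move(out);
        }
    }
    return g;
}

// Drop non-terminals no longer reachable from the start symbol, e.g. a B
// that was only referenced by the unit production A -> B.
Grammar drop_unreachable(const Grammar& g, const std::string& start) {
    std::set<std::string> seen{start};
    std::vector<std::string> todo{start};
    while (!todo.empty()) {
        std::string nt = todo.back();
        todo.pop_back();
        auto it = g.find(nt);
        if (it == g.end()) continue;
        for (const Alt& alt : it->second)
            for (const std::string& sym : alt)
                if (g.count(sym) && seen.insert(sym).second)
                    todo.push_back(sym);
    }
    Grammar out;
    for (const auto& [lhs, alts] : g)
        if (seen.count(lhs)) out[lhs] = alts;
    return out;
}
```

For a grammar with S -> A b | B, A -> a, B -> b, eliminating the unit production S -> B and then dropping unreachable non-terminals removes B entirely, since the unit production was the only place B was referenced.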
Similarly, if the only production for a non-terminal is an ε-production, then that non-terminal will end up with no productions after null-productions are removed (and it will also be unreachable). Again, that could be handled in a way which doesn't require deferring reachability analysis, but the textbook algorithm probably doesn't do that.
I have done a process of elimination with the resolution rule and ended up with the set
{pq, ~p, ~q}.
According to my textbook: "Lemma: if two clauses clash on more than one literal, their resolvent is a trivial clause." It then goes on to say that it is not strictly incorrect to perform resolution on such clauses, but since trivial clauses contribute nothing to the satisfiability or unsatisfiability of a set of clauses, we agree to delete them.
But elsewhere I have read not to remove them, since there is no reason both of those clauses couldn't be true.
So would the above clauses leave me with the empty set {}, making my final answer that the set is unsatisfiable? Or do I leave the set as-is as my final answer? The problem said to prove that it IS satisfiable, so I'm guessing I should leave the clauses in the set so that it is, but the textbook says to remove them.
In your example, there aren't any two clauses that clash on more than one literal. Two such clauses could be {p, r, q} and {~p, ~r, s}; resolving them would give {r, ~r, q, s}, which is always true (a tautology). So you would remove it as useless.
In your example, you will end up with the empty clause after applying two resolution steps to the three clauses: resolving {p, q} with {~p} yields {q}, and resolving {q} with {~q} yields the empty clause. So the set of clauses is not satisfiable.
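To make those two steps concrete, here is a minimal sketch of the resolution operation in C++ (the `Clause` encoding and the `resolve` name are my own for this illustration): a literal is a signed integer, so p = 1, q = 2, and negation is the sign.

```cpp
#include <cassert>
#include <set>

// A clause is a set of literals; a literal is a signed int (p = 1, ~p = -1).
using Clause = std::set<int>;

// Resolvent of c1 and c2 on literal lit, where lit is in c1 and -lit in c2:
// keep everything except the clashing pair.
Clause resolve(const Clause& c1, const Clause& c2, int lit) {
    Clause r;
    for (int l : c1) if (l != lit) r.insert(l);
    for (int l : c2) if (l != -lit) r.insert(l);
    return r;
}
```

Resolving {p, q} with {~p} on p leaves {q}; resolving that with {~q} on q leaves the empty clause, which witnesses unsatisfiability.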
If the task was to prove something is satisfiable, there must be something wrong earlier in the derivation.
If I have a given boolean expression (with AND and OR operations) over many boolean variables, and I want this expression to evaluate to true, how can I find the set of all possible assignments of boolean values that achieve a true expression?
For example, I have 4 boolean variables a, b, c, d and an expression:
(a ^ b) v (c ^ d)
The (slow) way I've tried is this:
I build an expression tree to get all the variables in the expression, giving me the set {a,b,c,d}.
I find all subsets of the set: {a}, {b}, {c}, {d}, {a,b}, {a,c}, {a,d}, {b,c}, {b,d}, {c,d}, {a,b,c}, {a,b,d}, {a,c,d}, {b,c,d}, {a,b,c,d}
For each subset, I set each of its variables to true (and the others to false), then evaluate the expression. If the expression returns true, I save the subset with the values.
EDIT: I eliminate the NOT operator to make the problem simpler.
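For reference, the brute force described above can be sketched with a bitmask over all 2^4 assignments instead of explicit subsets. The function name and the hard-coded example expression (a AND b) OR (c AND d) are just for this illustration:

```cpp
#include <cassert>
#include <vector>

// Try all 2^4 assignments of (a, b, c, d); bit i of the mask gives the value
// of the i-th variable. Collect the masks for which the example expression
// (a && b) || (c && d) evaluates to true.
std::vector<unsigned> satisfying_masks() {
    std::vector<unsigned> out;
    for (unsigned m = 0; m < 16; ++m) {
        bool a = m & 1, b = m & 2, c = m & 4, d = m & 8;
        if ((a && b) || (c && d)) out.push_back(m);
    }
    return out;
}
```

This tries all assignments, not just "subset set to true", so it generalizes to expressions with NOT as well.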
I think I see a way to compute this without having to try all the permutations, and my high-level outline, described below, is not really very complicated. I'll outline the basic approach, and you will have two follow-up tasks to do on your own:
Parsing a logical expression like "A && (B || C)" into a classical parse tree that represents the expression, with each node in the tree being either a variable or a boolean operation ("&&", "||", or "!" for NOT) whose children are its operands. This is a classical parse tree, and plenty of examples of how to do this can be found with Google.
Translating my outline into actual C++ code. That's going to be up to you also, but I think that the implementation should be rather obvious, once you wrap your brain around the overall approach.
To solve this problem, I'll use a two-phase approach.
In the first phase, I will use the general approach of proof by induction to come up with a tentative list of all potential sets of values of the variables for which the boolean expression will evaluate to true.
In the second phase I'll eliminate from the list of all potential sets those sets that are logically impossible. This might sound confusing, so I'll explain this second phase first.
Let's use the following datatypes. First, I'll use this datatype of possible values for which the boolean expression will evaluate to either true or false:
typedef std::set<std::pair<std::string, bool>> values_t;
Here, std::pair&lt;std::string, bool&gt; represents a variable: the std::string is its name and the bool is its value. For example:
{"a", true}
means that the variable "a" has the value true. It follows that this std::set represents a set of variables together with their corresponding values.
The list of all of these potential solutions is going to be an:
typedef std::list<values_t> all_values_t;
So this is how we will represent a list of all combinations of values of all variables, that produce the result of either true or false. You can use a std::vector instead of a std::list, it doesn't really matter.
Now notice that it is possible for a values_t to have both:
{"a", true}
and
{"a", false}
in the set. This means that in order for the expression to evaluate to true or false, "a" must be simultaneously true and false.
But this is, obviously, logically impossible. So, in phase 2 of this solution you will simply need to go through all the individual values_t in all_values_t and remove the "impossible" ones that contain both true and false for the same variable. The way to do this should seem rather obvious, and I won't waste time describing it; phase 2 should be straightforward once phase 1 is complete.
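A sketch of that phase 2, using the typedefs from above (`remove_impossible` is a name I'm inventing for this illustration; the typedefs are repeated to keep the fragment self-contained):

```cpp
#include <cassert>
#include <list>
#include <set>
#include <string>
#include <utility>

typedef std::set<std::pair<std::string, bool>> values_t;
typedef std::list<values_t> all_values_t;

// Phase 2: drop every candidate that assigns both true and false to the
// same variable, since such a candidate is logically impossible.
void remove_impossible(all_values_t& all) {
    all.remove_if([](const values_t& v) {
        for (const auto& [name, value] : v)
            if (v.count({name, !value}))   // contradictory pair present
                return true;
        return false;
    });
}
```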
For phase 1 our goal is to come up with a function that's roughly declared like this:
all_values_t phase1(expression_t expr, bool goal);
expr is a parsed representation of your boolean expression, as a parse tree (as I mentioned in the beginning, building this will be up to you). goal is what you want the parsed expression to evaluate to: phase1() returns all possible all_values_t for which expr evaluates to either true or false, as indicated by goal. You will, obviously, call phase1() passing true for goal, because that's what you want to figure out. But phase1() will call itself recursively, with either a true or a false goal, to do its magic.
Before proceeding, it is important now to read and understand the various resources that describe how a proof by induction works. Don't proceed any further until you understand this general concept fully.
Ok, now you understand the concept. If you do, then you must now agree with me that phase1() is already done. It works! Proof by induction starts by assuming that phase1() does what it is supposed to do. phase1() will make recursive calls to itself, and since phase1() returns the right result, phase1() can simply rely on itself to figure everything out. See how easy this is?
phase1() really has one "simple" task at hand:
Check what the top level node of the parse tree is. It will be either a variable node or an expression node (see above).
Return the appropriate all_values_t, based on that.
That's it. We'll take both possibilities, one at a time.
The top level node is a variable.
So, if your expression is just a variable, and you want the expression to return goal, then:
values_t v{ {name, goal} };
There's only one possible way for the expression to evaluate to goal: an obvious no-brainer: the variable, and goal for its value.
And there's only one possible solution. No other alternatives:
all_values_t res;
res.push_back(v);
return res;
Now, the other possibility is that the top-level node in the expression is one of the boolean operations: and, or, or not.
Again, we'll divide and conquer this, and tackle each one, one at a time.
Let's say that it's "not". What should we do then? That should be easy:
return phase1(child1, !goal);
Just call phase1() recursively, passing the "not" expression's child node, with goal logically inverted. So, if your goal was true, use phase1() to come back with what the values for "not" sub-expression being false, and vice-versa. Remember, proof by induction assumes that phase1() works as advertised, so you can rely on it to get the correct answer for the sub-expression.
It should now start becoming obvious how the rest of phase1() works. There are only two possibilities left: the "and" and the "or" logical operation.
For the "and" operation, we'll consider, separately, whether the "goal" of the "and" operation should be true or false.
If goal is true, you must use phase1() to come up with all_values_t for both subexpressions being true:
all_values_t left_result=phase1(child1, true);
all_values_t right_result=phase1(child2, true);
Then just combine the two results together. Now, recall that all_values_t is a list of all possible values. Each value in all_values_t (which can be an empty list/vector) represents one possible solution. Both the left and the right sub-expressions must be logically combined, but any possible solution from the left_result can go together with any right_result. Any potential solution with the left subexpression being true can (and must) go with any potential solution with the right subexpression being true.
So the all_values_t that needs to be returned, in this case, is obtained by doing a cartesian product between the left_result and the right_result. That is: taking the first value, the first values_t std::set in the left_result, then adding to this set the first right_result std::set, then the first left_result with the second right_result, and so on; then the second left_result with the first right_result, then the second right_result and so on. Each one of these combinations gets push_back()ed into the all_values_t that gets returned from this call to phase1().
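That cartesian-product step might look roughly like this (`combine` is my name for it; the typedefs are repeated so the sketch is self-contained):

```cpp
#include <cassert>
#include <list>
#include <set>
#include <string>
#include <utility>

typedef std::set<std::pair<std::string, bool>> values_t;
typedef std::list<values_t> all_values_t;

// Cartesian product: every candidate from the left sub-expression merged
// with every candidate from the right one.
all_values_t combine(const all_values_t& left, const all_values_t& right) {
    all_values_t out;
    for (const values_t& l : left)
        for (const values_t& r : right) {
            values_t merged = l;                  // start from the left set
            merged.insert(r.begin(), r.end());    // absorb the right set
            out.push_back(merged);
        }
    return out;
}
```

Note that merging can produce a contradictory set containing both {"a", true} and {"a", false}; that is exactly what phase 2 filters out later.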
But if your goal is to have the "and" expression return false instead, you simply have to do a variation of this three times: first combining phase1(child1, false) with phase1(child2, false); then phase1(child1, true) with phase1(child2, false); and finally phase1(child1, false) with phase1(child2, true). Either child1 or child2, or both, must evaluate to false.
So that takes care of the "and" operation.
The last possibility for phase1() to deal with is the logical "or" operation. You should be able to figure out how to do it by yourself now, but I'll briefly summarize it:
If goal is false, you must call phase1(child1, false) with phase1(child2, false), then combine both results together, as a cartesian product. If goal is true, you will make three sets of recursive calls, for the three other possibilities, and combine everything together.
You're done. There's nothing else for phase1() to do, and we completed our proof by induction.
Well, I lied a little bit. You'll also need to do a small "Phase 3". Recall that in phase 2 we eliminated all impossible solutions. It is possible that, as a result of all this, the same values_t will occur more than once in the final all_values_t, so you'll just have to dedupe it.
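A sketch of that phase 3 (`dedupe` is an illustrative name; std::list can sort and then drop adjacent duplicates in place):

```cpp
#include <cassert>
#include <list>
#include <set>
#include <string>
#include <utility>

typedef std::set<std::pair<std::string, bool>> values_t;
typedef std::list<values_t> all_values_t;

// Phase 3: remove repeated values_t. std::list::unique only removes
// adjacent duplicates, so sort first.
void dedupe(all_values_t& all) {
    all.sort();
    all.unique();
}
```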
P.S. It's also possible to avoid a discrete phase 2 by doing it on the fly, as part of phase 1. This variation is going to be your homework assignment, too.
I am currently studying logic programming, and am learning Prolog for that purpose.
We can have a knowledge base from which some results follow, yet Prolog will get into an infinite loop due to the way it expands the predicates.
Let's assume we have the following logic program:
p(X):- p(X).
p(X):- q(X).
q(X).
The query p(john) will get into an infinite loop, because by default Prolog expands the first clause that unifies. However, we could conclude that p(john) is true if we started by expanding the second clause.
So why doesn't Prolog expand all the matching clauses (implemented, say, like a threads/processes model with time slices), in order to conclude something whenever the KB can conclude it?
In our case, for example, two processes could be created: one expanding via p(X) and the other via q(X). So when the second one later expands q(X), our program can conclude that p(john) is true.
Because Prolog's search algorithm for matching clauses is depth-first. So, in your example, once it matches the first rule, it will match the first rule again, and will never explore the others.
This would not happen if the algorithm were breadth-first or iterative deepening.
Usually it is up to you to reorder the KB so that these situations never happen.
However, it is possible to encode breadth-first/iterative-deepening search in Prolog using a meta-interpreter that changes the search order. This is an extremely powerful technique that is not well known outside of the Prolog world. 'The Art of Prolog' describes this technique in detail.
You can find some examples of meta-interpreters here, here and here.
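For illustration, a minimal depth-limited meta-interpreter of the kind those references describe might look like this. This is only a sketch: prove/2 and prove_id/1 are names I'm choosing, and it assumes the program's predicates are visible to clause/2 (e.g. declared dynamic in SWI-Prolog):

```prolog
% prove(Goal, D): Goal is derivable using at most D clause expansions.
prove(true, _) :- !.
prove((A, B), D) :- !, prove(A, D), prove(B, D).
prove(Goal, D) :-
    D > 0,
    D1 is D - 1,
    clause(Goal, Body),
    prove(Body, D1).

% Iterative deepening: retry with ever larger depth limits (length(_, D)
% generates D = 0, 1, 2, ...). The left-recursive clause p(X) :- p(X) can
% then no longer loop forever before the solution via q(X) is found.
prove_id(Goal) :- length(_, D), prove(Goal, D).
```

With the example program above, the query `?- prove_id(p(john)).` should succeed once the depth limit reaches 2, instead of diverging the way the plain query does.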
If I have a subset of logic programming which contains only one function symbol, am I able to do everything?
I think that I cannot but I am not sure at all.
A programming language can do anything a user wants if it is a Turing-complete language. I was taught that this means it has to be able to execute if..then..else commands and recursion, and that the natural numbers should be definable.
Any help and opinions would be appreciated!
In classical predicate logic, there is a distinction between the formula level and the term level. Since an n-ary function can be represented as an (n+1)-ary predicate, restricting only the number of function symbols does not lessen the expressivity.
In Prolog, there is no difference between the formula level and the term level. You might pick an n-ary symbol p and try to encode Turing machines or an equivalent notion (e.g. recursive functions) via nestings of p.
From my intuition, I would assume this is not possible: you can basically describe n-ary trees with variables as leaves, but then you can always unify these trees. This means that every rule head will match during recursive derivations, and therefore you are unable to express any case distinction. Still, this is just an informal argument, not a proof.
P.S. you might also be interested in monadic logic, where only unary predicates are allowed. This fragment of first-order logic is decidable.