What is a unification algorithm?

Well, I know it might sound a bit strange, but yes, my question is: "What is a unification algorithm?"
Well, I am trying to develop an application in F# to act like Prolog. It should take a series of facts and process them when making queries.
It was suggested that I get started by implementing a good unification algorithm, but I did not have a clue about this.
Please refer to this question if you want to dig a bit deeper into what I want to do.
Thank you very much and Merry Christmas.

If you have two expressions with variables, then a unification algorithm tries to match the two expressions and gives you an assignment for the variables that makes the two expressions the same.
For example, if you represented expressions in F#:
type Expr =
    | Var of string               // Represents a variable
    | Call of string * Expr list  // Call a named function with arguments
And had two expressions like this:
Call("foo", [ Var("x"), Call("bar", []) ])
Call("foo", [ Call("woo", [ Var("z") ], Call("bar", []) ])
Then the unification algorithm should give you an assignment:
"x" -> Call("woo", [ Var("z") ]
This means that if you replace all occurrences of the "x" variable in the two expressions, the results of the two replacements will be the same expression. If you had expressions calling different functions (e.g. Call("foo", ...) and Call("bar", ...)) then the algorithm will tell you that they are not unifiable.
There is also an explanation on Wikipedia, and if you search the internet you'll surely find useful descriptions (and perhaps even an implementation in some functional language similar to F#).
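Incidentally, Prolog itself exposes syntactic unification as the built-in =/2, so you can experiment with the expected behaviour at any Prolog toplevel before implementing it in F#. A minimal sketch mirroring the foo/woo/bar example above (variable naming in the output differs between systems):
?- foo(X, bar) = foo(woo(Z), bar).
X = woo(Z).
?- foo(a) = bar(a).
false.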

I found Baader and Snyder's work to be most informative. In particular, they describe several unification algorithms (including Martelli and Montanari's near-linear algorithm using union-find), and describe both syntactic unification and various kinds of semantic unification.
Once you have unification, you'll also need backtracking. Kiselyov/Shan/Friedman's LogicT framework will help here.

Obviously, destructive unification would be much more efficient than a purely functional implementation, but much less F-sharpish as well. If it's performance you're after, you will probably end up implementing a subset of the WAM anyway:
https://en.wikipedia.org/wiki/Warren_Abstract_Machine
And probably this could help: Andre Marien, Bart Demoen: A new Scheme for Unification in WAM.

Related

Reporting *why* a query failed in Prolog in a systematic way

I'm looking for an approach, pattern, or built-in feature in Prolog that I can use to return why a set of predicates failed, at least as far as the predicates in the database are concerned. I'm trying to be able to say more than "That is false" when a user poses a query in a system.
For example, let's say I have two predicates. blue/1 is true if something is blue, and dog/1 is true if something is a dog:
blue(X) :- ...
dog(X) :- ...
If I pose the following query to Prolog and foo is a dog, but not blue, Prolog would normally just return "false":
?- blue(foo), dog(foo).
false.
What I want is to find out why the conjunction of predicates was not true, even if it is an out of band call such as:
?- getReasonForFailure(X).
X = not(blue(foo))
I'm OK if the predicates have to be written in a certain way, I'm just looking for any approaches people have used.
The way I've done this to date, with some success, is by writing the predicates in a stylized way and using some helper predicates to find out the reason after the fact. For example:
blue(X) :-
    recordFailureReason(not(blue(X))),
    isBlue(X).
And then implementing recordFailureReason/1 such that it always remembers the "reason" that happened deepest in the stack. If a query fails, whatever failure happened the deepest is recorded as the "best" reason for failure. That heuristic works surprisingly well for many cases, but does require careful building of the predicates to work well.
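A rough sketch of what such a helper could look like in SWI-Prolog (the global-variable names and the use of prolog_current_frame/1 as a depth proxy are illustrative assumptions, not the actual code):
:- initialization(nb_setval(deepest_reason, none-0)).

% recordFailureReason(+Reason): keep Reason if it occurs deeper in the
% call stack than anything recorded so far; always succeeds.
recordFailureReason(Reason) :-
    prolog_current_frame(Depth),    % frame references grow with depth in SWI
    nb_getval(deepest_reason, _-Best),
    (   Depth > Best
    ->  nb_setval(deepest_reason, Reason-Depth)
    ;   true
    ).

getReasonForFailure(Reason) :-
    nb_getval(deepest_reason, Reason-_).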
Any ideas? I'm willing to look outside of Prolog if there are predicate logic systems designed for this kind of analysis.
As long as you remain within the pure monotonic subset of Prolog, you may consider generalizations as explanations. To take your example, the following generalizations might be worth considering, depending on your precise definition of blue/1 and dog/1.
?- blue(foo), * dog(foo).
false.
In this generalization, the entire goal dog(foo) was removed. The prefix * is actually a predicate, defined together with an operator declaration: :- op(950, fy, *). *(_).
Informally, the above can be read as: not only does this query fail, even this generalized query fails. There is no blue foo at all (provided there is none). But maybe there is a blue foo, just no blue dog at all...
?- blue(_X/*foo*/), dog(_X/*foo*/).
false.
Now we have generalized the program by replacing foo with the new variable _X. In this manner the sharing between the two goals is retained.
There are more such generalizations possible, like introducing dif/2.
This technique can be applied both manually and automatically. For more, there is a collection of example sessions. Also see Declarative program development in Prolog with GUPU.
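Here is the generalization machinery as a self-contained sketch (the blue/1 and dog/1 facts are hypothetical, purely for illustration):
:- op(950, fy, *).
*(_).              % * G always succeeds, effectively removing the goal G

blue(sky).         % hypothetical facts
dog(foo).

% ?- blue(foo), * dog(foo).
% false.           % even with dog(foo) generalized away, there is no blue foo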
Some thoughts:
Why did the logic program fail: the answer to "why" is of course "because there is no variable assignment that fulfills the constraints given by the Prolog program".
This is evidently rather unhelpful, but it is exactly the case of the "blue dog": there is no such thing (at least in the problem you model).
In fact the only acceptable answer to the blue dog problem is obtained when the system goes into full theorem-proving mode and outputs:
blue(X) <=> ~dog(X)
or maybe just
dog(X) => ~blue(X)
or maybe just
blue(X) => ~dog(X)
depending on assumptions. "There is no evidence of blue dogs". Which is true, as that's what the program states. So a "why" in this question is a demand to rewrite the program...
There may not be a good answer: "Why is there no x such that x² < 0" is ill-posed and may have as answer "just because" or "because you are restricting yourself to the reals" or "because that 0 in the equation is just wrong" ... so it depends very much.
To make a "why" more helpful, you will have to qualify this "why" somehow. which may be done by structuring the program and extending the query so that additional information collecting during proof tree construction is bubbling up, but you will have to decide beforehand what information that is:
query(Sought, [Info1, Info2, Info3])
And this query will always succeed (for query/2, "success" no longer means "success in finding a solution to the modeled problem" but "success in finishing the computation").
Variable Sought will be the reified answer of the actual query you want answered, i.e. one of the atoms true or false (and maybe unknown if you have had enough with two-valued logic) and Info1, Info2, Info3 will be additional details to help you answer a why something something in case Sought is false.
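A minimal sketch of such a reified wrapper for the blue-dog example (the facts and the Info terms are hypothetical):
dog(foo).          % hypothetical database: foo is a dog, nothing foo-like is blue
blue(sky).

% query(-Sought, -Info): always succeeds; Sought reifies the answer.
query(true,  [])               :- blue(foo), dog(foo), !.
query(false, [not(blue(foo))]) :- \+ blue(foo), !.
query(false, [not(dog(foo))]).

% ?- query(Sought, Info).
% Sought = false, Info = [not(blue(foo))].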
Note that much of the time, the desire to ask "why" is down to the mix-up between the two distinct failures: "failure in finding a solution to the modeled problem" and "failure in finishing the computation". For example, you want to apply maplist/3 to two lists and expect this to work but erroneously the two lists are of different length: You will get false - but it will be a false from computation (in this case, due to a bug), not a false from modeling. Being heavy-handed with assertion/1 may help here, but this is ugly in its own way.
In fact, compare with imperative or functional languages w/o logic programming parts: In the event of failure (maybe an exception?), what would be a corresponding "why"? It is unclear.
Addendum
This is a great question, but the more I reflect on it, the more I think it can only be answered in a task-specific way: you must structure your logic program to be why-able, and you must decide what kind of information "why" should actually return. It will be something task-specific: something about missing information, "if only this or that were true" indications, where "this or that" is chosen from a dedicated set of predicates. This is of course expected, as there is no general way to make imperative or functional programs explain their results (or lack thereof) either.
I have looked a bit for papers on this (including IEEE Xplore and ACM Library), and have just found:
Reasoning about Explanations for Negative Query Answers in DL-Lite which is actually for Description Logics and uses abductive reasoning.
WhyNot: Debugging Failed Queries in Large Knowledge Bases which discusses a tool for Cyc.
I also took a random look at the documentation for Flora-2 but they basically seem to say "use the debugger". But debugging is just debugging, not explaining.
There must be more.

How is predicate logic represented in Prolog?

This may be a strange and broad question, and not 100% a programming question, but I hope that's OK. I recently had a discussion about the fact that a lot of programs in Prolog don't follow strict predicate logic (of Frege) but often are "object oriented", which I am trying to grasp.
I know that Prolog is based on first order predicate logic especially Horn Clauses and that they are a special form of modus ponens. A fact and a rule if they occur solo are simply clauses, but as soon as I add more than one occurrence they become a predicate.
How are the quantifiers of first-order predicate logic represented and related to fact, rule, predicate, or the Prolog concept in general? What does the functor express, and what do the arguments express, in relation to predicate logic? How is (first-order) predicate logic reflected in Prolog, and where does Prolog depart from its concepts? E.g., how would I define a point, a line, and a vertical line in predicate logic and first-order predicate logic?
How do I formulate this in predicate logic and first-order predicate logic, and what is the semantic and logical difference between
vertical(line).
line(vertical).
Or a line and point in this example. Are point and line not predicate logic?
For me it is "point(X), the set of all points", and when I pick a concrete point, "there exists one point(110, 12)".
point(X,Y).
line(point(W,X), point(Y,Z)).
vertical(line(point(X,Y), point(X,Z))).
horizontal(line(point(X,Y), point(Z,Y))).
Any info helps! Many thanks, H
A chapter of Programming in Prolog by W. Clocksin and C. Mellish is devoted to explaining the relation of Prolog to logic. Citing from there:
If we wish to discuss how Prolog is related to logic, we must first establish what we
mean by logic. Logic was originally devised as a way of representing the form of
arguments, so that it would be possible to check in a formal way whether or not they
are valid. Thus we can use logic to express propositions, the relations between propositions and how one can validly infer some propositions from others. The particular
form of logic that we will be talking about here is called the Predicate Calculus. We
will only be able to say a few words about it here. There are scores of good basic
introductions to logic you can turn to for background reading.
If we wish to express propositions about the world, we must be able to describe
the objects that are involved in them. In Predicate Calculus, we represent objects by
terms. A term is of one of the following forms:
A constant symbol. This is a symbol that stands for a single individual or concept.
We can think of this as a Prolog atom, and we will use the Prolog syntax. So
greek, agatha, and peace are constant symbols.
A variable symbol. This is a symbol that we may want to stand for different
individuals at different times. Variables are really only introduced in conjunction
with quantifiers, which are discussed below. We can think of them as Prolog
variables and will use the Prolog syntax. Thus X, Man, and Greek are variable
symbols.
A compound term. A compound term consists of a function symbol, together
with an ordered set of terms as its arguments. The idea is that the compound
term represents some individual that depends on the individuals represented by
the arguments. The function symbol represents how the first depends on the second. For instance, we could have a function symbol standing for the notion of
"distance" and two arguments. In this case, the compound term stands for the
distance between the objects represented by the arguments. We can think of a
compound term as a Prolog structure with the function symbol as the functor.
We will write Predicate Calculus compound terms using the Prolog syntax, so
that, for instance, wife(henry) might mean Henry's wife, distance(point1, X)
might mean the distance between some particular point and some other place to
be specified, and classes(mary, dayafter(W)) might mean the classes that Mary
teaches on the day after some day W to be specified.
Thus in Predicate Calculus the ways of representing objects are just like the ways available in Prolog.
It seems inappropriate to put the entire chapter here... there is also a very explanatory program in appendix B that performs an automatic translation of WFFs into clauses.
The book is very readable; it's just a pity it's not among the titles in the Free Prolog Programming Books section.
I know that Prolog is based on first order predicate logic especially Horn Clauses and that they are a special form of modus ponens.
In a sense, an inverse "modus ponens":
a :- b.
You want to show "a true", and to do so you have to show "b true".
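Concretely, with a made-up pair of clauses:
a :- b.    % to show a, show b
b.         % b holds

% ?- a.
% true.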
A fact and a rule if they occur solo are simply clauses, but as soon as I add more than one occurrence they become a predicate.
No, they are all predicates. The "predicate" is an object/agent/program/platonic-phenomenon which expresses that there (objectively) is some "relationship" between "things", and you can ask the Prolog Processor about that relationship. There is no direct meaning associated with all of that though, it's "strings related to strings via strings". We are working with syntactic machines after all (i.e. computers).
Enter this logic program:
p(x,y). % Predicate p/2 states that there is a relationship p between x and y
And now, you can query the database about what the program is saying:
?- p(x,y).
true. % a p relationship exists (fact, but could also be rule)
?- p(x,A).
A = y. % the thing related to x via p is y
?- p(A,y).
A = x. % the thing related to y via p is x
?- p(A,B).
A = x, % things related via p are x and y
B = y.
?- p(c,d).
false. % not REALLY "false" but "as far as I can tell, there
% is no relationship p between c and d"
Note the interpretation of "false", which is not the "strong false" of classical logic. Even though it is traditionally stated that Prolog works in classical logic, this is not really the case:
From "Logic Programming with Strong Negation" (David Pearce, Gerd Wagner, FU Berlin, 1991), appears in Springer LNAI 475: Extensions of Logic Programming, International Workshop Tübingen, FRG, December 8–10, 1989 Proceedings):
According to the standard view, a logic program is a set of definite Horn clauses. Thus, logic programs are regarded as syntactically restricted first-order theories within the framework of classical logic. Correspondingly, the proof theory of logic programs is considered as the specialized version of classical resolution, known as SLD-resolution. This view, however, neglects the fact that a program clause, a_0 <- a_1, a_2, ..., a_n, is an expression of a fragment of positive logic (a subsystem of intuitionistic logic) rather than an implicational formula of classical logic. The classical interpretation of logic programs, therefore, seems to be a semantical overkill.
It should be clear that in order to explain the deduction mechanism of Prolog one does not have to refer to the indirect method of SLD-resolution which checks for the refutability of the contrary. It is certainly more natural to view Prolog's proof procedure as a kind of natural deduction, as, for example, in [Hallnäs & Schroeder-Heister 1987] and [Miller 1989]. This also is more in line with the intuitions of a Prolog programmer. Since Prolog is the paradigm, logic programming semantics should take it as a point of departure.
Now:
How are the quantifiers of first-order predicate logic represented and related
to fact, rule, predicate or the Prolog concept in general?
That is a long story. Note that Prolog is primarily about "programming using logic", and also about "modeling using logic". The two aspects certainly overlap well for problems that can be solved using explicit enumeration, but Prolog is not made for specifying general FOL constraints describing a sought-for solution. In fact, certain FOL constraints cannot be represented, and others have to be transformed into nominally equivalent expressions that are agreeable to the machine. Look up "skolemization". For example: https://www.cs.toronto.edu/~sheila/384/w11/Lectures/csc384w11-KR-tutorial.pdf
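For the simple cases the correspondence is standard, though: variables in a clause are read as universally quantified, and variables in a query as existentially quantified. A small illustration with invented predicates:
% FOL: forall X (human(X) -> mortal(X))
mortal(X) :- human(X).

human(socrates).

% The query asks: does there exist an X such that mortal(X)?
% ?- mortal(X).
% X = socrates.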
On the flip side, Prolog provides "meta-predicates" which generate solutions by calling other predicates, so it's making forays into second-order logic. As it must - nobody can survive in the FOL desert for long.
What does the functor express
Nothing. It just stands for itself. Pure syntax. Look up "Herbrand Universe".
How do I formulate this in predicate logic and first order predicate logic
what is the semantic and logic difference between
vertical(line).
line(vertical).
It's you who imbues vertical and line with meaning. So, feelings. You want a "vertical line", so you would say the "thing" is the "line" and "vertical" is an attribute of the "line". So vertical(line) sounds appropriate. Or maybe attribute(line, vertical). It depends.
Here:
point(X,Y).
line(point(W,X), point(Y,Z)).
You have two aspects:
Predicates express "relationships". "Function symbols" are used to construct "things with structure": you can form trees of stuff, with function symbols on the nodes and integers/strings/variables on the leaves. These are called "terms". But terms can appear as predicates or as things, depending on the context; it's quite fluid. So you can, for example, construct a Prolog program with another Prolog program.
point(X,Y)
line(point(W,X), point(Y,Z))
These are terms!
If you type this into a file program.pl:
point_on_line(point(X,Y),line(point(W,X), point(Y,Z))).
The terms appear as "things" related by predicate point_on_line/2. The whole line is itself a term.
If you type this into a file program.pl:
point(X,Y).
line(point(W,X), point(Y,Z)).
The terms appear as "predicates", and point appears both as predicate point/2 and as "thing" about which predicate line/2 is talking.
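A quick way to watch this fluidity is the standard built-in =.. ("univ"), which decomposes any compound term into a list of its functor and arguments:
?- line(point(W,X), point(Y,Z)) =.. L.
L = [line, point(W, X), point(Y, Z)].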
This is actually a vast subject, and getting used to it takes some time, much more than functional programming does. I had some Prolog and logic courses at uni, but 20 years later I found out that I had badly misunderstood a lot of aspects.

How to identify wasteful representations of Prolog terms

What is the Prolog predicate that helps to show wasteful representations of Prolog terms?
Supplement
In an aside of an earlier Prolog SO answer, IIRC by mat, a Prolog predicate was used to analyze a Prolog term and show how it was overly complicated.
Specifically for a term like
[op(add),[[number(0)],[op(add),[[number(1)],[number(1)]]]]]
it revealed that this has too many [].
I have searched my Prolog questions and looked at the answers twice and still can't find it. I also recall that it was not in SWI-Prolog but in another Prolog so instead of installing the other Prolog I was able to use the predicate with an online version of Prolog.
If you read along in the comments you will see that mat identified the post I was seeking.
What I was seeking
I have one final note on the choice of representation. Please try out the following, using for example GNU Prolog or any other conforming Prolog system:
| ?- write_canonical([op(add),[Left,Right]]).
'.'(op(add),'.'('.'(_18,'.'(_19,[])),[]))
This shows that this is a rather wasteful representation, and at the same time prevents uniform treatment of all expressions you generate, combining several disadvantages.
You can make this more compact for example using Left+Right, or make all terms uniformly available using for example op_arguments(add, [Left,Right]), op_arguments(number, [1]) etc.
Evolution of a Prolog data structure
If you don't know it already, the question is related to writing a term rewriting system in Prolog that does symbolic math, and I am mostly concentrating on simplification rewrites at present.
Most people only see math expressions in a natural representation
x + 0 + sin(y)
and computer programmers realize that most programming languages have to parse the math expression and convert it into an AST before using it:
add(add(X,0),sin(Y))
but most programming languages cannot work with the AST as written above and have to create data structures. See: Compiler/lexical analyzer, Compiler/syntax analyzer, Compiler/AST interpreter.
Now, if you have ever done more than dip your toe in the water when learning about Prolog, you will have come across Program 3.30, Derivative rules, which is included in this, but the person did not give attribution.
If you try to roll your own code to do symbolic math with Prolog, you might try using is/2, quickly find that it doesn't work, and then discover that Prolog can read the following as compound terms:
add(add(X,0),sin(Y))
This starts to work, until you need to access the name of the functor and find functor/3; but then we are getting back to having to parse the input. However, as noted by mat and in "The Art of Prolog", if one were to make the name of the structure accessible:
op(add,(op(add,X,0),op(sin,Y)))
now one can access not only the terms of the expression but also the operator in a Prolog friendly way.
If it were not for the aside mat made, the code would still be using the nested list data structure; now it is being converted to use compound terms that expose the name of the structure. I wonder if there is a common phrase to describe that; if not, there should be one.
Anyway, the new, simpler data structure worked on the first set of tests; now to see if it holds up as the project is further developed.
Try it for yourself online
Using GNU Prolog at tutorialspoint.com enter
:- initialization(main).
main :- write_canonical([op(add),[Left,Right]]).
then click Execute and look at the output
sh-4.3$ gprolog --consult file main.pg
GNU Prolog 1.4.4 (64 bits)
Compiled Aug 16 2014, 23:07:54 with gcc
By Daniel Diaz
Copyright (C) 1999-2013 Daniel Diaz
compiling /home/cg/root/main.pg for byte code...
/home/cg/root/main.pg:2: warning: singleton variables [Left,Right] for main/0
/home/cg/root/main.pg compiled, 2 lines read - 524 bytes written, 9 ms
'.'(op(add),'.'('.'(_39,'.'(_41,[])),[]))| ?-  
Clean vs. defaulty representations
From The Power of Prolog by Markus Triska
When representing data with Prolog terms, ask yourself the following question:
Can I distinguish the kind of each component from its outermost functor and arity?
If this holds, your representation is called clean. If you cannot distinguish the elements by their outermost functor and arity, your representation is called defaulty, a wordplay combining "default" and "faulty". This is because reasoning about your data will need a "default case", which is applied if everything else fails. In addition, such a representation prevents argument indexing, and is considered faulty due to this shortcoming. Always aim to avoid defaulty representations! Aim for cleaner representations instead.
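As a hedged illustration (the predicate and functor names here are invented): a defaulty arithmetic representation would mix bare numbers with compound terms such as 1+2, forcing a catch-all clause, while a clean one tags every kind of expression with its own outermost functor:
% clean: each kind of expression is distinguishable by functor/arity
eval(num(N), N).
eval(add(X, Y), V) :-
    eval(X, VX),
    eval(Y, VY),
    V is VX + VY.

% ?- eval(add(num(1), num(2)), V).
% V = 3.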
Please see the last part of:
https://stackoverflow.com/a/42722823/1613573
It uses write_canonical/1 to display the canonical representation of a term.
This predicate is very useful when learning Prolog and helps to clear several misconceptions that are typical for beginners. See for example the recent question about hyphens, where it would have helped too.
Note that in SWI, the output deviates from canonical Prolog syntax in general, so I am not using SWI when explaining Prolog syntax.
You could also programmatically count how many subterms are single-element lists, using something like this (not optimized):
single_element_list_subterms(Term, Index) :-
    Term =.. [Functor|Args],
    (   Args = []
    ->  Index = 0
    ;   maplist(single_element_list_subterms, Args, Indices),
        sum_list(Indices, SubIndex),
        (   Functor = '.', Args = [_, []]
        ->  Index is SubIndex + 1
        ;   Index = SubIndex
        )
    ).
Trying it on the example compound term:
| ?- single_element_list_subterms([op(add),[[number(0)],[op(add),[[number(1)],[number(1)]]]]], Count).
Count = 7
yes
| ?-
Indicating that there are 7 subterms consisting of a single-element list. Here is the result of write_canonical:
| ?- write_canonical([op(add),[[number(0)],[op(add),[[number(1)],[number(1)]]]]]).
'.'(op(add),'.'('.'('.'(number(0),[]),'.'('.'(op(add),'.'('.'('.'(number(1),[]),'.'('.'(number(1),[]),[])),[])),[])),[]))
yes
| ?-

Prolog: How to tell if a predicate is deterministic or not

So from what I understand about deterministic predicates:
Deterministic predicate = 1 solution
Non-deterministic predicate = multiple solutions
Are there any rules for detecting whether a predicate is one or the other? Like looking at the search tree, etc.?
There is no clear, generally accepted consensus about these notions. However, they are usually based on the observed answers rather than on the number of solutions. In certain contexts the notions are very implementation-related. Non-determinate may mean: leaves a choice point open. And sometimes determinate means: never even creates a choice point.
Answers vs. solutions
To see the difference, consider the goal length(L, 1). How many solutions does it have? L = [a] is one, L = [23] another... but all of these solutions are compactly represented with a single answer substitution: L = [_] which thus contains infinitely many solutions.
In any case, in all implementations I know of, length(L, 1) is a determinate goal.
Now consider the goal repeat which has exactly one solution, but infinitely many answers. This goal is considered non-determinate.
In case you are interested in constraints, things become even more involved. In library(clpfd), the goal X #> Y, Y #> X has no solution, but still one answer. Combine this with repeat: infinitely many answers and no solution.
Further, the goal append(Xs, Ys, []) has exactly one solution and also exactly one answer, nevertheless it is considered non-determinate in many implementations, since in those implementations it leaves a choice point open.
In an ideal implementation, there would be no answers without solutions (except false), and there would be non-determinism only when there is more than one answer. But then, all of this is mostly undecidable in the general case.
So, whenever you are using these notions make sure on what level things are meant. Rather explicitly say: multiple answers, multiple solutions, leaves no (unnecessary) choice point open.
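To see the difference at a toplevel (the exact output shape varies between systems):
?- length(L, 1).        % one answer standing for infinitely many solutions
L = [_].

?- append(Xs, Ys, []).  % one answer and one solution, yet many systems
Xs = Ys, Ys = [].       % leave a choice point open here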
You need to understand the difference between det, semidet, and undet; it is more than just the number of solutions.
Because there is no loop control operator in Prolog, looping (not recursion) is constructed as a 'sequence generating' predicate (undet) followed by the loop body. You can also store solutions as a list with one of the findall-family predicates and loop over it later with member/2.
So, any piece of your program is either part of a loop construction or part of the usual flow. The difference between designing det and undet predicates thus lies almost entirely in the intended usage. If you can work with a sequence, you always make the predicate undet and comment it as such. There is a nice unit-test extension in SWI-Prolog which can check whether your predicate behaves as declared in terms of det/semidet/undet (semidet is for usage in the same way as undet, but as the head of an 'if' construction).
So, the difference is a matter of pre-design, and this question should not arise for already existing predicates. It is good practice to always document the intended usage with a comment like:
% member(?El, ?List) is undet.
Deterministic: Always succeeds with a single answer that is always the same for the same input. Think of a static list of three items, where you tell your function to return value one: you will get the same answer every time. Additionally, arithmetic functions: 1 + 1 = 2, X + Y = Z.
Semi-deterministic: Succeeds with a single answer that is always the same for the same input, but it can fail. Think of a function that takes a list of numbers and is asked whether some number exists in the list. It either does or it doesn't, based on the contents of the list given and the number asked about.
Non-deterministic: Succeeds with a single answer, but can exhibit different behaviors on different runs, even for the same input. Think of any kind of math.random(min,max) function, like random/3.
In essence, this is entirely separate from the concept of choice points, as choice points are a feature of Prolog. Where I think the Prolog confusion over these terms comes from is that Prolog can find a single answer, then go back and try for another solution, and you have to use the cut operator ! to tell it explicitly that you want to discard the remaining choice points.
This is very useful to know when working with Prolog Unit Testing
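For instance, SWI-Prolog's plunit lets you state the expected determinism per test and warns when a test unexpectedly leaves a choice point (the test names here are invented):
:- begin_tests(determinism_demo).

test(first_item, true(X == a)) :-   % expected to succeed deterministically
    nth0(0, [a,b,c], X).

test(member_hit, [nondet]) :-       % allowed to leave a choice point
    member(b, [a,b,c]).

:- end_tests(determinism_demo).

% ?- run_tests(determinism_demo).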

Prolog negation and logical negation

Assume we have the following program:
a(tom).
v(pat).
and the query (which returns false):
\+ a(X), v(X).
When tracing, I can see that X becomes instantiated to tom, the predicate a(tom) succeeds, therefore \+ a(tom) fails.
I have read in some tutorials that the not (\+) in Prolog is just a test and does not cause instantiation.
Could someone please clarify the above point for me? As far as I can see, the instantiation does happen.
I understand there are differences between not (negation as failure) and logical negation. Could you recommend a good article that explains in which cases they behave the same and when they behave differently?
Great question.
Short answer: you stumbled upon "floundering".
The problem is that the implementation of the operator \+ only works
when applied to a literal containing no variables, i.e., a ground
literal. It is not able to generate bindings for variables, but only
test whether subgoals succeed or fail. So to guarantee reasonable
answers to queries to programs containing negation, the negation
operator must be allowed to apply only to ground literals. If it is
applied to a nonground literal, the program is said to flounder.
link
If you invert the query
v(X), \+ a(X).
you'll get the right answer. Some implementations or meta-interpreters detect floundering goals and delay them until all the variables are ground.
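With the two facts from the question loaded, the reordered query indeed succeeds:
?- v(X), \+ a(X).
X = pat.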
About your point 1): you see the instantiation inside the NAF tree. What happens there shouldn't affect variables that are outside (in this case, in v(X)). Prolog often acts in this naive way to avoid inefficiencies. In theory it should just return an error instead of instantiating the variable.
2) This is my favourite article on the topic: Nonmonotonic Logic Programming.
WRT point 2, the Wikipedia article seems a good starting point.
You have already experienced that understanding NAF can be difficult. Part of this could be because (logical) negation is inherently difficult to define even in simpler contexts than the predicate calculus (see for instance Russell's paradox), and part because the powerful variables of Prolog are doomed to keep the actual counterexamples of failed negated proofs. See if you can understand the actual library definition of forall/2 (please read the documentation; it's concise and interesting), which is the preferred way to run a failure-driven loop:
%% forall(+Condition, +Action)
%
% True if Action is true for all variable bindings for which Condition
% is true.
forall(Cond, Action) :-
    \+ (Cond, \+ Action).
I remember the first time I saw it, it looked like magic...
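With the facts from the question loaded, a quick check of forall/2 in action:
?- forall(a(X), X == tom).
true.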
Edit: about a tutorial, I found, while 'spelunking' my links collection, a good site by J.R. Fisher. It's full of interesting stuff; it's just a pity it's a bit terse in its explanations, requiring the student to work things out through frequent exercises. See paragraph 2.5, devoted to negation by failure. I think you could also enjoy section 3, How Prolog Works.
