I remember once reading that there were at least two other alternatives invented roughly at the same time as the WAM. Any pointers?
Prior to the WAM, there was the ZIP by Clocksin. Its design is still very interesting: SWI-Prolog uses it, and B-Prolog has slowly migrated from a WAM design towards the ZIP. Of course, along the way many new innovations were developed. Another alternative is the VAM.
A comparison as of 1993 is:
http://www.complang.tuwien.ac.at/ulrich/papers/PDF/binwam-nov93.pdf
In the meantime, the most interesting architectural developments are related to B-Prolog.
WAM vs. ZIP
The key difference between the WAM and the ZIP is the precise interface for a predicate's arguments. In the WAM, the arguments are all passed via registers, that is, either real registers or at least fixed locations in memory. The ZIP passes all arguments via the stack.
Let's consider a minimal example:
p(R1,R2,R3,L1,L2,L3) :-      % WAM                 % ZIP
                             % store L1..L3        % nothing
                             % nothing             % push R1..R3
                             % init X1..X3         % push X1..X3
   q(R1,R2,R3,X1,X2,X3),
                             % put unsafe X1..X3   % push X1..X3
                             % load L1..L3         % push L1..L3
   r(X1,X2,X3,L1,L2,L3).
Prior to calling q:
The WAM does not need to do any action for arguments that are passed on to the first goal at the very same positions (R1..R3). This is particularly interesting for binary clauses - that is, clauses with exactly one regular goal at the end. Here the WAM excels.
The other arguments L1..L3 need to be stored locally, so for these arguments the register interface gains nothing.
The ZIP on the other hand does not need to save arguments - they are already saved on the stack. This is not only good for clauses with more than one goal, but also for other interrupting goals like constraints or interrupts.
As a downside, the ZIP must push R1..R3 again.
Both have to initialize X1..X3 and store them on the stack.
Calling q:
When calling q, the WAM has to allocate stack space for X1..X3 and L1..L3, thus 6 cells, whereas the ZIP needs R1..R3, L1..L3, and X1..X3, thus 9 cells. So here, the WAM is more space efficient. Also, the WAM permits environment trimming (for more complex situations), which is next to impossible for the ZIP.
Prior to calling r:
This r is the last call, and systems try to free the space for this clause, provided no choice point is present.
For the WAM, the existential variables X1..X3 have to be checked for being still-uninstantiated local variables (put_unsafe), and if so, they are moved onto the heap - that's expensive, but occurs rarely. L1..L3 are just loaded. That's all; the WAM can now safely deallocate the local frame. So last call optimization is dirt cheap.
For the ZIP, everything has to be pushed as usual. Only then does an extra scan examine all the values on the stack and move them accordingly. That's rather expensive. Some optimizations are possible, but it is still much more than what the WAM does. (A possible improvement would be to push arguments in reverse order; then the variables L1..L3 might be left in their location, so these variables would not need any handling. I have not seen such an implementation yet.)
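To make the WAM column concrete, here is a plausible compilation of the clause above, written in the pl2wam notation that appears further down in this thread. The register allocation is illustrative only; a real compiler (in particular one doing environment trimming) may number and order the permanent variables differently:

predicate(p/6,1,static,private,monofile,global,[
    allocate(6),
    get_variable(y(0),3),       % store L1
    get_variable(y(1),4),       % store L2
    get_variable(y(2),5),       % store L3
                                % R1..R3 stay in argument registers 0..2
    put_variable(y(3),3),       % init X1
    put_variable(y(4),4),       % init X2
    put_variable(y(5),5),       % init X3
    call(q/6),
    put_unsafe_value(y(3),0),   % put unsafe X1
    put_unsafe_value(y(4),1),   % put unsafe X2
    put_unsafe_value(y(5),2),   % put unsafe X3
    put_value(y(0),3),          % load L1
    put_value(y(1),4),          % load L2
    put_value(y(2),5),          % load L3
    deallocate,
    execute(r/6)]).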
In the technical note entitled An abstract Prolog instruction set, Warren also references another compiler by Bowen, Byrd, and Clocksin. However, he says that the two architectures have much in common, so I don't know whether that compiler can really be considered an alternative.
Not sure if this is what you mean, but the first two Prolog implementations were an interpreter written in Fortran by Colmerauer et al. and a DEC PDP-10 native compiler by Warren et al.
Warren mentions these in his foreword to Aït-Kaci's Tutorial Reconstruction of the WAM. If this is not what you mean, you may find it in that document or its references.
Related
I'm reading Hassan Aït-Kaci's "Warren's Abstract Machine: A Tutorial Reconstruction".
In Chapter 2, the compilation of L0 programs is presented after the compilation of L0 queries. The program compilation section (2.3) starts with:
Compiling a program term p is just a bit trickier, although not by much. Observe that it assumes that a query ?- q will have built a term on the heap and set register X1 to contain its address. Thus, unifying q to p can proceed by following the term structure already present in X1 as long as it matches functor for functor the structure of p.
So the compilation of a program is made after instructions obtained from query compilation are executed? Does that even make sense? I'm confused...
What makes sense to me: WAM code generated from a program's annotated syntax tree is stored by the interpreter. For each procedure (defined in the program) a block of WAM code is stored. When a query is made, its instructions are generated and executed. If the query is calling a defined procedure, execute its block of code. Is it something like that?
Please note that what you quote is from the very beginning of a series of increasingly complex virtual machines that are introduced in this text:
We consider here ℒ0, a very simple language indeed. In this language, one can specify only two sorts of entities: a program term and a query term. Both program and query are first-order terms but not variables. The semantics of ℒ0 is simply tantamount to computing the most general unifier of the program and the query.
This simple language is interpreted as you describe.
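For instance, the book's running example unifies the query term p(Z, h(Z,W), f(W)) with the program term p(f(X), h(Y, f(a)), Y). At this stage the machine does nothing more than ordinary first-order unification, which we can mimic at any Prolog toplevel with =/2, giving something like:

?- p(Z, h(Z, W), f(W)) = p(f(X), h(Y, f(a)), Y).
Z = f(f(a)),
W = f(a),
X = f(a),
Y = f(f(a)).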
In later sections of the book, the design and execution of more complex machines becomes proportionally more sophisticated, and already a few pages later we find for example:
In ℳ1, compiled code is stored in a code area (CODE), an addressable array of data words, each containing a possibly labeled instruction over one or more memory words consisting of an opcode followed by operands.
This is already the design you describe at the end of your post, which is of course how actual Prolog code is compiled in practice.
So the compilation of a program is made after instructions obtained from query compilation are executed? Does that even make sense? I'm confused...
At the beginning, this is clarified (Section 2, last paragraph):
The idea is quite simple: having defined a program term p, one can submit any query ?-q and execution either fails if p and q do not unify, or succeeds with a binding of the variables in q obtained by unifying it with p.
As #mat already states, this is a step-by-step approach, starting from very simple programs: just one ground fact and a query.
There is a very detailed Draft proposal for setup_call_cleanup/3.
Let me quote the relevant part for my question:
c) The cleanup handler is called exactly once; no later than upon failure of G. Earlier moments are:
If G is true or false, C is called at an implementation dependent moment after the last solution and after the last observable effect of G.
And this example:
setup_call_cleanup(S=1,G=2,write(S+G)).
Succeeds, unifying S = 1, G = 2.
Either: outputs '1+2'
Or: outputs on backtracking '1+_' prior to failure.
Or (?): outputs on backtracking '1+2' prior to failure.
In my understanding, this is basically because a unification is a backtrackable goal; it simply fails on reexecution. Therefore, it is up to the implementation to decide whether to call the cleanup just after the first execution (because there will be no more observable effects of the goal), or to postpone it until the second execution of the goal, which now fails.
So it seems to me that this cannot be used to detect determinism portably. Only a few built-in constructs like true, fail, ! etc. are genuinely non-backtrackable.
Is there any other way to check determinism without executing a goal twice?
I'm currently using SWI-Prolog's deterministic/1, but I'd certainly appreciate portability.
No. setup_call_cleanup/3 cannot detect determinism in a portable manner, for this would restrict an implementation in its freedom. Systems have different ways of implementing indexing, with different trade-offs. Some have only first-argument indexing, others have more than that. But systems which offer "better" indexing often behave quite unpredictably. Some systems do indexing only for nonvar terms; others also permit clauses that simply have a variable in the head, provided it is the last clause. Some may do "manual" choice point avoidance with safe tests prior to cuts, and others just omit this. In brief, this is really a non-functional issue, and insisting on portability in this area is tantamount to slowing systems down.
However, what still holds is this: If setup_call_cleanup/3 detects determinism, then there is no further need to use a second goal to determine determinism! So it can be used to implement determinism detection more efficiently. In the general case, however, you will have to execute a goal twice.
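For illustration, here is a sketch of that general case (goal_is_det/1 is my name, not a standard predicate): it counts solutions in a dynamic fact and stops backtracking after the second one, so the goal is executed at most twice. The first solution's bindings are lost, so a caller wanting them must run the goal once more:

:- dynamic(solution_count/1).

goal_is_det(Goal) :-
    retractall(solution_count(_)),
    assertz(solution_count(0)),
    \+ (  call(Goal),
          retract(solution_count(N0)),
          N is N0 + 1,
          assertz(solution_count(N)),
          N >= 2                    % a second solution exists: stop
       ),
    solution_count(1).              % succeed iff exactly one solution

Being based on assert/retract, this is neither reentrant nor thread-safe; it is only meant to show why the double execution is hard to avoid portably.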
The current definition of setup_call_cleanup/3 was also crafted such as to permit an implementation to also remove unnecessary choicepoints dynamically.
It is thinkable (not that I have seen such an implementation) that upon success of Call and the internal presence of a choicepoint, an implementation may examine the current choicepoints and remove them if determinism can be detected. Another possibility might be to perform some asynchronous garbage collection in between. All these options are not excluded by the current specification. It is unclear if they will ever be implemented, but it might happen once some applications depend on such a feature. This has already happened in Prolog a couple of times, so a repetition is not complete fantasy. In fact, I am thinking of a particular case to help DCGs to become more determinate. Who knows, maybe you will go that path!
Here is an example of how indexing in SWI depends on the history of previous queries:
?- [user].
p(_,a). p(_,b). end_of_file.
true.
?- p(1,a).
true ;
false.
?- p(_,a).
true.
?- p(1,a).
true. % now, it's determinate!
Here is an example of how indexing on the second argument is strictly weaker than first-argument indexing:
?- [user].
q(_,_). q(0,0). end_of_file.
true.
?- q(X,1).
true ; % weak
false.
?- q(1,X).
true.
As you're interested in portability, Logtalk's lgtunit tool defines a portable deterministic/1 predicate for 10 of its supported backend Prolog compilers:
http://logtalk.org/library/lgtunit_0.html
https://github.com/LogtalkDotOrg/logtalk3/blob/master/tools/lgtunit/lgtunit.lgt (starting around line 1051)
Note that different systems use different built-in predicates that approximate the intended functionality.
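For example, the SWI-Prolog case can be built on the deterministic/1 built-in mentioned in the question. The following is only a sketch of that idea (goal_deterministic/1 is my name; the actual lgtunit code is wrapped in an object and may differ in detail):

% SWI-Prolog only: call Goal once and check whether the call
% left a choice point behind.
goal_deterministic(Goal) :-
    call(Goal),
    deterministic(Det),   % SWI built-in: true iff no choice point
    !,
    Det == true.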
I am trying to see the index value of a for loop in the DDC-I debugger, and it always shows me ERROR.
In the disassembly of the same code, I see the following instruction:
cmp cr7,0,r20,r23
So it's comparing r20 and r23, but neither of these registers holds the index value. Also, I am not sure what cr7 is.
In short, most embedded tool chains (including the ones you pay for) are horrible about reconstructing local/automatic variables in even lightly optimized code. A lot of them simply can't reconstruct variables that never have storage because they live in registers the whole time (loop index variables like the one you can't see are typical cases). Some even have issues with intermediate values and with arguments (since they're almost always passed in registers).
Typical strategies might be:
Temporarily turning off optimizations around the code in question
Temporarily moving the variable in question to the global scope
Becoming proficient at reading disassembly.
This isn't a terribly practical answer, but it is surprising for a lot of people that are new to the embedded world or never had the luxury of a source level debugger on their embedded platform.
On PowerPC there are eight CR fields, cr0 to cr7. If you don't specify a CR field for a compare result, the default is cr0, but in this case cr7 is specified, and so the flags in field cr7 will indicate the result of the compare operation. There are 4 condition code bits in each CR field: lt, gt, eq, and so (summary overflow). Typically the compare will be followed by a conditional branch, bc.
There is some useful info in this IBM developerWorks article: Assembly language for Power Architecture, Part 3: Programming with the PowerPC branch processor.
I'm trying to create my own WAM implementation and I'm stuck at exercise 2.4.
I can't understand how to execute instruction unify_value X4 in figure 2.4.
As far as I understand, this instruction should unify Y from the program with f(W) from the query.
unify_value X4 calls unify(X4,S) where S=2 (see Figure 2.1); the corresponding heap cell is "REF 2", and X4 is "STR 5".
Unify (Figure 2.7) should bind those values, but I do not understand how to deref a register.
"REF 2" is in the heap, "STR 5" is in a register. How do you bind something to a register?
We are talking about Warren's "New" Engine, WAM and not the Old Engine, known as PLM.
In the WAM variables are allocated in two places.
the local stack (environment stack)
the heap
Registers cannot hold variables. However, they may hold references to variables. Note that references from the heap only point into the heap.
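To see why this answers the question, consider a toy model (my own sketch, not the book's pseudo-code) where the heap is a list of cells and a register holds a copy of a cell, i.e. a reference into the heap. Dereferencing a register means chasing the reference chain it holds through the heap; binding always writes at a heap (or stack) address, never "into" the register:

% Cells are ref(Addr) or str(Addr); nth0/3 is from library(lists).
deref(Heap, ref(A), Cell) :-
    nth0(A, Heap, C),
    (   C == ref(A)                % self-reference: unbound variable
    ->  Cell = ref(A)
    ;   deref(Heap, C, Cell)
    ).
deref(_Heap, str(A), str(A)).

% bind(+Heap0, +Addr, +Cell, -Heap): overwrite the cell at Addr.
bind(Heap0, Addr, Cell, Heap) :-
    length(Prefix, Addr),
    append(Prefix, [_Old|Suffix], Heap0),
    append(Prefix, [Cell|Suffix], Heap).

In your exercise, dereferencing yields the self-referential REF 2 in the heap and the STR 5 held in X4; since the unbound side lives on the heap, unification overwrites heap cell 2 with the dereferenced value (or a reference to it), and the register itself is never written to.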
Much related to your question is the pretty ingenious way in which the WAM maintains this order and at the same time has very cheap last-call optimization. At the time of a (determinate) last call, the local variables that are arguments of the last call must be moved somehow. In more traditional Prolog machines like the ZIP, this is an extremely laborious undertaking which essentially requires scanning the environment frame for variables still sitting in it.
The WAM, however, has a much better calling convention: most variables are already in a safe place, which can be trivially analyzed during compilation. The very few remaining ones need an explicit PUT_UNSAFE instruction where the value is checked, and should it still be a local variable, that variable is transferred onto the heap.
Consider what is a safe variable in the WAM:
All variables occurring in the head
All variables that appear as an argument of a structure
Thus only variables that occur first in a body goal, occur again in the last goal, and do not appear inside some structure need a PUT_UNSAFE. That is not much. Further, the dynamic check may reduce the actual copying onto the heap to a minimum.
At first this PUT_UNSAFE looks like a lot of work, but never forget that the WAM permits removing many PUT instructions, while the ZIP has to execute at least one instruction for each argument.
Here is a tiny typical example using GNU Prolog:
a --> b, c.
expanded to:
a(S0,S) :- b(S0,S1), c(S1,S).
and compiled using the command pl2wam to:
predicate(a/2,1,static,private,monofile,global,[
    allocate(2),
    get_variable(y(0),1),       % S
    put_variable(y(1),1),       % S1
    call(b/2),
    put_unsafe_value(y(1),0),   % S1
    put_value(y(0),1),          % S
    deallocate,
    execute(c/2)]).
I need to write something like a circular buffer in Turbo Prolog 2.0 for calculating an average. I don't know what predicates I need to write, and have no idea how to link them together.
I'm not sure what functionality of "a circular buffer" needs to be realized for your application. In general a "buffer" would be reusable storage, often associated with I/O processes that handle asynchronous communications (hence the need for a buffer that allows one process to get ahead of the other). A "circular buffer" denotes a way of managing the available storage with pointers (both to the beginning and end of valid/unprocessed data) that wrap around through a (linear) contiguous region. This has an advantage over maintaining a FIFO queue with a fixed location for the beginning of valid data because no "shuffling" of unprocessed items is required.
In the general context of standard Prolog, where rewriting memory locations is not directly supported, that advantage doesn't make sense. Even in Turbo Prolog it has to be asked exactly what you want to accomplish, so that a skillful use of the extended/nonstandard features available can be made.
Here are some ideas:
Turbo Prolog supports lists that are in some ways more restrictive and perhaps in other ways more elaborate than the proper lists of standard Prolog. One of the restrictions is that in Turbo Prolog all items of a list must belong to the same "domain", a notion foreign to the weakly-typed character of standard Prolog. Also domains may be designated as "reference" domains in Turbo Prolog, a level of indirection that permits partially bound compound terms to be passed between subgoals. Without going into too much detail, one sense of "circular buffer" might be a "list" (formed from a reference domain) which wraps back around on itself (a cyclical reference). Such a term can be created in many other Prologs, the distinction being that it is not a proper list. Circular though such a term might be, it would not be much of a buffer (once created) because items in the list could not be rewritten.
Turbo Prolog supports dynamic assertion and retraction of facts, with metapredicates like asserta/1 and assertz/1 that allow serial positioning of new facts at the beginning or the end of those existing facts for the same predicate (and also, if desired, within a specified named "module" or factbase, to use the Turbo Prolog terminology). If the simple management of items in a FIFO queue is your objective, then this is most likely the approach you want (at least for an initial implementation), with items encapsulated as facts; a minimal sketch follows this list.
Turbo Prolog also supports "external" factbases with some additional features, external in the sense of being stored (either in memory or on disk) in a way that allows persistence and expanded space beyond what is allocated for internal factbases. Given the modest fixed sizes usually associated with circular buffers (because they are meant for reuse and overflow is usually dealt with by blocking on the input process until the output process has a chance to catch up), external factbases don't seem to offer much of interest, though possibly the ability to persist "buffers" might be of interest for long-running processes.
Hopefully these suggestions will elicit some clarification on what really needs to be accomplished here.
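To make the second idea concrete, here is a minimal sketch in standard Prolog syntax (Turbo Prolog additionally needs domains and database declarations; sample/1, push_sample/2 and average/1 are names I made up):

:- dynamic(sample/1).

% Add a reading; once more than Max readings are stored, drop the
% oldest one (assertz keeps facts in arrival order, so the first
% sample/1 clause is the oldest).
push_sample(X, Max) :-
    assertz(sample(X)),
    findall(S, sample(S), Ss),
    length(Ss, N),
    (   N > Max
    ->  retract(sample(_))
    ;   true
    ).

% Average over whatever is currently buffered.
average(Avg) :-
    findall(S, sample(S), Ss),
    Ss \== [],
    sum(Ss, Total),
    length(Ss, N),
    Avg is Total / N.

sum([], 0).
sum([X|Xs], S) :- sum(Xs, S0), S is S0 + X.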
After much thought, the following program was written:
% Consumer predicate. If the buffer is empty, there is nothing to take;
% we need to wait for the producer predicate.
consumer(BufferBefore, [], _) :-
    length(BufferBefore, BuffSize),
    BuffSize = 0,
    write("Buffer is empty. Waiting for producer"), nl, !, fail.
% If the buffer is not empty, return the first element and remove it from the buffer.
consumer(BufferBefore, BufferAfter, Result) :-
    car(BufferBefore, Result),
    deletefirst(BufferBefore, BufferAfter).
% Producer predicate. If both data and buffer are empty, there is nothing
% left to take from the data to put into the buffer.
producer([], [], [], [], _) :- write("End of data!"), !, fail.
% Otherwise, if the buffer is not yet full, add the first element of the data
% (removing it from there) at the last position of the buffer.
producer(DataBefore, BufferBefore, DataAfter, BufferAfter, Size) :-
    length(BufferBefore, BuffSize), BuffSize < Size, !,
    car(DataBefore, Elem),
    addlast(Elem, BufferBefore, BufferAfter),
    deletefirst(DataBefore, DataAfter).
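The program relies on a few list helpers whose definitions were not posted; plausible versions (a sketch, the originals may differ) are:

car([X|_], X).                % first element of a list
deletefirst([_|Xs], Xs).      % list without its first element
addlast(X, [], [X]).          % append a single element at the end
addlast(X, [Y|Ys], [Y|Zs]) :-
    addlast(X, Ys, Zs).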
Several examples of running it:
consumer([1,2,3,4,5], BufferAfter, Result)
returns
BufferAfter = [2,3,4,5], Result = 1.
And
producer([1,2,3,4,5,6], [7,8,9], DataAfter, BufferAfter, 4) % any Size > 3 works here
returns
DataAfter = [2,3,4,5,6], BufferAfter = [7,8,9,1].
Now, to demonstrate any calculation, we need to write a program which runs "consumer" until the buffer is empty, runs "producer" when the buffer is empty, and stops the process when both the data and the buffer are empty.
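Such a driver could look like this (a sketch in standard Prolog syntax; process/3 is a name I chose, building on consumer/3 and producer/5 above):

process([], [], _) :-            % data and buffer both empty: stop
    write("Done."), nl.
process(Data, [], Size) :-       % buffer empty: let the producer refill it
    Data = [_|_],
    producer(Data, [], Data1, Buffer1, Size),
    process(Data1, Buffer1, Size).
process(Data, Buffer, Size) :-   % otherwise consume one item
    Buffer = [_|_],
    consumer(Buffer, Buffer1, Item),
    write(Item), nl,
    process(Data, Buffer1, Size).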
Hope this will be useful to someone.