YAP Prolog CPU seconds

I'm using time/1 to measure CPU time in YAP Prolog and I'm getting, for example:
514.000 CPU in 0.022 seconds (2336363% CPU)
yes
What I'd like to ask is: what is the interpretation of these numbers? Does 514.000 represent CPU seconds? What are the "0.022 seconds" and the CPU percentage that follows?
Thank you
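One hedged sanity check: a later answer in this thread describes SWI-Prolog's time/1 percentage as 100 * CPU / wall. YAP's output is at least numerically consistent with the same convention; this is an inference from the printed numbers, not from YAP's documentation:

```python
# Check whether YAP's printed percentage equals 100 * CPU / wall (truncated).
cpu_figure = 514.000   # the "CPU" number as printed by YAP
wall_time = 0.022      # the "seconds" number

percent = int(100 * cpu_figure / wall_time)
print(percent)  # 2336363, matching "(2336363% CPU)"
```

If the CPU figure were actually in milliseconds (i.e. 0.514 s), the true percentage would be closer to 2336%, but that unit interpretation is speculation.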

Related

Up-to-date Prolog implementation benchmarks?

Are there any up-to-date Prolog implementation benchmarks (with results)?
I found this on the Mercury web site. Surprisingly, it shows a 20-fold gap between SWI-Prolog and Aquarius. I suspect that these results are pretty old. Does this gap still hold? Personally, I'd also like to see some comparisons with the occurs check turned on, since it has a major impact on performance, and some compilers might be better than others at optimizing it away.
Among more recent comparisons, I found this claim that GNU Prolog is 2x faster than SWI, and YAP is 4x faster than SWI, on one specific code base.
Edit:
a specific case where the occurs check is needed for a real world problem
Sure: type inference in Haskell, OCaml, Swift or theorem provers such as this one. I also think the burden is on the programmer to prove that his code doesn't need the occurs check. Tests can only prove that you do need it, not that you don't need it.
I have some benchmark results published at:
https://logtalk.org/performance.html
Be sure to read and understand the notes at the end of that page, however.
Regarding running benchmarks with GNU Prolog, note that you cannot use the top-level interpreter, as code loaded from it is interpreted, not compiled (see the GNU Prolog documentation on gplc). In general, it is not uncommon to see people running benchmarks from the top-level interpreter, forgetting what the word interpreter means, and publishing bogus stats where compilation/term-expansion/... steps mistakenly end up mixed with what is supposed to be benchmarked.
There's also a classical set of Prolog benchmarks that can be used for comparing Prolog implementations. Some Prolog systems include them (e.g. SWI-Prolog). They are also included in the Logtalk distribution, which allows running them with the supported backends:
https://github.com/LogtalkDotOrg/logtalk3/tree/master/examples/bench
In the current Logtalk git version, you can start it with the backend you want to benchmark and use the queries:
?- {bench(loader)}.
...
?- run.
These will run each benchmark 1000 times and report the total time. Use run/1 for a different number of repetitions. For example, on my macOS system, using SWI-Prolog 8.3.15 I get:
?- run.
boyer: 20.897818 seconds
chat_parser: 7.962188999999999 seconds
crypt: 0.14653999999999812 seconds
derive: 0.004462999999997663 seconds
divide10: 0.002300000000001745 seconds
log10: 0.0011489999999980682 seconds
meta_qsort: 0.2729539999999986 seconds
mu: 0.04534600000000211 seconds
nreverse: 0.016964000000001533 seconds
ops8: 0.0016230000000021505 seconds
poly_10: 1.9540520000000008 seconds
prover: 0.05286200000000463 seconds
qsort: 0.030829000000004214 seconds
queens_8: 2.2245050000000077 seconds
query: 0.11675499999999772 seconds
reducer: 0.00044199999999960937 seconds
sendmore: 3.048624999999994 seconds
serialise: 0.0003770000000073992 seconds
simple_analyzer: 0.8428750000000065 seconds
tak: 5.495768999999996 seconds
times10: 0.0019139999999993051 seconds
unify: 0.11229400000000567 seconds
zebra: 1.595203000000005 seconds
browse: 31.000829000000003 seconds
fast_mu: 0.04102400000000728 seconds
flatten: 0.028527999999994336 seconds
nand: 0.9632950000000022 seconds
perfect: 0.36678499999999303 seconds
true.
For SICStus Prolog 4.6.0 I get:
| ?- run.
boyer: 3.638 seconds
chat_parser: 0.7650000000000006 seconds
crypt: 0.029000000000000803 seconds
derive: 0.0009999999999994458 seconds
divide10: 0.001000000000000334 seconds
log10: 0.0009999999999994458 seconds
meta_qsort: 0.025000000000000355 seconds
mu: 0.004999999999999893 seconds
nreverse: 0.0019999999999997797 seconds
ops8: 0.001000000000000334 seconds
poly_10: 0.20500000000000007 seconds
prover: 0.005999999999999339 seconds
qsort: 0.0030000000000001137 seconds
queens_8: 0.2549999999999999 seconds
query: 0.024999999999999467 seconds
reducer: 0.001000000000000334 seconds
sendmore: 0.6079999999999997 seconds
serialise: 0.0019999999999997797 seconds
simple_analyzer: 0.09299999999999997 seconds
tak: 0.5869999999999997 seconds
times10: 0.001000000000000334 seconds
unify: 0.013000000000000789 seconds
zebra: 0.33999999999999986 seconds
browse: 4.137 seconds
fast_mu: 0.0070000000000014495 seconds
nand: 0.1280000000000001 seconds
perfect: 0.07199999999999918 seconds
yes
For GNU Prolog 1.4.5, I use the sample embedding script in logtalk3/scripts/embedding/gprolog to create an executable that includes the bench example fully compiled:
| ?- run.
boyer: 9.3459999999999983 seconds
chat_parser: 1.9610000000000003 seconds
crypt: 0.048000000000000043 seconds
derive: 0.0020000000000006679 seconds
divide10: 0.00099999999999944578 seconds
log10: 0.00099999999999944578 seconds
meta_qsort: 0.099000000000000199 seconds
mu: 0.012999999999999901 seconds
nreverse: 0.0060000000000002274 seconds
ops8: 0.00099999999999944578 seconds
poly_10: 0.72000000000000064 seconds
prover: 0.016000000000000014 seconds
qsort: 0.0080000000000008953 seconds
queens_8: 0.68599999999999994 seconds
query: 0.041999999999999815 seconds
reducer: 0.0 seconds
sendmore: 1.1070000000000011 seconds
serialise: 0.0060000000000002274 seconds
simple_analyzer: 0.25 seconds
tak: 1.3899999999999988 seconds
times10: 0.0010000000000012221 seconds
unify: 0.089999999999999858 seconds
zebra: 0.63499999999999979 seconds
browse: 10.923999999999999 seconds
fast_mu: 0.015000000000000568 seconds
(27352 ms) yes
I suggest you try these benchmarks, running them on your computer, with the Prolog systems that you want to compare. In doing that, remember that this is a limited set of benchmarks, not necessarily reflecting the actual relative performance in non-trivial applications.
Ratios:
SICStus/SWI GNU/SWI
boyer 17.4% 44.7%
browse 13.3% 35.2%
chat_parser 9.6% 24.6%
crypt 19.8% 32.8%
derive 22.4% 44.8%
divide10 43.5% 43.5%
fast_mu 17.1% 36.6%
flatten - -
log10 87.0% 87.0%
meta_qsort 9.2% 36.3%
mu 11.0% 28.7%
nand 13.3% -
nreverse 11.8% 35.4%
ops8 61.6% 61.6%
perfect 19.6% -
poly_10 10.5% 36.8%
prover 11.4% 30.3%
qsort 9.7% 25.9%
queens_8 11.5% 30.8%
query 21.4% 36.0%
reducer 226.2% 0.0%
sendmore 19.9% 36.3%
serialise 530.5% 1591.5%
simple_analyzer 11.0% 29.7%
tak 10.7% 25.3%
times10 52.2% 52.2%
unify 11.6% 80.1%
zebra 21.3% 39.8%
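The percentages above are simply each system's time divided by the corresponding SWI-Prolog time. As a sketch of how a row is derived, using the boyer times reported earlier:

```python
# Ratio of each system's runtime to the SWI-Prolog baseline, as a percentage.
swi_boyer = 20.897818      # SWI-Prolog 8.3.15 boyer time, from the run above
sicstus_boyer = 3.638      # SICStus Prolog 4.6.0
gnu_boyer = 9.346          # GNU Prolog 1.4.5

sicstus_ratio = round(100 * sicstus_boyer / swi_boyer, 1)
gnu_ratio = round(100 * gnu_boyer / swi_boyer, 1)
print(sicstus_ratio, gnu_ratio)  # 17.4 44.7, matching the boyer row
```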
P.S. Be sure to use Logtalk 3.43.0 or later as it includes portability fixes for the bench example, including for GNU Prolog, and a set of basic unit tests.
I stumbled upon this comparison from 2008 in the Internet archive:
https://web.archive.org/web/20100227050426/http://www.probp.com/performance.htm

A positive_integer/1 predicate that works for big numbers

In my Prolog-inspired language Brachylog, there is the possibility to label CLP(FD)-equivalent variables that have potentially infinite domains. The code that does this labeling can be found here (thanks to Markus Triska, @mat).
This predicate requires the existence of a predicate positive_integer/1, which must have the following behavior:
?- positive_integer(X).
X = 1 ;
X = 2 ;
X = 3 ;
X = 4 ;
…
This is implemented as such in our current solution:
positive_integer(N) :- length([_|_], N).
This has two problems that I can see:
This becomes slow fairly quickly:
?- time(positive_integer(100000)).
% 5 inferences, 0.000 CPU in 0.001 seconds (0% CPU, Infinite Lips)
?- time(positive_integer(1000000)).
% 5 inferences, 0.000 CPU in 0.008 seconds (0% CPU, Infinite Lips)
?- time(positive_integer(10000000)).
% 5 inferences, 0.062 CPU in 0.075 seconds (83% CPU, 80 Lips)
This ultimately returns an Out of global stack error for numbers that are a bit too big:
?- positive_integer(100000000).
ERROR: Out of global stack
This is obviously due to the fact that Prolog needs to instantiate the list, which is bad if its length is big.
How can we improve this predicate such that this works even for very big numbers, with the same behavior?
There are already many good ideas posted, and they work to various degrees.
Additional test case
@vmg has the right intuition: between/3 does not mix well with constraints. To see this, I would like to use the following query as an additional benchmark:
?- X #> 10^30, positive_integer(X).
Solution
With the test case in mind, I suggest the following solution:
positive_integer(I) :-
        I #> 0,
        (   var(I) ->
            fd_inf(I, Inf),
            (   I #= Inf
            ;   I #\= Inf,
                positive_integer(I)
            )
        ;   true
        ).
The key idea is to use the CLP(FD) reflection predicate fd_inf/2 to reason about the smallest element in the domain of a variable. This is the only predicate you will need to change when you port the solution to further Prolog systems. For example, in SICStus Prolog, the predicate is called fd_min/2.
Main features
portable to several Prolog systems with minimal changes
fast in the shown cases
works also in the test case above
advertises and uses the full power of CLP(FD) constraints.
It is of course very clear which of these points is most important.
Sample queries
creatio ex nihilo:
?- positive_integer(X).
X = 1 ;
X = 2 ;
X = 3 .
fixed integer:
?- X #= 12^42, time(positive_integer(X)).
% 4 inferences, 0.000 CPU in 0.000 seconds (68% CPU, 363636 Lips)
X = 2116471057875484488839167999221661362284396544.
constrained integer:
?- X #> 10^30, time(positive_integer(X)).
% 124 inferences, 0.000 CPU in 0.000 seconds (83% CPU, 3647059 Lips)
X = 1000000000000000000000000000001 ;
% 206 inferences, 0.000 CPU in 0.000 seconds (93% CPU, 2367816 Lips)
X = 1000000000000000000000000000002 ;
% 204 inferences, 0.000 CPU in 0.000 seconds (92% CPU, 2428571 Lips)
X = 1000000000000000000000000000003 .
Other comments
1. First, make sure to check out Brachylog and the latest Brachylog solutions on Code Golf. Thanks to Julien's efforts, a language inspired by Prolog is now increasingly often hosting some of the most concise and elegant programs that are posted there. Awesome work Julien!
2. Please refrain from using implementation-specific anomalies of between/3: these destroy important semantic properties of the predicate and are not portable to other systems.
3. If you ignore (2), please at least use infinite instead of inf. In the context of CLP(FD), inf denotes the infimum of the set of integers, which is the exact opposite of positive infinity.
4. In the context of CLP(FD), I recommend using CLP(FD) constraints instead of between/3 and other predicates that don't take constraints into account.
5. In fact, I recommend using CLP(FD) constraints instead of all low-level predicates that reason over integers. This can at most make your programs more general, never more specific.
Many thanks for your interest in this question and the posted solutions! I hope you find the test case above useful for your variants, and find ways to take CLP(FD) constraints into account in your versions so that they run faster and we can all upvote them!
Since "Brachylog's interpreter is entirely written in Prolog" (meaning SWI-Prolog), you can use between/3 with the second argument bound to inf.
Comparing your positive_integer with
positive_integer_b(X) :- between(1, inf, X).
Tests on my machine:
?- time(positive_integer(10000000)).
% 5 inferences, 0.062 CPU in 0.072 seconds (87% CPU, 80 Lips)
true.
9 ?- time(positive_integer_b(10000000)).
% 2 inferences, 0.000 CPU in 0.000 seconds (?% CPU, Infinite Lips)
true.
And showing "Out of global stack":
13 ?- time(positive_integer(100000000)).
% 5 inferences, 0.000 CPU in 0.000 seconds (?% CPU, Infinite Lips)
ERROR: Out of global stack
14 ?- time(positive_integer_b(100000000)).
% 2 inferences, 0.000 CPU in 0.000 seconds (?% CPU, Infinite Lips)
true.
I don't think between/3 is pure Prolog, though.
In case you want your code to be runnable also on other systems, consider:
positive_integer(N) :-
        (   nonvar(N),    % strictly not needed, but clearer
            integer(N),
            N > 0
        ->  true
        ;   length([_|_], N)
        ).
This version produces exactly the same errors as your first try.
You have indeed spotted a weakness in current length/2 implementations. Ideally, a goal like length([_|_], 1000000000000000) would take some time, but would at least not consume more than constant memory. On the other hand, I am not sure this is worth optimizing. After all, I do not see an easy way to solve the runtime problem for such cases.
Note that the version of between/3 in SWI-Prolog is highly specific to SWI. It makes termination arguments much more complex. In other systems, such as SICStus, you know for sure that between/3 terminates regardless of the arguments. In SWI you would have to prove that the atom inf will not be encountered, which raises the proof burden.
Without between/3, and ISO compliant (I think):
positive_integer(1).
positive_integer(X) :-
        var(X),
        positive_integer(Y),
        X is Y + 1.
positive_integer(X) :-
        integer(X),
        X > 0.

Prolog: what do the results from calling time/1 actually mean?

I am new to Prolog (and fairly new to CS/programming in general), and I'm trying to assess and improve my programs' performance by using the time/1 predicate. However, I'm not sure I understand the output. For instance, the query time("MyProgram") yields the following result in addition to the solution to "MyProgram":
% 34,865,980 inferences, 4.479 CPU in 4.549 seconds (98% CPU, 7784905 Lips)
What does this mean? There is somewhat of an explanation here but I'm finding it's not quite enough.
Thanks in advance!
Firstly, see this answer for some general information about the difficulties of benchmarking in Prolog, or any programming language for that matter. The answer concerns the ECLiPSe language, which is based on Prolog, so you'll be familiar with the syntax.
Now, let's look at a simple example:
foo(X) :- X =:= 1.
If we trace the execution (which by the way is a great way to better understand how Prolog works), we get:
?- trace, foo(1).
Call: (7) foo(1) ? creep
Call: (8) 1=:=1 ? creep
Exit: (8) 1=:=1 ? creep
Exit: (7) foo(1) ? creep
Notice the two calls and two exits in the trace. In the first call, foo(1) is matched against the facts/rules defined in the Prolog file, successfully finding foo/1; in the second call, the body is (successfully) executed. The two exits then simply represent exiting from the goals that succeeded (both calls).
When we run our program with time/1, we see:
?- time(foo(1)).
% 2 inferences, 0.000 CPU in 0.000 seconds (86% CPU, 69691 Lips)
true.
?- time(foo(2)).
% 2 inferences, 0.000 CPU in 0.000 seconds (82% CPU, 77247 Lips)
false.
Both queries need 2 (logical) inferences to complete. These inferences correspond to the calls described above (i.e. the program 'tries to match' something twice; it doesn't matter whether the number is equal to one or not). This is why inferences are a good indication of your program's performance: they are not based on any hardware-specific properties, but on the complexity of your algorithm(s).
Furthermore we see CPU and seconds, which respectively represent cpu-time and overall clock-time spent while executing the program (see referred SO answer for more information).
Finally, we see a different % CPU and Lips for each execution. You shouldn't worry too much about these numbers: they represent the percentage of CPU used and the average number of Logical Inferences Per Second, and for obvious reasons they will differ between executions.
P.S.: a similar SO question can be found here.
The meaning is as follows. The basic data is sampled via the following calls:
get_time(Wall)
statistics(cputime, Time)
statistics(inferences, Inferences)
What is then shown is:
'%1 inferences, %2 CPU in %3 seconds (%4% CPU, %5 Lips)'
%1: Inferences2-Inferences1
%2: Time2-Time1
%3: Wall2-Wall1
%4: round(100*%2/%3)
%5: integer(%1/%2)
In a single-threaded application with no other applications running, we still have %2 =< %3 if there is a separate GC thread, so %4 will be a percentage at or below 100. If your application isn't doing I/O and your percentage is very low, you might have a locking problem somewhere.
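Under those definitions, the report line can be reconstructed from two samples of the counters. A small sketch (the sample values here are made up, not taken from SWI's source):

```python
# Reconstructing SWI-Prolog's time/1 report line from two samples of
# (inferences, CPU time, wall time).
inferences1, inferences2 = 1_000_000, 3_000_000
cpu1, cpu2 = 0.10, 0.60          # statistics(cputime, Time)
wall1, wall2 = 5.000, 5.625      # get_time(Wall)

inferences = inferences2 - inferences1   # %1
cpu = cpu2 - cpu1                        # %2
wall = wall2 - wall1                     # %3
cpu_percent = round(100 * cpu / wall)    # %4
lips = int(inferences / cpu)             # %5

print(f"% {inferences:,} inferences, {cpu:.3f} CPU in {wall:.3f} seconds "
      f"({cpu_percent}% CPU, {lips} Lips)")
```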

Measuring elapsed CPU time in Julia

Many scientific computing languages make a distinction between absolute time (wall clock) and CPU time (processor cycles). For example, in Matlab we have:
>> tic; pause(1); toc
Elapsed time is 1.009068 seconds.
>> start = cputime; pause(1); elapsed = cputime - start
elapsed =
0
and in Mathematica we have:
In[1]:= AbsoluteTiming[Pause[1]]
Out[1]= {1.0010572, Null}
In[2]:= Timing[Pause[1]]
Out[2]= {0., Null}
This distinction is useful when benchmarking code run on computation servers, where there may be high variance in the absolute timing results depending on what other processes are running concurrently.
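The same distinction is exposed by most language runtimes. As an illustration only, using Python's standard library as an analogy to the Matlab and Mathematica pairs above: a process that sleeps accrues wall time but almost no CPU time.

```python
import time

wall_start = time.perf_counter()   # wall clock (like Matlab's tic/toc)
cpu_start = time.process_time()    # CPU time (like Matlab's cputime)

time.sleep(0.5)                    # blocked sleep: wall time passes, CPU is idle

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"wall: {wall_elapsed:.3f}s, cpu: {cpu_elapsed:.3f}s")
# wall is roughly 0.5 s; cpu is close to zero
```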
The Julia standard library provides support for timing expressions through tic(), toc(), @time and a few other functions/macros, all based on time_ns(), a function that measures absolute time.
julia> @time sleep(1)
elapsed time: 1.017056895 seconds (135788 bytes allocated)
My question: Is there a simple way to get the elapsed CPU time for an expression evaluation in Julia?
(Side note: doing some digging, it appears that Julia timing is based on the uv_hrtime() function from libuv. It seems to me that using uv_getrusage from the same library might give a way to access elapsed CPU time in Julia, but I'm no expert. Has anybody tried using anything like this?)
I couldn't find any existing solutions, so I've put together a package with some simple CPU timing functionality here: https://github.com/schmrlng/CPUTime.jl. The package is completely untested on parallel code and may have other bugs, but if anybody else would like to try it out calling
Pkg.clone("https://github.com/schmrlng/CPUTime.jl.git")
from the julia> prompt should install the package.
Julia does have the commands tic() and toc() which work just like tic and toc in Matlab:
julia> tic(); 7^1000000000; toc()
elapsed time: 0.046563597 seconds
0.046563597

What's a good name for "total wall clock time of all cpus?"

There are only two hard things in Computer Science: cache invalidation
and naming things.
-- Phil Karlton
My app is reporting CPU time, and people reasonably want to know how much time this is out of, so they can compute % CPU utilized. My question is, what's the name for the wall clock time times the number of CPUs?
If you add up the total user, system, idle, etc. time for a system, you get the total wall clock time, times the number of CPUs. What's a good name for that? According to Wikipedia, CPU time is:
CPU time (or CPU usage, process time) is the amount of time for which
a central processing unit (CPU) was used for processing instructions
of a computer program, as opposed to, for example, waiting for
input/output (I/O) operations.
"total time" suggests just wall clock time, and doesn't connote that over a 10 second span, a four-cpu system would have 40 seconds of "total time."
Total Wall Clock Time of all CPUs
Naming things is hard, so why waste a good 'un once you've found it?
Aggregate time: 15 of 40 seconds.
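Whatever name wins, the arithmetic behind "15 of 40 seconds" is simply wall-clock time multiplied by the CPU count. A sketch:

```python
# Aggregate time: wall-clock time times the number of CPUs, i.e. the total
# CPU-seconds that were available during the measurement interval.
wall_seconds = 10
num_cpus = 4
cpu_seconds_used = 15          # CPU time consumed by the app in that interval

aggregate = wall_seconds * num_cpus
utilization = 100 * cpu_seconds_used / aggregate
print(f"{cpu_seconds_used} of {aggregate} seconds ({utilization:.1f}% CPU)")
# 15 of 40 seconds (37.5% CPU)
```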
