Up-to-date Prolog implementation benchmarks? - prolog

Are there any up-to-date Prolog implementation benchmarks (with results)?
I found this on the Mercury web site. Surprisingly, it shows a 20-fold gap between SWI-Prolog and Aquarius. I suspect that these results are pretty old. Does this gap still hold? Personally, I'd also like to see some comparisons with the occurs check turned on, since it has a major impact on performance, and some compilers might be better than others at optimizing it away.
Among more recent comparisons, I found this claim that GNU Prolog is 2x faster than SWI-Prolog, and YAP is 4x faster than SWI-Prolog, on one specific code base.
Edit:
a specific case where the occurs check is needed for a real world problem
Sure: type inference in Haskell, OCaml, Swift or theorem provers such as this one. I also think the burden is on the programmer to prove that his code doesn't need the occurs check. Tests can only prove that you do need it, not that you don't need it.
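To make the type inference case concrete, here is a minimal sketch of first-order unification with an optional occurs check, written in Python rather than Prolog. The term representation (uppercase strings as variables, tuples as compound terms) and all function names are assumptions of this sketch, not any Prolog system's internals. Typing a self-application such as `x x` leads to unifying a type variable `A` with `A -> B`, which the occurs check rejects and which otherwise produces a cyclic binding.

```python
# Minimal first-order unification with an optional occurs check.
# Assumed representation: variables are strings starting with an uppercase
# letter; compound terms are tuples of the form (functor, arg1, ..., argN);
# anything else is an atom.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Follow variable bindings until an unbound variable or a non-variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # True if variable v appears anywhere inside term t under subst.
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, subst) for arg in t[1:])
    return False

def unify(a, b, subst=None, occurs_check=True):
    # Returns an extended substitution (dict), or None if unification fails.
    subst = dict(subst or {})
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            if occurs_check and occurs(x, y, subst):
                return None
            subst[x] = y
        elif is_var(y):
            if occurs_check and occurs(y, x, subst):
                return None
            subst[y] = x
        elif (isinstance(x, tuple) and isinstance(y, tuple)
              and x[0] == y[0] and len(x) == len(y)):
            stack.extend(zip(x[1:], y[1:]))
        else:
            return None
    return subst

# A = A -> B has no finite solution, so the occurs check must reject it.
print(unify("A", ("->", "A", "B")))
# Without the check, A is bound to a term that contains A itself.
print(unify("A", ("->", "A", "B"), occurs_check=False))
```

The extra cost comes from `occurs`, which traverses the whole term on every variable binding; that traversal is what an implementation would have to optimize away.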

I have some benchmark results published at:
https://logtalk.org/performance.html
Be sure to read and understand the notes at the end of that page, however.
Regarding running benchmarks with GNU Prolog, note that you cannot use the top-level interpreter, as code loaded from it is interpreted, not compiled (see the GNU Prolog documentation on gplc). In general, it is not uncommon to see people running benchmarks from the top-level interpreter, forgetting what the word interpreter means, and publishing bogus stats where compilation/term-expansion/... steps mistakenly end up mixed with what's supposed to be benchmarked.
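The pitfall of mixing setup steps into the measured region is not specific to Prolog. As a rough illustration, here is a minimal Python sketch of a harness that performs the setup once, outside the timed region, and then times only the repeated calls. The names `bench` and `nreverse` are hypothetical and this is not how the Logtalk bench example is implemented.

```python
import time

def bench(setup, work, repetitions=1000):
    # Run the setup once, outside the timed region.
    state = setup()
    # Time only the repeated calls of the code under test, using CPU time.
    start = time.process_time()
    for _ in range(repetitions):
        work(state)
    return time.process_time() - start

def nreverse(items):
    # Naive quadratic list reversal, in the spirit of the classic benchmark.
    if not items:
        return []
    return nreverse(items[1:]) + [items[0]]

elapsed = bench(lambda: list(range(30)), nreverse, repetitions=1000)
print(f"nreverse: {elapsed} seconds")
```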

There's also a classical set of Prolog benchmarks that can be used for comparing Prolog implementations. Some Prolog systems include them (e.g. SWI-Prolog). They are also included in the Logtalk distribution, which allows running them with the supported backends:
https://github.com/LogtalkDotOrg/logtalk3/tree/master/examples/bench
In the current Logtalk git version, you can start it with the backend you want to benchmark and use the queries:
?- {bench(loader)}.
...
?- run.
This will run each benchmark 1000 times and report the total time. Use run/1 for a different number of repetitions. For example, on my macOS system using SWI-Prolog 8.3.15 I get:
?- run.
boyer: 20.897818 seconds
chat_parser: 7.962188999999999 seconds
crypt: 0.14653999999999812 seconds
derive: 0.004462999999997663 seconds
divide10: 0.002300000000001745 seconds
log10: 0.0011489999999980682 seconds
meta_qsort: 0.2729539999999986 seconds
mu: 0.04534600000000211 seconds
nreverse: 0.016964000000001533 seconds
ops8: 0.0016230000000021505 seconds
poly_10: 1.9540520000000008 seconds
prover: 0.05286200000000463 seconds
qsort: 0.030829000000004214 seconds
queens_8: 2.2245050000000077 seconds
query: 0.11675499999999772 seconds
reducer: 0.00044199999999960937 seconds
sendmore: 3.048624999999994 seconds
serialise: 0.0003770000000073992 seconds
simple_analyzer: 0.8428750000000065 seconds
tak: 5.495768999999996 seconds
times10: 0.0019139999999993051 seconds
unify: 0.11229400000000567 seconds
zebra: 1.595203000000005 seconds
browse: 31.000829000000003 seconds
fast_mu: 0.04102400000000728 seconds
flatten: 0.028527999999994336 seconds
nand: 0.9632950000000022 seconds
perfect: 0.36678499999999303 seconds
true.
For SICStus Prolog 4.6.0 I get:
| ?- run.
boyer: 3.638 seconds
chat_parser: 0.7650000000000006 seconds
crypt: 0.029000000000000803 seconds
derive: 0.0009999999999994458 seconds
divide10: 0.001000000000000334 seconds
log10: 0.0009999999999994458 seconds
meta_qsort: 0.025000000000000355 seconds
mu: 0.004999999999999893 seconds
nreverse: 0.0019999999999997797 seconds
ops8: 0.001000000000000334 seconds
poly_10: 0.20500000000000007 seconds
prover: 0.005999999999999339 seconds
qsort: 0.0030000000000001137 seconds
queens_8: 0.2549999999999999 seconds
query: 0.024999999999999467 seconds
reducer: 0.001000000000000334 seconds
sendmore: 0.6079999999999997 seconds
serialise: 0.0019999999999997797 seconds
simple_analyzer: 0.09299999999999997 seconds
tak: 0.5869999999999997 seconds
times10: 0.001000000000000334 seconds
unify: 0.013000000000000789 seconds
zebra: 0.33999999999999986 seconds
browse: 4.137 seconds
fast_mu: 0.0070000000000014495 seconds
nand: 0.1280000000000001 seconds
perfect: 0.07199999999999918 seconds
yes
For GNU Prolog 1.4.5, I use the sample embedding script in logtalk3/scripts/embedding/gprolog to create an executable that includes the bench example fully compiled:
| ?- run.
boyer: 9.3459999999999983 seconds
chat_parser: 1.9610000000000003 seconds
crypt: 0.048000000000000043 seconds
derive: 0.0020000000000006679 seconds
divide10: 0.00099999999999944578 seconds
log10: 0.00099999999999944578 seconds
meta_qsort: 0.099000000000000199 seconds
mu: 0.012999999999999901 seconds
nreverse: 0.0060000000000002274 seconds
ops8: 0.00099999999999944578 seconds
poly_10: 0.72000000000000064 seconds
prover: 0.016000000000000014 seconds
qsort: 0.0080000000000008953 seconds
queens_8: 0.68599999999999994 seconds
query: 0.041999999999999815 seconds
reducer: 0.0 seconds
sendmore: 1.1070000000000011 seconds
serialise: 0.0060000000000002274 seconds
simple_analyzer: 0.25 seconds
tak: 1.3899999999999988 seconds
times10: 0.0010000000000012221 seconds
unify: 0.089999999999999858 seconds
zebra: 0.63499999999999979 seconds
browse: 10.923999999999999 seconds
fast_mu: 0.015000000000000568 seconds
(27352 ms) yes
I suggest you try these benchmarks, running them on your computer, with the Prolog systems that you want to compare. In doing that, remember that this is a limited set of benchmarks, not necessarily reflecting the actual relative performance in non-trivial applications.
Ratios:
                 SICStus/SWI   GNU/SWI
boyer                  17.4%     44.7%
browse                 13.3%     35.2%
chat_parser             9.6%     24.6%
crypt                  19.8%     32.8%
derive                 22.4%     44.8%
divide10               43.5%     43.5%
fast_mu                17.1%     36.6%
flatten                    -         -
log10                  87.0%     87.0%
meta_qsort              9.2%     36.3%
mu                     11.0%     28.7%
nand                   13.3%         -
nreverse               11.8%     35.4%
ops8                   61.6%     61.6%
perfect                19.6%         -
poly_10                10.5%     36.8%
prover                 11.4%     30.3%
qsort                   9.7%     25.9%
queens_8               11.5%     30.8%
query                  21.4%     36.0%
reducer               226.2%      0.0%
sendmore               19.9%     36.3%
serialise             530.5%   1591.5%
simple_analyzer        11.0%     29.7%
tak                    10.7%     25.3%
times10                52.2%     52.2%
unify                  11.6%     80.1%
zebra                  21.3%     39.8%
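Each ratio above is the other system's time divided by the SWI-Prolog time for the same benchmark, expressed as a percentage. A small Python sketch that reproduces a few of the rows from the timings listed earlier (only a subset is copied here; the function name `ratio_percent` is made up for this example):

```python
# Timings in seconds, copied from the three runs above (subset only).
swi = {
    "boyer": 20.897818,
    "chat_parser": 7.962188999999999,
    "tak": 5.495768999999996,
    "reducer": 0.00044199999999960937,
}
sicstus = {
    "boyer": 3.638,
    "chat_parser": 0.7650000000000006,
    "tak": 0.5869999999999997,
    "reducer": 0.001000000000000334,
}
gnu = {
    "boyer": 9.3459999999999983,
    "chat_parser": 1.9610000000000003,
    "tak": 1.3899999999999988,
    "reducer": 0.0,
}

def ratio_percent(other, baseline):
    # Time of the other system as a percentage of the baseline time.
    return round(100.0 * other / baseline, 1)

for name in swi:
    print(name,
          f"{ratio_percent(sicstus[name], swi[name])}%",
          f"{ratio_percent(gnu[name], swi[name])}%")
```

Rows such as reducer and serialise involve times at or below about one millisecond, which appears to be the resolution of the reported SICStus and GNU Prolog figures, so ratios like 226.2% or 0.0% probably say more about the timer than about the systems.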
P.S. Be sure to use Logtalk 3.43.0 or later as it includes portability fixes for the bench example, including for GNU Prolog, and a set of basic unit tests.

I stumbled upon this comparison from 2008 in the Internet archive:
https://web.archive.org/web/20100227050426/http://www.probp.com/performance.htm

Related

How to interpret cpu profiling graph

I was following the go blog here
I tried to profile my program but the output looks a bit different. (It seems that Go has moved from sampling to instrumentation?)
I wonder what these numbers mean
Especially showing nodes accounting for 2.59s, 92.5% of 2.8
What does total sample = 2.8s mean? The sample is drawn in an interval of 2.8 seconds?
Does it mean that only nodes that are running over 92.5% of sample time are shown?
Also, I wonder how these numbers are generated. In the original Go blog post, the measure is how many times the function is detected in execution among all samples. However, we are dealing with seconds here. How does the Go profiling tool know how many seconds a function call takes?
Any help will be appreciated
Think of the graph as a graph of a resource: time. You start at the top with, for example, 10 seconds. Then you see that 5 seconds went to time.Sleep and 5 went to encoding/json. The arrows represent how that time is divided, so they show that 5 seconds went to each part of the program. So now we have 3 nodes: the first node with 10 seconds, time.Sleep with 5 seconds, and encoding/json with 5 seconds. Those 5 seconds in encoding/json are then broken down even further into the functions that took up most of the time. The 0.01s (percentage) out of 0.02s (larger percentage) means that this function itself took 0.01s of processing time out of a total of 0.02s for the block of time (the arrow with the number) spent in this particular call stack. The percentage represents the share of the whole program's execution time that this part took up. So you'll see that encoding/json string/encoder took 0.36 percent of the total execution time/resources of your program.
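On the question of where the seconds come from: a sampling profiler can convert sample counts into time by multiplying by the sampling period. The sketch below, in Python, assumes one sample every 10 ms (Go's CPU profiler samples at roughly 100 Hz by default) and uses made-up call stacks, not the profile from the question. It computes the two numbers shown per node: time spent in the function itself (flat) and time spent in the function plus everything it calls (cumulative). By the same arithmetic, 2.59s out of 2.8s is 92.5%, which most likely means the displayed nodes account for that share of all samples while smaller nodes were dropped from the view.

```python
from collections import Counter

SAMPLE_PERIOD_S = 0.01  # assumption: one sample every 10 ms (100 Hz)

def flat_and_cum(stacks):
    """Return {function: (flat_seconds, cum_seconds)} from sampled call stacks.

    Each stack lists function names from the outermost caller down to the
    function that was running when the sample was taken.
    """
    flat, cum = Counter(), Counter()
    for stack in stacks:
        flat[stack[-1]] += 1      # only the running function gets flat time
        for fn in set(stack):     # every function on the stack gets cumulative time
            cum[fn] += 1
    return {fn: (flat[fn] * SAMPLE_PERIOD_S, cum[fn] * SAMPLE_PERIOD_S)
            for fn in cum}

# Hypothetical samples for illustration only.
samples = (
    [["main", "encode", "marshal"]] * 5
    + [["main", "encode"]] * 3
    + [["main", "decode"]] * 2
)
report = flat_and_cum(samples)
total_s = len(samples) * SAMPLE_PERIOD_S
for fn, (flat_s, cum_s) in sorted(report.items()):
    print(f"{fn}: flat {flat_s:.2f}s, cum {cum_s:.2f}s of {total_s:.2f}s total")
```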

Sysbench: which value should I look at for CPU benchmarking?

sysbench version: 1.0.7
OS: macOS 10.11.6
No matter where I ran sysbench cpu run I get very similar results like the following.
sysbench 1.0.7 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Prime numbers limit: 10000
Initializing worker threads...
Threads started!
General statistics:
total time: 10.0005s
total number of events: 9083
Latency (ms):
min: 0.96
avg: 1.10
max: 7.18
95th percentile: 1.34
sum: 9995.18
Threads fairness:
events (avg/stddev): 9083.0000/0.00
execution time (avg/stddev): 9.9952/0.00
I read some blog posts and all say that I should look at total time, but it's always 10 seconds on different platforms/environments. I also get very similar results with a very small prime number limit, e.g. --cpu-max-prime=100. I also ran with --time=0 and the benchmark never finishes.
My guess is that the total time matches the value specified with the --time option, but then I don't know what the right command to use is.
Thanks in advance
Take --cpu-max-prime big enough, for example 20000. As for sysbench, it looks like it always runs for 10 seconds now, so you should look at the "total number of events" value (a higher value means better performance). To measure the total performance of your server CPU, you should also pass --num-threads=[number of hyperthreads from cpuinfo].
You can set --max-time to indicate a different max time (in seconds) for the test. Default is 10 seconds.
You can see the full manual here: http://imysql.com/wp-content/uploads/2014/10/sysbench-manual.pdf
Always read man sysbench for the parameters you need.
Use the example below to get a sense of CPU performance:
sysbench --time=60 --report-interval=10 --threads=8 cpu run
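Since the run is bounded by time rather than by work, the figure to compare across machines is the number of events completed per second. A minimal Python sketch that extracts it from a report shaped like the one in the question (the function name and the regular expressions are assumptions tailored to that output, not part of sysbench):

```python
import re

# Excerpt of the report shown in the question.
SAMPLE = """\
General statistics:
total time: 10.0005s
total number of events: 9083
"""

def events_per_second(report):
    # Pull the two figures out of the "General statistics" section.
    total_time = float(re.search(r"total time:\s*([\d.]+)s", report).group(1))
    events = int(re.search(r"total number of events:\s*(\d+)", report).group(1))
    return events / total_time

print(f"{events_per_second(SAMPLE):.2f} events/s")
```

For the run in the question this gives roughly 908 events per second on one thread, which is consistent with the reported average latency of about 1.10 ms per event.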

Measuring elapsed CPU time in Julia

Many scientific computing languages make a distinction between absolute time (wall clock) and CPU time (processor cycles). For example, in Matlab we have:
>> tic; pause(1); toc
Elapsed time is 1.009068 seconds.
>> start = cputime; pause(1); elapsed = cputime - start
elapsed =
0
and in Mathematica we have:
In[1]:= AbsoluteTiming[Pause[1]]
Out[1]= {1.0010572, Null}
In[2]:= Timing[Pause[1]]
Out[2]= {0., Null}
This distinction is useful when benchmarking code run on computation servers, where there may be high variance in the absolute timing results depending on what other processes are running concurrently.
The Julia standard library provides support for timing of expressions through tic(), toc(), @time and a few other functions/macros, all based on time_ns(), a function that measures absolute time.
julia> @time sleep(1)
elapsed time: 1.017056895 seconds (135788 bytes allocated)
My question: Is there a simple way to get the elapsed CPU time for an expression evaluation in Julia?
(Side note: doing some digging, it appears that Julia timing is based on the uv_hrtime() function from libuv. It seems to me that using uv_getrusage from the same library might give a way to access elapsed CPU time in Julia, but I'm no expert. Has anybody tried using anything like this?)
I couldn't find any existing solutions, so I've put together a package with some simple CPU timing functionality here: https://github.com/schmrlng/CPUTime.jl. The package is completely untested on parallel code and may have other bugs, but if anybody else would like to try it out, calling
>> Pkg.clone("https://github.com/schmrlng/CPUTime.jl.git")
from the julia> prompt should install the package.
Julia does have the commands tic() and toc() which work just like tic and toc in Matlab:
julia> tic(); 7^1000000000; toc()
elapsed time: 0.046563597 seconds
0.046563597
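For readers who want to see the wall-clock versus CPU time distinction in runnable form, here is a small sketch in Python (not Julia, and unrelated to the CPUTime.jl package). It measures both clocks around the same call; sleeping consumes wall-clock time but almost no CPU time, mirroring the Matlab and Mathematica examples in the question. The helper name `timed` is made up for this example.

```python
import time

def timed(fn):
    # Measure both wall-clock time and process CPU time around one call.
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

wall, cpu = timed(lambda: time.sleep(0.2))
print(f"wall: {wall:.3f}s, cpu: {cpu:.3f}s")
```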

YAP prolog cpu seconds

I'm using time/1 to measure cpu time in YAP prolog and I'm getting for example
514.000 CPU in 0.022 seconds (2336363% CPU)
yes
What I'd like to ask is: what is the interpretation of these numbers? Does 514.000 represent CPU seconds? What is "0.022 seconds", and what is the CPU percentage that follows?
Thank you
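As a purely arithmetic observation on the output quoted above: the percentage equals the first figure divided by the second, times 100, with both treated as plain numbers. The Python sketch below only checks that relationship. That the two figures might be in different units (for example, the first in milliseconds) is an inference from the implausible percentage, not something confirmed here.

```python
def cpu_percent(cpu_figure, elapsed_figure):
    # Ratio of the reported CPU figure to the reported elapsed figure, as a
    # truncated percentage. Units are deliberately left uninterpreted.
    return int(100 * cpu_figure / elapsed_figure)

print(cpu_percent(514.000, 0.022))
```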

Why does wget give me two different total download times?

The last 3 lines of wget -i urls.txt:
FINISHED --2012-05-16 12:58:08--
Total wall clock time: 1h 56m 52s
Downloaded: 1069 files, 746M in 1h 52m 49s (113 KB/s)
There are two different times:
1h 56m 52s
1h 52m 49s
Why are they different? What do they stand for?
Wall clock time, or real time, is the human perception of the passage of time: the time that we, as human users, experience. In this case wget might have taken less than the real time to finish its job, but the real time is the sum of the time the software took to do its actual job and the time it was waiting for resources like the hard disk, the network, etc.
When you have a wall clock time and a shorter time, the shorter time is usually user time, and the missing time is system time (time spent in the kernel) or time spent waiting for something like a file descriptor. (But I have not checked what the case is with wget.) If you are curious, run time wget http://some.url or look into /proc/<wget-pid>/stat while it's running (assuming you are running on Linux).
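The two figures can also be compared numerically. The Python sketch below parses both durations, computes the gap, and recomputes the average rate from the shorter one. It assumes that "746M" means 746 * 1024 KB, which is an assumption about wget's units; the helper name `to_seconds` is made up for this example.

```python
import re

def to_seconds(text):
    # Parse durations such as "1h 56m 52s" into a number of seconds.
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([hms])", text))

wall = to_seconds("1h 56m 52s")      # total wall clock time
download = to_seconds("1h 52m 49s")  # time reported on the "Downloaded" line
gap = wall - download

# Average rate in KB/s, assuming 746M is 746 * 1024 KB.
rate_from_download = 746 * 1024 / download
rate_from_wall = 746 * 1024 / wall

print(f"gap: {gap // 60}m {gap % 60}s")
print(f"rate over download time: {rate_from_download:.0f} KB/s")
print(f"rate over wall clock time: {rate_from_wall:.0f} KB/s")
```

Under that assumption, the reported 113 KB/s matches the shorter time rather than the wall clock time, which suggests the shorter figure counts only the time spent transferring data, with the remaining four minutes or so spent on everything else.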

Resources