I'm deciding on a language for back-end work. I've looked at Go, Rust, and C++, and I thought I'd look at Crystal because it did reasonably well in some benchmarks. Speed is not the ultimate requirement, but it is important; the syntax matters just as much, and I'm not averse to Crystal's. The simple benchmark I wrote is only a small part of the evaluation, and it's also partly familiarisation.
I'm on Windows, so I'm running Crystal 0.35.1 on Win10 2004 (19041.329) under WSL2 with Ubuntu 20.04 LTS. I don't know whether WSL2 has any impact on performance. The benchmark primarily uses integer arithmetic. Go, Rust, and C++ perform almost identically to each other (on Win10). I translated that code to Crystal, and it runs a fair bit slower than those three. Out of simple curiosity I also ran the code on Dart (on Win10), and it ran (very surprisingly) almost twice as fast as those three. I do understand that a simple benchmark does not say a lot. I noticed from a recent post that Crystal is more efficient with floats than integers, but this benchmark was and is aimed at integers.
This is my first Crystal program, so I thought I should ask: are there any simple improvements I can make to the code for performance? I don't want to change the algorithm other than to correct errors, because all the languages are using the same algorithm.
The code is as follows:
# ------ Prime-number counter. -----#
# Brian 25-Jun-2020 Program written - my first Crystal program.
# -------- METHOD TO CALCULATE APPROXIMATE SQRT ----------#
def fnCalcSqrt(iCurrVal) # Calculate approximate sqrt
iPrevDiv = 0.to_i64
iDiv = (iCurrVal // 10)
if iDiv < 2
iDiv = 2
end
while (true)
begin
iProd = (iDiv * iDiv)
rescue vError
puts "Error = #{vError}, iDiv = #{iDiv}, iCurrVal = #{iCurrVal}"
exit
end
if iPrevDiv < iDiv
iDiff = ((iDiv - iPrevDiv) // 2)
else
iDiff = ((iPrevDiv - iDiv) // 2)
end
iPrevDiv = iDiv
if iProd < iCurrVal # iDiv IS TOO LOW #
if iDiff < 1
iDiff = 1
end
iDiv += iDiff
else
if iDiff < 2
return iDiv
end
iDiv -= iDiff
end
end
end
# ---------- PROGRAM MAINLINE --------------#
print "\nCalculate Primes from 1 to selected number"
#iMills = uninitialized Int32 # CHANGED THIS BECAUSE IN --release DOES NOT WORK
iMills = 0.to_i32
while iMills < 1 || iMills > 100
print "\nEnter the ending number of millions (1 to 100) : "
sInput = gets
if sInput == ""
exit
end
iTemp = sInput.try &.to_i32?
if !iTemp
puts "Please enter a valid number"
puts "iMills = #{iTemp}"
elsif iTemp > 100 # > 100m
puts "Invalid - too big must be from 1 to 100 (million)"
elsif iTemp < 1
puts "Invalid - too small - must be from 1 to 100 (million)"
else
iMills = iTemp
end
end
#iCurrVal = 2 # THIS CAUSES ARITHMETIC OVERFLOW IN SQRT CALC.
iCurrVal = 2.to_i64
iEndVal = iMills * 1_000_000
iPrimeTot = 0
# ----- START OF PRIME NUMBER CALCULATION -----#
sEndVal = iEndVal.format(',', group: 3) # => eg. "10,000,000"
puts "Calculating number of prime numbers from 2 to #{sEndVal} ......"
vStartTime = Time.monotonic
while iCurrVal <= iEndVal
if iCurrVal % 2 != 0 || iCurrVal == 2
iSqrt = fnCalcSqrt(iCurrVal)
tfPrime = true # INIT
iDiv = 2
while iDiv <= iSqrt
if ((iCurrVal % iDiv) == 0)
tfPrime = (iDiv == iCurrVal);
break;
end
iDiv += 1
end
if (tfPrime)
iPrimeTot+=1;
end
end
iCurrVal += 1
end
puts "Elapsed time = #{Time.monotonic - vStartTime}"
puts "prime total = #{iPrimeTot}"
You need to compile with the --release flag. By default, the Crystal compiler is focused on compilation speed, so you get a compiled program fast; this is particularly important during development. If you want a program that runs fast, you need to pass the --release flag, which tells the compiler to take time for optimizations (that's handled by LLVM, by the way).
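For example, assuming the source file is saved as primes.cr (the file name is just illustrative):
crystal build primes.cr --release
./primes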
You might also be able to shave off some time by using wrapping operators like &+ in locations where it's guaranteed that the result can't overflow. That skips the overflow checks.
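For example, the two counters in the main loop can never overflow for inputs up to 100 million, so they could be written as (a sketch; worth measuring rather than assuming a win):
iDiv &+= 1        # the divisor never exceeds the square root, so wrapping is safe
iPrimeTot &+= 1   # bounded by the number of candidates, far below the Int32 limit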
A few other remarks:
Instead of 0.to_i64, you can just use an Int64 literal: 0_i64.
iMills = uninitialized Int32
Don't use uninitialized here; it's completely wrong. You want that variable to be initialized. There are some use cases for uninitialized in C bindings and some very specific low-level implementations, but it should never appear in regular code.
I notice from a recent post that Crystal is more efficient with floats than integers
Where did you read that?
Adding type prefixes to identifiers (Hungarian notation) doesn't seem to be very useful in Crystal. The compiler already keeps track of the types; you as the developer shouldn't have to do that too. I've never seen that style in Crystal code before.
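For comparison, an idiomatic signature for the helper above would be plain snake_case, optionally with explicit types (a sketch):
def calc_sqrt(curr_val : Int64) : Int64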
My project involves computing a map in parallel using the pmap function from Julia's Distributed package.
Mapping a given element could take a few seconds, or it could take essentially forever. I want a timeout or time limit for an individual map task/computation to complete.
If a map task finishes in time, great, return the result of the computation. If the task doesn't complete by the time limit, stop computation when the time limit has been reached, and return some value or message indicating a timeout occurred.
A minimal example follows. First, modules are imported, and then worker processes are launched:
num_procs = 1
using Distributed
if num_procs > 1
# the main process also counts as a worker for `pmap`,
# so only num_procs-1 extra processes are needed:
addprocs(num_procs-1)
end
Next, the mapping task is defined for all the worker processes. The mapping task should time out after 1 second:
@everywhere import Random
@everywhere begin
"""
Compute stuff for `wait_time` seconds, and return `wait_time`.
If `timeout` seconds elapses, stop computation and return something else.
"""
function waitForTimeUnlessTimeout(wait_time, timeout=1)
# < Insert some sort of timeout code? >
# This block of code simulates a long computation.
# (pretend the computation time is unknown)
t0 = time()  # start of the computation
x = 0
while time()-t0 < wait_time
x += Random.rand() - 0.5
end
# computation completed before time limit. Return wait_time.
round(wait_time, digits=2)
end
end
The function that executes the parallel map (pmap) is defined on the main process. Each map task randomly takes up to 2 seconds to complete, but should time out after 1 second.
function myParallelMapping(num_tasks = 20, max_runtime=2)
# random task runtimes between 0 and max_runtime
runtimes = Random.rand(num_tasks) * max_runtime
# return the parallel computation of the mapping tasks
pmap((runtime)->waitForTimeUnlessTimeout(runtime), runtimes)
end
print(myParallelMapping())
How should this time-limited parallel map be implemented?
You could put something like this inside your pmap body
pmap(runtimes) do runtime
t0 = time()
task = @async waitForTimeUnlessTimeout(runtime)
while !istaskdone(task) && time()-t0 < time_limit
sleep(1)
end
istaskdone(task) && (return fetch(task))
error("time over")
end
Also note that (runtime)->waitForTimeUnlessTimeout(runtime) is the same as just waitForTimeUnlessTimeout.
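That is, the call could be shortened to:
pmap(waitForTimeUnlessTimeout, runtimes)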
Following @Fredrik Bagge's very helpful answer, here is the full working example implementation with some extra explanation.
num_procs = 8
using Distributed
if num_procs > 1
addprocs(num_procs-1)
end
@everywhere import Random
@everywhere begin
function waitForTime(wait_time)
# This code block simulates a long computation.
# Pretend the computation time is unknown.
t0 = time()
x = 0
while time()-t0 < wait_time
x += Random.rand() - 0.5
yield() # CRITICAL to release computation to check if task is done.
# If you comment out yield(), you will see the timeout doesn't work!
end
return round(wait_time, digits=2)
end
end
function myParallelMapping(num_tasks = 16, max_runtime=2, time_limit=1)
# random task runtimes between 0 and max_runtime
runtimes = Random.rand(num_tasks) * max_runtime
# parallel compute the mapping tasks. See "do block" in
# the Julia documentation, it's just syntactic sugar.
return pmap(runtimes) do runtime
t0 = time()
task = @async waitForTime(runtime)
while !istaskdone(task) && time()-t0 < time_limit
# releases computation to waitForTime
sleep(0.1)
# nothing past here will run until waitForTime calls yield()
# *and* 0.1 seconds have passed.
end
# equal to if istaskdone(task); return fetch(task); end
istaskdone(task) && (return fetch(task))
return "TimeOut"
# `return error("TimeOut")` halts pmap unless pmap is
# given an error handler argument. See pmap documentation.
end
end
The output is
julia> print(myParallelMapping())
Any["TimeOut", "TimeOut", 0.33, 0.35, 0.56, 0.41, 0.08, 0.14, 0.72,
"TimeOut", "TimeOut", "TimeOut", 0.52, "TimeOut", 0.33, "TimeOut"]
Note that there are two tasks per process in this example. The original task (the "time checker") checks every 0.1 seconds whether the other task has completed its computation. The other task (created with @async) is computing something, periodically calling yield() to release control to the time checker; if it doesn't call yield(), the time check can never run.
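Here is a minimal standalone sketch of that cooperation, independent of pmap. The busy task below never finishes, so the checker times out, but only because the busy loop yields:
t0 = time()
busy = @async begin
    while true
        yield()  # without this, control never returns to the checker and it hangs
    end
end
while !istaskdone(busy) && time() - t0 < 1
    sleep(0.1)
end
println(istaskdone(busy) ? "finished" : "TimeOut")  # prints "TimeOut" after ~1 second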
I'm a bit of a beginner at coding, and my English isn't great; I hope I'll be able to make my question clear.
I have a for-loop in my code with two ifs in it. Each if checks whether something is true or false, and if it's true, I delete an element of the table I'm looping over, like this:
for n=#Test,1, -1 do
if Test[n].delete == true then
table.remove(Test, n )
end
if Test[n].y > 0 then
table.remove(Test, n )
end
end
Something like that. My problem is: if the first if causes Test[n] to be deleted, then the game crashes at the next if because Test[n] doesn't exist anymore. I solved the problem by making two for-loops, one for each if.
But I saw someone do it without two for-loops, and it doesn't crash; I tried to figure out what was different but I can't find it.
If someone could spot what is wrong with what I wrote, I would be thankful!
So here is the part of my code that causes the problem: on line 9, if the conditions are met, I call table.remove(ListeTirs, n), and then on line 17, when the code tries to access that element again to test it, it crashes:
for n=#ListeTirs,1, -1 do
if ListeTirs[n].Type == "Gentils"
then local nAliens
for nAliens=#ListeAliens,1,-1 do
if
Collide(ListeTirs[n], ListeAliens[nAliens]) == true
then
ListeTirs[n].Supprime = true
table.remove(ListeTirs, n )
ListeAliens[nAliens].Supprime = true
table.remove(ListeAliens, nAliens)
end
end
end
if
ListeTirs[n].y < 0 - ListeTirs[n].HauteurMoitie or ListeTirs[n].y > Hauteur + ListeTirs[n].HauteurMoitie or ListeTirs[n].x > Largeur + ListeTirs[n].LargeurMoitie or ListeTirs[n].x < 0 - ListeTirs[n].LargeurMoitie
then
ListeTirs[n].Supprime = true
table.remove(ListeTirs, n)
end
end
I hope it's clear. I could post the whole code, but I don't feel it's necessary; if it is, I will add it.
Thank you :)
for n=#Test,1, -1 do
local should_be_removed
-- just mark for deletion
if Test[n].delete == true then
should_be_removed = true
end
-- just mark for deletion
if Test[n].y > 0 then
should_be_removed = true
end
-- actually delete (at the very end of the loop body)
if should_be_removed then
table.remove(Test, n )
end
end
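An equivalent variant uses elseif to short-circuit the second test, so a removed element is never re-examined (a sketch of the same idea):
for n = #Test, 1, -1 do
    if Test[n].delete == true then
        table.remove(Test, n)
    elseif Test[n].y > 0 then  -- only evaluated when Test[n] was not just removed
        table.remove(Test, n)
    end
end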
Following my previous question : Unable to implement MPI_Intercomm_create
The problem with MPI_INTERCOMM_CREATE has been solved. But when I try to implement a basic send/receive operation between process 0 of color 0 (global rank 0) and process 0 of color 1 (global rank 2), the code just hangs after printing the received buffer.
The code:
program hello
include 'mpif.h'
implicit none
integer tag,ierr,rank,numtasks,color,new_comm,inter1,inter2
integer sendbuf,recvbuf,stat(MPI_STATUS_SIZE)
tag = 22
sendbuf = 222
call MPI_Init(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD,numtasks,ierr)
if (rank < 2) then
color = 0
else
color = 1
end if
call MPI_COMM_SPLIT(MPI_COMM_WORLD,color,rank,new_comm,ierr)
if (color .eq. 0) then
if (rank == 0) print*,' 0 here'
call MPI_INTERCOMM_CREATE(new_comm,0,MPI_Comm_world,2,tag,inter1,ierr)
call mpi_send(sendbuf,1,MPI_INT,2,tag,inter1,ierr)
!local_comm,local leader,peer_comm,remote leader,tag,new,ierr
else if(color .eq. 1) then
if(rank ==2) print*,' 2 here'
call MPI_INTERCOMM_CREATE(new_comm,2,MPI_COMM_WORLD,0,tag,inter2,ierr)
call mpi_recv(recvbuf,1,MPI_INT,0,tag,inter2,stat,ierr)
print*,recvbuf
end if
end
Communication with intercommunicators is not well understood by most users, and examples are not as numerous as for other MPI operations. You can find a good explanation by following this link.
Now, there are two things to remember:
1) Communication on an intercommunicator always goes from one group to the other group. When sending, the rank of the destination is its local rank in the remote group's communicator. When receiving, the rank of the sender is its local rank in the remote group's communicator.
2) Point-to-point communication (the MPI_Send and MPI_Recv family) is between one sender and one receiver. In your case, everyone in color 0 is sending and everyone in color 1 is receiving; however, if I understood your problem correctly, you want process 0 of color 0 to send something to process 0 of color 1.
The sending code should be something like this:
call MPI_COMM_RANK(inter1,irank,ierr)
if(irank==0)then
call mpi_send(sendbuf,1,MPI_INT,0,tag,inter1,ierr)
end if
The receiving code should look like:
call MPI_COMM_RANK(inter2,irank,ierr)
if(irank==0)then
call mpi_recv(recvbuf,1,MPI_INT,0,tag,inter2,stat,ierr)
print*,'rec buff = ', recvbuf
end if
In the sample code, there is a new variable irank that I use to query the rank of each process in the intercommunicator; that is the rank of the process in its local group. So you will have two processes of rank 0, one in each group, and so on.
It is important to emphasize what other commenters on your post are saying: when building a program these days, use modern constructs like use mpi instead of include 'mpif.h' (see the comment from Vladimir F). Another piece of advice from your previous question was to use rank 0 as the remote leader in both cases. If I combine those two ideas, your program can look like this:
program hello
use mpi !instead of include 'mpif.h'
implicit none
integer :: tag,ierr,rank,numtasks,color,new_comm,inter1,inter2
integer :: sendbuf,recvbuf,stat(MPI_STATUS_SIZE)
integer :: irank
!
tag = 22
sendbuf = 222
!
call MPI_Init(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD,numtasks,ierr)
!
if (rank < 2) then
color = 0
else
color = 1
end if
!
call MPI_COMM_SPLIT(MPI_COMM_WORLD,color,rank,new_comm,ierr)
!
if (color .eq. 0) then
call MPI_INTERCOMM_CREATE(new_comm,0,MPI_Comm_world,2,tag,inter1,ierr)
!
call MPI_COMM_RANK(inter1,irank,ierr)
if(irank==0)then
call mpi_send(sendbuf,1,MPI_INT,0,tag,inter1,ierr)
end if
!
else if(color .eq. 1) then
call MPI_INTERCOMM_CREATE(new_comm,0,MPI_COMM_WORLD,0,tag,inter2,ierr)
call MPI_COMM_RANK(inter2,irank,ierr)
if(irank==0)then
call mpi_recv(recvbuf,1,MPI_INT,0,tag,inter2,stat,ierr)
if(ierr/=MPI_SUCCESS)print*,'Error in rec '
print*,'rec buff = ', recvbuf
end if
end if
!
call MPI_finalize(ierr)
end program hello
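To try it, compile and launch with at least three processes so that global rank 2 exists (the file and executable names here are just illustrative):
mpif90 hello.f90 -o hello
mpirun -np 4 ./hello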
I've written a feature for my library Rubikon that displays a throbber (a spinning indicator, as you may have seen in other console apps) as long as some other code is running.
To test this feature I capture the output of the throbber in a StringIO and compare it with the expected value. As the throbber is only displayed while the other code is running, the content of the IO gets longer the longer the code runs. In my tests I do a simple sleep 1, so there should be a constant one-second delay. This works most of the time, but sometimes (apparently due to external factors like heavy CPU load) it fails, because the code doesn't run for exactly 1 second but a bit longer, so the throbber prints a few additional characters.
My question is: Is there any possibility to test such time critical features in Ruby?
From your github repository, I found this test for the Throbber class:
should 'work correctly' do
ostream = StringIO.new
thread = Thread.new { sleep 1 }
throbber = Throbber.new(ostream, thread)
thread.join
throbber.join
assert_equal " \b-\b\\\b|\b/\b", ostream.string
end
I'll assume that the throbber iterates over ['-', '\', '|', '/'] once per second, writing a backspace after each character. Consider the following test:
should 'work correctly' do
ostream = StringIO.new
started_at = Time.now
ended_at = nil
thread = Thread.new { sleep 1; ended_at = Time.now }
throbber = Throbber.new(ostream, thread)
thread.join
throbber.join
duration = ended_at - started_at
iterated_chars = " -\\|/"
expected = ""
if duration >= 1
# After n seconds we should have n copies of " -\\|/", excluding \b for now
expected << iterated_chars * duration.to_i
end
# Next append the characters we'd get from working for fractions of a second:
remainder = duration - duration.to_i
expected << iterated_chars[0..((iterated_chars.length*remainder).to_i)] if remainder > 0.0
expected = expected.split('').join("\b") + "\b"
assert_equal expected, ostream.string
end
The last assignment of expected is a bit unpleasant, but I made the assumption that the throbber writes character/backspace pairs atomically. If that is not true, you should be able to insert the \b escape sequence into the iterated_chars string and remove the last assignment entirely.
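That variant might look roughly like this (a sketch under the same five-frames-per-second assumption; note the slice now advances two characters per frame because the backspaces are baked in):
iterated_chars = " \b-\b\\\b|\b/\b"  # each frame is a character plus a backspace
expected = iterated_chars * duration.to_i
remainder = duration - duration.to_i
frames = (5 * remainder).to_i        # whole frames drawn in the fractional second
expected << iterated_chars[0, 2 * frames] if frames > 0
assert_equal expected, ostream.string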
This question is similar (I think, although I'm not completely sure) to this one:
Only a real-time operating system can give you such precision. You can assume Thread.Sleep has a precision of about 20 ms, so you could, in theory, sleep until the desired time minus about 20 ms and then spin for those 20 ms, but you'll have to waste them. And even that doesn't guarantee real-time results; the scheduler might take your thread out just when it was about to execute the relevant part (just after spinning).
The problem is not Ruby (possibly; I'm no expert in Ruby), the problem is the real-time capabilities of your operating system.