I'm deciding on a language for back-end work. I've looked at Go, Rust, and C++, and I thought I'd look at Crystal because it did reasonably well in some benchmarks. Speed is not the ultimate requirement; however, it is important. The syntax is equally important, and I'm not averse to Crystal's syntax. The simple benchmark that I wrote is only a small part of the evaluation, and it's also partly familiarisation.

I'm on Windows, so I'm running Crystal 0.35.1 on Win10 2004-19041.329 with WSL2 and Ubuntu 20.04 LTS. I don't know whether WSL2 has any impact on performance. The benchmark primarily uses integer arithmetic. Go, Rust, and C++ have almost equal performance to each other (on Win10). I've translated that code to Crystal, and it runs a fair bit slower than those three. Out of simple curiosity I also ran the code on Dart (on Win10), and it ran (very surprisingly) almost twice as fast as those three. I do understand that a simple benchmark does not say a lot. I notice from a recent post that Crystal is more efficient with floats than integers, but this benchmark was and is aimed at integers.
This is my first Crystal program, so I thought I should ask: are there any simple improvements I can make to the code for performance? I don't want to change the algorithm other than to correct errors, because all of the implementations use the same algorithm.
The code is as follows:
# ------ Prime-number counter. -----#
# Brian 25-Jun-2020 Program written - my first Crystal program.
# -------- METHOD TO CALCULATE APPROXIMATE SQRT ----------#
def fnCalcSqrt(iCurrVal) # Calculate approximate sqrt
  iPrevDiv = 0.to_i64
  iDiv = (iCurrVal // 10)
  if iDiv < 2
    iDiv = 2
  end
  while (true)
    begin
      iProd = (iDiv * iDiv)
    rescue vError
      puts "Error = #{vError}, iDiv = #{iDiv}, iCurrVal = #{iCurrVal}"
      exit
    end
    if iPrevDiv < iDiv
      iDiff = ((iDiv - iPrevDiv) // 2)
    else
      iDiff = ((iPrevDiv - iDiv) // 2)
    end
    iPrevDiv = iDiv
    if iProd < iCurrVal # iDiv IS TOO LOW #
      if iDiff < 1
        iDiff = 1
      end
      iDiv += iDiff
    else
      if iDiff < 2
        return iDiv
      end
      iDiv -= iDiff
    end
  end
end
# ---------- PROGRAM MAINLINE --------------#
print "\nCalculate Primes from 1 to selected number"
#iMills = uninitialized Int32 # CHANGED THIS BECAUSE IN --release DOES NOT WORK
iMills = 0.to_i32
while iMills < 1 || iMills > 100
  print "\nEnter the ending number of millions (1 to 100) : "
  sInput = gets
  if sInput == ""
    exit
  end
  iTemp = sInput.try &.to_i32?
  if !iTemp
    puts "Please enter a valid number"
    puts "iMills = #{iTemp}"
  elsif iTemp > 100 # > 100m
    puts "Invalid - too big must be from 1 to 100 (million)"
  elsif iTemp < 1
    puts "Invalid - too small - must be from 1 to 100 (million)"
  else
    iMills = iTemp
  end
end
#iCurrVal = 2 # THIS CAUSES ARITHMETIC OVERFLOW IN SQRT CALC.
iCurrVal = 2.to_i64
iEndVal = iMills * 1_000_000
iPrimeTot = 0
# ----- START OF PRIME NUMBER CALCULATION -----#
sEndVal = iEndVal.format(',', group: 3) # => eg. "10,000,000"
puts "Calculating number of prime numbers from 2 to #{sEndVal} ......"
vStartTime = Time.monotonic
while iCurrVal <= iEndVal
  if iCurrVal % 2 != 0 || iCurrVal == 2
    iSqrt = fnCalcSqrt(iCurrVal)
    tfPrime = true # INIT
    iDiv = 2
    while iDiv <= iSqrt
      if ((iCurrVal % iDiv) == 0)
        tfPrime = (iDiv == iCurrVal)
        break
      end
      iDiv += 1
    end
    if (tfPrime)
      iPrimeTot += 1
    end
  end
  iCurrVal += 1
end
puts "Elapsed time = #{Time.monotonic - vStartTime}"
puts "prime total = #{iPrimeTot}"
You need to compile with --release flag. By default, the Crystal compiler is focused on compilation speed, so you get a compiled program fast. This is particularly important during development. If you want a program that runs fast, you need to pass the --release flag which tells the compiler to take time for optimizations (that's handled by LLVM btw.).
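For example, assuming the source file is saved as primes.cr (the file name is just for illustration), a release build and run looks like this:

crystal build --release primes.cr
./primes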
You might also be able to shave off some time by using wrapping operators like &+ in locations where it's guaranteed that the result can't overflow. That skips some overflow checks.
A few other remarks:
Instead of 0.to_i64, you can just use an Int64 literal: 0_i64.
iMills = uninitialized Int32
Don't use uninitialized here. It's completely wrong: you want that variable to be initialized. There are some use cases for uninitialized in C bindings and some very specific, low-level implementations, but it should essentially never appear in regular code.
I notice from a recent post that Crystal is more efficient with floats than integers
Where did you read that?
Adding type prefixes to identifiers doesn't seem to be very useful in Crystal. The compiler already keeps track of the types. You as the developer shouldn't have to do that, too. I've never seen that before.
Related
Asserting conditions is a well-known way to build an application defensively. It lets you be more confident that your code will still work correctly the day after release, and also when another developer on your team changes it later.
There are two common ways to write an assertion in Lua code:
assert(1 > 0, "Assert that math works")
if 1 <= 0 then
  error("Assert that math doesn't work")
end
I would expect these two to be similar from a performance point of view, and the choice between them to be only a matter of style. But that turns out not to be true: assert takes longer on my machine:
function with_assert()
  for i=1,100000 do
    assert(1 < 0, 'Assert')
  end
end

function with_error()
  for i=1,100000 do
    if 1 > 0 then
      error('Error')
    end
  end
end
local t = os.clock()
pcall(with_assert)
print(os.clock() - t)
t = os.clock()
pcall(with_error)
print(os.clock() - t)
>> 3.1999999999999e-05
>> 1.5e-05
Why does it happen?
Look at the source code of assert and error: assert does some work and then calls error.
But the loop in your code serves no purpose: throwing an error during the first iteration means that the rest of the iterations don't run. Perhaps you meant to put the loop around the pcalls as below. Then you'll find hardly any difference between the times.
function with_assert()
  assert(1 < 0, 'Assert')
end

function with_error()
  if 1 > 0 then
    error('Error')
  end
end

function test(f)
  local t = os.clock()
  for i=1,100000 do
    pcall(f)
  end
  print(os.clock() - t)
end

test(with_assert)
test(with_error)
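For what it's worth, here is a rough Python analogue of the corrected measurement (not Lua, just to illustrate the loop-around-the-call structure; the absolute numbers will of course differ):

# Rough Python analogue: the failing call goes inside the timed loop,
# and each call is wrapped so the raised error is swallowed, like pcall.
import time

def with_assert():
    assert 1 < 0, 'Assert'

def with_error():
    if 1 > 0:
        raise RuntimeError('Error')

def test(f):
    t = time.perf_counter()
    for _ in range(100000):
        try:
            f()
        except (AssertionError, RuntimeError):
            pass
    print(time.perf_counter() - t)

test(with_assert)
test(with_error)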
I am taking this great Coursera course: https://www.coursera.org/learn/algorithmic-toolbox. In the fourth week, we have an assignment related to binary trees.
I think I did a good job. I wrote binary search code that solves this problem using recursion in Python 3. This is my code:
#python3
data_in_sequence = list(map(int,(input().split())))
data_in_keys = list(map(int,(input()).split()))

original_array = data_in_sequence[1:]
data_in_sequence = data_in_sequence[1:]
data_in_keys = data_in_keys[1:]

def binary_search(data_in_sequence,target):
    answer = 0
    sub_array = data_in_sequence
    #print("sub_array",sub_array)
    if not sub_array:
        # print("sub_array",sub_array)
        answer = -1
        return answer
    #print("target",target)
    mid_point_index = (len(sub_array)//2)
    #print("mid_point", sub_array[mid_point_index])
    beg_point_index = 0
    #print("beg_point_index",beg_point_index)
    end_point_index = len(sub_array)-1
    #print("end_point_index",end_point_index)
    if sub_array[mid_point_index]==target:
        #print ("final midpoint, ", sub_array[mid_point_index])
        #print ("original_array",original_array)
        #print("sub_array[mid_point_index]",sub_array[mid_point_index])
        #print ("answer",answer)
        answer = original_array.index(sub_array[mid_point_index])
        return answer
    elif target>sub_array[mid_point_index]:
        #print("target num higher than current midpoint")
        beg_point_index = mid_point_index+1
        sub_array=sub_array[beg_point_index:]
        end_point_index = len(sub_array)-1
        #print("sub_array",sub_array)
        return binary_search(sub_array,target)
    elif target<sub_array[mid_point_index]:
        #print("target num smaller than current midpoint")
        sub_array = sub_array[:mid_point_index]
        return binary_search(sub_array,target)
    else:
        return None

def bin_search_over_seq(data_in_sequence,data_in_keys):
    final_output = ""
    for key in data_in_keys:
        final_output = final_output + " " + str(binary_search(data_in_sequence,key))
    return final_output

print (bin_search_over_seq(data_in_sequence,data_in_keys))
I usually get the correct output. For instance, if I input:
5 1 5 8 12 13
5 8 1 23 1 11
I get the correct indexes of the sequences or (-1) if the term is not in sequence (first line):
2 0 -1 0 -1
However, my code does not pass, because it exceeds the expected running time:
Failed case #4/22: time limit exceeded (Time used: 13.47/10.00, memory used: 36696064/536870912.)
I think this happens not due to the implementation of my binary search (I think that part is right). Instead, I think it happens due to some inefficiency in a peripheral part of the code, like the way I am assembling the final output. However, the way I am presenting the final answer does not seem to be really "heavy"... I am lost.
Am I not seeing something? Is there another inefficiency I am missing? How can I solve this? Is it just a matter of presenting the final result in a faster way?
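For comparison, this is the kind of index-based version I have been wondering about (just a sketch, I have not submitted it): it moves lo/hi bounds instead of slicing, so no sub-lists get copied, and the result is already an index into the original sorted array.

def binary_search_by_index(arr, target):
    # keep bounds into the original list instead of building sub_array copies
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif target > arr[mid]:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

I also wondered whether building final_output with " ".join(...) instead of repeated concatenation matters, but I suspect the slicing is the bigger cost.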
I have to deal with very big data sets (point clouds, generally more than 30,000,000 points) using Matlab. I can read the ASCII data using the textscan function. After reading, I need to detect invalid data (points with 0,0,0 coordinates) and then do some mathematical operations on each point or each line in the data. My approach: first I read the data with textscan and then assign it to a matrix. Secondly, I use for loops to detect the invalid points and do the mathematical operations on each point or line. A sample of my code is shown below. According to Matlab's profile tool, textscan takes 37% and the line
transformed_list((i:i),(1:4)) = coordinate_list((i:i),(1:4))*t_matrix;
takes 35% of the total computation time.
I tried it with another point cloud (around 5,500,000 points) and the profile tool reported similar results. Is there a way to avoid the for loops, or is there another way to speed up this computation?
fileID = fopen('C:\Users\Mustafa\Desktop\ptx_all_data\dede5.ptx');
original_data = textscan(fileID,'%f %f %f %f %f %f %f', 'delimiter',' ');
fclose(fileID);
column = original_data{1}(1);
row = original_data{1}(2);
t_matrix = [original_data{1}(7) original_data{2}(7) original_data{3}(7) original_data{4}(7)
original_data{1}(8) original_data{2}(8) original_data{3}(8) original_data{4}(8)
original_data{1}(9) original_data{2}(9) original_data{3}(9) original_data{4}(9)
original_data{1}(10) original_data{2}(10) original_data{3}(10) original_data{4}(10)];
coordinate_list(:,1) = original_data{1}(11:length(original_data{1}));
coordinate_list(:,2) = original_data{2}(11:length(original_data{2}));
coordinate_list(:,3) = original_data{3}(11:length(original_data{3}));
coordinate_list(:,4) = 0;
coordinate_list(:,5) = original_data{4}(11:length(original_data{4}));
transformed_list = zeros(length(coordinate_list),5);
for i = 1:length(coordinate_list)
    if coordinate_list(i,1) == 0 && coordinate_list(i,2) == 0 && coordinate_list(i,3) == 0
        transformed_list(i,:) = NaN;
    else
        %transformed_list(i,:) = coordinate_list(i,:)*t_matrix;
        transformed_list((i:i),(1:4)) = coordinate_list((i:i),(1:4))*t_matrix;
        transformed_list(i,5) = coordinate_list(i,5);
    end
    %i
end
Thanks in advance
for loops with conditional statements like those will take ages to run. But what Matlab lacks in loop speed it makes up for with vectorization and indexing.
Let's try some logical indexing like this to solve the first step:
coordinate_list(coordinate_list(:,1) == 0 & ...
                coordinate_list(:,2) == 0 & ...
                coordinate_list(:,3) == 0, :) = nan;
And then vectorize the second statement:
transformed_list(:,(1:4)) = coordinate_list(:,(1:4))*t_matrix;
As EBH mentioned above, this might be a bit heavy on your RAM. If it's more than your computer can handle, ask yourself whether the coordinates really have to be doubles; maybe single precision will do. If that still isn't enough, try slicing the vector and performing the operation in parts.
A small example to give you an idea, because I had a 2-million-point cloud lying around here (in R2015a):
transformed_list = zeros(length(coordinate_list),5);
tic
for i = 1:length(coordinate_list)
    if coordinate_list(i,1) == 0 && coordinate_list(i,2) == 0 && coordinate_list(i,3) == 0
        transformed_list(i,:) = NaN;
    else
        %transformed_list(i,:) = coordinate_list(i,:)*t_matrix;
        transformed_list((i:i),(1:3)) = coordinate_list((i:i),(1:3))*t_matrix;
        transformed_list(i,5) = 1;
    end
    %i
end
toc
Returns Elapsed time is 10.928142 seconds.
transformed_list=coordinate_list;
tic
coordinate_list(coordinate_list(:,1) == 0 & ...
                coordinate_list(:,2) == 0 & ...
                coordinate_list(:,3) == 0, :) = nan;
transformed_list(:,(1:3)) = coordinate_list(:,(1:3))*t_matrix;
toc
Returns Elapsed time is 0.101696 seconds.
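If it helps to see the same idea outside Matlab, here is a rough NumPy sketch of it (Python, purely illustrative; the array sizes and the transform are made up):

# Rough NumPy analogue of the vectorized version: build a logical mask for the
# invalid points, do one matrix multiply for all rows, then blank the invalid rows.
import numpy as np

coordinate_list = np.random.rand(2_000_000, 5)   # hypothetical point cloud
coordinate_list[:10, :3] = 0                     # pretend the first 10 points are invalid
t_matrix = np.random.rand(4, 4)                  # hypothetical transform

invalid = np.all(coordinate_list[:, :3] == 0, axis=1)

transformed_list = np.empty_like(coordinate_list)
transformed_list[:, :4] = coordinate_list[:, :4] @ t_matrix
transformed_list[:, 4] = coordinate_list[:, 4]
transformed_list[invalid, :] = np.nan

On data of this size the whole computation is a handful of array operations, so memory, not loop overhead, becomes the limiting factor.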
Rather than read the whole file, you'd be better off using a loop with
fscanf(fileID, '%f', 7)
and processing input as you read it.
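The same "process as you read" idea sketched in Python (not Matlab; the handling of short header lines is only an assumption about the file layout):

# Rough sketch of incremental processing: handle one line of the file at a time
# instead of loading everything into memory first.
def process_points(path):
    with open(path) as f:
        for line in f:
            values = [float(x) for x in line.split()]
            if len(values) < 3:
                continue  # skip short header lines (assumption about the format)
            x, y, z = values[:3]
            # ... filter invalid points and apply the transform here ...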
I worked up working code to check whether a credit card number is valid using the Luhn algorithm:
class CreditCard
  def initialize(num)
    @@num_arr = num.to_s.split("")
    raise ArgumentError.new("Please enter exactly 16 digits for the credit card number.") if @@num_arr.length != 16
    @num = num
  end

  def check_card
    final_ans = 0
    i = 0
    while i < @@num_arr.length
      (i % 2 == 0) ? ans = (@@num_arr[i].to_i * 2) : ans = @@num_arr[i].to_i
      if ans > 9
        tens = ans / 10
        ones = ans % 10
        ans = tens + ones
      end
      final_ans += ans
      i += 1
    end
    final_ans % 10 == 0 ? true : false
  end
end
However, when I create driver test code to check it, it doesn't work:
card_1 = CreditCard.new(4563960122001999)
card_2 = CreditCard.new(4563960122001991)
p card_1.check_card
p card_2.check_card
I've been playing around with the code, and I noticed that the driver code works if I do this:
card_1 = CreditCard.new(4563960122001999)
p card_1.check_card
card_2 = CreditCard.new(4563960122001991)
p card_2.check_card
I tried to research why this is happening before posting. Logically, I don't see why the first driver code wouldn't work. Can someone please explain why this is happening?
Thanks in advance!!!
You are using a class variable, the one that starts with @@, which is shared among all instances of CreditCard as well as the class itself (and related classes). Therefore its value is overwritten every time you create a new instance. In your first example, by the time you call check_card, @@num_arr already holds the digits of the last card created, so both calls reflect the last instance (card_2).
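If it helps, here is a rough Python analogue of the same pitfall (not Ruby, just to show the shared-state effect):

# State stored on the class is shared by every instance, so constructing the
# second card overwrites what the first one stored, like Ruby's @@num_arr.
class CreditCard:
    def __init__(self, num):
        CreditCard.num_arr = list(str(num))

    def digits(self):
        return "".join(CreditCard.num_arr)

a = CreditCard(4563960122001999)
b = CreditCard(4563960122001991)
print(a.digits())   # prints ...1991, 'a' now reports b's number
print(b.digits())   # prints ...1991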
I've written a feature for my library Rubikon that displays a throbber (a spinning indicator, like the ones you may have seen in other console apps) as long as some other code is running.
To test this feature I capture the output of the throbber in a StringIO and compare it with the expected value. As the throbber is only displayed while the other code is running, the content of the IO gets longer the longer that code runs. In my tests I do a simple sleep 1 and should have a constant 1-second delay. This works most of the time, but sometimes (apparently due to external factors like heavy load on the CPU) it fails, because the code doesn't run for 1 second but for a bit more, so the throbber prints a few additional characters.
My question is: Is there any possibility to test such time critical features in Ruby?
From your github repository, I found this test for the Throbber class:
should 'work correctly' do
  ostream = StringIO.new
  thread = Thread.new { sleep 1 }
  throbber = Throbber.new(ostream, thread)
  thread.join
  throbber.join
  assert_equal " \b-\b\\\b|\b/\b", ostream.string
end
I'll assume that a throbber iterates over ['-', '\', '|', '/'], backspacing before each write, once per second. Consider the following test:
should 'work correctly' do
  ostream = StringIO.new
  started_at = Time.now
  ended_at = nil
  thread = Thread.new { sleep 1; ended_at = Time.now }
  throbber = Throbber.new(ostream, thread)
  thread.join
  throbber.join
  duration = ended_at - started_at
  iterated_chars = " -\\|/"
  expected = ""
  if duration >= 1
    # After n seconds we should have n copies of " -\\|/", excluding \b for now
    expected << iterated_chars * duration.to_i
  end
  # Next append the characters we'd get from working for fractions of a second:
  remainder = duration - duration.to_i
  expected << iterated_chars[0..((iterated_chars.length*remainder).to_i)] if remainder > 0.0
  expected = expected.split('').join("\b") + "\b"
  assert_equal expected, ostream.string
end
The last assignment of expected is a bit unpleasant, but I made the assumption that the throbber would write character/backspace pairs atomically. If this is not true, you should be able to insert the \b escape sequence into the iterated_chars string and remove the last assignment entirely.
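In case the construction of expected is hard to follow inside the test, here is the same logic restated as a small Python function (illustration only, not part of Rubikon):

# Derive the expected throbber output from the measured duration: whole seconds
# give full cycles, a fractional second gives a partial cycle, and every printed
# character is followed by a backspace.
def expected_output(duration, chars=" -\\|/"):
    s = chars * int(duration)
    frac = duration - int(duration)
    if frac > 0:
        s += chars[: int(len(chars) * frac) + 1]
    return "\b".join(s) + "\b"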
This question is similar (I think, although I'm not completely sure) to this one:
Only a real-time operating system can give you such precision. You can assume Thread.Sleep has a precision of about 20 ms, so you could, in theory, sleep until the desired time minus about 20 ms and THEN spin for those 20 ms, but you'll have to waste them. And even that doesn't guarantee that you'll get real-time results; the scheduler might take your thread out just when it was about to execute the RELEVANT part (just after spinning).
The problem is not Ruby (possibly; I'm no expert in Ruby), the problem is the real-time capabilities of your operating system.
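You can see the effect with a trivial measurement; a nominal one-second sleep almost never lasts exactly one second (Python shown here just because it is short, Ruby behaves the same way):

# Measure how long a nominal 1-second sleep actually takes.
import time
start = time.perf_counter()
time.sleep(1)
print(time.perf_counter() - start)   # e.g. 1.0007; the exact overshoot depends on the OS scheduler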