Implement a basic datatype in BinData - ruby

I am trying to implement binary16 encoding for a half-precision floating point type.
The code is working except for one detail: It returns an object with the three properties (sign, exponent, fraction), but I would like it to return the float. Right now, I have to call to_f to get to the float. I would like this to work the way the integrated int and float classes work.
Here's my code:
require 'bindata'
class Binary16Be < BinData::Record
# naming based on https://en.wikipedia.org/wiki/Half-precision_floating-point_format
bit1 :sign_bit
bit5 :exponent
bit10 :fraction
def sign
sign_bit.zero? ? 1 : -1
end
def to_f
if exponent == 31 # special value in binary16 - all exponent bits are 1
return fraction.zero? ? (sign * Float::INFINITY) : Float::NAN
end
sign * 2**(exponent - 15) * (1.0 + fraction.to_f / 1024)
end
end
What I would like:
Binary16Be.read("\x3C\x00")
=> 1.0
What happens right now:
Binary16Be.read("\x3C\x00")
{:sign_bit=>0, :exponent=>15, :fraction=>0}

(this is not actually my own answer, I received this from the gem's author. Made slight alterations to his answer to fit this Q&A format a little better.)
The procedure is described in the bindata Wiki / Primitive Types
In your case:
Subclass Primitive instead of Record
Rename #to_f to #get
Implement #set
The converted code
class Binary16Be < BinData::Primitive
# naming based on
# https://en.wikipedia.org/wiki/Half-precision_floating-point_format
bit1 :sign_bit
bit5 :exponent
bit10 :fraction
def sign
sign_bit.zero? ? 1 : -1
end
def get
if exponent == 31 # special value in binary16 - all exponent bits are 1
return fraction.zero? ? (sign * Float::INFINITY) : Float::NAN
end
sign * 2**(exponent - 15) * (1.0 + fraction.to_f / 1024)
end
def set(val)
self.sign = (val >= 0.0)
self.fraction = ... # TODO implement
self.exponent = ... # TODO implement
end
end

Related

Can this Crystal benchmark code be improved significantly?

I'm deciding on a language to use for back-end use. I've looked at Go, Rust, C++, and I thought I'd look at Crystal because it did reasonably well in some benchmarks. Speed is not the ultimate requirement, however speed is important. The syntax is equally important, and I'm not averse to the Crystal syntax. The simple benchmark that I wrote is only a small part of the evaluation and it's also partly familiarisation. I'm using Windows, so I'm using Crystal 0.35.1 on Win10 2004-19041.329 with WSL2 and Ubuntu 20.04 LTS. I don't know if WSL2 has any impact on performance. The benchmark is primarily using integer arithmetic. Go, Rust, and C++ have almost equal performance to each other (on Win10). I've translated that code to Crystal, and it runs a fair bit slower than those three. Out of simple curiosity I also ran the code on Dart (on Win10), and it ran (very surprisingly) almost twice as fast as those three. I do understand that a simple benchmark does not say a lot. I notice from a recent post that Crystal is more efficient with floats than integers, however this was and is aimed at integers.
This is my first Crystal program, so I thought I should ask - is there any simple improvements I can make to the code for performance? I don't want to improve the algorithm other than to correct errors, because all are using this algorithm.
The code is as follows:
# ------ Prime-number counter. -----#
# Brian 25-Jun-2020 Program written - my first Crystal program.
# -------- METHOD TO CALCULATE APPROXIMATE SQRT ----------#
def fnCalcSqrt(iCurrVal) # Calculate approximate sqrt
iPrevDiv = 0.to_i64
iDiv = (iCurrVal // 10)
if iDiv < 2
iDiv = 2
end
while (true)
begin
iProd = (iDiv * iDiv)
rescue vError
puts "Error = #{vError}, iDiv = #{iDiv}, iCurrVal = #{iCurrVal}"
exit
end
if iPrevDiv < iDiv
iDiff = ((iDiv - iPrevDiv) // 2)
else
iDiff = ((iPrevDiv - iDiv) // 2)
end
iPrevDiv = iDiv
if iProd < iCurrVal # iDiv IS TOO LOW #
if iDiff < 1
iDiff = 1
end
iDiv += iDiff
else
if iDiff < 2
return iDiv
end
iDiv -= iDiff
end
end
end
# ---------- PROGRAM MAINLINE --------------#
print "\nCalculate Primes from 1 to selected number"
#iMills = uninitialized Int32 # CHANGED THIS BECAUSE IN --release DOES NOT WORK
iMills = 0.to_i32
while iMills < 1 || iMills > 100
print "\nEnter the ending number of millions (1 to 100) : "
sInput = gets
if sInput == ""
exit
end
iTemp = sInput.try &.to_i32?
if !iTemp
puts "Please enter a valid number"
puts "iMills = #{iTemp}"
elsif iTemp > 100 # > 100m
puts "Invalid - too big must be from 1 to 100 (million)"
elsif iTemp < 1
puts "Invalid - too small - must be from 1 to 100 (million)"
else
iMills = iTemp
end
end
#iCurrVal = 2 # THIS CAUSES ARITHMETIC OVERFLOW IN SQRT CALC.
iCurrVal = 2.to_i64
iEndVal = iMills * 1_000_000
iPrimeTot = 0
# ----- START OF PRIME NUMBER CALCULATION -----#
sEndVal = iEndVal.format(',', group: 3) # => eg. "10,000,000"
puts "Calculating number of prime numbers from 2 to #{sEndVal} ......"
vStartTime = Time.monotonic
while iCurrVal <= iEndVal
if iCurrVal % 2 != 0 || iCurrVal == 2
iSqrt = fnCalcSqrt(iCurrVal)
tfPrime = true # INIT
iDiv = 2
while iDiv <= iSqrt
if ((iCurrVal % iDiv) == 0)
tfPrime = (iDiv == iCurrVal);
break;
end
iDiv += 1
end
if (tfPrime)
iPrimeTot+=1;
end
end
iCurrVal += 1
end
puts "Elapsed time = #{Time.monotonic - vStartTime}"
puts "prime total = #{iPrimeTot}"
You need to compile with --release flag. By default, the Crystal compiler is focused on compilation speed, so you get a compiled program fast. This is particularly important during development. If you want a program that runs fast, you need to pass the --release flag which tells the compiler to take time for optimizations (that's handled by LLVM btw.).
You might also be able to shave off some time by using wrapping operators like &+ in location where it's guaranteed that the result can'd overflow. That skips some overflow checks.
A few other remarks:
Instead of 0.to_i64, you can just use a Int64 literal: 0_i64.
iMills = uninitialized Int32
Don't use uninitialized here. It's completely wrong. You want that variable to be initialized. There are some use cases for uninitialized in C bindings and some very specific, low-level implementations. It should never be used in most regular code.
I notice from a recent post that Crystal is more efficient with floats than integers
Where did you read that?
Adding type prefixes to identifiers doesn't seem to be very useful in Crystal. The compiler already keeps track of the types. You as the developer shouldn't have to do that, too. I've never seen that before.

Improving an algorithm for substring search when reading ZIP files

So I have a ZIP reader library, and I read ZIP files by first figuring out where the EOCD record is (the standard way "from the tail"). I have to look for a pattern that is roughly this:
4byte_magic_number, fixed_n_bytes, 2_bytes_of_comment_size, comment
The bytesize of comment is provided in the 2_bytes_of_comment_size. Just scanning for the magic number is insufficient, because I eager-read a substantial portion at the tail of the file - basically the maximum size the ZIP EOCD record can be, and then look for this pattern in there.
So far, I came up with this
def locate_eocd_signature(in_str)
# We have to scan from the _very_ tail. We read the very minimum size
# the EOCD record can have (up to and including the comment size), using
# a sliding window. Once our end offset matches the comment size we found our
# EOCD marker.
eocd_signature_int = 0x06054b50
unpack_pattern = 'VvvvvVVv'
minimum_record_size = 22
end_location = minimum_record_size * -1
loop do
# If the window is nil, we have rolled off the start of the string, nothing to do here.
# We use negative values because if we used positive slice indices
# we would have to detect the rollover ourselves
break unless window = in_str[end_location, minimum_record_size]
window_location = in_str.bytesize + end_location
unpacked = window.unpack(unpack_pattern)
# If we found the signature, pick up the comment size, and check if the size of the window
# plus that comment size is where we are in the string. If we are - bingo.
if unpacked[0] == 0x06054b50 && comment_size = unpacked[-1]
assumed_eocd_location = in_str.bytesize - comment_size - minimum_record_size
# if the comment size is where we should be at - we found our EOCD
return assumed_eocd_location if assumed_eocd_location == window_location
end
end_location -= 1 # Shift the window back, by one byte, and try again.
end
end
but it just screams ugly at me. Is there a better way to do something like this? Is there a pack specifier that says "all the bytes in binary until the the end of the string" that I do not know of? Then I could tack that onto the end of the pack specifier for example... A bit at loss here.
In the end I opted for the following optimization. First, I made a method for finding all the indices of a given substring in a string - there is no stdlib builtin for this.
def all_indices_of_substr_in_str(of_substring, in_string)
last_i = 0
found_at_indices = []
while last_i = in_string.index(of_substring, last_i)
found_at_indices << last_i
last_i += of_substring.bytesize
end
found_at_indices
end
Then, we use it to "latch" onto the offsets in our buffer where our signature was found.
def locate_eocd_signature(in_str)
eocd_signature = 0x06054b50
eocd_signature_str = [eocd_signature].pack('V')
unpack_pattern = 'VvvvvVVv'
minimum_record_size = 22
str_size = in_str.bytesize
indices = all_indices_of_substr_in_str(eocd_signature_str, in_str)
indices.each do |check_at|
maybe_record = in_str[check_at..str_size]
# If the record is smaller than the minimum - we will never recover anything
break if maybe_record.bytesize < minimum_record_size
# Now we check if the record ends with the combination
# of the comment size and an arbitrary byte string of that size.
# If it does - we found our match
*_unused, comment_size = maybe_record.unpack(unpack_pattern)
if (maybe_record.bytesize - minimum_record_size) == comment_size
return check_at # Found the EOCD marker location
end
end
# If we haven't caught anything, return nil deliberately instead of returning the last statement
nil
end

How to unpack 7-bits at a time in ruby?

I'm trying to format a UUIDv4 into a url friendly string. The typical format in base16 is pretty long and has dashes:
xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
To avoid dashes and underscores I was going to use base58 (like bitcoin does) so each character fully encode sqrt(58).floor = 7 bits.
I can pack the uuid into binary with:
[ uuid.delete('-') ].pack('H*')
To get 8-bit unsigned integers its:
binary.unpack('C*')
How can i unpack every 7-bits into 8-bit unsigned integers? Is there a pattern to scan 7-bits at a time and set the high bit to 0?
require 'base58'
uuid ="123e4567-e89b-12d3-a456-426655440000"
Base58.encode(uuid.delete('-').to_i(16))
=> "3fEgj34VWmVufdDD1fE1Su"
and back again
Base58.decode("3fEgj34VWmVufdDD1fE1Su").to_s(16)
=> "123e4567e89b12d3a456426655440000"
A handy pattern to reconstruct the uuid format from a template
template = 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
src = "123e4567e89b12d3a456426655440000".each_char
template.each_char.reduce(''){|acc, e| acc += e=='-' ? e : src.next}
=> "123e4567-e89b-12d3-a456-426655440000"
John La Rooy's answer is great, but I just wanted to point out how simple the Base58 algorithm is because I think it's neat. (Loosely based on the base58 gem, plus bonus original int_to_uuid function):
ALPHABET = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ".chars
BASE = ALPHABET.size
def base58_to_int(base58_val)
base58_val.chars
.reverse_each.with_index
.reduce(0) do |int_val, (char, index)|
int_val + ALPHABET.index(char) * BASE ** index
end
end
def int_to_base58(int_val)
''.tap do |base58_val|
while int_val > 0
int_val, mod = int_val.divmod(BASE)
base58_val.prepend ALPHABET[mod]
end
end
end
def int_to_uuid(int_val)
base16_val = int_val.to_s(16)
[ 8, 4, 4, 4, 12 ].map do |n|
base16_val.slice!(0...n)
end.join('-')
end
uuid = "123e4567-e89b-12d3-a456-426655440000"
int_val = uuid.delete('-').to_i(16)
base58_val = int_to_base58(int_val)
int_val2 = base58_to_int(base58_val)
uuid2 = int_to_uuid(int_val2)
printf <<END, uuid, int_val, base_58_val, int_val2, uuid2
Input UUID: %s
Input UUID as integer: %d
Integer encoded as base 58: %s
Integer decoded from base 58: %d
Decoded integer as UUID: %s
END
Output:
Input UUID: 123e4567-e89b-12d3-a456-426655440000
Input UUID as integer: 24249434048109030647017182302883282944
Integer encoded as base 58: 3fEgj34VWmVufdDD1fE1Su
Integer decoded from base 58: 24249434048109030647017182302883282944
Decoded integer as UUID: 123e4567-e89b-12d3-a456-426655440000

Ruby: how to view the definition of Date.leap?

I have tried the following:
method = Date.method(:leap?)
puts method
location=method.source_location
puts location
Here is the output: # Method: Date.leap?
Date::leap? is not written in Ruby, it's written in C (at least on the YARV implementation of Ruby).
Method#source_location returns the location in Ruby source code, but Date::leap? doesn't have any Ruby source code, therefore it returns nil.
You will have to dig deep into the C source of YARV, if you want to find what you are looking for.
First, you'l notice on lines 9314-9315 of ext/date/date_core.c that leap? is implemented by the function date_s_gregorian_leap_p:
rb_define_singleton_method(cDate, "leap?",
date_s_gregorian_leap_p, 1);
The definition of date_s_gregorian_leap_p is on lines 2918-2926 of ext/date/date_core.c:
static VALUE
date_s_gregorian_leap_p(VALUE klass, VALUE y)
{
VALUE nth;
int ry;
decode_year(y, -1, &nth, &ry);
return f_boolcast(c_gregorian_leap_p(ry));
}
You'll notice that it simply calls c_gregorian_leap_p, which is defined on lines 682-686 of ext/date/date_core.c:
inline static int
c_gregorian_leap_p(int y)
{
return (MOD(y, 4) == 0 && y % 100 != 0) || MOD(y, 400) == 0;
}
Last but not least, here are the definitions of MOD:
#define MOD(n,d) ((n)<0 ? NMOD((n),(d)) : (n)%(d))
and NMOD:
#define NMOD(x,y) ((y)-(-((x)+1)%(y))-1)
However, I personally find YARV to be the most unreadable of all Ruby implementations. I much prefer looking at the source code of JRuby or Rubinius.
For example, here's the same method in Rubinius. Well, actually, unlike YARV, Rubinius doesn't ship its own implementation of the standard library, instead it uses the gemified standard library from the RubySL (Ruby Standard Library) project.
As you can see on line 730 of lib/rubysl/date/date.rb of the rubysl-date gem, Date::leap? is defined as an alias of Date::gregorian_leap?:
class << self; alias_method :leap?, :gregorian_leap? end
Which is defined directly above in line 728 of lib/rubysl/date/date.rb of the rubysl-date gem:
def self.gregorian_leap? (y) y % 4 == 0 && y % 100 != 0 || y % 400 == 0 end

Ruby driver tests failing for credit card with luhn algorithm

I worked up a working code to check if a credit card is valid using luhn algorithm:
class CreditCard
def initialize(num)
##num_arr = num.to_s.split("")
raise ArgumentError.new("Please enter exactly 16 digits for the credit card number.")
if ##num_arr.length != 16
#num = num
end
def check_card
final_ans = 0
i = 0
while i < ##num_arr.length
(i % 2 == 0) ? ans = (##num_arr[i].to_i * 2) : ans = ##num_arr[i].to_i
if ans > 9
tens = ans / 10
ones = ans % 10
ans = tens + ones
end
final_ans += ans
i += 1
end
final_ans % 10 == 0 ? true : false
end
end
However, when I create driver test codes to check for it, it doesn't work:
card_1 = CreditCard.new(4563960122001999)
card_2 = CreditCard.new(4563960122001991)
p card_1.check_card
p card_2.check_card
I've been playing around with the code, and I noticed that the driver code works if I do this:
card_1 = CreditCard.new(4563960122001999)
p card_1.check_card
card_2 = CreditCard.new(4563960122001991)
p card_2.check_card
I tried to research before posting on why this is happening. Logically, I don't see why the first driver codes wouldn't work. Can someone please assist me as to why this is happening?
Thanks in advance!!!
You are using a class variable that starts with ##, which is shared among all instances of CreditCard as well as the class (and other related classes). Therefore, the value will be overwritten every time you create a new instance or apply check_card to some instance. In your first example, the class variable will hold the result for the last application of the method, and hence will reflect the result for the last instance (card_2).

Resources