fromCharCode equivalent in Ruby - ruby

I was wondering whether there is a Ruby equivalent of JavaScript's fromCharCode function, which converts Unicode values into characters.
Here is an example of its return value in JavaScript:
String.fromCharCode(72,69,76,76,79)
#=> HELLO
Is there an equivalent for that in Ruby?

Use Integer#chr:
72.chr
# => "H"
[72,69,76,76,79].map{|i| i.chr }.join
# => "HELLO"
[72,69,76,76,79].map(&:chr).join
# => "HELLO"
UPDATE
Without an argument, chr only handles single-byte values (0 to 255) by default. To handle larger Unicode code points, pass Encoding::UTF_8 to chr.
512.chr
RangeError: 512 out of char range
from (irb):8:in `chr'
from (irb):8
from /usr/bin/irb:12:in `<main>'
512.chr(Encoding::UTF_8)
# => "Ȁ"
[512,513].map{|i| i.chr(Encoding::UTF_8)}.join
# => "Ȁȁ"
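Another option that maps closely onto String.fromCharCode is Array#pack with the 'U*' directive, which treats each integer as a Unicode code point and builds a UTF-8 string. The following is a minimal sketch; the helper name from_char_code is made up for illustration and is not part of Ruby. Note that JavaScript's function works on UTF-16 code units, so the two only agree for code points up to 0xFFFF.

```ruby
# Hypothetical helper mimicking JavaScript's String.fromCharCode.
# Array#pack with 'U*' encodes every integer as one UTF-8 character.
def from_char_code(*codes)
  codes.pack('U*')
end

hello = from_char_code(72, 69, 76, 76, 79)
mixed = from_char_code(72, 512, 8364)

puts hello          # prints HELLO
puts mixed          # prints HȀ€
p mixed.codepoints  # prints [72, 512, 8364]
```

Unlike chr without an argument, this does not raise a RangeError for values above 255.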

Related

Is there a function available in ruby where the part after regex match can be returned?

I have a string in the format below.
2020/07/08 16:30:03.919 263825 (Followed by strings)
2020/07/08 16:30:03.919 263826 (Followed by strings)
Is there a function that returns only the (Followed by strings) part, given a pattern for the first part?
^\w*\s*(?<time_var>\w*\/\w*\/\w*\s\w*:\w*:\w*.\w*)\s\w*
The above pattern matches the timestamp followed by number.
I found "Get substring after the first = symbol in Ruby", but it didn't help. Am I doing anything wrong here?
irb(main):001:0> line = "2020/07/08 16:30:03.919 263825 (Followed by strings)"
=> "2020/07/08 16:30:03.919 263825 (Followed by strings)"
irb(main):002:0> line.partition('^\w*\s*(?<time_var>\w*\/\w*\/\w*\s\w*:\w*:\w*.\w*)\s\w*').last
=> ""
irb(main):003:0> line.partition('^\w*\s*(?<time_var>\w*\/\w*\/\w*\s\w*:\w*:\w*.\w*)\s\w*')
=> ["2020/07/08 16:30:03.919 263825 (Followed by strings)", "", ""]
And without the call to last, it seems to return the whole string as the first element?
MatchData#post_match returns the string after the actual match:
pattern = /^\w*\s*(?<time_var>\w*\/\w*\/\w*\s\w*:\w*:\w*.\w*)\s\w*/
line = "2020/07/08 16:30:03.919 263825 (Followed by strings)"
line.match(pattern, &:post_match)
#=> " (Followed by strings)"
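As for why the partition attempt in the question returns the whole line: when String#partition is given a String, it searches for that text literally, so the pattern has to be passed as a Regexp instead. A small sketch reusing the line and pattern from above:

```ruby
line = "2020/07/08 16:30:03.919 263825 (Followed by strings)"
pattern = /^\w*\s*(?<time_var>\w*\/\w*\/\w*\s\w*:\w*:\w*.\w*)\s\w*/

# A String separator is searched for literally, so nothing is found
# and the whole line ends up in the first element.
literal = line.partition('^\w*\s*')

# A Regexp separator is matched as a pattern.
before, matched, after = line.partition(pattern)

p literal  # prints ["2020/07/08 16:30:03.919 263825 (Followed by strings)", "", ""]
p matched  # prints "2020/07/08 16:30:03.919 263825"
p after    # prints " (Followed by strings)"
```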
You can try:
line = "2020/07/08 16:30:03.919 263825 (Followed by strings)"
l = line.gsub(/^\w*\s*(?<time_var>\w*\/\w*\/\w*\s\w*:\w*:\w*.\w*)\s\w*/, "").lstrip
# l ==> "(Followed by strings)"
You can use a regexp capture:
str = <<~STR
2020/07/08 16:30:03.919 263825 (Followed by strings)
2020/07/08 16:30:03.919 263826 (Followed by strings)
STR
tstamp_line_rgx = %r{\d{4}/\d\d/\d\d \d\d:\d\d:\d\d\.\d{3} \d+ (.*)}
str.lines.map do |line|
  line[tstamp_line_rgx, 1]
end
I have a slightly different regex (I think yours might work too), but the important part is (.*), which captures "anything after the timestamps + pid, to the end of the line", and it's referenced by the 1 in string_variable[regex, 1] because it's the 1st parenthesized capture group.
You can see the capture groups more clearly when calling match on the regex directly (as opposed to the string[regex, capture_num] syntax):
[12] pry(main)> a_string = "2020/07/08 16:30:03.919 263826 (Followed by strings)"
=> "2020/07/08 16:30:03.919 263826 (Followed by strings)"
[13] pry(main)> tstamp_line_rgx.match(a_string)
=> #<MatchData "2020/07/08 16:30:03.919 263826 (Followed by strings)" 1:"(Followed by strings)">
You can use \K, which discards everything matched so far, so that the result contains only the text after the pattern:
regex = %r(^\w*\s*(?<time_var>\w*\/\w*\/\w*\s\w*:\w*:\w*.\w*)\s\w*)
'2020/07/08 16:30:03.919 263825 (Followed by strings)'.match(/#{regex}\K.*/).to_s
# => " (Followed by strings)"
Your timestamp has a very definite pattern. Among other things,
'2020/07/08 16:30:03.919 263825'.size
#=> 30
One therefore could write:
str = '2020/07/08 16:30:03.919 263825 the cat and the hat'
time_stamp = str[0,30]
#=> "2020/07/08 16:30:03.919 263825"
remainder = str[30..-1].strip
#=> "the cat and the hat"
If you wish to be on the safe side by confirming it is a valid time stamp, you could do the following.
time_stamp_str = time_stamp[0,23]
#=> "2020/07/08 16:30:03.919"
time_stamp_supp = time_stamp[23..-1]
#=> " 263825"
time_stamp_supp.match?(/\A \d+\z/)
#=> true
require 'date'
def time_stamp_valid?(time_stamp_str)
  rv = DateTime.strptime(time_stamp_str, '%Y/%m/%d %H:%M:%S.%L') rescue false
  !!rv
end
time_stamp_valid?(time_stamp_str)
#=> true
Here
rv #=> #<DateTime: 2020-07-08T16:30:03+00:00 ((2459039j,59403s,919000000n),+0s,2299161j)>
See DateTime::strptime and (for formatting directives) DateTime#strftime. strptime raises an exception if the string does not represent a valid date, in which case time_stamp_valid? rescues the exception in-line and returns false.
!! merely converts truthy objects (here a DateTime object) to true and converts falsy objects (nil and false) to false.
Verifying a time stamp in this way is preferable to using a regular expression, as the latter can give incorrect results. For example, most regexes would not be able to determine whether Feb. 29, 2000 is a valid date (though it can be done). Moreover, this approach is much easier than crafting a regex that does only a fair job of evaluating date-time strings for correctness.
Above all, do not use parse, as it can be quite unpredictable. For example: DateTime.parse("She thought that maybe he was the killer after all") #=> #<DateTime: 2020-05-01T00:00:00+00:00 ((2458971j,0s,0n),+0s,2299161j)>.
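To illustrate the point about calendar validity, here is a variant sketch of the check with an explicit rescue clause instead of the inline rescue. The sample inputs are made up for illustration; they match the expected layout but differ in whether the date exists.

```ruby
require 'date'

# Variant of the validity check with an explicit rescue.
# DateTime.strptime raises an ArgumentError (or a subclass of it)
# when the string does not describe a real calendar date.
def time_stamp_valid?(time_stamp_str)
  DateTime.strptime(time_stamp_str, '%Y/%m/%d %H:%M:%S.%L')
  true
rescue ArgumentError
  false
end

p time_stamp_valid?('2020/07/08 16:30:03.919')  # prints true
p time_stamp_valid?('2020/02/29 16:30:03.919')  # prints true (2020 is a leap year)
p time_stamp_valid?('2021/02/29 16:30:03.919')  # prints false (2021 is not)
p time_stamp_valid?('2020/02/30 16:30:03.919')  # prints false
```

A purely structural regex such as \d{4}/\d\d/\d\d would accept all four of these strings.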

Trim a trailing .0

I have an Excel column containing part numbers. Here is a sample
As you can see, it can be many different datatypes: Float, Int, and String. I am using roo gem to read the file. The problem is that roo interprets integer cells as Float, adding a trailing zero to them (16431 => 16431.0). I want to trim this trailing zero. I cannot use to_i because it will trim all the trailing numbers of the cells that require a decimal in them (the first row in the above example) and will cut everything after a string char in the String rows (the last row in the above example).
Currently, I have a method that checks the last two characters of the cell and trims them if they are ".0":
def trim(row)
  if row[0].to_s[-2..-1] == ".0"
    row[0] = row[0].to_s[0..-3]
  end
end
This works, but it feels terrible and hacky. What is the proper way of getting my Excel file contents into a Ruby data structure?
def trim num
  i, f = num.to_i, num.to_f
  i == f ? i : f
end
trim(2.5) # => 2.5
trim(23) # => 23
or, from string:
def convert x
  Float(x)
  i, f = x.to_i, x.to_f
  i == f ? i : f
rescue ArgumentError
  x
end
convert("fjf") # => "fjf"
convert("2.5") # => 2.5
convert("23") # => 23
convert("2.0") # => 2
convert("1.00") # => 1
convert("1.10") # => 1.1
For those using Rails, ActionView has the number_with_precision method that takes a strip_insignificant_zeros: true argument to handle this.
number_with_precision(13.00, precision: 2, strip_insignificant_zeros: true)
# => "13"
number_with_precision(13.25, precision: 2, strip_insignificant_zeros: true)
# => "13.25"
See the number_with_precision documentation for more information.
This should cover your needs in most cases: some_value.gsub(/(\.)0+$/, '').
It removes a decimal point that is followed only by zeroes up to the end of the string. Otherwise, it leaves the string alone; in particular, a trailing zero after other decimal digits (as in '123.560') is kept.
It's also very performant, as it is entirely string-based, requiring no floating point or integer conversions, assuming your input value is already a string:
Loading development environment (Rails 3.2.19)
irb(main):001:0> '123.0'.gsub(/(\.)0+$/, '')
=> "123"
irb(main):002:0> '123.000'.gsub(/(\.)0+$/, '')
=> "123"
irb(main):003:0> '123.560'.gsub(/(\.)0+$/, '')
=> "123.560"
irb(main):004:0> '123.'.gsub(/(\.)0+$/, '')
=> "123."
irb(main):005:0> '123'.gsub(/(\.)0+$/, '')
=> "123"
irb(main):006:0> '100'.gsub(/(\.)0+$/, '')
=> "100"
irb(main):007:0> '127.0.0.1'.gsub(/(\.)0+$/, '')
=> "127.0.0.1"
irb(main):008:0> '123xzy45'.gsub(/(\.)0+$/, '')
=> "123xzy45"
irb(main):009:0> '123xzy45.0'.gsub(/(\.)0+$/, '')
=> "123xzy45"
irb(main):010:0> 'Bobby McGee'.gsub(/(\.)0+$/, '')
=> "Bobby McGee"
irb(main):011:0>
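roo returns numeric values as type :float, so one option is to convert whole-number floats back to integers and leave every other cell untouched: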
def convert_cell(cell)
  if cell.is_a?(Float)
    i = cell.to_i
    cell == i.to_f ? i : cell
  else
    cell
  end
end
convert_cell("foobar") # => "foobar"
convert_cell(123) # => 123
convert_cell(123.4) # => 123.4
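Applied to a whole sheet, this might look like the sketch below. The rows array is hypothetical sample data standing in for whatever the spreadsheet reader returns; no roo calls are used here.

```ruby
# Same conversion as above: whole-number floats become integers,
# everything else is returned unchanged.
def convert_cell(cell)
  if cell.is_a?(Float)
    i = cell.to_i
    cell == i.to_f ? i : cell
  else
    cell
  end
end

# Hypothetical rows, shaped like parsed spreadsheet data.
rows = [
  [16431.0, "widget"],
  [12.5, "bolt"],
  ["AB-100.0", "nut"]
]

cleaned = rows.map { |row| row.map { |cell| convert_cell(cell) } }

p cleaned  # prints [[16431, "widget"], [12.5, "bolt"], ["AB-100.0", "nut"]]
```

String cells are never modified, so a part number such as "AB-100.0" keeps its trailing ".0".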

Comparing bytes in Ruby

I have a binary blob header of either a JPG or MP4 file. I am trying to differentiate between the two.
When the file is a JPG, the first two bytes are \xFF\xD8. However, the comparison blob[0] == "\xFF" fails, even though I know that blob[0] is in fact \xFF.
What is the best way to do this?
This is an encoding issue. You are comparing a string with binary encoding (your JPEG blob) with a UTF-8 encoded string ("\xFF"):
foo = "\xFF".force_encoding("BINARY") # like your blob
bar = "\xFF"
p foo # => "\xFF"
p bar # => "\xFF"
p foo == bar # => false
There are several ways to create a binary encoded string:
str = "\xFF\xD8".b # => "\xFF\xD8" (Ruby 2.0+)
str.encoding # => #<Encoding:ASCII-8BIT>
str = "\xFF\xD8".force_encoding("BINARY") # => "\xFF\xD8"
str.encoding # => #<Encoding:ASCII-8BIT>
str = 0xFF.chr + 0xD8.chr # => "\xFF\xD8"
str.encoding # => #<Encoding:ASCII-8BIT>
str = ["FFD8"].pack("H*") # => "\xFF\xD8"
str.encoding # => #<Encoding:ASCII-8BIT>
All of the above can be compared with your blob.
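A way to sidestep string encodings altogether is to compare raw byte values with String#getbyte, which returns integers regardless of the string's encoding. A minimal sketch; the method name jpeg_header? and the sample blobs are made up for illustration:

```ruby
# Hypothetical helper: checks for the two JPEG start bytes (FF D8)
# by comparing integer byte values instead of strings.
def jpeg_header?(blob)
  blob.getbyte(0) == 0xFF && blob.getbyte(1) == 0xD8
end

jpeg_blob  = "\xFF\xD8\xFF\xE0".b   # sample bytes starting with FF D8
other_blob = "\x00\x00\x00\x18".b   # sample bytes that do not

p jpeg_header?(jpeg_blob)   # prints true
p jpeg_header?(other_blob)  # prints false
p jpeg_header?("")          # prints false (getbyte returns nil)
```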

Why are two strings with same bytes and encoding not identical in Ruby 1.9?

In Ruby 1.9.2, I found a way to make two strings that have the same bytes, same encoding, and are equal, but they have a different length and different characters returned by [].
Is this a bug? If it is not a bug, then I'd like to fully understand it. What kind of information is stored inside Ruby 1.9.2 String objects that allows these two strings to behave differently?
Below is the code that reproduces this behavior. The comments that start with #=> show you what output I am getting from this script, and the parenthetical words tell you my judgment of that output.
#!/usr/bin/ruby1.9
# coding: utf-8
string1 = "\xC2\xA2" # A well-behaved string with one character (¢)
string2 = "".concat(0xA2) # A bizarre string very similar to string1.
p string1.bytes.to_a #=> [194, 162] (good)
p string2.bytes.to_a #=> [194, 162] (good)
puts string1.encoding.name #=> UTF-8 (good)
puts string2.encoding.name #=> UTF-8 (good)
puts string1 == string2 #=> true (good)
puts string1.length #=> 1 (good)
puts string2.length #=> 2 (weird!)
p string1[0] #=> "¢" (good)
p string2[0] #=> "\xC2" (weird!)
I am running Ubuntu and compiled Ruby from source. My Ruby version is:
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]
This is a bug in Ruby, and it was fixed in r29848.
Matz mentioned this question via Twitter:
http://twitter.com/matz_translator/status/6597021662187520
http://twitter.com/matz_translator/status/6597055132733440
"It's hard to determine as a bug but, it's not acceptable to leave it as is. I'd prefer to fix this issue."
I think the problem is in the string's encoding. Check out James Gray's Shades of Gray: Ruby 1.9's String article on Unicode encoding.
Additional odd behavior:
# coding: utf-8
string1 = "\xC2\xA2"
string2 = "".concat(0xA2)
string3 = 0xC2.chr + 0xA2.chr
string1.bytes.to_a # => [194, 162]
string2.bytes.to_a # => [194, 162]
string3.bytes.to_a # => [194, 162]
string1.encoding.name # => "UTF-8"
string2.encoding.name # => "UTF-8"
string3.encoding.name # => "ASCII-8BIT"
string1 == string2 # => true
string1 == string3 # => false
string2 == string3 # => true
string1.length # => 1
string2.length # => 2
string3.length # => 2
string1[0] # => "¢"
string2[0] # => "\xC2"
string3[0] # => "\xC2"
string3.unpack('C*') # => [194, 162]
string4 = string3.unpack('C*').pack('C*') # => "\xC2\xA2"
string4.encoding.name # => "ASCII-8BIT"
string4.force_encoding('UTF-8') # => "¢"
string3.force_encoding('UTF-8') # => "¢"
string3.encoding.name # => "UTF-8"
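For comparison, on a Ruby version that includes the fix, appending an integer to a UTF-8 string should yield a single well-formed character, while the length-2 behavior remains what you get from a binary string holding the same bytes. The sketch below assumes a current Ruby and builds the strings explicitly so it does not depend on the source file's encoding.

```ruby
# Empty, mutable UTF-8 string; concat(Integer) appends a code point.
utf8 = String.new(encoding: Encoding::UTF_8).concat(0xA2)

# The same two bytes, tagged as binary (ASCII-8BIT).
raw = [194, 162].pack('C*')

p utf8.bytes            # prints [194, 162]
p utf8.length           # prints 1
p utf8.valid_encoding?  # prints true
p raw.length            # prints 2

# Re-tagging the bytes as UTF-8 changes how they are interpreted,
# not the bytes themselves.
retagged = raw.dup.force_encoding('UTF-8')
p retagged.length       # prints 1
p retagged == utf8      # prints true
```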

ruby double question mark [duplicate]

I came across this piece of ruby code:
str[-1]==??
What is the double question mark all about? Never seen that before.
Ruby 1.8 has a ?-prefix syntax that turns a character into its ASCII code value. For example, ?a is the ASCII value for the letter a (or 97). The double question mark you see is really just the number 63 (or the ASCII value for ?).
?a # => 97
?b # => 98
?c # => 99
?\n # => 10
?? # => 63
To convert back, you can use the chr method:
97.chr # => "a"
10.chr # => "\n"
63.chr # => "?"
??.chr # => "?"
In Ruby 1.9, the ?a syntax returns the character itself (as does the square bracket syntax on strings):
?? # => "?"
"What?"[-1] # => "?"
As Ryan says, the ? prefix gives you the ASCII value of a character. This is useful here because index notation on a string in Ruby 1.8 returns the ASCII value rather than the character. For example:
irb(main):009:0> str = 'hello'
=> "hello"
irb(main):010:0> str[-1]
=> 111
so the following wouldn't test if the last character of a string was the letter 'o'
irb(main):011:0> str[-1] == 'o'
=> false
but this would:
irb(main):012:0> str[-1] == ?o
=> true
and (provided you know what the ? does!) this is slightly clearer than
irb(main):013:0> str[-1] == 111
=> true
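If the code has to behave the same on both 1.8 and 1.9, comparisons that do not depend on what str[-1] returns are safer. A small sketch, run here on a modern Ruby; the two-argument form str[index, length] returns a substring on either version, and String#end_with? should be available from 1.8.7 onward:

```ruby
str = "What?"

# str[index, length] always returns a String, never an Integer.
last_char = str[-1, 1]

p last_char == "?"     # prints true
p str.end_with?("?")   # prints true

# On 1.9 and later, ?? is the string "?", and ord gives back the code.
p ??.ord               # prints 63
p str[-1].ord == 63    # prints true
```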
