Parser in Ruby: #slice! inside #each_with_index = missing element

Let's say I want to separate certain combinations of elements from an array. For example:
data = %w{ start before rgb 255 255 255 between hex FFFFFF after end }
rgb, hex = [], []
data.each_with_index do |v,i|
p [i,v]
case v.downcase
when 'rgb' then rgb = data.slice! i,4
when 'hex' then hex = data.slice! i,2
end
end
pp [rgb, hex, data]
# >> [0, "start"]
# >> [1, "before"]
# >> [2, "rgb"]
# >> [3, "hex"]
# >> [4, "end"]
# >> [["rgb", "255", "255", "255"],
# >> ["hex", "FFFFFF"],
# >> ["start", "before", "between", "after", "end"]]
The code did the extraction correctly, but it skipped the element just after each extracted set. So if my data array is
data = %w{ start before rgb 255 255 255 hex FFFFFF after end }
then
pp [rgb, hex, data]
# >> [["rgb", "255", "255", "255"],
# >> [],
# >> ["start", "before", "hex", "FFFFFF", "after", "end"]]
Why does this happen? How can I get those skipped elements inside #each_with_index? Or maybe there is a better solution for this problem, assuming that there are many more sets to extract?

The problem is that you are mutating the collection while you are iterating over it. This cannot possibly work. (And in my opinion, it shouldn't. Ruby should raise an exception in this case, instead of silently allowing incorrect behavior. That's what pretty much all other imperative languages do.)
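To see the skipping in isolation, here is a minimal sketch. The five-letter array and the `seen` variables are illustrative and not from the question. The iterator keeps advancing its own index while slice! shifts the remaining elements left, so whatever slides into the current slot is never visited. A loop whose index you manage yourself avoids that:

```ruby
# Illustrative data; `seen` records which elements the iterator actually yields.
data = %w[a b c d e]
seen = []
data.each_with_index do |v, i|
  seen << v
  data.slice!(i, 2) if v == 'b'   # removes "b" and "c"; "d" slides into slot i
end
# "d" is never yielded: the iterator moves on to index i + 1, which now holds "e".

# Same removal with a manually managed index: advance only when nothing was removed.
data2 = %w[a b c d e]
seen2 = []
i = 0
while i < data2.size
  seen2 << data2[i]
  if data2[i] == 'b'
    data2.slice!(i, 2)            # stay on index i; it now holds the next unseen element
  else
    i += 1
  end
end
```

With the manual index, every surviving element is inspected exactly once.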
This here is the best I could come up with while still keeping your original style:
require 'pp'
data = %w[start before rgb 255 255 255 hex FFFFFF after end]
rgb_count = hex_count = 0
rgb, hex, rest = data.reduce([[], [], []]) do |acc, el|
acc.tap do |rgb, hex, rest|
next (rgb_count = 3 ; rgb << el) if /rgb/i =~ el
next (rgb_count -= 1 ; rgb << el) if rgb_count > 0
next (hex_count = 1 ; hex << el) if /hex/i =~ el
next (hex_count -= 1 ; hex << el) if hex_count > 0
rest << el
end
end
data.replace(rest)
pp rgb, hex, data
# ["rgb", "255", "255", "255"]
# ["hex", "FFFFFF"]
# ["start", "before", "after", "end"]
However, what you have is a parsing problem and that should really be solved by a parser. A simple hand-rolled parser/state machine will probably be a little bit more code than the above, but it will be so much more readable.
Here's a simple recursive-descent parser that solves your problem:
class ColorParser
  def initialize(input)
    @input = input.dup
    @rgb, @hex, @data = [], [], []
  end
  def parse
    parse_element until @input.empty?
    return @rgb, @hex, @data
  end
  private
  def parse_element
    parse_color or parse_stop_word
  end
  def parse_color
    parse_rgb or parse_hex
  end
  def parse_rgb
    return unless /rgb/i =~ peek
    @rgb << consume
    parse_rgb_values
  end
I really like recursive-descent parsers because their structure almost perfectly matches the grammar: just keep parsing elements until the input is empty. What is an element? Well, it's a color specification or a stop word. What is a color specification? Well, it's either an RGB color specification or a hex color specification. What is an RGB color specification? Well, it's something that matches the Regexp /rgb/i followed by RGB values. What are RGB values? Well, it's just three numbers …
  def parse_rgb_values
    3.times do @rgb << consume.to_i end
  end
  def parse_hex
    return unless /hex/i =~ peek
    @hex << consume
    parse_hex_value
  end
  def parse_hex_value
    @hex << consume.to_i(16)
  end
  def parse_stop_word
    @data << consume unless /rgb|hex/i =~ peek
  end
  def consume
    @input.slice!(0)
  end
  def peek
    @input.first
  end
end
Use it like so:
data = %w[start before rgb 255 255 255 hex FFFFFF after end]
rgb, hex, rest = ColorParser.new(data).parse
require 'pp'
pp rgb, hex, rest
# ["rgb", 255, 255, 255]
# ["hex", 16777215]
# ["start", "before", "after", "end"]
For comparison, here's the grammar:
S → element*
element → color | word
color → rgb | hex
rgb → rgb rgbvalues
rgbvalues → token token token
hex → hex hexvalue
hexvalue → token
word → token

Because you are manipulating data in place.
When you hit rgb, the next element in the loop would be 255, but you are deleting those elements, so between now sits where rgb was and the next element visited is hex.
Something like this may work better for you:
when 'rgb' then rgb = data.slice! i+1,3
when 'hex' then hex = data.slice! i+1,1

Here is a slightly nicer solution:
data = %w{ start before rgb 255 255 255 hex FFFFFF hex EEEEEE after end }
rest, rgb, hex = [], [], []
until data.empty?
case (key = data.shift).downcase
when 'rgb' then rgb += [key] + data.shift(3)
when 'hex' then hex += [key] + data.shift(1)
else rest << key
end
end
p rgb, hex, rest
# >> ["rgb", "255", "255", "255"]
# >> ["hex", "FFFFFF", "hex", "EEEEEE"]
# >> ["start", "before", "after", "end"]
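Since the question mentions that there may be many more sets to extract, the shift-based loop above can be driven by a lookup table instead of one `when` branch per keyword. This is only a sketch: the `ARITY` table and the `extract_sets` name are hypothetical, and it assumes every keyword is followed by a fixed number of values.

```ruby
# Hypothetical lookup table: keyword => number of values that follow it.
ARITY = { 'rgb' => 3, 'hex' => 1 }.freeze

def extract_sets(tokens, arity = ARITY)
  queue = tokens.dup                        # leave the caller's array untouched
  sets  = Hash.new { |h, k| h[k] = [] }     # keyword => collected tokens
  rest  = []
  until queue.empty?
    token = queue.shift
    name  = token.downcase
    if arity.key?(name)
      # Take the keyword plus the fixed number of values after it.
      sets[name].concat([token] + queue.shift(arity[name]))
    else
      rest << token
    end
  end
  [sets, rest]
end

sets, rest = extract_sets(%w[start before rgb 255 255 255 hex FFFFFF after end])
```

Adding another kind of set then only means adding an entry to the table.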

Related

Codewars: "Return or rotate": why isn't my attempted solution working?

These were the instructions given on Codewars (https://www.codewars.com/kata/56b5afb4ed1f6d5fb0000991/train/ruby):
The input is a string str of digits. Cut the string into chunks (a chunk here is a substring of the initial string) of size sz (ignore the last chunk if its size is less than sz).
If a chunk represents an integer such as the sum of the cubes of its digits is divisible by 2, reverse that chunk; otherwise rotate it to the left by one position. Put together these modified chunks and return the result as a string.
If
sz is <= 0 or if str is empty return ""
sz is greater (>) than the length of str it is impossible to take a chunk of size sz hence return "".
Examples:
revrot("123456987654", 6) --> "234561876549"
revrot("123456987653", 6) --> "234561356789"
revrot("66443875", 4) --> "44668753"
revrot("66443875", 8) --> "64438756"
revrot("664438769", 8) --> "67834466"
revrot("123456779", 8) --> "23456771"
revrot("", 8) --> ""
revrot("123456779", 0) --> ""
revrot("563000655734469485", 4) --> "0365065073456944"
This was my code (in Ruby):
def revrot(str, sz)
# your code
if sz > str.length || str.empty? || sz <= 0
""
else
arr = []
while str.length >= sz
arr << str.slice!(0,sz)
end
arr.map! do |chunk|
if chunk.to_i.digits.reduce(0) {|s, n| s + n**3} % 2 == 0
chunk.reverse
else
chunk.chars.rotate.join
end
end
arr.join
end
end
It passed 13/14 test and the error I got back was as follows:
STDERR/runner/frameworks/ruby/cw-2.rb:38:in `expect': Expected: "", instead got: "095131824330999134303813797692546166281332005837243199648332767146500044" (Test::Error)
from /runner/frameworks/ruby/cw-2.rb:115:in `assert_equals'
from main.rb:26:in `testing'
from main.rb:84:in `random_tests'
from main.rb:89:in `<main>'
I'm not sure what I did wrong, I have been trying to find what it could be for over an hour. Could you help me?

I will let someone else identify the problem with your code. I merely wish to show how a solution can be sped up. (I will not include code to deal with edge cases, such as the string being empty.)
You can make use of two observations:
the cube of an integer is odd if and only if the integer is odd; and
the sum of a collection of integers is odd if and only if the number of odd integers is odd.
We therefore can write
def sum_of_cube_odd?(str)
str.each_char.count { |c| c.to_i.odd? }.odd?
end
Consider groups of 4 digits in the last example, "563000655734469485".
sum_of_cube_odd? "5630" #=> false (so reverse -> "0365")
sum_of_cube_odd? "0065" #=> true (so rotate -> "0650")
sum_of_cube_odd? "5734" #=> true (so rotate -> "7345")
sum_of_cube_odd? "4694" #=> true (so rotate -> "6944")
so we are to return "0365065073456944".
Let's create another helper.
def rotate_chars_left(str)
  # str[1..-1] is a new string, so appending to it leaves str untouched
  str[1..-1] << str[0]
end
rotate_chars_left "0065" #=> "0650"
rotate_chars_left "5734" #=> "7345"
rotate_chars_left "4694" #=> "6944"
We can now write the main method.
def revrot(str, sz)
str.gsub(/.{,#{sz}}/) do |s|
if s.size < sz
''
elsif sum_of_cube_odd?(s)
rotate_chars_left(s)
else
s.reverse
end
end
end
revrot("123456987654", 6) #=> "234561876549"
revrot("123456987653", 6) #=> "234561356789"
revrot("66443875", 4) #=> "44668753"
revrot("66443875", 8) #=> "64438756"
revrot("664438769", 8) #=> "67834466"
revrot("123456779", 8) #=> "23456771"
revrot("563000655734469485", 4) #=> "0365065073456944"
It might be slightly faster to write
require 'set'
ODD_DIGITS = ['1', '3', '5', '7', '9'].to_set
#=> #<Set: {"1", "3", "5", "7", "9"}>
def sum_of_cube_odd?(str)
str.each_char.count { |c| ODD_DIGITS.include?(c) }.odd?
end
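The failing test itself is not diagnosed above. One possible cause, offered as an assumption that the source does not confirm, is that `str.slice!` in the question's code shortens the caller's string in place; if the test harness reuses that same string to compute its expected value, it would see a string shorter than sz and expect "". A sketch that chunks without mutating the argument, using String#scan and the parity observation from this answer:

```ruby
def revrot(str, sz)
  return "" if sz <= 0 || str.empty? || sz > str.length

  # scan returns only complete chunks of sz characters and does not modify str.
  str.scan(/.{#{sz}}/).map do |chunk|
    # The sum of cubes is odd exactly when the number of odd digits is odd.
    odd = chunk.each_char.count { |c| c.to_i.odd? }.odd?
    odd ? chunk[1..-1] + chunk[0] : chunk.reverse
  end.join
end

input = "66443875"
result = revrot(input, 4)
```

After the call, `input` still holds all eight digits, which is the property the original while/slice! loop does not have.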

How can I improve the performance of this small Ruby function?

I am currently doing a Ruby challenge and get the error Terminated due to timeout
for some testcases where the string input is very long (10.000+ characters).
How can I improve my code?
Ruby challenge description
You are given a string containing characters A and B only. Your task is to change it into a string such that there are no matching adjacent characters. To do this, you are allowed to delete zero or more characters in the string.
Your task is to find the minimum number of required deletions.
For example, given the string s = AABAAB, remove an A at positions 0 and 3 to make s = ABAB in 2 deletions.
My function
def alternatingCharacters(s)
counter = 0
s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
return counter
end
Thank you!

This may be faster; it returns the count directly:
str.size - str.chars.chunk_while{ |a, b| a == b }.to_a.size
The second part uses String#chars method in conjunction with Enumerable#chunk_while.
This way the second part groups in subarrays:
'aababbabbaab'.chars.chunk_while{ |a, b| a == b}.to_a
#=> [["a", "a"], ["b"], ["a"], ["b", "b"], ["a"], ["b", "b"], ["a", "a"], ["b"]]

Trivial if you can use squeeze:
str.length - str.squeeze.length
Otherwise, you could try a regular expression that matches those A (or B) that are preceded by another A (or B):
str.enum_for(:scan, /(?<=A)A|(?<=B)B/).count
Using enum_for avoids the creation of the intermediate array.

The main issue with:
s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
is that you don't save chars into a variable. s.chars will rip apart the string into an array of characters. The first s.chars call outside the loop is fine. However, there is no reason to do this for each character in s. This means if you have a string of 10.000 characters, you'll instantiate 10.001 arrays of size 10.000.
Re-using the characters array will give you a huge performance boost:
require 'benchmark'
s = ''
options = %w[A B]
10_000.times { s << options.sample }
Benchmark.bm do |x|
x.report do
counter = 0
s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
# create a character array for each iteration ^
end
x.report do
counter = 0
chars = s.chars # <- only create a character array once
chars.each_with_index { |char, idx| counter += 1 if chars[idx + 1] == char }
end
end
user system total real
8.279767 0.000001 8.279768 ( 8.279655)
0.002188 0.000003 0.002191 ( 0.002191)
You could also use enumerator methods like each_cons and count to simplify the code. This costs little in performance and makes the code a lot more readable.
Benchmark.bm do |x|
x.report do
counter = 0
chars = s.chars
chars.each_with_index { |char, idx| counter += 1 if chars[idx + 1] == char }
end
x.report do
s.each_char.each_cons(2).count { |a, b| a == b }
# ^ using each_char instead of chars to avoid
# instantiating a character array
end
end
user system total real
0.002923 0.000000 0.002923 ( 0.002920)
0.003995 0.000000 0.003995 ( 0.003994)
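For completeness, the same count can be taken in a single pass that remembers only the previous character, so no character array is built at all. This is a sketch; the snake_case method name and the `prev` variable are illustrative, and no timing is claimed for it.

```ruby
def alternating_characters(s)
  deletions = 0
  prev = nil
  s.each_char do |ch|
    deletions += 1 if ch == prev   # a repeat of the previous character must be deleted
    prev = ch
  end
  deletions
end

count = alternating_characters("AABAAB")
```

For "AABAAB" this agrees with the squeeze-based answer, since "ABAB" is two characters shorter than the input.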

Compress a number as a string to fit in 256 char space

I'm trying to use a bitmask to store as many binary values as possible so that the final value fits in the limited memory allocated for a string. My current approach is to find a maximum number and convert it to a base-36 string.
value = (0 | (1<<1318)).to_s(36)
The result is 255 chars of a compressed number from which I can extract my original number of 1318. The downside is I'm limited to 1,318 binary values and I want to expand that number. Are there any alternative strategies in Ruby to compress this number even further?

You can always encode your number into base s and then represent that as string with whatever alphabet you want.
def encode(n, alphabet)
s = alphabet.size
res = []
while (n > 0)
res << n % s
n = n / s
end
res.reverse.map { |i| alphabet[i] }.join
end
Your method is then equivalent to encode(n, alphabet), where alphabet is defined as
alphabet = ((0..9).to_a + ("a".."z").to_a).join
# => "0123456789abcdefghijklmnopqrstuvwxyz"
But you might as well use all possible characters instead of only 36 of them:
extended_alphabet = (0..255).map { |i| i.chr }.join
This gives a total of (256 ** 255) possibilities, i.e. up to (2 ** 2040), which is much better than your actual (2 ** 1318).
This encoding happens to be optimal because each character of your string can have at most 256 different values, and all of them are used here.
Decoding can then be performed as follows:
def decode(encoded, alphabet)
s = alphabet.size
n = 0
decode_dict = {}; i = -1
alphabet.each_char { |c| decode_dict[c] = (i += 1) }
encoded.each_char do |c|
n = n * s + decode_dict[c]
end
n
end
If you are going to use a fixed alphabet for all your encodings, I would suggest computing the decoding dictionary outside of the function and passing it as a parameter instead of alphabet, to avoid recomputing it every time you decode a number.
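A sketch of that suggestion, with the lookup table built once as a constant. The names `ALPHABET`, `DECODE_TABLE`, `encode_base`, and `decode_base` are hypothetical, and the base-36 alphabet is used only so the result can be compared with Integer#to_s(36).

```ruby
ALPHABET     = (('0'..'9').to_a + ('a'..'z').to_a).join
# Built once: maps each character to its digit value, e.g. "a" => 10.
DECODE_TABLE = ALPHABET.each_char.with_index.to_h.freeze

def encode_base(n, alphabet = ALPHABET)
  base = alphabet.size
  digits = []
  while n > 0
    digits << n % base
    n /= base
  end
  digits.reverse.map { |i| alphabet[i] }.join
end

def decode_base(encoded, table = DECODE_TABLE)
  base = table.size
  encoded.each_char.reduce(0) { |n, c| n * base + table.fetch(c) }
end

n = 1 << 1318
round_trip = decode_base(encode_base(n))
```

Using `fetch` makes a character outside the alphabet raise a KeyError instead of silently producing a wrong number.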

Numbers are non-negative
If the numbers are non-negative, we can encode each 8 bits of the number as one character of a string, and then decode the string by converting each character back to 8 bits of the number.
def encode(n)
str = ''
until n.zero?
str << (n & 255).chr
n = n >> 8
end
str.reverse
end
def decode(str)
str.each_char.reduce(0) { |n,c| (n << 8) | c.ord }
end
This uses the following bit-manipulation methods in the class Integer: &, >>, << and |.
def test(n)
encoded = encode(n)
puts "#{n} => #{encoded} => #{decode(encoded)}"
end
test 1 # 1 => ["\u0001"] => 1
test 63 # 63 => ["?"] => 63
test 64 # 64 => ["@"] => 64
test 255 # 255 => ["\xFF"] => 255
test 256 # 256 => ["\u0001", "\u0000"] => 256
test 123456 # 123456 => ["\x01", "\xE2", "@"] => 123456
For example,
n = 123456
n.to_s(2)
#=> "11110001001000000"
so
n = 0b11110001001000000
#=> 123456
The bytes of this number can be visualized so:
00000001 11100010 01000000
We see that
a = [0b00000001, 0b11100010, 0b01000000]
a.map(&:chr)
#=> ["\x01", "\xE2", "@"]
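The same byte-per-character idea can also be written with Array#pack and String#each_byte, which sidesteps string-encoding questions because everything is handled as raw bytes. This is a sketch under that assumption; `encode_bytes` and `decode_bytes` are hypothetical names, not part of the answer above.

```ruby
def encode_bytes(n)
  bytes = []
  until n.zero?
    bytes.unshift(n & 255)   # lowest 8 bits go to the front, giving big-endian order
    n >>= 8
  end
  bytes.pack('C*')           # one unsigned byte per array element
end

def decode_bytes(str)
  str.each_byte.reduce(0) { |n, b| (n << 8) | b }
end

big = (1 << 2040) - 1        # largest value that fits in 255 bytes
packed = encode_bytes(big)
```

Here `packed` is 255 bytes long, so 2040 binary flags fit where the base-36 approach held 1318.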
Numbers can be negative
If the numbers to be encoded can be negative, we first need to convert them to their absolute values and then add some information to the encoded string that indicates whether they are non-negative or negative. That will require at least one additional byte, so we might include a "+" for non-negative numbers and a "-" for negative numbers.
def encode(n)
sign = "+"
if n < 0
sign = "-"
n = -n
end
str = ''
until n.zero?
str << (n & 255).chr
n = n >> 8
end
(str << sign).reverse
end
def decode(str)
n = str[1..-1].each_char.reduce(0) { |n,c| (n << 8) | c.ord }
str[0] == '+' ? n : -n
end
test -255 # -255 => ["-", "\xFF"] => -255
test -256 # -256 => ["-", "\u0001", "\u0000"] => -256
test -123456 # -123456 => ["-", "\x01", "\xE2", "@"] => -123456
test 123456 # 123456 => ["+", "\x01", "\xE2", "@"] => 123456

How to implement Caesar Cipher in Ruby?

I'm learning Ruby on theOdinProject and I need to build the Caesar Cipher. Here's my code:
def caesar_cipher plaintext, factor
codepoints_array = []
ciphertext = ""
plaintext.split('').each do |letter|
if letter =~ /[^a-zA-Z]/
codepoints_array << letter.bytes.join('').to_i
else
codepoints_array << letter.bytes.join('').to_i + factor
end
end
ciphertext = codepoints_array.pack 'C*'
puts ciphertext
end
caesar_cipher("What a string!", 5)
I bet it's not really "elegant", but the main issue here is that the output should be "Bmfy f xywnsl!" but what I get is "\mfy f xywnsl!". I've been struggling with this task for a couple of days now, but I still have no idea how to "chain" the alphabet so 'z' becomes 'a' with the factor == 1.
I could check the finished solutions of other people on theOdinProject, but their code is usually different/more professional, and I am trying to get a hint, not the final solution. I'd be really thankful if someone could hint at how to resolve this. Thank you in advance.

Hints
Your code would almost work fine if the ASCII table had only 26 characters.
But W is not w, and after z comes {, not a.
So you first need to apply downcase to your letters, offset the code point so that a is 0, and do every calculation modulo 26.
Modified version
def caesar_cipher plaintext, factor
codepoints_array = []
ciphertext = ""
a_codepoint = 'a'.ord
plaintext.split('').each do |letter|
if letter =~ /[^a-zA-Z]/
codepoints_array << letter.bytes.join('').to_i
else
shifted_codepoint = letter.downcase.bytes.join('').to_i + factor
codepoints_array << (shifted_codepoint - a_codepoint) % 26 + a_codepoint
end
end
ciphertext = codepoints_array.pack 'C*'
ciphertext
end
puts caesar_cipher("What a string!", 5) #=> "bmfy f xywnsl!"
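The modified version lowercases everything, whereas the question expected "Bmfy f xywnsl!" with the capital kept. A possible variant, sketched here under the assumption that only ASCII letters should be shifted, picks the base code point per letter instead of downcasing. The name `caesar_shift` is hypothetical.

```ruby
def caesar_shift(text, shift)
  text.gsub(/[A-Za-z]/) do |ch|
    # Use 'A' as the zero point for capitals and 'a' for lowercase letters.
    base = ch == ch.upcase ? 'A'.ord : 'a'.ord
    ((ch.ord - base + shift) % 26 + base).chr
  end
end

encrypted = caesar_shift("What a string!", 5)
decrypted = caesar_shift(encrypted, -5)
```

Characters that are not letters never reach the block, so spaces and punctuation pass through unchanged.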
Another solution
I wrote a small Ruby script for the Vigenère cipher a while ago. The Caesar cipher is just a variant of it, with the same factor for every character:
class Integer
# 0 => 'a', 1 => 'b', ..., 25 => 'z', 26 => 'a'
def to_letter
('a'.ord + self % 26).chr
end
end
class String
# 'A' => 0, 'a' => 0, ..., 'z' => 25
def to_code
self.downcase.ord - 'a'.ord
end
end
def caesar_cipher(string, factor)
short_string = string.delete('^A-Za-z')
short_string.each_char.map do |char|
(char.to_code + factor).to_letter
end.join
end
puts caesar_cipher("What a string!", 5) #=> "bmfyfxywnsl"
puts caesar_cipher("bmfyfxywnsl", -5) #=> "whatastring"
With ciphers, it is recommended to remove any punctuation sign or whitespace, because they make it much easier to decode the string with statistical analysis.
Caesar cipher is very weak anyway.

How can I convert a human-readable number to a computer-readable number in Ruby?

I'm working in Ruby with an array that contains a series of numbers in human-readable format (e.g., 2.5B, 1.27M, 600,000, where "B" stands for billion, "M" stands for million). I'm trying to convert all elements of the array to the same format.
Here is the code I've written:
array.each do |elem|
if elem.include? 'B'
elem.slice! "B"
elem = elem.to_f
elem = (elem * 1000000000)
else if elem.include? 'M'
elem.slice! "M"
elem = elem.to_f
elem = (elem * 1000000)
end
end
When I inspect the elements of the array using puts(array), however, the numbers appear with the "B" and "M" sliced off but the multiplication conversion does not appear to have been applied (e.g., the numbers now read 2.5, 1.27, 600,000, instead of 2500000000, 1270000, 600,000).
What am I doing wrong?

The first thing to note is that else if in Ruby is written elsif. See http://www.tutorialspoint.com/ruby/ruby_if_else.htm
Here is a working function for you to try out:
def convert_array_items_from_human_to_integers(array)
array.each_with_index do |elem,i|
if elem.include? 'B'
elem.slice! "B"
elem = elem.to_f
elem = (elem * 1000000000)
elsif elem.include? 'M'
elem.slice! "M"
elem = elem.to_f
elem = (elem * 1000000)
end
array[i] = elem
end
return array
end
Calling convert_array_items_from_human_to_integers(["2.5B", "1.2M"])
returns [2500000000.0, 1200000.0]

Another variation:
array = ['2.5B', '1.27M', '$600000']
p array.each_with_object([]) { |i, a|
i = i.gsub('$', '')
a << if i.include? 'B'
i.to_f * 1E9
elsif i.include? 'M'
i.to_f * 1E6
else
i.to_f
end
}
#=> [2500000000.0, 1270000.0, 600000.0]

Try this:
array.map do |elem|
elem = elem.gsub('$','')
if elem.include? 'B'
elem.to_f * 1000000000
elsif elem.include? 'M'
elem.to_f * 1000000
else
elem.to_f
end
end
This uses map instead of each to return a new array. Your attempt assigns copies of the array elements, leaving the original array in place (except for the slice!, which modifies in place). You can dispense with the slicing in the first place, since to_f simply ignores any trailing non-numeric characters.
EDIT:
If you have leading characters such as $2.5B, as your question title indicates (but not your example), you'll need to strip those explicitly. But your sample code doesn't handle those either, so I assume that's not an issue.

Expanding a bit on pjs' answer:
array.each do |elem|
elem is a local variable pointing to each array element, one at a time. When you do this:
elem.slice! "B"
you are sending a message to that array element telling it to slice the B. And you're seeing that in the end result. But when you do this:
elem = elem.to_f
now you've reassigned your local variable elem to something completely new. You haven't reassigned what's in the array, just what elem is.
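A minimal sketch of that difference, using an illustrative two-element array: rebinding the block variable inside each leaves the array alone, while map! stores each block result back into the array.

```ruby
array = %w[2.5B 1.27M]

# Rebinding `elem` only changes what the local name points to.
array.each { |elem| elem = elem.to_f }
after_each = array.dup

# map! replaces every element with the block's return value.
array.map! { |elem| elem.to_f }
```

After the `each` call the array still holds the original strings; only after `map!` does it hold floats.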

Here's how I'd go about it:
ARY = %w[2.5B 1.27M 600,000]
def clean_number(s)
s.gsub(/[^\d.]+/, '')
end
ARY.map{ |v|
case v
when /b$/i
clean_number(v).to_f * 1_000_000_000
when /m$/i
clean_number(v).to_f * 1_000_000
else
clean_number(v).to_f
end
}
# => [2500000000.0, 1270000.0, 600000.0]
The guts of the code are in the case statement. A simple check for the multiplier allows me to strip the undesired characters and multiply by the right value.
Normally we could use to_f to find the floating-point number to be multiplied for strings like '1.2', but it breaks down for things like '$1.2M' because of the "$". The same thing is true for embedded commas marking thousands:
'$1.2M'.to_f # => 0.0
'1.2M'.to_f # => 1.2
'6,000'.to_f # => 6.0
'6000'.to_f # => 6000.0
To fix the problem for simple strings containing just the value, it's not necessary to do anything fancier than stripping undesirable characters using gsub(/[^\d.]+/, ''):
'$1.2M'.gsub(/[^\d.]+/, '') # => "1.2"
'1.2M'.gsub(/[^\d.]+/, '') # => "1.2"
'6,000'.gsub(/[^\d.]+/, '') # => "6000"
'6000'.gsub(/[^\d.]+/, '') # => "6000"
[^\d.] means "anything that is NOT a digit or '.'".
Be careful how you convert your decimal values to integers. You could end up throwing away important precision:
'0.2M'.gsub(/[^\d.]+/, '').to_f * 1_000_000 # => 200000.0
'0.2M'.gsub(/[^\d.]+/, '').to_i * 1_000_000 # => 0
('0.2M'.gsub(/[^\d.]+/, '').to_f * 1_000_000).to_i # => 200000
Of course all this breaks down if your string is more complex than a simple number and multiplier. It's easy to break down a string and identify those sort of sub-strings, but that's a different question.

I would do it like this:
Code
T, M, B = 1_000, 1_000_000, 1_000_000_000
def convert(arr)
arr.map do |n|
m = n.gsub(/[^\d.TMB]/,'')
m.to_f * (m[-1][/[TMB]/] ? Object.const_get(m[-1]) : 1)
end
end
Example
arr = %w[$2.5B 1.27M 22.5T, 600,000]
convert(arr)
# => [2500000000.0, 1270000.0, 22500.0, 600000.0]
Explanation
The line
m = n.gsub(/[^\d.TMB]/,'')
# => ["2.5B", "1.27M", "22.5T", "600000"]
merely eliminates unwanted characters.
m.to_f * (m[-1][/[TMB]/] ? Object.const_get(m[-1]) : 1)
returns the product of the string converted to a float and a constant given by the last character of the string, if that character is T, M or B, else 1.
Actual implementation might be like this:
class A
T, M, B = 1_000, 1_000_000, 1_000_000_000
def doit(arr)
c = self.class.constants.map(&:to_s).join
arr.map do |n|
m = n.gsub(/[^\d.#{c}]/,'')
m.to_f * (m[-1][/[#{c}]/] ? self.class.const_get(m[-1]) : 1)
end
end
end
If we wished to change the reference for 1,000 from T to K and add T for trillion, we would need only change
T, M, B = 1_000, 1_000_000, 1_000_000_000
to
K, M, B, T = 1_000, 1_000_000, 1_000_000_000, 1_000_000_000_000
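An alternative to looking constants up by name is a plain hash from suffix to multiplier. The sketch below is one way to do it, not the answer's method: `MULTIPLIERS` and `parse_human_number` are hypothetical names, K is assumed to mean thousand, and the suffix is assumed to be the last character of the string.

```ruby
MULTIPLIERS = { 'K' => 1_000, 'M' => 1_000_000, 'B' => 1_000_000_000 }.freeze

def parse_human_number(text)
  cleaned = text.gsub(/[^\d.]/, '')         # drop "$", commas and the suffix
  suffix  = text[/[KMB]\z/i]                # nil when there is no suffix
  cleaned.to_f * MULTIPLIERS.fetch(suffix&.upcase, 1)
end

values = ['$2.5B', '22.5k', '600,000'].map { |s| parse_human_number(s) }
```

Supporting a new suffix then only requires a new hash entry and a matching character in the suffix pattern.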
