How to sum values from csv file with Ruby - ruby

I have a CSV file with several columns. The 4th column has a format that I want to parse. The string str below would be one line of the file:
str = "108,882,xyz, { Abc:{-} Val1:{6845} Val2:{653} llsh:{0} xTime: {2018-11-10 09:56:12} Yub:{Rtv} Val1:{807} Val2:{153} llsh:{0} xTime: {2018-11-10 09:59:05}A Wbc:{57} Val1:{441} Val2:{875} llsh:{0} xTime: {2018-11-10 10:13:12:22}"
For this 4th column I'd like to sum all Val1 and Val2 values present within the string and show the first and last dates as a new column. If Val1 and Val2 appear only once, then there is no sum to do and the output would be the values of Val1, Val2 and xTime.
The output would be:
Col1, Col2, Col3, Val1, Val2, xTime
108, 882, xyz, 8093, 1681, 2018-11-10 09:56:12 - 2018-11-10 10:13:12:22
I'm trying with CSV.parse.
require 'csv'
CSV.parse(str).each do |row|
  # row[3] holds the 4th column; this is the part I don't know how to parse
end
How can I do this in Ruby?
Thanks for any help

The essence of this problem is extracting the desired information from the part of the string that follows "108,882,xyz, ", as opposed to how a CSV string is to be parsed, so I will confine my attention to the former.
r = /
Val1:\{ # match string
(\d+) # match > 0 digits in capture group 1
\}\ +Val2:\{ # match string
(\d+) # match > 0 digits in capture group 2
\}\ +[^\}]+\}\ +xTime:\ +\{ # match string
(.+?) # match > 0 characters lazily in capture group 3
\} # match string
/x # free-spacing regex definition mode
This regular expression is conventionally written as follows:
/Val1:\{(\d+)\} +Val2:\{(\d+)\} +[^\}]+\} +xTime: +\{(.+?)\}/
Notice that when using free-spacing mode space characters would be stripped out by the parser if they were not protected in some way. There are a few ways of protecting them. I have chosen to escape each space character. Free-spacing mode has the advantage that it makes the regular expression self-documenting.
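As a small illustration of the point about protecting spaces, the sketch below compares an escaped space, a space inside a character class, and an unprotected space in free-spacing mode. The constant names are made up for this example and do not appear in the answer.

```ruby
# Hypothetical constants, used only to illustrate free-spacing mode.
ESCAPED   = /a \ + b/x   # escaped space: "a", one or more spaces, "b"
BRACKETED = /a [ ]+ b/x  # space in a character class: same meaning
UNGUARDED = /a + b/x     # all spaces are stripped, so this is /a+b/

sample = "a  b"
p sample.match?(ESCAPED)    #=> true
p sample.match?(BRACKETED)  #=> true
p sample.match?(UNGUARDED)  #=> false
p "aab".match?(UNGUARDED)   #=> true
```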
a = str.scan(r)
#=> [["6845", "653", "2018-11-10 09:56:12"],
# [ "807", "153", "2018-11-10 09:59:05"],
# [ "441", "875", "2018-11-10 10:13:12:22"]]
val1, val2, (f,*,l) = a.transpose
#=> [["6845", "807", "441"],
# [ "653", "153", "875"],
# ["2018-11-10 09:56:12", "2018-11-10 09:59:05", "2018-11-10 10:13:12:22"]]
val1
#=> ["6845", "807", "441"]
val2
#=> ["653", "153", "875"]
f #=> "2018-11-10 09:56:12"
l #=> "2018-11-10 10:13:12:22"
def convert(arr)
arr.map(&:to_i).sum
end
convert(val1)
#=> 8093
convert(val2)
#=> 1681
"%s - %s" % [f,l]
#=> "2018-11-10 09:56:12 - 2018-11-10 10:13:12:22"
See String#scan.
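The steps above could be tied together roughly as follows to produce the output row asked for in the question. This is only a sketch: the names PATTERN, SAMPLE_LINE and summarize_line are assumptions, and it assumes the first three columns never contain a comma, so String#split is used instead of the CSV library.

```ruby
# Hypothetical names: PATTERN, SAMPLE_LINE and summarize_line are not from the answer.
PATTERN = /Val1:\{(\d+)\} +Val2:\{(\d+)\} +[^\}]+\} +xTime: +\{(.+?)\}/

def summarize_line(line)
  # Split off the first three columns; everything after them is the 4th column.
  col1, col2, col3, rest = line.split(',', 4).map(&:strip)
  return nil if rest.nil?                  # fewer than four columns

  val1, val2, times = rest.scan(PATTERN).transpose
  return nil if val1.nil?                  # no Val1/Val2/xTime group found

  # With a single group there is nothing to sum and only one date to show.
  span = times.size > 1 ? "#{times.first} - #{times.last}" : times.first
  [col1, col2, col3, val1.sum(&:to_i), val2.sum(&:to_i), span].join(', ')
end

SAMPLE_LINE = "108,882,xyz, { Abc:{-} Val1:{6845} Val2:{653} llsh:{0} xTime: {2018-11-10 09:56:12} Yub:{Rtv} Val1:{807} Val2:{153} llsh:{0} xTime: {2018-11-10 09:59:05}A Wbc:{57} Val1:{441} Val2:{875} llsh:{0} xTime: {2018-11-10 10:13:12:22}"

puts summarize_line(SAMPLE_LINE)
# prints: 108, 882, xyz, 8093, 1681, 2018-11-10 09:56:12 - 2018-11-10 10:13:12:22
```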

Related

Find nth occurrence of variable regex in Ruby?

I am writing a method that does what the title says: it needs to find the index of the nth occurrence of a particular left bracket chosen by the user (e.g., given a string and the additional parameters '{' and 5, it should find the 5th occurrence of '{'; the same goes for '(' and '[').
Currently I do this with a while loop that compares each character, but that looks ugly and isn't very interesting. Is there a way to do this with a regex? Can you use a variable in a regex?
def _find_bracket_n(str,left_brac,brackets_num)
i = 0
num_of_left_bracs = 0
while i < str.length && num_of_left_bracs < brackets_num
num_of_left_bracs += 1 if str[i] == left_brac
i += 1
end
n_th_lbrac_index = i - 1
end
The offset of the nth instance of a given character in a string is wanted, or nil if the string contains fewer than n instances of that character. I will give four solutions.
chr = "("
str = "a(b(cd((ef(g(hi("
n = 5
Use Enumerable#find_index
m = n # count down on a copy so that n keeps its value for the solutions below
str.each_char.find_index { |c| c == chr && (m -= 1).zero? }
#=> 10
Use a regular expression
chr_esc = Regexp.escape(chr)
#=> "\\("
r = /
\A # match the beginning of the string
(?: # begin a non-capture group
.*? # match zero or more characters lazily
#{chr_esc} # match the given character
) # end the non-capture group
{#{n-1}} # perform the non-capture group `n-1` times
.*? # match zero or more characters lazily
#{chr_esc} # match the given character
/x # free-spacing regex definition mode
#=> /
\A # match the beginning of the string
(?: # begin a non-capture group
.*? # match zero or more characters lazily
\( # match the given character
) # end the non-capture group
{4} # perform the non-capture group `n-1` times
.*? # match zero or more characters lazily
\( # match the given character
/x
str =~ r
#=> 0
$~.end(0)-1
#=> 10
For the last line we could instead write
Regexp.last_match.end(0)-1
See Regexp::escape, Regexp::last_match and MatchData#end.
The regex is conventionally written (i.e., not in free-spacing mode) as follows.
/\A(?:.*?#{chr_esc}){#{n-1}}.*?#{chr_esc}/
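The regular-expression approach could be wrapped in a method along these lines. This is a sketch, not part of the answer: the method name nth_offset is an assumption, it expects chr to be a single character, and it returns nil when there is no match.

```ruby
# Hypothetical helper built around the regex shown above.
def nth_offset(str, chr, n)
  return nil if n < 1                      # {n-1} would be invalid for n < 1
  esc = Regexp.escape(chr)
  m = str.match(/\A(?:.*?#{esc}){#{n - 1}}.*?#{esc}/)
  # The match ends just past the nth instance, so step back one character.
  m && m.end(0) - 1
end

p nth_offset("a(b(cd((ef(g(hi(", "(", 5)   #=> 10
p nth_offset("a(b(cd((ef(g(hi(", "(", 20)  #=> nil
```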
Convert characters to offsets, remove offsets to non-matching characters and return the nth offset of those that remain
str.size.times.select { |i| str[i] == chr }[n-1]
#=> 10
n = 20
str.size.times.select { |i| str[i] == chr }[n-1]
#=> nil
Use String#index repeatedly to decapitate substrings
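n = 5 # restore n, which was set to 20 above
s = str.dup
off = n.times.reduce(0) do |acc,_|
  i = s.index(chr)
  break nil if i.nil?
  s = s[i+1..-1]
  acc + i + 1
end
off && off - 1 # nil when there are fewer than n instances
#=> 10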
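A variant of the String#index idea, sketched here as an assumption rather than as part of the answer, passes a starting offset to String#index instead of cutting the string into substrings. The method name nth_index is hypothetical.

```ruby
# Hypothetical helper: walks the string with String#index(substring, offset).
def nth_index(str, chr, n)
  return nil if n < 1
  pos = -1
  n.times do
    # Resume the search just past the previous hit.
    pos = str.index(chr, pos + 1)
    return nil if pos.nil?                 # fewer than n instances
  end
  pos
end

p nth_index("a(b(cd((ef(g(hi(", "(", 5)   #=> 10
p nth_index("a(b(cd((ef(g(hi(", "(", 20)  #=> nil
```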

I have a regular expression that I need to figure out

"peter,nick,jake,jack"
I need to have something like this.
I cannot have any whitespace after a word. For example,
"peter,," "peter," "peter,,nick " will all be incorrect.
It has to be just a word, such as "peter", or a word followed by a comma and then another word ("peter,nick").
First confirm that the string has the required structure.
r = /
\A # match the beginning of the string
[[:alpha:]]+ # match > 0 letters
(?:,[[:alpha:]]+) # match a comma then > 0 letters in a non-capture group
* # match the preceding non-capture group >= 0 times
\z # match end of the string
/x # free-spacing regex definition mode
str = "peter,nick,jake,jack"
str =~ r #=> 0
Since it matches the regex, simply split on commas to return an array of the words.
str.split(',') #=> ["peter", "nick", "jake", "jack"]
By contrast:
"peter,nick,,jake,jack" =~ r #=> nil
"peter,nick,jake, jack" =~ r #=> nil
"peter,nick,jake,jack " =~ r #=> nil
"peter ispeter,nick" =~ r #=> nil
I assume the string must contain at least one letter.
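The validate-then-split idea might be packaged as below. The names WORD_LIST and words_or_nil are made up for this sketch; returning nil for an invalid string is an assumption about what the caller wants.

```ruby
# Hypothetical names; the regex is the one above written on a single line.
WORD_LIST = /\A[[:alpha:]]+(?:,[[:alpha:]]+)*\z/

def words_or_nil(str)
  # Split only when the whole string has the required structure.
  str.match?(WORD_LIST) ? str.split(',') : nil
end

p words_or_nil("peter,nick,jake,jack")  #=> ["peter", "nick", "jake", "jack"]
p words_or_nil("peter")                 #=> ["peter"]
p words_or_nil("peter,,nick ")          #=> nil
```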

Removing trailing zeros in string

I have a string and I need to remove trailing zeros after the 2nd decimal place:
remove_zeros("1,2,3,4.2300") #=> "1,2,3,4.23"
remove_zeros("1,2,3,4.20300") #=> "1,2,3,4.203"
remove_zeros("1,2,3,4.0200") #=> "1,2,3,4.02"
remove_zeros("1,2,3,4.0000") #=> "1,2,3,4.00"
Missing zeros don't have to be appended, i.e.
remove_zeros("1,2,3,4.0") #=> "1,2,3,4.0"
How could I do this in Ruby? I tried with converting into Float but it terminates the string when I encounter a ,. Can I write any regular expression for this?
Yes, a regular expression could be used.
R = /
\. # match a decimal point
\d*? # match zero or more digits lazily
\K # forget all matches so far
0+ # match one or more zeroes
(?!\d) # do not match a digit (negative lookahead)
/x # free-spacing regex definition mode
def truncate_floats(str)
str.gsub(R,"")
end
truncate_floats "1,2,3,4.2300"
#=> "1,2,3,4.23"
truncate_floats "1.34000,2,3,4.23000"
#=> "1.34,2,3,4.23"
truncate_floats "1,2,3,4.23003500"
#=> "1,2,3,4.230035"
truncate_floats "1,2,3,4.3"
#=> "1,2,3,4.3"
truncate_floats "1,2,3,4.000"
#=> "1,2,3,4."
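The question asks that the first two decimal places be kept, which the regex above does not do (note the last result). One possible adjustment, offered as a sketch rather than as the answer's method, is to require two digits after the decimal point before \K. The constant name TRAILING_ZEROS is an assumption; remove_zeros is the method name used in the question.

```ruby
# Hypothetical variant: zeros are removed only after the 2nd decimal place.
TRAILING_ZEROS = /
  \.       # match a decimal point
  \d{2}    # match the two decimal places that must be kept
  \d*?     # match zero or more further digits lazily
  \K       # forget everything matched so far
  0+       # match one or more zeroes
  (?!\d)   # the next character must not be a digit
/x

def remove_zeros(str)
  str.gsub(TRAILING_ZEROS, "")
end

p remove_zeros("1,2,3,4.2300")   #=> "1,2,3,4.23"
p remove_zeros("1,2,3,4.20300")  #=> "1,2,3,4.203"
p remove_zeros("1,2,3,4.0000")   #=> "1,2,3,4.00"
p remove_zeros("1,2,3,4.0")      #=> "1,2,3,4.0"
```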
> a = "1,2,3,4.2300"
> a.split(",").map{|e| e.include?(".") ? e.to_f : e}.join(",")
#=> "1,2,3,4.23"
> a = "1,2,3,4.20300"
> a.split(",").map{|e| e.include?(".") ? e.to_f : e}.join(",")
#=> "1,2,3,4.203"
First, you need to parse the string into its component numbers, then remove the trailing zeros on each number. This can be done by:
1) splitting the string on ',' to get an array of numeric strings
2) for each numeric string, convert it to a Float, then back to a string:
#!/usr/bin/env ruby
def parse_and_trim(string)
number_strings = string.split(',')
number_strings.map { |s| Float(s).to_s }.join(',')
end
p parse_and_trim('1,2,3,4.2300') # => "1.0,2.0,3.0,4.23"
If you really want to remove the trailing '.0' fragments, you could replace the script with this one:
#!/usr/bin/env ruby
def parse_and_trim_2(string)
original_strings = string.split(',')
converted_strings = original_strings.map { |s| Float(s).to_s }
trimmed_strings = converted_strings.map do |s|
s.end_with?('.0') ? s[0..-3] : s
end
trimmed_strings.join(',')
end
p parse_and_trim_2('1,2,3,4.2300') # => "1,2,3,4.23"
These could of course be made more concise, but I've used intermediate variables to clarify what's going on.

Ruby transform string of range measurements into a list of the measurements?

I have a sample string that I would like to transform, from this:
#21inch-#25inch
to this:
#21inch #22inch #23inch #24inch #25inch
Using Ruby, please show me how this can be done.
You can scan the string and work with a range of strings:
numbers = "#21inch-#25inch".scan(/\d+/)
=> ["21", "25"]
Range.new(*numbers).map{ |s| "##{s}inch" }.join(" ")
=> "#21inch #22inch #23inch #24inch #25inch"
This solution works only if your string has the same format as in your example. For other cases you would have to write your own specific solution.
R = /
(\D*) # match zero or more non-digits in capture group 1
(\d+) # match one or more digits in capture group 2
([^\d-]+) # match one or more chars other than digits and hyphens in capture group 3
/x # free-spacing regex definition mode
def spin_out(str)
(prefix, first, units),(_, last, _) = str.scan(R)
(first..last).map { |s| "%s%s%s" % [prefix,s,units] }.join(' ')
end
spin_out "#21inch-#25inch"
#=> "#21inch #22inch #23inch #24inch #25inch"
spin_out "#45cm-#53cm"
#=> "#45cm #46cm #47cm #48cm #49cm #50cm #51cm #52cm #53cm"
spin_out "sz 45cm-sz 53cm"
#=> "sz 45cm sz 46cm sz 47cm sz 48cm sz 49cm sz 50cm sz 51cm sz 52cm sz 53cm"
spin_out "45cm-53cm"
#=> "45cm 46cm 47cm 48cm 49cm 50cm 51cm 52cm 53cm"
For str = "#21inch-#25inch", we obtain
(prefix, first, units),(_, last, _) = str.scan(R)
#=> [["#", "21", "inch"], ["-#", "25", "inch"]]
prefix
#=> "#"
first
#=> "21"
units
#=> "inch"
last
#=> "25"
The subsequent mapping is straightforward.
You can use a regex gsub with a block match replacement, like this:
string = "#21inch-#25inch"
new_string = string.gsub(/#\d+\w+-#\d+\w+/) do |match|
  first_capture, last_capture = match.split("-")
  puts "First: #{first_capture} Last: #{last_capture}"
  first_num = first_capture.gsub(/\D+/, "").to_i
  last_num = last_capture.gsub(/\D+/, "").to_i
  puts "First num: #{first_num} Last num: #{last_num}"
  pattern = first_capture.split(/\d+/)
  puts "Pattern: #{pattern}"
  (first_num..last_num).map {|num| pattern.join(num.to_s) }.join(" ")
end
puts new_string
Running this will produce this output:
First: #21inch Last: #25inch
First num: 21 Last num: 25
Pattern: ["#", "inch"]
#21inch #22inch #23inch #24inch #25inch
The last line of output is the answer, and the previous lines show the progression of logic to get there.
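This approach should work for other unit formats, as well:
#32ft-#49ft
As written, the regex requires a # before both numbers. If the second # is made optional (/#\d+\w+-#?\d+\w+/), ranges such as these are handled too:
#1mm-5mm
#2acres-5acres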
Making this suit multiple purposes will be quite simple. With a slight variation in the regex, you could also support a range format #21inch..#25inch:
/(#\d+\w+)[-.]+(#\d+\w+)/
Happy parsing!
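To make the variations discussed above concrete, here is one possible sketch that accepts either "-" or ".." as the separator and does not require a "#" before the second number. The method name expand_range and its regex are assumptions made for illustration; they are not taken from any of the answers, and the sketch assumes the units contain no digits.

```ruby
# Hypothetical helper: expands "<prefix><n><units>-<prefix><m><units>".
def expand_range(str)
  m = str.match(/\A(\D*)(\d+)(\D*?)(?:-|\.\.)\D*(\d+)\D*\z/)
  return str if m.nil?                     # not a range: leave it unchanged

  prefix, first, units, last = m.captures
  # Prefix and units are taken from the first half of the range.
  (first.to_i..last.to_i).map { |n| "#{prefix}#{n}#{units}" }.join(' ')
end

puts expand_range("#21inch-#25inch")   # prints: #21inch #22inch #23inch #24inch #25inch
puts expand_range("#1mm-5mm")          # prints: #1mm #2mm #3mm #4mm #5mm
puts expand_range("#21inch..#23inch")  # prints: #21inch #22inch #23inch
```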

How to write a regex in a single line

I have this code:
str = 'printf("My name is %s and age is %0.2d", name, age);'
SPECIFIERS = 'diuXxofeEgsc'
format_specifiers = /((?:%(?:\*?([-+]?\d*\.?\d+)*(?:[#{SPECIFIERS}]))))/i
variables = /([.[^"]]*)\);$/
format = str.scan(format_specifiers)
var = str.scan(variables).first.first.split(/,/)
Is there any way a single regex can do that in a couple of lines?
My desired output is:
%s, name
%0.2d, age
I'm a big believer in keeping regular expressions as simple as possible; they can too quickly mushroom into unwieldy, unmaintainable messes. I'd start with something like this, then tweak as necessary:
str = 'printf("My name is %s and age is %0.2d", name, age);'
formats = str.scan(/%[a-z0-9.]+/) # => ["%s", "%0.2d"]
str[/,(.+)\);$/] # => ", name, age);"
vars = str[/,(.+)\);$/].scan(/[a-z]+/) # => ["name", "age"]
puts formats.zip(vars).map{ |a| a.join(', ')}
# >> %s, name
# >> %0.2d, age
Your question has two parts:
Q1: Is it possible to do this with a single regex?
Q2: Can this be done in one or two lines of code?
The answer to both questions is "yes".
format_specifiers = /
%[^\s\"\z]+ # match % followed by > 0 characters other than a
# whitespace, a double-quote or the end of the string
/x # free-spacing regex definition mode
variables = /
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
/x
You can decide, after testing, if these two regexes do their jobs adequately. For testing, refer to Kernel#sprintf.
r = /
(?:#{format_specifiers}) # match format_specifiers in a non-capture group
| # or
(?:#{variables}) # match variables in a non-capture group
/x
#=> /
(?:(?x-mi:
%[^\s\"\z]+ # match % followed by > 0 characters other than a
# whitespace, a double-quote or the end of the string
)) # match format_specifiers in a non-capture group
| # or
(?:(?x-mi:
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
)) # match variables in a non-capture group
/x
r can of course also be written:
/(?:(?x-mi:%[^\s\"\z]+))|(?:(?x-mi:,\s*\K[a-zA-Z]\w*))/
One advantage of constructing r from two regexes is that each of the latter can be tested separately.
str = 'printf("My name is %s and age is %0.2d", name, age);'
arr = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
arr.each_slice(arr.size/2).to_a.transpose.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
I have five lines of code. We could reduce this to two by simply substituting out r in str.scan(r). We could make it a single line by writing:
str.scan(r).tap { |a|
a.replace(a.each_slice(a.size/2).to_a.transpose.map { |s| s.join(', ') }) }
#=> ["%s, name", "%0.2d, age"]
with r substituted out.
The steps here are as follows:
a = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
b = a.each_slice(a.size/2)
#=> a.each_slice(2)
#=> #<Enumerator: ["%s", "%0.2d", "name", "age"]:each_slice(2)>
c = b.to_a
#=> [["%s", "%0.2d"], ["name", "age"]]
d = c.transpose
#=> [["%s", "name"], ["%0.2d", "age"]]
e = d.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
a.replace(e)
#=> ["%s, name", "%0.2d, age"]
The methods used (aside from Array#size) are String#scan, Enumerable#each_slice, Enumerable#to_a, Enumerable#map, Array#transpose and Array#replace.
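For comparison, the pairing could also be done without combining the regexes, by scanning twice and zipping the results. This is a sketch under assumptions: the method name pair_specifiers is hypothetical, the specifier pattern is simplified to "% followed by characters other than whitespace and double quotes", and it assumes there are as many variables as specifiers.

```ruby
# Hypothetical helper: scan for specifiers and variables separately, then zip.
def pair_specifiers(str)
  formats = str.scan(/%[^\s"]+/)           # e.g. "%s", "%0.2d"
  vars = str.scan(/,\s*\K[a-zA-Z]\w*/)     # names that follow a comma
  formats.zip(vars).map { |pair| pair.join(', ') }
end

str = 'printf("My name is %s and age is %0.2d", name, age);'
puts pair_specifiers(str)
# prints:
# %s, name
# %0.2d, age
```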
