Find nth occurrence of variable regex in Ruby? - ruby

I am writing a method that finds the index of the nth occurrence of a particular left bracket chosen by the user. For example, if the user passes a string along with the additional arguments '{' and 5, the method should find the 5th occurrence of '{'; the same goes for '(' and '['.
I currently do this with a while loop that compares each character, but that looks ugly and isn't very interesting. Is there a way to do this with a regex? Can you use a variable in a regex?
def _find_bracket_n(str,left_brac,brackets_num)
i = 0
num_of_left_bracs = 0
while i < str.length && num_of_left_bracs < brackets_num
num_of_left_bracs += 1 if str[i] == left_brac
i += 1
end
n_th_lbrac_index = i - 1
end

The offset of the nth instance of a given character in a string is wanted, or nil if the string contains fewer than n instances of that character. I will give four solutions.
chr = "("
str = "a(b(cd((ef(g(hi("
n = 5
Use Enumerable#find_index
k = n # count down on a copy so that n keeps its value for the later examples
str.each_char.find_index { |c| c == chr && (k -= 1).zero? }
#=> 10
Use a regular expression
chr_esc = Regexp.escape(chr)
#=> "\\("
r = /
\A # match the beginning of the string
(?: # begin a non-capture group
.*? # match zero or more characters lazily
#{chr_esc} # match the given character
) # end the non-capture group
{#{n-1}} # perform the non-capture group `n-1` times
.*? # match zero or more characters lazily
#{chr_esc} # match the given character
/x # free-spacing regex definition mode
#=> /
\A # match the beginning of the string
(?: # begin a non-capture group
.*? # match zero or more characters lazily
\( # match the given character
) # end the non-capture group
{4} # perform the non-capture group `n-1` times
.*? # match zero or more characters lazily
\( # match the given character
/x
str =~ r
#=> 0
$~.end(0)-1
#=> 10
For the last line we could instead write
Regexp.last_match.end(0)-1
See Regexp::escape, Regexp::last_match and MatchData#end.
The regex is conventionally written (i.e., not in free-spacing mode) as follows.
/\A(?:.*?#{chr_esc}){#{n-1}}.*?#{chr_esc}/
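For reuse, the regex approach can be wrapped in a method. The following is only a minimal sketch: the method name nth_index_regex is hypothetical, the pattern repeats the group n times instead of n-1 times plus a tail (equivalent for n >= 1), and the /m flag is an addition that is not in the answer above, so that "." also crosses newlines.

```ruby
# Hypothetical helper: offset of the nth occurrence of chr in str, or nil.
def nth_index_regex(str, chr, n)
  return nil if n < 1
  chr_esc = Regexp.escape(chr)   # "(" becomes "\\("
  # n lazy groups, each ending in chr; /m lets "." match newlines too (an assumption)
  m = str.match(/\A(?:.*?#{chr_esc}){#{n}}/m)
  m && m.end(0) - chr.size       # the match ends just past the nth chr
end

nth_index_regex("a(b(cd((ef(g(hi(", "(", 5)   #=> 10
nth_index_regex("a(b(cd((ef(g(hi(", "(", 20)  #=> nil
```

Escaping with Regexp.escape is what makes it safe to interpolate a user-supplied bracket such as "(" or "[" into the pattern.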
Convert characters to offsets, remove offsets to non-matching characters and return the nth offset of those that remain
str.size.times.select { |i| str[i] == chr }[n-1]
#=> 10
n = 20
str.size.times.select { |i| str[i] == chr }[n-1]
#=> nil
Use String#index repeatedly to decapitate substrings
s = str.dup
n.times.reduce(0) do |off,_|
i = s.index(chr)
break nil if i.nil?
s = s[i+1..-1]
off + i + 1
end - 1
#=> 10
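A variant of this last idea avoids copying substrings by passing a start offset to String#index. This is a sketch under that assumption only; the method name nth_index is hypothetical.

```ruby
# Hypothetical helper: offset of the nth occurrence of chr in str, or nil.
def nth_index(str, chr, n)
  return nil if n < 1
  pos = -1
  n.times do
    # resume the search just after the previous hit
    pos = str.index(chr, pos + 1)
    return nil if pos.nil?
  end
  pos
end

nth_index("a(b(cd((ef(g(hi(", "(", 5)   #=> 10
nth_index("a(b(cd((ef(g(hi(", "(", 20)  #=> nil
```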

Related

How to sum values from csv file with Ruby

I have a Csv file with several columns. The 4th Column has a format that I want to parse. String str below would be one line of the file:
str = "108,882,xyz, { Abc:{-} Val1:{6845} Val2:{653} llsh:{0} xTime: {2018-11-10 09:56:12} Yub:{Rtv} Val1:{807} Val2:{153} llsh:{0} xTime: {2018-11-10 09:59:05}A Wbc:{57} Val1:{441} Val2:{875} llsh:{0} xTime: {2018-11-10 10:13:12:22}"
For this 4th column I'd like to sum all Val1 and Val2 values present within the string and show the first and last date as a new column. If Val1 and Val2 appear only once, then there is no sum to do and the output would be the values of Val1, Val2 and xTime.
The output would be:
Col1, Col2, Col3, Val1, Val2 , xTime
108, 882, xyz, 8093, 1681, 2018-11-10 09:56:12 - 2018-11-10 10:13:12:22
I'm trying with CSV.parse.
require 'csv'
CSV.parse(str)
For 4th column do
//Parse
How can I do this in Ruby?
Thanks for any help
The essence of this problem is extracting the desired information from the part of the string that follows "108,882,xyz, ", as opposed to how a CSV string is to be parsed, so I will confine my attention to the former.
r = /
Val1:\{ # match string
(\d+) # match > 0 digits in capture group 1
\}\ +Val2:\{ # match string
(\d+) # match > 0 digits in capture group 2
\}\ +[^\}]+\}\ +xTime:\ +\{ # match string
(.+?) # match > 0 characters lazily in capture group 3
\} # match string
/x # free-spacing regex definition mode
This regular expression is conventionally written as follows:
/Val1:\{(\d+)\} +Val2:\{(\d+)\} +[^\}]+\} +xTime: +\{(.+?)\}/
Notice that when using free-spacing mode space characters would be stripped out by the parser if they were not protected in some way. There are a few ways of protecting them. I have chosen to escape each space character. Free-spacing mode has the advantage that it makes the regular expression self-documenting.
a = str.scan(r)
#=> [["6845", "653", "2018-11-10 09:56:12"],
# [ "807", "153", "2018-11-10 09:59:05"],
# [ "441", "875", "2018-11-10 10:13:12:22"]]
val1, val2, (f,*,l) = a.transpose
#=> [["6845", "807", "441"],
# [ "653", "153", "875"],
# ["2018-11-10 09:56:12", "2018-11-10 09:59:05", "2018-11-10 10:13:12:22"]]
val1
#=> ["6845", "807", "441"]
val2
#=> ["653", "153", "875"]
f #=> "2018-11-10 09:56:12"
l #=> "2018-11-10 10:13:12:22"
def convert(arr)
arr.map(&:to_i).sum
end
convert(val1)
#=> 8093
convert(val2)
#=> 1681
"%s - %s" % [f,l]
#=> "2018-11-10 09:56:12 - 2018-11-10 10:13:12:22"
See String#scan.
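The pieces above could be combined into one method that builds the whole output row. This is a rough sketch, not a full CSV solution: the names VAL_RE and summarize are hypothetical, and it assumes the first three columns contain no commas, so a plain split is enough for them.

```ruby
# Same pattern as above, written on one line.
VAL_RE = /Val1:\{(\d+)\} +Val2:\{(\d+)\} +[^\}]+\} +xTime: +\{(.+?)\}/

# Hypothetical helper: returns [col1, col2, col3, val1_sum, val2_sum, time],
# or nil when the line holds no Val1/Val2/xTime group.
def summarize(line)
  # assumption: the first three columns contain no commas
  prefix = line.split(",", 4).first(3).map(&:strip)
  rows = line.scan(VAL_RE)
  return nil if rows.empty?
  val1, val2, times = rows.transpose
  # a single group has nothing to sum or span, so show its xTime alone
  time = times.size == 1 ? times.first : "#{times.first} - #{times.last}"
  prefix + [val1.sum(&:to_i), val2.sum(&:to_i), time]
end
```

Calling summarize(str).join(", ") on the string from the question should give a line in the shape of the desired output.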

Regex for name in Ruby

I know this question has been asked a lot but I need a RegEx for a name validator.
The only requirements are: letters are okay; no numbers; no special characters other than at most 2 spaces, which cannot be at the beginning or end; "-" and "`" are also allowed. Everything else would be invalid.
All the other answers seem to ask for a lot more and seem to get too complicated.
Currently I am using
/^([^\d\W]|[-])*$/
But this fails with the space
Sample data:
Pass:
Susan Johnson,
Stephanie Le'Sean,
John Pierre'-Frank
Fail:
Ricky2Good,
Jean,stewie,
Mike#dude,
Jim. McNeil
I've assumed that for a string to be valid, it may contain only uppercase and lowercase letters, apostrophes, dashes and at most two spaces, provided the spaces are not at the beginning or end of the string.
STR= "-a-z'"
r = /
\A # match beginning of string
(?: # begin non-capture group
[#{STR}]+ # match 1+ letters, "-" or "'"
| # or
[#{STR}]+\s[#{STR}]*\s?[#{STR}]+
# match 1+ letters, "-" or "'", space, 0+ letters, "-" or "'",
# optional space, 1+ letters, "-" or "'"
) # end non-capture group
\z # match end of string
/ix # case-indifferent and free-spacing regex definition modes
#=> /
\A # match beginning of string
(?: # begin non-capture group
[-a-z']+ # match 1+ letters, "-" or "'"
| # or
[-a-z']+\s[-a-z']*\s?[-a-z']+
# match 1+ letters, "-" or "'", space, 0+ letters, "-" or "'",
# optional space, 1+ letters, "-" or "'"
) # end non-capture group
\z # match end of string
/ix
If I did not use free-spacing mode to define the regex it would look like this:
r = /\A(?:[-a-z']+|[-a-z']+\s[-a-z']*\s?[-a-z']+)\z/i
"a B-' v" =~ r #=> 0
"aB-'v" =~ r #=> 0
"aB-'1v" =~ r #=> nil
"a B-'1 v" =~ r #=> nil
" a B-1v" =~ r #=> nil
If you wish to return true or false, rather than a truthy value 0 or a falsy value nil, you could write, for example:
("a B-' v" =~ r) ? true : false #=> true
or (the "trick")
!!("a B-' v" =~ r) #=> true
The latter works because it is the same as:
!(!("a B-' v" =~ r))
#=> !(!(0)) => !(false) => true
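On Ruby 2.4 or later, Regexp#match? returns true or false directly, so neither conversion is needed. A small sketch follows; the names NAME_RE and valid_name? are hypothetical, and the pattern is the one-line regex given above.

```ruby
NAME_RE = /\A(?:[-a-z']+|[-a-z']+\s[-a-z']*\s?[-a-z']+)\z/i

# Hypothetical helper: true if name passes the regex, false otherwise.
def valid_name?(name)
  NAME_RE.match?(name)
end

valid_name?("Susan Johnson")       #=> true
valid_name?("John Pierre'-Frank")  #=> true
valid_name?("Ricky2Good")          #=> false
valid_name?("Jim. McNeil")         #=> false
```

Unlike =~, match? does not set $~ or the other match globals.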
The question asks for a regex to validate names. Using a regex may be the best, but it's not the only way. If the question is really how to validate names--using a regex or otherwise--it should be stated in a way that doesn't stipulate a particular approach. Here's one way to validate without using a regex.
GOOD_CHARS = ('a'..'z').to_a.join << "'-"
#=> "abcdefghijklmnopqrstuvwxyz'-"
def validate(str)
return false if str.empty? || (str[0]==' ' || str[-1]==' ')
nbr_spaces = str.count(' ')
return false if nbr_spaces > 2
str.downcase.count(GOOD_CHARS) + nbr_spaces == str.size
end
validate "a B-' v" #=> true
validate "aB-'v" #=> true
validate "aB-`1v" #=> false
validate "a B-'1 v" #=> false
validate " a B-'1v" #=> false
The following regex should filter for letters, no special characters (other than one space, dashes, and backticks), and no numbers:
/^[a-zA-Z\-\`]++(?: [a-zA-Z\-\`]++)?$/
Hope it helps!

i have a regular expression that i need to figure out

"peter,nick,jake,jack"
I need to validate a string like this.
It cannot contain any whitespace after a word. For example,
"peter,," "peter," "peter,,nick " would all be incorrect.
It has to be either a single word such as "peter" or words separated by single commas ("peter,nick").
First confirm that the string has the required structure.
r = /
\A # match the beginning of the string
[[:alpha:]]+ # match > 0 letters
(?:,[[:alpha:]]+) # match a comma then > 0 letters in a non-capture group
* # match the preceding non-capture group >= 0 times
\z # match end of the string
/x # free-spacing regex definition mode
str = "peter,nick,jake,jack"
str =~ r #=> 0
Since it matches the regex, simply split on commas to return an array of the words.
str.split(',') #=> ["peter", "nick", "jake", "jack"]
By contrast:
"peter,nick,,jake,jack" =~ r #=> nil
"peter,nick,jake, jack" =~ r #=> nil
"peter,nick,jake,jack " =~ r #=> nil
"peter ispeter,nick" =~ r #=> nil
I assume the string must contain at least one letter.
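The two steps, validate then split, could be wrapped together. This is a minimal sketch; the names WORD_LIST and parse_words are hypothetical, and returning nil for an invalid string is a design choice, not something the question asks for.

```ruby
# The regex above, written on one line.
WORD_LIST = /\A[[:alpha:]]+(?:,[[:alpha:]]+)*\z/

# Hypothetical helper: array of words if str is well formed, else nil.
def parse_words(str)
  str.match?(WORD_LIST) ? str.split(",") : nil
end

parse_words("peter,nick,jake,jack")  #=> ["peter", "nick", "jake", "jack"]
parse_words("peter")                 #=> ["peter"]
parse_words("peter,,nick ")          #=> nil
```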

Removing trailing zeros in string

I have a string and I need to remove trailing zeros after the 2nd decimal place:
remove_zeros("1,2,3,4.2300") #=> "1,2,3,4.23"
remove_zeros("1,2,3,4.20300") #=> "1,2,3,4.203"
remove_zeros("1,2,3,4.0200") #=> "1,2,3,4.02"
remove_zeros("1,2,3,4.0000") #=> "1,2,3,4.00"
Missing zeros don't have to be appended, i.e.
remove_zeros("1,2,3,4.0") #=> "1,2,3,4.0"
How could I do this in Ruby? I tried with converting into Float but it terminates the string when I encounter a ,. Can I write any regular expression for this?
Yes, a regular expression could be used.
R = /
\. # match a decimal point
\d*? # match zero or more digits lazily
\K # forget all matches so far
0+ # match one or more zeroes
(?!\d) # do not match a digit (negative lookahead)
/x # free-spacing regex definition mode
def truncate_floats(str)
str.gsub(R,"")
end
truncate_floats "1,2,3,4.2300"
#=> "1,2,3,4.23"
truncate_floats "1.34000,2,3,4.23000"
#=> "1.34,2,3,4.23"
truncate_floats "1,2,3,4.23003500"
#=> "1,2,3,4.230035"
truncate_floats "1,2,3,4.3"
#=> "1,2,3,4.3"
truncate_floats "1,2,3,4.000"
#=> "1,2,3,4."
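As the last example shows, this regex also strips zeros in the first two decimal places, whereas the question wants those kept. One possible adjustment, sketched here as an assumption and not part of the answer above, is to require two digits after the decimal point before any zeros may be removed. The names KEEP_TWO and remove_zeros are illustrative.

```ruby
KEEP_TWO = /
  \.      # match a decimal point
  \d{2}   # match the two digits that must always be kept
  \d*?    # match zero or more further digits lazily
  \K      # forget everything matched so far
  0+      # match one or more zeroes
  (?!\d)  # not followed by a digit (negative lookahead)
/x

def remove_zeros(str)
  str.gsub(KEEP_TWO, "")
end

remove_zeros("1,2,3,4.2300")   #=> "1,2,3,4.23"
remove_zeros("1,2,3,4.20300")  #=> "1,2,3,4.203"
remove_zeros("1,2,3,4.0000")   #=> "1,2,3,4.00"
remove_zeros("1,2,3,4.0")      #=> "1,2,3,4.0"
```

Numbers with fewer than three decimal places are never matched, so they pass through unchanged.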
> a = "1,2,3,4.2300"
> a.split(",").map{|e| e.include?(".") ? e.to_f : e}.join(",")
#=> "1,2,3,4.23"
> a = "1,2,3,4.20300"
> a.split(",").map{|e| e.include?(".") ? e.to_f : e}.join(",")
#=> "1,2,3,4.203"
First, you need to parse the string into its component numbers, then remove the trailing zeros on each number. This can be done by:
1) splitting the string on ',' to get an array of numeric strings
2) for each numeric string, convert it to a Float, then back to a string:
#!/usr/bin/env ruby
def parse_and_trim(string)
number_strings = string.split(',')
number_strings.map { |s| Float(s).to_s }.join(',')
end
p parse_and_trim('1,2,3,4.2300') # => "1.0,2.0,3.0,4.23"
If you really want to remove the trailing '.0' fragments, you could replace the script with this one:
#!/usr/bin/env ruby
def parse_and_trim_2(string)
original_strings = string.split(',')
converted_strings = original_strings.map { |s| Float(s).to_s }
trimmed_strings = converted_strings.map do |s|
s.end_with?('.0') ? s[0..-3] : s
end
trimmed_strings.join(',')
end
p parse_and_trim_2('1,2,3,4.2300') # => "1,2,3,4.23"
These could of course be made more concise, but I've used intermediate variables to clarify what's going on.

How to write a regex in a single line

I have this code:
str = 'printf("My name is %s and age is %0.2d", name, age);'
SPECIFIERS = 'diuXxofeEgsc'
format_specifiers = /((?:%(?:\*?([-+]?\d*\.?\d+)*(?:[#{SPECIFIERS}]))))/i
variables = /([.[^"]]*)\);$/
format = str.scan(format_specifiers)
var = str.scan(variables).first.first.split(/,/)
Is there any way a single regex can do that in a couple of lines?
My desired output is:
%s, name
%0.2d, age
I'm a big believer in keeping regular expressions as simple as possible; They can too quickly mushroom into unwieldy/unmaintainable messes. I'd start with something like this, then tweak as necessary:
str = 'printf("My name is %s and age is %0.2d", name, age);'
formats = str.scan(/%[a-z0-9.]+/) # => ["%s", "%0.2d"]
str[/,(.+)\);$/] # => ", name, age);"
vars = str[/,(.+)\);$/].scan(/[a-z]+/) # => ["name", "age"]
puts formats.zip(vars).map{ |a| a.join(', ')}
# >> %s, name
# >> %0.2d, age
Your question has two parts:
Q1: Is it possible to do this with a single regex?
Q2: Can this be done in one or two lines of code?
The answer to both questions is "yes".
format_specifiers = /
%[^\s\"\z]+ # match % followed by > 0 characters other than a
# whitespace, a double-quote or the end of the string
/x # free-spacing regex definition mode
variables = /
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
/x
You can decide, after testing, if these two regexes do their jobs adequately. For testing, refer to Kernel#sprintf.
r = /
(?:#{format_specifiers}) # match format_specifiers in a non-capture group
| # or
(?:#{variables}) # match variables in a non-capture group
/x
#=> /
(?:(?x-mi:
%[^\s\"\z]+ # match % followed by > 0 characters other than a
# whitespace, a double-quote or the end of the string
)) # match format_specifiers in a non-capture group
| # or
(?:(?x-mi:
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
)) # match variables in a non-capture group
/x
r can of course also be written:
/(?:(?x-mi:%[^\s\"\z]+))|(?:(?x-mi:,\s*\K[a-zA-Z]\w*))/
One advantage of constructing r from two regexes is that each of the latter can be tested separately.
str = 'printf("My name is %s and age is %0.2d", name, age);'
arr = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
arr.each_slice(arr.size/2).to_a.transpose.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
I have five lines of code. We could reduce this to two by simply substituting out r in str.scan(r). We could make it a single line by writing:
str.scan(r).tap { |a|
a.replace(a.each_slice(a.size/2).to_a.transpose.map { |s| s.join(', ') }) }
#=> ["%s, name", "%0.2d, age"]
with r substituted out.
The steps here are as follows:
a = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
b = a.each_slice(a.size/2)
#=> a.each_slice(2)
#=> #<Enumerator: ["%s", "%0.2d", "name", "age"]:each_slice(2)>
c = b.to_a
#=> [["%s", "%0.2d"], ["name", "age"]]
d = c.transpose
#=> [["%s", "name"], ["%0.2d", "age"]]
e = d.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
a.replace(e)
#=> ["%s, name", "%0.2d, age"]
The methods used (aside from Array#size) are String#scan, Enumerable#each_slice, Enumerable#to_a, Enumerable#map, Array#transpose and Array#replace.
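An alternative to slicing the array in half is to separate the tokens by their leading "%" and zip the two groups. This is only a sketch: the names TOKEN_RE and pair_up are hypothetical, and the regex is a simplified stand-in for r that assumes specifiers run up to the next whitespace or double quote.

```ruby
# Simplified stand-in for r (an assumption, not the regex built above).
TOKEN_RE = /%[^\s"]+|,\s*\K[a-zA-Z]\w*/

# Hypothetical helper: pairs each format specifier with its variable.
def pair_up(str)
  # tokens starting with "%" are specifiers; the rest are variable names
  specs, vars = str.scan(TOKEN_RE).partition { |tok| tok.start_with?("%") }
  specs.zip(vars).map { |pair| pair.join(", ") }
end

pair_up('printf("My name is %s and age is %0.2d", name, age);')
#=> ["%s, name", "%0.2d, age"]
```

Partitioning does not rely on the specifiers and variables filling exactly half of the array each.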
