regex scan only returning first value - ruby

I have two strings that should both return matches according to the regex, but only str1 returns the expected match. str1 is an exact match for the regex (created by Avinash Raj) below. str2 contains str1 and more data. I expected str2 to return str1 and more values that matched, but it returns nothing. Can someone explain why?
str1="3,15,14,31,40,5,5,4,5,3,4,4,5,2,2,2,1,2,1,1,3,3,3,2,4,3,false,false,false,false,false,true,false,true,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,3,3,3,2,3"
str2="3,15,14,31,40,5,5,4,5,3,4,4,5,2,2,2,1,2,1,1,3,3,3,2,4,3,false,false,false,false,false,true,false,true,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,3,3,3,2,3,3,15,14,35,27,4,5,3,5,3,2,4,4,2,1,1,2,2,2,1,3,3,3,2,5,9,true,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,true,true,false,false,false,false,2,2,3,2,3,3,15,16,34,53,4,4,4,3,1,3,4,3,1,1,1,1,1,1,1,2,3,2,3,5,1,true,false,false,false,false,false,true,false,false,false,false,false,false,false,true,true,false,false,false,false,false,false,false,false,false,false,false,3,2,3,2,3,3,15,18,37,29,4,4,4,3,2,3,3,4,1,1,1,1,1,1,1,1,3,1,2,4,1,true,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,3,2,3,2,3,3,15,20,34,37,4,4,4,3,1,3,3,4,1,1,1,1,1,1,1,1,1,1,2,4,1,false,false,false,true,false,false,false,false,false,false,false,false,false,true,false,true,false,false,false,false,false,false,false,false,true,false,false,3,1,3,1,3,3,16,10,18,30,4,3,3,3,1,3,3,3,1,1,1,1,1,1,1,1,2,1,4,4,3,false,false,false,false,false,true,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,3,2,3,2,3,3,16,12,39,5,5,5,4,5,3,5,5,5,1,1,1,1,1,1,1,2,1,1,1,5,10,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,3,2,3,2,3,3,16,14,18,27,4,4,4,4,2,3,3,4,1,1,1,1,1,1,1,1,1,1,2,5,1,true,false,false,false,false,false,false,false,false,false,false,false,false,true,false,true,false,false,false,false,true,false,false,false,false,false,false,3,2,3,2,3,3,16,16,18,32,5,5,5,5,4,5,5,5,1,1,1,1,1,1,1,2,1,1,1,5,3,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true
,false,true,false,3,2,3,2,3,3,16,18,20,7,5,5,5,5,3,3,3,4,1,1,1,1,1,1,1,1,1,1,2,5,1,false,false,false,true,false,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,true,false,false,false,false,false,false,3,2,3,2,3,3,16,20,18,59,4,4,4,3,1,1,1,2,1,1,1,1,1,1,1,1,2,2,4,5,9,false,false,false,true,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,3,2,3,2,3,3,17,10,16,9,3,3,3,3,1,2,3,3,1,1,1,1,1,1,1,1,2,1,3,5,1,true,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,3,2,3,2,3,3,17,12,16,17,4,3,4,2,1,4,3,2,1,1,1,1,1,1,1,1,1,1,4,5,3,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,3,2,3,2,3,3,17,14,16,21,4,4,4,4,1,3,4,4,1,1,1,1,1,1,1,1,1,1,2,5,1,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,true,false,false,3,2,3,2,3,3,17,16,16,20,5,5,4,5,3,4,4,5,1,1,1,1,1,1,1,1,1,1,1,5,8,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,true,false,false,false,3,2,3,2,3,3,17,18,16,31,4,4,4,4,1,4,3,3,1,1,1,1,1,1,1,1,1,1,3,5,1,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,true,false,true,false,false,false,false,3,2,3,2,3,3,17,20,18,8,5,5,4,5,4,4,4,5,1,1,1,1,1,1,1,2,1,1,1,5,1,false,false,false,true,false,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,true,false,false,false,3,2,3,2,3,3,18,10,31,33,3,2,3,2,2,2,2,3,1,1,1,1,1,1,1,1,1,1,1,5,7,true,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,f
alse,false,false,false,false,false,false,3,2,3,2,3,3,18,12,36,11,4,4,4,5,3,4,3,3,1,1,2,1,2,1,2,2,1,1,1,5,1,false,false,false,true,false,false,true,false,false,false,false,false,false,true,false,true,false,false,false,false,false,false,false,false,true,false,false,3,2,3,2,3,3,18,14,49,6,3,3,2,2,1,2,2,2,2,1,1,1,2,1,2,3,3,4,4,5,9,true,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,3,2,3,2,3,3,18,16,32,53,3,4,4,3,3,3,3,3,1,1,1,1,1,1,2,2,1,1,3,5,7,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,true,false,false,false,false,3,2,3,2,3,3,18,18,37,59,5,4,4,4,4,4,4,4,1,1,1,1,1,1,1,2,1,1,2,5,7,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,true,false,false,false,false,3,2,3,2,3,3,19,10,5,25,4,4,4,2,2,4,3,3,1,1,1,1,1,1,1,1,2,2,2,5,1,true,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,2,1,3,2,3,3,19,13,0,5,5,5,4,5,3,3,5,5,1,1,1,1,1,1,1,1,1,1,3,5,7,false,false,true,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,true,false,false,false,false,false,false,3,2,3,2,3,3,19,14,5,23,4,4,4,4,3,4,3,3,1,1,1,1,1,1,1,1,1,2,2,5,9,false,false,true,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,3,2,3,2,3,3,19,16,7,19,5,4,4,4,3,4,3,3,1,1,1,1,1,1,1,2,2,2,3,5,9,false,false,true,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,3,2,3,2,3,3,19,18,6,30,4,4,4,4,3,4,4,4,1,1,1,1,1,1,1,1,1,1,1,5,8,false,false,true,true,false,false,false,false,false,false,false,false,false,false,false,false,false,
false,true,false,true,false,false,false,true,false,false,3,2,3,2,3,3,19,20,8,25,4,4,5,4,3,4,3,4,1,1,1,1,1,1,1,1,1,1,3,5,1,false,false,true,false,false,false,true,false,false,false,false,false,false,true,false,false,false,false,false,false,false,false,false,false,true,false,false,3,2,3,2,3,3,19,21,18,2,4,4,4,3,3,4,3,4,1,1,1,1,1,1,1,1,1,1,1,5,1,false,false,true,false,false,false,false,false,false,false,false,false,false,true,false,false,false,false,true,false,true,false,false,false,true,false,false,3,2,3,2,3,"
str1.scan(/^,?(?:[1-5]\d|[1-9])(?:,(?:[1-5]\d|[1-9])){4}(?:,[1-5]){21}(?:,(?:true|false)){27}(?:,[1-5]){5}$/).each{|x|
puts x
puts "---1---"
}
str2.scan(/^,?(?:[1-5]\d|[1-9])(?:,(?:[1-5]\d|[1-9])){4}(?:,[1-5]){21}(?:,(?:true|false)){27}(?:,[1-5]){5}$/).each{|x|
puts x
puts "---2---"
}

Kind of by definition, you can't have more than one pattern match in a string when your pattern specifically says "start of string, then [stuff], then end of string". Look at regexp anchors ^ and $.
A simpler example might make it clearer: ^a$ "start of string, then letter a, then end of string" will match in "a" once, but will match in "aaa" zero times, even though there are three letters a.
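The ^a$ example is easy to try with scan. This is a minimal sketch; the helper name count_matches is made up for illustration and is not part of the question's code.

```ruby
# Hypothetical helper: how many times does the pattern match in the text?
def count_matches(text, pattern)
  text.scan(pattern).length
end

puts count_matches("a", /^a$/)    # 1: the whole string is a single "a"
puts count_matches("aaa", /^a$/)  # 0: nothing is both at the start and at the end
puts count_matches("aaa", /^a/)   # 1: only the "a" at the start qualifies
puts count_matches("aaa", /a/)    # 3: no anchors, so every "a" is found
```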

$ asserts the position at the end of a line.
Now you are not matching up to the end of the line.
^,?(?:[1-5]\d|[1-9])(?:,(?:[1-5]\d|[1-9])){4}(?:,[1-5]){21}(?:,(?:true|false)){27}(?:,[1-5]){5}
Just remove the $ from the end. See demo.
https://regex101.com/r/sJ9gM7/22

Because your regular expression starts with the ^ metacharacter and ends with the $ metacharacter, it expects the full string to match.
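To see how each anchor affects scan on concatenated records, here is a sketch on a scaled-down record format. The shortened record (two numbers, two booleans, one digit) and all the constant names are assumptions made for illustration; the real pattern has the same shape but longer repetition counts.

```ruby
# Scaled-down record, assumed for illustration only: two numbers,
# two booleans, one digit (the real records are much longer).
RECORD = /(?:[1-5]\d|[1-9]),(?:[1-5]\d|[1-9])(?:,(?:true|false)){2},[1-5]/

BOTH_ANCHORS = /^,?#{RECORD}$/  # the whole line must be exactly one record
START_ANCHOR = /^,?#{RECORD}/   # a record must sit at the start of a line
NO_ANCHORS   = RECORD           # a record may sit anywhere

SAMPLE = "3,15,true,false,2,4,16,false,false,3,5,17,true,true,1"

p SAMPLE.scan(BOTH_ANCHORS)  # [] because the line holds more than one record
p SAMPLE.scan(START_ANCHOR)  # only the first record
p SAMPLE.scan(NO_ANCHORS)    # all three records
```

With only the $ removed, scan still returns just the first record, because ^ can match only at the start of a line. Dropping both anchors lets scan pick up every record, provided the data is well formed.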

Related

How do I match a space or the end of a line in a regexp in Ruby?

I'm using Ruby 2.4. I'm trying to write a regular expression to match a string in which the first character is either an "a" or a "b" and the next character is a space or the end of the line. So I came up with
2.4.0 :006 > data = "B U"
=> "B U"
2.4.0 :007 > data =~ /^[ab](^[[:space:]]|$)/i
=> nil
But as you can see, my expression is not matching my string "B U" even though I thought I wrote it properly. How can I revise it to make it right?
I'm trying to write a regular expression to match a string in which the first character is either an "a" or a "b" and the next character is a space or the end of the line.
The regex in Ruby will look like
/^[ab](?:[[:space:]]|$)/i
See the regex demo.
Your ^[ab](^[[:space:]]|$) pattern matches the line start, then a or b, then either a whitespace character at the start of a line (^[[:space:]], which can never match right after a letter) or the line end ($). So, your regex will only match a line that is equal to a or b (in either case, because of the i flag).
Remember to replace ^ with \A and $ with \z if you need to match whole string, not just a line.
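A short sketch of the difference between the line anchors and the string anchors. The constant names are made up for illustration.

```ruby
LINE_ANCHORED   = /^[ab](?:[[:space:]]|$)/i    # ^ and $ work per line
STRING_ANCHORED = /\A[ab](?:[[:space:]]|\z)/i  # \A and \z work on the whole string

p "B U"  =~ LINE_ANCHORED    # 0, "B" followed by a space
p "b"    =~ LINE_ANCHORED    # 0, "b" followed by the end of the line
p "bx"   =~ LINE_ANCHORED    # nil, "x" is neither whitespace nor the end
p "x\nb" =~ LINE_ANCHORED    # 2, the second line starts with "b"
p "x\nb" =~ STRING_ANCHORED  # nil, the string itself starts with "x"
```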

Replacement with regular expression and capture

The method below is supposed to transform "snake_case" to "CamelCase".
def zebulansNightmare(string)
string.gsub(/_(.)/){$1.upcase}
end
With string "camel_case", I expect gsub(/_(.)/) to match c after the _. I understood that $1 is the first matched letter: the capital letter. But it works like it's substituting _ with the capital letter. Why has the _ disappeared?
You are right that $1 is the captured value, however, the gsub matches the letter with _ before it, and the whole match gets replaced. You need to reinsert _ to the result:
"camel_case".gsub(/_(.)/){"_#{$1.upcase}"}
See the IDEONE demo
BTW, if you only plan to match _ followed by a letter (so as not to waste time and resources on trying to turn non-letters to upper case), you can use the following regex:
/_(\p{Ll})/
Where \p{Ll} is any lowercase Unicode letter.
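The three variants can be compared side by side. This is only a sketch, and the method names are invented for the comparison.

```ruby
# The whole match ("_" plus the letter) is replaced by the block's value.
def replace_whole_match(s)
  s.gsub(/_(.)/) { $1.upcase }
end

# Reinserting the underscore keeps it in the result.
def keep_underscore(s)
  s.gsub(/_(.)/) { "_#{$1.upcase}" }
end

# Only an underscore followed by a lowercase letter is touched.
def lowercase_letters_only(s)
  s.gsub(/_(\p{Ll})/) { $1.upcase }
end

puts replace_whole_match("camel_case")             # camelCase
puts keep_underscore("camel_case")                 # camel_Case
puts lowercase_letters_only("case_of_3_snakes")    # caseOf_3Snakes
```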
def zebulans_nightmare(string)
string.gsub(/\B_[a-z0-9]/) { |s| s[1].upcase }
end
zebulans_nightmare("case_of_snakes")
#=> "caseOfSnakes"
zebulans_nightmare("case_of_3_snakes")
#=> "caseOf3Snakes"
zebulans_nightmare("_case_of_3_snakes")
#=> "_caseOf3Snakes"
\B matches non-word boundaries.
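Since _ itself counts as a word character, \B_ only matches an underscore that has a word character directly before it. That is why the leading underscore above is left alone. A small sketch (the constant name is made up):

```ruby
INNER_UNDERSCORE = /\B_/

p "a_b"   =~ INNER_UNDERSCORE  # 1, "a" and "_" are both word characters
p "_case" =~ INNER_UNDERSCORE  # nil, the start of the string is a word boundary
p "a _b"  =~ INNER_UNDERSCORE  # nil, a space before "_" is a word boundary
```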

Regex to grab full firstname and first letter of last name

I have a list of users grabbed by the Etc Ruby library:
Thomas_J_Perkins
Jennifer_Scanner
Amanda_K_Loso
Aaron_Cole
Mark_L_Lamb
What I need to do is grab the full first name, skip the middle name (if given), and grab the first character of the last name. The output should look like this:
Thomas P
Jennifer S
Amanda L
Aaron C
Mark L
I'm not sure how to do this, I've tried grabbing all of the characters: /\w+/ but that will grab everything.
You don't always need regular expressions.
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." (Jamie Zawinski)
You can do it with some simple Ruby code
string = "Mark_L_Lamb"
string.split('_').first + ' ' + string.split('_').last[0]
=> "Mark L"
I think it's simpler without regex:
array = "Thomas_J_Perkins".split("_") # split at _
array.first + " " + array.last[0] # .first returns the first name, .last[0] returns the first char of the last name
#=> "Thomas P"
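Applied to the whole list from the question, the split approach could look like this. The helper name short_name and the USERS constant are made up for the sketch.

```ruby
USERS = %w[Thomas_J_Perkins Jennifer_Scanner Amanda_K_Loso Aaron_Cole Mark_L_Lamb]

# Hypothetical helper: full first name plus the first letter of the last name.
def short_name(user)
  parts = user.split('_')            # a middle name, if any, ends up in between
  "#{parts.first} #{parts.last[0]}"  # so only the first and last parts are used
end

puts USERS.map { |user| short_name(user) }
```

Splitting once and reusing the array avoids calling split twice per name.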
You can use
^([^\W_]+)(?:_[^\W_]+)*_([^\W_])[^\W_]*$
and replace with "\1 \2" (a space between the groups, to match the requested output). See the regex demo
The [^\W_] matches a letter or a digit. If you want to only match letters, replace [^\W_] with \p{L}:
^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$
See updated demo
The point is to match and capture the first chunk of letters up to the first _ (with (\p{L}+)), then match 0+ sequences of _ + letters and the final _ (with (?:_\p{L}+)*_), then match and capture the first letter of the last word (with (\p{L})), and finally match the rest of the string (with \p{L}*).
NOTE: replace ^ with \A and $ with \z if you have independent strings (as in Ruby ^ matches the start of a line and $ matches the end of the line).
Ruby code:
s.sub(/^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$/, "\\1 \\2")
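A sketch of the same substitution using the string anchors \A and \z, with a space in the replacement to match the output asked for in the question. The names NAME_PATTERN and abbreviate are made up for illustration.

```ruby
# Letters-only variant of the pattern, anchored to the whole string.
NAME_PATTERN = /\A(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*\z/

# Hypothetical helper: returns the input unchanged when it does not match.
def abbreviate(user)
  user.sub(NAME_PATTERN, '\1 \2')
end

puts abbreviate("Thomas_J_Perkins")  # Thomas P
puts abbreviate("Aaron_Cole")        # Aaron C
puts abbreviate("Madonna")           # Madonna (no underscore, so no match)
```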
I'm in the don't-use-a-regex-for-this camp.
str1 = "Alexander_Graham_Bell"
str2 = "Sylvester_Grisby"
"#{str1[0...str1.index('_')]} #{str1[str1.rindex('_')+1]}"
#=> "Alexander B"
"#{str2[0...str2.index('_')]} #{str2[str2.rindex('_')+1]}"
#=> "Sylvester G"
or
first, last = str1.split(/_.+_|_/)
#=> ["Alexander", "Bell"]
first+' '+last[0]
#=> "Alexander B"
first, last = str2.split(/_.+_|_/)
#=> ["Sylvester", "Grisby"]
first+' '+last[0]
#=> "Sylvester G"
but if you insist...
r = /
(.+?) # match any characters non-greedily in capture group 1
(?=_) # match an underscore in a positive lookahead
(?:.*) # match any characters greedily in a non-capture group
(?:_) # match an underscore in a non-capture group
(.) # match any character in capture group 2
/x # free-spacing regex definition mode
str1 =~ r
$1+' '+$2
#=> "Alexander B"
str2 =~ r
$1+' '+$2
#=> "Sylvester G"
You can of course write
r = /(.+?)(?=_)(?:.*)(?:_)(.)/
This is my attempt:
/([a-zA-Z]+)_([a-zA-Z]+_)?([a-zA-Z])/
See demo
Let's see if this works:
/^([^_]+)(?:_\w)?_(\w)/
And then you'll have to combine the first and second matches into the format you want. I don't know Ruby, so I can't help you there.
And another attempt using a replacement method:
result = subject.gsub(/^([^_]+)(?:_[^_])?_([^_])[^_]+$/, '\1 \2')
We match the entire string, with the relevant parts in capturing groups, then just return the two captured groups.
Using the split method is much better:
full_names.map do |full_name|
parts = full_name.split('_').values_at(0,-1)
parts.last.slice!(1..-1)
parts.join(' ')
end
/^[A-Za-z]{5,15}\s[A-Za-z]{1}$/i
This will have the following criteria:
5-15 characters for first name then a whitespace and finally a single character for last name.

Finding the first duplicate character in the string Ruby

I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now; it outputs "no duplicates" 26 times. I think my if statement is written incorrectly.
string = "abcade"
puts string
for i in ('a'..'z')
if string =~ /(.)\1/
puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
else
puts "no duplicates"
end
end
My second puts statement works but with the for and if loops, it returns no duplicates 26 times whatever the string is.
The following returns the index of the first character that is immediately followed by a copy of itself (an adjacent duplicate):
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
# found duplicate; potentially do something with $1
else
# there is no match
end
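Note that /(.)\1/ only finds a character that is directly followed by itself. That would explain the "no duplicates" output for "abcade", where the two a's are not adjacent. The if condition in the question's loop also never uses i, so it takes the same branch on all 26 iterations. A minimal sketch, with a made-up helper name:

```ruby
# Hypothetical helper: the first character directly followed by itself, or nil.
def first_adjacent_duplicate(s)
  s[/(.)\1/, 1]  # String#[] with a regex and a group index returns that capture
end

p first_adjacent_duplicate("1234556")  # "5"
p first_adjacent_duplicate("abcade")   # nil, the two "a"s are not adjacent
```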
s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland:
s.each_char.find { |c| s.count(c) > 1 }
The method below might be useful for finding a repeated word in a string (it returns the word with the highest count):
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that is preceded by the same character
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by @DaveNewton:
require 'set'
def first_repeat_char(str)
str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
nil
end
first_repeat_char("abcdebf") #=> b
first_repeat_char("abcdcbe") #=> c
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
(.) # match any character in capture group #1
.* # match any character zero or more times
? # do the preceding lazily
\K # forget everything matched so far
\1 # match the contents of capture group 1
/x
"abcdebf"[r] #=> b
"abccdeb"[r] #=> b
"abcdefg"[r] #=> nil
This regex is fine, but produces the warning, "regular expression has redundant nested repeat operator '*'". You can disregard the warning or suppress it by doing something clunky, like:
r = /([^#{0.chr}]).*?\K\1/
where ([^#{0.chr}]) means "match any character other than 0.chr in capture group 1".
Note that a positive lookbehind cannot be used here, as they cannot contain variable-length matches (i.e., .*).
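The two interpretations give different answers on the same input, which a side-by-side sketch makes visible. The second method name is made up; it uses the count-based approach from the each_char.find line earlier on the page rather than the regex.

```ruby
require 'set'

# Interpretation 1: the first character that has already been seen earlier.
def first_repeat_char(str)
  seen = Set.new
  str.each_char { |c| return c unless seen.add?(c) }  # add? returns nil if present
  nil
end

# Interpretation 2: the first character that occurs more than once anywhere.
def first_char_with_duplicate(str)
  str.each_char.find { |c| str.count(c) > 1 }
end

p first_repeat_char("abcdcbe")          # "c", its second occurrence comes first
p first_char_with_duplicate("abcdcbe")  # "b", it is the earliest char that repeats
p first_repeat_char("abcdefg")          # nil
p first_char_with_duplicate("abcdefg")  # nil
```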
You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}
I'll use positive lookahead with String#[] method :
"abcccddde"[/(.)(?=\1)/] #=> c
As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"

Use regular expression to fetch 3 groups from string

This is my expected result.
Input a string and get three strings back.
I have no idea how to do this with a regex in Ruby.
This is my rough idea:
match(/(.*?)(_)(.*?)(\d+)/)
Input and expected output
# "R224_OO2003" => R224, OO, 2003
# "R2241_OOP2003" => R2241, OOP, 2003
If the example description I gave in my comment on the question is correct, you need a very straightforward regex:
r = /(.+)_(.+)(\d{4})/
Then:
"R224_OO2003".scan(r).flatten #=> ["R224", "OO", "2003"]
"R2241_OOP2003".scan(r).flatten #=> ["R2241", "OOP", "2003"]
Assuming that your three parts consist of (R and one or more digits), then an underbar, then (one or more non-whitespace characters), before finally (a 4-digit numeric date), then your regex could be something like this:
^(R\d+)_(\S+)(\d{4})$
The ^ indicates start of a line, and the $ indicates end of a line (for a single-line string, that is the start and end of the string). \d+ indicates one or more digits, while \S+ says one or more non-whitespace characters. The \d{4} says exactly four digits.
To recover data from the matches, you could either use the pre-defined globals that line up with your groups, or you could use named captures.
To use the match globals just use $1, $2, and $3. In general, you can figure out the number to use by counting the left parentheses of the specific group.
To use the named captures, include ?<name> right after the left paren of a particular group. For example:
x = "R2241_OOP2003"
match_data = /^(?<first>R\d+)_(?<second>\S+)(?<third>\d{4})$/.match(x)
puts match_data['first'], match_data['second'], match_data['third']
yields
R2241
OOP
2003
as expected.
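If all three parts are wanted at once, MatchData#captures and MatchData#named_captures may be handy. This is a sketch; the names DATE_CODE and split_code are made up, and named_captures assumes Ruby 2.4 or later.

```ruby
DATE_CODE = /^(?<first>R\d+)_(?<second>\S+)(?<third>\d{4})$/

# Hypothetical helper: the three parts as an array, or nil when nothing matches.
def split_code(code)
  match_data = DATE_CODE.match(code)
  match_data && match_data.captures
end

p split_code("R2241_OOP2003")  # ["R2241", "OOP", "2003"]
p split_code("R224_OO2003")    # ["R224", "OO", "2003"]
p split_code("no match here")  # nil

# The same data as a hash keyed by group name (string keys).
p DATE_CODE.match("R2241_OOP2003").named_captures
```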
As long as your pattern covers all possibilities, then you just need to use the match object to return the 3 strings:
my_match = "R224_OO2003".match(/(.*?)(_)(.*?)(\d+)/)
#=> #<MatchData "R224_OO2003" 1:"R224" 2:"_" 3:"OO" 4:"2003">
puts my_match[0] #=> "R224_OO2003"
puts my_match[1] #=> "R224"
puts my_match[2] #=> "_"
puts my_match[3] #=> "OO"
puts my_match[4] #=> "2003"
A MatchData object holds the capture groups starting at index [1]. As you can see, index [0] returns the entire match. If you don't want to capture the "_", you can leave its parentheses out.
Also, I'm not sure you are getting what you want with the part:
(.*?)
This basically says: zero or more of any single character, matched lazily (as few characters as possible).
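The effect of the ? after .* shows up when a greedy and a lazy version are run on the same input. A small sketch, with made-up constant names:

```ruby
GREEDY = /(.*)(\d+)/   # .* takes as much as it can and gives back only what it must
LAZY   = /(.*?)(\d+)/  # .*? takes as little as it can

p "OO2003".match(GREEDY).captures  # ["OO200", "3"]
p "OO2003".match(LAZY).captures    # ["OO", "2003"]
p "2003".match(LAZY).captures      # ["", "2003"], zero characters is allowed
```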

Resources