I have a specific RegEx question, simply because I'm wracking my brain trying to isolate one item, rather than multiple.
Here is the text I am searching:
"{\"1430156203913\"=>\"ABC\", \"1430156218698\"=>\"DEF\", \"1430156219763\"=>\"GHI\", \"1430156553620\"=>\"JKL\", \"1430156793764\"=>\"MNO\", \"1430156799454\"=>\"PQR\"}"
What I would like to do is capture the key associated with ABC as well as the key associated with GHI
The first is easy enough, and I'm capturing it with this RegEx:
/\d.*ABC/ maps to: "1430156203913\"=>\"ABC.
Then I'd just use /\d+/ to pull out the 1430156203913 key that I'm looking for.
The second is what I'm having difficulty with.
This does not work:
/\d.*GHI/ matches from the first digit in the text all the way to my final string (GHI):
1430156203913\"=>\"ABC\", \"1430156218698\"=>\"DEF\", \"1430156219763\"=>\"GHI
Question:
How can I edit this second Regex to just capture 1430156219763?
To capture all the keys and values, use String#scan.
pairs = s.scan /"(\d+)"=>"([[:upper:]]+)"/
# [["1430156203913", "ABC"],
# ["1430156218698", "DEF"],
# ["1430156219763", "GHI"],
# ["1430156553620", "JKL"],
# ["1430156793764", "MNO"],
# ["1430156799454", "PQR"]]
Then to get any k/v pair you want, hashify and find them.
Hash[pairs].invert.values_at('ABC', 'DEF')
# ["1430156203913", "1430156218698"]
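A minimal sketch of the same scan-and-invert idea, applied to the two values the question actually asks about (ABC and GHI). The sample string is shortened to three pairs, and the names key_for, abc_key and ghi_key are only illustrative.

```ruby
# Sample string from the question (the escaped quotes are plain " characters).
s = "{\"1430156203913\"=>\"ABC\", \"1430156218698\"=>\"DEF\", \"1430156219763\"=>\"GHI\"}"

# scan returns one [key, value] array per match because the pattern has two groups.
pairs = s.scan(/"(\d+)"=>"([[:upper:]]+)"/)

# Invert to value => key so that a key can be looked up by its value.
key_for = pairs.to_h.invert

abc_key, ghi_key = key_for.values_at("ABC", "GHI")
puts abc_key  #=> 1430156203913
puts ghi_key  #=> 1430156219763
```

Note that invert assumes the values are unique; if two keys shared a value, only one of them would survive.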
\d[^,]*GHI
You can simply use this. See demo.
https://regex101.com/r/yW3oJ9/3
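To see why the negated class helps, here is a small sketch comparing the greedy .* from the question with [^,]*. It assumes the pairs are separated by commas and that the values themselves contain no commas; the variable names are illustrative.

```ruby
s = "{\"1430156203913\"=>\"ABC\", \"1430156218698\"=>\"DEF\", \"1430156219763\"=>\"GHI\"}"

# .* can run across commas, so the match starts at the very first digit.
greedy = s[/\d.*GHI/]

# [^,]* cannot cross a comma, so the match stays inside a single pair.
bounded = s[/\d[^,]*GHI/]

puts greedy.start_with?("1430156203913")  #=> true
puts bounded                               #=> 1430156219763"=>"GHI
puts bounded[/\d+/]                        #=> 1430156219763
```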
Try this one to capture only the key
\b\d+\b(?=\\"=>\\"GHI)
Check this demo.
Notice I've added \b around \d+ just for better performance.
This is one way you could do that:
def extract(str, key)
  r = /
    (?<=\")   # match \" in a positive lookbehind
    \d+       # match one or more digits
    (?=       # begin a positive lookahead
      \"=>\"  # match \"=>\"
      #{key}  # match value of key
      \"      # match \"
    )         # end positive lookahead
  /x
  str[r]
end
extract str, "ABC" #=> "1430156203913"
extract str, "GHI" #=> "1430156219763"
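One possible refinement, sketched under the assumption that the value being looked up might contain regex metacharacters (such as . or +): wrap it in Regexp.escape before interpolating. The method name find_key is hypothetical and not part of the answer above.

```ruby
def find_key(str, value)
  # Regexp.escape keeps characters such as . or + in value from acting as regex syntax.
  str[/(?<=")\d+(?="=>"#{Regexp.escape(value)}")/]
end

str = "{\"1430156203913\"=>\"ABC\", \"1430156219763\"=>\"GHI\"}"
puts find_key(str, "GHI")  #=> 1430156219763
p find_key(str, "XYZ")     #=> nil
```

String#[] returns nil when nothing matches, so a missing value is easy to detect.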
Related
I have a list of users grabbed by the Etc Ruby library:
Thomas_J_Perkins
Jennifer_Scanner
Amanda_K_Loso
Aaron_Cole
Mark_L_Lamb
What I need to do is grab the full first name, skip the middle name (if given), and grab the first character of the last name. The output should look like this:
Thomas P
Jennifer S
Amanda L
Aaron C
Mark L
I'm not sure how to do this. I've tried grabbing all of the characters with /\w+/, but that will grab everything.
You don't always need regular expressions.
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. Jamie Zawinski
You can do it with some simple Ruby code
string = "Mark_L_Lamb"
string.split('_').first + ' ' + string.split('_').last[0]
=> "Mark L"
I think it's simpler without regex:
array = "Thomas_J_Perkins".split("_") # split at _
array.first + " " + array.last[0] # .first prints first name .last[0] prints first char of last name
#=> "Thomas P"
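Applied to the whole list from the question, the split approach might look like the sketch below. The names array is typed in by hand here; in the question it would come from the Etc library.

```ruby
names = %w[Thomas_J_Perkins Jennifer_Scanner Amanda_K_Loso Aaron_Cole Mark_L_Lamb]

short = names.map do |name|
  parts = name.split("_")              # e.g. ["Thomas", "J", "Perkins"]
  "#{parts.first} #{parts.last[0]}"    # first name, space, first letter of last name
end

puts short
# Thomas P
# Jennifer S
# Amanda L
# Aaron C
# Mark L
```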
You can use
^([^\W_]+)(?:_[^\W_]+)*_([^\W_])[^\W_]*$
And replace with \1 \2 (a space between the two groups, to match the requested output). See the regex demo
The [^\W_] matches a letter or a digit. If you want to only match letters, replace [^\W_] with \p{L}.
^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$
See updated demo
The point is to match and capture the first chunk of letters up to the first _ (with (\p{L}+)), then match 0+ sequences of _ + letters inside (with (?:_\p{L}+)*_) and then match and capture the last word first letter (with (\p{L})) and then match the rest of the string (with \p{L}*).
NOTE: replace ^ with \A and $ with \z if you have independent strings (as in Ruby ^ matches the start of a line and $ matches the end of the line).
Ruby code:
s.sub(/^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$/, "\\1 \\2")
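A small sketch of this pattern run over independent strings, using \A and \z as the note suggests and a space in the replacement so the result matches the requested output. The variable names are illustrative.

```ruby
names = %w[Thomas_J_Perkins Jennifer_Scanner Amanda_K_Loso Aaron_Cole Mark_L_Lamb]

# \A and \z anchor to the whole string rather than to a line.
pattern = /\A(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*\z/

# In a single-quoted string, \1 and \2 refer to the capture groups.
result = names.map { |name| name.sub(pattern, '\1 \2') }

p result  #=> ["Thomas P", "Jennifer S", "Amanda L", "Aaron C", "Mark L"]
```

If a string does not match the pattern, sub returns it unchanged.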
I'm in the don't-use-a-regex-for-this camp.
str1 = "Alexander_Graham_Bell"
str2 = "Sylvester_Grisby"
"#{str1[0...str1.index('_')]} #{str1[str1.rindex('_')+1]}"
#=> "Alexander B"
"#{str2[0...str2.index('_')]} #{str2[str2.rindex('_')+1]}"
#=> "Sylvester G"
or
first, last = str1.split(/_.+_|_/)
#=> ["Alexander", "Bell"]
first+' '+last[0]
#=> "Alexander B"
first, last = str2.split(/_.+_|_/)
#=> ["Sylvester", "Grisby"]
first+' '+last[0]
#=> "Sylvester G"
but if you insist...
r = /
(.+?) # match any characters non-greedily in capture group 1
(?=_) # match an underscore in a positive lookahead
(?:.*) # match any characters greedily in a non-capture group
(?:_) # match an underscore in a non-capture group
(.) # match any character in capture group 2
/x # free-spacing regex definition mode
str1 =~ r
$1+' '+$2
#=> "Alexander B"
str2 =~ r
$1+' '+$2
#=> "Sylvester G"
You can of course write
r = /(.+?)(?=_)(?:.*)(?:_)(.)/
This is my attempt:
/([a-zA-Z]+)_([a-zA-Z]+_)?([a-zA-Z])/
See demo
Let's see if this works:
/^([^_]+)(?:_\w)?_(\w)/
And then you'll have to combine the first and second matches into the format you want. I don't know Ruby, so I can't help you there.
And another attempt using a replacement method:
result = subject.gsub(/^([^_]+)(?:_[^_])?_([^_])[^_]+$/, '\1 \2')
We capture the entire string, with the relevant parts in capturing groups. Then just return the two captured groups
Using the split method is much better:
full_names.map do |full_name|
  parts = full_name.split('_').values_at(0, -1)
  parts.last.slice!(1..-1)
  parts.join(' ')
end
/^[A-Za-z]{5,15}\s[A-Za-z]{1}$/i
This will have the following criteria:
5-15 characters for first name then a whitespace and finally a single character for last name.
I do not have access to the code, this is via an interface that only allows me to edit the regex that parses user responses. I need to extract the weight after users text, where they text things like:
wt 172.5
172.5 lbs
180
wt. 173.22
172,5
I need to capture the weight as a float field, but I want to restrict it to at most 1 decimal place. I tried using /(?<val>[\d+((\.|,)\d\d?)?]/, but it only saves the first digit, "1", in the field.
Sometimes what seems most simple is not. I suggest using this regex:
r = /(?<=\A|\s)\d+(?:[.,]\d)?(?=\d|\s|\z)/
We can alternatively define the regex using extended or free-spacing mode (by adding the modifier x after the final /), which allows us to include documentation:
r = /
(?<=\A|\s) # match beginning of string or space in a positive lookbehind
\d+ # match one or more digits
(?:[.,]\d)? # optionally (? after non-capture group) match a . or , then a digit
(?=\d|\s|\z) # match a digit, space or the end of the string in a positive lookahead
/x
"wt 172.5"[r] #=> "172.5"
"172.5 lbs"[r] #=> "172.5"
"180"[r] #=> "180"
"wt. 173.22"[r] #=> "173.2"
"172,5"[r] #=> "172,5"
"A1 143.66"[r] #=> "143.6"
"A1 1.3.4 43.6"[r] #=> "43.6"
\d+(?:[,.]\d{1,2})?
I guess you wanted this. [] is a character class, not a group: your character class matches just one of the characters listed inside it.
See demo.
https://regex101.com/r/eB8xU8/12
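To illustrate the difference between a character class and a group, here is a rough sketch in plain Ruby. The asker can only edit the regex, so the final conversion step is just an assumption about what the surrounding system does with a decimal comma.

```ruby
text = "wt 172,5"

# A character class matches exactly one character from the listed set.
p text[/[\d+.,]/]                   #=> "1"

# With a group, the quantifiers apply to the digits and the optional fraction.
raw = text[/\d+(?:[,.]\d{1,2})?/]
p raw                               #=> "172,5"

# Normalise a decimal comma before converting to a Float.
weight = raw.tr(",", ".").to_f
p weight                            #=> 172.5
```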
I'm trying to grab id number from the string, say
id/number/2000GXZ2/ref=sr
using
(?:id\/number\/)([a-zA-Z0-9]{8})
For some reason the non-capturing group does not work, giving me:
id/number/2000GXZ2
As mentioned by others, non-capturing groups still count towards the overall match. If you don't want that part in your match use a lookbehind.
Rubular example
(?<=id\/number\/)([a-zA-Z0-9]{8})
(?<=pat) - Positive lookbehind assertion: ensures that the preceding characters match pat, but doesn't include those characters in the matched text
Ruby Doc Regexp
Also, the capture group around the id number is unnecessary in this case.
You have:
str = "id/number/2000GXZ2/ref=sr"
r = /
(?:id\/number\/) # match string in a non-capture group
([a-zA-Z0-9]{8}) # match character in character class 8 times, in capture group 1
/x # extended/free-spacing regex definition mode
Then (using String#[]):
str[r]
#=> "id/number/2000GXZ2"
returns the entire match, as it should, not just the contents of capture group 1. There are a few ways to remedy this. Consider first ones that do not use a capture group.
@jacob.m suggested putting the first part in a positive lookbehind (modified slightly from his code):
r = /
(?<=id\/number\/) # match string in positive lookbehind
[[:alnum:]]{8} # match 8 alphanumeric characters
/x
str[r]
#=> "2000GXZ2"
An alternative is:
r = /
id\/number\/ # match string
\K # forget everything matched so far
[[:alnum:]]{8} # match 8 alphanumeric characters
/x
str[r]
#=> "2000GXZ2"
\K is especially useful when the match to forget is variable-length, as (in Ruby) positive lookbehinds do not work with variable-length matches.
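As a sketch of that last point: if the middle segment were not always "number", \K would still work where a fixed-width lookbehind cannot express the prefix. The [a-z]+ prefix below is a hypothetical variation, not something from the question.

```ruby
str = "id/number/2000GXZ2/ref=sr"

# Variable-length prefix, discarded from the match by \K.
with_k = str[/id\/[a-z]+\/\K[[:alnum:]]{8}/]

# Fixed-length prefix in a positive lookbehind, as in the answer above.
with_lookbehind = str[/(?<=id\/number\/)[[:alnum:]]{8}/]

p with_k           #=> "2000GXZ2"
p with_lookbehind  #=> "2000GXZ2"
```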
With both of these approaches, if the part to be matched contains only numbers and capital letters, you may want to use [A-Z0-9]+ instead of [[:alnum:]] (though the latter includes Unicode letters, not just those from the English alphabet). In fact, if all the entries have the form of your example, you might be able to use:
r = /
\d # match a digit
[A-Z0-9]{7} # match 7 capital letters or digits
/x
str[r]
#=> "2000GXZ2"
The other line of approach is to keep your capture group. One simple way is:
r = /
id\/number\/ # match string
([[:alnum:]]{8}) # match 8 alphanumeric characters in capture group 1
/x
str =~ r
str[r, 1] #=> "2000GXZ2"
Alternatively, you could use String#sub to replace the entire string with the contents of the capture group:
r = /
id\/number\/ # match string
([[:alnum:]]{8}) # match 8 alphanumeric characters in capture group 1
.* # match the remainder of the string
/x
str.sub(r, '\1') #=> "2000GXZ2"
str.sub(r, "\\1") #=> "2000GXZ2"
str.sub(r) { $1 } #=> "2000GXZ2"
This is an inconsistency in how Ruby's Regexp-related methods report matches. Some methods return the whole match, while others return only the captured groups.
In this case, one method we can use to get the behavior you're looking for is scan.
I don't think anyone here actually mentions how to get your Regexp working as you originally intended, which was to get the capture-only match. To do that, you would use the scan method like so with your original pattern:
test_me.rb
test_string="id/number/2000GXZ2/ref=sr"
result = test_string.scan(/(?:id\/number\/)([a-zA-Z0-9]{8})/)
puts result
2000GXZ2
That said, replacing the non-capturing group (?:...) with a lookbehind (?<=...) will help you both with scan and with the other parts of Ruby that take a Regexp.
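For comparison, here is a short sketch of what a few common methods return for the original pattern. The variable names are illustrative.

```ruby
str = "id/number/2000GXZ2/ref=sr"
re  = /(?:id\/number\/)([a-zA-Z0-9]{8})/

whole   = str[re]            # whole match, including the non-capturing group
group   = str[re, 1]         # capture group 1 only
matched = str.match(re)[1]   # capture group 1 via MatchData
scanned = str.scan(re)       # groups only, one inner array per match

p whole    #=> "id/number/2000GXZ2"
p group    #=> "2000GXZ2"
p matched  #=> "2000GXZ2"
p scanned  #=> [["2000GXZ2"]]
```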
I want to match character pairs in a string. Let's say the string is:
"zttabcgqztwdegqf". Both "zt" and "gq" are matching pairs of characters in the string.
The following code finds the "zt" matching pair, but not the "gq" pair:
#!/usr/bin/env ruby
string = "zttabcgqztwdegqf"
puts string.scan(/.{1,2}/).detect{ |c| string.count(c) > 1 }
The code provides matching pairs where the indices of the pairs are 0&1,2&3,4&5... but not 1&2,3&4,5&6, etc:
zt
ta
bc
gq
zt
wd
eg
qf
I'm not sure regex in Ruby is the best way to go. But I want to use Ruby for the solution.
You can do your search with a single regex:
puts string.scan(/(?=(.{2}).*\1)/)
regex101 demo
Output
zt
gq
Regex Breakout
(?= # Start a lookahead
(.{2}) # Search any couple of char and group it in \1
.*\1 # Search ahead in the string for another \1 to validate
) # Close lookahead
Note
Putting all the checks inside the lookahead ensures that the regex engine does not consume the pair while validating it.
So it also works with overlapping couples like in the string abcabc: the output will correctly be ab,bc.
Oddity
If the regex engine does not consume any characters, how can it reach the end of the string?
Internally, after a zero-width match, Onigmo (the Ruby regex engine) automatically advances one position. Most regex flavours behave this way, but the JavaScript engine, for example, needs the programmer to increment the last match index manually.
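A quick sketch of this behaviour, using the question's string and the overlapping abcabc case mentioned above. The variable names are illustrative.

```ruby
pattern = /(?=(.{2}).*\1)/

# Each match is zero-width, so scan moves forward one character at a time
# and reports only the contents of capture group 1.
original    = "zttabcgqztwdegqf".scan(pattern).flatten
overlapping = "abcabc".scan(pattern).flatten

p original     #=> ["zt", "gq"]
p overlapping  #=> ["ab", "bc"]
```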
str = "ztcabcgqzttwtcdegqf"
r = /
(.) # match any character in capture group 1
(?= # begin a positive lookahead
(.) # match any character in capture group 2
.+ # match >= 1 characters
\1 # match capture group 1
\2 # match capture group 2
) # close positive lookahead
/x # extended/free-spacing regex definition mode
str.scan(r).map(&:join)
#=> ["zt", "tc", "gq"]
Here is one way to do this without using regex:
string = "zttabcgqztwdegqf"
p string.split('').each_cons(2).map(&:join).select {|i| string.scan(i).size > 1 }.uniq
#=> ["zt", "gq"]
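A variation on the same non-regex idea, sketched with Enumerable#tally, which assumes Ruby 2.7 or later. It walks the string once instead of rescanning it for every pair.

```ruby
string = "zttabcgqztwdegqf"

# Count every adjacent pair, then keep the pairs seen more than once.
counts   = string.chars.each_cons(2).map(&:join).tally
repeated = counts.select { |_pair, n| n > 1 }.keys

p repeated  #=> ["zt", "gq"]
```

One difference to be aware of: this counts overlapping occurrences, so "aaa" would report "aa", whereas counting with string.scan would not.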
This is my expected result.
Input a string and get three strings returned.
I have no idea how to do this with a regex in Ruby.
This is my rough idea:
match(/(.*?)(_)(.*?)(\d+)/)
Input and expected output
# "R224_OO2003" => R224, OO, 2003
# "R2241_OOP2003" => R2241, OOP, 2003
If the example description I gave in my comment on the question is correct, you need a very straightforward regex:
r = /(.+)_(.+)(\d{4})/
Then:
"R224_OO2003".scan(r).flatten #=> ["R224", "OO", "2003"]
"R2241_OOP2003".scan(r).flatten #=> ["R2241", "OOP", "2003"]
Assuming that your three parts consist of (R and one or more digits), then an underbar, then (one or more non-whitespace characters), before finally (a 4-digit numeric date), then your regex could be something like this:
^(R\d+)_(\S+)(\d{4})$
The ^ indicates start of string, and the $ indicates end of string. \d+ indicates one or more digits, while \S+ says one or more non-whitespace characters. The \d{4} says exactly four digits.
To recover data from the matches, you could either use the pre-defined globals that line up with your groups, or you could use named captures.
To use the match globals just use $1, $2, and $3. In general, you can figure out the number to use by counting the left parentheses of the specific group.
To use the named captures, include ?<name> right after the left paren of a particular group. For example:
x = "R2241_OOP2003"
match_data = /^(?<first>R\d+)_(?<second>\S+)(?<third>\d{4})$/.match(x)
puts match_data['first'], match_data['second'], match_data['third']
yields
R2241
OOP
2003
as expected.
As long as your pattern covers all possibilities, then you just need to use the match object to return the 3 strings:
my_match = "R224_OO2003".match(/(.*?)(_)(.*?)(\d+)/)
#=> #<MatchData "R224_OO2003" 1:"R224" 2:"_" 3:"OO" 4:"2003">
puts my_match[0] #=> "R224_OO2003"
puts my_match[1] #=> "R224"
puts my_match[2] #=> "_"
puts my_match[3] #=> "OO"
puts my_match[4] #=> "2003"
A MatchData object holds each capture group starting at index [1]. As you can see, index [0] returns the entire match. If you don't want to capture the "_", you can leave its parentheses out.
Also, I'm not sure you are getting what you want with the part:
(.*?)
this says zero or more of any character (other than a newline), matched lazily, that is, as few as possible. The ? after the * makes the quantifier non-greedy; it does not mean "zero or one" here.
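A short sketch of how the lazy and greedy forms differ on the question's input. The "_" group is dropped here for brevity, and the variable names are illustrative.

```ruby
s = "R224_OO2003"

# Lazy: each .*? takes as few characters as possible, so \d+ gets all four digits.
lazy = s.match(/(.*?)_(.*?)(\d+)/).captures

# Greedy: the second .* takes as much as it can and leaves \d+ a single digit.
greedy = s.match(/(.*)_(.*)(\d+)/).captures

p lazy    #=> ["R224", "OO", "2003"]
p greedy  #=> ["R224", "OO200", "3"]
```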