How to regex the strings in a URL - Ruby

http://something.com/bOhxBeD,SyhyTGi,TMDDSIB,U72gx2J,kQTIRy9,7VXgGDw,eSxIcK6,S5oNlnn,WBHHsLk,BdMGd2d,U9kNlsF,cHVyc7Y,D83kaJ5,cLWgdSO,iWtCIF3,ount8L6
I am trying to get the values bOhxBeD, SyhyTGi, and so on. This is what I came up with (yes, fairly simple): /([a-zA-Z0-9]{7})/. It seems to work with PCRE:
([a-zA-Z0-9]{7})
But when it comes to Ruby, I use it like this:
str.match(/([a-zA-Z0-9]{7})/)
#<MatchData "bOhxBeD" 1:"bOhxBeD">
It doesn't seem to work: only the first value is returned. Can anyone point out what's wrong with this regex? Thanks

You need to add word boundaries (\b) in order to match exactly 7 alphanumeric characters, and use scan instead of match to collect every match:
\b[a-zA-Z0-9]{7}\b
str = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB,U72gx2J,kQTIRy9,7VXgGDw,eSxIcK6,S5oNlnn,WBHHsLk,BdMGd2d,U9kNlsF,cHVyc7Y,D83kaJ5,cLWgdSO,iWtCIF3,ount8L6"
str.scan(/\b([a-zA-Z0-9]{7})\b/)
#=> [["bOhxBeD"], ["SyhyTGi"], ["TMDDSIB"], ["U72gx2J"], ["kQTIRy9"], ["7VXgGDw"], ["eSxIcK6"], ["S5oNlnn"], ["WBHHsLk"], ["BdMGd2d"], ["U9kNlsF"], ["cHVyc7Y"], ["D83kaJ5"], ["cLWgdSO"], ["iWtCIF3"], ["ount8L6"]]
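To see why the word boundaries matter, here is a small sketch on a shortened copy of the URL (the variable names are just for illustration) comparing the pattern with and without \b. Without them, the 9-letter host name "something" contributes a 7-character chunk of its own.

```ruby
url = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# Without \b, any run of 7 alphanumerics matches, including part of "something".
loose = url.scan(/[a-zA-Z0-9]{7}/)

# With \b on both sides, the 7 characters must form a whole word.
strict = url.scan(/\b[a-zA-Z0-9]{7}\b/)

p loose   # => ["somethi", "bOhxBeD", "SyhyTGi", "TMDDSIB"]
p strict  # => ["bOhxBeD", "SyhyTGi", "TMDDSIB"]
```

Because neither pattern has a capture group, scan returns plain strings rather than one-element arrays.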

(?!.*?\/)[a-zA-Z0-9]{7}
It should be this. Otherwise it will also pick up 7-letter sequences from the rest of the link; "somethi" would be in the answer, but I guess that is not wanted.
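A quick way to check what the negative lookahead does: (?!.*?\/) only lets a match start where no slash remains further along the string, so everything up to the last / is skipped. A minimal sketch with a shortened URL:

```ruby
url = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# (?!.*?\/) fails wherever a "/" still appears later in the string,
# so only the part after the last slash can produce matches.
codes = url.scan(/(?!.*?\/)[a-zA-Z0-9]{7}/)

p codes  # => ["bOhxBeD", "SyhyTGi", "TMDDSIB"]
```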

match only returns the first match.
To get all of them, use scan, which is the global version of match.
With scan you can pick out runs of characters that are not slashes, dots, or commas using a negated character class [^...]:
str.scan(/[^\/\.\,]+/)[3..-1]
#=> ["bOhxBeD", "SyhyTGi", "TMDDSIB", "U72gx2J", "kQTIRy9", "7VXgGDw", "eSxIcK6", "S5oNlnn", "WBHHsLk", "BdMGd2d", "U9kNlsF", "cHVyc7Y", "D83kaJ5", "cLWgdSO", "iWtCIF3", "ount8L6"]
Update:
If you know that the strings between the commas are always 7 characters, you can use this instead:
str.scan(/[^\/\.\,]{7}/)[1..-1]
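The difference between match and scan is easy to see side by side. This sketch uses a shortened copy of the URL; the [3..-1] slice drops the "http:", "something", and "com" pieces, as in the answer above.

```ruby
url = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# match stops at the first hit and returns a single MatchData object.
first = url.match(/\b[a-zA-Z0-9]{7}\b/)[0]

# scan keeps going and returns every hit as an array of strings.
pieces = url.scan(/[^\/.,]+/)   # runs of characters other than "/", "." and ","
codes  = pieces[3..-1]          # drop "http:", "something", "com"

p first   # => "bOhxBeD"
p pieces  # => ["http:", "something", "com", "bOhxBeD", "SyhyTGi", "TMDDSIB"]
p codes   # => ["bOhxBeD", "SyhyTGi", "TMDDSIB"]
```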

It happens because your regexp matches just one element of 7 characters, nothing more.
A simple solution could be:
str.match(/\/([^\/]*)\z/)[1].split(',')

You could use String#[] and String#split:
str[/.*\/(.*)/,1].split(',')
#=> ["bOhxBeD", "SyhyTGi", "TMDDSIB", "U72gx2J", "kQTIRy9", "7VXgGDw",
# "eSxIcK6", "S5oNlnn", "WBHHsLk", "BdMGd2d", "U9kNlsF", "cHVyc7Y",
# "D83kaJ5", "cLWgdSO", "iWtCIF3", "ount8L6"]
.*\/ in the regex, "greedy" as it is, will consume characters up to and including the last forward slash in the string. Capture group #1 (.*) sucks up the remainder of the string and, due to the presence of ,1, returns it. split(',') then breaks up the string to give you the desired array.
Another way:
str[str[/.*\//].size..-1].split(',')
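If you'd rather avoid a regex for the "everything after the last slash" step, String#rpartition splits at the last occurrence of a separator. This is a swapped-in alternative, not part of the answer above; the sketch checks that it agrees with the String#[] version on a shortened URL.

```ruby
url = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# rpartition returns [before, separator, after], searching from the right.
_before, _sep, tail = url.rpartition("/")
via_rpartition = tail.split(",")

# The String#[] approach from the answer, for comparison.
via_regex = url[/.*\/(.*)/, 1].split(",")

p via_rpartition               # => ["bOhxBeD", "SyhyTGi", "TMDDSIB"]
p via_rpartition == via_regex  # => true
```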

Related

Split a string and remove the first element in string

Original string: '4.0.0-4.0-M-672092'
How can I modify the original string to "4.0-M-672092" with one line of code?
Any help is highly appreciated.
Thanks and regards
The 'split' method works in this case
https://apidock.com/ruby/String/split
'4.0.0-4.0-M-672092'.split('-')[1..-1].join('-')
# => "4.0-M-672092"
Just be careful: this is fine here, but on long texts it can be inefficient, since it splits the whole string and then joins the array back together.
If you need this to be more efficient on longer texts, you can find the index of the first "-" (your split point) and take the substring starting at the next position:
text = '4.0.0-4.0-M-672092'
text[(text.index('-') + 1)..-1]
# => "4.0-M-672092"
But it doesn't fit neatly in one line, and if there is no "-", index returns nil and nil + 1 raises a NoMethodError, so use a rescue statement if that can happen.
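One way to avoid the error when there is no "-" is String#partition, which splits at the first occurrence and returns an empty tail if the separator is missing. A sketch; both helper names below are hypothetical, and returning "" or nil for input without a dash is an assumption about what you want:

```ruby
# Hypothetical helper: the text after the first "-", or "" when there is none.
# String#partition never raises; it returns [text, "", ""] if "-" is missing.
def after_first_dash(text)
  text.partition("-").last
end

# Hypothetical index-based variant that returns nil instead of raising.
def after_first_dash_by_index(text)
  i = text.index("-")
  i && text[(i + 1)..-1]
end

p after_first_dash("4.0.0-4.0-M-672092")           # => "4.0-M-672092"
p after_first_dash("4.0.0")                        # => ""
p after_first_dash_by_index("4.0.0-4.0-M-672092")  # => "4.0-M-672092"
p after_first_dash_by_index("4.0.0")               # => nil
```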
Simplest way (plain Ruby; Array#second only exists with Rails' ActiveSupport):
'4.0.0-4.0-M-672092'.split('-', 2)[1]
"4.0.0-4.0-M-672092"[/(?<=-).*/]
#=> "4.0-M-672092"
The regular expression reads, "Match zero or more characters other than newlines, as many as possible (.*), provided the match is preceded by a hyphen." (?<=-) is a positive lookbehind. See String#[].
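In plain Ruby the split-with-limit and lookbehind approaches can be compared directly. A limit of 2 tells split to stop after the first hyphen, so the remainder keeps its own hyphens. A small sketch:

```ruby
version = "4.0.0-4.0-M-672092"

# A limit of 2 means at most two pieces: before and after the first "-".
head, rest = version.split("-", 2)

# The lookbehind version: start matching right after the first "-".
via_lookbehind = version[/(?<=-).*/]

p head            # => "4.0.0"
p rest            # => "4.0-M-672092"
p via_lookbehind  # => "4.0-M-672092"
```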

ruby regex to match multiple occurrences of pattern

I am looking to build a ruby regex to match multiple occurrences of a pattern and return them in an array. The pattern is simply: [[.+]]. That is, two left brackets, one or more characters, followed by two right brackets.
This is what I have done:
str = "Some random text[[lead:first_name]] and more stuff [[client:last_name]]"
str.match(/\[\[(.+)\]\]/).captures
The regex above doesn't work because it returns this:
["lead:first_name]] and more stuff [[client:last_name"]
When what I wanted was this:
["lead:first_name", "client:last_name"]
I thought that using a non-capturing group would surely solve the issue:
str.match(/(?:\[\[(.+)\]\])+/).captures
But the non-capturing group version returns exactly the same wrong output. Any idea how I can resolve this?
The problem with your regex is that the .+ part is "greedy": when the regex could match both a shorter and a longer part of the string, it captures the longer part.
In Ruby (and most regex syntaxes), you can qualify your + quantifier with a ? to make it non-greedy. So your regex would become /(?:\[\[(.+?)\]\])+/.
However, you'll notice this still doesn't work for what you want to do. When a capture group sits inside a repeated group, Ruby keeps only the capture from the last repetition. For your problem, you'll need to use scan:
"[[a]][[ab]][[abc]]".scan(/\[\[(.+?)\]\]/).flatten
=> ["a", "ab", "abc"]
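A small sketch contrasting the three behaviours discussed above on the same input: the repeated group keeps only its last capture, the greedy .+ runs to the final "]]", and scan with a lazy quantifier returns one capture per match.

```ruby
text = "[[a]][[ab]][[abc]]"

# The group repeats three times, but only the last capture survives.
repeated = text.match(/(?:\[\[(.+?)\]\])+/).captures

# Greedy .+ runs to the last "]]" in the string.
greedy = text.match(/\[\[(.+)\]\]/).captures

# scan with a lazy quantifier returns every capture.
all = text.scan(/\[\[(.+?)\]\]/).flatten

p repeated  # => ["abc"]
p greedy    # => ["a]][[ab]][[abc"]
p all       # => ["a", "ab", "abc"]
```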
Try this (it handles exactly two occurrences):
str.match(/\[\[(.*)\]\].*\[\[(.*)\]\]/).captures
#=> ["lead:first_name", "client:last_name"]
With many occurrences:
str = "Some [[lead:first_name]] random text[[lead:first_name]] and more [[lead:first_name]] stuff [[client:last_name]]"
str.scan(/\[(\w+:\w+)\]/)
#=> [["lead:first_name"], ["lead:first_name"], ["lead:first_name"], ["client:last_name"]]

String gsub - Replace characters between two elements, but leave surrounding elements

Suppose I have the following string:
mystring = "start/abc123/end"
How can I replace the abc123 with something else, while leaving the "start/" and "/end" parts intact?
I had the following to match the pattern, but it replaces the entire string. I was hoping to just replace the abc123 with 123abc.
mystring.gsub(/start\/(.*)\/end/,"123abc") #=> "123abc"
Edit: The characters between the start & end elements can be any combination of alphanumeric characters, I changed my example to reflect this.
You can do it using the character class [^\/] (anything that is not a slash) and lookarounds:
mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/,"7")
For your example, you could perhaps use:
mystring.gsub(/\/(.*?)\//,"/7/")
This matches the two slashes around the string you're replacing and puts them back in the substitution.
Alternatively, you could capture the pieces of the string you want to keep and interpolate them around your replacement, this turns out to be much more readable than lookaheads/lookbehinds:
mystring.gsub(/(start)\/.*\/(end)/, "\\1/7/\\2")
#=> "start/7/end"
\\1 and \\2 here refer to the numbered captures inside of your regular expression.
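The lookaround answer and the numbered-capture answer should agree on the edited example. A quick check, using "123abc" as the replacement the question asks for:

```ruby
mystring = "start/abc123/end"

# Lookarounds: only the middle segment is consumed, so only it is replaced.
via_lookaround = mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/, "123abc")

# Numbered captures: "start" and "end" are captured and written back.
via_captures = mystring.gsub(/(start)\/.*\/(end)/, "\\1/123abc/\\2")

p via_lookaround  # => "start/123abc/end"
p via_captures    # => "start/123abc/end"
```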
The problem is that you're replacing the entire matched string, "start/8/end", with "7". You need to include the matched characters you want to persist:
mystring.gsub(/start\/(.*)\/end/, "start/7/end")
Alternatively, if the middle part is only digits (as in "start/8/end"), just match the digits:
mystring.gsub(/\d+/, "7")
You can do this by putting the start and end elements in named groups in the regular expression and then referring to these groups in the substitution string with \k<name>:
mystring.gsub(/(?<start>start\/).*(?<end>\/end)/, "\\k<start>7\\k<end>")
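In a Ruby replacement string a named group is referenced as \k<name> (written "\\k<name>" inside double quotes). A sketch, also showing the block form of gsub, where $~ holds the MatchData for the current match:

```ruby
mystring = "start/abc123/end"

# \k<name> inserts the text captured by the named group.
named = mystring.gsub(/(?<start>start\/).*(?<end>\/end)/, "\\k<start>123abc\\k<end>")

# Block form: $~ is the MatchData for the current match.
with_block = mystring.gsub(/(?<start>start\/).*(?<end>\/end)/) do
  "#{$~[:start]}123abc#{$~[:end]}"
end

p named       # => "start/123abc/end"
p with_block  # => "start/123abc/end"
```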

Ruby Regex Match Between "foo" and "bar"

I have unfortunately wandered into a situation where I need a regex in Ruby. Basically, I want to match the part of this string after the underscores and before the first parenthesis, so the end result would be 'table salt'.
_____ table salt (1) [F]
As usual, I tried to fight this battle on my own and with rubular.com. I got the first part:
^_____ (match the underscores at the beginning of the string).
Then I got bolder:
^_____(.*?) (do the first part of the match, then give me any amount of words and letters after it)
Regex had had enough, put an end to that nonsense, and crapped out. So I was wondering if anyone on Stack Overflow knew, or had any hints on, how to express my goal to the Ruby regex parser.
EDIT: Thanks everyone, this is the pattern I ended up using after creating it with rubular.
ingredientNameRegex = /^_+([^(]*)/;
Everything got better once I took a deep breath, and thought about what I was trying to say.
str = "_____ table salt (1) [F]"
p str[ /_{3}\s(.+?)\s+\(/, 1 ]
#=> "table salt"
That says:
Find at least three underscores
and a whitespace character (\s)
and then one or more (+) of any character (.), but as little as possible (?), up until you find
one or more whitespace characters,
and then a literal (
The parens in the middle save that bit, and the 1 pulls it out.
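String#[] with a regex and a capture index, as used above, is shorthand for running the match and pulling out one group. It also accepts a group name. A sketch (the group name "name" is arbitrary):

```ruby
str = "_____ table salt (1) [F]"

# Numbered group: the second argument picks capture #1.
by_index = str[/_{3}\s(.+?)\s+\(/, 1]

# Named group: pass the group name as a string instead.
by_name = str[/_{3}\s(?<name>.+?)\s+\(/, "name"]

# Both are equivalent to matching and then indexing the MatchData.
by_match = str.match(/_{3}\s(.+?)\s+\(/)[1]

p by_index  # => "table salt"
p by_name   # => "table salt"
p by_match  # => "table salt"
```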
Try this: ^[_]+([^(]*)\(
It will match lines starting with one or more underscores, followed by anything other than an opening parenthesis: http://rubular.com/r/vthpGpVr4y
Here's a working regex:
str = "_____ table salt (1) [F]"
match = str.match(/_([^_]+?)\(/)
p match[1].strip # => "table salt"
You could use
^_____\s*([^(]+?)\s*\(
^_____ matches the five underscores at the beginning of the string
\s* matches zero or more whitespace characters
( starts the capturing group
[^(]+ matches one or more characters other than (
? makes the + lazy, so it matches the shortest possible string
) ends the capturing group
\s* matches zero or more whitespace characters
\( matches a literal (
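A quick check of this pattern next to the one from the question's EDIT, which leaves the surrounding spaces in the capture and so needs a strip:

```ruby
str = "_____ table salt (1) [F]"

# \s* outside the group absorbs the spaces; the lazy [^(]+? stops before them.
trimmed = str[/^_____\s*([^(]+?)\s*\(/, 1]

# The question's pattern captures everything up to "(", spaces included.
untrimmed = str[/^_+([^(]*)/, 1]

p trimmed          # => "table salt"
p untrimmed        # => " table salt "
p untrimmed.strip  # => "table salt"
```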
"_____ table salt (1) [F]".gsub(/[_]\s(.+)\s\(/, ' >>>\1<<< ')
# => "____ >>>table salt<<< 1) [F]"
It seems to me the simplest regex to do what you want is:
/^_____ ([\w\s]+) /
That says:
leading underscores, space, then capture any combination of word chars or spaces, then another space.

Ruby regular expression

Apparently I still don't understand exactly how it works ...
Here is my problem: I'm trying to match numbers in strings such as:
910 -6.258000 6.290
That string should give me an array like this:
[910, -6.258000, 6.290]
while the string
blabla9999 some more text 1.1
should not be matched.
The regex I'm trying to use is
/([-]?\d+[.]?\d+)/
but it doesn't do exactly that. Could someone help me ?
It would be great if the answer could clarify the use of the parenthesis in the matching.
Here's a pattern that works:
/^[^\d]+?\d+[^\d]+?\d+[\.]?\d+$/
Note that [^\d]+ means at least one non-digit character.
On second thought, here's a more generic solution that avoids a complicated regular expression:
str.gsub(/[^\d.-]+/, " ").split.collect{|d| d.to_f}
Example:
str = "blabla9999 some more text -1.1"
Parsed:
[9999.0, -1.1]
The brackets and parentheses have different meanings.
[] defines a character class: it matches exactly one character that belongs to the class.
() defines a capturing group: the substring matched by the part inside the parentheses is saved, for example in $1.
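A short illustration of the difference: the character class [.] matches a single literal dot, while parentheses mark which part of the match is saved as capture #1 (also available as $1).

```ruby
# [.] is a character class containing only ".", so it matches one literal dot.
dot_pos = "6.290" =~ /[.]/

# Parentheses define a capturing group; its text is capture #1.
m = "x -6.258 y".match(/(-?\d+\.\d+)/)
captured = m[1]
dollar_one = $1   # String#match also sets $~, so $1 holds the same text

# Without a group, the whole match is still available, but there is no capture #1.
m2 = "x -6.258 y".match(/-?\d+\.\d+/)
whole = m2[0]
no_group = m2[1]

p dot_pos     # => 1
p captured    # => "-6.258"
p dollar_one  # => "-6.258"
p whole       # => "-6.258"
p no_group    # => nil
```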
You did not define any anchors, so your pattern will match your second string:
blabla9999 some more text 1.1
^^^^ here ^^^ and here
Maybe this is more what you wanted:
^(\s*-?\d+(?:\.\d+)?\s*)+$
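^ anchors the pattern to the start of a line and $ to the end of a line. (In Ruby, ^ and $ are line anchors; use \A and \z to anchor to the start and end of the whole string.)
It allows whitespace (\s) before and after each number and an optional fractional part (?:\.\d+)?. The + after the group requires at least one such number.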
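A sketch of why the Ruby-specific anchor detail matters: because ^ and $ match at line boundaries, a multi-line string can pass the check thanks to a single matching line, while \A and \z require the whole string to match.

```ruby
line_anchored   = /^(\s*-?\d+(?:\.\d+)?\s*)+$/
string_anchored = /\A(\s*-?\d+(?:\.\d+)?\s*)+\z/

good  = "910 -6.258000 6.290"
bad   = "blabla9999 some more text 1.1"
mixed = "blabla\n910 -6.258000 6.290"

p line_anchored.match?(good)     # => true
p line_anchored.match?(bad)      # => false
p line_anchored.match?(mixed)    # => true  (the second line matches on its own)
p string_anchored.match?(good)   # => true
p string_anchored.match?(mixed)  # => false
```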
Maybe /(-?\d+(\.\d+)?)+/
"910 -6.258000 6.290".scan(/(\-?\d+(\.\d+)?)+/).map{|x| x[0]}
#=> ["910", "-6.258000", "6.290"]
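The .map{|x| x[0]} is needed because scan returns one array per match when the pattern has capture groups, with nil for a group that did not take part. Making the inner group non-capturing with (?:...) gives plain strings directly. A sketch:

```ruby
str = "910 -6.258000 6.290"

# With two capturing groups, each match yields [group1, group2].
with_groups = str.scan(/(-?\d+(\.\d+)?)/)

# With no capturing groups, scan returns the matched strings themselves.
plain = str.scan(/-?\d+(?:\.\d+)?/)

p with_groups  # => [["910", nil], ["-6.258000", ".258000"], ["6.290", ".290"]]
p plain        # => ["910", "-6.258000", "6.290"]
```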
str = " 910 -6.258000 6.290"
str.scan(/-?\d+\.?\d+/).map(&:to_f)
# => [910.0, -6.258, 6.29]
If you don't want integers to be converted to floats, try this:
str = " 910 -6.258000 6.290"
str.scan(/-?\d+\.?\d+/).map do |ns|
ns[/\./] ? ns.to_f : ns.to_i
end
# => [910, -6.258, 6.29]
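Putting the pieces together: validate the whole string with \A and \z first, so that text like the second example is rejected, then scan and convert. A sketch; the helper name parse_numbers and the constant names are hypothetical, and returning nil for invalid input is an assumption. Unlike -?\d+\.?\d+, this number pattern also accepts single-digit numbers.

```ruby
# Hypothetical constants: one number (optional "-" and fraction), and a whole
# line of such numbers separated by whitespace.
NUMBER = /-?\d+(?:\.\d+)?/
LINE   = /\A\s*#{NUMBER}(?:\s+#{NUMBER})*\s*\z/

# Hypothetical helper: nil unless the string contains only numbers.
def parse_numbers(str)
  return nil unless str.match?(LINE)   # reject strings with any other text

  str.scan(NUMBER).map { |s| s.include?(".") ? s.to_f : s.to_i }
end

p parse_numbers("910 -6.258000 6.290")            # => [910, -6.258, 6.29]
p parse_numbers("blabla9999 some more text 1.1")  # => nil
```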
