best way to find substring in ruby using regular expression - ruby

I have a string https://stackoverflow.com. I want a new string that contains the domain from the given string, using regular expressions.
Example:
x = "https://stackoverflow.com"
newstring = "stackoverflow.com"
Example 2:
x = "https://www.stackoverflow.com"
newstring = "www.stackoverflow.com"

"https://stackoverflow.com"[/(?<=:\/\/).*/]
#⇒ "stackoverflow.com"
(?<=:\/\/) is a positive lookbehind: the match must be preceded by ://, but the :// itself is not part of the result.
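A minimal sketch of the lookbehind approach, wrapped in a hypothetical helper named domain_of. It swaps the answer's .* for [^/]+, on the assumption that you want to stop at the first slash when the URL also has a path. String#[] returns nil when nothing matches.

```ruby
# Hypothetical helper: returns the text between "://" and the next "/".
# (?<=://) is a positive lookbehind, so "://" must be present but is not returned.
# [^/]+ (instead of .*) assumes any path after the host should be dropped.
def domain_of(url)
  url[%r{(?<=://)[^/]+}]
end

puts domain_of("https://stackoverflow.com")           # stackoverflow.com
puts domain_of("https://www.stackoverflow.com")       # www.stackoverflow.com
puts domain_of("https://stackoverflow.com/questions") # stackoverflow.com
p domain_of("stackoverflow.com")                      # nil (no "://" to look behind)
```

Writing the pattern with %r{} avoids escaping every slash.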

If string = "http://stackoverflow.com",
a really easy way is string.split("http://")[1], but this doesn't use a regex.
A regex solution would be as follows:
string.scan(/^http:\/\/(.+)$/).flatten.first
To explain:
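String#scan returns every match of the regex as an array; because this regex has a capture group, each match is itself an array of the captured strings.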
The regex:
^ matches beginning of line
http: matches those characters
\/\/ matches //
(.+) defines a capture group containing one or more of any character (except a newline). This is the value returned by scan.
$ matches end of line
.flatten.first extracts the results from String#scan, which in this case returns a nested array.
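To make the nesting concrete, the sketch below prints what scan returns before and after flatten.first. It also shows String#[] with a capture index, which returns group 1 directly; the s? that lets the pattern accept https is an addition to the original answer.

```ruby
string = "http://stackoverflow.com"

# scan returns one array of captures per match, hence the nesting.
nested = string.scan(/^http:\/\/(.+)$/)
p nested                           # [["stackoverflow.com"]]

# flatten.first unwraps the single captured string.
domain = nested.flatten.first
p domain                           # "stackoverflow.com"

# String#[] with a capture index returns group 1 directly (nil if no match).
# "s?" is an addition to the original pattern so https URLs also match.
via_index = string[/^https?:\/\/(.+)$/, 1]
via_index_www = "https://www.stackoverflow.com"[/^https?:\/\/(.+)$/, 1]
p via_index                        # "stackoverflow.com"
p via_index_www                    # "www.stackoverflow.com"
```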

You might want to try this:
#!/usr/bin/env ruby
str = "https://stackoverflow.com"
if mtch = str.match(/(?::\/\/)(\S+)/)
  f1 = mtch.captures.first   # => "stackoverflow.com"
end
The pattern has two groups: the first, (?:...), is a non-capturing group that matches the :// separator, and the second, (\S+), is a capturing group for everything after it, up to the next whitespace. The captures method returns an array of the captured strings, so captures.first assigns the desired result to f1.
I hope this solves your problem.

Related

string capture between duplicates in ruby

string = 'xabcdexfghijk'
In the example above, 'x' appears twice. I want to capture everything between the first 'x' and the next 'x'. Thus, the desired result is a new string that equals 'xabcdex'. Any ideas?
You could use a simple regular expression: /x.*?x/. This means "match an x, then as few characters as possible (non-greedy), then the next x".
The matched text can be extracted with String#[regexp]:
string = 'xabcdexfghijk'
string[/x.*?x/] # => "xabcdex"
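With only two x's in 'xabcdexfghijk', greedy and non-greedy patterns return the same text, so this sketch uses a hypothetical string with three x's to show where the difference appears. It also shows how to capture only the text between the x's.

```ruby
s = "xabxcdx"

greedy = s[/x.*x/]   # .* grabs as much as possible, so it ends at the LAST x
lazy   = s[/x.*?x/]  # .*? grabs as little as possible, so it ends at the NEXT x
p greedy             # "xabxcdx"
p lazy               # "xabx"

# To get only the text between the x's, capture it and ask for group 1.
inner = "xabcdexfghijk"[/x(.*?)x/, 1]
p inner              # "abcde"
```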

Ruby regex - using optional named backreferences

I am trying to write a Ruby regex that will return a set of named matches. If the first element (delimited by slashes) is found anywhere later in the string, I want the match to return everything from that second occurrence onward; otherwise, return the whole string. The closest I've gotten is (?<p1>top_\w+).*?(?<hier>\k<p1>.*), which doesn't work for the 3rd item. I've tried regex if-then-else constructs, but Rubular says they're invalid. I've also tried (?<p1>[\w\/]+?)(?<hier>\k<p1>.*), which correctly splits the 1st and 4th lines but doesn't work for the others. Please note: I want all results returned as the same named reference so I can iterate through "hier".
Input:
top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
top_bat/car[0]
top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
Output:
hier = top_cat/mouse/dog/elephant/horse
hier = top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
hier = top_bat/car[0]
hier = top_2/top_1/top_3/top_4/dog
Problem
The reason it does not match the second line is that the second instance of hat does not end with a slash, but the first instance does.
Solution
Specify that there is a slash between the first and second match.
Regex
(top_.*)/(\1.*$)|(^.*$)
Replacement
hier = \2\3
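The replacement above is written for an online tester. A sketch of applying the same pattern in Ruby with String#sub, using the question's input lines, might look like this; the PATTERN constant and variable names are illustrative.

```ruby
# The answer's pattern, written with %r{} so the "/" needs no escaping.
PATTERN = %r{(top_.*)/(\1.*$)|(^.*$)}

lines = %w(
  top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
  top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
  top_bat/car[0]
  top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
)

# '\2\3' in single quotes: whichever alternative matched fills its group,
# and the group that did not participate is substituted as "".
hiers = lines.map { |line| line.sub(PATTERN, '\2\3') }
hiers.each { |hier| puts "hier = #{hier}" }
```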
More info on the Alternation token
To explain how the | token works in regex, see the example: abc|def
What this regex means in plain English is:
Match either the regex below (attempting the next alternative only if this one fails)
Match the characters abc literally
Or match the regex below (the entire match attempt fails if this one fails to match)
Match the characters def literally
Example
Regex: alpha|alphabet
If we had the phrase "I know the alphabet", only alpha (the first five letters of alphabet) would be matched.
However, if we changed the regex to alphabet|alpha, we would match alphabet.
So alternation works left to right: the first alternative that matches wins, even if a later alternative would match more text.
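The alpha|alphabet example can be checked directly in Ruby. The last line is an addition, not part of the answer: it shows that a word boundary after the group forces the engine to backtrack into the longer alternative.

```ruby
text = "I know the alphabet"

# At each position the engine tries the alternatives left to right
# and stops at the first one that succeeds.
short_first = text[/alpha|alphabet/]
long_first  = text[/alphabet|alpha/]
p short_first  # "alpha"
p long_first   # "alphabet"

# \b after "alpha" fails (the next char is "b"), so "alphabet" is tried.
bounded = text[/\b(?:alpha|alphabet)\b/]
p bounded      # "alphabet"
```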
paths = %w(
top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
top_ab12/hat/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
top_bat/car[0]
top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
test/test
)
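paths.each do |path|
  # group 1 = first path segment; group 2 = the last later occurrence of that
  # whole segment (followed by "/" or the end) through the end of the string
  md = path.match(/^([^\/]*).*\/(\1(\/.*|$))/)
  hier = md ? md[2] : path
  puts hier
end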
Output:
top_cat/mouse/dog/elephant/horse
top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
top_bat/car[0]
top_2/top_1/top_3/top_4/dog
test

Ruby Regex gsub - everything after string

I have a string something like this:
test:awesome my search term with spaces
And I'd like to extract the string immediately after test: into one variable and everything else into another, so I'd end up with awesome in one variable and my search term with spaces in another.
Logically, what I'd do is move everything matching test:* into another variable, and then remove everything before the first :, leaving me with what I wanted.
At the moment I'm using /test:(.*)([\s]+)/ to match the first part, but I can't seem to get the second part correctly.
The first capture in your regular expression is greedy, and it matches spaces because you used . (which matches any character, spaces included). Instead, try:
matches = string.match(/test:(\S*) (.*)/)
# index 0 is the whole pattern that was matched
first = matches[1] # this is the first () group
second = matches[2] # and the second () group
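To see why the original /test:(.*)([\s]+)/ misbehaves, this sketch runs it side by side with the suggested pattern on the question's sample string.

```ruby
input = "test:awesome my search term with spaces"

# The asker's pattern: (.*) is greedy, so it runs on to the LAST space
# and ([\s]+) is left with only that one space.
original = input.match(/test:(.*)([\s]+)/)
p original[1]  # "awesome my search term with"
p original[2]  # " "

# \S* cannot cross a space, so the first group stops at the first one.
fixed = input.match(/test:(\S*) (.*)/)
p fixed[1]     # "awesome"
p fixed[2]     # "my search term with spaces"
```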
Use the following:
/^test:(.*?) (.*)$/
That is, match "test:", then a series of characters (non-greedily), up to a single space, and another series of characters to the end of the line.
I am guessing you also want to drop all the leading spaces before the second part, hence the \s+ in the expression. If you want to keep them, remove the \s+ from the expression and the second group will include them:
m = /^test:(\w+)\s+(.*)/.match("test:awesome my search term with spaces")
a = m[1]
b = m[2]
http://codepad.org/JzuNQxBN
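The effect of \s+ only shows up when there is more than one space after the first word, so this sketch uses a hypothetical input with three spaces and compares the pattern with and without \s+.

```ruby
# Hypothetical input with several spaces after the first word.
input = "test:awesome   my search term"

stripped = /^test:(\w+)\s+(.*)/.match(input)  # \s+ consumes all the spaces
kept     = /^test:(\w+)(.*)/.match(input)     # no \s+: spaces stay in group 2

p stripped[2]  # "my search term"
p kept[2]      # "   my search term"
```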

Match consecutive list of exactly one character in set with regular expressions

I don't think I'll even try to explain this (I don't know the words for it), but I'd like to achieve the following:
Given a string like this:
+++>><<<--
I'd like a match to give me: +++, but also match if any of the other characters were in the string consecutively like they are. So if the +++ wasn't there, I'd like to match >>.
I tried using the following regular expression:
([><\-\+]+)
However, given the string above, it would match the entire string, and not the first list of consecutive characters.
If it makes a difference, this is in Ruby (1.9.3).
Not sure about the Ruby bit, but you can do this with backreferences in the pattern:
(.)\1+
This uses a capturing group () to capture any character (.), followed by one or more (+) repetitions of the same character (\1). The \1 is a backreference to the first captured group; in a pattern with more capturing groups, \2 would be the second captured group, and so on.
Java Example
Pattern p = Pattern.compile("(.)\\1+");
Matcher m = p.matcher("aaabbccaa");
m.find();
System.out.println(m.group(0)); // prints "aaa"
Ruby Example
# Return an array of matched patterns.
string = '+++>><<<--'
string.scan( /((.)\2+)/ ).collect { |match| match.first }
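Since the question asks for the first run only, and only from the characters + > < -, here is a sketch that restricts the backreference pattern to that set (the [-+<>] class is an assumption about which characters count). Note that + after the backreference means a run needs at least two characters.

```ruby
string = '+++>><<<--'

# [-+<>] restricts the run to the asker's characters; \1 repeats group 1.
first_run = string[/([-+<>])\1+/]
p first_run        # "+++"

# If the +++ is absent, the first run found is >>.
without_plus = string.delete('+')[/([-+<>])\1+/]
p without_plus     # ">>"

# Every run: the outer group is now group 1, so the backreference is \2.
all_runs = string.scan(/(([-+<>])\2+)/).map(&:first)
p all_runs         # ["+++", ">>", "<<<", "--"]
```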

Ruby regular expression

Apparently I still don't understand exactly how it works ...
Here is my problem: I'm trying to match numbers in strings such as:
910 -6.258000 6.290
That string should give me an array like this:
[910, -6.258000, 6.290]
while the string
blabla9999 some more text 1.1
should not be matched.
The regex I'm trying to use is
/([-]?\d+[.]?\d+)/
but it doesn't do exactly that. Could someone help me?
It would be great if the answer could also clarify the use of parentheses in the matching.
Here's a pattern that works:
/^[^\d]+?\d+[^\d]+?\d+[\.]?\d+$/
Note that [^\d]+ means at least one non-digit character.
On second thought, here's a more generic solution that doesn't need a complex regular expression:
str.gsub(/[^\d.-]+/, " ").split.collect{|d| d.to_f}
Example:
str = "blabla9999 some more text -1.1"
Parsed:
[9999.0, -1.1]
The different kinds of brackets have different meanings.
[] defines a character class: it matches exactly one character that belongs to the class.
() defines a capturing group: the text matched by the part inside the parentheses is stored, so you can read it back afterwards (in Ruby, via $1, $2, ... or the MatchData object).
You did not define any anchors, so your pattern will also match your second string:
blabla9999 some more text 1.1
      ^^^^ here           ^^^ and here
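Running the asker's own regex shows both points: the [] classes each match one character, the () group changes what scan returns, and without anchors the pattern happily finds numbers inside the second string.

```ruby
line = "910 -6.258000 6.290"

# [-] and [.] are one-character classes: each matches a single "-" or ".".
# The ( ) group makes scan return an array of captures for every match.
grouped = line.scan(/([-]?\d+[.]?\d+)/)
p grouped      # [["910"], ["-6.258000"], ["6.290"]]

# Without the group, scan returns the matched strings themselves.
ungrouped = line.scan(/[-]?\d+[.]?\d+/)
p ungrouped    # ["910", "-6.258000", "6.290"]

# Without anchors, the pattern also finds numbers inside other text.
unanchored = "blabla9999 some more text 1.1".scan(/[-]?\d+[.]?\d+/)
p unanchored   # ["9999", "1.1"]
```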
Maybe this is closer to what you wanted:
^(\s*-?\d+(?:\.\d+)?\s*)+$
^ anchors the pattern to the start and $ to the end. (In Ruby these are line anchors; \A and \z anchor to the start and end of the whole string.)
It allows whitespace (\s) before and after each number and an optional fractional part, (?:\.\d+)?. The outer group is repeated with +, so it must match at least once.
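Putting validation and extraction together, a sketch might first check the whole line with the anchored pattern and only then pull the numbers out with scan. The helper numbers_in, the NUMBER pattern, and the switch from ^/$ to \A/\z are assumptions layered on the answer, not part of it.

```ruby
# The answer's pattern, anchored to the whole string with \A and \z.
VALID_LINE = /\A(\s*-?\d+(?:\.\d+)?\s*)+\z/
# One number: optional minus, digits, optional fractional part.
NUMBER = /-?\d+(?:\.\d+)?/

# Hypothetical helper: nil for lines containing anything but numbers,
# otherwise the numbers, keeping integers as Integer.
def numbers_in(line)
  return nil unless line.match?(VALID_LINE)
  line.scan(NUMBER).map { |n| n.include?(".") ? n.to_f : n.to_i }
end

p numbers_in("910 -6.258000 6.290")           # [910, -6.258, 6.29]
p numbers_in("blabla9999 some more text 1.1") # nil
```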
Maybe /(-?\d+(\.\d+)?)+/:
irb(main):010:0> "910 -6.258000 6.290".scan(/(\-?\d+(\.\d+)?)+/).map{|x| x[0]}
=> ["910", "-6.258000", "6.290"]
str = " 910 -6.258000 6.290"
# (?:\.\d+)? makes the fraction optional, so single-digit numbers like 5 also match
str.scan(/-?\d+(?:\.\d+)?/).map(&:to_f)
# => [910.0, -6.258, 6.29]
If you don't want integers to be converted to floats, try this:
str = " 910 -6.258000 6.290"
str.scan(/-?\d+(?:\.\d+)?/).map do |ns|
  ns[/\./] ? ns.to_f : ns.to_i   # keep whole numbers as Integer
end
# => [910, -6.258, 6.29]

Resources