regular expression in ruby Regexp - ruby

I'm using ruby 1.9.2
string = "asufasu isaubfusabiu safbsua fbisaufb sa {{hello}} uasdhfa s asuibfisubibas {{manish}} erieroi"
Now I have to find every {{anyword}} in it: how many times such a token occurs, and each name together with its curly braces.
After reading the Regexp documentation, I am using
/{{[a-z]}}/.match(string)
but it returns nil every time.

You need to append a * to the [a-z] pattern so that it matches any number of letters between the braces (without a quantifier, [a-z] matches exactly one letter), and then use scan to get all occurrences of the match in the string:
string.scan(/{{[a-z]*}}/)
=> ["{{hello}}", "{{manish}}"]
To get the number of times matches occur, just take the size of the resulting array:
string.scan(/{{[a-z]*}}/).size
=> 2
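The difference between the two patterns, and one way to get the names without their braces, can be sketched as follows. The capture group and the variable names are illustrative additions rather than part of the original answer, and the braces are escaped only for readability.

```ruby
string = "asufasu isaubfusabiu safbsua fbisaufb sa {{hello}} uasdhfa s asuibfisubibas {{manish}} erieroi"

# [a-z] without a quantifier matches exactly one letter, so only a token
# such as "{{x}}" could match; for this string the result is nil.
single_letter = /\{\{[a-z]\}\}/.match(string)

# With a quantifier, scan returns every occurrence, braces included.
with_braces = string.scan(/\{\{[a-z]+\}\}/)

# With a capture group, scan returns only the captured part of each match.
names = string.scan(/\{\{([a-z]+)\}\}/).flatten

puts with_braces.inspect
puts names.inspect
puts names.size
```

Using + instead of * here also means an empty {{}} is not counted as a match.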

Rubular, a web-based regular expression tester, can be an incredibly helpful tool for trying out regular expressions in real time.

Related

Ruby Regex Group Replacement

I am trying to perform regular expression matching and replacement on the same line in Ruby. I have some libraries that manipulate strings in Ruby and add special formatting characters to it. The formatting can be applied in any order. However, if I would like to change the string formatting, I want to keep some of the original formatting. I'm using regex for that. I have the regular expression matching correctly what I need:
mystring.gsub(/[(\e\[([1-9]|[1,2,4,5,6,7,8]{2}m))|(\e\[[3,9][0-8]m)]*Text/, 'New Text')
However, what I really want is the matching from the first grouping found in:
(\e\[([1-9]|[1,2,4,5,6,7,8]{2}m))
to be appended to New Text and replaced as opposed to just New Text. I'm trying to reference the match in the form of
mystring.gsub(/[(\e\[([1-9]|[1,2,4,5,6,7,8]{2}m))|(\e\[[3,9][0-8]m)]*Text/, '\1' + 'New Text')
but my understanding is that \1 only works when using \d or \k. Is there any way to reference that specific capturing group in my replacement string? Additionally, since I am using an asterisk after the [], I know that this grouping could occur more than once. Therefore, I would like the last matching occurrence to be the one that is used.
My expected input/output with a sample is:
Input: "\e[1mHello there\e[34m\e[40mText\e[0m\e[0m\e[22m"
Output: "\e[1mHello there\e[40mNew Text\e[0m\e[0m\e[22m"
Input: "\e[1mHello there\e[44m\e[34m\e[40mText\e[0m\e[0m\e[22m"
Output: "\e[1mHello there\e[40mNew Text\e[0m\e[0m\e[22m"
So the last grouping is found and appended.
You can use the following regex with back-reference \\1 in the replacement:
reg = /(\\e\[(?:[0-9]{1,2}|[3,9][0-8])m)+Text/
mystring = "\\e[1mHello there\\e[34m\\e[40mText\\e[0m\\e[0m\\e[22m"
puts mystring.gsub(reg, '\\1New Text')
mystring = "\\e[1mHello there\\e[44m\\e[34m\\e[40mText\\e[0m\\e[0m\\e[22m"
puts mystring.gsub(reg, '\\1New Text')
Output of the IDEONE demo:
\e[1mHello there\e[40mNew Text\e[0m\e[0m\e[22m
\e[1mHello there\e[40mNew Text\e[0m\e[0m\e[22m
Mind that your input has a backslash \ that needs escaping in a regular string literal. To match it inside the regex, we use a double backslash, as we are looking for a literal backslash.
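If the string contains real escape characters (written "\e" inside a double-quoted Ruby literal) instead of a literal backslash followed by e, the same idea might look like the sketch below. It assumes every formatting code is one or two digits, and it relies on a repeated capture group keeping only the text of its last repetition. The variable names are hypothetical.

```ruby
# One or more ANSI sequences directly before "Text". Group 1 is overwritten
# on every repetition, so it ends up holding the last sequence.
ansi_before_text = /(\e\[\d{1,2}m)+Text/

input_a = "\e[1mHello there\e[34m\e[40mText\e[0m\e[0m\e[22m"
input_b = "\e[1mHello there\e[44m\e[34m\e[40mText\e[0m\e[0m\e[22m"

# Single quotes keep \1 as a backreference for gsub.
output_a = input_a.gsub(ansi_before_text, '\1New Text')
output_b = input_b.gsub(ansi_before_text, '\1New Text')

puts output_a.inspect
puts output_b.inspect
```

Both calls keep only the last sequence before Text, which matches the expected output given in the question.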

ruby regex to match multiple occurrences of pattern

I am looking to build a ruby regex to match multiple occurrences of a pattern and return them in an array. The pattern is simply: [[.+]]. That is, two left brackets, one or more characters, followed by two right brackets.
This is what I have done:
str = "Some random text[[lead:first_name]] and more stuff [[client:last_name]]"
str.match(/\[\[(.+)\]\]/).captures
The regex above doesn't work because it returns this:
["lead:first_name]] and more stuff [[client:last_name"]
When what I wanted was this:
["lead:first_name", "client:last_name"]
I thought if I used a noncapturing group that for sure it should solve the issue:
str.match(/(?:\[\[(.+)\]\])+/).captures
But the noncapturing group returns the same exact wrong output. Any idea on how I can resolve my issue?
The problem with your regex is that the .+ part is "greedy", meaning that when the regex can match both a shorter and a longer part of the string, it captures the longer part.
In Ruby (and most regex syntaxes), you can qualify your + quantifier with a ? to make it non-greedy. So your regex would become /(?:\[\[(.+?)\]\])+/.
However, you'll notice this still doesn't do what you want. A capture group inside a repeated group keeps only the text of its last repetition, and match returns only the first match in the string. For your problem, you'll need to use scan:
"[[a]][[ab]][[abc]]".scan(/\[\[(.+?)\]\]/).flatten
=> ["a", "ab", "abc"]
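The behaviours described in this answer can be compared side by side. This is a small sketch using the string from the question; the variable names are only for illustration.

```ruby
str = "Some random text[[lead:first_name]] and more stuff [[client:last_name]]"

# Greedy: .+ runs on to the last "]]" in the string.
greedy = str.match(/\[\[(.+)\]\]/).captures

# Lazy: .+? stops at the first "]]", but match still returns one match only.
lazy = str.match(/\[\[(.+?)\]\]/).captures

# A capture group inside a repeated group keeps only its last repetition.
repeated = "[[a]][[ab]][[abc]]".match(/(?:\[\[(.+?)\]\])+/).captures

# scan walks the whole string and collects the capture from every match.
all_tokens = str.scan(/\[\[(.+?)\]\]/).flatten

puts greedy.inspect
puts lazy.inspect
puts repeated.inspect
puts all_tokens.inspect
```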
Try this (it only gives the right captures when the string contains exactly two [[...]] tokens):
str.match(/\[\[(.*)\]\].*\[\[(.*)\]\]/).captures
=> ["lead:first_name", "client:last_name"]
With many occurrences:
str = "Some [[lead:first_name]] random text[[lead:first_name]] and more [[lead:first_name]] stuff [[client:last_name]]"
str.scan(/\[(\w+:\w+)\]/)
=> [["lead:first_name"], ["lead:first_name"], ["lead:first_name"], ["client:last_name"]]

String gsub - Replace characters between two elements, but leave surrounding elements

Suppose I have the following string:
mystring = "start/abc123/end"
How can I replace the abc123 with something else, while leaving the "start/" and "/end" elements intact?
I had the following to match for the pattern, but it replaces the entire string. I was hoping to just have it replace the abc123 with 123abc.
mystring.gsub(/start\/(.*)\/end/,"123abc") #=> "123abc"
Edit: The characters between the start & end elements can be any combination of alphanumeric characters, I changed my example to reflect this.
You can do it using the character class [^\/] (anything that is not a slash) together with lookarounds:
mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/,"7")
For your example, you could perhaps use:
mystring.gsub(/\/(.*?)\//,"/7/")
This matches the two slashes around the string you're replacing and puts them back in the substitution.
Alternatively, you could capture the pieces of the string you want to keep and interpolate them around your replacement; this turns out to be much more readable than lookaheads/lookbehinds:
irb(main):010:0> mystring.gsub(/(start)\/.*\/(end)/, "\\1/7/\\2")
=> "start/7/end"
\\1 and \\2 here refer to the numbered captures inside of your regular expression.
The problem is that you're replacing the entire matched string, "start/abc123/end", with the replacement text. You need to include the matched characters you want to keep:
mystring.gsub(/start\/(.*)\/end/, "start/7/end")
Alternatively, if the part between the slashes is purely numeric, just match the digits:
mystring.gsub(/\d+/, "7")
You can do this by grouping the start and end elements in the regular expression and then referring to these groups in the substitution string (a named group is referenced there as \k<name>):
mystring.gsub(/(?<start>start\/).*(?<end>\/end)/, "\\k<start>7\\k<end>")
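The approaches from these answers can be put next to each other. The following is a sketch rather than a recommendation; the replacement text NEW, the group names head and tail, and the variable names are made up for the example.

```ruby
mystring = "start/abc123/end"

# Lookarounds: only the middle part is matched, so only it is replaced.
with_lookarounds = mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/, "NEW")

# Numbered groups: the kept parts are captured and put back with \1 and \2.
with_numbered = mystring.gsub(/(start\/)[^\/]+(\/end)/, '\1NEW\2')

# Named groups: in a replacement string they are written \k<name>.
with_named = mystring.gsub(/(?<head>start\/)[^\/]+(?<tail>\/end)/, '\k<head>NEW\k<tail>')

# Block form: the captures are available as $1 and $2 inside the block.
with_block = mystring.gsub(/(start\/)[^\/]+(\/end)/) { "#{$1}123abc#{$2}" }

puts with_lookarounds
puts with_numbered
puts with_named
puts with_block
```

The block form is convenient when the replacement itself starts with a digit, as 123abc does, because the captured text and the new text are joined by ordinary string interpolation.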

Match consecutive list of exactly one character in set with regular expressions

I don't think I'll even try to explain this, I don't know the words to, but I'd like to achieve the following:
Given a string like this:
+++>><<<--
I'd like a match to give me: +++, but also match if any of the other characters were in the string consecutively like they are. So if the +++ wasn't there, I'd like to match >>.
I tried using the following regular expression:
([><\-\+]+)
However, given the string above, it would match the entire string, and not the first list of consecutive characters.
If it makes a difference, this is in Ruby (1.9.3).
Not sure about the ruby bit, but you can do this with backreferences in the pattern:
(.)\1+
This uses a capturing group () to capture any character (.), followed by one or more (+) repetitions of that same character (\1). The \1 is a backreference to the first captured group; in a pattern with more capturing groups, \2 would be the second captured group, and so on.
Java Example
Pattern p = Pattern.compile("(.)\\1+");
Matcher m = p.matcher("aaabbccaa");
m.find();
System.out.println(m.group(0)); // prints "aaa"
Ruby Example
# Return an array of matched patterns.
string = '+++>><<<--'
string.scan( /((.)\2+)/ ).collect { |match| match.first }
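A short sketch of what the Ruby version returns, along with two variations: taking only the first run, and restricting the match to the characters from the question. The third pattern and the sample string "a+>>b-" are assumptions added for illustration.

```ruby
string = '+++>><<<--'

# Every run of two or more identical characters. Group 1 is the whole run,
# group 2 is the single repeated character.
runs = string.scan(/((.)\2+)/).map { |match| match.first }

# Only the first run: String#[] with a regex returns the first match.
first_run = string[/(.)\1+/]

# Restricted to the characters from the question, with * instead of +
# so that a run of length one also counts.
symbol_runs = "a+>>b-".scan(/(([><+\-])\2*)/).map { |match| match.first }

puts runs.inspect
puts first_run
puts symbol_runs.inspect
```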

Ruby regular expression

Apparently I still don't understand exactly how it works ...
Here is my problem: I'm trying to match numbers in strings such as:
910 -6.258000 6.290
That string should give me an array like this:
[910, -6.258000, 6.290]
while the string
blabla9999 some more text 1.1
should not be matched.
The regex I'm trying to use is
/([-]?\d+[.]?\d+)/
but it doesn't do exactly that. Could someone help me?
It would be great if the answer could clarify the use of the parentheses in the matching.
Here's a pattern that works:
/^[^\d]+?\d+[^\d]+?\d+[\.]?\d+$/
Note that [^\d]+ means at least one non digit character.
On second thought, here's a more generic solution that avoids matching the whole line with a single regular expression:
str.gsub(/[^\d.-]+/, " ").split.collect{|d| d.to_f}
Example:
str = "blabla9999 some more text -1.1"
Parsed:
[9999.0, -1.1]
The brackets have different meanings.
[] defines a character class: exactly one character that belongs to the class is matched.
() defines a capturing group: the string matched by the part in parentheses is stored so that it can be retrieved later.
You did not define any anchors, so your pattern will also match inside your second string:
blabla9999 some more text 1.1
      ^^^^ here           ^^^ and here
Maybe this is more what you wanted
^(\s*-?\d+(?:\.\d+)?\s*)+$
See it here on Regexr
^ anchors the pattern to the start of the string and $ to the end (in Ruby these two match at line boundaries; \A and \z anchor to the whole string).
It allows whitespace (\s) before and after each number, and an optional fractional part, (?:\.\d+)?. The whole group must match at least once.
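One way to combine the anchored check with extraction is sketched below. It is a variant of the pattern above, not the answer's exact regex: it requires whitespace between numbers and uses \A and \z. The names number, numbers_only and parse_numbers are hypothetical.

```ruby
# A single number: optional sign, digits, optional fractional part.
number = /-?\d+(?:\.\d+)?/

# The whole string must consist of whitespace-separated numbers.
numbers_only = /\A\s*#{number}(?:\s+#{number})*\s*\z/

# Returns the parsed numbers, or nil when the string contains anything else.
parse_numbers = lambda do |str|
  if str =~ numbers_only
    str.scan(number).map { |n| n.include?(".") ? n.to_f : n.to_i }
  end
end

accepted = parse_numbers.call("910 -6.258000 6.290")
rejected = parse_numbers.call("blabla9999 some more text 1.1")

puts accepted.inspect
puts rejected.inspect
```

Validating first and extracting second keeps each regular expression simple: the anchored pattern only decides whether the line qualifies, and scan does the collecting.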
Maybe /(-?\d+(\.\d+)?)+/:
irb(main):010:0> "910 -6.258000 6.290".scan(/(\-?\d+(\.\d+)?)+/).map{|x| x[0]}
=> ["910", "-6.258000", "6.290"]
str = " 910 -6.258000 6.290"
str.scan(/-?\d+\.?\d+/).map(&:to_f)
# => [910.0, -6.258, 6.29]
If you don't want integers to be converted to floats, try this:
str = " 910 -6.258000 6.290"
str.scan(/-?\d+\.?\d+/).map do |ns|
  ns[/\./] ? ns.to_f : ns.to_i
end
# => [910, -6.258, 6.29]
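One corner case of the pattern /-?\d+\.?\d+/ is that it needs at least two digits, so a single-digit number is skipped. The sketch below shows this on a made-up input, next to a pattern in which the whole fractional part is optional.

```ruby
str = "5 -6.2 42"

# \d+\.?\d+ needs a digit on both sides of the optional dot,
# so the lone "5" is not matched.
two_digit_minimum = str.scan(/-?\d+\.?\d+/)

# Making the whole fractional part optional also accepts single digits.
any_length = str.scan(/-?\d+(?:\.\d+)?/)

puts two_digit_minimum.inspect
puts any_length.inspect
```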
