RegExp match word with parenthesis? - ruby

I have a problem with regular expressions. I have strings like this:
test(r), testtest(r,r), example, example2, exmp (r,5)
I would like to find all words without blanks between text and the parenthesis. From the string above, I want to get:
test(r), testtest(r,r)
I created this regexp but it catches only brackets with content inside:
/\(([^\)]+)\)/g
Thanks for all answers.
Edit:
I am working in Ruby, and this regex works perfectly /\w+\(([^)]+)\)/g.
What should I do if I would like to separate words with commas between brackets and without? For example how do I get this?
testtest(r,r) <-with comma
test(r) <-without comma

You can add a \w+ in front of the pattern:
/\w+\(([^)]+)\)/g
This will match one or more 'word' characters (which includes letters, digits, and underscores) followed by an (, followed by one or more of any character other than ), followed by a ).
If you want to capture the word that appears before the parentheses separately, you can put it in a group like this:
/(\w+)\(([^)]+)\)/g
In your example, this would give group 1: test and group 2: r for the first match and group 1: testtest and group 2: r,r for the second match.
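Since the question is about Ruby, here is a minimal sketch of how the pattern could be applied there. Ruby regex literals have no g flag; String#scan returns every match instead. The variable names are illustrative, and partitioning on the comma is just one possible way to handle the follow-up in the question's edit.

```ruby
text = "test(r), testtest(r,r), example, example2, exmp (r,5)"

# scan collects all matches; "exmp (r,5)" is skipped because of the blank
calls = text.scan(/\w+\([^)]+\)/)

# Split the matches by whether the parenthesized part contains a comma
with_comma, without_comma = calls.partition { |c| c.include?(",") }

puts calls.inspect          # ["test(r)", "testtest(r,r)"]
puts with_comma.inspect     # ["testtest(r,r)"]
puts without_comma.inspect  # ["test(r)"]
```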

Recently had the same issue myself. Doing this solved my problem:
(\w+\([\w, ]+\))
Hope it helps!

Something like this:
/\w+\(\w([,]\w)?\)/g

import re

text = '''test(r), testtest(r,r), example, example2, exmp (r,5)'''
pattern = r'\w+\([^()]+\)'
m = re.compile(pattern)
results = m.finditer(text)
for r in results:
    print(r.group())  # print the matched text rather than the Match object

Related

Regex capturing from a non capture group in ruby

I am trying to fix a bit of regex I have for a chatops bot for lita. I have the following regex:
/^(?:how\s+do\s+I\s+you\s+get\s+far\s+is\s+it\s+from\s+)?(.+)\s+to\s+(.+)/i
This is supposed to capture the words before and after 'to', with optional words in front that can form questions like: How do I get from x to y, how far from x to y, how far is it from x to y.
expected output:
match 1 : "x"
match 2 : "y"
For the most part my optional words work as expected. But when I pull my response matches, I get the words leading up to the first capture group included.
So, how far is it from sfo to lax should return:
sfo and lax.
But instead returns:
how far is it from sfo and lax
Your glitch is that the first chunk of your regex doesn't make sense.
To choose from multiple options, use this syntax:
(a|b|c)
What I think you're trying to do is this:
/^(?:(?:how|do|I|you|get|far|is|it|from)\s+)*(.+)\s+to\s+(.+)/i
The regexp says to skip all the words in the multiple options, regardless of order.
If you want to preserve word order, you can use regexps such as this pseudocode:
… how (can|do|will) (I|you|we) (get|go|travel) from …
When you want to match words, \w is the most natural pattern I'd use (e.g., it is used in word count tools.)
Capturing one word before and after a "to" can be done with the regex (\w+\sto\s+\w*).
To return them as 2 different groups, you can use (\w+)\s+to\s+(\w+).
Have a look at the demo.
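As a rough check of the alternation-based pattern suggested above, the sketch below runs it against two of the question's sample sentences. The method name endpoints is hypothetical and not part of lita.

```ruby
# Hypothetical helper: returns [from, to], or nil when nothing matches.
def endpoints(question)
  # Skip any run of filler words, then capture the text on either side of "to"
  pattern = /^(?:(?:how|do|I|you|get|far|is|it|from)\s+)*(.+)\s+to\s+(.+)/i
  m = pattern.match(question)
  m && m.captures
end

p endpoints("how far is it from sfo to lax")  # ["sfo", "lax"]
p endpoints("How do I get from x to y")       # ["x", "y"]
```

Because the filler words sit inside a non-capturing group, only the two (.+) groups show up in captures.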

Regex for first x words in string

I need a regex that returns the first N words from a string, including line breaks and white spaces. I tried with the following code, but the server crashes:
str[/\S+(\s)?{N}/].strip
Like this (for the first 15 words):
if subject =~ /^(?:\w+\s){15}/
  thefirstwords = $&
end
Just change the 15 to whatever number you like.
I guess you can achieve this without even regex:
str.split[0...n].join(' ')
Try this expression
'/^.\S+(\s){N}/'
Start with any character and match up to N words.
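For comparison, here is a sketch that keeps the original line breaks and spacing between words. It assumes n is at least 1 and that a "word" is any run of non-whitespace; the method name first_words is made up for illustration. Note that a bare N inside a regex literal is not replaced by a variable's value; the count has to be interpolated with #{...}.

```ruby
# Hypothetical helper: first n whitespace-separated words, inner whitespace preserved.
def first_words(str, n)
  # One word, then up to n-1 further "whitespace + word" pairs
  str[/\A\s*\S+(?:\s+\S+){0,#{n - 1}}/].to_s.strip
end

text = "one two\nthree four"
p first_words(text, 3)          # "one two\nthree"
p text.split[0...3].join(" ")   # "one two three" (the split version flattens the line break)
```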

ruby remove variable length string from regular expression leaving hyphen

I have a string such as this: "im# -33.870816,151.203654"
I want to extract the two numbers including the hyphen.
I tried this:
mystring = "im# -33.870816,151.203654"
/\D*(\-*\d+\.\d+),(\-*\d+\.\d+)/.match(mystring)
This gives me:
33.870816,151.203654
How do I get the hyphen?
I need to do this in ruby
Edit: I should clarify, the "im# " was just an example, there can be any set of characters before the numbers. the numbers are mostly well formed with the comma. I was having trouble with the hyphen (-)
Edit2: Note that the two numbers are latitude and longitude. That pattern is mostly fixed. However, in theory, the preceding string can be arbitrary. I don't expect it to contain numbers or a hyphen, but you never know.
How about this?
arr = "im# -33.2222,151.200".split(/[, ]/)[1..-1]
and arr is ["-33.2222", "151.200"] (using the split method).
Now arr[0].to_f is -33.2222 and arr[1].to_f is 151.2.
EDIT: stripped the "im#" part with [1..-1] as suggested in comments.
EDIT2: also, this works regardless of what the first characters are, as long as they contain no spaces or commas.
If you want to capture the two numbers with the hyphen you can use this regex:
> str = "im# -33.870816,151.203654"
> str.match(/([\d.,-]+)/).captures
=> ["-33.870816,151.203654"]
Edit: now it captures hyphen.
This one captures each number separately: http://rubular.com/r/NNP2OTEdiL
Note: Using String#scan will match all occurrences of the given pattern, in this case:
> str.scan(/(-?\d+\.\d+)/)
=> [["-33.870816"], ["151.203654"]] # Good, but flattened version is better
> str.scan(/(-?\d+\.\d+)/).flatten
=> ["-33.870816", "151.203654"]
I recommend playing around a little with Rubular. There are also some docs about regular expressions in Ruby:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/language.html#UJ
http://www.regular-expressions.info/ruby.html
http://www.ruby-doc.org/core-1.9.3/Regexp.html
Your regex doesn't work because the hyphen is caught by \D, so you have to modify it to catch only the right set of characters.
[^0-9-]* would be a good option.
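To make the difference concrete, this sketch runs the question's original pattern next to the [^0-9-]* variant, then shows a version that skips the prefix altogether. The helper name parse_lat_lng is hypothetical, and it assumes both numbers always contain a decimal point.

```ruby
mystring = "im# -33.870816,151.203654"

# Original: the greedy \D* swallows the hyphen before the first group starts
original = /\D*(\-*\d+\.\d+),(\-*\d+\.\d+)/.match(mystring).captures

# Fixed: the prefix class excludes digits and hyphens, so "-" stays with the number
fixed = /[^0-9-]*(-*\d+\.\d+),(-*\d+\.\d+)/.match(mystring).captures

# Hypothetical helper: ignore the prefix and return floats, or nil if no match
def parse_lat_lng(str)
  m = str.match(/(-?\d+\.\d+),(-?\d+\.\d+)/)
  m && m.captures.map(&:to_f)
end

p original  # ["33.870816", "151.203654"]
p fixed     # ["-33.870816", "151.203654"]
p parse_lat_lng(mystring)
```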

Match comma separated list with Ruby Regex

Given the following string, I'd like to match the elements of the list and parts of the rest after the colon:
foo,bar,baz:something
I.e. I am expecting the first three match groups to be "foo", "bar", "baz". No commas and no colon. The minimum number of elements is 1, and there can be arbitrarily many. Assume no whitespace and lower case.
I've tried this, which should work, but doesn't populate all the match groups for some reason:
^([a-z]+)(?:,([a-z]+))*:(something)
That matches foo in \1 and baz (or whatever the last element is) in \2. I don't understand why I don't get a match group for bar.
Any ideas?
EDIT: Ruby 1.9.3, if that matters.
EDIT2: Rubular link: http://rubular.com/r/pDhByoarbA
EDIT3: Add colon to the end, because I am not just trying to match the list. Sorry, oversimplified the problem.
This expression works for me: /(\w+)/i
If you want to do it with regex, how about this?
(?<=^|,)("[^"]*"|[^,]*)(?=,|$)
This matches comma-separated fields, including the possibility of commas appearing inside quoted strings like 123,"Yes, No". Regexr for this.
More verbosely:
(?<=^|,) # Must be preceded by start-of-line or comma
(
"[^"]*"| # A quote, followed by a bunch of non-quotes, followed by quote, OR
[^,]* # OR anything until the next comma
)
(?=,|$) # Must end with comma or end-of-line
Usage would be with something like Python's re.findall(), which returns all non-overlapping matches in the string (working from left to right, if that matters.) Don't use it with your equivalent of re.search() or re.match() which only return the first match found.
(NOTE: This actually doesn't work in Python because the lookbehind (?<=^|,) isn't fixed width. Grr. Open to suggestions on this one.)
Edit: Use a non-capturing group to consume start-of-line or comma, instead of a lookbehind, and it works in Python.
>>> import re
>>> test_str = '123,456,"String","String, with, commas","Zero-width fields next",,"",nyet,123'
>>> m = re.findall('(?:^|,)("[^"]*"|[^,]*)(?=,|$)',test_str)
>>> m
['123', '456', '"String"', '"String, with, commas"',
'"Zero-width fields next"', '', '""', 'nyet', '123']
Edit 2: The Ruby equivalent of Python's re.findall(needle, haystack) is haystack.scan(needle).
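A minimal Ruby sketch of that scan call, using the non-capturing-group variant of the pattern and a shortened sample string (the variable names are illustrative). Since the pattern has one capture group, scan returns an array of one-element arrays, so flatten is applied.

```ruby
haystack = '123,"Yes, No",,x'
needle = /(?:^|,)("[^"]*"|[^,]*)(?=,|$)/

# Each match yields [field]; flatten turns the result into a plain list of fields
fields = haystack.scan(needle).flatten

p fields  # ["123", "\"Yes, No\"", "", "x"]
```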
Maybe split would be a better solution for this case?
'foo,bar,baz'.split(',')
=> ["foo", "bar", "baz"]
If I am interpreting your post correctly, you want everything separated by commas before the colon (:).
The appropriate regex for this would be:
[^\s:]*(,[^\s:]*)*(:.*)?
This should find everything you are looking for.
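As for why bar goes missing in the original attempt: a capture group inside a repetition keeps only its last iteration, so \2 ends up holding baz. One way around that, sketched below under the question's assumptions (lowercase letters, no whitespace), is to capture the whole list as a single group and split it afterwards. The method name parse_list is hypothetical.

```ruby
# Hypothetical helper: returns [elements, rest], or nil if the line is malformed.
def parse_list(line)
  # Group 1 is the entire comma-separated list, group 2 is everything after the colon
  m = line.match(/\A([a-z]+(?:,[a-z]+)*):(.+)\z/)
  return nil unless m
  [m[1].split(","), m[2]]
end

p parse_list("foo,bar,baz:something")  # [["foo", "bar", "baz"], "something"]
p parse_list("foo:something")          # [["foo"], "something"]
```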

Regular expression Unix shell script

I need to filter all lines with words starting with a letter followed by zero or more letters or numbers, but no special characters (basically names which could be used for C++ variables).
egrep '^[a-zA-Z][a-zA-Z0-9]*'
This works fine for words such as "a" and "ab10", but it also includes words like "b.b". I understand that the * at the end of the expression is the problem. If I replace * with + (one or more), it skips words that contain only one letter, so it doesn't help.
EDIT:
I should be more precise. I want to find lines with any number of possible words as described above. Here is an example:
int = 5;
cout << "hello";
//some comments
In that case it should print all of the lines above, as they all include at least one word which fits the described conditions, and a line does not have to begin with a letter.
Your solution will look roughly like this example. In this case, the regex requires that the "word" be preceded by space or start-of-line and then followed by space or end-of-line. You will need to modify the boundary requirements (the parenthesized stuff) as needed.
'(^| )[a-zA-Z][a-zA-Z0-9]*( |$)'
Assuming the line ends after the word:
'^[a-zA-Z][a-zA-Z0-9]+|^[a-zA-Z]$'
You have to add something to it: either allow the rest of the line to be whitespace, or just append the end-of-line anchor (AFAIR it is $).
Your problem lies in the ^ and $ anchors, which match the start and end of the line respectively. You want the line to match if it contains a word anywhere, and getting rid of the anchors does that:
egrep '[a-zA-Z][a-zA-Z0-9]+'
Note that the + matches words of length 2 and higher; a * in that place would match single characters too.

Resources