Ruby regular expressions for movie titles and ratings - ruby

The quiz problem:
You are given the following short list of movies exported from an Excel comma-separated values (CSV) file. Each entry is a single string that contains the movie name in double quotes, zero or more spaces, and the movie rating in double quotes. For example, here is a list with three entries:
movies = [
  %q{"Aladdin", "G"},
  %q{"I, Robot", "PG-13"},
  %q{"Star Wars","PG"}
]
Your job is to create a regular expression to help parse this list:
movies.each do |movie|
  movie.match(regexp)
  title, rating = $1, $2
end
# => for first entry, title should be Aladdin, rating should be G,
# => WITHOUT the double quotes
You may assume movie titles and ratings never contain double-quote marks. Within a single entry, a variable number of spaces (including 0) may appear between the comma after the title and the opening quote of the rating.
Which of the following regular expressions will accomplish this? Check all that apply.
regexp = /"([^"]+)",\s*"([^"]+)"/
regexp = /"(.*)",\s*"(.*)"/
regexp = /"(.*)", "(.*)"/
regexp = /(.*),\s*(.*)/
Would someone explain why the answer was (1) and (2)?

The strings to match look like "Aladdin", "G". Let's take a look at correct answer #1:
/"([^"]+)",\s*"([^"]+)"/
"([^"]+)" = at least one character that is not a " surrounded by "
, = a comma
\s* = a number of spaces (including 0)
"([^"]+)" = like first
Which is exactly the type of strings you will get. Let's take a look at the above string:
"Aladdin", "G"
# 1 = "Aladdin"   2 = the comma   3 = the space   4 = "G"
Now let's take a look at the second correct answer:
/"(.*)",\s*"(.*)"/
"(.*)" = any number (including 0) of almost any character surrounded by ".
, = a comma
\s* = any number of spaces (including 0)
"(.*)" = see first point
Which is correct as well, as the following irb session (using Ruby 1.9.3) shows:
'"Aladdin", "G"'.match(/"([^"]+)",\s*"([^"]+)"/) # number 1
# => #<MatchData "\"Aladdin\", \"G\"" 1:"Aladdin" 2:"G">
'"Aladdin", "G"'.match(/"(.*)",\s*"(.*)"/) # number 2
# => #<MatchData "\"Aladdin\", \"G\"" 1:"Aladdin" 2:"G">
Just for completeness, I'll explain why the third and fourth are wrong as well:
/"(.*)", "(.*)"/
The above regex is:
"(.*)" = any number (including 0) of almost any character surrounded by "
, = a comma
(space) = a single space
"(.*)" = see first point
Which is wrong because it requires exactly one space between the comma and the rating, while the problem allows any number of spaces (including 0). The third entry in the list has none, as the following irb session shows:
'"Star Wars","PG"'.match(/"(.*)", "(.*)"/) # number 3
# => nil
The fourth regex is:
/(.*),\s*(.*)/
which is:
(.*) = any number (including 0) of almost any character
, = a comma
\s* = any number (including 0) of spaces
(.*) = see first point
Which is wrong because the text explicitly says that movie titles never contain a " character and that titles and ratings are surrounded by double quotes. The above regex checks neither, so its captures keep the surrounding quotes and it accepts strings like "," (which are not valid), as the following irb session shows:
'","'.match(/(.*),\s*(.*)/) # number 4
# => #<MatchData "\",\"" 1:"\"" 2:"\"">
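To check all four candidates against the whole list rather than one string at a time, a small script along these lines may help. It is only a sketch: the names candidates, expected and results are made up for illustration and are not part of the quiz.

```ruby
movies = [
  %q{"Aladdin", "G"},
  %q{"I, Robot", "PG-13"},
  %q{"Star Wars","PG"}
]

# The four regular expressions from the quiz, keyed by their number.
candidates = {
  1 => /"([^"]+)",\s*"([^"]+)"/,
  2 => /"(.*)",\s*"(.*)"/,
  3 => /"(.*)", "(.*)"/,
  4 => /(.*),\s*(.*)/
}

# What a correct regexp should capture for each entry (no double quotes).
expected = [["Aladdin", "G"], ["I, Robot", "PG-13"], ["Star Wars", "PG"]]

# For every candidate, collect the captures per movie (nil when there is no match).
results = candidates.transform_values do |regexp|
  movies.map { |movie| (m = movie.match(regexp)) ? m.captures : nil }
end

results.each do |number, parsed|
  verdict = parsed == expected ? "correct" : "wrong"
  puts "regexp #{number}: #{verdict} #{parsed.inspect}"
end
```

Candidates 1 and 2 reproduce the expected pairs, candidate 3 returns nil for the "Star Wars" entry (no space after the comma), and candidate 4 keeps the double quotes inside its captures.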

Related

How to put comma after 3 digits in numeric variable in vbscript?

I want to put a comma after 3 digits in a numeric variable in VBScript:
w_orimpo = getvalue(rsmodifica , "w_orimpo")
w_orimpo = FormatNumber(w_orimpo,2)
The initial value of w_orimpo is 21960.
If I use FormatNumber I get the value 21,960.
But I would like to get the following one -> 219,60
We can handle this via a regex replacement:
Dim input, output, regex1, regex2
input = "21960"
Set regex1 = New RegExp
Set regex2 = New RegExp
regex1.Pattern = "(\d{3})"
regex1.Global = True
regex2.Pattern = ",$"
output = regex1.Replace(StrReverse(input), "$1,")
output = StrReverse(regex2.Replace(output, ""))
Rhino.Print output
Note that two regex replacements are needed here because VBScript's regex engine does not support lookarounds. There is a single regex pattern which would have gotten the job done here:
(\d{3})(?!$)
This would match (and capture) only groups of three digits at a time, and only if those three digits are not followed by the end of the input. This is needed to cover the following edge case:
123456 -> 123,456
We don't want a comma after the final group of three digits. My answer gets around this problem by doing another regex replacement to trim off any trailing comma.
Or without regex:
Mid(CStr(w_orimpo), 1, 3) & "," & Mid(CStr(w_orimpo), 4)
Or
Dim divider
divider = 10 ^ (Len(CStr(w_orimpo)) - 3)
w_orimpo = FormatNumber(w_orimpo / divider, 2)

Regex cuts word if end of string

I want to check and capture 2 or x words after and before a target string in a multiline text. The problem is that if the words matched are less than x number of words, then regex cuts off the last word and splits it till x.
For example
text = "This is an example /year"
if example is the target:
Matching Data: "is" , "an", "/yea", "r"
If I add random words after /year, it matches correctly.
How could I fix this so that if less than x words exist just stop there or return empty for the rest of the matches?
So it should be
Matching Data: "is" , "an", "/year", ""
def checkWords(target, text, numLeft = 2, numRight = 2)
  target = target.compact.map{|x| x.inspect}.join('').gsub(/"/, '')
  regex = ""
  regex += "\\s+{,2}(\\S+)\\s+{,2}" * numLeft
  regex += target
  regex += "\\s+{,2}(\\S+)" * numRight
  pattern = Regexp.new(regex)
  matches = pattern.match(text)
  puts matches.inspect
end
Since you want to capture the words before and after target, you need to set a capturing group around the whole regex parts that match the 0 to 2 occurrences of spaces-non-spaces. Also, you need to allow a minimum bound of 0 - use the {0,2} (or the more succinct {,2}) limiting quantifier to make sure you get the context on the left even if it is missing on the right:
/((?:\S+\s+){,2})target((?:\s+\S+){,2})/
 ^              ^      ^              ^
See this Rubular demo
If you use /(?:(\S+)\s+){0,2}target(?:\s+(\S+)){0,2}/, all captured values but the last one will be lost, i.e. once quantified, repeated capturing groups only store the value captured during the last iteration in the group buffer.
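Also note that stacking a {,2} quantifier on top of the + quantifier is almost certainly not what you want: \\s+{,2} is read as (?:\\s+){0,2}, which can also match the empty string, and that is what lets /year be split into /yea and r. Use a plain \\s+ instead.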
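A rough sketch of how the corrected pattern could be wrapped in a method. The method name context_words and its return shape (two arrays of words) are assumptions made for illustration, and the target is escaped with Regexp.escape, which the original code did not do:

```ruby
# Hypothetical helper: returns [words_before, words_after] around target,
# or nil when target does not occur in text.
def context_words(target, text, num_left = 2, num_right = 2)
  # Each capturing group wraps the whole repeated part, so no word is lost.
  pattern = /((?:\S+\s+){0,#{num_left}})#{Regexp.escape(target)}((?:\s+\S+){0,#{num_right}})/
  m = pattern.match(text)
  return nil unless m

  # Split the captured context on whitespace to get the individual words.
  [m[1].split, m[2].split]
end

p context_words("example", "This is an example /year")
# => [["is", "an"], ["/year"]]
```

When fewer than num_right words follow the target, the second array is simply shorter, so /year is no longer split into /yea and r.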

Splitting the content of brackets without separating the brackets ruby

I am currently working on a ruby program to calculate terms. It works perfectly fine except for one thing: brackets. I need to filter the content or at least, to put the content into an array, but I have tried for an hour to come up with a solution. Here is my code:
splitted = term.split(/\(+|\)+/)
I need an array instead of the brackets, for example:
"1-(2+3)" #=>["1", "-", ["2", "+", "3"]]
I already tried this:
/(\((?<=.*)\))/
but it returned:
Invalid pattern in look-behind.
Can someone help me with this?
UPDATE
I forgot to mention, that my program will split the term, I only need the content of the brackets to be an array.
If you need to keep track of the hierarchy of parentheses with arrays, you won't manage it just with regular expressions. You'll need to parse the string word by word, and keep a stack of expressions.
Pseudocode:
Expressions = new stack
Add new array on stack
while word in string:
if word is "(": Add new array on stack
Else if word is ")": Remove the last array from the stack and add it to the (next) last array of the stack
Else: Add the word to the last array of the stack
When exiting the loop, there should be only one array in the stack (if not, you have inconsistent opening/closing parentheses).
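The pseudocode above could be sketched in Ruby roughly as follows. The method name nest_parens and the tokenizer (digits, the four basic operators and parentheses only) are assumptions for illustration; a real term parser would likely need more token types.

```ruby
# Hypothetical helper: turns "1-(2+3)" into ["1", "-", ["2", "+", "3"]].
def nest_parens(term)
  stack = [[]] # the bottom array collects the top-level tokens

  # Assumed token set: runs of digits, single operators, and parentheses.
  term.scan(/\d+|[-+*\/()]/).each do |token|
    case token
    when "("
      stack.push([]) # open a new nested array
    when ")"
      raise ArgumentError, "unbalanced ')'" if stack.size < 2
      finished = stack.pop
      stack.last << finished # attach the finished group to its parent
    else
      stack.last << token
    end
  end

  raise ArgumentError, "unbalanced '('" unless stack.size == 1
  stack.first
end

p nest_parens("1-(2+3)")
# => ["1", "-", ["2", "+", "3"]]
```

Nested parentheses fall out of the same loop, since each "(" pushes a fresh array and each ")" folds it into the one below it.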
Note: If your ultimate goal is to evaluate the expression, you could save time and parse the string in Postfix aka Reverse-Polish Notation.
Also consider using off-the-shelf libraries.
A solution depends on the pattern you expect between the parentheses, which you have not specified. (For example, for "(st12uv)" you might want ["st", "12", "uv"], ["st12", "uv"], ["st1", "2uv"] and so on). If, as in your example, it is a natural number followed by a +, followed by another natural number, you could do this:
str = "1-( 2+ 3)"
r = /
  \(\s*  # match a left parenthesis followed by >= 0 whitespace chars
  (\d+)  # match one or more digits in a capture group
  \s*    # match >= 0 whitespace chars
  (\+)   # match a plus sign in a capture group
  \s*    # match >= 0 whitespace chars
  (\d+)  # match one or more digits in a capture group
  \s*    # match >= 0 whitespace chars
  \)     # match a right parenthesis
/x
str.scan(r).first
#=> ["2", "+", "3"]
Suppose instead + could be +, -, * or /. Then you could change:
(\+)
to:
([-+*\/])
Note that, in a character class, + needn't be escaped and - needn't be escaped if it is the first or last character of the class (as in those cases it would not signify a range).
Incidentally, you received the error message "Invalid pattern in look-behind" because Ruby's lookbehinds cannot contain variable-length matches (i.e., .*). With positive lookbehinds you can get around that by using \K instead. For example,
r = /
  \d+     # match one or more digits
  \K      # forget everything previously matched
  [a-z]+  # match one or more lowercase letters
/x
"123abc"[r] #=> "abc"

string capture between duplicates in ruby

string = 'xabcdexfghijk'
In the example above, 'x' appears twice. I want to capture everything between the first 'x' and the next 'x'. Thus, the desired result is a new string that equals 'xabcdex'. Any ideas?
You could use a simple regular expression: /x.*?x/. This basically means "match any characters in between two x characters, as few times as possible (non-greedy)".
The matched text can be extracted with String#[regexp]
string = 'xabcdexfghijk'
string[/x.*?x/] # => "xabcdex"

Ruby Regex gsub - everything after string

I have a string something like:
test:awesome my search term with spaces
And I'd like to extract the string immediately after test: into one variable and everything else into another, so I'd end up with awesome in one variable and my search term with spaces in another.
Logically, what I'd do is move everything matching test:* into another variable, and then remove everything before the first :, leaving me with what I wanted.
At the moment I'm using /test:(.*)([\s]+)/ to match the first part, but I can't seem to get the second part correctly.
The first capture in your regular expression is greedy, and matches spaces because you used . (a dot, which matches any character). Instead try:
matches = string.match(/test:(\S*) (.*)/)
# index 0 is the whole pattern that was matched
first = matches[1] # this is the first () group
second = matches[2] # and the second () group
Use the following:
/^test:(.*?) (.*)$/
That is, match "test:", then a series of characters (non-greedily), up to a single space, and another series of characters to the end of the line.
I am guessing you want to remove all the leading spaces before the second match too, hence I have \s+ in the expression. Otherwise, remove the \s+ from the expression, and you'll have what you want:
m = /^test:(\w+)\s+(.*)/.match("test:awesome my search term with spaces")
a = m[1]
b = m[2]
http://codepad.org/JzuNQxBN
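To compare the three suggested patterns side by side, something like the following sketch may be useful. The hash keys and the second input with three spaces are made up for illustration; only the first sample string comes from the question.

```ruby
# The three patterns suggested in the answers above.
patterns = {
  non_space: /test:(\S*) (.*)/,
  lazy:      /^test:(.*?) (.*)$/,
  trimmed:   /^test:(\w+)\s+(.*)/
}

sample = "test:awesome my search term with spaces"
spaced = "test:awesome   my search term" # hypothetical input with three spaces

patterns.each do |name, pattern|
  first  = sample.match(pattern).captures
  second = spaced.match(pattern).captures
  puts "#{name}: #{first.inspect} / #{second.inspect}"
end
```

All three give the same two captures for the sample string. They differ only when several spaces follow the first word: the patterns with a single literal space leave the extra spaces at the start of the second capture, while the \s+ variant strips them.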
