Splitting the content of brackets without separating the brackets in Ruby

I am currently working on a ruby program to calculate terms. It works perfectly fine except for one thing: brackets. I need to filter the content or at least, to put the content into an array, but I have tried for an hour to come up with a solution. Here is my code:
splitted = term.split(/\(+|\)+/)
I need an array instead of the brackets, for example:
"1-(2+3)" #=>["1", "-", ["2", "+", "3"]]
I already tried this:
/(\((?<=.*)\))/
but it returned:
Invalid pattern in look-behind.
Can someone help me with this?
UPDATE
I forgot to mention that my program already splits the term; I only need the content of the brackets to be an array.

If you need to track the nesting of parentheses with arrays, regular expressions alone won't do it. You'll need to walk the string token by token and keep a stack of arrays.
Pseudocode:
expressions = new stack
push a new array onto the stack
for each token in the string:
    if token is "(": push a new array onto the stack
    else if token is ")": pop the last array off the stack and append it to the array that is now on top
    else: append the token to the array on top of the stack
When the loop exits, there should be exactly one array on the stack (if not, the opening and closing parentheses are unbalanced).
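The pseudocode above could be sketched in Ruby roughly as follows. The method name nest_brackets and the tokenizer regex (digits, the four basic operators, and parentheses) are assumptions made for illustration; a real term splitter may need to recognize more token types.

```ruby
# Hypothetical helper: turns "1-(2+3)" into ["1", "-", ["2", "+", "3"]].
# Assumes tokens are integers, + - * / and parentheses; anything else is skipped.
def nest_brackets(term)
  tokens = term.scan(/\d+|[-+*\/()]/)
  stack = [[]]                      # start with one array for the top level

  tokens.each do |token|
    case token
    when "("
      stack.push([])                # open a new nesting level
    when ")"
      raise ArgumentError, "unbalanced ')'" if stack.size < 2
      inner = stack.pop             # close the level...
      stack.last << inner           # ...and attach it to its parent
    else
      stack.last << token
    end
  end

  raise ArgumentError, "unbalanced '('" unless stack.size == 1
  stack.first
end

p nest_brackets("1-(2+3)")  #=> ["1", "-", ["2", "+", "3"]]
```

Nested brackets come out as nested arrays, so "2*(1+(3-4))" yields ["2", "*", ["1", "+", ["3", "-", "4"]]].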
Note: If your ultimate goal is to evaluate the expression, you could save time by converting the string to postfix notation (aka Reverse Polish Notation) instead.
Also consider using off-the-shelf libraries.

A solution depends on the pattern you expect between the parentheses, which you have not specified. (For example, for "(st12uv)" you might want ["st", "12", "uv"], ["st12", "uv"], ["st1", "2uv"] and so on). If, as in your example, it is a natural number followed by a +, followed by another natural number, you could do this:
str = "1-( 2+ 3)"
r = /
\(\s* # match a left parenthesis followed by >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
(\+) # match a plus sign in a capture group
\s* # match >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
\) # match a right parenthesis
/x
str.scan(r).first
=> ["2", "+", "3"]
Suppose instead + could be +, -, * or /. Then you could change:
(\+)
to:
([-+*\/])
Note that, in a character class, + needn't be escaped and - needn't be escaped if it is the first or last character of the class (as in those cases it would not signify a range).
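For what it's worth, here is a minimal sketch of the pattern with the wider operator class, written on one line rather than in extended mode. The method name bracket_operation is made up for this example.

```ruby
# Hypothetical helper: returns [left, operator, right] for the first
# "(number operator number)" group in the string, or nil if there is none.
def bracket_operation(str)
  str.scan(/\(\s*(\d+)\s*([-+*\/])\s*(\d+)\s*\)/).first
end

p bracket_operation("1-( 2+ 3)")  #=> ["2", "+", "3"]
p bracket_operation("4*(5 / 6)")  #=> ["5", "/", "6"]
p bracket_operation("7-8")        #=> nil
```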
Incidentally, you received the error message "Invalid pattern in look-behind" because Ruby's lookbehinds cannot contain variable-length matches (i.e., .*); lookaheads have no such restriction. With positive lookbehinds you can get around that by using \K instead. For example,
r = /
\d+ # match one or more digits
\K # forget everything previously matched
[a-z]+ # match one or more lowercase letters
/x
"123abc"[r] #=> "abc"

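Applied to the bracket question, \K could be used to pull out the text between a pair of parentheses without a lookbehind. This is only a sketch: the method name bracket_content is an assumption, and the pattern deliberately refuses to cross another parenthesis, so with nested input it returns the innermost group.

```ruby
# Hypothetical helper: returns the text inside the first pair of parentheses
# that contains no further parentheses, or nil if there is none.
def bracket_content(str)
  # \(      match the opening parenthesis
  # \K      drop it from the reported match
  # [^()]*  the content, stopping at any parenthesis
  # (?=\))  require a closing parenthesis next, without consuming it
  str[/\(\K[^()]*(?=\))/]
end

p bracket_content("1-(2+3)")      #=> "2+3"
p bracket_content("1-(2*(3+4))")  #=> "3+4"
p bracket_content("1-2+3")        #=> nil
```
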
Related

gsub numbers and +

I'm saving a number with params[:number].gsub(/\D/,''), but I don't want to strip the plus symbol: +
For example if a user saves number +1 (516) 949-9508 it saves as 15169499508 but how can we preserve the + as +15169499508?
In Ruby \D is just an alias for [^0-9]. You may explicitly set [^0-9+]:
params[:number].gsub(/[^0-9+]/,'')
I understand you want to keep a plus only at the start of the string. You need to use:
.gsub(/\A(\+)|\D+/, '\1')
Here, \A(\+) branch matches a literal plus at the start of the string. The second branch is your \D that matches all chars but digits, just with a + quantifier that matches 1 or more occurrences. The \1 backreference restores that initial plus symbol in the resulting string.
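A quick sketch of how the two patterns differ once a plus shows up somewhere other than the start. The method names are invented for the comparison.

```ruby
# Hypothetical helpers wrapping the two gsub calls discussed above.
def keep_any_plus(number)
  number.gsub(/[^0-9+]/, '')        # keeps every + wherever it appears
end

def keep_leading_plus(number)
  # \1 is empty whenever the \D+ branch matched, so only a leading + survives
  number.gsub(/\A(\+)|\D+/, '\1')
end

p keep_any_plus("+1 (516) 949-9508")      #=> "+15169499508"
p keep_leading_plus("+1 (516) 949-9508")  #=> "+15169499508"
p keep_any_plus("1+516 949")              #=> "1+516949"
p keep_leading_plus("1+516 949")          #=> "1516949"
```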
If you don't have any syntactic rules, delete would work just fine:
'+1 (516) 949-9508'.delete('^0-9+') #=> "+15169499508"

Regex matching chars around text

I have a string with chars inside and I would like to match only the chars around a string.
"This is a [1]test[/1] string. And [2]test[/2]"
Rubular http://rubular.com/r/f2Xwe3zPzo
Currently, the code in the link matches the text inside the special chars, how can I change it?
Update
To clarify my question. It should only match if the opening and closing has the same number.
"[2]first[/2] [1]second[/2]"
In the code above, only first should match and not second. The text inside the special chars (first), should be ignored.
Try this:
(\[[0-9]\]).+?(\[\/[0-9]\])
Permalink to the example on Rubular.
Update
Since you want to remove the 'special' characters, try this instead:
foo = "This is a [1]test[/1] string. And [2]test[/2]"
foo.gsub /\[\/?\d\]/, ""
# => "This is a test string. And test"
Update, Part II
You only want to remove the 'special' characters when the surrounding tags match, so what about this:
foo = "This is a [1]test[/1] string. And [2]test[/2], but not [3]test[/2]"
foo.gsub /(?:\[(?<number>\d)\])(?<content>.+?)(?:\[\/\k<number>\])/, '\k<content>'
# => "This is a test string. And test, but not [3]test[/2]"
\[([0-9])\].+?\[\/\1\]
([0-9]) is a capture since it is surrounded with parentheses. The \1 tells it to use the result of that capture. If you had more than one capture, you could reference them as well, \2, \3, etc.
You can also use a named capture, rather than \1 to make it a little less cryptic. As in: \[(?<number>[0-9])\].+?\[\/\k<number>\]
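To see the backreference at work on the example from the question, something like the following should do. The method name matched_tags is made up, and a second capture group around the content was added here only so that scan reports it.

```ruby
# Hypothetical helper: lists [number, content] for every tag pair whose
# opening and closing numbers agree.
def matched_tags(str)
  str.scan(/\[([0-9])\](.+?)\[\/\1\]/)
end

p matched_tags("[2]first[/2] [1]second[/2]")
#=> [["2", "first"]]
p matched_tags("This is a [1]test[/1] string. And [2]test[/2]")
#=> [["1", "test"], ["2", "test"]]
```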
Here's a way to do it that uses the form of String#gsub that takes a block. The idea is to pull strings such as "[1]test[/1]" into the block, and there remove the unwanted bits.
str = "This is a [1]test[/1] string. And [2]test[/2], plus [3]test[/99]"
r = /
\[ # match a left bracket
(\d+) # capture one or more digits in capture group 1
\] # match a right bracket
.+? # match one or more characters lazily
\[\/ # match a left bracket and forward slash
\1 # match the contents of capture group 1
\] # match a right bracket
/x
str.gsub(r) { |s| s[/(?<=\]).*?(?=\[)/] }
#=> "This is a test string. And test, plus [3]test[/99]"
Aside: When I first heard of named capture groups, they seemed like a great idea, but now I wonder if they really make regexes easier to read than \1, \2....

ruby 'gsub' to snake case

The following code from a book is supposed to convert "FOO92OBAR" to "FOO92_O_BAR":
gsub(/([a-z\d])([A-Z])/, '\1_\2')
Can anyone explain how this works?
([a-z\d]) looks for a lowercase letter (a-z) or a number (\d means a digit). The () around the whole thing assign the result to regex subgroup 1.
([A-Z]) then looks for an uppercase letter, assigning the result to group 2. So the whole thing looks for a lowercase-or-digit followed by an uppercase letter. The second part, '\1_\2', means "regex group 1, then an underscore, then regex group 2".
gsub replaces every time it sees a lowercase-or-digit followed by an uppercase letter with (the first thing)_(the second thing).
So actually FOO92OBAR will be FOO92_OBAR.
For FOO92OBAR to become FOO92_O_BAR, the replace part should be '\1_\2_' (since only the O is the second part.. BAR is not matched, so not replaced at all).
It works using regular expressions.
The two parameters of gsub are the match expression and the replacement. Because the match /([a-z\d])([A-Z])/ contains groups (identified by (...)), then you can reference a match in the replacement using \ID where the ID is the number of the group, starting from 1.
That said, the code gsub(/([a-z\d])([A-Z])/, '\1_\2')
# take any combination of
([a-z\d])([A-Z])
# which means any combinations of a (1) lower-case char or (2) digit
([a-z\d])
# followed by an (1) upper case letter
([A-Z])
# if any, replace it with
\1_\2
# that represents the first group
\1
# followed by _
# followed by the second group
\2
Please note that your example will generate FOO92_OBAR, not FOO92_O_BAR
2.1.5 :001 > string = "FOO92OBAR"
=> "FOO92OBAR"
2.1.5 :002 > string.gsub(/([a-z\d])([A-Z])/, '\1_\2')
=> "FOO92_OBAR"
The reason is that there is only one occurrence of a "lower-case char or digit" (here, a digit) followed by an upper-case char.
2.1.5 :003 > string.scan(/([a-z\d])([A-Z])/)
=> [["2", "O"]]
Regular expressions are case sensitive by default.
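Going by the question title, the substitution is presumably one step of a snake-casing routine, where the input is camel case rather than all caps. A minimal sketch under that assumption (the method name snake_case and the final downcase are not from the book snippet):

```ruby
# Hypothetical helper: inserts "_" at each lowercase-or-digit / uppercase
# boundary, then lowercases the result.
def snake_case(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

p snake_case("fooBarBaz")  #=> "foo_bar_baz"
p snake_case("FOO92OBAR")  #=> "foo92_obar"
```

With camel-case input every word boundary has a lowercase letter or digit on its left, which is why the pattern finds all of them there but only one in "FOO92OBAR".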

ruby parametrized regular expression

I have a string like "{some|words|are|here}" or "{another|set|of|words}"
So in general the string consists of an opening curly bracket, words delimited by a pipe, and a closing curly bracket.
What is the most efficient way to get the selected word of that string?
I would like to do something like this:
#my_string = "{this|is|a|test|case}"
#my_string.get_column(0) # => "this"
#my_string.get_column(2) # => "a"
#my_string.get_column(4) # => "case"
What should the method get_column contain?
So this is the solution I like right now:
class String
def get_column(n)
self =~ /\A\{(?:\w*\|){#{n}}(\w*)(?:\|\w*)*\}\Z/ && $1
end
end
We use a regular expression to make sure that the string is of the correct format, while simultaneously grabbing the correct column.
Explanation of regex:
\A is the beginning of the string and \Z is the end (or just before a trailing newline), so this regex matches the entire string.
Since curly braces have a special meaning, we escape them as \{ and \} to match the curly braces at the beginning and end of the string.
Next, we want to skip the first n columns - we don't care about them.
A previous column is some number of letters followed by a vertical bar, so we use the standard \w to match a word-like character (includes numbers and underscore, but why not) and * to match any number of them. Vertical bar has a special meaning, so we have to escape it as \|. Since we want to group this, we enclose it all inside non-capturing parens (?:\w*\|) (the ?: makes it non-capturing).
Now we have n of the previous columns, so we tell the regex to match the column pattern n times using the count quantifier - just put a number in curly braces after a pattern. We use standard string interpolation, so we just put in {#{n}} to mean "match the previous pattern exactly n times".
The first non-skipped column after that is the one we care about, so we put that in capturing parens: (\w*)
Then we skip the rest of the columns, if any exist: (?:\|\w*)*.
Capturing the column puts it into $1, so we return that value if the regex matched. If not, we return nil, since this String has no nth column.
In general, if you wanted to have more than just words in your columns (like "{a phrase or two|don't forget about punctuation!|maybe some longer strings that have\na newline or two?}"), then just replace all the \w in the regex with [^|{}] so you can have each column contain anything except a curly-brace or a vertical bar.
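Here is roughly what that more permissive version might look like. It is written as a standalone method rather than a String extension, and the name column_at is an assumption for this sketch; it also uses \z so that a trailing newline is not tolerated.

```ruby
# Hypothetical helper: returns the n-th (zero-based) column of a string like
# "{a|b|c}", where a column may contain anything except |, { and }.
# Returns nil if the string is malformed or has no n-th column.
def column_at(str, n)
  m = /\A\{(?:[^|{}]*\|){#{n}}([^|{}]*)(?:\|[^|{}]*)*\}\z/.match(str)
  m && m[1]
end

row = "{a phrase or two|don't forget about punctuation!|maybe a\nnewline}"
p column_at(row, 0)  #=> "a phrase or two"
p column_at(row, 2)  #=> "maybe a\nnewline"
p column_at(row, 3)  #=> nil
```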
Here's my previous solution
class String
def get_column(n)
raise "not a column string" unless self =~ /\A\{\w*(?:\|\w*)*\}\Z/
self[1 .. -2].split('|')[n]
end
end
We use a similar regex to make sure the String contains a set of columns or raise an error. Then we strip the curly braces from the front and back (using self[1 .. -2] to limit to the substring starting at the first character and ending at the next to last), split the columns using the pipe character (using .split('|') to create an array of columns), and then find the n'th column (using standard Array lookup with [n]).
I just figured as long as I was using the regex to verify the string, I might as well use it to capture the column.

How to remove the first 4 characters from a string if it matches a pattern in Ruby

I have the following string:
"h3. My Title Goes Here"
I basically want to remove the first four characters from the string so that I just get back:
"My Title Goes Here".
The thing is I am iterating over an array of strings and not all have the h3. part in front so I can't just ditch the first four characters blindly.
I checked the docs and the closest thing I could find was chomp, but that only works for the end of a string.
Right now I am doing this:
"h3. My Title Goes Here".reverse.chomp(" .3h").reverse
This gives me my desired output, but there has to be a better way. I don't want to reverse a string twice for no reason. Is there another method that will work?
To alter the original string, use sub!, e.g.:
my_strings = [ "h3. My Title Goes Here", "No h3. at the start of this line" ]
my_strings.each { |s| s.sub!(/^h3\. /, '') }
To leave the original untouched and only return the result, drop the exclamation point, i.e. use sub. In the general case a regular expression may match more than once and you may want every match replaced; in that case use gsub! or gsub. Without the g, only the first match is replaced, which is what you want here. Note that in Ruby ^ matches at the start of every line, so if a string can span several lines and only its very start should match, use \A instead.
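A small sketch of the difference between sub and sub!, assuming the prefix is always the literal "h3. ". The method name strip_h3 is invented for the example, and \A is used so that only the very start of the string can match.

```ruby
# Hypothetical helper: returns a copy of the title without a leading "h3. ".
def strip_h3(title)
  title.sub(/\Ah3\. /, '')
end

original = "h3. My Title Goes Here"
p strip_h3(original)  #=> "My Title Goes Here"
p original            #=> "h3. My Title Goes Here" (sub left it alone)

# sub! changes the receiver in place and returns nil when nothing matched,
# so its return value should not be used as "the cleaned string".
p "No h3. at the start".dup.sub!(/\Ah3\. /, '')  #=> nil
```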
You can use sub with a regular expression:
s = 'h3. foo'
s.sub!(/^h[0-9]+\. /, '')
puts s
Output:
foo
The regular expression should be understood as follows:
^ Match from the start of a line (in Ruby, \A matches only at the start of the whole string).
h A literal "h".
[0-9] A digit from 0-9.
+ One or more of the previous (i.e. one or more digits)
\. A literal period.
(space) A literal space (yes, spaces are significant by default in regular expressions!)
You can modify the regular expression to suit your needs. See a regular expression tutorial or syntax guide.
A standard approach would be to use regular expressions:
"h3. My Title Goes Here".gsub /^h3\. /, '' #=> "My Title Goes Here"
gsub means globally substitute and it replaces a pattern by a string, in this case an empty string.
The regular expression is enclosed in / and consists of:
^ means the beginning of a line (\A would match only at the beginning of the string)
h3 is matched literally, so it means h3
\. - a dot normally means any character so we escape it with a backslash
(a space) is matched literally
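One thing worth checking for yourself: in Ruby, ^ anchors at the start of every line, while \A anchors only at the start of the string. For single-line titles the two behave the same, but with embedded newlines they differ. The method names below are made up for the comparison.

```ruby
# Hypothetical helpers contrasting the two anchors.
def strip_line_prefixes(text)
  text.gsub(/^h3\. /, '')    # ^ matches after every newline as well
end

def strip_string_prefix(text)
  text.gsub(/\Ah3\. /, '')   # \A matches only at the very start
end

text = "h3. First\nh3. Second"
p strip_line_prefixes(text)  #=> "First\nSecond"
p strip_string_prefix(text)  #=> "First\nh3. Second"
```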

Resources