regular expression extract string between two strings - ruby

I am trying to extract strings using regexp. For example in the following string:
select DESCENDANTS([Customer].[Yearly Income],,LEAVES) on axis(0),
DESCENDANTS([Sales Territory].[Sales Territory],,LEAVES) on axis(1),
DESCENDANTS([Customer].[Total Children],,LEAVES) on axis(2)
from [Adventure Works]
where [Measures].[Internet Sales Amount]
I want to extract the substring between every pair of "DESCENDANTS(" and ",,".
So the result in this case would be: [Customer].[Yearly Income], [Sales Territory].[Sales Territory], [Customer].[Total Children]
Any help is appreciated. Thanks in advance.

If you have your text in a string called query you can do:
query.scan(/DESCENDANTS\((.+),,/).flatten
=> ["[Customer].[Yearly Income]", "[Sales Territory].[Sales Territory]",
"[Customer].[Total Children]"]
Some notes:
\( matches the literal open bracket
(.+) remembers the characters between the open bracket and the two commas as a capture
If the regexp contains captures () then scan will return an array of arrays of the captured parts for each match. In this case there is only 1 capture per match so flatten can be used to return a single array of all the matches we are interested in.

/DESCENDANTS\(([^,]+),,/
See it on rubular

Here's an uglier variation that uses split: split on "DESCENDANTS(" and ",,", and take every other substring:
s.split(/DESCENDANTS\(|,,/).each_with_index.inject([]) {|m,(e,i)| m << e if i.odd?; m}

.+? is more safe, it works correctly if SQL is in one line.
query.scan(/DESCENDANTS\((.+?),,/).flatten

Related

Regular expression to remove a portion of text from each entry in commas separated list

I have a string of comma separated values, that I want to trim down for display purpose.
The string is a comma separated list of values of varying lengths and number of list entries.
Each entry in the list is formatted as a five character pattern in the format "##-NX" followed by some text.
e.g., "01-NX sometext, 02-NX morertext, 09-NX othertext, 12-NX etc..."
Is there an regular expression function I can use to remove the text after the 5 character prefix portion of each entry in the list, returning "01-NX, 02-NX, 09-NX, 12-NX,..."?
I am a novice with regular expressions and I haven't been able figure out how to code the pattern.
I think what you need is
regexp_replace(regexp_replace(mystring, '(\d{2}-NX)(.*?)(,)', '\1\3'), '(\d{2}.*NX).*', '\1')
The inner REGEXP_REPLACE looks for a pattern like nn-NX (two numeric characters followed by "-NX") and any number of characters up to the next comma, then replaces it with the first and third term, dropping the "any number of characters" part.
The outer REGEXP_REPLACE looks for a pattern like two numeric characters followed by any number of characters up to the last NX, and keeps that part of the string.
Here is the Oracle code I used for testing:
with a as (
select '01-NX sometext, 02-NX morertext, 09-NX othertext, 12-NX etc.' as myString
from dual
)
select mystring
, regexp_replace(regexp_replace(mystring, '(\d{2}-NX)(.*?)(,)', '\1\3'), '(\d{2}.*NX).*', '\1') as output
from a
This alternative calls REGEXP_REPLACE() once.
Match 2 digits, a dash and 'NX' followed by any number of zero or more characters (non-greedy) where followed by a comma or the end of the string. Replace with the first group and the 3rd group which will be either the comma or the end of the string.
EDIT: Took dougp's advice and eliminated the RTRIM by adding the 3rd capture group. Thanks for that!
WITH tbl(str) AS (
SELECT '01-NX sometext, 02-NX morertext, 09-NX othertext, 12-NX etc.' FROM dual
)
SELECT
REGEXP_REPLACE(str, '(\d{2}-NX)(.*?)(,|$)', '\1\3') str
from tbl;

Split a string and remove the first element in string

Original string '4.0.0-4.0-M-672092'
How to modify the Original string to "4.0-M-672092" using a one line code.
Any Help is highly appreciated .
Thanks and Regards
The 'split' method works in this case
https://apidock.com/ruby/String/split
'4.0.0-4.0-M-672092'.split('-')[1..-1].join('-')
# => "4.0-M-672092"
Just be careful, in this application is fine, but in long texts this might become unoptimized, since it splits all the string and then joins the array all over again
If you need this in wider texts to be more optimized, you can find the "-" index (which is your split) and use the next position to make a substring
text = '4.0.0-4.0-M-672092'
text[(text.index('-') + 1)..-1]
# => "4.0-M-672092"
But you can't do it in one line, and not finding a split character will result in an error, so use a rescue statement if that is possible to happen
Simplest way:
'4.0.0-4.0-M-672092'.split('-', 2).second
"4.0.0-4.0-M-672092"[/(?<=-).*/]
#=> "4.0-M-672092"
The regular expression reads, "Match zero or more characters other than newlines, as many as possible (.*), provided the match is preceded by a hyphen. (?<=-) is a positive lookbehind. See String#[].

How to replace _A_&_B_ using gsub in R

I am trying to join two columns containing company names from two distinct data tables on R. In one column I have the pattern _A_&_B_ where A and B can be any letters. I would like to get rid of those two letters i.e letter of length 1 surrounded by _
So if I have John_K_&_E_Scott I would like to have John__&__Scott as I can remove the punctuation. I have tried the below
names[, JOINING_ID := gsub("[A-Za-z]_&_[A-Za-z]\\w", "", JOINING_ID)]
But this transforms John_A_&_ BOYS_ in John__&_ OYS_ which is not what I want.
Use the following regex pattern:
_[[:alpha:]]_&_[[:alpha:]]_
and replace with __&__. See the regex demo. It won't match strings like John_A_&_BOYS_ and thus there won't be issues like the one you are having.
Note that [[:alpha:]] matches any letter.
R usage:
gsub("_[[:alpha:]]_&_[[:alpha:]]_", "__&__", JOINING_ID)
Or, if you only expect 1 match per string, use sub:
sub("_[[:alpha:]]_&_[[:alpha:]]_", "__&__", JOINING_ID)

String gsub - Replace characters between two elements, but leave surrounding elements

Suppose I have the following string:
mystring = "start/abc123/end"
How can you splice out the abc123 with something else, while leaving the "/start/" and "/end" elements intact?
I had the following to match for the pattern, but it replaces the entire string. I was hoping to just have it replace the abc123 with 123abc.
mystring.gsub(/start\/(.*)\/end/,"123abc") #=> "123abc"
Edit: The characters between the start & end elements can be any combination of alphanumeric characters, I changed my example to reflect this.
You can do it using this character class : [^\/] (all that is not a slash) and lookarounds
mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/,"7")
For your example, you could perhaps use:
mystring.gsub(/\/(.*?)\//,"/7/")
This will match the two slashes between the string you're replacing and putting them back in the substitution.
Alternatively, you could capture the pieces of the string you want to keep and interpolate them around your replacement, this turns out to be much more readable than lookaheads/lookbehinds:
irb(main):010:0> mystring.gsub(/(start)\/.*\/(end)/, "\\1/7/\\2")
=> "start/7/end"
\\1 and \\2 here refer to the numbered captures inside of your regular expression.
The problem is that you're replacing the entire matched string, "start/8/end", with "7". You need to include the matched characters you want to persist:
mystring.gsub(/start\/(.*)\/end/, "start/7/end")
Alternatively, just match the digits:
mystring.gsub(/\d+/, "7")
You can do this by grouping the start and end elements in the regular expression and then referring to these groups in in the substitution string:
mystring.gsub(/(?<start>start\/).*(?<end>\/end)/, "\\<start>7\\<end>")

Ruby Regex Match Between "foo" and "bar"

I have unfortunately wandered into a situation where I need regex using Ruby. Basically I want to match this string after the underscore and before the first parentheses. So the end result would be 'table salt'.
_____ table salt (1) [F]
As usual I tried to fight this battle on my own and with rubular.com. I got the first part
^_____ (Match the beginning of the string with underscores ).
Then I got bolder,
^_____(.*?) ( Do the first part of the match, then give me any amount of words and letters after it )
Regex had had enough and put an end to that nonsense and crapped out. So I was wondering if anyone on stackoverflow knew or would have any hints on how to say my goal to the Ruby Regex parser.
EDIT: Thanks everyone, this is the pattern I ended up using after creating it with rubular.
ingredientNameRegex = /^_+([^(]*)/;
Everything got better once I took a deep breath, and thought about what I was trying to say.
str = "_____ table salt (1) [F]"
p str[ /_{3}\s(.+?)\s+\(/, 1 ]
#=> "table salt"
That says:
Find at least three underscores
and a whitespace character (\s)
and then one or more (+) of any character (.), but as little as possible (?), up until you find
one or more whitespace characters,
and then a literal (
The parens in the middle save that bit, and the 1 pulls it out.
Try this: ^[_]+([^(]*)\(
It will match lines starting with one or more underscores followed by anything not equal to an opening bracket: http://rubular.com/r/vthpGpVr4y
Here's working regex:
str = "_____ table salt (1) [F]"
match = str.match(/_([^_]+?)\(/)
p match[1].strip # => "table salt"
You could use
^_____\s*([^(]+?)\s*\(
^_____ match the underscore from the beginning of string
\s* matches any whitespace character
( grouping start
[^(]+ matches all non ( character at least once
? matches the shortest possible string (non greedy)
) grouping end
\s* matches any whitespace character
\( find the (
"_____ table salt (1) [F]".gsub(/[_]\s(.+)\s\(/, ' >>>\1<<< ')
# => "____ >>>table salt<<< 1) [F]"
It seems to me the simplest regex to do what you want is:
/^_____ ([\w\s]+) /
That says:
leading underscores, space, then capture any combination of word chars or spaces, then another space.

Resources