preg_match_all - get part between pattern and string - preg-match

In this example I have a large string with many defined elements in it
part of example string here.
In this example I get matches from the example file (starting from the first tilde, ~32, up to ~120 3), which should be correct for my regex, but I need to update the regex so that it gets the closest match going backwards from ~120 3, so the result would be:
PRIEDE EGLE BERZS LAPU KOKI
<?php
$regex = '/~[1-9](.*?)\~120 3/s';
preg_match($regex, $str, $matches);
echo '<pre>';
print_r($matches);
exit();
?>
So the question is:
How should I set the direction to get part of the string in "reverse"? If I match ~120 3, can I then get everything before ~120 3, going backwards, until I hit a tilde followed by a digit, ~[1-9]?
Attached is an image of my current regex result with a few elements marked:
* Green - the element which I know and from which, as I imagine it, the search will start in reverse.
* Grey - the correct result.
* Red - the first match that was found in reverse from ~120 3
Thanks for recommendations in advance!

It is not possible to change the PCRE regex matching direction within the input; however, you may use lookaheads to restrict the text matched.
According to the requirements, you need
~[1-9]([^~]*(?:~(?![1-9])[^~]*)*)~120 3
See the regex demo.
Details:
~[1-9] - your initial delimiter
([^~]*(?:~(?![1-9])[^~]*)*) - Capturing group 1 matching:
[^~]* - any 0+ chars other than tilde
(?:~(?![1-9])[^~]*)* - 0+ sequences of:
~(?![1-9]) - a tilde that is not followed with a digit from 1 to 9
[^~]* - any 0+ chars other than tilde
~120 3 - end delimiter
However, it won't capture what you need since it will include some digits and space at the start. Maybe your starting delimiter should be ~[\d\s]+ and the lookahead then should be (?![\d\s]+). See another demo.
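The difference between the two patterns can be seen on a small input. This is only a sketch: the sample string is hypothetical (the real input from the question is not shown), the names SAMPLE, NARROW, WIDE and capture are made up for illustration, and the code is Ruby rather than PHP. The lookahead and negated character class constructs used here are written the same way in both engines.

```ruby
# Hypothetical sample input; the real string from the question is not shown.
SAMPLE = "~32 OZOLS~45 PRIEDE EGLE BERZS LAPU KOKI~120 3"

# First pattern: the start delimiter is a tilde plus one digit from 1 to 9.
NARROW = /~[1-9]([^~]*(?:~(?![1-9])[^~]*)*)~120 3/

# Second pattern: the start delimiter also consumes all leading digits and whitespace.
WIDE = /~[\d\s]+([^~]*(?:~(?![\d\s]+)[^~]*)*)~120 3/

# Return the text captured by group 1, or nil when there is no match.
def capture(pattern, text)
  m = pattern.match(text)
  m && m[1]
end

p capture(NARROW, SAMPLE) # => "5 PRIEDE EGLE BERZS LAPU KOKI"
p capture(WIDE, SAMPLE)   # => "PRIEDE EGLE BERZS LAPU KOKI"
```

In both cases the match starts at the last tilde-plus-digit delimiter before ~120 3, because the capture group cannot step over another such delimiter.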

Related

Non-greedy subgroup Ruby regular expression matching

I'm trying to write a regex to parse the vendor, version, and format components of a media-type string, where the version will be after the final dash. For example:
matching on "vnd.mycompany-foo-bar-v1+json" should produce ['mycompany-foo-bar', 'v1', 'json']
matching on "vnd.mycompany-v1+json" should produce ['mycompany', 'v1', 'json']
matching on "vnd.mycompany+json" should produce ['mycompany', nil, 'json']
matching on "vnd.mycompany-foo-bar-v1" should produce ['mycompany-foo-bar', 'v1', nil]
So far the closest I've got is
/\Avnd\.([a-z0-9*.\-_!#\$&\^]+?)(?:-([a-z0-9*\-.]+))?(?:\+([a-z0-9*\-.+]+))?\z/
but matching against "vnd.mycompany-foo-bar-v1+json" gives me ['mycompany', 'foo-bar-v1', 'json'].
It's the possibly infinite number of dashes that's throwing me for a loop.
Regex:
\Avnd\.(.+?)(?:-([^-+]+))?(?:\+(.*))?\z
regex101 Demo
Break-down:
\Avnd\. Matches vnd. literally from the start of string
(.+?) Matches any char, as few as possible times [group 1]
(?:-([^-+]+))? Optional. Match a - followed by any number of chars except - and + [group 2]
(?:\+(.*))? Optional. Match a + followed by any chars. [group 3]
\z Until the end of string.
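A quick way to check this pattern against the four inputs from the question is sketched below. The constant MEDIA_TYPE and the helper parse_media_type are names made up for this illustration.

```ruby
# The pattern from the answer above, unchanged.
MEDIA_TYPE = /\Avnd\.(.+?)(?:-([^-+]+))?(?:\+(.*))?\z/

# Return [vendor, version, format], with nil for missing parts,
# or nil when the string does not match at all.
def parse_media_type(str)
  m = MEDIA_TYPE.match(str)
  m && m.captures
end

p parse_media_type("vnd.mycompany-foo-bar-v1+json") # => ["mycompany-foo-bar", "v1", "json"]
p parse_media_type("vnd.mycompany-v1+json")         # => ["mycompany", "v1", "json"]
p parse_media_type("vnd.mycompany+json")            # => ["mycompany", nil, "json"]
p parse_media_type("vnd.mycompany-foo-bar-v1")      # => ["mycompany-foo-bar", "v1", nil]
```

The lazy first group keeps growing until the rest of the pattern can reach the end of the string, which is why only the segment after the final dash ends up in the version group.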
If the version is after the final dash, then version (and format) can't contain dashes. Just take them out of the character class.
/\Avnd\.([a-z0-9*.\-_!#\$&\^]+?)(?:-([a-z0-9*.]+))?(?:\+([a-z0-9*.+]+))?\z/

Ruby regex | Match enclosing brackets

I'm trying to create a regex pattern to match particular sets of text in my string.
Let's assume this is the string ^foo{bar}#Something_Else
I would like to match ^foo{} skipping entirely the content of the brackets.
Until now I have figured out how to capture everything with this regex, \^(\w)\{([^\}]+)}, but I really don't know how to ignore the text inside the curly brackets.
Does anyone have an idea? Thanks.
Update
This is the final solution:
puts script.gsub(/(\^\w+)\{([^}]+)(})/, '[BEFORE]\2[AFTER]')
Though I'd prefer this with fewer groups:
puts script.gsub(/\^\w+\{([^}]+)}/, '[BEFORE]\1[AFTER]')
Original answer
I need to replace the ^foo{} part with something else
Here is a way to do it with gsub:
s = "^foo{bar}#Something_Else"
puts s.gsub(/(.*)\^\w+\{([^}]+)}(.*)/, '\1SOMETHING ELSE\2\3')
See demo
The technique is the same: you capture the text you want to keep and just match text you want to delete, and use backreferences to restore the text you captured.
The regex matches:
(.*) - matches and captures into Group 1 as much text as possible from the start
\^\w+\{ - matches ^, 1 or more word characters, {
([^}]+) - matches and captures into Group 2 1 or more symbols other than }
} - matches the }
(.*) - and finally match and capture into Group 3 the rest of the string.
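The capture-and-restore idea can be tried in isolation with the shorter, single-group pattern from the update. This is a minimal sketch; the helper name rewrap is made up for illustration.

```ruby
# Replace the "^name{" and "}" wrapper while keeping the bracket content,
# which is captured in group 1 and restored through the \1 backreference.
def rewrap(script)
  script.gsub(/\^\w+\{([^}]+)}/, '[BEFORE]\1[AFTER]')
end

p rewrap("^foo{bar}#Something_Else") # => "[BEFORE]bar[AFTER]#Something_Else"
p rewrap("no markers here")          # => "no markers here"
```

Because the pattern is not anchored, text before and after the match is left alone by gsub, so no extra (.*) groups are needed to preserve it.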
If you mean to match ^foo{} by a single match against a regex, it is impossible. A regex match only matches a substring of the original string. Since ^foo{} is not a substring of ^foo{bar}#Something_Else, you cannot match that with a single match.

Splitting the content of brackets without separating the brackets ruby

I am currently working on a ruby program to calculate terms. It works perfectly fine except for one thing: brackets. I need to filter the content or at least, to put the content into an array, but I have tried for an hour to come up with a solution. Here is my code:
splitted = term.split(/\(+|\)+/)
I need an array instead of the brackets, for example:
"1-(2+3)" #=>["1", "-", ["2", "+", "3"]]
I already tried this:
/(\((?<=.*)\))/
but it returned:
Invalid pattern in look-behind.
Can someone help me with this?
UPDATE
I forgot to mention, that my program will split the term, I only need the content of the brackets to be an array.
If you need to keep track of the hierarchy of parentheses with arrays, you won't manage it just with regular expressions. You'll need to parse the string word by word, and keep a stack of expressions.
Pseudocode:
Expressions = new stack
Add new array on stack
while word in string:
    if word is "(": add new array on stack
    else if word is ")": remove the last array from the stack and add it to the (next) last array of the stack
    else: add the word to the last array of the stack
When exiting the loop, there should be only one array in the stack (if not, you have inconsistent opening/closing parentheses).
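The pseudocode above could be sketched in Ruby roughly as follows. The tokenizer is an assumption: it treats runs of digits, parentheses, and the single characters + - * / as "words" and silently skips everything else, including whitespace. The method names tokenize and nest are made up for this example.

```ruby
# Assumed token set: integers, parentheses, and the operators + - * /.
def tokenize(term)
  term.scan(/\d+|[-+*\/()]/)
end

# Build nested arrays that mirror the parentheses in the term.
def nest(term)
  stack = [[]] # start with one array on the stack
  tokenize(term).each do |token|
    case token
    when "("
      stack.push([]) # open a new nesting level
    when ")"
      raise ArgumentError, "unbalanced ')'" if stack.size < 2
      finished = stack.pop
      stack.last << finished # attach the closed level to its parent
    else
      stack.last << token
    end
  end
  raise ArgumentError, "unbalanced '('" unless stack.size == 1
  stack.first
end

p nest("1-(2+3)")     # => ["1", "-", ["2", "+", "3"]]
p nest("1-(2+(3*4))") # => ["1", "-", ["2", "+", ["3", "*", "4"]]]
```

Raising on unbalanced input is one possible choice; the pseudocode only notes that more than one array left on the stack signals inconsistent parentheses.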
Note: If your ultimate goal is to evaluate the expression, you could save time and parse the string in Postfix aka Reverse-Polish Notation.
Also consider using off-the-shelf libraries.
A solution depends on the pattern you expect between the parentheses, which you have not specified. (For example, for "(st12uv)" you might want ["st", "12", "uv"], ["st12", "uv"], ["st1", "2uv"] and so on). If, as in your example, it is a natural number followed by a +, followed by another natural number, you could do this:
str = "1-( 2+ 3)"
r = /
\(\s* # match a left parenthesis followed by >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
(\+) # match a plus sign in a capture group
\s* # match >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
\) # match a right parenthesis
/x
str.scan(r).first
#=> ["2", "+", "3"]
Suppose instead + could be +, -, * or /. Then you could change:
(\+)
to:
([-+*\/])
Note that, in a character class, + needn't be escaped and - needn't be escaped if it is the first or last character of the class (as in those cases it would not signify a range).
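Put together on one line, the variant with the wider operator class might look like the sketch below. The constant BINARY_IN_PARENS and the method inner_terms are hypothetical names, and the pattern still assumes exactly two natural numbers around one operator inside each pair of parentheses.

```ruby
# Same structure as the free-spacing regex above, with the operator class widened.
BINARY_IN_PARENS = /\(\s*(\d+)\s*([-+*\/])\s*(\d+)\s*\)/

# Return one [left, operator, right] triple per parenthesized pair found.
def inner_terms(str)
  str.scan(BINARY_IN_PARENS)
end

p inner_terms("1-( 2+ 3)")         # => [["2", "+", "3"]]
p inner_terms("1-( 2+ 3)*(4 / 5)") # => [["2", "+", "3"], ["4", "/", "5"]]
```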
Incidentally, you received the error message, "Invalid pattern in look-behind" because Ruby's lookbehinds cannot contain variable-length matches (i.e., .*). With positive lookbehinds you can get around that by using \K instead. For example,
r = /
\d+ # match one or more digits
\K # forget everything previously matched
[a-z]+ # match one or more lowercase letters
/x
"123abc"[r] #=> "abc"

Limit number of '1' in a string using Regexp

I am trying to write a Regexp that matches an expression containing two or more '1's.
Here is what I have written till now -
puts "Match." if /(1){1,5}/ =~ test_string
This correctly matches strings having two or more '1's, but it still matches when the number of occurrences of '1' is greater than 5.
How can I correct this Regexp to only match strings having 1 to 5 occurrences of 1?
There are possibly better versions, but this seems to do the trick:
/^([^1]*1){1,5}[^1]*$/
Broken down:
^ - Start of string
[^1]* - Zero or more non-1 characters
1 - A '1'.
([^1]*1){1,5} - This pattern occurring between one and five times.
[^1]* - Zero or more non-1 characters
$ - End of string
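A small check of this pattern is sketched below. It swaps ^ and $ for \A and \z, because in Ruby ^ and $ anchor at line boundaries rather than at the ends of the string. The constant and method names are made up for the example, and the String#count version is included only as a cross-check of the same condition.

```ruby
# One to five groups of "non-1 characters followed by a 1", then only non-1 characters.
ONE_TO_FIVE_ONES = /\A([^1]*1){1,5}[^1]*\z/

def one_to_five_ones?(s)
  ONE_TO_FIVE_ONES.match?(s)
end

# The same condition expressed without a regex.
def one_to_five_ones_by_count?(s)
  (1..5).cover?(s.count("1"))
end

p one_to_five_ones?("a1b1c")  # => true
p one_to_five_ones?("11111")  # => true
p one_to_five_ones?("111111") # => false
p one_to_five_ones?("abc")    # => false
```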
@Adrian Wragg has already explained the answer, as asked by the OP. But I would like to propose another possible solution for this problem, which is below:
puts "Match." if "#{test_string}".count("1") >= 2
If you have strings which contain characters other than one, here is a Regex that will do the job. See an example here at Rubular.
/\A([^1]*1[^1]*){1,5}\Z/
This will match any string consisting only of 2 or more ones. See an example here at Rubular.
/\A1{2,}\Z/
This will match any string consisting only of 1-5 ones. See an example here at Rubular.
/\A1{1,5}\Z/

How do I match repeated characters?

How do I find repeated characters using a regular expression?
If I have aaabbab, I would like to match only characters which have three repetitions:
aaa
Try string.scan(/((.)\2{2,})/).map(&:first), where string is your string of characters.
The way this works is that it looks for any character and captures it (the dot), then matches repeats of that character (the \2 backreference) 2 or more times (the {2,} range means "anywhere between 2 and infinity times"). Scan will return an array of arrays, so we map the first matches out of it to get the desired results.
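Wrapped in a method (the name runs_of_three_or_more is made up for this sketch), the expression can be tried on the example from the question:

```ruby
# Find runs where the same character occurs three or more times in a row.
# Group 2 captures a single character; \2{2,} requires at least two more copies.
def runs_of_three_or_more(string)
  string.scan(/((.)\2{2,})/).map(&:first)
end

p runs_of_three_or_more("aaabbab")  # => ["aaa"]
p runs_of_three_or_more("xxxxyzzz") # => ["xxxx", "zzz"]
```

Each element returned by scan is a pair of the whole run (group 1) and the repeated character (group 2), which is why only the first element of each pair is kept.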

Resources