Submatching repeating pattern - ruby

I am trying to put together a regexp in VBA, but even in Ruby I can't get it right.
The string:
<thead class="thead"><tr><th>FECHA</th><th>ITLUPVALOR</th><th>ITLUPPLAZO</th><th>ITLUP30DIAS</th><th>ITLUP60DIAS</th><th>ITLUP90DIAS</th><th>ITLUP180DIAS</th><th>ITLUP270DIAS</th><th>ITLUP360DIAS</th><th>ITLUP720DIAS</th><th>ITLUP1080DIAS</th><th>ITLUP1440DIAS</th><th>ITLUP1800DIAS</th></tr></thead>
What I have tried:
/(?:<thead class=\"thead\"><tr>)(<th>[^<]+?<\/th>)+(?:<\/tr><\/thead>)/m
The idea here (http://rubular.com/r/BpbPszctTw) was to have 9 submatches instead of one.
What am I missing?

Sorry, but a regex repeating group will only capture the last match in a group. See http://www.regular-expressions.info/captureall.html for more info.
Update: True, but if you let the regex match do the repeating for you, as in the other answer, you can get multiple matches, per http://rubular.com/r/BclU13qWYm ! In other words, accept the other answer, not this one. :-)
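A minimal Ruby sketch of both behaviours, using a shortened version of the header string from the question. The variable names are illustrative and not taken from the original posts.

```ruby
# Shortened sample: three header cells instead of the full row.
html = '<thead class="thead"><tr><th>FECHA</th><th>ITLUPVALOR</th><th>ITLUPPLAZO</th></tr></thead>'

# A repeated capture group keeps only its final iteration.
repeated = html.match(/<tr>(<th>[^<]+<\/th>)+<\/tr>/)
last_only = repeated[1]

# scan restarts the search after every hit, so each cell is collected.
headers = html.scan(/<th>([^<]+)<\/th>/).flatten

puts last_only
puts headers.inspect
```

Here `last_only` holds just the final `<th>` cell, while `headers` holds the text of all three.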

With this pattern you can obtain what you want:
/<thead class="thead"><tr>|\G<th>([^<]+)<\/th>/
Just remove the first result.
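As a rough sketch of how this might be used with String#scan (the shortened sample string and variable names are assumptions for illustration): the first alternative matches the opening tags and leaves the capture group empty, and \G then forces each following match to start exactly where the previous one ended.

```ruby
# Shortened sample: three header cells instead of the full row.
html = '<thead class="thead"><tr><th>FECHA</th><th>ITLUPVALOR</th><th>ITLUPPLAZO</th></tr></thead>'

pattern = /<thead class="thead"><tr>|\G<th>([^<]+)<\/th>/

# The first result comes from the opening-tag alternative, so its group is nil.
raw = html.scan(pattern).flatten
cells = raw.drop(1)

puts cells.inspect
```

Because of \G, a stray `<th>` elsewhere in the document that does not directly follow the matched header row would not be picked up.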

Related

simple symbol regex solution

The problem I'm looking at says an input is valid only if every letter in the string is surrounded by '+' symbols, so "+d++" or "+d+==+a+" are valid but not
"f++d+"
"3+a=+b+"
"++d+=c+"
I tried to solve this using regex since it's kind of a string pattern matching problem: /(+[a-z][^+])|([^+.][a-z]+)/. But this does not cover patterns where the letters are at the beginning or end of the string. I need help with something more comprehensive.
You should try the following:
/^\+{0,2}[a-z0-9]+\+{0,2}(=*\+{0,2}[a-z0-9]+\+{0,2})*$/
You could use the below regex.
^(?:[^\w\n]*\+[a-z]+\+)+[^\w\n]*$
If you want to match +f+g+ also, then put the following + inside a positive lookahead assertion.
^(?:[^\w\n]*\+[a-z]+(?=\+))+[^\w\n]*$
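A small Ruby check of the two patterns above against the examples from the question. The constant names STRICT and SHARED are made up for this sketch.

```ruby
# Each run of letters needs its own closing '+'.
STRICT = /^(?:[^\w\n]*\+[a-z]+\+)+[^\w\n]*$/
# Lookahead variant: neighbouring letters may share a single '+'.
SHARED = /^(?:[^\w\n]*\+[a-z]+(?=\+))+[^\w\n]*$/

inputs = ["+d++", "+d+==+a+", "f++d+", "3+a=+b+", "++d+=c+", "+f+g+"]

# For every input, record whether each pattern matches.
results = inputs.map { |s| [s, !!(s =~ STRICT), !!(s =~ SHARED)] }

results.each do |s, strict, shared|
  puts "#{s.ljust(10)} strict=#{strict} shared=#{shared}"
end
```

The two patterns should differ only on "+f+g+", where the middle '+' has to serve both letters.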

ruby regex: match URL recurring pattern

I want to be able to match all the following cases below using Ruby 1.8.7.
/pages/multiedit/16801,16809,16817,16825,16833
/pages/multiedit/16801,16809,16817
/pages/multiedit/16801
/pages/multiedit/1,3,5,7,8,9,10,46
I currently have:
\/pages\/multiedit\/\d*
This matches up to the first set of numbers. So for example:
"/pages/multiedit/16801,16809,16817,16825,16833"[/\/pages\/multiedit\/\d*/]
# => "/pages/multiedit/16801"
See http://rubular.com/r/ruFPx5yIAF for example.
Thanks for the help, regex gods.
\/pages\/multiedit\/\d+(?:,\d+)*
Example: http://rubular.com/r/0nhpgki6Gy
Edit: Updated to not capture anything... Although the performance hit would be negligible. (Thanks Tin Man)
The currently accepted answer of
\/pages\/multiedit\/[\d,]+
may not be a good idea because that will also match the following strings
.../pages/multiedit/,,,
.../pages/multiedit/,1,
My answer requires there be at least one digit before the first comma, and at least one digit between commas, and it must end with a digit.
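A quick Ruby comparison of the two patterns, as a sketch only. STRICT and LOOSE are names chosen for this example.

```ruby
STRICT = /\/pages\/multiedit\/\d+(?:,\d+)*/
LOOSE  = /\/pages\/multiedit\/[\d,]+/

good = "/pages/multiedit/16801,16809,16817,16825,16833"
bad1 = "/pages/multiedit/,,,"
bad2 = "/pages/multiedit/,1,"

# String#[] with a regex returns the matched text, or nil when nothing matches.
strict_good = good[STRICT]
strict_bad  = [bad1[STRICT], bad2[STRICT]]
loose_bad   = [bad1[LOOSE], bad2[LOOSE]]

puts strict_good
puts strict_bad.inspect
puts loose_bad.inspect
```

Neither pattern is anchored, so on input such as "/pages/multiedit/1," the strict pattern still matches the leading "/pages/multiedit/1". Adding \z at the end would be one way to reject trailing commas, if that matters for the routes in question.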
I'd use:
/\/pages\/multiedit\/[\d,]+/
Here's a demonstration of the pattern at http://rubular.com/r/h7VLZS1W1q
[\d,]+ means "find one or more numbers or commas"
The reason \d* doesn't work is it means "find zero or more numbers". As soon as the pattern search runs into a comma it stops. You have to tell the engine that it's OK to find numbers and commas.

Ruby Regular Expressions: Matching if substring doesn't exist

I'm having an issue trying to capture a group on a string:
"type=gist\nYou need to gist this though\nbecause its awesome\nright now\n</code></p>\n\n<script src=\"https://gist.github.com/3931634.js\"> </script>\n\n\n<p><code>Not code</code></p>\n"
My regex currently looks like this:
/<code>([\s\S]*)<\/code>/
My goal is to get everything in between the code brackets. Unfortunately, it's matching up to the 2nd closing code bracket. Is there a way to match everything inside the code brackets up until the first occurrence of the ending code bracket?
All repetition quantifiers in regular expressions are greedy by default (matching as many characters as possible). Make the * ungreedy, like this:
/<code>([\s\S]*?)<\/code>/
But please consider using a DOM parser instead. Regex is just not the right tool to parse HTML.
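To illustrate the difference, here is a small sketch on a made-up string with two code elements (the string and variable names are assumptions, not the asker's data):

```ruby
html = "<code>first</code> text <code>second</code>"

# Greedy: * grabs as much as possible, so the match runs to the LAST </code>.
greedy = html[/<code>([\s\S]*)<\/code>/, 1]

# Lazy: *? grabs as little as possible, so the match stops at the FIRST </code>.
lazy = html[/<code>([\s\S]*?)<\/code>/, 1]

puts greedy
puts lazy
```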
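And I just learned that for going through multiple parts, calling scan on the string (here named str) with a block
str.scan(/<code>(.*?)<\/code>/m) {
  puts $1  # $1 is the text captured between the tags for the current match
}
is a very nice way of going through all occurrences of code (the /m flag lets . span newlines in Ruby) - but yes, getting a proper parser is better...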

Regular expression syntax

I have a problem similar to a previously asked question, but similar practices apparently do not produce similar results.
Previous Question
New question - I want to match the lines beginning with T as the first match, and the following lines beginning with X as the second match (as a whole string, to be matched later by another regex).
What I have so far is (/^T(\d+)\n(.*?)(?:the_problem)/m). I don't know what to replace "the_problem" with, or even whether that is the issue. I assumed some rendition of (?:\n|\z), but apparently not. Nothing I tried would treat the next occurrence of ^T(\d+) as the start of a new group while still capturing all of the lines between occurrences.
Sample text:
T01C0.025
T02C0.035
T03C0.055
T04C0.150
T05C0.065
T06C0.075
%
G05
G90
T01
X011200Y004700
X011200Y009700
X018500Y011200
X013500Y-011200
X023800Y019500
T02
X034800Y017800
X-033800Y-017800
X032800Y017800
T03
X036730Y003000
X038700Y003000
X040668Y-003000
X059230Y003000
T04
X110580Y017800
X023800Y027300
X095500Y028500
X005500Y-006500
X021500Y-006500
T05
X003950Y002000
X003950Y004500
X003950Y007000
T06
X026300Y027300
M30
I only want to capture the shorter version of T01, T02,...T0n, not the longer version at the top, then the entire collection of ^X(-?\d+)Y(-?\d+) that follows it, as another match.
Result 1.
Match 1. T01
Match 2. X011200Y004700
X011200Y009700
X018500Y011200
X013500Y-011200
X023800Y019500
Result 2.
Match 1. T02
Match 2. X034800Y017800
X-033800Y-017800
X032800Y017800
Result 3.
Match 1. T03
Match 2. X036730Y003000
X038700Y003000
....etc....
Thanks in advance for any help ;-) Note: I prefer to use raw Ruby, without extensions or plugins. My version of ruby is 1.8.6.
Try this instead:
^(T[^\s]+)[\n\r\s]((?:(?:X\S+)[\n\r\s])+)
It makes the groups for the X lines into non-capturing groups, then puts all the repetitions of the final pattern into a single group. All the X lines will be in a single capture.
You can test this using Rubular (an indispensable tool for developing regular expressions) http://rubular.com/r/PRnurKy64Q
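A sketch of how that pattern could be applied with String#scan, using a trimmed-down version of the sample text. The variable names and the shortened data are assumptions for illustration.

```ruby
# Trimmed sample: two long tool definitions, then two tool blocks.
drill = "T01C0.025\nT02C0.035\n%\nG05\nT01\nX011200Y004700\nX011200Y009700\n" \
        "T02\nX034800Y017800\nX-033800Y-017800\nM30\n"

pattern = /^(T[^\s]+)[\n\r\s]((?:(?:X\S+)[\n\r\s])+)/

# Each result is [tool, block_of_x_lines]; the long T01C0.025 style lines are
# skipped because no X line follows them directly.
groups = drill.scan(pattern)

groups.each do |tool, coords|
  puts "#{tool}: #{coords.split.inspect}"
end
```

The second element of each pair is one string containing all the X lines, which can then be split or passed to another regex such as ^X(-?\d+)Y(-?\d+).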
this seems to work...
^(T[^\s]+)[\n\r\s]((X[^\s]+)[\n\r\s]){1,}
I'm not totally sure I understand your problem, but I'll give this a shot. It looks like you want:
/(^T\d+$(^X[-A-Z\d]+$)+)*/g
This will have to be run under multiline mode so that ^ and $ match after and before newlines. Word of caution: I don't have much practice with multiline regex, so you might want to do a sanity check on the use of ^ and $.
Also, I notice you didn't include the lines similar to T01C0.025 in your sample results, so I made the T\d+ assumption based on that.

Optimal Regular Expression: match sets of lines starting with

Alright, this one's interesting. I have a solution, but I don't like it.
The goal is to be able to find a set of lines that start with 3 periods - not an individual line, mind you, but a collection of all the lines in a row that match. For example, here's some matches (each match is separated by a blank line):
...
...hello
...
...hello
...world
...
...wazzup?
...
My solution is as follows:
^\.\.\..*(\n\.\.\..*)*$
It matches all those, so it's what I'm using for now - however, it looks kinda silly to repeat the \.\.\..* pattern. Is there a simpler way?
Please test your regex before submitting it, rather than submit what "should work." For example, I tried the following first:
(^\.\.\..*$)+
which only returned individual lines, even though in my mind it looks like it would do the trick - I guess I just don't understand regex internals. (And no, I didn't need to set any flags to get ^ and $ to match line boundaries, since I'm implementing this in Ruby.)
So I'm not totally sure there's a good answer, but one would be much appreciated - thanks in advance!
In most regex implementations you can shorten \.\.\. using \.{3} so your solution would turn into \.{3}.*(\n\.{3}.*)*.
What you already have is already simple and understandable. Keep in mind that more "clever" RegExps may very well be slower and undoubtedly less readable.
Assuming lines are terminated by a \n:
((^|\n)\.{3}[^\n]*)+
I am not familiar with Ruby, so depending on how it returns matches you might need to "nonmatch" groups:
((?:(?:^|\n)\.{3}[^\n]*)+)
^([.]{3}.*$\n?)+
This doesn't really need $ in there.
You are pretty close to a solution with (^\.\.\..*$)+, but because the + modifier is on the outside of the group, it is getting overwritten each time and you are only left with the last line. Try wrapping it in an outer group: ((^\.\.\..*$)+) and looking at the first submatch and ignoring the inner one.
Combined with the other suggestion: ((^\.{3}.*$)+)
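A quick Ruby check of two of the variants discussed here, on made-up sample text (non-capturing groups are used so that scan returns whole matches). In this sketch, what decides whether consecutive lines are grouped is whether the repeated part consumes the newline: $ only asserts the end of a line, so after it the next ^ cannot match until the \n itself has been consumed.

```ruby
text = "...\n...hello\n\nnot a match\n...hello\n...world\n"

# The repeated group consumes the newline, so adjacent lines join into one match.
blocks = text.scan(/^(?:\.{3}.*\n?)+/)

# The repeated group stops at $ without consuming \n, so every line is its own match.
single = text.scan(/(?:^\.{3}.*$)+/)

puts blocks.inspect
puts single.inspect
```

With the first pattern each match includes the trailing newline of its last line, which may need to be stripped with chomp.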

Resources