Regex match all strings between 2 characters - ruby

I tried a couple of other links like Regex Match all characters between two strings and Regex get all content between two characters
but they don't seem to fit this use case.
I want to get all the names, potato and tomato. Eg, from | to >.
text = "with <#U0D08NR3|potato> and <#U1698M96|tomato> please"
text.scan((?<=|).*?(?=>)) doesnt seem to work either..
Please guide me regex gods.

You forgot to escape the pipe (|), which is now interpreted as the indicator for an alternation
Your regex with the escaped pipe:
(?<=\|).*?(?=>)
Here you can see the result

Just escape the | : (?<=\|).*?(?=>). Without it, the positive lookbehind means match anything

Try this:
text.scan(/\|(\w*)\>/).flatten
# Returns => ["potato", "tomato"]
Not exactly sure why this works. Something to do with greedy and non-greedy matching. See this

Related

Ruby regex syntax for "not matching one of the following"

Nice simple regex syntax question for you.
I have a block of text and i want to find instances of href=" or href=' which are NOT followed by either [ or http://
I can get "not followed by [" with
record.body =~ /href=['"](?!\[)/
and i can get "not followed by http://" with
record.body =~ /href=['"](?!http\:\/\/)/
But i can't quite work out how to combine the two.
Just to be clear: i want to find bad strings like this
`href="www.foo.com"`
but i'm ok with (ie don't want to find) strings like this
`href="http://www.foo.com"`
`href="[registration_url]"`
Combine the both by using the alternation operator.
href=['"](?!http\:\/\/|\[)
For more specific, it would be.
href=(['"])(?!http\:\/\/|\[)(?:(?!\1).)*\1
This would handle both single quoted or double quoted string in the href part. And this won't match the strings like href='foo.com" or href="foo.com' (unmatched quotes)
(['"]) would capture double quote or single quote. (?!http\:\/\/|\[) and the matched quote won't be followed by http:// or [, if yes, then it moves on to the next pattern. (?:(?!\1).)* matches any character but not of the captured character, zero or more times. \1 followed by the captured character.
DEMO
Use alternative list with pipe | symbol to combine the look-ahead conditions:
(?!http\:\/\/|\[)
So, to match the hrefs, you can use the following regex:
href=\"((?!http\:\/\/|\[)[^\"]+?)\"
See demo on Rubular.com.

ruby regex match any character besides a specific one

I am looking for a way to match any character besides, for example, a "#."
It would look something like...
gsub(/^foo.*foo$/)
But I'd want it to match
"foofdfdfdfoo"
But not
"fooddgdgd#fdfoo"
Thanks.
^[^#]+$
http://rubular.com/r/glijo99dU9
gsub is for substitution. If you just want to match, the .match method
To expand on Explosion Pills answer, a caret (^) will negate the match in a regex. This means that it will not match if the characters following it are found in the expression. You can read more about it in the documentation.

ruby remove variable length string from regular expression leaving hyphen

I have a string such as this: "im# -33.870816,151.203654"
I want to extract the two numbers including the hyphen.
I tried this:
mystring = "im# -33.870816,151.203654"
/\D*(\-*\d+\.\d+),(\-*\d+\.\d+)/.match(mystring)
This gives me:
33.870816,151.203654
How do I get the hyphen?
I need to do this in ruby
Edit: I should clarify, the "im# " was just an example, there can be any set of characters before the numbers. the numbers are mostly well formed with the comma. I was having trouble with the hyphen (-)
Edit2: Note that the two nos are lattidue, longitude. That pattern is mostly fixed. However, in theory, the preceding string can be arbitrary. I don't expect it to have nos. or hyphen, but you never know.
How about this?
arr = "im# -33.2222,151.200".split(/[, ]/)[1..-1]
and arr is ["-33.2222", "151.200"], (using the split method).
now
arr[0].to_f is -33.2222 and arr[1].to_f is 151.2
EDIT: stripped "im#" part with [1..-1] as suggested in comments.
EDIT2: also, this work regardless of what the first characters are.
If you want to capture the two numbers with the hyphen you can use this regex:
> str = "im# -33.870816,151.203654"
> str.match(/([\d.,-]+)/).captures
=> ["33.870816,151.203654"]
Edit: now it captures hyphen.
This one captures each number separetely: http://rubular.com/r/NNP2OTEdiL
Note: Using String#scan will match all ocurrences of given pattern, in this case
> str.scan /\b\s?([-\d.]+)/
=> [["-33.870816"], ["151.203654"]] # Good, but flattened version is better
> str.scan(/\b\s?([-\d.]+)/).flatten
=> ["-33.870816", "151.203654"]
I recommend you playing around a little with Rubular. There's also some docs about regegular expressions with Ruby:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/language.html#UJ
http://www.regular-expressions.info/ruby.html
http://www.ruby-doc.org/core-1.9.3/Regexp.html
Your regex doesn't work because the hyphen is caught by \D, so you have to modify it to catch only the right set of characters.
[^0-9-]* would be a good option.

Regex - Matching text AFTER certain characters

I want to scrape data from some text and dump it into an array. Consider the following text as example data:
| Example Data
| Title: This is a sample title
| Content: This is sample content
| Date: 12/21/2012
I am currently using the following regex to scrape the data that is specified after the 'colon' character:
/((?=:).+)/
Unfortunately this regex also grabs the colon and the space after the colon. How do I only grab the data?
Also, I'm not sure if I'm doing this right.. but it appears as though the outside parens causes a match to return an array. Is this the function of the parens?
EDIT: I'm using Rubular to test out my regex expressions
You could change it to:
/: (.+)/
and grab the contents of group 1. A lookbehind works too, though, and does just what you're asking:
/(?<=: ).+/
In addition to #minitech's answer, you can also make a 3rd variation:
/(?<=: ?)(.+)/
The difference here being, you create/grab the group using a look-behind.
If you still prefer the look-ahead rather than look-behind concept. . .
/(?=: ?(.+))/
This will place a grouping around your existing regex where it will catch it within a group.
And yes, the outside parenthesis in your code will make a match. Compare that to the latter example I gave where the entire look-ahead is 'grouped' rather than needlessly using a /( ... )/ without the /(?= ... )/, since the first result in most regular expression engines return the entire matched string.
I know you are asking for regex but I just saw the regex solution and found that it is rather hard to read for those unfamiliar with regex.
I'm also using Ruby and I decided to do it with:
line_as_string.split(": ")[-1]
This does what you require and IMHO it's far more readable.
For a very long string it might be inefficient. But not for this purpose.
In Ruby, as in PCRE and Boost, you may make use of the \K match reset operator:
\K keeps the text matched so far out of the overall regex match. h\Kd matches only the second d in adhd.
So, you may use
/:[[:blank:]]*\K.+/ # To only match horizontal whitespaces with `[[:blank:]]`
/:\s*\K.+/ # To match any whitespace with `\s`
Seee the Rubular demo #1 and the Rubular demo #2 and
Details
: - a colon
[[:blank:]]* - 0 or more horizontal whitespace chars
\K - match reset operator discarding the text matched so far from the overall match memory buffer
.+ - matches and consumes any 1 or more chars other than line break chars (use /m modifier to match any chars including line break chars).

regular expression gsub only if it does not have anything before

Is there anyway to scan only if there is nothing before what I am scanning for.
For example I have a post and I am scanning for a forward slash and what follows it but I do not want to scan for a forward slash if it is not the beginning character.
I want to scan for /this but I do not want to scan for this/this or http://this.com.
The regular expression I am currently using is..
/\/(\w+)/
I am using this with gsub to link each /forwardslash.
I think what you are asking for is to only match words that begin with '/', not strings or lines beginning with '/'. If that is true, I believe the following regex will work: %r{(?:^|\s+)/(\w+)}:
For example:
"/foo /this this/that http://this".scan %r{(?:^|\s+)/(\w+)} # => [["foo"], ["this"]]
The caret (^) character means "beginning of string" -- a dollar sign ($) means "end of string."
So
/^\/(\w+)/
...will get you what you want -- only matching at the beginning of the string.
First thing, since you're using a regex with slashes change the delimiter to something else, then you won't have to escape the backslashes and it will be easier to read.
Secondly, if you want to replace the slash as well then include it in the capture.
On to the regex.
...if it is not the beginning
character...
...of a line:
!^(/\w+)!
if it is not the beginning
character...
...of a word:
!\s(/\w+)!
but that won't match if it's at the very beginning of a line. For that you'll need something a lot more complex, so I'd just run both the regexes here instead of creating that monster.

Resources