How can I match line breaks between two asterisks in Ruby? I have this string
Foo **bar**
test **hello
world** 12345
and I only want to find
**hello
world**
I tried it with \*{1,2}(.*)\n(.*)\*{1,2} but this matches
**bar**
test **
I played with a non greedy matcher like \*{1,2}(.*?)\n(.*?)\*{1,2} but this doesn't work either, so I hope someone can help.
You may use
/\*{1,2}\b([^*]*)\R([^*]*)\b\*{1,2}/
See the Rubular demo
Details
\*{1,2} - 1 or 2 asterisks
\b - a word boundary, the next char must be a word char
([^*]*) - Group 1: any 0+ chars other than *
\R - a line break sequence
([^*]*) - Group 2: any 0+ chars other than *
\b - a word boundary, the preceding char must be a word char
\*{1,2} - 1 or 2 asterisks
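For what it's worth, here is a minimal sketch of that pattern run against the string from the question. It assumes Ruby 2.0 or later (where \R is available), and the variable names are only illustrative:

```ruby
text = "Foo **bar**\ntest **hello\nworld** 12345"

# \R matches a line break sequence; [^*]* keeps each group from crossing an asterisk.
pattern = /\*{1,2}\b([^*]*)\R([^*]*)\b\*{1,2}/

match = text.match(pattern)
whole = match[0]   # the full match, asterisks included
first = match[1]   # text before the line break
second = match[2]  # text after the line break

p whole   # => "**hello\nworld**"
p first   # => "hello"
p second  # => "world"
```

**bar** is skipped because nothing between its asterisks is a line break, so \R has nothing to match there.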
Wiktor already gave a good answer. Here is another way of doing it:
(?<=\*{1,2})([^*])*(?=\*{1,2})
Tested here
NOTE: This will not work in Ruby, but it can work in some other languages
From the link in this answer:
Subexp of look-behind must be fixed-width.
My test configuration file(test_config.conf) looks as below
[DEFAULT]
system_name=
#test
flag=true
I want to read this and scan the value for key "system_name", with the expected output nil. I could have used config parser to read the contents, but using scan is my requirement.
I did:
File.read
Scan: file_data.scan(/^#{each}\s*=\s*(?!.*#)\s*(.*)/)
Regex: ^system_name\s*=\s*(?!.*#)\s*(.*)$
I used (?!.*#) to ignore the values that start with #.
It returns #test. Could someone help me understand why it does so, and how I can change my regex to make it work as expected?
This is another case of backtracking confusing regex users. The (?!.*#) negative lookahead matches only at a location that is not followed by a # anywhere on the rest of the line. Since the preceding part of the pattern can match the string in several ways, the regex engine retries the quantified subpatterns once a match attempt fails. In your case, \s* matches 0 or more whitespaces, including the line break. Once the regex engine has matched all the whitespaces after =, the lookahead finds # and fails. The engine then backtracks and tries to match zero whitespaces with the first \s*. At that position, right after =, there is no # before the end of the line, so the lookahead succeeds, the second \s* consumes the line break, and (.*) captures #test.
Use a possessive quantifier with \s*+ to disallow backtracking:
^system_name\s*=\s*+(?!#)(.*)$
                   ^
See the Rubular demo. Now the lookahead runs only once, after all the 0+ whitespaces are matched. If it fails, the whole match fails right away.
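A small sketch comparing the two patterns on the config text from the question may make the difference easier to see. The variable names are made up for illustration:

```ruby
config = "[DEFAULT]\nsystem_name=\n#test\nflag=true\n"

# Original pattern: the first \s* can give the line break back,
# so the lookahead is retried right after "=" and succeeds there.
backtracking = /^system_name\s*=\s*(?!.*#)\s*(.*)$/

# Possessive \s*+ keeps the whitespace it matched, so (?!#) is checked only once.
possessive = /^system_name\s*=\s*+(?!#)(.*)$/

with_backtracking = config.scan(backtracking)
with_possessive = config.scan(possessive)

p with_backtracking  # => [["#test"]]
p with_possessive    # => []
```

With no match, scan returns an empty array, so with_possessive.first is nil, which is the result the question asks for.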
Another way is to use [^\s#] negated character class:
^system_name\s*=\s*([^\s#].*)$
                   ^^^^^^^
See another Rubular demo
Here, [^\s#] will only match a char that is not a whitespace, nor #, and then .* will match any 0+ chars other than line break chars.
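As a rough check of the negated character class version, here it is against the question's empty value and against a line that does carry a value. The value web01 is a hypothetical example, not something from the question:

```ruby
pattern = /^system_name\s*=\s*([^\s#].*)$/

empty_value = "system_name=\n#test\nflag=true\n"
real_value = "system_name = web01\n#test\nflag=true\n"  # web01 is a made-up value

empty_result = empty_value.scan(pattern)
real_result = real_value.scan(pattern)

p empty_result  # => []
p real_result   # => [["web01"]]
```

In the empty case the capture has to start with a char that is neither whitespace nor #, and neither the line break nor #test qualifies, so there is nothing to backtrack into.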
As per the feedback inside comments, the structure of the input may be rather loose, and a key=value can follow the system_name line. In that case, you also need to make sure the text you capture does not actually start with some word chars followed with = sign:
/^system_name\s*=\s*+(?!#|\w+=)(.*)$/
See this Rubular demo
Full pattern details:
^ - start of a line
system_name - a literal substring
\s* - 0 or more whitespaces
= - an equal sign
\s*+ - 0 or more whitespaces with no backtracking into the pattern due to *+ possessive quantifier
(?!#|\w+=) - a negative lookahead that fails the match if the # or 1+ word chars and then = are found immediately to the right of the current location (that is right after the 0+ whitespaces)
(.*) - Group 1: any 0+ chars up to the end of the line
$ - end of a line.
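To illustrate the looser input mentioned above, here is a sketch that runs the final pattern on three variations of the config file. The sample texts and the value web01 are assumptions made up for this example:

```ruby
pattern = /^system_name\s*=\s*+(?!#|\w+=)(.*)$/

followed_by_key = "[DEFAULT]\nsystem_name=\nflag=true\n"
followed_by_comment = "[DEFAULT]\nsystem_name=\n#test\nflag=true\n"
with_value = "[DEFAULT]\nsystem_name= web01\nflag=true\n"  # web01 is hypothetical

key_result = followed_by_key.scan(pattern)
comment_result = followed_by_comment.scan(pattern)
value_result = with_value.scan(pattern)

p key_result      # => []
p comment_result  # => []
p value_result    # => [["web01"]]
```

The lookahead only rejects text that starts with # or with word chars directly followed by =, so an ordinary value still gets captured.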
A string must begin with 3 or 4 letters (not numbers), and a ":" symbol should follow these letters, and after the colon there should be three more characters, like AAA. For example, AAAA:AAA or AAA:AAA.
I'm starting to build this, but regex is such a pain for me. Can anyone help me with this?
Here is what I have now:
^[a-zA-Z]{3,4}(:)$
Your regex is almost there: you need to add [a-zA-Z]{3}.
I prefer the [[:alpha:]] POSIX class in Ruby to match letters though.
/[[:alpha:]]/ - Alphabetic character
POSIX bracket expressions are also similar to character classes. They provide a portable alternative to the above, with the added benefit that they encompass non-ASCII characters.
So, here is a possible regex:
\A[[:alpha:]]{3,4}:[[:alpha:]]{3}\z
See demo
The regex matches:
\A - start of string (in RoR, you have to use \A instead of ^, or you will get errors)
[[:alpha:]]{3,4} - 3 or 4 letters
: - literal :
[[:alpha:]]{3} - 3 letters
\z - end of string (in RoR, you have to use \z instead of $, or you will get errors)
To allow just AAA or AAAA, you need to introduce an optional (? quantifier) non-capturing group ((?:...) construction):
\A[[:alpha:]]{3,4}(?::[[:alpha:]]{3})?\z
                  ^^^               ^^
See another demo
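Here is a quick sketch of both patterns in plain Ruby, assuming Ruby 2.4 or later for Regexp#match?. The line_anchored pattern is added only for comparison, to show one likely reason \A and \z are preferred for validation: ^ and $ match at line boundaries, so extra lines can slip through.

```ruby
# Patterns from the answer above.
strict = /\A[[:alpha:]]{3,4}:[[:alpha:]]{3}\z/
optional_suffix = /\A[[:alpha:]]{3,4}(?::[[:alpha:]]{3})?\z/

# Line-anchored variant, shown only for comparison.
line_anchored = /^[[:alpha:]]{3,4}:[[:alpha:]]{3}$/

p strict.match?("AAAA:AAA")          # => true
p strict.match?("AAA:AAA")           # => true
p strict.match?("AA:AAA")            # => false (too few letters)
p strict.match?("AAA1:AAA")          # => false (digit in the prefix)
p strict.match?("AAA")               # => false (colon part is required)

p optional_suffix.match?("AAA")      # => true
p optional_suffix.match?("AAAA:AAA") # => true
p optional_suffix.match?("AAAAA")    # => false

multi_line = "AAA:AAA\nanything else"
p line_anchored.match?(multi_line)   # => true, only the first line is checked
p strict.match?(multi_line)          # => false
```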
Try using this (quotes if regex in your dialect must be passed as a string)
"^[a-zA-Z]{3,4}:[a-zA-Z]{3}$"
I want to capture any word between two colons. I tried with this (try on Rubular):
(\:.*\:)
Hello :name:
What are you doing today, :title:?
$:name:, have a lovely :event:.
It works, except that on the last line it captures this:
Match 3
1. :name:, have a lovely :event:
It's getting tripped up by the second (closing) colon and the third (opening) colon. It should capture :name: and :event: individually on that last line.
You need a non-greedy regular expression:
(\:.*?\:)
The .*? will match the shortest possible string, whereas .* matches the longest string found.
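A short sketch with the sample text from the question shows the difference between the two quantifiers. The variable names are only illustrative:

```ruby
text = "Hello :name:\nWhat are you doing today, :title:?\n$:name:, have a lovely :event:."

# Greedy: .* runs to the last colon on each line.
greedy = text.scan(/(\:.*\:)/).flatten

# Non-greedy: .*? stops at the first colon that allows a match.
lazy = text.scan(/(\:.*?\:)/).flatten

p greedy  # => [":name:", ":title:", ":name:, have a lovely :event:"]
p lazy    # => [":name:", ":title:", ":name:", ":event:"]
```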
For any word between two colons:
(?<=:)\b.*?\b(?=:)
Rubular link
(\:[^:]*\:)
[^:] means "anything but a ':'".
Please be aware that this expression will match "::" also.
Here is your rubular link updated: http://rubular.com/r/VtwhIqtbli.
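A minimal sketch of the negated class version, including the "::" case mentioned above. The non_empty variant, which swaps * for +, is my own assumption about how one might exclude "::" and is not part of the original answer:

```ruby
pattern = /(\:[^:]*\:)/

last_line = "$:name:, have a lovely :event:."
tokens = last_line.scan(pattern).flatten
p tokens  # => [":name:", ":event:"]

# Two adjacent colons also match, because [^:]* accepts zero characters.
empty_token = "a :: b".scan(pattern).flatten
p empty_token  # => ["::"]

# Assumed variant: + requires at least one character between the colons.
non_empty = /(\:[^:]+\:)/
no_token = "a :: b".scan(non_empty).flatten
p no_token  # => []
```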
I'm working on some text processing in Ruby 1.8.7 to support some custom shortcodes that I've created. Here are some examples of my shortcode:
[CODE first-part]
[CODE first-part second-part]
I'm using the following RegEx to grab the
text.gsub!( /\[CODE (\S+)\s?(\S?)\]/i, replacementText )
The problem is this: the regex doesn't work on the following text:
[CODE first-part][CODE first-part-again]
The results are as follows:
1. first-part][CODE
2. first-part-again
It seems that the \s? is the problematic part of the regex that is searching on until it hits the last space, not the first one. When I change the regex to the following:
/\[CODE ([\w-]+)\s?(\S*)\]/i
It works fine. My only concern is what exactly \w matches compared with \S, as I want to make sure that \w will match URL-safe characters.
I'm sure there's a perfectly valid explanation, but it's eluding me. Any ideas? Thanks!
Actually, thinking about it, just using [^\]] might not be enough, as it will swallow up all spaces as well. You also need to exclude those:
/\[CODE[ ]([^\]\s]+)\s?([^\]\s]*)\]/i
Note the [ ] - I just think it makes literal spaces more readable.
Working demo.
Explained in free-spacing mode:
\[CODE[ ] # match your identifier
( # capturing group 1
[^\]\s]+ # match one or more non-], non-whitespace characters
) # end of group 1
\s? # match an optional whitespace character
( # capturing group 2
[^\]\s]* # match zero or more non-], non-whitespace characters
) # end of group 2
\] # match the closing ]
As none of the character classes in the pattern includes ], you can never possibly go beyond the end of the square bracketed expression.
By the way, if you find unnecessary escapes in regex as obscuring as I do, here is the minimal version:
/\[CODE[ ]([^]\s]+)\s?([^]\s]*)]/i
But that is definitely a matter of taste.
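Here is a small sketch of the escaped version of the pattern applied to the inputs from the question. The variable names are made up for the example:

```ruby
pattern = /\[CODE[ ]([^\]\s]+)\s?([^\]\s]*)\]/i

adjacent = "[CODE first-part][CODE first-part-again]"
two_parts = "[CODE first-part second-part]"

# Neither group can consume "]" or whitespace, so each shortcode matches on its own.
adjacent_result = adjacent.scan(pattern)
two_parts_result = two_parts.scan(pattern)

p adjacent_result   # => [["first-part", ""], ["first-part-again", ""]]
p two_parts_result  # => [["first-part", "second-part"]]
```

When the second part is absent, group 2 is an empty string rather than nil, which may matter if the replacement logic checks for it.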
The problem was with the greedy \S+ in this
/\[CODE (\S+)\s?(\S?)\]/i
You could try:
/\[CODE (\S+?)\s?(\S?)\]/i
but actually your new character class is IMO superior.
Even better might be:
/\[CODE ([^\]]+?)\s?([^\]]*)\]/i
Hey, I'm trying to use a regex to count the number of quotes in a string that are not preceded by a backslash.
For example, the following strings:
"\"Some text
"\"Some \"text
The code I have was previously using String#count('"'), but obviously this is not good enough.
When I count the quotes in both of these examples, I need the result to be only 1.
I have been searching here for similar questions and I've tried using lookbehinds, but I cannot get them to work in Ruby.
I have tried the following regexes on Rubular from this previous question:
/[^\\]"/
^"((?<!\\)[^"]+)"
^"([^"]|(?<!\)\\")"
None of them give me the results I'm after.
Maybe a regex is not the way to do that. Maybe a programmatic approach is the solution.
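How about string.count('"') - string.scan('\\"').size? (String#count treats its argument as a set of characters, so it cannot count the two-character sequence \" directly; scan with a string argument counts the literal sequence.)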
result = subject.scan(
  /(?:          # match either
     ^          # start-of-string\/line
   |            # or
     \G         # the position where the previous match ended
   |            # or
     [^\\]      # one non-backslash character
   )            # then
   (?:\\\\)*    # match an even number of backslashes (0 is even, too)
   "            # match a quote
  /x)
gives you an array of all quote characters (each possibly with a preceding non-backslash character and an even number of backslashes) except escaped ones, so result.size is the number of unescaped quotes. The backslash pairs sit in a non-capturing group, because String#scan returns only the captured groups when the pattern contains any.
The \G anchor is needed to match successive quotes, and the (?:\\\\)* makes sure that backslashes are only counted as escaping characters if they occur in odd numbers before the quote (to take Amarghosh's correct caveat into account).
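As a sketch, the same idea can be wrapped in a small helper and tried on the two strings from the question. The names UNESCAPED_QUOTE and count_unescaped_quotes are hypothetical, and the backslash pairs are in a non-capturing group here so that scan returns whole matches:

```ruby
# Either line start, the end of the previous match, or one non-backslash
# character, then an even number of backslashes, then the quote itself.
UNESCAPED_QUOTE = /(?:^|\G|[^\\])(?:\\\\)*"/

# Hypothetical helper: counts quotes not escaped by a backslash.
def count_unescaped_quotes(subject)
  subject.scan(UNESCAPED_QUOTE).size
end

# Single-quoted literals keep the backslash before " as a real character.
first_example = '"\"Some text'
second_example = '"\"Some \"text'

p count_unescaped_quotes(first_example)   # => 1
p count_unescaped_quotes(second_example)  # => 1
p count_unescaped_quotes('say "hi"')      # => 2
p count_unescaped_quotes('\\\\"')         # => 1, two backslashes escape each other
```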