Regex match anything except ending string - ruby

I'm trying to make a regex that matches anything except an exact ending string, in this case, the extension '.exe'.
Examples for a file named:
'foo' (no extension) I want to get 'foo'
'foo.bar' I want to get 'foo.bar'
'foo.exe.bar' I want to get 'foo.exe.bar'
'foo.exe1' I want to get 'foo.exe1'
'foo.bar.exe' I want to get 'foo.bar'
'foo.exe' I want to get 'foo'
So far I created the regex /.*\.(?!exe$)[^.]*/
but it doesn't work for cases 1 and 6.

You can use a positive lookahead.
^.+?(?=\.exe$|$)
^ start of string
.+? non greedily match one or more characters...
(?=\.exe$|$) until literal .exe occurs at end. If not, match end.
See demo at Rubular.com
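As a quick check, here is a minimal sketch that runs the pattern above against the six filenames from the question. The constant and method names are made up for this example.

```ruby
# Pattern from the answer above; STRIP_EXE and strip_exe are
# hypothetical names chosen for this sketch.
STRIP_EXE = /^.+?(?=\.exe$|$)/

def strip_exe(name)
  # String#[] with a regex returns the matched text, or nil if nothing matches
  name[STRIP_EXE]
end

%w[foo foo.bar foo.exe.bar foo.exe1 foo.bar.exe foo.exe].each do |name|
  puts "#{name} -> #{strip_exe(name)}"
end
```

The lazy .+? grows one character at a time, and the lookahead stops it as soon as what follows is either a trailing .exe or the end of the string.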

Wouldn't a simple replacement work?
string.sub(/\.exe\z/, "")
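A small sketch of the replacement approach, assuming the goal is simply "the name without a trailing .exe". The method names are hypothetical; String#delete_suffix is an alternative that needs Ruby 2.5 or later.

```ruby
# Remove a trailing ".exe" and leave every other name untouched.
def without_exe(name)
  name.sub(/\.exe\z/, "")
end

# Same result without a regex (requires Ruby 2.5+).
def without_exe_suffix(name)
  name.delete_suffix(".exe")
end

%w[foo foo.exe.bar foo.exe1 foo.bar.exe foo.exe].each do |name|
  puts "#{name} -> #{without_exe(name)}"
end
```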

Do you mean regex matching or capturing?
There may be a regex only answer, but it currently eludes me. Based on your test data and what you want to match, doing something like the following would cover both what you want to match and capture:
name = 'foo.bar.exe'
match = /(.*)\.exe$/.match(name)
if match.nil?
  # no trailing .exe: the whole filename is what you want
  print name
else
  # match[1] is the capture: the filename without the .exe extension
  print match[1]
end

string pattern = @" (?x) (.* (?= \.exe$ )) | ((?=.*\.exe).*)";
First match is a positive look-ahead that checks if your string
ends with .exe. The condition is not included in the match.
Second match is a positive look-ahead with the condition included in the
match. It only checks if you have something followed by .exe.
(?x) means that whitespace inside the pattern string is ignored.
Alternatively, drop (?x) and delete all the whitespace.
It works for all the 6 scenarios provided.

Related

repeating regex to match mathematical symbol then number fails

I am trying to match mathematical expressions like 1+2 and 1*2/3 and so on, of any length. Can someone explain why my regex matches the final case below, and how to fix it so that it matches only valid expressions (however long they are)?
perms=["12+2*4","2+2","-2+","12+34-"]
perms.each do |line|
puts "#{line}=#{eval(line)}" if line =~ /^\d+([+-\/*]\d+){1,}/
end
I expected the output to be:
12+2*4=20
2+2=4
Inside a [character set], the - character defines a range of characters -- think of [a-z] or [0-9]. If you want to match a literal -, it must be the first or last character.
/^\d+(?:[+\/*-]\d+)+$/
Other things: {1,} is exactly +; and you need to anchor at the end too, so you don't match 1+2+
You should finalize your expression with $ to match the entire input string:
/^\d+([-+\/*]\d+){1,}$/
The wrong position of the hyphen - is one source of error in your expression. The missing $ the other.
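To see both problems in isolation, here is a short sketch comparing the question's pattern with the corrected one. The constant and method names are assumptions for illustration. In the original class, +-\/ is a range running from + to /, which also happens to contain the comma and the dot.

```ruby
# ORIGINAL is the pattern from the question, FIXED the one from the answer.
ORIGINAL = /^\d+([+-\/*]\d+){1,}/
FIXED    = /^\d+(?:[+\/*-]\d+)+$/

def valid_expression?(line)
  FIXED.match?(line)
end

["12+2*4", "2+2", "-2+", "12+34-", "1,2"].each do |line|
  puts "#{line.inspect}: original=#{ORIGINAL.match?(line)} fixed=#{valid_expression?(line)}"
end
```

Note that in Ruby ^ and $ anchor at line boundaries. If the input can contain newlines, and especially if it is passed to eval afterwards, \A and \z are the stricter choice.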

Ruby regex too greedy with back to back matches

I'm working on some text processing in Ruby 1.8.7 to support some custom shortcodes that I've created. Here are some examples of my shortcode:
[CODE first-part]
[CODE first-part second-part]
I'm using the following RegEx to grab the
text.gsub!( /\[CODE (\S+)\s?(\S?)\]/i, replacementText )
The problem is this: the regex doesn't work on the following text:
[CODE first-part][CODE first-part-again]
The results are as follows:
1. first-part][CODE
2. first-part-again
It seems that the \s? is the problematic part of the regex that is searching on until it hits the last space, not the first one. When I change the regex to the following:
/\[CODE ([\w-]+)\s?(\S*)\]/i
It works fine. The only concern I have is what \w matches compared with \S, as I want to make sure \w will match URL-safe characters.
I'm sure there's a perfectly valid explanation, but it's eluding me. Any ideas? Thanks!
Actually, thinking about it, just using [^\]] might not be enough, as it will swallow up all spaces as well. You also need to exclude those:
/\[CODE[ ]([^\]\s]+)\s?([^\]\s]*)\]/i
Note the [ ] - I just think it makes literal spaces more readable.
Working demo.
Explained in free-spacing mode:
\[CODE[ ] # match your identifier
( # capturing group 1
[^\]\s]+ # match one or more non-], non-whitespace characters
) # end of group 1
\s? # match an optional whitespace character
( # capturing group 2
[^\]\s]* # match zero or more non-], non-whitespace characters
) # end of group 2
\] # match the closing ]
As none of the character classes in the pattern includes ], you can never possibly go beyond the end of the square bracketed expression.
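A minimal sketch of that pattern applied to the back-to-back shortcodes from the question. It uses scan rather than gsub! so the captured parts are visible; SHORTCODE and shortcode_parts are names invented for this example.

```ruby
# Pattern from the answer above.
SHORTCODE = /\[CODE[ ]([^\]\s]+)\s?([^\]\s]*)\]/i

# Returns one [first_part, second_part] pair per shortcode found;
# second_part is an empty string when it is absent.
def shortcode_parts(text)
  text.scan(SHORTCODE)
end

p shortcode_parts("[CODE first-part][CODE first-part-again]")
p shortcode_parts("[CODE first-part second-part]")
```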
By the way, if you find unnecessary escapes in regex as obscuring as I do, here is the minimal version:
/\[CODE[ ]([^]\s]+)\s?([^]\s]*)]/i
But that is definitely a matter of taste.
The problem was with the greedy \S+ in this
/\[CODE (\S+)\s?(\S?)\]/i
You could try:
/\[CODE (\S+?)\s?(\S?)\]/i
but actually your new character class is IMO superior.
Even better might be:
/\[CODE ([^\]]+?)\s?([^\]]*)\]/i

Why is my regular expression skipping the dot instead of matching the string before it?

I was trying to work out a regular expression in IRB and got some unexpected output.
The goal was to match everything up until the last dot in a FQDN.
So, for example, if I was trying to match the string "flowtechconsulting.com",
I started with the following:
s1.sub(/^(.*)\./, "\\1") #=> "flowtechconsultingcom"
However, the sub function simply returned everything but the dot, instead of the first matching group.
If I add two matching groups it works:
s1.sub(/^(.*)\.(.*)$/, "\\1") #=> "flowtechconsulting"
I'm just not sure why the first doesn't work. It seems like it should.
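/^(.*)\./ matches only up to and including the last dot, so sub replaces "flowtechconsulting." with the first group and just the dot disappears. The "com" is not part of the match, so it is left in place rather than replaced.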
Forget about sub, and do something like:
"foo.bar.baz.com"[/(.*)(?:\.)/, 1]
# => "foo.bar.baz"
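The three variants side by side, as a sketch with hypothetical method names. sub only replaces the text the regex actually matched, so anything outside the match survives.

```ruby
# Match ends at the last dot: only "flowtechconsulting." is replaced,
# so the trailing "com" stays.
def drop_last_dot_only(s)
  s.sub(/^(.*)\./, "\\1")
end

# Match covers the whole string, so the replacement is all that remains.
def host_via_sub(s)
  s.sub(/^(.*)\.(.*)$/, "\\1")
end

# No substitution at all: ask for capture group 1 directly.
def host_via_index(s)
  s[/(.*)\./, 1]
end

s1 = "flowtechconsulting.com"
puts drop_last_dot_only(s1)
puts host_via_sub(s1)
puts host_via_index(s1)
```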

preg_match match up to an optional character then ignore remaining text

I have the following string format:
[[TEXT|TEXT]] <-- the "|TEXT" is optional
And so far what works fine is:
/([^\[]+)(?=\]\])/
Which will return:
TEXT
or
TEXT|TEXT
I want to match up to "|TEXT" if it has been included and only ever match the left side of "|" or "]]" depending on which is first.
Any suggestions?
UPDATE - if you only want to match what's in the [[]] and not including the |TEXT if it's there, try this:
([^|\[]+?)(?=(?:\]\]|\|))
Which is more-or-less your starting regex but it stops at a | or a ]].
Or if your regex flavour supports look-behinds you could do:
(?<=\[\[)([^\|]+?)(?=(?:\]\]|\|))
This just enforces that you match from [[ onwards. Not really necessary but a bit more unknown-content-proof.
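The question is about PHP's preg_match, but the look-behind variant can be tried out in Ruby as well (Ruby 1.9 and later support look-behind). This is only a sketch with balanced parentheses; LEFT_SIDE and left_side are invented names.

```ruby
# Text after "[[" up to the first "|" or "]]", whichever comes first.
LEFT_SIDE = /(?<=\[\[)([^\|]+?)(?=(?:\]\]|\|))/

def left_side(text)
  # Returns capture group 1, or nil when there is no match
  text[LEFT_SIDE, 1]
end

p left_side("[[TEXT]]")
p left_side("[[Page name|Label]]")
```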

count quotes in a string that do not have a backslash before them

Hey I'm trying to use a regex to count the number of quotes in a string that are not preceded by a backslash..
for example the following string:
"\"Some text
"\"Some \"text
The code I have was previously using String#count('"'), but
obviously this is not good enough.
When I count the quotes in both of these examples I need the result to be only 1.
I have been searching here for similar questions and I've tried using lookbehinds but cannot get them to work in Ruby.
I have tried the following regexes on Rubular from this previous question:
/[^\\]"/
^"((?<!\\)[^"]+)"
^"([^"]|(?<!\)\\")"
None of them give me the results I'm after.
Maybe a regex is not the way to do that. Maybe a programmatic approach is the solution.
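How about string.count('"') - string.scan('\\"').size? Note that String#count counts individual characters from a set rather than substrings, so the two-character sequence \" has to be counted with scan.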
result = subject.scan(
  /(?:      # match either
    ^       # start-of-string\/line
   |        # or
    \G      # the position where the previous match ended
   |        # or
    [^\\]   # one non-backslash character
  )         # then
  (\\\\)*   # match an even number of backslashes (0 is even, too)
  "         # match a quote
  /x)
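gives you one array entry for every quote character that is not escaped, so result.size is the count you are after. (Because the pattern contains a capturing group, Ruby's scan returns that group for each match rather than the matched text itself.)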
The \G anchor is needed to match successive quotes, and the (\\\\)* makes sure that backslashes are only counted as escaping characters if they occur in odd numbers before the quote (to take Amarghosh's correct caveat into account).
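Here is a compact sketch of the same idea wrapped in a method. The names are hypothetical, and the backslash group is made non-capturing so that scan returns the matched text. The test strings use single quotes so that \" stays a literal backslash followed by a quote.

```ruby
# A quote preceded by an even number of backslashes (including none).
UNESCAPED_QUOTE = /(?:^|\G|[^\\])(?:\\\\)*"/

def count_unescaped_quotes(str)
  str.scan(UNESCAPED_QUOTE).size
end

puts count_unescaped_quotes('"\"Some text')    # first example from the question
puts count_unescaped_quotes('"\"Some \"text')  # second example from the question
puts count_unescaped_quotes('say "hi"')        # two ordinary quotes
```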

Resources