My company uses FreeBSD, and therefore FreeBSD's flavor of make.
A few of our in-house ports include something like this (where BRANCH is something that came from an SVN URL, either 'trunk' or a branch name like 'branches/1.2.3').
PORTVERSION= ${BRANCH:C,^branches/,,}
The Variable modifiers section of make(1) documents the :C colon-c modifier as
:C/pattern/replacement/[1gW]
Am I looking at the right documentation? ^branches/ looks like a regex pattern to me, but it looks like the actual code uses , instead of / as a separator. Did I skip documentation explaining that?
The documentation says:
:C/pattern/replacement/[1gW]
The :C modifier is just like the :S modifier except that the old and new strings, instead of being simple strings, are an extended regular expression (see regex(3)) string pattern and an ed(1)-style string replacement.
and in :S:
Any character may be used as a delimiter for the parts of the modifier string.
As @MadScientist pointed out, it's quite common to use a different delimiter, especially when / is part of the pattern or replacement string, as in your case. Otherwise the slash would need escaping, and the expression would look like ${BRANCH:C/^branches\///}, which is less readable.
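For comparison only (this is Ruby, not make(1)): regexp literals in Ruby face the same choice of delimiter. A rough sketch, with hypothetical helper names of my own, of what the modifier computes for the two kinds of BRANCH value mentioned in the question:

```ruby
# Hypothetical helper mirroring ${BRANCH:C,^branches/,,}:
# remove a leading "branches/" and leave anything else untouched.
def port_version(branch)
  # %r{...} plays the role of the "," delimiter: the "/" needs no escaping.
  branch.sub(%r{^branches/}, "")
end

# Same substitution with "/" as the delimiter, so the slash must be escaped.
def port_version_escaped(branch)
  branch.sub(/^branches\//, "")
end
```

Both helpers should behave the same; the first just avoids the backslash.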
Basically, I want to check if a string (main) starts with another string (sub), using both of the above methods. For example, following is my code:
main = gets.chomp
sub = gets.chomp
p main.start_with? sub
p main[/^#{sub}/]
And, here is an example with I/O - Try it online!
If I enter simple strings, both of them work exactly the same, but when I enter a string like "1\2" on stdin, I get errors in the Regexp variant, as seen in the TIO example.
I guess this is because the string passed into the second one isn't raw. So I tried passing sub.dump into the second one - Try it online!
which gives me nil result. How to do this correctly?
As a general rule, you should never ever blindly execute inputs from untrusted sources.
Interpolating untrusted input into a Regexp is not quite as bad as interpolating it into, say, Kernel#eval, because the worst thing an attacker can do with a Regexp is to construct an Evil Regex to conduct a Regular expression Denial of Service (ReDoS) attack (see also the section on Performance in the Regexp documentation), whereas with eval, they could execute arbitrary code, including but not limited to deleting the entire file system, scanning memory for unencrypted passwords / credit card information / PII and exfiltrating it via the network, etc.
However, it is still a bad idea. For example, when I say "the worst thing that can happen is a ReDoS", that assumes that there are no bugs in the Regexp implementation (Onigmo in the case of YARV, Joni in the case of JRuby and TruffleRuby, etc.). Ruby's Regexps are quite powerful, and thus Onigmo, Joni and co. are large and complex pieces of code that may very well have their own security holes, which a specially crafted Regexp could exploit.
You should properly sanitize and escape the user input before constructing the Regexp. Thankfully, the Ruby core library already contains a method which does exactly that: Regexp::escape. So, you could do something like this:
p main[/^#{Regexp.escape(sub)}/]
The reason why your attempt at using String#dump didn't work, is that String#dump is for representing a String the same way you would have to write it as a String literal, i.e. it is escaping String metacharacters, not Regexp metacharacters and it is including the quote characters around the String that you need to have it recognized as a String literal. You can easily see that when you simply try it out:
sub.dump
#=> "\"1\\\\2\""
# equivalent to '"1\\2"'
So, that means that String#dump
includes the quotes (which you don't want),
escapes characters that don't need escaping in Regexp just because they need escaping in Strings (e.g. # or "), and
doesn't escape characters that need escaping in a Regexp but not in Strings (e.g. [, ., ?, *, +, ^, -).
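To make the difference concrete, here is a minimal sketch. The helper name starts_with_regexp? is mine, not a core method, and I use \A instead of ^ because ^ also matches after a newline inside the string:

```ruby
# Hypothetical helper: does `main` start with `sub`, using a Regexp
# built from (possibly untrusted) input?
def starts_with_regexp?(main, sub)
  # Regexp.escape backslash-escapes the Regexp metacharacters in sub,
  # so the input is matched literally instead of being parsed as a pattern.
  main.match?(/\A#{Regexp.escape(sub)}/)
end
```

With this, an input such as 1\2 is treated as the three literal characters 1, \ and 2. If all you need is the prefix check, main.start_with?(sub) stays the simpler option.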
My regexp behaves just like I want it to on http://regexr.com, but not like I want it in irb.
I'm trying to make a regular expression that will match the following:
A forward slash,
then 2 * any number of random characters (i.e. `.*`),
up to but not including another /
OR the end of the string (whichever comes first)
I'm sorry as that was probably unclear, but it's my best attempt at an English translation.
Here's my current attempt and hopefully that will give you a better idea of what I'm trying to do:
/(\/.*?(?=\/|$)){2}/
The usage scenario is I want to be able to take a path like /foo/bar/baz/bin/bash and shorten it to the level I'm at in the filesystem, in this case the second level (/foo/bar). I'm trying to do this using the command path.scan(-regex-).shift.
The usage scenario is I want to be able to take a path like /foo/bar/baz/bin/bash and shorten it to the level I'm at in the filesystem, in this case the second level (/foo/bar)
Ruby already has a class for handling paths, Pathname. You can use Pathname#relative_path_from to do what you want.
require 'pathname'
path = Pathname.new("/foo/bar/baz/bin/bash")
# Normally you'd use Pathname.getwd
cwd = Pathname.new("/foo/bar")
# baz/bin/bash
puts path.relative_path_from(cwd)
Regexes just invite problems, like assuming the path separator is /, not honoring escapes, and not dealing with extra / characters. For example, "//foo/bar//b\\/az/bin/bash". A doubled // is particularly common in code that joins directories together using paths.join("/") or "#{dir}/#{file}".
For completeness, the general way you match a single piece of a path is this.
%r{^(/[^/]+)}
That's the beginning of the string, a /, then 1 or more characters which are not /. Using [^/]+ means you don't have to try and match an optional / or end of string, a very useful technique. Using %r{} means less leaning toothpicks.
But this is only applicable to a canonicalized path. It will fail on //foo//b\\/ar/. You can try to fix up the regex to deal with that, or do your own canonicalization, but just use Pathname.
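If the goal is literally "keep the first two levels", a sketch along these lines might do it with Pathname instead of a regex. The helper truncate_path is a name I made up, and it assumes an absolute, Unix-style path:

```ruby
require 'pathname'

# Hypothetical helper: keep only the first `depth` components of an absolute path.
def truncate_path(path, depth)
  # each_filename yields the components without separators,
  # so repeated slashes do not produce empty entries.
  parts = Pathname.new(path).each_filename.first(depth)
  File.join("/", *parts)
end
```

For example, truncate_path("/foo/bar/baz/bin/bash", 2) gives "/foo/bar", which is the second level from the question.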
In Perl, you can do this:
(?x)
(?(DEFINE)
(?<animal>dog|cat)
)
(?&animal)
In Ruby (Oniguruma engine), it seems that the (?(DEFINE... syntax is not supported. Also, (?&... becomes \g. So, you can do this:
(?x)
(?<animal>dog|cat)
\g<animal>
But of course, this is not equivalent to the Perl example I gave above, because the first (?<animal>dog|cat) is not ignored, since there isn't anything like (?(DEFINE....
If I want to define a large regex with a bunch of named subroutines, what I could once do in Perl can't be done this way.
It does seem that I could hack together a pretty awkward solution by doing something like this:
(?x)
(?:^$DEFINE
(?<animal>dog|cat)
){0}
\g<animal>
But, that is pretty hackish. Is there a better way to do this? Does Oniguruma support a way to define named subroutines without having to try to "match" them first?
Alternatively, if there is a way to get true PCRE to work in Ruby, with ?(DEFINE... and (?&... I'd take that too.
Thanks!
You don't need such a complicated hack. Writing:
(?x)
(?<animal>dog|cat){0}
(?<color>red|green|blue){0}
...
your main pattern here
does exactly the same.
Putting all group definitions inside (?:^$DEFINE ... ){0} is only cosmetic.
Note that a group with the quantifier {0} isn't tried at all (the quantifier is taken into account first). Since the named group is defined anyway, one can conclude that this isn't really a hack, but simply the way to do it with Oniguruma.
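A small runnable sketch of that idiom follows. The constant COLORED_ANIMAL, the helper colored_animal? and the main pattern on the last line of the regex are illustrative assumptions, not something from the question:

```ruby
# Definitions first: with {0}, a group is never matched where it is written,
# but its name can still be called later with \g<name>.
COLORED_ANIMAL = /
  (?<animal>dog|cat){0}
  (?<color>red|green|blue){0}
  \A \g<color> \s \g<animal> \z
/x

# Hypothetical helper: true when text is "<color> <animal>".
def colored_animal?(text)
  COLORED_ANIMAL.match?(text)
end
```

Only the last line of the regex takes part in matching; the two definition lines contribute nothing except the names.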
I am trying to grasp the concept of Regular Expressions but seem to be missing something.
I want to ensure that someone enters a string that ends with .wav in a field. Should be a pretty simple Regular Expression.
I've tried this...
[RegularExpression(@"$.wav")]
but it seems to be incorrect. Any help is appreciated. Thanks!
$ is the anchor for the end of the string, so $.wav doesn't make any sense. You can't have any characters after the end of the string. Also, . has a special meaning for regex (it just means 'any character') so you need to escape it.
Try writing
\.wav$
If that doesn't work, try
.*\.wav$
(It depends on if the RegularExpression attribute wants to match the whole string, or just a part of it. .* means 'any character, 0 or more times')
Another thing you should consider is what to do with extra whitespace in the field. Users have a terrible habit of adding extra whitespace to inputs; it's why the various .Trim() functions are so important. Here, RegularExpressionAttribute might be evaluated before you can trim the input, so you might want to write this:
.*\.wav[\s]*$
The [\s]* section means 'any whitespace character (tabs, space, linebreak, etc) 0 or more times'.
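If you want to try the pattern outside of .NET, here is a quick sketch in Ruby. This is only an illustration of the regex itself, not of RegularExpressionAttribute; the names WAV_NAME and wav_name? are made up, and I use \A and \z as explicit whole-string anchors because $ means end of line in Ruby:

```ruby
# Ruby sketch of the pattern above: anything, then a literal ".wav",
# then optional trailing whitespace, anchored to the whole string.
WAV_NAME = /\A.*\.wav\s*\z/

# Hypothetical helper: true when input ends with ".wav" (ignoring trailing whitespace).
def wav_name?(input)
  WAV_NAME.match?(input)
end
```

Here "song.wav" and "song.wav  " are accepted, while "song.wave" and a bare "wav" are rejected.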
You should read a tutorial on regex. It's not so hard to understand for simple problems like this. When I was learning I found this site pretty handy: http://www.regular-expressions.info/
Can anybody tell me how to use a regex to negate a string?
I want to find every line that starts with public class, followed by anything except first or second, and finally anything else.
For example, in the result I expect to see public class base but not public class myfirst:base.
Can anybody help me, please?
Use a negative lookahead:
public\s+class\s+(?!first|second).+
If Peter is correct and you're using Visual Studio's Find feature, this should work:
^:b*public:b+class:b+~(first|second):i.*$
:b matches a space or tab
~(...) is how VS does a negative lookahead
:i matches a C/C++ identifier
The rest is standard regex syntax:
^ for beginning of line
$ for end of line
. for any character
* for zero or more
+ for one or more
| for alternation
Both the other two answers come close, but probably fail for different reasons.
public\s+class\s+(?:(?!first|second).)+
Note how there is a (non-capturing) group around the negative lookahead, to ensure it applies to more than just the first position.
And that group is less restrictive - since . excludes newline, it's using that instead of \S, and the $ is not necessary - this will exclude the specified words and match others.
No slashes wrapping the expression since those aren't required in everything and may confuse people that have only encountered string-based regex use.
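To see the difference between the two lookahead placements, here is a sketch in Ruby (so it says nothing about Visual Studio's dialect). The constant and helper names are mine, and the \A and \z anchors on the second pattern are my addition; without them the pattern could still match just a prefix of the line:

```ruby
# Lookahead checked once, right after "class" and the whitespace.
LOOKAHEAD_ONCE = /public\s+class\s+(?!first|second).+/

# Lookahead checked before every character of the remainder.
LOOKAHEAD_EVERYWHERE = /\Apublic\s+class\s+(?:(?!first|second).)+\z/

def matches_once?(line)
  LOOKAHEAD_ONCE.match?(line)
end

def matches_everywhere?(line)
  LOOKAHEAD_EVERYWHERE.match?(line)
end
```

The first pattern rejects "public class first" but still accepts "public class myfirst:base", because "first" is not at the position where the lookahead runs. The second one rejects both and accepts "public class base".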
If this still fails, post the exact content that is wrongly matched or missed, and what language/ide you are using.
Update:
Turns out you're using Visual Studio, which has its own special regex implementation, for some unfathomable reason. So you'll want to try this instead:
public:b+class:b+~(first|second)+$
I have no way of testing that - if it doesn't work, try dropping the $, but otherwise you'll have to find a VS user. Or better still, the VS engineer(s) responsible for this stupid non-standard regex.
Here is something that should work for you
/public\sclass\s(?:[^fs\s]+|(?!first|second)\S)+(?=\s|$)/
The second lookahead could be changed to $ (end of line) or another anchor that works for your particular use case, such as a '{'.
Edit: Try changing the last part to:
(?=\s|$)