I need to identified urls that are either "/fr/" or "/en/". (and only these two)
I'm looking for the good regular expression.
Of course it works if I write "/../", but it's too large.
Best I could find is "/[fe][rn]/" but it also take /fn/ and /er/.
Simply use a pipe in a group :
/(fr|en)/?
Edit: added the optional trailing slash
Related
My regexp behaves just like I want it to on http://regexr.com, but not like I want it in irb.
I'm trying to make a regular expression that will match the following:
A forward slash,
then 2 * any number of random characters (i.e. `.*`),
up to but not including another /
OR the end of the string (whichever comes first)
I'm sorry as that was probably unclear, but it's my best attempt at an English translation.
Here's my current attempt and hopefully that will give you a better idea of what I'm trying to do:
/(\/.*?(?=\/|$)){2}/
The usage scenario is I want to be able to take a path like /foo/bar/baz/bin/bash and shorten it to the level I'm at in the filesystem, in this case the second level (/foo/bar). I'm trying to do this using the command path.scan(-regex-).shift.
The usage scenario is I want to be able to take a path like /foo/bar/baz/bin/bash and shorten it to the level I'm at in the filesystem, in this case the second level (/foo/bar)
Ruby already has a class for handling paths, Pathname. You can use Pathname#relative_path_from to do what you want.
require 'pathname'
path = Pathname.new("/foo/bar/baz/bin/bash")
# Normally you'd use Pathname.getwd
cwd = Pathname.new("/foo/bar")
# baz/bin/bash
puts path.relative_path_from(cwd)
Regexes just invite problems, like assuming the path separator is /, not honoring escapes, and not dealing with extra /. For example, "//foo/bar//b\\/az/bin/bash". // is particularly common in code which joins together directories using paths.join("/") or "#{dir}/#{file}.
For completeness, the general way you match a single piece of a path is this.
%r{^(/[^/]+)}
That's the beginning of the string, a /, then 1 or more characters which are not /. Using [^/]+ means you don't have to try and match an optional / or end of string, a very useful technique. Using %r{} means less leaning toothpicks.
But this is only applicable to a canonicalized path. It will fail on //foo//b\\/ar/. You can try to fix up the regex to deal with that, or do your own canonicalization, but just use Pathname.
I have been struggling with the same issue for a while now and I could not find an good answer yet. I'm using rack-rewrite to add some url rewrite rules to my app's middleware stack.
I have the following rule:
r301 %r{^/([^(docs|help|legal|login|account|apps)])(.+)/$}, '$1'
Which is not working properly or as I would expect it. I have tried one of my previous question's answer, but neither that works, it actually generates an event more weird behaviour (it redirects to an url without the domain name, just to the path).
What I am trying to do is:
if user requests http://example.com/ or http://example.com/random-path/ I need the rewrite rule to strip the slash, thus the examples would become http://example.com respectively http://example.com/random-path;
if the requested paths matches any of the paths in the list docs|help|legal|login|account|apps, do not strip the slash at the end of the path if exists, but add a slash if it's not there
I tried with two rules, one that ignores the listed paths above and strips slashes and one that adds the slash if it hits something from the list and the slash after the path is not there:
r301 %r{^/([^(docs|help|legal|login|account|apps)])(.+)/$}, '/$1'
r301 %r{^/([(docs|help|legal|login|account|apps)])(.+)/$}, '/$1/'
How could I write a rule that would do that, or two rules, because what I tried it did not work?
You can do that like so:
r301 %r{^/((?!docs|help|legal|login|account|apps).+)/$}, '/$1'
r301 %r{^/((?=docs|help|legal|login|account|apps).+[^/])$}, '/$1/'
example 1
example 2
and some documentation on lookahead and lookbehind
EDIT: stray parentheses.
I'm trying to come up with a regex that will elegantly match everything in an URL AFTER the domain name, and before the first ?, the last slash, or the end of the URL, if neither of the 2 exist.
This is what I came up with but it seems to be failing in some cases:
regex = /[http|https]:\/\/.+?\/(.+)[?|\/|]$/
In summary:
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price/ should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price?id=2 should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
Please don't use Regex for this. Use the URI library:
require 'uri'
str_you_want = URI("http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price").path
Why?
See everything about this famous question for a good discussion of why these kinds of things are a bad idea.
Also, this XKCD really says why:
In short, Regexes are an incredibly powerful tools, but when you're dealing with things that are made from hundred page convoluted standards when there is already a library for doing it faster, easier, and more correctly, why reinvent this wheel?
If lookaheads are allowed
((2[0-9][0-9][0-9].*)(?=\?\w+)|(2[0-9][0-9][0-9].*)(?=/\s+)|(2[0-9][0-9][0-9].*).*\w)
Copy + Paste this in http://regexpal.com/
See here with ruby regex tester: http://rubular.com/r/uoLLvTwkaz
Image using javascript regex, but it works out the same
(?=) is just a a lookahead
I basically set up three matches from 2XXX up to (in this order):
(?=\?\w+) # lookahead for a question mark followed by one or more word characters
(?=/\s+) # lookahead for a slash followed by one or more whitespace characters
.*\w # match up to the last word character
I'm pretty sure that some parentheses were not needed but I just copy pasted.
There are essentially two OR | expressions in the (A|B|C) expression. The order matters since it's like a (ifthen|elseif|else) type deal.
You can probably fix out the prefix, I just assumed that you wanted 2XXX where X is a digit to match.
Also, save the pitchforks everyone, regular expressions are not always the best but it's there for you when you need it.
Also, there is xkcd (https://xkcd.com/208/) for everything:
I'm using Url Rewriter to create user-friendly URLs in my web app and have the following rule set up
<rewrite url="/(?!Default.aspx).+" to="/letterchain.aspx?ppc=$1"/>
How do I replace $1 so that it is the last part of the URL?
So that the following
www.mywebapp.com/hello
would transform to
/letterchain.aspx?ppc=hello
I've read the docs but can't find anything.
The $1 in the to portion of the group refers to the first capture group defined (eg the part in the brackets).
The part that you actually want injecting into the $1 is the .+ which isnt in a capture group.
I'm not sure but I think because of the (?! ) "match if suffix is absent" query this isnt counted as numbered capture group $1 so this should work:
<rewrite url="/(?!Default.aspx)(.+)" to="/letterchain.aspx?ppc=$1"/>
If it doesnt then just try inserting the second capture group into your to string instead:
<rewrite url="/(?!Default.aspx)(.+)" to="/letterchain.aspx?ppc=$2"/>
Please note that if you are developing for IIS 7+ http://www.iis.net/download/urlrewrite/ is a module from Microsoft that performs faster rewrites with lower footprint.
BTW, your regex has a small problem, you need to escape the dot character, that is "/(?!Default.aspx)(.+)"
can any body tell me how to use regex for negation of string?
I wanna find all line that start with public class and then any thing except first,second and finally any thing else.
for example in the result i expect to see public class base but not public class myfirst:base
can any body help me please??
Use a negative lookahead:
public\s+class\s+(?!first|second).+
If Peter is correct and you're using Visual Studio's Find feature, this should work:
^:b*public:b+class:b+~(first|second):i.*$
:b matches a space or tab
~(...) is how VS does a negative lookahead
:i matches a C/C++ identifier
The rest is standard regex syntax:
^ for beginning of line
$ for end of line
. for any character
* for zero or more
+ for one or more
| for alternation
Both the other two answers come close, but probably fail for different reasons.
public\s+class\s+(?:(?!first|second).)+
Note how there is a (non-capturing) group around the negative lookahead, to ensure it applies to more than just the first position.
And that group is less restrictive - since . excludes newline, it's using that instead of \S, and the $ is not necessary - this will exclude the specified words and match others.
No slashes wrapping the expression since those aren't required in everything and may confuse people that have only encountered string-based regex use.
If this still fails, post the exact content that is wrongly matched or missed, and what language/ide you are using.
Update:
Turns out you're using Visual Studio, which has it's own special regex implementation, for some unfathomable reason. So, you'll be wanting to try this instead:
public:b+class:b+~(first|second)+$
I have no way of testing that - if it doesn't work, try dropping the $, but otherwise you'll have to find a VS user. Or better still, the VS engineer(s) responsible for this stupid non-standard regex.
Here is something that should work for you
/public\sclass\s(?:[^fs\s]+|(?!first|second)\S)+(?=\s|$)/
The second look a head could be changed to a $(end of line) or another anchor that works for your particular use case, like maybe a '{'
Edit: Try changing the last part to:
(?=\s|$)