Modify regex to match expanded folder structure - ruby

I have a regex that I am using to search for .ipa files in my '_inbox' folder. It works for files that are directly under that folder, but now I need to modify it to find files in subfolders as well.
current regex
%r{_inbox/[^/]+.ipa}i
matches
'_inbox/NewApplication.ipa'
does not match
'_inbox/Test/1/NewApplication.ipa'

I don't have the rep required to comment so I'll do my best to answer without. I think ruby supports lookahead, if it doesn't, this answer is partially invalid.
I think this regex should cover what you need, or at least be a good starting point:
_inbox/[^/](?!//)[\w\d\_\-\./]+?\.ipa
This regex matches a file path that starts with _inbox/ followed by one character that is not a slash.
Next, the negative lookahead (?!//) checks that the next two characters at that position are not two consecutive slashes; it only inspects that one position, not the rest of the string. The character class [\w\d\_\-\./]+? then matches word characters (\w already covers letters, digits and underscores, so \d and \_ are redundant), dashes, dots and forward slashes. Finally, \.ipa matches the file extension; the pattern is not anchored, so add \z if the path must end there.
Hope this helps.

You can match subfolders with (?:[^/]+/)*:
%r{_inbox/(?:[^/]+/)*[^/]+\.ipa}i
Please see demo.
Also, you'd better escape the dot (\.) so that it matches a literal dot; an unescaped . matches any character.
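A quick way to check the pattern above against the paths from the question. This is only a sketch: the constant IPA_IN_INBOX, the helper ipa_paths and the notes.txt path are made up for illustration.

```ruby
# Pattern from the answer: "_inbox/", zero or more "dir/" segments, then a file name.
IPA_IN_INBOX = %r{_inbox/(?:[^/]+/)*[^/]+\.ipa}i

# Hypothetical helper: keeps only the paths the pattern accepts.
def ipa_paths(paths)
  paths.grep(IPA_IN_INBOX)
end

puts ipa_paths([
  '_inbox/NewApplication.ipa',
  '_inbox/Test/1/NewApplication.ipa',
  '_inbox/Test/notes.txt'
])
```

The pattern is not anchored, so it also accepts strings that merely contain such a path. If the extension has to end the string, appending \z is one option.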

Why use a regex rather than Dir#glob? If the current directory contains the "_inbox" subdirectory, the following will return the array you want:
Dir.glob(File.join("**","_inbox","**","*.ipa"))
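To see the glob in action without touching a real '_inbox', here is a small sketch that builds a throwaway directory tree first. The helper name ipa_files_under and the file names are assumptions for the demo; only Dir.glob with the "**" segments comes from the answer.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: every .ipa file under any "_inbox" directory below root.
def ipa_files_under(root)
  Dir.glob(File.join(root, '**', '_inbox', '**', '*.ipa')).sort
end

Dir.mktmpdir do |root|
  nested = File.join(root, '_inbox', 'Test', '1')
  FileUtils.mkdir_p(nested)
  FileUtils.touch(File.join(root, '_inbox', 'NewApplication.ipa'))
  FileUtils.touch(File.join(nested, 'NewApplication.ipa'))
  FileUtils.touch(File.join(nested, 'readme.txt'))

  # Prints the two .ipa paths; readme.txt is skipped.
  puts ipa_files_under(root)
end
```

Note that a glob like this follows the filesystem's case rules, whereas the regex in the question uses the i flag.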

Related

Glob string pattern for one or more files

I need a pattern for one or more files, name of each will be known before the matching occurs (but I do not know what they are right now).
For example, one occurrence could be two files, A.lkml and B.lkml, and another could be three: CDFDFDSADF.lkml, SD.lkml and R4545452.lkml. The filenames will be passed as a single argument with a single space as separator (so for example 1, I will see A.lkml B.lkml).
What I can be sure of:
all files end with .lkml
for each match, I need to add manifest.lkml to the list. For example, in example 1, the list should contain 3 filenames instead of 2: A.lkml, B.lkml and manifest.lkml
What puzzles me is that glob pattern matching doesn't seem to be able to express a logical "OR". I have tried "," and "|" to no avail. In my experiments I fixed the filenames, but in reality they change each time.
Update: I think brace expression such as {a.lkml,manifest.lkml} should work. Somehow it doesn't pass.
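Whether braces are honoured depends on the tool doing the matching, and the question doesn't say which one that is. As a rough sketch in Ruby (an assumption, since the question names no language), the brace pattern can be built from the space-separated argument and checked with File.fnmatch?, which only expands braces when File::FNM_EXTGLOB is passed. The helper brace_pattern is hypothetical.

```ruby
# Hypothetical helper: turns "A.lkml B.lkml" into "{A.lkml,B.lkml,manifest.lkml}".
def brace_pattern(arg, extra = 'manifest.lkml')
  names = arg.split(' ') | [extra] # union keeps order and avoids a duplicate manifest
  "{#{names.join(',')}}"
end

pattern = brace_pattern('A.lkml B.lkml')
puts pattern

# Braces are treated as alternatives only when FNM_EXTGLOB is set.
puts File.fnmatch?(pattern, 'manifest.lkml', File::FNM_EXTGLOB)
puts File.fnmatch?(pattern, 'C.lkml', File::FNM_EXTGLOB)
```

If the matcher in use has no equivalent of FNM_EXTGLOB, the braces are compared literally, which would explain a pattern that looks right but never matches.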

Regex for matching everything before trailing slash, or first question mark?

I'm trying to come up with a regex that will elegantly match everything in a URL AFTER the domain name and before the first ?, the last slash, or the end of the URL if neither of the two exists.
This is what I came up with but it seems to be failing in some cases:
regex = /[http|https]:\/\/.+?\/(.+)[?|\/|]$/
In summary:
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price/ should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price?id=2 should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
Please don't use Regex for this. Use the URI library:
require 'uri'
str_you_want = URI("http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price").path
Why?
See everything about this famous question for a good discussion of why these kinds of things are a bad idea.
In short, regexes are incredibly powerful tools, but when you're dealing with things defined by hundred-page convoluted standards, and there is already a library that does it faster, more easily, and more correctly, why reinvent the wheel?
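URI#path keeps the leading slash and any trailing slash, so the three URLs from the question need a little trimming to give the same result. A minimal sketch, assuming a helper named path_without_slashes (not part of the URI library):

```ruby
require 'uri'

# Hypothetical helper: the URL's path without its leading and trailing slash.
# URI#path already excludes the query string (everything from "?" on).
def path_without_slashes(url)
  URI(url).path.gsub(%r{\A/|/\z}, '')
end

base = 'http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price'
["#{base}/", "#{base}?id=2", base].each do |url|
  puts path_without_slashes(url)
end
```

All three calls print the same path, which is the behaviour the question asks for.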
If lookaheads are allowed
((2[0-9][0-9][0-9].*)(?=\?\w+)|(2[0-9][0-9][0-9].*)(?=/\s+)|(2[0-9][0-9][0-9].*).*\w)
Copy + Paste this in http://regexpal.com/
See here with ruby regex tester: http://rubular.com/r/uoLLvTwkaz
(?=...) is just a lookahead
I basically set up three matches from 2XXX up to (in this order):
(?=\?\w+) # lookahead for a question mark followed by one or more word characters
(?=/\s+) # lookahead for a slash followed by one or more whitespace characters
.*\w # match up to the last word character
I'm pretty sure that some parentheses were not needed but I just copy pasted.
There are essentially two OR | expressions in the (A|B|C) expression. The order matters since it's like a (ifthen|elseif|else) type deal.
You can probably factor out the prefix; I just assumed that you wanted 2XXX, where X is a digit, to match.
Also, save the pitchforks everyone: regular expressions are not always the best tool, but they're there for you when you need them.
Also, there is an xkcd (https://xkcd.com/208/) for everything.

Regular expression help to skip first occurrence of a special character while allowing for later special chars but no whitespace

I'm looking for words starting with a hashtag: "#yolo"
My regex for this was very simple: /#\w+/
This worked fine until I hit words that ended with a question mark: "#yolo?".
I updated my regex to allow for words and any non whitespace character as well: /#[\w\S]*/.
The problem is I sometimes need to pull a match from a word starting with two '#' characters, up until whitespace, that may contain a special character in it or at the end of the word (which I need to capture).
Example:
"##yolo?"
And I would like to end up with:
"#yolo?"
Note: the regular expressions are for Ruby.
P.S. I'm testing these out here: http://rubular.com/
Maybe this would work
#(#?[\S]+)
What about
#[^#\s]+
\w is a subset of ^\s (i.e. \S) so you don't need both. Also, I assume you don't want any more #s in the match, so we use [^#\s] which negates both whitespace and # characters.
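A short check of that last pattern with String#scan. The constant HASHTAG, the helper hashtags and the second sample tag are made up for the example.

```ruby
# "#" followed by one or more characters that are neither "#" nor whitespace.
HASHTAG = /#[^#\s]+/

# Hypothetical helper: all hashtags found in the text.
def hashtags(text)
  text.scan(HASHTAG)
end

puts hashtags('##yolo? and #swag!')
```

For "##yolo?" the first # cannot start a match because the next character is another #, so the match begins at the second one and runs up to the whitespace.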

Multi-Line Regex: Find A where B is absent

I have been looking through a lot on Regex lately and have seen a lot of answers involving the matching of one word, where a second word is absent. I have seen a lot of Regex Examples where I can have a Regex search for a given word (or any more complex regex in its place) and find where a word is missing.
It seems like this works very well on a line-by-line basis, but after including the multi-line mode it still doesn't seem to match properly.
Example: Match an entire file string where the word foo is included, but the word bar is absent from the file. What I have so far is (?m)^(?=.*?(foo))((?!bar).)*$ which is based off the example link. I have been testing with a Ruby regex tester, but I think it is an open-ended regex problem/question. It seems to match smaller pieces; I would like to have it either match or not match on the entire string as one big chunk.
In the provided example above, matches are found on a line by line basis it seems. What changes need to be made to the regex so it applies over the ENTIRE string?
EDIT: I know there are other, more efficient ways to solve this problem that don't involve a regex. I am not looking for a solution to the problem by other means; I am asking from a theoretical regex point of view. It has a multi-line mode (which looks to "work"), and it has negative/positive searching which can be combined on a line-by-line basis, so how come combining these two principles doesn't yield the expected result?
Sawa's answer can be simplified, all that's needed is a positive lookahead, a negative lookahead, and since you're in multiline mode, .* takes care of the rest:
/(?=.*foo)(?!.*bar).*/m
Multiline means that . matches \n also, and matches are greedy. So the whole string will match without the need for anchors.
Update
@Sawa makes a good point for the \A being necessary but not the \Z.
Actually, looking at it again, the positive lookahead seems unnecessary:
/\A(?!.*bar).*foo.*/m
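Some background that may explain the line-by-line behaviour in the question: in Ruby, ^ and $ always match at line boundaries, and the m flag only changes whether . matches a newline. Anchoring with \A ties the lookahead to the start of the whole string instead. A small sketch of the last pattern, with a made-up constant and helper name:

```ruby
# \A pins the match to the start of the string; with /m, "." also matches "\n".
FOO_WITHOUT_BAR = /\A(?!.*bar).*foo.*/m

# Hypothetical helper: true when the text contains foo and no bar anywhere.
def foo_without_bar?(text)
  text.match?(FOO_WITHOUT_BAR)
end

puts foo_without_bar?("first line\nfoo here\nlast line")
puts foo_without_bar?("first line\nfoo here\nbar here")
puts foo_without_bar?("first line\nnothing here")
```

Only the first call should report a match; the second contains bar on a later line and the third has no foo at all.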
A regex that matches an entire string that does not include bar is:
/\A(?!.*bar.*).*\z/m
and a regex that matches from the beginning of an entire string that includes foo is:
/\A.*foo/m
Since you want to satisfy both of these, take a conjunction of these by putting one of them in a lookahead:
/\A(?=.*foo)(?!.*bar.*).*\z/m

Regex to match all characters before file extension

I'm using this Windows application to batch rename a bunch of images. The application supports Regex, so I'm looking for an expression that will match everything (letters, numbers, hyphens, anything) before my file extension.
Thanks!
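Not quite enough information is given in the question, but this is probably what you want:
([^/\\]+?)(\.[^./\\]+)?$
The first capture group will contain your file's basename and the second capture group will contain the extension, including the '.' character, if it exists. The first group is lazy and the pattern is anchored with $; with a greedy first group the extension would be swallowed by the basename and the second group would always be empty.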
You can reference the two capture groups in the 'Replace' section with $1 and $2.
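The regex flavour of the renaming application isn't stated, so here is only a sketch in Ruby of how a lazy basename group plus an optional, end-anchored extension group splits a file name. The constant NAME_AND_EXT and the helper split_name are invented for the example.

```ruby
# Lazy basename, then an optional final extension, anchored to the whole name.
NAME_AND_EXT = %r{\A([^/\\]+?)(\.[^./\\]+)?\z}

# Hypothetical helper: [basename, extension-or-nil], or nil when nothing matches.
def split_name(filename)
  m = NAME_AND_EXT.match(filename)
  m && [m[1], m[2]]
end

p split_name('photo.jpg')
p split_name('archive.tar.gz')
p split_name('README')
```

Only the last extension is split off, so 'archive.tar.gz' keeps 'archive.tar' as its basename. In Ruby itself, File.basename(name, '.*') and File.extname(name) cover the same ground without a regex.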
