Regex to see if directory begins with name - ruby

I was using some code such as the following in my Ruby script:
if File.dirname(path) =~ /^www\.example\.com\/foo/
This works when a file is only one subdirectory deep underneath /foo, but the condition fails if the file is underneath, say, /foo/bar. How can the regex above be modified so that File.dirname matches any file located anywhere beneath that prefix, not just one level deep?

This is one of those cases where I'd eschew a regex entirely:
if path.split(File::SEPARATOR)[0,2] == ['www.example.com','foo']
More readable, no escaping needed.
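A minimal sketch of the same idea, applied to File.dirname so that files nested more deeply are also accepted. The helper name under_prefix? and the sample paths are hypothetical and only serve as illustration.

```ruby
# Hypothetical helper: true when the directory of `path` starts with the
# given leading components, no matter how deep the file is nested below them.
def under_prefix?(path, prefix)
  parts = File.dirname(path).split(File::SEPARATOR)
  parts.first(prefix.length) == prefix
end

PREFIX = ['www.example.com', 'foo'].freeze

# Sample paths invented for the demonstration.
[
  'www.example.com/foo/a.txt',
  'www.example.com/foo/bar/a.txt',
  'www.example.com/foobar/a.txt'
].each do |path|
  puts "#{path}: #{under_prefix?(path, PREFIX)}"
end
```

Comparing whole components also avoids accepting a sibling directory such as foobar, which a bare prefix regex would let through.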

Try File.fnmatch. It uses glob-style matching patterns (similar to regexes, but not the same). For your case we could use:
**foo**
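Matches every path that contains the string foo. Without File::FNM_PATHNAME, * also matches /, so this is a substring test and is not limited to a directory named exactly foo.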
File.fnmatch('**foo**','foo/test.txt')
#> true
File.fnmatch('**foo**','/boo/foo/test.txt')
#> true
File.fnmatch('**foo**','/boo/test.txt')
#> false
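A short sketch of how the pattern behaves on a less friendly input, plus a pattern anchored to the prefix from the question. The constant PREFIX_PATTERN, the helper under_foo? and the sample paths are assumptions made for illustration.

```ruby
# Without File::FNM_PATHNAME, * also matches "/", so '**foo**' is effectively
# a substring test: a file named food.txt matches although there is no
# directory called foo in its path.
puts File.fnmatch('**foo**', '/boo/food.txt')

# Hypothetical pattern anchored at the start of the path. The trailing *
# spans any number of directory levels because FNM_PATHNAME is not set.
PREFIX_PATTERN = 'www.example.com/foo/*'

def under_foo?(path)
  File.fnmatch(PREFIX_PATTERN, path)
end

puts under_foo?('www.example.com/foo/a.txt')
puts under_foo?('www.example.com/foo/bar/a.txt')
puts under_foo?('www.example.com/foobar/a.txt')
```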

Related

Check if string is a glob pattern

The input is a string that can be either a plain path (e.g. /home/user/1.txt) or a glob pattern (e.g. /home/user/*.txt).
If the string is a glob pattern, I want an array of its matches; if it is just a plain path, I want an array with a single element, the path itself.
So I need to check whether the string contains unescaped glob symbols: if it does, call Pathname.glob() to get the matches; otherwise just return an array containing the string.
How can I check if string is a glob pattern?
UPDATE
I had this question while implementing homebrew cask glob pattern support for zap stanza.
The solution I used was a small refactoring that removes the need to check whether a string is a glob pattern.
If the string is a glob pattern, I want an array of its matches; if it is just a plain path, I want an array with a single element, the path itself.
They're both valid glob patterns. One contains a wildcard, one does not. Run them both through Pathname.glob() and you'll always get an array back. Bonus, it'll check if it matches anything.
$ irb
2.3.3 :001 > require "pathname"
=> true
2.3.3 :002 > Pathname.glob("test.data")
=> [#<Pathname:test.data>]
2.3.3 :003 > Pathname.glob("test.*")
=> [#<Pathname:test.asm>, #<Pathname:test.c>, #<Pathname:test.cpp>, #<Pathname:test.csv>, #<Pathname:test.data>, #<Pathname:test.dSYM>, #<Pathname:test.html>, #<Pathname:test.out>, #<Pathname:test.php>, #<Pathname:test.pl>, #<Pathname:test.py>, #<Pathname:test.rb>, #<Pathname:test.s>, #<Pathname:test.sh>]
2.3.3 :004 > Pathname.glob("doesnotexist")
=> []
This is a great way to normalize and validate your data early, so the rest of the program doesn't have to.
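The irb session above can be reproduced in a self-contained way with a temporary directory. This is only a sketch: the helper name expand_paths and the file names are hypothetical.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical helper: treat every input as a glob pattern and return the
# matching paths, so callers never need to know whether it had wildcards.
def expand_paths(pattern)
  Pathname.glob(pattern).sort
end

Dir.mktmpdir do |dir|
  # Create two empty sample files to glob against.
  %w[1.txt 2.txt].each { |name| File.write(File.join(dir, name), '') }

  ['1.txt', '*.txt', 'missing.txt'].each do |input|
    names = expand_paths(File.join(dir, input)).map { |m| m.basename.to_s }
    puts "#{input} -> #{names.inspect}"
  end
end
```

A literal path that exists comes back as a one-element array, a wildcard comes back as all of its matches, and a path that matches nothing comes back as an empty array.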
If you really want to figure out if something is a literal path or a glob, you could try scanning for any special glob characters, but that rapidly gets complicated and error prone. It requires knowing how glob works in detail and remembering to check for quoting and escaping. foo* has a glob pattern. foo\* does not. foo[123] does. foo\[123] does not. And I'm not sure what foo[123\] is doing, I think it counts as a non-terminated set.
In general, you want to avoid writing code that has to reproduce the inner workings of another piece of code. If there was a Pathname.has_glob_chars you could use that, but there isn't such a thing.
Pathname.glob uses File.fnmatch to do the globbing and you can use that without touching the filesystem. You might be able to come up with something using that, but I can't make it work. I thought maybe only a literal path will match itself, but foo* defeats that.
Instead, check if it exists.
Pathname.new(path).exist?
If it exists, it was a real path to a real file. If it didn't exist, it might have been a real path, or it might be a glob. That's probably good enough.
You can also check by looking to see if Pathname.glob(path) returned a single element that matches the original path. Note that when matching paths it's important to normalize both sides with cleanpath.
paths = Pathname.glob(path)
if paths.size == 1 && paths[0].cleanpath == Pathname.new(path).cleanpath
  puts "#{path} is a literal path"
elsif paths.size == 0
  puts "#{path} matched nothing"
else
  puts "#{path} was a glob"
end

Regular expression help: how to ignore every path that isn’t a CSS file

I have a CSS framework submodule in my Git repo that includes a bunch of README, component.json and other files. I don’t want to modify or delete the files because I’d imagine it’d cause problems when updates are pushed to the submodule. Yet Middleman wants to process them.
I currently have this in my config.rb file:
# Ignore everything that's not a CSS file inside inuit.css
ignore 'css/inuit.css/*.html'
ignore 'css/inuit.css/*.json'
ignore 'css/inuit.css/LICENSE'
How could I express this with a file pattern or a regex?
I’m not familiar with Middleman, but doesn’t this work?
ignore /^css\/inuit\.css\/.*(?<![.]css)$/
Since ignore can take a regex, pass ignore a Ruby regex // instead of a string "" with a filename glob. In the regex, use negative lookahead (?!) and the end-of-string anchor $ to check that the filename doesn’t end in “.css”.
ignore /^ css\/inuit\.css\/ (?: [^.]+ | .+ \. (?!css) \w+ ) $/ix
This regex correctly handles all of these test cases:
Should match:
css/inuit.css/abc.html
css/inuit.css/thecssthing.json
css/inuit.css/sub/in_a_folder.html
css/inuit.css/sub/crazily.named.css.json
css/inuit.css/sub/crazily.css.named.json
css/inuit.css/LICENSE
Shouldn’t match:
css/inuit.css/realcss.css
css/inuit.css/main.css
css/inuit.css/sub/in_a_folder.css
css/inuit.css/sub/crazily.css.named.css
css/inuit.css/sub/crazily.named.css.css
The first alternative of the (?:) non-capturing group handles the case of files with no extension (no “.”). Otherwise, the second alternative checks that the last “.” in the path is not followed by “css”, which would indicate a “.css” extension.
I use the x flag to ignore whitespace in the regex, so that I can add spaces in the regex to make it clearer.
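A small sketch that runs both regexes from the answers above against the listed test cases. The constant names are made up for illustration; the patterns and paths are copied from the text.

```ruby
# Both patterns as given above: one with a lookbehind, one with a lookahead.
LOOKBEHIND_RE = /^css\/inuit\.css\/.*(?<![.]css)$/
LOOKAHEAD_RE  = /^ css\/inuit\.css\/ (?: [^.]+ | .+ \. (?!css) \w+ ) $/ix

# Paths that should be ignored (everything that is not a CSS file).
SHOULD_MATCH = %w[
  css/inuit.css/abc.html
  css/inuit.css/thecssthing.json
  css/inuit.css/sub/in_a_folder.html
  css/inuit.css/sub/crazily.named.css.json
  css/inuit.css/sub/crazily.css.named.json
  css/inuit.css/LICENSE
].freeze

# CSS files that must be left alone.
SHOULD_NOT_MATCH = %w[
  css/inuit.css/realcss.css
  css/inuit.css/main.css
  css/inuit.css/sub/in_a_folder.css
  css/inuit.css/sub/crazily.css.named.css
  css/inuit.css/sub/crazily.named.css.css
].freeze

# True when the regex matches every "should" path and no "shouldn't" path.
def passes_all?(regex)
  SHOULD_MATCH.all? { |path| path =~ regex } &&
    SHOULD_NOT_MATCH.none? { |path| path =~ regex }
end

puts "lookbehind version passes: #{passes_all?(LOOKBEHIND_RE)}"
puts "lookahead version passes:  #{passes_all?(LOOKAHEAD_RE)}"
```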

why doesn't *.abc match a file named .abc?

I thought I understood wildcards, till this happened to me. Essentially, I'm looking for a wild card pattern that would return all files that are not named .gitignore. I came up with this, which seems to work for all cases I could conjure:
ls *[!{gitignore}]
To really validate if this works, I thought I'd negate the expression and see if it returns the file named .gitignore (actually any file that ended with gitignore; so 1.gitignore should also be returned). To that effect, I thought the negated expression would be:
ls *[{gitignore}]
However, this expression doesn't return a file named .gitignore (although it does return a file named 1.gitignore).
Essentially, my question, after simplification, boils down to:
Why doesn't *.abc match a file that is named .abc
I think I can take it from there.
PS:
I am working on Mac OSX Lion (10.7.4)
I wanted to add a clause to .gitignore such that I would ignore every file, except .gitignore in a given folder. So I ended up adding * in the .gitignore file. Result was, git ended up ignoring .gitignore :)
From the numerous searches I've made on Google: "Use the asterisk character (*) to represent zero or more characters."
I assume you're using Bash. From the Bash manual:
When a pattern is used for filename expansion, the character ‘.’ at the start of a filename or immediately following a slash must be matched explicitly, unless the shell option dotglob is set.
.gitignore patterns, however, are treated differently:
Otherwise, git treats the pattern as a shell glob suitable for consumption by fnmatch(3) with the FNM_PATHNAME flag: wildcards in the pattern will not match a / in the pathname.
According to the fnmatch(3) docs, a leading dot has to be explicitly matched only if the FNM_PERIOD flag is set, so *gitignore as a gitignore pattern would match .gitignore.
There is an easier way to accomplish this, though. To have .gitignore ignore everything except .gitignore:
*
!.gitignore
If you want to ignore everything except the gitignore file, use this as the file:
*
!.gitignore
Lines starting with an exclamation point are interpreted as exceptions.
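The leading-dot rule quoted from the Bash manual can be observed with Ruby's File.fnmatch, which by default also requires a leading period to be matched explicitly. This is a sketch using Ruby's matcher, not Bash or git themselves; File::FNM_DOTMATCH is used here as a rough counterpart of a matcher that does not enforce the rule.

```ruby
# Default behaviour: a leading "." in the name must be matched by a literal
# "." at the start of the pattern, so * alone does not cover it.
puts File.fnmatch('*.abc', '1.abc')
puts File.fnmatch('*.abc', '.abc')

# With FNM_DOTMATCH the wildcard may match the leading period as well.
puts File.fnmatch('*.abc', '.abc', File::FNM_DOTMATCH)
puts File.fnmatch('*gitignore', '.gitignore', File::FNM_DOTMATCH)

# An explicit leading dot in the pattern always works.
puts File.fnmatch('.*', '.gitignore')
```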

What does two asterisks together in file path mean?

What does the following file path mean?
$(Services_Jobs_Drop_Path)\**\*.config
The variable just holds some path, nothing interesting. I'm a lot more concerned with what the hell the ** means.
Any ideas?
P.S. The following path is used in msbuild scripts, if it helps.
\**\ This pattern is often used in the Copy task for recursive folder tree traversal. Basically it means that all files with the .config extension are processed from all subdirectories of the $(Services_Jobs_Drop_Path) path.
MSDN, Using Wildcards to Specify Items:
You can use the **, *, and ? wildcard characters to specify a group of
files as inputs for a build instead of listing each file separately.
The ? wildcard character matches a single character.
The * wildcard character matches zero or more characters.
The ** wildcard character sequence matches a partial path.
MSDN, Specifying Inputs with Wildcards
To include all .jpg files in the Images directory and subdirectories
Use the following Include attribute:
Include="Images\**\*.jpg"
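For comparison only, Ruby's Dir.glob has a similar recursive ** wildcard. The sketch below is an analogy, not MSBuild itself: the helper name config_files and the directory layout are hypothetical, and it shows Ruby's behaviour for a pattern of the same shape.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: list every *.config file under root, at any depth,
# as paths relative to root.
def config_files(root)
  Dir.glob(File.join(root, '**', '*.config'))
     .map { |path| path[(root.length + 1)..-1] }
     .sort
end

Dir.mktmpdir do |root|
  # Build a small sample tree with config files at three depths.
  FileUtils.mkdir_p(File.join(root, 'sub', 'deep'))
  %w[a.config sub/b.config sub/deep/c.config sub/d.txt].each do |rel|
    File.write(File.join(root, rel), '')
  end

  puts config_files(root)
end
```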

Regular expression to match only the first file in a RAR file set

To see what file to invoke the unrar command on, one needs to determine which file is the first in the file set.
Here are some sample file names, of which - naturally - only the first group should be matched:
yes.rar
yes.part1.rar
yes.part01.rar
yes.part001.rar
no.part2.rar
no.part02.rar
no.part002.rar
no.part011.rar
One (limited) way to do it with PCRE compatible regexps is this:
.*(?:(?<!part\d\d\d|part\d\d|\d)\.rar|\.part0*1\.rar)
This did not work in Ruby when I tested it at Rejax however.
How would you write one Ruby compatible regular expression to match only the first file in a set of RAR files?
Don't rely on the names of the files to determine which one is first. You're going to end up finding an edge case where you get the wrong file.
RAR's headers will tell you which file is the first one in the volume set, assuming the files were created with a somewhat recent version of RAR.
HEAD_FLAGS Bit flags:
2 bytes
0x0100 - First volume (set only by RAR 3.0 and later)
So open up each file and examine the RAR headers, looking specifically for the flag that indicates which file is the first volume. This will never fail, as long as the archive isn't corrupt. I have done my own tests with spanning RAR archives and their headers are correct according to the link above.
This is a much, much safer way of determining which file is first in a set like this.
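A rough sketch of such a header check in Ruby. Only the 2-byte HEAD_FLAGS field and the 0x0100 flag come from the text above; the rest of the layout (a 7-byte marker block followed by an archive header with a 2-byte CRC, a 1-byte type of 0x73 and then the flags, little-endian) is an assumption based on the RAR 4.x format and does not cover the newer RAR 5 format. The method name first_rar_volume? is hypothetical.

```ruby
require 'stringio'

# Assumed RAR 4.x layout: 7-byte marker block, then the archive header
# (HEAD_CRC: 2 bytes, HEAD_TYPE: 1 byte, HEAD_FLAGS: 2 bytes, little-endian).
RAR4_MARKER       = "Rar!\x1A\x07\x00".b
MAIN_HEAD_TYPE    = 0x73
FIRST_VOLUME_FLAG = 0x0100

# Hypothetical helper: reads the start of an IO opened in binary mode and
# reports whether the "first volume" bit is set in the archive header.
def first_rar_volume?(io)
  return false unless io.read(7) == RAR4_MARKER

  header = io.read(5)
  return false if header.nil? || header.bytesize < 5

  _crc, type, flags = header.unpack('vCv')
  type == MAIN_HEAD_TYPE && (flags & FIRST_VOLUME_FLAG) != 0
end

# Synthetic header bytes for demonstration only, not a real archive.
fake_first = RAR4_MARKER + [0, MAIN_HEAD_TYPE, 0x0101].pack('vCv')
fake_later = RAR4_MARKER + [0, MAIN_HEAD_TYPE, 0x0001].pack('vCv')

puts first_rar_volume?(StringIO.new(fake_first))
puts first_rar_volume?(StringIO.new(fake_later))
```

With real files the same method could be called as File.open(name, 'rb') { |f| first_rar_volume?(f) }.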
The short answer is that it's not possible to do this with your single regex on Ruby 1.8: its engine supports lookahead but not lookbehind assertions (the (?<! construct in your example regex), which is why your regex doesn't work. This leaves you with two options.
1) Use more than one regex to do it.
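def is_first_rar(filename)
  # Multi-part name: first only if the part number is 1 (leading zeros allowed).
  if filename =~ /part(\d+)\.rar$/
    $1.to_i == 1
  else
    # No part number: any plain .rar file counts as the first volume.
    !(filename =~ /\.rar$/).nil?
  end
end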
2) Use Oniguruma, the regex engine of Ruby 1.9. It supports lookbehind assertions, and you can install it as a gem for Ruby 1.8. After that, you can do something like this:
def is_first_rar(filename)
  reg = Oniguruma::ORegexp.new('.*(?:(?<!part\d\d\d|part\d\d|\d)\.rar|\.part0*1\.rar)')
  match = reg.match(filename)
  return match != nil
end
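On Ruby 1.9 and later the built-in regex engine supports lookbehind, so no extra gem is needed. The sketch below uses a simplified variant of the question's pattern: the part\d\d alternatives are dropped because (?<!\d) already covers them, and \z anchors are added on the assumption that the name must end in .rar. The names FIRST_RAR and first_rar? are hypothetical.

```ruby
# Simplified single regex: either ".rar" not preceded by a digit, or a
# ".partN.rar" suffix whose number is 1 with optional leading zeros.
FIRST_RAR = /(?<!\d)\.rar\z|\.part0*1\.rar\z/

def first_rar?(filename)
  !(filename =~ FIRST_RAR).nil?
end

# Sample names taken from the question.
%w[
  yes.rar yes.part1.rar yes.part01.rar yes.part001.rar
  no.part2.rar no.part02.rar no.part002.rar no.part011.rar
].each do |name|
  puts "#{name}: #{first_rar?(name)}"
end
```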
Personally I wouldn't use (extended) regular expressions in this case (or at least not just one to do it all). What's wrong with coding this in, for example, a few ifs?
I am no regex expert but here is my attempt
^(yes|no)\.(rar|part0*1\.rar)$
Replace "yes|no" with the actual file name. I matched it against your examples to see if it would only match the first set hence the "yes|no" in the regex.
UPDATE: fixed as per the comment. Not sure why the user would not know the filename, so I did not fix that part...
