How does the wildcard ** work within fnmatch? - ruby

I noticed a surprising behavior of Ruby's fnmatch function:
File.fnmatch('**.rb', 'main.rb') #=> true
File.fnmatch('**.rb', './main.rb') #=> false
According to the Ruby reference, ** will:
Matches directories recursively or files expansively.
So why doesn't it expand and match ./main.rb?

This behavior is actually documented, but it's easy to miss. Buried in the examples it says:
wildcard doesn't match leading period by default.
The path ./main.rb starts with a period, so the wildcard refuses to match it. To let wildcards match a leading period, you need to specify the File::FNM_DOTMATCH flag:
File.fnmatch('**.rb', './main.rb', File::FNM_DOTMATCH)
=> true
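As a quick check of the leading-period rule, here is a minimal sketch that runs the calls from the question next to the '*' versus '.profile' pair from the File.fnmatch documentation examples. The helper name leading_dot_demo is made up for this illustration.

```ruby
# Each entry is [pattern, path, flags]. The patterns and paths come from the
# question and from the File.fnmatch documentation examples.
def leading_dot_demo
  [
    ['**.rb', 'main.rb', 0],
    ['**.rb', './main.rb', 0],
    ['**.rb', './main.rb', File::FNM_DOTMATCH],
    ['*', '.profile', 0],
    ['*', '.profile', File::FNM_DOTMATCH]
  ].map do |pattern, path, flags|
    [pattern, path, flags, File.fnmatch(pattern, path, flags)]
  end
end

# Print one line per case so the effect of FNM_DOTMATCH is visible.
leading_dot_demo.each do |pattern, path, flags, matched|
  puts format('%-6s %-11s flags=%d => %s', pattern, path, flags, matched)
end
```

Only the paths that begin with a period change their result when the flag is added.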

Related

Ruby frozen string comment syntax difference

There are two ways of enabling all strings in a file to be implicitly frozen.
# frozen-string-literal: true
# frozen_string_literal: true
Is there a difference between these two syntaxes?
Thanks!
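The answer you link to never uses the magic comment # frozen-string-literal: true, only # frozen_string_literal: true. The underscore spelling is the one the Ruby documentation uses, so prefer it. As far as I can tell, MRI's parser converts hyphens in magic comment names to underscores before comparing them, so the hyphenated spelling is accepted as well, but I would not rely on that.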
The other way to enable frozen string literals is to run the application with the --enable=frozen-string-literal flag.

Check if string is a glob pattern

My input is a string that can be either a plain path (e.g. /home/user/1.txt) or a glob pattern (e.g. /home/user/*.txt).
I want to get an array of matches if the string is a glob pattern, and an array with a single element (the path itself) if it is a plain path.
So I need to check whether the string contains unescaped glob characters: if it does, call Pathname.glob() to get the matches; otherwise, return an array containing just the string.
How can I check whether a string is a glob pattern?
UPDATE
I had this question while implementing Homebrew Cask glob pattern support for the zap stanza.
The solution I used was a small refactoring that avoids the need to check whether the string is a glob pattern.
I want to get an array of matches if the string is a glob pattern, and an array with a single element (the path itself) if it is a plain path.
They're both valid glob patterns. One contains a wildcard, one does not. Run them both through Pathname.glob() and you'll always get an array back. Bonus, it'll check if it matches anything.
$ irb
2.3.3 :001 > require "pathname"
=> true
2.3.3 :002 > Pathname.glob("test.data")
=> [#<Pathname:test.data>]
2.3.3 :003 > Pathname.glob("test.*")
=> [#<Pathname:test.asm>, #<Pathname:test.c>, #<Pathname:test.cpp>, #<Pathname:test.csv>, #<Pathname:test.data>, #<Pathname:test.dSYM>, #<Pathname:test.html>, #<Pathname:test.out>, #<Pathname:test.php>, #<Pathname:test.pl>, #<Pathname:test.py>, #<Pathname:test.rb>, #<Pathname:test.s>, #<Pathname:test.sh>]
2.3.3 :004 > Pathname.glob("doesnotexist")
=> []
This is a great way to normalize and validate your data early, so the rest of the program doesn't have to.
If you really want to figure out if something is a literal path or a glob, you could try scanning for any special glob characters, but that rapidly gets complicated and error-prone. It requires knowing how glob works in detail and remembering to check for quoting and escaping. foo* is a glob pattern. foo\* is not. foo[123] is. foo\[123] is not. And I'm not sure what foo[123\] is doing; I think it counts as an unterminated set.
In general, you want to avoid writing code that has to reproduce the inner workings of another piece of code. If there was a Pathname.has_glob_chars you could use that, but there isn't such a thing.
Pathname.glob uses File.fnmatch to do the globbing and you can use that without touching the filesystem. You might be able to come up with something using that, but I can't make it work. I thought maybe only a literal path will match itself, but foo* defeats that.
Instead, check if it exists.
Pathname.new(path).exist?
If it exists, it was a real path to a real file. If it didn't exist, it might have been a real path, or it might be a glob. That's probably good enough.
You can also check by looking to see if Pathname.glob(path) returned a single element that matches the original path. Note that when matching paths it's important to normalize both sides with cleanpath.
require "pathname"

paths = Pathname.glob(path)
if paths.size == 1 && paths[0].cleanpath == Pathname.new(path).cleanpath
  puts "#{path} is a literal path"
elsif paths.empty?
  puts "#{path} matched nothing"
else
  puts "#{path} was a glob"
end
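To make the "always glob" approach concrete, here is a minimal sketch that runs a literal path, a wildcard pattern, and a missing path through Pathname.glob inside a temporary directory. The helper names expand_path_or_glob and literal_existing_path? are hypothetical, and the sketch assumes the temporary directory path itself contains no glob characters.

```ruby
require "pathname"
require "tmpdir"

# Hypothetical helper: run every input through Pathname.glob, literal or not.
# Sorting only makes the output order predictable.
def expand_path_or_glob(pattern)
  Pathname.glob(pattern).sort
end

# Hypothetical helper: true only when the string names an existing path.
def literal_existing_path?(string)
  Pathname.new(string).exist?
end

Dir.mktmpdir do |dir|
  # Create a few empty files to match against.
  %w[1.txt 2.txt notes.md].each { |name| File.write(File.join(dir, name), "") }

  inputs = [
    File.join(dir, "1.txt"),      # plain path that exists
    File.join(dir, "*.txt"),      # glob pattern
    File.join(dir, "missing.txt") # plain path that does not exist
  ]

  inputs.each do |input|
    names = expand_path_or_glob(input).map { |p| p.basename.to_s }
    puts "#{File.basename(input)}: #{names.inspect} " \
         "(exists as a literal path: #{literal_existing_path?(input)})"
  end
end
```

The caller gets an array in every case, and an empty array signals that nothing matched.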

Regex to see if directory begins with name

I was using some code such as the following in my Ruby script:
if File.dirname(path) =~ /^www\.example\.com\/foo/
This works great when a file is only one subdirectory deep underneath /foo, but the condition fails if the file is deeper, for example underneath /foo/bar. How can the regex above be modified so that File.dirname matches any file at or below that directory, and not just one level deep?
This is one of those cases where I'd eschew a regex entirely:
if path.split(File::SEPARATOR)[0,2] == ['www.example.com','foo']
More readable, no escaping needed.
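A minimal sketch of the component comparison above, wrapped in a helper so it can be tried on several paths. The name under_prefix? and the sample paths are made up for illustration. Unlike the one-liner, this version compares against File.dirname(path), so a file that happens to be named foo does not count as the directory.

```ruby
# Hypothetical helper: true when the directory part of +path+ starts with
# the given list of path components.
def under_prefix?(path, prefix_parts)
  dir_parts = File.dirname(path).split(File::SEPARATOR)
  dir_parts.first(prefix_parts.size) == prefix_parts
end

EXAMPLE_PREFIX = %w[www.example.com foo].freeze

%w[
  www.example.com/foo/a.txt
  www.example.com/foo/bar/a.txt
  www.example.com/foobar/a.txt
  www.example.com/a.txt
].each do |path|
  puts "#{path}: #{under_prefix?(path, EXAMPLE_PREFIX)}"
end
```

Comparing whole components means www.example.com/foobar is not mistaken for www.example.com/foo, which a plain prefix regex without a trailing separator would allow.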
Try File.fnmatch. It uses glob-style matching patterns (similar to regular expressions, but not the same). For your case we could use:
**foo**
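Matches all files whose path contains foo. Note that without File::FNM_PATHNAME this is a plain substring match, so a directory called food would match as well.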
File.fnmatch('**foo**','foo/test.txt')
#> true
File.fnmatch('**foo**','/boo/foo/test.txt')
#> true
File.fnmatch('**foo**','/boo/test.txt')
#> false
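One caveat worth checking: with no flags, the wildcards in '**foo**' also match "/", so the pattern behaves like a substring test. The sketch below compares it with a component-based check. Both helper names are hypothetical, and the food path is an extra sample that is not in the original answer.

```ruby
# The pattern from the answer: true for any path that contains "foo".
def fnmatch_contains_foo?(path)
  File.fnmatch('**foo**', path)
end

# Hypothetical alternative: true only when a directory is named exactly "foo".
def foo_directory?(path)
  File.dirname(path).split(File::SEPARATOR).include?('foo')
end

%w[
  foo/test.txt
  /boo/foo/test.txt
  /boo/food/test.txt
  /boo/test.txt
].each do |path|
  puts "#{path}: fnmatch=#{fnmatch_contains_foo?(path)} " \
       "exact directory=#{foo_directory?(path)}"
end
```

The two checks agree except for /boo/food/test.txt, where only the fnmatch pattern reports a match.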

Regex that matches files that don't end in test.js, but ends in .js

Here is a list of sample strings to match against:
These should be matched:
hello.js
test.hello.js
test.js.js
test.js.hello.js
test.js.test.js.test.js.dss.js
test.js.dss.js
test.js.js.js.js.dummy.js
These should not be matched:
fun.css
test.js
fun.test.js
test.test.js
test.js.test.js
hell.cs
test.test.js
A simple option is to use a negative lookahead to check the line does not end with .test.js:
^(?!.*\.test\.js$).*\.js$
That one doesn't cover test.js, which is a special case, but you can use ^(?!(?:^|.*\.)test\.js$).*\.js$ to get around it (or other similar options).
Working example: http://www.rubular.com/r/UVvRZMcnjl
If your flavor supports lookbehinds (JavaScript and old Ruby don't, but Ruby 1.9.2 does), you may use:
^.*(?<!^test|\.test)\.js$
Working example: http://www.rubular.com/r/lBCphNYEeS
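The lookahead pattern with the test.js special case can be checked against the sample names from the question. This is a minimal sketch: the constant and method names are made up, and only the lookahead variant is exercised here.

```ruby
# Sample names copied from the question.
SHOULD_MATCH = %w[
  hello.js test.hello.js test.js.js test.js.hello.js
  test.js.test.js.test.js.dss.js test.js.dss.js test.js.js.js.js.dummy.js
].freeze
SHOULD_NOT_MATCH = %w[
  fun.css test.js fun.test.js test.test.js test.js.test.js hell.cs
].freeze

# Lookahead pattern from the answer, including the test.js special case.
NON_TEST_JS = /^(?!(?:^|.*\.)test\.js$).*\.js$/

def non_test_js?(name)
  NON_TEST_JS.match?(name)
end

# Collect any sample the pattern classifies differently from the question.
wrongly_rejected = SHOULD_MATCH.reject { |name| non_test_js?(name) }
wrongly_accepted = SHOULD_NOT_MATCH.select { |name| non_test_js?(name) }

puts "wrongly rejected: #{wrongly_rejected.inspect}"
puts "wrongly accepted: #{wrongly_accepted.inspect}"
```

Both lists should come out empty if the pattern classifies every sample as the question asks.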
Since the question tags were changed from javascript to ruby, use a simple lookbehind:
.*(?<!test)\.js$
See it here in action: http://rubular.com/r/yVH2xfsRqw

How do I test whether a string would match a glob in Ruby?

Without hitting the filesystem, is it possible to see whether the glob "foo*" would match "food" in Ruby?
Background: one of my scripts produces files, and I'd like to unit test that other scripts would be able to detect such files with their current glob.
Yes, it is possible using the fnmatch method:
File.fnmatch("foo*", "food") #=> true
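For a unit test, it may help to wrap the call in a small helper. The sketch below assumes that combining File::FNM_PATHNAME (wildcards do not cross "/") with File::FNM_EXTGLOB (enables {a,b}) gives a reasonable approximation of how a glob treats the pattern; it is not guaranteed to be identical to Dir.glob. The helper name glob_would_match? and the sample file names are hypothetical.

```ruby
# Assumed approximation of glob behavior: wildcards stop at "/" and
# brace alternatives such as {a,b} are expanded.
GLOB_LIKE_FLAGS = File::FNM_PATHNAME | File::FNM_EXTGLOB

# Hypothetical helper: checks a pattern against a name without touching
# the filesystem.
def glob_would_match?(pattern, filename)
  File.fnmatch(pattern, filename, GLOB_LIKE_FLAGS)
end

[
  ['foo*', 'food'],
  ['foo*', 'seafood'],
  ['foo*', 'dir/food'],
  ['out/{a,b}.log', 'out/a.log'],
  ['out/{a,b}.log', 'out/c.log']
].each do |pattern, filename|
  puts "#{pattern} vs #{filename}: #{glob_would_match?(pattern, filename)}"
end
```

In a test suite, each pair could become one assertion on the file names the producing script is expected to write.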

Resources