Here are two ways to use glob to recursively list directories:
Dir.glob("**/*/")
Dir.glob("**/")
The output appears to be the same, at least for a small subtree. Is there a difference between those two commands I am missing out on?
The ** matches zero or more directories. Adding a * after it requires at least one more path component, so the pattern no longer matches the root directory itself; in effect it becomes one or more:
a = Dir.glob('/tmp/**/*/').sort
b = Dir.glob('/tmp/**/').sort
b.size => 19
a.size => 18
b - a => ["/tmp/"]
Without a leading constant path, though, there doesn't appear to be a difference: zero-length matches aren't interesting and are left out of the results.
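A minimal Ruby sketch of the comparison above, run against a throwaway tree rather than /tmp so the result is reproducible. The helper name compare_dir_globs and the directory names a, b and c are made up for illustration.

```ruby
require "tmpdir"
require "fileutils"

# Build a temporary tree, then run both patterns against it: once with an
# absolute prefix and once relative to the tree's root.
def compare_dir_globs
  Dir.mktmpdir do |root|
    # Hypothetical layout: root/a/b, root/c and one file at the top.
    FileUtils.mkdir_p(File.join(root, "a", "b"))
    FileUtils.mkdir_p(File.join(root, "c"))
    FileUtils.touch(File.join(root, "top.txt"))

    zero_or_more = Dir.glob("#{root}/**/").sort
    one_or_more  = Dir.glob("#{root}/**/*/").sort

    relative = Dir.chdir(root) do
      [Dir.glob("**/").sort, Dir.glob("**/*/").sort]
    end

    {
      root: root,
      only_in_zero_or_more: zero_or_more - one_or_more,
      relative_zero_or_more: relative[0],
      relative_one_or_more: relative[1]
    }
  end
end

result = compare_dir_globs
p result[:only_in_zero_or_more]
p result[:relative_zero_or_more] == result[:relative_one_or_more]
```

For this tree, only_in_zero_or_more holds just the root with a trailing slash, and the two relative lists are identical.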
In that case, no, there isn't.
But there are cases where that kind of distinction matters. If the patterns were instead **/* and **/*/* to recursively match files rather than directories, the first one would include files in the current directory, while the second would only list files at least one level down from the current directory, since the /*/ in the middle has to match something.
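The file case can be sketched the same way. This is only an illustration: files_matching and build_sample_tree are hypothetical helpers, and the sample tree is invented.

```ruby
require "tmpdir"
require "fileutils"

# Return the regular files (not directories) that `pattern` matches,
# as paths relative to `root`.
def files_matching(root, pattern)
  Dir.chdir(root) do
    Dir.glob(pattern).select { |entry| File.file?(entry) }.sort
  end
end

# Hypothetical sample tree: one file at the top, one a level down,
# and one two levels down.
def build_sample_tree(root)
  FileUtils.mkdir_p(File.join(root, "a", "b"))
  FileUtils.touch(File.join(root, "top.txt"))
  FileUtils.touch(File.join(root, "a", "inner.txt"))
  FileUtils.touch(File.join(root, "a", "b", "deep.txt"))
end

Dir.mktmpdir do |root|
  build_sample_tree(root)
  all_levels  = files_matching(root, "**/*")
  below_first = files_matching(root, "**/*/*")
  puts "only matched by **/*: #{(all_levels - below_first).inspect}"
end
```

Here the only file that **/* finds and **/*/* does not is top.txt, the one sitting directly in the starting directory.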
I'm stuck with this problem:
List all files with ls that have at least two vowels in any position and
end with .c, .h or .s
I have come up with a partial solution: ls *{a,e,i,o,u}*.[chs]
But obviously this does not meet the requirements, because it lists every file that has at least one vowel, not two or more.
I'd use a character class for the vowels, too:
ls *[aeiou]*[aeiou]*.[chs]
Using brace expansion is possible, too, but some files are then listed multiple times:
ls *{a,e,i,o,u}*{a,e,i,o,u}*.[chs]
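If it helps to try the character-class pattern without creating files, Ruby's File.fnmatch understands the same * and [set] wildcards and works on plain strings. This is only a sketch; the method name and the sample file names are invented for illustration.

```ruby
# True when `name` has at least two vowels before a final .c, .h or .s.
# File.fnmatch compares strings only; nothing is read from disk.
def two_vowel_source?(name)
  File.fnmatch("*[aeiou]*[aeiou]*.[chs]", name)
end

# Illustrative names, not taken from the question.
%w[main.c util.h io.s test.c bar.s notes.txt].each do |name|
  puts format("%-10s %s", name, two_vowel_source?(name))
end
```

The first three names match; test.c and bar.s have only one vowel before the extension, and notes.txt has the wrong extension.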
I have in a directory a bunch of files. Each file's basename ends with a two digit number and a letter, such as file_01A.txt, file_03B.txt, file_13A.txt.
In a terminal using bash (I assume, since I'm on Mac OS X), I use
ls *01*[AB]*.txt
returns all files such as 01A and 01B. This makes sense to me.
ls *02*[AB]*.txt
returns similarly all files such as 02A and 02B.
Now I want to return all files 01A, 01B, 02A, 02B. Hence I want something like:
ls *(01 or 02)*[AB]*.txt
Attempt 1: I tried with | but that throws an error.
Attempt 2: ls *[01,02]*[AB]*.tex but that gives the 03 files too, since I assume it is interpreting the 01 and 02 as individual matches.
Attempt 3: ls *["01","02"]*[AB]*.tex is the same again.
It's not hard to articulate a single wildcard which matches your requirement.
ls *0[12]*[AB]*.tex
In the general case, use multiple wildcards if you can't articulate a single one. Notice that the shell expands them in the order you write them, and if they both match some files, there will be duplicates in the expansion.
ls *01*[AB]*.tex *02*[AB]*.tex
You seem to be confused about what the metacharacters mean. * matches any string, ? matches any single character, and [abc] matches any one character listed between the square brackets. [!abc] matches a single character which is not a, b, or c. Bash also supports an extension called brace expansion, where foo{bar,quux} is basically an abbreviation of foobar fooquux. Your attempt could thus be rearticulated as
ls *{01,02}*[AB].tex
though the repeated prefix 0 is obviously redundant, and would better be left outside the braces, and then you might as well switch back to straight square brackets.
There is also a separate extended globbing syntax which allows for more elaborate wildcards. See the reference manual for details.
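The difference between the attempts can be checked on plain strings with Ruby's File.fnmatch, which follows the same rules for * and [set]. This is a rough sketch: the helper name is made up, the file names are modelled on the question, and the patterns use the .txt extension of those names.

```ruby
# Return the names that `pattern` matches; nothing is read from disk.
def matching(pattern, names, flags = 0)
  names.select { |name| File.fnmatch(pattern, name, flags) }
end

names = %w[file_01A.txt file_01B.txt file_02A.txt file_03B.txt file_13A.txt]

# [01,02] is a character class holding 0, 1, 2 and a comma, so any one of
# those characters is enough: the 0 in 03 and the 1 in 13 both match.
p matching("*[01,02]*[AB]*.txt", names)

# 0[12] means a literal 0 followed by either 1 or 2.
p matching("*0[12]*[AB]*.txt", names)

# File.fnmatch only treats braces as alternatives when FNM_EXTGLOB is passed.
p matching("*{01,02}*[AB]*.txt", names, File::FNM_EXTGLOB)
```

The first call returns all five names; the second and third return only the 01 and 02 files.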
The input is a string that can be either a plain path (e.g. /home/user/1.txt) or a glob pattern (e.g. /home/user/*.txt).
I want to get an array of matches if the string is a glob pattern, and an array with a single element (the path itself) if the string is a plain path.
So I need to check whether the string contains unescaped glob symbols; if it does, call Pathname.glob() to get the matches, otherwise just return an array containing the string.
How can I check whether a string is a glob pattern?
UPDATE
I had this question while implementing glob pattern support for the zap stanza in Homebrew Cask.
The solution I used was a small refactoring that avoids the need to check whether a string is a glob pattern.
I want to get an array of matches if the string is a glob pattern, and an array with a single element (the path itself) if the string is a plain path.
They're both valid glob patterns. One contains a wildcard, one does not. Run them both through Pathname.glob() and you'll always get an array back. Bonus, it'll check if it matches anything.
$ irb
2.3.3 :001 > require "pathname"
=> true
2.3.3 :002 > Pathname.glob("test.data")
=> [#<Pathname:test.data>]
2.3.3 :003 > Pathname.glob("test.*")
=> [#<Pathname:test.asm>, #<Pathname:test.c>, #<Pathname:test.cpp>, #<Pathname:test.csv>, #<Pathname:test.data>, #<Pathname:test.dSYM>, #<Pathname:test.html>, #<Pathname:test.out>, #<Pathname:test.php>, #<Pathname:test.pl>, #<Pathname:test.py>, #<Pathname:test.rb>, #<Pathname:test.s>, #<Pathname:test.sh>]
2.3.3 :004 > Pathname.glob("doesnotexist")
=> []
This is a great way to normalize and validate your data early, so the rest of the program doesn't have to.
If you really want to figure out if something is a literal path or a glob, you could try scanning for any special glob characters, but that rapidly gets complicated and error prone. It requires knowing how glob works in detail and remembering to check for quoting and escaping. foo* has a glob pattern. foo\* does not. foo[123] does. foo\[123] does not. And I'm not sure what foo[123\] is doing, I think it counts as a non-terminated set.
In general, you want to avoid writing code that has to reproduce the inner workings of another piece of code. If there was a Pathname.has_glob_chars you could use that, but there isn't such a thing.
Pathname.glob uses File.fnmatch to do the globbing and you can use that without touching the filesystem. You might be able to come up with something using that, but I can't make it work. I thought maybe only a literal path will match itself, but foo* defeats that.
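To make the problem with the "matches itself" idea concrete, here is a small sketch. The method name is invented, and the strings are just the examples used above.

```ruby
# Does the string, used as a pattern, match itself as a plain string?
def matches_itself?(candidate)
  File.fnmatch(candidate, candidate)
end

["test.data", "foo[123]", "foo*"].each do |candidate|
  puts "#{candidate.inspect}: #{matches_itself?(candidate)}"
end
```

The literal test.data matches itself and foo[123] does not, which looks promising, but foo* also matches itself because * is happy to match a literal asterisk. So the check cannot tell a literal from a glob.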
Instead, check if it exists.
Pathname.new(path).exist?
If it exists, it was a real path to a real file. If it didn't exist, it might have been a real path, or it might be a glob. That's probably good enough.
You can also check by looking to see if Pathname.glob(path) returned a single element that matches the original path. Note that when matching paths it's important to normalize both sides with cleanpath.
paths = Pathname.glob(path)
if paths.size == 1 && paths[0].cleanpath == Pathname.new(path).cleanpath
puts "#{path} is a literal path"
elsif paths.size == 0
puts "#{path} matched nothing"
else
puts "#{path} was a glob"
end
I was using some code such as the following in my Ruby script:
if File.dirname(path) =~ /^www\.example\.com\/foo/
And this works great when a file is only one subdirectory deep underneath /foo, but unfortunately the condition would fail if the file was underneath say /foo/bar. My question is, what can the regex above be modified to so that File.dirname will match any file that's underneath at minimum the condition set above and not just one level deep?
This is one of those cases where I'd eschew a regex entirely:
if path.split(File::SEPARATOR)[0,2] == ['www.example.com','foo']
More readable, no escaping needed.
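Wrapped in a method, the component comparison might look like the sketch below. The name under? and the sample paths are hypothetical. Comparing whole components also means that a sibling directory such as foobar is not mistaken for foo, which an unanchored prefix regex would allow.

```ruby
# True when the directory part of `path` starts with the given components.
def under?(path, *prefix)
  File.dirname(path).split(File::SEPARATOR).first(prefix.size) == prefix
end

# Illustrative paths only.
[
  "www.example.com/foo/index.html",
  "www.example.com/foo/bar/baz/page.html",
  "www.example.com/foobar/page.html",
  "www.example.com/index.html"
].each do |path|
  puts "#{path}: #{under?(path, 'www.example.com', 'foo')}"
end
```

The first two paths are under www.example.com/foo at any depth; the last two are not.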
Try File.fnmatch. It uses glob-style patterns (similar to regular expressions, but not the same). For your case we could use:
**foo**
Without extra flags this matches any path that contains foo anywhere, which includes files under a directory called foo:
File.fnmatch('**foo**','foo/test.txt')
#> true
File.fnmatch('**foo**','/boo/foo/test.txt')
#> true
File.fnmatch('**foo**','/boo/test.txt')
#> false
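One thing to watch for: without flags, **foo** behaves like a substring test, so a name such as food.txt also matches. If that matters, comparing directory components is one alternative. The sketch below uses a made-up helper name and sample paths.

```ruby
# True when one of the directory components of `path` is exactly `dir_name`.
def inside_dir_named?(path, dir_name)
  File.dirname(path).split(File::SEPARATOR).include?(dir_name)
end

# Illustrative paths; neither check touches the filesystem.
["foo/test.txt", "/boo/foo/test.txt", "/boo/food.txt", "/boo/test.txt"].each do |path|
  by_pattern    = File.fnmatch("**foo**", path)
  by_components = inside_dir_named?(path, "foo")
  puts "#{path}: fnmatch=#{by_pattern} components=#{by_components}"
end
```

Both checks agree on every path except /boo/food.txt, which the pattern accepts and the component check rejects.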
I ran into a question about file name patterns in the shell.
What's the difference between tmp/**/* and tmp/*?
I ran an experiment on my system.
I have this directory, dir2:
dir2
-->dir1
-->xx2
-->ff.txt
and I run ls dir2/*:
dir2/ff.txt
dir2/dir1:
xx2
then I run ls dir2/**/*:
dir2/dir1/xx2
So does this mean that ** skips over a directory level (here, dir1)?
Can someone help me?
I think there's a formatting issue in the question text, but I'll answer based on the question title and examples.
Unless recursive globbing is in effect (for example bash's globstar option, which is off by default), there is no difference between a single and a double asterisk at any single level of the path. Either expression matches any name, except for hidden ones which start with a dot (this can be changed by shell options). So:
tmp/**/* (equivalent to tmp/*/*) is expanded to all names which are nested two levels deep in tmp. The first asterisk expands only to directories and not files at the first level because it's followed by a slash.
tmp/* expands to anything nested one level deep inside tmp.
On top of that, ls lists the contents of any directory given on its command line. This can be overridden by adding the -d option to ls.
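The one-level versus two-level distinction can also be reproduced from Ruby, with one caveat: Ruby's Dir.glob always treats **/ as recursive, unlike a shell where ** acts as a plain *. The sketch below assumes the layout the question appears to describe (dir2/dir1/xx2 and dir2/ff.txt, with xx2 taken to be a file); the helper name expand is made up.

```ruby
require "tmpdir"
require "fileutils"

# Expand `pattern` relative to `root` and return the sorted matches.
def expand(root, pattern)
  Dir.chdir(root) { Dir.glob(pattern).sort }
end

Dir.mktmpdir do |root|
  # Assumed layout: dir2/dir1/xx2 and dir2/ff.txt.
  FileUtils.mkdir_p(File.join(root, "dir2", "dir1"))
  FileUtils.touch(File.join(root, "dir2", "dir1", "xx2"))
  FileUtils.touch(File.join(root, "dir2", "ff.txt"))

  p expand(root, "dir2/*")    # names one level below dir2
  p expand(root, "dir2/*/*")  # names exactly two levels below dir2
  p expand(root, "dir2/**/*") # in Ruby: names at any depth below dir2
end
```

Because Dir.glob returns names rather than listing directory contents, dir2/* shows dir2/dir1 itself, much like ls -d would.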