Does a double-asterisk wildcard mean anything apart from `globstar`? - bash

I have an Ant build.xml script that includes the following snippet:
<fileset dir="${project.home}/${project.lib}">
<include name="**/*.jar"/>
</fileset>
According to the answers to this question and the Bash documentation, the double-asterisk is indicative of globstar pattern-matching:
globstar
If set, the pattern ‘**’ used in a filename expansion context will match all files and zero or more directories and subdirectories. If
the pattern is followed by a ‘/’, only directories and subdirectories
match.
This seems to be the sense in which whoever wrote the code meant it to work: locate all .jar files within the project library directory, no matter how many directories deep.
However, the code is routinely executed in a Bash shell for which the globstar setting is turned off. In this case, it seems like the double asterisk should be interpreted as a single asterisk, which would break the build. Nevertheless, the build executes perfectly well.
Is there any scenario outside of globstar for which the Bash shell will interpret ** in any way differently than *? For example, does the extglob setting alone differentiate the two? Documentation on this seems to be sparse.

You present part of an Ant build file. What makes you think that bash syntax or the settings of any particular bash shell has anything to do with the interpretation of the contents of that file? Ant implements its own pattern-matching; details are documented in the Ant User Manual, and in particular, here: https://ant.apache.org/manual/dirtasks.html#patterns.
As for bash and the actual questions you posed,
Is there any scenario outside of globstar for which the Bash shell will interpret ** in any way differently than *?
In the context of arithmetic evaluation, * means multiplication, whereas ** means exponentiation.
Bash's globstar option affects how ** is interpreted in a bash pathname expansion context, and nothing else. If globstar is enabled then ** has different effect in that context than does * alone; if it is not enabled then ** is just two * wildcards, one after the other, which does not change which file names match.
Other than in those two contexts I don't think ** has any special meaning to bash at all, but there are contexts where * by itself is meaningful. For example, $* is a special variable that represents the list of positional parameters, and if foo is an array-valued variable then ${foo[*]} represents all the elements of the array. There are a few others. Substituting ** for * in those places changes the meaning; in most of them it creates a syntax error.
For example, does the extglob setting alone differentiate the two?
The bash manual has a fairly lengthy discussion of pathname expansion (== filename expansion). There are several options that affect various aspects of it, but the only one that modulates the meaning of ** is globstar.
The noglob option disables pathname expansion altogether, however. If noglob is enabled then * and ** each represents itself in contexts where pathname expansion otherwise would have been performed.
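The difference between the two contexts can be checked with a small experiment. This is a minimal sketch, assuming Bash 4.0 or later (older versions have no globstar option); the scratch directory and the .jar file names are invented for illustration and are not taken from the question.

```shell
#!/usr/bin/env bash
# Hypothetical scratch layout; none of these names come from the question.
tmp=$(mktemp -d)
mkdir -p "$tmp/lib/sub/deep"
touch "$tmp/lib/a.jar" "$tmp/lib/sub/b.jar" "$tmp/lib/sub/deep/c.jar"
cd "$tmp/lib" || exit 1

shopt -u globstar        # globstar off: ** is just two * wildcards
off=( **/*.jar )         # equivalent to */*.jar, i.e. exactly one directory level

shopt -s globstar        # globstar on: ** matches zero or more directory levels
on=( **/*.jar )

echo "globstar off: ${off[*]}"
echo "globstar on:  ${on[*]}"

# In arithmetic evaluation, * and ** differ regardless of globstar.
product=$(( 2 * 3 ))     # multiplication
power=$(( 2 ** 3 ))      # exponentiation
echo "2 * 3 = $product, 2 ** 3 = $power"

cd / && rm -rf "$tmp"
```

With globstar off, only the file exactly one directory down should match; with it on, the files at all three depths should.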

Ant does not use bash to create the fileset; that is all Java code.
The meaning of the double star is indeed as you describe: it descends into all folders and finds all *.jar files in any subfolder. But it works even on Windows, where there is typically no bash to be seen anywhere.

Related

Globbing patterns in windows command prompt/ powershell

I would like to know if there is any way to achieve this behavior on windows, for example:
/b?n/ca? /etc/pa??wd -> executes 'cat /etc/passwd'
With limited exceptions in PowerShell, on Windows there is no support for shell-level globbing - target commands themselves must perform resolution of wildcard patterns to matching filenames; if they don't, globbing must be performed manually, up front, and the results passed as literal paths; see the bottom section for background information.
PowerShell:
Perhaps surprisingly, you can invoke an executable by wildcard pattern, as zett42 points out, though that behavior is problematic (see bottom section):
# Surprisingly DOES find C:\Windows\System32\attrib.exe
# and invokes it.
C:\Windows\System3?\attr*.exe /?
Generally, you can discover commands, including external programs, via the Get-Command cmdlet.
Many file-processing cmdlets in PowerShell do perform their own globbing (e.g., Get-ChildItem, Remove-Item); if you're calling commands that do not, notably external programs that don't, you must perform globbing manually, up front, except on Unix-like platforms when calling external programs, where PowerShell does perform automatic globbing (see bottom section):
Use Convert-Path to get the full, file-system-native paths of matching files or directories.
While Resolve-Path may work too, it returns objects whose .ProviderPath property you need to access to get the same information (stringifying these objects, as happens implicitly when you pass them to external programs, yields their .Path property, which may be based on PowerShell-only drives that external programs and .NET APIs know nothing about).
For more control over what is matched, use Get-ChildItem and access the result objects' .Name or .FullName property, as needed; for instance, Get-ChildItem allows you to limit matching to files (-File) or directories (-Directory) only.
PowerShell makes it easy to use the results of manually performed globbing programmatically; the following example passes the full paths of all *.txt files in the current directory to the cmd.exe's echo command as individual arguments; PowerShell automatically encloses paths with spaces in "...", if needed:
cmd /c echo (Get-ChildItem -Filter *.txt).FullName
Generally, note that PowerShell's wildcard patterns are more powerful than those of the host platform's file-system APIs, and notably include support for character sets (e.g. [ab]) and ranges (e.g. [0-9]); another important difference is that ? matches exactly one character, whereas the native file-system APIs on Windows match none or one.
However, when using the -Filter parameter of file-processing cmdlets such as Get-ChildItem, the host platform's patterns are used, which - while limiting features - improves performance; a caveat is that on Unix-like platforms ? then seemingly acts as it does on Windows, i.e., it matches none or one character.
cmd.exe (Command Prompt, the legacy shell):
cmd.exe does not support calling executables by wildcard pattern; some of cmd.exe's internal commands (e.g., dir and del) and some standard external programs (e.g., attrib.exe) do perform their own globbing; otherwise you must perform globbing manually, up front:
where.exe, the external program for discovering external programs, fundamentally only supports wildcard patterns in executable names (e.g. where find*.exe), not in paths, which limits wildcard-based lookups to executables located in directories listed in the PATH environment variable.
:: OK - "*" is part of a *name* only
where.exe find*.exe
:: !! FAILS: "*" or "?" must not be part of a *path*
:: !! -> "ERROR: Invalid pattern is specified in "path:pattern"."
where.exe C:\Windows\System32\find*.exe
Globbing via dir appears to be limited to wildcard characters in the last path component:
:: OK - "*" is only in the *last* path component.
dir C:\Windows\System32\attri*
:: !! FAILS: "*" or "?" must not occur in *non-terminal* components.
:: !! -> "The filename, directory name, or volume label syntax is incorrect."
dir C:\Windows\System3?\attri*
Using manual globbing results programmatically is quite cumbersome in cmd.exe and requires use of for statements (whose wildcard matching has the same limitations as the dir command); for example, using the syntax for batch files (.cmd or .bat files):
To use the resolved executable file path for invocation (assuming only one file matches):
@echo off
setlocal
:: Use a `for` loop over a wildcard pattern to enumerate
:: the matching filenames - assumed to be just *one* in this case,
:: namely attrib.exe, and save it in a variable.
for %%f in (C:\Windows\System32\attr*.exe) do set "Exe=%%f"
:: Execute the resolved filename to show its command-line help.
"%Exe%" /?
To pass matching filenames as multiple arguments to a single command:
@echo off
setlocal enableDelayedExpansion
:: Use a `for` loop over a wildcard pattern to enumerate
:: matching filenames and collect them in a single variable.
set files=
for %%f in (*.txt) do set files=!files! "%%f"
:: Pass all matching filenames to `echo` in this example.
echo %files%
Background information:
On Unix-like platforms, POSIX-compatible shells such as Bash themselves perform globbing (resolving filename wildcard patterns to matching filenames), before the target command sees the resulting filenames, as part of a feature set called shell expansions (link is to the Bash manual).
On Windows, cmd.exe (the legacy shell also known as Command Prompt) does NOT perform such expansions and PowerShell mostly does NOT.
That is, it is generally up to each target command to interpret wildcard patterns as such and resolve them to matching filenames.
That said, in PowerShell, many built-in commands, known as cmdlets, do support PowerShell's wildcard patterns, notably via the -Path parameter of provider cmdlets, such as Get-ChildItem.
Additionally and more generally, cmdlet parameters that represent names often support wildcards too; e.g., Get-Process exp* lists all processes whose image name starts with exp, such as explorer.
Note that the absence of Unix-style shell expansions on Windows also implies that no semantic distinction is made between unquoted and quoted arguments (e.g., *.txt vs. "*.txt"): a target command generally sees both as verbatim *.txt.
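For contrast, here is a minimal Bash sketch of the quoted/unquoted distinction that Unix-style shell expansion creates. The helper count_args and the file names are hypothetical, introduced only to make the argument count visible.

```shell
#!/usr/bin/env bash
# Scratch directory with two matching files (names are made up).
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt"
cd "$tmp" || exit 1

# Hypothetical helper: reports how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args *.txt)     # the shell expands the pattern before the call
quoted=$(count_args "*.txt")     # the pattern is passed through verbatim

echo "unquoted pattern -> $unquoted argument(s)"
echo "quoted pattern   -> $quoted argument(s)"

cd / && rm -rf "$tmp"
```

The unquoted call receives one argument per matching file, while the quoted call receives the single literal string *.txt, which is the only form a target command on Windows would ever see.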
In PowerShell, automatic globbing DOES occur in these limited cases:
Perhaps surprisingly, an executable file path can be invoked via a wildcard pattern:
as-is, if the pattern isn't enclosed in '...' or "..." and/or contains no variable references or expressions; e.g.:
C:\Windows\System3?\attri?.exe
via &, the call operator, otherwise; e.g.:
& $env:SystemRoot\System32\attri?.exe
However, this feature is of questionable utility - When would you not want to know up front what specific executable you're invoking? - and it is unclear whether it was implemented by design, given that inappropriate wildcard processing surfaces in other contexts too - see GitHub issue #4726.
Additionally, up to at least PowerShell 7.2.4, if two or more executables match the wildcard pattern, a misleading error occurs, suggesting that no matching executable was found - see GitHub issue #17468; a variation of the problem also affects passing a wildcard-based path (as opposed to a mere name) that matches multiple executables to Get-Command.
In POSIX-compatible shells, the multi-match scenario is handled differently, but is equally useless: the first matching executable is invoked, and all others are passed as its arguments.
On Unix-like platforms only, PowerShell emulates the globbing feature of POSIX-compatible shells when calling external programs, in an effort to behave more like the platform-native shells; if PowerShell didn't do that, something as simple as ls *.txt would fail, given that the external /bin/ls utility would then receive verbatim *.txt as its argument.
However, this emulation has limitations, as of PowerShell 7.2.4:
The inability to use wildcard patterns that contain spaces - see GitHub issue #10683.
The inability to include hidden files - see GitHub issue #4683.
A still-experimental feature, available in preview versions of 7.3, PSNativePSPathResolution, automatically translates wildcard patterns based on PowerShell-only drives to their underlying native file-system paths; however, this feature is currently overzealous - see GitHub issue #13640 - and inherently bears the risk of false positives - see GitHub issue #13644.
In PowerShell you can use Resolve-Path, which resolves the wildcard characters in a path and displays the path contents.
Example: I want to locate signtool.exe from the Windows SDK which typically resides in "c:\Program Files (x86)\Windows Kits\10\bin\10.0.19041.0\x64\signtool.exe" where there could be any other version(s) installed.
So I could use: Resolve-Path 'c:\program*\Windows Kits\10\bin\*\x64\signtool.exe'
EDIT:
If you want to execute it directly you can use the & invocation operator e.g.
&(Resolve-Path 'c:\wind?ws\n?tepad.exe')

Makefiles and understanding wildcards

So I'm trying to start using make on an existing project and I'm getting super confused as to how to properly use wildcards, or at least that's what I think I need.
Basically, this is the command I'm trying to run: fieldalignment ./**/*copy. copy would be a variable that would be passed into the command, and basically I'm just trying to search the whole current directory and the subdirectories for that package and run the fieldalignment command against it. I'm working in Go. From what I understand, the '*' should be replaced with wildcards, but I'm not entirely sure how.
This is the basic idea of what I'm trying to do.
checkfieldalignment:
fieldalignment ./...$(PACK)
fixfieldalignment :
fieldalignment -fix ./**/*$(PACK)
The first one kind of works but also gets an error 3. Not sure what that means.
make implements standard POSIX-style globbing in its wildcard function, the same as the traditional shell globbing and the default globbing in shells like bash. And, when make invokes a shell to run a recipe, it invokes /bin/sh by default (unless the makefile sets the SHELL variable), not whatever shell the user happens to be running.
Make does not implement "extended" globbing such as that provided by zsh and some other shells, or available in bash if you turn it on.
And /bin/sh is a POSIX shell, or a link to another shell that is running in "POSIX mode", so it doesn't support extended globbing either.
So, special features like ** to mean "search all subdirectories" are not available here.
You can use the find program to replace it, like this:
fixfieldalignment :
fieldalignment -fix $$(find ./ -name '*$(PACK)')
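In the recipe, make turns $$ into a single $ before handing the line to the shell, so the shell sees an ordinary command substitution. A minimal sketch of what that substitution produces, run directly in a shell; the directory layout and the value copy for PACK are assumptions made up for the illustration:

```shell
#!/bin/sh
# Hypothetical tree: two entries whose names end in "copy", one that does not.
tmp=$(mktemp -d)
mkdir -p "$tmp/pkg/inner"
touch "$tmp/pkg/foocopy" "$tmp/pkg/inner/barcopy" "$tmp/pkg/other"

PACK=copy   # stands in for the make variable $(PACK)

# find recurses on its own, so no ** support is needed in the shell.
# The pattern is quoted so that find, not the shell, interprets the *.
matches=$(cd "$tmp" && find . -name "*$PACK" | LC_ALL=C sort)

echo "$matches"

rm -rf "$tmp"
```

Note that the unquoted $$(find ...) in the recipe is split on whitespace by the shell, so this approach assumes that the matching paths contain no spaces.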

Bash glob pattern being expanded using remote data

I have done this by mistake:
s3cmd del s3://mybucket/*
But ... it is working:
...
delete: 's3://mybucket/file0080.bin'
delete: 's3://mybucket/file0081.bin'
delete: 's3://mybucket/file0082.bin'
...
I am baffled. Usually * is expanded by the shell (Bash), using the information available in the localhost.
How/why is expansion working against an s3 bucket?
(This is an unquoted glob pattern)
If the glob doesn’t match anything it’ll remain as-is (unless you set the nullglob option in Bash), with an asterisk in this case, and s3cmd del apparently understands that.
Of course it’s not a good idea to rely on this behaviour, since if a local file should suddenly exist that matches the glob it would (probably) stop working. Quoting the glob (i.e. making it not a glob) is a good habit.
Another option is to set the nullglob option (shopt -s nullglob) to make non-matching globs go away entirely.
To see how a glob expands and what the final command looks like you can run set -x in Bash before running it, which makes Bash print each (expanded) command before running it (set +x to turn it off).
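Both behaviours can be observed without touching a real bucket. The following is a minimal sketch that only inspects what Bash would pass along; it runs in an empty scratch directory so that nothing can match locally, and it never calls s3cmd.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1      # empty directory: the pattern cannot match any local file

shopt -u nullglob
default=( s3://mybucket/* )         # no match: the pattern is kept as a literal word

shopt -s nullglob
with_nullglob=( s3://mybucket/* )   # no match: the pattern expands to nothing
shopt -u nullglob

echo "default:  ${#default[@]} word(s): ${default[*]}"
echo "nullglob: ${#with_nullglob[@]} word(s)"

cd / && rm -rf "$tmp"
```

In the default case the command receives the asterisk untouched, which is what s3cmd apparently acts on; with nullglob set it would receive no bucket argument at all, so quoting the pattern remains the more predictable choice.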

Blacklist program from bash completion

Fedora comes with "gstack" and a bunch of "gst-" programs which keep appearing in my bash completions when I'm trying to quickly type my git aliases. They're of course installed under /usr/bin along with a thousand other programs, so I can't just remove their directory from my PATH. Is there any way in Linux to blacklist these specific programs from appearing for completion?
I've tried the FIGNORE and GLOBIGNORE environment variables but they don't work; it looks like they're only for file completion after you've entered a command.
In 2016 Bash introduced an option for that. I'm reproducing the text from this newer answer by zuazo:
This is rather new, but in Bash 4.4 you can set the EXECIGNORE variable:
aa. New variable: EXECIGNORE; a colon-separate list of patterns that
will cause matching filenames to be ignored when searching for commands.
From the official documentation:
EXECIGNORE
A colon-separated list of shell patterns (see Pattern Matching) defining the list of filenames to be ignored by command search using
PATH. Files whose full pathnames match one of these patterns are not
considered executable files for the purposes of completion and command
execution via PATH lookup. This does not affect the behavior of the [,
test, and [[ commands. Full pathnames in the command hash table are
not subject to EXECIGNORE. Use this variable to ignore shared library
files that have the executable bit set, but are not executable files.
The pattern matching honors the setting of the extglob shell option.
For example:
$ EXECIGNORE=$(which pytest)
Or using Pattern Matching:
$ EXECIGNORE=*/pytest
I don't know if you can blacklist specific files, but it is possible to complete from your command history instead of the path. To do that add the following line to ~/.inputrc:
TAB dynamic-complete-history
FIGNORE is for SUFFIXES only. It presumes for whatever reason that you want to blacklist an entire class of files. So you need to knock off the first letter.
E.g. To eliminate gstack from autocompletion:
FIGNORE=stack
This gets rid of gstack, but also of anything else ending in stack.

bash script wildcard not globbing files

I have a script where I am switching from the apparently bad practice of populating arrays with find or ls to using globs.
I recently got a report from a user where the expression is not globbing the files. The user has a different Linux distro than I, but the script is being called by GNU bash, version 4.2.45(1)-release in both cases. I have tried a bunch of different variations which work in my shell but not in theirs. Here is the latest:
declare -a ARRAY
GLOB="keyword"
VAR=("path/to/file/*${GLOB}*")
ARRAY+=("$VAR")
However, my logs indicate that
$ echo ${ARRAY[*]}
path/to/file/*keyword*
With unexpanded wildcards, instead of the expected/desired
$ echo ${ARRAY[*]}
13_keyword_$23.txt
14_keyword_$24.txt
...
The VAR path is populated with variables, but it is expanding correctly and the files are present. The directory holds a bunch of files like 17_keyword_$22.txt.
I wonder if someone can tell me what I am missing so I can count on inter-bash portability. I have had several slightly different versions of this work on my machine but not the other, and am wondering what environmental variable might be causing the disconnect. I have not added any shopt noglob options to the script, I just double quote all file path related variables. Could that be it?
Edit: also tried simply
ARRAY+=(path/to/file/*'keyword'*.txt)
or
GLOB=(path/to/file/*keyword*)
ARRAY+=("$GLOB")
Which worked only for my computer.
Quoting a wildcard inhibits globbing.
VAR=("path/to/file/"*"$GLOB"*)
But you'll need to fix all the other problems as well.
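A minimal sketch of the difference, using a scratch directory in place of path/to/file; the file names imitate the ones in the question. It also appends the glob results to the array directly, since "$VAR" on an array expands to its first element only.

```shell
#!/usr/bin/env bash
# Scratch directory standing in for path/to/file (hypothetical).
dir=$(mktemp -d)
touch "$dir/13_keyword_\$23.txt" "$dir/14_keyword_\$24.txt" "$dir/unrelated.txt"

GLOB="keyword"
declare -a ARRAY=()

# Wildcards inside the quotes: a single literal string, no expansion.
quoted=( "$dir/*${GLOB}*" )

# Variables quoted, wildcards left outside the quotes: expanded to the matches.
ARRAY+=( "$dir/"*"$GLOB"* )

echo "quoted form:   ${#quoted[@]} element(s)"
echo "unquoted form: ${#ARRAY[@]} element(s)"
printf '%s\n' "${ARRAY[@]}"   # "${ARRAY[@]}" keeps each file name as one word

rm -rf "$dir"
```

If no file matches, the unquoted form still leaves the literal pattern in the array unless nullglob is set, so a check that the first element exists may be worth adding in a real script.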
Just as an update, it turned out that the problem in my actual script (not the cruddy mock-up above) was that the globbing was not working because of the formatting of the user's partition.
I had fine results with ext3, ext4 and fat32. But NTFS formatted partitions handled the globbing differently. At least, I think it was the globbing that is the problem. I have not fixed the original issue yet, but at least I can simply recommend a different partition.
I will continue to accept the earlier answer since it accurately answered the question as written.
Thanks!

Resources