bash script wildcard not globbing files

I have a script where I am switching from the apparently bad practice of populating arrays with find or ls to using globs.
I recently got a report from a user for whom the expression is not globbing the files. The user has a different Linux distro than mine, but the script is being run by GNU bash, version 4.2.45(1)-release in both cases. I have tried a bunch of different variations which work in my shell but not in theirs. Here is the latest:
declare -a ARRAY
GLOB="keyword"
VAR=("path/to/file/*${GLOB}*")
ARRAY+=("$VAR")
However, my logs indicate that
$ echo ${ARRAY[*]}
path/to/file/*keyword*
With unexpanded wildcards, instead of the expected/desired
$ echo ${ARRAY[*]}
13_keyword_$23.txt
14_keyword_$24.txt
...
The VAR path is populated with variables, but it is expanding correctly and the files are present. The directory holds a bunch of files like 17_keyword_$22.txt.
I wonder if someone can tell me what I am missing so I can count on inter-bash portability. I have had several slightly different versions of this work on my machine but not on the other, and am wondering what environment variable might be causing the disconnect. I have not enabled noglob anywhere in the script; I just double-quote all file-path-related variables. Could that be it?
Edit: also tried simply
ARRAY+=(path/to/file/*'keyword'*.txt)
or
GLOB=(path/to/file/*keyword*)
ARRAY+=("$GLOB")
Which worked only for my computer.

Quoting a wildcard inhibits globbing.
VAR=("path/to/file/"*"$GLOB"*)
But you'll need to fix all the other problems as well.
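For reference, a minimal corrected sketch of the pattern from the question (the nullglob option is my addition; without it, an unmatched pattern stays in the array as the literal string):
shopt -s nullglob                     # unmatched globs expand to nothing instead of themselves
declare -a ARRAY
GLOB="keyword"
VAR=( "path/to/file/"*"$GLOB"* )      # glob characters left outside the quotes
ARRAY+=( "${VAR[@]}" )                # append every match, not just the first element
printf '%s\n' "${ARRAY[@]}"
Note that ARRAY+=("$VAR") only appends ${VAR[0]}; "${VAR[@]}" carries all matches across.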

Just as an update: it turned out that the problem in my actual script (not the cruddy mock-up above) was related to the formatting of the user's partition.
I had fine results with ext3, ext4 and FAT32, but NTFS-formatted partitions handled the globbing differently. At least, I think it was the globbing that was the problem. I have not fixed the original issue yet, but at least I can simply recommend a different partition.
I will continue to accept the earlier answer since it accurately answered the question as written.
Thanks!

Related

Shell script - reason for using /bin/echo instead of echo?

I have a shell script from 2011, the purpose of which is to run on different unix systems.
The script defines certain variables, and I don't understand the logic behind it; I would like to know why it is done this way. For example:
instead of using echo or grep directly in the script, these variables are defined as follows:
ECHO="/bin/echo"
GREP="/bin/grep" (for linux)
For Solaris or other systems, the corresponding paths are defined as well.
They are then used as ${ECHO} "something out"
What is the purpose of this practice and why can I not use it directly?
As others have pointed out, it is unlikely that those lines are correct, more likely they should be:
ECHO="/bin/echo"
GREP="/bin/grep" # (for linux)
Assuming that they are correct, code like this used to be commonly seen in shell scripts (not mine, I might add). You don't see many people using these any more.
echo: ksh (Korn shell, which used to be the shell of choice), csh (C-shell, default shell on Sun) and sh (Bourne shell before it was POSIX) all had their own built-in versions of echo that were slightly different (mostly around the -n argument). Therefore the stand-alone program /bin/echo was sometimes used for portability. There is a performance price to pay for that.
grep and others: It used to be commonly recommended that the full path name for external programs should be set in a script. The main reason was security. In theory a user could provide their own version in a local directory and change their PATH variable. PATH, and all other environment variables, is still considered a security risk by many. A secondary reason was the performance overhead of searching the directories of $PATH - this was before the days of tracked aliases (ksh) or hashing (bash).
I'll repeat that I don't subscribe to all these views, and I have had arguments over the years with those who do, however that is the explanation. In my opinion this practice causes more problems than it solves.
EDIT: the practices I mention go back to the 1970s and 80s. Why would they be in a script from 2011? Possibly because "we always do that", a.k.a. "company policy", i.e. no one knows or cares why, it's just how it's done. Alternatively it could be a case of copy-and-paste from an old website or book, or even someone who believes this is a good idea.
There is no good reason whatsoever for this practice.
It reduces scripts' portability (any system with different binary locations requires modification), hurts performance (shell builtins can no longer be used where available), and, since PATH lookups are cached, does not even buy a significant saving in lookup cost.
One caveat: On some systems, /bin/ is not the canonical location for POSIX tools; for instance, /usr/xpg/bin/sh would be the location for POSIX sh, and /usr/xpg/bin/awk would be the location for POSIX awk, on some ancient SunOS systems.
The wrong way to enforce use of POSIX-compliant tools on such a system is to hardcode these paths in variables defined at the top of the script.
The right way to enforce use of POSIX-compliant tools on such a system is simply to specify a PATH that puts /usr/xpg/bin before /bin. For instance, a script can run [ -d /usr/xpg/bin ] && PATH=/usr/xpg/bin:$PATH near the top to achieve this.
Alternately, assume that one wishes to use GNU find. Instead of setting a FIND variable at the top of a script, one can specify a wrapper as needed, falling through to the default behavior of using the standard find command if no renamed alternative exists:
# use GNU find if under a name other than "find"
if type gnufind >/dev/null 2>&1; then
  find() { gnufind "$@"; }
elif type gfind >/dev/null 2>&1; then
  find() { gfind "$@"; }
fi
Limiting this answer to the echo part.
Probably this was an attempt to make the program portable, but it was a futile one.
The echo command itself is always unportable if the arguments can contain a backslash or the first argument is -n. POSIX says in these cases the behavior will be implementation-dependent.
Source: https://www.gnu.org/software/coreutils/manual/html_node/echo-invocation.html#echo-invocation
Both dash and bash claim POSIX compliance, but echo 'a\nb' will lead to a different result. And both are correct. I would not rely on the hope that all stand-alone echo programs on the planet just happen to choose the same implementation either.
The easiest way to get the code really portable for any argument is to use printf instead of echo.
If you really wanted to call the command echo instead of the built-in, because you are confident that your code will never be run on a system with a different implementation choice, then command echo would be the best way to do it.
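A minimal sketch of the difference and of the portable alternative (the echo behavior shown is what bash and dash do by default; both are allowed by POSIX):
echo 'a\nb'             # bash prints the literal a\nb; dash expands the escape
printf '%s\n' 'a\nb'    # always prints the literal a\nb, on every shell
printf 'a\nb\n'         # always interprets \n as a newline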

Why do scripts define common commands in variables?

I see this all the time at my place of work:
#!/bin/sh
.....
CAT=/usr/bin/cat # An alias for cat
MAIL=/usr/bin/mail # An alias for mail
WC=/usr/bin/wc # An alias for word count
GREP=/usr/bin/grep # An alias for grep
DIRNAME=/usr/bin/dirname # An alias for dirname
RM=/usr/bin/rm # An alias for rm
MV=/usr/bin/mv # An alias for mv
.....
Is it just my company that does this? Is there a reason why you would want to spell out where these extremely common commands are? Why would I want $CAT to refer to /usr/bin/cat when cat already refers to /usr/bin/cat? Am I missing something? It seems like it's needlessly redundant.
Using the full pathname ensures that the script operates correctly even if it's run by a user who customizes their PATH environment variable so that it finds different versions of these commands than the script expects.
Using variables simplifies writing the script, so you don't have to write the full pathname of a command each time it appears in the script.
Is it just my company that does this?
No.
Is there a reason why you would want to spell out where these extremely common commands are?
Yes.
Why would I want $CAT to refer to /usr/bin/cat when cat already refers to /usr/bin/cat?
Are you sure cat always refers to /usr/bin/cat? What if your script happens to be run in an environment where there is a different cat earlier in the path? Or where there is simply a user-controlled directory earlier in the path, where a user could install a rogue cat command? If your script ever happens to be run with elevated privileges, then do you really want to give random users the ability to do anything they want to your system?
Are you sure cat is supposed always to refer to /usr/bin/cat? If ever the script were installed in an environment where a different cat were needed (say /usr/local/bin/gnucat), then would you prefer to modify one line or twenty?
Am I missing something? It seems like its needlessly redundant.
Yes, you are missing something.
One would like to avoid writing out /usr/bin/cat everywhere they want to run cat, and one would like to be able to choose a different cat where needed (or more likely a different make or grep or sed). On the other hand, one wants to avoid potentially unsafe external influence on the behavior of a trusted script. Defining the full path to the command in a shell variable and then using that variable to run the command accomplishes these objectives.
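As a sketch of how the pattern is typically used (the paths, log file and address below are only illustrative):
#!/bin/sh
GREP=/usr/bin/grep        # pin the exact binary once at the top...
MAIL=/usr/bin/mail
# ...and route every later use through the variable, so that
# retargeting the script to another system means editing one line:
"$GREP" -c 'ERROR' /var/log/app.log | "$MAIL" -s "error count" admin@example.com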
One way to avoid this and still have the safety of ignoring the user's environment is to set the relevant variables explicitly in the script:
#!/bin/sh
PATH=/bin:/usr/bin # maybe you need something in /usr/sbin, add that
LC_ALL=C # ignore the user's locale
LD_LIBRARY_PATH=something # or unset it if you want nothing
# then
cat /a/file # have confidence you're using /bin/cat
There may well be others: check the man pages of the programs you use in your code.
Welcome to the enterprise where nothing is taken for granted.
These are commonly defined to ensure correct version, or to enforce env setup across many boxes.
https://github.com/torvalds/linux/blob/master/tools/scripts/Makefile.include#L15
This way you can also check whether the directory or binary exists.
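A small sketch of what such a check might look like (the grep path is illustrative):
GREP=/usr/bin/grep
# fail early if the pinned binary is not present on this machine
[ -x "$GREP" ] || { echo "required binary $GREP not found" >&2; exit 1; }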

Blacklist program from bash completion

Fedora comes with "gstack" and a bunch of "gst-" programs which keep appearing in my bash completions when I'm trying to quickly type my git aliases. They're of course installed under /usr/bin along with a thousand other programs, so I can't just remove their directory from my PATH. Is there any way in Linux to blacklist these specific programs from appearing for completion?
I've tried the FIGNORE and GLOBIGNORE environment variables but they don't work; it looks like they're only for file completion after you've entered a command.
In 2016 Bash introduced an option for that. I'm reproducing the text from this newer answer by zuazo:
This is rather new, but in Bash 4.4 you can set the EXECIGNORE variable:
aa. New variable: EXECIGNORE; a colon-separated list of patterns that
will cause matching filenames to be ignored when searching for commands.
From the official documentation:
EXECIGNORE
A colon-separated list of shell patterns (see Pattern Matching) defining the list of filenames to be ignored by command search using
PATH. Files whose full pathnames match one of these patterns are not
considered executable files for the purposes of completion and command
execution via PATH lookup. This does not affect the behavior of the [,
test, and [[ commands. Full pathnames in the command hash table are
not subject to EXECIGNORE. Use this variable to ignore shared library
files that have the executable bit set, but are not executable files.
The pattern matching honors the setting of the extglob shell option.
For example:
$ EXECIGNORE=$(which pytest)
Or using Pattern Matching:
$ EXECIGNORE=*/pytest
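Applied to the programs from the question, something like this in ~/.bashrc should hide them (a sketch; the patterns match full pathnames, so adjust them if the binaries live elsewhere):
# hide gstack and all gst-* tools from completion and PATH lookup
EXECIGNORE='/usr/bin/gstack:/usr/bin/gst-*'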
I don't know if you can blacklist specific files, but it is possible to complete from your command history instead of the path. To do that add the following line to ~/.inputrc:
TAB: dynamic-complete-history
FIGNORE is for suffixes only. It presumes, for whatever reason, that you want to blacklist an entire class of files, so you need to knock off the first letter.
E.g. to eliminate gstack from autocompletion:
FIGNORE=stack
This will get rid of gstack, but also of anything else ending in stack.

Preprocess line before it is processed by bash

Is there a way to preprocess a line entered into bash in interactive mode before it is processed by bash?
I'd like to introduce some custom shorthand syntax to deal with long paths. For example, instead of writing 'cd /one/two/three/four/five', I'd like to be able to write something like 'cd /.../five', and then my preprocessing script would replace this by the former command (if a unique directory 'five' exists somewhere below /).
I found http://glyf.livejournal.com/63106.html which describes how to execute a hook function before a command is executed. However, the approach does not allow to alter the command to be executed.
There's no good way of doing this generally for all commands.
However, you can do it for specific commands by overriding them with a function. For your cd case, you can stick something like this in your .bashrc:
cd() {
  local path="$1"
  [[ $path == "/.../five" ]] && path="/one/two/three/four/five"
  builtin cd "$path"
}
In bash 4 or later, you can use the globstar option.
shopt -s globstar
cd /**/five
assuming that five is a unique directory.
The short answer is: not directly. As you have found, the PROMPT_COMMAND shell variable allows you to issue a command before the prompt is displayed, which can allow for some very creative uses, e.g. Unlimited BASH History, but nothing that would allow you to parse and replace input directly.
What you want to do can be accomplished using functions and aliases within your .bashrc. One approach would be to use either findutils-locate or simply a find command to search directories below the present working directory for the last component in the ellipsed path, and then provide the full path in return. However, even with its indexing, locate would take a bit of time, and depending on the depth, find itself may be too slow to do this generically for all possible directories. If, however, you had a list of specific directories you would like to implement something like this for, then the solution would be workable and relatively easy.
To provide any kind of prototype or further detail, we would need to know more about how you intend to use the path information, and whether multiple paths could be provided in a single command.
Another issue arises if the directory five is non-unique...
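As a concrete illustration of the find-based approach described above, a sketch of a helper function (it searches from /, simply takes the first match if the name is not unique, and the -quit action assumes GNU find):
# usage: cdd five   -> cd into the first directory named "five" found under /
cdd() {
    local target
    target=$(find / -type d -name "$1" -print -quit 2>/dev/null)
    if [ -n "$target" ]; then
        cd "$target"
    else
        echo "cdd: no directory named '$1' found" >&2
        return 1
    fi
}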

ZSH/Shell variable assignment/usage

I use ZSH for my terminal shell, and whilst I've written several functions to automate specific tasks, I've never really attempted anything that requires the functionality I'm after at the moment.
I've recently re-written a blog using Jekyll and I want to automate the production of blog posts and finally the uploading of the newly produced files to my server using something like scp.
I'm slightly confused about the variable bindings/usage in ZSH; for example:
DATE= date +'20%y-%m-%d'
echo $DATE
correctly outputs 2011-08-23 as I'd expect.
But when I try:
DATE= date +'20%y-%m-%d'
FILE= "~/path/to/_posts/$DATE-$1.markdown"
echo $FILE
It outputs:
2011-08-23
blog.sh: line 4: ~/path/to/_posts/-.markdown: No such file or directory
And when run with what I'd want the blog title to be (ignoring the fact that the string needs to be manipulated to make it more URL-friendly, and that the path/to directory doesn't exist),
i.e. blog "blog title", it outputs:
2011-08-23
blog.sh: line 4: ~/path/to/_posts/-blog title.markdown: No such file or directory
Why is $DATE printing above the call to print $FILE rather than the string being included in $FILE?
Two things are going wrong here.
Firstly, your first snippet is not doing what I think you think it is. Try removing the second line, the echo. It still prints the date, right? Because this:
DATE= date +'20%y-%m-%d'
Is not a variable assignment - it's an invocation of date with an auxiliary environment variable (the general syntax is VAR_NAME=VAR_VALUE COMMAND). You mean this:
DATE=$(date +'20%y-%m-%d')
Your second snippet will still fail, but differently. Again, you're using the invoke-with-environment syntax instead of assignment. You mean:
# note the lack of a space after the equals sign
FILE="~/path/to/_posts/$DATE-$1.markdown"
I think that should do the trick.
Disclaimer
While I know bash very well, I only started using zsh recently; there may be zshisms at work here that I'm not aware of.
Learn about what a shell calls 'expansion'. There are several kinds, performed in a particular order:
The order of word expansion is as follows:
tilde expansion
parameter expansion
command substitution
arithmetic expansion
pathname expansion, unless set -f is in effect
quote removal, always performed last
Note that tilde expansion is only performed when the tilde is not quoted; viz.:
$ FILE="~/.zshrc"
$ echo $FILE
~/.zshrc
$ FILE=~/.zshrc
$ echo $FILE
/home/user42/.zshrc
And there must be no spaces around the = in variable assignments.
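Putting the two points together for the snippet from the question (a sketch; the path is the placeholder used in the question, and the tilde is left unquoted so that it expands):
DATE=$(date +'20%y-%m-%d')                    # command substitution, no space after =
FILE=~/"path/to/_posts/${DATE}-$1.markdown"   # unquoted ~ expands to $HOME
echo "$FILE"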
Since you asked in a comment where to learn shell programming, there are several options:
Read the shell's manual page man zsh
Read the specification of the POSIX shell, http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html, especially if you want to run your scripts on different operating systems (and you will find yourself in that situation one fine day!)
Read books about shell programming.
Hang out in the Usenet newsgroup comp.unix.shell, where a lot of shell wizards answer questions.
