Bash for loop with wildcards and hidden files - bash

Just writing a simple shell script and I'm a little confused:
Here is my script:
% for f in $FILES; do echo "Processing $f file.."; done
The command:
ls -a | grep bash
produces:
% ls -a | grep bash
.bash_from_cshrc
.bash_history
.bash_profile
.bashrc
When
FILES=".bash*"
I get the same results (with different formatting) as ls -a. However, when
FILES="*bash*"
I get this output:
Processing *bash* file..
This is not the output I expect. Am I not allowed to have a wildcard at the beginning of the file name? Is the . at the beginning of the file name "special" somehow?
Setting
FILES="bash*"
also does not work.

The default globbing in bash does not include filenames starting with a . (aka hidden files).
You can change that with
shopt -s dotglob
$ ls -a
. .. .a .b .c d e f
$ ls *
d e f
$ shopt -s dotglob
$ ls *
.a .b .c d e f
$
To disable it again, run shopt -u dotglob.

If you want hidden and non-hidden files, set dotglob (bash):
#!/bin/bash
shopt -s dotglob
for file in *
do
    echo "$file"
done

FILES=".bash*" works because the hidden files name begin with a .
FILES="bash*" doesn't work because the hidden files name begin with a . not a b
FILES="*bash*" doesn't work because the * wildcard at the beginning of a string omits hidden files.

Yes, the . at the front is special, and normally won't be matched by a * wildcard, as documented in the bash man page (and common to most Unix shells):
When a pattern is used for pathname expansion, the character “.”
at the start of a name or immediately following a slash must
be matched explicitly, unless the shell option dotglob is
set. When matching a pathname, the slash character must
always be matched explicitly. In other cases, the “.”
character is not treated specially.

If you want to include hidden files, you can specify two wildcards; one for the hidden files, and another for the others.
for f in .[!.]* *; do
    echo "Processing $f file.."
done
The wildcard .* would expand to all the dot files, but that includes the special entries . and .., which you normally want to exclude; so .[!.]* matches all files whose first character is a dot but whose second character isn't.
If you have other files with two leading dots, you need to specify a third wildcard to cover those but exclude the parent directory! Try ..?* which requires there to be at least one character after the second dot.
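Putting the pieces together, here is a sketch covering single-dot names, double-dot names, and ordinary names; nullglob (discussed in the iconv question below) makes any unmatched pattern disappear instead of being passed through literally:
shopt -s nullglob
for f in .[!.]* ..?* *; do
    echo "Processing $f file.."
done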

for file in directory/{.[!.]*,*}; do echo "$file"; done
should echo both hidden and normal files. Thanks to tripleee for the .[!.]* tip.
The curly braces permit an 'or' in the pattern matching: {pattern1,pattern2}.
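Note that brace expansion happens before globbing, so the braces simply produce two independent patterns that are each expanded on their own:
echo directory/{.[!.]*,*}
# is expanded exactly like:
echo directory/.[!.]* directory/*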

Related

How to convert files in Unix using iconv?

I'm new to Bash scripting. I have a requirement to convert multiple input files in UTF-8 encoding to ISO 8859-1.
I am using the below command, which is working fine for the conversion part:
cd ${DIR_INPUT}/
for f in *.txt; do iconv -f UTF-8 -t ISO-8859-1 $f > ${DIR_LIST}/$f; done
However, when I don't have any text files in my input directory ($DIR_INPUT), it still creates an empty .txt file in my output directory ($DIR_LIST).
How can I prevent this from happening?
The empty file *.txt is being created in your output directory because, by default, bash passes an unmatched pattern through as the literal string you supplied. You can change this behaviour in a number of ways, but what you're probably looking for is shopt -s nullglob. Observe:
$ for i in a*; do echo "$i"; done
a*
$ shopt -s nullglob
$ for i in a*; do echo "$i"; done
$
You can find documentation about this in the bash man page under Pathname Expansion.
In your case, I'd probably rewrite this in this way:
shopt -s nullglob
for f in "$DIR_INPUT"/*.txt; do
iconv -f UTF-8 -t ISO-8859-1 "$f" > "${DIR_LIST}/${f##*/}"
done
This avoids the need for the initial cd, and uses parameter expansion to strip off the path portion of $f for the output redirection. The nullglob will obviously eliminate the work being done on a nonexistent file.
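As a quick illustration of that parameter expansion (with a hypothetical path):
f="$DIR_INPUT/report.txt"   # hypothetical example file
echo "${f##*/}"             # prints: report.txt (everything up to the last / is stripped)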
As @ghoti pointed out, in the absence of files matching the wildcard expression a*, the expression itself becomes the result of pathname expansion. By default (when the nullglob option is unset), a* expands to, literally, a*.
You can set nullglob option, of course. But then you should be aware of the fact that all subsequent pathname expansions will be affected, unless you unset the option after the loop.
I would rather use the find command, which has a clear interface (and, in my opinion, is less prone to surprising implicit behaviour than Bash globbing). E.g.:
cmd='iconv --verbose -f UTF-8 -t ISO-8859-1 "$0" > "$1"/$(basename "$0")'
find "${DIR_INPUT}/" \
-mindepth 1 \
-maxdepth 1 \
-type f \
-name '*.txt' \
-exec sh -c "$cmd" {} "${DIR_LIST}" \;
In the example above, $0 and $1 are positional arguments for the file path and ${DIR_LIST} respectively. The command is invoked via standard shell (sh) because of the need to refer to the file path {} twice. Although most modern implementations of find may handle multiple occurrences of {} correctly, the POSIX specification states:
If more than one argument containing the two characters "{}" is present, the behavior is unspecified.
As in the for loop, the -name pattern *.txt evaluates as true if the basename of the current pathname matches the operand (*.txt) using pattern matching notation. But, unlike in the for loop, filename expansion does not apply, as this is a matching operation, not an expansion.
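To see how sh -c maps its arguments, here is a standalone illustration with hypothetical paths; the first word after the command string becomes $0 and the next one becomes $1:
sh -c 'echo "0=$0 1=$1"' /input/a.txt /output
# prints: 0=/input/a.txt 1=/output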

how to address files by their suffix

I am trying to copy a .nii file (Gabor3.nii) path to a variable but even though the file is found by the find command, I can't copy the path to the variable.
find . -type f -name "*.nii"
Data= '/$PWD/"*.nii"'
output:
./Gabor3.nii
./hello.sh: line 21: /$PWD/"*.nii": No such file or directory
What went wrong
You show that you're using:
Data= '/$PWD/"*.nii"'
The space means that the Data= part sets an environment variable $Data to an empty string, and then attempts to run '/$PWD/"*.nii"'. The single quotes mean that what is between them is not expanded, and you don't have a directory /$PWD (that's a directory named with the characters $, P, W, D in the root directory), so the script "*.nii" isn't found in it, hence the error message.
Using arrays
OK; that's what's wrong. What's right?
You have a couple of options. The most reliable is to use an array assignment and shell expansion:
Data=( "$PWD"/*.nii )
The parentheses (note the absence of a space before the (; that's crucial) make it an array assignment. Using shell globbing gives a list of names, preserving spaces etc. in the names correctly. Using double quotes around "$PWD" ensures that the expansion is correct even if there are spaces in the current directory name.
You can find out how many files there are in the list with:
echo "${#Data[#]}"
You can iterate over the list of file names with:
for file in "${Data[#]}"
do
echo "File is [$file]"
ls -l "$file"
done
Note that variable references must be in double quotes for names with spaces to work correctly. The "${Data[@]}" notation has parallels with "$@", which also preserves spaces in the arguments to the command. There is a "${Data[*]}" variant which behaves analogously to "$*", and is of similarly limited value.
If you're worried that there might not be any files with the extension, then use shopt -s nullglob to expand the globbing expression into an empty list rather than the unexpanded expression which is the historical default. You can unset the option with shopt -u nullglob if necessary.
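A minimal sketch combining the array assignment with nullglob, so the array is simply empty when nothing matches:
shopt -s nullglob
Data=( "$PWD"/*.nii )
if (( ${#Data[@]} == 0 )); then
    echo "no .nii files found" >&2
else
    printf 'found: %s\n' "${Data[@]}"
fi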
Alternatives
Alternatives involve things like using command substitution Data=$(ls "$PWD"/*.nii), but this is vastly inferior to using an array unless neither the path in $PWD nor the file names contain any spaces, tabs, or newlines. If there is no white space in the names, it works OK; you can iterate with:
for file in $Data
do
    echo "No white space [$file]"
    ls -l "$file"
done
but this is altogether less satisfactory if there are (or might be) any white space characters around.
You can use command substitution:
Data=$(find . -type f -name "*.nii" -print -quit)
The -quit option stops the search after the first file is found, which prevents multi-line output. (Omit it if you're sure only one file will be found, or if you want to process multiple files.)
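With the directory from the question, that would give:
$ Data=$(find . -type f -name "*.nii" -print -quit)
$ echo "$Data"
./Gabor3.nii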
The syntax to do what you seem to be trying to do with:
Data= '/$PWD/"*.nii"'
would be:
Data="$(ls "$PWD"/*.nii)"
Not saying it's the best approach for whatever you want to do next of course, it's probably not...

use bash to rename a file with spaces and regex

If I have a file name with spaces and a random set of numbers that looks like this:
file name1234.csv
I want to rename it to this (assuming date is previously specified):
file_name_${date}.csv
I am able to do it like this:
mv 'file name'*.csv "file_name_${date}.csv"
However, in a situation where 'file name*.csv' can actually match multiple files, I want to specify that it's 'file name[random numbers].csv'.
I've searched around and can't find any relevant answers.
You need an extended glob (extglob) pattern to match one or more digits:
+([0-9])
A functional script could be like this one:
date=$(date +'%Y-%m-%d')
shopt -s extglob nullglob
for f in 'file name'+([[:digit:]]).csv; do
    file="${f%%[0-9]*}"
    echo mv "$f" "${file// /_}_${date}.csv"
done
Warning: all files found will be renamed to the same name; make sure that is what you want before removing the echo.
To activate the extended version of "Pathname Expansion", we use shopt -s extglob.
To avoid the case where no file is matched, we also need nullglob set.
We then loop over all matched files to change each of their names.
The ${f%%[0-9]*} removes everything from the first digit to the end.
The ${file// /_} replaces spaces with underscores.
The mv is not actually executed in the script as presented, because of the echo.
If, after a test run, you want the change(s) performed, remove the echo.
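To see the two expansions step by step, using the file name from the question:
f='file name1234.csv'
file="${f%%[0-9]*}"     # 'file name'  (removes from the first digit to the end)
echo "${file// /_}"     # prints: file_name  (spaces replaced with underscores)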
Use Extended Globs and Parameter Expansion
You can do what you want with Bash extended globs and a few parameter expansions, without resorting to external or non-standard utilities.
date="2016-11-21"
shopt -s extglob
for file in 'file name'+([[:digit:]]).csv; do
    newfile="${file%%[0-9]*}"
    newfile="${newfile// /_}"
    mv "$file" "${newfile}_${date}.csv"
done

In shell scripting, what does .[!.]* mean?

A command that prints a list of files and folders in the current directory along with their total sizes is du -sh *. That command alone doesn't, however, list hidden files or folders. I found a solution for a command that does correctly list the hidden files and folders along with the rest: du -sh .[!.]* *. Although it works perfectly, the solution was provided as-is, without any explanation.
What is the meaning of .[!.]*, exactly? How does it work?
It's a globbing pattern that tells bash to match all names starting with a ., followed by any character except a ., and then any characters after that.
. - matches a literal ., the prefix of a hidden file name
[!.] - matches any single character except .
* - matches any number of characters
So the pattern matches names that start with . but not with ..
.[!.]* means any file or directory name that starts with . but is not followed by another ., so it includes all hidden files and directories under the current directory but excludes the parent directory (..).
This behaviour comes from shell glob patterns, so you can run ls .[!.]* to see what the pattern actually matches in your shell environment.
BTW, you can turn dotglob on in your shell to simplify your du command.
$ shopt -s dotglob
$ du -sh *
$ shopt -u dotglob
From the bash manual:
dotglob If set, bash includes filenames beginning with a `.' in the results of pathname expansion.

Remove all files except files with certain extension

This removes all files that end with .a or .b
$ ls *.a
a.a b.a c.a
$ ls *.b
a.b b.b c.b
$ rm *.a *.b
How would I do the opposite and remove all files that end with *.* except the ones that end with *.a and *.b?
The linked answer has useful info, though the question is somewhat ambiguous and the answers use differing interpretations.
The simplest approach in your case is probably (a streamlined version of https://stackoverflow.com/a/10448940/45375):
(GLOBIGNORE='*.a:*.b'; rm *.*)
Note the use of a subshell ((...)) to localize setting the GLOBIGNORE variable.
The patterns assigned to GLOBIGNORE must be :-separated.
The appeal of this approach is that you can use a single subshell without changing global state.
By contrast, getting away with a single subshell with shopt -s extglob requires a bit of trickery:
(shopt -s extglob; glob='*.!(a|b)'; echo $glob)
Note the mandatory use of an intermediate variable, without which the command would break (because a literal glob would be parsed BEFORE shopt -s extglob takes effect, at which point the extended globbing syntax is not yet recognized).
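You can verify this by trying the same thing with a literal glob; with extglob off at parse time, bash rejects it with something like:
$ (shopt -s extglob; echo *.!(a|b))
bash: syntax error near unexpected token `('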
Caveat: Using GLOBIGNORE has an unexpected side effect (bug?):
If GLOBIGNORE is set - to whatever value - pathname expansion of * and *.* behaves as if shell option dotglob were in effect - even if it isn't.
In other words: If GLOBIGNORE is set, hidden files not explicitly exempted by a pattern in GLOBIGNORE are always matched by * and *.*.
dotglob is OFF by default, causing * NOT to include hidden files (if GLOBIGNORE is not set, which is true by default).
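A quick way to observe the side effect (a sketch, in a hypothetical directory containing one hidden file):
$ ls -a
.  ..  .hidden  x.a  y.b
$ echo *
x.a y.b
$ (GLOBIGNORE='*.a'; echo *)
.hidden y.b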
If you also wanted to exclude hidden files while using GLOBIGNORE, add the following pattern: .*; applied to the question, you'd get:
(GLOBIGNORE='*.a:*.b:.*'; rm *.*)
By contrast, using extended globbing after turning on the extglob shell option DOES respect the dotglob option.
You can enable extended glob in bash:
shopt -s extglob
Then you can use:
rm *.!(a|b)
to remove all files that end with *.* except the ones that end in .a or .b.
Update: (Thanks to @mklement0) Here is a way to localize setting extglob (without altering the global state) by doing this in a subshell and using an intermediate variable:
(shopt -s extglob; glob='*.!(a|b)'; rm $glob)
There are some shells that are capable of this (I think?), however, bash is not by default. If you are running bash on Cygwin, you can do this:
rm $(ls -1 | grep -v '\.a$' | grep -v '\.b$')
ls -1 (that's a one) lists all files in the current directory, one per line.
grep -v '\.a$' filters out lines that end in .a
grep -v '\.b$' filters out lines that end in .b
Sometimes it's better to not insist on solving a problem a certain way. And for the general problem of "acting on certain files to be determined in some tricky way", find is probably the best all-around tool you'll find.
find . -maxdepth 1 -type f ! -name '*.[ab]' -delete
Omit the -maxdepth 1 if you want to recurse into subdirectories.
