Expand part of the path in bash script - bash

I am trying to list all files located in specific sub-directories of a directory in my bash script. Here is what I tried.
root_dir="/home/shaf/data/"
sub_dirs_prefixes=('ab' 'bp' 'cd' 'cn' 'll' 'mr' 'mb' 'nb' 'nh' 'nw' 'oh' 'st' 'wh')
ls "$root_dir"{$(IFS=,; echo "${sub_dirs_prefixes[*]}")}"rrc/"
Please note that I do not want to expand the value stored in $root_dir, as it may contain spaces, but I do want to expand the sub-path contained in {}, which is a comma-delimited string built from the contents of $sub_dirs_prefixes. I stored the sub-directory prefixes in an array variable, $sub_dirs_prefixes, because I have to use them later on for something else.
I am getting the following error:
ls: cannot access /home/shaf/data/{ab,bp,cd,cn,ll,mr,mb,nb,nh,nw,oh,st,wh}rrc/
If I copy the path from the error message and run ls from the command line, it does display the contents of the listed sub-directories.

You can use command substitution to generate an extended glob pattern.
shopt -s extglob
ls "$root_dir"/$(IFS="|"; echo "#(${sub_dirs_prefixes[*]})rrc")
By the time parameter and command substitutions have completed, the shell sees this just before performing pathname expansion:
ls "/home/shaf/data/"/#(ab|bp|cd|cn|ll|mr|mb|nb|nh|nw|oh|st|wh)rrc
The @(...) pattern matches exactly one of the enclosed prefixes.
It gets a little trickier if the components of the directory names contain characters that need to be quoted, since we aren't quoting the command substitution.
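If the directory components might contain characters that need quoting, one way to sidestep that caveat entirely is to build the full paths in an array and pass them to ls as separate, quoted arguments. A minimal sketch, reusing the variables from the question and assuming all of the directories exist:
sub_dirs=()
for prefix in "${sub_dirs_prefixes[@]}"; do
    sub_dirs+=( "${root_dir}${prefix}rrc/" )   # root_dir already ends in a slash
done
ls "${sub_dirs[@]}"   # lists the contents of each directory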

Related

How can I loop through a list of files, from directories specified in a variable?

I am trying to loop through the files in some directories and perform an action on each file.
The list of directories is specified by a list of strings, stored as an environment variable
LIST_OF_DIRECTORIES=dir1 dir2 dir3
for dir in $LIST_OF_DIRECTORIES; do
  for file in $dir/* ; do
    echo $file
  done
done
This results in nothing. I'm expecting all of the files within that directory to be echoed.
I am basing my logic off of Bash For-Loop on Directories, and trying to make this work for my use case.
You have to place quotes around strings that contain spaces, otherwise each "word" will be interpreted separately. In your example, LIST_OF_DIRECTORIES=dir1 is executed (dir1 is indeed assigned to LIST_OF_DIRECTORIES), but because it precedes what is then interpreted as a simple command (dir2 dir3), the assignment only lives temporarily in the environment of that command.
You should do either of these instead:
LIST_OF_DIRECTORIES="dir1 dir2 dir3"
LIST_OF_DIRECTORIES='dir1 dir2 dir3'
From Simple Command Expansion:
If no command name results, the variable assignments affect the
current shell environment. In the case of such a command (one that
consists only of assignment statements and redirections), assignment
statements are performed before redirections. Otherwise, the variables
are added to the environment of the executed command and do not affect
the current shell environment. If any of the assignments attempts to
assign a value to a readonly variable, an error occurs, and the
command exits with a non-zero status.
Also, as a suggestion, use an array for storing multiple entries instead, and don't rely on word splitting unless your script doesn't use filename expansion and noglob is enabled with set -f or shopt -so noglob.
LIST_OF_DIRECTORIES=(dir1 dir2 dir3)
for dir in "${LIST_OF_DIRECTORIES[#]}"; do
Other References:
Quoting
Arrays
Filename Expansion
Word Splitting
This will work fine for you:
LIST_OF_DIRECTORIES="dir1 dir2 dir3"
for dir in $LIST_OF_DIRECTORIES; do
  # add all the files to the files variable
  files=`ls $dir`
  for file in $files; do
    # Take action on your file here, I am just doing ls for my file here.
    echo `ls $dir/$file`
  done
done

Have bash interpret double quotes stored in variable containing file path

I would like to capture a directory that contains spaces in a bash variable and pass this to the ls command without surrounding the variable dereference in double quotes. Following are two examples that illustrate the problem. Example 1 works, but it involves typing double quotes. Example 2 does not work, but I wish it did, because then I could avoid typing the double quotes.
Example 1, with quotes surrounding variable, as in the solution to How to add path with space in Bash variable, which does not solve the problem:
[user@machine]$ myfolder=/home/username/myfolder\ with\ spaces/
[user@machine]$ ls "$myfolder"
file1.txt file2.txt file3.txt
Example 2, with quotes as part of the variable, which also does not solve the problem. According to my understanding, in this example the first quote character is sent to the ls command before the error is thrown:
[user#machine]$ myfolder=\"/home/username/myfolder\ with\ spaces/\"
[user#machine]$ ls $myfolder
ls: cannot access '"/home/username/myfolder': No such file or directory
In example 2, the error message indicates that the first double quote was sent to the ls command, but I want these quotes to be interpreted by bash, not ls. Is there a way I can change the myfolder variable so that the second line behaves exactly as the following:
[user#machine]$ ls "/home/username/myfolder with spaces/"
The goal is to craft the myfolder variable in such a way that (1) it does not need to be surrounded by any characters and (2) the ls command will list the contents of the existing directory that it represents.
The motivation is to have an efficient shorthand to pass long directory paths containing spaces to executables on the command line with as few characters as possible - so without double quotes if that is possible.
Assuming some 'extra' characters prior to the ls command are acceptable:
$ mkdir /tmp/'myfolder with spaces'
$ touch /tmp/'myfolder with spaces'/myfile.txt
$ myfolder='/tmp/myfolder with spaces'
$ myfolder=${myfolder// /?} # replace spaces with literal '?'
$ typeset -p myfolder
declare -- myfolder="/tmp/myfolder?with?spaces"
$ set -xv
$ ls $myfolder
+ ls '/tmp/myfolder with spaces'
myfile.txt
Granted, the ? is going to match on any single character but how likely is it that you'll have multiple directories/files with similar names where the only difference is a space vs a non-space?
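A quick sanity check before relying on the trick (just a sketch): expand the pattern into an array first and see what it actually matched, since ? also matches non-space characters.
shopt -s nullglob                 # so a non-matching pattern expands to nothing
matches=( $myfolder )             # deliberately unquoted so the ? wildcards expand
printf 'matched %d path(s):\n' "${#matches[@]}"
printf '%s\n' "${matches[@]}"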

Add string to each member of variable in Bash

I have the following command to gather all files in a folder and concatenate them... but what is held in the variable is only the file names and not the directory. How can I add 'colid-data/' to each of the files for cat to use?
cat $(ls -t colid-data) > catfiles.txt
List the filenames, not the directory.
cat $(ls -t colid-data/*) > catfiles.txt
Note that this will not work if any of the filenames contain whitespace. See Why not parse ls? for better alternatives.
If you want to concatenate them in date order, consider using zsh:
cat colid-data/*(.om) >catfiles.txt
That would concatenate all regular files only, in order of most recently modified first.
From bash, you could do this with
zsh -c 'cat colid-data/*(.om)' >catfiles.txt
If the ordering of the files is not important (and if there's only regular files in the directory, no subdirectories), just use
cat colid-data/* >catfiles.txt
All of these variations would work with filenames containing spaces, tabs and newlines, since the list of pathnames returned by a filename globbing pattern is not split into further words (which the result of an unquoted command substitution is).
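If you also want the list of file names kept in a variable for later use (and modification-time ordering doesn't matter), a plain bash array filled by globbing avoids parsing ls entirely; a sketch:
shopt -s nullglob                      # an empty directory gives an empty list, not a literal pattern
files=( colid-data/* )
(( ${#files[@]} )) && cat "${files[@]}" > catfiles.txt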

How do I only list specific files with 'ls' in bash?

I was wondering how I can get ls in bash to list only a specific subset of files?
For example, I have a folder with 10000 files, some of which are named:
temp_cc1_covmat and temp_cc1_slurm, where the 1 ranges from 1 to 1000.
So how would I list only, say, temp_cc400_slurm through temp_cc499_slurm?
I want to do this as I would like to queue files on a supercomputer, and only the ones ending with _slurm. I could do sbatch *_slurm, but there are also a lot of other files in the folder that end with _slurm.
You can use brace expansion in bash:
temp_cc{400..499}_slurm
To list these files, use:
echo temp_cc{400..499}_slurm
or:
printf "%s\n" temp_cc{400..499}_slurm
or even ls:
ls temp_cc{400..499}_slurm
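Applied to the original goal of queueing just those jobs, the same brace expansion can drive a loop (a sketch, assuming sbatch takes one script per invocation):
for f in temp_cc{400..499}_slurm; do
    [ -e "$f" ] || continue   # brace expansion does not check that the file exists
    sbatch "$f"
done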
Using the ? wildcard:
$ ls temp_cc4??_slurm
man 7 glob:
Wildcard matching
A string is a wildcard pattern if it contains one of the characters
'?', '*' or '['. Globbing is the operation that expands a wildcard
pattern into the list of pathnames matching the pattern. Matching is
defined by:
A '?' (not between brackets) matches any single character.
The "argument list too long" error applies when using ? as well. I tested with ls test????? and it worked, but with ls test[12]????? I got the error. (Yes, you could also use ls temp_cc4[0-9][0-9]_slurm.)

how to address files by their suffix

I am trying to copy the path of a .nii file (Gabor3.nii) to a variable, but even though the file is found by the find command, I can't copy the path to the variable.
find . -type f -name "*.nii"
Data= '/$PWD/"*.nii"'
output:
./Gabor3.nii
./hello.sh: line 21: /$PWD/"*.nii": No such file or directory
What went wrong
You show that you're using:
Data= '/$PWD/"*.nii"'
The space means that the Data= part sets an environment variable $Data to an empty string, and then attempts to run '/$PWD/"*.nii"'. The single quotes mean that what is between them is not expanded, and you don't have a directory /$PWD (that's a directory name of $, P, W, D in the root directory), so the script "*.nii" isn't found in it, hence the error message.
Using arrays
OK; that's what's wrong. What's right?
You have a couple of options. The most reliable is to use an array assignment and shell expansion:
Data=( "$PWD"/*.nii )
The parentheses (note the absence of spaces before the ( — that's crucial) makes it an array assignment. Using shell globbing gives a list of names, preserving spaces etc in the names correctly. Using double quotes around "$PWD" ensures that the expansion is correct even if there are spaces in the current directory name.
You can find out how many files there are in the list with:
echo "${#Data[#]}"
You can iterate over the list of file names with:
for file in "${Data[#]}"
do
echo "File is [$file]"
ls -l "$file"
done
Note that variable references must be in double quotes for names with spaces to work correctly. The "${Data[@]}" notation has parallels with "$@", which also preserves spaces in the arguments to the command. There is a "${Data[*]}" variant which behaves analogously to "$*", and is of similarly limited value.
If you're worried that there might not be any files with the extension, then use shopt -s nullglob to expand the globbing expression into an empty list rather than the unexpanded expression which is the historical default. You can unset the option with shopt -u nullglob if necessary.
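A small sketch of the difference nullglob makes when nothing matches:
shopt -s nullglob
Data=( "$PWD"/*.nii )
echo "${#Data[@]}"   # 0 if there are no .nii files; without nullglob you would get 1,
                     # because the array would hold the literal, unexpanded pattern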
Alternatives
Alternatives involve things like using command substitution Data=$(ls "$PWD"/*.nii), but this is vastly inferior to using an array unless neither the path in $PWD nor the file names contain any spaces, tabs, newlines. If there is no white space in the names, it works OK; you can iterate over:
for file in $Data
do
    echo "No white space [$file]"
    ls -l "$file"
done
but this is altogether less satisfactory if there are (or might be) any white space characters around.
You can use command substitution:
Data=$(find . -type f -name "*.nii" -print -quit)
To prevent multiline output, the -quit option stops searching after the first file is found (you can omit it if you're sure only one file will be found, or if you want to process multiple files).
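If you do want every match from find rather than just the first, a null-delimited read into an array is the whitespace-safe variant (a sketch; mapfile -d '' needs bash 4.4 or later):
mapfile -d '' -t Data < <(find . -type f -name '*.nii' -print0)
printf '%s\n' "${Data[@]}"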
The syntax to do what you seem to be trying to do with:
Data= '/$PWD/"*.nii"'
would be:
Data="$(ls "$PWD"/*.nii)"
Not saying it's the best approach for whatever you want to do next, of course; it probably isn't...
