Bash - Convert comma separated list

I'm trying to write a script that works on a list of files. If I want to work on all files in the same directory then
FILES=*.ext
for i in $FILES; do
something on "$i"
done
works OK. The problem comes when I want to work on just a selection of files and not everything. How do I convert a comma-separated list of files, which may or may not contain spaces, into the same format, so that I can store it in $FILES and use the same code?
Many thanks, in advance
David Shaw

The correct thing to do is to use an array rather than a delimited list of filenames (and to avoid uppercase variable names). This avoids the problem of filenames containing your separator (e.g. ,) and is the idiomatic approach:
files=( *foo*.ext1 *bar*.ext2 file1 "file 2 with a space" )
for file in "${files[@]}"; do
[ -e "${file}" ] || continue
do_something_with "${file}"
done
Unless you have no control over how $files is populated, this is what you want. If your script gets fed a comma-separated list and you absolutely cannot avoid it, then you can set IFS accordingly, as in @BroSlow's answer.
Since globbing does the right thing when expanding, filenames with spaces are not a problem (not even in your example).
You might also want to check extended globbing (extglob) to be able to match more specifically.

If I am interpreting your question correctly, you can just set the internal field separator (IFS) in bash to a comma and then have word-splitting take care of the rest, e.g.
#!/bin/bash
FILES="file1,file2 with space,file3,file4 with space"
IFS=','
for i in $FILES; do
echo "File = [$i]"
done
Which would output
File = [file1]
File = [file2 with space]
File = [file3]
File = [file4 with space]
Note, as Adrian Frühwirth pointed out in comments, this will fail if the filenames can contain commas.
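If the comma-separated string really is unavoidable, one way to combine both answers is to split it into an array once and then loop over the array. This is only a sketch: the variable names list and files are made up for illustration, and it still breaks on filenames that contain commas or newlines.

```shell
#!/bin/bash
# Hypothetical input; in a real script this might come from "$1".
list="file1,file2 with space,file3,file4 with space"

# IFS=, applies only to this single read command, so the global IFS
# is left untouched. -r keeps backslashes literal, -a fills an array.
IFS=, read -r -a files <<< "$list"

for file in "${files[@]}"; do
    printf 'File = [%s]\n' "$file"
done
```

Unlike an unquoted $FILES expansion, read does not perform glob expansion on the fields, so an entry such as * stays literal.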

Related

How to remove all file extensions in bash?

x=./gandalf.tar.gz
noext=${x%.*}
echo $noext
This prints ./gandalf.tar, but I need just ./gandalf.
I might have even files like ./gandalf.tar.a.b.c which have many more extensions.
I just need the part before the first .
If you want to give sed a chance then:
x='./gandalf.tar.a.b.c'
sed -E 's~(.)\..*~\1~g' <<< "$x"
./gandalf
Or 2 step process in bash:
y="${x#./}"
echo "./${y%%.*}"
./gandalf
Using extglob shell option of bash:
shopt -s extglob
x=./gandalf.tar.a.b.c
noext=${x%%.*([!/])}
echo "$noext"
This deletes the substring not containing a / character, after and including the first . character. Also works for x=/pq.12/r/gandalf.tar.a.b.c
Perhaps a regexp is the best way to go if your bash version supports it, as it doesn't fork new processes.
This regexp works with any prefix path and takes into account files with a dot as first char in the name (hidden files):
[[ "$x" =~ ^(.*/|)(.[^.]*).*$ ]] && \
noext="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
Regexp explained
The first group captures everything up to the last / included (regexp are greedy in bash), or nothing if there are no / in the string.
Then the second group captures everything up to the first ., excluded.
The rest of the string is not captured, as we want to get rid of it.
Finally, we concatenate the path and the stripped name.
Note
It's not clear what you want to do with files beginning with a . (hidden files). I modified the regexp to preserve that . if present, as it seemed the most reasonable thing to do. E.g.
x="/foo/bar/.myinitfile.sh"
becomes /foo/bar/.myinitfile.
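For repeated use, the regexp above can be wrapped in a function. The sketch below assumes a helper name, strip_ext_regex, that is not part of the original answer. It stores the pattern in a variable first, which avoids quoting surprises inside [[ ]].

```shell
#!/bin/bash
# strip_ext_regex is a hypothetical helper name used for illustration.
strip_ext_regex() {
    local re='^(.*/|)(.[^.]*).*$'
    if [[ $1 =~ $re ]]; then
        # group 1: directory part (may be empty), group 2: stripped name
        printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
    else
        # no match (e.g. an empty string): return the input unchanged
        printf '%s\n' "$1"
    fi
}

strip_ext_regex ./gandalf.tar.a.b.c
strip_ext_regex /foo/bar/.myinitfile.sh
```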
If performance is not an issue, you could use something like this:
fil=$(basename "$x")
noext="$(dirname "$x")"/${fil%%.*}
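The same split into directory and file name can also be done with parameter expansion alone, without forking basename and dirname. A minimal sketch, assuming a made-up function name strip_all_ext. Note that, unlike the regexp answer, it reduces a hidden file such as .bashrc to an empty name.

```shell
#!/bin/bash
# strip_all_ext is a hypothetical helper name used for illustration.
strip_all_ext() {
    local path="$1"
    local base="${path##*/}"     # file name without any directory part
    local dir="${path%"$base"}"  # directory part, trailing slash included
    # Remove everything from the first "." of the file name onwards,
    # so dots in directory names are left alone.
    printf '%s%s\n' "$dir" "${base%%.*}"
}

strip_all_ext ./gandalf.tar.a.b.c
strip_all_ext /pq.12/r/gandalf.tar.a.b.c
```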

Add substring to a string in bash

I have the following array:
SPECIFIC_FILES=('resources/logo.png' 'resources/splash.png' 'www/img/logo.png' 'www/manifest.json')
And the following variable:
CUSTOMER=default
How can I loop through my array and generate strings that would look like
resources/logo_default.png
depending on the variable.
The below uses parameter expansion to extract the relevant substrings, as also described in BashFAQ #100:
specific_files=('resources/logo.png' 'resources/splash.png' 'www/img/logo.png' 'www/manifest.json')
customer=default
for file in "${specific_files[@]}"; do
[[ $file = *.* ]] || continue # skip files without extensions
prefix=${file%.*} # trim everything including and after last "."
suffix=${file##*.} # trim everything up to and including last "."
printf '%s\n' "${prefix}_$customer.$suffix" # concatenate results of those operations
done
Lower-case variable names are used here in keeping with POSIX-specified conventions (all-caps names are used for variables meaningful to the operating system or shell, whereas variables with at least one lower-case character are reserved for application use; setting a regular shell variable overwrites any like-named environment variable, so the conventions apply to both classes).
Here's a solution with sed:
for f in "${SPECIFIC_FILES[@]}"; do
echo "$f" | sed "s/\(.*\)\.\([^.]*\)/\1_${CUSTOMER}.\2/"
done
If you know that there is only one period per filename, you can use expansion on each element directly:
$ printf '%s\n' "${SPECIFIC_FILES[@]/./_"$CUSTOMER".}"
resources/logo_default.png
resources/splash_default.png
www/img/logo_default.png
www/manifest_default.json
If you don't, Charles' answer is the robust one covering all cases.
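To see why the single-period assumption matters, here is a small comparison. The filename www/img/logo.min.png is a made-up example with two periods. The ${var/pattern/replacement} form replaces the first match only, while the prefix/suffix approach works from the last period.

```shell
#!/bin/bash
customer=default
file='www/img/logo.min.png'   # hypothetical name with two periods

# Replaces the FIRST "." in the whole string.
first_dot=${file/./_$customer.}

# Splits at the LAST "." instead.
last_dot="${file%.*}_$customer.${file##*.}"

printf '%s\n' "$first_dot"   # www/img/logo_default.min.png
printf '%s\n' "$last_dot"    # www/img/logo.min_default.png
```

The first form would also rewrite a period in a directory name, since it looks at the whole path.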

bash: why does "for" behave differently when reading an array "${files[@]}"

I want to read a text file line by line. Here are two methods.
This method works fine and reads the text line by line (I can tell by using the read command):
readarray -t files < c.txt
for i in "${files[@]}"; do
printf '%s\n' "$i" >> 2.txt
read -p 'press enter to continue'
done
But this method reads the whole file at once, not line by line!
for i in "$(<c.txt)"; do
printf '%s\n' "$i" >> 1.txt
read -p 'press enter to continue'
done
If I remove the double quotes in "$(<c.txt)" and use IFS=$'\n' and set -f, it reads the text line by line as expected.
The question:
Why does it read line by line when I use "${files[@]}"? Why does for behave differently?
Text file used in these examples:
$ cat c.txt
this is a test
)=_ç)çà)èç(-è_-'é²"²2°4.²&é (§/%Mµ%µ¨£¨P£
trailing space
tab                
#comment
*
echo test    
The problem with using for is that it doesn't know anything about lines -- for expects to be given a list of "words" to iterate over. If you use for i in "$(<c.txt)", the double-quotes tell the shell not to do any parsing on the contents of the file, so the entire contents will be treated as a single word. On the other hand, if you leave off the double-quotes (for i in $(<c.txt)), the shell will split the file's contents into "words" separated by whitespace (by default that means spaces, tabs, and newlines), and then tries to expand any of those words that include wildcards into a list of matching filenames. You can adjust the shell options to make this split-and-expand process work more like what you want, but fundamentally it's meant for doing something else and at best it'll be a kluge.
If you want to read lines from a file, use something that's meant for reading lines from a file. read and readarray are both intended for this purpose. They have their own quirks and such that you may need to work around, but they at least start in the right area; for really doesn't. BTW, the BashFAQ entry you linked has a perfectly good workaround for the problem with ffmpeg etc:
while read -r line <&3; do
...
done 3<file
The readarray approach you gave will also work fine, but isn't as portable. What readarray does is read the entire file into a shell array, with each line as a separate array element. Then "${files[@]}" tells the shell to expand the array contents, with each array element treated as a separate word. Thus lines become array elements, which become "words", which are the things for iterates over, and you get your expected result.
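The three behaviours can be compared by counting loop iterations. This is just a sketch using a throwaway temp file in place of c.txt; the counter names are invented for illustration, and readarray needs bash 4 or later.

```shell
#!/bin/bash
tmp=$(mktemp)
printf '%s\n' 'this is a test' '  leading space' '*' > "$tmp"

# Quoted $(<file): the whole file is ONE word, so the loop runs once.
count_quoted=0
for word in "$(<"$tmp")"; do count_quoted=$((count_quoted + 1)); done

# readarray: one array element per line, one iteration per element.
readarray -t lines < "$tmp"
count_array=0
for line in "${lines[@]}"; do count_array=$((count_array + 1)); done

# while read: IFS= keeps leading/trailing whitespace, -r keeps backslashes.
count_read=0
while IFS= read -r line; do count_read=$((count_read + 1)); done < "$tmp"

rm -f "$tmp"
printf '%s %s %s\n' "$count_quoted" "$count_array" "$count_read"   # 1 3 3
```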

Multiple elements instead of one in bash script for loop

I have been following the answers given in these questions
Shellscript Looping Through All Files in a Folder
How to iterate over files in a directory with Bash?
to write a bash script which goes over files inside a folder and processes them. So, here is the code I have:
#!/bin/bash
YEAR="2002/"
INFOLDER="/local/data/datasets/Convergence/"
for f in "$INFOLDER$YEAR*.mdb";
do
echo $f
absname=$INFOLDER$YEAR$(basename $f)
# ... the rest of the script ...
done
I am receiving this error: basename: extra operand.
I added echo $f and I realized that f contains all the filenames separated by space. But I expected to get one at a time. What could be the problem here?
You're running into problems with quoting. In the shell, double-quotes prevent word splitting and wildcard expansion; generally, you don't want these things to happen to variables' values, so you should double-quote variable references. But when you have something that should be word-split or wildcard-expanded, it cannot be double-quoted. In your for statement, you have the entire file pattern in double-quotes:
for f in "$INFOLDER$YEAR*.mdb";
...which prevents word-splitting and wildcard expansion on the variables' values (good) but also prevents it on the * which you need expanded (that's the point of the loop). So you need to quote selectively, with the variables inside quotes and the wildcard outside them:
for f in "$INFOLDER$YEAR"*.mdb;
And then inside the loop, you should double-quote the references to $f in case any filenames contain whitespace or wildcards (which are completely legal in filenames):
echo "$f"
absname="$INFOLDER$YEAR$(basename "$f")"
(Note: the double-quotes around the assignment to absname aren't actually needed -- the right side of an assignment is one of the few places in the shell where it's safe to skip them -- but IMO it's easier and safer to just double-quote all variable references and $( ) expressions than to try to keep track of where it's safe and where it's not.)
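A quick way to check the difference is to count iterations in a scratch directory. The sketch below assumes a temp directory with two made-up .mdb files, one of which has a space in its name.

```shell
#!/bin/bash
dir=$(mktemp -d)
touch "$dir/a one.mdb" "$dir/b.mdb"

# Whole pattern quoted: no expansion, one iteration over the literal string.
quoted=0
for f in "$dir/*.mdb"; do quoted=$((quoted + 1)); done

# Variable quoted, wildcard outside the quotes: one iteration per file,
# and the name containing a space stays in one piece.
selective=0
for f in "$dir/"*.mdb; do selective=$((selective + 1)); done

rm -rf "$dir"
printf 'quoted=%s selective=%s\n' "$quoted" "$selective"   # quoted=1 selective=2
```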
Just quote your shell variables if they are supposed to contain strings with spaces in between.
basename "$f"
Not doing so will lead to splitting of the string into separate words (see WordSplitting in bash), thereby messing up the basename command, which expects one string argument rather than multiple.
Also, it would be wise to keep the * outside the quotes, as shell globbing doesn't work inside them (single or double).
#!/bin/bash
# good practice to lower-case variable names to distinguish them from
# shell environment variables
year="2002/"
in_folder="/local/data/datasets/Convergence/"
for file in "${in_folder}${year}"*.mdb; do
# skip the unexpanded pattern if no files are found
[ -e "$file" ] || continue
echo "$file"
# Worth noting here: $file already holds the name of the file
# with its absolute path, just as below. You don't need to
# construct it manually
absname=${in_folder}${year}$(basename "$file")
done
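An alternative to the [ -e "$file" ] guard is bash's nullglob option, which makes a pattern with no matches expand to nothing. A minimal sketch, assuming an empty temp directory and that nullglob is off to begin with:

```shell
#!/bin/bash
dir=$(mktemp -d)   # empty directory, so *.mdb matches nothing

# With nullglob, an unmatched pattern disappears and the loop body never runs.
shopt -s nullglob
with_nullglob=0
for file in "$dir/"*.mdb; do with_nullglob=$((with_nullglob + 1)); done
shopt -u nullglob

# Without it, the loop runs once with the literal, unexpanded pattern.
without_nullglob=0
for file in "$dir/"*.mdb; do without_nullglob=$((without_nullglob + 1)); done

rm -rf "$dir"
printf '%s %s\n' "$with_nullglob" "$without_nullglob"   # 0 1
```

Since nullglob changes how every unmatched pattern in the script behaves, switching it back off after the loop, as here, keeps the effect local.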
Just remove the double quotes from this line:
for f in "$INFOLDER$YEAR*.mdb";
so it looks like this:
#!/bin/bash
YEAR="2002/"
INFOLDER="/local/data/datasets/Convergence/"
for f in $INFOLDER$YEAR*.mdb;
do
echo $f
absname=$INFOLDER$YEAR$(basename $f)
# ... the rest of the script ...
done

Why does echo "$out" split output onto multiple lines, if quotes suppress word-splitting?

I have a very simple directory with "directory1" and "file2" in it.
After
out=`ls`
I want to print my variable: echo $out gives:
directory1 file2
but echo "$out" gives:
directory1
file2
so using quotes gives me output with each record on a separate line. As we know, the ls command prints its output on a single line for all files/dirs (if the line is long enough to contain the output), so I expected that using double quotes would prevent my shell from splitting words onto separate lines, while omitting the quotes would split them.
Please tell me: why does using quotes (used to prevent word-splitting) suddenly split the output?
On Behavior Of ls
ls only prints multiple filenames on a single line by default when output is to a TTY. When output is to a pipeline, a file, or similar, then the default is to print one file to a line.
Quoting from the POSIX standard for ls, with emphasis added:
The default format shall be to list one entry per line to standard output; the exceptions are to terminals or when one of the -C, -m, or -x options is specified. If the output is to a terminal, the format is implementation-defined.
Literal Question (Re: Quoting)
It's the very act of splitting your command into separate arguments that causes it to be put on one line! Natively, your value spans multiple lines, so echoing it unmodified (without any splitting) prints it in precisely that manner.
The result of your command is something like:
out='directory1
file2'
When you run echo "$out", that exact content is printed. When you run echo $out, by contrast, the behavior is akin to:
echo "directory1" "file2"
...in that the string is split into two elements, each passed as a completely separate argument to echo, for echo to deal with as it sees fit -- in this case, printing both those arguments on the same line.
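One way to make the splitting visible is to count the arguments a command actually receives. The helper count_args below is invented for this illustration, and the sketch assumes the default IFS.

```shell
#!/bin/bash
out=$'directory1\nfile2'   # same content as the ls output above

# Hypothetical helper: prints how many arguments it was called with.
count_args() { printf '%s\n' "$#"; }

unquoted=$(count_args $out)    # split on the newline: two arguments
quoted=$(count_args "$out")    # one argument containing a newline

printf 'unquoted=%s quoted=%s\n' "$unquoted" "$quoted"   # unquoted=2 quoted=1
```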
On Side Effects Of Word Splitting
Word-splitting may look like it does what you want here, but that's often not the case! Consider some particular issues:
Word-splitting expands glob expressions: If a filename contains a * surrounded by whitespace, that * will be replaced with a list of files in the current directory, leading to duplicate results.
Word-splitting doesn't honor quotes or escaping: If a filename contains whitespace, that internal whitespace can't be distinguished from whitespace separating multiple names. This is closely related to the issues described in BashFAQ #50.
On Reading Directories
See Why you shouldn't parse the output of ls. In short -- in your example of out=`ls`, the out variable (being a string) isn't able to store all possible filenames in a useful, parsable manner.
Consider, for instance, a file created as such:
touch $'hello\nworld"three words here"'
...that filename contains spaces and newlines, and word-splitting won't correctly detect it as a single name in the output from ls. However, you can store and process it in an array:
# create an array of filenames
names=( * )
if ! [[ -e $names || -L $names ]]; then # this tests only the FIRST name
echo "No names matched" >&2 # ...but that's good enough.
else
echo "Found ${#names[@]} files" # print number of filenames
printf '- %q\n' "${names[@]}"
fi
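The difference between an array and line-oriented output can be checked in a scratch directory. This sketch assumes a temp directory holding a single made-up file whose name contains a newline; the variable names are illustrative only.

```shell
#!/bin/bash
dir=$(mktemp -d)
touch "$dir"/$'hello\nworld'     # ONE file with a newline in its name

# The glob yields exactly one array element for that file.
names=( "$dir"/* )
array_count=${#names[@]}

# Any one-name-per-line listing of the same name spans two lines,
# so a line-based parser would see two "files".
line_count=$(( $(printf '%s\n' "${names[@]##*/}" | wc -l) ))

printf '%%q form: %q\n' "${names[0]##*/}"   # escaped, unambiguous form
rm -rf "$dir"
printf 'array=%s lines=%s\n' "$array_count" "$line_count"   # array=1 lines=2
```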
