Bash: Correct way to store result of command in array [duplicate] - bash

How do I put the result of find $1 into an array?
In for loop:
for /f "delims=/" %%G in ('find $1') do %%G | cut -d\/ -f6-

I want to cry.
In bash:
file_list=()
while IFS= read -d $'\0' -r file ; do
file_list=("${file_list[@]}" "$file")
done < <(find "$1" -print0)
echo "${file_list[@]}"
file_list is now an array containing the results of find "$1"
What's special about "field 6"? It's not clear what you were attempting to do with your cut command.
Do you want to cut each file after the 6th directory?
for file in "${file_list[@]}" ; do
echo "$file" | cut -d/ -f6-
done
But why "field 6"? Can I presume that you actually want to return just the last element of the path?
for file in "${file_list[@]}" ; do
echo "${file##*/}"
done
Or even
echo "${file_list[@]##*/}"
Which will give you the last path element for each path in the array. You could even do something with the result
for file in "${file_list[@]##*/}" ; do
echo "$file"
done
Explanation of the bash program elements:
(One should probably use the builtin readarray instead)
find "$1" -print0
Find stuff and 'print the full file name on the standard output, followed by a null character'. This is important as we will split that output by the null character later.
<(find "$1" -print0)
"Process Substitution" : The output of the find subprocess is read in via a FIFO (i.e. the output of the find subprocess behaves like a file here)
while ...
done < <(find "$1" -print0)
The output of the find subprocess is read by the while command via <
IFS= read -d $'\0' -r file
This is the while condition:
read
Read one record of input (from the find command), up to the next delimiter. The return value of read is 0 unless EOF is encountered, at which point the while loop exits.
-d $'\0'
...taking the null character as the delimiter (see QUOTING in the bash manpage). This matches the null character that find emits after each name because of -print0.
-r
backslash is not considered an escape character as it may be part of the filename
file
The result (strictly the first word, but there is only one here) is put into the variable file.
IFS=
The command is run with IFS, the special variable that contains the characters on which read splits input into words, set to the empty string, because we don't want any splitting.
And inside the loop:
file_list=("${file_list[@]}" "$file")
Inside the loop, the file_list array is just grown by $file, suitably quoted.
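The readarray suggestion mentioned above can be sketched as follows. This is a minimal sketch, assuming bash 4.4 or newer (where readarray accepts -d) and a find that supports -print0; the scratch directory demo_dir and its two files are hypothetical stand-ins for "$1".

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory standing in for "$1".
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt"

# -d '' splits the input on NUL bytes, matching find's -print0;
# -t drops the delimiter from each stored element.
readarray -d '' -t file_list < <(find "$demo_dir" -type f -print0)

printf 'found %d files\n' "${#file_list[@]}"
# Print only the last path element of each entry.
printf '%s\n' "${file_list[@]##*/}"

rm -rf "$demo_dir"
```

Compared with the while loop, this reads the whole list in one builtin call, at the cost of requiring a newer bash.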

arrayname=( $(find $1) )
I don't understand your loop question. If you are asking how to work with that array, in bash you can loop through all of its elements like this:
for element in $(seq 0 $((${#arrayname[@]} - 1)))
do
echo "${arrayname[$element]}"
done

This is probably not 100% foolproof, but it will probably work 99% of the time (I used the GNU utilities; the BSD utilities won't work without modifications; also, this was done using an ext4 filesystem):
declare -a BASH_ARRAY_VARIABLE=$(find <path> <other options> -print0 | sed -e 's/\x0$//' | awk -F'\0' 'BEGIN { printf "("; } { for (i = 1; i <= NF; i++) { printf "%c"gensub(/"/, "\\\\\"", "g", $i)"%c ", 34, 34; } } END { printf ")"; }')
Then you would iterate over it like so:
for FIND_PATH in "${BASH_ARRAY_VARIABLE[@]}"; do echo "$FIND_PATH"; done
Make sure to enclose $FIND_PATH inside double-quotes when working with the path.

Here's a simpler pipeless version, based on the version of user2618594
declare -a names=$(echo "("; find <path> <other options> -printf '"%p" '; echo ")")
for nm in "${names[@]}"
do
echo "$nm"
done

To loop through a find, you can simply use find:
for file in "`find "$1"`"; do
echo "$file" | cut -d/ -f6-
done
That is what I understood from your question.

Related

How to only concatenate files with same identifier using bash script?

I have a directory with files, some have the same ID, which is given in the first part of the file name before the first underscore (always). e.g.:
S100_R1.txt
S100_R2.txt
S111_1_R1.txt
S111_R1.txt
S111_R2.txt
S333_R1.txt
I want to concatenate the files that share an ID and, if possible, move the original files into another directory. Example output:
original files (folder)
S100_merged.txt
S111_merged.txt
S333_R1.txt
Small note: I imagine that perhaps a solution would be to place all files that will be processed by the code in a new directory and then, in a second step, move the files with the appended "merged" back to the original dir, or something like this...
I am extremely new to bash scripting, so I really can't produce this code. I am used to the R language and I can think how it should be but can't write it.
My pitiful attempt is something like this:
while IFS= read -r -d '' id; do
cat *"$id" > "./${id%.txt}_grouped.txt"
done < <(printf '%s\0' *.txt | cut -zd_ -f1- | sort -uz)
or this:
for ((k=100;k<400;k=k+1));
do
IDList= echo "S${k}_S*.txt" | awk -F'[_.]' '{$1}'
while [ IDList${k} == IDList${k+n} ]; do
cat IDList${k}_S*.txt IDList${k+n}_S*.txt S${k}_S*.txt S${k}_S*.txt >cat/S${k}_merged.txt &;
done
Sometimes there is only one version of the file (e.g. S333_R1.txt), sometimes two (S100*), three (S111*) or more of the same.
I am prepared for harsh critique for this question because I am so far from a solution, but if someone would be willing to help me out I would greatly appreciate it!
while read -r line;
do
if [[ "$(find . -maxdepth 1 -name "${line}_*.txt" | wc -l)" -gt "1" ]]
then
cat "${line}"_*.txt >> "${line}_merged.txt"
fi
done <<< "$(for i in *_*.txt;do echo "$i";done | awk -F_ '{ print $1 }' | sort -u)"
List the files matching *_*.txt and pass the names to awk, which prints the part before the first "_"; sort -u removes duplicate prefixes. A while loop then reads each prefix. For each one, find counts the files with that prefix and, if there is more than one, cat appends them to a merged file.
for id in $(ls | grep -Po '^[^_]+' | uniq) ; do
if [ $(ls ${id}_*.txt 2> /dev/null | wc -l) -gt 1 ] ; then
cat ${id}_*.txt > _${id}_merged.txt
mv ${id}_*.txt folder
fi
done
for f in _*_merged.txt ; do
mv ${f} ${f:1}
done
A plain bash loop with preprocessing:
# first get the list of files
find . -type f |
# then extract the prefix
sed 's#./\([^_]*\)_#\1\t&#' |
# then in a loop merge the files
while IFS=$'\t' read prefix file; do
cat "$file" >> "${prefix}_merged.txt"
done
That script is iterative: it handles one file at a time. To detect whether a prefix has only one file, we have to look at all the files at once. So first, an awk script joins the filenames that share a prefix:
find . -type f | # maybe `sort |` ?
# join filenames with common prefix
awk '{
f=$0; # remember the file path
gsub(/.*\//,"");gsub(/_.*/,""); # extract prefix from filepath and store it in $0
a[$0]=a[$0]" "f # Join path with leading space in associative array indexed with prefix
}
# Output prefix and filenames separated by spaces.
# TBH a tab would be a better separator..
END{for (i in a) print i a[i]}
' |
# Read input separated by spaces into a bash array
while IFS=' ' read -ra files; do
#first array element is the prefix
prefix=${files[0]}
unset files[0]
# rest is the files
case "${#files[@]}" in
0) echo super error; ;;
# one file - preserve the filename
1) cat "${files[@]}" > "$outdir"/"${files[1]}"; ;;
# more files - do a _merged.txt suffix
*) cat "${files[@]}" > "$outdir"/"${prefix}_merged.txt"; ;;
esac
done
Tested on repl.
IDList= echo "S${k}_S*.txt"
Executes the command echo with the environment variable IDList exported and set to empty with one argument equal to S<insert value of k here>_S*.txt.
Filename expansion (ie. * -> list of files) is not executed inside " double quotes.
To assign the result of a command to a variable, use command substitution: var=$( something something | something )
IDList${k+n}_S*.txt
The ${var+pattern} form is a parameter expansion; it does not add two variables together. It expands to pattern when var is set and to nothing when var is unset. See shell parameter expansion, and my answer on ${var-pattern}, which works similarly.
To add two numbers, use arithmetic expansion: $((k + n)).
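The difference between the two expansions can be seen in a short sketch. The values of k and n and the variable names alt, empty, sum and unset_var are made up for illustration.

```shell
#!/usr/bin/env bash
k=100
n=1

# ${k+n}: k is set, so this expands to the literal word "n".
alt="${k+n}"

# ${unset_var+n}: the variable is unset, so this expands to nothing.
unset -v unset_var
empty="${unset_var+n}"

# $((k + n)): arithmetic expansion actually adds the two values.
sum=$((k + n))

printf 'alt=%s empty=[%s] sum=%s\n' "$alt" "$empty" "$sum"
# prints: alt=n empty=[] sum=101
```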
awk -F'[_.]' '{$1}'
A bare $1 does nothing useful here. To print the first field, use {print $1}.
Remember to check your scripts with http://shellcheck.net
A pure bash way is below. It uses only globs (no need for external commands like ls or find for this question) to enumerate filenames and an associative array (supported by bash since version 4.0) to count how often each id occurs. Parsing ls output to list files is fragile in bash; you may want to read ParsingLs.
#!/bin/bash
backupdir=original_files # The directory to move the original files
declare -A count # Associative array to hold id counts
# If it is assumed that the backup directory exists prior to call, then
# drop the line below
mkdir "$backupdir" || exit
for file in [^_]*_*; do ((++count[${file%%_*}])); done
for id in "${!count[@]}"; do
if ((count[$id] > 1)); then
mv "$id"_* "$backupdir"
cat "$backupdir/$id"_* > "$id"_merged.txt
fi
done

How can I save only a substring of file names from a directory without the file extension?

I have a directory that I'm reading from and I want to save only the date representation as a string.
I am close to getting it, although I know there is probably an easier way. Here is what I have so far:
#files are in the format of "THIS_20200420.csv" so I want only "20200420"
declare -a arr
declare -a arr2
FILES=test2/*.csv
for file in $FILES
do
arr=(${arr[*]} "${file##*/}")
done
for i in "${arr[@]}"
do
arr2+=$(echo $i | cut -c6-13)
done
for item in "${arr2[@]}"
do
echo $item
done
The output shows the array having only one element, which is all the strings concatenated:
20200110202001202020021920200220202004202020042220200110202001202020021920200220202004202020042220200219202002202020042020200422
I'm bashing my head against my computer at this point.
arr=(
"THIS_20200420.csv"
"THIS_20200421.csv"
"THIS_20200422.csv"
"THIS_20200423.csv"
"THIS_20200424.csv"
"THIS_20200425.csv"
"THIS_20200426.csv"
"THIS_20200427.csv"
"THIS_20200428.csv"
"THIS_20200429.csv"
"THIS_20200430.csv" )
arr=( ${arr[@]//*_} )
arr=( ${arr[@]//.*} )
echo "arr: ${arr[@]}"
Explanation:
arr=( ${arr[@]//*_} ) matches everything up to and including '_' in each element and replaces it with the empty string.
arr=( ${arr[@]//.*} ) matches the '.' and everything after it in each element and replaces it with the empty string.
For more information on parameter expansion, a good reference is TLDP's guide on parameter expansion.
Try this
declare -a arrayname=($(ls -1 test2/*.csv | grep -o '[0-9]*'))
Demo:
$ ls -1 *csv
THIS_20200420.csv
THIS_20200421.csv
THIS_20200422.csv
THIS_20200423.csv
THIS_20200424.csv
THIS_20200425.csv
THIS_20200426.csv
THIS_20200427.csv
THIS_20200428.csv
THIS_20200429.csv
THIS_20200430.csv
$ declare -a arrayname=($(ls -1 *csv | grep -o '[0-9]*'))
$ echo ${arrayname[@]}
20200420 20200421 20200422 20200423 20200424 20200425 20200426 20200427 20200428 20200429 20200430
$ echo ${arrayname[2]}
20200422
$
You could achieve this using a loop with awk:
$ for file in *.csv; do echo $file | awk -F '[^[:alnum:]]' '{print $2}'; done
The -F '[^[:alnum:]]' tells awk to use non alphanumeric characters as the delimiter.
Another way to do this is to use bash shell parameter expansion to echo only the part of the filename you want. This obviously only works if your filenames have consistent formatting:
$ for file in *.csv; do echo "${file:5:8}"; done
I thought it would be nice to use bash parameter expansion to strip the unwanted prefix and suffix but you can't have nested expansion (afaict) so this is the best I could come up with:
$ for file in *.csv; do echo "$(tmp=${file%.csv}; echo ${tmp#THIS_})"; done
Meet Cut! A good friend of Linux Users
for file in ./*.csv; do echo $file | cut -d "_" -f 2 | cut -d "." -f 1 ; done
This one line should do the trick!
Example:
Use an array for the files assignment and parameter expansion.
#!/usr/bin/env bash
shopt -s nullglob
##: Save the files ending in .csv in an array
## so the glob expands properly; a plain variable assignment does not expand the glob *
files=(test2/*.csv)
##: Keep only the file names, without the pathname (longest match)
files=("${files[@]##*/}")
##: Strip the .csv extension
files=("${files[@]%.csv}")
##: Keep only the part after the first _ (shortest match from the beginning)
files=("${files[@]#*_}")
printf '%s ' "${files[@]}"
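For reference, the single concatenated element in the question comes from arr2+=$(...): without parentheses, += appends text to element 0 instead of adding a new element. A small sketch of the difference, using two made-up file names and the hypothetical array names joined and separate:

```shell
#!/usr/bin/env bash
names=("THIS_20200420.csv" "THIS_20200421.csv")

joined=()
separate=()
for name in "${names[@]}"; do
  date_part=${name#*_}          # drop everything up to the first "_"
  date_part=${date_part%.csv}   # drop the ".csv" suffix
  joined+=$date_part            # string append to joined[0]
  separate+=("$date_part")      # appends a new array element
done

printf 'joined has %d element(s): %s\n' "${#joined[@]}" "${joined[0]}"
printf 'separate has %d element(s)\n' "${#separate[@]}"
# prints:
# joined has 1 element(s): 2020042020200421
# separate has 2 element(s)
```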

Bash shell script to find missing files from filename

I have a folder that should contain 1485 files, named PA0001.png, PA0002.png ... up to PA1485.png
Some of them are missing and I'd like to write a shell script able to identify the missing ones and print them, as a list, in a .txt file (preferably without the leading string PA and the .png extension, but with the leading zeroes, if any)
I have no clue on how to proceed though, maybe using awk? But I'm still quite of a noob... Any help would be much appreciated!
You can get the list of the sequence number of missing files using bash loop
# Redirect output, per answer
exec > file.txt
for ((i=1 ; i<=1485 ; i++)) ; do
# Convert to 4 digit zero padded
printf -v id '%04d' $i
if [ ! -f "PA$id.png" ] ; then
echo $id
fi
done
Here's a slight refactoring of the existing answer, with explanations in the comments.
# Assign each number in the sequence to i; loop until we have done them all
for ((i=1 ; i<=1485 ; i++)) ; do
# Format the number with padding for the file name part
printf -v id '%04d' "$i"
# If a file with this name does not exist,
if [ ! -f "PA$id.png" ] ; then
# Print it to standard output
echo "$id"
fi
# Redirect the loop's standard output to a file
done >missing.txt
You can do exactly this without a single Bash loop:
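#!/usr/bin/env bash
{
# Names of the files that actually exist (PA0001.png ... PA1485.png)
find . \
-maxdepth 1 \
-regextype posix-extended \
-regex '.*/PA[[:digit:]]{4}\.png' \
-printf '%f\n'
# Names of the files that are expected to exist
printf 'PA%04d.png\n' {1..1485}
} | sort | uniq --unique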
It combines the list of existing files with the list of expected files, then sorts the result and prints only the entries that occur once. Those are the names that appear only in the expected list, i.e. the missing files.

bash for loop with same order as GNU "ls -v" ("version-number" sort)

In a bash script I want to do a typical "for file in somedir" but I want the files to be processed in the same order that "ls -v" returns them. I know the downfalls of using "ls" as a function. Is there some way to replicate "-v" without using "ls"? Thanks.
Assuming that this is "version number" sort order, this is also implemented by GNU sort. Thus, on a GNU platform:
somedir=/foo
while IFS= read -r -d '' filename; do
printf 'Processing file: %q\n' "$filename"
done < <(set -- "$somedir"/*; [[ -e $1 || -L $1 ]] && printf '%s\0' "$@" | sort -z -V)
If you really want to use a for loop rather than a while loop, parse into an array and iterate over that:
files=( )
while IFS= read -r -d '' filename; do
files+=( "$filename" )
done < <(set -- "$somedir"/*; [[ -e $1 || -L $1 ]] && printf '%s\0' "$@" | sort -z -V)
for filename in "${files[@]}"; do
printf 'Processing file: %q\n' "$filename"
done
To explain some of the magic above:
In < <(...), <(...) is a process substitution. It's replaced with a filename which, when read from, will return the output of the code enclosed. Thus, < <(...) will put that process substitution's output as the input to the while read loop. This loop form is described in BashFAQ #1. The reasons to use this kind of redirection instead of piping into the loop are given in BashFAQ #24.
set -- "$somedir"/* replaces the argument list within the current context (that context being the subshell running the process substitution!) with the results of "$somedir"/*; thus, (non-hidden, by default) contents of the directory named in the variable somedir.
[[ -e $1 || -L $1 ]] is true only if that glob expanded to at least one item; if it remained * (and no actual filesystem object exists by that name), gating output on this condition prevents the process substitution from emitting any output.
sort -z tells sort to delimit elements in both input and output with NULs -- a character that isn't allowed to exist in filenames.
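To see the ordering without touching a real directory, here is a minimal sketch that feeds three made-up names through the same NUL-delimited pipeline. It assumes GNU sort (for -z and -V); the array name sorted is only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical names; a plain lexical sort would put file10.txt before file2.txt.
sorted=()
while IFS= read -r -d '' name; do
  sorted+=("$name")
done < <(printf '%s\0' file10.txt file2.txt file1.txt | sort -z -V)

printf '%s\n' "${sorted[@]}"
# prints:
# file1.txt
# file2.txt
# file10.txt
```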

How to loop through a list in shell?

Suppose I have a command which outputs a list of strings
string1
string2
string3
.
.
stringN
How can I loop through the output of the list in a shell?
For example:
myVal=myCmd
for val in myVal
do
# do some stuff
end
Use a bash while loop; it can iterate over the output of a command or over an input file.
while IFS= read -r string
do
some_stuff to do
done < <(command_that_produces_string)
As an example, take a sample file with these contents:
$ cat file
My
name
is not
relevant
here
I have modified the script to echo the line as it reads through the file
$ cat script.sh
#!/bin/bash
while IFS= read -r string
do
echo "$string"
done < file
When run as ./script.sh, it produces this output:
My
name
is not
relevant
here
The same can also be done over a bash command, using process substitution (<()) to run the command in a subshell.
#!/bin/bash
while IFS= read -r -d '' file; do
echo "$file"
done < <(find . -maxdepth 1 -mindepth 1 -name "*.txt" -type f -print0)
The simple find above lists all regular .txt files in the current directory (including ones with spaces or special characters in their names). The output of the find command is fed to stdin, where the while loop reads it.
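One practical reason to feed the loop with < <(...) rather than a pipe is that variables set inside the loop survive. A minimal sketch, assuming bash's default settings (the lastpipe option is off); the counter names are made up:

```shell
#!/usr/bin/env bash
# Piping into the loop: the loop body runs in a subshell, so the
# counter it updates is lost once the pipeline finishes.
piped_count=0
printf '%s\n' a b c | while IFS= read -r line; do
  piped_count=$((piped_count + 1))
done

# Process substitution: the loop runs in the current shell, so the
# counter keeps its value after the loop.
direct_count=0
while IFS= read -r line; do
  direct_count=$((direct_count + 1))
done < <(printf '%s\n' a b c)

printf 'piped=%d direct=%d\n' "$piped_count" "$direct_count"
# prints: piped=0 direct=3
```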
You are very close. I can't tell whether copy-and-paste errors or typos are causing the issue; note the backquotes on line 1 and the $ on line 2.
myVal=`echo "a b c"`
for val in $myVal
do
echo "$val"
done
