Concatenating multiple text files inside a directory (including subdirectories) into a single file in Bash?

I have a script which runs hourly and I am storing failure data as follows:
2017/10/09/00/RetryFailure.txt
2017/10/09/01/RetryFailure.txt
2017/10/09/02/RetryFailure.txt ...
where 10 is the month, 09 is the day, and 00, 01, 02 are hours. Now at the end of the day I want to concatenate all (24) RetryFailure.txt files into one file, say RetryFailure10.txt.
Can anyone tell me the command to do so?

You can use this find command to aggregate all the files that share the same date component (matching the RetryFailure10.txt example):
find . -name 'RetryFailure.txt' -exec bash -c \
'IFS=/ read -ra arr <<< "$1"; cat "$1" >> "RetryFailure${arr[2]}.txt"' - {} \;
For better performance use a loop with process substitution:
while IFS= read -rd '' file; do
    IFS=/ read -ra arr <<< "$file"
    cat "$file" >> "RetryFailure${arr[2]}.txt"
done < <(find . -name 'RetryFailure.txt' -print0)
Using find we locate each RetryFailure.txt file
Using read -ra with IFS=/ we split each path on / into a shell array
With paths like ./2017/10/09/00/RetryFailure.txt, index 2 of the array holds the month (10), which yields the requested RetryFailure10.txt name
Using cat with >> we append each file to the output file named from ${arr[2]}
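If you would rather aggregate per day (the stated end-of-day use case), the day sits at index 3 of the same array. A minimal variation of the loop above, with an illustrative output name of my own choosing:
# Group by day: splitting ./2017/10/09/00/RetryFailure.txt on "/" gives
# arr=(. 2017 10 09 00 RetryFailure.txt), so arr[2] is the month and arr[3] the day.
while IFS= read -rd '' file; do
    IFS=/ read -ra arr <<< "$file"
    cat "$file" >> "RetryFailure${arr[2]}${arr[3]}.txt"   # e.g. RetryFailure1009.txt
done < <(find . -name 'RetryFailure.txt' -print0)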

cat 2017/10/09/*/RetryFailure.txt > concat_file
or more restrictive
cat 2017/10/09/{00..23}/RetryFailure.txt > concat_file
(note the day directory, 09 here, sits between the month and the hour directories in the layout above)

Short find + cat approach:
find . -type f -name RetryFailure.txt -exec cat {} + > RetryFailure_merged.txt

Related

For Loop: Identify Filename Pairs, Input to For Loop

I am attempting to adapt a previously answered question for use in a for loop.
I have a folder containing multiple paired file names that need to be provided sequentially as input to a for loop.
Example Input
WT1_0min-SRR9929263_1.fastq
WT1_0min-SRR9929263_2.fastq
WT1_20min-SRR9929265_1.fastq
WT1_20min-SRR9929265_2.fastq
WT3_20min-SRR12062597_1.fastq
WT3_20min-SRR12062597_2.fastq
Paired file names can be identified with the answer from the previous question:
find . -name '*_1.fastq' -exec basename {} '_1.fastq' \; | xargs -n1 -I{} echo {}_1.fastq {}_2.fastq
I now want to adopt this for use in a for loop so that each file pair can be independently piped to subsequent commands and so that suffixes can be appended to the output file names.
Input files can be provided as a comma-separated list after the -1 and -2 flags respectively. For this example, that bulk (and undesired) input would be:
-1 WT1_0min-SRR9929263_1.fastq,WT1_20min-SRR9929265_1.fastq,WT3_20min-SRR12062597_1.fastq
-2 WT1_0min-SRR9929263_2.fastq,WT1_20min-SRR9929265_2.fastq,WT3_20min-SRR12062597_2.fastq
However, I would like to run this as a for loop so that input files are provided sequentially:
Iteration #1
-1 WT1_0min-SRR9929263_1.fastq
-2 WT1_0min-SRR9929263_2.fastq
Iteration #2
-1 WT1_20min-SRR9929265_1.fastq
-2 WT1_20min-SRR9929265_2.fastq
Iteration #3
-1 WT3_20min-SRR12062597_1.fastq
-2 WT3_20min-SRR12062597_2.fastq
Below is an example of the for loop I would like to run, using the xargs code above to pull the filenames. It currently does not work. I assume I need to somehow save the paired filenames from the xargs output as variables that can be referenced in the for loop?
find . -name '*_1.fastq' -exec basename {} '_1.fastq' \; | xargs -n1 -I{} echo {}_1.fastq {}_2.fastq
for file in *.fastq
do
bowtie2 -p 8 -x /path/genome \
    1- {}_1.fastq \
    2- {}_2.fastq \
    "../path/${file%%.fastq}_UnMappedReads.fastq.gz" \
    2> "../path/${file%%.fastq}_Bowtie2_log.txt" | samtools view -@ 7 -b | samtools sort -@ 7 -m 5G -o "../path/${file%%.fastq}_Mapped.bam"
done
The expected outputs for the example would be:
WT1_0min-SRR9929263_UnMappedReads.fastq.gz
WT1_20min-SRR9929265_UnMappedReads.fastq.gz
WT3_20min-SRR12062597_UnMappedReads.fastq.gz
WT1_0min-SRR9929263_Bowtie2_log.txt
WT1_20min-SRR9929265_Bowtie2_log.txt
WT3_20min-SRR12062597_Bowtie2_log.txt
WT1_0min-SRR9929263_Mapped.bam
WT1_20min-SRR9929265_Mapped.bam
WT3_20min-SRR12062597_Mapped.bam
I don't know what "bowtie2" or "samtools" are but best I can tell all you need is:
#!/usr/bin/env bash
for file1 in *_1.fastq; do
    file2="${file1%_1.fastq}_2.fastq"
    echo "$file1" "$file2"
done
Replace echo with whatever you want to do with the pair of files.
If you HAD to use find for some reason then it'd be:
#!/usr/bin/env bash
while IFS= read -r file1; do
    file2="${file1%_1.fastq}_2.fastq"
    echo "$file1" "$file2"
done < <(find . -type f -name '*_1.fastq' -print)
or if your file names can contain newlines then:
#!/usr/bin/env bash
while IFS= read -r -d $'\0' file1; do
    file2="${file1%_1.fastq}_2.fastq"
    echo "$file1" "$file2"
done < <(find . -type f -name '*_1.fastq' -print0)
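Tying the two together, here is a sketch of the pairing loop wrapped around the question's bowtie2/samtools pipeline. I am taking the question's word for the flags and paths: --un-conc-gz is my assumption for where the unmapped-reads file plugs in, and /path/genome and ../path/ are the question's placeholders.
#!/usr/bin/env bash
for file1 in *_1.fastq; do
    file2="${file1%_1.fastq}_2.fastq"    # matching _2 mate
    base="${file1%_1.fastq}"             # e.g. WT1_0min-SRR9929263
    bowtie2 -p 8 -x /path/genome \
        -1 "$file1" \
        -2 "$file2" \
        --un-conc-gz "../path/${base}_UnMappedReads.fastq.gz" \
        2> "../path/${base}_Bowtie2_log.txt" |
        samtools view -@ 7 -b |
        samtools sort -@ 7 -m 5G -o "../path/${base}_Mapped.bam"
done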

Cannot reference a file name that has spaces in a unix shell script [duplicate]

I want to iterate over a list of files. This list is the result of a find command, so I came up with:
getlist() {
    for f in $(find . -iname "foo*")
    do
        echo "File found: $f"
        # do something useful
    done
}
It's fine except if a file has spaces in its name:
$ ls
foo_bar_baz.txt
foo bar baz.txt
$ getlist
File found: foo_bar_baz.txt
File found: foo
File found: bar
File found: baz.txt
What can I do to avoid the split on spaces?
You could replace the word-based iteration with a line-based one:
find . -iname "foo*" | while read f
do
    # ... loop body
done
There are several workable ways to accomplish this.
If you wanted to stick closely to your original version it could be done this way:
getlist() {
    IFS=$'\n'
    for file in $(find . -iname 'foo*') ; do
        printf 'File found: %s\n' "$file"
    done
}
This will still fail if file names have literal newlines in them, but spaces will not break it.
However, messing with IFS isn't necessary. Here's my preferred way to do this:
getlist() {
    while IFS= read -d $'\0' -r file ; do
        printf 'File found: %s\n' "$file"
    done < <(find . -iname 'foo*' -print0)
}
If you find the < <(command) syntax unfamiliar you should read about process substitution. The advantage of this over for file in $(find ...) is that files with spaces, newlines and other characters are correctly handled. This works because find with -print0 will use a null (aka \0) as the terminator for each file name and, unlike newline, null is not a legal character in a file name.
The advantage to this over the nearly-equivalent version
getlist() {
    find . -iname 'foo*' -print0 | while read -d $'\0' -r file ; do
        printf 'File found: %s\n' "$file"
    done
}
is that any variable assignment in the body of the while loop is preserved. That is, if you pipe to while as above, the body of the while runs in a subshell and its assignments are lost, which may not be what you want.
The advantage of the process substitution version over find ... -print0 | xargs -0 is minimal: the xargs version is fine if all you need is to print a line or perform a single operation on the file, but if you need to perform multiple steps the loop version is easier.
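To see the subshell difference concretely, count the matches both ways; only the process substitution version keeps the variable. A small self-contained demo using the same commands as above:
count=0
while IFS= read -r -d '' file; do
    count=$((count + 1))
done < <(find . -iname 'foo*' -print0)
echo "count=$count"   # correct: the loop ran in the current shell

count=0
find . -iname 'foo*' -print0 | while IFS= read -r -d '' file; do
    count=$((count + 1))
done
echo "count=$count"   # prints count=0: the piped loop ran in a subshell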
EDIT: Here's a nice test script so you can get an idea of the difference between different attempts at solving this problem
#!/usr/bin/env bash
dir=/tmp/getlist.test/
mkdir -p "$dir"
cd "$dir"
touch 'file not starting foo' foo foobar barfoo 'foo with spaces' \
    'foo with'$'\n'newline 'foo with trailing whitespace '
# while with process substitution, null terminated, empty IFS
getlist0() {
    while IFS= read -d $'\0' -r file ; do
        printf 'File found: '"'%s'"'\n' "$file"
    done < <(find . -iname 'foo*' -print0)
}
# while with process substitution, null terminated, default IFS
getlist1() {
    while read -d $'\0' -r file ; do
        printf 'File found: '"'%s'"'\n' "$file"
    done < <(find . -iname 'foo*' -print0)
}
# pipe to while, newline terminated
getlist2() {
    find . -iname 'foo*' | while read -r file ; do
        printf 'File found: '"'%s'"'\n' "$file"
    done
}
# pipe to while, null terminated
getlist3() {
    find . -iname 'foo*' -print0 | while read -d $'\0' -r file ; do
        printf 'File found: '"'%s'"'\n' "$file"
    done
}
# for loop over subshell results, newline terminated, default IFS
getlist4() {
    for file in "$(find . -iname 'foo*')" ; do
        printf 'File found: '"'%s'"'\n' "$file"
    done
}
# for loop over subshell results, newline terminated, newline IFS
getlist5() {
    IFS=$'\n'
    for file in $(find . -iname 'foo*') ; do
        printf 'File found: '"'%s'"'\n' "$file"
    done
}
# see how they run
for n in {0..5} ; do
    printf '\n\ngetlist%d:\n' $n
    eval getlist$n
done
rm -rf "$dir"
There is also a very simple solution: rely on bash globbing
$ mkdir test
$ cd test
$ touch "stupid file1"
$ touch "stupid file2"
$ touch "stupid file 3"
$ ls
stupid file 3 stupid file1 stupid file2
$ for file in *; do echo "file: '${file}'"; done
file: 'stupid file 3'
file: 'stupid file1'
file: 'stupid file2'
Note that I am not sure this behavior is the default one but I don't see any special setting in my shopt so I would go and say that it should be "safe" (tested on osx and ubuntu).
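One edge case worth noting with bare globs: if nothing matches, the pattern expands to itself and the loop runs once with the literal *. A small guard using nullglob (a standard bash option):
shopt -s nullglob                 # unmatched globs expand to nothing
for file in *; do
    echo "file: '${file}'"
done
shopt -u nullglob                 # restore the default afterwards if needed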
find . -iname "foo*" -print0 | xargs -L1 -0 echo "File found:"
find . -name "fo*" -print0 | xargs -0 ls -l
See man xargs.
Since you aren't doing any other type of filtering with find, you can use the following as of bash 4.0:
shopt -s globstar
getlist() {
for f in **/foo*
do
echo "File found: $f"
# do something useful
done
}
The **/ will match zero or more directories, so the full pattern will match foo* in the current directory or any subdirectories.
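The same no-match caveat applies here: if no foo* exists anywhere, the loop runs once with the literal pattern. A sketch combining globstar with nullglob to avoid that:
shopt -s globstar nullglob
getlist() {
    for f in **/foo*
    do
        echo "File found: $f"
    done
}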
I really like for loops and array iteration, so I figure I will add this answer to the mix...
I also liked marchelbling's stupid file example. :)
$ mkdir test
$ cd test
$ touch "stupid file1"
$ touch "stupid file2"
$ touch "stupid file 3"
Inside the test directory:
readarray -t arr <<< "`ls -A1`"
This adds each file listing line into a bash array named arr with any trailing newline removed.
Let's say we want to give these files better names...
for i in ${!arr[@]}
do
    newname=`echo "${arr[$i]}" | sed 's/stupid/smarter/; s/  */_/g'`
    mv "${arr[$i]}" "$newname"
done
${!arr[@]} expands to 0 1 2, so "${arr[$i]}" is the ith element of the array. The quotes around the variables are important to preserve the spaces.
The result is three renamed files:
$ ls -1
smarter_file1
smarter_file2
smarter_file_3
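If you like the array style but want to avoid parsing ls, bash 4.4+ can fill the array directly from find with NUL delimiters. A minimal sketch (readarray -d '' needs bash 4.4 or newer):
readarray -d '' -t arr < <(find . -maxdepth 1 -type f -name 'stupid*' -print0)
for f in "${arr[@]}"; do
    printf 'found: %s\n' "$f"    # safe for spaces and even newlines in names
done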
find has an -exec argument that loops over the find results and executes an arbitrary command. For example:
find . -iname "foo*" -exec echo "File found: {}" \;
Here {} represents each found file name. find passes every name to the command as a single argument, so spaces are handled safely; embedding {} inside a quoted string like this is a GNU find extension.
In many cases you can replace the final \; (which runs the command once per file) with \+, which passes multiple files to a single invocation (not necessarily all of them at once; see man find for details). With the + form, {} must stand alone as the last argument, so the embedded-string version above won't work; see the sketch below.
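For example, the same labelled listing with the + form; printf repeats its format once per argument, so one invocation covers many files:
find . -iname "foo*" -exec printf 'File found: %s\n' {} +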
I recently had to deal with a similar case, and I built a FILES array to iterate over the filenames:
eval FILES=($(find . -iname "foo*" -printf '"%p" '))
The idea here is to surround each filename with double quotes, separate them with spaces and use the result to initialize the FILES array.
The use of eval is necessary to evaluate the double quotes in the find output correctly for the array initialization.
To iterate over the files, just do:
for f in "${FILES[@]}"; do
    # Do something with $f
done
In some cases, if you just need to copy or move a list of files, you can pipe that list to awk as well.
The escaped double quotes (\") around the field $0 are important (in short: your files, one line of the list = one file).
find . -iname "foo*" | awk '{print "mv \""$0"\" ./MyDir2" | "sh" }'
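If the goal is just the move itself, a hedged alternative that never round-trips file names through generated shell code is find's own -exec with GNU mv's -t (target-directory) flag; ./MyDir2 is the same example target:
find . -iname "foo*" -exec mv -t ./MyDir2 {} +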
Ok - my first post on Stack Overflow!
Though my problems with this have always been in csh not bash the solution I present will, I'm sure, work in both. The issue is with the shell's interpretation of the "ls" returns. We can remove "ls" from the problem by simply using the shell expansion of the * wildcard - but this gives a "no match" error if there are no files in the current (or specified folder) - to get around this we simply extend the expansion to include dot-files thus: * .* - this will always yield results since the files . and .. will always be present. So in csh we can use this construct ...
foreach file (* .*)
    echo $file
end
if you want to filter out the standard dot-files then that is easy enough ...
foreach file (* .*)
    if ("$file" == .) continue
    if ("$file" == ..) continue
    echo $file
end
The code in the first post on this thread would be written like this:
getlist() {
    for f in * .*
    do
        echo "File found: $f"
        # do something useful
    done
}
Hope this helps!
Another solution for the job...
The goal was:
select/filter filenames recursively in directories
handle each name (whatever spaces are in the path...)
#!/bin/bash -e
# Trick to handle files with spaces in their paths...
OLD_IFS=${IFS}
IFS=$'\n'
files=($(find "${INPUT_DIR}" -type f -name "*.md"))
for filename in "${files[@]}"
do
    # do your stuff
    # ....
done
IFS=${OLD_IFS}


getting the output of a grep command in a loop

I have a shell script that includes this search:
find . -type f -exec grep -iPho "barh(li|mar|ag)" {} \;
I want to capture each string the grep command finds and send it to a function I will create named "parser"
parser() {
    # do stuff with each single grep result found
}
How can that be done?
Is this right?
find . -type f -exec grep -iPho "barh(li|mar|ag)" {parser $1} \;
I do not want to output the entire find command result to the function
Only a shell can execute a shell function, so you need to use bash -c in your find in order to call it. That is also the reason you need to export your function, so that the new bash process sees it.
parser() {
    while IFS= read -r line; do
        echo "Processing line: $line"
    done <<< "$1"
}
export -f parser
find . -type f -exec bash -c 'parser "$(grep -iPho "barh(li|mar|ag)" "$1")"' -- {} \;
The code above sends all occurrences from file1, then file2, etc. to your function in one batch per file, not line by line, which is why the function loops over the lines. If there is no occurrence of your regex in a file, it will still call your function with empty input!
That might not be the best solution for you, so let's move the loop inside the bash -c statement instead and really process the lines one by one:
parser() {
    echo "Processing line: $1"
}
export -f parser
find . -type f -exec bash -c 'grep -iPho "barh(li|mar|ag)" "$@" | while IFS= read -r line; do parser "$line"; done' -- {} +
EDIT: A very nice and simple solution not using bash -c, suggested by @gniourf_gniourf:
parser() {
    echo "Processing line: $1"
}
find . -type f -exec grep -iPho "barh(li|mar|ag)" {} + | while IFS= read -r line; do parser "$line"; done
This approach works fine and processes each line one by one. You also do not need to export your function with this approach. But you have to watch out for something that might surprise you.
Each command in a pipeline is executed in its own subshell, so any variable assignment inside your parser function, or inside the while loop in general, will be lost once that subshell exits. If you are writing a script, a simple shopt -s lastpipe will suffice to run the last command of the pipeline in the current shell environment. Or you can use process substitution:
parser() {
    echo "Processing line: $1"
}
while IFS= read -r line; do
    parser "$line"
done < <(find . -type f -exec grep -iPho "barh(li|mar|ag)" {} +)
Note that in the previous bash -c examples, you will experience the same behavior and your variable assignments will be lost as well.
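For completeness, a minimal lastpipe sketch (it only takes effect when job control is off, which is the normal state for non-interactive scripts; the counter is just to show the assignment surviving):
#!/usr/bin/env bash
shopt -s lastpipe
parser() { echo "Processing line: $1"; }
count=0
find . -type f -exec grep -iPho "barh(li|mar|ag)" {} + |
while IFS= read -r line; do
    parser "$line"
    count=$((count + 1))
done
echo "Processed $count lines"    # survives: the while ran in the current shell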
You need to export your function.
You also need to call bash to execute the function.
parser() {
    echo "GOT: $1"
}
export -f parser
find Projects/ -type f -name '*rb' -exec bash -c 'parser "$0"' {} \;
I suggest you use sed; it is a more powerful tool for text processing.
For example, if I want to append the string "myparse" to every line that ends in "ha", I can do it like this:
# echo "haha" > text1
# echo "hehe" > text2
# echo "heha" > text3
# find . -type f -exec sed '/ha$/s/ha$/ha myparse/' {} \;
haha myparse
heha myparse
hehe
If you really want to replace the file, not just print to stdout, you can do it like this:
# find . -type f -exec sed -i '/ha$/s/ha$/ha myparse/' {} \;

How to loop through a list in shell?

Suppose I have a command which outputs a list of strings
string1
string2
string3
.
.
stringN
How can I loop through the output of the list in a shell?
For example:
myVal=myCmd
for val in myVal
do
# do some stuff
end
Use a bash while loop; the loop can read from a command or from an input file.
while IFS= read -r string
do
    some_stuff to do
done < <(command_that_produces_string)
As an example, I have a sample file with these contents:
$ cat file
My
name
is not
relevant
here
I have modified the script to echo each line as it reads through the file:
$ cat script.sh
#!/bin/bash
while IFS= read -r string
do
    echo "$string"
done < file
This produces the following output when run as ./script.sh:
My
name
is not
relevant
here
The same can also be done with a bash command, using process substitution (<(...)) to run the command in a subshell.
#!/bin/bash
while IFS= read -r -d '' file; do
    echo "$file"
done < <(find . -maxdepth 1 -mindepth 1 -name "*.txt" -type f -print0)
The above simple find lists all files in the current directory (including ones with spaces or special characters in their names). The output of the find command is fed to the while loop's stdin.
You are very close; I can't tell if it is just copy-paste typos that are causing the issue. Note the backticks (command substitution) on line 1 and the $ on line 2:
myVal=`echo "a b c"`
for val in $myVal
do
    echo "$val"
done
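Another option, if you would rather have the whole list in a variable up front, is to read it into an array (mapfile needs bash 4+; myCmd stands in for whatever command produces the strings):
mapfile -t vals < <(myCmd)       # one array element per output line
for val in "${vals[@]}"; do
    echo "$val"
done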
