cat multiple files in separate directories file1 file2 file3 ... file100 using a loop in a bash script

I have several files in multiple directories, like 1/file1, 2/file2, 3/file3 ... 100/file100. I want to cat all of those files into a single file, looping over the index in a bash script. Is there an easy loop for doing so?
Thanks,

seq 100 | sed 's:.*:dir&/file&:' | xargs cat
seq 100 generates the list of numbers from 1 to 100.
sed
s substitutes
: separates the parts of the command. Usually / is used, but / appears in the replacement string here, so : is used instead.
.* matches the whole line
dir&/file& replaces it with dir<whole line>/file<whole line> (& stands for the whole match)
: closing separator
so it generates the list dir1/file1 ... dir100/file100
xargs - passes its input as arguments to ...
cat - so it will execute cat dir1/file1 dir2/file2 ... dir100/file100.
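To see what the pipeline feeds to xargs, you can run just the first two stages (using 3 instead of 100 for brevity):
$ seq 3 | sed 's:.*:dir&/file&:'
dir1/file1
dir2/file2
dir3/file3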

This code should do the trick:
for ((i=1; i<=$(ls -d dir*/ | wc -l); i++)); do cat "dir${i}/file${i}" >> output; done

I made an example of what you're describing with your directory structure and files. Create the directories and files, each with its own content:
for ((i=1;i<=100;i++)); do
mkdir "$i" && touch "$i/file$i" && echo content of "$(pwd) $i" > "$i/file$i"
done
Check the created directories.
ls */*
ls */* | sort -n
If you see that the directories and files are created then proceed to the next step.
This solution does not involve any external command from the shell except of course cat :-)
Now we can check the contents of each file using bash syntax.
i=1
while [[ -e "$i" ]]; do
cat "$i"/*
((i++))
done
Here is the same loop, tested in dash.
i=1
while [ -e "$i" ]; do
cat "$i"/*
i=$((i+1))
done
Just add a redirection of the output to a file after the done.
You can add more tests if you like; see help test.
One more thing :-), you can also check the contents using tail and brace expansion:
tail -n +1 {1..100}/*
With cat you can also redirect the output right away; just remember that brace expansion is a bash 3+ feature.
cat {1..100}/*
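If the directories contain other files besides the numbered ones, a loop with a single redirection after the done is the safer variant (combined.txt is just an example name):
for i in {1..100}; do
    cat "$i/file$i"    # only the numbered file in each directory
done > combined.txt    # one redirection for the whole loop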

Related

How to continually process last lines of two files when the files change randomly?

I have the following simple snippet:
#!/bin/bash
tail -f "data/top.right.log" | while read val1
do
val2=$(tail -n 1 "data/top.left.log")
echo $(echo "$val1 - $val2" | bc)
done
top.left.log and top.right.log are files to which some other processes continually write. The bash script simply subtracts the last lines of both files and shows the result.
I would like to make the script more efficient. In pseudo-code I would like to do this:
#!/bin/bash
magiccommand "data/top.right.log" "data/top.left.log" | while read val1 val2
do
echo $(echo "$val1 - $val2" | bc)
done
so that whenever top.left.log OR top.right.log changes the echo command is called.
I have already tried various snippets from StackOverflow, but they often rely on the files not changing or on both files containing the same number of lines, which is not the case here.
If you have inotify-tools you can use the following command:
inotifywait -q -e modify file1 file2
Description:
inotifywait efficiently waits for changes to files using Linux's inotify(7) interface.
It is suitable for waiting for changes to files from shell scripts.
It can either exit once an event occurs, or continually execute and output events as they occur.
An example:
while :; do
inotifywait -q -e modify file1 file2
tail -n 1 file1
tail -n 1 file2
done
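A variant of this uses inotifywait's monitor mode (-m), which sets up the watches only once and prints one line per event:
inotifywait -m -q -e modify file1 file2 | while read -r path events; do
    # default output is "<file> <events>", e.g. "file1 MODIFY"
    tail -n 1 "$path"
done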
Create a temporary file that you touch each time the files are processed. If either file is newer than the temporary file, process the files again.
#!/bin/bash
log1=top.left.log
log2=top.right.log
tmp=last_change
touch "$tmp"
while : ; do
if [[ $log1 -nt $tmp || $log2 -nt $tmp ]] ; then
touch "$tmp"
x=$(tail -n1 "$log1")
y=$(tail -n1 "$log2")
echo $(( x - y ))
fi
done
You might need to remove the temporary file once the script is killed.
If the files are changing fast, you might miss some lines. Otherwise, adding sleep 1 somewhere would decrease the CPU usage.
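For the cleanup, a trap keeps the temporary file from being left behind; a minimal addition to the script above:
trap 'rm -f "$tmp"' EXIT    # in bash this also fires when the script is interrupted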
Instead of calling tail every time, you can open the files once as file descriptors and read them line by line. This takes advantage of the fact that the descriptors stay open, so each read continues at the line after the one read previously.
First, open the files in bash, assigning them file descriptors 3 and 4
exec 3<file1 4<file2
Now you can read from these files using read -u <fd>. In combination with inotifywait from Dawid's answer, this gives you an efficient way to read the files line by line:
while :; do
# TODO: add some break condition
# wait until one of the files has changed
inotifywait -q -e modify file1 file2
# read the next line of file1 into val1_new
# if file1 has not changed and there is no new line, read will return with failure
read -u 3 val1_new && val1="$val1_new"
# same for file2
read -u 4 val2_new && val2="$val2_new"
done
You may extend this by reading until you have reached the last line, or parsing inotifywait's output to detect which file has changed.
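For example, inotifywait's --format option can report which file changed (%w is the watched path); a sketch building on the loop above:
while :; do
    changed=$(inotifywait -q -e modify --format '%w' file1 file2)
    case $changed in
        file1) read -u 3 val1_new && val1="$val1_new" ;;
        file2) read -u 4 val2_new && val2="$val2_new" ;;
    esac
done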
A possible way is to parse the output of tail -f, using its ==> file <== headers to track which file each value came from.
I came up with this script:
$ cat test.awk
$0 ~ /==>.*right.*<==/ {var=1}
$0 ~ /==>.*left.*<==/ {var=2}
$1~/[0-9]+/ && var==1 { val1=$1 }
$1~/[0-9]+/ && var==2 { val2=$1 }
val1 != "" && val2 != "" && $1~/[0-9]+/{
print val1-val2
}
The script assumes the values are integers ([0-9]+) in both files.
You can use it like this:
tail -f top.right.log top.left.log | awk -f test.awk
Whenever a value is appended to either file, the difference between the last values of the two files is displayed.
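To see it work, append a value to each log by hand (the numbers are made up, and the logs are assumed to start empty):
echo 10 >> top.right.log    # the awk script stores val1=10; nothing is printed yet
echo 3 >> top.left.log      # the awk script stores val2=3 and prints 7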

Merging fastq files by identifiers with a shell script

I have to merge files with the following naming pattern:
[SampleID]_[custom_ID01]_ID[RUN_ID]_L001_R1.fastq
[SampleID]_[custom_ID02]_ID[RUN_ID]_L002_R1.fastq
[SampleID]_[custom_ID03]_ID[RUN_ID]_L003_R1.fastq
[SampleID]_[custom_ID04]_ID[RUN_ID]_L004_R1.fastq
I need to merge all files with identical [SampleID] but different "Lanes" (L001-L004).
The following script works fine when directly run in the terminal:
custom_id="000"
RUN_ID="0025"
wd="/path/to/script/" # was missing/ incorrect
# get ALL sample identifiers
touch temp1.txt
for line in $wd/*.fastq ; do
fastq_identifier=$(echo "$line" | cut -d"_" -f1);
echo $fastq_identifier >> temp1.txt
done
# get all unique sample identifiers
cat temp1.txt | uniq > temp2.txt
input_var=$(cat temp2.txt)
# concatenate all fastq (different lanes) with identical identifier
for line in $input_var; do
cat $line*fastq >> $line"_"$custom_id"_ID"$RUN_ID"_L001_R1.fastq"
done
rm temp1.txt temp2.txt;
But if I create a script file (concatenate_fastq.sh) and make it executable
$ chmod +x concatenate_fastq.sh
and run it
$ ./concatenate_fastq.sh
I got the following error:
$ concatenate_fastq.sh: line 17: /*.fastq_000_ID_L001_R1.fastq: Keine Berechtigung # = Permission denied
Thanks to the hints below, I solved the problem by fixing
wd=/path/to/script/
The immediate problem seems to be that wd is unset. If your script really does contain exactly the line
wd="/path/to/script/"
then I would suspect invisible control characters in the script file (using a Windows editor is a common way to shoot yourself in the foot).
More generally, your script should cope correctly when the wildcard does not match any files. A common way to do that is shopt -s nullglob, but the rest of the script would then still need adapting.
Refactoring the script to loop only over actual matches would help avoid trouble. Perhaps something like this:
shopt -s nullglob # bashism
printf '%s\n' "$wd"/*.fastq |
cut -d_ -f1 |
uniq |
while read -r line; do
cat "$line"*fastq >> "${line}_${custom_id}_ID${Run_ID}_L001_R1.fastq"
done
You'll notice that this simplifies the script tremendously, and avoids the pesky temporary files.
I solved it with:
if [ $# -ne 3 ] ; then
echo -e "Usage: $0 {path_to_working_directory} {custom_ID:Z+} {run_ID:ZZZZ}\n"
exit 1
fi
cwd=$(pwd)
wd=$1
custom_id=$2
RUN_ID=$3
cd "$wd" || exit 1
input_var=$(ls *fastq | cut --fields 1 -d "_" | uniq)
for line in $input_var; do
cat $line*fastq >> $line"_"$custom_id"_ID"$RUN_ID"_L001_R1.fastq"
done
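As an alternative to the cut | uniq passes, bash 4+ associative arrays can group by [SampleID] directly; a sketch, reusing the wd, custom_id and RUN_ID variables from above:
#!/bin/bash
declare -A merged                # sample IDs that have already been merged
cd "$wd" || exit 1
for f in *_R1.fastq; do
    sample=${f%%_*}              # [SampleID] is the text before the first "_"
    [[ ${merged[$sample]} ]] && continue
    merged[$sample]=1
    cat "$sample"_*_R1.fastq > "${sample}_${custom_id}_ID${RUN_ID}_L001_R1.fastq"
done
As with the loop versions above, re-running this after the merged files exist would fold them back in, so run it in a clean directory.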

Print the contents of files from the output of a program

Let's say I have a program foo that finds files with a certain specification and that the output of running foo is:
file1.txt
file2.txt
file3.txt
I want to print the contents of each of those files (preferably with the file name prepended). How would I do this? I would've thought piping it to cat like so:
foo | cat
would work but it doesn't.
EDIT:
My solution to this problem, which prints each file and prepends the filename to each line of output, is:
foo | xargs grep .
This gets output similar to:
file1.txt: Hello world
file2.txt: My name is foobar.
<your command> | xargs cat
You need xargs here:
foo | xargs cat
In order to allow for file names that have spaces in them, you'll need something like this:
#!/bin/bash
while read -r file
do
# Check for existence of the file before using cat on it.
if [[ -f $file ]]; then
cat "$file"
# Don't bother with empty lines
elif [[ -n $file ]]; then
echo "There is no file named '$file'"
fi
done
Put this in a script. Let's call it myscript.sh. Then, execute:
foo | myscript.sh
foo | xargs grep '^' /dev/null
Why grep on ^? To also display empty lines (replace it with "." if you want only non-empty lines).
Why the /dev/null? So that, in addition to any filename provided in foo's output, there is at least one additional file (one NOT matching anything, such as /dev/null). That way grep is always given AT LEAST two filenames, and thus will always show the matching filename.
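With GNU grep, the -H option forces the filename prefix even when only one file is given, so the /dev/null trick is not needed; and GNU xargs can split on newlines so names with spaces survive:
foo | xargs grep -H '^'             # -H: always print the file name
foo | xargs -d '\n' grep -H '^'     # -d '\n' (GNU xargs): tolerate spaces in names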

Passing input to sed, and sed info to a string

I have a list of files (~1000), one per line, in a text file named files.txt.
I have a script that generates a macro; it looks something like the following:
#!/bin/sh
b=$(sed "${1}q;d" files.txt)
cat > MyMacro_${1}.C << +EOF
myFile = new TFile("/MYPATHNAME/$b");
+EOF
and I run this script with
./MakeMacro.sh 1
and later I want to do
./MakeMacro.sh 2
./MakeMacro.sh 3
...etc
So that it reads the n'th line of my files.txt and feeds that string to my created .C macro.
"So that it reads the n'th line of my files.txt and feeds that string to my created .C macro."
Given this statement and your tags, I'm going to answer using shell tools and not really address the issue of the .C macro.
The first line of your script contains a sed script. There are numerous ways to get the Nth line from a text file. The simplest might be to use head and tail.
$ head -n "${i}" files.txt | tail -n 1
This takes the first $i lines of files.txt and shows you the last line of that set.
$ sed -ne "${i}p" files.txt
This use of sed employs -n to avoid printing by default, then prints line $i. For better performance, try:
$ sed -ne "${i}{p;q;}" files.txt
This does the same, but quits after printing the line, so that sed doesn't bother traversing the rest of the file.
$ awk -v i="$i" 'NR==i' files.txt
This passes the shell variable $i into awk, then evaluates an expression that tests whether the number of records processed is the same as that variable. If the expression evaluates true, awk prints the line. For better performance, try:
$ awk -v i="$i" 'NR==i{print;exit}' files.txt
Like the second sed script above, this will quit after printing the line, so as to avoid traversing the rest of the file.
There are plenty of ways to do this by loading the file into an array as well, but those take more memory and perform less well. I'd use the one-liners if you can. :)
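For completeness, the array approach could look like this with bash 4's mapfile, though it holds the whole file in memory:
mapfile -t lines < files.txt    # lines[0] holds the first line of the file
b=${lines[i-1]}                 # array subscripts are arithmetic, so i-1 gives the i'th line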
To take any of these one-liners and put it into your script, you already have the notation:
if expr "$i" : '[0-9][0-9]*$' >/dev/null; then
b=$(sed -ne "${i}{p;q;}" files.txt)
else
echo "ERROR: invalid line number" >&2; exit 1
fi
If I am understanding you correctly, you can do a for loop in bash to call the script multiple times with different arguments.
for i in $(seq 1 "$n"); do ./MakeMacro.sh "$i"; done
Based on the OP's comment, it seems that he wants to submit the generated files to Condor. You can modify the loop above to include the condor submission.
for i in $(seq 1 "$n"); do ./MakeMacro.sh "$i"; condor_submit <OutputFile> ; done
i=0
while read -r file
do
((i++))
cat > MyMacro_${i}.C <<-EOF
myFile = new TFile("$file");
EOF
done < files.txt
Beware: the here-document delimiter must be unquoted so that $file expands, and with <<- any indentation before the EOF line must use tabs.
I'm puzzled about why this is the way you want to do the job. You could have your C++ code read files.txt at runtime and it would likely be more efficient in most ways.
If you want to get the Nth line of files.txt into MyMacro_N.C, then:
{
echo
sed -n -e "${1}{s:.*:myFile = new TFile(\"/MYPATHNAME/&\");:p;q;}" files.txt
echo
} > MyMacro_${1}.C
Good grief. The entire script should just be (untested):
awk -v nr="$1" 'NR==nr{printf "\nmyFile = new TFile(\"/MYPATHNAME/%s\");\n\n",$0 > ("MyMacro_"nr".C")}' files.txt
You can throw in a ;exit before the } if performance is an issue but I doubt if it will be.

Trying to write a script to clean <script.aa=([].slice+'hjkbghkj') from multiple htm files, recursively

I am trying to modify a bash script to remove a glob of malicious code from a large number of files.
The community will benefit from this, so here it is:
#!/bin/bash
grep -r -l 'var createDocumentFragm' /home/user/Desktop/infected_site/* > /home/user/Desktop/filelist.txt
for i in $(cat /home/user/Desktop/filelist.txt)
do
cp -f $i $i.bak
done
for i in $(cat /home/user/Desktop/filelist.txt)
do
$i | sed 's/createDocumentFragm.*//g' > $i.awk
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
This is where the script bombs out with this message:
+ for i in '$(cat /home/user/Desktop/filelist.txt)'
+ sed 's/createDocumentFragm.*//g'
+ /home/user/Desktop/infected_site/index.htm
I get 2 errors and the script stops.
/home/user/Desktop/infected_site/index.htm: line 1: syntax error near unexpected token `<'
/home/user/Desktop/infected_site/index.htm: line 1: `<html><head><script>(function (){ '
I have the first 2 parts done.
The files containing createDocumentFragm have been enumerated in a text file correctly.
The files listed in filelist.txt have been duplicated in their original locations with a .bak suffix added, i.e. infected_site/some_directory/infected_file.htm and infected_file.htm.bak,
effectively making sure we have a backup.
All I need to do now is write an AWK command that will use the list of files in filelist.txt, take the entire glob of malicious text as a pattern, and remove it from the files, using just the uppercase SCRIPT tag as the starting point; the lowercase script tag is too generic and could delete legitimate text.
I suspect this may help me, but I don't know how to use it correctly.
http://backreference.org/2010/03/13/safely-escape-variables-in-awk/
Once I have this part figured out, and after you have verified that the files weren't mangled, you can clean out the .bak files with this:
for i in $(cat /home/user/Desktop/filelist.txt)
do
rm -f $i.bak
done
Several things:
You have:
$i | sed 's/var createDocumentFragm.*//g' > $i.awk
You probably meant this (given your use of cat, which we'll talk about in a moment):
cat $i | sed 's/var createDocumentFragm.*//g' > $i.awk
You're treating each file in your file list as if it were a command and not a file.
Now, about your use of cat. If you're using cat for almost anything but concatenating multiple files together, you probably are doing something not quite right. For example, you could have done this:
sed 's/var createDocumentFragm.*//g' "$i" > $i.awk
I'm also a bit confused about the awk statement. Exactly what file are you using awk on? Your awk statement is using STDIN and STDOUT, so it's reading file names from the for loop and then printing the output on the screen. Is the sed statement supposed to feed into the awk statement?
Note that I don't have to print out my file to STDOUT, then pipe that into sed. The sed command can take the file name directly.
You also want to avoid for loops over a list of files. That is very inefficient, and can cause problems with the command line getting overloaded. Not a big issue today, but can affect you when you least suspect it. What happens is that your $(cat /home/user/Desktop/filelist.txt) must execute first before the for loop can even start.
A little rewriting of your program:
cd ~/Desktop
grep -r -l 'var createDocumentFragm' infected_site/* > filelist.txt
while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$file.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done < filelist.txt
We can use one loop, and we made it a while loop. I could even feed the grep into that while loop:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$file.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done
and then I don't even have to create a temporary file.
Let me know what's going on with the awk. I suspect you wanted something like this:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" \
| awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p' > "$file.awk"
done
Also note that I put quotes around the file names; this helps prevent problems if a file name has a space in it.
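As an aside, if your sed supports in-place editing (GNU sed does), the backup copy and the substitution can be combined into one step; a sketch covering just the sed part of the job:
grep -r -l 'var createDocumentFragm' infected_site/* | while IFS= read -r file; do
    sed -i.bak 's/var createDocumentFragm.*//g' "$file"    # edits $file, keeps $file.bak
done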
