How can I create a Bash script that creates multiple files with text, excluding one? - bash

I need to create a Bash script that generates text files named file001.txt through file050.txt.
Each of those files should contain the text "This is file number xxx" (where xxx is the assigned file number), except for file007.txt, which needs to be empty.
This is what I have so far:
#!/bin/bash
touch {001..050}.txt
for f in {001..050}
do
echo This is file number > "$f.txt"
done
Not sure where to go from here. Any help would be much appreciated.

#!/bin/bash
for f in {001..050}
do
    if [[ ${f} == "007" ]]
    then
        # creates an empty file
        touch "${f}.txt"
    else
        # creates the file and inserts the text
        echo "This is file number ${f}" > "${f}.txt"
    fi
done

The continue statement can be used to skip an iteration of a loop and go on to the next -- though since you actually do want to perform an operation on file 7 (creating it), it makes just as much sense to use a conditional:
for (( i=1; i<=50; i++ )); do
    printf -v filename '%03d.txt' "$i"
    if (( i == 7 )); then
        # create the file if it doesn't exist, truncate it if it does
        >"$filename"
    else
        echo "This is file number $i" >"$filename"
    fi
done
A few words about the specific implementation decisions here:
Using touch file is much slower than > file (since it starts an external command), and doesn't truncate (so if the file already exists it will retain its contents); your textual description of the problem indicates that you want 007.txt to be empty, making truncation appropriate.
Using a C-style for loop, i.e. for ((i=1; i<=50; i++)), means you can use a variable for the maximum number, i.e. for ((i=1; i<=max; i++)). You can't do {001..$max}, by contrast. However, this does mean you need to add the zero-padding in a separate step -- hence the printf.
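To illustrate both points, here is a small hedged sketch: the truncation difference first, then the variable-maximum variant with printf zero-padding (the names demo.txt and max are illustrative, and the 007 special case is omitted):
echo "old contents" > demo.txt
touch demo.txt       # demo.txt still contains "old contents"
> demo.txt           # demo.txt is now empty (truncated)

max=50               # the upper bound can now be a variable
for (( i=1; i<=max; i++ )); do
    printf -v filename '%03d.txt' "$i"   # zero-padding added in a separate step
    echo "This is file number $i" >"$filename"
done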

Of course, you can customize the files' names and the text; the key thing is the ${i}. I tried to be clear, but let us know if you don't understand something.
#!/bin/bash
# Loop through 001 to 050
for i in {001..050}
do
    if [ "${i}" == "007" ]
    then
        # Create an empty file if "i" is 007
        > "file${i}.txt"
    else
        # Otherwise create a file ("file012.txt", for example)
        # with the text "This is file number 012"
        echo "This is file number ${i}" > "file${i}.txt"
    fi
done
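A quick way to check the result (a usage sketch): file007.txt should be zero bytes and every other file should contain its one line of text.
$ wc -c file007.txt
0 file007.txt
$ cat file012.txt
This is file number 012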

Related

Unix looping from text file

I have 2 text files. I want to loop over the first file to get a list, then, using that list, loop over the second file to search for matching fields.
The first loop was fine, but when the second loop comes in, the variable $CLIENT_ABBREV cannot be read in the second loop; it reads as blank. The output looks like " does not match DOG", with a blank before "does".
while IFS=',' read CLIENT_ID NAME SERVER_NAME CLIENT_ABBREV
do
    echo "\n------------"
    echo Configuration in effect for this run
    echo CLIENT_ID=$CLIENT_ID
    echo NAME=$NAME
    echo SERVER_NAME=$SERVER_NAME
    echo CLIENT_ABBREV=$CLIENT_ABBREV
    while IFS=',' read JOB_NAME CLIENT_ABBREV_FROMCOMMAND JOBTYPE JOBVER
    do
        if [ "$CLIENT_ABBREV" == "$CLIENT_ABBREV_FROMCOMMAND" ]; then
            : # do something
        else
            echo $CLIENT_ABBREV does not match $CLIENT_ABBREV_FROMCOMMAND
        fi
    done <"$COMMAND_LIST"
done <"$CLIENT_LIST"
Is there a file with the name COMMAND_LIST?
Or do you actually want to use $COMMAND_LIST instead of COMMAND_LIST?
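If the variable is the suspect, a quick sanity check before the loops can confirm it (a minimal sketch using the variable names from the question):
# Fail early if either variable is unset or empty, and make sure the
# files they point to are readable.
: "${CLIENT_LIST:?CLIENT_LIST is not set}"
: "${COMMAND_LIST:?COMMAND_LIST is not set}"
[ -r "$CLIENT_LIST" ]  || { echo "Cannot read $CLIENT_LIST" >&2; exit 1; }
[ -r "$COMMAND_LIST" ] || { echo "Cannot read $COMMAND_LIST" >&2; exit 1; }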

Writing a script for large text file manipulation (iterative substitution of duplicated lines), weird bugs and very slow.

I am trying to write a script which takes a directory containing text files (384 of them) and modifies duplicate lines that have a specific format in order to make them not duplicates.
In particular, I have files in which some lines begin with the '#' character and contain the substring 0:0. A subset of these lines are duplicated one or more times. For those that are duplicated, I'd like to replace 0:0 with i:0 where i starts at 1 and is incremented.
So far I've written a bash script that finds duplicated lines beginning with '#', writes them to a file, then reads them back and uses sed in a while loop to search and replace the first occurrence of the line to be replaced. This is it below:
#!/bin/bash
fdir=$1"*"
# for each fastq file
for f in $fdir
do
    (
    # find duplicated read names and write them to file $f.txt
    sort $f | uniq -d | grep ^# > "$f".txt
    # loop over each duplicated read name
    while read in; do
        rname=$in
        i=1
        # while this read name still exists in the file, increment and replace
        while grep -q "$rname" $f; do
            replace=${rname/0:0/$i:0}
            sed -i.bu "0,/$rname/s/$rname/$replace/" "$f"
            let "i+=1"
        done
    done < "$f".txt
    rm "$f".txt
    rm "$f".bu
    echo "done" >> progress.txt
    )&
    background=( $(jobs -p) )
    if (( ${#background[@]} == 40 )); then
        wait -n
    fi
done
The problem with it is that it's impractically slow. I ran it on a 48-core computer for over 3 days and it hardly got through 30 files. It also seemed to have removed about 10 files, and I'm not sure why.
My question is: where are the bugs coming from, and how can I do this more efficiently? I'm open to using other programming languages or changing my approach.
EDIT
Strangely the loop works fine on one file. Basically I ran
sort $f | uniq -d | grep ^# > "$f".txt
while read in; do
    rname=$in
    i=1
    while grep -q "$rname" $f; do
        replace=${rname/0:0/$i:0}
        sed -i.bu "0,/$rname/s/$rname/$replace/" "$f"
        let "i+=1"
    done
done < "$f".txt
To give you an idea of what the files look like, below are a few lines from one of them. The thing is that even though it works for the one file, it's slow: multiple hours for one file of 7.5 M. I'm wondering if there's a more practical approach.
With regard to the file deletions and other bugs, I have no idea what was happening. Maybe it was running into memory collisions or something when the jobs were run in parallel?
Sample input:
#D00269:138:HJG2TADXX:2:1101:0:0 1:N:0:CCTAGAAT+ATTCCTCT
GATAAGGACGGCTGGTCCCTGTGGTACTCAGAGTATCGCTTCCCTGAAGA
+
CCCFFFFFHHFHHIIJJJJIIIJJIJIJIJJIIBFHIHIIJJJJJJIJIG
#D00269:138:HJG2TADXX:2:1101:0:0 1:N:0:CCTAGAAT+ATTCCTCT
CAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCG
Sample output:
#D00269:138:HJG2TADXX:2:1101:1:0 1:N:0:CCTAGAAT+ATTCCTCT
GATAAGGACGGCTGGTCCCTGTGGTACTCAGAGTATCGCTTCCCTGAAGA
+
CCCFFFFFHHFHHIIJJJJIIIJJIJIJIJJIIBFHIHIIJJJJJJIJIG
#D00269:138:HJG2TADXX:2:1101:2:0 1:N:0:CCTAGAAT+ATTCCTCT
CAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCG
Here's some code that produces the required output from your sample input.
Again, it is assumed that your input file is sorted by the first value (up to the first space character).
time awk '{
    #dbg if (dbg) print "#dbg:prev=" prev
    if (/^#/ && prev!=$1) {fixNum=0 ;if (dbg) print "prev!=$1=" prev "!=" $1}
    if (/^#/ && (prev==$1 || NR==1) ) {
        prev=$1
        n=split($1,tmpArr,":") ; n++
        #dbg if (dbg) print "tmpArr[6]="tmpArr[6] "\tfixNum="fixNum
        fixNum++;tmpArr[6]=fixNum;
        # magic to rebuild $1 here
        for (i=1;i<n;i++) {
            tmpFix ? tmpFix=tmpFix":"tmpArr[i]"" : tmpFix=tmpArr[i]
        }
        $1=tmpFix ; $0=$0
        print $0
    }
    else { tmpFix=""; print $0 }
}' file > fixedFile
output
#D00269:138:HJG2TADXX:2:1101:1:0 1:N:0:CCTAGAAT+ATTCCTCT
GATAAGGACGGCTGGTCCCTGTGGTACTCAGAGTATCGCTTCCCTGAAGA
+
CCCFFFFFHHFHHIIJJJJIIIJJIJIJIJJIIBFHIHIIJJJJJJIJIG
#D00269:138:HJG2TADXX:2:1101:2:0 1:N:0:CCTAGAAT+ATTCCTCT
CAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCG
I've left a few of the #dbg:... statements in place (but they are now commented out) to show how you can run a small set of data as you have provided, and watch the values of variables change.
Assuming a non-csh shell, you should be able to copy/paste the code block into a terminal window command line and replace file > fixedFile at the end with your real file name and a new name for the fixed file. Recall that awk 'program' file > file (actually, any ...file > file) will truncate the existing file and then try to write to it, so you can lose all the data of a file by trying to use the same name.
There are probably some syntax improvements that will reduce the size of this code, and there might be 1 or 2 things that could be done to make the code faster, but this should run very quickly. If not, please post the result of the time command that should appear at the end of the run, i.e.
real 0m0.18s
user 0m0.03s
sys 0m0.06s
IHTH
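If you save the awk program between the single quotes to a file, say fix_dups.awk (a hypothetical name), it could be applied to every file in the directory from the question with a small wrapper loop (a sketch only; it overwrites the originals, so keep backups):
dir=/path/to/fastq/files            # hypothetical directory of input files
for f in "$dir"/*; do
    awk -f fix_dups.awk "$f" > "${f}.fixed" && mv "${f}.fixed" "$f"
done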
#!/bin/bash
# The line counter starts at 4 so that i%4 == 0 lands on every header line
# (records are 4 lines long and the first line of the file is a header).
i=4
# Collect the duplicated header lines into dups.txt
sort $1 | uniq -d | grep ^# > dups.txt
while read in; do
    if [ $((i%4)) = 0 ] && grep -q "$in" dups.txt; then
        # Duplicated header: make it unique with the current line counter
        x="$in"
        x=${x/"0:0 "/$i":0 "}
        echo "$x" >> $1"fixed.txt"
    else
        echo "$in" >> $1"fixed.txt"
    fi
    let "i+=1"
done < $1
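Assuming the script above is saved as fix_dups.sh and made executable (both names hypothetical), it processes one file per invocation and appends the result to a new file named after the input:
chmod +x fix_dups.sh
./fix_dups.sh sample.fastq      # result accumulates in sample.fastqfixed.txt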

bash - difference between two text files

Let's say there are two text files and I need to check if they are different.
If they are, I need to make some changes to them and display information on the terminal.
Will something like this work?
diff file1.txt file2.txt > difference.txt
if [ -s difference.txt ]
then
.....
else
.....
fi
I also tried to find some other ways of writing this in bash, and I found this code:
DIFF_OUTPUT="$(diff new.html old.html)"
if [ "0" != "${#DIFF_OUTPUT}" ]; then
But I can't quite understand it.
I guess in the first line we create a variable DIFF_OUTPUT which works just like difference.txt in my code?
Then there's
${#DIFF_OUTPUT}
which I don't understand at all. What's going on here?
I apologise if my questions are very basic, but I couldn't find an answer anywhere else.
diff has an exit status of 1 if the files are different.
diff file1.txt file2.txt > difference.txt
status=$?
case $status in
0) echo "Files are the same"
# more code here
;;
1) echo "Files are different"
# more code here
;;
*) echo "Error occurred: $status"
# more code here
;;
esac
If you aren't concerned with errors, then just check for a zero-vs-non-zero condition:
if diff file1.txt file2.txt > difference.txt; then
# exit status was 0, files are the same
else
# exit status was > 0, files are different or an error occurred
fi
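To see those exit statuses directly, here is a small illustration with throwaway files:
printf 'a\n' > file1.txt
printf 'a\n' > file2.txt
diff file1.txt file2.txt; echo "status: $?"    # prints "status: 0" (identical)
printf 'b\n' > file2.txt
diff file1.txt file2.txt; echo "status: $?"    # prints the diff, then "status: 1"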
The first line sets a variable DIFF_OUTPUT as the output/terminal result of the command diff new.html old.html.
This is called command substitution. You can encapsulate an expression inline by using $(). Think of it as copying the expression into a terminal and running it and then pasting the result straight back into your code.
So, DIFF_OUTPUT now contains the output of the diff of the two files. If the files are identical, then diff will output nothing, thus the variable DIFF_OUTPUT will be assigned an empty string.
${#variable} returns the length of a variable in bash. If there was no difference between the files, DIFF_OUTPUT will be an empty string, which has a length of 0, so ${#DIFF_OUTPUT} == "0". If there was a difference, ${#DIFF_OUTPUT} != "0" and your condition is satisfied.
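For instance, an illustrative snippet (not from the original answer) showing ${#...} on an empty and a non-empty string:
s=""
echo "${#s}"      # prints 0
s="hello"
echo "${#s}"      # prints 5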
DIFF_OUTPUT="$(diff new.html old.html)"
The first line saves the output of a command diff into a variable DIFF_OUTPUT.
${#DIFF_OUTPUT}
and this expression expands to the length of DIFF_OUTPUT. The ${#VAR} syntax calculates the number of characters in a variable.

stopping 'sed' after match found on a line; don't let sed keep checking all lines to EOF

I have a text file in which a first block of text on each line is separated by a tab from a second block of text, like so:
VERBS, AUXILIARY. "Be," subjunctive and quasi-subjunctive Be, Beest, &c., was used in A.-S. (beon) generally in a future sense.
In case it is hard to tell, the tab is the long space between "quasi-subjunctive" and "Be".
So, off the top of my head, I am thinking of a 'for' loop in which a var is set using 'sed' to read the first block of text of a line, up to and including the tab (or not, it doesn't really matter), and then the var is used to find subsequent matches, adding a "(x)" right before the tab to make the line unique. The 'x' would be a running counter, numbering the first instance '1' and then each subsequent match one number higher.
One problem I see is stopping 'sed' after each subsequent match so the counter can be incremented. Is there a way to do this, since it is sed's normal behaviour to continue on through without stopping (as far as I know) until all lines are processed?
You can set IFS to the TAB character and read the line into variables. Something like:
$ while IFS=$'\t' read block1 block2;do
echo "block1 is $block1"
echo "block2 is $block2"
done < file
block1 is VERBS, AUXILIARY. "Be," subjunctive and quasi-subjunctive
block2 is Be, Beest, &c., was used in A.-S. (beon) generally in a future sense.
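Building on that, here is a hedged sketch of a full numbering pass using the same IFS/read approach plus a bash 4 associative array (the names file and file.numbered are illustrative, and each line is assumed to contain a tab):
# First pass: count how many times each first block occurs.
declare -A total
while IFS=$'\t' read -r block1 rest; do
    total["$block1"]=$(( ${total["$block1"]:-0} + 1 ))
done < file

# Second pass: number only the blocks that occur more than once,
# inserting " (x)" right before the tab.
declare -A seen
while IFS=$'\t' read -r block1 rest; do
    if [ "${total["$block1"]}" -gt 1 ]; then
        seen["$block1"]=$(( ${seen["$block1"]:-0} + 1 ))
        printf '%s (%s)\t%s\n' "$block1" "${seen["$block1"]}" "$rest"
    else
        printf '%s\t%s\n' "$block1" "$rest"
    fi
done < file > file.numbered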
Ok so I got the job done with this little (or perhaps big, if it's overkill?) script I whipped up:
#!/bin/bash
sedLnCnt=1
while [[ "$sedLnCnt" -lt 521 ]] ; do
    lN=$(sed -n "${sedLnCnt} p" sGNoSecNums.html|sed -r 's/^([^\t]*\t).*$/\1/') #; echo "\$lN: $lN"
    lnNum=($(grep -n "$lN" sGNoSecNums.html|sed -r 's/^([0-9]+):.*$/\1/')) #; echo "num of matches: ${#lnNum[@]}"
    if [[ "${#lnNum[@]}" -gt 1 ]] ; then
        lCnt="${#lnNum[@]}"
        ((eleN = lCnt-1)) #; echo "\$eleN: ${eleN}" # $eleN needs to be 1 less than the match count (zero-based array)
        while [[ "$lCnt" -gt 0 ]] ; do
            sed -ri "${lnNum[$eleN]}s/^([^\t]*)\t/\1 \(${lCnt}\)\t/" sGNoSecNums.html
            ((lCnt--))
            ((eleN--))
        done
    fi
    ((sedLnCnt++))
done
Grep was the perfect way to find the line numbers of the matches, jamming them into an array and then editing each line, appending the unique identifier.
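For the narrower question in the title (stopping sed once it has found a match), sed can quit early with the q command; a minimal sketch (PATTERN is a placeholder):
# Print lines up to and including the first one that matches PATTERN, then quit;
# the q command stops sed from reading the rest of the file.
sed '/PATTERN/q' file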

change a single word in a file with Bash

I am trying to write a very simple Bash script that allows me to open and modify a .java file n times.
The modification I want is only a change of a single number in a single row (or two).
I tried to do this with the following code:
#!/bin/bash
# comment
touch ic.java
touch input
n=0
for n in "1" "2" "3" "4.5"
do
    echo 'import java.io.*;'>ic.java
    echo 'import java.util.*;'>>ic.java
    echo ' '>>ic.java
    echo 'class INITIAL_CONDITION_NORMAL {'>>ic.java
    echo 'public static void main (String args[]) {'>>ic.java
    echo "$n">>ic.java
    n=$(($n+1))
    echo '....'>>ic.java
done
java ic.java
As you can see, I must write out the whole file and, where I want to change the number, put "$n"
and n=$(($n+1)) in that row, then go on until the end of the file and launch it (java ic.java).
I know i can use something like:
sed -i 'm-th_row/old/new/' ic.java
but if I want to do this repeatedly (100 times), each time with a different new value (as in the example), how can I do that?
Thanks a lot for your help!
As long as new contains no / (slash) character, or any other special character that would confuse sed, this is the sort of pattern you need.
for n in "1" "2" "3" "4.5"
sed -i "m-th_row/old/$n/" ic.java
done
Of course, that snippet would just modify the same file repeatedly, which probably wouldn't be helpful, but you get the idea.
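As a concrete variant of that pattern, one option is to keep a template file and regenerate ic.java on each pass instead of editing it in place (a sketch; ic.java.template and the placeholder NNN are hypothetical names). Regenerating from a template sidesteps the problem that, after the first in-place substitution, the old value is gone and the next sed has nothing to match.
# ic.java.template is a copy of ic.java with NNN where the number should go.
for n in 1 2 3 4.5; do
    sed "s/NNN/$n/" ic.java.template > ic.java
    java ic.java
done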
