awk, /bin/sh: 1: 0: not found - bash

I am almost brand-new to coding and I would appreciate feedback. I am trying to make a shell script that will evaluate information in a text file and move said text file if certain conditions are met. My code is below:
#!/bin/bash
stock=HD
awk ' /Rank/ {
if ( $4 == "1-Strong" ) system("mv " "'$stock'.txt" " " ~/Desktop/Stocks/1/"'$stock'.txt") ;
else if ( $4 == "2-Buy" ) system("mv " "'$stock'.txt" " " ~/Desktop/Stocks/2/"'$stock'.txt") ;
else if ( $4 == "3-Hold" ) system("mv " "'$stock'.txt" " " ~/Desktop/Stocks/3/"'$stock'.txt") ;
else if ( $4 == "4-Sell" ) system("mv " "'$stock'.txt" " " ~/Desktop/Stocks/4/"'$stock'.txt") ;
else if ( $4 == "5-Strong" ) system("mv " "'$stock'.txt" " " ~/Desktop/Stocks/5/"'$stock'.txt")
} ' $stock.txt
When I try to run the script, I get this error:
andrew@andrew-VirtualBox:~/Desktop/Stocks$ ./stockscript7.sh
/bin/sh: 1: 0: not found
I made sure I was in bash shell. I did use chmod to give the script permissions. Otherwise from this error message I'm lost on what to try next. Any help would be much appreciated.

The following is a translation to native bash, with none of the bugs associated with abusing awk as you were (consider what would have happened with an input file named EVIL'$(rm -rf .)'.txt -- as created with the command touch $'EVIL\'$(rm -rf .)\'.txt'):
#!/bin/bash
stock=HD
while IFS= read -r line; do
    [[ $line = *Rank* ]] || continue
    read -r _ _ _ rank _ <<<"$line"
    case $rank in
        1-Strong) mv -- "$stock.txt" ~/Desktop/Stocks/1/ ;;
        2-Buy)    mv -- "$stock.txt" ~/Desktop/Stocks/2/ ;;
        3-Hold)   mv -- "$stock.txt" ~/Desktop/Stocks/3/ ;;
        4-Sell)   mv -- "$stock.txt" ~/Desktop/Stocks/4/ ;;
        5-Strong) mv -- "$stock.txt" ~/Desktop/Stocks/5/ ;;
    esac
done <"$stock.txt"
However, this doesn't make much sense as it's written: You're running one mv per line in the file matching Rank -- but you can't successfully move a single file more than once. Perhaps you just want to read the first line containing Rank, and rename the file according to the number preceding a dash in the fourth column?
#!/bin/bash
stock=HD
if read -r _ _ _ rating _ < <(grep -e Rank "$stock.txt") && [[ $rating ]]; then
    mv -- "$stock.txt" ~/Desktop/Stocks/"${rating%%-*}"/
fi
Explanation:
<(...) is replaced with a filename which, when read, returns the output of the command enclosed. Thus, foo < <(...) runs foo in the parent shell with its stdin fed from the command in the .... See BashFAQ #24 to understand why this is necessary rather than running grep -e Rank "$stock.txt" | read _ _ _ rating _.
read _ _ _ rating _ reads the fourth word of its input stream into the variable rating. (First, second, third, and fifth-and-on are read into variables named _, which is convention for something you don't care about / want to throw away).
"${rating%%-*}" throws away all contents in the variable rating following the first -, thus converting 1-Strong to 1, 2-Buy to 2, etc.
However, all the above doesn't explain the exact error you're receiving. To get that, let's break down your awk commands:
system("mv " "'$stock'.txt" " " ~/Desktop/Stocks/1/"'$stock'.txt") ;
...so, what are the strings concatenated together and passed to system() (and thus /bin/sh)?
system(
    "mv "
    "'$stock'.txt"
    " "
    ~/Desktop/Stocks/1/"'$stock'.txt"
) ;
See the problem here? That fourth piece (which you intend to have concatenated together with the rest) isn't a string in awk! Thus, when it contains / operations, those are treated as numeric division -- casting the content before it to a numeric value, and dividing by the content after it (likewise cast to integer). On some awk releases, including mine, this results in a divide-by-zero error; yours apparently differs.
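The division behavior is easy to reproduce on its own (a toy example, not your original command):
awk 'BEGIN { print "mv " 1/2 }'    # prints: mv 0.5 -- the / acts as arithmetic, not a path separator
awk 'BEGIN { print "mv " 1/0 }'    # on gawk: fatal: division by zero attempted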
(Also, $stock is a literal string, not substituted with HD here, but that's a separate issue)

You are currently creating this execution tree:
shell { awk { system( shell ) } }
Don't do that, it's extremely messy and error prone. Do this instead (or something equally simple):
shell { awk }
e.g. this MAY be all you need (depending on the contents of HD.txt):
mv "${stock}.txt" ~/Desktop/Stocks/"$(awk '/Rank/{ sub(/-.*/,"",$4); print $4 }' "${stock}.txt")"

You are making things too complicated by 1) sticking just with awk and 2) repeating the file name in the destination path of those mv commands.
I will assume the "Rank" line has columns separated by spaces and that the 4th one is the one containing your id, as you yourself are assuming. If either this or the following assumption do not hold, the solution here may give you problems. See Charle's answer.
If you are sure the 4th field always contains a string with the pattern digits-text then you can try with
#!/bin/bash
stock=HD
id=`cat "$stock.txt"|grep Rank| cut -d ' ' -f 4| sed 's#-.*##g'`
echo "$stock.txt" ~/Desktop/Stocks/"$id"
#mv "$stock.txt" ~/Desktop/Stocks/"$id"
I have commented out the actual mv command. The echo line will allow you to double-check that the file would indeed go to the right destination. Make a few tests, and if things look OK, uncomment the mv line by removing the # sign.
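For example, with stock=HD and a Rank line whose 4th field is 2-Buy, the echo line would print something like (the home path will differ on your machine):
HD.txt /home/andrew/Desktop/Stocks/2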
EDIT: Do make sure that all the directories ~/Desktop/Stocks/$id exist before issuing the mv commands. Otherwise you'll lose those files: they would overwrite each other, and only the content of the last file moved to any given ~/Desktop/Stocks/$id destination would survive - but named $id!
The following code will make sure that the files are moved into a directory and are not overwritten onto each other.
#!/bin/bash
stock=HD
DDIR=~/Desktop/Stocks
id=`cat "$stock.txt"|grep Rank| cut -d ' ' -f 4| sed 's#-.*##g'`
echo "$stock.txt" ~/Desktop/Stocks/"$id"
{ [ -d "$DDIR/$id" ] || mkdir "$DDIR/$id" ;} && mv "$stock.txt" ~/Desktop/Stocks/"$id"
This first checks whether the folder $DDIR/$id exists and makes it if it doesn't. Only then does it proceed with the actual move.
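If your mkdir supports -p (GNU and BSD both do), the check-and-create pair can be collapsed into one call, since mkdir -p creates missing parents and does not fail when the directory already exists:
mkdir -p "$DDIR/$id" && mv "$stock.txt" "$DDIR/$id"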

Related

Writing a script for large text file manipulation (iterative substitution of duplicated lines), weird bugs and very slow.

I am trying to write a script which takes a directory containing text files (384 of them) and modifies duplicate lines that have a specific format in order to make them not duplicates.
In particular, I have files in which some lines begin with the '@' character and contain the substring 0:0. A subset of these lines are duplicated one or more times. For those that are duplicated, I'd like to replace 0:0 with i:0, where i starts at 1 and is incremented.
So far I've written a bash script that finds duplicated lines beginning with '@', writes them to a file, then reads them back and uses sed in a while loop to search for and replace the first occurrence of each such line. This is it below:
#!/bin/bash
fdir=$1"*"
#for each fastq file
for f in $fdir
do
    (
        #find duplicated read names and write to file $f.txt
        sort $f | uniq -d | grep ^@ > "$f".txt
        #loop over each duplicated readname
        while read in; do
            rname=$in
            i=1
            #while this readname still exists in the file increment and replace
            while grep -q "$rname" $f; do
                replace=${rname/0:0/$i:0}
                sed -i.bu "0,/$rname/s/$rname/$replace/" "$f"
                let "i+=1"
            done
        done < "$f".txt
        rm "$f".txt
        rm "$f".bu
        echo "done" >> progress.txt
    ) &
    background=( $(jobs -p) )
    if (( ${#background[@]} == 40 )); then
        wait -n
    fi
done
The problem with it is that it's impractically slow. I ran it on a 48-core computer for over 3 days and it hardly got through 30 files. It also seemed to have removed about 10 files and I'm not sure why.
My question is where are the bugs coming from and how can I do this more efficiently? I'm open to using other programming languages or changing my approach.
EDIT
Strangely the loop works fine on one file. Basically I ran
sort $f | uniq -d | grep ^@ > "$f".txt
while read in; do
    rname=$in
    i=1
    while grep -q "$rname" $f; do
        replace=${rname/0:0/$i:0}
        sed -i.bu "0,/$rname/s/$rname/$replace/" "$f"
        let "i+=1"
    done
done < "$f".txt
To give you an idea of what the files look like, below are a few lines from one of them. The thing is that even though it works for the one file, it's slow - multiple hours for one file of 7.5 M. I'm wondering if there's a more practical approach.
With regard to the file deletions and other bugs, I have no idea what was happening. Maybe it was running into memory collisions or something when the jobs were run in parallel?
Sample input:
@D00269:138:HJG2TADXX:2:1101:0:0 1:N:0:CCTAGAAT+ATTCCTCT
GATAAGGACGGCTGGTCCCTGTGGTACTCAGAGTATCGCTTCCCTGAAGA
+
CCCFFFFFHHFHHIIJJJJIIIJJIJIJIJJIIBFHIHIIJJJJJJIJIG
@D00269:138:HJG2TADXX:2:1101:0:0 1:N:0:CCTAGAAT+ATTCCTCT
CAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCG
Sample output:
@D00269:138:HJG2TADXX:2:1101:1:0 1:N:0:CCTAGAAT+ATTCCTCT
GATAAGGACGGCTGGTCCCTGTGGTACTCAGAGTATCGCTTCCCTGAAGA
+
CCCFFFFFHHFHHIIJJJJIIIJJIJIJIJJIIBFHIHIIJJJJJJIJIG
@D00269:138:HJG2TADXX:2:1101:2:0 1:N:0:CCTAGAAT+ATTCCTCT
CAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCG
Here's some code that produces the required output from your sample input.
Again, it is assumed that your input file is sorted by the first value (up to the first space character).
time awk '{
    #dbg if (dbg) print "#dbg:prev=" prev
    if (/^@/ && prev!=$1) {fixNum=0 ;if (dbg) print "prev!=$1=" prev "!=" $1}
    if (/^@/ && (prev==$1 || NR==1) ) {
        prev=$1
        n=split($1,tmpArr,":") ; n++
        #dbg if (dbg) print "tmpArr[6]="tmpArr[6] "\tfixNum="fixNum
        fixNum++;tmpArr[6]=fixNum;
        # magic to rebuild $1 here
        for (i=1;i<n;i++) {
            tmpFix ? tmpFix=tmpFix":"tmpArr[i]"" : tmpFix=tmpArr[i]
        }
        $1=tmpFix ; $0=$0
        print $0
    }
    else { tmpFix=""; print $0 }
}' file > fixedFile
output
@D00269:138:HJG2TADXX:2:1101:1:0 1:N:0:CCTAGAAT+ATTCCTCT
GATAAGGACGGCTGGTCCCTGTGGTACTCAGAGTATCGCTTCCCTGAAGA
+
CCCFFFFFHHFHHIIJJJJIIIJJIJIJIJJIIBFHIHIIJJJJJJIJIG
@D00269:138:HJG2TADXX:2:1101:2:0 1:N:0:CCTAGAAT+ATTCCTCT
CAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCG
I've left a few of the #dbg:... statements in place (but they are now commented out) to show how you can run a small set of data as you have provided, and watch the values of variables change.
Assuming a non-csh shell, you should be able to copy/paste the code block into a terminal window command line and replace file > fixedFile at the end with your real file name and a new name for the fixed file. Recall that awk 'program' file > file (actually, any ...file > file) will truncate the existing file before trying to write to it, so you can lose all the data of a file by trying to reuse the same name.
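To make the truncation hazard concrete (hypothetical file names):
awk -f fix.awk input.fastq > fixed.fastq    # safe: results go to a new file
awk -f fix.awk input.fastq > input.fastq    # input.fastq is truncated before awk reads a single line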
There are probably some syntax improvements that would reduce the size of this code, and there might be 1 or 2 things that could make it faster, but this should run very quickly. If not, please post the result of the time command that should appear at the end of the run, i.e.
real 0m0.18s
user 0m0.03s
sys 0m0.06s
IHTH
#!/bin/bash
i=4
sort $1 | uniq -d | grep ^@ > dups.txt
while read in; do
    if [ $((i%4)) -eq 0 ] && grep -q "$in" dups.txt; then
        x="$in"
        x=${x/"0:0 "/$i":0 "}
        echo "$x" >> $1"fixed.txt"
    else
        echo "$in" >> $1"fixed.txt"
    fi
    let "i+=1"
done < $1

Getting different output files

I'm doing a test with these files:
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R2_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R2_001.fastq
comp995_c0_seq1_Glicose_1_ACTTGA_merge_R2_001.fastq
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R1_001.fastq
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R2_001.fastq
I want to gather the files that share the same code up to the first _ (underscore) and that contain the code R1 into separate output files. Each output file should be named according to the code before the first _ (underscore).
This is my code, but I'm having trouble making the output files.
#!/bin/bash
for i in {900..995}; do
    if [[ ${i} -eq ${i} ]]; then
        cat comp${i}_*_R1_001.fastq
    fi
done
I want to have two outputs:
One output will have all lines from:
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R1_001.fastq
and its name should be comp900_R1.out
The other output will have lines from:
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R1_001.fastq
and its name should be comp995_R1.out
Finally, as I said, this is a small test. I want my script to work with a lot of files that have the same characteristics.
Using awk:
ls -1 *.fastq | awk -F_ '$8 == "R1" {system("cat " $0 ">>" $1 "_R1.out")}'
List all files *.fastq into awk, splitting on _. Check whether the 8th field, $8, is R1; if so, append (cat ... >>) the file's contents to a file named after the first field, $1, plus _R1.out, which will be comp900_R1.out or comp995_R1.out. It is assumed that no filenames contain spaces or other special characters.
Result:
File comp900_R1.out containing all lines from
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R1_001.fastq
and file comp995_R1.out containing all lines from
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R1_001.fastq
My stab at a general solution:
#!/bin/bash
for f in *_R1_*; do
    code=$(echo $f | cut -d _ -f 1)
    cat $f >> ${code}_R1.out
done
Iterates over files with _R1_ in the name, then appends each one's contents to a file named after its code.
cut pulls out the code by splitting the filename (-d _) and returning the first field (-f 1).
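For example, applied to one of the sample names from above:
echo comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq | cut -d _ -f 1
comp900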

Why is this command within my code giving a different result than the same command in the terminal?

Edit: Okay, so I've tried implementing everyone's advice so far.
- I've added quotes around each variable, "$1" and "$codon", to avoid whitespace problems.
- I've added the -ioc flags to grep to avoid case issues.
- I tried using tr -d' ', however that leads to a runtime error because it says -d' ' is an invalid option.
Unfortunately I am still seeing the same problem. Or rather a different problem, which is that it now tells me that every codon appears exactly once. Which is a different kind of wrong.
Thanks for everything so far - I'm still open to new ideas. I've updated my code below.
I have this bash script that is supposed to count all permutations of (A C G T) in a given file.
One line of the script is not giving me the desired result and I don't know why - especially because I can enter the exact same line of code in the command prompt and get the desired result.
The line, executed in the command prompt, is:
cat dnafile | grep -o GCT | wc -l
This line tells me how many times the regular expression "GCT" appears in the file dnafile. When I run this command the result I get is 10 (which is accurate).
In the code itself, I run a modified version of the same command:
cat $1 | grep -o $codon | wc -l
Where $1 is the file name, and $codon is the 3-letter combination. When I run this from within the program, the answer I get is ALWAYS 0 (which is decidedly not accurate).
I was hoping one of you fine gents could enlighten this lost soul as to why this is not working as expected.
Thank you very, very much!
My code:
#!/bin/bash
#countcodons <dnafile> counts occurrences of each codon in the sequence contained within <dnafile>
if [[ $# != 1 ]]
then
    echo "Format is: countcodons <dnafile>"
    exit
fi
nucleos=(a c g t)
allCods=()
#mix and match nucleotides to create all codons
for x in {0..3}
do
    for y in {0..3}
    do
        for z in {0..3}
        do
            perm=${nucleos[$x]}${nucleos[$y]}${nucleos[$z]}
            allCods=("${allCods[@]}" "$perm")
        done
    done
done
#for each codon, use grep to count # of occurrences in file
len=${#allCods[*]}
for (( n=0; n<len; n++ ))
do
    codon=${allCods[$n]}
    occs=`cat "$1" | grep -ioc "$codon" | wc -l`
    echo "$codon appears: $occs"
#    if (( $occs > 0 ))
#    then
#        echo "$codon : $occs"
#    fi
done
exit
You're generating your sequences in lowercase. Your code greps for gct, not GCT. You want to add the -i switch to grep. Try:
occs=`grep -ioc $codon $1`
You've got your logic backwards - you shouldn't have to read your input file once for every codon, you should only have to read it once and check each line for every codon.
You didn't supply any sample input or expected output so it's untested but something like this is the right approach:
awk '
BEGIN {
    nucleosStr="a c g t"
    split(nucleosStr,nucleos)
    #mix and match nucleotides to create all codons
    for (x in nucleos) {
        for (y in nucleos) {
            for (z in nucleos) {
                perm = nucleos[x] nucleos[y] nucleos[z]
                allCodsStr = allCodsStr (allCodsStr?" ":"") perm
            }
        }
    }
    split(allCodsStr,allCods)
}
{
    #for each codon, count # of occurrences in file
    for (n in allCods) {
        codon = allCods[n]
        if ( tolower($0) ~ codon ) {
            occs[n]++
        }
    }
}
END {
    for (n in allCods) {
        printf "%s appears: %d\n", allCods[n], occs[n]
    }
}
' "$1"
I expect you'll see a huge performance improvement with that approach if your file is moderately large.
Try:
occs=`cat $1 | grep -o $codon | wc -l | tr -d ' '`
The problem is that wc indents the output, so $occs has a bunch of spaces at the beginning.
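You can see the padding directly (BSD/macOS wc pads the count; GNU wc typically does not, which is one reason behavior differs between machines):
echo GCT | wc -l               # may print '       1' with leading spaces
echo GCT | wc -l | tr -d ' '   # prints: 1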

Searching in bash shell

I have a text file.
Info in text file is
Book1:Author1:10.50:50:5
Book2:Author2:4.50:30:10
First one is book name, second is author name, third is the price, fourth is the quantity and fifth is the quantity sold.
Currently I have this set of codes
function search_book
{
    read -p $'Title: ' Title
    read -p $'Author: ' Author
    if grep -Fq "${Title}:${Author}" BookDB.txt
    then
        record=grep -c "${Title}:${Author}" BookDB.txt
        echo "Found '" $record "' record(s)"
    else
        echo "Book not found"
    fi
}
For $record, I am trying to count the number of lines found. Did I do the right thing? Because when I run this code, it just shows an error about the command -c.
When I did this
echo "Found"
grep -c "${Title}" BookDB.txt
echo "record(s)"
It worked, but the output is
Found
1
record(s)
I would like them to be together on one line.
Can I also add -i to grep -Fq in order to make the search case-insensitive for better matching?
Let's say I want to search for Book1 and Author1: if I enter 'ok' for the title and 'uth' for the author, is there anything like the SQL % wildcard I can add so the search matches in the middle of the title and author?
The expected output is also expected to be..
Found 1 record(s)
Book1,Author1,$10.50,50,5.
Is there any way I can change the : delimiter to ,?
And also add a $ to the 3rd column, which is the price?
Please help..
Changing record=grep -c "${Title}:${Author}" BookDB.txt to record=$(grep -c "${Title}:${Author}" BookDB.txt) will fix the error. record=$(cmd) means assigning the output of command cmd to the variable record. Without that, the shell will interpret record=grep -c ... as a command -c preceded by an environment variable assignment (record=grep).
BTW, since your DB format is column-oriented text data, awk should be a better tool. Sample code:
function search_book
{
    read -p $'Title: ' Title
    read -p $'Author: ' Author
    awk -F: '{if ($1 == "'"$Title"'" && $2 ~ "'"$Author"'") {count+=1; output=output "\n" $0} }
        END {
            if (count > 0) {print "found", count, "record(s)\n", output}
            else {print "Book not found";}}' BookDB.txt
}
As you can see, using awk makes it easier to change the delimiter (e.g. awk -F, for a comma delimiter), and also makes the program more robust (e.g. it restricts the matching to the first two fields). If you only need a fuzzy match instead of an exact match, you can change == to ~ in the condition.
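For instance, a comma-delimited DB only changes the -F flag (a one-line demo with a made-up record):
echo 'Book1,Author1,10.50,50,5' | awk -F, '{ print $1 " by " $2 }'
Book1 by Author1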
The "unnamed command -c" error can be avoided by enclosing the right part of the assignment in backticks or "$()", e.g.:
record=`grep -ic "${Title}:${Author}" BookDB.txt`
record=$(grep -ic "${Title}:${Author}" BookDB.txt)
Also, this snippet shows that -i is perfectly fine. However, please note that both grep commands should use the same list of flags (-F is missing in the 2nd one) - except for -q, of course.
Anyway, performing grep twice is probably not the best way to go. What about...
record=`grep -ic "${Title}:${Author}" BookDB.txt 2>/dev/null`
if [ ! -z "$record" ]; then ...
... or something like that?
By the way: If you omit -F you allow the user to operate with regular expressions. This would not only provide wildcards but also the possibility for more complex patterns. You could also apply an option to your script that decides whether to use -F or not..
Last but not least: To modify the lines, in order to change the delimiter or manipulate the columns at all, you could look into the manual pages or awk(1) or cut(1), at least. Although I believe that a more sophisticated language is more suitable here, e.g. perl(1) or python(1), especially when the script is to be extended with more features.
to add to the answer(s) above (this started as a comment, but it grew...) :
the $() form is preferred:
- it allows nesting,
- and it simplifies the use of " and ' a lot (each "level" of nesting sees them at its own level, so to speak). This is tough to do with backticks, as nesting quotes and single-quotes becomes a nightmare of \` and \\ escapes, depending on the "level of subshell" they are to be interpreted in...
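A side-by-side illustration (trivial commands, chosen only to show the quoting):
echo "$(basename "$(pwd)")"    # nested $() -- the inner quotes don't fight the outer ones
echo "`basename \`pwd\``"      # backtick nesting needs \` escapes and gets worse with depth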
ex: (trying to only grep once)
export results="$(grep -i "${Title}:${Author}" BookDB.txt)" ;
export nbresults=$(echo "${results}" | wc -l) ;
printf "Found %8s record(s)\n" "nbresults" ;
echo "$nbresults" ;
or, if there are too many results to fit in a variable:
export tmpresults="/tmp/results.$$"
grep -i "${Title}:${Author}" BookDB.txt > "${tmpresults}"
export nbresults=$(wc -l < "${tmpresults}") ;
printf "Found %8s record(s)\n" $nbresults ;
cat "${tmpresults}" ;
rm -f "${tmpresults}" ;
Note: I use " a lot (except on the wc -l lines) to illustrate that it can sometimes be needed (not in all the cases above!) to keep spaces, newlines, etc. (And I purposely drop the quotes for nbresults so that it only contains the number of lines, not the preceding spaces.)

bash script for file search and replace!

Hey, I'm trying to write a little bash script. It should copy a dir and all files in it. Then it should search each file and dir in this copied dir for a string (e.g. @ForTestingOnly) and save the line number. Then it should go on and count each { and }; as soon as the numbers are equal, it should save the line number again. => It should delete all the lines between these 2 numbers.
I'm trying to make a script which searches for all these annotations and then deletes the method which comes directly after the annotation.
Thx for help...
so far I have:
echo "please enter dir"
read dir
newdir="$dir""_final"
cp -r $dir $newdir
cd $newdir
grep -lr -E '@ForTestingOnly' * | xargs sed -i 's/@ForTestingOnly//g'
Now with grep I can search for and replace the @ForTestingOnly annotation, but I'd like to delete it and the following method...
Give this a try. It's oblivious to braces in comments and literals, though, as David Gelhar warned. It only finds and deletes the first occurrence of the "@ForTestingOnly" block (under the assumption that there will only be one anyway).
#!/bin/bash
find . -maxdepth 1 | while read -r file
do
    open=0 close=0
    # start=$(sed -n '/@ForTestingOnly/{=;q}' "$file")
    while read -r line
    do
        case $line in
            *{*) (( open++ )) ;;
            *}*) (( close++ )) ;;
            '') : ;;   # skip blank lines
            *)  # these lines contain the line number that the sed "=" command printed
                if (( open == close ))
                then
                    break
                fi
                ;;
        esac
    # split braces onto separate lines dropping all other chars
    # print the line number once per line that contains either { or }
    # done < <(sed -n "$start,$ { /[{}]/ s/\([{}]\)/\1\n/g;ta;b;:a;p;=}" "$file")
    done < <(sed -n "/@ForTestingOnly/,$ { /[{}]/ s/\([{}]\)/\1\n/g;ta;b;:a;p;=}" "$file")
    end=$line
    # sed -i "${start},${end}d" "$file"
    sed -i "/@ForTestingOnly/,${end}d" "$file"
done
Edit: Removed one call to sed (by commenting out and replacing a few lines).
Edit 2:
Here's a breakdown of the main sed line:
sed -n "/#ForTestingOnly/,$ { /[{}]/ s/\([{}]\)/\1\n/g;ta;b;:a;p;=}" "$file"
-n - only print lines when explicitly requested
/@ForTestingOnly/,$ - from the line containing "@ForTestingOnly" to the end of the file
s/ ... / ... /g - perform a global (per-line) substitution
\( ... \) - capture
[{}] - match any of the characters listed between the square brackets
\1\n - substitute what was captured plus a newline
ta - if a substitution was made, branch to label "a"
b - branch (no label means "go to the end and begin the per-line cycle again for the next line") - this branch functions as an "else" for the ta; I could have used T instead of ta;b;:a, but some versions of sed don't support T
:a - label "a"
p - print the line (actually, print the pattern buffer which now consists of possibly multiple lines with a "{" or "}" on each one)
= - print the current line number of the input file
The second sed command simply says to delete the lines starting at the one that has the target string and ending at the line found by the while loop.
The sed command at the top which I commented out says to find the target string and print the line number it's on and quit. That line isn't necessary since the main sed command is taking care of starting in the right place.
The inner while loop looks at the output of the main sed command and increments a counter for each brace. When the counts match, it stops.
The outer while loop steps through all the files in the current directory.
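One caveat worth noting (an easy tweak, not part of the original): find . -maxdepth 1 also emits "." and any subdirectory names, which the loop will then try to process as files; adding -type f keeps it to regular files:
find . -maxdepth 1 -type f | while read -r file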
I fixed the bugs in the old version. The new version has two scripts: an awk script and a bash driver.
The driver is:
#!/bin/bash
AWK_SCRIPT=ann.awk
for i in $(find . -type f -print); do
    while [ 1 ]; do
        cmd=$(awk -f $AWK_SCRIPT $i)
        if [ -z "$cmd" ]; then
            break
        else
            eval $cmd
        fi
    done
done
the new awk script is:
BEGIN {
    # line number where we will start deleting
    start = 0;
}
{
    # check current line for the annotation we're looking for
    if($0 ~ /@ForTestingOnly/) {
        start = NR;
        found_first_open_brace = 0;
        num_open = 0;
        num_close = 0;
    }
    if(start != 0) {
        if(num_open == num_close && found_first_open_brace == 1) {
            print "sed -i \'\' -e '" start "," NR " d' " ARGV[1];
            start = 0;
            exit;
        }
        for(i = 1; i <= length($0); i++) {
            c = substr($0, i, 1);
            if(c == "{") {
                found_first_open_brace = 1;
                num_open++;
            }
            if(c == "}") {
                num_close++;
            }
        }
    }
}
Set the path to the awk script in the driver then run the driver in the root dir.
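To get a sense of the round trip: on a hypothetical source file where the annotation sits on line 3 and the matching closing brace on line 9, the awk script prints a command such as
sed -i '' -e '3,9 d' ./src/Foo.java
which the driver evals, deleting that method; the while [ 1 ] loop then reruns awk on the same file until no annotation remains and awk prints nothing, which breaks the loop.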
