why this code is not working? - shell

#! /bin/ksh
awk -F':' '{
if( match($0,":server:APBS") )
{
print x;
x=$0;
}
}' iws_config4.dat

You would write your program in idiomatic awk as:
awk -F':' '/:server:APBS/ { print x; x=$0; }' iws_config4.dat
An awk program consists of patterns and the actions to take when those patterns match. What you wrote is tantamount to abusing the built-in facilities of awk.
Given that you're only interested in $0, the field separator is redundant, so the -F':' argument could go.
What your program does is:
read a line.
if it matches the pattern
print the last line that matched
save the current line
So, if your input contains one match, you see nothing output (or, more precisely, a blank line). If your input contains two matches, you see the first; if three, the first two; and so on.
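To see that behaviour concretely, here is a small sketch. The temporary file and its contents are made up for illustration; only the awk program comes from the question.

```shell
# Hypothetical input with three matching lines (contents are illustrative only).
tmp=$(mktemp)
printf '%s\n' 'header' ':server:APBS one' 'filler' \
    ':server:APBS two' ':server:APBS three' > "$tmp"

# Same logic as the original script: print the previously matched line,
# then remember the current one.
out=$(awk '/:server:APBS/ { print x; x = $0 }' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

The first match prints an empty x (a blank line), and each later match prints the match before it, so the third match is never shown.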
Given the desire to emulate 'grep -B1 :server:APBS iws_config4.dat', you can do:
awk '/:server:APBS/ { print old }
{ old = $0 }' iws_config4.dat
If the line matches, print the old saved line.
Regardless, store the current line as the (new) old saved line.
It probably can all be flattened onto one line. It is crucial that the pattern match precede the unconditional save.
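A quick way to convince yourself that the order matters is to run both orderings on the same input. This is only a sketch; the three-line input is invented for the demonstration.

```shell
# Hypothetical input: one context line, one match, one trailing line.
tmp=$(mktemp)
printf '%s\n' 'before' ':server:APBS hit' 'after' > "$tmp"

# Test first, then save: prints the line that precedes the match.
right=$(awk '/:server:APBS/ { print old } { old = $0 }' "$tmp")

# Save first, then test: "old" already holds the matching line itself.
wrong=$(awk '{ old = $0 } /:server:APBS/ { print old }' "$tmp")

printf 'right: %s\nwrong: %s\n' "$right" "$wrong"
rm -f "$tmp"
```

The first command is also the one-line form of the two-line script.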
Given the script k.awk and data file iws_config4.dat shown, I get the output I expect. What do you get? What do you expect?
$ cat iws_config4.dat
One line of text followed by the marker
:server:APBS blah blah bah 1
:server:APBS blah blah bah 2
More text
Blah blah blah
$ cat k.awk
awk '/:server:APBS/ { print old }
{ old = $0 }' iws_config4.dat
$ sh k.awk
One line of text followed by the marker
:server:APBS blah blah bah 1
$
If blank lines could be the trouble, only save non-blank lines:
awk '/:server:APBS/ { print old }
/[^ ]/ { old = $0 }' iws_config4.dat
The second line now is only active on lines that contain at least one non-blank character, so only those lines will be saved.
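As a rough check of the difference, assuming an input where an empty line sits between the context line and the match (the data here is invented):

```shell
# Hypothetical input: context line, empty line, then the match.
tmp=$(mktemp)
printf '%s\n' 'context' '' ':server:APBS hit' > "$tmp"

# Unconditional save: the empty line overwrites "old", so a blank is printed.
plain=$(awk '/:server:APBS/ { print old } { old = $0 }' "$tmp")

# Save only lines with a non-space character: "context" survives.
skip=$(awk '/:server:APBS/ { print old } /[^ ]/ { old = $0 }' "$tmp")

printf 'plain: [%s]\nskip:  [%s]\n' "$plain" "$skip"
rm -f "$tmp"
```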

Related

Grep a line from a file and replace a substring and append the line to the original file in bash?

This is what I want to do.
For example, my file contains many lines, say:
ABC,2,4
DEF,5,6
GHI,8,9
I want to copy the second line and replace a substring EF(all occurrences) and make it XY and add this line back to the file so the file looks like this:
ABC,2,4
DEF,5,6
GHI,8,9
DXY,5,6
How can I achieve this in bash?
EDIT: I want to do this in general and not necessarily for the second line. I want to grep EF, and do the substitution in whatever line is returned.
Here's a simple Awk script.
awk -F, -v pat="EF" -v rep="XY" 'BEGIN { OFS=FS }
$1 ~ pat { x = $1; sub(pat, rep, x); y = $0; sub($1, x, y); a[++n] = y }
1
END { for(i=1; i<=n; i++) print a[i] }' file
The -F , says to use comma as the input field separator (internal variable FS) and in the BEGIN block, we also set that as the output field separator (OFS).
If the first field matches the pattern, we copy the first field into x, substitute pat with rep, and then substitute the first field of the whole line $0 with the new result, and append it to the array a.
1 is a shorthand to say "print the current input line".
Finally, in the END block, we output the values we have collected into a.
This could be somewhat simplified by hardcoding the pattern and the replacement, but I figured it's more useful to make it modular so that you can plug in whatever values you need.
While this all could be done in native Bash, it tends to get a bit tortured; spending the 30 minutes or so that it takes to get a basic understanding of Awk will be well worth your time. Perhaps tangentially, see also "while read loop extremely slow compared to cat, why?", which explains part of the rationale for preferring an external tool like Awk over a pure Bash solution.
You can use the sed command:
sed '
/EF/H # copy all matching lines
${ # on the last line
p # print it
g # paste the copied lines
s/EF/XY/g # replace all occurrences
s/^\n// # get rid of the extra newline
}' file.csv
As a one-liner:
sed '/EF/H;${p;g;s/EF/XY/g;s/^\n//}' file.csv
If ed is available/acceptable, something like:
#!/bin/sh
ed -s file.txt <<-'EOF'
$kx
g/^.*EF.*,.*/t'x
'x+;$s/EF/XY/
,p
Q
EOF
Or as a one-liner:
printf '%s\n' '$kx' "g/^.*EF.*,.*/t'x" "'x+;\$s/EF/XY/" ,p Q | ed -s file.txt
Change Q to w if in-place editing is needed.
Remove the ,p to silence the output.
Using BASH:
#!/bin/bash
src="${1:-f.dat}"
rep="${2:-XY}"
declare -a new_lines
while read -r line ; do
if [[ "$line" == *EF* ]] ; then
new_lines+=("${line//EF/${rep}}")
fi
done <"$src"
printf "%s\n" "${new_lines[@]}" >> "$src"
Contents of f.dat before:
ABC,2,4
DEF,5,6
GHI,8,9
Contents of f.dat after:
ABC,2,4
DEF,5,6
GHI,8,9
DXY,5,6
Following on from the great answer by @tripleee, you can create a variation that uses a single call to sub() by outputting all records before the substitution is made, then adding the updated record to the array to be output with the END rule, e.g.
awk -F, '1; /EF/ {sub(/EF/,"XY"); a[++n]=$0} END {for(i=1;i<=n;i++) print a[i]}' file
Example Use/Output
An expanded input based on your answer to my comment below the question that all occurrences of EF will be replaced with XY in all records, e.g.
$ cat file
ABC,2,4
DEF,5,6
GHI,8,9
EFZ,3,7
Use and output:
$ awk -F, '1; /EF/ {sub(/EF/,"XY"); a[++n]=$0} END {for(i=1;i<=n;i++) print a[i]}' file
ABC,2,4
DEF,5,6
GHI,8,9
EFZ,3,7
DXY,5,6
XYZ,3,7
Let me know if you have questions.
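One thing worth checking if a single line can contain EF more than once: sub() replaces only the first match in a record, while gsub() replaces every match. A small sketch, using an invented record that is not part of the sample file:

```shell
# Hypothetical record containing EF twice.
line='EFGEF,1,2'

with_sub=$(printf '%s\n' "$line" | awk '{ sub(/EF/, "XY"); print }')
with_gsub=$(printf '%s\n' "$line" | awk '{ gsub(/EF/, "XY"); print }')

printf 'sub:  %s\ngsub: %s\n' "$with_sub" "$with_gsub"
```

For the sample data in the question the two give the same result, since no line contains EF twice.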

delete lines if firstline matches expression, but next 2 lines do not match different expression

I have a test file in this format:
G03X22Y22.5
G01X48.5
M98P9001 (OFF)****
G00X20Y25
M98P8051 (FAST CUT)
G01X22Y34
G01X25Y33
I am trying to write a bash or MS-DOS script that will:
Find all lines in the file that match M98P9001.
If the next 2 lines do not contain one of the codes M98P8050, M98P8080 or M09, delete all 3 lines, which would result in the output:
G03X22Y22.5
G01X48.5
G01X22Y34
G01X25Y33
I've tried solutions with SED or AWK, but haven't gotten the right one yet:
sed -e '/M98P9001/,+2d' input.txt >> output.txt
This one always deletes the matching line and the two lines after it, but I need to delete them only if the two lines following the match do not contain M98P8050, M98P8080 or M09.
a mark and sweep approach
$ awk 'NR==FNR {if(!(/M98P80[58]0|M09/ && p~/M98P80[58]0|M09/) && pp~/M98P9001/)
{a[NR]; a[NR-1]; a[NR-2]}
pp=p; p=$0; next}
!(FNR in a)' file{,}
G03X22Y22.5
G01X48.5
G01X22Y34
G01X25Y33
This seems to give your desired output:
awk '
/M98P9001/ {
getline l2; getline l3;
if((l2 l3)~/M98P8050|M98P8080|M09/) printf "%s\n%s\n%s\n", $0, l2, l3;
next;
}
{ print; }'
Description:
If first line pattern match, read in next two lines to variables.
Check concatenation of both lines for any of the 3 secondary patterns
If match, print all three lines, else print nothing.
go to next record.
on all other lines, print.
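The steps above can be tried on a slightly extended input. This is a sketch only: the second M98P9001 block and its "(CUT)" comment are invented so that one block is kept and one is dropped.

```shell
# Hypothetical test data: the first M98P9001 block has no keep-code in the
# next two lines, the second block does (M98P8050).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
G03X22Y22.5
G01X48.5
M98P9001 (OFF)
G00X20Y25
M98P8051 (FAST CUT)
G01X22Y34
M98P9001 (OFF)
G00X20Y25
M98P8050 (CUT)
G01X25Y33
EOF

out=$(awk '
/M98P9001/ {
    getline l2; getline l3               # read the two following lines
    if ((l2 l3) ~ /M98P8050|M98P8080|M09/)
        printf "%s\n%s\n%s\n", $0, l2, l3   # keep the whole block
    next                                  # otherwise drop all three lines
}
{ print }' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Because l2 and l3 are concatenated without a separator, a code split across the line boundary could in principle match; testing the two variables separately would avoid that.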
This might work for you (GNU sed):
sed -E ':a;N;s/\n/&/2;Ta;/^[^\n]*M98P9001/{/\n.*(M98P8050|M98P8080|M09)/!d};P;D' file
Open a three line window throughout the length of the file.
If the first line of the window contains M98P9001 and neither the second nor the third line contains M98P8050, M98P8080 or M09, delete the entire window and repeat.
Otherwise, print/delete the first line of the window and repeat.
N.B. The idiom :a;N;s/\n/&/2;Ta tops up the three line window.

retaining text after delimiter in fasta headers using awk

I have what should be a simple problem, but my lack of awk knowledge is holding me back.
I would like to clean up the headers of a fasta file that is in this format:
>HWGG454_Clocus2_Locus3443_allele1
ATTCTACTACTACTCT
>GHW757_clocus37_Locus555662_allele2
CTTCCCTACGATG
>TY45_clocus23_Locus800_allele0
TTCTACTTCATCT
I would like to clean up each header (line starting with ">") to retain only the informative part, which is the second "_Locus*" with or without the allele part.
I thought awk would be the easy way to do this, but I can't quite get it to work.
If I wanted to retain just the first column of text up to the "_" delimiter for the header, and the sequences below, I run this (assuming this toy example is in the file test.fasta):
cat test.fasta | awk -F '_' '{print $1}'
>HWGG454
ATTCTACTACTACTCT
>GHW757
CTTCCCTACGATG
>TY45
TTCTACTTCATCT
But, what I want is to retain just the "Locus*" text, which is after the 3rd delimiter, but, using this code I get this:
cat test.fasta | awk -F '_' '{print $3}'
Locus3443
Locus555662
Locus800
What am I doing wrong here?
thanks.
I understand this to mean that you want to pick the Locus field from the header lines and leave the others unchanged. Then:
awk -F _ '/^>/ { print $3; next } 1' filename
is perhaps the easiest way. This works as follows:
/^>/ { # in lines that begin with >
print $3 # print the third field
next # and go to the next line.
}
1 # print other lines unchanged. Here 1 means true, and the
# default action (unchanged printing) is performed.
The thing to understand here is awk's control flow: awk code consists of conditions with associated actions, and the actions are performed if the condition evaluates to true.
/^>/ is a regex match over the whole record (line by default); it is true if the line begins with > (because ^ matches the beginning), so
/^>/ { print $3; next }
will make awk execute print $3; next in lines that begin with >. The less straightforward part is
1
which prints lines unchanged. We only get here if the first action was not executed (because of the next in it), and this 1 is to be read as a condition that is always true -- nonzero values being true in awk.
Now, if either the condition or the action in an awk statement is omitted, a default is used. The default action is printing the line unchanged, and this takes advantage of it. It would be equally possible to write
1 { print }
or
{ print }
In the latter case, the condition is omitted and the default condition "true" is used. 1 is the shortest variant of this and idiomatic because of it.
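If you want to confirm that the three spellings behave the same, a minimal sketch along these lines should do; the two input lines are invented.

```shell
# Hypothetical two-line input: one header, one sequence line.
input=$(printf '%s\n' '>A_B_Locus1_allele0' 'ACGT')

a=$(printf '%s\n' "$input" | awk '1')            # condition only, default action
b=$(printf '%s\n' "$input" | awk '1 { print }')  # condition and explicit action
c=$(printf '%s\n' "$input" | awk '{ print }')    # action only, default condition

[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "all three forms print the input unchanged"
```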
$ awk -F_ '{print (/^>/ ? $3 : $0)}' file
Locus3443
ATTCTACTACTACTCT
Locus555662
CTTCCCTACGATG
Locus800
TTCTACTTCATCT
You need a second awk match for the row below. e.g.
cat test.fasta | awk -F _ '/^>/ { print $3"_"$4 } /^[A-Z]/ {print $1}'
Output:
Locus3443_allele1
ATTCTACTACTACTCT
Locus555662_allele2
CTTCCCTACGATG
Locus800_allele0
TTCTACTTCATCT
If you don't want the _allele1 bit remove "_"$4 from the awk script.
You can just do a regex on each line:
$ awk '{ sub(/^.*_L/,"L"); print $0}' /tmp/fasta.txt
Locus3443_allele1
ATTCTACTACTACTCT
Locus555662_allele2
CTTCCCTACGATG
Locus800_allele0
TTCTACTTCATCT

setting the NR to 1 does not work (awk)

I have the following script in bash.
awk -F ":" '{if($1 ~ "^fall")
{ NR==1
{{printf "\t<course id=\"%s\">\n",$1} } } }' file1.txt > container.xml
So I have a small file. If ANY line starts with fall, then I want the first field of the VERY first line.
So I did that in the code and set NR==1. However, it does not do the job!!!
Try this:
awk -F: 'NR==1 {id=$1} $1~/^fall/ {printf "\t<course id=\"%s\">\n",id}' file1.txt > container.xml
Notes:
NR==1 {id=$1}
This saves the course ID from the first line
$1~/^fall/ {printf "\t<course id=\"%s\">\n",id}
If any line begins with fall, then the course ID is printed.
The above code illustrates that awk commands can be preceded by conditions. Thus, id=$1 is executed only if we are on the first line: NR==1. This way, it is often unnecessary to have explicit if statements.
In awk, assignment is done with = while tests for equality are done with ==.
If this doesn't do what you want, then please add sample input and corresponding desired output to the question.
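To illustrate why writing NR==1 inside the braces has no effect, here is a small sketch with invented data. As a pattern, NR==1 selects records; as a statement inside an action, it is just a comparison whose result is thrown away.

```shell
# Hypothetical input file: course id on the first line, terms below.
tmp=$(mktemp)
printf '%s\n' 'CS101:intro' 'fall:2014' 'spring:2015' > "$tmp"

# NR==1 as a pattern: the action runs for the first record only.
as_pattern=$(awk -F: 'NR==1 { print $1 }' "$tmp")

# NR==1 as a statement: evaluated and discarded, so print runs on every record.
as_statement=$(awk -F: '{ NR==1; print $1 }' "$tmp")

printf 'pattern:\n%s\nstatement:\n%s\n' "$as_pattern" "$as_statement"
rm -f "$tmp"
```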
awk -F: 'NR==1{x=$1}/^fall/{printf "\t<course id=\"%s\">\n",x;exit}' file
Note:
If the file has any line beginning with fall, print the first field of the very first line in the given format (an XML tag).
No matter how many lines start with fall, it outputs the XML tag only once.
If the file has no line starting with fall, it outputs nothing.
#!awk -f
BEGIN {
FS = ":"
}
NR==1 {
foo = $1
}
/^fall/ {
printf "\t<course id=\"%s\">\n", foo
}
Also note
BUGS
The -F option is not necessary given the command line variable assignment
feature; it remains only for backwards compatibility.
awk man page
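What that remark refers to, as far as I can tell, is that the field separator can also be set with a -v assignment to FS. A quick sketch with a made-up input line:

```shell
# Hypothetical colon-separated record.
line='CS101:intro:fall'

with_F=$(printf '%s\n' "$line" | awk -F: '{ print $2 }')      # -F option
with_v=$(printf '%s\n' "$line" | awk -v FS=: '{ print $2 }')  # -v assignment to FS

printf '%s %s\n' "$with_F" "$with_v"
```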

Extracting field from last row of given table using sed

I would like to write a bash script to extract a field in the last row of a table. I will illustrate by example. I have a text file containing tables with space delimited fields like ...
Table 1 (foobar)
num flag name comments
1 ON Frank this guy is frank
2 OFF Sarah she is tall
3 ON Ahmed who knows him

Table 2 (foobar)
num flag name comments
1 ON Mike he is short
2 OFF Ahmed his name is listed twice
I want to extract the first field in the last row of Table 1, which is 3. Ideally I would like to be able to use any given table's title to do this. There are guaranteed carriage returns between each table. What would be the best way to accomplish this, preferably using sed and grep?
Awk is perfect for this, print the first field in the last row for each record:
$ awk '!$1{print a}{a=$1}END{print a}' file
3
2
Just from the first record:
$ awk '!$1{print a;exit}{a=$1}' file
3
Edit:
For a given table title:
$ awk -v t="Table 1" '$0~t{f=1}!$1&&f{print a;f=0}{a=$1}END{if (f) print a}' file
3
$ awk -v t="Table 2" '$0~t{f=1}!$1&&f{print a;f=0}{a=$1}END{if (f) print a}' file
2
This sed line seems to work for your sample.
table='Table 2'
sed -n "/$table"'/{n;n;:next;h;n;/^$/b last;$b last;b next;:last;g;s/^\s*\(\S*\).*/\1/p;}' file
Explanation: When we find a line matching the table name in $table, we skip that line, and the next (the field labels). Starting at :next we push the current line into the hold space, get the next line and see if it is blank or the end of the file, if not we go back to :next, push the current line into hold and get another. If it is blank or EOF, we skip to :last, pull the hold space (the last line of the table) into pattern space, chop out all but the first field and print it.
Just read each block as a record with each line as a field and then print the first sub-field of the last field of whichever record you care about:
$ awk -v RS= -F'\n' '/^Table 1/{split($NF,a," "); print a[1]}' file
3
$ awk -v RS= -F'\n' '/^Table 2/{split($NF,a," "); print a[1]}' file
2
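If the record/field mapping is not obvious, this sketch prints what awk sees in paragraph mode. It assumes the tables are separated by an empty line, as the question says; the output format is my own.

```shell
# Sample data from the question, with the blank line between tables.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Table 1 (foobar)
num flag name comments
1 ON Frank this guy is frank
2 OFF Sarah she is tall
3 ON Ahmed who knows him

Table 2 (foobar)
num flag name comments
1 ON Mike he is short
2 OFF Ahmed his name is listed twice
EOF

# RS= makes each blank-line-separated block one record; -F'\n' makes each
# line of the block one field, so $1 is the title and $NF is the last row.
out=$(awk -v RS= -F'\n' '{ split($NF, a, " ")
    print $1 " -> " NF " lines, last num " a[1] }' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```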
A better tool for that is awk!
Here is some fairly legible code:
awk '{
if(NR==1) {
row=$0;
next;
}
if($0=="") {
$0=row;
print $1;
} else {
row=$0;
}
} END {
if(row!="") {
$0=row;
print $1;
}
}' input.txt

Resources