delete rows after specific character | awk - bash

I am writing a Bash script and I need to remove all lines between the two TER markers, including the TER lines themselves.
Input file:
ATOM 186 O3' U 6 7.297 6.145 -5.250 1.00 0.00 O
ATOM 187 HO3' U 6 7.342 5.410 -5.865 1.00 0.00 H
TER
ATOM 1 HO5' A 1 3.429 -7.861 3.641 1.00 0.00 H
ATOM 2 O5' A 1 4.232 -7.360 3.480 1.00 0.00 O
ATOM 3 C5' A 1 5.480 -8.064 3.350 1.00 0.00 C
ATOM 4 H5' A 1 5.429 -8.766 2.518 1.00 0.00 H
TER
Expected output:
ATOM 186 O3' U 6 7.297 6.145 -5.250 1.00 0.00 O
ATOM 187 HO3' U 6 7.342 5.410 -5.865 1.00 0.00 H
I found
sed '/TER/,$d' ${myArray[j]}.txt >> ${MyArray[j]}.txt ### ${MyArray[j]} file name through an array
but this does not work. I think awk might work better from a Bash script. Any help is appreciated, thanks.

You can just use sed like this (-i.bak edits the file in place, keeping a .bak backup, and the /^TER/,/^TER/ address range deletes everything from the first TER line through the next one, inclusive):
sed -i.bak '/^TER/,/^TER/d' "${myArray[j]}.txt"
cat "${myArray[j]}.txt"
ATOM 186 O3' U 6 7.297 6.145 -5.250 1.00 0.00 O
ATOM 187 HO3' U 6 7.342 5.410 -5.865 1.00 0.00 H

The same range deletion works on a stream; sed '/TER/,/TER/d' removes everything from the first line matching TER through the next one, inclusive:
echo
"ATOM 186 O3' U 6 7.297 6.145 -5.250 1.00 0.00 O
ATOM 187 HO3' U 6 7.342 5.410 -5.865 1.00 0.00 H
TER
ATOM 1 HO5' A 1 3.429 -7.861 3.641 1.00 0.00 H
ATOM 2 O5' A 1 4.232 -7.360 3.480 1.00 0.00 O
ATOM 3 C5' A 1 5.480 -8.064 3.350 1.00 0.00 C
ATOM 4 H5' A 1 5.429 -8.766 2.518 1.00 0.00 H
TER" |sed '/TER/,/TER/d'
######################################################################################
ATOM 186 O3' U 6 7.297 6.145 -5.250 1.00 0.00 O
ATOM 187 HO3' U 6 7.342 5.410 -5.865 1.00 0.00 H
The general form is:
sed '/Start Pattern/,/End Pattern/d'

It can be done like this. Note that your attempt appends (>>) the output back onto a file while sed is reading it, so at best you end up with the original lines followed by the filtered lines; also, ${MyArray[j]} and ${myArray[j]} are different variables, since Bash variable names are case-sensitive. Writing to a temporary file and moving it back avoids both problems:
sed '/TER/,$d' ${myArray[j]}.txt > tmp.txt #note only one " > "
mv tmp.txt ${myArray[j]}.txt
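If you have the moreutils package installed, sponge can do the same thing without a named temporary file. This is just an alternative suggestion, assuming sponge is available on your system; sponge soaks up all of its input before writing, so it is safe to write back onto the input file:
sed '/TER/,$d' "${myArray[j]}.txt" | sponge "${myArray[j]}.txt"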

awk also provides a simple solution using a flag to control printing. Below, the skip variable is used as that flag: while it is 1, lines are skipped, and on the transition from 1 back to 0 the script exits.
awk -v skip=0 '$1=="TER"{skip=!skip; if (!skip) exit; next} !skip' file
Above, $1=="TER" matches lines (records) whose first field is exactly TER (this disambiguates between "TER" and "TERMINAL", etc.). Within the rule, skip=!skip toggles the flag: it becomes 1 at the first TER and 0 again at the second, at which point the script exits. The next statement keeps the TER lines themselves from being printed, and the bare !skip pattern at the end prints a line only while the flag is off.
Example Use/Output
Using your data in file, you would get:
$ awk -v skip=0 '$1=="TER"{skip=!skip; if (!skip) exit; next} !skip' file
ATOM 186 O3' U 6 7.297 6.145 -5.250 1.00 0.00 O
ATOM 187 HO3' U 6 7.342 5.410 -5.865 1.00 0.00 H
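If your real files contain data after the second TER that you want to keep (which is what the sed range form above would preserve), a small adaptation of the same idea (my own variant) toggles the flag and keeps reading instead of exiting:
awk '$1=="TER"{skip=!skip; next} !skip' file   # drops each TER..TER block, keeps everything else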

Related

How do I return a varying number as a variable in a string found in another file that otherwise stays constant (BASH)?

I have a file that contains text like this (only a portion of it shown here) and I want to find the ATOM number associated with the O5' line (in this case 2). I would then like to store this number in a variable for later use. Note that the data below is stored in another file, called "xyz.file" for example. The amount of whitespace between "ATOM" and the column containing the number of interest may vary as that number's value changes.
ATOM 1 HO5' G5 1 7.415 -9.123 -8.109 1.00 0.00
ATOM 2 O5' G5 1 7.997 -8.960 -8.863 1.00 0.00
ATOM 3 C5' G5 1 9.136 -9.784 -8.729 1.00 0.00
ATOM 4 H5' G5 1 9.679 -9.808 -9.673 1.00 0.00
ATOM 5 H5'' G5 1 8.814 -10.797 -8.484 1.00 0.00
ATOM 6 C4' G5 1 10.067 -9.272 -7.628 1.00 0.00
ATOM 7 H4' G5 1 10.847 -10.015 -7.448 1.00 0.00
ATOM 8 O4' G5 1 10.700 -8.053 -7.990 1.00 0.00
ATOM 9 C1' G5 1 10.866 -7.262 -6.821 1.00 0.00
ATOM 10 H1' G5 1 11.907 -6.970 -6.696 1.00 0.00
ATOM 11 N9 G5 1 10.027 -6.048 -6.896 1.00 0.00
An awk one-liner (awk splits fields on any run of whitespace by default, so the varying spacing does not matter):
n=$(awk '$3 == "O5'\''" {print $2; exit}' file)
echo "$n"
prints
2

How to replace a multiple columns with others using bash?

I have a text file that contains data arranged in columns, and I need to replace some columns with others, specifically the x, y, z coordinates. What I'm looking for is described in the image in the original post: replace the coordinate columns of block 1 (the red rectangle) with those of block 2 (the green rectangle).
HETATM 1 C LIG 1 -0.517 1.592 -0.048 1.00 0.00 0.212 A
HETATM 2 C LIG 1 0.017 -0.536 0.534 1.00 0.00 0.149 A
HETATM 3 C LIG 1 1.133 0.155 0.029 1.00 0.00 0.212 A
HETATM 4 N LIG 1 -1.027 0.379 0.499 1.00 0.00 -0.337 N
HETATM 5 N LIG 1 0.789 1.466 -0.324 1.00 0.00 -0.219 NA
HETATM 6 C LIG 1 -2.429 0.112 0.889 1.00 0.00 0.221 C
HETATM 7 C LIG 1 -3.179 -0.453 -0.210 1.00 0.00 -0.097 C
HETATM 8 C LIG 1 -3.805 -0.925 -1.124 1.00 0.00 0.014 C
HETATM 9 N LIG 1 2.482 -0.388 -0.118 1.00 0.00 -0.095 N
HETATM 10 O LIG 1 2.619 -1.549 0.253 1.00 0.00 -0.530 OA
HETATM 11 O LIG 1 3.362 0.305 -0.578 1.00 0.00 -0.530 OA
ATOM 1 C LIG 1 -13.469 13.704 72.248 -0.37 -0.04 +0.212 75.145
ATOM 2 C LIG 1 -14.243 15.824 72.493 -0.41 -0.03 +0.149 75.145
ATOM 3 C LIG 1 -15.124 15.039 71.727 -0.40 -0.04 +0.212 75.145
ATOM 4 N LIG 1 -13.200 14.974 72.836 -0.28 +0.06 -0.337 75.145
ATOM 5 N LIG 1 -14.635 13.735 71.586 -0.32 +0.05 -0.219 75.145
ATOM 6 C LIG 1 -11.994 15.348 73.608 -0.46 -0.02 +0.221 75.145
ATOM 7 C LIG 1 -12.341 15.781 74.943 -0.66 +0.01 -0.097 75.145
ATOM 8 C LIG 1 -12.628 16.141 76.055 -0.66 -0.00 +0.014 75.145
ATOM 9 N LIG 1 -16.387 15.490 71.145 -0.60 +0.01 -0.095 75.145
ATOM 10 O LIG 1 -17.127 14.595 70.751 -0.10 +0.02 -0.530 75.145
ATOM 11 O LIG 1 -16.631 16.674 71.082 -0.58 -0.08 -0.530 75.145
Assuming the files have the same number of lines, you could merge them with paste and then print the columns in the desired order. (Each file has 12 fields per line, so after paste the fields of file2 appear as $13 through $24, which puts its coordinate columns at $18, $19 and $20.)
paste file1.txt file2.txt|awk '{print $1, $2, $3, $4, $5, $18, $19, $20, $9, $10, $11, $12}'
It's not clear whether you are trying to align the rows contextually, but if you literally just want to replace columns 6, 7, and 8 with the columns from the same row of the other file, you can do something like:
$ cat file1
HETATM 1 C LIG 1 -0.517 1.592 -0.048 1.00 0.00 0.212 A
HETATM 2 C LIG 1 0.017 -0.536 0.534 1.00 0.00 0.149 A
HETATM 3 C LIG 1 1.133 0.155 0.029 1.00 0.00 0.212 A
HETATM 4 N LIG 1 -1.027 0.379 0.499 1.00 0.00 -0.337 N
HETATM 5 N LIG 1 0.789 1.466 -0.324 1.00 0.00 -0.219 NA
HETATM 6 C LIG 1 -2.429 0.112 0.889 1.00 0.00 0.221 C
HETATM 7 C LIG 1 -3.179 -0.453 -0.210 1.00 0.00 -0.097 C
HETATM 8 C LIG 1 -3.805 -0.925 -1.124 1.00 0.00 0.014 C
HETATM 9 N LIG 1 2.482 -0.388 -0.118 1.00 0.00 -0.095 N
HETATM 10 O LIG 1 2.619 -1.549 0.253 1.00 0.00 -0.530 OA
HETATM 11 O LIG 1 3.362 0.305 -0.578 1.00 0.00 -0.530 OA
$ cat file2
ATOM 1 C LIG 1 -13.469 13.704 72.248 -0.37 -0.04 +0.212 75.145
ATOM 2 C LIG 1 -14.243 15.824 72.493 -0.41 -0.03 +0.149 75.145
ATOM 3 C LIG 1 -15.124 15.039 71.727 -0.40 -0.04 +0.212 75.145
ATOM 4 N LIG 1 -13.200 14.974 72.836 -0.28 +0.06 -0.337 75.145
ATOM 5 N LIG 1 -14.635 13.735 71.586 -0.32 +0.05 -0.219 75.145
ATOM 6 C LIG 1 -11.994 15.348 73.608 -0.46 -0.02 +0.221 75.145
ATOM 7 C LIG 1 -12.341 15.781 74.943 -0.66 +0.01 -0.097 75.145
ATOM 8 C LIG 1 -12.628 16.141 76.055 -0.66 -0.00 +0.014 75.145
ATOM 9 N LIG 1 -16.387 15.490 71.145 -0.60 +0.01 -0.095 75.145
ATOM 10 O LIG 1 -17.127 14.595 70.751 -0.10 +0.02 -0.530 75.145
ATOM 11 O LIG 1 -16.631 16.674 71.082 -0.58 -0.08 -0.530 75.145
$ awk '{getline s < "file2"; split(s, a); $6 = a[6]; $7 = a[7]; $8 = a[8]}1' file1
HETATM 1 C LIG 1 -13.469 13.704 72.248 1.00 0.00 0.212 A
HETATM 2 C LIG 1 -14.243 15.824 72.493 1.00 0.00 0.149 A
HETATM 3 C LIG 1 -15.124 15.039 71.727 1.00 0.00 0.212 A
HETATM 4 N LIG 1 -13.200 14.974 72.836 1.00 0.00 -0.337 N
HETATM 5 N LIG 1 -14.635 13.735 71.586 1.00 0.00 -0.219 NA
HETATM 6 C LIG 1 -11.994 15.348 73.608 1.00 0.00 0.221 C
HETATM 7 C LIG 1 -12.341 15.781 74.943 1.00 0.00 -0.097 C
HETATM 8 C LIG 1 -12.628 16.141 76.055 1.00 0.00 0.014 C
HETATM 9 N LIG 1 -16.387 15.490 71.145 1.00 0.00 -0.095 N
HETATM 10 O LIG 1 -17.127 14.595 70.751 1.00 0.00 -0.530 OA
HETATM 11 O LIG 1 -16.631 16.674 71.082 1.00 0.00 -0.530 OA
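One caveat with getline from a file: if file2 has fewer lines than file1, the read fails silently and the previous row's values are reused. A minimal defensive sketch (my own addition, under the same assumptions about the two file names):
awk '{ if ((getline s < "file2") > 0) { split(s, a); $6=a[6]; $7=a[7]; $8=a[8] } } 1' file1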

Is there any Shell script to replace values of File1.dat to file2.dat without changing the file format for each line

I have file1.dat and file2.dat. I want to replace the numeric part of the first column of file1.dat with the values from file2.dat, without changing the file format.
I tried this awk command, but the problem is that it changes the file format (the spacing) and the entire first column gets replaced:
awk 'NR==FNR{a[NR]=$0;next}{$1=a[FNR]}1' file1.dat file2.dat > result.dat
File1.dat (input):
A123456789 1 C HIE 1 48.343 23.545 32.02 1.00 0.00 H
A875678235 3 C PHE 1 48.343 23.545 32.02 1.00 0.00 C
A907654234 4 N ALA 1 48.343 23.545 32.02 1.00 0.00 N
A907863544 5 B VAL 1 48.343 23.545 32.02 1.00 0.00 B
File2.dat (input):
987654321
567890123
098765432
890765348
Desired output:
A987654321 1 C HIE 1 48.343 23.545 32.02 1.00 0.00 H
A567890123 3 C PHE 1 48.343 23.545 32.02 1.00 0.00 C
A098765432 4 N ALA 1 48.343 23.545 32.02 1.00 0.00 N
A890765348 5 B VAL 1 48.343 23.545 32.02 1.00 0.00 B
If you want to keep the first character of column 1 (the A) in the first file, and assuming it's okay to use tabs to separate the output fields:
awk -v OFS='\t' '
NR==FNR{ a[FNR]=$1; next }
{ $1=substr($1,1,1) a[FNR] }1
' file2.dat file1.dat > result.dat
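Run on the sample data (my own run-through of the command above), this gives the desired values, but note the layout changes: every field separator in the output becomes a single tab rather than the original spacing, e.g.
A987654321  1  C  HIE  1  48.343  23.545  32.02  1.00  0.00  H
A567890123  3  C  PHE  1  48.343  23.545  32.02  1.00  0.00  C
(tabs rendered as spaces here). The final answer below shows how to preserve the original spacing.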
This might work for you (GNU parallel):
parallel echo {=1 's/^(.)\S+/$1$arg[2]/' =} :::: file1 ::::+ file2
Join the two input files using the ::::+ operator and replace everything after the first character of the first field with the corresponding value from file2.
Alternative using cat & sed:
cat -n file2 | sed -E 's#\t(.*)#s/[0-9]+/\1/#' | sed -Ef - file1
Prepend line numbers to the values in file2, then turn each numbered line into a sed substitution command that replaces the first run of digits with that value; the line number that cat -n adds is left in front of the command, so it acts as a sed address and the substitution is applied only to the corresponding line. This generated script is piped into a second invocation of sed (-f -) that acts on file1. The overall result is that the first number on each line of file1 is replaced by the number on the same line of file2.
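For illustration (my own rendering of the intermediate stage, assuming GNU cat and sed), the generated sed program looks like this, with the leading line numbers acting as addresses:
$ cat -n file2 | sed -E 's#\t(.*)#s/[0-9]+/\1/#'
     1s/[0-9]+/987654321/
     2s/[0-9]+/567890123/
     3s/[0-9]+/098765432/
     4s/[0-9]+/890765348/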
These will both work with whatever spaces you have in your input as they don't change any of those spaces or make any assumptions about what they are:
$ paste file2 file1 | sed 's/\([^\t]*\)\t\(.\)[^[:space:]]*/\2\1/'
A987654321 1 C HIE 1 48.343 23.545 32.02 1.00 0.00 H
A567890123 3 C PHE 1 48.343 23.545 32.02 1.00 0.00 C
A098765432 4 N ALA 1 48.343 23.545 32.02 1.00 0.00 N
A890765348 5 B VAL 1 48.343 23.545 32.02 1.00 0.00 B
or if you prefer an awk solution:
$ awk 'NR==FNR{a[NR]=$1;next} {print substr($0,1,1) a[FNR] substr($0,length($1)+1)}' file2 file1
A987654321 1 C HIE 1 48.343 23.545 32.02 1.00 0.00 H
A567890123 3 C PHE 1 48.343 23.545 32.02 1.00 0.00 C
A098765432 4 N ALA 1 48.343 23.545 32.02 1.00 0.00 N
A890765348 5 B VAL 1 48.343 23.545 32.02 1.00 0.00 B
The problem you were having is that any time you modify a field (e.g. $1) awk reconstructs the record which, with the default FS and OFS, replaces all contiguous chains of white space with a single blank char. If you modify the record ($0) instead of any specific field that doesn't happen.
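A quick illustration of that rebuild behavior (my own example, any POSIX awk): assigning to a field forces the record to be reassembled with OFS, while a substitution on $0 leaves the original spacing untouched:
$ printf 'A    B    C\n' | awk '{$1=$1}1'
A B C
$ printf 'A    B    C\n' | awk '{sub(/A/,"X")}1'
X    B    C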

remove space from specific column by bash

I am new to bash commands and I would really appreciate your help.
I have a file like this:
ATOM 1 N LYS P1852 10.932 0.523 -24.701 1.00 0.00
ATOM 2 HN1 LYS P1852 11.571 0.864 -25.419 1.00 0.00
ATOM 3 HN2 LYS P1852 10.431 1.305 -24.278 1.00 0.00
ATOM 4 HN3 LYS P1852 10.154 0.023 -25.132 1.00 0.00
ATOM 5 CA LYS P1852 11.556 -0.319 -23.640 1.00 0.00
and I need to remove the space at a specific position (say position 30) on every line. The output should be as follows:
ATOM 1 N LYS P1852 10.932 0.523 -24.701 1.00 0.00
ATOM 2 HN1 LYS P1852 11.571 0.864 -25.419 1.00 0.00
ATOM 3 HN2 LYS P1852 10.431 1.305 -24.278 1.00 0.00
ATOM 4 HN3 LYS P1852 10.154 0.023 -25.132 1.00 0.00
ATOM 5 CA LYS P1852 11.556 -0.319 -23.640 1.00 0.00
I have tried sed and other commands, but no solution has worked so far.
Thank you.
You can use cut:
cut --complement -c 30 input.txt
From the manual:
-c, --characters=LIST
select only these characters
--complement
complement the set of selected bytes, characters or fields
--complement is specific to GNU cut; if it is not available:
cut -c -29,31- input.txt
The commands above remove whatever character is at position 30. If you only want to remove it when it is a space:
sed -E 's/^(.{29}) /\1/' input.txt
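To write the change back to the original file, GNU sed's in-place option can be used (the .bak backup suffix is just a precaution on my part):
sed -E -i.bak 's/^(.{29}) /\1/' input.txt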

Grep not parsing the whole file

I want to use grep to pick lines not containing "WAT" in a file containing 425409 lines with a file size of 26.8 MB, UTF8 encoding.
The file looks like this
>ATOM 1 N ALA 1 9.979 -15.619 28.204 1.00 0.00
>ATOM 2 H1 ALA 1 9.594 -15.053 28.938 1.00 0.00
>ATOM 3 H2 ALA 1 9.558 -15.358 27.323 1.00 0.00
>ATOM 12 O ALA 1 7.428 -16.246 28.335 1.00 0.00
>ATOM 13 N HID 2 7.563 -18.429 28.562 1.00 0.00
>ATOM 14 H HID 2 6.557 -18.369 28.638 1.00 0.00
>ATOM 15 CA HID 2 8.082 -19.800 28.535 1.00 0.00
>ATOM 24 HE1 HID 2 8.603 -23.670 33.041 1.00 0.00
>ATOM 25 NE2 HID 2 8.012 -23.749 30.962 1.00 0.00
>ATOM 29 O HID 2 5.854 -20.687 28.537 1.00 0.00
>ATOM 30 N GLN 3 7.209 -21.407 26.887 1.00 0.00
>ATOM 31 H GLN 3 8.168 -21.419 26.566 1.00 0.00
>ATOM 32 CA GLN 3 6.271 -22.274 26.157 1.00 0.00
**16443 lines**
>ATOM 16425 C116 PA 1089 -34.635 6.968 -0.185 1.00 0.00
>ATOM 16426 H16R PA 1089 -35.669 7.267 -0.368 1.00 0.00
>ATOM 16427 H16S PA 1089 -34.579 5.878 -0.218 1.00 0.00
>ATOM 16428 H16T PA 1089 -34.016 7.366 -0.990 1.00 0.00
>ATOM 16429 C115 PA 1089 -34.144 7.493 1.177 1.00 0.00
>ATOM 16430 H15R PA 1089 -33.101 7.198 1.305 1.00 0.00
>ATOM 16431 H15S PA 1089 -34.179 8.585 1.197 1.00 0.00
>ATOM 16432 C114 PA 1089 -34.971 6.910 2.342 1.00 0.00
>ATOM 16433 H14R PA 1089 -35.147 5.847 2.166 1.00 0.00
**132284 lines**
>ATOM 60981 O WAT 7952 -46.056 -5.515 -56.245 1.00 0.00
>ATOM 60982 H1 WAT 7952 -45.185 -5.238 -56.602 1.00 0.00
>ATOM 60983 H2 WAT 7952 -46.081 -6.445 -56.561 1.00 0.00
>TER
>ATOM 60984 O WAT 7953 -51.005 -3.205 -46.712 1.00 0.00
>ATOM 60985 H1 WAT 7953 -51.172 -3.159 -47.682 1.00 0.00
>ATOM 60986 H2 WAT 7953 -51.051 -4.177 -46.579 1.00 0.00
>TER
>ATOM 60987 O WAT 7954 -49.804 -0.759 -49.284 1.00 0.00
>ATOM 60988 H1 WAT 7954 -48.962 -0.677 -49.785 1.00 0.00
>ATOM 60989 H2 WAT 7954 -49.868 0.138 -48.903 1.00 0.00
**many lines until the end**
>TER
>END
I have used grep -v 'WAT' file.txt, but it only returned the first 16179 lines not containing "WAT", and I can see that there are more lines not containing "WAT". For instance, the following line (and many others) does not appear in the output:
> ATOM 16425 C116 PA 1089 -34.635 6.968 -0.185 1.00 0.00
To try to figure out what was happening, I tried grep ' ' file.txt. This command should return every line in the file, but it also returned only the first 16179 lines.
I also tried tail -408977 file.txt | grep ' ' and it returned all of the lines that tail produced. Then I tried tail -408978 file.txt | grep ' ' and the output was completely empty: zero lines.
I am working on a "normal" 64 bit system, Kubuntu.
Thanks a lot for the help!
When I try it, I get:
$: grep WAT file.txt
Binary file file.txt matches
grep is treating it as a binary file; add the -a option:
-a, --text equivalent to --binary-files=text
$: grep -a WAT file.txt|head -3
ATOM 29305 O WAT 4060 -75.787 -79.125 25.925 1.00 0.00 O
ATOM 29306 H1 WAT 4060 -76.191 -78.230 25.936 1.00 0.00 H
ATOM 29307 H2 WAT 4060 -76.556 -79.670 25.684 1.00 0.00 H
Your file has two NUL bytes at the end of each of lines 16426, 16428, 16430, and 16432.
$: tr '\0' '#' <file.txt | grep -n '#'
16426:ATOM 16421 KA CAL 1085 -20.614 -22.960 18.641 1.00 0.00 ##
16428:ATOM 16422 KA CAL 1086 20.249 21.546 19.443 1.00 0.00 ##
16430:ATOM 16423 KA CAL 1087 22.695 -19.700 19.624 1.00 0.00 ##
16432:ATOM 16424 KA CAL 1088 -22.147 19.317 17.966 1.00 0.00 ##
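If you simply want the stray NUL bytes gone so that ordinary text tools behave normally again, one option (a suggestion of mine, not part of the original answer) is to strip them into a cleaned copy first:
tr -d '\0' < file.txt > file_clean.txt
grep -v 'WAT' file_clean.txt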
