How to replace multiple columns with others using bash?

I have a text file that contains data arranged in columns, and I need to replace some columns with others; specifically, the xyz coordinates. What I'm looking for is described in the image below (replace the red rectangle, number 1, with the green rectangle, number 2).
HETATM 1 C LIG 1 -0.517 1.592 -0.048 1.00 0.00 0.212 A
HETATM 2 C LIG 1 0.017 -0.536 0.534 1.00 0.00 0.149 A
HETATM 3 C LIG 1 1.133 0.155 0.029 1.00 0.00 0.212 A
HETATM 4 N LIG 1 -1.027 0.379 0.499 1.00 0.00 -0.337 N
HETATM 5 N LIG 1 0.789 1.466 -0.324 1.00 0.00 -0.219 NA
HETATM 6 C LIG 1 -2.429 0.112 0.889 1.00 0.00 0.221 C
HETATM 7 C LIG 1 -3.179 -0.453 -0.210 1.00 0.00 -0.097 C
HETATM 8 C LIG 1 -3.805 -0.925 -1.124 1.00 0.00 0.014 C
HETATM 9 N LIG 1 2.482 -0.388 -0.118 1.00 0.00 -0.095 N
HETATM 10 O LIG 1 2.619 -1.549 0.253 1.00 0.00 -0.530 OA
HETATM 11 O LIG 1 3.362 0.305 -0.578 1.00 0.00 -0.530 OA
ATOM 1 C LIG 1 -13.469 13.704 72.248 -0.37 -0.04 +0.212 75.145
ATOM 2 C LIG 1 -14.243 15.824 72.493 -0.41 -0.03 +0.149 75.145
ATOM 3 C LIG 1 -15.124 15.039 71.727 -0.40 -0.04 +0.212 75.145
ATOM 4 N LIG 1 -13.200 14.974 72.836 -0.28 +0.06 -0.337 75.145
ATOM 5 N LIG 1 -14.635 13.735 71.586 -0.32 +0.05 -0.219 75.145
ATOM 6 C LIG 1 -11.994 15.348 73.608 -0.46 -0.02 +0.221 75.145
ATOM 7 C LIG 1 -12.341 15.781 74.943 -0.66 +0.01 -0.097 75.145
ATOM 8 C LIG 1 -12.628 16.141 76.055 -0.66 -0.00 +0.014 75.145
ATOM 9 N LIG 1 -16.387 15.490 71.145 -0.60 +0.01 -0.095 75.145
ATOM 10 O LIG 1 -17.127 14.595 70.751 -0.10 +0.02 -0.530 75.145
ATOM 11 O LIG 1 -16.631 16.674 71.082 -0.58 -0.08 -0.530 75.145

Assuming that the files have the same number of lines, you could merge them with paste and then extract the columns in the desired order:
paste file1.txt file2.txt|awk '{print $1, $2, $3, $4, $5, $18, $19, $20, $9, $10, $11, $12}'

It's not clear if you are trying to align the rows contextually, but if you are literally just wanting to replace columns 6, 7, and 8 with the columns from the same row in the other file, you can just do something like:
$ cat file1
HETATM 1 C LIG 1 -0.517 1.592 -0.048 1.00 0.00 0.212 A
HETATM 2 C LIG 1 0.017 -0.536 0.534 1.00 0.00 0.149 A
HETATM 3 C LIG 1 1.133 0.155 0.029 1.00 0.00 0.212 A
HETATM 4 N LIG 1 -1.027 0.379 0.499 1.00 0.00 -0.337 N
HETATM 5 N LIG 1 0.789 1.466 -0.324 1.00 0.00 -0.219 NA
HETATM 6 C LIG 1 -2.429 0.112 0.889 1.00 0.00 0.221 C
HETATM 7 C LIG 1 -3.179 -0.453 -0.210 1.00 0.00 -0.097 C
HETATM 8 C LIG 1 -3.805 -0.925 -1.124 1.00 0.00 0.014 C
HETATM 9 N LIG 1 2.482 -0.388 -0.118 1.00 0.00 -0.095 N
HETATM 10 O LIG 1 2.619 -1.549 0.253 1.00 0.00 -0.530 OA
HETATM 11 O LIG 1 3.362 0.305 -0.578 1.00 0.00 -0.530 OA
$ cat file2
ATOM 1 C LIG 1 -13.469 13.704 72.248 -0.37 -0.04 +0.212 75.145
ATOM 2 C LIG 1 -14.243 15.824 72.493 -0.41 -0.03 +0.149 75.145
ATOM 3 C LIG 1 -15.124 15.039 71.727 -0.40 -0.04 +0.212 75.145
ATOM 4 N LIG 1 -13.200 14.974 72.836 -0.28 +0.06 -0.337 75.145
ATOM 5 N LIG 1 -14.635 13.735 71.586 -0.32 +0.05 -0.219 75.145
ATOM 6 C LIG 1 -11.994 15.348 73.608 -0.46 -0.02 +0.221 75.145
ATOM 7 C LIG 1 -12.341 15.781 74.943 -0.66 +0.01 -0.097 75.145
ATOM 8 C LIG 1 -12.628 16.141 76.055 -0.66 -0.00 +0.014 75.145
ATOM 9 N LIG 1 -16.387 15.490 71.145 -0.60 +0.01 -0.095 75.145
ATOM 10 O LIG 1 -17.127 14.595 70.751 -0.10 +0.02 -0.530 75.145
ATOM 11 O LIG 1 -16.631 16.674 71.082 -0.58 -0.08 -0.530 75.145
$ awk '{getline s < "file2"; split(s, a); $6 = a[6]; $7 = a[7]; $8 = a[8]}1' file1
HETATM 1 C LIG 1 -13.469 13.704 72.248 1.00 0.00 0.212 A
HETATM 2 C LIG 1 -14.243 15.824 72.493 1.00 0.00 0.149 A
HETATM 3 C LIG 1 -15.124 15.039 71.727 1.00 0.00 0.212 A
HETATM 4 N LIG 1 -13.200 14.974 72.836 1.00 0.00 -0.337 N
HETATM 5 N LIG 1 -14.635 13.735 71.586 1.00 0.00 -0.219 NA
HETATM 6 C LIG 1 -11.994 15.348 73.608 1.00 0.00 0.221 C
HETATM 7 C LIG 1 -12.341 15.781 74.943 1.00 0.00 -0.097 C
HETATM 8 C LIG 1 -12.628 16.141 76.055 1.00 0.00 0.014 C
HETATM 9 N LIG 1 -16.387 15.490 71.145 1.00 0.00 -0.095 N
HETATM 10 O LIG 1 -17.127 14.595 70.751 1.00 0.00 -0.530 OA
HETATM 11 O LIG 1 -16.631 16.674 71.082 1.00 0.00 -0.530 OA
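The same replacement can also be sketched without getline, by reading file2 into arrays first with the NR == FNR idiom. This is only a minimal sketch: it assumes both files list the atoms in the same order, and the two-line sample files and the scratch directory are made up for illustration.

```shell
# Scratch directory so the sample files do not clobber real data.
tmp=$(mktemp -d)

# Hypothetical two-line samples in the same layout as the question.
cat > "$tmp/file1" <<'EOF'
HETATM 1 C LIG 1 -0.517 1.592 -0.048 1.00 0.00 0.212 A
HETATM 2 C LIG 1 0.017 -0.536 0.534 1.00 0.00 0.149 A
EOF
cat > "$tmp/file2" <<'EOF'
ATOM 1 C LIG 1 -13.469 13.704 72.248 -0.37 -0.04 +0.212 75.145
ATOM 2 C LIG 1 -14.243 15.824 72.493 -0.41 -0.03 +0.149 75.145
EOF

# First pass (NR == FNR): remember x, y, z for every line of file2.
# Second pass: overwrite fields 6-8 of file1 when a matching line exists.
result=$(awk '
    NR == FNR { x[FNR] = $6; y[FNR] = $7; z[FNR] = $8; next }
    FNR in x  { $6 = x[FNR]; $7 = y[FNR]; $8 = z[FNR] }
    { print }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
```

Note that assigning to a field makes awk rebuild the line with single spaces between fields, so any fixed-width column alignment in the original file is lost.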

Related

How do I return a varying number as a variable in a string found in another file that otherwise stays constant (BASH)?

I have a file that contains text like this (only a portion of it here) and want to find the ATOM # associated with the O5' line (in this case "2"). I would then like to store this number as a variable for future use. Note that the data below is stored in another file titled "xyz.file" for example. The number of spaces between "ATOM" and the column the number of interest is found in may vary as the number of interest's value changes.
ATOM 1 HO5' G5 1 7.415 -9.123 -8.109 1.00 0.00
ATOM 2 O5' G5 1 7.997 -8.960 -8.863 1.00 0.00
ATOM 3 C5' G5 1 9.136 -9.784 -8.729 1.00 0.00
ATOM 4 H5' G5 1 9.679 -9.808 -9.673 1.00 0.00
ATOM 5 H5'' G5 1 8.814 -10.797 -8.484 1.00 0.00
ATOM 6 C4' G5 1 10.067 -9.272 -7.628 1.00 0.00
ATOM 7 H4' G5 1 10.847 -10.015 -7.448 1.00 0.00
ATOM 8 O4' G5 1 10.700 -8.053 -7.990 1.00 0.00
ATOM 9 C1' G5 1 10.866 -7.262 -6.821 1.00 0.00
ATOM 10 H1' G5 1 11.907 -6.970 -6.696 1.00 0.00
ATOM 11 N9 G5 1 10.027 -6.048 -6.896 1.00 0.00
An awk one-liner:
n=$(awk '$3 == "O5'\''" {print $2; exit}' file)
echo "$n"
prints
2
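In a script it may be worth checking that the atom was actually found before the variable is used. The following is a minimal sketch under that assumption; the three-line xyz.file and the variable name atom are made up for illustration, and the atom name is passed with -v to avoid quote escaping inside the awk program.

```shell
tmp=$(mktemp -d)

# Hypothetical sample with varying spacing between the columns.
cat > "$tmp/xyz.file" <<'EOF'
ATOM      1  HO5' G5     1       7.415  -9.123  -8.109  1.00  0.00
ATOM      2  O5'  G5     1       7.997  -8.960  -8.863  1.00  0.00
ATOM      3  C5'  G5     1       9.136  -9.784  -8.729  1.00  0.00
EOF

# Pass the atom name as an awk variable; stop at the first match.
atom="O5'"
n=$(awk -v name="$atom" '$3 == name { print $2; exit }' "$tmp/xyz.file")

if [ -n "$n" ]; then
    echo "atom number: $n"
else
    echo "atom $atom not found" >&2
fi
```

Because awk splits fields on runs of whitespace by default, the varying number of spaces after ATOM does not matter here.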

bash: cat + grep to produce several replicas merging two files

Using the Linux bash command line, I need to merge two files, inserting several copies of file 1 into specific places in file 2. File 1 looks like:
ATOM 1 N SER A 1 -2.390 4.343 -17.003 1.00 27.76 N1+
ATOM 2 CA SER A 1 -2.066 5.647 -16.370 1.00 27.12 C
ATOM 3 C SER A 1 -2.394 5.608 -14.874 1.00 26.29 C
ATOM 4 O SER A 1 -3.014 4.627 -14.405 1.00 22.93 O
ATOM 5 CB SER A 1 -2.771 6.798 -17.057 1.00 28.10 C
ATOM 6 OG SER A 1 -2.538 8.023 -16.373 1.00 32.02 O
ATOM 7 N GLY A 2 -1.982 6.655 -14.162 1.00 25.31 N
ATOM 8 CA GLY A 2 -2.172 6.779 -12.716 1.00 24.93 C
ATOM 9 C GLY A 2 -0.888 6.336 -12.067 1.00 23.66 C
ATOM 10 O GLY A 2 -0.168 5.459 -12.608 1.00 27.42 O
ATOM 11 N PHE A 3 -0.636 6.866 -10.900 1.00 22.07 N
ATOM 12 CA PHE A 3 0.622 6.595 -10.191 1.00 21.70 C
ATOM 13 C PHE A 3 0.279 6.570 -8.716 1.00 20.39 C
ATOM 14 O PHE A 3 -0.265 7.544 -8.167 1.00 23.83 O
File 2 is a multi-block file whose separate parts start with MODEL 1, MODEL 2, ..., MODEL N and are terminated by ENDMDL:
MODEL 1
REMARK VINA RESULT: -7.828 0.000 0.000
REMARK INTER + INTRA: -13.769
REMARK INTER: -10.110
REMARK INTRA: -3.659
REMARK UNBOUND: -3.196
ENDMDL
MODEL 2
REMARK VINA RESULT: -7.828 0.000 0.000
REMARK INTER + INTRA: -13.769
REMARK INTER: -10.110
REMARK INTRA: -3.659
REMARK UNBOUND: -3.196
ENDMDL
MODEL 3
REMARK VINA RESULT: -7.828 0.000 0.000
REMARK INTER + INTRA: -13.769
REMARK INTER: -10.110
REMARK INTRA: -3.659
REMARK UNBOUND: -3.196
ENDMDL
I need to copy the whole content of file 1 into file 2 just before each ENDMDL separator, thus integrating several copies of file 1 into file 2. Here is an example of the expected output:
MODEL 1
REMARK VINA RESULT: -7.828 0.000 0.000
REMARK INTER + INTRA: -13.769
REMARK INTER: -10.110
REMARK INTRA: -3.659
REMARK UNBOUND: -3.196
ATOM 1 N SER A 1 -2.390 4.343 -17.003 1.00 27.76 N1+
ATOM 2 CA SER A 1 -2.066 5.647 -16.370 1.00 27.12 C
ATOM 3 C SER A 1 -2.394 5.608 -14.874 1.00 26.29 C
ATOM 4 O SER A 1 -3.014 4.627 -14.405 1.00 22.93 O
ATOM 5 CB SER A 1 -2.771 6.798 -17.057 1.00 28.10 C
ATOM 6 OG SER A 1 -2.538 8.023 -16.373 1.00 32.02 O
ATOM 7 N GLY A 2 -1.982 6.655 -14.162 1.00 25.31 N
ATOM 8 CA GLY A 2 -2.172 6.779 -12.716 1.00 24.93 C
ATOM 9 C GLY A 2 -0.888 6.336 -12.067 1.00 23.66 C
ATOM 10 O GLY A 2 -0.168 5.459 -12.608 1.00 27.42 O
ATOM 11 N PHE A 3 -0.636 6.866 -10.900 1.00 22.07 N
ATOM 12 CA PHE A 3 0.622 6.595 -10.191 1.00 21.70 C
ATOM 13 C PHE A 3 0.279 6.570 -8.716 1.00 20.39 C
ATOM 14 O PHE A 3 -0.265 7.544 -8.167 1.00 23.83 O
ENDMDL
MODEL 2
REMARK VINA RESULT: -7.828 0.000 0.000
REMARK INTER + INTRA: -13.769
REMARK INTER: -10.110
REMARK INTRA: -3.659
REMARK UNBOUND: -3.196
ATOM 1 N SER A 1 -2.390 4.343 -17.003 1.00 27.76 N1+
ATOM 2 CA SER A 1 -2.066 5.647 -16.370 1.00 27.12 C
ATOM 3 C SER A 1 -2.394 5.608 -14.874 1.00 26.29 C
ATOM 4 O SER A 1 -3.014 4.627 -14.405 1.00 22.93 O
ATOM 5 CB SER A 1 -2.771 6.798 -17.057 1.00 28.10 C
ATOM 6 OG SER A 1 -2.538 8.023 -16.373 1.00 32.02 O
ATOM 7 N GLY A 2 -1.982 6.655 -14.162 1.00 25.31 N
ATOM 8 CA GLY A 2 -2.172 6.779 -12.716 1.00 24.93 C
ATOM 9 C GLY A 2 -0.888 6.336 -12.067 1.00 23.66 C
ATOM 10 O GLY A 2 -0.168 5.459 -12.608 1.00 27.42 O
ATOM 11 N PHE A 3 -0.636 6.866 -10.900 1.00 22.07 N
ATOM 12 CA PHE A 3 0.622 6.595 -10.191 1.00 21.70 C
ATOM 13 C PHE A 3 0.279 6.570 -8.716 1.00 20.39 C
ATOM 14 O PHE A 3 -0.265 7.544 -8.167 1.00 23.83 O
ENDMDL
MODEL 3
REMARK VINA RESULT: -7.828 0.000 0.000
REMARK INTER + INTRA: -13.769
REMARK INTER: -10.110
REMARK INTRA: -3.659
REMARK UNBOUND: -3.196
ATOM 1 N SER A 1 -2.390 4.343 -17.003 1.00 27.76 N1+
ATOM 2 CA SER A 1 -2.066 5.647 -16.370 1.00 27.12 C
ATOM 3 C SER A 1 -2.394 5.608 -14.874 1.00 26.29 C
ATOM 4 O SER A 1 -3.014 4.627 -14.405 1.00 22.93 O
ATOM 5 CB SER A 1 -2.771 6.798 -17.057 1.00 28.10 C
ATOM 6 OG SER A 1 -2.538 8.023 -16.373 1.00 32.02 O
ATOM 7 N GLY A 2 -1.982 6.655 -14.162 1.00 25.31 N
ATOM 8 CA GLY A 2 -2.172 6.779 -12.716 1.00 24.93 C
ATOM 9 C GLY A 2 -0.888 6.336 -12.067 1.00 23.66 C
ATOM 10 O GLY A 2 -0.168 5.459 -12.608 1.00 27.42 O
ATOM 11 N PHE A 3 -0.636 6.866 -10.900 1.00 22.07 N
ATOM 12 CA PHE A 3 0.622 6.595 -10.191 1.00 21.70 C
ATOM 13 C PHE A 3 0.279 6.570 -8.716 1.00 20.39 C
ATOM 14 O PHE A 3 -0.265 7.544 -8.167 1.00 23.83 O
ENDMDL
I have tried to use cat, but it just concatenated the two files without the required replication of the first file:
cat file1.pdb file2.pdb > together.pdb
Do I need to pipe this to some grep expression in order to replicate file 1 before each ENDMDL in file 2?
Here is an awk solution that doesn't call unsafe system or getline:
awk 'NR==FNR {s = s $0 ORS; next} $0 == "ENDMDL" {$0 = s $0} 1' file1 file2
If you want to pass shell variable names then use:
awk 'NR==FNR {s = s $0 ORS; next}
$0 == "ENDMDL" {$0 = s $0} 1' "$file1" "$file2"
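To see what the one-liner does, here is a small self-contained run. It is only a sketch: the two tiny input files and the scratch directory are made up for illustration.

```shell
tmp=$(mktemp -d)

# Hypothetical miniature versions of file1 (atoms) and file2 (models).
printf 'ATOM 1 N SER A 1\nATOM 2 CA SER A 1\n' > "$tmp/file1"
printf 'MODEL 1\nREMARK A\nENDMDL\nMODEL 2\nREMARK B\nENDMDL\n' > "$tmp/file2"

merged=$(awk '
    NR == FNR { s = s $0 ORS; next }   # first file: collect every line plus a newline in s
    $0 == "ENDMDL" { $0 = s $0 }       # second file: put s in front of each ENDMDL line
    1                                  # print every line of the second file
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$merged"
```

One caveat of the NR == FNR idiom: if the first file is empty, the condition is also true while the second file is read, so nothing is printed.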
Use awk.
awk '/^ENDMDL$/ {system("cat file1.pdb");}; {print}' file2.pdb
Each line from file2 is written to standard output, but when the line matches ENDMDL, the entire contents of file1 are output first.
Some alternatives:
Replace /^ENDMDL$/ with $0 == "ENDMDL"
Replace {print} with 1. (A rule with no explicit pattern runs its action for every line. A rule with no explicit action prints the current line whenever its pattern is true, and 1 is always true.)
Here's a straightforward awk solution:
awk '
BEGIN {
    FS = RS = "\a"
    getline contents < ARGV[1]   # slurp file1 in a single read
    close(ARGV[1])
    ARGV[1] = ""                 # so awk does not process file1 as input
    RS = "\n"
}
/^ENDMDL$/ { printf "%s", contents }
{ print }
' file1 file2
The script slurps the file content (the one to be inserted) into a variable then prints it each time ENDMDL appears. I'm using the BELL character as FS and RS because you won't encounter it in a PDB file.

delete rows after specific character | awk

I am writing a Bash script and need to remove all lines between the TER lines, including the TER lines themselves.
Input file:
ATOM 186 O3' U 6 7.297 6.145 -5.250 1.00 0.00 O
ATOM 187 HO3' U 6 7.342 5.410 -5.865 1.00 0.00 H
TER
ATOM 1 HO5' A 1 3.429 -7.861 3.641 1.00 0.00 H
ATOM 2 O5' A 1 4.232 -7.360 3.480 1.00 0.00 O
ATOM 3 C5' A 1 5.480 -8.064 3.350 1.00 0.00 C
ATOM 4 H5' A 1 5.429 -8.766 2.518 1.00 0.00 H
TER
Expected output:
ATOM 186 O3' U 6 7.297 6.145 -5.250 1.00 0.00 O
ATOM 187 HO3' U 6 7.342 5.410 -5.865 1.00 0.00 H
I found
sed '/TER/,$d' ${myArray[j]}.txt >> ${MyArray[j]}.txt ### ${MyArray[j]} file name through an array
But this does not work. I think awk would work in a Bash script. Any help is appreciated, thanks.
You can just use sed like this:
sed -i.bak '/^TER/,/^TER/d' "${myArray[j]}.txt"
cat "${myArray[j]}.txt"
ATOM 186 O3' U 6 7.297 6.145 -5.250 1.00 0.00 O
ATOM 187 HO3' U 6 7.342 5.410 -5.865 1.00 0.00 H
Use a sed range delete:
sed '/TER/,/TER/d'
For example:
echo "ATOM 186 O3' U 6 7.297 6.145 -5.250 1.00 0.00 O
ATOM 187 HO3' U 6 7.342 5.410 -5.865 1.00 0.00 H
TER
ATOM 1 HO5' A 1 3.429 -7.861 3.641 1.00 0.00 H
ATOM 2 O5' A 1 4.232 -7.360 3.480 1.00 0.00 O
ATOM 3 C5' A 1 5.480 -8.064 3.350 1.00 0.00 C
ATOM 4 H5' A 1 5.429 -8.766 2.518 1.00 0.00 H
TER" | sed '/TER/,/TER/d'
Output:
ATOM 186 O3' U 6 7.297 6.145 -5.250 1.00 0.00 O
ATOM 187 HO3' U 6 7.342 5.410 -5.865 1.00 0.00 H
The general form is:
sed '/Start Pattern/,/End Pattern/d'
It can be done like this:
sed '/TER/,$d' "${myArray[j]}.txt" > tmp.txt  # note: only one ">"
mv tmp.txt "${myArray[j]}.txt"
awk also provides a simple solution using a flag to control printing. Below, the skip variable is used as a flag: while it is 1 the lines are skipped, and on the transition from 1 back to 0 the script exits.
awk -v skip=0 '$1=="TER"{skip=!skip; if (!skip) exit} !skip' file
Above, $1=="TER" matches lines (records) whose first field is TER (this disambiguates between "TER" and "TERMINAL", etc.). Within the rule, skip=!skip toggles the flag: it sets skip=1 the first time "TER" is encountered and 0 on the next, at which point the script exits. The trailing !skip pattern has no action, so it prints the current line only while skip is 0.
Example Use/Output
Using your data in file, you would get:
$ awk -v skip=0 '$1=="TER"{skip=!skip; if (!skip) exit} !skip' file
ATOM 186 O3' U 6 7.297 6.145 -5.250 1.00 0.00 O
ATOM 187 HO3' U 6 7.342 5.410 -5.865 1.00 0.00 H
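The command above stops reading at a TER line, so any records after the TER block are dropped. If such records should be kept, a variant that toggles the flag without exiting could look like the following sketch; the shortened sample file, including its last line, is made up for illustration.

```shell
tmp=$(mktemp -d)

# Hypothetical sample: one TER...TER block followed by a further record.
cat > "$tmp/file" <<'EOF'
ATOM 186 O3' U 6
TER
ATOM 1 HO5' A 1
TER
ATOM 500 P G 9
EOF

# On each TER line flip the flag and drop the line itself;
# every other line is printed only while the flag is 0.
kept=$(awk '$1 == "TER" { skip = !skip; next } !skip' "$tmp/file")

printf '%s\n' "$kept"
```

With this sample, the first and the last line are kept and everything from the first TER through the second TER is removed.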

remove space from specific column by bash

I am new to bash commands and would really appreciate your help.
I have a file like this:
ATOM 1 N LYS P1852 10.932 0.523 -24.701 1.00 0.00
ATOM 2 HN1 LYS P1852 11.571 0.864 -25.419 1.00 0.00
ATOM 3 HN2 LYS P1852 10.431 1.305 -24.278 1.00 0.00
ATOM 4 HN3 LYS P1852 10.154 0.023 -25.132 1.00 0.00
ATOM 5 CA LYS P1852 11.556 -0.319 -23.640 1.00 0.00
and I need to remove the space at a specific position (position 30, let's say) on every line. The output has to be as follows:
ATOM 1 N LYS P1852 10.932 0.523 -24.701 1.00 0.00
ATOM 2 HN1 LYS P1852 11.571 0.864 -25.419 1.00 0.00
ATOM 3 HN2 LYS P1852 10.431 1.305 -24.278 1.00 0.00
ATOM 4 HN3 LYS P1852 10.154 0.023 -25.132 1.00 0.00
ATOM 5 CA LYS P1852 11.556 -0.319 -23.640 1.00 0.00
I have tried sed and other commands, but nothing has worked so far.
Thank you
You can use cut:
cut --complement -c 30 input.txt
From the manual:
-c, --characters=LIST
select only these characters
--complement
complement the set of selected bytes, characters or fields
--complement is specific to GNU cut; if it is not available:
cut -c -29,31- input.txt
The commands above remove whatever character is at position 30. If you only want to remove it when it is a space:
sed -E 's/^(.{29}) /\1/' input.txt
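The difference between the cut and sed variants can be seen on a small sample. This is a sketch only: the two input lines are built with printf so that character 30 is a space on the first line and an X on the second, and the contents are made up for illustration.

```shell
tmp=$(mktemp -d)

# Pad the first part to 29 characters so that character 30 is a space.
printf '%-29s %s\n' "ATOM 1 N LYS P1852" "10.932" > "$tmp/input.txt"
# Second line: character 30 is "X", not a space.
printf '%-29sX%s\n' "ATOM 2 HN1 LYS P1852" "11.571" >> "$tmp/input.txt"

# cut drops character 30 whatever it is.
by_cut=$(cut -c 1-29,31- "$tmp/input.txt")
# sed drops character 30 only when it is a space.
by_sed=$(sed -E 's/^(.{29}) /\1/' "$tmp/input.txt")

printf '%s\n' "$by_cut"
printf '%s\n' "$by_sed"
```

Both commands give the same first line, but only cut removes the X from the second one.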

Grep not parsing the whole file

I want to use grep to pick lines not containing "WAT" in a file containing 425409 lines with a file size of 26.8 MB, UTF8 encoding.
The file looks like this
>ATOM 1 N ALA 1 9.979 -15.619 28.204 1.00 0.00
>ATOM 2 H1 ALA 1 9.594 -15.053 28.938 1.00 0.00
>ATOM 3 H2 ALA 1 9.558 -15.358 27.323 1.00 0.00
>ATOM 12 O ALA 1 7.428 -16.246 28.335 1.00 0.00
>ATOM 13 N HID 2 7.563 -18.429 28.562 1.00 0.00
>ATOM 14 H HID 2 6.557 -18.369 28.638 1.00 0.00
>ATOM 15 CA HID 2 8.082 -19.800 28.535 1.00 0.00
>ATOM 24 HE1 HID 2 8.603 -23.670 33.041 1.00 0.00
>ATOM 25 NE2 HID 2 8.012 -23.749 30.962 1.00 0.00
>ATOM 29 O HID 2 5.854 -20.687 28.537 1.00 0.00
>ATOM 30 N GLN 3 7.209 -21.407 26.887 1.00 0.00
>ATOM 31 H GLN 3 8.168 -21.419 26.566 1.00 0.00
>ATOM 32 CA GLN 3 6.271 -22.274 26.157 1.00 0.00
**16443 lines**
>ATOM 16425 C116 PA 1089 -34.635 6.968 -0.185 1.00 0.00
>ATOM 16426 H16R PA 1089 -35.669 7.267 -0.368 1.00 0.00
>ATOM 16427 H16S PA 1089 -34.579 5.878 -0.218 1.00 0.00
>ATOM 16428 H16T PA 1089 -34.016 7.366 -0.990 1.00 0.00
>ATOM 16429 C115 PA 1089 -34.144 7.493 1.177 1.00 0.00
>ATOM 16430 H15R PA 1089 -33.101 7.198 1.305 1.00 0.00
>ATOM 16431 H15S PA 1089 -34.179 8.585 1.197 1.00 0.00
>ATOM 16432 C114 PA 1089 -34.971 6.910 2.342 1.00 0.00
>ATOM 16433 H14R PA 1089 -35.147 5.847 2.166 1.00 0.00
**132284 lines**
>ATOM 60981 O WAT 7952 -46.056 -5.515 -56.245 1.00 0.00
>ATOM 60982 H1 WAT 7952 -45.185 -5.238 -56.602 1.00 0.00
>ATOM 60983 H2 WAT 7952 -46.081 -6.445 -56.561 1.00 0.00
>TER
>ATOM 60984 O WAT 7953 -51.005 -3.205 -46.712 1.00 0.00
>ATOM 60985 H1 WAT 7953 -51.172 -3.159 -47.682 1.00 0.00
>ATOM 60986 H2 WAT 7953 -51.051 -4.177 -46.579 1.00 0.00
>TER
>ATOM 60987 O WAT 7954 -49.804 -0.759 -49.284 1.00 0.00
>ATOM 60988 H1 WAT 7954 -48.962 -0.677 -49.785 1.00 0.00
>ATOM 60989 H2 WAT 7954 -49.868 0.138 -48.903 1.00 0.00
**many lines until the end**
>TER
>END
I have used grep -v 'WAT' file.txt but it only returned me the first 16179 lines not containing "WAT" and I can see that there are more lines not containing "WAT". For instance, the following line (and many others) does not appear in the output:
> ATOM 16425 C116 PA 1089 -34.635 6.968 -0.185 1.00 0.00
To try to figure out what was happening, I ran grep ' ' file.txt. This command should return every line in the file, but it also returned only the first 16179 lines.
I've also tried to use tail -408977 file.txt | grep ' ' and it returned me all lines recalled by tail. Then I've tried tail -408978 file.txt | grep ' ' and the output was totally empty, zero lines.
I am working on a "normal" 64 bit system, Kubuntu.
Thanks a lot for the help!
When I try I get
$: grep WAT file.txt
Binary file file.txt matches
grep is assuming it's a binary file. Add -a:
-a, --text equivalent to --binary-files=text
$: grep -a WAT file.txt|head -3
ATOM 29305 O WAT 4060 -75.787 -79.125 25.925 1.00 0.00 O
ATOM 29306 H1 WAT 4060 -76.191 -78.230 25.936 1.00 0.00 H
ATOM 29307 H2 WAT 4060 -76.556 -79.670 25.684 1.00 0.00 H
Your file has two NUL bytes at the end of each of lines 16426, 16428, 16430, and 16432.
$: tr '\0' '#' < file.txt | grep -n '#'
16426:ATOM 16421 KA CAL 1085 -20.614 -22.960 18.641 1.00 0.00 ##
16428:ATOM 16422 KA CAL 1086 20.249 21.546 19.443 1.00 0.00 ##
16430:ATOM 16423 KA CAL 1087 22.695 -19.700 19.624 1.00 0.00 ##
16432:ATOM 16424 KA CAL 1088 -22.147 19.317 17.966 1.00 0.00 ##
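Instead of telling grep to treat the file as text, the NUL bytes can also be stripped before filtering. The following is a minimal sketch of that idea; the four-line sample file with two NUL bytes on its second line is made up for illustration.

```shell
tmp=$(mktemp -d)

# Hypothetical sample: the second line ends with two NUL bytes.
printf 'ATOM 1 N ALA 1\nATOM 2 KA CAL 1085\000\000\nATOM 3 O WAT 7952\nATOM 4 C PA 1089\n' > "$tmp/file.txt"

# Delete every NUL byte, then filter out the WAT lines as usual.
clean=$(tr -d '\000' < "$tmp/file.txt" | grep -v 'WAT')

printf '%s\n' "$clean"
```

Redirecting the output of tr to a new file gives a cleaned copy that other tools can read without the binary-file heuristic getting in the way.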
