I have seen this question a few times, but I cannot get the solutions to work.
I have the following command:
printf '%s\n' "${fa[@]}" | xargs -n 3 bash -c 'cat *-$2.ss | sed -n 11,1p ; echo $0 $1 $2;'
where
printf '%s\n' "${fa[@]}"
gives:
O00238 115 03
O00238 126 04
and cat *-$2.ss gives:
1 D C 0.999 0.000 0.000
2 L C 0.940 0.034 0.012
3 H C 0.971 0.005 0.015
4 P C 0.977 0.005 0.009
5 T C 0.970 0.009 0.018
6 L C 0.977 0.006 0.011
7 P C 0.864 0.027 0.014
8 P C 0.966 0.018 0.011
9 L C 0.920 0.038 0.039
10 K C 0.924 0.043 0.039
11 D C 0.935 0.036 0.035
12 R C 0.934 0.023 0.053
13 D C 0.932 0.022 0.046
14 F C 0.878 0.041 0.088
15 V C 0.805 0.031 0.198
16 D C 0.834 0.039 0.108
17 G C 0.882 0.019 0.071
18 P C 0.800 0.031 0.132
19 I C 0.893 0.039 0.070
20 H C 0.823 0.024 0.179
21 H C 0.920 0.026 0.070
22 R C 0.996 0.001 0.002
running the command then produces
11 D C 0.935 0.036 0.035
O00238 115 03
11 K C 0.449 0.252 0.270
O00238 126 04
Odd lines are the output of sed -n 11,1p, even lines the output of echo $0 $1 $2.
How do I put each pair of outputs on the same line, i.e.:
11 D C 0.935 0.036 0.035 O00238 115 03
11 K C 0.449 0.252 0.270 O00238 126 04
I have tried:
printf '%s\n' "${fa[@]}" | xargs -n 3 bash -c 'cat *-$2.ss | {sed -n 11,1p ; echo $0 $1 $2;} | tr "\n" " "'
as suggested here: Concatenate in bash the output of two commands without newline character
however I get
O00238: -c: line 0: syntax error near unexpected token `}'
O00238: -c: line 0: `cat *-$2.ss | {sed -n 11,1p ; echo $0 $1 $2;} | tr "\n" " "'
What is the problem?
You could try using something like this:
i=0
for f in *-"$2".ss; do printf '%s %s\n' "$(sed -n '11p' "$f")" "${fa[$((i++))]}"; done
This loops through your files and prints the 11th line of each alongside an element of the array fa, whose index i increases by 1 on every iteration.
I could not reproduce your setup, but
printf "O00238 115 03\nO00238 126 04" | xargs -n 3 bash -c 'cat test.dat | sed -n 11,1p | tr -d "\n"; echo " $0 $1 $2"'
gives
11 D C 0.935 0.036 0.035 O00238 115 03
11 D C 0.935 0.036 0.035 O00238 126 04
This should work in your case: tr -d "\n" simply deletes the trailing newline from the sed output, so the echo output lands on the same line.
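As for the syntax error itself: in bash, { and } are reserved words, so { must be followed by whitespace and the last command in the group must be terminated by ; (or a newline) before }. Written as {sed, the shell sees one ordinary word, {sed, and the closing } then appears unexpectedly. Below is a minimal sketch of the corrected brace-group version, run in a scratch directory against a hypothetical file demo-03.ss holding the first 11 lines from the question. It uses paste -sd " " - instead of tr "\n" " " to join the two output lines, because tr would also turn the final newline into a space and leave the next pair on the same line.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: a scratch directory and a hypothetical file
# demo-03.ss stand in for the real *-03.ss file from the question.
set -eu
cd "$(mktemp -d)"
cat > demo-03.ss <<'EOF'
1 D C 0.999 0.000 0.000
2 L C 0.940 0.034 0.012
3 H C 0.971 0.005 0.015
4 P C 0.977 0.005 0.009
5 T C 0.970 0.009 0.018
6 L C 0.977 0.006 0.011
7 P C 0.864 0.027 0.014
8 P C 0.966 0.018 0.011
9 L C 0.920 0.038 0.039
10 K C 0.924 0.043 0.039
11 D C 0.935 0.036 0.035
EOF

fa=(O00238 115 03)

# Inside the single-quoted script, "{ " needs a space after it and the last
# command needs a ";" before "}". paste -sd " " - then joins the group's two
# output lines (sed's line 11 and echo's arguments) into a single line.
result=$(printf '%s\n' "${fa[@]}" |
  xargs -n 3 bash -c 'cat *-"$2".ss | { sed -n 11p; echo "$0 $1 $2"; } | paste -sd " " -')
echo "$result"
```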
I have two text files (TSV format), each with 240 columns and 100 lines. I would like to interleave their columns alternately into one file (480 columns and 100 lines). How can I achieve this with standard command-line tools in Linux?
Example (for a single line):
FileA:
1 2 3 4 5 ...
FileB:
001 002 003 004 005 ...
Expected Result:
1 001 2 002 3 003 ...
Just awk with getline:
==> file1 <==
a b c d e f g h i j k l m
n o p q r s t u v w x y z
==> file2 <==
1 2 3 4 5 6 7 8 9 10 11 12 13
14 15 16 17 18 19 20 21 22 23 24 25 26
$ awk '{split($0,f1);
getline < "file2";
for(i=1;i<=NF;i++) printf "%s%s%s%s", f1[i], OFS, $i, (i==NF?ORS:OFS)}' file1
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13
n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26
If space is not the required output delimiter, set OFS accordingly.
P.S. Using getline is normally discouraged in any non-trivial script and should usually be avoided by beginners.
paste + awk solution:
Sample file1:
a b c d e f g h i j k l m n o p q r s t u v w x y z
a b c d e f g h i j k l m n o p q r s t u v w x y z
Sample file2:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
paste file1 file2 \
| awk '{ len=NF/2;
for (i=1; i<=len; i++)
printf "%s %s%s", $i, $(i+len),(i==len? ORS:OFS)
}'
The output:
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26
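Since the question mentions TSV files, here is a minimal sketch of the same paste + awk approach with a tab as both input and output separator, assuming both files have the same number of columns on every line. The tiny FileA/FileB contents are made up for the demo.

```shell
#!/usr/bin/env bash
set -eu
tmp=$(mktemp -d)
# Hypothetical three-column TSV inputs standing in for the 240-column files.
printf '1\t2\t3\n4\t5\t6\n' > "$tmp/FileA"
printf '001\t002\t003\n004\t005\t006\n' > "$tmp/FileB"

# paste glues line N of FileA to line N of FileB with a tab; awk then pairs
# field i (from FileA) with field i + n (from FileB), where n = NF / 2.
interleaved=$(paste "$tmp/FileA" "$tmp/FileB" |
  awk -F'\t' -v OFS='\t' '{
    n = NF / 2
    for (i = 1; i <= n; i++)
      printf "%s%s%s%s", $i, OFS, $(i + n), (i == n ? ORS : OFS)
  }')
printf '%s\n' "$interleaved"
```

Splitting on a single tab (rather than awk's default whitespace splitting) keeps empty TSV fields in place, so the pairing does not drift when a cell is blank.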
Use bash to make some dummy files that match the spec, with a per-line letter suffix on every value so the lines can be told apart:
for f in {A..z} {A..j} ; do echo $( seq -f '%g'"$f" 240 ) ; done > FileA
for f in {z..A} {j..A} ; do echo $( seq -f '%03.3g'"$f" 240 ) ; done > FileB
Use bash, paste and xargs:
paste -d' ' <(tr ' ' '\n' < FileA) <(tr ' ' '\n' < FileB) | xargs -L 240 echo
Since the output of that is a bit unwieldy, show the first ten lines, with only the first six and the last five columns:
paste -d' ' <(tr ' ' '\n' < FileA) <(tr ' ' '\n' < FileB) | xargs -L 240 echo |
head | cut -d' ' -f1-6,476-480
1A 001z 2A 002z 3A 003z 238z 239A 239z 240A 240z
1B 001y 2B 002y 3B 003y 238y 239B 239y 240B 240y
1C 001x 2C 002x 3C 003x 238x 239C 239x 240C 240x
1D 001w 2D 002w 3D 003w 238w 239D 239w 240D 240w
1E 001v 2E 002v 3E 003v 238v 239E 239v 240E 240v
1F 001u 2F 002u 3F 003u 238u 239F 239u 240F 240u
1G 001t 2G 002t 3G 003t 238t 239G 239t 240G 240t
1H 001s 2H 002s 3H 003s 238s 239H 239s 240H 240s
1I 001r 2I 002r 3I 003r 238r 239I 239r 240I 240r
1J 001q 2J 002q 3J 003q 238q 239J 239q 240J 240q
I have several columns in a file and I want to subtract two of them.
The values look like this, and I want the difference computed as if the decimal points were not there:
1.000 900
1.012 1.010
1.015 1.005
1.020 1.010
I need another column in the same file with the result of the subtraction:
100
2
10
10
I have tried
awk -F "," '{$16=$4-$2; print $1","$2","$3","$4","$5","$6}'
but it gives me...
0.100
0.002
0.010
0.010
Any suggestions?
Using this awk:
awk -v OFS='\t' '{p=$1;q=$2;sub(/\./, "", p); sub(/\./, "", q); print $0, (p-q)}' file
1.000 900 100
1.012 1.010 2
1.015 1.005 10
1.020 1.010 10
Using perl:
perl -lanE '$,="\t",($x,$y)=map{s/\.//r}@F;say@F,$x-$y' file
prints:
1.000 900 100
1.012 1.010 2
1.015 1.005 10
1.020 1.010 10
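The question asks for the new column in the same file. Standard awk has no in-place option, so a common pattern is to write to a temporary file and move it over the original afterwards. A minimal sketch, assuming whitespace-separated columns as displayed above (for comma-separated data, add -F',' -v OFS=','); the file path is a placeholder:

```shell
#!/usr/bin/env bash
set -eu
tmp=$(mktemp -d)
file="$tmp/file"   # hypothetical path standing in for the real data file
printf '1.000 900\n1.012 1.010\n1.015 1.005\n1.020 1.010\n' > "$file"

# Work on copies (p, q) so the original fields keep their decimal points,
# strip the "." from the copies, subtract, and append the difference.
awk '{ p = $1; q = $2; gsub(/\./, "", p); gsub(/\./, "", q); print $0, p - q }' \
  "$file" > "$file.new"
# Replace the original only after awk has finished writing the new version.
mv "$file.new" "$file"
cat "$file"
```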
I have a script that writes numbers stored in a file (one per line, in a single column) as a row to another TXT file. The part of the script that does this is:
awk -F"\n" 'NR==1{a=$1" ";next}{a=a$1" "}END{print a}' col_trim.txt >> row.txt
the output is something like this:
1.31 2.3 3.35 2.59 1.63
2.03 2.21 1.99 1.5 1.12
1 0.6 -0.71 -2.1 0.01
But I want it to be like this:
1.31 2.30 3.35 2.59 1.63
2.03 2.21 1.99 1.50 1.12
1.00 0.60 -0.71 -2.10 0.01
As you can see, all numbers in the second sample have two digits after the decimal point, and negative numbers have the minus sign placed in front of the number without breaking the alignment of the columns.
Any idea?
P.S.:
The input file for each row is a text file with a single column of numbers:
1.31
2.3
3.35
2.59
1.63
The whole code is like this:
#!/bin/bash
rm *.txt
for time in 00 03 06 09 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96; do
filename=gfs.t00z.master.grbf$time.10m.uv.grib2
wgrib2 $filename -spread $time.txt
sed 's:lon,lat,[A-Z]* 10 m above ground d=\([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\).*:\1 '$time'0000:' $time.txt > temp.txt
for (( j = 1; j <= 2; j++ )); do
if [ "$j" -eq 1 ]; then
sed -n '/lon,lat,UGRD/,/lon,lat,VGRD/p' $time.txt > vel_sep.txt
else
sed -n '/lon,lat,VGRD/,/359.500000,90.000000/p' $time.txt > vel_sep.txt
fi
line=174305
sed -n 1p temp.txt >> row.txt
for (( i = 1; i <= 48; i++ )); do
sed -n "$line","$(($line+93))"p vel_sep.txt > col.txt
sed 's:[0-9]*.[0-9]*,[0-9]*.[0-9]*,::' col.txt > col_trim.txt
awk -F"\n" 'NR==1{a=$1" ";next}{a=a$1" "}END{print a}' col_trim.txt >> row.txt
line=$(($line-720))
done
done
done
exit 0
Replace your awk with this:
awk -F"\n" 'NR==1{a=sprintf("%10.2f", $1); next}
{a=sprintf("%s%10.2f", a,$1);}END{print a}' col_trim.txt >> row.txt
EDIT: For left alignment:
awk -F"\n" 'NR==1{a=sprintf("%-8.2f", $1); next}
{a=sprintf("%s%-8.2f", a,$1);}END{print a}' col_trim.txt >> row.txt
You can use the column command:
awk -F"\n" 'NR==1{a=$1" ";next}{a=a$1" "}END{print a}' col_trim.txt | \
column -t >> row.txt
This gives:
1.31 2.3 3.35 2.59 1.63
2.03 2.21 1.99 1.5 1.12
1 0.6 -0.71 -2.1 0.01
This can be solved using printf in awk.
Example:
echo -e "1 -2.5 10\n-3.4 2 12" | awk '{printf "%8.2f %8.2f %8.2f\n",$1,$2,$3}'
1.00 -2.50 10.00
-3.40 2.00 12.00
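Applied to the question's setup, the same printf idea can replace the string-building awk entirely: print each value as it is read and end the row with a newline once the file is done. A minimal sketch, assuming a field width of 8 characters (adjust %8.2f to taste) and using the third sample row as a hypothetical col_trim.txt:

```shell
#!/usr/bin/env bash
set -eu
export LC_ALL=C   # make sure "." is used as the decimal separator
tmp=$(mktemp -d)
# Hypothetical col_trim.txt: one number per line, as in the question.
printf '%s\n' 1 0.6 -0.71 -2.1 0.01 > "$tmp/col_trim.txt"

# %8.2f right-aligns each value in 8 characters with exactly two decimals,
# so minus signs sit directly in front of the digits without shifting columns.
row=$(awk '{ printf "%8.2f", $1 } END { print "" }' "$tmp/col_trim.txt")
printf '%s\n' "$row"
```

In the script above, the awk line inside the inner loop would then read: awk '{ printf "%8.2f", $1 } END { print "" }' col_trim.txt >> row.txt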
Additionally, there is a lot of room to improve this script.
Here is the first improvement.
Change from:
for time in 00 03 06 09 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96; do
to
for time in $(seq 0 3 96); do
time=$(printf "%02d" $time)
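If the seq on your system supports -w (GNU and BSD seq both do), it can produce the zero-padded values directly, making the separate printf step unnecessary. A small sketch that collects the generated values so they can be inspected:

```shell
#!/usr/bin/env bash
set -eu
# seq -w pads every number with leading zeros to the width of the widest
# one, so 0 3 ... 96 comes out as 00 03 ... 96.
times=()
for time in $(seq -w 0 3 96); do
  times+=("$time")
done
echo "${times[@]}"
```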
If you can show us sample output of wgrib2 $filename -spread $time.txt, we can give more suggestions.
Using bash/sed, I am trying to search for a matching key and, when a match is found, append the corresponding value to the end of the matching line.
Two lists:
[linuxbox tmp]$ cat lista
a 23
c 4
e 55
b 2
f 44
d 74
[linuxbox tmp]$ cat listb
a 3
e 34
c 84
b 1
f 500
d 666666
#!/bin/bash
rm -rf listc
cat listb |while read rec
do
var1="$(echo $rec | awk '{ print $1 }')"
var2="$(echo $rec | awk '{ print $2 }')"
if egrep "^$var1" lista; then
sed "/^$var1/ s/$/ $var2/1" lista >> listc
fi
done
when I run it I get:
[linuxbox tmp]$ ./blah.sh
a 23
e 55
c 4
b 2
f 44
d 74
[linuxbox tmp]$ cat listc
a 23 3
c 4
e 55
b 2
f 44
d 74
a 23
c 4
e 55 34
b 2
f 44
d 74
a 23
c 4 84
e 55
b 2
f 44
d 74
a 23
c 4
e 55
b 2 1
f 44
d 74
a 23
c 4
e 55
b 2
f 44 500
d 74
a 23
c 4
e 55
b 2
f 44
d 74 666666
The output I'm trying to get is:
a 23 3
e 55 34
c 4 84
b 2 1
f 44 500
d 74 666666
What am I doing wrong here? Is there a better way to accomplish this?
Thank you in advance.
If you don't mind getting a sorted output:
join <(sort lista) <(sort listb)
One way using awk:
awk 'FNR==NR { array[$1]=$2; next } { if ($1 in array) print $1, array[$1], $2 }' lista listb
Results:
a 23 3
e 55 34
c 4 84
b 2 1
f 44 500
d 74 666666
Based on your input files (no duplicate keys in a single file), the following will do the trick:
for key in $(awk '{print $1}' lista) ; do
echo $key $(awk -vK=$key '$1==K{$1="";print}' lista listb)
done
a 23 3
c 4 84
e 55 34
b 2 1
f 44 500
d 74 666666
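As for what goes wrong in the original script: sed prints every input line unless told otherwise, so each pass of the while loop appends a complete copy of lista to listc, with only the matching line modified. Adding -n to sed and the p flag to the substitution prints only the line that was changed. A minimal sketch of the corrected loop, which also lets read split each record into key and value instead of calling awk twice, and anchors the key with a trailing space so that key "a" cannot match a line starting with "ab". It assumes keys contain no regex metacharacters or slashes.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
printf '%s\n' 'a 23' 'c 4' 'e 55' 'b 2' 'f 44' 'd 74' > lista
printf '%s\n' 'a 3' 'e 34' 'c 84' 'b 1' 'f 500' 'd 666666' > listb

# For each "key value" record in listb, print only the lista line that starts
# with "key " after appending the value (-n suppresses all other lines).
while read -r key val; do
  sed -n "/^$key /s/\$/ $val/p" lista
done < listb > listc
cat listc
```

This still runs sed once per line of listb and rereads lista each time, so for large files the single-pass awk answer above scales better.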
I have two sets of PDB files (this is a standard format which cannot be modified). First set is like:
ATOM 18 C33 Q58 d 91 -25.677 3.886 -30.044 1.00 0.00 C
ATOM 19 C34 Q58 d 91 -24.704 4.881 -29.447 1.00 0.00 C
ATOM 20 C35 Q58 d 91 -23.382 4.873 -30.182 1.00 0.00 C
ATOM 21 C8 Q58 d 91 -20.295 11.484 -33.616 1.00 0.00 C
ATOM 22 C7 Q58 d 91 -19.198 12.305 -33.381 1.00 0.00 C
ATOM 23 C3 Q58 d 91 -18.213 12.498 -34.383 1.00 0.00 C
And the second one goes:
HETATM 2686 C7 589 A 1 -19.344 12.177 -33.319 1.00 25.88 C
HETATM 2687 C8 589 A 1 -20.388 11.319 -33.511 1.00 26.31 C
HETATM 2688 C9 589 A 1 -20.364 10.691 -34.747 1.00 26.14 C
HETATM 2689 C10 589 A 1 -19.402 10.845 -35.729 1.00 26.34 C
HETATM 2690 N11 589 A 1 -21.334 11.123 -32.604 1.00 26.22 N
HETATM 2691 C12 589 A 1 -21.713 9.967 -32.081 1.00 25.65 C
Each column is separated by a variable number of spaces so that its contents occupy a specific positional range.
Columns 7-9 represent x,y,z coordinates in the Cartesian space. I would like to replace the coordinates of file 2 with coordinates from file 1 for all column 3 (atom type) matches.
For instance, in the example, the output file 2 would be:
HETATM 2686 C7 589 A 1 -19.198 12.305 -33.381 1.00 25.88 C
HETATM 2687 C8 589 A 1 -20.295 11.484 -33.616 1.00 26.31 C
HETATM 2688 C9 589 A 1 -20.364 10.691 -34.747 1.00 26.14 C
HETATM 2689 C10 589 A 1 -19.402 10.845 -35.729 1.00 26.34 C
HETATM 2690 N11 589 A 1 -21.334 11.123 -32.604 1.00 26.22 N
HETATM 2691 C12 589 A 1 -21.713 9.967 -32.081 1.00 25.65 C
Please, note how the coordinates have changed for the first two lines (atoms C7 and C8).
I have tried awk, but it seems too delimiter-dependent, which is not good in this example. Column 3 (atom type) is always at positions 14-16, whereas the 3 coordinate columns span from 32 to 54.
NOTE: In certain cases, some columns may be merged. For instance, in this example columns 5 and 6 are merged (this can also happen with columns 1 and 2):
HETATM 2804 PG ANP A1001 23.808 17.953 28.350 1.00 52.23 P
My solution so far (slow, but it works):
while read line ; do
atom=$(echo "$line" | cut -c13-16)
coord=$(grep -i "$atom" ${ligand}_${chain}_dock.tmp | cut -c32-54)
echo "$line" | sed -r "s/^(.{31})(.{23})/\1${coord}/" >> ${ligand}_${chain}_dock.pdb
done < ${ligand}_${chain}_ref.pdb
I probably chose a clumsy way to solve it, playing with the printf statement, but it works for your example.
command:
awk -F' *' 'NR==FNR{a[$3]=$7;b[$3]=$8;c[$3]=$9;next;}\
{if($3 in a)printf "%s %s %-3s %s %s %3s %11s %7s %7s %5s %s %11s\n",\
$1,$2,$3,$4,$5,$6,a[$3],b[$3],c[$3],$10,$11,$12; else print $0}' file1 file2
test with your example:
kent$ awk -F' *' 'NR==FNR{a[$3]=$7;b[$3]=$8;c[$3]=$9;next;}
{if($3 in a)printf "%s %s %-3s %s %s %3s %11s %7s %7s %5s %s %11s\n",
$1,$2,$3,$4,$5,$6,a[$3],b[$3],c[$3],$10,$11,$12; else print $0}' file1 file2
HETATM 2686 C7 589 A 1 -19.198 12.305 -33.381 1.00 25.88 C
HETATM 2687 C8 589 A 1 -20.295 11.484 -33.616 1.00 26.31 C
HETATM 2688 C9 589 A 1 -20.364 10.691 -34.747 1.00 26.14 C
HETATM 2689 C10 589 A 1 -19.402 10.845 -35.729 1.00 26.34 C
HETATM 2690 N11 589 A 1 -21.334 11.123 -32.604 1.00 26.22 N
HETATM 2691 C12 589 A 1 -21.713 9.967 -32.081 1.00 25.65 C
I took a guess at the correct field widths, but this should work if they're adjusted correctly.
#!/usr/bin/env bash
file1="$1"
file2="$2"
fw=(7 6 4 4 4 6 9 7 9 5 16 4)
while IFS= read -r -a f2_line ; do
let pos=0
f2_fields=()
for width in "${fw[@]}" ; do
f2_fields=("${f2_fields[@]}" "${f2_line:${pos}:${width}}")
let pos+=width
done
printf '%s' "${f2_fields[@]:0:6}"
orig=1
while IFS= read -r -a f1_line ; do
let pos=0
f1_fields=()
for width in "${fw[@]}" ; do
f1_fields=("${f1_fields[@]}" "${f1_line:${pos}:${width}}")
let pos+=width
done
if [ "${f1_fields[2]}" = "${f2_fields[2]}" ] ; then
orig=
printf '%s' "${f1_fields[@]:6:3}"
break
fi
done < "$file1"
if [ ! -z "$orig" ] ; then
printf '%s' "${f2_fields[@]:6:3}"
fi
printf '%s' "${f2_fields[@]:9}"
printf '\n'
done < "$file2"
It is, of course, not very efficient.
EDIT: Oops, had to s/5/6/ on line 14. Works now.
This would work too. Instead of printing each column, we set the OFS variable to "\t".
UPDATE: Added a space after the tab in the OFS variable. This gives your output enough spacing between columns.
[jaypal:~/Temp] awk -v OFS="\t " 'NR==FNR{a[$3]=$7;b[$3]=$8;c[$3]=$9;next} ($3 in a) {$7=a[$3];$8=b[$3];$9=c[$3];print $0;next} {$1=$1}1' file1 file2
HETATM 2686 C7 589 A 1 -19.198 12.305 -33.381 1.00 25.88 C
HETATM 2687 C8 589 A 1 -20.295 11.484 -33.616 1.00 26.31 C
HETATM 2688 C9 589 A 1 -20.364 10.691 -34.747 1.00 26.14 C
HETATM 2689 C10 589 A 1 -19.402 10.845 -35.729 1.00 26.34 C
HETATM 2690 N11 589 A 1 -21.334 11.123 -32.604 1.00 26.22 N
HETATM 2691 C12 589 A 1 -21.713 9.967 -32.081 1.00 25.65 C
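Both awk answers above split fields on spaces, so a merged record such as the A1001 example would shift every later field. Since PDB is a fixed-width format, one way around this is to cut by character position with awk's substr() instead. A minimal sketch, assuming the standard PDB ATOM/HETATM layout: atom name in columns 13-16 and x, y, z in columns 31-54 (the question's 32-54 is the same span minus its leading blank). The sample lines are generated by a hypothetical helper, pdb, whose printf format follows that layout so the column positions are exact; file1 and file2 are placeholder names.

```shell
#!/usr/bin/env bash
set -eu
export LC_ALL=C            # make printf use "." as the decimal separator
cd "$(mktemp -d)"

# Hypothetical helper: one PDB-style ATOM/HETATM line from its field values
# (record serial name altLoc resName chain resSeq iCode x y z occ temp element).
pdb() {
  printf '%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f          %2s\n' "$@"
}

{ pdb ATOM 21 " C8" "" Q58 d 91 "" -20.295 11.484 -33.616 1.00 0.00 C
  pdb ATOM 22 " C7" "" Q58 d 91 "" -19.198 12.305 -33.381 1.00 0.00 C
} > file1
{ pdb HETATM 2686 " C7" "" 589 A 1 "" -19.344 12.177 -33.319 1.00 25.88 C
  pdb HETATM 2687 " C8" "" 589 A 1 "" -20.388 11.319 -33.511 1.00 26.31 C
  pdb HETATM 2688 " C9" "" 589 A 1 "" -20.364 10.691 -34.747 1.00 26.14 C
} > file2

# First file: remember columns 31-54 (x, y, z) keyed by columns 13-16 (atom
# name). Second file: splice the stored coordinates into matching lines and
# leave every other character, and every non-matching line, untouched.
awk 'NR == FNR { xyz[substr($0, 13, 4)] = substr($0, 31, 24); next }
     { name = substr($0, 13, 4)
       if (name in xyz) $0 = substr($0, 1, 30) xyz[name] substr($0, 55)
       print }' file1 file2 > file2.new
cat file2.new
```

Because nothing is split on whitespace, the occupancy, temperature factor and element columns of file2 keep their exact positions, and lines whose atom name has no match in file1 (C9 here) pass through unchanged.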