Addition in awk failing - shell

I am using the following code snippet, where I pass shell variables into awk as follows:
half_buffer1=$((start_buffer/2))
half_buffer2=$((end_buffer/2))
echo $line | awk -v left="$half_buffer1" -v right="$half_buffer2" 'BEGIN {print $1"\t"$2-left"\t"$3+right"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}'
However, at times the variable 'right' appears to be subtracted from $3 instead of being added to it.

Observe that the following provides the "wrong" answers:
$ echo 1 2 3 4 5 | awk -v left=10 -v right=20 'BEGIN {print $1"\t"$2-left"\t"$3+right"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}'
-10 20
To get the right answers, remove BEGIN:
$ echo 1 2 3 4 5 | awk -v left=10 -v right=20 '{print $1"\t"$2-left"\t"$3+right"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}'
1 -8 23 4 5
The problem is that the BEGIN block is executed before any input is read. Consequently, the variables $1, $2, etc., do not yet have useful values.
If BEGIN is removed, the code is executed on each line read. This gives you the answers that you want.
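A quick way to see this: in BEGIN no record has been read yet, so NF is 0 and every field expands to the empty string, which awk treats as 0 in arithmetic:
$ echo 1 2 3 | awk 'BEGIN { print "NF in BEGIN:", NF } { print "NF in main:", NF }'
NF in BEGIN: 0
NF in main: 3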
Examples
Using real input lines from the comments:
$ echo ID1 14389398 14389507 109 + ABC 608 831 | awk -v left=10 -v right=20 '{print $1"\t"$2-left"\t"$3+right"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}'
ID1 14389388 14389527 109 + ABC 608 831
$ echo ID1 14390340 14390409 69 + ABC 831 32 - | awk -v left=10 -v right=20 '{print $1"\t"$2-left"\t"$3+right"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}'
ID1 14390330 14390429 69 + ABC 831 32
Also, this shell script:
start_buffer=10
end_buffer=100
half_buffer1=$((start_buffer/2))
half_buffer2=$((end_buffer/2))
echo ID1 14390340 14390409 69 + ABC 831 32 - | awk -v left="$half_buffer1" -v right="$half_buffer2" '{print $1"\t"$2-left"\t"$3+right"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}'
produces this output:
ID1 14390335 14390459 69 + ABC 831 32
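One more thing to keep in mind (not the cause of the problem above): $(( )) performs integer division, so an odd buffer size is silently truncated. A quick check:
start_buffer=11
echo $((start_buffer/2))   # prints 5, not 5.5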

Related

shell script for extracting line of file using awk

I want the selected lines of the file to be printed in the output file side by side, separated by spaces. Here is what I have done so far:
for file in SAC*
do
awk 'FNR==2 {print $4}' $file >>exp
awk 'FNR==3 {print $4}' $file >>exp
awk 'FNR==4 {print $4}' $file >>exp
awk 'FNR==5 {print $4}' $file >>exp
awk 'FNR==7 {print $4}' $file >>exp
awk 'FNR==8 {print $4}' $file >>exp
awk 'FNR==24 {print $0}' $file >>exp
done
My output is:
XV
AMPY
BHZ
2012-08-15T08:00:00
2013-12-31T23:59:59
I want the output to be:
XV AMPY BHZ 2012-08-15T08:00:00 2013-12-31T23:59:59
First, the test data (only 9 rows, though):
$ cat file
1 2 3 14
1 2 3 24
1 2 3 34
1 2 3 44
1 2 3 54
1 2 3 64
1 2 3 74
1 2 3 84
1 2 3 94
Then the awk. There is no need for that for loop in the shell; awk can handle multiple files by itself:
$ awk '
BEGIN {
ORS=" "
a[2];a[3];a[4];a[5];a[7];a[8] # list of records for which $4 should be output
}
FNR in a { print $4 } # output the $4s
FNR==9 { printf "%s\n",$0 } # replace 9 with 24
' file file # ... # the files you want to process (SAC*)
24 34 44 54 74 84 1 2 3 94
24 34 44 54 74 84 1 2 3 94
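A sketch of the same thing adapted to the original question (untested; FNR==24 selects the full line, and SAC* is the glob from the loop):
awk 'BEGIN { ORS=" "; a[2];a[3];a[4];a[5];a[7];a[8] }
     FNR in a { print $4 }
     FNR==24  { printf "%s\n", $0 }' SAC* >> exp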

get subset of table based on unique column values

Hi, I am looking for a bash/awk/sed solution to get subsets of a table based on unique column values. For example, if I have:
chrom1 333
chrom1 343
chrom2 380
chrom2 501
chrom1 342
chrom3 102
I want to be able to split this table into 3:
chrom1 333
chrom1 343
chrom1 342
chrom2 380
chrom2 501
chrom3 102
I know how to do this in R using the split command, but I am specifically looking for a bash/awk/sed solution.
Thanks
I don't know if this awk is of any use, but it will create three separate files based on the unique column values:
awk '{print >> $1; close($1)}' file
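On the sample input above this should leave three files in the current directory, named after the first column (note that >> appends, so re-running the command accumulates output):
$ cat chrom1
chrom1 333
chrom1 343
chrom1 342
$ cat chrom2
chrom2 380
chrom2 501
$ cat chrom3
chrom3 102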
An alternative awk which keeps the original order of records within each block:
$ awk '{a[$1]=a[$1]?a[$1] ORS $0:$0}
END{for(k in a) print a[k] ORS ORS}' file
generates
chrom1 333
chrom1 343
chrom1 342
chrom2 380
chrom2 501
chrom3 102
The blocks are separated by two empty lines, and there are two trailing empty lines at the end, but these are not displayed in the formatted output above.
Using sort and awk:
sort -k1,1 file | awk 'NR>1 && p != $1{print ORS} {p=$1} 1'
EDIT: If you want to keep the original order of records from the input file, then use:
awk -v ORS='\n\n' '!($1 in a){a[$1]=$0; ind[++i]=$1; next}
{a[$1]=a[$1] RS $0}
END{for(k=1; k<=i; k++) print a[ind[k]]}' file
Create the input file file.txt:
(
cat << EOF
chrom1 333
chrom1 343
chrom2 380
chrom2 501
chrom1 342
chrom3 102
EOF
) > file.txt
Transformation:
cat file.txt | cut -d" " -f1 | sort -u | while read c
do
cat file.txt | grep "^$c" | sort
echo
done
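Note that the inner sort also reorders records within each block, so for the sample input the chrom1 group comes out as 333, 342, 343 rather than in the original order:
chrom1 333
chrom1 342
chrom1 343

chrom2 380
chrom2 501

chrom3 102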

bash uniq, how to show count number at back

Normally when I do cat number.txt | sort -n | uniq -c, I get numbers like this:
3 43
4 66
2 96
1 97
But what I need is the number of occurrences shown at the back, like this:
43 3
66 4
96 2
97 1
Please give advice on how to change this. Thanks.
Use awk to change the order of columns:
cat number.txt | sort -n | uniq -c | awk '{ print $2, $1 }'
Perl version:
perl -lne '$occ{0+$_}++; END {print "$_ $occ{$_}" for sort {$a <=> $b} keys %occ}' < numbers.txt
Through GNU sed,
cat number.txt | sort -n | uniq -c | sed -r 's/^ *([0-9]+) ([0-9]+)$/\2 \1/'

replace string in comma delimiter file using nawk

I need to implement an if condition in the nawk command below to process the input file when the third column has more than three digits. Please help with the command; what am I doing wrong, as it is not working?
inputfile.txt
123 | abc | 321456 | tre
213 | fbc | 342 | poi
outputfile.txt
123 | abc | 321### | tre
213 | fbc | 342 | poi
cat inputfile.txt | nawk 'BEGIN {FS="|"; OFS="|"} {if($3 > 3) $3=substr($3, 1, 3)"###" print}'
Try:
awk 'length($3) > 3 { $3=substr($3, 1, 3)"###" } 1 ' FS=\| OFS=\| test1.txt
This works with gawk:
awk -F '[[:blank:]]*\\\|[[:blank:]]*' -v OFS=' | ' '
$3 ~ /^[[:digit:]]{4,}/ {$3 = substr($3,1,3) "###"}
1
' inputfile.txt
It won't preserve the whitespace, so you might want to pipe the output through column -t.
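A minimal sketch of that, assuming the column utility is available (the exact output spacing depends on the implementation):
awk -F '[[:blank:]]*\\\|[[:blank:]]*' -v OFS=' | ' '
$3 ~ /^[[:digit:]]{4,}/ {$3 = substr($3,1,3) "###"}
1
' inputfile.txt | column -t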

retrieve and add two numbers of files

In my file I have the following structure:
A | 12 | 10
B | 90 | 112
C | 54 | 34
What I have to do is add column 2 and column 3 and print the result along with column 1.
Output:
A | 22
B | 202
C | 88
I can retrieve the two columns but don't know how to add them.
What I did is:
cut -d ' | ' -f3,5 myfile.txt
How do I add those columns and display the result?
A Bash solution:
#!/bin/bash
while IFS="|" read f1 f2 f3
do
echo $f1 "|" $((f2+f3))
done < file
You can do this easily with awk. With the default whitespace field splitting, the literal | characters are fields $2 and $4, so the numbers end up in $3 and $5:
awk '{print $1," | ",($3+$5)}' myfile.txt
will work, perhaps.
You can do this with awk:
awk 'BEGIN{FS="|"; OFS="| "} {print $1 OFS $2+$3}' input_filename
Input:
A | 12 | 10
B | 90 | 112
C | 54 | 34
Output:
A | 22
B | 202
C | 88
Explanation:
awk: invoke the awk tool
BEGIN{...}: do things before starting to read lines from the file
FS="|": FS stands for Field Separator. Think of it as the delimiter that separates each line of your file into fields
OFS="| ": OFS stands for Output Field Separator. Same idea as above, but for output. FS =/= OFS in this case due to formatting
{print $1 OFS $2+$3}: For each line that awk reads, print the first field (the letter), followed by a delimiter specified by OFS, then the sum of field 2 and field 3.
input_filename: awk accepts the input file name as an argument here.
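Note that $2 and $3 still contain the surrounding spaces (e.g. " 12 "); awk's string-to-number conversion skips leading blanks, which is why the addition works anyway. A quick check:
$ echo 'A | 12 | 10' | awk -F'|' '{print $2+0, $3+0, $2+$3}'
12 10 22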
