not getting array value in awk - bash

I want to insert array values, along with all the other contents of testfile.ps, into result.ps, but the array values are not getting printed. Please help.
My requirement is that every time the condition is met, the value at the next array index should be printed, along with the other contents of testfile.ps, into result.ps.
Actually arr[0] and arr[1] are big strings in my project, but I have shortened them here for simplicity.
#!/bin/bash
a[0]=""lineto""\n""stroke""
a[1]=""476.00"" ""26.00""
awk '{ if($1 == "(Page" ){for (i=0; i<2; i++){print $arr[i]; print $0; }}
else print }' testfile.ps > result.ps
testfile.ps
(Page 1 of 2 )
move
(Page 1 of 3 )
"gsave""\n""2.00"" ""setlinewidth""\n"
result.ps should be
(Page 1 of 2 )
lineto
stroke
move
(Page 1 of 3 )
476.00 26.00
gsave
2.00
setlinewidth
That means once the condition is met a second time, the array index should be incremented to 1 and a[1] should be printed.
I also tried this approach, with only a single array element, but I am not getting any output:
awk -v "a0=$a[0]" 'BEGIN {a[0]=""lineto""stroke""; if($1 == "move" ){for (i in a){ print a0;print $0; }} else print }' testfile.txt
Edited:
Hi, I have resolved the issue to some extent but am stuck at one place: how can I compare two strings like "a=476.00 1.00 lineto\nstroke\ngrestore\n" and "b=26.00 moveto\n368.00 1.00 lineto\n" in an awk command? I am trying:
awk -v "a=476.00 1.00 lineto\nstroke\ngrestore\n" -v "b=26.00 moveto\n368.00 1.00 lineto\n" -v "i=$a" '{
if ($1 == "(Page" && ($2%2==0 || $2==1) && $3 == "of"){
print i;
if [ i == a ];then
i=b; print $0;
fi
else if [ i == b ];then
i=c; print $0;
fi
else print $0;
}'testfile.txt
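On the edited part: inside an awk program, strings are compared with == in a C-style if (...) ... else ..., and the shell's if [ ... ]; then ... fi form is a syntax error there. Below is a minimal sketch, assuming the goal is to alternate between the two strings on each "(Page ... of" line. The variable name cur and the alternation rule are illustrative assumptions, and the page-number test from the attempt is left out for brevity.

```shell
#!/bin/bash
# The two strings from the question; awk -v turns each \n into a real newline.
a='476.00 1.00 lineto\nstroke\ngrestore\n'
b='26.00 moveto\n368.00 1.00 lineto\n'

result=$(printf '%s\n' '(Page 1 of 2 )' 'move' '(Page 2 of 2 )' |
  awk -v a="$a" -v b="$b" '
    BEGIN { cur = a }                 # start with the first string
    $1 == "(Page" && $3 == "of" {
      printf "%s", cur                # cur already ends with a newline
      if (cur == a)                   # plain == compares strings in awk
        cur = b
      else if (cur == b)
        cur = a
    }
    { print }                         # always print the input line itself
  ')
printf '%s\n' "$result"
```

The comparison is a string comparison here because both operands are strings that do not look like numbers.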

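Your awk program uses a variable arr which is never initialized: shell variables are not visible inside awk. Also, $arr[i] in awk means "the field whose number is arr[i]", not the value of arr[i].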
In your case, you want to pass a variable from the shell to awk. From the awk man page:
-v var=val
--assign var=val
Assign the value val to the variable var, before execution of the program begins. Such
variable values are available to the BEGIN rule of an AWK program.
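Hence, you need something like
awk -v "a0=${a[0]}" -v "a1=${a[1]}" .....
Note the braces: in bash, $a[0] expands as ${a} followed by the literal text [0], so ${a[0]} is required. In a BEGIN block, you can then set up your array arr from the variables a0 and a1 in any way you want.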
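A minimal sketch of that setup, assuming the two short strings from the question and a three-line stand-in for testfile.ps. The counter n is an illustrative name, and the shell array elements are written as ${a[0]} and ${a[1]} so that bash expands them.

```shell
#!/bin/bash
# Shell array holding the strings to insert (shortened, as in the question).
a[0]='lineto\nstroke'
a[1]='476.00 26.00'

# Sample input standing in for testfile.ps.
input='(Page 1 of 2 )
move
(Page 1 of 3 )'

# -v interprets escape sequences, so \n in a0 becomes a real newline.
result=$(printf '%s\n' "$input" |
  awk -v "a0=${a[0]}" -v "a1=${a[1]}" '
    BEGIN { arr[0] = a0; arr[1] = a1 }          # rebuild the array inside awk
    { print }                                   # always print the input line
    $1 == "(Page" && n < 2 { print arr[n++] }   # then the next array element
  ')
printf '%s\n' "$result"
```

With more elements, passing one -v per element gets unwieldy, and joining them with a separator and calling split() in BEGIN scales better.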

Gather the data to a single var using a separator:
$ awk -v s="lineto\nstroke;476.00 26.00" ' # ; as separator
BEGIN{ n=split(s,a,";") } # split s var to a array
1 # output record
/\(Page/ && i<n { print a[++i] } # if (Page and still data in a
' file
(Page 1 of 2 )
lineto
stroke
move
(Page 1 of 3 )
476.00 26.00
"gsave""\n""2.00"" ""setlinewidth""\n"

Related

Loop to create a DF from values in bash

I'm creating various text files from a file like this:
Chrom_x,Pos,Ref,Alt,RawScore,PHRED,ID,Chrom_y
10,113934,A,C,0.18943,5.682,rs10904494,10
10,126070,C,T,0.030435000000000007,3.102,rs11591988,10
10,135656,T,G,0.128584,4.732,rs10904561,10
10,135853,A,G,0.264891,6.755,rs7906287,10
10,148325,A,G,0.175257,5.4670000000000005,rs9419557,10
10,151997,T,C,-0.21169,0.664,rs9286070,10
10,158202,C,T,-0.30357,0.35700000000000004,rs9419478,10
10,158946,C,T,2.03221,19.99,rs11253562,10
10,159076,G,A,1.403107,15.73,rs4881551,10
What I am trying to do is extract, in bash, all values between two bounds:
gawk '$6>=0 && $NF<=5 {print $0}' file.csv > 0_5.txt
And create files from 6 to 10, from 11 to 15... from 95 to 100. I was thinking of creating a loop for this with something like
#!/usr/bin/env bash
n=( 0,5,6,10...)
if i in n:
gawk '$6>=n && $NF<=n+1 {print $0}' file.csv > n_n+1.txt
and so on.
How can I convert this into a loop and create files for these specific ranges?
While you could use a shell loop to provide inputs to an awk script, you could also just use awk to natively split the values into buckets and write the lines to those "bucket" files itself:
awk -F, ' NR > 1 {
i=int((($6 - 1) / 5))
fname=(i*5) "_" (i+1)*5 ".txt"
print $0 > fname
}' < input
The code skips the header line (NR > 1) and then computes a "bucket index" by subtracting one from the value in column six, dividing by five, and truncating to an integer. The filename is then constructed by multiplying that index (and its increment) by five. The whole line is then printed to that filename.
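To check which file each row would land in before writing anything, the same arithmetic can print the filename instead of redirecting to it. This is a dry-run sketch using the header and four rows from the question; the variable name buckets is only for illustration.

```shell
#!/bin/bash
# Header plus four sample rows taken from the question's CSV.
buckets=$(printf '%s\n' \
  'Chrom_x,Pos,Ref,Alt,RawScore,PHRED,ID,Chrom_y' \
  '10,113934,A,C,0.18943,5.682,rs10904494,10' \
  '10,135853,A,G,0.264891,6.755,rs7906287,10' \
  '10,158946,C,T,2.03221,19.99,rs11253562,10' \
  '10,159076,G,A,1.403107,15.73,rs4881551,10' |
  awk -F, 'NR > 1 {
    i = int(($6 - 1) / 5)                        # same bucket index as above
    print $6, (i * 5) "_" ((i + 1) * 5) ".txt"   # show the name, write nothing
  }')
printf '%s\n' "$buckets"
```

Because of the subtraction, a value such as 5.682 still falls into 0_5.txt, whereas the shell-loop variant with $6 >= floor && $6 < ceiling would place it in 5_10.txt.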
To use a shell loop (and call awk 20 times on the input), you could use something like this:
for((i=0; i <= 19; i++))
do
floor=$((i * 5))
ceiling=$(( (i+1) * 5))
awk -F, -v floor="$floor" -v ceiling="$ceiling" \
'NR > 1 && $6 >= floor && $6 < ceiling { print }' < input \
> "${floor}_${ceiling}.txt"
done
The basic idea is the same; here, we're creating the bucket index with the outer loop and then passing the range into awk as the floor and ceiling variables. We're only asking awk to print the matching lines; the output from awk is captured by the shell as a redirection into the appropriate file.

Awk if else with conditions

I am trying to make a script (and a loop) to extract matching lines to print them into a new file. There are 2 conditions: 1st is that I need to print the value of the 2nd and 4th columns of the map file if the 2nd column of the map file matches with the 4th column of the test file. The 2nd condition is that when there is no match, I want to print the value in the 2nd column of the test file and a zero in the second column.
My test file is made this way:
8 8:190568 0 190568
8 8:194947 0 194947
8 8:197042 0 197042
8 8:212894 0 212894
My map file is made this way:
8 190568 0.431475 0.009489
8 194947 0.434984 0.009707
8 19056880 0.395066 112.871160
8 101908687 0.643861 112.872348
1st attempt:
for chr in {21..22};
do
awk 'NR==FNR{a[$2]; next} {if ($4 in a) print $2, $4 in a; else print $2, $4 == "0"}' map_chr$chr.txt test_chr$chr.bim > position.$chr;
done
Result:
8:190568 1
8:194947 1
8:197042 0
8:212894 0
My second script is:
for chr in {21..22}; do
awk 'NR == FNR { ++a[$4]; next }
$4 in a { print a[$2], $4; ++found[$2] }
END { for(k in a) if (!found[k]) print a[k], 0 }' \
"test_chr$chr.bim" "map_chr$chr.txt" >> "position.$chr"
done
And the result is:
1 0
1 0
1 0
1 0
The result I need is:
8:190568 0.009489
8:194947 0.009707
8:197042 0
8:212894 0
This awk should work for you:
awk 'FNR==NR {map[$2]=$4; next} {print $4, map[$4]+0}' mapfile testfile
190568 0.009489
194947 0.009707
197042 0
212894 0
This awk command processes mapfile first and stores $2 as the key with $4 as the value in an associative array named map.
Later, when it processes testfile in the second block, it prints $4 from the second file together with the value stored in map under the key $4. Adding 0 to the stored value ensures that we get 0 when $4 is not present in map.
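The command above prints $4 of the test file in the first column, while the requested result shows the chr:pos label from $2. A small variation that prints $2 instead, sketched here against the sample data from the question (the temporary directory and the file names mapfile and testfile are only for the demonstration):

```shell
#!/bin/bash
tmp=$(mktemp -d)

cat > "$tmp/mapfile" <<'EOF'
8 190568 0.431475 0.009489
8 194947 0.434984 0.009707
8 19056880 0.395066 112.871160
8 101908687 0.643861 112.872348
EOF

cat > "$tmp/testfile" <<'EOF'
8 8:190568 0 190568
8 8:194947 0 194947
8 8:197042 0 197042
8 8:212894 0 212894
EOF

# First file: remember column 4 keyed by position (column 2).
# Second file: print the label from column 2 and the stored value, or 0.
result=$(awk 'FNR == NR { map[$2] = $4; next }
              { print $2, map[$4] + 0 }' "$tmp/mapfile" "$tmp/testfile")
printf '%s\n' "$result"

rm -rf "$tmp"
```

The lookup still uses $4 of the test file as the key; only the printed column changes.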

awk to calculate average of field in multiple text files and merge into one

I am trying to calculate the average of $2 in multiple text files in a directory and merge the output into one tab-delimited output file. The output file has two fields, in which $1 is the file name that has been extracted by pref, and $2 is the calculated average with one decimal, rounded up. There is also a header in the output: Sample in $1 and Percent in $2. The below seems close, but I am missing a few things (adding the header to the output, merging into one tab-delimited file, and rounding to 3 decimal places) that I do not know how to do yet, and I am not getting the desired output. Thank you :).
123_base.txt
AASS 99.81
ABAT 100.00
ABCA10 0.0
456_base.txt
ABL2 97.81
ABO 100.00
ACACA 99.82
desired output (tab-delimited)
Sample Percent
123 66.6
456 99.2
Bash
for f in /home/cmccabe/Desktop/20x/percent/*.txt ; do
bname=$(basename $f)
pref=${bname%%_base_*.txt}
awk -v OFS='\t' '{ sum += $2 } END { if (NR > 0) print sum / NR }' $f /home/cmccabe/Desktop/NGS/bed/bedtools/IDP_total_target_length_by_panel/IDP_unix_trim_total_target_length.bed > /home/cmccabe/Desktop/20x/coverage/${pref}_average.txt
done
This one uses GNU awk, which provides handy BEGINFILE and ENDFILE events:
gawk '
BEGIN {print "Sample\tPercent"}
BEGINFILE {sample = FILENAME; sub(/_.*/,"",sample); sum = n = 0}
{sum += $2; n++}
ENDFILE {printf "%s\t%.1f\n", sample, sum/n}
' 123_base.txt 456_base.txt
If you're giving a pattern with the directory attached, I'd get the sample name like this:
match(FILENAME, /^.*\/([^_]+)/, m); sample = m[1]
and then, yes this is OK: gawk '...' /path/to/*_base.txt
And to guard against division by zero, inspired by James Brown's answer:
ENDFILE {printf "%s\t%.1f\n", sample, n==0 ? 0 : sum/n}
with perl
$ perl -ane '
BEGIN{ print "Sample\tPercent\n" }
$c++; $sum += $F[1];
if(eof)
{
($pref) = $ARGV=~/(.*)_base/;
printf "%s\t%.1f\n", $pref, $sum/$c;
$c = 0; $sum = 0;
}' 123_base.txt 456_base.txt
Sample Percent
123 66.6
456 99.2
print header using BEGIN block
-a option splits each input line on whitespace and saves the fields to the @F array
For each line, increment counter and add to sum variable
If end of file eof is detected, print in required format
$ARGV contains current filename being read
If full path of filename is passed but only filename should be used to get pref, then use this line instead
($pref) = $ARGV=~/.*\/\K(.*)_base/;
In awk. Notice printf "%3.3s" to truncate the filename after 3rd char:
$ cat ave.awk
BEGIN {print "Sample", "Percent"} # header
BEGINFILE {s=c=0} # at the start of every file reset
{s+=$2; c++} # sum and count hits
ENDFILE{if(c>0) printf "%3.3s%s%.1f\n", FILENAME, OFS, s/c}
# above output if more than 0 lines
Run it:
$ touch empty_base.txt # test for division by zero
$ awk -f ave.awk 123_base.txt 456_base.txt empty_base.txt
Sample Percent
123 66.6
456 99.2
another awk
$ awk -v OFS='\t' '{f=FILENAME;sub(/_.*/,"",f);
a[f]+=$2; c[f]++}
END{print "Sample","Percent";
for(k in a) print k, sprintf("%.1f",a[k]/c[k])}' {123,456}_base.txt
Sample Percent
456 99.2
123 66.6

Bash: extract columns with cut and filter one column further

I have a tab-separated file and want to extract a few columns with cut.
Two example lines
(...)
0 0 1 0 AB=1,2,3;CD=4,5,6;EF=7,8,9 0 0
1 1 0 0 AB=2,1,3;CD=1,1,2;EF=5,3,4 0 1
(...)
What I want to achieve is to select columns 2,3,5 and 7, however from column 5 only CD=4,5,6.
So my expected result is
0 1 CD=4,5,6; 0
1 0 CD=1,1,2; 1
How can I use cut for this problem and run grep on one of the extracted columns? Any other one-liner is of course also fine.
here is another awk
$ awk -F'\t|;' -v OFS='\t' '{print $2,$3,$6,$NF}' file
0 1 CD=4,5,6 0
1 0 CD=1,1,2 1
or with cut/paste
$ paste <(cut -f2,3 file) <(cut -d';' -f2 file) <(cut -f7 file)
0 1 CD=4,5,6 0
1 0 CD=1,1,2 1
Easier done with awk. Split the 5th field using ; as the separator, and then print the second subfield.
awk 'BEGIN {FS="\t"; OFS="\t"}
{split($5, a, ";"); print $2, $3, a[2]";", $7 }' inputfile > outputfile
If you want to print whichever subfield begins with CD=, use a loop:
awk 'BEGIN {FS="\t"; OFS="\t"}
{n = split($5, a, ";");
subfield = ""; # reset so a line without CD= does not reuse the previous match
for (i = 1; i <= n; i++) {
if (a[i] ~ /^CD=/) subfield = a[i];
}
print $2, $3, subfield";", $7}' < inputfile > outputfile
I think awk is the best tool for this kind of task and the other two answers give you good short solutions.
I want to point out that you can use awk's built-in splitting facility to gain more flexibility when parsing input. Here is an example script that uses implicit splitting:
parse.awk
# Remember second, third and seventh columns
{
a = $2
b = $3
d = $7
}
# Split the fifth column on ";". After this the positional variables
# (e.g. $1, $2, ..., $NF) contain the fields from the previous
# fifth column
{
oldFS = FS
FS = ";"
$0 = $5
}
# For example, to test if the second element starts with "CD", do
# something like this
$2 ~ /^CD/ {
c = $2
}
# Print the selected elements
{
print a, b, c, d
}
# Restore FS
{
FS = oldFS
}
Run it like this:
awk -f parse.awk FS='\t' OFS='\t' infile
Output:
0 1 CD=4,5,6 0
1 0 CD=1,1,2 1

Add leading zeroes to awk variable

I have the following awk command within a "for" loop in bash:
awk -v pdb="$pdb" 'BEGIN {file = 1; filename = pdb"_" file ".pdb"}
/ENDMDL/ {getline; file ++; filename = pdb"_" file ".pdb"}
{print $0 > filename}' < ${pdb}.pdb
This reads a series of files with the name $pdb.pdb and splits them in files called $pdb_1.pdb, $pdb_2.pdb, ..., $pdb_21.pdb, etc. However, I would like to produce files with names like $pdb_01.pdb, $pdb_02.pdb, ..., $pdb_21.pdb, i.e., to add padding zeros to the "file" variable.
I have tried without success using printf in different ways. Help would be much appreciated.
Here's how to create leading zeros with awk:
# echo 1 | awk '{ printf("%02d\n", $1) }'
01
# echo 21 | awk '{ printf("%02d\n", $1) }'
21
Replace %02 with the total number of digits you need (including zeros).
Replace file on output with sprintf("%02d", file).
Or even the whole assignment with filename = sprintf("%s_%02d.pdb", pdb, file);.
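Applied to the splitting command from the question, that suggestion might look like the sketch below. To keep it free of side effects, it prints the target filename next to each line instead of writing files; the prefix demo and the sample records are made up for illustration.

```shell
#!/bin/bash
# Five made-up records standing in for a multi-model PDB file.
names=$(printf '%s\n' 'MODEL 1' 'ATOM a' 'ENDMDL' 'MODEL 2' 'ATOM b' |
  awk -v pdb="demo" '
    BEGIN    { file = 1; filename = sprintf("%s_%02d.pdb", pdb, file) }
    /ENDMDL/ { getline; file++; filename = sprintf("%s_%02d.pdb", pdb, file) }
    { print filename ": " $0 }    # show where the line would go
  ')
printf '%s\n' "$names"
```

To write the files for real, replace the final print with print $0 > filename, as in the original command.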
This does it without resorting to printf, which is expensive. The first parameter is the string to pad, the second is the total length after padding.
echo 722 8 | awk '{ for(c = 0; c < $2; c++) s = s"0"; s = s$1; print substr(s, 1 + length(s) - $2); }'
If you know in advance the length of the result string, you can use a simplified version (say 8 is your limit):
echo 722 | awk '{ s = "00000000"$1; print substr(s, 1 + length(s) - 8); }'
The result in both cases is 00000722.
Here is a function that left or right-pads values with zeroes depending on the parameters: zeropad(value, count, direction)
function zeropad(s,c,d) {
if(d!="r")
d="l" # l is the default and fallback value
return sprintf("%" (d=="l"? "0" c:"") "d" (d=="r"?"%0" c-length(s) "d":""), s,"")
}
{ # test main
print zeropad($1,$2,$3)
}
Some tests:
$ cat test
2 3 l
2 4 r
2 5
a 6 r
The test:
$ awk -f program.awk test
002
2000
00002
000000
It's not fully battle-tested, so strange parameters may yield strange results.
