awk: run time error: negative field index - bash

I currently have the following:
function abs() {
echo $(($1<0 ?-$1:$1));
}
echo $var1 | awk -F" " '{for (i=2;i<=NF;i+=2) $i=(95-$(abs $i))*1.667}'
where var1 is:
4 -38 2 -42 1 -43 10 -44 1 -45 6 -46 1 -48 1 -49
When I run this, I am getting the error:
awk: run time error: negative field index $-38
FILENAME="-" FNR=1 NR=1
Does this have something to do with the 95-$(abs $i) part? I'm not sure how to fix this.

Try this:
echo "$var1" |
awk 'function abs(x) { return x<0 ? -x : x }
{ for (i=2;i<=NF;i+=2) $i = (95-abs($i))*1.667; print }'

Every line of input to AWK is placed in fields by the interpreter. The fields can be accessed with $N for N > 0. $0 means the whole line. $N for N < 0 is nonsensical. Variables are not prefixed with a dollar sign.
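For the record, here is a minimal runnable version of the fix, with the question's data inlined (the abs() must live inside awk; shell functions are invisible to it):

```shell
# awk defines its own abs(); the shell function from the question is never seen by awk.
var1='4 -38 2 -42 1 -43 10 -44 1 -45 6 -46 1 -48 1 -49'
echo "$var1" |
awk 'function abs(x) { return x < 0 ? -x : x }
     { for (i = 2; i <= NF; i += 2) $i = (95 - abs($i)) * 1.667; print }'
```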


Add x^2 to every "nonzero" coefficient with sed/awk

I have to write, as simply as possible, a script or command which has to use awk and/or sed.
Input file:
23 12 0 33
3 4 19
1st line n=3
2nd line n=2
Each line of the file contains a string of numbers. Each number is a coefficient, and we have to append x^n, where n is the highest power (equal to the number of spaces between the numbers in each line; there is no space after the last number), and if we have a "0" in the string we have to skip that term.
So for that input we will have output like:
23x^3+12x^2+33
3x^2+4x+19
Please help me to write a short script solving that problem. Thank you so much for your time and all the help :)
My idea:
linescount=$(cat numbers|wc -l)
linecounter=1
While[linecounter<=linescount];
do
i=0
for i in spaces=$(cat numbers|sed 1p | sed " " )
do
sed -i 's/ /x^spaces/g'
i=($(i=i-1))
done
linecounter=($(linecounter=linecounter-1))
done
The following awk may help you with this too.
awk '{for(i=1;i<=NF;i++){if($i!="" && $i){val=(val?val "+" $i:$i)(NF-i==0?"":(NF-i==1?"x":"x^"NF-i))} else {pointer++}};if(val){print val};val=""} pointer==NF{print;} {pointer=""}' Input_file
Adding a non-one-liner form of the solution here too.
awk '
{
for(i=1;i<=NF;i++){
if($i!="" && $i){
val=(val?val "+" $i:$i)(NF-i==0?"":(NF-i==1?"x":"x^"NF-i))}
else {
pointer++}};
if(val) {
print val};
val=""
}
pointer==NF {
print}
{
pointer=""
}
' Input_file
EDIT: Adding an explanation here too, for better understanding.
awk '
{
for(i=1;i<=NF;i++){ ##Starting a for loop from variable 1 to till the value of NF here.
if($i!="" && $i){ ##checking if variable i value is NOT NULL then do following.
val=(val?val "+" $i:$i)(NF-i==0?"":(NF-i==1?"x":"x^"NF-i))} ##creating variable val here and putting conditions here if val is NULL then
##simply take value of that field else concatenate the value of val with its
##last value. Second condition is to check if last field of line is there then
##keep it like that else it is second last then print "x" along with it else keep
##that "x^" field_number-1 with it.
else { ##If a field is NULL in current line then come here.
pointer++}}; ##Increment the value of variable named pointer here with 1 each time it comes here.
if(val) { ##checking if variable named val is NOT NULL here then do following.
print val}; ##Print the value of variable val here.
val="" ##Nullifying the variable val here.
}
pointer==NF { ##checking condition if pointer value is same as NF then do following.
print} ##Print the current line then, seems whole line is having zeros in it.
{
pointer="" ##Nullifying the value of pointer here.
}
' Input_file ##Mentioning Input_file name here.
Offering a Perl solution since it has some higher-level constructs than bash that make the code a little simpler:
use strict;
use warnings;
use feature qw(say);
my @terms;
while (my $line = readline(*DATA)) {
    chomp($line);
    my $degree = () = $line =~ / /g;
    my @coefficients = split / /, $line;
    my @terms;
    while ($degree >= 0) {
        my $coefficient = shift @coefficients;
        next if $coefficient == 0;
        push @terms, $degree > 1
            ? "${coefficient}x^$degree"
            : $degree > 0
                ? "${coefficient}x"
                : $coefficient;
    }
    continue {
        $degree--;
    }
    say join '+', @terms;
}
__DATA__
23 12 0 33
3 4 19
Example output:
hunter@eros  ~  perl test.pl
23x^3+12x^2+33
3x^2+4x+19
Information on any of the builtin functions used above (readline, chomp, push, shift, split, say, and join) can be found in perldoc with perldoc -f <function-name>
$ cat a.awk
function print_term(i) {
# Don't print zero terms:
if (!$i) return;
# Print a "+" unless this is the first term:
if (!first) { printf " + " }
# If it's the last term, just print the number:
if (i == NF) printf "%d", $i
# Leave the coefficient blank if it's 1:
coef = ($i == 1 ? "" : $i)
# If it's the penultimate term, just print an 'x' (not x^1):
if (i == NF-1) printf "%sx", coef
# Print a higher-order term:
if (i < NF-1) printf "%sx^%s", coef, NF - i
first = 0
}
{
first = 1
# print all the terms:
for (i=1; i<=NF; ++i) print_term(i)
# If we never printed any terms, print a "0":
print first ? 0 : ""
}
Example input and output:
$ cat file
23 12 0 33
3 4 19
0 0 0
0 1 0 1
17
$ awk -f a.awk file
23x^3 + 12x^2 + 33
3x^2 + 4x + 19
0
x^2 + 1
17
$ cat ip.txt
23 12 0 33
3 4 19
5 3 0
34 01 02
$ # mapping each element except last to add x^n
$ # -a option will auto-split input on whitespaces, content in @F array
$ # $#F will give index of last element (indexing starts at 0)
$ # $i>0 condition check to prevent x^0 for last element
$ perl -lane '$i=$#F; print join "+", map {$i>0 ? $_."x^".$i-- : $_} @F' ip.txt
23x^3+12x^2+0x^1+33
3x^2+4x^1+19
5x^2+3x^1+0
34x^2+01x^1+02
$ # with post processing
$ perl -lape '$i=$#F; $_ = join "+", map {$i>0 ? $_."x^".$i-- : $_} @F;
s/\+0(x\^\d+)?\b|x\K\^1\b//g' ip.txt
23x^3+12x^2+33
3x^2+4x+19
5x^2+3x
34x^2+01x+02
One possibility is:
#!/usr/bin/env bash
line=1
linemax=$(grep -oEc '(( |^)[0-9]+)+' inputFile)
while [ $line -lt $linemax ]; do
degree=$(($(grep -oE ' +' - <<<$(grep -oE '(( |^)[0-9]+)+' inputFile | head -$line | tail -1) | cut -d : -f 1 | uniq -c)+1))
coeffs=($(grep -oE '(( |^)[0-9]+)+' inputFile | head -$line | tail -1))
i=0
while [ $i -lt $degree ]; do
if [ ${coeffs[$i]} -ne 0 ]; then
if [ $(($degree-$i-1)) -gt 1 ]; then
echo -n "${coeffs[$i]}x^$(($degree-$i-1))+"
elif [ $(($degree-$i-1)) -eq 1 ]; then
echo -n "${coeffs[$i]}x"
else
echo -n "${coeffs[$i]}"
fi
fi
((i++))
done
echo
((line++))
done
The most important lines are:
# Gets degree of the equation
degree=$(($(grep -oE ' +' - <<<$(grep -oE '(( |^)[0-9]+)+' inputFile | head -$line | tail -1) | cut -d : -f 1 | uniq -c)+1))
# Saves coefficients in an array
coeffs=($(grep -oE '(( |^)[0-9]+)+' inputFile | head -$line | tail -1))
Here, grep -oE '(( |^)[0-9]+)+' finds lines containing only numbers (see edit). grep -oE ' +' - ........... |cut -d : -f 1 |uniq counts the number of coefficients per line as explained in this question.
Edit: An improved regex for capturing lines with only numbers is
grep -E '(( |^)[0-9]+)+' inputfile | grep -v '[a-zA-Z]'
sed -r "s/(.*) (.*) (.*) (.*)/\1x^3+\2x^2+\3x+\4/; \
s/(.*) (.*) (.*)/\1x^2+\2x+\3/; \
s/\+0x(^.)?\+/+/g; \
s/^0x\^.[+]//g; \
s/\+0$//g;" koeffs.txt
Line 1: Handle 4 elements
Line 2: Handle 3
Line 3: Handle 0 in the middle
Line 4: Handle 0 at start
Line 5: Handle 0 at end
Here is a more bashy, less sedy answer, which is more readable than the sed one, I think:
#!/bin/bash
#
# 0 4 12 => 12x^3
# 2 4 12 => 12x
# 3 4 12 => 12
term () {
p=$1
leng=$2
fac=$3
pot=$((leng - 1 - p))
case $pot in
0) echo -n '+'${fac} ;;
1) echo -n '+'${fac}x ;;
*) echo -n '+'${fac}x^$pot ;;
esac
}
handleArray () {
# mapfile's -C callback is called with the array index as its first
# argument, starting with 0 for the 1st line.
shift
coeffs=($*)
# echo ${coeffs[@]}
cnt=0
len=${#coeffs[@]}
while (( cnt < len ))
do
if [[ ${coeffs[$cnt]} != 0 ]]
then
term $cnt $len ${coeffs[$cnt]}
fi
((cnt++))
done
echo # -e '\n' # extra line for dbg, together w. line 5 of the function.
}
mapfile -n 0 -c 1 -C handleArray < ./koeffs.txt coeffs | sed -r "s/^\++//;s/\++$//;"
The mapfile reads data and produces an array. See help mapfile for a brief syntax introduction.
We need some counting to know which power to raise to; meanwhile we try to get rid of 0-terms.
In the end I use sed to remove leading and trailing plusses.
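To see what the shift in handleArray is discarding, here is a tiny sketch of mapfile's callback mechanism (bash 4+; the function name cb is just an illustration):

```shell
# -c 1 invokes the callback for every line read; the callback receives
# the next array index and the line just read.
cb() { printf 'index=%s line=%s\n' "$1" "${2%$'\n'}"; }
mapfile -t -n 0 -c 1 -C cb arr <<'EOF'
first
second
EOF
```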
sh solution
while read line ; do
set -- $line
while test $1 ; do
i=$(($#-1))
case $1 in
0) ;;
*) case $i in
0) j="" ;;
1) j="x" ;;
*) j="x^$i" ;;
esac
result="$result$1$j+";;
esac
shift
done
echo "${result%+}"
result=""
done < infile
$ cat tst.awk
{
out = sep = ""
for (i=1; i<=NF; i++) {
if ($i != 0) {
pwr = NF - i
if ( pwr == 0 ) { sfx = "" }
else if ( pwr == 1 ) { sfx = "x" }
else { sfx = "x^" pwr }
out = out sep $i sfx
sep = "+"
}
}
print out
}
$ awk -f tst.awk file
23x^3+12x^2+33
3x^2+4x+19
First, my test set:
$ cat file
23 12 0 33
3 4 19
0 1 2
2 1 0
Then the awk script:
$ awk 'BEGIN{OFS="+"}{for(i=1;i<=NF;i++)$i=$i (NF-i?"x^" NF-i:"");gsub(/(^|\+)0(x\^[0-9]+)?/,"");sub(/^\+/,"")}1' file
23x^3+12x^2+33
3x^2+4x^1+19
1x^1+2
2x^2+1x^1
And an explanation:
$ awk '
BEGIN {
OFS="+" # separate with a + (negative values
} # would be dealt with in gsub
{
for(i=1;i<=NF;i++) # process all components
$i=$i (NF-i?"x^" NF-i:"") # add x and exponent
gsub(/(^|\+)0(x\^[0-9]+)?/,"") # clean 0s and leftover +s
sub(/^\+/,"") # remove leading + if first component was 0
}1' file # output
This might work for you (GNU sed):
sed -r ':a;/^\S+$/!bb;s/0x\^[^+]+\+//g;s/\^1\+/+/;s/\+0$//;b;:b;h;s/\S+$//;s/\S+\s+/a/g;s/^/cba/;:c;s/(.)(.)\2\2\2\2\2\2\2\2\2\2/\1\1\2/;tc;s/([a-z])\1\1\1\1\1\1\1\1\1/9/;s/([a-z])\1\1\1\1\1\1\1\1/8/;s/([a-z])\1\1\1\1\1\1\1/7/;s/([a-z])\1\1\1\1\1\1/6/;s/([a-z])\1\1\1\1\1/5/;s/([a-z])\1\1\1\1/4/;s/([a-z])\1\1\1/3/;s/([a-z])\1\1/2/;s/([a-z])\1/1/;s/[a-z]/0/g;s/^0+//;G;s/(.*)\n(\S+)\s+/\2x^\1+/;ba' file
This is not a serious solution!
Shows how sed can count, kudos goes to Greg Ubben back in 1989 when he wrote wc in sed!

How to calculate the standard deviation of a column value by AWK in Bash? [duplicate]

This question already has answers here:
standard deviation of an arbitrary number of numbers using bc or other standard utilities
(5 answers)
Closed 5 years ago.
I have data that looks like:
condition A
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
0
Then I calculated the mean value of this condition as 0.875, using the awk command below (basically it just sums all values and divides by the number of rows):
Mean: cat $a.csv | awk -F"," '$1=="Picture" && $2=="1" && $3=="hit" && $4==1{c++} END {print c/16}'
My question is how to calculate standard deviation of this condition?
I already know SD of this condition is 0.3415650255 calculated by EXCEL...
And I already tried out several awk commands but still cannot get this result right...
cat $a.csv | awk -F"," '$1=="Picture" && $2=="2" && $3=="hit" && $4=="2"{c++} END {c=0;ssq=0;for (i=1;i<=16;i++){c+=$i;ssq+=$i**2}; print (ssq/16-(c/16)**2)**0.5}'
cat $a.csv | awk -F"," '$1=="Picture" && $2=="2" && $3=="hit" && $4==2{c++} {delta=$4-(c/16); avg==delta/16;mean2+=delta*($4-avg);} END { avg=c/16; printf "mean: %f. standard deviation: %f \n", avg, sqrt(mean2/16) }'
cat $a.csv | awk -F"," '$1=="Picture" && $2=="2" && $3=="hit" && $4==2{c++} END { avg=c/16; printf "mean: %f. standard deviation: %f \n", avg, sqrt((c/16-1)-(c/16-1)^2) }'
I still cannot get the right standard deviation in this condition.
Does anyone know where is the problem?
Recall how to calculate standard deviation. You need all the values since you need individual differences from the mean.
Doing manually first, in Excel:
Now you can implement that easily in any language that has arrays and math functions.
In awk:
$ echo "1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0" | tr " " "\n" > file
$ awk 'function sdev(array) {
for (i=1; i in array; i++)
sum+=array[i]
cnt=i-1
mean=sum/cnt
for (i=1; i in array; i++)
sqdif+=(array[i]-mean)**2
return (sqdif/(cnt-1))**0.5
}
{sum1[FNR]=$1}
END {print sdev(sum1)}' file
0.341565
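As a sanity check, the same numbers worked by hand: mean = 14/16 = 0.875; sample variance = (14*(1-0.875)^2 + 2*(0-0.875)^2)/15 = 1.75/15, and sqrt of that is 0.341565, matching Excel. The same formula also fits in a one-liner when all values sit on a single row rather than one per line:

```shell
echo '1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0' |
awk '{ for (i = 1; i <= NF; i++) sum += $i          # total of all values
       mean = sum / NF
       for (i = 1; i <= NF; i++) q += ($i - mean)^2 # sum of squared differences
       print sqrt(q / (NF - 1))                     # sample standard deviation
     }'
```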

Awk error argument list too long, only two variables

I am trying to sum two numbers from a table (in file) with awk (inside a loop), using variables passed from a bash script, as described below. The numbers I am dealing with ($value) are floating-point values, and $value only contains one number. Note that the line where the error occurs is 126; 125 is working fine.
111 while read line
112 do
120 n=2
121 sum=0
122
123 for x in $(seq 1 $number)
124 do
125 value=$(echo "$line" | awk -v n="$n" '{print $n}') # I am just getting the values to sum up here
126 sum=$(awk -v sum="$sum" -v value="$value" '{sum = sum + value; print sum}')
n=$((n+1))
129 done
done < $file
Where $number is defined previously.
I get the following error:
./script.sh: line 126: /bin/awk: Argument list too long
I am only trying to pass two variables on the awk command on this line, any idea why I am getting this error?
An example of the table in the "file":
A -0.717616 -0.623398 -0.214494 -0.352871
B -0.19373 -0.140626 -0.0523623 0.0248858
C -0.0822092 -0.302354 0.347158 -0.0373262
D 0.310213 0.312805 0.114366 0.353496
E -0.175354 -0.0263985 -0.125694 -0.155082
Thank you!
There are two problems with your call to awk: you haven't specified an input file, so it is reading from its inherited standard input (which is the file the while loop is also trying to read from), and it outputs each line it reads, not just the value of sum (fixed while I was typing this).
This is a very inefficient way to add up the numbers in the file, by the way, but here is a corrected version:
while read line
do
n=2
sum=0
for x in $(seq 1 $number)
do
value=$(echo "$line" | awk -v n="$n" '{print $n}')
sum=$(awk -v sum="$sum" -v value="$value" 'BEGIN {sum = sum + value; print sum}' </dev/null)
n=$((n+1))
done
done < $file
A better solution:
awk 'BEGIN {sum=0}
{for (i=2;i<=NF;i++) { sum = sum + $i }}
END {print sum}' < $file
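For instance, with a small made-up table in the same shape (label in column 1, numbers after it), the single-pass version sums every numeric field in one awk invocation:

```shell
# Sum fields 2..NF across all lines, skipping the label in column 1.
printf 'A 1 2 3\nB 4 5 6\n' |
awk '{ for (i = 2; i <= NF; i++) sum += $i }
     END { print sum }'
```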

bash, find nearest next value, forward and backward

I have a data.txt file
1 2 3 4 5 6 7
cat data.txt
13 245 1323 10.1111 10.2222 60.1111 60.22222
13 133 2325 11.2222 11.333 61.2222 61.3333
13 245 1323 12.3333 12.4444 62.3333 62.44444444
13 245 1323 13.4444 13.5555 63.4444 63.5555
Find next nearest: My target value is 11.6667 and it should find the nearest next value in column 4 as 12.3333
Find previous nearest: My target value is 62.9997 and it should find the nearest previous value in column 6 as 62.3333
I am able to find the next nearest (case 1) by
awk -v c=4 -v t=11.6667 '{a[NR]=$c}END{
asort(a);d=a[NR]-t;d=d<0?-d:d;v = a[NR]
for(i=NR-1;i>=1;i--){
m=a[i]-t;m=m<0?-m:m
if(m<d){
d=m;v=a[i]
}
}
print v
}' f
12.3333
Any bash solution for finding the previous nearest (case 2)?
Try this:
$ cat tst.awk
{
if ($fld > tgt) {
del = $fld - tgt
if ( (del < minGtDel) || (++gtHit == 1) ) {
minGtDel = del
minGtVal = $fld
}
}
else if ($fld < tgt) {
del = tgt - $fld
if ( (del < minLtDel) || (++ltHit == 1) ) {
minLtDel = del
minLtVal = $fld
}
}
else {
minEqVal = $fld
}
}
END {
print (minGtVal == "" ? "NaN" : minGtVal)
print (minLtVal == "" ? "NaN" : minLtVal)
print (minEqVal == "" ? "NaN" : minEqVal)
}
$ awk -v fld=4 -v tgt=11.6667 -f tst.awk file
12.3333
11.2222
NaN
$ awk -v fld=6 -v tgt=62.9997 -f tst.awk file
63.4444
62.3333
NaN
$ awk -v fld=6 -v tgt=62.3333 -f tst.awk file
63.4444
61.2222
62.3333
For the first part:
awk -v v1="11.6667" '$4>v1 {print $4;exit}' file
12.3333
And second part:
awk -v v2="62.9997" '$6>v2 {print p;exit} {p=$6}' file
62.3333
Both in one go:
awk -v v1="11.6667" -v v2="62.9997" '$4>v1 && !p1 {p1=$4} $6>v2 && !p2 {p2=p} {p=$6} END {print p1,p2}' file
12.3333 62.3333
I don't know if this is what you're looking for, but this is what I came up with, not knowing awk:
#!/bin/sh
IFSBAK=$IFS
IFS=$'\n'
best=
for line in `cat $1`; do
IFS=$' \t'
arr=($line)
num=${arr[5]}
[[ -z $best ]] && best=$num
if [ $(bc <<< "$num < 62.997") -eq 1 ]; then
if [ $(bc <<< "$best < $num") -eq 1 ]; then
best=$num
fi
fi
IFS=$'\n'
done
IFS=$IFSBAK
echo $best
If you want, you can pass the column and the target value 62.997 as parameters; I didn't, just to demonstrate that it looks for specifically what you asked about.
Edited to remove assumption that file is sorted.
Your solution looks unnecessarily complicated (storing a whole array and sorting it), and I think you would see the bash solution if you re-thought your awk.
In awk you can detect the first line with
FNR==1 {do something}
so on the first line, set a variable BestYet to the value in the column you are searching.
On subsequent lines, simply test if the value in the column you are checking is
a) less than your target AND
b) greater than `BestYet`
if it is, update BestYet. At the end, print BestYet.
In bash, apply the same logic, but read each line into a bash array and use ${a[n]} to get the n'th element.
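A sketch of that BestYet idea in awk, run against the question's data for the previous-nearest case (column 6, target 62.9997). It assumes, as the FNR==1 seeding implies, that the first line's value is below the target:

```shell
printf '%s\n' \
  '13 245 1323 10.1111 10.2222 60.1111 60.22222' \
  '13 133 2325 11.2222 11.333 61.2222 61.3333' \
  '13 245 1323 12.3333 12.4444 62.3333 62.44444444' \
  '13 245 1323 13.4444 13.5555 63.4444 63.5555' |
awk -v c=6 -v t=62.9997 '
  FNR == 1            { best = $c }   # seed with the first value
  $c < t && $c > best { best = $c }   # closer to the target from below: keep it
  END                 { print best }'
```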

Add leading zeroes to awk variable

I have the following awk command within a "for" loop in bash:
awk -v pdb="$pdb" 'BEGIN {file = 1; filename = pdb"_" file ".pdb"}
/ENDMDL/ {getline; file ++; filename = pdb"_" file ".pdb"}
{print $0 > filename}' < ${pdb}.pdb
This reads a series of files with the name $pdb.pdb and splits them in files called $pdb_1.pdb, $pdb_2.pdb, ..., $pdb_21.pdb, etc. However, I would like to produce files with names like $pdb_01.pdb, $pdb_02.pdb, ..., $pdb_21.pdb, i.e., to add padding zeros to the "file" variable.
I have tried without success using printf in different ways. Help would be much appreciated.
Here's how to create leading zeros with awk:
# echo 1 | awk '{ printf("%02d\n", $1) }'
01
# echo 21 | awk '{ printf("%02d\n", $1) }'
21
Replace the 2 in %02d with the total number of digits you need (including zeros).
Replace file on output with sprintf("%02d", file).
Or even the whole assignment with filename = sprintf("%s_%02d.pdb", pdb, file);.
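A minimal sketch of that sprintf substitution in isolation (the pdb name and file number here are made up):

```shell
awk -v pdb=model 'BEGIN {
  file = 3
  filename = sprintf("%s_%02d.pdb", pdb, file)   # zero-pads file to 2 digits
  print filename
}'
```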
This does it without resorting to printf, which is expensive. The first parameter is the string to pad, the second is the total length after padding.
echo 722 8 | awk '{ for(c = 0; c < $2; c++) s = s"0"; s = s$1; print substr(s, 1 + length(s) - $2); }'
If you know in advance the length of the result string, you can use a simplified version (say 8 is your limit):
echo 722 | awk '{ s = "00000000"$1; print substr(s, 1 + length(s) - 8); }'
The result in both cases is 00000722.
Here is a function that left- or right-pads values with zeroes depending on the parameters: zeropad(value, count, direction)
function zeropad(s,c,d) {
if(d!="r")
d="l" # l is the default and fallback value
return sprintf("%" (d=="l"? "0" c:"") "d" (d=="r"?"%0" c-length(s) "d":""), s,"")
}
{ # test main
print zeropad($1,$2,$3)
}
Some tests:
$ cat test
2 3 l
2 4 r
2 5
a 6 r
The test:
$ awk -f program.awk test
002
2000
00002
000000
It's not fully battlefield tested so strange parameters may yield strange results.
