Awk number comparison in Bash

I'm trying to print records from line 10 to line 15 of the input file number_src. I tried the code below, but it prints all records irrespective of line number.
awk '{
    count++
    if ( $count >= 10 ) AND ( $count <= 15 )
    {
        printf "\n" $0
    }
}' number_src

awk is not bash, just as C is not bash; they are separate tools/languages with their own syntax and semantics:
awk 'NR>=10 && NR<=15' number_src
Get the book Effective Awk Programming, by Arnold Robbins.
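If you want to sanity-check that one-liner without the original input file, seq can stand in for it (seq is my addition, not part of the answer):
$ seq 20 | awk 'NR>=10 && NR<=15'
10
11
12
13
14
15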

Two issues prevent your script from working:
the logical AND operator is &&, not AND.
reference a variable as count, not $count; in awk, $ is the field-access operator, so $count means "the field whose number is the current value of count" (see the demo below).
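A quick demonstration of $ as the field-access operator (a made-up one-liner, not from the original post):
$ echo 'a b c' | awk '{ n = 2; print $n }'
b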
Here is a working version:
awk '{
    count++
    if ( count >= 10 && count <= 15 )
    {
        print $0
    }
}' number_src
As stated in the quicker answer above, NR is the awk way to do the same thing for your task.
For further information, please see the relevant documentation entries about boolean expressions and using variables.

Related

Trying to obtain the mean of array values using awk

I'm new to bash programming. Here I'm trying to obtain the mean of the values in an array.
Here's what I'm trying:
${GfieldList[@]} | awk '{ sum += $1; n++ } END { if (n > 0) print "mean: " sum / n; }';
Using $1 I'm not able to get all the values. Please help me out with this.
For each non-empty line of input, this will sum everything on the line and print the mean:
$ echo 21 20 22 | awk 'NF {sum=0;for (i=1;i<=NF;i++)sum+=$i; print "mean=" sum / NF; }'
mean=21
How it works
NF
This serves as a condition: the statements which follow will only be executed if the number of fields on this line, NF, evaluates to true, meaning non-zero.
sum=0
This initializes sum to zero. This is only needed if there is more than one line.
for (i=1;i<=NF;i++)sum+=$i
This sums all the fields on this line.
print "mean=" sum / NF
This prints the sum of the fields divided by the number of fields.
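To see the NF condition skip a blank line, feed the same script input containing one (a quick check of my own, not from the original answer):
$ printf '2 4\n\n10 20 30\n' | awk 'NF {sum=0;for (i=1;i<=NF;i++)sum+=$i; print "mean=" sum / NF; }'
mean=3
mean=20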
The bare
${GfieldList[@]}
will not print the array to the screen. You want this:
printf "%s\n" "${GfieldList[@]}"
All those quotes are definitely needed.
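Putting the two pieces together (assuming GfieldList holds one numeric value per element):
printf "%s\n" "${GfieldList[@]}" | awk '{ sum += $1; n++ } END { if (n > 0) print "mean: " sum / n }'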

NZEC Error in AWK programming

I am trying to solve the SPOJ problem SIZECON in awk, using the code below:
awk '{
    t = $1;
    while ( t-- ) {
        getline b;
        x += b * (b > 0);
        print x;
    }
    exit;
}'
OUTPUT (an interactive run, so my typed input and the program's output are interleaved):
4     <- input: number of test cases
5     <- input
5     <- output
-5    <- input
5     <- output
6     <- input
11    <- output
-1    <- input
11    <- output
The expected input and output are:
Input:
4
5
-5
6
-1
Output:
11
The code works perfectly fine on my Linux system, but I get an error (NZEC) when submitting it to SPOJ. Can anyone help me? Thanks in advance.
This might be what you want:
$ awk 'NR<2{t=$0;next} $0>0{s+=$0} NR>t{print s+0;exit}' file
11
I originally was just going to test for t having a value, but the requirements on that site only say it will be less than 1000, so I guess it could be zero.
Also, you need to print s+0 to ensure you get a numeric value instead of a null string if t is zero or the file is empty.
NR<2 tests for the first input line. It would be more naturally written as NR==1, but I understand you are looking for brevity over clarity.
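To see the s+0 coercion in isolation (a quick check of my own; s is deliberately never set):
$ echo x | awk '{ print "[" s "]"; print "[" s+0 "]" }'
[]
[0]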
awk scripts are made of a series of <condition> { <action> } segments, wrapped in an implicit read loop, so the posted script is equivalent to this pseudo-code in a procedural language:
while read line from file
do
    lineNr++
    if (lineNr < 2) {
        t = line
        next
    }
    if (line > 0) {
        s += line
    }
    if (lineNr > t) {
        print s+0
        exit
    }
done
I think you should be able to figure the rest out given that, plus Google and the awk man pages when needed.
Given the set of integers, find the sum of all positive integers in it.
This is what you're doing? Seems pretty simple:
awk '
{
    if ( NR == 1 ) {
        total_to_read = $0 + 1
        next
    }
    if ( $0 > 0 ) total += $0
    if ( total_to_read == NR ) {
        print total
        exit
    }
}' test.txt
An END block holds what you want to do after the loop finishes; my first version printed the total there. In the loop I simply take each element and add it to total if it's greater than 0.
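For reference, a minimal example of an END block (not the original program; seq just generates test input):
$ seq 3 | awk '{ s += $0 } END { print "sum: " s }'
sum: 6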
It's not that simple. He needs to read only the number of values specified by the integer on the first line of input, and he needs the briefest possible solution (whitespace excepted). - Ed Morton
My original answer was meant to show that you were overthinking awk; awk does the loop for you.
I've modified the program above to honor the read-only-the-first-number requirement, so no END is needed. I save the first value and go to the next line. When I reach the total number of lines to read, I print the total and exit, which ends my loop.
You can see this is actually equivalent to the pseudo-code given in Ed Morton's answer. It should be easier to understand.
Ed Morton pointed out that awk scripts can have a series of <expression> {code} segments. I always knew you could have one, but never thought of using several.
This means I can use those expressions to imply the if statements instead of spelling them out, making the code a wee bit shorter:
awk '
( NR == 1 ) {
    total_to_read = $1 + 1
    next
}
( $0 > 0 ) { total += $0 }
( total_to_read == NR ) {
    print total
    exit
}' test.txt
To make it even shorter, we could use shorter variable names. Let's use t for total_to_read and s for total:
awk '
( NR == 1 ) {
    t = $1 + 1
    next
}
( $0 > 0 ) { s += $0 }
( t == NR ) {
    print s
    exit
}' test.txt
A few more tweaks. Instead of the equality test NR == 1, I'll use NR < 2. NR is the record number, and if it is less than 2, it must be 1; you can't have a zero or negative record number in awk's implicit loop.
In my original program, I was adding 1 to t (total lines to read), then testing t == NR to exit. If I don't add 1 to the total lines to read, I save a few characters, and I can test NR > t, which saves another character:
awk '
( NR < 2 ) {
    t = $0
    next
}
( $0 > 0 ) { s += $0 }
( NR > t ) {
    print s
    exit
}' test.txt
Now, let's eliminate all that useless whitespace and cram it all together!
awk 'NR<2{t=$0;next} $0>0{s+=$0} NR>t{print s+0;exit}' test.txt
And, I get Ed Morton's answer... Damn.
Well, at least I hope this step-by-step explanation helps you understand how Ed Morton's solution works.

How to efficiently sum two columns in a file with 270,000+ rows in bash

I have a file with two data columns, and I want to automate summing the two values in each row. For example:
read write
5 6
read write
10 2
read write
23 44
I then want to sum the "read" and "write" values of each row. After summing, I find the maximum sum and write that maximum value to a file. I feel like I have to use grep -v to strip the column headers from every other row, which, as noted in the answers, makes the code inefficient, since I'm grepping the entire file just to read one line.
I currently have this in a bash script (inside a for loop where $x is the file name) to sum the columns line by line:
lines=`grep -v READ $x | wc -l | awk '{print $1}'`
line_num=1
arr_num=0
while [ $line_num -le $lines ]
do
    arr[$arr_num]=`grep -v READ $x | sed $line_num'q;d' | awk '{print $2 + $3}'`
    echo $line_num
    line_num=$[$line_num+1]
    arr_num=$[$arr_num+1]
done
However, the file to be summed has 270,000+ rows. The script has been running for a few hours now, and it is nowhere near finished. Is there a more efficient way to write this so that it does not take so long?
Use awk instead and take advantage of the modulus operator:
awk '!(NR%2){print $1+$2}' infile
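With the sample input above (headers on odd lines, data on even lines), NR%2 is 0 exactly on the data rows, so this prints the three row sums:
$ awk '!(NR%2){print $1+$2}' infile
11
12
67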
awk is probably faster, but the idiomatic bash way to do this is something like:
while read -r -a line; do # read each line, one by one, into an array
    # use arithmetic expansion to add col 1 and 2
    echo "$(( ${line[0]} + ${line[1]} ))"
done < <(grep -v READ input.txt)
Note that the input file is only read once (by grep) and the number of externally forked programs is kept to a minimum (just grep, called only once for the whole input file). The rest of the commands are bash builtins.
The <( ) process substitution is used in case variables set in the while loop are required outside its scope; otherwise a | pipe could be used.
Your question is pretty verbose, yet your goal is not entirely clear. The way I read it, your numbers are on every second line, and you want only the maximum sum. Given that:
awk '
NR%2 == 1 {next}
NR == 2 {max = $1+$2; next}
$1+$2 > max {max = $1+$2}
END {print max}
' filename
You could also use a pipeline with tools that implicitly loop over the input like so:
grep -v read INFILE | tr -s ' ' + | bc | sort -rn | head -1 > OUTFILE
This assumes there are spaces between your read and write data values.
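To see what each stage produces with the sample data (tr -s ' ' + squeezes the spaces into plus signs, and bc evaluates each resulting expression):
$ grep -v read INFILE | tr -s ' ' +
5+6
10+2
23+44
$ grep -v read INFILE | tr -s ' ' + | bc
11
12
67
sort -rn | head -1 then picks the largest sum, 67.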
Why not run:
awk 'NR==1 { print "sum"; next } /read/ { next } { print $1 + $2 }'
(The /read/ { next } clause skips the repeated header rows.)
You can afford to run it on the file while the other script is still running. It'll be complete in a few seconds at most (prediction). When you're confident it's right, you can kill the other process.
You can use Perl or Python instead of awk if you prefer.
Your code is running grep, sed and awk on each line of the input file; that's damnably expensive. And it isn't even writing the data to a file; it is building an array in bash's memory that'll need to be printed to the output file later.
Assuming that it's always one 'header' row followed by one 'data' row:
awk '
BEGIN { max = 0 }
{
    if ( NR%2 == 0 ) {
        sum = $1 + $2
        if ( sum > max ) { max = sum }
    }
}
END { print max }' input.txt
Or simply trim out all lines that do not conform to what you want:
grep '^[0-9]\+\s\+[0-9]\+$' input.txt | awk '
BEGIN { max = 0 }
{
    sum = $1 + $2
    if ( sum > max ) { max = sum }
}
END { print max }'

increment values in column within file with bash, sed and awk

Please find below an excerpt from one of my files.
1991;1;-7;-3;-9;-4;-7
1991;1;-7;-3;-9;-4;-7
1991;1;-7;-3;-9;-4;-7
1991;2;-14;-11;-14;-4;-14
1991;2;-14;-11;-14;-4;-14
1991;2;-14;-11;-14;-4;-14
1991;3;-7;-3;-15;5;-7
1991;3;-7;-3;-15;5;-7
1991;3;-7;-3;-15;5;-7
1991;4;-15;-9;-21;1;-16
1991;4;-15;-9;-21;1;-16
1991;4;-15;-9;-21;1;-16
1992;1;-12;-6;-19;-2;-12
1992;1;-12;-6;-19;-2;-12
1992;1;-12;-6;-19;-2;-12
1992;2;-16;-7;-22;-12;-15
1992;2;-16;-7;-22;-12;-15
1992;2;-16;-7;-22;-12;-15
1992;3;-22;-15;-25;-16;-24
1992;3;-22;-15;-25;-16;-24
Using sed and/or awk, I'm trying to renumber the second column sequentially, adding 1 on each successive row and restarting from 1 whenever the year in the first column changes.
The results would be the following:
1991;1;-7;-3;-9;-4;-7
1991;2;-7;-3;-9;-4;-7
1991;3;-7;-3;-9;-4;-7
1991;4;-14;-11;-14;-4;-14
1991;5;-14;-11;-14;-4;-14
1991;6;-14;-11;-14;-4;-14
1991;7;-7;-3;-15;5;-7
1991;8;-7;-3;-15;5;-7
1991;9;-7;-3;-15;5;-7
1991;10;-15;-9;-21;1;-16
1991;11;-15;-9;-21;1;-16
1991;12;-15;-9;-21;1;-16
1992;1;-12;-6;-19;-2;-12
1992;2;-12;-6;-19;-2;-12
1992;3;-12;-6;-19;-2;-12
1992;4;-16;-7;-22;-12;-15
1992;5;-16;-7;-22;-12;-15
1992;6;-16;-7;-22;-12;-15
1992;7;-22;-15;-25;-16;-24
1992;8;-22;-15;-25;-16;-24
I've seen countless examples on Stack Overflow, but nothing that gets me close to a solution.
I welcome any suggestions.
Best,
If you always want the 2nd column to be 1 for the line in which the year first appears in column 1, then:
awk -F\; '$1!=l{c=0}{$2=++c}{l=$1}1' OFS=\; input
If you want to maintain whatever was in column 2:
awk -F\; '$1!=l{c=$2}{$2=c++}{l=$1}1' OFS=\; input
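For readability, here is the first one-liner expanded with comments (equivalent logic; input stands for your data file):
awk -F';' -v OFS=';' '
$1 != l { c = 0 }   # new year in column 1: reset the counter
{ $2 = ++c }        # overwrite column 2 with the incremented counter
{ l = $1 }          # remember the current year
1                   # always-true condition: print the (modified) line
' input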
This could be done more tersely with awk, but pure bash works fine:
last_year=
counter_val=
while IFS=';' read -r year old_counter rest; do
    if [[ $year = "$last_year" ]]; then
        (( ++counter_val ))
    else
        counter_val=1
        last_year=$year
    fi
    printf -v result '%s;' "$year" "$counter_val" "$rest"
    printf '%s\n' "${result%;}"
done <input.txt >output.txt
You simply want to renumber your second column, not add one to whatever it contains? That is, you want the second column to count from one onward, no matter what it originally held?
awk -F\; '{
    if ( NR == 1 ) {
        year = $1
    }
    if ( year == $1 ) {
        for (count = 1; count <= NF; count++) {
            if ( count == 2 )
                printf "%s", NR
            else
                printf "%s", $count
            if ( count < NF )
                printf ";"
        }
        print ""
    }
    else {
        print
    }
}' test.txt
awk is a natural program to use because it operates inside an implicit loop, and its math is more natural than plain shell's.
NR means Number of Records and NF means Number of Fields. Fields are separated according to the -F\; parameter, and each record is one line of the file. The rest of the program is fairly self-explanatory.
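A quick illustration of NR and NF with the semicolon separator (a throwaway example of my own):
$ printf 'a;b;c\nd;e\n' | awk -F\; '{ print "record " NR " has " NF " fields" }'
record 1 has 3 fields
record 2 has 2 fields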
Using awk, set FS (the field separator) and OFS (the output field separator) to ';'. For each record with a new year, set the val counter to the starting column-2 value; then increment val on every following line with that year.
awk -F';' 'BEGIN { OFS=";"; y=0 }
{
    if (y != $1)
        { y = $1; val = $2; print }
    else
        { val++; print $1, val, $3, $4, $5, $6, $7 }
}' data_file

Filter a file using shell script tools

I have a file whose contents are:
E006:Jane:HR:9800:Asst
E005:Bob:HR:5600:Exe
E002:Barney:Purc:2300:PSE
E009:Miffy:Purc:3600:Mngr
E001:Franny:Accts:7670:Mngr
E003:Ostwald:Mrktg:4800:Trainee
E004:Pearl:Accts:1800:SSE
E009:Lala:Mrktg:6566:SE
E018:Popoye:Sales:6400:QAE
E007:Olan:Sales:5800:Asst
I want to list all employees whose employee codes are between E001 and E018, using a command (pipes included). Is it possible?
Use sed:
sed -n -e '/^E001:/,/^E018:/p' data.txt
That is, print the lines that are literally between those lines that start with E001 and E018.
If you want to get the employees that are numerically between those codes, one way would be to do the comparisons inline using something like awk (as suggested by hochl). Or you could take this approach, preceded by a sort (if the lines are not already sorted):
sort data.txt | sed -n -e '/^E001:/,/^E018:/p'
You can use awk for such cases:
$ gawk 'BEGIN { FS=":" } /^E([0-9]+)/ { n=substr($1, 2)+0; if (n >= 6 && n <= 18) { print } }' < data.txt
E006:Jane:HR:9800:Asst
E009:Miffy:Purc:3600:Mngr
E009:Lala:Mrktg:6566:SE
E018:Popoye:Sales:6400:QAE
E007:Olan:Sales:5800:Asst
Is that the result you want? This example intentionally prints only the employees between 6 and 18 to show that it filters out records. You can print selected fields using $1 or $2, as in print $1 " " $2.
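For instance, to print just the code and name fields from the sample data:
$ gawk -F: '{ print $1 " " $2 }' data.txt
E006 Jane
E005 Bob
E002 Barney
(and so on for the remaining lines)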
You can try something like this: cut -b2- data.txt | awk '{ if ($1+0 >= 1 && $1+0 <= 18) print "E" $0 }'. The +0 forces a numeric comparison of the leading digits, and <= keeps E018 in the range.
Just do string comparison. Since all of your sample data falls within the E001..E018 range, I changed the boundaries for illustration:
awk -F: '"E004" <= $1 && $1 <= "E009" {print}'
output
E006:Jane:HR:9800:Asst
E005:Bob:HR:5600:Exe
E009:Miffy:Purc:3600:Mngr
E004:Pearl:Accts:1800:SSE
E009:Lala:Mrktg:6566:SE
E007:Olan:Sales:5800:Asst
You can pass the strings as variables if you don't want to hard-code them in the awk script
awk -F: -v start=E004 -v stop=E009 'start <= $1 && $1 <= stop {print}'
