NZEC Error in AWK programming - bash

I am trying to solve the SPOJ problem SIZECON using the awk programming language, with the code below:
awk ' {
    t = $1;
    while ( t-- ) {
        getline b;
        x += b * (b > 0);
        print x;
    }
    exit;
}'
OUTPUT (interactive run; the running sum is printed after each value, so input and output are interleaved):
4 (no. of test cases)
5
5
-5
5
6
11
-1
11
The expected INPUT and OUTPUT are:
Input:
4
5
-5
6
-1
Output:
11
The code works perfectly fine on my Linux system, but I get an NZEC error when submitting it on SPOJ. Can anyone help me? Thanks in advance.

This might be what you want:
$ awk 'NR<2{t=$0;next} $0>0{s+=$0} NR>t{print s+0;exit}' file
11
I originally was just going to test for t having a value but the requirements on that site just say it will be less than 1000 so I guess it could be zero.
Also you need to print s+0 to ensure you get a numeric value instead of a null string if t is zero or the file is empty.
NR<2 tests for the first input line. It would be more naturally written as NR==1 but I understand you are looking for brevity over clarity.
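To see the s+0 point locally, you can feed the script a count of 0 followed by a blank line (a contrived quick check; with plain print s you would get an empty line instead of 0):
$ printf '0\n\n' | awk 'NR<2{t=$0;next} $0>0{s+=$0} NR>t{print s+0;exit}'
0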
awk scripts are made of a series of <condition> { <action> } segments wrapped in an implicit while-read loop, so the posted script is equivalent to this pseudo-code in a procedural language:
while read line from file
do
    lineNr++
    if (lineNr < 2) {
        t = line
        continue      # "next" in awk
    }
    if (line > 0) {
        s += line
    }
    if (lineNr > t) {
        print s+0
        exit
    }
done
I think you should be able to figure the rest out given that and with google and the awk man pages when needed.

Given the set of integers, find the sum of all positive integers in it.
This is what you're doing? Seems pretty simple:
awk '
{
    if ( NR == 1 ) {
        total_to_read = $0 + 1
        next
    }
    if ( $0 > 0 ) total += $0
    if ( total_to_read == NR ) {
        print total
        exit
    }
}' test.txt
My first version just summed every positive value and printed the total in an END block; the END block is what runs once the loop is finished. In the loop itself I am simply taking each element and adding it to total if it's greater than 0.
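That earlier, END-based version isn't shown above; a rough reconstruction of it would be:
awk '{ if ( $0 > 0 ) total += $0 } END { print total }' test.txt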
It's not that simple. He needs to only read the number of values specified by the integer on the first line of input and he needs the briefest possible solution (excepting white space) - Ed Morton
My original answer was to show that you were overthinking Awk. Awk does the loop for you.
I've modified the above program to include the read-the-first-number requirement. No more END needed. I save the first value and go to the next line. When I get to the total number of lines to read, I print out the total and do exit, which ends my loop.
You can see this is actually equivalent to the pseudo-code given in Ed Morton's answer. It should be easier to understand.
Ed Morton pointed out that Awk can have a series of <expression> {code} segments. I always knew you could have one, but never thought of doing it multiple times.
This means I could use those patterns to imply the if statements instead of spelling them out, making the code a wee bit shorter:
awk '
( NR == 1 ) {
    total_to_read = $1 + 1
    next
}
( $0 > 0 ) { total += $0 }
( total_to_read == NR ) {
    print total
    exit
}' test.txt
To make it even shorter, we could use shorter variable names. Let's use t for total_to_read and s for total:
awk '
( NR == 1 ) {
    t = $1 + 1
    next
}
( $0 > 0 ) { s += $0 }
( t == NR ) {
    print s
    exit
}' test.txt
A few more tweaks. Instead of testing for equality with NR == 1, I'll do NR < 2. NR is the current record (line) number, and if NR is less than 2 it has to be 1; you can't have a zero or negative record number in your implied awk loop.
In my original program, I was adding 1 to t (total lines to read), then testing to exit if t == NR. If I don't add 1 to the total lines to read, I save a few characters, and I can test NR > t, which saves another character:
awk '
( NR < 2 ) {
    t = $0
    next
}
( $0 > 0 ) { s += $0 }
( NR > t ) {
    print s
    exit
}' test.txt
Now, let's eliminate all that useless whitespace and cram it all together!
awk 'NR<2{t=$0;next} $0>0{s+=$0} NR>t{print s+0;exit}' test.txt
And, I get Ed Morton's answer... Damn.
Well, at least I hope you understand this step-by-step explanation, and understand how Ed Morton's solution works.

Related

label the lines that have and do not have result in the next line

I have a list like this:
#chrom start end seq
#chrom start end seq
#chrom start end seq
chr1 214435102 214435132 AAACCGGTCAGCTT...
chr1 214435135 214435165 TCAATGGACTGTTC...
#chrom start end seq
chr1 214873901 214873931 CCAAATCCCTCAGG...
As can be seen, some of them have results (the 3rd and 4th) and some of them do not (the 1st and 2nd).
What I am trying to do is read each line that starts with '#chrom' and then look at the line after it. If the next line also starts with '#chrom', print 0; if it starts with something else, print 1. And do this for every line that starts with '#chrom' without skipping any.
I am basically trying to label the ones that have sequences. I am guessing there is an easier way of doing it, but what I have come up with so far is these two lines of code:
awk '/#chrom/{getline; print}' raw.txt > nextLine.txt
awk '$1 == "#chrom" { print "0" } $1 != "#chrom" { print "1" }' nextLine.txt > labeled.txt
Expected output in labeled.txt:
0
0
1
1
I think the second line of the code works well. However, the counts of lines containing '#chrom' in raw.txt and nextLine.txt do not match. If you could help me with that I would appreciate it.
Thank you
As in life, in software it's much easier to do things based on what HAS happened than on what WILL happen. So don't write requirements based on what the NEXT line of input will be; write them based on what the PREVIOUS line of input was. You'll find it much easier to figure out the matching code, and that code will be simpler than code that tries to determine the next line of input.
$ cat tst.awk
($1 == "#chrom") && (NR > 1) {
print ( prev == "#chrom" ? 0 : 1 )
}
{ prev = $1 }
END {
print ( prev == "#chrom" ? 0 : 1 )
}
$ awk -f tst.awk file
0
0
1
1
This should do it:
awk 'BEGIN { chrom=0 }
{
    if ($1 == "#chrom") {
        if (chrom == 1) print 0; else chrom = 1
    }
    else {
        if (chrom == 1) print 1
        chrom = 0
    }
}'
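Fed the sample raw.txt from the question, this prints the expected labels:
$ awk 'BEGIN { chrom=0 } { if ($1 == "#chrom") { if (chrom == 1) print 0; else chrom = 1 } else { if (chrom == 1) print 1; chrom = 0 } }' raw.txt
0
0
1
1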
One awk idea:
awk '
{ if (prev=="#chrom") # for 1st line of input prev==""
print ($1 == "#chrom" ? 0 : 1) # use ternary operator to determine output
prev=$1
}
' raw.txt
or as a one-liner:
awk '{if (prev=="#chrom") print ($1 == "#chrom" ? 0 : 1); prev=$1}' raw.txt
This generates:
0
0
1
1

Merging word counts with Bash and Unix

I made a Bash script that extracts words from a text file with grep and sed, then sorts them with sort, counts the repetitions with wc, and sorts again by frequency. The example output looks like this:
12 the
7 code
7 with
7 add
5 quite
3 do
3 well
1 quick
1 can
1 pick
1 easy
Now I'd like to merge all words with the same frequency into one line, like this:
12 the
7 code with add
5 quite
3 do well
1 quick can pick easy
Is there any way to do that with Bash and the standard Unix toolset, or would I have to write a script/program in some more sophisticated scripting language?
With awk:
$ echo "12 the
7 code
7 with
7 add
5 quite
3 do
3 well
1 quick
1 can
1 pick
1 easy" | awk '{cnt[$1]=cnt[$1] ? cnt[$1] OFS $2 : $2} END {for (e in cnt) print e, cnt[e]} ' | sort -nr
12 the
7 code with add
5 quite
3 do well
1 quick can pick easy
You can do something similar with Bash 4 associative arrays. awk is easier and POSIX though. Use that.
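For completeness, a rough Bash 4 sketch of that idea (reading the already-counted list from a file, here called counts.txt as a stand-in, and assuming one word per line as in the example):
declare -A byCount                      # Bash 4 associative array keyed by the count
while read -r cnt word; do
    byCount[$cnt]+=" $word"             # append each word to its count's entry
done < counts.txt
for cnt in "${!byCount[@]}"; do
    printf '%s%s\n' "$cnt" "${byCount[$cnt]}"
done | sort -nr                         # keys come out unordered, so sort by count again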
Explanation of the awk one-liner:
awk splits each line apart on the separator in FS, in this case the default of horizontal whitespace;
$1 is the first field, the count; use that to collect items with the same count in an associative array keyed by the count, cnt[$1];
cnt[$1]=cnt[$1] ? cnt[$1] OFS $2 : $2 is a ternary assignment: if cnt[$1] has no value yet, just assign the second field $2 to it (the right-hand side of the :); if it already has a value, append $2 separated by the value of OFS (the left-hand side of the :); a small demo of this accumulation appears after this list;
At the end, print out the value of the associative array.
Since awk associative arrays are unordered, you need to sort again by the numeric value of the first column. gawk can sort internally, but it is just as easy to call sort. The input to awk does not need to be sorted, so you can eliminate that part of the pipeline.
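Here is that small demo, showing the ternary accumulation in isolation on just the three 7-count lines:
$ printf '7 code\n7 with\n7 add\n' | awk '{cnt[$1]=cnt[$1] ? cnt[$1] OFS $2 : $2} END{print cnt["7"]}'
code with add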
If you want the digits to be right justified (as you have in your example):
$ awk '{cnt[$1]=cnt[$1] ? cnt[$1] OFS $2 : $2}
END {for (e in cnt) printf "%3s %s\n", e, cnt[e]} '
If you want gawk to sort numerically by descending values, you can add PROCINFO["sorted_in"]="#ind_num_desc" prior to traversing the array:
$ gawk '{cnt[$1]=cnt[$1] ? cnt[$1] OFS $2 : $2}
END {PROCINFO["sorted_in"]="#ind_num_desc"
for (e in cnt) printf "%3s %s\n", e, cnt[e]} '
With a single GNU awk expression (without the sort pipeline):
awk 'BEGIN{ PROCINFO["sorted_in"]="#ind_num_desc" }
{ a[$1]=(a[$1])? a[$1]" "$2:$2 }END{ for(i in a) print i,a[i]}' file
The output:
12 the
7 code with add
5 quite
3 do well
1 quick can pick easy
Bonus alternative solution using GNU datamash tool:
datamash -W -g1 collapse 2 <file
The output (comma-separated collapsed fields):
12 the
7 code,with,add
5 quite
3 do,well
1 quick,can,pick,easy
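If you prefer the space-separated layout of the awk answers, one option (assuming the words themselves never contain commas) is to translate the commas afterwards:
datamash -W -g1 collapse 2 <file | tr ',' ' '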
awk:
awk '{a[$1]=a[$1] FS $2}!b[$1]++{d[++c]=$1}END{while(i++<c)print d[i],a[d[i]]}' file
sed:
sed -r ':a;N;s/(\b([0-9]+).*)\n\s*\2/\1/;ta;P;D'
You start with sorted data, so you only need a new line when the first field changes.
echo "12 the
7 code
7 with
7 add
5 quite
3 do
3 well
1 quick
1 can
1 pick
1 easy" |
awk '
{
    if ($1==last) {
        printf(" %s",$2)
    } else {
        last=$1;
        printf("%s%s",(NR>1?"\n":""),$0)
    }
}
END {print ""}'
Next time you find yourself trying to manipulate text with a combination of grep and sed and shell and..., stop and just use awk instead - the end result will be clearer, simpler, more efficient, more portable, etc.
$ cat file
It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness.
.
$ cat tst.awk
BEGIN { FS="[^[:alpha:]]+" }
{
for (i=1; i<NF; i++) {
word2cnt[tolower($i)]++
}
}
END {
for (word in word2cnt) {
cnt = word2cnt[word]
cnt2words[cnt] = (cnt in cnt2words ? cnt2words[cnt] " " : "") word
printf "%3d %s\n", cnt, word
}
for (cnt in cnt2words) {
words = cnt2words[cnt]
# printf "%3d %s\n", cnt, words
}
}
$
$ awk -f tst.awk file | sort -rn
4 was
4 the
4 of
4 it
2 times
2 age
1 worst
1 wisdom
1 foolishness
1 best
.
$ cat tst.awk
BEGIN { FS="[^[:alpha:]]+" }
{
for (i=1; i<NF; i++) {
word2cnt[tolower($i)]++
}
}
END {
for (word in word2cnt) {
cnt = word2cnt[word]
cnt2words[cnt] = (cnt in cnt2words ? cnt2words[cnt] " " : "") word
# printf "%3d %s\n", cnt, word
}
for (cnt in cnt2words) {
words = cnt2words[cnt]
printf "%3d %s\n", cnt, words
}
}
$
$ awk -f tst.awk file | sort -rn
4 it was of the
2 age times
1 best worst wisdom foolishness
Just uncomment whichever printf line you like in the above script to get whichever type of output you want. The above will work in any awk on any UNIX system.
Using miller's nest verb:
mlr -p nest --implode --values --across-records -f 2 --nested-fs ' ' file
Output:
12 the
7 code with add
5 quite
3 do well
1 quick can pick easy

not getting array value in awk

I want to insert array values, along with all the other contents of testfile.ps, into a result.ps file, but the array values are not getting printed. Please help.
My requirement is that every time the condition is met, the next array index value should get printed along with the other contents of testfile.ps into result.ps.
Actually arr[0] and arr[1] are big strings in my project, but for simplicity I have shortened them here.
#!/bin/bash
a[0]=""lineto""\n""stroke""
a[1]=""476.00"" ""26.00""
awk '{ if($1 == "(Page" ){for (i=0; i<2; i++){print $arr[i]; print $0; }}
else print }' testfile.ps > result.ps
testfile.ps
(Page 1 of 2 )
move
(Page 1 of 3 )
"gsave""\n""2.00"" ""setlinewidth""\n"
result.ps should be
(Page 1 of 2 )
lineto
stroke
move
(Page 1 of 3 )
476.00 26.00
gsave
2.00
setlinewidth
That means once the condition is met the second time, the array index should be incremented to 1 and it should print a[1].
I tried this approach too, with only a single array element, but I am not getting any output:
awk -v "a0=$a[0]" 'BEGIN {a[0]=""lineto""stroke""; if($1 == "move" ){for (i in a){ print a0;print $0; }} else print }' testfile.txt
Edited:
Hi, I have resolved the issue to some extent but am stuck at one place: how can I compare two strings like a="476.00 1.00 lineto\nstroke\ngrestore\n" and b="26.00 moveto\n368.00 1.00 lineto\n" in an awk command? I am trying:
awk -v "a=476.00 1.00 lineto\nstroke\ngrestore\n" -v "b=26.00 moveto\n368.00 1.00 lineto\n" -v "i=$a" '{
if ($1 == "(Page" && ($2%2==0 || $2==1) && $3 == "of"){
print i;
if [ i == a ];then
i=b; print $0;
fi
else if [ i == b ];then
i=c; print $0;
fi
else print $0;
}'testfile.txt
In your awk program you are using a variable arr which is never initialized.
In your case, you want to pass a variable from the shell to awk. From the awk man page:
-v var=val
--assign var=val
Assign the value val to the variable var, before execution of the program begins. Such
variable values are available to the BEGIN rule of an AWK program.
Hence, you need something like
awk -v "a0=$a[0]" -v "a1=$a[1]" .....
and in a BEGIN block, you can set up your array arr from the variables a0 and a1 in any way you want.
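A minimal sketch of that idea, with simplified strings standing in for the real PostScript fragments (note that awk applies escape-sequence processing to -v values, so the \n becomes a real newline):
a[0]='lineto\nstroke'
a[1]='476.00 26.00'
awk -v "a0=${a[0]}" -v "a1=${a[1]}" '
    BEGIN { arr[0] = a0; arr[1] = a1 }            # build the awk array from the shell values
    { print }                                     # copy every input line through
    $1 == "(Page" && i < 2 { print arr[i++] }     # after each (Page line, emit the next array entry
' testfile.ps > result.ps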
Gather the data into a single variable using a separator:
$ awk -v s="lineto\nstroke;476.00 26.00" ' # ; as separator
BEGIN{ n=split(s,a,";") } # split s var to a array
1 # output record
/\(Page/ && i<n { print a[++i] } # if (Page and still data in a
' file
(Page 1 of 2 )
lineto
stroke
move
(Page 1 of 3 )
476.00 26.00
"gsave""\n""2.00"" ""setlinewidth""\n"

Awk number comparison in Bash

I'm trying to print records from line number 10 to 15 of the input file number_src. I tried the code below but it prints all records irrespective of line number.
awk '{
count++
if ( $count >= 10 ) AND ( $count <= 15 )
{
printf "\n" $0
}
}' number_src
awk is not bash, just like C is not bash; they are separate tools/languages with their very own syntax and semantics:
awk 'NR>=10 && NR<=15' number_src
Get the book Effective Awk Programming, by Arnold Robbins.
Two issues why your script is not working:
logical AND should be &&.
when referencing the variable, use count, not $count.
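In awk, prefixing a name with $ gives you the field at that position, not the variable's value; a quick illustration:
$ echo 'a b c' | awk '{ count = 2; print count, $count }'
2 b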
Here is a working version:
awk '{
    count++
    if ( count >= 10 && count <= 15 )
    {
        print $0
    }
}' number_src
As stated in the quickest answer, NR is the awk way to do the same thing for your task.
For further information, please see the relevant documentation entries about boolean expressions and using variables.
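Since NR already holds the current record number, no manual counter is needed at all; for example, on a small stream:
$ printf 'a\nb\nc\nd\ne\n' | awk 'NR>=2 && NR<=4'
b
c
d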

How can I remove selected lines with an awk script?

I'm piping a program's output through some awk commands, and I'm almost where I need to be. The command thus far is:
myprogram | awk '/chk/ { if ( $12 > $13) printf("%s %d\n", $1, $12 - $13); else printf("%s %d\n", $1, $13 - $12) } ' | awk '!x[$0]++'
The last bit is a poor man's uniq, which isn't available on my target. The command above produces output such as this:
GR_CB20-chk_2, 0
GR_CB20-chk_2, 3
GR_CB200-chk_2, 0
GR_CB200-chk_2, 1
GR_HB20-chk_2, 0
GR_HB20-chk_2, 6
GR_HB20-chk_2, 0
GR_HB200-chk_2, 0
GR_MID20-chk_2, 0
GR_MID20-chk_2, 3
GR_MID200-chk_2, 0
GR_MID200-chk_2, 2
What I'd like to have is this:
GR_CB20-chk_2, 3
GR_CB200-chk_2, 1
GR_HB20-chk_2, 6
GR_HB200-chk_2, 0
GR_MID20-chk_2, 3
GR_MID200-chk_2, 2
That is, I'd like to print only the line that has the maximum value for a given tag (the first 'field'). The above example is representative of the actual data in that the output will be sorted (as though it had been piped through a sort command).
Based on my answer to a similar need, this script keeps things in order and doesn't accumulate a big array. It prints the line with the highest value from each group.
#!/usr/bin/awk -f
{
    s = substr($0, 0, match($0, /,[^,]*$/))
    if (s != prevs) {
        if ( FNR > 1 ) print prevline
        prevval = $2
        prevline = $0
    }
    else if ( $2 > prevval ) {
        prevval = $2
        prevline = $0
    }
    prevs = s
}
END {
    print prevline
}
If you don't need the items to be in the same order they were output from myprogram, the following works:
... | awk '{ if ($2 > x[$1]) x[$1] = $2 } END { for (k in x) printf "%s %s\n", k, x[k]+0 }'
