Having trouble with simple awk script - bash

This works in the command line:
awk -F, '$5>1900 {print $2}' presidents.csv
But I am not sure how to do it in an awk script.
This is what I have tried so far in the script.
BEGIN {FS=","}
if ($5>1900)
{print $2}

The stuff you have in single quotes:
awk -F, '$5>1900 {print $2}' presidents.csv
Is an AWK script. Just stick that in a file:
BEGIN {FS=","}
$5>1900 {print $2}

An AWK program is a series of pattern { action } rules: AWK tests each input line against the pattern and runs the action when it matches. Either the pattern or the action can be omitted, but not both.
In your case the first problem is a syntax error: the if statement is not enclosed in braces ( {} ), so AWK tries to parse it as a pattern rather than as part of an action. A relational expression can serve as the pattern, so instead of an if statement you can write the condition directly, as below.
$ cat test.awk
#!/usr/bin/awk -f
BEGIN {
FS=",";
}
$5 > 1900 {
print $2;
}
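To see the script-file approach end to end, here is a minimal sketch. The layout of presidents.csv is not shown in the question, so the sample rows and column order below are assumptions, as is the file name after1900.awk.

```shell
# Hypothetical sample data; assumed columns: number,name,state,party,year
workdir=$(mktemp -d)
cat > "$workdir/presidents.csv" <<'EOF'
1,George Washington,Virginia,None,1789
26,Theodore Roosevelt,New York,Republican,1901
35,John F. Kennedy,Massachusetts,Democratic,1961
EOF

# The awk program lives in its own file, so no shell quoting is needed inside it.
cat > "$workdir/after1900.awk" <<'EOF'
BEGIN { FS = "," }
$5 > 1900 { print $2 }
EOF

# -f tells awk to read the program from a file instead of the command line.
result=$(awk -f "$workdir/after1900.awk" "$workdir/presidents.csv")
printf '%s\n' "$result"
rm -r "$workdir"
```

With these assumed rows, only the names whose fifth field is greater than 1900 are printed.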

An AWK script would look like this:
#!/usr/bin/awk -f
BEGIN {FS=",";}
{
if ($5>1900)
{print $2;}
}
That is, you need to enclose the if statement inside curly braces.

If you'd like a shell script that runs your awk command you can put the following into a file:
#!/bin/bash
awk -F, '$5>1900 {print $2}' "$1"
To make it so you can actually execute the script, you can run:
chmod +x ./script_file_name_here
Then you can run
./script_file_name_here presidents.csv
and have the same thing happen.

Related

delete all line after a specific date

I have a lot of *.csv files. I want to delete everything after a specific date: all lines dated after 20031231 should go.
How can I solve this with a few lines of shell script?
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
Test,20040101,000100,0.73342,0.744318
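Quick and dirty, without any other information about the constraints. Note that the range ends at the first line after line 1 that contains 20031231, so later lines with the same date are dropped as well: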
sed '1,/20031231/p;d' YourFile
If you want to use a shell script, the best is to use awk. This will do the trick:
awk 'BEGIN {FS=","} {if ($2 == "20031231") print $0}' input.csv > output.csv
This writes only the lines dated 20031231 to a different file.
This version ignores empty lines and unmatched data.
awk file:
$ cat awk.awk
{
if($2<="20031231" && $0!=""){
print $0
}else{
next
}
}
execution:
$ awk -F',' -f awk.awk input
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
one liner:
$ awk -F',' '{if($2<="20031231" && $0!=""){print $0}else{next}}' input
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
with Miller (http://johnkerl.org/miller/doc/)
mlr --nidx --fs "," filter '$2>20031231' input
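gives you the lines after that date (invert the comparison to $2<=20031231 to keep the earlier ones instead):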
Test,20040101,000100,0.73342,0.744318
With awk please try:
awk -F, '$2<=20031231' input.csv
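Since the question mentions many *.csv files, the one-liner above can be wrapped in a loop. This is only a sketch: the file names a.csv and b.csv, the extra sample rows, and the cutoff variable are made up for illustration, and each file is rewritten through a temporary copy.

```shell
workdir=$(mktemp -d)
cat > "$workdir/a.csv" <<'EOF'
Test,20031231,000107,0.74843,0.74813
Test,20031231,000110,0.74842,0.74818
Test,20040101,000100,0.73342,0.744318
EOF
cat > "$workdir/b.csv" <<'EOF'
Test,20031230,000100,0.74000,0.74100
Test,20040102,000100,0.73000,0.73100
EOF

cutoff=20031231   # last date to keep (hypothetical variable name)
for f in "$workdir"/*.csv; do
    # Keep rows dated on or before the cutoff, then replace the original file.
    awk -F, -v cutoff="$cutoff" '$2 <= cutoff' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

# Count the surviving lines in each file.
kept_a=$(awk 'END { print NR }' "$workdir/a.csv")
kept_b=$(awk 'END { print NR }' "$workdir/b.csv")
echo "a.csv: $kept_a lines, b.csv: $kept_b lines"
rm -r "$workdir"
```

Writing to a temporary file first avoids truncating the input while awk is still reading it.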

Why awk if conditional matching is wrong

In my project, I have two files.
The content of userid is:
6534
4524
4522
6635
The content of userpwinfo.txt is:
nsgg315_RJ:x:4520:100::/home-gg/users/nsgg315_RJ:/bin/bash
nsgg316_ZJY:x:4521:100::/home-gg/users/nsgg316_ZJY:/bin/bash
nsgg317_CPA:x:4522:100::/home-gg/users/nsgg317_CPA:/bin/bash
nsgg318_ZRL:x:4523:100::/home-gg/users/nsgg318_ZRL:/bin/bash
nsgg319_YYM:x:4524:100::/home-gg/users/nsgg319_YYM:/bin/bash
Now I want to print the usernames whose id is in userid. I wrote a bash script like:
for i in $(cat userid)
do
#username=`awk -F: '{if($3=="$i") print $1}' /root/userpwinfo.txt`
#username=`awk -F: '$3=="$i" {print $1}' /root/userpwinfo.txt`
#username=`awk -F: '{if($3~/$i/) print $1}' /root/userpwinfo.txt`
username=`awk -F: '{if($3==$i) print $1}' /root/userpwinfo.txt`
echo $username
done
Unfortunately, it shows nothing. The correct result should be:
nsgg319_YYM
nsgg317_CPA
I have tried in command line:
awk -F: '{if($3==4524) print $1}' /root/userpwinfo.txt
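That works.
Maybe if($3==$i) is wrong inside the shell script. Can anyone help?
Your $i is a shell variable, but it sits inside single quotes, so the shell never expands it. awk sees the literal text $i and interprets it itself: i is an unset awk variable, so $i means $0 (the whole line), which never equals $3 here.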
Try this:
username=`awk -F: '{if($3=='$i') print $1}' /root/userpwinfo.txt`
Note that the $i now sits between two ' marks that close and reopen the single-quoted string, so it is outside the part passed literally to awk and is expanded by the shell.
Also note that if $i is ever empty, the awk program becomes if($3==), which is invalid and yields an error.
I'd also point out that an awk rule consists of a pattern (a filter) and an action block. You don't need an if inside the block unless you want something unusual, so your command is more appropriately written as:
username=`awk -F: '($3=='$i'){print $1}' /root/userpwinfo.txt`
Note that even this is not a very good solution, but you already have much to think about with only these changes. When you're more familiar with awk or getting more professional, come back and check the comments. ;)
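One commonly used alternative to splicing $i into the program text is awk's -v option, which copies the shell value into an awk variable. A minimal sketch, using rows from the question inline instead of /root/userpwinfo.txt; the names id and found are only illustrative:

```shell
# Sample rows from the question, supplied inline rather than read from a file.
pwinfo='nsgg317_CPA:x:4522:100::/home-gg/users/nsgg317_CPA:/bin/bash
nsgg318_ZRL:x:4523:100::/home-gg/users/nsgg318_ZRL:/bin/bash
nsgg319_YYM:x:4524:100::/home-gg/users/nsgg319_YYM:/bin/bash'

found=""
for i in 6534 4524 4522 6635; do
    # -v copies the shell value into the awk variable "id";
    # the awk program itself stays fully single-quoted.
    username=$(printf '%s\n' "$pwinfo" | awk -F: -v id="$i" '$3 == id { print $1 }')
    if [ -n "$username" ]; then
        found="${found:+$found }$username"
    fi
done
echo "$found"
```

Because the program text never changes, an empty or odd value in $i cannot turn it into a syntax error.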
If username is what you needed using the 2 files, you could try
$ cat userpwinfo.txt
nsgg315_RJ:x:4520:100::/home-gg/users/nsgg315_RJ:/bin/bash
nsgg316_ZJY:x:4521:100::/home-gg/users/nsgg316_ZJY:/bin/bash
nsgg317_CPA:x:4522:100::/home-gg/users/nsgg317_CPA:/bin/bash
nsgg318_ZRL:x:4523:100::/home-gg/users/nsgg318_ZRL:/bin/bash
nsgg319_YYM:x:4524:100::/home-gg/users/nsgg319_YYM:/bin/bash
$ cat userid.txt
6534
4524
4522
6635
$ awk -F":" ' { if( NR==FNR ) { a[$3]=$1; next } ; if(a[$1]) print a[$1] }' userpwinfo.txt userid.txt
nsgg319_YYM
nsgg317_CPA

AWK alias not printing

The awk command below (copied and pasted from Stack Overflow) works fine from the command line but doesn't print anything when aliased:
awk '/WORD/ {print $3}' log.log | awk 'BEGIN{c=0} length($0){a[c]=$0;c++}END{p5=(c/100*5); p5=p5%1?int(p5)+1:p5; print a[c-p5-1]}'
alias getperc="awk '/WORD/ {print \$3}' log.log | awk 'BEGIN{c=0} length(\$0){a[c]=$0;c++}END{p5=(c/100*5); p5=p5%1?int(p5)+1:p5; print a[c-p5-1]}'"
I am fairly new to using bash. What am I missing here?
Don't use aliases. They require an additional layer of quoting, which is troublesome (as here), and they prevent you from being able to usefully parameterize or add conditional logic to your code.
A simple transliteration to a function is:
getperc() { awk '/WORD/ {print $3}' log.log | awk 'BEGIN{c=0} length($0){a[c]=$0;c++}END{p5=(c/100*5); p5=p5%1?int(p5)+1:p5; print a[c-p5-1]}'; }
A slightly more capable one, which will still use log.log by default, but which will also let you provide an alternate input file name (as in getperc alternate.log) or pipe to your function (as in cat alternate.log | getperc):
getperc() {
[[ -t 0 || $1 ]] || set -- - # use "-" (stdin) as input file if not a TTY
# ...this will let you pipe to your function.
awk '/WORD/ {print $3}' "${1:-log.log}" | awk 'BEGIN{c=0} length($0){a[c]=$0;c++}END{p5=(c/100*5); p5=p5%1?int(p5)+1:p5; print a[c-p5-1]}'
}
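The extra quoting layer is easy to see in isolation. The sketch below only builds strings and does not define an alias; marker is a hypothetical variable standing in for anything the shell can expand, such as $0.

```shell
marker=SHELL_VALUE   # hypothetical stand-in for a shell-expandable name

# Double-quoted: the unescaped $marker is replaced when the string is built,
# while the escaped \$1 survives as a literal $1 for awk.
double_quoted="awk '{print \$1, $marker}'"

# Single-quoted: nothing is expanded, so the text is kept exactly as typed.
single_quoted='awk {print $1, $marker}'

printf '%s\n' "$double_quoted"
printf '%s\n' "$single_quoted"
```

A function body is parsed like ordinary script text, so the single-quoted awk program can be written there without this second layer.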
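I think bash is getting confused by $3 and $0 and treating them as its own positional parameters rather than passing them to awk. Note that the second $0 in your alias (in a[c]=$0) is not escaped, so bash expands it when the alias is defined.
You can verify the effect by trying this in bash: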
alias ech="echo {print \$3}"
it will print just
{print }
but now try
alias ech="echo {print \$\3}"
it will print what you expected
{print $3}
Let me know if this solves your problem

passing for loop index into awk

I am trying to pass a for loop index i into awk but keep getting unexpected token awk errors.
First I tried using the -v option within awk:
for i in "${myarray}"
awk -v var=$i '/var/{print}' myfile.dat
done
I also tried calling the variable directly using single quotes:
for i in "${myarray}"
awk '/'"$i"'/{print}' myfile.dat
done
My end goal is to learn how to pass a for loop index variable through awk as the search pattern. I'd like the above code to search through myfile.dat and print lines which contain the strings in myarray.
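There are 2 problems (besides the missing do after the for line, which is what triggers the unexpected token error):
Array traversal should look like this: for i in "${myarray[@]}"
awk treats the text between /.../ as a regex literal; to match against a variable, use $0 ~ var.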
Your code should be:
for i in "${myarray[@]}"; do
awk -v var="$i" '$0 ~ var' myfile.dat
done
{print} is the default action in awk, so you can omit it as shown above.
You can also do the same without a loop, e.g.,
echo "${myarray[@]}" | tr ' ' '|' | awk 'NR==FNR{pat=$0; next} $0 ~ pat' - file
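The difference between /var/ and $0 ~ var can be checked with a small sketch. The sample lines are made up, and a plain word list stands in for the array so the snippet also runs under a POSIX sh:

```shell
# Hypothetical stand-in for myfile.dat.
data='alpha one
beta two
gamma three'

matches=""
for i in alpha gamma; do
    # $0 ~ var uses the contents of var as a dynamic regex.
    line=$(printf '%s\n' "$data" | awk -v var="$i" '$0 ~ var')
    matches="${matches:+$matches;}$line"
done

# /var/ is a regex literal: it looks for the text "var", which no line contains.
literal=$(printf '%s\n' "$data" | awk -v var=alpha '/var/')

echo "dynamic: $matches"
echo "literal: [$literal]"
```

Note that the value of var is interpreted as a regular expression, so metacharacters in it keep their special meaning.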

Grouping command substitution without double quotes? "$()" without the ""

I'm writing a script that involves generating Awk programs and running them via awk $(...), as in
[lynko@hephaestus] ~ % awk $(echo 'BEGIN { print "hello!" }')
The generated program is going to be more complicated in the end, but first I want to make sure this is possible. In the past I've done
[lynko@hephaestus] ~ % program=$(echo 'BEGIN { print "hello!" }')
[lynko@hephaestus] ~ % awk "$program"
hello!
where the grouping is unsurprising. But the first example (under GNU awk, which gives a more helpful error message than mawk which is default on my other machine) gives
[lynko@hephaestus] ~ % awk $(echo 'BEGIN { print "hello!" }')
awk: cmd. line:1: BEGIN blocks must have an action part
presumably because this is executed as awk BEGIN { print "hello!" } rather than awk 'BEGIN { print "hello!" }'. Is there a way I can force $(...) to remain as one group? I'd rather not use "$()" since I'd have to escape all the double-quotes in the program generator.
I'm running Bash 4.2.37 and mawk 1.3.3 on Crunchbang Waldorf.
Put quotes around it. You don't need to escape the double quotes inside it:
awk "$(echo 'BEGIN { print "hello!" }')"
I'm also wondering why you are using an echo statement. Awk doesn't need one.
awk 'BEGIN { print "Awk SQUAWK!" }'
That will work perfectly.
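To make the word splitting visible, this sketch counts how many arguments each form produces. The gen function is a hypothetical stand-in for the program generator:

```shell
gen() { echo 'BEGIN { print "hello!" }'; }   # hypothetical program generator

# Unquoted: the shell splits the output on whitespace into several arguments.
set -- $(gen)
unquoted_args=$#

# Quoted: the whole output stays one argument, inner double quotes included.
set -- "$(gen)"
quoted_args=$#

# Only the quoted form hands awk a complete program.
out=$(awk "$(gen)")
echo "unquoted=$unquoted_args quoted=$quoted_args out=$out"
```

The double quotes inside $(...) start a fresh quoting context, which is why they need no escaping.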
