Local vs Supplied Shell Script Arguments - bash

I'm writing a shell script that receives 3 arguments, but one of the commands inside the script needs to check the first field (after a delimiter is applied) against the first two arguments:
#!/bin/bash
awk -F'\t' '$1 ~ /$1/&&/$2/ {print $1FS$3}' $3
this command is called:
bash search.sh 5 AM filename.txt
And should execute as follows:
awk -F'\t' '$1 ~ /5/&&/AM/ {print $1FS$3}' filename.txt
The command works properly outside of the shell script, but currently returns nothing when run inside it.
filename.txt :
03:00:00 AM John-James Hayward Evalyn Howell Chyna Mercado
04:00:00 AM Chyna Mercado Cleveland Hanna Katey Bean
05:00:00 AM Katey Bean Billy Jones Evalyn Howell
06:00:00 AM Evalyn Howell Saima Mcdermott Cleveland Hanna
07:00:00 AM Cleveland Hanna Abigale Rich Billy Jones
Expected output:
05:00:00 AM Billy Jones

Your arguments are not being expanded by the shell because you single-quoted the awk script. You can use awk variables (as suggested by @JohnKugelman) to more clearly separate shell and awk code:
#!/bin/bash
awk -F'\t' -v p="$1" -v p2="$2" '($0 ~ p && $0 ~ p2) {print $1FS$3}' "$3"
I use generic variable names here (p and p2) to emphasize that you are not anchoring your regexes, so they really do match anywhere on the whole line instead of only the hour and AM/PM as intended.
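As a minimal sketch of what anchoring could look like, assuming (as the expected output suggests) that the time and the AM/PM marker share the first tab-separated field. The function name search_time is hypothetical, not part of the original script:

```shell
#!/bin/bash
# search_time HOUR AMPM [FILE...]  (hypothetical helper)
# Assumes field 1 looks like "05:00:00 AM" and fields are tab-separated.
search_time() {
    local hour=$1 ampm=$2
    shift 2
    # ^0? lets "5" match "05:", while the ":" right after the hour
    # stops "1" from matching "10:", "11:" or "12:".
    awk -F'\t' -v re="^0?${hour}:.* ${ampm}\$" '$1 ~ re {print $1 FS $3}' "$@"
}

# Demo on one line in the question's layout:
printf '05:00:00 AM\tKatey Bean\tBilly Jones\tEvalyn Howell\n' | search_time 5 AM
```

With no file arguments the function reads standard input, so it can be used in a pipeline or as search_time 5 AM filename.txt.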

Don't embed shell variables in an awk script.
Here's a solution with some explanatory comments:
#!/bin/bash
[[ $# -lt 2 ]] && exit 1 ## two args required, plus files or piped/redirected input
hour="$(printf '%02d' "$1")" ## add a leading zero if necessary
pm=${2^^} ## capitalise
shift 2
time="^$hour:.* $pm\$" ## match the whole time field
awk -F '\t' -v time="$time" \
'$1 ~ time {print $1,$3}' "$@" ## if it matches, print fields 1 and 3 (time, second name)
Usage is bash search.bash HOUR PM [FILE] ..., or ./search HOUR PM [FILE] ... if you make it executable. For example ./search 5 am file.txt or ./search 05 AM file.txt.
I'm assuming that every field is delimited by tabs.
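One caveat with printf '%02d' "$1": bash reads a number with a leading zero as octal, so an hour given as 08 or 09 is rejected as an invalid octal number. A small sketch of one way around that, using bash's base prefix 10# (the function name pad_hour is hypothetical):

```shell
#!/bin/bash
# pad_hour HOUR  (hypothetical helper): zero-pad an hour to two digits.
# 10# forces base 10, so "08" and "09" are not treated as invalid octal.
pad_hour() {
    printf '%02d\n' "$((10#$1))"
}

pad_hour 8
pad_hour 09
```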

This is probably what you're trying to do (untested, using any awk):
#!/usr/bin/env bash
awk -v hour="$1" -v ampm="$2" '
BEGIN {
FS = OFS = "\t"
time = sprintf("%02d:00:00", hour)
}
($1 == time) && ($2 == ampm) {
print $1, $3
}
' "${3:--}"
Note that the above would work even if your input file contained 10:00:00 AM and the arguments used were 1 AM. Some of the other solutions would fail given that as they're using a partial regexp comparison and so the arg 1 would match the 1s in input of 10:00:00, 11:00:00, or 12:00:00.
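The difference can be seen with a short sketch. The two-line sample is made up for illustration, and it assumes the time and AM/PM share the first tab-separated field, so the exact comparison here is done on the combined string rather than on two separate fields:

```shell
#!/usr/bin/env bash
# Hypothetical two-line sample; time and AM/PM are assumed to share field 1.
sample() { printf '01:00:00 AM\tA\tB\n10:00:00 AM\tC\tD\n'; }

# Unanchored regexp: the argument 1 matches both 01:00:00 and 10:00:00.
loose=$(sample | awk -F'\t' -v p=1 '$1 ~ p {n++} END {print n+0}')

# Exact string comparison against the formatted time matches one line only.
strict=$(sample | awk -F'\t' -v hour=1 -v ampm=AM '
    BEGIN { want = sprintf("%02d:00:00 %s", hour, ampm) }
    $1 == want {n++}
    END {print n+0}')

echo "loose=$loose strict=$strict"
```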

Related

Comparing dates in awk shell

Hello, I'm trying to make a script that searches a file for specific info and prints it. My case is this: I have a file in the format id|lastname|firstname|birthday| . I want to call the script with a date argument and the file, and have it show me all the "people" born after the date I've given.
Let me show you my code :
#!/bin/bash
case $1 in
--born-since )
d=($2 +%F); # this one puts the date I've given into the variable d
grep -vE '^#' $4 | awk -F "|" ' $4 >= $d '
;;
esac
I call this script in the form of :
./script --born-since <date> -f <file>
The point is that it's not doing what I want it to do. It prints wrong results.
For example in a file with 4 dates ( 1989-12-03,1984-02-18,1988-10-14,1980-02-02), given the date of 1985-05-13 it prints only the person with date 1984-02-18 which is incorrect.
It's probably comparing something else and not the date. Any advice ?
With single awk process:
awk -v d="1985-09-09" -F'|' '$4 >= d' file
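For context on why the original script misbehaves: inside the single-quoted awk program, $d is not the shell variable. awk reads it as "field number d", and with d unset that is $0, the whole line. A minimal sketch of a wrapper that passes the date in with -v instead, assuming ISO dates (YYYY-MM-DD), which compare correctly as plain strings. The function name born_since is hypothetical:

```shell
#!/bin/bash
# born_since YYYY-MM-DD [FILE...]  (hypothetical helper)
# Assumes lines of the form id|lastname|firstname|birthday| with ISO dates.
born_since() {
    local since=$1
    shift
    # Skip comment lines, then compare the 4th field with the given date.
    awk -F'|' -v d="$since" '!/^#/ && $4 >= d' "$@"
}

printf '1|Doe|Ann|1989-12-03|\n2|Roe|Bob|1984-02-18|\n' | born_since 1985-05-13
```

Folding the comment filter into awk also removes the need for the separate grep -vE '^#' process.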

Counting lines in a file matching specific string

Suppose I have more than 3000 files file.gz with many lines like below. The fields are separated by commas. I want to count only the line in which the 21st field has today's date (ex:20171101).
I tried this:
awk -F',' '{if { $21 ~ "TZ=GMT+30 date '+%d-%m-%y'" } { ++count; } END { print count; }}' file.txt
but it's not working.
Using awk, something like below
awk -F"," -v toSearch="$(date '+%Y%m%d')" '$21 ~ toSearch{count++}END{print count}' file
The date '+%Y%m%d' produces the date in the format as you requested, e.g. 20170111. Then matching that pattern on the 21st field and counting the occurrence and printing it in the END clause.
I am not sure whether the Solaris version of grep supports the -c flag for counting the number of matching lines; if it does, you can do it as
grep -c "$(date '+%Y%m%d')" file
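Note that grep -c counts a line if the date appears anywhere on it, and $21 ~ toSearch matches the date anywhere inside field 21. If only an exact value in the 21st field should count, a sketch along these lines may help (the function name count_on_date is hypothetical):

```shell
#!/bin/bash
# count_on_date YYYYMMDD [FILE...]  (hypothetical helper)
# Counts lines whose 21st comma-separated field equals the given date.
count_on_date() {
    local day=$1
    shift
    # n+0 prints 0 rather than an empty line when nothing matched.
    awk -F',' -v day="$day" '$21 == day {n++} END {print n+0}' "$@"
}
```

Since the function reads standard input when no files are given, it could be fed from something like zcat *.gz | count_on_date "$(date '+%Y%m%d')".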
Another solution using gnu-grep
grep -Ec "([^,]*,){20}$(date '+%Y%m%d')" file
explanation: ([^,]*,){20} means 20 fields before the date to be checked
Using awk (GNU awk, which provides strftime) and process substitution to uncompress a bunch of gzs and feed them to awk for analyzing and counting:
$ awk -F\, 'substr($21,1,8)==strftime("%Y%m%d"){i++}; END{print i}' <(zcat *gz)
Explained:
substr($21,1,8) == strftime("%Y%m%d") { # if the first 8 characters of $21 match today's date
i++ # increment counter
}
END { # in the end
print i # output counter
}' <(zcat *gz) # zcat all gzs to awk
If Perl is an option, this solution works on all 3000 gzipped files:
zcat *.gz | perl -F, -lane 'BEGIN{chomp($date=`date "+%Y%m%d"`); $count=0}; $count++ if $F[20] =~ /^$date/; END{print $count}'
These command-line options are used:
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace.
-n loop around each line of the input file
-e execute the perl code
-F autosplit modifier, in this case splits on ,
BEGIN{} executes before the main loop.
The $date and $count variables are initialized.
The $date variable is set to the result of the shell command date "+%Y%m%d"
$F[20] is the 21st element in @F
If the 21st element starts with $date, increment $count
END{} executes after the main loop
Using grep and cut instead of awk and avoiding regular expressions:
cut -f21 -d, file | grep -Fc "$(date '+%Y%m%d')"

awk command to convert date format in a file

Given below is the file content and the awk command used:
Input file:in_t.txt
1,ABC,SSS,20-OCT-16,4,1,0,5,0,0,0,0
2,DEF,AAA,20-JUL-16,4,1,0,5,0,0,0,0
Expected outfile:
SSS|2016-10-20,5
AAA|2016-07-20,5
I tried the below command:
awk -F , '{print $3"|"$(date -d 4)","$8}' in_t.txt
Got the outfile as:
SSS|20-OCT-16,5
AAA|20-JUL-16,5
Only thing I want to know is on how to format the date with the same awk command. Tried with
awk -F , '{print $3"|"$(date -d 4)","$8 +%Y-%m-%d}' in_t.txt
Getting syntax error. Can I please get some help on this?
Better to do this in shell itself and use date -d to convert the date format:
#!/bin/bash
while IFS=',' read -ra arr; do
printf "%s|%s,%s\n" "${arr[2]}" "$(date -d "${arr[3]}" '+%Y-%m-%d')" "${arr[7]}"
done < file
SSS|2016-10-20,5
AAA|2016-07-20,5
What's your definition of a single command? A call to awk is a single shell command. This may be what you want:
$ awk -F'[,-]' '{ printf "%s|20%02d-%02d-%02d,%s\n", $3, $6, (match("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",$5)+2)/3, $4, $10 }' file
SSS|2016-10-20,5
AAA|2016-07-20,5
BTW it's important to remember that awk is not shell. You can't call shell tools (e.g. date) directly from awk any more than you could from C. When you wrote $(date -d 4), awk saw an unset variable named date (numeric value 0) from which you subtracted the value of an unset variable named d (also 0) to get the numeric result 0, which you then concatenated with the number 4 to get 04, and then applied the $ operator to that to get the contents of field $04 (= $4). The output has nothing to do with the shell command date.
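This is easy to check on a throwaway line of input. The sketch below uses made-up data and shows only that the expression selects the fourth field:

```shell
#!/bin/bash
# Inside awk, date and d are just unset variables, so (date - d) is 0 and
# 0 concatenated with 4 is "04": the expression below is simply $4.
field=$(printf 'a,b,c,d\n' | awk -F',' '{ print $(date -d 4) }')
echo "$field"
```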
From Unix.com
Just tweaked it a little to suit your needs
awk -v var="20-OCT-16" '
BEGIN{
split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
for (i=1; i<=12; i++) mdigit[month[i]]=i
m=toupper(substr(var,4,3))
dat="20"substr(var,8,2)"-"sprintf("%02d",mdigit[m])"-"substr(var,1,2)
print dat
}'
2016-10-20
Explanation:
Prefix 20 {20}
Substring from 8th position to 2 positions {16}
Print - {-}
Check for the month literal (converting into uppercase) and assign numbers (mdigit) {10}
Print - {-}
Substring from 1st position to 2 positions {20}
This may work for you also.
awk -F , 'BEGIN {months = "JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"}
{ num = (index(months, substr($4,4,3)) + 2) / 3
if (length(num) == 1) {num = "0" num}
date = "20" substr($4,8,2) "-" num "-" substr($4,1,2)
print $3"|" date "," $8}' in_t.txt
You were close with your call to date. You can indeed use it with getline to parse and output the date value:
awk -F',' '{
parsedate="date --date="$4" +%Y-%m-%d"
parsedate | getline mydate
close(parsedate)
print $3"|"mydate","$8
}' in_t.txt
Explanation:
-F',' sets the field separator (delimiter) to comma
parsedate="date --date="$4" +%Y-%m-%d" leverages date's ability to convert the 4th field (the date) to a given output format and assigns that command to the variable "parsedate"
parsedate | getline mydate runs your custom "parsedate" command, and assigns the output to the mydate variable
close(parsedate) prevents certain errors with multiline input/output (See Running a system command in AWK for discussion of getline and close())
print $3"|"mydate","$8 outputs fields 3 and 8 separated by pipe and comma, with the new "mydate" value in place of the original date from field 4.

How to print a line which first number is bigger than given parameter?

I'm learning bash still and so I want to make a shell script called myscript.sh.
If I specify a threshold like myscript.sh 5 < text.txt
I want it to only print the lines whose first column is bigger than 5.
My text.txt file is of this structure:
5 15:00 email@email.com
3 14:00 email@email2.com
8 13:00 email@email.com
my code is
NUMBER=$1
awk -F' ' '$1>NUMBER{ print $0 }'
but it still prints everything and if instead of NUMBER I enter any number it works perfectly
I want the output to be if I write myscript.sh 3 < text.txt
5 15:00 email@email.com
8 13:00 email@email.com
use -v option to pass variable
#!/bin/bash
awk -v NUMBER="$1" '{ if ($1> NUMBER) print $0 }'
In your code, the variable NUMBER is not set and therefore $1>NUMBER always evaluates to true for a number larger than 0. You need to pass the variable to the awk script:
awk '$1>NUMBER{ print $0 }' NUMBER="$1" data.txt
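Putting this together, a minimal sketch of the script body as a function that reads standard input when no file is given. The name above and the shortened sample lines are hypothetical:

```shell
#!/bin/bash
# above NUMBER [FILE...]  (hypothetical helper)
# Prints lines whose first whitespace-separated field is greater than NUMBER.
above() {
    local n=$1
    shift
    # n+0 forces a numeric comparison even if the argument is not a clean number.
    awk -v n="$n" '$1 > n+0' "$@"
}

printf '5 15:00 a\n3 14:00 b\n8 13:00 c\n' | above 3
```

Called as above 3 < text.txt, it should keep the lines starting with 5 and 8 and drop the one starting with 3.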

Round down to nearest 5 minutes

The date command returns the current date. I want the nearest 5 minute interval. For e.g.
# date
Thu Mar 15 16:06:42 IST 2012
In this case I want to return ...
Mar 15 16:05:00
Is it possible in the shell script? or is there any one liner for this?
Update:
the date is in this format...
2012-03-10 12:59:59
Latest update:
The following command works as expected. Thanks for the response.
head r_SERVER_2012-03-10-12-55-00 | awk -F'^' '{print $7}' | awk '{split($2, a, ":"); printf "%s %s:%02d:00\n", $1, a[1],int(a[2]/5)*5}'
Correct result:
2012-03-10 12:55:00
But I want to show other fields as well other than date. The following does not work:
head r_SERVER_2012-03-10-12-55-00 | awk -F'^' '{print $1, $2, $7, $8}' | awk '{split($2, a, ":"); printf "%s %s:%02d:00\n", $1, a[1],int(a[2]/5)*5}'
Wrong result:
565 14718:00:00
It should be ...
565 123 2012-03-10 12:55:00 country
date | awk '{split($4, a, ":"); printf "%s %s %s:%02d:00", $2, $3, a[1],int(a[2]/5)*5}'
$ date="2012-03-10 12:59:59"
$ read d h m s < <(IFS=:; echo $date)
$ printf -v var "%s %s:%02d:00" "$d" "$h" $(( 10#$m - 10#$m % 5 ))
$ echo "$var"
2012-03-10 12:55:00
I use process substitution in the read command to isolate changes to IFS in a subshell.
If you have GNU AWK available, you could use this:
| gawk '{t=mktime(gensub(/[-:]/," ","g")); print strftime("%Y-%m-%d %H:%M:%S",int(t/5)*5);}'
This uses the int() function, which truncates, which sort of means "round down". If you decide you'd prefer to "round" (i.e. go to the "nearest" 5 second increment), replace int(t/5) with int((t+2.5)/5).
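Note that a step of 5 truncates to a multiple of 5 seconds. For the 5-minute interval the question asks about, the step would be 300 seconds. The arithmetic itself can be sketched in plain bash on an epoch timestamp (the function name floor_epoch and the sample timestamp are made up for illustration):

```shell
#!/bin/bash
# floor_epoch SECONDS STEP  (hypothetical helper)
# Truncates an epoch timestamp to a multiple of STEP seconds using
# integer division; STEP=300 gives 5-minute boundaries.
floor_epoch() {
    echo $(( $1 / $2 * $2 ))
}

floor_epoch 1331816802 300
```

In the gawk command this corresponds to replacing int(t/5)*5 with int(t/300)*300.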
Of course, if you're feeling masochistic, you can do this in pure shell. This one only truncates rather than rounding up.
[ghoti@pc ~]$ fmt="%Y-%m-%d %H:%M:%S"
[ghoti@pc ~]$ date "+$fmt"
2012-03-15 07:53:37
[ghoti@pc ~]$ date "+$fmt" | while read date; do stamp="`date -jf \"$fmt\" \"$date\" '+%s'`"; date -r `dc -e "$stamp 5/ 5* p"` "+$fmt"; done
2012-03-15 07:53:35
Note that I'm using FreeBSD. If you're using Linux, then you might need to use different options for the date command (in particular, the -r and -f options, I think). I'm running this in bash, but it should work in pure Bourne shell if that's what you need.

Resources