awk command to convert date format in a file - shell

Given below is the file content and the awk command used:
Input file:in_t.txt
1,ABC,SSS,20-OCT-16,4,1,0,5,0,0,0,0
2,DEF,AAA,20-JUL-16,4,1,0,5,0,0,0,0
Expected outfile:
SSS|2016-10-20,5
AAA|2016-07-20,5
I tried the below command:
awk -F , '{print $3"|"$(date -d 4)","$8}' in_t.txt
Got the outfile as:
SSS|20-OCT-16,5
AAA|20-JUL-16,5
The only thing I want to know is how to format the date within the same awk command. I tried:
awk -F , '{print $3"|"$(date -d 4)","$8 +%Y-%m-%d}' in_t.txt
This gives a syntax error. Can I please get some help with this?

Better to do this in shell itself and use date -d to convert the date format:
#!/bin/bash
while IFS=',' read -ra arr; do
printf "%s|%s,%s\n" "${arr[2]}" "$(date -d "${arr[3]}" '+%Y-%m-%d')" "${arr[7]}"
done < file
SSS|2016-10-20,5
AAA|2016-07-20,5

What's your definition of a single command? A call to awk is a single shell command. This may be what you want:
$ awk -F'[,-]' '{ printf "%s|20%02d-%02d-%02d,%s\n", $3, $6, (match("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",$5)+2)/3, $4, $10 }' file
SSS|2016-10-20,5
AAA|2016-07-20,5
BTW, it's important to remember that awk is not shell. You can't call shell tools (e.g. date) directly from awk any more than you could from C. When you wrote $(date -d 4), awk saw an unset variable named date (numeric value 0), subtracted from it the value of an unset variable named d (also 0) to get the numeric result 0, concatenated that with the number 4 to get 04, and then applied the $ operator to the result to get the contents of field $04 (= $4). The output has nothing to do with the shell command date.
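A quick way to see this for yourself is to run only that expression against one row of the sample data. This is a minimal check, not part of the original answer:

```shell
# The sample row is taken from the question; no date process is ever started.
printf '1,ABC,SSS,20-OCT-16,4,1,0,5,0,0,0,0\n' |
    awk -F, '{ print $(date -d 4) }'    # parsed as $((date - d) 4), i.e. $"04", i.e. $4
```

It prints the untouched fourth field, 20-OCT-16, exactly as in the output the question reports.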

From Unix.com
Just tweaked it a little to suit your needs
awk -v var="20-OCT-16" '
BEGIN{
split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
for (i=1; i<=12; i++) mdigit[month[i]]=i
m=toupper(substr(var,4,3))
dat="20"substr(var,8,2)"-"sprintf("%02d",mdigit[m])"-"substr(var,1,2)
print dat
}'
2016-10-20
Explanation:
Prefix the century {20}
Take the 2 characters starting at position 8 {16}
Print - {-}
Uppercase the month abbreviation and look up its number in mdigit {10}
Print - {-}
Take the 2 characters starting at position 1 {20}
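The snippet above converts one hard-coded value. Applied to every row of the question's file, the same lookup-table idea might look like the sketch below. The function name convert_dates is hypothetical, the field positions ($3, $4, $8) come from the sample data, and the code assumes every year belongs to the 2000s, as the original does.

```shell
# convert_dates is a hypothetical helper name; it reads the named files or standard input.
convert_dates() {
    awk -F, '
    BEGIN {
        # Build a lookup table: month abbreviation -> month number.
        n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
        for (i = 1; i <= n; i++) mdigit[month[i]] = i
    }
    {
        # $4 looks like 20-OCT-16: split it into day, month name, 2-digit year.
        split($4, p, "-")
        m = toupper(p[2])
        if (!(m in mdigit)) next    # skip rows with an unknown month
        # Assumption: all years are 20xx.
        printf "%s|20%s-%02d-%02d,%s\n", $3, p[3], mdigit[m], p[1], $8
    }' "$@"
}
```

Calling convert_dates in_t.txt should then give the two lines from the expected outfile.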

This may also work for you:
awk -F , 'BEGIN {months = "JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"}
{ num = sprintf("%02d", (index(months, substr($4,4,3)) + 2) / 3)
date = "20" substr($4,8,2) "-" num "-" substr($4,1,2)
print $3"|" date "," $8}' in_t.txt

You were close with your call to date. You can indeed use it, via getline, to parse and reformat the date value:
awk -F',' '{
parsedate="date --date=\"" $4 "\" +%Y-%m-%d"
parsedate | getline mydate
close(parsedate)
print $3"|"mydate","$8
}' in_t.txt
Explanation:
-F',' sets the field separator (delimiter) to a comma.
parsedate="date --date=\"" $4 "\" +%Y-%m-%d" builds a date command that converts the 4th field to the given output format and stores that command string in the variable parsedate.
parsedate | getline mydate runs the command and assigns its first line of output to the variable mydate.
close(parsedate) closes the pipe so that the command is run afresh for the next line; without it, a later getline on an identical command string reads nothing, and many open pipes can exhaust file descriptors (see Running a system command in AWK for discussion of getline and close()).
print $3"|"mydate","$8 outputs the 3rd field, a pipe, the converted date, a comma and the 8th field.

Related

How to filter output lines from bash command, based on dates in start of the line?

I am getting following lines as an output of some bash pipe
output
20200604_tsv
20200605_tsv
20200606_tsv
20200706_tsv
I have a date in YYYYMMDD format in a variable
filter_date="20200605"
I want to filter the output lines by date, i.e. pick only the lines whose first part (before '_') is less than or equal to filter_date.
i.e. Expected output
20200604_tsv
20200605_tsv
How can I achieve this filtering in a bash pipe?
I have tried the following (a lexicographic string comparison), but it neither filters the lines nor prints the original names.
BASH_CMD_THAT_OUTPUT_LINES | sort | awk '{name = ($1); print name <= "20200605*"}'
## Answer
1
0
0
0
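Could you please try the following, written and tested with the shown samples in GNU awk (mktime is a GNU awk extension). Because it exits at the first line later than filter_date, the input must be sorted by date.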
awk -v filter_date="20200605" '
BEGIN{
FS=OFS="_"
filter=mktime(substr(filter_date,1,4)" "substr(filter_date,5,2)" "substr(filter_date,7,2) " 00 00 00")}
{
curr_dat=mktime(substr($1,1,4)" "substr($1,5,2)" "substr($1,7,2) " 00 00 00")
}
filter<curr_dat{ exit }
1
' Input_file
Explanation: a detailed, commented version of the above.
awk -v filter_date="20200605" ' ##Start the awk program and create the awk variable filter_date, the cut-off date given by the OP.
BEGIN{ ##Start the BEGIN section of the program.
FS=OFS="_" ##Set the field separator and output field separator to _.
filter=mktime(substr(filter_date,1,4)" "substr(filter_date,5,2)" "substr(filter_date,7,2) " 00 00 00")} ##Create the filter variable: mktime applied to substrings of filter_date gives the cut-off date as epoch time.
{
curr_dat=mktime(substr($1,1,4)" "substr($1,5,2)" "substr($1,7,2) " 00 00 00") ##Create the curr_dat variable: mktime applied to substrings of the first field gives the epoch time of the current line.
}
filter<curr_dat{ exit } ##If the filter date is earlier than the current line's date, exit the program.
1 ##1 prints the current line, which happens when its date is earlier than or equal to the filter date.
' Input_file ##Mention the Input_file name here.
Awk converts strings to numbers very easily by using the leading numeric part and ignoring the rest. E.g. the string 123_foo is converted to 123 if you add 0 to it. So the following operation would do what you request:
command | awk '($0+0 <= 20200605)'
This method works well if you have a sortable date format like YYYYMMDD. If you have a different format such as YYYYDDMM, you have to convert the format first. E.g.
command | awk '{d=substr($0,1,4)substr($0,7,2)substr($0,5,2)}(d+0 <= 20200605)'
Note that in the last solution the threshold must still be written as YYYYMMDD, not YYYYDDMM: 20200605 means 5 June 2020.
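As a small sketch of the numeric-coercion approach with the threshold passed in rather than hard-coded: the function name filter_upto and the variable limit are made up for illustration, and the comparison uses <= so that the threshold date itself is kept, as in the question's expected output.

```shell
# filter_upto is a hypothetical helper name; it keeps lines whose leading YYYYMMDD is <= the threshold.
filter_upto() {
    # $1 = threshold date in YYYYMMDD form; lines are read from standard input.
    # Adding 0 forces a numeric comparison: "20200604_tsv" + 0 evaluates to 20200604.
    awk -v limit="$1" '($0 + 0) <= (limit + 0)'
}

# Sample lines from the question.
printf '%s\n' 20200604_tsv 20200605_tsv 20200606_tsv 20200706_tsv | filter_upto 20200605
```

With the sample lines this keeps 20200604_tsv and 20200605_tsv. Unlike the mktime variant, it needs neither GNU awk nor sorted input.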
I have found a simple way to compare lexicographically.
The test data and a run-through of the answer follow.
## 1. Test data
cat > /tmp/tmp_test_data <<EOF
20200605_tsv
20200607_tsv
20200604_tsv
20200718_tsv
20200606_tsv
EOF
## 2. Threshold date
check_date="20200605"
## 3. Sort, Filter and output
sort /tmp/tmp_test_data \
| awk -v check_d="${check_date}" '{
name = $1
dt = substr(name, 1, 8) # awk strings are 1-indexed; starting at 0 would return only 7 characters
if (dt <= check_d) print name
}'
Bash only:
while read -r line
do
[[ $line =~ ^[0-9]{8} ]] && [ "${line::8}" -le 20200605 ] && echo "$line"
done < file # in practice: command | while ...

Comparing dates in awk shell

Hello, I'm trying to write a script that searches a file for specific records and prints them. My case is this: I have a file in the format id|lastname|firstname|birthday| . I want to call the script with a date argument and the file, and have it show me all the "people" born after the date I've given.
Let me show you my code:
#!/bin/bash
case $1 in
--born-since )
d=($2 +%F); # this one puts the date I've given into the variable d
grep -vE '^#' $4 | awk -F "|" ' $4 >= $d '
;;
esac
I call this script in the form:
./script --born-since <date> -f <file>
The point is that it's not doing what I want it to do. It prints wrong results.
For example, in a file with 4 dates (1989-12-03, 1984-02-18, 1988-10-14, 1980-02-02), given the date 1985-05-13 it prints only the person with the date 1984-02-18, which is incorrect.
It's probably comparing something else and not the date. Any advice?
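With a single awk process, passing the date in with -v (inside the single-quoted awk program, $d is not the shell variable but a field reference through awk's own unset variable d, i.e. $0, so the original compared the birthday with the whole line):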
awk -v d="1985-09-09" -F'|' '$4 >= d' file
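Folding the comment filtering from the question's script into that single awk call could look like the sketch below. The function name born_since is hypothetical; the record layout id|lastname|firstname|birthday| and the rule that lines starting with # are skipped are taken from the question.

```shell
# born_since is a hypothetical helper name.
born_since() {
    # $1 = threshold date (YYYY-MM-DD), $2 = data file
    # -v hands the shell value to awk; !/^#/ replaces the separate grep -vE call.
    # ISO dates compare correctly as plain strings, so no date arithmetic is needed.
    awk -F'|' -v d="$1" '!/^#/ && $4 >= d' "$2"
}
```

Inside the question's case branch this would be invoked as born_since "$2" "$4".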

How to convert date with awk

My file temp.txt
ID53,20150918,2015-09-19,,0,CENTER<br>
ID54,20150911,2015-09-14,,0,CENTER<br>
ID55,20150911,2015-09-14,,0,CENTER
I need to convert the 2nd field (yyyymmdd) to seconds since the epoch and replace it in place.
I tried this, but only the first line is replaced:
awk -F"," '{ ("date -j -f ""%Y%m%d"" ""20150918"" ""+%s""") | getline $2; print }' OFS="," temp.txt
and I also tried this:
awk -F"," '{system("date -j -f ""%Y%m%d"" "$2" ""+%s""") | getline $2; print }' temp.txt
the output is:
1442619474
sh: 0: command not found
ID53,20150918,2015-09-19,,0,CENTER
1442014674
ID54,20150911,2015-09-14,,0,CENTER
1442014674
ID55,20150911,2015-09-14,,0,CENTER
Using gsub did not work either:
awk -F"," '{gsub($2,"system("date -j -f ""%Y%m%d"" "$2" ""+%s""")",$2); print}' OFS="," temp.txt
awk: syntax error at source line 1
context is
{gsub($2,"system("date -j -f ""%Y%m%d"" "$2" >>> ""+% <<< s""")",$2); print}
awk: illegal statement at source line 1
extra )
I need the output to look like this. How can I do it?
ID53,1442619376,2015-09-19,,0,CENTER
ID54,1442014576,2015-09-14,,0,CENTER
ID55,1442014576,2015-09-14,,0,CENTER
This GNU awk script should do it. If GNU awk is not yet installed on your Mac, I suggest installing MacPorts and then GNU awk. You can also install decent versions of bash, date and other important utilities, for which the defaults on OS X are really disappointing.
BEGIN { FS = ","; OFS = FS; }
{
y = substr($2, 1, 4);
m = substr($2, 5, 2);
d = substr($2, 7, 2);
$2 = mktime(y " " m " " d " 00 00 00");
print;
}
Put it in a file (e.g. txt2ts.awk) and process your file with:
$ awk -f txt2ts.awk data.txt
ID53,1442527200,2015-09-19,,0,CENTER<br>
ID54,1441922400,2015-09-14,,0,CENTER<br>
ID55,1441922400,2015-09-14,,0,CENTER
Note that these timestamps differ from the ones in your expected output. I will let you work out where the difference comes from; it is a separate problem.
Explanations: substr(s, m, n) returns the n-character substring of s that starts at position m (positions start at 1). mktime("YYYY MM DD HH MM SS") converts the date string into a timestamp (seconds since the epoch). FS and OFS are the input and output field separators, respectively. The commands between the curly braces of the BEGIN pattern are executed once at the beginning, while the others are executed on each line of the file.
You could use substr (positions start at 1, not 0):
printf "%s-%s-%s", substr($6,1,4), substr($6,5,2), substr($6,7,2)
Assuming that the 6th field was 20150914, this would produce 2015-09-14
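For the temp.txt layout from the question, where the yyyymmdd value sits in field 2, the same substr idea could be applied in place as sketched here. The function name iso_second_field is hypothetical, and the sketch only reformats the date as text; it does not compute epoch seconds.

```shell
# iso_second_field is a hypothetical helper name; it reads the named files or standard input.
iso_second_field() {
    awk 'BEGIN { FS = OFS = "," }
    {
        # substr positions start at 1: characters 1-4 are the year, 5-6 the month, 7-8 the day.
        $2 = substr($2, 1, 4) "-" substr($2, 5, 2) "-" substr($2, 7, 2)
        print
    }' "$@"
}
```

Setting OFS together with FS matters here: assigning to $2 makes awk rebuild the record, and without OFS="," the commas would turn into spaces.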

format date in file using awk

Content of the file is
Feb-01-2014 one two
Mar-02-2001 three four
I'd like to format the first field (the date) to %Y%m%d format.
I'm trying to use a combination of awk and the date command, but somehow this is failing even though I have the feeling I'm almost there:
cat infile | awk -F"\t" '{$1=system("date -d " $1 " +%Y%m%d");print $1"\t"$2"\t"$3}' > test
This prints out date's usage page, which makes me think that the date command is triggered properly but there is something wrong with the argument. Do you see the issue somewhere?
I'm not that familiar with awk.
You don't need date for this; it's simply rearranging the date string:
$ awk 'BEGIN{FS=OFS="\t"} {
split($1,t,/-/)
$1 = sprintf("%s%02d%s", t[3], (match("JanFebMarAprMayJunJulAugSepOctNovDec",t[1])+2)/3, t[2])
}1' file
20140201 one two
20010302 three four
You can use:
while read -r a _; do
date -d "$a" '+%Y%m%d'
done < file
20140201
20010302
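system() returns the exit code of the command, not its output.
Use getline instead, and close the command so that it runs again for each line:
awk -F"\t" '{cmd = "date -d " $1 " +%Y%m%d"; cmd | getline d; close(cmd); print d "\t" $2 "\t" $3}' infile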
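A variant that keeps system(). Note that system() writes the command's output itself and returns its exit status, so the 0 stored in var ends up appended to each date:
$ awk '{var=system("date -d "$1" +%Y%m%d | tr -d \"\\n\"");printf "%s\t%s\t%s\n", var, $2, $3}' file
201402010 one two
200103020 three four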

using system date command for data conversion using awk variables

Log file is looking like:
28 Feb 2014,12:43:10,SAST,1821996800,10.144.22.91,494225040,"CONNECT",STARTED,0,0,0,10.144.22.91:59172,->,1.1.1.6:80
28 Feb 2014,12:43:10,SAST,1821996800,10.144.22.91,494225040,"CONNECT",TERMINATED,0,0,0,10.144.22.91:59172,->,1.1.1.6:80
Desired output:
2014/02/28,12:43:10,SAST,1821996800,10.144.22.91,494225040,"CONNECT",STARTED,0,0,0,10.144.22.91:59172,->,1.1.1.6:80
2014/02/28,12:43:10,SAST,1821996800,10.144.22.91,494225040,"CONNECT",TERMINATED,0,0,0,10.144.22.91:59172,->,1.1.1.6:80
Shell Command that is doing the conversion:
date -d "28 Feb 2014" +%Y/%m/%d
Question:
How can I do this conversion using awk? (Later on I need to do conversions between different time zones, which is why the date command must be used rather than sed or other methods that manipulate the characters.)
So far I have tried several options, but none work properly:
Version 1 (for some reason the date command is not run with the value of the awk variable and gives me an error, so there is no result):
awk '
BEGIN { FS = "," }
{
while ("date -d $1 +%Y/%m/%d" | getline ddd) print ddd;
}
' _SOURCE_FILE
Version 2 (this does not work as desired: it gives me an extra line and puts a "0" in the record, which is the exit status of the system call):
awk '
BEGIN { FS = "," }
{
$1 = system("date -d \"$1\" +%Y/%m/%d")
print $0
}
' _SOURCE_FILE
Help is more than appreciated.
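awk doesn't expand variables inside string literals, so use concatenation, and quote the date for the shell because it contains spaces. There's also no need to use while when the command only produces one line of output. Set OFS = "," in the BEGIN block as well, since assigning to $1 rebuilds the record with OFS.
cmd = "date -d \"" $1 "\" +%Y/%m/%d"
cmd | getline ddd
close(cmd)
$1 = ddd
print $0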

Resources