Using the system date command for date conversion with awk variables - shell

The log file looks like this:
28 Feb 2014,12:43:10,SAST,1821996800,10.144.22.91,494225040,"CONNECT",STARTED,0,0,0,10.144.22.91:59172,->,1.1.1.6:80
28 Feb 2014,12:43:10,SAST,1821996800,10.144.22.91,494225040,"CONNECT",TERMINATED,0,0,0,10.144.22.91:59172,->,1.1.1.6:80
Desired output:
2014/02/28,12:43:10,SAST,1821996800,10.144.22.91,494225040,"CONNECT",STARTED,0,0,0,10.144.22.91:59172,->,1.1.1.6:80
2014/02/28,12:43:10,SAST,1821996800,10.144.22.91,494225040,"CONNECT",TERMINATED,0,0,0,10.144.22.91:59172,->,1.1.1.6:80
Shell command that does the conversion:
date -d "28 Feb 2014" +%Y/%m/%d
Question:
How can I do this conversion using awk? (Later on I need to convert between different time zones, which is why I want to use the date command rather than sed or other methods that manipulate the characters.)
So far I have tried several options, but none of them works properly:
Version 1 (for some reason, the date command is not run with the value of the awk variable and gives me an error, so there is no result):
awk '
BEGIN { FS = "," }
{
while ("date -d $1 +%Y/%m/%d" | getline ddd) print ddd;
}
' _SOURCE_FILE
Version 2 (this doesn't work as desired either: it gives me an extra line and puts a "0" in the field, which is the exit status returned by system()):
awk '
BEGIN { FS = "," }
{
$1 = system("date -d \"$1\" +%Y/%m/%d")
print $0
}
' _SOURCE_FILE
Any help is much appreciated.

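awk doesn't expand variables inside strings, so use concatenation instead. Because $1 contains spaces, it also has to be quoted for the shell that runs date. Call close() so that each line starts date afresh; otherwise a date string that reappears later would read from the already-exhausted pipe and leave ddd holding a stale value. There's also no need for while when the command only produces one line of output. Finally, set OFS to a comma so the rebuilt record keeps its commas:
awk 'BEGIN { FS = OFS = "," }
{ cmd = "date -d \"" $1 "\" +%Y/%m/%d"; cmd | getline ddd; close(cmd) }
{ $1 = ddd; print }' _SOURCE_FILE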
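A log usually repeats the same date on many lines, so a small cache can avoid starting a new date process for every line. The following is a minimal sketch. It assumes GNU date (-d) and that field 1 never contains a double quote. The function name convert_log_dates is hypothetical.

```shell
# Hypothetical helper: rewrite field 1 (e.g. "28 Feb 2014") as YYYY/MM/DD.
# Assumes GNU date; each distinct date string is converted only once.
convert_log_dates() {
  awk 'BEGIN { FS = OFS = "," }
  {
    if (!($1 in cache)) {
      cmd = "date -d \"" $1 "\" +%Y/%m/%d 2>/dev/null"
      # Fall back to the original text if date cannot parse it.
      cache[$1] = ((cmd | getline out) > 0) ? out : $1
      close(cmd)
    }
    $1 = cache[$1]
    print
  }' "$@"
}
```

Usage: convert_log_dates _SOURCE_FILE (it reads stdin when no file is given). For the later time-zone work, GNU date also accepts a numeric UTC offset in its input. For example, date -u -d "28 Feb 2014 12:43:10 +0200" "+%Y/%m/%d %H:%M:%S" prints the UTC equivalent, 2014/02/28 10:43:10.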

Related

unix shell script to get nth business day

Referencing the solution posted on this unix.com thread for getting the Nth business day of the month, I tried to get the 16th business day of the month using the following code, but it doesn't work.
currCal=`/usr/bin/cal`
BUSINESS_DAYS=`echo $($currCal|nawk 'NR>2 {print substr($0,4,14)}' |tr "\n" " ")`
The error when executing this is:
nawk: syntax error at source line 1 context is
NR>2 {print >>> substr(test. <<< sh,4,14)}
nawk: illegal statement at source line 1
I'm guessing it takes $0 as the script name, causing the syntax error. Please help.
There seem to be a few issues with what you have above.
First, I agree with @John1024 that in order to get the nawk error you've posted, you must actually be running:
BUSINESS_DAYS=`echo $($currCal|nawk "NR>2 {print substr($0,4,14)}" |tr "\n" " ")`
with double quotes around the nawk script.
Furthermore, once you resolve the nawk error, you're going to run into issues with how you are using currCal. You get the actual output of the cal command into the currCal variable, but then are using the variable value (that is the output of cal) as a command before the | rather than echoing it into the pipe or something similar.
This brings up an additional question: why are you using echo on the result of a subshell command (the $() part) within another subshell (the outer backticks)?
Finally, the two lines you show above only get a list of the business days in the current month into the BUSINESS_DAYS variable. They do not output/save the 16th such day.
Taking all of the above into consideration (and also changing to use the $() subshell syntax consistently), you might want one of the following invocations:
If you really need to cache the current month's calendar and want to pull multiple days:
currCal="$(/usr/bin/cal)"
BUSINESS_DAYS="$(echo "${currCal}" | \
nawk 'NR>2 {print substr($0,4,14)}' | \
tr "\n" " ")"
DAY=16
DAYTH_DAY="$(echo "${BUSINESS_DAYS}" | nawk -v "day=${DAY}" '{ print $day }')"
If this is just a one-and-done:
DAY=16
DAYTH_DAY="$(/usr/bin/cal | \
nawk 'NR>2 {print substr($0,4,14)}' | \
tr "\n" " " | \
nawk -v "day=${DAY}" '{ print $day }')"
One more note: the processing here can be simplified if done entirely in awk(/nawk), but I wanted to stick to the basic framework you had already chosen.
Update per the request in the comment:
A pure POSIX awk version:
DAY=16
DAYTH="$(cal | awk -v "day=${DAY}" '
(NR < 3) { next ; }
/^.[0-9 ]/ { $1="" ; }
/^ / || (NF == 7) { $NF="" ; }
{ hold=hold $0 ; }
END { split(hold,arr," ") ; print arr[day] ; }')"
Yes, simplified is a matter of opinion, and I'm sure someone can make this more concise. Explanation of how this works:
Skip the header of the cal output:
(NR < 3) { next ; }
For weeks that have a date on the Sunday, trim the date of that Sunday:
/^.[0-9 ]/ { $1="" ; }
For weeks that start after Sunday (first week of a month) or weeks that have a full seven days, trim the date of Saturday for that week:
/^ / || (NF == 7) { $NF="" ; }
Once the lines contain only weekday dates, append them to hold:
{ hold=hold $0 ; }
At the end, split hold on spaces so we can grab the Nth day:
END { split(hold,arr," ") ; print arr[day] ; }
No awk, just software tools:
set -- $(cal -h | rev | cut --complement -b-5,20- | rev | tail -n +3) ; \
shift 15 ; echo $1
Output:
22
The output of cal is tricky to parse because:
It's right justified.
It's space delimited.
One- or two-digit dates mean two or one delimiting spaces.
There are more leading spaces before the first days of the month.
Parsing won't quite work without the -h option (which turns off 'today' highlighting).
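A different route avoids parsing cal's layout altogether by asking date for each day's weekday (%u gives 1 for Monday through 7 for Sunday). The sketch below assumes GNU date and seq. It ignores public holidays, and the function name nth_business_day is hypothetical.

```shell
# Hypothetical helper: print the Nth weekday (Mon-Fri) of a month as YYYY-MM-DD.
# Usage: nth_business_day N [YYYY-MM]   (defaults to the current month)
# Assumes GNU date and seq; public holidays are not taken into account.
nth_business_day() {
  n=$1
  ym=${2:-$(date +%Y-%m)}
  count=0
  for d in $(seq -w 1 31); do
    # date rejects impossible days such as 2014-02-30, which ends the month.
    day=$(date -d "$ym-$d" +%Y-%m-%d 2>/dev/null) || break
    if [ "$(date -d "$day" +%u)" -le 5 ]; then
      count=$((count + 1))
      if [ "$count" -eq "$n" ]; then
        echo "$day"
        return 0
      fi
    fi
  done
  return 1   # the month has fewer than N business days
}
```

For example, nth_business_day 16 2014-02 prints 2014-02-24. February 2014 starts on a Saturday, so its first business day is the 3rd.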

awk command to convert date format in a file

Given below is the file content and the awk command used:
Input file: in_t.txt
1,ABC,SSS,20-OCT-16,4,1,0,5,0,0,0,0
2,DEF,AAA,20-JUL-16,4,1,0,5,0,0,0,0
Expected output:
SSS|2016-10-20,5
AAA|2016-07-20,5
I tried the below command:
awk -F , '{print $3"|"$(date -d 4)","$8}' in_t.txt
I got this output:
SSS|20-OCT-16,5
AAA|20-JUL-16,5
The only thing I want to know is how to format the date within the same awk command. I tried:
awk -F , '{print $3"|"$(date -d 4)","$8 +%Y-%m-%d}' in_t.txt
I get a syntax error. Can I get some help with this?
Better to do this in shell itself and use date -d to convert the date format:
#!/bin/bash
while IFS=',' read -ra arr; do
printf "%s|%s,%s\n" "${arr[2]}" "$(date -d "${arr[3]}" '+%Y-%m-%d')" "${arr[7]}"
done < file
SSS|2016-10-20,5
AAA|2016-07-20,5
What's your definition of a single command? A call to awk is a single shell command. This may be what you want:
$ awk -F'[,-]' '{ printf "%s|20%02d-%02d-%02d,%s\n", $3, $6, (match("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",$5)+2)/3, $4, $10 }' file
SSS|2016-10-20,5
AAA|2016-07-20,5
BTW, it's important to remember that awk is not shell. You can't call shell tools (e.g. date) directly from awk any more than you could from C. When you wrote $(date -d 4), awk saw an unset variable named date (numeric value 0). From it, awk subtracted the value of an unset variable named d (also 0) to get the numeric result 0. It then concatenated that with the number 4 to get "04", and applied the $ operator to get the contents of field $04 (= $4). The output has nothing to do with the shell command date.
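The parse described above is easy to confirm with any POSIX awk: the expression just selects field 4.

```shell
# date and d are unset awk variables: (date - d) is 0, concatenated with 4 -> "04",
# so $(date -d 4) is simply $4 and this prints "d".
echo 'a b c d e' | awk '{ print $(date -d 4) }'
```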
From Unix.com
Just tweaked it a little to suit your needs
awk -v var="20-OCT-16" '
BEGIN{
split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
for (i=1; i<=12; i++) mdigit[month[i]]=i
m=toupper(substr(var,4,3))
dat="20"substr(var,8,2)"-"sprintf("%02d",mdigit[m])"-"substr(var,1,2)
print dat
}'
2016-10-20
Explanation:
Prefix 20 {20}
Take two characters starting at position 8 {16}
Append - {-}
Look up the month name (converted to uppercase) in mdigit to get its number, zero-padded {10}
Append - {-}
Take two characters starting at position 1 {20}
This may work for you also.
awk -F , 'BEGIN {months = "JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"}
{ num = (index(months, substr($4,4,3)) + 2) / 3
if (length(num) == 1) {num = "0" num}
date = "20" substr($4,8,2) "-" num "-" substr($4,1,2)
print $3"|" date "," $8}' in_t.txt
You were close with your call to date. You can indeed use it with getline to parse and output the date value:
awk -F',' '{
parsedate="date --date="$4" +%Y-%m-%d"
parsedate | getline mydate
close(parsedate)
print $3"|"mydate","$8
}' in_t.txt
Explanation:
-F',' sets the field separator (delimiter) to a comma
parsedate="date --date="$4" +%Y-%m-%d" uses date's ability to convert the 4th field (the 20-OCT-16 style date) to the given output format, and assigns that command to the variable parsedate
parsedate | getline mydate runs the parsedate command and assigns its output to the variable mydate
close(parsedate) closes the pipe, so the command is run again for the next line instead of reading from an exhausted pipe (see "Running a system command in AWK" for a discussion of getline and close())
print $3"|"mydate","$8 prints field 3, a pipe, the converted date, a comma and field 8
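The effect of close() is easiest to see with a command string that repeats. awk keeps one pipe per distinct command string, so a second getline on the same unclosed command reads end-of-file. A small demonstration:

```shell
# awk keeps one pipe per distinct command string. Without close(), the second
# read from the same "echo hi" pipe hits end-of-file and getline returns 0.
printf 'x\nx\n' | awk '{ cmd = "echo hi"; n = (cmd | getline v); print n }'
# With close(), the command is started again for each line, so both reads return 1.
printf 'x\nx\n' | awk '{ cmd = "echo hi"; n = (cmd | getline v); close(cmd); print n }'
```

The first command prints 1 and then 0; the second prints 1 twice.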

How to convert date with awk

My file temp.txt
ID53,20150918,2015-09-19,,0,CENTER
ID54,20150911,2015-09-14,,0,CENTER
ID55,20150911,2015-09-14,,0,CENTER
I need to convert the 2nd field (yyyymmdd) to seconds since the epoch, replacing the original value.
I tried this, but only the first line is replaced:
awk -F"," '{ ("date -j -f ""%Y%m%d"" ""20150918"" ""+%s""") | getline $2; print }' OFS="," temp.txt
I also tried this:
awk -F"," '{system("date -j -f ""%Y%m%d"" "$2" ""+%s""") | getline $2; print }' temp.txt
the output is:
1442619474
sh: 0: command not found
ID53,20150918,2015-09-19,,0,CENTER
1442014674
ID54,20150911,2015-09-14,,0,CENTER
1442014674
ID55,20150911,2015-09-14,,0,CENTER
Using gsub didn't work either:
awk -F"," '{gsub($2,"system("date -j -f ""%Y%m%d"" "$2" ""+%s""")",$2); print}' OFS="," temp.txt
awk: syntax error at source line 1
context is
{gsub($2,"system("date -j -f ""%Y%m%d"" "$2" >>> ""+% <<< s""")",$2); print}
awk: illegal statement at source line 1
extra )
I need the output to look like this. How can I do it?
ID53,1442619376,2015-09-19,,0,CENTER
ID54,1442014576,2015-09-14,,0,CENTER
ID55,1442014576,2015-09-14,,0,CENTER
This GNU awk script should do it. If GNU awk is not yet installed on your Mac, I suggest installing MacPorts and then GNU awk. You can also install decent versions of bash, date and other important utilities whose defaults on OS X are really disappointing.
BEGIN { FS = ","; OFS = FS; }
{
y = substr($2, 1, 4);
m = substr($2, 5, 2);
d = substr($2, 7, 2);
$2 = mktime(y " " m " " d " 00 00 00");
print;
}
Put it in a file (e.g. txt2ts.awk) and process your file with:
$ awk -f txt2ts.awk data.txt
ID53,1442527200,2015-09-19,,0,CENTER
ID54,1441922400,2015-09-14,,0,CENTER
ID55,1441922400,2015-09-14,,0,CENTER
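Note that these timestamps differ from yours. That is a separate problem, and I'll leave it to you to work out. A likely cause is that mktime uses local midnight, while date -j -f appears to fill the unspecified time of day from the current time (your own results change between runs).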
Explanations: substr(s, m, n) returns the n-character substring of s that starts at position m (counting from 1). mktime("YYYY MM DD HH MM SS") converts the date string into a timestamp (seconds since the epoch). FS and OFS are the input and output field separators, respectively. The commands between the curly braces of the BEGIN pattern are executed only once, at the beginning, while the others are executed for each line of the file.
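mktime is not part of POSIX awk. If gawk isn't available, the calendar arithmetic can be done in plain awk instead. The sketch below uses Howard Hinnant's days_from_civil algorithm. It produces midnight UTC rather than local midnight, so its results differ from mktime's by your UTC offset. The function names are hypothetical.

```shell
# Hypothetical helper: replace field 2 (YYYYMMDD) with seconds since the epoch
# at 00:00:00 UTC. Plain POSIX awk: no mktime, no external date calls.
yyyymmdd_to_epoch() {
  awk '
  # Days since 1970-01-01 for a proleptic Gregorian date (years >= 0).
  function days_from_civil(y, m, d,    era, yoe, doy, doe) {
    y -= (m <= 2)                 # years start in March, so Jan/Feb belong to the previous year
    era = int(y / 400)
    yoe = y - era * 400
    doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
    doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
    return era * 146097 + doe - 719468
  }
  BEGIN { FS = OFS = "," }
  {
    days = days_from_civil(substr($2, 1, 4) + 0, substr($2, 5, 2) + 0, substr($2, 7, 2) + 0)
    $2 = sprintf("%.0f", days * 86400)   # %.0f avoids exponent notation for large values
    print
  }' "$@"
}
```

For the first sample line this gives 1442534400, which is 2015-09-18 00:00:00 UTC.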
You could use substr:
printf "%s-%s-%s\n", substr($6,1,4), substr($6,5,2), substr($6,7,2)
Assuming that the 6th field is 20150914, this produces 2015-09-14. (awk strings are indexed from 1, so the year starts at position 1.)

Insert a date in a column using awk

I'm trying to format a date in a column of a csv.
The input is something like: 28 April 1966
And I'd like this output: 1966-04-28
which I can get with this command:
date -d "28 April 1966" +%F
So I thought of combining awk with this command to format the whole column, but I can't work out how.
Edit:
Example input (the "|" separators are actually tabs):
1 | 28 April 1966
2 | null
3 | null
4 | 30 June 1987
Expected output:
1 | 1966-04-28
2 | null
3 | null
4 | 1987-06-30
A simple way is:
awk -F '\\| ' -v OFS='| ' '{ cmd = "date -d \"" $3 "\" +%F 2> /dev/null"; cmd | getline $3; close(cmd) } 1' filename
That is:
{
cmd = "date -d \"" $3 "\" +%F 2> /dev/null" # build shell command
cmd | getline $3 # run, capture output
close(cmd) # close pipe
}
1 # print
This works because date prints nothing to its stdout when the date is invalid, so getline reads nothing (it returns 0) and $3 is left unchanged.
Caveats to consider:
For very large files, this will spawn a lot of shells and processes in those shells (one each per line). This can become a noticeable performance drag.
Be wary of code injection. If the CSV file comes from an untrustworthy source, this approach is difficult to defend against an attacker, and you're probably better off going the long way around, parsing the date manually with gawk's mktime and strftime.
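One way to reduce the injection risk while still using date is to single-quote the field for sh before building the command. This sketch assumes GNU date and a tab-separated "ID<TAB>date" layout, so the date is field 2 as in the perl and bash answers. The helper names are hypothetical.

```shell
# Hypothetical helper: convert field 2 with date, quoting it for sh first so a
# value like $(rm -rf ~) reaches date as plain data rather than as shell code.
# Assumes GNU date and tab-separated input; "null" fields are left alone.
safe_date_column() {
  awk -F '\t' -v OFS='\t' -v q="'" '
  # Wrap s in single quotes, escaping embedded single quotes for sh.
  function shquote(s) {
    gsub(q, q "\\\\" q q, s)
    return q s q
  }
  $2 != "null" {
    cmd = "date -d " shquote($2) " +%F 2>/dev/null"
    if ((cmd | getline out) > 0) $2 = out
    close(cmd)
  }
  1' "$@"
}
```

This still starts one shell and one date per line, so the performance caveat above still applies.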
EDIT re: comment: To use tabs as delimiters, the command can be changed to
awk -F '\t' -v OFS='\t' '{ cmd = "date -d \"" $3 "\" +%F 2> /dev/null"; cmd | getline $3; close(cmd) } 1' filename
EDIT re: comment 2: If performance is a worry, as it appears to be, spawning processes for every line is not a good approach. In that case, you'll have to do the parsing manually. For example:
BEGIN {
OFS = FS
m["January" ] = 1
m["February" ] = 2
m["March" ] = 3
m["April" ] = 4
m["May" ] = 5
m["June" ] = 6
m["July" ] = 7
m["August" ] = 8
m["September"] = 9
m["October" ] = 10
m["November" ] = 11
m["December" ] = 12
}
$3 !~ /null/ {
split($3, a, " ")
$3 = sprintf("%04d-%02d-%02d", a[3], m[a[2]], a[1])
}
1
Put that in a file, say foo.awk, and run awk -F '\t' -f foo.awk filename.csv.
This should work with your given input
awk -F'\\|' -vOFS="|" '!/null/{cmd="date -d \""$3"\" +%F";cmd | getline $3;close(cmd)}1' file
Output
| 1 |1966-04-28
| 2 | null
| 3 | null
| 4 |1987-06-30
I would suggest using a language that supports parsing dates, like perl:
$ cat file
1 28 April 1966
2 null
3 null
4 30 June 1987
$ perl -F'\t' -MTime::Piece -lane 'print "$F[0]\t",
$F[1] eq "null" ? $F[1] : Time::Piece->strptime($F[1], "%d %B %Y")->strftime("%F")' file
1 1966-04-28
2 null
3 null
4 1987-06-30
The Time::Piece core module lets you parse and format dates using the standard strftime format specifiers. This solution splits the input on a tab character and reformats the second field unless it is "null".
This approach will be much faster than using system calls or invoking subprocesses, as everything is done in native perl.
Here is how you can do this in pure BASH and avoid calling system or getline from awk:
while IFS=$'\t' read -ra arr; do
[[ ${arr[1]} != "null" ]] && arr[1]=$(date -d "${arr[1]}" +%F)
printf "%s\t%s\n" "${arr[0]}" "${arr[1]}"
done < file
1 1966-04-28
2 null
3 null
4 1987-06-30
This approach makes only one date call, and code injection is not possible:
This script extracts the dates (using awk) into a temporary file processes them with one "date" call and merges the results back (using awk).
Code
awk -F '\t' 'match($3,/null/) { $3 = "0000-01-01" } { print $3 }' input > temp.$$
date --file=temp.$$ +%F > dates.$$
awk -F '\t' -v OFS='\t' 'BEGIN {
while ( getline < "'"dates.$$"'" > 0 )
{
f1_counter++
if ($0 == "0000-01-01") {$0 = "null"}
date[f1_counter] = $0
}
}
{$3 = date[NR]}
1' input
One-liner using bash process redirections (no temporary files):
inputfile=/path/to/input
awk -F '\t' -v OFS='\t' 'BEGIN {while ( getline < "'<(date -f <(awk -F '\t' 'match($3,/null/) { $3 = "0000-01-01" } { print $3 }' "$inputfile") +%F)'" > 0 ){f1_counter++; if ($0 == "0000-01-01") {$0 = "null"}; date[f1_counter] = $0}}{$3 = date[NR]}1' "$inputfile"
Details
Here is how it can be used:
# configuration
input=/path/to/input
temp1=temp.$$
temp2=dates.$$
output=output.$$
# create the sample file (optional)
#printf "\t%s\n" $'1\t28 April 1966' $'2\tnull' $'3\tnull' $'4\t30 June 1987' > "$input"
# Extract all dates
awk -F '\t' 'match($3,/null/) { $3 = "0000-01-01" } { print $3 }' "$input" > "$temp1"
# transform the dates
date --file="$temp1" +%F > "$temp2"
# merge csv with transformed date
awk -F '\t' -v OFS='\t' 'BEGIN {while ( getline < "'"$temp2"'" > 0 ){f1_counter++; if ($0 == "0000-01-01") {$0 = "null"}; date[f1_counter] = $0}}{$3 = date[NR]}1' "$input" > "$output"
# print the output
cat "$output"
# cleanup
rm "$temp1" "$temp2" "$output"
#rm "$input"
Caveats
Using "0000-01-01" as a temporary placeholder for invalid (null) dates
The code should be faster than other methods calling "date" a lot of times, but it reads the input file two times.
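The same one-date-call idea also works without a placeholder. Only the non-null dates are fed to GNU date -f -, which reads one date per line from standard input. A second awk pass then substitutes the results in order on the non-null lines. This sketch assumes GNU date, an "ID<TAB>date" layout, and that every non-null date parses: an unparsable entry produces no output line and would shift the merge. The function name is hypothetical.

```shell
# Hypothetical helper: convert column 2 with a single date process.
# Assumes GNU date, tab-separated "ID<TAB>date" input, and that every
# non-null date is valid (an unparsable one would shift the merge).
convert_column_once() {
  tmp=$(mktemp) || return 1
  # Pass 1: feed every non-null date to one "date -f -" process.
  awk -F '\t' '$2 != "null" { print $2 }' "$1" | date -f - +%F > "$tmp"
  # Pass 2: load the converted dates (first file), then substitute them in order.
  # FILENAME == ARGV[1] is used instead of NR == FNR so an empty $tmp is safe.
  awk -F '\t' -v OFS='\t' '
    FILENAME == ARGV[1] { conv[FNR] = $0; next }
    $2 != "null" { $2 = conv[++k] }
    1' "$tmp" "$1"
  rm -f "$tmp"
}
```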

Sed print with evaluated date command on back reference

I have a file (as one often does) with dates in *nix time as seconds since the epoch, followed by a message and a final "thread" field that I want to select on. All fields are separated with a '|', as exported from a sqlite DB...
e.g.:
1306003700|SENT|21
1277237887|SENT|119
1274345263|SENT|115
1261168663|RECV|21
1306832459|SENT|80
1306835346|RECV|80
Basically, I can use sed easily enough to select and print lines that match the "thread" field and print the respective times with messages, thus:
> cat file | sed -n "s/^\([0-9]*\)\|\(.*\)\|80$/\1 : \2/p"
1306832459 : SENT
1306835346 : RECV
But what I really want to do is also pass the time field through the unix date command, so:
> cat file | sed -n "s/^\([0-9]*\)\|\(.*\)\|80$/`date -r \1` : \2/p"
But this doesn't seem to work, even though sed accepts it. It just prints the same (start of Epoch) date:
Thu 1 Jan 1970 01:00:01 BST : SENT
Thu 1 Jan 1970 01:00:01 BST : RECV
How/can I evaluate/interpolate the back reference \1 to the date command?
Maybe sed isn't the way to match these lines (and format the output in one go)...
awk is perfect for this.
awk -F"|" '$3 == '80' { print system("date -r " $1), ":", $2 }' myfile.txt
It should work. (I can't guarantee that the system call is right, though; I didn't test it.)
This is pure bash:
wanted=80
(IFS=\|; while read sec message thread
do
[[ $thread == $wanted ]] && echo $(date -r $sec) : $message
done) < datafile.txt
Output:
Tue May 31 11:00:59 CEST 2011 : SENT
Tue May 31 11:49:06 CEST 2011 : RECV
Quoting the variables with double quotes would be safer.
Perl is handy here:
perl -MPOSIX -F'\|' -lane '
next unless $F[2] == "80";
print(strftime("%Y-%m-%d %T", localtime $F[0]), " : ", $F[1])
' input.file
This might work for you:
sed -n 's/^\([0-9]*\)|\(.*\)|80$/echo "$(date -d @\1) : \2"/p' file | sh
or if you have GNU sed:
sed -n 's/^\([0-9]*\)|\(.*\)|80$/echo "$(date -d @\1) : \2"/ep' file
Using awk:
awk -F'|' '$3 == 80 {printf("echo \"$(date -d @%s) : %s\"\n", $1, $2)}' /path/to/file | sh
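In your attempt, the backquoted date command is run by the shell before sed even starts. The shell turns \1 into 1, so date -r 1 prints the time one second after the epoch, and sed inserts that same fixed string into every matching line.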
It may be easiest to break out perl or python for this job, or you can use some kind of bash loop otherwise.
Inspired by Rafe's answer above:
awk -F"|" '$3 == '80' { system("date -d @" $1 " | tr -d \"\n\""); print " :", $2 }' myfile.txt
For my version of date, it seems that -d is the argument instead of -r, and that it requires a leading @. The other change here is that system() executes the command, so date does the printing itself (newline and all, hence the tr), and system() returns the exit status. We don't want to print the exit status (0), so the system call moves outside of any awk print.
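Capturing date's output with getline, instead of letting system() print it, keeps the formatting inside awk and avoids the tr workaround. The sketch assumes GNU date (-d @SECONDS); BSD/macOS date would use -r SECONDS instead. The -u flag is there only to make the output independent of the local time zone. The function name is hypothetical.

```shell
# Hypothetical helper: print "<date> : <message>" for rows whose third
# |-separated field equals the requested thread.
# Usage: thread_messages THREAD [FILE]   (reads stdin when FILE is omitted)
thread_messages() {
  awk -F '|' -v thread="$1" '
    $3 == thread {
      cmd = "date -u -d @" $1 " \"+%Y-%m-%d %H:%M:%S\""
      cmd | getline when          # capture the formatted date
      close(cmd)
      print when " : " $2
    }' "${2:--}"
}
```

For thread 80 in the sample data, this prints 2011-05-31 09:00:59 : SENT and 2011-05-31 09:49:06 : RECV, which are the UTC equivalents of the CEST times shown in the bash answer.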
