Sed print with evaluated date command on back reference - bash

I have a file (as one often does) with dates in *nix time as seconds from the Epoch, followed by a message and a final "thread" field that I want to select on. All fields are separated with a '|', as exported from a sqlite DB, e.g.:
1306003700|SENT|21
1277237887|SENT|119
1274345263|SENT|115
1261168663|RECV|21
1306832459|SENT|80
1306835346|RECV|80
Basically, I can use sed easily enough to select and print lines that match the "thread" field and print the respective times with messages, thus:
> cat file | sed -n "s/^\([0-9]*\)\|\(.*\)\|80$/\1 : \2/p"
1306832459 : SENT
1306835346 : RECV
But what I really want to do is also pass the time field through the unix date command, so:
> cat file | sed -n "s/^\([0-9]*\)\|\(.*\)\|80$/`date -r \1` : \2/p"
But this doesn't seem to work - even though it seems to accept it. It just prints out the same (start of Epoch) date:
Thu 1 Jan 1970 01:00:01 BST : SENT
Thu 1 Jan 1970 01:00:01 BST : RECV
How can I evaluate/interpolate the back reference \1 into the date command?
Maybe sed isn't the way to match these lines (and format the output in one go)...

awk is perfect for this.
awk -F"|" '$3 == '80' { print system("date -r " $1), ":", $2 }' myfile.txt
Should work. (I can't guarantee that the system call is right, though; I didn't test it.)

This is pure bash:
wanted=80
(IFS=\|; while read sec message thread
do
    [[ $thread == $wanted ]] && echo $(date -r $sec) : $message
done) < datafile.txt
which prints
Tue May 31 11:00:59 CEST 2011 : SENT
Tue May 31 11:49:06 CEST 2011 : RECV
You can quote the variables in " " for better safety.
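For reference, here is that loop with the quoting applied, as a sketch that also swaps BSD's date -r for GNU's date -d @EPOCH (the sample data is taken from the question):

```shell
# Sample rows from the question
cat > datafile.txt <<'EOF'
1306003700|SENT|21
1306832459|SENT|80
1306835346|RECV|80
EOF

# Quoted version of the loop; GNU date syntax (-d @EPOCH).
# On BSD/macOS use: date -r "$sec"
wanted=80
while IFS='|' read -r sec message thread; do
    [ "$thread" = "$wanted" ] && printf '%s : %s\n' "$(date -d "@$sec")" "$message"
done < datafile.txt
```

This prints the two thread-80 lines with their formatted dates, in your local timezone.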

Perl is handy here:
perl -MPOSIX -F'\|' -lane '
next unless $F[2] == "80";
print(strftime("%Y-%m-%d %T", localtime $F[0]), " : ", $F[1])
' input.file

This might work for you:
sed -n 's/^\([0-9]*\)|\(.*\)|80$/echo "$(date -d @\1) : \2"/p' file | sh
or if you have GNU sed:
sed -n 's/^\([0-9]*\)|\(.*\)|80$/echo "$(date -d @\1) : \2"/ep' file

Using awk:
$(awk -F'|' '/80$/{printf("echo $(date -d @%s) : %s;",$1,$2);}' /path/to/file)

The date command will be executed before the sed command.
It may be easiest to break out perl or python for this job, or you can use some kind of bash loop otherwise.

Inspired by Rafe's answer above:
awk -F"|" '$3 == '80' { system("date -d @" $1 " | tr -d \"\n\""); print " :", $2 }' myfile.txt
For my version of date, the argument is -d instead of -r, and the epoch value needs a leading @. The other change here is that system() executes the command (so date does the printing itself, newline and all, hence the tr) and returns the exit code. We don't want to print that exit code (0), so the system call is moved outside of any awk print.
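Another way to avoid both the exit-status problem and the tr cleanup is awk's cmd | getline idiom, which captures the command's output instead of its return code. A sketch using GNU date syntax (on BSD/macOS the command string would use date -r):

```shell
printf '%s\n' '1306832459|SENT|80' '1306835346|RECV|80' |
awk -F'|' '$3 == 80 {
    cmd = "date -d @" $1   # GNU date; on BSD/macOS use "date -r " $1
    cmd | getline d        # d captures the command output, not its exit status
    close(cmd)             # reset the pipe so each line runs a fresh date
    print d, ":", $2
}'
```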


replacing timestamp with date using sed

When using sed to replace timestamps with human-readable dates, the timestamp is always replaced with the epoch date, plus whatever value follows the "\". I have this working with a Perl example but would prefer to use sed. I have tried various escape sequences and quoting with " ' ` etc.
sed -re "s/([0-9]{10})/$(date -d @\1)/g" mac.txt
input (one string):
834|task|3||1561834555|Ods|12015|info|Task HMI starting 837|task|3||1561834702|Nailsd|5041|info|Configured with engine 6000.8403 (/opt/NAI/LinuxShield/engine/lib/liblnxfv.so), dats 9297.0000 (/opt/NAI/LinuxShield/engine/dat), 197 extensions, 0 extra drivers
Expect date conversion but results are:
834|task|3||Wed Dec 31 19:00:01 EST 1969|Ods|12015|info|Task HMI starting 837|task|3||Wed Dec 31 19:00:01 EST 1969|Nailsd|5041|info|Configured with engine 6000.8403 (/opt/NAI/LinuxShield/engine/lib/liblnxfv.so), dats 9297.0000 (/opt/NAI/LinuxShield/engine/dat), 197 extensions, 0 extra drivers 838|task.
basically, this is what is being called:
$(date -d @\1) instead of $(date -d @1561834555)
sed never sees the $(date -d @\1) -- the shell has executed that command substitution before sed launches.
You could do something like this:
sed -Ee 's/([0-9]{10})/$(date -d @\1)/g' -e 's/^/echo "/' -e 's/$/"/' mac.txt | sh
(note the single quotes, preventing the shell from doing any expansions)
However, it's much more sensible to use a language that has date facilities built-in. GNU awk:
gawk -F '|' -v OFS='|' '{
    for (i=1; i<=NF; i++)
        if ($i ~ /^[0-9]{10}$/)
            $i = strftime("%c", $i)
    print
}' mac.txt
Perl will probably be installed:
perl -MPOSIX=strftime -pe 's{\b(\d{10})\b}{ strftime("%c", localtime $1) }ge' mac.txt
# or, this is more precise as it only substitutes *fields* that consist of a timestamp
perl -MPOSIX=strftime -F'\|' -lape '$_ = join "|", map { s/^(\d{10})$/ strftime("%c", localtime $1) /ge; $_ } @F' mac.txt

Convert date to timestamp in bash (with milliseconds)

I have CSV file in the following format
20170102 00:00:00.803,
20170102 00:00:01.265,
20170102 00:00:05.818,
I've managed to add slashes with
sed -r 's#(.{4})(.{2})(.{2})(.{2})(.{2})#\1/\2/\3 \4:\5:#' file.csv > newfile.csv
as below, to enable coversion to timestamp
2017/01/02 0:0::00:00.803
2017/01/02 0:0::00:01.265
2017/01/02 0:0::00:05.818
But after using
cat newfile.csv | while read line ; do echo $line\;$(date -d "$t" "+%s%N") ; done > nextfile.csv
I got :
2017/01/02 0:0::00:00.803,1499727600000000000
2017/01/02 0:0::00:01.265,1499727600000000000
2017/01/02 0:0::00:05.818,1499727600000000000
There's probably something wrong with my data, but I'm too much of a beginner to figure out the missing values. It would be very much appreciated if you could drop me some sed/awk magic. Thanks!
EDIT: I need a timestamp with milliseconds, but all I get for now is just zeros (how typical)
Not sure if this is what you are after but you could just parse the output without date to form the date stamp.
awk '{ print substr($0,1,4)"/"substr($0,5,2)"/"substr($0,7,2)" "substr($0,10,2)":"substr($0,13,2)":"substr($0,16) }' dates.csv
We use awk to pull out the extract of the line concerning day, month, year etc (substr function) and then use print to output the data in the required format.
gawk solution:
awk -F',' '{ match($1,/^([0-9]{4})([0-9]{2})([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2}).([0-9]{3})/,a);
print mktime(sprintf("%d %d %d %d %d %d",a[1],a[2],a[3],a[4],a[5],a[6]))*1000 + a[7] }' file.csv
The output:
1483308000803
1483308001265
1483308005818
The original format is accepted by date as a time stamp, so you don't need to sed it. I believe you want "date,milliseconds since 1970-01-01 00:00:00 UTC" in your output. Try this in bash.
generateoutput.sh
#!/bin/bash
while read -r line
do
    echo -n $line,
    echo `date -d "$line" "+%s%N"` / 1000000 | bc
done < <(sed 's/,//g' $1)
Pass your file in the original format as the first argument ($1).
bc - basic calculator, converts nanoseconds to milliseconds
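If your date is GNU date (it is, given the -d support), the bc step can be skipped entirely: the %N format sequence accepts a width, so +%s%3N emits milliseconds directly. A sketch on the question's data:

```shell
# GNU date only: %3N truncates nanoseconds to milliseconds
while read -r line; do
    t=${line%,}   # strip the trailing comma before handing the string to date
    printf '%s%s\n' "$line" "$(date -d "$t" '+%s%3N')"
done <<'EOF'
20170102 00:00:00.803,
20170102 00:00:01.265,
EOF
```

The exact numbers depend on your timezone, since the input carries no zone information.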
Parsing large files line by line is bound to take time.
If you need better performance, split your original file. I suggest doing this in a new directory
split -l 100000 -d <filename>
Run generateoutput.sh in parallel for each of this file and tee -a output
ls -l x* | awk '{print $9}' | xargs -n1 -P4 generateoutput.sh | tee -a output.csv

Counting lines in a file matching specific string

Suppose I have more than 3000 file.gz files, each with many lines like those below. The fields are separated by commas. I want to count only the lines in which the 21st field has today's date (e.g. 20171101).
I tried this:
awk -F',' '{if { $21 ~ "TZ=GMT+30 date '+%d-%m-%y'" } { ++count; } END { print count; }}' file.txt
but it's not working.
Using awk, something like below
awk -F"," -v toSearch="$(date '+%Y%m%d')" '$21 ~ toSearch{count++}END{print count}' file
The date '+%Y%m%d' produces the date in the format you requested, e.g. 20170111. We then match that pattern against the 21st field, count the occurrences, and print the count in the END clause.
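For instance, pinning toSearch to a fixed date and feeding it a hypothetical sample row with 21+ comma-separated fields:

```shell
# Fields a..t are filler occupying positions 1-20; field 21 holds the date
printf 'a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,20171101,x\n' |
awk -F"," -v toSearch="20171101" '$21 ~ toSearch{count++}END{print count}'
# prints 1
```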
I'm not sure whether the Solaris version of grep supports the -c flag for counting pattern matches; if it does, you can do it as
grep -c "$(date '+%Y%m%d')" file
Another solution using gnu-grep
grep -Ec "([^,]*,){20}$(date '+%Y%m%d')" file
explanation: ([^,]*,){20} means 20 fields before the date to be checked
Using awk and process substitution to uncompress a bunch of gzs and feed them to awk for analyzing and counting:
$ awk -F\, 'substr($21,1,8)==strftime("%Y%m%d"){i++}; END{print i}' * <(zcat *gz)
Explained:
substr($21,1,8) == strftime("%Y%m%d") {  # if the first 8 bytes of $21 match today's date
    i++                                  # increment counter
}
END {                                    # in the end
    print i                              # output counter
}' * <(zcat *gz)                         # zcat all gzs to awk
If Perl is an option, this solution works on all 3000 gzipped files:
zcat *.gz | perl -F, -lane 'BEGIN{chomp($date=`date "+%Y%m%d"`); $count=0}; $count++ if $F[20] =~ /^$date/; END{print $count}'
These command-line options are used:
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – splits input lines into the @F array. Defaults to splitting on whitespace.
-n loop around each line of the input file
-e execute the perl code
-F autosplit modifier, in this case splits on ,
BEGIN{} executes before the main loop.
The $date and $count variables are initialized.
The $date variable is set to the result of the shell command date "+%Y%m%d"
$F[20] is the 21st element in @F
If the 21st element starts with $date, increment $count
END{} executes after the main loop
Using grep and cut instead of awk and avoiding regular expressions:
cut -f21 -d, file | grep -Fc "$(date '+%Y%m%d')"

awk command to convert date format in a file

Given below is the file content and the awk command used:
Input file:in_t.txt
1,ABC,SSS,20-OCT-16,4,1,0,5,0,0,0,0
2,DEF,AAA,20-JUL-16,4,1,0,5,0,0,0,0
Expected outfile:
SSS|2016-10-20,5
AAA|2016-07-20,5
I tried the below command:
awk -F , '{print $3"|"$(date -d 4)","$8}' in_t.txt
Got the outfile as:
SSS|20-OCT-16,5
AAA|20-JUL-16,5
The only thing I want to know is how to format the date with the same awk command. I tried
awk -F , '{print $3"|"$(date -d 4)","$8 +%Y-%m-%d}' in_t.txt
Getting syntax error. Can I please get some help on this?
Better to do this in shell itself and use date -d to convert the date format:
#!/bin/bash
while IFS=',' read -ra arr; do
    printf "%s|%s,%s\n" "${arr[2]}" $(date -d "${arr[3]}" '+%Y-%m-%d') "${arr[7]}"
done < file
SSS|2016-10-20,5
AAA|2016-07-20,5
What's your definition of a single command? A call to awk is a single shell command. This may be what you want:
$ awk -F'[,-]' '{ printf "%s|20%02d-%02d-%02d,%s\n", $3, $6, (match("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",$5)+2)/3, $4, $10 }' file
SSS|2016-10-20,5
AAA|2016-07-20,5
BTW it's important to remember that awk is not shell. You can't call shell tools (e.g. date) directly from awk any more than you could from C. When you wrote $(date -d 4) awk saw an unset variable named date (numeric value 0) from which you extracted the value of an unset variable named d (also 0) to get the numeric result 0 which you then concatenated with the number 4 to get 04 and then applied the $ operator to to get the contents of field $04 (=$4). The output has nothing to do with the shell command date.
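The month-lookup arithmetic in that match() call works because the abbreviations are packed three characters apart: match returns position 1 for JAN, 4 for FEB, ..., 34 for DEC, so (pos+2)/3 maps each month to 1..12. A quick check (index behaves the same as match for a fixed substring):

```shell
awk 'BEGIN {
    m = "JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"
    print (index(m, "FEB") + 2) / 3   # position 4  -> 2
    print (index(m, "OCT") + 2) / 3   # position 28 -> 10
    print (index(m, "DEC") + 2) / 3   # position 34 -> 12
}'
```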
From Unix.com
Just tweaked it a little to suit your needs
awk -v var="20-OCT-16" '
BEGIN {
    split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
    for (i=1; i<=12; i++) mdigit[month[i]]=i
    m = toupper(substr(var,4,3))
    dat = "20" substr(var,8,2) "-" sprintf("%02d", mdigit[m]) "-" substr(var,1,2)
    print dat
}'
2016-10-20
Explanation (each step contributes a piece of the output 2016-10-20):
Prefix "20" {20}
Substring from the 8th position, 2 characters {16}
Print - {-}
Look up the month literal (converted to uppercase) in the mdigit mapping {10}
Print - {-}
Substring from the 1st position, 2 characters {20}
This may work for you also.
awk -F , 'BEGIN {months = "  JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"}  # two leading spaces: JAN ends at 5, OCT at 31
{
    num = index(months, substr($4,4,3)) / 3
    if (length(num) == 1) {num = "0" num}
    date = "20" substr($4,8,2) "-" num "-" substr($4,1,2)
    print $3 "|" date "," $8
}' in_t.txt
You were close with your call to date. You can indeed use it with getline to parse and output the date value:
awk -F',' '{
    parsedate = "date --date=" $4 " +%Y-%m-%d"
    parsedate | getline mydate
    close(parsedate)
    print $3 "|" mydate "," $8
}' in_t.txt
Explanation:
-F',' sets the field separator (delimiter) to comma
parsedate = "date --date=" $4 " +%Y-%m-%d" leverages date's ability to convert the 4th field (the date) to the given output format, and assigns that command string to the variable "parsedate"
parsedate | getline mydate runs your custom "parsedate" command, and assigns the output to the mydate variable
close (parsedate) prevents certain errors with multiline input/output (See Running a system command in AWK for discussion of getline and close())
print $3 "|" mydate "," $8 outputs the contents of the original line, pipe- and comma-separated, with the new "mydate" value substituted for the original date field.
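Putting it together on the question's input (a sketch; note the date lives in the 4th field of this file, so $4 is used below, and GNU date is assumed to understand the DD-MON-YY format):

```shell
printf '%s\n' '1,ABC,SSS,20-OCT-16,4,1,0,5,0,0,0,0' '2,DEF,AAA,20-JUL-16,4,1,0,5,0,0,0,0' |
awk -F',' '{
    parsedate = "date --date=" $4 " +%Y-%m-%d"   # field 4 holds the date in this file
    parsedate | getline mydate
    close(parsedate)
    print $3 "|" mydate "," $8
}'
# prints:
# SSS|2016-10-20,5
# AAA|2016-07-20,5
```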

format date in file using awk

Content of the file is
Feb-01-2014 one two
Mar-02-2001 three four
I'd like to format the first field (the date) to %Y%m%d format
I'm trying to use a combination of awk and date command, but somehow this is failing even though i got the feeling i'm almost there:
cat infile | awk -F"\t" '{$1=system("date -d " $1 " +%Y%m%d");print $1"\t"$2"\t"$3}' > test
this prints out date's usage page, which makes me think the date command is triggered properly but there is something wrong with the argument. Do you see the issue somewhere?
I'm not that familiar with awk.
You don't need date for this, it's simply a matter of rearranging the date string:
$ awk 'BEGIN{FS=OFS="\t"} {
    split($1,t,/-/)
    $1 = sprintf("%s%02d%s", t[3], (match("JanFebMarAprMayJunJulAugSepOctNovDec",t[1])+2)/3, t[2])
}1' file
20140201 one two
20010302 three four
You can use:
while read -r a _; do
    date -d "$a" '+%Y%m%d'
done < file
20140201
20010302
system() returns the exit code of the command.
Instead:
cat infile | awk -F"\t" '{"date -d " $1 " +%Y%m%d" | getline d;print d"\t"$2"\t"$3}'
$ awk '{system("date -d "$1" +%Y%m%d | tr -d \"\n\""); printf "\t%s\t%s\n", $2, $3}' file
20140201	one	two
20010302	three	four
