In shell (no bash because of Alpine) using BusyBox, how can I compare two dates both formatted as Tue Aug 30 12:01:37 GMT 2022?
I want to know which one comes first. date doesn't support this input format. I'm only interested in whole days. The time isn't interesting for me. So two dates on the same day but a different time are equal to me.
Of course I could put all the names of the months in a lookup table and use the index of the month as its integer value (to be able to compare) but I have the feeling I shouldn't be the one programming that out...
Update:
/opt/scripts $ a="Tue Aug 30 12:01:37 GMT 2022"
/opt/scripts $ date -d "$a" +%s
date: invalid date 'Tue Aug 30 12:01:37 GMT 2022'
/opt/scripts $ date --help
BusyBox v1.34.1 (2022-04-04 10:19:27 UTC) multi-call binary.
Usage: date [OPTIONS] [+FMT] [[-s] TIME]
Display time (using +FMT), or set time
-u Work in UTC (don't convert to local time)
[-s] TIME Set time to TIME
-d TIME Display TIME, not 'now'
-D FMT FMT (strptime format) for -s/-d TIME conversion
-r FILE Display last modification time of FILE
-R Output RFC-2822 date
-I[SPEC] Output ISO-8601 date
SPEC=date (default), hours, minutes, seconds or ns
Recognized TIME formats:
@seconds_since_1970
hh:mm[:ss]
[YYYY.]MM.DD-hh:mm[:ss]
YYYY-MM-DD hh:mm[:ss]
[[[[[YY]YY]MM]DD]hh]mm[.ss]
'date TIME' form accepts MMDDhhmm[[YY]YY][.ss] instead
/opt/scripts $
Install dateutils https://pkgs.alpinelinux.org/package/edge/community/x86/dateutils . Use strptime to convert the date to seconds. Compare seconds.
apk add dateutils
a=$(strptime -f %s -i "%a %b %d %T %Z %Y" "Tue Aug 30 12:01:37 GMT 2022")
b=$(strptime -f %s -i "%a %b %d %T %Z %Y" "Tue Aug 30 12:01:38 GMT 2022")
[ "$a" -lt "$b" ]
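Since the question only cares about whole days, the epoch values can be reduced to day numbers before comparing. A minimal sketch, assuming both values are epoch seconds and that "day" means the UTC calendar day; compare_days is a hypothetical helper name, not part of dateutils:

```sh
#!/bin/sh
# Hypothetical helper: compare two epoch-second values by UTC calendar day.
# Prints "before", "after" or "same".
compare_days() {
    day_a=$(( $1 / 86400 ))   # whole days since 1970-01-01 UTC
    day_b=$(( $2 / 86400 ))
    if [ "$day_a" -lt "$day_b" ]; then
        echo before
    elif [ "$day_a" -gt "$day_b" ]; then
        echo after
    else
        echo same
    fi
}

# Both values fall on 2022-08-30 UTC, one second apart.
compare_days 1661860897 1661860898
```

The call at the end prints same, because both timestamps divide down to the same day number.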
You may have to rely on awk:
/ # cat /etc/alpine-release
3.16.0
/ # echo $a
Tue Aug 30 12:01:37 GMT 2022
/ # TZ=GMT awk -v a="$a" 'BEGIN {
> split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", months)
> split(a, date)
> gsub(/:/, " ", date[4])
>
> for (i=1; i<=12; i++) {
> if (date[2] == months[i]) {
> timestamp = date[6] " " i " " date[3] " " date[4]
> print mktime(timestamp)
> exit
> }
> }
>
> print "hmm, " date[2] " is an unknown month"
> exit 1
> }'
1661860897
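If only whole days matter and mktime is not available, the month lookup can also be done in plain sh without any epoch conversion. A minimal sketch, assuming the input always has the six-word layout from the question; date_key is a hypothetical helper name, and the day is taken as written, so the time zone field is ignored:

```sh
#!/bin/sh
# Hypothetical helper: map "Tue Aug 30 12:01:37 GMT 2022" to 20220830.
date_key() {
    # Unquoted on purpose: split the date string into its six words.
    set -- $1
    case $2 in
        Jan) m=1 ;;  Feb) m=2 ;;  Mar) m=3 ;;  Apr) m=4 ;;
        May) m=5 ;;  Jun) m=6 ;;  Jul) m=7 ;;  Aug) m=8 ;;
        Sep) m=9 ;;  Oct) m=10 ;; Nov) m=11 ;; Dec) m=12 ;;
        *) echo "unknown month: $2" >&2; return 1 ;;
    esac
    # ${3#0} drops one leading zero so printf does not read "08" as octal.
    printf '%s%02d%02d\n' "$6" "$m" "${3#0}"
}

a=$(date_key "Tue Aug 30 12:01:37 GMT 2022")
b=$(date_key "Fri Jun 3 09:26:55 CDT 2022")
if [ "$a" -lt "$b" ]; then echo "a is earlier"
elif [ "$a" -gt "$b" ]; then echo "a is later"
else echo "same day"
fi
```

The keys are plain integers of the form YYYYMMDD, so the usual -lt, -gt and -eq tests order them chronologically.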
Ok, my alpine busybox copy of date doesn't recognize strings as month either.
If you want "slick", stick with Glenn's awk solution, as long as the time functions work for you. I hacked together the least slick kludge using just echo, date, read, if, and a lot of temp files. It's an ugly mess, but it works, and it was a fun exercise in using only the most basic tools.
/tmp $ ./script
#! /bin/sh
cat "$0"
cd /tmp
echo "01">Jan
echo "02">Feb
echo "03">Mar
echo "04">Apr
echo "05">May
echo "06">Jun
echo "07">Jul
echo "08">Aug
echo "09">Sep
echo "10">Oct
echo "11">Nov
echo "12">Dec
echo "Tue Aug 30 12:01:37 GMT 2022">a_raw
read -r a_raw<a_raw
echo "Fri Jun 3 09:26:55 CDT 2022">b_raw
read -r b_raw<b_raw
read -r _ Mon DD tim z YYYY<a_raw
read -r MM<"$Mon"
date -d "$YYYY-$MM-$DD" +"%s">a_epoch
read -r a_epoch<a_epoch
read -r _ Mon DD tim z YYYY<b_raw
read -r MM<"$Mon"
date -d "$YYYY-$MM-$DD" +"%s">b_epoch
read -r b_epoch<b_epoch
if [ "$a_epoch" -lt "$b_epoch" ]
then echo "$a_raw ($a_epoch) is before $b_raw ($b_epoch)"
elif [ "$a_epoch" -gt "$b_epoch" ]
then echo "$a_raw ($a_epoch) is after $b_raw ($b_epoch)"
else echo "$a_raw ($a_epoch) is same as $b_raw ($b_epoch)"
fi
rm Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec a_raw b_raw a_epoch b_epoch
Tue Aug 30 12:01:37 GMT 2022 (1661817600) is after Fri Jun 3 09:26:55 CDT 2022 (1654214400)
original
What do you mean "date doesn't support this input format"?
Something like this ought to work in sh, though I confess I don't have an alpine handy...
a="Tue Aug 30 12:01:37 GMT 2022"
b="Fri Jun 3 09:26:55 CDT 2022"
a_epoch=`date -d "$a" +%s`
b_epoch=`date -d "$b" +%s`
echo "A: [$a] ($a_epoch)"
echo "B: [$b] ($b_epoch)"
if [ "$a_epoch" -lt "$b_epoch" ]; then echo "$a is before $b"; fi
if [ "$a_epoch" -gt "$b_epoch" ]; then echo "$a is after $b"; fi
if [ "$a_epoch" -eq "$b_epoch" ]; then echo "$a is same as $b"; fi
Should say something like
A: [Tue Aug 30 12:01:37 GMT 2022] (1661860897)
B: [Fri Jun 3 09:26:55 CDT 2022] (1654266415)
Tue Aug 30 12:01:37 GMT 2022 is after Fri Jun 3 09:26:55 CDT 2022
There are cleaner ways, but this should get you started.
Lemme spin up a container and try there, brb...
I have a large dataset for analysis and I am looking for a shell script that filters it down to the rows I need, so that I can load the result into R for further analysis.
The structure for data is as follows:
Size,ModifiedTime,AccessTime,contentid
4886,"Jun 11, 2009 06:51:08 PM","Mar 15, 2013 09:24:53 AM",000000285b7925f511b3159a72f80a4a
4096,"Aug 21, 2008 03:54:28 PM","May 12, 2009 04:45:41 PM",0000011afae4d1227c4df57b410ea52c
84848,"Feb 12, 2007 12:40:00 PM","Apr 07, 2014 09:39:03 AM",000001cec02017ca3eb81ddc4cd1c9ff
518,"Aug 22, 2006 02:12:03 PM","Dec 25, 2007 06:48:18 AM",00000233565d1c17c3135a9504c455ca
264158,"Dec 08, 2009 03:28:14 PM","Apr 08, 2013 11:52:15 AM",000003020ba74b9d1b6075d3c1b8fcb3
725963,"Sep 29, 2008 03:45:21 PM","May 17, 2011 08:48:40 AM",0000034b98d29d84ce7b61ee68be7658
1340,"Sep 07, 2011 03:36:54 AM","Mar 12, 2013 02:55:01 AM",000004ed899e26ae1c9b1ece35a98af1
75264,"Jul 28, 2011 05:09:58 PM","Jun 07, 2014 04:21:28 PM",000005a09fd2eb706c5800eb06084160
198724,"Jul 23, 2012 02:25:58 PM","Jan 21, 2013 12:58:07 PM",0000060b9d552c35f281b5033dcfa1b4
It is essentially a large csv file.
Now I want to keep only the rows whose AccessTime is more than 10 years old and write them to a separate CSV file; for the sample above that means the 2nd and 4th data rows.
I tried the following: create a temporary time variable and compare it with AccessTime; if AccessTime is earlier, print the row.
BEGIN{
FPAT = "([^,]+)|(\"[^\"]+\")"; #this to read csv as some column value contains ,
OFS=",";
date=$(date -d "-3650 days" +"%s"); #temp time variable in epoch format
}
{
command="date -d" $6 " +%s"; #$6 refers to AccessTime column
( command | getline temp ); #converts Accesstime value to epoch format
close(command);
if(temp<date) print $6
}
But when I run this command, it doesn't print anything.
Any help is much appreciated.
Desired output:
Size,ModifiedTime,AccessTime,contentid
4096,"Aug 21, 2008 03:54:28 PM","May 12, 2009 04:45:41 PM",0000011afae4d1227c4df57b410ea52c
518,"Aug 22, 2006 02:12:03 PM","Dec 25, 2007 06:48:18 AM",00000233565d1c17c3135a9504c455ca
$ awk '
BEGIN {
m["Jan"]="01" # lookups for months
m["Feb"]="02" # Feb -> 02
m["Mar"]="03" # Mar -> 03
m["Apr"]="04" # etc.
m["May"]="05"
m["Jun"]="06"
m["Jul"]="07"
m["Aug"]="08"
m["Sep"]="09"
m["Oct"]="10"
m["Nov"]="11" # below we get todays date
m["Dec"]="12" # 10 years ago
dcmd="date +\"%Y%m%d,\" --date=\"10 years ago\"" # returns 20101204,
if((dcmd | getline d)<=0) # if getline fails
exit 1 # exit
# d=strftime("%Y%m%d")-10^5 "," # use this for GNU awk
}
$9 m[$7] $8>=d' file # explained below
d gets the value 20101204, (note the trailing comma) from date +"%Y%m%d," --date="10 years ago". The AccessTime components read from the file are rearranged with $9 m[$7] $8, so that, for example, Mar 15, 2013 becomes 20130315, (note the comma again). The condition compares those two values as strings, which for the fixed-width YYYYMMDD layout is the same as comparing the dates.
Output:
4886 Jun 11, 2009 06:51:08 PM Mar 15, 2013 09:24:53 AM 000000285b7925f511b3159a72f80a4a
84848 Feb 12, 2007 12:40:00 PM Apr 07, 2014 09:39:03 AM 000001cec02017ca3eb81ddc4cd1c9ff
264158 Dec 08, 2009 03:28:14 PM Apr 08, 2013 11:52:15 AM 000003020ba74b9d1b6075d3c1b8fcb3
725963 Sep 29, 2008 03:45:21 PM May 17, 2011 08:48:40 AM 0000034b98d29d84ce7b61ee68be7658
1340 Sep 07, 2011 03:36:54 AM Mar 12, 2013 02:55:01 AM 000004ed899e26ae1c9b1ece35a98af1
75264 Jul 28, 2011 05:09:58 PM Jun 07, 2014 04:21:28 PM 000005a09fd2eb706c5800eb06084160
198724 Jul 23, 2012 02:25:58 PM Jan 21, 2013 12:58:07 PM 0000060b9d552c35f281b5033dcfa1b4
With GNU awk for time functions, FPAT, and gensub():
$ cat tst.awk
BEGIN {
OFS = ","
FPAT = "([^" OFS "]*)|(\"[^\"]+\")"
now = strftime("%Y %m %d %H %M %S")
year = gensub(/ .*/,"",1,now)
rest = gensub(/[^ ]+/,"",1,now)
secs = mktime((year-10) rest)
thresh = strftime("%Y%m%d%H%M%S",secs)
}
NR > 1 {
split($3,t,/[ ,:]+/)
mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec",t[1])+2)/3
hour = t[4] + ( (t[7] == "PM") && (t[4] < 12) ? 12 : 0 )
curr = sprintf("%04d%02d%02d%02d%02d%02d", t[3], mthNr, t[2], hour, t[5], t[6])
}
(NR == 1) || (curr < thresh)
$ awk -f tst.awk file
Size,ModifiedTime,AccessTime,contentid
4096,"Aug 21, 2008 03:54:28 PM","May 12, 2009 04:45:41 PM",0000011afae4d1227c4df57b410ea52c
518,"Aug 22, 2006 02:12:03 PM","Dec 25, 2007 06:48:18 AM",00000233565d1c17c3135a9504c455ca
Written and tested in GNU awk with the samples shown. This assumes you need to compare the second date in each row with the current date. This solution also does not deal with leap seconds (still trying to figure out another way here).
awk '
BEGIN{
num=split("jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec",arr1,",")
for(i=1;i<=num;i++){
month[arr1[i]]=sprintf("%02d",i)
}
}
match($0,/[AP]M.*[AP]M/){
val=substr($0,RSTART,RLENGTH)
sub(/^[AP]M +/,"",val)
sub(/ [AP]M +$/,"",val)
split(val,array,"[ ,]")
dat=array[4] OFS month[tolower(array[1])] OFS array[2] OFS array[5]
timE=(systime()-mktime(gensub(/[ ":-]/," ","g",dat)))/(365*60*24*60)
if(timE>10){ print }
}
' Input_file
This will not print the header. If you need it, add FNR==1{print;next} before the match() block.
Another shorter awk solution.
$ awk -F, -v ct=$(date "+%s") ' NR>1 { dc="date -d"$4 $5 " \"+%s\""; dc|getline t; close(dc); yrs=(ct - t)/(24*60*60*365) } yrs>10 || NR==1 ' monte.txt
Size,ModifiedTime,AccessTime,contentid
4096,"Aug 21, 2008 03:54:28 PM","May 12, 2009 04:45:41 PM",0000011afae4d1227c4df57b410ea52c
518,"Aug 22, 2006 02:12:03 PM","Dec 25, 2007 06:48:18 AM",00000233565d1c17c3135a9504c455ca
$
Explanation:
The date command works if we just pass the string representation of access time.
$ date -d"Jun 11, 2009 06:51:08 PM"
Thu Jun 11 18:51:08 IST 2009
It works even without the comma
$ date -d"Jun 11 2009 06:51:08 PM"
Thu Jun 11 18:51:08 IST 2009
So there is no need to clean the data. Just passing $4 and $5 from the input file with comma as delimiter would work.
For comparison, I have used the epoch
awk -F, -v ct=$(date "+%s") ' #get the current epoch seconds via ct
NR>1 {
dc="date -d"$4 $5 " \"+%s\""; #build the date command using access time $4 and $5
dc|getline t; #execute the command and get the output in temp t
close(dc); #close the pipe so a large file does not exhaust open file descriptors
yrs=(ct - t)/(24*60*60*365) #calculate the number of years between ct and t
}
yrs>10 || NR==1 #print if diff yrs > 10 or NR==1 for header
'
Another solution:
If you want to apply the 10-year logic inside the date command itself, you just need to strip the closing double quote from $5.
$ awk -F, -v ct=$(date "+%s") ' NR>1 { c5=substr($5,1,length($5)-1);dc="date -d"$4 c5 " + 10 years \" \"+%s\""; dc|getline t; close(dc) } t<ct ' monte.txt
Size,ModifiedTime,AccessTime,contentid
4096,"Aug 21, 2008 03:54:28 PM","May 12, 2009 04:45:41 PM",0000011afae4d1227c4df57b410ea52c
518,"Aug 22, 2006 02:12:03 PM","Dec 25, 2007 06:48:18 AM",00000233565d1c17c3135a9504c455ca
$
EpochConverter turns a timestamp value like 1586775709496 into Monday, April 13, 2020 11:01:49.496 AM.
Unfortunately, the date tool on macOS expects seconds, not milliseconds, and gives a wrong year:
> date -r 1586775709496
Thu Dec 2 15:24:56 CET 52252
This existing question only explains the obvious: you can divide by 1000 (cut off the trailing 3 digits) and the built-in date tool will work.
But: that is not what I am looking for. I am looking for a "straightforward" way to turn such millisecond based timestamps into "human readable" including the milliseconds. Are there ways to achieve that?
timestamp=1586775709496
ms=$(printf '%03d' $(( timestamp % 1000 )))
echo "$(date -r $(( $timestamp / 1000 )) +"%a, %b %d, %Y %H:%M:%S").$ms"
Mon, Apr 13, 2020 12:01:49.496
You can edit the date format string to get exactly the result you need.
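The split into seconds and milliseconds can be wrapped in a small function. A minimal sketch, assuming UTC output is acceptable; ms_to_human is a hypothetical helper name, and the fallback from date -d @SECONDS to date -r SECONDS is an assumption about how the BSD/macOS tool reacts to the first form, which I have not verified on macOS:

```sh
#!/bin/sh
# Hypothetical helper: format a millisecond epoch timestamp in UTC,
# keeping the milliseconds zero-padded to three digits.
ms_to_human() {
    secs=$(( $1 / 1000 ))
    ms=$(printf '%03d' $(( $1 % 1000 )))
    fmt='+%Y-%m-%d %H:%M:%S'
    # GNU and BusyBox date accept -d @SECONDS; BSD/macOS date uses -r SECONDS.
    stamp=$(date -u -d "@$secs" "$fmt" 2>/dev/null) ||
        stamp=$(date -u -r "$secs" "$fmt") || return 1
    printf '%s.%s\n' "$stamp" "$ms"
}

ms_to_human 1586775709496
```

The zero padding matters for values such as 1586775709006, where a plain remainder would print .6 instead of .006.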
With GNU date I believe that would be:
$ a=1586775709496
$ LC_ALL=C date -u --date=@"$((a/1000)).$(printf "%03d" $((a%1000)))" +"%A, %B %2d, %Y %H:%M:%S.%3N %p"
Monday, April 13, 2020 11:01:49.496 AM
The %3N is something that GNU date supports and it prints only milliseconds.
Since the last 3 digits of the input appear unchanged in the output, you could also paste them directly into the format string, which removes the need for the %N extension:
$ a=1586775709496;
$ LC_ALL=C date -u --date=@"$((a/1000))" +"%A, %B %2d, %Y %H:%M:%S.$(printf "%03d" $((a%1000))) %p"
I've currently got a string as below:
integration#{Wed Nov 19 14:17:32 2014} branch: thebranch
This is contained in a file, and I parse the string. However I want the value between the brackets {Wed Nov 19 14:17:32 2014}
I have zero experience with Sed, and to be honest I find it a little cryptic.
So far I've managed to use the following command, however the output is still the entire string.
What am I doing wrong?
sed -e 's/[^/{]*"\([^/}]*\).*/\1/'
To get the value between { and }:
$ sed 's/^[^{]*{\([^{}]*\)}.*/\1/' file
Wed Nov 19 14:17:32 2014
This is very simple to do with awk, without a complicated regex.
awk -F"{|}" '{print $2}' file
Wed Nov 19 14:17:32 2014
It sets the field separator to { or }, then your data will be in the second field.
FS could also be set like this:
awk -F"[{}]" '{print $2}' file
To see all fields:
awk -F"{|}" '{print "field#1="$1"\nfield#2="$2"\nfield#3="$3}' file
field#1=integration#
field#2=Wed Nov 19 14:17:32 2014
field#3= branch: thebranch
This might work:
sed -e 's/[^{]*{\([^}]*\)}.*/\1/g'
Test
$ echo "integration#{Wed Nov 19 14:17:32 2014} branch: thebranch" | sed -e 's/[^{]*{\([^}]*\)}.*/\1/g'
Wed Nov 19 14:17:32 2014
Regex
[^{]* matches anything other than {, that is, integration#
{ matches the opening {
\([^}]*\) capture group 1: anything other than }, that is, Wed Nov 19 14:17:32 2014
} matches the closing }
.* matches the rest
Simply put, the command below also gets the data:
echo "integration#{Wed Nov 19 14:17:32 2014} branch: thebranch" | sed 's/.*{\(.*\)}.*/\1/g'
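If the string is already in a shell variable, the same extraction can be done without sed or awk, using POSIX parameter expansion. A minimal sketch, assuming the line contains exactly one {...} pair; the variable names line, rest and value are placeholders:

```sh
#!/bin/sh
line='integration#{Wed Nov 19 14:17:32 2014} branch: thebranch'

rest=${line#*\{}      # drop everything up to and including the first {
value=${rest%%\}*}    # drop everything from the first } onward

echo "$value"
```

This prints Wed Nov 19 14:17:32 2014. If the line has no braces at all, both expansions leave the text unchanged, so the caller may want to check for that case.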