Date difference based on yyyymm in unix - shell

DATE1=201609
DATE2=201508
How do I calculate the difference between these two dates and get the result as a number of months? For example:
201609 - 201508 = 13 months

Calculating a time difference is generally a complicated task, even for a single calendar type (and there are many). Many programming languages have built-in support for date and time manipulation, including calculating time differences. Unfortunately, the most useful tool available in the popular shells, the date command, lacks this feature.
Therefore, we should either write a script in another language, or make some assumptions, such as the number of days in a year.
For example, in Perl the task is done with just four lines of code:
perl -e "$(cat <<'PerlScript'
use Time::Piece;
my $t1 = Time::Piece->strptime($ARGV[0], '%Y%m');
my $t2 = Time::Piece->strptime($ARGV[1], '%Y%m');
printf "%d months\n", ($t1 - $t2)->months;
PerlScript
)" 201609 201508
However, the difference of two Time::Piece objects is an instance of Time::Seconds, which, according to its documentation, assumes that
there are 24 hours in a day, 7 days in a week, 365.24225 days in a
year and 12 months in a year.
This indirectly confirms my point about the complexity of the task.
So let's make the same assumption and write a simple shell script:
DATE1=201609
DATE2=201508
printf '(%d - %d) / 2629744.2\n' \
$(date -d ${DATE1}01 +%s) \
$(date -d ${DATE2}01 +%s) | bc
where 2629744.2 is the number of seconds in an average month, i.e. 3600 * 24 * (365.24225 / 12).
Note that most shells do not support floating point arithmetic. That's why we need to invoke an external tool such as bc.
The script outputs 13. It is portable across shells: you may run it in the standard shell, Bash, Korn shell, or Zsh, for instance. It does rely on the -d option of GNU date, though. If you want to put the result into a variable, just wrap the printf command in $( ... ):
months=$(printf '(%d - %d) / 2629744.2\n' \
$(date -d ${DATE1}01 +%s) \
$(date -d ${DATE2}01 +%s) | bc)
printf '%d - %d = %d months\n' $DATE1 $DATE2 $months
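One caveat with the averaged-month approach: bc truncates the quotient, so two adjacent months that are shorter than the average (201603 and 201602, for example, are only 29 days apart) come out as 0. The following is a minimal sketch that rounds to the nearest month instead, using only integer shell arithmetic. The function name approx_months is made up for illustration, GNU date is assumed, and the first argument is assumed to be the later date.

```sh
# Hypothetical helper: months between two YYYYMM values (first >= second),
# using the same averaged month of 2629744.2 seconds, rounded to nearest.
approx_months() {
    # TZ=UTC keeps daylight-saving shifts out of the subtraction.
    s1=$(TZ=UTC date -d "${1}01" +%s)
    s2=$(TZ=UTC date -d "${2}01" +%s)
    # Work in tenths of a second so 2629744.2 becomes the integer 26297442;
    # adding half of it (13148721) before dividing rounds to nearest.
    echo $(( ((s1 - s2) * 10 + 13148721) / 26297442 ))
}

approx_months 201609 201508   # prints 13
approx_months 201603 201602   # prints 1 (plain truncation would give 0)
```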

You can try the solution below:
[vipin@hadoop ~]$ cat time.awk
{
    # Each input line holds two YYYYMM values: split them into year and month.
    years  = substr($1, 1, 4) - substr($2, 1, 4)
    months = substr($1, 5, 2) - substr($2, 5, 2)
    # A year difference counts for 12 months.
    total = years * 12 + months
    if (total >= 0)
        print total
    else
        print "Argument 2 is greater than 1"
}
Test cases:
[vipin@hadoop ~]$ cat time1.txt && awk -f time.awk time1.txt
201611 201601
10
[vipin@hadoop ~]$ cat time2.txt && awk -f time.awk time2.txt
201601 201611
Argument 2 is greater than 1
[vipin@hadoop ~]$ cat time3.txt && awk -f time.awk time3.txt
201511 201601
Argument 2 is greater than 1
[vipin@hadoop ~]$ cat time4.txt && awk -f time.awk time4.txt
201611 201611
0
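For a quick check without creating input files, the same year and month arithmetic can be fed to awk through a pipe. This is only a sketch: the wrapper name month_diff_awk is hypothetical, and it prints a negative number rather than a message when the second date is later.

```sh
# Hypothetical wrapper: month difference between two YYYYMM arguments.
month_diff_awk() {
    echo "$1 $2" | awk '{
        years  = substr($1, 1, 4) - substr($2, 1, 4)   # year difference
        months = substr($1, 5, 2) - substr($2, 5, 2)   # month difference
        print years * 12 + months                      # a year counts as 12 months
    }'
}

month_diff_awk 201609 201508   # prints 13
```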

Let's try something quick and dirty that won't work for all months in history, but it's probably good enough: convert YYYYMM to the number of months since year 0:
$ ym2m() {
if [[ $1 =~ ^([0-9]{4})([0-9]{2})$ ]]; then
echo $(( 10#${BASH_REMATCH[1]} * 12 + 10#${BASH_REMATCH[2]} ))
else
return 1
fi
}
$ ym2m 201609
24201
$ ym2m 201508
24188
$ echo $(( $(ym2m 201609) - $(ym2m 201508) ))
13
Notes:
requires bash; the =~ operator and BASH_REMATCH have been available since bash 3.0
ym2m => "year-month to months"
uses 10#number in the arithmetic expression to ensure "08" and "09" are not treated as invalid octal numbers.
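The octal pitfall mentioned in the notes can be seen in isolation. In the sketch below, to_decimal is a made-up helper name; it only wraps the 10# prefix so that zero-padded fields such as 08 and 09 are read as base 10.

```bash
# Hypothetical helper: read a zero-padded number as decimal.
# Without the 10# prefix, bash parses 08 and 09 as (invalid) octal
# and $(( 08 )) fails with a "value too great for base" error.
to_decimal() {
    echo $(( 10#$1 ))
}

to_decimal 08                                      # prints 8
echo $(( $(to_decimal 09) - $(to_decimal 08) ))    # prints 1
```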

Related

How to print result data on the same line in a bash command?

I have the command below and I want its results on a single line, separated by delimiters. My command:
Array=("GET" "POST" "OPTIONS" "HEAD")
echo $(date "+%Y-%m-%d %H:%M")
for i in "${Array[@]}"
do
cat /home/log/myfile_log | grep "$(date "+%d/%b/%Y:%H")"| awk -v last5=$(date --date="-5 min" "+%M") -F':' '$3>=last5 && $3<last5+5{print}' | egrep -a "$i" | wc -l
done
The result is:
2019-01-01 13:27
1651
5760
0
0
I want to have the result below:
2019-01-01 13:27,1651,5760,0,0
It looks (to me) like the overall objective is to scan /home/log/myfile_log for entries that have occurred within the last 5 minutes and which match one of the 4 entries in ${Array[@]}, keeping count of the matches along the way and finally printing the current date and the counts to a single line of output.
I've opted for a complete rewrite that uses awk's abilities of pattern matching, keeping counts and generating a single line of output:
date1=$(date "+%Y-%m-%d %H:%M") # current date
date5=$(date --date="-5 min" "+%M") # date from 5 minutes ago
awk -v d1="${date1}" -v d5="${date5}" -F":" '
BEGIN { keep=0 # init some variables
g=0
p=0
o=0
h=0
}
$3>=d5 && $3<d5+5 { keep=1 } # do we keep processing this line?
!keep { next } # if not then skip to next line
/GET/ { g++ } # increment our counters
/POST/ { p++ }
/OPTIONS/ { o++ }
/HEAD/ { h++ }
{ keep=0 } # reset keep flag for next line
# print results to single line of output
END { printf "%s,%s,%s,%s,%s\n", d1, g, p, o, h }
' <(grep "$(date '+%d/%b/%Y:%H')" /home/log/myfile_log)
NOTE: The OP may need to revisit the <(grep "$(date ...)" /home/log/myfile_log) to handle time windows that span hours, days, months and years, e.g. 14:59 - 15:04, 12/31/2019 23:59 - 01/01/2020 00:04, etc.
Yeah, it's a bit verbose but a bit easier to understand; OP can rewrite/reduce as sees fit.
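If the original loop is kept, the counts can also be collected in an array and joined with commas at the end. The sketch below shows only that joining step: the function name count_row and the sample log lines are invented for illustration, and the 5-minute time filter from the question is left out.

```bash
# Hypothetical helper: print "<stamp>,<count1>,<count2>,..." on one line.
# Usage: count_row STAMP LOGFILE PATTERN...
count_row() {
    local stamp=$1 logfile=$2 method
    shift 2
    local counts=()
    for method in "$@"; do
        # grep -c prints the number of matching lines (0 when none match);
        # "|| true" hides its non-zero exit status in that case.
        counts+=("$(grep -c -- "$method" "$logfile" || true)")
    done
    local IFS=,                      # "${counts[*]}" joins on the first IFS character
    printf '%s,%s\n' "$stamp" "${counts[*]}"
}

# Made-up sample data, only to show the output shape.
log=$(mktemp)
printf '%s\n' 'GET /a' 'GET /b' 'POST /c' > "$log"
count_row "2019-01-01 13:27" "$log" GET POST OPTIONS HEAD   # 2019-01-01 13:27,2,1,0,0
rm -f "$log"
```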

Is it really slow to handle a text file (more than 10K lines) with a shell script?

I have a file with more than 10K lines of record.
Within each line, there are two date+time info. Below is an example:
"aaa bbb ccc 170915 200801 12;ddd e f; g; hh; 171020 122030 10; ii jj kk;"
I want to pick out the lines where the two dates are less than 30 days apart.
Below is my source code:
#!/bin/bash
filename="$1"
echo $filename
touch filterfile
totalline=`wc -l $filename | awk '{print $1}'`
i=0
j=0
echo $totalline lines
while read -r line
do
i=$[i+1]
if [ $i -gt $[j+9] ]; then
j=$i
echo $i
fi
shortline=`echo $line | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'`
date1=`echo $shortline | awk '{print $1}'`
date2=`echo $shortline | awk '{print $2}'`
if [ $date1 -gt 700000 ]
then
continue
fi
d1=`date -d $date1 +%s`
d2=`date -d $date2 +%s`
diffday=$[(d2-d1)/(24*3600)]
#diffdays=`date -d $date2 +%s` - `date -d $date1 +%s`)/(24*3600)
if [ $diffday -lt 30 ]
then
echo $line >> filterfile
fi
done < "$filename"
I am running it in Cygwin. It took about 10 seconds to handle 10 lines. I use echo $i to show the progress.
Is it because I am doing something wrong in my script?
This answer does not answer your question but gives an alternative method to your shell script. The answer to your question is given by Sundeep's comment :
Why is using a shell loop to process text considered bad practice?
Furthermore, you should be aware that every time you call sed, awk, echo, date, ... you are asking the system to execute a binary, which has to be loaded into memory first, and so on. Doing this inside a loop is therefore very inefficient.
alternative solution
awk programs are commonly used to process log files containing timestamp information, indicating when a particular log record was written. gawk extended the awk standard with time-handling functions. The one you are interested in is :
mktime(datespec [, utc-flag ]) Turn datespec into a timestamp in the
same form as is returned by systime(). It is similar to the function
of the same name in ISO C. The argument, datespec, is a string of the
form "YYYY MM DD HH MM SS [DST]". The string consists of six or seven
numbers representing, respectively, the full year including century,
the month from 1 to 12, the day of the month from 1 to 31, the hour of
the day from 0 to 23, the minute from 0 to 59, the second from 0 to
60, and an optional daylight-savings flag.
The values of these numbers need not be within the ranges specified;
for example, an hour of -1 means 1 hour before midnight. The
origin-zero Gregorian calendar is assumed, with year 0 preceding year
1 and year -1 preceding year 0. If utc-flag is present and is either
nonzero or non-null, the time is assumed to be in the UTC time zone;
otherwise, the time is assumed to be in the local time zone. If the
DST daylight-savings flag is positive, the time is assumed to be
daylight savings time; if zero, the time is assumed to be standard
time; and if negative (the default), mktime() attempts to determine
whether daylight savings time is in effect for the specified time.
If datespec does not contain enough elements or if the resulting time
is out of range, mktime() returns -1.
As your date format is of the form yymmdd HHMMSS, we need to write a parser function convertTime for it. Be aware that we pass times of the form yymmddHHMMSS to this function. Furthermore, with space-delimited fields, your times are located in fields $4$5 and $11$12. As mktime converts the time to seconds since 1970-01-01, all we need to do is check whether the time delta is smaller than 30*24*3600 seconds.

awk 'function convertTime(t) {
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s substr(t,7,2)" "substr(t,9,2)" "substr(t,11,2)
return mktime(s)
}
{ t1=convertTime($4$5); t2=convertTime($11$12)}
(t2-t1 < 30*3600*24) { print }' <file>
If you are not interested in the real time delta (your sed line removes the actual time of day), then you can adapt it to:
awk 'function convertTime(t) {
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s "00 00 00"
return mktime(s)
}
{ t1=convertTime($4); t2=convertTime($11)}
(t2-t1 < 30*3600*24) { print }' <file>
If the dates are not in fixed fields, you can use match to find them (the matched text contains a space, which convertTime removes first):
awk 'function convertTime(t) {
gsub(/ /,"",t)
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s substr(t,7,2)" "substr(t,9,2)" "substr(t,11,2)
return mktime(s)
}
{ match($0,/[0-9]{6} [0-9]{6}/);
t1=convertTime(substr($0,RSTART,RLENGTH));
a=substr($0,RSTART+RLENGTH)
match(a,/[0-9]{6} [0-9]{6}/)
t2=convertTime(substr(a,RSTART,RLENGTH))}
(t2-t1 < 30*3600*24) { print }' <file>
With some modifications, often without speed in mind, I can reduce the processing time by 50% - which is a lot:
#!/bin/bash
filename="$1"
echo "$filename"
# touch filterfile
totalline=$(wc -l < "$filename")
i=0
j=0
echo "$totalline" lines
while read -r line
do
i=$((i+1))
if (( i > ((j+9)) )); then
j=$i
echo $i
fi
shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
date1=${shortline[0]}
date2=${shortline[1]}
if (( date1 > 700000 ))
then
continue
fi
d1=$(date -d "$date1" +%s)
d2=$(date -d "$date2" +%s)
diffday=$(((d2-d1)/(24*3600)))
# diffdays=$(date -d $date2 +%s) - $(date -d $date1 +%s))/(24*3600)
if (( diffday < 30 ))
then
echo "$line" >> filterfile
fi
done < "$filename"
Some remarks:
# touch filterfile
Well - the later CMD >> filterfile appends to this file and creates it if it doesn't exist, so the touch is not needed.
totalline=$(wc -l < "$filename")
You don't need awk here. wc reads from standard input, never sees a filename, and therefore prints only the count.
Capturing the output in an array:
shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
date1=${shortline[0]}
date2=${shortline[1]}
allows us array access and saves another call to awk.
On my machine, your code took about 42s for 2880 lines (on your machine 2880 s?) and about 19s for the same file with my code.
So I suspect, if you aren't running it on an i486-machine, that cygwin might be a slowdown. It's a linux environment for windows, isn't it? Well, I'm on a core Linux system. Maybe you try the gnu-utils for Windows - the last time I looked for them, they were advertised as gnu-utils x32 or something, maybe there is an a64-version available by now.
And the next thing I would have a look at, is the date calculation - that might be a slowdown too.
2880 lines isn't that much, so I don't suspect that my SSD drive plays a huge role in the game.
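One more way to cut down on forked processes, sketched here under the assumption that bash is available: the two dates can be pulled out with the built-in =~ operator instead of piping every line through sed. The function name extract_dates is made up, and the pattern is a simplified version of the sed expression.

```bash
# Hypothetical helper: print the two yymmdd dates found in a record,
# using bash's built-in regex matching (no sed or awk process per line).
extract_dates() {
    # First group: date of the first "yymmdd HHMMSS" pair;
    # second group: date of the last such pair on the line.
    local re='([0-9]{6}) [0-9]{6}.*([0-9]{6}) [0-9]{6}'
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

extract_dates "aaa bbb ccc 170915 200801 12;ddd e f; g; hh; 171020 122030 10; ii jj kk;"
# prints: 170915 171020
```

The two date calls per line remain, so this removes only part of the per-line overhead.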

Randomly loop over days in bash-script

At the moment, I have a while-loop that takes a starting date, runs a python script with the day as the input, then takes the day + 1 until a certain due date is reached.
day_start=2016-01-01
while [ "$day_start"!=2018-01-01 ] ;
do
day_end=$(date +"%Y-%m-%d" -d "$day_start + 1 day")
python script.py --start="$day_start" --end="$day_end";
day_start=$(date +"%Y-%m-%d" -d "$day_start + 1 day")
done
I would like to do the same thing, but now pick a random day between 2016-01-01 and 2018-01-01 and repeat until all days have been used once. I think it should be a for-loop instead of this while loop, but I have trouble specifying the for-loop over this date range in bash. Does anyone have an idea how to formulate this?
It can take quite a long time if you choose the dates randomly with repetition, because of the coupon collector's problem (a relative of the Birthday Problem): you'll hit most of the dates over and over again, but the last few dates can take quite some time.
The best idea I can give you is this:
Create all dates as before in a while loop (only the day_start-line)
Output all dates into a temporary file
Use sort -R on this file ("shuffles" the contents and prints the result)
Loop over the output from sort -R and you'll have dates randomly picked until all were reached.
Here's an example script which incorporates my suggestions:
#!/bin/bash
day_start=2016-01-01
TMPFILE="$(mktemp)"
while [ "$day_start" != "2018-01-01" ] ;
do
echo "${day_start}"
day_start=$(date +"%Y-%m-%d" -d "$day_start + 1 day")
done > "${TMPFILE}"
sort -R "${TMPFILE}" | while read -r day_start
do
day_end=$(date +"%Y-%m-%d" -d "$day_start + 1 day")
python script.py --start="$day_start" --end="$day_end";
done
rm "${TMPFILE}"
By the way, without the spaces around != in while [ "$day_start" != "2018-01-01" ];, the test sees a single non-empty word, which is always true, so bash never stops your loop.
2016 was a leap year, so 2016-01-01 through 2017-12-31 covers 366 + 365 = 731 days, i.e. day offsets 0 to 730.
Magic number: 730 (which happens to equal 2*365)
The i % 100 check is only there to reduce the output.
for i in {0..730}; do nd=$(date -d "2016/01/01"+${i}days +%D); if (( i % 100 == 0 || i == 730 )); then echo $nd ; fi; done
01/01/16
04/10/16
07/19/16
10/27/16
02/04/17
05/15/17
08/23/17
12/01/17
12/31/17
With the format instruction (here +%D), you might transform the output to your needs, date --help helps.
In a better readable format, and with +%F:
for i in {0..730}
do
nd=$(date -d "2016/01/01"+${i}days +%F)
echo $nd
done
2016-01-01
2016-04-10
2016-07-19
...
For a random order, use shuf (here, for brevity, with 7 days):
for i in {0..6}; do nd=$(date -d "2016/01/01"+${i}days +%D); echo $nd ;done | shuf
01/04/16
01/07/16
01/05/16
01/01/16
01/03/16
01/06/16
01/02/16
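Putting the pieces together, the date generation and the shuffle can be wrapped in one function and fed straight into a loop, with no temporary file. This is a sketch that assumes GNU date and shuf; the function name shuffled_days is hypothetical, and the end date is exclusive, as in the question's loop.

```bash
# Hypothetical helper: print every day from START up to (not including) END,
# one per line, in random order.
shuffled_days() {
    local day=$1 end=$2
    while [ "$day" != "$end" ]; do
        echo "$day"
        day=$(date -d "$day + 1 day" +%F)
    done | shuf
}

# Each day appears exactly once, in random order.
shuffled_days 2016-01-01 2016-01-08 | while read -r day_start; do
    day_end=$(date -d "$day_start + 1 day" +%F)
    echo "would run: python script.py --start=$day_start --end=$day_end"
done
```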

Compare Date-Time Stamps [duplicate]

I am looking to write a shell script that will compare the time between two date-time stamps in the format:
2013-12-10 13:25:30.123
2013-12-10 13:25:31.123
I can split the date and time if required (as the comparison should never be more than one second - I am looking at a reporting rate), so I can format the time as 13:25:30.123 / 13:25:31.123.
To just find the newer (or older) of the two timestamps, you could just use a string comparison operator:
time1="2013-12-10 13:25:30.123"
time2="2013-12-10 13:25:31.123"
if [[ "$time1" > "$time2" ]]; then
echo "the 1st timestamp is newer"
else
echo "the 2nd timestamp is newer"
fi
And, to find the time difference (tested):
ns1=$(date --date "$time1" +%s%N)
ns2=$(date --date "$time2" +%s%N)
echo "the difference in seconds is:" `bc <<< "scale=3; ($ns2 - $ns1) / 1000000000"` "seconds"
Which, in your case prints
the difference in seconds is: 1.000 seconds
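Since the question is about a reporting rate below one second, whole milliseconds may be enough, and those can be computed with shell integer arithmetic alone. A minimal sketch, assuming GNU date (for %N) and a shell with 64-bit arithmetic such as bash; the function name ms_between is made up.

```bash
# Hypothetical helper: milliseconds from the first timestamp to the second.
ms_between() {
    local ns1 ns2
    ns1=$(date -d "$1" +%s%N)   # nanoseconds since the epoch
    ns2=$(date -d "$2" +%s%N)
    echo $(( (ns2 - ns1) / 1000000 ))
}

ms_between "2013-12-10 13:25:30.123" "2013-12-10 13:25:31.123"   # prints 1000
```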
Convert them into timestamps before comparing:
if [ $(date -d "2013-12-10 13:25:31.123" +%s) -gt $(date -d "2013-12-10 13:25:30.123" +%s) ]; then
echo "blub";
fi
With Perl using the included Time::Piece library:
perl -MTime::Piece -nE '
BEGIN {
$, = "\t";
sub to_seconds {
my ($dt, $frac) = (shift =~ /(.*)(\.\d*)$/);
return(Time::Piece->strptime($dt, "%Y-%m-%d %T")->epoch + $frac);
}
}
if ($. > 1) {
$a = to_seconds($_);
$b = to_seconds($prev);
say $a, $b, $a-$b
}
$prev = $_
'<<END
2013-12-10 13:25:30.123
2013-12-10 13:25:31.123
2013-12-10 13:25:42.042
END
1386681931.123 1386681930.123 1
1386681942.042 1386681931.123 10.9190001487732

Convert a given time to seconds in solaris

I have a time like 2013-04-29 08:17:58, and I need to convert it to seconds since the epoch.
No Perl please; my OS is Solaris. +%s does not work. nawk 'BEGIN{print srand()}' converts the current time to seconds but does not convert a given time to seconds.
Thanks
Here is a shell function that doesn't require perl:
function d2ts
{
typeset d=$(echo "$@" | tr -d ' :-' | sed 's/..$/.&/')
typeset t=$(mktemp) || return -1
typeset s=$(touch -t $d $t 2>&1) || { rm $t ; return -1 ; }
[ -n "$s" ] && { rm $t ; return -1 ; }
truss -f -v 'lstat,lstat64' ls -d $t 2>&1 | nawk '/mt =/ {printf "%d\n",$10}'
rm $t
}
$ d2ts 2013-04-29 08:17:58
1367216278
Note that the returned value depends on your timezone.
$ TZ=GMT d2ts 2013-04-29 08:17:58
1367223478
How it works:
The first line converts the parameters to a format suitable for touch (here "2013-04-29 08:17:58" -> "201304290817.58")
The second line creates a temporary file
The third line changes the modification time of the just created file to the required value
The fourth line aborts the function if setting the time failed, i.e. if the provided time is invalid
The fifth line traces the ls command to get the file modification time and prints it as an integer
The sixth line removes the temporary file
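The first step can be tried on its own, without touching any file. In this sketch the function name to_touch_format is made up; the hyphen is placed last in the tr set so that it is taken literally rather than as a range.

```sh
# Hypothetical helper: turn "YYYY-MM-DD HH:MM:SS" into touch -t format.
to_touch_format() {
    # Delete spaces, colons and hyphens, then put a dot before the seconds.
    echo "$@" | tr -d ' :-' | sed 's/..$/.&/'
}

to_touch_format 2013-04-29 08:17:58   # prints 201304290817.58
```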
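In C/C++ on UNIX, you can convert directly:
#include <ctime>
#include <iostream>

int main() {
    struct tm tm = {};
    time_t t = -1;
    tm.tm_isdst = -1;  // let mktime() work out whether DST applies
    if (strptime("2013-04-29 08:17:58", "%Y-%m-%d %H:%M:%S", &tm) != NULL) {
        t = mktime(&tm);  // mktime() takes a pointer to the struct
    }
    std::cout << "seconds since epoch: " << t << std::endl;
}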
See Opengroup manpage strptime() for the example.
I admire the truss/touch niftiness and the creativity behind it. Though, well, just don't do this in a tight loop ...
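If a tight loop is needed and the timestamp can be treated as UTC, the value can also be computed with integer arithmetic alone, using the well-known days-from-civil calculation on the proleptic Gregorian calendar. This is a different technique from the truss approach above, sketched under the assumption of a shell with POSIX $(( )) arithmetic (ksh or bash, for instance); the function name utc_epoch is hypothetical, and no time zone or DST handling is attempted.

```sh
# Hypothetical helper: seconds since 1970-01-01 00:00:00 UTC for
# "YYYY-MM-DD HH:MM:SS", with the input interpreted as UTC.
utc_epoch() {
    # Split the timestamp into six numeric fields.
    set -- $(echo "$*" | tr ':-' '  ')
    # Strip one leading zero so 08 and 09 are not read as octal.
    y=$1 m=${2#0} d=${3#0} H=${4#0} M=${5#0} S=${6#0}
    # Count January and February as months 13 and 14 of the previous year,
    # so the leap day falls at the end of the shifted year.
    if [ "$m" -le 2 ]; then
        y=$((y - 1))
        m=$((m + 12))
    fi
    # Days since 1970-01-01; 719469 aligns the count with the epoch.
    days=$(( 365*y + y/4 - y/100 + y/400 + (153*(m - 3) + 2)/5 + d - 719469 ))
    echo $(( days*86400 + H*3600 + M*60 + S ))
}

utc_epoch 2013-04-29 08:17:58   # prints 1367223478, matching the TZ=GMT result above
```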

Resources