Converting Dates in Shell - shell

How can I convert one date format to another format in a shellscript?
Example:
the old format is
MM-DD-YY HH:MM
but I want to convert it into
YYYYMMDD.HHMM

Like "20${D:6:2}${D:0:2}${D:3:2}.${D:9:2}${D:12:2}00", if the old date in the $D variable.

Take advantage of the shell's word splitting and the positional parameters:
date="12-31-11 23:59"
IFS=" -:"
set -- $date
echo "20$3$1$2.$4$5" #=> 20111231.2359

myDate="21-12-11 23:59"
#fmt is DD-MM-YY HH:MM
outDate="20${myDate:6:2}${myDate:3:2}${myDate:0:2}.${myDate:9:2}${myDate:12:2}00"
case "${outDate}" in
2[0-9][0-9][0-9][0-1][0-9][0-3][0-9].[0-2][0-9][0-5][[0-9][0-5][[0-9] )
: nothing_date_in_correct_format
;;
* ) echo bad format for ${outDate} >&2
;;
esac
Note that if you have a large file to process, then the above is an expensive(ish) process. For filebased data I would recommend something like
cat infile
....|....|21-12-11 23:59|22-12-11 00:01| ...|
awk '
function reformatDate(inDate) {
if (inDate !~ /[0-3][0-9]-[0-1][0-9]-[0-9][0-9] [0-2][0-9]:[0-5][[0-9]/) {
print "bad date format found in inDate= "inDate
return -1
}
# in format assumed to be DD-MM-YY HH:MM(:SS)
return (2000 + substr(inDate,7,2) ) substr(inDate,4,2) substr(inDate, 1,2) \
"." substr(inDate,10,2) substr(inDate,13,2) \
( substr(inDate,16,2) ? substr(inDate,16,2) : "00" )
}
BEGIN {
#add or comment out for each column of data that is a date value to convert
# below is for example, edit as needed.
dateCols[3]=3
dateCols[4]=4
# for awk people, I call this the pragmatic use of associative arrays ;-)
#assuming pipe-delimited data for columns
#....|....|21-12-11 23:59|22-12-11 00:01| ...|
FS=OFS="|"
}
# main loop for each record
{
for (i=1; i<=NF; i++) {
if (i in dateCols) {
#dbg print "i=" i "\t$i=" $i
$i=reformatDate($i)
}
}
print $0
}' infile
output
....|....|20111221.235900|20111222.000100| ...|
I hope this helps.

There is a good answer down already, but you said you wanted an alternative in the comments, so here is my [rather awful in comparison] method:
read sourcedate < <(echo "12-13-99 23:59");
read sourceyear < <(echo $sourcedate | cut -c 7-8);
if [[ $sourceyear < 50 ]]; then
read fullsourceyear < <(echo -n 20; echo $sourceyear);
else
read fullsourceyear < <(echo -n 19; echo $sourceyear);
fi;
read newsourcedate < <(echo -n $fullsourceyear; echo -n "-"; echo -n $sourcedate | cut -c -5);
read newsourcedate < <(echo -n $newsourcedate; echo -n $sourcedate | cut -c 9-14);
read newsourcedate < <(echo -n $newsourcedate; echo :00);
date --date="$newsourcedate" +%Y%m%d.%H%M%S
So, the first line just reads a date in, then we get the two-digit year, then we append it to '20' or '19' based on if it's less than 50 (so this would give you years from 1950 to 2049 - feel free to shift the line). Then we append a hyphen and the month and date. Then we append a space and the time, and lastly we append ':00' as the seconds (again feel free to make your own default). Lastly we use GNU date to read it in (since it's been standardized now) and print it in a different format (which you can edit).
It's a lot longer and uglier than cutting up the string, but having the format in the last line may be worth it. Also you could shorten it significantly with the shorthand you just learned in the first answer.
Good luck.

Related

Is it really slow to handle text file(more than 10K lines) with shell script?

I have a file with more than 10K lines of record.
Within each line, there are two date+time info. Below is an example:
"aaa bbb ccc 170915 200801 12;ddd e f; g; hh; 171020 122030 10; ii jj kk;"
I want to filter out the lines the days between these two dates is less than 30 days.
Below is my source code:
#!/bin/bash
filename="$1"
echo $filename
touch filterfile
totalline=`wc -l $filename | awk '{print $1}'`
i=0
j=0
echo $totalline lines
while read -r line
do
i=$[i+1]
if [ $i -gt $[j+9] ]; then
j=$i
echo $i
fi
shortline=`echo $line | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'`
date1=`echo $shortline | awk '{print $1}'`
date2=`echo $shortline | awk '{print $2}'`
if [ $date1 -gt 700000 ]
then
continue
fi
d1=`date -d $date1 +%s`
d2=`date -d $date2 +%s`
diffday=$[(d2-d1)/(24*3600)]
#diffdays=`date -d $date2 +%s` - `date -d $date1 +%s`)/(24*3600)
if [ $diffday -lt 30 ]
then
echo $line >> filterfile
fi
done < "$filename"
I am running it in cywin. It took about 10 second to handle 10 lines. I use echo $i to show the progress.
Is it because i am using some wrong way in my script?
This answer does not answer your question but gives an alternative method to your shell script. The answer to your question is given by Sundeep's comment :
Why is using a shell loop to process text considered bad practice?
Furthermore, you should be aware that everytime you call sed, awk, echo, date, ... you are requesting the system to execute a binary which needs to be loaded into memory etc etc. So if you do this in a loop, it is very inefficient.
alternative solution
awk programs are commonly used to process log files containing timestamp information, indicating when a particular log record was written. gawk extended the awk standard with time-handling functions. The one you are interested in is :
mktime(datespec [, utc-flag ]) Turn datespec into a timestamp in the
same form as is returned by systime(). It is similar to the function
of the same name in ISO C. The argument, datespec, is a string of the
form "YYYY MM DD HH MM SS [DST]". The string consists of six or seven
numbers representing, respectively, the full year including century,
the month from 1 to 12, the day of the month from 1 to 31, the hour of
the day from 0 to 23, the minute from 0 to 59, the second from 0 to
60, and an optional daylight-savings flag.
The values of these numbers need not be within the ranges specified;
for example, an hour of -1 means 1 hour before midnight. The
origin-zero Gregorian calendar is assumed, with year 0 preceding year
1 and year -1 preceding year 0. If utc-flag is present and is either
nonzero or non-null, the time is assumed to be in the UTC time zone;
otherwise, the time is assumed to be in the local time zone. If the
DST daylight-savings flag is positive, the time is assumed to be
daylight savings time; if zero, the time is assumed to be standard
time; and if negative (the default), mktime() attempts to determine
whether daylight savings time is in effect for the specified time.
If datespec does not contain enough elements or if the resulting time
is out of range, mktime() returns -1.
As your date format is of the form yymmdd HHMMSS we need to write a parser function convertTime for this. Be aware in this function we will pass times of the form yymmddHHMMSS. Furthermore, using a space delimited fields, your times are located in field $4$5 and $11$12. As mktime converts the time to seconds since 1970-01-01 onwards, all we need to do is to check if the delta time is smaller than 30*24*3600 seconds.

awk 'function convertTime(t) {
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s substr(t,7,2)" "substr(t,9,2)" "substr(t,11,2)"
return mktime(s)
}
{ t1=convertTime($4$5); t2=convertTime($11$12)}
(t2-t1 < 30*3600*24) { print }' <file>
If you are not interested in the real delta time (your sed line removes the actual time of the day), than you can adopt it to :
awk 'function convertTime(t) {
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s "00 00 00"
return mktime(s)
}
{ t1=convertTime($4); t2=convertTime($11)}
(t2-t1 < 30*3600*24) { print }' <file>
If the dates are not in the fields, you can use match to find them :
awk 'function convertTime(t) {
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s substr(t,7,2)" "substr(t,9,2)" "substr(t,11,2)"
return mktime(s)
}
{ match($0,/[0-9]{6} [0-9]{6}/);
t1=convertTime(substr($0,RSTART,RLENGTH));
a=substr($0,RSTART+RLENGTH)
match(a,/[0-9]{6} [0-9]{6}/)
t2=convertTime(substr(a,RSTART,RLENGTH))}
(t2-t1 < 30*3600*24) { print }' <file>
With some modifications, often without speed in mind, I can reduce the processing time by 50% - which is a lot:
#!/bin/bash
filename="$1"
echo "$filename"
# touch filterfile
totalline=$(wc -l < "$filename")
i=0
j=0
echo "$totalline" lines
while read -r line
do
i=$((i+1))
if (( i > ((j+9)) )); then
j=$i
echo $i
fi
shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
date1=${shortline[0]}
date2=${shortline[1]}
if (( date1 > 700000 ))
then
continue
fi
d1=$(date -d "$date1" +%s)
d2=$(date -d "$date2" +%s)
diffday=$(((d2-d1)/(24*3600)))
# diffdays=$(date -d $date2 +%s) - $(date -d $date1 +%s))/(24*3600)
if (( diffday < 30 ))
then
echo "$line" >> filterfile
fi
done < "$filename"
Some remarks:
# touch filterfile
Well - the later CMD >> filterfile overwrites this file and creates one, if it doesn't exist.
totalline=$(wc -l < "$filename")
You don't need awk, here. The filename output is surpressed if wc doesn't see the filename.
Capturing the output in an array:
shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
date1=${shortline[0]}
date2=${shortline[1]}
allows us array access and saves another call to awk.
On my machine, your code took about 42s for 2880 lines (on your machine 2880 s?) and about 19s for the same file with my code.
So I suspect, if you aren't running it on an i486-machine, that cygwin might be a slowdown. It's a linux environment for windows, isn't it? Well, I'm on a core Linux system. Maybe you try the gnu-utils for Windows - the last time I looked for them, they were advertised as gnu-utils x32 or something, maybe there is an a64-version available by now.
And the next thing I would have a look at, is the date calculation - that might be a slowdown too.
2880 lines isn't that much, so I don't suspect that my SDD drive plays a huge role in the game.

To print past 2years weekends dates using shell scripting

I want the code in Bash scripting
"It should print the dates in the below manner
From : 2015-October-03 2015-October-04(in the next line again it should print)
2015-October-10 2015-October-11
" "
" "
To :2017-October-21 2017-October-22
2017-October-28 2017-October-29
So, this should print all the months from the 2015-till date weekend dates in the above format only. please help me at the earliest
The following is the solution for your query.
Solution:-
#!/bin/bash
Date_Diff_Count=` echo $[$[$(date +%s)-$(date -d "2015-01-01" +%s)]/60/60/24] `
for i in ` seq -$Date_Diff_Count 0 `
do
VALUE=`date -d "+$i day" | egrep -i "Sat|Sun" | awk -F" " '{print $2" "$3" "$6}'`
[[ ! -z ${VALUE} ]] && date -d "${VALUE}" +%Y-%B-%d
done > sample.txt
paste -d " " - - < sample.txt
Output
2015-January-03 2015-January-04
2015-January-10 2015-January-11
2015-January-17 2015-January-18
2015-January-24 2015-January-25
2015-January-31 2015-February-01
...
2016-May-07 2016-May-08
2016-May-14 2016-May-15
2016-May-21 2016-May-22
2016-May-28 2016-May-29
...
2017-October-07 2017-October-08
2017-October-14 2017-October-15
2017-October-21 2017-October-22
2017-October-28 2017-October-29
Explanation
Date_Diff_Count is the variable i.e. getting number of days by
subtracting the start date from the current date. Based on your wish
you can edit the start date.
For loop is starting from -Date_Diff_Count to 0 for Ex: if
Date_Diff_Count is 500, for loop sequence starts from -500 to 0.
Value is where we are fetching only year,month and date after doing pipe on the output of date and egrep command.
if value is not zero then we are converting date into the format YYYY-month-DD
Final output will be saved in sample.txt file
Final paste command is to merge 2 consecutive lines into a single line. If you want to merge 3 lines then use paste -d " " - - -
d is delimiter to separate the merged lines. You can use any other operators based on your requirements.

How to compare a field of a file with current timestamp and print the greater and lesser data?

How do I compare current timestamp and a field of a file and print the matched and unmatched data. I have 2 columns in a file (see below)
oac.bat 09:09
klm.txt 9:00
I want to compare the timestamp(2nd column) with current time say suppose(10:00) and print the output as follows.
At 10:00
greater.txt
xyz.txt 10:32
mnp.csv 23:54
Lesser.txt
oac.bat 09:09
klm.txt 9:00
Could anyone help me on this please ?
I used awk $0 > "10:00", which gives me only 2nd column details but I want both the column details and I am taking timestamp from system directly from system with a variable like
d=`date +%H:%M`
With GNU awk you can just use it's builtin time functions:
awk 'BEGIN{now = strftime("%H:%M")} {
split($NF,t,/:/)
cur=sprintf("%02d:%02d",t[1],t[2])
print > ((cur > now ? "greater" : "lesser") ".txt")
}' file
With other awks just set now using -v and date up front, e.g.:
awk -v now="$(date +"%H:%M")" '{
split($NF,t,/:/)
cur = sprintf("%02d:%02d",t[1],t[2])
print > ((cur > now ? "greater" : "lesser") ".txt")
}' file
The above is untested since you didn't provide input/output we could test against.
Pure Bash
The script can be implemented in pure Bash with the help of date command:
# Current Unix timestamp
let cmp_seconds=$(date +%s)
# Read file line by line
while IFS= read -r line; do
let line_seconds=$(date -d "${line##* }" +%s) || continue
(( line_seconds <= cmp_seconds )) && \
outfile=lesser || outfile=greater
# Append the line to the file chosen above
printf "%s\n" "$line" >> "${outfile}.txt"
done < file
In this script, ${line##* } removes the longest match of '* ' (any character followed by a space) pattern from the front of $line thus fetching the last column (the time). The time column is supposed to be in one of the following formats: HH:MM, or H:MM. Actually, date's -d option argument
can be in almost any common format. It can contain month names, time zones, ‘am’ and ‘pm’, ‘yesterday’, etc.
We use the flexibility of this option to convert the time (HH:MM, or H:MM) to Unix timestamp.
The let builtin allows arithmetic to be performed on shell variables. If the last let expression fails, or evaluates to zero, let returns 1 (error code), otherwise 0 (success). Thus, if for some reason the time column is in invalid format, the iteration for such line will be skipped with the help of continue.
Perl
Here is a Perl version I have written just for fun. You may use it instead of the Bash version, if you like.
# For current date
#cmp_seconds=$(date +%s)
# For specific hours and minutes
cmp_seconds=$(date -d '10:05' +%s)
perl -e '
my #t = localtime('$cmp_seconds');
my $minutes = $t[2] * 60 + $t[1];
while (<>) {
/ (\d?\d):(\d\d)$/ or next;
my $fh = ($1 * 60 + $2) > $minutes ? STDOUT : STDERR;
printf $fh "%s", $_;
}' < file >greater.txt 2>lesser.txt
The script computes the number of minutes in the following way:
HH:MM = HH * 60 + MM minutes
If the number of minutes from the file are greater then the number of minutes for the current time, it prints the next line to the standard output, otherwise to standard error. Finally, the standard output is redirected to greater.txt, and the standard error is redirected to lesser.txt.
I have written this script for demonstration of another approach (algorithm), which can be implemented in different languages, including Bash.

Sorting and printing a file in bash UNIX

I have a file with a bunch of paths that look like so:
7 /usr/file1564
7 /usr/file2212
6 /usr/file3542
I am trying to use sort to pull out and print the path(s) with the most occurrences. Here it what I have so far:
cat temp| sort | uniq -c | sort -rk1 > temp
I am unsure how to only print the highest occurrences. I also want my output to be printed like this:
7 1564
7 2212
7 being the total number of occurrences and the other numbers being the file numbers at the end of the name. I am rather new to bash scripting so any help would be greatly appreciated!
To emit only the first line of output (with the highest number, since you're doing a reverse numeric sort immediately prior), pipe through head -n1.
To remove all content which is not either a number or whitespace, pipe through tr -cd '0-9[:space:]'.
To filter for only the values with the highest number, allowing there to be more than one:
{
read firstnum name && printf '%s\t%s\n' "$firstnum" "$name"
while read -r num name; do
[[ $num = $firstnum ]] || break
printf '%s\t%s\n' "$num" "$name"
done
} < temp
If you want to avoid sort and you are allowed to use awk, then you can do this:
awk '{
if($1>maxcnt) {s=$1" "substr($2,10,4); maxcnt=$1} else
if($1==maxcnt) {s=s "\n"$1" "substr($2,10,4)}} END{print s}' \
temp

Convert a given time to seconds in solaris

I have a time like 2013-04-29 08:17:58. And i need to convert it to seconds since epoch time.
No perl please and my OS is solaris. +%s does not work. nawk 'BEGIN{print srand()}' converts the current time to seconds but does not convert a given time to seconds.
Thanks
Here is a shell function that doesn't require perl:
function d2ts
{
typeset d=$(echo "$#" | tr -d ':- ' | sed 's/..$/.&/')
typeset t=$(mktemp) || return -1
typeset s=$(touch -t $d $t 2>&1) || { rm $t ; return -1 ; }
[ -n "$s" ] && { rm $t ; return -1 ; }
truss -f -v 'lstat,lstat64' ls -d $t 2>&1 | nawk '/mt =/ {printf "%d\n",$10}'
rm $t
}
$ d2ts 2013-04-29 08:17:58
1367216278
Note that the returned value depends on your timezone.
$ TZ=GMT d2ts 2013-04-29 08:17:58
1367223478
How it works:
The first line converts the parameters to a format suitable for touch (here "2013-04-29
08:17:58" -> "201304290817.58" )
The second line creates a temporary file
The third line change the modification time of the just created file to the required value
The fourth line aborts the function if setting the time failed, i.e. if the provided time is invalid
The fifth line traces the ls command to get the file modification time and prints it as an integer
The sixth line removes the temporary file
In C/C++ on UNIX, you can convert directly:
struct tm tm;
time_t t;
tm.tm_isdst = -1;
if (strptime("2013-04-29 08:17:58", "%Y-%m-%d %H:%M:%S", &tm) != NULL) {
t = mktime(tm);
}
cout << "seconds since epoch: " << t;
See Opengroup manpage strptime() for the example.
I admire the strace/touch niftyness and the creativity behind it. Though, well, just don't do this in a tight loop ...

Resources