Fill the missing dates using awk - bash

I have some missing dates in a file. e.g.
$cat ifile.txt
20060805
20060807
20060808
20060809
20060810
20060813
20060815
20060829
20060901
20060903
20060904
20060905
20070712
20070713
20070716
20070717
The dates are in the format YYYYMMDD. My intention is to fill in the missing dates between existing dates, but only if they are missing for a maximum of 5 days, e.g.
20060805
20060806 ---- This was missed
20060807
20060808
20060809
20060810
20060811 ----- This was missed
20060812 ----- This was missed
20060813
20060814 ----- This was missed
20060815
20060829
20060830 ------ This was missed
20060831 ------ This was missed
20060901
20060902 ------ This was missed
20060903
20060904
20060905
20070712
20070713
20070714 ----- This was missed
20070715 ----- This was missed
20070716
20070717
Other dates are not needed where there is a gap of more than 5 days. For example, I don't need to fill the dates between 20060815 and 20060829, because the gap between them is more than 5 days.
I am doing it in the following way, but don't get any output:
#!/bin/sh
awk BEGIN'{
  a[NR]=$1
} {
  for(i=1; i<NR; i++)
    if ((a[NR+1]-a[NR]) <= 5)
      for (j=1; j<(a[NR+1]-a[NR]); j++)
        print a[j]
}' ifile.txt
Desired output:
20060805
20060806
20060807
20060808
20060809
20060810
20060811
20060812
20060813
20060814
20060815
20060829
20060830
20060831
20060901
20060902
20060903
20060904
20060905
20070712
20070713
20070714
20070715
20070716
20070717

You could try the following, written and tested with the shown samples in GNU awk:
awk '
FNR==1{
  print
  prev=mktime(substr($0,1,4)" "substr($0,5,2)" "substr($0,7,2)" 00 00 00")
  next
}
{
  found=i=diff=""
  curr_time=mktime(substr($0,1,4)" "substr($0,5,2)" "substr($0,7,2)" 00 00 00")
  diff=(curr_time-prev)/86400
  if(diff>1 && diff<=5){               # fill only gaps of at most 5 days
    while(++i<=diff){ print strftime("%Y%m%d", prev+86400*i) }
    found=1
  }
  prev=curr_time
}
!found
' Input_file

The following seems to work:
stringtodate() {
    echo "${1:0:4}-${1:4:2}-${1:6:2} 12:00:00"
}
datetoseconds() {
    LC_ALL=C date -d "$(stringtodate "$1")" +%s
}
secondstodate() {
    LC_ALL=C date -d "@$1" +%Y%m%d
}
outputdatesbetween() {
    local start=$1
    local stop=$2
    for ((i = start; i < stop; i += 3600*24)); do
        secondstodate "$i"
    done
}

prev=
while IFS= read -r line; do
    now=$(datetoseconds "$line")
    if [[ -n "$prev" ]] &&
        (( now - prev > 3600 * 24 &&
           now - prev < 3600 * 24 * 5 ))
    then
        outputdatesbetween "$((prev + 3600 * 24))" "$now"
    fi
    echo "$line"
    prev="$now"
done < ifile.txt
Tested on repl.it.

Here is a quick GNU awk script. We use GNU awk to make use of the time functions mktime and strftime:
awk -v n=5 'BEGIN{FIELDWIDTHS="4 2 2"}
{t=mktime($1 " " $2 " " $3 " 0 0 0",1) }
(t-p < n*86400) { for(i=p+86400;i<t;i+=86400) print strftime("%Y%m%d",i,1) }
{print; p=t}' file
Using mktime we convert the date into seconds since 1970, and strftime converts it back to the desired format. Be aware that we enable the UTC flag in both functions to ensure that we do not end up with surprises around daylight saving time. Furthermore, since we already make use of GNU awk, we can use FIELDWIDTHS to define the field widths.
Note: If your awk does not support the UTC flag in mktime and strftime, you can run the following:
TZ=UTC awk -v n=5 'BEGIN{FIELDWIDTHS="4 2 2"}
{t=mktime($1 " " $2 " " $3 " 0 0 0") }
(t-p < n*86400) { for(i=p+86400;i<t;i+=86400) print strftime("%Y%m%d",i) }
{print; p=t}' file
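To see what FIELDWIDTHS does here, a minimal illustration with the first sample date (my own demo, not part of the original answer):
$ echo 20060805 | gawk 'BEGIN{FIELDWIDTHS="4 2 2"}{print $1, $2, $3}'
2006 08 05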

Related

Extract hours from csv using bash-script

I have a CSV file with the details below, and I want to classify each line based on the time:
if the hour is between 10 and 18 it should be printed as morning, and the rest of the lines as night.
time,id
2022-08-01T00:09:14+09:00,PKA990
2022-08-01T06:48:24+09:00,PKA990
2022-08-01T08:27:23+09:00,
2022-08-01T11:04:18+09:00,ABCD890
2022-08-01T11:23:22+09:00,ABCD890
2022-08-01T11:30:14+09:00,
2022-08-01T12:01:12+09:00,ABCD890
2022-08-01T15:11:59+09:00,JIKOPL8
2022-08-01T18:20:53+09:00,TUVNDGD
I'm expecting output like:
time,id,session
2022-08-01T00:09:14+09:00,PKA990,night
2022-08-01T06:48:24+09:00,PKA990,night
2022-08-01T08:27:23+09:00,LOADING,night
2022-08-01T11:04:18+09:00,ABCD890,morning
2022-08-01T11:23:22+09:00,ABCD890,morning
2022-08-01T11:30:14+09:00,LOADING,morning
2022-08-01T12:01:12+09:00,ABCD890,morning
2022-08-01T15:11:59+09:00,JIKOPL8,morning
2022-08-01T18:20:53+09:00,TUVNDGD,night
Please suggest.
Sorry, I have edited some rows: whenever the id field is blank it should be filled with "LOADING".
Apologies.
Here is an awk solution:
#! /bin/bash
LC_ALL=en_US awk '
BEGIN {
    FS = OFS = ","
}
{
    $1 = $1                       # rebuild the record so OFS is applied
}
NR == 1 {
    print $0, "session", "day_of_week"
    next
}
$2 == "" {
    $2 = "LOADING"
}
{
    date_timestamp = sprintf("%04d %02d %02d 00 00 00", substr($1, 1, 4), substr($1, 6, 2), substr($1, 9, 2))
    day_of_week = tolower(strftime("%A", mktime(date_timestamp)))
    if ($1 ~ /^[-0-9]+T(10|11|12|13|14|15|16|17):/) {
        session = "morning"
    } else {
        session = "night"
    }
}
{
    print $0, session, day_of_week
}
' <"datetime.csv"
Output:
time,id,session,day_of_week
2022-08-01T00:09:14+09:00,PKA990,night,monday
2022-08-01T06:48:24+09:00,PKA990,night,monday
2022-08-01T08:27:23+09:00,LOADING,night,monday
2022-08-01T11:04:18+09:00,ABCD890,morning,monday
2022-08-01T11:23:22+09:00,ABCD890,morning,monday
2022-08-01T11:30:14+09:00,LOADING,morning,monday
2022-08-01T12:01:12+09:00,ABCD890,morning,monday
2022-08-01T15:11:59+09:00,JIKOPL8,morning,monday
2022-08-01T18:20:53+09:00,TUVNDGD,night,monday
A pure bash solution:
#! /bin/bash
INPUT_FILENAME="datetime.csv"
FLAG_FIRST=1
FS=","
OFS=","
while read -r LINE; do
    if [[ ${FLAG_FIRST} -eq 1 ]]; then
        printf "%s%s%s%s%s\n" "${LINE//${FS}/${OFS}}" "${OFS}" "session" "${OFS}" "day_of_week"
        FLAG_FIRST=0
        continue
    fi
    # Ignore empty lines
    [[ -z "${LINE}" ]] && continue
    # If LINE ends with a comma (assume the id field is empty):
    # put the "LOADING" token at the end of the line
    [[ "${LINE}" =~ ${FS}$ ]] && LINE+="LOADING"
    # LC_ALL=en_US to obtain the day of week in English (I'm French)
    # Use ${LINE%%T*} to use the date field without hour and timezone
    DAY_OF_WEEK=$(LC_ALL=en_US date +%A --date "${LINE%%T*}")
    # In English, the day of week begins with an uppercase letter:
    # use a comma after the variable name to put it in lowercase
    if [[ "${LINE}" =~ ^[-0-9]+T(10|11|12|13|14|15|16|17): ]]; then
        SESSION="morning"
    else
        SESSION="night"
    fi
    printf "%s%s%s%s%s\n" "${LINE//${FS}/${OFS}}" "${OFS}" "${SESSION}" "${OFS}" "${DAY_OF_WEEK,}"
done < <(cat "${INPUT_FILENAME}"; echo)
Output:
time,id,session,day_of_week
2022-08-01T00:09:14+09:00,PKA990,night,monday
2022-08-01T06:48:24+09:00,PKA990,night,monday
2022-08-01T08:27:23+09:00,LOADING,night,monday
2022-08-01T11:04:18+09:00,ABCD890,morning,monday
2022-08-01T11:23:22+09:00,ABCD890,morning,monday
2022-08-01T11:30:14+09:00,LOADING,morning,monday
2022-08-01T12:01:12+09:00,ABCD890,morning,monday
2022-08-01T15:11:59+09:00,JIKOPL8,morning,monday
2022-08-01T18:20:53+09:00,TUVNDGD,night,monday
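For what it is worth, a shorter sketch that pulls the hour out with substr instead of a regex alternation (my own variant, assuming the fixed ISO-8601 layout of the sample; it skips the day-of-week column):
awk -F, -v OFS=, '
NR == 1  { print $0, "session"; next }   # extend the header
$2 == "" { $2 = "LOADING" }              # fill blank ids
{
    h = substr($1, 12, 2) + 0            # hour sits at a fixed offset in the ISO stamp
    print $0, (h >= 10 && h < 18 ? "morning" : "night")
}' datetime.csv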

How to print result data on the same line in a bash command?

I have my command below and I want to have the results on the same line with delimiters. My command:
Array=("GET" "POST" "OPTIONS" "HEAD")
echo $(date "+%Y-%m-%d %H:%M")
for i in "${Array[#]}"
do
cat /home/log/myfile_log | grep "$(date "+%d/%b/%Y:%H")"| awk -v last5=$(date --date="-5 min" "+%M") -F':' '$3>=last5 && $3<last5+5{print}' | egrep -a "$i" | wc -l
done
The results are:
2019-01-01 13:27
1651
5760
0
0
I want to have the result below:
2019-01-01 13:27,1651,5760,0,0
It looks (to me) like the overall objective is to scan /home/log/myfile_log for entries that have occurred within the last 5 minutes and that match one of the 4 entries in ${Array[@]}, keeping count of the matches along the way and finally printing the current date and the counts on a single line of output.
I've opted for a complete rewrite that uses awk's ability to match patterns, keep counts and generate a single line of output:
date1=$(date "+%Y-%m-%d %H:%M") # current date
date5=$(date --date="-5 min" "+%M") # date from 5 minutes ago
awk -v d1="${date1}" -v d5="${date5}" -F":" '
BEGIN { keep=0 # init some variables
g=0
p=0
o=0
h=0
}
$3>=d5 && $3<d5+5 { keep=1 } # do we keep processing this line?
!keep { next } # if not then skip to next line
/GET/ { g++ } # increment our counters
/POST/ { p++ }
/OPTIONS/ { o++ }
/HEAD/ { h++ }
{ keep=0 } # reset keep flag for next line
# print results to single line of output
END { printf "%s,%s,%s,%s,%s\n", d1, g, p, o, h }
' <(grep "$(date '+%d/%b/%Y:%H')" /home/log/myfile_log)
NOTE: The OP may need to revisit the <(grep "$(date ...)" /home/log/myfile_log) to handle timestamp periods that span hours, days, months and years, e.g., 14:59 - 16:04, 12/31/2019 23:59 - 01/01/2020 00:04, etc.
Yeah, it's a bit verbose, but a bit easier to understand; the OP can rewrite/reduce it as they see fit.
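One hedged way around that limitation (my own sketch, not part of the answer above) is to parse the full timestamp into epoch seconds with GNU awk and compare it against a cutoff, assuming Apache-style dd/Mon/yyyy:HH:MM:SS stamps appear somewhere on each line:
cutoff=$(date --date="-5 min" +%s)
gawk -v cutoff="$cutoff" -v d1="$(date '+%Y-%m-%d %H:%M')" '
BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m)
    for (i in m) mon[m[i]] = i                      # month name -> number
}
match($0, /[0-9]{2}\/[A-Z][a-z]{2}\/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}/) {
    split(substr($0, RSTART, RLENGTH), t, /[\/:]/)  # d Mon y H M S
    if (mktime(t[3] " " mon[t[2]] " " t[1] " " t[4] " " t[5] " " t[6]) < cutoff) next
    if (/GET/) g++
    if (/POST/) p++
    if (/OPTIONS/) o++
    if (/HEAD/) h++
}
END { printf "%s,%d,%d,%d,%d\n", d1, g, p, o, h }
' /home/log/myfile_log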

Aggregating csv file in bash script

I have a CSV file with multiple lines. Each line has the same number of columns. What I need to do is to group those lines by a few specified columns and aggregate data from the other columns. Example input file:
proces1,pathA,5-May-2011,10-Sep-2017,5
proces2,pathB,6-Jun-2014,7-Jun-2015,2
proces1,pathB,6-Jun-2017,7-Jun-2017,1
proces1,pathA,11-Sep-2017,15-Oct-2017,2
For the above example I need to group lines by the first two columns. From the 3rd column I need to choose the min value, for the 4th column the max value, and the 5th column should hold the sum. So, for such an input file I need this output:
proces1,pathA,5-May-2011,15-Oct-2017,7
proces1,pathB,6-Jun-2017,7-Jun-2017,1
proces2,pathB,6-Jun-2014,7-Jun-2015,2
I need to process it in bash (I can use awk or sed as well).
With bash and sort:
#!/bin/bash
# create associative arrays
declare -A month2num=([Jan]=1 [Feb]=2 [Mar]=3 [Apr]=4 [May]=5 [Jun]=6 [Jul]=7 [Aug]=8 [Sep]=9 [Oct]=10 [Nov]=11 [Dec]=12)
declare -A p ds de   # date start and date end
declare -A -i sum    # set integer attribute

# function to convert 5-Jun-2011 to 20110605
date2num() { local d m y; IFS="-" read -r d m y <<< "$1"; printf "%d%.2d%.2d\n" $y ${month2num[$m]} $d; }

# read all columns to variables p1 p2 d1 d2 s
while IFS="," read -r p1 p2 d1 d2 s; do
    # if associative array is still empty for this entry,
    # fill with current strings/value
    if [[ -z ${p[$p1,$p2]} ]]; then
        p[$p1,$p2]="$p1,$p2"
        ds[$p1,$p2]="$d1"
        de[$p1,$p2]="$d2"
        sum[$p1,$p2]="$s"
        continue
    fi
    # compare strings, set new strings and sum value
    if [[ ${p[$p1,$p2]} == "$p1,$p2" ]]; then
        [[ $(date2num "$d1") < $(date2num ${ds[$p1,$p2]}) ]] && ds[$p1,$p2]="$d1"
        [[ $(date2num "$d2") > $(date2num ${de[$p1,$p2]}) ]] && de[$p1,$p2]="$d2"
        sum[$p1,$p2]=sum[$p1,$p2]+s
    fi
done < file

# print content of all associative arrays, keyed from associative array p
for i in "${!p[@]}"; do echo "${p[$i]},${ds[$i]},${de[$i]},${sum[$i]}"; done
Usage: ./script.sh | sort
Output to stdout:
proces1,pathA,5-May-2011,15-Oct-2017,7
proces1,pathB,6-Jun-2017,7-Jun-2017,1
proces2,pathB,6-Jun-2014,7-Jun-2015,2
See: help declare, help read and of course man bash
With awk + sort:
awk -F',|-' '
BEGIN{
    A["Jan"]="01"
    A["Feb"]="02"
    A["Mar"]="03"
    A["Apr"]="04"
    A["May"]="05"
    A["Jun"]="06"
    A["Jul"]="07"
    A["Aug"]="08"
    A["Sep"]="09"
    A["Oct"]="10"
    A["Nov"]="11"
    A["Dec"]="12"
}
{
    B[$1","$2]=B[$1","$2]+$9
    z=sprintf("%.2d",$3)
    y=sprintf("%s",$5 A[$4] z)
    if(!start[$1$2])
    {
        end[$1$2]=0
        start[$1$2]=99999999
    }
    if (y < start[$1$2])
    {
        start[$1$2]=y
        C[$1","$2]=$3"-"$4"-"$5
    }
    x=sprintf("%.2d",$6)
    w=sprintf("%s",$8 A[$7] x)
    if(w > end[$1$2])
    {
        end[$1$2]=w
        D[$1","$2]=$6"-"$7"-"$8
    }
}
END{
    for (i in B) print i "," C[i] "," D[i] "," B[i]
}
' infile | sort
Extended GNU awk solution:
awk -F, '
function parse_date(d_str) {
    split(d_str, d, "-");
    t = mktime(sprintf("%d %d %d 00 00 00", d[3], m[d[2]], d[1]));
    return t
}
BEGIN{
    m["Jan"]=1; m["Feb"]=2; m["Mar"]=3; m["Apr"]=4;  m["May"]=5;  m["Jun"]=6;
    m["Jul"]=7; m["Aug"]=8; m["Sep"]=9; m["Oct"]=10; m["Nov"]=11; m["Dec"]=12;
}
{
    k=$1 SUBSEP $2;
    if (k in a){
        if (parse_date(a[k]["min"]) > parse_date($3)) { a[k]["min"]=$3 }
        if (parse_date(a[k]["max"]) < parse_date($4)) { a[k]["max"]=$4 }
    } else {
        a[k]["min"]=$3; a[k]["max"]=$4
    }
    a[k]["sum"]+= $5
}
END{
    for (i in a) {
        split(i, j, SUBSEP);
        print j[1], j[2], a[i]["min"], a[i]["max"], a[i]["sum"]
    }
}' OFS=',' file
The output:
proces1,pathA,5-May-2011,15-Oct-2017,7
proces1,pathB,6-Jun-2017,7-Jun-2017,1
proces2,pathB,6-Jun-2014,7-Jun-2015,2
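A further sketch of my own (not from the answers above): since a date like 5-May-2011 maps to the sortable number 20110505, the min/max comparisons can be done on plain numbers without mktime, assuming the same comma-separated layout:
awk -F, '
BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mn)
        for (i in mn) m[mn[i]] = i }                 # month name -> number
function key(d,  a) { split(d, a, "-"); return a[3]*10000 + m[a[2]]*100 + a[1] }
{
    k = $1 FS $2
    if (!(k in min) || key($3) < key(min[k])) min[k] = $3
    if (!(k in max) || key($4) > key(max[k])) max[k] = $4
    sum[k] += $5
}
END { for (k in sum) print k, min[k], max[k], sum[k] }
' OFS=, file | sort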

bash routine to return the page number of a given line number from text file

Consider a plain text file containing page-breaking ASCII control character "Form Feed" ($'\f'):
alpha\n
beta\n
gamma\n\f
one\n
two\n
three\n
four\n
five\n\f
earth\n
wind\n
fire\n
water\n\f
Note that each page has a random number of lines.
I need a bash routine that returns the page number of a given line number in a text file containing the page-breaking ASCII control character.
After a long time researching a solution, I finally came across this piece of code:
function get_page_from_line
{
    local nline="$1"
    local input_file="$2"
    local npag=0
    local ln=0
    local total=0
    while IFS= read -d $'\f' -r page; do
        npag=$(( ++npag ))
        ln=$(echo -n "$page" | wc -l)
        total=$(( total + ln ))
        if [ $total -ge $nline ]; then
            echo "${npag}"
            return
        fi
    done < "$input_file"
    echo "0"
    return
}
But, unfortunately, this solution proved to be very slow in some cases.
Any better solution?
Thanks!
The idea to use read -d $'\f' and then count the lines is good.
This version might appear inelegant: if nline does not exceed the number of lines in the file, the file is read twice (once by wc and once by head).
Give it a try, because it is super fast:
function get_page_from_line ()
{
    local nline="${1}"
    local input_file="${2}"
    if [[ $(wc -l "${input_file}" | awk '{print $1}') -lt nline ]]; then
        printf "0\n"
    else
        printf "%d\n" $(( $(head -n ${nline} "${input_file}" | grep -c "^"$'\f') + 1 ))
    fi
}
The performance of awk is better than that of the above bash version; awk was created for such text processing.
Give this tested version a try:
function get_page_from_line ()
{
    awk -v nline="${1}" '
    BEGIN {
        npag=1;
    }
    {
        if (index($0,"\f")>0) {
            npag++;
        }
        if (NR==nline) {
            print npag;
            linefound=1;
            exit;
        }
    }
    END {
        if (!linefound) {
            print 0;
        }
    }' "${2}"
}
When \f is encountered, the page number is increased.
NR is the current line number.
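For example, with the question's sample saved as pages.txt, a hypothetical session (line 7 is "four", which sits on page 2):
$ get_page_from_line 7 pages.txt
2
$ get_page_from_line 99 pages.txt    # past the end of the file
0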
----
For history, there is another bash version.
This version uses only built-in commands to count the lines in the current page.
The speedtest.sh that you provided in the comments showed it is a little bit ahead (approx. 20 sec), which makes it equivalent to your version:
function get_page_from_line ()
{
    local nline="$1"
    local input_file="$2"
    local npag=0
    local total=0
    while IFS= read -d $'\f' -r page; do
        npag=$(( npag + 1 ))
        IFS=$'\n'
        for line in ${page}; do
            total=$(( total + 1 ))
            if [[ total -eq nline ]]; then
                printf "%d\n" ${npag}
                unset IFS
                return
            fi
        done
        unset IFS
    done < "$input_file"
    printf "0\n"
    return
}
awk to the rescue!
awk -v RS='\f' -v n=09 '$0~"^"n"." || $0~"\n"n"." {print NR}' file
3
updated anchoring as commented below.
$ for i in $(seq -w 12); do awk -v RS='\f' -v n="$i" \
    '$0~"^"n"." || $0~"\n"n"." {print n,"->",NR}' file; done
01 -> 1
02 -> 1
03 -> 1
04 -> 2
05 -> 2
06 -> 2
07 -> 2
08 -> 2
09 -> 3
10 -> 3
11 -> 3
12 -> 3
A script of similar length can be written in bash itself to locate and respond to the embedded <form-feed>'s contained in a file. (It will work in a POSIX shell as well, with a substitute for the string index and expr for the math.) For example,
#!/bin/bash
declare -i ln=1                 ## line count
declare -i pg=1                 ## page count
fname="${1:-/dev/stdin}"        ## read from file or stdin
printf "\nln:pg text\n"         ## print header
while read -r l; do             ## read each line
    if [ "${l:0:1}" = $'\f' ]; then    ## if form-feed found
        ((pg++))
        printf "<ff>\n%2s:%2s '%s'\n" "$ln" "$pg" "${l:1}"
    else
        printf "%2s:%2s '%s'\n" "$ln" "$pg" "$l"
    fi
    ((ln++))
done < "$fname"
Example Input File
The simple input file with embedded <form-feed>'s was created with:
$ echo -e "a\nb\nc\n\fd\ne\nf\ng\nh\n\fi\nj\nk\nl" > dat/affex.txt
Which when output gives:
$ cat dat/affex.txt
a
b
c
d
e
f
g
h
i
j
k
l
Example Use/Output
$ bash affex.sh <dat/affex.txt
ln:pg text
1: 1 'a'
2: 1 'b'
3: 1 'c'
<ff>
4: 2 'd'
5: 2 'e'
6: 2 'f'
7: 2 'g'
8: 2 'h'
<ff>
9: 3 'i'
10: 3 'j'
11: 3 'k'
12: 3 'l'
With awk, you can set RS (the record separator, default newline) to form feed (\f) and FS (the field separator, default any sequence of horizontal whitespace) to newline (\n), and then obtain the number of lines as the number of "fields" in a "record", which is a "page".
The placement of form feeds in your data will produce some empty lines within a page, so the counts are off where that happens.
awk -F '\n' -v RS='\f' '{ print NF }' file
You could reduce the number by one if $NF == "", and perhaps pass in the number of the desired page as a variable:
awk -F '\n' -v RS='\f' -v p="2" 'NR==p { print NF - ($NF == "") }' file
To obtain the page number for a particular line, just feed head -n number to the script, or loop over the numbers until you have accrued the sum of lines.
line=1
page=1
for count in $(awk -F '\n' -v RS='\f' '{ print NF - ($NF == "") }' file); do
    old=$line
    ((line += count))
    echo "Lines $old through $((line - 1)) are on page $page"
    ((page++))
done
This GNU awk script prints the "page" for the line number given as a command-line argument:
BEGIN {
    ffcount=1
    search = ARGV[2]
    delete ARGV[2]
    if (!search) {
        print "Please provide linenumber as argument"
        exit(1)
    }
}
$1 ~ search { printf("line %s is on page %d\n", search, ffcount) }
/[\f]/      { ffcount++ }
Use it like awk -f formfeeds.awk formfeeds.txt 05, where formfeeds.awk is the script, formfeeds.txt is the file and '05' is a line number.
The BEGIN rule deals mostly with the command-line argument. The other rules are simple rules:
$1 ~ search applies when the first field matches the command-line argument stored in search
/[\f]/ applies when there is a form feed

Converting Dates in Shell

How can I convert one date format to another format in a shellscript?
Example:
the old format is
MM-DD-YY HH:MM
but I want to convert it into
YYYYMMDD.HHMM
Like "20${D:6:2}${D:0:2}${D:3:2}.${D:9:2}${D:12:2}00", if the old date in the $D variable.
Take advantage of the shell's word splitting and the positional parameters:
date="12-31-11 23:59"
IFS=" -:"
set -- $date
echo "20$3$1$2.$4$5" #=> 20111231.2359
myDate="21-12-11 23:59"
#fmt is DD-MM-YY HH:MM
outDate="20${myDate:6:2}${myDate:3:2}${myDate:0:2}.${myDate:9:2}${myDate:12:2}00"
case "${outDate}" in
2[0-9][0-9][0-9][0-1][0-9][0-3][0-9].[0-2][0-9][0-5][[0-9][0-5][[0-9] )
: nothing_date_in_correct_format
;;
* ) echo bad format for ${outDate} >&2
;;
esac
Note that if you have a large file to process, the above is an expensive(ish) process. For file-based data I would recommend something like
cat infile
....|....|21-12-11 23:59|22-12-11 00:01| ...|
awk '
function reformatDate(inDate) {
    if (inDate !~ /[0-3][0-9]-[0-1][0-9]-[0-9][0-9] [0-2][0-9]:[0-5][0-9]/) {
        print "bad date format found in inDate= " inDate
        return -1
    }
    # in format assumed to be DD-MM-YY HH:MM(:SS)
    return (2000 + substr(inDate,7,2)) substr(inDate,4,2) substr(inDate,1,2) \
           "." substr(inDate,10,2) substr(inDate,13,2) \
           (substr(inDate,16,2) ? substr(inDate,16,2) : "00")
}
BEGIN {
    # add or comment out for each column of data that is a date value to convert
    # below is for example, edit as needed.
    dateCols[3]=3
    dateCols[4]=4
    # for awk people, I call this the pragmatic use of associative arrays ;-)

    # assuming pipe-delimited data for columns
    # ....|....|21-12-11 23:59|22-12-11 00:01| ...|
    FS=OFS="|"
}
# main loop for each record
{
    for (i=1; i<=NF; i++) {
        if (i in dateCols) {
            #dbg print "i=" i "\t$i=" $i
            $i = reformatDate($i)
        }
    }
    print $0
}' infile
Output:
....|....|20111221.235900|20111222.000100| ...|
I hope this helps.
There is a good answer here already, but you said you wanted an alternative in the comments, so here is my [rather awful in comparison] method:
read sourcedate < <(echo "12-13-99 23:59");
read sourceyear < <(echo $sourcedate | cut -c 7-8);
if [[ $sourceyear < 50 ]]; then
read fullsourceyear < <(echo -n 20; echo $sourceyear);
else
read fullsourceyear < <(echo -n 19; echo $sourceyear);
fi;
read newsourcedate < <(echo -n $fullsourceyear; echo -n "-"; echo -n $sourcedate | cut -c -5);
read newsourcedate < <(echo -n $newsourcedate; echo -n $sourcedate | cut -c 9-14);
read newsourcedate < <(echo -n $newsourcedate; echo :00);
date --date="$newsourcedate" +%Y%m%d.%H%M%S
So, the first line just reads a date in, then we get the two-digit year, then we append it to '20' or '19' based on whether it's less than 50 (so this would give you years from 1950 to 2049; feel free to shift the line). Then we append a hyphen and the month and date. Then we append a space and the time, and lastly we append ':00' as the seconds (again, feel free to set your own default). Lastly, we use GNU date to read it in (since it's been standardized now) and print it in a different format (which you can edit).
It's a lot longer and uglier than cutting up the string, but having the format in the last line may be worth it. Also you could shorten it significantly with the shorthand you just learned in the first answer.
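If GNU date is available anyway, the whole pipeline condenses to a single rearrange-then-parse step (a sketch of my own under the same MM-DD-YY assumption, hard-coding the 20xx century for brevity):
D="12-31-11 23:59"
date -d "20${D:6:2}-${D:0:2}-${D:3:2} ${D:9}" +%Y%m%d.%H%M   # => 20111231.2359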
Good luck.
