Converting date to unix epoch using awk in log files - bash

I have a file containing multiple lines in the format "[dd.mm.yyyy.] text value". I need to convert this to "Unix epoch| text value". I tried to use awk to do this, but I can't seem to find the correct command.
For example, if the file is:
[30.08.2013 13:54:49.126] Foo
[30.08.2013 13:56:49.126] Bar
[30.08.2013 13:59:49.126] Foo bar
I use the following (probably too complex awk command):
cat sample.txt | cut -c 2- |awk -F'[. :]' ' { $cmd="date --date " "\""$3$2$1" "$4":"$5":"$6"\""" +%s" ; $cmd |& getline epoch; close($cmd); printf epoch"|"; print $0 ;}';
The problem is that I get the time in epoch correctly but I can't access the rest of the line. The $0 (and other $ variables) contain the date command. So the output is
1377863689|date --date "20130830 13:54:49" +%s
1377863809|date --date "20130830 13:56:49" +%s
1377863989|date --date "20130830 13:59:49" +%s
What I wish to get is
1377863689|Foo
1377863809|Bar
1377863989|Foo bar
Is there a (preferably simple) way of accomplishing this? Should I use some other tool?

Assuming you have gawk (fair assumption since you are using GNU date) you can do this all internally to gawk:
$ awk 'match($0, /\[(.*)\] (.*)/, a) &&
match(a[1], /([0-9]{2})\.([0-9]{2})\.([0-9]{4}) ([0-9:]+)(\.[0-9]+)/,b) {
gsub(/:/," ",b[4])
s=b[3] " " b[2] " " b[1] " " b[4]
print mktime(s) "|" a[2]
}' file
1377896089|Foo
1377896209|Bar
1377896389|Foo bar
Or, a Bash solution:
while IFS= read -r line; do
if [[ "$line" =~ \[([[:digit:]]{2})\.([[:digit:]]{2})\.([[:digit:]]{4})\ +([[:digit:]:]+)\.([[:digit:]]+)\]\ +(.*) ]]
then
printf "%s|%s\n" $(gdate +"%s" --date="${BASH_REMATCH[3]}${BASH_REMATCH[2]}${BASH_REMATCH[1]} ${BASH_REMATCH[4]}") "${BASH_REMATCH[6]}"
fi
done <file

I propose to simplify it to
IFS=' |.|[';
while read -r _ day month year hour _ name; do
date=$(date --date "$year$month$day $hour" +%s);
echo "$date|$name";
done < sample.txt
Or, if you prefer to continue with awk
awk -F'[\\[\\]. ]' '{
split($0,a,"] ")
cmd = "date --date \"" $4$3$2 " " $5 "\" +%s"; cmd | getline date; close(cmd)
printf "%s|%s\n",date,a[2]
}' sample.txt
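Both variants above still start one date process per input line. As a rough sketch, assuming GNU date (whose -f option reads one date string per line) and well-formed lines like those in the question, the whole file can be converted with a single date call. The function name log_to_epoch is made up for illustration.

```shell
# Hypothetical helper: print "epoch|text" for every "[dd.mm.yyyy HH:MM:SS.mmm] text" line.
# Assumes GNU date (for -f) and that every line matches this layout.
log_to_epoch() {
    file=$1
    texts=$(mktemp) || return 1
    # Keep only the text after "] " for the right-hand column.
    sed -E 's/^\[[^]]*\] //' "$file" > "$texts"
    # Rewrite each timestamp as "yyyy-mm-dd HH:MM:SS" and feed them all to one date process.
    sed -E 's/^\[([0-9]{2})\.([0-9]{2})\.([0-9]{4}) ([0-9:]{8})[^]]*\].*/\3-\2-\1 \4/' "$file" |
        date -f - +%s |
        paste -d'|' - "$texts"
    rm -f "$texts"
}
```

The epochs depend on the time zone date runs in; export TZ=UTC first if the log times are in UTC.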

Related

Shell script - is there a faster way to write date/time per second between start and end time?

I have this script (which works fine) that will write all the date/time per second, from a start date/time till an end date/time to a file
while read line; do
FIRST_TIMESTAMP="20230109-05:00:01" #this is normally a variable that changes with each $line
LAST_TIMESTAMP="20230112-07:00:00" #this is normally a variable that changes with each $line
date=$FIRST_TIMESTAMP
while [[ $date < $LAST_TIMESTAMP || $date == $LAST_TIMESTAMP ]]; do
date2=$(echo $date |sed 's/ /-/g' |sed "s/^/'/g" |sed "s/$/', /g")
echo "$date2" >> "OUTPUTFOLDER/output_LABELS_$line"
date=$(date -d "$date +1 sec" +"%Y%m%d %H:%M:%S")
done
done < external_file
However this sometimes needs to run 10 times, and the start date/time and end date/time sometimes lies days apart.
Which makes the script take a long time to write all that data.
Now I am wondering if there is a faster way to do this.
Avoid using a separate date call for each date. In the next example I added a safety parameter, maxloop, to avoid wasting resources when the dates are wrong.
#!/bin/bash
awkdates() {
maxloop=1000000
awk \
-v startdate="${first_timestamp:0:4} ${first_timestamp:4:2} ${first_timestamp:6:2} ${first_timestamp:9:2} ${first_timestamp:12:2} ${first_timestamp:15:2}" \
-v enddate="${last_timestamp:0:4} ${last_timestamp:4:2} ${last_timestamp:6:2} ${last_timestamp:9:2} ${last_timestamp:12:2} ${last_timestamp:15:2}" \
-v maxloop="${maxloop}" \
'BEGIN {
T1=mktime(startdate);
T2=mktime(enddate);
linenr=1;
while (T1 <= T2) {
printf("%s\n", strftime("%Y%m%d %H:%M:%S",T1));
T1+=1;
if (linenr++ > maxloop) break;
}
}'
}
mkdir -p OUTPUTFOLDER
while IFS= read -r line; do
first_timestamp="20230109-05:00:01" #this is normally a variable that changes with each $line
last_timestamp="20230112-07:00:00" #this is normally a variable that changes with each $line
awkdates >> "OUTPUTFOLDER/output_LABELS_$line"
done < <(printf "%s\n" "line1" "line2")
Using epoch time (+%s and @) with GNU date and GNU seq to
produce datetimes in ISO 8601 date format:
begin=$(date -ud '2023-01-12T00:00:00' +%s)
end=$(date -ud '2023-01-12T00:00:12' +%s)
seq -f "@%.0f" "$begin" 1 "$end" |
date -uf - -Isec
2023-01-12T00:00:00+00:00
2023-01-12T00:00:01+00:00
2023-01-12T00:00:02+00:00
2023-01-12T00:00:03+00:00
2023-01-12T00:00:04+00:00
2023-01-12T00:00:05+00:00
2023-01-12T00:00:06+00:00
2023-01-12T00:00:07+00:00
2023-01-12T00:00:08+00:00
2023-01-12T00:00:09+00:00
2023-01-12T00:00:10+00:00
2023-01-12T00:00:11+00:00
2023-01-12T00:00:12+00:00
If you're using the macOS/BSD date utility instead of the GNU one, the equivalent command to parse would be:
(bsd)date -uj -f '%FT%T' '2023-01-12T23:34:45' +%s
1673566485
...and the reverse process uses the -r flag instead of -d, without the "@" prefix:
(bsd)date -uj -r '1673566485' -Iseconds
2023-01-12T23:34:45+00:00
(gnu)date -u -d '@1673566485' -Iseconds
2023-01-12T23:34:45+00:00
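Applied to the question's own timestamp format, the same idea might look like the sketch below. It assumes GNU date and GNU seq, treats the timestamps as UTC (so daylight-saving jumps are ignored), and the function name per_second_labels is invented here.

```shell
# Hypothetical helper: print one quoted label per second between two
# "YYYYmmdd-HH:MM:SS" timestamps (inclusive), using one date process for the output.
# Assumes GNU date and GNU seq; times are interpreted as UTC.
per_second_labels() {
    # GNU date parses "YYYYmmdd HH:MM:SS", so turn the dash into a space first.
    begin=$(date -ud "$(printf '%s\n' "$1" | sed 's/-/ /')" +%s) || return 1
    end=$(date -ud "$(printf '%s\n' "$2" | sed 's/-/ /')" +%s) || return 1
    # "@N" means "N seconds since the epoch"; date -f - converts every line it reads.
    seq -f '@%.0f' "$begin" 1 "$end" |
        date -uf - +"'%Y%m%d-%H:%M:%S', "
}
```

In the original loop this would stand in for the inner while, for example per_second_labels "$FIRST_TIMESTAMP" "$LAST_TIMESTAMP" >> "OUTPUTFOLDER/output_LABELS_$line".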

How do I write an output from bash script to the second line of the CSV file that contains headers?

I have a bash script that writes its output to the beginning of a CSV file. I need it to maintain the headers on the first line. I tried to use awk and sed but didn't succeed.
I got the main script which is used to make a SSH connection:
for n in $(cat list.txt)
do
ssh -t root@$n /etc/m_chkdsk_app.sh
done
list.txt contains servers names
server1
server2
server3
server4
and then run the following script on the remote computers:
if [ -f /lnxfiler/diskstatus/m_chkdsk.csv ]
then
printf "$(cat /proc/sys/kernel/hostname)" >> /lnxfiler/diskstatus/New_m_chkdsk.csv && printf "," >> /lnxfiler/diskstatus/New_m_chkdsk.csv && printf "$(date +%d-%m-%Y)" >> /lnxfiler/diskstatus/New_m_chkdsk.csv && df -h | grep /dev/mapper/rootvg-var | awk '{printf "," $2 "," $3 "," $5 "," $6 "\n"}' >> /lnxfiler/diskstatus/New_m_chkdsk.csv
printf "$(cat /proc/sys/kernel/hostname)" >> /lnxfiler/diskstatus/New_m_chkdsk.csv && printf "," >> /lnxfiler/diskstatus/New_m_chkdsk.csv && printf "$(date +%d-%m-%Y)" >> /lnxfiler/diskstatus/New_m_chkdsk.csv && df -h | grep "/dev/mapper/rootvg-sap " | awk '{printf "," $2 "," $3 "," $5 "," $6 "\n"}' >> /lnxfiler/diskstatus/New_m_chkdsk.csv
cat /lnxfiler/diskstatus/m_chkdsk.csv >> /lnxfiler/diskstatus/New_m_chkdsk.csv
mv /lnxfiler/diskstatus/New_m_chkdsk.csv /lnxfiler/diskstatus/m_chkdsk.csv
else
printf "$(cat /proc/sys/kernel/hostname)" >> /lnxfiler/diskstatus/m_chkdsk.csv && printf "," >> /lnxfiler/diskstatus/m_chkdsk.csv && printf "$(date +%d-%m-%Y)" >> /lnxfiler/diskstatus/m_chkdsk.csv && df -h | grep /dev/mapper/rootvg-var | awk '{printf "," $2 "," $3 "," $5 "," $6 "\n"}' >> /lnxfiler/diskstatus/m_chkdsk.csv
printf "$(cat /proc/sys/kernel/hostname)" >> /lnxfiler/diskstatus/m_chkdsk.csv && printf "," >> /lnxfiler/diskstatus/m_chkdsk.csv && printf "$(date +%d-%m-%Y)" >> /lnxfiler/diskstatus/m_chkdsk.csv && df -h | grep "/dev/mapper/rootvg-sap " | awk '{printf "," $2 "," $3 "," $5 "," $6 "\n"}' >> /lnxfiler/diskstatus/m_chkdsk.csv
fi
exit
When I run the main script, I need all the output of the script to be added after the header.
Server Name,Date,Disk Size,Used,Use%,Mounted on
server1,08-09-2020,2.0G,363M,20%,/var
server1,08-09-2020,15G,41M,1%,/usr/sap
server1,08-09-2020,200G,237M,1%,/suse_manager
server2,08-09-2020,2.0G,138M,8%,/var
server2,08-09-2020,20G,6.6G,36%,/srv
server2,08-09-2020,80G,6.7G,9%,/srv/NFS
server3,08-09-2020,2.0G,363M,20%,/var
server3,08-09-2020,15G,41M,1%,/usr/sap
server4,08-09-2020,2.0G,138M,8%,/var
server4,08-09-2020,20G,6.6G,36%,/srv
server4,08-09-2020,80G,6.7G,9%,/srv/NFS
Here's a quick refactoring.
Driver script; don't read lines with for:
head -n 1 result.csv >newresult.csv
while IFS= read -r host; do
ssh -t "root@$host" /etc/m_chkdsk_app.sh </dev/null
done < list.txt >>newresult.csv
mv newresult.csv result.csv
Remote script:
df -h /dev/mapper/rootvg-var /dev/mapper/rootvg-sap |
awk -v date=$(date +%d-%m-%Y) 'BEGIN { OFS="," }
NR==FNR { host=$0; next }
/\/dev/ { print host, date, $2, $3, $5, $6 }
' /proc/sys/kernel/hostname - |
tee /lnxfiler/diskstatus/m_chkdsk.csv
The original script had tremendous amounts of repetition but it's of course possible that I have overlooked some crucial difference between almost identical code snippets. That's actually one of the reasons to avoid repeating yourself.
Selectively overwriting the old results on the remote server in slightly different fashion depending on whether the file already exists seemed entirely superfluous, so I took that out.
This assumes that you have old results in result.csv and that the first header line is so incredibly hard to get right that you have to copy it from the old file. It would probably be easier to just hard-code the script to write a new first line.
Depending on how robust you need this to be, maybe actually add set -e to the start of both scripts. If you don't absolutely have to store the results on the remote disk as well, that would cut out one of the main failure scenarios, and simplify the script still more.
If you have a file1.csv with data already in it, you can write all your new data to a tempfile.csv and then import it with an easy sed -
sed -i '1rtempfile.csv' file1.csv
This will read and add in the content of tempfile immediately after line 1.
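The -i option behaves differently between GNU and BSD sed. Where that is a concern, a sketch using only head, tail and cat does the same job; the function name insert_after_header is hypothetical.

```shell
# Hypothetical helper: insert the lines of $2 right after the header line of $1.
insert_after_header() {
    csv=$1
    new=$2
    tmp=$(mktemp) || return 1
    {
        head -n 1 "$csv"      # header stays first
        cat "$new"            # new rows go directly below it
        tail -n +2 "$csv"     # then the rows that were already there
    } > "$tmp" && mv "$tmp" "$csv"
}
```

Note that mv replaces the original file, so it ends up with the permissions of the temporary file.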

Splitting out a large file

I would like to process a 200 GB file with lines like the following:
...
{"captureTime": "1534303617.738","ua": "..."}
...
The objective is to split this file into multiple files grouped by hours.
Here is my basic script:
#!/bin/sh
echo "Splitting files"
echo "Total lines"
sed -n '$=' $1
echo "First Date"
head -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
echo "Last Date"
tail -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
while read p; do
date=$(echo "$p" | sed 's/{"captureTime": "//' | sed 's/","ua":.*//' | xargs -i date -d '@{}' '+%Y%m%d%H')
echo $p >> split.$date
done <$1
Some facts:
80 000 000 lines to process
jq doesn't work well since some JSON lines are invalid.
Could you help me to optimize this bash script?
Thank you
This awk solution might come to your rescue:
awk -F'"' '{file=strftime("%Y%m%d%H",$4); print >> file; close(file) }' $1
It essentially replaces your while-loop.
Furthermore, you can replace the complete script with:
# Start AWK file
BEGIN{ FS="\"" }
(NR==1){tmin=tmax=$4}
($4 > tmax) { tmax = $4 }
($4 < tmin) { tmin = $4 }
{ file="split."strftime("%Y%m%d%H",$4); print >> file; close(file) }
END {
print "Total lines processed: ", NR
print "First date: "strftime("%Y%m%d%H",tmin)
print "Last date: "strftime("%Y%m%d%H",tmax)
}
Which you then can run as:
awk -f <awk_file.awk> <jq-file>
Note: the usage of strftime indicates that you need to use GNU awk.
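Both versions open and close the output file once per input line. If the input is mostly in time order, a sketch like the following only closes a file when the hour changes, which may cut down on that overhead. It assumes an awk with strftime (such as GNU awk) and the same line layout as above; the function name split_by_hour is made up.

```shell
# Hypothetical helper: append each line of $1 to split.YYYYmmddHH in the current
# directory, closing an output file only when the hour bucket changes.
# Assumes an awk that provides strftime (for example GNU awk).
split_by_hour() {
    awk -F'"' '
        {
            name = "split." strftime("%Y%m%d%H", $4)
            if (name != current) {          # hour bucket changed
                if (current != "") close(current)
                current = name
            }
            print >> current                # append, since a file may be reopened later
        }' "$1"
}
```

The hour in the file name follows the local time zone, as in the answer above; set TZ to change that.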
You can start optimizing by changing this
sed 's/{"captureTime": "//' | sed 's/","ua":.*//'
with this
sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
-n suppress automatic printing of pattern space
-E use extended regular expressions in the script

Parsing date and time format - Bash

I have date and time format like this(yearmonthday):
20141105 11:30:00
I need to assign the year, month, day, hour and minute values to variables.
I can do it for year, day and hour like this:
year=$(awk '{print $1}' log.log | sed 's/^\(....\).*/\1/')
day=$(awk '{print $1}' log.log | sed 's/^.*\(..\).*/\1/')
hour=$(awk '{print $2}' log.log | sed 's/^\(..\).*/\1/')
How can I do this for month and minute?
--
And I need that every line of my log file:
20141105 11:30:00 /bla/text.1
20141105 11:35:00 /bla/text.2
20141105 11:40:00 /bla/text.3
....
I'm trying to read this log file line by line and do this:
mkdir -p "/bla/backup/$year/$month/$day/$hour/$minute"
mv $file "/bla/backup/$year/$month/$day/$hour/$minute"
Here is my non-working code:
#!/bin/bash
LOG=/var/log/LOG
while read line
do
year=${line:0:4}
month=${line:4:2}
day=${line:6:2}
hour=${line:9:2}
minute=${line:12:2}
file=$(awk '{print $3}')
if [ -f "$file" ]; then
printf -v path "%s/%s/%s/%s/%s" $year $month $day $hour $minute
mkdir -p "/bla/backup/$path"
mv $file "/bla/backup/$path"
fi
done < $LOG
You don't need to call out to awk or date at all; use bash's substring operations:
d="20141105 11:30:00"
yr=${d:0:4}
mo=${d:4:2}
dy=${d:6:2}
hr=${d:9:2}
mi=${d:12:2}
printf -v dir "/bla/%s/%s/%s/%s/%s/" "$yr" "$mo" "$dy" "$hr" "$mi"
echo "$dir"
/bla/2014/11/05/11/30/
Or directly, without all the variables.
printf -v dir "/bla/%s/%s/%s/%s/%s/" "${d:0:4}" "${d:4:2}" "${d:6:2}" "${d:9:2}" "${d:12:2}"
Given your log file:
while read -r date time file; do
d="$date $time"
printf -v dir "/bla/%s/%s/%s/%s/%s/" "${d:0:4}" "${d:4:2}" "${d:6:2}" "${d:9:2}" "${d:12:2}"
mkdir -p "$dir"
mv "$file" "$dir"
done < filename
or, making a big assumption that there are no whitespace or globbing characters in your filenames:
sed -r 's#(....)(..)(..) (..):(..):.. (.*)#mv \6 /blah/\1/\2/\3/\4/\5#' | sh
The date command can also do this:
#!/bin/bash
year=$(date +'%Y' -d'20141105 11:30:00')
day=$(date +'%d' -d'20141105 11:30:00')
month=$(date +'%m' -d'20141105 11:30:00')
minutes=$(date +'%M' -d'20141105 11:30:00')
echo "$year---$day---$month---$minutes"
You can use only one awk
month=$(awk '{print substr($1,5,2)}' log.log)
year=$(awk '{print substr($1,1,4)}' log.log)
minute=$(awk '{print substr($2,4,2)}' log.log)
etc
I guess you are processing a log file in which each line starts with the date string. You may have already written a loop to handle each line; in your loop, you could do:
d="$(awk '{print $1,$2}' <<<"$line")"
year=$(date -d"$d" +%Y)
month=$(date -d"$d" +%m)
day=$(date -d"$d" +%d)
min=$(date -d"$d" +%M)
Don't repeat yourself.
d='20141105 11:30:00'
IFS=' ' read -r year month day min < <(date -d"$d" '+%Y %m %d %M')
echo "year: $year"
echo "month: $month"
echo "day: $day"
echo "min: $min"
The trick is to ask date to output the fields you want, separated by a character (here a space), to put this character in IFS and ask read to do the splitting for you. Like so, you're only executing date once and only spawn one subshell.
If the date comes from the first line of the file log.log, here's how you can assign it to the variable d:
IFS= read -r d < log.log
eval "$(
echo '20141105 11:30:00' \
| sed 'G;s/\(....\)\(..\)\(..\) \(..\):\(..\):\(..\) *\(.\)/Year=\1\7Month=\2\7Day=\3\7Hour=\4\7Min=\5\7Sec=\6/'
)"
This builds a string of variable assignments and passes it to eval. You could easily adapt it to also check the content by replacing each dot with a more specific pattern, such as [0-5][0-9] for minutes and seconds.
This is the POSIX version, so use --posix with GNU sed.
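The validation idea can also be sketched without sed or eval, using a case pattern for the check and parameter expansion for the split. This is only an illustration: the function name parse_stamp is hypothetical, and the pattern checks digit ranges, not calendar validity.

```shell
# Hypothetical helper: validate "YYYYmmdd HH:MM:SS" and set year, month, day,
# hour, minute and sec. Returns 1 without setting anything if the shape is wrong.
parse_stamp() {
    case $1 in
        [0-9][0-9][0-9][0-9][01][0-9][0-3][0-9]' '[0-2][0-9]:[0-5][0-9]:[0-5][0-9]) ;;
        *) return 1 ;;
    esac
    # Split "YYYYmmdd HH:MM:SS" on the space and the colons.
    IFS=' :' read -r ymd hour minute sec <<EOF
$1
EOF
    year=${ymd%????}      # drop the last four characters (mmdd)
    rest=${ymd#????}      # drop the first four characters (YYYY)
    month=${rest%??}      # first two of mmdd
    day=${rest#??}        # last two of mmdd
}
```

Inside a loop such as while read -r date time file, it could be called as parse_stamp "$date $time" || continue.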
I wrote a function that I usually cut and paste into my script files
function getdate()
{
local a
a=($(date "+%Y %m %d %H %M %S"))
year=${a[0]}
month=${a[1]}
day=${a[2]}
hour=${a[3]}
minute=${a[4]}
sec=${a[5]}
}
In the script file, on a line of its own:
getdate
echo "year=$year,month=$month,day=$day,hour=$hour,minute=$minute,second=$sec"
Of course, you can modify what I provided or use answer [6] above.
The function takes no arguments.

using if in awk in comparison with today's date

I am looking for a command which helps me use if in awk and equates it to the current date.
A/B folder has files with different dates. I need to filter out files of the present day whenever script runs
A) Gives an output with all the dates,
date=`date +"%Y-%m-%d"`
s3cmd ls --recursive s3://A/B/ | grep A-B | grep .tar | awk '{ if ($1 -eq "$date" ) print $1" "$2 " " $3 " " $4 }' | sort -r
B)Replaces $1 which contains dates with "$date" to all of them
date=`date +"%Y-%m-%d"`
s3cmd ls --recursive s3://A/B/ | grep A-B | grep .tar | awk '{ if ($1 = "$date" ) print $1" "$2 " " $3 " " $4 }' | sort -r
C)Does not give any output. leaves blank
date=`date +"%Y-%m-%d"`
s3cmd ls --recursive s3://A/B/ | grep A-B | grep .tar | awk '{ if ($1 == "$date" ) print $1" "$2 " " $3 " " $4 }' | sort -r
if I remove "" it does not give me any output in all the cases.
The shell doesn't expand variables inside single quotes, and your whole awk script is single-quoted, so awk sees the literal string "$date". You should assign an awk variable from the shell variable instead. Also, the equality comparison is ==, not -eq or =.
awk -v date="$date" '$1 == date { print $1" "$2 " " $3 " " $4 }'
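To see this in isolation, here is a small sketch with made-up listing lines (the real s3cmd output may be laid out differently). The function name filter_by_date is invented for the example.

```shell
# Hypothetical helper: keep only lines whose first field equals the given date.
# The date reaches awk through -v, so the single-quoted program needs no shell expansion.
filter_by_date() {
    awk -v date="$1" '$1 == date { print $1, $2, $3, $4 }'
}

# Made-up sample lines in the shape "date time size path".
printf '%s\n' \
    '2020-09-08 10:15 1234 s3://A/B/A-B-1.tar' \
    '2020-09-07 09:00 5678 s3://A/B/A-B-0.tar' |
    filter_by_date 2020-09-08
# prints: 2020-09-08 10:15 1234 s3://A/B/A-B-1.tar
```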
You don't really need awk for that. Just use find and say
find /path/to/search/ -type f ! -newermt $(date +"%Y-%m-%d")
$(..) is command substitution and what it will do is expand to current date in the format YYYY-MM-DD.
! -newermt is find option to look for files older than specified date
-type f will only look for files
awk is not shell. It is a completely separate tool with its own syntax and capabilities. Therefore you should not expect to be able to use shell variables or shell syntax in an awk script. Try this:
s3cmd ls --recursive s3://A/B/ |
awk -v date="$date" -v OFS=" " '/A-B/ && /.tar/ && ($1 == date) { print $1, $2, $3, $4 }' |
sort -r
You probably actually meant \.tar instead of .tar though, and as @jaypal said, this is a job for find, not ls piped to awk.
