How to add 2 hours to a date column in a CSV file - bash

I have a CSV file consisting of 2 columns, a name and a date in 24-hour format:
Name, log_date
John, 11/29/2017 23:00
I want to add 2 hours to log_date, so that the date and time become:
John, 11/30/2017 01:00
I tried the command below, but with no success:
awk -F',' 'NR>1{$4+=(2/24);}1' OFS="," IN.csv > OUT.csv
Instead I get
2017.08
in the log_date column. So please help.

You need a language that has datetime arithmetic. Perl for example:
perl -MTime::Piece -F'/,\s*/' -slane '
$datetime = Time::Piece->strptime($F[1], $fmt);
$F[1] = ($datetime + 7200)->strftime($fmt);
print join ", ", @F
' -- -fmt="%m/%d/%Y %H:%M" <<END
John, 11/29/2017 11:00
END
John, 11/29/2017 13:00
Note that if the input used a 12-hour clock with no AM/PM marker, there would be no way to tell 11:00 from 23:00. With the 24-hour format in your question, 23:00 is unambiguous, and the same command rolls it over to 11/30/2017 01:00.

Below is a one-liner in Python. It is not really usable code, but it should give you the idea of using one-liners; it could be made simpler still.
python -c "s=r'John, 11/29/2017 13:00';
print(s.replace(s.split(\" \")[-1].split(\":\")[0],str(int(s.split(\" \")[-1].split(\":\")[0])+2)));";
Output:
John, 11/29/2017 15:00
However, this will not roll the date over: 23 + 2 becomes 25, when it should wrap to 01:00 on the next day.
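Handling the rollover needs real datetime arithmetic. A minimal sketch using GNU date (assuming GNU coreutils is available; the date-based answer further down uses the same idea):
$ date -d '11/29/2017 23:00 2 hours' +'%m/%d/%Y %H:%M'
11/30/2017 01:00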

All you're looking for is documented in the GNU awk manual's section on time functions.
Using space as the field separator:
{
    split($2, D, "/")
    split($3, H, ":")
    # format for mktime is "YYYY MM DD HH MM SS [DST]"
    d = D[3] " " D[1] " " D[2] " " H[1] " " H[2] " 00"
    t = mktime(d)
    t = t + 7200    # add two hours
    $2 = strftime("%m/%d/%Y", t)
    $3 = strftime("%H:%M", t)
}1
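A hedged usage sketch, assuming the script above is saved as add2hours.awk (a hypothetical file name; mktime() and strftime() require GNU awk):
gawk -f add2hours.awk IN.csv > OUT.csv
Note that the header line (Name, log_date) would need to be passed through untouched, e.g. by adding NR == 1 { print; next } at the top of the script.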

awk -F',' '{if(NR>1){printf("%s, ", $1);system("date -d \"+2 hours " $2 "\" +\"%m/%d/%Y %H:%M\"")}else{print $0}}' IN.csv > OUT.csv

Related

Select Records From File by Date

I have a file with a format like this:
10077083725 06-OCT-17 32 06-OCT-17
10077083725 09-OCT-17 35 09-OCT-17
I want to select records (around 1 million of them) based on a date condition on column 4, e.g. within the last 2 months. I see 2 possible solutions:
1) Convert the date format to something like 20170610, and then do a simple string comparison using awk.
2) Do the date comparisons directly.
Can you suggest which one is better? Also, I was unable to find a solution for the second using shell scripts, so any tips there would be useful.
An awk solution on macOS with GNU coreutils installed; to use this on Linux, change gdate to date. My answer is a combination of both your solutions:
# cat tst.awk
BEGIN { d = conv(d) }
d <= conv($4)
function conv(str,    cmd, res, line) {
    cmd = "gdate -d '" str "' +'%Y%m%d'"
    res = ((cmd | getline line) > 0 ? line : "")
    close(cmd)
    return res
}
Given this input:
# cat file
10077083725 06-OCT-17 32 06-OCT-17
10077083725 09-OCT-17 35 09-OCT-17
then:
# awk -v d="-9 days" -f tst.awk file
10077083725 09-OCT-17 35 09-OCT-17
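Since this spawns gdate once per line, it can be slow on a million records. A possible all-awk variant of your solution 1 (a sketch, assuming every year is 20xx and the months are English three-letter abbreviations), calling gdate only once from the shell:
awk -v d="$(gdate -d '-2 months' +%Y%m%d)" '
BEGIN { split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", m, " ")
        for (i in m) mon[m[i]] = sprintf("%02d", i) }
# turn DD-MON-YY (e.g. 06-OCT-17) into YYYYMMDD for a plain string comparison
function conv(str,    a) { split(str, a, "-"); return "20" a[3] mon[a[2]] a[1] }
conv($4) >= d
' file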

Combine awk and another command to send report to user

I need some help with a Unix shell script using awk.
I have a file like the one below:
139341 8.61248 python_dev ntoma2 r 07/17/2017 07:27:43 gpuml#acepd1641.udp.finco.com 1
139342 8.61248 python_val ntoma2 r 07/17/2017 07:27:48 gpuml#acepd1611.udp.finco.com 1
139652 8.61248 python_dev ntoma2 r 07/17/2017 10:55:57 gpuml#acepd1671.udp.finco.com 1
The file is space separated. I need to get the 1st and 4th columns, which are the job ID and the user name (ntoma2 in this case), for rows whose 6th column (a date in mm/dd/yyyy format) is more than 7 days older than the current date.
I have the command below to get the job ID and user name of jobs older than 7 days:
cat filename.txt | awk -v dt="$(date "--date=$(date) -7 day" +%m/%d/%Y)" -F" " '/qw/{ if($6<dt) print $4,":",$1 }' >> ./longRunningJob.$$
I also have another command to get email IDs from the user name (the 4th column above):
/ccore/pbis/bin/enum-members "adsusers" | grep ^UNIX -B3 | grep <User-Name> -B2 | grep UPN | awk '{print $2}'
I need to combine the 2 commands above and send a report to every user, like this:
echo "Hello <User Name>, There is a long running job which is of job-id: <job-id> more than 7days, so please kill the job or let us know if we can help. Thank you!" | mailx -s "Long Running Job"
NOTE: if a user name is repeated, the whole list should go in one email.
I am not sure how I can combine these 2 commands and send the email to each user; can someone please help me?
Thank you in advance!!
Vasu
You can certainly do this in awk -- easier in gawk because of date support.
Just to give you an outline of how to do this, I wrote this in Ruby:
$ cat file
139341 8.61248 python_dev ntoma2 r 07/10/2017 07:27:43 gpuml#acepd1641.udp.finco.com 1
139342 8.61248 python_val ntoma2 r 07/09/2017 07:27:48 gpuml#acepd1611.udp.finco.com 1
139652 8.61248 python_dev ntoma2 r 07/17/2017 10:55:57 gpuml#acepd1671.udp.finco.com 1
$ ruby -lane 'BEGIN{ require "date"
jobs=Hash.new { |h,k| h[k]=[] }
users=Hash.new()
pn=7.0
}
t=DateTime.parse("%s %s" % [$F[5].split("/").rotate(-1).join("-"), $F[6]])
ti_days=(DateTime.now-t).to_f
ts="%d days, %d hours, %d minutes and %d seconds" % [60,60,24]
.reduce([ti_days*86400]) { |m,o| m.unshift(m.shift.divmod(o)).flatten }
users[$F[3]]=$F[7]
jobs[$F[3]] << "Job: %s has been running %s" % [$F[0], ts] if (DateTime.now-t).to_f > pn
END{
jobs.map { |id, v|
w1,w2=["is a","job"]
w1,w2=["are","jobs"] if v.length>1
s="Hello #{id}, There #{w1} long running #{w2} running more than the policy of #{pn.to_i} days. Please kill the #{w2} or let us know if we can help. Thank you!\n\t" << v.join("\n\t")
puts "#{users[id]} \n#{s}"
# s is the formatted email address and body. You take it from here...
}
}
' /tmp/file
gpuml#acepd1671.udp.finco.com
Hello ntoma2, There are long running jobs running more than the policy of 7 days. Please kill the jobs or let us know if we can help. Thank you!
Job: 139341 has been running 11 days, 9 hours, 28 minutes and 44 seconds
Job: 139342 has been running 12 days, 9 hours, 28 minutes and 39 seconds
I got a solution, but there is a bug in it. Here it is:
#!/bin/bash
{ qstat -u \*; /ccore/pbis/bin/enum-members "adsusers"; } | awk -v dt=$(date "--date=$(date) -7 day" +%m/%d/%Y) '
/^User obj/ {
F2 = 1
FS = ":"
T1 = T2 = ""
next
}
!F2 {
if (NR < 3) next
if ($5 ~ "qw" && $6 < dt) JID[$4] = $1 "," JID[$4]
next
}
/^UPN/ {T1 = $2
}
/^Display/ {T2 = $2
}
/^Alias/ {gsub (/ /, _, $2)
EM[$2] = T1
DN[$2] = T2
}
END {for (j in JID) {print "echo -e \"Hello " DN[j] " \\n \\nJob(s) with job id(s): " JID[j] " executing more than last 7 days, hence request you to take action, else job(s) will be killed in another 1 day \\n \\n Thank you.\" | mailx -s \"Long running job for user: " DN[j] " (" j ") and Job ID(s): " JID[j] "\" " EM[j]
}
}
' | sh
The bug in the above code: the date-compare condition (shown below) is not working as expected. I am really not sure how to compare $6 with the variable dt (both in mm/dd/yyyy format). I think I should use mktime() or something else. Can someone please help?
if ($5 ~ "qw" && $6 < dt)
Thank you!!
Vasu
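One way to make that comparison work (a sketch; the ymd() helper is hypothetical): rearrange both mm/dd/yyyy values into yyyymmdd, which does compare chronologically as a plain string. Applied to the one-liner from the question:
awk -v dt="$(date "--date=$(date) -7 day" +%m/%d/%Y)" '
# yyyymmdd sorts chronologically as a string; mm/dd/yyyy does not
function ymd(mdY,    a) { split(mdY, a, "/"); return a[3] a[1] a[2] }
/qw/ { if (ymd($6) < ymd(dt)) print $4, ":", $1 }
' filename.txt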

Delete an entire row if its date is less than 50 days old

I need help deleting rows based on the date in a specified column. My file contains the following. I need to find the entries in column 4 that are less than 50 days old relative to the current date and delete those entire rows.
ABC, 2017-02-03, 123, 2012-09-08
BDC, 2017-01-01, 456, 2015-09-05
Test, 2017-01-05, 789, 2017-02-03
My desired output is as follows.
ABC, 2017-02-03, 123, 2012-09-08
BDC, 2017-01-01, 456, 2015-09-05
Note: I have an existing script and need to integrate this into it.
You can leverage the date command for this task, which simplifies the script; since YYYY-MM-DD dates sort chronologically as plain strings, a simple string comparison works:
$ awk -v t=$(date -d"-50 day" +%Y-%m-%d) '$4<t' input > output
which will have this content in the output file
ABC, 2017-02-03, 123, 2012-09-08
BDC, 2017-01-01, 456, 2015-09-05
replace input/output with your file names
You can use gawk logic something like the below:
gawk '
BEGIN {FS=OFS=",";date=strftime("%Y %m %d %H %M %S")}
{
split($4, d, "-")
epoch = mktime(d[1] " " d[2] " " d[3] " " "00" " " "00" " " "00")
if ( ((mktime(date) - epoch)/86400 ) > 50) print
}' file
The idea is to use the GNU Awk time functions strftime() and mktime() for the date conversion. The former produces a timestamp in YYYY MM DD HH MM SS format, which mktime() converts to epoch time.
Once both times, i.e. the current timestamp (date) and the value from $4 in the file, are converted to epoch seconds, their difference is divided by 86400 to get the difference in days, and only those lines whose difference is greater than 50 are printed.

Extract data from log [duplicate]

This question already has an answer here:
bash only email if occurrence since last alert
(1 answer)
Closed 7 years ago.
I have logs in this format:
##<01-Mar-2015 03:48:18 o'clock GMT> <info>
##<01-Mar-2015 03:48:20 o'clock GMT> <info>
##<01-Mar-2015 03:48:30 o'clock GMT> <info>
##<01-Mar-2015 03:48:39 o'clock GMT> <info>
I have to write a shell script that extracts the last 5 minutes of data, counting back from the last recorded entry in the log file, and then searches for a string in it. I am new to shell scripting; I tried the grep command but it's of no use. Can anyone help me here?
I tried the script below:
#!/bin/bash
H=1 ## Hours
LOGFILE=/path/to/logfile.txt
X=$(( H * 60 * 60 )) ## Hours converted to seconds
function get_ts {
DATE="${1%%\]*}"; DATE="${DATE##*\[}"; DATE=${DATE/:/ }; DATE=${DATE//\// }
TS=$(date -d "$DATE" '+%s')
}
get_ts "$(tail -n 1 "$LOGFILE")"
LAST=$TS
while read -r LINE; do
get_ts "$LINE"
(( (LAST - TS) <= X )) && echo "$LINE"
done < "$LOGFILE"
and on running it I get the error below:
get_ts: DATE=${DATE/:/ }: 0403-011 The specified substitution is not valid for this command.
(The 0403-011 message comes from AIX's ksh: the ${VAR/pattern/replacement} substitution is a bash feature, so the script is evidently not being run by bash.)
If you use awk, you can use date to select, for example, the last 5 minutes of data like this:
awk '$0>=from' from="$(date +"##<%d-%b-%Y %H:%M:%S" -d -5min)" logfile
PS: you need the date command's output to match your log's format. Also note this is a plain string comparison, so it is only reliable while all the lines involved fall within the same month; the day-first date format does not order lexically across month or year boundaries.
I'd parse the date into seconds since epoch and compare that with the system time:
TZ=GMT awk -F '[#<> :-]+' 'BEGIN { split("Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec", mnames, ","); for(i = 1; i <= 12; ++i) m[mnames[i]] = i } mktime($4 " " m[$3] " " $2 " " $5 " " $6 " " $7) + 300 >= systime()' filename
The -F '[#<> :-]+' is to split the date into individual parts, so that $2 is the day, $3 the month, $4 the year, and so forth. Then the code works as follows:
BEGIN {
# build a mapping from month name to number (to use in mktime)
split("Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec", mnames, ",")
for(i = 1; i <= 12; ++i) m[mnames[i]] = i
}
# build a numerically comparable timestamp from the split date, and
# select all lines whose timestamp is not more than 300 seconds behind
# the system time.
mktime($4 " " m[$3] " " $2 " " $5 " " $6 " " $7) + 300 >= systime()
Setting the TZ environment variable to GMT (with TZ=GMT before the awk call) will make mktime interpret the time stamps as GMT.

Humanized dates with awk?

I have this awk script that runs through a file and counts every occurrence of a given date. The date format in the original file is the standard date format, like this: Thu Mar 5 16:46:15 EST 2009. I use awk to throw away the weekday, time, and timezone, and then do my counting by pumping the dates into an associative array with the dates as indices.
In order to get the output to be sorted by date, I converted the dates to a different format that I could sort with bash sort.
Now, my output looks like this:
Date Count
03/05/2009 2
03/06/2009 1
05/13/2009 7
05/22/2009 14
05/23/2009 7
05/25/2009 7
05/29/2009 11
06/02/2009 12
06/03/2009 16
I'd really like the output to have more human readable dates, like this:
Mar 5, 2009
Mar 6, 2009
May 13, 2009
May 22, 2009
May 23, 2009
May 25, 2009
May 29, 2009
Jun 2, 2009
Jun 3, 2009
Any suggestions for a way I could do this? If I could do it on the fly when I output the count values, that would be best.
UPDATE:
Here's my solution incorporating ghostdog74's example code:
grep -i "E[DS]T 2009" original.txt | awk '{printf "%s %2.d, %s\r\n",$2,$3,$6}' >dates.txt #outputs dates for counting
date -f dates.txt +'%Y %m %d' | awk ' #reformat dates as YYYYMMDD for future sort
{++total[$0]} #pump dates into associative array
END {
for (item in total) printf "%s\t%s\r\n", item, total[item] #output dates as yyyy mm dd with counts
}' | sort -t \t | awk ' #send to sort, then to cleanup
BEGIN {printf "%s\t%s\r\n","Date","Count"}
{t=$1" "$2" "$3" 0 0 0" #cleanup using example by ghostdog74
printf "%s\t%2.d\r\n",strftime("%b %d, %Y",mktime(t)),$4
}'
rm dates.txt
Sorry this looks so messy. I've tried to put clarifying comments in.
Use awk's sort and date's stdin to greatly simplify the script
Date will accept input from stdin so you can eliminate one pipe to awk and the temporary file. You can also eliminate a pipe to sort by using awk's array sort and as a result, eliminate another pipe to awk. Also, there's no need for a coprocess.
This script uses date for the month-name conversion, which would presumably continue to work in other locales (ignoring the timezone and month/day-order issues, though).
The end result looks like "grep|date|awk". I have broken it into separate lines for readability (it would be about half as big if the comments were eliminated):
grep -i "E[DS]T 2009" original.txt |
date -f - +'%Y %m %d' | #reformat dates as YYYYMMDD for future sort
gawk '  # asort() below is a gawk extension
BEGIN { printf "%s\t%s\r\n","Date","Count" }
{ ++total[$0] }  # pump dates into associative array
END {
idx=1
for (item in total) {
d[idx]=item;idx++ # copy the array indices into the contents of a new array
}
c=asort(d) # sort the contents of the copy
for (i=1;i<=c;i++) { # use the contents of the copy to index into the original
printf "%s\t%2.d\r\n",strftime("%b %e, %Y",mktime(d[i]" 0 0 0")),total[d[i]]
}
}'
I get testy when I see someone using grep and awk (and sed, cut, ...) in a pipeline. Awk can fully handle the work of many utilities.
Here's a way to clean up your updated code to run in a single instance of awk (well, gawk), and using sort as a co-process:
gawk '
BEGIN {
IGNORECASE = 1
}
function mon2num(mon) {
return(((index("JanFebMarAprMayJunJulAugSepOctNovDec", mon)-1)/3)+1)
}
/ E[DS]T [[:digit:]][[:digit:]][[:digit:]][[:digit:]]/ {
month=$2
day=$3
year=$6
date=sprintf("%4d%02d%02d", year, mon2num(month), day)
total[date]++
human[date] = sprintf("%3s %2d, %4d", month, day, year)
}
END {
sort_coprocess = "sort"
for (date in total) {
print date |& sort_coprocess
}
close(sort_coprocess, "to")
print "Date\tCount"
while ((sort_coprocess |& getline date) > 0) {
print human[date] "\t" total[date]
}
close(sort_coprocess)
}
' original.txt
If you are using gawk:
gawk 'BEGIN{
  s="03/05/2009"                      # mm/dd/yyyy, matching your output above
  m=split(s,date,"/")
  # mktime expects "YYYY MM DD HH MM SS", i.e. year month day
  t=date[3]" "date[1]" "date[2]" 0 0 0"
  print strftime("%b %d",mktime(t))
}'
The above is just an example; since you did not show your actual code, I cannot incorporate it into yours.
Why don't you prepend your awk date to the original date? This yields a sortable key while staying human readable.
(Note: to sort right, you should make it yyyymmdd)
If needed, cut can remove the prepended column.
Gawk has strftime(). You can also call the date command to format dates (see man date). Linux Forums gives some examples.
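For instance, a quick sketch with GNU date (%-d is a GNU extension that suppresses the leading zero):
$ date -d 2009-03-05 +'%b %-d, %Y'
Mar 5, 2009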
