Changing date format in CSV file using Ubuntu bash/awk

I have CSV files where the date is in the wrong format. The incoming format is e.g. 15.11.2015 and I need to change it to %Y-%m-%d (2015-11-15). I've tried to create a bash/awk script that changes this value, which is in column 43. The first row is a header. So far I've managed to create a script which finds the value, splits it, and rejoins it with forward slashes:
awk -v FS=";" 'NR>1{split($43,a,".");$43=a[2]"/"a[1]"/"a[3]}1' OFS=";" fileIn
I've tried to change the format with the date command, but I haven't found a way to use it in an awk script. This would print the date in the right format:
date -d 11/25/2015 +%Y-%m-%d
EDIT. I need to reformat the numbers as well, otherwise the leading zeros are lost if the day or month is < 10.

I followed 123's advice and used sprintf with zero-padding; my working solution is now:
awk -v FS=";" 'NR>1{split($43,a,".");$43=sprintf("%04d-%02d-%02d",a[3],a[2],a[1])}1' OFS=";" fileIn
EDIT. Cleaned after 123's comment.
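A quick way to sanity-check the zero-padding on a toy line (using column 1 here instead of column 43):
echo "5.3.2015" | awk -v FS=";" '{split($1,a,".");$1=sprintf("%04d-%02d-%02d",a[3],a[2],a[1])}1' OFS=";"
# prints 2015-03-05; sprintf's %02d restores the leading zeros that plain concatenation would drop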

Related

Awk - filter only dates with certain format from text file

I have a .txt file with many lines of text on macOS. I would like to filter only dates and have them saved in order of appearance line by line in a new text file.
I am, however, not interested in all dates, only in those that are complete, looking like 02/03/2019, and where the day number is below 13, i.e. 01...12.
Then, I would like to have those dates removed where the numbers for the day and month are the same, like 01/01/2019 and 02/02/2019, etc.
How can I achieve this with awk or similar software in bash?
If perl is a choice:
perl -ne 'print if m:(\d\d)/(\d\d)/(\d\d\d\d): && $1 < 13 && $1 != $2' dates.txt >newdates.txt
This assumes the format dd/mm/yyyy.
Note that I am using the m: : notation instead of the usual / / for regex matching, so I do not need to escape the / slashes in the date.
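Since the question asked for awk, here is an equivalent sketch under the same dd/mm/yyyy assumption (like the perl version, only the first date on each line is tested):
awk 'match($0, /[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9]/) {
  d = substr($0, RSTART, 2); m = substr($0, RSTART + 3, 2)
  if (d + 0 < 13 && d != m) print
}' dates.txt > newdates.txt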
Deleting Dates Inside a Text File
The following command will delete all dates of the form✱ aa/bb/cccc where aa = bb < 13. The original file will be copied to yourFile.txt.bak as a backup and the new text with deleted dates will overwrite the old file.
sed -E -i.bak 's:\b(0[0-9]|1[0-2])/\1/[0-9]{4}\b::g' yourFile.txt
If you want to insert something instead of just deleting the dates, you can do so by writing the replacement between the two ::. For instance sed … 's:…:deleted date:g' … will replace each matching date with the text deleted date.
✱ Note that it doesn't matter for your criterion whether the date format is dd/mm/yyyy or mm/dd/yyyy, since you are only interested in dates where dd and mm are equal.
Extracting Specific Dates From A Text File
If you do not want to delete, but only extract specific dates as mentioned in your comment, you can use the following command.
grep -Eo '\b([0-9]{2}/){2}[0-9]{4}\b' yourFile.txt | awk -F/ '$1<13 && $1!=$2'
This will extract all dates in dd/mm/yyyy (!) format where mm ≠ dd < 13. The dates are printed to stdout in order of appearance. If you want to save them to a file, append > yourOutputFile.txt to the end of the command.

Adding months using shell script

Currently I have the below record in a file.
ABC,XYZ,123,Sep-2018
Looking for a command in Linux which will add months and give the output. For example, if I want to add 3 months, the expected output is:
ABC,XYZ,123,Dec-2018
Well,
date -d "1-$(echo "ABC,XYZ,123,Sep-2018" | awk -F "," '{ print $4 }')+3 months" "+%b-%Y"
This shows you how to get it working. Just replace the echo with a shell variable as you loop through the dates.
Basically, you use awk to grab just the date portion, add a 1- to the front to turn it into a real date, then use the date command to do the math, and finally tell it to give you just the month abbreviation and year.
That handles the date portion; the rest of the record (the first three fields) can be captured using:
stub=`echo "ABC,XYZ,123,Dec-2018" | awk -F "," '{ printf("%s,%s,%s,",$1,$2,$3) }'`
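Putting the two parts together in a loop (a sketch; it assumes the records live in a file named records.csv, one per line, and that GNU date is available):
while IFS= read -r line; do
  stub=$(echo "$line" | awk -F "," '{ printf("%s,%s,%s,", $1, $2, $3) }')
  old=$(echo "$line" | awk -F "," '{ print $4 }')
  new=$(date -d "1-${old}+3 months" "+%b-%Y")   # e.g. Sep-2018 becomes Dec-2018
  echo "${stub}${new}"
done < records.csv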
You can use the external date command or (g)awk's datetime-related functions to do it; however, you have to prepare the string for parsing. Here is another way to do the job:
First prepare an index file; we name it month.txt:
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
Then run this:
awk -F'-|,' -v OFS="," 'NR==FNR{m[NR]=$1;a[$1]=NR;next}
{i=a[$4]; if(i==12){i=1;++$5}else i++
$4=m[i]"-"$5;NF--}7' month.txt file
With this example file:
ABC,XYZ,123,Jan-2018
ABC,XYZ,123,Nov-2018
ABC,XYZ,123,Dec-2018
You will get:
ABC,XYZ,123,Feb-2018
ABC,XYZ,123,Dec-2018
ABC,XYZ,123,Jan-2019
Update
Oh, I didn't notice that you want to add 3 months. Here is the updated code:
awk -F'-|,' -v OFS="," 'NR==FNR{m[NR]=$1;a[$1]=NR;next}
{i=a[$4]+3; if(i>12){i=i-12;++$5}
$4=m[i]"-"$5;NF--}7' month.txt file
Now with the same input, you get:
ABC,XYZ,123,Apr-2018
ABC,XYZ,123,Feb-2019
ABC,XYZ,123,Mar-2019
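If the offset can exceed 12 months, a slightly generalized sketch using modular arithmetic avoids the special-casing (n is the number of months to add; everything else is as above):
awk -F'-|,' -v OFS="," -v n=3 'NR==FNR{m[NR]=$1;a[$1]=NR;next}
{i=a[$4]+n;              # 1-based month index plus the offset
 $5+=int((i-1)/12);      # carry whole years into the year field
 i=(i-1)%12+1            # wrap the month index back into 1..12
 $4=m[i]"-"$5;NF--}7' month.txt file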

Unix Shell Scripting using Date Command

OK, so I'm trying to write a script to wc files using the date command. The format of the files, for example, goes like this: testfile20170104.gz.
Now the files are set up to have yesterday's date in the format yyyymmdd. So if today is 1/5/2017, the file will carry the previous day, 1/4/2017, in yyyymmdd format, as you see in the example above.
Normally, to count the lines in the file, all one needs to do is run: gzcat testfile20170104.gz|wc -l.
However, what I want to do is run a script, or even a for loop, that gzcats the file, but instead of having to copy and paste the filename on the command line, I want to use the date command to put yesterday's date into the filename in yyyymmdd format.
So as a template something like this:
gzcat testfile*.gz|wc -l | date="-1 days"+%Y%m%d
Now I know what I have above is COMPLETELY wrong but you get the picture. I want to replace the '*' with the output from the date command, if that makes sense...
Any help will be much much appreciated!
Thanks!
You want:
filename="testfile$( date -d yesterday +%Y%m%d ).gz"
zcat "$filename"
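To get the count the question actually asks for, pipe that through wc -l (a sketch; it assumes GNU date for -d yesterday, and you can substitute gzcat for zcat if that is what your system provides):
filename="testfile$( date -d yesterday +%Y%m%d ).gz"
zcat "$filename" | wc -l   # line count of yesterday's file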

How to strip date in csv output using shell script?

I have a few CSV extracts where I am trying to fix up the date; they are as follows:
"Time Stamp","DBUID"
2016-11-25T08:28:33.000-8:00,"5tSSMImFjIkT0FpiO16LuA"
The first column is always the "Time Stamp". I would like to convert it so that it keeps only the date, "2016-11-25", and drops the "T08:28:33.000-8:00".
The end result would be..
"Time Stamp","DBUID"
2016-11-25,"5tSSMImFjIkT0FpiO16LuA"
There are plenty of files with different dates.
Is there a way to do this in ksh? Some kind of for-each loop to go through all the files, replacing the long timestamp and leaving just the date?
Use sed:
$ sed '2,$s/T[^,]*//' file
"Time Stamp","DBUID"
2016-11-25,"5tSSMImFjIkT0FpiO16LuA"
How it works:
2,$        # Apply to lines 2 through last, skipping the header;
           # removing this address would replace on the first line as well.
s/T[^,]*// # Replace everything from T (inclusive) up to , (exclusive);
           # `[^,]*' matches anything but `,' zero or more times.
Here's one solution using a standard AIX utility:
awk -F, -v OFS=, 'NR>1{sub(/T.*$/,"",$1)}1' file > file.cln && mv file.cln file
Output:
"Time Stamp","DBUID"
2016-11-25,"5tSSMImFjIkT0FpiO16LuA"
(But I no longer have access to an AIX environment, so this is only tested with my local awk.)
NR>1 skips the header line, and the sub() is limited to only the first field (up to the first comma). The trailing 1 char is awk shorthand for {print $0}.
If your data layout changes and you get extra commas in your data, this may require fixing.
IHTH
Using sed:
sed -i "s/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\).*,/\1-\2-\3,/" file.csv
Output:
"Time Stamp","DBUID"
2016-11-25,"5tSSMImFjIkT0FpiO16LuA"
-i edit files in place
s substitute
This is a perfect job for awk, but unlike the previous answer, I recommend using the substring function.
awk -F, -v OFS=, 'NR > 1{$1 = substr($1,1,10)} {print $0}' file.txt
Explanation
-F, and -v OFS=,: Set the input and output field separators, in this case a comma; without OFS, awk would rejoin the modified record with spaces instead of commas
NR > 1: Ignore the first row
$1: Refers to the first field
$1 = substr($1,1,10): Sets the first field to the first 10 characters of the field. In the example, this is the date portion
print $0: This will print the entire row
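To cover the loop part of the question, any of the commands above can be wrapped in a plain for loop. A sketch in ksh/bash, using the substr variant (it assumes the extracts match *.csv in the current directory and overwrites each file via a temporary copy):
for f in *.csv; do
  awk -F, -v OFS=, 'NR > 1 { $1 = substr($1, 1, 10) } { print }' "$f" > "$f.tmp" &&
  mv "$f.tmp" "$f"
done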

How to grep files from specific date to EOF awk

I have a little problem with printing data from a file from a given date to the end of the file. Namely, I have this file:
2016/08/10-12:45:14.970000 <some_data> <some_data> ...
2016/08/10-12:45:15.970000 <some_data> <some_data> ...
2016/08/10-12:45:18.970000 <some_data> <some_data> ...
2016/08/10-12:45:19.970000 <some_data> <some_data> ...
And this file has hundreds of lines.
I have to print the file from one point in time to the end of the file, but I don't know the precise time at which a row appeared in the logfile.
I need to print data from the date 2016/08/10-12:45:16 to the end of the file, so I want to receive a file that looks like this:
2016/08/10-12:45:18.970000
2016/08/10-12:45:19.970000
OK, if I know the specific date from which I want to print data, everything is easy:
awk '/<start_time>/,/<end/'
awk '/2016\/08\/10-12:45:18/,/<end/'
But if I don't know the specific date, only the approximate date 2016/08/10-12:45:16, it's harder.
Can anyone please help me?
You can benefit from the fact that the time format you are using sorts correctly under plain string comparison. With awk the command can look like this:
awk -v start='2016/08/10-12:45:16' '$1>=start' file
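This works because the format is fixed-width with the most significant fields first, so string order matches chronological order. A quick illustration:
awk 'BEGIN { print ("2016/08/10-12:45:14" >= "2016/08/10-12:45:16") }'   # 0: before the cutoff
awk 'BEGIN { print ("2016/08/10-12:45:18" >= "2016/08/10-12:45:16") }'   # 1: at or after the cutoff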
You can use the mktime function of GNU awk (gawk) to compare the times:
awk -v TIME="2016/08/10-12:45:16" '
BEGIN{
gsub("[/:-]"," ",TIME)
reftime=mktime(TIME)
}
{
t=$1
sub(/\.[0-9]+$/,"",t)   # strip the fractional seconds, including the dot
gsub("[/:-]"," ",t)
if(mktime(t)>reftime)
print
}' file
This script takes your reference time, converts it into a number, and then compares it to the time found on each line of the file.
Note that the sub and gsub calls only convert your specific time format into the "YYYY MM DD HH MM SS" form understood by mktime.
You should be able to do this simply with awk:
awk '{m = "2016/08/10-12:45:18"} $0 ~ m,0 {print}' file
If you weren't sure exactly the time or date you could do:
awk '{m = "2016/08/10-12:45:1[6-8]"} $0 ~ m,0 {print}' file
This should print everything from the first line whose timestamp matches, around 12:45:16 to 12:45:18, through to the end of the file. The character class [6-8] turns the last digit of the seconds into a range, so the pattern matches 12:45:16, 12:45:17, or 12:45:18.
Output:
2016/08/10-12:45:18.970000 <some_data> <some_data> ...
2016/08/10-12:45:19.970000 <some_data> <some_data> ...
