format date in file using awk - bash

Content of the file is
Feb-01-2014 one two
Mar-02-2001 three four
I'd like to format the first field (the date) to %Y%m%d format
I'm trying to use a combination of awk and date command, but somehow this is failing even though i got the feeling i'm almost there:
cat infile | awk -F"\t" '{$1=system("date -d " $1 " +%Y%m%d");print $1"\t"$2"\t"$3}' > test
this prints out date's usage pages which makes me think that the date command is triggered properly, but there is something wrong with the argument, do you see the issue somewhere?
i'm not that familiar with awk,

You don't need date for this, its simply rearranging the date string:
$ awk 'BEGIN{FS=OFS="\t"} {
split($1,t,/-/)
$1 = sprintf("%s%02d%s", t[3], (match("JanFebMarAprMayJunJulAugSepOctNovDec",t[1])+2)/3, t[2])
}1' file
20140201 one two
20010302 three four

You can use:
while read -r a _; do
date -d "$a" '+%Y%m%d'
done < file
20140201
20010302

system() returns the exit code of the command.
Instead:
cat infile | awk -F"\t" '{"date -d " $1 " +%Y%m%d" | getline d;print d"\t"$2"\t"$3}'

$ awk '{var=system("date -d "$1" +%Y%m%d | tr -d \"\\n\"");printf "%s\t%s\t%s\n", var, $2, $3}' file
201402010 one two
200103020 three four

Related

Reformat date in text file (.csv) with sed and date

This is the input .csv file
"item1","10/11/2017 2:10pm",1,2, ...
"item2","10/12/2017 3:10pm",3,4, ...
.
.
.
Now, I want to convert the second column (date) to this specific format
date -d '10/12/2017 2:10pm' +'%Y/%m/%d %H:%M:%S', so that "10/12/2017 2:10pm" converts to "2017/10/12 14:10:00"
Expecting output file
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
.
.
.
I know it can be done by using bash or python, but I want to do it in one-line command. Any ideas? Is there a way to pass date result to sed?
One-liner awk approach.
awk -F',' '{gsub(/"/,"",$2); cmd="date -d\""$2"\" +\\\"%Y/%m/%d\\ %T\\\"";
cmd |getline $2; close(cmd) }1' OFS=, infile #>>outfile
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
This will output changes in your Terminal, you need to redirect the output to a file if you need record the output or use FILENAME to redirect the output to the input infile itself.
awk -F',' '{gsub(/"/,"",$2); cmd="date -d\""$2"\" +\\\"%Y/%m/%d\\ %T\\\"";
cmd |getline $2; close(cmd); print >FILENAME }' OFS=, infile
Or with GNU awk implementations which does support -i inplace identifier for in-place replace. see 'awk' save modifications in place
You can do it in one line, but that begs the question -- "How long of a line do you want?" Since you have it labeled 'shell' and not bash, etc., you are a bit limited in your string handling. POSIX shell provides enough to do what you want, but it isn't the speediest remedy. You are either going to end up with an awk or sed solution that calls date or a shell solution that calls awk or sed to parse old date from the original file and feeds the result to date to get your new date. You will have to work out which provides the most efficient remedy.
As far as the one-liner goes, you can do something similar to the following while remaining POSIX compliant. It simply uses awk to get the 2nd field from the file, pipes the result to a while loop which uses expr length "$field" to get the length and uses that within expr substr "$field" "2" <length expression - 2> to chop the double-quotes from the end of the original date olddt, followed by date -d "$olddt" +'%Y/%m/%d %H:%M:%S' to get newdt and finally sed -i "s;$olddt;$newdt;" to perform the substitution in place. Your one-liner (shown with auto line-continuations for readability)
$ awk -F, '{print $2}' timefile.txt |
while read -r field; do
olddt="$(expr substr "$field" "2" "$(($(expr length "$field") - 2))")";
newdt=$(date -d "$olddt" +'%Y/%m/%d %H:%M:%S');
sed -i "s;$olddt;$newdt;" timefile.txt; done
Example Input File
$ cat timefile.txt
"item1","10/11/2017 2:10pm",1,2, ...
"item2","10/12/2017 3:10pm",3,4, ...
Resulting File
$ cat timefile.txt
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
There are probably faster ways to do it, but this is a reasonable length one-liner (relatively speaking).
Revised less ugly sed method:
sed 's/^.*,"\|",.*//g;h;s#.*#date "+%Y/%m/%d %T" -d "&"#e;H;g;s#\n\|$#,#g;s/^/s,/' input.csv | sed -f - input.csv
Spread out, (it works the same):
sed 's/^.*,"\|",.*//g
h;
s#.*#date "+%Y/%m/%d %T" -d "&"#e;
H;
g;
s#\n\|$#,#g;
s/^/s,/' input.csv | sed -f - input.csv
Output:
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
How it works:
The first sed block uses the evaluate command to run date, the output of which is used to generate some new sed substitute commands. To show the new s commands, temporarily replace the shell script | pipe with a # comment:
s,10/11/2017 2:10pm,2017/10/11 14:10:00,
s,10/12/2017 3:10pm,2017/10/12 15:10:00,
These are piped to the second sed.

Convert date to timestamp in bash (with miliseconds)

I have CSV file in the following format
20170102 00:00:00.803,
20170102 00:00:01.265,
20170102 00:00:05.818,
I've managed to add slashes with
sed -r 's#(.{4})(.{2})(.{2})(.{2})(.{2})#\1/\2/\3 \4:\5:#' file.csv > newfile.csv
as below, to enable coversion to timestamp
2017/01/02 0:0::00:00.803
2017/01/02 0:0::00:01.265
2017/01/02 0:0::00:05.818
But after using
cat newfile.csv | while read line ; do echo $line\;$(date -d "$t" "+%s%N") ; done > nextfile.csv
I got :
2017/01/02 0:0::00:00.803,1499727600000000000
2017/01/02 0:0::00:01.265,1499727600000000000
2017/01/02 0:0::00:05.818,1499727600000000000
There's probably something wrong my data, but I'm too much of a beginner to be able to get missing values. It would be very much appreciated if you could drop me some sed/awk magic.Thanks!
EDIT: I need to have a timestamp with miliseconds, but all I got for now is just zeros (how typical)
Not sure if this is what you are after but you could just parse the output without date to form the date stamp.
awk '{ print substr($0,1,4)"/"substr($0,5,2)"/"substr($0,7,2)" "substr($0,10,2)":"substr($0,13,2)":"substr($0,16) }' dates.csv
We use awk to pull out the extract of the line concerning day, month, year etc (substr function) and then use print to output the data in the required format.
gawk solution:
awk -F',' '{ match($1,/^([0-9]{4})([0-9]{2})([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2}).([0-9]{3})/,a);
print mktime(sprintf("%d %d %d %d %d %d",a[1],a[2],a[3],a[4],a[5],a[6]))*1000 + a[7] }' file.csv
The output:
1483308000803
1483308001265
1483308005818
The original format is accepted by date as time stamp. You need not sed it. I believe you need "date,milliseconds since 1970-01-01 00:00:00 UTC" in your output. Try this in bash.
generateoutput.sh
#!/bin/bash
while read -r line
do
echo -n $line,
echo `date -d "$line" "+%s%N"` / 1000000 | bc
done < <(sed 's/,//g' $1)
where timestamp.csv is your file in original format.
bc - basic calculator, convert nanoseconds to milliseconds
Parsing large files line by line is bound to take time.
If you need better performance, split your original file. I suggest doing this in a new directory
split -l 100000 -d <filename>
Run generateoutput.sh in parallel for each of this file and tee -a output
ls -l x* | awk '{print $9}' | xargs -n1 -P4 generateoutput.sh | tee -a output.csv

Counting lines in a file matching specific string

Suppose I have more than 3000 files file.gz with many lines like below. The fields are separated by commas. I want to count only the line in which the 21st field has today's date (ex:20171101).
I tried this:
awk -F',' '{if { $21 ~ "TZ=GMT+30 date '+%d-%m-%y'" } { ++count; } END { print count; }}' file.txt
but it's not working.
Using awk, something like below
awk -F"," -v toSearch="$(date '+%Y%m%d')" '$21 ~ toSearch{count++}END{print count}' file
The date '+%Y%m%d' produces the date in the format as you requested, e.g. 20170111. Then matching that pattern on the 21st field and counting the occurrence and printing it in the END clause.
Am not sure the Solaris version of grep supports the -c flag for counting the number of pattern matches, if so you can do it as
grep -c "$(date '+%Y%m%d')" file
Another solution using gnu-grep
grep -Ec "([^,]*,){20}$(date '+%Y%m%d')" file
explanation: ([^,]*,){20} means 20 fields before the date to be checked
Using awk and process substitution to uncompress a bunch of gzs and feed them to awk for analyzing and counting:
$ awk -F\, 'substr($21,1,8)==strftime("%Y%m%d"){i++}; END{print i}' * <(zcat *gz)
Explained:
substr($21,1,8) == strftime("%Y%m%d") { # if the 8 first bytes of $21 match date
i++ # increment counter
}
END { # in the end
print i # output counter
}' * <(zcat *gz) # zcat all gzs to awk
If Perl is an option, this solution works on all 3000 gzipped files:
zcat *.gz | perl -F, -lane 'BEGIN{chomp($date=`date "+%Y%m%d"`); $count=0}; $count++ if $F[20] =~ /^$date/; END{print $count}'
These command-line options are used:
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the #F array. Defaults to splitting on whitespace.
-n loop around each line of the input file
-e execute the perl code
-F autosplit modifier, in this case splits on ,
BEGIN{} executes before the main loop.
The $date and $count variables are initialized.
The $date variable is set to the result of the shell command date "+%Y%m%d"
$F[20] is the 21st element in #F
If the 21st element starts with $date, increment $count
END{} executes after the main loop
Using grep and cut instead of awk and avoiding regular expressions:
cut -f21 -d, file | grep -Fc "$(date '+%Y%m%d')"

Using and manipulating date variable inside AWK

I assume that there may be a better way to do it but the only one I came up with was using AWK.
I have a file with name convention like following:
testfile_2016_03_01.txt
Using one command I am trying to shift it by one day testfile_20160229.txt
I started from using:
file=testfile_2016_03_01.txt
IFS="_"
arr=($file)
datepart=$(echo ${arr[1]}-${arr[2]}-${arr[3]} | sed 's/.txt//')
date -d "$datepart - 1 days" +%Y%m%d
the above works fine, but I really wanted to do it in AWK. The only thing I found was how to use "date" inside AWK
new_name=$(echo ${file##.*} | awk -F'_' ' {
"date '+%Y%m%d'" | getline date;
print date
}')
echo $new_name
okay so two things happen here. For some reason $4 also contains .txt even though I removed it(?) ##.*
And the main problem is I don't know how to pass the variables to that date, the below doesn't work
`awk -F'_' '{"date '-d "2016-01-01"' '+%Y%m%d'" | getline date; print date}')
ideally I want 2016-01-01 to be variables coming from the file name $2-$3-$4 and substract 1 day but I think I'm getting way too many single and double quotes here and my brain is losing..
Equivalent awk command:
file='testfile_2016_03_01.txt'
echo "${file%.*}" |
awk -F_ '{cmd="date -d \"" $2"-"$3"-"$4 " -1 days\"" " +%Y%m%d";
cmd | getline date; close(cmd); print date}'
20160229
WIth GNU awk for time functions:
$ file=testfile_2016_03_01.txt
$ awk -v file="$file" 'BEGIN{ split(file,d,/[_.]/); print strftime(d[1]"_%Y%m%d."d[5],mktime(d[2]" "d[3]" "d[4]" 12 00 00")-(24*60*60)) }'
testfile_20160229.txt
This might work for you:
file='testfile_2016_03_01.txt'
IFS='_.' read -ra a <<< "$file"
date -d "${a[1]}${a[2]}${a[3]} -1 day" "+${a[0]}_%Y%m%d.${a[4]}"

Adding a single date to the first column of a file

I have a file that looks like
1234-00AA12 .02
5678-11BB34 .03
In a bash script I have an expression like
day=$(...)
that greps a date in the format YYYY/MM/DD (if this matters), let's say 2014/01/21 for specificity.
I want to produce the following:
2014/01/21,1,1,1234,00AA12,.02
2014/01/21,1,1,5678,11BB34,.03
(The first column is the day, the second and third columns are fixed as "1").
After a bit of googling I tried:
cat file|awk -F "-" '{split($2,array," "); printf "%s,%s,%s,%s,%s,%s\n",$day,"1","1",$1,array[1],array[2]}'> output.csv
but $day isn't working with awk.
Any help would be appreciated.
Try this awk:
awk -v d=$(date '+%Y/%m/%d') '{print d,1,1,$1,$2}' OFS=, file
2014/02/07,1,1,1234-00AA12,.02
2014/02/07,1,1,5678-11BB34,.03
$ awk -v day="$day" 'BEGIN{FS="[ -]";OFS=","} {print day,1,1,$1,$2,$3}' file
2014/01/21,1,1,1234,00AA12,.02
2014/01/21,1,1,5678,11BB34,.03
awk wouldn't understand shell variables. You need to pass those to it:
awk -vdd="$day" -F "-" '{split($2,array," "); printf "%s,%s,%s,%s,%s,%s\n",dd,"1","1",$1,array[1],array[2]}'
Moreover, rather than saying:
cat file | awk ...
avoid the useless use of cat:
awk file
With bash
day="2014/01/21"
(
IFS=,
while IFS=" -" read -ra fields; do
new=( "$day" 1 1 "${fields[#]}" )
echo "${new[*]}"
done < file
)
2014/01/21,1,1,1234,00AA12,.02
2014/01/21,1,1,5678,11BB34,.03
I run the while loop in a subshell just to keep changes to IFS localized.

Resources