Bash script to convert a date and time column to unix timestamp in .csv - bash

I am trying to create a script to convert two columns in a .csv file, date and time, into Unix timestamps. So I need to get the date and time columns from each row, convert them, and insert the timestamp into an additional column at the end.
Could anyone help me? So far I have discovered the Unix command to convert any given date and time to a Unix timestamp:
date -d "2011/11/25 10:00:00" "+%s"
1322215200
I have no experience with bash scripting; could anyone get me started?
Examples of my columns and rows:
Columns: Date, Time,
Row 1: 25/10/2011, 10:54:36,
Row 2: 25/10/2011, 11:15:17,
Row 3: 26/10/2011, 01:04:39,
Thanks so much in advance!

You don't provide an excerpt from your CSV file, so I'm using this one:
[foo.csv]
2011/11/25;12:00:00
2010/11/25;13:00:00
2009/11/25;19:00:00
Here's one way to solve your problem:
$ cat foo.csv | while read line ; do echo $line\;$(date -d "${line//;/ }" "+%s") ; done
2011/11/25;12:00:00;1322218800
2010/11/25;13:00:00;1290686400
2009/11/25;19:00:00;1259172000
(EDIT: Removed an unnecessary variable.)
(EDIT2: Altered the date command so the script actually works.)
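Adapting that to the comma-separated DD/MM/YYYY rows shown in the question takes one extra step, since date -d does not accept DD/MM/YYYY directly. A sketch, assuming GNU date and a hypothetical input.csv; the day and month are swapped into a YYYY/MM/DD string first:
while IFS=', ' read -r d t _; do
  IFS=/ read -r day month year <<< "$d"
  echo "$d, $t, $(date -d "$year/$month/$day $t" +%s)"
done < input.csv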

This should do the job:
awk 'BEGIN{FS=OFS=", "}{t=$1" "$2; "date -d \""t"\" +%s"|getline d; print $1,$2,d}' yourCSV.csv
Note
You didn't give an example, but since you mentioned CSV, I assume the column separator in your file is a comma.
Test
kent$ echo "2011/11/25, 10:00:00"|awk 'BEGIN{FS=OFS=", "}{t=$1" "$2; "date -d \""t"\" +%s"|getline d; print $1,$2,d}'
2011/11/25, 10:00:00, 1322211600
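One robustness tweak worth adding: awk keeps the pipe to the date command open after getline, so on large files it is good practice to close() it. A sketch of the same approach with an explicit close:
awk 'BEGIN{FS=OFS=", "}
{
  cmd = "date -d \"" $1 " " $2 "\" +%s"
  cmd | getline d
  close(cmd)   # release the pipe before reading the next line
  print $1, $2, d
}' yourCSV.csv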

Now two improvements:
First: no need for cat foo.csv; just stream it into the while loop via < foo.csv.
Second: no need for echo and tr to create the date string format; just use bash's internal pattern substitution and do it in place:
while read line ; do echo ${line}\;$(date -d "${line//;/ }" +'%s'); done < foo.csv
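If the file also has a header line, a variation on the same loop (a sketch; out.csv is a hypothetical output name) copies the header through and labels the new column:
{
  IFS= read -r header && echo "$header;timestamp"
  while read -r line ; do
    echo "$line;$(date -d "${line//;/ }" +%s)"
  done
} < foo.csv > out.csv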

Related

Use awk to analyze csv file - combined with shell 'date' command in awk

I have a .csv file with dates and whether each day was enjoyable or not:
2019-04-1,enjoyable
2019-04-2,unenjoyable
2019-04-3,unenjoyable
2019-04-4,enjoyable
2019-04-5,unenjoyable
2019-04-6,unenjoyable
2019-04-7,enjoyable
2019-04-8,unenjoyable
2019-04-9,unenjoyable
2019-04-10,enjoyable
2019-04-11,enjoyable
2019-04-12,enjoyable
2019-04-13,unenjoyable
2019-04-14,enjoyable
2019-04-15,unenjoyable
2019-04-16,unenjoyable
2019-04-17,unenjoyable
2019-04-18,enjoyable
2019-04-19,unenjoyable
2019-04-20,unenjoyable
2019-04-21,unenjoyable
2019-04-22,unenjoyable
2019-04-23,unenjoyable
2019-04-24,unenjoyable
2019-04-25,unenjoyable
2019-04-26,unenjoyable
What I want to do is to print the day of the week in a third column, separated by ',', like this:
2019-04-1,enjoyable,2
2019-04-2,unenjoyable,3
I tried:
dates=$(awk '{FS=","}{print $1,$2}' weather_stat.csv)
weeks=$(
for vars in $dates[first_row]
do
echo $(date -j -f '%Y-%m-%d' $vars "+%w")
done
)
merge($dates,$weeks)
The first part of the code works without any problem, but in the second part I am confused about how to get the data in the first row (so I use dates[first_row] to mean the first row of the dates variable) from the variable "dates", so that I can apply the 'date' command to it.
As for the third part, I want to merge these two tables together. I found the 'join' command, but it seems to work on two files instead of two variables (I don't want to create any new files during the process).
Could anyone tell me how to get the rows of a variable instead of a file in shell, and how to merge two table-like variables?
As you're learning shell scripting, here's some code to study:
to read your csv file, and get the weekday number for each date in the file:
while IFS=, read -r date rest; do echo "$date,$(date -d "$date" +%w)"; done < file.csv
to join the output of that command with your file:
weekdays=$(while IFS=, read -r date rest; do echo "$date,$(date -d "$date" +%w)"; done < file.csv)
join -t, file.csv <(echo "$weekdays")
or, without needing to store the result in an intermediate variable
join -t, file.csv <(
while IFS=, read -r date rest; do echo "$date,$(date -d "$date" +%w)"; done < file.csv
)
The newlines within the <() are not necessary, but useful for maintainable code.
However, you can see that this is less efficient because you have to process the file twice. With awk you only have to read through the file once.
With GNU awk:
awk 'BEGIN{FS=OFS=","}
{ split($1,a,"-")
t=sprintf("%0.4d %0.2d %0.2d 00 00 00",a[1],a[2],a[3]);
print $0,strftime("%w",mktime(t))
}' file.csv
With only your Bourne shell, so less efficient than awk if you have a lot of lines in your CSV file:
while IFS=, read date enjoy; do
date -d "$date" +"$date,$enjoy,%w"
done < your.csv

bash, get line from file with condition for particular column

I've got a file with two columns, username and date; for example, a line of this file looks like this:
username,20150706
I'm trying to get the lines whose date in the second column is older than e.g. 20151231.
EDIT, SOLUTION: I want to do it in a loop, reading the date (the second column of the file) and comparing it with another date, e.g. 20151231. If the date is older than 20151231, I write the whole line (username,date) to a file. Here's how I do it:
date1=20151231
while read -r line; do
  # the second comma-separated field is the date
  date2=$(echo "$line" | cut -d "," -f2)
  if (( date2 < date1 )); then
    echo "$line" >> filename2
  fi
done < filename
So one little problem solved :)
Thanks in advance for any suggestions.
You can solve this with awk:
awk -F',' -v threshold=20151231 '$2<=threshold' file
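For example, on a hypothetical two-line input:
$ printf 'alice,20150706\nbob,20160102\n' | awk -F',' -v threshold=20151231 '$2<=threshold'
alice,20150706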
hope this helps you!

Split file by date and keep header in Bash

I need to split a TSV file by date using whatever standard CLI tools come with OS X 10.10; e.g. sed, awk, etc. FYI the shell is Bash
The input file has a header row and follows a tab-separated format (the date and time are in the first column). I'm adding "\t" below to show the tabs, and "…" to indicate the rows have many more columns:
Transaction Date\t Account Number\t…
9/16/2004 12:00:00 AM\t ABC00147223\t…
9/17/2004 12:00:00 AM\t ABC00147223\t…
10/05/2004 12:00:00 AM\t ABC00147223\t…
The output should be:
A separate file for each unique year AND month (based on the example above I would get 2 output files: 9/2004 and 10/2004)
Maintain the first/header row of the original file
Filename in the form YYYYMM.txt
Thank you for your help.
If you want to do it in pure bash shell, do as below...
#!/bin/bash
datafile=inputdatafile.dat
ctr=0;
while read line
do
# counter to keep track of line number
ctr=$((ctr + 1))
# skip header line for processing
if [[ $ctr -gt 1 ]];
then
# create filename using date field present in record
vdate=${line%% *} # date portion (before the first space)
vmonth1=${vdate%%/*} # month
vmonth=`printf "%02d" $vmonth1` # month with padding 0
vyear=${vdate##*/} # year
vfilename="${vyear}${vmonth}.txt" # filename in YYYYMM.txt format
# check if file exists or not then put header record in it
if [ ! -f $vfilename ]; then
head -1 $datafile > $vfilename
fi
# put the record in that file
echo "$line" >> $vfilename
fi
done < $datafile
Not sure how big your data files are, but it's never a good idea to parse large files using shell scripting; instead, use other utilities like awk, sed, or grep for it.
For big files, a nawk/gawk one-liner as below will do all you need.
# use nawk or gawk if you don't get the expected results using awk
$ nawk '{if(NR==1)h=$0;} {if(NR>1){ split($1,a,"/"); fn=sprintf("%04d%02d.txt",a[3],a[1]); if(system( "[ ! -f " fn " ] ")==0)print h >> fn; print >> fn;} }' inputdatafile.dat
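Note that the system() test spawns a shell for every data line; a sketch of the same logic that instead remembers already-created files in an awk array:
gawk 'NR==1 { h = $0; next }
{
  split($1, a, "/")
  fn = sprintf("%04d%02d.txt", a[3], a[1])
  if (!(fn in seen)) { print h > fn; seen[fn] = 1 }  # header once per new file
  print > fn
}' inputdatafile.dat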

How to extract one column of a csv file

If I have a csv file, is there a quick bash way to print out the contents of only any single column? It is safe to assume that each row has the same number of columns, but each column's content would have different length.
You could use awk for this. Change '$2' to the nth column you want.
awk -F "\"*,\"*" '{print $2}' textfile.csv
Yes. cat mycsv.csv | cut -d ',' -f3 will print the 3rd column.
The simplest way I was able to get this done was to just use csvtool. I had other use cases as well to use csvtool and it can handle the quotes or delimiters appropriately if they appear within the column data itself.
csvtool format '%(2)\n' input.csv
Replacing 2 with the column number will effectively extract the column data you are looking for.
Landed here looking to extract from a tab-separated file. Thought I would add.
cat textfile.tsv | cut -f2 -s
Here -f2 extracts the second column (fields are 1-indexed), and -s suppresses lines that contain no delimiter.
Here is a csv file example with 2 columns
myTooth.csv
Date,Tooth
2017-01-25,wisdom
2017-02-19,canine
2017-02-24,canine
2017-02-28,wisdom
To get the first column, use:
cut -d, -f1 myTooth.csv
f stands for Field and d stands for delimiter
Running the above command will produce the following output.
Output
Date
2017-01-25
2017-02-19
2017-02-24
2017-02-28
To get the 2nd column only:
cut -d, -f2 myTooth.csv
And here is the output
Output
Tooth
wisdom
canine
canine
wisdom
Another use case:
Your CSV input file contains 10 columns and you want columns 2 through 5 and column 8, using comma as the separator.
cut uses -f (meaning "fields") to specify columns and -d (meaning "delimiter") to specify the separator. You need to specify the latter because some files may use spaces, tabs, or colons to separate columns.
cut -f 2-5,8 -d , myvalues.csv
cut is a command utility and here is some more examples:
SYNOPSIS
cut -b list [-n] [file ...]
cut -c list [file ...]
cut -f list [-d delim] [-s] [file ...]
I think the easiest is using csvkit:
Gets the 2nd column:
csvcut -c 2 file.csv
However, there's also csvtool, and probably a number of other csv bash tools out there:
sudo apt-get install csvtool (for Debian-based systems)
This would return a column with the first row having 'ID' in it.
csvtool namedcol ID csv_file.csv
This would return the fourth column:
csvtool col 4 csv_file.csv
If you want to drop the header row:
csvtool col 4 csv_file.csv | sed '1d'
First we'll create a basic CSV
[dumb#one pts]$ cat > file
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
Then we get the 1st column
[dumb#one pts]$ awk -F , '{print $1}' file
a
1
a
1
Many answers for this question are great, and some have even looked into the corner cases.
I would like to add a simple answer that can be of daily use, for when you don't run into those corner cases (like escaped commas or commas inside quotes, etc.).
FS (Field Separator) is the variable whose value defaults to a space, so awk by default splits every line at spaces.
Using BEGIN (executed before reading input) we can set this variable to anything we want...
awk 'BEGIN {FS = ","}; {print $3}'
The above code will print the 3rd column in a csv file.
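Equivalently, the separator can be set from the command line with the -F option, which avoids the BEGIN block:
awk -F, '{print $3}' file.csv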
The other answers work well, but since you asked for a solution using just the bash shell, you can do this:
AirBoxOmega:~ d$ cat > file #First we'll create a basic CSV
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
And then you can pull out columns (the first in this example) like so:
AirBoxOmega:~ d$ while IFS=, read -a csv_line;do echo "${csv_line[0]}";done < file
a
1
a
1
a
1
a
1
a
1
a
1
So there's a couple of things going on here:
while IFS=, - this is saying to use a comma as the IFS (Internal Field Separator), which is what the shell uses to know what separates fields (blocks of text). So saying IFS=, is like saying "a,b" is the same as "a b" would be if the IFS=" " (which is what it is by default.)
read -a csv_line; - this reads each line, one at a time, into an array called "csv_line" whose elements are the comma-separated fields, and sends that to the "do" section of our while loop
do echo "${csv_line[0]}";done < file - now we're in the "do" phase, and we're saying echo the 0th element of the array "csv_line". This action is repeated on every line of the file. The < file part is just telling the while loop where to read from. NOTE: remember, in bash, arrays are 0 indexed, so the first column is the 0th element.
So there you have it, pulling out a column from a CSV in the shell. The other solutions are probably more practical, but this one is pure bash.
You could use GNU Awk, see this article of the user guide.
As an improvement to the solution presented in the article (in June 2015), the following gawk command allows double quotes inside double-quoted fields; a double quote is represented there by two consecutive double quotes (""). Furthermore, it allows empty fields, but even this cannot handle multiline fields. The following example prints the 3rd column (via c=3) of textfile.csv:
#!/bin/bash
gawk -- '
BEGIN{
FPAT="([^,\"]*)|(\"((\"\")*[^\"]*)*\")"
}
{
if (substr($c, 1, 1) == "\"") {
$c = substr($c, 2, length($c) - 2) # Get the text within the two quotes
gsub("\"\"", "\"", $c) # Normalize double quotes
}
print $c
}
' c=3 < <(dos2unix <textfile.csv)
Note the use of dos2unix to convert possible DOS style line breaks (CRLF i.e. "\r\n") and UTF-16 encoding (with byte order mark) to "\n" and UTF-8 (without byte order mark), respectively. Standard CSV files use CRLF as line break, see Wikipedia.
If the input may contain multiline fields, you can use the following script. Note the use of special string for separating records in output (since the default separator newline could occur within a record). Again, the following example prints the 3rd column (via c=3) of textfile.csv:
#!/bin/bash
gawk -- '
BEGIN{
RS="\0" # Read the whole input file as one record;
# assume there is no null character in input.
FS="" # Suppose this setting eases internal splitting work.
ORS="\n####\n" # Use a special output separator to show borders of a record.
}
{
nof=patsplit($0, a, /([^,"\n]*)|("(("")*[^"]*)*")/, seps)
field=0;
for (i=1; i<=nof; i++){
field++
if (field==c) {
if (substr(a[i], 1, 1) == "\"") {
a[i] = substr(a[i], 2, length(a[i]) - 2) # Get the text within
# the two quotes.
gsub(/""/, "\"", a[i]) # Normalize double quotes.
}
print a[i]
}
if (seps[i]!=",") field=0
}
}
' c=3 < <(dos2unix <textfile.csv)
There is another approach to the problem. csvquote can output contents of a CSV file modified so that special characters within field are transformed so that usual Unix text processing tools can be used to select certain column. For example the following code outputs the third column:
csvquote textfile.csv | cut -d ',' -f 3 | csvquote -u
csvquote can be used to process arbitrary large files.
I needed proper CSV parsing, not cut/awk and prayer. I'm trying this on a Mac without csvtool, but Macs do come with Ruby, so you can do:
echo "require 'csv'; CSV.read('new.csv').each {|data| puts data[34]}" | ruby
I wonder why none of the answers so far have mentioned csvkit.
csvkit is a suite of command-line tools for converting to and working
with CSV
csvkit documentation
I use it exclusively for CSV data management and so far I have not found a problem that I could not solve using csvkit.
To extract one or more columns from a CSV file you can use the csvcut utility that is part of the toolbox. To extract the second column use this command:
csvcut -c 2 filename_in.csv > filename_out.csv
csvcut reference page
If the strings in the csv are quoted, add the quote character with the q option:
csvcut -q '"' -c 2 filename_in.csv > filename_out.csv
Install with pip install csvkit or sudo apt install csvkit.
Simple solution using awk. Replace colNum with the number of the column you need to print, and set the -F separator to match your file (";" here; use "," for a comma-separated file).
cat fileName.csv | awk -F ";" '{ print $colNum }'
csvtool col 2 file.csv
where 2 is the column you are interested in
you can also do
csvtool col 1,2 file.csv
to do multiple columns
You can't do it without a full CSV parser.
If you know your data will not be quoted, then any solution that splits on , will work well (I tend to reach for cut -d, -f1 | sed 1d), as will any of the CSV manipulation tools.
If you want to produce another CSV file, then xsv, csvkit, csvtool, or other CSV manipulation tools are appropriate.
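To see why the quoting caveat matters, here is a quick illustration on a hypothetical two-line file: cut splits inside the quoted field, while a CSV-aware tool such as csvcut returns the whole field.
$ printf '%s\n' 'name,notes' 'alice,"hello, world"' | cut -d, -f2
notes
"hello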
If you want to extract the contents of one single column of a CSV file, unquoting them so that they can be processed by subsequent commands, this Python 1-liner does the trick for CSV files with headers:
python -c 'import csv,sys'$'\n''for row in csv.DictReader(sys.stdin): print(row["message"])'
The "message" inside of the print function selects the column.
If the CSV file doesn't have headers:
python -c 'import csv,sys'$'\n''for row in csv.reader(sys.stdin): print(row[1])'
Python's CSV library supports all kinds of CSV dialects, so if your CSV file uses different conventions, it's possible to support them with relatively little change to the code.
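For instance, a sketch for a hypothetical semicolon-separated file, passing delimiter=";" to csv.reader:
python -c 'import csv,sys'$'\n''for row in csv.reader(sys.stdin, delimiter=";"): print(row[1])'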
Been using this code for a while; it is not "quick" unless you count "cutting and pasting from stackoverflow".
It uses the ${var#...} and ${var%%...} parameter expansion operators in a loop instead of IFS. It calls helpers 'err' and 'die', and supports only comma, dash, and pipe as SEP chars (that's all I needed).
err() { echo "${0##*/}: Error:" "$#" >&2; }
die() { err "$#"; exit 1; }
# Return Nth field in a csv string, fields numbered starting with 1
csv_fldN() { fldN , "$1" "$2"; }
# Return Nth field in string of fields separated
# by SEP, fields numbered starting with 1
fldN() {
local me="fldN: "
local sep="$1"
local fldnum="$2"
local vals="$3"
case "$sep" in
-|,|\|) ;;
*) die "$me: arg1 sep: unsupported separator '$sep'" ;;
esac
case "$fldnum" in
[0-9]*) [ "$fldnum" -gt 0 ] || { err "$me: arg2 fldnum=$fldnum must be a number greater than 0."; return 1; } ;;
*) { err "$me: arg2 fldnum=$fldnum must be a number"; return 1;} ;;
esac
[ -z "$vals" ] && err "$me: missing arg3 vals: list of '$sep' separated values" && return 1
fldnum=$(($fldnum - 1))
while [ $fldnum -gt 0 ] ; do
vals="${vals#*$sep}"
fldnum=$(($fldnum - 1))
done
echo ${vals%%$sep*}
}
Example:
$ CSVLINE="example,fields with whitespace,field3"
$ for fno in $(seq 3); do echo field$fno: $(csv_fldN $fno "$CSVLINE"); done
field1: example
field2: fields with whitespace
field3: field3
You can also use a while loop:
while IFS=, read -r name val; do
  echo "............................"
  echo Name: "$name"
done < itemlst.csv

Shell script to extract data from file between two date ranges

I have a huge file, with each line starting with a timestamp as shown below. I need a way to grep lines between two dates. Is there any easy way to do this using sed or awk instead of extracting the date fields from each line and comparing day/month/year?
For example, I need to extract the data between 2013-06-01 and 2013-06-15 by checking the timestamp in the first field.
File contents:
2013-06-02T19:44:59;(3305,3308,2338,102116);aaaa;xxxx
2013-06-14T20:01:58;(2338);aaaa;xxxx
2013-06-12T20:01:58;(3305,3308,2338);bbbb;xxxx
2013-06-13T20:01:59;(3305,3308,2338,102116);bbbb;xxxx
2013-06-13T20:02:53;(2338);bbbb;xxxx
2013-06-13T20:02:53;(3305,3308,2338);aaaa2;xxxx
2013-06-13T20:02:54;(3305,3308,2338,102116);aaaa2;xxxx
2013-06-14T20:31:58;(2338);aaaa2;xxxx
2013-06-14T20:31:58;(3305,3308,2338);aaaa;xxxx
2013-06-15T20:31:59;(3305,3308,2338,102116);bbbb;xxxx
2013-06-16T20:32:53;(2338);aaaa;xxxx
2013-06-16T20:32:53;(3305,3308,2338);aaaa2;xxxx
2013-06-16T20:32:54;(3305,3308,2338,102116);bbbb;xxxx
It may not have been your first choice but Perl is great for this task.
perl -ne "print if ( m/2013-06-02/ .. m/2013-06-15/ )" myfile.txt
The way this works is that if the first trigger is matched (i.e. m/2013-06-02/) then the condition (print) will be executed on each line until the second trigger is matched (i.e. m/2013-06-15/).
However this trick won't work if you specify m/2013-06-01/ as a trigger because this is never matched in your file.
A less exciting technique is to extract some text from each line and test that:
perl -ne 'if ( m/^([0-9-]+)/ ) { $date = $1; print if ( $date ge "2013-06-01" and $date le "2013-06-15" ) }' myfile.txt
(Tested both expressions and working).
You can try something like:
awk -F'-|T' '$1==2013 && $2==06 && $3>=01 && $3<=15' hugefile
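Since the timestamps are in ISO 8601 format, plain string comparison on the date part also works, and it generalizes across months; a sketch of the same idea:
awk -F'T' '$1>="2013-06-01" && $1<="2013-06-15"' hugefile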
You can use sed to print all lines between two patterns. In this case, you will have to sort the file first because the dates are interleaved:
$ sort file | sed -n '/2013-06-12/,/2013-06-15/p'
2013-06-12T20:01:58;(3305,3308,2338);bbbb;xxxx
2013-06-13T20:01:59;(3305,3308,2338,102116);bbbb;xxxx
2013-06-13T20:02:53;(2338);bbbb;xxxx
2013-06-13T20:02:53;(3305,3308,2338);aaaa2;xxxx
2013-06-13T20:02:54;(3305,3308,2338,102116);aaaa2;xxxx
2013-06-14T20:01:58;(2338);aaaa;xxxx
2013-06-14T20:31:58;(2338);aaaa2;xxxx
2013-06-14T20:31:58;(3305,3308,2338);aaaa;xxxx
2013-06-15T20:31:59;(3305,3308,2338,102116);bbbb;xxxx
