I have an input file like this:
Peter,Melbourne,30.5.1982
Simon,Sydney,21.2.1990
Tom,Adelaide,22.9.1980
What I'd like to do is re-order the content of the file by the date column and save it to a file.
Like:
Tom,Adelaide,22.9.1980
Peter,Melbourne,30.5.1982
Simon,Sydney,21.2.1990
The whole thing should be done in PowerShell.
Cheers!
@vonPryz gave a good answer, but you can do it a bit shorter.
# Read input data
$c = Import-Csv -Header @("Name","City","Date") c:\temp\data.txt -Delimiter ","
# Get the good globalization info
$oz = new-object Globalization.CultureInfo("en-AU")
# Sort by the parsed date and keep the result in $d
$d = $c | Sort-Object {[System.DateTime]::Parse($_.date, $oz)}
# Write output data
$d | Export-Csv c:\temp\dataSortedByDate.csv -NoTypeInformation
Use Import-CSV to read the data. Then convert each date to a DateTime object, so sorting will compare dates, not strings. Finally, sort the CSV data by Date column. Like so,
# Read input data
$c = Import-Csv -Header @("Name","City","Date") c:\temp\data.txt -Delimiter ","
# Print the data. This looks just like what we have read from the file
$c
Name City Date
Peter Melbourne 30.5.1982
Simon Sydney 21.2.1990
Tom Adelaide 22.9.1980
Let's sort the data
$c | sort -Property Date
Name City Date
Simon Sydney 21.2.1990
Tom Adelaide 22.9.1980
Peter Melbourne 30.5.1982
Huh? Sorting didn't work. That is because the Date column contains string values. String comparison goes character by character from the left, so the day digits decide the order before the month or year is ever looked at. This is a common caveat.
How to overcome this? One needs to convert the date into date objects that will sort nicely by comparing year and month parts too. First off, create a culture info that is used to tell if you are using mm-dd-yyyy, dd-mm-yyyy, or some other format.
# Eh, mate, Melbourne is down under
$oz = new-object Globalization.CultureInfo("en-AU")
# Loop through each row and convert the date member to date, using Aussie culture.
for($i=0;$i -ne $c.count; $i++) {
$c[$i].Date = [Convert]::ToDateTime($c[$i].Date, $oz)
}
# Now the sort works as expected:
$c | sort -Property Date
Name City Date
Tom Adelaide 22.9.1980 0:00:00
Peter Melbourne 30.5.1982 0:00:00
Simon Sydney 21.2.1990 0:00:00
I'm wondering how to sort this example based on time. I have already sorted it on everything else, but I just cannot figure out how to sort it by time (the 07:30 part, for example).
My current code:
sort -t"_" -k3n -k2M -k5n (still need to implement the time sort for the last sort)
What still needs to be sorted is the time:
Dunaj_Dec_2000_day_1_13:00.jpg
Rim_Jan_2001_day_1_13:00.jpg
Ljubljana_Nov_2002_day_2_07:10.jpg
Rim_Jan_2003_day_3_08:40.jpg
Rim_Jan_2003_day_3_08:30.jpg
Any help or just a point in the right direction is greatly appreciated!
Alphabetically; 24h time with a fixed number of digits is okay to sort using a plain alphabetic sort.
sort -t"_" -k3n -k2M -k5n -k6 # default sorting
sort -t"_" -k3n -k2M -k5n -k6V # version-number sort.
There's also a version sort V which would work fine.
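To try the suggestion against the sample names from the question, here is a minimal sketch. It assumes a sort that supports -M (GNU and BSD sort do) and forces the C locale so the month abbreviations are the English ones. The function name sort_photos is made up for illustration, and each key is limited to its own field (-k3,3n rather than -k3n), which is slightly stricter than the one-liners above.

```shell
#!/bin/sh
# sort_photos is a hypothetical helper name, not part of the original answer.
# Keys: year (field 3, numeric), month name (field 2), day (field 5, numeric),
# then the HH:MM.jpg field compared as plain text.
sort_photos() {
    LC_ALL=C sort -t_ -k3,3n -k2,2M -k5,5n -k6,6
}

printf '%s\n' \
    'Rim_Jan_2003_day_3_08:40.jpg' \
    'Dunaj_Dec_2000_day_1_13:00.jpg' \
    'Rim_Jan_2003_day_3_08:30.jpg' \
    'Ljubljana_Nov_2002_day_2_07:10.jpg' \
    'Rim_Jan_2001_day_1_13:00.jpg' | sort_photos
```

Because the time is always zero-padded HH:MM, the plain text comparison on field 6 puts 08:30 before 08:40 without any extra options.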
I have to admit to shamelessly stealing from this answer on SO:
How to split log file in bash based on time condition
awk -F'[_:.]' '
BEGIN {
months["Jan"] = 1
months["Feb"] = 2
months["Mar"] = 3
months["Apr"] = 4
months["May"] = 5
months["Jun"] = 6
months["Jul"] = 7
months["Aug"] = 8
months["Sep"] = 9
months["Oct"] = 10
months["Nov"] = 11
months["Dec"] = 12
}
{ print mktime($3" "months[$2]" "$5" "$6" "$7" 00"), $0 }
' input | sort -n | cut -d' ' -f2-
Use _, : and . as field separator characters to parse each file name.
Initialize an associative array so we can map month names to numerical values (1-12)
Use the awk function mktime(), which takes a string in the format "YYYY MM DD HH MM SS [DST]" as per https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html. Each line of input is printed with the time in epoch seconds prepended as an extra column.
The results are piped to sort -n which will sort numerically the first column
Now that the results are sorted, we can remove the first column with cut
I have a Mac, so I had to use gawk to get the mktime function (it's not normally available in the macOS awk). I've read that mawk is another option.
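If gawk is not at hand, the same decorate, sort, strip idea can be sketched without mktime() by building a fixed-width YYYYMMDDHHMM key instead of epoch seconds. This is a swapped-in variation rather than the answer's own method; the function name sort_by_stamp is hypothetical, and it assumes the file names contain no spaces.

```shell
#!/bin/sh
# sort_by_stamp is a hypothetical name. It prepends a YYYYMMDDHHMM key built
# with plain awk (no mktime), sorts numerically on that key, then strips it.
sort_by_stamp() {
    awk -F'[_:.]' '
    BEGIN {
        # Map month abbreviations to 1..12
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++) months[names[i]] = i
    }
    # Fields: $2 month name, $3 year, $5 day, $6 hour, $7 minute
    { printf "%04d%02d%02d%02d%02d %s\n", $3, months[$2], $5, $6, $7, $0 }
    ' | sort -n | cut -d' ' -f2-
}

printf '%s\n' \
    'Rim_Jan_2003_day_3_08:40.jpg' \
    'Dunaj_Dec_2000_day_1_13:00.jpg' \
    'Rim_Jan_2003_day_3_08:30.jpg' | sort_by_stamp
```

The key is zero-padded, so a numeric sort and a plain text sort would give the same order here.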
I've created a Bash script to look through files within a directory, pull data based on an identifier then use that data to fill an sqlite3 table. It seems some information will fill into the same row as the key, and some won't. The script is as follows:
#!/bin/bash
sqlite3 review.sql "CREATE TABLE Review(Review_ID INTEGER PRIMARY KEY, Author TEXT, Date TEXT);"
path="/home/me/Downloads/test/*"
for i in $path
do
total=$(grep -c '<Author>' $i)
count=1
while [ $count -le $total ]
do
date=$(grep -m$count '<Date>' $i | sed 's#<Date>##' | tail -n1)
author=$(grep -m$count '<Author>' $i | sed 's#<Author>##' | tail -n1)
sqlite3 review.sql "INSERT INTO Review(Author,Date) VALUES('$author','$date');"
((count++))
done
done
The files I'm looking through look like this:
<Author>john
<Date>Jan 6, 2009
<Author>jacob
<Date>Dec 26, 2008
<Author>rachael
<Date>Dec 14, 2008
When I query for the primary key and the date attributes, I get this, as expected:
sqlite> SELECT Review_ID, Date FROM Review;
Review_ID Date
---------- ------------
1 Jan 6, 2009
2 Dec 26, 2008
3 Dec 14, 2008
4 Jan 7, 2009
5 Jan 5, 2009
6 Nov 14, 2008
But when I query for the primary key and author, I get this:
sqlite> SELECT Review_ID, Author FROM Review;
Review_ID Author
---------- ----------
john
jacob
rachael
Jean
kareem
may
Upon doing some more testing, it definitely seems to have problems with some of the text string. For example I tried adding last names, and get this result:
Review_ID Author
---------- ------------
1 john jacob
2 jacob richa
rae simon
Jean jak
5 kareem jabr
6 may flower
It does better, but still doesn't like a couple of them. I thought it might have something to do with three-letter names, but then "may" wouldn't be showing up. Indeed, if I add a letter to "rae" and a letter to "jak", rows 3 and 4 do show up in the Review_ID column. I noticed the same thing happens if a column contains a "$", as in "$173". The text I really can't figure out, though; there doesn't seem to be an obvious pattern to what it accepts and what it doesn't. I made up the names to simplify this post, but here are a few more strings from what I'm actually working with, some that work and some that don't.
1 everywhereman2
RW53
Marilyn1949
fallriverma
8 SweetwaterMill
AuntSusie006
13 Traveler34NewJe
madmatriarch
2 Savvytourist2
greatvictory
25 Lightsleeper999
strollaround
30 Lucygoosey1985
lesbriggs
3 miguelluna019
lulubaby
1 myassesdragon
tomu023
BrettOcean
46 A TripAdvisor M
dmills1956
julcarl
49 A TripAdvisor M
TSW42
lass=
After a dry run, remove the echo to take the safety off.
awk -v RS='' -v FS='\n?<[^>]+>' '{print $2 ":" $3}' \
/home/me/Downloads/test/* |while IFS=: read author date; do
echo sqlite3 review.sql "INSERT INTO Review(Author,Date) VALUES('$author','$date');"
done
This is a brittle awk-to-bash solution that uses the regular expression \n?<[^>]+> as the field separator and a blank line ('') as the record separator. The FS expression means "an optional newline followed by a string enclosed in angle brackets." We then output the fields with a simple separator (:) and read them into bash.
You can make system calls in awk with system(), but it gets very messy very quickly. Better to export the clean data in this case.
The script with awk and printf is:
#!/bin/bash
sqlite3 review.sql "CREATE TABLE Review(Review_ID INTEGER PRIMARY KEY, Author TEXT, Date TEXT);"
path="/home/drew/Downloads/testcases/*"
for i in $path
do
total=$(grep -c '<Author>' $i)
count=1
while [ $count -le $total ]
do
date=$(grep -m$count '<Date>' $i | sed 's#<Date>##' | tail -n1)
author=$(grep -m$count '<Author>' $i | sed 's#<Author>##' | tail -n1 | awk '{printf "- %s -", $1}')
echo $author
((count++))
done
done
I'm not sure, but I feel like I shouldn't have to add echo to it with printf, but without it nothing prints. With the echo I get the following output:
-Jeanjakey
- kareem -
- may -
- RW53 -
-Marilyn1949
-AuntSusie006
-madmatriarch
-strollaround
-lulubaby
-tomu023
-julcarl
-slass
It seems to somewhat work, but the spacing disappears, any last names disappear, and it seems to do different things to different inputs. With the string concatenation I use the script:
#!/bin/bash
sqlite3 review.sql "CREATE TABLE Review(Review_ID INTEGER PRIMARY KEY, Author TEXT, Date TEXT);"
path="/home/drew/Downloads/testcases/*"
for i in $path
do
total=$(grep -c '<Author>' $i)
count=1
while [ $count -le $total ]
do
date=$(grep -m$count '<Date>' $i | sed 's#<Date>##' | tail -n1)
author2="- "
author2+=$(grep -m$count '<Author>' $i | sed 's#<Author>##' | tail -n1)
author2+=" -"
echo $author2
((count++))
done
done
with this I get the output:
-Jeanjakey
-kareem jabron
-may flow she
-RW53
-Marilyn1949
-AuntSusie006
-madmatriarch
-strollaround
-lulubaby
-tomu023
-julcarl
-slass
and a string reassignment:
author2=$(grep -m$count '<Author>' $i | sed 's#<Author>##' | tail -n1)
author2="- $author2 - "
gives the same output.
Good day. I've been trying to sort the following data from a txt file using a shell script, but so far I've been unable to do so.
Here is what the data in the file looks like:
Name:ID:Date
Clinton Mcdaniel:100:16/04/2016
Patience Mccarty:101:18/03/2013
Carol Holman:102:24/10/2013
Roth Lamb:103:11/02/2015
Chase Gardner:104:14/06/2014
Jacob Tucker:105:05/11/2013
Maite Barr:106:24/04/2014
Acton Galloway:107:18/01/2013
Helen Orr:108:10/05/2014
Avye Rose:109:07/06/2014
What I want to do is sort this by Date instead of name or ID.
When I execute the following code I get this:
Code:
sort -t "/" -k3.9 -k3.4 -k3
Result:
Acton Galloway:107:18/01/2013
Amaya Lynn:149:11/08/2013
Anne Sullivan:190:12/01/2013
Bruno Hood:169:01/08/2013
Cameron Phelps:187:17/11/2013
Carol Holman:102:24/10/2013
Chaney Mcgee:183:11/09/2013
Drew Fowler:173:28/07/2013
Hadassah Green:176:17/01/2013
Jacob Tucker:105:05/11/2013
Jenette Morgan:160:28/11/2013
Lael Aguirre:148:29/05/2013
Lareina Morin:168:06/05/2013
Laura Mercado:171:06/06/2013
Leonard Richard:154:02/06/2013
As you can see, it only sorts by the year; the months and everything else are still out of place. Does anyone know how to correctly sort this by date?
EDIT:
Well, I've found how to do it, answer below:
Code: sort -n -t":" -k3.9 -k3.4,3.5 -k3
Result:
Anne Sullivan:190:12/01/2013
Hadassah Green:176:17/01/2013
Acton Galloway:107:18/01/2013
Nasim Gonzalez:163:18/01/2013
Patience Mccarty:101:18/03/2013
Sacha Stevens:164:01/04/2013
Lareina Morin:168:06/05/2013
Lael Aguirre:148:29/05/2013
Leonard Richard:154:02/06/2013
Laura Mercado:171:06/06/2013
Drew Fowler:173:28/07/2013
Bruno Hood:169:01/08/2013
Virginia Puckett:144:08/08/2013
Moses Mckay:177:09/08/2013
Amaya Lynn:149:11/08/2013
Chaney Mcgee:183:11/09/2013
Willa Bond:153:22/09/2013
Oren Flores:184:27/09/2013
Olga Buckley:181:11/10/2013
Carol Holman:102:24/10/2013
Jacob Tucker:105:05/11/2013
Veda Gillespie:125:09/11/2013
Thor Workman:152:12/11/2013
Cameron Phelps:187:17/11/2013
Jenette Morgan:160:28/11/2013
Mason Contreras:129:29/12/2013
Martena Sosa:158:30/12/2013
Vivian Stevens:146:20/01/2014
Benedict Massey:175:02/03/2014
Macey Holden:127:01/04/2014
Orla Estrada:174:06/04/2014
Maite Barr:106:24/04/2014
Helen Orr:108:10/05/2014
Randall Colon:199:27/05/2014
Avye Rose:109:07/06/2014
Cleo Decker:117:12/06/2014
Chase Gardner:104:14/06/2014
Mark Lynn:113:21/06/2014
Geraldine Solis:197:24/06/2014
Thor Wheeler:180:25/06/2014
Aimee Martin:192:21/07/2014
Gareth Cervantes:166:26/08/2014
Serena Fernandez:122:24/09/2014
The sort you are using will fail for any date before year 2000 (e.g. 1999 will sort after 2098). Continuing from your question in the comment, you currently show
sort -n -t":" -k3.9 -k3.4,3.5 -k3
You should use
sort -n -t":" -k3.7 -k3.4,3.5 -k3.1,3.2
Explanation:
Your -t separates the fields on each colon (':'). Each -k takes a KEYDEF of the form f[.c][opts] (field, optional character position, optional ordering options); you need no separate option after the character position here, because the global -n applies to every key. Your date field (field 3) is laid out as:
d d / m m / y y y y
1 2 3 4 5 6 7 8 9 0 -- chars counting from 1 in field 3
So you first sort by -k3.9 (the 9th character in field 3), which picks up only the last two digits of the 4-digit year. You really want to sort on -k3.7 (which is the start of the 4-digit year).
You next sort by the month (characters 4,5) which is fine.
Lastly, you sort on -k3 (which fails to limit the characters considered). Just as you have limited the sort on the month to chars 4,5, you should limit the sort of the days to characters 1,2.
Putting that together gives you sort -n -t":" -k3.7 -k3.4,3.5 -k3.1,3.2. Hope that answers your question from the comment.
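As a quick check of the key layout described above, here is a small sketch that includes dates on both sides of the year 2000. The function name sort_by_date and the sample rows are made up for illustration; the keys are the same as in the answer, with the year key also given an explicit end (3.10) and -n attached to each key.

```shell
#!/bin/sh
# sort_by_date is a hypothetical helper; input lines look like Name:ID:DD/MM/YYYY.
# Keys within field 3: chars 7-10 (year), 4-5 (month), 1-2 (day), all numeric.
sort_by_date() {
    sort -t: -k3.7,3.10n -k3.4,3.5n -k3.1,3.2n
}

# Made-up rows, including a pre-2000 and a late-21st-century date
printf '%s\n' \
    'Late Entry:1:01/01/2098' \
    'Old Entry:2:31/12/1999' \
    'Patience Mccarty:101:18/03/2013' \
    'Acton Galloway:107:18/01/2013' | sort_by_date
```

With the full 4-digit year as the first key, the 1999 row sorts first and the 2098 row last.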
You're hamstrung by your (terrible, IMO) date format. Here's a bit of a Schwartzian transform:
awk -F'[:/]' '{printf "%s%s%s %s\n", $NF, $(NF-1), $(NF-2), $0}' file | sort -n | cut -d' ' -f2-
That extracts the year, month, day and adds it as a separate word to the start of each line. Then you can sort quite simply. Then discard that date.
I am completely new to shell scripting.
I need to change the format of a given date to a customized format: I have a date in a variable in the format MM/DD/YY HH:MM:SS, but I want it in the format MM/DD/YYYY HH:MM:SS, with a four-digit year.
We can change the sys date format but i need the same in a variable.
My code as below
START_DATE="12/20/14 05:59:01"
yr=`echo $START_DATE | cut -d ' ' -f1 | cut -d '/' -f3`
yr_len=`echo $yr | wc -c`
if [ $yr_len -lt 4 ]
then
tmp_yr="20${yr}";
else
tmp_yr="${yr}";
fi
ln=`echo $tmp_yr|wc -c`
After this I got stuck on rebuilding the same date in the wanted format.
Can some one please help me
Regards,
Sai.
Using GNU date:
date -d'02/16/15 09:16:04' "+%m/%d/%Y %T"
produces
02/16/2015 09:16:04
which is what you want. See man date for details about the formatting, or this question for a number of great examples.
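Since the question asks for the result in a variable, the same call can be wrapped in a command substitution. This is a minimal sketch that assumes GNU date (the -d option; BSD and macOS date use different flags) and reuses the START_DATE value from the question; NEW_DATE is a made-up variable name.

```shell
#!/bin/sh
# Assumes GNU date; -d parses the MM/DD/YY string, the format prints a 4-digit year.
START_DATE="12/20/14 05:59:01"
NEW_DATE=$(date -d "$START_DATE" '+%m/%d/%Y %T')
echo "$NEW_DATE"
```

Quoting "$START_DATE" matters here, because the value contains a space.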
One option may be using the date/time functions inside awk. Here is a oneliner:
echo '02/16/15 09:16:04' | sed 's#[/:]# #g' | awk '{d0 = ($3 + 2000) FS $1 FS $2 FS $4 FS $5 FS $6; d1 = mktime(d0); print strftime("%m/%d/%Y %T", d1) }'
output is:
02/16/2015 09:16:04
You can find more strftime formats in https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html
I have this awk script that runs through a file and counts every occurrence of a given date. The date format in the original file is the standard date format, like this: Thu Mar 5 16:46:15 EST 2009 I use awk to throw away the weekday, time, and timezone, and then do my counting by pumping the dates into an associative array with the dates as indices.
In order to get the output to be sorted by date, I converted the dates to a different format that I could sort with bash sort.
Now, my output looks like this:
Date Count
03/05/2009 2
03/06/2009 1
05/13/2009 7
05/22/2009 14
05/23/2009 7
05/25/2009 7
05/29/2009 11
06/02/2009 12
06/03/2009 16
I'd really like the output to have more human readable dates, like this:
Mar 5, 2009
Mar 6, 2009
May 13, 2009
May 22, 2009
May 23, 2009
May 25, 2009
May 29, 2009
Jun 2, 2009
Jun 3, 2009
Any suggestions for a way I could do this? If I could do this on the fly when I output the count values that would be best.
UPDATE:
Here's my solution incorporating ghostdog74's example code:
grep -i "E[DS]T 2009" original.txt | awk '{printf "%s %2.d, %s\r\n",$2,$3,$6}' >dates.txt #outputs dates for counting
date -f dates.txt +'%Y %m %d' | awk ' #reformat dates as YYYYMMDD for future sort
{++total[$0]} #pump dates into associative array
END {
for (item in total) printf "%s\t%s\r\n", item, total[item] #output dates as yyyy mm dd with counts
}' | sort -t \t | awk ' #send to sort, then to cleanup
BEGIN {printf "%s\t%s\r\n","Date","Count"}
{t=$1" "$2" "$3" 0 0 0" #cleanup using example by ghostdog74
printf "%s\t%2.d\r\n",strftime("%b %d, %Y",mktime(t)),$4
}'
rm dates.txt
Sorry this looks so messy. I've tried to put clarifying comments in.
Use awk's sort and date's stdin to greatly simplify the script
Date will accept input from stdin so you can eliminate one pipe to awk and the temporary file. You can also eliminate a pipe to sort by using awk's array sort and as a result, eliminate another pipe to awk. Also, there's no need for a coprocess.
This script uses date for the monthname conversion which would presumably continue to work in other languages (ignoring the timezone and month/day order issues, though).
The end result looks like "grep|date|awk". I have broken it into separate lines for readability (it would be about half as big if the comments were eliminated):
grep -i "E[DS]T 2009" original.txt |
date -f - +'%Y %m %d' | #reformat dates as YYYYMMDD for future sort
awk '
BEGIN { printf "%s\t%s\r\n","Date","Count" }
{ ++total[$0] }  # pump dates into associative array
END {
idx=1
for (item in total) {
d[idx]=item;idx++ # copy the array indices into the contents of a new array
}
c=asort(d) # sort the contents of the copy
for (i=1;i<=c;i++) { # use the contents of the copy to index into the original
printf "%s\t%2.d\r\n",strftime("%b %e, %Y",mktime(d[i]" 0 0 0")),total[d[i]]
}
}'
I get testy when I see someone using grep and awk (and sed, cut, ...) in a pipeline. Awk can fully handle the work of many utilities.
Here's a way to clean up your updated code to run in a single instance of awk (well, gawk), and using sort as a co-process:
gawk '
BEGIN {
IGNORECASE = 1
}
function mon2num(mon) {
return(((index("JanFebMarAprMayJunJulAugSepOctNovDec", mon)-1)/3)+1)
}
/ E[DS]T [[:digit:]][[:digit:]][[:digit:]][[:digit:]]/ {
month=$2
day=$3
year=$6
date=sprintf("%4d%02d%02d", year, mon2num(month), day)
total[date]++
human[date] = sprintf("%3s %2d, %4d", month, day, year)
}
END {
sort_coprocess = "sort"
for (date in total) {
print date |& sort_coprocess
}
close(sort_coprocess, "to")
print "Date\tCount"
while ((sort_coprocess |& getline date) > 0) {
print human[date] "\t" total[date]
}
close(sort_coprocess)
}
' original.txt
if you are using gawk
awk 'BEGIN{
s="03/05/2009"
m=split(s,date,"/")
t=date[3]" "date[1]" "date[2]" 0 0 0"  # s is MM/DD/YYYY; mktime wants "YYYY MM DD HH MM SS"
print strftime("%b %d",mktime(t))
}'
The above is just an example; you did not show your actual code, so I cannot incorporate it into your code.
Why don't you prepend your awk-date to the original date? This yields a sortable key, but is human readable.
(Note: to sort right, you should make it yyyymmdd)
If needed, cut can remove the prepended column.
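The prepend-a-key idea can be sketched like this, assuming the counting step already produced "MM/DD/YYYY count" lines as in the question's output. The function name humanize_counts is hypothetical, and the month names are hard-coded in English rather than taken from strftime(), so this runs in any POSIX awk.

```shell
#!/bin/sh
# humanize_counts is a hypothetical helper. It reads "MM/DD/YYYY count" lines,
# prepends a yyyymmdd key, sorts on it, then cuts the key off again.
humanize_counts() {
    awk '
    BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ") }
    {
        split($1, d, "/")            # d[1]=MM, d[2]=DD, d[3]=YYYY
        key = d[3] d[1] d[2]         # sortable yyyymmdd key
        printf "%s\t%s %d, %s\t%s\n", key, name[d[1] + 0], d[2], d[3], $2
    }' | sort -n | cut -f2-
}

printf '%s\n' '05/13/2009 7' '03/06/2009 1' '03/05/2009 2' | humanize_counts
```

The output keeps a tab between the human-readable date and the count, so later steps can still split on it.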
Gawk has strftime(). You can also call the date command to format them (man). Linux Forums gives some examples.