I'm wondering how I can sort this example based on time. I have already sorted it based on everything else, but I just cannot figure out how to sort it using the time (the 07:30 part, for example).
My current code:
sort -t"_" -k3n -k2M -k5n (I still need to implement the time sort as the last key)
What still needs to be sorted is the time:
Dunaj_Dec_2000_day_1_13:00.jpg
Rim_Jan_2001_day_1_13:00.jpg
Ljubljana_Nov_2002_day_2_07:10.jpg
Rim_Jan_2003_day_3_08:40.jpg
Rim_Jan_2003_day_3_08:30.jpg
Any help or just a point in the right direction is greatly appreciated!
Alphabetically; a 24-hour time with a fixed number of digits is fine to sort with a plain alphabetic sort.
sort -t"_" -k3n -k2M -k5n -k6 # default sorting
sort -t"_" -k3n -k2M -k5n -k6V # version-number sort.
There's also the version sort (-V), shown in the second command, which would work fine here.
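For the sample names above, either command should produce (assuming an English locale so that -M recognises the month abbreviations):
Dunaj_Dec_2000_day_1_13:00.jpg
Rim_Jan_2001_day_1_13:00.jpg
Ljubljana_Nov_2002_day_2_07:10.jpg
Rim_Jan_2003_day_3_08:30.jpg
Rim_Jan_2003_day_3_08:40.jpg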
I have to admit to shamelessly stealing from this answer on SO:
How to split log file in bash based on time condition
awk -F'[_:.]' '
BEGIN {
months["Jan"] = 1
months["Feb"] = 2
months["Mar"] = 3
months["Apr"] = 4
months["May"] = 5
months["Jun"] = 6
months["Jul"] = 7
months["Aug"] = 8
months["Sep"] = 9
months["Oct"] = 10
months["Nov"] = 11
months["Dec"] = 12
}
{ print mktime($3" "months[$2]" "$5" "$6" "$7" 00"), $0 }
' input | sort -n | cut -d' ' -f2-
Use _, : and . as the field separator characters to parse each file name.
Initialize an associative array so we can map month names to numerical values (1-12).
Uses the awk function mktime() - it takes a string in the format "YYYY MM DD HH MM SS [ DST ]", as per https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html. Each line of input is printed with a prepended column containing the time in epoch seconds.
The results are piped to sort -n, which sorts numerically on the first column.
Now that the results are sorted, we can remove the first column with cut.
I'm on a Mac, so I had to use gawk to get the mktime() function (it's not available in the stock macOS awk). I've read that mawk is another option.
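If you'd rather not depend on gawk, here is a minimal sketch of the same idea that builds a fixed-width YYYYMMDDHHMM key instead of calling mktime(); it assumes the same filename layout as above:
awk -F'[_:.]' '
BEGIN {
split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
for (i = 1; i <= 12; i++) months[m[i]] = i
}
# prepend a sortable key built from year, month, day, hour, minute
{ printf "%04d%02d%02d%s%s %s\n", $3, months[$2], $5, $6, $7, $0 }
' input | sort -n | cut -d' ' -f2-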
I tried this solution on my list and I can't get what I want after sorting.
I have this list:
m_2_mdot_3_a_1.dat ro= 303112.12
m_1_mdot_2_a_0.dat ro= 300.10
m_2_mdot_1_a_3.dat ro= 221.33
m_3_mdot_1_a_1.dat ro= 22021.87
I used sort -k 2 -n >name.txt
I would like the list sorted from the lowest ro to the highest ro. What did I do wrong?
I got output sorted either by the names in the first column or by the last value, but like: 1000, 100001, 1000.2 ... as if it only looked at the first 4 significant digits or something.
cat test.txt | tr . , | sort -k3 -g | tr , .
The following link gave a good answer Sort scientific and float
In brief,
you need the -g option to sort on decimal numbers;
the -k option counts fields starting from 1, not 0;
and, depending on the default locale, sort may use , as the decimal separator instead of .
However, be careful if your name.txt contains , characters
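If your data really uses . as the decimal point (as in the sample above), another option is to force a locale where sort expects that, instead of translating characters back and forth:
LC_ALL=C sort -k3 -g test.txt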
Since there's a space or a tab between ro= and the numeric value, you need to sort on the 3rd column instead of the 2nd. So your command will become:
cat input.txt | sort -k 3 -n
I know you can use field delimiters to break a record into fields in AWK; however, I have a question regarding a string without any delimiters. I need to process the following data, and I'm not sure how to start:
RyanWehe989987412rwehe#asu.edu2025550126CO2001BlakeStDenver80205
JosephLee605497184josephl#mailinator.com3035550103CO5986BudweiserWayAlamosa81101
AmyJohnson783333251amyj#mailinator.com6515550164MN14N5thStMinneapolis55403
DanielJEverhard314849866everhard#asu.edu5059358554NM8830JohnsonRdAlbuquerque87122
PhilipEPeterson325764011peterson#asu.edu4561238888WA542468thAveLacey98513
MattVNulk124085733nulk#asu.edu2093865442KSManhattanStRiley87512
BrandonTLyons123456123btlyons1#asu.edu5755595459AZ635WElmStMesa85212
RogerATurtle983421567rat#gmail.com8587754321IA3400SWIslanDrdDesmoines50021
MarcJWhiz745629754marcwhiz76#yahoo.com6195323200CA215NCollegeGroveWaySandiego91210
I want to format the raw data into this:
Ryan Wehe, 989-98-7412
2001 Blake St
Denver, CO 80205
wehe#asu.edu
(202) 555-0126
Joseph Lee, 605-49-7184
5986 Budweiser Way
Alamosa, CO 81101
josephl#mailinator.com
(303) 555-0103
AmyJohnson, 783-33-3251
14 N 5th St
Minneapolis, MN 55403
amyj#mailinator.com
(651) 555-0164
To the best of my knowledge, Awk provides no facility for using capture groups to define the field separator.
In consideration of this I think a quick hack might be your best option:
cat addresses.txt | perl -ne '/([A-Z][[:lower:]]*)([A-Z]*[[:lower:]]*)([0-9]{9})(.*?\.\w{2,3})([0-9]{10})(.*?)([0-9]{5})/ && print "$1 $2 $3 $4 $5 $6\n"'
Which returns this:
Ryan Wehe 989987412 rwehe#asu.edu 2025550126 CO2001BlakeStDenver 80205
Joseph Lee 605497184 josephl#mailinator.com 3035550103 CO5986BudweiserWayAlamosa 81101
Amy Johnson 783333251 amyj#mailinator.com 6515550164 MN14N5thStMinneapolis 55403
Daniel JEverhard 314849866 everhard#asu.edu 5059358554 NM8830JohnsonRdAlbuquerque 87122
Philip EPeterson 325764011 peterson#asu.edu 4561238888 WA 54246
Matt VNulk 124085733 nulk#asu.edu 2093865442 KSManhattanStRiley 87512
Brandon TLyons 123456123 btlyons1#asu.edu 5755595459 AZ635WElmStMesa 85212
Roger ATurtle 983421567 rat#gmail.com 8587754321 IA3400SWIslanDrdDesmoines 50021
Marc JWhiz 745629754 marcwhiz76#yahoo.com 6195323200 CA215NCollegeGroveWaySandiego 91210
Your expected output uses both formats, so I was unsure whether you need the names broken apart (i.e. Ryan Wehe instead of RyanWehe); adjusting the regex to do this is fairly straightforward.
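If you also want the SSN and phone number formatted the way your expected output shows, a rough sketch along the same lines (it leaves the state/street/city block alone, since splitting that part reliably is a separate problem):
cat addresses.txt | perl -ne '/([A-Z][[:lower:]]*)([A-Z]*[[:lower:]]*)([0-9]{9})(.*?\.\w{2,3})([0-9]{10})(.*?)([0-9]{5})/ && printf "%s %s, %s-%s-%s\n%s\n(%s) %s-%s\n\n", $1, $2, substr($3,0,3), substr($3,3,2), substr($3,5,4), $4, substr($5,0,3), substr($5,3,3), substr($5,6,4)'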
I am running the following command to process some CSV data
grep -i "area harvested.*2005" ps1_apricot_countries_2005.csv | sed 's/\"//g'
This results in the following output (only the top 7 records are shown as a sample):
Afghanistan,31,Area Harvested,2005,Ha,5200.00000,F
Africa +,31,Area Harvested,2005,Ha,59536.00000,A
Albania,31,Area Harvested,2005,Ha,400.00000,F
Algeria,31,Area Harvested,2005,Ha,22888.00000,
Americas +,31,Area Harvested,2005,Ha,11496.00000,A
Argentina,31,Area Harvested,2005,Ha,2200.00000,F
Armenia,31,Area Harvested,2005,Ha,5300.00000,
Asia +,31,Area Harvested,2005,Ha,272644.00000,A
As can be seen, this is sorted alphabetically on the first column.
I am trying to pipe this into sort so that I can sort the above data in descending order based on the 6th comma-separated numeric column.
I tried:
grep -i "area harvested.*2005" ps1_apricot_countries_2005.csv | sed 's/\"//g' | sort -k6rn
However, this resulted in the following (again, only the top 7 records shown as a sample):
Afghanistan,31,Area Harvested,2005,Ha,5200.00000,F
Africa +,31,Area Harvested,2005,Ha,59536.00000,A
Albania,31,Area Harvested,2005,Ha,400.00000,F
Algeria,31,Area Harvested,2005,Ha,22888.00000,
Americas +,31,Area Harvested,2005,Ha,11496.00000,A
Argentina,31,Area Harvested,2005,Ha,2200.00000,F
Armenia,31,Area Harvested,2005,Ha,5300.00000,
It still appears to be sorted on the first column and not the 6th column in descending order. Could anyone please explain how to correct the approach above to achieve this?
You can use this sort:
sort -t, -rnk6
to sort on the 6th field numerically, in descending order, with fields delimited by ,.
-t, tells sort that fields are delimited by a comma.
-rnk6 sorts in reverse numerical order on field 6.
This will give this output:
Asia +,31,Area Harvested,2005,Ha,272644.00000,A
Africa +,31,Area Harvested,2005,Ha,59536.00000,A
Algeria,31,Area Harvested,2005,Ha,22888.00000,
Americas +,31,Area Harvested,2005,Ha,11496.00000,A
Armenia,31,Area Harvested,2005,Ha,5300.00000,
Afghanistan,31,Area Harvested,2005,Ha,5200.00000,F
Argentina,31,Area Harvested,2005,Ha,2200.00000,F
Albania,31,Area Harvested,2005,Ha,400.00000,F
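Plugged into your original pipeline, that would be, for example:
grep -i "area harvested.*2005" ps1_apricot_countries_2005.csv | sed 's/\"//g' | sort -t, -rnk6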
Currently I have a bunch of names that are tied to numbers, for example:
Joe Bloggs - 17
John Smith - 23
Paul Smith - 24
Joe Bloggs - 32
Using the name and the number, I'd like to generate a random/unique ID made of 4 digits that also ends with the original number.
So for example, Joe Bloggs and 17 would make something random/unique like: xxxx17.
Is this possible in bash? Would it be better in some other language?
This would be used on debian and darwin based systems.
It is impossible to ensure that a 4-digit hash (checksum) would be unique for a set of 10-character-long names.
As an alternative, you can try
file="./somefile"
paste -d"\0\n" <(seq -f "%04g" 9999 | sort -R | head -$(grep -c '' "$file")) <(grep -oP '\d+' "$file")
or, formatted for better readability:
paste -d"\0\n" <(
seq -f "%04g" 9999 | sort -R | head -$(grep -c '' "$file")
) <(
grep -oP '\d+' "$file"
)
for your input produces something like:
010817
161523
748024
269032
All lines are in the form RRRRXX, where:
the RRRR is a guaranteed-unique random number (from the range 0001 up to 9999)
the XX is the number from your input
decomposition:
seq produces 9999 4-digit numbers (of course, each number is unique)
sort -R shuffles the lines into random order (based on their hash, so we get unique random numbers)
head - from the shuffled list, show only the first N lines, where N is the number of lines in your file;
the number of lines is counted by grep -c '' (better than wc -l, which misses a last line without a trailing newline)
grep -oP extracts the numbers from your file
finally, paste combines the two inputs into the final output
the <(..) <(..) parts are process substitutions
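If you have GNU coreutils, shuf can replace the seq | sort -R | head part; a rough equivalent of the command above (padding the numbers back to 4 digits with awk):
paste -d"\0\n" <(shuf -i 1-9999 -n "$(grep -c '' "$file")" | awk '{printf "%04d\n", $1}') <(grep -oP '\d+' "$file")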
Each name, once you append its number, is already unique unless there are two entries like Joe Bloggs 17. In your case there are two Joe Bloggs, one with 17 and one with 32; put those together and you have uniqueness: "Joe Bloggs 17" and "Joe Bloggs 32" are not the same. Using this, you can simply assign a number to each name + number pair and remember that number in an associative array (dictionary). There is no need for randomness: when you find a name that isn't already in the dictionary, just increment the counter and associate the new number with that name. If uniqueness is the only goal, then you are in good shape for up to 10,000 people.
Python is a great language for this, but you can make associative arrays in BASH too.
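A minimal sketch of that idea in bash (associative arrays need bash 4 or newer, which the stock macOS bash predates; input.txt is just a stand-in for your list):
#!/bin/bash
# assign a sequential 4-digit prefix to each unique "Name - number" line
declare -A ids          # maps each input line to its assigned ID
counter=0
while IFS= read -r line; do
    num=${line##* }                         # trailing number, e.g. 17
    if [[ -z ${ids[$line]+set} ]]; then     # first time we see this name+number pair
        counter=$((counter + 1))
        ids[$line]=$(printf '%04d%s' "$counter" "$num")   # e.g. 000117
    fi
    printf '%s => %s\n' "$line" "${ids[$line]}"
done < input.txt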
You can get very close to doing exactly what you want using the random digits generated by $(date +%N) and then selecting 4 of them to use as the first four characters of the new ID. You can choose from the beginning of the string if you want IDs that are closer together, or from the middle for more randomness. After selecting your random 4 digits, just keep track of the ones already used in an array and check against that array as each new ID is assigned. This overhead is negligible for 10,000 or so IDs:
#!/bin/bash
declare -a used4=() # array to hold IDs you have assigned
declare -i dupid=0 # a flag to prompt regeneration in case of a dup
while read -r line || [ -n "$line" ]; do
name=${line% -*}
id2=${line##* }
while [ $dupid -eq 0 ]; do
ns=$(date +%N) # fill variable with nanoseconds
fouri=${ns:4:4} # take 4 integers (mid 4 for better randomness)
# test for duplicate (this is BASH only test - use loop if portability needed)
[[ " ${used4[*]} " =~ " $fouri " ]] && continue
newid="${fouri}${id2}" # concatenate 4 ints + orig 2-digit id
used4+=( "$fouri" ) # add 4ints to used4 array
dupid=1
done
dupid=0 # reset flag
printf "%s => %s\n" "$line" "$newid"
done<"$1"
output:
$ bash fourid.sh dat/nameid.dat
Joe Bloggs - 17 => 762117
John Smith - 23 => 603623
Paul Smith - 24 => 210424
Joe Bloggs - 32 => 504732