I'm wondering how I can sort this example based on time. I have already sorted it based on everything else, but I just cannot figure out how to sort it using the time (the 07:30 part, for example).
My current code:
sort -t"_" -k3n -k2M -k5n (I still need to implement the time sort as the last key)
What still needs to be sorted is the time:
Dunaj_Dec_2000_day_1_13:00.jpg
Rim_Jan_2001_day_1_13:00.jpg
Ljubljana_Nov_2002_day_2_07:10.jpg
Rim_Jan_2003_day_3_08:40.jpg
Rim_Jan_2003_day_3_08:30.jpg
Any help or just a point in the right direction is greatly appreciated!
Alphabetically; a 24-hour time with a fixed number of digits is fine to sort with a plain alphabetic sort.
sort -t"_" -k3n -k2M -k5n -k6 # default sorting
sort -t"_" -k3n -k2M -k5n -k6V # version-number sort.
There's also the version sort (-V), shown in the second command, which would work fine here.
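For the sample names above, either command should produce (assuming an English locale so that -M recognises the month abbreviations):
Dunaj_Dec_2000_day_1_13:00.jpg
Rim_Jan_2001_day_1_13:00.jpg
Ljubljana_Nov_2002_day_2_07:10.jpg
Rim_Jan_2003_day_3_08:30.jpg
Rim_Jan_2003_day_3_08:40.jpg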
I have to admit to shamelessly stealing from this answer on SO:
How to split log file in bash based on time condition
awk -F'[_:.]' '
BEGIN {
months["Jan"] = 1
months["Feb"] = 2
months["Mar"] = 3
months["Apr"] = 4
months["May"] = 5
months["Jun"] = 6
months["Jul"] = 7
months["Aug"] = 8
months["Sep"] = 9
months["Oct"] = 10
months["Nov"] = 11
months["Dec"] = 12
}
{ print mktime($3" "months[$2]" "$5" "$6" "$7" 00"), $0 }
' input | sort -n | cut -d' ' -f2-
Use _, : and . as the field separator characters to parse each file name.
Initialize an associative array so we can map month names to numerical values (1-12).
Uses the awk function mktime() - it takes a string in the format "YYYY MM DD HH MM SS [ DST ]", as per https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html. Each line of input is printed with a prepended column containing the time in epoch seconds.
The results are piped to sort -n, which sorts numerically on the first column.
Now that the results are sorted, we can remove the first column with cut.
I'm on a Mac, so I had to use gawk to get the mktime() function (it's not available in the stock macOS awk). I've read that mawk is another option.
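If you'd rather not depend on gawk, here is a minimal sketch of the same idea that builds a fixed-width YYYYMMDDHHMM key instead of calling mktime(); it assumes the same filename layout as above:
awk -F'[_:.]' '
BEGIN {
split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
for (i = 1; i <= 12; i++) months[m[i]] = i
}
# prepend a sortable key built from year, month, day, hour, minute
{ printf "%04d%02d%02d%s%s %s\n", $3, months[$2], $5, $6, $7, $0 }
' input | sort -n | cut -d' ' -f2-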
I tried this solution on my list and I can't get what I want after sorting.
I have this list:
m_2_mdot_3_a_1.dat ro= 303112.12
m_1_mdot_2_a_0.dat ro= 300.10
m_2_mdot_1_a_3.dat ro= 221.33
m_3_mdot_1_a_1.dat ro= 22021.87
I used sort -k 2 -n >name.txt
I would like the list sorted from the lowest ro to the highest ro. What did I do wrong?
I got output sorted either by the names in the first column or by the last value, but like: 1000, 100001, 1000.2 ... as if it only looked at the first 4 significant digits or something.
cat test.txt | tr . , | sort -k3 -g | tr , .
The following link gave a good answer Sort scientific and float
In brief,
you need the -g option to sort on decimal numbers;
the -k option counts fields starting from 1, not 0;
and, depending on the default locale, sort may use , as the decimal separator instead of .
However, be careful if your name.txt contains , characters
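If your data really uses . as the decimal point (as in the sample above), another option is to force a locale where sort expects that, instead of translating characters back and forth:
LC_ALL=C sort -k3 -g test.txt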
Since there's a space or a tab between ro= and the numeric value, you need to sort on the 3rd column instead of the 2nd. So your command will become:
cat input.txt | sort -k 3 -n
I know you can use field delimiters to break a record into fields in AWK; however, I have a question regarding a string without any delimiters. I need to process the following data, and I'm not sure how to start:
RyanWehe989987412rwehe#asu.edu2025550126CO2001BlakeStDenver80205
JosephLee605497184josephl#mailinator.com3035550103CO5986BudweiserWayAlamosa81101
AmyJohnson783333251amyj#mailinator.com6515550164MN14N5thStMinneapolis55403
DanielJEverhard314849866everhard#asu.edu5059358554NM8830JohnsonRdAlbuquerque87122
PhilipEPeterson325764011peterson#asu.edu4561238888WA542468thAveLacey98513
MattVNulk124085733nulk#asu.edu2093865442KSManhattanStRiley87512
BrandonTLyons123456123btlyons1#asu.edu5755595459AZ635WElmStMesa85212
RogerATurtle983421567rat#gmail.com8587754321IA3400SWIslanDrdDesmoines50021
MarcJWhiz745629754marcwhiz76#yahoo.com6195323200CA215NCollegeGroveWaySandiego91210
I want to format the raw data into this:
Ryan Wehe, 989-98-7412
2001 Blake St
Denver, CO 80205
wehe#asu.edu
(202) 555-0126
Joseph Lee, 605-49-7184
5986 Budweiser Way
Alamosa, CO 81101
josephl#mailinator.com
(303) 555-0103
AmyJohnson, 783-33-3251
14 N 5th St
Minneapolis, MN 55403
amyj#mailinator.com
(651) 555-0164
To the best of my knowledge, Awk provides no facility for using capture groups to define the field separator.
In consideration of this I think a quick hack might be your best option:
cat addresses.txt | perl -ne '/([A-Z][[:lower:]]*)([A-Z]*[[:lower:]]*)([0-9]{9})(.*?\.\w{2,3})([0-9]{10})(.*?)([0-9]{5})/ && print "$1 $2 $3 $4 $5 $6\n"'
Which returns this:
Ryan Wehe 989987412 rwehe#asu.edu 2025550126 CO2001BlakeStDenver 80205
Joseph Lee 605497184 josephl#mailinator.com 3035550103 CO5986BudweiserWayAlamosa 81101
Amy Johnson 783333251 amyj#mailinator.com 6515550164 MN14N5thStMinneapolis 55403
Daniel JEverhard 314849866 everhard#asu.edu 5059358554 NM8830JohnsonRdAlbuquerque 87122
Philip EPeterson 325764011 peterson#asu.edu 4561238888 WA 54246
Matt VNulk 124085733 nulk#asu.edu 2093865442 KSManhattanStRiley 87512
Brandon TLyons 123456123 btlyons1#asu.edu 5755595459 AZ635WElmStMesa 85212
Roger ATurtle 983421567 rat#gmail.com 8587754321 IA3400SWIslanDrdDesmoines 50021
Marc JWhiz 745629754 marcwhiz76#yahoo.com 6195323200 CA215NCollegeGroveWaySandiego 91210
Your expected output uses both formats, so I was unsure whether you need the names broken apart (i.e. Ryan Wehe instead of RyanWehe); adjusting the regex to do this is fairly straightforward.
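If you also want the SSN and phone number formatted the way your expected output shows, a rough sketch along the same lines (it leaves the state/street/city block alone, since splitting that part reliably is a separate problem):
cat addresses.txt | perl -ne '/([A-Z][[:lower:]]*)([A-Z]*[[:lower:]]*)([0-9]{9})(.*?\.\w{2,3})([0-9]{10})(.*?)([0-9]{5})/ && printf "%s %s, %s-%s-%s\n%s\n(%s) %s-%s\n\n", $1, $2, substr($3,0,3), substr($3,3,2), substr($3,5,4), $4, substr($5,0,3), substr($5,3,3), substr($5,6,4)'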
I am running the following command to process some CSV data
grep -i "area harvested.*2005" ps1_apricot_countries_2005.csv | sed 's/\"//g'
This results in the following output (only the top 7 records are shown as a sample):
Afghanistan,31,Area Harvested,2005,Ha,5200.00000,F
Africa +,31,Area Harvested,2005,Ha,59536.00000,A
Albania,31,Area Harvested,2005,Ha,400.00000,F
Algeria,31,Area Harvested,2005,Ha,22888.00000,
Americas +,31,Area Harvested,2005,Ha,11496.00000,A
Argentina,31,Area Harvested,2005,Ha,2200.00000,F
Armenia,31,Area Harvested,2005,Ha,5300.00000,
Asia +,31,Area Harvested,2005,Ha,272644.00000,A
As can be seen, this is sorted alphabetically on the first column.
I am trying to pipe this into sort so that I can sort the above data in descending order based on the 6th comma-separated numeric column.
I tried:
grep -i "area harvested.*2005" ps1_apricot_countries_2005.csv | sed 's/\"//g' | sort -k6rn
However, this resulted in the following (again, only the top 7 records shown as a sample):
Afghanistan,31,Area Harvested,2005,Ha,5200.00000,F
Africa +,31,Area Harvested,2005,Ha,59536.00000,A
Albania,31,Area Harvested,2005,Ha,400.00000,F
Algeria,31,Area Harvested,2005,Ha,22888.00000,
Americas +,31,Area Harvested,2005,Ha,11496.00000,A
Argentina,31,Area Harvested,2005,Ha,2200.00000,F
Armenia,31,Area Harvested,2005,Ha,5300.00000,
It still appears to be sorted on the first column and not the 6th column in descending order. Could anyone please explain how to correct the approach above to achieve this?
You can use this sort:
sort -t, -rnk6
to sort on the 6th field numerically, in descending order, with fields delimited by ,.
-t, tells sort that fields are delimited by a comma.
-rnk6 sorts in reverse numerical order on field 6.
This will give this output:
Asia +,31,Area Harvested,2005,Ha,272644.00000,A
Africa +,31,Area Harvested,2005,Ha,59536.00000,A
Algeria,31,Area Harvested,2005,Ha,22888.00000,
Americas +,31,Area Harvested,2005,Ha,11496.00000,A
Armenia,31,Area Harvested,2005,Ha,5300.00000,
Afghanistan,31,Area Harvested,2005,Ha,5200.00000,F
Argentina,31,Area Harvested,2005,Ha,2200.00000,F
Albania,31,Area Harvested,2005,Ha,400.00000,F
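Plugged into your original pipeline, that would be, for example:
grep -i "area harvested.*2005" ps1_apricot_countries_2005.csv | sed 's/\"//g' | sort -t, -rnk6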
Currently I have a bunch of names that are tied to numbers, for example:
Joe Bloggs - 17
John Smith - 23
Paul Smith - 24
Joe Bloggs - 32
Using the name and the number, I'd like to generate a random/unique ID made of 4 digits that also ends with the original number.
So for example, Joe Bloggs and 17 would make something random/unique like: xxxx17.
Is this possible in bash? Would it be better in some other language?
This would be used on debian and darwin based systems.
It is impossible to ensure that a 4-digit hash (checksum) would be unique for a set of 10-character-long names.
As an alternative, you can try
file="./somefile"
paste -d"\0\n" <(seq -f "%04g" 9999 | sort -R | head -$(grep -c '' "$file")) <(grep -oP '\d+' "$file")
or, formatted for better readability:
paste -d"\0\n" <(
seq -f "%04g" 9999 | sort -R | head -$(grep -c '' "$file")
) <(
grep -oP '\d+' "$file"
)
for your input produces something like:
010817
161523
748024
269032
All lines are in the form RRRRXX, where:
the RRRR is a guaranteed-unique random number (from the range 0001 up to 9999)
the XX is the number from your input
decomposition:
seq produces 9999 4-digit numbers (of course, each number is unique)
sort -R shuffles the lines into random order (based on their hash, so we get unique random numbers)
head - from the shuffled list, show only the first N lines, where N is the number of lines in your file;
the number of lines is counted by grep -c '' (better than wc -l, which misses a last line without a trailing newline)
grep -oP extracts the numbers from your file
finally, paste combines the two inputs into the final output
the <(..) <(..) parts are process substitutions
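If you have GNU coreutils, shuf can replace the seq | sort -R | head part; a rough equivalent of the command above (padding the numbers back to 4 digits with awk):
paste -d"\0\n" <(shuf -i 1-9999 -n "$(grep -c '' "$file")" | awk '{printf "%04d\n", $1}') <(grep -oP '\d+' "$file")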
Each name, once you append its number, is already unique unless there are two entries like Joe Bloggs 17. In your case there are two Joe Bloggs, one with 17 and one with 32; put those together and you have uniqueness: "Joe Bloggs 17" and "Joe Bloggs 32" are not the same. Using this, you can simply assign a number to each name + number pair and remember that number in an associative array (dictionary). There is no need for randomness: when you find a name that isn't already in the dictionary, just increment the counter and associate the new number with that name. If uniqueness is the only goal, then you are in good shape for up to 10,000 people.
Python is a great language for this, but you can make associative arrays in BASH too.
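A minimal sketch of that idea in bash (associative arrays need bash 4 or newer, which the stock macOS bash predates; input.txt is just a stand-in for your list):
#!/bin/bash
# assign a sequential 4-digit prefix to each unique "Name - number" line
declare -A ids          # maps each input line to its assigned ID
counter=0
while IFS= read -r line; do
    num=${line##* }                         # trailing number, e.g. 17
    if [[ -z ${ids[$line]+set} ]]; then     # first time we see this name+number pair
        counter=$((counter + 1))
        ids[$line]=$(printf '%04d%s' "$counter" "$num")   # e.g. 000117
    fi
    printf '%s => %s\n' "$line" "${ids[$line]}"
done < input.txt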
You can get very close to doing exactly what you want using the random digits generated by $(date +%N) and then selecting 4 of them to use as the first four characters of the new ID. You can choose from the beginning of the string if you want IDs that are closer together, or from the middle for more randomness. After selecting your random 4 digits, just keep track of the ones already used in an array and check against that array as each new ID is assigned. This overhead is negligible for 10,000 or so IDs:
#!/bin/bash
declare -a used4=() # array to hold IDs you have assigned
declare -i dupid=0 # a flag to prompt regeneration in case of a dup
while read -r line || [ -n "$line" ]; do
name=${line% -*}
id2=${line##* }
while [ $dupid -eq 0 ]; do
ns=$(date +%N) # fill variable with nanoseconds
fouri=${ns:4:4} # take 4 integers (mid 4 for better randomness)
# test for duplicate (this is BASH only test - use loop if portability needed)
[[ " ${used4[*]} " =~ " $fouri " ]] && continue
newid="${fouri}${id2}" # concatenate 4 ints + orig 2-digit id
used4+=( "$fouri" ) # add 4ints to used4 array
dupid=1
done
dupid=0 # reset flag
printf "%s => %s\n" "$line" "$newid"
done<"$1"
output:
$ bash fourid.sh dat/nameid.dat
Joe Bloggs - 17 => 762117
John Smith - 23 => 603623
Paul Smith - 24 => 210424
Joe Bloggs - 32 => 504732