I am new to bash scripting and have written a script that matches a regex and writes the matching lines to a file.
However, each line contains multiple columns, one of which is the timestamp column, which appears in the form YYYYMMDDHHMMSSTTT (to millisecond) as shown below.
20180301050630663,ABC,,,,,,,,,,
20180301050630664,ABC,,,,,,,,,,
20180301050630665,ABC,,,,,,,,,,
20180301050630666,ABC,,,,,,,,,,
20180301050630667,ABC,,,,,,,,,,
20180301050630668,ABC,,,,,,,,,,
20180301050630663,ABC,,,,,,,,,,
20180301050630665,ABC,,,,,,,,,,
20180301050630661,ABC,,,,,,,,,,
20180301050630662,ABC,,,,,,,,,,
My code is written as follows:
awk -F "," -v OFS="," '{if($2=="ABC"){print}}' < $i >> "$filename"
How can I modify my code such that it can sort the rows by timestamp (YYYYMMDDHHMMSSTTT) in ascending order before printing to file?
You can use a very simple sort command, e.g.
sort yourfile
If you want to ensure sort only looks at the timestamp, you can tell sort to use only the first comma-separated field as the sort key, e.g.
sort -t, -k1,1 yourfile
Example Use/Output
With your data saved in a file named log, you could do:
$ sort -t, -k1,1 log
20180301050630661,ABC,,,,,,,,,,
20180301050630662,ABC,,,,,,,,,,
20180301050630663,ABC,,,,,,,,,,
20180301050630663,ABC,,,,,,,,,,
20180301050630664,ABC,,,,,,,,,,
20180301050630665,ABC,,,,,,,,,,
20180301050630665,ABC,,,,,,,,,,
20180301050630666,ABC,,,,,,,,,,
20180301050630667,ABC,,,,,,,,,,
20180301050630668,ABC,,,,,,,,,,
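One detail worth checking: -k1 starts the key at field 1 but lets it run to the end of the line, whereas -k1,1 stops the key at the end of field 1. Here is a minimal sketch of the difference. The rows are made up (the XYZ value is hypothetical), and the -s (stable) flag is an assumption about your system; GNU and BSD sort both provide it.

```shell
#!/usr/bin/env bash
# Hypothetical sample: two rows share a timestamp but differ in field 2.
rows='20180301050630663,XYZ
20180301050630663,ABC
20180301050630661,ABC'

# -k1: the key runs from field 1 to end of line, so field 2 breaks the tie.
whole_line=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k1)

# -k1,1 plus -s: only field 1 is compared; tied rows keep their input order.
first_field=$(printf '%s\n' "$rows" | LC_ALL=C sort -s -t, -k1,1)

printf 'key = field 1 to end of line:\n%s\n\n' "$whole_line"
printf 'key = field 1 only (stable):\n%s\n' "$first_field"
```

For the data in the question the two forms give the same result, because every row is identical after the timestamp.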
Let me know if you have any problems.
Just add a pipeline.
awk -F "," '$2=="ABC"' < "$i" |
sort -n >> "$filename"
In the general case, to sort on column 234, try sort -t, -k234,234n
Notice also the quoting around "$i", like you already have around "$filename", and the simplifications of the Awk script.
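If it helps, the same pipeline can be wrapped in a small function. This is only a sketch: the function name filter_and_sort and the sample rows are made up for illustration, and it restricts the numeric key to the first field with -k1,1n.

```shell
#!/usr/bin/env bash
# filter_and_sort INPUT OUTPUT   (hypothetical helper name)
# Appends the rows of INPUT whose 2nd field is ABC to OUTPUT, sorted by timestamp.
filter_and_sort() {
    awk -F, '$2 == "ABC"' "$1" |
        sort -t, -k1,1n >> "$2"
}

# Usage sketch with throwaway files and made-up rows.
src=$(mktemp)
dst=$(mktemp)
printf '%s\n' '20180301050630663,ABC,,' '20180301050630662,DEF,,' '20180301050630661,ABC,,' > "$src"
filter_and_sort "$src" "$dst"
result=$(cat "$dst")
printf '%s\n' "$result"
rm -f "$src" "$dst"
```

Because of the >> redirection, calling it in a loop appends each file's sorted rows in turn; the combined output is only sorted per input file, not globally.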
If you are using gawk you can do:
$ awk -F "," -v OFS="," '$2=="ABC"{a[$1]=$0}   # Filter lines that have "ABC", keyed by timestamp
END{
PROCINFO["sorted_in"] = "@ind_num_asc"   # set the sort method: numeric index, ascending
for (e in a) print a[e]                  # traverse the array of lines
}' file
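One caveat with keying the array on $1: rows that share a timestamp overwrite each other, so only the last one survives. A quick check with the ten rows from the question (this only counts entries, so plain awk is enough and no gawk features are assumed):

```shell
#!/usr/bin/env bash
# The ten rows from the question; timestamps ...663 and ...665 each appear twice.
rows='20180301050630663,ABC,,,,,,,,,,
20180301050630664,ABC,,,,,,,,,,
20180301050630665,ABC,,,,,,,,,,
20180301050630666,ABC,,,,,,,,,,
20180301050630667,ABC,,,,,,,,,,
20180301050630668,ABC,,,,,,,,,,
20180301050630663,ABC,,,,,,,,,,
20180301050630665,ABC,,,,,,,,,,
20180301050630661,ABC,,,,,,,,,,
20180301050630662,ABC,,,,,,,,,,'

# Number of input rows.
total=$(printf '%s\n' "$rows" | awk 'END { print NR }')

# Number of array entries left after a[$1] = $0 (one per distinct timestamp).
kept=$(printf '%s\n' "$rows" |
    awk -F, '$2 == "ABC" { a[$1] = $0 } END { n = 0; for (k in a) n++; print n }')

echo "input rows: $total, rows kept: $kept"
```

If duplicate timestamps must be preserved, piping the filtered lines through sort keeps every row.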
An alternative is to use sed and sort:
sed -n '/^[0-9]*,ABC,/p' file | sort -t, -k1 -n
Keep in mind that both of these methods are unrelated to the shell used. Bash is just executing the tools (sed, awk, sort, etc) that are otherwise part of the OS.
Bash itself could do the sort in pure Bash but it would be long and slow.
I am trying to sort files in a directory, depending on the 'date string' attached in the file name, for example files looks as below
SSA_F12_05122013.request.done
SSA_F13_12142012.request.done
SSA_F14_01062013.request.done
Here 05122013, 12142012 and 01062013 represent dates in MMDDYYYY format.
Please help me with a Unix shell script that sorts these files by the date string in their file names (in both ascending and descending order).
Thanks in advance.
Hmmm... why call on heavyweights like awk and Perl when sort itself has the capability to define what exactly to sort by?
ls SSA_F*.request.done | sort -k 1.13,1.16 -k 1.9,1.10 -k 1.11,1.12
Each -k option defines a "sort key":
-k 1.13,1.16
This defines a sort key ranging from field 1, column 13 to field 1, column 16. (A field is by default delimited by whitespace, which your filenames don't have.)
If your filenames are varying in length, defining the underscore as field separator (using the -t option) and then addressing columns in the third field would be the way to go.
Refer to man sort for details. Use the -r option to sort in descending order.
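A sketch of the -t variant mentioned above, assuming the date is always the third underscore-separated field and is laid out as MMDDYYYY. The names are fed in with printf instead of ls so the example is self-contained.

```shell
#!/usr/bin/env bash
# File names from the question.
names='SSA_F12_05122013.request.done
SSA_F13_12142012.request.done
SSA_F14_01062013.request.done'

# -t_ makes "_" the field separator, so field 3 is "MMDDYYYY.request.done".
# Keys: year (chars 5-8), then month (chars 1-2), then day (chars 3-4) of field 3.
ascending=$(printf '%s\n' "$names" |
    LC_ALL=C sort -t_ -k3.5,3.8 -k3.1,3.2 -k3.3,3.4)

# -r reverses the comparison for descending order.
descending=$(printf '%s\n' "$names" |
    LC_ALL=C sort -r -t_ -k3.5,3.8 -k3.1,3.2 -k3.3,3.4)

printf 'ascending:\n%s\n\ndescending:\n%s\n' "$ascending" "$descending"
```

Since the positions are counted from the start of field 3, this keeps working if the part before the date changes length.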
One way with awk (gawk, since gensub is a gawk function) and sort:
ls -1|awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|sort|awk '$0=$NF'
if we break it down:
ls -1|
awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|
sort|
awk '$0=$NF'
The ls -1 is just an example; I assume you have your own way of getting the file list, one per line.
A quick test:
kent$ echo "SSA_F13_12142012.request.done
SSA_F12_05122013.request.done
SSA_F14_01062013.request.done"|awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|
sort|
awk '$0=$NF'
SSA_F13_12142012.request.done
SSA_F14_01062013.request.done
SSA_F12_05122013.request.done
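The same decorate, sort, undecorate idea can be sketched without gensub, for awk implementations other than gawk. This variant is an assumption-laden rewrite, not the command above: it builds the YYYYMMDD key with substr and strips it again with cut, and it assumes the file names contain no spaces.

```shell
#!/usr/bin/env bash
# File names from the question.
names='SSA_F13_12142012.request.done
SSA_F12_05122013.request.done
SSA_F14_01062013.request.done'

sorted=$(printf '%s\n' "$names" |
    # Decorate: $3 is MMDDYYYY; emit "YYYYMMDD name".
    awk -F'[_.]' '{ print substr($3, 5, 4) substr($3, 1, 4), $0 }' |
    # Sort on the leading key.
    LC_ALL=C sort |
    # Undecorate: drop the key and the space after it.
    cut -d' ' -f2-)

printf '%s\n' "$sorted"
```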
ls -lrt *.done | perl -lane '@a=split /_|\./,$F[scalar(@F)-1];$a[2]=~s/(..)(..)(....)/$3$1$2/g;print $a[2]." ".$_' | sort -rn | awk '{$1=""}1'
ls *.done | perl -pe 's/^.*_(..)(..)(....)/$3$1$2$&/' | sort -rn | cut -b9-
This would do it.
I am trying to learn bash/shell *nix commands /scripting.
So rather than writing a python program, I thought of trying it out using bash/awk etc but am having a hard time.
I have a huge text file (it's actually a CSV):
id_1, id_2, some attributes.
I want to sort this file based on id_2.
How do I do this?
Thanks
Use the --key option for sort.
For example, the following sorts input.csv on the second field (using comma as a field separator) and writes the output to output.csv.
sort --key=2,2 -t',' input.csv > output.csv
p.s. Don't forget to use the -n option if you're doing a numerical sort.
For more info, see the man page for sort.
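To illustrate why -n matters, here is a small sketch with made-up rows (the ids 10, 9 and 100 are hypothetical). Without -n the second field is compared as text, so 100 sorts before 9.

```shell
#!/usr/bin/env bash
# Hypothetical rows: name, numeric id, attribute.
rows='a,10,x
b,9,y
c,100,z'

# Text comparison on field 2: "10" < "100" < "9".
lexical=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k2,2)

# Numeric comparison on field 2: 9 < 10 < 100.
numeric=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k2,2n)

printf 'without -n:\n%s\n\nwith -n:\n%s\n' "$lexical" "$numeric"
```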
You can use the -k option of sort(1):
-k, --key=POS1[,POS2]
start a key at POS1, end it at POS2 (origin 1)
sort -t, -k2 filename.csv
I don't have a shell to verify, but basically you need to specify the separator and the sort key
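Check out the cut command:
cat file.csv | cut -d";" -f 2 | sort
I assumed your CSV is semicolon-separated, but you can change the delimiter. Note that this prints only the second column, sorted, rather than the whole rows.
To save the result under a different name:
cat file.csv | cut -d";" -f 2 | sort > newfile.txt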