How to sort data according to the date in bash? - bash

I need to write a bash program that sorts the data according to the date and displays the name of the person who recently joined the organization.
I have an employees.txt file with data in it with delimiter |. But when I am trying to sort the data using sort command like
sort -t'|' -k5,5 employees.txt | head -1 | cut -d'|' -f2
this sorts only on the first part of the date, i.e. for DD-MM-YYYY it sorts only on DD.
employees.txt File data format
ID | NAME | POST | DEPARTMENT | JOINING DATE | SALARY
101 | Jhon McClare | Manager | Content | 23-02-2001 | 83000
102 | Alena Croft | Snr. Manager | Accounts | 01-01-2019 | 88888
103 | Jeremy | Director | Sales | 20-03-2012 | 89786
104 | Williams | Manager | Marketing | 23-06-2001 | 73000
The above data should give Alena Croft as the answer.

The relevant field must be rendered in a form suitable for sorting, that is, YYYY-MM-DD, using a utility such as sed or awk. For example, with GNU sed:
sed -E 's/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\2-\1/' employees.txt |
sort -r -t'|' -k5,5 | head -n1 | cut -d'|' -f2

The trick is to change the format of date to YYYY-MM-DD.
$ sed -E 's/([0-9]+)-([0-9]+)-([0-9]+)/\3-\2-\1/' employees.txt | sort -t'|' -k5,5r | head -n 1 | cut -d'|' -f2
Alena Croft
Also note that we need to sort in reverse (descending) order, since we want the most recent date first.
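
For comparison, here is a minimal sketch of the same idea (compare dates as YYYYMMDD) done in a single awk pass instead of sed plus sort. It assumes the layout shown in the question (date in field 5, name in field 2, optional header row); latest_joiner is a hypothetical helper name, not something from the answers above.

```shell
#!/bin/sh
# Sketch: print the NAME whose JOINING DATE (DD-MM-YYYY, field 5) is the latest.
# latest_joiner is a hypothetical helper name used only for illustration.
latest_joiner() {
  awk -F'|' '
    {
      d = $5
      gsub(/[ \t]/, "", d)    # drop the padding around the date
      # skip the header row and any row without a DD-MM-YYYY date
      if (d !~ /^[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]$/) next
      key = substr(d, 7, 4) substr(d, 4, 2) substr(d, 1, 2)   # YYYYMMDD
      if (key > best) { best = key; name = $2 }               # string comparison
    }
    END {
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the padding around the name
      if (name != "") print name
    }'
}

latest_joiner <<'EOF'
ID | NAME | POST | DEPARTMENT | JOINING DATE | SALARY
101 | Jhon McClare | Manager | Content | 23-02-2001 | 83000
102 | Alena Croft | Snr. Manager | Accounts | 01-01-2019 | 88888
103 | Jeremy | Director | Sales | 20-03-2012 | 89786
104 | Williams | Manager | Marketing | 23-06-2001 | 73000
EOF
# prints: Alena Croft
```

Unlike the sort-based pipelines, this skips a header row if the file has one and prints the name without the surrounding spaces.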

Related

Redirect the table generated from Beeline to text file without the grid (Shell Script)

I am currently trying to find a way to redirect the standard output from beeline shell to text file without the grid. The biggest problem I am facing right now is that my columns have negative values and when I'm using regex to remove the '-', it is affecting the column values.
+-------------------+
| col |
+-------------------+
| -100 |
| 22 |
| -120 |
| -190 |
| -800 |
+-------------------+
Here's what I'm doing:
beeline -u jdbc:hive2://localhost:10000/default \
-e "SELECT * FROM $db.$tbl;" | sed 's/\+//g' | sed 's/\-//g' | sed 's/\|//g' > table.txt
I am trying to clean this file so I can read all the data into a variable.
Assuming all your data follows the same pattern, where the only insignificant '-' characters are the ones wrapped in '+':
[root@machine]# cat boo
+-------------------+
| col |
+-------------------+
| -100 |
| 22 |
| -120 |
| -190 |
| -800 |
+-------------------+
[root@machine]# cat boo | sed 's/\+-*+//g' | sed 's/\--//g' | sed 's/|//g'
col
-100
22
-120
-190
-800
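
As an alternative sketch, awk can split each row on '|' and trim the cells, so the minus signs in the values are never touched by a regex. This assumes the grid format shown above (border rows starting with '+', data rows wrapped in '|'); strip_grid is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: remove the ASCII grid from a table read on stdin.
# strip_grid is a hypothetical helper name used only for illustration.
strip_grid() {
  awk -F'|' '
    /^[ \t]*\+/ || NF < 3 { next }   # skip +----+ borders and non-table lines
    {
      row = ""
      for (i = 2; i < NF; i++) {     # fields 1 and NF are the empty outer edges
        cell = $i
        gsub(/^[ \t]+|[ \t]+$/, "", cell)    # trim padding, keep any minus sign
        row = row (i > 2 ? "\t" : "") cell   # join columns with tabs
      }
      print row
    }'
}

strip_grid <<'EOF'
+-------------------+
| col |
+-------------------+
| -100 |
| 22 |
| -120 |
+-------------------+
EOF
# prints:
# col
# -100
# 22
# -120
```

The header row is kept; pipe the result through tail -n +2 to drop it. If your Beeline version supports them, options such as --outputformat=tsv2 and --showHeader=false may avoid the grid in the first place; check beeline --help.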

Bash extract strings between two characters

I have the output of query result into a bash variable, stored as a single line.
-------------------------------- | NAME | TEST_DATE | ----------------
--------------------- | TESTTT_1 | 2019-01-15 | | TEST_2 | 2018-02-16 | | TEST_NAME_3 | 2020-03-17 | -------------------------------------
I would like to ignore the column names(NAME | TEST_DATE) and store actual values of each name and test_date as a tuple in an array.
So here is the logic I am thinking, I would like to extract third string onwards between two '|' characters. These strings are comma separated and when a space is encountered we start the next tuple in the array.
Expected output:
array=(TESTTT_1,2019-01-15 TEST_2,2018-02-16 TEST_NAME_3,2020-03-17)
Any help is appreciated. Thanks.
Let's say your string is stored in the variable a (or pipe your query output to the command below):
echo "$a"
-------------------------------- | NAME | TEST_DATE | ----------------
--------------------- | TESTTT_1 | 2019-01-15 | | TEST_2 | 2018-02-16 | | TEST_NAME_3 | 2020-03-17 | ------------------------------------
Command to obtain desired results is:
array="$(echo "$a" | cut -d '|' -f2,3,5,6,8,9 | tail -n1 | sed 's/ | /,/g')"
The above stores the output in a variable named array, as you expected.
Output of above command is:
echo "$array"
TESTTT_1,2019-01-15,TEST_2,2018-02-16,TEST_NAME_3,2020-03-17
Explanation of the command: the output of echo "$a" is piped into cut, which uses '|' as the delimiter and keeps fields 2,3,5,6,8,9. The result is piped into tail -n1 to drop the line holding the NAME and TEST_DATE column names and keep only the values. Finally, sed converts each ' | ' to ',' to match your expected output.
This string contains only three name/date pairs. If you have more, add more field numbers to the cut command; given the format of your string, they follow the pattern 2,3,5,6,8,9,11,12,14,15, and so on.
Hope this solves your problem.
echo "$a" | awk -F "|" '{ for(i=2; i<=NF; i++){ print $i }}' | sed -e '1,3d' -e '$d' | tr ' ' '\n' | sed '/^$/d' | sed 's/^/,/g' | sed -e 'N;s/\n/ /' | sed 's/^.//g' | xargs | sed 's/ ,/, /g'
The above is an awk-based solution.
Output:
TESTTT_1, 2019-01-15 TEST_2, 2018-02-16 TEST_NAME_3, 2020-03-17
Is that OK?
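
Neither command above produces the name,date tuples from the question's expected output, so here is a hedged sketch that does not depend on field numbers. It assumes the string looks like the sample (two column names first, then alternating name and date cells, with dash rules in between); to_tuples is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: turn the flattened table into one "name,date" tuple per line.
# to_tuples is a hypothetical helper name used only for illustration.
to_tuples() {
  tr '|' '\n' |                        # one cell per line
    awk '
      { gsub(/^[ \t]+|[ \t]+$/, "") }  # trim each cell
      /^[- \t]*$/ { next }             # skip empty cells and dash rules
      ++n <= 2 { next }                # skip the two column names
      n % 2 == 1 { name = $0; next }   # odd cell: remember the name
      { print name "," $0 }            # even cell: date, emit the tuple
    '
}

a='-------------------------------- | NAME | TEST_DATE | ----------------
--------------------- | TESTTT_1 | 2019-01-15 | | TEST_2 | 2018-02-16 | | TEST_NAME_3 | 2020-03-17 | -------------------------------------'

printf '%s\n' "$a" | to_tuples
# prints:
# TESTTT_1,2019-01-15
# TEST_2,2018-02-16
# TEST_NAME_3,2020-03-17
```

In bash 4 or later, mapfile -t array < <(printf '%s\n' "$a" | to_tuples) collects those lines into an array with one tuple per element.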

From awk output, how to cut or trim characters in columns

At the moment
I want to trim .fmbi1a5nn9sp5o4qy3eyazeq5.eddvrl9sa8t448pb38vibj8ef: and .ilwio0k43fgqt4jqzyfadx19v: so the output take less space :)
First step:
docker ps --format "{{.Names}}: {{.Status}}" | sort -k1 | column -t
mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5.eddvrl9sa8t448pb38vibj8ef: Up 7 days
mon_prometheus.1.ilwio0k43fgqt4jqzyfadx19v: Up 7 days
I know
I can do something like:
docker ps --format "{{.Names}}: {{.Status}}" | sort -k1 | rev | cut -d"." -f2- | rev
mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5
mon_prometheus.1
The issue
is that I'm losing the other columns :-/
Idea
It would sound logical to do something like this (with awk) but it does not work. Any ideas?
docker ps --format "{{.Names}} : {{.Status}}" | sort -k1 | awk '{(print $1 | rev | cut -d"." -f2- | rev),$2,$3,$4,$5,$6}' | column -t
Thank you in advance!
P
To cut the last dot-separated suffix:
$ docker ... | sort | awk '{sub(/\.[^.]*$/,"",$1)}1' file | column -t
mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5 Up 7 days
mon_prometheus.1 Up 7 days
or, delete anything longer than 20 chars after a dot.
$ ... | sed -e 's/\(\.[a-z0-9:]\{20,\}\)* / /' | column -t
mon_node-exporter Up 7 days
mon_prometheus.1 Up 7 days
Works! This trick will make my life so much easier.
(I removed the file argument, since awk reads from the pipe here.)
docker ps --format "{{.Names}}: {{.Status}}" | sort -k1 | awk '{sub(/\.[^.]*$/,"",$1)}1' | column -t;
mon_grafana.1 Up 24 hours
mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5 Up 23 hours
Question #2:
Now how would you proceed to cut the characters after the first dot?
Cheers!
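
One possible approach to Question #2, as a sketch: change the regex so that it starts at the first dot of the first field instead of the last one. trim_names is a hypothetical helper name, and the sample lines are taken from the question.

```shell
#!/bin/sh
# Sketch: cut everything from the first dot onwards in column 1.
# trim_names is a hypothetical helper name used only for illustration.
trim_names() {
  # sub() replaces the leftmost match, so /\..*/ starts at the first dot
  awk '{ sub(/\..*/, "", $1) } 1'
}

trim_names <<'EOF'
mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5.eddvrl9sa8t448pb38vibj8ef: Up 7 days
mon_prometheus.1.ilwio0k43fgqt4jqzyfadx19v: Up 7 days
EOF
# prints:
# mon_node-exporter Up 7 days
# mon_prometheus Up 7 days
```

Note that this also drops the replica number, so mon_prometheus.1 and mon_prometheus.2 would both show up as mon_prometheus.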

Fetch particular column value from rows with specified condition using shell script

I have a sample output from a command
+--------------------------------------+------------------+---------------------+-------------------------------------+
| id | fixed_ip_address | floating_ip_address | port_id |
+--------------------------------------+------------------+---------------------+-------------------------------------+
| 04584e8a-c210-430b-8028-79dbf741797c | | 99.99.99.91 | |
| 12d2257c-c02b-4295-b910-2069f583bee5 | 20.0.0.92 | 99.99.99.92 | 37ebfa4c-c0f9-459a-a63b-fb2e84ab7f92 |
| 98c5a929-e125-411d-8a18-89877d3c932b | | 99.99.99.93 | |
| f55e54fb-e50a-4800-9a6e-1d75004a2541 | 20.0.0.94 | 99.99.99.94 | fe996e76-ffdb-4687-91a0-9b4df2631b4e |
+--------------------------------------+------------------+---------------------+-------------------------------------+
Now I want to fetch every "floating_ip_address" for which both the "port_id" and "fixed_ip_address" fields are blank/empty (in the above sample, 99.99.99.91 and 99.99.99.93).
How can I do it with shell scripting?
You can use sed:
fl_ips=($(sed -nE 's/\|.*\|.*\|(.*)\|\s*\|/\1/p' inputfile))
Here inputfile is the table provided in the question. The array fl_ips contains the output of sed:
>echo ${#fl_ips[@]}
2 # Array has two elements
>echo ${fl_ips[0]}
99.99.99.91
>echo ${fl_ips[1]}
99.99.99.93
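
The sed pattern above only checks that the last column is blank. As a hedged alternative, this awk sketch tests both fixed_ip_address and port_id explicitly. It assumes the four-column layout from the question; unbound_floating_ips is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print floating IPs whose fixed_ip_address and port_id are both empty.
# unbound_floating_ips is a hypothetical helper name used only for illustration.
unbound_floating_ips() {
  awk -F'|' '
    NF >= 6 {                        # data rows split into 6 fields; borders do not
      fixed = $3; floating = $4; port = $5
      gsub(/[ \t]/, "", fixed)       # strip cell padding
      gsub(/[ \t]/, "", floating)
      gsub(/[ \t]/, "", port)
      if (fixed == "" && port == "" && floating != "") print floating
    }'
}

unbound_floating_ips <<'EOF'
+--------------------------------------+------------------+---------------------+-------------------------------------+
| id | fixed_ip_address | floating_ip_address | port_id |
+--------------------------------------+------------------+---------------------+-------------------------------------+
| 04584e8a-c210-430b-8028-79dbf741797c | | 99.99.99.91 | |
| 12d2257c-c02b-4295-b910-2069f583bee5 | 20.0.0.92 | 99.99.99.92 | 37ebfa4c-c0f9-459a-a63b-fb2e84ab7f92 |
| 98c5a929-e125-411d-8a18-89877d3c932b | | 99.99.99.93 | |
+--------------------------------------+------------------+---------------------+-------------------------------------+
EOF
# prints:
# 99.99.99.91
# 99.99.99.93
```

The header row is skipped automatically because its fixed_ip_address cell is not empty.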

Cleaning up IP output on command line [duplicate]

This question already has answers here:
How to clean up masscan output (-oL)
(4 answers)
Closed 6 years ago.
I have a problem with the -oL output option ("grep-able" output); for instance, it produces this:
| 14.138.12.21:123 | unknown | disabled |
| 14.138.184.122:123 | unknown | disabled |
| 14.138.179.27:123 | unknown | disabled |
| 14.138.20.65:123 | unknown | disabled |
| 14.138.12.235:123 | unknown | disabled |
| 14.138.178.97:123 | unknown | disabled |
| 14.138.182.153:123 | unknown | disabled |
| 14.138.178.124:123 | unknown | disabled |
| 14.138.201.191:123 | unknown | disabled |
| 14.138.180.26:123 | unknown | disabled |
| 14.138.13.129:123 | unknown | disabled |
The above is neither very readable nor easy to understand.
How can I use Linux command-line utilities, e.g. sed, awk, or grep, to output something as follows, using the file above?
output
14.138.12.21
14.138.184.122
14.138.179.27
14.138.20.65
14.138.12.235
Using awk with space and : as field separators, and printing the second field:
awk -F '[ :]' '{print $2}' file.txt
Example:
% cat file.txt
| 14.138.12.21:123 | unknown | disabled |
| 14.138.184.122:123 | unknown | disabled |
| 14.138.179.27:123 | unknown | disabled |
| 14.138.20.65:123 | unknown | disabled |
| 14.138.12.235:123 | unknown | disabled |
| 14.138.178.97:123 | unknown | disabled |
| 14.138.182.153:123 | unknown | disabled |
| 14.138.178.124:123 | unknown | disabled |
| 14.138.201.191:123 | unknown | disabled |
| 14.138.180.26:123 | unknown | disabled |
| 14.138.13.129:123 | unknown | disabled |
% awk -F '[ :]' '{print $2}' file.txt
14.138.12.21
14.138.184.122
14.138.179.27
14.138.20.65
14.138.12.235
14.138.178.97
14.138.182.153
14.138.178.124
14.138.201.191
14.138.180.26
14.138.13.129
AWK is perfect when you want to split a file into "columns" and you know that the order of the columns is constant. AWK splits each line on a field separator (which can be a regular expression like '[: ]'). The fields are accessible by their position from the left: $1, $2, $3, etc.:
awk -F '[ :]' '{print $2}' src.log
awk -F '[ :|]' '{print $3}' src.log
awk 'BEGIN {FS="[ :|]"} {print $3}' src.log
You can also filter the lines with a regular expression:
awk -F '[ :]' '/138\.179\./ {print $2}' src.log
However, POSIX awk cannot capture substrings with regular-expression groups (GNU awk can, through the optional array argument of match()).
SED is more flexible in regard to regular expressions:
sed -r 's/^[^0-9]*([0-9\.]+)\:.*/\1/' src.log
However, it lacks many useful features of the Perl-like regular expressions we are used to in everyday programming. For example, even the extended syntax (-r) does not interpret \d as a digit.
Perl is perhaps the most flexible tool for parsing files. You can opt for simple expressions:
perl -n -e '/^\D*([^:]+):/ and print "$1\n"' src.log
or make the matching as strict as you like:
perl -n -e '/^\D*((?:\d{1,3}\.){3}\d{1,3}):/ and print "$1\n"' src.log
Using sed:
sed -r 's/^ *[|] *([0-9]+[.][0-9]+[.][0-9]+[.][0-9]+):[0-9]{3}.*/\1/'
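
For comparison, a minimal sketch with grep: the -o option (supported by GNU and BSD grep, though not required by POSIX) prints only the matched text, so no substitution or field splitting is needed. extract_ips is a hypothetical helper name, and the pattern does not check that each octet is in the 0-255 range.

```shell
#!/bin/sh
# Sketch: print every dotted-quad found on stdin, one per line.
# extract_ips is a hypothetical helper name used only for illustration.
extract_ips() {
  # four groups of 1-3 digits separated by dots; the :port suffix is not matched
  grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}'
}

extract_ips <<'EOF'
| 14.138.12.21:123 | unknown | disabled |
| 14.138.184.122:123 | unknown | disabled |
EOF
# prints:
# 14.138.12.21
# 14.138.184.122
```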

Resources