unix 'sort' command for inline characters - sorting

I have a .txt file of pumpkin sizes that I'm trying to sort by size of pumpkin:
name |size
==========
Joe |5
Mary |10
Bill |2
Jill |1
Adam |20
Mar |5
Roe |10
Mir |3
Foo |9
Bar |12
Baz |0
Currently I'm having great difficulty in getting sort to work properly. Can anyone help me sort my list by pumpkin size without modifying the list structure?

The table headings need special handling, since sorting them along with the data would move them to an arbitrary line. So we use a two-step process:
a) output the two heading lines unchanged;
b) sort the rest numerically (-n), in reverse order (-r), with field separator | (-t), using a key that starts at field 2 (-k).
$ awk 'NR<=2' in; awk 'NR>2' in | sort -t '|' -nr -k 2
name |size
==========
Adam |20
Bar |12
Roe |10
Mary |10
Foo |9
Mar |5
Joe |5
Mir |3
Bill |2
Jill |1
Baz |0
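
The same two-step idea can be sketched with head and tail instead of awk. This is only an illustration: the temporary file and the function name sort_pumpkins are assumptions, not part of the original answer, and the key is written as -k2,2n so that it stops at field 2 and is compared numerically in ascending order.

```shell
# Hypothetical sample data; the temporary file stands in for the real .txt file.
pumpkins=$(mktemp)
cat > "$pumpkins" <<'EOF'
name |size
==========
Joe |5
Mary |10
Bill |2
Jill |1
EOF

# Print the two header lines untouched, then sort only the data rows.
# -t '|' sets the field separator; -k2,2n restricts the key to field 2
# and compares it as a number.
sort_pumpkins() {
    head -n 2 "$1"
    tail -n +3 "$1" | sort -t '|' -k2,2n
}

sort_pumpkins "$pumpkins"
```

Appending r to the key (-k2,2nr) would give the descending order shown above.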

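The key point is sort's -k option, which selects the field to sort on; see man sort for details. In the command below, sed -n '3,$p' skips the two header lines, -r reverses the order, and -h compares human-readable numbers, which also covers plain integers: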
sed -n '3,$p' YOUR_FILENAME| sort -hrt '|' -k 2

You can simply remove the header lines
name |size
==========
with sed; whatever is left can then be sorted with sort.
sed '1,2d' txt | sort -t "|" -k 2 -n
Here, sed '1,2d' deletes the first 2 lines.
The -t option tells sort to split each line into fields on the '|' character.
Since you want to sort by size, which is the second field, select it with -k 2.
Finally, -n makes sort compare that field as a number rather than as text.
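
To see why -n matters, here is a small sketch comparing a textual sort with a numeric one on three of the rows. The function names are made up for the illustration, and LC_ALL=C is set only to make the textual ordering predictable.

```shell
# Three sample rows taken from the question.
rows() {
    printf '%s\n' 'Joe |5' 'Mary |10' 'Bill |2'
}

# Textual comparison of field 2: "10" sorts before "2" because '1' < '2'.
sort_as_text() {
    rows | LC_ALL=C sort -t '|' -k2,2
}

# Numeric comparison of field 2: 2 < 5 < 10.
sort_as_number() {
    rows | LC_ALL=C sort -t '|' -k2,2n
}

echo "as text:";   sort_as_text
echo "as number:"; sort_as_number
```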

You can do this in the shell:
{ read; echo "$REPLY"; read; echo "$REPLY"; sort -t'|' -k2n; } < pumpkins.txt
That reads and prints the first 2 header lines, then sorts the rest.

Related

How to sort data according to the date in bash?

I need to write a bash program that sorts the data by date and displays the name of the person who most recently joined the organization.
I have an employees.txt file whose fields are delimited by |. But when I try to sort the data with a sort command like
sort -t'|' -k5,5 employees.txt | head -1 | cut -d'|' -f2
it sorts only on the leading part of the date: the dates are in DD-MM-YYYY format, so the result is ordered by DD.
employees.txt File data format
ID | NAME | POST | DEPARTMENT | JOINING DATE | SALARY
101 | Jhon McClare | Manager | Content | 23-02-2001 | 83000
102 | Alena Croft | Snr. Manager | Accounts | 01-01-2019 | 88888
103 | Jeremy | Director | Sales | 20-03-2012 | 89786
104 | Williams | Manager | Marketing | 23-06-2001 | 73000
The above data should give Alena Croft as the answer.
The relevant field must be rendered suitable for sorting, that is, converted to the form YYYY-MM-DD, using a utility such as sed or awk. For example, with GNU sed:
sed -E 's/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\2-\1/' employees.txt |
sort -r -t'|' -k5,5 | head -n1 | cut -d'|' -f2
The trick is to change the format of the date to YYYY-MM-DD.
$ sed -E 's/([0-9]+)-([0-9]+)-([0-9]+)/\3-\2-\1/' people.txt | sort -t'|' -k5,5r | head -n 1 | cut -d'|' -f2
Alena Croft
Also note that we need to sort in reverse (descending) order, since we want the most recent date.
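
If the file really contains the header line shown in the question, the header would take part in the sort as well, so it may be safer to skip it first. The following is a sketch under that assumption; the function name most_recent_joiner and the final trimming step are additions for illustration, not part of the answers above.

```shell
# Sample data copied from the question, header line included (an assumption).
employees=$(mktemp)
cat > "$employees" <<'EOF'
ID | NAME | POST | DEPARTMENT | JOINING DATE | SALARY
101 | Jhon McClare | Manager | Content | 23-02-2001 | 83000
102 | Alena Croft | Snr. Manager | Accounts | 01-01-2019 | 88888
103 | Jeremy | Director | Sales | 20-03-2012 | 89786
104 | Williams | Manager | Marketing | 23-06-2001 | 73000
EOF

most_recent_joiner() {
    tail -n +2 "$1" |                          # drop the header line
        sed -E 's/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\2-\1/' |  # DD-MM-YYYY -> YYYY-MM-DD
        sort -t '|' -k5,5r |                   # newest date first
        head -n 1 |
        cut -d '|' -f2 |                       # NAME field
        sed -E 's/^ +| +$//g'                  # trim the blanks around the name
}

most_recent_joiner "$employees"
```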

Inconsistency in output field separator

We have to find the difference (d) between the last two numbers in each row and display the three rows with the highest value of d, in ascending order of d.
INPUT
1 | Latha | Third | Vikas | 90 | 91
2 | Neethu | Second | Meridian | 92 | 94
3 | Sethu | First | DAV | 86 | 98
4 | Theekshana | Second | DAV | 97 | 100
5 | Teju | First | Sangamithra | 89 | 100
6 | Theekshitha | Second | Sangamithra | 99 |100
Required OUTPUT
4$Theekshana$Second$DAV$97$100$3
5$Teju$First$Sangamithra$89$100$11
3$Sethu$First$DAV$86$98$12
awk 'BEGIN{FS="|";OFS="$";}{
avg=sqrt(($5-$6)^2)
print $1,$2,$3,$4,$5,$6,avg
}'|sort -nk7 -t "$"| tail -3
Output:
4 $ Theekshana $ Second $ DAV $ 97 $ 100$3
5 $ Teju $ First $ Sangamithra $ 89 $ 100$11
3 $ Sethu $ First $ DAV $ 86 $ 98$12
As you can see, there is a space before and after each $ sign, except before the last column (avg). Please explain why this is happening.
2)
awk 'BEGIN{FS=" | ";OFS="$";}{
avg=sqrt(($5-$6)^2)
print $1,$2,$3,$4,$5,$6,avg
}'|sort -nk7 -t "$"| tail -3
OUTPUT
4$|$Theekshana$|$Second$|$0
5$|$Teju$|$First$|$0
6$|$Theekshitha$|$Second$|$0
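I have not specified | as the output field separator, but it still appears. Why is this happening, and why is the difference zero too?
I am just 6 days old in Unix, so please answer even if it's easy.
Your field separator is only the pipe symbol, so the surrounding whitespace is part of each field, and that is what you see in the output; the computed last column has no surrounding whitespace, so none is printed there. A field separator longer than one character is treated as a regular expression, in which the pipe means alternation and needs to be escaped to match literally. In your second case, " | " therefore means "space or space", so every single space separates fields; $5 and $6 are then non-numeric strings, which evaluate to 0, hence the zero difference.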
$ awk 'BEGIN {FS=" *\\| *"; OFS="$"}
{d=sqrt(($NF-$(NF-1))^2); $1=$1;
print d "\t" $0,d}' file | sort -n | tail -3 | cut -f2-
4$Theekshana$Second$DAV$97$100$3
5$Teju$First$Sangamithra$89$100$11
3$Sethu$First$DAV$86$98$12
A slight rewrite eliminates the dependency on the number of fields and fixes the format.
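
The effect of the three separators discussed in this thread can be checked on a single input line. This is a minimal sketch; the variable names are made up, and the brackets are printed only to make the whitespace visible.

```shell
line='1 | Latha | Third | Vikas | 90 | 91'

# FS="|" : a single literal pipe, so the blanks stay inside the field.
literal_pipe=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "|" } { printf "[%s]", $2 }')

# FS=" | " : a regex meaning "space OR space", so the line is split on
# every single space and the second field is the pipe itself.
space_or_space=$(printf '%s\n' "$line" | awk 'BEGIN { FS = " | " } { printf "[%s]", $2 }')

# FS=" *\\| *" : an escaped pipe with optional blanks on both sides,
# so the blanks are consumed by the separator.
trimmed=$(printf '%s\n' "$line" | awk 'BEGIN { FS = " *\\| *" } { printf "[%s]", $2 }')

printf '%s\n' "$literal_pipe" "$space_or_space" "$trimmed"
```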

From awk output, how to cut or trim characters in columns

At the moment
I want to trim .fmbi1a5nn9sp5o4qy3eyazeq5.eddvrl9sa8t448pb38vibj8ef: and .ilwio0k43fgqt4jqzyfadx19v: so the output takes less space :)
First step:
docker ps --format "{{.Names}}: {{.Status}}" | sort -k1 | column -t
mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5.eddvrl9sa8t448pb38vibj8ef: Up 7 days
mon_prometheus.1.ilwio0k43fgqt4jqzyfadx19v: Up 7 days
I know
I can do something like:
docker ps --format "{{.Names}}: {{.Status}}" | sort -k1 | rev | cut -d"." -f2- | rev
mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5
mon_prometheus.1
The issue
is that I'm losing the other columns :-/
Idea
It would sound logical to do something like this (with awk) but it does not work. Any ideas?
docker ps --format "{{.Names}} : {{.Status}}" | sort -k1 | awk '{(print $1 | rev | cut -d"." -f2- | rev),$2,$3,$4,$5,$6}' | column -t
Thank you in advance!
P
To cut off the last dot-separated extension:
$ docker ... | sort | awk '{sub(/\.[^.]*$/,"",$1)}1' file | column -t
mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5 Up 7 days
mon_prometheus.1 Up 7 days
Or, delete any dot-prefixed run of 20 or more characters:
$ ... | sed -e 's/\(\.[a-z0-9:]\{20,\}\)* / /' | column -t
mon_node-exporter Up 7 days
mon_prometheus.1 Up 7 days
Works! This trick will make my life so much easier.
(I removed the file argument, since awk reads from the pipe here.)
docker ps --format "{{.Names}}: {{.Status}}" | sort -k1 | awk '{sub(/\.[^.]*$/,"",$1)}1' | column -t;
mon_grafana.1 Up 24 hours
mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5 Up 23 hours
Question #2:
Now how would you proceed to cut the characters after the first dot?
Cheers!
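
One possible approach to Question #2, offered as a sketch rather than an answer from the thread: change the regular expression so that it matches from the first dot to the end of the first field. The sample lines below stand in for the docker ps output, and the function names are hypothetical.

```shell
# Hypothetical sample lines standing in for the docker ps output.
sample_names() {
    printf '%s\n' \
        'mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5.eddvrl9sa8t448pb38vibj8ef: Up 7 days' \
        'mon_prometheus.1.ilwio0k43fgqt4jqzyfadx19v: Up 7 days'
}

# Delete everything from the first dot to the end of field 1.
# The remaining fields are kept; the trailing 1 prints each record.
strip_after_first_dot() {
    awk '{ sub(/\..*/, "", $1) } 1'
}

sample_names | strip_after_first_dot
```

Note that this also removes the replica number (the .1 in mon_prometheus.1), which may or may not be wanted.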

How to insert a different delimiter in between two columns in shell

I have a file as below
ABc def|0|0|0| 1 | 2| 9|
0 2930|0|0|0|0| 1 | 2| 9|
Now, I want to split the first column using the same delimiter.
output:
ABc|def|0|0|0| 1 | 2| 9|
0|2930|0|0|0|0| 1 | 2| 9|
Please help me out with awk.
You can use sed for this:
$ sed 's/ /|/' file
ABc|def|0|0|0| 1 | 2| 9|
0|2930|0|0|0|0| 1 | 2| 9|
The way it is defined, it just replaces the first space with a |, which is exactly what you need.
With awk it is a bit longer:
$ awk 'BEGIN{FS=OFS="|"}{split($1, a, " "); $1=a[1]"|"a[2]}1' file
ABc|def|0|0|0| 1 | 2| 9|
0|2930|0|0|0|0| 1 | 2| 9|
After defining the input and output field separators as |, it splits the first field on spaces and rejoins the two parts with |. Then it prints the line back.
Another awk
awk '{sub(/ /,"|")}1' file
ABc|def|0|0|0| 1 | 2| 9|
0|2930|0|0|0|0| 1 | 2| 9|
Without the leading space, this works fine.
You said you want to replace the delimiter (space->pipe) in first column.
It could happen that your first column contains no space while other columns do; in that case you don't want to change the line at all. Your first column could also contain several spaces, and I guess you want them all replaced. So I cannot think of a shorter way to solve this problem than:
awk -F'|' -v OFS="|" '{gsub(/ /,"|",$1)}7' file
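
The difference between replacing the first space on the line and replacing spaces only inside the first column shows up when the first column has no space. A small sketch on made-up input (the second sample line and the function names are assumptions):

```shell
# First line has a space in column 1; second line (hypothetical) does not.
sample_lines() {
    printf '%s\n' 'ABc def|0| 1 |' 'XYZ|0| 1 |'
}

# Replace spaces only within the first |-separated field.
first_col_spaces_to_pipes() {
    awk -F'|' -v OFS='|' '{ gsub(/ /, "|", $1) } 1'
}

# Replace the first space anywhere on the line.
first_space_to_pipe() {
    sed 's/ /|/'
}

echo "column-aware:"; sample_lines | first_col_spaces_to_pipes
echo "first space:";  sample_lines | first_space_to_pipe
```

On the second line the sed version changes a space in a later column, while the awk version leaves the line untouched.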
sed 's/^[[:blank:]]\{1,\}/ /;/^\([^|]\{1,\}\)[[:blank:]]\{1,\}\([^|[[:blank:]]\)/ s//\1|\2/'
This assumes that an empty first column is represented by leading blanks, and that the separator is one or more blanks followed by a character that is neither a blank nor a |.
This allows input such as:
ABc def|0|0|0| 1 | 2| 9|
def|0|0|0| 1 | 2| 9|
ABc|def|0|0|0| 1 | 2| 9|

bash - extracting lines that contain only 3 columns

I have a file that includes the following lines:
2 | blah | blah
1 | blah | blah
3 | blah
2 | blah | blah
1
1 | high | five
3 | five
I want to extract only the lines that have 3 columns (3 fields, 2 separators).
I want to pipe the result to the following commands:
| sort -nbsk1 | cut -d "|" -f1 | uniq -d
So after all I will get only :
2
1
Any suggestions ?
It's part of a homework assignment; we are not allowed to use awk, sed and some other commands (grep, tr and what's written above can be used).
Thanks
Since you said grep is allowed, this selects lines with exactly three fields:
grep -E '^([^|]*\|){2}[^|]*$' file
By contrast, grep '.*|.*|.*' selects every line with at least two separators, that is, three or more fields.
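
The two patterns can be compared on the sample from the question plus one extra four-field line. That extra line and the function names are assumptions added for the illustration.

```shell
# Sample from the question, plus one hypothetical line with four fields.
sample_rows() {
    printf '%s\n' \
        '2 | blah | blah' \
        '1 | blah | blah' \
        '3 | blah' \
        '2 | blah | blah' \
        '1' \
        '1 | high | five' \
        '3 | five' \
        '4 | a | b | c'
}

# Exactly two separators: two "non-pipes then a pipe" groups, then no more pipes.
exactly_three_fields() {
    grep -E '^([^|]*\|){2}[^|]*$'
}

# At least two separators anywhere on the line (| is literal in basic regex).
at_least_three_fields() {
    grep '.*|.*|.*'
}

echo "exactly three:";  sample_rows | exactly_three_fields
echo "three or more:";  sample_rows | at_least_three_fields
```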
