Bash Shell: How do I sort by values on the last column, but ignoring the header of a file?

file
ID First_Name Last_Name(s) Average_Winter_Grade
323 Popa Arianna 10
317 Tabarcea Andreea 5.24
326 Balan Ionut 9.935
327 Balan Tudor-Emanuel 8.4
329 Lungu Iulian-Gabriel 7.78
365 Brailean Mircea 7.615
365 Popescu Anca-Maria 7.38
398 Acatrinei Andrei 8
How do I sort it by the last column, excluding the header?
This is what the file should look like after the changes:
ID First_Name Last_Name(s) Average_Winter_Grade
323 Popa Arianna 10
326 Balan Ionut 9.935
327 Balan Tudor-Emanuel 8.4
398 Acatrinei Andrei 8
329 Lungu Iulian-Gabriel 7.78
365 Brailean Mircea 7.615
365 Popescu Anca-Maria 7.38
317 Tabarcea Andreea 5.24

If it's always the 4th column:
head -n 1 file; tail -n +2 file | sort -n -r -k 4,4
If all you know is that it's the last column:
head -n 1 file; tail -n +2 file | awk '{print $NF,$0}' | sort -n -r | cut -f2- -d' '
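Both commands above read file twice (once with head, once with tail), so they cannot be used at the end of a pipeline. A minimal sketch that reads the header once from stdin and sorts the rest by the last field instead, assuming whitespace-separated columns and a numeric last column; the function name sort_body_by_last_col is hypothetical:

```shell
# sort_body_by_last_col (hypothetical name): copy the first line of stdin
# unchanged, then print the remaining lines sorted numerically, largest
# first, by their last whitespace-separated field.
sort_body_by_last_col() {
    # Pass the header through; read consumes exactly one line of stdin
    IFS= read -r header && printf '%s\n' "$header"
    # Decorate each line with "lastfield<TAB>", sort on that key, then
    # cut the key off again (cut splits on TAB by default).
    # LC_ALL=C keeps "." as the decimal point for sort -n.
    awk '{ print $NF "\t" $0 }' | LC_ALL=C sort -k1,1nr | cut -f2-
}
```

For example, sort_body_by_last_col < file should reproduce the expected output shown in the question, and cat file | sort_body_by_last_col behaves the same way.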

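You'd like to just sort by the last column, but sort has no way to address the last field directly (its -k option only counts fields from the start of the line). So rewrite the data with the column to be sorted at the beginning of each line:
Ignoring the header for the moment (sort -n treats its non-numeric last field as 0, so in an ascending sort of positive values it often stays on top by itself, but with -nr it ends up last):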
awk '{print $NF, $0 | "sort -nr" }' input | cut -d ' ' -f 2-
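If you do need to keep the header in place (e.g., it's getting mixed into the sort), you can do things like the following (awk already runs a piped command through /bin/sh, so the command can contain its own pipeline):
< input awk 'NR==1 {print; fflush()} NR>1 {print $NF, $0 | "sort -nr | cut -d \" \" -f 2-"}'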
or
awk 'NR==1{ print " ", $0} NR>1 {print $NF, $0 | "sort -nr" }' OFS=\; input | cut -d \; -f 2-

Related

Store values from Impala results in a variable in Linux

I need to retrieve 239, 631, etc. from the output below and store them in a variable in Linux. This is the output of an Impala query:
+-----------------+
| organization_id |
+-----------------+
| 239 |
| 631 |
| 632 |
| 633 |
+-----------------+
Below is the command I am running:
x=$(impala-shell -q "${ORG_ID}" -ki "${impalaserver}");
How can I do this?
Could you please try the following. It removes duplicates across the whole input (if an id appears in one organization_id stanza, it will NOT be printed again in any other stanza):
your_command | awk -v s1="'" 'BEGIN{OFS=","} /---/{flag=""} /organization_id/{flag=1;getline;next} flag && !a[$2]++{val=val?val OFS s1 $2 s1:s1 $2 s1} END{print val}'
If instead an id that appears in one organization_id stanza should be printed again when it comes up in another stanza, try the following (it prints one line per stanza):
your_command | awk -v s1="'" 'BEGIN{OFS=","} /---/{if(val){print val};val=flag="";delete a} /organization_id/{flag=1;getline;next} flag && !a[$2]++{val=val?val OFS s1 $2 s1:s1 $2 s1} END{if(val){print val}}'
What about this:
x=$(impala-shell -B -q "${ORG_ID}" -ki "${impalaserver}")
I just added the -B option, which turns off the pretty-printing and the header.
If you want comma-separated values, you can pipe the result to tr:
echo $x | tr ' ' ','
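If -B is not an option, a sketch that parses the pretty-printed table shown in the question instead, assuming a single one-column table in the output; extract_ids and join_quoted are hypothetical helper names. Like the first awk answer, extract_ids drops duplicate ids, and join_quoted produces the quoted, comma-separated form:

```shell
# extract_ids (hypothetical): read a one-column table drawn with "|" and
# "+---" borders, skip its header row, and print each distinct value once.
extract_ids() {
    awk -F'|' '
        NF == 3 && ++rows > 1 {          # "| value |" rows, minus the header
            gsub(/[[:space:]]/, "", $2)   # strip the padding spaces
            if (!seen[$2]++) print $2     # first occurrence only
        }'
}

# join_quoted (hypothetical): wrap each input line in single quotes and
# join the lines with commas, e.g. for an SQL IN (...) list.
join_quoted() {
    sed "s/.*/'&'/" | paste -s -d, -
}
```

With the command from the question this could be used as x=$(impala-shell -q "${ORG_ID}" -ki "${impalaserver}" | extract_ids | join_quoted), which for the sample table should give '239','631','632','633'.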

bash uniq: how to show the count after the value

Normally when I run cat number.txt | sort -n | uniq -c, I get output like this:
3 43
4 66
2 96
1 97
But what I need is the number of occurrences shown after the value, like this:
43 3
66 4
96 2
97 1
Please give advice on how to change this. Thanks.
Use awk to change the order of columns:
cat number.txt | sort -n | uniq -c | awk '{ print $2, $1 }'
Perl version:
perl -lne '$occ{0+$_}++; END {print "$_ $occ{$_}" for sort {$a <=> $b} keys %occ}' < number.txt
With GNU sed (uniq -c pads the count with leading spaces, so the pattern has to allow for them):
cat number.txt | sort -n | uniq -c | sed -r 's/^ *([0-9]+) ([0-9]+)$/\2 \1/'
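Another option, in the same spirit as the Perl version, is to let awk do the counting and skip uniq entirely. A sketch assuming one number per line; count_values is a hypothetical name:

```shell
# count_values (hypothetical): print each distinct first field followed by
# how many times it occurs, ordered numerically by the value.
count_values() {
    awk '{ count[$1]++ } END { for (v in count) print v, count[v] }' "$@" |
        sort -n
}
```

Usage: count_values number.txt. The sort at the end is needed because awk's for (v in count) visits keys in no particular order.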

Count of Request Patterns

I want to count the request patterns in a requests.log file. The file has requests in the following format:
102.232.32.322 "/v1/places?name=ass&lat=22.3&lng=12.12 HTTP 1.1" 23 111
102.232.32.322 "/v1/places/23232 HTTP 1.1" 23 111
102.232.32.322 "/v1/places?name=bcdd&lat=22.23&lng=12.12&quality_score=true HTTP1.1" 23 111
.....
So far I have only been able to cut strings and strip out numbers:
cat requests.log | grep /v1/places | cut -c53- |cut -d '"' -f 1 | cut -d' ' -f1 | sed 's/[0-9]//g'
The output should be in the following manner:
100 /v1/places?name=<name>&lat=<lng>
110 /v1/places/<placeid>
10 /v1/places?name=<name>&lat=<lat>&lng=<lng>&country_code=<country>
and so on for all the possible patterns, ignoring the order of the request params.
Another major problem is that the order of the parameters is not guaranteed.
Using awk:
awk '{sub(/^"/, "", $2); a[$2]++} END{for (i in a) print a[i], i}' OFS='\t' log.file
2 /v1/places/23232
1 /v1/places?name=ass&lat=22.3&lng=12.12
1 /v1/places?name=bcdd&lat=22.23&lng=12.12&quality_score=true
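The awk answer above counts exact requests, so different values or a different parameter order produce separate lines. A sketch that normalizes each request before counting, under assumptions that are mine rather than the answer's: the request path is the second whitespace-separated field with a leading double quote, digit runs right after a / become <id>, each query value becomes <param-name>, and parameter names are sorted so their order does not matter. count_patterns is a hypothetical name:

```shell
# count_patterns (hypothetical): count request "shapes" in a log shaped like
# requests.log above. Assumptions: the path is field 2 and starts with a
# double quote; digits right after "/" become <id>; every query value becomes
# <param-name>; parameter names are sorted so their order does not matter.
count_patterns() {
    awk '
    {
        req = $2
        sub(/^"/, "", req)                  # drop the opening quote
        q = index(req, "?")                 # 0 when there is no query string
        path = q ? substr(req, 1, q - 1) : req
        gsub(/\/[0-9]+/, "/<id>", path)     # /23232 -> /<id>
        if (!q) { count[path]++; next }

        n = split(substr(req, q + 1), kv, "&")
        for (i = 1; i <= n; i++) {
            sub(/=.*/, "", kv[i])           # keep only the parameter name
            # insertion sort, so no gawk-only asort() is needed
            for (j = i; j > 1 && kv[j - 1] > kv[j]; j--) {
                t = kv[j]; kv[j] = kv[j - 1]; kv[j - 1] = t
            }
        }
        key = path "?"; sep = ""
        for (i = 1; i <= n; i++) { key = key sep kv[i] "=<" kv[i] ">"; sep = "&" }
        count[key]++
    }
    END { for (k in count) print count[k] "\t" k }' "$@" | sort -k1,1nr
}
```

Usage: count_patterns requests.log. Two requests that differ only in their values or in the order of their parameters end up under the same key, e.g. /v1/places?lat=<lat>&lng=<lng>&name=<name>.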

replace string in pipe-delimited file using nawk

I need to add an if condition to the nawk command below so that it processes the input file when the third column has more than three digits. Please help with the command: what am I doing wrong, as it is not working?
inputfile.txt
123 | abc | 321456 | tre
213 | fbc | 342 | poi
outputfile.txt
123 | abc | 321### | tre
213 | fbc | 342 | poi
cat inputfile.txt | nawk 'BEGIN {FS="|"; OFS="|"} {if($3 > 3) $3=substr($3, 1, 3)"###" print}'
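Try the following. Your version compares the value of $3 with 3 rather than its length, and it is missing a ; before print. Using a field separator that also absorbs the spaces around each | makes length($3) count only the digits:
awk 'length($3) > 3 { $3 = substr($3, 1, 3) "###" } 1' FS=' *[|] *' OFS=' | ' inputfile.txt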
This works with gawk:
awk -F '[[:blank:]]*\\\|[[:blank:]]*' -v OFS=' | ' '
$3 ~ /^[[:digit:]]{4,}/ {$3 = substr($3,1,3) "###"}
1
' inputfile.txt
It won't preserve the original whitespace, so you might want to pipe the output through column -t.
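As noted, rebuilding the fields loses the original whitespace. A sketch that edits only the digit run inside the third field, so the spacing around each | survives; it also avoids interval expressions such as {4,}, which older nawk builds may not support. mask_col3 is a hypothetical name:

```shell
# mask_col3 (hypothetical): in "|"-separated lines, if the third field holds
# a run of four or more digits, keep the first three digits and replace the
# rest of that run with a fixed "###". Spaces around the field are kept.
mask_col3() {
    awk 'BEGIN { FS = OFS = "|" }
        match($3, /[0-9][0-9][0-9][0-9]+/) {
            # RSTART/RLENGTH locate the digit run inside the raw field
            $3 = substr($3, 1, RSTART + 2) "###" substr($3, RSTART + RLENGTH)
        }
        1' "$@"
}
```

mask_col3 inputfile.txt should produce exactly the outputfile.txt shown in the question; lines without a long digit run are printed untouched.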

retrieve and add two columns of a file

My file has the following structure:
A | 12 | 10
B | 90 | 112
C | 54 | 34
I have to add column 2 and column 3 and print the result together with column 1.
Output:
A | 22
B | 202
C | 88
I can retrieve the two columns but don't know how to add them.
What I did:
cut -d ' | ' -f3,5 myfile.txt
How do I add those columns and display the result?
A Bash solution:
#!/bin/bash
# Split each line on "|" and print the first field with the sum of the other two
while IFS="|" read -r f1 f2 f3
do
    echo "${f1% } | $((f2 + f3))"
done < file
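You can do this easily with awk:
awk '{print $1, "|", $3 + $5}' myfile.txt
should work: with the default whitespace splitting, each | is a field of its own (fields 2 and 4), so the numbers are fields 3 and 5.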
You can do this with awk:
awk 'BEGIN{FS="|"; OFS="| "} {print $1 OFS $2+$3}' input_filename
Input:
A | 12 | 10
B | 90 | 112
C | 54 | 34
Output:
A | 22
B | 202
C | 88
Explanation:
awk: invoke the awk tool
BEGIN{...}: do things before starting to read lines from the file
FS="|": FS stands for Field Separator. Think of it as the delimiter that separates each line of your file into fields
OFS="| ": OFS stands for Output Field Separator. Same idea as above, but for output. FS differs from OFS here for formatting: $1 still ends with the space that preceded the |, so OFS only needs to add a space after it.
{print $1 OFS $2+$3}: For each line that awk reads, print the first field (the letter), followed by a delimiter specified by OFS, then the sum of field 2 and field 3.
input_filename: awk accepts the input file name as an argument here.
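If the number of numeric columns can vary, a sketch that sums every field after the first, assuming the same | layout. The regex field separator also absorbs the spaces around each |, so $1 comes out without a trailing space; sum_row is a hypothetical name:

```shell
# sum_row (hypothetical): print the first "|"-separated field followed by
# the sum of all remaining fields on the line.
sum_row() {
    awk -F' *[|] *' '{
        total = 0
        for (i = 2; i <= NF; i++) total += $i   # add every numeric column
        print $1 " | " total
    }' "$@"
}
```

sum_row myfile.txt should give the A | 22, B | 202, C | 88 output above, and a line such as D | 1 | 2 | 3 would print D | 6.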
