Bash associative array sorting by value - bash

I get the following output:
Pushkin - 100500
Gogol - 23
Dostoyevsky - 9999
Which is the result of the following script:
for k in "${!authors[@]}"
do
echo $k ' - ' ${authors["$k"]}
done
All I want is to get the output like this:
Pushkin - 100500
Dostoyevsky - 9999
Gogol - 23
which means that the keys in associative array should be sorted by value. Is there an easy method to do so?

You can easily sort your output, in descending numerical order of the 3rd field:
for k in "${!authors[@]}"
do
echo $k ' - ' ${authors["$k"]}
done |
sort -rn -k3
See sort(1) for more about the sort command. This just sorts the output lines; bash has no built-in way to sort an array directly.
Names such as "Pushkin" can be keys here only because authors is an associative array (declare -A authors, available since bash 4.0); the keys of an ordinary indexed array are always integers.
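The `-k3` approach breaks when a key contains spaces, because the value is then no longer field 3. A minimal sketch of the same "sort the output" idea that puts the value first and separates it from the key with a tab, assuming bash 4.3+ (for the `local -n` nameref) and integer values; the function name `print_by_value` is hypothetical:

```shell
#!/usr/bin/env bash
# print_by_value ARRAYNAME: print "key - value" lines of an associative array,
# ordered by numeric value (descending), ties broken by key (ascending).
# Hypothetical helper; needs bash 4.3+ for the nameref (local -n).
print_by_value() {
    local -n arr=$1
    local k
    for k in "${!arr[@]}"; do
        # Tab-separated: value first, key second (keys may contain spaces)
        printf '%s\t%s\n' "${arr[$k]}" "$k"
    done | sort -t $'\t' -k1,1nr -k2,2 | while IFS=$'\t' read -r value key; do
        printf '%s - %s\n' "$key" "$value"
    done
}

declare -A authors=( [Pushkin]=100500 [Gogol]=23 [Dostoyevsky]=9999 [Tolstoy]=23 )
print_by_value authors
```

Because the sort key is limited to the first tab-separated field, the number of words in the key does not matter.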

Alternatively, you can sort the keys and use the sorted list of keys to loop through the array. Note that this orders the output by key (author name), not by value:
# Print one key per line, sort, and read the result back into an array
mapfile -t authors_sorted < <(printf '%s\n' "${!authors[@]}" | sort)
for k in "${authors_sorted[@]}"; do
echo "$k - ${authors[$k]}"
done
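If keys might contain spaces or even newlines, the sort-the-keys idea can be made fully robust by using NUL as the delimiter. A sketch, assuming bash 4.4+ (for `mapfile -d ''`) and a `sort` that supports `-z` (GNU and BSD sort both do):

```shell
#!/usr/bin/env bash
# Sort the keys of an associative array by name, NUL-delimited so that keys
# containing spaces or newlines survive intact.
# Assumes bash 4.4+ (mapfile -d '') and a sort that supports -z.
declare -A authors=( [Pushkin]=100500 [Gogol]=23 ['Von Neumann']=3 [Dostoyevsky]=9999 )

# -d '' reads NUL-terminated items; -t strips the terminating NUL
mapfile -d '' -t sorted_keys < <(printf '%s\0' "${!authors[@]}" | sort -z)

for k in "${sorted_keys[@]}"; do
    printf '%s - %s\n' "$k" "${authors[$k]}"
done
```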

Extending the answer from @AndrewSchulman: using -rn as a global sort option reverses every comparison, including the last-resort whole-line comparison that breaks ties. In this example, authors with the same associative array value are therefore output in reverse order of name.
For example
declare -A authors
authors=( [Pushkin]=10050 [Gogol]=23 [Dostoyevsky]=9999 [Tolstoy]=23 )
for k in "${!authors[@]}"
do
echo $k ' - ' ${authors["$k"]}
done | sort -rn -k3
will output
Pushkin - 10050
Dostoyevsky - 9999
Tolstoy - 23
Gogol - 23
Options for sorting a specific key can be given after the key specifier, e.g. sort -k3rn.
Note that keys are specified as spans. Here -k3 happens to work because field 3 is the last field, but to use only column 3 explicitly (in case further columns are added), it should be specified as -k3,3.
Similarly, to sort by column three in descending numeric order and then by column one in ascending order (which is probably what is desired in this example):
declare -A authors
authors=( [Pushkin]=10050 [Gogol]=23 [Dostoyevsky]=9999 [Tolstoy]=23 )
for k in "${!authors[@]}"
do
echo $k ' - ' ${authors["$k"]}
done | sort -k3,3rn -k1,1
will output
Pushkin - 10050
Dostoyevsky - 9999
Gogol - 23
Tolstoy - 23

The best way to sort a bash associative array by VALUE is not to sort it at all.
Instead, build a list of VALUE:::KEY lines, sort that list, extract the keys into a new key list, and iterate over that list.
declare -A ADDR
ADDR[192.168.1.3]="host3"
ADDR[192.168.1.1]="host1"
ADDR[192.168.1.2]="host2"
KEYS=$(
for KEY in "${!ADDR[@]}"; do
echo "${ADDR[$KEY]}:::$KEY"
done | sort | awk -F::: '{print $2}'
)
for KEY in $KEYS; do
VAL=${ADDR[$KEY]}
echo "KEY=[$KEY] VAL=[$VAL]"
done
output:
KEY=[192.168.1.1] VAL=[host1]
KEY=[192.168.1.2] VAL=[host2]
KEY=[192.168.1.3] VAL=[host3]
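The loop `for KEY in $KEYS` relies on word splitting, which is fine for IP addresses but not for keys with spaces. A sketch of the same VALUE:::KEY technique that reads the sorted list line by line and recovers each key with parameter expansion instead of awk, assuming values never contain `:::`:

```shell
#!/usr/bin/env bash
# Iterate over an associative array in order of its VALUES.
# Each entry is printed as "VALUE:::KEY", sorted, and the key is recovered
# with parameter expansion. Assumes values never contain ":::".
declare -A ADDR=( [192.168.1.3]=host3 [192.168.1.1]=host1 [192.168.1.2]=host2 )

sorted_keys=()
while IFS= read -r line; do
    # Drop the shortest prefix ending in ":::" (the value part)
    sorted_keys+=("${line#*:::}")
done < <(for key in "${!ADDR[@]}"; do printf '%s:::%s\n' "${ADDR[$key]}" "$key"; done | sort)

for key in "${sorted_keys[@]}"; do
    echo "KEY=[$key] VAL=[${ADDR[$key]}]"
done
```

Reading through process substitution keeps the loop in the current shell, so sorted_keys is still set afterwards.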

Do something with unsorted keys:
for key in "${!Map[@]}"; do
echo "$key"
done
Do something with sorted keys:
for key in $(printf '%s\n' "${!Map[@]}" | sort); do
echo "$key"
done
Store the sorted keys as an array:
Keys=($(printf '%s\n' "${!Map[@]}" | sort))
The sorted forms rely on word splitting, so they assume keys contain no whitespace or glob characters.

If you can assume the value is always a number (no spaces), but want to allow for the possibility of spaces in the key:
for k in "${!authors[@]}"; do
echo "${authors["$k"]} ${k}"
done | sort -rn | while read -r number author; do
echo "${author} - ${number}"
done
Example:
$ declare -A authors
$ authors=(['Shakespeare']=1 ['Kant']=2 ['Von Neumann']=3 ['Von Auersperg']=4)
$ for k in "${!authors[@]}"; do echo "${authors["$k"]} ${k}"; done | sort -rn | while read -r number author; do echo "${author} - ${number}"; done
Von Auersperg - 4
Von Neumann - 3
Kant - 2
Shakespeare - 1
$
The chosen answer seems to work if there are no spaces in the keys, but fails if there are:
$ declare -A authors
$ authors=(['Shakespeare']=1 ['Kant']=2 ['Von Neumann']=3 ['Von Auersperg']=4)
$ for k in "${!authors[@]}"; do echo $k ' - ' ${authors["$k"]}; done | sort -rn -k 3
Kant - 2
Shakespeare - 1
Von Neumann - 3
Von Auersperg - 4
$

Related

How to find the max value in array without using sort cmd in shell script

I have an array=(4,2,8,9,1,0) and I don't want to sort the array to find the highest number, because I need the index of the highest number as it is, so I can use it for further reference.
Expected output:
9 index value => 3
Can somebody help me to achieve this?
Slight variation with a loop using the ternary conditional operator and no assumptions about range of values:
arr=(4 2 8 9 1 0)
max=${arr[0]}
maxIdx=0
for ((i = 1; i < ${#arr[@]}; ++i)); do
maxIdx=$((arr[i] > max ? i : maxIdx))
max=$((arr[i] > max ? arr[i] : max))
done
printf '%s index value => %s\n' "$max" "$maxIdx"
The only assumption is that array indices are contiguous. If they aren't, it becomes a little more complex:
arr=([1]=4 [3]=2 [5]=8 [7]=9 [9]=1 [11]=0)
indices=("${!arr[@]}")
maxIdx=${indices[0]}
max=${arr[maxIdx]}
for i in "${indices[@]:1}"; do
((arr[i] <= max)) && continue
maxIdx=$i
max=${arr[i]}
done
printf '%s index value => %s\n' "$max" "$maxIdx"
This first gets the indices into a separate array and sets the initial maximum to the value corresponding to the first index; then, it iterates over the indices, skipping the first one (the :1 notation), checks if the current element is a new maximum, and if it is, stores the index and the maximum.
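A sketch that wraps the sparse-index loop in a reusable function, assuming bash 4.3+ (for the `local -n` nameref), a non-empty array of integers, and the output format from the question; the name `argmax` is hypothetical:

```shell
#!/usr/bin/env bash
# argmax ARRAYNAME: print "<max> index value => <index>" for an indexed array.
# Works with sparse arrays because it iterates over "${!_arr[@]}".
# Hypothetical helper; assumes a non-empty array of integers and bash 4.3+.
# On ties the lowest index wins, since only strictly larger values replace max.
argmax() {
    local -n _arr=$1
    local i max max_idx
    for i in "${!_arr[@]}"; do
        if [[ -z $max_idx ]] || (( ${_arr[i]} > max )); then
            max=${_arr[i]}
            max_idx=$i
        fi
    done
    printf '%s index value => %s\n' "$max" "$max_idx"
}

arr=(4 2 8 9 1 0)
argmax arr                                   # 9 index value => 3
sparse=([1]=4 [3]=2 [5]=8 [7]=9 [9]=1 [11]=0)
argmax sparse                                # 9 index value => 7
```

Initializing from the first visited element (rather than from 0) keeps the function correct for all-negative arrays.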
Without using sort, you can use a simple loop in the shell. Here is sample bash code:
#!/usr/bin/env bash
array=(4 2 8 9 1 0)
unset max maxind
for i in "${!array[@]}"; do
[[ -z $max ]] || (( ${array[i]} > max )) && { max="${array[i]}"; maxind=$i; }
done
echo "max=$max, maxind=$maxind"
Output:
max=9, maxind=3
arr=(4 2 8 9 1 0)
paste <(printf "%s\n" "${arr[@]}") <(seq 0 $((${#arr[@]} - 1)) ) |
sort -k1,1n |
tail -n1 |
sed 's/\t/ index value => /'
Print each array element on its own line with printf.
Print the array indexes with seq.
Join both streams line by line with paste.
Sort the lines numerically by the first field (i.e. the array value) with sort.
Print the last line, which holds the largest value, with tail -n1.
The array value and its index are separated by a tab; sed replaces the tab with the output string you want. Alternatively, use cut -f2 to get only the index, or read -r a b < <( ... ) to read both numbers into variables, etc.
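A sketch of the "read both numbers into variables" variant of the paste pipeline. The extra element 12 is added here (an assumption for illustration) to show why the key needs the numeric flag (`-k1,1n`): a plain text sort would place 12 before 2. Process substitution requires bash rather than plain sh:

```shell
#!/usr/bin/env bash
# Capture the maximum value and its index into shell variables.
arr=(4 2 8 9 1 0 12)   # 12 added to exercise the numeric sort
read -r max idx < <(
    paste <(printf '%s\n' "${arr[@]}") <(seq 0 $(( ${#arr[@]} - 1 ))) |
    sort -k1,1n |   # numeric sort on the value column
    tail -n1        # the largest value ends up on the last line
)
echo "$max index value => $idx"    # 12 index value => 6
```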
Using Perl
$ export data=4,2,8,9,1,0
$ echo $data | perl -ne ' map{$i++; if($_>$x) {$x=$_;$id=$i} } split(","); print "max=$x", " index=",--${id},"\n" '
max=9 index=3
$

How to sort lines based on specific part of their value?

When I run the following command:
command list -r machine-a-.* | sort -nr
It gives me the following result:
machine-a-9
machine-a-8
machine-a-72
machine-a-71
machine-a-70
I wish to sort these lines based on the number at the end, in descending order.
(Clearly, sort -nr doesn't work as expected.)
You just need the -t and -k options in the sort.
command list -r machine-a-.* | sort -t '-' -k 3 -nr
-t is the separator used to separate the fields.
By giving it the value '-', sort will see the given text as:
Field 1   Field 2   Field 3
machine   a         9
machine   a         8
machine   a         72
machine   a         71
machine   a         70
-k is specifying the field which will be used for comparison.
By giving it the value 3, sort will compare the lines using the values in field 3.
Namely, these strings will be compared:
9
8
72
71
70
-n makes sort treat the fields for comparison as numbers instead of strings.
-r makes sort sort the lines in reverse (descending) order.
Therefore, by sorting the numbers from Field 3 in reverse order, this will be the output:
machine-a-72
machine-a-71
machine-a-70
machine-a-9
machine-a-8
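A small sketch that limits the key to field 3 only (`-k3,3`), so the comparison stays correct if more hyphen-separated fields are ever appended; the function name `sort_by_third_field` is hypothetical:

```shell
#!/usr/bin/env bash
# Sort "machine-a-N" names by their third hyphen-separated field,
# numerically and in descending order. -k3,3 restricts the key to field 3.
sort_by_third_field() {
    sort -t '-' -k3,3nr
}

printf '%s\n' machine-a-9 machine-a-8 machine-a-72 machine-a-71 machine-a-70 |
    sort_by_third_field
```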
Here is an example of input to sort:
$ cat 1.txt
machine-a-9
machine-a-8
machine-a-72
machine-a-71
machine-a-70
Here is our short program:
$ cat 1.txt | ( IFS=-; while read A B C ; do echo $C $A-$B-$C; done ) | sort -rn | cut -d' ' -f 2
Here is its output:
machine-a-72
machine-a-71
machine-a-70
machine-a-9
machine-a-8
Explanation:
$ cat 1.txt \ (put contents of file into pipe input)
| ( \ (group some commands)
IFS=-; (set field separator to "-" for read command)
while read A B C ; (read fields in 3 variables A B and C every line)
do echo $C $A-$B-$C; (create output with $C at the beginning)
done
) \ (end of group)
| sort -rn \ (reverse numeric sort)
| cut -d' ' -f 2 (cut off the first field, which is no longer needed)
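Both answers assume exactly three hyphen-separated fields. If names can contain a varying number of hyphens, the same decorate, sort, and undecorate idea works by taking the number after the last hyphen. A sketch, assuming every line ends in `-<integer>`; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sort lines by the number after the LAST hyphen, whatever comes before it.
# Hypothetical helper; assumes every line ends in "-<integer>".
sort_by_trailing_number() {
    local line
    while IFS= read -r line; do
        # ${line##*-} strips the longest prefix ending in "-", leaving the number
        printf '%s %s\n' "${line##*-}" "$line"
    done | sort -k1,1nr | cut -d' ' -f2-
}

printf '%s\n' machine-a-9 machine-b-x-72 machine-a-8 machine-a-70 | sort_by_trailing_number
```

`cut -f2-` keeps everything after the first space, so the original line is restored unchanged.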

Renumbering numbers in a text file based on an unique mapping

I have a big txt file with 2 columns and more than 2 million rows. Every value represents an id and there may be duplicates. There are about 100k unique ids.
1342342345345 34523453452343
0209239498238 29349203492342
2349234023443 99203900992344
2349234023443 182834349348
2923000444 9902342349234
I want to identify each id and re-number all of them starting from 1. It should re-number duplicates also using the same new id. If possible, it should be done using bash.
The output could be something like:
123 485934
34 44834
167 34564
167 2345
2 34564
Doing this in pure bash will be really slow. I'd recommend:
tr -s '[:blank:]' '\n' <file |
sort -un |
awk '
NR == FNR {id[$1] = FNR; next}
{for (i=1; i<=NF; i++) {$i = id[$i]}; print}
' - file
Output:
4 8
3 7
5 9
5 2
1 6
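A sketch of the tr | sort | awk pipeline wrapped in a function and run on the sample rows from the question (the name `renumber_sorted` is hypothetical). New ids follow the numeric order of the old ids; note that `sort -un` treats ids differing only in leading zeros as the same number:

```shell
#!/usr/bin/env bash
# renumber_sorted FILE: map every id to its rank among the unique ids.
renumber_sorted() {
    tr -s '[:blank:]' '\n' < "$1" |
    sort -un |
    awk 'NR == FNR { id[$1] = FNR; next }   # first input: sorted unique ids
         { for (i = 1; i <= NF; i++) $i = id[$i]; print }' - "$1"
}

tmp=$(mktemp)
printf '%s\n' '1342342345345 34523453452343' '0209239498238 29349203492342' \
    '2349234023443 99203900992344' '2349234023443 182834349348' \
    '2923000444 9902342349234' > "$tmp"
renumber_sorted "$tmp"
```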
With bash and sort:
#!/bin/bash
shopt -s lastpipe
declare -A hash # declare associative array
index=1
# read file and fill associative array
while read -r a b; do
echo "$a"
echo "$b"
done <file | sort -nu | while read -r x; do
hash[$x]="$((index++))"
done
# read file and print values from associative array
while read -r a b; do
echo "${hash[$a]} ${hash[$b]}"
done < file
Output:
4 8
3 7
5 9
5 2
1 6
See: man bash and man sort
Pure Bash, with a single read of the file:
declare -A hash
index=1
while read -r a b; do
[[ ${hash[$a]} ]] || hash[$a]=$((index++)) # assign index only if not set already
[[ ${hash[$b]} ]] || hash[$b]=$((index++)) # assign index only if not set already
printf '%s %s\n' "${hash[$a]}" "${hash[$b]}"
done < file > file.indexed
Notes:
the index is assigned in the order read (not based on sorting)
we make a single pass through the file (not two as in other solutions)
Bash's read is slower than awk; however, if the same logic is implemented in Perl or Python, it will be much faster
this solution is more CPU bound because of the hash lookups
Output:
1 2
3 4
5 6
5 7
8 9
Just keep a monotonic counter and a table of seen numbers; when you see a new id, give it the value of the counter and increment:
awk '!a[$1]{a[$1]=++N} {$1=a[$1]} !a[$2]{a[$2]=++N} {$2=a[$2]} 1' input
awk 'NR==FNR { ids[$1] = ++c; next }
{ print ids[$1], ids[$2] }
' <( { cut -d' ' -f1 renum.in; cut -d' ' -f2 renum.in; } | sort -nu ) renum.in
Join the two columns into one, sort that into numerical order (-n) and make it unique (-u), then let awk use this sequence to build an array mapping old ids to new ids.
Then, for each line in the input, swap the ids and print.
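The two-column awk one-liner above can be generalized to any number of columns with a loop over the fields. A sketch (a generalization, not taken from the answers) that numbers ids in order of first appearance; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Number ids in order of first appearance, for any number of columns.
renumber_first_seen() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if (!($i in id)) id[$i] = ++n   # first time this id is seen
            $i = id[$i]
        }
        print
    }' "$@"
}

printf '%s\n' '1342342345345 34523453452343' '0209239498238 29349203492342' \
    '2349234023443 99203900992344' '2349234023443 182834349348' \
    '2923000444 9902342349234' | renumber_first_seen
```

Testing membership with `in` instead of `!id[$i]` avoids relying on the stored value being non-zero.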

Finding highest value of variable

So I am trying to find the highest value among some variables. For example, I have this:
var1=14
var2=15
var3=16
I want to find the biggest value, which is var3, and save it somewhere. Is there a way to do that?
Something like this:
tmp=`sort -n $var1 $var2 $var3 ` (this is an example)
You'll need to get those numbers into an array; from there it's just:
a=(14 15 16) # Example array
IFS=$'\n'
echo "${a[*]}" | sort -nr | head -n1
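If only the largest number is needed, sort is not strictly required. A minimal sketch with an arithmetic loop, assuming the values are integers:

```shell
#!/usr/bin/env bash
# Find the largest value with a plain loop instead of sort (integers only).
var1=14 var2=15 var3=16
a=("$var1" "$var2" "$var3")
max=${a[0]}
for v in "${a[@]:1}"; do
    (( v > max )) && max=$v
done
echo "$max"    # 16
```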
This finds the name of the variable holding the maximum value:
#!/bin/bash
maxvarname() {
for i; do
echo "${!i} $i"
done | sort -nr | sed -n '1s/.* \(.*\)/\1/p'
}
#MAIN
#the variables
var1=14
var2=15
var3=16
vname=$(maxvarname var1 var2 var3) #note, arguments are the NAMES (not values e.g. $var1) - without $
echo "Max value is in the variable named: '$vname' and its value is: ${!vname}"
it prints:
Max value is in the variable named: 'var3' and its value is: 16
max=$(echo $var{1,2,3} | tr ' ' '\n' | sort -nr | head -1)
If the variable assignments are stored in a file, you can sort the file on the value after '=':
$ cat f
var4=18
var1=14
var2=15
var3=16
$ max=$(sort -t'=' -nrk2 f|head -1)
$ echo $max
var4=18

Get common values in 2 arrays in shell scripting [duplicate]

This question already has answers here:
Intersection of two lists in Bash
I have these arrays:
array1 = (20,30,40,50)
array2 = (10,20,30,80,100,110,40)
I have to get the common values from these 2 arrays in my array 3 like:
array3 = (20,30,40)
in ascending sorted order.
Shell and standard Unix utilities are good at dealing with text files.
In that realm, arrays would be text files whose elements are the lines.
To find the common part between two such arrays, there's the standard comm command. comm expects alphabetically sorted input though.
So, if you have two files A and B containing the elements of those two arrays, one per line (which also means the array elements can't contain newline characters), you can find the intersection with
comm -12 <(sort A) <(sort B)
If you want to start with bash arrays (but using arrays in shells is generally a good indication that you're using the wrong tool for your task), you can convert back and forth between the bash arrays and our text file arrays of lines with printf '%s\n' and word splitting:
array_one=(20 30 40 50)
array_two=(10 20 30 80 100 110 40)
IFS=$'\n'; set -f
intersection=($(comm -12 <(
printf '%s\n' "${array_one[@]}" | sort) <(
printf '%s\n' "${array_two[@]}" | sort)))
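A sketch of the same comm approach that reads the result with mapfile instead of changing IFS and set -f globally, and re-sorts the intersection numerically as the question asks. It assumes elements contain no newlines and bash 4+ for mapfile:

```shell
#!/usr/bin/env bash
# Intersection of two bash arrays via comm, re-sorted numerically.
# comm needs both inputs sorted in the same (default) collation, hence plain sort first.
array_one=(20 30 40 50)
array_two=(10 20 30 80 100 110 40)

mapfile -t intersection < <(
    comm -12 <(printf '%s\n' "${array_one[@]}" | sort) \
             <(printf '%s\n' "${array_two[@]}" | sort) |
    sort -n
)
echo "${intersection[@]}"    # 20 30 40
```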
You almost certainly should not be using shell for this so here's ONE awk solution to your specific problem:
awk 'BEGIN{
split("20,30,40,50",array1,/,/)
split("10,20,30,80,100,110,40",array2,/,/)
for (i=1;i in array1;i++)
for (j=1;j in array2;j++)
if (array1[i] == array2[j])
array3[++k] = array1[i]
for (k=1; k in array3; k++)
printf "array3[%d] = %d\n",k,array3[k]
}'
Output:
array3[1] = 20
array3[2] = 30
array3[3] = 40
and if you tell us what you're really trying to do you can get a lot more help.
A pure bash solution using arrays:
#!/bin/bash
array1=(20,30,40,50)
array2=(10,20,30,80,100,110,40)
IFS=,
for i in $array1 $array2;{ ((++tmp[i]));}
for i in ${!tmp[*]};{ [ ${tmp[i]} -gt 1 ] && array3+=($i);}
echo ${array3[*]}
Output
20 30 40
Because tmp is an indexed (not associative) array, ${!tmp[*]} lists its indexes in ascending order, so array3 comes out sorted. If you need a comma-separated list as output, use echo "${array3[*]}".
This works only if the source elements are non-negative integers (they are used as array indexes) and each source array contains no duplicate values.
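To lift the integer-only restriction, an associative array can serve as a set for lookups. A sketch, assuming bash 4+ (`declare -A`) and non-empty elements (an empty string is not a valid associative array key); the result keeps the order of array1, which is already ascending here:

```shell
#!/usr/bin/env bash
# Intersection using an associative array as a set, so elements may be any
# non-empty strings, not only non-negative integers.
array1=(20 30 40 50)
array2=(10 20 30 80 100 110 40)

declare -A in_array2=()
for x in "${array2[@]}"; do in_array2[$x]=1; done

array3=()
for x in "${array1[@]}"; do
    # ${var+set} expands to "set" only if the key exists
    [[ ${in_array2[$x]+set} ]] && array3+=("$x")
done
echo "${array3[@]}"    # 20 30 40
```

If array1 is not already sorted, the result can be passed through printf '%s\n' and sort -n afterwards.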
Here's a solution with standard command line tools (sort and join):
join <(printf %s\\n "${array1[@]}" | sort -u) \
<(printf %s\\n "${array2[@]}" | sort -u) | sort -n
join requires its inputs to be sorted, and does not recognize numerical sort order. Consequently, I sort both lists in the default collation order, join them, and then re-sort the result numerically.
I also assumed that you'd created the arrays really as arrays, i.e.:
array1=(20 30 40 50)
I think the rest is more or less self-evident, possibly with the help of help printf and man bash.
Maybe you can try Perl:
#!/usr/bin/perl
use warnings;
use strict;
my @array1 = (20,30,40,50);
my @array2 = (10,20,30,80,100,110,40);
my @array3 = ();
foreach my $x (@array1) {
# exact numeric match (grep(/$x/, ...) would also match substrings, e.g. 10 in 110)
if (grep { $_ == $x } @array2) {
print "found $x\n";
push @array3, $x;
}
}
print "@array3\n";
In addition to any of these fine answers, it seems that you also want to sort your array (containing the answer) in ascending order.
You can do that in a number of different ways, including this:
readarray -t array3 <<<"$(printf "%s\n" "${array3[@]}" | sort -n)"
This method also allows you to filter out duplicate values:
readarray -t array3 <<<"$(printf "%s\n" "${array3[@]}" | sort -n | uniq)"
And for the sake of the exercise, here's yet another way of solving it:
#!/bin/bash
array1=(20 30 40 50)
array2=(10 20 30 80 100 110 40)
declare -a array3
#sort both arrays
readarray -t array1 <<<"$(printf "%s\n" "${array1[@]}" | sort -n)"
readarray -t array2 <<<"$(printf "%s\n" "${array2[@]}" | sort -n)"
# look for values
i2=0
for i1 in "${!array1[@]}"; do
while (( i2 < ${#array2[@]} && array1[i1] > array2[i2] )); do (( i2++ )); done
[[ ${array1[$i1]} == "${array2[$i2]}" ]] && array3+=("${array1[$i1]}")
done
echo "${array3[@]}"
Consider using Python:
In [6]: array1 = (20,30,40,50)
In [7]: array2 = (10,20,30,80,100,110,40)
In [8]: set(array1) & set(array2)
Out[8]: set([40, 20, 30])
