Sort integers by absolute value - bash

I have a list of integers and I want to sort them with sort, but on the absolute value of the integers. For example, 7 0 5 10 -2 should give 0 -2 5 7 10 (the integers are on separate lines in my file).
I don't think there is an option in sort to do that, but I can't find another command to sort lines either. The -n option sorts in natural order and -g is not what I want.
I tried looking at awk, but I don't know if it can help me here.

Use
cat numbers.txt | sed -r 's/-([0-9]+)/\1-/g;' | sort -n | sed -r 's/([0-9]+)-/-\1/g;'
The first sed moves the minus sign behind the digits, sort -n sorts by the now-unsigned leading number, and the second sed moves the minus sign back in front of the digits.
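For the example input (one integer per line), the stream after the first sed and the sort looks like this; the trailing minus is invisible to -n, so the lines are ordered by magnitude:
0
2-
5
7
10
The second sed then turns 2- back into -2.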

I can't find this documented anywhere, but when you run sort -Vd it sorts by absolute value. It's a combination of the "version sort" (-V) and "dictionary order" (-d) options: -d considers only blanks and alphanumeric characters, which effectively hides the minus sign, and -V then compares the remaining digit runs numerically. With 1 5 3 7 -2 -4 -9, version sort on its own does something like this:
1
3
5
7
-2
-4
-9
And numeric sort (-n) on its own sorts like this:
-9
-4
-2
1
3
5
7
And with both options (-Vd), it sorts like this:
1
-2
3
-4
5
7
-9
I don't know if this is by design or by accident, and I've only tested it in GNU sort. I have found this trick to be very useful for certain code golfing situations.
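To try it for yourself (GNU sort only, and again, apparently unspecified behavior):
$ printf '%s\n' 1 5 3 7 -2 -4 -9 | sort -Vd
1
-2
3
-4
5
7
-9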

A one-line Perl solution, which also works on floating-point values. For example:
$ cat numbers.txt
1 -100 5 -4 7 -9 12 25.3 1.8 -1 33.5
$ perl -lane 'print(join " ", sort {abs($a) <=> abs($b)} @F);' numbers.txt
1 -1 1.8 -4 5 7 -9 12 25.3 33.5 -100
If you want the order to be descending, just swap the $a and $b variables:
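$ perl -lane 'print(join " ", sort {abs($b) <=> abs($a)} @F);' numbers.txt
-100 33.5 25.3 12 -9 7 5 -4 1.8 1 -1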

If your file is named fname then the following should work:
paste <(sed 's/-//' fname) fname | sort -n | cut -f 2
The sed strips out the - to generate an absolute value; paste joins that absolute value to the original line as the first column; sort -n sorts on it; and cut -f 2 keeps only the original values.
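With fname holding 7 0 5 10 -2 one per line, the intermediate stream looks like this (tab-separated):
$ paste <(sed 's/-//' fname) fname
7	7
0	0
5	5
10	10
2	-2
sort -n then orders on column 1 and cut -f 2 yields 0 -2 5 7 10.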

Related

Multiply all values in txt file in bash

I have a file in which I need to multiply each number by -1. I have tried some commands, but every time the result is only the first column multiplied by -1. Please help!
The file is as follows:
-1 2 3 -4 5 -6
7 -8 9 10 12 0
The expected output would be
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
Commands I have tried are:
awk '{print $0*-1}' file
sed 's/$/ -1*p /' file | bc (syntax error)
sed 's/$/ * -1 /' file | bc (syntax error)
numfmt --from-unit=-1 < file (error: numfmt: invalid unit size: ‘-1’)
With bash and an array:
while read -r -a arr; do
  declare -ia 'arr_multiplied=( "${arr[@]/%/*-1}" )'
  echo "${arr_multiplied[*]}"
done < file
Output:
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
I got this idea from this Stack Overflow answer by j4x.
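What makes this work: the expansion ${arr[@]/%/*-1} appends the literal text *-1 to every element, and declare -i then evaluates each assignment arithmetically. A quick illustration:
$ arr=(1 -2 3); echo "${arr[@]/%/*-1}"
1*-1 -2*-1 3*-1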
One awk approach:
$ awk '{for (i=1;i<=NF;i++) $i=$i*-1} 1' file
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
Using the <var_or_field><op>=<value> construct:
$ awk '{for (i=1;i<=NF;i++) $i*=-1} 1' file
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
Using perl and its autosplit mode:
perl -lane 'print join(" ", map { $_ * -1 } @F)' file
To multiply every number in the file by -1, you can use the following awk command:
awk '{ for (i=1; i<=NF; i++) $i=$i*-1; print }' file
This command reads each line of the file, and for each field (number) in the line, it multiplies it by -1. It then prints the modified line.
The output will be as follows:
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
Alternatively, you can use sed, but a placeholder is needed so that the two sign swaps don't clobber each other:
sed 's/-\([0-9]\)/~\1/g; s/\(^\| \)\([0-9]\)/\1-\2/g; s/~//g; s/-0\b/0/g' file
The first expression marks the negative numbers with ~, the second negates the remaining (positive) ones, the third drops the marker, and the last turns -0 back into 0. The output is the same as above.
For completeness, an approach with Ruby.
-l Line-ending processing
-a Auto-splitting, provides $F (field, set with -F)
-p Auto-prints $_ (line)
-e Execute code
ruby -lape '$_ = $F.map {|x| x.to_i * -1}.join " "' file
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
Just switching the signs ought to do.
$: cat file
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
$: sed 's/^ */ /; s/  */ /g; s/ -/ +/g; s/ \([0-9]\)/ -\1/g; s/+//g; s/-0/0/g; s/^ *//;' file
-1 2 3 -4 5 -6
7 -8 9 10 12 0
If you don't care about leading spaces or signs on your zeros, you can drop some of that. The logic is flexible, too...
$: sed 's/ *-/+/g; s/ / -/g; s/+/ /g;' file
1 2 3 -4 5 -6
7 -8 9 10 12 -0
There are multiple ways we can do this. Here are two:
cat file | awk '{for (i=1;i<=NF;i++){ $i*=-1} print}'
This gives:
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
In this method we overwrite each $i in place and then print the rebuilt $0.
Another way
cat file | awk '{for (i=1;i<=NF;i++){printf("%d ",$i*-1)} printf("\n") }'
Gives the output
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
In this method we print each value $i*-1 directly, so we need the printf() function.
don't "do math" and actually multiply by -1 -
just use regex to flip the signs, and process thousands or even millions of numbers with 3 calls to gsub()
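A sketch of what that might look like (my reconstruction of the idea, not the commenter's actual code); the underscore is an arbitrary placeholder, and restricting the second gsub to nonzero numbers avoids emitting -0:
awk '{
  gsub(/-/, "_")              # mark existing minus signs with a placeholder
  gsub(/[1-9][0-9]*/, "-&")   # negate every nonzero number
  gsub(/_-/, "")              # marked numbers were already negative: drop both signs
} 1' file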

sort: wrong order when comparing according to numerical value

I'm trying to sort the lines in the following file according to numerical value in the second column:
2 117.336136
1 141.003021
1 342.389160
1 169.059006
1 208.173370
1 117.608192
However, for some reason, the following command returns the lines in the wrong order:
cat file | sort -n -k2
1 117.608192
2 117.336136
1 141.003021
1 169.059006
1 208.173370
1 342.389160
The first two lines are swapped. For other lines, the content of the first column does not affect the result.
Without the -k argument, the sort works exactly as expected:
cat file | cut -d' ' -f2 | sort -n
117.336136
117.608192
141.003021
169.059006
208.173370
342.389160
Why is that? Did I misunderstand the meaning of the -k argument?
Additional information:
LC_ALL=cs_CZ.utf8
sort --version gives sort (GNU coreutils) 8.31
sort compares according to your locale settings.
As mentioned by @KamilCuk, the decimal separator for cs_CZ is a comma, not a period, so under that locale -n stops reading each number at the . and compares only the integer part (which is why 117.336136 and 117.608192 compare equal and the order falls back to a last-resort comparison of the whole line). You can override the locale with LC_ALL=C.UTF-8, or LC_ALL=C (no UTF-8 support), to sort with the C locale's . separator.
The examples below use -g (general numeric), which also handles scientific notation; once the decimal-separator issue is fixed, plain -n sorts these fractions correctly too.
Also important when using sort: restrict the key to the specific column with -k 2,2g, because by default a key like -k 2 extends to the end of the line.
$ LC_ALL=C.UTF-8 sort -k 2,2g test_sort.txt
2 117.336136
1 117.608192
1 141.003021
1 169.059006
1 208.173370
1 342.389160
$ LC_ALL=C sort -k 2,2g test_sort.txt
2 117.336136
1 117.608192
1 141.003021
1 169.059006
1 208.173370
1 342.389160
$ printf '1\t5.3\t6.0\n2\t5.3\t5.0\n'
1 5.3 6.0
2 5.3 5.0
# Sort uses the rest of the line to sort.
$ printf '1\t5.3\t6.0\n2\t5.3\t5.0\n' | LC_ALL=C.UTF-8 sort -k 2
2 5.3 5.0
1 5.3 6.0
# Sort only uses the second column.
$ printf '1\t5.3\t6.0\n2\t5.3\t5.0\n' | LC_ALL=C.UTF-8 sort -k 2,2
1 5.3 6.0
2 5.3 5.0
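As an aside, the practical difference between -n and -g (beyond locale issues) shows up with scientific notation, which only -g parses:
$ printf '2e3\n500\n' | LC_ALL=C sort -n    # -n reads 2e3 as just 2
2e3
500
$ printf '2e3\n500\n' | LC_ALL=C sort -g
500
2e3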

unix sort groups by their associated maximum value?

Let's say I have this input file 49142202.txt:
A 5
B 6
C 3
A 4
B 2
C 1
Is it possible to sort the groups in column 1 by the value in column 2? The desired output is as follows:
B 6 <-- B group at the top, because 6 is larger than 5 and 3
B 2 <-- 2 less than 6
A 5 <-- A group in the middle, because 5 is smaller than 6 and larger than 3
A 4 <-- 4 less than 5
C 3 <-- C group at the bottom, because 3 is smaller than 6 and 5
C 1 <-- 1 less than 3
Here is my solution:
join -t$'\t' -1 2 -2 1 \
<(cat 49142202.txt | sort -k2nr,2 | sort --stable -k1,1 -u | sort -k2nr,2 \
| cut -f1 | nl | tr -d " " | sort -k2,2) \
<(cat 49142202.txt | sort -k1,1 -k2nr,2) \
| sort --stable -k2n,2 | cut -f1,3
The first input to join sorted by column 2 is this:
2 A
1 B
3 C
The second input to join sorted by column 1 is this:
A 5
A 4
B 6
B 2
C 3
C 1
The output of join is:
A 2 5
A 2 4
B 1 6
B 1 2
C 3 3
C 3 1
The join output is then sorted by the nl line number in column 2, and cut keeps the original input columns 1 and 3.
I know it can be done much more easily with, for example, pandas' groupby in Python, but is there a more elegant way of doing it while sticking to GNU coreutils such as sort, join, cut, tr and nl? Preferably I want to avoid a memory-inefficient awk solution, but please share those as well. Thanks!
As explained in the comments, my solution tries to reduce the number of pipes, unnecessary cat commands, and especially the number of sort operations in the pipeline, since sorting is complex and time-consuming.
I reached the following solution where f_grp_sort is the input file:
for elem in $(sort -k2nr f_grp_sort | awk '!seen[$1]++{print $1}')
do
  grep $elem <(sort -k2nr f_grp_sort)
done
OUTPUT:
B 6
B 2
A 5
A 4
C 3
C 1
Explanations:
sort -k2nr f_grp_sort will generate the following output:
B 6
A 5
A 4
C 3
B 2
C 1
and sort -k2nr f_grp_sort | awk '!seen[$1]++{print $1}' will generate the output:
B
A
C
the awk just emits, in order of first appearance, one unique element from the first column of that sorted output.
Then the for elem in $(...); do grep $elem <(sort -k2nr f_grp_sort); done loop greps the sorted file for the lines of B, then A, then C, which produces the required output.
Now, as an enhancement, you can use a temporary file to avoid running the sort -k2nr f_grp_sort operation twice:
$ sort -k2nr f_grp_sort > tmp_sorted_file && for elem in $(awk '!seen[$1]++{print $1}' tmp_sorted_file); do grep $elem tmp_sorted_file; done && rm tmp_sorted_file
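One caveat (my note, not part of the original answer): a bare grep $elem also matches longer group names that contain $elem as a substring, so anchoring the match is safer:
grep -w -- "$elem" tmp_sorted_file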
So, this won't work for all cases, but if the values in your first column can be turned into bash variables, we can use dynamically named arrays to do this instead of a bunch of joins. It should be pretty fast.
The first while block reads the file, putting the first two space-separated strings of each line into col1 and col2. We then build a series of arrays named like ARR_A and ARR_B, where A and B are the values from column 1 (but only if $col1 contains nothing but characters that can be used in bash variable names). Each array holds the column 2 values associated with that column 1 value.
I use your fancy sort chain to get the order in which the column 1 values should print; we loop through them, and for each column 1 array we sort its values and echo out column 1 and column 2.
The dynamic variable bits can be hard to follow, but for the right values in column 1 it will work. Again, if column 1 contains any characters that can't be part of a bash variable name, this solution will not work.
file=./49142202.txt
while read col1 col2 extra
do
  if [[ "$col1" =~ ^[a-zA-Z0-9_]+$ ]]
  then
    eval 'ARR_'${col1}'+=("'${col2}'")'
  else
    echo "Bad character detected in Column 1: '$col1'"
    exit 1
  fi
done < "$file"

sort -k2nr,2 "$file" | sort --stable -k1,1 -u | sort -k2nr,2 | while read col1 extra
do
  for col2 in $(eval 'printf "%s\n" "${ARR_'${col1}'[@]}"' | sort -rn)
  do
    echo $col1 $col2
  done
done
This was my test, a little more complex than your provided example:
$ cat 49142202.txt
A 4
B 6
C 3
A 5
B 2
C 1
C 0
$ ./run
B 6
B 2
A 5
A 4
C 3
C 1
C 0
Thanks a lot @JeffBreadner and @Allan! I came up with yet another solution, which is very similar to my first one, but gives a bit more control, because it allows for easier nesting with for loops:
for x in $(sort -k2nr,2 $file | sort --stable -k1,1 -u | sort -k2nr,2 | cut -f1); do
  awk -v x=$x '$1==x' $file | sort -k2nr,2
done
Do you mind if I don't accept either of your answers until I have time to evaluate the time and memory performance of your solutions? Otherwise I would probably just go for the awk solution by @Allan.
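For reference, a decorate-sort-undecorate sketch along the same lines (my addition, not from the thread): a first awk pass stores one maximum per group, a second pass prefixes each line with its group's maximum, and sort/cut do the rest. Memory use is one value per group, at the cost of reading the file twice:
awk 'NR==FNR { if (!($1 in max) || $2+0 > max[$1]+0) max[$1] = $2; next }
     { print max[$1], $0 }' 49142202.txt 49142202.txt \
  | sort -k1,1nr -k3,3nr | cut -d' ' -f2-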

Sort output in bash script by number of occurrences

So I have some text output that has HTTP status codes in one column and an IP address in the other. I want to sort this by number of occurrences, so that
1 2 1 3 4 5 4 4
looks like
4 4 4 1 1 2 3 5
This is for the second column of status codes; the IP addresses don't need to be sorted in any particular order.
Since 4 is the most common one it should be first, then 1, and so forth.
However, all that I can find is how to use uniq to count the occurrences, which removes the duplicates and prefixes a count to each row.
The regular sort command does not support this as far as I can tell.
Any help would be appreciated.
You can still use sort | uniq -c, then expand the counts back out by printing each value as many times as it occurred:
tr ' ' '\n' < file \
| sort | uniq -c | sort -k1,1nr -k2n \
| while read times status ; do
    for i in $(seq 1 $times); do
      printf '%s ' $status
    done
  done
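For the example values, the intermediate counting stage looks like this; the while loop then expands it to 4 4 4 1 1 2 3 5:
$ tr ' ' '\n' < file | sort | uniq -c | sort -k1,1nr -k2n
      3 4
      2 1
      1 2
      1 3
      1 5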

Minimal two column numeric input data for `sort` example, with distinct permutations

What's the least number of rows of two-column numeric input needed to produce four unique sort outputs under the following four option sets?
1. -sn -k1
2. -sn -k2
3. -sn -k1 -k2
4. -sn -k2 -k1
Here's a 6-row example (with 4 unique outputs):
6 5
3 7
6 3
2 7
4 4
5 2
As a convenience, here's a function that takes two columns of numbers as arguments and prints the number of unique outputs among those four sorts (requires the moreutils pee command):
# Usage: foo c1_1 c2_1 c1_2 c2_2 ...
foo() { echo "$@" | tr -s '[:space:]' '\n' | paste - - | \
pee "sort -sn -k1 | md5sum" \
"sort -sn -k2 | md5sum" \
"sort -sn -k1 -k2 | md5sum" \
"sort -sn -k2 -k1 | md5sum" | \
sort -u | wc -l ; }
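For example, the 6-row input above checks out:
$ foo 6 5 3 7 6 3 2 7 4 4 5 2
4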
So to count the unique permutations of this input:
8 5
3 1
8 3
Run this:
foo 8 5 3 1 8 3
Output:
2
(Only two unique outputs. Not enough...)
Note: This question was inspired by the obscurity of the current version of the sort manual, specifically COLUMNS=65 man sort | grep -A 17 KEYDEF | sed 3,18d. The info sort page's treatment of KEYDEFs is much better.
KEYDEFs are more useful than they might first seem. The -u or --unique switch works nicely with KEYDEFs: in effect it lets sort delete unwanted redundant lines, so it can furnish a more concise substitute for certain sed or awk scripts and similar pipelines.
I can do it in 3 by varying the whitespace:
1 1
2 1
1 2
Your foo function doesn't produce this kind of output, but since it was only a "convenience" and not a part of the question proper, I declare this answer correct and minimal!
Sneakier version:
2 1
11 1
2 2
(The last line contains a tab; the others don't.)
With the -s option, I can't exploit non-numeric comparisons, but then I can exploit the stability of the sort:
1 2
2 1
1 1
The 1 1 line goes above both of the others if both fields are compared numerically, regardless of which comparison is done first. The ordering of the two comparisons determines the ordering of the other two lines.
On the other hand, if one of the fields isn't used for comparison, the 1 1 line stays below one of the other lines (and which one that is depends on which field is used for comparison).
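For the record, the four orderings of that input, worked out by hand:
-sn -k1     -sn -k2     -sn -k1 -k2   -sn -k2 -k1
1 2         2 1         1 1           1 1
1 1         1 1         1 2           2 1
2 1         1 2         2 1           1 2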
