Sort file by one key only - bash

I have a large log file composed of input from many sources, with each line prefixed with the hostname. The log is the output of operations happening in parallel across many hosts, so the logs are somewhat jumbled together.
What I'd like to do is sort the logs by hostname and nothing else, so that the events for each server still show up in their natural order. The sort docs below seem to imply that -k1,1 should accomplish this, but the lines still end up fully sorted.
-k, --key=POS1[,POS2]
start a key at POS1 (origin 1), end it at POS2 (default end of line)
I've made a simple test file:
1 grape
1 banana
2 orange
3 lemon
1 apple
and the expected output would be:
1 grape
1 banana
1 apple
2 orange
3 lemon
But the observed output is:
$ sort -k1,1 sort_test.txt
1 apple
1 banana
1 grape
2 orange
3 lemon

sort -s -k 1,1 sort_test.txt
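The -s option makes the sort stable: it disables the 'last-resort' comparison of whole lines that sort otherwise uses to break ties between equal keys, so lines with the same key keep their original input order.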
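A minimal sketch that reproduces the question's test file and prints the output with and without -s, assuming GNU coreutils sort. The temporary file and the variable names are illustrative only.

```shell
#!/bin/sh
# Recreate the question's sort_test.txt in a temporary file (name is illustrative).
tmp=$(mktemp)
printf '%s\n' '1 grape' '1 banana' '2 orange' '3 lemon' '1 apple' > "$tmp"

# Without -s, lines whose field-1 keys tie are ordered by a whole-line comparison.
default=$(sort -k1,1 "$tmp")
# With -s, lines with equal keys keep their original input order.
stable=$(sort -s -k1,1 "$tmp")

echo "sort -k1,1:"
printf '%s\n' "$default"
echo "sort -s -k1,1:"
printf '%s\n' "$stable"
rm -f "$tmp"
```

Note that the key is written as -k1,1 rather than -k1: without the end position, the key would run to the end of the line and the whole line would be compared anyway.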

Related

Linux sort by column and in reverse order

I'm trying to sort a file by the second column, but in reverse order.
I tried:
sort -k2n -r file.txt
The output is not in reverse order, so it seems -r is being ignored.
I'm on CentOS.
Try adding a space after -k and before the column position, e.g.:
sort -k 2n -r file.txt
I just needed to remove the "n" next to the column number:
sort -k 2 -r file.txt
Say we have this.txt:
one 1
two 2
three 3
four 4
five 5
Now simply do
$ sort -k2,2nr this.txt
five 5
four 4
three 3
two 2
one 1
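GNU sort documents that ordering options attached to a key (like the n in -k2n) override the global ordering options for that key, so a separate -r is not applied to it. That would explain why -r seems to be ignored. A minimal sketch using the this.txt data above, assuming GNU sort; the temporary file and variable names are illustrative:

```shell
#!/bin/sh
tmp=$(mktemp)
printf '%s\n' 'one 1' 'two 2' 'three 3' 'four 4' 'five 5' > "$tmp"

# The key carries its own "n", so the global -r is not applied to it:
# lines come out in ascending numeric order.
global_r=$(sort -k2n -r "$tmp")
# Attaching r to the key itself (and ending the key at field 2) reverses it.
key_r=$(sort -k2,2nr "$tmp")

printf 'sort -k2n -r:\n%s\n\n' "$global_r"
printf 'sort -k2,2nr:\n%s\n' "$key_r"
rm -f "$tmp"
```

Dropping the n instead (sort -k 2 -r) switches to text comparison, which works for these single-digit values but would place 10 between 2 and 1.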

Combining lines with same string in Bash

I have a file with a bunch of lines that looks like this:
3 world
3 moon
3 night
2 world
2 video
2 pluto
1 world
1 pluto
1 moon
1 mars
I want to combine all lines that contain the same word, summing their preceding numbers, so that it looks like this:
6 world
4 moon
3 pluto
3 night
2 video
1 mars
I've been trying combinations with sed, but I can't seem to get it right. My next idea was to sort them, and then check if the following line was the same word, then add them, but I couldn't figure out how to get it to sort by word rather than the number.
Sum and sort:
awk -F" " '{c[$2]+=$1} END {for (i in c){print c[i], i}}' file.txt | sort -n -r
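A self-contained sketch of the same sum-and-sort idea on the question's data, assuming GNU sort; the temporary file stands in for the input file. Because awk's for-in order is unspecified, the final sort is what fixes the output order.

```shell
#!/bin/sh
tmp=$(mktemp)
printf '%s\n' '3 world' '3 moon' '3 night' '2 world' '2 video' '2 pluto' \
  '1 world' '1 pluto' '1 moon' '1 mars' > "$tmp"

# Accumulate column 1 per word in column 2, then print "total word" pairs.
# LC_ALL=C makes the tie-break between equal totals a plain byte comparison.
result=$(awk '{ c[$2] += $1 } END { for (w in c) print c[w], w }' "$tmp" | LC_ALL=C sort -n -r)
printf '%s\n' "$result"
rm -f "$tmp"
```

Lines with equal totals (3 pluto and 3 night) are ordered by sort's reversed whole-line tie-break, which happens to match the order shown in the question.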

Why does the shell sort return weird characters?

I use the following command to sort the numerical key-value pairs in the input file. I need only a single value for each key; when there are several values for the same key, I want to keep the minimal one.
Input:
2 20
1 10
2 19
Output:
1 10
2 19
I use this shell command:
sort -n -k1 -k2 "$MYFILE" | sort -n -u -k1
Everything works fine for small inputs (hundreds of pairs). I generated a ~3 GB file to measure the time required for sorting, but I was disappointed when the end of the output looked like this:
[several lines of unreadable binary characters]
Where is the problem? Is the input too large for the sort command to process? Or is the pipe the problem?
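For reference, a small sketch of what the pipeline is meant to do on the sample input, with the key ranges written out explicitly (-k1,1 and -k2,2) and the file name quoted. It only illustrates the intended minimum-per-key logic; it does not diagnose the 3 GB failure. It assumes GNU sort, whose -u keeps the first line of each run of equal keys; the temporary file is illustrative.

```shell
#!/bin/sh
tmp=$(mktemp)
printf '%s\n' '2 20' '1 10' '2 19' > "$tmp"

# Step 1: order by key (field 1), then by value (field 2), both numerically,
# so the smallest value for each key comes first.
# Step 2: -u keeps only the first line of each run of equal keys,
# i.e. the minimal value per key.
result=$(sort -n -k1,1 -k2,2 "$tmp" | sort -n -u -k1,1)
printf '%s\n' "$result"
rm -f "$tmp"
```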

Frequency count of particular field appended to line without deleting duplicates

I'm trying to work out how to append or prepend a frequency count to each line in a file WITHOUT deleting duplicate occurrences (which is what uniq -c does).
So, if the input file is:
mango
mango
banana
apple
watermelon
banana
I need this output:
mango 2
mango 2
banana 2
apple 1
watermelon 1
banana 2
All the solutions I have seen delete the duplicates. In other words, what I DON'T want is:
mango 2
banana 2
apple 1
watermelon 1
Basically, you cannot do this in one pass without keeping everything in memory. If that is acceptable, use python/perl/awk/whatever; the algorithm is quite simple.
Let's do it with standard Unix tools. This is a bit cumbersome and can be improved, but it should do the job:
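A minimal sketch of that in-memory approach with awk, using the question's sample data (the temporary file is illustrative): it stores every line plus a count per distinct line, then prints each line with its count in the original order.

```shell
#!/bin/sh
tmp=$(mktemp)
printf '%s\n' mango mango banana apple watermelon banana > "$tmp"

# Keep every line (in order) and a count per distinct line in memory,
# then print each original line followed by its total count.
result=$(awk '{ line[NR] = $0; count[$0]++ }
  END { for (i = 1; i <= NR; i++) print line[i], count[line[i]] }' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

If memory is a concern, reading the file twice, as in awk 'NR == FNR { c[$0]++; next } { print $0, c[$0] }' input input, keeps only the counts in memory.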
$ sort input | uniq -c > input.count
$ nl input | sort -k 2,2 > input.line
$ join -1 2 -2 2 input.line input.count | sort -k 2,2n | awk '{print $1 " " $3}'
The first step counts the number of occurrences of each word.
As you said, uniq cannot both repeat lines and keep their ordering, so we have to fix that. The second step prepends the line number, which we will use later to restore the original order.
In the last step, we join the two temporary files on the original word. The second column then contains the original line number, so we sort on this key and strip it from the final output.

How to sort a file with a tab delimiter

I generate a text file in which the values are separated by \t:
value1 value2 value3
I want to sort this text file by value1:
sort a.txt -o a.txt1
But the result is wrong:
google 1 1
google 1 2
google 1 3
=google 1 4
google 1 3
The =google line was inserted between the google lines. Why does this happen? It's very strange.
I also tried sort a.txt -t $'\t' -k 1 -o a.txt1, but it has the same issue.
Your locale apparently specifies that = should be ignored when sorting. Try replacing sort with LC_ALL=C sort. This runs sort with the environment variable LC_ALL temporarily set to C, which overrides your locale (in any locale-aware program) with the traditional, locale-independent "C" locale.
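A minimal sketch of the LC_ALL=C fix on tab-separated rows like the question's (the temporary file is illustrative). In the C locale sort compares raw bytes, and '=' (0x3D) comes before 'g' (0x67), so the =google row sorts first instead of being interleaved with the google rows.

```shell
#!/bin/sh
tab=$(printf '\t')
tmp=$(mktemp)
# Sample rows from the question, tab-separated.
printf 'google\t1\t1\ngoogle\t1\t2\ngoogle\t1\t3\n=google\t1\t4\ngoogle\t1\t3\n' > "$tmp"

# Byte-wise comparison on field 1 only; ties fall back to whole-line comparison.
sorted=$(LC_ALL=C sort -t "$tab" -k1,1 "$tmp")
printf '%s\n' "$sorted"
rm -f "$tmp"
```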
sort -n x.txt
google 1 1
google 1 2
google 1 3
google 1 3
=google 1 4
