Stable sorting two files into one with the duplicates - bash

I've been trying to sort two files into a single output.
Say file 1 contains:
102310863||7097881||6845123||271640||06007709532577||||
102310875||7092992||6840818||023740||10034500635650||||
and file 2:
102310863||7097881||6845193||271640||06007709532577||||
102310875||7092992||6840808||023740||10034500635650||||
The desired output is:
102310863||7097881||6845123||271640||06007709532577||||
102310863||7097881||6845193||271640||06007709532577||||
102310875||7092992||6840818||023740||10034500635650||||
102310875||7092992||6840808||023740||10034500635650||||
I've been trying to use the sort command
sort -t \| -n -k1,1 t1.txt t2.txt
but it is giving me the output
102310863||7097881||6845123||271640||06007709532577||||
102310863||7097881||6845193||271640||06007709532577||||
102310875||7092992||6840808||023740||10034500635650||||
102310875||7092992||6840818||023740||10034500635650||||
which is not what I want because original file order is not preserved.
Is there any other way of doing it to get the desired output?

Using the -s flag performs a stable sort.
sort -s -t \| -k1,1 t1.txt t2.txt
From man sort:
-s, --stable
stabilize sort by disabling last-resort comparison
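As a minimal sketch, the stable sort can be checked against the sample data from the question. The scratch directory and the merged variable are illustrative names only, and the sketch assumes a sort that supports -s (GNU sort does; POSIX does not require it).

```shell
#!/usr/bin/env bash
# Recreate the two sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '102310863||7097881||6845123||271640||06007709532577||||' \
  '102310875||7092992||6840818||023740||10034500635650||||' > "$tmpdir/t1.txt"
printf '%s\n' \
  '102310863||7097881||6845193||271640||06007709532577||||' \
  '102310875||7092992||6840808||023740||10034500635650||||' > "$tmpdir/t2.txt"

# -s disables the whole-line tie-breaker, so lines whose first field is
# equal stay in input order: the t1.txt line before the t2.txt line.
merged=$(sort -s -t '|' -k1,1 "$tmpdir/t1.txt" "$tmpdir/t2.txt")
printf '%s\n' "$merged"

rm -rf "$tmpdir"
```

Without -s, sort falls back to comparing whole lines when the keys tie, which is why 6840808 was printed before 6840818 in the question.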

Related

How to sort by numbers that are part of a filename in bash?

I'm trying to assign a variable in bash to the file in this directory with the largest number before the '.tar.gz' and I'm drawing a complete blank on the best way to approach this:
ls /dirname | sort
daily-500-12345.tar.gz
daily-500-12345678.tar.gz
daily-500-987654321.tar.gz
weekly-200-1111111.tar.gz
monthly-100-8675309.tar.gz
sort -Vrt - -k3,3
-V Natural sort
-r Reverse, so you can use head -1 to get the first line only
-t - Use hyphen as field separator
-k3,3 Sort using only the third field
Output:
daily-500-987654321.tar.gz
daily-500-12345678.tar.gz
monthly-100-8675309.tar.gz
weekly-200-1111111.tar.gz
daily-500-12345.tar.gz
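To get from that listing to the variable the question asks for, one possible sketch pipes the sorted names into head. It assumes a sort with -V (GNU coreutils); the printf list stands in for the output of ls /dirname, and latest is an illustrative variable name.

```shell
#!/usr/bin/env bash
# Sample names from the question stand in for `ls /dirname`.
latest=$(printf '%s\n' \
  daily-500-12345.tar.gz \
  daily-500-12345678.tar.gz \
  daily-500-987654321.tar.gz \
  weekly-200-1111111.tar.gz \
  monthly-100-8675309.tar.gz |
  sort -V -r -t - -k3,3 |   # natural sort, reversed, on the third hyphen-separated field
  head -n 1)                # keep only the name with the largest number

printf '%s\n' "$latest"
```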

sorting file names ascending where names have a dash in bash

I have a list of files in a folder.
The names are:
1-a
100-a
2-b
20-b
3-x
and I want to sort them like
1-a
2-b
3-x
20-b
100-a
The files are always a number, followed by a dash, followed by anything.
I tried a ls with a col and sort and it works, but I wanted to know if there's a simpler solution.
Forgot to mention: this is bash running on Mac OS X.
Some ls implementations, GNU coreutils' ls is one of them, support the -v (natural sort of (version) numbers within text) option:
% ls -v
1-a 2-b 3-x 20-b 100-a
or:
% ls -v1
1-a
2-b
3-x
20-b
100-a
Use sort to define the fields.
sort -s -t- -k1,1n -k2 filenames.txt
The -t- option tells sort to treat - as the field separator in input items. -k1,1n instructs sort to sort on the first field numerically; -k2 uses the remaining fields as the second key in case the first fields are equal. -s keeps the sort stable (although you could omit it, since the entire input string is covered by one key or the other).
(Note: I'm assuming the file names do not contain newlines, so that something like ls > filenames.txt is guaranteed to produce a file with one name per line. You could also use ls | sort ... in that case.)
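A short sketch of that command on the names from the question, feeding them through printf rather than a filenames.txt file (the sorted variable is an illustrative name):

```shell
#!/usr/bin/env bash
# One name per line, in the unsorted order given in the question.
sorted=$(printf '%s\n' 1-a 100-a 2-b 20-b 3-x |
  sort -s -t- -k1,1n -k2)   # numeric on the part before the dash, then the rest

printf '%s\n' "$sorted"
```

This uses only field and numeric keys, so it should not depend on GNU-only options such as -V.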

How to simulate "sort -V" on macOS

I have written a bash script that I need to work identically on linux and macOS that relies on the sort command. I am piping the output of git tag -l to sort, to get a list of all the version tags in the correct semantic order. GNU offers -V which makes this automagic but macOS does not support this argument, so I need to figure out how to accomplish this sort order without it.
6.3.1.1
6.3.1.10
6.3.1.11
6.3.1.2
6.3.1.3
...
needs to be sorted as
6.3.1.1
6.3.1.2
6.3.1.3
...
6.3.1.10
6.3.1.11
You can use additional features of git tag to get a list of tags matching a pattern and sorted properly for version tag ordering (typically no leading zeros):
$ git tag --sort v:refname
v0.0.0
v0.0.1
v0.0.2
v0.0.3
v0.0.4
v0.0.5
v0.0.6
v0.0.7
v0.0.8
v0.0.9
v0.0.10
v0.0.11
v0.0.12
From $ man git-tag:
--sort=<type>
Sort in a specific order. Supported type is "refname"
(lexicographic order), "version:refname" or "v:refname"
(tag names are treated as versions). Prepend "-" to reverse
sort order. When this option is not given, the sort order
defaults to the value configured for the tag.sort variable
if it exists, or lexicographic order otherwise. See
git-config(1).
You can download coreutils from http://rudix.org/packages/index.html
It contains gnusort, which supports the sort -V syntax.
sed 's/\b\([0-9]\)\b/0\1/g' versions.txt | sort | sed 's/\b0\([0-9]\)/\1/g'
To explain why this works, consider the first sed command by itself. With your input as versions.txt, the first sed command adds a leading zero onto single-digit version numbers, producing:
06.03.01.01
06.03.01.02
06.03.01.03
06.03.01.10
06.03.01.11
The above can be sorted normally. After that, it is a matter of removing the added characters. In the full command, the last sed command removes the leading zeros to produce the final output:
6.3.1.1
6.3.1.2
6.3.1.3
6.3.1.10
6.3.1.11
This works as long as version numbers are 99 or less. If you have version numbers over 99 but less than 1000, the command gets only slightly more complicated:
sed 's/\b\([0-9]\)\b/00\1/g ; s/\b\([0-9][0-9]\)\b/0\1/g' versions.txt | sort | sed 's/\b0\+\([0-9]\)/\1/g'
As I don't have a Mac, the above were tested on Linux.
UPDATE: In the comments, Jonathan Leffler says that even though word boundary (\b) is in Mac regex docs, Mac sed doesn't seem to recognize it. He suggests replacing the first sed with:
sed 's/^[0-9]\./0&/; s/\.\([0-9]\)$/.0\1/; s/\.\([0-9]\)\./.0\1./g; s/\.\([0-9]\)\./.0\1./g'
So, the full command might be:
sed 's/^[0-9]\./0&/; s/\.\([0-9]\)$/.0\1/; s/\.\([0-9]\)\./.0\1./g; s/\.\([0-9]\)\./.0\1./g' versions.txt | sort | sed 's/^0// ; s/\.0/./g'
This handles version numbers up to 99.
The standard sort that comes installed on OS X can sort by fields using a separator. So you can sort the version numbers and any suffixes.
The following pipeline sorts by suffix first and then by the X.Y.Z parts; it can also sort the -N-g format version numbers produced by the git describe --tags command:
sort -s -t- -k 2,2n | sort -t. -s -k 1,1n -k 2,2n -k 3,3n -k 4,4n
0.11.1
0.11.4
0.11.9-1-ge6b0c59
0.12.0
0.12.1
0.12.2-1-g2d0a334
0.13.0
0.13.0-1-g7711b16
0.13.0-2-g32f91bd
0.13.0-3-g83e21c5
0.14.1-alpha
0.14.1
0.14.2
The -3-g83e21c5 above is an example of a suffix that the git describe --tags command will automatically append to the latest tag to signify the number of commits since the tag (3) and the Git SHA hash of the most recent commit (83e21c5).
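As a rough sketch, the ascending two-stage sort can be tried on a shuffled subset of the tags above. The function name version_sort_asc is made up for this example. The first sort orders by the commit count after the first dash, and the second, stable sort then orders by the dotted fields without disturbing that order among equal versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read tags on stdin, write them in ascending order.
version_sort_asc() {
  sort -s -t- -k 2,2n |                          # stage 1: suffix number (missing suffix counts as 0)
    sort -s -t. -k 1,1n -k 2,2n -k 3,3n -k 4,4n  # stage 2: X.Y.Z fields, stable
}

sorted=$(printf '%s\n' \
  0.13.0-2-g32f91bd \
  0.12.1 \
  0.13.0 \
  0.11.9-1-ge6b0c59 \
  0.13.0-1-g7711b16 \
  0.11.4 | version_sort_asc)

printf '%s\n' "$sorted"
```

Both stages need -s: without it, the last-resort whole-line comparison would reorder lines whose keys tie.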
To reverse the sort into descending order, do this:
sort -s -t- -k 2,2nr | sort -t. -s -k 1,1nr -k 2,2nr -k 3,3nr -k 4,4nr
Or you can define a shell function around it.
version_sort() {
  # read stdin, sort by version number descending, and write stdout
  # assumes X.Y.Z version numbers
  # this will sort tags like pr-3001, pr-3002 to the END of the list
  # and tags like 2.1.4 BEFORE 2.1.4-gitsha
  sort -s -t- -k 2,2nr | sort -t. -s -k 1,1nr -k 2,2nr -k 3,3nr -k 4,4nr
}
Or write it into a small file named version-sort and put it in a directory on your PATH. Be sure to run chmod +x on the file:
#!/usr/bin/env bash
sort -s -t- -k 2,2nr | sort -t. -s -k 1,1nr -k 2,2nr -k 3,3nr -k 4,4nr
brew install coreutils
If coreutils is installed, you should have gsort on your Mac:
gsort --version

"sort -c -k1" compares second field too?

"sort" correctly reports these two lines are out of order:
> echo "a b\na a" | sort -c
sort: -:2: disorder: a a
How do I tell sort to compare only the first field of each line? I tried:
> echo "a b\na a" | sort -c -k1
sort: -:2: disorder: a a
but it failed, as above.
Can I make sort compare the first field of each line only, or must I
use something like sed to trim the lines before comparing them?
EDIT: I'm using "sort (GNU coreutils) 7.2". I tried using a different field separator but it didn't help:
> echo "a b\na a" | sort -k1 -c -t" "
sort: -:2: disorder: a a
although I'm pretty sure space is the default separator anyway.
The following works as expected:
echo "a b\na a" | sort -s -c -k1,1
There were two problems with your sort invocation:
The argument to -k is a key definition that specifies a start and end position. If end position is omitted, it defaults to the last field of the line, not the start field. -k1,1 specifies both, telling sort not to include the second field in the comparison.
sort is not stable by default, which means it doesn't guarantee not to disturb the order of lines that compare equal. Quoting the documentation:
Finally, as a last resort when all keys compare equal, sort compares
entire lines as if no ordering options other than --reverse (-r)
were specified. The --stable (-s) option disables this
"last-resort comparison" so that lines in which all fields compare
equal are left in their original relative order.
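The difference between the two key definitions can be checked through the exit status of sort -c. This is a sketch assuming GNU sort; it uses printf instead of echo so the \n is interpreted the same way in every shell, and the variable names are illustrative.

```shell
#!/usr/bin/env bash
# -k1 means "from field 1 to the end of the line", so the second field
# is still compared and the input is reported as out of order.
if printf 'a b\na a\n' | LC_ALL=C sort -c -k1 2>/dev/null; then
  whole_line=ordered
else
  whole_line=disorder
fi

# -k1,1 limits the key to the first field; -s disables the last-resort
# whole-line comparison, so equal first fields count as in order.
if printf 'a b\na a\n' | LC_ALL=C sort -s -c -k1,1 2>/dev/null; then
  first_field=ordered
else
  first_field=disorder
fi

printf '%s\n' "-k1: $whole_line" "-s -k1,1: $first_field"
```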

How to sort the output obtained with grep -c?

I use the following 'grep' command to get the count of the string alert in each of my files at the given path:
grep 'alert' -F /usr/local/snort/rules/* -c
How do I sort the resulting output in a desired order, say ascending, descending, or ordered by name? An answer specific to these cases is sufficient.
You may freely suggest a command other than grep as well.
Pipe it into sort. Assuming your filenames have no colons, use the "-t" option to specify the colon as the field separator. Use -n for numerical sorting.
Example:
grep 'alert' -F /usr/local/snort/rules/* -c | sort -t: -n -k2
should split lines into fields separated by ":", use the second field for sorting, and treat this as numbers (so 21 is actually later than 3).
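A small sketch covering the three orders the question mentions. The rule files and their contents are made up for the example and live in a scratch directory rather than /usr/local/snort/rules; it assumes, as above, that file names contain no colons.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)

# Hypothetical rule files with 12, 3 and 0 lines containing "alert".
awk 'BEGIN { for (i = 0; i < 12; i++) print "alert tcp any any" }' > "$tmpdir/web.rules"
printf 'alert udp\nalert udp\nalert icmp\n' > "$tmpdir/ftp.rules"
printf 'pass tcp\n' > "$tmpdir/dns.rules"

# grep -c prints one "name:count" line per file.
counts=$(cd "$tmpdir" && grep -c -F 'alert' *.rules)

by_count_asc=$(printf '%s\n' "$counts" | sort -t: -k2,2n)    # ascending by count
by_count_desc=$(printf '%s\n' "$counts" | sort -t: -k2,2nr)  # descending by count
by_name=$(printf '%s\n' "$counts" | sort -t: -k1,1)          # by file name

printf '%s\n' "$by_count_asc"
rm -rf "$tmpdir"
```

Writing the key as -k2,2n rather than -k2 with a separate -n keeps the numeric comparison limited to the count field.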

Resources