I have just discovered the command :sort n in vim (how did I not know about that?!), which has almost done exactly what I need.
What I am trying to sort, though, is a long list of IP addresses (it's an "allow hosts" file to be Included into our apache config), and it would be nice for :sort n to be able to recognise that 123.45.6.7 should sort before 123.45.16.7 (for example).
Is it a safe assumption that I should be less OCD about it and not worry, because I'm not going to be able to do this without a mildly-complex sed or awk command or something?
To be clear, the rows all look something like:
Allow from 1.2.3.4
Allow from 5.6.7.8
Allow from 9.10.11.12
etc
Vim sort seems to be stable in practice (but it is not guaranteed). Therefore you can try:
:%sort n /.*\./
:%sort n /\.\d\+\./
:%sort n /\./
:%sort n
This sorts by the number after the last dot (* is greedy), i.e. the fourth octet, then by the number after the first dot-digits-dot sequence (the third octet), then by the number after the first dot (the second octet), and last by the first number on the line (the first octet).
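The same least-significant-key-first idea can be sketched outside Vim. This is only an illustration in Python, whose list sort is documented as stable; the helper names octets and multipass_sort and the sample hosts are made up for the example.

```python
import re

def octets(line):
    """Return the four numbers of the first dotted quad found in the line."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", line)
    return [int(part) for part in match.groups()]

def multipass_sort(lines):
    """Sort by the last octet first and by the first octet last.

    Each pass is stable, so lines that tie on the current octet keep the
    order established by the earlier passes.
    """
    result = list(lines)
    for position in (3, 2, 1, 0):
        result.sort(key=lambda line: octets(line)[position])
    return result

# Hypothetical sample data in the "Allow from" format of the question.
hosts = [
    "Allow from 123.45.16.7",
    "Allow from 9.10.11.12",
    "Allow from 123.45.6.7",
    "Allow from 1.2.3.4",
]
for line in multipass_sort(hosts):
    print(line)
```

With this sample, 123.45.6.7 ends up before 123.45.16.7, which is the ordering the question asks for.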
A straightforward way to achieve the correct sorting order without
relying on the stability of the sorting algorithm implemented by the
:sort command is to prepend zeros to the numbers within the IP
addresses, so that every component consists of exactly three digits.
Prepend zeros to the single-digit and two-digit numbers:
:%s/\<\d\d\?\>/0&/g|%&&
Sort the lines comparing IP addresses as text:
:sort r/\(\d\{3}\)\%(\.\d\{3}\)\{3}/
Strip redundant leading zeros:
:%s/\<00\?\ze\d//g
To run all three steps as a single command, one can use the following
one-liner:
:%s/\<\d\d\?\>/0&/g|%&&|sor r/\(\d\{3}\)\%(\.\d\{3}\)\{3}/|%s/\<00\?\ze\d//g
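For readers who want to see the pad, sort, strip idea spelled out step by step, here is a rough Python sketch of the same three stages. The function names and the sample hosts are assumptions made for the illustration, and like the Vim version it pads every number on the line, not only those inside the address.

```python
import re

def pad_octets(line):
    """Zero-pad every run of digits in the line to at least three digits."""
    return re.sub(r"\d+", lambda match: match.group().zfill(3), line)

def strip_padding(line):
    """Remove leading zeros from each number, keeping a lone 0 intact."""
    return re.sub(r"\b0+(?=\d)", "", line)

def sort_padded(lines):
    """Pad, sort as plain text, then strip the padding again."""
    padded = sorted(pad_octets(line) for line in lines)
    return [strip_padding(line) for line in padded]

# Hypothetical sample data in the "Allow from" format of the question.
hosts = [
    "Allow from 123.45.16.7",
    "Allow from 9.10.11.12",
    "Allow from 123.45.6.7",
    "Allow from 1.2.3.4",
]
for line in sort_padded(hosts):
    print(line)
```

Once every component is three digits wide, a plain text comparison and a numeric comparison agree, which is why the middle step needs no numeric sort at all.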
I'm not a vim user so I can't offer a direct way to do it with builtin commands, however it's possible to replace a section of text with the output of running it through an external command. So, a simple script like this could be used:
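#!/usr/bin/env python3
import sys

input_lines = sys.stdin.readlines()
# Sort on the last whitespace-separated field, split into integer octets.
# A list is used because Python 3 map objects cannot be compared.
sorted_lines = sorted(input_lines,
                      key=lambda line: [int(part) for part in line.split()[-1].split('.')])
for line in sorted_lines:
    sys.stdout.write(line)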
See https://www.linux.com/learn/tutorials/442419-vim-tips-working-with-external-commands, section "Filtering text through external filters", which explains how you can use this as a filter within vim.
This script should do what you want and will work on any region where all the selected lines end in an IPv4 address.
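As a possible variation on the script's key function, Python's standard ipaddress module can do the parsing and comparison. This is a sketch rather than a replacement for the script above; ip_key and the sample lines are names invented for the example, and it assumes every line ends in a valid IPv4 address (an invalid one raises an error).

```python
import ipaddress

def ip_key(line):
    """Use the last whitespace-separated field as an IPv4 address key."""
    return ipaddress.IPv4Address(line.split()[-1])

# Hypothetical sample data in the "Allow from" format of the question.
lines = [
    "Allow from 123.45.16.7\n",
    "Allow from 5.6.7.8\n",
    "Allow from 123.45.6.7\n",
]
for line in sorted(lines, key=ip_key):
    print(line, end="")
```

The address objects compare by numeric value, so no manual splitting on dots is needed.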
You can use:
:%!sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n
-t . means use . as delimiter.
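The keys are applied in the order given, so lines are sorted numerically by the 1st field, then by the 2nd, 3rd and 4th. Note that on lines such as "Allow from 1.2.3.4" the 1st field is "Allow from 1", which is not a plain number, so the first key may not order the leading octet as intended.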
Related
Need help on how to sort characters or numbers after a period(.)
test2.rod1
test1.rod1
test3.rod1
test1.mor2
test2.mor2
test3.mor2
zbcd1.abc1
abcd2.abc1
dbcd3.abc1
I would like to sort by whatever comes after the period (.). The result should look something like this:
abcd2.abc1
dbcd3.abc1
zbcd1.abc1
test1.mor2
test2.mor2
test3.mor2
test2.rod1
test1.rod1
test3.rod1
If you're using a system with Unix-like utilities such as macOS, Linux, BSD, etc., then you can use the system sort command. The secret is to specify the field delimiter, which in your case is a period. The option is either -t or --field-separator. So the following should work:
sort -t. -k 2 test.dat
Assuming that your data is in a file called test.dat
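To make the "key on the part after the delimiter" idea concrete, here is a small Python sketch. The name suffix_key is invented for the example, and using the whole line as a tie-breaker is an assumption made here to mimic sort falling back to a whole-line comparison when keys are equal.

```python
def suffix_key(name):
    """Sort primarily on the text after the first period."""
    _prefix, _, suffix = name.partition(".")
    # The full name breaks ties between lines with the same suffix.
    return (suffix, name)

names = [
    "test2.rod1", "test1.rod1", "test3.rod1",
    "test1.mor2", "test2.mor2", "test3.mor2",
    "zbcd1.abc1", "abcd2.abc1", "dbcd3.abc1",
]
for name in sorted(names, key=suffix_key):
    print(name)
```

With this tie-breaker the rod1 group comes out as test1, test2, test3, which differs slightly from the order shown in the question.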
I am writing a bash script. I have a file like this
./1#1#d41d8cd98f00b204e9800998ecf8427e
./11.txt#2#d41d8cd98f00b204e9800998ecf8427e
./12/1#1#d41d8cd98f00b204e9800998ecf8427e
./12/1#2#d41d8cd98f00b204e9800998ecf8427e
./12/1.txt#1#d41d8cd98f00b204e9800998ecf8427e
./12/1.txt#2#d41d8cd98f00b204e9800998ecf8427e
./12/2.txt#1#d41d8cd98f00b204e9800998ecf8427e
./12/2.txt#2#d41d8cd98f00b204e9800998ecf8427e
./1#2#d41d8cd98f00b204e9800998ecf8427e
./13#2#d41d8cd98f00b204e9800998ecf8427e
./2.txt#1#5d74727d50368c4741d76989586d91de
./2.txt#2#5d74727d50368c4741d76989586d91de
I would like to sort this file, but in a specific way. Let's call characters up to the first # section one, between the two # characters section two. So for example, given a line like this:
./1#2#d41d8cd98f00b204e9800998ecf8427e
Section one: ./1
Section two: 2
What I want to achieve is sorting this file according to section one first and then according to section two. So what is wrong with this example is the 9th line; it should be 2nd.
Is there an easy way to achieve this goal? I am unsure how to tackle this problem. Maybe I should somehow sort this file up to the first # and then again only according to the second section? Even if this is a good answer, not sure how to do it.
Expected result:
./1#1#d41d8cd98f00b204e9800998ecf8427e
./1#2#d41d8cd98f00b204e9800998ecf8427e
./11.txt#2#d41d8cd98f00b204e9800998ecf8427e
./12/1#1#d41d8cd98f00b204e9800998ecf8427e
./12/1#2#d41d8cd98f00b204e9800998ecf8427e
./12/1.txt#1#d41d8cd98f00b204e9800998ecf8427e
./12/1.txt#2#d41d8cd98f00b204e9800998ecf8427e
./12/2.txt#1#d41d8cd98f00b204e9800998ecf8427e
./12/2.txt#2#d41d8cd98f00b204e9800998ecf8427e
./13#2#d41d8cd98f00b204e9800998ecf8427e
./2.txt#1#5d74727d50368c4741d76989586d91de
./2.txt#2#5d74727d50368c4741d76989586d91de
Seems like you just want to sort by more than one key:
$ sort -t# -k1,1 -k2 file
./1#1#d41d8cd98f00b204e9800998ecf8427e
./1#2#d41d8cd98f00b204e9800998ecf8427e
./11.txt#2#d41d8cd98f00b204e9800998ecf8427e
./12/1#1#d41d8cd98f00b204e9800998ecf8427e
./12/1#2#d41d8cd98f00b204e9800998ecf8427e
./12/1.txt#1#d41d8cd98f00b204e9800998ecf8427e
./12/1.txt#2#d41d8cd98f00b204e9800998ecf8427e
./12/2.txt#1#d41d8cd98f00b204e9800998ecf8427e
./12/2.txt#2#d41d8cd98f00b204e9800998ecf8427e
./13#2#d41d8cd98f00b204e9800998ecf8427e
./2.txt#1#5d74727d50368c4741d76989586d91de
./2.txt#2#5d74727d50368c4741d76989586d91de
-k1,1 means sort by only the first field, then -k2 means sort by the rest of the fields, starting from the second. -t# means that fields are separated by a #.
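The two-key behaviour can also be pictured as sorting on a pair. The following Python sketch is only an illustration of that idea; section_key is a made-up name, the sample is a handful of lines from the question, and the comparison is by code point, which may differ from what sort does in a given locale.

```python
def section_key(line):
    """Key on section one, then on everything after the first '#'."""
    section_one, _, rest = line.partition("#")
    return (section_one, rest)

lines = [
    "./1#1#d41d8cd98f00b204e9800998ecf8427e",
    "./11.txt#2#d41d8cd98f00b204e9800998ecf8427e",
    "./12/1#1#d41d8cd98f00b204e9800998ecf8427e",
    "./1#2#d41d8cd98f00b204e9800998ecf8427e",
    "./13#2#d41d8cd98f00b204e9800998ecf8427e",
]
for line in sorted(lines, key=section_key):
    print(line)
```

Because the first element of the pair is compared in full before the second is looked at, ./1#2 lands directly after ./1#1.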
I have a file that contains information in the following form:
"dog/3/cat/6/fish/2/78/90"
(we'll not worry about the last two values here)
Is it possible to sort the contents of the file by the numeric value after the odd numbered slashes with the unix sort command?
For instance, the output might look like this:
dog/4/house/3/frog/89/100
dog/3/mouse/2/chicken/12/68/80
dog/2/cat/5/bird/12/77/90
This should give you what you want, I think:
sort -t/ -k2,2nr -k4,4nr -k6,6nr
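For comparison, the same "fields 2, 4 and 6, numeric, descending" ordering can be sketched in Python. The name slash_key is hypothetical, the sample rows are the three lines from the question, and the sketch assumes each row has at least six slash-separated fields with integers in the even positions.

```python
def slash_key(line):
    """Return the 2nd, 4th and 6th slash-separated fields as integers."""
    fields = line.strip().strip('"').split("/")
    return tuple(int(fields[index]) for index in (1, 3, 5))

rows = [
    "dog/2/cat/5/bird/12/77/90",
    "dog/4/house/3/frog/89/100",
    "dog/3/mouse/2/chicken/12/68/80",
]
# reverse=True gives the descending order that the r flag requests.
for row in sorted(rows, key=slash_key, reverse=True):
    print(row)
```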
Unix numeric sort gives strange results, even when I specify the delimiter.
$ cat example.csv # here's a small example
58,1.49270399401
59,0.000192136419373
59,0.00182092924724
59,1.49270399401
60,0.00182092924724
60,1.49270399401
12,13.080339685
12,14.1531049905
12,26.7613447051
12,50.4592437035
$ cat example.csv | sort -n --field-separator=,
58,1.49270399401
59,0.000192136419373
59,0.00182092924724
59,1.49270399401
60,0.00182092924724
60,1.49270399401
12,13.080339685
12,14.1531049905
12,26.7613447051
12,50.4592437035
For this example, sort gives the same result regardless of whether you specify the delimiter. I know that if I set LC_ALL=C then sort starts to give the expected behavior again. But I do not understand why the default environment settings, as shown below, would make this happen.
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
I've read in many other questions (e.g. here, here, and here) how to avoid this behavior in sort, but still, this behavior is incredibly weird and unpredictable and has caused me a week of heartache. Can someone explain why sort with default environment settings on Mac OS X (10.8.5) would behave this way? In other words: what is sort doing (with the locale variables set to en_US.UTF-8) to get that result?
I'm using
sort 5.93 November 2005
$ type sort
sort is /usr/bin/sort
UPDATE
I've discussed this on the GNU coreutils list and now understand why sort gave this output under the default en_US.UTF-8 locale settings. In that locale the comma character "," is treated as part of a number (to allow for commas as thousands separators), and sort is "greedy" when it interprets a line, so it read the example numbers as approximately
581.491...
590.000...
590.001...
591.492...
600.001...
601.492...
1213.08...
1214.15...
1226.76...
1250.45...
This was not what I had intended, and chepner is right: to get the result I actually want, I need to tell sort to key on only the first field. By default, sort interprets more of the line as the key rather than just the first field.
This behavior of sort is discussed in the GNU coreutils FAQ and is further specified in the POSIX description of sort.
As Eric Blake put it on the GNU coreutils list, if the field separator is also a numeric character (which a comma is), then "Without -k to stop things, [the field-separator] serves as BOTH a separator AND a numeric character - you are sorting on numbers that span multiple fields."
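The explanation can be checked against the sample data with a toy model. The Python sketch below is not how sort is implemented; it simply assumes, as described above, that the comma is swallowed as a digit-group separator, and the name comma_as_group_separator is invented for the illustration.

```python
def comma_as_group_separator(line):
    """Model the whole line as one number with the comma ignored."""
    return float(line.replace(",", ""))

rows = [
    "12,13.080339685",
    "59,1.49270399401",
    "58,1.49270399401",
    "12,50.4592437035",
    "60,0.00182092924724",
    "59,0.000192136419373",
]
for row in sorted(rows, key=comma_as_group_separator):
    print(row, "->", comma_as_group_separator(row))
```

Under this model the rows starting with 12 are read as values above 1200, so they sort after the rows starting with 58, 59 and 60, just as in the puzzling output.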
I'm not sure this is entirely correct, but it's close.
sort -n -t, will try to sort numerically by the given key(s). In this case, the key is a tuple consisting of an integer and a float. Such tuples cannot be sorted numerically.
If you explicitly specify which single keys to sort on with
sort -k1,1n -k2,2n -t,
it should work. Now you are explicitly telling sort to first sort on the first field (numerically), then on the second field (also numerically).
I suspect that -n is useful as a global option only if each line of the input consists of a single numerical value. Otherwise, you need to use the -n option in conjunction with the -k option to specify exactly which fields are numbers.
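What the explicit per-field keys ask for can be written down as a pair of numbers. This Python sketch is only meant to illustrate that intent; field_key is a made-up name and the rows are taken from the example file in the question.

```python
def field_key(line):
    """Key on the first field as an integer, then the second as a float."""
    first, second = line.strip().split(",")
    return (int(first), float(second))

rows = [
    "58,1.49270399401",
    "59,0.00182092924724",
    "59,0.000192136419373",
    "60,1.49270399401",
    "12,26.7613447051",
    "12,13.080339685",
]
for row in sorted(rows, key=field_key):
    print(row)
```

Here each field is parsed on its own, so the comma can only ever act as a separator.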
Use sort --debug to find out what's going on.
I've used that to explain in detail your issue at:
http://lists.gnu.org/archive/html/coreutils/2013-10/msg00004.html
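If you use
cat example.csv | sort
instead of
cat example.csv | sort -n --field-separator=,
then you get the expected output for this sample. Bear in mind that this is a plain text comparison, so it only works here because the numbers being compared have the same number of digits before the decimal point; a text sort would put 10 before 9.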
Note: I tested with "sort (GNU coreutils) 7.4"
I was sent a large list of URLs in an Excel spreadsheet, each unique according to a certain GET variable in the string (whose value is a number 5-7 digits long). I have to run some queries on our databases based on those numbers, and don't want to go through the hundreds of entries weeding out the numbers one by one. What Bash commands can be used to parse out the number from each line (it's the only number in each line) and consolidate the numbers onto one line, comma separated?
A sample (shortened) listing of the CSV spreadsheet includes:
http://www.domain.com/view.php?fDocumentId=123456
http://www.domain.com/view.php?fDocumentId=223456
http://www.domain.com/view.php?fDocumentId=323456
http://www.domain.com/view.php?fDocumentId=423456
DocumentId=523456
DocumentId=623456
DocumentId=723456
DocumentId=823456
....
...
The change of format was intentional, as they decided to simply reduce it down to the variable name and value after a few rows. The change of the get variable from fDocumentId to just DocumentId was also intentional. Ideal output would look similar to:
123456,223456,323456,423456,523456,623456,723456,823456
EDIT: my apologies, I did not notice that halfway through the list they decided to get froggy and change things around. There are entries where, when saved as CSV, certain rows appear as:
"DocumentId=098765 COMMENT, COMMENT"
DocumentId=898765 COMMENT
DocumentId=798765- COMMENT
"DocumentId=698765- COMMENT, COMMENT"
With several other entries that look similar to any of the above rows. COMMENT can be replaced with a single string of (upper-case) characters no longer than 3 characters in length per COMMENT
Assuming the variable is always on its own, and last on the line, how about just taking whatever is to the right of the =?
sed -r "s/.*=([0-9]+)$/\1/" testdata | paste -sd","
EDIT: Ok, with the new information, you'll have to edit the regex a bit:
sed -r "s/.*f?DocumentId=([0-9]+).*/\1/" testdata | paste -sd","
Here anything after DocumentId or fDocumentId will be captured. Works for the data you've presented so far, at least.
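If a script is an option, the same capture can be sketched in Python. The function name extract_ids and the sample lines are assumptions for the illustration; the pattern matches both fDocumentId and DocumentId because it looks for the shared ending, and lines without a match are skipped.

```python
import re

def extract_ids(lines):
    """Collect the digits following 'DocumentId=' on each line."""
    ids = []
    for line in lines:
        match = re.search(r"DocumentId=(\d+)", line)
        if match:
            ids.append(match.group(1))
    return ",".join(ids)

sample = [
    "http://www.domain.com/view.php?fDocumentId=123456",
    "DocumentId=523456",
    '"DocumentId=098765 COMMENT, COMMENT"',
    "DocumentId=798765- COMMENT",
    "....",
]
print(extract_ids(sample))
```

The IDs are kept as strings, so a leading zero such as the one in 098765 survives.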
Simpler than this :) (though note that xargs joins the values with spaces rather than commas)
cat file.csv | cut -d "=" -f 2 | xargs
If you're not completely committed to bash, the Swiss Army Chainsaw will help:
perl -ne '{$_=~s/.*=//; $_=~s/ .*//; $_=~s/-//; chomp $_ ; print "$_," }' < YOUR_ORIGINAL_FILE
That cuts everything up to and including an =, then everything after a space, then removes any dashes. Run on the above input, it returns
123456,223456,323456,423456,523456,623456,723456,823456,098765,898765,798765,698765,