Sorting on two columns in vim - sorting

I have a table that looks something like this:
FirstName SurName;Length;Weight;
I need to sort on Length and, where two or more rows have the same Length, sort those rows on Weight. :sort ni sorts only on Length, and I also tried sort /.\{-}\ze\dd/, but that didn't work either.
Any help would be greatly appreciated!

This can be done using an external (GNU) sort pretty straightforwardly:
!sort -t ';' -k 2,2n -k 3,3n
This says: split fields by semicolon, sort by 2nd field numerically, then by 3rd field numerically. Probably a lot easier to read and remember than whatever vim-internal command you can cook up.
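For example (a minimal sketch with invented names and values), filtering the whole buffer from Vim with
:%!sort -t ';' -k 2,2n -k 3,3n
turns
Bob Smith;180;90;
Ann Jones;170;60;
Cal Brown;180;75;
into
Ann Jones;170;60;
Cal Brown;180;75;
Bob Smith;180;90;
that is, sorted by Length, with equal Lengths ordered by Weight.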
Much more info on GNU sort here: http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html

Try with the r flag.
Sort on Length:
:%sort rni /.*;\ze\d/
Sort on Weight:
:%sort rni /\d\+\ze;$/
Without this flag, the sorting is performed on what comes after the match, which can be a little cumbersome.
With the r flag, the sorting is done on the match itself which may be easier to define. Here, the pattern matches a series of 1 or more digits just before a semicolon at the end of the line.
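For example (a quick sketch with invented values), given a buffer containing
Ann Jones;170;90;
Bob Smith;180;60;
the Weight command matches only the trailing 90 and 60 and puts Bob before Ann, while the Length command matches the Lengths 170 and 180 and keeps Ann first.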

Related

How to print unique values in order of appearance?

I'm trying to get the unique values from the list below, but keeping them in their original order, that is, the order of appearance.
group
swamp
group
hands
swamp
pipes
group
bellyful
pipes
swamp
emotion
swamp
pipes
bellyful
after
bellyful
I've tried combining the sort and uniq commands, but the output comes out sorted alphabetically, and if I don't sort first, uniq doesn't collapse the duplicates.
$ sort file | uniq
after
bellyful
emotion
group
hands
pipes
swamp
and my desired output would be like this:
group
swamp
hands
pipes
bellyful
emotion
after
How can I do this?
A short, jam-packed awk invocation will get the job done. We'll create an associative array and count every time we've seen a word:
$ awk '!count[$0]++' file
group
swamp
hands
pipes
bellyful
emotion
after
Explanation:
Awk processes the file one line at a time and $0 is the current line.
count is an associative array mapping lines to the number of times we've seen them. Awk doesn't mind us accessing uninitialized variables. It automatically makes count an array and sets the elements to 0 when we first access them.
We increment the count each time we see a particular line.
We want the overall expression to evaluate to true the first time we see a word, and false every subsequent time. When it's true, the line is printed; when it's false, the line is ignored. The first time we see a word, count[$0] is 0, and negating that gives 1 (true). If we see the word again, count[$0] is positive, and negating that gives 0 (false).
Why does true mean the line is printed? The general syntax we're using is expr { actions; }. When the expression is true the actions are taken. But the actions can be omitted; the default action if we don't write one is { print; }.
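If the condensed form looks too magical, this longhand version (a sketch, not from the original answer) should behave the same way:
awk '{ if (count[$0] == 0) print; count[$0]++ }' file
Here the implicit pattern/default-action pair is spelled out as an explicit test and an explicit print, performed before the count is incremented.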

Removing blankspace at the start of a line (size of blankspace is not constant)

I am new to sed. I am trying to use it to edit down a uniq -c result, removing the spaces before the numbers, so that I can then convert it to a usable .tsv.
The furthest I have gotten is to use:
$ sed 's|\([0-9].*$\)|\1|' comp-c.csv
With the input:
8 Delayed speech and language development
15 Developmental Delay and additional significant developmental and morphological phenotypes referred for genetic testing
4 Developmental delay AND/OR other significant developmental or morphological phenotypes
1 Diaphragmatic eventration
3 Downslanted palpebral fissures
The output from this is identical to the input; the pattern finds the first number (I have tested it with a simple substitution), but it also drags in the preceding blank space for some reason.
To clarify, I would like to remove all spaces before the numbers; hard-coding a simple trim will not work, as some lines contain double- or triple-digit numbers and so do not have the same amount of blank space before the number.
Bonus points for some way to produce a usable uniq -c result without this faffing around with blank space.
It's all about writing the correct regex:
sed 's/^ *//' comp-c.csv
That is, replace zero or more spaces at the start of lines (as many as there are) with nothing.
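If the indentation could contain tabs as well as spaces, a slightly broader version of the same idea (still just a sketch) is
sed 's/^[[:space:]]*//' comp-c.csv
which strips any run of leading whitespace characters.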
Bonus points for some way to produce a usable uniq -c result without this faffing around with blank space.
The uniq command doesn't have a flag to print its output without the leading blanks. There's no way around stripping them yourself.
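That said, if the end goal is a tab-separated count/text file, one possibility (a sketch, assuming the original, pre-counting data lives in a hypothetical file named data.txt) is to strip and reformat in a single awk pass:
sort data.txt | uniq -c | awk '{ n = $1; sub(/^[[:space:]]*[0-9]+[[:space:]]+/, ""); print n "\t" $0 }' > counts.tsv
Here n captures the count, sub() removes the leading blanks and the count itself from the line, and print writes the count, a tab, and the original text.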

Lexicographically sort file by line, reading right-to-left

I'm looking for a command-line (ideally) solution that lets me sort the lines in a file by comparing each line from right to left.
For example...
Input:
aabc
caab
bcaa
abca
Output:
bcaa
abca
caab
aabc
I'll select the answer which I think will be the easiest to remember in a year when I've forgotten I posted this question, but I'll also upvote clever/short answers as well.
The easiest to remember would be
reverse < input | sort | reverse
You will have to write a reverse command though. Under Linux, there's rev.
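For example (assuming the util-linux rev utility and the input file named input):
rev input | sort | rev
On the sample above this produces exactly the bcaa, abca, caab, aabc ordering asked for.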

shell: What is meant by the number of sentences and paragraphs?

I need to count the number of sentences and paragraphs in a text file, but I do not understand how to do this.
I can count the number of lines and words using the wc command, but I do not understand what counts as a sentence or a paragraph in a text file. Is there a shell command that does this?
Here's how we count the number of words and lines in a text file:
wc -w filename
wc -l filename
For sentences and paragraphs, here is what I tried:
grep -c \\. #to count number of sentences.
grep -o [.'\n'] #to count number of paragraph.
I do not understand how to count the number of sentences and paragraphs in a text file.
Any ideas would be helpful.
For example:
Main article: SSID#Security of SSID hiding.
A simple but ineffective method to attempt to secure a wireless network is SSID (Service Set Identifier).[12][13] This provides very little protection against anything but the most casual intrusion efforts...
That is 2 paragraphs and 3 sentences.
A first approximation can be obtained under the assumptions that:
Sentences end with a period, and periods are only used for that (no decimal numbers, no ellipses, etc.)
Paragraphs are separated by exactly one empty line
(Of course those assumptions are not met in reality, but they should get you started.)
grep -o '\.' | wc -l
will count the number of sentences (one per period; a plain grep -c would only count the lines that contain a period), and
grep -c "^$"
will count the empty lines separating the paragraphs (one fewer than the number of paragraphs, given the assumption above). If your text is consistently formatted you may get to something that works; otherwise, you could consider using Natural Language Processing tools such as NLTK.
To count the number of sentences, you could count the number of periods, question marks, and exclamation points. But then you run into the problem of an ellipsis (...). I suppose you could only count it if it has whitespace afterwards.
Paragraphs are another matter. Are they indented? How, with a tab? Then count them.
The big question is 'What is the delimiter between sentences and paragraphs?'
When you know that, define the delimiter regex, and count how many are in the file using the tool of your choice.
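As a concrete sketch of that approach, under the first answer's simplifying assumptions (and treating a run of ., ! or ? as a single sentence ending), awk can count both in one pass; the file name here is a placeholder:
awk 'BEGIN { RS = "" } { paras++; sents += gsub(/[.!?]+/, "&") } END { print paras " paragraphs, " sents " sentences" }' file
Setting RS to the empty string makes awk read blank-line-separated paragraphs as records, and gsub() returns how many sentence-ending punctuation runs it replaced (with themselves, so the text is left unchanged).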

Sorting lines with numbers and word characters

I recently wrote a simple utility in Perl to count the words in a file and determine each word's frequency, that is, how many times it appears.
It's all fine, but I'd like to sort the result to make it easier to read. An output example would be:
4:an
2:but
5:does
10:end
2:etc
2:for
As you can see, it's ordered by word, not by frequency. But with a little help from :sort I could reorganize that. Using the n flag, numbers like 10 go to the right place (even though they start with 1), and adding ! reverses the order, so the word that appears most often comes first.
:sort! n
10:end
5:does
4:an
2:for
2:etc
2:but
The problem is that when the number is repeated, the tied lines get sorted by word, which is nice, but remember, the order was reversed!
for -> etc -> but
How can I fix that? Will I have to use some Vim scripting to iterate over the lines, checking whether each one starts with the same number as the previous line, and marking the tied lines so they can be sorted separately once the number changes?
tac | sort -nr
does this, so select the lines with shift+V and use !
From the vim :help sort:
The details about sorting depend on the library function used. There is no
guarantee that sorting is "stable" or obeys the current locale. You will have
to try it out.
As a result, you might want to perform the sorting in your Perl script instead; it wouldn't be hard to extend your Perl sort to be stable, see perldoc sort for entirely too many details.
If you just want this problem finished, then you can replace your :sort command with this:
!sort -rn --stable
(It might be easiest to use Shift-V to visually select the lines first, or to use a range for the command, or something similar, but if you're writing Vim scripts, none of this will be news to you. :)
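For instance (a minimal sketch, assuming the count:word lines shown above and GNU sort), visually selecting the lines and running
!sort -t : -k 1,1nr -k 2,2
sorts descending by count and breaks ties alphabetically by word, without relying on the sort being stable.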
