How to sort comma separated words in Vim

In Python code, I frequently run into import statements like this:
from foo import ppp, zzz, abc
Is there any Vim trick, like :sort for lines, to turn it into this:
from foo import abc, ppp, zzz

Yep, there is:
%s/import\s*\zs.*/\=join(sort(split(submatch(0), '\s*,\s*')),', ')
The key elements are:
:h :substitute
:h /\zs
:h s/\=
:h submatch()
:h sort()
:h join()
:h split()
To answer the comment, if you want to apply the substitution on a visual selection, it becomes:
'<,'>s/\%V.*\%V\@!/\=join(sort(split(submatch(0), '\s*,\s*')), ', ')
The new key elements are this time:
:h /\%V which says that the next character matched must belong to the visual selection
:h /\@! which I use, combined with \%V, to express that the next character must not belong to the visual selection. That next character isn't kept in the matched expression.
BTW, we can also use s and i_CTRL-R_= interactively, or put it in a mapping (here triggered on µ):
:xnoremap µ s<c-r>=join(sort(split(@", '\s*,\s*')), ', ')<cr><esc>

Alternatively, you can do the following steps:
Move the words you want to sort to the next line:
from foo import
ppp, zzz, abc
Add a comma at the end of the words list:
from foo import
ppp, zzz, abc,
Select the word list for example with Shift-v. Now hit : and then enter !xargs -n1 | sort | xargs. It should look like this:
:'<,'>!xargs -n1 | sort | xargs
Hit Enter.
from foo import
abc, ppp, zzz,
Now remove the trailing comma and merge the word list back to the original line (for example with Shift-j).
from foo import abc, ppp, zzz
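The xargs pipeline above can be modelled in Python to see why it works: xargs -n1 splits the selection on whitespace into one token per line, sort orders the tokens (trailing commas and all), and the final xargs joins them back with spaces. A rough sketch:

```python
selection = "ppp, zzz, abc,"
tokens = selection.split()          # xargs -n1: one token per line
result = " ".join(sorted(tokens))   # sort, then xargs rejoins with spaces
print(result)  # abc, ppp, zzz,
```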
There are also Vim plugins which might be useful to you:
AdvancedSorters: sorting of certain areas or by special needs.

I came here looking for a fast way to sort comma separated lists, in general, e.g.
relationships = {
'project', 'freelancer', 'task', 'managers', 'team'
}
My habit was to search/replace spaces with newlines and invoke shell sort but that's such a pain.
I ended up finding Chris Toomey's sort-motion plugin, which is just the ticket: https://github.com/christoomey/vim-sort-motion. Highly recommended.

Why not try vim-isort ? https://github.com/fisadev/vim-isort
I use that and vim-yapf-format for beautify the code :) https://github.com/pignacio/vim-yapf-format

Select the comma-separated text in visual mode, :, and run this command:
'<,'>!tr ',' '\n' | sort -f | paste -sd ','
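A Python sketch of what that pipeline does, assuming no spaces around the commas (sort -f folds case, which str.casefold approximates):

```python
text = "zzz,abc,PPP"
fields = text.split(",")            # tr ',' '\n': one field per line
fields.sort(key=str.casefold)       # sort -f: case-insensitive order
result = ",".join(fields)           # paste -sd ',': rejoin on one line
print(result)  # abc,PPP,zzz
```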

Related

Regex and/or sed to replace lowercase

I have a text file with a single column of data. Take the following data for example
united states
germany
france
canada
Of which I am trying to generate all possible mixed case variations. For example the new file might look like this
United states
uNited states
unIted states
uniTed states
unitEd states
uniteD states
united States
united sTates
united stAtes
united staTes
united statEs
united stateS
UNited states
And so on until all possible case variations of each word have been generated.
Given the above input and expected output I have three questions
Is regex and sed the right tool for this job?
What alternatives do I have to regex and sed for this task?
If I did use regex and sed what might the correct syntax look like?
1) No
2) Awk and substr()
3) You wouldn't
Start with this:
$ echo 'foo' |
awk '{
for (i=1;i<=length($0);i++) {
print substr($0,1,i-1) toupper(substr($0,i,1)) substr($0,i+1)
}
}'
Foo
fOo
foO
and massage to suit with the obvious logic.
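The awk loop translates almost one-to-one into Python, if that is easier to massage:

```python
s = "foo"
# Uppercase exactly one character per variant, like the awk substr() calls.
variants = [s[:i] + s[i].upper() + s[i + 1:] for i in range(len(s))]
for v in variants:
    print(v)  # Foo, fOo, foO
```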
For the fun of sed.
1) Yes. (e.g. GNU sed version 4.2.1)
2) Maybe awk, perl
3) See code below
sed -E "s/^.*$/\n&#\n/;:a;s/\n([^#\n]*)([^#\n])#([^#\n]*)\n/\n\1#\u\2\3\n\1#\l\2\3\n/;ta;s/(^\n#|\n$)//g;s/\n#/\n/g;"
This does assume that "#" is not part of the strings found in the file.
create a certain pattern
(start and end with newline; mark the cursor with #)
start a loop
replace text between newlines and containing the cursor by same text twice,
once with uppercase before cursor, once with lower case
move cursor one towards the start
loop if that replaced something
remove newlines at start and end and cursors
Note that # is not special. It just needs to be a character which will not occur in the input or in the desired output. Hopefully you can find such a character.
If you can have all characters, it gets complicated. Look at the comments to this answer. There probably is a discussion going on.
Output (for input "foo"):
FOO
fOO
FoO
foO
FOo
fOo
Foo
foo
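For comparison, the full set of 2^n case variations that the sed loop produces can be generated with itertools.product; a minimal sketch:

```python
from itertools import product

def case_variants(word):
    # One (lower, upper) choice per character; product enumerates all 2**n.
    return ["".join(p) for p in product(*((c.lower(), c.upper()) for c in word))]

variants = case_variants("fo")
print(variants)  # ['fo', 'fO', 'Fo', 'FO']
```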

Command line - Convert any string into an identifier?

Often I'd like to be able to take a string at the command line (bash), and convert it into an identifier. Usually this is for use in a filename, branch name, or variable name, and I prefer that it:
has no spaces in it
has no special characters in it
So for example, I could take a string like so:
bug fix for #PROJECT1 item 52, null pointer
and convert it to something like this:
bug_fix_for_PROJECT1_item_52__null_pointer
I'm open to solutions in any language, e.g. bash, node, perl, python, etc, but prefer languages that are installed by default on most linux/osx machines.
You could do something like this:
original="bug fix for #PROJECT1 item 52, null pointer"
sanitized=${original//[^[:alnum:]]/_}
echo "$sanitized"
Let me break that down a bit:
${VAR_NAME//SEARCH/REPLACE} replaces all occurrences of SEARCH with REPLACE.
[^[:alnum:]] means any character that is NOT alphabetic or numeric. The "NOT" part is the ^
The outer brackets indicate that the expression refers to one character chosen among the different possibilities listed inside the bracket (see below for how to use this to your advantage).
This could be tailored to do something a bit more subtle if desired. Remember UNIX-like systems accept almost any character in file names (even newlines), so you are not restricted to letters and digits.
For instance, suppose you want to keep periods and commas in file names. You could change the replacement statement:
sanitized=${original//[^[:alnum:].,]/_}
The modified part ([^[:alnum:].,]) means "anything that is not an alphanumeric character, and not a period, and not a comma". You can add any other character you want to avoid replacing using regular expression syntax, but it is key that you keep the outer brackets.
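The same character-class idea carries over to Python's re.sub, here keeping periods and commas as above (the example string is the one from the question):

```python
import re

original = "bug fix for #PROJECT1 item 52, null pointer"
# Replace anything that is not alphanumeric, a period, or a comma.
sanitized = re.sub(r"[^0-9a-zA-Z.,]", "_", original)
print(sanitized)  # bug_fix_for__PROJECT1_item_52,_null_pointer
```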
Just an alternate variation using perl command-line substitution, to have exactly one _ between words and avoid consecutive characters like __:
perl -ple 's/[^\w]/_/g;' -pe 's/__/_/g' <<<"bug fix for #PROJECT1 item 52, null pointer"
bug_fix_for_PROJECT1_item_52_null_pointer
and a simple snippet in python as
>>> import re
>>> re.sub('[^0-9a-zA-Z]+','_','bug fix for #PROJECT1 item 52, null pointer')
'bug_fix_for_PROJECT1_item_52_null_pointer'
Did you try tr?
echo 'bug fix for #PROJECT1 item 52, null pointer' | tr -d '[:punct:]' | tr '[:blank:]' '_'
bug_fix_for_PROJECT1_item_52_null_pointer

deselect text in vim, like grep -v

I would like to imitate the following pattern of searching in vim:
grep "\<[0-9]\>" * | grep -v "666"
I can highlight all numbers using
/\<[0-9]\>
but then how can I tell vim to remove from the highlighted text the ones that match the expression
/666
Can this be done in Visual Studio at least?
You cannot sequentially filter the matches like in the shell, so you need to use advanced regular expression features to combine both into a single one.
Basically, you need to assert a non-match of 666 at the match position. That's achieved with the \@! atom (in other regular expression dialects, that's often written as (?!...)):
/\%(\d*666\d*\)\@!\<\d\+\>
Note: If you want to only exclude 666, but not 6666 etc. you need to specify \<666\> instead in the first part.
I've used \d instead of [0-9]; you can further strip down the \ use with the \v "very magic" modifier:
/\v(\d*666\d*)@!<\d+>
Of course, /666 doesn't match that expression.
Assuming, though, that you had e.g. \d\+ and wanted to exclude 666, you can use the negative lookahead:
\v((666)@!\d)+
This uses
\v for very magic (reducing the number of \ escapes)
\@! for "negative zero-width look-ahead assertion"
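Vim's negative zero-width look-ahead corresponds to (?!...) in Python, so the same exclusion can be sketched and tested there:

```python
import re

text = "123 666 6666 42"
# \b(?!\d*666)\d+\b: a whole number whose digits do not contain 666.
matches = re.findall(r"\b(?!\d*666)\d+\b", text)
print(matches)  # ['123', '42']
```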

How to display the non-whitespace character count of a visual selection in Vim?

I want to count the characters without whitespace of a visual selection.
Intuitively, I tried the following
:'<,'>w !tr -d [:blank:] | wc -m
But vim does not like it.
This is possible with the following substitute command:
:'<,'>s/\%V\S//gn
The two magical ingredients are
the n flag of the substitute command. What it does is
Report the number of matches, do not actually substitute. (...) Useful to count items.
See :h :s_flags, and check out :h count-items, too.
the zero-width atom \%V. It matches only inside the Visual selection. As a zero-width match it makes an assertion about the following atom \S "non-space", which is to match only when inside the Visual selection. See :h /\%V.
The whole command thus substitutes :s nothing // for every non-whitespace character \S inside the Visual selection \%V, globally g – only that it doesn't actually carry out any substitutions but instead reports how many times it would have!
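In Python terms, the reported count is simply the number of \S matches in the selected text; a minimal sketch:

```python
import re

selection = "foo  bar\nbaz"
# Count every non-whitespace character, like :s/\%V\S//gn reports.
count = len(re.findall(r"\S", selection))
print(count)  # 9
```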
In order to count the non-whitespace characters within a visual selection in vim, you could do a
:'<,'>s/\S/&/g
Vim will then tell how many times it replaced non-whitespace characters (\S) with itself (&), that is without actually changing the buffer.
You must escape the special characters for the shell, and [:space:] is better because it also deletes newline characters. It should be:
:'<,'>w !tr -d '[:space:]' | wc -m

display consolidated list of numbers from a CSV using BASH

I was sent a large list of URLs in an Excel spreadsheet, each unique according to a certain get variable in the string (whose value is a number 5-7 digits in length). I am having to run some queries on our databases based on those numbers, and don't want to go through the hundreds of entries weeding out the numbers one by one. What BASH commands can be used to parse out the number from each line (it's the only number in each line) and consolidate them down to one line, comma separated?
A sample (shortened) listing of the CSV spreadsheet includes:
http://www.domain.com/view.php?fDocumentId=123456
http://www.domain.com/view.php?fDocumentId=223456
http://www.domain.com/view.php?fDocumentId=323456
http://www.domain.com/view.php?fDocumentId=423456
DocumentId=523456
DocumentId=623456
DocumentId=723456
DocumentId=823456
....
...
The change of format was intentional, as they decided to simply reduce it down to the variable name and value after a few rows. The change of the get variable from fDocumentId to just DocumentId was also intentional. Ideal output would look similar to:
123456,223456,323456,423456,523456,623456,723456,823456
EDIT: my apologies, I did not notice that half way through the list, they decided to get froggy and change things around, there's entries that when saved as CSV, certain rows will appear as:
"DocumentId=098765 COMMENT, COMMENT"
DocumentId=898765 COMMENT
DocumentId=798765- COMMENT
"DocumentId=698765- COMMENT, COMMENT"
With several other entries that look similar to any of the above rows. COMMENT can be replaced with a single string of (upper-case) characters no longer than 3 characters in length per COMMENT.
Assuming the variable is always on its own, and last on the line, how about just taking whatever is on the right of the =?
sed -r "s/.*=([0-9]+)$/\1/" testdata | paste -sd","
EDIT: Ok, with the new information, you'll have to edit the regex a bit:
sed -r "s/.*f?DocumentId=([0-9]+).*/\1/" testdata | paste -sd","
Here anything after DocumentId or fDocumentId will be captured. Works for the data you've presented so far, at least.
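The same capture can be sketched in Python, which copes with both the fDocumentId and DocumentId variants as well as trailing comments (the sample lines are taken from the question):

```python
import re

lines = [
    "http://www.domain.com/view.php?fDocumentId=123456",
    "DocumentId=523456",
    '"DocumentId=698765- COMMENT, COMMENT"',
]
# Grab the digits right after "DocumentId=" on each line, skipping non-matches.
ids = [m.group(1) for m in (re.search(r"DocumentId=(\d+)", l) for l in lines) if m]
result = ",".join(ids)
print(result)  # 123456,523456,698765
```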
Simpler than that :)
cat file.csv | cut -d "=" -f 2 | xargs
If you're not completely committed to bash, the Swiss Army Chainsaw will help:
perl -ne '{$_=~s/.*=//; $_=~s/ .*//; $_=~s/-//; chomp $_ ; print "$_," }' < YOUR_ORIGINAL_FILE
That cuts everything up to and including an =, then everything after a space, then removes any dashes. Run on the above input, it returns
123456,223456,323456,423456,523456,623456,723456,823456,098765,898765,798765,698765,