Command line - Convert any string into an identifier? - bash

Often I'd like to be able to take a string at the command line (bash), and convert it into an identifier. Usually this is for use in a filename, branch name, or variable name, and I prefer that it:
has no spaces in it
has no special characters in it
So for example, I could take a string like so:
bug fix for #PROJECT1 item 52, null pointer
and convert it to something like this:
bug_fix_for_PROJECT1_item_52__null_pointer
I'm open to solutions in any language, e.g. bash, node, perl, python, etc, but prefer languages that are installed by default on most linux/osx machines.

You could do something like this:
original="bug fix for #PROJECT1 item 52, null pointer"
sanitized=${original//[^[:alnum:]]/_}
echo "$sanitized"
Let me break that down a bit:
${VAR_NAME//SEARCH/REPLACE} replaces every occurrence of SEARCH in the value of VAR_NAME with REPLACE.
[^[:alnum:]] means any character that is NOT alphabetic or numeric. The "NOT" part is the ^
The outer brackets indicate that the expression refers to one character chosen among the different possibilities listed inside the bracket (see below for how to use this to your advantage).
This could be tailored to do something a bit more subtle if desired. Remember UNIX-like systems accept almost any character in file names (even newlines), so you are not restricted to letters and digits.
For instance, suppose you want to keep periods and commas in file names. You could change the replacement statement :
sanitized=${original//[^[:alnum:].,]/_}
The modified part ([^[:alnum:].,]) means "anything that is not an alphanumeric character, and not a period, and not a comma". You can add any other character you want to keep inside the bracket expression (this is shell pattern syntax, which closely resembles a regular expression character class), but it is key that you keep the outer brackets.
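For comparison, here is a minimal Python sketch of the same idea: replace every character outside an allowed set with an underscore. The function name sanitize and its keep parameter are hypothetical names chosen for illustration, and the sketch assumes ASCII letters and digits only, whereas bash's [:alnum:] depends on the locale.

```python
import re

def sanitize(text, keep=""):
    """Replace each character that is not an ASCII letter, a digit,
    or listed in `keep` with an underscore."""
    # re.escape makes characters such as '.' or '-' literal inside the class.
    pattern = "[^0-9A-Za-z" + re.escape(keep) + "]"
    return re.sub(pattern, "_", text)

original = "bug fix for #PROJECT1 item 52, null pointer"
print(sanitize(original))             # every other character becomes '_'
print(sanitize(original, keep=".,"))  # periods and commas are kept
```

As in the bash version, each replaced character yields its own underscore, so adjacent special characters produce consecutive underscores.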

Just an alternate variation using a Perl command-line substitution, to have exactly one _ between words and no consecutive underscores like __:
perl -ple 's/[\W_]+/_/g' <<<"bug fix for #PROJECT1 item 52, null pointer"
bug_fix_for_PROJECT1_item_52_null_pointer
and a simple snippet in python as
>>> import re
>>> re.sub('[^0-9a-zA-Z]+','_','bug fix for #PROJECT1 item 52, null pointer')
'bug_fix_for_PROJECT1_item_52_null_pointer'

Did you try tr?
echo 'bug fix for #PROJECT1 item 52, null pointer' | tr -d '[:punct:]' | tr '[:blank:]' '_'
bug_fix_for_PROJECT1_item_52_null_pointer
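The two-step tr pipeline (delete punctuation, then turn blanks into underscores) can be sketched in Python with str.translate. The function name tr_style is hypothetical, and the sketch assumes that string.punctuation (ASCII punctuation) stands in for [:punct:] and that space and tab stand in for [:blank:].

```python
import string

def tr_style(text):
    """Delete ASCII punctuation, then map space and tab to '_'."""
    # Third argument of maketrans lists the characters to delete.
    table = str.maketrans(" \t", "__", string.punctuation)
    return text.translate(table)

print(tr_style("bug fix for #PROJECT1 item 52, null pointer"))
```

Because punctuation is deleted rather than replaced, a string such as "a - b" still ends up with two consecutive underscores, one for each blank.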

Related

What is the meaning of this BASH SED command?

Example of tnum ... HYH19986_T_DRIVER_BAG_PRESSURE__78ms_546ms
tnum=`echo $1 | sed -e 's/_.*$//'`
The end result is that tnum will eventually become HYH19986. I have absolutely no experience of BASH but a quick search found that SED is the stream editor and essentially a find and replace tool.
Please could someone explain to me what everything means from the -e onwards? Thank you.
Sed is the "stream editor". It is a non-interactive text editor that takes commands to edit text. Its most commonly used command is "s", short for "substitute". This takes two expressions and optionally some options, and replaces the first expression with the second one.
The character after the "s" is the delimiter - it separates the expressions. Typically this is "/", but if you are working e.g. with paths it might be nicer to use something different like : or _ so you don't need to escape every /.
The _.*$ is a regular expression. Sed matches this, and replaces it with the second expression, the bit between the second and third slash, i.e. nothing in this case.
_ is a literal underline, .* is "any number of characters" and $ is the end of the line.
After that third slash you could also give options, like "g" (I remember it as "global"), which would cause this to be run multiple times per line. That's missing, but in this case the expression matches to the end of the line anyway, so nothing would change.
So this substitutes anything after an underline with nothing, which results in trimming it.
s/pattern/repl/ replaces the first occurrence of the pattern with the string repl. _.*$ matches a literal _ followed by the longest string of zero or more of any character (.*) up to the end of the line ($). So this just deletes everything from and including the first underscore to the end of the line.
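As a cross-check of the explanation, the same substitution can be sketched in Python. The function names are hypothetical; count=1 mirrors sed's behaviour of replacing only the first match when no g flag is given.

```python
import re

def strip_from_first_underscore(name):
    """Delete everything from the first '_' to the end of the line."""
    return re.sub(r"_.*$", "", name, count=1)

def strip_with_partition(name):
    """Same result without a regex: keep the text before the first '_'."""
    return name.partition("_")[0]

tnum = strip_from_first_underscore("HYH19986_T_DRIVER_BAG_PRESSURE__78ms_546ms")
print(tnum)
```

A name without any underscore is returned unchanged by both variants.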

Delete a column in a log, that changes position in the line, Powershell

I have a log file, and I want to get rid of the third column, which starts with "external". This column is not always in third place, so I need to find the word "external" and then delete it along with the string that follows the colon.
I was thinking of using -replace for that, but does -replace accept a regex that deletes the rest of the string (after the colon), which is always changing?
Or maybe there is a better way to do this?
02/02/2020 name:VAL_NATURE external:af2045b2-5992-432e-b790-c1ad4743038 status:good
cat mylog.log | %{$_ -replace "external???",""}
With any delimited file, the first thought I have is to break it at the delimiters (in your case, the white space) and treat it like an object. Deleting a column is trivial if you do that, and it lets you have easy access to the data for other purposes.
If, however, your only task is to remove that column with 'external' + colon + all text up to the next bit of white space, that is an easy thing to do with a regex replace.
$line = '02/02/2020 name:VAL_NATURE external:af2045b2-5992-432e-b790-c1ad4743038 status:good'
$line -replace 'external:.*\s',''
EDIT: Tested the code above, and got this output:
02/02/2020 name:VAL_NATURE status:good
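The . is any character, and .* says "any character zero or more times". The \s represents whitespace (space/tab/etc). So this regex matches the word 'external' followed by a ':' followed by zero or more other characters followed by whitespace. Note that .* is greedy: it matches up to the last whitespace on the line, which works here only because a single column follows the external one. If more columns can follow, 'external:\S*\s' (zero or more non-whitespace characters, then whitespace) removes just that one column.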
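The difference between a greedy .* and a field-limited \S* is easier to see on a line with more than one trailing column. The following Python sketch uses the log line from the question plus a hypothetical extra field code:7 that is not in the original log. The function names are also made up for illustration, and the last one shows the "split at the delimiters" approach mentioned above.

```python
import re

line = ("02/02/2020 name:VAL_NATURE "
        "external:af2045b2-5992-432e-b790-c1ad4743038 status:good")
longer = line + " code:7"  # hypothetical extra column, for illustration only

def drop_greedy(s):
    # .* runs to the LAST whitespace on the line
    return re.sub(r"external:.*\s", "", s)

def drop_field(s):
    # \S* stops at the first whitespace after the value
    return re.sub(r"external:\S*\s", "", s)

def drop_by_split(s):
    # split on whitespace and discard the unwanted column
    return " ".join(f for f in s.split() if not f.startswith("external:"))

print(drop_greedy(longer))    # also loses status:good
print(drop_field(longer))     # only the external column is removed
print(drop_by_split(longer))  # same, and also works if the column is last
```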

grep wildcards issue ubuntu

I have an input file named test which looks like this
leonid sergeevich vinogradov
ilya alexandrovich svintsov
and when I use grep like this grep 'leonid*vinogradov' test it says nothing, but when I type grep 'leonid.*vinogradov' test it gives me the first string. What's the difference between * and .*? Because I see no difference between any number of any characters and any character followed by any number of any characters.
I use ubuntu 14.04.3.
* doesn't match any number of characters, like in a file glob. It is an operator, which indicates 0 or more matches of the previous character. The regular expression leonid*vinogradov would require a v to appear immediately after 0 or more ds. The . is the regular expression metacharacter representing any single character, so .* matches 0 or more arbitrary characters.
grep uses regex and .* matches 0 or more of any characters.
Whereas 'leonid*vinogradov' is also evaluated as a regex, and it means leoni followed by 0 or more of the letter d, hence your match fails.
grep uses regular expressions (regexp for short), not the wildcards you had in mind. In this case, "." means any character and "*" means any number (including zero) of the previous character, so ".*" means anything here.
Check the link, or google it; it's a powerful tool you'll find worth knowing.
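The glob-versus-regex distinction can be checked with a short Python sketch. It assumes Python's re module treats * and .* the same way grep does for these patterns, and it uses fnmatch to stand in for shell-style wildcards. The helper name regex_hit is hypothetical.

```python
import fnmatch
import re

line = "leonid sergeevich vinogradov"

def regex_hit(pattern, text):
    """True if the regular expression matches anywhere in text."""
    return re.search(pattern, text) is not None

# As a regex, d* means "zero or more d", so a v must follow immediately.
print(regex_hit(r"leonid*vinogradov", line))               # no match
print(regex_hit(r"leonid*vinogradov", "leonivinogradov"))  # zero d's: match
print(regex_hit(r"leonid.*vinogradov", line))              # .* spans the middle name

# As a glob, * alone means "any run of characters".
print(fnmatch.fnmatchcase(line, "leonid*vinogradov"))
```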

deselect text in vim, like grep -v

I would like to imitate the following pattern of searching in vim:
grep "\<[0-9]\>" * | grep -v "666"
I can highlight all numbers using
/\<[0-9]\>
but then how can I tell vim to remove from the highlighted text the ones that match the expression
/666
Can this be done in Visual Studio at least ?
You cannot sequentially filter the matches like in the shell, so you need to use advanced regular expression features to combine both into a single one.
Basically, you need to assert a non-match of 666 at the match position. That's achieved with the \@! atom (in other regular expression dialects, that's often written as (?!...)):
/\%(\d*666\d*\)\@!\<\d\+\>
Note: If you want to only exclude 666, but not 6666 etc. you need to specify \<666\> instead in the first part.
I've used \d instead of [0-9]; you can further strip down the \ use with the \v "very magic" modifier:
/\v(\d*666\d*)@!<\d+>
Of course, /666 doesn't match that expression.
Assuming, though, that you had e.g. \d\+ and wanted to exclude 666, you can use the negative lookahead:
\v((666)@!\d)+
This uses
\v for very magic (reducing the number of \ escapes)
\@! for "negative zero-width look-ahead assertion"
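For readers more familiar with the (?!...) spelling, here is a rough Python sketch of the same negative look-ahead idea. The sample text and the two pattern names are made up for illustration, and Python's \b stands in for Vim's \< and \> word boundaries.

```python
import re

text = "12 666 1666 7 66"  # hypothetical sample input

# Exclude any number that contains 666 anywhere in it.
NO_666_ANYWHERE = re.compile(r"\b(?!\d*666)\d+\b")

# Exclude only the exact number 666.
NOT_EXACTLY_666 = re.compile(r"\b(?!666\b)\d+\b")

print(NO_666_ANYWHERE.findall(text))
print(NOT_EXACTLY_666.findall(text))
```

The look-ahead consumes no characters: it only vetoes a match that would otherwise start at that position.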

How to display the non-whitespace character count of a visual selection in Vim?

I want to count the characters without whitespace of a visual selection.
Intuitively, I tried the following
:'<,'>w !tr -d [:blank:] | wc -m
But vim does not like it.
This is possible with the following substitute command:
:'<,'>s/\%V\S//gn
The two magical ingredients are
the n flag of the substitute command. What it does is
Report the number of matches, do not actually substitute. (...) Useful to count items.
See :h :s_flags, and check out :h count-items, too.
the zero-width atom \%V. It matches only inside the Visual selection. As a zero-width match it makes an assertion about the following atom \S "non-space", which is to match only when inside the Visual selection. See :h /\%V.
The whole command thus substitutes :s nothing // for every non-whitespace character \S inside the Visual selection \%V, globally g – only that it doesn't actually carry out any substitutions but instead reports how many times it would have!
In order to count the non-whitespace characters within a visual selection in vim, you could do a
:'<,'>s/\S/&/g
Vim will then tell how many times it replaced non-whitespace characters (\S) with itself (&), that is without actually changing the buffer.
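Outside Vim, the quantity being counted can be sketched in a couple of lines of Python, for instance to sanity-check the number Vim reports on a small sample. The function name is hypothetical, and the sketch assumes that Python's notion of whitespace (str.isspace) is an acceptable stand-in for what Vim's \S excludes.

```python
def count_non_whitespace(text):
    """Count the characters in text that are not whitespace."""
    return sum(1 for ch in text if not ch.isspace())

print(count_non_whitespace("foo bar\n\tbaz"))
```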
You must quote the character class so that the shell does not treat it as a glob, and it is better to use [:space:], because it also deletes newline characters. It should be:
:'<,'>w !tr -d '[:space:]' | wc -m

Resources