How to remove special characters from strings but keep underscores in shell script - bash

I have a string that is something like "info_A!__B????????C_*". I want to remove the special characters from it but keep underscores and letters. I tried the [:word:] character set (letters, digits and _), but it says "invalid character set". Any idea how to handle this? Thanks.
text="info_!_????????_*"
if [ -z `echo $text | tr -dc "[:word:]"` ]
......

Using bash parameter expansion:
$ var='info_A!__B????????C_*'
$ echo "${var//[^[:alnum:]_]/}"
info_A__BC_
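The question's original goal was an `if [ -z ... ]` test on the cleaned string. A minimal bash sketch that wraps the parameter expansion above in a helper and quotes the result, so the test stays well-formed even when nothing is left. `keep_word_chars` is a hypothetical name, not a standard command:

```shell
# keep_word_chars: hypothetical helper (bash only). Prints its argument with
# every character that is not a letter, digit or underscore removed.
keep_word_chars() {
    printf '%s\n' "${1//[^[:alnum:]_]/}"
}

text='info_A!__B????????C_*'
cleaned=$(keep_word_chars "$text")

# Quoting "$cleaned" keeps [ -z ] valid even when the result is empty.
if [ -z "$cleaned" ]; then
    echo "no word characters in: $text"
else
    echo "$cleaned"
fi
```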

A sed one-liner would be
sed 's/[^[:alnum:]_]//g' <<< 'info_!????????*'
gives you
info_
An awk one-liner would be
awk '{gsub(/[^[:alnum:]_]/,"",$0)} 1' <<< 'info_!??A_??????*pi9ngo^%$_mingo745'
gives you
info_A_pi9ngo_mingo745
If you don't wish to have numbers in the output then change :alnum: to :alpha:.

My tr doesn't understand [:word:]. I had to do it like this:
$ x=$(echo 'info_A!__B????????C_*' | tr -cd '[:alnum:]_')
$ echo $x
info_A__BC_

Not sure if it's a robust way, but it worked for your sample text.
sed one-liner:
echo "SamPlE_#tExT%, really ?" | sed -e 's/[^a-zA-Z_]//g'
SamPlE_tExTreally
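Inside a bracket expression, `^` is only special as the first character and `|` is always literal, so a class written as `[^a-z^A-Z|^_]` also keeps `^` and `|`. A small comparison, run under `LC_ALL=C` so that the `a-z` and `A-Z` ranges mean plain ASCII letters:

```shell
sample='a^b|c_d!'

# The extra ^ and | inside the brackets are literal, so they survive.
loose=$(printf '%s\n' "$sample" | LC_ALL=C sed 's/[^a-z^A-Z|^_]//g')

# Only ASCII letters and underscores survive.
strict=$(printf '%s\n' "$sample" | LC_ALL=C sed 's/[^a-zA-Z_]//g')

printf '%s\n' "$loose" "$strict"
```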

Related

How to loop comma separated values in shell script

I tried to loop over comma-separated values, but I can't get the exact values because one of them contains a space.
I tried different ways, but I was not able to get the desired result.
Can anyone help me with this?
#!/bin/ksh
values="('A','sample text','Mark')"
for i in `echo $values | sed 's/[)(]//g' | sed 's/,/ /g'`
do
echo $i
done
My expected output is:
A
sample text
Mark
First, change values to an array. Then iterating over it is a simple matter.
values=(A "sample text" Mark)
for i in "${values[@]}"; do
echo "$i"
done
This is the same as Chepner's answer, only kludgier (variable substitution) and more dangerous (the eval), but it works with the OP's exact $values assignment:
values="('A','sample text','Mark')"
eval values=${values//,/ }
for i in "${values[@]}"; do
echo "$i"
done
It works in ksh, but really, if at all possible try to use Chepner's simpler and safer $values assignment.
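If `eval` is a concern, bash can build the array from the OP's exact string without it: strip the parentheses and quotes, then let `read -a` split on commas only. This is a bash sketch (ksh93 spells the option `read -A`), and it assumes the values themselves contain no commas or single quotes:

```shell
values="('A','sample text','Mark')"

# Drop the parentheses and single quotes: A,sample text,Mark
stripped=$(printf '%s' "$values" | tr -d "()'")

# Split on commas only, so the space inside "sample text" is preserved.
IFS=, read -r -a arr <<< "$stripped"

for i in "${arr[@]}"; do
    echo "$i"
done
```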
Simply trim the quotes
#!/bin/ksh
values="('A','sample text','Mark')"
echo $values | tr -d "()'\"" | tr ',' '\n'
output:
A
sample text
Mark
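Since the question asked for a loop, the `tr` pipeline above can also feed a `while read` loop, one value per iteration. A sketch that should work in both ksh and bash; the brackets in the output are only there to make each value's boundaries visible:

```shell
values="('A','sample text','Mark')"

# One value per line, then read each line whole (IFS= keeps inner spaces).
out=$(
    printf '%s\n' "$values" | tr -d "()'\"" | tr ',' '\n' |
    while IFS= read -r item; do
        printf '[%s]\n' "$item"
    done
)
printf '%s\n' "$out"
```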
You should use the single quotes for splitting the string (and quote "$values").
If your sed supports \n in the replacement (GNU sed does), you can do without a loop:
echo "${values}" | sed "s/[)(]//g;s/','/\n/g;s/'//g"
# or
sed "s/[)(]//g;s/','/\n/g;s/'//g" <<< "${values}"
When the values in your string contain no commas, parentheses or single quotes, you can use
grep -Eo "[^',()]*" <<< "${values}"
Better is looking for fields between 2 single quotes and remove those single quotes.
grep -Eo "'[^']*'" <<< "${values}" | tr -d "'"
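The quote-delimited pattern is the one that still works when a value itself contains a comma, which the comma-splitting approaches cannot handle. A quick check with a made-up input (`grep -o` is supported by GNU and BSD grep):

```shell
values="('A','x, y','Mark')"

# Each match is one quoted field; tr then drops the quotes.
fields=$(grep -Eo "'[^']*'" <<< "$values" | tr -d "'")
printf '%s\n' "$fields"
```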

Converting CamelCase to lowerCamelCase with POSIX Shell

I am trying to only change the first letter of a string to lowercase using a Shell script. Ideally a simple way to go from CamelCase to lowerCamelCase.
GOAL:
$DIR="SomeString"
# missing step
$echo $DIR
someString
I have found some great resources for doing this to the entire string but not just altering the first letter and leaving the remaining string untouched.
If your shell is bash 4.0 or later, you can use the following parameter expansion:
DIR="SomeString" # Note the missing dollar sign.
echo ${DIR,}
Alternative solution (will work on old bash too)
DIR="SomeString"
echo $(echo ${DIR:0:1} | tr "[A-Z]" "[a-z]")${DIR:1}
prints
someString
To assign it to a variable:
DIR2="$(echo ${DIR:0:1} | tr "[A-Z]" "[a-z]")${DIR:1}"
echo $DIR2
prints
someString
Alternatives in Perl:
DIR3=$(echo SomeString | perl -ple 's/(.)/\l$1/')
DIR3=$(echo SomeString | perl -nle 'print lcfirst')
DIR3=$(echo "$DIR" | perl -ple 's/.*/lcfirst/e')
Some terrible solutions:
DIR4=$(echo "$DIR" | sed 's/^\(.\).*/\1/' | tr "[A-Z]" "[a-z]")$(echo "$DIR" | sed 's/^.//')
DIR5=$(echo "$DIR" | cut -c1 | tr '[[:upper:]]' '[[:lower:]]')$(echo "$DIR" | cut -c2-)
All of the above was tested with OS X's /bin/bash.
With sed:
var="SomeString"
echo $var | sed 's/^./\L&/'
^ means the start of the line
\L is a GNU sed extension that converts the rest of the replacement to lowercase
& is the whole match
Perl solution:
DIR=SomeString
perl -le 'print lcfirst shift' "$DIR"
Since awk hasn't yet been mentioned, here's another way you could do it (requires GNU awk):
dir="SomeString"
new_dir=$(awk 'BEGIN{FS=OFS=""}{$1=tolower($1)}1' <<<"$dir")
This sets the input and output field separators to an empty string, so each character is a field. The tolower function does what you think it does. 1 at the end prints the line. If your shell doesn't support <<< you can do echo "$dir" | awk ... instead.
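Splitting a record into single characters with an empty FS is a GNU awk extension. A sketch that should work in any POSIX awk uses `substr` instead:

```shell
dir="SomeString"

# Lowercase character 1, then append characters 2..end unchanged.
new_dir=$(printf '%s\n' "$dir" | awk '{ print tolower(substr($0, 1, 1)) substr($0, 2) }')
echo "$new_dir"
```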
If your shell is ksh, have a look at typeset (note that typeset itself is not part of POSIX).
var='SomeString'
typeset -lL1 b="$var"
echo "${b}${var#?}"
Output:
someString
The typeset command creates a special variable that is lowercase, left-aligned and one character long. ${var#?} removes the shortest match of the pattern ? from the start of $var, and ? matches any single character.
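For a strictly POSIX shell such as dash, which has neither `${DIR,}`, `${DIR:0:1}` nor a lowercasing `typeset`, the first character can be peeled off with `#`/`%` pattern removal and lowercased with `tr`. A sketch; `lc_first` is a hypothetical function name:

```shell
# lc_first: hypothetical helper, prints $1 with its first character lowercased.
# Uses only POSIX parameter expansion, printf and tr. POSIX sh has no
# "local", so rest and first are ordinary (global) variables.
lc_first() {
    rest=${1#?}              # everything after the first character
    first=${1%"$rest"}       # the first character on its own
    printf '%s%s\n' "$(printf '%s' "$first" | tr '[:upper:]' '[:lower:]')" "$rest"
}

DIR="SomeString"
DIR=$(lc_first "$DIR")
echo "$DIR"
```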

Split from 40900000 to 409-00-000

Does anybody know a way to convert "40900000" to "409-00-000" with a single command, sed or awk?
I already tried a couple of ways with sed but had no luck at all. I need to do this in bulk: there are around 40k lines and some of these lines are not proper, so they need to be fixed.
Thanks in advance
Using GNU sed, I would do it like this:
sed -r 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
# or, equivalently
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
The -r or -E enables extended regex mode, which avoids the need to escape all the parentheses
\1 is the first capture group (the bits in between the ( ))
[0-9] means the range zero to nine
{3} means three of the preceding character or range
edit: Thanks for all the comments.
On other systems that lack the -r switch, or its alias -E, you have to escape the ( ) and { } above. That leaves you with:
sed 's/\([0-9]\{3\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1-\2-\3/' filename
At the expense of repetition, you can avoid some of the escapes by simply repeating the [0-9]:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/' filename
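Since some of the 40k lines are "not proper", it may help to anchor the pattern with ^ and $ so that only lines consisting of exactly eight digits are rewritten; unanchored, a nine-digit line such as 123456789 would become 123-45-6789. A sketch with inline sample lines (the anchoring is an addition, not part of the answers here):

```shell
# Only lines that are exactly 8 digits are reformatted; others pass through.
fixed=$(printf '%s\n' 40900000 409-00-000 123456789 |
    sed 's/^\([0-9]\{3\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)$/\1-\2-\3/')
printf '%s\n' "$fixed"
```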
For the record, Perl is equally capable of doing this sort of thing:
perl -pwe 's/(\d{3})(\d{2})(\d{3})/$1-$2-$3/' filename
-p wraps the code in a loop over the input lines and prints each line afterwards
-w means enable warnings
-e means execute one line
\d is the "digit" character class (zero to nine)
No need to run external commands; bash or ksh can do it themselves.
$ a=12345678
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
123-45-678
$ a=abc-de-fgh
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
abc-de-fgh
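The `${#a} = 8` test accepts any eight characters, so `abcdefgh` would also be reformatted. A bash sketch that checks for exactly eight digits with `[[ =~ ]]` and applies the same substring expansions; `format_id` is a hypothetical name and `filename` stands for the input file. For 40k lines, a read loop like this is likely much slower than a single sed:

```shell
# format_id: hypothetical helper; reformats input that is exactly eight
# digits and passes anything else through unchanged.
format_id() {
    local a=$1
    if [[ $a =~ ^[0-9]{8}$ ]]; then
        printf '%s\n' "${a:0:3}-${a:3:2}-${a:5}"
    else
        printf '%s\n' "$a"
    fi
}

format_id 40900000    # 409-00-000
format_id abcdefgh    # unchanged

# Applied to a file, one line at a time:
# while IFS= read -r line; do format_id "$line"; done < filename
```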
You can use sed, like this:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/'
or more succinctly, with extended regex syntax:
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/'
For golfing (GNU awk only, since FIELDWIDTHS is a gawk extension):
$ echo "40900000" | awk '$1=$1' FIELDWIDTHS='3 2 3' OFS='-'
409-00-000
With sed:
sed 's/\(...\)\(..\)\(...\)/\1-\2-\3/'
The dot matches any single character, and surrounding it with \( and \) makes it a group. \1 refers to the first group.
Just for the fun of it, an awk version:
echo "40900000" | awk '{a=$0+0} length(a)==8 {$0=substr(a,1,3)"-"substr(a,4,2)"-"substr(a,6)}1'
409-00-000
This only reformats lines whose numeric value has exactly 8 digits.
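Converting with $0+0 drops leading zeros, so a line like 00012345 fails the length test. A variant that matches eight digits with a regular expression instead, written out as eight [0-9] so it does not depend on interval-expression support in the installed awk:

```shell
out=$(printf '%s\n' 40900000 00012345 409-00-000 |
    awk '/^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/ {
             $0 = substr($0, 1, 3) "-" substr($0, 4, 2) "-" substr($0, 6)
         }
         1')
printf '%s\n' "$out"
```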
A more complex version (needs GNU awk because of gensub):
echo "40900000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000
echo "409-00-000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000
From STDIN:
echo "40900000" | grep -E "[0-9]{8}" | cut -c "1-3,4-5,6-8" --output-delimiter=-
from file:
grep -E "[0-9]{8}" filename | cut -c "1-3,4-5,6-8" --output-delimiter=-
But I prefer Tom Fenech's solution.

Remove non printing chars from bash variable

I have some variable $a. This variable has non-printing characters (carriage return ^M).
>echo $a
some words for compgen
>a+="END"
>echo $a
ENDe words for compgen
How can I remove that character?
I know that echo "$a" displays it correctly, but that's not a solution in my case.
You could use tr:
tr -dc '[[:print:]]' <<< "$var"
would remove non-printable characters from $var.
$ foo=$'abc\rdef'
$ echo "$foo"
def
$ tr -dc '[[:print:]]' <<< "$foo"
abcdef
$ foo=$(tr -dc '[[:print:]]' <<< "$foo")
$ echo "$foo"
abcdef
To remove just the trailing carriage return from a, use
a=${a%$'\r'}
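The % expansion only strips a carriage return at the very end of the value. A sketch reproducing the question's symptom, with // used to drop every carriage return when the value may contain several (this assumes bash-style $'...' quoting):

```shell
a=$'some words for compgen\r'
a=${a%$'\r'}            # remove one trailing carriage return, if present
a+="END"
printf '%s\n' "$a"      # some words for compgenEND

b=$'one\rtwo\r'
b=${b//$'\r'/}          # remove every carriage return
printf '%s\n' "$b"      # onetwo
```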
I was trying to send a notification via libnotify, with content that may contain unprintable characters. The existing solutions did not quite work for me (using a whitelist of characters using tr works, but strips any multi-byte characters).
Here is what worked, while passing the 💩 test:
message=$(iconv --from-code=UTF-8 -c <<< "$message")
As an equivalent to the tr approach using only shell builtins:
cleanVar=${var//[![:print:]]/}
...substituting :print: with the character class you want to keep, if appropriate.
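A quick check of the builtin version. Note that [:print:] excludes tab as well as carriage return, so tabs are removed too:

```shell
var=$'abc\rdef\tghi'
cleanVar=${var//[![:print:]]/}   # delete every non-printable character
printf '%s\n' "$cleanVar"        # abcdefghi
```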
tr -dc '[:alpha:]'
will delete everything except alphabetic characters (if that is what you need).

Remove blank spaces with comma in a string in bash shell

I would like to replace blank spaces/white spaces in a string with commas.
STR1=This is a string
to
STR1=This,is,a,string
Without using external tools:
echo ${STR1// /,}
Demo:
$ STR1="This is a string"
$ echo ${STR1// /,}
This,is,a,string
See bash: Manipulating strings.
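${STR1// /,} replaces every single space, so a run of spaces turns into a run of commas. A bash sketch that collapses them instead, letting default word splitting do the work and joining the words with IFS (leading and trailing blanks are dropped as well):

```shell
STR1="This   is a  string"

plain=${STR1// /,}                   # every space becomes a comma

read -r -a words <<< "$STR1"         # default IFS splitting collapses blanks
# "${words[*]}" joins the elements with the first character of IFS.
joined=$(IFS=,; printf '%s' "${words[*]}")

printf '%s\n' "$plain" "$joined"
```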
Just use sed:
echo $STR1 | sed 's/ /,/g'
or the pure Bash way:
echo ${STR1// /,}
kent$ echo "STR1=This is a string"|awk -v OFS="," '$1=$1'
STR1=This,is,a,string
Note: if there are consecutive blanks, they are replaced with a single comma, as the example above shows.
This might work for you:
echo 'STR1=This is a string' | sed 'y/ /,/'
STR1=This,is,a,string
or:
echo 'STR1=This is a string' | tr ' ' ','
STR1=This,is,a,string
How about
STR1="This is a string"
StrFix="$( echo "$STR1" | sed 's/[[:space:]]/,/g')"
echo "$StrFix"
**output**
This,is,a,string
If you have multiple adjacent spaces in your string and want to reduce them to just 1 comma, then change the sed to
STR1="This is a string"
StrFix="$( echo "$STR1" | sed 's/[[:space:]][[:space:]]*/,/g')"
echo "$StrFix"
**output**
This,is,a,string
I'm using a non-standard sed, and so have used [[:space:]][[:space:]]* to indicate one or more "white-space" characters (including tabs, VT, maybe a few others). In a modern sed, I would expect [[:space:]]+ to work as well, with extended regular expressions enabled (sed -E).
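With GNU or BSD sed, -E enables extended regular expressions, so the + form can be written directly:

```shell
STR1="This   is a string"

# One or more whitespace characters become a single comma.
StrFix=$(printf '%s\n' "$STR1" | sed -E 's/[[:space:]]+/,/g')
printf '%s\n' "$StrFix"
```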
STR1=`echo $STR1 | sed 's/ /,/g'`
