How to remove special characters from strings but keep underscores in shell script - bash

I have a string that is something like "info_A!__B????????C_*". I want to remove the special characters from it but keep underscores and letters. I tried the [:word:] (ASCII letters and _) character set, but it says "invalid character set". Any idea how to handle this? Thanks.
if [ -z `echo $text | tr -dc "[:word:]"` ]

Using bash parameter expansion:
$ var='info_A!__B????????C_*'
$ echo "${var//[^[:alnum:]_]/}"
info_A__BC_

A sed one-liner would be
sed 's/[^[:alnum:]_]//g' <<< 'info_!????????*'
which gives you
info_
An awk one-liner would be
awk '{gsub(/[^[:alnum:]_]/,"",$0)} 1' <<< 'info_!??A_??????*pi9ngo^%$_mingo745'
which gives you
info_A_pi9ngo_mingo745
If you don't want digits in the output, change [:alnum:] to [:alpha:].
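To make the [:alnum:] versus [:alpha:] difference concrete, here is a small sketch built on the same sed expression. The function names keep_alnum and keep_alpha are made up for this illustration.

```bash
# Hypothetical helpers, not part of the answer above.

# keep_alnum: keep letters, digits and underscores.
keep_alnum() {
  printf '%s\n' "$1" | sed 's/[^[:alnum:]_]//g'
}

# keep_alpha: keep letters and underscores only, so digits are dropped too.
keep_alpha() {
  printf '%s\n' "$1" | sed 's/[^[:alpha:]_]//g'
}

keep_alnum 'pi9ngo^%$_mingo745'   # prints pi9ngo_mingo745
keep_alpha 'pi9ngo^%$_mingo745'   # prints pingo_mingo
```

Which characters count as letters depends on the current locale, so results for non-ASCII input may vary.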

My tr doesn't understand [:word:]. I had to do it like this:
$ x=$(echo 'info_A!__B????????C_*' | tr -cd '[:alnum:]_')
$ echo $x
info_A__BC_

I'm not sure this is a robust way, but it worked for your sample text.
sed one-liner:
echo "SamPlE_#tExT%, really ?" | sed -e 's/[^a-zA-Z_]//g'


How to loop comma separated values in shell script

I tried to loop over comma-separated values, but I can't get the exact values because one of them contains a space.
I tried several different ways, but I am not able to get the desired result.
Can anyone help me with this?
values="('A','sample text','Mark')"
for i in `echo $values | sed 's/[)(]//g' | sed 's/,/ /g'`
do
  echo $i
done
My expected output is:
A
sample text
Mark
First, change values to an array. Then iterating over it is a simple matter.
values=(A "sample text" Mark)
for i in "${values[@]}"; do
  echo "$i"
done
This is the same as Chepner's answer, only kludgier (it relies on variable substitution) and more dangerous (it uses eval), but it works with the OP's exact $values assignment:
values="('A','sample text','Mark')"
eval "values=${values//,/ }"
for i in "${values[@]}"; do
  echo "$i"
done
It works in ksh, but really, if at all possible, use Chepner's simpler and safer $values assignment.
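If you have to keep the OP's string but would rather avoid eval, one possible bash sketch is to strip the parentheses and quotes with parameter expansion and split on commas with read. The function name split_values is hypothetical, and the sketch assumes that no value contains a comma or a single quote.

```bash
# Hypothetical helper: print each value of "('A','sample text','Mark')" on its own line.
split_values() {
  local str=$1 item
  local -a items
  str=${str#\(}        # drop the leading "("
  str=${str%\)}        # drop the trailing ")"
  # Split on commas only, so spaces inside a value are preserved.
  IFS=, read -r -a items <<< "$str"
  for item in "${items[@]}"; do
    item=${item#\'}    # strip the opening single quote
    item=${item%\'}    # strip the closing single quote
    printf '%s\n' "$item"
  done
}

split_values "('A','sample text','Mark')"
```

Because nothing is passed to eval, unexpected content in the string cannot be executed as code.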
Simply trim the quotes
values="('A','sample text','Mark')"
echo "$values" | tr -d "()'\"" | tr ',' '\n'
A
sample text
Mark
You should use the single quotes for splitting the string (and quote "$values").
If your sed supports \n in the replacement text, you can do without a loop:
echo "${values}" | sed "s/[)(]//g;s/','/\n/g;s/'//g"
# or
sed "s/[)(]//g;s/','/\n/g;s/'//g" <<< "${values}"
If the values in your string contain no commas or parentheses, you can use
grep -Eo "[^',()]*" <<< "${values}"
A better approach is to look for the fields between two single quotes and then remove those single quotes.
grep -Eo "'[^']*'" <<< "${values}" | tr -d "'"
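Since the question asks for a loop, here is a sketch of how the grep output could feed one. The helper name list_fields is hypothetical. It relies on grep -o, which GNU and BSD grep provide but POSIX does not require.

```bash
values="('A','sample text','Mark')"

# Hypothetical helper: print each single-quoted field on its own line, without the quotes.
list_fields() {
  printf '%s\n' "$1" | grep -o "'[^']*'" | tr -d "'"
}

# Read line by line so that "sample text" stays a single item.
list_fields "$values" | while IFS= read -r item; do
  printf 'item: %s\n' "$item"
done
```

Reading with while IFS= read -r avoids the word splitting that broke the original for loop.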

Converting CamelCase to lowerCamelCase with POSIX Shell

I am trying to only change the first letter of a string to lowercase using a Shell script. Ideally a simple way to go from CamelCase to lowerCamelCase.
# missing step
$echo $DIR
I have found some great resources for doing this to the entire string but not just altering the first letter and leaving the remaining string untouched.
If your shell is recent enough (bash 4.0 or later), you can use the following parameter expansion:
DIR="SomeString" # Note the missing dollar sign.
echo "${DIR,}"
Alternative solution (will work on old bash too)
echo $(echo ${DIR:0:1} | tr "[A-Z]" "[a-z]")${DIR:1}
To assign the result to a variable:
DIR2="$(echo ${DIR:0:1} | tr "[A-Z]" "[a-z]")${DIR:1}"
echo $DIR2
Alternatives using Perl:
DIR3=$(echo SomeString | perl -ple 's/(.)/\l$1/')
DIR3=$(echo SomeString | perl -nle 'print lcfirst')
DIR3=$(echo "$DIR" | perl -ple 's/.*/lcfirst/e')
Some terrible solutions:
DIR4=$(echo "$DIR" | sed 's/^\(.\).*/\1/' | tr "[A-Z]" "[a-z]")$(echo "$DIR" | sed 's/^.//')
DIR5=$(echo "$DIR" | cut -c1 | tr '[[:upper:]]' '[[:lower:]]')$(echo "$DIR" | cut -c2-)
All the above is tested with OSX's /bin/bash.
With GNU sed:
echo "$var" | sed 's/^./\L&/'
^ anchors the match at the start of the line
\L (a GNU extension) converts the replacement text that follows it to lowercase
& is the whole match
Perl solution:
perl -le 'print lcfirst shift' "$DIR"
Since awk hasn't yet been mentioned, here's another way you could do it (requires GNU awk):
new_dir=$(awk 'BEGIN{FS=OFS=""}{$1=tolower($1)}1' <<<"$dir")
This sets the input and output field separators to an empty string, so each character is a field. The tolower function does what you think it does. 1 at the end prints the line. If your shell doesn't support <<< you can do echo "$dir" | awk ... instead.
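If GNU awk is not available, a variant that uses only substr and tolower should work in any POSIX awk. This is a sketch, and the function name lcfirst_awk is made up for the example.

```bash
# Hypothetical helper: lowercase only the first character of the argument.
lcfirst_awk() {
  # substr($0, 1, 1) is the first character; substr($0, 2) is the rest of the line.
  printf '%s\n' "$1" | awk '{ print tolower(substr($0, 1, 1)) substr($0, 2) }'
}

lcfirst_awk "SomeString"   # prints someString
```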
If you are using ksh, have a look at typeset (note that typeset is a ksh builtin and is not specified by POSIX).
typeset -lL1 b="$var"
echo "${b}${var#?}"
The typeset command creates a special variable that is lowercase, left aligned and one char long. ${var#?} removes the shortest match of the pattern from the start of $var, and ? matches a single character.
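For a version that sticks to features POSIX does specify (the # and % parameter expansions plus tr), one possible sketch follows. The function name lcfirst_sh and the _lc_ variable names are made up for the example.

```bash
# Hypothetical helper: lowercase the first character using only POSIX sh features.
lcfirst_sh() {
  # POSIX sh has no "local", so these variables are global; the prefix just avoids clashes.
  _lc_rest=${1#?}               # everything after the first character
  _lc_first=${1%"$_lc_rest"}    # the first character on its own
  _lc_first=$(printf '%s' "$_lc_first" | tr '[:upper:]' '[:lower:]')
  printf '%s%s\n' "$_lc_first" "$_lc_rest"
}

lcfirst_sh "SomeString"   # prints someString
```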

Split from 40900000 to 409-00-000

Does anybody know a way to convert "40900000" to "409-00-000" with a single command, sed or awk?
I already tried a couple of ways with sed, but no luck at all. I need to do this in bulk: there are around 40k lines, and some of these lines are not in the proper format, so they need to be fixed.
Thanks in advance
Using GNU sed, I would do it like this:
sed -r 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
# or, equivalently
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
The -r or -E option enables extended regex mode, which avoids the need to escape all the parentheses
\1 is the first capture group (the bits in between the ( ))
[0-9] means the range zero to nine
{3} means three of the preceding character or range
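One thing to watch when running this over 40k lines: the pattern is not anchored, so a nine-digit line such as 123456789 would become 123-45-6789. A sketch that only rewrites lines consisting of exactly eight digits adds ^ and $. The function name format_codes is hypothetical.

```bash
# Hypothetical helper: reads lines on stdin and rewrites only those that are exactly eight digits.
format_codes() {
  sed -E 's/^([0-9]{3})([0-9]{2})([0-9]{3})$/\1-\2-\3/'
}

printf '%s\n' 40900000 409-00-000 123456789 | format_codes
# prints:
# 409-00-000
# 409-00-000
# 123456789
```

Lines that are already formatted, or that have the wrong number of digits, pass through unchanged.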
edit: Thanks for all the comments.
On other systems that lack the -r switch, or its alias -E, you have to escape the ( ) and { } above. That leaves you with:
sed 's/\([0-9]\{3\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1-\2-\3/' filename
At the expense of repetition, you can avoid some of the escapes by simply repeating the [0-9]:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/' filename
For the record, Perl is equally capable of doing this sort of thing:
perl -pwe 's/(\d{3})(\d{2})(\d{3})/$1-$2-$3/' filename
-p means loop over the input lines and print each one after the code has run
-w means enable warnings
-e means execute the code given on the command line
\d is the "digit" character class (zero to nine)
No need to run external commands; bash or ksh can do it themselves.
$ a=12345678
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
123-45-678
$ a=abc-de-fgh
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
abc-de-fgh
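The length test alone would also rewrite any eight-character string, for example abcdefgh. A bash sketch that checks for exactly eight digits first could look like this. The function name dash_code is made up, and [[ =~ ]] is a bash feature.

```bash
# Hypothetical helper: insert dashes only when the argument is exactly eight digits.
dash_code() {
  local a=$1
  if [[ $a =~ ^[0-9]{8}$ ]]; then
    printf '%s-%s-%s\n' "${a:0:3}" "${a:3:2}" "${a:5}"
  else
    printf '%s\n' "$a"   # leave anything else untouched
  fi
}

dash_code 12345678     # prints 123-45-678
dash_code abcdefgh     # prints abcdefgh
```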
You can use sed, like this:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/'
or more succinctly, with extended regex syntax:
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/'
For golfing (FIELDWIDTHS requires GNU awk):
$ echo "40900000" | awk '$1=$1' FIELDWIDTHS='3 2 3' OFS='-'
With sed:
sed 's/\(...\)\(..\)\(...\)/\1-\2-\3/'
The dot matches any single character, and surrounding it with \( and \) makes it a group. \1 references the first group.
Just for the fun of it, an awk version:
echo "40900000" | awk '{a=$0+0} length(a)==8 {$0=substr(a,1,3)"-"substr(a,4,2)"-"substr(a,6)}1'
This tests whether the number has 8 digits.
A more complex version (needs GNU awk because of gensub):
echo "40900000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
echo "409-00-000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
An alternative, reading from STDIN:
echo "40900000" | grep -E "[0-9]{8}" | cut -c "1-3,4-5,6-8" --output-delimiter=-
from file:
grep -E "[0-9]{8}" filename | cut -c "1-3,4-5,6-8" --output-delimiter=-
But I prefer Tom Fenech's solution.

Remove non printing chars from bash variable

I have some variable $a. This variable contains a non-printing character (carriage return, ^M).
>echo $a
some words for compgen
>echo $a
ENDe words for compgen
How can I remove that character?
I know that echo "$a" displays it correctly, but that's not a solution in my case.
You could use tr:
tr -dc '[[:print:]]' <<< "$var"
would remove the non-printable characters from $var.
$ foo=$'abc\rdef'
$ echo "$foo"
$ tr -dc '[[:print:]]' <<< "$foo"
$ foo=$(tr -dc '[[:print:]]' <<< "$foo")
$ echo "$foo"
To remove just the trailing carriage return from a, use
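One common way to do this in bash, shown here as a sketch and not necessarily the command the answer had in mind, is the % suffix-removal expansion together with the ANSI-C quoted $'\r'. The function name strip_trailing_cr is hypothetical.

```bash
# Hypothetical helper: drop a single trailing carriage return, if there is one.
strip_trailing_cr() {
  local cr=$'\r'
  printf '%s\n' "${1%"$cr"}"
}

# The same idea applied directly to a variable:
a=$'some words for compgen\r'
a=${a%$'\r'}
printf '%s\n' "$a"   # prints some words for compgen
```

A carriage return in the middle of the string is left alone, since % only removes a matching suffix.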
I was trying to send a notification via libnotify, with content that may contain unprintable characters. The existing solutions did not quite work for me (using a whitelist of characters using tr works, but strips any multi-byte characters).
Here is what worked, while passing the 💩 test:
message=$(iconv --from-code=UTF-8 -c <<< "$message")
As an equivalent to the tr approach using only shell builtins:
...substituting :print: with the character class you want to keep, if appropriate.
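A sketch of what such a builtin-only version could look like in bash, assuming pattern substitution with a negated character class is what was meant. The function name strip_nonprint is made up.

```bash
# Hypothetical helper: delete every character that is not in the [:print:] class.
strip_nonprint() {
  printf '%s\n' "${1//[^[:print:]]/}"
}

strip_nonprint $'abc\rdef'   # prints abcdef
```

What counts as printable depends on the current locale.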
tr -dc '[:alpha:]'
will reduce your string to only its alphabetic characters (if that is what you need)

Replace blank spaces with commas in a string in bash shell

I would like to replace blank spaces/white spaces in a string with commas.
STR1=This is a string
Without using external tools:
echo ${STR1// /,}
$ STR1="This is a string"
$ echo ${STR1// /,}
See bash: Manipulating strings.
Just use sed:
echo $STR1 | sed 's/ /,/g'
or the pure bash way:
echo ${STR1// /,}
kent$ echo "STR1=This is a string"|awk -v OFS="," '$1=$1'
If there are consecutive blanks, they are replaced with a single comma, as the example above shows.
This might work for you:
echo 'STR1=This is a string' | sed 'y/ /,/'
echo 'STR1=This is a string' | tr ' ' ','
How about
STR1="This is a string"
StrFix="$( echo "$STR1" | sed 's/[[:space:]]/,/g')"
echo "$StrFix"
If you have multiple adjacent spaces in your string and want to reduce them to just 1 comma, then change the sed to
STR1="This is a string"
StrFix="$( echo "$STR1" | sed 's/[[:space:]][[:space:]]*/,/g')"
echo "$StrFix"
I'm using a non-standard sed, and so have used [[:space:]][[:space:]]* to indicate one or more "white-space" characters (including tabs, VT, maybe a few others). In a modern sed with extended regular expressions (sed -E), I would expect [[:space:]]+ to work as well.
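To see the two behaviours side by side (one comma per space versus one comma per run of blanks), here is a small sketch using tr. The function names are made up for the example.

```bash
# Hypothetical helpers.

# Every single space becomes a comma, so two spaces give two commas.
spaces_to_commas() {
  printf '%s\n' "$1" | tr ' ' ','
}

# Spaces and tabs become commas, then -s squeezes each run of commas into one.
blanks_to_single_comma() {
  printf '%s\n' "$1" | tr -s ' \t' ',,'
}

spaces_to_commas 'This  is a string'          # prints This,,is,a,string
blanks_to_single_comma 'This  is a   string'  # prints This,is,a,string
```

Note that tr -s also squeezes runs of commas that were already present in the input, so the sed version may be the better fit if the data can contain commas.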
STR1=`echo $STR1 | sed 's/ /,/g'`
