Remove non printing chars from bash variable

I have a variable $a. This variable contains a non-printing character (a carriage return, ^M).
>echo $a
some words for compgen
>a+="END"
>echo $a
ENDe words for compgen
How can I remove that character?
I know that echo "$a" displays it correctly, but that is not a solution in my case.

You could use tr:
tr -dc '[:print:]' <<< "$var"
This removes every non-printable character from $var.
$ foo=$'abc\rdef'
$ echo "$foo"
def
$ tr -dc '[:print:]' <<< "$foo"
abcdef
$ foo=$(tr -dc '[:print:]' <<< "$foo")
$ echo "$foo"
abcdef

To remove just the trailing carriage return from a, use
a=${a%$'\r'}
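As a rough sketch of how this plays out with the question's example (the value assigned to a below is made up to mimic text read from a CRLF source), stripping the carriage return first lets the later append land where expected:

```bash
# Hypothetical value ending in a carriage return, mimicking the question
a=$'some words for compgen\r'

a=${a%$'\r'}    # drop one trailing CR, if there is one
a+="END"        # append as in the question

printf '%s\n' "$a"
```

If carriage returns may appear anywhere in the value, ${a//$'\r'/} removes all of them instead of only the last one.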

I was trying to send a notification via libnotify, with content that may contain unprintable characters. The existing solutions did not quite work for me (using a whitelist of characters using tr works, but strips any multi-byte characters).
Here is what worked, while passing the 💩 test:
message=$(iconv --from-code=UTF-8 -c <<< "$message")

As an equivalent to the tr approach using only shell builtins:
cleanVar=${var//[![:print:]]/}
...substituting :print: with the character class you want to keep, if appropriate.
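A small sketch of the difference between filtering by character class and removing one specific character; the sample value and the variable names printable_only and no_cr are assumptions made for illustration:

```bash
# Sample value containing a carriage return and a tab
var=$'abc\rdef\tghi'

# Keep only printable characters: both the CR and the tab are removed
printable_only=${var//[![:print:]]/}

# Remove carriage returns only: the tab is kept
no_cr=${var//$'\r'/}

# %q shows any remaining control characters in escaped form
printf '%q\n' "$printable_only" "$no_cr"
```

Which characters count as printable depends on the current locale, so results for non-ASCII text may vary.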

tr -dc '[:alpha:]'
deletes everything except alphabetic characters, if that is what you need.

Related

How to properly expand a Bash variable that contains newlines on sed replacement (insertion) side

Bear with me at first, thank you. Suppose I have
$ echo $'foo\nbar'
foo
bar
Now when I assign the string to a Bash variable, Bash does not give the same vertical output anymore:
$ str='foo\nbar'
$
$ echo $str
foo\nbar
$
$ echo $'str'
str
Try printf:
$ printf "$str\n"
foo
bar
Those examples are for illustration purposes because I am looking for a way to expand the newline(s) inside the $str variable such that I can substitute the $str variable on sed replacement (insertion) side.
# this does not work:
sed -i.bak $'/<!-- insert here -->/i\\\n'$'str'$'\\\n' index.html
# this works as expected though:
sed -i.bak $'/<!-- insert here -->/i\\\n'foo$'\\\n'bar$'\\\n' index.html
I tried several hacks, but none worked; here is one example:
# this does not work:
sed -i.bak $'/<!-- insert here -->/i\\\n'`printf 'foo\\x0Abar'`$'\\\n' index.html
After further tests, I realized that as long as the variable does not contain newlines, things work as expected:
# This works as long as str2 does not contain any newline.
str2='foo_bar'
sed -i.bak $'/<!-- insert here -->/i\\\n'$str2$'\\\n' index.html
The expected result is that sed inserts two lines before <!-- insert here --> in the index.html file:
foo
bar
<!-- insert here -->
I am trying to achieve this as a one-liner. I know I can break the sed command into the multi-line form, which would be easier for me; however, I want to explore whether a one-liner style exists.
Is this doable or not?
My system is macOS High Sierra 10.13.6
Bash version: 3.2.57(1)-release
BSD sed was last updated on May 10, 2005

Your examples have a few subtle errors, so here are a few examples regarding quoting and newlines in strings in bash and sed.
How quoting works in general:
# bash converts escape-sequence '\n' to real newline (0x0a) before passing it to echo
$ echo $'foo\nbar'
foo
bar
# bash passes literal 8 characters 'foo\nbar' to echo and echo simply prints them
$ echo 'foo\nbar'
foo\nbar
# bash passes literal 8 characters 'foo\nbar' to echo and echo converts escape-sequence
$ echo -e 'foo\nbar'
foo
bar
# bash passes literal string 'foo\nbar' to echo (twice)
# then echo recombines both arguments using a single space
$ str='foo\nbar'
$ echo $str "$str"
foo\nbar foo\nbar
# bash interprets escape-sequences and stores result 'foo<0x0a>bar' in str,
# then passes two arguments 'foo' and 'bar' to echo, due to "word splitting"
# then echo recombines both arguments using a single space
$ str=$'foo\nbar'
$ echo $str
foo bar
# bash interprets escape-sequences and stores result 'foo<0x0a>bar' in str,
# then passes it as a single argument to echo, without "word splitting"
$ str=$'foo\nbar'
$ echo "$str"
foo
bar
How to apply shell quoting, when dealing with newlines in sed
# replace a character with newline, using newline's escape-sequence
# sed will convert '\n' to a literal newline (0x0a)
$ sed 's/-/foo\nbar/' <<< 'blah-blah'
# replace a character with newline, using newline's escape-sequence in a variable
# sed will convert '\n' to a literal newline (0x0a)
$ str='foo\nbar' # str contains the escape-sequence '\n' and not a literal newline
$ sed 's/-/'"$str"'/' <<< 'blah-blah'
# replace a character with newline, using a literal newline.
# note the line-continuation-mark \ after 'foo' before the literal newline,
# which is part of the sed script, since everything in-between '' is literal
$ sed 's/-/foo\
bar/' <<< 'blah-blah' # end-of-command
# replace a character with newline, using a newline in shell-escape-mode
# note the same line-continuation-mark \ before $'\n', which is part of the sed script
# note: the sed script is a single string composed of three parts '…\', $'\n' and '…',
$ sed 's/-/foo\'$'\n''bar/' <<< 'blah-blah'
# the same as above, but with a single shell-escape-mode string instead of 3 parts.
# note the required quoting of the line-continuation-mark with an additional \ escape
# i.e. after shell-escaping the sed script contains a single \ and a literal newline
$ sed $'s/-/foo\\\nbar/' <<< 'blah-blah'
# replace a character with newline, using a shell-escaped string in a variable
$ str=$'\n' # str contains a literal newline (0x0a) due to shell escaping
$ sed 's/-/foo\'"$str"'bar/' <<< 'blah-blah'
# same as above with the required (quoted) line-continuation inside the variable
# note, how the single \ from '…foo\' (previous example) became \\ inside $'\\…'
$ str=$'\\\n' # str contains \ and a literal newline (0x0a) due to shell escaping
$ sed 's/-/foo'"$str"'bar/' <<< 'blah-blah'
All the sed examples will print the same:
blahfoo
barblah
So, a newline in sed's replacement string must either be
(1) newline's escape-sequence (i.e. '\n'), so sed can replace it with a literal newline, or
(2) a literal newline preceded by a line-continuation-mark (i.e. $'\\\n' or '\'$'\n', which is NOT the same as '\\\n' or '\\n' or $'\\n').
This means you need to replace each literal newline <0x0a> with newline's escape-sequence \n or insert a line-continuation-mark before each literal newline inside your replacement string before double-quote-expanding it into sed's substitute replacement string.
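The second option (a line-continuation mark before each literal newline) can be sketched with parameter expansion. This is only a minimal illustration: the helper variables nl, bs_nl and escaped are made-up names, and it assumes the replacement text contains no /, & or backslash, which would need escaping of their own:

```bash
str=$'foo\nbar'     # replacement text with a literal newline
nl=$'\n'            # a literal newline
bs_nl=$'\\\n'       # a backslash followed by a literal newline

# Put a backslash in front of every literal newline
escaped=${str//"$nl"/"$bs_nl"}

# Double quotes expand $escaped into the sed script unchanged
result=$(sed "s/-/$escaped/" <<< 'blah-blah')
printf '%s\n' "$result"
```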
Since there are many more caveats regarding escaping in sed, I recommend you use awk's gsub function instead passing your replacement string as a variable via -v, e.g.
$ str=$'foo\nbar'
$ awk -v REP="$str" -- '{gsub(/-/, REP); print}' <<< 'blah-blah'
blahfoo
barblah
PS: I don't know, if this answer is entirely true in your case, because your operating system uses an outdated version of bash.

echo -e $str
where -e is
enable interpretation of backslash escapes

Use sed command r to insert arbitrary text
str="abc\ndef"
tmp=$(mktemp)
(
echo
printf -- "$str"
echo
) > "$tmp"
sed -i.bak '/<!-- insert here -->/r '"$tmp" index.html
rm -r "$tmp"
sed interprets a newline as the command delimiter. The ; is not really a sed command delimiter; only a newline is. Don't append ; or } or spaces after the filename in the r or w command, because they will be interpreted as part of the filename (yes, spaces too). sed commands like w or r are terminated by a newline.
If you want more flexibility, move to awk instead.
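One possible shape of the awk route, as a sketch rather than a drop-in replacement: the variable html stands in for the contents of index.html, and REP is a made-up environment variable name. Passing the text through ENVIRON instead of -v avoids awk's escape processing of the value:

```bash
str=$'foo\nbar'
# Stand-in for the contents of index.html
html=$'<body>\n<!-- insert here -->\n</body>'

# Print the text before the marker line, then print every line unchanged
result=$(REP="$str" awk '/<!-- insert here -->/ { print ENVIRON["REP"] } { print }' <<< "$html")
printf '%s\n' "$result"
```

To change a file in place, the awk output would have to go to a temporary file that is then moved over the original.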

BASH: unescape string

Suppose I have the following string:
"some\nstring\n..."
And it displays as one line when catted in bash. Further,
string_from_pipe | sed 's/\\\\/\\/g' # does not work
| awk '{print $0}'
| awk '{s = $0; print s}'
| awk '{s = $0; printf "%s",s}'
| echo $0
| sed 's/\\(.)/\1/g'
# all have not worked.
How do I unescape this string such that it prints as:
some
string
Or even displays that way inside a file?

POSIX sh provides printf %b for just this purpose:
s='some\nstring\n...'
printf '%b\n' "$s"
...will emit:
some
string
...
More to the point, the APPLICATION USAGE section of the POSIX spec for echo explicitly suggests using printf %b for this purpose rather than relying on optional XSI extensions.
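If the unescaped text is needed in a variable rather than on the terminal, command substitution is one way to capture it. A minimal sketch (the name unescaped is an assumption):

```bash
s='some\nstring\n...'

# %b expands backslash escapes in the argument; $( ) trims trailing newlines
unescaped=$(printf '%b' "$s")

# %s prints the stored value without interpreting it again
printf '%s\n' "$unescaped"
```

In bash specifically, printf -v unescaped '%b' "$s" stores the result without a subshell and keeps any trailing newlines.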

As you observed, echo does not solve the problem:
$ s="some\nstring\n..."
$ echo "$s"
some\nstring\n...
You haven't mentioned where you got that string or which escapes are in it.
Using a POSIX-compliant shell's printf
If the escapes are ones supported by printf, then try:
$ printf '%b\n' "$s"
some
string
...
Using sed
$ echo "$s" | sed 's/\\n/\n/g'
some
string
...
Using awk
$ echo "$s" | awk '{gsub(/\\n/, "\n")} 1'
some
string
...

If you have the string in a variable (say myvar), you can use:
${myvar//\\n/$'\n'}
For example:
$ myvar='hello\nworld\nfoo'
$ echo "${myvar//\\n/$'\n'}"
hello
world
foo
$
(Note: it's usually safer to use printf %s <string> than echo <string>, if you don't have full control over the contents of <string>.)

How about using the -e option of echo?
$ s="some\nstring\n..." && echo -e "$s"
some
string
...
From the echo man-page
-e enable interpretation of the following backslash escapes
[...]
\a alert (bell)
\b backspace
\c suppress further output
\e escape character
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
\\ backslash
\0nnn the character whose ASCII code is NNN (octal). NNN can be 0 to 3 octal digits
\xHH the eight-bit character whose value is HH (hexadecimal). HH can be one or two hex digits

How to remove special characters from strings but keep underscores in shell script

I have a string that is something like "info_A!__B????????C_*". I want to remove the special characters from it but keep underscores and letters. I tried the [:word:] (ASCII letters and _) character set, but it says "invalid character set". Any idea how to handle this? Thanks.
text="info_!_????????_*"
if [ -z `echo $text | tr -dc "[:word:]"` ]
......

Using bash parameter expansion:
$ var='info_A!__B????????C_*'
$ echo "${var//[^[:alnum:]_]/}"
info_A__BC_
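Applied to the emptiness test from the question, this might look like the following sketch. The variable name cleaned is an assumption; the quotes around it matter, because an unquoted empty expansion would leave [ -z ] with no operand:

```bash
text='info_!_????????_*'

# Delete everything except letters, digits and underscores
cleaned=${text//[^[:alnum:]_]/}

if [ -z "$cleaned" ]; then
    echo "nothing left after cleaning"
else
    echo "cleaned: $cleaned"
fi
```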

A sed one-liner would be
sed 's/[^[:alnum:]_]//g' <<< 'info_!????????*'
gives you
info_
An awk one-liner would be
awk '{gsub(/[^[:alnum:]_]/,"",$0)} 1' <<< 'info_!??A_??????*pi9ngo^%$_mingo745'
gives you
info_A_pi9ngo_mingo745
If you don't wish to have numbers in the output then change :alnum: to :alpha:.

My tr doesn't understand [:word:]. I had to do like this:
$ x=$(echo 'info_A!__B????????C_*' | tr -cd '[:alnum:]_')
$ echo $x
info_A__BC_

Not sure if it's a robust way, but it worked for your sample text.
sed one-liner:
echo "SamPlE_#tExT%, really ?" | sed -e 's/[^a-zA-Z_]//g'
SamPlE_tExTreally

Correct exponential output with printf

I am trying to write a script. In this script I need to remove the carriage return at the end of numbers I parse from some command output, so that I can convert them to integers. But printf won't format the number the way I want:
echo $var
2.80985e+09
var=$(printf "%s" "$var" | tr -dc '[:digit:]' )
echo $var
28098509
As you can see, printf removes the carriage return but also modifies the value of the variable. I would like the value to remain the same, with only the carriage return removed. Which parameter should I use with printf?
Thanks

Maybe you want to do this:
$ printf "%f\n" $var
2809850000.000000
Or this:
$ printf "%f\n" $var | sed -e 's/\..*//'
2809850000

printf did not modify the value of the variable; tr did. You can verify this by:
$ printf "%s\n" "$var"
2.80985e+09
$ printf "%s\n" "$var" | tr -dc '[:digit:]'
28098509
The tr command, as given, removes all non-digit characters.

Your tr command said 'remove every non-digit', so it did that. You should expect programs to do exactly what you tell them to. The whole var=$(...) sequence is bizarre. To remove a carriage return, you could use:
var=$(tr -d '\r' <<< "$var")
The <<< redirection sends the string (value of $var) as the standard input of the command.
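Putting the pieces of this thread together, a hedged sketch: the sample value is made up, the name int is an assumption, and %.0f is one of several ways to print the number without a fractional part:

```bash
# Hypothetical value with a trailing carriage return
var=$'2.80985e+09\r'

var=${var%$'\r'}    # remove the trailing CR; the digits are untouched

# %.0f prints the number without a fractional part.
# LC_ALL=C makes printf expect "." as the decimal point.
int=$(LC_ALL=C printf '%.0f' "$var")

printf '%s\n' "$var" "$int"
```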

How to remove extra spaces in bash?

How to remove extra spaces in variable HEAD?
HEAD="   how to   remove  extra    spaces   "
Result:
how to remove extra spaces

Try this:
echo "$HEAD" | tr -s " "
or maybe you want to save it in a variable:
NEWHEAD=$(echo "$HEAD" | tr -s " ")
Update
To remove leading and trailing whitespaces, do this:
NEWHEAD=$(echo "$HEAD" | tr -s " ")
NEWHEAD=${NEWHEAD%% }
NEWHEAD=${NEWHEAD## }

Using awk:
$ echo "$HEAD" | awk '$1=$1'
how to remove extra spaces

Take advantage of the word-splitting effects of not quoting your variable
$ HEAD=" how to remove extra spaces "
$ set -- $HEAD
$ HEAD=$*
$ echo ">>>$HEAD<<<"
>>>how to remove extra spaces<<<
If you don't want to use the positional parameters, use an array
ary=($HEAD)
HEAD=${ary[*]}
echo "$HEAD"
One dangerous side-effect of not quoting is that filename expansion will be in play. So turn it off first, and re-enable it after:
$ set -f
$ set -- $HEAD
$ set +f

This horse isn't quite dead yet: Let's keep beating it!*
Read into array
Other people have mentioned read, but since using unquoted expansion may cause undesirable expansions all answers using it can be regarded as more or less the same. You could do
set -f
read HEAD <<< $HEAD
set +f
or you could do
read -rd '' -a HEAD <<< "$HEAD" # Assuming the default IFS
HEAD="${HEAD[*]}"
Extended Globbing with Parameter Expansion
$ shopt -s extglob
$ HEAD="${HEAD//+( )/ }" HEAD="${HEAD# }" HEAD="${HEAD% }"
$ printf '"%s"\n' "$HEAD"
"how to remove extra spaces"
*No horses were actually harmed – this was merely a metaphor for getting six+ diverse answers to a simple question.

Here's how I would do it with sed:
string='   how to   remove  extra    spaces   '
echo "$string" | sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
=> how to remove extra spaces # (no spaces at beginning or end)
The first sed expression replaces every run of one or more spaces with a single space, and the second and third expressions remove the single leading or trailing space that remains.

echo -e " abc \t def "|column -t|tr -s " "
column -t will:
remove the spaces at the beginning and at the end of the line
convert tabs to spaces
tr -s " " will squeeze multiple spaces to single space
BTW, to see the whole output you can use cat - -A, which shows all special characters including tabs and EOL:
echo -e " abc \t def "|cat - -A
output: abc ^I def $
echo -e " abc \t def "|column -t|tr -s " "|cat - -A
output:
abc def$

Whitespace can take the form of both spaces and tabs. Although they are non-printing characters and invisible to us, sed and other tools see them as different forms of whitespace and operate only on what you ask for. That is, if you tell sed to delete some number of spaces, it will do so, but the expression will not match tabs. The inverse is also true: supply a tab to sed and it will not match spaces, even if their number equals the width of a tab.
A more extensible solution that will work for removing either/both additional space in the form of spaces and tabs (I've tested mixing both in your specimen variable) is:
echo $HEAD | sed 's/^[[:blank:]]*//g'
or we can tighten up @Frontear's excellent suggestion of using xargs without the tr:
echo $HEAD | xargs
However, note that xargs would also remove newlines. So if you were to cat a file and pipe it to xargs, all the extra space- including newlines- are removed and everything put on the same line ;-).
Both of the foregoing achieved your desired result in my testing.

Try this one:
echo '  how to   remove  extra  spaces  ' | sed 's/^ *//' | sed 's/ *$//' | sed 's/  */ /g'
or
HEAD="  how to   remove  extra  spaces  "
HEAD=$(echo "$HEAD" | sed 's/^ *//' | sed 's/ *$//' | sed 's/  */ /g')

I would make use of tr to remove the extra spaces, and xargs to trim the back and front.
TEXT=" This is some text "
echo $(echo $TEXT | tr -s " " | xargs)
# [...]$ This is some text

echo variable without quotes does what you want:
HEAD=" how to remove extra spaces "
echo $HEAD
# or assign to new variable
NEW_HEAD=$(echo $HEAD)
echo $NEW_HEAD
output: how to remove extra spaces
