Getting "sed error - illegal byte sequence" (in bash) [duplicate] - bash

This question already has answers here:
RE error: illegal byte sequence on Mac OS X
Closed 7 years ago.
Doing some stream editing to change the nasty Parallels icon. It's poorly developed and embedded into the app itself rather than being an image file. So I've located this sed command that has some good feedback:
sudo sed -i.bak s/Parallels_Desktop_Overlay_128/Parallels_Desktop_Overlay_000/g /Applications/Parallels\ Desktop.app/Contents/MacOS/prl_client_app
It returns sed: RE error: illegal byte sequence
Can anyone explain what this means? What part of the command is the problem?

Try setting the LANG environment variable (LANG=C sed ...) or use one of the binary sed tools mentioned here: binary sed replacement
Why the error?
Without LANG=C, sed assumes that files are encoded in whatever encoding LANG specifies, and the file (being binary) may contain bytes that are not valid characters in that encoding, hence 'illegal byte sequence'.
Why does LANG=C work?
The C locale treats every byte as a character of its own: ASCII bytes are themselves and non-ASCII bytes are taken literally, so no byte sequence can be illegal.
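A minimal sketch of the fix on a throwaway file rather than the real Parallels binary. The file name sample.bin and its contents are made up for illustration; the byte 0xFF (octal 377) stands in for binary data because it can never occur in valid UTF-8. LC_ALL=C is used rather than LANG=C because it also overrides any LC_* variables that may be set.

```shell
#!/bin/sh
# Hypothetical stand-in for a binary file: 0xFF (octal 377) is never valid
# UTF-8, which is what makes sed complain in a UTF-8 locale.
printf 'head\377Overlay_128\377tail\n' > sample.bin

# LC_ALL=C makes sed treat every byte as one character, so no byte sequence
# is "illegal". -i.bak edits in place and keeps a backup copy.
LC_ALL=C sed -i.bak 's/Overlay_128/Overlay_000/g' sample.bin

# Dump the patched file byte by byte; the 0xFF bytes are left untouched.
od -c sample.bin
```

Keeping the replacement the same length as the original (Overlay_128 and Overlay_000 are both 11 bytes) matters when patching an executable, since inserting or removing bytes would shift everything after the match.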

LANG=C alone didn't do the trick for me but adding LC_CTYPE=C as well solved it.

In addition to LANG=C and LC_CTYPE=C, I had to do LC_ALL=C to get this to work.
LC_ALL overrides LANG and all the individual LC_* variables. Thus, the most robust approach is to use LC_ALL=C sed ..., with no need to deal with the other variables.
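A quick way to see LC_ALL winning over LC_CTYPE, sketched with a two-byte UTF-8 character. en_US.UTF-8 is only an example locale name; if it is not installed the result is the same, because LC_ALL=C takes precedence either way.

```shell
#!/bin/sh
# U+00E9 ("é") in UTF-8 is two bytes: 0xC3 0xA9 (octal 303 251).
# LC_CTYPE asks for a UTF-8 locale, but LC_ALL=C overrides it, so sed
# sees two one-byte characters and "." matches each byte separately.
printf '\303\251\n' | LC_CTYPE=en_US.UTF-8 LC_ALL=C sed 's/./X/g'
# prints: XX
```

Dropping LC_ALL=C (with a UTF-8 locale installed and selected) typically makes GNU sed print a single X, because the two bytes are then read as one character.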

I managed to do it by running:
unset LANG
before the sed command.
Not sure what I've done or why it works but it did.

Related

Having trouble using sed command in MAC [duplicate]

This question already has answers here:
What is character encoding and why should I bother with it
Closed 2 years ago.
I'm trying to do the following:
LC_CTYPE=C sed 's/|/¦/g' t.txt > new_t.txt
The code works, but when I open the new file, the replacement has an additional character in front of it: "Â¦". Why is that?
When you typed
LC_CTYPE=C sed 's/|/¦/g' t.txt > new_t.txt
your shell was probably configured to accept the command itself as UTF-8, and so in fact you ended up converting the single byte 0x7C (U+007C) to the two bytes 0xC2 0xA6 which is the correct UTF-8 encoding for U+00A6.
What you then did is unclear, but somehow you ended up examining the file in some other encoding than UTF-8, which exposes the two bytes as the string you report seeing.
The correct workaround is to examine the file in a correctly configured program which supports UTF-8.
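A small sketch for checking what actually landed in the file, assuming od and iconv are available (both ship with Linux and macOS). The replacement character is built from its UTF-8 bytes with printf so the result does not depend on how your terminal encodes what you type; the variable name bar is just for illustration.

```shell
#!/bin/sh
# Broken bar U+00A6 as UTF-8 bytes: 0xC2 0xA6 (octal 302 246).
bar=$(printf '\302\246')

# Same substitution as in the question (LC_ALL=C is used here because it
# also overrides LC_ALL/LANG from the environment), then dump the bytes.
printf 'a|b\n' | LC_ALL=C sed "s/|/$bar/g" | od -An -tx1
# the dump should show the bytes 61 c2 a6 62 0a

# Reading those same two bytes as ISO-8859-1 instead of UTF-8 reproduces
# the symptom: 0xC2 becomes "Â" and 0xA6 becomes "¦".
printf '%s\n' "$bar" | iconv -f ISO-8859-1 -t UTF-8
```

If the od dump shows c2 a6, the file is correct UTF-8 and only the viewer is misreading it.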

replace or delete character '#' in string with bash shell [duplicate]

A console program (translate-shell) produces colored output and uses special decoration characters for this: ^[[22m, ^[[24m, ^[[1m... and so on.
I'd like to remove them to get plain text.
I tried tr -d "^[[22m" and sed 's/[\^[[22m]//g', but only the number is removed, not the special character ^[
Thanks.
You have multiple options:
https://unix.stackexchange.com/questions/14684/removing-control-chars-including-console-codes-colours-from-script-output
http://www.commandlinefu.com/commands/view/3584/remove-color-codes-special-characters-with-sed
and the -no-ansi option, as pointed out by Jens in another answer
EDIT
The solution from commandlinefu does the job pretty well:
sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]//g"
The solution from unix.stackexchange might be better, but it is much longer, so you would probably want to put it in a separate script file rather than a shell one-liner.
I found this in the manual about the use of ANSI escape codes:
-no-ansi
Do not use ANSI escape codes.
So you should add this option when starting the program.
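If the option is not available, here is a sketch of a sed filter that works without the \x1B escape (a GNU sed extension that not every sed understands): the real ESC byte is stored in a shell variable first. The function name strip_ansi is hypothetical, and the pattern only covers color/attribute sequences ending in m plus the erase-line sequence ESC[K.

```shell
#!/bin/sh
# A literal ESC byte (0x1B, octal 033), produced by printf.
esc=$(printf '\033')

# Hypothetical helper: delete sequences such as ESC[1m, ESC[22m,
# ESC[1;31m and ESC[K. "[0-9;]*" accepts any run of digits and semicolons.
strip_ansi() {
  sed -E "s/${esc}\[[0-9;]*[mK]//g"
}

printf '\033[1mbold\033[22m and \033[1;31mred\033[0m text\n' | strip_ansi
# prints: bold and red text
```

Usage with the program from the question would be piping its output through strip_ansi.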

Issue with encoding of a character (not able to sed or .gsub)

I am dealing with some multilingual data (English and Arabic) in a JSON file with a weird character I am not able to parse. I am not sure what the character is. I tried getting the ASCII value via vim and this is what I got:
"38 0x26"
This is the status line in vim i used to get the value (http://vim.wikia.com/wiki/Showing_the_ASCII_value_of_the_current_character).
:set statusline=%<%f%h%m%r%=%b\ 0x%B\ \ %l,%c%V\ %P
This is how the character looks in vim -
I tried 'sed' and '.gsub' to replace this character unsuccessfully.
Is there a way I can replace this character (preferably with Ruby's .gsub) with '&' or something else?
Thanks
Try something like
sed 's/[[:alnum:][:space:]\[\]{}()\.\*\\\/_(AllAsciiVariationYouWant)/&/g;t
s/./?/g' YourFile
where (AllAsciiVariationYouWant) stands for all the characters you want to keep as-is (without the surrounding "()").
JSON is encoded in UTF-8 (Unicode). If you're seeing funky-looking characters in your file, it's probably because your editor is not treating Unicode characters properly. That could be caused by the use of a terminal emulator that doesn't support Unicode; an incorrect $LANG setting; vim not being able to correctly determine the encoding of the file; and likely other reasons.
What terminal program are you using? What's your $LANG environment variable set to (echo $LANG)? If you're certain your terminal supports Unicode, try:
LANG=en_US.utf-8 vim your_file_here.json
(The above example assumes that U.S. English is appropriate for the file, which it may not be.)
As for replacing characters in the file, vim's substitution command can be used:
:%s/old text/new text/g
The above command will run the substitute command on all lines in the file (%), replacing every instance of "old text" with "new text". (The g at the end tells vim to replace every instance on a line, not just the first it finds.)
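Before replacing anything, it can help to check whether the file itself is valid UTF-8 or whether only the display is wrong. A minimal sketch, assuming iconv and od are available; sample.json and its contents are hypothetical stand-ins for your file.

```shell
#!/bin/sh
# Hypothetical sample; point f at your own JSON file instead.
f=sample.json
printf '{"name": "caf\303\251"}\n' > "$f"

# iconv exits with a non-zero status at the first byte sequence that is
# not valid UTF-8, so it tells a broken file apart from a broken display.
if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
  echo "valid UTF-8"
else
  echo "not valid UTF-8"
fi
# prints: valid UTF-8

# Show the raw bytes of the first line; here c3 a9 is UTF-8 for "é".
head -n 1 "$f" | od -An -tx1
```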

Converting escaped characters to UTF-8 in bash

I have a large text file containing sequences such as
\u02BBUtthay\u0101n h\u01E3ng Ch\u0101t Khao Yai
However, they render exactly as above. How do I convert this so people just see UTF-8? I would prefer to process the files at the command line if possible.
Use bash's built-in printf command. Bash 4.2 and later understand \uXXXX escapes, and the characters only come out right in a UTF-8 locale.
http://manpages.ubuntu.com/manpages/intrepid/man3/printf.3.html
You can also wrap it in $() to store the result in a variable if needed.
For example,
name=$(printf '\u02BBUtthay\u0101n h\u01E3ng Ch\u0101t Khao Yai')
echo "$name"
this outputs: ʻUtthayān hǣng Chāt Khao Yai
Hope that helps.

BASH: Replacing special character groups

I have a rather tricky request...
We use a special application which is connected to an Oracle database. For control purposes, the application uses special characters which are defined by the application and saved in a LONG field of the database.
My task is to query the LONG field periodically and check for changes. To do that, I use a bash script to write the content to a file and compare the old and new files with md5sum.
When there's a difference, I want to send the old file via mail. The problem is that the old file contains these special characters and I don't know how to replace them with, for example, a string that describes them.
I tried to replace them based on their ASCII code, but this didn't work. I also tried to replace them by their appearance in the file (they look like this: ^P). This didn't work either.
When viewing the file in a text editor like nano, the characters are visible as described above. But when using cat on the file, the content is only displayed up to the first occurrence of such a control character.
As far as I know, there is no way to replace them while querying the database, because the content is in a LONG field.
I hope you can help me.
Thank you in advance.
Marco
^P is the Control-P character, which is decimal 16 or hexadecimal 0x10, also known as the Data Link Escape (DLE) character in ASCII.
To replace all occurrences of 0x10 in a file with another string we can use our friend gsed:
gsed "s/\x10/Data Link Escape/g" yourfile.txt
This should replace all occurrences of characters containing the hex value 0x10 with the text string "Data Link Escape". You'll probably want to use a different string - this is just an example.
Depending on the system you're using, you may be able to use the standard sed command if your version of sed recognizes the \xNN single-character escape codes. If there are multiple hex characters you need to replace, you may want to create a file containing your sed commands, one for each hexadecimal character you need to replace, and tell sed or gsed to use the commands in the file; consult the sed or gsed man pages for how to do this.
Share and enjoy.
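A sketch of the "file of sed commands" idea that avoids \xNN escapes altogether: printf writes the real control bytes into the script, so a sed without \xNN support can still match them. ctrl.sed, yourfile.txt and the <DLE>/<DC2> labels are hypothetical.

```shell
#!/bin/sh
# One substitution per control character: octal 020 is 0x10 (DLE, ^P) and
# octal 022 is 0x12 (DC2, ^R). printf turns the escapes into real bytes.
printf 's/\020/<DLE>/g\ns/\022/<DC2>/g\n' > ctrl.sed

# Hypothetical sample input containing both control bytes.
printf 'a\020b\022c\n' > yourfile.txt

# LC_ALL=C so every byte is treated as a single character.
LC_ALL=C sed -f ctrl.sed yourfile.txt
# prints: a<DLE>b<DC2>c
```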
You can use xxd to change the string to its hex representation, then use xxd -r to convert back.
Or, you can use uuencode and uudecode.
One option is to run the file through cat -v. This replaces nonprinting characters with visible representations (using the ^ notation for control characters):
$ echo $'\x10\x12\x13\x14\x16' | cat -v
^P^R^S^T^V
