How to convert UTF-8 into binary

I have recently been studying binary code, and I want to know: how do I convert text that has been encoded as UTF-8 into binary?

I recommend using the command-line tool iconv.
Its general syntax is:
$ iconv [options] -f from-encoding -t to-encoding inputfile(s) -o outputfile
Here is an online tutorial that might help:
https://www.tecmint.com/convert-files-to-utf-8-encoding-in-linux/
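If "binary" here means the individual bits of the UTF-8 bytes, iconv alone will not print them, since it only converts between encodings. The following is a minimal Python sketch of that reading of the question; the function name utf8_to_binary is made up for illustration.

```python
def utf8_to_binary(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated 8-bit groups."""
    # str.encode gives the raw UTF-8 bytes; the 08b format spec renders
    # each byte as exactly eight binary digits.
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


if __name__ == "__main__":
    print(utf8_to_binary("A"))  # 01000001
    print(utf8_to_binary("é"))  # 11000011 10101001
```

ASCII characters take one byte each, while a character such as é takes two, so the output length depends on the text.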

Related

Using iconv command in bash - incomplete character or shift sequence error

I am trying to convert a .csv from UTF-16LE to UTF-8. The file is too large to be opened in Excel, and I am encountering the "incomplete character or shift sequence" error when using the following command:
iconv -f utf-16le -t -c utf-8 myfilename.csv > mynewfilename.csv
How do I get past this?
I'm using Bash on macOS Mojave.
Thanks!
Edit to add:
iconv -c -f utf-16le -t utf-8//IGNORE myfilename.csv > mynewfilename.csv
also didn't work, per suggestion below.
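One possible workaround, assuming the error is caused by a truncated or malformed sequence somewhere in the file, is to do the conversion in Python with an incremental decoder that replaces undecodable input instead of aborting. This is only a sketch: the function name, the chunk size, and the choice of errors="replace" are assumptions, not something taken from the question.

```python
import codecs


def convert_utf16le_to_utf8(src_path, dst_path, chunk_size=1 << 20):
    """Stream a UTF-16LE file to UTF-8 without loading it all into memory."""
    # The incremental decoder keeps any partial code unit between chunks,
    # so reading at arbitrary byte boundaries is safe.
    decoder = codecs.getincrementaldecoder("utf-16-le")(errors="replace")
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(decoder.decode(chunk).encode("utf-8"))
        # Flush whatever is still buffered; a dangling byte at the end of
        # the file becomes U+FFFD rather than raising an error.
        dst.write(decoder.decode(b"", final=True).encode("utf-8"))


if __name__ == "__main__":
    convert_utf16le_to_utf8("myfilename.csv", "mynewfilename.csv")
```

Because the file is read in chunks, its size should not matter. Searching the output for U+FFFD afterwards shows where the input was damaged.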

Replacing special characters

I have a document which contains various special characters such as é ÿ ° Æ oºi
I've written the following two commands, which both work on 'single looking' characters such as à ± È.
However, neither works with the special characters listed above.
This command uses two-byte hex escapes (to replace é with A):
sed -i 's/\xc3\xA9/A/g' test.csv
This command uses utf8 to replace characters:
CHARS=$(python -c 'print u"\u00a9".encode("utf8")')
sed -i 's/['"$CHARS"']/A/g' $filename
Either of these commands should work but neither do.
It looks like you are viewing UTF-8 data as ISO-8859-1 (aka latin1).
This is what you'd experience when handling a UTF-8 encoded file in an ISO-8859-1 terminal:
$ cat file
The cafÃ© has crÃ¨me brÃ»lÃ©e.
$ iconv -f utf-8 -t iso-8859-1 < file
The café has crème brûlée.
$ iconv -c -f utf-8 -t ascii//ignore < file
The caf has crme brle.
This usually only happens for PuTTY users, because PuTTY is one of the few terminal emulators that still uses ISO-8859-1 by default. You can set it to use UTF-8 in the PuTTY configuration.
Here's the same example in a UTF-8 terminal:
$ cat file
The café has crème brûlée.
$ iconv -f utf-8 -t iso-8859-1 < file
The caf� has cr�me br�l�e.
$ iconv -c -f utf-8 -t ascii//ignore < file
The caf has crme brle.
The only correct solution is to fix your setup so that it uses UTF-8 throughout. ISO-8859-1 does not support the languages and features we take for granted today, and is not a useful option.
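The two terminal views above can be reproduced without a terminal. The sketch below, with made-up function names, decodes bytes using the wrong charset in each direction; it is only meant to illustrate the mismatch, not to suggest a fix.

```python
def utf8_seen_as_latin1(text: str) -> str:
    """What UTF-8 bytes look like when displayed as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def latin1_seen_as_utf8(text: str) -> str:
    """What ISO-8859-1 bytes look like when displayed as UTF-8."""
    # Bytes that are not valid UTF-8 are shown as U+FFFD, as a UTF-8
    # terminal typically does.
    return text.encode("iso-8859-1").decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "The café has crème brûlée."
    print(utf8_seen_as_latin1(sample))  # The cafÃ© has crÃ¨me brÃ»lÃ©e.
    print(latin1_seen_as_utf8(sample))  # The caf� has cr�me br�l�e.
```

Each accented letter becomes two characters in the first case because UTF-8 stores it as two bytes, and ISO-8859-1 maps every byte to its own character.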

Does iconv with TRANSLIT work in Unix AIX?

Does iconv with TRANSLIT work on AIX? It is not mentioned in the man page. I tried the following syntax and it did not work:
iconv -f UTF-8 -t ISO8859-1//TRANSLIT File1 > File2
Transliteration is a GNU iconv extension. Most Unix systems do not support this extension, which is not part of POSIX. You may have to compile GNU iconv for your platform to use this functionality.

iconv: cannot convert some strings from gb2312 to UTF-8

Good day! I have a problem converting this string from gb2312: "е – с"
My actions:
[i.remen#win74 ~]$ iconv -f gb2312 -t utf-8 tst.txt
е iconv: illegal input sequence at position 3
[i.remen#win74 ~]$
I tried many different versions (both standalone iconv and the one shipped with glibc). Is there any way to do this conversion?
Maybe some of the characters are not in gb2312. Try gb18030; it is a larger character set than gb2312.
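The suggestion can be checked quickly in Python, whose standard codecs include both gb2312 and gb18030. This sketch assumes the en dash in the sample string is the character that falls outside gb2312; the helper name try_decode is hypothetical.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Decode `data`, or report where decoding with `encoding` fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{encoding}: cannot decode byte at position {exc.start}"


# Build test bytes with the larger charset, then decode them with both.
sample = "е – с".encode("gb18030")

if __name__ == "__main__":
    for encoding in ("gb2312", "gb18030"):
        print(try_decode(sample, encoding))
```

If gb18030 succeeds where gb2312 fails, the equivalent iconv call would use -f gb18030 in place of -f gb2312.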

Unaccent string in bash script (RHEL)

On Debian-based distributions, there is a utility called unaccent which can be used to remove accents from accented letters in a text.
I was looking for a package containing this on Red Hat distros, but the only one I found was unac, which is available for Mandriva only.
I tried to use iconv, but it does not seem to support my case.
What is the best lightweight approach that is easily usable in a bash script?
Are there any secret options to iconv that allow this?
You can use the -c option of iconv, which drops characters that cannot be converted, to remove non-ASCII characters:
$ echo 'été' | iconv -c -f utf8 -t ascii
t
If you just want to remove the accents, use //TRANSLIT (a GNU iconv extension):
$ echo 'été' | iconv -f utf8 -t ascii//TRANSLIT
ete
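Where //TRANSLIT is not available, a similar result can be sketched in Python using only the standard library. This swaps in a different technique from the iconv answer above: it decomposes each character with Unicode normalization and drops the combining marks. The function name unaccent is borrowed from the Debian utility for illustration only.

```python
import unicodedata


def unaccent(text: str) -> str:
    """Strip accents by removing combining marks after decomposition."""
    # NFKD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(unaccent("été"))  # ete
```

From a bash script this could be called as a small helper script reading standard input. Letters with no decomposition, such as Æ or ø, pass through unchanged, so the output is not guaranteed to be pure ASCII.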
