I've recently been studying binary code, and I want to know: how do I convert text that has been encoded as UTF-8 into binary?
I recommend using the command-line tool iconv.
For example:
$ iconv [options] -f from-encoding -t to-encoding inputfile(s) -o outputfile
Here is an online tutorial that might help:
https://www.tecmint.com/convert-files-to-utf-8-encoding-in-linux/
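Note that iconv converts text from one character encoding to another; it does not print the bits themselves. To see the binary form of UTF-8 text, here is a minimal Python 3 sketch. The function names utf8_to_binary and binary_to_utf8 are only illustrative. It encodes the string as UTF-8 and formats each byte as eight binary digits:

```python
def utf8_to_binary(text: str) -> str:
    """Encode text as UTF-8 and show each byte as 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def binary_to_utf8(bits: str) -> str:
    """Reverse of utf8_to_binary: parse space-separated bytes, decode as UTF-8."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


if __name__ == "__main__":
    encoded = utf8_to_binary("Aé")
    print(encoded)                  # 01000001 11000011 10101001
    print(binary_to_utf8(encoded))  # Aé
```

ASCII characters such as A take one byte in UTF-8, while é (U+00E9) takes two bytes (0xC3 0xA9).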
Related
I am trying to convert a .csv from UTF-16LE to UTF-8. The file is too large to be opened in Excel, and I am encountering the "incomplete character or shift sequence" error when using the following command:
iconv -f utf-16le -t -c utf-8 myfilename.csv > mynewfilename.csv
How do I get past this?
I'm using Bash on macOS Mojave.
Thanks!
Edit: per a suggestion below, I also tried
iconv -c -f utf-16le -t utf-8//IGNORE myfilename.csv > mynewfilename.csv
but that didn't work either.
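One possible workaround assumes Python 3 is available and converts the file in a streaming fashion. An "incomplete character or shift sequence" error usually means the input ends partway through a multibyte sequence; for UTF-16, that can be an odd number of bytes. The sketch below replaces undecodable input with U+FFFD instead of stopping, so it is worth searching the output for that character afterwards. The function name and chunk size are illustrative assumptions.

```python
def utf16le_to_utf8(src_path: str, dst_path: str, chunk_chars: int = 1 << 20) -> None:
    """Stream-convert a UTF-16LE file to UTF-8 without loading it all at once."""
    # errors="replace" turns undecodable input (such as a dangling final byte)
    # into U+FFFD rather than raising an exception.
    # newline="" keeps the original line endings (e.g. CSV's \r\n) untouched.
    with open(src_path, "r", encoding="utf-16-le", errors="replace", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)  # counts characters, not bytes
            if not chunk:
                break
            dst.write(chunk)


# Usage: utf16le_to_utf8("myfilename.csv", "mynewfilename.csv")
```

If the source starts with a byte-order mark, the "utf-16-le" codec keeps it as U+FEFF, which becomes a UTF-8 BOM in the output. Python's "utf-16" codec would instead detect and drop a leading BOM.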
I have a document which contains various special characters such as é ÿ ° Æ oºi
I've written the following two commands, both of which work on 'single-looking' characters such as à ± È.
However, neither of them works with the special characters listed above.
This command uses two-byte hex escapes (to replace é with A):
sed -i 's/\xc3\xA9/A/g' test.csv
This command uses Python to generate the UTF-8 bytes of the character to replace:
CHARS=$(python -c 'print u"\u00a9".encode("utf8")') sed -i 's/['"$CHARS"']/A/g' $filename
Either of these commands should work, but neither does.
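The \xHH pattern in the first command matches one specific sequence of UTF-8 bytes, so every character needs its own sequence. Not all of the listed characters start with 0xC3: ° and º start with 0xC2. Here is a small Python 3 helper, with the hypothetical name sed_hex_escape, that prints the escape to paste into a GNU sed expression (GNU sed accepts \xHH, as the question's command already relies on):

```python
def sed_hex_escape(text: str) -> str:
    """Return the UTF-8 bytes of text as \\xHH escapes, e.g. 'é' -> '\\xc3\\xa9'."""
    return "".join(f"\\x{byte:02x}" for byte in text.encode("utf-8"))


if __name__ == "__main__":
    # The special characters from the question.
    for ch in ["é", "ÿ", "°", "Æ", "º"]:
        print(ch, sed_hex_escape(ch))
```

For example, ° comes out as \xc2\xb0, so its replacement would be sed -i 's/\xc2\xb0/A/g' test.csv. Separately, in the second command, CHARS=$(...) written as a prefix to sed only sets CHARS in sed's environment. The shell expands $CHARS in the sed argument before that assignment applies, so the assignment needs to be its own statement.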
It looks like you are viewing UTF-8 data as ISO-8859-1 (aka latin1).
This is what you'd experience when handling a UTF-8 encoded file in an ISO-8859-1 terminal:
$ cat file
The cafÃ© has crÃ¨me brÃ»lÃ©e.
$ iconv -f utf-8 -t iso-8859-1 < file
The café has crème brûlée.
$ iconv -c -f utf-8 -t ascii//ignore < file
The caf has crme brle.
This usually happens only to PuTTY users, because PuTTY is one of the few terminal emulators that still uses ISO-8859-1 by default. You can set it to use UTF-8 in the PuTTY configuration (Window > Translation > Remote character set).
Here's the same example in a UTF-8 terminal:
$ cat file
The café has crème brûlée.
$ iconv -f utf-8 -t iso-8859-1 < file
The caf� has cr�me br�l�e.
$ iconv -c -f utf-8 -t ascii//ignore < file
The caf has crme brle.
The only correct solution is to fix your setup so that it uses UTF-8 throughout. ISO-8859-1 does not support the languages and features we take for granted today, and is not a useful option.
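The effect shown above can be reproduced without a terminal. This Python 3 sketch decodes UTF-8 bytes as ISO-8859-1, which is effectively what a latin1 terminal does, and then reverses the mistake. The function names are illustrative. The reversal only works if the garbled text has not been modified further.

```python
def show_as_latin1(text: str) -> str:
    """Simulate a latin1 terminal displaying UTF-8 bytes."""
    return text.encode("utf-8").decode("iso-8859-1")


def undo_latin1_misread(garbled: str) -> str:
    """Recover the original text from UTF-8 bytes that were misread as latin1."""
    return garbled.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    original = "The café has crème brûlée."
    garbled = show_as_latin1(original)
    print(garbled)                       # The cafÃ© has crÃ¨me brÃ»lÃ©e.
    print(undo_latin1_misread(garbled))  # The café has crème brûlée.
```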
Does iconv with //TRANSLIT work on AIX? It isn't mentioned in the man page. I tried this syntax and it did not work:
iconv -f UTF-8 -t ISO8859-1//TRANSLIT File1 > File2
Transliteration is a GNU iconv extension. Most Unix systems do not support this extension, which is not part of POSIX. You might have to compile GNU iconv for your platform to use this functionality.
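If building GNU iconv isn't practical, one portable alternative assumes Python 3 is installed on the AIX machine. It is a rough approximation, not GNU's transliteration tables. Characters that ISO-8859-1 can hold are kept as-is. Other characters are decomposed with Unicode NFKD and only their ISO-8859-1 parts are kept, so 'ő' becomes 'o'. Anything left over becomes '?'. The function name to_latin1_approx is hypothetical.

```python
import unicodedata


def to_latin1_approx(text: str) -> bytes:
    """Encode text as ISO-8859-1, approximating characters it cannot represent.

    A rough stand-in for GNU iconv's //TRANSLIT, not a copy of its tables.
    """
    out = bytearray()
    for ch in text:
        if ord(ch) < 256:  # already representable in ISO-8859-1
            out += ch.encode("iso-8859-1")
            continue
        # Decompose (e.g. 'ő' -> 'o' + combining mark) and keep what fits.
        kept = "".join(
            part for part in unicodedata.normalize("NFKD", ch) if ord(part) < 256
        )
        out += kept.encode("iso-8859-1") if kept else b"?"
    return bytes(out)


# Usage sketch: read UTF-8 text from File1 and write the result to File2, e.g.
#   with open("File1", encoding="utf-8") as f: data = f.read()
#   with open("File2", "wb") as f: f.write(to_latin1_approx(data))
```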
Good day! I have a problem converting this string from GB2312: "е – с"
My actions:
[i.remen#win74 ~]$ iconv -f gb2312 -t utf-8 tst.txt
е iconv: illegal input sequence at position 3
[i.remen#win74 ~]$
I tried many different versions (both standalone iconv and the one in glibc). Is there any way to do this conversion?
Maybe some characters are not in GB2312. Try GB18030; it's a 'bigger' charset than GB2312 and includes it:
$ iconv -f gb18030 -t utf-8 tst.txt
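The suggestion can be sketched in Python 3 by trying GB2312 first and falling back to GB18030 when the bytes fall outside GB2312. The function name and the fallback order are illustrative assumptions. Because GB18030 covers all of Unicode, it can decode characters such as the en dash that GB2312 lacks.

```python
from typing import Sequence, Tuple


def decode_with_fallback(
    data: bytes, encodings: Sequence[str] = ("gb2312", "gb18030")
) -> Tuple[str, str]:
    """Try each encoding in order; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # e.g. bytes valid in GB18030 but outside GB2312
    raise ValueError(f"none of {list(encodings)} could decode the input")


# Usage sketch, with the file name from the question:
#   with open("tst.txt", "rb") as f:
#       text, enc = decode_with_fallback(f.read())
```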
On Debian-based distributions, there is a utility called unaccent which can be used to remove accents from accented letters in a text.
I was looking for a package containing this on Red Hat distros, but the only one I found was unac, which is available only for Mandriva.
I tried to use iconv, but it doesn't seem to support my case.
What is the best lightweight approach that is easy to use in a Bash script?
Are there any secret options to iconv that allow this?
You can use iconv's -c option, which omits characters that cannot be converted, to remove non-ASCII characters:
$ echo 'été' | iconv -c -f utf8 -t ascii
t
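If you just want to remove the accents, use the //TRANSLIT extension. With glibc, the transliteration depends on the current locale, so run it under a UTF-8 locale; in the C locale you may get ?t? instead: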
$ echo 'été' | iconv -f utf8 -t ascii//TRANSLIT
ete
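Where //TRANSLIT is unavailable or locale-dependent, here is a lightweight alternative that assumes Python 3 is installed. It decomposes the text (NFD), drops the combining accent marks, and recomposes it (NFC). The function name unaccent is illustrative. Unlike //TRANSLIT, it does not convert to ASCII: letters with no decomposition, such as ø, æ or ß, are left unchanged.

```python
import unicodedata


def unaccent(text: str) -> str:
    """Remove combining accents: decompose (NFD), drop combining marks, recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)
```

For use in a Bash pipe, the same logic fits in a one-liner:
echo 'été' | python3 -c 'import sys, unicodedata as u; s = u.normalize("NFD", sys.stdin.read()); sys.stdout.write(u.normalize("NFC", "".join(c for c in s if not u.combining(c))))'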