Display arabic text from txt file to cmd prompt - windows

I am trying to print Arabic text in the Windows command prompt from a file named test.txt.
The contents of test.txt are as follows:
ASCII abcde xyz
German äöü ÄÖÜ ß
Polish ąęźżńł
Russian абвгдеж эюя
CJK 你好
Arabic جيدة هذا هو اختبار
test.txt is saved with encoding: UTF-8, font: Arial Unicode MS, script: Arabic.
In the cmd properties I changed the font to Lucida Console and set the code page to 1256 with chcp.
But type test.txt still prints garbage in place of the Arabic text. Is there any workaround for displaying this correctly?

Doesn't seem like there's a workaround:
Arabic and Hebrew (and for that matter Thai, Hindi, and other complex scripts) are not supported on Windows 2000 and Windows XP console.
http://support.microsoft.com/kb/821083
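
Separately from the console limitation quoted above, the setup in the question also mixes two encodings: the file is UTF-8, while code page 1256 expects one byte per Arabic letter. A minimal Python sketch of that mismatch (Python is used here only for illustration, and the sample word is taken from test.txt):

```python
# Sketch: the same Arabic word stored as UTF-8 versus as code page 1256.
arabic = "جيدة"

utf8_bytes = arabic.encode("utf-8")      # how test.txt stores the word
cp1256_bytes = arabic.encode("cp1256")   # what code page 1256 would expect

# Reading the UTF-8 bytes as if they were cp1256 yields unrelated characters.
misread = utf8_bytes.decode("cp1256", errors="replace")

print(len(utf8_bytes), len(cp1256_bytes))   # byte counts differ
print(misread == arabic)                    # False: the text is garbled
```

This only illustrates the byte mismatch; it does not work around the missing complex-script support in the console.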

Related

Decoding files containing hebrew characters and German eszett using iconv

I have a file that I'm pretty sure is in a weird encoding. I've successfully converted similar files to utf-8 previously by assuming they were encoded in windows-1255 using iconv (iconv -f windows-1255 -t utf-8 $file) and this has worked successfully.
My current file contains a ß character that is throwing me off - iconv breaks when it hits this (with an "illegal input sequence" error). Is there a different kind of encoding I should be using?
WINDOWS-1255 (= Hebrew) has no Eszett (ß), so iconv is behaving correctly. Other legacy code pages that do have this character, at byte 0xDF (the same value as its Unicode code point U+00DF):
WINDOWS-1250 = Latin 2 / Central European
WINDOWS-1252 = Latin 1 / Western European
WINDOWS-1254 = Turkish
WINDOWS-1257 = Baltic
WINDOWS-1258 = Vietnamese
Only the document owner knows which code page is the correct one, if it is one of the WINDOWS-125x code pages at all.
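
As a rough way to check the list above, the following Python sketch decodes the single byte 0xDF under each candidate code page. The codec names are Python's aliases for the Windows code pages, and the helper names are made up for this example.

```python
# Check which legacy Windows code pages map the byte 0xDF to ß.
CANDIDATES = ["cp1250", "cp1252", "cp1254", "cp1255", "cp1257", "cp1258"]

def decode_byte(byte_value, codec):
    """Return the character a codec assigns to one byte, or None if unassigned."""
    try:
        return bytes([byte_value]).decode(codec)
    except UnicodeDecodeError:
        return None

def can_encode(text, codec):
    """Return True if every character of text exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

results = {codec: decode_byte(0xDF, codec) for codec in CANDIDATES}
for codec, char in results.items():
    # ascii() keeps the output printable on any console.
    print(codec, ascii(char), can_encode("\u00df", codec))
```

A file that decodes cleanly under several of these code pages still has to be checked by eye, since they differ in other byte positions.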

Don't understand encoding ANSI to UTF8

I have the following text in Notepad++, which is in "ANSI" encoding:
VallÃ©s, Ramon Casas
When I tell Notepad++ to "encode in UTF8" it displays as:
Vallés, Ramon Casas
The two characters Ã© are c3a9 in hex. How can they become an é in UTF-8, which is e9?
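
One way to look at this, sketched in Python under the assumption that "ANSI" here means Windows-1252: e9 is the Unicode code point of é (and its single byte in Windows-1252), while c3 a9 is how UTF-8 stores that same code point. The same two bytes read as Windows-1252 give two separate characters.

```python
raw = bytes.fromhex("c3a9")

as_utf8 = raw.decode("utf-8")     # one character: é
as_ansi = raw.decode("cp1252")    # two characters: Ã and ©

print(len(as_utf8), len(as_ansi))
print(hex(ord(as_utf8)))                  # code point of é, not its stored bytes
print("\u00e9".encode("utf-8").hex())     # é stored as UTF-8
print("\u00e9".encode("cp1252").hex())    # é stored as Windows-1252
```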

Bash pressing ç or Ç emits beep instead of writing the key in PASE

I have a Portuguese keyboard
All keys are ok, but ç or Ç returns a beep sound instead of displaying it.
If I open vi then start typing, then ç or Ç show ok.
Any ideas?
If you are on AIX/PASE, the UTF-8 locales are all in the format LL_CC or LL_CC.UTF-8 and not ll_CC.UTF-8 like they are on Linux/BSD/etc. You want either PT_PT or PT_PT.UTF-8 for UTF-8 in Portugal.
You can see the list of locales and encodings here: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.nlsgdrf/support_languages_locales.htm
You have to fix your bash environment so that it accepts UTF-8 characters.
If you have a .profile or .bash_profile,
add exactly the following two lines to it:
LANG=pt_PT.UTF-8
export LANG
Then open a new terminal window and check that the ç and Ç keys now work correctly.

Character replacement batch file

I'm trying to do a batch script using Windows command line to convert some characters for example:
É to Й
Ö to Ц
Ó to У
Ê to К
Å to Е
Í to Н
à to Г
Ø to Ш
Ù to Щ
Ç to З
with no success. I need this because I am using a program that does not support a Cyrillic font.
I already have the file with these words, like:
ОБОГРЕВ ЗОНЫ 1
ДАВЛЕНИЕ ЦВЕТА 1
...
and so on...
Is it possible?
I'm guessing that you'd like to convert the character set (alias code page) of a file so you can open and read it.
I'm assuming you are using a Windows computer.
Let's say that your file is russian.txt and when you open it with notepad, the characters don't make any sense. The russian.txt file's character encoding is most probably ANSI and its code page is Windows-1251.
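
Most of the pairs listed in the question (É to Й, Ö to Ц, and so on) are what you get when the same bytes are read under Windows-1252 instead of Windows-1251. The Python sketch below illustrates that; the function name reinterpret is made up for this example, and the code pages are the ones guessed above.

```python
def reinterpret(text, stored_as, shown_as):
    """Re-read the bytes of text (encoded as stored_as) under another code page."""
    return text.encode(stored_as).decode(shown_as)

latin = "ÉÖÓÊÅÍØÙÇ"
cyrillic = reinterpret(latin, "cp1252", "cp1251")

# Comparisons are printed instead of the text itself, so this runs on any console.
print(cyrillic == "ЙЦУКЕНШЩЗ")
print(reinterpret(cyrillic, "cp1251", "cp1252") == latin)
```

Because each character is a single byte in both code pages, the mapping works in either direction without a lookup table.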
Some words about character encoding:
In the ANSI code pages listed here, one character is one byte long.
Different languages have different code pages: Windows-1251 = Russian, Windows-1252 = Western Languages (English, German, Swedish...), Windows-1253 = Greek ...
In UTF-8, ASCII characters (including the English alphabet) are one byte long and all other characters take two to four bytes; Cyrillic letters take two.
In what Windows calls "Unicode" (UTF-16), most characters are two bytes long; characters outside the Basic Multilingual Plane take four.
UTF-8 and Unicode don't need code pages.
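
The byte lengths above can be checked with a short Python sketch. The sample characters are arbitrary, and utf-16-le stands in for what Notepad calls "Unicode".

```python
# Compare how many bytes one character takes in each encoding.
samples = ["A", "ß", "Й", "你", "😀"]

lengths = {}
for ch in samples:
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-le"))
    lengths[ch] = (utf8_len, utf16_len)
    # ascii() escapes non-ASCII characters so printing works on any console.
    print(ascii(ch), utf8_len, utf16_len)

# In the single-byte code page Windows-1251, a Cyrillic letter is one byte.
ansi_len = len("Й".encode("cp1251"))
print(ansi_len)
```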
You can check the encoding by opening the file in notepad and clicking File, Save As. At the right bottom corner beside the Save-button you can see the encoding.
With some googling I found a site where you can do the character encoding conversion online. I haven't tested it, but here's the address:
http://i-tools.org/charset
I've made a script (= a small program) which changes the character encoding from any ANSI and code page combination to UTF-8 or Unicode or vice versa.
Let's say you have an English Windows computer and want to convert russian.txt (ANSI / Windows-1251) to UTF-8.
Here's how:
Open this web-page and copy the script in it to the clipboard:
VB6/VBScript change file encoding to ansi
Create a new file named ConvertCharset.vbs in the same folder as russian.txt, say C:\Temp.
Open the ConvertCharset.vbs in notepad (right click+edit) and paste.
Open CMD (Windows-button+R, cmd, Enter).
In the CMD window type the following (hit the Enter key at the end of each line):
cd C:\Temp\
cscript ConvertCharset.vbs /InputCharset:Windows-1251 /OutputCharset:utf-8 /InputFile:russian.txt /OutputFile:russian_utf-8.txt
Now you can open russian_utf-8.txt in notepad and you'll see the Russian characters OK.
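
If Python happens to be available, the same conversion can be sketched without the VBScript. This is an alternative, not the linked script; the function name convert_file is made up for this example.

```python
def convert_file(src, dst, input_charset, output_charset):
    """Decode src using input_charset and write the same text to dst in output_charset."""
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding=input_charset, newline="") as infile:
        text = infile.read()
    with open(dst, "w", encoding=output_charset, newline="") as outfile:
        outfile.write(text)
```

For the example above, the call would be convert_file("russian.txt", "russian_utf-8.txt", "windows-1251", "utf-8"). A wrong guess for the input charset either raises a decoding error or silently produces the wrong letters, so it is worth checking the output by eye.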
More info:
http://en.wikipedia.org/wiki/Character_encoding
http://en.wikipedia.org/wiki/Windows-1251
http://en.wikipedia.org/wiki/UTF-8
VB6/VBScript change file encoding to ansi

convert UTF-8 to CP1252 in ubuntu with PHP or bash shell

I have a question about converting UTF-8 to CP1252 in Ubuntu with PHP or SHELL.
Background: converting a CSV file from UTF-8 to CP1252 in Ubuntu with PHP or shell, copying the file from Ubuntu to Windows, and opening it with Notepad++.
Environment :
Ubuntu 10.04
PHP 5.3
a CSV file containing the letters (œ, à, ç)
Methods used :
With PHP
iconv("UTF-8", "CP1252", "content of file")
or
mb_convert_encoding("content of file", "CP1252", "UTF-8")
If I check the generated file with
file -i name_of_the_file
It displayed :
name_of_the_file: text/plain; charset=iso-8859-1
I copied the converted file to Windows and opened it with Notepad++; in the bottom right corner, the encoding is shown as ANSI.
When I changed the encoding from ANSI to Windows-1252, the special characters were displayed correctly.
With Shell
iconv -f UTF-8 -t CP1252 "name_of_the_file"
The rest is the same.
Question :
1. Why did the file command report ISO-8859-1 instead of CP1252 or ANSI?
2. Why were the special characters displayed correctly when I changed the encoding from ANSI to Windows-1252?
Thank you in advance !
1.
CP1252 and ISO-8859-1 are very similar; quite often a file encoded in one of them looks identical to the same file encoded in the other. See Wikipedia for the characters that are in Windows-1252 and not in ISO-8859-1.
The letters à and ç are encoded identically in both encodings. ISO-8859-1 doesn't have an œ and CP1252 does, but the file command might have missed that. AFAIK it doesn't analyse the entire file.
2.
"ANSI" is a misnomer used for the default non-Unicode encoding in Windows. In case of Western European languages, ANSI means Windows-1252. In case of Central European, it's Windows-1250, in case of Russian it's Windows-1251, and so on. Nothing apart from Windows uses the term "ANSI" to refer to an encoding.

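The overlap between CP1252 and ISO-8859-1 described in the answer above can be checked with a small Python sketch. The three letters are the ones from the question; the variable names are made up for this example.

```python
shared = "\u00e0\u00e7"   # à and ç
oe = "\u0153"             # œ

# à and ç produce the same bytes in both encodings.
same_bytes = shared.encode("cp1252") == shared.encode("iso-8859-1")

# œ exists in CP1252 but has no place in ISO-8859-1.
oe_cp1252 = oe.encode("cp1252")
try:
    oe.encode("iso-8859-1")
    oe_in_latin1 = True
except UnicodeEncodeError:
    oe_in_latin1 = False

print(same_bytes, oe_cp1252.hex(), oe_in_latin1)
```

So a file that only contains à and ç is byte-for-byte the same in both encodings, and only a character such as œ can tell them apart.
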
Resources