What encoding does Windows use for command line parameters passed to programs started in a cmd.exe window?
The encoding of command line parameters doesn't seem to be affected by the console code page set using chcp (I set it to UTF-8, code page 65001 and use the Lucida Console font.)
If I paste an EN DASH, encoded as hex E28093, from a UTF-8 file into a command line, it is displayed correctly in the cmd.exe window. However, it seems to be translated to a hex 96 (an ANSI representation) when it is passed to the program. If I paste Cyrillic characters into a command line, they are also displayed correctly, but appear in the program as question marks (hex 3F.)
If I copy a command line and paste it into a text file, the resulting file is UTF-8; it contains the same encoding of the EN DASH and Cyrillic characters as the source file.
It appears the characters pasted into the cmd.exe window are captured and displayed using the code page selected with chcp, but some ANSI code page is used to translate the characters into a different encoding before passing them as parameters to a program. Characters that cannot be converted apparently are silently converted to question marks.
So, if I want to correctly handle command line parameters in a program, I need to know exactly what the encoding of the parameters is. For example, if I wish to compare command line parameters with known UTF-8 data read from a file, I need to convert the parameters from the correct encoding to UTF-8. Thanks.

If your goal is to compare Unicode characters then you should call GetCommandLineW in your program (or use wmain so that argv uses wchar_t) and then convert this UTF-16LE command line string to UTF-8 or vice versa.
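GetCommandLineA probably converts the Unicode source string with CP_ACP (the system ANSI code page), which would explain both the hex 96 and the question marks you are seeing.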
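The symptoms in the question can be reproduced outside Windows. Here is a minimal Python sketch, assuming the ANSI code page is Windows-1252 (an assumption; it depends on the system locale). The names ANSI_CODE_PAGE, to_ansi and utf16le_to_utf8 are made up for illustration and are not Windows APIs.

```python
# Assumption: the system ANSI code page (CP_ACP) is Windows-1252; yours may differ.
ANSI_CODE_PAGE = "cp1252"


def to_ansi(text: str) -> bytes:
    """Mimic a lossy Unicode-to-ANSI conversion: unmappable characters become '?'."""
    return text.encode(ANSI_CODE_PAGE, errors="replace")


def utf16le_to_utf8(raw: bytes) -> bytes:
    """Convert UTF-16LE bytes (the form of a wide command line) to UTF-8."""
    return raw.decode("utf-16-le").encode("utf-8")


en_dash = "\u2013"
cyrillic = "\u0434\u0430"  # two Cyrillic letters

print(en_dash.encode("utf-8").hex())   # e28093
print(to_ansi(en_dash).hex())          # 96
print(to_ansi(cyrillic).hex())         # 3f3f
print(utf16le_to_utf8(en_dash.encode("utf-16-le")).hex())  # e28093
```

Only the wide (UTF-16) path keeps the Cyrillic text intact, which is why comparing against UTF-8 file data should start from the wide command line.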


Using terminal to sort data and keep the format in notepad.exe

I'm using Ubuntu Bash within Windows 10 and I have a text document with:
{u'osidjfoij23': 3894798, u'oisjdao':234567, u'oaijsdofj': 984759}
Using tr in the terminal, I change the output to
'osidjfoij23': 3894798,
'oaijsdofj': 984759}
When I open the same document in notepad.exe, the newline ("\n") added by tr doesn't register, and all the data is shown as a single paragraph.
I know this is because bash and Notepad have different encodings for their documents. Is there a way to make these work together, or an alternative to Notepad that I can use?
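The difference is in the line endings rather than the character encoding: tr writes a bare LF ("\n"), while Notepad expects CRLF ("\r\n"). You can use unix2dos to convert a file to Windows line endings. Linux programs handle Windows line endings fairly well, so this shouldn't break anything (especially if that's JSON, as it appears to be).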
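If unix2dos isn't installed, the same conversion is easy to sketch in Python. This is only an illustration of what the conversion does, not how unix2dos is implemented; lf_to_crlf is a made-up helper name.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Convert Unix (LF) line endings to Windows (CRLF) line endings."""
    # Normalize any existing CRLF to LF first so it is not doubled into CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


sample = b"'osidjfoij23': 3894798,\n'oaijsdofj': 984759}\n"
print(lf_to_crlf(sample))
```

Working on bytes rather than text keeps the file's character encoding untouched; only the line terminators change.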

Read Double byte characters in plist from shell

I am working on a Mac. I have a p-list entry containing double-byte Chinese characters, i.e.
ProductRoot /Users/labuser/Desktop/您好
Now I am running this command in the terminal:
defaults read "path to p-list" ProductRoot
and I am getting /Users/labuser/Desktop/\u60a8\u597d
How can I fix this?
"defaults read" doesn't seem to have any way to change the format of the output. Maybe you could pipe that to another command-line tool to unescape the Unicode characters.
Failing that, it'd be very easy to write a tool in Objective-C or Swift to dump just that one value as a string.
As a side note, you say the file has double-byte characters. If it's being created by native Mac code, it's more likely to be in UTF-8 encoding. I don't know if that would matter at all, but I figured I'd add that in case it's relevant.
You could try this:
defaults read | grep ppt | perl -npe 's/\\\\U(\w\w\w\w)/chr hex $1/ge'
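The same unescaping idea can be sketched in Python, for anyone who prefers it to Perl. This assumes the escapes are always a backslash (or several), a u or U, and exactly four hex digits, as in the output shown in the question; unescape_unicode is a hypothetical helper name, and it does not combine surrogate pairs.

```python
import re


def unescape_unicode(text: str) -> str:
    r"""Replace \uXXXX or \UXXXX escapes (four hex digits) with the characters they name."""
    return re.sub(
        r"\\+[uU]([0-9a-fA-F]{4})",          # one or more backslashes, u/U, 4 hex digits
        lambda m: chr(int(m.group(1), 16)),  # hex code point -> character
        text,
    )


escaped = r"/Users/labuser/Desktop/\u60a8\u597d"
print(unescape_unicode(escaped))  # /Users/labuser/Desktop/您好
```

The text to unescape would come from the output of defaults read, for example by reading standard input in a small script.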

Windows Raster Fonts Encoding Error

I'm writing an interpreter and I have come across a peculiar problem involving character sets (I think).
When I create a file on my Mac called hello.rd and run the command
file -I hello.rd
I get this output:
hello.rd: text/plain; charset=utf-8
That shows me the file is UTF-8, which it should be. The source file looks like this:
print "Hello World á"
And the output in the terminal is:
Hello World á
This is all as I want and expect it to be. The problem arises when I execute the code on Windows. When I execute the same code on Windows I get this output:
As you can see, the á isn't output correctly. I changed the code page to 65001 and it made no difference, but when I used the Lucida Console font, the characters displayed correctly. What I can't understand is why I can type the letter á in the terminal using my keyboard and it displays, but it won't display from my files.
So what I did next was I created a file on my Windows PC called test123.rd and saved this text in it:
print "Hello World á ã ß"
When I execute that on my Mac I get the incorrect output this time, I get:
Hello World ? ? ?
And on my PC I still get the incorrect output, I get this:
I used the file -I command on my Mac on the file test123.rd and I got this output:
test123.rd: text/plain; charset=iso-8859-1
I assume the file test123.rd displays incorrectly on OS X because its character set isn't UTF-8, but I don't understand why it displays incorrectly on Windows as well.
Does anyone have any idea how to solve the problem, without changing the font of the Windows CMD?
Type cmd /? to see how to switch Unicode on, then choose a Unicode font. Also see chcp /?.
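The question marks on the Mac are consistent with ISO-8859-1 bytes being read as UTF-8. Here is a minimal Python sketch of that mismatch and of transcoding the file contents. It assumes the Windows file really is ISO-8859-1, as file -I reported; transcode is a made-up helper name.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from one character encoding to another."""
    return data.decode(src).encode(dst)


text = "á ã ß"
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

print(latin1_bytes.hex(" "))  # e1 20 e3 20 df
print(utf8_bytes.hex(" "))    # c3 a1 20 c3 a3 20 c3 9f

# Reading the ISO-8859-1 bytes as UTF-8 fails; each bad byte becomes U+FFFD.
print(latin1_bytes.decode("utf-8", errors="replace"))

# Converting the file contents to UTF-8 gives the bytes a UTF-8 reader expects.
print(transcode(latin1_bytes, "iso-8859-1", "utf-8") == utf8_bytes)  # True
```

Saving source files as UTF-8 on both machines removes one of the two variables; the console code page and font on Windows are a separate issue.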

Perl on Windows: Problems with Encoding

I have a problem with my Perl scripts. On UNIX-like systems they print all Unicode characters, like ä, properly to the console. In the Windows command line, the characters come out as meaningless glyphs. Is there a simple way to avoid this? I'm using use utf8;.
Thanks in advance.
use utf8; simply tells Perl your source is encoded using UTF-8.
It's not working on Unix either. There are some strings that won't print properly (print chr(0xE9);), and most that do will print a "Wide character" warning (print chr(0x2660);). You need to decode your inputs and encode your outputs.
On Unix systems, that's usually
use open ':std', ':encoding(UTF-8)';
On Windows, you'll need to use chcp to find the console's code page. (437 for me.)
use open ':std', ':encoding(cp437)'; # Encoding used by console
use open IO => ':encoding(cp1252)'; # Encoding used by files
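To see why the output encoding has to match the console, here is a small Python sketch of the same "encode your outputs" idea. It shows the bytes that chr(0xE9) turns into under each of the encodings mentioned above; encode_for is a hypothetical helper name.

```python
def encode_for(text: str, encoding: str) -> bytes:
    """Encode text for a given output encoding, replacing unmappable characters."""
    return text.encode(encoding, errors="replace")


e_acute = chr(0xE9)  # the character from the print chr(0xE9) example

for enc in ("utf-8", "cp437", "cp1252"):
    print(enc, encode_for(e_acute, enc).hex())
# utf-8 c3a9
# cp437 82
# cp1252 e9
```

The same character is three different byte sequences, so a console that expects code page 437 shows the wrong glyphs when it is sent UTF-8 or cp1252 bytes.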

Ruby and Accented Characters

Summary of the wall of text below: How can I display accented characters (so they work via puts, etc) in Ruby?
Hello! I am writing a program for my class which will display some sentences in Spanish. When I try to use accented characters in Ruby, they do not display correctly (in the NetBeans output window (which displays accented characters in Java fine) or in the Command Prompt).
At first, some of my code didn't even run because the accented characters in my arrays were throwing off the Ruby interpreter (I guess?). I got errors saying that Ruby was expecting a closing bracket.
But I did some research, and found a solution, to add the following line of code to the beginning of my Ruby file:
# coding: utf-8
In NetBeans, my program ran regardless of this line. But I needed to add this line to get my program to run successfully in Command Prompt. (I don't know why.)
I'm still, however, having a problem actually displaying the characters to the screen. A word such as "será" will display in the NetBeans output window as "seré". And in the command prompt it draws little pipe characters (that I don't know how to type).
Doing some more research, I heard about:
$KCODE = 'UTF-8'
but I'm not having any luck with this.
I'm using Ruby 1.8 and 1.9 (I go back and forth between different machines).
A command prompt in Windows 7 uses raster fonts by default, and these don't support Unicode. First, change the cmd font to Lucida Console or Consolas. Then change the command prompt's code page with chcp 65001. You can do it manually or add this line to your Ruby program:
# encoding: utf-8
`chcp 65001` #change cmd encoding to unicode
puts 'será test '
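The "little pipe characters" in the question are what UTF-8 bytes look like when a console interprets them with a legacy code page. Here is a short Python sketch of that effect, assuming the console was using code page 437 (an assumption; the asker's code page is not stated). misread is a made-up helper name.

```python
def misread(text: str, wrong_encoding: str) -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


word = "será"
print(word.encode("utf-8").hex(" "))  # 73 65 72 c3 a1
print(misread(word, "cp437"))         # ser├í
print(misread(word, "cp1252"))        # serÃ¡
```

With the code page set to 65001 and a Unicode-capable font, the two bytes c3 a1 are read as the single character á instead.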
