Viewing Japanese MBCS text while remote debugging from an English Windows machine? - visual-studio

I'm trying to debug an MBCS application whose strings, dialogs, etc. have been localized into Japanese. There seems to be a bug somewhere with a string getting truncated.
I am debugging from an English Windows 7 machine using Visual Studio 2013. Of course, since the application is MBCS and not Unicode, when I view the strings in the debugger they are just gibberish. If it were Unicode, the strings would probably display in Japanese while remote debugging, but it's not, and converting it is not really an option.
So, is there any encoding trick that would let me view the strings as Japanese on my English system? I'm not going to switch my local system's locale to Japanese either.
So... basically I'm looking for some kind of option to view the Japanese strings from the remote system as Japanese strings on my English system. Has anybody else been down this road?
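A minimal sketch of one possible workaround, assuming the remote process's strings are in code page 932 (Shift-JIS, the usual Japanese MBCS code page) and that you can add a helper function to the debuggee: convert the raw bytes to UTF-16 with MultiByteToWideChar and watch the helper's result instead of the raw char*. The helper name here is hypothetical.

// Hypothetical debug helper: convert a CP932 (Shift-JIS) string to UTF-16
// so the Japanese text shows up readably in the debugger's watch window.
#include <windows.h>
#include <string>

std::wstring DebugCp932ToWide(const char* mbcs)
{
    if (!mbcs) return L"(null)";
    // First call computes the required length (including the terminator).
    int len = MultiByteToWideChar(932, 0, mbcs, -1, NULL, 0);
    if (len <= 0) return L"(conversion failed)";
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(932, 0, mbcs, -1, &wide[0], len);
    wide.pop_back(); // drop the embedded null terminator
    return wide;
}

Watching DebugCp932ToWide(suspectString) should then show the Japanese text, which also makes a truncation bug easier to spot.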

Related

Why can some software display all characters and some not?

Reference text: どうもありがとうございました
Copied to:
Notepad/Notepad++: displays it with no problems
LibreOffice Writer: it switches to a different font family to make it work; if you force Lucida Console, square boxes appear
Windows: displays it with no problems
Console: it needs the correct chcp setting and a font family that can display the characters (Lucida Console shows square boxes here too), if I understand correctly
Is it possible to explain why Notepad can display any text in any font family while LibreOffice and the console cannot? Where are the differences? Is it possible to get the same behaviour on the console as in Notepad, for example?
Some Windows fonts have glyphs for many different scripts, some cover a few scripts, and many cover just one. (Fonts which support many scripts are sometimes called "Unicode fonts," which can be a misleading term. In other OSes, these kinds of fonts are more prevalent. Windows itself doesn't ship with any, though I think you get one or two with the Office suite.)
When you try to output text in multiple scripts using standard Windows functions and one of the well-known fonts, Windows uses font fallback and/or font linking, which automatically switches between fonts as needed to output the whole string. Most programs, like Notepad and Notepad++, thus get coverage automatically.
I haven't read the LibreOffice code, but I suspect that when you select a font for a span of text, it sticks with that font, effectively preventing Windows's font fallback and font linking mechanisms from helping. This isn't surprising, since a WYSIWYG editor is likely to use lower-level APIs for outputting text in order to have more typographic control. But using the lower-level APIs means you don't get fallback and linking for free, so you'd have to implement it yourself, and that's a lot of extra work that may not be important to very many users.
The Windows console has a lot of legacy and limitations that persist for backward compatibility with older programs. The console mostly emulates DOS systems, which didn't have any sort of Unicode support and instead relied on "Code Pages," which are, roughly speaking, alternate mappings between character values and glyphs. Code pages are geared toward just one (or maybe two) scripts, so if you need characters from another script, you're basically out of luck. I think modern versions of Windows have hacked in some support for a pseudo code page that supports UTF-8, but I've never gotten it to work well, and it, too, has limitations.
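To illustrate the code-page dance described above, here is a minimal C++ sketch that switches the console to the UTF-8 pseudo code page before printing the reference text; whether you see Japanese glyphs or boxes still depends on the console font, as noted.

// Minimal sketch: switch the console to the UTF-8 code page before printing,
// the programmatic equivalent of running "chcp 65001" first. Glyph coverage
// still depends on the selected console font.
#include <windows.h>
#include <cstdio>

int main()
{
    UINT oldCp = GetConsoleOutputCP();   // remember the current code page
    SetConsoleOutputCP(CP_UTF8);         // 65001, the UTF-8 pseudo code page
    std::printf(u8"どうもありがとうございました\n"); // u8 literal needs C++11 (MSVC 2015+)
    SetConsoleOutputCP(oldCp);           // restore it for later programs
    return 0;
}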

VB6.0 Automation error

I'm currently working on a VB6.0 application which is giving an automation error that isn't very consistent (sometimes the code works, then crashes after several successful iterations).
Dim example As String
...
On Error GoTo ErrHandler   ' "ERROR" is a reserved word in VB6, so it can't be a label
example = UCase$(Replace(form.UniTextBox(1).Text, " ", ""))
Exit Sub                   ' prevent falling through into the handler
ErrHandler:
Debug.Print "ERROR: " & Err.Description
This is the section of code that I've identified as causing the automation error. The problem seems to occur when the computer is set up with a Polish locale on Windows 7; with an English locale there are no issues.
What is causing this issue?
Any advice or tips would be appreciated.
Thanks.
Controls are ANSI, not Unicode. COM is Unicode, not ANSI. That string is being converted back and forth by Windows and VB.
Windows (and controls are windows) are either ANSI or Unicode. VB6 was written when most computers only had ANSI windows, hence all its API calls (and creating a window requires an API call) are ANSI calls. Send Unicode to an ANSI window and Windows will convert it to ANSI first. Ask VB to make API calls or use forms and it will convert the Unicode string to ANSI.
See StrConv; byte arrays can act as Unicode strings. Also see the system setting in Regional Options for non-Unicode programs.
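The conversion described above can be reproduced directly. This is an illustrative C++ sketch (not VB6) of what Windows does when a Unicode string reaches an ANSI window: on an English system the ANSI code page is 1252, which has no Ł or Ź, so data is lost.

// Sketch: what happens when a Unicode string is converted to the system
// ANSI code page. On an English (CP1252) system, Polish letters such as
// Ł and Ź have no mapping and become the default character; Ó survives.
#include <windows.h>
#include <cstdio>

int main()
{
    const wchar_t* polish = L"ŁÓDŹ";
    char ansi[32] = {};
    BOOL lostData = FALSE;
    // WC_NO_BEST_FIT_CHARS makes the loss visible instead of silently
    // substituting look-alike characters.
    WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, polish, -1,
                        ansi, sizeof(ansi), NULL, &lostData);
    std::printf("converted: %s, lossy: %s\n", ansi, lostData ? "yes" : "no");
    return 0;
}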

Why do NetBeans, Aptana Studio and Komodo Edit all not save in UTF-8?

I'm getting back into development and want to find a good editor for HTML5/jQuery.
Being able to save files in UTF-8 is important.
However, although I set my project in NetBeans 7.0 to encode in UTF-8, when I create a file in the project and then look at it in Notepad++, the file is encoded in ANSI and I have to set the encoding to UTF-8 manually.
In Aptana Studio 3 I set the workspace to UTF-8 encoding, and my project inherits from that, but when I create a file in the project and look at it in Notepad++, it is encoded in ANSI and I have to change the encoding manually to UTF-8.
So I tried Komodo Edit 7: I manually set the file's encoding to UTF-8, saved the file, and looked at it in Notepad++, which said the file is in ANSI.
I notice that in any of these editors, if I put a German umlaut character in the file, Notepad++ shows it as "ANSI as UTF-8", and I still have to change it to UTF-8 manually in Notepad++, where it will stay.
The reason I want an editor that saves in UTF-8 is that I remember a project from a couple of years ago which had German and French characters in its files, and after the files were viewed and saved in various editors, those characters would be replaced with garbage. The solution was to always set the encoding of each file to UTF-8 from the start.
I assumed that editors would be advanced enough by now that if you specify that files should be saved in UTF-8, they actually save them in UTF-8 in a way that is recognized by every modern text editor. Is this not the case? What am I not understanding about modern text editors and development environments with regard to UTF-8?
How can I get these editors to save their files in UTF-8 encoding?
A UTF-8 encoded file that only contains characters also present in the ASCII table (the first 128 Unicode characters, i.e. your basic alphanumeric characters) is indistinguishable from an ASCII/ANSI encoded file. My guess is that Notepad++ simply can't make the distinction (because there is none) and defaults to ANSI. You can see the difference when you include a character that is not in the ASCII table. By "ANSI as UTF-8" I can only guess that it means "this document contains characters from the ANSI table (a.k.a. Latin-1) and is saved in UTF-8".
In other words, your IDEs are probably fine, the problem is with Notepad++.
Try a character like 漢字; that will result in a pretty unique UTF-8 byte sequence that's most certainly not ANSI.
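To make the "no distinction" point concrete, here is a small C++ sketch that dumps the encoded bytes: an ASCII-only string is byte-for-byte identical in UTF-8 and ANSI, while 漢字 produces the UTF-8 sequence E6 BC A2 E5 AD 97, which no ANSI code page would emit.

// Sketch: why ASCII-only UTF-8 is indistinguishable from ANSI.
#include <cstdio>
#include <cstring>

static void dump(const char* label, const char* s)
{
    std::printf("%s:", label);
    for (size_t i = 0; i < std::strlen(s); ++i)
        std::printf(" %02X", static_cast<unsigned char>(s[i]));
    std::printf("\n");
}

int main()
{
    dump("abc  ", "abc");                       // 61 62 63: same bytes as ASCII/ANSI
    dump("kanji", "\xE6\xBC\xA2\xE5\xAD\x97");  // UTF-8 bytes for 漢字
    return 0;
}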
From what I've seen on this topic, Notepad's UTF-8 equates to Notepad++'s UTF-8, which means with BOM included. If a file is saved with this encoding and opened in NetBeans, it will actually show the BOM as stray characters (e.g. ï»¿ or placeholder boxes, depending on whether the encoding for the project or IDE is set to UTF-8). But if you save the file in Notepad++ encoded as "UTF-8 without BOM", and either define your project as UTF-8 or add -J-Dfile.encoding=UTF-8 to your netbeans_default_options, you'll see what I think is UTF-8 as it should be. Unfortunately, if you then edit this file in NetBeans without including characters that are outside of the ANSI character set, you see the behavior that you referred to in your question, with the file having its encoding set to ANSI.
So, in an attempt to make this a sort-of answer to your question: remember that not all editors' concepts of UTF-8 are the same. Notepad++ gives the most accurate information on what the real encoding of a file is. Developing in a Linux or Mac environment might be a good choice for making sure that localization is correct, but on Windows a decent workaround is to include a non-ANSI character in the file to ensure it always gets saved as a UTF-8 (non-BOM) file. This is all geared toward NetBeans development, by the way. I haven't tested this with the others, though I'm willing to bet they will save the file correctly on a Windows machine if it has non-ANSI characters in it. Sorry for the kludge, gang, but either way I hope it helps someone struggling with this same issue.
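Since the BOM is the only on-disk marker separating these two UTF-8 flavors, here is a short C++ sketch for checking whether a file starts with the 3-byte UTF-8 BOM (EF BB BF):

// Sketch: detect the UTF-8 BOM that distinguishes Notepad-style "UTF-8"
// from Notepad++'s "UTF-8 without BOM".
#include <cstdio>

bool hasUtf8Bom(const char* path)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    unsigned char b[3] = {0, 0, 0};
    size_t n = std::fread(b, 1, 3, f);
    std::fclose(f);
    return n == 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF;
}

int main(int argc, char** argv)
{
    if (argc > 1)
        std::printf("%s: %s\n", argv[1], hasUtf8Bom(argv[1]) ? "has BOM" : "no BOM");
    return 0;
}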

TextPad and Unicode: full support?

I've got some UTF-8 files created in Mac, and when trying to open them using TextPad in Windows, I get the following warning:
WARNING: (file name) contains characters that do not exist in code
page 1252 (ANSI Latin 1). They will be converted to the system default
character, if you click OK.
Linux (GNOME gEdit) can open the same file without complaints. What does the above mean? I thought that TextPad had full UTF-8 support. Can I safely open and edit UTF-8 files using it without corrupting the file?
It seems that TextPad cannot handle characters outside windows-1252 (CP1252, here carrying the misnomer “ANSI Latin 1”). I tested it on Windows, opening a plain text file created on the same system, as UTF-8 encoded, both with and without BOM, with the same result. The program’s help does not seem to contain anything related to character encodings, and its tools for writing “international characters” are for Latin-1 characters only.
There are several text editors for Windows that can deal with UTF-8 (even Notepad can open a UTF-8 file, but it can hardly be recommended for serious editing). See Alan Wood’s collection of information on Unicode editors and word processors for Windows. (Personally, I like Notepad++ and BabelPad, which are both free.)
TextPad 8, the newest as of 2016-01-28, does finally properly support BMP Unicode. It's a paid upgrade, but so far has been working flawlessly for me.
TextPad ‘supports’ UTF-8 and UTF-16 documents only in as much as it will import and export them. But it still edits files as simple bytes, and not Unicode characters (using the ANSI code page, which is code page 1252 for Western European).
So unless the file happened to contain only characters that also exist in that code page, you will lose content. This rather defeats the point of Unicode.
Indeed, this was the issue that made me flee to EmEditor at the time, though now I would agree with the previous comments and recommend Notepad++. The era of paying for text editors is long gone.
Actually, TextPad does support displaying Unicode code points, granted they went about it the wrong way. In order to display the Unicode characters you have to choose Configure -> Preferences and expand Document Classes -> Text -> Font.
You need to choose a Unicode font AND set the Script to match, e.g. Arial Unicode MS with script CHINESE_BIG5.
However, this is a backwards approach, since the application should handle this when the user tells TextPad to open the file as Unicode or UTF-8. The built-in Notepad application in MS Windows detects the encoding automatically and displays the glyphs correctly based on the encoding.
I found a discussion on this in the Textpad forums:
http://forums.textpad.com/viewtopic.php?t=11019
I do have Notepad++, but TextPad handles large files with ease, whereas the other editors I've tried, including Notepad++, either slow to a crawl or die. I'm currently trying to edit a 475 MB file and Notepad++ is not up to the task.
Textpad Configure Menu --> Preferences --> Document Classes --> Default --> Default encoding --> UTF-8
Try the ANSI code set with File/Open; that should solve the problem in TextPad.

Cygwin displays error messages in Hebrew and garbled

I have been using Cygwin to build my Android library using the NDK's ndk-build script and Cygwin's make tool. It started giving me errors full of garbled non-English Latin characters; when I copied the text into Google, it pasted as Hebrew (which I can read). Is there any way to force it to output errors in English? Any idea why this happens?
Check your environment variables for the correct locale; LANG or LC_MESSAGES are probably responsible. Set those to an English locale (in your profile, so it applies to future sessions as well) to get English error messages. Sorry, I'm a Windows person and know nearly nothing of Unix, so you'd have to look up the specifics elsewhere, but this should be the general direction to go.
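For example, in Cygwin's ~/.bashrc (assuming that's the profile file your shell reads):

export LANG=en_US.UTF-8   # default locale for future sessions
export LC_MESSAGES=C      # force untranslated (English) program messages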
Some programs/libraries try to be overly smart by guessing the locale from the keyboard layout or the user's locale, oftentimes ignoring the fact that on Windows, locale and UI language are two different concepts (and that different languages on the console are even harder to get right).
As for why the messages appear garbled that's likely because the console window uses the wrong code page. The easiest fix is usually to use a TrueType font for the console window, but in this case neither Consolas nor Lucida Console include glyphs for Hebrew, so you'd only see boxes anyway.
