PDCurses and DOS codepage 437 - windows

I'm using PDCurses version 3.4 for Windows; it's compiling and running properly, but I can't get it to display the IBM Extended Ascii characters from Codepage 437 (although the console is running in that codepage). I'm specifically trying to get line-drawing characters.
The following commands display the wrong characters:
mvaddch(0,20,186);
mvaddch(1,20,204);
for (unsigned i=0; i<80; i++) {
mvaddch(1,i+20,205);
}
(This is with a 100x50-column terminal window.)
Giving mvaddch() the role-equivalent Unicode codepoints (186 = 2251, 204 = 2560, 205 = 2250) with PDC_WIDE defined also fails, displaying the same characters ('º' on line 1, 'I' repeatedly on line 2).
How do I get the line-draw characters -- and the rest of IBM Extended ASCII -- to display?
(Related article -- different symptoms, same resolution: PDcurses displaying question marks in place of intended character.)

I should have downloaded pdc34dll.zip instead of pdc34dllw.zip, despite the latter having a "w" (for "Windows"?) in the title and being advertised by SourceForge as the most recent version.
I have no idea why this worked; but it did, and PDCurses now displays IBM Extended ASCII characters properly.

Related

vDos (DOS emulator) shows incorrect character

that is some sort of unusual question. A friend uses an old MS-DOS program on XP to handle some database operations (.dbf files). For instance, the application shows information about product e.g. the diameter. On XP this is displayed like you would expect Ø 1.2 (maybe not that exact character, but some sort of diameter symbol). But in vDos (runs on Windows 10) Ý 1.2 is displayed. In the database itself, it is also written Ý 1.2. I assume it has something to do with a different codepage or windows graftabl.
If I change the database entry to Ø vDos renders it correctly, but this is not the type of solution I am looking for. Do you have any idea what could be different between those two systems and what I could try out?
EDIT: I found a workaround. In vDos it is possible to specify a codepage and change specific characters. Just create a codepage file in the vDos dir (e.g. C_850.txt) and add chcp 850 to the autoexec.txt. Each line in this file represents each char of the 255 chars in the codepage. To swap a char, in my case 237 = Ý (line 238) write the hexcode of the char you want to replace it with, in my case D8 (= Ø).

Assembly problems with ascii extended characters

i want know what to do to solve this issue with the ascii extended characters, i don´t understand why print a strange symbols instead of letter that represent 0x90
i put PutStr c381
nothing happen
This has nothing to do with assembly language and everything to do with UTF-8 (which your terminal is expecting) vs. ISO-8859-1 (latin-1) or Windows 1252 (IDK which) extended 8-bit character set which you seem to be looking up codes from. It would be the same if you wrote a C program with those bytes in a char array[] and used stdio puts.
As #Fuz says, "Á does not have an ASCII code." ASCII only includes characters from 0..127 (and the low 32 are non-printable) http://www.asciitable.com/. Extended-ASCII 8-bit character sets only overlap with UTF-8 for code-points from 0 to 127.
Any program that makes a write() system call to write a 0x90 byte to stdout will do the same thing, regardless of what language it was written in. (Use strace ./program to see what yours does, or pipe it into hexdump -C). For example, in bash run printf '\x90\n' to do exactly the same thing. 90 0a is not a valid UTF-8 multi-byte sequence, so your terminal prints a � glyph (a ? in a diamond).
You could set your gnome-terminal to ISO-8859-1 or Windows 1252 (right click and use the dropdown, or find the menu entry). I'm using konsole, and it does support both those non-UTF-8 character encodings.
You'll probably want to set export LANG=en_US in that terminal only (not the usual en_US.UTF-8) if you do that, so other programs will continue to work well.
Or en_CA or whatever locale you actually use, just use the non-UTF-8 version of it so man's line-drawing will work, and so will full-screen text things like gdb's TUI layout reg mode, or editors like jed.

Using non-ASCII characters in a cmd batch file

I'm working on a .bat program, and the program is written in Finnish. The problem is that CMD doesn't know these "special" letters, such as Ä, Ö, Å.
Is there a way to make those work? I'd also like it if the user could use those letters too.
Part of my code:
#echo off
/u
title JustATestProgram
goto test123
:test123
echo Letters : Ää Öö Åå
pause
exit
When I open this file, the letters look like this:
Try putting this line at the top of the batch file:
chcp 65001
It should change the console encoding to UTF-8, and you should be able to read the file properly in the script after that.
Theoretically you just need to use the /u (Unicode) switch:
c:\>cmd /u
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
c:\>echo Ä
Ä
If you use Notepad++, you can simply change the charset. Doing this will allow you to write letters from desired charset. The western region -US. should support it.
You can do it in a drop down menu in Notepad++ or by hand by writing chcp 437. But I recommend doing this in Notepad++ as it will show you the output as it will be in the batch. So you will then easily see if you use the right code page. And at same time it's easy to switch if you want more special symbols. You can also as stated in previous posts. Try UTF-8.
You can read more about this here: http://ss64.com/nt/chcp.html. And here's a list over different code pages (check out the OEM pages): Code Page Identifiers
The command prompt uses DOS encoding. Windows uses ANSI or Unicode.
PS I'm assuming you are in the US with code page 437 rather than international English/Western European 850.
So I used Character Map to get the DOS code then find out what ANSI character that code maps to.
This is the notepad contents.
echo Ž„™”†
Which was made by putting the DOS codes for your characters into notepad.
0142, 0132, 0153, 0148, 0143, 0134 which display as the above ANSI characters.
Command prompt output
C:\Windows\system32>echo ÄäÖöÅå
ÄäÖöÅå
Alt + Character Code [Prev | Next | Contents]
Holding down alt and pressing the character code on the numeric keypad will enter that character. The keyboard language in use must support entering that character. If your keyboard supports it the code is shown on the right hand side of the status bar in Character Map else this section of the status bar is empty. The status bar us also empty for characters with well known keys, like the letters A to Z.
However there is two ways of entering codes. The point to remember here that the characters are the same for the first 127 codes. The difference is if the first number typed is a zero of not. If it is then the code will insert the character from the current character set else it will insert a character from the OEM character set. Codes over 255 enter the unicode character and are in decimal. Characters entered are converted to OEM for Dos applications and either ANSI or Unicode depending on the Windows' application. See Converting Between Decimal and Hexadecimal.
E.G., Alt + 0 then 6 then 5 then release Alt enters the letter A
From Shortcut Keys and Key Modifiers by Me at https://1drv.ms/f/s!AvqkaKIXzvDieQFjUcKneSZhDjw

WriteConsoleW, wprintf and Unicode

AllocConsole();
consoleHandle = GetStdHandle(STD_OUTPUT_HANDLE);
WriteConsoleW(consoleHandle, L"qweąęėšų\n", 9, NULL, NULL);
_wfreopen(L"CONOUT$", L"w", stdout);
wprintf(L"qweąęėšų\n");
Output is:
qweąęėšų
qwe
Why does wprintf stop after printing qwe? \0 byte encountered in ą should terminate wide-char string, AFAIK
At first I accepted Hans Passant answer, but the root cause for wprintf not printing to UTF-8 streams is that wprintf behaves as though it uses the function wcrtomb, which encodes a wide character (wchar_t) into a multibyte sequence, depending on the current locale - link.
Windows does not have an UTF-8 capable locale (a locale which would support an UTF-8 codepage (65001)).
Quote from MSDN:
The set of available locale names, languages, country/region codes, and code pages includes all those supported by the Windows NLS API except code pages that require more than two bytes per character, such as UTF-7 and UTF-8.
The stdout stream can be redirected and therefore always operates in 8-bit mode. The Unicode string you pass to wprintf() gets converted from utf-16 to the 8-bit code page that's selected for the console. By default that's the olden 437 OEM code page. That's where the buck stops, that code page doesn't support the character.
You'll need to switch to another 8-bit code page, one that does support that character. A good choice is 65001, the code page for utf-8. Fix:
SetConsoleOutputCP(CP_UTF8);
Or use SetConsoleCP() if you want stdin to use utf-8 as well.

Is this a bug (Windows API)?

I had a question about string normalization and it was already answered, but the problem is, I cannot correctly normalize korean characters that require 3 keystrokes
With the input "ㅁㅜㄷ"(from keystrokes "ane"), it comes out "무ㄷ" instead of "묻".
With the input "ㅌㅐㅇ"(from keystrokes "xod"), it comes out "태ㅇ" instead of "탱".
This is Mr. Dean's answer and while it worked on the example I gave at first...it doesn't work with the one's I cited above.
If you are using .NET, the following will work:
var s = "ㅌㅐㅇ";
s = s.Normalize(NormalizationForm.FormKC);
In native Win32, the corresponding call is NormalizeString:
wchar_t *input = "ㅌㅐㅇ";
wchar_t output[100];
NormalizeString(NormalizationKC, input, -1, output, 100);
NormalizeString is only available in Windows Vista+. You need the "Microsoft Internationalized Domain Name (IDN) Mitigation APIs" installed if you want to use it on XP (why it's in the IDN download, I don't understand...)
Note that neither of these methods actually requires use of the IME - they work regardless of whether you've got the Korean IME installed or not.
This is the code I'm using in delphi (with XP):
var buf: array [0..20] of char;
temporary: PWideChar;
const NORMALIZATIONKC=5;
...
temporary:='ㅌㅐㅇ';
NormalizeString(NORMALIZATIONKC , temporary, -1, buf, 20);
showmessage(buf);
Is this a bug? Is there something incorrect in my code?
Does the code run correctly on your computer? In what language? What windows version are you using?
The jamo you're using (ㅌㅐㅇ)are in the block called Hangul Compatibility Jamo, which is present due to legacy code pages. If you were to take your target character and decompose it (using NFKD), you get jamo from the block Hangul Jamo (ᄐ ᅢ ᆼ, sans the spaces, which are just there to prevent the browser from normalizing it), and these can be re-composed just fine.
Unicode 5.2 states:
When Hangul compatibility jamo are
transformed with a compatibility
normalization form, NFKD or NFKC, the
characters are converted to the
corresponding conjoining jamo
characters.
(...)
Table 12-11
illustrates how two Hangul
compatibility jamo can be separated in
display, even after transforming them
with NFKD or NFKC.
This suggests that NFKC should combine them correctly by treating them as regular Jamo, but Windows doesn't appear to be doing that. However, using NFKD does appear to convert them to the normal Jamo, and you can then run NFKC on it to get the right character.
Since those characters appear to come from an external program (the IME), I would suggest you either do a manual pass to convert those compatibility Jamo, or start by doing NFKD, then NFKC. Alternatively, you may be able to reconfigure the IME to output "normal" Jamo instead of comaptibility Jamo.

Resources