Using non-ASCII characters in a cmd batch file - windows

I'm working on a .bat program, and the program is written in Finnish. The problem is that CMD doesn't know these "special" letters, such as Ä, Ö, Å.
Is there a way to make those work? I'd also like it if the user could use those letters too.
Part of my code:
@echo off
/u
title JustATestProgram
goto test123
:test123
echo Letters : Ää Öö Åå
pause
exit
When I open this file, the letters look like this:

Try putting this line at the top of the batch file:
chcp 65001
This should change the console code page to UTF-8. The batch file itself also has to be saved as UTF-8 for the letters to be read properly in the script after that.
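To see why the code page matters, here is a minimal Python sketch (Python is used purely for illustration; `TEXT` and `encoded_hex` are hypothetical names) that prints the bytes of the same letters under UTF-8, the OEM code pages 437 and 850, and Windows-1252. If the file is saved in one of these encodings and the console reads it in another, the letters come out wrong.

```python
# Illustrative sketch: the same letters are stored as different bytes
# depending on the encoding the batch file was saved in.
TEXT = "ÄäÖöÅå"


def encoded_hex(text: str, encoding: str) -> str:
    """Return the bytes of `text` in `encoding` as space-separated hex."""
    return text.encode(encoding).hex(" ")


for enc in ("utf-8", "cp437", "cp850", "cp1252"):
    # Only ASCII is printed here, so this works on any console.
    print(f"{enc:7} {encoded_hex(TEXT, enc)}")
```

UTF-8 needs two bytes for each of these letters, while the single-byte code pages need one, and the OEM pages use different byte values than Windows-1252.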

Theoretically you just need to use the /u (Unicode) switch:
c:\>cmd /u
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
c:\>echo Ä
Ä

If you use Notepad++, you can simply change the charset, which lets you write letters from the desired charset. The Western European charset OEM-US should support these letters.
You can do it from a drop-down menu in Notepad++, or by hand by writing chcp 437. I recommend doing this in Notepad++, as it shows you the text as it will appear in the batch output, so you can easily see whether you are using the right code page. It is also easy to switch if you want more special symbols. As stated in previous posts, you can also try UTF-8.
You can read more about this here: http://ss64.com/nt/chcp.html. And here's a list of the different code pages (check out the OEM pages): Code Page Identifiers

The command prompt uses a DOS (OEM) encoding, while Windows uses ANSI or Unicode.
PS: I'm assuming you are in the US with code page 437, rather than international English/Western European 850.
So I used Character Map to get the DOS code for each character, then found out which ANSI character that code maps to.
These are the Notepad contents:
echo Ž„™”†
This was made by entering the DOS codes for your characters into Notepad:
0142, 0132, 0153, 0148, 0143, 0134, which display as the above ANSI characters.
Command prompt output:
C:\Windows\system32>echo ÄäÖöÅå
ÄäÖöÅå
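The mapping described in this answer can be reproduced with a short Python sketch. It assumes Python's `cp437` and `cp1252` codecs stand in for the DOS and ANSI code pages; the variable names are made up for illustration.

```python
# Sketch: take the OEM (code page 437) bytes of the Finnish letters and
# view them through the ANSI code page (Windows-1252), as Notepad would.
wanted = "ÄäÖöÅå"
oem_bytes = wanted.encode("cp437")
dos_codes = list(oem_bytes)  # the decimal DOS codes of the six letters

# Byte 143 (0x8F) is unassigned in Python's cp1252 codec, so it is
# replaced with U+FFFD here instead of raising an error.
as_ansi = oem_bytes.decode("cp1252", errors="replace")

# Reading the same bytes back as code page 437 restores the original text,
# which is what the command prompt does when it runs the batch file.
restored = oem_bytes.decode("cp437")
```

Here `dos_codes` holds 142, 132, 153, 148, 143, 134, matching the codes listed above, and `as_ansi` is the odd-looking text that has to be typed into the editor.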
Alt + Character Code
Holding down Alt and pressing the character code on the numeric keypad will enter that character. The keyboard language in use must support entering that character. If your keyboard supports it, the code is shown on the right-hand side of the status bar in Character Map; otherwise this section of the status bar is empty. The status bar is also empty for characters with well-known keys, like the letters A to Z.
However, there are two ways of entering codes. The point to remember here is that the characters are the same for the first 127 codes. The difference is whether the first digit typed is a zero or not. If it is, the code inserts the character from the current character set; otherwise it inserts a character from the OEM character set. Codes over 255 enter the Unicode character and are in decimal. Characters entered are converted to OEM for DOS applications, and to either ANSI or Unicode depending on the Windows application. See Converting Between Decimal and Hexadecimal.
E.g., pressing Alt + 0, then 6, then 5, then releasing Alt enters the letter A.
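The leading-zero rule can be modelled roughly in Python. This is only a sketch of the rule as described above, not of what Windows does internally: the function name `alt_code_to_char` is hypothetical, code page 437 is assumed for OEM and Windows-1252 for the current character set, and the special glyphs Windows shows for OEM codes below 32 are ignored.

```python
def alt_code_to_char(typed: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Model which character an Alt + numeric keypad code would enter.

    `typed` is the digit string exactly as typed, e.g. "142" or "0142".
    """
    code = int(typed)
    if code > 255:
        # Codes over 255 are treated as decimal Unicode code points.
        return chr(code)
    # A leading zero selects the current (ANSI) set, otherwise the OEM set.
    codepage = ansi if typed.startswith("0") else oem
    return bytes([code]).decode(codepage)
```

For example, `alt_code_to_char("142")` gives Ä from the OEM set, while `alt_code_to_char("0142")` gives Ž from the ANSI set, which is the same pair that appears in the Notepad example above.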
From Shortcut Keys and Key Modifiers by Me at https://1drv.ms/f/s!AvqkaKIXzvDieQFjUcKneSZhDjw

Related

vDos (DOS emulator) shows incorrect character

This is a somewhat unusual question. A friend uses an old MS-DOS program on XP to handle some database operations (.dbf files). For instance, the application shows information about a product, e.g. the diameter. On XP this is displayed as you would expect: Ø 1.2 (maybe not that exact character, but some sort of diameter symbol). But in vDos (running on Windows 10), Ý 1.2 is displayed. In the database itself, it is also written as Ý 1.2. I assume it has something to do with a different codepage or Windows graftabl.
If I change the database entry to Ø vDos renders it correctly, but this is not the type of solution I am looking for. Do you have any idea what could be different between those two systems and what I could try out?
EDIT: I found a workaround. In vDos it is possible to specify a codepage and change specific characters. Just create a codepage file in the vDos dir (e.g. C_850.txt) and add chcp 850 to autoexec.txt. Each line in this file represents one of the 255 chars in the codepage. To swap a char, in my case 237 = Ý (line 238), write the hex code of the char you want to replace it with, in my case D8 (= Ø).
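A quick Python check of byte 237 under two DOS code pages supports the codepage theory. That the XP machine displays text with code page 437 is an assumption, not something stated in the question; the variable names are illustrative only.

```python
# Sketch: the same stored byte shows up as a different character
# depending on which DOS code page is used to display it.
raw = bytes([237])               # 0xED, the byte behind the symbol

on_cp437 = raw.decode("cp437")   # Greek small phi, looks like a diameter sign
on_cp850 = raw.decode("cp850")   # Latin capital Y with acute

# The replacement value D8 is the Unicode code point of Ø, while code
# page 850 itself stores Ø as a different byte.
o_stroke_code_point = ord("Ø")
o_stroke_in_cp850 = "Ø".encode("cp850")
```

Under code page 437 the byte decodes to φ, and under code page 850 to Ý, which would explain the two different displays of the same database entry.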

Assembly problems with ascii extended characters

I want to know how to solve this issue with extended ASCII characters. I don't understand why it prints strange symbols instead of the letter that 0x90 represents.
I tried PutStr with c381.
Nothing happens.
This has nothing to do with assembly language and everything to do with UTF-8 (which your terminal is expecting) vs. ISO-8859-1 (latin-1) or Windows 1252 (IDK which) extended 8-bit character set which you seem to be looking up codes from. It would be the same if you wrote a C program with those bytes in a char array[] and used stdio puts.
As @Fuz says, "Á does not have an ASCII code." ASCII only includes characters from 0..127 (and the low 32 are non-printable): http://www.asciitable.com/. Extended-ASCII 8-bit character sets only overlap with UTF-8 for code points from 0 to 127.
Any program that makes a write() system call to write a 0x90 byte to stdout will do the same thing, regardless of what language it was written in. (Use strace ./program to see what yours does, or pipe it into hexdump -C). For example, in bash run printf '\x90\n' to do exactly the same thing. 90 0a is not a valid UTF-8 multi-byte sequence, so your terminal prints a � glyph (a ? in a diamond).
You could set your gnome-terminal to ISO-8859-1 or Windows 1252 (right click and use the dropdown, or find the menu entry). I'm using konsole, and it does support both those non-UTF-8 character encodings.
You'll probably want to set export LANG=en_US in that terminal only (not the usual en_US.UTF-8) if you do that, so other programs will continue to work well.
Or en_CA or whatever locale you actually use, just use the non-UTF-8 version of it so man's line-drawing will work, and so will full-screen text things like gdb's TUI layout reg mode, or editors like jed.
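The byte-level point of this answer can be checked in any language. Here is a small Python sketch; the helper `is_valid_utf8` is a made-up name for illustration.

```python
# Sketch: a lone 0x90 byte is not valid UTF-8, while the two-byte
# sequence c3 81 is the UTF-8 encoding of the letter Á.
a_acute = "Á"
utf8_bytes = a_acute.encode("utf-8")      # two bytes: c3 81
latin1_bytes = a_acute.encode("latin-1")  # one byte: c1


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A UTF-8 terminal does roughly this with bytes it cannot decode:
# it substitutes U+FFFD, the replacement character.
shown = b"\x90\n".decode("utf-8", errors="replace")
```

So a program that wants a UTF-8 terminal to show Á has to write the bytes c3 81, not a single byte looked up in an 8-bit table.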

Win 10: Desktop.ini infotip/tooltip text formatting - line break

I'm trying to customize some folders in Windows 10 os using Desktop.ini text files. One thing I can't solve is how to make a line break in the infotip.
Current text file looks like this:
[.ShellClassInfo]
ConfirmFileOp=0
NoSharing=1
IconFile=$path_to_icon
IconIndex=0
InfoTip=Line1 \n Line2
So the last line of the text document is not working as desired. It just doesn't recognize the \n symbol. I also tried replacing the standard \n new line symbol with unicode characters and some other similar methods and symbols, but it didn't work. It just recognizes it as a string no matter what is written there.
The only way I could achieve a line break was to add so many characters, that Win 10 would automatically start a new line.
Help is much appreciated. Thank you!
You could define a string in a resource-only DLL:
InfoTip=#Your.dll,-12345
The negative number defines the resource ID of the string to use.
String resources in a DLL are not limited in the range of character codes, so this should in principle enable you to use line breaks (ASCII code 10).
To create such a resource-only DLL there are many free tools available, google for "windows resource editor".

how to read Arabic input from cmd?

I want to ask whether it is possible to read a query entered in Arabic in cmd when running a program written in Python. (My program accepts a query from the user in cmd; it accepts English input but not Arabic input.)
To allow Unicode character support in cmd, you need to set the code page for the language you are after.
This is done with
chcp CodePageNumber
I think Arabic should be chcp 708
See more info here: http://ss64.com/nt/chcp.html
And a list of code pages here: https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx
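Whether a given code page can represent the input at all can be tested from Python. This is a minimal sketch: `can_encode` is a hypothetical helper, and the code pages tried (UTF-8, Windows-1256 for Arabic, and the US OEM page 437) are chosen for illustration rather than taken from the question.

```python
# Sketch: check which encodings can represent an Arabic query string.
query = "مرحبا"


def can_encode(text: str, codepage: str) -> bool:
    """Return True if every character of `text` exists in `codepage`."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# Map each candidate encoding to whether it can hold the query.
supported = {cp: can_encode(query, cp) for cp in ("utf-8", "cp1256", "cp437")}
```

If the console's active code page is one where this returns False, the Arabic input cannot reach the program intact, which is why switching the code page with chcp helps.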

ANSI questions: "\x1B[?25h" and "\x1BE"

What does "\x1B[?25h" do?
How is "\x1BE" different from "\n"? According to http://ascii-table.com/ansi-escape-sequences-vt-100.php it "moves to next line"? Seems like that's what "\n" does?
I tried echo "xxx\nxxx\n" and echo "xxx\x1BExxx\n" in PHP and they both output the same thing.
Any ideas?
Thanks!
ANSI escape sequences (also known as VT100 codes) are an early standardisation of terminal control codes.
The escape sequence \x1BE, or Esc+E, is NEL or "Next line", and is used on older terminals and mainframes to denote CR+LF, or \r\n.
The escape sequence \x1B[ (Esc+[) is an example of a Control Sequence Introducer. (\x9B is the single-character form of CSI.) The control sequence ?25h following it is used to show the cursor.
Most terminals will support these control codes; to enter escape sequences you can type Ctrl+V, Ctrl+[ which should render as ^[ (the C0 code for ESC), followed by the escape code.
References:
ANSI escape code
C0 and C1 control codes
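As a rough illustration of how these sequences are put together, here is a Python sketch. The constant and function names are made up for the example, and whether a given terminal honours each sequence is not guaranteed.

```python
# Sketch: build the escape sequences discussed above from their parts.
ESC = "\x1b"                 # the C0 escape character
CSI = ESC + "["              # Control Sequence Introducer
SHOW_CURSOR = CSI + "?25h"   # the sequence asked about: show the cursor
HIDE_CURSOR = CSI + "?25l"   # its counterpart: hide the cursor
NEL = ESC + "E"              # "Next line", Esc+E


def with_hidden_cursor(text: str) -> str:
    """Wrap `text` so the cursor is hidden while it prints, then restored."""
    return HIDE_CURSOR + text + SHOW_CURSOR
```

Writing `with_hidden_cursor("working...")` to a terminal that supports these codes should hide the cursor during the output and show it again afterwards.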

Resources