Why is there a difference between the encoding of the Windows Command Prompt vs. a batch file?


For example, suppose I have a batch file called 'test.cmd' and it simply contains:
echo %1
I can call this directly from the command prompt with 'test.cmd some¬arg' and the result is that the string 'some¬arg' is printed.
However, if I place that same call in a second batch file (called 'tester.cmd' for the sake of argument) and call that from the command prompt, the string 'some¼arg' is printed instead.
What is it that messes up the encoding and how do I get around it? I am sure I've fixed this before, but I can't remember how...
Thanks!

This is because your batch file is encoded in a different code page than cmd.exe is currently in.
In default Western European configurations, cmd.exe starts in CP850, but text editors usually work in CP1252 (which is often wrongly referred to as Latin-1 or ISO-8859-1).
The characters "¬" (in CP1252) and "¼" (in CP850) share the same character code, "AC".
The solution is simple. Either encode your batch file in code page 850, or switch cmd.exe to code page 1252 by issuing chcp 1252.
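
To see the clash concretely, here is a minimal Python sketch (Python is used only because its codecs make the byte values easy to inspect; it is not part of the original answer). It encodes "¬" the way a CP1252 editor would and then reads the same byte back the way a CP850 console would.

```python
# Byte that a CP1252 editor writes to disk for the "¬" character.
raw = "¬".encode("cp1252")

# The very same byte, interpreted by a console running in CP850.
seen_by_console = raw.decode("cp850")

print(raw.hex())          # hex value of the single byte
print(seen_by_console)    # what cmd.exe would display for that byte
```

The byte on disk never changes; only the table used to interpret it does, which is why either re-saving the file in CP850 or running chcp 1252 makes the two sides agree.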

Related

In Windows 10 how do I rename a file to a filename that includes a character with an umlaut?

I'm on Win10 and I have a .bat file to rename a bunch of files. Some of the entries need to be renamed to a non-English name, e.g.
RENAME "MyFile1.txt" "Eisenhüttenstadt.txt"
However, when I run this, the 'ü' comes out as something else, other characters with an umlaut also are replaced by different characters.
I've tried saving the .bat file in Notepad with Unicode and UTF-8 encoding but then Windows doesn't recognise the command when I try to run it.
I've read this and other similar issues but not found a solution, surely it's simple when you know how?
Any suggestions?

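The default code page in the console is 437 (USA) or 850 (Europe). Both store characters with umlauts at different byte values than code page 1252 (Western European Latin), which is what most Windows editors save in, so the bytes in your file are shown as other characters. To fix this, switch the console to 1252 with the chcp command at the beginning of your batch file, like this: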
Chcp 1252
Sources: http://ss64.com/nt/chcp.html , http://www.pctipp.ch/tipps-tricks/kummerkasten/windows-7/artikel/windows-7-umlaute-in-batch-dateien-55616/ (the article is about Windows 7, but this applies to Windows 10 too)
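
As a rough illustration of what happens to the file name from the question, the following Python sketch encodes the text as an editor would save it and decodes it as the console would read it. The helper name as_seen_by_console is made up for this example, and the assumption is that the .bat file was saved as CP1252 ("ANSI" on a Western European system).

```python
def as_seen_by_console(text: str, saved_as: str, console_cp: str) -> str:
    """Encode text the way the editor saved it, then decode it the way
    a console in the given code page would interpret those bytes."""
    return text.encode(saved_as).decode(console_cp, errors="replace")


name = "Eisenhüttenstadt.txt"

# Only the console code page that matches the file encoding round-trips.
for console_cp in ("cp437", "cp850", "cp1252"):
    print(console_cp, as_seen_by_console(name, "cp1252", console_cp))
```

Only the cp1252 line shows the name unchanged, which is the effect of putting chcp 1252 at the top of the batch file.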

Batch won't execute but just re-prints its content

No matter what my code is, whether the batch file is syntactically incorrect or absolutely correct, and even if there is nothing to display on the screen, the batch file just displays its own code as-is when executed.
I read a similar question, "MSDOS prints the whole batch file on screen instead of executing", but since that was on MS-DOS I hoped my issue could have a different solution.
E.g.,
@echo off
set abcd=4
Even this batch file would just display the same lines as they are.
Please help.

Try "resetting" cmd if possible. You can try copying someone else's "cmd.exe" and replacing yours with it, using another bootable OS, as Windows won't allow that.
Here, use my cmd.exe: https://drive.google.com/open?id=0B6ghonMKBfUSLVpRV0U5bG5pQTQ
Just in case you need to know, I am using Windows 10 64-bit.

Check the file with an editor that allows you to see the encoding.
For example, in Notepad++ you can see the difference between line endings that use CR (\r) and LF (\n).
Your CMD may be recognizing EOL via \n only.

To determine whether your issue is really with line breaks being converted by your text editor (as the post you mention suggests), perform the following test:
Open a Command Line Window
Type the following command: copy con test.bat
The cursor will reposition itself under the command prompt, this is normal
Type the following 3 commands, each followed by the [Enter] key:
Echo Off
Set abcd=4
Echo abcd
Press CTRL-Z (it will show up on screen as ^Z), then press [Enter]
A confirmation message should state: 1 file(s) copied.
Now type Test to run the batch file. If it runs properly, it means you are indeed dealing with a line termination issue. Use a different text editor (don't use Notepad!!!), ideally one that has an option to display the line termination characters (I personally use Notepad++, which works great for this kind of thing, but there are many others out there).
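
If no such editor is at hand, the line endings can also be counted directly from the raw bytes. This is only a sketch in Python; the function name count_line_endings and the sample bytes are made up for illustration.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf   # LF not preceded by CR
    cr_only = data.count(b"\r") - crlf   # CR not followed by LF
    return {"CRLF": crlf, "LF": lf_only, "CR": cr_only}


# Sample content as an editor using Unix line endings would save it.
sample = b"@echo off\nset abcd=4\n"
print(count_line_endings(sample))
```

For a real file, read it in binary mode first, for example with open("test.bat", "rb").read(), so that Python does not translate the line endings before they are counted.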

Perhaps there is a problem with your environment variables. Check the following:
Press WIN + R and run "%SYSTEMROOT%\System32\SystemPropertiesAdvanced.exe"
Click on "Environment Variables"
The system variables are listed at the bottom. Select the variable "Path" and click "Edit..."
Check whether the list contains "C:\Windows\System32" or "%SYSTEMROOT%\System32". If not, add one of those. You may have to restart your computer afterwards.

CMD: '■m' is not recognized as an internal or external command

I am trying to get a batch file to work. Whenever I attempt to run a .bat the command line returns '■m' is not recognized... error, where "m" is the first letter of the file. For example:
md c:\testsource
md c:\testbackup
Returns
C:>"C:\Users\Michael\Dropbox\Documents\Research\Media\Method Guide\Program\test
.bat"
C:>■m
'■m' is not recognized as an internal or external command,
operable program or batch file.
Things I have tried:
Changing Path variables, rebooting, etc.
Changing file directory (i.e. run from C:)
Running example files from web (like above) to check for syntax errors.
Thanks

What text editor are you writing this in? It seems like your text editor may be saving the file as UTF-16 encoded text, which cmd.exe can't handle. That encoding puts a byte-order mark (which tells other editors how to process the file) at the very start of the file, and cmd.exe can't deal with it.
Try setting the "coding"/"file encoding" to "ANSI" when saving the file.
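
A quick way to check for this is to look at the first bytes of the file. The Python sketch below is an illustration only: detect_bom is a made-up helper, and the sample bytes stand in for the test.bat from the question. It also shows how a console in code page 437 would render the start of a UTF-16 file, which matches the '■m' in the error message.

```python
import codecs

# Known byte-order marks, checked in this order.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data: bytes) -> str:
    """Return a label for the byte-order mark at the start of data, if any."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return "no BOM"


# Bytes an editor would write when saving the batch file as UTF-16 LE.
utf16_file = codecs.BOM_UTF16_LE + "md c:\\testsource\r\n".encode("utf-16-le")

print(detect_bom(utf16_file))
# First three bytes as a CP437 console reads them (escaped to stay ASCII).
print(ascii(utf16_file[:3].decode("cp437")))
```

The second byte of the mark, FE, is the "■" glyph in code pages 437 and 850, and it is followed by the "m" of "md".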

In addition to the accepted answer, I would add the case where the file is created by a PowerShell command. PowerShell writes files with UTF-16 encoding by default.
To solve your problem in that case, force the file encoding like this: | out-file foo.txt -encoding utf8
Answer based on this other answer.

In Windows 10 I had the same issue.
Changing the character set to UTF-8 made it worse.
It worked correctly when I selected Encoding as UTF-8-NO BOM.

International characters in a batch file

Hey, I'm having some problems writing a batch file where I need to specify some file paths containing international characters (the Norwegian letter 'ø', to be exact).
For example, the filename axporteføljedb.vbp (which looks normal in notepad) turns into axportef°ljedb.vbp on the command line, which the system then goes on to complain about not finding.
Any suggestions?

It will work if you save your batch file as ANSI with a Norwegian character set (with Notepad++ for example). Then, in the cmd, when you want to run your batch file, first change the code page to something that supports Norwegian: chcp 1252 (in the console).

Batch file encoding

I would like to deal with filename containing strange characters, like the French é.
Everything is working fine in the shell:
C:\somedir\>ren -hélice hélice
I know if I put this line in a .bat file, I obtain the following result:
C:\somedir\>ren -hÚlice hÚlice
See? The é has been replaced by Ú.
The same is true for command output. If I dir some directory in the shell, the output is fine. If I redirect this output to a file, some characters are transformed.
So how can I tell cmd.exe how to interpret what appears as an é in my batch file, is really an é and not a Ú or a comma?
So there is no way when executing a .bat file to give an hint about the codepage in which it was written?

You have to save the batch file with OEM encoding. How to do this varies depending on your text editor. The encoding used in that case varies as well. For Western cultures it's usually CP850.
Batch files and encoding are really two things that don't particularly like each other. You'll notice that Unicode is also impossible to use there, unfortunately (even though environment variables handle it fine).
Alternatively, you can set the console to use another codepage:
chcp 1252
should do the trick. At least it worked for me here.
When you do output redirection, such as with dir, the same rules apply. The console window's codepage is used. You can use the /u switch to cmd.exe to force Unicode output redirection, which causes the resulting files to be in UTF-16.
As for encodings and code pages in cmd.exe in general, also see this question:
What encoding/code page is cmd.exe using
EDIT: As for your edit: No, cmd always assumes the batch file to be written in the console default codepage. However, you can easily include a chcp at the start of the batch:
chcp 1252>NUL
ren -hélice hélice
To make this more robust when used directly from the commandline, you may want to memorize the old code page and restore it afterwards:
@echo off
for /f "tokens=2 delims=:." %%x in ('chcp') do set cp=%%x
chcp 1252>nul
ren -hélice hélice
chcp %cp%>nul
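
The for /f line above relies on how chcp formats its output, so here is a small Python sketch that imitates the "tokens=2 delims=:." splitting. The two sample strings are assumptions about what chcp prints on an English and a German system, and second_token is a made-up name.

```python
import re


def second_token(line: str, delims: str = ":.") -> str:
    """Imitate for /f "tokens=2 delims=:.": split on runs of the
    delimiter characters and return the second non-empty token."""
    pattern = "[" + re.escape(delims) + "]+"
    tokens = [t for t in re.split(pattern, line) if t]
    return tokens[1]


# Assumed chcp output on an English and on a German system.
print(repr(second_token("Active code page: 850")))
print(repr(second_token("Aktive Codepage: 850.")))
```

Including "." in the delimiters is what keeps a trailing period out of the saved value. The token keeps its leading space, which chcp appears to tolerate when the value is passed back in.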

I was having trouble with this, and here is the solution I found. Find the decimal number for the character you are looking for in your current code page.
For example, I'm in codepage 437 (chcp tells you), and I want a degree sign, °. http://en.wikipedia.org/wiki/Code_page_437 tells me that the degree sign is number 248.
Then you find the Unicode character with the same number.
The Unicode character at 248 (U+00F8) is ø.
If you insert that Unicode character in your batch script (saved in an ANSI encoding, so that it is stored as that single byte), it will display to the console as the character you desire.
So my batch file
echo ø
prints
°
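
The lookup described above can be sketched in Python as follows. The helper name char_to_type is invented for this example, and it assumes the batch file is saved in a single-byte encoding where the character's code point equals its byte value (true for Latin-1, and for CP1252 from 160 upwards).

```python
def char_to_type(desired: str, console_cp: str = "cp437") -> str:
    """Return the character to put in the script so that a console
    in console_cp displays the desired character."""
    # Number of the desired character in the console's code page.
    code = desired.encode(console_cp)[0]
    # Unicode character with that same number.
    return chr(code)


print(char_to_type("°"))                 # character to type for a degree sign
print(ord(char_to_type("°")))            # its number, 248 in code page 437
```

Characters that do not exist in the console's code page raise UnicodeEncodeError here, which mirrors the fact that the console has no byte to show them with.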

I created the following block, which I put at the beginning of my batch files:
set Filename=%0
IF "%Filename:~-8%" == "-850.bat" GOTO CONVERT_CODEPAGE_END
rem Converting code page from 1252 to 850.
rem My editors use 1252, my batch uses 850.
rem We create a converted -850.bat file, and then launch it.
set File850=%~n0-850.bat
PowerShell.exe -Command "get-content %0 | out-file -encoding oem -filepath %File850%"
call %File850%
del %File850%
EXIT /b 0
:CONVERT_CODEPAGE_END
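
The conversion that the PowerShell line performs can be sketched in Python as well. This is not the author's method, just an illustration of the same idea: decode the bytes from the editor's code page and re-encode them in the console's. The function name transcode and the default code pages are assumptions taken from the comments in the block above.

```python
def transcode(data: bytes, src: str = "cp1252", dst: str = "cp850") -> bytes:
    """Re-encode raw file bytes from the editor's code page (src)
    to the console's code page (dst)."""
    return data.decode(src).encode(dst)


# A batch line as a CP1252 editor would store it.
ansi_bytes = "ren -hélice hélice\r\n".encode("cp1252")
oem_bytes = transcode(ansi_bytes)

print(ansi_bytes.hex())   # é is stored as e9 here
print(oem_bytes.hex())    # é is stored as 82 here
```

A character that has no slot in the target code page makes encode raise UnicodeEncodeError, so such a script would need a different code page rather than a conversion.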

I care about three concepts:
Output Console Encoding
Command line internal encoding (the one changed with chcp)
.bat Text Encoding
The easiest scenario to me: I will have the first two mentioned in the same encoding, say CP850, and I will store my .bat in that same encoding (in Notepad++, menu Encoding → Character sets → Western European → OEM 850).
But suppose someone hands me a .bat in another encoding, say CP1252 (in Notepad++, menu Encoding → Character sets → Western European → Windows-1252).
Then I would change the command line internal encoding with chcp 1252.
This changes the encoding it uses to talk with other processes, but not that of the input device or the output console.
So my command line instance will effectively send characters in 1252 through its STDOUT file descriptor, but garbled text appears when the console decodes them as 850 (é becomes Ú).
Then I modify the file as follows:
@echo off
perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"
ren -hélice hélice
First I turn echo off so the commands don't output unless explicitly doing either echo... or perl -e "print..."
Then I put this boilerplate each time I need to output something
perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"
In it, I replace ren -hélice hélice with the actual text I want to show.
I may also need to substitute my own console encoding for cp850 and the other side's encoding for cp1252.
Just below it I put the desired command.
So I broke the problematic line into an output half and a real-command half.
The first half makes sure the "é" is displayed as an "é" by means of transcoding. This is necessary for every output statement, since the console and the file use different encodings.
The second half, the real command (muted with @echo off), needs no transcoding: having the same encoding for both chcp and the .bat text is enough to ensure the characters are interpreted correctly.

I had Polish characters (e.g. ą, ę, ź, ż) in my R code and ran into this problem when running the R script from a .bat file: in the .Rout output file those characters were replaced by signs like %, &, # and so on, and the code didn't run to the end.
My solution:
Save the R script with encoding: File > Save with encoding > CP1250
Run the .bat file
It worked for me, but if the problem persists, try other encodings.

In Visual Studio Code, click on the encoding at the bottom, choose Save with encoding, then DOS(CP437).
