Windows batch: Unicode parameters for (robo) copy command


I need to copy multiple files in a single batch file. The files have Unicode names that map to different codepages.
Example:
set ArabicFile=ڊڌڵڲڛشس
set CyrillicFile=щЖЛдЉи
set GermanFile=Bücher
copy %ArabicFile% SomePlaceElse
copy %CyrillicFile% SomePlaceElse
copy %GermanFile% SomePlaceElse
Problem: Batch files cannot be Unicode.
Question: How can I write the Unicode file names to the batch file so that the copy command recognizes them?
Notes:
I do not care how the file names are displayed.
Actually the batch file does much more than just copy these files; I just simplified the description to make the problem clearer.
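To see why no single legacy ("ANSI") code page is enough for these names, a small Python sketch can check which code pages can represent each one. This is an illustration, not part of the question; the helper name `encodable_in` is hypothetical. UTF-8, which chcp 65001 selects, covers all three names.

```python
def encodable_in(name: str, codepage: str) -> bool:
    """Return True if every character of `name` exists in `codepage`."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

names = {"Arabic": "ڊڌڵڲڛشس", "Cyrillic": "щЖЛдЉи", "German": "Bücher"}

for label, name in names.items():
    # cp1251 = Windows Cyrillic, cp1252 = Windows Western European
    support = {cp: encodable_in(name, cp) for cp in ("cp1251", "cp1252", "utf-8")}
    print(label, support)
```

Windows-1252 covers the German name but not the Cyrillic one, Windows-1251 covers the Cyrillic name but not "Bücher", and only UTF-8 covers every name.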
Correct batch file:
With Arnout's answer I modified my batch file as follows. It now works correctly without requiring a font change (which would be messy, as Arnout commented).
@echo off
chcp 65001
set ArabicFolder=ڊڌڵڲڛشس
set CyrillicFolder=щЖЛдЉи
set GermanFolder=Bücher
robocopy /e d:\temp\test\%ArabicFolder% d:\temp\test2\%ArabicFolder% /log:copy.log
robocopy /e d:\temp\test\%CyrillicFolder% d:\temp\test2\%CyrillicFolder% /log+:copy.log
robocopy /e d:\temp\test\%GermanFolder% d:\temp\test2\%GermanFolder% /log+:copy.log

If
I add CHCP 65001 as the first line of your batch file,
save the file as UTF-8 without BOM, and
set my console font to something other than "Raster Fonts" (on my Win7 box I can choose Consolas or Lucida Console),
it works. Simple, no? :-)
(The font change is actually not necessary, provided you're not writing non-ASCII output to the console.)
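If a script generates the batch file instead of a person typing it, the editor settings no longer matter. Here is a minimal sketch, assuming Python 3 is available (the function names and the example command are hypothetical). It puts chcp 65001 before any non-ASCII line and encodes the text as UTF-8 without a BOM. Python's "utf-8" codec never writes a BOM, unlike "utf-8-sig". The sketch also uses CRLF line endings.

```python
from __future__ import annotations
from pathlib import Path

def make_utf8_batch(commands: list[str]) -> bytes:
    """Batch-file bytes: chcp 65001 near the top, UTF-8 without BOM, CRLF line ends."""
    lines = ["@echo off", "chcp 65001 >nul", *commands]
    return ("\r\n".join(lines) + "\r\n").encode("utf-8")  # "utf-8" never adds a BOM

def write_utf8_batch(path: Path, commands: list[str]) -> None:
    # write_bytes() stores the bytes as-is: no newline translation, no BOM
    path.write_bytes(make_utf8_batch(commands))

# Hypothetical usage with a folder name from the question:
data = make_utf8_batch([
    r"robocopy /e d:\temp\test\Bücher d:\temp\test2\Bücher /log:copy.log",
])
print(data[:9])  # b'@echo off'
```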

I'm not certain, but I think the short (8.3) filename will be ASCII, so you could refer to it that way? You can find out the short filename with dir /X .

I want to create a batch file (e.g. RunThis.bat) that creates directories whose names can be Russian or in other languages.
Example:
In a Command Prompt window:
D:\>md "Russia - Шпионка"
This works on the command line, and the name appears correctly.
But if I put that in a batch file written in Notepad and saved as ANSI, it doesn't work.
If I use Notepad again and save as UTF-8 instead, the batch file runs, but the name contains garbage characters.
RunThis.bat (saved from Notepad as UTF-8) gives garbage characters:
chcp 65001
set fn14="Russia - Шпионка"
md %fn14%
The problem is that Notepad (at least in older versions) saves UTF-8 with a BOM.
To save the .bat as UTF-8 without a BOM, we must use an editor like Notepad++.
RunThis.bat (saved from Notepad++ as UTF-8 without BOM):
chcp 65001
set fn14="Russia - Шпионка"
md %fn14%
This time it works perfectly when RunThis.bat is run directly from Explorer (explorer.exe).
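You can check the BOM problem directly. The UTF-8 BOM is the three bytes EF BB BF. Read in code page 437, those bytes become the ∩╗┐ that often appears in front of the first command of such a batch file. Because that first line (here chcp 65001) then fails, the rest of the file is read in the old code page. A small Python sketch (the helper names are hypothetical) can detect and strip the BOM, so an existing file can be fixed without Notepad++:

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"

def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 byte-order mark, if present."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data

def fix_batch_file(path: Path) -> bool:
    """Rewrite `path` without a BOM; return True if the file was changed."""
    data = path.read_bytes()
    cleaned = strip_utf8_bom(data)
    if cleaned != data:
        path.write_bytes(cleaned)
    return cleaned != data

# How cmd.exe sees the BOM when it reads the file in code page 437:
shown = UTF8_BOM.decode("cp437")  # '∩╗┐'
```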

Related

Robocopy command in Windows 10 struggles with German letters (ü, ä, ö), and chcp to change the code page has apparently no effect

On my Windows 10 machine I am trying to run a "robocopy" command (from a .bat file) to back up files.
Everything works as long as the paths (of the folders to back up) do not contain letters like ö, ü, ä, which, however, are unavoidable in a German environment.
Earlier I was able to fix this by issuing a
chcp 1252
command first, so that the command prompt window runs on Code Page 1252 which has these characters. But this was on a Windows 7 machine then. (The default code page on this system is 850. It is a larger institutional network, and I have no administrator access.)
On the Windows 10 machine, however, this has no effect. The code page is set to 1252 (at least that is what chcp reports), and still the robocopy command does not work.
Here is my robocopy command:
robocopy C:\Users\Myself\Documents\Notizbücher Y:\RobocopyBackups\001_NotizbücherBackup /e /mir /np /z /tee /log:Y:\RobocopyBackups\001_Backup_log.txt
When I run this, the "ü" in "Notizbücher" always gets garbled, and of course the command fails because the system cannot find the (garbled) path.
I tried all sorts of things:
Sending first chcp 1252
Making sure that my .bat file where I keep the code is encoded as Windows-1252 (I am using Notepad++)
Trying chcp 65001 for UTF-8 (with and without encoding the .bat file similarly)
Trying chcp 28591 for ISO-8859-1, just for fun...
It's always the same: the "ü" gets messed up.
Of course I could just remove the "ü" from the folder names. But I want to have a clean solution, not such a lame workaround.
What could I do?

Solution found after all sorts of experiments (not least inspired by commenter JeffRSon):
Consideration
As the code page used by default on the system is Code Page 850 (part of the "OEM" series of code pages), I thought I could give it a try and save my .bat file with exactly that encoding.
Implementation
As I am using Notepad++ for writing, I used Notepad++'s "Encoding" menu, i.e. I selected
Encoding → Character sets → Western European → OEM 850
(And of course I also removed the chcp 1252 command from the batch.)
I did not forget to save this file afterwards.
Result
Surprise or not, the system now accepts my batch commands and executes them nicely.
Note for newbies (like me)
For finding out what the current (default) code page of your system is, enter
chcp
into your command prompt. It should then return the current value, in my case: 850.
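You can reproduce byte by byte why saving as OEM 850 helps. Saved as Windows-1252, ü is the byte 0xFC. cmd.exe, however, reads the batch file in the console code page 850, where 0xFC is ³ and ü is 0x81. Here is a Python sketch. The helper `as_read_by_cmd` is hypothetical and only simulates the decoding step:

```python
def as_read_by_cmd(text: str, file_cp: str, console_cp: str) -> str:
    """Simulate cmd.exe reading `text`, saved in `file_cp`, under `console_cp`."""
    return text.encode(file_cp).decode(console_cp)

path = r"C:\Users\Myself\Documents\Notizbücher"

# Saved as Windows-1252 but read under the default OEM code page 850:
broken = as_read_by_cmd(path, "cp1252", "cp850")  # ...Notizb³cher
# Saved as OEM 850 (the fix above) and read under code page 850:
fixed = as_read_by_cmd(path, "cp850", "cp850")    # ...Notizbücher

print(broken == path, fixed == path)  # False True
```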

In Windows 10 how do I rename a file to a filename that includes a character with an umlaut?

I'm on Win10 and I have a .bat file to rename a bunch of files. Some of the entries need to be renamed to a non-English name, e.g.
RENAME "MyFile1.txt" "Eisenhüttenstadt.txt"
However, when I run this, the 'ü' comes out as something else; other characters with an umlaut are also replaced by different characters.
I've tried saving the .bat file in Notepad with Unicode and UTF-8 encoding but then Windows doesn't recognise the command when I try to run it.
I've read this and other similar issues but have not found a solution. Surely it's simple when you know how?
Any suggestions?

The default code page in the console is 437 (USA) or 850 (Western Europe). Both contain umlauts, but at different byte values than Windows-1252 (West European Latin), the "ANSI" encoding Notepad uses on Western systems, so the characters in your file are misread. Use the chcp command at the beginning of your batch file to switch the console to 1252, like this:
Chcp 1252
Sources: http://ss64.com/nt/chcp.html, http://www.pctipp.ch/tipps-tricks/kummerkasten/windows-7/artikel/windows-7-umlaute-in-batch-dateien-55616/ (The article is about Windows 7, but this applies to Windows 10 too.)
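To see what chcp 1252 actually changes, you can compare the byte values in Python (a quick check, not part of the original answer). The umlauts exist in the OEM code pages 437 and 850 too, but at different byte values than in Windows-1252. A file saved in one code page and read in another therefore shows the wrong characters.

```python
from __future__ import annotations

def byte_of(char: str, codepages: tuple[str, ...]) -> dict[str, str]:
    """Map each code page name to the hex byte(s) that encode `char` in it."""
    return {cp: char.encode(cp).hex().upper() for cp in codepages}

pages = ("cp437", "cp850", "cp1252")  # US OEM, Western European OEM, Windows "ANSI"
for ch in "äöüß":
    print(ascii(ch), byte_of(ch, pages))
```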

CMD: '■m' is not recognized as an internal or external command

I am trying to get a batch file to work. Whenever I try to run a .bat file, the command line returns a "'■m' is not recognized..." error, where "m" is the first letter of the file's contents. For example:
md c:\testsource
md c:\testbackup
Returns
C:>"C:\Users\Michael\Dropbox\Documents\Research\Media\Method Guide\Program\test.bat"
C:>■m
'■m' is not recognized as an internal or external command,
operable program or batch file.
Things I have tried:
Changing Path variables, rebooting, etc.
Changing file directory (i.e. run from C:)
Running example files from web (like above) to check for syntax errors.
Thanks

What text editor are you writing this in? It seems like your text editor may save the file as UTF-16 encoded text, which cmd.exe can't handle. Try setting the "coding"/"file encoding" to "ANSI" when saving the file.
A UTF-16 file starts with a byte-order mark (the two bytes FF FE for little-endian), which tells other editors how to process the file, and cmd.exe can't deal with it.
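The '■m' in the error is consistent with this. Notepad's "Unicode" option writes UTF-16 little-endian with the byte-order mark FF FE. In code page 437 or 850, FF is a no-break space and FE is ■. The next byte is the 'm', followed by a zero byte. The idea that the zero byte ends what cmd.exe treats as the command name is an assumption here. A Python sketch of those bytes (the helper name is hypothetical):

```python
def utf16le_with_bom(text: str) -> bytes:
    """Encode like Notepad's "Unicode" option: BOM FF FE + UTF-16 little-endian."""
    return b"\xff\xfe" + text.encode("utf-16-le")

raw = utf16le_with_bom("md c:\\testsource\r\n")
print(raw[:6])  # b'\xff\xfem\x00d\x00'

# The bytes before the first zero byte, decoded in the console code page:
first_chunk = raw.split(b"\x00", 1)[0].decode("cp437")
print(ascii(first_chunk))  # '\xa0\u25a0m'
```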

In addition to the accepted answer, I would add the case where the file is created by a PowerShell command: Windows PowerShell writes UTF-16 by default (for example with Out-File or the > operator).
To solve your problem, force the file encoding like this: | out-file foo.bat -encoding oem (in Windows PowerShell, -encoding utf8 writes a BOM, which cmd.exe misreads at the start of the first line).
Answer based on this other answer.

In Windows 10 I had the same issue.
Changing the character set to UTF-8 made it worse.
It worked correctly when I selected UTF-8 without BOM as the encoding.

Batch file encoding

I would like to deal with file names containing accented characters, like the French é.
Everything is working fine in the shell:
C:\somedir\>ren -hélice hélice
If I put this line in a .bat file, I get the following result:
C:\somedir\>ren -hÚlice hÚlice
See? é has been replaced by Ú.
The same is true for command output. If I dir some directory in the shell, the output is fine. If I redirect this output to a file, some characters are transformed.
So how can I tell cmd.exe that what appears as an é in my batch file really is an é, and not a Ú or a comma?
So is there no way, when executing a .bat file, to give a hint about the code page in which it was written?

You have to save the batch file with OEM encoding. How to do this varies depending on your text editor. The encoding used in that case varies as well. For Western cultures it's usually CP850.
Batch files and encoding are really two things that don't particularly like each other. You'll notice that Unicode is also impossible to use there, unfortunately (even though environment variables handle it fine).
Alternatively, you can set the console to use another codepage:
chcp 1252
should do the trick. At least it worked for me here.
When you do output redirection, such as with dir, the same rules apply. The console window's codepage is used. You can use the /u switch to cmd.exe to force Unicode output redirection, which causes the resulting files to be in UTF-16.
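A program that reads such a redirected file has to be told that it is UTF-16 little-endian. Here is a Python sketch, assuming the output was created with something like cmd /u /c dir > list.txt (the file name and function names are hypothetical). It accepts the data with or without a byte-order mark:

```python
from pathlib import Path

def decode_cmd_unicode(data: bytes) -> str:
    """Decode `cmd /u` redirected output: UTF-16 LE, with or without a BOM."""
    if data.startswith(b"\xff\xfe"):
        data = data[2:]
    return data.decode("utf-16-le")

def read_cmd_unicode_output(path: Path) -> str:
    # e.g. a file created with:  cmd /u /c dir > list.txt  (hypothetical name)
    return decode_cmd_unicode(path.read_bytes())

sample = "hélice\r\n".encode("utf-16-le")  # what such a file might contain
print(decode_cmd_unicode(sample) == "hélice\r\n")  # True
```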
As for encodings and code pages in cmd.exe in general, also see this question:
What encoding/code page is cmd.exe using
EDIT: As for your edit: No, cmd always assumes the batch file is written in the console's current code page. However, you can easily include a chcp at the start of the batch:
chcp 1252>NUL
ren -hélice hélice
To make this more robust when used directly from the commandline, you may want to memorize the old code page and restore it afterwards:
@echo off
for /f "tokens=2 delims=:." %%x in ('chcp') do set cp=%%x
chcp 1252>nul
ren -hélice hélice
chcp %cp%>nul
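The for /f line takes the second token of chcp's output and splits on both ":" and ".". Some localized versions end the line with a period; German Windows, for example, prints something like "Aktive Codepage: 850.". For clarity, here is the same parsing as a Python sketch (the function name is hypothetical):

```python
import re

def parse_chcp_output(line: str) -> int:
    """Return the code page number from chcp output, e.g. 'Active code page: 850'."""
    # Same idea as: for /f "tokens=2 delims=:." %%x in ('chcp') do set cp=%%x
    # for /f skips empty tokens, so a trailing period adds no extra token.
    tokens = [t for t in re.split(r"[:.]", line) if t]
    return int(tokens[1].strip())

print(parse_chcp_output("Active code page: 850"))  # 850
```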

I was having trouble with this, and here is the solution I found. Find the decimal number for the character you are looking for in your current code page.
For example, I'm in code page 437 (chcp tells you), and I want a degree sign, °. http://en.wikipedia.org/wiki/Code_page_437 tells me that the degree sign is number 248.
Then you find the Unicode character with the same number.
The Unicode character at 248 (U+00F8) is ø.
If you insert that Unicode character in your batch script, it will display in the console as the character you want. This works because the editor saves ø as the single byte 248 (in Windows-1252 or Latin-1), which the console then reads in code page 437 as °.
So my batch file
echo ø
prints
°
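The trick generalizes. Encode the wanted character in the console's code page, then decode those bytes in the editor's code page. Here is a Python sketch of that lookup, assuming the editor saves as Windows-1252 and the console runs code page 437 (the function name is hypothetical):

```python
def char_to_type(wanted: str, console_cp: str = "cp437", editor_cp: str = "cp1252") -> str:
    """Return the character to type in an editor that saves as `editor_cp`
    so that a console running in `console_cp` displays `wanted`."""
    raw = wanted.encode(console_cp)  # byte(s) the console needs to receive
    return raw.decode(editor_cp)     # character the editor stores as those byte(s)

# char_to_type("°") returns "ø" (U+00F8): byte 248 in both cases.
```

The lookup fails for bytes that are undefined in Windows-1252 (0x81, 0x8D, 0x8F, 0x90, 0x9D); Python raises UnicodeDecodeError for them. For example, ü is 0x81 in code page 437, so a Windows-1252 editor cannot produce it this way.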

I created the following block, which I put at the beginning of my batch files:
set "Filename=%~nx0"
IF /I "%Filename:~-8%" == "-850.bat" GOTO CONVERT_CODEPAGE_END
rem Converting code page from 1252 to 850.
rem My editors use 1252, my batch uses 850.
rem We create a converted -850.bat file next to this one, launch it, then delete it.
rem %~nx0 / %~dpn0 / %~f0 avoid the quotes %0 carries when started from Explorer.
set "File850=%~dpn0-850.bat"
PowerShell.exe -NoProfile -Command "Get-Content -LiteralPath '%~f0' | Out-File -Encoding oem -FilePath '%File850%'"
call "%File850%" %*
del "%File850%"
EXIT /b 0
:CONVERT_CODEPAGE_END

I care about three concepts:
Output Console Encoding
Command line internal encoding (that changed with chcp)
.bat Text Encoding
The easiest scenario for me: I keep the first two in the same encoding, say CP850, and store my .bat in that same encoding (in Notepad++, menu Encoding → Character sets → Western European → OEM 850).
But suppose someone hands me a .bat in another encoding, say CP1252 (in Notepad++, menu Encoding → Character sets → Western European → Windows-1252).
Then I change the command line's internal encoding with chcp 1252.
This changes the encoding cmd uses to talk with other processes, but neither the input device nor the output console.
So my command line instance effectively sends characters in 1252 through its STDOUT file descriptor, but garbled text appears when the console decodes them as 850 (é shows as Ú).
Then I modify the file as follows:
@echo off
perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"
ren -hélice hélice
First I turn echo off, so commands produce no output unless I explicitly use echo... or perl -e "print...".
Then I put this boilerplate wherever I need to output something:
perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"
I replace the text with the one I actually want to show, here ren -hélice hélice.
If needed, I also replace cp850 with my console encoding and cp1252 with the encoding of the .bat file.
Directly below it I put the actual command.
So I split the problematic line into an output half and a real-command half.
The first half makes sure that the "é" is shown as an "é", by transcoding. This is necessary for every output line, since the console and the file use different encodings.
For the second half, the real command (muted by @echo off), having the same encoding in chcp and in the .bat text is enough to ensure the characters are interpreted correctly.
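The output half does not have to be Perl. Here is the same decode-then-encode step in Python, as a swapped-in alternative rather than the author's code. The function names are hypothetical, and the sketch assumes the .bat text is Windows-1252 and the console is in code page 850:

```python
import sys

def transcode(raw: bytes, src_cp: str = "cp1252", dst_cp: str = "cp850") -> bytes:
    """Decode bytes written in `src_cp` and re-encode them for a `dst_cp` console,
    like Perl's encode('cp850', decode('cp1252', ...))."""
    return raw.decode(src_cp).encode(dst_cp)

def echo_for_console(raw: bytes) -> None:
    # Write raw bytes so Python does not re-encode them with its own stdout encoding.
    sys.stdout.buffer.write(transcode(raw) + b"\r\n")
    sys.stdout.buffer.flush()
```

For example, echo_for_console(b"ren -h\xe9lice h\xe9lice") writes the bytes that a code page 850 console displays as ren -hélice hélice. Bytes that are undefined in Windows-1252 (such as 0x81) make decode() raise.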

I had Polish characters (e.g. ą, ę, ź, ż) in my R code and had this problem when running the R script from a .bat file: in the .Rout output file, those characters were replaced by symbols like %, &, #, and the code didn't run to the end.
My solution:
Save R script with encoding: File > Save with encoding > CP1250
Run .bat file
It worked for me; if the problem persists, try other encodings.

In Visual Studio Code, click on the encoding at the bottom, choose Save with encoding, then DOS(CP437).

Why is there a difference between the encoding of the Windows Command Prompt vs. a batch file?

For example, suppose I have a batch file called 'test.cmd' and it simply contains:
echo %1
I can call this directly from the command prompt with 'test.cmd some¬arg' and the result is that the string 'some¬arg' is printed.
However, if I place that same call in a second batch file, called 'tester.cmd' for the sake of argument, and call that from the command prompt, the string 'some¼arg' is printed.
What is it that messes up the encoding and how do I get around it? I am sure I've fixed this before, but I can't remember how...
Thanks!

This is because your batch file is encoded in a different code page than cmd.exe is currently in.
In default Western European configurations, cmd.exe starts in CP850, but text editors usually work in CP1252 (what is often wrongly referred to as Latin-1 or ISO-8859-1).
The characters "¬" and "¼" share the same byte value in these two code pages, 0xAC (¬ in CP1252, ¼ in CP850).
The solution is simple. Either encode your batch file in code page 850, or switch cmd.exe to code page 1252 by issuing chcp 1252.
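To find every character in a batch file that such a mismatch will affect, a small Python sketch can compare how each character is written and how it is read. The function name is hypothetical. The sketch assumes every character exists in the file's code page; otherwise encode() raises.

```python
from __future__ import annotations

def mismatches(text: str, file_cp: str = "cp1252", console_cp: str = "cp850") -> list[tuple[str, str]]:
    """List (written, displayed) pairs for characters that change when a file
    saved in `file_cp` is read in `console_cp`."""
    result = []
    for ch in dict.fromkeys(text):  # unique characters, in order of appearance
        shown = ch.encode(file_cp).decode(console_cp)
        if shown != ch:
            result.append((ch, shown))
    return result

# mismatches("test.cmd some¬arg") returns [("¬", "¼")]
```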
