German umlauts in PowerShell called by VBS - Windows

I have a .ps1 file that creates a link:
create-link.ps1
# Create the target folder if it doesn't exist yet.
$path = $env:HOMESHARE + "\My Projects\"
If (!(Test-Path $path))
{
    New-Item -ItemType Directory -Force -Path $path
}

# Create the shortcut via the WScript.Shell COM object.
$WshShell = New-Object -ComObject WScript.Shell
$Shortcut = $WshShell.CreateShortcut($env:HOMESHARE + "\My Projects\" + "linkname.lnk")
$Shortcut.TargetPath = "\\path\for\link"
$Shortcut.Description = "äöüß"
$Shortcut.IconLocation = $env:SYSTEMROOT + "\system32\shell32.dll,3"
$Shortcut.Save()
I also have a .vbs file that calls the .ps1:
create-link.vbs
command = "powershell.exe Get-Content ""C:\path\to\file\create-link.ps1"" | PowerShell.exe -noprofile"
set shell = CreateObject("WScript.Shell")
shell.Run command,0
Both files are saved with utf-8 encoding.
This construction was necessary because the .ps1 needed to run completely headless, without anything noticeable for the user. Calling the .ps1 through a .vbs solved this problem; if there is a better way, I would be happy if you let me know.
If I call the PowerShell script directly, or with "powershell.exe Get-Content ""C:\path\to\file\create-link.ps1"" | PowerShell.exe -noprofile" (from cmd), everything works fine.
However, if I use the .vbs to do the work, it works in general, but the German umlauts from 'Description' end up as question marks, so somehow the encoding got scrambled. Is there any way to fix this?

tl;dr:
Save your *.ps1 file as UTF-8 with BOM.
Simplify your command by using the PowerShell CLI's -File parameter:
command = "powershell.exe -NoProfile -File ""C:\path\to\file\create-link.ps1"""
See also: GitHub issue #3028, which requests the ability to launch PowerShell itself completely hidden - obviating the need for an auxiliary VBScript file - which a future version may support (but it won't be back-ported to Windows PowerShell).
If you're using Windows PowerShell (versions up to v5.1), you must save your *.ps1 files as UTF-8 with a BOM in order for them to be interpreted correctly with respect to characters outside the ASCII (7-bit) range, such as äöüß.
This is no longer necessary in PowerShell [Core] v6+, which consistently defaults to UTF-8, but if your scripts need to run in both editions, you should always use UTF-8 with BOM.
If a given *.ps1 file doesn't have a BOM, Windows PowerShell interprets each byte that is part of a UTF-8 encoding sequence (all non-ASCII characters are encoded as 2-4 bytes) individually as a character, based on the system's active ANSI code page (a single-byte encoding such as Windows-1252).
On a US-English system, where the active ANSI code page is Windows-1252, the above sample string therefore surfaces as the garbage string Ã¤Ã¶Ã¼ÃŸ.
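You can simulate that misinterpretation directly in a console; a minimal sketch (no file needed - it simply decodes the UTF-8 bytes as Windows-1252):
# Simulate reading BOM-less UTF-8 text as Windows-1252 ("ANSI"):
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes('äöüß')           # the bytes as stored on disk
[System.Text.Encoding]::GetEncoding(1252).GetString($utf8Bytes)      # -> Ã¤Ã¶Ã¼ÃŸ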
Note that question marks, or, more accurately, instances of � (REPLACEMENT CHARACTER, U+FFFD), would only surface in the reverse scenario: when ANSI-encoded text is misinterpreted as UTF-8.
As an aside, re your approach of providing the source code to the PowerShell CLI via the pipeline (stdin):
Since your script apparently runs hidden, it won't make a difference in your case, but note that this technique exhibits pseudo-interactive behavior and also doesn't support passing arguments to the script being provided via stdin - see GitHub issue #3223.

Related

Windows 10 PowerShell cmdlets don't allow UTF-8 strings as parameters [duplicate]

$logstring = Invoke-Command -ComputerName $filesServer -ScriptBlock {
    param(
        $logstring,
        $grp
    )
    $Klassenbuchordner = "KB " + $grp.Gruppe
    $Gruppenordner = $grp.Gruppe
    $share = $grp.Gruppe
    $path = "D:\Gruppen\$Gruppenordner"
    if ((Test-Path "D:\Dozenten\01_Klassenbücher\$Klassenbuchordner") -eq $true) {
        $logstring += "Verzeichnis für Klassenbücher existiert bereits"
    }
    else {
        mkdir "D:\Dozenten\01_Klassenbücher\$Klassenbuchordner"
        $logstring += "Klassenbuchordner wurde erstellt!"
    }
} -ArgumentList $logstring, $grp
My goal is to test the existence of a directory and create it on demand.
The problem is that the path contains German letters (umlauts), which aren't seen correctly by the target server.
For instance, the server receives the path "D:\Dozent\01_KlassenbÃ¼cher" instead of the expected "D:\Dozent\01_Klassenbücher".
How can I force proper UTF-8 encoding?
Note: Remoting and use of Invoke-Command are incidental to your problem.
Since the problem occurs with a string literal in your source code (...\01_Klassenbücher\...), the likeliest explanation is that your script file is misinterpreted by PowerShell.
In Windows PowerShell, if your script file is de facto UTF-8-encoded but lacks a BOM, the PowerShell engine will misinterpret any non-ASCII-range characters (such as ü) in the script.[1]
Therefore: Re-save your script as UTF-8 with BOM.
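If you prefer to do the re-saving programmatically rather than via your editor, a minimal sketch (the path is a placeholder for your actual script file):
# Re-save an existing (BOM-less UTF-8) script as UTF-8 with BOM.
$file = 'C:\path\to\your-script.ps1'                                   # placeholder path
$text = [System.IO.File]::ReadAllText($file)                           # .NET reads BOM-less UTF-8 correctly
[System.IO.File]::WriteAllText($file, $text, [System.Text.UTF8Encoding]::new($true))   # $true = write a BOM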
Note:
A UTF-8 BOM is no longer strictly necessary in the install-on-demand, cross-platform PowerShell (Core) 7+ edition (which consistently defaults to (BOM-less) UTF-8), but continues to be required if you want your scripts to work in both PowerShell editions.
Why you should save your scripts as UTF-8 with BOM:
Visual Studio Code and other modern editors create UTF-8 files without BOM by default, which is what causes the problem in Windows PowerShell.
By contrast, the PowerShell ISE creates "ANSI"-encoded[1] files, which Windows PowerShell - but not PowerShell Core - reads correctly.
You can only get away with "ANSI"-encoded files:
if your scripts will never be run in PowerShell Core - where all future development effort will go.
if your scripts will never run on a machine where a different "ANSI" code page is in effect.
if your script doesn't contain characters - e.g., emoji - that cannot be represented with your "ANSI" code page.
Given these limitations, it's safest - and future-proof - to always create PowerShell scripts as UTF-8 with BOM.
(Alternatively, you can use UTF-16 (which is always saved with a BOM), but that bloats the file size if you're primarily using ASCII/"ANSI"-range characters, which is likely in PS scripts).
How to make Visual Studio Code create UTF-8 files with BOM for PowerShell scripts by default:
Note: The following is still required as of v1.11.0 of the PowerShell extension for VSCode, but note that there's a suggestion on GitHub to make the extension default PowerShell files to UTF-8 with BOM.
Add the following to your settings.json file (open it via the command palette: press Ctrl+Shift+P, type settings, and select Preferences: Open Settings (JSON)):
"[powershell]": {
"files.encoding": "utf8bom"
}
Note that the setting is intentionally scoped to PowerShell files only, because you wouldn't want all files to default to UTF-8 with BOM, given that many utilities on Unix platforms neither expect nor know how to handle such a BOM.
[1] In the absence of a BOM, Windows PowerShell defaults to the encoding of the system's current "ANSI" code page, as determined by the legacy system locale; e.g., in Western European cultures, Windows-1252.
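To see which code page that is on a given machine, you can run, for instance:
(Get-Culture).TextInfo.ANSICodePage   # e.g., 1252 in Western European cultures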

Output Filenames in a Folder to a Text File

Using Windows Command Prompt or Windows PowerShell, how can I output all the file names in a single directory to a text file, without the file extension?
In Command Prompt, I was using:
dir /b > files.txt
Result
01 - Prologue.mp3
02 - Title.mp3
03 - End.mp3
files.txt
Desired Output
01 - Prologue
02 - Title
03 - End
Notice the "dir /b > files.txt" command includes the file extension and puts the filename at the bottom.
Without using a batch file, is there a clean Command Prompt or PowerShell command that can do what I'm looking for?
In PowerShell:
# Get-ChildItem (gci) is PowerShell's dir equivalent.
# -File limits the output to files.
# .BaseName extracts the file names without extension.
(Get-ChildItem -File).BaseName | Out-File files.txt
Note: You can use dir in PowerShell too, where it is simply an alias of Get-ChildItem. However, to avoid confusion with cmd.exe's internal dir command, which has fundamentally different syntax, it's better to use the PowerShell-native alias, gci. To see all aliases defined for Get-ChildItem, run Get-Alias -Definition Get-ChildItem
Note that use of PowerShell's > redirection operator - which is effectively an alias of the Out-File cmdlet - would also result in the undesired inclusion of the output, files.txt, in the enumeration, as in cmd.exe and POSIX-like shells such as bash, because the target file is created first.
By contrast, use of a pipeline with Out-File (or Set-Content, for text input) delays file creation until the cmdlet in this separate pipeline segment is initialized[1] - and because the file enumeration in the first segment has by definition already completed by that point, due to the Get-ChildItem call being enclosed in (...), the output file is not included in the enumeration.
Also note that property access .BaseName was applied to all files returned by (Get-ChildItem ...), which conveniently resulted in an array of the individual files' property values being returned, thanks to a feature called member-access enumeration.
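For example, a quick way to see the effect (works in PSv3+):
(Get-ChildItem -File).BaseName                    # array of all files' base names (member-access enumeration)
Get-ChildItem -File | ForEach-Object BaseName     # equivalent, pipeline-based form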
Character-encoding note:
In Windows PowerShell, Out-File / > creates "Unicode" (UTF-16LE) files, whereas Set-Content uses the system's legacy ANSI code page.
In PowerShell (Core) 7+, BOM-less UTF-8 is the consistent default.
The -Encoding parameter can be used to control the encoding explicitly.
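For instance, to request UTF-8 output explicitly (note that in Windows PowerShell -Encoding utf8 implies a BOM):
(Get-ChildItem -File).BaseName | Out-File -Encoding utf8 files.txt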
[1] In the case of Set-Content, it is actually delayed even further, namely until the first input object is received, but that is an implementation detail that shouldn't be relied on.

Replace string with unicode in text file via Windows batch file

I have a file with this simple contents:
test.txt (ASCII encoded)
Baby, you can drive my :car:
Via a Windows batch file, I need to change :car: to 🚗 (https://unicode-table.com/en/1F697/)
I'd like to avoid installing new software on the client's server, so I'm trying to do it using PowerShell or something native.
So far I've tried a ton of suggestions (https://www.generacodice.com/en/articolo/30745/How-can-you-find-and-replace-text-in-a-file-using-the-Windows-command-line-environment?), but nothing works for me. Either it doesn't get replaced, or \u1F697 shows up literally. I've tried changing the inbound file's encoding to Unicode and that isn't working either.
Non-working example:
powershell -Command "(gc test.txt) -replace ':car:', '🚗' | Out-File -encoding Unicode test.txt"
Does anyone have any tips?
Edit: I've determined how to reproduce it.
If I run this line via command line, it works:
powershell -Command "(gc test.txt) -replace ':car:', '🚗' | Out-File -encoding utf8 test-out.txt"
If I put the same line of code inside replace.bat and then execute it, test-out.txt is corrupt.
The batch file is set to UTF-8 encoding. Should something be different?
I don't think a .bat file can have a non-ASCII encoding. If you're willing to use a file.ps1 file:
(gc test.txt) -replace ':car:', '🚗' | Out-File -encoding utf8 test-out.txt
The file has to be saved as UTF-8 with BOM in Notepad, not just UTF-8.
Then your .bat file would be:
powershell -file file.ps1
The PowerShell ISE is a nice way to test this.
cmd /c file.bat
type test-out.txt
🚗
The Windows .bat script interpreter does not understand any Unicode encoding (e.g., UTF-8 or UTF-16); the simplest principle is:
You have to save the batch file with OEM encoding. How to do this
varies depending on your text editor. The encoding used in that case
varies as well. For Western cultures it's usually CP850.
To use any Unicode character (above the ASCII range) as part of a string passed to a PowerShell command, apply the .NET method Char.ConvertFromUtf32(Int32) instead of the literal '🚗'; in PowerShell syntax: [char]::ConvertFromUtf32(0x1F697).
Being pure ASCII, it does not conflict with the .bat encoding rule above, and PowerShell evaluates it to the 🚗 character.
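A quick way to verify this in a PowerShell console (the code point is above the BMP, so the result is a 2-character surrogate pair):
[char]::ConvertFromUtf32(0x1F697)          # -> 🚗
[char]::ConvertFromUtf32(0x1F697).Length   # -> 2 (UTF-16 surrogate pair)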
Then, your line could be as follows:
powershell -Command "(gc test.txt) -replace ':car:', [char]::ConvertFromUtf32(0x1F697) | Out-File -encoding Unicode test.txt"

PowerShell and German umlauts

I have written a script for my MySQL database with the PowerShell ISE, which creates a backup for me with the help of the MySQL dump tool.
When I run the script in the PowerShell ISE, everything works. If I execute the same script in the normal PowerShell console, it does not show the German umlauts correctly.
Here is my script:
# Delete everything older than 30 days.
foreach ($ordner in (ls D:\Backup -Depth 0))
{
    if ($ordner.LastWriteTime.Date -lt (Get-Date).AddDays(-30).Date)
    {
        rm -Recurse ("D:\Backup\" + $ordner.Name)
    }
}
mkdir ("D:\Backup\" + (Get-Date -Format "yyyy_MM_dd") + "\Datenbank")
C:\xampp\mysql\bin\mysqldump.exe -uroot --default-character-set=latin1 --opt MY_DATABASE > ("D:\Backup\" + (Get-Date -Format "yyyy_MM_dd") + "\Datenbank\backup.sql")
However, I need the normal PowerShell console for the automated execution of the script.
How can I fix this so that the German umlauts are displayed correctly in the normal PowerShell console?
The Windows PowerShell ISE differs from regular (conhost.exe) PowerShell console windows in that it interprets the output from external programs (such as mysqldump.exe) as encoded based on the active ANSI code page (e.g., Windows-1252 on US-English systems) - which is what --default-character-set=latin1 in your command requests.
Note: The PowerShell ISE is no longer actively developed and there are reasons not to use it (bottom section), notably not being able to run PowerShell [Core] 6+. The actively developed editor that offers the best PowerShell development experience, across platforms, is Visual Studio Code, combined with its PowerShell extension.
By contrast, regular PowerShell console windows default to the active OEM code page (e.g., 437 on US-English systems).
It is the encoding reported by [console]::OutputEncoding that determines how PowerShell interprets the output from external programs (though for mere display output that may not matter).
Therefore, you have two options:
Adjust the --default-character-set option to match the code page reported by [console]::OutputEncoding - assuming MySQL supports it (this documentation suggests that the US-English OEM code page 437 is not supported, for instance).
Adjust [console]::OutputEncoding to (temporarily) match the specified --default-character-set option:
[console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(1252)
In general, it is (Get-Culture).TextInfo.ANSICodePage / (Get-Culture).TextInfo.OEMCodePage that reports a given system's active ANSI / OEM code page number.
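Putting option 2 together with your mysqldump call, a minimal sketch (paths and database name taken from your script):
# Build the target path, temporarily switch the console's output encoding to Windows-1252
# (matching --default-character-set=latin1), then restore the previous encoding.
$backupFile = "D:\Backup\" + (Get-Date -Format "yyyy_MM_dd") + "\Datenbank\backup.sql"
$prevEncoding = [console]::OutputEncoding
try {
    [console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(1252)
    C:\xampp\mysql\bin\mysqldump.exe -uroot --default-character-set=latin1 --opt MY_DATABASE > $backupFile
}
finally {
    [console]::OutputEncoding = $prevEncoding   # restore the original console encoding
}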

Windows Batch File, Route Output of EXE called in Batch File

In my batch file, I call a EXE and would like the output to be redirected to a file. In the PowerShell command line, it would look something like this:
prog.exe file.txt | Out-File results\results.txt -Encoding ascii
The above works in the command line. In my batch file, I have written it as this:
prog.exe file.txt | powershell -Command "Out-File results\file.txt -Encoding ascii"
When I run the batch file, the results file gets created but contains zero content. How can I write this to behave the way I need it to?
The following should work in a batch file:
prog.exe file.txt > results\results.txt
If you want to redirect both stdout and stderr use:
prog.exe file.txt > results\results.txt 2>&1
kichik's helpful answer shows you an effective solution using batch-file features alone.
Unless you have a need to create files with an encoding other than ASCII or the active OEM code page, there's no need to get PowerShell involved - it'll only slow things down.
That said, you can choose a different code page via chcp in cmd.exe; for output to a file, only 65001 for UTF-8 really makes sense. Note, however, that the resulting file will have no BOM - unlike when you use Out-File -Encoding utf8 in Windows PowerShell.
If you do need to use PowerShell - e.g., to create UTF-16LE ("Unicode") files or UTF-8 files with BOM - you'll have to use $Input with a PowerShell-internal pipe in your PowerShell command in order to access the stdin stream (i.e., what was piped in):
prog.exe file.txt | powershell -c "$Input | Out-File results\file.txt -Encoding ascii"
Note that only characters representable in the active code page (as reflected in chcp) will be recognized by PowerShell and can be translated into potentially different encodings.
Choosing -Encoding ascii would actually transliterate characters outside the (7-bit) ASCII range to literal ? characters, which would result in loss of information.
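For instance, to preserve non-ASCII characters (subject to the code-page caveat above), you could write UTF-8 with BOM instead - a sketch using the same hypothetical prog.exe from the question:
prog.exe file.txt | powershell -c "$Input | Out-File results\file.txt -Encoding utf8"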
