My code was working well with special characters: I could use Write-Host "é" without any issue.
Then I moved some of my functions to another .ps1 file that I dot-sourced (using Import-Module does the same), and I got encoding errors: prénom became prénom.
I don't understand anything about encoding. VS Code doesn't seem to let me change the encoding of a file. It has a setting for the default encoding, but it defaults to UTF-8, and setting it to Windows-1252 changes nothing. If I use Geany to change the encoding to Windows-1252, it works... until I save the file again with VS Code.
Everything was working well when all my code was in the same file. Why would creating this second .ps1 file (which I created from Windows Explorer) be a problem?
Working on Windows 10, in French, with VS Code 1.50.
Thank you in advance
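The symptom described above (é turning into Ã©) is what typically appears when UTF-8 bytes are decoded as Windows-1252. Here is a minimal Python sketch of that effect, assuming the second file was saved as UTF-8 without a BOM and then read with the Windows-1252 code page. The variable names are illustrative only.

```python
# Assumption: the script file is saved as UTF-8 (no BOM) but read as Windows-1252.
original = "prénom"

utf8_bytes = original.encode("utf-8")   # 'é' becomes two bytes: 0xC3 0xA9
misread = utf8_bytes.decode("cp1252")   # each byte is shown as its own character

print(utf8_bytes)
print(misread)
```

If that assumption holds, saving the file as UTF-8 with a BOM, or consistently as Windows-1252, should make both sides agree on the encoding.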
Good day Stackoverflow.
As the title says, I have an issue with Doxygen.
Description
A PowerShell script modifies the PROJECT_NUMBER variable of my Doxyfile.
Then it runs Doxygen, but Doxygen generates the documentation in HTML and LaTeX as if it were reading a default-generated Doxyfile.
If I manually modify the Doxyfile via Notepad++ before running this script, Doxygen works perfectly, but once the script has run, the issue appears.
I would also mention that my Doxyfile has:
GENERATE_HTML = YES
GENERATE_LATEX = NO
GENERATE_MAN = YES
In practice, Doxygen behaves as if I had run:
.\doxygen.exe -g
.\doxygen.exe .\Doxyfile
The bizarre behaviour begins now!
Let's call my actual Doxyfile CustomConfig and the default generated DefaultConfig.
If I generate a DefaultConfig through .\doxygen.exe -g and then I overwrite its content with the text of CustomConfig via Notepad++, doxygen accepts the Doxyfile, as it should, and generates a correct output!
So the problem is not the Doxyfile content but PowerShell that modifies the file.
I've verified this by doing a simple copy&paste of the entire content:
Copy&Paste through Notepad++: WORKS
Copy&Paste through PowerShell: DOESN'T WORK
PowerShell Script
# Replace the old PROJECT_NUMBER with the new one
$DOXY_PATH = $env:FS_OS + "\doc"
$CONFIG_PATH = $DOXY_PATH + "\bin\Doxyfile"
$BIN_PATH = $DOXY_PATH + "\bin\doxygen.exe"
$GIT_PATH = $env:FS_OS
$GIT_BRANCH = "Development"
# Get git commit number on the specified branch
$GIT_HASH = git log $GIT_BRANCH -1 --pretty=format:%H
$PRJ_CONTENT = Get-Content $CONFIG_PATH
$PRJ_NUM = "PROJECT_NUMBER = " + $GIT_HASH
$PRJ_CONTENT = $PRJ_CONTENT -replace "PROJECT_NUMBER\s*=\s*[0-9A-Fa-f]{40}",$PRJ_NUM
$PRJ_CONTENT | Out-File -FilePath $CONFIG_PATH
Start-Process -FilePath $BIN_PATH -ArgumentList "$CONFIG_PATH" -WorkingDirectory ($DOXY_PATH + "\bin")
Copy&Paste Script
$var = Get-Content "./doc/bin/Doxyfile.bak"
$var | Out-File -FilePath "./doc/bin/Doxyfile"
Thanks to @BenH for the comment, I've found the solution.
It looks like PowerShell writes files with a BOM by default.
I found a fix in the accepted answer to this question:
Using PowerShell to write a file in UTF-8 without the BOM
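To check whether a BOM is what separates the working file from the failing one, the first bytes of the file can be inspected. The following Python sketch is only an illustration, assuming Windows PowerShell, where Out-File defaults to UTF-16LE with a BOM. The helper names detect_bom and rewrite_without_bom are hypothetical, and only UTF-8 and UTF-16 BOMs are considered.

```python
import codecs

# BOMs checked here, mapped to a codec that strips the BOM on decode.
BOMS = {
    codecs.BOM_UTF8: "utf-8-sig",
    codecs.BOM_UTF16_LE: "utf-16",
    codecs.BOM_UTF16_BE: "utf-16",
}

def detect_bom(path):
    """Return the codec implied by the file's BOM, or None if there is no BOM."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, codec in BOMS.items():
        if head.startswith(bom):
            return codec
    return None

def rewrite_without_bom(path):
    """Re-save the file as UTF-8 without a BOM if it currently starts with one."""
    codec = detect_bom(path)
    if codec is None:
        return False
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=codec, newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return True
```

Calling detect_bom on the Doxyfile before and after the PowerShell script runs would show whether the script changed the file's encoding.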
This is my code:
# -*- coding: utf-8 -*-
import subprocess as sp
import locale
LOCAL_ENCODING = locale.getpreferredencoding()
cmds = ['dir', '/b', '*.txt']
out = sp.check_output(cmds, shell=True)
print(out)
print(out.decode(LOCAL_ENCODING))
s = 'レミリア・スカレート.txt'
print(s.encode(LOCAL_ENCODING, 'replace'))
print(LOCAL_ENCODING)
# print(s.encode('utf-8'))
This is the output:
b'\xa5\xec\xa5\xdf\xa5\xea\xa5\xa2?\xa5\xb9\xa5\xab\xa5\xec\xa9`\xa5\xc8.txt\r\n'
レミリア?スカレート.txt
b'\xa5\xec\xa5\xdf\xa5\xea\xa5\xa2?\xa5\xb9\xa5\xab\xa5\xec\xa9`\xa5\xc8.txt'
cp936
(A text file named 'レミリア・スカレート.txt' is in the script directory.)
As the result shows, the bytes of the returned file name have been automatically encoded with the local encoding, which cannot fully encode the file name (note the ? in the bytes), so some information is lost.
Environment:
- win10 Chinese Simplified
- python-3.5.1
My question is:
Is it possible to avoid the automatic local encoding and get UTF-8 (or some other specified encoding) bytes?
I read this issue, but got no solution :-(
1. For built-in commands, solved by eryksun's answer:
out = sp.check_output('cmd.exe /u /c "dir /b *.txt"').decode('utf-16le')
(/u: output Unicode characters (UCS-2 LE);
/c: run the command and then terminate)
2. For external programs: [no general solution]
Configure the output to use a proper encoding (by setting the external program's options or configuration; of course, such options may not exist).
For example, in the latest WinRAR, one can set the encoding of console RAR messages: rar lb -scur data > list.txt will produce a Unicode list.txt with the archived file names.
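The difference between the two encodings can be reproduced without running any external command. This is a small sketch, assuming cp936 as the local encoding as in the question: the file name is encoded once with cp936 (lossy) and once with UTF-16LE, the encoding that cmd.exe /u emits.

```python
# The file name from the question.
name = "レミリア・スカレート.txt"

# cp936 cannot represent every character, so 'replace' substitutes '?'.
lossy = name.encode("cp936", "replace").decode("cp936")

# UTF-16LE can represent the whole name, so the round trip is lossless.
utf16_bytes = name.encode("utf-16-le")
restored = utf16_bytes.decode("utf-16-le")

print(lossy)
print(restored == name)
```

Once the name has been squeezed through the local code page, the replaced character cannot be recovered, which is why the fix has to happen on the producing side.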
I am running the following command:
([xml](new-object net.webclient).DownloadString(
"http://blogs.msdn.com/powershell/rss.aspx"
)).rss.channel.item | format-table title,link
The output for one of the RSS items contains this weird text:
You Don’t Have to Be An Administrator to Run Remote PowerShell Commands
So, the question is:
Why the mix up in characters? What happened to the apostrophe? Why is the output rendered as Don’t when it should just render as Don't?
How would I get the correct character in the PowerShell standard output?
You need to set the Encoding property of the WebClient:
$wc = New-Object System.Net.WebClient
$wc.Encoding = [System.Text.Encoding]::UTF8
([xml]$wc.DownloadString( "http://blogs.msdn.com/powershell/rss.aspx" )).rss.channel.item | format-table title,link
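The garbled apostrophe matches what happens when UTF-8 bytes are decoded with a single-byte Windows code page. Here is a short Python sketch of that, assuming the feed is UTF-8 and the client's default encoding was Windows-1252. The title is shortened for the example.

```python
# Assumption: the feed is UTF-8, and the client decoded it as Windows-1252.
title = "You Don\u2019t Have to Be An Administrator"

# The right single quote (U+2019) is three bytes in UTF-8: E2 80 99.
garbled = title.encode("utf-8").decode("cp1252")      # what the wrong decoding shows
repaired = garbled.encode("cp1252").decode("utf-8")   # undo the wrong step

print(garbled)
print(repaired)
```

Setting the encoding before the download, as in the answer above, avoids the wrong decoding in the first place rather than repairing it afterwards.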
As part of a build setup on a windows machine I need to add a registry entry and I'd like to do it from a simple batch file.
The entry is for a third party app so the format is fixed.
The entry takes the form of a REG_SZ string but needs to contain newlines, i.e. 0x0A characters, as separators.
I've hit a few problems.
The first attempt used regedit to load a generated .reg file. This failed, as it did not seem to like either long strings or strings with newlines. I discovered that export works fine but import fails. I was able to test export because the third-party app adds similar entries directly through the Win32 API.
The second attempt used the REG ADD command, but I can't find any way to add the newline characters; everything I try just ends up adding a literal string.
You can import multiline REG_SZ strings containing carriage return (CR) and linefeed (LF) end-of-line (EOL) breaks into the registry using .reg files as long as you do not mind translating the text as UTF-16LE hexadecimal encoded data. To import a REG_SZ with this text:
1st Line
2nd Line
You might create a file called MULTILINETEXT.REG that contains this:
Windows Registry Editor Version 5.00
[HKEY_CURRENT_USER\Environment]
"MULTILINETEXT"=hex(1):31,00,73,00,74,00,20,00,4c,00,69,00,6e,00,65,00,0d,00,0a,00,\
32,00,6e,00,64,00,20,00,4c,00,69,00,6e,00,65,00,0d,00,0a,00,\
00,00
To encode ASCII text as UTF-16LE, add a null byte after each ASCII code value. REG_SZ values must end with a null character (,00,00 in UTF-16LE notation).
Import the registry change from the batch file with REG.EXE IMPORT MULTILINETEXT.REG.
The example uses the Environment key because it is convenient, not because it is particularly useful to add such data to environment variables. One may use RegEdit to verify that the imported REG_SZ data contains the CRLF characters.
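Typing the hexadecimal data by hand is error-prone, so it can be generated instead. The following Python sketch produces the hex(1): payload for the two-line example above. The function name reg_sz_hex is hypothetical, and the output is a single line without the backslash line continuations.

```python
def reg_sz_hex(text):
    """Format text as a hex(1): payload: UTF-16LE bytes plus a null terminator."""
    data = text.encode("utf-16-le") + b"\x00\x00"
    return "hex(1):" + ",".join(format(b, "02x") for b in data)

# Prints the value line for the two-line example text.
print('"MULTILINETEXT"=' + reg_sz_hex("1st Line\r\n2nd Line\r\n"))
```

The printed line can be pasted under the key header of the .reg file. Very long values may still need to be wrapped with trailing backslashes by hand.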
If you're not constrained to a scripting language, you can do it in C# with
Registry.CurrentUser.OpenSubKey(@"software\classes\something", true).SetValue("some key", "sometext\nothertext", RegistryValueKind.String);
You could create a VBScript (.vbs) file and just call it from a batch file, assuming the batch file does other things besides this registry change. In VBScript you would be looking at something like:
set WSHShell = CreateObject("WScript.Shell")
WSHShell.RegWrite "HKEY_LOCAL_MACHINE\SOMEKEY", "value", "type"
You should be able to find the possible type values using Google.
Another approach -- that is much easier to read and maintain -- is to use a PowerShell script. Run PowerShell as Admin.
# SetLegalNotice_AsAdmin.ps1
# Define multi-line legal notice registry entry
Push-Location
Set-Location -Path Registry::HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\
$contentCaption="Legal Notice"
$contentNotice= @"
This is a very long string that runs to many lines.
You are accessing a U.S. Government (USG) Information System (IS) that is provided for USG-authorized use only.
By using this IS (which includes any device attached to this IS), you consent to the following conditions:
-The USG routinely intercepts and monitors communications on this IS for purposes including, but not limited to, penetration testing, COMSEC monitoring, network operations and defense, personnel misconduct (PM), law enforcement (LE), and counterintelligence (CI) investigations.
etc...
"@
# Caption
New-ItemProperty -Path . -Name legalnoticecaption -PropertyType MultiString -Value $contentCaption -Force
# Notice
New-ItemProperty -Path . -Name legalnoticetext -PropertyType MultiString -Value $contentNotice -Force
Pop-Location