Replace string with Unicode character in text file via Windows batch file

I have a file with this simple contents:
test.txt (ASCII encoded)
Baby, you can drive my :car:
Via a Windows batch file, I need to change :car: to 🚗 (https://unicode-table.com/en/1F697/)
I'd like to avoid installing new software on the client's server, so I'm trying to do it using PowerShell or something native.
So far I've tried a ton of suggestions (https://www.generacodice.com/en/articolo/30745/How-can-you-find-and-replace-text-in-a-file-using-the-Windows-command-line-environment?), but nothing works for me. Either it doesn't get replaced, or \u1F697 shows up literally. I've tried changing the inbound file's encoding to Unicode and that isn't working either.
Non-working example:
powershell -Command "(gc test.txt) -replace ':car:', '🚗' | Out-File -encoding Unicode test.txt"
Does anyone have any tips?
Edit: I've determined how to reproduce it.
If I run this line via command line, it works:
powershell -Command "(gc test.txt) -replace ':car:', '🚗' | Out-File -encoding utf8 test-out.txt"
If I put the same line of code inside replace.bat and then execute it, test-out.txt is corrupt.
The batch file is set to UTF-8 encoding. Should something be different?

I don't think a .bat file can use a non-ASCII encoding. If you're willing to use a separate file.ps1:
(gc test.txt) -replace ':car:', '🚗' | Out-File -encoding utf8 test-out.txt
The file has to be saved as UTF-8 with BOM in Notepad, not just UTF-8.
Then your .bat file would be:
powershell -file file.ps1
The PowerShell ISE is a nice way to test this.
cmd /c file.bat
type test-out.txt
🚗
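To see what "UTF-8 with BOM" means at the byte level, here is a small Python sketch (Python is used only for illustration; the file names are placeholders). Python's `utf-8-sig` codec writes the three-byte BOM `EF BB BF` before the ordinary UTF-8 bytes, and that signature is what Windows PowerShell relies on to read a script as UTF-8.

```python
import tempfile
from pathlib import Path

script = "(gc test.txt) -replace ':car:', '🚗' | Out-File -encoding utf8 test-out.txt\n"

with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "plain.ps1"
    with_bom = Path(tmp) / "file.ps1"

    # Plain UTF-8: the file starts directly with the script text.
    plain.write_text(script, encoding="utf-8")
    # UTF-8 with BOM: the "utf-8-sig" codec prepends the bytes EF BB BF.
    with_bom.write_text(script, encoding="utf-8-sig")

    plain_bytes = plain.read_bytes()
    bom_bytes = with_bom.read_bytes()

print(plain_bytes[:3])  # b'(gc'
print(bom_bytes[:3])    # b'\xef\xbb\xbf'
```

Apart from those three leading bytes, both files are identical.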

The Windows .bat script interpreter does not understand any Unicode encoding (e.g. UTF-8 or UTF-16); the simplest principle is:
You have to save the batch file with OEM encoding. How to do this
varies depending on your text editor. The encoding used in that case
varies as well. For Western cultures it's usually CP850.
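This explains why the emoji breaks inside a UTF-8 .bat file: cmd.exe reads the file's bytes in the OEM code page, so each of the four UTF-8 bytes of 🚗 turns into a separate, unrelated character. The Python sketch below assumes code page 850 as mentioned above; the exact garbage depends on the active code page.

```python
car = "\U0001F697"  # 🚗, U+1F697

# UTF-8 encodes this code point as four bytes.
utf8_bytes = car.encode("utf-8")
print(utf8_bytes.hex(" "))  # f0 9f 9a 97

# Reading those bytes as OEM code page 850 yields four unrelated characters,
# roughly what cmd.exe passes on to PowerShell from a UTF-8 .bat file.
misread = utf8_bytes.decode("cp850")
print(len(misread), misread != car)  # 4 True
```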
To use any Unicode character above the ASCII range as part of a string passed to a PowerShell command, apply the .NET method Char.ConvertFromUtf32(Int32) instead of embedding the literal character ('🚗'). In PowerShell syntax that is [char]::ConvertFromUtf32(0x1F697).
Because that expression is pure ASCII, it does not contradict the .bat encoding rule above, and PowerShell evaluates it to the 🚗 character.
Then, your line could be as follows:
powershell -Command "(gc test.txt) -replace ':car:', [char]::ConvertFromUtf32(0x1F697) | Out-File -encoding Unicode test.txt"
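The ConvertFromUtf32 approach works because the command line contains only ASCII and the emoji is built from its code point at run time. Python's `chr()` plays the same role in the illustrative sketch below (it is not PowerShell). Encoding to UTF-16LE shows the surrogate pair that .NET strings use for characters above U+FFFF, which is why ConvertFromUtf32 returns a string rather than a single char.

```python
code_point = 0x1F697

# Same idea as [char]::ConvertFromUtf32(0x1F697): build the character from its
# numeric code point, so the source text itself stays pure ASCII.
car = chr(code_point)

# Equivalent of -replace ':car:', <character>
line = "Baby, you can drive my :car:".replace(":car:", car)

# In UTF-16 (what .NET strings use internally), a code point above U+FFFF
# needs two 16-bit code units: a surrogate pair.
utf16 = car.encode("utf-16-le")
high = int.from_bytes(utf16[0:2], "little")
low = int.from_bytes(utf16[2:4], "little")
print(hex(high), hex(low))  # 0xd83d 0xde97
```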

Related

German Umlauts in Powershell called by vbs

I have a .ps1 file that creates a shortcut (link):
create-link.ps1
$path = $env:HOMESHARE + "\My Projects\"
If(!(test-path $path))
{
New-Item -ItemType Directory -Force -Path $path
}
$WshShell = New-Object -comObject WScript.Shell
$Shortcut = $WshShell.CreateShortcut($env:HOMESHARE + "\My Projects\" + "linkname.lnk")
$Shortcut.TargetPath = "\\path\for\link"
$Shortcut.Description = "äöüß"
$Shortcut.IconLocation = $env:SYSTEMROOT + "\\system32\\shell32.dll,3"
$Shortcut.Save()
I also have a .vbs file that calls the .ps1:
create-link.vbs
command = "powershell.exe Get-Content ""C:\path\to\file\create-link.ps1"" | PowerShell.exe -noprofile"
set shell = CreateObject("WScript.Shell")
shell.Run command,0
Both files are saved with utf-8 encoding.
This construction was necessary because the .ps1 needed to run completely headless, without anything noticeable to the user. Calling the .ps1 through a .vbs solved this problem; if there is a better way, I would be happy if you let me know.
If I call the PowerShell script directly, or run "powershell.exe Get-Content ""C:\path\to\file\create-link.ps1"" | PowerShell.exe -noprofile" from cmd, everything works fine.
However, if I call the .vbs to do the work, it works in general, but the German umlauts in 'Description' come out as question marks, so somehow the encoding gets scrambled. Is there any way to fix this?
tl;dr:
Save your *.ps1 file as UTF-8 with BOM.
Simplify your command by using the PowerShell CLI's -File parameter:
command = "powershell.exe -NoProfile -File ""C:\path\to\file\create-link.ps1"""
See also: GitHub issue #3028, which requests the ability to launch PowerShell itself completely hidden, obviating the need for an auxiliary VBScript script. A future version may support this, but it won't be back-ported to Windows PowerShell.
If you're using Windows PowerShell (versions up to v5.1), you must save your *.ps1 files as UTF-8 with a BOM in order for them to be interpreted correctly with respect to characters outside the ASCII (7-bit) range, such as äöüß.
This is no longer necessary in PowerShell [Core] v6+, which consistently defaults to UTF-8, but if your scripts need to run in both editions, you should always use UTF-8 with BOM.
If a given *.ps1 doesn't have a BOM, Windows PowerShell interprets each byte that is part of a UTF-8 encoding sequence (all non-ASCII characters are encoded as 2-4 bytes) individually as a character, based on the system's active ANSI code page (a single-byte encoding such as Windows-1252).
On a US-English system, where the active ANSI code page is Windows-1252, the above sample string therefore surfaces as the garbage string Ã¤Ã¶Ã¼ÃŸ.
Note that question marks, or, more accurately, instances of � (REPLACEMENT CHARACTER, U+FFFD), would only surface in the reverse scenario: when ANSI-encoded text is misinterpreted as UTF-8.
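Both directions can be reproduced with a short Python sketch. It is an illustration only, and `cp1252` stands in for the Windows-1252 ANSI code page of a US-English system.

```python
text = "äöüß"

# BOM-less UTF-8 read as ANSI (Windows-1252): each UTF-8 byte becomes its own
# character, so every umlaut turns into two characters.
mojibake = text.encode("utf-8").decode("cp1252")
print(mojibake)  # Ã¤Ã¶Ã¼ÃŸ

# Reverse case: ANSI bytes read as UTF-8. The invalid byte sequences are
# replaced by U+FFFD (REPLACEMENT CHARACTER), often displayed as "?" or "�".
replaced = text.encode("cp1252").decode("utf-8", errors="replace")
print(replaced == "\ufffd" * 4)  # True
```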
As an aside, re your approach of providing the source code to the PowerShell CLI via the pipeline (stdin):
Since your script apparently runs hidden, it won't make a difference in your case, but note that this technique exhibits pseudo-interactive mode and also doesn't support passing arguments to the script being provided via stdin - see GitHub issue #3223

Is it possible to replace a string in a text file with a line break via the windows command line?

I have a text-file that contains a bunch of data on a single line without any line-breaks. It will contain data that looks similar to this:
{"Id":1801157,":"33611134":"E","Oct 19:":"G","Order":"117" ,"BroadcastDate":"2019-10-19"}
What I want to do is insert a line break right before BroadcastDate so it now looks like this:
{"Id":1801157,":"33611134":"E","Oct 19:":"G","Order":"117" ,"
BroadcastDate":"2019-10-19"}
I want to be able to do it via the Windows command line. So basically I want to find BroadcastDate and replace it with <line break>BroadcastDate.
Seems like an odd thing to do, but not very difficult in PowerShell. If you are on a supported Windows system, it will have PowerShell.
=== Format-BroadcastFile.ps1
Get-Content -Path '.\BroadcastDate.txt' |
    ForEach-Object {
        # Insert a newline before every occurrence of BroadcastDate
        $_ -replace 'BroadcastDate', "`nBroadcastDate"
    }
=== Run it in a .bat file script or the cmd shell.
powershell -NoLogo -NoProfile -File "Format-BroadcastFile.ps1" > ".\newfile.txt"
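For comparison, here is the same replacement as a Python function. It is a sketch: `insert_line_break` is a hypothetical helper name, and it uses a bare `\n` like the PowerShell answer. Pass `"\r\n"` as `newline` if whatever consumes the file expects Windows line endings.

```python
def insert_line_break(text: str, marker: str = "BroadcastDate", newline: str = "\n") -> str:
    """Insert a line break immediately before every occurrence of marker."""
    return text.replace(marker, newline + marker)


line = '{"Id":1801157,":"33611134":"E","Oct 19:":"G","Order":"117" ,"BroadcastDate":"2019-10-19"}'
result = insert_line_break(line)
print(result)
```

The break lands after the opening quote, which matches the output the question asks for.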

Windows Batch File, Route Output of EXE called in Batch File

In my batch file, I call an EXE and would like the output to be redirected to a file. In the PowerShell command line, it would look something like this:
prog.exe file.txt | Out-File results\results.txt -Encoding ascii
The above works in the command line. In my batch file, I have written it as this:
prog.exe file.txt | powershell -Command "Out-File results\file.txt -Encoding ascii"
When I run the batch file, the results file gets created but contains zero content. How can I write this so that it behaves the way I need it to?
The following should work in a batch file:
prog.exe file.txt > results\results.txt
If you want to redirect both stdout and stderr use:
prog.exe file.txt > results\results.txt 2>&1
kichik's helpful answer shows you an effective solution using batch-file features alone.
Unless you have a need to create files with an encoding other than ASCII or the active OEM code page, there's no need to get PowerShell involved - it'll only slow things down.
That said, you can choose a different code page via chcp in cmd.exe. For output to a file, only 65001 (UTF-8) really makes sense, but note that the resulting file will have no BOM, unlike when you use Out-File -Encoding utf8 in Windows PowerShell.
If you do need to use PowerShell - e.g., to create UTF-16LE ("Unicode") files or UTF-8 files with BOM - you'll have to use $Input with a PowerShell-internal pipe in your PowerShell command in order to access the stdin stream (i.e., what was piped in):
prog.exe file.txt | powershell -c "$Input | Out-File results\file.txt -Encoding ascii"
Note that only characters representable in the active code page (as reflected in chcp) will be recognized by PowerShell and can be translated into potentially different encodings.
Choosing -Encoding ascii would actually transliterate characters outside the (7-bit) ASCII range to literal ? characters, which would result in loss of information.
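The information loss is easy to see in miniature. Encoding to ASCII with `errors="replace"` in Python behaves similarly, substituting `?` for each unrepresentable character. This is an analogy, not PowerShell's actual code path. By contrast, UTF-16LE (what PowerShell calls "Unicode") round-trips the text unchanged.

```python
output = "Größe: 5 °C"

# ASCII cannot represent ö, ß or °; each becomes a literal "?".
ascii_bytes = output.encode("ascii", errors="replace")
print(ascii_bytes)  # b'Gr??e: 5 ?C'

# UTF-16LE can represent every character, so nothing is lost.
utf16_round_trip = output.encode("utf-16-le").decode("utf-16-le")
print(utf16_round_trip == output)  # True
```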

Anything like dos2unix for Windows?

I have some shell scripts created on Windows.
I want to run dos2unix on them.
I have read that dos2unix works on Linux.
Is there a way that I can convert my files to having Unix newlines while working on Windows?
You can use Notepad++.
The instructions to convert a directory recursively are as follows:
Menu: Search -> Find in Files...
Directory = the directory you want to be converted to Unix format, recursively. E.g., C:\MyDir
Find what = \r\n
Replace with = \n
Search Mode = Extended
Press "Replace in Files"
Solved it through Notepad++.
Go to: Edit -> EOL Conversion -> Unix.
If you have perl installed, you can simply run:
perl -i -p -e "s/\r//" <filename> [<filename2> ...]
There are at least two resources:
dos2unix on SourceForge, which appears to be actively maintained (as of 2015), and has pre-compiled releases for Windows, both 32- and 64-bit. Also includes unix2dos, mac2unix, and unix2mac.
CygUtils from GnuWin32, which are miscellaneous utilities forked from Cygwin, which includes dos2unix as well as several other related utilities. This package is not actively maintained (last update was in 2008).
In PowerShell there are many possible solutions, given the tools available in the .NET platform.
With a path to file in $file = 'path\to\file' we can use
[IO.File]::WriteAllText($file, $([IO.File]::ReadAllText($file) -replace "`r`n", "`n"))
or
(Get-Content $file -Raw).Replace("`r`n","`n") | Set-Content $file -Force -NoNewline
It's also possible to use -replace "`r", "" instead.
To do that for all files just pipe the file list to the above commands:
Get-ChildItem -File -Recurse | ForEach-Object {
    (Get-Content -Raw -Path $_.FullName).Replace("`r`n", "`n") | Set-Content -Path $_.FullName -NoNewline
}
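If PowerShell isn't a requirement, the same recursive conversion can be sketched in Python. It works on raw bytes, so the file's text encoding is left untouched and no newline translation happens on write. `convert_tree` is a hypothetical helper, and the default `*.sh` pattern is an assumption matching the question's shell scripts.

```python
from __future__ import annotations

from pathlib import Path


def convert_tree(root: Path, pattern: str = "*.sh") -> list[Path]:
    """Rewrite CRLF line endings as LF in every file matching pattern under root."""
    changed = []
    for path in root.rglob(pattern):
        if not path.is_file():
            continue
        data = path.read_bytes()
        converted = data.replace(b"\r\n", b"\n")
        if converted != data:
            # Bytes in, bytes out: no encoding or newline translation on write.
            path.write_bytes(converted)
            changed.append(path)
    return changed
```

For example, `convert_tree(Path(r"C:\MyDir"))` converts every matching file below that directory and returns the list of files it actually rewrote.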
See
how to convert a file from DOS to Unix
How to convert DOS line endings to UNIX on a Windows machine
Powershell v2: Replace CRLF with LF
For bigger files you may want to use the buffering solutions in Replace CRLF using powershell
I used grepWin:
Open the folder containing your files in grepWin
In the "Search for" section
select "Regex search"
Search for -> \r\n
Replace with -> \n
Hit "Search" to confirm which files will be touched, then "Replace".
The search-and-replace regex didn't work for me for whatever reason; however, I solved it for a single file (~/.bashrc) in Notepad++ by setting Encoding --> UTF-8 and resaving. Not as scalable, but hopefully it saves some headaches for a quick conversion.
Open the file using Notepad++
Hit Ctrl+F
Select search mode as "Regular Expression"
Search for -> \r\n
Replace with -> \n
Hit "Replace all" under the "Replace tab"
if the above doesn't work -
mvn clean install
I realize this may be a bit of a contextual leap, but I'll share my thought anyway since it just helped in my use case...
If the file will live in a git repo, you can enforce the line endings on it via a .gitattributes file. See: how to make git not change line endings for one particular file?
You are using a very old dos2unix version on Cygwin. Cygwin 1.7 changed to a new version of dos2unix, the same as is shipped with most Linux distributions, about two years ago. So update your dos2unix with Cygwin's setup program. Check you get version 6.0.3.
There are also native Windows ports of dos2unix available (win32 and win64).
See http://waterlan.home.xs4all.nl/dos2unix.html
Any good text editor on Windows supports saving text files with just line-feed as line termination.
For an automated conversion of text files from DOS/Windows to UNIX line endings, the batch file JREPL.BAT written by Dave Benham can be used. It is a batch file / JScript hybrid that runs a regular expression replace on a file using JScript, and it works even on Windows XP.
A single file can be converted from DOS/Windows to UNIX using for example:
jrepl.bat "\r" "" /M /F "Name of File to Modify" /O -
In this case, all carriage returns are removed from the file. It would of course also be possible to use "\r\n" as the search string and "\n" as the replace string. That removes only carriage returns immediately preceding a line feed, which matters if the file also contains carriage returns elsewhere that should not be removed when converting the line terminators.
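The two search strings behave differently only for carriage returns that are not part of a CRLF pair. A Python illustration (the sample text is made up):

```python
import re

# Made-up sample: Windows line endings plus one stray CR inside a line.
text = "echo one\r\necho progress\rdone\r\n"

# Like searching for \r: every carriage return disappears, including the stray one.
all_cr_removed = re.sub(r"\r", "", text)

# Like searching for \r\n and replacing with \n: only CRs that precede a line
# feed are removed; the stray CR survives.
crlf_only = re.sub(r"\r\n", "\n", text)

print(repr(all_cr_removed))  # 'echo one\necho progressdone\n'
print(repr(crlf_only))       # 'echo one\necho progress\rdone\n'
```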
Multiple files of a directory or an entire directory tree can be converted from DOS/Windows to UNIX text files by using command FOR to CALL batch file JREPL.BAT on each file matching a wildcard pattern.
Batch file example to convert all *.sh files in the current directory from DOS/Windows to UNIX:
@for %%I in (*.sh) do @call "%~dp0jrepl.bat" "\r" "" /M /F "%%I" /O -
The batch file JREPL.BAT must be stored in the same directory as the batch file containing this command line.
To understand the commands used and how they work, open a command prompt window, execute the following commands there, and carefully read all help pages displayed for each command.
jrepl.bat /?
call /?
for /?

How to use cmd type pipe (/piping) in PowerShell?

In cmd (and bash), pipe "|" pushes output to another command in the original format of the first command's output (as string).
In PowerShell, everything that comes out the pipe is an object (even a string is a string object).
Because of that, some commands fail when run in a PowerShell command window as opposed to a Windows command window.
Example:
dir c:\windows | gzip > test.gz
When this command is run in the Windows command prompt window it works properly - directory listing of C:\windows gets compressed into test.gz file.
The same command fails in PowerShell, because PowerShell does not use a cmd-style pipe. It replaces it with its own object pipeline, which here works with an array of file system items.
Q. How do you disable the default piping behavior in PowerShell to make traditional Windows commands work identically in PowerShell?
I tried using the escape character "`" before the pipe "`|", but it didn't work. I also tried invoke-expression -command "command with | here", but it also failed.
If you want to send strings down the pipeline, you can use the Out-String cmdlet.
For example:
get-process | out-string
If you are specifically looking for a PowerShell way to zip up files, check out the PowerShell Community Extensions. There are a bunch of cmdlets to zip and unzip all kinds of files.
http://pscx.codeplex.com
If you can pipe the output of (CMD) dir into gzip, then gzip apparently knows how to parse dir output. The (string) output from the PowerShell dir command (aka Get-ChildItem) doesn't look the same, so gzip likely would not be able to parse it. But, I'd also guess that gzip would be happy to take a list of paths, so this would probably work:
dir c:\windows | select -ExpandProperty FullName | gzip > test.gz
No warranties, express or implied.
If you really need to use the old school DOS pipe system in PowerShell, it can be done by running a command in a separate, temporary DOS session:
& cmd /c "dir c:\windows | gzip > test.gz"
The /c switch tells cmd to run the command then exit. Of course, this only works if all the commands are old school DOS - you can't mix-n-match them with PowerShell commands.
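The root cause is that a binary byte stream is no longer usable once something decodes it as text and re-encodes it, as happens when PowerShell converts a native program's output into strings. The Python sketch below uses the standard `gzip` module to show the effect. Decoding as UTF-8 is an assumption standing in for PowerShell's text conversion; the encoding PowerShell actually uses depends on console settings.

```python
import gzip

listing = b"file1.txt\r\nfile2.txt\r\n"
compressed = gzip.compress(listing)

# cmd.exe-style pipe: the bytes pass through untouched, so decompression works.
print(gzip.decompress(compressed) == listing)  # True

# Text-style pipe: decode to text and encode again. Bytes that are not valid
# UTF-8 (gzip's header already starts with 1f 8b) get replaced.
round_tripped = compressed.decode("utf-8", errors="replace").encode("utf-8")
print(round_tripped == compressed)  # False

corrupted = False
try:
    gzip.decompress(round_tripped)
except OSError:  # gzip.BadGzipFile is a subclass of OSError
    corrupted = True
print(corrupted)  # True
```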
While there are PowerShell alternatives to the example given in the question, there are lots of DOS programs that use the old pipe system and will not work in PowerShell. svnadmin load is one that I've had the pleasure of dealing with.
You can't. PowerShell was designed to pass objects down a pipeline, not text. There isn't a backwards-compatibility mode to DOS.
