In Windows, how do I find all files in a certain directory that are encoded using unicode? - windows

I am having trouble searching a large directory of files for a string. The search command I'm using is skipping any file encoded in Unicode. I want to find all the files in this directory that are encoded in Unicode. I am on Windows XP.
Thank you!

You can't know a file's encoding until you open it and read from it. So enumerate the files in the directory, then go through the list, open each file, and check either the BOM or the content itself (for example, a certain number of leading bytes).
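A minimal sketch of that approach in Python, assuming Python is available on the machine. The names `detect_bom` and `find_unicode_files` are illustrative and not from the answer above. It only recognises files that start with a BOM; UTF-16 or UTF-8 files saved without one would need a content heuristic instead.

```python
import codecs
import os

# Known byte-order marks. UTF-32 LE comes before UTF-16 LE because its BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(path):
    """Return the encoding implied by the file's BOM, or None if there is no BOM."""
    with open(path, "rb") as handle:
        head = handle.read(4)  # the longest BOM is 4 bytes
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def find_unicode_files(directory):
    """Yield (path, encoding) for each regular file in `directory` that starts with a BOM."""
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if os.path.isfile(path):
            encoding = detect_bom(path)
            if encoding is not None:
                yield path, encoding


if __name__ == "__main__":
    for path, encoding in find_unicode_files("."):
        print(path, "has BOM for", encoding)
```

Run from the directory you want to inspect; every file it lists is one that a BOM-unaware search tool is likely to skip or misread.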

The find command in Windows supports Unicode text files. findstr doesn't.

You can do it with the script below. The input encoding does not matter as long as you specify the output encoding, for example -Encoding ASCII.
Go to the directory you want: cd c:\MyDirectoryWithCrazyCharacterEncodingAndUnicode
Fire this script away!
Copy and paste the script into your PowerShell window. You get the idea; adjust it as needed.
foreach ($FileNameInUnicodeOrWhatever in (Get-ChildItem | Where-Object { -not $_.PSIsContainer }))
{
    # Read the first four bytes; a BOM, if present, is at the very start of the file.
    # (-Encoding Byte is Windows PowerShell syntax; PowerShell 6+ uses -AsByteStream.)
    $tempEncoding = Get-Content -LiteralPath $FileNameInUnicodeOrWhatever.FullName -Encoding Byte -TotalCount 4
    # 255 254 or 254 255 means UTF-16; 239 187 191 means UTF-8 with a BOM.
    Write-Output "$($FileNameInUnicodeOrWhatever.Name) starts with bytes: $tempEncoding"
}
If you want to further resolve issues with files not being found because of their encoding, change the encoding type.

Related

Remove carriage return in Windows [duplicate]

I have some shell scripts created on Windows.
I want to run dos2unix on them.
I have read that dos2unix works on Linux.
Is there a way that I can convert my files to having Unix newlines while working on Windows?
You can use Notepad++.
The instructions to convert a directory recursively are as follows:
Menu: Search -> Find in Files...
Directory = the directory you want to be converted to Unix format, recursively. E.g., C:\MyDir
Find what = \r\n
Replace with = \n
Search Mode = Extended
Press "Replace in Files"
Solved it through Notepad++.
Go to: Edit -> EOL Conversion -> Unix.
If you have perl installed, you can simply run:
perl -i -p -e "s/\r//" <filename> [<filename2> ...]
There are at least two resources:
dos2unix on SourceForge, which appears to be actively maintained (as of 2015), and has pre-compiled releases for Windows, both 32- and 64-bit. Also includes unix2dos, mac2unix, and unix2mac.
CygUtils from GnuWin32, which are miscellaneous utilities forked from Cygwin, which includes dos2unix as well as several other related utilities. This package is not actively maintained (last update was in 2008).
In PowerShell there are many solutions, given the many tools in the .NET platform.
With the path to the file in $file = 'path\to\file', we can use
[IO.File]::WriteAllText($file, $([IO.File]::ReadAllText($file) -replace "`r`n", "`n"))
or
(Get-Content $file -Raw).Replace("`r`n","`n") | Set-Content $file -Force
It's also possible to use -replace "`r", "" instead.
To do that for all files, just pipe the file list to the above commands:
Get-ChildItem -File -Recurse | % { (Get-Content -Raw `
-Path $_.FullName).Replace("`r`n", "`n") | Set-Content -Path $_.FullName }
See
how to convert a file from DOS to Unix
How to convert DOS line endings to UNIX on a Windows machine
Powershell v2: Replace CRLF with LF
For bigger files you may want to use the buffering solutions in Replace CRLF using powershell
I used grepWin:
Open the folder containing your files in grepWin
In the "Search for" section
select "Regex search"
Search for -> \r\n
Replace with -> \n
Hit "Search" to confirm which files will be touched, then "Replace".
The regex search and replace didn't work for me for whatever reason; however, I solved it for a single file (~/.bashrc) in Notepad++ by setting Encoding --> UTF-8 and resaving. Not as scalable, but hopefully it saves some headaches for a quick conversion.
Open the file using Notepad++
Hit Ctrl+F
Select search mode as "Regular Expression"
Search for -> \r\n
Replace with -> \n
Hit "Replace All" under the "Replace" tab
if the above doesn't work -
mvn clean install
I realize this may be a bit of a contextual leap, but I'll share my thought anyway since it just helped in my use case...
If the file will live in a git repo, you can enforce the line endings on it via a .gitattributes file. See: how to make git not change line endings for one particular file?
You are using a very old dos2unix version on Cygwin. Cygwin 1.7 changed to a new version of dos2unix, the same as is shipped with most Linux distributions, about two years ago. So update your dos2unix with Cygwin's setup program. Check you get version 6.0.3.
There are also native Windows ports of dos2unix available (win32 and win64).
See http://waterlan.home.xs4all.nl/dos2unix.html
Any good text editor on Windows supports saving text files with just line-feed as line termination.
For an automated conversion of text files from DOS/Windows to UNIX line endings, the batch file JREPL.BAT written by Dave Benham can be used. It is a batch file / JScript hybrid that runs a regular expression replace on a file using JScript, and it works even on Windows XP.
A single file can be converted from DOS/Windows to UNIX using for example:
jrepl.bat "\r" "" /M /F "Name of File to Modify" /O -
In this case all carriage returns are removed from the file. It is of course also possible to use "\r\n" as the search string and "\n" as the replace string, which removes only a carriage return immediately before a line feed. This matters if the file also contains carriage returns elsewhere that should survive the conversion of the line terminators.
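The difference between the two replacements can be sketched in Python, assuming Python is available; the function names are illustrative. The file is handled as raw bytes so that no newline translation happens behind the scenes.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace only CR+LF pairs; lone carriage returns elsewhere are kept."""
    return data.replace(b"\r\n", b"\n")


def strip_all_cr(data: bytes) -> bytes:
    """Remove every carriage return, including ones not followed by a line feed."""
    return data.replace(b"\r", b"")


def convert_file(path: str) -> None:
    """Rewrite `path` in place with Unix line endings, working on raw bytes."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(crlf_to_lf(data))


sample = b"one\r\ntwo\rstill two\r\n"
print(crlf_to_lf(sample))    # b'one\ntwo\rstill two\n'
print(strip_all_cr(sample))  # b'one\ntwostill two\n'
```

For shell scripts the two variants usually give the same result; they differ only when a carriage return appears in the middle of a line.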
Multiple files in a directory or an entire directory tree can be converted from DOS/Windows to UNIX text files by using the command FOR to CALL the batch file JREPL.BAT on each file matching a wildcard pattern.
Batch file example to convert all *.sh files in the current directory from DOS/Windows to UNIX:
@for %%I in (*.sh) do @call "%~dp0jrepl.bat" "\r" "" /M /F "%%I" /O -
The batch file JREPL.BAT must be stored in the same directory as the batch file containing this command line.
To understand the commands used and how they work, open a command prompt window, run the following commands, and read all the help pages displayed for each command carefully.
jrepl.bat /?
call /?
for /?

In Windows 10 how do I rename a file to a filename that includes a character with an umlaut?

I'm on Win10 and I have a .bat file to rename a bunch of files. Some of the entries need to be renamed to a non-English name, e.g.
RENAME "MyFile1.txt" "Eisenhüttenstadt.txt"
However, when I run this, the 'ü' comes out as something else, other characters with an umlaut also are replaced by different characters.
I've tried saving the .bat file in Notepad with Unicode and UTF-8 encoding but then Windows doesn't recognise the command when I try to run it.
I've read this and other similar issues but not found a solution, surely it's simple when you know how?
Any suggestions?
The default code page in the console is 437 (USA) or 850 (Western Europe). Both contain ü, but at a different byte value than the ANSI code page 1252 (West European Latin) that Notepad uses when saving the batch file, so the console misreads the character. Use the chcp command at the beginning of your batch file to switch the console to 1252, like this:
Chcp 1252
Sources: http://ss64.com/nt/chcp.html, http://www.pctipp.ch/tipps-tricks/kummerkasten/windows-7/artikel/windows-7-umlaute-in-batch-dateien-55616/ (the article is about Windows 7, but this applies to Windows 10 too)
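Why the code page matters can be illustrated with a short Python sketch, assuming the batch file was saved as ANSI (code page 1252) and the console is using code page 850. The helper `as_seen_by_console` is hypothetical and only models the encode/decode round trip; it does not call any Windows API.

```python
# The same character has different byte values in the ANSI and OEM code pages.
ansi_bytes = "ü".encode("cp1252")  # byte written when the file is saved as ANSI
oem_bytes = "ü".encode("cp850")    # byte a console on code page 850 expects

print(ansi_bytes.hex())  # fc
print(oem_bytes.hex())   # 81


def as_seen_by_console(text: str, file_codepage: str, console_codepage: str) -> str:
    """Encode `text` as the editor would, then decode it as the console would."""
    return text.encode(file_codepage).decode(console_codepage)


name = "Eisenhüttenstadt"
# Mismatch: the file is cp1252 but the console reads it as cp850.
print(as_seen_by_console(name, "cp1252", "cp850") == name)   # False
# After "chcp 1252" both sides agree and the name survives.
print(as_seen_by_console(name, "cp1252", "cp1252") == name)  # True
```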

CMD: '■m' is not recognized as an internal or external command

I am trying to get a batch file to work. Whenever I attempt to run a .bat the command line returns '■m' is not recognized... error, where "m" is the first letter of the file. For example:
md c:\testsource
md c:\testbackup
Returns
C:>"C:\Users\Michael\Dropbox\Documents\Research\Media\Method Guide\Program\test
.bat"
C:>■m
'■m' is not recognized as an internal or external command,
operable program or batch file.
Things I have tried:
Changing Path variables, rebooting, etc.
Changing file directory (i.e. run from C:)
Running example files from web (like above) to check for syntax errors.
Thanks
What text editor are you writing this in? It seems your text editor saves the file as UTF-16 encoded text, which cmd.exe can't handle. A UTF-16 file starts with a byte-order mark (bytes that tell other editors how to process the file), and cmd.exe treats those bytes as part of the first command; that is where the ■ in front of the m comes from.
Try setting the "coding"/"file encoding" to "ANSI" when saving the file.
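To see where the ■ comes from, here is a small Python sketch. It assumes the console uses code page 437 and that cmd.exe interprets the file's bytes in that code page; both are assumptions for illustration, not something stated in the question.

```python
import codecs

# A two-line batch file as an editor would save it in UTF-16 LE with a BOM.
script = "md c:\\testsource\r\nmd c:\\testbackup\r\n"
utf16_bytes = codecs.BOM_UTF16_LE + script.encode("utf-16-le")

print(utf16_bytes[:4].hex())  # fffe6d00

# Interpret the raw bytes in the console code page (437 assumed here).
as_cmd_sees_it = utf16_bytes.decode("cp437")
first_token = as_cmd_sees_it.split("\x00")[0]
print(ascii(first_token))  # '\xa0\u25a0m'

# Saved as ANSI instead, the file starts directly with the command.
print(script.encode("cp1252")[:2])  # b'md'
```

U+25A0 is the black square ■ and it is followed by the m of md, which matches the '■m' in the error message. The NUL byte after the m ends the token.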
In addition to the accepted answer, I would add the case where the file is created by a PowerShell command: Windows PowerShell writes files with Out-File in UTF-16 by default.
To solve the problem in that case, force the file encoding like this: | out-file foo.txt -encoding utf8
Answer based on this other answer.
On Windows 10 I had the same issue.
Changing the character set to UTF-8 made it worse.
It worked correctly when I selected UTF-8 without BOM as the encoding.

Windows batch: Unicode parameters for (robo) copy command

I need to copy multiple files in a single batch file. The files have Unicode names that map to different codepages.
Example:
set ArabicFile=ڊڌڵڲڛشس
set CyrillicFile=щЖЛдЉи
set GermanFile=Bücher
copy %ArabicFile% SomePlaceElse
copy %CyrillicFile% SomePlaceElse
copy %GermanFile% SomePlaceElse
Problem: Batch files cannot be Unicode.
Question: How can I write the Unicode file names to the batch file so that the copy command recognizes them?
Notes:
I do not care how the file names are displayed.
Actually the batch file does much more than just copy these files, I just simplified the description to make the problem clearer.
Correct batch file:
With Arnout's answer I modified my batch file as follows. It now works correctly without requiring a font change (which would be messy, as Arnout commented).
@echo off
chcp 65001
set ArabicFolder=ڊڌڵڲڛشس
set CyrillicFolder=щЖЛдЉи
set GermanFolder=Bücher
robocopy /e d:\temp\test\%ArabicFolder% d:\temp\test2\%ArabicFolder% /log:copy.log
robocopy /e d:\temp\test\%CyrillicFolder% d:\temp\test2\%CyrillicFolder% /log+:copy.log
robocopy /e d:\temp\test\%GermanFolder% d:\temp\test2\%GermanFolder% /log+:copy.log
If
I add CHCP 65001 as the first line of your batch file,
save the file as UTF-8 without BOM, and
set my console font to something else than "Raster Fonts" (on my Win7 box I can choose Consolas or Lucida Console),
it works. Simple, no? :-)
(The font change is actually not necessary, provided you're not writing non-ASCII output to the console.)
I'm not certain, but I think the short (8.3) file name will be ASCII, so you could refer to the file that way. You can find the short file name with dir /X.
I want to create a batch file (e.g. RunThis.bat) that creates directories whose names can be Russian or in other languages.
Example:
When a command prompt window is open with the prompt:
D:\>md "Russia - Шпионка"
This works on the command line and the name appears correctly.
But if I write the command in Notepad and save the file as ANSI, it doesn't work.
If I save it from Notepad as UTF-8 instead, it runs but produces garbage characters.
RunThis.bat (saved from Notepad as UTF-8) gives garbage characters:
chcp 65001
set fn14="Russia - Шпионка"
md %fn14%
The problem is that Notepad saves UTF-8 with a BOM.
To save the .bat file as UTF-8 without a BOM, use an editor like Notepad++.
RunThis.bat (saved from Notepad++ as UTF-8 without BOM):
chcp 65001
set fn14="Russia - Шпионка"
md %fn14%
This time it works perfectly when we run "RunThis.bat" directly from explorer.exe.
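If the batch file is generated by a script rather than typed in an editor, the same BOM distinction applies. A minimal Python sketch, assuming Python is available; `write_batch_file` and `starts_with_bom` are illustrative names, and the files are written to a temporary folder.

```python
import codecs
import os
import tempfile


def write_batch_file(path: str, lines, with_bom: bool = False) -> None:
    """Write `lines` to `path` as UTF-8 with CRLF line endings, BOM only if requested."""
    encoding = "utf-8-sig" if with_bom else "utf-8"
    # newline="" stops Python from translating the explicit \r\n a second time.
    with open(path, "w", encoding=encoding, newline="") as handle:
        handle.write("\r\n".join(lines) + "\r\n")


def starts_with_bom(path: str) -> bool:
    """Report whether the file begins with the UTF-8 byte-order mark EF BB BF."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8


commands = ["chcp 65001", 'set fn14="Russia - Шпионка"', "md %fn14%"]
folder = tempfile.mkdtemp()
good = os.path.join(folder, "RunThis.bat")
bad = os.path.join(folder, "RunThisWithBom.bat")

write_batch_file(good, commands)                # UTF-8 without BOM
write_batch_file(bad, commands, with_bom=True)  # UTF-8 with BOM

print(starts_with_bom(good))  # False
print(starts_with_bom(bad))   # True
```

In Python, "utf-8" never writes a BOM and "utf-8-sig" always does, so the choice of encoding name is what decides whether the file starts with the three BOM bytes.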
