how to know encoding of a file using command line? - cmd

Is there a command to find out the encoding of a file in Windows?
For example, for a file A.txt it should report that the encoding is UTF-16.

In the Windows command prompt (cmd), there is no command I know of that can determine how a text file is encoded.
Nevertheless, I wrote a small batch file that checks a few conditions and thus determines whether a given text file is ASCII-/ANSI-encoded or Unicode-encoded (UTF-8 or UTF-16, Little Endian or Big Endian). First, it checks whether the first (non-empty) line contains zero-bytes, which indicates that the file is not ASCII-/ANSI-encoded. Next, it checks whether the first few bytes constitute the Byte Order Mark (BOM) for UTF-8 or UTF-16. Since the BOM is optional for Unicode-encoded files, its absence does not prove that a file is ASCII-/ANSI-encoded.
Here is the code, featuring a lot of explanatory remarks (rem); I hope it helps:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=%~1" & rem // (provide file via the first command line argument)
rem // Check whether a dedicated file is given (so no wild-cards):
2> nul >&2 (< "%_FILE%" set /P ="" & ver) || (
rem // The file does not exist:
>&2 echo The file could not be found, hence there is no encoding!
exit /B 255
)
rem // Determine the file size:
set "SIZE=" & for %%F in ("%_FILE%") do set "SIZE=%%~zF"
if not defined SIZE (
rem // The file does not exist:
>&2 echo The file could not be found, hence there is no encoding!
exit /B 255
)
if %SIZE% EQU 0 (
rem // The file is empty:
>&2 echo The file is empty, hence encoding cannot be determined!
exit /B 1
)
rem // Store current code page to be able to restore it finally:
for /F "tokens=2 delims=:" %%C in ('chcp') do set /A "$CP=%%C"
rem /* Change to code page 437 (original IBM PC or DOS code page) temporarily;
rem this is necessary for extended characters not to be converted: */
> nul chcp 437
rem // Attempt to read first line from file; this fails if zero-bytes occur:
(
rem /* The loop does not iterate over an empty file or one with empty lines only;
rem therefore, the behaviour is the same as when zero-bytes occur: */
for /F usebackq^ delims^=^ eol^= %%L in ("%_FILE%") do (
rem // Abort reading file after first non-empty line:
goto :NEXT
)
) || (
rem /* The `for /F` loop returns a non-zero exit code in case the file is empty,
rem contains empty lines only or the first non-empty line contains zero-bytes;
rem to determine whether there are zero-bytes, let `find` process the file,
rem which removes zero-bytes or converts them to line-breaks, so `for /F` can
rem read the file;
rem however, `find` would read the whole file, hence do that only for small
rem ones and skip that for large ones, which most likely contain zero-bytes: */
if %SIZE% LEQ 8192 (
(
rem // In case the file contains line-breaks only, the loop does not iterate:
for /F delims^=^ eol^= %%L in ('^< "%_FILE%" find /V ""') do (
rem // Abort reading file after first non-empty line:
goto :ZERO
)
) || (
rem /* The loop did not iterate, so the file contains line-breaks only;
rem restore the initial code page prior to termination: */
> nul chcp %$CP%
>&2 echo The file holds only empty lines, hence encoding cannot be determined!
exit /B 1
)
)
)
rem // This point is reached in case the file contains zero-bytes:
:ZERO
rem // Restore the initial code page prior to termination:
> nul chcp %$CP%
>&2 echo NULL-bytes detected in first line, so file is non-ASCII/ANSI!
exit /B 2
rem // This point is reached in case the file does not contain any zero-bytes:
:NEXT
rem /* Build Byte Order Marks (BOMs) for UTF-16-encoded text (Little Endian and Big Endian)
rem and for UTF-8-encoded text: */
for /F "tokens=1-3" %%A in ('
forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo 0xFF0xFE 0xFE0xFF 0xEF0xBB0xBF"
') do set "$LE=%%A" & set "$BE=%%B" & set "$U8=%%C"
rem /* Reset line string variable, then store first line string (1023 bytes at most);
rem in contrast to `for /F`, this does not skip over blank lines: */
< "%_FILE%" (set "LINE=" & set /P LINE="")
rem // Check whether the first line of the file begins with any of the BOMs:
if not "%LINE:~,2%"=="%$LE%" if not "%LINE:~,2%"=="%$BE%" if not "%LINE:~,3%"=="%$U8%" goto :CONT
rem /* One of the BOMs has been encountered, hence the file is Unicode-encoded;
rem restore the initial code page prior to termination: */
> nul chcp %$CP%
>&2 echo BOM encountered in first line, so file is non-ASCII/ANSI!
exit /B 4
rem // This point is reached in case the file does not appear as Unicode-encoded:
:CONT
rem // Restore the initial code page prior to termination:
> nul chcp %$CP%
echo The file appears to be an ASCII-/ANSI-encoded text.
endlocal
exit /B 0
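For comparison outside of cmd, the two checks described above (BOM first, then zero-bytes) can be sketched in Python. This is only a minimal illustration, not part of the original answer: the names `guess_encoding` and `guess_file_encoding` and the 1024-byte probe size are assumptions, and unlike the batch file it looks at a leading chunk of bytes rather than the first non-empty line.

```python
import codecs

# Byte Order Marks the batch file looks for, paired with a label.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 (BOM)"),
    (codecs.BOM_UTF16_LE, "UTF-16 Little Endian (BOM)"),
    (codecs.BOM_UTF16_BE, "UTF-16 Big Endian (BOM)"),
]


def guess_encoding(data: bytes, probe: int = 1024) -> str:
    """Classify raw bytes using only a BOM check and a zero-byte check."""
    if not data:
        return "empty"
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    # Zero bytes near the start suggest UTF-16 or similar without a BOM.
    if b"\x00" in data[:probe]:
        return "non-ASCII/ANSI (zero bytes found)"
    # A missing BOM proves nothing, so this is only a best guess.
    return "ASCII/ANSI or UTF-8 without BOM"


def guess_file_encoding(path: str) -> str:
    """Read the beginning of a file (hypothetical helper) and classify it."""
    with open(path, "rb") as handle:
        return guess_encoding(handle.read(4096))
```

As with the batch file, the last result is a heuristic: a UTF-8 file without BOM cannot be told apart from an ANSI file by these checks alone.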

Related

Nested FOR loops and variables in Batch

I have the following code;
setlocal EnableDelayedExpansion
:splitEncode
::Get the number of Chapters
set "cmd=FINDSTR /R /N "^.*" %~n1.txt | FIND /C ":""
for /F %%a in ('!cmd!') do set numChapters=%%a
::Cycle through this once for every chapter, getting the line and the line after it
for /L %%a in (1,1,%numChapters%) do (
set "skip="
if %%a geq 2 (
set /a skip=%%a-1
set "skip=skip=!skip!"
)
for /F "!skip! tokens=1,2" %%i in ("%~n1.txt") do (
set startTime=%%i
set chapterName=%%j
)
set "skip=skip=%%a"
for /F !skip! %%i in ("%~n1.txt") do (
set endTime=%%i
)
echo %startTime% %endTime% %chapterName%
)
First I find out how many lines are in a text file, and set that to the variable numChapters.
I then use this to create a for loop that iterates for each chapter.
Inside the for loop, there are two further loops. The first reads a line, and the second reads the following line.
The intent of this is to read lines 1+2, 2+3, 3+4, and use those values as part of a command run the same number of times as the number of lines.
This means that from a list such as this;
00:00:00 The Meeting Room/The Meeting
00:03:36 Long Distance Runaround
00:07:47 Wonderous Stories
I can end up with a command that includes the start time, end time, and chapter title.
The issue I am facing is that no matter what I do, I cannot get the nested for loops to use the skip variables. I've tried %%a, %skip%, !skip!, and none of them work. The value isn't correctly substituted in any situation.
Does anyone have any way to get this variable used, or a better method of reading a specific line of a text file than a for loop?
The option string of for /F (like the root path of for /R) requires immediate (%-)expansion, because for (like if and rem) is recognised by the command interpreter before delayed expansion and the expansion of for meta-variables take place.
A possible solution is to put each for /F loop with a dynamic skip option into a sub-routine, to invoke it with call and to apply %-expansion therein (see the additional rem remarks for explanations):
@echo off
setlocal EnableDelayedExpansion
:splitEncode
rem Get the number of chapters
rem // To determine the number of lines in a file you do not need `findstr`:
for /F %%a in ('^< "%~n1.txt" find /C /V ""') do set "numChapters=%%a"
rem Cycle through this once for every chapter, getting the line and the line after it
for /L %%a in (1,1,%numChapters%) do (
set /A "skip=%%a-1"
call :getTwoValues startTime chapterName "%~n1.txt" "!skip!"
call :getTwoValues endTime dummy "%~n1.txt" "%%a"
rem /* For the last line, there is of course no next line containing the end time;
rem therefore, let us mark that case specifically: */
if not defined endTime set "endTime=??:??:??"
rem /* If there is no chapter specified, do not output anything; this might also
rem be quite useful in case the last line just contains a time stamp but no
rem chapter name just to provide the end time of the last one: */
if defined chapterName echo !startTime! !endTime! !chapterName!
)
goto :EOF
:getTwoValues <1st var. name> <2nd var. name> <file path/name> <lines to skip>
rem // Ensure not to return the former output, and set up `skip` option string:
set "%~1=" & set "%~2=" & set /A "skip=0, skip+=%~4" 2> nul
if %skip% gtr 0 (set "skip=skip=%skip%") else set "skip="
rem /* Added `usebackq` in order not to interpret the quoted file path/name as
rem a literal string; also changed the `tokens` option to return the first
rem token and then the whole remainder of the line: */
rem /* Remember that `for /F` regards empty lines for its `skip` option, but it
rem does not iterate through such; hence the first line it iterates over is
rem actually the first non-empty line after the number of skipped lines: */
for /F "usebackq %skip% tokens=1,*" %%i in ("%~3") do (
set "%~1=%%i"
set "%~2=%%j"
rem // Since we do not want to iterate to the last line, leave the loop here:
goto :EOF
)
rem /* This is just needed in case `skip` points beyond the end of the file, or
rem there are no more non-empty lines behind the skipped ones: */
goto :EOF
Based on your sample data, the output should be this:
00:00:00 00:03:36 The Meeting Room/The Meeting
00:03:36 00:07:47 Long Distance Runaround
00:07:47 ??:??:?? Wonderous Stories
However, the entire approach can be simplified considerably if you avoid opening the file multiple times and reading each line twice: simply read the file line by line once, and output the chapter information from the previous iteration together with the end time taken from the current line:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=%~n1.txt" & rem // (path/name of file to process)
set "_FRMT=??:??:??" & rem // (dummy end time output for last chapter)
rem /* Initialise variables; loop through lines of files, augmented by
rem an additional line at the end, to always output last chapter: */
set "STA=" & for /F "tokens=1,*" %%K in ('
type "%_FILE%" ^& echo(%_FRMT%
') do (
rem // Output the chapter from the previous loop iteration:
set "END=%%K" & if defined STA if defined NAME (
setlocal EnableDelayedExpansion
echo(!STA! !END! !NAME!
endlocal
)
rem // Store chapter information for the next loop iteration:
set "STA=%%K" & set "NAME=%%L"
)
endlocal
exit /B
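The single-pass idea (keep the previous line, emit it once the next line supplies the end time) can also be sketched in Python for illustration. This is a hedged sketch and not part of the original answer; `chapter_ranges` and its `last_end` parameter are hypothetical names.

```python
def chapter_ranges(lines, last_end="??:??:??"):
    """Yield (start, end, name), taking each end time from the following line."""
    previous = None  # (start, name) remembered from the previous iteration
    # Append a dummy line so the last chapter is output as well.
    for line in list(lines) + [last_end]:
        parts = line.split(None, 1)  # first token, then the rest of the line
        if not parts:
            continue  # skip blank lines, as `for /F` does
        start = parts[0]
        name = parts[1].strip() if len(parts) > 1 else ""
        # Output the previous chapter only if it had a name.
        if previous is not None and previous[1]:
            yield (previous[0], start, previous[1])
        previous = (start, name)


sample = [
    "00:00:00 The Meeting Room/The Meeting",
    "00:03:36 Long Distance Runaround",
    "00:07:47 Wonderous Stories",
]
for start, end, name in chapter_ranges(sample):
    print(start, end, name)
```

For the sample data from the question, this prints the same three lines as the batch output shown earlier.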

How Can I Replace Any Line by Its Line Number?

EDIT: After great help from @aschipfl, the code is 110% as functional as I wanted it to be! I did some extra research and made it easy to use with prompts for that extra 10% :P
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Create a prompt to set the variables
set /p _FILETYPE="What file type: "
set /p _LINENUM="Which line: "
set /p _NEWLINE="Make line say: "
rem // Start the loop, and set the files
for %%f in (*%_FILETYPE%) do (
set "_FILE=%%f"
echo "_FILE=%%f"
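rem // To execute separate code before the end of the loop, starting at ":subroutine".
call :subroutine "%%f"
)
rem // Leave the main routine here, so execution does not fall through into the subroutine:
goto :EOF
:subroutine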
rem // Write to a temporary file:
> "%_FILE%.new" (
rem /* Loop through each line of the original file,
rem preceded by the line number and a colon `:`:*/
for /F "delims=" %%A in ('findstr /N "^" "%_FILE%"') do (
rem // Store the current line with prefix to a variable:
set "LN=%%A"
rem /* Store the line number into another variable;
rem everything up to the first non-numeric char. is regarded,
rem which is the aforementioned colon `:` in this situation: */
set /A "NUM=LN"
rem // Toggle delayed expansion to avoid trouble with `!`:
setlocal EnableDelayedExpansion
rem /* Compare current line number with predefined one and replace text
rem in case of equality, or return original text otherwise: */
if !NUM! equ %_LINENUM% (
echo(!_NEWLINE!
) else (
rem // Remove line number prefix:
echo(!LN:*:=!
)
endlocal
)
)
rem // Move the edited file onto the original one:
move /Y "%_FILE%.new" "%_FILE%"
endlocal
exit /B
ORIGINAL QUESTION:
It doesn't matter what's in any of the lines already. I just want to be able to pick any line from a .txt and replace it with whatever I choose.
So for example: Maybe I have a bunch of .txt's, and I want to replace line 5 in all of them with "vanilla". And later choose to replace line 10 of all .txt's with "Green". And so on...
I've seen lots of people asking the same main question. But I keep finding situational answers.
"How do I replace specific lines?" "You search for what's already in the line, and replace it with your new text." I can't have that. I need it to be dynamic, because what's in each "line 5" is different, or there are lots of other lines with the same text.
I had tried the only one answer I could find, but all it ended up doing is replace literally all lines with "!ln:*:=!", instead of echoing.
@echo off
setlocal disableDelayedExpansion
set "file=yourFile.txt"
set "newLine5=NewLine5Here"
>"%file%.new" (
for /f "delims=" %%A in ('findstr /n "^" "%file%"') do for /f "delims=:" %%N in ("%%A") do (
set "ln=%%A"
setlocal enabableDelayedExpansion
if "!ln:~0,6!" equ "5:FMOD" (echo(!newLine5!) else echo(!ln:*:=!
endlocal
)
)
move /y "%file%.new" "%file%" >nul
The following (commented) code should work for you:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=yourFile.txt"
set "_NEWLINE=NewLine5Here"
set /A "_LINENUM=5" & rem // (target line number)
rem // Write to a temporary file:
> "%_FILE%.new" (
rem /* Loop through each line of the original file,
rem preceded by the line number and a colon `:`:*/
for /F "delims=" %%A in ('findstr /N "^" "%_FILE%"') do (
rem // Store the current line with prefix to a variable:
set "LN=%%A"
rem /* Store the line number into another variable;
rem everything up to the first non-numeric char. is regarded,
rem which is the aforementioned colon `:` in this situation: */
set /A "NUM=LN"
rem // Toggle delayed expansion to avoid trouble with `!`:
setlocal EnableDelayedExpansion
rem /* Compare current line number with predefined one and replace text
rem in case of equality, or return original text otherwise: */
if !NUM! equ %_LINENUM% (
echo(!_NEWLINE!
) else (
rem // Remove line number prefix:
echo(!LN:*:=!
)
endlocal
)
)
rem // Move the edited file onto the original one:
move /Y "%_FILE%.new" "%_FILE%"
endlocal
exit /B
Besides the typo in EnableDelayedExpansion in your code, you do not even need a second for /F loop to get the line number, and you do not need to extract a certain number of characters from the prefixed line text.
Note that this approach fails for line numbers higher than 2^31 - 1 = 2 147 483 647.
...is replace literally all lines with "!ln:*:=!", instead of echoing.
But that's correct, because FINDSTR /N prefixes each line with its line number and a colon.
The !ln:*:=! only removes that line number prefix again.
The findstr trick is used to avoid skipping empty lines or lines beginning with ; (the default EOL character).
The !ln:*:=! replaces everything up to and including the first colon with nothing.
This is better than using FOR "delims=:", because delims=: would also strip any colons at the beginning of the line text.
The toggling of delayed expansion is necessary to avoid accidental stripping of ! and ^ in the line set "ln=%%A"
To fix your code:
setlocal DisableDelayedExpansion
for /f "delims=" %%A in ('findstr /n "^" "%file%"') do (
set "ln=%%A"
setlocal EnableDelayedExpansion
if "!ln:~0,6!" equ "5:FMOD" (
set "out=!newLine5!"
) else (
set "out=!ln:*:=!"
)
echo(!out!
endlocal
)
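The underlying logic (number every line, including empty ones, and swap only the line whose number matches) is independent of batch. A minimal Python sketch, added purely for illustration, with `replace_line` being a hypothetical name:

```python
def replace_line(lines, line_number, new_text):
    """Return a copy of lines with the 1-based line_number replaced by new_text."""
    return [
        new_text if number == line_number else text
        for number, text in enumerate(lines, start=1)
    ]


original = ["first", "", "third; with !special^ chars", "fourth"]
print(replace_line(original, 3, "vanilla"))
```

Empty lines keep their position here, which is what the `findstr /N` prefix achieves in the batch version; a line number beyond the end of the list simply leaves the content unchanged.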

Batch Script - trying to replace a text string in all files of a specific extension

I am trying to change all *.gpx files in a directory, replacing all instances of "Flag, Blue" with "Waypoint" (without quotation marks). I'm not great at Windows scripting, so I would like a little help debugging.
I have based this code on code by aschipfl in the question:
Batch Script - Find and replace text in multiple files in a directory without asking user to install any program or add other files to my batch script
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // based on code by aschipfl
rem // https://stackoverflow.com/questions/46467475/batch-script-find-and-replace-text-in-multiple-files-in-a-directory-without-as
rem // Define constants here:
set "_MASK=*.gpx" & rem // (working on all GPX files)
set "_SEARCH=Flag, Blue" & rem // (find those HORRIBLE blue flags)
set "_REPLAC=Waypoint" & rem // (replace with WAYPOINTS)
set "FOROPT=" & rem // NON-recursive
set "IFSW=" & rem // CaSe sEnSiTiVe YeS
set "_TMPF=%TEMP%\%~n0_%RANDOM%.tmp" & rem // (path to temporary file)
pushd "." || exit /B 1
rem // Loop through all matching files in the directory tree:
for %FOROPT% %%F in ("%_MASK%") do (
rem // Write to temporary file:
> "%_TMPF%" (
rem /* Read current file line by line; use `findstr` to precede every line by
rem its line number and a colon `:`; this way empty lines appear non-empty
rem to `for /F`, which avoids them to be ignored; otherwise empty lines
rem became lost: */
for /F "delims=" %%L in ('findstr /N "^" "%%~fF"') do (
rem // Store current line text:
set "LINE=%%L" & set "FLAG="
setlocal EnableDelayedExpansion
rem // Remove line number prefix:
set "LINE=!LINE:*:=!"
rem // Skip replacement for empty line text:
if defined LINE (
rem /* Use `for /F` loop to avoid trouble in case search or replace
rem strings contain quotation marks `"`: */
for /F "tokens=1* delims== eol==" %%I in ("!_SEARCH!=!_REPLAC!") do (
rem // Query to handle case-sensitivity:
if %IFSW% "!LINE!"=="!LINE:%%I=%%I!" (
rem // Detect whether replacement changes line:
if not "!LINE!"=="!LINE:%%I=%%J!" (
rem // Actually do the sub-string replacement:
set "LINE=!LINE:%%I=%%J!"
set "FLAG=#"
)
)
)
)
rem // Output the resulting line text:
echo(!LINE!
if defined FLAG (endlocal & set "FLAG=#") else (endlocal)
)
)
rem // Check whether file content would change upon replacement:
if defined FLAG (
rem // Move the temporary file onto the original one:
> nul move /Y "%_TMPF%" "%%~fF"
) else (
rem // Simply delete temporary file:
del "%_TMPF%"
)
)
popd
endlocal
exit /B
When I run the script, no changes are made to the GPX files.
A real-world example segment from the GPX file would be:
<ele>1.19734255318821</ele>
<time>2019-07-28T00:42:12Z</time>
<name>CW1002</name>
<sym>Flag, Blue</sym>
<extensions>
<trp:ViaPoint>
Obviously I want this to remain the same except:
<sym>Waypoint</sym>

How do I merge text files in a particular order using cmd

I am using the command copy *.txt newfile.txt to merge my text files into the main file, but the order gets messed up. I have text files whose names are in this order:
1january.txt
2february.txt
3february.txt
4march.txt
5may.txt
6june.txt
7july.txt
8august.txt
9september.txt
10october.txt
11november.txt
12december.txt
But with this command, 10october, 11november and 12december are appended first, followed by the rest starting from 1january.
Is there a command in cmd that can do this? Any other code will also do.
A possible way is this (given that the preceding numbers are positive and do not have leading zeros, and the file names do not contain ! or ^):
cmd /V /Q /C copy nul "newfile.txt" ^& set /A "LIM=0" ^& (for %J in ("*.txt") do set "NUM=%~nJ" ^& set /A "NUM+=0" ^& set "$[!NUM!]=%~J" ^& if !LIM! lss !NUM! set /A "LIM=NUM") ^& (for /L %I in (1,1,!LIM!) do if defined $[%I] copy /B "newfile.txt" + "!$[%I]!" "newfile.txt")
If you place this code in a batch-file it might look like this (note that I additionally inserted several rem marks for explanation of the code here):
@echo off
rem // Enable delayed expansion to be able to write AND read variables in a single block:
setlocal EnableDelayedExpansion
rem // Create empty target file:
copy nul "newfile.txt"
rem // Reset buffer for greatest preceding integer number:
set /A "LIM=0"
rem // walk through all matching files:
for %%J in ("*.txt") do (
rem // Store base file name to variable, then convert it to integer:
set "NUM=%%~nJ" & set /A "NUM+=0"
rem // Write full file name to array-like variable with gathered integer as index:
set "$[!NUM!]=%%~J"
rem // Update buffer for greatest preceding integer number:
if !LIM! lss !NUM! set /A "LIM=NUM"
)
rem // Count up from one to greatest preceding integer number:
for /L %%I in (1,1,!LIM!) do (
rem // Check whether pseudo-array element is defined and append to target file then:
if defined $[%%I] copy /B "newfile.txt" + "!$[%%I]!" "newfile.txt"
)
rem // End environment localisation:
endlocal
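Since the question allows any other code, the same idea (order the files by their leading integer instead of alphabetically) can be sketched in Python. This is a minimal sketch under assumptions: `leading_number`, `numeric_order` and `merge_files` are hypothetical names, and files without a leading number (such as the target newfile.txt) are skipped.

```python
import re


def leading_number(name):
    """Return the integer a file name starts with, or None if there is none."""
    match = re.match(r"\d+", name)
    return int(match.group()) if match else None


def numeric_order(names):
    """Sort file names by their leading integer; names without one are dropped."""
    numbered = [(leading_number(name), name) for name in names]
    return [name for _, name in sorted(p for p in numbered if p[0] is not None)]


def merge_files(names, target):
    """Append the files to target in numeric order, copying bytes unchanged."""
    with open(target, "wb") as out:
        for name in numeric_order(names):
            with open(name, "rb") as source:
                out.write(source.read())


names = ["10october.txt", "11november.txt", "1january.txt", "2february.txt", "newfile.txt"]
print(numeric_order(names))
```

Unlike the pseudo-array in the batch file, two files sharing the same leading number are both kept here, ordered by name.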

Use cmd prompt findstr to output only certain set of characters in string

Using the cmd prompt, I am trying to use the findstr feature to output certain criteria from a txt file.
My txt file contains a list of .exe names, including comments. There are a lot of them; I want to parse out only the "name.exe" of each line.
Here are examples of different lines in the txt file
C:\\Programme\\Windows Media Player\\mplayer2.exe""=dword:00000000
HOPSTER.EXE; Hopster
Out of these, I want only "mplayer2.exe" and "hopster.exe" to be included in the print out.
Instead, I receive this:
script: findstr "*.exe" Exies.txt
output:
.\Exies.txt:""C:\\Programme\\Windows Media Player\\mplayer2.exe""=dword:00000000
.\Exies.txt:HOPSTER.EXE; Hopster
I was able to pull out some items using this script, findstr /e ".exe" Exies.txt,
but am having trouble with the other examples above.
Any help? Please and thank you.
I don't think you can do it only with findstr (I'm not aware of any FINDSTR output format that would let you print only the matched patterns).
Instead, you could use select-string from PowerShell:
C:\>powershell
Windows PowerShell
Copyright (C) 2009 Microsoft Corporation. All rights reserved.
PS C:\> select-string -Path Exies.txt -Pattern "([a-z0-9]+)\.exe" -AllMatches | % { $_.Matches } | % { $_.Value }
mplayer2.exe
HOPSTER.EXE
PS C:\>
Here is a pure batch solution (there are many explanatory rem-arks in the code, so don't be shocked about the extent).
The most tricky parts are to get the offset position beyond the ".exe" extension for proper truncation of the string read from each line of Exies.txt, and to determine the start position of the file name:
@echo off
setlocal EnableDelayedExpansion
rem regular expression for `findstr` which means:
rem a string consisting of a sub-string with at least one character,
rem NOT containing any of '\?*/:<>|"', followed by ".exe";
rem such strings are considered as valid executable file names
set REGEX="[^\\?\*/:<>|\""]*[^\\?\*/:<>|\""]\.exe"
rem parse the output of `findstr` with a `for /F` loop
rem (note that `findstr` is told to do case-insensitive searches)
for /F "tokens=*" %%F in ('findstr /I /R %REGEX% Exies.txt 2^> nul') do (
rem assign a single matching line to variable `LINE`
set "LINE=%%F"
rem call sub-routine to retrieve length of string portion after the
rem (first) occurrence of ".exe"; the length is also equal to the
rem character offset of the string portion after the ".exe" occurrence
call :STRLEN "!LINE:*.exe=!" LEN > nul
rem use another `for` loop to truncate `LINE` after ".exe";
rem so afterwards we have everything up to the ".exe" portion
for %%A in (!LEN!) do (
set "LINE=!LINE:~,-%%A!"
) & rem next %%A
rem replace double-quotes '"' and colons ':' by backslashes '\'
set "LINE=!LINE:"=\!"
set "LINE=!LINE::=\!"
rem wrap around another `for` loop to extract the file name portion
rem (this is done to remove any paths from the "*.exe" file name)
for %%B in ("!LINE!") do (
set "LINE=%%~nxB"
) & rem next %%B
rem safety check if extracted "*.exe" file name still matches the
rem regular expression (necessary if ".exe" occurs twice in a line)
echo !LINE! | findstr /I /R %REGEX% > nul 2>&1
if not ErrorLevel 1 (
rem output final "*.exe" file name
echo !LINE!
) & rem end if
) & rem next %%F
endlocal
exit /B
:STRLEN
rem this constitutes a sub-routine to get the length of a string
setlocal EnableDelayedExpansion
set "STR=%~1"
if "%STR%" EQU "" (
set /A LEN=0
) else (
set /A LEN=1
for %%I in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
if not "!STR:~%%I!" EQU "" (
set /A LEN+=%%I
set "STR=!STR:~%%I!"
) & rem end if
) & rem next %%I
) & rem end if
endlocal & set /A LEN=%LEN%
if not "%~2" EQU "" set %~2=%LEN%
echo %LEN%
exit /B
Assumptions:
the ".exe" portion occurs once per line of Exies.txt only;
the file name consists of at least one character other than these \?*/:<>|";
the file name is delimited to the left by
either a backslash \ or a colon : (meaning that a path has been specified),
or a double-quote " (it might have been enclosed in a pair of such).
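The assumptions above translate fairly directly into a single regular expression. The following Python sketch is only an illustration and not part of either answer; `EXE_NAME` and `exe_names` are hypothetical names. It treats any run of characters other than \?*/:<>|" followed by ".exe" as a file name, so text separated from the name only by spaces would be included in the match.

```python
import re

# One or more characters that are valid in a file name, followed by ".exe".
EXE_NAME = re.compile(r'[^\\?*/:<>|"]+\.exe', re.IGNORECASE)


def exe_names(line):
    """Return all substrings of line that look like an executable file name."""
    return EXE_NAME.findall(line)


lines = [
    r'C:\\Programme\\Windows Media Player\\mplayer2.exe""=dword:00000000',
    "HOPSTER.EXE; Hopster",
]
for line in lines:
    for name in exe_names(line):
        print(name)
```

For the two sample lines from the question, this prints mplayer2.exe and HOPSTER.EXE.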

Resources