Findstr doesn't recognize string on first line (Encoding UTF-8 BOM)

I need to remove lines containing ":setvar" (case insensitive) in thousands of files.
It all works unless the string is on the first line.
The files are encoded as UTF-8 with a BOM.
Re-encoding the files is not an option.
Also, no third-party tools are allowed, only what ships with Windows 10.
Screenshot: OriginalFile.sql
I have tried using:
findstr /v /i /b ":setvar" "OriginalFile.sql" > "TempFile.sql"
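To see why the /b (begins with) test can miss the first line, it helps to look at the raw bytes: a UTF-8 file with a BOM starts with EF BB BF, so line 1 does not literally begin with ":setvar". The Python sketch below only illustrates that mechanism; Python is not a stock Windows tool, and the sample content is made up.

```python
import codecs

# Hypothetical file content: a UTF-8 BOM followed by two CRLF-terminated lines.
data = codecs.BOM_UTF8 + b":setvar Foo bar\r\nSELECT 1;\r\n"

first_line = data.split(b"\r\n")[0]

# The BOM bytes EF BB BF sit in front of the text on line 1 ...
print(first_line[:3].hex())                        # efbbbf

# ... so a case-insensitive "begins with" test on the raw bytes fails,
print(first_line.lower().startswith(b":setvar"))   # False

# and only succeeds once the three BOM bytes are skipped.
stripped = first_line[len(codecs.BOM_UTF8):]
print(stripped.lower().startswith(b":setvar"))     # True
```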

Related

Unconcatenating files using jeb's tricky method

EDIT: My essential question (without the specific setting for which I need a solution, as described in my original posting):
BinFile.bin is a file concatenated from binary files and a text file. The included text file consists only of lines beginning with a specific string, e.g. ;;;===,,,
With a batch file:
findstr /v "^;;;===,,," "BinFile.bin" > output.bin
an output bin file is generated in which the text file is completely removed.
How to use findstr (or another DOS command) to not only remove all lines beginning with the specified string, but also the part of the bin file before the first such line (i.e. the complete binary part preceding the text file)?
>>> My original posting:
jeb invented a method to concatenate files using Windows native tools which can be unconcatenated (in a specific way) using native tools. His solution is just ingenious!
copy /a batchBin.bat + /b myBinaryFile.bin /b combined.bat
with batchBin.bat:
;;;===,,,@echo off
;;;===,,,echo line2
;;;===,,,findstr /v "^;;;===,,," "%~f0" > output.bin
;;;===,,,exit /b
"The key is the findstr command, it outputs all lines not beginning with ;;;===,,,.
And as each of them are standard batch delimiters, they can be prefix any command in a batch file in any combination."
So myBinaryFile.bin can be extracted from combined.bat using only native tools!
My question:
In jeb's example the combined file is a batch file, because the first file in the copy command is a batch file. Could jeb's tricky method be used for the following task too, where the combined file would be combined.exe, an exe file?
copy /b aBat2ExeFile.exe + /a delimiter.bat + /b myBinaryFile.bin /b combined.exe
where delimiter.bat would be something like this:
;;;===,,,REM
and aBat2ExeFile.exe would be a batch file (aBat2ExeFile.bat) converted to exe, with a tricky use of findstr like in batchBin.bat, but with the result
[...] > output.exe
In aBat2ExeFile.bat, findstr should be used so that all lines of combined.exe up to and including the line ';;;===,,,REM' are ignored and output.exe is equal to myBinaryFile.bin again.
I think the concept is correct. But how could this be implemented in aBat2ExeFile.bat?
EDIT: My question can be simplified (the frame described above is not essential):
How could the findstr method used by jeb be adapted to process a binary file in such a way that not only lines starting with ';;;===,,,' but also all lines preceding the first such line are "ignored"?
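To pin down the requested behaviour precisely, here is a minimal Python sketch of the intended result. It is not a native-tools answer, only a reference for what such an answer would have to produce; the function name extract_payload and the sample bytes are hypothetical.

```python
MARKER = b";;;===,,,"

def extract_payload(blob, marker=MARKER):
    """Drop everything before the first marker line, then drop all marker lines."""
    lines = blob.split(b"\n")
    first = next((i for i, line in enumerate(lines) if line.startswith(marker)), None)
    if first is None:
        return blob  # no marker line found: leave the data untouched
    kept = [line for line in lines[first:] if not line.startswith(marker)]
    return b"\n".join(kept)

# Hypothetical combined file: exe stub + delimiter line + binary payload.
head = b"MZ\x00\x01stub\nmore stub\n"
delimiter = b";;;===,,,REM\r\n"
payload = b"\x00\x01\xffPAYLOAD\nmore"

print(extract_payload(head + delimiter + payload) == payload)  # True
```

Splitting on LF only (rather than on every kind of line break) keeps the payload bytes unchanged when the pieces are joined again.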

CMD batch file - xcopy won't copy directory with accented character in its name [duplicate]

I have some batch files that use a text file for language independence. Until yesterday all worked fine ... but then I began translating the standard texts to Dutch and German. Both languages use so-called diacritical or accented characters like ä, ë, ö. I think Spanish will give the same problems with ñ. I created the text file with Notepad using the standard encoding, which is ANSI. Just typing the file (DOS: TYPE) showed the wrong accented characters: e.g. ë showed as Ù. After I edited the text file and saved it with Unicode encoding, the DOS TYPE command showed exactly what I had typed in Notepad. At this point I thought my problem was solved ... but my batch code now shows me no text at all! All text is retrieved from the file by a batch file that looks like this (simplified):
@rem Parms  %1  text type number               File  %%a  program name
@rem        %2  program name (double quoted)         %%b  - - filler (tabs)
@rem        %3  text number                          %%c  text number
@rem        %4  replacement value - 1                %%d  - - filler (tabs)
@rem        %5  replacement value - 2                %%e  text string
set TempText=
set TempType=
setlocal enabledelayedexpansion
@rem Read file until both values are set ...
for /f "usebackq tokens=1,2,3,4,5 delims=|" %%a in ("%EnvPath%Text.txt") do (
if /i %%a==Tools (if /i %%c==%1 (set TempType=%%e))
if /i %%a==%~2 (if /i %%c==%3 (set TempText=%%e))
if not "!TempType!"=="" (if not "!TempText!"=="" (goto :Leave))
)
:Leave
endlocal & set TempText=%TempText%&set TempType=%TempType%
When ECHO is ON it shows that no lines are read from the file or the FOR-loop is never executed.
My question is: how can I make the FOR loop read the Unicode texts?
Your problem is that cmd uses OEM code page 850 (in the US it may be 437); type chcp to see which one. Elsewhere, i.e. in GUI programs, English Windows uses code page 1252.
GUI programs: ñ = 0xF1
Console programs: ñ = 0xA4
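The byte values above, and the ë/Ù mix-up described in the question, can be checked with a few lines of Python. This is only an illustration of the code page mismatch, not part of the batch solution; the variable names are made up.

```python
# "\u00f1" is the letter n with tilde; "\u00eb" is e with diaeresis.
n_tilde = "\u00f1"
e_diaeresis = "\u00eb"

# The same letter is stored as a different byte in the ANSI and OEM code pages.
print(n_tilde.encode("cp1252").hex())   # f1  (GUI programs, ANSI)
print(n_tilde.encode("cp850").hex())    # a4  (console, OEM 850)
print(n_tilde.encode("cp437").hex())    # a4  (console, OEM 437)

# A file saved as ANSI stores e-diaeresis as 0xEB; a console using code
# page 850 interprets that byte as U+00D9 (capital U with grave accent).
ansi_bytes = e_diaeresis.encode("cp1252")
shown_in_console = ansi_bytes.decode("cp850")
print(ansi_bytes.hex(), shown_in_console == "\u00d9")   # eb True
```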
If you are on 32-bit Windows, use edit.exe (an MS-DOS text editor). Otherwise you can use Word and save as MS-DOS text.
three years late, but...
you can convert the file to ANSI "on the fly" with the type command:
... %%a in ('type "%EnvPath%Text.txt"') do (

Merging of csv breaks diacritical characters

I'm trying to merge some csv files. I do it on Windows with cmd, like type *.csv >> or with a batch file, containing
echo. > all.csv
for %%a in (*.csv) DO copy /b alle.csv+%%a all.csv
On one computer (win7x64) merging is no problem. But on another one (same win7x64) all diacritical characters (German: äüöß) are broken; instead of them there are only ´,,´.
The source files to be merged have intact diacritical characters: I open them with Notepad++ and Excel, as ANSI or Unicode, and everything is OK.
How can I adjust the file merging to preserve diacritical characters?
I believe there are several issues contributing to the unexpected results:
You try to create an empty file by echo. > all.csv, but this actually results in a file containing a SPACE, followed by a line-break (CR + LF), ANSI-encoded. So you may have files that are differently encoded, which can cause troubles.
To truly create an empty file, use rem/ > all.csv, break > all.csv, type nul > all.csv or copy /Y nul all.csv.
When combining files with copy, it can be problematic when the destination file is also one of the source files. When it is the first source file, the data of every other source file is appended; when it is not the first source file, an overwrite prompt may appear (unless you specify /Y) and data may be lost. Since you have given *.csv as the source, we do not actually know which file is enumerated first, so it may or may not be all.csv. To avoid such trouble, you had better delete the destination file before copying rather than create an empty file, e.g. del all.csv.
Supposing you have Unicode files, they begin with a two-byte header (the byte order mark, 0xFF 0xFE). When combining such files using copy /B, you end up with several of these headers within the file. To overcome this, use copy /A, but within a Unicode cmd instance initiated by cmd /U:
cmd /U /C del all.csv ^& copy /A *.csv all.csv
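The repeated-header effect of a plain binary concatenation can be reproduced in a few lines of Python. This is only a sketch of the mechanism with made-up file contents (the helper as_unicode_file is hypothetical); it says nothing about what copy /A does internally.

```python
import codecs

def as_unicode_file(text):
    """Bytes of a 'Unicode' (UTF-16 LE with BOM) file as Notepad would save it."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

first = as_unicode_file("a;1\r\n")
second = as_unicode_file("b;2\r\n")

# A byte-for-byte concatenation keeps the header of every source file.
merged = first + second
print(merged.count(codecs.BOM_UTF16_LE))       # 2

# Decoded, the second header shows up as a stray U+FEFF in the middle of the data.
decoded = merged.decode("utf-16-le")
print(decoded.index("\ufeff", 1))              # 6

# Dropping the header of every file after the first gives a clean result.
clean = first + second[len(codecs.BOM_UTF16_LE):]
print(clean.count(codecs.BOM_UTF16_LE))        # 1
```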
copy /b *.csv all.txt & ren all.txt all.csv
or
2>nul del all.csv & copy /b *.csv all.csv
The type command can make some changes that could interfere with the process. Better use copy /b (with or without the for), but ensure the file being generated is not already present or matched by the wildcard, so that it is not included as a source.
You should also ensure all your files have the same encoding. If some of them are Unicode/UTF-? with a BOM and some are not, then depending on which file is selected first, you could end up with badly formatted data.

Automation to add a LF and CR (EOL) to the end of multiple .csv files

I have a bunch of .csv files that are generated externally and sent to me periodically. They each contain a single row of text with 31 "columns". None of them, however, has any kind of EOL (no LF by itself or with CR), so when I attempt to combine any of these files, I get more columns on the same row instead of a row for each file.
I would like a way to automatically add this to the end of each of these files in a batch, with the outputs having the same filename as the original file, potentially with the addition of a character at the beginning of the name so I know this process was completed. Ex: originalFile.csv >> 1_originalFile.csv.
I had attempted to create a file called "eol.csv" that was simply (LF and CR), and create a batch that would add that to the end of all of my files, but as I am a novice to writing batch files, I was significantly unsuccessful.
If it were possible for this to execute on each file as it were dropped into a folder, that would be even better.
Thanks for any thoughts on this!
The FINDSTR regular expression $ recognizes end of line as the position immediately before a carriage return. So findstr /v $ will only match lines that do not contain a carriage return. You can use this fact to append carriage return/linefeed to only files that need it, without having to rename any files.
The following one liner from the command line is all you need:
for /f "eol=: delims=" %F in ('findstr /m /v $ *.csv') do @(echo()>>"%F"
Double up the percents if you put the command within a batch script.
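The underlying rule (append a line break only to files that lack one) can be written out explicitly. The Python sketch below is an illustration of that logic, not a replacement for the one-liner; it tests for a trailing LF rather than for a carriage return as FINDSTR does, and the function and file names are hypothetical.

```python
import tempfile
from pathlib import Path

def ensure_trailing_newline(path):
    """Append CR+LF if the file is non-empty and does not already end with LF."""
    data = Path(path).read_bytes()
    if not data or data.endswith(b"\n"):
        return False              # empty, or already terminated: leave it alone
    with open(path, "ab") as handle:
        handle.write(b"\r\n")
    return True

# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "originalFile.csv"
    sample.write_bytes(b"a,b,c")
    print(ensure_trailing_newline(sample))   # True
    print(sample.read_bytes())               # b'a,b,c\r\n'
    print(ensure_trailing_newline(sample))   # False
```

Because the file is modified only when needed, running it twice over the same folder is harmless.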
This checks every .csv file name for the string _fixed; any file that lacks it gets a line break appended and is then renamed. Replace pathToWherever with the correct path for you. Note that for /r already recurses into the subfolders of the named path, so no extra switch is needed for that.
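@echo off
rem Walk the folder tree and handle every .csv file not yet marked _fixed.
for /r "C:\pathToWherever\" %%G in (*.csv) do (
    echo %%G | findstr /c:"_fixed" || (
        rem Append a line break, then rename so the file is skipped next time.
        echo:>>"%%G"
        ren "%%G" "%%~nG_fixed.csv"
    )
)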
Since all echos end in a CRLF, and you can use echo/ to echo a CRLF by itself, you can simply use output redirection to append a CRLF to each of the csv files.
If you want to run this on a bunch of files that you've dragged and dropped onto the script, it would look like this:
@for %%A in (%*) do echo/ >>%%A
That one line is the entire script, by the way.
There are several methods to append a line-break to (the last line of) a file if not yet present:
findstr:
findstr /V "$" "data.csv" > nul && echo/>> "data.csv"
This inverse (/V) search matches the last line only when it is not terminated by a line-break. In such case && lets the following command execute, which just appends a line-break.
Restrictions:
lines must be shorter than 8K characters;
find:
< "data.csv" find /V "" > "data.csv.tmp" && move /Y "data.csv.tmp" "data.csv" > nul
This search matches all lines, find appends a line-break to every returned line, even for the last one when there is none. A temporary file is required since it is not possible to read from and write to the same file. If no errors occur, && lets the next command execute, which moves the temporary file onto the original one.
Restrictions:
this requires a temporary file;
lines must be shorter than 4K characters;
more:
more "data.csv" > "data.csv.tmp" && move /Y "data.csv.tmp" "data.csv" > nul
This just returns all lines; more appends a line-break to every returned line, even for the last one when there is none. A temporary file is required since it is not possible to read from and write to the same file. If no errors occur, && lets the next command execute, which moves the temporary file onto the original one.
Restrictions:
this requires a temporary file;
the file must be shorter than 64K lines;
lines must be shorter than 64K characters;
TABs become expanded to SPACEs;
sort:
sort "data.csv" /+65535 /REC 65535 | sort /+65535 /REC 65535 /O "data.csv"
This just returns all lines; sort appends a line-break to every returned line, even for the last one when there is none. Surprisingly, no temporary file is required (I tested with a ~ 30 MB file without data loss due to I/O collisions). Nevertheless, this is likely the slowest method here because of the pipe (|).
The key is to set a character position for sorting that is beyond the data. In such cases, sort seems to simply reverse the whole file; this is the reason for using two sort commands. But I tested it just very quickly with one file on Windows 7, so you should be cautious with this.
Restrictions:
lines must be shorter than 64K characters;
All of the above approaches can easily be implemented in a for loop in order to be applied to multiple files; simply replace data.csv with the for meta-variable (demonstrated here with the findstr variant):
for %I in ("*.csv") do @(findstr /V "$" "%~I" > nul && echo/>> "%~I")
Remember that the %-signs need to be doubled when using this code in a batch file.

want to copy names of files to text in a directory based on their file extension

I want to copy names of files to text in a directory based on their file extension.
As of now I am using dir /b >i67.txt, which works fine for me, but it does not solve the problem of specific file extensions.
Can someone help me get a batch script for this?
You are looking for the following command, run it in the context of the directory which contains your files:
dir /b /s /-p *.txt /o:n | findstr /E .txt > i67.txt
Using the above code example, you will find all *.txt files in the directory and write the results to the i67.txt file (it is created in the same directory).
You can specify multiple file masks within one DIR /B command. Based on your comment to Yair Nevet's answer, it seems you want the following extensions: .ovr, .inc, and .dat. That can be done simply using:
dir /b /s *.ovr *.inc *.dat >i67.txt
If the files are on an NTFS volume that has short 8.3 names enabled, then you might get additional undesired file extensions if you have any file extensions longer than 4 characters that begin with your wanted extension. For example someName.data would show up in your output because it most likely would have a short name of SOMENA~1.DAT that matches your file mask.
You can prevent short name inclusion by piping the output to FINDSTR. The /L option forces a literal search as opposed to regular expressions, the /I option ignores case, and the /E option matches only the end of each line. Multiple search terms are delimited by spaces.
dir /b /s *.ovr *.inc *.dat | findstr /lie ".ovr .inc .dat"
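What the /L /I /E filter amounts to is a case-insensitive "ends with one of these strings" test. A minimal Python sketch of that test, with made-up file names and a hypothetical helper has_wanted_extension, shows why someName.data is rejected:

```python
def has_wanted_extension(name, extensions=(".ovr", ".inc", ".dat")):
    """Case-insensitive test that the name ends with one of the extensions."""
    return name.lower().endswith(extensions)

# Hypothetical listing, including a name that only matches via its short 8.3 name.
names = ["A.OVR", "b.inc", "someName.data", "c.dat"]
print([n for n in names if has_wanted_extension(n)])   # ['A.OVR', 'b.inc', 'c.dat']
```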
Regarding your following comment:
Here is what I am using now: dir /b | findstr [a-z].*ovr>i67.txt &&
dir /b | findstr [a-z].*inc>>i67.txt && dir /b | findstr
[a-z].*dat>>i67.txt What it does?? --- It copies all
names(remember,only name except files itself which are ending with
extension .ovr .dat and .cpi ) present in a directory and copy it to a
text file(here name is i67.txt)
That will not actually do what you want for several reasons.
Windows file names are not case sensitive. Windows would treat NAME.OVR and name.ovr the same, so you should as well. That requires the /I option.
There is nothing in your search to anchor ovr to the extension. It will look for your pattern anywhere within the file name. And the dot is a meta character that represents any character - not a literal dot. The asterisk allows the dot to match any number of characters.
I can't be sure, but it looks like perhaps you only want to match files that begin with a letter. The following modification to my answer should do the trick:
dir /b /s *.ovr *.inc *.dat | findstr /ri "^[a-z].*\.ovr$ ^[a-z].*\.inc$ ^[a-z].*\.dat$"
The /R option forces a regular expression match instead of a literal. It is the default behavior for the given search, but it is a good idea to be explicit with regard to regex vs literal search.
^ anchors the search to the beginning of the name
[a-z] matches any letter (sort of). Remember it is not case sensitive because of the /I option. Without the /I option, it would not match upper case Z. See Why does findstr not handle case properly (in some circumstances)? for an explanation.
.* matches any number of characters, without restriction
\. matches a dot literal, marking the beginning of your extension
Then comes your extension
$ anchors the match to the end of the name
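The effect of the anchors and the escaped dot can be tried out with Python's re module. Note that this is a different regex dialect from FINDSTR's (the alternation used here is not available in FINDSTR, which is why the command above lists three patterns), so treat it as an illustration of the pattern logic only; the file names are made up.

```python
import re

# The unanchored pattern from the comment: "ovr" may appear anywhere,
# and the dot matches any character.
loose = re.compile(r"[a-z].*ovr")

# Anchored pattern: starts with a letter, ends with a literal dot plus extension.
strict = re.compile(r"^[a-z].*\.(ovr|inc|dat)$", re.IGNORECASE)

print(bool(loose.search("my_ovr_notes.txt")))    # True  (unwanted match)
print(bool(strict.match("my_ovr_notes.txt")))    # False
print(bool(strict.match("NAME.OVR")))            # True
print(bool(strict.match("someName.data")))       # False
print(bool(strict.match("1file.ovr")))           # False (does not begin with a letter)
```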
