Batch file keeps adding special characters to file name? - cmd

This is the first time I've posted here, so I apologise if I'm in the wrong place.
I have a batch file that reads a list of domains from a text file and then does an nslookup ls against each of them, writing the results to their own text file.
I've never had a problem with this until recently, and I can't for the life of me work out why this has started happening.
All the files are perfect except for the first one! The first file name is always preceded by "" (without the quotes). These files get read by another program I have written, so this tends to cause a problem.
Here's the code that creates the files...
(
del /s /q "D:\Profile\Desktop\New_folder\Records\*.*"
for /f %%a in (D:\Profile\Desktop\New_folder\Domains\Domains.txt) do (
echo ls %%a >temp\tempfile.txt
echo exit >>temp\tempfile.txt
nslookup < temp\tempfile.txt > records\%%a.txt
)
)
Any help is much appreciated.
Cheers,
Aaron

According to the IBM extended character set, the characters you mentioned have the hex codes EF BB BF, which is the UTF-8 byte order mark ("BOM"); see Wikipedia. This means that the file Domains.txt seems to have been saved recently using UTF-8 character encoding with a BOM.
To get rid of the characters, simply edit the file and save it without a BOM. See e.g. "How to make Notepad to save text in UTF-8 without BOM?" for how to do that, or search for "remove BOM".
Note that UTF-8 without a BOM is compatible with printable ASCII, i.e. "normal" characters encoded as UTF-8 will display correctly in most common character sets, such as the IBM extended character set.
If you cannot or do not want to edit the input file, you can strip the prefix in your batch script instead; see Substrings in http://www.robvanderwoude.com/ntset.php#StrSubst. Something like:
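setlocal EnableDelayedExpansion
set BOM_REMOVED=false
for ... do (
rem delayed expansion (!X!) is needed to read variables that change inside the loop
set X=%%a
rem only the first line carries the 3-byte BOM prefix
if !BOM_REMOVED!==false set X=!X:~3!
set BOM_REMOVED=true
echo ls !X! >temp\tempfile.txt
...
)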
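If a scripting language is available on the machine, the BOM could also be stripped from the input file before the batch file runs. The following is only a minimal sketch in Python (not part of the original batch solution); the function name strip_utf8_bom is made up for illustration, and it assumes the file is small enough to read into memory.

```python
import codecs


def strip_utf8_bom(path):
    """Rewrite the file at `path` without a leading UTF-8 BOM.

    Returns True if a BOM (EF BB BF) was found and removed, False otherwise.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):  # codecs.BOM_UTF8 == b"\xef\xbb\xbf"
        return False  # nothing to do, file is left untouched
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```

Calling this once on Domains.txt before the for loop should leave the first domain name clean; a file without a BOM is not modified.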

Related

InstallScript GetLine() can not read text file contains result from command prompt

My Installation needs to check the result of a command from cmd.exe. Thus, I redirect the result of the command to a text file and then try to read the file to get the result as follows:
// send command to cmd to execute and redirect the result to a text file
// try to read the file
szDir = "D:\\";
szFileName = "MyFile.txt";
if Is(FILEEXISTS, szDir ^ szFileName) then
    listID = ListCreate(STRINGLIST);
    if listID != LIST_NULL then
        if OpenFileMode(FILE_MODE_NORMAL) = 0 then
            if OpenFile(nFileHandle, szDir, szFileName) = 0 then
                // I run into problem here
                while (GetLine(nFileHandle, szCurLine) = 0)
                    ListAddString(listID, szCurLine, AFTER);
                endwhile;
                CloseFile(nFileHandle);
            endif;
        endif;
    endif;
endif;
The problem is that right after the command is executed and its result is redirected to MyFile.txt, I can set the open file mode and open the file, but I cannot read any text into my list. ListReadFromFile() does not help either. If I open the file, edit it and save it manually, my script works.
After debugging, I found that GetLine() returns an error code (-1), which means the file pointer is at the end of the file or some other error occurred. However, FILE_MODE_NORMAL opens the file as read-only and sets the file pointer at the beginning of the file.
What could I have done wrong? Does this have something to do with read/write access to the file? I tried this command without result:
icacls D:\MyFile.txt /grant Administrator:(R,W)
I am using InstallShield 2018 and Windows 10 64-bit, by the way. Your help is much appreciated.
EDIT 1: I suspected the encoding and tried a few things:
1. After running "wslconfig /l", the content of MyFile.txt opened in Notepad++ shows no encoding, but still appears normal and readable. I tried to convert the content to UTF-8, but it did not work.
2. If I add something to the file (echo This line is appended >> MyFile.txt), the encoding changes to UTF-8, but the content from step 1 changes as well: a NULL (\0) is inserted between every character and even replaces the newline characters. Maybe this is why GetLine() fails to read the file.
3. Workaround: after step 1, I run "find "my_desired_content" MyFile.txt" > TempFile.txt and read TempFile.txt (which is encoded in UTF-8).
My ultimate goal is to check whether "my_desired_content" appears in the output of "wslconfig /l", so this is fine. However, what I don't understand is why MyFile.txt and TempFile.txt are encoded differently when both are created by cmd commands.
The problem is due to the contents of the file. Assuming this is the file generated by your linked question, you can examine its contents in a hex editor to find out the following facts:
Its contents are encoded in UTF-16 (LE) without a BOM
Its newlines are encoded as CR or CR CR instead of CR LF
I thought the newlines would be more important than the text encoding, but it turns out I had it backwards. If I change each of these things independently, GetLine seems to function correctly for either CR, CR CR, or CR LF, but only handles UTF-16 when the BOM is present. (That is, in a hex editor, the file starts with FF FE 57 00 instead of 57 00 for a file starting with the character W.)
I'm at a bit of a loss for the best way to address this. If you're up for a challenge, you could read the file with FILE_MODE_BINARYREADONLY, and can use your extra knowledge about what should be in the file to ensure you interpret its encoding correctly. Note that for most of UTF-16, you can create a single code unit by combining two bytes in the following manner:
szResult[i] = (nHigh << 8) + nLow;
where nHigh and nLow are probably values like szBuffer[2*i + 1] and szBuffer[2*i], assuming you filled a STRING szBuffer by calling ReadBytes.
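To make the byte-combining step concrete, here is a rough sketch of the same idea in Python rather than InstallScript. The function name decode_utf16le_units is made up, and the sketch assumes the data really is UTF-16 LE (low byte first) and only covers code units outside the surrogate range, which is the "most of UTF-16" case mentioned above.

```python
def decode_utf16le_units(raw):
    """Combine byte pairs (low byte first) into UTF-16 code units and return text.

    Assumption: `raw` holds UTF-16 LE data; a leading FF FE signature is skipped.
    Characters that need surrogate pairs are not handled by this sketch.
    """
    if raw[:2] == b"\xff\xfe":
        raw = raw[2:]
    chars = []
    for i in range(len(raw) // 2):
        n_low = raw[2 * i]        # first byte of the pair
        n_high = raw[2 * i + 1]   # second byte of the pair
        chars.append(chr((n_high << 8) + n_low))
    return "".join(chars)
```

For a file that starts with the bytes 57 00, the first pair gives (0x00 << 8) + 0x57, which is the character W.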
Other unproven ideas include editing it in binary to ensure the BOM (FF FE) is present, figuring out ways to ensure the file is originally created with the BOM, figuring out ways to create it in an alternate encoding, finding another command you can invoke to "fix" the file, or lodging a request with the vendor (my employer) and hoping the development team changes something to better handle this case.
Here's an easier workaround. If you can safely assume that the command will append UTF-16 characters without a signature, you can append this output to a file that has just a signature. How do you get such a file?
You could create a file with just the BOM in your development environment, and add it to your Support Files. If you need to use it multiple times, copy it around first.
You could create it with code. Just call the following (error checking omitted for clarity)
OpenFileMode(FILE_MODE_APPEND_UNICODE);
CreateFile(nFileHandle, szDir, szFileName);
CloseFile(nFileHandle);
and if szDir ^ szFileName didn't exist, it will now be a file with just the UTF-16 signature.
Assuming this file is called sig.txt, you can then invoke the command
wslconfig /l >> sig.txt to write to that file. Note the doubled >> for append. The resulting file will include the Unicode signature you created ahead of time, plus the Unicode data output from wslconfig, and GetLine should interpret things correctly.
The biggest problem here is that this hardcodes around the behavior of wslconfig, and that behavior may change at any point. This is why Christopher alludes to recommending an API, and I agree completely. In the meantime, you could try to make this more robust by invoking it under cmd /U (though my understanding of what that does or guarantees is fuzzy at best), or by trying the original way first and then retrying with the BOM.
This whole WSL thing is pretty new. I don't see any APIs for it, but rather than screen-scraping command output you might want to look at this registry key:
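The "edit it in binary to ensure the BOM is present" idea could look roughly like the following. This is a Python sketch, not InstallScript, and it rests on the assumption that the file is already known to be UTF-16 LE; the function name ensure_utf16le_bom is hypothetical.

```python
import codecs


def ensure_utf16le_bom(path):
    """Prepend the UTF-16 LE signature (FF FE) if the file does not start with it.

    Assumption: the file content is known to be UTF-16 LE; this sketch does not
    try to detect the encoding. Returns True if the signature was added.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF16_LE):  # b"\xff\xfe"
        return False
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE + data)
    return True
```

Because it checks for an existing signature first, running it on a file that already has the BOM changes nothing, which fits the "try the original way, then with the BOM" approach.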
HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Lxss
It seems to have the list of installed distros that come from the store. Coming from the store probably explains why this is HKCU and not HKLM.
A brave new world.... sigh.

set /p or alternative on MS-DOS (Windows ME) [duplicate]

I've written a program for DOS that returns keycodes as integers,
but I don't know how to get its output into a variable.
Note: I'm using MS-DOS 7 / Windows 98, so I can't use FOR /F or SET /P
Does anyone know how I could do that?
A few solutions are described by Eric Pement here. However, for older versions of the command interpreter the author was forced to use external tools.
For example, tools like STRINGS by Douglas Boling allow the following code:
echo Greetings! | STRINGS hi=ASK # puts "Greetings!" into %hi%
Same goes for ASET by Richard Breuer:
echo Greetings! | ASET hi=line # puts "Greetings!" into %hi%
One of the alternative pure DOS solutions requires the program output to be redirected to a file (named ANSWER.DAT in the example below) and then uses a specially prepared batch file. To cite the aforementioned page:
[I]n the batch file we need to be able to issue the command
set MYVAR={the contents of ANSWER.DAT go here}. This is a difficult task, since MS-DOS doesn't offer an easy way to prepend "set MYVAR=" to a file [...]
Normal DOS text files and batch files end all lines with two consecutive bytes: a carriage return (Ctrl-M, hex 0D, or ASCII 13) and a linefeed (Ctrl-J, hex 0A or ASCII 10). In the batch file, you must be able to embed a Ctrl-J in the middle of a line.
Many text editors have a way to do this: via a Ctrl-P followed by Ctrl-J (DOS EDIT with Win95/98, VDE), via a Ctrl-Q prefix (Emacs, PFE), via direct entry with ALT and the numeric keypad (QEdit, Multi-Edit), or via a designated function key (Boxer). Other editors absolutely will not support this (Notepad, Editpad, EDIT from MS-DOS 6.22 or earlier; VIM can insert a linefeed only in binary mode, but not in its normal text mode).
If you can do it, your batch file might look like this:
@echo off
:: assume that the datafile exists already in ANSWER.DAT
echo set myvar=^J | find "set" >PREFIX.DAT
copy PREFIX.DAT+ANSWER.DAT VARIAB.BAT
call VARIAB.BAT
echo Success! The value of myvar is: [%myvar%].
:: erase temp files ...
for %%f in (PREFIX.DAT ANSWER.DAT VARIAB.BAT) do del %%f >NUL
Where you see the ^J on line 3 above, the linefeed should be embedded at that point. Your editor may display it as a square box with an embedded circle.

On windows, how would I detect the line ending of a file?

I've seen answers to similar questions, but from what I can tell those answers are not from a Windows perspective.
Windows uses CR LF, Unix uses LF, modern macOS uses LF, and classic Mac OS uses CR. I can't tell the difference by eye, and when a file uses a different line ending than the one I am typing, I get errors when trying to run the script/program that, frankly, don't make much sense. After conversion, the script works just fine.
Is there any way to preemptively check which line endings a file uses, on Windows?
Use a text editor like Notepad++, which can help you understand the line endings.
It shows the line-ending format in use, either Unix (LF), Macintosh (CR) or Windows (CR LF), on the status bar of the tool.
You can also go to View->Show Symbol->Show End Of Line to display the line endings as LF / CR LF / CR.
Steps:
From the following link download binaries and dependencies zip files:
http://gnuwin32.sourceforge.net/packages/file.htm
Extract their content under the same directory (merge existing directories).
e.g. under c:\gnuwin32
Then you can execute:
c:\gnuwin32\bin\file.exe my-lf-file.txt
my-lf-file.txt; ASCII text
c:\gnuwin32\bin\file.exe my-crlf-file.txt
my-crlf-file.txt; ASCII text, with CRLF line terminators
Of course you can add c:\gnuwin32\bin to your %PATH% variable, to be able to access it without providing the full path.
UPDATE:
If you have git installed you can launch git-bash and run file command from there.
Or you can install the Windows Subsystem for Linux, as described in the official Microsoft documentation, and get access to the file command.
I too am looking for a "native" Windows scripting solution. So far, the only way I have found is to read a line or two in VB in binary mode and inspect the characters.
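The "read it in binary and inspect the characters" approach can be sketched as follows. This uses Python instead of VB, the function name count_line_endings is made up, and it assumes an ASCII-compatible encoding such as ASCII or UTF-8 (it would miscount a UTF-16 file).

```python
def count_line_endings(path):
    """Count CR LF, bare LF and bare CR line endings in a file read as raw bytes.

    Assumption: the file uses an ASCII-compatible encoding (not UTF-16).
    """
    with open(path, "rb") as f:
        data = f.read()
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    cr = data.count(b"\r") - crlf   # CR not followed by LF
    return {"CRLF": crlf, "LF": lf, "CR": cr}
```

A file with only a non-zero "CRLF" count is a Windows-style file; non-zero counts in more than one slot indicate mixed line endings, which an end-of-file check alone would not reveal.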
One tool to check "manually" is Notepad++. The status bar has a newline style indicator on the right end next to the file encoding indicator.
Other editors with Hex mode can show you also.
In Powershell, this command returns "True" for a Windows style file and "False" for a *nix style file.
(Get-Content '\\FILESERVER0001\Fshares\NETwork Shares\20181206179900.TXT' -Raw) -match "\r\n$"
This came from Matt over here: https://stackoverflow.com/a/35354009/1337544
In a batch file, you can try converting the file to CRLF and checking if its size increases:
rem check-crlf.bat
@echo off
setlocal
call type "%~1" | c:\Windows\System32\find.exe "" /v > "%~1.temp"
set size1=%~z1
rem add 2 in case the file doesn't have a trailing newline, since find will add it
set /a size1plus2=%size1%+2
call :setsize2 "%~1.temp"
for /f %%a in ('c:\Windows\System32\findstr /R /N "^" "%~1" ^| c:\Windows\System32\find /C ":"') do set lines=%%a
if %size1plus2% equ %size2% (
if %lines% equ 2 (
echo File uses LF line endings!
) else (
echo File uses CRLF or has no line endings!
)
) else (
if %size1% lss %size2% (
echo File uses LF line endings!
) else (
echo File uses CR+LF line endings!
)
)
del "%~1.temp"
exit /b
:setsize2
set size2=%~z1
exit /b
We're handling the special case of a file without a trailing newline, as well as a file with two LF-terminated newlines, which both lead to an increase of 2 bytes.
Usage:
check-crlf.bat file-i-care-about.txt
So the main thing to remember, at least for a programmer working on modern software, is that any combination of CR and LF in sequence needs to be treated as a newline. You will almost never see the 'old' Mac format, which is CR with no LF; I prefer to ignore its relatively minuscule existence. I tend to process files one byte at a time, but that is a personal preference (one that pays a dividend in this scenario). Show proficiency as a programmer by making your code resilient to the line-ending format of text files.

Why is this batch file producing extra, unexpected, unwanted characters?

I'm trying to use the following batch script to concatenate some files together:
copy NUL bin\translate.js
for %%f in (source\Libraries\sprintf.js, source\translate-namespace.js, source\util.js, source\translator.js, source\translate.js) do (
type %%f >> bin\translate.js
echo. >> bin\translate.js
)
However, when I do this, an extra character seems to be printed at the end of each file. When I view the file in ASCII, it is interpreted as these three characters:

Why is this happening? What can I do to fix it?
Those characters look like a Unicode byte order mark. Is it possible to start with files that are stored without the byte order mark? I am not aware of any command-line commands that can remove the mark.
The DOS copy command works like the UNIX cat command. That is, you can list multiple source files and one destination file, with the source files separated by + signs.
copy source\Libraries\sprintf.js+source\translate-namespace.js bin\translate.js
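If the source files cannot be re-saved without the mark, one option outside of cmd is to drop it while concatenating. The following Python sketch is an assumption on my part, not something from the original script; the function name concat_without_boms is hypothetical, and it mirrors the original loop by writing a line break after each file.

```python
import codecs


def concat_without_boms(sources, dest):
    """Concatenate `sources` into `dest`, dropping a leading UTF-8 BOM from each.

    A CR LF is written after every file, like the `echo.` in the batch loop.
    """
    with open(dest, "wb") as out:
        for src in sources:
            with open(src, "rb") as f:
                data = f.read()
            if data.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf"
                data = data[len(codecs.BOM_UTF8):]
            out.write(data)
            out.write(b"\r\n")
```

It would be called with the same list of files as the for loop, for example the source\*.js paths as sources and bin\translate.js as dest.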
