InstallScript GetLine() can not read text file contains result from command prompt - installation

My Installation needs to check the result of a command from cmd.exe. Thus, I redirect the result of the command to a text file and then try to read the file to get the result as follows:
// send command to cmd to execute and redirect the result to a text file
// try to read the file
szDir = "D:\\";
szFileName = "MyFile.txt";
if Is(FILEEXISTS, szDir ^ szFileName) then
listID = ListCreate(STRINGLIST);
if listID != LIST_NULL then
if OpenFIleMode(FILE_MODE_NORMAL) = 0 then
if OpenFile(nFileHandle, szDir, szFileName) = 0 then
// I run into problem here
while (GetLine(nFileHandle, szCurLine) = 0 )
ListAddString(listID, szCurLine, AFTER);
endwhile;
CloseFile(nFileHandle);
endif;
endif;
endif;
endif;
The problem is that right after the command prompt is executed and the result is redirected to MyFile.txt, I can set open file mode, open the file but I can not read any text into my list. ListReadFromFile() does not helps. If I open the file, edit and save it manually, my script works.
After debugging, I figured that GetLine() returns an error code (-1) which means the file pointer must be at the end of file or other errors. However, FILE_MODE_NORMAL sets the file as read only and SET THE FILE POINTER AT THE BEGINNING OF THE FILE.
What did I possibly do wrong? Is this something to do with read/write access of the file? I tried this command without result:
icacls D:\MyFile.txt /grant Administrator:(R,W)
I am using IstallShield 2018 and Windows 10 64-bit btw. Your help is much appreciated.
EDIT 1: I suspected the encoding and tried a few things:
After running "wslconfig /l", the content of MyFile.txt opened in Notepad++ is without an encoding, but still appeared normal and readable. I tried to converted the content to UTF-8 but it did not work.
If I add something to the file (echo This line is appended >> MyFile.txt), the encoding changed to UTF-8, but the content in step 1 is changeed also. NULL (\0) is added to between every character and even repelace new line character. Maybe this is why GetLine() failed to read the file.
Work around: after step 1, I run "find "my_desired_content" MyFile.txt" > TempFile.txt and read TempFile.txt (which is encoded in UTF-8).
My ultimate goal is to check if "my_desired_content" apeears in the result of "wslconfig /l" so this is fine. However, what I don't understand is that both MyFile.txt and TempFile.txt are created from cmd command but they are encoded differently?

The problem is due to the contents of the file. Assuming this is the file generated by your linked question, you can examine its contents in a hex editor to find out the following facts:
Its contents are encoded in UTF-16 (LE) without a BOM
Its newlines are encoded as CR or CR CR instead of CR LF
I thought the newlines would be more important than the text encoding, but it turns out I had it backwards. If I change each of these things independently, GetLine seems to function correctly for either CR, CR CR, or CR LF, but only handles UTF-16 when the BOM is present. (That is, in a hex editor, the file starts with FF FE 57 00 instead of 57 00 for a file starting with the character W.)
I'm at a bit of a loss for the best way to address this. If you're up for a challenge, you could read the file with FILE_MODE_BINARYREADONLY, and can use your extra knowledge about what should be in the file to ensure you interpret its encoding correctly. Note that for most of UTF-16, you can create a single code unit by combining two bytes in the following manner:
szResult[i] = (nHigh << 8) + nLow;
where nHigh and nLow are probably values like szBuffer[2*i + 1] and szBuffer[2*i], assuming you filled a STRING szBuffer by calling ReadBytes.
Other unproven ideas include editing it in binary to ensure the BOM (FF FE) is present, figuring out ways to ensure the file is originally created with the BOM, figuring out ways to create it in an alternate encoding, finding another command you can invoke to "fix" the file, or lodging a request with the vendor (my employer) and hoping the development team changes something to better handle this case.
Here's an easier workaround. If you can safely assume that the command will append UTF-16 characters without a signature, you can append this output to a file that has just a signature. How do you get such a file?
You could create a file with just the BOM in your development environment, and add it to your Support Files. If you need to use it multiple times, copy it around first.
You could create it with code. Just call the following (error checking omitted for clarity)
OpenFileMode(FILE_MODE_APPEND_UNICODE);
CreateFile(nFileHandle, szDir, szFileName);
CloseFile(nFileHandle);
and if szDir ^ szFileName didn't exist, it will now be a file with just the UTF-16 signature.
Assuming this file is called sig.txt, you can then invoke the command
wslconfig /l >> sig.txt to write to that file. Note the doubled >> for append. The resulting file will include the Unicode signature you created ahead of time, plus the Unicode data output from wslconfig, and GetLine should interpret things correctly.
The biggest problem here is that this hardcodes around the behavior of wslconfig, and that behavior may change at any point. This is why Christopher alludes to recommending an API, and I agree completely. In the mean time, You could try to make this more robust by invoking it in a cmd /U (but my understanding of what that does or guarantees is fuzzy at best), or by trying the original way and then with the BOM.

This whole WSL thing is pretty new. I don't see any APIs it but rather then screen scrapping command outputs you might want to look at this registry key:
HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Lxss
It seems to have the list of installed distros that come from the store. Coming from the store probably explains why this is HKCU and not HKLM.
A brave new world.... sigh.

Related

Convert mangled characters back to UTF-8

Here is what I did:
I dumped a SQLite database with UTF-8 data (sqlite3 example.db .dump > dump.sql), but since this was in powershell, I assume the piping converted it to windows-1252
I loaded that dumped data into a new database, again using powershell (Get-Content dump.sql | sqlite3 example2.db)
I dumped that new database and am left with a new .sql file (this time it was not through powershell - so I assume it was unmodified)
This new sql file's UTF-8 characters are seriously mangled, and I was wondering if there was a way to convert it back into correct UTF-8.
As a few examples, here are what some sequences are in the new file, and what they should be (all are viewed as UTF-8):
ÒüéÒü¬ÒüƒÒü½ should be あなたに
´╝ü should be a full width exclamation mark
Òé¡Òé╗Òé¡ should be キセキ
Does anyone have any idea as to how I might undo this mangling? Any method would be very helpful!
This is in powershell 7.0.1
Edit:
On further inspection, you can duplicate my predicament by redirecting any such data to a file in powershell (note that the data cannot itself be entered in powershell). Hence, setting up a script like this gives the same outcome:
test.sh
#!/bin/bash
echo "キ"
And then running wsl ./test.sh > test.txt will give an output of Òé¡, not キ
Edit 2:
It seems as if the codepage the UTF-8 text was converted to is almost 437: some characters are restored using this assumption (e.g. 木), but others are not. If it's close to 437, but isn't, what could it be?
It turns out, since I am in the UK, the codepage I wanted was 850. Saving the file as 850 and then reloading it as UTF-8 fixed my issue!

Sed replace unusual file extension arising from gmv

As a result of using gmv on a large nested directory to flatten in, I have a number of duplicate files separated out and with the extensions "._1_" "._2_" etc ( .... ._n_ )
eg "a.pdf.\_1\_"
ie its
a(dot)pdf(dot)(back slash)1(back slash)
as opposed to
a(dot)pdf(dot)1
which I want to reduce it back to "a.pdf"
I tried something like
sed -i .bak "s|.\_1\_||" *
which is usually reliable and doesn't require escape characters. However its giving me
"error: illegal byte sequence"
Grateful for help to fix. This is on Mac OSX terminal. Ideally I'd like a generic solution to fix ._*_ forms where the * varies 1 to 9
There are two challenges here.
How to deal with the duplicate basename (The suffixes '1', '2', ... mostly like added to designate different sections of a single file - may be different pages a PDF, etc. Performing rename that will strip the files may cause some important files to disappear.
How to deal with the "error: illegal byte sequence" which indicate that some special characters (unicode) are part of the file name. Usually ASCII characters with value >= \0xc0, which can not be decoded according to the current local. The fact that the file names are escaped (as per OP "a.pdf.\_1\_" may hint at additional characters, not displayed (assuming this was not added by the OP).
Proposed solution is to rename the file, and place the 'sequence' part, that make the file unique BEFORE the extension, allowing the extension to be used to determine file type.
a.pdf.1 => a.1.pdf
The rename command to perform this task is:
rename 's/(.).pdf.(_._)/$1$2.pdf/' .pdf.__
Adjust the file name list as needed, and use -n to verify before running.
rename -n s/.\_1\_// *.*_1_
works (remove the -n once tested).

Corruption when using certain batch variable names in custom build command

I have a VS2013 project with a custom build command. In the command script I set an environment variable, and read it out again in the same script. I can confirm by calling set that setting the variable works. However, depending on the variable name, I can't read it out again.
The following works as expected when run as a batch script:
set AVAR=xxx
set ABLAH=xxx
set BBLAH=xxx
set DEV=xxx
set #ABLAH=xxx
echo %AVAR%
echo %ABLAH%
echo %BBLAH%
echo %DEV%
echo %#ABLAH%
But produces the following output in the project:
1> xxx
1> «LAH
1> »LAH
1> ÞV
1> xxx
In this case, the name AVAR works, but many others don't. Also, variables starting with # seem to work. Any idea what is going on?
I've found the solution. Visual Studio (msbuild) converts %XX escape sequences like in URLs. I only expected it to so in URLs, like browsers do. However, it seems to replace them everywhere.
So when it encounters %ABCDE%, it recognizes %AB and inserts the character « = 0xAB, giving «CDE% to the batch interpreter. But if the code is not a valid hexadecimal number, it silently ignores it, and the interpreter sees the right characters. That's why variable names with # at the beginning always worked.
So the solution is to escape at least all % in front valid hex codes 00-FF, better even all of them, with %25.
An easy solution would be to just edit the corresponding commands in the GUI (via project properties), and not directly in the .vcxproj or .props file. This way, VS inserts the correct escape codes. In my case this was not possible since the commands were defined as user macros (Property Pages: Common Properties/User Macros). My commands span multiple lines, but the user macro editor only supports single lines.
Another thing to watch out for is that it not only replaces percent signs. Other symbols have special meaning and have to be replaced, too. (This goes beyond XML entities, like & -> &.) Here is a list of special characters from MSDN. The characters are: % $ # ' ; ? *. It doesn't seem to be necessary to replace all of them all the time, but if you notice funky behavior then this is a thing to look at. You can try to enter these characters through the GUI and see how and if VS escapes them in the project file.
On other character to note especially is the semicolon. If you define a property with unescaped semicolons, like <MyPaths>DirA;DirB</MyPaths>, msbuild/VS will internally convert them to newlines (well, or it splits the property into a list or something). But it will still show the paths as separated with semicolons in the property pages! Except when you click the dropdown button next to a property and select <Edit...>, then it will show the paths as a list or separated by newlines! This is completely invisible most of the time, except when you set a property not in XML or the GUI, but you are reading the output of a command into a property. In this case the command must output newlines, if you want the effect of a semicolon. Otherwise you don't get multiple paths, but one long path with semicolons in it.
Batch files are usually in North American and Western European countries "ASCII" files using an OEM code page like code page 850 (OEM multilingual Latin I) or code page 437 (OEM US) and not code page Windows-1252 as used usually for single byte encoded text files. The code page to use for a batch file depends on local settings for non Unicode files in console. The code page does not matter if just characters with a code value smaller 128 are used in batch file, i.e. the batch file is a real ASCII file.
Therefore make sure that you edit and save the batch file as ASCII file using the right code page and not as Unicode file using UTF-8, UTF-16 Little Endian or UTF-16 Big Endian. Editor of Visual Studio uses by default UTF-8 encoding for the files. This is the wrong encoding for batch files.
Character « has in table of code page 850 the code value 174 decimal (0xAB). In table of code page 1252 code value 174 is for character ® which is an indication that you want to output in batch file characters encoded in UTF-8 (also code value 174 for character ®) or Windows-1252.
A simple batch code for demonstration stored as ANSI file with code page Windows-1252.
#echo off
cls
echo This batch file was saved as ANSI file using code page Windows-1252.
echo.
echo Registered trademark symbol ® has code value 174 in Windows-1252.
echo.
echo But active code page is not Windows 1252 in console window.
echo.
chcp
echo.
echo Therefore the left guillemet character is output instead of registered
echo trademark symbol as this character has in code page 850 code value 174.
echo.
echo Press any key to continue ...
pause>nul
And batch files are for DOS/Windows and should therefore use carriage return + line-feed as line terminator instead of just line-feed (UNIX) or just carriage return (old MAC).
Some text editors display line terminator type and encoding respectively code page somewhere in status bar at bottom of main application window for active file.

Batch file keeps adding special characters to file name?

This is the first time ive posted here so I apologise if im in the wrong place.
I have a batch file that reads a list of domains from a text file and then does an nslookup ls against them, posting the results in their own text file.
Ive never had a problem with this until recently and I cant for the life of me work out why this has started happening.
All the files are perfect except for the first one! The first file name is always proceeded with "" (without the quotes) These files get read by another program I have written so it tends to cause a problem.
Heres the code that creates the files...
(
del /s /q "D:\Profile\Desktop\New_folder\Records\*.*"
for /f %%a in (D:\Profile\Desktop\New_folder\Domains\Domains.txt) do (
echo ls %%a >temp\tempfile.txt
echo exit >>temp\tempfile.txt
nslookup < temp\tempfile.txt > records\%%a.txt
)
)
Any help is much appreciated.
Cheers,
Aaron
According to IBM Extendend Characterset the characters you mentioned have the hex codes EF BB BF which is the UTF-8 byte order mark ("BOM"), see Wikipedia. This means that the file Domain.txt seems to have been saved using UTF-8 character encoding with BOM recently.
In order to get rid of the characters, simply edit the file and save it without a BOM. See e.g. to How to make Notepad to save text in UTF-8 without BOM? how to do that or search for "remove BOM"
Note that UTF-8 without BOM is compatible to printable ASCII, i.e. "normal" characters encoded as UTF-8 will show correctly in most common charactersets such as IBM Extended Characterset.
If you cannot or do not want to edit the input file then you might get rid of the prefix in your batch script, see Substrings in http://www.robvanderwoude.com/ntset.php#StrSubst - eventually something like
set BOM_REMOVED=false
for ...
set X=%%a
if %BOM_REMOVED%==false set X=%X:~3%
set BOM_REMOVED=true
echo ls %X >temp\tempfile.txt
...

What's with Ruby's ZipInputStream screwing up my line endings?

I'd be happy with ZipInputStream taking indecent liberties with the line endings that are stored in a file if it would at least get them right for the platform I'm storing the file on. Unfortunately, I pull a text file (.txt, .cpp. .etc.) out of a zip and the \n (0x0A) gets replaced with a \r\n (0x0d0a) and, as you can imagine, this is causing me a great deal of trouble.
Is there a flag I can set to tell it either to avoid changing the line endings altogether or to use one of my choosing?
Thanks.
(I've checked the zip file, my creation of it, etc. I've extracted it using other zip tools and verified that it is archived properly. I've stepped through my project with rdebug and seen that the ZipInputStream call to read() is returning \r\n for line endings.)
if you have an open(filename) or open(filename,"r") call in your code, try to replace it with open(filename,"rb")

Resources