CMD: clip command issue

I have found a weird issue using the clip command of Windows CMD.
I created a simple text file containing this text:
^.*A{0,0}.*$
Then I ran the command clip < PATH_TO_THE_TEXT_FILE in CMD.
Finally, I tried pasting the copied text into text editors such as Notepad and Notepad++, and what I got was some weird Japanese characters. This issue can be reproduced every time and on different PCs.
Can you please tell me what is causing this issue and how I can make the clip command copy the actual text in the text file, and not the weird Japanese characters?

what is causing this
Incorrect encoding conversion, ASCII String -> bytes -> UTF-16 String
// JavaScript code to emulate this
bytes = new TextEncoder().encode('^.*A{0,0}.*$')   // TextEncoder always produces UTF-8 (here plain ASCII) bytes
new TextDecoder('utf-16le').decode(bytes)          // reading those bytes as UTF-16LE yields CJK characters
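The same mis-read can also be reproduced in Python, if that is easier to try locally (a minimal sketch, not specific to clip itself; the string is the one from the question):
# Read the regex's ASCII bytes as if they were UTF-16LE code units
raw = b'^.*A{0,0}.*$'
print(raw.decode('utf-16-le'))  # pairs of ASCII bytes become CJK/hiragana characters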
how can I make the clip command copy the actual text
type PATH_TO_THE_TEXT_FILE | clip
Update: clip works if there is a newline at the end of the file.

Based on what 7cc said, I figured out how to make it work.
I need to create the text file using UTF-16LE encoding, and then it works.
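For reference, one way to produce such a file programmatically is sketched below in Python (the file name is a placeholder; any editor that can save UTF-16LE with a BOM works just as well):
# Save the pattern as UTF-16 (Python's 'utf-16' codec writes a BOM followed by native-endian, i.e. little-endian on Windows, text)
with open('pattern.txt', 'w', encoding='utf-16') as f:
    f.write('^.*A{0,0}.*$\n')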

Related

Convert mangled characters back to UTF-8

Here is what I did:
I dumped a SQLite database with UTF-8 data (sqlite3 example.db .dump > dump.sql), but since this was in PowerShell, I assume the piping converted it to Windows-1252
I loaded that dumped data into a new database, again using PowerShell (Get-Content dump.sql | sqlite3 example2.db)
I dumped that new database and am left with a new .sql file (this time it was not through PowerShell, so I assume it was unmodified)
This new sql file's UTF-8 characters are seriously mangled, and I was wondering if there was a way to convert it back into correct UTF-8.
As a few examples, here is what some sequences look like in the new file, and what they should be (all are viewed as UTF-8):
ÒüéÒü¬ÒüƒÒü½ should be あなたに
´╝ü should be a full width exclamation mark
Òé¡Òé╗Òé¡ should be キセキ
Does anyone have any idea as to how I might undo this mangling? Any method would be very helpful!
This is in PowerShell 7.0.1
Edit:
On further inspection, you can duplicate my predicament by redirecting any such data to a file in PowerShell (note that the data cannot simply be typed into PowerShell directly). Hence, setting up a script like this gives the same outcome:
test.sh
#!/bin/bash
echo "キ"
And then running wsl ./test.sh > test.txt will give an output of Òé¡, not キ
Edit 2:
It seems as if the codepage the UTF-8 text was converted to is almost 437: some characters are restored using this assumption (e.g. 木), but others are not. If it's close to 437, but isn't, what could it be?
It turns out that, since I am in the UK, the code page I wanted was 850. Saving the file's text as code page 850 and then reloading those bytes as UTF-8 fixed my issue!
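For anyone wanting to script that fix, the same round trip can be written in Python (a minimal sketch; the file names are placeholders, and it assumes the text really took the UTF-8 -> code page 850 -> UTF-8 route described above):
# Undo the mangling: re-encode the visible characters as cp850, then decode the bytes as UTF-8
mangled = 'Òé¡Òé╗Òé¡'
print(mangled.encode('cp850').decode('utf-8'))  # -> キセキ
# The same idea applied to a whole file
with open('dump_mangled.sql', encoding='utf-8') as f:
    text = f.read()
with open('dump_fixed.sql', 'w', encoding='utf-8') as f:
    f.write(text.encode('cp850').decode('utf-8'))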

InstallScript GetLine() cannot read a text file containing output from the command prompt

My installation needs to check the result of a command run from cmd.exe. Thus, I redirect the command's output to a text file and then try to read the file to get the result as follows:
// send command to cmd to execute and redirect the result to a text file
// try to read the file
szDir = "D:\\";
szFileName = "MyFile.txt";
if Is(FILEEXISTS, szDir ^ szFileName) then
    listID = ListCreate(STRINGLIST);
    if listID != LIST_NULL then
        if OpenFileMode(FILE_MODE_NORMAL) = 0 then
            if OpenFile(nFileHandle, szDir, szFileName) = 0 then
                // I run into the problem here
                while (GetLine(nFileHandle, szCurLine) = 0)
                    ListAddString(listID, szCurLine, AFTER);
                endwhile;
                CloseFile(nFileHandle);
            endif;
        endif;
    endif;
endif;
The problem is that right after the command is executed and its output is redirected to MyFile.txt, I can set the open-file mode and open the file, but I cannot read any text into my list. ListReadFromFile() does not help either. If I open the file, edit it and save it manually, my script works.
After debugging, I found that GetLine() returns an error code (-1), which means the file pointer is at the end of the file or some other error occurred. However, FILE_MODE_NORMAL opens the file as read-only and SETS THE FILE POINTER AT THE BEGINNING OF THE FILE.
What could I possibly have done wrong? Is this something to do with read/write access to the file? I tried this command without success:
icacls D:\MyFile.txt /grant Administrator:(R,W)
I am using InstallShield 2018 and Windows 10 64-bit, by the way. Your help is much appreciated.
EDIT 1: I suspected the encoding and tried a few things:
After running "wslconfig /l", the content of MyFile.txt opened in Notepad++ shows no encoding, but it still appears normal and readable. I tried converting the content to UTF-8, but it did not work.
If I add something to the file (echo This line is appended >> MyFile.txt), the encoding changes to UTF-8, but the content from step 1 changes as well: a NULL (\0) appears between every character and even replaces the newline characters. Maybe this is why GetLine() fails to read the file.
Workaround: after step 1, I run find "my_desired_content" MyFile.txt > TempFile.txt and read TempFile.txt (which is encoded in UTF-8).
My ultimate goal is to check whether "my_desired_content" appears in the output of "wslconfig /l", so this is fine. However, what I don't understand is that both MyFile.txt and TempFile.txt are created from cmd commands, yet they are encoded differently?
The problem is due to the contents of the file. Assuming this is the file generated by your linked question, you can examine its contents in a hex editor to find out the following facts:
Its contents are encoded in UTF-16 (LE) without a BOM
Its newlines are encoded as CR or CR CR instead of CR LF
I thought the newlines would be more important than the text encoding, but it turns out I had it backwards. If I change each of these things independently, GetLine seems to function correctly for either CR, CR CR, or CR LF, but only handles UTF-16 when the BOM is present. (That is, in a hex editor, the file starts with FF FE 57 00 instead of 57 00 for a file starting with the character W.)
I'm at a bit of a loss for the best way to address this. If you're up for a challenge, you could read the file with FILE_MODE_BINARYREADONLY, and can use your extra knowledge about what should be in the file to ensure you interpret its encoding correctly. Note that for most of UTF-16, you can create a single code unit by combining two bytes in the following manner:
szResult[i] = (nHigh << 8) + nLow;
where nHigh and nLow are probably values like szBuffer[2*i + 1] and szBuffer[2*i], assuming you filled a STRING szBuffer by calling ReadBytes.
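If it helps to see that byte-pairing outside InstallScript, here is the same idea as a minimal Python sketch (the file name is a placeholder, and it treats every code unit as a character, so surrogate pairs are ignored just as the note above warns):
# Combine UTF-16LE byte pairs into code units by hand: (high << 8) + low
data = open('MyFile.txt', 'rb').read()
units = [(data[2*i + 1] << 8) | data[2*i] for i in range(len(data) // 2)]
print(''.join(chr(u) for u in units))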
Other unproven ideas include editing it in binary to ensure the BOM (FF FE) is present, figuring out ways to ensure the file is originally created with the BOM, figuring out ways to create it in an alternate encoding, finding another command you can invoke to "fix" the file, or lodging a request with the vendor (my employer) and hoping the development team changes something to better handle this case.
Here's an easier workaround. If you can safely assume that the command will append UTF-16 characters without a signature, you can append this output to a file that has just a signature. How do you get such a file?
You could create a file with just the BOM in your development environment, and add it to your Support Files. If you need to use it multiple times, copy it around first.
You could create it with code. Just call the following (error checking omitted for clarity)
OpenFileMode(FILE_MODE_APPEND_UNICODE);
CreateFile(nFileHandle, szDir, szFileName);
CloseFile(nFileHandle);
and if szDir ^ szFileName didn't exist, it will now be a file with just the UTF-16 signature.
Assuming this file is called sig.txt, you can then invoke the command
wslconfig /l >> sig.txt to write to that file. Note the doubled >> for append. The resulting file will include the Unicode signature you created ahead of time, plus the Unicode data output from wslconfig, and GetLine should interpret things correctly.
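If it is more convenient to ship the seed file than to create it with InstallScript code, the BOM-only file can also be produced ahead of time; for example, with a minimal Python sketch (the name sig.txt is just the one used above):
# Write only the UTF-16LE byte order mark (FF FE); later >> appends will then be read as Unicode
with open('sig.txt', 'wb') as f:
    f.write(b'\xff\xfe')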
The biggest problem here is that this hardcodes around the behavior of wslconfig, and that behavior may change at any point. This is why Christopher alludes to recommending an API, and I agree completely. In the meantime, you could try to make this more robust by invoking it via cmd /U (but my understanding of what that does or guarantees is fuzzy at best), or by trying the original way first and then the BOM approach.
This whole WSL thing is pretty new. I don't see any APIs for it, but rather than screen-scraping command outputs, you might want to look at this registry key:
HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Lxss
It seems to have the list of installed distros that come from the store. Coming from the store probably explains why this is HKCU and not HKLM.
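As a rough illustration only (the per-distro value name "DistributionName" is an assumption based on current Windows builds, not a documented contract), the key can be enumerated like this in Python:
# Enumerate WSL distro entries under HKCU\...\Lxss (value name is an assumption)
import winreg

LXSS = r'SOFTWARE\Microsoft\Windows\CurrentVersion\Lxss'
with winreg.OpenKey(winreg.HKEY_CURRENT_USER, LXSS) as root:
    index = 0
    while True:
        try:
            guid = winreg.EnumKey(root, index)
        except OSError:
            break  # no more subkeys
        with winreg.OpenKey(root, guid) as key:
            try:
                name, _ = winreg.QueryValueEx(key, 'DistributionName')
                print(name)
            except FileNotFoundError:
                pass  # subkey without the expected value
        index += 1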
A brave new world.... sigh.

How to get the UTF code of a symbol in Linux

I'm struggling with a special symbol in a text file on linux. I actually successfully pasted it between the following letters "a‏a" (my cursor in Geany stops but no character is displayed).
I'd like to know the easiest way to get its Unicode code point (in the form U+0000). I'm using Ubuntu and Geany, and I tried hexdump on a file containing it, but I'm obviously missing something.
You could open the file with vim, put the text cursor over the character, then type 'ga' (without quotes) and it will display the character code in decimal, hex and octal in the status line.
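If you'd rather stay on the command line, a minimal Python sketch does the same job (the file name is a placeholder; it prints the code point of every non-ASCII character in the file):
# Print U+XXXX for each non-ASCII character in the file
with open('file.txt', encoding='utf-8') as f:
    for ch in f.read():
        if ord(ch) > 0x7F:
            print(f'U+{ord(ch):04X}', repr(ch))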

Unknown Character

I am facing an issue with an unknown character.
I am trying to compile some packages in a database through a script and got the error below:
SP2-0734: unknown command beginning "?SET DEF..." - rest of line ignored.
When I open the log file in Notepad++, it shows the line as above.
Now, if I open the same log file in the SciTE editor, it shows the line as:
SP2-0734: unknown command beginning "SET DEF..." - rest of line ignored.
I can't figure out what the issue could be.
Any help would be welcome.
Your script has an unprintable character at the start (as you discovered from the comments), which some editors don't display at all, and others display as an unknown character. "" is the byte order mark:
The UTF-8 representation of the BOM is the byte sequence
0xEF,0xBB,0xBF. A text editor or web browser interpreting the text as
ISO-8859-1 or CP1252 will display the characters  for this.
From that article, some editors (notably Notepad) add it automatically. It should be safe to open the file with a hex editor and remove the extra bytes, and you'll then be able to run the script normally.
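If a hex editor is inconvenient, the same cleanup can be scripted; a minimal Python sketch (the script name is a placeholder):
# Strip a leading UTF-8 BOM (EF BB BF) from the file, if present
path = 'compile_packages.sql'
with open(path, 'rb') as f:
    data = f.read()
if data.startswith(b'\xef\xbb\xbf'):
    with open(path, 'wb') as f:
        f.write(data[3:])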

CreateTextfile() > write does not work

VBScript delivers an "Illegal Argument" message when trying to write the text shown below to a file using the following code. If I change resultStr to some test text, it works. What could be the problem?
Set resFile = fs.CreateTextfile(resFilePath, true)
resFile.write resultStr
resFile.close
Contents of resultStr:
Your string looks like it contains non-ASCII characters. You need to pass an extra True argument to CreateTextfile to open the text file using a Unicode encoding (probably UTF-16 on Windows).
If you want to write UTF-8 to the file, see Writing UTF8 text to file.
