vbs file saved as utf-8 does not run - vbscript

I have been writing a .vbs script in Notepad that adds text to an Excel file, and this is working fine.
I then needed to write Unicode characters to the Excel file, so I saved the .vbs file as Unicode, and again everything worked fine.
I am now trying to write the file dynamically from another program, which is possible, but that program writes the Unicode .vbs file with UTF-8 encoding, and when I try to run the .vbs file it gives an error:
Error: Invalid character
Code: 800A0408
Source: Microsoft VBScript compilation error
Does this mean I cannot run a file saved with UTF-8 encoding, or am I missing something?
Any help would be gratefully received!
Dave

Use UCS-2 Little Endian (UTF-16 LE); that accepts Unicode characters and the script runs properly. You can convert any existing .vbs file to this format with Notepad++, for example.

CScript.exe/WScript.exe can't run UTF-8 encoded .vbs files. If you can't change the encoding/write mode of that 'other program', you either have to convert the UTF-8 source to UTF-16, or write/generate the code in plain ASCII and inject the Unicode data via ChrW() or from an external file encoded as UTF-16 (easy) or UTF-8 (via ADODB.Stream).
Regarding the comment:
As long as you don't use non-ASCII characters in string literals - and you can avoid that for a few of them by using ".." & ChrW(..) & ".." - you can save the .vbs as ASCII. If your 'other program' loads and saves such a file as UTF-8 (without BOM!) it doesn't matter; but if it adds a BOM, you must convert the source file.
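A minimal sketch of the ASCII-only approach (the paths, cell addresses, and sample code points are placeholders, not taken from the original question):

Option Explicit
' ASCII-only .vbs source: no non-ASCII characters appear in this file,
' yet it can still hand Unicode text to Excel.
Dim xl, wb
Set xl = CreateObject("Excel.Application")
Set wb = xl.Workbooks.Add()

' Option 1: build the string from code points with ChrW()
' (U+03A9, GREEK CAPITAL LETTER OMEGA, as an example).
wb.Sheets(1).Range("A1").Value = "Omega: " & ChrW(&H3A9)

' Option 2: read the text from an external UTF-8 file via ADODB.Stream.
Dim stm, unicodeText
Set stm = CreateObject("ADODB.Stream")
stm.Type = 2                ' adTypeText
stm.Charset = "UTF-8"
stm.Open
stm.LoadFromFile "C:\data-utf8.txt"
unicodeText = stm.ReadText
stm.Close
wb.Sheets(1).Range("A2").Value = unicodeText

wb.SaveAs "C:\unicode-test.xlsx", 51   ' 51 = xlOpenXMLWorkbook
xl.Quit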
Perhaps you should add some more details/code to improve your chances of getting better advice.

Related

How do WritePrivateProfileStringA() and Unicode go together?

I'm working on a legacy project that uses INI files and I'm currently trying to understand how Microsoft deals with INI files.
The documentation of WritePrivateProfileStringA() [MSDN] says for lpFileName:
If the file was created using Unicode characters, the function writes Unicode characters to the file. Otherwise, the function writes ANSI characters.
What exactly does that mean? What is a file "created using Unicode characters"? How does Microsoft determine whether a file was created using Unicode characters or not?
Since this is documented under lpFileName, do they refer to Unicode characters in the file name, like "if the file has a Japanese file name, we'll read it as Unicode"?
By default, neither the ...A() nor the ...W() function supports Unicode as file content for INI files. If, for example, the file does not exist, both will create a file with ANSI content.
However, if you create the INI file first and you give it a UTF-16 BOM (byte-order-mark), both ...A() and ...W() will respect that BOM and write UTF-16 characters to the file.
Other than the BOM, the file can be empty, so a 2 byte file with 0xFF 0xFE content is enough to get the Microsoft API to write Unicode characters.
Both methods will not recognize and respect a UTF-8 BOM. In fact, a UTF-8 BOM can break an existing file if the UTF-8 BOM and the first section are both in line 1. In that case you can't access any of the keys in the affected section. If the first section is in line 2, the UTF-8 BOM will have no effect.
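A hedged sketch in C of that seeding trick (the path, section, and key names are placeholders):

/* Seed the INI file with a UTF-16 LE BOM (0xFF 0xFE) so the Profile API
   writes Unicode content instead of ANSI. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    const wchar_t *path = L"C:\\temp\\settings.ini";

    /* Create the file containing only the BOM if it does not exist yet. */
    if (GetFileAttributesW(path) == INVALID_FILE_ATTRIBUTES) {
        FILE *f = _wfopen(path, L"wb");
        if (f == NULL) return 1;
        fputc(0xFF, f);
        fputc(0xFE, f);
        fclose(f);
    }

    /* Both the A and the W variant now honor the BOM and write UTF-16. */
    WritePrivateProfileStringW(L"Section", L"Key", L"Value \u00E9", path);
    return 0;
}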
My tests on Windows 10 21H1 cannot confirm a statement about UTF-16 BE support from 2006:
Just for fun, you can even reverse the BOM bytes and WritePrivateProfileString will write to it as a UTF-16 BE (Big Endian) file!

How to open a (CSV) file in Oracle and save it in UTF-8 format if it is in another format

Can anyone please advise me on the below issue.
I have an Oracle program which takes a .CSV file as input and processes it. We are now facing an issue: when an extended ASCII character appears in the input file, the letter after that special character is trimmed.
We are using the file utility functions Utl_File.Fopen_Nchar() to open the file and Utl_File.Get_Line_Nchar() to read the characters from it. The program is written to handle multiple languages (Unicode characters) in the input file.
During analysis we found that when the character encoding of the CSV file is UTF-8, the file is processed successfully even when it contains extended ASCII as well as Unicode characters. But sometimes we receive the file in 1252 (ANSI - Latin I) format, which causes the trimming problem for extended ASCII characters.
So is there any way to handle this issue? Can we open a (CSV) file in Oracle and save it in UTF-8 format if it's in another format?
Please let me know if any more info is needed.
Thanks in anticipation.
The problem is that if you don't know which encoding your CSV file is saved in, it is not possible to determine the right conversion either; you would corrupt your CSV file.
What do you mean by "1252 (ANSI - Latin I)"?
Windows-1252 and ISO-8859-1 are not equal, see the difference here: ISO 8859-1 vs. ISO 8859-15 vs. Windows-1252 vs. Unicode
(Sorry for posting the German Wikipedia, however the English version does not show such a nice table)
You could use the fix_latin command-line tool to convert a file from an unknown mixture of ASCII / Latin-1 / CP1252 / UTF-8 into UTF-8:
fix_latin < input.csv > output.csv
The fix_latin utility is a simple Perl script which is shipped with the Encoding::FixLatin module on CPAN.
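If the incoming files are known to be plain Windows-1252 (rather than an unknown mixture), a simple iconv call outside the database does the same job; the file names here are placeholders:
iconv -f WINDOWS-1252 -t UTF-8 input.csv > output.csv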

Generate UTF-8 file with NotesStream

I'm trying to export some text to a UTF-8 file with LotusScript. I checked the documentation, and the following lines should output my text as UTF-8, but Notepad++ says it's ANSI.
Dim streamCompanies As NotesStream
Dim sesCurrent As New NotesSession
Set streamCompanies = sesCurrent.CreateStream
Call streamCompanies.Open("C:\companies.txt", "UTF-8")
Call streamCompanies.WriteText("Test")
streamCompanies.Close
When I try the same with UTF-16 instead of UTF-8, the generated file format is correct. Could anyone point me in the right direction on how to write a UTF-8 file with LotusScript on a Windows platform?
Notes is most likely doing its job and encoding properly. It is likely that Notepad++ is interpreting the UTF-8 file as ANSI if no UTF-8-only characters exist in the file. There is no other way to determine the encoding in this case other than to analyze its contents.
See this SO answer: How to avoid inadvertent encoding of UTF-8 files as ASCII/ANSI?
So a simple test to make sure Notes is working would be to output a non-ANSI character and then open in Notepad++ to confirm.
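A minimal sketch of such a test, assuming your Notes version provides the UChr$ function (which returns the character for a given Unicode code point) and using a placeholder path:

Dim sesCurrent As New NotesSession
Dim streamCompanies As NotesStream
' Write a character outside Latin-1 (Katakana 'a', U+30A2) so an editor
' actually has something to detect UTF-8 by.
Set streamCompanies = sesCurrent.CreateStream
Call streamCompanies.Open("C:\companies_utf8_test.txt", "UTF-8")
Call streamCompanies.WriteText("Test " & UChr$(&H30A2))
Call streamCompanies.Close()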
Closed - later on while coding I stumbled across some data with Asian characters which were displayed correctly in my text editor. Rechecking the file encodings, I found the following:
If the output text only includes ASCII characters, it is detected as ANSI by Notepad++
If the output text contains e.g. Katakana, it is detected as UTF-8 by Notepad++
-> problem solved for me.

How does Windows Notepad interpret characters?

I was wondering how Windows interprets characters.
I made a file with a hex editor with the 3 bytes E3 81 81.
Those bytes are the ぁ character in UTF-8.
I opened the file in Notepad and it displayed ぁ. I didn't specify the encoding of the file, I just created the bytes, and Notepad interpreted them correctly.
Is notepad somehow guessing the encoding?
Or is the hex editor saving those bytes with a specific encoding?
If the file only contains these three bytes, then there is no information at all about which encoding to use.
A byte is just a byte, and there is no way to include any encoding information in it. Besides, the hex editor doesn't even know that you intended to decode the data as text.
Notepad normally uses ANSI encoding, so if it reads the file as UTF-8 then it has to guess the encoding based on the data in the file.
If you save a file as UTF-8, Notepad will put the BOM (byte order mark) EF BB BF at the beginning of the file.
Notepad makes an educated guess. I don't know the details, but loading the first few kilobytes and trying to convert them from UTF-8 is very simple, so it probably does something similar to that.
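A rough sketch of what such a check amounts to - a simplified UTF-8 validator in C (it ignores overlong forms and surrogate ranges, which a stricter implementation would also reject):

/* Returns 1 if the buffer decodes cleanly as UTF-8, 0 otherwise.
   Legacy code-page text containing non-ASCII bytes almost never passes. */
#include <stddef.h>

int looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t extra;
        if (b < 0x80) { i++; continue; }          /* plain ASCII byte   */
        else if ((b & 0xE0) == 0xC0) extra = 1;   /* 2-byte sequence    */
        else if ((b & 0xF0) == 0xE0) extra = 2;   /* 3-byte sequence    */
        else if ((b & 0xF8) == 0xF0) extra = 3;   /* 4-byte sequence    */
        else return 0;                            /* invalid lead byte  */
        if (i + extra >= len) return 0;           /* truncated sequence */
        for (size_t k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80) return 0;  /* bad continuation */
        i += extra + 1;
    }
    return 1;
}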
...and sometimes it gets it wrong...
https://ychittaranjan.wordpress.com/2006/06/20/buggy-notepad/
There is an easy and efficient way to check whether a file is in UTF-8. See Wikipedia: http://en.wikipedia.org/w/index.php?title=UTF-8&oldid=581360767#Advantages, fourth bullet point. Notepad probably uses this.
Wikipedia claims that Notepad used the IsTextUnicode function, which checks whether a particular text is written in UTF-16 (it may have stopped using it in Windows Vista, which fixed the "Bush hid the facts" bug): http://en.wikipedia.org/wiki/Bush_hid_the_facts.
How can I identify which encoding a file is in?
Open the file and choose Save As...; the dialog shows the current (default) encoding of the file, i.e. the encoding it was saved in.

Batch convert to UTF8 using Ruby

I'm encountering a little problem with my file encodings.
Sadly, as of yet I still am not on good terms with everything where encoding matters; although I have learned plenty since I began using Ruby 1.9.
My problem at hand: I have several files to be processed, which are expected to be in UTF-8 format. But I do not know how to batch convert those files properly; e.g. in Ruby, I open the file, encode the string to UTF-8 and save it in another place.
Unfortunately that's not how it is done - the file is still in ANSI.
At least that's what my Notepad++ says.
I find it odd though, because the string was clearly encoded to UTF-8, and I even set the File.open parameter :encoding to 'UTF-8'. My shell is set to CP65001, which I believe also corresponds to UTF-8.
Any suggestions?
Many thanks!
/e: What's more, when in Notepad++, I can convert manually as such:
Selecting everything,
copy,
setting encoding to UTF-8 (here, \x-escape-sequences can be seen)
pasting everything from clipboard
Done! Escape-characters vanish, file can be processed.
Unfortunately that's not how it is done - the file is still in ANSI. At least that's what my Notepad++ says.
UTF-8 was designed to be a superset of ASCII, which means that the ASCII characters are encoded identically in UTF-8. For this reason it's not possible to distinguish between ASCII and UTF-8 unless the text contains "special" characters, which are represented using multiple bytes in UTF-8.
It's quite possible that your conversion is actually working, but you can double-check by trying your program with special characters.
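A hedged Ruby sketch of the whole round trip, assuming the input files really are Windows-1252 (adjust the source encoding to whatever they actually are; the directory names are placeholders):

# Read each file declaring its actual source encoding, transcode to UTF-8,
# and write the result explicitly as UTF-8.
Dir.glob('input/*.txt') do |path|
  text = File.read(path, encoding: 'Windows-1252')
  utf8 = text.encode('UTF-8', invalid: :replace, undef: :replace)
  File.write(File.join('output', File.basename(path)), utf8, mode: 'w:UTF-8')
end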
Also, one of the best utilities for converting between encodings is iconv, which also has ruby bindings.
