How to create UTF-16 file in VBScript? - vbscript

My system is Windows 10 English-US.
I need to write some non-printable ASCII characters to a text file. For example, for the ASCII value 28, I want to write \u001Cw to the file. I don't have to do anything special when coding this in Java. Below is my code in VBS:
Dim objStream
Set objStream = CreateObject("ADODB.Stream")
objStream.Open
objStream.Type = 2
objStream.Position = 0
objStream.CharSet = "utf-16"
objStream.WriteText ChrW(28) 'Need this to appear as \u001Cw in the output file
objStream.SaveToFile "C:\temp\test.txt", 2
objStream.Close

You need a read-write stream so that writing to it and saving it to file both work.
Const adModeReadWrite = 3
Const adTypeText = 2
Const adSaveCreateOverWrite = 2
Sub SaveToFile(text, filename)
With CreateObject("ADODB.Stream")
.Mode = adModeReadWrite
.Type = adTypeText
.Charset = "UTF-16"
.Open
.WriteText text
.SaveToFile filename, adSaveCreateOverWrite
.Close
End With
End Sub
text = Chr(28) & "Hello" & Chr(28)
SaveToFile text, "C:\temp\test.txt"
Other notes:
I like to explicitly define with Const all the constants in the code. Makes reading so much easier.
A With block saves quite some typing here.
Setting the stream type to adTypeText is not really necessary, that's the default anyway. But explicit is better than implicit, I guess.
Setting the Position to 0 on a new stream is superfluous.
It's unnecessary to use ChrW() for ASCII-range characters. The stream's Charset decides the byte width when you save the stream to file. In RAM, everything is Unicode anyway (yes, even in VBScript).
There are two UTF-16 encodings supported by ADODB.Stream: little-endian UTF-16LE (which is the default and synonymous with UTF-16) and big-endian UTF-16BE, with the byte order reversed.
You can achieve the same result with the FileSystemObject and its CreateTextFile() method:
Set FSO = CreateObject("Scripting.FileSystemObject")
Sub SaveToFile(text, filename)
' CreateTextFile(filename [, Overwrite [, Unicode]])
With FSO.CreateTextFile(filename, True, True)
.Write text
.Close
End With
End Sub
text = Chr(28) & "Hello" & Chr(28)
SaveToFile text, "C:\temp\test.txt"
This is a little bit simpler, but it only offers a Boolean Unicode parameter, which switches between UTF-16 and ANSI (not ASCII, as the documentation incorrectly claims!). The solution with ADODB.Stream gives you fine-grained encoding choices, for example UTF-8, which is impossible with the FileSystemObject.
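To check which encoding either version actually produced, one option is to look at the first few bytes of the saved file. The sketch below is only an illustration: FirstBytesHex is a hypothetical helper name, and the path is the one used in the samples above. A UTF-16 little-endian BOM is the byte pair FF FE, and a UTF-8 BOM is EF BB BF.

```vbscript
Const adTypeBinary = 1

' Hypothetical helper: returns the first `count` bytes of a file
' as space-separated hex pairs.
Function FirstBytesHex(filename, count)
    Dim bytes, i, result
    With CreateObject("ADODB.Stream")
        .Type = adTypeBinary
        .Open
        .LoadFromFile filename
        bytes = .Read(count)   ' Null if the file is empty
        .Close
    End With
    result = ""
    If Not IsNull(bytes) Then
        For i = 1 To LenB(bytes)
            ' AscB(MidB(...)) extracts one byte; pad to two hex digits
            result = result & Right("0" & Hex(AscB(MidB(bytes, i, 1))), 2) & " "
        Next
    End If
    FirstBytesHex = Trim(result)
End Function

WScript.Echo FirstBytesHex("C:\temp\test.txt", 4)
```

WScript.Echo assumes the script runs under Windows Script Host (cscript or wscript).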
For the record, there are two ways to create a UTF-8-encoded text file:
The way Microsoft likes to do it, with a 3-byte Byte Order Mark (BOM) at the start of the file. Most, if not all, Microsoft tools do that when they offer "UTF-8" as an option; ADODB.Stream is no exception.
The way everyone else does it: without a BOM. This is correct for most uses.
To create a UTF-8 file with BOM, the first code sample above can be used with the Charset set to "UTF-8". To create a UTF-8 file without BOM, we can use two stream objects:
Const adModeReadWrite = 3
Const adTypeBinary = 1
Const adTypeText = 2
Const adSaveCreateOverWrite = 2
Sub SaveToFile(text, filename)
Dim iStr: Set iStr = CreateObject("ADODB.Stream")
Dim oStr: Set oStr = CreateObject("ADODB.Stream")
' one stream for converting the text to UTF-8 bytes
iStr.Mode = adModeReadWrite
iStr.Type = adTypeText
iStr.Charset = "UTF-8"
iStr.Open
iStr.WriteText text
' one stream to write bytes to a file
oStr.Mode = adModeReadWrite
oStr.Type = adTypeBinary
oStr.Open
' switch first stream to binary mode and skip UTF-8 BOM
iStr.Position = 0
iStr.Type = adTypeBinary
iStr.Position = 3
' write remaining bytes to file and clean up
oStr.Write iStr.Read
oStr.SaveToFile filename, adSaveCreateOverWrite
oStr.Close
iStr.Close
End Sub

Related

Classic ASP Base64 Encoding and Line Breaks

I have been using the base64 encoding function from this answer (code is below)
https://stackoverflow.com/a/506992/510296
I noticed that it is wrapping lines of output after the 72nd character (which causes problems when I try to pass that encoded string to the eBay API).
I can remove the line breaks easily enough with replace(base64string, vblf, "") but wanted to ask if there is a proper way to prevent line breaks in the output.
Function Base64Encode(sText)
Dim oXML, oNode
Set oXML = CreateObject("Msxml2.DOMDocument.3.0")
Set oNode = oXML.CreateElement("base64")
oNode.dataType = "bin.base64"
oNode.nodeTypedValue =Stream_StringToBinary(sText)
Base64Encode = oNode.text
Set oNode = Nothing
Set oXML = Nothing
End Function
Function Stream_StringToBinary(Text)
Const adTypeText = 2
Const adTypeBinary = 1
'Create Stream object
Dim BinaryStream 'As New Stream
Set BinaryStream = CreateObject("ADODB.Stream")
'Specify stream type - we want To save text/string data.
BinaryStream.Type = adTypeText
'Specify charset For the source text (unicode) data.
BinaryStream.CharSet = "us-ascii"
'Open the stream And write text/string data To the object
BinaryStream.Open
BinaryStream.WriteText Text
'Change stream type To binary
BinaryStream.Position = 0
BinaryStream.Type = adTypeBinary
'Ignore first two bytes - sign of
BinaryStream.Position = 0
'Open the stream And get binary data from the object
Stream_StringToBinary = BinaryStream.Read
Set BinaryStream = Nothing
End Function

Open an XML, Read Text, Replace some Text, Write the file back in UTF-8 without BOM format

My IBM MQ is not accepting XML file saved in UTF-8 format. I want to try if it accepts UTF-8 without BOM format. I have tried multiple things, but was unable to save the file in non-BOM format. My code is below.
Dim objStreamUTF8 : Set objStreamUTF8 = CreateObject("ADODB.Stream")
Dim objStreamUTF8NoBOM : Set objStreamUTF8NoBOM = CreateObject("ADODB.Stream")
With objStreamUTF8
.Charset = "UTF-8"
.Mode = 3
.Type = 2
.Open
objStreamUTF8.LoadFromFile uxtNewRenamePath'"C:\WINDOWS\Temp\DataFiles\TC10_ Apostrophe symbol in tag Cdtr_Adrline2_Pacs8_TxID201765133641.xml"
objStreamUTF8.Flush
strFileText = objStreamUTF8.ReadText()
strFileText = Replace(strFileText,"&amp;#", "&#")
strFileText = Replace(Replace(Replace(Replace(Replace(strFileText, "GreaterThanSymbol", ">"), "LessThanSymbol", "<"), "ApostropheSymbol", "'"), "AmpersandSymbol", "&"), "DoubleQuotesSymbol", chr(34))
Set objFSO = CreateObject("Scripting.FileSystemObject")
objFSO.DeleteFile uxtNewRenamePath, True
Set objFSO = Nothing
.Position = 0
.Flush
.WriteText strFileText
.SaveToFile uxtNewRenamePath, 2
.Close
.Type = 1
.Open
.Position = objStreamUTF8.Size
End With
With objStreamUTF8NoBOM
.Mode = 3
.Type = 1
.Open
objStreamUTF8.CopyTo objStreamUTF8NoBOM
.SaveToFile uxtNewRenamePath, 2
End With
objStreamUTF8.Close
objStreamUTF8NoBOM.Close
When objStreamUTF8NoBOM is saved to file, the file is completely blank. Can you please tell me what I am doing wrong or how I can save the file in the desired format?
Got it. The position of the first stream has to be set to 3 to skip the BOM bytes,
i.e. objStreamUTF8.Position = 3
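A minimal sketch of that fix, reduced to the essential steps and assuming the modified text is already in a string. SaveUtf8NoBom is a hypothetical helper name, not something from the question. Note that the source stream is kept open until after CopyTo; as far as I can tell, closing and reopening a stream leaves it empty, which would explain the blank output file.

```vbscript
Const adTypeBinary = 1
Const adTypeText = 2
Const adSaveCreateOverWrite = 2

' Hypothetical helper: encodes text as UTF-8 and saves it without the BOM.
Sub SaveUtf8NoBom(text, filename)
    Dim src: Set src = CreateObject("ADODB.Stream")
    Dim dst: Set dst = CreateObject("ADODB.Stream")

    ' Encode the text as UTF-8 (ADODB.Stream prepends a 3-byte BOM)
    src.Type = adTypeText
    src.Charset = "UTF-8"
    src.Open
    src.WriteText text

    ' Type can only be changed at position 0; then skip the BOM
    src.Position = 0
    src.Type = adTypeBinary
    src.Position = 3

    ' Copy everything after the BOM into a binary stream and save it
    dst.Type = adTypeBinary
    dst.Open
    src.CopyTo dst
    dst.SaveToFile filename, adSaveCreateOverWrite

    dst.Close
    src.Close
End Sub
```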

UCS-2 Little Endian to UTF-8 conversion leaves file with many unwanted characters

I have a script that I put together after going over many different ways that I could do an encoding conversion using ADODB in VBScript.
Option Explicit
Sub UTFConvert()
Dim objFSO, objStream, file
file = "FileToConvert.csv"
Set objStream = CreateObject( "ADODB.Stream" )
objStream.Open
objStream.Type = 2
objStream.Position = 0
objStream.Charset = "utf-8"
objStream.LoadFromFile file
objStream.SaveToFile file, 2
objStream.Close
Set objStream = Nothing
End Sub
UTFConvert
The file is supposed to be converted from UCS-2 Little Endian, or whichever readable format it is in (within limitations), to UTF-8. The issue however is that once this file has finished converting to UTF-8 there are many NUL symbols throughout the entire file before and after every letter, and xFF xFE (UCS-2 LE BOM) at the start of the file. These are visible without needing to use any symbol visualization toggles. Any help would be appreciated in understanding where I may be limited with this conversion. Or any alternative approach I can take.
Your Stream object is loading the file as a UTF-8 encoded file, thus misinterpreting the byte sequences. Read the file using a FileSystemObject instance and write it with the ADODB.Stream object:
Sub UTFConvert(filename)
Set fso = CreateObject("Scripting.FileSystemObject")
txt = fso.OpenTextFile(filename, 1, False, -1).ReadAll
Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Type = 2 'text
stream.Position = 0
stream.Charset = "utf-8"
stream.WriteText txt
stream.SaveToFile filename, 2
stream.Close
End Sub

How do I read a text file's content from the web in Visual Basic 6.0 line by line?

I need source code for reading the .txt content from a URL.
My text file content sample and then load in Visual Basic 6.0:
My source code:
Dim data As String
data = Inet1.OpenURL("http://test.com/sample.txt")
Text1.Text = data
There is nothing that will only "download" a line at a time as it can't tell where the line breaks are until it's downloaded it.
If you only want to read/process a line at a time, you can split on the line breaks after downloading it:
Dim Data As String
Dim DataLines() As String
Data = Inet1.OpenURL("http://test.com/sample.txt")
DataLines = Split(Data, vbCrLf)
For Index = LBound(DataLines) to UBound(DataLines)
MsgBox DataLines(Index)
Next
You will need to be careful to make sure you have the correct line break for the data being read.
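If the line-break style of the downloaded text is not known in advance, one possible approach is to normalize all line endings before splitting. This is a sketch only; SplitLines is a hypothetical helper name, written in untyped VBScript-style syntax.

```vbscript
' Hypothetical helper: splits text into lines whether it uses
' CRLF, LF or CR line endings.
Function SplitLines(data)
    Dim normalized
    ' Convert CRLF first so it does not turn into two separate breaks
    normalized = Replace(data, vbCrLf, vbLf)
    normalized = Replace(normalized, vbCr, vbLf)
    SplitLines = Split(normalized, vbLf)
End Function
```

Text that ends with a line break yields a trailing empty element, so a caller may want to skip empty entries.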
When dealing with HTTP you have to consider both line separators and character encoding. If you can make assumptions after testing then you can bypass some checking and just hard-code to fit your needs.
However, the creaky old Internet Transfer Control ("Inet") is usually not the best choice available. More modern alternatives have shipped as part of Windows since at least the advent of IE 5.5, and are installed with IE 5.5 on more ancient versions of Windows. Thus they'll even be available and work on nearly any Win95 system still running today.
'References to MSXML 3.0 or later,
' ADO 2.5 or later.
Private Function GetHttpText(ByVal URL As String) As ADODB.Stream
Dim Req As MSXML2.XMLHTTP
Dim CharSet As String
Dim CharsetPos As Long
Dim LineSeparator As LineSeparatorEnum
Set Req = New MSXML2.XMLHTTP
Set GetHttpText = New ADODB.Stream
With GetHttpText
.Open
.Type = adTypeBinary
With Req
.Open "GET", URL, False
.send
CharSet = LCase$(.getResponseHeader("CONTENT-TYPE"))
End With
.Write Req.responseBody
CharsetPos = InStr(CharSet, "charset")
If CharsetPos Then
CharSet = Split(Mid$(CharSet, CharsetPos), "=")(1)
Else
'UTF-8 is a reasonable "default" these days:
CharSet = "utf-8"
End If
If CharSet = "utf-8" Then
LineSeparator = adLF
Else
'Your mileage may vary here, since there is no line-end
'header defined for HTTP:
LineSeparator = adCRLF
End If
.Position = 0
.Type = adTypeText
.CharSet = CharSet
.LineSeparator = LineSeparator
End With
End Function
Private Sub DumpTextLineByLine()
With GetHttpText("http://textfiles.com/art/simpsons.txt")
'Read text line by line to populate a multiline TextBox
'just as a demonstration:
Do Until .EOS
Text1.SelText = .ReadText(adReadLine)
Text1.SelText = vbNewLine
Loop
.Close
End With
End Sub

Writing to file with ISO-8859-1 encoding

I have some vb6 code that is stubbornly writing to Windows-1252.
Open fileName For Binary Access Write As #fileNo
Put #fileNo, , contents
Close #fileNo
I managed to make it write to UTF-16 (LE) by doing this;
contents = ChrW$(&HFEFF&) & contents
Is there any way I could easily make it write to ISO-8859-1? Examples/suggestions would be greatly appreciated here.
If your files are not huge then ADO can come to the rescue for quick and dirty handling of odd encodings.
Example:
Option Explicit
Private Sub Main()
Const contents As String = "Hello World. (4 × 6) ÷ 8 = 3 €€€ ƒƒƒ"
Dim Stm As ADODB.Stream
Set Stm = New ADODB.Stream
With Stm
.Open
.Type = adTypeText
.Charset = "iso-8859-1"
.LineSeparator = adLF
.WriteText contents, adWriteLine
.SaveToFile "ISO-8859-1.txt", adSaveCreateOverWrite
.Close
.Open
.Type = adTypeText
.Charset = "windows-1252"
.LineSeparator = adCRLF
.WriteText contents, adWriteLine
.SaveToFile "Windows-1252.txt", adSaveCreateOverWrite
.Close
End With
MsgBox "Done"
End Sub
Windows-1252 is essentially a superset of ISO-8859-1; just write your data as Windows-1252, and don't use any of the Windows-1252 characters that aren't also ISO-8859-1 characters.
VB6 character output is encoded in the default machine code page for non-Unicode programs.
If you can manage to set your machine code page to 28591, which is the Windows code page for ISO-8859-1, then you can be absolutely sure that your output will be ISO-8859-1.
Alternatively just avoid the characters where Windows 1252 differs from ISO-8859-1. Wikipedia says that is 128 to 159. You could detect them and substitute with question marks or throw an error.
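The detect-and-substitute idea could be sketched as follows. Instead of testing byte values 128 to 159, this variant checks Unicode code points with AscW, relying on the fact that ISO-8859-1 covers exactly the code points 0 to 255; the Windows-1252 extras such as the euro sign have higher code points. ReplaceNonLatin1 is a hypothetical helper name.

```vbscript
' Hypothetical helper: replaces every character that has no
' ISO-8859-1 representation with a question mark.
Function ReplaceNonLatin1(text)
    Dim i, ch, code, result
    result = ""
    For i = 1 To Len(text)
        ch = Mid(text, i, 1)
        code = AscW(ch)
        ' AscW returns a negative value for code points above 32767
        If code < 0 Or code > 255 Then
            result = result & "?"
        Else
            result = result & ch
        End If
    Next
    ReplaceNonLatin1 = result
End Function
```

To raise an error instead of substituting, the "?" branch could call Err.Raise with an application-defined error number.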
