vbscript: convert unicode string to array of bytes - vbscript

I have unicode string passed to vbscript procedure (not visual basic 6, but vbscript). I want to iterate unicode string char by char, get code for every symbol, truncate code to byte range [0..255] and create array of bytes.
This way new array of bytes should be twice smaller in memory compared to original unicode string. I am going save this array to file via ADODB.Stream object further
How can I convert unicode string to bytes array with symbol code truncated to byte range?
Thank you in advance!

Firstly, translating unicode to ascii will only work if your string only contains ascii characters. Since unicode contains ascii, it is just a matter of removing every second character.
Look up unicode on the internet for details.
EDIT: In unicode, every ascii character is proceeded with a NULL (0) byte. Remove this byte to convert the string to ASCII.

It seems there is no way to create array of bytes in vbs (though it's very straightforward in visual basic) -- all arrays are arrays of variants.
The task was to send binary stream from server to vbs script via string type. I have found the solution by creating Xml Document on the server with CDATA section that contains base64 coded array of bytes as string data.
Client (vbs) do the following:
set xmlDoc = CreateObject("Microsoft.XmlDom")
xmlDoc.loadXML(dataFromServer)
base64str = xmlDoc.DocumentElement.Text ' it's base64 coded binary stream
arrayOfBytes = decodeBase64(base64str)
Function decodeBase64(base64)
set dm = CreateObject("Microsoft.XMLDOM")
set el = dm.createElement("tmp")
el.DataType = "bin.base64"
el.Text = base64
decodeBase64 = el.NodeTypedValue
set dm = Nothing
End Function

This function creates an array of bytes:
' http://www.motobit.com/tips/detpg_binarytostring/
Function MultiByteToBinary(MultiByte)
'� 2000 Antonin Foller, http://www.motobit.com
' MultiByteToBinary converts multibyte string To real binary data (VT_UI1 | VT_ARRAY)
' Using recordset
Dim RS, LMultiByte, Binary
Const adLongVarBinary = 205
Set RS = CreateObject("ADODB.Recordset")
LMultiByte = LenB(MultiByte)
If LMultiByte>0 Then
RS.Fields.Append "mBinary", adLongVarBinary, LMultiByte
RS.Open
RS.AddNew
RS("mBinary").AppendChunk MultiByte & ChrB(0)
RS.Update
Binary = RS("mBinary").GetChunk(LMultiByte)
End If
MultiByteToBinary = Binary
End Function
This function creates a multi-byte string.
' http://www.motobit.com/help/regedit/pa26.htm
'Converts unicode string to a multibyte string
Function StringToMB(S)
Dim I, B
For I = 1 To Len(S)
B = B & ChrB(Asc(Mid(S, I, 1)))
Next
StringToMB = B
End Function

Related

Why is the result of VarPtr(ByVal str) the same as StrPtr(str) ? (VB6)

In VB6 VarPtr should return the address of the variable, in this case the address of the variable str which is allocated on stack, and holds a pointer to a string in memory. StrPtr (or StrPtr) should return the address of the allocated string in the memory. ByVal should just create a copy, but in this case, it works strangely:
Dim str As String
str = "asd"
Debug.Print VarPtr(str)
Debug.Print VarPtr(ByVal str)
Debug.Print StrPtr(str)
The result is:
1636452
110882980
110882980
Why is the result of VarPtr(ByVal str) the same as StrPtr(str) ?
Strings passed ByVal pass the address of the first character of the containing C string in the BStr. StrPtr does the same.
There are two reasons that spring to mind for doing this. Passing Unicode to API calls and string building.
Passing Unicode to API calls
You could use StrPtr on a string rather than a byte array when sending Unicode strings to API functions.
Dim ByteArr() as Byte
Var1="My Text"
ByteArr = Var1
APICall(ByteArr(0))
APICall(StrPtr(Var1))
Should both pass a Unicode string to an API functions. Unicode strings are converted to ANSI strings when using the declare statement as Win 95 didn't do unicode.
String Building
On the other hand if you are string building, then that is built in to VBA using the Left, Right, and Mid statements, not functions (they are overloaded).
Sub Main()
Dim Var As String
Var = "gggggggggggg"
MsgBox StrPtr(Var)
Mid(Var, 1, 2) = "xx"
MsgBox StrPtr(Var) & " - " & Var
End Sub
ByVal Versus ByRef
Some authors like to say that the ByVal keyword is overloaded for
strings, meaning that it takes on a different meaning when applied to
strings than when applied to other variables. Frankly, I don't see it.
Writing:
ByVal str As String
tells VB to pass the contents of the BSTR (actually the ABSTR), which is the pointer to the character array. Thus, ByVal is acting
normally--it just happens that the content of the BSTR is a pointer to
another object, so this simulates a pass by reference. Similarly:
ByRef str As String
passes the address of the BSTR, as expected.
Win32 API Programming with Visual Basic, Chapter 6 Strings, O'Reilly, from MSDN Library October 2001
StrPtr
Strings in Visual Basic are stored as BSTR's. If you use the VarPtr on
a variable of type String, you will get the address of the BSTR, which
is a pointer to a pointer of the string. To get the address of the
string buffer itself, you need to use the StrPtr function. This
function returns the address of the first character of the string.
Take into account that Strings are stored as UNICODE in Visual Basic.
To get the address of the first character of a String, pass the String
variable to the StrPtr function.
Example:
Dim lngCharAddress as Long
Dim strMyVariable as String
strMyVariable = "Some String"
lngCharAddress = StrPtr(strMyVariable)
You can use this function when you need to pass a pointer to a
UNIOCODE string to an API call.
HOWTO: Get the Address of Variables in Visual Basic Q199824 Microsoft Knowledge Base, MSDN October 2001.
VarPtr is not part of the VBA/VB6 language, therefore companies that implement VBA (like Corel) may not implement it in their VBA. The VBA spec is here https://msdn.microsoft.com/en-us/library/dd361851.aspx

Powerbuilder: ImportFile of UTF-8 (Converting UTF-8 to ANSI)

My Powerbuilder version is 6.5, cannot use a higher version as this is what I am supporting.
My problem is, when I am doing dw_1.ImportFile(file) the first row and first column has a funny string like this:

Which I dont understand until I tried opening the file and saving it to a new text file and trying to import that new file.which worked flawlessly without the funny string.
My conclusion is that this is happening because the file is UTF-8 (as shown in NOTEPAD++) and the new file is Ansi. The file I am trying to import is automatically given by a 3rd party and my users dont want the extra job of doing this.
How do I force convert this files to ANSI in powerbuilder. If there is none, I might have to do a command prompt conversion, any ideas?
The weird  characters are the (optional) utf-8 BOM that tells editors that the file is utf-8 encoded (as it can be difficult to know it unless we encounter an escaped character above code 127). You cannot just rid it off because if your file contains any character above 127 (accents or any special char), you will still have garbage in your displayed data (for example: é -> é, € -> €, ...) where special characters will become from 2 to 4 garbage chars.
I recently needed to convert some utf-8 encoded string to "ansi" windows 1252 encoding. With version of PB10+, a reencoding between utf-8 and ansi is as simple as
b = blob(s, encodingutf8!)
s2 = string(b, encodingansi!)
But string() and blob() do not support encoding specification before the release 10 of PB.
What you can do is to read the file yourself, skip the BOM, ask Windows to convert the string encoding via MultiByteToWideChar() + WideCharToMultiByte() and load the converted string in the DW with ImportString().
Proof of concept to get the file contents (with this reading method, the file cannot be bigger than 2GB):
string ls_path, ls_file, ls_chunk, ls_ansi
ls_path = sle_path.text
int li_file
if not fileexists(ls_path) then return
li_file = FileOpen(ls_path, streammode!)
if li_file > 0 then
FileSeek(li_file, 3, FromBeginning!) //skip the utf-8 BOM
//read the file by blocks, FileRead is limited to 32kB
do while FileRead(li_file, ls_chunk) > 0
ls_file += ls_chunk //concatenate in loop works but is not so performant
loop
FileClose(li_file)
ls_ansi = utf8_to_ansi(ls_file)
dw_tab.importstring( text!, ls_ansi)
end if
utf8_to_ansi() is a globlal function, it was written for PB9, but it should work the same with PB6.5:
global type utf8_to_ansi from function_object
end type
type prototypes
function ulong MultiByteToWideChar(ulong CodePage, ulong dwflags, ref string lpmultibytestr, ulong cchmultibyte, ref blob lpwidecharstr, ulong cchwidechar) library "kernel32.dll"
function ulong WideCharToMultiByte(ulong CodePage, ulong dwFlags, ref blob lpWideCharStr, ulong cchWideChar, ref string lpMultiByteStr, ulong cbMultiByte, ref string lpUsedDefaultChar, ref boolean lpUsedDefaultChar) library "kernel32.dll"
end prototypes
forward prototypes
global function string utf8_to_ansi (string as_utf8)
end prototypes
global function string utf8_to_ansi (string as_utf8);
//convert utf-8 -> ansi
//use a wide-char native string as pivot
constant ulong CP_ACP = 0
constant ulong CP_UTF8 = 65001
string ls_wide, ls_ansi, ls_null
blob lbl_wide
ulong ul_len
boolean lb_flag
setnull(ls_null)
lb_flag = false
//get utf-8 string length converted as wide-char
setnull(lbl_wide)
ul_len = multibytetowidechar(CP_UTF8, 0, as_utf8, -1, lbl_wide, 0)
//allocate buffer to let windows write into
ls_wide = space(ul_len * 2)
lbl_wide = blob(ls_wide)
//convert utf-8 -> wide char
ul_len = multibytetowidechar(CP_UTF8, 0, as_utf8, -1, lbl_wide, ul_len)
//get the final ansi string length
setnull(ls_ansi)
ul_len = widechartomultibyte(CP_ACP, 0, lbl_wide, -1, ls_ansi, 0, ls_null, lb_flag)
//allocate buffer to let windows write into
ls_ansi = space(ul_len)
//convert wide-char -> ansi
ul_len = widechartomultibyte(CP_ACP, 0, lbl_wide, -1, ls_ansi, ul_len, ls_null, lb_flag)
return ls_ansi
end function

VB ASCB in LotusScript?

Working on converting a Visual Basic SHA-256 encryption routine to work in LotusScript.
Is going well except for the VB's AscB command.
Found: "Use the AscB function to return the first byte of a string containing byte data."
Not finding way to do same in LotusScript.
See the LS CByte command comes close: "CByte returns an expression that has been converted to Byte."
Don't see way to have it return just the first Byte of the expression.
Any suggestions?
Derek
AscB is only appropriate for strings in single-byte character encoding. All LotusScript string data is Unicode represented in UTF16 double-byte encoding.
The LotusScript Uni() function returns a Long containing the integer value of the Unicode character. Since the input is a double byte character, the value returned by Uni() ranges from 0 to 65535. If you want to get the values of each of the two bytes, code like this will do the trick:
Dim ws As New NotesUIWorkspace
Dim s1 As String
Dim u1 As Long
Dim u2 As Long
Dim lowbyte As Integer
Dim highbyte As Integer
Dim b1 As Byte
Dim b2 as Byte
s1 = "Ʃ"
u1 = Uni(s1)
lowbyte = u1 Mod 256
highbyte = (u1 - lowbyte) / 256
b1 = Cbyte(lowbyte)
b2 = Cbyte(highbyte)
Call ws.Prompt(prompt_ok,"test",s1 + " " + Cstr(Cint(b1)) + " " + Cstr(Cint(b2)))
Would Asc, LeftB and RightB do what you need?
In my testing...
Lenb("A") = 2
Leftb("A", 1) = "A"
Asc(Leftb("A", 1)) = 65
Leftb("A", 2) = "A"
Asc(Leftb("A", 2)) = 65
Asc(Rightb(Leftb("A", 2), 1)) = 0

Appending the length of a binary file to itself in VBScript

I am trying to find a way to append 32-bits of binary data that contains the length of a binary file to the front of the binary file. Here is what I have so far:
Dim size, sizeStr, n
Dim byteArray
const adTypeText=2
const adTypeBinary=1
dim inStream, outStream
set inStream=WScript.CreateObject("ADODB.Stream")
set outStream=WScript.CreateObject("ADODB.Stream")
inStream.Open
inStream.type=adTypeBinary
outStream.Open
outStream.type=adTypeBinary
inStream.LoadFromFile("test.bin")
size = inStream.Size
'Converts a decimal value to a 32-bit binary string
sizeStr = DecimalToBinary(size)
outStream.Write sizeStr
outStream.Write = inStream.Read()
outStream.SaveToFile("output.bin")
outStream.Close()
inStream.Close()
What do I do to get the binary string, sizeStr, into a form that can be written out using the outStream.Write function?

how do you parse a string in vb6?

Some of us unfortunately are still supporting legacy app like VB6. I have forgotten how to parse a string.
Given a string:
Dim mystring As String = "1234567890"
How do you loop in VB6 through each character and do something like
for each character in mystring
debug.print character
next
In C# i would do something like
char[] myChars = mystring.ToCharArray();
foreach (char c in theChars)
{
//do something with c
}
Any ideas?
Thanks a lot
You can use the 'Mid' function to get at the individual characters:
Dim i As Integer
For i = 1 To Len(mystring)
Print Mid$(mystring, i, 1)
Next
Note this is untested.
There is no possibility to use foreach on strings.
Use
Dim i As Integer
For i = 1 To Len(YourString)
Result = Mid$(YourString, i, 1)
Next
note that the type of Result is a length-1 string, no char or byte type.
If performance is important, you'll have to convert the string to a bytearray fist (using StrConv) and then loop through it like this.
Dim i As Long
For i = 0 To UBound(Data)
Result = Data(i) ' Type is Byte '
Next
This is much more efficient.
The easiest way is to convert the string into an array of bytes and iterate over the byte array (converting each byte to a character).
Dim str As String
Dim bytArray() As Byte
Dim count As Integer
str = "This is a string."
bytArray = str
For count = 0 To UBound(bytArray)
Debug.Print Chr(bytArray(count))
Next
Don't loop; rather, set a reference to Microsoft VBScript Regular Expressions library and use regular expressions to achieve your 'do something' goal.

Resources