I have the following structures defined (names are anonymised, but data types are correct):
Public Type ExampleDataItem
Limit As Integer ' could be any value 0-999
Status As Integer ' could be any value 0-2
ValidUntil As Date ' always a valid date
End Type
Public Type ExampleData
Name As String ' could be 5-20 chars long
ValidOn As Date ' could be valid date or 1899-12-30 representing "null"
Salt As Integer ' random value 42-32767
Items(0 To 13) As ExampleDataItem
End Type
I would like to generate a 32-bit hash code for an ExampleData instance. Minimising hash collisions is important; performance and data order are not.
So far I have got (in pseudocode):
Serialise all members into one byte array.
Loop through the byte array, reading 4 bytes at a time into a Long value.
XOR all the Long values together.
I can't really post my code because it's heavily dependent on utility classes to do the serialisation, but if anyone wants to see it regardless then I will post it.
Will this be OK, or can anyone suggest a better way of doing it?
EDIT:
This code is being used to implement part of a software licensing system. The purpose of the hash is to confirm whether the data entered by the end user equals the data entered by the tech support person. The hash must therefore:
Be very short. That's why I thought 32 bits would be most suitable, because it can be rendered as a 10-digit decimal number on screen. This is easy, quick and unambiguous to read over the telephone and type in.
Be derived from all the fields in the data structure, with no extra artificial keys or any other trickery.
The hash is not required for lookup, uniqueness testing, or to store ExampleData instances in any kind of collection, but only for the one purpose described above.
Can you use the CRC32? Steve McMahon has an implementation. Combine that with a bit of base32 encoding and you've got something short enough to read over the phone.
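To make the CRC32-plus-base32 idea concrete, here is a minimal sketch in Python (not VB6, and not Steve McMahon's implementation). The function name `licence_check_code` and the serialisation layout are assumptions made up for illustration; the only library calls used are `zlib.crc32`, `struct.pack` and `base64.b32encode`.

```python
import base64
import struct
import zlib

def licence_check_code(name, valid_on, salt, items):
    """Return a short base32 check code derived from every field.

    Assumed serialisation (illustrative only): little-endian 16-bit
    integers, dates as ISO strings, and a NUL byte after each string
    so that adjacent fields cannot run together.
    """
    buf = bytearray()
    buf += name.encode("utf-8") + b"\x00"
    buf += valid_on.encode("ascii") + b"\x00"
    buf += struct.pack("<H", salt)
    for limit, status, valid_until in items:
        buf += struct.pack("<HH", limit, status)
        buf += valid_until.encode("ascii") + b"\x00"
    crc = zlib.crc32(bytes(buf)) & 0xFFFFFFFF
    # 4 bytes encode to 7 base32 characters plus one "=" of padding.
    return base64.b32encode(struct.pack(">I", crc)).decode("ascii").rstrip("=")

items = [(100, 1, "2025-12-31")] * 14
code = licence_check_code("ExampleName", "2025-01-01", 4242, items)
print(code, len(code))  # a 7-character code
```

Seven base32 characters are shorter to read out than a 10-digit decimal number, and the standard base32 alphabet (A-Z, 2-7) avoids the digits 0 and 1.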
Since performance is not an objective, and if file size is not important and you want a unique value for each item, just add an ID field of type String. Then use this function to generate a GUID, which will be a unique ID. Use it as a key for a dictionary or collection.
Public Type GUID
Data1 As Long
Data2 As Integer
Data3 As Integer
Data4(7) As Byte
End Type
Public Type GUID2 '15 BYTES TOTAL
Data1(14) As Byte
End Type
Public Declare Function CoCreateGuid Lib "OLE32.DLL" (pGuid As GUID) As Long
Public Function GetGUID() As String
Dim VBRIG_PROC_ID_STRING As String
VBRIG_PROC_ID_STRING = "GetGUID()"
Dim lResult As Long
Dim lguid As GUID
Dim MyguidString As String
Dim MyGuidString1 As String
Dim MyGuidString2 As String
Dim MyGuidString3 As String
Dim DataLen As Integer
Dim StringLen As Integer
Dim i As Integer
On Error GoTo error_olemsg
lResult = CoCreateGuid(lguid)
If lResult = 0 Then
MyGuidString1 = Hex$(lguid.Data1)
StringLen = Len(MyGuidString1)
DataLen = Len(lguid.Data1)
MyGuidString1 = LeadingZeros(2 * DataLen, StringLen) & MyGuidString1
'First 4 bytes (8 hex digits)
MyGuidString2 = Hex$(lguid.Data2)
StringLen = Len(MyGuidString2)
DataLen = Len(lguid.Data2)
MyGuidString2 = LeadingZeros(2 * DataLen, StringLen) & Trim$(MyGuidString2)
'Next 2 bytes (4 hex digits)
MyGuidString3 = Hex$(lguid.Data3)
StringLen = Len(MyGuidString3)
DataLen = Len(lguid.Data3)
MyGuidString3 = LeadingZeros(2 * DataLen, StringLen) & Trim$(MyGuidString3)
'Next 2 bytes (4 hex digits)
GetGUID = MyGuidString1 & MyGuidString2 & MyGuidString3
For i = 0 To 7
'Pad each byte to two hex digits (Format$ with "00" does not pad values such as "A")
MyguidString = MyguidString & Right$("0" & Hex$(lguid.Data4(i)), 2)
Next i
'MyGuidString contains last 8 bytes of Guid (16 hex digits)
GetGUID = GetGUID & MyguidString
Else
GetGUID = "00000000" ' return zeros if function unsuccessful
End If
Exit Function
error_olemsg:
GetGUID = "00000000"
Exit Function
End Function
Public Function LeadingZeros(ExpectedLen As Integer, ActualLen As Integer) As String
LeadingZeros = String$(ExpectedLen - ActualLen, "0")
End Function
EDIT: the question has now been edited to clarify that the goal is detecting typing errors, not minimizing collisions between totally different values. In that case Dan F's answer is the best one IMHO, not my offering below (wonderful though it is).
You could use the Microsoft CryptoAPI rather than rolling your own hash algorithm.
For instance this Microsoft article on using CryptoAPI from VB6 should get you started.
Or this from Edanmo on mvps.org for hashing a string in VB6.
EDIT: Following comment. If you insist on a 32-bit value, it will be hard to minimize hash collisions. My algorithm book suggests using Horner's method as a decent general purpose hashing algorithm. I don't have time right now to find out more information and implement in VB6. CopyMemory would probably be useful :)
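For reference, Horner's method treats the bytes as coefficients of a polynomial and evaluates it with one multiply and one add per byte. A minimal sketch in Python rather than VB6; the name `horner_hash` and the multiplier 31 are my own choices for illustration, not something taken from the book:

```python
def horner_hash(data: bytes, multiplier: int = 31) -> int:
    """Polynomial hash evaluated with Horner's rule, kept to 32 bits."""
    h = 0
    for b in data:
        # h = h * multiplier + b, truncated to an unsigned 32-bit value
        h = (h * multiplier + b) & 0xFFFFFFFF
    return h

print(horner_hash(b"abc"))                          # 96354
print(horner_hash(b"ABC") != horner_hash(b"BAC"))   # True: order matters
```

Unlike a plain sum or XOR of the bytes, the result depends on the position of each byte, so reordered input usually gives a different value.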
You may be overthinking it, or I'm not understanding the issue. You could essentially just
hash(CStr(Salt) + Name + CStr(ValidOn) + Anyotherstrings)
There is no particular need to go through the process of serializing into a byte array and XORing values. In fact, XORing values together in that way is more likely to create hash collisions where you aren't intending them.
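To illustrate why XOR folding collides so easily: XOR is commutative, so any two inputs whose 4-byte words are merely reordered get the same result. A small Python sketch of the scheme as I understand it from the question (`xor_fold` is a hypothetical name, and the zero padding is an assumption), compared with CRC32 from the standard `zlib` module:

```python
import struct
import zlib

def xor_fold(data: bytes) -> int:
    """XOR together successive little-endian 32-bit words of data."""
    padded = data + b"\x00" * (-len(data) % 4)  # pad to a multiple of 4
    h = 0
    for (word,) in struct.iter_unpack("<I", padded):
        h ^= word
    return h

a = b"AAAABBBB"
b = b"BBBBAAAA"  # same two words, swapped
print(xor_fold(a) == xor_fold(b))      # True: a collision
print(zlib.crc32(a) == zlib.crc32(b))  # False
```

Two identical words also cancel each other out entirely, which matters for a structure with 14 similar items.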
Edit: I think I understand now. You're creating your own hash value by XORing the data together? It's unfortunately quite likely to give collisions. I know VB6 doesn't include any hashing algorithms, so you may be best importing and using something like Phil Fresle's SHA256 implementation.
I have hunted about quite a bit but can't find a way to get at the Hexadecimal or Binary representation of the content of a Double variable in VB6. (Are Double variables held in IEEE754 format?)
The provided Hex(x) function is no good because it integerizes its input first.
So if I want to see the exact bit pattern produced by Atn(1), Hex(Atn(1)) does NOT produce it.
I'm trying to build a mathematical function containing If clauses. I want to be able to see that the values returned on either side of these boundaries are, as closely as possible, in line.
Any suggestions?
Yes, VB6 uses standard IEEE format for Double. One way to get what you want without resorting to memcpy() tricks is to use two UDTs. The first would contain one Double, the second a static array of 8 Byte. LSet the one containing the Double into the one containing the Byte array. Then you can examine each Byte from the Double one by one.
If you need to see code let us know.
[edit]
At the module level:
Private byte_result() As Byte
Private Type double_t
dbl As Double
End Type
Private Type bytes_t
byts(1 To 8) As Byte
End Type
Then:
Function DoubleToBytes (aDouble As Double) As Byte()
Dim d As double_t
Dim b As bytes_t
d.dbl = aDouble
LSet b = d
DoubleToBytes = b.byts
End Function
To use it:
Dim Indx As Long
byte_result = DoubleToBytes(12345.6789#)
For Indx = 1 To 8
Debug.Print Hex$(byte_result(Indx)),
Next
This is air code but it should give you the idea.
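For comparison, and as a way to cross-check the bytes you get from VB6, the same reinterpretation can be sketched in Python with the standard `struct` module. The helper names `double_to_bytes` and `double_to_hex` are my own:

```python
import math
import struct

def double_to_bytes(value: float) -> bytes:
    """Return the 8 bytes of an IEEE 754 double, least significant first."""
    return struct.pack("<d", value)

def double_to_hex(value: float) -> str:
    """Return the bit pattern as 16 hex digits, most significant byte first."""
    return struct.pack(">d", value).hex()

print(double_to_bytes(1.0).hex())   # 000000000000f03f
print(double_to_hex(1.0))           # 3ff0000000000000
print(double_to_hex(math.atan(1)))  # bit pattern of atan(1)
```

The little-endian order in `double_to_bytes` matches what the LSet approach yields on x86, where byts(1) holds the least significant byte.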
I'm trying to find a way to convert a long string ID like "T2hR8VAR4tNULoglmIbpAbyvdRi1y02rBX" to a numerical id.
I thought about getting the ASCII value of each character and then adding them up, but I don't think that this is a good way, as different strings can have the same result. For example, "ABC" and "BAC" will have the same result:
A = 10, B = 20, C = 50,
ABC = 10 + 20 + 50 = 80
BAC = 20 + 10 + 50 = 80
I also thought about getting each letter's ASCII code and then setting the numbers next to each other, for example "ABC"
so ABC = 102050
This method won't work, as a 20-letter String will result in a huge number. How can I solve this problem? Thank you in advance.
You can use the hashCode() function: "id".hashCode(). All objects implement a variant of this function.
From the documentation:
open fun hashCode(): Int
Returns a hash code value for the object. The general contract of hashCode is:
Whenever it is invoked on the same object more than once, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
If two objects are equal according to the equals() method, then calling the hashCode method on each of the two objects must produce the same integer result.
All platform objects implement it by default. There is always a possibility of duplicates if you have lots of IDs.
If you use a JVM-based Kotlin environment, the hash will be produced by the String.hashCode() function from the JVM.
If you need to be 100% confident that there are no possible duplicates, and the input Strings can be up to 20 characters long, then you cannot store the IDs in a 64-bit Long. You will have to use BigInteger:
val id = BigInteger(stringId.toByteArray())
At that point, I question whether there is any point in converting the ID to a numerical format. The String itself can be the ID.
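To show what that conversion amounts to, here is a rough Python equivalent (Python is used only for illustration; `string_to_int` and `int_to_string` are hypothetical names, and ASCII-only IDs are assumed). The bytes are read as one big-endian number, so the mapping is reversible and cannot collide, but the result is far wider than 64 bits:

```python
def string_to_int(string_id: str) -> int:
    """Interpret the ASCII bytes of the ID as one big-endian integer."""
    return int.from_bytes(string_id.encode("ascii"), "big")

def int_to_string(value: int, length: int) -> str:
    """Inverse of string_to_int for an ID of known length."""
    return value.to_bytes(length, "big").decode("ascii")

sid = "T2hR8VAR4tNULoglmIbpAbyvdRi1y02rBX"
n = string_to_int(sid)
print(n.bit_length() > 64)                         # True
print(int_to_string(n, len(sid)) == sid)           # True
print(string_to_int("ABC"), string_to_int("BAC"))  # 4276803 4342083
```

Because the number carries exactly the same information as the string, it is no more compact, which is the point made above.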
I have a fairly complex loop of code where I am looping through multiple control variables.
I am getting the error 'Invalid 'for' loop control variable'.
The line in question is
for w(1) = 32 to 127
I am more familiar with VBA, where I would have zero problems with this statement.
I'm guessing it has something to do with the fact that I will be looping through w(1), w(2), w(3) etc. in the same tree. I initialize the variable as dim x(10) but have also tried dim w() , dim w() redim w(10)
Any thoughts? It's a fairly critical aspect of the script, so I am unwilling to swap out all my w(1), w(2), ... for individual variables.
Thoughts?
EDIT:
As per the comments, I should clarify a few things:
Essentially, there is an alphanumeric association with an ID in a system that I am working with, for which I was not handed the key. So I have a multi-dimensional array of rates that are used for multiplying out costs.
What I am doing is working backwards through invoices and matching a material with very subtle differences that have different pricings.
For simplicity's sake, say there's a 2-dimensional material where AA, AB, ... A9 are all priced through several multiplication factors in what would just be a 2x2 grid. So maintaining a pivot point based on the position in the string is very important. For this code you could take tier to mean how many characters are in the string (i.e. how complex the composition of the material is):
dim x(), w()
for tier = 1 to 2
for w(1) = 32 to 127
x(1)= chr(w(1))
If tier = 2 then
for w(2)= 32 to 127
X(2)=chr(w(2))
next
end if
str = ""
for y = 1 to (tier)
str = trim(str & x(y))
next
'''msgbox str 'debug
next
end if
str = ""
for y = 1 to (tier)
str = trim(str & x(y))
next
'' msgbox str ' debug
next 'tier
This is just an excerpt I pulled to give a basic idea of the structure without any calculations. This is, in essence, what is not working.
The error is quite clear: you cannot use an array element as the control variable. The definition in the For...Next Statement documentation is even clearer:
Numeric variable used as a loop counter. The variable cannot be an array element or an element of a user-defined type.
This is one of the key differences between VBA and VBScript.
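The usual way around the restriction is to loop on a plain scalar and copy it into the array element as the first statement of the loop body. Below is a sketch of that pattern, written in Python rather than VBScript purely for illustration; `candidates`, `fill`, `lo` and `hi` are hypothetical names, and the list `w` plays the role of the w() array from the question.

```python
def candidates(tier, lo=32, hi=127):
    """Yield every string of length tier built from character codes lo..hi."""
    w = [0] * tier  # one slot per character position

    def fill(pos):
        if pos == tier:
            yield "".join(chr(c) for c in w)
            return
        for code in range(lo, hi + 1):  # scalar loop counter
            w[pos] = code               # copied into the array element
            yield from fill(pos + 1)

    yield from fill(0)

print(list(candidates(1, 65, 67)))  # ['A', 'B', 'C']
print(len(list(candidates(2))))     # 9216
```

In VBScript the same idea would mean looping on a scalar such as c1 and assigning w(1) = c1 inside the loop, so the rest of the code can keep using w(1), w(2) and so on.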
Your loop won't step through x(1), x(2), ... As written, it steps the values 32, 33, ... through the single element w(1). What type is w(1), and how did you declare it?
I have code that should take a unique string (for example, "d86c52ec8b7e8a2ea315109627888fe6228d") from the client and return an integer greater than 2200000000 and less than 5800000000. It's important that the generated integer is not random: the same string must always give the same integer. What is the best way to generate it without using a DB?
Now it looks like this:
did = "d86c52ec8b7e8a2ea315109627888fe6228d"
min_cid = 2200000000
max_cid = 5800000000
cid = did.hash.abs.to_s.split.last(10).to_s.to_i
if cid < min_cid
cid += min_cid
else
while cid > max_cid
cid -= 1000000000
end
end
Here's the problem: your range of numbers has only 3.6x10^9 possible values, whereas your sample unique string (which looks like a hex integer with 36 digits) has 16^36 possible values (i.e. many more). So when mapping your string into your integer range there will be collisions.
The mapping function itself can be pretty straightforward, I would do something such as below (also, consider using only a part of the input string for integer conversion, e.g. the first seven digits, if performance becomes critical):
def my_hash(str, min, max)
range = (max - min).abs
(str.to_i(16) % range) + min
end
my_hash(did, min_cid, max_cid) # => 2461595789
[Edit] If you are using Ruby 1.8 and your adjusted range can be represented as a Fixnum, just use the hash value of the input string object instead of parsing it as a big integer. Note that this strategy might not be safe in Ruby 1.9 (per the comment by @DataWraith) as object hash values may be randomized between invocations of the interpreter so you would not get the same hash number for the same input string when you restart your application:
def hash_range(obj, min, max)
(obj.hash % (max-min).abs) + [min, max].min
end
hash_range(did, min_cid, max_cid) # => 3886226395
And, of course, you'll have to decide what to do about collisions. You'll likely have to persist a bucket of input strings which map to the same value and decide how to resolve the conflicts if you are looking up by the mapped value.
You could generate a 32-bit CRC, drop one bit, and add the result to 2.2 billion. That gives you a maximum value of about 4.35 billion.
Alternately you could use all 32 bits of the CRC, but when the result is too large, append a zero to the input string and recalculate, repeating until you get a value in range.
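Both variants can be sketched in a few lines. This is Python rather than Ruby, using `zlib.crc32` from the standard library; the function names are hypothetical, and the retry version is one reading of "too large" (offset the CRC by the minimum, then retry if the result reaches the maximum):

```python
import zlib

MIN_CID = 2_200_000_000
MAX_CID = 5_800_000_000

def cid_drop_bit(did: str) -> int:
    """31 bits of CRC32 offset above the minimum: always in range."""
    return MIN_CID + 1 + (zlib.crc32(did.encode("ascii")) & 0x7FFFFFFF)

def cid_retry(did: str) -> int:
    """All 32 bits of CRC32; append '0' and recalculate until in range."""
    data = did.encode("ascii")
    while True:
        cid = MIN_CID + 1 + zlib.crc32(data)
        if cid < MAX_CID:
            return cid
        data += b"0"

did = "d86c52ec8b7e8a2ea315109627888fe6228d"
print(cid_drop_bit(did), cid_retry(did))
```

The first variant only ever uses the lower part of the allowed range; the second covers all of it, at the cost of an occasional extra CRC calculation. Neither avoids collisions, since the input space is far larger than the output range.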
If I have two variables containing binary values, how do I append them together as one binary value? For example, if I used WMI to read the registry of two REG_BINARY value, I then want to be able to concatenate the values.
VBScript complains of a type mismatch when you try to join with the '&' operator.
REG_BINARY value will be returned as an array of bytes. VBScript may reference an array of bytes in a variable and it may pass this array of bytes either as a variant to another function or as a reference to array of bytes. However VBScript itself can do nothing with the array.
You are going to need some other component to do some form of concatenation:-
Function ConcatByteArrays(ra, rb)
Dim oStream : Set oStream = CreateObject("ADODB.Stream")
oStream.Open
oStream.Type = 1 'Binary'
oStream.Write ra
oStream.Write rb
oStream.Position = 0
ConcatByteArrays = oStream.Read(LenB(ra) + LenB(rb))
oStream.Close
End Function
In the above code I'm using the ADODB.Stream object which is ubiquitous on currently supported platforms.
If you actually had multiple arrays that you want to concatenate then you could use the following class:-
Class ByteArrayBuilder
Private moStream
Sub Class_Initialize()
Set moStream = CreateObject("ADODB.Stream")
moStream.Open
moStream.Type = 1
End Sub
Public Sub Append(rabyt)
moStream.Write rabyt
End Sub
Public Property Get Length
Length = moStream.Size
End Property
Public Function GetArray()
moStream.Position = 0
GetArray = moStream.Read(moStream.Size)
End Function
Sub Class_Terminate()
moStream.Close
End Sub
End Class
Call Append as many times as you have arrays and retrieve the resulting array with GetArray.
For the record, I wanted VBScript code for a large userbase as a logon script that has the least chance of failing. I like the ADO objects, but there are so many mysterious ways ADO can be broken, so I shy away from ADODB.Stream.
Instead, I was able to write conversion code to convert binary to hex encoded strings. Then, to write back to a REG_BINARY value, I convert it to an array of integers and give it to the SetBinaryValue WMI method.
Note: WshShell can only handle REG_BINARY values containing 4 bytes, so it's unusable.
Thank you for the feedback.
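The hex round trip described above can be outlined as follows. This is a Python sketch of the idea only, not the VBScript that was actually used; the helper names are hypothetical, and the final list of integers stands in for the array that would be handed to SetBinaryValue.

```python
def bytes_to_hex(value: bytes) -> str:
    """Encode binary data as a string of two hex digits per byte."""
    return value.hex()

def hex_to_int_list(text: str) -> list:
    """Decode a hex string back into a list of integers 0-255."""
    return list(bytes.fromhex(text))

val1 = bytes([0x01, 0xAB])
val2 = bytes([0xFF, 0x00])
joined = bytes_to_hex(val1) + bytes_to_hex(val2)  # plain string concatenation
print(joined)                   # 01abff00
print(hex_to_int_list(joined))  # [1, 171, 255, 0]
```

Every byte must be padded to exactly two hex digits, otherwise the boundaries between bytes are lost when the strings are joined.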
Perhaps...
result = CStr(val1) & CStr(val2)