The difference and use of strings and string arrays? - vb6

Okay, so as far as I know, a string is basically an array of characters. So why would there be string arrays in VB? And what are the differences between them?
Just the basics, how they operate: that's what I'm interested in.

At times it is very useful to think of a String as an array of characters. At other times it can be useful to think of it as an array of bytes, and this is of course not the same thing at all.
See The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) for a better understanding of the differences between bytes and the characters held by Strings (UTF-16LE), as well as other commonly used character encodings.
But all of that aside, a String is really a higher level abstraction that you should not think of as an array of any kind.
After all, by that sort of logic an Integer or Long is an array as well.
So, considering that a String is meant to be viewed as a primitive scalar value type, the purpose of String arrays should be pretty clear: arrays of Strings have pretty much the same sorts of uses as arrays of any other data type.
The fact that you have operations you can perform on Strings that root around inside them (substring operations) isn't much different conceptually than the operations that operate on the data inside any other simple type.
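To make the characters-versus-bytes distinction concrete, here is a small sketch in Python (used only for illustration; it is not VB6). It assumes, as the answer says, that a VB6 String holds UTF-16LE code units, and it sticks to characters from the Basic Multilingual Plane, where one character is one code unit. The two counts correspond roughly to VB6's Len and LenB:

```python
def char_and_byte_counts(s):
    """Return (characters, bytes) for s, counting bytes as UTF-16LE (like VB6's LenB)."""
    return len(s), len(s.encode("utf-16-le"))

print(char_and_byte_counts("Name"))        # (4, 8)
print(char_and_byte_counts("caf\u00e9"))   # (4, 8): é is one character, two bytes
# The byte count depends on the encoding; the character count does not.
print(len("caf\u00e9".encode("utf-8")))    # 5
```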

Say you need to store a list of names: it might be 100 names, or 200 names, depending on the case. What will you do?
A String array solves this.
Try this:
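Dim Names() As String
' size the array at run time: indexes 0 to 3 (four elements)
' (ReDim Preserve can grow it later without losing the contents)
ReDim Names(3) As String
Names(0) = "First"
Names(1) = "Second"
Names(2) = "Third"
Names(3) = "Fourth"
Dim l As Long
' visit every element, whatever the array's bounds are
For l = LBound(Names) To UBound(Names)
    MsgBox Names(l)
Next l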
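The difference also shows when converting between the two: one String holding a delimited list is a single value, while an array holds each item as a separate String. VB6 has the Split and Join functions for this; the sketch below uses Python's str.split and str.join as stand-ins, purely to illustrate the idea:

```python
# One String value: a single scalar that happens to contain commas.
packed = "First,Second,Third,Fourth"

# A string array (a Python list here): four independent String values.
names = packed.split(",")
for i, name in enumerate(names):
    print(i, name)             # each element can be read on its own

names[1] = "Replaced"          # changing one element leaves the others untouched
print(",".join(names))         # First,Replaced,Third,Fourth
```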

Related

Why is it useful to have an atom type (like in Elixir or Erlang)?

According to http://elixir-lang.org/getting-started/basic-types.html#atoms:
Atoms are constants where their name is their own value. Some other languages call these symbols.
I wonder what the point of having an atom type is. Probably to help build a parser, or for macros? But in everyday use, how does it help the programmer?
BTW: I have never used Elixir or Erlang; I just noticed the type exists (also in kdb).
They're basically strings that can easily be tested for equality.
Consider a string. Conceptually, we generally want to think of strings as being equal if they have the same contents. For example, "dog" == "dog" but "dog" != "cat". However, to check the equality of strings, we have to check whether each letter in one string equals the letter in the same position in the other string, which means walking through the strings and comparing each character. This becomes more cumbersome when dealing with Unicode strings and having to consider different ways of composing identical characters (for example, the character é has two Unicode representations: the single code point U+00E9, or e followed by the combining acute accent U+0301).
It would be much simpler if we stored identical strings at the same location in memory. Then, checking equality would be a simple pointer or index comparison.
As a consequence of storing identical strings in the same location in memory, we can also store one copy of each unique kind of string regardless of how many times it is used in the program, thus saving some memory for commonly-used strings as well.
At a higher level, using atoms also lets us think of strings the same way we think of other primitive data types like integers.
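A minimal sketch of the idea in Python: an intern table (a hypothetical AtomTable class, not any real Erlang or Elixir API) stores each distinct string once and hands out a small integer. The character-by-character comparison is paid once, when an atom is created; after that, comparing two atoms is a single integer comparison:

```python
class AtomTable:
    """Hypothetical intern table: maps each distinct string to one small integer id."""

    def __init__(self):
        self._ids = {}      # string -> id
        self._names = []    # id -> string (one stored copy per distinct string)

    def atom(self, name):
        """Return the id for name, allocating a new one the first time it is seen."""
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def name(self, atom_id):
        return self._names[atom_id]


table = AtomTable()
ok1 = table.atom("ok")
ok2 = table.atom("o" + "k")      # built separately, but interned to the same id
err = table.atom("error")
print(ok1 == ok2, ok1 == err)    # True False (integer comparisons only)
print(table.name(err))           # error
```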
I think one of the most common uses of atoms in Erlang is to tag variables and messages, with the benefit of fast comparison (pattern matching), as mipadi says.
For example, say you write a function that may fail depending on the parameters provided, the status of the connection to a server, or any other reason. A very common pattern is to return a tuple {ok,Value} on success and {error,Reason} on failure. The caller can then choose to handle only the success case by writing {ok,Value} = yourModule:yourFunction(Param...). This makes it clear that you consider only the success case; you extract the Value directly from the function's return value, it is fast, and you don't have to share any header with yourModule to decode the ok atom.
In messages you will often see things like {add,Key,Value}, {delete,Key}, {delete_all}, {replace,Key,Value}, {append,Key,Value}... These are explicit messages, with the same advantages as mentioned before: fast, readable, and no shared header.
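Python has no atoms, but the tagging convention can be sketched with tuples whose first element is a tag. Here plain strings stand in for the atoms ok, error, add, delete and delete_all; the handle function and the dict it updates are hypothetical, written only to mirror the message shapes above:

```python
def handle(store, message):
    """Apply one tagged message to a dict and return an ok/error-style tagged tuple."""
    tag = message[0]
    if tag == "add":
        _, key, value = message
        store[key] = value
        return ("ok", value)
    if tag == "delete":
        _, key = message
        if key not in store:
            return ("error", "not_found")
        return ("ok", store.pop(key))
    if tag == "delete_all":
        store.clear()
        return ("ok", None)
    return ("error", "unknown_message")


store = {}
print(handle(store, ("add", "k", 1)))    # ('ok', 1)
print(handle(store, ("delete", "x")))    # ('error', 'not_found')

# Rough analogue of {ok,Value} = ...: assume success and fail loudly otherwise.
status, value = handle(store, ("delete", "k"))
assert status == "ok"
```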
Atoms are constants whose value is themselves.
This concept is very useful in distributed systems, where constants may be defined differently on each system, whereas atoms are self-contained and need no definition.

Swift 2.0 String behavior

Strings in 2.0 no longer conform to CollectionType. Each character in the String is now an Extended Grapheme Cluster.
Without digging too deep into the cluster stuff, I tried a few things with Swift Strings:
String now has a characters property that contains what we humans recognize as characters. Each distinct character in the string is considered a character, and the count property gives us the number of distinct characters.
What I don't quite understand is, even though the characters count shows 10, why does the index show emojis occupying 2 indexes?
The index of a String is no longer related to the number of characters (count) in Swift 2.0. It is an “opaque” struct (defined as CharacterView.Index) used only to iterate through the characters of a string. So even if it is printed as an integer, it should not be considered or used as one; you cannot, for instance, add 2 to it to get the character two positions after the current one. All you can do is apply the two methods predecessor() and successor() to get the previous or next index in the String. So, for instance, to get the character two positions after the one at index idx in mixedString, you can write:
mixedString[idx.successor().successor()]
Of course, you can use more comfortable ways of reading the characters of a string, for instance the for statement or the global function indices(_:).
Consider that the main benefit of this approach is not the handling of multi-byte characters in Unicode strings, such as emoji, but rather the uniform treatment of strings that are identical (for us humans!) yet can have multiple representations in Unicode, as different sets of “scalars”, or characters. An example is café, which can be represented either with four Unicode “scalars” (Unicode characters) or with five Unicode scalars. Note that this is a completely different thing from Unicode encodings like UTF-8, UTF-16, etc., which are ways of mapping Unicode scalars into memory bytes.
An Extended Grapheme Cluster can still occupy multiple bytes; however, the correct way to determine the index position of a character would be:
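The café example can be checked with Python's standard unicodedata module (Python is used here only because its str type exposes code points directly, whereas Swift hides them behind Character). Both spellings render identically but differ in code points until normalized; the flag emoji from the question is likewise two code points (two regional indicator symbols) forming one user-perceived character:

```python
import unicodedata

composed = "caf\u00e9"       # é as one code point (U+00E9)
decomposed = "cafe\u0301"    # e followed by COMBINING ACUTE ACCENT (U+0301)

print(len(composed), len(decomposed))    # 4 5
print(composed == decomposed)            # False: the raw code points differ
# Normalizing to NFC makes them equal; Swift's String equality uses
# canonical equivalence, so it treats them as equal without this step.
print(unicodedata.normalize("NFC", decomposed) == composed)   # True

flag = "\U0001F1FA\U0001F1F8"   # regional indicators U and S, shown as one flag
print(len(flag))                             # 2 code points
print(len(flag.encode("utf-16-le")) // 2)    # 4 UTF-16 code units
print(len(flag.encode("utf-8")))             # 8 bytes
```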
import Foundation
let mixed = "MADE IN THE USA 🇺🇸"
if let range = mixed.rangeOfString("🇺🇸") {
    let intIndex = mixed.startIndex.distanceTo(range.startIndex) // counts Characters
    print(intIndex)
}
Result:
16
The way you are trying to get the index is meant for an array, and I don't think Swift can properly work that out with your mixedString.

I'm having problems splitting a string number into single digits with Processing

I'm new to Processing and I'm trying to split any string of digits into single array elements. Then my goal is to find how many digits repeat themselves and print them out in an array. I'm not sure if I'm on the right track, though! I'm aware that there are some missing lines, but as I mentioned before, I'm new and exploring arrays, modulo and strings.
int[] dig = new string [1233467890];
int n=dig.length;
while(n<0){
arr[i--]=n%10
dig = n % 10;
n = n / 10;
}
println(arr);
Thanks in advance for any help
Edwin
I think you are mixing things up a little bit here, especially what strings and arrays are.
An array is a sequence of objects, and these objects may be integers, characters, booleans, circles, cups or balls. A String is, in the programming universe, a very special type of array: it is an array of characters.
So, as you may have noticed, there's no way of creating a "string" of integers. And Processing tells you exactly that if you try to run the code you posted:
"cannot convert [] String to [] Int". That means strings and ints are fundamentally different things.
Since I understand neither your goal nor your code, I can't help you any further.
I think it would be a better idea to read and understand the following link, run and understand the more basic examples there, and only then try to program what you want.
http://processing.org/reference/Array.html
http://processing.org/reference/String.html
Best regards
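For what the question seems to be after (split a number into its digits with % and /, then report which digits repeat), here is a minimal sketch. It is written in Python rather than Processing and assumes the input is a non-negative integer; 1233467890 is the number from the question:

```python
from collections import Counter

def digits_of(n):
    """Split a non-negative integer into its decimal digits using % 10 and // 10."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:                # loop while n is greater than 0
        digits.append(n % 10)   # take the last digit
        n //= 10                # drop the last digit
    digits.reverse()            # digits were collected right to left
    return digits

def repeated_digits(n):
    """Return the digits that occur more than once, with their counts."""
    counts = Counter(digits_of(n))
    return {d: c for d, c in counts.items() if c > 1}

print(digits_of(1233467890))        # [1, 2, 3, 3, 4, 6, 7, 8, 9, 0]
print(repeated_digits(1233467890))  # {3: 2}
```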

Algorithm to Map Strings to Short Replacements

I'm looking at ways to deterministically replace unique strings with unique and optimally short replacements. So I have a finite set of strings, and the best compression I could achieve so far is through an enumeration algorithm, where I order the input set and then replace the strings with an enumeration of char strings over an extended alphabet (a..z, A...Z, aa...zz, aA... zZ, a0...z9, Aa..., aaa...zaa, aaA...zaaA, ....).
This works wonderfully as far as compression is concerned, but has the severe drawback that it is not atomic on any given input string. Rather, its result depends on knowing all input strings right from the start, and on the ordering of the input set.
Does anybody know of an algorithm that has similar compression but doesn't require knowing all input strings upfront? Hashing, for example, would not work for me, as depending on the size of the input set I'd need a hash length of 8-12 for the hashes to be unique, and that would be too long as replacements (currently, the replacement strings are 1-3 chars long for my use cases (<10,000 input strings)). Also, if theoreticians among us know this is wasted effort, I would be interested to hear :-) .
You could use your enumeration scheme, but sorted by the order in which you first encounter the input strings.
For example, the first string you ever process can be mapped to "a".
The next distinct string would be mapped to "b", etc.
Every time you process a string, you'd need to look it up to see if it has already been mapped.
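A sketch of that suggestion in Python. The ShortNamer class and the 62-symbol alphabet (a-z, then A-Z, then 0-9) are assumptions for illustration; codes are produced in bijective base-62 order (a ... 9, then aa, ab, ...), so the first 62 distinct strings get one character and the next 3,844 get two:

```python
import string

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits  # 62 symbols

def short_code(n):
    """Map 0, 1, 2, ... to a, b, ..., 9, aa, ab, ... (bijective base-62)."""
    base = len(ALPHABET)
    chars = []
    n += 1
    while n > 0:
        n -= 1
        n, r = divmod(n, base)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

class ShortNamer:
    """Assigns codes in first-seen order, so no input needs to be known up front."""

    def __init__(self):
        self._codes = {}

    def code_for(self, s):
        if s not in self._codes:                      # look up before assigning
            self._codes[s] = short_code(len(self._codes))
        return self._codes[s]

namer = ShortNamer()
print(namer.code_for("alpha"), namer.code_for("beta"), namer.code_for("alpha"))  # a b a
```

The codes still depend on the order in which strings arrive, so the mapping has to be stored (and shared) for them to stay stable across runs.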
"Optimally short" depends on the population of strings from which your samples are drawn. In the absence of systematic redundancy in the population, you will find that only a fraction of arbitrary strings can be compressed at all (e.g., consider trying to compress random bit strings).
If you can make assumptions about your data, such as "the strings are expected to be mainly composed of English words" then you can do something simple and effective based on letter frequency (e.g., for English, the relative frequency order is something like ETAOINSHRDLUGCY..., so you would want to use fewer bits to represent Es and more bits to represent uncommon letters like Q).
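One standard way to turn letter frequencies into codes that use fewer bits for common letters is Huffman coding; the answer does not name a specific technique, so this is a swapped-in example. A compact Python sketch that builds a prefix-free code from the symbol counts of a sample text:

```python
import heapq
from collections import Counter

def huffman_codes(sample):
    """Build a prefix-free code: more frequent symbols never get longer bit strings."""
    freq = Counter(sample)
    if len(freq) == 1:                      # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker, {symbol: code so far}); the tie-breaker
    # keeps heapq from ever comparing two dicts.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)   # the two least frequent subtrees
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

sample = "the quick brown fox jumps over the lazy dog, eeee"
codes = huffman_codes(sample)
print(codes["e"], codes["q"])   # 'e' (frequent) gets a code no longer than 'q' (rare)
```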
Cheers.

Why are the hash codes generated by this function not unique?

I'm testing the VB function below that I got from a Google search. I plan to use it to generate hash codes for quick string comparison. However, there are occasions in which two different strings have the same hash code. For example, these strings
"122Gen 1 heap size (.NET CLR Memory w3wp):mccsmtpteweb025.20833333333333E-02"
"122Gen 2 heap size (.NET CLR Memory w3wp):mccsmtpteweb015.20833333333333E-02"
have the same hash code of 237117279.
Please tell me:
- What is wrong with the function?
- How can I fix it?
Thank you
martin
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (dest As Any, src As Any, ByVal bytes As Long)
Private Function HashCode(Key As String) As Long
On Error GoTo ErrorGoTo
Dim lastEl As Long, i As Long
' copy ansi codes into an array of long'
lastEl = (Len(Key) - 1) \ 4
ReDim codes(lastEl) As Long
' this also converts from Unicode to ANSI'
CopyMemory codes(0), ByVal Key, Len(Key)
' XOR the ANSI codes of all characters'
For i = 0 To lastEl - 1
HashCode = HashCode Xor codes(i) 'Xor'
Next
ErrorGoTo:
Exit Function
End Function
I'm betting there are more than just "occasions" when two strings generate the same hash using your function. In fact, it probably happens more often than you think.
A few things to realize:
First, there will be hash collisions. It happens. Even with really, really big spaces like MD5 (128 bits) there are still two strings that can generate the same resulting hash. You have to deal with those collisions by creating buckets.
Second, a long integer isn't really a big hash space. You're going to get more collisions than you would if you used more bits.
Third, there are libraries available to you in Visual Basic (like .NET's System.Security.Cryptography namespace) that will do a much better job of hashing than most mere mortals.
The two strings contain the same characters (note that the '2' and the '1' are flip-flopped).
That is why the hash value is the same.
Make sure that the hash function is taking into account the order of the characters.
Hash functions do not guarantee uniqueness of hash values. If the input value range (judging by your sample strings) is larger than the output value range (e.g. a 32-bit integer), then uniqueness is mathematically impossible.
If the biggest problem is that it doesn't account for the position of the bytes, you could fix it like this:
Private Function HashCode(Key As String) As Long
On Error GoTo ErrorGoTo
Dim lastEl As Long, i As Long
' copy ansi codes into an array of long'
lastEl = (Len(Key) - 1) \ 4
ReDim codes(lastEl) As Long
' this also converts from Unicode to ANSI'
CopyMemory codes(0), ByVal Key, Len(Key)
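' XOR the elements, adding each element's index so that position matters
' (upper bound is lastEl, not lastEl - 1, so the final element is not skipped)
For i = 0 To lastEl
HashCode = HashCode Xor (codes(i) + i)
Next
ErrorGoTo:
Exit Function
End Function
The main difference is that it adds each element's index to its value before the XOR. Each Long element holds four characters, so this is the position of a 4-character chunk rather than of a single character. The loop's upper bound is also corrected to lastEl so that the last chunk is included.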
No hash function can guarantee uniqueness. There are ~4 billion 32-bit integers, so even the best hash function will generate duplicates when presented with ~4 billion and 1 strings (and most likely long before).
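The "long before" can be quantified with the standard birthday-problem approximation p ≈ 1 - exp(-n(n-1) / (2N)) for n strings hashed into N possible values. A short Python sketch, assuming an ideal, uniformly distributed 32-bit hash (an idealization no real function matches exactly):

```python
import math

def collision_probability(n, bits=32):
    """Approximate chance that n uniformly random hashes of the given width collide."""
    space = 2 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))   # 1 - exp(-x), computed accurately

print(round(collision_probability(10_000), 4))   # 0.0116
print(round(collision_probability(77_163), 3))   # 0.5: a coin flip at ~77,000 strings
```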
Moving to 64-bit hashes or even 128-bit hashes isn't really the solution, though it reduces the probability of a collision.
If you want a better hash function you could look at the cryptographic hashes, but it would be better to reconsider your algorithm and decide if you can deal with the collisions some other way.
The System.Security.Cryptography namespace contains multiple classes which can do hashing for you (such as MD5) which will probably hash them better than you could yourself and will take much less effort.
You don't always have to reinvent the wheel.
Simple XOR is a bad hash: you'll find lots of strings which collide. The hash doesn't depend on the order of the letters in the string, for one thing.
Try using the FNV hash http://isthe.com/chongo/tech/comp/fnv/
This is really simple to implement. It multiplies the hash code by a prime after each XOR, so the same letters in a different order will produce a different hash.
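A minimal sketch of the 32-bit FNV-1a variant in Python (choosing FNV-1a is an assumption; the linked page describes both FNV-1 and FNV-1a). The offset basis 2166136261 and prime 16777619 are the published 32-bit parameters. A naive XOR-of-bytes hash is included for comparison, to show that the multiply step makes order matter:

```python
FNV32_OFFSET_BASIS = 0x811C9DC5   # 2166136261
FNV32_PRIME = 0x01000193          # 16777619

def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime (mod 2**32)."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h

def xor_hash(data: bytes) -> int:
    """Naive hash for comparison: XOR of all bytes, so byte order is irrelevant."""
    h = 0
    for byte in data:
        h ^= byte
    return h

print(hex(fnv1a_32(b"a")))                   # 0xe40c292c
print(xor_hash(b"ab") == xor_hash(b"ba"))    # True: anagrams collide
print(fnv1a_32(b"ab") == fnv1a_32(b"ba"))    # False
```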
Hash functions are not meant to return distinct values for distinct strings. However, a good hash function should return different values for strings that look alike. Hash functions are used for many purposes, including searching a large collection. If the hash function is good and returns values in the range [0,N-1], then a large collection of M objects will be divided into N collections, each having about M/N elements. This way, you need to search only an array of M/N elements instead of an array of M elements.
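The M/N argument is what a separate-chaining hash table relies on. A toy Python sketch with N buckets (the ChainedSet name and the use of Python's built-in hash are assumptions for illustration): a lookup hashes the key to one bucket, then scans only that bucket's roughly M/N entries, comparing keys because equal hash values are possible:

```python
class ChainedSet:
    """Toy hash set with separate chaining: each bucket is a plain list."""

    def __init__(self, n_buckets=16):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]   # hash picks one bucket

    def add(self, key):
        bucket = self._bucket(key)
        if key not in bucket:            # collisions happen, so compare the keys too
            bucket.append(key)

    def __contains__(self, key):
        return key in self._bucket(key)  # scans about M/N items, not all M


s = ChainedSet(n_buckets=8)
for word in ["alpha", "beta", "gamma", "delta", "epsilon"]:
    s.add(word)
print("gamma" in s, "omega" in s)        # True False
print(sum(len(b) for b in s.buckets))    # 5: each key is stored in exactly one bucket
```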
But, if you only have 2 strings, it is not faster to compute the hash value for those! It is better to just compare the two strings.
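An interesting hash function could be:
unsigned int hash(const char *name) {
    unsigned int mul = 1;
    unsigned int val = 0;
    while (*name != '\0') {
        /* weight each character by a power of 7 so that position matters;
           unsigned char avoids sign extension of non-ASCII bytes */
        val += mul * (unsigned char)*name;
        mul *= 7; /* you could use an arbitrary prime number, but test the hash dispersion afterwards */
        name++;
    }
    return val; /* unsigned arithmetic wraps around, which is well-defined in C */
}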
I fixed the syntax highlighting for him.
Also, for those who weren't sure about the environment or were suggesting a more secure hash: it's Classic (pre-.Net) VB, because .Net would require parentheses for the call to CopyMemory.
IIRC, there aren't any secure hashes built in for Classic VB. There's not much out there on the web either, so this may be his best bet.
I can't quite tell which environment you work in. Is this .Net code? If you really want good hash codes, I would recommend looking into cryptographic hashes (proven algorithms) instead of trying to write your own.
Btw, could you edit your post and paste the code in as a Code Sample (see toolbar)? This would make it easier to read.
"Don't do that."
Writing your own hash function is a big mistake, because your language certainly already has an implementation of SHA-1, which is a perfectly good hash function. If you only need 32 bits (instead of the 160 that SHA-1 provides), just use the last 32 bits of SHA-1.
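For illustration, in Python rather than VB, taking the last 32 bits of a SHA-1 digest with the standard hashlib module could look like this. Encoding the text as UTF-8 first is an assumption, since a hash is computed over bytes, not characters:

```python
import hashlib

def sha1_last32(text):
    """Return the last 32 bits of SHA-1(text encoded as UTF-8) as an unsigned integer."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()   # 20 bytes = 160 bits
    return int.from_bytes(digest[-4:], "big")

print(hex(sha1_last32("abc")))   # 0x9cd0d89d (SHA-1("abc") ends in ...9cd0d89d)
```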
This particular hash function XORs all of the characters in a string. Unfortunately, XOR is both associative and commutative:
(a XOR b) XOR c = a XOR (b XOR c)
a XOR b = b XOR a
So any strings with the same input characters will result in the same hash code. The two strings provided are identical except for the positions of two characters, so they get the same hash code.
You may need to find a better algorithm; MD5 would be a good choice.
The XOR operation is commutative; that is, when XORing all the chars in a string, the order of the chars does not matter. All anagrams of a string will produce the same XOR hash.
In your example, your second string can be generated from your first by swapping the "1" after "...Gen " with the first "2" following it.
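The swap described above can be checked with a rough Python model of the question's function. The model is a reconstruction, not the original code: it takes one byte per character (as the ANSI conversion does for these ASCII strings), packs the bytes little-endian into 4-byte words as CopyMemory into a Long array does on x86, and XORs every word. It does not reproduce VB's signed Long or the loop's lastEl - 1 bound, neither of which changes the outcome for this pair. Because whole words are XORed, a swap preserves the hash only when both positions are equal modulo 4, which is the case here:

```python
def chunked_xor_hash(text):
    """Model of the VB function: XOR the string's bytes taken 4 at a time as little-endian words."""
    data = text.encode("latin-1")              # one byte per character
    data += b"\x00" * (-len(data) % 4)         # pad the final word with zeros
    h = 0
    for i in range(0, len(data), 4):
        h ^= int.from_bytes(data[i:i + 4], "little")
    return h

s1 = "122Gen 1 heap size (.NET CLR Memory w3wp):mccsmtpteweb025.20833333333333E-02"
s2 = "122Gen 2 heap size (.NET CLR Memory w3wp):mccsmtpteweb015.20833333333333E-02"
print(s1.index("1", 3), s1.index("2", 8))                 # 7 55: both are 3 mod 4
print(chunked_xor_hash(s1) == chunked_xor_hash(s2))       # True
print(chunked_xor_hash("ab") == chunked_xor_hash("ba"))   # False: positions 0 and 1
```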
There is nothing wrong with your function. All useful hashing functions will sometimes generate collisions, and your program must be prepared to resolve them.
A collision occurs when an input hashes to a value already identified with an earlier input. If a hashing algorithm could not generate collisions, the hash values would need to be as large as the input values. Such a hashing algorithm would be of limited use compared to just storing the input values.
-Al.
There's a Visual Basic implementation of MD5 hashing here:
http://www.bullzip.com/md5/vb/md5-visual-basic.htm
