Programming and data structures - algorithm

Suggest a data structure for representing a subset S of integers from 1 to n. Following operations on the set S are to be performed in constant time (independent of cardinality of S).
You may assume that the data structure has been suitable initialized.
(i). MEMBER (X):
Check whether X is in the set S or not
(ii). FIND-ONE(S): If S is not empty, return one element of the set S (any arbitrary element will do)
(iii). ADD (X): Add integer X to set S
(iv). DELETE (X): Delete integer X from S.
My analysis:-
I think hash table will work fine here ,but how will hash table work for FIND-ONES(S) operation.Because i might need to scan the entire has table to look for the present element.

You can just use a regular hashset for this in java. In the case of the FIND-ONE(S) what you would do is, call isEmpty(). If that returns false, use the built in iterator, and just get the first value the iterator returns.

A hash table would work, but you need to think about the specific implementation. If you use the compact version from Python 3.6, you can perform FIND-ONEs in constant time by inspecting the entries list.
For example, the dictionary:
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
is represented as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
[-8522787127447073495, 'barry', 'green'],
[-6480567542315338377, 'guido', 'blue']]

Related

How does hashtable read correct values in case of collision?

I have some hashtable. For instance I have two entities like
john = { 1stname: jonh, 2ndname: johnson },
eric = { 1stname: eric, 2ndname: ericson }
Then I put them in hashtable:
ht["john"] = john;
ht["eric"] = eric;
Let's imagine there is a collision and hashtable use chaining to fix it. As a result there should be a linked list with these two entities like this
How does hashtable understand what entity should be returned for key? Hash values are the same and it knows nothing about entities structure. For instance if I write thisvar val = ht["john"]; how does hashtable (having only key value and its hash) find out that value should be john record and not eric.
I think what you are confused about is what is stored at each location in the hashtable's adjacent list. It seems like you assume that only the value is being stored. In fact, the data in each list node is a tuple (key, value).
Once you ask for ht['john'], the hashtable find the list associated with hash('john') and if the list is not empty it searches for the key 'john' in the list. If the key is found as the first element of the tuple then the value (second element of the tuple) is returned. If the key is not found, then it means that the element is not in the hashtable.
To summarize, the key hash is used to quickly identify the cell in which the element should be stored if present. Actual key equality is tested for to decide whether the key exists or not.
Is this what you are asking for? I have already put this in comments but seems to me you did not follow link
Collision Resolution in the Hashtable Class
Recall that when inserting an item into or retrieving an item from a hash table, a collision can occur. When inserting an item, an open slot must be found. When retrieving an item, the actual item must be found if it is not in the expected location. Earlier we briefly examined two collusion resolution strategies:
Linear probing
Quardratic probing
The Hashtable class uses a different technique referred to as rehasing. (Some sources refer to rehashing as double hashing.)
Rehasing works as follows: there is a set of hash different functions, H1 ... Hn, and when inserting or retrieving an item from the hash table, initially the H1 hash function is used. If this leads to a collision, H2 is tried instead, and onwards up to Hn if needed. The previous section showed only one hash function, which is the initial hash function (H1). The other hash functions are very similar to this function, only differentiating by a multiplicative factor. In general, the hash function Hk is defined as:
Hk(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize – 1)))] % hashsize
Mathematical Note With rehasing it is important that each slot in the hash table is visited exactly once when hashsize number of probes are made. That is, for a given key you don't want Hi and Hj to hash to the same slot in the hash table. With the rehashing formula used by the Hashtable class, this property is maintained if the result of (1 + (((GetHash(key) >> 5) + 1) % (hashsize – 1))and hashsize are relatively prime. (Two numbers are relatively prime if they share no common factors.) These two numbers are guaranteed to be relatively prime if hashsize is a prime number.
Rehasing provides better collision avoidance than either linear or quadratic probing.
sources here

find endpoints for range given a value within the range

I am trying to solve a simple problem, but at the moment I cannot think of a better solution. I am testing an API that is not documented.
There is an ID used to fetch objects and it has a min and max value with random values missing in-between. I'm trying to test the responses I receive for random objects, but to find objects, I need to have valid IDs.
It would be very inefficient to test random numbers and hope that I get an object back. The best I can do is find a range, get a random number between that range and check if it exists before conducting tests.
A sample list of all of the IDs in the database might look like this:
[1005, 25984, 25986, 29587, 30000, ...]
Assuming the deviation from one value to another will never exceed C, e.g. from the first value to the next value, the difference will never be greater than a pre-defined constant, how would you calculate the min/max of the range given only one value in the range?
Starting from a given value and looping until the last value is found is horrible but that is how it was implemented by previous devs. Below is pseudocode that more or less covers what they do.
// this can be any valid object ID from the database
// assuming the ID's in the database are [1005, 25984, 25986, 29587, 30000]
// "i" could be any one of these values
var i = givenPredefinedObjectId;
var deviation = 100;
// objectWithIdExists() is going to lookup an object with the ID "i" in the database
// if there is no object with the ID "i" , it will return false
// otherwise the object will get tested and return true
while(objectWithIdExists(i)){
i++;
}
for(i; i < i+deviation; i++){
if(objectWithIdExists(i)){
goto while loop;
}
}
endPoint = i - deviation;
Assuming there is no knowledge about the possible values except you can check if they exist and you are given one valid value (there is no array with all possible IDs, that was just an example), how would you find the min/max values?
Unbounded binary search is feasible, with a factor of C slowdown. Given an algorithm for unbounded binary search that, given access to the oracle less_equal(n) for some natural number n, returns n in time O(log n), implement the oracle on input k by querying all of the IDs C*k, C*k+1, ..., C*k+C-1 and reporting that k is less than or equal to n if and only if one ID is found. The running time is O(C*log((max-min)/C)).

Ada Enumerated Type Range

As I understand it, Ada uses 0 based indexes on its enumerated types.. So in Status_Type below, the ordinal value goes from 0 to 5.
type Status_Type is
(Undefined,
Available,
Fout,
Assigned,
Effected,
Cleared);
My question is.. what are the ordinal values for the following examples? Do they start at 0 or do they start from the ordinal value from the super type?
subtype Sub_Status_Type is Status_Type
range Available.. Effected;
subtype Un_Status_Type is Sub_Status_Type
range Fout .. Assigned;
Would Sub_Status_Type ordinal values go from 1 to 4 or from 0 to 3?
Would Un_Status_Type ordinal values go from 3 to 4 or from 1 to 2 or from 0 to 1?
For the subtypes, a 'pos will return the same value as it would have for the base type (1..4 and 2..3 respectively, I believe). Subtypes aren't really new and different types, so much as they are the same old type, but with some extra limitations on its possible values.
But it should be noted that these values are assigned under the scenes. It really should make no difference to you what they are, unless you are using the 'val and 'pos attributes, or you are interfacing to code written outside of Ada (or to hardware).
Plus, if it does end up mattering, you should know that the situation is actually much more complicated. 'pos and 'val don't return the actual bit value the compiler uses for those enumeration values when it generates code. They just return their "ordinal position"; their offset from the first value.
By default they will usually be the same thing. However, you can change the value assignments (but not the ordinal position assignments) yourself with a for ... use clause, like in the code below:
for Status_Type use
(Undefined => 1,
Available => 2,
Out => 4,
Assigned => 8,
Effected => 16,
Cleared => 32);
The position number is defined in terms of the base type. So Sub_Status_Type'Pos(Assigned) is the same as Status_Type'Pos(Assigned), and the position values of Sub_Status_Type go from 1 to 4, not 0 to 3.
(And note that the position number isn't affected by an enumeration representation clause; it always starts at 0 for the first value of the base type.)
Incidentally, it would have been easy enough to find out by running a small test program that prints the values of Sub_Status_Type'Pos(...) -- which would also have told you that you can't use the reserved word out as an identifier.
As I understand it, Ada uses 0 based indexes on its enumerated types
Yes, it uses 0 for the indexes, or rather for the position of the values of the type. This is not the value of the enumeration literals, and not the binary representation of them.
what are the ordinal values for the following examples?
There are no "ordinal" values. The values of the type are the ones you specified. You are confusing "value", "representation", and "position" here.
The values of your Status_Type are Undefined, Available, Out, Assigned, Effected, and Cleared.
The positions are 0, 1, 2, 3, 4, and 5. These are what you can use to translate with 'Pos and 'Val.
The representation defaults to the position, but you can freely assign other values (as long as you keep the correct order). These are used if you write it to a file, or send it through a socket, or load it into a register..
I think the best way to answer your questions is in reverse:
A subtype is, mathematically speaking, a continuous subset of its parent type. So, if the type SIZES is (1, 2, 3, 4, 5, 6, 7, 8) and you define a subtype MEDIUM as (4,5) the first element of MEDIUM is 4.
Example:
Type Small_Natural is 0..16;
Subtype Small_Positive is Small_Natural'Succ(Small_Natural'First)..Small_Natural'Last;
This defines two small sets of possible-values, which are tightly related: namely that Positive numbers are all the Natural Numbers save Zero.
I used this form to illustrate that with a few text-changes we have the following example:
Type Device is ( Not_Present, Power_Save, Read, Write );
Subtype Device_State is Device'Succ(Device'First)..Device'Last;
And here we are modeling the intuitive notion that a device must be present to have a state, but note that the values in the subtype ARE [exactly] the values in the type from which they are derived.
This answers your second question: Yes, an element of an enumeration would have the same value that its parent-type would.
As to the first, I believe the starting position is actually implementation defined (if not then I assume the LM defaults it to 0). You are, however free to override that and provide your own numbering, the only restriction being that elements earlier in the enumeration are valued less than the value that you are assigning currently [IIRC].

Is there an easy way to have a "mode" function on an array of singles in vb6?

I need to run "mode" (which value occurs most frequently) on an array of singles in vb6. Is there a quick way do do this on large arrays?
Have a look online for a decent implementation of a sort algorithm for VB6 (I can't believe it doesn't have one built in!), sort the array, and then go through it counting the occurrences (which will be straightforward as you've all the same items together in the array) - keep a track of the most frequently occurring item on your way through and you're done. This should be O(n ln(n)) - that is, fast enough - if you've used a decent sort algorithm (quicksort or similar).
You could use a hash table. Hash all of the elements of your array (which is O(n)). You'll need a back-end data structure to hold the unique values that each hash bin contains and the number of occurances (some sort of associative memory similar to the C++ std::map). As long as you can guarantee that there will be no more than a constant, m, number of collisions (for dissimilar hash input values) in any given bin, this is O(m log m), but since m is constant, this is really O(1). This assumption may not be reasonable, but the key is to get good enough spread for your input values.
To pull out the mode, examine all of the elements in the hash table, which will be values that occur in your original input array and the number of times they occur. Find the value with the largest number of occurances (again O(n)). Total complexity is O(n) if you can find a suitable hash function. Worst case performance will be O(n log n) if the hash function doesn't provide you with good collision performance.
On another note, .Net provides a large runtime library that might make this easier. If it's feasible, you might want to consider using a new version of VB.
Included a reference to Microsoft Scripting Runtime, and used a Dictionary object to keep tally of frequency, then looked for index highest frequency and the corresponding key is the mode. Not the quickest/most elegant solution, but I just needed something up fast that worked.
Function fnModeSingle(ByRef pValues() As Single) As Single
Dim dict As Dictionary
Set dict = New Dictionary
dict.CompareMode = BinaryCompare
Dim i As Long
Dim pCurVal As Single
For i = 0 To uBound(pValues)
'limit the values that have to be analyzed to desired precision'
pCurVal = Round(pValues(i), 2)
If (pCurVal > 0) Then
'this will create a dictionary entry if it doesn't exist
dict.Item(pCurVal) = dict.Item(pCurVal) + 1
End If
Next
'find index of first largest frequency'
Dim KeyArray, itemArray
KeyArray = dict.Keys
itemArray = dict.Items
pCount = 0
Dim pModeIdx As Integer
'find index of mode'
For i = 0 To UBound(itemArray)
If (itemArray(i) > pCount) Then
pCount = itemArray(i)
pModeIdx = i
End If
Next
'get value corresponding to selected mode index'
fnModeSingle = KeyArray(pModeIdx)
Set dict = Nothing
End Function

Can't sort table with associative indexes

Why I can't use table.sort to sort tables with associative indexes?
In general, Lua tables are pure associative arrays. There is no "natural" order other than the as a side effect of the particular hash table implementation used in the Lua core. This makes sense because values of any Lua data type (other than nil) can be used as both keys and values; but only strings and numbers have any kind of sensible ordering, and then only between values of like type.
For example, what should the sorted order of this table be:
unsortable = {
answer=42,
true="Beauty",
[function() return 17 end] = function() return 42 end,
[math.pi] = "pi",
[ {} ] = {},
12, 11, 10, 9, 8
}
It has one string key, one boolean key, one function key, one non-integral key, one table key, and five integer keys. Should the function sort ahead of the string? How do you compare the string to a number? Where should the table sort? And what about userdata and thread values which don't happen to appear in this table?
By convention, values indexed by sequential integers beginning with 1 are commonly used as lists. Several functions and common idioms follow this convention, and table.sort is one example. Functions that operate over lists usually ignore any values stored at keys that are not part of the list. Again, table.sort is an example: it sorts only those elements that are stored at keys that are part of the list.
Another example is the # operator. For the above table, #unsortable is 5 because unsortable[5] ~= nil and unsortable[6] == nil. Notice that the value stored at the numeric index math.pi is not counted even though pi is between 3 and 4 because it is not an integer. Furthermore, none of the other non-integer keys are counted either. This means that a simple for loop can iterate over the entire list:
for i in 1,#unsortable do
print(i,unsortable[i])
end
Although that is often written as
for i,v in ipairs(unsortable) do
print(i,v)
end
In short, Lua tables are unordered collections of values, each indexed by a key; but there is a special convention for sequential integer keys beginning at 1.
Edit: For the special case of non-integral keys with a suitable partial ordering, there is a work-around involving a separate index table. The described content of tables keyed by string values is a suitable example for this trick.
First, collect the keys in a new table, in the form of a list. That is, make a table indexed by consecutive integers beginning at 1 with keys as values and sort that. Then, use that index to iterate over the original table in the desired order.
For example, here is foreachinorder(), which uses this technique to iterate over all values of a table, calling a function for each key/value pair, in an order determined by a comparison function.
function foreachinorder(t, f, cmp)
-- first extract a list of the keys from t
local keys = {}
for k,_ in pairs(t) do
keys[#keys+1] = k
end
-- sort the keys according to the function cmp. If cmp
-- is omitted, table.sort() defaults to the < operator
table.sort(keys,cmp)
-- finally, loop over the keys in sorted order, and operate
-- on elements of t
for _,k in ipairs(keys) do
f(k,t[k])
end
end
It constructs an index, sorts it with table.sort(), then loops over each element in the sorted index and calls the function f for each one. The function f is passed the key and value. The sort order is determined by an optional comparison function which is passed to table.sort. It is called with two elements to compare (the keys to the table t in this case) and must return true if the first is less than the second. If omitted, table.sort uses the built-in < operator.
For example, given the following table:
t1 = {
a = 1,
b = 2,
c = 3,
}
then foreachinorder(t1,print) prints:
a 1
b 2
c 3
and foreachinorder(t1,print,function(a,b) return a>b end) prints:
c 3
b 2
a 1
You can only sort tables with consecutive integer keys starting at 1, i.e., lists. If you have another table of key-value pairs, you can make a list of pairs and sort that:
function sortpairs(t, lt)
local u = { }
for k, v in pairs(t) do table.insert(u, { key = k, value = v }) end
table.sort(u, lt)
return u
end
Of course this is useful only if you provide a custom ordering (lt) which expects as arguments key/value pairs.
This issue is discussed at greater length in a related question about sorting Lua tables.
Because they don't have any order in the first place. It's like trying to sort a garbage bag full of bananas.

Resources