Difference between hash tables and random access tables [closed] - data-structures

What is the difference between hash tables and random access tables? I feel that they are similar, but I wanted to find out the exact differences, and Googling did not help me much.

In general, hash tables exist to map one kind of entity to another. Depending on the programming language, that may mean mapping tuples to strings, strings to objects, strings to strings, and so on - the possibilities are endless.
Regular arrays let you address entries using an integer index:
array[index] ==> string, for example
In contrast, hash maps - also known as hash tables, dictionaries, associative arrays, or simply hashes - let you, among other possibilities, map a string to an integer:
hash_map['Bill'] => 23
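A minimal sketch of the two access styles in Python; 'Bill' => 23 comes from the line above, while the other names and values are made up for illustration:

# Array-style access: the key is a position.
names = ['Alice', 'Bill', 'Carol']
print(names[1])            # 'Bill' - addressed by integer index

# Hash-map-style access: the key is (almost) anything hashable.
ages = {'Bill': 23, 'Carol': 41}
print(ages['Bill'])        # 23 - addressed by string key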
For a basic understanding, start with:
Wikipedia on hash tables
Python dicts
PHP arrays
For a more advanced understanding, I recommend these two books:
'Algorithms' by Sedgewick
'Data Structures and Algorithms' by Drozdek

A hash table (a.k.a. hash map, associative array, dictionary, or just a hash) is a specific type of random access data structure.
A hash table is "random access" because it allows direct, "indexed" access to individual members in (expected) constant time. An array may also be considered a random access data structure, because you can fetch an individual element by its index.
In contrast, a linked list is not a random access data structure, because you need to iterate through its members to find a specific element.
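A small Python sketch of that contrast; the toy Node class below is written just for this example, not taken from any library:

# Random access: the cost does not depend on which element you ask for.
squares = [n * n for n in range(1000)]
print(squares[742])            # one indexing operation

table = {n: n * n for n in range(1000)}
print(table[742])              # one hash lookup (expected constant time)

# A linked list has no such shortcut: reaching element 742 means 742 hops.
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

head = None
for n in reversed(range(1000)):
    head = Node(n * n, head)   # build the chain 0 -> 1 -> 4 -> ...

node = head
for _ in range(742):           # walk the chain to the 743rd node
    node = node.next
print(node.value)              # same value as squares[742]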

Related

How To Empty a Dynamic Array [closed]

I need to reuse a dynamic array many times, since I consider that better for performance.
That way I don't need to create a new dynamic array every time I need one.
Can it lead to bugs or inefficiency if I use the same array for several operations, then clear it and reuse it? And how can I correct my procedure so that it does what I need?
My code:
procedure Empty(var local_array: TArray<Integer>);  // must be a var dynamic array for SetLength to affect the caller
var
  i: Integer;
begin
  for i := 0 to High(local_array) do
    local_array[i] := 0;        // Integer elements cannot be set to nil; use 0 as the "empty" value
  SetLength(local_array, 0);    // releases the array's storage
end;
If you want to reuse your array, don't mess with its size. Changing the size of an array - more specifically, increasing it - is what can trigger data reallocation.
What is array data reallocation?
In Delphi, every array must be stored in a contiguous memory block. This means that if you try to grow your array and there is already other data right after the memory block currently assigned to it, the whole array has to be moved to another memory location with enough room to hold the new, larger array in one contiguous block.
So instead of resizing your array, leave its size alone and just set the array items back to some default value. Yes, this means the array will keep occupying its allocated memory - but that is the whole point of reusing it: you avoid the overhead of repeatedly allocating and deallocating its memory.
If you go this way, don't forget to keep your own count of used items, since the array's length may be larger than the number of items actually in use.
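The same pattern, sketched in Python rather than Delphi since the idea is language-independent: allocate once, overwrite elements, and track how many slots are in use instead of shrinking and regrowing the buffer. The ReusableBuffer name and capacity are illustrative, not from the original post.

class ReusableBuffer:
    """Fixed-capacity buffer cleared by resetting a count, not by reallocating."""

    def __init__(self, capacity):
        self._items = [0] * capacity   # allocated once, reused across runs
        self._count = 0                # number of slots actually in use

    def append(self, value):
        if self._count == len(self._items):
            raise OverflowError("buffer full")
        self._items[self._count] = value
        self._count += 1

    def clear(self):
        self._count = 0                # keep the storage, just forget the contents

    def used(self):
        return self._items[:self._count]

buf = ReusableBuffer(1024)
for run in range(3):
    for v in range(5):
        buf.append(v * run)
    print(buf.used())
    buf.clear()                        # reuse the same memory on the next run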

Why do hashcode-based data structures use an array to create bins?

I was reading an answer to a question asked here:
Why does hashcode() returns an integer and not long?
My question is: why do hashcode-based data structures use an array to create bins?
Because an array is a low-level data structure which allows random access to its elements.
You need a "low-level" data structure to base a "higher-level" data structure on.
You need random access so that you can address bins very fast.
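To make that concrete, here is a minimal sketch in Python of how a hash code is reduced to an array index for a bin (separate chaining). Everything here - the TinyHashMap name, the bin count, the sample keys - is illustrative, not a real library implementation:

class TinyHashMap:
    def __init__(self, num_bins=8):
        self.bins = [[] for _ in range(num_bins)]   # the array of bins

    def _bin_for(self, key):
        # hash(key) is an integer, so it can be reduced to an array index
        return self.bins[hash(key) % len(self.bins)]

    def put(self, key, value):
        bin_ = self._bin_for(key)
        for i, (k, _) in enumerate(bin_):
            if k == key:
                bin_[i] = (key, value)              # overwrite an existing key
                return
        bin_.append((key, value))

    def get(self, key):
        for k, v in self._bin_for(key):
            if k == key:
                return v
        raise KeyError(key)

m = TinyHashMap()
m.put('apple', 3)
m.put('pear', 5)
print(m.get('apple'))   # 3, found by jumping straight to one bin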
Because an array is indexed by integers. You might then ask why arrays use integer-based indexing in the first place. One way to see it: if you could index by other types, such as real numbers, think how many "in-between" positions you would have to support - for the first index alone you could have sub-indexes like 1.1, 1.2, 1.1.2, 1.1.1.1.2, and so on.
That would only add overhead rather than bring us closer to the solution we want.

Distinction between a data structure's members being stored by hash value and by index [closed]

From Picking the right data structure in Swift:
Like we took a look at in “The power of sets in Swift”, one of the big advantages that sets have over arrays is that both inserts and removals can always be performed in constant (O(1)) time, since members are stored by hash value, rather than by index.
What does it mean if a data structure's members are stored by hash value rather than by index?
Arrays are allocated as single, large blocks of memory and entries are accessed by their indexes. The order of entries is fixed and they need have no particular identity apart from their position in the array.
Other more complex data structures allow one to store objects identified and accessed using some sort of key. (Hash tables, sets, dictionaries, ...) Let's call these "keyed collections". Some objects have a natural key e.g. "SocialSecurityNumber" but what should one do if a key is needed and there are no obvious candidate field/s in the data object?
Hashing is a technique which sets out to derive a "fairly unique identity" to associate with an object. Think of it as mapping numbers to (arbitrary) data.
Although there are some "standard hashing techniques", this is still a field that is evolving - involving some interesting mathematics.
Hashes have purposes including secure hashing (to detect and prevent deliberate tampering with data), error detection and - in this case - keyed (or hashed) data access.
A non-secure hash algorithm should be as fast as possible, but optimising for speed can involve a trade-off against the "fairly unique" part of the mapping requirement (while secure hashing is unavoidably - and sometimes deliberately - slower and more expensive).
Hashing cannot (ever) guarantee that a given hash value is unique to an object, so attention has to be given to minimising the occurrence of "collisions" and to dealing with them efficiently when they occur. This is a difficult subject on its own, when you consider that the data has to be treated as "arbitrary" - it may appear random, contain sequences or patterns, and/or include duplication.
With that said, assuming we have a "good" hash function, we can - in principle at least - store arbitrary objects in keyed collections.
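As a small illustration of that last point, here is a hypothetical Person type in Python, written only for this answer: once an object defines how it is hashed and compared, it can live in a keyed collection even though it has no obvious "natural key".

class Person:
    def __init__(self, name, born):
        self.name = name
        self.born = born

    def __eq__(self, other):
        return isinstance(other, Person) and (self.name, self.born) == (other.name, other.born)

    def __hash__(self):
        # derive a "fairly unique" identity from the object's fields
        return hash((self.name, self.born))

people = {Person('Ada', 1815), Person('Alan', 1912)}
print(Person('Ada', 1815) in people)   # True - located via its hash, not via a scan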
Important considerations
Arrays offer extremely fast sequential and random access (by index), while insert, delete and growth operations are slow.
Keyed collections have the advantage you quote of offering extremely fast inserts and deletes, but they are granular in nature and introduce complexities such as memory fragmentation (memory management is an overhead, added complexity means added cost).
Performance degrades rapidly when collisions start occurring.
There is no such thing as a free lunch and calculating hashes is relatively expensive (compared to simply using an index value or stored key).
There is a specific downside to hashes that "natural keys" and indexes do not have, which is that they do not offer a natural ordering/sequence. (Processing objects sequentially according to their hash values is tantamount to processing them randomly.)
It is always important to choose data structures appropriate to their intended use (but that's what the link you quote is all about;-)
You are essentially asking what the difference is between an array and a hash map/table/set. This is covered in any computer science "Data Structures" course, and I am sure you can google a high-level overview of each. Highly recommended :)
In short:
You can imagine an array as a long shelf with cells, where each cell has a sequence number (a.k.a. index):
Array: [ dog ][ cat ][ mouse ][ fox ]...
where dog is at cell #0, cat is at #1, and so on.
Now, in an array you can retrieve objects by cell index, as in "give me the content of cell #1". But in order to find out whether you have a "mouse" in your array, you have to iterate over all the cells. (Inefficient.)
Sets (which are typically hash-based) store objects using another kind of index - a "hash code", a pseudo-unique number calculated from each object by a hash function (without going into details). So cat and mouse will have distinct hash codes, and it is very efficient for a Set to find out whether it contains a "mouse".
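The difference described above, sketched in Python (the question is about Swift, so Python's built-in set stands in for Swift's Set here):

animals_array = ['dog', 'cat', 'mouse', 'fox']
animals_set = {'dog', 'cat', 'mouse', 'fox'}

# Array: a membership test walks the cells one by one - O(n).
print('mouse' in animals_array)   # True, but found by scanning

# Set: the hash of 'mouse' says which bucket to look in - expected O(1).
print('mouse' in animals_set)     # True, found via its hash code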

When is a set better than an array? [duplicate]

I have used arrays in Ruby a lot, but I never got a chance to use Set. My question is: when is a Set useful, and when is it better than an array?
From the documentation, the initial definitions go as follows:
Array: An integer-indexed collection of objects.
Set: A collection of unordered values with no duplicates.
In a nutshell, you should use Set when you want to make sure that each element in the collection is unique, when you want to test whether a given element is present in the collection, and when you won't require random access to the objects.
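A short sketch of those three points; the question is about Ruby, but the same behaviour is easy to show with Python's built-in set, so treat this as a language-neutral illustration:

tags = set()
for t in ['ruby', 'rails', 'ruby', 'gems', 'rails']:
    tags.add(t)                  # duplicates are silently ignored

print(len(tags))                 # 3 - each element is unique
print('rails' in tags)           # True - fast membership test
# There is no tags[0]: sets do not offer random access by position.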

Using an array list with a hash table

I'm attempting to build a simple hash table from scratch. The hash table I currently have uses an array of linked lists. The hashing function takes the hash value of a key-value pair object modulo the size of the array to get an index. This is all well and good, but I'm wondering if I could dynamically expand my array by using an array list once it starts to fill up (tell me why this is not a good idea if you think so). Obviously the hash function would be compromised, since we're finding indexes using the array length. What would be a good hash function that would allow my array of linked lists to expand without compromising the integrity of the hashing?
If I am understanding your question correctly, you will have to re-hash all elements after expanding the bucket array. It can be done by iterating over the contents of the old hash table, and inserting them into the newly expanded hash table.
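A minimal sketch of that rehash-on-resize step in Python, using a toy bucket-array layout; none of the names or sample data are from the original post:

def rehash(old_bins, new_size):
    """Move every (key, value) pair into a freshly sized bucket array."""
    new_bins = [[] for _ in range(new_size)]
    for bin_ in old_bins:
        for key, value in bin_:
            # the index must be recomputed, because it depends on the array length
            new_bins[hash(key) % new_size].append((key, value))
    return new_bins

bins = [[] for _ in range(4)]                        # 4 buckets to start with
for key, value in [('apple', 3), ('pear', 5), ('plum', 7)]:
    bins[hash(key) % len(bins)].append((key, value))

bins = rehash(bins, 8)                               # grow the bucket array and re-insert everything
print(sum(len(b) for b in bins))                     # still 3 entries, now in recomputed slots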

Resources