Difference between Symbol table and Hash map data structures - data-structures

While reading about different data structures, I found that the symbol table used by compilers is classified as a data structure.
Can someone explain the difference between the symbol table data structure and a hash map?

TL;DR: A symbol table is not a data structure. It is an abstract data type (ADT), so it can't be compared directly with a hash map, which is a data structure. But the two are very closely related.
Detailed Explanation:
First of all, a symbol table is not a data structure. It is an abstract data type (ADT) in computer science. This ADT is more commonly known as a dictionary.
An implementation of an ADT is called a data structure. Many data structures implement the symbol table/dictionary ADT. One of them is the hash map. Other possible data structures that implement this ADT include:
Unordered array implementation
Ordered (sorted) array implementation
Unordered linked list implementation
Ordered linked list implementation
Binary search tree based implementation
Balanced binary search tree based implementation
Ternary search tree implementation
Hashing based implementation - e.g. Hash Map
Please remember that the above list is not exhaustive. There can be more implementations of this ADT.
Note: You might want to read this thread to understand the difference between ADT and data structure.
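To make the ADT/data structure split concrete, here is a minimal Python sketch. The class names `ListSymbolTable` and `HashSymbolTable` and the helper `declare_variables` are made up for illustration. Both classes offer the same `put`/`get` operations (the ADT), but one is backed by an unordered array and the other by hashing.

```python
class ListSymbolTable:
    """Unordered-array implementation: put and get scan the list, O(n)."""

    def __init__(self):
        self._pairs = []  # list of [key, value] entries

    def put(self, key, value):
        for pair in self._pairs:
            if pair[0] == key:
                pair[1] = value  # key already present: overwrite
                return
        self._pairs.append([key, value])

    def get(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)


class HashSymbolTable:
    """Hashing-based implementation: delegates to Python's built-in dict."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table[key] = value

    def get(self, key):
        return self._table[key]


def declare_variables(table):
    """Client code that only relies on the ADT operations put and get."""
    table.put("x", "int")
    table.put("y", "float")
    table.put("x", "long")  # redefinition replaces the old entry
    return table.get("x"), table.get("y")
```

`declare_variables` behaves the same whichever implementation it is given; only the running time differs.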

A Symbol Table isn't a data structure per se. Most compilers will require one or more symbol tables but their exact form isn't limited to one particular data structure. Some compilers may choose to implement their symbol table as a hash map, if that's suitable for their purposes.
So I'd say the difference is conceptual. "Symbol Table" describes a data structure by its purpose. "Hash Map" describes a data structure by its implementation.
The Wikipedia page isn't too bad

"The primary purpose of a symbol table is to associate a value with a key. Symbol-table implementations are generally characterized by their underlying data structures and their implementations of get() and put().
Search algorithms that use hashing consist of two separate parts. The first part is to compute a hash function that transforms the search key into an array index.
The second part of a hashing search is a collision-resolution process."
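The two parts described in the quote can be sketched roughly as follows. This is only an illustration: the class name `ChainedHashTable` is hypothetical, and separate chaining is just one of several collision-resolution strategies.

```python
class ChainedHashTable:
    """Symbol table using hashing with separate chaining."""

    def __init__(self, capacity=8):
        # One bucket (a list of (key, value) tuples) per array slot.
        self._buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        # Part 1: the hash function turns the search key into an array index.
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        # Part 2: collision resolution. Keys that land on the same index
        # share a bucket, which is searched linearly.
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        bucket = self._buckets[self._index(key)]
        for existing_key, value in bucket:
            if existing_key == key:
                return value
        raise KeyError(key)
```

With a very small capacity, several keys end up in the same bucket, yet `get` still returns the right value for each of them.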

Related

Can you have different types of primitive data types in a dynamic array?

I am new to data structures in computer science. I am trying to find out about all the types of implementations of lists. I started with dynamic arrays and I wanted to know if it is possible to have different types of primitive data types in a dynamic array data structure.
I thought that "dynamic" only means that you can remove, insert and add to your array without caring about its size. But do you also have to care about the types of the elements in the array?
The terms you are looking for are heterogeneous and homogeneous. Heterogeneous lists can store different kinds of elements, while homogeneous lists are limited to one type of element.
Python is a good example of heterogeneous lists. They are implemented by storing references to the different objects in the list. So from a technical point of view, they store homogeneous references, but from a user's perspective they store different types, such as integers, strings, and other objects.
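A short Python illustration of both points (the variable names are arbitrary): one list holds values of several types, and because the list stores references, a change to a shared object is visible through the list.

```python
# A single list holding an int, a str, a float and another list.
items = [42, "hello", 3.14, [1, 2]]
type_names = [type(item).__name__ for item in items]
print(type_names)  # ['int', 'str', 'float', 'list']

# The list stores references, not copies of the objects.
shared = [1, 2]
container = [shared, "text"]
shared.append(3)
print(container[0] is shared)  # True
print(container[0])            # [1, 2, 3]
```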
The term dynamic data structure only refers to its size/structure in runtime, as in it can change on runtime.
So for example, in C++ a built-in array is a static data structure, whereas a std::vector or std::set is what you might call dynamic.
Storing multiple data types in one data structure is a different matter. What you are referring to there is a dynamically typed language.
Any data structure can hold elements of different types if the language is dynamically typed, such as Python. The data structure itself does not need to be dynamic for that to happen.

Hashtable with both values as key

Is there a hashing-based data structure where I can look up an item in O(1) time by either key or value?
This can be achieved by adding a duplicate entry for each key-value pair with key and value reversed, but that takes double the space.
This kind of data structure might be useful in some scenarios: like I want to store opening and closing parenthesis in a map and while parsing the string, I can just check in the map if the key is present without worrying about whether it is opening-closing map or closing-opening map or without storing duplicate.
I hope I am clear enough!!
The data structure that fulfills your needs is called a bidirectional map.
I suppose that you are looking for an existing implementation, not for pointers on how to implement it :) Since you didn't specify the programming language, here is the current situation for Java: there is no such data structure in the Java API. However, there is Google Guava's bi-directional map interface with several implementations. From the docs:
A bimap (or "bidirectional map") is a map that preserves the
uniqueness of its values as well as that of its keys. This constraint
enables bimaps to support an "inverse view", which is another bimap
containing the same entries as this bimap but with reversed keys and
values.
Alternatively, there is BidiMap from Apache Collections.
For C++, have a look at Boost.Bimap.
For Python, have a look at bidict.
In C#, as well as in other languages, there does not exist an official implementation, but that's where Jon Skeet comes in.
You're searching for a bidirectional map. Here is an article describing the implementation in C++. Note though that a bidirectional map is basically two maps merged into a single object. There isn't any more efficient solution than this, for a simple reason:
A map is basically an unconnected directed graph of (key, value) pairs. Each pair is represented by an edge. If you want the map to be bidirectional, you'll wind up with twice as many edges, thus doubling the amount of required memory.
Neither the C++ standard library nor the Java standard library provides a class for this purpose. In Java you can use Google's Guava library; in C++ the Boost library provides bidirectional maps.
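The "two maps merged into a single object" idea can be sketched in Python as follows. The class `BiMap` and its method names are hypothetical and are not the API of Guava, Boost.Bimap or bidict. It keeps keys and values unique by dropping any stale pairing on insert.

```python
class BiMap:
    """Bidirectional map backed by two dicts (forward and backward)."""

    def __init__(self):
        self._forward = {}   # key -> value
        self._backward = {}  # value -> key

    def put(self, key, value):
        # Remove stale pairings so keys and values both stay unique.
        if key in self._forward:
            del self._backward[self._forward[key]]
        if value in self._backward:
            del self._forward[self._backward[value]]
        self._forward[key] = value
        self._backward[value] = key

    def get_value(self, key):
        return self._forward[key]

    def get_key(self, value):
        return self._backward[value]

    def __len__(self):
        return len(self._forward)


# The parenthesis example from the question: look up in either direction.
brackets = BiMap()
brackets.put("(", ")")
brackets.put("[", "]")
print(brackets.get_value("("))  # )
print(brackets.get_key("]"))    # [
```

Both lookups are dict accesses, so they run in expected O(1) time at the cost of storing every pair twice.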

is Tree, a data structure or abstract data type?

I hear many people referring to a tree as a data structure. But trees are mostly implemented using linked lists or arrays. So does that make a tree an abstract data type?
Given a type of structure, how can we determine whether it is a data structure or abstract data type?
If you are talking about a general tree without specifying its implementation or any underlying data structure, then it is an abstract data type (ADT). An ADT is any data type that doesn't specify its implementation.
But once you start talking about a concrete tree with a specific implementation using linked lists or arrays, that concrete tree is a data structure.
With the above out of the way, the following may help you clear other confusions related to your question. Correct me if I'm wrong!
Data Type
The definition of data type from Wikipedia:
A data type or simply type is a classification identifying one of various types of data.
Data type is only a classification of data. It doesn't have any specifications about how those data are implemented. IMHO, data type is only a theoretical concept.
For example, any real number can be of the data type real. But along with integers, they can both be classified as a numeric data type, say number.
As I just pointed out, an ADT is one kind of data type. But can string and int be considered ADTs?
The answer is both yes and no.
Yes, because programming languages can implement string and int in many ways, on the condition that these data types share consistent properties across all programming languages.
No, because these primitive data types are not as abstract as stacks or queues. Since they seldom share consistent properties across programming languages, their users must know about underlying problems such as arithmetic overflow. Two languages may both have an int data type, but one may be arbitrary-precision while the other is limited to 32 bits. Having to know this kind of technical detail is not what ADTs promise. Let's look at stacks instead. In every programming language, a stack promises consistent operations such as pop and push. You don't need to know any other implementation-level details; you just use them the same way in every language.
Data Structure
Let's see the definition of data structure from Wiki:
A data structure is a particular way of organizing data in a computer so that it can be used efficiently.
As you can see, a data structure is all about implementation. It is not conceptual but concrete. In my opinion, every piece of data in a program can by definition be considered a data structure. A string can. An int can. And a whole bunch of other things like LinkedList_Stack or Array_Stack are all data structures.
Some of you might ask why int is a data structure. It is a data structure at a lower level, from the point of view of a programming language's author, because a programming language can store an int in a computer in many ways. The most common solution is two's complement; alternatives include offset binary and ones' complement. From a user's view, however, we see int as a primitive data type that the programming language offers out of the box, and we don't care about its implementation. It is just a building block of the language. So for us users, any data constructed from these building blocks (primitive data types) is more like a data structure. For authors of programming languages, the building blocks are lower-level machine code, so for them int is definitely a data structure.
Put simply, whether one thing is a data structure or not really depends on how we look at it.
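As a rough Python sketch of the stack example above: the class names `ArrayStack` and `LinkedListStack` echo the Array_Stack and LinkedList_Stack mentioned earlier, and `reverse_with` is a made-up helper. The ADT is the push/pop contract; each class is one data structure that implements it.

```python
class ArrayStack:
    """Stack implemented on top of a dynamic array (Python list)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()


class LinkedListStack:
    """Stack implemented as a singly linked list of (item, next) tuples."""

    def __init__(self):
        self._head = None

    def push(self, item):
        self._head = (item, self._head)

    def pop(self):
        if self._head is None:
            raise IndexError("pop from empty stack")
        item, self._head = self._head
        return item


def reverse_with(stack, values):
    """Uses only the ADT operations, so any stack implementation works."""
    for value in values:
        stack.push(value)
    return [stack.pop() for _ in values]
```

`reverse_with` gives the same result for either class, which is exactly the consistency an ADT promises.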
Via google:
In computer science, an abstract data type (ADT) is a mathematical
model for a certain class of data structures that have similar
behavior
So clearly, it is both.

Hashes: Tables, Lists and Maps, Oh My?

I've been trying to find some concrete (laymen; non super-academic) definitions for the various types of hash data structures, specifically hash tables, hash lists and hash maps. Online searches provide many useful links to all of these, but never give clear definitions of when it is appropriate to use each over the others.
(1) From a practical standpoint, what's the difference between these 3?
(2) How do their operations' run times differ? Are there clear instances when one should be used or avoided over the other types of hashes?
(3) How do each of these relate back to the Map ADT? Are they all just different implementations of it, or different beasts altogether?
Thanks for any insight here!
There's an abstract data type that represents a mapping between keys and values. It has several different names, including Map, Dictionary, Table, Association Table, and more.
The most basic operations that should be supported by this data-structure are adding, removing and retrieving a value, given its associated key. There are variations and additions around this basic concept - for instance, some structures support iterating over all the key-value pairs, some structures support multiple values per key, etc. There's also a difference in time and space complexity between the various implementations.
Of the multiple implementations available for this data structure, some of the most popular ones use hash functions for fast access times. Those implementations are sometimes called Hash Table or Hash Map; you can read more about them on Wikipedia. Performance also varies between hash table implementations, with some reaching amortized O(1) insertion and access complexity (at the price of a lot of space used).
A hash list, on the other hand, is a different thing, and is more about how a data structure is used than about its actual structure. A hash list is usually just a regular list of hash values, nothing special about it. It's used when verifying the integrity of a large piece of data: it allows individual data chunks to be verified independently, so that only the bad chunks need to be fixed or retrieved. This is as opposed to using a single hash value for the entire piece of data, in which case a failure means all the data has to be fixed or retrieved again.
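A hash list as described above might look like this in Python. The function names `build_hash_list` and `find_bad_chunks` are made up for illustration, and the choice of SHA-256 and a fixed chunk size are assumptions, not something the answer specifies.

```python
import hashlib


def build_hash_list(data, chunk_size):
    """Return one SHA-256 hex digest per fixed-size chunk of data."""
    return [
        hashlib.sha256(data[start:start + chunk_size]).hexdigest()
        for start in range(0, len(data), chunk_size)
    ]


def find_bad_chunks(data, chunk_size, expected_hashes):
    """Return the indices of chunks whose hash differs from the expected one."""
    actual_hashes = build_hash_list(data, chunk_size)
    return [
        index
        for index, (actual, expected) in enumerate(zip(actual_hashes, expected_hashes))
        if actual != expected
    ]


original = b"abcdefghijkl"
trusted_hashes = build_hash_list(original, chunk_size=4)

# One byte in the second chunk is corrupted during transfer.
received = b"abcdXfghijkl"
print(find_bad_chunks(received, 4, trusted_hashes))  # [1]
```

Only chunk 1 needs to be fetched again; with a single hash over the whole payload, the receiver would only learn that something, somewhere, is wrong.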

Looking for a good definition of a map, and if maps can be implemented using trees

I'm going through some potential interview questions, one of which is whether you can implement a map or a linked list using a tree.
But even after some time googling I don't have a clear idea of exactly what a map is, or how it differs from an array or a hash table, for example. Could anybody provide a clear description?
Can a map, and a linked list, be written as a tree?
A Map, aka Dictionary or associative array, is a data structure that allows you to look up a value using a key.
A Java Map can be implemented as a HashMap or a TreeMap; that suggests that hash map is one possible implementation and yes, you can implement a Map as a tree.
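To show that a map really can be built on a tree, here is a minimal Python sketch using an unbalanced binary search tree. The class name `TreeMap` is borrowed only as a label. Java's TreeMap is a balanced red-black tree, which this sketch does not attempt to reproduce.

```python
class TreeMap:
    """Map implemented as an (unbalanced) binary search tree keyed by key."""

    class _Node:
        def __init__(self, key, value):
            self.key = key
            self.value = value
            self.left = None
            self.right = None

    def __init__(self):
        self._root = None

    def put(self, key, value):
        if self._root is None:
            self._root = self._Node(key, value)
            return
        node = self._root
        while True:
            if key == node.key:
                node.value = value  # existing key: overwrite
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, self._Node(key, value))
                return
            node = child

    def get(self, key):
        node = self._root
        while node is not None:
            if key == node.key:
                return node.value
            node = node.left if key < node.key else node.right
        raise KeyError(key)

    def keys(self):
        """In-order traversal yields the keys in sorted order."""
        result = []

        def visit(node):
            if node is not None:
                visit(node.left)
                result.append(node.key)
                visit(node.right)

        visit(self._root)
        return result
```

Unlike a hash map, this tree-based map also keeps its keys in sorted order, which is the usual reason to choose it.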
Can it (a map), and a linked list be written as a tree?
Hash maps are usually backed by arrays. To determine where an entry goes, you compute a hash of its key. Look here for a better explanation.
Trees (with an undetermined number of nodes) can be implemented using lists (see here for further discussion). Lists are not usually implemented as trees.
I'd encourage you to get this book, which is a classic on data structures and will give you a lot of really great information.
