Question regarding Algorithm Design Manual - Data Structure for the Dictionary - algorithm

I started reading The Algorithm Design Manual, and while reading it I came across a line that I don't understand. Can someone please clarify what the author means here? The line is:
Sorted linked lists or arrays – Maintaining a sorted linked list is usually
not worth the effort unless you are trying to eliminate duplicates, since we
cannot perform binary searches in such a data structure. A sorted array will
be appropriate if and only if there are not many insertions or deletions.
This line appears in the context of choosing a data structure for a dictionary.
What I don't get is why the author says that "Maintaining a sorted linked list is usually not worth the effort unless you are trying to eliminate duplicates, since we cannot perform binary searches in such a data structure."
I googled whether we can binary search a sorted array, and based on what I found it looks like we can. So I am not sure what the author means.
Can someone please help me understand this?
Thanks so much.

You cannot perform binary search on a linked list efficiently because you cannot seek to an arbitrary position in constant time. To find the midpoint you have to take n/2 steps (traversing the list). This adds a large overhead and makes linked lists unsuitable as the basis for binary search.
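To make this concrete, here is a minimal Python sketch (the `Node` class and helper names are hypothetical). It contrasts a binary search over a sorted array, using the standard `bisect` module, with the cost of merely locating the midpoint of a linked list that holds the same data.

```python
import bisect

class Node:
    """Singly linked list node (hypothetical helper for this sketch)."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def array_contains(arr, target):
    # Binary search on a sorted Python list: O(log n), because arr[i] is O(1).
    i = bisect.bisect_left(arr, target)
    return i < len(arr) and arr[i] == target

def linked_middle(head, length):
    # Reaching index length // 2 means following next pointers one by one.
    hops = 0
    node = head
    for _ in range(length // 2):
        node = node.next
        hops += 1
    return node, hops

values = list(range(0, 2000, 2))      # sorted array of 1000 even numbers
head = None
for v in reversed(values):             # the same data as a linked list
    head = Node(v, head)

print(array_contains(values, 998))     # True
middle, hops = linked_middle(head, len(values))
print(middle.value, hops)              # 1000 500
```

Each halving step of a binary search would repeat this kind of walk on a sublist, so the total traversal is roughly n/2 + n/4 + ... ≈ n hops, which is no better than a plain linear scan.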

Related

Iterating over classes in a disjoint set data structure

I've implemented a disjoint set data structure for my program and I realized I need to iterate over all equivalence classes.
Searching the web, I didn't find any useful information on the best way to implement that or how it influences complexity. I'm quite surprised since it seems like something that would be needed quite often.
Is there a standard way of doing this? I'm thinking about using a linked list (I use C so I plan to store some pointers in the top element of each equivalence class) and updating it on each union operation. Is there a better way?
You can store pointers to the top elements in a hash-based set or in any balanced binary search tree. You only need to delete and add elements, and both operations run in O(1) expected time in a hash set and in O(log N) in a balanced tree. In a linked list they run in O(N), because the element to delete must first be found.
Your proposal seems very reasonable. If you thread a doubly-linked list through the representatives, you can splice out an element from the representatives list in time O(1) and then walk the list each time you need to list representatives.
@ardenit has mentioned that you can also use an external hash table or BST to store the representatives. That's certainly simpler to code, though I suspect it won't be as fast as just threading a linked list through the items.
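The threaded-list idea can be sketched roughly as follows. This is a minimal Python sketch, not the answerers' code: the question uses C pointers, while here the index arrays `prev`/`next` play that role, and the class and method names are hypothetical. Union by size and path compression are standard union-find choices added for the sketch.

```python
class DisjointSet:
    """Union-find that also keeps the current representatives on a doubly
    linked list, so a representative is unlinked in O(1) on union and the
    classes can be listed in O(number of classes)."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        # Doubly linked list of representatives stored as index arrays;
        # -1 means "no neighbour", self.first is the head of the list.
        self.prev = [i - 1 for i in range(n)]
        self.next = [i + 1 if i + 1 < n else -1 for i in range(n)]
        self.first = 0 if n > 0 else -1

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:       # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def _unlink(self, r):
        # Splice representative r out of the list in O(1).
        p, q = self.prev[r], self.next[r]
        if p != -1:
            self.next[p] = q
        else:
            self.first = q
        if q != -1:
            self.prev[q] = p
        self.prev[r] = self.next[r] = -1

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:   # union by size
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self._unlink(rb)                    # rb is no longer a representative
        return True

    def representatives(self):
        r = self.first
        while r != -1:
            yield r
            r = self.next[r]

ds = DisjointSet(6)
ds.union(0, 1)
ds.union(2, 3)
ds.union(1, 3)
print(list(ds.representatives()))   # [0, 4, 5]
```

Iterating over the classes is then just a walk of this list; the members of each class would still need their own structure if you need to enumerate them too.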

Hash-maps or search tree?

The problem is as follows: given a list of cities with their countries, populations and geo-coordinates, you should read this data, store it, and then answer requests of the following type in an endless loop:
Request: a prefix (e.g., free).
Answer: all cities beginning with this prefix (case-insensitive)
and their associated data (country + population + geo-coordinates).
The cities should be sorted by population (highest population first).
Which data structures are the most suitable for the described problem?
First part: my thoughts are torn between a trie and a hashmap, although I lean toward the trie because I'm dealing with prefix requests, and according to Wikipedia a trie is basically:
"a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is a kind of search tree—an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings".
In addition, in terms of storing and reading data, a trie has the advantage over hashmaps.
Second part: returning the cities sorted by population is a little challenging in terms of time complexity. If I'm thinking in the right direction, I should store the values for each key as lists; then it is easier to sort just the returned list, so I don't have to keep everything sorted, which saves some time.
Please share your thoughts and correct me if I'm wrong.
There are pros and cons to picking vanilla tries and vanilla hashmaps. In general, for autocomplete systems, the structure of a trie is extremely useful because you're usually searching for prefixes, and the user would like to see the words that begin with the string they have just entered.
However, there is a way to make the best use of both of these data structures, called a Hash Trie (implementation: http://www.sanfoundry.com/java-program-implement-hash-trie/). You use the structure of the trie, but the final node is the actual string it refers to. In Python, this is done by using dictionaries instead of lists when implementing the trie.
For the second half of the question, a list would be your best bet: in essence a list of (population, city) tuples, sorted by population, from which you return the cities. Regarding it being "easier" to sort, I'm not sure I agree; easy is a relative term, and there's really no way of saying that it's easier than, say, storing the data in a search tree keyed on population and returning its (reverse) in-order traversal. Essentially, if you're using a comparison-based sort, it won't get better than O(n log n).
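A minimal Python sketch of the combination discussed above: a plain trie whose children are dictionaries (in the spirit of the "dictionaries instead of lists" remark, but not the linked Hash Trie implementation), with only the matches sorted by population at query time. All class names are hypothetical and the sample cities and numbers are made up.

```python
from dataclasses import dataclass, field

@dataclass
class City:
    name: str
    country: str
    population: int
    lat: float
    lon: float

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # char -> TrieNode
    cities: list = field(default_factory=list)    # cities whose name ends here

class CityIndex:
    def __init__(self):
        self.root = TrieNode()

    def add(self, city):
        node = self.root
        for ch in city.name.lower():              # case-insensitive keys
            node = node.children.setdefault(ch, TrieNode())
        node.cities.append(city)

    def query(self, prefix):
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every city stored in the subtree below the prefix node.
        found, stack = [], [node]
        while stack:
            n = stack.pop()
            found.extend(n.cities)
            stack.extend(n.children.values())
        # Sort only the k matches, highest population first: O(k log k).
        return sorted(found, key=lambda c: c.population, reverse=True)

index = CityIndex()
for c in [City("Freton", "AA", 120_000, 0.0, 0.0),   # made-up sample data
          City("Fremby", "BB", 450_000, 1.0, 1.0),
          City("Oakby", "AA", 900_000, 2.0, 2.0)]:
    index.add(c)
print([c.name for c in index.query("FRE")])   # ['Fremby', 'Freton']
```

Sorting at query time keeps inserts cheap; if the same prefixes are queried very often, caching the sorted result per node would be a possible trade-off.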

When is a linked list the best data structure to use

Like the title says, really. My question is: can you give an example where a linked list is the BEST data structure to use? I have been struggling to think of any, and in my code I pretty much always just use hashmaps or lists etc.
http://bigocheatsheet.com/ Here you can see a cheat sheet of Big O's for various operations. A linked list is no better than a stack or a queue in terms of complexity, so I wanted to know when someone might use a linked list over these, for example. A perfect answer would say "Imagine I was trying to do XYZ; if I did it with an array it would look like this {enter some code}, however, if I do it with a linked list, it would look like this {enter more code}. The time or space complexity is substantially better for the linked list." etc.
I don't want an answer where someone tells me WHAT a linked list is. I know what a linked list is and how they are implemented.
Thanks
Consider a line-up of people where you want to add a lot of people somewhere in the middle. With a conventional ArrayList, you would need to shift all elements after the insertion point, so each person costs O(N)! In a LinkedList, each person costs O(1), plus O(N) once to get to the middle. Linked lists are very quick at adding elements in the middle, since you don't need to reindex anything; you just adjust the local pointers.
Someone did a survey of the C++ Standard Template Library and found that the linked list was the least used of all the common basic structures. So you're right that they are not much used. They're useful when you don't need random access, when you don't know N and have no reasonably tight upper bound on it, and when insertions and deletions are common and time-critical. An insertion in the middle is O(N), as with an array, but the actual operation is a lot cheaper (pointer dereferences rather than memory shifting); insertions at the beginning are O(1), and so are insertions at the end if you keep a tail pointer.
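The line-up example can be sketched in Python with a small hand-rolled doubly linked list (names are hypothetical; Python's built-in `list.insert` is array-backed and would shift the later elements on every call). Once you hold a reference to the node where people join, each additional insertion only rewires neighbouring pointers.

```python
class DNode:
    __slots__ = ("value", "prev", "next")

    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        node = DNode(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        return node

    def insert_after(self, node, value):
        # O(1): only the neighbouring pointers change, nothing is shifted.
        new = DNode(value)
        new.prev, new.next = node, node.next
        if node.next is not None:
            node.next.prev = new
        else:
            self.tail = new
        node.next = new
        return new

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next

line = DoublyLinkedList()
nodes = [line.append(name) for name in ["Ann", "Bob", "Cid", "Dee"]]
# In general, reaching the middle costs one O(N) walk; here we kept the reference.
spot = nodes[1]
for newcomer in ["X1", "X2", "X3"]:
    spot = line.insert_after(spot, newcomer)   # each insertion is O(1)
print(list(line))   # ['Ann', 'Bob', 'X1', 'X2', 'X3', 'Cid', 'Dee']
```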

Which data structure to use for storing a paragraph?

Recently an interviewer asked me:
1. Which data structure should be used if you need to store a paragraph, traverse it later and find a word?
2. Which data structure should be used if you can also add, edit or delete words in that paragraph?
Can someone help me with the answer?
And if possible, can someone also post similar data structure questions with logical answers, as I am preparing for interviews.
I think what you are looking for is a trie. A trie is a tree in which each node represents a prefix (a unique combination of letters) and each edge extends that prefix by one letter, so following edges from the root spells out the words. Tries built from text documents give O(L) search, insertion, and deletion time, where L is the length of the word you are searching for, adding, or deleting. Tries are commonly used in autocomplete and document search algorithms.
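A minimal Python sketch of such a word trie, using nested dictionaries (names are hypothetical). It counts occurrences so repeated words in a paragraph are handled, prunes empty branches on deletion, and treats "editing" a word as delete-then-insert, which is an assumption of this sketch. A trie alone does not remember word order, so traversing the paragraph in its original order would need the word sequence kept separately.

```python
_END = object()   # key marking "a word ends here"; its value is the count

class WordTrie:
    def __init__(self):
        self.root = {}   # char -> child dict

    def add(self, word):                     # O(L)
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = node.get(_END, 0) + 1

    def count(self, word):                   # O(L)
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return 0
        return node.get(_END, 0)

    def remove(self, word):                  # O(L)
        # Walk down, remembering the path so empty branches can be pruned.
        path, node = [], self.root
        for ch in word:
            if ch not in node:
                return False
            path.append((node, ch))
            node = node[ch]
        if node.get(_END, 0) == 0:
            return False
        node[_END] -= 1
        if node[_END] == 0:
            del node[_END]
        for parent, ch in reversed(path):
            if parent[ch]:                   # child still leads to a word
                break
            del parent[ch]
        return True

    def replace(self, old, new):
        # "Editing" a word: remove one occurrence of old, then add new.
        if self.remove(old):
            self.add(new)
            return True
        return False

paragraph = "the quick brown fox jumps over the lazy dog"
trie = WordTrie()
for w in paragraph.split():
    trie.add(w)
print(trie.count("the"))                      # 2
trie.replace("fox", "cat")
print(trie.count("fox"), trie.count("cat"))   # 0 1
```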

Self-sorted data structure with random access

I need to implement a self-sorting data structure with random access. Any ideas?
A binary search tree is a self-sorting data structure. If you want one that is both self-sorting and self-balancing, an AVL tree is the way to go. Retrieval time will be O(lg n) for random access.
Maintaining a sorted list and accessing it arbitrarily requires at least O(lg N) per operation. So look at AVL trees, red-black trees, treaps or any other similar data structure, and enrich them to support random indexing. I suggest treaps since they are the easiest to understand and implement.
One way to enrich the treap is to keep in each node the count of nodes in the subtree rooted at that node. You'll have to update the counts whenever you modify the tree (e.g. on insertion or deletion).
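The size-augmented treap described above might look like this in Python. It is a sketch under assumptions: names are hypothetical, and it uses the split/merge formulation of a treap, which is one common way to implement it rather than necessarily what the answerer had in mind. The `size` field is what turns "i-th smallest element" into an O(log n) expected walk.

```python
import random

class TreapNode:
    __slots__ = ("key", "prio", "left", "right", "size")

    def __init__(self, key, prio):
        self.key, self.prio = key, prio
        self.left = self.right = None
        self.size = 1                      # nodes in the subtree rooted here

def _size(t):
    return t.size if t is not None else 0

def _update(t):
    # Recompute the subtree count after a child pointer changes.
    t.size = 1 + _size(t.left) + _size(t.right)

def _split(t, key):
    """Split t into (keys < key, keys >= key)."""
    if t is None:
        return None, None
    if t.key < key:
        left, right = _split(t.right, key)
        t.right = left
        _update(t)
        return t, right
    left, right = _split(t.left, key)
    t.left = right
    _update(t)
    return left, t

def _merge(a, b):
    """Join two treaps where every key in a <= every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:                    # higher priority stays on top
        a.right = _merge(a.right, b)
        _update(a)
        return a
    b.left = _merge(a, b.left)
    _update(b)
    return b

def _erase(t, key):
    """Remove one occurrence of key; returns (new_root, removed)."""
    if t is None:
        return None, False
    if key == t.key:
        return _merge(t.left, t.right), True
    if key < t.key:
        t.left, removed = _erase(t.left, key)
    else:
        t.right, removed = _erase(t.right, key)
    _update(t)
    return t, removed

class SortedIndexedTreap:
    def __init__(self, seed=None):
        self.root = None
        self._rng = random.Random(seed)

    def __len__(self):
        return _size(self.root)

    def insert(self, key):                 # O(log n) expected
        left, right = _split(self.root, key)
        node = TreapNode(key, self._rng.random())
        self.root = _merge(_merge(left, node), right)

    def remove(self, key):                 # O(log n) expected
        self.root, removed = _erase(self.root, key)
        return removed

    def __getitem__(self, i):              # i-th smallest key
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        t = self.root
        while True:
            left_count = _size(t.left)
            if i < left_count:
                t = t.left
            elif i == left_count:
                return t.key
            else:
                i -= left_count + 1
                t = t.right

t = SortedIndexedTreap(seed=1)
for x in [50, 10, 40, 20, 30]:
    t.insert(x)
print([t[i] for i in range(len(t))])   # [10, 20, 30, 40, 50]
t.remove(30)
print(t[2])                            # 40
```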
I haven't been very involved with data structure implementation lately, so this may not be a complete answer... you should see "Introduction to Algorithms" by Thomas Cormen et al. That book has many "recipes" with explanations of the inner workings of many data structures.
On the other hand, you have to take into account how much time you want to spend writing an algorithm, the size of the input, and whether there is an actual need for a special kind of data structure.
I see one thing missing from the answers here: the skip list.
https://en.wikipedia.org/wiki/Skip_list
You get ordering automatically; there is a probabilistic element to searching and building it.
It fits the question no worse than binary trees.
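To cover the random-access part, a skip list can be made indexable by having each link also record how many bottom-level nodes it skips over. That width bookkeeping goes beyond what the answer states, so treat this Python sketch as an assumption-laden illustration: names are hypothetical, `MAX_LEVELS = 16` (suitable for roughly 2^16 elements with p = 1/2) is an arbitrary choice, and deletion is omitted.

```python
import random

class _SkipNode:
    __slots__ = ("value", "next", "width")

    def __init__(self, value, levels):
        self.value = value
        self.next = [None] * levels   # following node on each level
        self.width = [0] * levels     # bottom-level steps spanned by that link

class IndexableSkipList:
    MAX_LEVELS = 16

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _SkipNode(None, self.MAX_LEVELS)   # head is position 0
        self._size = 0

    def __len__(self):
        return self._size

    def _random_height(self):
        h = 1
        while h < self.MAX_LEVELS and self._rng.random() < 0.5:
            h += 1
        return h

    def insert(self, value):          # O(log n) expected
        update = [None] * self.MAX_LEVELS   # last node <= value on each level
        pos_at = [0] * self.MAX_LEVELS      # position of that node
        node, pos = self._head, 0
        for lvl in reversed(range(self.MAX_LEVELS)):
            while node.next[lvl] is not None and node.next[lvl].value <= value:
                pos += node.width[lvl]
                node = node.next[lvl]
            update[lvl], pos_at[lvl] = node, pos
        new_pos = pos_at[0] + 1
        h = self._random_height()
        new = _SkipNode(value, h)
        for lvl in range(self.MAX_LEVELS):
            prev = update[lvl]
            if lvl < h:
                # Splice the new node in and split the old link's width.
                new.next[lvl] = prev.next[lvl]
                new.width[lvl] = pos_at[lvl] + prev.width[lvl] + 1 - new_pos
                prev.next[lvl] = new
                prev.width[lvl] = new_pos - pos_at[lvl]
            else:
                # Links passing over the new node now span one more step.
                # (Widths of links ending in None are never read.)
                prev.width[lvl] += 1
        self._size += 1

    def __getitem__(self, i):         # O(log n) expected
        if not 0 <= i < self._size:
            raise IndexError("index out of range")
        target, node, pos = i + 1, self._head, 0
        for lvl in reversed(range(self.MAX_LEVELS)):
            while node.next[lvl] is not None and pos + node.width[lvl] <= target:
                pos += node.width[lvl]
                node = node.next[lvl]
        return node.value

    def __iter__(self):
        node = self._head.next[0]
        while node is not None:
            yield node.value
            node = node.next[0]

skip = IndexableSkipList(seed=42)
for x in [50, 10, 40, 20, 30]:
    skip.insert(x)
print(skip[0], skip[2], len(skip))   # 10 30 5
print(list(skip))                    # [10, 20, 30, 40, 50]
```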
"Self-sorting" is a little too ambiguous. First of all:
What kind of data structure?
There are a lot of different data structures out there, such as:
Linked list
Doubly linked list
Binary tree
Hash set / map
Stack
Heap
And many more; each of them behaves differently from the others and has its own benefits, of course.
Now, not all of them could or should be self-sorting; a self-sorting stack, for example, would be weird.
However, linked lists and binary trees could be self-sorting, and for this you could sort them in different ways and at different times.
For Linked Lists
I would prefer insertion sort for this; you can read various good articles about it on wikis and elsewhere. I like the pasted link though. Look at it and try to understand the concept.
If you want to sort after elements have been inserted, i.e. at arbitrary times, then you could implement a sorting algorithm other than insertion sort, maybe bubble sort or quicksort. I would avoid bubble sort though; it's a lot slower, but easier to get your head around.
Random Access
Randomness is always something that's being discussed, so read up on how to do good randomization and you will be on your way. If you have a linked list with a "getAt" method, you could just pick a random index between 0 and n - 1 and get the item at that index.
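The insertion-sort idea and the "getAt" method described above can be sketched together in Python. This is a minimal illustration with hypothetical names: each insert walks to the right place and relinks, which is exactly the step insertion sort performs, and `get_at` walks from the head. Note that `get_at` costs O(index), so this gives access at a random position but not the O(log n) indexed access the balanced-tree answers above aim for.

```python
import random

class ListNode:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

class SortedLinkedList:
    """Keeps values in ascending order by inserting each one in place."""

    def __init__(self):
        self.head = None
        self.length = 0

    def insert(self, value):
        # O(n): walk until the next node is larger, then relink.
        if self.head is None or value < self.head.value:
            self.head = ListNode(value, self.head)
        else:
            node = self.head
            while node.next is not None and node.next.value <= value:
                node = node.next
            node.next = ListNode(value, node.next)
        self.length += 1

    def get_at(self, index):
        # O(index): a linked list has no direct indexing.
        if not 0 <= index < self.length:
            raise IndexError("index out of range")
        node = self.head
        for _ in range(index):
            node = node.next
        return node.value

    def to_list(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out

sl = SortedLinkedList()
for v in [7, 3, 9, 1, 5]:
    sl.insert(v)
print(sl.to_list())                              # [1, 3, 5, 7, 9]
print(sl.get_at(random.randrange(sl.length)))    # one randomly chosen element
```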
