Usage of data structures - data-structures

I have a common question about the usage of data structures in coding assessments. When taking a coding assessment (on HackerRank, etc.), if an array is passed as an argument to a function that needs to be completed, is it acceptable to copy the data into a hashmap or another data structure as the problem requires? I know there is no explicit restriction, but would that matter when performance is evaluated?

The best approach is to avoid copying the received array into another data structure unless that structure actually gives you an advantage in performance. However, if the new data structure improves the performance of your algorithm, then yes, you should do it without hesitation, since the copy is only a linear operation, O(n). This is just my humble opinion; I hope you get a better answer.
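For example, here is a minimal Java sketch of a case where the copy pays off, assuming a hypothetical two-sum style task (not any specific assessment problem): copying the input array into a HashMap turns a quadratic nested-loop solution into a linear one.

    import java.util.HashMap;
    import java.util.Map;

    public class TwoSum {
        // Returns indices of two elements that sum to target, or null if none exist.
        // Copying the array into a HashMap costs O(n), but it makes each lookup O(1),
        // so the whole method runs in O(n) instead of the O(n^2) nested-loop approach.
        static int[] indicesSummingTo(int[] arr, int target) {
            Map<Integer, Integer> seen = new HashMap<>(); // value -> index
            for (int i = 0; i < arr.length; i++) {
                Integer j = seen.get(target - arr[i]);
                if (j != null) {
                    return new int[] { j, i };
                }
                seen.put(arr[i], i);
            }
            return null;
        }
    }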

There is no straightforward answer to this type of question. It depends entirely on the problem you are solving.
For example, suppose someone asks whether to use linear search or binary search to find a value in an unsorted array. Even though binary search is much faster than linear search per query, using it is not always more efficient.
For instance, if you only have to search the array once, binary search costs O(n log n) because the array has to be sorted first, whereas a single linear search costs only O(n).
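As a rough illustration in Java (a sketch of the trade-off, not a benchmark), the right choice depends on how many queries you expect:

    import java.util.Arrays;

    public class SearchCost {
        // One-off query: a linear scan is O(n) and needs no preprocessing.
        static boolean containsLinear(int[] arr, int key) {
            for (int value : arr) {
                if (value == key) return true;
            }
            return false;
        }

        // Many queries: pay O(n log n) once to sort, then each query is O(log n).
        static boolean[] containsSorted(int[] arr, int[] keys) {
            int[] sorted = arr.clone();
            Arrays.sort(sorted);                 // O(n log n), done once
            boolean[] found = new boolean[keys.length];
            for (int i = 0; i < keys.length; i++) {
                found[i] = Arrays.binarySearch(sorted, keys[i]) >= 0;  // O(log n) each
            }
            return found;
        }
    }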
I hope you got the answer :)

Related

What is the difference between an online sorting algorithm and an external sorting algorithm?

What is the difference between an online sorting algorithm and an external sorting algorithm? Are they the same or different?
An online sorting algorithm is one that works when the elements to be sorted are provided one at a time, with the understanding that the algorithm must keep the sequence sorted as more and more elements are added. Algorithms that assume the entire input will be given in advance, such as heapsort, will not work as online algorithms because they presume they know all the elements up front. On the other hand, an algorithm like insertion sort is online, since it works purely from left to right and never needs to look at elements beyond the one it is currently inserting.
An external sorting algorithm is one where the goal is to sort data, typically provided in advance, that is so large that it cannot fit into main memory. While external sorting algorithms typically don't keep all the data to be sorted in memory at once, they usually assume that they can load any data that they need into memory at any time.
A good way of thinking about the difference is that in an online sorting algorithm, you should assume that you're trying to sort a sequence that is being generated dynamically - not all the data exists prior to the sort starting. In an external sorting algorithm, all the data already exists, but there's so much of it that you can't load everything into memory at once.
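To make the online case concrete, here is a small Java sketch (my own illustration) that keeps a running sequence sorted as elements arrive one at a time, in the spirit of insertion sort:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class OnlineSorter {
        private final List<Integer> sorted = new ArrayList<>();

        // Accepts one element at a time and keeps the sequence sorted after every
        // insertion, which is the defining property of an online sort.
        public void accept(int value) {
            int pos = Collections.binarySearch(sorted, value);
            if (pos < 0) pos = -pos - 1;  // convert "not found" to the insertion point
            sorted.add(pos, value);       // O(n) shifting per insertion, like insertion sort
        }

        public List<Integer> snapshot() {
            return Collections.unmodifiableList(sorted);
        }
    }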
Hope this helps!

Which searching technique can be used if the array is unsorted?

Binary search runs in O(log n), but it can be used only if the array is sorted.
Which is the best searching technique if the array is unsorted?
If you're only doing a few searches, then a basic linear search is about the best you can do.
If you're going to search very often, it's usually better to sort first and then use a binary search (or, if the distribution of the contents is fairly predictable, an interpolation search).
If your data is unsorted, you can use a hash table to access your data in O(1) time.
You can do a linear search.
But the issue with linear search is that it takes O(n) time, which can be slow for large arrays.
So I'd suggest sorting the array if possible and then using binary search.
If you want even better latency, try interpolation search, which is a slightly optimized version of the typical binary search.
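Here is a sketch of interpolation search in Java, assuming the array is already sorted in ascending order and the values are roughly uniformly distributed (the case where it beats plain binary search):

    public class InterpolationSearch {
        // Returns the index of key in the sorted array a, or -1 if it is absent.
        static int search(int[] a, int key) {
            int lo = 0, hi = a.length - 1;
            while (lo <= hi && key >= a[lo] && key <= a[hi]) {
                if (a[hi] == a[lo]) {               // all values in range are equal
                    return a[lo] == key ? lo : -1;
                }
                // Estimate the position of key from the value range instead of
                // always probing the middle as binary search does.
                int pos = lo + (int) ((long) (key - a[lo]) * (hi - lo) / (a[hi] - a[lo]));
                if (a[pos] == key) return pos;
                if (a[pos] < key) lo = pos + 1;
                else hi = pos - 1;
            }
            return -1;
        }
    }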

What is the performance (Big-O) for removeAll() of a treeset?

I'm taking a Java data structures course at the moment. One of my assignments asks me to choose a data structure of my choice and write a spell checker program. I am in the process of checking the performance of the different data structures.
I went to the API docs for TreeSet and this is what they say...
"This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains)."
Would that include removeAll()?
How else would I be able to figure this out?
Thank you in advance.
It would not include removeAll(), but I have to disagree with polkageist's answer. It is possible that removeAll() could be executed in constant time depending on the implementation, although it seems most likely that the execution would happen in linear time.
I think N log N would be the cost only if it were implemented in pretty much the worst way. If you are removing every element, there is no need to search for them: any element you have needs to be removed, so there's no need to search.
Nope. For an argument collection of size k, the worst-case upper bound of removeAll() is, of course, O(k*log n) - because each of the elements contained in the argument collection has to be removed from the tree set (which requires at least searching for it), and each of these searches costs log n.
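For intuition, here is a sketch of the straightforward per-element strategy (not necessarily how the JDK implements removeAll() internally); removing an argument collection of size k element by element gives the O(k*log n) bound above.

    import java.util.Collection;
    import java.util.TreeSet;

    public class RemoveAllCost {
        // Each remove() on a TreeSet is O(log n) (search plus rebalancing),
        // so removing k elements this way is O(k log n) in the worst case.
        static <E> void removeEach(TreeSet<E> set, Collection<? extends E> toRemove) {
            for (E e : toRemove) {
                set.remove(e);
            }
        }
    }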

Complexity in using Binary search and Trie

Given a large list of alphabetically sorted words in a file, I need to write a program that, given a word x, determines whether x is in the list. Preprocessing is OK since I will be calling this function many times over different inputs.
Priorities: 1. speed, 2. memory.
I already know I can use (n is the number of words, m is the average length of the words):
1. a trie: time is O(log(n)), space (best case) is O(log(nm)), space (worst case) is O(nm).
2. loading the complete list into memory, then binary search: time is O(log(n)), space is O(n*m).
I'm not sure about the complexities for the trie, please correct me if they are wrong. Also, are there other good approaches?
It is O(m) time for the trie, and up to O(m log(n)) for the binary search. The space is asymptotically O(nm) for any reasonable method, which you can probably reduce in some cases using compression. The trie structure is, in theory, somewhat better on memory, but in practice it has devils hiding in the implementation details: the memory needed to store pointers and potentially poor cache behaviour.
There are other options for implementing a set structure - hash sets and tree sets are easy choices in most languages. I'd go for the hash set as it is efficient and simple.
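A minimal sketch of the hash set approach in Java (the one-word-per-line file format is an assumption for illustration):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashSet;
    import java.util.Set;

    public class WordLookup {
        private final Set<String> words = new HashSet<>();

        // Preprocessing: read the word list once; O(n*m) time and space in total.
        WordLookup(Path wordFile) throws IOException {
            for (String line : Files.readAllLines(wordFile, StandardCharsets.UTF_8)) {
                words.add(line.trim());
            }
        }

        // Each query is expected O(m): hash the word, then check one bucket.
        boolean contains(String x) {
            return words.contains(x);
        }
    }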
I think a HashMap is perfectly fine for your case, since the time complexity of both put and get is O(1). It works fine even if you don't have a sorted list.
> Preprocessing is ok since I will be calling this function many times over different inputs.
As food for thought, have you considered creating a set from the input data and then searching it using a suitable hash? It will take more time to build the set the first time, but if the number of inputs is limited and you may return to them, then a set might be a good idea, with O(1) for the "contains" operation given a good hash function.
I'd recommend a hash map. You can find hash map extensions for C++ in both VC and GCC.
Use a Bloom filter. It is space-efficient even for very large data sets, and it is a fast rejection technique.
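A Bloom filter only answers "definitely not present" or "probably present", so it works best in front of an exact structure. Here is a sketch using Guava's BloomFilter (Guava is an assumed dependency, and the expected size and 1% false-positive rate are illustrative choices):

    import com.google.common.hash.BloomFilter;
    import com.google.common.hash.Funnels;
    import java.nio.charset.StandardCharsets;
    import java.util.List;

    public class WordFilter {
        private final BloomFilter<CharSequence> filter;

        WordFilter(List<String> words) {
            filter = BloomFilter.create(
                    Funnels.stringFunnel(StandardCharsets.UTF_8), words.size(), 0.01);
            for (String w : words) {
                filter.put(w);
            }
        }

        // false -> the word is definitely not in the list (fast rejection);
        // true  -> the word is probably in the list; confirm with an exact lookup.
        boolean mightContain(String x) {
            return filter.mightContain(x);
        }
    }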

Finding an appropriate data structure

I have N keys.
I need to find a data structure with which I can do the following operations:
building it in O(N)
finding min in O(1)
deleting the median in O(logn)
finding the n/2+7-th biggest number
I thought about using a min-heap (building is O(n), minimum is O(1) - the root).
However, I'm having a hard time finding a way to do 3 and 4.
I think the median is supposed to be one of the leaves, but that's as far as I've gotten.
A popular question asked in Data Structures 1 exams/hws/tutorials.
I'll try to give you some hints, if they don't suffice, comment, and I'll give you more hints.
Remember that you don't have to use just one data structure, you can use several data structures.
Recall the definition of a median: n/2 of the numbers are larger, and n/2 of the numbers are smaller
What data structures do you know that are built in O(n), and complex operations on them are O(logn) or less? - Reread the tutorials slides on these data structures.
It might be easier for you to solve 1+3 separately from 1+2, and then think about merging them.
When you say building in O(n), do you mean that addition has to be O(n), or that you have to build a collection of elements in O(n) such that addition has to be O(1)?
You could augment pretty much any data structure with an extra reference to retrieve the minimal element in constant time.
For #3, it sounds like you need to be able to find the median in O(lg n) and delete in O(1), or vice versa.
For #4, you didn't specify the time complexity.
To other posters - this is marked as homework. Please give hints rather than posting the answer.
A simple sorted array would solve the problem for #2, #3 and #4, but constructing it would take O(n log n). However, there are no restrictions on space complexity. I am thinking hard about using hashing during the construction of the data structure, which would bring the order down to O(n).
Hope this helps. I'll get back if I find a better solution.

Resources