Understanding these questions about binary search on linear data structures

The answers are (1) and (5), but I am not sure why. Could someone please explain this to me, and why the other answers are incorrect? How can I understand how things like binary/linear search will behave on different data structures?
Thank you

I am assuming you already know how binary search works.
(1) True
Explanation
To perform binary search, we have to get to the middle of the sorted list. In a linked list, reaching the middle means traversing half the list starting from the head, while in an array we can jump directly to the middle index if we know the length. So the linked list takes O(n/2) time for a step that an array does in O(1). Therefore a linked list is not an efficient way to implement binary search.
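A minimal sketch of the difference in Python (the Node class and function names here are illustrative, not from the question):

    # Array (Python list): the middle element is one index computation away.
    def middle_of_array(arr):
        return arr[len(arr) // 2]          # O(1): direct index access

    # Singly linked list: we must walk half the nodes to reach the middle.
    class Node:
        def __init__(self, val, nxt=None):
            self.val, self.next = val, nxt

    def middle_of_list(head):
        slow, fast = head, head            # classic slow/fast pointer walk
        while fast is not None and fast.next is not None:
            slow = slow.next               # advances about n/2 times in total
            fast = fast.next.next
        return slow.val                    # O(n) just to find one midpoint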
(2) False
Same explanation as above.
(3) False
Explanation
As explained in point 1, a linked list cannot be used to perform binary search efficiently, but an array can.
(4) False
Explanation
Binary search's worst-case time is O(log n), because we never traverse the whole list. In the first iteration, if the key is less than the middle value, we discard the second half of the list; then we repeat the same step on the remaining half. Since every iteration discards the part of the list we don't have to examine, the search clearly takes less than O(n).
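As a sketch, the halving is visible directly in the loop (illustrative code, not from the original answer):

    def binary_search(arr, key):
        """Return the index of key in sorted arr, or -1 if absent."""
        lo, hi = 0, len(arr) - 1
        while lo <= hi:
            mid = (lo + hi) // 2           # middle of the current window
            if arr[mid] == key:
                return mid
            elif key < arr[mid]:
                hi = mid - 1               # discard the second half
            else:
                lo = mid + 1               # discard the first half
        return -1                          # window shrank to nothing

    # The window halves each iteration: n, n/2, n/4, ... so at most
    # about log2(n) + 1 iterations ever run.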
(5) True
Explanation
If the element is found in O(1) time, that means only one iteration of the loop ran. And in the first iteration we always compare against the middle element of the list, so the search takes O(1) time only if the middle element is the key.

In short, binary search is an elimination-based searching technique that can be applied when the elements are sorted. The idea is to eliminate half the keys from consideration at each step by keeping the keys in sorted order. If the search key is not equal to the middle element, one of the two sets of keys, to the left or to the right of the middle element, can be eliminated from further consideration.
Now coming to your specific question:
True
Basic binary search requires that the mid-point can be found in O(1) time, which is not possible in a linked list and can be far more expensive if the size of the list is unknown.
True.
False
False
In binary search, the mid-point calculation must be done in O(1) time, which is only possible in arrays, since array indices give direct access. Secondly, binary search can only be applied to arrays that are in sorted order.
False
The answer by Vaibhav Khandelwal explained it nicely. But I wanted to add some variations of the array to which binary search can still be applied. If the given array is sorted but rotated by some amount and contains duplicates, for example,
3 5 6 7 1 2 3 3 3
then binary search still applies, but in the worst case we need to go through the list linearly to find the required element, which is O(n).
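A sketch of that variant (the usual rotated-array binary search, extended to shrink the window when duplicates make both halves look the same; function name is mine):

    def search_rotated_with_dups(arr, key):
        """Return True if key is in a sorted-then-rotated array with duplicates."""
        lo, hi = 0, len(arr) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if arr[mid] == key:
                return True
            if arr[lo] == arr[mid] == arr[hi]:
                lo += 1                    # can't tell which half is sorted:
                hi -= 1                    # shrink; this is the O(n) worst case
            elif arr[lo] <= arr[mid]:      # left half is sorted
                if arr[lo] <= key < arr[mid]:
                    hi = mid - 1
                else:
                    lo = mid + 1
            else:                          # right half is sorted
                if arr[mid] < key <= arr[hi]:
                    lo = mid + 1
                else:
                    hi = mid - 1
        return False

    print(search_rotated_with_dups([3, 5, 6, 7, 1, 2, 3, 3, 3], 2))   # -> True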
True
If the element is found on the first attempt, i.e. it is situated at the mid-point, then it is found in O(1) time.
MidPointOfArray = (LeftSideOfArray + RightSideOfArray) / 2
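As an aside (my addition, not part of the original answer): in languages with fixed-width integers this textbook formula can overflow for very large indices, so implementations often use the equivalent form below. Python integers don't overflow, so there it is only a stylistic choice.

    left, right = 0, 1_000_000             # example bounds (hypothetical values)
    # Mathematically the same as (left + right) // 2, but the intermediate
    # sum left + right is never formed, so it can't overflow in C/Java/etc.
    mid = left + (right - left) // 2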
The best way to understand binary search is to think of exam papers sorted according to last names. To find a particular student's paper, the teacher searches within that student's name category and rules out the ones that are not alphabetically close to the student's name.
For example, if the name is Alex Bob, the teacher starts her search directly at "B", takes out all the papers with a surname starting with "B", then repeats the process, skipping papers up to the letter "o", and so on until the paper is found or shown to be missing.

Related

Time Complexity of searching

There is a sorted array of very large size. Every element is repeated more than once, except one element. How much time will it take to find that element?
Options are:
1. O(1)
2. O(n)
3. O(log n)
4. O(n log n)
The answer to the question is O(n), and here's why.
Let's first summarize the knowledge we're given:
A large array containing elements
The array is sorted
Every item except one occurs more than once
The question is: what is the time growth of searching for the one item that occurs only once?
The array is sorted, so can we use that to speed up the search for the item? Yes and no.
First of all, the array isn't sorted by the property we must use to look for the item (number of occurrences), so we cannot exploit the sorted order in that regard. This means optimized search algorithms, such as binary search, are out.
However, since the array is sorted, all items with the same value are grouped together. This means that when we look at an item we are seeing for the first time, we only have to compare it to the following item. If they differ, we've found the item we're looking for.
"Seeing for the first time" is important: at every boundary between two groups of equal items the two adjacent items differ, so an item only counts as unique if it differs from both the item before it and the item after it.
So we have to move from one end of the array to the other, comparing each item to the following item, and that is an O(n) operation.
Basically, since the array isn't sorted by the property we're looking at, we're back to a linear search.
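A minimal sketch of that scan (function name is mine):

    def find_unique(arr):
        """Return the one element of a sorted array that occurs exactly once,
        assuming every other element occurs more than once."""
        n = len(arr)
        for i in range(n):
            differs_from_prev = (i == 0) or (arr[i] != arr[i - 1])
            differs_from_next = (i == n - 1) or (arr[i] != arr[i + 1])
            if differs_from_prev and differs_from_next:
                return arr[i]              # no equal neighbour on either side
        return None                        # no unique element found

    print(find_unique([1, 1, 2, 2, 2, 3, 4, 4]))   # -> 3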
Must be O(n).
The fact that it's sorted doesn't help. Suppose you tried a binary method, jumping into the middle somewhere. You see that the value there has a neighbour that is the same. Now which half do you go to? There is no way to tell.
How would you write a program to find the value? You'd start at one end and check for an element whose neighbours differ from it. You'd have to walk the array until you found that value. So O(n).

Best sorting algorithm - Partially sorted linked list

Problem: Given a sorted doubly linked list and two numbers C and K, you need to decrease the value of the node with data K by C and move the resulting node to its correct position so that the list remains sorted.
I would think of insertion sort for such a problem because, at any instant, insertion sort looks like a hand of cards that is partially sorted. For insertion sort, the number of exchanges is equal to the number of inversions, and the number of compares is at most the number of exchanges plus (N-1).
So in the given problem, if the node with data K is decreased by C, the sorted linked list becomes partially sorted, and insertion sort looks like the best fit.
Another thought: when selecting a sorting algorithm, if a sorting approach is the best fit for the array representation of the data, the same approach should also be the best fit for the linked-list representation of the same data.
For this problem, is my thought process correct in choosing insertion sort?
Maybe you mean something else, but insertion sort is not the best algorithm here, because you actually don't need to sort anything. If there is only one element with value K it doesn't make a big difference, but otherwise it does.
So I would suggest the following O(n) algorithm, ignoring edge cases for simplicity (a sketch in code follows the steps):
Go forward in the list until the value of the current node is > K - C.
Save this node; all the reduced nodes will be inserted before it.
Continue forward while the value of the current node is < K.
While the value of the current node is K, remove the node, set its value to K - C, and insert it before the saved node. This could be optimized further so that you do only one remove and one insert operation for the whole sublist of nodes that had value K.
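A minimal sketch of those steps, assuming C > 0 and an ascending list (the Node class and function name are mine, and the single remove/insert optimization is left out):

    class Node:
        def __init__(self, val, prev=None, nxt=None):
            self.val, self.prev, self.next = val, prev, nxt

    def decrease_and_reinsert(head, k, c):
        """All nodes with value k become k - c and are moved so the ascending
        doubly linked list stays sorted. Assumes c > 0. Returns the head,
        which may change if a reduced node moves to the front."""
        # Find the last node with value <= k - c; the reduced nodes will be
        # spliced in right after it (equivalently, before the first node with
        # value > k - c, the "saved node" from the steps above).
        tail_of_smaller, cur = None, head
        while cur is not None and cur.val <= k - c:
            tail_of_smaller, cur = cur, cur.next
        # Skip ahead to the run of nodes holding value k.
        while cur is not None and cur.val < k:
            cur = cur.next
        # Detach each k-node, decrease it, and splice it back in.
        while cur is not None and cur.val == k:
            nxt = cur.next
            if cur.prev: cur.prev.next = cur.next      # unlink cur
            else: head = cur.next
            if cur.next: cur.next.prev = cur.prev
            cur.val = k - c
            if tail_of_smaller is None:                # new front of the list
                cur.prev, cur.next = None, head
                if head: head.prev = cur
                head = cur
            else:                                      # splice after tail_of_smaller
                cur.prev, cur.next = tail_of_smaller, tail_of_smaller.next
                if tail_of_smaller.next: tail_of_smaller.next.prev = cur
                tail_of_smaller.next = cur
            cur = nxt
        return head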
If these decrease operations can be batched up before the sorted list must be available, then you can simply remove all the decremented nodes from the list. Then, sort them, and perform a two-way merge into the list.
If the list must be maintained in order after each node decrement, then there is little choice but to remove the decremented node and re-insert in order.
Doing this with a linear search is probably acceptable for a deck of cards, unless you're running some monstrous Monte Carlo simulation involving cards that runs for hours or days, in which case the optimization counts.
Otherwise, the way we would deal with the need to maintain order is to use an ordered-sequence data structure: a balanced binary tree (red-black, splay) or a skip list. Take the node out of the structure, adjust its value, re-insert: O(log N).
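In Python, a quick way to experiment with this idea (my sketch; it assumes the third-party sortedcontainers package, whose SortedList keeps elements ordered with roughly logarithmic add/remove):

    from sortedcontainers import SortedList   # pip install sortedcontainers

    values = SortedList([1, 3, 5, 7, 9])
    k, c = 7, 4
    values.remove(k)                           # take the entry out ...
    values.add(k - c)                          # ... and re-insert the new value
    print(list(values))                        # [1, 3, 3, 5, 9]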

Binary search with duplicate elements in the array

I was wondering how we handle duplicate elements in an array using binary search. For instance, I have an array like 1 1 1 2 2 3 3, and I am interested in finding the last occurrence of 2.
According to a post I read before, we can first use binary search to find a 2, then scan through the adjacent elements. This takes about O(log n + k), so the worst case, when k = n, takes O(n) time. Is there any way to improve the worst-case performance? Thanks.
Do a binary search for 2.5. In other words, if the value you're searching for is N, your code should treat N as if it's too small and N+1 as if it's too large. The main difference in the algorithm is that it can't get lucky and terminate early (when it finds the value); it has to run all the way to the end, when the high and low indexes are equal. At that point, the index you seek should be no more than 1 away from the final high/low index.
The easiest approach is to do an upper-bound binary search. This is exactly like the binary search you mention, except that instead of finding the first instance of a number, it finds the first instance of a number greater than the one provided. The difference between them is little more than switching a < to a <=.
Once you find the first instance of a number greater than yours, step back one index and look at the value there. If it's a 2, you found the last 2. If it's anything else, there were no 2's in the array.
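A compact sketch of that upper-bound search (the equivalent of Python's built-in bisect.bisect_right, written out here to show where the <= lives; function names are mine):

    def upper_bound(arr, key):
        """Index of the first element > key in sorted arr (len(arr) if none)."""
        lo, hi = 0, len(arr)
        while lo < hi:
            mid = (lo + hi) // 2
            if arr[mid] <= key:            # <= instead of <: skip past duplicates
                lo = mid + 1
            else:
                hi = mid
        return lo

    def last_occurrence(arr, key):
        i = upper_bound(arr, key) - 1      # step back one index
        return i if i >= 0 and arr[i] == key else -1

    print(last_occurrence([1, 1, 1, 2, 2, 3, 3], 2))   # -> 4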

Binary Search - Best and worst case

In an exam I came across this question:
Which one of the following is true?
For a binary search, the best-case occurs when the target item is in the beginning of the search list.
For a binary search, the best-case occurs when the target is at the end of the search list.
For a binary search, the worst-case is when the target item is not in the search list.
For a binary search, the worst-case is when the target is found in the middle of the search list.
Well, in my view both 1. and 3. are correct, but the exam asks for only one option. What am I missing?
3. is indeed correct: in the worst case you go through the algorithm and terminate at the "worst" stop clause, where the list is empty, needing log(n) iterations.
1. is not correct. The best case is NOT when the first element is the target; it is when the middle element is the target, since you compare the target to the middle element, not the first. So if the middle element is the target, the algorithm finishes in one iteration.
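A tiny sketch to make the best case concrete (an instrumented binary search; names are mine):

    def binary_search_counting(arr, key):
        """Binary search that also reports how many comparisons were made."""
        lo, hi, steps = 0, len(arr) - 1, 0
        while lo <= hi:
            mid = (lo + hi) // 2
            steps += 1
            if arr[mid] == key:
                return mid, steps
            elif key < arr[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        return -1, steps

    arr = [10, 20, 30, 40, 50, 60, 70]
    print(binary_search_counting(arr, 40))   # middle element: (3, 1) -- best case
    print(binary_search_counting(arr, 10))   # first element: (0, 3) -- not best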
I think 1. is not correct. In each iteration we compare the middle item of the current search list with the target, so if the target item is at the beginning of the search list, we are close to the maximum search time, not the minimum.
If we assume the search list is sorted, then option 3 is correct, because each time we divide the list into two parts and continue with one of them. That gives log n levels; if the element is not found, it takes O(log n) to determine that, which is the worst case of binary search.
If the list is not sorted and we want to apply binary search, we first have to sort the list.
Alternatively, we can search for the element while doing an insertion sort. In that case both 1 and 3 would be correct.
Correctness of option 1:
When we insert the first element from the list, we check whether it matches the given element; if it matches, that is the best case, O(1).
Correctness of option 3:
If we go through all the elements in the list, sorting and searching, and still haven't found the element, that is the worst case, O(n²).

Find a common element within N arrays

If I have N arrays, what is the best (time complexity; space is not important) way to find the common elements? You could just find one element and stop.
Edit: The elements are all numbers.
Edit: These are unsorted. Please do not sort and scan.
This is not a homework problem. Somebody asked me this question a long time ago. He was using a hash to solve the problem and asked me if I had a better way.
Create a hash index with elements as keys and counts as values. Loop through all values and update the counts in the index. Afterwards, run through the index and check which elements have count = N. Looking up an element in the index should be O(1), so combined with looping through all M elements this is O(M).
If you want to keep the order specific to a certain input array, loop over that array and test the element counts in the index in that order.
Some special cases:
If you know the elements are (positive) integers with a maximum that is not too high, you can just use a normal array as the "hash" index to keep counts, where the numbers are the array indices.
I've assumed that in each array each number occurs only once. Adapting it for more occurrences should be easy (set the i-th bit in the count for the i-th array, or only update if the current element count == i-1).
EDIT: when I answered, the question did not yet contain the part about "a better way" than hashing.
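A minimal sketch of that counting approach, under the same assumption that each number occurs at most once per array (function name is mine):

    def common_element(arrays):
        """Return one element present in every array, or None.
        Assumes each value occurs at most once within any single array."""
        counts = {}
        for arr in arrays:
            for x in arr:
                counts[x] = counts.get(x, 0) + 1
        n = len(arrays)
        for x, c in counts.items():
            if c == n:                     # seen once in each of the N arrays
                return x
        return None

    print(common_element([[4, 9, 7], [7, 1, 2], [3, 7, 5]]))   # -> 7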
The most direct method is to intersect the first two arrays and then intersect the result with each of the remaining N-2 arrays.
If 'intersection' is not defined in the language you're working in, or you require a more specific answer (i.e. you need the answer to 'how do you do the intersection'), then modify your question accordingly.
Without sorting, there isn't an optimized way to do this based on the information given (i.e. sorting and positioning all elements relative to each other, then iterating over the length of the arrays checking for elements present in all the arrays at once).
The question asks whether there is a better way than hashing. There is no better way (i.e. better time complexity) than a hash, since the time to hash each element is typically constant. Empirical performance is also favorable, particularly if the range of values can be mapped one-to-one to an array maintaining counts. The time is then proportional to the number of elements across all the arrays. Sorting will not give better complexity, since it still needs to visit each element at least once, and adds a log n factor for sorting each array.
Back to hashing: from a performance standpoint, you will get the best empirical results by not processing each array fully, but instead processing only a block of elements from each array before proceeding to the next array. This takes advantage of the CPU cache. It also results in fewer elements being hashed in favorable cases, where common elements appear in the same regions of the arrays (e.g. common elements at the start of all arrays). Worst-case behaviour is no worse than hashing each array in full; merely that all elements get hashed.
I don't think the approach suggested by catchmeifyoutry will work.
Let us say you have two arrays:
1: {1,1,2,3,4,5}
2: {1,3,6,7}
Then the answer should be 1 and 3. But if we use the hashtable approach, 1 will have count 3, and we will never find 1 in this situation.
The problem becomes more complex if we have input like this:
1: {1,1,1,2,3,4}
2: {1,1,5,6}
Here I think the output should be 1,1. The suggested approach fails in both cases.
Solution:
Read the first array and put it into a hashtable. If we find the same key again, don't increment the counter. Read the second array in the same manner. Now the hashtable holds the common elements: the ones whose count is 2.
But again, this approach will fail on the second input set I gave earlier.
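A sketch of that per-array de-duplication fix (counting each value at most once per array; as the answer notes, it still reports each common value only once, so it does not produce the 1,1 multiplicity output):

    def common_elements_dedup(arrays):
        """Values that appear in every array, counting duplicates within one
        array only once. Returns each common value a single time."""
        counts = {}
        for arr in arrays:
            for x in set(arr):             # de-duplicate within this array
                counts[x] = counts.get(x, 0) + 1
        return [x for x, c in counts.items() if c == len(arrays)]

    print(common_elements_dedup([[1, 1, 2, 3, 4, 5], [1, 3, 6, 7]]))   # -> [1, 3]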
I'd first start with the degenerate case: finding the common elements between 2 arrays (more on this later). From there I'd have a collection of common values which I would use as an array itself and compare against the next array. This check would be performed N-1 times, or until the "carry" array of common elements drops to size 0.
One could speed this up, I'd imagine, by divide and conquer: split the N arrays into the leaf nodes of a tree. The next level up the tree holds N/2 common-element arrays, and so forth until you have a single array at the top, either filled or empty. Either way, you'd have your answer.
Without sorting and scanning, the best operational speed you'll get for comparing 2 arrays for common elements is O(N²).
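A minimal sketch of the "carry" idea (nested scans only, no sorting or hashing; names are mine):

    def common_by_carry(arrays):
        """Fold a pairwise intersection across all arrays."""
        carry = list(arrays[0])
        for arr in arrays[1:]:
            if not carry:                  # no candidates survive; stop early
                break
            # O(len(carry) * len(arr)): 'in' on a list is a linear scan
            carry = [x for x in carry if x in arr]
        return carry

    print(common_by_carry([[4, 9, 7], [7, 1, 2], [3, 7, 5]]))   # -> [7]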
