binary search with duplicate elements in the array - algorithm

I was wondering how to handle duplicate elements in an array using binary search. For instance, I have an array like 1 1 1 2 2 3 3, and I am interested in finding the last occurrence of 2.
According to a post I read before, we can first use binary search to find a 2, and then scan through the adjacent elements. This takes about O(log(n) + k), where k is the number of duplicates. The worst case is k = n, which takes O(n) time. Is there any way to improve the worst-case running time? Thanks.

Do a binary search for 2.5. In other words, if the value you're searching for is N, then your code should treat N like it's too small, and N+1 like it's too large. The main difference in the algorithm is that it can't get lucky and terminate early (when it finds the value). It has to run all the way till the end, when the high and low indexes are equal. At that point, the index you seek should be no more than 1 away from the final high/low index.

The easiest approach would be to do an upper-bound binary search. This is exactly like the binary search you mention, except that instead of trying to find the first instance of a number, it finds the first instance of a number which is greater than the one provided. The difference between them is little more than switching a < to a <=.
Once you find the first instance of a number which is greater than yours, step back one index, and look at the value there. If it's a 2, then you found the last 2. If it's anything else, then there were no 2's in the array.
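As a minimal sketch of the upper-bound idea described above (the function name last_occurrence is only illustrative, and the input is assumed to be sorted in ascending order):

```python
def last_occurrence(sorted_values, target):
    """Return the index of the last element equal to target, or -1 if absent."""
    lo, hi = 0, len(sorted_values)
    # Invariant: everything before lo is <= target, everything from hi on is > target.
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_values[mid] <= target:  # note <=, not <: this is the upper bound
            lo = mid + 1
        else:
            hi = mid
    idx = lo - 1  # step back one index from the first element greater than target
    if idx >= 0 and sorted_values[idx] == target:
        return idx
    return -1


print(last_occurrence([1, 1, 1, 2, 2, 3, 3], 2))
```

The loop never terminates early on a match, so it always runs O(log n) iterations regardless of how many duplicates there are.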

Related

How to quickly search for a specified element in an ordered array consisting of only two types of elements?

The array mentioned in the question are as follows:
[1,1,...,1,1,-1,-1,...,-1,-1]
How to quickly find the index of the 1 closest to -1?
Note: Both 1 and -1 will exist at the same time, and the number of 1 and -1 is large.
For example, for an array like this:
[1,1,1,1,1,-1,-1,-1]
the result should be 4.
The fastest way I can think of is binary search, is there a faster way?
With the current representation of the data, binary search is the fastest way I can think of. Of course, you can cache and reuse the result in constant time, since the answer is always the same.
On the other hand if you change the representation of the array to some simple numbers you can find the next element in constant time. Since the data can always be mapped to a binary value, you can reduce the whole array to 2 numbers. The length of the first partition and the length of the second partition. Or the length of the whole array and the partitioning point. This way you can easily change the length of both partitions in constant time and have access to the next element of the second partition in constant time.
Of course, changing the representation of the array itself is a logarithmic process since you need to find the partitioning point.
By a simple information theoretic argument, you can't be faster than log(n) using only comparisons. Because there are n possible outcomes, and you need to collect at least log(n) bits of information to number them.
If you have extra information about the statistical distribution of the values, then maybe you can exploit it. But this is to be discussed on a case-by-case basis.
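For reference, the binary search over the 1/-1 array could look roughly like the sketch below. It relies on the question's guarantee that both values are present (so the first element is 1 and the last is -1); the function name last_one_index is made up for this example.

```python
def last_one_index(values):
    """Index of the last 1 in an array of the form [1, ..., 1, -1, ..., -1].

    Assumes at least one 1 and at least one -1 are present.
    """
    lo, hi = 0, len(values) - 1
    # Invariant: values[lo] == 1 and values[hi] == -1.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if values[mid] == 1:
            lo = mid
        else:
            hi = mid
    return lo


print(last_one_index([1, 1, 1, 1, 1, -1, -1, -1]))
```

Each iteration halves the distance between lo and hi, which matches the log(n) lower bound mentioned above for comparison-only approaches.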

Check if string includes part of Fibonacci Sequence

Which way should I follow to create an algorithm to find out whether fibonacci sequence exists in a given string ?
The string includes only digits with no whitespaces and there may be more than one sequence, I need to find all of them.
If, as your comment says, the first number must have fewer than 6 digits, you can simply search for all positions where one of the 25 Fibonacci numbers occurs (there are only 25 with fewer than 6 digits) and then try to expand this 1-number sequence in both directions.
After your update:
You can even speed things up when you are only looking for sequences of at least 3 numbers.
Prebuild all 25 3-number strings that start with one of the first 25 Fibonacci numbers; this should give far fewer matches than the search for single Fibonacci numbers I suggested above.
Then search for them (as described above) and try to expand the 3-number sequences you find.
Here's how I would approach this.
The main algorithm could search for triplets then try to extend them to as long a sequence as possible.
This leaves us with the subproblem of finding triplets. So if you are scanning through a string to look for fibonacci numbers, one thing you can take advantage of is that the next number must have the same number of digits or one more digit.
e.g. if you have the string "987159725844" and are considering "[987]159725844" then the next thing you need to look at is "987[159]725844" and "987[1597]25844". Then the next part you would find is "[2584]4" or "[25844]".
Once you have the 3 numbers A, B, C, you can check whether they satisfy the Fibonacci recurrence C - B == A (that is, A + B == C). If they do, you can check whether they come from the Fibonacci sequence by seeing if the ratio of consecutive terms is roughly 1.6 and then running the Fibonacci iteration backwards down to the initial conditions 1, 1.
The overall algorithm would then work by scanning through looking for all triples starting with width 1, then width 2, width 3 up to 6.
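A rough sketch that combines the ideas above: at each position, try to match a known Fibonacci number, then extend greedily using the recurrence, keeping only runs of at least three numbers. The names fib_pairs and find_fib_runs are hypothetical, the 5-digit limit on the first number is taken from the earlier answer, and overlapping sub-runs are reported rather than merged.

```python
def fib_pairs(max_digits=5):
    """Consecutive Fibonacci pairs (a, b) where a has at most max_digits digits."""
    pairs = []
    a, b = 1, 1
    while len(str(a)) <= max_digits:
        pairs.append((a, b))
        a, b = b, a + b
    return pairs


def find_fib_runs(text, min_len=3, max_digits=5):
    """Return (start_index, numbers) for every run of >= min_len Fibonacci numbers."""
    pairs = fib_pairs(max_digits)
    runs = []
    for start in range(len(text)):
        for a, b in pairs:
            pos, seq = start, []
            x, y = a, b
            # Extend while the next expected Fibonacci number appears at pos.
            while text.startswith(str(x), pos):
                seq.append(x)
                pos += len(str(x))
                x, y = y, x + y
            if len(seq) >= min_len:
                runs.append((start, seq))
    return runs


print(find_fib_runs("987159725844"))
```

Because the next number is fully determined by the previous two, there is no need to guess its width; the digit-count observation above is handled implicitly by comparing against str(x).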
I'd say you should first find all the interesting Fibonacci numbers (those with 6 or fewer digits, of which there are no more than 30) and store them in an array.
Then loop over every position in your input string and try to match the longest possible Fibonacci number starting there (that is, you must browse the array backwards).
If some Fibonacci number is found, you branch to a secondary algorithm, which simply walks the array from the current position to the end, trying to match each item against the substring that follows. When the matching ends, you return to the main algorithm and keep searching the input string from the current position.
Neither of these two algorithms is recursive or too expensive.
update
Ok. If no tables are allowed, you could still use this approach by replacing, in the first loop, the way you get the next Fibonacci number: instead of indexing, apply your formula.

Understanding these questions about binary search on linear data structures?

The answers are (1) and (5) but I am not sure why. Could someone please explain this to me, and why the other answers are incorrect? How can I understand how things like binary/linear search will behave on different data structures?
Thank you
I am hoping you already know about binary search.
(1) True-
Explanation
For performing binary search, we have to get to the middle of the sorted list. In a linked list, to get to the middle we have to traverse half of the list starting from the head, while in an array we can jump directly to the middle index if we know the length. So a linked list takes about n/2 steps, which is O(n), for something an array does in O(1). Therefore a linked list is not an efficient structure for binary search.
(2)False
Same explanation as above
(3)False
Explanation
As explained in point 1 linked list cannot be used efficiently to perform binary search but array can be used.
(4) False
Explanation
Binary search's worst-case time is O(log n), because we don't need to traverse the whole list. In the first iteration, if the key is less than the middle value, we discard the second half of the list. We then repeat the same step on the remaining part. With every iteration we discard the part of the list that we don't have to traverse, so it clearly takes less than O(n).
(5)True
Explanation
If the element is found in O(1) time, that means only one iteration of the loop ran. In the first iteration we always compare against the middle element of the list, so the search takes O(1) time only if the middle element is the key.
In short, binary search is an elimination based searching technique that can be applied when the elements are sorted. The idea is to eliminate half the keys from consideration by keeping the keys in sorted order. If the search key is not equal to the middle element, one of the two sets of keys to the left and to the right of the middle element can be eliminated from further consideration.
Now coming to your specific question,
True
The basic binary search requires that the mid-point can be found in O(1) time, which isn't possible in a linked list and can be far more expensive if the size of the list is unknown.
True.
False
False
In binary search, the mid-point calculation should be done in O(1) time, which is only possible in arrays, because the indices of an array are known. Secondly, binary search can only be applied to arrays that are in sorted order.
False
The answer by Vaibhav Khandelwal explained it nicely, but I wanted to add a variation of the array to which binary search can still be applied. If the given array is sorted but rotated by some number of positions and contains duplicates, for example,
3 5 6 7 1 2 3 3 3
then binary search still applies to it, but in the worst case we need to go linearly through the list to find the required element, which is O(n).
True
If the element is found on the first attempt, i.e. it is situated at the mid-point, then the search completes in O(1) time.
MidPointOfArray = (LeftSideOfArray + RightSideOfArray)/ 2
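To illustrate the rotated-array-with-duplicates case mentioned above, here is one common way to adapt binary search. This is a sketch rather than the answerer's own code, and search_rotated is an illustrative name. When the values at both ends and the middle are equal, the sorted half cannot be identified, so the search shrinks the range by one from each side, which is where the O(n) worst case comes from.

```python
def search_rotated(nums, target):
    """Return True if target is in a sorted-then-rotated list that may hold duplicates."""
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return True
        if nums[lo] == nums[mid] == nums[hi]:
            # Cannot tell which half is sorted; shrink linearly (O(n) worst case).
            lo += 1
            hi -= 1
        elif nums[lo] <= nums[mid]:
            # Left half [lo, mid] is sorted.
            if nums[lo] <= target < nums[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:
            # Right half [mid, hi] is sorted.
            if nums[mid] < target <= nums[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return False


print(search_rotated([3, 5, 6, 7, 1, 2, 3, 3, 3], 7))
```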
The best way to understand binary search is to think of exam papers which are sorted by last name. To find a particular student's paper, the teacher searches within the category for that student's name and rules out the ones that are not alphabetically close to it.
For example, if the name is Alex Bob, the teacher starts her search directly from "B", takes out all the papers with a surname starting with "B", then repeats the process, skipping papers until the letter "o", and so on until she either finds it or runs out of papers.

fastest search algorithm to search sorted array

I have an array which only has values 0 and 1. They are stored separately in the array. For example, the array may have the first 40% as 0 and the remaining 60% as 1. I want to find the split point between 0 and 1. One algorithm I have in mind is binary search. Since performance is important to me, I am not sure whether binary search gives the best performance. The split point is randomly distributed. The array is given already split into 0s and 1s.
The seemingly clever answer of keeping the counts doesn't hold when you are given the array.
Counting is O(n), and so is linear search. Thus, counting is not optimal!
Binary search is your friend, and can get things done in O(lg n) time, which as you may know is way better.
Of course, if you have to process the array anyways (reading from a file, user input etc.), make use of that time to just count the number of 1s and 0s and be done with it (you don't even have to store any of it, just keep the counts).
To drive the point home, if you are writing a library, which has a function called getFirstOneIndex(sortZeroesOnesArr: Array[Integer]): Integer that takes a sorted array of ones and zeroes and returns the position of the first 1, do not count, binary search.
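A minimal sketch of such a library function in Python, using the standard bisect module. The snake_case name get_first_one_index mirrors the hypothetical getFirstOneIndex above, and returning -1 when there is no 1 is an assumption made for this example.

```python
from bisect import bisect_left


def get_first_one_index(sorted_zeroes_ones):
    """Index of the first 1 in a list of 0s followed by 1s, or -1 if there is none."""
    # bisect_left returns the leftmost position where 1 could be inserted,
    # which is the index of the first 1 when one exists.
    idx = bisect_left(sorted_zeroes_ones, 1)
    return idx if idx < len(sorted_zeroes_ones) else -1


print(get_first_one_index([0, 0, 0, 0, 1, 1, 1, 1, 1, 1]))
```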

Find a common element within N arrays

If I have N arrays, what is the best way (in terms of time complexity; space is not important) to find the common elements? You could just find 1 element and stop.
Edit: The elements are all Numbers.
Edit: These are unsorted. Please do not sort and scan.
This is not a homework problem. Somebody asked me this question a long time ago. He was using a hash to solve the problem and asked me if I had a better way.
Create a hash index, with elements as keys, counts as values. Loop through all values and update the count in the index. Afterwards, run through the index and check which elements have count = N. Looking up an element in the index should be O(1), combined with looping through all M elements should be O(M).
If you want to keep order specific to a certain input array, loop over that array and test the element counts in the index in that order.
Some special cases:
If you know that the elements are (positive) integers with a maximum value that is not too high, you could just use a normal array as the "hash" index to keep counts, where each number is simply its own array index.
I've assumed that in each array each number occurs only once. Adapting it for more occurrences should be easy (set the i-th bit in the count for the i-th array, or only update if the current element count == i-1).
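The counting scheme above, including the "only update if the current element count == i-1" adaptation for repeated values, might be sketched as follows. Indices here are 0-based, so the check becomes count == i, and common_elements is an illustrative name.

```python
def common_elements(arrays):
    """Return the values that appear in every array, using a hash index of counts."""
    counts = {}
    for i, arr in enumerate(arrays):
        for x in arr:
            # Bump the count only if x was seen in all i previous arrays
            # and has not already been counted for this one.
            if counts.get(x, 0) == i:
                counts[x] = i + 1
    n = len(arrays)
    return [x for x, c in counts.items() if c == n]


print(common_elements([[1, 1, 2, 3, 4, 5], [1, 3, 6, 7]]))
```

Every element is visited once and each dictionary operation is expected O(1), so the total work is proportional to the number of elements M across all arrays.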
EDIT when I answered the question, the question did not have the part of "a better way" than hashing in it.
The most direct method is to intersect the first 2 arrays and then intersecting this intersection with the remaining N-2 arrays.
If 'intersection' is not defined in the language in which you're working or you require a more specific answer (ie you need the answer to 'how do you do the intersection') then modify your question as such.
Without sorting there isn't an optimized way to do this based on the information given. (ie sorting and positioning all elements relatively to each other then iterating over the length of the arrays checking for defined elements in all the arrays at once)
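If hash-based sets are acceptable, the repeated-intersection idea can be sketched like this. intersect_all is an illustrative name, and the early exit once the running intersection is empty is an addition for this example.

```python
def intersect_all(arrays):
    """Intersect the first array with each remaining array in turn."""
    if not arrays:
        return set()
    common = set(arrays[0])
    for arr in arrays[1:]:
        common &= set(arr)
        if not common:
            break  # nothing can be common to all arrays any more
    return common


print(intersect_all([[1, 1, 2, 3, 4, 5], [1, 3, 6, 7], [3, 1, 8]]))
```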
The question asks whether there is a better way than hashing. There is no better way (i.e. better time complexity) than hashing, as the time to hash each element is typically constant. Empirical performance is also favorable, particularly if the range of values can be mapped one to one onto an array maintaining counts. The time is then proportional to the number of elements across all the arrays. Sorting will not give better complexity, since it still needs to visit each element at least once, and then there is the extra log N factor for sorting each array.
Back to hashing, from a performance standpoint, you will get the best empirical performance by not processing each array fully, but processing only a block of elements from each array before proceeding onto the next array. This will take advantage of the CPU cache. It also results in fewer elements being hashed in favorable cases when common elements appear in the same regions of the array (e.g. common elements at the start of all arrays.) Worst case behaviour is no worse than hashing each array in full - merely that all elements are hashed.
I don't think the approach suggested by catchmeifyoutry will work.
Let us say you have two arrays
1: {1,1,2,3,4,5}
2: {1,3,6,7}
then the answer should be 1 and 3. But if we use the hashtable approach, 1 will have count 3 and we will never find 1 in this situation.
Also, the problem becomes more complex if we have input like this:
1: {1,1,1,2,3,4}
2: {1,1,5,6}
Here I think we should give 1,1 as the output. The suggested approach fails in both cases.
Solution :
Read the first array and put its elements into a hashtable. If we find the same key again, don't increment the counter. Read the second array in the same manner. Now the hashtable holds the common elements, which have a count of 2.
But again, this approach will fail on the second input set I gave earlier.
I'd first start with the degenerate case, finding common elements between 2 arrays (more on this later). From there I'll have a collection of common values which I will use as an array itself and compare it against the next array. This check would be performed N-1 times or until the "carry" array of common elements drops to size 0.
One could speed this up, I'd imagine, by divide-and-conquer, splitting the N arrays into the end nodes of a tree. The next level up the tree is N/2 common element arrays, and so forth and so on until you have an array at the top that is either filled or not. In either case, you'd have your answer.
Without sorting and scanning, the best operational speed you'll get for comparing 2 arrays for common elements is O(N^2).
