fastest search algorithm to search sorted array

I have an array that contains only the values 0 and 1, with all the 0s stored before all the 1s. For example, the first 40% of the array may be 0 and the remaining 60% may be 1. I want to find the split point between the 0s and the 1s. One algorithm I have in mind is binary search, but since performance is important to me, I am not sure whether binary search would give me the best performance. The split point is randomly distributed, and the array is given to me already partitioned into 0s followed by 1s.

The seemingly clever answer of keeping the counts doesn't hold when you are given the array.
Counting is O(n), and so is linear search. Thus, counting is not optimal!
Binary search is your friend, and can get things done in O(lg n) time, which as you may know is way better.
Of course, if you have to process the array anyways (reading from a file, user input etc.), make use of that time to just count the number of 1s and 0s and be done with it (you don't even have to store any of it, just keep the counts).
To drive the point home: if you are writing a library that has a function called getFirstOneIndex(sortZeroesOnesArr: Array[Integer]): Integer, which takes a sorted array of ones and zeroes and returns the position of the first 1, do not count; binary search.
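If it helps to see it spelled out, here is a rough Python sketch of such a binary search (the function name and the convention of returning len(arr) when there are no 1s are my own choices):

    def get_first_one_index(arr):
        """Return the index of the first 1 in a sorted 0/1 array, or len(arr) if there are no 1s."""
        lo, hi = 0, len(arr)              # search the half-open range [lo, hi)
        while lo < hi:
            mid = (lo + hi) // 2
            if arr[mid] == 0:
                lo = mid + 1              # the first 1 must be to the right of mid
            else:
                hi = mid                  # arr[mid] is 1, so the answer is mid or earlier
        return lo

    # First 40% zeros, remaining 60% ones:
    print(get_first_one_index([0, 0, 0, 0, 1, 1, 1, 1, 1, 1]))  # -> 4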

Related

How to quickly search for a specified element in an ordered array consisting of only two types of elements?

The array mentioned in the question is as follows:
[1,1,...,1,1,-1,-1,...,-1,-1]
How to quickly find the index of the 1 closest to -1?
Note: both 1 and -1 are guaranteed to be present, and the number of 1s and -1s is large.
For example, for an array like this:
[1,1,1,1,1,-1,-1,-1]
the result should be 4.
The fastest way I can think of is binary search, is there a faster way?
With the current representation of the data, binary search is the fastest way I can think of. Of course, you can cache and reuse the result in constant time since the answer is always the same.
On the other hand if you change the representation of the array to some simple numbers you can find the next element in constant time. Since the data can always be mapped to a binary value, you can reduce the whole array to 2 numbers. The length of the first partition and the length of the second partition. Or the length of the whole array and the partitioning point. This way you can easily change the length of both partitions in constant time and have access to the next element of the second partition in constant time.
Of course, changing the representation of the array itself is a logarithmic process since you need to find the partitioning point.
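As a hypothetical sketch of that representation (the class and method names are purely illustrative):

    class PartitionedArray:
        """Represents [1]*ones + [-1]*minus_ones by the two partition lengths."""

        def __init__(self, ones, minus_ones):
            self.ones = ones                # length of the first partition (the 1s)
            self.minus_ones = minus_ones    # length of the second partition (the -1s)

        def last_one_index(self):
            """Index of the 1 closest to the -1s, in O(1)."""
            return self.ones - 1

        def __getitem__(self, i):
            return 1 if i < self.ones else -1

    arr = PartitionedArray(5, 3)            # same data as [1,1,1,1,1,-1,-1,-1]
    print(arr.last_one_index())             # -> 4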
By a simple information-theoretic argument, you can't be faster than log(n) using only comparisons: there are n possible outcomes, and you need to collect at least log(n) bits of information to distinguish them.
If you have extra information about the statistical distribution of the values, then maybe you can exploit it. But this is to be discussed on a case-by-case basis.

Making radix sort in-place - trying to understand how

I'm going through all the known / typical sorting algorithms (insertion, bubble, selection, quick, merge sort..) and now I just read about radix sort.
I think I have understood its concept but I still wonder how it could be done in-place? Let me explain how I understood it:
It's made up of 2 phases: partitioning and collecting, which are executed alternately. In the partitioning phase we split the data into groups; let me call these buckets. In the collecting phase we collect the data back together. Both phases are executed for each position of the keys to be sorted, so the number of passes depends on the size of the keys (or rather the number of digits, if for example we want to sort integers).
I don't want to explain the 2 phases in too much detail because it would be too long, and I hope you have read until here, because I don't know how to do this algorithm in-place..
Maybe you can explain with words instead of code? I need to know it for my exam, but I couldn't find anything explaining it on the internet, at least not in an easy, understandable way.
If you want me to explain more, please tell me. I will do anything to understand it.
Wikipedia is (sometimes) your friend: https://en.wikipedia.org/wiki/Radix_sort#In-place_MSD_radix_sort_implementations.
I quote the article:
Binary MSD radix sort, also called binary quicksort, can be implemented in-place by splitting the input array into two bins - the 0s bin and the 1s bin. The 0s bin is grown from the beginning of the array, whereas the 1s bin is grown from the end of the array. [...] The most significant bit of the first array element is examined. If this bit is a 1, then the first element is swapped with the element in front of the 1s bin boundary (the last element of the array), and the 1s bin is grown by one element by decrementing the 1s boundary array index. If this bit is a 0, then the first element remains at its current location, and the 0s bin is grown by one element. [...] The 0s bin and the 1s bin are then sorted recursively based on the next bit of each array element. Recursive processing continues until the least significant bit has been used for sorting.
The main information is: it is a binary and recursive radix sort. In other words:
you have only two buckets, let's say 0 and 1, for each step. Since the algorithm is 'in-place' you swap elements (as in quicksort) to put each element in the right bucket (0 or 1), depending on its radix.
you process recursively: each bucket is split into two buckets, depending on the next radix.
It is very simple to understand for unsigned integers: you consider the bits from the most significant to the least significant. It may be more complex (and overkill) for other data types.
To summarize the differences with the quicksort algorithm:
in quicksort, your choice of a pivot defines two "buckets": lower than pivot, greater than pivot.
in binary radix sort, the two buckets are defined by the radix (e.g. the most significant bit).
In both cases, you swap elements to put each element in its "bucket" and process recursively.
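To make the quoted description concrete, here is a small Python sketch of in-place binary MSD radix sort for unsigned integers (the function name and the fixed starting bit are my own choices):

    def binary_msd_radix_sort(a, lo=0, hi=None, bit=31):
        """Sort unsigned integers in a[lo:hi] in place, one bit at a time,
        starting from the most significant bit."""
        if hi is None:
            hi = len(a)
        if hi - lo <= 1 or bit < 0:
            return
        i, j = lo, hi                       # i grows the 0s bin, j marks the 1s bin boundary
        while i < j:
            if (a[i] >> bit) & 1 == 0:
                i += 1                      # bit is 0: element stays, 0s bin grows
            else:
                j -= 1
                a[i], a[j] = a[j], a[i]     # bit is 1: swap it in front of the 1s bin boundary
        binary_msd_radix_sort(a, lo, i, bit - 1)   # recurse on the 0s bin
        binary_msd_radix_sort(a, i, hi, bit - 1)   # recurse on the 1s bin

    data = [170, 45, 75, 90, 802, 24, 2, 66]
    binary_msd_radix_sort(data)
    print(data)  # -> [2, 24, 45, 66, 75, 90, 170, 802]

As in quicksort, the partitioning is done purely by swapping elements within the array, which is what makes it in-place.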

binary search with duplicate elements in the array

I was wondering how we handle duplicate elements in an array using binary search. For instance, I have an array like 1 1 1 2 2 3 3, and I am interested in looking for the last occurrence of 2.
According to a post I read before, we can first use binary search to find 2, and then scan through the adjacent elements. This takes about O(log(n) + k), so the worst case is when k = n, and then it takes O(n) time. Is there any way to improve the worst-case performance? Thanks.
Do a binary search for 2.5. In other words, if the value you're searching for is N, then your code should treat N like it's too small, and N+1 like it's too large. The main difference in the algorithm is that it can't get lucky and terminate early (when it finds the value). It has to run all the way till the end, when the high and low indexes are equal. At that point, the index you seek should be no more than 1 away from the final high/low index.
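A Python sketch of that idea (the loop details and the -1 convention for "not found" are mine); the search never terminates early and converges on the last index whose value is not greater than the target:

    def last_index_of(arr, target):
        """Binary search that treats target itself as 'too small', so it
        converges on the last occurrence of target (or returns -1)."""
        lo, hi = 0, len(arr) - 1
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if arr[mid] <= target:          # target counts as 'too small': go right
                lo = mid
            else:                           # anything above target is 'too large': go left
                hi = mid - 1
        return lo if arr and arr[lo] == target else -1

    print(last_index_of([1, 1, 1, 2, 2, 3, 3], 2))  # -> 4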
The easiest approach would be to do an upper-bound binary search. This is exactly like the binary search you mention, except instead of trying to find the first instance of a number, it finds the first instance of a number which is greater than the one provided. The difference between them is little more than switching a < to a <=.
Once you find the first instance of a number which is greater than yours, step back one index, and look at the value there. If it's a 2, then you found the last 2. If it's anything else, then there were no 2's in the array.
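In Python, for example, bisect_right from the standard library already performs this upper-bound search, so a sketch of the whole approach is just a few lines:

    from bisect import bisect_right

    def last_occurrence(arr, target):
        """Index of the last occurrence of target in sorted arr, or -1."""
        i = bisect_right(arr, target)       # first index whose value is greater than target
        if i > 0 and arr[i - 1] == target:  # step back one index and look at the value there
            return i - 1
        return -1

    print(last_occurrence([1, 1, 1, 2, 2, 3, 3], 2))  # -> 4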

Which sorting algorithm should I use for a list with a lot of replications?

I want to sort an array of 1 million integers. What would be the best algorithm to use, knowing that the array's integers all come from the universe 1 to 100? Note that this means there are a lot of duplicated items. Furthermore, the array is randomly distributed.
You create an array of 100 elements (with one for each possible value) and simply count how many there are of each. Running time: O(n), with each element of the original array accessed only once, so you're unlikely to find a faster one. :)
Or to give it its proper name, use a counting sort.
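A minimal counting sort sketch for this case (assuming the values really are in the range 1 to 100):

    def counting_sort_1_to_100(values):
        """Sort integers known to lie in the range 1..100."""
        counts = [0] * 101                  # counts[v] = how many times v occurs
        for v in values:
            counts[v] += 1
        result = []
        for v in range(1, 101):
            result.extend([v] * counts[v])  # emit each value as often as it occurred
        return result

    print(counting_sort_1_to_100([42, 7, 7, 100, 1, 42]))  # -> [1, 7, 7, 42, 42, 100]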

Find a common element within N arrays

If I have N arrays, what is the best (in terms of time complexity; space is not important) way to find the common elements? You could just find 1 element and stop.
Edit: The elements are all Numbers.
Edit: These are unsorted. Please do not sort and scan.
This is not a homework problem. Somebody asked me this question a long time ago. He was using a hash to solve the problem and asked me if I had a better way.
Create a hash index, with elements as keys and counts as values. Loop through all values and update the count in the index. Afterwards, run through the index and check which elements have count = N. Looking up an element in the index should be O(1), so combined with looping through all M elements this should be O(M).
If you want to keep order specific to a certain input array, loop over that array and test the element counts in the index in that order.
Some special cases:
if you know that the elements are (positive) integers with a maximum number that is not too high, you could just use a normal array as the "hash" index to keep counts, with the numbers themselves serving as array indices.
I've assumed that in each array each number occurs only once. Adapting it for more occurrences should be easy (set the i-th bit in the count for the i-th array, or only update if the current element count == i-1).
EDIT: when I answered this, the question did not yet include the part about "a better way" than hashing.
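A short sketch of this counting approach in Python (assuming, as stated above, that each number occurs at most once within a single array; the function name is illustrative):

    def find_common_element(arrays):
        """Return one element present in every array, or None.

        Assumes each element occurs at most once within a single array.
        """
        counts = {}
        for arr in arrays:
            for x in arr:
                counts[x] = counts.get(x, 0) + 1
        for x, c in counts.items():
            if c == len(arrays):            # seen once in every array
                return x
        return None

    print(find_common_element([[3, 1, 4], [9, 4, 2], [4, 8, 7]]))  # -> 4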
The most direct method is to intersect the first 2 arrays and then intersect the result with each of the remaining N-2 arrays.
If 'intersection' is not defined in the language in which you're working, or you require a more specific answer (i.e. you need the answer to 'how do you do the intersection'), then modify your question as such.
Without sorting there isn't an optimized way to do this based on the information given (i.e. sorting would position all elements relative to each other, and you could then iterate over the arrays checking for elements present in all of them at once).
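In a language with built-in set intersection, the repeated-intersection idea described above might look like this sketch:

    def common_elements(arrays):
        """Intersect the first array with each of the remaining ones in turn."""
        common = set(arrays[0])
        for arr in arrays[1:]:
            common &= set(arr)              # keep only elements also present in this array
            if not common:                  # early exit: nothing can be common any more
                break
        return common

    print(common_elements([[3, 1, 4], [9, 4, 2], [4, 8, 7]]))  # -> {4}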
The question asks whether there is a better way than hashing. There is no better way (i.e. better time complexity) than doing a hash, as the time to hash each element is typically constant. Empirical performance is also favorable, particularly if the range of values can be mapped one-to-one onto an array maintaining counts. The time is then proportional to the number of elements across all the arrays. Sorting will not give better complexity, since this will still need to visit each element at least once, and then there is the log N for sorting each array.
Back to hashing, from a performance standpoint, you will get the best empirical performance by not processing each array fully, but processing only a block of elements from each array before proceeding onto the next array. This will take advantage of the CPU cache. It also results in fewer elements being hashed in favorable cases when common elements appear in the same regions of the array (e.g. common elements at the start of all arrays.) Worst case behaviour is no worse than hashing each array in full - merely that all elements are hashed.
I don't think the approach suggested by catchmeifyoutry will work.
Let us say you have two arrays
1: {1,1,2,3,4,5}
2: {1,3,6,7}
then the answer should be 1 and 3. But if we use the hashtable approach, 1 will have a count of 3 and we will never find 1 in this situation.
The problem also becomes more complex if we have input something like this:
1: {1,1,1,2,3,4}
2: {1,1,5,6}
Here I think we should give the output as 1, 1. The suggested approach fails in both cases.
Solution:
Read the first array and put it into a hashtable. If we find the same key again, don't increment the counter. Read the second array in the same manner. Now the hashtable holds the common elements, which have a count of 2.
But again, this approach will fail for the second input set which I gave earlier.
I'd first start with the degenerate case, finding common elements between 2 arrays (more on this later). From there I'll have a collection of common values which I will use as an array itself and compare it against the next array. This check would be performed N-1 times or until the "carry" array of common elements drops to size 0.
One could speed this up, I'd imagine, by divide-and-conquer, splitting the N arrays into the end nodes of a tree. The next level up the tree is N/2 common element arrays, and so forth and so on until you have an array at the top that is either filled or not. In either case, you'd have your answer.
Without sorting and scanning, the best operational speed you'll get for comparing 2 arrays for common elements is O(N²).

Resources