Find element of an array that appears only once in O(log n) time - algorithm

Given an array A with all elements appearing twice except one element which appears only once. How do we find the element which appears only once in O(logn) time? Let's discuss two cases.
Array is always sorted and elements are in sequential order. Let's assume A = [1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6], we want to find 3 in log n time because it appears only once.
When the array is not sorted and the elements are not in sequential order.
I can only come up with a solution that uses the XOR operator on the binary representations of the integers, as explained here: at the end, the accumulated value represents the element which appears only once, because the duplicates cancel out. But that takes O(n) time. How can we do better than that?

Using Haroon S' comment, this is the solution which I think is correct, given the time constraints:
from typing import List

class Solution:
    def singleNonDuplicate(self, nums: List[int]) -> int:
        low = 0
        high = len(nums) - 1
        while low < high:
            mid = (low + high) // 2
            if mid % 2 == 0:
                mid += 1            # force mid to an odd index
            if nums[mid] == nums[mid + 1]:
                # pairing is already broken here: answer is in the first half
                high = mid - 1
            elif nums[mid] == nums[mid - 1]:
                # pairing is intact so far: answer is in the second half
                low = mid + 1
        return nums[low]

If the elements are sorted (i.e., the first case you mentioned) then I believe a strategy not unlike binary search could work in O(logN) time.
Starting from the left endpoint of the sorted array, until we encounter the unique element, all the index pairs (2i, 2i+1) we encounter along the way hold the same value (due to the array being sorted). However, as we move towards the right endpoint, as soon as a prefix of the array includes the unique element, that structure of "same values within (2i, 2i+1) index pairs" breaks down.
Using that information, a search algorithm similar to binary search can find out in which half of the array the unique element is. Basically, you can deduce that if the values in the rightmost index pair (2i, 2i+1) of the left half are the same, then the unique value is in the right half. (The exception is the last index of the left half-array being even, but you can handle that case with various O(1) operations.)
The overall complexity then becomes O(logN), due to the halving of the array size at each step.
For a demonstration of the index notion mentioned above, see your own example. To the left of the unique element (i.e. 3), all index pairs (2i, 2i+1) hold the same values. In any subarray that starts at index 0 and ends to the right of the unique element, the index pairs (2i, 2i+1) at or beyond the unique element correspond to cells that contain different values.
Unless the array is sorted, though, you would have to investigate each and every element, so I believe any algorithm you may come up with would take at least O(n) time. This is what I think will happen in the second case you mention in your question.

In the general case this is impossible, as to make sure an element doesn't repeat you need to check every other element.
From your example, it seems the array might be a sorted sequence of integers with no "gaps" (or some other clearly defined sequence, like all even numbers, etc). In this case it is possible with a modified binary search.
You have the array [1,1,2,2,3,4,4,5,5,6,6].
You check the middle element and the element following it and see 3 and 4. Now you know there are only 5 elements from the set {1, 2, 3}, while there are 6 elements from the set {4, 5, 6}. This means the element without a duplicate is in {1, 2, 3}.
Then you recurse on [1,1,2,2,3]. You see 2,2. Now you know there are 2 "1" elements and 1 "3" element, so 3 is the answer.
The reason you check 2 elements in each step is that if you see just "3", you don't know whether you hit the first 3 in "3,3" or the second one. But if you read 2 elements you always find a "boundary" between 2 different elements.
The condition for this to be viable is that, given the value of an element, you need to be able to calculate in O(1) how many different elements come before it. In your case this is trivial, but it is also possible for any arithmetic series, geometric series (with fixed-size numbers), and so on.
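A minimal Python sketch of this idea, assuming the simplest case from the question: the values are consecutive integers starting at A[0], so the number of distinct values up to A[b] is just A[b] - A[0] + 1 (the function name and the boundary handling are mine, not taken from the answer above):

def single_non_duplicate_gapless(A):
    # A is sorted, values are consecutive integers, every value appears
    # twice except one that appears only once.
    lo, hi = 0, len(A) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        # Pick a "boundary" index b with A[b] != A[b+1]; if mid sits inside
        # a doubled pair, the boundary just before that pair works as well.
        b = mid if A[mid] != A[mid + 1] else mid - 1
        elements = b + 1                  # elements in the prefix A[0..b]
        distinct = A[b] - A[0] + 1        # distinct values in that prefix
        if elements < 2 * distinct:
            hi = b        # prefix is one element short: singleton is in A[lo..b]
        else:
            lo = b + 1    # prefix is fully paired: singleton is in A[b+1..hi]
    return A[lo]

print(single_non_duplicate_gapless([1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6]))  # 3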

This is not an O(log n) solution. I have no idea how to solve it in logarithmic time without the constraints that the array is sorted and that we have a known difference between consecutive numbers, so we can recognise when we are to the left or right of the singleton. The other answers already deal with that special case, and I couldn't do better there either.
I have a suggestion that might solve the general case in O(n), rather than O(n log n) when you first sort the array. It’s not as fast as the xor solution, but it will also work for non-integers. The elements must have an order, so it is not completely general, but it will work anywhere you can sort the elements.
The idea is the same as the k’th order element algorithm based on Quicksort. You partition and recurse on one half of the array. The time recurrence is T(n) = T(n/2) + O(n) = O(n).
Given array x and indices i,j representing the sub-array x[i:j], partition with quicksort's partitioning method. You want a variant that partitions x[i:j] into three segments, x[i:k], x[k:l], x[l:j], where all elements in the first part are smaller than the pivot (whatever it is), all elements in x[k:l] are equal to the pivot, and all elements in the last segment are greater than the pivot.
(You might be able to use a version that only partitions in two, or explicitly count the number of pivots, but this three-way version is easier to work with here.)
Now, if the middle segment has length one, you have your singleton. It is the pivot.
If not, the length of the segment that has the singleton is odd while the other is even. So recurse on the segment with the odd length.
It doesn’t give you worst case linear time, for the same reason that Quicksort isn’t worst case log-linear, but you get an expected linear time algorithm and likely a fast one at that.
Not, of course, as fast as those solutions based on binary search, but here the elements do not need to be sorted and we can handle elements with arbitrary gaps between them. We are also not restricted to data where we can easily manipulate their bit-patterns. So it is more general. If you can compare the elements, this approach will find the singleton in O(n).
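For what it's worth, here is a small Python sketch of the idea. It partitions into three lists rather than in place (so it uses O(n) extra space, unlike the in-place partitioning described above), but the recursion structure is the same: keep the side with odd length.

import random

def find_singleton(xs):
    # Every value appears exactly twice except one that appears once.
    xs = list(xs)
    while True:
        pivot = random.choice(xs)
        less  = [x for x in xs if x < pivot]
        equal = [x for x in xs if x == pivot]
        more  = [x for x in xs if x > pivot]
        if len(equal) == 1:      # the pivot itself is the singleton
            return pivot
        # Pairs never straddle the pivot, so the side that contains
        # the singleton is the one with odd length.
        xs = less if len(less) % 2 == 1 else more

print(find_singleton([7, 2, 9, 2, 7, 9, 4]))  # 4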

This solution finds the element in the array that appears only once, provided there is exactly one such element and the array is sorted. It is a binary search and returns the element in O(log n) time.
var singleNonDuplicate = function(nums) {
    let s = 0, e = nums.length - 1;
    while (s < e) {
        let mid = Math.trunc(s + (e - s) / 2);
        if ((mid % 2 == 0 && nums[mid] == nums[mid + 1]) ||
            (mid % 2 == 1 && nums[mid] == nums[mid - 1])) {
            // pairs are intact up to mid: the single element is to the right
            s = mid + 1;
        } else {
            // pairing is broken at or before mid: the single element is at or before mid
            e = mid;
        }
    }
    return nums[s]; // can return nums[e] as well
};

I don't believe there is an O(log n) solution for that. The reason is that in order to find which element appears only once, you need to iterate over the elements of that array at least once.

Related

Bubble sort variant - three adjacent number swapping

This problem appeared in the Code Jam 2018 qualification round, which has ended.
https://codejam.withgoogle.com/2018/challenges/ (Problem 2)
Problem description:
The basic operation of the standard bubble sort algorithm is to examine a pair of adjacent numbers and reverse that pair if the left number is larger than the right number. But our algorithm examines a group of three adjacent numbers, and if the leftmost number is larger than the rightmost number, it reverses that entire group. Because our algorithm is a "triplet bubble sort", we have named it Trouble Sort for short.
We were looking forward to presenting Trouble Sort at the Special Interest Group in Sorting conference in Hawaii, but one of our interns has just pointed out a problem: it is possible that Trouble Sort does not correctly sort the list! Consider the list 8 9 7, for example.
We need your help with some further research. Given a list of N integers, determine whether Trouble Sort will successfully sort the list into non-decreasing order. If it will not, find the index (counting starting from 0) of the first sorting error after the algorithm has finished: that is, the first value that is larger than the value that comes directly after it when the algorithm is done.
So a naive approach would be to apply Trouble Sort to the given list, apply a normal sort to the list, and find the index of the first non-matching element. However, this would time out for very large N.
Here is what I figured:
The algorithm will compare 0th index with 2nd, 2nd with 4th and so on.
Similarly 1st with 3rd, 3rd with 5th and so on.
All the elements at odd indices will be sorted with respect to the other odd indices, and the same holds for the even-indexed elements.
So the issue would lie between two consecutive odd-/even-indexed elements.
I can't think of a way to figure it out without an O(n^2) approach.
Is my approach viable, or is there something easier?
Your observation is spot on. The algorithm presented in the problem statement will only compare (and swap) the consecutive odd- and even-indexed elements among themselves.
If you take that observation one step further, you can state that Trouble Sort is an algorithm that correctly sorts odd- and even-indexed elements of an array within themselves. (i.e. as if odd-indexed elements and even-indexed elements of an array A are two separate arrays B and C)
In other words, Trouble Sort does sort B and C correctly. The issue here is whether those arrays B and C of odd and even-indexed elements can be merged properly. You should check if sorting odd- and even-indexed elements among themselves is enough to make the entire array sorted.
This step is really similar to the merging step of MergeSort. The only difference is that, due to the indexing being a limiting factor on your operation, you know at all times from which array you will pick the top element. For a 1-indexed array A, during the merging step of B and C, at each step, you should pick the smallest previously unpicked element from B, and then C.
So, basically, if you sort B and C, which takes, O(NlogN) using an algorithm such as mergesort or heapsort, and then merge them in the manner described in the previous paragraph, which takes O(N), you end up with the same version of the array A after it has been processed by the Trouble Sort algorithm.
The difference is the time complexity. While Trouble Sort takes O(N^2) time, the operations described above take O(N log N) time. Once you end up with this array, you can then check in O(N) time whether, for all consecutive indices i, j, A[i] <= A[j] holds. The overall complexity of the algorithm would still be O(N log N).
Below is a code sample in Python to demonstrate, as a sort of pseudocode, the algorithm I described above. There are a couple of minor differences in implementation due to Python arrays being 0-indexed. You may observe the execution of this code here.
def does_trouble_sort_work(A):
    # B holds the even-indexed elements, C the odd-indexed ones
    B, C = A[0::2], A[1::2]
    B_sorted = sorted(B)
    C_sorted = sorted(C)
    # Interleave the two sorted halves back into A, as Trouble Sort would
    j = k = 0
    for i in range(len(A)):
        if i % 2 == 0:
            A[i] = B_sorted[j]
            j += 1
        else:
            A[i] = C_sorted[k]
            k += 1
    # Trouble Sort succeeds iff the merged result is non-decreasing
    trouble_sort_works = True
    for i in range(1, len(A)):
        if A[i - 1] > A[i]:
            trouble_sort_works = False
            break
    return trouble_sort_works
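For instance, on the problem statement's own example 8 9 7 (the second list is just an extra illustration of my own):

print(does_trouble_sort_work([8, 9, 7]))        # False: Trouble Sort leaves it as [7, 9, 8]
print(does_trouble_sort_work([5, 6, 6, 4, 3]))  # True:  Trouble Sort produces [3, 4, 5, 6, 6]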

Binary search in 2 sorted integer arrays

There is a big array which consists of 2 smaller integer arrays written one after the other. Both small arrays are sorted in ascending order. We have to find an element in the big array as fast as possible. My idea was to find the end of the left array by binary search in the big array and then run 2 binary searches on the small arrays. The problem is that I don't know how to find that end. If you have an idea how to find the element without finding the borders of the smaller arrays, you're welcome!
Information about the arrays: both small arrays have integer elements, both are sorted in ascending order, each can have length from 0 to any positive integer, and there can be only one copy of each element.
Here are some examples of big arrays:
1 2 3 4 5 6 7 (all the elements of the second array are bigger, than the maximum of the first array)
100 1 (both arrays have only one element)
1 3 5 2 4 6 or 2 4 6 1 3 5 (most common situations)
This problem is impossible to solve in guaranteed time complexity faster than O(n), and not possible to solve at all for certain arrays. Binary search runs in O(log n) on a sorted array, but the big array is not guaranteed to be sorted, and in the worst case it requires one or more comparisons per element, which is O(n). The best guaranteed time complexity is O(n) with the trivial algorithm: compare every item with its neighbour until you find the "turning point" with A[i] > A[i+1]. However, if you use a breadth-first search, you may get lucky and find the "turning point" early.
Proof that the problem is unsolvable for some arrays: let the array M = [A B] be our big array. To find the point where the arrays meet we're looking for an index i where M[i] > M[i+1]. Now let A=[1 2 3] and B=[4 5]. There is no index in the array M for which the condition holds true, thus the problem is unsolvable for some arrays.
Informal proof of the former: let M=[A B] with A=[1..x] and B=[(x+1)..y] be two sorted arrays. Then swap the positions of elements x and y in M. We have no way of finding the index of x without (in the worst case) checking every index; thus the problem is O(n).
Binary search relies on being able to eliminate half the solution space with each comparison, but in this case we cannot eliminate anything from the array, so we cannot do better than a linear search.
(From a practical standpoint, you should never do this in a program. The two arrays should be separate. If this isn't possible, append the length of either array to the bigger array.)
Edit: changed my answer after question was updated. It's possible to do it faster than linear time for some arrays, but not all possible arrays. Here's my idea for an algorithm using breadth-first search:
Start with the interval [0..n-1] where n is the length of the big array.
Make a list of intervals and put the starting interval in it.
For each interval in the list:
    if the interval is only two elements and the first element is greater than the last
        we found the turning point, return it
    else if the interval is two elements or less
        remove it from the list
    else if the first element of the interval is greater than the last
        turning point is in this interval
        clear the list
        split this interval in two equal parts and add them to the list
    else
        split this interval in two equal parts and replace this interval in the list with the two parts
I think a breadth-first approach will increase the odds of finding an interval where A[first] > A[last] early. Note that this approach will not work if the turning point is between two intervals, but it's something to get you started. I would test this myself, but unfortunately I don't have the time now.
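Here is a rough Python sketch of that breadth-first idea. To sidestep the "turning point between two intervals" caveat mentioned above, the two halves in this sketch share their middle element, which is a small deviation from the pseudocode:

from collections import deque

def find_turning_point(M):
    # Returns an index i with M[i] > M[i+1], or None if there is none
    # (e.g. when the big array happens to be sorted as a whole).
    queue = deque([(0, len(M) - 1)])
    while queue:
        first, last = queue.popleft()
        if last - first == 1:
            if M[first] > M[last]:
                return first          # found the turning point
            continue                  # two elements in order: discard
        if last <= first:
            continue                  # one element or empty: discard
        if M[first] > M[last]:
            queue.clear()             # turning point must be in here: focus the search
        mid = (first + last) // 2
        queue.append((first, mid))    # the halves share index mid so that a
        queue.append((mid, last))     # boundary between them cannot be missed
    return None

print(find_turning_point([1, 3, 5, 2, 4, 6]))  # 2, since M[2]=5 > M[3]=2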

Ordered list with O(1) random access and removal

Does there exist a data structure with the following properties:
Elements are stored in some order
Accessing the element at a given index takes O(1) time (possibly amortized)
Removing an element takes amortized O(1) time, and changes the indices appropriately (so if element 0 is removed, the next access to element 0 should return the old element 1)
For context, I reduced an algorithm question from a programming competition to:
Over m queries, return the kth smallest positive number that hasn't been returned yet. You can assume the returned number is less than some constant n.
If the data structure above exists, then you can do this in O(m) time, by creating a list of numbers 1 to n. Then, for each query, find the element at index k and remove it. During the contest itself, my solution ended up being O(m^2) on certain inputs.
I'm pretty sure you can do this in O(m log m) with binary search trees, but I'm wondering if the ideal O(m) is reachable. Stuff I've found online tends to be close, but not quite there - the tricky part is that the elements you remove can be from anywhere in the list.
Well, the O(1) removal is possible with a linked list:
each element has pointers to the next and previous elements, so removal just deletes the element and re-links the pointers of its neighbours, like:
element[ix-1].next = element[ix+1]; element[ix+1].prev = element[ix-1]
Accessing ordered elements by index in O(1) can be done with an indexed array,
so you have an unordered array like dat[] and an index array like idx[]; accessing element ix is just:
dat[idx[ix]]
Now the problem is to have both of these properties at once:
you can try a linked list with an index array, but the removal needs to update the index table, which is O(N) in the worst case;
if you have just the index array, then removal is also O(N);
if you keep the index in some form of tree structure, then removal can be close to O(log(N)), but access will also be about O(log(N)).
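One concrete way to get that O(log(N)) access-by-rank and O(log(N)) removal (my own choice of structure, not something from the question) is a Fenwick / binary indexed tree over 0/1 "still present" flags; a sketch:

class FenwickList:
    # kth(k) finds the k-th remaining original position in O(log N);
    # remove_kth(k) removes it in O(log N).
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)
        for i in range(1, n + 1):      # every position starts out present
            self._update(i, 1)

    def _update(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def kth(self, k):
        # 1-based original position of the k-th element that is still present
        pos = 0
        bit = 1 << self.n.bit_length()
        while bit:
            nxt = pos + bit
            if nxt <= self.n and self.tree[nxt] < k:
                k -= self.tree[nxt]
                pos = nxt
            bit >>= 1
        return pos + 1

    def remove_kth(self, k):
        pos = self.kth(k)
        self._update(pos, -1)          # mark the position as removed
        return pos

With this, the competition sub-problem above ("return the kth smallest positive number not returned yet") takes O(m log n) overall, i.e. the tree-based bound mentioned in the question rather than the ideal O(m).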
I believe there is a structure that does both of these in O(n) time, where n is the number of elements which have been removed, not the total size. So if the number you're removing is small compared to the size of the array, it's close to O(1).
Basically, all the data is stored in an array. There is also a priority queue for deleted elements. Initialise like so:
Data = [0, 1, 2, ..., m]
removed = new list
Then, to remove an element, you add its original index (see below for how to get this) to the priority queue (which is sorted by element size, with the smallest at the front), and leave the array as it is. So removing the 3rd element:
Data = [0, 1, 2, 3,..., m]
removed = 2
Then what's now the 4th and was the 5th:
Data = [0, 1, 2, 3,..., m]
removed = 2 -> 4
Then what's now the 3rd and was the 4th:
Data = [0, 1, 2, 3,..., m]
removed = 2 -> 3 -> 4
Now, to access an element, you start with its index. You then iterate along the removed list, increasing the index by one each time, until you reach a removed index which is larger than the increased value of the index. This will give you the original index (i.e. position in Data) of the element you're looking for, and it is the index you need for removal.
This operation of iterating along the queue effectively increases the index by the number of removed elements that come before it.
Sorry if I haven't explained very well, it was clear in my head but hard to write down.
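A rough Python sketch of the same idea (the class and method names are mine; `removed` is kept as a sorted list rather than an explicit priority queue):

import bisect

class RemovalList:
    def __init__(self, data):
        self.data = list(data)
        self.removed = []      # original indices of removed elements, kept sorted

    def _original_index(self, i):
        # Each removed original index at or before the running position
        # shifts the logical index one step to the right.
        for r in self.removed:
            if r <= i:
                i += 1
            else:
                break
        return i

    def __getitem__(self, i):
        return self.data[self._original_index(i)]

    def remove(self, i):
        orig = self._original_index(i)
        bisect.insort(self.removed, orig)   # keep the removed indices sorted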
Comments:
Access is O(n), with n number of removed items
Removal is approximately twice the time of access, but still O(n)
A disadvantage is that memory use doesn't shrink with removal.
Could potentially 're-initialise' when removed list is large to reset memory use and access and removal times. This operation takes O(N), with N total array size.
So it's not quite what OP was looking for but in the right situation could be close.

Quicksort: pivot position after one partition

I am reading about quicksort, looking at different implementations and I am trying to wrap my head around something.
In this implementation (which of course works), the pivot is chosen as the middle element and then the left and right pointer move to the right and left accordingly, swapping elements to partition around the pivot.
I was trying the array [4, 3, 2, 6, 8, 1, 0].
On the first partition, pivot is 6 and all the left elements are already smaller than 6, so the left pointer will stop at the pivot. On the right side, we will swap 0 with 6, and then 1 and 8, so at the end of the first iteration, the array will look like:
[4, 3, 2, 0, 1, 8, 6].
However, I was under the impression that after each iteration in quicksort, the pivot ends up in its rightful place, so here it should end up in position 5 of the array.
So, is it possible (and OK) for the pivot not to end up in its correct place after its partition iteration, or is there something obvious I am missing?
There are many possible variations of the quicksort algorithm. In this one it is OK for the pivot to be not in its correct place in its iteration.
The defining feature of every variation of the quicksort algorithm is that after the partition step, we have a part in the beginning of the array, where all the elements are less or equal to pivot, and a non-overlapping part in the end of the array where all the elements are greater or equal to pivot. There may also be a part between them, where every element is equal to pivot. This layout ensures, that after we sort the left part and the right part with recursive calls, and leave the middle part intact, the whole array will be sorted.
Notice, that in general elements equal to pivot may go to any part of the array. A good implementation of quicksort, that avoids quadratic time for the most obvious case, i.e. all equal elements, must spread elements equal to pivot between parts rationally.
Possible variants include:
The middle part includes only 1 element: the pivot. In that case pivot takes its final place in the array after the partition and won't be used in the recursive calls. That's what you meant by pivot taking its place in its iteration. For this approach the good implementation must move about half the elements equal to pivot to the left part and the other half to the right part, otherwise we would have quadratic time for an array with all equal elements.
There is no middle part. Pivot and all elements equal to it are spread between the left and the right part. That's what the implementation you linked does. Once again, in this approach about half of the elements equal to pivot should go to the left part, and the other half to the right part. This can also be mixed with the first variation, depending on whether we are sorting an array with an odd or an even number of elements.
Every element equal to pivot goes to the middle part. There are no elements equal to pivot in either left or right part. That's quite efficient and that's the example Wikipedia gives for solving the all-elements-equal problem. Arrays with all elements equal to each other are sorted in linear time in that case.
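As an illustration of that third variant, here is a small Python sketch using a Dutch national flag partition (the middle-element pivot choice and the names are mine, not taken from the linked implementation):

def three_way_partition(a, lo, hi):
    # Rearranges a[lo..hi] and returns (lt, gt) such that
    # a[lo..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..hi] > pivot.
    pivot = a[(lo + hi) // 2]
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    return lt, gt

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    lt, gt = three_way_partition(a, lo, hi)
    # The middle part (everything equal to the pivot) is already in place,
    # so we only recurse on the strictly-smaller and strictly-greater parts.
    quicksort(a, lo, lt - 1)
    quicksort(a, gt + 1, hi)

On an array where all elements are equal, the first partition puts everything into the middle part and the recursion stops immediately, which is the linear-time behaviour described above.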
Thus, the correct and efficient implementation of quicksort is quite tricky (there is also a problem of choosing a good pivot, for which several approaches with different tradeoffs exist as well; or an optimisation of switching to another non-recursive sorting algorithm for smaller sub-array sizes).
Also, it seems that the implementation you linked to, may do recursive calls on overlapping subarrays:
if (i <= j) {
    exchange(i, j);
    i++;
    j--;
}
For example, when i is equal to j, those elements will be swapped, and i will become greater than j by 2. After that 3 elements will overlap between the ranges of the following recursive calls. The code still seems to work correctly though.

Algorithm to generate a 'nearly sorted' or 'k sorted' list?

I want to generate some test data to test a function that merges 'k-sorted' lists (lists where each element is at most k positions away from its correct sorted position) into a single fully sorted list. I have an approach that works, but I'm not sure how well randomized it is, and I feel there should be a simpler / more elegant way to do this. My current approach:
Generate n random elements paired with an integer index.
Sort random elements.
Set paired index for each element to its sorted position.
Work backwards through the elements, swapping each element with an element a random distance between 1 and k positions behind it in the list. Only swap with the target element if its paired index is its current index (this avoids swapping an element that is already out of place and moving it further than k positions away from where it should be).
Copy the perturbed elements out into another list.
Like I say, this works but I'm interested in alternative / better approaches.
I think you could just fill an array with random integers and then run quicksort on it with a custom stopping condition.
If in a particular quicksort recursion your start and end indexes are less than k apart, then just return instead of continuing to recurse.
Because of how quicksort works, every number in the start..end interval belongs somewhere in that region; worst case is that array[start] might really belong at array[end] (or vice versa) in truly sorted order. So, assuring that start and end are no more than k apart should be sufficient.
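A short Python sketch of that generator (the value range and the Hoare-style partition are my own choices; the essential part is the early return once a subarray is already shorter than k):

import random

def nearly_sorted(n, k):
    # Random data, quicksorted only until every unsorted subarray is
    # shorter than k, so each element ends up within k of its sorted spot.
    a = [random.randint(0, 10 * n) for _ in range(n)]

    def sort(lo, hi):
        if hi - lo < k:          # small enough: stop recursing here
            return
        pivot = a[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:            # Hoare-style partition around the pivot
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        sort(lo, j)
        sort(i, hi)

    sort(0, n - 1)
    return a

print(nearly_sorted(20, 3))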
You can generate an array of random numbers and then h-sort it as in Shellsort, but skip the last few sorting passes, where h is less than k.
Step 1: Randomly permute disjoint segments of length k (e.g. 1 to k, k+1 to 2k, ...).
Step 2: Permute conditionally again, swapping only in ways that don't break the k-sorted assumption, over segments shifted by an offset t: 1+t to k+t, k+1+t to 2k+t, ..., where t is a number between 1 and k (most preferably k/2).
Probably repeat step 2 multiple times with different values of t.
If I understand the problem, you want an algorithm to randomly pick a single k-sorted list of length n, uniformly selected from the universe U of all k-sorted lists of length n. (You will then run this algorithm m times to produce m lists as input test data.)
The first step is to count them: what is the size |U| of U?
The next step is to enumerate them. Create any one-to-one mapping F between the integers (1,2,...,|U|) and k-sorted lists of length n.
Then randomly select an integer x between 1 and |U| inclusive, and then apply F(x) to get the list.

Resources