sorting a bivalued list - algorithm

If I have a list of just binary values containing 0's and 1's, like the following: 000111010110
and I want to sort it to the following: 000000111111, what would be the most efficient way to do this if you also know the list size? Right now I am thinking of having one counter where I just count the number of 0's as I traverse the list from beginning to end. Then if I subtract numberOfZeros from listSize I get numberOfOnes. Then I was thinking that instead of reordering the list starting with zeros, I would just create a new list. Would you agree this is the most efficient method?

Your algorithm implements the most primitive version of the classic bucket sort algorithm (its counting sort implementation). It is the fastest possible way to sort numbers when their range is known and is (relatively) small. Since zeros and ones are all you have, you do not need the array of counters that is present in the bucket sort: a single counter is sufficient.
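A minimal sketch of the single-counter approach described above, assuming the input is a Python list holding only the integers 0 and 1 (the function name `sort_binary` is made up for illustration):

```python
def sort_binary(bits):
    """Return a sorted copy of a list that contains only 0s and 1s."""
    zeros = sum(1 for b in bits if b == 0)  # one pass, one counter
    ones = len(bits) - zeros                # everything else must be a 1
    return [0] * zeros + [1] * ones         # build the new list directly


print(sort_binary([0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0]))
```

Both the counting pass and the construction of the new list are linear, so the whole thing is O(n).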

If the bits are packed into a numeric value, you can use a population-count instruction (POPCNT on x86; note that BSF only finds the index of the lowest set bit) to count the number of set bits, n. To create the "sorted" value you would set the n+1 bit (counting from the right, starting at 1), then subtract one. This will set all n bits to the right of the n+1 bit.
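The set-a-bit-then-subtract-one step can be sketched in Python as follows. This is only an illustration: `bin(value).count("1")` stands in for a hardware bit-counting instruction, and the name `sort_bits` is hypothetical.

```python
def sort_bits(value):
    """Move all set bits of a non-negative int to the low-order end."""
    ones = bin(value).count("1")  # number of set bits (software popcount)
    return (1 << ones) - 1        # set bit number `ones`, then subtract one


print(format(sort_bits(0b000111010110), "012b"))  # prints 000000111111
```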

Bucket sort is a sorting algorithm, as it seems.
I don't think there is a need for such operations. As we know, there is no comparison-based sorting algorithm faster than N*log(N), but you do not need to compare elements here at all.
All you have to do is what you said in the very beginning. Just traverse the list and count the zeros or the ones, which gives you O(n) complexity. Then just create a new array with the counted zeros at the beginning, followed by the ones. Then you have a total of N+N steps, which gives you
O(n) complexity.
And that works only because you have just two values. Neither quicksort nor any other comparison sort can do this faster, since comparison-based sorting cannot be faster than N*log(N).

Related

How to quickly search for a specified element in an ordered array consisting of only two types of elements?

The array mentioned in the question are as follows:
[1,1,...,1,1,-1,-1,...,-1,-1]
How to quickly find the index of the 1 closest to -1?
Note: Both 1 and -1 will exist at the same time, and the number of 1 and -1 is large.
For example, for an array like this:
[1,1,1,1,1,-1,-1,-1]
the result should be 4.
The fastest way I can think of is binary search, is there a faster way?
With the current representation of the data, binary search is the fastest way I can think of. Of course, you can cache and reuse the results in constant time since the answer is always the same.
On the other hand if you change the representation of the array to some simple numbers you can find the next element in constant time. Since the data can always be mapped to a binary value, you can reduce the whole array to 2 numbers. The length of the first partition and the length of the second partition. Or the length of the whole array and the partitioning point. This way you can easily change the length of both partitions in constant time and have access to the next element of the second partition in constant time.
Of course, changing the representation of the array itself is a logarithmic process since you need to find the partitioning point.
By a simple information theoretic argument, you can't be faster than log(n) using only comparisons. Because there are n possible outcomes, and you need to collect at least log(n) bits of information to number them.
If you have extra information about the statistical distribution of the values, then maybe you can exploit it. But this is to be discussed on a case-by-case basis.
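As a rough sketch of the binary search discussed above, assuming (as the question states) that the array starts with at least one 1 and ends with at least one -1; the function name is made up for illustration:

```python
def last_one_index(arr):
    """Index of the last 1 in an array shaped like [1, ..., 1, -1, ..., -1]."""
    lo, hi = 0, len(arr) - 1      # invariant: arr[lo] == 1 and arr[hi] == -1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if arr[mid] == 1:
            lo = mid              # boundary is at mid or to the right of it
        else:
            hi = mid              # boundary is to the left of mid
    return lo


print(last_one_index([1, 1, 1, 1, 1, -1, -1, -1]))  # prints 4
```

Each iteration halves the distance between `lo` and `hi`, so the search takes O(log n) steps.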

Making radix sort in-place - trying to understand how

I'm going through all the known / typical sorting algorithms (insertion, bubble, selection, quick, merge sort..) and now I just read about radix sort.
I think I have understood its concept but I still wonder how it could be done in-place? Let me explain how I understood it:
It's made up of 2 phases: Partitioning and Collecting. They will be executed alternately. In Partitioning phase we will split the data into each.. let me call these bucket. In Collecting phase we will collect the data again. Both phases will be executed for each position of the keys to be sorted. So the amount of cycles is based on the size of the keys (Let's rather say amount of digits if we for example want sort integers).
I don't want explain the 2 phases too much in detail because it would be too long and I hope you will read it till here because I don't know how to do this algorithm in-place..
Maybe you can explain with words instead of code? I need to know it for my exam but I couldn't find anything explaining on the internet, at least not in an easy, understandable way.
If you want me to explain more, please tell me. I will do anything to understand it.
Wikipedia is (sometimes) your friend: https://en.wikipedia.org/wiki/Radix_sort#In-place_MSD_radix_sort_implementations.
I quote the article:
Binary MSD radix sort, also called binary quicksort, can be
implemented in-place by splitting the input array into two bins - the
0s bin and the 1s bin. The 0s bin is grown from the beginning of the
array, whereas the 1s bin is grown from the end of the array. [...]
The most significant
bit of the first array element is examined. If this bit is a 1, then
the first element is swapped with the element in front of the 1s bin
boundary (the last element of the array), and the 1s bin is grown by
one element by decrementing the 1s boundary array index. If this bit
is a 0, then the first element remains at its current location, and
the 0s bin is grown by one element. [...] The 0s bin and the 1s bin are
then sorted recursively based on the next bit of each array element.
Recursive processing continues until the least significant bit has
been used for sorting.
The main information is: it is a binary and recursive radix sort. In other words:
you have only two buckets, let's say 0 and 1, for each step. Since the algorithm is 'in-place' you swap elements (as in quicksort) to put each element in the right bucket (0 or 1), depending on its radix.
you process recursively: each bucket is split into two buckets, depending on the next radix.
It is very simple to understand for unsigned integers: you consider the bits from the most significant to the least significant. It may be more complex (and overkill) for other data types.
To summarize the differences with quicksort algorithm:
in quicksort, your choice of a pivot defines two "buckets": lower than pivot, greater than pivot.
in binary radix sort, the two buckets are defined by the radix (eg most significant bit).
In both cases, you swap elements to put each element in its "bucket" and process recursively.
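The quoted description can be sketched roughly like this in Python. It is an illustrative sketch, not a reference implementation: it assumes a list of non-negative integers, and the function name and parameters are invented here.

```python
def binary_radix_sort(a, lo=0, hi=None, bit=None):
    """In-place binary MSD radix sort of non-negative ints in a[lo:hi]."""
    if hi is None:
        hi = len(a)
    if bit is None:
        # Start at the most significant bit of the largest key.
        bit = max(a, default=0).bit_length() - 1
    if hi - lo < 2 or bit < 0:
        return
    i, j = lo, hi                 # a[lo:i] is the 0s bin, a[j:hi] the 1s bin
    while i < j:
        if (a[i] >> bit) & 1:
            j -= 1                # grow the 1s bin from the end
            a[i], a[j] = a[j], a[i]
        else:
            i += 1                # grow the 0s bin from the beginning
    binary_radix_sort(a, lo, i, bit - 1)   # sort the 0s bin on the next bit
    binary_radix_sort(a, i, hi, bit - 1)   # sort the 1s bin on the next bit


data = [5, 3, 7, 0, 2, 6, 1, 4, 3]
binary_radix_sort(data)
print(data)
```

No auxiliary buckets are allocated: the two bins are just index ranges of the same list, exactly as the partition step in quicksort.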

Does counting sort range always need to be in [0,k]?

Can I do a counting sort on a small range of numbers say A=[7,9,12,15] from a huge pool of numbers, which I know will consist of only the numbers in the small array? Or does the small range always have to be [0..k].
I can do counting sort on the array A by saying [0..15] but it does not make sense.
And what if A=[100,750,452]
So I guess it is feasible.
I would like some inputs please.
Your question isn't very clear, but here it goes. From your example A=[7,9,12,15], the range would be [0..15], which requires additional space of size k+1=16 for the counters (and another result array of size A.length). Since n (A.length) is 4, the overall runtime would be Theta(k + n). Counting sort is a "space-time tradeoff" algorithm, but used this way in your case it wouldn't make any sense, since there isn't any tradeoff to gain. Counting sort should be used when k = O(n), which means the maximum value in A[] is at most a constant multiple of the size of A[]. By the way, I believe the algorithm would still sort your example correctly.
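One way to avoid the wasted [0..min) part of the range is to offset every value by the minimum, and when the set of possible values is known in advance, to keep one counter per known value instead. Neither variant is spelled out in the answer above; both are sketches under those assumptions, with hypothetical function names:

```python
def counting_sort_offset(values):
    """Counting sort over [min..max] instead of [0..max]."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    counts = [0] * (hi - lo + 1)          # one counter per value in the range
    for v in values:
        counts[v - lo] += 1
    result = []
    for offset, c in enumerate(counts):
        result.extend([lo + offset] * c)
    return result


def counting_sort_known_keys(values, keys):
    """Counting sort when every value is known to come from `keys`."""
    counts = {k: 0 for k in keys}         # one counter per allowed value
    for v in values:
        counts[v] += 1
    result = []
    for k in sorted(keys):                # sorting the few keys is cheap
        result.extend([k] * counts[k])
    return result


pool = [12, 7, 15, 9, 7, 12, 12, 9]
print(counting_sort_offset(pool))
print(counting_sort_known_keys(pool, [7, 9, 12, 15]))
```

The second variant uses space proportional to the number of distinct keys, so a pool such as [100, 750, 452] needs three counters rather than 751.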

Finding the repeated element

In an array of integers between 1 and 1,000,000 (or some much larger value), a single value occurs twice. How do you determine which one?
I think we can use a bitmap to mark the elements, and then traverse it all over again to find the repeated element. But I think that is a process with high complexity. Is there any better way?
This sounds like homework or an interview question ... so rather than giving away the answer, here's a hint.
What calculations can you do on a range of integers whose answer you can determine ahead of time?
Once you realize the answer to this, you should be able to figure it out .... if you still can't figure it out ... (and it's not homework) I'll post the solution :)
EDIT: Ok. So here's the elegant solution ... if the list contains ALL of the integers within the range.
We know that all of the values between 1 and N must exist in the list. Using Gauss's formula we can quickly compute the expected sum of a range of integers:
Sum(1..N) = 1/2 * (1 + N) * Count(1..N).
Since we know the expected sum, all we have to do is loop through all the values and add them up. The difference between this sum and the expected sum is the duplicate value.
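A short sketch of the sum trick, assuming the list contains every integer from 1 to N exactly once plus one extra copy of the duplicate, so that its length is N + 1 (the function name is made up):

```python
def find_duplicate_by_sum(values):
    """Duplicate in a list holding 1..N once each plus one repeated value."""
    n = len(values) - 1               # N, because one element is the extra copy
    expected = n * (n + 1) // 2       # Gauss: 1 + 2 + ... + N
    return sum(values) - expected     # the surplus is the duplicate


print(find_duplicate_by_sum([3, 1, 4, 2, 3]))  # prints 3
```

This takes one pass and O(1) extra storage, but it gives a meaningless result if the assumption about the contents does not hold.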
EDIT: As others have commented, the question doesn't state that the range contains all of the integers ... in this case, you have to decide whether you want to optimize for memory or time.
If you want to perform the operation using O(1) storage, you can perform an in-place sort of the list. As you're sorting you have to check adjacent elements. Once you see a duplicate, you know you can stop. Optimal sorting is an O(n log n) operation on average, which establishes an upper bound for finding the duplicate in this manner.
If you want to optimize for speed, you can use an additional O(n) storage. Using a HashSet (or similar structure), insert values from your list until you determine you are inserting a duplicate into the HashSet. Inserting n items into a HashSet is an O(n) operation on average, which establishes that as an upper bound for this method.
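In Python the HashSet idea maps onto the built-in `set`. A minimal sketch (the function name is hypothetical):

```python
def find_duplicate_with_set(values):
    """Return the first value seen twice, or None if there is no duplicate."""
    seen = set()
    for v in values:
        if v in seen:       # average O(1) membership test
            return v
        seen.add(v)
    return None


print(find_duplicate_with_set([17, 4, 250000, 9, 4, 31]))  # prints 4
```

It stops at the second occurrence, so in favorable cases it does not even read the whole list.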
You may try to use bits as a hash map:
1 at position k means that number k occurred before
0 at position k means that number k did not occur before
Pseudocode:
0. assume that your array is A
1. initialize a bitarray (there is a nice class in C# for this) of length 1000001 filled with zeros, so that index 1000000 is valid
2. for each num in A:
       if bitarray[num]
           return num
       else
           bitarray[num] = 1
       end
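The same idea can be sketched in Python with a `bytearray` used as a bitmap, one bit per possible value. The function name and the `max_value` parameter are assumptions made for this example:

```python
def find_duplicate_with_bitmap(values, max_value=1_000_000):
    """Return the first repeated value among ints in 0..max_value, or None."""
    bits = bytearray(max_value // 8 + 1)       # all bits start at 0
    for v in values:
        byte, mask = v >> 3, 1 << (v & 7)      # locate the bit for value v
        if bits[byte] & mask:
            return v                           # bit already set: seen before
        bits[byte] |= mask
    return None


print(find_duplicate_with_bitmap([5, 999_999, 42, 1_000_000, 42]))  # prints 42
```

For a maximum of 1,000,000 the bitmap occupies about 125 KB regardless of how many values the list actually holds.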
The time complexity of the bitmap solution is O(n) and it doesn't seem like you could do better than that. However it will take up a lot of memory for a generic list of numbers. Sorting the numbers is an obvious way to detect duplicates and doesn't require extra space if you don't mind the current order changing.
Assuming the array is of length n < N (i.e. not ALL integers are present -- in this case LBushkin's trick is the answer to this homework problem), there is no way to solve this problem using less than O(n) memory using an algorithm that just takes a single pass through the array. This is by reduction to the set disjointness problem.
Suppose I made the problem easier, and I promised you that the duplicate elements were in the array such that the first one was in the first n/2 elements, and the second one was in the last n/2 elements. Now we can think of playing a game in which two people each hold a string of n/2 elements, and want to know how many messages they have to send to be sure that none of their elements are the same. Since the first player could simulate the run of any algorithm that takes a pass through the array, and send the contents of its memory to the second player, a lower bound on the number of messages they need to send implies a lower bound on the memory requirements of any algorithm.
But it's easy to see in this simple game that they need to send n/2 messages to be sure that they don't hold any of the same elements, which yields the lower bound.
Edit: This generalizes to show that for algorithms that make k passes through the array and use memory m, that m*k = Omega(n). And it is easy to see that you can in fact trade off memory for time in this way.
Of course, if you are willing to use algorithms that don't simply take passes through the array, you can do better as suggested already: sort the array, then take 1 pass through. This takes time O(nlogn) and space O(1). But note curiously that this proves that any sorting algorithm that just makes passes through the array must take time Omega(n^2)! Sorting algorithms that break the n^2 bound must make random accesses.

Find a common element within N arrays

If I have N arrays, what is the best way (in terms of time complexity; space is not important) to find the common elements? You could just find 1 element and stop.
Edit: The elements are all Numbers.
Edit: These are unsorted. Please do not sort and scan.
This is not a homework problem. Somebody asked me this question a long time ago. He was using a hash to solve the problem and asked me if I had a better way.
Create a hash index, with elements as keys, counts as values. Loop through all values and update the count in the index. Afterwards, run through the index and check which elements have count = N. Looking up an element in the index should be O(1), combined with looping through all M elements should be O(M).
If you want to keep order specific to a certain input array, loop over that array and test the element counts in the index in that order.
Some special cases:
if you know that the elements are (positive) integers with a maximum number that is not too high, you could just use a normal array as "hash" index to keep counts, where the numbers are just the array indices.
I've assumed that in each array each number occurs only once. Adapting it for more occurrences should be easy (set the i-th bit in the count for the i-th array, or only update if the current element count == i-1).
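A sketch of the counting index, using the "only update if the current element count == i-1" rule mentioned above so that repeats inside one array are not counted twice. Indices are zero-based here, so the check becomes `count == i`; the function name is made up:

```python
def common_elements(arrays):
    """Elements that occur in every one of the given arrays."""
    counts = {}                            # element -> number of arrays seen in
    for i, arr in enumerate(arrays):
        for x in arr:
            # Only advance the count if x was present in all i earlier arrays
            # and has not been counted for this array yet.
            if counts.get(x, 0) == i:
                counts[x] = i + 1
    n = len(arrays)
    return [x for x, c in counts.items() if c == n]


print(common_elements([[1, 1, 2, 3, 4, 5], [1, 3, 6, 7]]))  # prints [1, 3]
```

With average O(1) dictionary operations, the total work is proportional to the number of elements across all arrays.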
EDIT when I answered the question, the question did not have the part of "a better way" than hashing in it.
The most direct method is to intersect the first 2 arrays and then intersecting this intersection with the remaining N-2 arrays.
If 'intersection' is not defined in the language in which you're working or you require a more specific answer (ie you need the answer to 'how do you do the intersection') then modify your question as such.
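In a language with a built-in set type, the repeated intersection could look like the sketch below. It relies on hashing internally, and the early exit when the running intersection becomes empty is an addition for illustration, as is the function name:

```python
def intersect_all(arrays):
    """Intersect the first array with each remaining array in turn."""
    if not arrays:
        return set()
    common = set(arrays[0])
    for arr in arrays[1:]:
        common &= set(arr)        # keep only the values also present in arr
        if not common:
            break                 # nothing left in common, stop early
    return common


print(sorted(intersect_all([[1, 1, 2, 3], [3, 1, 7], [9, 3, 1, 1]])))  # prints [1, 3]
```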
Without sorting there isn't an optimized way to do this based on the information given. (ie sorting and positioning all elements relatively to each other then iterating over the length of the arrays checking for defined elements in all the arrays at once)
The question asks whether there is a better way than hashing. There is no better way (i.e. better time complexity) than hashing, as the time to hash each element is typically constant. Empirical performance is also favorable, particularly if the range of values can be mapped one to one to an array maintaining counts. The time is then proportional to the number of elements across all the arrays. Sorting will not give better complexity, since this will still need to visit each element at least once, and then there is the log N factor for sorting each array.
Back to hashing, from a performance standpoint, you will get the best empirical performance by not processing each array fully, but processing only a block of elements from each array before proceeding onto the next array. This will take advantage of the CPU cache. It also results in fewer elements being hashed in favorable cases when common elements appear in the same regions of the array (e.g. common elements at the start of all arrays.) Worst case behaviour is no worse than hashing each array in full - merely that all elements are hashed.
I don't think the approach suggested by catchmeifyoutry will work.
Let us say you have two arrays
1: {1,1,2,3,4,5}
2: {1,3,6,7}
then the answer should be 1 and 3. But if we use the hashtable approach, 1 will have count 3 and we will never find 1 in this situation.
Also, the problem becomes more complex if we have input something like this:
1: {1,1,1,2,3,4}
2: {1,1,5,6}
Here I think we should give the output as 1,1. The suggested approach fails in both cases.
Solution:
Read the first array and put it into a hashtable. If we find the same key again, don't increment the counter. Read the second array in the same manner. Now in the hashtable we have the common elements, which have a count of 2.
But again, this approach will fail on the second input set which I gave earlier.
I'd first start with the degenerate case, finding common elements between 2 arrays (more on this later). From there I'll have a collection of common values which I will use as an array itself and compare it against the next array. This check would be performed N-1 times or until the "carry" array of common elements drops to size 0.
One could speed this up, I'd imagine, by divide-and-conquer, splitting the N arrays into the end nodes of a tree. The next level up the tree is N/2 common element arrays, and so forth and so on until you have an array at the top that is either filled or not. In either case, you'd have your answer.
Without sorting and scanning (or hashing), the best operational speed you'll get for comparing 2 arrays for common elements is O(N^2).

Resources