Finding the k smallest odd integers - algorithm

I am teaching myself algorithms, and I am trying to solve the following problem:
We have an array A of n positive integers in arbitrary order, and a value k with 1 <= k <= n. The task is to output the k smallest odd integers. If the number of odd integers in A is less than k, we should report all of the odd integers. For example,
if A = [2, 17, 3, 10, 28, 5, 9, 4, 12, 13, 7] and k = 3, the output should be 3, 5, 7.
I want to solve this problem in O(n) time.
My current solution is to build a separate array containing only the odd numbers and then apply a selection algorithm: find the median, partition the list into L (elements less than the median), M (elements equal to the median), and R (elements greater than the median), and compare against k as follows:
If |L| < k <= |L| + |M|, return the median.
Else if k < |L|, solve the problem recursively on L.
Else recurse on (R, k - (|L| + |M|)).
Any help is appreciated.

Assuming the output can be in any order:
Create a separate array with only odd numbers.
Use a selection algorithm to determine the k-th smallest item. One such algorithm is quickselect (which runs in O(n) on average); it is related to quicksort in that it partitions the array around some pivot and then recurses into only one of the partitioned sides, chosen based on the sizes of each. See this question for more details.
Since quickselect partitions the input, you will be able to output the results directly after running this algorithm (as Karoly mentioned).
Both of the above steps take O(n), thus the overall running time is O(n).
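For concreteness, here is a rough Python sketch of this filter-then-quickselect idea; the function name and the list-based three-way partition are my own choices, not part of the answer:

```python
import random

def k_smallest_odds(a, k):
    """Return the k smallest odd numbers of a, in arbitrary order.
    Filter the odd numbers, then quickselect so that the k smallest
    end up in the first k positions; expected O(n) overall."""
    odds = [x for x in a if x % 2 == 1]
    if k >= len(odds):
        return odds
    lo, hi, need = 0, len(odds), k   # `need` smallest still wanted within odds[lo:hi]
    while hi - lo > 1 and need > 0:
        pivot = odds[random.randrange(lo, hi)]
        # Three-way partition of odds[lo:hi] around the pivot.
        lt = [x for x in odds[lo:hi] if x < pivot]
        eq = [x for x in odds[lo:hi] if x == pivot]
        gt = [x for x in odds[lo:hi] if x > pivot]
        odds[lo:hi] = lt + eq + gt
        if need < len(lt):
            hi = lo + len(lt)              # the needed elements all lie in lt
        elif need <= len(lt) + len(eq):
            break                          # the first k positions are already correct
        else:
            need -= len(lt) + len(eq)      # keep all of lt and eq, continue in gt
            lo += len(lt) + len(eq)
    return odds[:k]

print(k_smallest_odds([2, 17, 3, 10, 28, 5, 9, 4, 12, 13, 7], 3))  # [3, 5, 7] in some order
```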
If you need the output in ascending order:
If k = n and all the numbers are odd, then an O(n) solution to this would amount to an O(n) sorting algorithm, and no one knows of such an algorithm.
To anyone who's considering disagreeing and saying that some non-comparison-based sort is O(n): it's not; each of those algorithms has some other factor in its complexity, such as the size of the numbers.
The best you can do here, with unbounded numbers, is to use the approach suggested in Proger's answer (O(n + k log n)), or to iterate through the input while maintaining a heap of the k smallest odd numbers seen so far (O(n log k)).
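A minimal sketch of the second option (a bounded heap of the k smallest odd numbers), using Python's heapq; the helper name is mine:

```python
import heapq

def k_smallest_odds_sorted(a, k):
    """Keep a max-heap (via negation) of the k smallest odd numbers seen
    so far; O(n log k) overall, and the result comes out in ascending order."""
    heap = []  # stores negated values, so -heap[0] is the largest value kept
    for x in a:
        if x % 2 == 1:
            if len(heap) < k:
                heapq.heappush(heap, -x)
            elif x < -heap[0]:
                heapq.heapreplace(heap, -x)
    return sorted(-v for v in heap)

print(k_smallest_odds_sorted([2, 17, 3, 10, 28, 5, 9, 4, 12, 13, 7], 3))  # [3, 5, 7]
```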

Related

Can someone clarify the difference between Quicksort and Randomized Quicksort?

How is it different if I select a randomized pivot versus just selecting the first pivot in an unordered set/list?
If the set is unordered, isn't selecting the first value in the set random in itself? Essentially, I am trying to understand how (and whether) randomizing promises a better worst-case runtime.
I think you may be mixing up the concepts of arbitrary and random. It's arbitrary to pick the first element of the array - you could pick any element you'd like and it would work equally well - but it's not random. A random choice is one that can't be predicted in advance. An arbitrary choice is one that can be.
Let's imagine that you're using quicksort on the sorted sequence 1, 2, 3, 4, 5, 6, ..., n. If you choose the first element as a pivot, then you'll choose 1 as the pivot. All n - 1 other elements then go to the right and nothing goes to the left, and you'll recursively quicksort 2, 3, 4, 5, ..., n.
When you quicksort that range, you'll choose 2 as the pivot. Partitioning the elements then puts nothing on the left and the numbers 3, 4, 5, 6, ..., n on the right, so you'll recursively quicksort 3, 4, 5, 6, ..., n.
More generally, after k steps, you'll choose the number k as a pivot, put the numbers k+1, k+2, ..., n on the right, then recursively quicksort them.
The total work done here ends up being Θ(n²), since on the first pass (to partition 2, 3, ..., n around 1) you have to look at n-1 elements, on the second pass (to partition 3, 4, 5, ..., n around 2) you have to look at n-2 elements, etc. This means that the work done is (n-1) + (n-2) + ... + 1 = Θ(n²), quite inefficient!
Now, contrast this with randomized quicksort. In randomized quicksort, you truly choose a random element as your pivot at each step. This means that while you technically could choose the same pivots as in the deterministic case, it's very unlikely (the probability would be roughly 2^(2 - n), which is quite low) that this will happen and trigger the worst-case behavior. You're more likely to choose pivots closer to the center of the array, and when that happens the recursion branches more evenly and thus terminates a lot faster.
The advantage of randomized quicksort is that there's no single input that will always cause it to run in time Θ(n²), and the runtime is expected to be O(n log n). Deterministic quicksort algorithms usually have the drawback that either (1) they run in worst-case time O(n log n), but with a high constant factor, or (2) they run in worst-case time O(n²), and there is a specific input that deterministically triggers the worst case.
In the textbook deterministic quicksort, the pivot is taken from a fixed position (often the rightmost element of the subarray being partitioned), whereas in randomized quicksort the pivot can be any element of the subarray, chosen uniformly at random.
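A small Python sketch (mine, not from either answer) that makes the difference concrete: the only thing that changes between the two variants is how the pivot is picked.

```python
import random

def quicksort(a, randomized=True):
    """Out-of-place quicksort sketch: with randomized=True the pivot is chosen
    uniformly at random; otherwise the first element is used, which degrades to
    Θ(n^2) comparisons on already-sorted input, as described above."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a) if randomized else a[0]
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less, randomized) + equal + quicksort(greater, randomized)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```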

Generating data to test sorting algorithm

I would like to generate data to test sorting algorithms with. This accomplishes two things:
Find bugs. The output can easily be checked to verify that it is in fact sorted correctly.
Profile the code and find which situations take longer for which parts.
I asked the question "How do you test speed of sorting algorithm?" a while ago, but this question focuses particularly on generating the data.
I am thinking of
sorted
reverse sorted
random
sorted, but with n inversions introduced at randomly selected positions, to see how changing n affects the run time
Any suggestions? Do any frameworks exist that would make this easier? I'm thinking JUnit could be useful.
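For reference, here is a hypothetical generator covering the cases listed above; the function name, the kind labels, and the adjacent-swap way of introducing inversions are my own choices, not from any existing framework:

```python
import random

def make_test_data(n, kind, inversions=0):
    """Generate a list of n integers of the requested kind."""
    data = list(range(n))
    if kind == "sorted":
        return data
    if kind == "reverse":
        return data[::-1]
    if kind == "random":
        random.shuffle(data)
        return data
    if kind == "nearly_sorted":
        # Start sorted, then swap `inversions` randomly chosen adjacent pairs
        # (each in-order swap adds exactly one inversion).
        for _ in range(inversions):
            i = random.randrange(n - 1)
            data[i], data[i + 1] = data[i + 1], data[i]
        return data
    raise ValueError(f"unknown kind: {kind}")

# Example harness: compare the sort under test against Python's sorted().
my_sort = sorted  # stand-in for the sort being tested
for kind in ("sorted", "reverse", "random", "nearly_sorted"):
    a = make_test_data(1000, kind, inversions=50)
    assert my_sort(list(a)) == sorted(a), f"bug on {kind} input"
```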
In this question on the Computer Science Stack Exchange, an answer makes it sound like adding inversions and counting them doesn't mean much:
The number of inversions might work for some cases, but is sometimes insufficient. An example given in [3] is the sequence
$$\langle \lfloor n/2 \rfloor + 1, \lfloor n/2 \rfloor + 2, \ldots, n, 1, \ldots, \lfloor n/2 \rfloor \rangle$$
that has a quadratic number of inversions, but only consists of two ascending runs. It is nearly sorted, but this is not captured by inversions.
I'm not particularly strong in math and don't understand how the example illustrates what's wrong with counting the number of inversions? Is it just academic? How does it make sense to say "quadratic number of inversions"?
Using integer math, the $$...$$ sequence can represent an array:
indices:      1      2      ...    n/2    n/2+1  n/2+2  ...    n
array values: n/2+1  n/2+2  ...    n      1      2      ...    n/2
So as stated, just two ascending sequences.
By the definition of inversion, two elements a[i] and a[j] form an inversion if a[i] > a[j] and i < j. This means that all of the first n/2 elements of a, a[1 to n/2] are greater than all of the second n/2 elements of a, a[(n/2)+1 to n]. So that's (n/2)^2 = n^2/4 inversions which is quadratic.
The relationship between inversion count and sort time complexity depends on the sort algorithm. Using bubble sort on the example array would have time complexity O(n^2). Using a generic merge sort on the array would be O(n log n), with a near-best-case comparison count. Using natural merge sort would find the two sorted runs and do a single merge pass, for a time complexity of O(n).
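To make the quoted example concrete, a tiny Python check (brute-force inversion count, names mine) that builds the sequence and confirms the point: roughly n^2/4 inversions, but only two ascending runs.

```python
def inversions_and_runs(a):
    """Count inversions (O(n^2) brute force, fine for a demo) and ascending runs."""
    n = len(a)
    inv = sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])
    runs = 1 + sum(1 for i in range(1, n) if a[i] < a[i - 1])
    return inv, runs

n = 20
a = list(range(n // 2 + 1, n + 1)) + list(range(1, n // 2 + 1))
print(inversions_and_runs(a))  # (100, 2): n^2/4 inversions, yet only two runs
```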

Given n numbers find minimum perimeter triangle

I encountered this problem in a coding contest. I could only think of an O(n^2 log n) solution; I guess the expected one was O(n log n).
I am given n numbers, and I have to find 3 numbers that satisfy the triangle inequality and have the smallest sum.
I hope this is quite easy to understand.
Eg.
10,2,5,1,8,20
Answer is 23 = (5+8+10)
In sorted order, the longest side should be the immediate successor of the second longest; otherwise, we could shrink the longest side and thus the perimeter. Now you can use your binary search to find the third side over O(n) possibilities instead of O(n^2) (and actually, you don't even need to search if you iterate from small to large, though the sort will still cost you O(n log n)).
I think the answer is something like this, assuming no duplicates in the numbers.
Sort the numbers. Then scan the numbers and take the first number that is smaller than the sum of the two preceding numbers. Call that x(n), the nth position of the sorted series.
x(n) is one of the numbers, and so far we are O(n log n).
Then there are a limited number of previous choices. x(n-1) has to be one of the numbers, because x(n-2) + x(n-3) <= x(n-1) <= x(n). Then it is a simple scan up to x(n-1) to find the smallest number a with a + x(n-1) > x(n). That number could be at the beginning of the series, as in 2, 3, 8, 15, 16.
I think the analysis is essentially the same with duplicates.
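Here is a sketch (function name mine) of the first answer's idea: sort, then for every adjacent pair (b, c) binary-search for the smallest a with a + b > c, keeping the best perimeter seen.

```python
import bisect

def min_perimeter_triangle(nums):
    """O(n log n): sort, then for each adjacent pair (b, c) find the
    smallest a with a + b > c and track the minimum perimeter."""
    x = sorted(nums)
    best = None
    for i in range(1, len(x) - 1):
        b, c = x[i], x[i + 1]
        j = bisect.bisect_right(x, c - b, 0, i)  # first index with x[j] > c - b
        if j < i:
            perim = x[j] + b + c
            if best is None or perim < best:
                best = perim
    return best  # None if no valid triangle exists

print(min_perimeter_triangle([10, 2, 5, 1, 8, 20]))  # 23
print(min_perimeter_triangle([2, 3, 8, 15, 16]))     # 33
```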

In quick sort if split is 5 : n-5, then time complexity will be?

How do you find the complexity of quicksort when the ratio between the partition sizes is 5 : n-5, or something like 1 : 19? I do not really understand how to calculate the complexity of the algorithm in these situations.
In general, keep the following in mind:
If you split an array into two pieces defined by some fixed ratio of a:b at each point, after O(log n) splits, the subarrays will be down to size 0.
If you split an array into two pieces where one size is a constant k, it will take Θ(n / k) splits to get the subarray sizes to drop to 0.
Now, think about the work that quicksort does at each level of the recursion. At each layer, it needs to do work proportional to the number of elements in the layer. If you use the first approach and have something like a 1/20 : 19/20 split, then there will be at most n elements per layer but only O(log n) layers, so the total work done will be O(n log n), which is great.
On the other hand, suppose that you always pull off five elements. Then the larger array at each step will have sizes n, n - 5, n - 10, n - 15, ..., 10, 5, 0. If you work out the math and sum this up, it works out to Θ(n²) total work, which is not very efficient.
Generally speaking, try to avoid splitting off a fixed number of elements at a time in quicksort. That gives you the degenerate case that you need to worry about.
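A quick way to see the difference is to just sum the per-level partitioning work for the two kinds of splits; the following simulation is my own illustration, not part of the answer:

```python
def total_work(n, split):
    """Sum the per-level partitioning work of an idealized quicksort whose
    split of an m-element subarray is given by split(m) -> (left, right)."""
    if n <= 1:
        return 0
    left, right = split(n)
    return n + total_work(left, split) + total_work(right, split)

def ratio_split(m):
    # roughly a 1 : 19 split (one element is removed as the pivot)
    return m // 20, m - 1 - m // 20

def constant_split(m):
    # always peel off (up to) five elements, as in the 5 : n-5 case
    small = min(5, m - 1)
    return small, m - 1 - small

for n in (500, 1000, 2000):
    print(n, total_work(n, ratio_split), total_work(n, constant_split))
# Doubling n roughly doubles the ratio-split total (n log n growth) but
# roughly quadruples the constant-split total (n^2 growth).
```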

Finding a specific ratio in an unsorted array. Time complexity

This is a homework assignment.
The goal is to present an algorithm in pseudocode that will search an array of numbers (it doesn't specify whether they are integers or > 0) and check whether the ratio of any two numbers equals a given x. The time complexity must not exceed O(n log n).
My idea was to mergesort the array (O(n log n) time) and then, if |x| > 1, check every number in descending order (using binary search). Each check should take O(log n) time; with a worst case of n checks, that gives a total of O(n log n). If I am not missing anything, this gives a worst case of O(n log n) + O(n log n) = O(n log n), within the parameters of the assignment.
I realize that it doesn't really matter where I start checking the ratios after sorting; at best it cuts the cost roughly in half.
Is my logic correct? Is there a faster algorithm?
An example in case it isn't clear:
Given an array { 4, 9, 2, 1, 8, 6 }
If we want to search for a ratio of 2:
1. Mergesort: { 9, 8, 6, 4, 2, 1 }
2. Since the given ratio is > 1, we will search from left to right.
2a. First number is 9. Check 9/4 > 2; check 9/6 < 2. Next number.
2b. Second number is 8. Check 8/4 = 2. DONE
The analysis you have presented is correct and is a perfectly good way to solve this problem. Sorting does work in time O(n log n), and 2n binary searches also take O(n log n) time. That said, I don't think you want to use the term "amortized" here, since that refers to a different type of analysis.
As a hint for how to speed up your solution a bit, the general idea of your solution is to make it possible to efficiently query, for any number, whether that number exists in the array. That way, you can just loop over all numbers and look for anything that would make the ratio work. However, if you use an auxiliary data structure outside the array that supports fast access, you can possibly whittle down your runtime at the cost of increasing the memory usage. Try thinking about what data structures support very fast access (say, O(1) lookups) and see if you can use any of them here.
Hope this helps!
To solve this problem, O(n log n) is enough.
Step 1: sort the array. That costs O(n log n).
Step 2: check whether the ratio exists; this step only needs O(n).
You just need two pointers: one points to the first element (the smallest one), the other points to the last element (the biggest one).
Calculate the ratio.
If the ratio is bigger than the specified one, move the second pointer to its previous element.
If the ratio is smaller than the specified one, move the first pointer to its next element.
Repeat the above steps until:
you find the exact ratio, or
either the first pointer reaches the end or the second pointer reaches the beginning.
The complexity of your algorithm is O(n²), because after sorting the array, you iterate over each element (up to n times) and in each iteration you execute up to n - 1 divisions.
Instead, after sorting the array, iterate over each element, and in each iteration divide the element by the ratio, then see if the result is contained in the array:
division: O(1)
search in sorted list: O(log n)
repeat for each element: n times
Results in time complexity O(n log n)
In your example:
9/2 = 4.5 (not found)
8/2 = 4 (found)
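A minimal sketch of this sorted-plus-binary-search approach, assuming exact division; the function name and the check that the match is a different element are mine:

```python
import bisect

def has_ratio(nums, x):
    """Sort, then for each element e look for e / x via binary search.
    Assumes x != 0 and that the divisions are exact."""
    a = sorted(nums)
    for i, e in enumerate(a):
        target = e / x
        j = bisect.bisect_left(a, target)
        # The match must be a different element than e itself.
        while j < len(a) and a[j] == target:
            if j != i:
                return True
            j += 1
    return False

print(has_ratio([4, 9, 2, 1, 8, 6], 2))  # True  (8 / 4 == 2)
print(has_ratio([4, 9, 2, 1, 8, 6], 7))  # False
```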
(1) Build a HashMap (or hash set) of this array. Time cost: O(n).
(2) For every element a[i], look up a[i] * x in the HashMap. Time cost: O(n) expected.
Total cost: O(n) expected.
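And a sketch of the hashing approach, assuming exact products; the x == 1 special case and the self-match guard are my additions:

```python
def has_ratio_hashed(nums, x):
    """For each element e, check whether e * x is also present.
    Expected O(n) with a hash set."""
    if x == 1:
        # A ratio of 1 just means some value occurs more than once.
        return len(set(nums)) < len(nums)
    seen = set(nums)
    # The e * x != e guard stops an element (e.g. 0) from matching itself.
    return any(e * x in seen and e * x != e for e in nums)

print(has_ratio_hashed([4, 9, 2, 1, 8, 6], 2))  # True  (4 * 2 == 8)
print(has_ratio_hashed([4, 9, 2, 1, 8, 6], 7))  # False
```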
