The question is:
Let A be an array of integers in the range {1, ..., n}. Give an O(n)
algorithm that gets rid of duplicate numbers, and sorts the elements
of A in decreasing order of frequency, starting with the element that
appears the most. For example, if A = [5, 1, 3, 1, 7, 7, 1, 3, 1, 3],
then the output should be [1, 3, 7, 5].
The thing is, if we want to know how many times each number from 1 to n appears, we need to scan all of A, whose length m (m = A.length) is unknown to us.
With bucket sort, this is possible while m = O(n).
But I think there is a problem in the question: nothing bounds m, and if m grows faster than n, then even reading A already takes more than O(n) time.
So basically I think that without classifying what m is, it's impossible to achieve O(n).
If someone knows a way to solve this problem, I would be glad.
Thanks.
Sorting is a separate issue. E.g., with radix sort you can get O(kn) time, which is close to linear if k is a constant.
The main concern is the counting: if you can somehow manage to run the overall counting in O(n) time, then you still end up with O(radix sort) + O(n) ~ O(kn + n) ~ O(kn) in the end.
Consider an approach like this: take the elements as the keys of a hash table and their counts as the values.
Map<Integer, Integer> hashMap = new HashMap<>();
for (int element : array) {
    if (hashMap.containsKey(element)) {
        // element is already in the hash table:
        // take its count and update it
        hashMap.put(element, hashMap.get(element) + 1);
    } else {
        // element hasn't been seen yet,
        // so put it in with its count initialized to 1
        hashMap.put(element, 1);
    }
}
This runs in O(N) expected time: with plain integer keys there should be few collisions during hashing, so hashMap.put, hashMap.contains and hashMap.get each run in O(1) expected time.
Finally, you can choose any sorting method to sort the hash table entries by their counts, and that sort's time complexity will be the time complexity of the entire process.
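For illustration, the same count-then-sort idea as a small Python sketch (collections.Counter plays the role of the hash table; the comparison sort on the counts makes this O(n log n) overall):

from collections import Counter

A = [5, 1, 3, 1, 7, 7, 1, 3, 1, 3]
counts = Counter(A)                            # value -> frequency, O(n) expected
result = [v for v, c in counts.most_common()]  # entries sorted by decreasing count
print(result)                                  # [1, 3, 7, 5]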
I agree with you that the problem as stated isn’t possible - O(n) time is not sufficient to even read all the elements of the array, which means you can’t necessarily find all the distinct elements in the given time bound.
However, if you assume the array has size O(n), then as you’ve noted this can be done with a counting sort.
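For what it's worth, here is a minimal Python sketch of that counting-sort idea, assuming the values lie in {1, ..., n} and the array length is O(n), so the whole routine runs in O(n):

def sortByFrequency(a, n):
    count = [0] * (n + 1)              # count[v] = occurrences of v
    for v in a:
        count[v] += 1
    # Bucket the distinct values by their frequency (1 .. len(a)).
    buckets = [[] for _ in range(len(a) + 1)]
    for v in range(1, n + 1):
        if count[v] > 0:
            buckets[count[v]].append(v)
    # Read the buckets from the highest frequency down.
    result = []
    for f in range(len(a), 0, -1):
        result.extend(buckets[f])
    return result

# sortByFrequency([5, 1, 3, 1, 7, 7, 1, 3, 1, 3], 7)  ->  [1, 3, 7, 5]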
You may want to contact whoever gave you this problem to point out that detail. :-)
Hope this helps!
I am not sure how to do this: given a list of numbers and a number k, return all pairs of numbers from the list that add up to k, making only one pass through the list.
For example, given [10, 15, 3, 7] and k = 17, the program should return 10 + 7.
How do you find and return every such pair while only going through the list once?
Use a set to keep track of what you've seen. Runtime: O(N), Space: O(N)
def twoAddToK(nums, k):
    seen = set()
    for x in nums:
        # If the complement k - x was seen earlier, a pair sums to k.
        if k - x in seen:
            return True
        seen.add(x)
    return False
As an alternative to Shawn's code, which uses a set, there is also the option of sorting the list in O(N log N) time (and possibly no extra space, if you are allowed to overwrite the original input), and then applying an O(N) two-pointer pass to solve the problem on the sorted list.
While asymptotic complexity slightly favors the hash set in terms of time, since O(N) is better than O(N log N), I am ready to bet that sorting plus a single linear pass is considerably faster in practice.
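A minimal sketch of that linear pass (returning a yes/no answer, as in Shawn's version):

def twoAddToKSorted(nums, k):
    nums = sorted(nums)        # O(N log N); skip if already sorted
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        s = nums[lo] + nums[hi]
        if s == k:
            return True        # nums[lo] + nums[hi] is a valid pair
        elif s < k:
            lo += 1            # need a larger sum
        else:
            hi -= 1            # need a smaller sum
    return False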
I had this question in an interview, and I couldn't answer it.
You have to find the first unique element (integer) in the array.
For example:
3,2,1,4,4,5,6,6,7,3,2,3
The unique elements are 1, 5 and 7, and the first unique element is 1.
The solution required:
O(n) Time Complexity.
O(1) Space Complexity.
I tried suggesting hash maps and bit vectors, but none of them has O(1) space complexity.
Can anyone tell me a solution with O(1) space?
Here's a non-rigorous proof that it isn't possible:
It is well known that duplicate detection cannot be done in better than O(n * log n) time when you use O(1) space. Suppose the current problem were solvable in O(n) time and O(1) memory. If we get the index k of the first non-repeating number as anything other than 0, then we know the element at index k-1 is repeated, and with one more sweep through the array we can find its duplicate, which would make duplicate detection an O(n) exercise.
Again, it is not rigorous, and a worst-case analysis would have to deal with k always being 0. But it helps you think about the problem and convince the interviewer that a solution isn't likely to be possible.
http://en.wikipedia.org/wiki/Element_distinctness_problem says:
Elements that occur more than n/k times in a multiset of size n may be found in time O(n log k). Here k = n since we want elements that appear more than once.
I think that this is impossible. This isn't a proof, but evidence for a conjecture. My reasoning is as follows...
First, you said that there is no bound on value of the elements (that they can be negative, 0, or positive). Second, there is only O(1) space, so we can't store more than a fixed number of values. Hence, this implies that we would have to solve this using only comparisons. Moreover, we can't sort or otherwise swap values in the array because we would lose the original ordering of unique values (and we can't store the original ordering).
Consider an array where all the integers are unique:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
In order to return the correct output 1 on this array, without reordering the array, we would need to compare each element to all the other elements, to ensure that it is unique, and do this in reverse order, so we can check the first unique element last. This would require O(n^2) comparisons with O(1) space.
I'll delete this answer if anyone finds a solution, and I welcome any pointers on making this into a more rigorous proof.
Note: This can't work in the general case. See the reasoning below.
Original idea
Perhaps there is a solution in O(n) time and O(1) extra space.
It is possible to build a heap in O(n) time. See Building a Heap.
So you build the heap backwards, starting at the last element in the array and making that last position the root. While building the heap, keep track of the most recent item that was not a duplicate.
This assumes that, when inserting an item into the heap, you will encounter any identical item that already exists in the heap. I don't know if I can prove that . . .
Assuming the above is true, then when you're done building the heap, you know which item was the first non-duplicated item.
Why it won't work
The algorithm to build a heap in place starts at the midpoint of the array and assumes that all of the nodes beyond that point are leaf nodes. It then works backward (towards item 0), sifting items into the heap. The algorithm doesn't examine the last n/2 items in any particular order, and the order changes as items are sifted into the heap.
As a result, the best we could do (and even then I'm not sure we could do it reliably) is find the first non-duplicated item only if it occurs in the first half of the array.
The OP's original question doesn't mention any limit on the numbers (although it was later added that the numbers can be negative, positive, or zero). Here I assume one extra condition:
The numbers in the array are all smaller than the array length and
non-negative.
Then an O(n) time, O(1) space solution is possible, and it looks like an interview question; the test case the OP gives in the question complies with the above assumption.
Solution:
static int firstUnique(int[] nums) {
    // Treat index v as the bucket for value v.
    // Sentinels: -1 in slot v means "v occurs more than once";
    //            -2 marks a slot whose duplicate copy was discarded.
    for (int i = 0; i < nums.length; i++) {
        while (nums[i] != i && nums[i] >= 0) {
            int v = nums[i];
            if (nums[v] == v || nums[v] == -1) {
                // v is already home (or already marked):
                // this copy is a duplicate.
                nums[v] = -1;
                nums[i] = -2;
            } else {
                // Send v to its bucket; whatever was there comes back here.
                int tmp = nums[v];
                nums[v] = v;
                nums[i] = tmp;
            }
        }
    }
    for (int i = 0; i < nums.length; i++) {
        if (nums[i] == i) {
            return i; // smallest value that occurs exactly once
        }
    }
    return -1; // no unique element
}
The algorithm treats the original array as the buckets of a bucket sort: each value v is moved into its bucket nums[v]; if v occurs more than once, slot v is marked -1 (and the surplus copies are discarded as -2). A second loop then finds the first index with nums[i] == i, i.e., the smallest value that occurs exactly once.
Suppose A = {1, 2, 3, 4} and p = {36, 3, 97, 19}, and we sort A using p as the sort keys. The result is {2, 4, 1, 3}.
It is an example from the book Introduction to Algorithms, which says it can be done in O(n log n).
Can anyone give me some idea of how it can be done? My thought is that you need to keep track of where each element of p ends up: e.g., if p[1] ends up at position 3, then A[1] ends up at position 3. Can merge sort or another O(n log n) sort be used to get this done?
I'm new to algorithms and find them a little intimidating :( Thanks for any help.
Construct an index array:
i = { 0, 1, 2, 3 }
Now, while you are sorting p, make the same changes to the index array i.
When you're done, you'll have:
i = { 1, 3, 0, 2 }
Sorting two arrays takes at most twice as long as sorting one (and if you're only counting comparisons, you don't need any additional comparisons at all, just data swaps in two arrays instead of one), so this doesn't change the Big-O complexity of the overall sort, because O(2n log n) = O(n log n).
Now, you can use those indices to construct the sorted A array in linear time by simply iterating through the sorted index array and looking up the element of A at that index. This takes O( n ) time.
The runtime complexity of your overall algorithm is at worst: O( n + 2n log n ) = O( n log n )
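As a sketch (in Python for brevity), the index-array version might look like this:

def sortByKeys(A, p):
    # Sort the indices 0..n-1 by their key in p: O(n log n).
    order = sorted(range(len(A)), key=lambda i: p[i])
    # Read A in that order: O(n).
    return [A[i] for i in order]

# sortByKeys([1, 2, 3, 4], [36, 3, 97, 19])  ->  [2, 4, 1, 3]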
Of course, you can also skip the index array entirely and simply treat the array A in the same way, sorting it alongside p.
I don't see this as difficult. Since the complexity of a sorting algorithm is usually measured by the number of comparisons required, you just need to update the positions of elements in A according to the elements in p. You won't need any comparisons beyond the ones already needed to sort p, so the complexity is the same.
Every time you move an element, just move it in both arrays, and you are done.
I have a list of objects and I want to iterate through them, for a particular number, in the specific sequence returned by the following function. It repeatedly removes the element at index (hash modulo the current list size) and appends it to the sequence.
def genSeq(hash, n):
    l = list(range(n))
    seq = []
    while l:
        # Choose the element at index hash mod (current size), then remove it.
        ind = hash % len(l)
        seq.append(l[ind])
        del l[ind]
    return seq
E.g., genSeq(53, 5) will return [3, 1, 4, 2, 0].
I am presenting the algorithm in Python for easy understanding; I am supposed to code it in C++. In this form the complexity is O(n^2) for both vector and list (we pay either for the removal or for the access). Can this be made any better?
A skip list would give you O(log n) access and removal. So your total traversal time would be O(n log n).
I'd like to think there is a linear solution, but nothing jumps out at me.
The sequence of chosen indices,
[hash % (n - i) for i in range(n)],
can be seen as a number written in the factorial number system (factoradic).
There is a bijection between permutations and factoradic numbers.
The mapping from factoradic numbers to permutations is described in the top answer under the section "Permuting a list using an index sequence". Unfortunately the algorithm provided there is quadratic, but a commenter points to a data structure that would make the algorithm O(n log n).
You should not delete from your vector; instead, swap the chosen element with the end and decrease a fill pointer. Think of the Fisher-Yates shuffle.
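A sketch of that idea: it runs in O(n), though note that because the swap reorders the remaining elements, it produces a different (but equally valid) hash-driven permutation than genSeq.

def genSeqSwap(hash, n):
    l = list(range(n))
    seq = []
    size = n                   # number of not-yet-chosen elements
    while size > 0:
        ind = hash % size
        seq.append(l[ind])
        l[ind] = l[size - 1]   # fill the gap with the last active element
        size -= 1
    return seq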
I was just asked a question in an interview, and I'm curious what the answer ought to be. The problem was, essentially:
Say you have an unsorted list of n integers. How do you find the k minimum values in this list? That is, if you have a list of [10, 11, 24, 12, 13] and are looking for the 2 minimum values, you'd get [10, 11].
I've got an O(n log k) solution, and that's my best, but I'm curious what other people come up with. I'll refrain from polluting folks' brains by posting my solution, and will edit it in in a little while.
EDIT #1: For example, a function like:
list getMinVals(list &l, int k)
EDIT #2: It looks like this is a selection algorithm, so I'll toss in my solution as well: iterate over the list, using a priority queue to save the k minimum values. The priority queue is set up so that the maximum of the saved values sits at the top; when comparing the top to a new element, if the new element is smaller, the top gets popped and the smaller element pushed. With O(log k) push and pop (and O(1) peek at the top), this gives O(n log k) overall.
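In Python, that priority-queue idea might look like the following sketch (heapq is a min-heap, so values are negated to simulate a max-heap of the k smallest seen so far):

import heapq

def getMinVals(l, k):
    heap = []                          # negated values: -heap[0] is the
                                       # largest of the k smallest so far
    for x in l:
        if len(heap) < k:
            heapq.heappush(heap, -x)   # O(log k)
        elif x < -heap[0]:
            heapq.heapreplace(heap, -x)
    return [-v for v in heap]          # the k minimum values, unordered

# getMinVals([10, 11, 24, 12, 13], 2)  ->  [11, 10] (in some order)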
This is the quickSelect algorithm. It's basically a quicksort where you only recurse into one part of the array. Here's a simple implementation in Python, written for brevity and readability rather than efficiency.
def quickSelect(data, nLeast):
    # Three-way partition around a pivot, then recurse only into the
    # side that still matters for the nLeast smallest elements.
    pivot = data[-1]
    less = [x for x in data if x < pivot]
    equal = [x for x in data if x == pivot]
    greater = [x for x in data if x > pivot]
    if nLeast <= len(less):
        return quickSelect(less, nLeast)
    elif nLeast <= len(less) + len(equal):
        return less + equal[:nLeast - len(less)]
    else:
        return less + equal + quickSelect(greater, nLeast - len(less) - len(equal))
This will run in O(N) on average, since at each iteration, you are expected to reduce the size of data by a multiplicative constant. The result will not be sorted. The worst case is O(N^2), but this is dealt with in essentially the same way as a quick sort, using things like median-of-3.
This is usually found in algorithm books under selection algorithms or "linear selection". Here's the specific section on the min/max k values in a list. It's O(n log k).