Find number(s) repeated k times in an unsorted array - algorithm

We have an array that contains integers, and I would like to find the numbers that are repeated k times in this array. The array is not sorted, and the numbers are not bounded.
Example,
A = (20, 6, 99, 3, 6, 2, 1, 11, 41, 31, 99, 6, 7, 8, 99, 10, 99, 6)
Find the numbers repeated more than 3 times.
Answer: 6, 99
Is there a possible answer using bitwise operations (XOR) or some combination? Efficiency in running time (big-O) is required, as well as in space usage.
This is not homework; it's simply an interesting problem.

Patrick87 probably has the most straightforward answer in the comments, so I'll give another approach.
As you iterate over your list, insert each number into a map as a (key, count) entry, where the key is the number itself and the count is initialized to 1 upon insertion. If the key is already present in the map, just increment its counter.
Insertions into your map will take O(log k) time, where k is the current size of the map (assuming a balanced-tree map). Your total map creation time is therefore O(n log n).
After the map is created, you can iterate over it, and output any number whose count == target count.
Time complexity = O(n) + O(n log n) + O(n) = O(n log n)
Space complexity = O(n) + O(n) = O(n)
If you're looking for better space complexity, you're not going to get it. Even if the numbers are read from a stream, you need to track the individual distinct values, which is O(n) in the worst case.
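A sketch of this counting approach in Python. Note that collections.Counter is a hash map, so lookups are O(1) expected rather than the O(log k) of the tree map described above; the function name is illustrative.

```python
from collections import Counter

def repeated_at_least(nums, k):
    """Return, in sorted order, the values that occur at least k times in nums."""
    counts = Counter(nums)                         # one pass: value -> occurrence count
    return sorted(v for v, c in counts.items() if c >= k)

A = [20, 6, 99, 3, 6, 2, 1, 11, 41, 31, 99, 6, 7, 8, 99, 10, 99, 6]
print(repeated_at_least(A, 4))                     # 6 and 99 each appear four times
```

This matches the stated answer for "repeated more than 3 times": both 6 and 99 occur four times in the example array.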


Time complexity question - I think it is impossible with the current data

The question is:
Let A be an array of integers in the range {1, ..., n}. Give an O(n)
algorithm that gets rid of duplicate numbers, and sorts the elements
of A in decreasing order of frequency, starting with the element that
appears the most. For example, if A = [5, 1, 3, 1, 7, 7, 1, 3, 1, 3],
then the output should be [1, 3, 7, 5].
The thing is, if we want to know how many times each number from 1 to n appears, we need to scan all of A, whose length m (m = A.length) is unknown to us.
With bucket sort, when m = O(n), it's possible.
I think there is a problem in the question, because m could be ω(n), in which case even reading A takes more than O(n) time.
So basically I think that without classifying what m is, it's impossible to achieve O(n).
If someone knows a way to solve this problem I would be glad.
Thanks
Sorting is something else. For example, with radix sort you can get O(kn) time, which is close to linear if k is a constant.
The main concern should be the counting step: if you can somehow manage to run the overall counting in O(n) time, then you end up with O(radix sort) + O(n) ~ O(kn + n) ~ O(kn) in the end.
Consider an approach like this: take the elements as keys of a hash table and their occurrence counts as the values. In Python, with a dict as the hash table:

counts = {}
for element in array:
    if element in counts:
        # element is already in the hash table: update its count
        counts[element] += 1
    else:
        # element hasn't been seen yet, so put it in
        counts[element] = 1  # count is initialized as 1

This runs in O(n) expected time: assuming the elements do not generate many collisions during hashing, each membership test and update on the hash table runs in O(1).
Finally, you can choose any sorting method to sort the hash table entries by count, and whatever time complexity that sort has will dominate the time complexity of the entire process.
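For example, given a counts table like the one built above (a Python dict standing in for the hash table), the final sort step could be:

```python
# counts as built from A = [5, 1, 3, 1, 7, 7, 1, 3, 1, 3]
counts = {5: 1, 1: 4, 3: 3, 7: 2}

# order the distinct values by decreasing occurrence count
result = [v for v, _ in sorted(counts.items(), key=lambda e: -e[1])]
print(result)   # [1, 3, 7, 5]
```

A comparison sort over the distinct values costs O(d log d) for d distinct values, which is what dominates when d log d exceeds the O(m) counting pass.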
I agree with you that the problem as stated isn’t possible - O(n) time is not sufficient to even read all the elements of the array, which means you can’t necessarily find all the distinct elements in the given time bound.
However, if you assume the array has size O(n), then as you’ve noted this can be done with a counting sort.
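Under that assumption (values in {1, ..., n} and array length m = O(n)), a counting-based sketch running in O(n + m) might look like this. The function name is illustrative, and ties between equal frequencies are broken by value here, which the problem statement leaves unspecified.

```python
def frequency_order(a, n):
    """Values of a lie in 1..n; return the distinct values by decreasing frequency."""
    count = [0] * (n + 1)
    for x in a:                        # O(m): count occurrences of each value
        count[x] += 1
    m = len(a)
    # bucket the distinct values by their frequency: O(n + m)
    buckets = [[] for _ in range(m + 1)]
    for v in range(1, n + 1):
        if count[v]:
            buckets[count[v]].append(v)
    out = []
    for f in range(m, 0, -1):          # emit highest-frequency values first
        out.extend(buckets[f])
    return out

print(frequency_order([5, 1, 3, 1, 7, 7, 1, 3, 1, 3], 7))   # [1, 3, 7, 5]
```

This reproduces the expected output for the question's example and avoids any comparison sort, so the whole procedure stays linear in n + m.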
You may want to contact whoever gave you this problem to point out that detail. :-)
Hope this helps!

Time complexity of sort with a limit

Let's take three different sorts: Selection, Bubble, and Quick. Here is how they perform with an array of size n:
Selection: O(n²) in the average and worst cases
Bubble: O(n²) in the average and worst cases
Quicksort: O(n log n) in the average case, O(n²) in the worst case
With a limit of L applied to them, what would be the time complexity of the sort? My guess was the answer would be: O(nL) and O(n log L) -- is that correct? Why or why not? Additionally, is there a particular type of array sort that performs better than others when doing a sort-with-limit?
For example:
a = [1, 4, 2, 7, 4, 22, 49, 0, 2]
sort(a, limit=4)
==> [0, 1, 2, 2]
Selection sort:
The time complexity for selection sort is going to be O(nL). In selection sort, we append the next smallest element in the unsorted part of the list to the sorted part of the list. Since we only need L sorted elements, we only need to iterate L times. In each iteration, we have to iterate through the unsorted part of the list to find the smallest element, so we need a total of n*L iterations. This leads to O(nL).
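A sketch of the limited selection sort described above (the function name is illustrative):

```python
def selection_sort_limit(a, limit):
    """Return the `limit` smallest elements in sorted order, in O(n * limit) time."""
    a = list(a)                        # work on a copy; don't mutate the caller's list
    for i in range(min(limit, len(a))):
        # scan the unsorted suffix a[i:] for its smallest element
        smallest = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[smallest] = a[smallest], a[i]
    return a[:limit]

print(selection_sort_limit([1, 4, 2, 7, 4, 22, 49, 0, 2], 4))   # [0, 1, 2, 2]
```

Each of the L outer iterations scans at most n elements, giving the O(nL) bound.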
Bubble sort:
The time complexity for bubble sort is going to be O(nL). The regular bubble sort moves the current largest element to the end of the list after each pass. We can modify bubble sort so that in each pass we move the smallest element to the front of the list: instead of sweeping from the start of the array, we sweep from the end. We only need L passes to produce the L smallest elements in sorted order, so the time complexity becomes O(nL).
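A sketch of this front-bubbling variant (the function name is illustrative):

```python
def bubble_sort_limit(a, limit):
    """Bubble the smallest remaining element to the front, `limit` times: O(n * limit)."""
    a = list(a)                        # work on a copy
    for i in range(min(limit, len(a))):
        # sweep from the end toward position i, pushing the minimum forward
        for j in range(len(a) - 1, i, -1):
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]
    return a[:limit]

print(bubble_sort_limit([1, 4, 2, 7, 4, 22, 49, 0, 2], 4))   # [0, 1, 2, 2]
```

After pass i, positions 0..i hold the i+1 smallest elements in order, so L passes suffice.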
Quick sort:
Quicksort divides the array into two parts and sorts each part recursively. The time complexity will still be O(n log n) because there is no "checkpoint" in quicksort: at no intermediate stage do we have a sorted prefix containing the L smallest elements.
Merge sort:
The idea here is similar to that of quicksort: we have to reach the base case of the recursion and merge the results to get a fully sorted list, so we still have O(n log n) running time.

Range median query in an array in O(1) time using preprocessing

In class we learned about the RMQ-to-LCA and LCA-to-RMQ reductions and how to support the range minimum query operation in O(1) time. As an exercise, my professor has assigned me to support the operation: range median query with O(n) space, O(n) preprocessing time, and O(1) query time.
Let's say I have an array A that contains 8, 7, 9, 2, 6, 4, 5, 3. Given median(i, j), we should get the median of the ith through jth elements (inclusive) of the sorted array. A sorted is 2, 3, 4, 5, 6, 7, 8, 9. For example, median(2, 6) = 6, because the median of 4, 5, 6, 7, 8 (the elements at 0-based positions 2 through 6 of the sorted array) is 6.
I found other solutions that suggest using segment trees but the complexity is not O(1) in those cases. Is it possible to solve this problem in a similar manner that we solve the RMQ to LCA to RMQ problem?
One option would be to use a non-comparison sorting algorithm. Examples I know of are radix sort (O(wn), where w is the word size) and counting sort (O(n + k), where k is the maximum key value). Both of these are linear with respect to the input size.
Then, you could just look up the correct position within the sorted list, which is an O(1) operation. Both sorting methods are within your space requirements: radix sort is O(n + k) and counting sort is O(k).
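A sketch of that idea, using Python's comparison-based sorted() for brevity where the answer proposes counting or radix sort; indices are 0-based, and odd-length ranges are assumed so the median is a single element.

```python
def preprocess(a):
    # one-time O(n log n) here; counting/radix sort would make this linear
    return sorted(a)

def median(sorted_a, i, j):
    """Median of the ith through jth elements of the sorted array, inclusive."""
    return sorted_a[(i + j) // 2]      # O(1) per query

s = preprocess([8, 7, 9, 2, 6, 4, 5, 3])   # [2, 3, 4, 5, 6, 7, 8, 9]
print(median(s, 2, 6))                      # median of 4, 5, 6, 7, 8 is 6
```

The sorted copy is the entire O(n)-space data structure; each query is just an index computation.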
This isn't possible with comparisons. If it were, you could comparison sort N elements in O(N) time by preprocessing the input and computing median(i, i) for each i from 0 to N-1.
You probably misunderstood the task you were assigned - you were probably supposed to compute medians for subarrays of the original array, not of a sorted version of the array.

storing k largest ints from set S in O(log k) time O(k) space

I have an exam with a question as follows:
Let S be a dynamic set of integers. At the beginning, S is empty. Then, new integers
are added to it one by one, but never deleted. Let k be a fixed integer.
Describe an algorithm to maintain the k largest integers in S. Your algorithm must use O(k) space at all times, no matter how large |S| is (note that |S| increases continuously, but your space cannot). Furthermore, it must process every integer insertion in O(log k) time.
For example, suppose that k = 3, and that the sequence of integers inserted is 83, 21, 66, 5, 24,
76, 92, 32, 43... Your algorithm must be keeping {83, 66, 24} after the insertion of 24, {83, 66, 76}
after the insertion of 76, and {83, 76, 92} after the insertion of 43.
Without S being sorted or structured, I'm unsure how I would be able to complete this.
You can solve it by using a min-heap of size k. A min-heap is a complete binary tree in which the value in each internal node is less than or equal to the values in the children of that node. All you have to do is restrict the size of the heap to k: every time a new element arrives, compare it with the root, which is the smallest of the k elements kept so far. If the new element is greater, replace the root with it and restore the heap property; otherwise discard it. As you asked, this requires O(k) space, and each insertion takes O(log k) time. Hope it helps you.
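A minimal sketch of this with Python's heapq module (a min-heap; the function name is illustrative):

```python
import heapq

def add(heap, x, k):
    """Maintain the k largest values seen so far in a min-heap of size <= k."""
    if len(heap) < k:
        heapq.heappush(heap, x)          # O(log k)
    elif x > heap[0]:                    # heap[0] is the smallest of the k kept
        heapq.heapreplace(heap, x)       # pop the root and push x: O(log k)

heap, k = [], 3
for x in [83, 21, 66, 5, 24, 76, 92, 32, 43]:
    add(heap, x, k)
print(sorted(heap))                      # [76, 83, 92]
```

Running the question's example sequence leaves exactly {83, 76, 92} in the heap, matching the expected final state.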

Insert numbers from a non-sorted list of numbers into a sorted list of numbers

I have a list A whose elements are sorted from smallest to biggest. For example:
A = 1, 5, 9, 11, 14, 20, 46, 99
I want to insert the elements of a non-sorted list of numbers inside A while keeping the A sorted. For example:
B = 0, 77, 88, 10, 4
is going to be inserted in A as follows:
A = 0, 1, 4, 5, 9, 10, 11, 14, 20, 46, 77, 88, 99
What is the best possible solution to this problem?
"Best possible" is too subjective; it depends on the definition of best. From a big-O point of view, having an array A of length n1 and an array B of length n2, you can achieve it in O(max(n2 log n2, n1 + n2)).
This can be achieved by sorting the array B in O(n2 log n2) and then merging the two sorted arrays in O(n1 + n2).
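A sketch of the sort-then-merge approach (the function name is illustrative; heapq.merge could replace the hand-written two-pointer merge):

```python
def sorted_insert_all(a, b):
    """a is sorted; return a new sorted list with b's elements merged in."""
    b = sorted(b)                        # O(n2 log n2)
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):     # O(n1 + n2) two-pointer merge
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])                    # append whichever tail remains
    out.extend(b[j:])
    return out

A = [1, 5, 9, 11, 14, 20, 46, 99]
B = [0, 77, 88, 10, 4]
print(sorted_insert_all(A, B))   # [0, 1, 4, 5, 9, 10, 11, 14, 20, 46, 77, 88, 99]
```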
Best solution depends on how you define the best.
Even for time complexity, it still depends on your input size of A and B. Assume input size A is m and input size of B is n.
As Salvador mentioned, sorting B in O(n log n) and merging with A in O(m + n) is a good solution. Notice that you can sort B in O(n) if you adopt a non-comparison-based sorting algorithm such as counting sort or radix sort.
I provide another solution here: loop over every element in B and do a binary search in A to find the insertion position, then insert. The time complexity is O(n log(m + n)).
Edit 1: As @moreON points out, the binary-search-then-insert approach assumes your list implementation supports at least amortized O(1) insertion and random access. Also, the time complexity should be O(n log(m + n)) rather than O(n log m), as the binary search takes more time as more elements are added.
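The binary-search approach can be sketched with Python's standard bisect module. Note that a Python list does not meet the O(1)-insertion assumption above (insort shifts elements, costing O(m + n) per insertion), so this illustrates the idea rather than the claimed bound:

```python
import bisect

A = [1, 5, 9, 11, 14, 20, 46, 99]
B = [0, 77, 88, 10, 4]
for x in B:
    bisect.insort(A, x)      # binary search for the position, then insert in place
print(A)                     # [0, 1, 4, 5, 9, 10, 11, 14, 20, 46, 77, 88, 99]
```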
