I am a fresher preparing for interviews. In a recent interview I was asked a question for which I couldn't find a suitable answer.
I was given about 100 files, each containing a large number of comma-separated integers. I had to find the top 10 integers across all the files. I tried to solve it using a heap, but I got confused about the time complexity of the process. Any help will be appreciated, thanks.
I think you are on the right track with using a heap data structure.
You could process the files in parallel and for each file you could maintain a min-heap of size 10.
As you iterate through a file, insert values into the min-heap until it is full (size 10); then, for each value in positions 11 through n:
    if current_value > min_heap.peek():      # the smallest of the 10 kept values
        min_heap.extract_min()               # drop it
        min_heap.insert(current_value)       # keep the larger value instead
You have to iterate through all n values, and the worst-case scenario is a file sorted in ascending order: then for every value in positions 11 through n you extract the minimum and insert the new value. Because the heap never holds more than 10 elements, each heap operation costs O(log 10), a constant, so the running time per file is O(n * log 10), i.e. O(n).
At this point you have m (# of files) min-heaps, each of size 10. Here you can use a final min-heap to store the ten largest numbers contained in the m min-heaps. This computation will be O(m) because all the heaps at this point are of size at most 10, a constant.
If you process the files in parallel, the overall running time is O(n * log 10 + m) = O(n + m); m is likely much smaller than n, so amongst friends we could say O(n).
Even if you don't do the first step in parallel, it is O(m * n * log 10 + m) = O(m * n), and once again the per-file work dominates the final merge.
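As a rough illustration of this approach (not part of the original answer), here is a minimal Python sketch; it assumes each file is a single comma-separated run of integers, and the file names are placeholders:

    import heapq

    def top10_of_file(path):
        # Keep a min-heap of the 10 largest integers seen in this file.
        heap = []  # heap[0] is always the smallest of the kept values
        with open(path) as f:
            for token in f.read().split(","):
                value = int(token)
                if len(heap) < 10:
                    heapq.heappush(heap, value)
                elif value > heap[0]:               # bigger than the current 10th largest
                    heapq.heapreplace(heap, value)  # pop the min, push the new value
        return heap

    def top10_overall(paths):
        # Combine the per-file heaps (10 * m candidates) and keep the 10 largest.
        candidates = [v for p in paths for v in top10_of_file(p)]
        return heapq.nlargest(10, candidates)

    # Hypothetical usage:
    # print(top10_overall(["file_%d.txt" % i for i in range(100)]))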
I’m having trouble with this question.
Let X be a set of n keys
Let S be a set of m subsets of X
Find a way to find the maximum key of every subset in S with O(n log n) comparisons.
I know I can find a maximum with quicksort in O(n) and with binary sort in O(log n), but I'm unsure how to proceed further. Any help would be appreciated!
If a subset is defined by the enumeration of its elements, the largest element is obtained in time proportional to the number of elements and this is optimal.
For m subsets, the total work is the total number of elements, Σ n_i (the sum of the subset sizes), which is still optimal.
If a subset is specified by a binary mask of length n, you can't avoid O(nm) operations.
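A small Python sketch of the enumeration case (the list-of-lists representation is my assumption, not part of the question): one linear scan per subset, so the total number of comparisons is proportional to the sum of the subset sizes.

    def subset_maxima(subsets):
        # One pass over each enumerated subset; max() does len(subset) - 1 comparisons.
        return [max(subset) for subset in subsets]

    # Example: three subsets of the keys {1, ..., 9}
    print(subset_maxima([[3, 9, 1], [7, 2], [5, 4, 8, 6]]))  # [9, 7, 8]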
Let X be a set of n keys
Let S be a set of m subsets of X
Find the maximum of every subset in S with O(n log n) comparisons.
Solution:
Construct a max-heap for each of the m subsets of X.
The maximum of each subset is then the root of its max-heap, so no full heap-sort is strictly needed; if you do run heap-sort on each of the m max-heaps, the maximum simply comes out first.
The number of comparisons in a heap-sort is O(n log n).
So, to max-heapify the m sets and obtain the maximum of each subset (the root of each max-heap), the total number of comparisons is at most O(m * n log n); if m is a constant, we can treat it as O(n log n).
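For what it's worth, here is a small Python sketch of the heap-based idea (Python's heapq is a min-heap, so values are negated to simulate a max-heap; the input format is assumed):

    import heapq

    def maxima_via_heaps(subsets):
        # Build a max-heap per subset by negating values, then read each root.
        maxima = []
        for subset in subsets:
            heap = [-x for x in subset]
            heapq.heapify(heap)        # O(len(subset)) comparisons
            maxima.append(-heap[0])    # root of the max-heap = maximum of the subset
        return maxima

    print(maxima_via_heaps([[3, 9, 1], [7, 2], [5, 4, 8, 6]]))  # [9, 7, 8]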
Researching big O notation, I understand the concept of O(log n) as a binary search and O(n log n) as a quick sort.
Can anyone put into layman's terms what the main difference in runtime is between these two, and why that is the case?
They seem, intuitively, to be similarly related.
Basically: a factor of N.
A binary search only touches a small number of elements. If there are a billion elements, the binary search only touches about 30 of them.
A quicksort touches every single element, a small number of times. If there are a billion elements, the quicksort touches all of them about 30 times each: about 30 billion touches in total.
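A quick back-of-the-envelope check of those numbers in Python (the counts are approximations, of course):

    import math

    n = 1_000_000_000                       # a billion elements
    binary_search_touches = math.log2(n)    # roughly 30 elements inspected
    quicksort_touches = n * math.log2(n)    # roughly 30 billion element touches

    print(round(binary_search_touches))     # ~30
    print("%.1e" % quicksort_touches)       # ~3.0e+10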
In a plot of these functions, log(n) looks flat (not literally, but in comparison to the other functions), while n*log(n) has already crossed 600 for a value of n = 100. That's how different they are.
In simple terms they can look similar among sorting algorithms, but quicksort's O(n log n) has a flaw in some situations: in most cases it runs in O(n log n), but in special (worst) cases it degrades to O(n^2). So quicksort is very good for small amounts of data, but for millions or billions of elements a sort with a guaranteed O(n log n) bound, such as merge sort, is often the better choice.
I have a problem: I must add a lot of different values and, at the end, get only the k-th largest. How can I implement that efficiently, and what algorithm should I use?
Algorithm:
Create a binary min-heap, and add each one of the first K values into the heap.
For each one of the remaining N-K values, if it is larger than the root of the heap (the smallest of the K values kept so far):
Replace the root with it and sift it down in order to restore the heap.
Extract all the (K) values from the heap into a list.
Complexity:
Building the initial heap of the first K values (heapify): O(K)
Processing the remaining N-K values: O((N-K)×log(K))
Extracting all K values at the end: O(K×log(K))
If N-K ≥ K, then the overall complexity is O((N-K)×log(K)).
If N-K < K, then the overall complexity is O(K×log(K)).
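A minimal Python sketch of the algorithm above, assuming the values arrive as an iterable (heapq provides the min-heap):

    import heapq

    def kth_largest(values, k):
        # Min-heap of the k largest values seen so far; its root is the k-th largest.
        heap = []
        for v in values:
            if len(heap) < k:
                heapq.heappush(heap, v)     # fill the heap with the first k values
            elif v > heap[0]:
                heapq.heapreplace(heap, v)  # replace the smallest kept value, O(log k)
        return heap[0]                      # k-th largest overall

    print(kth_largest([5, 1, 9, 3, 7, 8, 2], 3))  # 7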
(Based on comments that you do not want to store all the numbers you have seen...)
Keep a running (sorted) list of the k largest values you have seen so far. As you get each new number, check whether it is larger than the least element in the list. If it is, remove the least element and insert (sorted) the new number into the list of the k largest. Your initial list of k (when you've seen no numbers) would consist of k entries of negative infinity.
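A short sketch of this running-sorted-list idea (the bisect-based insertion and the names are my choices, not the answer's):

    import bisect
    import math

    def top_k_sorted_list(values, k):
        largest = [-math.inf] * k           # start with k entries of negative infinity
        for v in values:
            if v > largest[0]:              # larger than the least kept value
                largest.pop(0)              # remove the least element
                bisect.insort(largest, v)   # insert, keeping the list sorted
        return largest

    print(top_k_sorted_list([5, 1, 9, 3, 7, 8, 2], 3))  # [7, 8, 9]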
First build a max-heap from all n elements, which takes O(n) time.
Then extract the maximum k-1 times in O(k log n) time; the k-th largest element is then at the root of the heap.
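Roughly, in Python (heapq is a min-heap, so the values are negated to act as a max-heap):

    import heapq

    def kth_largest_by_extraction(values, k):
        heap = [-v for v in values]   # negate so the min-heap behaves as a max-heap
        heapq.heapify(heap)           # O(n)
        for _ in range(k - 1):        # k-1 extractions, O(k log n)
            heapq.heappop(heap)
        return -heap[0]               # the k-th largest is now at the root

    print(kth_largest_by_extraction([5, 1, 9, 3, 7, 8, 2], 3))  # 7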
What is the smallest value of n such that an algorithm whose running time is 100n^2 runs faster than an algorithm whose running time is 2^n on the same machine?
The Scope
Although I am interested in the answer, I am more interested in how to find the answer step by step (so that I can repeat the process to compare any two given algorithms if at all possible).
From the MIT Press Algorithms book
You want the values of n where 100 × n^2 is less than 2 × n.
That is the solution of 100 × n^2 - 2 × n < 0, which happens to be 0 < n < 0.02.
EDIT:
The original question talked about 2 × n, not 2^n (see comments).
For 2^n, head to https://math.stackexchange.com/questions/182156/multiplying-exponents-solving-for-n
Answer is 15
The first thing you have to know, is what running time means. If we're talking about algorithms theoretically, the running time of an algorithm is the number of steps (or the amount of time) it takes to finish depending on the size of the input (where the size of the input is for example the number of bits, but also other measures are sometimes considered). In this sense, the algorithm which requires the least number of steps is the fastest.
So in your two formulas, n is the size of the input, and 100 * n^2 and 2^n are the number of steps the two algorithms run if given an input of size n.
At first sight, the 2^n algorithm looks much faster than the 100 * n^2 algorithm. For example, for n = 4, 100*4^2 = 1600 and 2^4 = 16.
However, 2^n is an exponential function, whereas 100 * n^2 is a polynomial function. That means that when n is large enough, it will be the case that 2^n > 100 * n^2. So you have to solve the inequality 100 * n^2 < 2^n. This already holds for a fairly small n, so you can simply start evaluating the two functions at n = 5 and work upwards; you will reach the answer in a few minutes.
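For example, a few lines of Python do that evaluation (just a sketch of the brute-force check):

    def smallest_crossover():
        # Smallest n for which 100 * n^2 runs faster, i.e. 100*n**2 < 2**n.
        n = 1
        while 100 * n**2 >= 2**n:
            n += 1
        return n

    print(smallest_crossover())  # 15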
I'm really struggling with this homework question. My professor does a terrible job of explaining anything. Help?
There is a trade-off between sorting a list and then using binary search versus just using sequential search on an unsorted list. The choice depends on how many times the list will be searched. Assume that sequential search requires n comparisons in worst case, sorting requires n*log n comparisons, and binary search requires log n comparisons
in worst case (where log is log base 2, as we have discussed). Given an unsorted list of 1024 elements (i.e. log 1024 = 10), how many searches s would be required for sorting to be worthwhile? Suppose we consider the average case for sequential search requires n/2 comparisons. Now what is the break-even point for s?
Hint: Write an expression for the number of comparisons required for s searches by each method; then set them equal and solve for s.
You are comparing the time that is needed to perform an initial sort (cost: n*log(n)) and subsequent binary search (cost: log(n)). So, if you want to search s times, you will pay an initial n*log(n) to sort the list and log(n) for each (binary) search. That is to say:
c1 = (n*log(n)) + (s*log(n)) = (n+s)*log(n)
Instead, if you perform linear search, there is no "initial cost", but each search will cost you n, so for s searches:
c2 = s*n
Obviously, for s and n small enough, c2 is smaller because there is no such initial cost, but it grows faster than c1. At a certain point c1 and c2 will cross. That is to say, c1 = c2.
s * n = (n + s) * log(n)  -->  s * (n - log(n)) = n * log(n)  -->  s = n * log(n) / (n - log(n))
Now you have to discuss the equation above; plotting s as a function of n makes the break-even point clear.
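For the concrete numbers in the question (n = 1024, log n = 10), a tiny Python check of both the worst-case and the average-case break-even points:

    import math

    n = 1024
    log_n = math.log2(n)                     # 10

    # Worst case: s*n = (n + s)*log(n)  =>  s = n*log(n) / (n - log(n))
    s_worst = n * log_n / (n - log_n)

    # Average case: s*(n/2) = (n + s)*log(n)  =>  s = n*log(n) / (n/2 - log(n))
    s_avg = n * log_n / (n / 2 - log_n)

    print(s_worst)  # ~10.1, so sorting pays off from about 11 searches
    print(s_avg)    # ~20.4, so about 21 searches in the average case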
As a hint: the work done to sort n and then do k binary searches is given by
n log n + k log n
and the work required to do k sequential searches is
n * k
If n = 1,000, for what value of k will the second quantity be smaller than the first?
Hope this helps!