I'm really struggling with this homework question. My professor does a terrible job of explaining anything. Help?
There is a trade-off between sorting a list and then using binary search versus just using sequential search on an unsorted list. The choice depends on how many times the list will be searched. Assume that sequential search requires n comparisons in the worst case, sorting requires n*log n comparisons, and binary search requires log n comparisons
in the worst case (where log is log base 2, as we have discussed). Given an unsorted list of 1024 elements (i.e. log 1024 = 10), how many searches s would be required for sorting to be worthwhile? Suppose we consider that the average case for sequential search requires n/2 comparisons. Now what is the break-even point for s?
Hint: Write an expression for the number of comparisons required for s searches by each method; then set them equal and solve for s.
You are comparing the time that is needed to perform an initial sort (cost: n*log(n)) and subsequent binary search (cost: log(n)). So, if you want to search s times, you will pay an initial n*log(n) to sort the list and log(n) for each (binary) search. That is to say:
c1 = (n*log(n)) + (s*log(n)) = (n+s)*log(n)
Instead, if you perform linear search, there is no "initial cost", but each search will cost you n, so for s searches:
c2 = s*n
Obviously, for s and n small enough, c2 is smaller because there is no such initial cost, but it grows faster than c1. At a certain point c1 and c2 will cross. That is to say, c1 = c2.
s * n = (n + s) * log(n)  -->  s * (n - log(n)) = n * log(n)  -->  s = (n * log(n)) / (n - log(n))
Well, you now have to discuss the equation above; plotting s against n shows where the break-even point lies.
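As a quick numeric check (a sketch I am adding, using the figures from the question: n = 1024, so log n = 10), you can plug the numbers into the break-even formula directly:

    import math

    n = 1024
    log_n = math.log2(n)                      # 10 for n = 1024

    # Worst-case sequential search: s * n comparisons.
    # Sort once plus s binary searches: (n + s) * log n comparisons.
    s_worst = n * log_n / (n - log_n)
    print(s_worst)                            # ~10.1, so sorting pays off from the 11th search on

    # Average-case sequential search: s * n/2 comparisons.
    # Break-even: s * n/2 = (n + s) * log n  =>  s = n * log n / (n/2 - log n)
    s_avg = n * log_n / (n / 2 - log_n)
    print(s_avg)                              # ~20.4, so about 21 searches are needed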
As a hint: the work done to sort n and then do k binary searches is given by
n log n + k log n
and the work required to do k sequential searches is
n * k
If n = 1,000, for what value of k will the second quantity be smaller than the first?
Hope this helps!
I’m having trouble with this question.
Let X be a set of n keys
Let S be a set of m subsets of X
Find a way to find the maximum key of every subset in S with O(n log n) comparisons.
I know I can find a maximum with quick sort in O(n) and with binary sort in O(log n), but I'm unsure how to proceed further. Any help would be appreciated!
If a subset is defined by the enumeration of its elements, the largest element is obtained in time proportional to the number of elements and this is optimal.
For m subsets, the total work is the total number of elements, Σ nᵢ (the sum of the subset sizes), which is still optimal.
If a subset is specified by a binary mask of length n, you can't avoid O(nm) operations.
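To make the first case concrete, here is a minimal sketch (my own illustration, assuming each subset is given as a plain list of its elements) of the scan-based approach:

    def max_of_each_subset(subsets):
        # One linear scan per subset; total comparisons are proportional
        # to the sum of the subset sizes, as discussed above.
        return [max(subset) for subset in subsets]

    S = [[3, 9, 1], [42, 7], [5, 5, 5, 8]]
    print(max_of_each_subset(S))   # [9, 42, 8]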
Let X be a set of n keys
Let S be a set of m subsets of X
Find the maximum of every subset in S with O(n log n) comparisons.
Solution:
Construct a Max-Heap for each of the m subsets of X.
Use Heap-Sort on each of the m Max-Heaps to find the Maximum of each of the m subsets of X.
The number of comparisons in a Heap-Sort is O(n log n).
So, running Heap-Sort on each of the m sets (or simply building each Max-Heap and reading off its root) takes at most O(m * n log n) comparisons in total; if m is a constant, this simplifies to O(n log n).
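Here is a rough sketch of that idea (my own illustration, not part of the original answer), using Python's heapq with negated keys to emulate a max-heap; note that once the heap is built, its root already holds the maximum, so a full heap-sort is not strictly required:

    import heapq

    def max_via_heaps(subsets):
        maxima = []
        for subset in subsets:
            heap = [-x for x in subset]    # negate keys: the min-heap acts as a max-heap
            heapq.heapify(heap)            # heapify is O(len(subset))
            maxima.append(-heap[0])        # the root holds the maximum
        return maxima

    print(max_via_heaps([[3, 9, 1], [42, 7]]))   # [9, 42]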
I have always had this question in my head, and have never been able to connect these two concepts so I am looking for some help in understanding Logarithms in Computer Science with respect to Big-O notation and algorithmic time complexity. I understand logarithms as a math concept as being able to answer the question, "what number do I need to raise this base to exponentially to get X?". For example, log2(16) tells us that we need to raise 2 to the 4th power to get 16. I also have a memorization-level understanding that O(log n) algorithms are faster than O(n) and other slower algorithms such as those that are exponential and that an example of an O(log n) algorithm is searching a balanced binary search tree.
My question is a little hard to state exactly, but I think it boils down to why is searching a balanced BST logarithmic and what makes it logarithmic and how do I relate mathematical logarithms with the CS use of the term? And a follow-up question would be what is the difference between O(n log n) and O(log n)?
I know that is not the clearest question in the world, but if someone could help me connect these two concepts it would clear up a lot of confusion for me and take me past the point of just memorization (which I generally hate).
When you are calculating Big O notation, you are calculating the complexity of an algorithm as the problem size grows.
For example, when performing a linear search of a list, the worst possible case is that the element is either in the last index, or not in the list at all, meaning your search will perform N steps, with N being the number of elements in the list. O(N).
An algorithm that will always take the same amount of steps to complete regardless of problem size is O(1).
Logarithms come into play when you are cutting the problem size as you move through an algorithm. For a BST, you start in the middle of a list. If the element to search for is smaller, you only focus on the first half of the list. If it is larger, you only focus on the second half. After only one step, you just cut your problem size in half. You continue cutting the list in half until you either find the element or can not proceed. (Note that a binary search assumes the list is in order)
Let's consider we are looking for 0 in the list below (A BST is represented as an ordered list):
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
We first start in the middle: 7
0 is less than 7 so we look in the first half of the list: [0,1,2,3,4,5,6]
We look in the middle of this list: 3
0 is less than 3 and our working list is now: [0,1,2]
So we look at 1. 0 is less than 1, so our list is now [0].
Given we have a working list of just 1 element, we are at the worst case. We either found the element, or it does not exist in the list. We were able to determine this in just four steps, looking at 7,3,1, and 0.
The problem size is 16 (number of elements in the list), which we represent as N.
In the worst case, we perform 4 comparisons (2^4 = 16, i.e. log base 2 of 16 is 4).
If we took a look at a problem size of 32, we would perform only 5 comparisons (2^5 = 32, i.e. log base 2 of 32 is 5).
Therefore, the Big O for a BST search is O(log N) (note that we use base 2 for logarithms in CS).
For O(N log N), the cost is the problem size times its own logarithm. Merge sort is a classic example, and quick sort is O(N log N) on average; insertion sort, by contrast, is O(N^2) in the worst case.
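To make the halving concrete, here is a small sketch (my own addition) that counts how many midpoints are inspected while binary-searching the 16-element list from the walkthrough above:

    def binary_search_steps(sorted_list, target):
        # Returns (found, steps), where steps is the number of midpoints inspected.
        lo, hi = 0, len(sorted_list) - 1
        steps = 0
        while lo <= hi:
            mid = (lo + hi) // 2
            steps += 1
            if sorted_list[mid] == target:
                return True, steps
            elif target < sorted_list[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        return False, steps

    print(binary_search_steps(list(range(16)), 0))   # (True, 4): inspects 7, 3, 1, 0
    print(binary_search_steps(list(range(32)), 0))   # (True, 5) for a problem size of 32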
In computer science, big O notation indicates how fast the number of operations of an algorithm grows with a given parameter n of the problem. In a balanced binary search tree, n can be the number of nodes in the tree. As you search through the tree, the algorithm takes a decision at each depth level. Since the number of nodes doubles at each level, a tree of depth d holds n = 2^d - 1 nodes, so the number of decisions the algorithm takes is d - 1 = log2(n + 1) - 1. This shows that the complexity of the algorithm is of the order O(log n), meaning the number of operations grows like log n. As a function, log grows more slowly than n (for large n, log n is much smaller than n), so an algorithm with time complexity O(log n) is faster than one with complexity O(n), which is itself faster than O(n log n).
A perfect binary tree of height h has 2^h leaves. When you search, you follow one branch at each level of the tree, so the number of checks is proportional to the height, which is logarithmic in the number of nodes. (The logarithm is the inverse of the exponential function.)
I am a fresher preparing for interviews. In a recent interview I was asked a question for which I couldn't find a suitable answer.
I was given some 100 files, each containing a large number of comma-separated integers. I had to find the top 10 integers across all the files. I tried to solve it using a heap, but I got confused about the time complexity of the process. Any help will be appreciated, thanks.
I think you are on the right track with using a heap data structure.
You could process the files in parallel and for each file you could maintain a min-heap of size 10.
As you iterate through a file, you insert values into the min-heap until it is full (size 10); then, for each value in positions 11 through n:
if current_value > min_heap.peek():      # peek() returns the smallest of the current top 10
    min_heap.extract_min()
    min_heap.insert(current_value)
You have to iterate through n values, and the worst-case scenario is a file sorted in ascending order: every value in positions 11 through n then forces an extract followed by an insert. Each operation on a heap that never holds more than 10 elements costs O(log 10), i.e. constant time, so processing one file is O(n) (or O(n log k) if you keep the heap size k general).
At this point you have m (the number of files) min-heaps, each of size at most 10. Here you can use a final min-heap to collect the ten largest numbers contained in the m min-heaps. This step is O(m), because all the heaps at this point are of size at most 10, a constant.
Overall, with the files processed in parallel, the running time is O(n + m); m could be much smaller than n, so amongst friends we could say O(n) per file plus a cheap merge.
Even if you don't do the first step in parallel, it would be O(m * n + m) = O(m * n); either way, scanning the files dominates the final merge.
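A hedged sketch of this approach in Python (the file names and the comma-separated layout are assumptions taken from the question; heapq provides the size-10 min-heap):

    import heapq

    def top_ten(filenames, k=10):
        heap = []                                  # min-heap of the k largest values seen so far
        for name in filenames:
            with open(name) as f:
                for token in f.read().split(","):
                    token = token.strip()
                    if not token:
                        continue
                    value = int(token)
                    if len(heap) < k:
                        heapq.heappush(heap, value)
                    elif value > heap[0]:          # heap[0] is the smallest of the current top k
                        heapq.heapreplace(heap, value)
        return sorted(heap, reverse=True)

    # Example usage with illustrative file names:
    # print(top_ten(["numbers_%02d.txt" % i for i in range(100)]))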
Is there a data structure representing a large set S of (64-bit) integers, that starts out empty and supports the following two operations:
insert(s) inserts the number s into S;
minmod(m) returns the number s in S such that s mod m is minimal.
An example:
insert(11)
insert(15)
minmod(7) -> the answer is 15 (which mod 7 = 1)
insert(14)
minmod(7) -> the answer is 14 (which mod 7 = 0)
minmod(10) -> the answer is 11 (which mod 10 = 1)
I am interested in minimizing the maximal total time spent on a sequence of n such operations. It is obviously possible to just maintain a list of elements for S and iterate through them for every minmod operation; then insert is O(1) and minmod is O(|S|), which would take O(n^2) time for n operations (e.g., n/2 insert operations followed by n/2 minmod operations would take roughly n^2/4 operations).
So: is it possible to do better than O(n^2) for a sequence of n operations? Maybe O(n sqrt(n)) or O(n log(n))? If this is possible, then I would also be interested to know if there are data structures that additionally admit removing single elements from S, or removing all numbers within an interval.
Another idea based on a balanced binary search tree, as in Keith's answer.
Suppose all elements inserted so far are stored in a balanced BST, and we need to compute minmod(m). Consider our set S as a union of subsets of numbers lying in the intervals [0, m-1], [m, 2m-1], [2m, 3m-1], etc. The answer will obviously be among the minimal numbers we have in each of those intervals. So we can look up the tree once per interval to find the minimal number it contains. This is easy to do: for example, to find the minimal number in [a, b], we move left if the current value is greater than a and right otherwise, keeping track of the minimal value in [a, b] we have met so far.
Now if we suppose that m is uniformly distributed in [1, 2^64], let's calculate the mathematical expectation of number of queries we'll need.
For all m in [2^63, 2^64-1] we'll need 2 queries. The probability of this is 1/2.
For all m in [2^62, 2^63-1] we'll need 4 queries. The probability of this is 1/4.
...
The mathematical expectation will be sum[ 1/(2^k) * 2^k ], for k in [1,64], which is 64 queries.
So, to sum up, the average minmod(m) query complexity will be O(64 * log n). In general, if m has an unknown upper bound, this will be O(log m * log n). The BST update is, as is well known, O(log n), so the overall complexity for n operations will be O(n * log m * log n).
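A rough sketch of the interval-lookup idea (my own illustration; it uses a sorted Python list with bisect in place of the balanced BST, so insert here is O(n) rather than O(log n), but the minmod logic is the same):

    import bisect

    class MinMod:
        def __init__(self):
            self.items = []                        # kept sorted

        def insert(self, s):
            bisect.insort(self.items, s)           # O(log n) with a real balanced BST

        def minmod(self, m):
            best = None
            q = 0
            while self.items and q <= self.items[-1]:
                i = bisect.bisect_left(self.items, q)      # smallest element >= q
                if i < len(self.items) and self.items[i] < q + m:
                    r = self.items[i] % m                  # minimum residue within [q, q + m)
                    if best is None or r < best[0]:
                        best = (r, self.items[i])
                        if r == 0:                         # cannot do better than 0
                            return self.items[i]
                q += m
            return best[1] if best else None

    d = MinMod()
    d.insert(11); d.insert(15)
    print(d.minmod(7))    # 15 (15 mod 7 == 1)
    d.insert(14)
    print(d.minmod(7))    # 14 (14 mod 7 == 0)
    print(d.minmod(10))   # 11 (11 mod 10 == 1)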
Partial answer too big for a comment.
Suppose you implement S as a balanced binary search tree.
When you seek S.minmod(m), naively you walk the whole tree, which costs O(n) per query, and hence O(n^2) over a sequence of n operations.
However, at a given time during the walk, you have the best (lowest) result so far. You can use this to avoid checking whole sub-trees when:
bestSoFar < leftChild mod m
and
rightChild - leftChild < m - leftChild mod m
This will only help much if the typical spacing between the numbers in the set is smaller than typical values of m.
Update the next morning...
Grigor has articulated my idea better and more fully, and has shown how it works well for "large" m. He also shows that a "random" m is typically "large", so it works well.
Grigor's algorithm is so efficient for large m that one needs to think about the risk for much smaller m.
So it is clear that you need to think about the distribution of m and optimise for different cases if need be.
For example, it might be worth simply keeping track of the minimal modulus for very small m.
But suppose m ~ 2^32? Then the search algorithm (certainly as given but also otherwise) needs to check 2^32 intervals, which may amount to searching the whole set anyway.
Can someone explain why, when it comes to binary search, we say the running time complexity is O(log n)? I searched Google and got the below:
"The number of times that you can halve the search space is the same as log2 n".
I know we keep halving until we find the search key in the data structure, but why do we describe that as log2 n? I understand that e^x is exponential growth and so log2 n is the binary decay, but I am unable to interpret binary search in terms of my understanding of the definition of a logarithm.
Think of it like this:
If you can afford to halve something m times (i.e., you can afford to spend time proportional to m), then how large an array can you afford to search?
Obviously, arrays of size 2^m, right?
So if you can search an array of size n = 2^m, then the time it takes is proportional to m, and solving for m in terms of n looks like this:
n = 2^m
log2(n) = log2(2^m)
log2(n) = m
Put another way: performing a binary search on an array of size n = 2^m takes time proportional to m, or equivalently, proportional to log2(n).
Binary search:
Let's take an example. Suppose we have n apples, and every day half of the apples rot. After how many days will only one apple be left?
First day: n apples
Second day: n/2 apples
Third day: n/2^2 apples
... and so on.
Suppose that after k days only one apple is left, i.e. n/(2^k) = 1, so 2^k = n. Taking the log base 2 of both sides gives k = log2 n.
Binary search behaves in the same way: first we are left with n elements, then n/2, then n/4, then n/8, and so on, until finally we are left with one element. So the time complexity is O(log n).
These are all good answers; however, I wish to clarify something that I did not consider before. We are asking how many operations it takes to get from an array of size n down to an array of size 1. The reason is that when the array size is 1, the only element left in the array is the element to be found, and the search can terminate. In other words, when the array size becomes 1, the element being searched for has been located.
The way binary search works is by halving the search space of the array and gradually homing in on the matching element. Let's say the size of the array is n. Then, after m halvings of the search space, the size of the search space becomes n/2^m. When it becomes 1, we have found our element, so equate it to 1 and solve for m.
To summarize, m = log2(n) is the number of operations it would take for the binary search algorithm to reduce the search space from n to 1 and hence, find the element that is searched.