How many larger items in top and middle levels of heap - algorithm

I'm looking in a book I have for algorithms and came across this problem which I'm having trouble understanding. Using a max heap:
LARGE is the larger half of the numbers to be sorted, i.e. LARGE = { n/2, n/2 + 1, ..., n }.
Initially, how many items in LARGE can be within the top (log n)/4 levels of the heap, after build heap phase of HeapSort?
Let A be the items in LARGE which are neither in the bottom level nor within the top (log n)/4 levels initially. At least how many items are in A?
As found in a previous answer, the number of elements of LARGE that can initially be on the bottom level is 2^(h-1). My theory is that if I find how many can be in the top (log n)/4 levels, I can subtract the bottom and top counts from the size of LARGE to get the middle ones. I'm not sure how to go about counting the top levels, though. I can assume n is large enough that the number of levels is divisible by 4, e.g. 8 levels / 4 = 2 levels per quarter.
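One way to bound the top part (a sketch, assuming the heap is binary and log means log base 2): the top k levels of a binary tree hold at most 2^k - 1 nodes in total, so with k = (log n)/4,

    2^((log n)/4) - 1 = n^(1/4) - 1

so at most n^(1/4) - 1 items, of LARGE or otherwise, can be within the top (log n)/4 levels, regardless of where build-heap puts them.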

Related

How many comparisons will a call to removeMin() make in a max heap stored as a 7-ary tree?

Assume that a max heap with 10^6 elements is stored in a complete 7-ary tree. Approximately how many comparisons will a call to removeMin() make?
5000
50
10^6
500
5
My solution: the number of comparisons should be at most the number of leaf nodes, because in a max heap the minimum can be at any leaf; but that count is not among the options. Another approach is to take the square of log_7(10^6), which gives about 50, but that only works if we are sure the minimum lies on a single root-to-leaf branch, which is not the case in a max heap.
I hope that you can help.
There's no "natural" way to remove the minimum value from a max heap. You simply have to look at all the leaf nodes to figure out which one happens to be the minimum.
The question then is how many leaf nodes there are. Intuitively, we'd expect the number of leaves to be pretty close to the total number of nodes. Take it to the limit: if you have a 1,000,000-ary heap, you'd have one node in the top layer and all remaining 999,999 elements in the next layer. Even in the smallest case, where the heap is a binary heap, you'd expect roughly half the elements to be in the bottom layer.
More specifically, let's do some math! How many leaves will a 7-ary heap with n nodes have? Well, each node in the tree will either
be a leaf, or
have seven children,
with one possible exception: since the bottommost row might not be full, there might be one node with fewer than seven children. Since that's just a one-off, we can ignore that last node when we're dealing with millions of elements. A quick proof by induction (or just counting edges: I + L - 1 = 7*I, so L = 6*I + 1) shows that any tree where each node has either no children or seven children has roughly six times as many leaf nodes as internal nodes, so we'd expect about (6/7)ths of the nodes to be leaves, for a total of roughly 857,000 leaves to check.
As a result, the best answer here would be roughly 10^6 comparisons.
The minimum element can be at any of the leaves of a max heap of any kind; there's no order among them. All elements from A[10^6/7 + 1] onwards (where A is the array storing the heap) are leaf nodes and need to be checked. This means roughly 857,000 comparisons just to find the minimum. After that there is no simple way to 'remove' the minimum without introducing a gap that cannot be filled by simply shifting the leaves.
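A quick sketch of that leaf scan in Python, assuming the heap is stored in a 0-indexed list (the function name and the comparison counter are just for illustration):

```python
def min_in_dary_max_heap(a, d=7):
    """Scan the leaves of a d-ary max heap stored in a 0-indexed list.
    The minimum must be a leaf, so every leaf has to be inspected.
    Returns (minimum, number of comparisons made)."""
    n = len(a)
    first_leaf = (n - 2) // d + 1   # first index i with no child: d*i + 1 >= n
    best, comparisons = a[first_leaf], 0
    for i in range(first_leaf + 1, n):
        comparisons += 1
        if a[i] < best:
            best = a[i]
    return best, comparisons
```

For n = 10^6 and d = 7 the first leaf is at index 142,857, so the loop makes about 857,000 comparisons, matching the estimate above.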
This is a misprint. Maybe the teacher wanted to ask removeMax, for which the answer is close to 50 -- see below:
Heapify does up to 7 comparisons per level, since each node has 7 children (6 comparisons to find the largest child plus 1 against the parent). If h is the height of the heap then that's about 7*h comparisons.
Rough analysis: (here ~ means approximately)
h ~ log_7(10^6) = 7.1, hence total comparisons 7*7.1 ~ 50
More accurate analysis:
A 7-ary heap with h levels holds 1 + 7 + 7^2 + ... + 7^(h-1) = 10^6 elements.
The left side is a geometric series that sums to: (7^h - 1)/6 = 10^6
=> 7^h = 6*10^6 + 1
=> h = log_7(6*10^6 + 1) ≈ 8, hence roughly 7*8 = 56 comparisons; of the given options, 50 is the closest.
*A is the array in which the heap is stored.
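For comparison, here is a sketch of the 7-ary sift-down that the removeMax estimate above is counting, using the same 0-indexed array layout (the function name and counter are illustrative):

```python
def remove_max_dary(a, d=7):
    """Pop the maximum of a d-ary max heap stored in a 0-indexed list.
    Each level costs up to d comparisons: d - 1 to find the largest child
    plus 1 against the element being sifted down, hence roughly d * log_d(n)
    comparisons in total.  Returns (maximum, number of comparisons made)."""
    top, comparisons = a[0], 0
    a[0] = a[-1]
    a.pop()
    n, i = len(a), 0
    while True:
        first = d * i + 1           # index of the first child of i
        if first >= n:              # i is a leaf: done
            break
        largest = first             # find the largest child (up to d - 1 comparisons)
        for c in range(first + 1, min(first + d, n)):
            comparisons += 1
            if a[c] > a[largest]:
                largest = c
        comparisons += 1            # compare the largest child with the sifted element
        if a[largest] > a[i]:
            a[i], a[largest] = a[largest], a[i]
            i = largest
        else:
            break
    return top, comparisons
```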

Why does it require no more than 1 + logN compares when inserting a new node into a heap?

I think that when inserting a new node into a heap, the number of nodes it might pass is log N. Why is it 1 + log N? Where does the 1 come from?
This is necessary to account for the border case when the number of nodes is 2^n. A heap of n levels holds 2^n - 1 items, so adding one more item starts a new level:
In the figure, black squares represent the seven elements of a three-level heap, and the red element is number eight. If your search takes you to the location of this last element, you end up with four comparisons, even though log_2 8 is three.
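A tiny sketch of the level count behind that bound (function name is illustrative): the path from the root to heap position i (1-indexed) contains floor(log_2 i) + 1 nodes, and the bound 1 + log_2 N is hit exactly when N is a power of two, which is the border case described above.

```python
import math

def levels_on_insert_path(i):
    """Nodes on the path from the root to heap position i (1-indexed)."""
    return int(math.log2(i)) + 1

for i in (7, 8, 15, 16):
    print(i, levels_on_insert_path(i))
# 7 -> 3, 8 -> 4, 15 -> 4, 16 -> 5: the extra level appears exactly when
# a power of two is reached and a new bottom row is started.
```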

Touching segments

Can anyone please suggest an algorithm for this?
You are given starting and the ending points of N segments over the x-axis.
What is the maximum number of these segments that can be touched, even at their endpoints, by two lines perpendicular to the x-axis?
Sample Input :
3
5
2 3
1 3
1 5
3 4
4 5
5
1 2
1 3
2 3
1 4
1 5
3
1 2
3 4
5 6
Sample Output :
Case 1: 5
Case 2: 5
Case 3: 2
Explanation :
Case 1: We will draw two lines (parallel to the Y-axis) crossing the X-axis at points 2 and 4. These two lines touch all five segments.
Case 2: We can touch all five segments even with one line crossing the X-axis at 2.
Case 3: It is not possible to touch more than two segments in this case.
Constraints:
1 ≤ N ≤ 10^5
0 ≤ a < b ≤ 10^9
Let's assume that we have a data structure that supports the following operations efficiently:
Add a segment.
Delete a segment.
Return the maximum number of segments that cover one point (that is, the "best" point).
If we have such a structure, we can solve the initial problem efficiently in the following manner:
Let's create an array of events (one event for the start of each segment and one for the end) and sort it by x-coordinate.
Add all segments to the magical data structure.
Iterate over the events and do the following: when a segment starts, add one to the number of currently covered segments and remove that segment from the data structure. When a segment ends, subtract one from the number of currently covered segments and add that segment back to the data structure. After each event, update the answer with the number of currently covered segments (that's how many segments are covered by the point corresponding to the current event) plus the maximum returned by the data structure described above (that's how well we can choose the other point).
If this data structure can perform all given operations in O(log n), then we have an O(n log n) solution(we sort the events and make one pass over the sorted array making a constant number of queries to this data structure for each event).
So how can we implement this data structure? Well, a segment tree works fine here. Adding a segment means adding one to a specific range; removing a segment means subtracting one from all elements in that range; and getting the maximum is just the standard maximum query on a segment tree. So we need a segment tree that supports two operations: add a constant to a range and get the maximum over the entire tree. Both can be done in O(log n) time per query.
One more note: a standard segment tree requires the coordinates to be small. We may assume that they never exceed 2 * n (if that is not the case, we can compress them).
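A minimal Python sketch of this sweep, under the assumptions above (coordinates compressed to ranks, a segment tree supporting range add and a max query that is only ever asked at the root; all names are illustrative):

```python
class MaxAddSegTree:
    """Segment tree over n positions supporting range add and global max."""
    def __init__(self, n):
        self.n = n
        self.mx = [0] * (4 * n)    # max over the subtree, including adds at or below the node
        self.add = [0] * (4 * n)   # pending add that applies to the whole subtree

    def _update(self, node, lo, hi, l, r, val):
        if r < lo or hi < l:
            return
        if l <= lo and hi <= r:
            self.mx[node] += val
            self.add[node] += val
            return
        mid = (lo + hi) // 2
        self._update(2 * node, lo, mid, l, r, val)
        self._update(2 * node + 1, mid + 1, hi, l, r, val)
        self.mx[node] = self.add[node] + max(self.mx[2 * node], self.mx[2 * node + 1])

    def range_add(self, l, r, val):
        self._update(1, 0, self.n - 1, l, r, val)

    def global_max(self):
        return self.mx[1]


def touching_segments(segments):
    """Maximum number of segments touched by two vertical lines (sketch)."""
    xs = sorted({c for a, b in segments for c in (a, b)})
    idx = {x: i for i, x in enumerate(xs)}            # coordinate compression
    tree = MaxAddSegTree(len(xs))
    for a, b in segments:                             # initially no segment is covered by line 1
        tree.range_add(idx[a], idx[b], 1)
    starts = sorted(segments)                         # by left endpoint
    ends = sorted(segments, key=lambda s: s[1])       # by right endpoint
    si = ei = covered = best = 0
    for x in xs:                                      # candidate positions for the first line
        while si < len(starts) and starts[si][0] <= x:    # segment now covers x: count it,
            a, b = starts[si]; si += 1                    # and take it out of the tree
            covered += 1
            tree.range_add(idx[a], idx[b], -1)
        while ei < len(ends) and ends[ei][1] < x:         # segment ended before x: uncount it,
            a, b = ends[ei]; ei += 1                      # and put it back into the tree
            covered -= 1
            tree.range_add(idx[a], idx[b], 1)
        best = max(best, covered + tree.global_max())     # first line at x, second at tree's best point
    return best


print(touching_segments([(2, 3), (1, 3), (1, 5), (3, 4), (4, 5)]))  # 5
print(touching_segments([(1, 2), (1, 3), (2, 3), (1, 4), (1, 5)]))  # 5
print(touching_segments([(1, 2), (3, 4), (5, 6)]))                  # 2
```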
An O(N*max(logN, M)) solution, where M is the medium segment size, implemented in Common Lisp: touching-segments.lisp.
The idea is to first calculate, from left to right, at every interesting point, the number of segments that would be touched by a line there (open-left-to-right in the Lisp code). Cost: O(N log N).
Then, from right to left, it calculates, again at every interesting point P, the best location for a line considering only segments fully to the right of P (open-right-to-left in the Lisp code). Cost: O(N*max(log N, M)).
Then it is just a matter of looking for the point where the sum of both values is largest. Cost: O(N).
The code is barely tested and may contain bugs. Also, I have not bothered to handle edge cases as when the number of segments is zero.
The problem can be solved in O(N log N) time per test case.
Observe that there is an optimal placement in which each of the two vertical lines goes through some segment endpoint.
Compress the segments' coordinates (see the short sketch after these steps). More info at What is coordinate compression?
Build a sorted set of segment endpoints X
Sort segments [a_i,b_i] by a_i
Let Q be a priority queue which stores right endpoints of segments processed so far
Let T be a max interval tree built over the x-coordinates. Some useful reading at What are some sources (books, etc.) from where I can learn about Interval, Segment, Range trees?
For each segment make an [a_i, b_i] range increment-by-1 update to T. This makes it possible to find the maximum number of segments covering some x in [a, b].
Iterate over the elements x of X. For each x, process the segments (not already processed) with a_i <= x. Processing a segment means pushing b_i onto Q and making an [a_i, b_i] range increment-by-(-1) update to T. After removing from Q all elements < x, A = Q.size is the number of segments covering x. B = T.rmq(x + 1, M) returns the maximum number of segments that do not cover x but cover some fixed y > x. A + B is a candidate for the answer.
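The coordinate compression mentioned above can be as simple as mapping each distinct endpoint to its rank; a tiny Python sketch:

```python
def compress(coords):
    """Map each coordinate to its rank among the distinct values (0, 1, 2, ...)."""
    xs = sorted(set(coords))
    return {x: i for i, x in enumerate(xs)}

# compress([2, 3, 1, 3, 1, 5]) -> {1: 0, 2: 1, 3: 2, 5: 3}
```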
Source:
http://www.quora.com/What-are-the-intended-solutions-for-the-Touching-segments-and-the-Smallest-String-and-Regex-problems-from-the-Cisco-Software-Challenge-held-on-Hackerrank

Why not use alternate elements as vertical search nodes in Skip Lists?

In most implementations of skip lists I've seen, they use a randomized algorithm to decide if an element must be copied into the upper level.
But I think using odd indexed elements at each level to have copies in the upper level will give us logarithmic search complexity. Why isn't this used?
E.g. :
Data : 1 2 3 4 5 6 7 8 9
Skip List:
1--------------------
1--------------------9
1--------5----------9
1----3---5----7----9
1-2-3-4-5-6-7-8-9
It is not used because, while this scheme is easy to maintain when building the list from scratch, it breaks down when you add or remove an element in an existing list: elements that used to be odd-indexed become even-indexed, and vice versa.
In your example, assume you now add 3.5: to maintain the DS as you described, it will require O(k + k/2 + k/4 + ...) changes, where k is the number of elements AFTER the element you have just inserted.
This basically gives you O(n/2 + n/4 + ...) = O(n) add/remove complexity on average.
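For contrast, the randomized promotion that skip lists actually use only needs a few coin flips per inserted key, so nothing has to be re-indexed; a minimal sketch:

```python
import random

def random_level(p=0.5, max_level=32):
    """Choose how many levels a newly inserted skip-list node appears in:
    keep promoting with probability p, giving expected height 1/(1 - p)."""
    level = 1
    while random.random() < p and level < max_level:
        level += 1
    return level
```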
Because if you start deleting nodes or inserting nodes in the middle the structure quickly requires rebalancing or it loses its logarithmic guarantees on access and update.
Actually there is a structure very similar to what you suggest, an interval tree, which gets around the update problem by not using actual elements as intermediate node labels. It can also require some care to get good performance.

Algorithm to optimally group list of values

I have several numbers. I need to group them into several groups so that the sum of the numbers in each group is between a predefined min and max. The point is to leave as few numbers ungrouped as possible.
Input:
min, max: range for sum of numbers
N1, N2, N3 ... Ni: numbers to group
Output:
[N1,N3,N5],[Ni,Nj,Nk,Nm...]...: groups where the sum of the numbers is between min and max
Na,Nb,Nc...: numbers left ungrouped.
This problem could be viewed as bin packing into bins of size max, with a funny objective: minimize the number of items not packed into bins holding at least min. One idea from the bin-packing literature is that the "small" items (in this case, items that are small relative to max - min) are easy to pack but account for most of the combinatorial explosion of possibilities. Thus some approximation algorithms for bin packing do something clever for the big items and then fill in with the small ones. Another way to reduce the number of possibilities is to round the numbers to belong to a smaller set. It's somewhat obvious how to do that for bin packing (round up), but it's not clear what to do for this problem.
Okay, I'll give an example of how these ideas could be instantiated. Suppose that max = 1 and min = 1/2. Let's try to find a solution that's competitive with the optimum for when max = 2 and min = 1/2. (That may sound terrible, but this sort of approximation guarantee where OPT is held to higher standards is sometimes used in the literature.)
First round every item's size up to a power of 2. Very large items, of size 4 or greater, can't be packed. Large items, of size 2 or 1 or 1/2, are given their own bins. Small items, of size 1/4 or less, are dealt with as follows. Whenever two items of size 1/4 or less have the same size, combine them into one super-item. Pack all of the new items of size 1/2 into their own bins. The remainder has total size less than 1/2. If there is space in another bin, put them there. Otherwise, give them their own bin.
The quality of the resulting solution for max = 2 is at least as good as the quality of OPT for max = 1. Take the optimal solution for max = 1 and round the item sizes. The set of bad bins remains the same, because no item is smaller, and each bin stores less than 2 because each item is less than twice as large as it used to be. Now it suffices to show that the packing algorithm I gave for powers of 2 is optimal. I'll leave that as an exercise.
I don't expect this instantly to generalize into a full algorithm. I have to get back to work, but the approach I would take would be to force OPT to deal with max = 1 while ALG gets to use max = 1 + epsilon, substitute powers of (1 + epsilon) for powers of two in the rounding step, and then figure out how to pack the small items, probably using a dynamic program since greed likely won't work.
If you're not worried about efficiency, simply generate each possible grouping and choose the one that is correct and optimal in the sense you describe. Clearly, this works for any finite list of numbers (and is, by definition, optimal).
If efficiency is desired, the problem seems to become somewhat more difficult. :D I'll keep thinking.
EDIT: Come to think of it, this problem seems at least as hard as "subset sum" and, as such, I don't think there is a solution significantly better than the one I give (i.e., if it is NP-hard, then no known polynomial-time algorithm can solve it).
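A brute-force sketch of that "generate each possible grouping" idea, in Python: exponential time, so it is only workable for very small inputs (all names are illustrative).

```python
def best_grouping(nums, lo, hi):
    """Try every way to place each number into some group or leave it out,
    keep only assignments where every group's sum lies in [lo, hi], and
    return (groups, ungrouped) with as few ungrouped numbers as possible."""
    best = {"ungrouped": len(nums) + 1, "groups": [], "left": []}

    def recurse(i, groups, left):
        if i == len(nums):
            if all(lo <= sum(g) <= hi for g in groups) and len(left) < best["ungrouped"]:
                best["ungrouped"] = len(left)
                best["groups"] = [list(g) for g in groups]
                best["left"] = list(left)
            return
        x = nums[i]
        left.append(x)                      # option 1: leave x ungrouped
        recurse(i + 1, groups, left)
        left.pop()
        for g in groups:                    # option 2: add x to an existing group
            if sum(g) + x <= hi:
                g.append(x)
                recurse(i + 1, groups, left)
                g.pop()
        groups.append([x])                  # option 3: start a new group with x
        recurse(i + 1, groups, left)
        groups.pop()

    recurse(0, [], [])
    return best["groups"], best["left"]


print(best_grouping([3, 1, 4, 1, 5], lo=4, hi=6))   # finds a grouping with no numbers left out
```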

Resources