Better than O(log(N)) base 2 - algorithm

I am solving segment tree and quad tree related problems, and I noticed that in a segment tree we split the 1D array into 2 (2^1) segments and do this recursively until the base case is reached. Similarly, in a quad tree we subdivide the 2D grid into 4 (2^2) quadrants at every step. All of these divide-and-conquer mechanisms exist to achieve logarithmic time complexity.
But why don't we subdivide the array into 4 (4^1) parts or more instead of 2 in the segment tree? And why don't we split the grid into 16 (4^2) parts instead of 4? Either way we still get O(log(N)) performance, but it would be a better log, since log(N) base 4 is half of log(N) base 2.
I know the implementation would be a little more difficult in that case. Is there a memory overhead problem, or something else?
Please correct me if I am wrong anywhere. Thanks!

It wouldn't actually work faster. Let's assume that we divided it into 4 parts. Then we would have to merge 4 values instead of 2 in each node to answer a query. Assuming that merging 4 values takes 3 times longer (for example, to get the maximum of 2 numbers we need 1 call to max, but to get the maximum of 4 values 3 calls are required), we have log4(n) * 3 > log2(n) * 1. Moreover, it would be harder to implement (more cases to consider, and so on).
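A quick back-of-the-envelope check of this argument, as a Python sketch using the same simple cost model (one combine per level, branching minus one max calls per combine):

    import math

    def max_calls_per_query(n, branching):
        # One root-to-leaf path has ceil(log_branching(n)) levels, and combining
        # `branching` children at each level takes (branching - 1) max calls.
        levels = math.ceil(math.log(n, branching))
        return levels * (branching - 1)

    for n in (10**3, 10**6):
        print(n, max_calls_per_query(n, 2), max_calls_per_query(n, 4))
        # n = 10**6: 20 max calls for the binary tree vs 30 for the 4-ary one,
        # matching 3 * log4(n) = 1.5 * log2(n)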

log4(N) = log2(N) / log2(4) = log2(N) / 2
In general the time complexities are both O(log n), while four segments are much harder to maintain than two. In practice (in ACM/ICPC), two segments are much easier to code and are sufficient.

Related

Optimal k-way merge pattern

I need to merge n sorted fixed record files of different sizes using k simultaneous consumers, where k<n. Because k is (possibly a lot) smaller than n, the merge will be done in a number of iterations/steps. The challenge is to pick at each step the right files to merge.
Because the files can differ wildly in size, a simple greedy approach of using all k consumers at each step can be very suboptimal.
A simple example makes this clear. Consider the case of 4 files with 1, 1, 10 and 10 records respectively and 3 consumers. We need two merge steps to merge all files. Start with 3 consumers in the first step. The merge sequence ((1,1,10),10) leads to 12 read/write operations in (inner) step 1 and 22 operations in (outer) step 2, making a total of 34 ops. The sequence (1,(1,10,10)) is even worse with 21+22=43 ops. By contrast, if we use only 2 consumers in the first step and 3 in the second step, the merge pattern ((1,1),10,10) takes only 2+22=24 ops. Here our restraint pays off handsomely.
My solution for picking the right number of consumers at each step is the following. All possible merge states can be ordered into a directed graph (which is a lattice I suppose) with the number of ops to move from one state to another attached to each edge as the cost. I can then use a shortest path algorithm to determine the optimal sequence.
The problem with this solution is that the number of nodes explodes, even with a modest number of files (say hundreds) and even after applying some sensible constraints (like sorting the files on size and allowing only merges of the top 2..k of this list). Moreover, I cannot shake the feeling that there might be an "analytical" solution to this problem, or at least a simple heuristic that comes very close to optimality.
Any thoughts would be appreciated.
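For concreteness, a small helper that reproduces the op counts used in the example above (a sketch; it assumes each merge costs the total number of records it writes out):

    def pattern_cost(pattern):
        # A pattern is either an int (file size) or a tuple of sub-patterns.
        # Each merge costs the number of records it writes out.
        # Returns (total_size, ops).
        if isinstance(pattern, int):
            return pattern, 0
        size = ops = 0
        for sub in pattern:
            s, o = pattern_cost(sub)
            size += s
            ops += o
        return size, ops + size

    # The three schedules discussed above:
    for p in [((1, 1, 10), 10), (1, (1, 10, 10)), ((1, 1), 10, 10)]:
        print(p, pattern_cost(p)[1])   # 34, 43, 24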
May I present it another way:
The traditional merge sort complexity is O(n log n), but in my case with sublists of different sizes, the worst case (one file is big and all the others are small, which is the example you give) may be O(n^2), which is bad performance.
The question is: how to schedule the sub-sorts in an optimal way?
Precomputing the graph of all executions is really too big; in the worst case it can be as big as the data you sort.
My proposal is to compute it "on the fly", accepting that it may not be optimal but at least avoids the worst case.
My first naive idea was simply to sort the files by size and begin with the smallest ones: this way you favour eliminating the small files early in the iterations.
With K=2:
In your example 1 1 10 10 -> 2 20 -> 22: it is still (2 + 20) + 22 CC, so 44 CC*
CC: Comparison or copy: this is the ops I count for a complexity of 1.
If I have K=1 and reinject the result into my sorted file array I get:
(1 1 10 10) -> 2 10 10 -> 12 10 -> (22): 2 CC + 12 + 22 = 36
For different values of K the total cost varies slightly.
Computing the average-case complexity of this algorithm probabilistically would be very interesting, but you have to accept some N² executions in bad cases.
PS:
The fact that k<n is another problem: it can simply be resolved by adding one worker per pair of files to a queue (n/2 workers at the beginning) and having the queue read by the k threads.
Firstly, an alternative algorithm:
read all record keys (N reads) with a fileid
sort them
read all files and place the records in the final position according to the sorted key (N R/W)
This might be a problem if your filesystem can't handle N+1 open files or if random file access is slow for either read or write; you can arrange for the random access to happen on whichever side (read or write) is faster.
The advantage is only N*2 reads and N writes.
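A rough sketch of this alternative in Python (the fixed record layout and the RECORD_SIZE/KEY_SIZE constants are hypothetical; here the random access is on the read side and the output is written sequentially):

    # Assumes fixed-size records whose first KEY_SIZE bytes are the sort key.
    RECORD_SIZE, KEY_SIZE = 64, 8

    def sort_by_key_table(input_paths, output_path):
        # Pass 1: read every key together with (file id, record index).
        keys = []
        for fid, path in enumerate(input_paths):
            with open(path, "rb") as f:
                idx = 0
                while True:
                    rec = f.read(RECORD_SIZE)
                    if not rec:
                        break
                    keys.append((rec[:KEY_SIZE], fid, idx))
                    idx += 1
        keys.sort()   # all N keys sorted in memory
        # Pass 2: random-read each record and append it at its final position.
        files = [open(p, "rb") for p in input_paths]
        with open(output_path, "wb") as out:
            for _, fid, idx in keys:
                files[fid].seek(idx * RECORD_SIZE)
                out.write(files[fid].read(RECORD_SIZE))
        for f in files:
            f.close()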
Back to your algorithm
Does it pay to merge the large files with small files at a random point in the merging? No
E.g. (1,1,10,10) -> ((1,10),(1,10)) [2*11 ops] -> (11,11) [22 ops] sum 44. ((1,1),10,10) is only 24.
Merging large and small files causes the content of the large files to be read/written an extra time.
Does it pay to merge the large files first? No.
E.g (1,10,10,10) -> (1,10,(10,10)) 20+31 ops vs. ((1,10),10,10) 11+31 ops
Again we get a penalty for doing the ops on the large files multiple times.
Does it ever pay to merge fewer than K files, other than at the last merge? Yes.
E.g. (1,2,3,4,5,6) -> (((1,2),3,4),5,6) 3+10+21=34 vs ((1,2,3),(4,5,6)) 6+15+21=42
Again, merging the largest files more times is a bad idea.
Does it pay to merge fewer than K files at merges other than the first? Yes.
E.g. #1: (1,2,3,4,5,6) -> (((1,2),3,4),5,6) 3+10+21=34 vs (((1,2,3),4),5,6) 6+10+21=37
The size-3 file gets copied an extra time.
E.g. #2: (((1,1),10),100,100). Here we use k=2 in the first two steps, taking 2+12+212=226 ops. The alternative (((1,1),10,100),100), which uses k=3 in the second step, costs 2+112+212=326 ops.
New heuristic
while #files is larger than 1
sum the sizes of the smallest files until you have K of them or the next larger file is greater than the sum
K-merge these
ToDo: prove that the total number of ops with this rule is smaller than with any other method.
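One reading of this heuristic as a Python sketch: always take at least the two smallest files, then keep adding inputs while there are fewer than K and the next file is no larger than the running sum. On the examples above it produces the ((1,1),10,10) and (((1,1),10),100,100) schedules:

    import heapq

    def schedule_merges(sizes, k):
        # Repeatedly merge the smallest files; stop adding inputs once we have
        # k of them or the next larger file exceeds the running sum so far.
        heap = list(sizes)
        heapq.heapify(heap)
        steps, total_ops = [], 0
        while len(heap) > 1:
            group = [heapq.heappop(heap), heapq.heappop(heap)]
            while heap and len(group) < k and heap[0] <= sum(group):
                group.append(heapq.heappop(heap))
            merged = sum(group)
            steps.append(tuple(group))
            total_ops += merged
            heapq.heappush(heap, merged)
        return steps, total_ops

    print(schedule_merges([1, 1, 10, 10], 3))        # ([(1, 1), (2, 10, 10)], 24)
    print(schedule_merges([1, 1, 10, 100, 100], 3))  # ([(1, 1), (2, 10), (12, 100, 100)], 226)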

How to display all ways to give change

As far as I know, counting every way to give change for a given sum with a given till configuration is a classic dynamic programming problem.
I was wondering if there is a way to also display (or store) the actual change combinations that could amount to the given sum, while preserving the DP complexity.
I have never seen this issue discussed, and I would like some pointers or a brief explanation of how this can be done, or why it cannot.
DP for change problem has time complexity O(Sum * ValuesCount) and storage complexity O(Sum).
You can prepare the extra data for this in the same time as the DP for change, but you need more storage, O(Sum * ValuesCount), and a lot of time to output all the variants, O(ChangeWaysCount).
To prepare the data for way recovery, make a second array B of arrays (or lists). Whenever you increment an element of the count array A from some previous element, add the value used to the corresponding element of B. At the end, unwind all the ways starting from the last element.
Example: values 1,2,3, sum 4
index  0    1      2        3          4
A      0    1      2        3          4
B      -    {1}    {1,2}    {1,2,3}    {1,2,3}
We start unwinding from B[4] elements:
1-1-1-1 (B[4]-B[3]-B[2]-B[1])
2-1-1 (B[4]-B[2]-B[1])
2-2 (B[4]-B[2])
3-1 (B[4]-B[1])
Note that I have used only ways with non-increasing values to avoid permutation variants (i.e. counting 1-3 and 3-1 as the same way).
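A minimal Python sketch of this scheme (here A[0] is 1, counting the empty way, and B[s] stores the set of values that can be the largest coin in a way to make s):

    def change_ways(values, total):
        # A[s] counts combinations (coins taken in non-increasing order);
        # B[s] records which values can be the largest coin used to reach s.
        values = sorted(values)
        A = [0] * (total + 1)
        B = [set() for _ in range(total + 1)]
        A[0] = 1
        for v in values:
            for s in range(v, total + 1):
                if A[s - v]:
                    A[s] += A[s - v]
                    B[s].add(v)
        ways = []
        def unwind(s, limit, acc):
            # Unwind from B, always choosing a next coin no larger than the last.
            if s == 0:
                ways.append(acc)
                return
            for v in sorted(B[s], reverse=True):
                if v <= limit:
                    unwind(s - v, v, acc + [v])
        unwind(total, max(values), [])
        return A[total], ways

    print(change_ways([1, 2, 3], 4))
    # (4, [[3, 1], [2, 2], [2, 1, 1], [1, 1, 1, 1]])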

Touching segments

Can anyone please suggest an algorithm for this?
You are given starting and the ending points of N segments over the x-axis.
How many of these segments can be touched, even on their edges, by exactly two lines perpendicular to them?
Sample Input :
3
5
2 3
1 3
1 5
3 4
4 5
5
1 2
1 3
2 3
1 4
1 5
3
1 2
3 4
5 6
Sample Output :
Case 1: 5
Case 2: 5
Case 3: 2
Explanation :
Case 1: We will draw two lines (parallel to the Y-axis) crossing the X-axis at points 2 and 4. These two lines touch all five segments.
Case 2: We can touch all the segments even with one line crossing the X-axis at 2.
Case 3: It is not possible to touch more than two segments in this case.
Constraints:
1 ≤ N ≤ 10^5
0 ≤ a < b ≤ 10^9
Let's assume that we have a data structure that supports the following operations efficiently:
Add a segment.
Delete a segment.
Return the maximum number of segments that cover one point (that is, the "best" point).
If we have such a structure, we can solve the initial problem efficiently in the following manner:
Let's create an array of events (one event for the start of each segment and one for its end) and sort it by x-coordinate.
Add all segments to the magical data structure.
Iterate over all events and do the following: when a segment starts, add one to the number of currently covered segments and remove that segment from the data structure. When a segment ends, subtract one from the number of currently covered segments and add the segment back to the data structure. After each event, update the answer with the number of currently covered segments (how many segments are covered by the point corresponding to the current event) plus the maximum returned by the data structure described above (the best possible choice for the other point).
If this data structure can perform each operation in O(log n), then we have an O(n log n) solution (we sort the events and make one pass over the sorted array, issuing a constant number of queries to the data structure per event).
So how can we implement this data structure? A segment tree works fine here. Adding a segment means adding one to a specific range. Removing a segment means subtracting one from all elements in a specific range. Getting the maximum is just a standard maximum query on a segment tree. So we need a segment tree that supports two operations: add a constant to a range and get the maximum over the entire tree. It can be done in O(log n) time per query.
One more note: a standard segment tree requires coordinates to be small. We may assume that they never exceed 2 * n (if that is not the case, we can compress them).
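A Python sketch of this approach (a lazy segment tree supporting range add and a global max, plus the event sweep; coordinates are compressed first so the tree size stays at most 2N):

    from collections import defaultdict

    class MaxAddSegmentTree:
        # Range "add a constant", query "maximum over the whole tree",
        # using the no-push-down form of lazy propagation.
        def __init__(self, n):
            self.n = n
            self.mx = [0] * (4 * n)
            self.lazy = [0] * (4 * n)

        def _add(self, node, lo, hi, l, r, val):
            if r < lo or hi < l:
                return
            if l <= lo and hi <= r:
                self.mx[node] += val
                self.lazy[node] += val
                return
            mid = (lo + hi) // 2
            self._add(2 * node, lo, mid, l, r, val)
            self._add(2 * node + 1, mid + 1, hi, l, r, val)
            self.mx[node] = self.lazy[node] + max(self.mx[2 * node], self.mx[2 * node + 1])

        def add(self, l, r, val):          # add val to positions l..r (inclusive)
            self._add(1, 0, self.n - 1, l, r, val)

        def max_all(self):
            return self.mx[1]

    def touching_segments(segments):
        # Sweep: at every event point x, the answer candidate is
        # (#segments covering x) + (best point for the second line, taken from
        # the tree, which holds exactly the segments NOT covered at x).
        xs = sorted({x for seg in segments for x in seg})
        idx = {x: i for i, x in enumerate(xs)}
        tree = MaxAddSegmentTree(len(xs))
        starts, ends = defaultdict(list), defaultdict(list)
        for a, b in segments:
            starts[idx[a]].append((idx[a], idx[b]))
            ends[idx[b]].append((idx[a], idx[b]))
            tree.add(idx[a], idx[b], 1)    # initially every segment is in the tree
        best = covering = 0
        for i in range(len(xs)):
            for a, b in starts[i]:         # segment starts: the first line covers it now
                covering += 1
                tree.add(a, b, -1)
            best = max(best, covering + tree.max_all())
            for a, b in ends[i]:           # segment ends: put it back for the second line
                covering -= 1
                tree.add(a, b, 1)
        return best

    # The three sample cases from the question (expected 5, 5, 2):
    print(touching_segments([(2, 3), (1, 3), (1, 5), (3, 4), (4, 5)]))
    print(touching_segments([(1, 2), (1, 3), (2, 3), (1, 4), (1, 5)]))
    print(touching_segments([(1, 2), (3, 4), (5, 6)]))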
An O(N*max(logN, M)) solution, where M is the average segment size, implemented in Common Lisp: touching-segments.lisp.
The idea is to first calculate, from left to right, at every interesting point, the number of segments that would be touched by a line placed there (open-left-to-right in the Lisp code). Cost: O(NlogN).
Then, from right to left, it calculates, again at every interesting point P, the best location for a line considering only segments fully to the right of P (open-right-to-left in the Lisp code). Cost: O(N*max(logN, M)).
Then it is just a matter of finding the point where the sum of both values is largest. Cost: O(N).
The code is barely tested and may contain bugs. Also, I have not bothered to handle edge cases such as when the number of segments is zero.
The problem can be solved in O(Nlog(N)) time per test case.
Observe that there is an optimal placement of the two vertical lines in which each line goes through some segment endpoint.
Compress segments' coordinates. More info at What is coordinate compression?
Build a sorted set of segment endpoints X
Sort segments [a_i,b_i] by a_i
Let Q be a priority queue which stores right endpoints of segments processed so far
Let T be a max interval tree built over the x-coordinates. Some useful reading at: What are some sources (books, etc.) from where I can learn about Interval, Segment, Range trees?
For each segment, make an [a_i,b_i]-range increment-by-1 query to T. This allows finding the maximum number of segments covering some x in [a,b].
Iterate over the elements x of X. For each x, process the segments (not already processed) with x >= a_i. Processing a segment means pushing b_i onto Q and making an [a_i,b_i]-range increment-by-(-1) query to T. After removing from Q all elements < x, A = Q.size equals the number of segments covering x. B = T.rmq(x + 1, M) returns the maximum number of segments that do not cover x but cover some fixed y > x. A + B is a candidate for the answer.
Source:
http://www.quora.com/What-are-the-intended-solutions-for-the-Touching-segments-and-the-Smallest-String-and-Regex-problems-from-the-Cisco-Software-Challenge-held-on-Hackerrank
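The coordinate compression step mentioned above, as a tiny Python sketch:

    def compress(coords):
        # Map each distinct coordinate to its rank in sorted order,
        # so values up to 10^9 become indices below 2 * N.
        xs = sorted(set(coords))
        return {x: i for i, x in enumerate(xs)}, xs

    rank, xs = compress([5, 1000000000, 7, 5])
    print(rank)   # {5: 0, 7: 1, 1000000000: 2}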

Removing elements to sort array

I'm looking for an algorithm to sort an array, but not by moving the values. Rather, I'd like to delete as few values as possible and end up with a sorted list. Basically, I want to find the longest ascending subsequence.
To illustrate:
1 4 5 6 7 2 3 8
Should become (2 removes)
1 4 5 6 7 8
And not (5 removes)
1 2 3
I can see how to do this in a naive way, i.e. by recursively checking both the 'remove' and 'don't remove' branches for each element. I was just wondering if there is a faster / more efficient way to do this. Is there a common go-to algorithm for this kind of problem?
You're looking for the longest increasing subsequence problem. There is an algorithm that solves it in O(n log n) time.
There is an O(NlogN) algorithm, described on the site below, which is faster than the recursive approach:
http://www.algorithmist.com/index.php/Longest_Increasing_Subsequence
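A sketch of the O(n log n) approach in Python (patience-sorting style, with parent links to recover one longest increasing subsequence), run on the example from the question:

    import bisect

    def longest_increasing_subsequence(seq):
        # tails[k] holds the smallest possible tail of an increasing
        # subsequence of length k+1; parent links recover one actual LIS.
        if not seq:
            return []
        tails, tails_idx = [], []
        parent = [-1] * len(seq)
        for i, x in enumerate(seq):
            k = bisect.bisect_left(tails, x)
            if k == len(tails):
                tails.append(x)
                tails_idx.append(i)
            else:
                tails[k] = x
                tails_idx[k] = i
            parent[i] = tails_idx[k - 1] if k > 0 else -1
        # Walk the parent links back from the end of the best subsequence.
        out, i = [], tails_idx[-1]
        while i != -1:
            out.append(seq[i])
            i = parent[i]
        return out[::-1]

    data = [1, 4, 5, 6, 7, 2, 3, 8]
    lis = longest_increasing_subsequence(data)
    print(lis)                    # [1, 4, 5, 6, 7, 8]
    print(len(data) - len(lis))   # 2 removals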

Sort numbers by sum algorithm

I have a language-agnostic question about an algorithm.
This comes from a (probably simple) programming challenge I read. The problem is, I'm too stupid to figure it out, and curious enough that it is bugging me.
The goal is to sort a list of integers to ascending order by swapping the positions of numbers in the list. Each time you swap two numbers, you have to add their sum to a running total. The challenge is to produce the sorted list with the smallest possible running total.
Examples:
3 2 1 - 4
1 8 9 7 6 - 41
8 4 5 3 2 7 - 34
Though you are free to just give the answer if you want, if you'd rather offer a "hint" in the right direction (if such a thing is possible), I would prefer that.
Only read the first two paragraphs if you just want a hint. There is an efficient solution to this (unless I made a mistake, of course). First sort the list. Now we can write the original list as a product of disjoint cycles.
For example, 5,3,4,2,1 has two cycles, (5,1) and (3,4,2). The second cycle can be thought of as starting at 3: 4 is in 3's spot, 2 is in 4's spot, and 3 is in 2's spot. The end goal is 1,2,3,4,5 or (1)(2)(3)(4)(5), five disjoint cycles.
If we switch two elements from different cycles, say 1 and 3, then we get 5,1,4,2,3, which in cycle notation is (1,5,3,4,2). The two cycles are joined into one cycle; this is the opposite of what we want to do.
If we switch two elements from the same cycle, say 3 and 4, then we get 5,4,3,2,1, in cycle notation (5,1)(2,4)(3). The one cycle is split into two smaller cycles. This gets us closer to the goal of having all cycles of length 1. Notice that any switch of two elements in the same cycle splits that cycle into two cycles.
If we can figure out the optimal algorithm for resolving one cycle, we can apply it to every cycle and get an optimal algorithm for the entire sort. One algorithm is to take the minimum element in the cycle and switch it with the element whose position it occupies. So for (3,4,2) we would switch 2 with 4. This leaves us with a cycle of length 1 (the element just switched into its correct position) and a cycle one element smaller than before. We can then apply the rule again. This algorithm switches the smallest element (cycle length - 1) times and every other element once.
To transform a cycle of length n into cycles of length 1 takes n - 1 swaps. Each element must be involved in at least one swap (think about each element to be sorted: it has to be moved to its correct position). The algorithm I proposed operates on each element once, which all algorithms must do, and every other operation is done on the minimal element. No algorithm can do better.
This algorithm takes O(n log n) to sort, then O(n) to fix up the cycles. Resolving one cycle takes O(cycle length), and the total length of all cycles is n, so the cost of the cycle operations is O(n). The final run time is O(n log n).
I'm assuming memory is free and you can simulate the sort before performing it on the real objects.
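A small sketch of the cycle-decomposition step (assuming distinct values), reproducing the (5,1) and (3,4,2) cycles from the example above:

    def cycles_of(arr):
        # Decompose "where each element has to go" into disjoint cycles.
        target = {v: i for i, v in enumerate(sorted(arr))}   # value -> final index
        seen, cycles = [False] * len(arr), []
        for i in range(len(arr)):
            if seen[i]:
                continue
            cycle, j = [], i
            while not seen[j]:
                seen[j] = True
                cycle.append(arr[j])
                j = target[arr[j]]
            if len(cycle) > 1:
                cycles.append(cycle)
        return cycles

    print(cycles_of([5, 3, 4, 2, 1]))   # [[5, 1], [3, 4, 2]]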
One approach (that is likely not the fastest) is to maintain a priority queue. Each node in the queue is keyed by the swap cost to reach it, and it contains the current item ordering and the sequence of swaps used to achieve that ordering. For example, initially it would contain a 0-cost node with the original data ordering and no swaps.
Run a loop that dequeues the lowest-cost queue item and enqueues every ordering reachable from it by a single swap. Keep running the loop until the head of the queue holds a sorted list.
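A sketch of this search in Python (essentially uniform-cost search / Dijkstra over orderings, so only practical for very small lists); it reproduces the totals from the question's examples:

    import heapq
    from itertools import combinations

    def min_swap_cost_bruteforce(items):
        # State = current ordering; edge cost = sum of the two swapped values.
        start, goal = tuple(items), tuple(sorted(items))
        best = {start: 0}
        heap = [(0, start)]
        while heap:
            cost, state = heapq.heappop(heap)
            if state == goal:
                return cost
            if cost > best.get(state, float("inf")):
                continue   # stale queue entry
            for i, j in combinations(range(len(state)), 2):
                nxt = list(state)
                nxt[i], nxt[j] = nxt[j], nxt[i]
                nxt = tuple(nxt)
                ncost = cost + state[i] + state[j]
                if ncost < best.get(nxt, float("inf")):
                    best[nxt] = ncost
                    heapq.heappush(heap, (ncost, nxt))
        return 0

    print(min_swap_cost_bruteforce([3, 2, 1]))           # 4
    print(min_swap_cost_bruteforce([1, 8, 9, 7, 6]))     # 41
    print(min_swap_cost_bruteforce([8, 4, 5, 3, 2, 7]))  # 34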
I made a few attempts at solving one of the examples by hand:
1 8 9 7 6
6 8 9 7 1 (0+6+1=7)
6 8 1 7 9 (7+1+9=17)
6 8 7 1 9 (17+1+7=25)
6 1 7 8 9 (25+1+8=34)
1 6 7 8 9 (34+1+6=41)
Since you needed to displace the 1, it seems that you may have to do an exhaustive search to complete the problem - the details of which were already posted by another user. Note that you will encounter problems if the dataset is large when doing this method.
If the problem allows for "close" answers, you can simply make a greedy algorithm that puts the largest item into position - either doing so directly, or by swapping the smallest element into that slot first.
Since comparisons and traversals apparently come for free, you can pre-calculate the "distance" each number must travel (and, effectively, the final sort order). The puzzle is the swap algorithm.
Minimizing overall swaps is obviously important.
Minimizing swaps of larger numbers is also important.
I'm pretty sure an optimal swap process cannot be guaranteed by evaluating each ordering in a stateless fashion, although you might frequently come close (not the challenge).
I think there is no trivial solution to this problem, and my approach is likely no better than the priority queue approach.
Find the smallest number, N.
Any pair of numbers that occupy each other's desired locations should be swapped, except for N.
Assemble (by brute force) a collection of every set of numbers that can be mutually swapped into their desired locations, such that the cost of sorting the set amongst itself is less than the cost of swapping every element of the set with N.
These sets will comprise a number of cycles. Swap within those cycles in such a way that the smallest number is swapped twice.
Swap all remaining numbers, which comprise a cycle including N, using N as a placeholder.
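Putting the two options from this list together, a sketch (assuming distinct values) that resolves each cycle either within itself or via the globally smallest number N as a placeholder, whichever is cheaper; it reproduces the totals from the question's examples (4, 41, 34):

    def min_total_swap_cost(arr):
        srt = sorted(arr)
        target = {v: i for i, v in enumerate(srt)}   # value -> final index (distinct values assumed)
        global_min = srt[0]                          # this is N in the steps above
        seen, total = [False] * len(arr), 0
        for i in range(len(arr)):
            if seen[i]:
                continue
            cycle, j = [], i                         # collect one cycle
            while not seen[j]:
                seen[j] = True
                cycle.append(arr[j])
                j = target[arr[j]]
            if len(cycle) < 2:
                continue
            s, m, k = sum(cycle), min(cycle), len(cycle)
            internal = s + (k - 2) * m               # sort the cycle amongst itself
            with_helper = s + m + (k + 1) * global_min   # route through N as a placeholder
            total += min(internal, with_helper)
        return total

    for example in ([3, 2, 1], [1, 8, 9, 7, 6], [8, 4, 5, 3, 2, 7]):
        print(example, min_total_swap_cost(example))   # 4, 41, 34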
As a hint, this reeks of dynamic programming; that might not be precise enough a hint to help, but I'd rather start with too little!
You are charged by the number of swaps, not by the number of comparisons. Nor did you mention being charged for keeping other records.
