Is it O(L1 + L2) or O(max(L1, L2))? - data-structures

We have two lists of lengths L1 and L2, and we traverse both lists one after the other. What is the time complexity of the overall operation?
Is it O(L1 + L2) or O(max(L1, L2))?
What is the difference between the two?

The first one, O(L1 + L2), is appropriate. For instance, in graph algorithms that use V for the number of vertices and E for the number of edges, many operations are expressed as O(V + E), such as a depth-first search of the graph. Of course, in this case E may range from O(V) to O(V^2). If L1 and L2 are fixed in relation to each other, then O(max(L1, L2)) = O(L1) or O(L2) may be more appropriate.

There is no difference between the two. Without loss of generality, assume L1 = O(L2); if it's not, then L2 = O(L1) and you can just swap the symbols.
O(L1 + L2) = O(2*L2) = O(L2). Similarly, O(max(L1, L2)) = O(L2). So in both cases, the complexity is O(L2).
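A one-line way to see that the two bounds coincide, in the same notation: since both lengths are non-negative,
max(L1, L2) <= L1 + L2 <= 2 * max(L1, L2)
so O(L1 + L2) and O(max(L1, L2)) denote the same complexity class.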

Related

calculate the running time of a function

I have trouble coming up with the running time of a function which calls other functions. For example, here is a function that converts a binary tree to a list:
(define (tree->list-1 tree)
  (if (null? tree)
      '()
      (append (tree->list-1 (left-branch tree))
              (cons (entry tree)
                    (tree->list-1 (right-branch tree))))))
The explanation is T(n) = 2*T(n/2) + O(n/2), because the procedure append takes linear time.
Solving the above equation, we get T(n) = O(n * log n).
However, cons is also a procedure that combines two elements. In this case it goes through all the entry nodes, so why don't we add another O(n) to the solution?
Thank you for any help.
Consider O(n^2), which is clearly quadratic.
Now consider O(n^2 + n); this is still quadratic, hence we can reduce it to O(n^2), as the + n is not significant (it does not change the "order of magnitude", if that is the right term).
The same applies here, so we can reduce O([n*log(n)] + n) to O(n*log(n)). However, we may not reduce it to O(log(n)), as that would be logarithmic, which this is not.
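One way to make that reduction explicit, in the same notation (valid for n >= 2 with base-2 logarithms, so log(n) >= 1):
n <= n*log(n), so n*log(n) + n <= 2*n*log(n)
and the constant factor 2 is dropped in big-O, leaving O(n*log(n)).
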
If I understand correctly, you are asking about the difference between append and cons.
The time used by (cons a b) does not depend on the values of a and b. The call allocates some memory, tags it with a type tag ("pair") and stores pointers to the values a and b in the pair.
Compare this to (append xs ys). Here append needs to make a new list consisting of the elements in both xs and ys. This means that if xs is a list of n elements, then append needs to allocate n new pairs to hold the elements of xs.
In short: append needs to copy the elements of xs, and thus its time is proportional to the length of xs. The function cons uses the same constant time no matter what arguments it is called with.
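As a minimal sketch of why that is (written in Haskell to match the other examples on this page, rather than Scheme), this is the textbook definition of list append: it allocates one new cell per element of the first list, whereas a single cons allocates exactly one cell.
-- One new cell is built for every element of xs, so appending is linear in
-- the length of xs; the second list ys is shared, not copied.
appendSketch :: [a] -> [a] -> [a]
appendSketch []     ys = ys
appendSketch (x:xs) ys = x : appendSketch xs ys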

The smallest free number - divide and conquer algorithm

I'm reading the book Pearls of Functional Algorithm Design and tried implementing the divide-and-conquer solution for the smallest free number problem.
import Data.List (partition)

minfree xs = minfrom 0 (length xs) xs

minfrom a 0 _  = a
minfrom a n xs =
  if m == b - a
    then minfrom b (n - m) vs
    else minfrom a m us
  where
    b        = a + 1 + (n `div` 2)
    (us, vs) = partition (< b) xs
    m        = length us
But this one is no faster than what one might call the "naive" solution, which is
import Data.List ((\\))
minfree' = head . (\\) [0..]
I don't know why this is, what's wrong with the divide-and-conquer algorithm, or how to improve it.
I tried using BangPatterns and a version of partition that also returns the first list's length in the tuple, so that it eliminates the additional traversal for m = length us. Neither made an improvement.
The first one takes more than 5 seconds, whereas the second one finishes almost instantly in GHCi on the input [0..9999999].
You have pathological input on which head . (\\) [0..] performs in O(N) time. \\ is defined as follows:
(\\) = foldl (flip delete)
delete x xs is an O(N) operation that removes the first x from xs. foldl (flip delete) xs ys deletes all elements of ys from xs one by one.
In [0..] \\ [0..9999999], we always find the next element to be deleted at the head of the list, so the result can be evaluated in linear time.
If you instead type minfree' (reverse [0..9999999]) into GHCi, that takes quadratic time and you find that it pretty much never finishes.
The divide-and-conquer algorithm on the other hand would not slow down on the reversed input.
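To make that concrete, here is a minimal sketch (a simplified version, not the actual Data.List source) of delete and of the fold that (\\) is built from; on already-sorted input each delete finds its target at the head of the remaining list, which is why the whole fold stays linear there, while on reversed input each delete scans deep into the list.
-- Simplified delete: walk the list until the first match and drop it.
deleteSketch :: Eq a => a -> [a] -> [a]
deleteSketch _ [] = []
deleteSketch x (y:ys)
  | x == y    = ys
  | otherwise = y : deleteSketch x ys

-- Same shape as the (\\) definition quoted above.
minusSketch :: Eq a => [a] -> [a] -> [a]
minusSketch = foldl (flip deleteSketch)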

How can I improve the complexity of a function that sorts a list for each point on it?

The following function:
sortByDist :: (Ord a, Floating a, RealFrac a) => [V2 a] -> Map (V2 a) [V2 a]
sortByDist graph = Map.fromList $ map sort graph where
  sort point = (point, sortBy (comparing (distance point)) graph)
Maps each point P on a list to a list of points ordered by their distance to P. So, for example, sortByDist [a, b, c, d] Map.! b is the list [b,a,c,d] if a is the nearest point to b, c is the 2nd nearest, d is the 3rd.
Since it performs an n * log n sort for each element, the complexity is n^2 * log n. This agrees with benchmarks of the time required to sort a list of N points:
points time
200 0m0.086s
400 0m0.389s
600 0m0.980s
800 0m1.838s
1000 0m2.994s
1200 0m4.350s
1400 0m6.477s
1600 0m8.726s
3200 0m39.216s
How much can this be improved theoretically? Is it possible to get it down to N * log N?
As luqui commented, using a quadtree or similar will probably help. Building the tree should take O(n log n): log n passes, each of them O(n) selection and partition. Once you have the tree, you can traverse it to build the lists. The difference between the lists for a node and its children should generally be small, and when some are large, that should tend to force others to be small. Using an adaptive sort (e.g., adaptive merge sort or adaptive splay sort) should thus give good performance, but analyzing the complexity will not be easy.
If you want to try to get some sharing, you will have to represent the lists using a sequence type (e.g. Data.Sequence) and then try to figure out relationships between squares at various scales. I have serious doubts about the potential of such an approach to reduce time complexity.
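As a very rough illustration of the "log n passes, each doing a partition" build step, here is a toy Haskell sketch. It deviates from the answer in two labelled ways: it uses a k-d-tree-style split on plain (Double, Double) pairs rather than a quadtree over V2, and it sorts at every level, which gives O(n log^2 n); a linear-time median selection in its place would give the O(n log n) build mentioned above.
import Data.List (sortBy)
import Data.Ord (comparing)

-- A k-d-tree node: a splitting point and the points on either side of it.
data KdTree = KdEmpty | KdNode (Double, Double) KdTree KdTree

-- Build by alternating the splitting axis (x, then y, then x, ...) and
-- recursing on the two halves around the median point.
buildKd :: Int -> [(Double, Double)] -> KdTree
buildKd _ [] = KdEmpty
buildKd depth pts =
  KdNode median (buildKd (depth + 1) before) (buildKd (depth + 1) after)
  where
    axis                     = if even depth then fst else snd
    sorted                   = sortBy (comparing axis) pts
    (before, median : after) = splitAt (length sorted `div` 2) sorted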

Subsets with equal sum

I want to calculate how many pairs of disjoint subsets S1 and S2 of a set S exist (S1 U S2 need not be all of S) for which the sum of the elements in S1 equals the sum of the elements in S2.
Say I have calculated the subset sums of all the possible 2^n subsets.
How do I find how many pairs of disjoint subsets have equal sums?
For a sum value A, can we use the count of subsets having sum A/2 to solve this?
As an example:
S = {1, 2, 3, 4}
The various possible S1 and S2 sets are:
S1 = {1,2} and S2 = {3}
S1 = {1,3} and S2 = {4}
S1 = {1,4} and S2 = {2,3}
Here is the link to the problem :
http://www.usaco.org/index.php?page=viewproblem2&cpid=139
[EDIT: Fixed stupid complexity mistakes. Thanks kash!]
Actually I believe you'll need to use the O(3^n) algorithm described here to answer this question -- the O(2^n) partitioning algorithm is only good enough to enumerate all pairs of disjoint subsets whose union is the entire ground set.
As described at the answer I linked to, for each element you are essentially deciding whether to:
Put it in the first set,
Put it in the second set, or
Ignore it.
Considering every possible way to do this generates a tree where each vertex has 3 children: hence O(3^n) time. One thing to note is that if you generate a solution (S1, S2) then you should not also count the solution (S2, S1): this can be achieved by always maintaining an asymmetry between the two sets as you build them up, e.g. enforcing that the smallest element in S1 must always be smaller than the smallest element in S2. (This asymmetry enforcement has the nice side-effect of halving the execution time :))
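A small Haskell sketch of this counting scheme (an illustration of the idea, not the linked answer's code; the names equalSumPairs and go are made up for it). One deliberate deviation: instead of maintaining an asymmetry while the sets are built, it subtracts the empty/empty case and halves the total at the end, which is valid as long as the elements are positive so that an empty set can only tie with another empty set.
-- Count unordered pairs of disjoint, non-empty subsets with equal sums.
-- Each element either goes into the first set, the second set, or neither.
equalSumPairs :: [Int] -> Int
equalSumPairs xs = (matches - 1) `div` 2   -- drop S1 = S2 = {}, then unorder the pairs
  where
    matches = go xs 0 0
    go []     s1 s2 = if s1 == s2 then 1 else 0
    go (y:ys) s1 s2 = go ys (s1 + y) s2    -- y into the first set
                    + go ys s1 (s2 + y)    -- y into the second set
                    + go ys s1 s2          -- y ignored
-- equalSumPairs [1,2,3,4] == 3, matching the three pairs listed in the question.
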
A speedup for a special (but perhaps common in practice) case
If you expect that there will be many small numbers in the set, there is another possible speedup available to you: First, sort all the numbers in the list in increasing order. Choose some maximum value m, the larger the better, but small enough that you can afford an m-size array of integers. We will now break the list of numbers into 2 parts that we will process separately: an initial list of numbers that sum to at most m (this list may be quite small), and the rest. Suppose the first k <= n numbers fit into the first list, and call this first list Sk. The rest of the original list we will call S'.
First, initialise a size-m array d[] of integers to all 0, and solve the problem for Sk as usual -- but instead of only recording the number of disjoint subsets having equal sums, increment d[abs(sum(Sk1) - sum(Sk2))] for every pair of disjoint subsets Sk1 and Sk2 formed from these first k numbers. (Also increment d[0] to count the case when Sk1 = Sk2 = {}.) The idea is that after this first phase has finished, d[i] will record the number of ways that 2 disjoint subsets having a difference of i can be generated from the first k elements of S.
Second, process the remainder (S') as usual -- but instead of only recording the number of disjoint subsets having equal sums, whenever abs(sum(S1') - sum(S2')) <= m, add d[abs(sum(S1') - sum(S2'))] to the total number of solutions. This is because we know that there are that many ways of building a pair of disjoint subsets from the first k elements having this difference -- and for each of these subset pairs (Sk1, Sk2), we can add the smaller-sum one of Sk1, Sk2 to the larger-sum one of S1', S2', and the other to the other, to wind up with a pair of disjoint subsets having equal sums.
Here is a Clojure solution.
It defines s to be the set {1, 2, 3, 4}.
Then all-subsets is defined to be a list of all subsets of size 1 to 3.
Once all the subsets are defined, it looks at all pairs of subsets and selects only the pairs that are not equal, do not union to the original set, and whose sums are equal:
(require 'clojure.set)
(use 'clojure.math.combinatorics)

(def s #{1, 2, 3, 4})

(def all-subsets (mapcat #(combinations s %) (take 3 (iterate inc 1))))

(for [x all-subsets
      y all-subsets
      :when (and (= (reduce + x) (reduce + y))
                 (not= s (clojure.set/union (set x) (set y)))
                 (not= x y))]
  [x y])
Produces the following:
([(3) (1 2)] [(4) (1 3)] [(1 2) (3)] [(1 3) (4)])

haskell: a data structure for storing ascending integers with a very fast lookup

(This question is related to my previous question, or rather to my answer to it.)
I want to store all cubes of natural numbers in a structure and look up specific integers to see if they are perfect cubes.
For example,
cubes = map (\x -> x*x*x) [1..]
is_cube n = n == (head $ dropWhile (<n) cubes)
It is much faster than calculating the cube root, but it has a complexity of O(n^(1/3)) (am I right?).
I think using a more complex data structure would be better.
For example, in C I could store the length of an already generated array (not a list, for faster indexing) and do a binary search. It would be O(log n) with a lower coefficient than in another answer to that question. The problem is, I can't express it in Haskell (and I don't think I should).
Or I can use a hash function (like mod). But I think it would be much more memory-consuming to have several lists (or a list of lists), and it won't lower the complexity of the lookup (still O(n^(1/3))), only the coefficient.
I thought about some kind of tree, but I don't have any clever ideas (sadly, I've never studied CS). I think the fact that all the integers are ascending will make my tree ill-balanced for lookups.
And I'm pretty sure this fact about ascending integers can be a great advantage for lookups, but I don't know how to use it properly (see my first solution, which I can't express in Haskell).
Several comments:
If you have finitely many cubes, put them in Data.IntSet. Lookup is logarithmic time. Algorithm is based on Patricia trees and a paper by Gill and Okasaki.
If you have infinitely many cubes in a sorted list, you can do a binary search: start at index 1 and double it logarithmically many times until you get something large enough, then take logarithmically many more steps to find your integer or rule it out. But unfortunately, with lists, every lookup is proportional to the size of the index, and you can't create an infinite array with constant-time lookup.
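For illustration, a rough sketch of that doubling-then-bisecting search done directly on the lazy cubes list from the question (the name galloping is made up for this sketch); every (!!) walks the list from the front, which is exactly the drawback just described.
-- Doubling search on an infinite sorted list, e.g. `galloping 27 cubes`.
-- Correct, but each (!!) costs time proportional to the index it reaches.
galloping :: Integer -> [Integer] -> Bool
galloping n xs = grow 1
  where
    grow i
      | xs !! i >= n = bisect (i `div` 2) i
      | otherwise    = grow (2 * i)
    bisect lo hi
      | lo > hi   = False
      | otherwise = case compare (xs !! mid) n of
          EQ -> True
          LT -> bisect (mid + 1) hi
          GT -> bisect lo (mid - 1)
      where mid = (lo + hi) `div` 2
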
With that background, I propose the following data structure:
A sorted list of sorted arrays of cubes. The array at position i contains exp(2,i) elements.
You then have a slightly more complicated form of binary search.
I'm not awake enough to do the analysis off the top of my head, but I believe this gets you to O((log n)^2) worst case.
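A rough Haskell sketch of that proposal (one possible reading of it, using Data.Vector for the arrays and chunk sizes 1, 2, 4, ...): skip whole chunks until the last cube in a chunk is at least n, then binary-search inside that single chunk.
import qualified Data.Vector as V

-- An infinite list of sorted chunks; chunk i holds 2^i consecutive cubes.
cubeChunks :: [V.Vector Integer]
cubeChunks = go 1 1
  where
    go start size =
      V.fromList [ k * k * k | k <- [start .. start + size - 1] ]
        : go (start + size) (2 * size)

isCube :: Integer -> Bool
isCube n = seek cubeChunks
  where
    seek (chunk : rest)
      | V.last chunk < n = seek rest     -- every cube in this chunk is below n
      | otherwise        = bsearch chunk 0 (V.length chunk - 1)
    seek [] = False                      -- never reached: cubeChunks is infinite
    bsearch chunk lo hi
      | lo > hi   = False
      | otherwise = case compare (chunk V.! mid) n of
          EQ -> True
          LT -> bsearch chunk (mid + 1) hi
          GT -> bsearch chunk lo (mid - 1)
      where mid = (lo + hi) `div` 2
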
You can do a Fibonacci search (or any other search you would like) over a lazy infinite tree:
data Tree a = Empty
            | Leaf a
            | Node a (Tree a) (Tree a)

rollout Empty        = []
rollout (Leaf a)     = [a]
rollout (Node x a b) = rollout a ++ x : rollout b

cubes = backbone 1 2
  where
    backbone a b = Node (b*b*b) (sub a b) (backbone (b+1) (a+b))

    sub a b | (a+1) == b = Leaf (a*a*a)
    sub a b | a == b     = Empty
    sub a b              = subBackbone a (a+1) b

    subBackbone a b c | b >= c = sub a c
    subBackbone a b c          = Node (b*b*b) (sub a b) (subBackbone (b+1) (a+b) c)

is_cube n = go cubes
  where
    go Empty        = False
    go (Leaf x)     = x == n
    go (Node x a b) = case compare n x of
      EQ -> True
      LT -> go a
      GT -> go b
