Performance of list concatenation implemented as a left fold

Consider list concatenation implemented as a left fold, that is, foldl (++) [].
What is the complexity of this implementation in a lazy-evaluated language, such as Haskell?
I understand that in a strict language the performance is quadratic in the total number of elements, but what happens when laziness is involved?
I've tried to evaluate by hand the expression ([1,2,3] ++ [4,5,6]) ++ [7,8,9] (which corresponds to foldl (++) [] [[1,2,3], [4,5,6], [7,8,9]])
and it seems that we traverse only once each element, but I'm not sure whether my reasoning is correct:
([1,2,3] ++ [4,5,6]) ++ [7,8,9]
= { rewrite expression in prefix notation }
(++) ((++) [1,2,3] [4,5,6]) [7,8,9]
= { the first (++) operation needs to pattern match on its first argument; so it evaluates the first argument, which pattern matches on [1,2,3] }
(++) (case [1,2,3] of {[] -> [4,5,6]; x:xs' -> x:(++) xs' [4,5,6]}) [7,8,9]
= { x = 1, xs' = [2,3] }
(++) (1:(++) [2,3] [4,5,6]) [7,8,9]
= { the first (++) operation can now pattern match on its first argument }
1:([2,3] ++ [4,5,6]) ++ [7,8,9]
I have assumed the following implementation of (++):
(++) :: [a] -> [a] -> [a]
xs ++ ys = case xs of
  []      -> ys
  (x:xs') -> x : (xs' ++ ys)

Let's say we have ([1,2,3]++[4,5,6])++[7,8,9]
([1,2,3]++[4,5,6])++[7,8,9]
(1:([2,3]++[4,5,6]))++[7,8,9]
1:(([2,3]++[4,5,6])++[7,8,9])
1:((2:([3]++[4,5,6]))++[7,8,9])
1:2:(([3]++[4,5,6])++[7,8,9])
1:2:((3:([]++[4,5,6]))++[7,8,9])
1:2:3:(([]++[4,5,6])++[7,8,9])
1:2:3:([4,5,6]++[7,8,9])
1:2:3:4:([5,6] ++ [7,8,9])
1:2:3:4:5:([6] ++ [7,8,9])
1:2:3:4:5:6:([] ++ [7,8,9])
1:2:3:4:5:6:[7,8,9]
[1,2,3,4,5,6,7,8,9]
Notice how every element in the first list had to be moved twice? That's because its list was two lists from the end. In general, if we have (((a1++a2)++a3)++...++an), each element of the list ai has to be moved n-i times.
So, if you want the whole list, it's quadratic. If you want only the first element and you have n lists, it's n-1* operations (we need to take the step case of ++ once for each of the n-1 applications). If you want the ith element, it's the number of operations for all the elements before it, plus k-1, where its list is the kth one counting from the end.
*Plus the n operations from foldl itself, if we want to be pedantic
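For contrast, here is a sketch of my own (not part of the answer above): if the same lists are combined right-associatively, which is what foldr (++) [] (essentially Prelude's concat) does, then each (++) only ever walks its own left argument, so every element is consed exactly once:
concatR :: [[a]] -> [a]
concatR = foldr (++) []
-- concatR [[1,2,3],[4,5,6],[7,8,9]]
--   = [1,2,3] ++ ([4,5,6] ++ ([7,8,9] ++ []))
-- Each element passes through only the single (++) whose left list contains it
-- (the last list is returned as-is), so producing the whole result takes
-- O(total length) steps and the head is available after one step.
That is why the usual advice is to associate appends to the right, or to use foldr (++) [] / concat rather than foldl (++) [].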

Finding a "Count Sequence"

Given a list of integers xs, let:
count :: [Integer] -> Integer -> Integer
count xs n = fromIntegral . length . filter (== n) $ xs
count the number of times the integer n occurs in the list.
Now, given a "list" (some sort of array of integers, can be something besides a List) of length n, write a function
countSequence :: [Integer] -> Integer -> Integer -> [Integer]
countSequence xs n m = [count xs x | x <- [0..m]]
that outputs the "list of counts" (the 0th index contains the number of times 0 occurs in the list, the 1st index the number of times 1 occurs, and so on) and that has time complexity o(m*n)
The above implementation I've given has complexity O(m*n). In Python (which I'm more familiar with), it's easy to do this in O(m + n) time: iterate through the list and, for each element, increment a counter in some other list, which has length m+1 and is initialized to all zeros.
How could I get a better implementation in Haskell? I'd prefer if it weren't some trivial way to implement the Python solution (such as adding another argument to the function to keep the "list of counts" in and then iterating through it).
In O(n+m) (sort of, I think, maybe):
import Data.List (foldl')
import Data.Ix (inRange)
import qualified Data.IntMap.Strict as IM

countSequence m =
    foldl' count IM.empty . filter (inRange (0,m))
  where
    count a b = IM.insertWith (+) b 1 a
gives
> countSequence 2 [1,2,3,1,2,-1]
fromList [(1,2),(2,2)]
I haven't used n because you also didn't use n and I'm not sure what it's supposed to be. I also moved the list to the last argument to put it in a position to be eta reduced.
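If you want the flat list of counts for 0..m rather than the map itself, here is a small follow-up sketch of my own (countsList is just an illustrative name): build the map once and look up each index with a default of 0. IntMap operations are logarithmic in the number of keys (bounded by the word size), so this is close to, though not strictly, O(n + m), which is presumably the "(sort of, I think, maybe)" above.
countsList :: Int -> [Int] -> [Int]
countsList m xs = [IM.findWithDefault 0 i im | i <- [0 .. m]]
  where
    im = countSequence m xs   -- the map is built once, not once per index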
I think you should use your Python intuition -- iterate through the one list and increment a counter in another list. Here's an implementation with O(n+m) runtime:
import Data.Array
countSequence xs m = accumArray (+) 0 (0,m) [(x, 1) | x <- xs, inRange (0,m) x]
(This use case is even the motivating example for the existence of accumArray in the documentation!) In ghci:
> countSequence ([1..5] ++ [1,3..5] ++ [1,4..5] ++ [1,5]) 3
array (0,3) [(0,0),(1,4),(2,1),(3,2)]
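If a plain list of counts is wanted rather than an Array, elems (also exported by Data.Array) lists the values in index order; this is just a small aside of mine, not part of the answer above:
> elems (countSequence ([1..5] ++ [1,3..5] ++ [1,4..5] ++ [1,5]) 3)
[0,4,1,2]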
I guess using Data.IntMap would be as efficient as it gets for this job. One foldr pass is done to establish the IntMap (cm), and then a map constructs a new list holding the counts of the elements at the corresponding positions.
import qualified Data.IntMap.Lazy as IM
countSequence :: [Int] -> [Int]
countSequence xs = map (\x -> IM.findWithDefault 0 x cm) xs
  where
    -- build the count map once, in a single foldr pass, then look it up per element
    cm = foldr (\x m -> IM.alter (\mx -> if mx == Nothing then Just 1 else fmap (+1) mx) x m) IM.empty xs
*Main> countSequence [1,2,5,1,3,7,8,5,6,4,1,2,3,7,9,3,4,8]
[3,2,2,3,3,2,2,2,1,2,3,2,3,2,1,3,2,2]
*Main> countSequence [4,5,4]
[2,1,2]
*Main> countSequence [9,8,7,6,5]
[1,1,1,1,1]
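Note that this variant answers a slightly different question than the ones above: for each position of the input it reports how often that element occurs, rather than giving the counts for 0..m. A tiny check of my own:
*Main> countSequence [0,0,3]
[2,2,1]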

Haskell Recursion - finding largest difference between numbers in list

Here's the problem at hand: I need to find the largest difference between adjacent numbers in a list using recursion. Take the following list for example: [1,2,5,6,7,9]. The largest difference between two adjacent numbers is 3 (between 2 and 5).
I know that recursion may not be the best solution, but I'm trying to improve my ability to use recursion in Haskell.
Here's the code I currently have:
largestDiff (x:y:xs) = if (length (y:xs) > 1) then max((x-y), largestDiff (y:xs)) else 0
Basically - the list will keep getting shorter until it reaches 1 (i.e. no more numbers can be compared, then it returns 0). As 0 passes up the call stack, the max function is then used to implement a 'King of the Hill' type algorithm. Finally - at the end of the call stack, the largest number should be returned.
Trouble is, I'm getting an error in my code that I can't work around:
Occurs check: cannot construct the infinite type:
t1 = (t0, t1) -> (t0, t1)
In the return type of a call of `largestDiff'
Probable cause: `largestDiff' is applied to too few arguments
In the expression: largestDiff (y : xs)
In the first argument of `max', namely
`((x - y), largestDiff (y : xs))'
Anyone have some words of wisdom to share?
Thanks for your time!
EDIT: Thanks everyone for your time - I ended up independently discovering a much simpler way after much trial and error.
largestDiff [] = error "List too small"
largestDiff [x] = error "List too small"
largestDiff [x,y] = abs(x-y)
largestDiff (x:y:xs) = max(abs(x-y)) (largestDiff (y:xs))
Thanks again, all!
So the reason your code is throwing an error is this expression:
max((x-y), largestDiff (y:xs))
In Haskell, you do not wrap the arguments in parentheses and separate them with commas; the correct syntax is
max (x - y) (largestDiff (y:xs))
The syntax you used is getting parsed as
max ((x - y), largestDiff (y:xs))
Which looks like you're passing a tuple to max!
However, this does not solve the problem: I always got 0 back (for an increasing list every x - y difference is negative, so the 0 from the base case always wins). Instead, I would recommend breaking the problem into two functions. You want to calculate the maximum of the differences, so first write a function to calculate the differences and then a function to calculate the maximum of those:
diffs :: Num a => [a] -> [a]
diffs [] = [] -- No elements case
diffs [x] = [] -- One element case
diffs (x:y:xs) = y - x : diffs (y:xs) -- Two or more elements case
largestDiff :: (Ord a, Num a) => [a] -> a
largestDiff xs = maximum $ map abs $ diffs xs
Notice how I've pulled the recursion out into the simplest possible case. We didn't need to calculate the maximum as we traversed the list; it's possible, just more complex. Since Haskell has a handy built-in function for calculating the maximum of a list for us, we can also leverage that. Our recursive function is clean and simple, and it is then combined with maximum to implement the desired largestDiff. As an FYI, diffs is really just a function to compute the discrete derivative of a list of numbers; it can be a very useful function for data processing.
EDIT: Needed Ord constraint on largestDiff and added in map abs before calculating maximum.
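As a quick check of my own in ghci, assuming the definitions above:
> diffs [1,2,5,6,7,9]
[1,3,1,1,2]
> largestDiff [1,2,5,6,7,9]
3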
Here's my take at it.
First some helpers:
diff a b = abs(a-b)
pick a b = if a > b then a else b
Then the solution:
mdiff :: [Int] -> Int
mdiff [] = 0
mdiff [_] = 0
mdiff (a:b:xs) = pick (diff a b) (mdiff (b:xs))
You have to provide two base clauses, one for the empty list and one for a single-element list, because neither has a pair of adjacent elements left to compare.
Another solution to this problem, which circumvents your error, can be obtained
by just transforming lists and folding/reducing them.
import Data.List (foldl')
diffs :: (Num a) => [a] -> [a]
diffs x = zipWith (-) x (drop 1 x)
absMax :: (Ord a, Num a) => [a] -> a
absMax x = foldl' max (fromInteger 0) (map abs x)
Now I admit this is a bit dense for a beginner, so I will explain the above.
The function zipWith transforms two given lists by using a binary function,
which is (-) in this case.
The second list we pass to zipWith is drop 1 x, which is just another way of
describing the tail of a list, but where tail [] results in an error,
drop 1 [] just yields the empty list. So drop 1 is the "safer" choice.
So the first function calculates the adjacent differences.
The name of the second function suggests that it calculates the maximum absolute
value of a given list, which is only partly true: it results in "0" if passed an
empty list.
But how does this happen? Reading from right to left, we see that map abs
transforms every list element to its absolute value, which is asserted by
the Num a constraint. Then the foldl'-function traverses the list and
accumulates the maximum of the previous accumulator and the current element of
the list traversal. Moreover I'd like to mention that foldl' is the "strict"
sister/brother of the foldl-function, where the latter is rarely of use,
because it tends to build up a bunch of unevaluated expressions called thunks.
So let's quit all this blah blah and see it in action ;-)
> let a = diffs [1..3] :: [Int]
>>> zipWith (-) [1,2,3] (drop 1 [1,2,3])
<=> zipWith (-) [1,2,3] [2,3]
<=> [1-2,2-3] -- zipWith stops at the end of the SHORTER list
<=> [-1,-1]
> let b = absMax a
>>> foldl' max (fromInteger 0) (map abs [-1,-1])
-- fromInteger 0 is in this case just 0 - interesting stuff only happens
-- for other numerical types
<=> foldl' max 0 (map abs [-1,-1])
<=> foldl' max 0 [1,1]
<=> foldl' max (max 0 1) [1]
<=> foldl' max 1 [1]
<=> foldl' max (max 1 1) []
<=> foldl' max 1 [] -- foldl' _ acc [] returns just the accumulator
<=> 1
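Putting the two pieces together (my own wrap-up, assuming the definitions above), the original problem is just the composition of the two functions:
largestDiff :: (Ord a, Num a) => [a] -> a
largestDiff = absMax . diffs
-- largestDiff [1,2,5,6,7,9]  evaluates to  3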

Slower execution when using an infinite list

I'm beginning to try and get my head round Haskell performance, and what makes things fast and slow, and I'm a little confused by this.
I have two implementations of a function that generates a list of primes up to a certain value. The first is straight off the Haskell wiki:
primesTo :: (Ord a, Num a, Enum a) => a -> [a]
primesTo m = eratos [2..m]
  where
    eratos []     = []
    eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..m])
The second is the same, but using an infinite list internally:
primes2 :: (Ord a, Num a, Enum a) => a -> [a]
primes2 m = takeWhile (<= m) (eratos [2..])
  where
    eratos []     = []
    eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..])
In both cases, the minus function is:
minus :: (Ord a) => [a] -> [a] -> [a]
minus (x:xs) (y:ys) = case compare x y of
  LT -> x : minus xs (y:ys)
  EQ -> minus xs ys
  GT -> minus (x:xs) ys
minus xs _ = xs
The latter implementation is significantly (~100x) slower than the former, and I don't get why. I would have thought that Haskell's lazy evaluation would make them fairly equivalent under the hood.
This is obviously a reduced test case for the purposes of the question - in real life the optimisation would be no problem (although I don't understand why it is needed) - but to me a function that generates an infinite list of primes is more generically useful than one bounded up front, yet it appears to be slower to work with.
It looks to me like there's a big difference between
(xs `minus` [p*p, p*p+p..m]) -- primesTo
(xs `minus` [p*p, p*p+p..]) -- primes2
The function minus steps through lists pairwise and terminates when one list reaches the end. In the first minus expression above, this occurs in no more than (m-p*p)/p steps when the latter list is exhausted. In the second one, it will always take steps on the order of length xs.
So your infinite lists have disabled at least one meaningful optimization.
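To see this concretely, here is a small example of my own using the minus above: once a bounded second list is exhausted, the final clause fires and the rest of the first list is returned untouched, whereas with an infinite second list every remaining element of the first list still has to be compared and re-consed one by one.
> [10..20] `minus` [12,15,18]
[10,11,13,14,16,17,19,20]
-- after 18 the clause `minus xs _ = xs` returns [19,20] immediately;
-- with [12,15..] instead, 19 and 20 would each be compared against 21 first.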
One difference is that in the second case you need to generate one extra prime: you need to produce the first prime greater than m before takeWhile knows it's time to stop.
Additionally, the [..m] bounds on both the list to filter and the lists of multiples help reduce the number of calculations. Whenever one of these lists becomes empty, minus immediately returns via its second clause, while in the infinite case minus keeps taking its first clause. You can explore this a bit better if you also test the cases where only one of the lists is infinite:
--this is also slow
primes3 :: (Ord a, Num a, Enum a) => a -> [a]
primes3 m = takeWhile (<= m) (eratos [2..m])
  where
    eratos []     = []
    eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..])

--this is fast
primes4 :: (Ord a, Num a, Enum a) => a -> [a]
primes4 m = takeWhile (<= m) (eratos [2..])
  where
    eratos []     = []
    eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..m])
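As a quick sanity check of my own in ghci (assuming the definitions above), the fast and slow variants agree on the result; only the amount of work inside minus differs, which shows up for larger m:
> primes4 30
[2,3,5,7,11,13,17,19,23,29]
> primes3 30
[2,3,5,7,11,13,17,19,23,29]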

most idiomatic way to implement recursive list comprehension in F#

the question in short: What is the most idiomatic way to do "recursive List comprehension" in F#?
more detailed: As I have learned so far (I am new to F#), we have essentially the following tools to "build up" lists: List.map and list comprehension. Imho they both do more or less the same thing: they generate a list by "altering" the elements of a given list (in the case of a comprehension, the given list is of the form [k..n]).
What I want to do is to inductively build up lists (before people ask: for no other reason than curiosity), i.e. is there any built-in function with the behavior one would expect from a function called something like "List.maplist", which might take as arguments
a function f : 'a List -> 'a and an n : int,
returning the list
[... ; f [f []] ; f [] ] of length n.
To illustrate what I mean, I wrote such a function on my own (as an exercise):
let rec recListComprehension f n =
    if n=0 then []
    else
        let oldList = recListComprehension f (n-1)
        f (oldList) :: oldList
or a bit less readable but in turn tail recursive:
let rec tailListComprehension f n list =
    if n=0 then list
    else tailListComprehension f (n-1) ((f list)::list)
let trecListComprehension f n = tailListComprehension f n []
For example, a list containing the first 200 Fibonacci numbers can be generated by
let fiboGen =
    function
    | a::b::tail -> a+b
    | _ -> 1UL
trecListComprehension (fiboGen) 200
to sum up the question: Is there a built-in function in F# that behaves more or less like "trecListComprehension", and if not, what is the most idiomatic way to achieve this sort of functionality?
PS: sorry for being a bit verbose..
What is the most idiomatic way to do "recursive List comprehension" in F#?
It's a matter of style. You will encounter higher-order functions more often. For certain situations, e.g. expressing nested computations or achieving laziness, using sequence expressions seems more natural.
To illustrate, here is your example written as a sequence expression:
let rec recListComprehension f n = seq {
    if n > 0 then
        let oldList = recListComprehension f (n-1)
        yield f oldList
        yield! oldList }
recListComprehension fiboGen 200 |> Seq.toList
You have a very readable function with both laziness and tail-recursiveness which you can't easily achieve by using Seq.unfold.
Similarly, a nested computation such as the cartesian product is more readable as a sequence expression / list comprehension:
let cartesian xs ys =
    [ for x in xs do
        for y in ys do
            yield (x, y) ]
than with higher-order functions:
let cartesian xs ys =
    List.collect (fun x -> List.map (fun y -> (x, y)) ys) xs
I once asked about the differences between list comprehensions and higher-order functions, which might be of interest to you.
You're basically folding over the numeric range. So it could be written:
let listComp f n = List.fold (fun xs _ -> f xs :: xs) [] [1 .. n]
This has the added benefit of gracefully handling negative values of n.
You could do a Seq.unfold and then do Seq.toList.
See the example from here:
let seq1 = Seq.unfold (fun state -> if (state > 20) then None else Some(state, state + 1)) 0
printfn "The sequence seq1 contains numbers from 0 to 20."
for x in seq1 do printf "%d " x
let fib = Seq.unfold (fun state ->
    if (snd state > 1000) then None
    else Some(fst state + snd state, (snd state, fst state + snd state))) (1,1)
printfn "\nThe sequence fib contains Fibonacci numbers."
for x in fib do printf "%d " x

Functional O(1) append and O(n) iteration from first element list data structure

I'm looking for a functional data structure that supports the following operations:
Append, O(1)
In order iteration, O(n)
A normal functional linked list only supports O(n) append. I could use a normal linked list and then reverse it at the end, but that reverse operation is also O(n), which (partially) negates the O(1) cons operation.
You can use John Hughes's constant-time append lists, which seem nowadays to be called DList. The representation is a function from lists to lists: the empty list is the identity function; append is composition, and singleton is cons (partially applied). In this representation every enumeration will cost you n allocations, so that may not be so good.
The alternative is to make the same algebra as a data structure:
type 'a seq = Empty | Single of 'a | Append of 'a seq * 'a seq
Enumeration is a tree walk, which will either cost some stack space or will require some kind of zipper representation. Here's a tree walk that converts to list but uses stack space:
let to_list t =
  let rec walk t xs = match t with
    | Empty -> xs
    | Single x -> x :: xs
    | Append (t1, t2) -> walk t1 (walk t2 xs) in
  walk t []
Here's the same, but using constant stack space:
let to_list' t =
  let rec walk lefts t xs = match t with
    | Empty -> finish lefts xs
    | Single x -> finish lefts (x :: xs)
    | Append (t1, t2) -> walk (t1 :: lefts) t2 xs
  and finish lefts xs = match lefts with
    | [] -> xs
    | t::ts -> walk ts t xs in
  walk [] t []
You can write a fold function that visits the same elements but doesn't actually reify the list; just replace cons and nil with something more general:
val fold : ('a * 'b -> 'b) -> 'b -> 'a seq -> 'b
let fold f z t =
  let rec walk lefts t xs = match t with
    | Empty -> finish lefts xs
    | Single x -> finish lefts (f (x, xs))
    | Append (t1, t2) -> walk (t1 :: lefts) t2 xs
  and finish lefts xs = match lefts with
    | [] -> xs
    | t::ts -> walk ts t xs in
  walk [] t z
That's your linear-time, constant-stack enumeration. Have fun!
I believe you can just use a standard functional linked list:
To append an element, you can use cons (which is O(1)).
To iterate over the elements in the order in which they were inserted, you can first reverse the list
(which is O(N)) and then traverse it, which is also O(N) (and 2xO(N) is still just O(N)).
How about a difference list?
type 'a DList = DList of ('a list -> 'a list)

module DList =
    let append (DList f) (DList g) = (DList (f << g))
    let cons x (DList f) = (DList (fun l -> x::(f l)))
    let snoc (DList f) x = (DList (fun l -> f(x::l)))
    let empty = DList id
    let ofList = List.fold snoc empty
    let toList (DList f) = f []
You could create a functional Deque, which provides O(1) adds to either end and O(N) iteration in either direction. Eric Lippert wrote about an interesting version of an immutable Deque on his blog; note that if you look around you will find the other parts of the series, but that post is the explanation of the final product. Note also that with a bit of tweaking it can be modified to use F# discriminated unions and pattern matching (although that is up to you).
Another interesting property of this version: O(1) peek, removal, and add from either end (i.e. alternating dequeueLeft, dequeueRight, dequeueLeft, dequeueRight, etc. is still O(N) overall, versus O(N*N) with a double-list method).
What about a circularly-linked list? It supports O(1) appends and O(n) iteration.
