I'm trying to learn F# and was watching a video when something odd (at least, to me) came up. The video in question is here and the relevant part starts at 2:30 for those interested. But basically, the guy says that F# makes it awkward to work with arrays and that the designers did so on purpose because lists are easier to "prepend and append".
The question that immediately sprang to mind: isn't easy prepending and appending something that should be frowned upon in an immutable language? Specifically, I'm thinking of C#'s List<T>, where you can call list.Add(obj) and mutate the list in place. With an array you'd have to create an entirely new array, but that's also what would need to happen in an immutable language.
So why do the designers of F# prefer lists? What is the fundamental difference in an immutable environment between a list and an array? What am I missing? Are lists in F# really linked lists?
I would disagree that "F# makes it awkward to work with arrays". In fact, F# makes working with arrays quite nice compared to most languages.
For example, F# has literal array construction: let arr = [|1;2;3;4;|]. And perhaps even cooler, pattern matching on arrays:
match arr with
| [|1;2;_;_|] -> printfn "Starts with 1;2"
| [|_;_;3;4|] -> printfn "Ends with 3;4"
| _ -> printfn "Array not recognized"
As to why immutable singly-linked lists are preferred in functional programming languages like F#, there's a lot to say, but the short answer is that they allow O(1) prepending and let the implementation share nodes, so they are easy on memory. For example,
let x = [2;3]
let y = 1::x
Here y is created by prepending 1 to x, but x is neither modified nor copied, so making y was very cheap. We can see why this is possible: x points to the head, 2, of the initially constructed list and can only move forward, and since the elements of the list it points to can't be mutated, it doesn't matter that y shares nodes with it.
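One quick way to observe this sharing directly (a small check of my own, not from the original answer) is reference equality on the tail:

let x = [2; 3]
let y = 1 :: x
// The tail of y is the very same object as x, not a copy:
printfn "%b" (obj.ReferenceEquals(List.tail y, x))  // prints true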
In functional languages, lists are usually singly-linked lists, i.e. it is not necessary to copy the complete list: prepending (often called cons) is an O(1) operation, and you can still use the old list, because lists are immutable.
First of all, arrays are a pretty low-level data structure, and they are really only useful if you know the length of the array when creating it. That's not often the case, and that's a reason why C# programmers use System.Collections.Generic.List<T> and F# programmers use the F# list<T>.
The reason why F# prefers its own functional list rather than using .NET List<T> is that functional languages prefer immutable types. Instead of modifying the object by calling list.Add(x), you can create a new list with an item added to the front by writing let newList = x::list.
I also agree with Stephen that using arrays in F# is not awkward at all. If you know the number of elements you're working with or you're transforming some existing data source, then working with arrays is quite easy:
// You can create arrays using `init`
let a = Array.init 10 (fun i -> (* calculate i-th element here *) )
// You can transform arrays using `map` and `filter`
a |> Array.map (fun n -> n + 10)
  |> Array.filter (fun n -> n > 12)
// You can use array comprehensions:
let a2 = [| for n in a do
              if n > 12 then yield n + 10 |]
This is essentially the same as processing lists - there you would use list comprehensions [ ... ] and list processing functions such as List.map etc. The difference really appears just when initializing the list/array.
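For comparison, the list versions of the examples above look like this; only the brackets and module names change:

// List counterparts of the array examples:
let l = List.init 10 (fun i -> i * i)
let l2 = l |> List.map (fun n -> n + 10)
           |> List.filter (fun n -> n > 12)
let l3 = [ for n in l do
             if n > 12 then yield n + 10 ]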
F# makes it awkward to work with arrays
F# provides many features that make it easier to work with arrays than in other languages, including array literals, array patterns and higher-order functions.
The question that immediately sprang to mind: isn't easy prepending and appending something that should be frowned upon in an immutable language?
I believe you have misunderstood what that statement means. When people talk about prepending and appending in the context of purely functional data structures, they are referring to the creation of a new collection that is derived from (and shares most of its internals with) an existing collection.
So why do the designers of F# prefer lists?
F# inherited some list-related capabilities from OCaml which inherited them from Standard ML and ML because singly-linked immutable lists are very useful in the context of their application domain (metaprogramming) but I would not say that the designers of F# prefer lists.
What is the fundamental difference in an immutable environment between a list and an array?
In F#, lists provide O(1) prepend, O(n) append and O(n) random access, whereas arrays provide O(1) random access but require an O(n) copy to prepend or append. Arrays can be mutated but lists cannot.
What am I missing?
Basic knowledge of purely functional data structures. Read Okasaki.
Are lists in F# really linked lists?
Yes. Specifically, singly-linked immutable lists. In fact, in some MLs the list type can be defined as:
type 'a list =
| ([])
| (::) of 'a * 'a list
This is why the :: operator is a constructor and not a function, so you cannot write (::) as you can with, for example, (+).
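Because :: is a constructor, it also works in patterns, which is where it earns its keep (a standard example, not specific to any answer above):

// Deconstructing a list with the :: constructor:
let rec sum xs =
    match xs with
    | [] -> 0
    | head :: tail -> head + sum tail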
An F# list is more like the following datastructure - a single linked list:
public class List<T> {
    public List(T item, List<T> prev) { Item = item; Prev = prev; }
    public T Item { get; }
    public List<T> Prev { get; }
}
So when a new list is created, it is actually creating a single node with a reference to the first element of the previous list, rather than copying the entire array.
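To make the sharing concrete, here is how prepending looks with that class (hypothetical usage of the sketch above, not a real .NET API):

// [2; 3] built as node(2, node(3, null)):
var x = new List<int>(2, new List<int>(3, null));
// Prepending 1 allocates one new node whose Prev is x itself:
var y = new List<int>(1, x);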
The Haskell tutorial cautions that when we use "Hello" ++ " World", the new list construction has to visit every element of the left operand (here, every character of "Hello"), so if the list on the left of ++ is long, using ++ will hurt performance.
Am I misunderstanding this? Did Haskell's developers simply never tune the performance of list operations? Does this operation remain slow to preserve some kind of syntactic consistency with lambda functions or currying?
Any hints? Thanks.
In some languages, a "list" is a general-purpose sequence type intended to offer good performance for concatenation, splitting, etc. In Haskell, and most traditional functional languages, a list is a very specific data structure, namely a singly-linked list. If you want a general-purpose sequence type, you should use Data.Sequence from the containers package (which is already installed on your system and offers very good big-O asymptotics for a wide variety of operations), or perhaps some other one more heavily optimized for common usage patterns.
If you have an immutable list which has a head and a reference to the tail, you cannot change its tail. If you want to add something to the 'end' of the list, you have to reach the end and then rebuild the whole left-hand list, putting its items one by one onto the head of the right-hand list. It is a fundamental property of immutable lists: concatenation is expensive.
Haskell lists are like singly-linked lists: they are either empty or they consist of a head and a (possibly empty) tail. Hence, when appending something to a list, you'll first have to walk the entire list to get to the end. So you end up traversing the entire list (the list to which you append, that is), which needs O(n) runtime.
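You can see the cost directly in how (++) is defined, essentially, in the Prelude; the left operand is rebuilt cons by cons while the right one is reused as-is:

(++) :: [a] -> [a] -> [a]
[]     ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)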
What is the difference between "open-ended lists" and "difference lists"?
As explained at http://homepages.inf.ed.ac.uk/pbrna/prologbook/node180.html, an open list is a tool used to implement a difference list.
An open list is any list with an unassigned variable at some point, e.g. [a,b,c|X]. You can use an open list to implement a data structure called a difference list, which formally pairs two terms pointing to the first element and to the open end, traditionally written [a,b,c|X]-X, to make operating on such lists easier.
For example, if all you have is an open list, adding an element to the end is possible, but you need to iterate over all items. With a difference list you can just use the end-of-list variable (called a Hole on the page above) to skip the iteration and perform the operation in constant time.
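As a sketch of that constant-time operation (the predicate name is my own; the Xs-Ys pairing is the convention described above):

% O(1) concatenation of two difference lists:
% the hole of the first is unified with the front of the second.
append_dl(Xs-Ys, Ys-Zs, Xs-Zs).

% ?- append_dl([a,b,c|H1]-H1, [d,e|H2]-H2, R).
% R = [a,b,c,d,e|H2]-H2.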
Both notions seem to be lists, but in fact they are not. One is a concrete term, the other rather a convention.
Open-ended lists, partial lists
Open-ended lists are terms that are not lists but can be instantiated such that they become lists. In standard lingo, they are called partial lists. X, [a|X], and [X|X] are all partial lists.
The notion open-ended lists suggests a certain usage of such lists to simulate some open-ended state. Think of a dictionary that might be represented by an open-ended list. Every time you add a new item, the variable "at the end of the partial list" is instantiated to a new element. While this programming technique is quite possible in Prolog, it has one big downside: The programs will heavily depend on a procedural interpretation. And in many situations there is no way to have a declarative interpretation at all.
Difference lists
Difference lists are effectively not lists as such, but a certain way of using lists such that the intended list is represented by two variables: one for the start and one for the end of the list. For this reason it would help a lot to talk of list differences rather than difference lists.
Consider:
el(E, [E|L],L).
Here, the last two arguments can be seen as forming a difference: the list [E] containing the single element E. You can now construct more complex lists out of simpler ones, provided you respect certain conventions, which are essentially that the second argument is only passed further on. The differences as such are never compared to each other!
el2(E, F, L0,L) :-
   el(E, L0,L1),
   el(F, L1,L).
Note that this is merely a convention. The lists are not enforced. Think of:
?- el2(E, F, L, nonlist).
L = [E,F|nonlist].
This technique is also used to encode DCGs.
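For instance, a DCG rule like the following is expanded by the system into an ordinary predicate whose two extra arguments form a list difference (the expansion shown is the usual scheme, modulo system-specific details):

ab --> [a], [b].

% is translated to, essentially:
%   ab(S0, S) :- S0 = [a|S1], S1 = [b|S].
%
% ?- phrase(ab, [a,b]).
% true.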
For example
Open-ended : [a,b,c | _]
Difference-list : [a,b,c|U]-U.
I'm used to C++ STL containers. D has arrays, associative arrays, and strings, but where is the rest? I know about std.container, but as far as I can tell it only has one container, the red-black tree, which I could use if I needed something similar to std::set. But, what if I need a list? Am I supposed to use an array instead?
std::vector -> array
std::deque -> ?
std::queue -> ?
std::stack -> ? (maybe an array plus std.container functions?)
std::priority_queue -> BinaryHeap
std::list -> ?
std::set -> std.container RedBlackTree
std::multiset -> ?
std::unordered_set -> ?
std::map -> associative arrays
std::multimap -> ?
std::unordered_map -> ?
Are there any plans to support any of the missing?
I believe that the main holdup for getting more containers into std.container is that Andrei Alexandrescu has been sorting out how best to deal with custom allocators, and he wants to do that before implementing all of the sundry container types, because otherwise it's going to require a lot of code changes once he does.
In the interim, you have the built-in arrays and associative arrays, and std.container contains Array (which is essentially std::vector), SList (which is a singly-linked list), RedBlackTree (which can be used for any type of set or map which uses a tree - which is what the STL's various set and map types do), and BinaryHeap.
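A quick sketch of what is already usable today (std.container API as of D2/Phobos at the time of writing; details may have shifted since):

import std.container : Array, SList, RedBlackTree;

void main() {
    auto vec = Array!int(1, 2, 3);   // like std::vector
    vec.insertBack(4);

    SList!int stack;                 // singly-linked list, fine as a stack
    stack.insertFront(10);
    assert(stack.front == 10);

    auto set = new RedBlackTree!int; // like std::set
    set.insert(3);
    set.insert(1);
    assert(1 in set);
}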
So, there's no question that the situation needs to be improved (and it will), but I don't know how soon. Eventually, std.container should have container types which correspond to all of the STL container types.
Containers are a todo in terms of library development in D, but no one has gotten a comprehensive container library into Phobos because no one agrees on what the design should be, and everyone who contributes to the standard library (which has been growing very rapidly) has found more interesting things to work on.
std::vector -> array as you say
std::deque, std::queue: We don't have one yet, unfortunately.
std::stack: This can be trivially implemented on top of SList or an array.
std::set: This can be trivially implemented on top of RedBlackTree.
std::multiset: I think RedBlackTree can be set to allow duplicates.
std::unordered_set: This can be trivially implemented on top of the built-in associative array, using byte[0][SomeType] as the storage type (see the sketch after this list).
std::map: Can be trivially implemented on top of RedBlackTree.
std::multimap: You can probably use associative arrays of arrays for this.
std::unordered_map: Use the built-in associative arrays.
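Here is the byte[0] trick from the list above spelled out (a sketch; the alias is only for readability):

alias Unit = byte[0];   // zero-length payload, so entries store keys only

void main() {
    Unit[string] set;
    set["foo"] = Unit.init;
    assert("foo" in set);
    assert("bar" !in set);
}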
In one of my first attempts to create functional code, I ran into a performance issue.
I started with a common task - multiply the elements of two arrays and sum up the results:
var first:Array[Float] ...
var second:Array[Float] ...
var sum=0f;
for (ix <- 0 until first.length)
  sum += first(ix) * second(ix);
Here is how I rewrote it in a functional style:
sum = first.zip(second).map{ case (a,b) => a*b }.reduceLeft(_+_)
When I benchmarked the two approaches, the second method takes 40 times as long to complete!
Why does the second method take so much longer? How can I rewrite the work to be both speed-efficient and in a functional programming style?
The main reasons why these two examples are so different in speed are:
the faster one doesn't use any generics, so it doesn't incur boxing/unboxing.
the faster one doesn't create temporary collections and, thus, avoids extra memory copies.
Let's consider the slower one by parts. First:
first.zip(second)
That creates a new array, an array of Tuple2. It will copy all elements from both arrays into Tuple2 objects, and then copy a reference to each of these objects into a third array. Now, notice that Tuple2 is parameterized, so it can't store Float directly. Instead, new instances of java.lang.Float are created for each number, the numbers are stored in them, and then a reference for each of them is stored into the Tuple2.
map{ case (a,b) => a*b }
Now a fourth array is created. To compute the values of these elements, it needs to read the reference to the tuple from the third array, read the reference to the java.lang.Float stored in them, read the numbers, multiply, create a new java.lang.Float to store the result, and then pass this reference back, which will be de-referenced again to be stored in the array (arrays are not type-erased).
We are not finished, though. Here's the next part:
reduceLeft(_+_)
That one is relatively harmless, except that it still does boxing/unboxing and java.lang.Float creation at each iteration, since reduceLeft receives a Function2, which is parameterized.
Scala 2.8 introduces a feature called specialization which will get rid of a lot of these boxing/unboxing. But let's consider alternative faster versions. We could, for instance, do map and reduceLeft in a single step:
sum = first.zip(second).foldLeft(0f) { case (a, (b, c)) => a + b * c }
We could use view (Scala 2.8) or projection (Scala 2.7) to avoid creating intermediary collections altogether:
sum = first.view.zip(second).map{ case (a,b) => a*b }.reduceLeft(_+_)
This last one doesn't save much, actually, so I think the non-strictness is being "lost" pretty fast (i.e., one of these methods is strict even in a view). There's also an alternative way of zipping that is non-strict (i.e., avoids some intermediary results) by default:
sum = (first,second).zipped.map{ case (a,b) => a*b }.reduceLeft(_+_)
This gives a much better result than the former. Better than the foldLeft one, though not by much. Unfortunately, we can't combine zipped with foldLeft because the former doesn't support the latter.
The last one is the fastest I could get. Faster than that, only with specialization. Now, Function2 happens to be specialized, but only for Int, Long and Double. The other primitives were left out, as specialization increases code size rather dramatically for each primitive. In my tests, though, Double actually takes longer. That might be a result of it being twice the size, or it might be something I'm doing wrong.
So, in the end, the problem is a combination of factors, including producing intermediary copies of elements, and the way Java (the JVM) handles primitives and generics. Similar code in Haskell using supercompilation would be equal to anything short of assembler. On the JVM, you have to be aware of the trade-offs and be prepared to optimize critical code.
I did some variations of this with Scala 2.8. The loop version is as you write, but the functional version is slightly different:
(xs, ys).zipped map (_ * _) reduceLeft(_ + _)
I ran with Double instead of Float, because currently specialization only kicks in for Double. I then tested with arrays and vectors as the carrier type. Furthermore, I tested boxed variants which work on java.lang.Double's instead of primitive Doubles, to measure the effect of primitive type boxing and unboxing. Here is what I got (running Java 1.6_10 server VM, Scala 2.8 RC1, 5 runs per test):
loopArray 461 437 436 437 435
reduceArray 6573 6544 6718 6828 6554
loopVector 5877 5773 5775 5791 5657
reduceVector 5064 4880 4844 4828 4926
loopArrayBoxed 2627 2551 2569 2537 2546
reduceArrayBoxed 4809 4434 4496 4434 4365
loopVectorBoxed 7577 7450 7456 7463 7432
reduceVectorBoxed 5116 4903 5006 4957 5122
The first thing to notice is that by far the biggest difference is between primitive array loops and primitive array functional reduce. It's about a factor of 15 instead of the 40 you have seen, which reflects improvements in Scala 2.8 over 2.7. Still, primitive array loops are the fastest of all tests whereas primitive array reduces are the slowest. The reason is that primitive Java arrays and generic operations are just not a very good fit. Accessing elements of primitive Java arrays from generic functions requires a lot of boxing/unboxing and sometimes even requires reflection. Future versions of Scala will specialize the Array class and then we should see some improvement. But right now that's what it is.
If you go from arrays to vectors, you notice several things. First, the reduce version is now faster than the imperative loop! This is because vector reduce can make use of efficient bulk operations. Second, vector reduce is faster than array reduce, which illustrates the inherent overhead that arrays of primitive types pose for generic higher-order functions.
If you eliminate the overhead of boxing/unboxing by working only with boxed java.lang.Double values, the picture changes. Now reduce over arrays is a bit less than 2 times slower than looping, instead of the 15 times difference before. That more closely approximates the inherent overhead of the three loops with intermediate data structures instead of the fused loop of the imperative version. Looping over vectors is now by far the slowest solution, whereas reducing over vectors is a little bit slower than reducing over arrays.
So the overall answer is: it depends. If you have tight loops over arrays of primitive values, nothing beats an imperative loop. And there's no problem writing the loops because they are neither longer nor less comprehensible than the functional versions. In all other situations, the FP solution looks competitive.
This is a microbenchmark, and it depends on how the compiler optimizes your code. You have 3 loops composed here:
zip . map . fold
Now, I'm fairly sure the Scala compiler cannot fuse those three loops into a single loop, and the underlying data type is strict, so each (.) corresponds to an intermediate array being created. The imperative/mutable solution would reuse the buffer each time, avoiding copies.
Now, an understanding of what composing those three functions means is key to understanding performance in a functional programming language -- and indeed, in Haskell, those three loops will be optimized into a single loop that reuses an underlying buffer -- but Scala cannot do that.
There are benefits to sticking to the combinator approach, however -- by distinguishing those three functions, it will be easier to parallelize the code (replace map with parMap etc). In fact, given the right array type, (such as a parallel array) a sufficiently smart compiler will be able to automatically parallelize your code, yielding more performance wins.
So, in summary:
naive translations may have unexpected copies and inefficiencies
clever FP compilers remove this overhead (but Scala can't yet)
sticking to the high-level approach pays off if you want to retarget your code, e.g. to parallelize it (see the sketch below)
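For instance, with the parallel collections that landed after 2.8 (Scala 2.9+), the retargeting is a one-word change per collection; a hedged sketch:

// Same pipeline, parallelized by switching to parallel views of the arrays:
val sum = first.par.zip(second.par)
                   .map { case (a, b) => a * b }
                   .sum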
Don Stewart has a fine answer, but it might not be obvious how going from one loop to three creates a factor of 40 slowdown. I'll add to his answer that Scala compiles to JVM bytecodes, and not only does the Scala compiler not fuse the three loops into one, but it is almost certainly allocating all the intermediate arrays. Notoriously, implementations of the JVM are not designed to handle the allocation rates required by functional languages. Allocation is a significant cost in functional programs, and that's one reason the loop-fusion transformations that Don Stewart and his colleagues have implemented for Haskell are so powerful: they eliminate lots of allocations. When you don't have those transformations, plus you're using an expensive allocator such as is found on a typical JVM, that's where the big slowdown comes from.
Scala is a great vehicle for experimenting with the expressive power of an unusual mix of language ideas: classes, mixins, modules, functions, and so on. But it's a relatively young research language, and it runs on the JVM, so it's unreasonable to expect great performance except on the kind of code that JVMs are good at. If you want to experiment with the mix of language ideas that Scala offers, great—it's a really interesting design—but don't expect the same performance on pure functional code that you'd get with a mature compiler for a functional language, like GHC or MLton.
Is scala functional programming slower than traditional coding?
Not necessarily. Stuff to do with first-class functions, pattern matching, and currying need not be especially slow. But with Scala, more than with other implementations of other functional languages, you really have to watch out for allocations—they can be very expensive.
The Scala collections library is fully generic, and the operations provided are chosen for maximum capability, not maximum speed. So, yes, if you use a functional paradigm with Scala without paying attention (especially if you are using primitive data types), your code will take longer to run (in most cases) than if you use an imperative/iterative paradigm without paying attention.
That said, you can easily create non-generic functional operations that perform quickly for your desired task. In the case of working with pairs of floats, we might do the following:
class FastFloatOps(a: Array[Float]) {
  def fastMapOnto(f: Float => Float) = {
    var i = 0
    while (i < a.length) { a(i) = f(a(i)); i += 1 }
    this
  }
  def fastMapWith(b: Array[Float])(f: (Float, Float) => Float) = {
    val len = a.length min b.length
    val c = new Array[Float](len)
    var i = 0
    while (i < len) { c(i) = f(a(i), b(i)); i += 1 }
    c
  }
  def fastReduce(f: (Float, Float) => Float) = {
    if (a.length == 0) Float.NaN
    else {
      var r = a(0)
      var i = 1
      while (i < a.length) { r = f(r, a(i)); i += 1 }
      r
    }
  }
}
implicit def farray2fastfarray(a: Array[Float]) = new FastFloatOps(a)
and then these operations will be much faster. (Faster still if you use Double and 2.8.RC1, because then the functions (Double,Double)=>Double will be specialized, not generic; if you're using something earlier, you can create your own abstract class F { def f(a: Float) : Float } and then call with new F { def f(a: Float) = a*a } instead of (a: Float) => a*a.)
Anyway, the point is that it's not the functional style that makes functional coding in Scala slow, it's that the library is designed with maximum power/flexibility in mind, not maximum speed. This is sensible, since each person's speed requirements are typically subtly different, so it's hard to cover everyone supremely well. But if it's something you're doing more than just a little, you can write your own stuff where the performance penalty for a functional style is extremely small.
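For example, the original dot product then becomes (a usage sketch of the class above):

val first  = Array(1f, 2f, 3f)
val second = Array(4f, 5f, 6f)
// The implicit conversion to FastFloatOps kicks in on both calls:
val sum = first.fastMapWith(second)(_ * _).fastReduce(_ + _)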
I am not an expert Scala programmer, so there is probably a more efficient method, but what about something like this. This can be tail call optimized, so performance should be OK.
def multiply_and_sum(l1: List[Int], l2: List[Int], sum: Int): Int = {
  if (l1 != Nil && l2 != Nil)
    multiply_and_sum(l1.tail, l2.tail, sum + (l1.head * l2.head))
  else
    sum
}
val first = Array(1,2,3,4,5)
val second = Array(6,7,8,9,10)
multiply_and_sum(first.toList, second.toList, 0) //Returns: 130
To answer the question in the title: Simple functional constructs may be slower than imperative on the JVM.
But, if we consider only simple constructs, then we might as well throw out all modern languages and stick with C or assembler. If you look at the programming language shootout, C always wins.
So why choose a modern language? Because it lets you express a cleaner design. Cleaner design leads to performance gains in the overall operation of the application, even if some low-level methods are slower. One of my favorite examples is the performance of BuildR vs. Maven. BuildR is written in Ruby, an interpreted, slow language; Maven is written in Java. A build in BuildR is twice as fast as one in Maven, due mostly to the design of BuildR, which is lightweight compared with that of Maven.
Your functional solution is slow because it is generating unnecessary temporary data structures. Removing these is known as deforestation, and it is easily done in strict functional languages by rolling your anonymous functions into a single anonymous function and using a single aggregator. For example, your solution written in F# using zip, map and reduce:
let dot xs ys = Array.zip xs ys |> Array.map (fun (x, y) -> x * y) |> Array.reduce ( * )
may be rewritten using fold2 so as to avoid all temporary data structures:
let dot xs ys = Array.fold2 (fun t x y -> t + x * y) 0.0 xs ys
This is a lot faster and the same transformation can be done in Scala and other strict functional languages. In F#, you can also define the fold2 as inline in order to have the higher-order function inlined with its functional argument whereupon you recover the optimal performance of the imperative loop.
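A hedged sketch of that inlining trick (my own helper, written out rather than using Array.fold2, so the closure can disappear at the call site):

let inline fold2 f acc (xs: float []) (ys: float []) =
    let mutable t = acc
    for i in 0 .. xs.Length - 1 do
        t <- f t xs.[i] ys.[i]
    t

let dot xs ys = fold2 (fun t x y -> t + x * y) 0.0 xs ys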
Here is dbyrnes' solution with arrays (assuming arrays are to be used), just iterating over the index:
def multiplyAndSum(l1: Array[Int], l2: Array[Int]): Int = {
  def productSum(idx: Int, sum: Int): Int =
    if (idx < l1.length)
      productSum(idx + 1, sum + (l1(idx) * l2(idx)))
    else
      sum
  if (l2.length == l1.length)
    productSum(0, 0)
  else
    error("lengths don't fit " + l1.length + " != " + l2.length)
}
val first = (1 to 500).toArray
val second = (11 to 510).toArray

def loopi(n: Int) = (1 to n).foreach(dummy => multiplyAndSum(first, second))
println(timed(loopi(100 * 1000))) // 'timed' is a simple timing helper (not shown)
That needs about 1/40 of the time of the list approach. I don't have 2.8 installed, so you will have to test @tailrec yourself. :)
I have a function that takes a parameter and produces a result. Unfortunately, it takes quite long for the function to produce the result. The function is being called quite often with the same input, that's why it would be convenient if I could cache the results. Something like
let cachedFunction = createCache slowFunction
in (cachedFunction 3.1) + (cachedFunction 4.2) + (cachedFunction 3.1)
I was looking into Data.Array, and although the array is lazy, I need to initialize it with a list of pairs (using listArray), which is impractical. If the 'key' is e.g. the Double type, I cannot initialize it at all, and even if I could theoretically assign an Integer to every possible input, I have several tens of thousands of possible inputs and I only actually use a handful. I would need to initialize the array (or, preferably, a hash table, as only a handful of results will be used) using a function instead of a list.
Update: I am reading the memoization articles, and as far as I understand it, MemoTrie could work the way I want. Maybe. Could somebody try to produce the 'cachedFunction'? Preferably for a slow function that takes 2 Double arguments? Or, alternatively, one that takes an Int argument in a domain of ~ [0..1 billion] and wouldn't eat all memory?
Well, there's Data.HashTable. Hash tables don't tend to play nicely with immutable data and referential transparency, though, so I don't think it sees a lot of use.
For a small number of values, stashing them in a search tree (such as Data.Map) would probably be fast enough. If you can put up with doing some mangling of your Doubles, a more robust solution would be to use a trie-like structure, such as Data.IntMap; these have lookup times proportional primarily to key length, and roughly constant in collection size. If Int is too limiting, you can dig around on Hackage to find trie libraries that are more flexible in the type of key used.
As for how to cache the results, I think what you want is usually called "memoization". If you want to compute and memoize results on demand, the gist of the technique is to define an indexed data structure containing all possible results, in such a way that when you ask for a specific result it forces only the computations needed to get the answer you want. Common examples usually involve indexing into a list, but the same principle should apply for any non-strict data structure. As a rule of thumb, non-function values (including infinite recursive data structures) will often be cached by the runtime, but not function results, so the trick is to wrap all of your computations inside a top-level definition that doesn't depend on any arguments.
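The canonical small example of that trick is Fibonacci memoized through a top-level list (standard folklore, not tied to any particular library):

-- fibs is a top-level value, so its cells are computed once and shared:
fibs :: [Integer]
fibs = map fib [0 ..]

fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fibs !! (n - 1) + fibs !! (n - 2)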
Edit: MemoTrie example ahoy!
This is a quick and dirty proof of concept; better approaches may exist.
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
import Data.MemoTrie
import Data.Binary
import Data.ByteString.Lazy hiding (map)
mangle :: Double -> [Int]
mangle = map fromIntegral . unpack . encode
unmangle :: [Int] -> Double
unmangle = decode . pack . map fromIntegral
instance HasTrie Double where
    data Double :->: a = DoubleTrie ([Int] :->: a)
    trie f = DoubleTrie $ trie $ f . unmangle
    untrie (DoubleTrie t) = untrie t . mangle

slow x
  | x < 1     = 1
  | otherwise = slow (x / 2) + slow (x / 3)
memoSlow :: Double -> Integer
memoSlow = memo slow
Do note the GHC extensions used by the MemoTrie package; hopefully that isn't a problem. Load it up in GHCi and try calling slow vs. memoSlow with something like (10^6) or (10^7) to see it in action.
Generalizing this to functions taking multiple arguments or whatnot should be fairly straightforward. For further details on using MemoTrie, you might find this blog post by its author helpful.
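For the two-Double case from the question, Data.MemoTrie's memo2 should do once the HasTrie Double instance above is in scope (a sketch with a stand-in function):

slow2 :: Double -> Double -> Integer
slow2 x y = slow x + slow y   -- stand-in for a real two-argument slow function

memoSlow2 :: Double -> Double -> Integer
memoSlow2 = memo2 slow2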
See memoization
There are a number of tools in GHC's runtime system explicitly to support memoization.
Unfortunately, memoization isn't really a one-size fits all affair, so there are several different approaches that we need to support in order to cope with different user needs.
You may find the original 1999 writeup useful as it includes several implementations as examples:
Stretching the Storage Manager: Weak Pointers and Stable Names in Haskell by Simon Peyton Jones, Simon Marlow, and Conal Elliott
I will add my own solution, which seems to be quite slow as well. The first parameter is a function that returns an Int32, which is a unique identifier of the parameter. If you want to identify inputs differently (e.g. by their 'id'), you have to change the second parameter of H.new to a different hash function. I will try to find out how to use Data.Map and test whether I get faster results.
import qualified Data.HashTable as H
import Data.Int
import System.IO.Unsafe

cache :: (a -> Int32) -> (a -> b) -> (a -> b)
cache ident f = unsafePerformIO createfunc
  where
    createfunc = do
        storage <- H.new (==) id
        return (doit storage)
    doit storage = unsafePerformIO . comp
      where
        comp x = do
            look <- H.lookup storage (ident x)
            case look of
                Just res -> return res
                Nothing  -> do
                    let result = f x
                    H.insert storage (ident x) result
                    return result
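Usage would look something like this (a sketch; fromIntegral serves as the identifying function for Int inputs):

slowSquare :: Int -> Int
slowSquare x = x * x   -- imagine something expensive here

cachedSquare :: Int -> Int
cachedSquare = cache fromIntegral slowSquare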
You can write the slow function as a higher-order function, returning a function itself. Thus you can do all the preprocessing inside the slow function, and the part that is different in each computation in the returned (hopefully fast) function. An example could look like this:
(SML code, but the idea should be clear)
fun computeComplicatedThing (x : real) =
    let
        val partial = x * x  (* stand-in for expensive work that needs only x *)
    in
        fn (y : real) => partial * y  (* cheap work that also needs y *)
    end

val computeComplicatedThingFast = computeComplicatedThing 3.14
    (* the x-only work has now been done once *)
val result1 = computeComplicatedThingFast 2.71
val result2 = computeComplicatedThingFast 2.81
val result3 = computeComplicatedThingFast 2.91
I have several tens of thousands possible inputs and I only actually use a handful. I would need to initialize the array ... using a function instead of a list.
I'd go with listArray (start, end) (map func [start..end])
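Spelled out with a stand-in slow function (each element is a thunk, forced only on lookup):

import Data.Array

cached :: Array Int Integer
cached = listArray (0, 50000) (map slowFunc [0 .. 50000])
  where slowFunc n = product [1 .. toInteger n]  -- stand-in slow function

-- cached ! 42 forces only that one element's thunk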
func doesn't really get called above. Haskell is lazy and creates thunks which will be evaluated when the value is actually required.
When using a normal array you always need to initialize its values. So the work required for creating these thunks is necessary anyhow.
Several tens of thousands is far from a lot. If you had trillions, then I would suggest using a hash table.
I don't know Haskell specifically, but how about keeping existing answers in some hashed data structure (might be called a dictionary, or hashmap)? You can wrap your slow function in another function that first checks the map and only calls the slow function if it hasn't found an answer.
You could make it fancy by limiting the map to a certain size, and when it reaches that, throwing out the least recently used entry. For that you would additionally need to keep a map of key-to-timestamp mappings.
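A hedged sketch of that bookkeeping in Haskell (purely illustrative types and names; eviction picks the entry with the smallest tick):

import qualified Data.Map as M
import Data.List (minimumBy)
import Data.Ord (comparing)

data LRUCache k v = LRUCache
  { store :: M.Map k v        -- cached results
  , stamp :: M.Map k Integer  -- key -> tick of last use
  , clock :: Integer          -- monotonically increasing tick
  , limit :: Int              -- maximum number of entries
  }

insertLRU :: Ord k => k -> v -> LRUCache k v -> LRUCache k v
insertLRU k v c = put (if M.size (store c) >= limit c then evict c else c)
  where
    put c' = c' { store = M.insert k v (store c')
                , stamp = M.insert k (clock c') (stamp c')
                , clock = clock c' + 1 }
    evict c' =
      let (oldest, _) = minimumBy (comparing snd) (M.toList (stamp c'))
      in c' { store = M.delete oldest (store c')
            , stamp = M.delete oldest (stamp c') }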