F#: Attempt to memoize member function resets cache on each call? - caching

I'm trying to memoize a member function of a class, but every time the member is called (by another member) it makes a whole new cache and 'memoized' function.
member x.internal_dec_rates =
    let cache = new Dictionary< Basis*(DateTime option), float*float>()
    fun (basis:Basis) (tl:DateTime option) ->
        match cache.TryGetValue((basis,tl)) with
        | true, (sgl_mux, sgl_lps) -> (sgl_mux, sgl_lps)
        | _ ->
            let (sgl_mux, sgl_lps) =
                (* Bunch of stuff *)
            cache.Add((basis,tl),(sgl_mux,sgl_lps))
            sgl_mux,sgl_lps
I'm using Listing 10.5 in "Real World Functional Programming" as a model. I've tried using a memoization higher-order function and that doesn't help. The above listing has the memoization built in directly.
The problem is, when I call it e.g.
member x.px (basis:Basis) (tl: DateTime option) =
    let (q,l) = (x.internal_dec_rates basis tl)
    let (q2,l2) = (x.internal_dec_rates basis tl)
    (exp -q)*(1.-l)
execution goes to the 'let cache=...' line, defeating the whole point. I put in the (q2,l2) line in order to make sure it wasn't a scope problem, but it doesn't seem to be.
In fact I did a test using Petricek's code as a member function and that seems to have the same issue:
// Not a member function
let memo1 f =
    let cache = new Dictionary<_,_>()
    (fun x ->
        match cache.TryGetValue(x) with
        | true, v -> v
        | _ -> let v = f x
               cache.Add(x,v)
               v
    )
member x.factorial = memo1(fun y->
    if (y<=0) then 1 else y*x.factorial(y-1))
Even the internal recursion of x.factorial seems to set up a new 'cache' for each level.
What am I doing wrong, and how can I make this work?

In response to your comment on Jack's answer, this doesn't have to become tedious. Given a memoize function:
let memoize f =
    let cache = Dictionary()
    fun x ->
        match cache.TryGetValue(x) with
        | true, v -> v
        | _ ->
            let v = f x
            cache.Add(x, v)
            v
Define each of your functions as let-bound values and return them from your methods:
type T() as x =
    let internalDecRates = memoize <| fun (basis: Basis, tl: DateTime option) ->
        (* compute result *)
        Unchecked.defaultof<float * float>
    let px = memoize <| fun (basis, tl) ->
        let (q,l) = x.InternalDecRates(basis, tl)
        let (q2,l2) = x.InternalDecRates(basis, tl)
        (exp -q)*(1.-l)
    member x.InternalDecRates = internalDecRates
    member x.Px = px
The only "boilerplate" is the let binding and call to memoize.
EDIT: As kvb noted, in F# 3.0 auto-properties allow a more concise solution:
type T() as x =
    member val InternalDecRates = memoize <| fun (basis: Basis, tl: DateTime option) ->
        (* compute result *)
        Unchecked.defaultof<float * float>
    member val Px = memoize <| fun (basis, tl) ->
        let (q,l) = x.InternalDecRates(basis, tl)
        let (q2,l2) = x.InternalDecRates(basis, tl)
        (exp -q)*(1.-l)

I see a lot of long answers here; the short answer is that
member x.P = code()
defines a property P which has a getter that runs code() every time P is accessed. You need to move the cache creation into the class's constructor, so that it will only run once.
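To see the getter behaviour concretely, here is a minimal sketch. The Demo type and its member names are made up for illustration and are not from the question; the point is only that the body of a plain member property runs on every access, while a let binding in the class runs once in the constructor.

```fsharp
open System
open System.Collections.Generic

// Hypothetical type, for illustration only.
type Demo() =
    // Created once, when the constructor runs.
    let shared = Dictionary<int, int>()
    // Getter body: evaluated on every access, so a new dictionary each time.
    member this.Fresh = Dictionary<int, int>()
    // Getter body only returns the value created in the constructor.
    member this.Shared = shared

let d = Demo()
printfn "%b" (Object.ReferenceEquals(d.Fresh, d.Fresh))    // false
printfn "%b" (Object.ReferenceEquals(d.Shared, d.Shared))  // true
```

The same applies when the property body builds a closure over the cache: each access builds a new closure and a new cache.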

As others already said, this cannot be done just by defining a single member in F# 2.0. You either need a separate field (let bound value) for a cache or for a local function that is memoized.
As mentioned by kvb, in F# 3.0, you can do this using member val which is a property that is initialized when the object is created (and has an automatically generated backing field where the result is stored). Here is a complete sample that demonstrates this (it will work in Visual Studio 2012):
open System.Collections.Generic
type Test() =
    /// Property that is initialized when the object is created
    /// and stores a function value 'int -> int'
    member val Foo =
        // Initialize cache and return a function value
        let cache = Dictionary<int, int>()
        fun arg ->
            match cache.TryGetValue(arg) with
            | true, res -> res
            | false, _ ->
                let res = arg * arg
                printfn "calculating %d" arg
                cache.Add(arg, res)
                res
        // Part of the property declaration that instructs
        // the compiler to generate getter for the property
        with get
The with get part of the declaration can be omitted, but I include it here to make the sample clearer (you can also use with get, set to get a mutable property). Now you can call t.Foo as a function and it caches the value as required:
let t = Test()
t.Foo(10)
t.Foo(10)
The only problem with this approach is that t.Foo is actually compiled as a property that returns a function (instead of being compiled as a method). This is not a big problem when you use the class from F#, but it would be a problem if you were calling it from C# (because C# would see the member as a property of type FSharpFunc<int, int>, which is hard to use).

John is correct -- you need to move the cache dictionary into a private, let-bound member of the type.
Type members are compiled a bit differently than let-bound values in modules, which is the reason for the difference in behavior. If you copy/paste the body of your x.internal_dec_rates method and assign it to a let-bound value in a module, it should work correctly then, because the F# compiler will compile it as a closure which gets created once and then assigned to a static readonly field of the module.
A couple of other tips, for good measure:
Type member methods can use optional parameters -- so you can slightly simplify the method signature if you like.
You can create the cache key just once and reuse it (this also helps avoid mistakes).
You can simplify the (sgl_mux, sgl_lps) pattern-matching code by just assigning the tuple a name (e.g., value), since you're just returning the whole tuple anyway.
Here's my take on your code:
type FooBar () =
    let cache = new Dictionary< Basis*(DateTime option), float*float>()
    member x.internal_dec_rates (basis : Basis, ?tl : DateTime) =
        let key = basis, tl
        match cache.TryGetValue key with
        | true, value -> value
        | _ ->
            // sgl_mux, sgl_lps
            let value =
                (* Bunch of stuff *)
            cache.Add (key, value)
            value

You need to move the dictionary outside the function call - like
let cache = new Dictionary< Basis*(DateTime option), float*float>()
member x.internal_dec_rates =
    fun (basis:Basis) (tl:DateTime option) ->
        match cache.TryGetValue((basis,tl)) with
        | true, (sgl_mux, sgl_lps) -> (sgl_mux, sgl_lps)
        | _ ->
            let (sgl_mux, sgl_lps) =
                (* Bunch of stuff *)
            cache.Add((basis,tl),(sgl_mux,sgl_lps))
            sgl_mux,sgl_lps
This way the cache persists across function calls. Your memo1 has the same problem: in the original version a new cache is created every time the member is accessed, whereas here a single cache is created once and shared by all calls.

In addition to the other answers, note that in F# 3.0 you can use automatically implemented properties, which will behave as you want:
member val internal_dec_rates = ...
Here, the right hand side is evaluated only once, but everything is self-contained.

Related

Why is it so slow to create records with a field that references a big value in F#?

In the code below that is executed as an .fsx script, the final line takes around 30 seconds to finish. I assumed that since records are reference types, the final line only creates records with a field that references an (immutable) large value, and so it should be very fast. Why is it slow and how can I fix it?
type BigType = { Big: Set<int> }
type CollectionType = { BigVal: BigType; SmallVal: int }
let b = { Big = set [ 0 .. 999999 ] }
let mySet = set [ 0 .. 50 ]
#time
mySet |> Set.map (fun x -> { BigVal = b; SmallVal = x })
Thank you.
One thing to notice here is that the order you define the fields in type CollectionType = { BigVal: BigType; SmallVal: int } makes a difference. If you try:
type BigType = { Big: Set<int> }
type CollectionType = { SmallVal: int; BigVal: BigType; }
let b = { Big = set [ 0 .. 999999 ] }
let mySet = set [ 0 .. 50 ]
#time
mySet |> Set.map (fun x -> { BigVal = b; SmallVal = x })
Instead the time taken goes from Real: 00:00:34.385 to Real: 00:00:00.002.
NB: I was originally concerned that this behaviour could not be relied on and might change without warning; however, as nasosev has found, this behaviour is described in the F# language specification (see section 8.15.3 of version 4.1 of the document).
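A small sketch of the field-order effect, using made-up record types (SmallFirst and BigFirst are not from the question). Structural comparison of records goes through the fields in declaration order and is decided by the first field that differs, which is presumably why putting SmallVal first avoids comparing the big sets at all.

```fsharp
// Hypothetical record types that differ only in field order.
type SmallFirst = { SmallA: int; BigA: int list }
type BigFirst = { BigB: int list; SmallB: int }

let a1 = { SmallA = 1; BigA = [ 9; 9 ] }
let a2 = { SmallA = 2; BigA = [ 0 ] }

let b1 = { BigB = [ 9; 9 ]; SmallB = 1 }
let b2 = { BigB = [ 0 ]; SmallB = 2 }

// Decided by SmallA (1 < 2); the list field does not change the outcome.
printfn "%b" (compare a1 a2 < 0)  // true
// Decided by BigB ([9; 9] > [0]); the int field is never the deciding one.
printfn "%b" (compare b1 b2 > 0)  // true
```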
The reason is that Set is implemented as a search tree, so in order to insert an item into a set, it is compared against some of the existing items. So there are indeed only small records created, but whole sets are being compared.
It's hard to tell what the best way to fix the issue is without knowing the exact problem you're solving. But if it is a requirement that each CollectionType value has a different SmallVal, then you can do as Scott suggests and implement custom comparison that only looks at SmallVal. You don't need a class though, you can do it with a record:
(* For records, you need to put CustomComparison in addition to implementing IComparable.
   And CustomComparison requires CustomEquality and its corresponding overrides. *)
[<CustomComparison; CustomEquality>]
type CollectionType =
    { BigVal: BigType; SmallVal: int }
    interface System.IComparable with
        member this.CompareTo(that) =
            match that with
            | :? CollectionType as that -> compare this.SmallVal that.SmallVal
            | _ -> -1
    override this.Equals(that) =
        match that with
        | :? CollectionType as that -> this.SmallVal = that.SmallVal
        | _ -> false
    override this.GetHashCode() = this.SmallVal
If you convert to a list first then it takes no time at all:
mySet |> Set.toList |> List.map (fun x -> { BigVal = b; SmallVal = x })
So the time that it takes to create the records is insignificant. The reason it's slow is that the records are inside a set. Sets need to compare their items to each other as part of the implementation, to make sure there are no duplicate values. F# compares records structurally, by comparing their contents. So the records being compared contain very large sets that take a long time to compare. The records are different objects that all reference the same set in memory, but the F# record comparison doesn't know that, and doesn't check.
Welcome to the F# community.
I'm guessing that each new record is copying b, although since records are reference types by default, as you say, I'm not sure why that would be.
This approach is no faster:
let y = { BigVal = b; SmallVal = 0 }
mySet |> Set.map (fun x -> { y with SmallVal = x })
Using a class instead is much faster:
type BigType = { Big: Set<int> }
let b = { Big = set [ 0 .. 999999 ] }
type CollectionType(smallVal: int) =
    interface System.IComparable with
        member __.CompareTo other =
            compare __.SmallVal (other :?> CollectionType).SmallVal
    member __.BigVal = b
    member __.SmallVal = smallVal
let mySet = set [ 0 .. 50 ]
#time
mySet |> Set.map (fun x -> CollectionType(x))
This is not a complete solution, since there is a warning FS0343: The type 'CollectionType' implements 'System.IComparable' explicitly but provides no corresponding override for 'Object.Equals'. An implementation of 'Object.Equals' has been automatically provided, implemented via 'System.IComparable'. Consider implementing the override 'Object.Equals' explicitly.

F# efficiency implications of passing large data structures between functions

How does F# pass data from a caller function to a called function? Does it make a copy of the data before handing it over or does it just pass a pointer? I would think the latter but want to make sure.
On a related note, are there any performance implications of the following 2 F# code styles.
let someFunction e =
    1 // pretend this is a complicated function
let someOtherFunction e =
    2 // pretend this is a complicated function
let foo f largeList =
    List.map (fun elem -> f elem) largeList
let bar largeList =
    largeList
    |> foo someFunction
    |> foo someOtherFunction
let bar2 largeList =
    let foo2 f f2 =
        largeList
        |> List.map (fun elem -> f elem)
        |> List.map (fun elem -> f2 elem)
    foo2 someFunction someOtherFunction
Would you expect bar to have a different performance to bar2? If not, are there any situations I should be aware of that would make a difference?
The short answer:
No. The entire list is not copied, just the reference to it is.
The long answer:
In F# (just like in C#) both value and reference types can be passed either by value or by reference.
Both value types and reference types are, by default, passed by value.
In the case of value types (structs) this means that you'll be passing around a copy of the entire data structure.
In the case of reference types (classes, discriminated unions, records, etc.) this means that the reference is passed by value. This does not mean that the entire data structure is copied; only the pointer-sized reference to it (32 or 64 bits, depending on the platform) is copied.
If you're working with mutable data structures, e.g. ResizeArray<'T> (.NET List<'T>) which are classes, passing references by value could have implications. Perhaps the function you've passed it to adds elements to the list, for example? Such an update would apply to the data structure referenced from both locations. Since your question uses the immutable F# List though, you don't have to worry about this!
You can also pass value/reference types by reference, for more detail about that see: https://msdn.microsoft.com/en-us/library/dd233213.aspx#Anchor_4
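A minimal sketch of the point about mutable reference types; the function name addOne is made up for illustration. Only the reference is copied when the ResizeArray is passed, so a change made inside the function is visible to the caller.

```fsharp
// Hypothetical helper: mutates the collection it is given.
let addOne (xs: ResizeArray<int>) = xs.Add 1

let shared = ResizeArray<int>()
addOne shared                 // only the reference is passed
printfn "%d" shared.Count     // 1
```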
An F# list is implemented as a singly linked list, which means that accessing the head and prepending an element are O(1) operations. These data structures are also very memory efficient because when you prepend an element to the list you only need to store the new value and a reference to the rest of the list.
So you can see how it works, such a data structure can be implemented like this:
type ExampleList<'T> =
    | Empty
    | Cons of 'T * ExampleList<'T>
Additional Information:
List.map is eagerly evaluated, meaning that every time you call it, a new list will be created. If you use Seq.map (F# List implements the IEnumerable<'T> interface), which is lazily evaluated, you can evaluate both map operations in only one enumeration of the list.
largeList
|> Seq.map (fun elem -> f elem)
|> Seq.map (fun elem -> f2 elem)
|> List.ofSeq
This is likely to be a lot more efficient for large lists because it involves allocating only one new list of results, rather than two.

Map from discriminated union to enum

Currently, I'm trying to teach myself some F# by making an application that consists of a C# GUI layer and an F# business layer. In the GUI layer, the user will at some point have to make a choice by selecting a value that is part of a simple enum, e.g. selecting either of the following:
enum {One, Two, Three}
I have written a function to translate the enum value to an F# discriminated union
type MyValues =
| One
| Two
| Three
Now I have to translate back, and am already tired of the boilerplate code. Is there a generic way to translate my discriminated union to the corresponding enum, and vice versa?
Cheers,
You can also define the enum in F# and avoid doing conversions altogether:
type MyValues =
| One = 0
| Two = 1
| Three = 2
The = <num> bit tells the F# compiler that it should compile the type as an enum rather than a union. When using the type from C#, this will appear as a completely normal enum. The only danger is that someone from C# can call your code with (MyValues)4, which will compile, but it will cause an incomplete pattern match exception (MatchFailureException) if you are using match in F# without a catch-all case.
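As a sketch of how to guard against out-of-range values on the F# side, a wildcard case can be added to the match. The describe function is made up for illustration, and enum<MyValues> 4 stands in for the (MyValues)4 cast a C# caller could write.

```fsharp
type MyValues =
    | One = 0
    | Two = 1
    | Three = 2

// Hypothetical function; the wildcard case handles values outside the declared range.
let describe (v: MyValues) =
    match v with
    | MyValues.One -> "one"
    | MyValues.Two -> "two"
    | MyValues.Three -> "three"
    | _ -> "unknown"

printfn "%s" (describe MyValues.Two)       // two
printfn "%s" (describe (enum<MyValues> 4)) // unknown
```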
Here are generic DU/enum converters.
open Microsoft.FSharp.Reflection
type Union<'U>() =
    static member val Cases =
        FSharpType.GetUnionCases(typeof<'U>)
        |> Array.sortBy (fun case -> case.Tag)
        |> Array.map (fun case -> FSharpValue.MakeUnion(case, [||]) :?> 'U)
let ofEnum e =
    let i = LanguagePrimitives.EnumToValue e
    Union.Cases.[i - 1]
let toEnum u =
    let i = Union.Cases |> Array.findIndex ((=) u)
    LanguagePrimitives.EnumOfValue (i + 1)
let du : MyValues = ofEnum ConsoleColor.DarkGreen
let enum : ConsoleColor = toEnum Three
It maps the DU tag to the enum's underlying value, offset by one (tag 0 corresponds to enum value 1).

Does an equivalent function in OCaml exist that works the same way as "set!" in Scheme?

I'm trying to make a function that defines a vector that varies based on the function's input, and set! works great for this in Scheme. Is there a functional equivalent for this in OCaml?
I agree with sepp2k that you should expand your question, and give more detailed examples.
Maybe what you need are references.
As a rough approximation, you can see them as variables to which you can assign:
let a = ref 5;;
!a;; (* This evaluates to 5 *)
a := 42;;
!a;; (* This evaluates to 42 *)
Here is a more detailed explanation from http://caml.inria.fr/pub/docs/u3-ocaml/ocaml-core.html:
The language we have described so far is purely functional. That is, several evaluations of the same expression will always produce the same answer. This prevents, for instance, the implementation of a counter whose interface is a single function next : unit -> int that increments the counter and returns its new value. Repeated invocation of this function should return a sequence of consecutive integers — a different answer each time.
Indeed, the counter needs to memorize its state in some particular location, with read/write accesses, but before all, some information must be shared between two calls to next. The solution is to use mutable storage and interact with the store by so-called side effects.
In OCaml, the counter could be defined as follows:
let new_count =
  let r = ref 0 in
  let next () = r := !r+1; !r in
  next;;
Another, maybe more concrete, example of mutable storage is a bank account. In OCaml, record fields can be declared mutable, so that new values can be assigned to them later. Hence, a bank account could be a two-field record, its number, and its balance, where the balance is mutable.
type account = { number : int; mutable balance : float }
let retrieve account requested =
  let s = min account.balance requested in
  account.balance <- account.balance -. s; s;;
In fact, in OCaml, references are not primitive: they are special cases of mutable records. For instance, one could define:
type 'a ref = { mutable content : 'a }
let ref x = { content = x }
let deref r = r.content
let assign r x = r.content <- x; x
set! in Scheme assigns to a variable. You cannot assign to a variable in OCaml, at all. (So "variables" are not really "variable".) So there is no equivalent.
But OCaml is not a pure functional language. It has mutable data structures. The following things can be assigned to:
Array elements
String elements (in current OCaml this applies to Bytes; strings have been immutable by default since OCaml 4.06)
Mutable fields of records
Mutable fields of objects
In these situations, the <- syntax is used for assignment.
The ref type mentioned by @jrouquie is a simple, built-in mutable record type that acts as a mutable container of one thing. OCaml also provides ! and := operators for working with refs.

Why are some functions in the Seq module optimized whilst others were not in F#?

This is a follow up to my previous question regarding the Seq module's iter and map functions being much slower compared to the Array and List module equivalents.
Looking at the source, I can see that some functions such as isEmpty and length perform a very simple type check to optimize for arrays and lists before resorting to using IEnumerator.
[<CompiledName("IsEmpty")>]
let isEmpty (source : seq<'T>) =
    checkNonNull "source" source
    match source with
    | :? ('T[]) as a -> a.Length = 0
    | :? list<'T> as a -> a.IsEmpty
    | :? ICollection<'T> as a -> a.Count = 0
    | _ ->
        use ie = source.GetEnumerator()
        not (ie.MoveNext())
[<CompiledName("Length")>]
let length (source : seq<'T>) =
    checkNonNull "source" source
    match source with
    | :? ('T[]) as a -> a.Length
    | :? ('T list) as a -> a.Length
    | :? ICollection<'T> as a -> a.Count
    | _ ->
        use e = source.GetEnumerator()
        let mutable state = 0
        while e.MoveNext() do
            state <- state + 1;
        state
In the case of the iter the same approach can be done to vastly improve its performance, when I shadowed the iter function it presented significant gains over the built-in version:
[<CompiledName("Iterate")>]
let iter f (source : seq<'T>) =
    checkNonNull "source" source
    use e = source.GetEnumerator()
    while e.MoveNext() do
        f e.Current;
My question is, given that some of the functions in the Seq module were optimized for use with specific collection types (arrays, list<'T>, etc.) how come other functions such as iter and nth were not optimized in a similar way?
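For reference, a type-dispatching iter in the style of isEmpty and length might look roughly like the sketch below. iterFast is a hypothetical name, not part of FSharp.Core, and this is not necessarily the shadowed version that was benchmarked; it only illustrates the kind of type check being asked about.

```fsharp
// iterFast is a hypothetical helper, not part of FSharp.Core.
let iterFast (f: 'T -> unit) (source: seq<'T>) =
    match source with
    | :? ('T[]) as arr -> Array.iter f arr     // array fast path
    | :? ('T list) as lst -> List.iter f lst   // list fast path
    | _ -> Seq.iter f source                   // general IEnumerator path

let seen = ResizeArray<int>()
iterFast (fun x -> seen.Add x) [| 1; 2; 3 |]
iterFast (fun x -> seen.Add x) [ 4; 5 ]
iterFast (fun x -> seen.Add x) (seq { yield 6 })
printfn "%A" (List.ofSeq seen)  // [1; 2; 3; 4; 5; 6]
```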
Also, in the case of the map function, as @mausch pointed out, is it not possible to employ a similar approach to Enumerable.Select (see below) and build up specialized iterators for different collection types?
public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
    if (source == null)
        throw Error.ArgumentNull("source");
    if (selector == null)
        throw Error.ArgumentNull("selector");
    if (source is Enumerable.Iterator<TSource>)
        return ((Enumerable.Iterator<TSource>) source).Select<TResult>(selector);
    if (source is TSource[])
        return (IEnumerable<TResult>) new Enumerable.WhereSelectArrayIterator<TSource, TResult>((TSource[]) source, (Func<TSource, bool>) null, selector);
    if (source is List<TSource>)
        return (IEnumerable<TResult>) new Enumerable.WhereSelectListIterator<TSource, TResult>((List<TSource>) source, (Func<TSource, bool>) null, selector);
    else
        return (IEnumerable<TResult>) new Enumerable.WhereSelectEnumerableIterator<TSource, TResult>(source, (Func<TSource, bool>) null, selector);
}
Many thanks in advance.
In the case of the iter the same approach can be done to vastly improve its performance
I think this is where the answer to your question is. Your test is artificial, and doesn't actually test any real world examples of these methods. You tested 10,000,000 iterations of these methods in order to get differences in timing in ms.
Converted back to per item costs, here they are:
            Array    List
Seq.iter    4 ns     7 ns
Seq.map     20 ns    91 ns
These methods are typically used once per collection, meaning this cost is an additional linear factor in your algorithm's performance. In the worst case you are losing less than 100 ns per item in a list (which you shouldn't be using if you care about performance that much).
Contrast this with the case of length which is always linear in the general case. By adding this optimization you provide enormous benefit to someone who forgot to manually cache the length but luckily is always given a list.
Similarly you may call isEmpty many times, and adding another object creation is silly if you can just ask directly. (This one isn't as strong an argument)
Another thing to keep in mind is that neither of those methods actually looks at more than one element of the output. What would you expect the following code to do (excluding syntax errors or missing methods)
type Custom() =
    interface IEnumerable with
        member x.GetEnumerator() =
            return seq {
                yield 1
                yield 2
            }
    interface IList with
        member x.Item with
            get(index) = index
        member x.Count = 12
let a = Custom()
a |> Seq.iter (v -> printfn (v.ToString()))
On the surface, the type-checks in Seq.length/isEmpty seem like mistakes. I assume most Seq functions don't perform such checks for orthogonality: type-specific versions already exist in the List/Array modules. Why duplicate them?
Those checks make more sense in LINQ since it only uses IEnumerable directly.
