Caching possible values of a function constructed at runtime - performance

I have a data type with a few value constructors:
data DataType = C1 | C2 | C3 | ... | Cn
I'd like to build a function at run time from that data type to some other values (in fact, I'm doing this in an IO monad):
buildFun :: IO (DataType -> b)
buildFun = do
    ....
    return $ \x -> case x of
        C1 -> someProcessesToGetTheValue C1
        ...
        Cn -> someProcessesToGetTheValue Cn
Will this mean that someProcessesToGetTheValue will be called each time I call the returned function?
I'd prefer Haskell to evaluate someProcessesToGetTheValue inside buildFun (since those calls are quite expensive) and return a function which returns these fully evaluated expressions.
Can I force that behaviour? Perhaps by doing something like the following?:
buildFun :: IO (DataType -> b)
buildFun = do
    c1value <- return $ someProcessesToGetTheValue C1
    ...
    cnvalue <- return $ someProcessesToGetTheValue Cn
    return $ \x -> case x of
        C1 -> c1value
        ...
        Cn -> cnvalue

You don't have to involve the IO monad at all (and indeed do { x <- return v; ... } is identical to let x = v in ...), just bind the values outside the lambda:
buildFun :: IO (DataType -> b)
buildFun = do
    let v1 = someProcessesToGetTheValue C1
    ...
    return $ \x -> case x of { C1 -> v1; ... }
Haskell doesn't really specify anything about runtime evaluation behaviour, but on all common implementations this will ensure that the results are shared; see What does "floated out" mean? for more information.
However, it still won't evaluate v1…vn inside buildFun; instead, they will each be evaluated the first time the corresponding result of the function you return is evaluated. If you want to force them to be evaluated up-front, you can say let !v1 = someProcessesToGetTheValue C1 (this requires the BangPatterns language extension), or v1 <- evaluate $ someProcessesToGetTheValue C1 (from Control.Exception; this behaves better if someProcessesToGetTheValue C1 might throw an exception).
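As a rough sketch of the evaluate variant described above: the three-constructor DataType and the bodies of someProcessesToGetTheValue are hypothetical stand-ins for the question's elided definitions, and the result type is fixed to Int for illustration.

```haskell
import Control.Exception (evaluate)

-- Hypothetical stand-in for the question's DataType.
data DataType = C1 | C2 | C3 deriving (Show, Eq)

-- Hypothetical "expensive" computation; the real one is elided in the question.
someProcessesToGetTheValue :: DataType -> Int
someProcessesToGetTheValue C1 = sum [1 .. 1000]
someProcessesToGetTheValue C2 = product [1 .. 10]
someProcessesToGetTheValue C3 = 42

buildFun :: IO (DataType -> Int)
buildFun = do
    -- evaluate forces each value to weak head normal form here, in IO,
    -- so the work happens inside buildFun rather than at first use.
    v1 <- evaluate (someProcessesToGetTheValue C1)
    v2 <- evaluate (someProcessesToGetTheValue C2)
    v3 <- evaluate (someProcessesToGetTheValue C3)
    -- The lambda only closes over the already-computed values.
    return $ \x -> case x of
        C1 -> v1
        C2 -> v2
        C3 -> v3
```

Note that evaluate (like a bang pattern) only forces to weak head normal form. That is full evaluation for an Int, but for a lazy structure such as a list it only forces the outermost constructor.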

Instead of a function, why not define some data structure, like a list, of all the results of evaluating this function (indexed by the position of the constructor in the data type)? For example, something like this (not tested):
data DataType = C1 | C2 | C3 | ... | Cn deriving (Enum, Bounded)
cachedValues :: [b]
cachedValues = map someProcessesToGetTheValue ([minBound .. maxBound] :: [DataType])
getCachedValue :: DataType -> b
getCachedValue x = cachedValues !! (fromEnum x)
Since Haskell is lazy, it will store a thunk until it is run for the first time, after which it will remember the value.
(If traversing a list of size n is too inefficient, you can use an array or Map instead. The idea is the same.)
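A possible Map-based version of the same idea is sketched below. The three-constructor DataType and the body of someProcessesToGetTheValue are placeholders, and the extra Ord instance is an assumption needed for Map keys.

```haskell
import qualified Data.Map as Map  -- Data.Map is lazy in its values

-- Hypothetical stand-in for the question's DataType; Ord is needed for Map keys.
data DataType = C1 | C2 | C3 deriving (Show, Eq, Ord, Enum, Bounded)

-- Placeholder for the expensive computation.
someProcessesToGetTheValue :: DataType -> Int
someProcessesToGetTheValue c = 10 * (fromEnum c + 1)

-- Each value is stored as a thunk and computed at most once, on first lookup.
cachedValues :: Map.Map DataType Int
cachedValues =
    Map.fromList [ (c, someProcessesToGetTheValue c) | c <- [minBound .. maxBound] ]

-- Safe to use (!) here because every constructor was inserted above.
getCachedValue :: DataType -> Int
getCachedValue x = cachedValues Map.! x
```

Lookups are logarithmic in the number of constructors instead of linear, while the values keep the same compute-on-first-use behaviour as the list version.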


How to turn applicative validation to return MonadThrow?

It seems to me that an idiomatic way to validate input data in Haskell is via an applicative chain:
mkMyData :: a -> b -> c -> Maybe MyData
mkMyData x y z =
    MyData
        <$> validateA x
        <*> validateB y
        <*> validateC z
where the validation functions themselves return Maybe values. To make my smart constructor mkMyData more flexible, I would like it to return MonadThrow. That is,
mkMyData :: MonadThrow m => a -> b -> c -> m MyData
Does this require each of the validation functions to return MonadThrow instead of Maybe? Or is there some way to convert the specific Maybe result of each validation into the more general MonadThrow without breaking up the applicative structure and greatly complicating the code?
Or, put differently: is it worthwhile to strive for the more general MonadThrow return type in basic library functions, at the expense of more complex, less idiomatic code?
The answer to this is the same as your last question. The type you propose for your new validation function,
mkMyData :: MonadThrow m => a -> b -> c -> m MyData
means that it is able to work in any monad at all, so long as that monad has a way to throw things. If the implementation of that function relies on being able to return Nothing or Just results explicitly, then it will not satisfy that condition.
Instead, you must rewrite the functions that currently return Maybe a to rely on MonadThrow instead. For example, instead of
validateA :: a -> Maybe t
validateA x | acceptable x = Just $ convert x
            | otherwise = Nothing
you will need to write
validateA :: MonadThrow m => a -> m t
validateA x | acceptable x = pure $ convert x
            | otherwise = throwM $ problemWith x
(where all the functions taking x as an argument are made up, needing to be related to your domain somehow).
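To make this concrete, here is a small sketch assuming the exceptions package (Control.Monad.Catch) is available. The Person type, the ValidationError type and both validators are invented for illustration. The point is that the applicative chain in the smart constructor keeps its shape once the validators are generalised.

```haskell
import Control.Exception (Exception)
import Control.Monad.Catch (MonadThrow, throwM)

-- Hypothetical error type; anything with an Exception instance can be thrown.
data ValidationError = EmptyName | NegativeAge Int
    deriving Show

instance Exception ValidationError

-- Hypothetical record standing in for MyData.
data Person = Person { name :: String, age :: Int }
    deriving (Show, Eq)

validateName :: MonadThrow m => String -> m String
validateName n
    | null n    = throwM EmptyName
    | otherwise = pure n

validateAge :: MonadThrow m => Int -> m Int
validateAge a
    | a < 0     = throwM (NegativeAge a)
    | otherwise = pure a

-- Same applicative structure as before, now polymorphic in the monad.
mkPerson :: MonadThrow m => String -> Int -> m Person
mkPerson n a = Person <$> validateName n <*> validateAge a
```

Callers choose the monad: at type Maybe Person a failed validation becomes Nothing, at Either SomeException Person it becomes a Left carrying the exception, and in IO it is thrown as an ordinary exception.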

How can I follow F# Lint's suggestion to use `id`

I am comparing two lists of thangs. Since I'm more familiar with Linq than F#, I did this:
let r1 = (rows1.Zip (rows2, fun r1 r2 -> rowComparer r1 r2)) .All (fun f -> f)
This raises two complaints from the F# linter.
Lint: If `rowComparer` has no mutable arguments partially applied then the lambda can be removed.
Lint: `fun x -> x` might be able to be refactored into `id`.
Of these, I could understand the latter, and tried this:
let r1 = (rows1.Zip (rows2, fun r1 r2 -> rowComparer r1 r2)) .All id
But this made the F# compiler complain:
This expression was expected to have type
'System.Func<bool,bool>'
but here has type
''a -> 'a'
Can someone say how this code can be more righteous?
I would suggest using the F# List or Seq modules instead of LINQ methods. Then you'll be able to use F# types like 'a -> 'a instead of System.Func<'a, 'a>, and you can pass id to the Seq.forall function. If you could post a complete example, it would be easier to give you a complete answer, but I think something like this would be roughly equivalent to what you're doing with LINQ:
let compare (rowComparer: ('a * 'a) -> bool) rows =
    Seq.zip rows >> Seq.map rowComparer >> Seq.forall id
This creates a function that takes two sequences and compares each value in the first to the corresponding value in the second, generating a sequence of booleans. It then returns true if all of the values in the sequence are true, otherwise it returns false. This is achieved using function composition and partial application to build a new function with the required signature.
You can then partially apply a row comparer function to create a specialized compare function for each of your scenarios, as follows:
let compareEqual = compare (fun (a,b) -> a = b)
compareEqual [0; 1; 2] [0; 1; 2] // true
compareEqual [0; 1; 2] [2; 1; 2] // false
You can supply the standard function id as an argument if you create an instance of System.Func with the correct number of generic type parameters from it. When employing a lambda expression, the F# compiler does that for you.
open System.Linq
let foo rowComparer (rows1 : seq<_>) (rows2 : seq<_>) =
    (rows1.Zip (rows2, fun r1 r2 -> rowComparer r1 r2)).All(System.Func<_,_>(id))
// val foo :
//   rowComparer:('a -> 'b -> bool) -> rows1:seq<'a> -> rows2:seq<'b> -> bool

Extending Immutable types (or: fast cache for immutable types) in OCaml

I have a recursive immutable data structure in ocaml which can be simplified to something like this:
type expr =
  {
    eexpr : expr_expr;
    some_other_complex_field : a_complex_type;
  }
and expr_expr =
  | TInt of int
  | TSum of (expr * expr)
  | TMul of (expr * expr)
It's an AST, and sometimes it gets pretty complex (it's very deep).
There is a recursive function that evaluates an expression. For example, let's say,
let rec result expr =
  match expr.eexpr with
  | TInt i -> i
  | TSum (e1, e2) -> result e1 + result e2
  | TMul (e1, e2) -> result e1 * result e2
Now suppose I am mapping an expression to another expression, and I need to constantly check the result of an expr, sometimes more than once for the same expr, and sometimes for expressions that were recently mapped by using the pattern
{ someExpr with eexpr = TSum(someExpr, otherExpr) }
Now, the result function is very lightweight, but running it many times over a deep AST is not very efficient. I know I could cache the value using a Hashtbl, but AFAIK the Hashtbl will only do structural equality, so it will need to traverse my long AST anyway.
I know the best option would be to include a probably immutable "result" field in the expr type. But I can't.
So is there any way in OCaml to cache a value for an immutable type, so I don't have to calculate it eagerly every time I need it?
Thanks!
Hash-cons the values of expr_expr. By doing this structurally equal values in your program will share exactly the same memory representation and you can substitute structural equality (=) by physical equality (==).
This paper should get you quickly started on hash-consing in OCaml.
You can use the functorial interface to control the kind of equality used by the hash table. I believe the semantics of (==) are legitimate for your purposes; i.e., if A == B then f A = f B for any pure function f. So you can cache the results of f A. Then if you find a B that's physically equal to A, the cached value is correct for B.
The downside of using (==) for hashing is that the hash function will send all structurally equal objects to the same hash bucket, where they will be treated as distinct objects. If you have a lot of structurally equal objects in the table, you get no benefit from the hashing. The behavior degenerates to a linear search.
You can't define the hash function to work with physical addresses, because the physical addresses can be changed at any time by the garbage collector.
However, if you know your table will only contain relatively few large-ish values, using physical equality might work for you.
I think you can merge the two ideas above: use hash-consing-like techniques to get the hash of the "pure expression" part of your data, and use this hash as the key in the memoization table for the eval function.
Of course this only works when your eval function indeed only depends on the "pure expression" part of the function, as in the example you gave. I believe that is a relatively general case, at least if you restrict yourself to storing the successful evaluations (that won't, for example, return an error including some location information).
Edit: a small proof of concept:
type 'a _expr =
  | Int of int
  | Add of 'a * 'a
(* a constructor to avoid needing -rectypes *)
type pure_expr = Pure of pure_expr _expr
type loc = int
type loc_expr = {
  loc : loc;
  expr : loc_expr _expr;
  pure : pure_expr (* or any hash_consing of it for efficiency *)
}
(* this is where you could hash-cons *)
let pure x = Pure x
let int loc n =
  { loc; expr = Int n; pure = pure (Int n) }
let add loc a b =
  { loc; expr = Add (a, b); pure = pure (Add(a.pure, b.pure)) }
let eval =
  let cache = Hashtbl.create 251 in
  let rec eval term =
    (* for debug and checking memoization *)
    Printf.printf "log: %d\n" term.loc;
    try Hashtbl.find cache term.pure with Not_found ->
      let result =
        match term.expr with
        | Int n -> n
        | Add(a, b) -> eval a + eval b in
      Hashtbl.add cache term.pure result;
      result
  in eval
let test = add 3 (int 1 1) (int 2 2)
# eval test;;
log: 3
log: 2
log: 1
- : int = 3
# eval test;;
log: 3
- : int = 3

Sorting abstract datatypes in Haskell

For example I have the following,
type something = (Float, Float, Int, Aa, Bb, Cc, Int)
If I wanted to find the smallest something based on its first element (the Float), how could I do that? The way I have reasoned about it is the following, yet I can't manage to figure out how to implement it.
Because I have a list of somethings, the easiest way should be to create my own min helper function that compares two somethings and returns the smaller of the two. However, trying to do it that "easier way" is what got me stuck with type errors at compile time...
findMin :: something -> something -> somthing
findMin x y = sortBy (compare `on` fst) x y
I am not familiar with sortBy and compare `on`; I just came across a similar question here on SO but I couldn't manage to make it work. As a beginner in Haskell, is there another way of approaching this?
If you want to compare based on the first field of a data type, you can let Haskell write the code for you:
data Something = Something Float Float Int String Bool Char Int
    deriving (Eq, Ord)
The deriving clause specifies which type class implementations are automatically generated for the Something type. Here, we derive Eq, which allows us to ask whether two Somethings are equal (e.g., with ==), and Ord, which allows us to compare two Somethings and know which one is "greater".
Haskell's default behavior when deriving Ord is to compare each field from first to last, so the default code will start by comparing the first Float of each Something, which is exactly what you want.
Once you're dealing with a type that implements Ord, you can use all sorts of built-in functions like minimum :: Ord a => [a] -> a. This takes a list of any type that implements Ord, and gives back the smallest element. So, as an example:
st1 = Something 3.14 2.72 7 "hello" False 'λ' 42
st2 = Something 3.15 2.72 7 "hello" False 'λ' 42
smallest = minimum [st1,st2]
Using a custom data type is usually the better option, but if you really want to use tuples, you can start by defining a helper function comparingFst that compares based on the first element of the tuple.
import Data.Ord
import Data.List
-- Dummy data types for example purposes. Derive from Show just so
-- that the example can be more easily tested interactively in ghci.
data Aa = Aa deriving Show
data Cc = Cc deriving Show
type Something = (Float, Float, Int, Aa, Cc, Int)
comparingFst :: Something -> Something -> Ordering
comparingFst = comparing fstSomething
    where fstSomething (x,_,_,_,_,_) = x
Now you can take the smaller of two elements with:
findMin :: Something -> Something -> Something
findMin x y = case comparingFst x y of
    LT -> x
    _ -> y
or from a list of elements
findMinimum :: [Something] -> Something
findMinimum = minimumBy comparingFst
And you can also use the same helper function for sorting:
sortSomethings :: [Something] -> [Something]
sortSomethings = sortBy comparingFst
Also, it's worth mentioning that tuples are, by default, compared element-wise, starting from the first element, so assuming the component types (Ab and Cc in the example below) can derive Ord and Eq, you don't need anything extra, i.e. the example becomes:
import Data.List
data Ab = Ab deriving (Show, Ord, Eq)
data Cc = Cc deriving (Show, Ord, Eq)
type Something = (Float, Float, Int, Ab, Cc, Int)
findMin :: Something -> Something -> Something
findMin x y = min x y
findMinimum :: [Something] -> Something
findMinimum = minimum
sortSomethings :: [Something] -> [Something]
sortSomethings = sort
In other words, you can just use the standard min and sort functions as-is.
You have some syntax errors, firstly.
There are two things you can do. Firstly, following the model of using an accessor function to get at the field you want (fst), we can define labels for the fields of your type:
data Something = Something { field_x, field_y :: Float,
                             field_z :: Int }
and then sort on field_x
import Data.List
import Data.Function
sortSomethings :: [Something] -> [Something]
sortSomethings = sortBy (compare `on` field_x)
Getting at the minimum is the same as taking the head off the sorted list:
minSomethings :: [Something] -> Something
minSomethings = head . sortSomethings
Alternatively, you can write a custom Ord instance for the Something type that compares values using only field_x; then the regular sort and minimum (and other Ord-based functions) will "just work".
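A minimal sketch of such a custom instance, reusing the record from above. Defining Eq in terms of compare is one possible choice, made here so that the two instances agree with each other.

```haskell
import Data.Ord (comparing)

data Something = Something
    { field_x, field_y :: Float
    , field_z :: Int
    } deriving Show

-- Two values count as equal when their field_x agree, matching the Ord instance.
instance Eq Something where
    a == b = compare a b == EQ

-- Order values by field_x only; the other fields are ignored.
instance Ord Something where
    compare = comparing field_x
```

With these instances, minimum, maximum and Data.List.sort can be applied directly to a [Something]. The trade-off is that == now also ignores field_y and field_z, which can be surprising elsewhere in a program; minimumBy (comparing field_x) avoids that.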

Haskell mutable map/tree

I am looking for a mutable (balanced) tree/map/hash table in Haskell, or a way to simulate one inside a function, i.e. when I call the same function several times, the structure is preserved. So far I have tried Data.HashTable (which is OK, but somewhat slow) and tried Data.Array.Judy, but I was unable to make it work with GHC 6.10.4. Are there any other options?
If you want mutable state, you can have it. Just keep passing the updated map around, or keep it in a state monad (which turns out to be the same thing).
import qualified Data.Map as Map
import Control.Monad.ST
import Data.STRef
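memoize :: Ord k => (k -> ST s a) -> ST s (k -> ST s a)
memoize f = do
    mc <- newSTRef Map.empty
    return $ \k -> do
        c <- readSTRef mc
        case Map.lookup k c of
            Just a -> return a
            Nothing -> do
                a <- f k
                -- Update the current cache rather than the snapshot c, which
                -- is stale if f itself added entries through recursive calls.
                modifySTRef mc (Map.insert k a)
                return a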
You can use this like so. (In practice, you might want to add a way to clear items from the cache, too.)
import Control.Monad
main :: IO ()
main = do
    fib <- stToIO $ fixST $ \fib -> memoize $ \n ->
        if n < 2 then return n else liftM2 (+) (fib (n-1)) (fib (n-2))
    mapM_ (print <=< stToIO . fib) [1..10000]
At your own risk, you can unsafely escape from the requirement of threading state through everything that needs it.
import System.IO.Unsafe
unsafeMemoize :: Ord k => (k -> a) -> k -> a
unsafeMemoize f = unsafePerformIO $ do
    f' <- stToIO $ memoize $ return . f
    return $ unsafePerformIO . stToIO . f'
fib :: Integer -> Integer
fib = unsafeMemoize $ \n -> if n < 2 then n else fib (n-1) + fib (n-2)
main :: IO ()
main = mapM_ (print . fib) [1..1000]
Building on @Ramsey's answer, I also suggest you reconceive your function to take a map and return a modified one. Then code using good ol' Data.Map, which is pretty efficient at modifications. Here is a pattern:
import qualified Data.Map as Map
-- | takes input and a map, and returns a result and a modified map
myFunc :: a -> Map.Map k v -> (r, Map.Map k v)
myFunc a m = … -- put your function here
-- | run myFunc over a list of inputs, gathering the outputs
mapFuncWithMap :: [a] -> Map.Map k v -> ([r], Map.Map k v)
mapFuncWithMap as m0 = foldr step ([], m0) as
    where step a (rs, m) = let (r, m') = myFunc a m in (r:rs, m')
-- this starts with an initial map, uses successive versions of the map
-- on each iteration, and returns a tuple of the results, and the final map
-- | run myFunc over a list of inputs, gathering the outputs
mapFunc :: [a] -> [r]
mapFunc as = fst $ mapFuncWithMap as Map.empty
-- same as above, but starts with an empty map, and ignores the final map
It is easy to abstract this pattern and make mapFuncWithMap generic over functions that use maps in this way.
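One way that abstraction might look is sketched below, using mapAccumL from Data.List. The names mapWithMap and intern are hypothetical, with intern standing in for myFunc. Unlike the foldr version above, mapAccumL threads the map through the inputs from left to right.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map as Map

-- Run any map-threading function over a list of inputs, left to right,
-- returning the collected results and the final map.
mapWithMap :: (a -> Map.Map k v -> (r, Map.Map k v))
           -> [a] -> Map.Map k v -> ([r], Map.Map k v)
mapWithMap f as m0 = (rs, mFinal)
  where
    (mFinal, rs) = mapAccumL step m0 as
    step m a = let (r, m') = f a m in (m', r)

-- Hypothetical example of a map-using function: give each distinct string
-- a small integer id, remembering earlier assignments in the map.
intern :: String -> Map.Map String Int -> (Int, Map.Map String Int)
intern s m = case Map.lookup s m of
    Just i  -> (i, m)
    Nothing -> let i = Map.size m in (i, Map.insert s i m)
```

For example, fst (mapWithMap intern ["a", "b", "a", "c"] Map.empty) evaluates to [0,1,0,2]: the second "a" finds the id recorded by the first.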
Although you ask for a mutable type, let me suggest that you use an immutable data structure and that you pass successive versions to your functions as an argument.
Regarding which data structure to use,
There is an implementation of red-black trees at Kent
If you have integer keys, Data.IntMap is extremely efficient.
If you have string keys, the bytestring-trie package from Hackage looks very good.
The problem is that I cannot use (or I don't know how to use) a non-mutable type.
If you're lucky, you can pass your table data structure as an extra parameter to every function that needs it. If, however, your table needs to be widely distributed, you may wish to use a state monad where the state is the contents of your table.
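A small sketch of the state-monad approach, assuming the transformers package (Control.Monad.Trans.State.Strict); the memoised Fibonacci function is only an illustrative stand-in for a computation that consults and updates a table.

```haskell
import qualified Data.Map as Map
import Control.Monad.Trans.State.Strict (State, evalState, gets, modify)

-- The table lives in the state, so no function has to pass it explicitly.
fibM :: Integer -> State (Map.Map Integer Integer) Integer
fibM n
    | n < 2     = return n
    | otherwise = do
        cached <- gets (Map.lookup n)
        case cached of
            Just v  -> return v
            Nothing -> do
                a <- fibM (n - 1)
                b <- fibM (n - 2)
                let v = a + b
                -- Record the result in the current table before returning.
                modify (Map.insert n v)
                return v

-- Run the stateful computation starting from an empty table.
fib :: Integer -> Integer
fib n = evalState (fibM n) Map.empty
```

From the outside fib is an ordinary pure function; the "mutation" is confined to the State computation started by evalState.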
If you are trying to memoize, you can try some of the lazy memoization tricks from Conal Elliott's blog, but as soon as you go beyond integer arguments, lazy memoization becomes very murky—not something I would recommend you try as a beginner. Maybe you can post a question about the broader problem you are trying to solve? Often with Haskell and mutability the issue is how to contain the mutation or updates within some kind of scope.
It's not so easy learning to program without any global mutable variables.
If I read your comments right, then you have a structure with possibly ~500k total values to compute. The computations are expensive, so you want them done only once, and on subsequent accesses, you just want the value without recomputation.
In this case, use Haskell's laziness to your advantage! ~500k is not so big: Just build a map of all the answers, and then fetch as needed. The first fetch will force computation, subsequent fetches of the same answer will reuse the same result, and if you never fetch a particular computation - it never happens!
You can find a small implementation of this idea using 3D point distances as the computation in the file PointCloud.hs. That file uses Debug.Trace to log when the computation actually gets done:
> ghc --make PointCloud.hs
[1 of 1] Compiling Main ( PointCloud.hs, PointCloud.o )
Linking PointCloud ...
> ./PointCloud
(1,2)
(<calc (1,2)>)
Just 1.0
(1,2)
Just 1.0
(1,5)
(<calc (1,5)>)
Just 1.0
(1,2)
Just 1.0
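The linked file is not reproduced here, so the following is only a hypothetical sketch of the same idea, not the contents of PointCloud.hs: a lazy Data.Map whose values are distance computations that run only when first looked up.

```haskell
import qualified Data.Map as Map  -- lazy in its values

type Point = (Double, Double, Double)

-- Euclidean distance between two 3D points.
distance :: Point -> Point -> Double
distance (x1, y1, z1) (x2, y2, z2) = sqrt (sq (x1 - x2) + sq (y1 - y2) + sq (z1 - z2))
  where sq d = d * d

-- Build a table of all pairwise distances, keyed by (i, j) with i < j.
-- The keys are built eagerly, but each distance stays an unevaluated thunk
-- until it is first demanded, and is shared afterwards.
distanceTable :: [(Int, Point)] -> Map.Map (Int, Int) Double
distanceTable pts = Map.fromList
    [ ((i, j), distance p q) | (i, p) <- pts, (j, q) <- pts, i < j ]
```

Looking up the same key twice in one table value reuses the computed distance, and entries that are never looked up are never computed.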
Are there any other options?
A mutable reference to a purely functional dictionary like Data.Map.
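For instance, an IORef holding a Data.Map gives a table that persists between calls. The sketch below is one possible shape; newCounter is a hypothetical example that counts how often each key has been seen.

```haskell
import Data.IORef (newIORef, atomicModifyIORef')
import qualified Data.Map as Map

-- Create a function with its own private, persistent table.
-- Each call bumps the count for the given key and returns the new count.
newCounter :: IO (String -> IO Int)
newCounter = do
    ref <- newIORef (Map.empty :: Map.Map String Int)
    return $ \key -> atomicModifyIORef' ref $ \m ->
        let n = Map.findWithDefault 0 key m + 1
        in  (Map.insert key n m, n)
```

The map itself stays immutable; only the reference is updated to point at the new version, so older versions remain valid if anything else still holds them.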
