I’m making a lens combinator in Ruby and I can’t figure out what the generally accepted name of it is. The unknown function composes two lenses that have the same source type and their target type (using Benjamin C. Pierce’s terminology) is a hash map. The unknown function takes in those two lenses and returns a new lens which has the same source type and target type as the original lenses.
It looks like this (Ruby syntax):
lens_a.get(source)
> {:title => "some title"}
lens_b.get(source)
> {:description => "some description"}
new_lens = unknown_function(lens_a, lens_b)
new_lens.get(source)
> {:title => "some title", :description => "some description"}
A diagram of the combinator I’m trying to build can be seen on slide 18 of this presentation (the slide’s title is “Merge?”).
I’ve looked at Haskell's lens docs (small parts of which I can understand), but I can’t figure out which combinator this is.
What is the standard name for the unknown_function above? If this lens doesn’t have a standard name, are there a few standard functions that can be composed to make it? If not, I’ll probably just call it merge.
I believe that roughly the right idea for your combinator, in Haskell, is
merge :: Lens' s t -> Lens' s t -> Lens' s (t, t)
possibly generalized to
merge :: Lens' s t -> Lens' s t' -> Lens' s (t, t')
so that your two targets may differ in type. We could "implement" this as follows, but doing so will reveal a problem
merge l1 l2 = lens pull push where
pull s = (view l1 s, view l2 s)
push s (t1, t2) = set l2 t2 (set l1 t1 s)
In particular, on the Setter side of this equation we explicitly order the way we put back our values. This means that we can use merge to easily build lenses which violate the lens laws. In particular, they violate the PUT-GET law. Here's a counterexample
newtype Only a = Only a
-- A lens focused on the only slot in Only
only :: Lens' (Only a) a
only inj (Only a) = Only <$> inj a
-- We merge this lens with *itself*
testLens :: Lens' (Only a) (a, a)
testLens = merge only only
-- and now it violates the lens laws
> view testLens (Only 1 & testLens .~ (1,2))
(2,2)
Or, in plain English, if we merge a lens with itself then the Setter side turns into two sequential sets on the same location. This means that if we try to set it with a pair of different values, only the second set will persist, and thus we violate the PUT-GET law.
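If you want to try the counterexample without installing lens, here is a minimal sketch using only base. The record type Lens, the helpers fstL, sndL and putGetHolds are names made up for this illustration (they are not part of the lens library); merge follows the sequential-write definition above.

```haskell
-- A plain getter/setter record, a hypothetical stand-in for Lens'.
data Lens s a = Lens { getL :: s -> a, putL :: a -> s -> s }

-- merge writes through l1 first and through l2 second, as above.
merge :: Lens s a -> Lens s b -> Lens s (a, b)
merge l1 l2 = Lens
  { getL = \s -> (getL l1 s, getL l2 s)
  , putL = \(a, b) s -> putL l2 b (putL l1 a s)
  }

newtype Only a = Only a deriving (Eq, Show)

-- A lens focused on the only slot in Only.
only :: Lens (Only a) a
only = Lens { getL = \(Only a) -> a, putL = \a _ -> Only a }

-- Two lenses that touch disjoint parts of a pair.
fstL :: Lens (x, y) x
fstL = Lens { getL = fst, putL = \x (_, y) -> (x, y) }

sndL :: Lens (x, y) y
sndL = Lens { getL = snd, putL = \y (x, _) -> (x, y) }

-- PUT-GET: reading back what was just written returns it unchanged.
putGetHolds :: Eq a => Lens s a -> a -> s -> Bool
putGetHolds l a s = getL l (putL l a s) == a
```

With these definitions, putGetHolds (merge only only) (1, 2) (Only 0) is False because both writes land on the same slot, while putGetHolds (merge fstL sndL) (1, 2) (0, 0) is True because the two lenses never overlap.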
The lens library tends to eschew invalid lenses so this combinator isn't available. The closest we have is alongside which has the following (restricted) type
alongside :: Lens' s a
-> Lens' s' a'
-> Lens' (s,s') (a,a')
but, as you can see, this ensures that our source is also a product type so that the Setters apply uniquely to each side of the source. If we were to try to write a dup Lens which we could compose with alongside to build merge we'll run into the same problem as before
dup :: Lens' a (a, a) -- there are two possible implementations, neither is a lawful lens
dup1 inj a = fst <$> inj (a, a)
dup2 inj a = snd <$> inj (a, a)
Now, all of this might be fairly technical and meaningless from a Ruby perspective. In Ruby you're not going to have compiler enforced type safety and so you'll likely blend a lot of the rules together and achieve less strictness. In such a case, it might be meaningful to implement merge with its sequential writes semantics.
In fact, based on the examples you've given, it looks like the counterexample I pointed out cannot even happen in Ruby since your target type is a Hash and ensures unique keys already.
But it's important to note that merge can be used to build improper lenses if you look at it hard enough. Fundamentally what that means is that in more complex scenarios, using merge can easily lead to weird bugs caused by it violating our intuitive sense for what a lens is. This may not be a huge problem for Ruby as complex abstraction is frowned upon in the Ruby community anyway, but that's why we have the laws.
It seems to me that you want to combine two lenses into a traversal. I don't see that combinator anywhere, but I'm new to lens so I'm not sure what it would be called anyway.
Something like:
compoundEye :: Traversable tl => tl (Lens s t a b) -> Traversal s t a b
But, the hoogle results are not promising.
I have a program which traverses an expression tree that does algebra on probability distributions, either sampling or computing the resulting distribution.
I have two implementations computing the distribution: one (computeDistribution) nicely reusable with monad transformers and one (simpleDistribution) where I concretize everything by hand. I would like to not concretize everything by hand, since that would be code duplication between the sampling and computing code.
I also have two data representations:
type Measure a = [(a, Rational)]
-- data Distribution a = Distribution (Measure a) deriving Show
newtype Distribution a = Distribution (Measure a) deriving Show
When I use the data version with the reusable code, computing the distribution of 20d2 (ghc -O3 program.hs; time ./program 20 > /dev/null) takes about one second, which seems way too long. Pick higher values of n at your own peril.
When I use the hand-concretized code, or I use the newtype representation with either implementation, computing 20d2 (time ./program 20 s > /dev/null) takes the blink of an eye.
Why?
How can I find out why?
My knowledge of how Haskell is executed is almost nil. I gather there's a graph of thunks in basically the same shape as the program, but that's about all I know.
I figure with newtype the representation of Distribution is the same as that of Measure, i.e. it's just a list, whereas with the data version each Distribution is kinda' like a single-field record, except with a pointer to the contained list, and so the data version has to perform more allocations. Is this true? If true, is this enough to explain the performance difference?
I'm new to working with monad transformer stacks. Consider the Let and Uniform cases in simpleDistribution — do they do the same as the walkTree-based implementation? How do I tell?
Here's my program. Note that Uniform n corresponds to rolling an n-sided die (in case the unary-ness was surprising).
Update: based on comments I simplified my program by removing everything not contributing to the performance gap. I made two semantic changes: probabilities are now denormalized and all wonky and wrong, and the simplification step is gone. But the essential shape of my program is still there. (See question edit history for the non-simplified program.)
Update 2: I made further simplifications, reducing Distribution down to the list monad with a small twist, removing everything to do with probabilities, and shortening the names. I still observe large performance differences when using data but not newtype.
import Control.Monad (liftM2)
import Control.Monad.Trans (lift)
import Control.Monad.Reader (ReaderT, runReaderT)
import System.Environment (getArgs)
import Text.Read (readMaybe)
main = do
args <- getArgs
let dieCount = case map readMaybe args of Just n : _ -> n; _ -> 10
let f = if ["s"] == (take 1 $ drop 1 $ args) then fast else slow
print $ f dieCount
fast, slow :: Int -> P Integer
fast n = walkTree n
slow n = walkTree n `runReaderT` ()
walkTree 0 = uniform
walkTree n = liftM2 (+) (walkTree 0) (walkTree $ n - 1)
data P a = P [a] deriving Show
-- newtype P a = P [a] deriving Show
class Monad m => MonadP m where uniform :: m Integer
instance MonadP P where uniform = P [1, 1]
instance MonadP p => MonadP (ReaderT env p) where uniform = lift uniform
instance Functor P where fmap f (P pxs) = P $ fmap f pxs
instance Applicative P where
pure x = P [x]
(P pfs) <*> (P pxs) = P $ pfs <*> pxs
instance Monad P where
(P pxs) >>= f = P $ do
x <- pxs
case f x of P fxs -> fxs
How can I find out why?
This is, in general, hard.
The extreme way to do it is to look at the core code (which you can produce by running GHC with -ddump-simpl). This can get complicated really quickly, and it's basically a whole new language to learn. Your program is already big enough that I had trouble learning much from the core dump.
The other way to find out why is to just keep using GHC and asking questions and learning about GHC optimizations until you recognize certain patterns.
Why?
In short, I believe it's due to list fusion.
NOTE: I don't know for sure that this answer is correct, and it would take more time/work to verify than I'm willing to put in right now. That said, it fits the evidence.
First off, we can check whether the slowdown you're seeing is something truly fundamental or the result of a GHC optimization firing (or not) by compiling with -O0, that is, without optimizations. In this mode, both Distribution representations result in about the same (excruciatingly long) runtime. This leads me to believe that the data representation is not inherently the problem; rather, there is an optimization that is triggered with the newtype version but not with the data version.
When GHC is run in -O1 or higher, it engages certain rewrite rules to fuse different folds and maps of lists together so that it doesn't need to allocate intermediate values. (See https://markkarpov.com/tutorial/ghc-optimization-and-fusion.html#fusion for a decent tutorial on this concept as well as https://stackoverflow.com/a/38910170/14802384 which additionally has a link to a gist with all of the rewrite rules in base.) Since computeDistribution is basically just a bunch of list manipulations (which are all essentially folds), there is the potential for these to fire.
The key is that with the newtype representation of Distribution, the newtype wrapper is erased during compilation, and the list operations are allowed to fuse. However, with the data representation, the wrappers are not erased, and the rewrite rules do not fire.
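As a small illustration of the "erased wrapper" point (not of fusion itself), Data.Coerce from base only converts between a newtype and its underlying type, because the two share a run-time representation. The names PN, PD, unwrapN and unwrapD below are made up for this sketch.

```haskell
import Data.Coerce (coerce)

newtype PN a = PN [a] deriving (Eq, Show)  -- same representation as [a]
data    PD a = PD [a] deriving (Eq, Show)  -- a real box around the list

-- For the newtype, unwrapping is a zero-cost coercion.
unwrapN :: PN a -> [a]
unwrapN = coerce

-- For the data type there is no Coercible instance, so we have to
-- pattern match, which inspects an actual constructor at run time.
unwrapD :: PD a -> [a]
unwrapD (PD xs) = xs
```

Writing unwrapD = coerce is rejected by the type checker, which is one way to see that the data wrapper is a genuinely different value from the list it contains.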
Therefore, I will make an unsubstantiated claim: If you want your data representation to be as fast as the newtype one, you will need to set up rewrite rules similar to the ones for list folding but that work over the Distribution type. This may involve writing your own special fold functions and then rewriting your Functor/Applicative/Monad instances to use them.
I am currently developing an advanced text editor with some features.
One of the features is a substitution system. It will help the user replace strings quickly.
It has the following format:
((a x) | x) (d e)
We can divide the string into two parts: left ((a x) | x) and right (d e). If there is a letter (x, in the current case) after a |, then we can perform a substitute action: replace every x in the left side with the string on the right side.
So after the action we will receive (a (d e)).
Of course, these parenthesized expressions might be nested : ((a | a) (d x | d)) (e f | f) -> ((e f | f) x) -> (e x).
Unfortunately, the reductions may continue infinitely: (a a | a) (a a | a).
I need to show a warning if the user writes a string that cannot be reduced to a form with no remaining reductions.
Any suggestions on how to do it?
Congratulations, you have just invented the λ-calculus (pronounced "Lambda-calculus")
Why this problem is hard
Let's use the original notation for this, since it was already invented by Church in the 1930s:
((λx.f) u) is the rewriting of f where all x's have been replaced by u's.
Note that this is equivalent to your notation (f | x) u, where f is a string that can contain x's. It's a mathematical tool introduced to understand what functions are "computable", i.e. can be given to a computer with an adequate program, so that for all input, the computer will run its program and output the correct answer. Spoiler: computable functions are exactly the functions that can be written as λ-terms (i.e. rewritings in your settings).
Sometimes λ-terms can be simplified, as you have noted yourself. For instance
(λx.yx)((λu.uu)(λz.z)i) -> (λx.yx)((λz.z)(λz.z)i) -> (λx.yx)((λz.z)i) -> (λx.yx)i -> yi
Sometimes the sequence of simplifications (also called "evaluation", or "beta-reduction") is infinite. One very famous example is the Omega term (λx.xx)(λx.xx) (rings a bell?).
Ideally, we would like to restrict the λ-calculus so that all terms are "simplifiable" to a final usable form in a finite number of steps. This property is called "normalization". The actual thing we want however is one step further : we want that all sequences of simplifications end up in a finite number of steps in the final form, so that when faced with multiple choices, you can choose either and not get stuck in an infinite loop because of a bad choice. This is called "strong normalization".
Here's the issue : the λ-calculus is not strongly normalizing. This property is simply not true. There are terms that do not end up in a final - "normal" - form.
You have found one yourself.
How this problem has been solved theoretically
The key to gaining the strong normalization property was to rule out λ-terms which did not satisfy this property (print a warning and spit an error in your case) so that we only consider strongly normalizing λ-terms. This "ruling out" was put in place via a typing system : λ-terms that have a valid type are strongly normalizing, and λ-terms that are not strongly normalizing cannot have a valid type. Awesome, now we know which ones should give errors ! Let's check that with simple cases.
From the way you are able to phrase your problem in a very clear way, I'm assuming you already have experience with programming and static type systems, but I can modify this answer if you'd rather have a full explanation. I'll be using Caml-like notation for types so s is a string, s -> s is a function that to a string associates a string, and s -> (s -> s) is a function that to a string associates a function from string to string, etc. I denote x : t when variable x has type t.
λx.ax : s -> s provided a : s -> s
(λx.yx)((λu.uu)(λz.z)i) : s provided y : s -> s, i : s as we have seen by the reductions above
λx.x : s -> s. But watch out, λx.x : (s -> s) -> (s -> s) is also true. Deciding the type is hard
How you may solve this problem programmatically
Your problem is slightly easier, because you are only dealing with string replacements, so you know that the base type of everything is a string, and you can try to "type your way up" until you are either able to type the whole rewriting (i.e. no errors), or able to prove that the rewriting is not typable (see Omega) and spit an error. Watch out though: just because you are not able to type a rewriting does not mean that it cannot be typed!
In your example (λx.ax)(de), you know the actual values for a, d and e so you may have for instance a : s -> s, d : s -> s, e : s, hence de : s and λx.ax : s -> s so the whole thing has type s and you're good to go. You can probably write a compiler that will try to type the rewriting and figure out if it's typable or not based on a set of cleverly-crafted decision rules for the specific use that you want. You can even decide that if the compiler fails to type a rewriting, then it is rejected (even when it's valid) because cases so intricate that they will fail though being correct should never happen in a reasonable editor text substitution scenario.
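If a type checker is more than you want to build, a much cruder fallback is to simply run the reductions with a step budget and warn when the budget runs out. This is a different technique from the typing approach above, and running out of budget does not prove that a term never terminates. The sketch below assumes de Bruijn indices (Var 0 is the nearest enclosing binder) to sidestep variable capture; all names in it are made up for illustration.

```haskell
data Term = Var Int | Lam Term | App Term Term
  deriving (Eq, Show)

-- Shift free variables with index >= cutoff c by d.
shift :: Int -> Int -> Term -> Term
shift d c (Var k)
  | k >= c    = Var (k + d)
  | otherwise = Var k
shift d c (Lam b)   = Lam (shift d (c + 1) b)
shift d c (App f a) = App (shift d c f) (shift d c a)

-- Replace variable j with term s.
subst :: Int -> Term -> Term -> Term
subst j s (Var k)
  | k == j    = s
  | otherwise = Var k
subst j s (Lam b)   = Lam (subst (j + 1) (shift 1 0 s) b)
subst j s (App f a) = App (subst j s f) (subst j s a)

-- One leftmost-outermost reduction step, if any redex exists.
step :: Term -> Maybe Term
step (App (Lam b) a) = Just (shift (-1) 0 (subst 0 (shift 1 0 a) b))
step (App f a) = case step f of
  Just f' -> Just (App f' a)
  Nothing -> App f <$> step a
step (Lam b) = Lam <$> step b
step (Var _) = Nothing

-- Reduce with a step budget; Nothing means "gave up, show a warning".
normalize :: Int -> Term -> Maybe Term
normalize fuel t = case step t of
  Nothing -> Just t
  Just t'
    | fuel > 0  -> normalize (fuel - 1) t'
    | otherwise -> Nothing

-- (λx.xx)(λx.xx)
omega :: Term
omega = App w w where w = Lam (App (Var 0) (Var 0))

-- (λx.ax)(de) with free variables a, d, e numbered 0, 1, 2 outside the λ.
example :: Term
example = App (Lam (App (Var 1) (Var 0))) (App (Var 1) (Var 2))
```

Here normalize 1000 example yields the term for a(de), while normalize 1000 omega returns Nothing, which is where the editor would show its warning. The budget of 1000 is arbitrary.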
Do you want to solve this programmatically?
No. I mean it. You really don't want to.
Remember, λ-terms describe all computable functions.
If you were to really implement a fully correct warning generator as you seem to intend to, this means that one could encode any program as a string substitution in your editor, so your editor is essentially a programming language on its own, which is typically not what you want your editor to be.
I mean, writing a full program that queries the webcam, analyzes the image, guesses who is typing, and loads an ad based on Google's suggestion when opening a file, all as a cleverly-written text substitution command, really ? Is that what you're trying to achieve ?
PS: If this is indeed what you're trying to achieve, have a look at Lisp and Haskell, their syntax should look somewhat... familiar to what you've seen here.
And good luck with your editor !
I'm trying to do some hoopy type-level programming, and it just doesn't work. I'm tearing my hair out trying to figure out why the heck GHC utterly fails to infer the type signatures I want.
Is there some way to make GHC tell me what it's doing?
I tried -ddump-tc, which just prints out the final type signatures. (Yes, they're wrong. Thanks, I already knew that.)
I also tried -ddump-tc-trace, which dumps out ~70KB of unintelligible gibberish. (In particular, I can't see any user-written identifiers mentioned anywhere.)
My code is so close to working, but somehow an extra type variable keeps appearing. For some reason, GHC can't see that this variable should be completely determined. Indeed, if I manually write the five-mile type signature, GHC happily accepts it. So I'm clearly just missing a constraint somewhere... but where?!? >_<
As has been mentioned in the comments, poking around with :kind and :kind! in GHCi is usually how I go about doing it, but it also surprisingly matters where you place the functions, and what looks like it should be the same, isn't always.
For instance, I was trying to make a dependently typed functor equivalent, for a personal project, which looked like
class IFunctor f where
ifmap :: (a -> b) -> f n a -> f n b
and I was writing the instance for
data IEither a n b where
ILeft :: a -> IEither a Z b
IRight :: b -> IEither a (S n) b
It should be fairly simple, I thought, just ignore f for the left case, apply it in the right.
I tried
instance IFunctor (IEither a) where
ifmap _ l@(ILeft _) = l
ifmap f (IRight r) = IRight $ f r
but for the specialized version of ifmap in this case, ifmap :: (b -> c) -> IEither a Z b -> IEither a Z c, Haskell inferred the type of l to be IEither a Z b on the LHS, which makes sense, but then refused to produce b ~ c.
So, I had to unwrap l, get the value of type a, then rewrap it to get the IEither a Z c.
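The unwrap-and-rewrap version might look like the sketch below. The empty types Z and S and the helper toEither are assumptions added so the example stands on its own; the class and the GADT are as given above.

```haskell
{-# LANGUAGE GADTs #-}

-- Type-level naturals, assumed here as plain empty types.
data Z
data S n

class IFunctor f where
  ifmap :: (a -> b) -> f n a -> f n b

data IEither a n b where
  ILeft  :: a -> IEither a Z b
  IRight :: b -> IEither a (S n) b

instance IFunctor (IEither a) where
  -- Rebuilding with ILeft gives a fresh value whose last type
  -- parameter is free, so it can be used at the result type.
  ifmap _ (ILeft a)  = ILeft a
  ifmap f (IRight r) = IRight (f r)

-- Hypothetical helper for inspecting results.
toEither :: IEither a n b -> Either a b
toEither (ILeft a)  = Left a
toEither (IRight b) = Right b
```

The as-pattern version fails because l keeps its original type, whereas ILeft a is constructed again at whatever type the context demands.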
This isn't just the case with dependent types, but also with rank-n types.
For instance, I was trying to convert isomorphisms of a proper form into natural transformations, which should be fairly easy, I thought.
Apparently, I had to put the deconstructors in a where clause of the function, because otherwise type inference didn't work properly.
There are at least three popular libraries for accessing and manipulating fields of records. The ones I know of are: data-accessor, fclabels and lenses.
Personally I started with data-accessor and I'm using them now. However recently on haskell-cafe there was an opinion of fclabels being superior.
Therefore I'm interested in comparison of those three (and maybe more) libraries.
There are at least 4 libraries that I am aware of providing lenses.
The notion of a lens is that it provides something isomorphic to
data Lens a b = Lens (a -> b) (b -> a -> a)
providing two functions: a getter, and a setter
get (Lens g _) = g
put (Lens _ s) = s
subject to three laws:
First, that if you put something, you can get it back out
get l (put l b a) = b
Second that getting and then setting doesn't change the answer
put l (get l a) a = a
And third, putting twice is the same as putting once, or rather, that the second put wins.
put l b1 (put l b2 a) = put l b1 a
Note, that the type system isn't sufficient to check these laws for you, so you need to ensure them yourself no matter what lens implementation you use.
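Since the laws are on you, it can help to write them down as predicates and spot-check them. A minimal sketch using the Lens type above; fstLens, broken and the three law* functions are names invented for this example.

```haskell
data Lens a b = Lens (a -> b) (b -> a -> a)

get :: Lens a b -> a -> b
get (Lens g _) = g

put :: Lens a b -> b -> a -> a
put (Lens _ s) = s

-- A lawful lens onto the first component of a pair.
fstLens :: Lens (x, y) x
fstLens = Lens fst (\x (_, y) -> (x, y))

-- A deliberately broken lens: its setter ignores the new value.
broken :: Lens (x, y) x
broken = Lens fst (\_ a -> a)

lawPutGet :: Eq b => Lens a b -> b -> a -> Bool
lawPutGet l b a = get l (put l b a) == b

lawGetPut :: Eq a => Lens a b -> a -> Bool
lawGetPut l a = put l (get l a) a == a

lawPutPut :: Eq a => Lens a b -> b -> b -> a -> Bool
lawPutPut l b1 b2 a = put l b1 (put l b2 a) == put l b1 a
```

fstLens passes all three checks on any input you try, while lawPutGet broken 1 (0, 'x') is False even though broken has exactly the same type. Checks like these only test particular inputs, of course; they are not a proof.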
Many of these libraries also provide a bunch of extra combinators on top, and usually some form of template haskell machinery to automatically generate lenses for the fields of simple record types.
With that in mind, we can turn to the different implementations:
Implementations
fclabels
fclabels is perhaps the most easily reasoned about of the lens libraries, because its a :-> b can be directly translated to the above type. It provides a Category instance for (:->) which is useful as it allows you to compose lenses. It also provides a lawless Point type which generalizes the notion of a lens used here, and some plumbing for dealing with isomorphisms.
One hindrance to the adoption of fclabels is that the main package includes the template-haskell plumbing, so the package is not Haskell 98, and it also requires the (fairly non-controversial) TypeOperators extension.
data-accessor
[Edit: data-accessor is no longer using this representation, but has moved to a form similar to that of data-lens. I'm keeping this commentary, though.]
data-accessor is somewhat more popular than fclabels, in part because it is Haskell 98. However, its choice of internal representation makes me throw up in my mouth a little bit.
The type T it uses to represent a lens is internally defined as
newtype T r a = Cons { decons :: a -> r -> (a, r) }
Consequently, in order to get the value of a lens, you must submit an undefined value for the 'a' argument! This strikes me as an incredibly ugly and ad hoc implementation.
That said, Henning has included the template-haskell plumbing to automatically generate the accessors for you in a separate 'data-accessor-template' package.
It has the benefit of a decently large set of packages that already employ it, being Haskell 98, and providing the all-important Category instance, so if you don't pay attention to how the sausage is made, this package is actually a pretty reasonable choice.
lenses
Next, there is the lenses package, which observes that a lens can provide a state monad homomorphism between two state monads, by defining lenses directly as such monad homomorphisms.
If it actually bothered to provide a type for its lenses, they would have a rank-2 type like:
newtype Lens s t = Lens (forall a. State t a -> State s a)
As a result, I rather don't like this approach, as it needlessly yanks you out of Haskell 98 (if you want a type to provide to your lenses in the abstract) and deprives you of the Category instance for lenses, which would let you compose them with (.). The implementation also requires multi-parameter type classes.
Note, all of the other lens libraries mentioned here provide some combinator or can be used to provide this same state focalization effect, so nothing is gained by encoding your lens directly in this fashion.
Furthermore, the side-conditions stated at the start don't really have a nice expression in this form. As with 'fclabels', this does provide a template-haskell method for automatically generating lenses for a record type directly in the main package.
Because of the lack of Category instance, the baroque encoding, and the requirement of template-haskell in the main package, this is my least favorite implementation.
data-lens
[Edit: As of 1.8.0, these have moved from the comonad-transformers package to data-lens]
My data-lens package provides lenses in terms of the Store comonad.
newtype Lens a b = Lens (a -> Store b a)
where
data Store b a = Store (b -> a) b
Expanded this is equivalent to
newtype Lens a b = Lens (a -> (b, b -> a))
You can view this as factoring out the common argument from the getter and the setter to return a pair consisting of the result of retrieving the element, and a setter to put a new value back in. This offers the computational benefit that the 'setter' here can recycle some of the work used to get the value out, making for a more efficient 'modify' operation than in the fclabels definition, especially when accessors are chained.
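To make the "recycle some of the work" point concrete, here is a small sketch of get, put and modify over the Store representation. The function names getL, putL, modL and the example lens sndLens are illustrative and are not claimed to match the data-lens API.

```haskell
data Store b a = Store (b -> a) b

newtype Lens a b = Lens (a -> Store b a)

getL :: Lens a b -> a -> b
getL (Lens f) a = case f a of Store _ b -> b

putL :: Lens a b -> b -> a -> a
putL (Lens f) b a = case f a of Store rebuild _ -> rebuild b

-- modify takes the structure apart once and reuses the rebuild
-- function, instead of doing a separate get followed by a put.
modL :: Lens a b -> (b -> b) -> a -> a
modL (Lens f) g a = case f a of Store rebuild b -> rebuild (g b)

-- A lens onto the second component of a pair.
sndLens :: Lens (x, y) y
sndLens = Lens (\(x, y) -> Store (\y' -> (x, y')) y)
```

For example, modL sndLens (+ 1) ('a', 1) gives ('a', 2) with a single traversal of the pair.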
There is also a nice theoretical justification for this representation, because the subset of 'Lens' values that satisfy the 3 laws stated in the beginning of this response are precisely those lenses for which the wrapped function is a 'comonad coalgebra' for the store comonad. This transforms 3 hairy laws for a lens l down to 2 nicely pointfree equivalents:
extract . l = id
duplicate . l = fmap l . l
This approach was first noted and described in Russell O'Connor's Functor is to Lens as Applicative is to Biplate: Introducing Multiplate and was blogged about based on a preprint by Jeremy Gibbons.
It also includes a number of combinators for working with lenses strictly and some stock lenses for containers, such as Data.Map.
So the lenses in data-lens form a Category (unlike the lenses package), are Haskell 98 (unlike fclabels/lenses), are sane (unlike the back end of data-accessor) and provide a slightly more efficient implementation. In addition, data-lens-fd provides the functionality for working with MonadState for those willing to step outside of Haskell 98, and the template-haskell machinery is now available via data-lens-template.
Update 6/28/2012: Other Lens Implementation Strategies
Isomorphism Lenses
There are two other lens encodings worth considering. The first gives a nice theoretical way to view a lens as a way to break a structure into the value of the field, and 'everything else'.
Given a type for isomorphisms
data Iso a b = Iso { hither :: a -> b, yon :: b -> a }
such that valid members satisfy hither . yon = id, and yon . hither = id
We can represent a lens with:
data Lens a b = forall c. Lens (Iso a (b,c))
These are primarily useful as a way to think about the meaning of lenses, and we can use them as a reasoning tool to explain other lenses.
van Laarhoven Lenses
We can model lenses such that they can be composed with (.) and id, even without a Category instance by using
type Lens a b = forall f. Functor f => (b -> f b) -> a -> f a
as the type for our lenses.
Then defining a lens is as easy as:
_2 f (a,b) = (,) a <$> f b
and you can validate for yourself that function composition is lens composition.
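One way to do that validation is to define view and set by picking the Const and Identity functors from base, and then compose _2 with itself using plain (.). The helper names view, over, set and nested below are chosen for this sketch (they mirror common usage but are defined locally here).

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

type Lens a b = forall f. Functor f => (b -> f b) -> a -> f a

_2 :: Lens (c, b) b
_2 f (a, b) = (,) a <$> f b

-- Choosing Const turns the lens into a getter.
view :: Lens a b -> a -> b
view l = getConst . l Const

-- Choosing Identity turns the lens into a modifier.
over :: Lens a b -> (b -> b) -> a -> a
over l g = runIdentity . l (Identity . g)

set :: Lens a b -> b -> a -> a
set l b = over l (const b)

-- Ordinary function composition focuses on the innermost component.
nested :: Lens (x, (y, z)) z
nested = _2 . _2
```

With this, view nested (1, (2, 3)) is 3 and set nested 9 (1, (2, 3)) is (1, (2, 9)), so the composite behaves as a lens onto the inner second component.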
I've recently written on how you can further generalize van Laarhoven lenses to get lens families that can change the types of fields, just by generalizing this signature to
type LensFamily a b c d = forall f. Functor f => (c -> f d) -> a -> f b
This does have the unfortunate consequence that the best way to talk about lenses is to use rank 2 polymorphism, but you don't need to use that signature directly when defining lenses.
The Lens I defined above for _2 is actually a LensFamily.
_2 :: Functor f => (a -> f b) -> (c,a) -> f (c, b)
I've written a library that includes lenses, lens families, and other generalizations including getters, setters, folds and traversals. It is available on hackage as the lens package.
Again, a big advantage of this approach is that library maintainers can actually create lenses in this style in your libraries without incurring any lens library dependency whatsoever, by just supplying functions with type Functor f => (b -> f b) -> a -> f a, for their particular types 'a' and 'b'. This greatly lowers the cost of adoption.
Since you don't need to actually use the package to define new lenses, it takes a lot of pressure off my earlier concerns about keeping the library Haskell 98.
I have a function that takes a parameter and produces a result. Unfortunately, it takes quite long for the function to produce the result. The function is being called quite often with the same input, that's why it would be convenient if I could cache the results. Something like
let cachedFunction = createCache slowFunction
in (cachedFunction 3.1) + (cachedFunction 4.2) + (cachedFunction 3.1)
I was looking into Data.Array and although the array is lazy, I need to initialize it with a list of pairs (using listArray), which is impractical. If the 'key' is e.g. the 'Double' type, I cannot initialize it at all, and even if I can theoretically assign an Integer to every possible input, I have several tens of thousands possible inputs and I only actually use a handful. I would need to initialize the array (or, preferably a hash table, as only a handful of results will be used) using a function instead of a list.
Update: I am reading the memoization articles and as far as I understand it the MemoTrie could work the way I want. Maybe. Could somebody try to produce the 'cachedFunction'? Preferably for a slow function that takes 2 Double arguments? Or, alternatively, that takes one Int argument in a domain of ~ [0..1 billion] that wouldn't eat all memory?
Well, there's Data.HashTable. Hash tables don't tend to play nicely with immutable data and referential transparency, though, so I don't think it sees a lot of use.
For a small number of values, stashing them in a search tree (such as Data.Map) would probably be fast enough. If you can put up with doing some mangling of your Doubles, a more robust solution would be to use a trie-like structure, such as Data.IntMap; these have lookup times proportional primarily to key length, and roughly constant in collection size. If Int is too limiting, you can dig around on Hackage to find trie libraries that are more flexible in the type of key used.
As for how to cache the results, I think what you want is usually called "memoization". If you want to compute and memoize results on demand, the gist of the technique is to define an indexed data structure containing all possible results, in such a way that when you ask for a specific result it forces only the computations needed to get the answer you want. Common examples usually involve indexing into a list, but the same principle should apply for any non-strict data structure. As a rule of thumb, non-function values (including infinite recursive data structures) will often be cached by the runtime, but not function results, so the trick is to wrap all of your computations inside a top-level definition that doesn't depend on any arguments.
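The list-indexing version of that idea is the usual textbook illustration. In this sketch memoFib is defined without arguments, so the lazily built table is a single shared value and each entry is computed at most once; Fibonacci is used only because it is the standard example.

```haskell
-- memoFib takes no explicit argument, so `table` is shared across calls.
memoFib :: Int -> Integer
memoFib = (table !!)
  where
    -- A lazy, conceptually infinite list of all results.
    table = map fib [0 ..]
    fib 0 = 0
    fib 1 = 1
    -- Recursive calls go back through the table, not through fib.
    fib n = memoFib (n - 1) + memoFib (n - 2)
```

Indexing a list is linear in the index, so this is only a demonstration of the principle; a trie or similar structure is the better fit for large or sparse domains.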
Edit: MemoTrie example ahoy!
This is a quick and dirty proof of concept; better approaches may exist.
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
import Data.MemoTrie
import Data.Binary
import Data.ByteString.Lazy hiding (map)
mangle :: Double -> [Int]
mangle = map fromIntegral . unpack . encode
unmangle :: [Int] -> Double
unmangle = decode . pack . map fromIntegral
instance HasTrie Double where
data Double :->: a = DoubleTrie ([Int] :->: a)
trie f = DoubleTrie $ trie $ f . unmangle
untrie (DoubleTrie t) = untrie t . mangle
slow x
| x < 1 = 1
| otherwise = slow (x / 2) + slow (x / 3)
memoSlow :: Double -> Integer
memoSlow = memo slow
Do note the GHC extensions used by the MemoTrie package; hopefully that isn't a problem. Load it up in GHCi and try calling slow vs. memoSlow with something like (10^6) or (10^7) to see it in action.
Generalizing this to functions taking multiple arguments or whatnot should be fairly straightforward. For further details on using MemoTrie, you might find this blog post by its author helpful.
See memoization
There are a number of tools in GHC's runtime system explicitly to support memoization.
Unfortunately, memoization isn't really a one-size fits all affair, so there are several different approaches that we need to support in order to cope with different user needs.
You may find the original 1999 writeup useful as it includes several implementations as examples:
Stretching the Storage Manager: Weak Pointers and Stable Names in Haskell by Simon Peyton Jones, Simon Marlow, and Conal Elliott
I will add my own solution, which seems to be quite slow as well. First parameter is a function that returns Int32 - which is unique identifier of the parameter. If you want to uniquely identify it by different means (e.g. by 'id'), you have to change the second parameter in H.new to a different hash function. I will try to find out how to use Data.Map and test if I get faster results.
import qualified Data.HashTable as H
import Data.Int
import System.IO.Unsafe
cache :: (a -> Int32) -> (a -> b) -> (a -> b)
cache ident f = unsafePerformIO $ createfunc
where
createfunc = do
storage <- H.new (==) id
return (doit storage)
doit storage = unsafePerformIO . comp
where
comp x = do
look <- H.lookup storage (ident x)
case look of
Just res -> return res
Nothing -> do
result <- return (f x)
H.insert storage (ident x) result
return result
You can write the slow function as a higher order function, returning a function itself. Thus you can do all the preprocessing inside the slow function and the part that is different in each computation in the returned (hopefully fast) function. An example could look like this:
(SML code, but the idea should be clear)
fun computeComplicatedThing (x:real) (y:real) = (* ... some very complicated computation *)
val computeComplicatedThingFast = computeComplicatedThing 3.14 (* provide x, do computation that needs only x *)
val result1 = computeComplicatedThingFast 2.71 (* provide y, do computation that needs x and y *)
val result2 = computeComplicatedThingFast 2.81
val result3 = computeComplicatedThingFast 2.91
I have several tens of thousands possible inputs and I only actually use a handful. I would need to initialize the array ... using a function instead of a list.
I'd go with listArray (start, end) (map func [start..end])
func doesn't really get called above. Haskell is lazy and creates thunks which will be evaluated when the value is actually required.
When using a normal array you always need to initialize its values. So the work required for creating these thunks is necessary anyhow.
Several tens of thousands is far from a lot. If you had trillions, then I would suggest using a hash table, yada yada.
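Spelled out, that suggestion might look like the following sketch. The function slowSquare and the bounds 0 to 50000 are placeholders invented for the example; Data.Array comes from the array package that ships with GHC.

```haskell
import Data.Array (Array, listArray, (!))

-- Stand-in for the expensive function.
slowSquare :: Int -> Integer
slowSquare n = sum (replicate n (toInteger n))

-- Every cell starts out as an unevaluated thunk; nothing is computed here.
table :: Array Int Integer
table = listArray (0, 50000) (map slowSquare [0 .. 50000])

-- Looking up a cell forces it once; later lookups reuse the result.
cachedSquare :: Int -> Integer
cachedSquare = (table !)
```

Building the array still allocates one thunk per cell, so the cost is proportional to the size of the domain even if only a handful of entries are ever forced.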
I don't know Haskell specifically, but how about keeping existing answers in some hashed data structure (might be called a dictionary, or hashmap)? You can wrap your slow function in another function that first checks the map and only calls the slow function if it hasn't found an answer.
You could make it fancy by limiting the size of the map to a certain size and when it reaches that, throwing out the least recently used entry. For this you would additionally need to keep a map of key-to-timestamp mappings.
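In Haskell, the basic wrapper (without the least-recently-used eviction) could be sketched roughly like this, keeping the map in an IORef so the cache lives in IO. The name mkCached is made up, and Data.Map (an ordered tree from the containers package, not a hash table) is used as a stand-in for the dictionary.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- Build a caching wrapper around a pure function.
mkCached :: Ord k => (k -> v) -> IO (k -> IO v)
mkCached f = do
  ref <- newIORef Map.empty
  pure $ \k -> do
    cache <- readIORef ref
    case Map.lookup k cache of
      -- Already computed: return the stored answer.
      Just v -> pure v
      -- First request for this key: compute, store, return.
      Nothing -> do
        let v = f k
        modifyIORef' ref (Map.insert k v)
        pure v
```

A caller would run cached <- mkCached slowFunction once and then use cached x wherever the result is needed. This version is not thread-safe and never evicts anything; both would need extra work.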