Haskell Range Map library - data-structures

Is there a Haskell library that allows me to have a Map from ranges to values? (Preferably a somewhat efficient one.)
let myRangeMap = RangeMap [(range 1 3, "foo"),(range 2 7, "bar"),(range 9 12, "baz")]
in rangeValues 2
==> ["foo","bar"]

I've written a library to search in overlapping intervals because the existing ones did not fit my needs. I think it may have a more approachable interface than, for example, SegmentTree.
It's available on Hackage: https://hackage.haskell.org/package/IntervalMap

This task is called a stabbing query on a set of intervals. An efficient data structure for it is the (one-dimensional) segment tree.
The SegmentTree package provides an implementation of this data structure, but unfortunately I cannot figure out how to use it. (I feel that the interface of this package does not provide the right level of abstraction.)
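To make the term concrete, here is a minimal sketch of a stabbing query as a plain linear scan. It is only a baseline, not the segment tree itself. The names Range and myRanges are made up for illustration, and treating the ranges as closed (both ends included) is an assumption about what the question intends.

```haskell
-- A closed integer range [lo, hi]; closedness is an assumption of this sketch.
type Range = (Int, Int)

-- Stabbing query: all values whose range contains the point x.
-- This is a linear scan, O(n) per query; a segment tree or interval tree
-- answers the same question faster on large inputs.
rangeValues :: Int -> [(Range, v)] -> [v]
rangeValues x rs = [v | ((lo, hi), v) <- rs, lo <= x, x <= hi]

-- The ranges from the question.
myRanges :: [(Range, String)]
myRanges = [((1, 3), "foo"), ((2, 7), "bar"), ((9, 12), "baz")]
```

With these definitions, rangeValues 2 myRanges yields the two values the question expects for the point 2.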

Perhaps the rangemin library does what you want?
Good old Data.Map (and its more efficient Data.IntMap cousin) has a function
splitLookup :: Ord k => k -> Map k a -> (Map k a, Maybe a, Map k a)
which splits a map into submaps of keys less than / greater than a given key. This can be used for certain kinds of range searching.
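As a rough illustration of that kind of range searching, the sketch below extracts all keys in a closed range with two splitLookup calls. The helper name rangeQuery is hypothetical and not part of containers. Note that this finds keys inside a range, which is a different problem from finding overlapping ranges that contain a point.

```haskell
import qualified Data.Map as Map

-- All entries whose key lies in the closed range [lo, hi].
rangeQuery :: Ord k => k -> k -> Map.Map k a -> Map.Map k a
rangeQuery lo hi m
  | lo > hi   = Map.empty
  | otherwise =
      let (_, atLo, aboveLo) = Map.splitLookup lo m        -- keys > lo
          (inner, atHi, _)   = Map.splitLookup hi aboveLo  -- lo < keys < hi
          -- Put a boundary key back if it was present in the map.
          addIf k = maybe id (Map.insert k)
      in addIf lo atLo (addIf hi atHi inner)
```

Each splitLookup is logarithmic in the size of the map, so the whole query costs O(log n) regardless of how many keys fall in the range.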


How can I specialize low-level functions for performance while keeping high-level functions polymorphic?

I extracted the following minimal example from my production project. My machine learning project is made up of a linear algebra library, a deep learning library, and an application.
The linear algebra library contains a module for matrices based on storable vectors:
module Matrix where
import Data.Vector.Storable hiding (sum)
data Matrix a = Matrix { rows :: Int, cols :: Int, items :: Vector a } deriving (Eq, Show, Read)
item :: Storable a => Int -> Int -> Matrix a -> a
item i j m = unsafeIndex (items m) $ i * cols m + j
multiply :: Storable a => Num a => Matrix a -> Matrix a -> Matrix a
multiply a b = Matrix (rows a) (cols b) $ generate (rows a * cols b) (f . flip divMod (cols b)) where
  f (i, j) = sum $ (\ k -> item i k a * item k j b) <$> [0 .. cols a - 1]
The deep learning library uses the linear algebra library to implement the forward pass through a deep neural network:
module Deep where
import Foreign.Storable
import Matrix
transform :: Storable a => Num a => [Matrix a] -> Matrix a -> Matrix a
transform layers batch = foldr multiply batch layers
And finally the application uses the deep learning library:
import qualified Data.Vector.Storable as VS
import Test.Tasty.Bench
import Matrix
import Deep
main :: IO ()
main = defaultMain [bmultiply] where
  bmultiply = bench "bmultiply" $ nf (items . transform layers) batch where
    m k l c = Matrix k l $ VS.replicate (k * l) c :: Matrix Double
    layers = m 256 256 <$> [0.1, 0.2, 0.3]
    batch = m 256 100 0.4
I like the fact that the deep learning library and, with some exceptions related to BLAS via FFI, also the linear algebra library do not have to worry about concrete types like Float or Double. Unfortunately, this also means that unless specialization takes place, they use boxed values and performance is about 60x worse than it could be (959 ms instead of 16.7 ms).
The only way I have found to get good performance is to force either inlining or specialization throughout the entire call hierarchy via compiler pragmas. This is very annoying because the performance issue that fundamentally should be specific to the multiply function now "infects" the entire code base. Even very high-level functions using multiply via 5 levels of indirection and several intermediate libraries somehow have to "know" about technical specialization issues deep down.
In my actual production code, many more functions are affected than in this minimal example. Forgetting to annotate just a single one of these functions with the right compiler pragma immediately destroys the performance. Additionally, when developing a library, I have no way of knowing which types it will be used with, so specialization pragmas are not an option anyways.
This is particularly unfortunate because all the performance-critical tight loops are wholly contained within the multiply function. The function itself is only called a handful of times and it would not hurt performance if values were only unboxed dynamically whenever multiply is called. In the end, there is really no need for values to be specialized and unboxed inside the high-level machine learning functions. I feel like there should be a way to pass the request for specialization through to the low-level functions while keeping high- and intermediate-level functions polymorphic.
How is this problem typically solved in Haskell? If I develop a library that uses the vector package to generate blazingly-fast code in tight loops, how do I pass that performance on to users of my library without losing all polymorphism or forcing everything to be inlined?
Is there a way to pay the price for polymorphism (in the form of boxing) only within the high-level functions and specialize and unbox only at the boundary to the functions that need it, rather than having specialization "infect" the entire call hierarchy?
If you browse the source for, say, the vector package, you'll find that nearly every function has an INLINABLE or INLINE pragma, whether the function is part of the low-level, performance critical core or part of a high-level generic interface. You'll see something similar if you look at lens or hmatrix, etc.
So, the short answer is: no, the only way to get good performance with your current design will be to infect the entire call hierarchy with pragmas. The best way to avoid missing a pragma and tanking performance will be to have an exhaustive set of benchmarks that can detect performance regressions.
There are a few compiler flags that might be helpful. The flag -fexpose-all-unfoldings makes sure that inlinable versions of all functions find their way into the interface files, while the flag -fspecialise-aggressively looks for any opportunity to specialize those functions. Together, they are kind of like turning on INLINE for every function. This probably isn't a good permanent solution, but it might be useful during development or as a sanity check to get some baseline performance numbers.
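For reference, a minimal sketch of what such pragmas look like on a small polymorphic function. The function dot is a made-up stand-in and not taken from the question's code. The command line in the comment simply combines the two flags mentioned above with -O2, and Main.hs is a placeholder file name.

```haskell
-- INLINABLE keeps the unfolding of `dot` in the interface file, so that
-- call sites in other modules can be specialised to a concrete type.
{-# INLINABLE dot #-}
dot :: Num a => [a] -> [a] -> a
dot xs ys = sum (zipWith (*) xs ys)

-- Optionally pre-generate a specialised copy for a type known in advance.
{-# SPECIALIZE dot :: [Double] -> [Double] -> Double #-}

-- A polymorphic caller needs its own pragma, which is how the
-- annotations spread up the call hierarchy.
{-# INLINABLE norm2 #-}
norm2 :: Num a => [a] -> a
norm2 xs = dot xs xs

-- The development-time alternative with the flags described above:
--   ghc -O2 -fexpose-all-unfoldings -fspecialise-aggressively Main.hs
```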

Why does refactoring data to newtype speed up my Haskell program?

I have a program which traverses an expression tree that does algebra on probability distributions, either sampling or computing the resulting distribution.
I have two implementations computing the distribution: one (computeDistribution) nicely reusable with monad transformers and one (simpleDistribution) where I concretize everything by hand. I would like to not concretize everything by hand, since that would be code duplication between the sampling and computing code.
I also have two data representations:
type Measure a = [(a, Rational)]
-- data Distribution a = Distribution (Measure a) deriving Show
newtype Distribution a = Distribution (Measure a) deriving Show
When I use the data version with the reusable code, computing the distribution of 20d2 (ghc -O3 program.hs; time ./program 20 > /dev/null) takes about one second, which seems way too long. Pick higher values of n at your own peril.
When I use the hand-concretized code, or I use the newtype representation with either implementation, computing 20d2 (time ./program 20 s > /dev/null) takes the blink of an eye.
How can I find out why?
My knowledge of how Haskell is executed is almost nil. I gather there's a graph of thunks in basically the same shape as the program, but that's about all I know.
I figure with newtype the representation of Distribution is the same as that of Measure, i.e. it's just a list, whereas with the data version each Distribution is kinda' like a single-field record, except with a pointer to the contained list, and so the data version has to perform more allocations. Is this true? If true, is this enough to explain the performance difference?
I'm new to working with monad transformer stacks. Consider the Let and Uniform cases in simpleDistribution — do they do the same as the walkTree-based implementation? How do I tell?
Here's my program. Note that Uniform n corresponds to rolling an n-sided die (in case the unary-ness was surprising).
Update: based on comments I simplified my program by removing everything not contributing to the performance gap. I made two semantic changes: probabilities are now denormalized and all wonky and wrong, and the simplification step is gone. But the essential shape of my program is still there. (See question edit history for the non-simplified program.)
Update 2: I made further simplifications, reducing Distribution down to the list monad with a small twist, removing everything to do with probabilities, and shortening the names. I still observe large performance differences when using data but not newtype.
import Control.Monad (liftM2)
import Control.Monad.Trans (lift)
import Control.Monad.Reader (ReaderT, runReaderT)
import System.Environment (getArgs)
import Text.Read (readMaybe)
main = do
  args <- getArgs
  let dieCount = case map readMaybe args of Just n : _ -> n; _ -> 10
  let f = if ["s"] == (take 1 $ drop 1 $ args) then fast else slow
  print $ f dieCount
fast, slow :: Int -> P Integer
fast n = walkTree n
slow n = walkTree n `runReaderT` ()
walkTree 0 = uniform
walkTree n = liftM2 (+) (walkTree 0) (walkTree $ n - 1)
data P a = P [a] deriving Show
-- newtype P a = P [a] deriving Show
class Monad m => MonadP m where uniform :: m Integer
instance MonadP P where uniform = P [1, 1]
instance MonadP p => MonadP (ReaderT env p) where uniform = lift uniform
instance Functor P where fmap f (P pxs) = P $ fmap f pxs
instance Applicative P where
  pure x = P [x]
  (P pfs) <*> (P pxs) = P $ pfs <*> pxs
instance Monad P where
  (P pxs) >>= f = P $ do
    x <- pxs
    case f x of P fxs -> fxs
How can I find out why?
This is, in general, hard.
The extreme way to do it is to look at the core code (which you can produce by running GHC with -ddump-simpl). This can get complicated really quickly, and it's basically a whole new language to learn. Your program is already big enough that I had trouble learning much from the core dump.
The other way to find out why is to just keep using GHC and asking questions and learning about GHC optimizations until you recognize certain patterns.
In short, I believe it's due to list fusion.
NOTE: I don't know for sure that this answer is correct, and it would take more time/work to verify than I'm willing to put in right now. That said, it fits the evidence.
First off, we can check whether the slowdown you're seeing is the result of something truly fundamental or of a GHC optimization firing or not, by compiling with -O0, that is, without optimizations. In this mode, both Distribution representations result in about the same (excruciatingly long) runtime. This leads me to believe that the data representation is not inherently the problem, but rather that an optimization is triggered with the newtype version that isn't with the data version.
When GHC is run in -O1 or higher, it engages certain rewrite rules to fuse different folds and maps of lists together so that it doesn't need to allocate intermediate values. (See https://markkarpov.com/tutorial/ghc-optimization-and-fusion.html#fusion for a decent tutorial on this concept as well as https://stackoverflow.com/a/38910170/14802384 which additionally has a link to a gist with all of the rewrite rules in base.) Since computeDistribution is basically just a bunch of list manipulations (which are all essentially folds), there is the potential for these to fire.
The key is that with the newtype representation of Distribution, the newtype wrapper is erased during compilation, and the list operations are allowed to fuse. However, with the data representation, the wrappers are not erased, and the rewrite rules do not fire.
Therefore, I will make an unsubstantiated claim: If you want your data representation to be as fast as the newtype one, you will need to set up rewrite rules similar to the ones for list folding but that work over the Distribution type. This may involve writing your own special fold functions and then rewriting your Functor/Applicative/Monad instances to use them.
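To show what such a rule could look like syntactically, here is a small sketch of a user-defined rewrite rule over a data-wrapped list. The function mapP and the rule name are hypothetical. Whether rules of this kind would actually close the performance gap in the program above is untested, in the same spirit as the claim itself.

```haskell
data P a = P [a] deriving (Eq, Show)

-- A map over the wrapped list. Inlining is delayed until phase 1 so that
-- the rule below has a chance to match before mapP is inlined away.
mapP :: (a -> b) -> P a -> P b
mapP f (P xs) = P (map f xs)
{-# NOINLINE [1] mapP #-}

-- Fuse two consecutive traversals into one, avoiding the intermediate P.
-- Rules only fire when compiling with optimization (-O or higher).
{-# RULES
"mapP/mapP" forall f g p. mapP f (mapP g p) = mapP (f . g) p
  #-}

-- A candidate for the rule: written as two passes over the list.
twoPasses :: P Int -> P Int
twoPasses p = mapP (+ 1) (mapP (* 2) p)
```

Compiling with -ddump-rule-firings is one way to check whether the rule was applied.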

How to access CoordinateMatrix entries directly in Spark?

I want to store a big sparse matrix using Spark,
so I tried to use CoordinateMatrix, since it is a distributed matrix.
However, I have not found a way to access an entry directly, with something like:
apply(int x, int y)
I only found functions like:
public RDD<MatrixEntry> entries()
In this case, I have to loop over the entries to find the one I want, which is not efficient.
Has anyone used CoordinateMatrix before?
What should I do to get each entry from CoordinateMatrix efficiently?
The short answer is: you don't. RDDs are not well suited for random access, and CoordinateMatrix is more or less a wrapper around an RDD[MatrixEntry]. Moreover, RDDs are immutable, so you cannot simply modify a single entry. If that is your requirement, you're probably looking at the wrong technology.
There is some limited support for random access if you use a PairRDD. If such an RDD is partitioned, you can use the lookup method to efficiently recover a single value:
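import org.apache.spark.HashPartitioner
import org.apache.spark.mllib.linalg.distributed.MatrixEntry

val n = ??? // Number of partitions
val pairs = mat.entries. // assuming mat is a CoordinateMatrix
  map{case MatrixEntry(i, j, v) => ((i, j), v)}.
  partitionBy(new HashPartitioner(n))
pairs.lookup((1, 1))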

Implementing a basic search engine with a prefix tree

The problem is implementing a prefix tree (trie) in a functional language without using any storage or iteration.
I am trying to solve this problem. How should I approach it? Can you give me an exact algorithm, or a link to an existing implementation in any functional language?
What I am trying to do => create a simple search engine with the following features:
adding a word to the tree
searching for a word in the tree
deleting a word from the tree
Why I want to use a functional language => I want to improve my problem-solving ability a bit further.
NOTE: Since it is my hobby project, I will first implement basic features.
i.) What I mean by "without using storage" => I don't want to use variable storage (e.g. int a), references to variables, or arrays. I want to calculate the result recursively and then show it on the screen.
ii.) I wrote a few lines but then erased them because what I wrote made me angry. Sorry for not showing my effort.
Take a look at Haskell's Data.IntMap. It is a purely functional implementation of a
Patricia trie, and its source is quite readable.
The bytestring-trie package extends this approach to ByteStrings.
There is an accompanying paper, Fast Mergeable Integer Maps, which is also readable and thorough. It describes the implementation step by step: from binary tries to big-endian Patricia trees.
Here is a little extract from the paper.
At its simplest, a binary trie is a complete binary tree of depth
equal to the number of bits in the keys, where each leaf is either
empty, indicating that the corresponding key is unbound, or full, in
which case it contains the data to which the corresponding key is
bound. This style of trie might be represented in Standard ML as
datatype 'a Dict =
    Empty
  | Lf of 'a
  | Br of 'a Dict * 'a Dict
To lookup a value in a binary trie, we simply read the bits of the
key, going left or right as directed, until we reach a leaf.
fun lookup (k, Empty) = NONE
  | lookup (k, Lf x) = SOME x
  | lookup (k, Br (t0,t1)) =
      if even k then lookup (k div 2, t0)
      else lookup (k div 2, t1)
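For readers more at home in Haskell, the extract can be transliterated roughly as follows. The lookupDict function mirrors the paper's lookup. The insertDict function and its explicit depth argument are not part of the extract; they are assumptions added here so that a trie can be built and queried.

```haskell
-- Binary trie over the bits of an Int key, least significant bit first.
data Dict a = Empty | Lf a | Br (Dict a) (Dict a)

-- Follow the key's bits left (even) or right (odd) until a leaf is reached.
lookupDict :: Int -> Dict a -> Maybe a
lookupDict _ Empty = Nothing
lookupDict _ (Lf x) = Just x
lookupDict k (Br t0 t1)
  | even k    = lookupDict (k `div` 2) t0
  | otherwise = lookupDict (k `div` 2) t1

-- Insert at a fixed depth d (the number of key bits), rebuilding only the
-- nodes on the path to the key and sharing every other subtree.
insertDict :: Int -> Int -> a -> Dict a -> Dict a
insertDict 0 _ x _ = Lf x
insertDict d k x t =
  let (t0, t1) = case t of
        Br l r -> (l, r)
        _      -> (Empty, Empty)
      k' = k `div` 2
  in if even k
       then Br (insertDict (d - 1) k' x t0) t1
       else Br t0 (insertDict (d - 1) k' x t1)
```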
The key point in implementing immutable data structures is sharing of both data and structure. To update an object, you create a new version of it that shares as many nodes as possible with the old one. Concretely, for tries the following approach may be used.
Consider such a trie (from Wikipedia):
Imagine that you haven't added the word "inn" yet, but you already have the word "in". To add "inn" you have to create a new instance of the whole trie with "inn" added. However, you are not forced to copy the whole thing: you only need to create a new instance of the root node (the one without a label) and of the right branch. The new root node will point to the new right branch, but to the old versions of the other branches, so with each update most of the structure is shared with the previous state.
However, your keys may be quite long, so recreating the whole branch each time is still both time and space consuming. To lessen this effect, you may share structure inside one node too. Normally each node is a vector or map of all possible outcomes (e.g. in the picture, the node with label "te" has 3 outcomes: "a", "d" and "n"). There are plenty of implementations of immutable maps (Scala, Clojure, see their repositories for more examples), and Clojure also has an excellent implementation of an immutable vector (which is actually a tree).
All operations for creating, updating and searching the resulting tries can be implemented recursively without any mutable state.
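A minimal sketch of such a trie in Haskell, covering the three operations from the question. The type and function names are chosen for illustration, and using Data.Map for the children of each node is one possible choice among several. Every update returns a new trie that shares all untouched branches with the old one.

```haskell
import qualified Data.Map as Map

-- A node records whether a word ends here, plus its children by character.
data Trie = Trie { isWord :: Bool, children :: Map.Map Char Trie }
  deriving (Eq, Show)

empty :: Trie
empty = Trie False Map.empty

-- Rebuild only the nodes along the word's path; siblings are shared.
insert :: String -> Trie -> Trie
insert []     t = t { isWord = True }
insert (c:cs) t =
  let child = Map.findWithDefault empty c (children t)
  in t { children = Map.insert c (insert cs child) (children t) }

member :: String -> Trie -> Bool
member []     t = isWord t
member (c:cs) t = maybe False (member cs) (Map.lookup c (children t))

-- Unmark the word and prune any branch that became empty on the way back.
delete :: String -> Trie -> Trie
delete []     t = t { isWord = False }
delete (c:cs) t = t { children = Map.update prune c (children t) }
  where
    prune child =
      let child' = delete cs child
      in if child' == empty then Nothing else Just child'
```

For example, foldr insert empty ["in", "inn", "tea"] builds a trie in which member "inn" holds and member "i" does not.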

Haskell caching results of a function

I have a function that takes a parameter and produces a result. Unfortunately, it takes quite long for the function to produce the result. The function is being called quite often with the same input, that's why it would be convenient if I could cache the results. Something like
let cachedFunction = createCache slowFunction
in (cachedFunction 3.1) + (cachedFunction 4.2) + (cachedFunction 3.1)
I was looking into Data.Array, and although the array is lazy, I need to initialize it with a list of pairs (using listArray), which is impractical. If the 'key' is e.g. of type 'Double', I cannot initialize it at all, and even if I can theoretically assign an Integer to every possible input, I have several tens of thousands of possible inputs and I only actually use a handful. I would need to initialize the array (or, preferably, a hash table, as only a handful of results will be used) using a function instead of a list.
Update: I am reading the memoization articles, and as far as I understand it, MemoTrie could work the way I want. Maybe. Could somebody try to produce the 'cachedFunction'? Preferably for a slow function that takes 2 Double arguments? Or, alternatively, one that takes one Int argument in a domain of ~ [0..1 billion] and wouldn't eat all memory?
Well, there's Data.HashTable. Hash tables don't tend to play nicely with immutable data and referential transparency, though, so I don't think it sees a lot of use.
For a small number of values, stashing them in a search tree (such as Data.Map) would probably be fast enough. If you can put up with doing some mangling of your Doubles, a more robust solution would be to use a trie-like structure, such as Data.IntMap; these have lookup times proportional primarily to key length, and roughly constant in collection size. If Int is too limiting, you can dig around on Hackage to find trie libraries that are more flexible in the type of key used.
As for how to cache the results, I think what you want is usually called "memoization". If you want to compute and memoize results on demand, the gist of the technique is to define an indexed data structure containing all possible results, in such a way that when you ask for a specific result it forces only the computations needed to get the answer you want. Common examples usually involve indexing into a list, but the same principle should apply for any non-strict data structure. As a rule of thumb, non-function values (including infinite recursive data structures) will often be cached by the runtime, but not function results, so the trick is to wrap all of your computations inside a top-level definition that doesn't depend on any arguments.
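The list-indexing variant mentioned above can be sketched as follows. The Fibonacci function is only a stand-in for a slow function and does not come from the question. The important detail is that memoFib is defined without an explicit argument, so the list of results is a single shared value.

```haskell
-- The lazily built list of all results is created once and shared by
-- every call, because memoFib itself takes no explicit argument.
memoFib :: Int -> Integer
memoFib = (map fib [0 ..] !!)
  where
    fib :: Int -> Integer
    fib 0 = 0
    fib 1 = 1
    -- Recursive calls go through the shared list, not through fib.
    fib n = memoFib (n - 1) + memoFib (n - 2)
```

Indexing a list is linear in the index, so this is mainly a demonstration of the principle; a trie or array gives faster lookups.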
Edit: MemoTrie example ahoy!
This is a quick and dirty proof of concept; better approaches may exist.
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
import Data.MemoTrie
import Data.Binary
import Data.ByteString.Lazy hiding (map)
mangle :: Double -> [Int]
mangle = map fromIntegral . unpack . encode
unmangle :: [Int] -> Double
unmangle = decode . pack . map fromIntegral
instance HasTrie Double where
  data Double :->: a = DoubleTrie ([Int] :->: a)
  trie f = DoubleTrie $ trie $ f . unmangle
  untrie (DoubleTrie t) = untrie t . mangle
slow x
  | x < 1 = 1
  | otherwise = slow (x / 2) + slow (x / 3)
memoSlow :: Double -> Integer
memoSlow = memo slow
Do note the GHC extensions used by the MemoTrie package; hopefully that isn't a problem. Load it up in GHCi and try calling slow vs. memoSlow with something like (10^6) or (10^7) to see it in action.
Generalizing this to functions taking multiple arguments or whatnot should be fairly straightforward. For further details on using MemoTrie, you might find this blog post by its author helpful.
See memoization
There are a number of tools in GHC's runtime system explicitly to support memoization.
Unfortunately, memoization isn't really a one-size fits all affair, so there are several different approaches that we need to support in order to cope with different user needs.
You may find the original 1999 writeup useful as it includes several implementations as examples:
Stretching the Storage Manager: Weak Pointers and Stable Names in Haskell by Simon Peyton Jones, Simon Marlow, and Conal Elliott
I will add my own solution, which seems to be quite slow as well. The first parameter is a function that returns an Int32, which serves as a unique identifier of the argument. If you want to identify it by different means (e.g. by 'id'), you have to change the second parameter of H.new to a different hash function. I will try to find out how to use Data.Map and test if I get faster results.
import qualified Data.HashTable as H
import Data.Int
import System.IO.Unsafe
cache :: (a -> Int32) -> (a -> b) -> (a -> b)
cache ident f = unsafePerformIO $ createfunc
  where
    -- Create the table once and return the caching wrapper around f.
    createfunc = do
      storage <- H.new (==) id
      return (doit storage)
    doit storage = unsafePerformIO . comp
      where
        -- Look the key up first; compute and store only on a miss.
        comp x = do
          look <- H.lookup storage (ident x)
          case look of
            Just res -> return res
            Nothing -> do
              result <- return (f x)
              H.insert storage (ident x) result
              return result
You can write the slow function as a higher order function, returning a function itself. Thus you can do all the preprocessing inside the slow function and the part that is different in each computation in the returned (hopefully fast) function. An example could look like this:
(SML code, but the idea should be clear)
fun computeComplicatedThing (x:real) (y:real) = (* ... some very complicated computation *)
val computeComplicatedThingFast = computeComplicatedThing 3.14 (* provide x, do computation that needs only x *)
val result1 = computeComplicatedThingFast 2.71 (* provide y, do computation that needs x and y *)
val result2 = computeComplicatedThingFast 2.81
val result3 = computeComplicatedThingFast 2.91
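The same idea in Haskell might look like the sketch below. The function expensive and its body are invented for illustration. The point is that the work depending only on x sits in a let outside the returned lambda, so it is shared by all calls of one partially applied function. GHC's optimizer generally preserves this sharing, but that is worth verifying in a real program.

```haskell
-- Work that depends only on x is bound outside the returned function.
expensive :: Double -> (Double -> Double)
expensive x =
  let pre = sum [sin (x * fromIntegral k) | k <- [1 .. 100000 :: Int]]
  in \y -> pre + y  -- the cheap part, which needs both x and y

-- Partially apply once; `pre` is computed at most once for this closure.
fast :: Double -> Double
fast = expensive 3.14

results :: [Double]
results = map fast [2.71, 2.81, 2.91]
```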
I have several tens of thousands possible inputs and I only actually use a handful. I would need to initialize the array ... using a function instead of a list.
I'd go with listArray (start, end) (map func [start..end])
func doesn't really get called above. Haskell is lazy and creates thunks which will be evaluated when the value is actually required.
When using a normal array you always need to initialize its values. So the work required for creating these thunks is necessary anyhow.
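Wrapped up as a reusable helper, the suggestion could look like this sketch. The name memoArray is hypothetical, and the helper is restricted to Int keys for simplicity. The table is shared between calls only if the partially applied function is bound once and then reused.

```haskell
import Data.Array

-- Build a lazy table of thunks over the given bounds. Each element is
-- evaluated at most once, the first time it is looked up.
memoArray :: (Int, Int) -> (Int -> b) -> (Int -> b)
memoArray bnds f = (table !)
  where
    table = listArray bnds (map f (range bnds))

-- Usage: bind the memoized function once, then reuse it.
cachedSquare :: Int -> Int
cachedSquare = memoArray (0, 50000) (\n -> n * n)
```

Looking up an index outside the bounds raises an error, so the bounds have to cover every input that will actually be used.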
Several tens of thousands is far from a lot. If you'd have trillions then I would suggest to use a hash table yada yada
I don't know Haskell specifically, but how about keeping existing answers in some hashed data structure (it might be called a dictionary or hashmap)? You can wrap your slow function in another function that first checks the map and only calls the slow function if it hasn't found an answer.
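In Haskell, one way to sketch this wrapper is with an IORef holding a Data.Map, which keeps the mutable cache explicitly in IO instead of hiding it behind unsafePerformIO. The name memoIO is made up for this example, and an ordered map stands in for the hash table.

```haskell
import Data.IORef
import qualified Data.Map as Map

-- Wrap a pure function with a cache kept in a mutable reference.
memoIO :: Ord a => (a -> b) -> IO (a -> IO b)
memoIO f = do
  ref <- newIORef Map.empty
  return $ \x -> do
    cacheMap <- readIORef ref
    case Map.lookup x cacheMap of
      Just y  -> return y            -- hit: reuse the stored answer
      Nothing -> do                  -- miss: compute, store, return
        let y = f x
        modifyIORef' ref (Map.insert x y)
        return y
```

This sketch is not thread-safe and never evicts entries; the size limit described next would need extra bookkeeping.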
You could make it fancy by limiting the size of the map to a certain size and when it reaches that, throwing out the least recently used entry. For this you would additionally need to keep a map of key-to-timestamp mappings.
