Haskell and Conditional Data Structures

Is it possible to write something like:
data SomeData = SomeValue | (Integral a) => SomeConstructor a
And how exactly would one write this?

This is similar to Daniel Pratt's answer, but the more typical approach is to leave off the type constraints on your data definition, like this:
data SomeData a = SomeValue
                | SomeConstructor a
Instead, you should put the (Integral a) constraint on any functions that require it, which you'd have to do even if you added the constraint to the data definition too. Putting the constraint on the data definition buys you nothing, but forces you to carry around the constraint on all uses of SomeData, even those that don't care what a is at all. See Chapter 10 of Real World Haskell for more.
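A minimal sketch of what that looks like in practice. The function names halve and isPlaceholder are made up for illustration; only the one that actually does arithmetic carries the constraint.

```haskell
-- No constraint on the data declaration itself.
data SomeData a
  = SomeValue
  | SomeConstructor a
  deriving (Show, Eq)

-- Needs integer division, so the Integral constraint goes here.
halve :: Integral a => SomeData a -> SomeData a
halve SomeValue           = SomeValue
halve (SomeConstructor n) = SomeConstructor (n `div` 2)

-- Never inspects the payload, so it needs no constraint at all
-- and works for any a, including non-numeric types.
isPlaceholder :: SomeData a -> Bool
isPlaceholder SomeValue           = True
isPlaceholder (SomeConstructor _) = False
```

With this layout, isPlaceholder (SomeConstructor "text") type-checks, which it would not if the data declaration demanded Integral a.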

For example, using GADTs:
{-# LANGUAGE GADTs #-}
data SomeData where
  SomeValue       :: SomeData
  SomeConstructor :: Integral a => a -> SomeData
Example of usage:
*Main> :t SomeValue
SomeValue :: SomeData
*Main> :t SomeConstructor 15
SomeConstructor 15 :: SomeData
*Main> :t SomeConstructor "aaa"
<interactive>:1:0:
No instance for (Integral [Char])
arising from a use of `SomeConstructor' at <interactive>:1:0-20
Possible fix: add an instance declaration for (Integral [Char])
In the expression: SomeConstructor "aaa"
*Main> let x = SomeConstructor 15 in case x of { SomeConstructor p -> fromIntegral p :: Int }
15

Yes, almost exactly as you wrote it, but you need to state the quantification explicitly:
{-# LANGUAGE ExistentialQuantification #-}
data SomeData = SomeValue
              | forall a . Integral a => SomeConstructor a
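A small sketch of how such a value might be consumed. The names asInteger and mixed are hypothetical. The point is that pattern matching gives you back the Integral operations but not the concrete type, so the usual move is to convert to a known type.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data SomeData
  = SomeValue
  | forall a. Integral a => SomeConstructor a

-- Matching on SomeConstructor brings the Integral instance into scope,
-- but the concrete type is hidden, so we convert to Integer.
asInteger :: SomeData -> Maybe Integer
asInteger SomeValue           = Nothing
asInteger (SomeConstructor n) = Just (toInteger n)

-- Values built from different Integral types share one list type.
mixed :: [SomeData]
mixed = [SomeConstructor (3 :: Int), SomeValue, SomeConstructor (4 :: Integer)]
```

Here map asInteger mixed evaluates to [Just 3, Nothing, Just 4].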

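You can do something like this, although it relies on datatype contexts, which GHC has deprecated (modern versions only accept it with the DatatypeContexts extension):
data Integral a => SomeData a =
    SomeValue
  | SomeConstructor a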

Related

How to turn applicative validation to return MonadThrow?

It seems to me that an idiomatic way to validate input data in Haskell is via an applicative chain:
mkMyData :: a -> b -> c -> Maybe MyData
mkMyData x y z =
  MyData
    <$> validateA x
    <*> validateB y
    <*> validateC z
where the validation functions themselves return Maybe values. To make my smart constructor mkMyData more flexible, I would like it to return MonadThrow. That is,
mkMyData :: MonadThrow m => a -> b -> c -> m MyData
Does this require each of the validation functions to return MonadThrow instead of Maybe? Or is there some way to convert the specific Maybe result of each validation into the more general MonadThrow without breaking up the applicative structure and greatly complicating the code?
Or, to put it differently: is it worthwhile to strive for the more general MonadThrow return type in basic library functions, at the expense of more complex, less idiomatic code?
The answer to this is the same as your last question. The type you propose for your new validation function,
mkMyData :: MonadThrow m => a -> b -> c -> m MyData
means that it is able to work in any monad at all, so long as that monad has a way to throw things. If the implementation of that function relies on being able to return Nothing or Just results explicitly, then it will not satisfy that condition.
Instead, you must rewrite the functions that currently return Maybe a to rely on MonadThrow instead. For example, instead of
validateA :: a -> Maybe t
validateA x | acceptable x = Just $ convert x
            | otherwise    = Nothing
you will need to write
validateA :: MonadThrow m => a -> m t
validateA x | acceptable x = pure $ convert x
            | otherwise    = throwM $ problemWith x
(where all the functions taking x as an argument are made up, needing to be related to your domain somehow).
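For a concrete picture, here is a minimal sketch assuming the exceptions package (Control.Monad.Catch). The Person type, the ValidationError type, and the helper liftMaybe are all invented for illustration. liftMaybe is not part of the answer above: it is one possible adapter for a validator you cannot or do not want to rewrite, and it keeps the applicative chain intact.

```haskell
import Control.Exception (Exception)
import Control.Monad.Catch (MonadThrow, throwM) -- from the exceptions package

-- Hypothetical error type for this sketch.
data ValidationError = EmptyName | NegativeAge Int
  deriving Show

instance Exception ValidationError

data Person = Person String Int
  deriving (Show, Eq)

-- Adapter: reuse an existing Maybe-returning validator by naming the
-- exception to throw when it yields Nothing.
liftMaybe :: MonadThrow m => ValidationError -> Maybe a -> m a
liftMaybe err = maybe (throwM err) pure

-- An old-style validator, left unchanged.
validateName :: String -> Maybe String
validateName s = if null s then Nothing else Just s

-- A validator rewritten against MonadThrow, as the answer suggests.
validateAge :: MonadThrow m => Int -> m Int
validateAge n
  | n >= 0    = pure n
  | otherwise = throwM (NegativeAge n)

-- The smart constructor keeps its applicative shape.
mkPerson :: MonadThrow m => String -> Int -> m Person
mkPerson name age =
  Person
    <$> liftMaybe EmptyName (validateName name)
    <*> validateAge age
```

Callers then pick the monad: mkPerson "Ann" 30 :: Maybe Person gives Just (Person "Ann" 30), while the same call in IO throws a ValidationError on bad input.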

DRY type annotation for QuickCheck properties

With QuickCheck, one can write parametrically polymorphic properties, like this:
associativityLaw :: (Eq a, Show a, Semigroup a) => a -> a -> a -> Property
associativityLaw x y z = (x <> y) <> z === x <> (y <> z)
This is just an example, as my actual properties are more complex, but it illustrates the problem well enough. This property verifies that for a type a, the <> operator is associative.
Imagine that I'd like to exercise this property for more than one type. I could define my test list like this:
tests =
  [
    testGroup "Monoid laws" [
      testProperty "Associativity law, [Int]" (associativityLaw :: [Int] -> [Int] -> [Int] -> Property),
      testProperty "Associativity law, Sum Int" (associativityLaw :: Sum Int -> Sum Int -> Sum Int -> Property)
    ]
  ]
This works, but feels unnecessarily verbose. I'd like to be able to simply state that for a given property, a should be [Int], or a should be Sum Int.
Something like this hypothetical syntax:
testProperty "Associativity law, [Int]" (associativityLaw :: a = [Int]),
testProperty "Associativity law, Sum Int" (associativityLaw :: a = Sum Int)
Is there a way to do this, perhaps with a GHC language extension?
My actual problem involves higher-kinded types, and I'd like to be able to state that e.g. f a is [Int], or f a is Maybe String.
I'm aware of this answer, but both options (Proxy and Tagged) seem, as described there, at least, too awkward to really address the issue.
You can use TypeApplications to bind type variables like this:
{-# LANGUAGE TypeApplications #-}
associativityLaw @[Int]
In the case you mentioned where you have a higher kinded type and you want to bind f a to [Int], you have to bind the type variables f and a separately:
fmap @[] @Int
For functions with more than one type variable, you can apply the args in order:
f :: a -> b -> Int
-- bind both type vars
f @Int @String
-- bind just the first type var, and let GHC infer the second one
f @Int
-- bind just the second type var, and let GHC infer the first one
f @_ @String
Sometimes the "order" of the type variables may not be obvious, but you can use :type +v and ask GHCi for more info:
λ> :t +v traverse
traverse
  :: Traversable t =>
     forall (f :: * -> *) a b.
     Applicative f =>
     (a -> f b) -> t a -> f (t b)
In standard Haskell, the "order" of the type variables doesn't matter, so GHC just makes one up for you. But in the presence of TypeApplications, the order does matter:
map :: forall b a. (a -> b) -> ([a] -> [b])
-- is not the same as
map :: forall a b. (a -> b) -> ([a] -> [b])
For this reason, when you are working with highly parametric code, or you expect your users to apply TypeApplications to your functions, you might want to set the order of your type variables explicitly with ExplicitForAll instead of letting GHC choose one for you:
{-# LANGUAGE ExplicitForAll #-}
map :: forall a b. (a -> b) -> ([a] -> [b])
This feels a lot like <T1, T2> in Java or C#.
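Putting the pieces together, here is a self-contained sketch. To keep it dependent on base only, associativityLaw returns Bool rather than a QuickCheck Property, which is an assumption of this example and not how the question defines it. The names listCase, sumCase and doubled are made up.

```haskell
{-# LANGUAGE ExplicitForAll   #-}
{-# LANGUAGE TypeApplications #-}

import Data.Monoid (Sum (..))

-- Bool-returning stand-in for the QuickCheck property.
-- The explicit forall fixes the order of type variables.
associativityLaw :: forall a. (Eq a, Semigroup a) => a -> a -> a -> Bool
associativityLaw x y z = (x <> y) <> z == x <> (y <> z)

-- The type is chosen once with @, instead of repeating it per argument.
listCase :: Bool
listCase = associativityLaw @[Int] [1] [2, 3] [4]

sumCase :: Bool
sumCase = associativityLaw @(Sum Int) 1 2 3

-- Higher-kinded case: f and a are bound separately, b is inferred.
doubled :: [Int]
doubled = fmap @[] @Int (* 2) [1, 2, 3]
```

With the real QuickCheck property, the same application should carry over to the test list, for example testProperty "Associativity law, [Int]" (associativityLaw @[Int]).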

Caching possible values of a function constructed at runtime

I have a data constructor with a few value constructors:
data DataType = C1 | C2 | C3 | ... | Cn
I'd like to build a function at run time from that data type to some other values (in fact, I'm doing this in an IO monad):
buildFun :: IO (DataType -> b)
buildFun = do
  ....
  return $ \x -> case x of
    C1 -> someProcessesToGetTheValue C1
    ...
    Cn -> someProcessesToGetTheValue Cn
Will this mean that someProcessesToGetTheValue will be called each time I call the returned function?
I'd prefer Haskell to evaluate someProcessesToGetTheValue inside buildFun (since those calls are quite expensive) and return a function which returns these fully evaluated expressions.
Can I force that behaviour? Perhaps by doing something like the following?:
buildFun :: IO (DataType -> b)
buildFun = do
  c1Value <- return $ someProcessesToGetTheValue C1
  ...
  cnValue <- return $ someProcessesToGetTheValue Cn
  return $ \x -> case x of
    C1 -> c1Value
    ...
    Cn -> cnValue
You don't have to involve the IO monad at all (and indeed do { x <- return v; ... } is identical to let x = v in ...), just bind the values outside the lambda:
buildFun :: IO (DataType -> b)
buildFun = do
  let v1 = someProcessesToGetTheValue C1
  ...
  return $ \x -> case x of { C1 -> v1; ... }
Haskell doesn't really specify anything about runtime evaluation behaviour, but on all common implementations this will ensure that the results are shared; see What does "floated out" mean? for more information.
However, it still won't evaluate v1…vn inside buildFun; instead, they will each be evaluated the first time the corresponding result of the function you return is evaluated. If you want to force them to be evaluated up-front, you can say let !v1 = someProcessesToGetTheValue C1 (this requires the BangPatterns language extension), or v1 <- evaluate $ someProcessesToGetTheValue C1 (from Control.Exception; this behaves better if someProcessesToGetTheValue C1 might throw an exception).
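As a concrete sketch of the evaluate variant: the three-constructor DataType and the body of someProcessesToGetTheValue below are stand-ins invented for this example, since the original elides both.

```haskell
import Control.Exception (evaluate)

data DataType = C1 | C2 | C3
  deriving (Show, Eq)

-- Hypothetical stand-in for the expensive computation.
someProcessesToGetTheValue :: DataType -> Integer
someProcessesToGetTheValue C1 = sum [1 .. 100000]
someProcessesToGetTheValue C2 = product [1 .. 20]
someProcessesToGetTheValue C3 = 2 ^ (64 :: Int)

buildFun :: IO (DataType -> Integer)
buildFun = do
  -- evaluate forces each result to weak head normal form right here,
  -- before the lookup function is returned.
  v1 <- evaluate (someProcessesToGetTheValue C1)
  v2 <- evaluate (someProcessesToGetTheValue C2)
  v3 <- evaluate (someProcessesToGetTheValue C3)
  -- The lambda only selects among already-computed values.
  pure $ \x -> case x of
    C1 -> v1
    C2 -> v2
    C3 -> v3
```

Note that evaluate (like a bang pattern) only goes as far as weak head normal form. That is a full evaluation for an Integer, but for a list or record you would need something like force from the deepseq package to evaluate the whole structure up front.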
Instead of a function, why not define some data structure, like a list, of all the results of evaluating this function (indexed by the position of the constructor in the data type)? For example, something like this (not tested):
data DataType = C1 | C2 | C3 | ... | Cn deriving (Enum, Bounded)
cachedValues :: [b]
cachedValues = map someProcessesToGetTheValue ([minBound .. maxBound] :: [DataType])
getCachedValue :: DataType -> b
getCachedValue x = cachedValues !! (fromEnum x)
Since Haskell is lazy, each list element is stored as a thunk until it is demanded for the first time, after which its value is remembered.
(If traversing a list of size n is too inefficient, you can use an array or Map instead. The idea is the same.)
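The Map variant might look like the sketch below. The three-constructor DataType and the body of someProcessesToGetTheValue are placeholders invented for the example; it assumes the containers package, which ships with GHC.

```haskell
import qualified Data.Map as Map

data DataType = C1 | C2 | C3
  deriving (Show, Eq, Ord, Enum, Bounded)

-- Hypothetical stand-in for the expensive computation.
someProcessesToGetTheValue :: DataType -> Int
someProcessesToGetTheValue c = length (show c) * 100 + fromEnum c

-- Data.Map is lazy in its values, so each entry stays an unevaluated
-- thunk until it is first looked up, and is shared afterwards.
cachedValues :: Map.Map DataType Int
cachedValues =
  Map.fromList [(c, someProcessesToGetTheValue c) | c <- [minBound .. maxBound]]

-- Safe to use (!) here because every constructor was inserted above.
getCachedValue :: DataType -> Int
getCachedValue c = cachedValues Map.! c
```

Lookup is logarithmic in the number of constructors rather than linear, and the caching behaviour is the same as with the list.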

Sorting abstract datatypes in Haskell

For example I have the following,
type something = (Float, Float, Int, Aa, Bb, Cc, Int)
If I wanted to find the smallest something based on its first element (the Float), how could I do that? This is how I have reasoned about it so far, but I can't figure out how to implement it.
Because I have a list of somethings the easiest way should be to create my own min helper function that compares 2 somethings and returns the smallest of the two. However it is trying to do that "easier way" that got me stuck with type compile errors...
findMin :: something -> something -> somthing
findMin x y = sortBy (compare `on` fst) x y
I am not familiar with sortBy and compare `on`; I just came across a similar question here on SO, but I couldn't make it work. As a beginner in Haskell, is there another way to approach this?
If you want to compare based on the first field of a data type, you can let Haskell write the code for you:
data Something = Something Float Float Int String Bool Char Int
deriving (Eq, Ord)
The deriving clause specifies which type class implementations are automatically generated for the Something type. Here, we derive Eq, which allows us to ask whether two Somethings are equal (e.g., with ==), and Ord, which allows us to compare two Somethings and know which one is "greater".
Haskell's default behavior when deriving Ord is to compare each field from first to last, so the default code will start by comparing the first Float of each Something, which is exactly what you want.
Once you're dealing with a type that implements Ord, you can use all sorts of built-in functions like minimum :: Ord a => [a] -> a. This takes a list of any type that implements Ord, and gives back the smallest element. So, as an example:
st1 = Something 3.14 2.72 7 "hello" False 'λ' 42
st2 = Something 3.15 2.72 7 "hello" False 'λ' 42
smallest = minimum [st1,st2]
Using a custom data type is usually the better option, but if you really want to use tuples, you can start by defining a helper function comparingFst that compares based on the first element of the tuple.
import Data.Ord
import Data.List
-- Dummy data types for example purposes. Derive from Show just so
-- that the example can be more easily tested interactively in ghci.
data Aa = Aa deriving Show
data Cc = Cc deriving Show
type Something = (Float, Float, Int, Aa, Cc, Int)
comparingFst :: Something -> Something -> Ordering
comparingFst = comparing fstSomething
  where fstSomething (x,_,_,_,_,_) = x
Now you can take the smaller of two elements with:
findMin :: Something -> Something -> Something
findMin x y = case comparingFst x y of
  LT -> x
  _  -> y
or from a list of elements
findMinimum :: [Something] -> Something
findMinimum = minimumBy comparingFst
And you can also use the same helper function for sorting:
sortSomethings :: [Something] -> [Something]
sortSomethings = sortBy comparingFst
Also, it's worth mentioning that tuples are, by default, compared element-wise, starting from the first element. So, assuming your component types such as Aa and Bb can derive Eq and Ord, you don't need anything extra, i.e. the example becomes:
import Data.List
data Ab = Ab deriving (Show, Ord, Eq)
data Cc = Cc deriving (Show, Ord, Eq)
type Something = (Float, Float, Int, Ab, Cc, Int)
findMin :: Something -> Something -> Something
findMin x y = min x y
findMinimum :: [Something] -> Something
findMinimum = minimum
sortSomethings :: [Something] -> [Something]
sortSomethings = sort
In other words, you can just use the standard min and sort functions as-is.
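Firstly, you have some syntax errors: type names must start with an uppercase letter (Something, not something), and the result type in your findMin signature is misspelled.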
There are two things you can do. Firstly, following the model of using an accessor function to get at the field you want (fst), we can define labels for the fields of your type:
data Something = Something { field_x, field_y :: Float,
                             field_z :: Int }
and then sort on field_x
import Data.List
import Data.Function
sortSomethings :: [Something] -> [Something]
sortSomethings = sortBy (compare `on` field_x)
Getting the minimum is then the same as taking the head of the sorted list:
minSomethings :: [Something] -> Something
minSomethings = head . sortSomethings
Alternatively, you can write a custom Ord instance for the Something type that compares values using only field_x; then regular sort and minimum (and other Ord-based functions) will "just work".
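A sketch of that second option, using the record from above. The samples list is invented for illustration. Ord requires an Eq instance, so both are written by hand and both look at field_x only.

```haskell
import Data.List (sort)
import Data.Ord (comparing)

data Something = Something
  { field_x, field_y :: Float
  , field_z :: Int
  } deriving Show

-- Equality and ordering consider field_x only; the other fields are ignored.
instance Eq Something where
  a == b = field_x a == field_x b

instance Ord Something where
  compare = comparing field_x

-- Hypothetical sample data.
samples :: [Something]
samples =
  [ Something 3.5 0.0 1
  , Something 1.5 9.0 2
  , Something 2.5 4.0 3
  ]
```

Now sort samples and minimum samples work directly. One caveat of this design: two values that differ only in field_y or field_z compare as equal, which can be surprising if the type is later used as a Map key or in a Set.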

Haskell mutable map/tree

I am looking for a mutable (balanced) tree/map/hash table in Haskell, or a way to simulate one inside a function, i.e. when I call the same function several times, the structure is preserved. So far I have tried Data.HashTable (which is OK, but somewhat slow) and Data.Array.Judy, but I was unable to make the latter work with GHC 6.10.4. Are there any other options?
If you want mutable state, you can have it. Just keep passing the updated map around, or keep it in a state monad (which turns out to be the same thing).
import qualified Data.Map as Map
import Control.Monad.ST
import Data.STRef
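memoize :: Ord k => (k -> ST s a) -> ST s (k -> ST s a)
memoize f = do
  mc <- newSTRef Map.empty
  return $ \k -> do
    c <- readSTRef mc
    case Map.lookup k c of
      Just a  -> return a
      Nothing -> do
        a <- f k
        -- Update the current cache rather than the snapshot c, so that
        -- entries added by recursive calls inside f are not lost.
        modifySTRef mc (Map.insert k a) >> return a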
You can use this like so. (In practice, you might want to add a way to clear items from the cache, too.)
import Control.Monad
main :: IO ()
main = do
  fib <- stToIO $ fixST $ \fib -> memoize $ \n ->
    if n < 2 then return n else liftM2 (+) (fib (n-1)) (fib (n-2))
  mapM_ (print <=< stToIO . fib) [1..10000]
At your own risk, you can unsafely escape from the requirement of threading state through everything that needs it.
import System.IO.Unsafe
unsafeMemoize :: Ord k => (k -> a) -> k -> a
unsafeMemoize f = unsafePerformIO $ do
  f' <- stToIO $ memoize $ return . f
  return $ unsafePerformIO . stToIO . f'
fib :: Integer -> Integer
fib = unsafeMemoize $ \n -> if n < 2 then n else fib (n-1) + fib (n-2)
main :: IO ()
main = mapM_ (print . fib) [1..1000]
Building on @Ramsey's answer, I also suggest you reconceive your function to take a map and return a modified one. Then code using good ol' Data.Map, which is pretty efficient at modifications. Here is a pattern:
import qualified Data.Map as Map
-- | takes input and a map, and returns a result and a modified map
myFunc :: a -> Map.Map k v -> (r, Map.Map k v)
myFunc a m = … -- put your function here
-- | run myFunc over a list of inputs, gathering the outputs
mapFuncWithMap :: [a] -> Map.Map k v -> ([r], Map.Map k v)
mapFuncWithMap as m0 = foldr step ([], m0) as
  where step a (rs, m) = let (r, m') = myFunc a m in (r:rs, m')
-- this starts with an initial map, uses successive versions of the map
-- on each iteration, and returns a tuple of the results, and the final map
-- | run myFunc over a list of inputs, gathering the outputs
mapFunc :: [a] -> [r]
mapFunc as = fst $ mapFuncWithMap as Map.empty
-- same as above, but starts with an empty map, and ignores the final map
It is easy to abstract this pattern and make mapFuncWithMap generic over functions that use maps in this way.
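One way to do that abstraction is with mapAccumL from Data.List, sketched below. The names mapWithState and countSeen are made up for this example. Note one difference from the foldr version above: mapAccumL threads the map through the list from left to right, so the first input sees the initial map.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map as Map
import Data.Tuple (swap)

-- Thread a state value (for instance a map) through a list of inputs,
-- left to right, collecting the results and the final state.
mapWithState :: (a -> s -> (r, s)) -> [a] -> s -> ([r], s)
mapWithState f as s0 = swap (mapAccumL (\s a -> swap (f a s)) s0 as)

-- Hypothetical example function in the shape of myFunc: report how often
-- a key was seen before, and record the new sighting in the map.
countSeen :: Ord k => k -> Map.Map k Int -> (Int, Map.Map k Int)
countSeen k m = (Map.findWithDefault 0 k m, Map.insertWith (+) k 1 m)
```

For example, mapWithState countSeen ["a", "b", "a"] Map.empty yields the results [0, 0, 1] together with a final map in which "a" maps to 2 and "b" maps to 1.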
Although you ask for a mutable type, let me suggest that you use an immutable data structure and that you pass successive versions to your functions as an argument.
Regarding which data structure to use,
There is an implementation of red-black trees at Kent
If you have integer keys, Data.IntMap is extremely efficient.
If you have string keys, the bytestring-trie package from Hackage looks very good.
The problem is that I cannot use (or I don't know how to use) a non-mutable type.
If you're lucky, you can pass your table data structure as an extra parameter to every function that needs it. If, however, your table needs to be widely distributed, you may wish to use a state monad where the state is the contents of your table.
If you are trying to memoize, you can try some of the lazy memoization tricks from Conal Elliott's blog, but as soon as you go beyond integer arguments, lazy memoization becomes very murky—not something I would recommend you try as a beginner. Maybe you can post a question about the broader problem you are trying to solve? Often with Haskell and mutability the issue is how to contain the mutation or updates within some kind of scope.
It's not so easy learning to program without any global mutable variables.
If I read your comments right, then you have a structure with possibly ~500k total values to compute. The computations are expensive, so you want them done only once, and on subsequent accesses, you just want the value without recomputation.
In this case, use Haskell's laziness to your advantage! ~500k is not so big: Just build a map of all the answers, and then fetch as needed. The first fetch will force computation, subsequent fetches of the same answer will reuse the same result, and if you never fetch a particular computation - it never happens!
You can find a small implementation of this idea using 3D point distances as the computation in the file PointCloud.hs. That file uses Debug.Trace to log when the computation actually gets done:
> ghc --make PointCloud.hs
[1 of 1] Compiling Main ( PointCloud.hs, PointCloud.o )
Linking PointCloud ...
> ./PointCloud
(1,2)
(<calc (1,2)>)
Just 1.0
(1,2)
Just 1.0
(1,5)
(<calc (1,5)>)
Just 1.0
(1,2)
Just 1.0
Are there any other options?
A mutable reference to a purely functional dictionary like Data.Map.
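A minimal sketch of that idea with an IORef holding a Data.Map. The helper names newTable, insertKey and lookupKey are invented for illustration.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Map as Map

-- A mutable cell holding an immutable map; each update swaps in a new map.
newTable :: IO (IORef (Map.Map k v))
newTable = newIORef Map.empty

-- modifyIORef' applies the update strictly, avoiding a build-up of thunks.
insertKey :: Ord k => IORef (Map.Map k v) -> k -> v -> IO ()
insertKey ref k v = modifyIORef' ref (Map.insert k v)

lookupKey :: Ord k => IORef (Map.Map k v) -> k -> IO (Maybe v)
lookupKey ref k = Map.lookup k <$> readIORef ref
```

The table persists across calls for as long as the reference is passed around. modifyIORef' is not atomic, so code that shares the reference between threads would want atomicModifyIORef' or an MVar instead.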
