I need to generate a simple random number in Agda.
I tried googling phrases like 'random number agda' but couldn't find any working code.
In Haskell the code would be
import System.Random

main :: IO ()
main = do
  -- num :: Float
  num <- randomIO :: IO Float
  -- This "extracts" the float from IO Float and binds it to the name num
  print num
Example outputs would be
0.7665119
or
0.43071353
What Agda code would achieve the same results (if it's possible)?
Working code would be appreciated!
The easiest route is probably to postulate the existence of such a primitive and then to explain to Agda how to compile it by using a COMPILE pragma.
open import Agda.Builtin.Float
import IO.Primitive as Prim
open import IO

random : IO Float
random = lift primRandom where
  postulate primRandom : Prim.IO Float
  {-# FOREIGN GHC import qualified System.Random as Random #-}
  {-# COMPILE GHC primRandom = Random.randomIO #-}

open import Codata.Musical.Notation
open import Function

main : Prim.IO _
main = run $
  ♯ random >>= λ f → ♯ putStrLn (primShowFloat f)
I've included a main so that you can compile this file (using agda -c FILENAME) and run it to see that you indeed get random floats.
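Since the COMPILE pragma just points primRandom at a Haskell expression of the right type (Agda's builtin Float is represented as Double by the GHC backend), you can bind it to any IO Double you like, for instance a bounded generator. A sketch of that Haskell side, with randomUnit a name of my own; the definition could live in the FOREIGN GHC block, with the pragma changed to {-# COMPILE GHC primRandom = randomUnit #-}.
-- Haskell-side sketch (randomUnit is my own name, not part of the answer above):
-- the postulate only needs *something* of type IO Double on the GHC side.
import System.Random (randomRIO)

randomUnit :: IO Double
randomUnit = randomRIO (0.0, 1.0)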
I'm a Haskell beginner and have chosen it to solve a programming task for my class; however, my solution is too slow and doesn't get accepted. I'm trying to profile it and was hoping that I could get some pointers from more advanced Haskellers here.
The only other solution in my class that got accepted so far was written in Rust. I'm sure that I should be able to achieve similar performance in Haskell and I wrote horrible imperative code in the hope of improving performance, alas to no avail.
My first suspicion relates to work, where I am using forever to go over the in-degree array until I get an out-of-bounds exception. I was hoping for this to be tail-recursive and to compile to a while (true) style loop.
My second suspicion is that I/O is perhaps slowing things down.
EDIT: The problem likely has to do with my algorithm, because I am not keeping a queue of nodes with indegree 0. Thank you @luqui.
EDIT 2: It seems that the real bottleneck was I/O; I fixed that thanks to @Davislor.
The task is based on this: http://www.spoj.com/UKCPLAD/problems/TOPOSORT/ and I am constrained to use only the libraries in the Haskell Platform.
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE LambdaCase #-}
{-# OPTIONS_GHC -O3 #-}
import Control.Monad
import Data.Array.IO
import Data.IORef
import Data.Int
import Control.Exception
type List = []
type Node = Int32
type Edge = (Node, Node)
type Indegree = Int32
main = do
    (numNodes, _) <- readPair <$> getLine
    edges <- map readPair . lines <$> getContents
    topo numNodes edges
-- lower bound
{-# INLINE lb #-}
lb = 1
topo :: Node -> List Edge -> IO ()
topo numNodes edges = do
    result     <- newIORef []
    count      <- newIORef 0
    indegrees  <- newArray (lb,numNodes) 0  :: IO (IOUArray Node Indegree)
    neighbours <- newArray (lb,numNodes) [] :: IO (IOArray Node (List Node))
    forM_ edges $ \(from,to) -> do
        update indegrees  to   (+1)
        update neighbours from (to:)
    let work = forever $ do
            z <- getNext indegrees
            modifyIORef' result (z:)
            modifyIORef' count (+1)
            ns <- readArray neighbours z
            forM_ ns $ \n -> update indegrees n pred
    work `catch`
        \(_ :: SomeException) -> do
            count <- readIORef count
            if numNodes == count
                then (mapM_ (\n -> putStr (show n ++ " ")) . reverse) =<< readIORef result
                else putStrLn "Sandro fails."
{-# INLINE update #-}
update a i f = do
    x <- readArray a i
    writeArray a i (f x)

{-# INLINE getNext #-}
getNext indegrees = getNext' indegrees =<< getBounds indegrees

{-# INLINE getNext' #-}
getNext' indegrees (lb,ub) = readArray indegrees lb >>= \case
    0 -> writeArray indegrees lb (-1) >> return lb
    _ -> getNext' indegrees (lb+1,ub)

readPair :: String -> (Node,Node)
{-# INLINE readPair #-}
readPair = toPair . map read . words
    where toPair [x,y] = (x,y)
          toPair _     = error "Only two entries per line allowed"
Example output
$ ./topo
8 9
1 4
1 2
4 2
4 3
3 2
5 2
3 5
8 2
8 6
^D
1 4 3 5 7 8 2 6
If you haven’t already, profile your program by compiling with -prof -fprof-auto and then executing with the command-line options +RTS -p. This will generate a profile *.prof that will tell you which functions the program is spending all its time in. However, I can see immediately where the biggest time-waster is. Your instincts were right: it’s the I/O.
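As a side note on that profiling step: if the automatic cost centres from -fprof-auto turn out to be too coarse, you can also place cost centres by hand with SCC pragmas, and they will show up under those names in the .prof report. A tiny sketch (the function below is a stand-in of mine, not part of the program above):
-- Hand-placed cost centres; each named SCC gets its own line in the .prof report.
doubleAndSum :: [Int] -> Int
doubleAndSum xs = {-# SCC "summing" #-} sum ({-# SCC "doubling" #-} map (*2) xs)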
Having done that a lot, I can guarantee you that you’ll find that it’s spending the vast majority of its time doing I/O. The first thing you should always do to speed up your program is rewrite it to use fast I/O. Haskell is a fast language, when you use the right data structures. The default I/O library in the Prelude uses singly-linked lists with lazily-evaluated thunks where each node holds a single Unicode character. That would be slow in C, too!
I’ve gotten the best results with Data.ByteString.Lazy.Char8 when the input is ASCII, and Data.ByteString.Builder to generate the output. (An alternative is Data.Text.) That gets you a lazily-evaluated list of strict character buffers on input (so interactive input and output still works), and fills a single buffer on output.
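To make that combination concrete, here is a minimal sketch in isolation (not the poster's program): lazy ByteString for input, a Builder for output. It just echoes the length of each input line.
-- Minimal fast-I/O skeleton: read lazily, build the output in one Builder.
import Data.ByteString.Builder (Builder, char7, intDec, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as B8
import Data.Monoid ((<>))

main :: IO ()
main = B8.interact (toLazyByteString . foldMap perLine . B8.lines)
  where
    perLine :: B8.ByteString -> Builder
    perLine l = intDec (fromIntegral (B8.length l)) <> char7 '\n'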
After you’ve written the skeleton of the program with fast I/O, the next step is to look at your algorithm, and especially your data structures. Use profiling to see where all the time goes. But I’d recommend you use a functional algorithm rather than trying to write imperative programs in Haskell with do.
I almost always approach problems like this in Haskell with a more functional style: in particular, my main function is almost always something similar to:
import qualified Data.ByteString.Lazy.Char8 as B8
main :: IO()
main = B8.interact ( output . compute . input )
This makes everything except the call to interact a pure function, and isolates the parsing code and the formatting code so the compute part in the middle can be independent of that.
Since this is an assignment and you want to solve the problem yourself, I’ll refrain from refactoring the program for you, but here’s an example I wrote in response to a question on another forum to perform a counting sort. It should be suitable as a skeleton for other kinds of problems.
import Data.Array.IArray (accumArray, assocs)
import Data.Array.Unboxed (UArray)
import Data.ByteString.Builder (Builder, char7, intDec, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as B8
import Data.Monoid ((<>))
main :: IO()
main = B8.interact ( output . compute . input ) where
  input :: B8.ByteString -> [Int]
  input = map perLine . tail . B8.lines where
    perLine = decode . B8.readInt
    decode (Just (x, _)) = x
    decode Nothing = error "Invalid input: expected integer."
  compute :: [Int] -> [Int]
  compute = concatMap expand . assocs . countingSort . map encode where
    encode i = (i, 1)
    countingSort :: [(Int, Int)] -> UArray Int Int
    countingSort = accumArray (+) 0 (lower, upper)
    lower = 0
    upper = 1000000
    expand (i,c) = replicate c i
  output :: [Int] -> B8.ByteString
  output = toLazyByteString . foldMap perCase where
    perCase :: Int -> Builder
    perCase x = intDec x <> char7 '\n'
This version ran in less than half the time of anyone else's Haskell solution to the same problem; the same has held true for the actual contest problems I've used it for, and the approach generalizes.
So I suggest changing the I/O to be similar to that, first, then profiling, and coming back with the profiling output if that doesn’t make enough of a difference. This might also be a good Code Review question.
Thanks to @Davislor's suggestions I managed to make it much faster, and I also refactored the code for the better, so now I actually have an O(m log n) algorithm. Surprisingly, this doesn't make that much of a difference: the I/O far outweighed the suboptimal complexity of the algorithm.
EDIT: got rid of unsafePerformIO and it actually runs a teeny-weeny bit faster. Plus adding -XStrict shaves off even more time.
{-# LANGUAGE Strict #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE OverloadedStrings #-}
{-# OPTIONS_GHC -O2 #-}
import Control.Monad
import Data.Array.IO
import Data.Int
import Data.Set (Set)
import qualified Data.Set as Set
import Data.ByteString.Builder (Builder, char7, intDec, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as B8
import Data.Monoid ((<>))
type List = []
type Node = Int
type Edge = (Node, Node)
type Indegree = Int
main = B8.putStrLn =<< topo . map readPair . B8.lines =<< B8.getContents
readPair :: B8.ByteString -> (Node,Node)
readPair str = (x,y)
  where
    (Just (x, str')) = B8.readInt str
    (Just (y, _   )) = B8.readInt (B8.tail str')
topo :: List Edge -> IO B8.ByteString
topo inp = do
    let (numNodes, _) = head inp
        edges         = tail inp
    indegrees  <- newArray (1,numNodes) 0  :: IO (IOUArray Node Indegree)
    neighbours <- newArray (1,numNodes) [] :: IO (IOArray Node (List Node))
    -- setup
    forM_ edges $ \(from,to) -> do
        update indegrees  to   (+1)
        update neighbours from (to:)
    zeroes <- collectIndegreeZero [] indegrees =<< getBounds indegrees
    processQueue (Set.fromList zeroes) [] numNodes indegrees neighbours
  where
    collectIndegreeZero acc indegrees (lb,ub)
        | lb > ub   = return acc
        | otherwise = do
            indegr <- readArray indegrees lb
            let acc' = if indegr == 0 then (lb:acc) else acc
            collectIndegreeZero acc' indegrees (lb+1,ub)
    processQueue queue result numNodes indegrees neighbours = do
        if null queue
            then if numNodes == 0
                then return . toLazyByteString . foldMap whitespace . reverse $ result
                else return "Sandro fails."
            else do
                (node,queue) <- return $ Set.deleteFindMin queue
                ns <- readArray neighbours node
                queue <- foldM decrIndegrees queue ns
                processQueue queue (node:result) (numNodes-1) indegrees neighbours
      where
        decrIndegrees :: Set Node -> Node -> IO (Set Node)
        decrIndegrees q n = do
            i <- readArray indegrees n
            writeArray indegrees n (i-1)
            return $ if i == 1 then Set.insert n q else q
        whitespace x = intDec x <> char7 ' '

{-# INLINE update #-}
update a i f = do
    x <- readArray a i
    writeArray a i (f x)
I implemented the Winograd algorithm in Haskell and tried to speed it up with strict evaluation. I succeeded in that, but I completely failed to understand why adding strictness makes it faster. Since my code for this algorithm is fairly large, I wrote two small functions that demonstrate the problem.
module Main where
import qualified Data.Vector as V
import qualified Data.Matrix as M
import Control.DeepSeq
import Control.Exception
import System.Clock
import Data.Time
matrixCtor x y size = M.matrix size size $ \(i,j) -> x*i+y*j
group v s = foldl (\acc i ->acc + V.unsafeIndex v i * V.unsafeIndex v (i+1)) 0 [0,2..s-1]
size = 3000 :: Int
testWithForce :: IO ()
testWithForce = do
    let a = matrixCtor 2 1 size
    evaluate $ force a
    start <- getCurrentTime
    let c = V.generate size $ \j -> M.getCol (j+1) a
    evaluate $ force c
    let d = foldl (\acc i -> acc + group (V.unsafeIndex c i) size) 0 [0,1..(size-1)]
    evaluate $ force d
    end <- getCurrentTime
    print (diffUTCTime end start)

testWithoutForce :: IO ()
testWithoutForce = do
    let a = matrixCtor (-2) 1 size
    evaluate $ force a
    start <- getCurrentTime
    let c = V.generate size $ \j -> M.getCol (j+1) a
    let d = foldl (\acc i -> acc + group (V.unsafeIndex c i) size) 0 [0,1..(size-1)]
    evaluate $ force d
    end <- getCurrentTime
    print (diffUTCTime end start)

main :: IO ()
main = do
    testWithForce
    testWithoutForce
In the implementation of the algorithm, the matrices are computed before use, just as here. In testWithForce I force the value c before it is used, and in that case the function runs faster than testWithoutForce. I got the following results:
0.945078s --testWithForce
1.785158s --testWithoutForce
I just cannot understand why strictness speeds things up so much in this case.
Pardon the non-answer, but make sure to control for GC: it appears that the second function may be burdened with the GC from the previous one, thereby inflating the difference.
I can reproduce what you're seeing:
$ ghc -O3 --make foo.hs && ./foo
[1 of 1] Compiling Main ( foo.hs, foo.o )
Linking foo ...
1.471109207s
2.001165795s
However, when I flipped the order of the test, the result was different:
main = do
    testWithoutForce
    testWithForce
$ ghc -O3 --make foo.hs && ./foo
1.626452918s
1.609818958s
So I made main GC between each test:
import System.Mem
main = do
    performMajorGC
    testWithForce
    performMajorGC
    testWithoutForce
The forced one is still faster, but the difference was massively reduced:
1.460686986s
1.581715988s
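Another option is to hand the measurement over to a benchmarking library such as criterion, which forces a GC before measuring and averages over many runs. A sketch, assuming testWithForce and testWithoutForce are reworked to return their result d instead of timing and printing it themselves:
-- Sketch only: benchmark the two actions with criterion rather than getCurrentTime.
-- Assumes the question's testWithForce / testWithoutForce, modified to return d.
import Criterion.Main (bench, defaultMain, whnfIO)

main :: IO ()
main = defaultMain
  [ bench "with force"    (whnfIO testWithForce)
  , bench "without force" (whnfIO testWithoutForce)
  ]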
I have to sort the rows of large integer matrices in Haskell, and I started benchmarking with random data. I found that Haskell is 3 times slower than C++.
Because of the randomness, I expect row comparison to always terminate at the first column (which should have no duplicates). So I narrowed the matrix down to a single column, implemented as a Vector (Unboxed.Vector Int), and compared its sorting to that of a usual Vector Int.
Vector Int sorts as fast as C++ (good news!), but again, the column matrix is 3 times slower. Do you have an idea why? Please find the code below.
import qualified Data.Vector.Unboxed as UV(Vector, fromList)
import qualified Data.Vector as V(Vector, fromList, modify)
import Criterion.Main(env, bench, nf, defaultMain)
import System.Random(randomIO)
import qualified Data.Vector.Algorithms.Intro as Alg(sort)
randomVector :: Int -> IO (V.Vector Int)
randomVector count = V.fromList <$> mapM (\_ -> randomIO) [1..count]
randomVVector :: Int -> IO (V.Vector (UV.Vector Int))
randomVVector count = V.fromList <$> mapM (\_ -> do
    x <- randomIO
    return $ UV.fromList [x]) [1..count]

benchSort :: IO ()
benchSort = do
    let bVVect = env (randomVVector 300000) $ bench "sortVVector" . nf (V.modify Alg.sort)
        bVect  = env (randomVector 300000) $ bench "sortVector" . nf (V.modify Alg.sort)
    defaultMain [bVect, bVVect]

main = benchSort
As Edward Kmett has explained to me, the Haskell version has one extra layer of indirection. A UV.Vector looks something like
data Vector a = Vector !Int !Int ByteArray#
So each entry in your vector of vectors is actually a pointer to a record holding slice indices and a pointer to an array of bytes. This is an extra indirection that the C++ code doesn't have. The solution is to use an ArrayArray#, which is an array of direct pointers to byte arrays or to further ArrayArray#s. If you need vector, you'll have to figure out what to do about the slicing machinery. Another option is to switch to primitive, which offers simpler arrays.
Following dfeuer's advice, implementing a vector of vectors as an ArrayArray# is 4 times faster than Vector (Unboxed.Vector Int) and only 40% slower than sorting a C++ std::vector<std::vector<int> >:
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE MultiParamTypeClasses #-}
import Control.Monad.Primitive
import Data.Primitive.ByteArray
import qualified Data.Vector.Generic.Mutable.Base as GM(MVector(..))
import GHC.Exts (Int(..))
import GHC.Prim
data MutableArrayArray s a = MutableArrayArray (MutableArrayArray# s)
instance GM.MVector MutableArrayArray ByteArray where
    {-# INLINE basicLength #-}
    basicLength (MutableArrayArray marr) = I# (sizeofMutableArrayArray# marr)
    {-# INLINE basicUnsafeRead #-}
    basicUnsafeRead (MutableArrayArray marr) (I# i) = primitive $ \s -> case readByteArrayArray# marr i s of
        (# s1, bar #) -> (# s1, ByteArray bar #)
    {-# INLINE basicUnsafeWrite #-}
    basicUnsafeWrite (MutableArrayArray marr) (I# i) (ByteArray bar) = primitive $ \s ->
        (# writeByteArrayArray# marr i bar s, () #)
For example, sorting a matrix of integers will then use
sortIntArrays :: ByteArray -> ByteArray -> Ordering
sortIntArrays x y = let h1 = indexByteArray x 0 :: Int
                        h2 = indexByteArray y 0 :: Int
                    in compare h1 h2
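As a quick, self-contained way to try sortIntArrays on its own (independent of the MVector instance above), you can build one-element ByteArrays by hand with Data.Primitive.ByteArray. A sketch, assuming a 64-bit GHC where an Int takes 8 bytes:
-- Build a ByteArray holding a single Int, then compare with sortIntArrays above.
import Control.Monad.ST (runST)
import Data.Primitive.ByteArray (ByteArray, newByteArray, writeByteArray, unsafeFreezeByteArray)

intArray :: Int -> ByteArray
intArray x = runST (do
  marr <- newByteArray 8        -- room for one Int on a 64-bit GHC
  writeByteArray marr 0 x
  unsafeFreezeByteArray marr)

-- With sortIntArrays as defined above:
--   sortIntArrays (intArray 1) (intArray 2) == LT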
I am trying to read a large vector of custom data type from a binary file. I tried to use the example given here.
The trouble with the example code is, it uses lists and I want to use vectors.
So I adapted that code as below, but it takes a very long time (more than a minute; I gave up after that) to read even a 1 MB file.
module Main where
import Data.Word
import qualified Data.ByteString.Lazy as BIN
import Data.Binary.Get
import qualified Data.Vector.Unboxed as Vec
main = do
  b <- BIN.readFile "dat.bin"          -- about 1 MB size file
  let v = runGet getPairs (BIN.tail b) -- skip the first byte
  putStrLn $ show $ Vec.length v

getPair :: Get (Word8, Word8)
getPair = do
  price <- getWord8
  qty <- getWord8
  return (price, qty)

getPairs :: Get (Vec.Vector (Word8, Word8))
getPairs = do
  empty <- isEmpty
  if empty
    then return Vec.empty
    else do pair <- getPair
            pairs <- getPairs
            return (Vec.cons pair pairs) -- is it slow because Vec.cons is O(n)?
When I tried to run it with ghc --make -O2 pairs.hs I got the error Stack space overflow: current size ...
How to efficiently read pairs of values from bytestring into vector?
Again, I would like complete working code, not just pointers to the Haskell site or RWH, nor merely a list of function/module names.
Here are a couple of examples of creating Vectors from files. They are not the most efficient, but both run in just a couple of seconds in ghci.
module Main where
import qualified Data.ByteString.Lazy as BIN
import qualified Data.ByteString as BS
import qualified Data.Vector.Unboxed as Vec
import System.IO
import System.Posix
getFileSize :: String -> IO Int
getFileSize path = do
  stat <- getFileStatus path
  return (fromEnum $ fileSize stat)

readVector1 path = do
  size <- getFileSize path
  withBinaryFile path ReadMode $ \h -> do
    -- can also use: size <- hFileSize h
    let go _ = do bs <- BS.hGet h 2
                  return (BS.index bs 0, BS.index bs 1)
    Vec.generateM (div size 2) go
pairs (a:b:rest) = (a,b) : pairs rest
pairs _ = []
readVector2 path = do
  contents <- BIN.readFile path
  -- unfoldr :: Unbox a => (b -> Maybe (a, b)) -> b -> Vector a
  let v = Vec.unfoldr go (pairs $ BIN.unpack contents)
        where go []     = Nothing
              go (p:ps) = Just (p, ps)
  return v

main = do
  v <- readVector1 "rand" -- large file
  print $ Vec.length v
  v <- readVector2 "rand"
  print $ Vec.length v
A third alternative:
readVector3 path = do
  contents <- BS.readFile path
  let size = BS.length contents
      v = Vec.generate (div (fromIntegral size) 2) go
        where go i = let a = BS.index contents (2*i)
                         b = BS.index contents (2*i+1)
                     in (a,b)
  return v
This one turns out to be the fastest of the three.
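For completeness, the intermediate list in readVector2 can also be avoided by unfolding straight from the strict ByteString. A sketch along the same lines (readVector4 is my own name; the imports mirror the module above):
-- Unfold pairs directly from the ByteString, so no intermediate [Word8] is built.
import qualified Data.ByteString as BS
import qualified Data.Vector.Unboxed as Vec
import Data.Word (Word8)

readVector4 :: FilePath -> IO (Vec.Vector (Word8, Word8))
readVector4 path = do
  contents <- BS.readFile path
  let go s = do (a, s')  <- BS.uncons s
                (b, s'') <- BS.uncons s'
                return ((a, b), s'')   -- Maybe monad: stops cleanly at end of input
  return (Vec.unfoldr go contents)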
Here's an alternative approach for loading the vector, that uses pipes and pipes-bytestring to stream the file, and the vector function from foldl to create the vector:
{-# LANGUAGE PackageImports #-}
import Data.Functor (void)
import "pipes" Pipes
import qualified "pipes" Pipes.Prelude as P
import qualified "pipes-bytestring" Pipes.ByteString as B
import qualified "pipes-binary" Pipes.Binary as B
import qualified "vector" Data.Vector.Unboxed as V
import qualified "foldl" Control.Foldl as L
import "lens-family-core" Lens.Family (view)
import System.IO
main :: IO ()
main = do
    v <- withBinaryFile "somefile" ReadMode (\h ->
        -- for simplicity, errors are ignored with "void"
        L.impurely P.foldM L.vector (void (view B.decoded (B.drop 1 (B.fromHandle h)))))
    print (V.length (v :: V.Vector (B.Word8, B.Word8)))
cons is inefficient. The approach taken by foldl's vector is to progressively double the vector's capacity using unsafeGrow, in order to accommodate incoming values, and at the end "trim" any excess capacity with unsafeTake.
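For illustration, here is a hand-rolled sketch of that grow-and-trim strategy on a mutable unboxed vector; the names are my own, not foldl's internals:
-- Amortised O(1) appends: double the buffer when full, trim the excess at the end.
import Control.Monad (foldM)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as VM

fromListGrowing :: V.Unbox a => [a] -> IO (V.Vector a)
fromListGrowing xs = do
  buf0 <- VM.new 16
  (buf, used) <- foldM step (buf0, 0) xs
  V.unsafeFreeze (VM.unsafeTake used buf)             -- trim unused capacity
  where
    step (buf, used) x = do
      buf' <- if used == VM.length buf
                then VM.unsafeGrow buf (VM.length buf) -- double the capacity
                else return buf
      VM.unsafeWrite buf' used x
      return (buf', used + 1)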
I have been trying to encode an algorithm in Haskell that requires using lots of mutable references, but it is (perhaps not surprisingly) very slow in comparison to purely lazy code.
Consider a very simple example:
module Main where
import Data.IORef
import Control.Monad
import Control.Monad.Identity
list :: [Int]
list = [1..10^6]
main1 = mapM newIORef list >>= mapM readIORef >>= print
main2 = print $ map runIdentity $ map Identity list
Running GHC 7.8.2 on my machine, main1 takes 1.2s and uses 290MB of memory, while main2 takes only 0.4s and uses a mere 1MB. Is there any trick to prevent this growth, especially in space? I often need IORefs for non-primitive types unlike Int, and assumed that an IORef would use an additional pointer much like a regular thunk, but my intuition seems to be wrong.
I have already tried a specialized list type with an unpacked IORef, but with no significant difference.
The problem is your use of mapM, which always performs poorly on large lists both in time and space. The correct solution is to fuse away the intermediate lists by using mapM_ and (>=>):
import Data.IORef
import Control.Monad
list :: [Int]
list = [1..10^6]
main = mapM_ (newIORef >=> readIORef >=> print) list
This runs in constant space and gives excellent performance, running in 0.4 seconds on my machine.
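If the (>=>) composition reads as terse, the same fused loop can be spelled out with an explicit do block; this sketch is equivalent:
-- The same per-element loop, with the Kleisli composition written out by hand.
import Data.IORef
import Control.Monad (forM_)

list :: [Int]
list = [1..10^6]

main :: IO ()
main = forM_ list $ \x -> do
  ref <- newIORef x
  v   <- readIORef ref
  print v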
Edit: In answer to your question, you can also do this with pipes to avoid having to manually fuse the loop:
import Data.IORef
import Pipes
import qualified Pipes.Prelude as Pipes
list :: [Int]
list = [1..10^6]
main = runEffect $
  each list >-> Pipes.mapM newIORef >-> Pipes.mapM readIORef >-> Pipes.print
This runs in constant space in about 0.7 seconds on my machine.
This is very likely not about IORef, but about strictness. Actions in the IO monad are serial -- all previous actions must complete before the next one can be started. So
mapM newIORef list
generates a million IORefs before anything is read.
However,
map runIdentity . map Identity
= map (runIdentity . Identity)
= map id
which streams very nicely, so we print one element of the list, then generate the next one, etc.
If you want a fairer comparison, use a strict map:
map' :: (a -> b) -> [a] -> [b]
map' f [] = []
map' f (x:xs) = (f x:) $! map' f xs
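Plugged into the question's module, that fairer comparison would then read as the sketch below (main2' is my own name; it assumes list, map', Identity, and runIdentity from the code above):
-- Drive the pure side through the strict map as well.
main2' :: IO ()
main2' = print $ map' runIdentity $ map' Identity list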
I have found that a hacky route towards a solution is to use a lazy mapM instead, defined as
import System.IO.Unsafe (unsafeInterleaveIO)

lazyMapM :: (a -> IO b) -> [a] -> IO [b]
lazyMapM f [] = return []
lazyMapM f (x:xs) = do
  y <- f x
  ys <- unsafeInterleaveIO $ lazyMapM f xs
  return (y:ys)
This allows the monadic version to run within the same 1MB and similar time. I would expect that a lazy ST monad could solve this problem more elegantly without using unsafeInterleaveIO, as a function:
main = print $ runST (mapM (newSTRef) list >>= mapM (readSTRef))
but that does not work (you also need to use unsafeInterleaveST), which leaves me wondering how lazy Control.Monad.ST.Lazy really is. Does someone know? :)
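For reference, the interleaved variant alluded to in that last sentence, spelled out in the same shape as lazyMapM but over the strict ST monad, would be a sketch like this (lazyMapST is my own name; list is the [1..10^6] from the question, and the lazy ST monad has its own unsafeInterleaveST in Control.Monad.ST.Lazy.Unsafe):
-- Sketch: an ST analogue of lazyMapM. Whether it streams as hoped is exactly
-- the open question raised above; this only shows the construction.
import Control.Monad.ST (ST, runST)
import Control.Monad.ST.Unsafe (unsafeInterleaveST)
import Data.STRef (newSTRef, readSTRef)

lazyMapST :: (a -> ST s b) -> [a] -> ST s [b]
lazyMapST _ []     = return []
lazyMapST f (x:xs) = do
  y  <- f x
  ys <- unsafeInterleaveST (lazyMapST f xs)
  return (y:ys)

mainST :: IO ()
mainST = print $ runST (lazyMapST newSTRef list >>= lazyMapST readSTRef)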