I wrote this program and I compiled with:
ghc --make shell.hs
When I run it looks like:
$./shell
enter your number:
6
6
okay... it execute 6 time...
If I remove the sleep 2 statement then it exits fast but outputs only 6.
I tried to follow the advice (which is what is in the code below) given at this answer of a similar question but it didn't work.
It seems very strange that it is not executing all the commands. How do I force it to execute all commands and in a strict order? the lazyness is a good feature but when it comes to IO it just sucks or I am not enough of an expert to understand it.
What I want to do is execute all commands from a list in a strict order and I don't want Haskell to "intelligently" remove some of the commands on my behalf (if I want to execute sleep for 2 seconds in the shell, I should be allowed to do so).
If waitForProcess has to be used to resolve this, then my problem is I don't know how to use it. I have tried google but failed to see a simple example of it.
Please note that I want a working code solution for the program code I have given below and there should be a reasonable guarantee that the program works as expected on bash running on a typical Linux (say, Debian 7) as the Python program given below it runs.
import System.Process
import System.Exit
main = do
putStrLn "enter your number:"
n <- getLine
main1 (readInt n)
putStrLn ("okay... it execute " ++ n ++" time...")
readInt:: String -> Int
readInt = read
main1 n = do
ExitSuccess <- system ("echo " ++ (show n))
ExitSuccess <- system "sleep 2"
if n == 0 then (main1 (n-1)) else return ()
Below is a Python program and it works as desired with proper sleep:
import os
for i in range(6):
os.system("echo " + str(i))
os.system("sleep 2")
How about writing more like the Python version using forM_:
import Control.Monad
main1 n = do
forM_ [n,n-1 .. 1] $ \n -> do
ExitSuccess <- system ("echo " ++ (show n))
ExitSuccess <- system "sleep 2"
return ()
The program is doing what you are telling him to do:
main1 n = do
ExitSuccess <- system ("echo " ++ (show n))
ExitSuccess <- system "sleep 2"
if n == 0 then (main1 (n-1)) else return ()
Here we have n == 6 so the condition of the if is false and then return () is executed, which terminates main1 doing nothing.
Note that if you passed n == 0 the condition would be true and it would execute main1 (-1) which then stops. In any case main1 will not repeat the commands n times it will always either execute them once (if n /= 0) or twice (if n == 0).
You could fix this by doing:
main1 n = do
ExitSuccess <- system ("echo " ++ (show n))
ExitSuccess <- system "sleep 2"
if n > 0 then main1 (n-1) else return ()
Or, equivalently:
import Control.Monad
main1 n = do
ExitSuccess <- system ("echo " ++ (show n))
ExitSuccess <- system "sleep 2"
when (n > 0) $ main1 (n-1)
The problem here is that you are using general recursion to repeat a given statement. But general recursion can do anything and so you can easily produce bugs (as in your case). To avoid this it's often useful to use some other function to build your code. For example:
main1 n = sequence_ $ map execute [1..n]
execute i = do
ExitSuccess <- system $ "echo" ++ show i
ExitSuccess <- system $ "sleep 2"
return ()
Or:
main1 n = mapM_ execute [1..n]
Here you know what mapM_ and [1..n] do and so combining them tells you that you the execute action will be executed n times.
In any case the IO monad guarantees the the actions are all executed and in order, so you should not attribute to Haskell lazyness the fact that the code wasn't executed the expected number of times.
The problem with lazyness and input/output are different.
Related
This problem involves an arbitrary number of dice with each an arbitrary number of sides. We then find the maximal number of dice that can be put in a straight, see Google's Code Jam explanation. The implementation should be reasonably efficient for around 10^5 dice with each one having up to 10^6 sides.
I've been trying to solve the problem in Haskell and this is my solution. However, it is not fast enough to earn full points on the problem, so can this be optimized?
import Data.List (sort)
import Data.Foldable (foldl')
getMaxStraight :: [Int] -> Int
getMaxStraight sides =
foldl'
(\maxStraight side -> if side > maxStraight then succ maxStraight else maxStraight)
0
(sort sides)
-- Doing IO in Haskell
-- Assuming the above works perfectly, something might not perform well below
main :: IO ()
main = do
line <- getLine :: IO String
let numberOfCases = read line :: Int
in mapM_ solveCase [1 .. numberOfCases]
solveCase :: Show a => a -> IO ()
solveCase i = do
line1 <- getLine :: IO String
line2 <- getLine :: IO String
let numberOfDice = read line1 :: Int
let diceSides = map read (words line2) :: [Int]
let maxStraight = getMaxStraight diceSides
putStrLn $ "Case #" ++ show i ++ ": " ++ show maxStraight
Aside from this, I've also written a Python solution which did run in time. I'd expect that Haskell would run faster than Python. What is going on?
def get_max_straight(sides):
max_straight = 0
for side in sorted(sides):
if side > max_straight:
max_straight += 1
return max_straight
# Doing IO in Python
# This works as needed
if __name__ == '__main__':
tests_len = int(input())
for case_num in range(1, 1 + tests_len):
input() # Discard unneeded input
sides = [int(s) for s in input().split(' ')]
print(f'Case #{case_num}: {get_max_straight(sides)}')
I'd like to write one code that could be run in two "modes":
either in logging mode, i.e. it should log some informations (in my case I want to log the number of calls done on some particular functions at a given time)
or in efficient mode, i.e. it does not log anything but just runs as fast as possible
I tried to write the following code, which creates two Writers, one normal one (for the logging mode) and one stupid one (that does not record anything, for the efficient mode). I then define a new class LogFunctionCalls that allows me to run my function in one of these two Writers.
However, I tried to compare the speed of the code using the Stupid writer, and it's significantly slower than the normal code without writer: here is the profiling informations:
code without writer: total time = 0.27s, total alloc = 55,800 bytes
code with stupid writer StupidLogEntry: total time = 0.74 s, total alloc = 600,060,408 bytes (NB: the real time is much bigger than 0.74s...)
code with real writer LogEntry: total time = 5.03 s, total alloc = 1,920,060,624 bytes
Here is the code (you can comment depending on which run you want to use):
{-# LANGUAGE ScopedTypeVariables #-}
module Main where
--- It depends on the transformers, containers, and base packages.
--- You can profile it with:
--- $ cabal v2-run --enable-profiling debug -- +RTS -p
--- and a file debug.prof will be created.
import qualified Data.Map.Strict as MapStrict
import qualified Data.Map.Merge.Strict as MapMerge
import qualified Control.Monad as CM
import Control.Monad.Trans.Writer.Strict (Writer)
import qualified Control.Monad.Trans.Writer.Strict as Wr
import qualified Data.Time as Time
-- Test using writer monad
-- The actual LogEntry, that should associate a number
-- to each name
newtype LogEntry = LogEntry { logMap:: MapStrict.Map String Int }
deriving (Eq, Show)
-- A logentry that does not record anything, always empty
newtype StupidLogEntry = StupidLogEntry { stupidLogMap:: MapStrict.Map String Int }
deriving (Eq, Show)
-- Create the Monoid instances
instance Semigroup LogEntry where
(LogEntry m1) <> (LogEntry m2) =
LogEntry $ MapStrict.unionWith (+) m1 m2
instance Monoid LogEntry where
mempty = LogEntry MapStrict.empty
instance Semigroup StupidLogEntry where
(StupidLogEntry m1) <> (StupidLogEntry m2) =
StupidLogEntry $ m1
instance Monoid StupidLogEntry where
mempty = StupidLogEntry MapStrict.empty
-- Create a class that allows me to use the function "myTell"
-- that adds a number in the writer (either the LogEntry
-- or StupidLogEntry one)
class (Monoid r) => LogFunctionCalls r where
myTell :: String -> Int -> Writer r ()
instance LogFunctionCalls LogEntry where
myTell namefunction n = do
Wr.tell $ LogEntry $ MapStrict.singleton namefunction n
instance LogFunctionCalls StupidLogEntry where
myTell namefunction n = do
-- Wr.tell $ StupidLogEntry $ Map.singleton namefunction n
return ()
-- Function in itself, with writers
countNumberCalls :: (LogFunctionCalls r) => Int -> Writer r Int
countNumberCalls 0 = return 0
countNumberCalls n = do
myTell "countNumberCalls" 1
x <- countNumberCalls $ n - 1
return $ 1 + x
--- Without any writer, pretty efficient
countNumberCallsNoWriter :: Int -> Int
countNumberCallsNoWriter 0 = 0
countNumberCallsNoWriter n = 1 + countNumberCallsNoWriter (n-1)
main :: IO ()
main = do
putStrLn $ "Hello"
-- Version without any writter
print =<< Time.getZonedTime
let n = countNumberCallsNoWriter 15000000
putStrLn $ "Without any writer, the result is " ++ (show n)
-- Version with Logger
print =<< Time.getZonedTime
let (n, log :: LogEntry) = Wr.runWriter $ countNumberCalls 15000000
putStrLn $ "The result is " ++ (show n)
putStrLn $ "With the logger, the number of calls is " ++ (show $ (logMap log))
-- Version with the stupid logger
print =<< Time.getZonedTime
let (n, log :: StupidLogEntry) = Wr.runWriter $ countNumberCalls 15000000
putStrLn $ "The result is " ++ (show n)
putStrLn $ "With the stupid logger, the number of calls is " ++ (show $ (stupidLogMap log))
print =<< Time.getZonedTime
The Writer monad is the bottleneck. A better way to generalize your code so it can run in those two "modes" is to change the interface, i.e., the LogFunctionCalls class, to be parameterized by the monad:
class Monad m => LogFunctionCalls m where
myTell :: String -> Int -> m ()
Then we can use an identity monad (or monad transformer) to implement it trivially:
newtype NoLog a = NoLog a
deriving (Functor, Applicative, Monad) via Identity
instance LogFunctionCalls NoLog where
myTell _ _ = pure ()
Note also that the function to test has a different type now, that no longer refers to Writer explicitly:
countNumberCalls :: (LogFunctionCalls m) => Int -> m Int
Let's stick it in a benchmark, which has all kinds of methodological issues as pointed out in the comments, but still, something interesting happens if we compile it with ghc -O:
main :: IO ()
main = do
let iternumber = 1500000
putStrLn $ "Hello"
t0 <- Time.getCurrentTime
-- Non-monadic version
let n = countNumberCallsNoWriter iternumber
putStrLn $ "Without any writer, the result is " ++ (show n)
t1 <- Time.getCurrentTime
print (Time.diffUTCTime t1 t0)
-- NoLog version
let n = unNoLog $ countNumberCalls iternumber
putStrLn $ "The result is " ++ (show n)
t2 <- Time.getCurrentTime
print (Time.diffUTCTime t2 t1)
The output:
Hello
Without any writer, the result is 1500000
0.022030957s
The result is 1500000
0.000081533s
As we can see, the second version (the one we care about) took zero time. If we remove the first version from the benchmark, then the remaining one will take the 0.022s of the former.
So GHC actually optimized one of the two benchmarks away because it saw that they are the same, which achieves what we originally wanted: the "logging" code runs as fast as specialized code without logging because they're literally the same, and the benchmark numbers don't matter.
This can also be confirmed by looking at the generated Core; run ghc -O -ddump-simpl -ddump-to-file -dsuppres-all and make sense of the file Main.dump-simpl. Or use inspection-testing.
Compilable gist: https://gist.github.com/Lysxia/2f98c4a8a61034dcc614de5e95d7d5f8
I implemented the Winograd algorithm on Haskell and tried to speed up the algorithm due to strict calculations. In this I succeeded, but I completely did not understand why, adding strictness, it starts to work faster. Since my code for this algorithm is large enough, I wrote two small functions that demonstrate this problem.
module Main where
import qualified Data.Vector as V
import qualified Data.Matrix as M
import Control.DeepSeq
import Control.Exception
import System.Clock
import Data.Time
matrixCtor x y size = M.matrix size size $ \(i,j) -> x*i+y*j
group v s = foldl (\acc i ->acc + V.unsafeIndex v i * V.unsafeIndex v (i+1)) 0 [0,2..s-1]
size = 3000 :: Int
testWithForce :: IO ()
testWithForce = do
let a = matrixCtor 2 1 size
evaluate $ force a
start <- getCurrentTime
let c = V.generate size $ \j -> M.getCol (j+1) a
evaluate $ force c
let d = foldl (\acc i ->acc + group (V.unsafeIndex c i) size) 0 [0,1..(size-1)]
evaluate $ force d
end <- getCurrentTime
print (diffUTCTime end start)
testWithoutForce :: IO ()
testWithoutForce = do
let a = matrixCtor (-2) 1 size
evaluate $ force a
start <- getCurrentTime
let c = V.generate size $ \j -> M.getCol (j+1) a
let d = foldl (\acc i ->acc + group (V.unsafeIndex c i) size) 0 [0,1..(size-1)]
evaluate $ force d
end <- getCurrentTime
print (diffUTCTime end start)
main :: IO ()
main = do
testWithForce
testWithoutForce
In the implementation of the algorithm, the matrices are computed before use, just like here. In the function testWithForce I calculate the value c before it is used. In this case, the function testWithForce works faster than the function testWithoutForce. I got the following results:
0.945078s --testWithForce
1.785158s --testWithoutForce
I just can not understand why strictness in this case speeds up the work so much.
Pardon the non-answer, but make sure to control for GC: it appears that the second function may be burdened with the GC from the previous one, thereby inflating the difference.
I can reproduce what you're seeing:
$ ghc -O3 --make foo.hs && ./foo
[1 of 1] Compiling Main ( foo.hs, foo.o )
Linking foo ...
1.471109207s
2.001165795s
However, when I flipped the order of the test, the result was different:
main = do
testWithoutForce
testWithForce
$ ghc -O3 --make foo.hs && ./foo
1.626452918s
1.609818958s
So I made main GC between each test:
import System.Mem
main = do
performMajorGC
testWithForce
performMajorGC
testWithoutForce
The forced one is still faster, but the difference was massively reduced:
1.460686986s
1.581715988s
Say I have two pure but unsafe functions, that do the same, but one of them is working on batches, and is asymptotically faster:
f :: Int -> Result -- takes O(1) time
f = unsafePerformIO ...
g :: [Int] -> [Result] -- takes O(log n) time
g = unsafePerformIO ...
A naive implementation:
getUntil :: Int -> [Result]
getUntil 0 = f 0
getUntil n = f n : getUntil n-1
switch is the n value where g gets cheaper than f.
getUntil will in practice be called with ever increasing n, but it might not start at 0. So since the Haskell runtime can memoize getUntil, performance will be optimal if getUntil is called with an interval lower than switch. But once the interval gets larger, this implementation is slow.
In an imperative program, I guess I would make a TreeMap (which could quickly be checked for gaps) for caching all calls. On cache misses, it would get filled with the results of g, if the gap was greater than switch in length, and f otherwise, respectively.
How can this be optimized in Haskell?
I think I am just looking for:
an ordered map filled on-demand using a fill function that would fill all values up to the requested index using one function if the missing range is small, another if it is large
a get operation on the map which returns a list of all lower values up to the requested index. This would result in a function similar to getUntil above.
I'll elaborate in my proposal for using map, after some tests I just ran.
import System.IO
import System.IO.Unsafe
import Control.Concurrent
import Control.Monad
switch :: Int
switch = 1000
f :: Int -> Int
f x = unsafePerformIO $ do
threadDelay $ 500 * x
putStrLn $ "Calculated from scratch: f(" ++ show x ++ ")"
return $ 500*x
g :: Int -> Int
g x = unsafePerformIO $ do
threadDelay $ x*x `div` 2
putStrLn $ "Calculated from scratch: g(" ++ show x ++ ")"
return $ x*x `div` 2
cachedFG :: [Int]
cachedFG = map g [0 .. switch] ++ map f [switch+1 ..]
main :: IO ()
main = forever $ getLine >>= print . (cachedFG !!) . read
… where f, g and switch have the same meaning indicated in the question.
The above program can be compiled as is using GHC. When executed, positive integers can be entered, followed by a newline, and the application will print some value based on the number entered by the user plus some extra indication on what values are being calculated from scratch.
A short session with this program is:
User: 10000
Program: Calculated from scratch: f(10000)
Program: 5000000
User: 10001
Program: Calculated from scratch: f(10001)
Program: 5000500
User: 10000
Program: 5000000
^C
The program has to be killed/terminated manually.
Notice that the last value entered doesn't show a "calculated from scratch" message. This indicates that the program has the value cached/memoized somewhere. You can try executing this program yourself; but have into account that threadDelay's lag is proportional to the value entered.
The getUntil function then could be implemented using:
getUntil :: Int -> [Int]
getUntil n = take n cachedFG
or:
getUntil :: Int -> [Int]
getUntil = flip take cachedFG
If you don't know the value for switch, you can try evaluating f and g in parallel and use the fastest result, but that's another show.
The Problem
I want to simulate in Haskell a multivalue outputting functions. The Haskell code is generated (not hand written) - this is important information, see below:
This can be of course easly done by returning a tuple from function, like
f x y = (x+y, x-y)
But when using such function I have to know what kind of tuple it returns:
...
(out_f_1, out_f_2) = f a b
(out_g_1, out_g_2, out_g_3) = g out_f_1
...
And so on ... But while generating code, I don't know what is the type of ouput of lets say f, so right now I'm using the Data.List.Select package and simulate the above with:
import Data.List.Select
...
out_f = f a b
out_g = g (sel1 outf)
...
The problem is the performance - on my testing program, the version, which uses Data.List.Select is twice slower than the version written by hand.
This is very obvious situation, because Data.List.Select is written using classes and instances, so it uses some kind of runtime dictionary (If I'm not wrong).
(http://hackage.haskell.org/packages/archive/tuple/0.2.0.1/doc/html/src/Data-Tuple-Select.html#sel1)
The Question
I want to ask you If is it possible to somehow compile the version (which uses Data.List.Select) to be as fast as the manually crafted one?
I think there should be a switch to compiler, which will tell him to "instantiate" the classes and interfaces for each use (something like templates from C++).
Benchmarks
Test1.hs:
import qualified Data.Vector as V
import System.Environment
b :: Int -> Int
b x = x + 5
c x = b x + 1
d x = b x - 1
a x = c x + d x
main = do
putStrLn "Starting..."
args <- getArgs
let iternum = read (head args) :: Int in do
putStrLn $ show $ V.foldl' (+) 0 $ V.map (\i -> a (iternum-i))
$ V.enumFromTo 1 iternum
putStrLn "Done."
compile with ghc -O3 Test1.hs
Test2.hs:
import qualified Data.Vector as V
import Data.Tuple.Select
import Data.Tuple.OneTuple
import System.Environment
b x = OneTuple $ x + 5
c x = OneTuple $ (sel1 $ b x) + 1
d x = OneTuple $ (sel1 $ b x) - 1
a x = OneTuple $ (sel1 $ c x) + (sel1 $ d x)
main = do
putStrLn "Starting..."
args <- getArgs
let iternum = read (head args) :: Int in do
putStrLn $ show $ V.foldl' (+) 0 $ V.map (\i -> sel1 $ a (iternum-i))
$ V.enumFromTo 1 iternum
putStrLn "Done."
compile with ghc -O3 Test2.hs
Results
time ./Test1 10000000 = 5.54 s
time ./Test2 10000000 = 10.06 s
I am not sure, but it might be worthwhile to try
http://www.haskell.org/ghc/docs/7.0.3/html/users_guide/pragmas.html#specialize-pragma
Ok, the results I've posted are not accurate - as #sabauma told - the two codes perform in the same time If you compile them with optimizations enabled.
The #tohava's answer is very good if you want to explicity show which functions to specialize (see the #sabauma comment above).