Issue with sorting and addition

I'm a beginner in Haskell. I wrote code in Haskell to read a text file and print the 2nd and 4th columns, but I don't know where to put the sorting mechanism. The 2nd column consists of digits, so if a digit repeats, its values should be added up and the total shown. For example, given:
2|23.45
4|36.89
1|77.20
2|20.20
so output should be
1|77.20
2|43.65
4|36.89
My code is:
module Main where

import Data.List.Split (splitOn)
import Data.List (intercalate)

project :: [Int] -> [String] -> [String]
project indices l = foldl (\acc i -> acc ++ [l !! i]) [] indices

fromString :: String -> [[String]]
fromString = map (splitOn "|") . lines

toString :: [[String]] -> String
toString = unlines . map (intercalate "|")

main :: IO ()
main = do
  putStrLn =<<
    return . toString . map (project [1, 3]) . fromString =<<
      readFile("table.txt")
Help me, thanks.

You should sort after map (you will also need sort from Data.List, e.g. import Data.List (intercalate, sort)):
main :: IO ()
main = do
  putStrLn =<<
    return . toString . sort . map (project [1, 3]) . fromString =<<
      readFile("table.txt")
By the way, return and =<< can be simplified as follows:
main :: IO ()
main =
  putStrLn . toString . sort . map (project [1, 3]) . fromString =<<
    readFile "table.txt"
This still has a couple of issues:
it sorts the IDs as strings, not as numbers (e.g. "11" comes before "4")
it does not add up the items with identical IDs
My suggestion would be:
convert the data into [(Int,Double)] first (have a look at read or -- if you want to handle malformed data sensibly -- reads); a small parsing sketch follows the skeleton below.
sort the list.
apply a custom function to sum up the same-ID entries. This is a nice list-handling exercise for a beginner. Just fill in the blanks below. Remember you can use recursion.
sumSameId :: [(Int,Double)] -> [(Int,Double)]
sumSameId [] = ???
sumSameId [(i,d)] = ???
sumSameId ((i1,d1):(i2,d2):rest) = if i1==i2 then ??? else ???
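For the first step, here is a minimal sketch of turning one split row into a pair using reads (parseRow is a hypothetical name, not part of the original code; malformed rows become Nothing):
parseRow :: [String] -> Maybe (Int, Double)
parseRow [i, v] = case (reads i, reads v) of
  ([(i', "")], [(v', "")]) -> Just (i', v')
  _                        -> Nothing
parseRow _ = Nothing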
I have just realized that your code is not actually yours, but was taken verbatim from an answer to a previous question of yours. Honestly, it looks as if you copied that code without making any effort to understand it, and are now trying to make it work using Stack Overflow as a coding service. Of course, this may not be the case, yet that is how it looks. Just keep in mind that we are usually more keen to help people who show some effort.

Related

Getting the first tuple of list of tuples in Haskell

I am trying to get the first element of a tuple for every tuple in a list using the following:
getRow :: [(Integer,Integer)] -> [(Integer,Integer)]
getRow (row:rows) = do
  (fst (head (row)))
I thought that if I could get the first element of every head of the list of tuples, it would return just the first elements, but that wasn't the case.
Based on your description, your expected output should be a list of elements, not a list of tuples. Therefore, the first step is to change the signature to:
getRow :: [(Integer,Integer)] -> [Integer]
But why restrict to Integer, when the method can work for any type? Let's make it more general by doing this:
getRow :: [(a,b)] -> [a]
Now the algorithm itself. You have the right idea about using fst to get the first element. We will use this function, together with a list comprehension to do the job as follows:
getRow lst = [fst x | x <- lst]
This will go through the list, extract the first element from each tuple and return a list of the extracted elements. Putting it all together, we get this:
getRow :: [(a,b)] -> [a]
getRow lst = [fst x | x <- lst]
Of course, this is one of many possible ways to go about the problem. Another solution would be to use a foldr function to do the same thing, like so:
getRow2 :: [(a,b)] -> [a]
getRow2 lst = foldr (\x acc -> (fst x):acc) [] lst
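For example, in GHCi both versions should behave the same (expected results, assuming the definitions above):
getRow  [(1,2),(3,4),(5,6)]   -- [1,3,5]
getRow2 [(1,2),(3,4),(5,6)]   -- [1,3,5]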
You can start off with a good tutorial to learn the basics of Haskell, and use Hackage for reference. However, @Eric is absolutely correct to say that in any paradigm you need to figure out the steps before you start to write the code.

Haskell: performance of IORefs

I have been trying to encode an algorithm in Haskell that requires using lots of mutable references, but it is (perhaps not surprisingly) very slow in comparison to purely lazy code.
Consider a very simple example:
module Main where
import Data.IORef
import Control.Monad
import Control.Monad.Identity
list :: [Int]
list = [1..10^6]
main1 = mapM newIORef list >>= mapM readIORef >>= print
main2 = print $ map runIdentity $ map Identity list
Running GHC 7.8.2 on my machine, main1 takes 1.2s and uses 290MB of memory, while main2 takes only 0.4s and uses a mere 1MB. Is there any trick to prevent this growth, especially in space? I often need IORefs for non-primitive types unlike Int, and assumed that an IORef would use an additional pointer much like a regular thunk, but my intuition seems to be wrong.
I have already tried a specialized list type with an unpacked IORef, but with no significant difference.
The problem is your use of mapM, which always performs poorly on large lists both in time and space. The correct solution is to fuse away the intermediate lists by using mapM_ and (>=>):
import Data.IORef
import Control.Monad
list :: [Int]
list = [1..10^6]
main = mapM_ (newIORef >=> readIORef >=> print) list
This runs in constant space and gives excellent performance, running in 0.4 seconds on my machine.
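If the (>=>) operator is unfamiliar: as far as I can tell, the composed action above is equivalent to this explicit version (main' is just an illustrative name, reusing the imports and list from above):
main' :: IO ()
main' = mapM_ (\x -> newIORef x >>= readIORef >>= print) list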
Edit: In answer to your question, you can also do this with pipes to avoid having to manually fuse the loop:
import Data.IORef
import Pipes
import qualified Pipes.Prelude as Pipes
list :: [Int]
list = [1..10^6]
main = runEffect $
    each list >-> Pipes.mapM newIORef >-> Pipes.mapM readIORef >-> Pipes.print
This runs in constant space in about 0.7 seconds on my machine.
This is very likely not about IORef, but about strictness. Actions in the IO monad are serial -- all previous actions must complete before the next one can be started. So
mapM newIORef list
generates a million IORefs before anything is read.
However,
map runIdentity . map Identity
= map (runIdentity . Identity)
= map id
which streams very nicely, so we print one element of the list, then generate the next one, etc.
If you want a fairer comparison, use a strict map:
map' :: (a -> b) -> [a] -> [b]
map' f [] = []
map' f (x:xs) = (f x:) $! map' f xs
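Using that strict map in main2 should, if I am reading the strictness right, build the whole result list before printing, which makes the comparison with main1 more even (a sketch, reusing list from the question):
main2' :: IO ()
main2' = print $ map' runIdentity $ map' Identity list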
I have found that a hack towards a solution is to use a lazy mapM instead, defined (using unsafeInterleaveIO from System.IO.Unsafe) as:
lazyMapM :: (a -> IO b) -> [a] -> IO [b]
lazyMapM f [] = return []
lazyMapM f (x:xs) = do
  y  <- f x
  ys <- unsafeInterleaveIO $ lazyMapM f xs
  return (y:ys)
This allows the monadic version to run within the same 1MB and similar time. I would expect that a lazy ST monad could solve this problem more elegantly without using unsafeInterleaveIO, as a function:
main = print $ runST (mapM (newSTRef) list >>= mapM (readSTRef))
but that does not work (you also need to use unsafeInterleaveST), which leaves me wondering how lazy Control.Monad.ST.Lazy really is. Does anyone know? :)
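For reference, a sketch of what the interleaved ST variant might look like (lazyMapST is a made-up name; unsafeInterleaveST lives in Control.Monad.ST.Unsafe in recent GHC versions; list is reused from the question). Whether this actually restores the streaming behaviour is exactly the open question above:
import Control.Monad.ST
import Control.Monad.ST.Unsafe (unsafeInterleaveST)
import Data.STRef

lazyMapST :: (a -> ST s b) -> [a] -> ST s [b]
lazyMapST f []     = return []
lazyMapST f (x:xs) = do
  y  <- f x
  ys <- unsafeInterleaveST (lazyMapST f xs)
  return (y:ys)

main :: IO ()
main = print $ runST (lazyMapST newSTRef list >>= lazyMapST readSTRef)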

Listing all the contents of a directory by breadth-first order results in low efficiency

I wrote a Haskell module to list all the contents of a directory in breadth-first order. Below is the source code.
module DirElements (dirElem) where

import System.Directory (getDirectoryContents, doesDirectoryExist)
import System.FilePath ((</>))

dirElem :: FilePath -> IO [[FilePath]]
dirElem dirPath = iterateM (not.null) (concatMapM getDirectoryContents') [dirPath] >>= return.tail

getDirectoryContents' :: FilePath -> IO [FilePath]
getDirectoryContents' dirPath = do
    isDir <- doesDirectoryExist dirPath
    if isDir then dirContent else return []
  where
    dirContent = do
        contents <- getDirectoryContents dirPath
        -- drop the first two entries ("." and "..") and prepend the directory path
        return . map (dirPath </>) . tail . tail $ contents

iterateM :: (Monad m) => (a -> Bool) -> (a -> m a) -> a -> m [a]
-- Notice: due to the implementation of >>=, iterateM can't be written like iterate,
-- which gives an infinite list.
iterateM fb f x =
    if fb x
      then do
        rest <- do { fx <- f x; iterateM fb f fx }
        return (x:rest)
      else return []

concatMapM :: Monad m => (a -> m [b]) -> [a] -> m [b]
concatMapM f list = mapM f list >>= return.concat
It works correctly, but when run on a large directory it "hangs" for a little while and then springs out all the results at once.
After some research I found it is the same issue as with sequence $ map return [1..] :: [[Int]]; see "Why the Haskell sequence function can't be lazy or why recursive monadic functions can't be lazy".
This comes up every once in a while, and the answer ends up being: use an iteratee-like library. The one most often suggested recently has been the pipes (Proxy) library.
Streaming recursive descent of a directory in Haskell
An older pipes solution (now out of date) and a non-iteratee-like solution: breadth-first traversal of a directory tree is not lazy
I have seen Conduit solutions before and a few elegant monadic solutions, but I am not finding them now.
First of all, that's not related to strictness. Like many monads, IO is actually nonstrict in its monadic operations. This is related to lazy vs. eager I/O.
The problem is that you first do the directory traversal and then you process the result. You can improve that by using coroutines to interleave them. One simple way is to make the directory traversal take a callback as argument:
getDirectoryContents' :: (MonadIO m) => (FilePath -> m a) -> FilePath -> m ()
getDirectoryContents' k fp = {- ... -}
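For illustration, here is one possible shape for such a callback-driven breadth-first walk (a sketch only; walkBF and the plain-list queue are my own choices, not the answerer's code):
import Control.Monad.IO.Class (MonadIO, liftIO)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))

walkBF :: MonadIO m => (FilePath -> m a) -> [FilePath] -> m ()
walkBF _ []     = return ()
walkBF k (p:ps) = do
  _     <- k p
  isDir <- liftIO (doesDirectoryExist p)
  children <- if isDir
    then do
      names <- liftIO (getDirectoryContents p)
      return [p </> n | n <- names, n `notElem` [".", ".."]]
    else return []
  walkBF k (ps ++ children)  -- appending children to the queue makes this breadth-first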
This is the simplest and least flexible solution. A more flexible solution is to actually implement coroutines. You can either roll your own coroutine monad using free, monad-coroutine or operational, or you can use one of the many streaming abstractions like conduit, enumerator or pipes, with the last one being my personal recommendation for simple cases like this one.
I modified the older answer that Davorak linked to to use the new pipes library.
It uses StateP to keep a queue of untraversed directories so that it can do a breadth first traversal. It uses MaybeP for exiting from the loop, as a convenience.
import Control.Monad
import Control.Proxy
import Control.Proxy.Trans.Maybe
import Control.Proxy.Trans.State as S
import Data.Sequence hiding (filter)
import System.FilePath.Posix
import System.Directory

getUsefulContents :: FilePath -> IO [FilePath]
getUsefulContents path
    = fmap (filter (`notElem` [".", ".."])) $ getDirectoryContents path

traverseTree
    :: (Proxy p)
    => FilePath
    -> () -> Producer (MaybeP (StateP (Seq FilePath) p)) FilePath IO r
traverseTree path () = do
    liftP $ S.modify (|> path)
    forever $ do
        x <- liftP $ S.gets viewl
        case x of
            EmptyL    -> mzero
            file :< s -> do
                liftP $ S.put s
                respond file
                p <- lift $ doesDirectoryExist file
                when p $ do
                    names <- lift $ getUsefulContents file
                    let namesfull = map (file </>) names
                    liftP $ forM_ namesfull $ \name ->
                        S.modify (|> name)
This defines a breadth-first lazy producer of files. If you hook it up to a printing stage, it will print out the files as it traverses the tree:
main = runProxy $ evalStateK empty $ runMaybeK $
    traverseTree "/tmp" >-> putStrLnD
Laziness means that if you only demand 3 files, it will only traverse the tree as much as necessary to generate three files, then it will stop:
main = runProxy $ evalStateK empty $ runMaybeK $
    traverseTree "/tmp" >-> takeB_ 3 >-> putStrLnD
If you want to learn more about the pipes library, then I recommend you read the tutorial.
Everyone is telling you to use iteratees or pipes or the like, which are the current popular approach. But there's another, classic way to do this! Just use unsafeInterleaveIO from System.IO.Unsafe. All this function of type IO a -> IO a does is modify an IO action so that it only actually performs the IO when the value thunk is demanded, which is exactly what you were asking for. You can use this to write an iterateM with your desired semantics trivially.
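To make that concrete, here is a sketch of a lazily-interleaved variant of the question's iterateM (lazyIterateM is an illustrative name, not code from the answer):
import System.IO.Unsafe (unsafeInterleaveIO)

lazyIterateM :: (a -> Bool) -> (a -> IO a) -> a -> IO [a]
lazyIterateM fb f x
  | fb x = do
      rest <- unsafeInterleaveIO (f x >>= lazyIterateM fb f)
      return (x : rest)
  | otherwise = return []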
Examples like this are where unsafeInterleaveIO shines.
You have, however, I'm sure, noted the "unsafe" in the name -- there are other cases, where you want direct control over file handles and resource usage or the like, in which unsafeInterleaveIO will indeed be bad news, and may even introduce violations of referential transparency.
(see this answer for more discussion: When is unsafeInterleaveIO unsafe?)
But again, in a case like this, I think unsafeInterleaveIO is the obvious, correct, and straightforward result.

How can I sort my image sizes on both coordinates?

I have a list of Strings of the form XxX, where each X is a number up to 4 digits long (they are image sizes in (pixels)x(pixels)).
For example:
["192x192","64x84","96x96","64x64","292x192","32x32","64x12"]
Using a function mysort, which is just an insertion sort that compares only the number before the 'x':
mysort [] = []
mysort [x] = [x]
mysort (x:xs) = insert (mysort xs)
  where insert [] = [x]
        insert (y:ys) | takeUntilX x <= takeUntilX y = x : y : ys
                      | otherwise = y : insert ys
I get this:
["192x192","292x192","32x32","64x84","64x64","64x12","96x96"]
which is only partly sorted: all of the "64x**" entries remain in their original order, but I want them sorted as well, so that I get this:
["192x192","292x192","32x32","64x12","64x64","64x84","96x96"]
What would be a better solution: modifying the function mysort, or writing a new function that sorts the partially sorted list?
Can you give me the basic idea of how I could do either?
import Data.List
import Data.List.Split
res = map (intercalate "x") . sort . map (splitOn "x")
I'm using Data.List.Split from http://hackage.haskell.org/package/split
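For instance, applying it to the list from the question should give (expected output, assuming the definition above):
res ["192x192","64x84","96x96","64x64","292x192","32x32","64x12"]
-- ["192x192","292x192","32x32","64x12","64x64","64x84","96x96"]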
For future needs, you could also:
1. convert your data to tuples, e.g. (64, 64)
2. use the builtin sort; it does exactly what you want (see the sketch below)
I would assume that in the future you will use the data as integers, so converting them as early as possible could save you a lot of trouble later.
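A rough sketch of those two steps (toPair and sortSizes are illustrative names; this assumes well-formed "WxH" input):
import Data.List (sort)
import Data.List.Split (splitOn)

toPair :: String -> (Int, Int)
toPair s = let [w, h] = splitOn "x" s in (read w, read h)

sortSizes :: [String] -> [(Int, Int)]
sortSizes = sort . map toPair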
br,
Juha
EDIT: Corrected -- I've figured out what you meant.
I'd separate out the concerns -- (1) parse the strings to get the dimensions; (2) sort the dimensions however you see fit; (3) convert the dimensions back to strings. In other words:
import Data.List (sort)

stringToDim :: String -> (String, String)
stringToDim s = (a, c)
  where (a, b) = break (== 'x') s
        c      = drop 1 b

dimToString :: (String, String) -> String
dimToString (x, y) = x ++ "x" ++ y

dimsort :: [String] -> [String]
dimsort = map dimToString . sort . map stringToDim
OK, here is the final solution I'm happy with, although it isn't what I originally asked for. It's a modified copy of max taldykin's answer:
res x = map (intercalate "x") $ map myshow $ sort $ map readAsInt $ map (splitOn "x") x
readAsInt [x, y] = [read x :: Int, read y :: Int]
myshow [x, y] = [show x, show y]
input: ["192x192","64x184","96x96","64x64","292x192","32x32","64x12"]
output: ["32x32","64x12","64x64","64x184","96x96","192x192","292x192"]
Although it doesn't give ["192x192","292x192","32x32","64x12","64x64","64x184","96x96"], it is still OK for what I had in mind.

Haskell mutable map/tree

I am looking for a mutable (balanced) tree/map/hash table in Haskell, or a way to simulate one inside a function, i.e. when I call the same function several times, the structure is preserved. So far I have tried Data.HashTable (which is OK, but somewhat slow) and Data.Array.Judy, but I was unable to make the latter work with GHC 6.10.4. Are there any other options?
If you want mutable state, you can have it. Just keep passing the updated map around, or keep it in a state monad (which turns out to be the same thing).
import qualified Data.Map as Map
import Control.Monad.ST
import Data.STRef

memoize :: Ord k => (k -> ST s a) -> ST s (k -> ST s a)
memoize f = do
    mc <- newSTRef Map.empty
    return $ \k -> do
        c <- readSTRef mc
        case Map.lookup k c of
            Just a  -> return a
            Nothing -> do a <- f k
                          writeSTRef mc (Map.insert k a c) >> return a
You can use this like so. (In practice, you might want to add a way to clear items from the cache, too.)
import Control.Monad

main :: IO ()
main = do
    fib <- stToIO $ fixST $ \fib -> memoize $ \n ->
        if n < 2 then return n else liftM2 (+) (fib (n-1)) (fib (n-2))
    mapM_ (print <=< stToIO . fib) [1..10000]
At your own risk, you can unsafely escape from the requirement of threading state through everything that needs it.
import System.IO.Unsafe

unsafeMemoize :: Ord k => (k -> a) -> k -> a
unsafeMemoize f = unsafePerformIO $ do
    f' <- stToIO $ memoize $ return . f
    return $ unsafePerformIO . stToIO . f'

fib :: Integer -> Integer
fib = unsafeMemoize $ \n -> if n < 2 then n else fib (n-1) + fib (n-2)

main :: IO ()
main = mapM_ (print . fib) [1..1000]
Building on @Ramsey's answer, I also suggest you reconceive your function to take a map and return a modified one. Then code using good ol' Data.Map, which is pretty efficient at modifications. Here is a pattern:
import qualified Data.Map as Map

-- | takes input and a map, and returns a result and a modified map
myFunc :: a -> Map.Map k v -> (r, Map.Map k v)
myFunc a m = … -- put your function here

-- | run myFunc over a list of inputs, gathering the outputs
mapFuncWithMap :: [a] -> Map.Map k v -> ([r], Map.Map k v)
mapFuncWithMap as m0 = foldr step ([], m0) as
  where step a (rs, m) = let (r, m') = myFunc a m in (r:rs, m')
-- this starts with an initial map, uses successive versions of the map
-- on each iteration, and returns a tuple of the results, and the final map

-- | run myFunc over a list of inputs, gathering the outputs
mapFunc :: [a] -> [r]
mapFunc as = fst $ mapFuncWithMap as Map.empty
-- same as above, but starts with an empty map, and ignores the final map
It is easy to abstract this pattern and make mapFuncWithMap generic over functions that use maps in this way.
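One way to abstract it (a sketch; mapWithState is my name for it) is to thread an arbitrary state through the list with Data.List.mapAccumL:
import Data.List (mapAccumL)

mapWithState :: (a -> s -> (r, s)) -> [a] -> s -> ([r], s)
mapWithState f as s0 = (rs, s')
  where (s', rs) = mapAccumL step s0 as
        step s a = let (r, s1) = f a s in (s1, r)

-- e.g. mapFuncWithMap could then be written as: mapWithState myFunc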
Although you ask for a mutable type, let me suggest that you use an immutable data structure and that you pass successive versions to your functions as an argument.
Regarding which data structure to use,
There is an implementation of red-black trees at Kent
If you have integer keys, Data.IntMap is extremely efficient.
If you have string keys, the bytestring-trie package from Hackage looks very good.
The problem is that I cannot use (or I don't know how to use) a non-mutable type.
If you're lucky, you can pass your table data structure as an extra parameter to every function that needs it. If, however, your table needs to be widely distributed, you may wish to use a state monad where the state is the contents of your table.
If you are trying to memoize, you can try some of the lazy memoization tricks from Conal Elliott's blog, but as soon as you go beyond integer arguments, lazy memoization becomes very murky—not something I would recommend you try as a beginner. Maybe you can post a question about the broader problem you are trying to solve? Often with Haskell and mutability the issue is how to contain the mutation or updates within some kind of scope.
It's not so easy learning to program without any global mutable variables.
If I read your comments right, then you have a structure with possibly ~500k total values to compute. The computations are expensive, so you want them done only once, and on subsequent accesses, you just want the value without recomputation.
In this case, use Haskell's laziness to your advantage! ~500k is not so big: Just build a map of all the answers, and then fetch as needed. The first fetch will force computation, subsequent fetches of the same answer will reuse the same result, and if you never fetch a particular computation - it never happens!
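A toy sketch of that idea (all names here are illustrative, and expensive merely stands in for the real computation): the Map's values are lazy thunks, so each one is computed at most once, on first lookup.
import qualified Data.Map as Map

expensive :: Int -> Integer
expensive n = sum [1 .. 100000 * fromIntegral n]  -- stand-in for the real, costly computation

table :: Map.Map Int Integer
table = Map.fromList [ (k, expensive k) | k <- [1 .. 500000] ]

lookupAnswer :: Int -> Maybe Integer
lookupAnswer k = Map.lookup k table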
You can find a small implementation of this idea using 3D point distances as the computation in the file PointCloud.hs. That file uses Debug.Trace to log when the computation actually gets done:
> ghc --make PointCloud.hs
[1 of 1] Compiling Main ( PointCloud.hs, PointCloud.o )
Linking PointCloud ...
> ./PointCloud
(1,2)
(<calc (1,2)>)
Just 1.0
(1,2)
Just 1.0
(1,5)
(<calc (1,5)>)
Just 1.0
(1,2)
Just 1.0
Are there any other options?
A mutable reference to a purely functional dictionary like Data.Map.
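For example, one minimal version of that idea (illustrative names; String keys and Int values are just for the sketch):
import Data.IORef
import qualified Data.Map as Map

newTable :: IO (IORef (Map.Map String Int))
newTable = newIORef Map.empty

insertItem :: IORef (Map.Map String Int) -> String -> Int -> IO ()
insertItem ref k v = modifyIORef ref (Map.insert k v)

lookupItem :: IORef (Map.Map String Int) -> String -> IO (Maybe Int)
lookupItem ref k = Map.lookup k <$> readIORef ref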
