Haskell: Multi-property sort is not generic enough - sorting

I was trying to implement a ranked multi-property sort that works on lists of anything.
import Data.Ord (Ordering, Down (..), comparing)
import Data.List (sortBy)
import Data.Monoid (mconcat)
data Order a = ASC a | DESC a
orderBy :: Ord b => [Order (a -> b)] -> [a] -> [a]
orderBy rankedProperties unsorted =
sortBy rankedCompare unsorted
where
rankedCompare x y =
mconcat $ map
(\property ->
case property of
ASC f -> comparing f x y
DESC f -> comparing (Down . f) x y
) rankedProperties
It works now for tuples and records, however I've discovered a problem. The problem is that the b in orderBy has to be the same. That is consider this:
data Row = Row { shortListed :: Bool, cost :: Float, distance1 :: Int, distance2 :: Int } deriving (Show, Eq)
I want to be able to say: orderBy [ASC shortListed, DESC cost] listofrows.
But the error that comes back is:
<interactive>:1:31:
Couldn't match type ‘Float’ with ‘Bool’
Expected type: Row -> Bool
Actual type: Row -> Float
In the first argument of ‘ASC’, namely ‘cost’
In the expression: ASC cost
I need a way to make the b type generic, as the b only really has to be acceptable by the comparing function comparing :: Ord a => (b -> a) -> b -> b -> Ordering.
I've read a little bit about existential types and heterogenous lists, but I'm not sure how to proceed.

Since we have Monoid Ordering and instance Monoid b => Monoid (a -> b) in Prelude, we also have Monoid (a -> a -> Ordering) by iterating the function instance twice. This lets us solve the issue quite simply, without existentials:
import Data.Ord (Ordering, comparing)
import Data.List (sortBy)
import Data.Monoid ((<>), mconcat)
data Row = Row {
shortListed :: Bool,
cost :: Float,
distance1 :: Int,
distance2 :: Int
} deriving (Show, Eq)
asc, desc :: Ord b => (a -> b) -> a -> a -> Ordering
asc = comparing
desc = flip . asc
list :: [Row]
list = [Row False 0 10 20, Row True 10 30 40]
list' :: [Row]
list' = sortBy (asc shortListed <> desc cost <> asc distance1) list
Alternatively:
orderBy :: [a -> a -> Ordering] -> [a] -> [a]
orderBy = sortBy . mconcat
list'' :: [Row]
list'' = orderBy [asc shortListed, desc cost, asc distance1] list

Related

Haskell - Sort by first second element and then by first element

I have a list of tuples and I would like to sort it by second element (descending) and then by first element (ascending).
My code looks like this:
sortedOcc :: Eq a => [a] -> [(a, Int)]
sortedOcc = sortBy (flip compare `on` snd) . occurences
and this is the first sorting by the second element of list returned by occurences (function). How should I add the second sort (ascending) by the first element?
The Data.Ord module provides a Down newtype whose purpose is solely to reverse the ordering.
It also provides a comparing function:
comparing :: Ord a => (b -> a) -> b -> b -> Ordering
which must be fed some transformation function before it can be passed to sortBy.
Like this:
$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
λ>
λ> sortBy (comparing (\(a,v) -> (Down v, a))) [(1,2),(1,3),(5,2),(5,3)]
[(1,3),(5,3),(1,2),(5,2)]
λ>
The values returned by the transformation function are then sorted using their own “natural” order. In our case, this is the lexicographic order on pairs of ordered types.
Overall, the code would require an Ord a constraint:
sortedOcc :: Ord a => [a] -> [(a, Int)]
sortedOcc = sortBy (comparing (\(a,v) -> (Down v, a))) . occurences
I'd probably write this using the Monoid instance on Ordering and on function types.
Sorting on the second value in the tuple looks like flip compare `on` snd, as you've already determined, while sorting on the first value looks like compare `on` fst.
These can be combined Monoidally with <>.
d :: [(String , Int)]
d = [("b", 1), ("a", 1), ("c",3), ("d",4)]
sortedD = sortBy ((flip compare `on` snd) <> (compare `on` fst)) d
I know that the rest of the answers are shorter, but I recommend you to implement these lazy functions yourself before using the already Haskell implemented ones, so you understand how it works.
-- Order a list of tuples by the first item
orderBy1stTupleItem :: Ord a => (a, b1) -> (a, b2) -> Ordering
orderBy1stTupleItem tup1 tup2
| item1 > item2 = GT
| item1 < item2 = LT
| otherwise = EQ
where
item1 = fst tup1
item2 = fst tup2
-- Order a list of tuples by the second item
orderBy2ndTupleItem :: Ord a1 => (a2, a1) -> (a3, a1) -> Ordering
orderBy2ndTupleItem tup1 tup2
| item1 > item2 = GT
| item1 < item2 = LT
| otherwise = EQ
where
item1 = snd tup1
item2 = snd tup2
-- Wrapper Function: Order a list of tuples by the first item and later by the second item
orderTuplesBy1stThenBy2ndItem :: (Ord a1, Ord a2) => [(a2, a1)] -> [(a2, a1)]
orderTuplesBy1stThenBy2ndItem listTuples =
sortBy orderBy2ndTupleItem (sortBy orderBy1stTupleItem listTuples)
Example
let exampleListTuples = [(1,2),(0,8),(6,1),(3,6),(9,1),(7,8),(0,9)]
Then let's get the 1st list, ordered by the first item of each tuple:
> listOrderedByTuple1stItem = sortBy orderBy1stTupleItem exampleListTuples
> listOrderedByTuple1stItem
[(0,8),(0,9),(1,2),(3,6),(6,1),(7,8),(9,1)]
Now we order this result list by the second item of each tuple
> sortBy orderBy2ndTupleItem listOrderedByTuple1stItem
[(6,1),(9,1),(1,2),(3,6),(0,8),(7,8),(0,9)]
Or, you can just run the wrapper function orderTuplesBy1stThenBy2ndItem as follows:
> sortBy orderTuplesBy1stThenBy2ndItem exampleListTuples
What is sortBy's signature?
sortBy :: (a -> a -> Ordering) -> [a] -> [a]
This means that its first argument must have the type a -> a -> Ordering:
sortedOcc :: Eq a => [a] -> [(a, Int)]
sortedOcc = sortBy g . occurences
g :: a -> a -> Ordering
g = (flip compare `on` snd)
but that means that
g :: a -> a -> Ordering
g x y = (flip compare `on` snd) x y
= flip compare (snd x) (snd y)
= compare (snd y) (snd x)
and so to add your requirement into the mix we simply have to write it down,
= let test1 = compare (snd y) (snd x)
test2 = compare (snd y) (snd x)
in ......
right?
The above intentionally contains errors, which should be straightforward for you to fix.
A word of advice, only use point-free code if it is easy and natural for you to read and write, and modify.

Why is my Haskell selection sort implementation extremely fast?

I implemented selection sort and compared it to Data.List's sort. It is orders of magnitudes faster than Data.List's sort. If I apply it to 10,000 randomly generated numbers the results are as follows:
✓ in 1.22µs: Selection sort
✓ in 9.84ms: Merge sort (Data.List)
This can't be right. First I thought maybe merge sort's intermediate results are cached and selection sort uses those to be much faster. Even when I comment out merge sort and only time selection sort, it is this fast however. I also verified the output and it is correctly sorted.
What causes this behaviour?
I use this code to test:
{-# LANGUAGE BangPatterns #-}
module Lib
( testSortingAlgorithms
) where
import System.Random (randomRIO)
import Text.Printf
import Control.Exception
import System.CPUTime
import Data.List (sort, sortOn)
selectionSort :: Ord a => [a] -> [a]
selectionSort [] = []
selectionSort nrs =
let (smallest, rest) = getSmallest nrs
in smallest : selectionSort rest
where getSmallest :: Ord a => [a] -> (a, [a])
getSmallest [a] = (a, [])
getSmallest (a:as) = let (smallest, rest) = getSmallest as
in if smallest > a then (a, smallest : rest)
else (smallest, a : rest)
main :: IO ()
main = testSortingAlgorithms
testSortingAlgorithms :: IO ()
testSortingAlgorithms = do
!list' <- list (10000)
results <- mapM (timeIt list') sorts
let results' = sortOn fst results
mapM_ (\(diff, msg) -> printf (msg) (diff::Double)) results'
return ()
sorts :: Ord a => [(String, [a] -> [a])]
sorts = [
("Selection sort", selectionSort)
, ("Merge sort (Data.List)", sort)
]
list :: Int -> IO [Int]
list n = sequence $ replicate n $ randomRIO (-127,127::Int)
timeIt :: (Ord a, Show a)
=> [a] -> (String, [a] -> [a]) -> IO (Double, [Char])
timeIt vals (name, sorter) = do
start <- getCPUTime
--v <- sorter vals `seq` return ()
let !v = sorter vals
--putStrLn $ show v
end <- getCPUTime
let (diff, ext) = unit $ (fromIntegral (end - start)) / (10^3)
let msg = if correct v
then (" ✓ in %0.2f" ++ ext ++ ": " ++ name ++ "\n")
else (" ✗ in %0.2f" ++ ext ++ ": " ++ name ++ "\n")
return (diff, msg)
correct :: (Ord a) => [a] -> Bool
correct [] = True
correct (a:[]) = True
correct (a1:a2:as) = a1 <= a2 && correct (a2:as)
unit :: Double -> (Double, String)
unit v | v < 10^3 = (v, "ns")
| v < 10^6 = (v / 10^3, "µs")
| v < 10^9 = (v / 10^6, "ms")
| otherwise = (v / 10^9, "s")
You write
let !v = sorter vals
which is "strict", but only to WHNF. So you are only timing how long it takes to find the smallest element of the list, not how long it takes to sort the whole thing. Selection sort starts by doing exactly that, so it is "optimal" for this incorrect benchmark, while mergesort does a bunch more work that's "wasted" if you only look at the first element.

How to change Ord instance for tuple of type (String, Int) without declaring a new data?

I'm trying to sort a list of type [(String, Int)]. By default, it is sorting by Strings and then Ints (if the Strings are equal). I want it to be the opposite — first, compare Ints and then if equal compare the Strings. Additionally, I don't want to switch to [(Int, String)].
I found a way to do so by defining an instance, but it only works for my own data type, which I don't want to use.
You can sort with sortBy :: (a -> a -> Ordering) -> [a] -> [a]:
import Data.List(sortBy)
import Data.Ord(comparing)
import Data.Tuple(swap)
orderSwap :: (Ord a, Ord b) => [(a, b)] -> [(a, b)]
orderSwap = sortBy (comparing swap)
or with sortOn :: Ord b => (a -> b) -> [a] -> [a]:
import Data.List(sortOn)
import Data.Ord(comparing)
import Data.Tuple(swap)
orderSwap :: (Ord a, Ord b) => [(a, b)] -> [(a, b)]
orderSwap = sortOn swap
Or we can just perform two swaps and sort the intermediate result:
import Data.Tuple(swap)
orderSwap :: (Ord a, Ord b) => [(a, b)] -> [(a, b)]
orderSwap = map swap . sort . map swap
This is of course not the "standard ordering". If you want to define an inherent order differently than one that is derived by the instances already defined, you should define your own type.
For example with:
newtype MyType = MyType (String, Int) deriving Eq
instance Ord MyType where
compare (MyType a) (MyType b) = comparing swap a b
The newtype is the usual way to define a new instance for an existing type.
Although you will still need to have a constructor for a newtype type there is no computational overhead. The compiler will remove the newtype wrapper opposed to defining a type with data.

composing two comparison functions?

I'd like to sort by one property and then by another (if the first property is the same.)
What's the idiomatic way in Haskell of composing two comparison functions, i.e. a function used with sortBy?
Given
f :: Ord a => a -> a -> Ordering
g :: Ord a => a -> a -> Ordering
composing f and g would yield:
h x y = case v of
EQ -> g x y
otherwise -> v
where v = f x y
vitus points out the very cool instance of Monoid for Ordering. If you combine it with the instance instance Monoid b => Monoid (a -> b) it turns out your composition function is just (get ready):
mappend
Check it out:
Prelude Data.Monoid> let f a b = EQ
Prelude Data.Monoid> let g a b = LT
Prelude Data.Monoid> :t f `mappend` g
f `mappend` g :: t -> t1 -> Ordering
Prelude Data.Monoid> (f `mappend` g) undefined undefined
LT
Prelude Data.Monoid> let f a b = GT
Prelude Data.Monoid> (f `mappend` g) undefined undefined
GT
+1 for powerful and simple abstractions
You can use the <> operator. In this example bigSort sorts string by their numerical value, first comparing length and then comparing lexicographically.
import Data.List (sortBy)
import Data.Ord (compare, comparing)
bigSort :: [String] -> [String]
bigSort = sortBy $ (comparing length) <> compare
Example:
bigSort ["31415926535897932384626433832795","1","3","10","3","5"] =
["1","3","3","5","10","31415926535897932384626433832795"]
<> is an alias of mappend from the Data.Monoid module (see jberryman answer).
The (free) book Learn You a Haskell for Great Good! explains how it works here in Chapter 11
instance Monoid Ordering where
mempty = EQ
LT `mappend` _ = LT
EQ `mappend` y = y
GT `mappend` _ = GT
The instance is set up like this: when we mappend two Ordering values, the one on the left is kept, unless the value on the left is EQ, in which case the right one is the result. The identity is EQ.

First non-repeating char in a string ? in haskell or F#

Given a sequence of char what is the most efficient way to find the first non repeating char
Interested purely functional implementation haskell or F# preffered.
A fairly straightforward use of Data.Set in combination with filter will do the job in an efficient one-liner. Since this seems homeworkish, I'm declining to provide the precise line in question :-)
The complexity should, I think, be O(n log m) where m is the number of distinct characters in the string and n is the total number of characters in the string.
A simple F# solution:
let f (s: string) =
let n = Map(Seq.countBy id s)
Seq.find (fun c -> n.[c] = 1) s
Here's an F# solution in O(n log n): sort the array, then for each character in the original array, binary search for it in the sorted array: if it's the only one of its kind, that's it.
open System
open System.IO
open System.Collections.Generic
let Solve (str : string) =
let arrStr = str.ToCharArray()
let sorted = Array.sort arrStr
let len = str.Length - 1
let rec Inner i =
if i = len + 1 then
'-'
else
let index = Array.BinarySearch(sorted, arrStr.[i])
if index = 0 && sorted.[index+1] <> sorted.[index] then
arrStr.[i]
elif index = len && sorted.[index-1] <> sorted.[index] then
arrStr.[i]
elif index > 0 && index < len &&
sorted.[index+1] <> sorted.[index] &&
sorted.[index-1] <> sorted.[index] then
arrStr.[i]
else
Inner (i + 1)
Inner 0
let _ =
printfn "%c" (Solve "abcdefabcf")
A - means all characters are repeated.
Edit: ugly hack with using the - for "no solution" as you can use Options, which I keep forgetting about! An exercise for the reader, as this does look like homework.
Here's a bit longish solution, but guaranteed to be worst-case O(n log n):
import List
import Data.Ord.comparing
sortPairs :: Ord a => [(a, b)]->[(a, b)]
sortPairs = sortBy (comparing fst)
index :: Integral b => [a] -> [(a, b)]
index = flip zip [1..]
dropRepeated :: Eq a => [(a, b)]->[(a, b)]
dropRepeated [] = []
dropRepeated [x] = [x]
dropRepeated (x:xs) | fst x == fst (head xs) =
dropRepeated $ dropWhile ((==(fst x)).fst) xs
| otherwise =
x:(dropRepeated xs)
nonRepeatedPairs :: Ord a => Integral b => [a]->[(a, b)]
nonRepeatedPairs = dropRepeated . sortPairs . index
firstNonRepeating :: Ord a => [a]->a
firstNonRepeating = fst . minimumBy (comparing snd) . nonRepeatedPairs
The idea is: sort the string lexicographically, so that it's easy to remove any repeated characters in linear time and find the first character which is not repeated. But in order to find it, we need to save information about characters' positions in text.
The speed on easy cases (like [1..10000]) is not perfect, but for something harder ([1..10000] ++ [1..10000] ++ [10001]) you can see the difference between this and a naive O(n^2).
Of course this can be done in linear time, if the size of alphabet is O(1), but who knows how large the alphabet is...
An alternate Haskell O(n log n) solution using Data.Map and no sorting:
module NonRepeat (
firstNonRepeat
)
where
import Data.List (minimumBy)
import Data.Map (fromListWith, toList)
import Data.Ord (comparing)
data Occurance = Occ { first :: Int, count :: Int }
deriving (Eq, Ord)
note :: Int -> a -> (a, Occurance)
note pos a = (a, Occ pos 1)
combine :: Occurance -> Occurance -> Occurance
combine (Occ p0 c0) (Occ p1 c1) = Occ (p0 `min` p1) (c0 + c1)
firstNonRepeat :: (Ord a) => [a] -> Maybe a
firstNonRepeat = fmap fst . findMinimum . occurances
where occurances = toList . fromListWith combine . zipWith note [0..]
findMinimum = safeMinimum . filter ((== 1).count.snd)
safeMinimum [] = Nothing
safeMinimum xs = Just $ minimumBy (comparing snd) xs
let firstNonRepeating (str:string) =
let rec inner i cMap =
if i = str.Length then
cMap
|> Map.filter (fun c (count, index) -> count = 1)
|> Map.toSeq
|> Seq.minBy (fun (c, (count, index)) -> index)
|> fst
else
let c = str.[i]
let value = if cMap.ContainsKey c then
let (count, index) = cMap.[c]
(count + 1, index)
else
(1, i)
let cMap = cMap.Add(c, value)
inner (i + 1) cMap
inner 0 (Map.empty)
Here is a simpler version that sacrifices speed.
let firstNonRepeating (str:string) =
let (c, count) = str
|> Seq.countBy (fun c -> c)
|> Seq.minBy (fun (c, count) -> count)
if count = 1 then Some c else None
How about something like this:
let firstNonRepeat s =
let repeats =
((Set.empty, Set.empty), s)
||> Seq.fold (fun (one,many) c -> Set.add c one, if Set.contains c one then Set.add c many else many)
|> snd
s
|> Seq.tryFind (fun c -> not (Set.contains c repeats))
This is pure C# (so I assume there's a similar F# version), which will be efficient if GroupBy is efficient (which it ought to be):
static char FstNonRepeatedChar(string s)
{
return s.GroupBy(x => x).Where(xs => xs.Count() == 1).First().First();
}

Resources