Haskell: shared letters between two words - algorithm

I just started learning Haskell. I am trying to get the list of all common letters between two words, for example, for "hello" and "llama" that would be [ 'l', 'l' ], for "happy" and "pay", [ 'a', 'p', 'y' ].
I tried using intersect but I have trouble with duplicates, "happy" and "pay" result in [ 'a', 'p', 'p', 'y' ]. I can't just remove duplicates cause they can exist, as in the first example.
I would be grateful for any suggestions. Thanks!

You can use the multiset package:
Data.MultiSet> fromList "hello" `intersection` fromList "llama"
fromOccurList [('l',2)]
Data.MultiSet> fromList "happy" `intersection` fromList "pay"
fromOccurList [('a',1),('p',1),('y',1)]
The data-ordlist package also offers this functionality:
Data.List Data.List.Ordered> sort "hello" `isect` sort "llama"
"ll"
Data.List Data.List.Ordered> sort "happy" `isect` sort "pay"
"apy"

Here's a nice technique that's worth learning. Suppose you have two sorted lists:
[1,1,5,10,15,15,18]
[2,5,8,10,15,20]
and you want to merge them together into a single sorted list. In Haskell, there's a very elegant way to write this algorithm using pattern matching and guards:
merge (x:xs) (y:ys) | x < y = x : merge xs (y:ys)
| otherwise = y : merge (x:xs) ys
merge xs [] = xs
merge [] ys = ys
so that:
> merge [1,1,5,10,15,15,18] [2,5,8,10,15,20]
[1,1,2,5,5,8,10,10,15,15,15,18,20]
>
In a nutshell, when both lists are non-empty, it compares the heads of both lists and outputs the smallest head; then it uses recursion to output "the rest".
It could also have been written with the three cases (less, greater, and equal) all made explicit:
merge (x:xs) (y:ys) | x < y = x : merge xs (y:ys)
| x > y = y : merge (x:xs) ys
| otherwise = y : merge (x:xs) ys
merge xs [] = xs
merge [] ys = ys
and this general template can be used to implement a number of interesting algorithms on sorted lists. Here's one that removes common elements, for example:
uncommon (x:xs) (y:ys) | x < y = x : uncommon xs (y:ys)
| x > y = y : uncommon (x:xs) ys
| otherwise = uncommon xs ys
uncommon xs [] = xs
uncommon [] ys = ys
so that:
> uncommon [1,1,5,10,15,15,18] [2,5,8,10,15,20]
[1,1,2,8,15,18,20]
>
You might want to try modifying the uncommon function to create a diff function that outputs the result of removing the elements of the second list from the first. It will require modifying one of the first three guarded cases, and you'll also need to adjust one of the two "empty list" pattern matches:
> diff [1,1,5,10,15,15,18] [2,5,8,10,15,20]
[1,1,15,18]
>
Once you've figured this out, you'll find it easy to create a common function that outputs the shared elements of the two sorted lists to give:
> common [1,1,5,10,15,15,18] [2,5,8,10,15,20]
[5,10,15]
>
Since strings are just lists of characters, this would work for your problem, too, using sort from Data.List to pre-sort the lists:
> import Data.List
> common (sort "hello") (sort "llama")
"ll"
> common (sort "happy") (sort "pay")
"apy"
>

I think this is an ideal case to use Data.Map. I would implement this as follows;
import qualified Data.Map.Lazy as M
sharedLetters :: String -> String -> String
sharedLetters s1 s2 = let cm = foldr (checkMap (\(x,y) -> (x,y+1))) charMap s2
where checkMap f c m = if M.member c m then M.adjust f c m
else M.insert c (f (0,0)) m
charMap = foldr (checkMap (\(x,y) -> (x+1,y))) M.empty s1
in M.foldlWithKey (\r k (v1,v2) -> r ++ replicate (minimum [v1,v2]) k) "" cm
main :: IO String
main = do
putStr "Enter first string :"
s1 <- getLine
putStr "Enter second string :"
s2 <- getLine
return $ sharedLetters s1 s2
Enter first string :happy
Enter second string :pay
"apy"
Enter first string :pay
Enter second string :happy
"apy"
Enter first string :hello
Enter second string :llama
"ll"
Enter first string :llama
Enter second string :hello
"ll"

How about exploiting the fact that every letter shared between the words (allowing duplicates) shows up as a pair of that letter in the set formed from the union of those words? You can find such pairs efficiently by sorting the union set and picking out duplicates -
let find_dups ([]) = []; find_dups (x:y:xs) | x == y = x:find_dups(xs); find_dups (x:xs) = find_dups(xs)
let common_letters word1 word2 = find_dups (sort (word1 ++ word2))
> common_letters "hello" "fellows"
"ello"

Related

a haskell function to test if an integer appears after another integer

I'm writing a function called after which takes a list of integers and two integers as parameters. after list num1 num2 should return True if num1 occurs in the list and num2 occurs in list afternum1. (Not necessarily immediately after).
after::[Int]->Int->Int->Bool
after [] _ _=False
after [x:xs] b c
|x==b && c `elem` xs =True
|x/=b && b `elem` xs && b `elem` xs=True
This is what I have so far,my biggest problem is that I don't know how to force num2 to be after num1.
There's a few different ways to approach this one; while it's tempting to go straight for recursion on this, it's nice to
avoid using recursion explicitly if there's another option.
Here's a simple version using some list utilities. Note that it's a Haskell idiom that the object we're operating over is usually the last argument. In this case switching the arguments lets us write it as a pipeline with it's third argument (the list) passed implicitly:
after :: Int -> Int -> [Int] -> Bool
after a b = elem b . dropWhile (/= a)
Hopefully this is pretty easy to understand; we drop elements of the list until we hit an a, assuming we find one we check if there's a b in the remaining list. If there was no a, this list is [] and obviously there's no b there, so it returns False as expected.
You haven't specified what happens if 'a' and 'b' are equal, so I'll leave it up to you to adapt it for that case. HINT: add a tail somewhere ;)
Here are a couple of other approaches if you're interested:
This is pretty easily handled using a fold;
We have three states to model. Either we're looking for the first elem, or
we're looking for the second elem, or we've found them (in the right order).
data State =
FindA | FindB | Found
deriving Eq
Then we can 'fold' (aka reduce) the list down to the result of whether it matches or not.
after :: Int -> Int -> [Int] -> Bool
after a b xs = foldl go FindA xs == Found
where
go FindA x = if x == a then FindB else FindA
go FindB x = if x == b then Found else FindB
go Found _ = Found
You can also do it recursively if you like:
after :: Int -> Int -> [Int] -> Bool
after _ _ [] = False
after a b (x:xs)
| x == a = b `elem` xs
| otherwise = after a b xs
Cheers!
You can split it into two parts: the first one will find the first occurrence of num1. After that, you just need to drop all elements before it and just check that num2 is in the remaining part of the list.
There's a standard function elemIndex for the first part. The second one is just elem.
import Data.List (elemIndex)
after xs x y =
case x `elemIndex` xs of
Just i -> y `elem` (drop (i + 1) xs)
Nothing -> False
If you'd like to implement it without elem or elemIndex, you could include a subroutine. Something like:
after xs b c = go xs False
where go (x:xs) bFound
| x == b && not (null xs) = go xs True
| bFound && x == c = True
| null xs = False
| otherwise = go xs bFound

Sort a String list by String length

I want to sort a list of String first by the length of the strings, and if the length is the same then it should sort lexically. I thought I could use the Data.List library and write my own compare function that does that. So the compare function should take a list of String as the argument and compare all the the elements (which are Strings). A compare function for Strings would look like this
comp a b
| length a > length b = GT
| length a < length b = LT
How could I address all the list elements with such a function?
First of all, your cmp function does not handle the case where the lengths are equal: you need to add that. Otherwise you'll get an runtime pattern match error:
comp a b
| length a > length b = GT
| length a < length b = LT
| otherwise = undefined -- TODO
also, note that this implementation sometimes computes the length twice, but it's likely that GHC optimizes this one away on its own, and we'll get to solving this later on more fundamentally anyway.
Then, once you've fixed your comp, all you need to do is pass it to Data.List.sortBy together with the list of strings you want to sort. An ipmplementation like that is provided below (<$> is the operator alias of fmap which works the same as map does on lists).
However, there's a better solution where you first compute the length of all elements in the list, by mapping each of the elements into a pair where the first member is the original string and the second one is its length. You then use a modified comp function that takes 2 pairs instead of just 2 strings, but otherwise behaves the same as your original comp. However, you then need to map the intermediate list back to just containing the strings (which is what the fst <$> is for, which is equivalent to map fst but, again, uses the, IMO nicer looking, <$> opetator).
So the somewhat naive solution would be:
sortByLenOrLex :: [String] -> [String]
sortByLenOrLex as = sortBy cmp as where
cmp a b | n > m = GT
| n < m = LT
| otherwise = compare a b
where n = length a
m = length b
and the more efficient one, as leftaroundabout points out, would be:
sortByLenOrLex' :: [String] -> [String]
sortByLenOrLex' as = fst <$> sortBy cmp (addLen <$> as) where
cmp (a,n) (b,m) | n > m = GT
| n < m = LT
| otherwise = compare a b
addLen x = (x, length x)
where the list is first amended with the lengths of each of its elements, so as to avoid duplicate, expensive length calls.
EDIT: please see chi's answer for a much nicer implementation of this algorithm!
Furthermore:
You can make your functions generic by making them operate on lists of lists of Ord:
sortByLenOrLex'' :: Ord a => [[a]] -> [[a]]
sortByLenOrLex'' as = fst <$> sortBy cmp (addLen <$> as) where
cmp (a,n) (b,m) | n > m = GT
| n < m = LT
| otherwise = compare a b
addLen x = (x, length x)
this gives you:
*Main> sortByLenOrLex'' [[1,2], [1,3], [1,2,3]]
[[1,2],[1,3],[1,2,3]]
...and if you want to make it as generic as possible, you can sort lists of Foldable of Ord:
sortByLenOrLex''' :: (Foldable f, Ord a) => [f a] -> [f a]
sortByLenOrLex''' as = unamend <$> sortBy cmp (amend <$> as) where
cmp (a,n,a') (b,m,b') | n > m = GT
| n < m = LT
| otherwise = compare a' b'
amend x = (x, length x, toList x)
unamend (x,_,_) = x
this gives you:
*Main> sortByLenOrLex''' [Just 3, Just 4, Just 3, Nothing]
[Nothing,Just 3,Just 3,Just 4]
*Main> sortByLenOrLex''' [(4,1),(1,1),(1,2),(1,1),(3,1)]
[(4,1),(1,1),(1,1),(3,1),(1,2)]
*Main> sortByLenOrLex''' [Left "bla", Right "foo", Right "foo", Right "baz"]
[Left "bla",Right "baz",Right "foo",Right "foo"]
*Main> sortByLenOrLex''' [(3,"hello"),(2,"goodbye"),(1,"hello")]
[(2,"goodbye"),(3,"hello"),(1,"hello")]
A variant of #Erik's solution, using some combinators from the library:
import Data.List
import Control.Arrow
sortByLen = map snd . sort . map (length &&& id)
This is essentially a Schwartzian transform.

Writing infinite list to skip every factor of p?

How can I efficiently represent the list [0..] \\ [t+0*p, t+1*p ..]?
I have defined:
Prelude> let factors p t = [t+0*p, t+1*p ..]
I want to efficiently represent an infinite list that is the difference of [0..] and factors p t, but using \\ from Data.List requires too much memory for even medium-sized lists:
Prelude Data.List> [0..10000] \\ (factors 5 0)
<interactive>: out of memory
I know that I can represent the values between t+0*p and t+1*p with:
Prelude> let innerList p1 p2 t = [t+p1+1, t+p1+2 .. t+p2-1]
Prelude> innerList 0 5 0
[1,2,3,4]
However, repeatedly calculating and concatenating innerList for increasing intervals seems clumsy.
Can I efficiently represent [0..] \\ (factors p t) without calculating rem or mod for each element?
For the infinite list [0..] \\ [t,t+p..],
yourlist t p = [0..t-1] ++ [i | m <- [0,p..], i <- [t+m+1..t+m+p-1]]
Of course this approach doesn't scale, at all, if you'd want to remove some other factors, like
[0..] \\ [t,t+p..] \\ [s,s+q..] \\ ...
in which case you'll have to remove them in sequence with minus, mentioned in Daniel Fischer's answer. There is no magic bullet here.
But there's also a union, with which the above becomes
[0..] \\ ( [t,t+p..] `union` [s,s+q..] `union` ... )
the advantage is, we can arrange the unions in a tree, and get algorithmic improvement.
You can't use (\\) for that, because
(\\) :: (Eq a) => [a] -> [a] -> [a]
(\\) = foldl (flip delete)
the list of elements you want to remove is infinite, and a left fold never terminates when the list it folds over is infinite.
If you rather want to use something already written than write it yourself, you can use minus from the data-ordlist package.
The performance should be adequate.
Otherwise,
minus :: Ord a => [a] -> [a] -> [a]
minus xxs#(x:xs) yys#(y:ys)
| x < y = x : minus xs yys
| x == y = minus xs ys
| otherwise = minus xss ys
minus xs _ = xs
You can use a list comprehesion with a predicate, using rem:
>>> let t = 0
>>> let p = 5
>>> take 40 $ [ x | x <- [1..], x `rem` p /= t ]
[1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19,21,22,23,24,26,27,28,29,31,32,33,34,36,37,38,39,41,42,43,44,46,47,48,49]
If you want efficiency, why does your solution have to use list comprehension syntax?
Why not something like this?
gen' n i p | i == p = gen' (n + p) 1 p
gen' n i p = (n+i) : gen' n (i+1) p
gen = gen' 0 1
and then do
gen 5
Because you have ascending lists, you can simply lazily merge them:
nums = [1..]
nogos = factors p t
result = merge nums (dropWhile (<head nums) nogos) where
merge (a:as) (b:bs)
| a < b = a : merge as (b:bs)
| a == b = merge as bs
| otherwise = error "should not happen"
Writing this in a general way so that we have a function that builds the difference of two infinite lists, provided only that they are in ascending order, is left as exercise. In the end, the following should be possible
[1..] `infiniteDifference` primes `infiniteDifference` squares
For this, make it a left associative operator.

Removing elements in a functional style

I have been struggling with something that looks like a simple algorithm, but can't find a clean way to express it in a functional style so far. Here is an outline of the problem: suppose I have 2 arrays X and Y,
X = [| 1; 2; 2; 3; 3 |]
Y = [| 5; 4; 4; 3; 2; 2 |]
What I want is to retrieve the elements that match, and the unmatched elements, like:
matched = [| 2; 2; 3 |]
unmatched = [| 1; 3 |], [| 4; 4; 5 |]
In pseudo-code, this is how I would think of approaching the problem:
let rec match matches x y =
let m = find first match from x in y
if no match, (matches, x, y)
else
let x' = remove m from x
let y' = remove m from y
let matches' = add m to matches
match matches' x' y'
The problem I run into is the "remove m from x" part - I can't find a clean way to do this (I have working code, but it's ugly as hell). Is there a nice, idiomatic functional way to approach that problem, either the removal part, or a different way to write the algorithm itself?
This could be solved easily using the right data structures, but in case you wanted to do it manually, here's how I would do it in Haskell. I don't know F# well enough to translate this, but I hope it is similar enough. So, here goes, in (semi-)literate Haskell.
overlap xs ys =
I start by sorting the two sequences to get away from the problem of having to know about previous values.
go (sort xs) (sort ys)
where
The two base cases for the recursion are easy enough to handle -- if either list is empty, the result includes the other list in the list of elements that are not overlapping.
go xs [] = ([], (xs, []))
go [] ys = ([], ([], ys))
I then inspect the first elements in each list. If they match, I can be sure that the lists overlap on that element, so I add that to the included elements, and I let the excluded elements be. I continue the search for the rest of the list by recursing on the tails of the lists.
go (x:xs) (y:ys)
| x == y = let ( included, excluded) = go xs ys
in (x:included, excluded)
Then comes the interesting part! What I essentially want to know is if the first element of one of the lists does not exist in the second list – in that case I should add it to the excluded lists and then continue the search.
| x < y = let (included, ( xex, yex)) = go xs (y:ys)
in (included, (x:xex, yex))
| y < x = let (included, ( xex, yex)) = go (x:xs) ys
in (included, ( xex, y:yex))
And this is actually it. It seems to work for at least the example you gave.
> let (matched, unmatched) = overlap x y
> matched
[2,2,3]
> unmatched
([1,3],[4,4,5])
It seems that you're describing multiset (bag) and its operations.
If you use the appropriate data structures, operations are very easy to implement:
// Assume that X, Y are initialized bags
let matches = X.IntersectWith(Y)
let x = X.Difference(Y)
let y = Y.Difference(X)
There's no built-in Bag collection in .NET framework. You could use Power Collection library including Bag class where the above function signature is taken.
UPDATE:
You can represent a bag by a weakly ascending list. Here is an improved version of #kqr's answer in F# syntax:
let overlap xs ys =
let rec loop (matches, ins, outs) xs ys =
match xs, ys with
// found a match
| x::xs', y::ys' when x = y -> loop (x::matches, ins, outs) xs' ys'
// `x` is smaller than every element in `ys`, put `x` into `ins`
| x::xs', y::ys' when x < y -> loop (matches, x::ins, outs) xs' ys
// `y` is smaller than every element in `xs`, put `y` into `outs`
| x::xs', y::ys' -> loop (matches, ins, y::outs) xs ys'
// copy remaining elements in `xs` to `ins`
| x::xs', [] -> loop (matches, x::ins, outs) xs' ys
// copy remaining elements in `ys` to `outs`
| [], y::ys' -> loop (matches, ins, y::outs) xs ys'
| [], [] -> (List.rev matches, List.rev ins, List.rev outs)
loop ([], [], []) (List.sort xs) (List.sort ys)
After two calls to List.sort, which are probably O(nlogn), finding matches is linear to the sum of the lengths of two lists.
If you need a quick-and-dirty bag module, I would suggest a module signature like this:
type Bag<'T> = Bag of 'T list
module Bag =
val count : 'T -> Bag<'T> -> int
val insert : 'T -> Bag<'T> -> Bag<'T>
val intersect : Bag<'T> -> Bag<'T> -> Bag<'T>
val union : Bag<'T> -> Bag<'T> -> Bag<'T>
val difference : Bag<'T> -> Bag<'T> -> Bag<'T>

Most efficient way to create the Data.Set of all pairs of elements in a Set?

Given an arbitrary set holding an arbitrary number of elements of arbitrary type, e.g.
mySet1 = Set.fromList [1,2,3,4]
or
mySet2 = Set.fromList ["a","b","c","d"]
or
mySet3 = Set.fromList [A, B, C, D]
for some data constructors A, B, C, D, ...
What is the computationally most efficient way to generate the set of all unordered pairs of elements is the given set? I.e.
setPairs mySet1 == Set.fromList [(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]
or
setPairs mySet2 == fromList [ ("a","b")
, ("a","c")
, ("a","d")
, ("b","c")
, ("b","d")
, ("c","d") ]
or
setPairs mySet2 == fromList [ (A,B)
, (A,C)
, (A,D)
, (B,C)
, (B,D)
, (C,D) ]
My initial naive guess would be:
setPairs s = fst $ Set.fold
(\e (pairAcc, elementsLeft) ->
( Set.fold
(\e2 pairAcc2 ->
Set.insert (e2, e) pairAcc2
) pairAcc $ Set.delete e elementsLeft
, Set.delete e elementsLeft )
) (Set.empty, s) s
but surely that cannot be the best solution?
Benchmarking might prove me wrong, but my suspicion is that there's no win in staying in the set representation. You're going to need O(n^2) regardless, because that's the size of the output. The key advantage would be producing your list such that you could use a call to S.fromDistinctAscList such that it only costs O(n) to build the set itself.
The following is pretty clean, preserves a fair amount of sharing, and is generally the simplest, most straightforward and intuitive solution I can imagine.
pairs s = S.fromDistinctAscList . concat $ zipWith zip (map (cycle . take 1) ts) (drop 1 ts)
where ts = tails $ S.toList s
Edit
Shorter/clearer (not sure performancewise, but probably as good/better):
pairs s = S.fromDistinctAscList [(x,y) | (x:xt) <- tails (S.toList s), y <- xt]
At first, you need to generate all sets. replicateM from Control.Monad helps with it.
λ> replicateM 2 [1..4]
[[1,1],[1,2],[1,3],[1,4],[2,1],[2,2],[2,3],[2,4],[3,1],[3,2],[3,3],[3,4],[4,1],[4,2],[4,3],[4,4]]
Then you need to filter pairs, where second element is greater than first
λ> filter (\[x,y] -> x < y) $ replicateM 2 [1 .. 4]
[[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]
Finally, you need to convert every list in a tuple
λ> map (\[x,y] -> (x,y)) $ filter (\[x,y] -> x < y) $ replicateM 2 [1 .. 4]
[(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]
Then we can formulate it into function pairs:
import Data.Set
import Control.Monad
import Data.List
mySet = Data.Set.fromList [1,2,3,4]
--setOfPairs = Data.Set.fromList [(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]
setOfPairs = Data.Set.fromList $ pairs mySet
pairs :: Ord a => Set a -> [(a,a)]
pairs x = Data.List.map (\[x,y] -> (x,y)) $ Data.List.filter (\[x,y] -> x < y) $ replicateM 2 $ toList x
So, if I got you question right, you can use pairs mySet, where pairs generate the list of all unordered pairs of mySet.
Is it what you want?
UPD:
List comprehension could be more clear and fast technique to create such sublists, so here is another instance of pairs:
pairs :: Ord a => Set a -> [(a,a)]
pairs set = [(x,y) | let list = toList set, x <- list, y <- list, x < y]
So here is a first stab at a solution using conversion back and forth to a list. Again, I am not sure this is the fastest way to do this but I do know that iteration over sets it's not terribly efficient.
import Data.List
import qualified Data.Set as S
pairs :: S.Set String -> S.Set (String,String)
pairs s = S.fromList $ foldl' (\st e -> (zip l e) ++ st) [] ls
where (l:ls) = tails $ S.toList s
By folding zip over the tails, you get a nice and efficient way to create the set of unordered pairs. However, instinct encourages me that there may be a monadic filterM or foldM solution that's even more elegant. I will keep thinking.
[EDIT]
So here is what should be [but is not on account of the size of the powerset] a faster solution that does not require a toList.
import Data.List
import qualified Data.Set as S
import qualified Data.Foldable as F
pairs :: (Ord a) => S.Set a -> S.Set (a,a)
pairs s = S.fromList $ foldl two [] $ F.foldlM (\st e -> [[e]++st,st]) [] s
where two st (x:xa:[]) = (x,xa) : st
two st _ = st
Uses the power-set solution over monadic lists to build the powerset and then filter out the pairs. I can go into more detail if necessary.

Resources