Writing infinite list to skip every factor of p? - performance

How can I efficiently represent the list [0..] \\ [t+0*p, t+1*p ..]?
I have defined:
Prelude> let factors p t = [t+0*p, t+1*p ..]
I want to efficiently represent an infinite list that is the difference of [0..] and factors p t, but using \\ from Data.List requires too much memory for even medium-sized lists:
Prelude Data.List> [0..10000] \\ (factors 5 0)
<interactive>: out of memory
I know that I can represent the values between t+0*p and t+1*p with:
Prelude> let innerList p1 p2 t = [t+p1+1, t+p1+2 .. t+p2-1]
Prelude> innerList 0 5 0
[1,2,3,4]
However, repeatedly calculating and concatenating innerList for increasing intervals seems clumsy.
Can I efficiently represent [0..] \\ (factors p t) without calculating rem or mod for each element?

For the infinite list [0..] \\ [t,t+p..],
yourlist t p = [0..t-1] ++ [i | m <- [0,p..], i <- [t+m+1..t+m+p-1]]
Of course, this approach doesn't scale at all if you want to remove further progressions as well, like
[0..] \\ [t,t+p..] \\ [s,s+q..] \\ ...
in which case you'll have to remove them in sequence with minus, mentioned in Daniel Fischer's answer. There is no magic bullet here.
But there's also a union, with which the above becomes
[0..] \\ ( [t,t+p..] `union` [s,s+q..] `union` ... )
The advantage is that we can arrange the unions in a tree and get an algorithmic improvement.
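As a minimal sketch of the union idea for just two progressions (not the tree-shaped version), assuming both lists are ascending: minus and union below are hand-written stand-ins for the data-ordlist functions of the same name, and skipBoth is a hypothetical helper introduced only for illustration.

```haskell
-- Difference of two ascending lists; lazy, so it works on infinite lists.
minus :: Ord a => [a] -> [a] -> [a]
minus xxs@(x:xs) yys@(y:ys)
  | x < y     = x : minus xs yys
  | x == y    = minus xs ys
  | otherwise = minus xxs ys
minus xs _ = xs

-- Union of two ascending lists; an element present in both is emitted once.
union :: Ord a => [a] -> [a] -> [a]
union xxs@(x:xs) yys@(y:ys)
  | x < y     = x : union xs yys
  | x == y    = x : union xs ys
  | otherwise = y : union xxs ys
union xs [] = xs
union [] ys = ys

-- Hypothetical helper: every n >= 0 that is neither t + k*p nor s + k*q.
skipBoth :: Integer -> Integer -> Integer -> Integer -> [Integer]
skipBoth p t q s = [0 ..] `minus` ([t, t + p ..] `union` [s, s + q ..])
```

For example, take 10 (skipBoth 2 0 3 0) yields [1,5,7,11,13,17,19,23,25,29].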

You can't use (\\) for that, because
(\\) :: (Eq a) => [a] -> [a] -> [a]
(\\) = foldl (flip delete)
the list of elements you want to remove is infinite, and a left fold never terminates when the list it folds over is infinite.
If you would rather use something already written than write it yourself, you can use minus from the data-ordlist package.
The performance should be adequate.
Otherwise,
minus :: Ord a => [a] -> [a] -> [a]
minus xxs@(x:xs) yys@(y:ys)
  | x < y     = x : minus xs yys
  | x == y    = minus xs ys
  | otherwise = minus xxs ys
minus xs _ = xs

You can use a list comprehension with a predicate, using rem:
>>> let t = 0
>>> let p = 5
>>> take 40 $ [ x | x <- [1..], x `rem` p /= t ]
[1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19,21,22,23,24,26,27,28,29,31,32,33,34,36,37,38,39,41,42,43,44,46,47,48,49]

If you want efficiency, why does your solution have to use list comprehension syntax?
Why not something like this?
gen' n i p | i == p = gen' (n + p) 1 p
gen' n i p = (n+i) : gen' n (i+1) p
gen = gen' 0 1
and then do
gen 5

Because you have ascending lists, you can simply lazily merge them:
nums = [1..]
nogos = factors p t
result = merge nums (dropWhile (< head nums) nogos)
  where
    merge (a:as) (b:bs)
      | a < b     = a : merge as (b:bs)
      | a == b    = merge as bs
      | otherwise = error "should not happen"
Writing this in a general way, as a function that builds the difference of two infinite lists provided only that both are in ascending order, is left as an exercise. In the end, the following should be possible:
[1..] `infiniteDifference` primes `infiniteDifference` squares
For this, make it a left associative operator.
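One possible shape for that exercise, as a sketch rather than the answerer's own solution: the fixity declaration makes the left associativity explicit (backticked functions already default to infixl 9). The lists evens and squares and the name example are assumptions made up for the demonstration.

```haskell
-- Explicitly left-associative, so a `infiniteDifference` b `infiniteDifference` c
-- parses as (a `infiniteDifference` b) `infiniteDifference` c.
infixl 5 `infiniteDifference`

-- Difference of two ascending (possibly infinite) lists.
infiniteDifference :: Ord a => [a] -> [a] -> [a]
infiniteDifference xxs@(x:xs) yys@(y:ys)
  | x < y     = x : infiniteDifference xs yys
  | x == y    = infiniteDifference xs ys
  | otherwise = infiniteDifference xxs ys   -- y is not in the first list; skip it
infiniteDifference xs [] = xs
infiniteDifference [] _  = []

-- Illustrative ascending lists (assumptions, not from the answer).
evens :: [Integer]
evens = [2, 4 ..]

squares :: [Integer]
squares = [n * n | n <- [1 ..]]

example :: [Integer]
example = take 8 ([1 ..] `infiniteDifference` evens `infiniteDifference` squares)
```

Here example evaluates to [3,5,7,11,13,15,17,19].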

Related

Haskell: shared letters between two words

I just started learning Haskell. I am trying to get the list of all common letters between two words, for example, for "hello" and "llama" that would be [ 'l', 'l' ], for "happy" and "pay", [ 'a', 'p', 'y' ].
I tried using intersect but I have trouble with duplicates, "happy" and "pay" result in [ 'a', 'p', 'p', 'y' ]. I can't just remove duplicates cause they can exist, as in the first example.
I would be grateful for any suggestions. Thanks!
You can use the multiset package:
Data.MultiSet> fromList "hello" `intersection` fromList "llama"
fromOccurList [('l',2)]
Data.MultiSet> fromList "happy" `intersection` fromList "pay"
fromOccurList [('a',1),('p',1),('y',1)]
The data-ordlist package also offers this functionality:
Data.List Data.List.Ordered> sort "hello" `isect` sort "llama"
"ll"
Data.List Data.List.Ordered> sort "happy" `isect` sort "pay"
"apy"
Here's a nice technique that's worth learning. Suppose you have two sorted lists:
[1,1,5,10,15,15,18]
[2,5,8,10,15,20]
and you want to merge them together into a single sorted list. In Haskell, there's a very elegant way to write this algorithm using pattern matching and guards:
merge (x:xs) (y:ys) | x < y     = x : merge xs (y:ys)
                    | otherwise = y : merge (x:xs) ys
merge xs [] = xs
merge [] ys = ys
so that:
> merge [1,1,5,10,15,15,18] [2,5,8,10,15,20]
[1,1,2,5,5,8,10,10,15,15,15,18,20]
>
In a nutshell, when both lists are non-empty, it compares the heads of both lists and outputs the smallest head; then it uses recursion to output "the rest".
It could also have been written with the three cases (less, greater, and equal) all made explicit:
merge (x:xs) (y:ys) | x < y     = x : merge xs (y:ys)
                    | x > y     = y : merge (x:xs) ys
                    | otherwise = y : merge (x:xs) ys
merge xs [] = xs
merge [] ys = ys
and this general template can be used to implement a number of interesting algorithms on sorted lists. Here's one that removes common elements, for example:
uncommon (x:xs) (y:ys) | x < y     = x : uncommon xs (y:ys)
                       | x > y     = y : uncommon (x:xs) ys
                       | otherwise = uncommon xs ys
uncommon xs [] = xs
uncommon [] ys = ys
so that:
> uncommon [1,1,5,10,15,15,18] [2,5,8,10,15,20]
[1,1,2,8,15,18,20]
>
You might want to try modifying the uncommon function to create a diff function that outputs the result of removing the elements of the second list from the first. It will require modifying one of the first three guarded cases, and you'll also need to adjust one of the two "empty list" pattern matches:
> diff [1,1,5,10,15,15,18] [2,5,8,10,15,20]
[1,1,15,18]
>
Once you've figured this out, you'll find it easy to create a common function that outputs the shared elements of the two sorted lists to give:
> common [1,1,5,10,15,15,18] [2,5,8,10,15,20]
[5,10,15]
>
Since strings are just lists of characters, this would work for your problem, too, using sort from Data.List to pre-sort the lists:
> import Data.List
> common (sort "hello") (sort "llama")
"ll"
> common (sort "happy") (sort "pay")
"apy"
>
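For readers who want to check their attempt, here is one possible way to fill in diff and common following the same three-guard template. It is a sketch of the exercise, not necessarily what the answerer had in mind, and it assumes both inputs are sorted.

```haskell
-- Elements of the first sorted list with the elements of the second removed.
diff :: Ord a => [a] -> [a] -> [a]
diff (x:xs) (y:ys)
  | x < y     = x : diff xs (y:ys)   -- x cannot occur later in the second list
  | x > y     = diff (x:xs) ys       -- y is not in the first list; drop it
  | otherwise = diff xs ys           -- matched: remove one occurrence of each
diff xs [] = xs
diff [] _  = []

-- Elements shared by two sorted lists, respecting multiplicity.
common :: Ord a => [a] -> [a] -> [a]
common (x:xs) (y:ys)
  | x < y     = common xs (y:ys)
  | x > y     = common (x:xs) ys
  | otherwise = x : common xs ys
common _ _ = []
```

With these definitions, diff [1,1,5,10,15,15,18] [2,5,8,10,15,20] gives [1,1,15,18] and common on the same inputs gives [5,10,15].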
I think this is an ideal case to use Data.Map. I would implement this as follows:
import qualified Data.Map.Lazy as M
sharedLetters :: String -> String -> String
sharedLetters s1 s2 = let cm = foldr (checkMap (\(x,y) -> (x,y+1))) charMap s2
                            where checkMap f c m = if M.member c m then M.adjust f c m
                                                   else M.insert c (f (0,0)) m
                                  charMap = foldr (checkMap (\(x,y) -> (x+1,y))) M.empty s1
                      in M.foldlWithKey (\r k (v1,v2) -> r ++ replicate (minimum [v1,v2]) k) "" cm
main :: IO String
main = do
  putStr "Enter first string :"
  s1 <- getLine
  putStr "Enter second string :"
  s2 <- getLine
  return $ sharedLetters s1 s2
Enter first string :happy
Enter second string :pay
"apy"
Enter first string :pay
Enter second string :happy
"apy"
Enter first string :hello
Enter second string :llama
"ll"
Enter first string :llama
Enter second string :hello
"ll"
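The same counting idea can be written more compactly with two library functions from Data.Map (containers package): count the letters of each word with fromListWith, then keep the smaller count per shared letter with intersectionWith. This is a sketch of an alternative formulation; the names letterCounts and sharedLetters' are hypothetical.

```haskell
import qualified Data.Map.Strict as M

-- Number of occurrences of each character in a string.
letterCounts :: String -> M.Map Char Int
letterCounts s = M.fromListWith (+) [(c, 1) | c <- s]

-- Shared letters in alphabetical order, each repeated min(count1, count2) times.
sharedLetters' :: String -> String -> String
sharedLetters' s1 s2 =
  concat [replicate n c | (c, n) <- M.toAscList shared]
  where
    shared = M.intersectionWith min (letterCounts s1) (letterCounts s2)
```

With this, sharedLetters' "happy" "pay" gives "apy" and sharedLetters' "hello" "llama" gives "ll".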
How about exploiting the fact that a letter shared between the words shows up as an adjacent pair once the two words are concatenated and sorted? You can pick out such pairs in one pass over the sorted string. Note that this is only an approximation: a letter that occurs twice in one word and not at all in the other also forms a pair, so "hello" and "llama" would give "all" rather than "ll".
let find_dups ([]) = []; find_dups (x:y:xs) | x == y = x:find_dups(xs); find_dups (x:xs) = find_dups(xs)
let common_letters word1 word2 = find_dups (sort (word1 ++ word2))
> common_letters "hello" "fellows"
"ello"

Sort a String list by String length

I want to sort a list of String first by the length of the strings, and if the length is the same then it should sort lexically. I thought I could use the Data.List library and write my own compare function that does that. So the compare function should take a list of String as the argument and compare all the elements (which are Strings). A compare function for Strings would look like this:
comp a b
  | length a > length b = GT
  | length a < length b = LT
How could I address all the list elements with such a function?
First of all, your comp function does not handle the case where the lengths are equal: you need to add that. Otherwise you'll get a runtime pattern match error:
comp a b
  | length a > length b = GT
  | length a < length b = LT
  | otherwise = undefined -- TODO
Also, note that this implementation sometimes computes the length twice, but it's likely that GHC optimizes this one away on its own, and we'll get to solving this later on more fundamentally anyway.
Then, once you've fixed your comp, all you need to do is pass it to Data.List.sortBy together with the list of strings you want to sort. An implementation like that is provided below (<$> is the operator alias of fmap, which works the same as map does on lists).
However, there's a better solution where you first compute the length of all elements in the list, by mapping each of the elements into a pair where the first member is the original string and the second one is its length. You then use a modified comp function that takes 2 pairs instead of just 2 strings, but otherwise behaves the same as your original comp. Finally, you need to map the intermediate list back to just containing the strings (which is what the fst <$> is for; it is equivalent to map fst but, again, uses the, IMO nicer looking, <$> operator).
So the somewhat naive solution would be:
sortByLenOrLex :: [String] -> [String]
sortByLenOrLex as = sortBy cmp as where
  cmp a b | n > m     = GT
          | n < m     = LT
          | otherwise = compare a b
    where n = length a
          m = length b
and the more efficient one, as leftaroundabout points out, would be:
sortByLenOrLex' :: [String] -> [String]
sortByLenOrLex' as = fst <$> sortBy cmp (addLen <$> as) where
  cmp (a,n) (b,m) | n > m     = GT
                  | n < m     = LT
                  | otherwise = compare a b
  addLen x = (x, length x)
where the list is first amended with the lengths of each of its elements, so as to avoid duplicate, expensive length calls.
EDIT: please see chi's answer for a much nicer implementation of this algorithm!
Furthermore:
You can make your functions generic by making them operate on lists of lists of Ord:
sortByLenOrLex'' :: Ord a => [[a]] -> [[a]]
sortByLenOrLex'' as = fst <$> sortBy cmp (addLen <$> as) where
  cmp (a,n) (b,m) | n > m     = GT
                  | n < m     = LT
                  | otherwise = compare a b
  addLen x = (x, length x)
this gives you:
*Main> sortByLenOrLex'' [[1,2], [1,3], [1,2,3]]
[[1,2],[1,3],[1,2,3]]
...and if you want to make it as generic as possible, you can sort lists of Foldable of Ord:
sortByLenOrLex''' :: (Foldable f, Ord a) => [f a] -> [f a]
sortByLenOrLex''' as = unamend <$> sortBy cmp (amend <$> as) where
  cmp (a,n,a') (b,m,b') | n > m     = GT
                        | n < m     = LT
                        | otherwise = compare a' b'
  amend x = (x, length x, toList x)
  unamend (x,_,_) = x
this gives you:
*Main> sortByLenOrLex''' [Just 3, Just 4, Just 3, Nothing]
[Nothing,Just 3,Just 3,Just 4]
*Main> sortByLenOrLex''' [(4,1),(1,1),(1,2),(1,1),(3,1)]
[(4,1),(1,1),(1,1),(3,1),(1,2)]
*Main> sortByLenOrLex''' [Left "bla", Right "foo", Right "foo", Right "baz"]
[Left "bla",Right "baz",Right "foo",Right "foo"]
*Main> sortByLenOrLex''' [(3,"hello"),(2,"goodbye"),(1,"hello")]
[(2,"goodbye"),(3,"hello"),(1,"hello")]
A variant of @Erik's solution, using some combinators from the library:
import Data.List
import Control.Arrow
sortByLen = map snd . sort . map (length &&& id)
This is essentially a Schwartzian transform.
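Two other compact spellings are possible with only base. This is a sketch; the function names are hypothetical. The first combines two comparators through the Monoid instance of Ordering (the second comparator is consulted only when the first returns EQ), and the second uses Data.List.sortOn, which computes each key once per element.

```haskell
import Data.List (sortBy, sortOn)
import Data.Ord (comparing)

-- Compare by length first; fall back to lexicographic order on ties.
sortByLenThenLex :: [String] -> [String]
sortByLenThenLex = sortBy (comparing length <> compare)

-- Same ordering, expressed as a sort on the key (length, string).
sortByLenThenLex' :: [String] -> [String]
sortByLenThenLex' = sortOn (\s -> (length s, s))
```

Both turn ["pear","fig","apple","kiwi","date"] into ["fig","date","kiwi","pear","apple"].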

Recursion confusion in Haskell again - subsets with an inclusion test

I'm testing a simple program to generate subsets with an inclusion test. For example, given
*Main Data.List> factorsets 7
[([2],2),([2,3],1),([3],1),([5],1),([7],1)]
calling chooseP 3 (factorsets 7), I would like to get (read from right to left, a la cons)
[[([5],1),([3],1),([2],2)]
,[([7],1),([3],1),([2],2)]
,[([7],1),([5],1),([2],2)]
,[([7],1),([5],1),([2,3],1)]
,[([7],1),([5],1),([3],1)]]
But my program is returning an extra [([7],1),([5],1),([3],1)] (and missing a [([7],1),([5],1),([2],2)]):
[[([5],1),([3],1),([2],2)]
,[([7],1),([3],1),([2],2)]
,[([7],1),([5],1),([3],1)]
,[([7],1),([5],1),([2,3],1)]
,[([7],1),([5],1),([3],1)]]
The inclusion test is: members' first part of the tuple must have a null intersection.
Once tested as working, the plan is to sum the internal products of each subset's snds, rather than accumulate them.
Since I've asked a similar question before, I imagine that an extra branch is generated since when the recursion splits at [2,3], the second branch runs over the same possibilities once it passes the skipped section. Any pointers on how to resolve that would be appreciated; and if you'd like to share ideas about how to enumerate and sum such product combinations more efficiently, that would be great, too.
Haskell code:
chooseP k xs = chooseP' xs [] 0 where
  chooseP' [] product count = if count == k then [product] else []
  chooseP' yys product count
    | count == k = [product]
    | null yys   = []
    | otherwise  = f ++ g
    where (y:ys) = yys
          (factorsY,numY) = y
          f = let zzs = dropWhile (\(fs,ns) -> not . and . map (null . intersect fs . fst) $ product) yys
              in if null zzs
                   then chooseP' [] product count
                   else let (z:zs) = zzs in chooseP' zs (z:product) (count + 1)
          g = if and . map (null . intersect factorsY . fst) $ product
                then chooseP' ys product count
                else chooseP' ys [] 0
Your code is complicated enough that I might recommend starting over. Here's how I would proceed.
Write a specification. Let it be as stupidly inefficient as necessary -- for example, the spec I choose below will build all combinations of k elements from the list, then filter out the bad ones. Even the filter will be stupidly slow.
sorted xs = sort xs == xs
unique xs = nub xs == xs
disjoint xs = and $ liftM2 go xs xs where
  go x1 x2 = x1 == x2 || null (intersect x1 x2)
-- check that x is valid according to all the validation functions in fs
-- (there are other fun ways to spell this, but this is particularly
-- readable and clearly correct -- just what we want from a spec)
allFuns fs x = all ($x) fs
choosePSpec k = filter good . replicateM k where
  good pairs = allFuns [unique, disjoint, sorted] (map fst pairs)
Just to make sure it's right, we can test it at the prompt:
*Main> mapM_ print $ choosePSpec 3 [([2],2),([2,3],1),([3],1),([5],1),([7],1)]
[([2],2),([3],1),([5],1)]
[([2],2),([3],1),([7],1)]
[([2],2),([5],1),([7],1)]
[([2,3],1),([5],1),([7],1)]
[([3],1),([5],1),([7],1)]
Looks good.
Now that we have a spec, we can try to improve the speed one refactoring at a time, always checking that it matches the spec. The first thing I'd want to do is notice that we can ensure uniqueness and sortedness just by sorting the input and picking things "in an increasing way". To do this, we can define a function which chooses subsequences of a given length. It piggy-backs on the tails function, which you can think of as nondeterministically choosing a place to split its input list.
subseq 0 xs = [[]]
subseq n xs = do
  x':xt <- tails xs
  xs' <- subseq (n-1) xt
  return (x':xs')
Here's an example of this function in action:
*Main> subseq 3 [1..4]
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
Now we can write a slightly faster chooseP by replacing replicateM with subseq. Recall that we're assuming the inputs are already sorted and unique, though.
choosePSlow k = filter good . subseq k where
  good pairs = disjoint $ map fst pairs
We can sanity-check that it's working by running it on the particular input we have from above:
*Main> let i = [([2],2),([2,3],1),([3],1),([5],1),([7],1)]
*Main> choosePSlow 3 i == choosePSpec 3 i
True
Or, better yet, we can stress-test it with QuickCheck. We'll need a tiny bit more code. The condition k < 5 is just because the spec is so hopelessly slow that bigger values of k take forever.
propSlowMatchesSpec :: NonNegative Int -> OrderedList ([Int], Int) -> Property
propSlowMatchesSpec (NonNegative k) (Ordered xs)
  = k < 5 && unique (map fst xs)
  ==> choosePSlow k xs == choosePSpec k xs
*Main> quickCheck propSlowMatchesSpec
+++ OK, passed 100 tests.
There are several more opportunities to make things faster. For instance, the disjoint test could be sped up using subseq 2 instead of liftM2; or we might be able to ensure disjointness during element selection and prune the search even earlier; etc. How you want to improve it from here I leave to you -- but the basic technique (start with stupid and slow, then make it smarter, testing as you go) should be helpful to you.
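As one illustration of pruning during selection, here is a sketch under the same assumption as choosePSlow (input already sorted, with unique first components). The name choosePPruned is hypothetical. After picking an element, only the later elements whose factor lists are disjoint from it remain as candidates, so incompatible combinations are never built.

```haskell
import Data.List (intersect, tails)

-- Choose k elements in increasing position, keeping factor lists pairwise disjoint.
choosePPruned :: Eq a => Int -> [([a], b)] -> [[([a], b)]]
choosePPruned 0 _  = [[]]
choosePPruned k xs = do
  x : rest <- tails xs                 -- pick x; an empty tail fails the match and yields nothing
  let compatible = filter (null . intersect (fst x) . fst) rest
  chosen <- choosePPruned (k - 1) compatible
  return (x : chosen)
```

On the example input it produces the same five subsets, in the same order, as choosePSpec 3; checking that with the QuickCheck property above (with choosePPruned in place of choosePSlow) would be the natural next step.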

Two simple codes to generate divisors of a number. Why is the recursive one faster?

While solving a problem, I had to calculate the divisors of a number. I have two implementations that produce all divisors > 1 for a given number.
The first is using simple recursion:
divisors :: Int64 -> [Int64]
divisors k = divisors' 2 k
  where
    divisors' n k | n*n > k        = [k]
                  | n*n == k       = [n, k]
                  | k `mod` n == 0 = (n:(k `div` n):result)
                  | otherwise      = result
      where result = divisors' (n+1) k
The second one uses list processing functions from the Prelude:
divisors2 :: Int64 -> [Int64]
divisors2 k = k : (concatMap (\x -> [x, k `div` x]) $!
                   filter (\x -> k `mod` x == 0) $!
                   takeWhile (\x -> x*x <= k) [2..])
I find that the first implementation is faster (I printed the whole list returned, so that no part of the result remains unevaluated due to laziness). The two implementations produce differently ordered divisors, but that is not a problem for me. (In fact, if k is a perfect square, the square root is output twice in the second implementation - again not a problem).
In general are such recursive implementations faster in Haskell? Also, I would appreciate any pointers to make either of these codes faster. Thanks!
EDIT:
Here is the code I am using to compare these two implementations for performance: https://gist.github.com/3414372
Here are my timing measurements:
Using divisors2 with strict evaluation ($!):
$ ghc --make -O2 div.hs
[1 of 1] Compiling Main ( div.hs, div.o )
Linking div ...
$ time ./div > /tmp/out1
real 0m7.651s
user 0m7.604s
sys 0m0.012s
Using divisors2 with lazy evaluation ($):
$ ghc --make -O2 div.hs
[1 of 1] Compiling Main ( div.hs, div.o )
Linking div ...
$ time ./div > /tmp/out1
real 0m7.461s
user 0m7.444s
sys 0m0.012s
Using function divisors:
$ ghc --make -O2 div.hs
[1 of 1] Compiling Main ( div.hs, div.o )
Linking div ...
$ time ./div > /tmp/out1
real 0m7.058s
user 0m7.036s
sys 0m0.020s
The recursive version is not in general faster than the list-based version. This is because GHC employs list fusion optimizations when a computation follows a certain pattern. This means that list generators and "list transformers" might be fused into one big generator instead.
However, when you use $!, you basically tell the compiler: "Please produce the first cons of this list before performing the next step." This means that GHC is forced to compute at least one intermediate list element, which disables the fusion optimization.
So, the second algorithm is slower, because you produce intermediate lists that have to be constructed and destructed, while the recursive algorithm simply produces a single list straight away.
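For comparison, the list-based version can be written as a plain pipeline without $!. This is only a sketch: the name divisors3 is hypothetical, no timing claim is made for it, and it additionally emits the square root once rather than twice when k is a perfect square.

```haskell
import Data.Int (Int64)

-- All divisors > 1 of k, in no particular order, built from a lazy pipeline.
divisors3 :: Int64 -> [Int64]
divisors3 k = k : concatMap pair (filter divides (takeWhile small [2 ..]))
  where
    small x   = x * x <= k            -- only candidates up to sqrt k
    divides x = k `mod` x == 0
    pair x
      | x * x == k = [x]              -- perfect square: emit the root once
      | otherwise  = [x, k `div` x]
```

For example, divisors3 36 gives [36,2,18,3,12,4,9,6].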
Since you asked, to make it faster a different algorithm should be used. Simple and straightforward is to find a prime factorization first, then construct the divisors from it somehow.
Standard prime factorization by trial division is:
factorize :: Integral a => a -> [a]
factorize n = go n (2:[3,5..]) -- or: `go n primes`
  where
    go n ds@(d:t)
      | d*d > n   = [n]
      | r == 0    = d : go q ds
      | otherwise = go n t
      where (q,r) = quotRem n d
-- factorize 12348 ==> [2,2,3,3,7,7,7]
Equal prime factors can be grouped and counted:
import Data.List (group)
primePowers :: Integral a => a -> [(a, Int)]
primePowers n = [(head x, length x) | x <- group $ factorize n]
-- primePowers = map (head &&& length) . group . factorize
-- primePowers 12348 ==> [(2,2),(3,2),(7,3)]
Divisors are usually constructed, though out of order, with:
divisors :: Integral a => a -> [a]
divisors n = map product $ sequence
               [take (k+1) $ iterate (p*) 1 | (p,k) <- primePowers n]
Hence, we have
numDivisors :: Integral a => a -> Int
numDivisors n = product [ k+1 | (_,k) <- primePowers n]
The product here comes from the sequence in the definition above it. For the list monad (m ~ []), sequence :: Monad m => [m a] -> m [a] constructs the lists of all possible combinations of elements picked one from each member list: sequence_lists = foldr (\xs rs -> [x:r | x <- xs, r <- rs]) [[]]. Hence length . sequence_lists === product . map length, and of course length . take n === n for infinite argument lists.
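A small check of that counting argument for 12348 = 2^2 * 3^2 * 7^3, with the power lists written out by hand (the names below are illustrative only):

```haskell
import Data.List (sort)

-- One list of prime powers per prime factor of 12348.
powerLists :: [[Integer]]
powerLists = [[1, 2, 4], [1, 3, 9], [1, 7, 49, 343]]

-- Every way of picking one power from each list.
combos :: [[Integer]]
combos = sequence powerLists

-- length combos == product (map length powerLists) == 3 * 3 * 4 == 36
divisorsOf12348 :: [Integer]
divisorsOf12348 = sort (map product combos)
```

Here length combos is 36, and divisorsOf12348 starts at 1 and ends at 12348.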
In-order generation is possible, too:
ordDivisors :: Integral a => a -> [a]
ordDivisors n = foldr (\(p,k)-> foldi merge [] . take (k+1) . iterate (map (p*)))
                      [1] $ reverse $ primePowers n
foldi :: (a -> a -> a) -> a -> [a] -> a
foldi f z (x:xs) = f x (foldi f z (pairs xs)) where
  pairs (x:y:xs) = f x y:pairs xs
  pairs xs = xs
foldi f z [] = z
merge :: Ord a => [a] -> [a] -> [a]
merge (x:xs) (y:ys) = case (compare y x) of
  LT -> y : merge (x:xs) ys
  _  -> x : merge xs (y:ys)
merge xs [] = xs
merge [] ys = ys
{- ordDivisors 12348 ==>
[1,2,3,4,6,7,9,12,14,18,21,28,36,42,49,63,84,98,126,147,196,252,294,343,441,588,
686,882,1029,1372,1764,2058,3087,4116,6174,12348] -}
This definition is productive, too, i.e. it starts producing the divisors right away, without noticeable delay:
{- take 20 $ ordDivisors $ product $ concat $ replicate 5 $ take 11 primes
==> [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
(0.00 secs, 525068 bytes)
numDivisors $ product $ concat $ replicate 5 $ take 11 primes
==> 362797056 -}

Most efficient way to create the Data.Set of all pairs of elements in a Set?

Given an arbitrary set holding an arbitrary number of elements of arbitrary type, e.g.
mySet1 = Set.fromList [1,2,3,4]
or
mySet2 = Set.fromList ["a","b","c","d"]
or
mySet3 = Set.fromList [A, B, C, D]
for some data constructors A, B, C, D, ...
What is the computationally most efficient way to generate the set of all unordered pairs of elements in the given set? I.e.
setPairs mySet1 == Set.fromList [(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]
or
setPairs mySet2 == fromList [ ("a","b")
, ("a","c")
, ("a","d")
, ("b","c")
, ("b","d")
, ("c","d") ]
or
setPairs mySet3 == fromList [ (A,B)
, (A,C)
, (A,D)
, (B,C)
, (B,D)
, (C,D) ]
My initial naive guess would be:
setPairs s = fst $ Set.fold
  (\e (pairAcc, elementsLeft) ->
    ( Set.fold
        (\e2 pairAcc2 ->
          Set.insert (e2, e) pairAcc2
        ) pairAcc $ Set.delete e elementsLeft
    , Set.delete e elementsLeft )
  ) (Set.empty, s) s
but surely that cannot be the best solution?
Benchmarking might prove me wrong, but my suspicion is that there's no win in staying in the set representation. You're going to need O(n^2) regardless, because that's the size of the output. The key advantage would be producing your list such that you could use a call to S.fromDistinctAscList such that it only costs O(n) to build the set itself.
The following is pretty clean, preserves a fair amount of sharing, and is generally the simplest, most straightforward and intuitive solution I can imagine.
pairs s = S.fromDistinctAscList . concat $ zipWith zip (map (cycle . take 1) ts) (drop 1 ts)
  where ts = tails $ S.toList s
Edit
Shorter/clearer (not sure performancewise, but probably as good/better):
pairs s = S.fromDistinctAscList [(x,y) | (x:xt) <- tails (S.toList s), y <- xt]
First, you need to generate all candidate pairs. replicateM from Control.Monad helps with that.
λ> replicateM 2 [1..4]
[[1,1],[1,2],[1,3],[1,4],[2,1],[2,2],[2,3],[2,4],[3,1],[3,2],[3,3],[3,4],[4,1],[4,2],[4,3],[4,4]]
Then you need to keep only the pairs whose second element is greater than the first:
λ> filter (\[x,y] -> x < y) $ replicateM 2 [1 .. 4]
[[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]
Finally, you need to convert every list into a tuple:
λ> map (\[x,y] -> (x,y)) $ filter (\[x,y] -> x < y) $ replicateM 2 [1 .. 4]
[(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]
Then we can put this together into a function pairs:
import Data.Set
import Control.Monad
import Data.List
mySet = Data.Set.fromList [1,2,3,4]
--setOfPairs = Data.Set.fromList [(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]
setOfPairs = Data.Set.fromList $ pairs mySet
pairs :: Ord a => Set a -> [(a,a)]
pairs x = Data.List.map (\[x,y] -> (x,y)) $ Data.List.filter (\[x,y] -> x < y) $ replicateM 2 $ toList x
So, if I got your question right, you can use pairs mySet, where pairs generates the list of all unordered pairs of mySet.
Is this what you want?
UPD:
A list comprehension could be a clearer and faster way to create such pairs, so here is another version of pairs:
pairs :: Ord a => Set a -> [(a,a)]
pairs set = [(x,y) | let list = toList set, x <- list, y <- list, x < y]
So here is a first stab at a solution that converts back and forth to a list. Again, I am not sure this is the fastest way to do this, but I do know that iterating over sets is not terribly efficient.
import Data.List
import qualified Data.Set as S
pairs :: S.Set String -> S.Set (String,String)
pairs s = S.fromList $ foldl' (\st e -> (zip l e) ++ st) [] ls
  where (l:ls) = tails $ S.toList s
By folding zip over the tails, you get a nice and efficient way to create the set of unordered pairs. However, instinct tells me that there may be a monadic filterM or foldM solution that's even more elegant. I will keep thinking.
[EDIT]
So here is what should be [but is not on account of the size of the powerset] a faster solution that does not require a toList.
import Data.List
import qualified Data.Set as S
import qualified Data.Foldable as F
pairs :: (Ord a) => S.Set a -> S.Set (a,a)
pairs s = S.fromList $ foldl two [] $ F.foldlM (\st e -> [[e]++st,st]) [] s
  where two st (x:xa:[]) = (x,xa) : st
        two st _ = st
This uses a fold in the list monad to build the powerset and then filters out the two-element subsets. I can go into more detail if necessary.
