Implementing Radix Sort in SML - sorting

I am trying to implement radix sort in SML via a series of helper functions. The helper function I am having trouble with is called sort_nth_digit; it takes a digit place to sort by and a list to sort (n and L respectively). My approach is to take the first two elements of the list (for now we can assume there are at least 3), compare them by digit n, then cons them back onto the list in the proper order. The list should be sorted in ascending order. Now, the problem: the function compiles, but I get the following:
HW4.sml:40.5-44.30 Warning: match nonexhaustive
(0,L) => ...
(n,nil) => ...
(n,a :: b :: L) => ...
val sort_nth_digit = fn : int -> int list -> int list
Additionally, when you pass arguments, you don't get an answer back which I believe indicates infinite recursion?
Q:How is the match nonexhaustive and why am I recursing infinitely:
fun sort_nth_digit 0 L = []
| sort_nth_digit n [] = []
| sort_nth_digit n (a::b::L) = if ((nth_digit a n) < (nth_digit b n)) then a::b::(sort_nth_digit n L)
else
b::a::(sort_nth_digit n L)
Thanks for the help in advance! (*My first post on stackoverflow ^.^ *)
Nonexhaustive match fix:
fun sort_nth_digit 0 L = []
| sort_nth_digit n [] = []
| sort_nth_digit n (a::[]) = a::[]
| sort_nth_digit n (a::b::L) = if ((nth_digit a n) < (nth_digit b n)) then a::b::(sort_nth_digit n L)
else
b::a::(sort_nth_digit n L)
Input that results in no output, console just sits at this line:
- sort_nth_digit 1 [333,222,444,555,666,444,333,222,999];
Code for nth_digit & anonymous helper pow:
fun nth_digit x 0 = 0
| nth_digit x n = if (num_digits x) < n then 0
else
let
fun pow x 1 = x
| pow x y= x * pow x (y-1)
in
(*Finding the nth digit of x: ((x - x div 10^n) * 10^n div 10^n-1))*)
(x - ((x div pow 10 n) * pow 10 n)) div (pow 10 (n-1)) (*Me*)
end
If anyone thinks it would be useful to have access to the rest of my code I can provide it via github as an eclipse project (you can just pull the .sml file if you don't have eclipse set up for sml)

The match is not exhaustive because it does not cover the case of a list with only one element (and inductively, any list with an odd number of elements).
I'm not sure what you mean by "not getting an answer". This function does not diverge (recurse infinitely), unless your nth_digit helper does. Instead, you should get a Match exception when you feed it a list with odd length, because of the above.
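Judging from the nth_digit code posted in the question, one likely source of the hang is the local pow: its only base case is pow x 1, so for n = 1 the call pow 10 (n-1) becomes pow 10 0, which counts down past zero and never reaches that base case. A minimal sketch of a digit helper whose power function bottoms out at 0 is shown here. It is written in Haskell rather than SML, and the names nthDigit and pow10 are illustrative, not taken from the question:

```haskell
-- n-th decimal digit of x, counting from 1 at the least significant digit.
-- Returns 0 for n <= 0 and for positions beyond the number's length.
nthDigit :: Int -> Int -> Int
nthDigit x n
  | n <= 0    = 0
  | otherwise = (abs x `div` pow10 (n - 1)) `mod` 10
  where
    -- Base case at exponent 0, so pow10 (n - 1) terminates for n = 1.
    pow10 :: Int -> Int
    pow10 0 = 1
    pow10 k = 10 * pow10 (k - 1)
```

With this shape, nthDigit 1234 2 evaluates to 3 and asking for a digit past the end of the number gives 0.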

Related

List of tuples by taking the same index for an element in haskell

I have been trying to solve the following problem in haskell:
Generate a list of tuples (n, s) where 0 ≤ n ≤ 100 and n mod 2 = 0,
and where s = sum(1..n) The output should be the list
[(0,0),(2,3),(4,10),...,(100,5050)] Source
I tried to solve the problem with following code:
genListTupleSumUntilX :: Int -> [(Int,Int)]
genListTupleSumUntilX x =
    take x [(n, s) | n <- [1..x], s <- sumUntilN x]
    where
        sumUntilN :: Int -> [Int]
        sumUntilN n
            | n == 0 = []
            | n == 1 = [1]
            | otherwise = sumUntilN (n-1) ++ [sum[1..n]]
However, this code does not give the expected result (as @Guru Stron pointed out, thank you!).
I would also appreciate it if somebody could help me make this code more concise. I am also new to the concept of lazy evaluation, so am unable to determine the runtime complexity. Help will be appreciated.
However, I feel like this code could still be improved upon, especially in these respects:
take x in the function seems really inelegant. Is there a way to have list comprehensions only map to the same index?
sumUntilN feels really verbose. Is there an idiomatic way to do the same in Haskell?
Finally, I am extremely new to Haskell and have trouble evaluating the time and space complexity of the function. Can somebody help me there?
sumOfNumsUptoN n = n * (n + 1) `div` 2
genListTupleSumUntilX :: Int -> [(Int, Int)]
genListTupleSumUntilX n = zip [0, 2 .. n] $ map sumOfNumsUptoN [0, 2 .. n]
This is of linear complexity on the size of the list.
I would say that you overcomplicate things. To produce correct output you can use simple list comprehension:
genListTupleSumUntilX :: Int -> [(Int,Int)]
genListTupleSumUntilX x = [(n, sum [1..n]) | n <- [0,2..x]]
Note that this solution recalculates the same sums repeatedly (the sum for the next element, n+2, is just (n+1) + (n+2) + the sum for n, so the computation can potentially be reused), which leads to O(n^2) complexity, but for such relatively small n it is not a big issue. You can handle this using the scanl function (though maybe there is a more idiomatic approach for memoization):
genListTupleSumUntilX :: Int -> [(Int,Int)]
-- scanl emits the seed (0,0) first, so x = 0 yields [(0,0)] without a special case
genListTupleSumUntilX x = scanl (\ (prev, prevSum) curr -> (curr, prevSum + prev + 1 + curr)) (0,0) [2,4..x]
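To see that the closed form and the recomputing comprehension describe the same list, the two can be put side by side. This is only a sketch; the names triangular, evenSums and evenSumsSlow are made up here and do not appear in the answers above:

```haskell
-- Closed form for 1 + 2 + ... + n.
triangular :: Int -> Int
triangular n = n * (n + 1) `div` 2

-- Pair every even n in [0..x] with its sum, constant work per element.
evenSums :: Int -> [(Int, Int)]
evenSums x = [ (n, triangular n) | n <- [0, 2 .. x] ]

-- Reference version that recomputes each sum from scratch (quadratic overall).
evenSumsSlow :: Int -> [(Int, Int)]
evenSumsSlow x = [ (n, sum [1 .. n]) | n <- [0, 2 .. x] ]
```

Comparing evenSums 100 with evenSumsSlow 100 in GHCi is a cheap sanity check before relying on the faster version.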

Convert this algorithm into a procedure, that doesn't use recursion

A few weeks ago I had a task to complete. My way of thinking was correct, but the implementation I gave at that time was horrible. I have a chance to get some points, but only if I provide a correct implementation.
The problem is:
Consider a list of integers [x_1; x_2;...;x_n]. We'll call index i "a hole"
if 1 < i < n, such that x_i < max(x_1,...,x_{i-1}) and x_i < max(x_{i+1},...,x_n).
The depth of this hole is min(max(x_1,...,x_{i-1})-x_i, max(x_{i+1},...,x_n)-x_i).
Write a procedure hole : int list -> int which, for a given list of integers, finds the maximum hole depth in that list. If there is no hole in the list, then the correct answer is 0. For example: hole [5;2;7;10;3;7] = 3, because for i=2, max on the left is 5, max on the right is 10, min (5 - 2, 10-2) = 3. hole [1;2;3;2;1] = 0, because there is no index i that satisfies our predicate.
Now, my procedure looks like this with using recursion:
let hole list =
let rec aux maxL acc = function
| [] -> (min_int, [])
| x::xs ->
let newMax = max maxL x in
let (maxR, aclist) = aux newMax acc xs
in
if maxL > x && x < maxR then (max maxR x,(min (maxL-x) (maxR-x))::aclist)
else (max maxR x, aclist)
in fold_left max 0 (snd (aux min_int [] list))
but I have to write it without using recursion, although I am allowed to use higher-order functions.
I wanted to use something called "function accumulation", but I can't get any idea on how to do it (been thinking about this problem for over 7 hours now).
Any code given in Haskell/OCaml is welcome. The only problem is that you CANNOT use recursion.
Here is code that may do what I think you are looking for, i.e. find the depth of the deepest 'hole' in a list of integers according to your description of a 'hole'. It zips up left and right scans for max with a list of the 'middle' values, which could be described with the word 'accumulating' if you want.
Not the most efficient implementation, I'm sure, but I think it is better than the obvious brute force solution. Where no hole is found, it returns Nothing.
deepestHole' :: [Int] -> Maybe Int
deepestHole' xs
  | length xs < 3 = Nothing
  | maxHole < 1 = Nothing
  | otherwise = Just maxHole
  where
    lMaxes = scanl1 max $ take (length xs - 2) xs
    rMaxes = scanr1 max (drop 2 xs)
    middles = tail $ init xs
    holeDepth lMax mid rMax = min lMax rMax - mid
    maxHole = maximum $ zipWith3 holeDepth lMaxes middles rMaxes
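The task statement asks for an int result where 0 means "no hole", so a variant of the same scan-and-zip idea can fold that default in directly. A minimal sketch, assuming the hole definition quoted in the question; the name hole mirrors the requested signature and the helper names are illustrative:

```haskell
-- Depth of the deepest hole, or 0 when the list has no hole.
-- No explicit recursion: only scans, zipWith3 and maximum.
hole :: [Int] -> Int
hole xs
  | length xs < 3 = 0
  | otherwise     = maximum (0 : zipWith3 depth lefts mids rights)
  where
    inner  = length xs - 2
    lefts  = scanl1 max (take inner xs)   -- max of everything left of each middle
    rights = scanr1 max (drop 2 xs)       -- max of everything right of each middle
    mids   = take inner (drop 1 xs)       -- the candidate positions themselves
    depth l m r = min l r - m             -- positive only for a real hole
```

Positions that are not holes give a depth of 0 or less, so prepending 0 before taking the maximum covers the "no hole" case without a Maybe.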

Recursion with accumulators that are not reversed - is it possible?

I've been playing with Haskell a fair amount lately, and I came up with this function to find the nth prime:
nthPrime 1 = 2
nthPrime 2 = 3
nthPrime n = aux [2, 3] 3 5 n
  where
    aux knownPrimes currentNth suspect soughtNth =
      let currentIsPrime = foldl (\l n -> l && suspect `mod` n /= 0)
                                 True knownPrimes
      in case (currentIsPrime, soughtNth == currentNth) of
        (True, True) -> suspect
        (True, False) -> aux (suspect:knownPrimes) (currentNth + 1)
                             (suspect + 2) soughtNth
        _ -> aux knownPrimes currentNth (suspect + 2) soughtNth
My question is, is there a way to have an accumulative parameter (in this case knownPrimes) that is not reversed (as occurs when passing (suspect:knownPrimes))?
I have tried using knownPrimes ++ [suspect] but this seems inefficient as well.
My hope is that if I can pass the known primes in order then I can shortcut some of the primality checks further.
In Haskell, if you are using an accumulator to build a list, but end up having to reverse it, it is often the case that it is better to drop the accumulator and instead produce the list lazily as the result of your computation.
If you apply this kind of thinking to searching for primes, and take full advantage of laziness, you end up with a well-known technique of producing an infinite list of all the primes. If we refactor your code as little as possible to use this technique, we get something like:
allPrimes = [2, 3] ++ aux 5
  where
    aux suspect =
      let currentIsPrime = foldl (\l n -> l && suspect `mod` n /= 0) True
                             $ takeWhile (\n -> n*n <= suspect) allPrimes
      in case currentIsPrime of
        True -> suspect : aux (suspect + 2)
        False -> aux (suspect + 2)
nthPrime n = allPrimes !! (n-1)
I have removed now unnecessary parameters and changed the code from accumulating into lazily producing, and to use its own result as the source of prime divisors to test (this is called "tying the knot"). Other than that, the only change here is to add a takeWhile check: since the list we are testing divisors from is defined in terms of itself, and is infinite to boot, we need to know where on the list to stop checking for divisors so that we don't get a truly infinite recursion.
Apart from this, there is an inefficiency in this code:
foldl (\l n -> l && suspect `mod` n /= 0) True
is not a good way for checking whether there are no divisors in a list, because as written, it won't stop once a divisor has been found, even though && itself is shortcutting (stopping as soon as its first argument is found to be False).
To allow proper shortcutting, a foldr could be used instead:
foldr (\n r -> suspect `mod` n /= 0 && r) True
Or, even better, use the predefined function all:
all (\n -> suspect `mod` n /= 0)
Using my remarks
This is what it would look like if you use all and refactor it a bit:
allPrimes :: [Integer]
allPrimes = 2 : 3 : aux 5
  where
    aux suspect
      | currentIsPrime = suspect : nextPrimes
      | otherwise = nextPrimes
      where
        currentIsPrime =
          all (\n -> suspect `mod` n /= 0)
          $ takeWhile (\n -> n*n <= suspect) allPrimes
        nextPrimes = aux (suspect + 2)
nthPrime :: Int -> Integer
nthPrime n = allPrimes !! (n-1)

Understanding the runtime of a recursive SML function involving list appending (using @)

I'm new to algorithm analysis and SML and got hung up on the average-case runtime of the following SML function. I would appreciate some feedback on my thinking.
fun app([]) = []
| app(h::t) = [h] @ app(t)
So after every recursion we will end up with a bunch of single element lists (and one no-element list).
[1]@[2]@[3]@...@[n]@[]
Where n is the number of elements in the original list and 1, 2, 3, ..., n is just to illustrate what elements in the original list we are talking about. L @ R takes time linear in the length of list L. Assuming A is the constant amount of time @ takes for every element, I imagine this as if:
[1,2]@[3]@[4]@...@[n]@[] took 1A
[1,2,3]@[4]@...@[n]@[] took 2A
[1,2,3,4]@...@[n]@[] took 3A
...
[1,2,3,4,...,n]@[] took (n-1)A
[1,2,3,4,...,n] took nA
I'm therefore thinking that a recurrence for the time would look something like this:
T(0) = C (if n = 0)
T(n) = T(n-1) + An + B (if n > 0)
Where C is just the final matching of the base case app([]) and B is the constant for h::t. Close the recurrence and we will get this (proof omitted):
T(n) = (n²+n)A/2 + Bn + C = (A/2)n² + (A/2)n + Bn + C = Θ(n²)
This is my own conclusion which differs from the answer that was presented to me, namely:
T(0) = B (if n = 0)
T(n) = T(n-1) + A (if n > 0)
Closed form
T(n) = An + B = Θ(n)
Which is quite different. (Θ(n) vs Θ(n²)!) But isn't this assuming that L @ R takes constant time rather than linear? For example, it would be true for addition
fun add([]) = 0
| add(h::t) = h + add(t) (* n + ... + 2 + 1 + 0 *)
or even concatenation
fun con([]) = []
| con(h::t) = h::con(t) (* n :: ... :: 2 :: 1 :: [] *)
Am I misunderstanding the way that L @ R works or is my analysis (at least sort of) correct?
Yes. Running the app [1,2,3] command by hand one function call at a time gives:
app [1,2,3]
[1]@(app [2,3])
[1]@([2]@(app [3]))
[1]@([2]@([3]@(app [])))
[1]@([2]@([3]@([])))
[1]@([2]@[3])
[1]@([2,3])
[1,2,3]
This is a consequence of the function call being on the right-hand side of the @: the left operand of every @ is a single-element list, so each append takes constant time.
Compare this to a naïve version of rev:
fun rev [] = []
| rev (x::xs) = rev xs @ [x]
This one has the running time you expect: Once the recursion has fully expanded into an expression ((([])@[3])@[2])@[1] (taking linear time), it requires n + (n - 1) + (n - 2) + ... + 1, or n(n+1)/2, or O(n^2) steps to complete the computation. A more efficient rev could look like this:
local
fun rev' [] ys = ys
| rev' (x::xs) ys = rev' xs (x::ys)
in
fun rev xs = rev' xs []
end
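The difference between the two shapes can be checked by counting how many elements each append has to copy. The sketch below is in Haskell, whose ++ has the same cost shape as SML's @ (proportional to the length of the left operand); appendCount, appCost and revCost are illustrative names, not part of the question or the answer:

```haskell
-- Append that also reports the number of elements copied,
-- i.e. the length of its left argument.
appendCount :: [a] -> [a] -> ([a], Int)
appendCount xs ys = (xs ++ ys, length xs)

-- Shape of app: a singleton on the left of every append.
appCost :: [a] -> ([a], Int)
appCost []    = ([], 0)
appCost (h:t) =
  let (rest, c1) = appCost t
      (res, c2)  = appendCount [h] rest
  in (res, c1 + c2)

-- Shape of the naive rev: the recursive result on the left of every append.
revCost :: [a] -> ([a], Int)
revCost []     = ([], 0)
revCost (x:xs) =
  let (rest, c1) = revCost xs
      (res, c2)  = appendCount rest [x]
  in (res, c1 + c2)
```

For a 10-element list, appCost reports 10 copied elements while revCost reports 0 + 1 + ... + 9 = 45, which is the linear versus quadratic gap discussed above.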

Functional learning woes

I'm a beginner to functional languages, and I'm trying to get the whole thing down in Haskell. Here's a quick-and-dirty function that finds all the factors of a number:
factors :: (Integral a) => a -> [a]
factors x = filter (\z -> x `mod` z == 0) [2..x `div` 2]
Works fine, but I found it to be unbearably slow for large numbers. So I made myself a better one:
factorcalc :: (Integral a) => a -> a -> [a] -> [a]
factorcalc x y z
    | y `elem` z = sort z
    | x `mod` y == 0 = factorcalc x (y+1) (z ++ [y] ++ [(x `div` y)])
    | otherwise = factorcalc x (y+1) z
But here's my problem: Even though the code works, and can cut literally hours off the execution time of my programs, it's hideous!
It reeks of ugly imperative thinking: It constantly updates a counter and a data structure in a loop until it finishes. Since you can't change state in purely functional programming, I cheated by holding the data in the parameters, which the function simply passes to itself over and over again.
I may be wrong, but there simply must be a better way of doing the same thing...
Note that the original question asked for all the factors, not for only the prime factors. There being many fewer prime factors, they can probably be found more quickly. Perhaps that's what the OQ wanted. Perhaps not. But let's solve the original problem and put the "fun" back in "functional"!
Some observations:
The two functions don't produce the same output---if x is a perfect square, the second function includes the square root twice.
The first function checks a number of potential factors proportional to the size of x; the second function checks only a number proportional to the square root of x, then stops (with the bug noted above).
The first function (factors) allocates a list of all integers from 2 to x div 2, whereas the second function never allocates a list but instead visits fewer integers one at a time in a parameter. I ran the optimizer with -O and looked at the output with -ddump-simpl, and GHC just isn't smart enough to optimize away those allocations.
factorcalc is tail-recursive, which means it compiles into a tight machine-code loop; filter is not and does not.
Some experiments show that the square root is the killer:
Here's a sample function that produces the factors of x from z down to 2:
factors_from x 1 = []
factors_from x z
  | x `mod` z == 0 = z : factors_from x (z-1)
  | otherwise = factors_from x (z-1)
factors'' x = factors_from x (x `div` 2)
It's a bit faster because it doesn't allocate, but it's still not tail-recursive.
Here's a tail-recursive version that is more faithful to the original:
factors_from' x 1 l = l
factors_from' x z l
  | x `mod` z == 0 = factors_from' x (z-1) (z:l)
  | otherwise = factors_from' x (z-1) l
factors''' x = factors_from' x (x `div` 2) []
This is still slower than factorcalc because it enumerates all the integers from 2 to x div 2, whereas factorcalc stops at the square root.
Armed with this knowledge, we can now create a more functional version of factorcalc which replicates both its speed and its bug:
factors'''' x = sort $ uncurry (++) $ unzip $ takeWhile (uncurry (<=)) $
                [ (z, x `div` z) | z <- [2..x], x `mod` z == 0 ]
I didn't time it exactly, but given 100 million as an input, both it and factorcalc terminate instantaneously, where the others all take a number of seconds.
How and why the function works is left as an exercise for the reader :-)
ADDENDUM: OK, to mitigate the eyeball bleeding, here's a slightly saner version (and without the bug):
saneFactors x = sort $ concat $ takeWhile small $
                [ pair z | z <- [2..], x `mod` z == 0 ]
  where pair z = if z * z == x then [z] else [z, x `div` z]
        small [z, z'] = z < z'
        small [z] = True
Okay, take a deep breath. It'll be all right.
First of all, why is your first attempt slow? How is it spending its time?
Can you think of a recursive definition for the prime factorization that doesn't have that property?
(Hint.)
Firstly, although factorcalc is "ugly", you could add a wrapper function factors' x = factorcalc x 2 [], add a comment, and move on.
If you want to make a 'beautiful' factors fast, you need to find out why it is slow. Looking at your two functions, factors walks the list about n/2 elements long, but factorcalc stops after around sqrt n iterations.
Here is another factors that also stops after about sqrt n iterations, but uses a fold instead of explicit iteration. It also breaks the problem into three parts: finding the factors (factor); stopping at the square root of x (small) and then computing pairs of factors (factorize):
factors' :: (Integral a) => a -> [a]
factors' x = sort (foldl factorize [] (takeWhile small (filter factor [2..])))
  where
    factor z = x `mod` z == 0
    small z = z <= (x `div` z)
    factorize acc z = z : (if z == y then acc else y : acc)
      where y = x `div` z
This is marginally faster than factorcalc on my machine. You can fuse factor and factorize and it is about twice as fast as factorcalc.
The Profiling and Optimization chapter of Real World Haskell is a good guide to the GHC suite's performance tools for tackling tougher performance problems.
By the way, I have a minor style nitpick with factorcalc: it is much more efficient to prepend single elements to the front of a list, which is O(1), than it is to append to the end of a list of length n, which is O(n). The lists of factors are typically small, so it is not such a big deal, but factorcalc should probably be something like:
factorcalc :: (Integral a) => a -> a -> [a] -> [a]
factorcalc x y z
    | y `elem` z = sort z
    | x `mod` y == 0 = factorcalc x (y+1) (y : (x `div` y) : z)
    | otherwise = factorcalc x (y+1) z
"Since you can't change state in purely functional programming, I cheated by holding the data in the parameters, which the function simply passes to itself over and over again."
Actually, this is not cheating; this is a—no, make that the—standard technique! That sort of parameter is usually known as an "accumulator," and it's generally hidden within a helper function that does the actual recursion after being set up by the function you're calling.
A common case is when you're doing list operations that depend on the previous data in the list. The two problems you need to solve are, where do you get the data about previous iterations, and how do you deal with the fact that your "working area of interest" for any particular iteration is actually at the tail of the result list you're building. For both of these, the accumulator comes to the rescue. For example, to generate a list where each element is the sum of all of the elements of the input list up to that point:
sums :: Num a => [a] -> [a]
sums inp = helper inp []
  where
    helper [] acc = reverse acc
    helper (x:xs) [] = helper xs [x]
    helper (x:xs) acc@(h:_) = helper xs (x+h : acc)
Note that we flip the direction of the accumulator, so we can operate on the head of that, which is much more efficient (as Dominic mentions), and then we just reverse the final output.
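For this particular running-total example, the Prelude already packages the accumulator pattern: scanl1 folds from the left and keeps every intermediate result. A small sketch, where the name sums' is chosen only to avoid clashing with sums above:

```haskell
-- Running totals: each output element is the sum of the inputs so far.
sums' :: Num a => [a] -> [a]
sums' = scanl1 (+)
```

Here sums' [1,2,3,4] gives [1,3,6,10], and the empty list maps to the empty list. Writing the helper by hand is still worthwhile as an exercise, since the same accumulator shape shows up in cases no library function covers.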
By the way, I found The Little Schemer to be a useful introduction; it offers good practice in thinking recursively.
This seemed like an interesting problem, and I hadn't coded any real Haskell in a while, so I gave it a crack. I've run both it and Norman's factors'''' against the same values, and it feels like mine's faster, though they're both so close that it's hard to tell.
factors :: Int -> [Int]
factors n = firstFactors ++ reverse [ n `div` i | i <- firstFactors ]
  where
    firstFactors = filter (\i -> n `mod` i == 0) (takeWhile ( \i -> i * i <= n ) [2..n])
Factors can be paired up into those that are greater than sqrt n, and those that are less than or equal to it (for simplicity's sake, the exact square root, if n is a perfect square, falls into this category). So if we just take the ones that are less than or equal to sqrt n, we can calculate the others later by doing div n i. They'll be in reverse order, so we can either reverse firstFactors first or reverse the result later. It doesn't really matter.
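One consequence of putting the exact square root in the lower group is that, as written, a perfect square gets its root listed twice (for n = 16 the result contains 4 two times). A possible variant that skips the mirrored copy is sketched here; the name factorsNoDup is hypothetical and not part of the answer:

```haskell
-- Factors of n between 2 and n `div` 2, in ascending order, without duplicates.
factorsNoDup :: Int -> [Int]
factorsNoDup n = small ++ reverse [ q | i <- small, let q = n `div` i, q /= i ]
  where
    -- Divisors up to and including the square root.
    small = [ i | i <- takeWhile (\i -> i * i <= n) [2 ..], n `mod` i == 0 ]
```

The only change is the q /= i filter, which drops the cofactor when it coincides with the divisor itself.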
This is my "functional" approach to the problem. ("Functional" in quotes, because I'd approach this problem the same way even in non-functional languages, but maybe that's because I've been tainted by Haskell.)
{-# LANGUAGE PatternGuards #-}
factors :: (Integral a) => a -> [a]
factors = multiplyFactors . primeFactors primes 0 [] . abs where
    multiplyFactors [] = [1]
    multiplyFactors ((p, n) : factors) =
        [ pn * x
        | pn <- take (succ n) $ iterate (* p) 1
        , x <- multiplyFactors factors ]
    primeFactors _ _ _ 0 = error "Can't factor 0"
    primeFactors (p:primes) n list x
        | (x', 0) <- x `divMod` p
        = primeFactors (p:primes) (succ n) list x'
    primeFactors _ 0 list 1 = list
    primeFactors (_:primes) 0 list x = primeFactors primes 0 list x
    primeFactors (p:primes) n list x
        = primeFactors primes 0 ((p, n) : list) x
    primes = sieve [2..]
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
primes is the naive Sieve of Eratosthenes. There are better ones, but this is the shortest method.
sieve [2..]
=> 2 : sieve [x | x <- [3..], x `mod` 2 /= 0]
=> 2 : 3 : sieve [x | x <- [4..], x `mod` 2 /= 0, x `mod` 3 /= 0]
=> 2 : 3 : sieve [x | x <- [5..], x `mod` 2 /= 0, x `mod` 3 /= 0]
=> 2 : 3 : 5 : ...
primeFactors is the simple repeated trial-division algorithm: it walks through the list of primes, and tries dividing the given number by each, recording the factors as it goes.
primeFactors (2:_) 0 [] 50
=> primeFactors (2:_) 1 [] 25
=> primeFactors (3:_) 0 [(2, 1)] 25
=> primeFactors (5:_) 0 [(2, 1)] 25
=> primeFactors (5:_) 1 [(2, 1)] 5
=> primeFactors (5:_) 2 [(2, 1)] 1
=> primeFactors _ 0 [(5, 2), (2, 1)] 1
=> [(5, 2), (2, 1)]
multiplyFactors takes a list of primes and powers, and explodes it back out to a full list of factors.
multiplyFactors [(5, 2), (2, 1)]
=> [ pn * x
   | pn <- take (succ 2) $ iterate (* 5) 1
   , x <- multiplyFactors [(2, 1)] ]
=> [ pn * x | pn <- [1, 5, 25], x <- [1, 2] ]
=> [1, 2, 5, 10, 25, 50]
factors just strings these two functions together, along with an abs to prevent infinite recursion in case the input is negative.
I don't know much about Haskell, but somehow I think this link is appropriate:
http://www.willamette.edu/~fruehr/haskell/evolution.html
Edit: I'm not entirely sure why people are so aggressive about the downvoting on this. The original poster's real problem was that the code was ugly; while it's funny, the point of the linked article is, to some extent, that advanced Haskell code is, in fact, ugly; the more you learn, the uglier your code gets, to some extent. The point of this answer was to point out to the OP that apparently, the ugliness of the code that he was lamenting is not uncommon.
