Implementation of (^) - performance

I was reading the implementation of (^) in the standard Haskell library:
(^) :: (Num a, Integral b) => a -> b -> a
x0 ^ y0 | y0 < 0    = errorWithoutStackTrace "Negative exponent"
        | y0 == 0   = 1
        | otherwise = f x0 y0
  where -- f : x0 ^ y0 = x ^ y
        f x y | even y    = f (x * x) (y `quot` 2)
              | y == 1    = x
              | otherwise = g (x * x) ((y - 1) `quot` 2) x
        -- g : x0 ^ y0 = (x ^ y) * z
        g x y z | even y    = g (x * x) (y `quot` 2) z
                | y == 1    = x * z
                | otherwise = g (x * x) ((y - 1) `quot` 2) (x * z)
Now the part where g is defined seems odd to me. Why not just implement it like this:
expo :: (Num a, Integral b) => a -> b -> a
expo x0 y0
  | y0 == 0   = 1
  | y0 < 0    = errorWithoutStackTrace "Negative exponent"
  | otherwise = f x0 y0
  where
    f x y | even y    = f (x * x) (y `quot` 2)
          | y == 1    = x
          | otherwise = x * f x (y - 1)
But indeed, plugging in, say, 3^1000000 shows that (^) is about 0.04 seconds faster than expo.
Why is (^) faster than expo?

As the person who wrote the code, I can tell you why it's complex. :)
The idea is to be tail recursive to get loops, and also to perform the minimum number of multiplications. I don't like the complexity, so if you find a more elegant way please file a bug report.
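For concreteness, here is a hand trace of 3 ^ 5 through f and g, following the invariants in the comments (three multiplications in total):
3 ^ 5 = f 3 5        -- x0 ^ y0 = x ^ y        : 3^5
      = g 9 2 3      -- x0 ^ y0 = (x ^ y) * z  : 9^2 * 3
      = g 81 1 3     --                        : 81^1 * 3
      = 81 * 3
      = 243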

A function is tail-recursive if the return value of a recursive call is returned as-is, without further processing. In expo, f is not tail-recursive, because of otherwise = x * f x (y-1): the return value of f is multiplied by x before it is returned. Both f and g in (^) are tail-recursive, because their return values are returned unmodified.
Why does this matter? Tail-recursive functions can be implemented much more efficiently than general recursive functions. Because the compiler doesn't need to create a new context (stack frame, what have you) for a recursive call, it can reuse the caller's context as the context of the recursive call. This saves a lot of the overhead of calling a function, much like in-lining a function is more efficient than calling the function proper.
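To make that concrete, here is a sketch (mine, not the library code) of expo rewritten to thread an accumulator, so that the recursion is tail-recursive; this is essentially the role g plays in (^):
expoAcc :: (Num a, Integral b) => a -> b -> a
expoAcc x0 y0
  | y0 < 0    = errorWithoutStackTrace "Negative exponent"
  | otherwise = go x0 y0 1
  where
    -- invariant: x0 ^ y0 == (x ^ y) * acc, and go is tail-recursive
    go x y acc
      | y == 0    = acc
      | even y    = go (x * x) (y `quot` 2) acc
      | otherwise = go (x * x) (y `quot` 2) (x * acc)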

Whenever you see a bread-and-butter function in the standard library and it's implemented weirdly, the reason is almost always "because doing it like that triggers some special performance-critical optimization [possibly in a different version of the compiler]".
These odd workarounds are usually to "force" the compiler to notice that some specific, important optimization is possible (e.g., to force a particular argument to be considered strict, to allow worker/wrapper transformation, whatever). Typically some person has compiled their program, noticed it's epically slow, complained to the GHC devs, and they looked at the compiled code and thought "oh, GHC isn't seeing that it can inline that 3rd worker function... how do I fix that?" The result is that if you rephrase the code just slightly, the desired optimization then fires.
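As an illustration of the kind of rephrasing meant here (a made-up example, not from GHC's source):
{-# LANGUAGE BangPatterns #-}

sumTo :: Int -> Int
sumTo n = go 0 n
  where
    -- without the bang, acc can accumulate a chain of (acc + k) thunks
    go acc 0 = acc
    go acc k = go (acc + k) (k - 1)

sumTo' :: Int -> Int
sumTo' n = go 0 n
  where
    -- the bang makes the accumulator strict, so GHC compiles a tight loop
    -- even when its strictness analysis would not otherwise spot it
    go !acc 0 = acc
    go !acc k = go (acc + k) (k - 1)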
You say you tested it and there's not much speed difference. You didn't say for what type. (Is the exponent Int or Integer? What about the base? It's quite possible it makes a significant difference in some obscure case.)
Occasionally functions are also implemented weirdly to maintain strictness / laziness guarantees. (E.g., the library spec says it has to work a certain way, and implementing it the most obvious way would make the function more strict / less strict than the spec claims.)
I don't know what's up with this specific function, but I would suggest #chi is probably onto something.

Related

SML program -> confusion on "AS syntax error"

So I have to write a small program in SML:
a file named ‘p0.sml’ that contains a function named epoly, which accepts as parameters a list of real values a0 through an, and a single real value x. The list contains the coefficients of a polynomial of the form a0 + a1*x + a2*x^2 + … + an*x^n, where the real x used is the x parameter passed to your function. Your implementation must accept the list of coefficients as the first parameter and the value of x as the second. Your function must return the value of the polynomial specified by the parameters passed to it.
this is what I have so far but it won't compile because of a syntax error with as. "Error: syntax error found at AS". If you have any pointers that would be greatly appreciated.
fun epoly([], x:real) = 0.0
= epoly(L:real list as h::T, x:real) = h + (x * epoly(T, x));
It looks like you have a typo. Your second = should be a |.
fun epoly([], x:real) = 0.0
  | epoly(L:real list as h::T, x:real) =
      h + (x * epoly(T, x));
There is, further, no need to specify types. Your SML compiler can infer the types from data presented. Along with removing unnecessary bindings, this can be reduced to:
fun epoly([], _) = 0.0
  | epoly(h::T, x) =
      h + (x * epoly(T, x));
From fun epoly([], _) = 0.0 we know epoly will take a tuple of a list and some type and return real.
From:
| epoly(h::T, x) =
h + (x * epoly(T, x));
We know that x is being multiplied by a real, so x must be real. And since h is being added to a real, it must be a real, so the entire list is a real list.
Thus the type of epoly can be inferred correctly to be real list * real -> real.
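For comparison, here is a sketch of the same recursion in Haskell (my own rendering, with an explicit type chosen to mirror the inferred SML type real list * real -> real, curried here):
-- Coefficients a0..an and a point x; same recursion as the SML version.
epoly :: [Double] -> Double -> Double
epoly []      _ = 0.0
epoly (h : t) x = h + x * epoly t x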

Haskell State monad vs state as parameter performance test

I started to learn the State monad and one idea bothers me. Instead of passing an accumulator as a parameter, we can wrap everything in the State monad.
So I wanted to compare the performance of using the State monad vs passing the accumulator as a parameter.
So I created two functions:
sum1 :: Int -> [Int] -> Int
sum1 x [] = x
sum1 x (y:xs) = sum1 (x + y) xs
and
sumState :: [Int] -> Int
sumState xs = execState (traverse f xs) 0
  where f n = modify (n+)
I compared them on the input list [1..1000000000].
sumState running time was around 15s
sum1 around 5s
We can see a clear winner, but then I realised that sumState can be optimised:
We can use the strict version of modify
We do not actually need the resulting list, so we can use traverse_ instead
So the new optimised state function is:
sumState :: [Int] -> Int
sumState xs = execState (traverse_ f xs) 0
  where f n = modify' (n+)
which has running time around 350ms. This is a huge improvement. It was shocking.
Why does the modified sumState have better performance than sum1? Can sum1 be optimised to match, or even beat, sumState?
I also tried other implementations of sum:
using the built-in sum function, which gives me around 240ms (sum [1..x] :: Int)
using the strict foldl', which gives me the same result, around 240ms (with the implicit type [Int] -> Int)
Does it actually mean that it is better to use a fold or the State monad to carry an accumulator instead of passing it as an argument to the function?
Thank you for your help.
EDIT:
Each function was in a separate file with its own main function and compiled with the "-O2" flag.
main = do
  x <- (read . head) <$> getArgs
  print $ <particular sum function> [1..x]
Runtime was measured via the time command on Linux.
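For reference, a self-contained version of the optimised variant as described above (the import locations are my assumption: mtl's Control.Monad.State, Data.Foldable, System.Environment):
import Control.Monad.State (execState, modify')
import Data.Foldable (traverse_)
import System.Environment (getArgs)

sumState :: [Int] -> Int
sumState xs = execState (traverse_ f xs) 0
  where f n = modify' (n +)

main :: IO ()
main = do
  x <- (read . head) <$> getArgs
  print $ sumState [1 .. x]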
To give a bit more explanation as to why traverse is slower: traverse f xs has type State Int [()], and that [()] (a list of unit values) is built up during the summation. This prevents further optimizations and would cause a memory leak if you were not using lazy state.
Update: I think GHC should have been able to notice that that list of unit tuples is never used, so I opened a GHC issue.
In both cases, to get the best performance we want to combine (or fuse) the summation with the enumeration [1..x] into a tight recursive loop which simply increments and adds until it reaches x. The resulting code would look something like this:
sumFromTo :: Int -> Int -> Int -> Int
sumFromTo s x y
  | x == y    = s + x
  | otherwise = sumFromTo (s + x) (x + 1) y
This avoids allocations for the list [1..x].
The base library achieves this optimization using foldr/build fusion, also known as short-cut fusion. The sum, foldl' and traverse (for lists) functions are implemented using the foldr function and [1..x] is implemented using the build function. The foldr and build functions have special optimization rules so that they can be fused. Your custom sum1 function doesn't use foldr and so it can never be fused with [1..x] in this way.
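For example (a sketch of my own, not taken from the post), writing the custom sum with foldl' lets it take part in that fusion, because foldl' is itself defined in terms of foldr:
import Data.List (foldl')

-- With -O2, GHC can typically fuse the enumeration [1..x] away here,
-- leaving a tight loop much like sumFromTo above.
sumFused :: Int -> Int
sumFused x = foldl' (+) 0 [1 .. x]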
Ironically, the same problem that plagued your implementation of sumState is also the problem with sum1. You don't have strict accumulation, so you build up thunks like so:
sum1 0 [1, 2, 3]
sum1 (0 + 1) [2, 3]
sum1 ((0 + 1) + 2) [3]
sum1 (((0 + 1) + 2) + 3) []
(((0 + 1) + 2) + 3)
((1 + 2) + 3)
(3 + 3)
6
If you add strictness to sum1, you should see a dramatic improvement in efficiency because you eliminate the non-tail-recursive evaluation of the thunk (((0 + 1) + 2) + 3), which is the costly part of sum1. Using strict accumulation makes this much more efficient:
sum1 x [] = x
sum1 x (y : xs) = x `seq` sum1 (x + y) xs
should give you comparable performance to sum (although as noted in another answer, GHC may not be able to use fusion properly to give you the truly magical performance of sum on the list [1..x]).

Obtain x as result for Re[x] in Mathematica

I'm trying to obtain the real part of the result of an operation which involves an undefined variable (let's say x).
How can I have Mathematica return x when I execute Re[x] if I know that x will never be a complex number? I think this involves telling Mathematica that x is a real, but I don't know how.
In my case the expression for which I want the real part is more complicated than a simple variable, but the concept will remain the same.
Some examples:
INPUT          OUTPUT        DESIRED RESULT
-----          ------        --------------
Re[x]          Re[x]         x
Re[1]          1             1
Re[Sin[x]]     Re[Sin[x]]    Sin[x]
Re[1 + x + I]  1 + Re[x]     1 + x
Re[1 + x*I]    1 - Im[x]     1
You can use, for example, the input Simplify[Re[x], x \[Element] Reals], which will give x as output.
Use ComplexExpand. It assumes that the variables are real unless you indicate otherwise. For example:
In[76]:= ComplexExpand[Re[x]]
Out[76]= x
In[77]:= ComplexExpand[Re[Sin[x]]]
Out[77]= Sin[x]
In[78]:= ComplexExpand[Re[1+x+I]]
Out[78]= 1+x
Two more possibilities:
Assuming[x \[Element] Reals, Refine[Re[x]]]
Refine[Re[x], x \[Element] Reals]
Both return x.
It can at times be useful to define UpValues for a symbol. This is far from robust, but it nevertheless can handle a number of cases.
Re[x] ^= x;
Im[x] ^= 0;
Re[x]
Re[1]
Re[1 + x + I]
Re[1 + x*I]
x
1
1 + x
1
Re[Sin[x]] does not evaluate as you desire, but one of the transformations used by FullSimplify does place it in a form that triggers Re[x]:
Re[Sin[x]] // FullSimplify
Sin[x]

existential search and query without the fuss

Is there an extensible, efficient way to write existential statements in Haskell without implementing an embedded logic programming language? Oftentimes when I'm implementing algorithms, I want to express existentially quantified first-order statements like
∃x.∃y.x,y ∈ xs ∧ x ≠ y ∧ p x y
where ∈ is overloaded on lists. If I'm in a hurry, I might write perspicuous code that looks like
find p [] = False
find p (x:xs) = any (\y -> x /= y && (p x y || p y x)) xs || find p xs
or
find p xs = or [ x /= y && (p x y || p y x) | x <- xs, y <- xs]
But this approach doesn't generalize well to queries returning values or predicates or functions of multiple arities. For instance, even a simple statement like
∃x.∃y.∃z.x,y,z ∈ xs ∧ x ≠ y ≠ z ∧ f x y z = g x y z
requires writing another search procedure. And this means a considerable amount of boilerplate code. Of course, languages like Curry or Prolog that implement narrowing or a resolution engine allow the programmer to write statements like:
find(p,xs,z) = x ∈ xs & y ∈ xs & x =/= y & f x y =:= g x y =:= z
to abuse the notation considerably, which performs both a search and returns a value. This problem arises often when implementing formally specified algorithms, and is often solved by combinations of functions like fmap, foldr, and mapAccum, but mostly explicit recursion. Is there a more general and efficient, or just general and expressive, way to write code like this in Haskell?
There's a standard transformation that allows you to convert
∃x ∈ xs : P
to
exists (\x -> P) xs
If you need to produce a witness you can use find instead of exists.
The real nuisance of doing this kind of abstraction in Haskell as opposed to a logic language is that you really must pass the "universe" set xs as a parameter. I believe this is what brings in the "fuss" to which you refer in your title.
Of course you can, if you prefer, stuff the universal set (through which you are searching) into a monad. Then you can define your own versions of exists or find to work with the monadic state. To make it efficient, you can try Control.Monad.Logic, but it may involve breaking your head against Oleg's papers.
Anyway, the classic encoding is to replace all binding constructs, including existential and universal quantifiers, with lambdas, and proceed with appropriate function calls. My experience is that this encoding works even for complex nested queries with a lot of structure, but that it always feels clunky.
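A minimal sketch of that encoding for the two-variable query from the question (the names exists2 and witness2 are mine):
import Data.Maybe (listToMaybe)

-- ∃x ∈ xs. ∃y ∈ xs. x ≠ y ∧ p x y, with the universe passed explicitly
exists2 :: Eq a => (a -> a -> Bool) -> [a] -> Bool
exists2 p xs = any (\x -> any (\y -> x /= y && p x y) xs) xs

-- Witness-producing variant, in the spirit of find
witness2 :: Eq a => (a -> a -> Bool) -> [a] -> Maybe (a, a)
witness2 p xs = listToMaybe [ (x, y) | x <- xs, y <- xs, x /= y, p x y ]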
Maybe I don't understand something, but what's wrong with list comprehensions? Your second example becomes:
[(x,y,z) | x <- xs, y <- xs, z <- xs
, x /= y && y /= z && x /= z
, (p1 x y z) == (p2 x y z)]
This allows you to return values; to check if the formula is satisfied, just use null (it won't evaluate more than needed because of laziness).
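As a concrete sketch of that idea (the names solutions and satisfiable, and the signatures, are mine):
-- All witnesses for the three-variable query, plus a Boolean check that,
-- thanks to laziness, stops at the first witness found.
solutions :: (Eq a, Eq b) => (a -> a -> a -> b) -> (a -> a -> a -> b) -> [a] -> [(a, a, a)]
solutions p1 p2 xs =
  [ (x, y, z) | x <- xs, y <- xs, z <- xs
              , x /= y && y /= z && x /= z
              , p1 x y z == p2 x y z ]

satisfiable :: (Eq a, Eq b) => (a -> a -> a -> b) -> (a -> a -> a -> b) -> [a] -> Bool
satisfiable p1 p2 xs = not (null (solutions p1 p2 xs))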

Functional learning woes

I'm a beginner to functional languages, and I'm trying to get the whole thing down in Haskell. Here's a quick-and-dirty function that finds all the factors of a number:
factors :: (Integral a) => a -> [a]
factors x = filter (\z -> x `mod` z == 0) [2..x `div` 2]
Works fine, but I found it to be unbearably slow for large numbers. So I made myself a better one:
factorcalc :: (Integral a) => a -> a -> [a] -> [a]
factorcalc x y z
  | y `elem` z     = sort z
  | x `mod` y == 0 = factorcalc x (y+1) (z ++ [y] ++ [(x `div` y)])
  | otherwise      = factorcalc x (y+1) z
But here's my problem: Even though the code works, and can cut literally hours off the execution time of my programs, it's hideous!
It reeks of ugly imperative thinking: It constantly updates a counter and a data structure in a loop until it finishes. Since you can't change state in purely functional programming, I cheated by holding the data in the parameters, which the function simply passes to itself over and over again.
I may be wrong, but there simply must be a better way of doing the same thing...
Note that the original question asked for all the factors, not for only the prime factors. There being many fewer prime factors, they can probably be found more quickly. Perhaps that's what the OQ wanted. Perhaps not. But let's solve the original problem and put the "fun" back in "functional"!
Some observations:
The two functions don't produce the same output: if x is a perfect square, the second function includes the square root twice.
The first function checks a number of potential factors proportional to the size of x; the second checks only a number proportional to the square root of x, then stops (with the bug noted above).
The first function (factors) allocates a list of all integers from 2 to x div 2, whereas the second function never allocates a list but instead visits fewer integers one at a time in a parameter. I ran the optimizer with -O and looked at the output with -ddump-simpl, and GHC just isn't smart enough to optimize away those allocations.
factorcalc is tail-recursive, which means it compiles into a tight machine-code loop; filter is not and does not.
Some experiments show that the square root is the killer:
Here's a sample function that produces the factors of x from z down to 2:
factors_from x 1 = []
factors_from x z
  | x `mod` z == 0 = z : factors_from x (z-1)
  | otherwise      = factors_from x (z-1)

factors'' x = factors_from x (x `div` 2)
It's a bit faster because it doesn't allocate, but it's still not tail-recursive.
Here's a tail-recursive version that is more faithful to the original:
factors_from' x 1 l = l
factors_from' x z l
  | x `mod` z == 0 = factors_from' x (z-1) (z:l)
  | otherwise      = factors_from' x (z-1) l

factors''' x = factors_from' x (x `div` 2) []
This is still slower than factorcalc because it enumerates all the integers from 2 to x div 2, whereas factorcalc stops at the square root.
Armed with this knowledge, we can now create a more functional version of factorcalc which replicates both its speed and its bug:
factors'''' x = sort $ uncurry (++) $ unzip $ takeWhile (uncurry (<=)) $
                [ (z, x `div` z) | z <- [2..x], x `mod` z == 0 ]
I didn't time it exactly, but given 100 million as an input, both it and factorcalc terminate instantaneously, where the others all take a number of seconds.
How and why the function works is left as an exercise for the reader :-)
ADDENDUM: OK, to mitigate the eyeball bleeding, here's a slightly saner version (and without the bug):
saneFactors x = sort $ concat $ takeWhile small $
                [ pair z | z <- [2..], x `mod` z == 0 ]
  where pair z = if z * z == x then [z] else [z, x `div` z]
        small [z, z'] = z < z'
        small [z]     = True
Okay, take a deep breath. It'll be all right.
First of all, why is your first attempt slow? How is it spending its time?
Can you think of a recursive definition for the prime factorization that doesn't have that property?
(Hint.)
Firstly, although factorcalc is "ugly", you could add a wrapper function factors' x = factorcalc x 2 [], add a comment, and move on.
If you want to make a 'beautiful' factors fast, you need to find out why it is slow. Looking at your two functions, factors walks the list about n/2 elements long, but factorcalc stops after around sqrt n iterations.
Here is another factors that also stops after about sqrt n iterations, but uses a fold instead of explicit iteration. It also breaks the problem into three parts: finding the factors (factor); stopping at the square root of x (small) and then computing pairs of factors (factorize):
factors' :: (Integral a) => a -> [a]
factors' x = sort (foldl factorize [] (takeWhile small (filter factor [2..])))
  where
    factor z = x `mod` z == 0
    small z  = z <= (x `div` z)
    factorize acc z = z : (if z == y then acc else y : acc)
      where y = x `div` z
This is marginally faster than factorcalc on my machine. You can fuse factor and factorize and it is about twice as fast as factorcalc.
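For what it's worth, here is one way that fusion might look (my own sketch of the suggestion, not the answerer's code):
import Data.List (sort)

-- factor and factorize merged into a single pass over [2..sqrt x]
factorsFused :: (Integral a) => a -> [a]
factorsFused x = sort (foldl step [] (takeWhile (\z -> z <= x `div` z) [2 ..]))
  where
    step acc z
      | x `mod` z == 0 = let y = x `div` z
                         in z : (if z == y then acc else y : acc)
      | otherwise      = acc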
The Profiling and Optimization chapter of Real World Haskell is a good guide to the GHC suite's performance tools for tackling tougher performance problems.
By the way, I have a minor style nitpick with factorcalc: it is much more efficient to prepend single elements to the front of a list (O(1)) than it is to append to the end of a list of length n (O(n)). The lists of factors are typically small, so it is not such a big deal, but factorcalc should probably be something like:
factorcalc :: (Integral a) => a -> a -> [a] -> [a]
factorcalc x y z
  | y `elem` z     = sort z
  | x `mod` y == 0 = factorcalc x (y+1) (y : (x `div` y) : z)
  | otherwise      = factorcalc x (y+1) z
"Since you can't change state in purely functional programming, I cheated by holding the data in the parameters, which the function simply passes to itself over and over again."
Actually, this is not cheating; this is a—no, make that the—standard technique! That sort of parameter is usually known as an "accumulator," and it's generally hidden within a helper function that does the actual recursion after being set up by the function you're calling.
A common case is when you're doing list operations that depend on the previous data in the list. The two problems you need to solve are, where do you get the data about previous iterations, and how do you deal with the fact that your "working area of interest" for any particular iteration is actually at the tail of the result list you're building. For both of these, the accumulator comes to the rescue. For example, to generate a list where each element is the sum of all of the elements of the input list up to that point:
sums :: Num a => [a] -> [a]
sums inp = helper inp []
  where
    helper [] acc           = reverse acc
    helper (x:xs) []        = helper xs [x]
    helper (x:xs) acc@(h:_) = helper xs (x+h : acc)
Note that we flip the direction of the accumulator, so we can operate on the head of that, which is much more efficient (as Dominic mentions), and then we just reverse the final output.
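For example (assuming the definition above), sums [1, 2, 3, 4] evaluates to [1, 3, 6, 10].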
By the way, I found reading The Little Schemer to be a useful introduction and offer good practice in thinking recursively.
This seemed like an interesting problem, and I hadn't coded any real Haskell in a while, so I gave it a crack. I've run both it and Norman's factors'''' against the same values, and it feels like mine's faster, though they're both so close that it's hard to tell.
factors :: Int -> [Int]
factors n = firstFactors ++ reverse [ n `div` i | i <- firstFactors ]
  where
    firstFactors = filter (\i -> n `mod` i == 0) (takeWhile (\i -> i * i <= n) [2..n])
Factors can be paired up into those that are greater than sqrt n and those that are less than or equal to it (for simplicity's sake, the exact square root, if n is a perfect square, falls into the latter category). So if we just take the ones that are less than or equal to sqrt n, we can calculate the others later by doing n `div` i. They'll be in reverse order, so we can either reverse firstFactors first or reverse the result later. It doesn't really matter.
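For example (a quick check of my own), factors 50 gives [2,5,10,25]: firstFactors is [2,5] and the mirrored cofactors are [10,25].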
This is my "functional" approach to the problem. ("Functional" in quotes, because I'd approach this problem the same way even in non-functional languages, but maybe that's because I've been tainted by Haskell.)
{-# LANGUAGE PatternGuards #-}
factors :: (Integral a) => a -> [a]
factors = multiplyFactors . primeFactors primes 0 [] . abs where
    multiplyFactors [] = [1]
    multiplyFactors ((p, n) : factors) =
        [ pn * x
        | pn <- take (succ n) $ iterate (* p) 1
        , x <- multiplyFactors factors ]

    primeFactors _ _ _ 0 = error "Can't factor 0"
    primeFactors (p:primes) n list x
        | (x', 0) <- x `divMod` p
        = primeFactors (p:primes) (succ n) list x'
    primeFactors _ 0 list 1 = list
    primeFactors (_:primes) 0 list x = primeFactors primes 0 list x
    primeFactors (p:primes) n list x
        = primeFactors primes 0 ((p, n) : list) x

    primes = sieve [2..]
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
primes is the naive Sieve of Eratosthenes. There's better, but this is the shortest method.
sieve [2..]
=> 2 : sieve [x | x <- [3..], x `mod` 2 /= 0]
=> 2 : 3 : sieve [x | x <- [4..], x `mod` 2 /= 0, x `mod` 3 /= 0]
=> 2 : 3 : sieve [x | x <- [5..], x `mod` 2 /= 0, x `mod` 3 /= 0]
=> 2 : 3 : 5 : ...
primeFactors is the simple repeated trial-division algorithm: it walks through the list of primes, and tries dividing the given number by each, recording the factors as it goes.
primeFactors (2:_) 0 [] 50
=> primeFactors (2:_) 1 [] 25
=> primeFactors (3:_) 0 [(2, 1)] 25
=> primeFactors (5:_) 0 [(2, 1)] 25
=> primeFactors (5:_) 1 [(2, 1)] 5
=> primeFactors (5:_) 2 [(2, 1)] 1
=> primeFactors _ 0 [(5, 2), (2, 1)] 1
=> [(5, 2), (2, 1)]
multiplyFactors takes a list of primes and powers, and explodes it back out to a full list of factors.
multiplyFactors [(5, 2), (2, 1)]
=> [ pn * x
   | pn <- take (succ 2) $ iterate (* 5) 1
   , x <- multiplyFactors [(2, 1)] ]
=> [ pn * x | pn <- [1, 5, 25], x <- [1, 2] ]
=> [1, 2, 5, 10, 25, 50]
factors just strings these two functions together, along with an abs to prevent infinite recursion in case the input is negative.
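For example (my own check), factors 50 here evaluates to [1, 2, 5, 10, 25, 50]; unlike the earlier definitions, this one includes 1 and the number itself among the factors.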
I don't know much about Haskell, but somehow I think this link is appropriate:
http://www.willamette.edu/~fruehr/haskell/evolution.html
Edit: I'm not entirely sure why people are so aggressive about the downvoting on this. The original poster's real problem was that the code was ugly; while it's funny, the point of the linked article is, to some extent, that advanced Haskell code is, in fact, ugly; the more you learn, the uglier your code gets, to some extent. The point of this answer was to point out to the OP that apparently, the ugliness of the code that he was lamenting is not uncommon.
