When I use the following code in Racket:
#lang racket
(define (sieve x)
  (if (stream-empty? x)
      empty-stream
      (stream-cons (stream-first x)
                   (sieve (stream-filter (λ (q) (not (zero? (modulo q (stream-first x)))))
                                         (stream-rest x))))))
(define (in-primes)
  (sieve (in-naturals 2)))

(define (nth-prime n)
  (for/last ([i (in-range n)]
             [j (in-primes)])
    j))
The largest number for which I can effectively compute the nth prime is 1000. Is this a reasonable implementation of the sieve of Eratosthenes, or is there something I can do to significantly speed up the above?
No, it's not. It's a trial division algorithm, and an extremely inefficient and suboptimal one.
Each candidate here is tested by all its preceding primes, whereas just those not greater than its square root are enough. This translates to an immense worsening of complexity: I expect it runs at ~ n^2 at best, in n primes produced, instead of the ~ n^1.45 of an optimal trial division sieve, or the ~ n^1.1 of a proper sieve of Eratosthenes implementation.
The creation of filters should be postponed until a prime's square is seen among the candidates, to make it an optimal trial division algorithm.
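For concreteness, a sketch of what that postponement can look like in Racket (the names are mine, not the answer's): each prime p only spawns its filter once p*p shows up among the candidates.
(define primes
  (stream-cons 2 (sieve-postponed (in-naturals 3) primes)))

(define (sieve-postponed xs ps)
  (define p (stream-first ps))
  (define q (* p p))                 ; postpone p's filter until p^2 is reached
  (let loop ([xs xs])
    (define x (stream-first xs))
    (if (< x q)
        (stream-cons x (loop (stream-rest xs)))  ; prime by construction
        (sieve-postponed
         (stream-filter (λ (n) (not (zero? (modulo n p))))
                        (stream-rest xs))        ; x = p^2 is dropped here
         (stream-rest ps)))))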
You can greatly improve your code's performance with a minimal edit, following the principle of "do less, get done more": instead of calling stream-first at each step, don't. Just produce the intermediate streams in full, as they are:
(define (sieves x)
  (if (stream-empty? x)
      empty-stream
      (stream-cons x              ; <-- here
                   (sieves (stream-filter
                            (λ (q) (not (zero? (modulo q (stream-first x)))))
                            (stream-rest x))))))
Now sieves produces a stream of streams. In each interim stream, all the numbers in the initial prefix, from the first value up to its square, are prime by construction. This lets us stop early and thus drastically reduce the number of interim streams.
To actually produce the primes, take the first element from each interim stream except the last, and from the last interim stream take all the elements, from its first element up to that element's square (or up to the desired upper limit, when that limit is below the square). This has roughly the same overall time complexity as the optimal trial division sieve (which, at each step, takes away not just the head element of the current interim stream but the whole prefix up to the head's square, so the next filter starts from there).
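Here is one way to code that extraction, as a sketch (primes-up-to and the loop shape are mine, not part of the original answer):
(define (primes-up-to limit)
  (let loop ([ss (sieves (in-naturals 2))] [acc '()])
    (define s (stream-first ss))   ; current interim stream
    (define p (stream-first s))    ; its head is prime
    (if (> (* p p) limit)
        ; every remaining candidate up to the limit (< p^2) is prime
        (append (reverse acc)
                (for/list ([q (in-stream s)] #:break (> q limit)) q))
        (loop (stream-rest ss) (cons p acc)))))

(primes-up-to 30) ; => '(2 3 5 7 11 13 17 19 23 29)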
To estimate the magnitude of the n-th prime, you can use the formula p_n < n * log(n * log(n)), valid for n ≥ 6 (according to Wikipedia).
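As a quick sketch (log here is the natural logarithm; nth-prime-bound is my name for it):
; Upper bound on the n-th prime, valid for n >= 6.
(define (nth-prime-bound n)
  (* n (log (* n (log n)))))

(nth-prime-bound 1000) ; ≈ 8840.4, while the 1000th prime is 7919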
You can find stream-based SoE code here, though in Scheme, not Racket.
see also:
From Turner's sieve to Bird's -- Haskell gist
How do I define the sieve function for prime computation using higher-order functions?
Section 2.2.4 here contains the following:
2.2.4 Totally Inappropriate Data Structures
Some might find this example hard to believe. This really occurred in some code I’ve seen:
(defun make-matrix (n m)
  (let ((matrix ()))
    (dotimes (i n matrix)
      (push (make-list m) matrix))))

(defun add-matrix (m1 m2)
  (let ((l1 (length m1))
        (l2 (length m2)))
    (let ((matrix (make-matrix l1 l2)))
      (dotimes (i l1 matrix)
        (dotimes (j l2)
          (setf (nth i (nth j matrix))
                (+ (nth i (nth j m1))
                   (nth i (nth j m2)))))))))
What’s worse is that in the particular application, the matrices were all fixed size, and matrix arithmetic would have been just as fast in Lisp as in FORTRAN.
This example is bitterly sad: The code is absolutely beautiful, but it adds matrices slowly. Therefore it is excellent prototype code and lousy production code. You know, you cannot write production code as bad as this in C.
Clearly, the author thinks that something is fundamentally wrong with the data structures used in this code. On a technical level, what has gone wrong? I worry that this question might be opinion-based, but the author's phrasing suggests that there is an objectively correct and very obvious answer.
Lisp lists are singly-linked. Random access to an element (via nth) requires traversing all predecessors. The storage is likewise wasteful for this purpose. Working with matrices this way is very inefficient.
Lisp has built-in multidimensional arrays, so a natural fit for this problem would be a two-dimensional array, which can access elements directly instead of traversing links.
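For comparison, here is the same idea sketched in Racket (the dialect used in the other questions on this page) with vectors standing in for Lisp's arrays. The names are mine; the point is only that element access is constant-time, so adding two n×m matrices costs O(nm) instead of O(nm(n+m)):
; Build an n-by-m matrix as a vector of row vectors.
(define (make-matrix n m)
  (build-vector n (λ (_) (make-vector m 0))))

; Constant-time access: independent of i and j.
(define (matrix-ref mat i j)
  (vector-ref (vector-ref mat i) j))

(define (add-matrix m1 m2)
  (define rows (vector-length m1))
  (define cols (vector-length (vector-ref m1 0)))
  (build-vector
   rows
   (λ (i) (build-vector
           cols
           (λ (j) (+ (matrix-ref m1 i j) (matrix-ref m2 i j)))))))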
There's a strong assumption in numerical code that access to elements of matrices, or more generally arrays, is approximately constant-time. The time taken for a[n, m] should not depend on n and m. That's hopelessly untrue in this case, since, given the obvious definition of matrix-ref:
(defun matrix-ref (matrix n m)
  (nth m (nth n matrix)))
then, since nth takes time proportional to its first argument (more generally: accessing the nth element of a Lisp list takes time proportional to n+1, counting from zero), the time taken by matrix-ref is proportional to the sum of the two indices (strictly, to the sum of the two (index + 1) values, but that does not matter).
This means that, for instance, almost any algorithm involving matrices will move up a time complexity class. That's bad.
A list-based matrix is slow for products, as described above. However, it's good for teaching: you can build a matrix library with very little knowledge of Lisp and with fewer bugs. I built such a basic matrix library while reading "Neural Network Design"; see the code on GitHub: https://github.com/hxzrx/nnd/blob/master/linear-algebra.lisp.
The summation procedure of section 1.3.1 of SICP produces a linear recursive process with order of N space and time complexity. The code for this procedure is:
(define (sum-integers a b)
  (if (< a b)
      0
      (+ a (sum-integers (+ a 1) b))))
What I would like to know is, if I decided that I want to sum a range of Fibonacci numbers using the analogous procedure:
(define (sum-fib a b)
  (if (< a b)
      0
      (+ (fib a) (sum-fib (+ a 1) b))))
with fib defined as:
(define (fib n)
  (cond ((= n 0) 0)
        ((= n 1) 1)
        (else (+ (fib (- n 1))
                 (fib (- n 2))))))
How would I analyse the space and time complexity of sum-fib? Would I ignore the linear recursive flavor of the overall procedure and prioritize the tree recursion of fib within it as a worst case scenario? Would I have to somehow combine the space/time complexities of fib and sum-fib, and if so, how? Also, say I got sum-fib from another programmer and I was using it as a component in a larger system. If my program slowed down because of how fib was implemented, how would I know?
This is my first question on this platform, so please also advise on how to better post and find answers to questions. Your contribution is appreciated.
There is a slight error in your code. After checking SICP, I am assuming you meant to use a > instead of a < in both sum-integers and sum-fib. That is the only modification I made; please correct me if it was done erroneously.
Note: I do not have a formal background, but this question has been unanswered for quite a while, so I thought I would share my thoughts for anyone else who happens across it.
Time
When dealing with the time complexity, we care about how many iterations are performed as n grows larger. Here, we can assume n to be the distance between a and b (inclusive) in sum-fib. The function sum-fib itself will only recurse n times in this case. If a was 0 and b was 9, then the function will run 10 times. This is completely linear, or O(n), but it isn't so simple: the next question to ask is what happens for each of these iterations?
We know that the summation part is linear, so all that's left is the Fibonacci function. Inside, you see that it either immediately terminates (O(1)) or branches off into two recursive calls to itself. Big-O notation is concerned with the worst case, meaning the branch. We'll have 1 call turn into 2, which turn into 4, which turn into 8, and so on, n times. This behavior is O(2^n).
Don't forget that this is called n times as part of the overarching O(n) summation loop, so the total function will be O(n(2^n)).
Space
The space requirements of a function are a bit different. By writing out what's going on by hand, you can start to see the shape of the function form. This is what is shown early on in SICP, where a "pyramid" function is compared to a linear one.
One thing to keep in mind is that Scheme is tail-call optimized. This means that, if a recursive call is at the end of a function (meaning that there are no instructions which take place after the recursive call), then the frame can be reused, and no extra space is required. For example:
(define (loop n)
  (if (> n 2)
      0
      (loop (+ n 1))))
Drawing out (loop 0) would be:
(loop 0)
(loop 1)
(loop 2)
(loop 3)
0
You can see that the space required is constant: each call reuses the previous call's frame. Compare this to:
(define (loop n)
  (if (> n 2)
      0
      (+ n (loop (+ n 1)))))
With (loop 0):
(loop 0)
(0 + (loop 1))
(0 + (1 + (loop 2)))
(0 + (1 + (2 + (loop 3))))
(0 + (1 + (2 + 0)))
(0 + (1 + 2))
(0 + 3)
3
You can see that the space required grows as the number of iterations required grows in this case.
In your case, the time required increases dramatically as n increases, but the space is better behaved than you might fear. Neither sum-fib nor fib is tail-recursive, so both consume stack. The summation contributes a stack depth of O(n). Each call to fib generates a full tree of calls, but only one branch of that tree is live at any given moment, so the space a (fib n) call needs is proportional to the depth of the tree, which is O(n); SICP makes this point in its discussion of tree recursion (space grows with the depth of the tree, time with the number of nodes). Combining them, the deepest the stack ever gets is O(n), even though the total time is O(n(2^n)).
How to Test for Slow Functions
What you are looking for is called a profiler. It will watch your code while it runs and report back to you with information on which functions took the most time, which functions were called most often, and so on. For Scheme, DrRacket is an IDE with a built-in profiler.
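Racket also ships a profile library you can call directly; a minimal sketch (assuming the sum-fib and fib definitions from the question are loaded):
(require profile)

; Run a sample call under the profiler; the printed report shows
; how much time each function accounted for (fib will dominate here).
(profile-thunk (λ () (sum-fib 1 25)))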
A word of advice: get your software working first, then worry about profiling and optimization. Many programmers get stuck hyper-optimizing their code without first finishing it to see where the true bottlenecks lie. You can spend weeks gaining a 1% performance boost with arcane algorithms when it turns out that a 5-minute tweak could net you a 50% boost.
I'm looking to retrieve the fractional part of a number. I.e., if I'm given the number 3.14, I need the output to be 0.14 or 14.
And I need to do this without using the built-in functions round, floor, or ceiling.
For a sneaky solution, you can use regular expressions to crop off everything before the dot:
(define (fraction-only num)
  ; keep only the part from the dot onward, e.g. "3.14" -> ".14"
  (string->number (regexp-replace #rx".*(\\..*)" (number->string num) "\\1")))
A simple solution (provided you can use truncate) is something like this:
(define (fract-part n)
  (- n (truncate n)))
It will have the usual floating-point math rounding errors though, so (fract-part 3.14) returns 0.14000000000000012
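If exact input is an option, the same definition sidesteps the rounding issue, since truncate and subtraction stay exact on Racket's rationals (a usage sketch):
(fract-part 314/100) ; => 7/50, i.e. exactly 0.14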
I need to write a basic scheme procedure that can find the median of a list and another for the mean.
This is what I've come up with so far:
Mean:
(define (mean lst)
  (if (null? lst) ()
      (+ 1 (car lst) (mean (cdr lst)))))
I know I need to divide by the length somewhere but am not sure how to do so. My thought process is to add up each element of the list and then divide by the length of the list?
Median:
I'm not sure where to start for median. I know I need to determine whether the list has an odd or even number of elements, so to do that I've come up with
(define (median lst)
  (if (integer? (/ (length lst) 2)) ; which is the one for even
I don't know if I need another procedure to get me to the middle of the list?
The median procedure was already discussed here.
Calculating the mean is simple: just add all the elements and divide by the length of the list. The only special case to take care of is the empty list (which would lead to a division by zero, since its length is zero); return an appropriate value indicating this.
By now you should definitely know how to add all the elements in a list, check with your instructor in case of doubts, but it's a basic operation, it shouldn't be a problem.
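For reference, a minimal sketch of that recipe (returning #f for the empty list is just one choice of "appropriate value"; your instructor may prefer signaling an error):
(define (mean lst)
  (if (null? lst)
      #f ; empty list: no mean to report
      (/ (apply + lst) (length lst))))

(mean '(1 2 3 4)) ; => 5/2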
I made a simple factorial program in Clojure.
(defn fac [x y]
  (if (= x 1) y (recur (- x 1) (* y x))))

(defn fact [n] (fac n 1))
How can it be done faster, if it can be done in some faster way?
You can find many fast factorial algorithms here: http://www.luschny.de/math/factorial/FastFactorialFunctions.htm
As commented above, Clojure is not the best language for that. Consider using C, C++, or Fortran.
Be careful with the data structures that you use, because factorials grow really fast.
Here is my favorite:
(defn fact [n] (reduce *' (range 1 (inc n))))
The ' tells Clojure to use BigInteger transparently so as to avoid overflow.
With the help of your own fact function (or any other), we can define this extremely fast version:
(def fact* (mapv fact (cons 1 (range 1 21))))
This will give the right results for arguments in the range from 1 to 20 in constant time. Beyond that range, your version doesn't give correct results either (i.e. there's an integer overflow with (fact 21)).
EDIT: Here's an improved implementation that doesn't need another fact implementation, does not overflow and should be much faster during definition because it doesn't compute each entry in its lookup table from scratch:
(def fact
  (persistent!
   (reduce (fn [v n] (conj! v (*' (v n) (inc n))))
           (transient [1])
           (range 1000))))
EDIT 2: For a different fast solution, i.e. without building up a lookup table, it's probably best to use a library that's already highly optimized. Google's general utility library Guava includes a factorial implementation.
Add it to your project with this Leiningen dependency: [com.google.guava/guava "15.0"]. Then (import com.google.common.math.BigIntegerMath) and call it with (BigIntegerMath/factorial n).