In clojure[script], how to return nearest elements between 2 sorted vectors - algorithm

In Clojure[Script], how would you write a function nearest that receives two sorted vectors a and b, and returns, for each element of a, the nearest element of b?
As an example,
(nearest [1 2 3 101 102 103] [0 100 1000]) ; => [0 0 0 100 100 100]
I would like the solution to be both idiomatic and to have good performance: O(n^2) is not acceptable!

Using a binary search or a sorted-set incurs O(n log m) time complexity, where n is (count a) and m is (count b).
However, leveraging the fact that a and b are both sorted, the time complexity can be O(max(n, m)).
(defn nearest [a b]
  (if-let [more-b (next b)]
    (let [[x y] b
          m (/ (+ x y) 2)
          [<=m >m] (split-with #(<= % m) a)]
      (lazy-cat (repeat (count <=m) x)
                (nearest >m more-b)))
    (repeat (count a) (first b))))
=> (nearest [1 2 3 101 102 103 501 601] [0 100 1000])
(0 0 0 100 100 100 100 1000)
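If the laziness isn't needed, the same midpoint-splitting idea can be written eagerly with loop/recur. A minimal sketch (nearest-eager is a hypothetical name, not from the answer), returning a vector instead of a lazy seq:

(defn nearest-eager [a b]
  (loop [a (seq a) b b acc []]
    (if-let [more-b (next b)]
      (let [[x y] b
            m (/ (+ x y) 2)
            [le gt] (split-with #(<= % m) a)]
        ;; everything at or below the midpoint is nearest to x
        (recur gt more-b (into acc (repeat (count le) x))))
      (into acc (repeat (count a) (first b))))))

(nearest-eager [1 2 3 101 102 103] [0 100 1000]) ;; => [0 0 0 100 100 100]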

Let n be (count a) and m be (count b). If a and b are both ordered, this can be done in what I believe ought to be O(n log(log m)) time, in other words, very close to linear in n.
First, let's re-implement abs and a binary search so they are independent of the host (leveraging a native version, e.g. Java's, ought to be significantly faster):
(defn abs [x]
  (if (neg? x) (- 0 x) x))

(defn binary-search-nearest-index [v x]
  (if (> (count v) 1)
    (loop [lo 0 hi (dec (count v))]
      (if (== hi (inc lo))
        (min-key #(abs (- x (v %))) lo hi)
        (let [m (quot (+ hi lo) 2)]
          (case (compare (v m) x)
            1  (recur lo m)
            -1 (recur m hi)
            0  m))))
    0))
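A quick sanity check, using the b vector from the question (my example):

(binary-search-nearest-index [0 100 1000] 102) ;; => 1, the index of 100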
If b is sorted, a binary search in b takes log m steps. So mapping this over a is an O(n log m) solution, which for the pragmatist is likely good enough.
(defn nearest* [a b]
  (map #(b (binary-search-nearest-index b %)) a))
However, we can also use the fact that a is sorted to divide and conquer a.
(defn nearest [a b]
  (if (< (count a) 3)
    (nearest* a b)
    (let [m (quot (count a) 2)
          i (binary-search-nearest-index b (a m))]
      (lazy-cat
        (nearest (subvec a 0 m) (subvec b 0 (inc i)))
        [(b i)]
        (nearest (subvec a (inc m)) (subvec b i))))))
I believe this ought to be O(n log(log m)). We start with the median of a and find its nearest element in b in log m time. Then we recurse on each half of a with the corresponding split portion of b. If an m-proportional fraction of b is split off each time, you get O(n log(log m)). If only a constant-size portion is split off, then the half of a working on that portion runs in linear time; if that continues (successive halves of a working on constant-size portions of b), the whole thing is O(n). Checking against the question's example:
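(nearest [1 2 3 101 102 103] [0 100 1000]) ;; => (0 0 0 100 100 100)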

Inspired by @amalloy, I found this interesting idea by Chouser and wrote this solution:
(defn abs [x]
  (max x (- x)))

(defn nearest-of-ss [ss x]
  (let [greater (first (subseq ss >= x))
        smaller (first (rsubseq ss <= x))]
    (apply min-key #(abs (- % x)) (remove nil? [greater smaller]))))

(defn nearest [a b]
  (map (partial nearest-of-ss (apply sorted-set b)) a))
Remark: it's important to create the sorted-set only once (hence the partial), in order to avoid a performance penalty!
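With the sorted set built once and shared via partial, the question's example works out:

(nearest [1 2 3 101 102 103] [0 100 1000]) ;; => (0 0 0 100 100 100)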

Related

Finding primes up to a certain number in Racket

I'm learning Racket (with the HtDP course) and this is my first shot at a program in a functional language.
I've tried to design a function that finds all primes under a certain input n using (what I think is) a functional approach to the problem, but the program can get really slow (86 seconds for 100,000, while my quickly written Python, C and C++ solutions take just a couple of seconds).
The following is the code:
;; Natural Natural -> Boolean
;; Helper function to avoid writing the mouthful (= 0 (modulo na nb))
(define (divisible na nb) (= 0 (modulo na nb)))

;; Natural ListOfNatural -> Boolean
;; n is the number to check, lop is ALL the prime numbers less than n
(define (is-prime? n lop)
  (cond [(empty? lop) true]
        [(divisible n (first lop)) false]
        [else (is-prime? n (rest lop))]))

;; Natural -> ListOfNatural
(define (find-primes n)
  (if (= n 2)
      (list 2)
      (local [(define LOP (find-primes (sub1 n)))]
        (if (is-prime? n LOP)
            (append LOP (list n))
            LOP))))

(time (find-primes 100000))
I'm using the divisible function instead of just writing (= 0 (modulo na nb)) inline because I really like having separate functions when they could be of use in another part of the program. I also should probably define is-prime? inside find-primes, since no one will ever call is-prime? on a number while also supplying all the prime numbers less than that number.
Any pointers on how to improve this?
Here are some ideas for improving the performance; with them, the procedure now returns in under two seconds for n = 100000.
(define (is-prime? n lop)
  (define sqrtn (sqrt n))
  (if (not (or (= (modulo n 6) 1) (= (modulo n 6) 5)))
      false
      (let loop ([lop lop])
        (cond [(or (empty? lop) (< sqrtn (first lop))) true]
              [(zero? (modulo n (first lop))) false]
              [else (loop (rest lop))]))))

(define (find-primes n)
  (cond [(<= n 1) '()]
        [(= n 2) '(2)]
        [(= n 3) '(2 3)]
        [else
         (let loop ([lop '(2 3)] [i 5])
           (cond [(> i n) lop]
                 [(is-prime? i lop) (loop (append lop (list i)) (+ i 2))]
                 [else (loop lop (+ i 2))]))]))
Some of the optimizations are language-related, others algorithmic:
The recursion was converted to be in tail position. This way the recursive call is the last thing done at each step, with nothing left to do after it, so the compiler can optimize it to be as efficient as a loop in other programming languages.
The loop in find-primes now iterates over odd numbers only. Note that we go up from 3 to n instead of down from n to 2.
divisible was inlined and (sqrt n) is calculated only once.
is-prime? only checks divisors up to sqrt(n); it makes no sense to look beyond that. This is the most important optimization: instead of O(n), testing each candidate is now O(sqrt(n)).
Following @law-of-fives's advice, is-prime? now skips the check when n is not congruent to 1 or 5 modulo 6.
Also, normally I'd recommend building the list with cons instead of append, but in this case we need the primes list to be constructed in ascending order for the most important optimization in is-prime? to work (for comparison, a Clojure sketch of the same approach follows below).
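Roughly the same sqrt-bounded, odd-only trial division in Clojure (my translation, not part of the original answer; the names are mine). Clojure vectors conj onto the end in O(1), so the ascending order comes for free, without append:

(defn prime? [n ps]
  ;; trial division by the primes in ps, but only those up to (sqrt n)
  (let [sqrtn (Math/sqrt n)]
    (loop [ps (seq ps)]
      (cond (or (nil? ps) (< sqrtn (first ps))) true
            (zero? (rem n (first ps))) false
            :else (recur (next ps))))))

(defn find-primes [n]
  (if (< n 2)
    []
    (loop [primes [2] i 3]   ; odd candidates only
      (if (> i n)
        primes
        (recur (if (prime? i primes) (conj primes i) primes)
               (+ i 2))))))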
Here's Óscar López's code, tweaked to build the list in a top-down manner:

(define (is-prime? n lop)
  (define sqrtn (sqrt n))
  (let loop ([lop lop])
    (cond [(or (empty? lop) (< sqrtn (mcar lop))) true]
          [(zero? (modulo n (mcar lop))) false]
          [else (loop (mcdr lop))])))

(define (find-primes n)
  (let* ([a (mcons 3 '())]
         [b (mcons 2 a)])         ; b is the head of the full list (2 3 ...)
    (let loop ([p a] [i 5] [d 2]  ; d = diff +2 +4 +2 ...
               [c 2])             ; c = count of primes found
      (cond [(> i n) c]
            [(is-prime? i (mcdr a))
             (set-mcdr! p (mcons i '()))
             (loop (mcdr p) (+ i d) (- 6 d) (+ c 1))]
            [else (loop p (+ i d) (- 6 d) c)]))))
Runs at about ~n^1.25..1.32, empirically, compared to the original's ~n^1.8..1.9 in the measured range, inside DrRacket (append is the culprit of that bad behaviour). The "under two seconds" for 100K turns into under 0.05 seconds, and two seconds gets you well above 1M (one million):
; (time (length (find-primes 100000)))   ; with cons; times in milliseconds
; 10K 156 ; 20K 437 ; 40K 1607 ; 80K 5241 ; 100K 7753 .... n^1.8-1.9-1.7  OP's
; 10K 62  ; 20K 109 ; 40K 421  ; 80K 1217 ; 100K 2293 .... n^1.8-1.9     Óscar's
; mcons:
(time (find-primes 2000000))
; 100K 47 ; 200K 172 ; 1M 1186 ; 2M 2839 ; 3M 4851 ; 4M 7036 .... n^1.25-1.32  this
; 9592     17984      78498     148933    216816    283146   ; the prime counts returned
It's still just a trial division though... :) The sieve of Eratosthenes will be much faster yet.
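To give an idea of what that looks like, here is a minimal eager sieve sketch in Clojure (my sketch, not from this answer): mark the composites in a boolean array, then keep whatever is unmarked.

(defn sieve-primes [n]
  (let [^booleans composite? (boolean-array (inc n))]
    (doseq [p (range 2 (inc (long (Math/sqrt n))))
            :when (not (aget composite? p))]
      ;; p is prime: mark its multiples, starting from p*p
      (doseq [k (range (* p p) (inc n) p)]
        (aset composite? k true)))
    (remove #(aget composite? %) (range 2 (inc n)))))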
edit: As for set-cdr!, it is easy to emulate any lazy algorithm with it... Short of that, we could use extendable arrays (lists of them...) for the amortized O(1) snoc/append1 operation (that's lots and lots of coding); or we can maintain the list of primes split in two (three, actually; see the code below), building the second portion in reverse with cons and appending it, reversed, to the first portion only every so often (specifically, judging the need by the next prime's square):
; times: ; 2M 1934 ; 3M 3260 ; 4M 4665 ; 6M 8081 .... n^1.30

;; find primes up to and including n, n > 2
(define (find-primes n)
  (let loop ([k 5] [q 9]          ; next candidate; square of (car LOP2)
             [LOP1 (list 2)]      ; primes to test by
             [LOP2 (list 3)]      ; more primes
             [LOP3 (list)])       ; even more primes, in reverse
    (cond [(> k n)
           (append LOP1 LOP2 (reverse LOP3))]
          [(= k q)
           (if (null? (cdr LOP2))
               (loop k q LOP1 (append LOP2 (reverse LOP3)) (list))
               (loop (+ k 2)
                     (* (cadr LOP2) (cadr LOP2))   ; next prime's square
                     (append LOP1 (list (car LOP2)))
                     (cdr LOP2) LOP3))]
          [(is-prime? k (cdr LOP1))
           (loop (+ k 2) q LOP1 LOP2 (cons k LOP3))]
          [else
           (loop (+ k 2) q LOP1 LOP2 LOP3)])))

;; n is the number to check, lop is the list of primes to check it by
(define (is-prime? n lop)
  (cond [(null? lop) #t]
        [(divisible n (car lop)) #f]
        [else (is-prime? n (cdr lop))]))
edit2: The easiest and simplest fix though, closest to your code, was to decouple the calculation of the resulting list of primes from that of the list of primes to check divisibility by. In your

(local [(define LOP (find-primes (sub1 n)))]
  (if (is-prime? n LOP)

LOP is used as the list of primes to check by, and it is reused as part of the result list in

      (append LOP (list n))
      LOP))))

immediately afterwards. Breaking this entanglement enables us to stop the generation of the testing-primes list at the sqrt of the upper limit, which gives us:
; times: ; 1M 1076 ; 2M 2621 ; 3M 4664 ; 4M 6693
;           n^1.28    n^1.33   n^1.32
(define (find-primes n)
  (cond
    ((<= n 4) (list 2 3))
    (else
     (let* ([LOP (find-primes (inexact->exact (floor (sqrt n))))]
            [lp (last LOP)])
       (local [(define (primes k ps)
                 (if (<= k lp)
                     (append LOP ps)
                     (primes (- k 2) (if (is-prime? k LOP)
                                         (cons k ps)
                                         ps))))]
         (primes (if (> (modulo n 2) 0) n (- n 1)) '()))))))
It too uses the same is-prime? code as in the question, unaltered, as does the second variant above.
It is slower than the 2nd variant. The algorithmic reason is clear: it tests all numbers from sqrt(n) to n by the same list of primes, all smaller than or equal to sqrt(n); but when testing a given candidate k < n it is enough to use only those primes not greater than sqrt(k), not sqrt(n). Still, it is the closest to your original code.
For comparison, in Haskell-like syntax, under strict evaluation,
isPrime n lop = null [() | p <- lop, rem n p == 0]

-- OP:
findprimes 2 = [2]
findprimes n = lop ++ [n | isPrime n lop]
        where lop = findprimes (n-1)
    = lop ++ [n | n <- [q+1..n], isPrime n lop]
        where lop = findprimes q ; q = (n-1)

-- 3rd:
findprimes n | n < 5 = [2,3]
findprimes n = lop ++ [n | n <- [q+1..n], isPrime n lop]
        where lop = findprimes q
              q   = floor $ sqrt $ fromIntegral n

-- 2nd:
findprimes n = g 5 9 [2] [3] []
  where
    g k q a b c
      | k > n                 = a ++ b ++ reverse c
      | k == q, [h] <- b      = g k     q     a          (h : reverse c) []
      | k == q, (h:p:ps) <- b = g (k+2) (p*p) (a ++ [h]) (p : ps)        c
      | isPrime k a           = g (k+2) q     a          b               (k : c)
      | otherwise             = g (k+2) q     a          b               c
The b and c together (that is, LOP2 and LOP3 in the Scheme code) actually constitute a pure functional queue à la Okasaki, from which sequential primes are taken and appended to the end of the maintained primes prefix a (i.e. LOP1) now and again, whenever the next prime's square is passed, so that a can be used in the primality testing by isPrime.
Because this appending happens so rarely, its computational inefficiency has no impact on the time complexity of the code overall.

How to go about composing core functions, rather than using imperative style?

I have translated this code, the snippet below, from Python to Clojure. I replaced Python's while construct with Clojure's loop-recur here, but the result doesn't look idiomatic.
(loop [d 2 [n & more] (list 256)]
  (if (> n 1)
    (recur (inc d)
           (loop [x n sublist more]
             (if (= (rem x d) 0)
               (recur (/ x d) (conj sublist d))
               (conj sublist x))))
    (sort more)))
This routine gives me (3 3 31), that is, the prime factors of 279. For 256 it gives (2 2 2 2 2 2 2 2), i.e. 2^8.
Moreover, it performs poorly for large values, say 987654123987546 instead of 279, whereas Python's counterpart works like a charm.
How do I start composing core functions, rather than translating imperative code as-is? And specifically, how can this bit be improved?
Thanks.
[Edited]
Here is the Python code I referred to above:
def prime_factors(n):
    factors = []
    d = 2
    while n > 1:
        while n % d == 0:
            factors.append(d)
            n /= d
        d = d + 1
    return factors
A straight translation of the Python code in Clojure would be:
(defn prime-factors [n]
  (let [n (atom n)          ;; The Python code makes use of mutability, which
        factors (atom [])   ;; isn't idiomatic in Clojure, but can be emulated
        d (atom 2)]         ;; using atoms
    (loop []
      (when (< 1 @n)
        (loop []
          (when (== (rem @n @d) 0)
            (swap! factors conj @d)
            (swap! n quot @d)
            (recur)))
        (swap! d inc)
        (recur)))
    @factors))

(prime-factors 279)              ;; => [3 3 31]
(prime-factors 987654123987546)  ;; => [2 3 41 14389 279022459]
(time (prime-factors 987654123987546))  ;; "Elapsed time: 13993.984 msecs"
                                        ;; same performance on my machine
                                        ;; as the Rosetta Code solution
You can improve this code to make it more idiomatic:
from nested loops to a single loop:
(loop []
  (cond
    (<= @n 1) @factors
    (not= (rem @n @d) 0) (do (swap! d inc)
                             (recur))
    :else (do (swap! factors conj @d)
              (swap! n quot @d)
              (recur))))
get rid of the atoms:
(defn prime-factors [n]
  (loop [n n
         factors []
         d 2]
    (cond
      (<= n 1) factors
      (not= (rem n d) 0) (recur n factors (inc d))
      :else (recur (quot n d) (conj factors d) d))))
replace == 0 by zero?:
(not (zero? (rem n d))) (recur n factors (inc d))
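These rewrites don't change the behavior; for instance, with the atom-free version:

(prime-factors 279) ;; => [3 3 31]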
You can also overhaul it completely to make a lazy version of it:
(defn prime-factors [n]
  ((fn step [n d]
     (lazy-seq
       (when (< 1 n)
         (cond
           (zero? (rem n d)) (cons d (step (quot n d) d))
           ;; must call step here: recur would target the zero-argument
           ;; thunk that lazy-seq wraps the body in
           :else (step n (inc d))))))
   n 2))
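Being lazy, it computes factors only as they are consumed, e.g.:

(take 1 (prime-factors 987654123987546)) ;; => (2), without finding the remaining factors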
I planned to have a section on optimization here, but I'm no specialist. The only thing I can say is that you can trivially make this code faster by stopping the loop when d is greater than the square root of n:
(defn prime-factors [n]
  (if (< 1 n)
    (loop [n n
           factors []
           d 2]
      (let [q (quot n d)]
        (cond
          (< q d) (conj factors n)
          (zero? (rem n d)) (recur q (conj factors d) d)
          :else (recur n factors (inc d)))))
    []))
(time (prime-factors 987654123987546)) ;; "Elapsed time: 7.124 msecs"
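And it still agrees with the straightforward translation above:

(prime-factors 987654123987546) ;; => [2 3 41 14389 279022459]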
Not every loop unrolls cleanly into an elegant "functional" decomposition.
The Rosetta Code solution suggested by @edbond is pretty simple and concise; I would say it's idiomatic, since no obvious "functional" solution is apparent. That solution runs noticeably faster on my machine than your Python version for 987654123987546.
More generally, if you're looking to expand your understanding of functional idioms, Bedra and Halloway's "Programming Clojure" (pp.90-95) presents an excellent comparison of different versions of the Fibonacci sequence, using loop, lazy seqs, and an elegant "functional" version. Chouser and Fogus's "Joy of Clojure" (MEAP version) also has a nice section on function composition.

Why does this simplification make my function slower?

The following function computes the Fibonacci series by tail recursion and squaring:
(defun fib1 (n &optional (a 1) (b 0) (p 0) (q 1))
  (cond ((zerop n) b)
        ((evenp n)
         (fib1 (/ n 2)
               a
               b
               (+ (* p p) (* q q))
               (+ (* q q) (* 2 p q))))
        (t
         (fib1 (1- n)
               (+ (* b q) (* a (+ p q)))
               (+ (* b p) (* a q))
               p
               q))))
Basically it reduces every odd input to an even one, and halves every even input. For example,
F(21)
= F(21 1 0 0 1)
= F(20 1 1 0 1)
= F(10 1 1 1 1)
= F(5 1 1 2 3)
= F(4 8 5 2 3)
= F(2 8 5 13 21)
= F(1 8 5 610 987)
= F(0 17711 10946 610 987)
= 10946
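(For reference, and not part of the original question: the even branch is the "squaring" step of the generalized Fibonacci transformation, the same identity derived in SICP exercise 1.19. With T(a, b) = (b*q + a*(p + q), b*p + a*q), applying T twice with parameters (p, q) is the same as applying it once with

p' = p^2 + q^2
q' = q^2 + 2*p*q

which is exactly what the evenp branch computes.)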
When I saw this I thought it might be better to combine the even and odd cases (since odd minus one = even), so I wrote
(defun fib2 (n &optional (a 1) (b 0) (p 0) (q 1))
  (if (zerop n) b
      (fib2 (ash n -1)
            (if (evenp n) a (+ (* b q) (* a (+ p q))))
            (if (evenp n) b (+ (* b p) (* a q)))
            (+ (* p p) (* q q))
            (+ (* q q) (* 2 p q)))))
and hoped this would make it faster, as the chain of calls above now becomes
F(21)
= F(21 1 0 0 1)
= F(10 1 1 1 1)
= F(5 1 1 2 3)
= F(2 8 5 13 21)
= F(1 8 5 610 987)
= F(0 17711 10946 1346269 2178309)
= 10946
However, it turned out to be much slower (it takes about 50% more time in e.g. Clozure CL, CLISP and LispWorks) when I checked the time needed for Fib(1000000) with the following code. (Ignore the progn; I just don't want my screen filled with numbers.)

(time (progn (fib1 1000000) ()))
(time (progn (fib2 1000000) ()))
I can only see that fib2 may do more evenp checks than fib1, so why is it that much slower?
EDIT: I think n.m. got it right, and I have edited the second chain of calls accordingly. E.g. in the example of F(21) above, fib2 actually computes F(31) and F(32) in p and q, which are never used. So for F(1000000), fib2 computes F(1048575) and F(1048576).
Lazy evaluation rocks, that's a very good point. I guess in Common Lisp only some macros like and and or are evaluated lazily?
The following modified fib2 (defined for n>0) actually runs faster:
(defun fib2 (n &optional (a 1) (b 0) (p 0) (q 1))
  (if (= n 1) (+ (* b p) (* a q))
      (fib2 (ash n -1)
            (if (evenp n) a (+ (* b q) (* a (+ p q))))
            (if (evenp n) b (+ (* b p) (* a q)))
            (+ (* p p) (* q q))
            (+ (* q q) (* 2 p q)))))
Insert printing of the intermediate results. Pay attention to p and q towards the end of the computation.
You will notice that fib2 computes much larger values for p and q at the last step. These two values account for all the performance difference.
The ironic thing is that these expensive values are unused. This is why Haskell doesn't suffer from this performance problem: the unused values are not actually computed.
If nothing else, fib2 has more conditionals (while computing the arguments). That may well change how the code flow is done. Conditionals imply jumps; jumps imply pipeline stalls.
It would probably be instructive to look at the generated code (try (disassemble #'fib1) and (disassemble #'fib2)) and see if there are any blatant differences. It might also be worth changing the optimization settings; there are usually a fair few optimizations that are not done unless you request heavy optimization for speed.

Performance of function in Clojure 1.3

I was wondering if someone could help me with the performance of this code snippet in Clojure 1.3. I am trying to implement a simple function that takes two vectors and does a sum of products.
So let's say the vectors are X (size 10,000 elements) and B (size 3 elements), and the sum of products are stored in a vector Y, mathematically it looks like this:
Y0 = B0*X2 + B1*X1 + B2*X0
Y1 = B0*X3 + B1*X2 + B2*X1
Y2 = B0*X4 + B1*X3 + B2*X2
and so on ...
For this example, the size of Y will end up being 9997, which corresponds to (10,000 - 3). I've set up the function to accept any size of X and B.
Here's the code: it basically takes (count b) elements at a time from X, reverses them, maps * onto B, and sums the contents of the resulting sequence to produce an element of Y.
(defn filt [b-vec x-vec]
  (loop [n 0 sig x-vec result []]
    (if (= n (- (count x-vec) (count b-vec)))
      result
      (recur (inc n)
             (rest sig)
             (conj result (->> sig
                               (take (count b-vec))
                               (reverse)
                               (map * b-vec)
                               (apply +)))))))
With X as (vec (range 1 10001)) and B as [1 2 3], this function takes approximately 6 seconds to run. I was hoping someone could suggest improvements to the run time, whether algorithmic or perhaps a language detail I might be abusing.
Thanks!
P.S. I have done (set! *warn-on-reflection* true) but don't get any reflection warning messages.
You are calling count many times unnecessarily. The code below computes each count only once:
(defn filt [b-vec x-vec]
  (let [bc (count b-vec) xc (count x-vec)]
    (loop [n 0 sig x-vec result []]
      (if (= n (- xc bc))
        result
        (recur (inc n)
               (rest sig)
               (conj result (->> sig
                                 (take bc)
                                 (reverse)
                                 (map * b-vec)
                                 (apply +))))))))
(time (def b (filt [1 2 3] (range 10000))))
=> "Elapsed time: 50.892536 msecs"
If you really want top performance for this kind of calculation, you should use arrays rather than vectors. Arrays have a number of performance advantages:
They support O(1) indexed reads and writes - marginally better than vectors, which are O(log32 n)
They are mutable, so you don't need to construct new arrays all the time - you can just create a single array to serve as the output buffer
They are represented as Java arrays under the hood, so benefit from the various array optimisations built into the JVM
You can use primitive arrays (e.g. of Java doubles), which are much faster than boxed number objects
Code would be something like:
(defn filt [^doubles b-arr
            ^doubles x-arr]
  (let [bc (count b-arr)
        xc (count x-arr)
        rc (inc (- xc bc))
        result ^doubles (double-array rc)]
    (dotimes [i rc]
      (dotimes [j bc]
        (aset result i (+ (aget result i)
                          (* (aget x-arr (+ i j))
                             (aget b-arr j))))))
    result))
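A usage sketch (my example, not from the answer; the inputs must be primitive double arrays, and the result is itself a double array):

(def b (double-array [1.0 2.0 3.0]))
(def x (double-array (range 1 10001)))
(vec (take 3 (filt b x))) ;; => [14.0 20.0 26.0]

Note that this array version applies b without reversing it; if you need the original question's ordering (Y0 = B0*X2 + B1*X1 + B2*X0), reverse b before calling.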
To follow up on Ankur's excellent answer, you can also avoid repeated calls to the reverse function, which gains us a little more performance.
(defn filt [b-vec x-vec]
  (let [bc (count b-vec) xc (count x-vec) bb-vec (reverse b-vec)]
    (loop [n 0 sig x-vec result []]
      (if (= n (- xc bc))
        result
        (recur (inc n)
               (rest sig)
               (conj result (->> sig
                                 (take bc)
                                 (map * bb-vec)
                                 (apply +))))))))

Fermat factorization method limit

I am trying to implement Fermat's factorization (Algorithm C in The Art of Computer Programming, Vol. 2). Unfortunately, in my edition (ISBN 81-7758-335-2) this algorithm is printed incorrectly. What should be the condition in the factor-inner loop below? I am running the loop while y <= n [passed in as limit].
(if (< limit y) 0 (factor-inner x (+ y 2) (- r y) limit))
Is there any way to avoid this condition altogether, as that would double the speed of the loop?
(define (factor n)
  (let ((square-root (inexact->exact (floor (sqrt n)))))
    (factor-inner (+ (* 2 square-root) 1)
                  1
                  (- (* square-root square-root) n)
                  n)))

(define (factor-inner x y r limit)
  (if (= r 0)
      (/ (- x y) 2)
      (begin
        (display x) (display " ") (display y) (display " ") (display r) (newline)
        ;;(sleep-current-thread 1)
        (if (< r 0)
            (factor-inner (+ x 2) y (+ r x) limit)
            (if (< limit y)
                0
                (factor-inner x (+ y 2) (- r y) limit))))))
The (< limit y) check is not necessary because, in the worst case, the algorithm will eventually find this solution:

x = N + 2
y = N

It will then return 1.
Looking through Algorithm C, it looks like the issue is with the recursion step, which effectively skips step C4 whenever r < 0, because x is not incremented and r is only decremented by y.
Using the notation of a, b and r from the 1998 edition of Vol. 2 (ISBN 0-201-89684-2), a Scheme version would be as follows:
(define (factor n)
  (let ((x (inexact->exact (floor (sqrt n)))))
    (factor-inner (+ (* x 2) 1)
                  1
                  (- (* x x) n))))

(define (factor-inner a b r)
  (cond ((= r 0) (/ (- a b) 2))
        ((< 0 r) (factor-inner a (+ b 2) (- r b)))            ; step C4
        (else (factor-inner (+ a 2) (+ b 2) (+ r (- a b)))))) ; steps C3 + C4
EDIT to add: Basically, we are doing a trick that repeatedly checks whether

r = ((a - b)/2) * ((a + b - 2)/2) - N

is 0, and we do it by simply tracking how r changes when we increment a or b. If we were to set b to b+2 in the expression for r above, it would be equivalent to reducing r by the old value of b, which is why both are done in parallel in step C4 of the algorithm. I encourage you to expand the algebraic expression above and convince yourself that this is true; the expansion is spelled out below.
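Here is that expansion, for the record:

r(a, b)               = ((a - b)/2) * ((a + b - 2)/2) - N
r(a, b + 2) - r(a, b) = ((a - b - 2)(a + b) - (a - b)(a + b - 2)) / 4 = -b
r(a + 2, b) - r(a, b) = ((a - b + 2)(a + b) - (a - b)(a + b - 2)) / 4 = a

So stepping b up by 2 subtracts the old b from r (step C4), and stepping a up by 2 adds the old a to r (step C3).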
As long as r > 0, you want to keep decreasing it to find the right value of b, so you keep repeating step C4. However, if you overshoot and r < 0, you need to increase it. You do this by increasing a, because increasing a by 2 is equivalent to increasing r by the old value of a, as in step C3. You will always have a > b, so increasing r by a in step C3 automatically makes r positive again, and you proceed directly on to step C4.
It's also easy to prove that a > b. We start with a manifestly greater than b, and if we ever increase b to the point where b = a - 2, we have

N = ((a - (a - 2))/2) * ((a + (a - 2) - 2)/2) = 1 * (a - 2)

This means that N is prime, as the largest factor it has that is no greater than sqrt(N) is 1, and the algorithm has terminated.
