Hacker News ranking algorithm in PHP?

This is the Hacker News ranking algorithm, which I think is a simple way of ranking things, especially when users are voting on items, but I don't really understand it. Can it be converted to PHP, so I can understand it fully?
; Votes divided by the age in hours to the gravityth power.
; Would be interesting to scale gravity in a slider.

(= gravity* 1.8 timebase* 120 front-threshold* 1
   nourl-factor* .4 lightweight-factor* .17 gag-factor* .1)

(def frontpage-rank (s (o scorefn realscore) (o gravity gravity*))
  (* (/ (let base (- (scorefn s) 1)
          (if (> base 0) (expt base .8) base))
        (expt (/ (+ (item-age s) timebase*) 60) gravity))
     (if (no (in s!type 'story 'poll))  .8
         (blank s!url)                  nourl-factor*
         (mem 'bury s!keys)             .001
         (* (contro-factor s)
            (if (mem 'gag s!keys)
                 gag-factor*
                (lightweight s)
                 lightweight-factor*
                1)))))

Directly ripped from http://amix.dk/blog/post/19574 and translated to PHP from the Python:
function calculate_score($votes, $item_hour_age, $gravity=1.8){
    return ($votes - 1) / pow(($item_hour_age + 2), $gravity);
}

There are write-ups about how this algorithm works. A quick search turned up: How Hacker News ranking algorithm works.
Lisp can make things seem more complicated than they really are.
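For intuition, here is the simplified formula again as a quick Python sketch (same math as the PHP above; the numbers below are purely illustrative), showing how gravity makes older items sink:

```python
def calculate_score(votes, item_hour_age, gravity=1.8):
    # (votes - 1) discounts the submitter's own vote;
    # dividing by (age + 2) ** gravity makes the score decay over time
    return (votes - 1) / (item_hour_age + 2) ** gravity

# the same vote count scores far lower a day later
fresh = calculate_score(100, 0)
stale = calculate_score(100, 24)

# a newer item with far fewer votes can outrank an old, popular one
newcomer = calculate_score(20, 1)
```

Because age appears in the denominator raised to gravity, even heavily upvoted stories fall off the front page within hours unless they keep accumulating votes.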

How does this Scheme code return a value?

This code is taken from Sussman and Wisdom's Structure and Interpretation of Classical Mechanics; its purpose is to derive (something close to) the smallest positive floating-point number the host machine supports.
https://github.com/hnarayanan/sicm/blob/e37f011db68f8efc51ae309cd61bf497b90970da/scmutils/src/kernel/numeric.scm
Running it in DrRacket results in 2.220446049250313e-016 on my machine.
My question: what causes this to even return a value? This code is tail recursive, and it makes sense that at some point the computer can no longer divide by 2. Why does it not throw?
(define *machine-epsilon*
  (let loop ((e 1.0))
    (if (= 1.0 (+ e 1.0))
        (* 2 e)
        (loop (/ e 2)))))

*machine-epsilon*
This code is tail recursive, and it makes sense at some point the computer can no longer divide by 2. Why does it not throw?
No, the idea is different: at some point the computer can still divide by 2, but the result (e) becomes indistinguishable from 0 [upd: in the context of floating-point addition only - a very good point mentioned in the comment] (e + 1.0 = 1.0, which is exactly what the if clause is checking). We know for sure that the previous e was still greater than zero "from the machine's point of view" (otherwise we wouldn't have reached the current execution point), so we simply return e*2.
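Translated to Python (just a sketch mirroring the Scheme loop, not part of the original answer), the stopping condition is easy to see:

```python
import sys

def machine_epsilon():
    e = 1.0
    # keep halving while e is still distinguishable from 0 when added to 1.0
    while 1.0 + e != 1.0:
        e /= 2
    # the loop halved e one step too far, so return the previous value
    return 2 * e

eps = machine_epsilon()  # 2.220446049250313e-16 on IEEE 754 doubles
```

On CPython (whose floats are IEEE 754 doubles) this returns exactly `sys.float_info.epsilon`, and the loop terminates without any exception for the same reason the Scheme version does.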
This form of let-binding is syntactic sugar for recursion.
You may want to avoid syntactic sugar until you have mastered the language, and write as much as possible in the kernel language, to focus on the essential problem. For example, the full SICP text never uses this syntactic sugar for iteration.
The r6rs definition for iteration is here.
The purpose of this code is not to find the smallest float that the machine can support: it is to find the smallest float epsilon such that (= (+ 1.0 epsilon) 1.0) is false. This number is useful because it is an upper bound on the relative error you get from adding numbers. In particular, what you know is that, say, (+ x y) is in the range [(x+y)*(1 - epsilon), (x+y)*(1 + epsilon)], where in the second expression + &c mean the ideal operations on numbers.
In particular (/ *machine-epsilon* 2) is a perfectly fine number, as is (/ *machine-epsilon* 10000) for instance, and (* (/ *machine-epsilon* x) x) will be very close to *machine-epsilon* for many reasonable values of x. It's just the case that (= (+ (/ *machine-epsilon* 2) 1.0) 1.0) is true.
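A quick Python check of this point (CPython floats are IEEE 754 doubles, so `sys.float_info.epsilon` is the same 2**-52 value):

```python
import sys

eps = sys.float_info.epsilon  # 2**-52 for doubles

# eps/2 is a perfectly fine, nonzero number...
half = eps / 2
assert half > 0

# ...yet adding it to 1.0 is indistinguishable from adding 0,
# while adding eps itself is not
assert 1.0 + half == 1.0
assert 1.0 + eps > 1.0
```

So epsilon marks the granularity of floats near 1.0, not a lower bound on representable magnitudes.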
I'm not familiar enough with floating-point standards, but the number you are probably thinking of is what Common Lisp calls least-positive-double-float (or one of its variants). In Racket you can derive an approximation to it with:
(define *least-positive-mumble-float*
  ;; I don't know what float types Racket has, if it even has more than one.
  (let loop ([t 1.0])
    (if (= (/ t 2) 0.0)
        t
        (loop (/ t 2)))))
I am not sure whether this is allowed to raise an exception: in practice it does not, and it returns a reasonable-looking answer.
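The same search works in Python (a sketch of the same idea; under IEEE 754, division underflows gradually to 0.0 instead of raising, which is why the loop terminates):

```python
def least_positive_float():
    t = 1.0
    while t / 2 > 0.0:  # keep halving until the result underflows to 0.0
        t /= 2
    return t

smallest = least_positive_float()  # 5e-324, the smallest subnormal double
```

This value is far below machine epsilon: it is the smallest positive magnitude the format can represent at all, whereas epsilon describes spacing near 1.0.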
It becomes clearer when you get rid of the confusing named let notation.
(define (calculate-epsilon [epsilon 1.0])
  (if (= 1.0 (+ 1.0 epsilon))
      (* epsilon 2)
      (calculate-epsilon (/ epsilon 2))))

(define *machine-epsilon* (calculate-epsilon))
This is what the code actually does. Now we can see what the named let expression is good for: it locally defines a function and runs it. The trouble is that naming the function loop is imprecise and confusing, and shortening epsilon to e is an unhappy choice; naming is among the most important ingredients of readable code. So this SICP example is also an example of bad naming choices (okay, maybe they did it intentionally, to train the students). The named let defines and calls/runs a function/procedure; avoiding it would lead to clearer, and thus better, code.
In Common Lisp such a construct would be expressed much more clearly:
(defparameter *machine-epsilon*
  (labels ((calculate-epsilon (&optional (epsilon 1.0))
             (if (= 1.0 (+ 1.0 epsilon))
                 (* epsilon 2)
                 (calculate-epsilon (/ epsilon 2)))))
    (calculate-epsilon)))
In the CLISP implementation this gives 1.1920929E-7 (the literal 1.0 is a single-float in Common Lisp, so this is the single-float epsilon).

Why isn't this function showing a performance speedup when its primary constituent function does?

I am optimizing a program I've been working on, and have hit a wall. The function julia-subrect calls for-each-pixel a large number of times. I've optimized for-each-pixel for a ~16x speedup. However, my optimized version of julia-subrect shows no evidence of this. Here are my benchmarks and relevant code:
; ======== Old `for-each-pixel` ========
;(bench (julia/for-each-pixel (->Complex rc ic) max-itrs radius r-min x-step y-step [xt yt])))
;Evaluation count : 3825300 in 60 samples of 63755 calls.
;Execution time mean : 16.018466 µs
; ======== New `for-each-pixel`. optimized 16x. ========
;(bench (julia/for-each-pixel-opt [rc ic] [max-itrs radius r-min] [x-step y-step] [xt yt])))
;Evaluation count : 59542860 in 60 samples of 992381 calls.
;Execution time mean : 1.038955 µs
(defn julia-subrect [^Long start-x ^Long start-y ^Long end-x ^Long end-y
                     ^Long total-width ^Long total-height ^Complex constant ^Long max-itrs]
  (let [grid (for [y (range start-y end-y)]
               (vec (for [x (range start-x end-x)]
                      [x y])))
        radius (calculate-r constant)
        r-min (- radius)
        r-max radius
        x-step (/ (Math/abs (- r-max r-min)) total-width)
        y-step (/ (Math/abs (- r-max r-min)) total-height)
        ;; Uses old implementation of `for-each-pixel`
        calculate-pixel (partial for-each-pixel constant max-itrs radius r-min x-step y-step)
        for-each-row (fn [r] (map calculate-pixel r))]
    (map for-each-row grid)))
; ======== Old `julia-subrect` ========
;(bench (doall (julia/julia-subrect start-x start-y end-x end-y total-width total-height c max-itrs))))
;Evaluation count : 22080 in 60 samples of 368 calls.
;Execution time mean : 2.746852 ms
(defn julia-subrect-opt [[^long start-x ^long start-y ^long end-x ^long end-y]
                         [^double rc ^double ic] total-width total-height max-itrs]
  (let [grid (for [y (range start-y end-y)]
               (vec (for [x (range start-x end-x)]
                      [x y])))
        radius (calculate-r-opt rc ic)
        r-min (- radius)
        r-max radius
        x-step (/ (Math/abs (- r-max r-min)) total-width)
        y-step (/ (Math/abs (- r-max r-min)) total-height)
        ;; Uses new implementation of `for-each-pixel`
        calculate-pixel (fn [px] (for-each-pixel-opt [rc ic] [max-itrs radius r-min] [x-step y-step] px))
        for-each-row (fn [r] (map calculate-pixel r))]
    (map for-each-row grid)))
; ======== New `julia-subrect`, but no speedup ========
;(bench (doall (julia/julia-subrect-opt [start-x start-y end-x end-y] [rc ic] total-width total-height max-itrs))))
;Evaluation count : 21720 in 60 samples of 362 calls.
;Execution time mean : 2.831553 ms
Here is a gist containing source code for all the functions I've specified:
https://gist.github.com/johnmarinelli/adc5533c19fb0b6d74cf4ef04ae55ee6
So, can anyone tell me why julia-subrect is showing no signs of speedup? Also, I'm still new to clojure so bear with me if the code is unidiomatic/ugly. Right now, I'm focusing on making the program run quicker.
As a general guideline:
profile!
actually get around to profiling, like for real ;-)
remove reflection (looks like you did this)
split the operations into easy to think about functions
remove laziness (transducers should be the last step in this part)
combine steps using loop/recur to make your code impossible to figure out and slightly faster (this is the last step for a reason)
Specifically thinking about the code you posted:
At a glance, it looks like this function will spend much of its time generating a lazy list of values in the for loop, which are then immediately realized (evaluated so they are no longer lazy), so the time spent building that lazy structure is wasted. You may consider changing this to produce vectors directly; mapv is useful for this.
The second part is the call to map in for-each-row, which will produce a lot of intermediate data structures. For that one you may consider using a non-lazy expression like mapv or loop/recur.
It looks like you have done steps 2-4 already, and there is no obvious reason for you to skip to step seven. I'd spend the next couple hours on limiting laziness and if you have to, learning about transducers.

Scheme: Accelerated Stream

From this example code I found online, which functions are the unaccelerated stream, the singly-accelerated stream, and the super-accelerated stream? Thank you in advance.
Cite: lawfulsamurai.blogspot.com/2009/01/sicp-section-35-streams.html
(define (log2-summands n)
  (cons-stream (/ 1.0 n)
               (stream-map - (log2-summands (+ n 1)))))

(define log2-stream
  (partial-sums (log2-summands 1)))

(define log2-stream-euler
  (euler-transform log2-stream))

(define log2-stream-accelerated
  (accelerated-sequence euler-transform log2-stream))
Well, you didn't tell us what either a "singly-accelerated" or a "super-accelerated" stream is, so it's hard to say where in the code they are. It's like playing "Where's Waldo" without knowing what a "Waldo" is.
That said, I can see that log2-summands, euler-transform, make-tableau, and accelerated-sequence all return streams, so it seems like they'd be the candidates. Now, if we actually look at the blog post that you linked to, SICP Section 3.5 Streams, we read:
Straightforward summation using partial-sums. The value of log2 oscillates between 0.6687714031754279 and 0.7163904507944756 after 20 iterations.
(define log2-stream
  (partial-sums (log2-summands 1)))
Log2 using Euler Transformation. Value converges to 0.6932106782106783 after 10 iterations.
(define log2-stream-euler
  (euler-transform log2-stream))
Accelerated summation. Value converges to 0.6931488693329254 in 4 iterations.
(define log2-stream-accelerated
  (accelerated-sequence euler-transform log2-stream))
It sounds like log2-stream, log2-stream-euler, and log2-stream-accelerated are, respectively, the "unaccelerated stream, the singly-accelerated stream, and the super-accelerated stream".
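To see the three levels of acceleration concretely, here is a rough Python sketch of the same pipeline (list-based rather than stream-based; the helper names are mine, and euler_transform follows the SICP formula S(n+1) - (S(n+1) - S(n))^2 / (S(n-1) - 2 S(n) + S(n+1))):

```python
import math

def partial_sums(terms):
    out, s = [], 0.0
    for t in terms:
        s += t
        out.append(s)
    return out

def euler_transform(s):
    # each output term combines three consecutive partial sums
    return [s[i + 1] - (s[i + 1] - s[i]) ** 2 / (s[i - 1] - 2 * s[i] + s[i + 1])
            for i in range(1, len(s) - 1)]

# ln 2 = 1 - 1/2 + 1/3 - 1/4 + ...
terms = [(-1) ** k / (k + 1) for k in range(20)]
plain = partial_sums(terms)           # the unaccelerated sequence
single = euler_transform(plain)       # singly-accelerated

# "super-acceleration": transform repeatedly, keeping the head of each row
row, accelerated = plain, [plain[0]]
for _ in range(4):
    row = euler_transform(row)
    accelerated.append(row[0])
```

With 20 terms, `plain` still oscillates a couple of hundredths away from ln 2, while the last entry of `accelerated` agrees with math.log(2) to several more digits, which is exactly the contrast the blog post's numbers show.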

Unable to evaluate a lambda expression as argument in SICP ex-1.37

The problem can be found at http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-12.html#%_thm_1.37
The problem is to expand a continued fraction in order to approximate phi. It suggests that your procedure should be able to calculate phi by evaluating:
(cont-frac (lambda (i) 1.0)
           (lambda (i) 1.0)
           k)
My solution is as follows:
(define (cont-frac n d k)
  (if (= k 1)
      d
      (/ n (+ d (cont-frac n d (- k 1))))))
This solution works when calling (cont-frac 1 1 k), but not when using the lambda expressions as the problem suggests. I get what looks like a type error:
;ERROR: "ex-1.37.scm": +: Wrong type in arg1 #<CLOSURE <anon> (x) 1.0>
; in expression: (##+ ##d (##cont-frac ##n ##d (##- ##k 1)))
; in scope:
; (n d k) procedure cont-frac
; defined by load: "ex-1.37.scm"
;STACK TRACE
1; ((##if (##= ##k 1) ##d (##/ ##n (##+ ##d (##cont-frac ##n ##d ...
My question is two-part:
Question 1. Why am I getting this error when using the lambda arguments? I (mistakenly, for sure) thought that (lambda (x) 1) should evaluate to 1. It clearly does not. I'm not sure I understand what it DOES evaluate to: I presume that it doesn't evaluate to anything (i.e., "return a value" -- maybe the wrong term for it) without being passed an argument for x.
It still leaves unanswered why you would have a lambda that returns a constant. If I understand correctly, (lambda (x) 1.0) will always evaluate to 1.0, regardless of what the x is. So why not just put 1.0? This leads to:
Question 2. Why should I use them? I suspect that this will be useful in ex-1.38, which I've glanced at, but I can't understand why using (lambda (x) 1.0) is any different than using 1.0.
In Scheme, a lambda expression creates a function, so an expression such as:
(lambda (i) 1.0)
really does have a result: a function object.
But if you apply that function (note the extra parentheses, plus an argument for i), it will indeed evaluate to 1.0 as you expected:
((lambda (i) 1.0) 42)
Using lambdas in that exercise is necessary for building a general solution: as you've correctly noticed, in exercise 1.38 you'll be using the same implementation of cont-frac but with different numerator and denominator functions, and you'll see an example where you must calculate one of them at runtime from the loop counter.
You could compare your exercise solutions with mine, e.g. 1.37 and 1.38
(/ n (+ d (cont-frac n d (- k 1))))
In this case d is the lambda, so it doesn't make sense to + it; the same goes for n and /. Try something like:
(/ (n k) (+ (d k) (cont-frac n d (- k 1))))
You'll see why in the next exercise. You can also make this tail-recursive.
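The same idea in Python (a sketch, not the exercise's required Scheme): n and d are functions of the term index, which is exactly why passing plain numbers to a version that applies them fails.

```python
def cont_frac(n, d, k):
    # n and d are one-argument functions producing the i-th
    # numerator and denominator of the continued fraction
    def term(i):
        if i == k:
            return n(i) / d(i)
        return n(i) / (d(i) + term(i + 1))
    return term(1)

# with n(i) = d(i) = 1.0 this converges to 1/phi = (sqrt(5) - 1) / 2
approx = cont_frac(lambda i: 1.0, lambda i: 1.0, 100)
```

Passing `lambda i: 1.0` instead of `1.0` costs nothing here, but it means the same `cont_frac` works unchanged when the terms do depend on the index, as in exercise 1.38.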
I named my variables F-d and F-n instead of d and n, because they accept functions that calculate the numerator and denominator terms. (lambda (i) 1.0) is a function that accepts one argument and returns 1.0; 1.0 is just a number. In other continued fractions, the value may vary with the depth (which is why you need to pass k to the numerator and denominator functions to calculate the proper term).

Concurrent cartesian product algorithm in Clojure

Is there a good algorithm to calculate the cartesian product of three seqs concurrently in Clojure?
I'm working on a small hobby project in Clojure, mainly as a means to learn the language, and its concurrency features. In my project, I need to calculate the cartesian product of three seqs (and do something with the results).
I found the cartesian-product function in clojure.contrib.combinatorics, which works pretty well. However, the calculation of the cartesian product turns out to be the bottleneck of the program. Therefore, I'd like to perform the calculation concurrently.
Now, for the map function, there's a convenient pmap alternative that magically makes the thing concurrent. Which is cool :). Unfortunately, such a thing doesn't exist for cartesian-product. I've looked at the source code, but I can't find an easy way to make it concurrent myself.
Also, I've tried to implement an algorithm myself using map, but I guess my algorithmic skills aren't what they used to be. I managed to come up with something ugly for two seqs, but three was definitely a bridge too far.
So, does anyone know of an algorithm that's already concurrent, or one that I can parallelize myself?
EDIT
Put another way, what I'm really trying to achieve is something similar to this Java code:
for (ClassA a : someExpensiveComputation()) {
    for (ClassB b : someOtherExpensiveComputation()) {
        for (ClassC c : andAnotherOne()) {
            // Do something interesting with a, b and c
        }
    }
}
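For comparison, Python's standard library expresses those nested loops directly as itertools.product (Clojure's cartesian-product is the same idea); the sequence names here are just placeholders:

```python
from itertools import product

seq_a, seq_b, seq_c = [0, 1], ["x", "y"], [True, False]

# equivalent to the three nested for-loops above:
# product iterates the rightmost sequence fastest, like the innermost loop
triples = list(product(seq_a, seq_b, seq_c))
```

Like the Clojure version, `product` is lazy until consumed, so the work of building tuples only happens as you iterate.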
If the logic you're using to process the Cartesian product isn't somehow inherently sequential, then maybe you could just split your inputs into halves (perhaps splitting each input seq in two), calculate 8 separate Cartesian products (first-half x first-half x first-half, first-half x first-half x second-half, ...), process them and then combine the results. I'd expect this to give you quite a boost already. As for tweaking the performance of the Cartesian product building itself, I'm no expert, but I do have some ideas & observations (one needs to calculate a cross product for Project Euler sometimes), so I've tried to summarise them below.
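The split-and-combine strategy just described can be sketched in Python (a hypothetical worker setup, splitting only the first sequence into two chunks for brevity; note that CPython threads only help if the per-tuple work releases the GIL, otherwise a process pool is the better fit):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def split_product(s1, s2, s3, process_chunk):
    # split the first seq in half: each worker handles one half of the product
    mid = len(s1) // 2
    chunks = [s1[:mid], s1[mid:]]
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(process_chunk, list(product(half, s2, s3)))
                   for half in chunks]
        # collect per-chunk results in submission order
        return [f.result() for f in futures]
```

Because the first sequence drives the outermost "loop", concatenating the chunk results in order reproduces the full Cartesian product exactly.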
First of all, I find the c.c.combinatorics function a bit strange in the performance department. The comments say it's taken from Knuth, I believe, so perhaps one of the following obtains: (1) it would be very performant with vectors, but the cost of vectorising the input sequences kills its performance for other sequence types; (2) this style of programming doesn't necessarily perform well in Clojure in general; (3) the cumulative overhead incurred due to some design choice (like having that local function) is large; (4) I'm missing something really important. So, while I wouldn't like to dismiss the possibility that it might be a great function to use for some use cases (determined by the total number of seqs involved, the number of elements in each seq etc.), in all my (unscientific) measurements a simple for seems to fare better.
Then there are two functions of mine, one of which is comparable to for (somewhat slower in the more interesting tests, I think, though it seems to be actually somewhat faster in others... can't say I feel prepared to make a fully educated comparison), the other apparently faster with a long initial input sequence, as it's a restricted functionality parallel version of the first one. (Details follow below.) So, timings first (do throw in the occasional (System/gc) if you care to repeat them):
;; a couple of warm-up runs elided
user> (time (last (doall (pcross (range 100) (range 100) (range 100)))))
"Elapsed time: 1130.751258 msecs"
(99 99 99)
user> (time (last (doall (cross (range 100) (range 100) (range 100)))))
"Elapsed time: 2428.642741 msecs"
(99 99 99)
user> (require '[clojure.contrib.combinatorics :as comb])
nil
user> (time (last (doall (comb/cartesian-product (range 100) (range 100) (range 100)))))
"Elapsed time: 7423.131008 msecs"
(99 99 99)
;; a second time, as no warm-up was performed earlier...
user> (time (last (doall (comb/cartesian-product (range 100) (range 100) (range 100)))))
"Elapsed time: 6596.631127 msecs"
(99 99 99)
;; umm... is syntax-quote that expensive?
user> (time (last (doall (for [x (range 100)
                               y (range 100)
                               z (range 100)]
                           `(~x ~x ~x)))))
"Elapsed time: 11029.038047 msecs"
(99 99 99)
user> (time (last (doall (for [x (range 100)
                               y (range 100)
                               z (range 100)]
                           (list x y z)))))
"Elapsed time: 2597.533138 msecs"
(99 99 99)
;; one more time...
user> (time (last (doall (for [x (range 100)
                               y (range 100)
                               z (range 100)]
                           (list x y z)))))
"Elapsed time: 2179.69127 msecs"
(99 99 99)
And now the function definitions:
(defn cross [& seqs]
  (when seqs
    (if-let [s (first seqs)]
      (if-let [ss (next seqs)]
        (for [x s
              ys (apply cross ss)]
          (cons x ys))
        (map list s)))))
(defn pcross [s1 s2 s3]
  (when (and (first s1)
             (first s2)
             (first s3))
    (let [l1 (count s1)
          [half1 half2] (split-at (quot l1 2) s1)
          s2xs3 (cross s2 s3)
          f1 (future (for [x half1, yz s2xs3] (cons x yz)))
          f2 (future (for [x half2, yz s2xs3] (cons x yz)))]
      (concat @f1 @f2))))
I believe that all versions produce the same results. pcross could be extended to handle more sequences or be more sophisticated in the way it splits its workload, but that's what I came up with as a first approximation... If you do test this out with your programme (perhaps adapting it to your needs, of course), I'd be very curious to know the results.
'clojure.contrib.combinatorics has a cartesian-product function.
It returns a lazy sequence and can cross any number of sequences.
