Sort faster in racket using hash table - sorting

So I have an example list of elements like this
(define A (list 'a 'c 'd 'e 'f 'e 'a))
Now I want to make a ranking from this sample
(define (scan lst)
(foldl (lambda (element a-hash) (hash-update a-hash element add1 0))
(hash)
lst))
The result should be like this:
> #(('a . 2) ('f . 1) ('e . 2) ....)
Because `scan function will make a hash table containing unique keys and the number of repetitions of that key (if it catches an unindexed key it will create a new place for that new key, counting from 0).
Then I'd like to sort that hash-table because it's unsorted:
(define (rank A)
(define ranking (scan A))
(sort ranking > #:key cdr)))
So the result would look like this:
#(('a . 2) ('e . 2) ('f . 1) ...)
Now I'd like to truncate the hash-table and throw away the bottom at the threshold of n = 1 (aka only take the elements with more than 2 repetitions).
(define (truncate lst n)
(define l (length lst))
(define how-many-to-take
(for/list
([i l]
#:when (> (cdr (list-ref lst i))
n))
i))
(take lst (length how-many-to-take)))
So the result might look like this:
(('a . 2) ('e . 2))
However, at the big scale, this procedure is not very efficient, it takes too long. Would you have any suggestion to improve the performance?
Thank you very much,
Part 2:
I have this data structure:
(automaton x
(vector (state y (vector a b c))
(state y (vector a b c)) ...))
Then i generate randomly a population of 1000 of them. Then i scan and rank them using the above functions. If i just scan them as is, it already takes long time. If i try to flatten them into a list like this
(list x y a b c y a b c...)
it'd take even more time. Here is the flatten function:
(define (flatten-au au)
(match-define (automaton x states) au)
(define l (vector-length states))
(define body
(for/list ([i (in-range l)])
(match-define (state y z) (vector-ref states i))
(list y (vector->list z))))
(flatten (list x body)))
The scan function will look a bit different:
(define (scan population)
(foldl (lambda (auto a-hash) (hash-update a-hash (flatten-automaton auto) add1 0))
(hash)
population))

Yep, I believe I see the problem. Your algorithm has O(n^2) ("n-squared") running time. This is because you're counting from one to the length of the list, then for each index, performing a list-ref, which takes time proportional to the size of the index.
This is super-easy to fix.
In fact, there's really no reason to sort it or convert it to a list if this is what you want; just filter the hash table directly. Like this...
#lang racket
(define A (build-list 1000000 (λ (idx) (random 50))))
(define (scan lst)
(foldl (lambda (element a-hash) (hash-update a-hash element add1 0))
(hash)
lst))
(define ht (scan A))
(define only-repeated
(time
(for/hash ([(k v) (in-hash ht)]
#:when (< 1 v))
(values k v))))
I added the call to time to see how long it takes. For a list of size one million, on my computer this takes a measured time of 1 millisecond.
Asymptotic complexity is important!

Related

Geometric Series function in Scheme language

Im trying to learn scheme and Im having trouble with the arithmetic in the Scheme syntax.
Would anyone be able to write out a function in Scheme that represents the Geometric Series?
You have expt, which is Scheme power procedure. (expt 2 8) ; ==> 256 and you have * that does multiplication. eg. (* 2 3) ; ==> 6. From that you should be able to make a procedure that takes a n and produce the nth number in a specific geometric series.
You can also produce a list with the n first if you instead of using expt just muliply in a named let, basically doing the expt one step at a time and accumulate the values in a list. Here is an example of a procedure that makes a list of numbers:
(define (range from to)
(let loop ((n to) (acc '())
(if (< n from)
acc
(loop (- 1 n) (cons n acc)))))
(range 3 10) ; ==> (3 4 5 6 7 8 9 10)
Notice I'm doing them in reverse. If I cannot do it in reverse I would in the base case do (reverse acc) to get the right order as lists are always made from end to beginning. Good luck with your series.
range behaves exactly like Python's range.
(define (range from (below '()) (step 1) (acc '()))
(cond ((null? below) (range 0 from step))
((> (+ from step) below) (reverse acc))
(else (range (+ from step) below step (cons from acc)))))
Python's range can take only one argument (the upper limit).
If you take from and below as required arguments, the definition is shorter:
(define (range from below (step 1) (acc '()))
(cond ((> (+ from step) below) (reverse acc))
(else (range (+ from step) below step (cons from acc)))))
Here is an answer, in Racket, that you probably cannot submit as homework.
(define/contract (geometric-series x n)
;; Return a list of x^k for k from 0 to n (inclusive).
;; This will be questionable if x is not exact.
(-> number? natural-number/c (listof number?))
(let gsl ((m n)
(c (expt x n))
(a '()))
(if (zero? m)
(cons 1 a)
(gsl (- m 1)
(/ c x)
(cons c a)))))

Quicksort in Scheme using a partition

I have a partition for a quicksort:
(define (partition pivot lst)
((lambda (s) (s s lst list))
(lambda (s l* c)
(if (null? l*)
(c '() '())
(let ((x (car l*)))
(s s (cdr l*)
(lambda (a b)
(if (< x pivot)
(c (cons x a) b)
(c a (cons x b))))))))))
partition code source
Testing:
=>(partition '5 '(1 3 5 7 9 8 6 4 2))
;Value: ((1 3 4 2) (5 7 9 8 6))
How can I implement this partition in a quicksort? I've tried this so far:
(define (quicksort lst)
(if (null? lst) '()
(let* ((y (car lst))
(pn (partition y (cdr lst))))
(append (quicksort (car pn))
(list y)
(quicksort (cdr pn))))))
First, your code is trivially fixed by changing one cdr to cadr:
(define (partition pivot lst)
((lambda (s) (s s lst list))
......)) ; ^^^^ `list` the top continuation
(define (quicksort lst)
(if (null? lst) '()
(let* ((y (car lst))
(pn (partition y (cdr lst))))
(append
(quicksort (car pn))
(list y)
(quicksort (cadr pn))))))
;; ^^^^ cdr --> cadr
because the top continuation used in partition is list, and so the call
(partition pivot lst)
is equivalent to the call
(list { x IN lst SUCH THAT x < pivot }
{ x IN lst SUCH THAT x >= pivot } )
(the parts in {...} are pseudocode, where we don't care about the implementation, just the results)
And so to access the two parts of that list built by partition you need to use car and cadr.
Or you could keep the cdr in the accessing part of your code in quicksort if you'd change that top continuation to cons:
(define (partition pivot lst)
((lambda (s) (s s lst cons))
......)) ; ^^^^ `cons` the top continuation
(define (quicksort lst)
(if (null? lst)
'()
(let* ((y (car lst))
(pn (partition y (cdr lst))))
(append
(quicksort (car pn))
(list y)
(quicksort (cdr pn))))))
;; ^^^^ `cdr` works fine with `cons`
This because of the general principle in programming, where the functions used to build our data dictate which functions are to be used to access that data:
(list <A> <B> )
car cadr
(cons <A> <B> )
car cdr
( this particular correspondence is because (list <A> <B>) is the same as (cons <A> (cons <B> '())) and (cadr <C>) is the same as (car (cdr <C>)): )
(list <A> <B> )
=
(cons <A> (cons <B> '()))
car cdr
car
And conversely, the functions we use to access our data dictate the implementation of the function which must be used to build that data.
Of course that way of coding in your question is considered unnecessarily convoluted by modern standards since it emulates recursion through argument passing and reuse, -- just like in the famous Y combinator, -- but any Scheme worthy of its name already supports recursion.
So this partition would normally be written as the fully equivalent yet more readable code using the "named let" construct,
(define (partition pivot lst)
(let s ( (l* lst) ; first `l*` is `lst`
(c cons) ) ; top `c` is `cons`
(if (null? l*)
(c '() '())
(let ((x (car l*)))
(s (cdr l*)
(lambda (a b)
(if (< x pivot)
(c (cons x a) b)
(c a (cons x b)))))))))
except the name loop is conventionally used in place of s here (which itself most probably is intended as the shortening of "self").
But the true trouble with your quicksort/partition pair is algorithmic.
Yes I say pair (in non-cons sense of course) since the two go together -- just as with the data access/creation functions which must work together too.
Implementation of one dictates the implementation of the other -- in both directions, too. partition's code dictates quicksort's, or if we'd written quicksort first, we'd need to implement the partition in the corresponding way -- so that the two work together. Which means quicksort indeed producing the correct results, turning any input list into a sorted one:
(quicksort lst) --->
{ xs SUCH THAT
FOR ANY splitting xs = { ..., x, ...ys }
AND ANY splitting ys = { ..., y, ... }
IT HOLDS THAT x <= y
AND ALSO xs is a permutation of lst
(which implies (length lst) == (length xs))
}
So then, what is that trouble? It is that the true quicksort does no work whatsoever after the partitioning. None:
(define (quicksort! lst)
(if (null? lst)
'()
(let* ((y (car lst))
(pn (partition! y lst)))
(quicksort! (car pn)) ; no `append`, NB!
(quicksort! (cdr pn))))) ; no (list y) either
How is that even possible? What kind of partition! implementation would make that work? Well, most certainly not a functional one.
Instead it must be changing (i.e. mutating) the very lst itself somehow:
{ a, b, c, ....., k, l, m, ..... }
-->
{ d, e, ...., p, n, o, ..... }
~~~~~~~~~~~ ~~~~~~~~~~~
where we denote with p the partition point -- so that indeed all that's left to do after this kind of partitioning "in-place" is to sort the first part, and then to sort the second part, -- and then there's nothing more left to be done, after that! Which was the key insight in the original Tony Hoare's formulation of it:
TO SORT
{ a, b, c, ....., k, l, m, ..... } DO:
PARTITION it into
{ d, e, ...., p, n, o, ..... } AND THEN:
~~~~~~~~~~~ ~~~~~~~~~~~
SORT! SORT!
DONE.
This partitioning is usually implemented with swap! which actually swaps two elements in the underlying data structure. Most usually that data structure is an array with its facilities to change the value stored in it at any given index.
But it can also be a list, where the change i.e. mutation can be done with the set-car! primitive.
Looks like we'd need to build a list of cdrs out of the input list, and another one in reverse, -- to be able to iterate over them in both directions, back and forth, -- to make that happen.
I'll leave that for another day, for now.
Once you have the partition, there is still a small step to do.
Take care, you need to be sure partition splits the input in smaller sets all the time. In other word, partition not to return some empty set. The pivot can go in any of the sets and use this fact to check that you do not return an empty set, in case your comparison operator does not really decrease the size of the input. This is why I inserted the equality operator -- to be able to check if I insert the pivot in the first returned set or in the second one.
(define (partition pivot lst ret)
((lambda (s)
(s s lst
(lambda (a b p*)
(if (and (null? a) (null? b))
(ret (list pivot) (cdr p*))
(if (null? a)
(ret p* b)
(if (null? b)
(ret a p*)
(if (< (car b) pivot)
(ret a (append p* b))
(if (< (car a) pivot)
(ret (append a p*) b)
(error "never here")))))))))
(lambda (s l* c)
(if (null? l*)
(c '() '() '())
(let ((x (car l*)))
(s s (cdr l*)
(lambda (a b p*)
(if (= x pivot)
(c a b (cons pivot p*))
(if (< x pivot)
(c (cons x a) b p*)
(c a (cons x b) p*))))))))))
(define choose-pivot car)
In a real implementation, you will all the time use vectors and this is why the append will not be present, as, sorting on the place, at the end of partition, both sides will be sorted relatively one to the other. Here, we need to reassemble the 2 sides using append:
(define (quicksort lst)
(if (null? lst) '()
(if (null? (cdr lst))
lst
(let* ((pivot (choose-pivot lst)))
(partition pivot lst
(lambda (p< p>)
(append
(quicksort p<)
(quicksort p>))))))))
A test:
1 ]=> (quicksort '(1 3 5 7 9 8 6 4 2))
;Value: (1 2 3 4 5 6 7 8 9)
1 ]=> (quicksort '(1 9 3 8 5 7 7 6 9 5 8 4 6 3 4 2 2 1))
;Value: (1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9)
I used as pivot the first element of the input to split, but you can redefine the choose-pivot to select other element.
In practice, this algorithm is used in combination with other sorts -- when the input has fewer than 4-8 elements, the quicksort is not recurred any more, but other sorting is used for the lowest cases of recurrence relation.
I used directly < in the code -- you can insert it as a parameter in case you prefer a more generic procedure... In any case, the operator that you use needs to simulate the equality and different of in the same time.
UPDATE I have updated the partition, such that to consider duplicated elements. In my first version, it ignored duplicated elements.

Compare a list of numbers with a variable

The function below is intended to compare every number in a list (2nd parameter) with the first parameter and for every num in the list that is greater than the second param, count it and return the total amount of elements in the list that were greater than the 'threshold'
The code I have doesn't run because I have tried to learn how recursion in Dr. Racket works, but I can't seem to understand. I am just frustrated so just know the code below isn't supposed to be close to working; functional programming isn't my thing, haha.
(define (comp-list threshold list-nums)
(cond [(empty? list-nums) 0]
[(cons? list-nums) (let {[my-var 0]}
(map (if (> threshold (first list-nums))
threshold 2) list-nums ))]))
The following doesn't use lambda of foldl (and is recursive) - can you understand how it works?
(define (comp-list threshold list-nums)
(cond [(empty? list-nums) 0]
[else
(cond [(> (car list-nums) threshold) (+ 1 (comp-list threshold (cdr list-nums)))]
[else (comp-list threshold (cdr list-nums))])]))
Tested:
> (comp-list 1 '(1 1 2 2 3 3))
4
> (comp-list 2 '(1 1 2 2 3 3))
2
> (comp-list 3 '(1 1 2 2 3 3))
0
map takes a procedure as first argument and applied that to every element in the given list(s). Since you are counting something making a list would be wrong.
foldl takes a procedure as first argument, the starting value as second and one or more lists. It applies the procedure with the elements and the starting value (or the intermediate value) and the procedure get to decide the next intermediate value. eg. you can use it to count a list:
(define (my-length lst)
(foldl (lambda (x acc) (+ acc 1))
0
lst))
(my-length '(a b c)) ; ==> 3
You can easily change this to only count when x is greater than some threshold, just evaluate to acc to keep it unchanged when you are not increasing the value.
UPDATE
A recursive solution of my-length:
(define (my-length lst)
;; auxiliary procedure since we need
;; an extra argument for counting
(define (aux lst count)
(if (null? lst)
count
(aux (cdr lst)
(+ count 1))))
;; call auxiliary procedure
(aux lst 0))
The same alteration to the procedure to foldl have to be done with this to only count in some circumstances.
(define (comp-list threshold list-nums)
(cond
[(empty? list-nums) ; there are 0 elements over the threshold in an empty list
0]
[(cons? list-nums) ; in a constructed list, we look at the the first number
(cond
[(< threshold (first list-nums))
(+ 1 ; the first number is over
(comp-list threshold (rest list-nums))] ; add the rest
[else
(comp-list threshold (rest list-nums))])])) ; the first number is lower
A simple functional start
#lang racket
(define (comp-list threshold list-nums)
(define (my-filter-function num)
(< num threshold))
(length (filter my-filter-function list-nums)))
Replacing define with lambda
#lang racket
(define (comp-list threshold list-nums)
(length (filter (lambda (num) (< num threshold))
list-nums)))
Racket's implementation of filter
In DrRacket highlighting the name of a procedure and right clicking and selecting "jump to definition in other file" will allow review of the source code. The source code for filter is instructive:
(define (filter f list)
(unless (and (procedure? f)
(procedure-arity-includes? f 1))
(raise-argument-error 'filter "(any/c . -> . any/c)" f))
(unless (list? list)
(raise-argument-error 'filter "list?" list))
;; accumulating the result and reversing it is currently slightly
;; faster than a plain loop
(let loop ([l list] [result null])
(if (null? l)
(reverse result)
(loop (cdr l) (if (f (car l)) (cons (car l) result) result)))))

Building accumulator for lazy lists in Racket

I defined a simple lazy list of all integers from zero:
(define integers-from
(lambda (n)
(cons n
(lambda () (integers-from (+ 1 n))))))
(define lz (integers-from 0))
I also coded an accumaltor that gets a lazy list as a parameter
(define lz-lst-accumulate
(lambda (op initial lz)
(if (null? lz)
initial
(cons (op (head lz) initial)
(lambda () (lz-lst-accumulate op (op initial (head lz)) (tail lz)))))))
Does this accumaltor answer the format of lazy lists?
Here is a simple test of the accumulator:
(define acc (lz-lst-accumulate * 1 lz))
(take acc 4)
=> '(1 2 6 24)
take is a helper function that creates a list from the first n elements of a lazy list:
(define head car)
(define tail
(lambda (lz-lst)
((cdr lz-lst)) ))
(define take
(lambda (lz-lst n)
(if (= n 0)
(list)
(cons (car lz-lst)
(take (tail lz-lst) (sub1 n)))) ))
In your lz-lst-accumulate you calculate once (op (head lz) initial) and then also (op initial (head lz)). This is inconsistent; both should be the same and actually calculated only once, since it's the same value:
(define lz-lst-accumulate
(lambda (op initial lz)
(if (lz-lst-empty? lz)
initial
(let ((val (op (head lz) initial)))
(cons val
(lambda () (lz-lst-accumulate op val (tail lz))))))))
It works in your example with numbers only because you use the type-symmetrical operation *. With cons it wouldn't work.
Other than that it's OK. lz-lst-accumulate is usually known as left fold (scanl in Haskell, actually, since you produce the progression of "accumulated" values, foldl f z xs = last (scanl f z xs)).
re: your version of take, it is forcing one too many elements of a stream. Better make it
(define take
(lambda (lz n)
(if (or (<= n 0) (lz-lst-empty? lz))
(list)
(if (= n 1)
(list (car lz)) ; already forced
(cons (car lz)
(take (tail lz) (sub1 n)))))))
so that it only forces as many elements as it has to produce, and not one more (which might be e.g. divergent, like (/ 1 0), invalidating the whole calculation for no reason).
That way, the counter-example in SRFI 41 (of (take 4 (stream-map 1/ (ints-from-by 4 -1)))) will just work (it calculates (1/4 1/3 1/2 1/1) without forcing 1/0, which the usual version of take, like the one you're using, would do).

Local in scheme

I am just making up random programs and looking if I can use local. Is there a way I can use local for this? -
(define (filt n)
(filter number? n))
(define (mapn n)
(map add1 (filt n)))
(define (mapl n)
(map list (mapn n)))
(check-expect (mapl(list 1 2 false "true" "b" 'b true 4 9)) (list (list 2) (list 3) (list 5) (list 10)))
The first program makes sure the output are only numbers. The second one adds 1 to each number, and the third program creates a list of each individual element inside the list.
I tried but I get no answer and end up with a user break.
(define (mapit n)
(local [(define (filt l)
(filter number? l))]
(local [(define (mapn b)
(map add1 b))]
(mapit (map list n)))))
You want to use let.
(let ( (name value)
(othername othervalue))
expression-using-name)
So
(define (map-it n)
(let ((filt (lambda (l)
(filter number? l)))
(mpan (lambda (b)
(map add1 b))))
(mapit map list n)))
But what are you trying to accomplish here? You're not actually using mapn or filt anywhere.
I think you can just use
(define (mapl n)
(define (filt n)
(filter number? n))
(define (mapn n)
(map add1 (filt n)))
(map list (mapn n)))
Since scheme allows nested definitions.

Resources