Nearest Neighbor and majority of a list in Scheme - scheme

I'm trying to define a procedure knn that takes a positive integer k, the coordinates of a house, a distance function, and the training data, and returns a list of the (at most) k nearest neighbors to the given house in the training data. The returned list should be in non-decreasing order by proximity to the given house.
Training data is given by:
(define training-data
'((d (1 8)) (d (2 9)) (d (8 10)) (d (4 2)) (r (1 3)) (r (2 1)) (r (4 8)) (r (6 4))
(d (7 3)) (r (1 5)) (d (1 9)) (d (6 2)) (r (10 9)) (d (7 7))
(d (5 11)) (r (1 1)) (r (0 9)) (r (12 12)) (r (20 30))))
The distance function I'm using is taxicab-distance that takes two points and returns the sum of the absolute differences of their coordinates. Which is given by:
(define (taxicab-distance ls1 ls2)
(+ (abs (- (car ls1) (cadr ls1))) (abs(- (car ls2) (cadr ls2)))))
An example of what I'm trying to do would be:
~(knn 3 '(3 8) taxicab-distance training-data)
-> ((r (4 8)) (d (1 8)) (d (2 9)))
So I know it will start off as:
(define (knn k point distance data)
I know I will have to take the distance of each point so do the caadr of the training-data and then repeat on the cdr of the list to get each value, but then how to compare it to the original value and then return the whole nested list is where I get lost.
Lastly, with this I also want to define a function called majority that takes a non-empty list of labeled data and a non-empty list of labels, and returns the label that occurs most frequently. If there is more than one such label, then it doesn't matter which one your program returns.
With majority it'll take two arguments, a data set, and a list of what it's looking for. I know for training-data, it'll have to look at the caar of the data and do the same for the cdr and then count each d and r.

(import (rnrs)
(rnrs sorting)
(only (srfi :1) take))
(define (knn k point distance data)
;; Is p1 is closer to point than p2?
(define (data-distance-point< p1 p2)
(< (distance point (cadr p1))
(distance point (cadr p2))))
;; take k elements of the coordinates
;; with the shortest path to point
(take (list-sort data-distance-point< data) k))
(knn 3 '(3 8) point-distance training-data) ; ==> ((r (4 8)) (d (2 9)) (d (1 8)))
I didn't use your distance function since (0,0)->(0,10) has shorter distance than (0,5)->(0,10) which doesn't make any sense. My result is what you get if you use Pythagoras to get the distance.
The fact that point has a different structure than the elements of data is strange. I would have preferred not having cadr in data-distance-point<.

Related

Seating a matrix in another matrix in Lisp

I've got a 5x5 matrix (2D array)
#2A((C C C C C)
(C C C C C)
(C C C C C)
(C C C C C)
(C C C C C))
and a 3x3 matrix
#2A((X X X)
(X X X)
(X X X))
I'd like to seat d into the m at the index 1x1
(defvar *m* (make-array '(5 5) :initial-element 'C ))
(defvar *d* (make-array '(3 3)
:displaced-to *m*
:displaced-index-offset (array-row-major-index *m* 1 1)))
(dotimes (i 3)
(dotimes (j 3)
(setf (aref *d* i j) 'X)))
OUTPUT
#2A((C C C C C)
(C X X X X)
(X X X X X)
(C C C C C)
(C C C C C))
THE DESIRED OUTPUT
#2A((C C C C C)
(C X X X C)
(C X X X C)
(C X X X C)
(C C C C C))
How to arrange the code as practically as possible, to get the desired output? The code doesn't need to use the 2d arrays or its functions. Actually it's preferable not use the 2d arrays at all because for this particular problem, they seem unfit. I've used the 2d arrays solely because I thought they would have the appropriate features to solve this problem. Any other ways to solve the problem are welcome.
System: CLisp on Windows
You can't do this with displaced arrays. The reason you can't do it is because arrays are (treated as) contiguous chunks of memory (so, one-dimensional objects), and displaced arrays are are displaced to a given index of another array considered as a contiguous chunk of memory and are themselves then also a contiguous objects in memory which share some of the storage of the array they are displaced to. And the array you want is not contiguous within the one it is displaced to.
To see this, here are two functions: make-self-indexed-array makes a two-dimensional array whose elements are lists of the indices of themselves. make-rma-displaced-array then makes a 1-dimensional array which is displaced to another array:
(defun make-self-indexed-array (r c)
(let ((a (make-array (list r c))))
(dotimes (row r a)
(dotimes (col c)
(setf (aref a row col) (list row col))))))
(defun make-rma-displaced-array (array)
(make-array (array-total-size array)
:displaced-to array))
Then look at this:
> (let* ((a (make-self-indexed-array 5 5))
(b (make-rma-array a)))
(pprint a)
(pprint b))
#2A(((0 0) (0 1) (0 2) (0 3) (0 4))
((1 0) (1 1) (1 2) (1 3) (1 4))
((2 0) (2 1) (2 2) (2 3) (2 4))
((3 0) (3 1) (3 2) (3 3) (3 4))
((4 0) (4 1) (4 2) (4 3) (4 4)))
#((0 0) (0 1) (0 2) (0 3) (0 4) (1 0) (1 1) (1 2) (1 3) (1 4) (2 0) (2 1)
(2 2) (2 3) (2 4) (3 0) (3 1) (3 2) (3 3) (3 4) (4 0) (4 1) (4 2) (4 3)
(4 4))
You can see that the 'displaced' array you want simply is not contiguous within the original array.
To do what you want you'd need to construct some kind of object which does the appropriate index calculations to address its parent in memory. Here is a very rudimentary approach to doing that (the link mentioned in a comment is probably much more general). This
only deals with 2d-arrays;
will likely have terrible performance;
but shows you the sort of thing you need to do.
In real life you can generalise this to arrays of arbitrary shape, doing some cleverness to calculate, effectively, the row-major-aref index you need.
(defclass 2d-subarray ()
;; 2d subarrays of other 2d arrays
((parent :reader subarray-parent
:initarg :parent
:initform (error "need a parent"))
(row-offset :initform 0
:initarg :row-offset
:reader subarray-row-offset)
(column-offset :initform 0
:initarg :column-offset
:reader subarray-column-offset)
(rows :initform 0
:initarg :rows
:reader subarray-rows)
(columns :initform 0
:initarg :columns
:reader subarray-columns)))
(defmethod initialize-instance :after ((subarray 2d-subarray) &key)
;; do at least some sanity checks
(let ((p (subarray-parent subarray)))
(unless (= (length (array-dimensions p)) 2)
(error "parent not 2d"))
(unless (<= (+ (subarray-row-offset subarray)
(subarray-rows subarray))
(array-dimension p 0))
(error "subarray has too many rows"))
(unless (<= (+ (subarray-column-offset subarray)
(subarray-columns subarray))
(array-dimension p 1))
(error "subarray has too many columns"))))
(defgeneric subarray-aref (subarray &rest indices))
(defgeneric (setf subarray-aref) (new subarray &rest indices))
(defmethod subarray-aref ((subarray 2d-subarray) &rest indices)
(declare (dynamic-extent indices))
(destructuring-bind (row column) indices
(assert (and (<= 0 row (subarray-rows subarray))
(<= 0 column (subarray-columns subarray)))
(row column) "indices out of range")
(aref (subarray-parent subarray)
(+ row (subarray-row-offset subarray))
(+ column (subarray-column-offset subarray)))))
(defmethod (setf subarray-aref) (new (subarray 2d-subarray) &rest indices)
(declare (dynamic-extent indices))
(destructuring-bind (row column) indices
(assert (and (<= 0 row (subarray-rows subarray))
(<= 0 column (subarray-columns subarray)))
(row column) "indices out of range")
(setf (aref (subarray-parent subarray)
(+ row (subarray-row-offset subarray))
(+ column (subarray-column-offset subarray)))
new)))

Weird Common Lisp intersection behaviour

I'm trying to get the common elements of two lists.
I've tried both the available intersection function and one I implemented myself, both giving the same weird result when trying to test them on lists like (a a ... a) and (a b c d ... z).
Whenever the first list contains only the same element several times and the second list begins with that element the result is the first list.
For example: (intersection '(2 2 2 2) '(2 2 2 3)) returns (2 2 2 2)
The intersection I implemented:
(defun presentp (a l)
(cond ((null l) nil)
((and (atom (car l)) (equal a (car l))) t)
((not (atom (car l))) (presentp a (car l)))
(t (presentp a (cdr l)))))
(defun intersectionp (a b)
(cond ((not (and a b)) nil)
((presentp (car a) b) (append (list (car a)) (intersection (cdr a) b)))
(t (intersection (cdr a) b))))
How can I get a good result on lists of that type? For example I want (2 2 2) from (intersection '(2 2 2 2) '(2 2 2 3)).
You need to remove matches from the b list.. When you found an 2 in (2 2 2 3) you should continue with (2 2 3) as b.
Also.. (append (list x) result-list) is the same as (cons x result-list) just with the same or fewer CPU cylces.
(defun intersection (a b)
(cond ((not (and a b)) nil)
((presentp (car a) b)
(cons (car a)
(intersection (cdr a)
(remove (car a) b :count 1))))
(t (intersection (cdr a) b))))
There's already an accepted answer, but I want to point out that the answer the implementation provides, where
(cl:intersection '(2 2 2 2) '(2 2 2 3))
;=> (2 2 2 2)
is correct. It's important to recognize that the intersection, nintersection, etc., are intended for use with lists that are being treated as sets. Conceptually, a set has no duplicate elements (for that you'd need a multiset), so the lists (2), (2 2), (2 2 2), etc., all represent the same set, {2}.
14.1.2.2 Lists as Sets
Lists are sometimes viewed as sets by considering their elements
unordered and by assuming there is no duplication of elements.
adjoin nset-difference set-difference union
intersection nset-exclusive-or set-exclusive-or
nintersection nunion subsetp
Figure 14-5. Some defined names related to sets.
Now, that bit about "assuming there is no duplication of elements" actually means that you probably shouldn't be using the set functions with a list like (2 2 2 2), since there's obvious duplication of elements. Even so, if you posit that lists like (2 2 2) and (2 2 2 2) represent the same set, you can see that intersection is actually giving you the correct set back. I think that the specification actually mandates that the result will have three or four elements. From the HyperSpec entry on intersection:
The intersection operation is described as follows. For all possible
ordered pairs consisting of one element from list-1 and one element
from list-2, :test or :test-not are used to determine whether they
satisfy the test. The first argument to the :test or :test-not
function is an element of list-1; the second argument is an element of
list-2. If :test or :test-not is not supplied, eql is used. It is an
error if :test and :test-not are supplied in the same function call. …
For every pair that satifies the test, exactly one of the two elements
of the pair will be put in the result. No element from either list
appears in the result that does not satisfy the test for an element
from the other list. If one of the lists contains duplicate elements,
there may be duplication in the result.
So, in the case of (2 2 2 2) and (2 2 2 3), there are 16 pairs to consider:
(2 2) (2 2) (2 2) (2 3) ; first element is first 2 from list-1, second elements are from list-2
(2 2) (2 2) (2 2) (2 3) ; first element is second 2 from list-1, second elements are from list-2
(2 2) (2 2) (2 2) (2 3) ; first element is third 2 from list-1, second elements are from list-2
(2 2) (2 2) (2 2) (2 3) ; first element is fourth 2 from list-1, second elements are from list-2
Since "For every pair that satifies the test, exactly one of the two elements of the pair will be put in the result," it seems to me that you're going to end up with between 3 and 4 2's in the result, because you've got 12 pairs that satisfy the test, and you need to cover each row and column of those 12 pairs. This hinges, I suppose, on the interpretation of "exactly one of the two elements of the pair will be put in the result". In general though, if you have, e.g., lists-as-sets (a1 a2) and (b1 b2 b3) then you have the pairs:
(a1 b1) (a1 b2) (a1 b3)
(a2 b1) (a2 b2) (a2 b3)
I think that the spec should be read as saying that each ai and bi will be included at most once, and that you never include a given ai and bi based on the particular pair (ai bi). So, if from row one you were to select (a1 b2) and include b2 in the result, then the remaining pairs that could contribute elements to the result are
(a1 b1) (a1 b3)
(a2 b1) (a2 b3)
if you had taken a1 from (a1 b2), then the remaining pairs would be
(a2 b1) (a2 b2) (a2 b3)
That is, when you include an element from one of the pairs, you've either removed a row or a column from the table of pairs that determine the possible results. In the first case, you could still add two more elements to the result, but in the second, there could be three.
In fact, in LispWorks, if you reverse the order of the arguments, you'll get the 3 element version:
CL-USER 5 > (intersection '(2 2 2 3) '(2 2 2 2))
(2 2 2)
There is no guarantee that the order of elements in the result will
reflect the ordering of the arguments in any particular way. The
result list may share cells with, or be eq to, either list-1 or list-2
if appropriate.
You didn't mention whether you're just getting an equivalent list back, or if you're actually getting list-1 back. In Lispworks, it seems that you're actually getting the same list back, although that's not required:
CL-USER 2 > (let ((l1 '(2 2 2 2))
(l2 '(2 2 2 3)))
(eq l1 (intersection l1 l2)))
T
Here is mine that works well. I used remove to remove duplicating symbols.
(defun my-intersection (x y)
(cond ((or (null x) (null y)) nil)
((find (first x) y) (cons (first x)
(my-intersection (remove (first x) x) y)))
(t (my-intersection (rest x) y))))

List length comparison

I wanted to write the code for comparing the size of two lists. I made use of the length and wrote this down.
(define (same-size-matrix? mtrx1 mtrx2)
(equal? (length mtrx1) (length mtrx2))).
I thought this was going to work for me, but I found out it only checks the overall length, not the sublist. For example it returns true when it compares for. '((1 2 3 4) (4 5 6 6) (6 7 8 9)) and '(( 5 4) (3 2) (7 1)), but it's supposed to return false, because the first has 4 values within the list and the second has only two even though they both overally have same length. How do I go about this. Any help would be appreciated.
Try this instead:
(define (same-size-matrix? mtrx1 mtrx2)
(equal? (map length mtrx1) (map length mtrx2)))
Notice that in your solution you're comparing the total length of each list (the number of rows in the matrix), but ignoring the length of each sublist (the number of columns for each row in the matrix). In my soultion, first we calculate the length of each sublist and after that we check if all the lengths are equal. For example, take this input:
(define mtrx1 '((1 2 3 4) (4 5 6 6) (6 7 8 9)))
(define mtrx2 '((5 4) (3 2) (7 1)))
(same-size-matrix? mtrx1 mtrx2)
First the same-size-matrix? evaluates this expression, which finds the length of each sublist in mtrx1. It's necessary to check all the lengths, not just the first one, in case we're dealing with a jagged array:
(map length mtrx1)
; evaluates to '(4 4 4)
And then we have this expression, which performs the same operation for mtrx2:
(map length mtrx2)
; evaluates to '(2 2 2)
Finally, we compare the two lists of lengths (in fact: the number of columns per row), returning the expected result:
(equal? '(4 4 4) '(2 2 2))
> #f
Notice that the last comparison will also detect if the lists are of different size, in case the matrices have a different number of rows.
is it scheme?
(define m1 `((1 2 3 4) (4 5 6 6 ) (6 7 8 9)))
(define m2 `((5 4) (3 2) (7 1)))
(define (same-size-matrix? m1 m2) (equal? (map length m1) (map length m2)))
(same-size-matrix? m1 m2) ; => #f
(same-size-matrix? m1 m1) ; => #t
Here is a simple definition of same-size?.
#lang racket
; A MATRIX is a list of ROWs.
; A ROW is a list of numbers.
; In a matrix all rows are of the same length.
(define (row-size list-of-rows)
(length list-of-rows))
(define (column-size matrix)
(define first-row (first matrix))
(length first-row))
(define (same-size? matrix1 matrix2)
(and (= (row-size matrix1) (row-size matrix2))
(= (column-size matrix1) (column-size matrix2))))
As a bonus here is a predicate that test whether an object
is a matrix or not. Compare it to the data definitions.
(define (row? object)
(and (list? object)
(andmap number? object)))
(define (matrix? object)
(and (list? object)
(andmap row? object)
(apply = (map row-size object))))
You need to clarify if you want to check 1) the exact shape of the matrix or 2) the overall 'flattened' length.
what should be the result for (same-size-matrix? '((1 2) (3 4) (5 6)) '((1 2 3) (4 5 6)))?
1) => #f
2) => #t
Óscar López's answer is for 1.
If your requirement is 2, based on Óscar's answer:
(define (same-size-matrix? mtrx1 mtrx2)
(equal? (apply + (map length mtrx1)) (apply + (map length mtrx2))))

Writing a simple matrix class in common lisp for practice

Common lisp newbie. Writing lisp code is quite different from writing c++/java, as I wrote them before.
I am trying to write a simple matrix class in common lisp for practice. Some codes like that:
(defun make-matrix (row col)
(make-list row :initial-element (make-list col :initial-element nil)))
(defun init-matrix (matrix init-value)
(labels ((set-element-value (lst)
(if (and lst
(listp lst))
(mapcar #'set-element-value lst)
(setf lst init-value))))
(set-element-value matrix)))
(defun matrix+ (&rest matrices)
(apply #'mapcar (lambda (&rest rows)
(apply #'mapcar #'+ rows)) matrices))
My question is can I write a matrix+ accepting different number of arguments without 'apply', or in a better way ? In a way that lisp should be?
And how about the matrix*, can somebody show me some awesome code accepting arbitrary number of arguments in matrix* ? Thanks.
Common Lisp has n-dimensional arrays. I would use those for matrix operations.
See: MAKE-ARRAY, AREF, ...
Typically I would also then write a binary (taking two arguments) matrix operation. Use then REDUCE to operate over a list of matrices.
CL-USER > (make-array '(3 5) :initial-element 0)
#2A((0 0 0 0 0) (0 0 0 0 0) (0 0 0 0 0))
Above creates a 2-dimensional array of size 3x5 with 0 as initial content.
Matrix multiplication. I can't promise this is the best example possible, but it is really straight-forward. This is given you use arrays rather than lists. Also, of course, you can optimize for square matrices, or special cases, like identity matrices etc. But this is meant only to be simple, not efficient etc.
(defun matrix* (&rest matrices)
(assert (cdr matrices) nil
"You will achieve nothing by multiplying a single matrix.")
(reduce
#'(lambda (a b)
(assert (= (array-dimension a 0) (array-dimension b 1)) nil
"The number of rows in the first matrix should be the number ~
of columns in the second matrix")
(let ((result
(make-array
(list (array-dimension a 1) (array-dimension b 0))
:initial-element 0)))
(dotimes (i (array-dimension a 1) result)
(dotimes (j (array-dimension b 0))
(dotimes (k (array-dimension a 0))
(incf (aref result i j) (* (aref a k i) (aref b j k))))))))
matrices))
(format t "result: ~s~&" (matrix* #2A((1 2) (3 4)) #2A((5 6) (7 8))))
;; #2A((23 31) (34 46)) =
;; (1 * 5 + 3 * 6 = 23) (1 * 7 + 3 * 8 = 31)
;; (2 * 5 + 4 * 6 = 34) (2 * 7 + 4 * 8 = 46)

Finding arbitrary length subsets of elements of a list

disclaimer: I'm pretty sure I've managed to muck up something really simple, possibly because I've been poking at this in between
"real work" while waiting for some slow C++ builds, so my head's not
in the right place.
In looking at
What's the most efficient way of generating all possible combinations of skyrim (PC Game) potions? I had the naïve notion that it would be a really, really simple recursive filter in Lisp to generate all combinations of size "n." The answer given there, in R, is elegant and shows off the language well, but that combn(list,n) method caught my attention. ( http://stat.ethz.ch/R-manual/R-patched/library/utils/html/combn.html )
(defun combn (list n)
(cond ((= n 0) nil)
((null list) nil)
((= n 1) (mapcar #'list list))
(t (mapcar #'(lambda (subset) (cons (car list) subset))
(combn (cdr list) (1- n))))))
(combn '(1 2 3 4 5 6 7 8 9) 3)
((1 2 3) (1 2 4) (1 2 5) (1 2 6) (1 2 7) (1 2 8) (1 2 9))
Except, this just returns the first set of combinations … I can't wrap my head around what's wrong, precisely. It seems that the (= n 1) case works right, but the t case should be doing something differently, such as stripping (1 2) off the list and repeating?
So, my attempt to fix it, got nastier:
(defun combn (list n)
(cond ((= n 0) nil) ((= n 1) (mapcar #'list list))
((null list) nil)
(t (cons (mapcar #'(lambda (subset) (cons (car list) subset))
(combn (cdr list) (1- n)))
(combn (cdr list) n)))))
which is wrong at the point of (t cons(… I think. But, if cons is the wrong answer, I'm not sure what is right…? (Reduced to using 2 to demonstrate output…)
(combn '(1 2 3 4 5 6 7 8 9) 2)
(((1 2) (1 3) (1 4) (1 5) (1 6) (1 7) (1 8) (1 9))
((2 3) (2 4) (2 5) (2 6) (2 7) (2 8) (2 9))
((3 4) (3 5) (3 6) (3 7) (3 8) (3 9))
((4 5) (4 6) (4 7) (4 8) (4 9))
((5 6) (5 7) (5 8) (5 9))
((6 7) (6 8) (6 9))
((7 8) (7 9))
((8 9))
NIL)
… which appears to be right, except for the extraneous nesting and the bonus NIL at the end. (I had anticipated that ((null list) nil) would have filtered that out?)
What did I do wrong? :-(
(And, also, is there a standard routine for doing this more efficiently?)
Yes, the cons is not the right thing, you need an append. And that's also what gets you the NIL at the end. I can't write Lisp, so I'll give you Haskell:
comb :: Int -> [a] -> [[a]]
comb 0 _ = [[]]
comb k (x:xs) = [x:ys | ys <- comb (k-1) xs] ++ comb k xs
comb _ _ = []
That's short and sweet, but inefficient (and doesn't check for negative k). It will often try to choose more elements than the list has for a long time. To prevent that, one would keep track of how many elements are still available.
comb :: Int -> [a] -> [[a]]
comb 0 _ = [[]]
comb k xs
| k < 0 = []
| k > len = []
| k == len = [xs]
| otherwise = go len k xs
where
len = length xs
go l j ys
| j == 1 = map (:[]) ys
| l == j = [ys]
| otherwise = case ys of
(z:zs) -> [z:ws | ws <- go (l-1) (j-1) zs] ++ go (l-1) j zs
Ugly, but efficient.
A solution using Common Lisp.
Note that this version intentionally uses assert to give you a continuable error if the list passed in isn't evenly divisible by the specified number but it'd be easy enough to have it just place any "leftover" items in a shorter list at the end, or use error to just make it bail completely without possibility of interactive fixing.
Based on scheme's srfi-1, tweaked to CL by me, and improved greatly by Rainer Joswig
(defun split-by (list n &aux length)
"splits a list into multiple lists of length n.
Parameters:
* list - the list to be split
* n - the size of the lists it should be broken into.
Returns:
A list of smaller lists of the specified length (or signals an error).
Examples:
(split-by '(1 2 3 4) 2) ; => ((1 2) (3 4))
(split-by '(1 2 3) 2) ; => not evenly divisible"
(assert (zerop (mod (setf length (length list)) n))
(list)
"list is not evenly divisible by ~A: ~A" n list)
(if (plusp length)
(cons (subseq list 0 n)
(split-by (subseq list n) n))
'()))

Resources