Issue with sklearn k nearest neighbors - algorithm

Is there a way to force sklearn's NearestNeighbors algorithm to take into account the order of the points in the input array when there are duplicate points?
To illustrate:
>>> from sklearn.neighbors import NearestNeighbors
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
>>> distances, indices = nbrs.kneighbors(X)
>>> indices
array([[0, 1],
       [1, 0],
       [2, 1],
       [3, 4],
       [4, 3],
       [5, 4]])
Because the query set matches the training set, the nearest neighbor of each point is the point itself, at a distance of zero. If, however, I allow duplicate points in X, the algorithm understandably does not distinguish between the duplicates:
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2], [-1, -1], [-1, -1]])
>>> nbrs = NearestNeighbors(n_neighbors=2, algorithm='auto').fit(X)
>>> distances, indices = nbrs.kneighbors(X)
>>> indices
array([[6, 0],
       [1, 0],
       [2, 1],
       [3, 4],
       [4, 3],
       [5, 4],
       [6, 0],
       [6, 0]])
Ideally, I would like the last output to be something like:
array([[0, 6],
       [1, 0],
       [2, 1],
       [3, 4],
       [4, 3],
       [5, 4],
       [6, 0],
       [7, 6]])

I don't think you can do that with NearestNeighbors itself; the documentation warns:
Warning: Regarding the Nearest Neighbors algorithms, if two neighbors,
neighbor k+1 and k, have identical distances but different labels, the
results will depend on the ordering of the training data.
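If a deterministic order among tied points is all that is needed, one option is to break the ties yourself. The following is a minimal brute-force sketch in plain Python, not sklearn's API: the function name `ordered_neighbors` is hypothetical, and the tie-breaking rule (the query point first, then the lowest input index) is an assumption about what "taking the order into account" should mean. It compares every pair of points, so it only suits small inputs.

```python
from math import dist


def ordered_neighbors(points, k=2):
    """Brute-force k nearest neighbours with deterministic tie-breaking.

    Ties in distance are resolved by putting the query point itself first
    and then preferring the lowest index in the input (an assumed rule).
    """
    result = []
    for i, p in enumerate(points):
        # Sort key: distance, then "is not the query point" (False sorts
        # before True), then position in the input list.
        order = sorted(
            range(len(points)),
            key=lambda j: (dist(p, points[j]), j != i, j),
        )
        result.append(order[:k])
    return result


X = [(-1, -1), (-2, -1), (-3, -2), (1, 1), (2, 1), (3, 2), (-1, -1), (-1, -1)]
indices = ordered_neighbors(X)
print(indices)
# [[0, 6], [1, 0], [2, 1], [3, 4], [4, 3], [5, 4], [6, 0], [7, 0]]
```

Under this rule the last row comes out as [7, 0] rather than the [7, 6] in the question, because index 0 is the earliest duplicate; a different secondary key would give a different but equally deterministic order.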

Related

Is there an algorithm that solves the following problem in less than O(n!) time?

Is there an algorithm that solves the following problem in less than O(n!) time, for example in polynomial time?
Or is this a problem, like the NP problems, for which nobody has found a polynomial-time algorithm?
Input: n (number of elements)
Output: a list of all combinations of two elements, ordered so that, reading from the top of the list, each consecutive group of n/2 pairs contains every element.
Example 1
Input: n=4
Output:
[0, 1], [2, 3],
[0, 2], [1, 3],
[0, 3], [1, 2]
Example 2
Input: n=8
Output:
[0, 1], [2, 3], [4, 5], [6, 7],
[0, 2], [1, 3], [4, 6], [5, 7],
[0, 3], [1, 2], [4, 7], [5, 6],
[0, 4], [1, 5], [2, 6], [3, 7],
[0, 5], [1, 4], [2, 7], [3, 6],
[0, 6], [1, 7], [2, 4], [3, 5],
[0, 7], [1, 6], [2, 5], [3, 4]
P.S.
The following answer does not meet the requirements.
The first two (= n/2) pairs ([0, 1], [0, 2]) do not have "3", so the answer does not meet the condition where "0" and "1", "2", "3" must be in the first two pairs.
>>> n=4
>>> for i in range(0, n-1):
...     for j in range(i+1,n):
...         print( [i, j] )
...
[0, 1]
[0, 2]
[0, 3]
[1, 2]
[1, 3]
[2, 3]
As I said in my comment, this appears to be a type of (relaxed) Sports League Scheduling problem. If I understand what you are asking for, it can be summarized as follows:
Given a positive even integer N (written n below), generate a set of n/2 "rounds" of pairings with the following qualities:
A pairing is a pair of two different integers [a, b] such that a and b are integers from 0..n-1 and a < b.
A round consists of n/2 pairings, such that every element from 0..n-1 appears exactly once in a pairing in the round, and
All pairings are unique across all rounds (that is no pairing ever appears more than once in the complete solution).
Assuming that this is a correct formulation of your problem, then the answer is
Yes, this can be done in O(n^2).
Further, not only can it be done, there exists a simple method to solve it for any even N:
For the first round, make n/2 pairs, filling in the first element of the pairs with the integers from 0 to (n/2)-1 going left-to-right. This is how it would look for N=8:
[0, ], [1, ], [2, ], [3, ]
Then, fill in the second elements with (n/2) to n-1, but going right-to-left:
[0, 7], [1, 6], [2, 5], [3, 4]
This completes your first round.
For the next round, copy the first round, but keeping 0 in the same place, move the remaining left-side elements up the list, and the right-side elements down the list. When an element reaches the end of the list, reverse direction and swap them from first elements to second elements (or vice-versa):
----------------------->
[0, 7], [1, 6], [2, 5], [3, 4]
<-----------------------
Becomes
----------------------->
[0, 6], [7, 5], [1, 4], [2, 3]
<-----------------------
Now you just continue this process until you have N/2 rounds:
[0, 7], [1, 6], [2, 5], [3, 4]
[0, 6], [7, 5], [1, 4], [2, 3]
[0, 5], [6, 4], [7, 3], [1, 2]
[0, 4], [5, 3], [6, 2], [7, 1]
Finally swap any pairings where the first element happens to be greater than the second:
[0, 7], [1, 6], [2, 5], [3, 4]
[0, 6], [5, 7], [1, 4], [2, 3]
[0, 5], [4, 6], [3, 7], [1, 2]
[0, 4], [3, 5], [2, 6], [1, 7]
If you check this solution you will find that it fulfills all of the constraints. This solution works for any even value of N and runs in O(n^2) time, since it writes n/2 rounds of n/2 pairings each and every pairing takes constant work.
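The rotation described above can be sketched in a few lines of Python. This is an illustrative rendering rather than the answerer's own code: the name `round_robin` and its `rounds` parameter are hypothetical. The answer stops after n/2 rounds; the sketch defaults to n-1 rounds, which is the point at which every possible pair has appeared once, as in the question's examples.

```python
def round_robin(n, rounds=None):
    """Pair up 0..n-1 by keeping 0 fixed and rotating the other elements."""
    if n < 2 or n % 2:
        raise ValueError("n must be a positive even integer")
    if rounds is None:
        rounds = n - 1  # after n-1 rounds every pair has appeared once
    others = list(range(1, n))  # everything except the fixed element 0
    schedule = []
    for _ in range(rounds):
        circle = [0] + others
        # Position i on the left is paired with position n-1-i on the right;
        # sorted() puts the smaller element first, as the final step requires.
        schedule.append(
            [sorted((circle[i], circle[n - 1 - i])) for i in range(n // 2)]
        )
        others = others[-1:] + others[:-1]  # rotate by one place
    return schedule


for pairings in round_robin(8, rounds=4):
    print(pairings)
# [[0, 7], [1, 6], [2, 5], [3, 4]]
# [[0, 6], [5, 7], [1, 4], [2, 3]]
# [[0, 5], [4, 6], [3, 7], [1, 2]]
# [[0, 4], [3, 5], [2, 6], [1, 7]]
```

Calling `round_robin(8)` without `rounds` yields 7 rounds covering all 28 pairs, still in O(n^2) total work.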
Yes, this problem can be solved in quadratic time. It is not too hard to explicitly construct these pairings.
It is quite helpful to consider a regular (n-1)-gon with one additional point in the middle. For each of the n-1 vertices, take the line through that vertex and the midpoint: pair that vertex with the midpoint, and pair every remaining vertex with its mirror image across the line. Each of the n-1 lines gives one round.

Ruby array product with asterisk

I was studying how to list out all divisors of a number and came across this solution by Marc-Andre here. In his solution, there is one part of the code which does something like this:
array.product(*arrays_of_array) # the asterisk seems to have done something
I played around with it in irb but couldn't make sense of the output. I tried:
a=[0,1,2]
b=[3,4]
c=[[5,6],[7,8]]
I understand that array.product(other_array) lists all combinations of the elements of the two arrays. With this in mind, I ran several experiments:
a.product(b) => [[0, 3], [0, 4], [1, 3], [1, 4], [2, 3], [2, 4]] / 6 elements
a.product(*b) => TypeError: no implicit conversion of Fixnum into Array
a.product(c) => [[0, [5, 6]], [0, [7, 8]], [1, [5, 6]], [1, [7, 8]], [2, [5, 6]], [2, [7, 8]]] / 6 elements
a.product(*c) => [[0, 5, 7], [0, 5, 8], [0, 6, 7], [0, 6, 8], [1, 5, 7], [1, 5, 8], [1, 6, 7], [1, 6, 8], [2, 5, 7], [2, 5, 8], [2, 6, 7], [2, 6, 8]]
From observation, it seems the asterisk (*) has to be applied to a multi-dimensional array (i.e. a matrix). Without the asterisk, product returns 6 elements and the combinations go only one level deep. With the asterisk, the combinations go one level deeper and product returns 12 elements, combining until there is no array left within the combinations. Where can I find more examples to study this behaviour of the asterisk?
Edit:
I tried to introduce one more variable
d=[[[9,0],[1,2]],[[3,4],[5,6]]]
a.product(*d) => [[0, [9, 0], [3, 4]], [0, [9, 0], [5, 6]], [0, [1, 2], [3, 4]], [0, [1, 2], [5, 6]], [1, [9, 0], [3, 4]], [1, [9, 0], [5, 6]], [1, [1, 2], [3, 4]], [1, [1, 2], [5, 6]], [2, [9, 0], [3, 4]], [2, [9, 0], [5, 6]], [2, [1, 2], [3, 4]], [2, [1, 2], [5, 6]]]
So the asterisk only makes it go one level deeper.
In the context of finding the list of divisors, can anyone explain exactly what the code does?
require 'prime'
def factors_of(number)
  primes, powers = number.prime_division.transpose
  exponents = powers.map{|i| (0..i).to_a}
  divisors = exponents.shift.product(*exponents).map do |powers|
    primes.zip(powers).map{|prime, power| prime ** power}.inject(:*)
  end
  divisors.sort.map{|div| [div, number / div]}
end
p factors_of(4800) # => [[1, 4800], [2, 2400], ..., [4800, 1]]
The * (splat) operator expands a collection into individual arguments.
In your example, with b = [3,4],
a.product(*b)
is equivalent to
a.product(3, 4)
which generates an error because Array#product expects an Array as argument, not two integers.
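For comparison, Python's `*` argument unpacking behaves much like the Ruby splat, so the same experiments can be replayed with `itertools.product`. This is only a rough analogue offered to illustrate the idea, not a translation of Marc-Andre's code: the helper `divisors` is hypothetical, and the factorization of 4800 is typed in by hand where the Ruby version calls `prime_division`.

```python
from itertools import product
from math import prod

a = [0, 1, 2]
c = [[5, 6], [7, 8]]

# Without unpacking, c is ONE extra iterable whose items are the inner lists.
without_star = [list(t) for t in product(a, c)]
print(len(without_star), without_star[0])  # 6 [0, [5, 6]]

# With unpacking, product(a, *c) is the same call as product(a, [5, 6], [7, 8]).
with_star = [list(t) for t in product(a, *c)]
print(len(with_star), with_star[0])  # 12 [0, 5, 7]


def divisors(factorization):
    """All divisors of a number given as [(prime, power), ...]."""
    primes = [p for p, _ in factorization]
    # One range of candidate exponents per prime: 0..power inclusive.
    exponent_ranges = [range(power + 1) for _, power in factorization]
    # Unpacking passes each range as its own argument, so product yields
    # every combination of exponents, one exponent per prime.
    return sorted(
        prod(p ** e for p, e in zip(primes, exponents))
        for exponents in product(*exponent_ranges)
    )


# Assumption of this sketch: 4800 = 2**6 * 3 * 5**2, written out by hand.
divs = divisors([(2, 6), (3, 1), (5, 2)])
print(len(divs), divs[:5], divs[-1])  # 42 [1, 2, 3, 4, 5] 4800
```

This is why the Ruby code splats `exponents`: each prime contributes its own list of possible exponents, and each of those lists must arrive as a separate argument for product to combine them.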

Find efficient maximum of groupings in all sets

I have a 2D array and want to generate a 3D array that will show the most efficient groupings for all sets. Example:
[[1, 2],
[1, 2, 3, 4],
[3, 4],
[1, 2, 5]]
Result:
[[[1, 2]],
[[1, 2], [3, 4]],
[[3, 4]],
[[1, 2], [5]]]
I think I would need to do a nested loop and determine the intersection and differences to generate the 3D array. However, inject(&:&) seems like it might solve it more elegantly, though I'm a bit new to inject and unsure how to implement it for this problem. This is to be done in Ruby.
Any help is appreciated. Thanks!
--Update--
By efficient groupings I mean the combination that produces the smallest total number of sets in the result, found by reusing the largest sets that are duplicated across rows.
Another example:
[[1, 2, 3, 4],
[1, 4],
[1, 3, 4],
[1, 2, 3, 4, 5],
[2, 5]]
Possible Result (8 total sets):
[[[1, 3, 4], [2]],
[[1, 4]],
[[1, 4], [3]],
[[1, 3, 4], [2, 5]],
[[2, 5]]]
This is a good result, but the first set could be optimized.
Better Result (7 total sets):
[[[1, 2, 3, 4]],
[[1, 4]],
[[1, 4], [3]],
[[1, 2, 3, 4], [5]],
[[2, 5]]]
Both results contain a total of 5 unique sets. The sets in the better result are (1, 2, 3, 4), (1, 4), (3), (5), and (2, 5). The total number of sets in the better result is 7, as opposed to 8 in the possible result. We want the fewest sets.
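To make the objective concrete, the two counts used above can be computed mechanically. The sketch below is in Python rather than Ruby and only scores a candidate result; it does not search for the best grouping. The function name `score` is hypothetical, and it assumes each row of the result must split the matching input row exactly.

```python
def score(grouping, original):
    """Return (total sets, unique sets) for a candidate grouping."""
    if len(grouping) != len(original):
        raise ValueError("grouping and original must have the same row count")
    for groups, row in zip(grouping, original):
        # Each row's groups must use every element of the row exactly once.
        flat = [x for group in groups for x in group]
        if sorted(flat) != sorted(row):
            raise ValueError(f"{groups} does not partition {row}")
    total = sum(len(groups) for groups in grouping)
    unique = {frozenset(group) for groups in grouping for group in groups}
    return total, len(unique)


original = [[1, 2, 3, 4], [1, 4], [1, 3, 4], [1, 2, 3, 4, 5], [2, 5]]
possible = [[[1, 3, 4], [2]], [[1, 4]], [[1, 4], [3]],
            [[1, 3, 4], [2, 5]], [[2, 5]]]
better = [[[1, 2, 3, 4]], [[1, 4]], [[1, 4], [3]],
          [[1, 2, 3, 4], [5]], [[2, 5]]]

print(score(possible, original))  # (8, 5)
print(score(better, original))    # (7, 5)
```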
You definitely should explain what "the most efficient grouping" means. In the meantime, if you just need to split arrays into 2-element chunks, just combine map and each_slice:
arr.map{|a| a.each_slice(2).to_a}
# => [[[1, 2]], [[1, 2], [3, 4]], [[3, 4]], [[1, 2], [5]]]
First you should figure out the algorithm in pseudocode for what you want to do (it's not the Ruby that's holding you back right now). Then use these Ruby primitives to make it happen:
http://ruby-doc.org/core/classes/Enumerable.html
http://www.ruby-doc.org/core/classes/Array.html
http://corelib.rubyonrails.org/classes/Set.html

Why "..." appears in my answer of matrix in Prolog

I wrote a little code for creating a matrix of coordinates (like a chessboard); it's the following:
createMatrix(N,M,R) :- creaMatriu(N,M,A), reversed(R,A).
creaMatriu(N,0,[T]) :- creafila(N,0,T),!.
creaMatriu(N,M,[T|C]) :- creafila(N,M,T), M1 is M-1, creaMatriu(N,M1,C).
creafila(0,M,[[M,0]]):-!.
creafila(N,M,[[M,N]|C]) :-N1 is N-1,creafila(N1,M,C).
reversed(A, B) :- reversed(B, [], A).
reversed([A|B], C, D) :- reverse(N,A),reversed(B, [N|C], D).
reversed([], A, A).
It went well the first time I ran it, but when I increased the dimensions of the matrix, "dots" began to appear at the end of the matrix, hiding one more coordinate each time the dimensions grew, like this:
?- createMatrix(1,1,R).
R = [[[0, 0], [0, 1]], [[1, 0], [1, 1]]] .
?- createMatrix(2,1,R).
R = [[[0, 0], [0, 1], [0, 2]], [[1, 0], [1, 1], [1, 2]]] .
?- createMatrix(2,2,R).
R = [[[0, 0], [0, 1], [0, 2]], [[1, 0], [1, 1], [1, 2]], [[2, 0], [2, 1], [2, 2]]] .
?- createMatrix(3,2,R).
R = [[[0, 0], [0, 1], [0, 2], [0, 3]], [[1, 0], [1, 1], [1, 2], [1, 3]], [[2, 0], [2, 1], [2, 2], [2, 3]]] .
?- createMatrix(3,3,R).
R = [[[0, 0], [0, 1], [0, 2], [0, 3]], [[1, 0], [1, 1], [1, 2], [1, 3]], [[2, 0], [2, 1], [2, 2], [2, 3]], [[3, 0], [3, 1], [3, 2], [3|...]]] .
?- createMatrix(4,3,R).
R = [[[0, 0], [0, 1], [0, 2], [0, 3], [0, 4]], [[1, 0], [1, 1], [1, 2], [1, 3], [1, 4]], [[2, 0], [2, 1], [2, 2], [2, 3], [2|...]], [[3, 0], [3, 1], [3, 2], [3|...], [...|...]]] .
?- createMatrix(4,4,R).
R = [[[0, 0], [0, 1], [0, 2], [0, 3], [0, 4]], [[1, 0], [1, 1], [1, 2], [1, 3], [1, 4]], [[2, 0], [2, 1], [2, 2], [2, 3], [2|...]], [[3, 0], [3, 1], [3, 2], [3|...], [...|...]], [[4, 0], [4, 1], [4|...], [...|...]|...]] .
Does anyone have any clue why this happens?
Thank you!
By default, the toplevel loop of SWI prints terms up to depth 10. Deeper parts are replaced by "...". You can raise that depth, or remove the limit entirely by setting max_depth to 0.
?- length(L,10).
L = [_A,_B,_C,_D,_E,_F,_G,_H,_I|...].
?- current_prolog_flag(toplevel_print_options,V).
V = [quoted(true),portray(true),max_depth(10),spacing(next_argument)].
?- set_prolog_flag(toplevel_print_options, [quoted(true), portray(true), max_depth(0), spacing(next_argument)]).
true.
?- length(L,10).
L = [_A,_B,_C,_D,_E,_F,_G,_H,_I,_J].
Update: in newer versions of SWI, another flag must be changed:
?- current_prolog_flag(T,V), atom_concat(_,options,T).
T = answer_write_options,
V = [quoted(true),portray(true),max_depth(10),spacing(next_argument)]
; true.
?- set_prolog_flag(answer_write_options, [quoted(true), portray(true), max_depth(0), spacing(next_argument)]).
true.

Fill sparse array

I have a sparse array, for example:
rare = [[0,1], [2,3], [4,5], [7,8]]
I want to plot a chart from this data; each pair holds the coordinates of a point.
As you can see, I don't have points for x=1, x=3, x=5 and x=6.
I want to fill the array with the previous values, so for the above example I would get:
filled = [[0,1], [1,1], [2,3], [3,3], [4,5], [5,5], [6,5], [7,8]]
As you can see, to calculate a missing y value I simply reuse the last y value.
What is the best approach to accomplish this?
Range.new(*rare.transpose.first.sort.values_at(0,-1)).inject([]){|a,i|
  a<<[i, Hash[rare][i] || a.last.last]
}
Step-by-step explanation:
rare.transpose.first.sort.values_at(0,-1) finds min and max x ([0,7] in your example)
Range.new() makes a range out of it (0..7)
inject iterates through the range and for every x returns pair [x,y], where y is:
y from input array, where defined
y from previously evaluated pair, where not
Note: here are some other ways of finding min and max x:
[:min,:max].map{|m| Hash[rare].keys.send m}
rare.map{|el| el.first}.minmax # Ruby 1.9, by steenslag
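The same steps (find the smallest and largest x, walk the range, carry the last known y forward) can be sketched in Python for readers who want to check the logic outside Ruby. This is an illustrative rendering, not the answerer's code: `fill_forward` is a hypothetical name, and the sketch assumes integer x values with no duplicates.

```python
def fill_forward(points):
    """Fill gaps in integer x values by repeating the last known y."""
    if not points:
        return []
    known = dict(points)  # x -> y lookup, like Hash[rare] in the Ruby code
    xs = sorted(known)    # xs[0] is the smallest x, xs[-1] the largest
    filled = []
    last_y = None
    for x in range(xs[0], xs[-1] + 1):
        # Use the y from the input where defined, else keep the previous one.
        last_y = known.get(x, last_y)
        filled.append([x, last_y])
    return filled


rare = [[0, 1], [2, 3], [4, 5], [7, 8]]
print(fill_forward(rare))
# [[0, 1], [1, 1], [2, 3], [3, 3], [4, 5], [5, 5], [6, 5], [7, 8]]
```

One difference from the Ruby one-liner: `known.get(x, last_y)` only falls back when x is absent, whereas `Hash[rare][i] || a.last.last` would also fall back for a stored nil or false y.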
rare = [[0,1], [2,3], [4,5], [7,8]]
filled = rare.inject([]) do |filled, point|
  extras = if filled.empty?
             []
           else
             (filled.last[0] + 1 ... point[0]).collect do |x|
               [x, filled.last[1]]
             end
           end
  filled + extras + [point]
end
p filled
# => [[0, 1], [1, 1], [2, 3], [3, 3], [4, 5], [5, 5], [6, 5], [7, 8]]
An inject solution:
filled = rare.inject([]) do |filled_acc, (pair_x, pair_y)|
  padded_pairs = unless filled_acc.empty?
    last_x, last_y = filled_acc.last
    (last_x+1...pair_x).map { |x| [x, last_y] }
  end || []
  filled_acc + padded_pairs + [[pair_x, pair_y]]
end
More about Enumerable#inject and functional programming with Ruby here.
irb(main):001:0> rare = [[0,1], [2,3], [4,5], [7,8]]
=> [[0, 1], [2, 3], [4, 5], [7, 8]]
irb(main):002:0> r=rare.transpose
=> [[0, 2, 4, 7], [1, 3, 5, 8]]
irb(main):003:0> iv = (r[0][0]..r[0][-1]).to_a.select {|w| !r[0].include?(w) }
=> [1, 3, 5, 6]
irb(main):004:0> r[1][-1]=r[1][-2]
=> 5
irb(main):005:0> p (iv.zip(r[1]) + rare).sort
[[0, 1], [1, 1], [2, 3], [3, 3], [4, 5], [5, 5], [6, 5], [7, 8]]
=> [[0, 1], [1, 1], [2, 3], [3, 3], [4, 5], [5, 5], [6, 5], [7, 8]]

Resources