How do I represent a hextile/hex grid in memory? - data-structures

Say I'm building a board game with a hex-tile grid, like Settlers of Catan.
Note that each vertex and edge may have an attribute (for example, a road on an edge or a settlement on a vertex).
How would I make a data structure which represents this board? What are the patterns for accessing each tile's neighbors, edges and vertices?

Amit Patel has posted an amazing page on this topic. It's so comprehensive and wonderful that it needs to be the definitive answer to this question: Hexagonal Grids

Such a grid can be represented in a two-dimensional array. If
   2
7     3
   1
6     4
   5
is the number one with its neighbors in the hex grid, then you can put this into a 2D array like so:
2 3
7 1 4
  6 5
In this array, two cells are neighbors if they are horizontally or vertically adjacent, or adjacent along one of the diagonals (top-left/bottom-right).
You can also use a graph if you like.
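A minimal Python sketch of that neighbor rule for the 2D array above, assuming `None` marks array cells that hold no hex. The names `NEIGHBOR_OFFSETS` and `neighbors` are illustrative, not from the answer:

```python
# Offsets (d_row, d_col) from a cell to its six neighbors in the skewed array:
# left/right, up/down, and the top-left/bottom-right diagonal only.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (0, -1), (0, 1), (1, 0), (1, 1)]

def neighbors(grid, row, col):
    """Return the values of the in-bounds, non-empty neighbors of grid[row][col]."""
    result = []
    for dr, dc in NEIGHBOR_OFFSETS:
        r, c = row + dr, col + dc
        if 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] is not None:
            result.append(grid[r][c])
    return result

# The example from the answer; None marks unused array cells.
grid = [
    [2, 3, None],
    [7, 1, 4],
    [None, 6, 5],
]
print(sorted(neighbors(grid, 1, 1)))  # [2, 3, 4, 5, 6, 7]
```

The opposite diagonal (top-right/bottom-left) is deliberately absent from the offsets, which is what turns a square array into a hex grid.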

This article goes through how to set up an isometric/hexagonal grid game. I recommend you have a look at the "Forcing Isometric and Hexagonal Maps onto a Rectangular Grid" section and the movement section. Although it is different from what you are looking for, it may help you work out how to do what you want.

I've dealt a lot with hexes. In cases like this, you track each of the 6 points for the borders of the hex. This lets you draw it quite easily.
You would have a single array of objects that represent hexes. Each of these hex objects also has 6 "pointers" (or an index to another array) pointing to another array of "sides". Same thing for "vertices". Of course the vertices would have 3 pointers to the adjoining hexes, and the sides would have 2.
So, a hex may be something like:
X, Y, Point(6), Vertices(6), Sides(6)
Then you have a hex array, a vertex array, and a side array.
Then it is pretty simple to find the vertices/sides for a hex, or whatever.
When I say pointer, it could just as easily be an integer pointing to the element in the vertex or side array. And of course the arrays could be lists or whatever.
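A small Python sketch of this index-based layout. The names (`Board`, `add_hex`, `neighbor_hexes`) and the convention that side i of a hex joins its vertex i and vertex i+1 are assumptions for illustration; X, Y and the drawing points are left out. Sides are deduplicated by the pair of vertex indices they join, so two hexes that share a side end up holding the same side index:

```python
from dataclasses import dataclass, field

@dataclass
class Hex:
    vertices: list  # 6 indices into Board.vertices, in order around the hex
    sides: list     # 6 indices into Board.sides; side i joins vertex i and i + 1

@dataclass
class Vertex:
    hexes: list = field(default_factory=list)  # up to 3 indices into Board.hexes

@dataclass
class Side:
    ends: tuple                                 # the 2 vertex indices it joins
    hexes: list = field(default_factory=list)  # up to 2 indices into Board.hexes

class Board:
    def __init__(self):
        self.hexes, self.vertices, self.sides = [], [], []
        self._side_index = {}  # frozenset of 2 vertex indices -> side index

    def new_vertex(self):
        self.vertices.append(Vertex())
        return len(self.vertices) - 1

    def add_hex(self, vertex_ids):
        """Register a hex given its 6 vertex indices in order around it."""
        h = len(self.hexes)
        side_ids = []
        for i in range(6):
            a, b = vertex_ids[i], vertex_ids[(i + 1) % 6]
            key = frozenset((a, b))
            if key not in self._side_index:  # first hex to use this side
                self._side_index[key] = len(self.sides)
                self.sides.append(Side(ends=(a, b)))
            s = self._side_index[key]
            self.sides[s].hexes.append(h)
            side_ids.append(s)
        for v in vertex_ids:
            self.vertices[v].hexes.append(h)
        self.hexes.append(Hex(vertices=list(vertex_ids), sides=side_ids))
        return h

    def neighbor_hexes(self, h):
        """Hexes that share a side with hex h."""
        return [other for s in self.hexes[h].sides
                for other in self.sides[s].hexes if other != h]

board = Board()
v = [board.new_vertex() for _ in range(10)]
a = board.add_hex(v[0:6])
b = board.add_hex([v[1], v[0], v[6], v[7], v[8], v[9]])  # shares side v0-v1 with a
print(board.neighbor_hexes(a), len(board.sides))  # [1] 11
```

Finding a hex's neighbors then goes through its sides, and a vertex's hexes are read directly from `Vertex.hexes`, with no coordinate arithmetic.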

You could create a 2D array and then consider the valid positions as:
On even-numbered rows (0,2,4,...): the odd numbered cells.
On odd-numbered rows (1,3,5,...): the even numbered cells.
For each cell, its neighbors would be:
Same column, 2 rows up
Same column, 2 rows down
1 left + 1 up
1 left + 1 down
1 right + 1 up
1 right + 1 down
Illustration (each digit is a hex, placed in the row and column it occupies in the array; hexes that are diagonal to each other are neighbors, and hexes in the same column two rows apart are vertical neighbors):
  2
7   3
  1
6   4
  5
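A Python sketch of the "every other cell" layout, assuming `None` marks the unused cells; the function names are illustrative. A cell is valid exactly when row + col is odd, and every neighbor of a valid cell is valid too:

```python
def is_valid(row, col):
    # Even rows use odd columns and odd rows use even columns: row + col is odd.
    return (row + col) % 2 == 1

def doubled_neighbors(row, col):
    """The six neighbor positions listed above, for a valid cell."""
    return [
        (row - 2, col), (row + 2, col),          # same column, 2 rows up / down
        (row - 1, col - 1), (row + 1, col - 1),  # 1 left + 1 up / down
        (row - 1, col + 1), (row + 1, col + 1),  # 1 right + 1 up / down
    ]

def neighbor_values(grid, row, col):
    values = []
    for r, c in doubled_neighbors(row, col):
        if 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] is not None:
            values.append(grid[r][c])
    return values

# The illustration above as an array; None marks unused (invalid) cells.
grid = [
    [None, 2, None],
    [7, None, 3],
    [None, 1, None],
    [6, None, 4],
    [None, 5, None],
]
print(sorted(neighbor_values(grid, 2, 1)))  # [2, 3, 4, 5, 6, 7]
```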
You can also try to 'flatten' the rows of your map. For this example it would be:
  2
7 1 3
6 5 4
It's sometimes more useful to have each row of the map stored as one array row. :P
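In the flattened version the neighbor rule depends on the column's parity. Reading it off the example above: a hex in an even column touches the odd-column hexes in the row above and in its own row, while a hex in an odd column touches the even-column hexes in its own row and in the row below. A Python sketch under that assumption (names are illustrative; a map whose columns are staggered the other way would swap the two cases):

```python
def flat_neighbors(row, col):
    """Neighbor positions in the flattened layout shown above."""
    result = [(row - 1, col), (row + 1, col)]  # same column, above / below
    if col % 2 == 0:
        side_rows = (row - 1, row)  # even column: row above and same row
    else:
        side_rows = (row, row + 1)  # odd column: same row and row below
    for r in side_rows:
        result += [(r, col - 1), (r, col + 1)]
    return result

def flat_neighbor_values(grid, row, col):
    values = []
    for r, c in flat_neighbors(row, col):
        if 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] is not None:
            values.append(grid[r][c])
    return values

grid = [
    [None, 2, None],
    [7, 1, 3],
    [6, 5, 4],
]
print(sorted(flat_neighbor_values(grid, 1, 1)))  # [2, 3, 4, 5, 6, 7]
```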

I would suggest something like the following (I'll use Delphi-style declarations):
type
  THexEdge = record
    Hexes: array[1..2] of Integer; // Index of adjoining hexes.
    // Other edge stuff goes here.
  end;
  THexVertex = record
    Hexes: array[1..3] of Integer; // Index of adjoining hexes.
    // Other vertex stuff goes here.
  end;
  THex = record
    Edges: array[1..6] of Integer; // Index of edge.
    Vertices: array[1..6] of Integer; // Index of vertex.
    // Other hex stuff goes here.
  end;
var
  Edges: array of THexEdge;
  Vertices: array of THexVertex;
  HexMap: array of THex;
Each hex has six edges and six vertices. Each edge keeps track of its two adjoining hexes, and each vertex keeps track of its three adjoining hexes (hexes on the edges of the map will be a special case).
Of course, there are many things you could do differently. You could use pointers rather than arrays, you could use objects rather than records, and you could store your hexes in a two-dimensional array as other answerers have suggested.
Hopefully, that might give you some ideas about one way to approach it though.

We implemented a Settlers of Catan AI for a class project, and modified code from this answer (which was buggy) to create a Board with constant time random access to vertices and edges. It was a fun problem, but the board took a lot of time, so in case anyone is still looking for a simple implementation here is our Python code:
# Minimal Edge and Vertex holders (assumed; attach road/settlement data as needed).
class Edge:
    def __init__(self, x, y):
        self.X = x
        self.Y = y

class Vertex:
    def __init__(self, x, y):
        self.X = x
        self.Y = y

class Board:
    # Layout is just a double list of Tiles, some will be None.
    # Every Tile has X (row) and Y (column) attributes.
    def __init__(self, layout):
        self.numRows = len(layout)
        self.numCols = len(layout[0])
        self.hexagons = [list(row) for row in layout]
        self.edges = [[None for _ in range(self.numCols * 2 + 2)] for _ in range(self.numRows * 2 + 2)]
        self.vertices = [[None for _ in range(self.numCols * 2 + 2)] for _ in range(self.numRows * 2 + 2)]
        for row in self.hexagons:
            for hexagon in row:
                if hexagon is None:
                    continue
                edgeLocations = self.getEdgeLocations(hexagon)
                vertexLocations = self.getVertexLocations(hexagon)
                for xLoc, yLoc in edgeLocations:
                    if self.edges[xLoc][yLoc] is None:
                        self.edges[xLoc][yLoc] = Edge(xLoc, yLoc)
                for xLoc, yLoc in vertexLocations:
                    if self.vertices[xLoc][yLoc] is None:
                        self.vertices[xLoc][yLoc] = Vertex(xLoc, yLoc)

    def getNeighborHexes(self, hex):
        neighbors = []
        x = hex.X
        y = hex.Y
        offset = 1
        if x % 2 != 0:
            offset = -1
        if (y + 1) < len(self.hexagons[x]):
            hexOne = self.hexagons[x][y + 1]
            if hexOne is not None: neighbors.append(hexOne)
        if y > 0:
            hexTwo = self.hexagons[x][y - 1]
            if hexTwo is not None: neighbors.append(hexTwo)
        if (x + 1) < len(self.hexagons):
            hexThree = self.hexagons[x + 1][y]
            if hexThree is not None: neighbors.append(hexThree)
        if x > 0:
            hexFour = self.hexagons[x - 1][y]
            if hexFour is not None: neighbors.append(hexFour)
        if (y + offset) >= 0 and (y + offset) < len(self.hexagons[x]):
            if (x + 1) < len(self.hexagons):
                hexFive = self.hexagons[x + 1][y + offset]
                if hexFive is not None: neighbors.append(hexFive)
            if x > 0:
                hexSix = self.hexagons[x - 1][y + offset]
                if hexSix is not None: neighbors.append(hexSix)
        return neighbors

    def getNeighborVertices(self, vertex):
        neighbors = []
        x = vertex.X
        y = vertex.Y
        offset = -1
        if x % 2 == y % 2:
            offset = 1
        # Logic from thinking that this is saying getEdgesOfVertex
        # and then for each edge getVertexEnds, taking out the three that are ==vertex
        if (y + 1) < len(self.vertices[0]):
            vertexOne = self.vertices[x][y + 1]
            if vertexOne is not None: neighbors.append(vertexOne)
        if y > 0:
            vertexTwo = self.vertices[x][y - 1]
            if vertexTwo is not None: neighbors.append(vertexTwo)
        if (x + offset) >= 0 and (x + offset) < len(self.vertices):
            vertexThree = self.vertices[x + offset][y]
            if vertexThree is not None: neighbors.append(vertexThree)
        return neighbors

    # used to initially create vertices
    def getVertexLocations(self, hex):
        x = hex.X
        y = hex.Y
        offset = -(x % 2)
        return [
            (x, 2 * y + offset),
            (x, 2 * y + 1 + offset),
            (x, 2 * y + 2 + offset),
            (x + 1, 2 * y + offset),
            (x + 1, 2 * y + 1 + offset),
            (x + 1, 2 * y + 2 + offset),
        ]

    # used to initially create edges
    def getEdgeLocations(self, hex):
        x = hex.X
        y = hex.Y
        offset = -(x % 2)
        return [
            (2 * x, 2 * y + offset),
            (2 * x, 2 * y + 1 + offset),
            (2 * x + 1, 2 * y + offset),
            (2 * x + 1, 2 * y + 2 + offset),
            (2 * x + 2, 2 * y + offset),
            (2 * x + 2, 2 * y + 1 + offset),
        ]

    def getVertices(self, hex):
        x = hex.X
        y = hex.Y
        offset = -(x % 2)
        return [
            self.vertices[x][2 * y + offset],          # top vertex
            self.vertices[x][2 * y + 1 + offset],      # left top vertex
            self.vertices[x][2 * y + 2 + offset],      # left bottom vertex
            self.vertices[x + 1][2 * y + offset],      # right top vertex
            self.vertices[x + 1][2 * y + 1 + offset],  # right bottom vertex
            self.vertices[x + 1][2 * y + 2 + offset],  # bottom vertex
        ]

    def getEdges(self, hex):
        return [self.edges[ex][ey] for ex, ey in self.getEdgeLocations(hex)]

    # returns (start, end) tuple
    def getVertexEnds(self, edge):
        x = edge.X
        y = edge.Y
        if x % 2 == 0:
            vertexOne = self.vertices[x // 2][y]
            vertexTwo = self.vertices[x // 2][y + 1]
        else:
            vertexOne = self.vertices[(x - 1) // 2][y]
            vertexTwo = self.vertices[(x + 1) // 2][y]
        return (vertexOne, vertexTwo)

    def getEdgesOfVertex(self, vertex):
        x = vertex.X
        y = vertex.Y
        offset = -1
        if x % 2 == y % 2:
            offset = 1
        # Guard the indices so that y - 1 or x*2 + offset never wrap around to -1.
        candidates = []
        if y > 0:
            candidates.append(self.edges[x * 2][y - 1])
        candidates.append(self.edges[x * 2][y])
        if 0 <= x * 2 + offset < len(self.edges):
            candidates.append(self.edges[x * 2 + offset][y])
        return [edge for edge in candidates if edge is not None]

    def getHexes(self, vertex):
        vertexHexes = []
        x = vertex.X
        y = vertex.Y
        xOffset = x % 2
        yOffset = y % 2
        if x < len(self.hexagons) and y // 2 < len(self.hexagons[x]):
            hexOne = self.hexagons[x][y // 2]
            if hexOne is not None: vertexHexes.append(hexOne)
        weirdX = x
        if (xOffset + yOffset) == 1: weirdX = x - 1
        weirdY = y // 2
        if yOffset == 1: weirdY += 1
        else: weirdY -= 1
        if 0 <= weirdX < len(self.hexagons) and 0 <= weirdY < len(self.hexagons[weirdX]):
            hexTwo = self.hexagons[weirdX][weirdY]
            if hexTwo is not None: vertexHexes.append(hexTwo)
        # hexThree lives in row x - 1, so check the bounds of that row
        if 0 < x <= len(self.hexagons) and y // 2 < len(self.hexagons[x - 1]):
            hexThree = self.hexagons[x - 1][y // 2]
            if hexThree is not None: vertexHexes.append(hexThree)
        return vertexHexes

I am sitting here "in my free time coding for fun" with hexes. And it goes like this... I will tell you what it looks like in words.
Hexagon: it has six neighbour hexagons. It can deliver the reference for each neighbouring hex tile. It can tell you what it consists of(water ,rock, dust). It can connect itself to others and vice versa. It can even automatically connect the others surrounding him to create a greater field and or making sure all fields can be adressed by its neighbours.
A building references up to three roads and three Hex Tiles. They can tell you which they are.
A road references two hexes and other roads when they are adressed by neighbouring tiles. They can tell which tiles that are and which roads or buildings they connect to.
This is just an idea how I would work on it.
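A Python sketch of the hexagon part of this idea, with tiles holding direct references to each other. It assumes directions 0..5 are numbered consecutively around the hex, so direction d and (d + 3) % 6 are opposite sides; under that numbering the neighbour in direction d+1 lies in direction d+2 as seen from the neighbour in direction d, which is what `link_surrounding` uses. The class and method names are hypothetical:

```python
class HexTile:
    """A hex that holds references to its six neighbours (None if absent)."""

    def __init__(self, terrain):
        self.terrain = terrain  # e.g. "water", "rock", "dust"
        self.neighbours = [None] * 6

    def neighbour(self, direction):
        return self.neighbours[direction]

    def connect(self, direction, other):
        """Link self and other on opposite sides, in both directions."""
        self.neighbours[direction] = other
        other.neighbours[(direction + 3) % 6] = self

    def link_surrounding(self):
        """Connect each pair of adjacent neighbours to each other as well."""
        for d in range(6):
            first = self.neighbours[d]
            second = self.neighbours[(d + 1) % 6]
            if first is not None and second is not None:
                # Neighbour d+1 lies in direction d+2 as seen from neighbour d.
                first.connect((d + 2) % 6, second)

center = HexTile("dust")
ring = [HexTile("water") for _ in range(6)]
for d, tile in enumerate(ring):
    center.connect(d, tile)
center.link_surrounding()
print(sum(n is not None for n in ring[0].neighbours))  # 3
```

After `link_surrounding`, each tile in the ring knows the centre and its two ring neighbours, so later tiles can be attached to any of them.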

Related

Searching a 3D array for closest point satisfying a certain predicate

I'm looking for an enumeration algorithm to search through a 3D array "sphering" around a given starting point.
Given an array a of size NxNxN, where N = 2^k for some k, and a point p in that array, the algorithm I'm looking for should do the following: if a[p] satisfies a certain predicate, the algorithm stops and p is returned. Otherwise the next point q is checked, where q is the point in the array that is closest to p and hasn't been visited yet. If that doesn't match either, the next q' is checked, and so on, until in the worst case the whole array has been searched.
By "closest", the perfect solution would be the point q with the smallest Euclidean distance to p. As only discrete points have to be considered, perhaps some clever enumeration algorithm would make that possible. However, if this gets too complicated, the smallest Manhattan distance would be fine too. If there are several nearest points, it doesn't matter which one is considered next.
Is there already an algorithm that can be used for this task?
You can search for increasing squared distances, so you won't miss a point. This python code should make it clear:
import math
import itertools

# Calculates all points at a certain squared distance d from the origin.
# Coordinate constraint: z <= y <= x
def get_points_at_squared_euclidean_distance(d):
    result = []
    x = math.isqrt(d)
    while 0 <= x:
        y = x
        while 0 <= y:
            target = d - x*x - y*y
            # binary search for a z in [0, y] with z*z == target
            lower = 0
            upper = y + 1
            while lower < upper:
                middle = (lower + upper) // 2
                current = middle * middle
                if current == target:
                    result.append((x, y, middle))
                    break
                if current < target:
                    lower = middle + 1
                else:
                    upper = middle
            y -= 1
        x -= 1
    return result

# Creates all possible reflections of a point
def get_point_reflections(point):
    result = set()
    for p in itertools.permutations(point):
        for n in range(8):
            result.add((
                p[0] * (1 if n % 8 < 4 else -1),
                p[1] * (1 if n % 4 < 2 else -1),
                p[2] * (1 if n % 2 < 1 else -1),
            ))
    return sorted(result)

# Enumerates all points around a center, in increasing distance
def get_next_point_near(center):
    d = -1  # so that the first distance tried is 0, i.e. the center itself
    points_at_d = []
    while True:
        while not points_at_d:
            d += 1
            points_at_d = get_points_at_squared_euclidean_distance(d)
        point = points_at_d.pop()
        for reflection in get_point_reflections(point):
            yield (
                center[0] + reflection[0],
                center[1] + reflection[1],
                center[2] + reflection[2],
            )

# The function you asked for
def get_nearest_point(center, predicate):
    for point in get_next_point_near(center):
        if predicate(point):
            return point

# Example usage
print(get_nearest_point((1, 2, 3), lambda p: sum(p) == 10))
Basically you consume points from the generator until one of them fulfills your predicate.
This is pseudocode for a simple algorithm that will search in increasing-radius spherical husks until it either finds a point or it runs out of array. Let us assume that condition returns either true or false and has access to the x, y, z coordinates being tested and the array itself, returning false (instead of exploding) for out-of-bounds coordinates:
def find_from_center(center, max_radius, condition) returns a point
    let radius = 0
    while radius < max_radius,
        let point = find_in_spherical_husk(center, radius, condition)
        if (point != null) return point
        radius ++
    return null
The hard part is inside find_in_spherical_husk. We are interested in checking out points such that
dist(center, p) >= radius AND dist(center, p) < radius+1
which will be our operating definition of husk. We could iterate over the whole 3D array in O(n^3) looking for those, but that would be really expensive in terms of time. A better pseudocode is the following:
def find_in_spherical_husk(center, radius, condition)
    let z = center.z - radius   // current slice height
    let r = 0                   // current circle radius; maxes at equator, then decreases
    while z <= center.z + radius,
        let z_center = (center.x, center.y, z)
        let point = find_in_z_circle(z_center, r, condition)
        if (point != null) return point
        // prepare for next z-sliced circle
        z ++
        r = sqrt(radius*radius - (z-center.z)*(z-center.z))
    return null
The idea here is to slice each husk into circles along the z-axis (any axis will do), and then look at each slice separately. If you were looking at the earth, and the poles were the z axis, you would be slicing from north to south. Finally, you would implement find_in_z_circle(z_center, r, condition) to look at the circumference of each of those circles. You can avoid some math there by using the Bresenham circle-drawing algorithm, but I assume that the savings are negligible compared with the cost of checking condition.
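A runnable Python sketch of the husk search, under a few assumptions: `condition` takes a point tuple and returns False for out-of-bounds coordinates, as stated above; and instead of tracing each z-slice circle with Bresenham, this version solves for the valid y range of every (x, z) pair with integer square roots, so each lattice point of a husk is visited exactly once. Within one husk the points are not sorted by exact distance, so the returned point is only guaranteed to be less than one unit farther away than the true nearest match:

```python
import math

def find_in_spherical_husk(center, radius, condition):
    """Check every lattice point p with radius <= |p - center| < radius + 1."""
    cx, cy, cz = center
    lo2, hi2 = radius * radius, (radius + 1) * (radius + 1)
    for dz in range(-radius, radius + 1):      # z-slices, "north to south"
        for dx in range(-radius, radius + 1):
            rest = dx * dx + dz * dz
            if rest >= hi2:
                continue
            # Solve lo2 <= rest + dy^2 < hi2 for |dy| with integer square roots.
            dy_max = math.isqrt(hi2 - rest - 1)
            need = lo2 - rest
            dy_min = 0 if need <= 0 else math.isqrt(need - 1) + 1
            for ady in range(dy_min, dy_max + 1):
                for dy in ((ady, -ady) if ady else (0,)):
                    p = (cx + dx, cy + dy, cz + dz)
                    if condition(p):
                        return p
    return None

def find_from_center(center, max_radius, condition):
    for radius in range(max_radius):
        point = find_in_spherical_husk(center, radius, condition)
        if point is not None:
            return point
    return None

N = 8  # hypothetical array size
def in_bounds(p):
    return all(0 <= c < N for c in p)

print(find_from_center((4, 4, 4), 2 * N, lambda p: in_bounds(p) and p == (6, 1, 3)))  # (6, 1, 3)
```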

Print the elements that make up the min cost path from a start point to an end point in a grid

Suppose we calculate the min cost with this recurrence relation:
mat[i][j] = min(mat[i-1][j], mat[i][j-1]) + mat[i][j];
0 1 2 3
4 5 6 7
8 9 10 11
Using the above recurrence relation, we get min-cost(1,2) = 0 + 1 + 2 + 6 = 9.
I am getting the min cost sum, that's not the problem. Now I want to print the elements 0, 1, 2, 6, because these elements make up the min cost path.
Any help is really appreciated.
Suppose your endpoint is [x, y] and your start point is [a, b]. After the recursion step, start from the endpoint and crawl back (backtrack) to the start point.
Here is the pseudocode:
# Assuming grid is the given input 2D grid and mat is the min-cost matrix
output = []
p = x, q = y
while (p != a || q != b):
    output.add(grid[p][q])
    min = infinity
    newP = -1, newQ = -1
    if (p - 1 >= a && mat[p - 1][q] < min):
        min = mat[p - 1][q]
        newP = p - 1
        newQ = q
    if (q - 1 >= b && mat[p][q - 1] < min):
        min = mat[p][q - 1]
        newP = p
        newQ = q - 1
    p = newP, q = newQ
end
output.add(grid[a][b])
# output holds the path from end to start; print it in reverse
Notice that here we use two 2D matrices, mat and grid: grid is the given input, and mat is the matrix generated by the recursion step, mat[i][j] = min(mat[i - 1][j], mat[i][j - 1]) + grid[i][j].
Hope it helps!
Besides computing the min cost matrix using the relation that you mentioned, you can also create a predecessor matrix.
For each cell (i, j), you should also store the information about who was the "min" in the relation that you mentioned (was it the left element, or is it the element above?). In this way, you will know for each cell, which is its preceding cell in an optimal path.
Afterwards, you can generate the path by starting from the final cell and moving backwards according to the "predecessor" matrix, until you reach the top-left cell.
Note that the going backwards idea can be applied also without explicitly constructing a predecessor matrix. At each point, you would need to look which of the candidate predecessors has a lower total cost.
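A Python sketch of the predecessor-matrix approach, assuming (as in the question's example) that the path starts at the top-left cell and only moves right or down; ties are broken toward the cell above. `min_cost_path` is a hypothetical name:

```python
def min_cost_path(grid, end):
    """Return (cost, path) for the cheapest right/down path from (0, 0) to end."""
    ei, ej = end
    INF = float("inf")
    mat = [[INF] * (ej + 1) for _ in range(ei + 1)]
    pred = [[None] * (ej + 1) for _ in range(ei + 1)]  # predecessor of each cell
    for i in range(ei + 1):
        for j in range(ej + 1):
            if i == 0 and j == 0:
                mat[i][j] = grid[0][0]
                continue
            up = mat[i - 1][j] if i > 0 else INF
            left = mat[i][j - 1] if j > 0 else INF
            if up <= left:
                mat[i][j], pred[i][j] = up + grid[i][j], (i - 1, j)
            else:
                mat[i][j], pred[i][j] = left + grid[i][j], (i, j - 1)
    # Walk back from the end cell following the predecessors.
    path, cell = [], (ei, ej)
    while cell is not None:
        path.append(grid[cell[0]][cell[1]])
        cell = pred[cell[0]][cell[1]]
    path.reverse()
    return mat[ei][ej], path

grid = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(min_cost_path(grid, (1, 2)))  # (9, [0, 1, 2, 6])
```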

How to insert vectors of different lengths into a matrix in MATLAB?

How can I insert vectors of different lengths into a matrix? For example, I have nodes = 10 randomly located nodes. After that I find the neighbours of each node; for example, for node i = 2 the neighbours are nodes j = 5 and 6, so the vector that contains these values is built with neighb_i = [neighb_i j]. I may have to add some zeros to pad the vector to the width of the matrix (here nodes = 10, so the matrix will be 10x10). I want to keep the neighbours of all nodes in one matrix, because when I use a vector, the values are overwritten in the next iteration: when i = 3, I no longer have the information from i = 2. How can I store this in a matrix of size nodes x nodes?
close all
clear all
clc
x = 2000; %m
y = 2000; %m
nodes = 8;
%% Random location and direction, calculating the distance;
loc_x = x*rand(1,nodes)
loc_y = y*rand(1,nodes)
loc = [loc_x' loc_y']
dist(loc_x);
dist(loc_y);
distance = sqrt(dist(loc_x).^2 + dist(loc_y).^2) % = dist(loc')
distance(1:nodes+1:nodes^2) = inf; % replace zero diagonal with infinity
%% Power:
noise_power_dBm = -90; %dBm
noise_power_w = 10^((noise_power_dBm - 30)/10); % W
% The channel is based on a path-loss model, where distance is the distance between nodes i and j
% and 3 is the attenuation exponent (taken as 3; range 1:6)
channel_gain = (1)./(distance).^3; %dB
channel_gain(1:nodes+1:nodes^2) = inf; % replace zero diagonal with infinity
min_SNR_dB = 10; %dB, minimum required SNR at the receiving nodes
min_SNR = 10^(min_SNR_dB/10);
p_max_dBm = 10; % dBm
p_max_w = 10^((p_max_dBm - 30)/10); % W
p_uni = (min_SNR*noise_power_w)./(channel_gain); % unicast power between two nodes
p_uni(1:nodes+1:nodes^2) = inf; % replace zero diagonal with infinity
plot(loc_x,loc_y,'r*');
change = 1;
cost = inf(1,nodes) % at the beginning cost of all nodes are infinity
cost(1) = 0; % except the cost of node 1 = 0 ( source node )
while change == 1 % change = 1 when a child node switches its parent node; if so, the for loops run again
change = 0; % on entering the while loop, reset change to 0; it is set back to 1 if some change occurs
for i = 2:nodes % i - child node, starting from 2; j runs over 1..nodes, starting from 1 because the source node can be a neighbour of another node
neighb_i = [];
for j = 1:nodes % j - parent node
if p_uni(i,j) < p_max_w
neighb_i = [neighb_i j]; %Found who are the neighbours ; to use a matrix not a vector ????????
end
% calculate the cost for every nodes First Dijkstra, later MC
% and choose the parent node for every node ; if we have some changes
% ---> change = 1
for k = [neighb_i j] % neighb_i
% cost(i,k) = 1:nodes;
end
end
keyboard
end
% At the end of this while - I have to know what is the parent node for
% every node and when I come out I can draw a graph.
end
%%
gplot(nodes,loc)
%plot(source,'b*');
plot(loc_x,loc_y,'rO');
axis([x y x y]);
hold on;
grid on;
grid on, xlabel('x'), ylabel('y');
title('Random nodes');
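The question is about MATLAB; to keep the examples on this page in one language, here is the zero-padded nodes x nodes layout it describes, sketched in Python with made-up toy values. Node numbers stay 1-based, as in MATLAB, so that 0 can serve as padding. In MATLAB the per-row step would be roughly `idx = find(p_uni(i,:) < p_max_w); neighb(i, 1:numel(idx)) = idx;` after `neighb = zeros(nodes);`, and a simpler alternative is to keep the comparison `p_uni < p_max_w` itself as a nodes x nodes logical matrix:

```python
import math

def neighbor_matrix(p_uni, p_max):
    """Return an n x n list of lists; row i holds the 1-based indices of the
    neighbours of node i (nodes j with p_uni[i][j] < p_max), padded with 0."""
    n = len(p_uni)
    neighb = [[0] * n for _ in range(n)]
    for i in range(n):
        found = [j + 1 for j in range(n) if p_uni[i][j] < p_max]
        neighb[i][:len(found)] = found  # earlier rows are kept, nothing is overwritten
    return neighb

inf = math.inf
p_uni = [  # toy unicast powers; the diagonal is inf as in the question's code
    [inf, 1.0, 5.0],
    [1.0, inf, 2.0],
    [5.0, 2.0, inf],
]
for row in neighbor_matrix(p_uni, 3.0):
    print(row)
# [2, 0, 0]
# [1, 3, 0]
# [2, 0, 0]
```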

How to go through the elements of a matrix, layer by layer

It is difficult to explain what I want. Let's say I have a matrix of 0s and 1s:
000000
000000
001100
000000
000000
I want to start from a certain group of ones (this is given in the beginning) and then go outwards:
000000    000000
011110 OR 001100
010010    010010
011110    001100
000000    000000
The difference is not important, as long as I go through everything, outwards.
The reason I want to do this is that this matrix of 1s and 0s corresponds to a matrix of some 2D function, and I want to examine the points in that function going outwards. I want to
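One common way to walk outwards layer by layer is a multi-source breadth-first search seeded with all the 1s; this is a swapped-in technique, not the method of the answer below. A Python sketch, assuming the starting group is exactly the cells equal to 1; `layers` is a hypothetical name:

```python
def layers(matrix, diagonal=False):
    """Yield the cells around the group of 1s, one ring at a time, outwards.

    diagonal=False gives 4-neighbour rings (the right-hand example),
    diagonal=True gives 8-neighbour rings (the left-hand example)."""
    rows, cols = len(matrix), len(matrix[0])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    ring = [(r, c) for r in range(rows) for c in range(cols) if matrix[r][c] == 1]
    seen = set(ring)
    while ring:
        next_ring = []
        for r, c in ring:
            for dr, dc in steps:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    next_ring.append((nr, nc))
        if next_ring:
            yield next_ring
        ring = next_ring

m = [[0] * 6 for _ in range(5)]
m[2][2] = m[2][3] = 1
print(sorted(next(layers(m))))  # [(1, 2), (1, 3), (2, 1), (2, 4), (3, 2), (3, 3)]
```

Every cell is yielded exactly once, in the ring at its BFS distance from the group, so examining the function values ring by ring covers the whole matrix.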
If I understand the question correctly, what you basically want is to find a group of 1s inside a matrix and invert the group of 1s and all of its surroundings. This is actually an image-processing problem, so my explanation will follow that approach. Side note: the term 'polygon' is used here for the group of 1s in the matrix. Some assumptions: the polygon is always filled, and the polygon doesn't contain any points that lie directly on the outer bounds of the matrix (e.g. the point (0, 2) is never part of the polygon). The solution can then be found as follows:
Step 1: search for an arbitrary 1 that is part of the outer bound of the polygon represented by the 1s in the matrix. Starting from the upper-left corner guarantees that the returned coordinate belongs to a 1 that is either on the left side of the polygon, on the upper side, or at a corner.
point searchArb1(int[][] matrix)
    list search
    set visited
    search.add(point(0 , 0))
    while NOT search.isEmpty()
        point pt = search.remove(0)
        if pt in visited
            continue
        visited.add(pt)
        if matrix[pt.x][pt.y] == 1
            return pt
        //the point wasn't the searched one:
        //continue search in 3 directions: down, right, and diagonally down/right
        point tmp = pt.down()
        if tmp.y < matrix.height
            search.add(tmp)
        tmp = pt.right()
        if tmp.x < matrix.width
            search.add(tmp)
        tmp = pt.diagonal_r_d()
        if tmp.x < matrix.width AND tmp.y < matrix.height
            search.add(tmp)
    return null
Step 2: now that we have an arbitrary point on the outer bound of the polygon, we can simply proceed by tracing the outer bound. Due to the assumptions above, we only have to search for 1s in 3 directions (diagonals are always represented by 3 points forming a corner). This method traces the polygon bound clockwise.
int UP = 0
int RIGHT = 1
int DOWN = 2
int LEFT = 3
list searchOuterBound(int[][] matrix , point arbp)
    list result
    point pt = arbp
    point ptprev
    //at each point one direction can't be available (determined using the previously found 1)
    int dir_unav = LEFT
    do
        result.add(pt)
        //generate all possible candidates for the next point in the polygon bounds
        map candidates
        for int i in [UP , LEFT]
            if i == dir_unav
                continue
            point try
            switch i
                case UP:
                    try = pt.up()
                    break
                case DOWN:
                    try = pt.down()
                    break
                case RIGHT:
                    try = pt.right()
                    break
                case LEFT:
                    try = pt.left()
                    break
            candidates.store(i , try)
        ptprev = pt
        for int i in [0 , 2]
            //the directions can be interpreted as a cycle of length 4
            //always start the search for the next 1 at the clockwise next direction
            //relative to the direction we come from
            //eg.: dir_unav = LEFT -> start with UP
            int dir = (dir_unav + i + 1) % 4
            point try = candidates.get(dir)
            if matrix[try.x][try.y] == 1
                //found the first match
                pt = try
                //direction we come from is the exact opposite of dir
                dir_unav = (dir + 2) % 4
                break
        //no matching candidate was found
        if pt == ptprev
            return result
    while pt != arbp
    //algorithm has reached the starting point again
    return result
Step 3: Now we've got a representation of the polygon. Next step: inverting the points around the polygon as well. Since the polygon itself will be filled with 0s later on, we can simply fill the surroundings of every point in the polygon with 1s. Since there are two options for generating this part of the matrix state, I'll split this into two solutions:
Step 3.1: Also fill points that are diagonal neighbours of points of the polygon with 1s
void fillNeighbours_Diagonal_Included(int[][] matrix , list polygon)
    for point p in polygon
        for int x in [-1 , 1]
            for int y in [-1 , 1]
                matrix[p.x + x][p.y + y] = 1
Step 3.2: Don't fill points that are diagonal neighbours of points of the polygon
void fillNeighbours_Diagonal_Excluded(int[][] matrix , list polygon)
    for point p in polygon
        matrix[p.x - 1][p.y] = 1
        matrix[p.x + 1][p.y] = 1
        matrix[p.x][p.y - 1] = 1
        matrix[p.x][p.y + 1] = 1
Step 4: Finally, the last step: invert all 1s in the polygon into 0s. Note: I'm too lazy to optimize this any further, so this part is implemented as brute force.
void invertPolygon(int[][] matrix , list polybounds)
    //go through each line of the matrix
    for int i in [0 , matrix.height[
        sortedlist cut_x
        //search for all intersections of the line with the polygon
        for point p in polybounds
            if p.y == i
                cut_x.add(p.x)
        //remove the inner points of horizontal runs so only the run ends remain
        int at = 1
        while at < cut_x.size() - 1
            if cut_x.get(at - 1) + 1 == cut_x.get(at)
                    AND cut_x.get(at) == cut_x.get(at + 1) - 1
                cut_x.remove(at)
            else
                ++at
        //set all points in the line that are part of the polygon to 0
        for int j in [0 , cut_x.size() - 1[ step = 2
            for int x in [cut_x.get(j) , cut_x.get(j + 1)]
                matrix[x][i] = 0
I hope you understand the basic idea behind this. Sorry for the long answer.

How Could One Implement the K-Means++ Algorithm?

I am having trouble fully understanding the K-Means++ algorithm. I am interested in exactly how the first k centroids are picked, namely the initialization, as the rest is like in the original K-Means algorithm.
Is the probability function used based on distance or Gaussian?
At the same time, is the point farthest away (from the other centroids) picked as a new centroid?
I would appreciate a step-by-step explanation and an example. The one in Wikipedia is not clear enough. Very well commented source code would also help. If you are using 6 arrays, please tell us which one is for what.
Interesting question. Thank you for bringing this paper to my attention - K-Means++: The Advantages of Careful Seeding
In simple terms, cluster centers are initially chosen at random from the set of input observation vectors, where the probability of choosing vector x is high if x is not near any previously chosen centers.
Here is a one-dimensional example. Our observations are [0, 1, 2, 3, 4]. Let the first center, c1, be 0. The probability that the next cluster center, c2, is x is proportional to ||c1-x||^2. So, P(c2 = 1) = 1a, P(c2 = 2) = 4a, P(c2 = 3) = 9a, P(c2 = 4) = 16a, where a = 1/(1+4+9+16).
Suppose c2=4. Then, P(c3 = 1) = 1a, P(c3 = 2) = 4a, P(c3 = 3) = 1a, where a = 1/(1+4+1).
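The two probability tables above can be reproduced with a few lines of Python. This sketch assumes 1-D observations and uses exact fractions so the 1/30 and 1/6 normalizers show up unchanged; `d2_probabilities` is a hypothetical name:

```python
from fractions import Fraction

def d2_probabilities(points, centers):
    """K-means++ seeding weights: P(x) is proportional to D(x)^2, where D(x)
    is the distance from x to the nearest already-chosen center."""
    d2 = [min((x - c) ** 2 for c in centers) for x in points]
    total = sum(d2)
    return [Fraction(w, total) for w in d2]

obs = [0, 1, 2, 3, 4]
print(d2_probabilities(obs, [0]))     # proportional to 0, 1, 4, 9, 16 (divided by 30)
print(d2_probabilities(obs, [0, 4]))  # proportional to 0, 1, 4, 1, 0 (divided by 6)
```

Points that are already centers get weight 0, so they can never be picked again.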
I've coded the initialization procedure in Python; I don't know if this helps you.
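import numpy as np

def initialize(X, K):
    # X: list of numpy arrays (so that c - x works); K: number of centers.
    # The first center is X[0] here; the paper picks it uniformly at random.
    C = [X[0]]
    for k in range(1, K):
        # squared distance from each point to its nearest chosen center
        D2 = np.array([min([np.inner(c - x, c - x) for c in C]) for x in X])
        probs = D2 / D2.sum()
        cumprobs = probs.cumsum()
        r = np.random.rand()
        i = len(X) - 1  # fallback in case rounding leaves cumprobs[-1] just below r
        for j, p in enumerate(cumprobs):
            if r < p:
                i = j
                break
        C.append(X[i])
    return C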
EDIT with clarification: The output of cumsum gives us boundaries that partition the interval [0,1]. These partitions have length equal to the probability of the corresponding point being chosen as a center. Since r is chosen uniformly in [0,1), it falls into exactly one of these intervals (because of the break). The for loop checks which partition r is in.
Example:
probs = [0.1, 0.2, 0.3, 0.4]
cumprobs = [0.1, 0.3, 0.6, 1.0]
if r < cumprobs[0]:
    # this event has probability 0.1
    i = 0
elif r < cumprobs[1]:
    # this event has probability 0.2
    i = 1
elif r < cumprobs[2]:
    # this event has probability 0.3
    i = 2
elif r < cumprobs[3]:
    # this event has probability 0.4
    i = 3
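The same interval lookup can be written with a binary search instead of a linear scan. A stdlib-only Python sketch, where `sample_index` and its optional `u` parameter (a uniform draw in [0, 1), passed in only to make the behaviour easy to check) are illustrative names:

```python
import bisect
import itertools
import random

def sample_index(probs, u=None):
    """Return index i with probability probs[i], like the cumsum loop above."""
    if u is None:
        u = random.random()
    cumprobs = list(itertools.accumulate(probs))
    # Scale by the total in case rounding makes cumprobs[-1] differ slightly from 1.
    r = u * cumprobs[-1]
    # bisect_right gives the first j with r < cumprobs[j], like the `if r < p` test.
    return min(bisect.bisect_right(cumprobs, r), len(probs) - 1)

print(sample_index([0.1, 0.2, 0.3, 0.4]))  # prints one of 0, 1, 2, 3
```

For many draws from the same distribution, computing `cumprobs` once and reusing it keeps each draw at O(log n).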
One Liner.
Say we need to select 2 cluster centers. Instead of selecting them all randomly (as we do in simple k-means), we select the first one randomly, then find the points that are farthest from the first center (these points most probably do not belong to the first cluster, as they are far from it) and assign the second cluster center near those far points.
I have prepared a full source implementation of k-means++ based on the book "Collective Intelligence" by Toby Segaran and the k-means++ initialization provided here.
Indeed, there are two distance functions here. For the initial centroids a standard one based on numpy.inner is used, and then for assigning points to the centroids the Pearson one is used. Maybe the Pearson one can also be used for the initial centroids. They say it is better.
from __future__ import division

def readfile(filename):
    with open(filename) as f:
        lines = [line for line in f]
    rownames = []
    data = []
    for line in lines:
        p = line.strip().split(' ')  # single space as separator
        # First column in each row is the rowname
        rownames.append(p[0])
        # The data for this row is the remainder of the row
        data.append([float(x) for x in p[1:]])
    return rownames, data

from math import sqrt

def pearson(v1, v2):
    # Simple sums
    sum1 = sum(v1)
    sum2 = sum(v2)
    # Sums of the squares
    sum1Sq = sum([pow(v, 2) for v in v1])
    sum2Sq = sum([pow(v, 2) for v in v2])
    # Sum of the products
    pSum = sum([v1[i] * v2[i] for i in range(len(v1))])
    # Calculate r (Pearson score)
    num = pSum - (sum1 * sum2 / len(v1))
    den = sqrt((sum1Sq - pow(sum1, 2) / len(v1)) * (sum2Sq - pow(sum2, 2) / len(v1)))
    if den == 0: return 0
    # 1 - r, so that more similar vectors get a smaller distance
    return 1.0 - num / den

import numpy
from numpy.random import *

def initialize(X, K):
    C = [X[0]]
    for _ in range(1, K):
        #D2 = numpy.array([min([numpy.inner(c-x,c-x) for c in C]) for x in X])
        D2 = numpy.array([min([numpy.inner(numpy.array(c) - numpy.array(x), numpy.array(c) - numpy.array(x)) for c in C]) for x in X])
        probs = D2 / D2.sum()
        cumprobs = probs.cumsum()
        #print("cumprobs=", cumprobs)
        r = rand()
        #print("r=", r)
        i = -1
        for j, p in enumerate(cumprobs):
if r 0:
for rowid in bestmatches[i]:
for m in range(len(rows[rowid])):
avgs[m]+=rows[rowid][m]
for j in range(len(avgs)):
avgs[j]/=len(bestmatches[i])
clusters[i]=avgs
return bestmatches
rows, data = readfile('/home/toncho/Desktop/data.txt')
kclust = kcluster(data, k=4)
print("Result:")
for c in kclust:
    out = ""
    for r in c:
        out += rows[r] + ' '
    print("[" + out[:-1] + "]")
print('done')
data.txt:
p1 1 5 6
p2 9 4 3
p3 2 3 1
p4 4 5 6
p5 7 8 9
p6 4 5 4
p7 2 5 6
p8 3 4 5
p9 6 7 8
