How to transform many numbers into one? (Algorithmic issue)

I want a function that transforms an input (an n-element vector) into a single int.
Formally:
F: (x1, x2, ..., xn) -> y
This could be something like:
http://en.wikipedia.org/wiki/G%C3%B6del_numbering
but it should be:
- unambiguous
- a large input must not produce a huge output number.
My use case is encoding graph neighbours: I just want to keep information about the edges of a specific vertex.
My first idea was this: assume some vertex has the value 41 assigned to it. In binary representation that is 101001, which means this vertex is connected to vertices number 1, 4 and 6. Getting info about neighbours is then simply tab[i] & (1 << j), where tab is a 1-d array of vertices (for vertex i it stores, for example, 41) and j is the number of the vertex being checked. But this solution has one problem: I can only encode information about 32 neighbours (the maximum size of an int, 2^32), or am I wrong?
So I want a simple representation of graph neighbours as a number, together with a fast method of checking whether two vertices are neighbours. Does such a solution exist, or does anyone have an idea how to deal with this?
Thanks for any help.

I don't think it is possible.
If you have k potential neighbors, then your representation needs 2^k possible values in order to map a value to each combination of neighbors. So you do need at least k bits to represent k neighbors.
Also, your proposed solution of using a 32-bit int to represent neighbors is not much different from an adjacency matrix. In fact it's exactly the same thing, except that you are using bits instead of, say, booleans. You could use an array of bit vectors instead of a 2-D array of booleans.
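For illustration, here is a minimal sketch of that bit-vector idea in C++ (the class and its names are my own, not from the question; it assumes the number of vertices is known up front):

#include <cstddef>
#include <cstdint>
#include <vector>

// Adjacency stored as one bit per potential neighbour, packed into 64-bit words,
// so a vertex is no longer limited to 32 neighbours.
struct BitAdjacency {
    std::size_t n;                        // number of vertices
    std::size_t words;                    // 64-bit words per row
    std::vector<std::uint64_t> bits;      // n rows, `words` words each

    explicit BitAdjacency(std::size_t n)
        : n(n), words((n + 63) / 64), bits(n * ((n + 63) / 64), 0) {}

    void add_edge(std::size_t i, std::size_t j) {
        bits[i * words + j / 64] |= std::uint64_t(1) << (j % 64);
        bits[j * words + i / 64] |= std::uint64_t(1) << (i % 64);
    }

    bool adjacent(std::size_t i, std::size_t j) const {
        return (bits[i * words + j / 64] >> (j % 64)) & 1;   // same idea as tab[i] & (1 << j)
    }
};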

Related

Grouping large set of similar vectors

I have a 3d mesh of ~200,000 triangles.
To find all the flat (or near enough flat) surfaces on the model I thought I could try and group triangles by their normal vectors (giving me ones which face the same way) and then I can search these smaller sets for ones which are similar in position or connected.
I cannot think of a good way to do this in practice while also keeping things relatively speedy. I have come up with solutions which would take O(n²), but none which are elegant and quicker than that.
I have vertex information and triangle information (vertices, centre and normal).
Any suggestions would be appreciated.
It is possible that I have misunderstood the problem so I am stating what I think you need to do : "Given a set of vectors, group parallel vectors together".
You could use a hash-map to solve this problem. I am assuming that you store each normal vector by its components:
(a, b, c)
You just need to write a function that converts a vector to an integer. For example, if I know that 0 <= a, b, c <= 1000, then I can use F(a, b, c) = a + 1000b + 1000000c, which guarantees a unique integer for every unique vector. After this, it's just a matter of creating a hashmap which maps such an integer to a list and storing all the parallel vectors in the same list.
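As a rough sketch of that idea (assuming the components have already been quantised to non-negative integers below 1000; the function and type names are my own):

#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vec3i { int a, b, c; };   // a quantised normal vector

// F(a, b, c) = a + 1000*b + 1000000*c is unique as long as 0 <= a, b, c < 1000.
std::int64_t key(const Vec3i& v) {
    return v.a + 1000LL * v.b + 1000000LL * v.c;
}

// Group triangle indices by quantised normal: equal normals share a key.
std::unordered_map<std::int64_t, std::vector<int>>
group_by_normal(const std::vector<Vec3i>& normals) {
    std::unordered_map<std::int64_t, std::vector<int>> groups;
    for (int i = 0; i < static_cast<int>(normals.size()); ++i)
        groups[key(normals[i])].push_back(i);
    return groups;
}
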
You want to find the connected components of the graph formed by your triangles. The only thing you need is to store the adjacency information in a convenient form.
Create a list of all edges (min, max); if every edge has two adjacent triangles, there are 300'000 edges. This can be done in linear time:
Step 1: For every vertex, count the number of adjacent vertices with a greater index, then take the partial sums of these numbers.
Step 2: Allocate and fill an array for the edges (second vertex and utility data). Use the array from step 1 to access the edges adjacent to a vertex. Such an access can be done in constant time if we know that the number of edges adjacent to a vertex is bounded from above by a constant, so the whole step can be done in linear time.
The utility data mentioned above is the pair of indices of the triangles adjacent to each edge.
OK, now you have the adjacency info. It is time to find the connected components. You can use DFS for this, as sketched below. It will work in linear time because every triangle has three (a constant number of) neighbours.
Here you need to allocate 200'000 * sizeof(int) * 4 bytes, and this can be done in linear time.
You might also want to read about the doubly connected edge list, but it is pretty expensive.
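A minimal sketch of that DFS step (assuming the adjacency lists have already been built as described; all names here are mine):

#include <stack>
#include <vector>

// Label the connected components of the triangle graph with an iterative DFS.
// adj[t] lists the triangles sharing an edge with triangle t (restricted, if you
// only want flat patches, to neighbours whose normals are close to t's normal).
std::vector<int> label_components(const std::vector<std::vector<int>>& adj) {
    std::vector<int> comp(adj.size(), -1);
    int next_id = 0;
    for (int start = 0; start < static_cast<int>(adj.size()); ++start) {
        if (comp[start] != -1) continue;       // already labelled
        std::stack<int> st;
        st.push(start);
        comp[start] = next_id;
        while (!st.empty()) {
            int t = st.top(); st.pop();
            for (int u : adj[t])
                if (comp[u] == -1) { comp[u] = next_id; st.push(u); }
        }
        ++next_id;
    }
    return comp;
}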

Compressing coordinates in Fenwick tree

Let's say we have n empty boxes in a row. We are going to put m groups of coins into some consecutive boxes, which are known in advance. We put the 1st group of coins into boxes i_1 to j_1, the 2nd group into boxes i_2 to j_2, and so on.
Let c_i be the number of coins in box i after all the coins have been put into the boxes. We want to be able to quickly determine how many coins there are in the boxes with indices i = s, s + 1, ..., e - 1, e, i.e. we want to compute the sum
c_s + c_(s+1) + ... + c_e
efficiently. This can be done using a Fenwick tree. Without any improvements, a Fenwick tree needs O(n) space for storing the c_i's (in a table; actually tree[i] != c_i, the values are stored more cleverly) and O(log n) time for computing the above sum.
If we have the case where
- n is too big for us to make a table of length n (let's say ~ 10 000 000 000)
- m is sufficiently small (let's say ~ 500 000)
there is a way to somehow compress the coordinates (indices) of the boxes, i.e. it suffices to store just the boxes with indices i_1, i_2, ..., i_m. Since the value stored in tree[i] depends on the binary representation of i, my idea is to sort the indices i_1, j_1, i_2, j_2, ..., i_m, j_m and build a tree of length O(m). Adding a new value to the tree is then straightforward. Also, to compute that sum, we only have to find the last stored index that is not greater than e and the first that is not smaller than s. Both can be found with binary search. After that, the sum can be computed easily.
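A rough sketch of that 1D compression idea (point updates and prefix sums only; the names are mine, and the usual two-tree trick for range-update/range-query is left out):

#include <algorithm>
#include <cstdint>
#include <vector>

// Fenwick tree over the compressed coordinates i_1, j_1, ..., i_m, j_m.
struct CompressedFenwick {
    std::vector<std::int64_t> xs;     // sorted, de-duplicated coordinates
    std::vector<std::int64_t> tree;   // 1-based Fenwick array over xs

    explicit CompressedFenwick(std::vector<std::int64_t> coords) : xs(std::move(coords)) {
        std::sort(xs.begin(), xs.end());
        xs.erase(std::unique(xs.begin(), xs.end()), xs.end());
        tree.assign(xs.size() + 1, 0);
    }

    // point update at a stored coordinate x
    void add(std::int64_t x, std::int64_t delta) {
        int i = int(std::lower_bound(xs.begin(), xs.end(), x) - xs.begin()) + 1;
        for (; i <= int(xs.size()); i += i & -i) tree[i] += delta;
    }

    // sum over all stored coordinates that are <= x (binary search finds the rank)
    std::int64_t prefix(std::int64_t x) const {
        int i = int(std::upper_bound(xs.begin(), xs.end(), x) - xs.begin());
        std::int64_t s = 0;
        for (; i > 0; i -= i & -i) s += tree[i];
        return s;
    }
};

A range sum over [s, e] is then prefix(e) - prefix(s - 1).
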
The problem occurs in the 2D case. Now we have an area of points (x, y) in the plane, 0 < x, y < n. There are m rectangles in that area. We know the coordinates of their down-left and up-right corners, and we want to compute how many rectangles contain a point (a, b). The simplest (and my only) idea is to follow the manner of the 1D case: for each corner coordinate x_i, store all the corner coordinates y_i. The idea is not so clever, since it needs O(m^2) space, which is too much. My question is:
How to store coordinates in the tree in a more efficient way?
Solutions of the problem that use Fenwick trees are preferred, but every solution is welcome!
The easiest approach is to use a map/unordered_map instead of a 2D array. In that case you don't even need coordinate compression. The map creates a key-value pair only when it is needed, so it creates log^2(n) key-value pairs for each point from the input.
You could also use a segment tree based on pointers (instead of arrays) with lazy initialisation (you create a node only when it is needed).
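A sketch of the map-backed idea for the rectangle-counting problem (names are mine; as noted above, each corner update touches O(log^2 n) cells, so memory grows accordingly):

#include <cstdint>
#include <unordered_map>

// 2D Fenwick tree backed by hash maps, so nodes exist only where they are touched.
// With the usual +1/-1 corner trick on a difference grid, the prefix sum at (x, y)
// equals the number of marked rectangles that contain the point (x, y).
struct SparseBIT2D {
    std::int64_t n;   // coordinates are in [1, n]
    std::unordered_map<std::int64_t, std::unordered_map<std::int64_t, std::int64_t>> data;

    explicit SparseBIT2D(std::int64_t n) : n(n) {}

    void add(std::int64_t x, std::int64_t y, std::int64_t d) {
        for (std::int64_t i = x; i <= n; i += i & -i)
            for (std::int64_t j = y; j <= n; j += j & -j)
                data[i][j] += d;
    }

    std::int64_t query(std::int64_t x, std::int64_t y) const {   // prefix sum over [1, x] x [1, y]
        std::int64_t s = 0;
        for (std::int64_t i = x; i > 0; i -= i & -i)
            for (std::int64_t j = y; j > 0; j -= j & -j) {
                auto it = data.find(i);
                if (it == data.end()) continue;
                auto jt = it->second.find(j);
                if (jt != it->second.end()) s += jt->second;
            }
        return s;
    }

    // mark the rectangle [x1, x2] x [y1, y2]
    void add_rectangle(std::int64_t x1, std::int64_t y1, std::int64_t x2, std::int64_t y2) {
        add(x1, y1, +1); add(x1, y2 + 1, -1); add(x2 + 1, y1, -1); add(x2 + 1, y2 + 1, +1);
    }
};
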
Use a 2D segment tree. Notice that for each canonical segment by y-coordinate you can build a (1D) segment tree over x-coordinates only for the points lying in the zone y_min <= y < y_max, where y_min and y_max are the bounds of that canonical segment by y. This implies that each input point appears in only log(n) of the x-coordinate segment trees, which makes O(n log n) memory in total.

Clustering Data in a 3D matrix with another matrix

I have two data cubes represented as 3D matrices. Both of them are of the same dimensions. We have to do rule-based ordering. Our condition is that if any sub-cube of both of them (the sub-cube must match exactly in location and orientation) matches at least p%, we can say that they are similar. Now, given the two 3D matrices containing the data, we have to write an algorithm which prints the number of sub-cubes that are similar in the given two cubes.
I tried a brute-force algorithm but it turned out to be very slow on large data sets. Is there any specific algorithm or technique I can use here?
Thanks in advance.
We can adapt the first solution in this question. Construct another 3D matrix called count and fill all its edge cells corresponding to matching data with 1s. Then, starting from count(1,1,1), consider the cells in lexicographic order and, for every (i, j, k) where the data matches, set count(i, j, k) to one plus the smallest value among its neighbours that have already been set. If the data doesn't match, set count(i, j, k) = 0.
At the end, the non-zero elements of count mark the matching cubes, and their value denotes the width of the cube.
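A sketch of that approach (assuming both cubes are n x n x n and "matching" simply means equal cells; the +1 in the recurrence is the usual maximal-cube step):

#include <algorithm>
#include <vector>

using Cube = std::vector<std::vector<std::vector<int>>>;

// count[i][j][k] = edge length of the largest matching sub-cube whose
// highest-index corner is (i, j, k); 0 wherever the two data cubes differ.
Cube matching_cube_widths(const Cube& a, const Cube& b) {
    int n = static_cast<int>(a.size());
    Cube count(n, std::vector<std::vector<int>>(n, std::vector<int>(n, 0)));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k) {
                if (a[i][j][k] != b[i][j][k]) continue;           // data doesn't match
                if (i == 0 || j == 0 || k == 0) { count[i][j][k] = 1; continue; }
                int m = count[i - 1][j][k];
                m = std::min(m, count[i][j - 1][k]);
                m = std::min(m, count[i][j][k - 1]);
                m = std::min(m, count[i - 1][j - 1][k]);
                m = std::min(m, count[i - 1][j][k - 1]);
                m = std::min(m, count[i][j - 1][k - 1]);
                m = std::min(m, count[i - 1][j - 1][k - 1]);
                count[i][j][k] = m + 1;
            }
    return count;   // non-zero entries mark matching cubes; the value is the width
}

Counting sub-cubes that only match at least p% would still need more work on top of this; the sketch only handles exact matches.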

Find an algorithm that minimizes the maximum distance between two sets, better than a greedy algorithm

Here is an interesting but complicated problem:
Suppose we have two sets of points. One set, A, contains points on some space grid, like a regular 1D or 3D grid. The other set, B, contains randomly spaced points and is of the same size as the space grid. Mathematically, we can order the two sets and construct a matrix of the distances between A and B. For example, entry (i, j) of the matrix may be the distance between point i of A and point j of B.
Given some ordering, we have a matrix. Then the diagonal element (i, i) of the matrix is the distance between point i of A and point i of B. The problem is how to find a good reordering/indexing such that the maximum of these distances is as small as possible. In matrix form: how to find a good reordering/indexing such that the largest diagonal element is as small as possible?
Notes from myself:
Suppose set A corresponds to the rows of the matrix and set B to the columns. Then reordering the matrix means doing row/column permutations. Therefore, our problem is equivalent to finding a good permutation that minimizes the largest diagonal element.
A greedy algorithm may be a choice, but I am trying to find an ideally perfect reordering that minimizes the largest diagonal element.
The reordering you are referring to is essentially a correspondence problem, i.e. you are trying to find the closest match for each point in the other set. The greedy algorithm will work fine. The distance you are looking for is commonly referred to as the Hausdorff distance.

Finding the farthest point in one set from another set

My goal is a more efficient implementation of the algorithm posed in this question.
Consider two sets of points (in N-space; 3-space for the example case of RGB colorspace, while a solution for 1-space or 2-space differs only in the distance calculation). How do you find the point in the first set that is the farthest from its nearest neighbor in the second set?
In a 1-space example, given the sets A:{2,4,6,8} and B:{1,3,5}, the answer would be 8, as 8 is 3 units away from 5 (its nearest neighbor in B) while all other members of A are just 1 unit away from their nearest neighbor in B. Edit: 1-space is overly simplified, as sorting is related to distance in a way that it is not in higher dimensions.
The solution in the source question involves a brute-force comparison of every point in one set (all R,G,B where 512>=R+G+B>=256 and R%4=0 and G%4=0 and B%4=0) to every point in the other set (colorTable). Ignore, for the sake of this question, that the first set is generated programmatically instead of iterated over as a stored list like the second set.
First you need to find every element's nearest neighbor in the other set.
To do this efficiently you need a nearest neighbor algorithm. Personally I would implement a kd-tree just because I've done it in the past in my algorithm class and it was fairly straightforward. Another viable alternative is an R-tree.
Do this once for each element in the smaller set (take one element of the smaller set at a time and query the tree built on the larger set to find its nearest neighbor).
From this you should be able to get a list of nearest neighbors for each element.
While finding the pairs of nearest neighbors, keep them in a sorted data structure which has a fast addition method and a fast getMax method, such as a heap, sorted by Euclidean distance.
Then, once you're done simply ask the heap for the max.
The run time for this breaks down as follows:
N = size of smaller set
M = size of the larger set
N * O(log M + 1) for all the kd-tree nearest neighbor checks.
N * O(1) for calculating the Euclidean distance before adding it to the heap.
N * O(log N) for adding the pairs into the heap.
O(1) to get the final answer :D
So in the end the whole algorithm is O(N*log M).
If you don't care about the order of each pair you can save a bit of time and space by only keeping the max found so far.
*Disclaimer: This all assumes you won't be using an enormously high number of dimensions and that your elements follow a mostly random distribution.
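A compact sketch of the overall procedure, using the "only keep the max found so far" simplification mentioned above; the inner brute-force loop stands in for a kd-tree query, which is what would give the stated O(N log M) (names are mine):

#include <algorithm>
#include <array>
#include <limits>
#include <vector>

using Point = std::array<double, 3>;   // e.g. an RGB triple

double dist2(const Point& p, const Point& q) {
    double s = 0;
    for (int d = 0; d < 3; ++d) s += (p[d] - q[d]) * (p[d] - q[d]);
    return s;
}

// For each point of `a`, find its nearest neighbour in `b` (brute force here;
// a kd-tree query would replace the inner loop), and return the point of `a`
// whose nearest-neighbour distance is largest. Assumes both sets are non-empty.
Point farthest_from_nearest(const std::vector<Point>& a, const std::vector<Point>& b) {
    Point best = a.front();
    double best_d2 = -1.0;
    for (const Point& p : a) {
        double nn = std::numeric_limits<double>::max();
        for (const Point& q : b) nn = std::min(nn, dist2(p, q));
        if (nn > best_d2) { best_d2 = nn; best = p; }
    }
    return best;
}
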
The most obvious approach seems to me to be to build a tree structure on one set to allow you to search it relatively quickly. A kd-tree or similar would probably be appropriate for that.
Having done that, you walk over all the points in the other set and use the tree to find their nearest neighbour in the first set, keeping track of the maximum as you go.
It's nlog(n) to build the tree, and log(n) for one search so the whole thing should run in nlog(n).
To make things more efficient, consider using a Pigeonhole algorithm - group the points in your reference set (your colorTable) by their location in n-space. This allows you to efficiently find the nearest neighbour without having to iterate all the points.
For example, if you were working in 2-space, divide your plane into a 5 x 5 grid, giving 25 squares, with 25 groups of points.
In 3 space, divide your cube into a 5 x 5 x 5 grid, giving 125 cubes, each with a set of points.
Then, to test point n, find the square/cube/group that contains n and test distance to those points. You only need to test points from neighbouring groups if point n is closer to the edge than to the nearest neighbour in the group.
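A minimal sketch of the grid idea in 3-space (assuming coordinates are normalised to [0, 1]; for brevity the query always scans the 3 x 3 x 3 block of cells around the point instead of doing the exact "closer to the edge" test described above, so it can miss a neighbour lying further away than one cell):

#include <algorithm>
#include <array>
#include <limits>
#include <vector>

constexpr int G = 5;                    // 5 x 5 x 5 grid, as in the example
using Point = std::array<double, 3>;

int cell(double v) { return std::min(G - 1, std::max(0, static_cast<int>(v * G))); }

struct Grid {
    std::array<std::vector<Point>, G * G * G> cells;

    void insert(const Point& p) {
        cells[(cell(p[0]) * G + cell(p[1])) * G + cell(p[2])].push_back(p);
    }

    // squared distance to the closest stored point found in the nearby cells
    double nearest_dist2(const Point& q) const {
        int cx = cell(q[0]), cy = cell(q[1]), cz = cell(q[2]);
        double best = std::numeric_limits<double>::max();
        for (int x = std::max(0, cx - 1); x <= std::min(G - 1, cx + 1); ++x)
            for (int y = std::max(0, cy - 1); y <= std::min(G - 1, cy + 1); ++y)
                for (int z = std::max(0, cz - 1); z <= std::min(G - 1, cz + 1); ++z)
                    for (const Point& p : cells[(x * G + y) * G + z]) {
                        double d2 = 0;
                        for (int d = 0; d < 3; ++d) d2 += (p[d] - q[d]) * (p[d] - q[d]);
                        best = std::min(best, d2);
                    }
        return best;
    }
};
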
For each point in set B, find the distance to its nearest neighbor in set A.
To find the distance to each nearest neighbor, you can use a kd-tree as long as the number of dimensions is reasonable, there aren't too many points, and you will be doing many queries - otherwise building the tree will be too expensive to be worthwhile.
Maybe I'm misunderstanding the question, but wouldn't it be easiest to just reverse the sign on all the coordinates in one data set (i.e. multiply one set of coordinates by -1), then find the first nearest neighbour (which would be the farthest neighbour)? You can use your favourite knn algorithm with k=1.
EDIT: I meant nlog(n) where n is the sum of the sizes of both sets.
In the 1-space case you could do something like this (a runnable C++ version of the pseudocode; steps (1)-(5) are kept as comments):

#include <algorithm>
#include <cstddef>
#include <vector>

struct Item {
    int value;
    int setid;   // which of the two sets the value came from
};

// (1) MaxDistance = 0
// (2) read all the sets into Item structures
// (3)-(4) put the Items into an array and sort it by the value field
// (5) walk the array from beginning to end; whenever the setid differs from the
//     previous Item's setid, check whether this distance is greater than MaxDistance
int maxDistance(std::vector<Item> items) {
    std::sort(items.begin(), items.end(),
              [](const Item& a, const Item& b) { return a.value < b.value; });
    int maxDist = 0;
    for (std::size_t i = 1; i < items.size(); ++i)
        if (items[i].setid != items[i - 1].setid)
            maxDist = std::max(maxDist, items[i].value - items[i - 1].value);
    return maxDist;   // return the max distance
}
