Closest point to another point on a hypersphere [closed] - algorithm

I have n (about 10^5) points on a hypersphere of dimension m (between 10^4 and 10^6).
I am going to make a bunch of queries of the form "given a point p, find the closest of the n points to p". I'll make about n of these queries.
(Not sure if the hypersphere fact helps at all.)
The naive algorithm is, for each query, to compare p against all n points. Doing this for n queries gives a total runtime of O(n^2 m), which is far too slow for me to compute.
Is there a more efficient algorithm I can use? If I could get it to O(nm) with some log factors that'd be great.

Probably not. Having many dimensions makes efficient indexing extremely hard. That is why people look for opportunities to reduce the number of dimensions to something manageable.
See https://en.wikipedia.org/wiki/Curse_of_dimensionality and https://en.wikipedia.org/wiki/Dimensionality_reduction for more.

Divide your space up into hypercubes -- call these cells -- with edge size chosen so that on average you'll have one point per cube. You'll want a map from hypercells to the set of points they contain.
Then, given a point, check its hypercell for other points. If it is empty, look at the adjacent hypercells (I'd recommend a literal hypercube of hypercells for simplicity rather than some approximation to a hypersphere built out of hypercells) and check those for other points. Keep expanding until you find a point. Assuming your points are randomly distributed, odds are high that you'll find one within 1-2 expansions.
Once you find a point, you still need to check all hypercells that could possibly contain a closer point: the point you found may be in a corner of the searched region, and a closer point may lie just outside the hypercube of hypercells you've inspected so far.
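A minimal sketch of this scheme in Python, assuming numpy is available and points is an (n, m) array; build_cells and nearest are names I've made up, and the shell scan is only practical for small m, since a shell at radius r contains up to (2r+1)^m cells:

from collections import defaultdict
from itertools import product
import numpy as np

def build_cells(points, edge):
    # map each hypercell (a tuple of integer cell coordinates) to the points inside it
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple((p // edge).astype(int))].append(i)
    return cells

def nearest(query, points, cells, edge):
    home = tuple((query // edge).astype(int))
    r, best, best_d = 0, None, float("inf")
    while True:
        # scan the shell of cells at Chebyshev distance exactly r from the home cell
        for off in product(range(-r, r + 1), repeat=len(home)):
            if max(abs(o) for o in off) != r:
                continue
            for i in cells.get(tuple(h + o for h, o in zip(home, off)), ()):
                d = np.linalg.norm(points[i] - query)
                if d < best_d:
                    best, best_d = i, d
        # every point in an unscanned cell is at least r * edge away, so we can stop
        if best is not None and best_d <= r * edge:
            return best
        r += 1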

Related

Algorithm consulting: Check conditions in a square [closed]

Suppose I have a image that looks like
x x x x
x x x x
x x x x
Dimensions might change, but just give a rough idea about this.
I am curious whether there is an algorithm that can quickly check if the point I am currently looking at is a corner, a point on one of the four sides, or a point inside the square, and that also helps me check all points around the current one.
My current approach is to write a few helper functions that separately check whether the current coordinate is a corner, a point on one of the four sides, or a point inside the square. Within each helper function, I use several loops to check all the neighboring points around the point I am currently looking at. But this approach feels extremely inefficient; I believe there must exist a better way to do this. Can anyone help me if you have encountered this kind of question before?
Thanks.
Largely you are correct, but there should be no need for loops. You can make your functions efficient with some index calculations and direct access to a 1-dimensional array.
Imagine that your image of size (m, n) is stored in a 1-dimensional array D. The array will have a size of m x n, and each data point's ID is its index into D.
To access neighbors of ID = a, use the following offsets:
a-1, a+1 for left and right neighbors
a-m, a+m for bottom and top neighbors
a-m+1, a-m-1, a+m+1, a+m-1 for diagonal neighbors
After every offset you need to check for the following:
is the neighbor index out of bounds for the array D?
does the neighbor index wrap around the x-bounds? That is, check that
abs((neighbor_id % m) - (a % m)) <= 1; otherwise neighbor_id is not a neighbor of a.
Of course, the second test assumes that your image is large enough (perhaps m > 3).
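In Python, the whole neighbor check might look like this minimal sketch (the function name neighbors is mine; a is the flat index into D and the image is (m, n) as above):

def neighbors(a, m, n):
    # image of size (m, n) stored in a flat array D of length m * n; row length is m
    result = []
    for off in (-1, +1, -m, +m, -m - 1, -m + 1, +m - 1, +m + 1):
        b = a + off
        if b < 0 or b >= m * n:          # neighbor index out of bounds for D
            continue
        if abs(b % m - a % m) > 1:       # index wrapped around the x-bounds
            continue
        result.append(b)
    return result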

Fast hill climbing algorithm that can stabilize when near optimal [closed]

I have a floating-point number x in [1, 500] that generates a binary outcome y = 1 with some probability p. I'm trying to find the x that generates the most 1s, i.e. has the highest p. I'm assuming there is only one maximum.
Is there an algorithm that converges quickly to the x with the highest p, while making sure it doesn't jump around too much once it is close, say within 0.1% of the optimal x?
I know this can be done with simulated annealing, but I don't think I should hard-code a temperature schedule, because I need the same algorithm to work when x ranges over [1, 3000] or the distribution of p is different.
This paper provides a smart hill-climbing algorithm. The basic idea is to take n samples as starting points. The algorithm is as follows (simplified to one dimension for your problem):
Take n sample points in the search space. The paper uses Latin Hypercube Sampling, since the dimensionality of its data is assumed to be large. In your case, since it is one-dimensional, you can just use ordinary random sampling.
For each sample point, gather points from its "local neighborhood" and fit a quadratic curve to them. Find the new maximum candidate from the quadratic curve. If the objective value of the new candidate is actually higher than the previous one, update the sample point to the new candidate. Repeat this step with a smaller "local neighborhood" size on each iteration.
Use the best of the sample points.
Restart: repeat steps 2 and 3, then compare the new maximum with the previous one. If there is no improvement, stop; if there is, repeat again.
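A one-dimensional sketch of this procedure, assuming numpy is available; smart_hill_climb, its parameters, and the shrink factor are illustrative choices, and f(x) is your (possibly noisy) estimate of p at x:

import numpy as np

def smart_hill_climb(f, lo, hi, n_samples=8, rounds=25, seed=None):
    rng = np.random.default_rng(seed)
    xs = list(rng.uniform(lo, hi, n_samples))    # step 1: plain random sampling in 1-D
    radius = (hi - lo) / 4
    for _ in range(rounds):
        for i, x in enumerate(xs):
            # step 2: sample the local neighborhood and fit a quadratic to it
            pts = np.clip(x + rng.uniform(-radius, radius, 7), lo, hi)
            vals = np.array([f(p) for p in pts])
            a, b, _ = np.polyfit(pts, vals, 2)
            if a < 0:                            # a concave fit has an interior maximum
                cand = float(np.clip(-b / (2 * a), lo, hi))
                if f(cand) > f(x):
                    xs[i] = cand
        radius *= 0.7                            # shrink the neighborhood each iteration
    return max(xs, key=f)                        # step 3: keep the best sample point

For the restart in step 4, call it again and keep the better of the two results.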

How to select the number of cluster centroid in K means [closed]

I am going through a list of algorithms that I found and trying to implement them for learning purposes. Right now I am coding k-means and am confused about the following.
How do you know how many clusters there are in the original data set?
Is there any particular rule I have to follow in choosing the initial cluster centroids, besides that all centroids have to be different? For example, does the algorithm converge if I choose cluster centroids that are different but close together?
Any advice would be appreciated
Thanks
With k-means you are minimizing a sum of squared distances. One approach is to try all plausible values of k. As k increases the sum of squared distances should decrease, but if you plot the result you may see that the sum of squared distances decreases quite sharply up to some value of k, and then much more slowly after that. The last value that gave you a sharp decrease is then the most plausible value of k.
k-means isn't guaranteed to find the best possible answer on each run, and it is sensitive to the starting values you give it. One way to reduce problems from this is to start it many times with different starting values and pick the best answer. It looks a bit odd if the best sum of squared distances found for larger k is actually larger than the best sum found for smaller k. One way to avoid this is to use the best answer found for k clusters as the basis (with slight modifications) for one of the starting points for k+1 clusters.
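A short sketch of this elbow heuristic, assuming scikit-learn is available (elbow_curve is a name I've made up; KMeans's n_init parameter performs the restarts described above):

from sklearn.cluster import KMeans

def elbow_curve(X, k_max=10, restarts=10):
    # best (lowest) sum of squared distances found for each k = 1 .. k_max
    return [KMeans(n_clusters=k, n_init=restarts).fit(X).inertia_
            for k in range(1, k_max + 1)]

Plot the returned values against k and pick the last k that gives a sharp decrease.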
In standard k-means, the value of K is chosen by you, sometimes based on the problem itself (when you know how many classes exist, or how many classes you want to exist), other times more or less arbitrarily. Typically the first iteration consists of randomly selecting K points from the dataset to serve as centroids; in the following iterations the centroids are adjusted.
After studying the k-means algorithm, I suggest you also look at k-means++, which improves on the standard version by spreading out the initial centroids, avoiding the sometimes poor clusterings that standard k-means finds with purely random initialization.
If you need more specific details on implementation of some machine learning algorithm, please let me know.
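As a concrete illustration of that select-then-adjust procedure, here is a minimal numpy sketch of standard k-means with random initialization (not k-means++; the function name and parameters are mine):

import numpy as np

def kmeans(X, k, iters=100, seed=None):
    rng = np.random.default_rng(seed)
    # first iteration: randomly select K distinct points from the dataset as centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign every point to its nearest centroid ...
        labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
        # ... then adjust each centroid to the mean of its assigned points
        new = np.array([X[labels == j].mean(0) if (labels == j).any() else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):          # centroids stopped moving
            break
        centroids = new
    return centroids, labels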

Powers of a half that sum to one [closed]

Call every subunitary ratio with its denominator a power of 2 a perplex.
Number 1 can be written in many ways as a sum of perplexes.
Call every sum of perplexes a zeta.
Two zetas are distinct if and only if one of them has at least one perplex that the other does not have. In the image shown above, the last two zetas are considered to be the same.
Find the number of ways 1 can be written as a zeta with N perplexes. Because this number can be big, calculate it modulo 100003.
Please don't post the code, but rather the algorithm. Be as precise as you can.
This problem was given at a contest, and the official solution, written in Romanian, has been uploaded as a docx file at https://www.dropbox.com/s/ulvp9of5b3bfgm0/1112_descr_P2_fractii2.docx?dl=0 (you can use Google Translate).
I do not understand what the author of the solution meant to say there.
Well, this reminds me of BFS (breadth-first search), where you radiate out from a single point to find multiple solutions with different permutations.
Here you can use recursion, with the base case being the point at which N perplexes have been reached on the current branch of the recursion.
So you can say, as runnable Python (the hashtable becomes a set of sorted tuples):

def expand(splits, numbers, seen):
    if splits == 0:                      # base case: record this zeta
        seen.add(tuple(sorted(numbers)))
        return
    for i, x in enumerate(numbers):
        # clone the list, replace x by two copies of x / 2, and recurse
        expand(splits - 1, numbers[:i] + [x / 2, x / 2] + numbers[i + 1:], seen)

seen = set()
expand(N - 1, [1.0], seen)               # the problem's N perplexes require N - 1 splits
This is a rough, very inefficient but short way to do it. There are obviously many ways to prune the algorithm (reduce the number of duplicates going into the hash table, avoid cloning where possible, etc.), but because it enumerates every permutation of splits and then enters each resulting sequence into a hash table, the hash table's equality comparison will recognize that a sequence already exists (as with your last two zetas) and reject the duplicate. That way, you'll be left with the answer you want.
The efficiency of the current algorithm is O(|E|^N), where |E| is the number of values that can appear in the array by the end of all insertions and N is the number of insertions (or, as you said, the number of perplexes). Obviously this isn't the most optimal speed, but it does work.
Hope this helps!

Uva 10364 Tell if it is possible to use sticks of varying lengths to make a square [closed]

I have been stuck on this UVa problem for a long time now.
Abridged problem statement: given a set of sticks of various lengths, is it possible to join them end-to-end to form a square? There are at most 20 sticks, and each stick has a length less than 10000.
Different solutions are possible for this problem. One of them is a backtracking solution explained here. But there exist dynamic programming solutions, explained here, here and here, with better running times. I can't understand the approach they use. Please help me understand the DP algorithm.
If you are not familiar with dynamic programming over subsets, I suggest you read about it first. This link can help, but there may be better tutorials out there.
Back to the given problem: since M is no more than 20, the following O(2^M × M) approach will probably work.
For each of the 2^M subsets of the given sticks, we know the total length of the sticks in that subset. We also know the total length of all the given sticks and thus the length of the square side. We construct the square by laying sticks on its sides. Let us fix the order in which we construct our square: we start at the upper left corner and move along the square border in clockwise direction, laying sticks on the way and leaving no gaps. So, first we fully construct the upper side (from left to right), then the right side (top-down), then the lower one (right-to-left) and finally the left one (bottom-up). Whenever the distance to the next square corner in our traversal is L, we can't lay a stick of length greater than L; at least not until we reach that corner using other sticks. Now, the question is: can we order the sticks in such a way that the square can be constructed by our procedure?
There are M! different orders in which we can try to lay the sticks. But, if we lay sticks one-by-one, when choosing the next stick, all that we are concerned with is the set of sticks already laid, not their particular order. This observation leads us to considering only 2^M subsets, which is far smaller than M! orders.
Next we define subproblems of the problem defined before. For each subset of sticks, the question is: can we order the sticks in such a way that all of them can be laid sequentially by the rules of the above procedure? In other words, can we construct a "valid prefix" of the square traversal, as it is defined above?
We will say a subset of sticks is good if the answer to the above question is "yes", and bad otherwise. Sure, the empty subset is good. In the end, we are interested in whether the whole given set of sticks is also good. To find that out, we can process the subsets in natural order (for M=3, that would be 000, 001, 010, 011, 100, 101, 110, 111 where 1s correspond to sticks in the subset). For each new non-empty subset S, it is good if and only if some of its "immediate subsets" T (S without exactly one element - say a stick of length X) is good, and T can be extended by that stick of length X according to the rules of our construction procedure (that is, laying a stick of length X, we won't have to bend it around some corner).
What's left is implementation details. For each subset, either store or calculate the total length of the sticks in it and find the distance to the next corner L. If this subset is good, it can be extended only by sticks having two properties: (1) length no more than L and (2) not already in the subset.
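A compact Python sketch of this subset DP, assuming positive integer stick lengths (can_make_square is a name I've made up):

def can_make_square(sticks):
    total = sum(sticks)
    if total % 4 != 0:
        return False
    side = total // 4
    m = len(sticks)
    # subset_sum[s] = total length of the sticks in bitmask s
    subset_sum = [0] * (1 << m)
    for s in range(1, 1 << m):
        low = s & -s                                 # lowest set bit of s
        subset_sum[s] = subset_sum[s ^ low] + sticks[low.bit_length() - 1]
    # good[s] = True iff the sticks in s can be laid as a valid prefix of the traversal
    good = [False] * (1 << m)
    good[0] = True
    for s in range(1, 1 << m):
        for i in range(m):
            if not (s >> i) & 1:
                continue
            t = s ^ (1 << i)                         # s without stick i
            if not good[t]:
                continue
            dist = side - subset_sum[t] % side       # distance to the next corner
            if sticks[i] <= dist:                    # stick i fits without bending
                good[s] = True
                break
    return good[(1 << m) - 1]

For example, can_make_square([2, 2, 2, 2, 1, 1, 1, 1]) is True (side 3, each side a 2 and a 1), while can_make_square([1, 1, 1]) is False.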
