Algorithm to assign values to points based on distance

I am having trouble figuring out an algorithm to assign values to different points on a diagram based on the distances between the points.
Essentially, I am given a diagram with a central block and a dynamic number of points.
I am then given a list of values to assign to each point. Here are the rules and info:
I know the Lat/Long values for each point and for the central block. In other words, I can get the direct distance between any two objects.
The list of values may be shorter than the total number of points. In this case, values can be repeated multiple times.
In the case where values must be repeated, the duplicate values should be placed as far away as possible from one another.
For example, with a value list of {1,2} and five points, each value is used more than once, and the points sharing a value should be the ones farthest apart.
This is a very simple example; in practice there may be thousands of points.

Find out how many values you need to repeat. In your example you have 2 values and 5 points, so you need 2 repetitions for the 2 values, which gives you 2x2=4 positions [call this pNum] (use different pairs as much as possible so that they are far apart from each other).
Calculate a distance array, then find the max pNum values in it; in other words, find the greatest 4 values in the array in your example.
Assign the repeated values to the points found to be farthest apart, and assign the rest of the points based on the distance values in the array.
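As a rough illustration of this idea (not the exact procedure above): order the points by a farthest-point traversal and deal the values out cyclically, so copies of the same value end up spread apart. A minimal Python sketch, assuming planar (x, y) coordinates; the function names are made up:

    import math
    from itertools import cycle

    def assign_values(points, values):
        # points: list of (x, y) coordinates; values: list of values to hand out
        def dist(p, q):
            return math.hypot(p[0] - q[0], p[1] - q[1])

        order = [0]                        # start from an arbitrary point
        remaining = set(range(1, len(points)))
        while remaining:
            # next point: the one farthest from everything chosen so far
            nxt = max(remaining,
                      key=lambda i: min(dist(points[i], points[j]) for j in order))
            order.append(nxt)
            remaining.remove(nxt)

        # dealing values cyclically along this order keeps duplicates far apart
        return {idx: val for idx, val in zip(order, cycle(values))}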

Given an array of houses, find how many segments exist after n queries

I recently encountered a LeetCode-style programming problem and I was wondering what the optimal way to solve it is. The question goes like this:
Given an array of houses like houses = [1,2,3,7,8,10,11] and an array of queries like q=[2,10,8], return an array of how many segments exist after each query. Each query indicates the house that will be destroyed, and the queries are executed in order.
A segment refers to a consecutive group of houses. A segment can technically contain a single house if there are no other houses consecutive to it (it has no neighbors), but it still counts as one segment.
Ex.
houses->[None,house,house,house,None,None,None,house,house,None,house,house]
As can be seen, the house indexes match up. Before any queries there are 3 segments: indices 1-3, 7-8, and 10-11.
After the first query, the house at index 2 will be destroyed.
[None,house,None,house,None,None,None,house,house,None,house,house]
Now there are 4 segments, which means res=[4]
After the next query, house 10 is destroyed.
[None,house,None,house,None,None,None,house,house,None,None,house]
There are still 4 segments, which means res=[4,4].
After the last query, house 8 is destroyed.
[None,house,None,house,None,None,None,house,None,None,None,house]
There are still 4 segments, which means res=[4,4,4].
The array returned from this is [4,4,4]
I was wondering what the optimal way to approach this problem is. My take was to construct a boolean array from 0 to the maximum element in houses and mark the indices that have a house on them True and the indices which don't False. After this, for each query we can just check that index and count its neighbors. If it has 0 neighbors, the number of segments decreases by 1; if it has 1 neighbor, it stays the same; and if it has 2, the number of segments increases by one. This approach lets me process queries in O(1), but it is not very space efficient, since we could have a house at a very large index, e.g. houses=[1,1000000]. Is there a better approach to this problem? Thank you in advance.
Your solution is right, but the enormous amount of space you use to get O(1) lookups can be avoided with a HashMap.
Make a HashMap from house number to its index within the array, and for each query ask if the HashMap contains the queried number; if it does, fetch the index and apply your neighbor logic on the original array instead of the boolean array.
If you have multiple houses with the same number in the array, use a HashMap from the number to a list of indexes with the same logic; just iterate over the list instead of the single house.
This solution has O(|houses| + |queries|) time complexity and O(|houses|) space complexity.
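In Python, a set can play the role of the HashMap, since we only need membership tests. A minimal sketch of this approach (the function name is mine):

    def count_segments(houses, queries):
        alive = set(houses)
        # initial segment count: houses whose left neighbor is absent
        segments = sum(1 for h in alive if h - 1 not in alive)
        res = []
        for q in queries:
            if q in alive:
                alive.remove(q)
                neighbors = (q - 1 in alive) + (q + 1 in alive)
                # 0 neighbors: a segment vanishes; 1: unchanged; 2: it splits
                segments += neighbors - 1
            res.append(segments)
        return res

    print(count_segments([1, 2, 3, 7, 8, 10, 11], [2, 10, 8]))  # [4, 4, 4]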

Algorithm to compare two arrays with user-defined criteria

I want to compare two float arrays' values, but the comparison criterion differs from the usual ones. Here is how I define which array is the best.
Say we have two arrays named a and b. First, we compare the max values of the two arrays, and the array with the smaller max value wins. If they have the same max value, we divide each array into two parts: for a, the parts are a[1:max_loc(a)-1] and a[max_loc(a)+1:len(a)], and similarly for b. Then we apply the same criterion to a[1:max_loc(a)-1] and b[1:max_loc(b)-1] to see which array has the smaller max value. If they have the same max value on these intervals, we divide them into smaller arrays and repeat the comparison. We also do the same for a[max_loc(a)+1:len(a)] and b[max_loc(b)+1:len(b)]. When we find a smaller max value on corresponding intervals, the program ends and prints the best array.
What's the algorithm to fulfill this comparison?
P.S. These two arrays may have different lengths.
Most of the time, what you are looking for is already somewhere on the Internet:
https://www.ics.uci.edu/~eppstein/161/960118.html
There you have two examples with full explanations that follow the divide-and-conquer idea (MergeSort and QuickSort).
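The criterion itself also maps naturally onto a divide-and-conquer recursion. A minimal sketch, using 0-based Python slices instead of the question's 1-based intervals, and assuming (since the question leaves this open) that an empty subarray beats a non-empty one:

    def compare(a, b):
        # Returns -1 if a wins, 1 if b wins, 0 if they are indistinguishable.
        if not a and not b:
            return 0
        if not a:
            return -1   # assumption: an exhausted side wins
        if not b:
            return 1
        ia, ib = a.index(max(a)), b.index(max(b))
        if a[ia] != b[ib]:
            return -1 if a[ia] < b[ib] else 1   # smaller max wins
        left = compare(a[:ia], b[:ib])          # compare parts left of the max
        if left != 0:
            return left
        return compare(a[ia+1:], b[ib+1:])      # then parts right of the max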

An alldifferent for tuples

I'm trying to solve a sudoku with the viewpoint that every number has 9 positions. This is the representation for my sudoku:
From the table you can read that number 5 has the following positions (Row,Col) in the sudoku: (2,8), (4,2), (6,5).
When I mention a row in my explanation, I mean a row like this:
For example, the above row is row 1.
What I have done is the following:
For every row, check that all ROW values in that row are different, using alldifferent from ic_global.
Do the same as above, but for the COLUMN values.
For every row, check that the square numbers are different (calculated from a row and column value each time), using alldifferent again.
The above things work fine and I get a solution for the sudoku, but not the correct one. This is because I have to check one more thing: every position must be different. With the current state of my solver I could get a solution that has multiple numbers on the same position, e.g. 2 and 3 could both be at position (5,7), because I don't check that all positions are different.
How would I tackle this problem?
I tried to get ALL the positions in one list in tuple form and then check if all tuples are different but I have been struggling for hours and I'm getting really desperate. I hope I can find a solution here.
EDIT: Added code
As you already know, all_different/1 and related constraints work on integers. Also, in your case, you are actually interested in a special case of tuples, namely pairs consisting of rows and columns.
So, your question can actually be reduced to:
How can I injectively map pairs of integers to integers?
Suppose you have pairs of the form A-B, where both A and B are constrained to 1..9.
I can put such pairs in a one-to-one correspondence with integers in several ways. A very easy function that does this is: 9×A + B. Think about it!
Thus, I recommend you map such positions to integers in this way or a similar one, and then post all_different/1 on these integers.
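A quick way to convince yourself that the mapping is injective on such pairs (a Python check, purely for illustration; in your CLP system you would simply post the constraint on the encoded values):

    # 9*A + B sends the 81 pairs with A, B in 1..9 to 81 distinct integers.
    codes = {9*a + b for a in range(1, 10) for b in range(1, 10)}
    assert len(codes) == 81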
Exercise: Think about other possible mappings and their properties. Then generalize them to work on tuples.

Get most unique text from a group of text

I have a number of texts, for example 100.
I would like to keep the 10 most unique among them. I built a 100x100 matrix comparing each pair of texts with the Levenshtein distance.
Is there an algorithm to select the 10 most unique?
EDIT :
What I want is the N most unique texts, i.e. the N texts that maximize the distances between them, regardless of the 1st element of my set.
I want the most unique because I will publish these texts to the web and I want to avoid near duplicates.
A long comment rather than an answer ...
I don't think you've specified your requirements clearly enough. How do you select the 1st element of your set of 10 strings? Is it the string with the largest distance from any other string (in which case you are looking for the largest element in your array), or the one with the largest distance from all the other strings (in which case you are looking for the largest row or column sum in the array)?
Moving on to the N (or 10 as you suggest) most distant strings, you have a number of choices.
You could select the N largest distances in the array. I suspect, not having seen your data, that the string which is furthest from one other string is likely to be furthest away from several other strings too -- I mean you may find that several of the N largest entries in your array occur in the same row or column.
You could simply select the N strings with the largest row sums.
Or perhaps you are looking for a cluster of N strings which maximises the distance between all the strings in that cluster and all the strings in the remaining 100-N strings. This might lead you towards looking at, rather obviously, clustering algorithms.
I suggest you clarify your requirements and edit your question.
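For what it's worth, the row-sum option mentioned above is nearly a one-liner with NumPy (a sketch; dist is assumed to be your 100x100 Levenshtein distance matrix):

    import numpy as np

    # Keep the n texts whose total distance to all the others is largest.
    def pick_by_row_sum(dist, n=10):
        return np.argsort(dist.sum(axis=1))[::-1][:n]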
Since this looks like an eigenvalue problem, I would try to execute the Power iteration on the matrix, and reject the 90 highest values from the resulting vector. The power iteration normally converges very fast, within ~ten iterations. BTW: this solution assumes a similarity matrix. If the entries of your matrix are a measure of *dis*similarity ("distance"), you might need to use their inverses instead.
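A sketch of that suggestion, assuming a symmetric NumPy similarity matrix sim (higher = more alike) and keeping the 10 lowest-scoring texts:

    import numpy as np

    def most_unique(sim, n_keep=10, iters=50):
        v = np.ones(sim.shape[0])
        for _ in range(iters):          # power iteration: v converges to the
            v = sim @ v                 # dominant eigenvector of sim
            v /= np.linalg.norm(v)
        # the smallest components are the least "central", i.e. most unique
        return np.argsort(v)[:n_keep]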

Find the "largest" dense sub matrix in a large sparse matrix

Given a large sparse matrix (say 10k+ by 1M+), I need to find a subset, not necessarily contiguous, of the rows and columns that forms a dense matrix (all non-zero elements). I want this submatrix to be as large as possible (not the largest sum, but the largest number of elements) within some aspect-ratio constraints.
Are there any known exact or approximate solutions to this problem?
A quick scan on Google seems to give a lot of close-but-not-exactly results. What terms should I be looking for?
edit: Just to clarify; the submatrix need not be contiguous. In fact the row and column order is completely arbitrary, so adjacency is completely irrelevant.
A thought based on Chad Okere's idea (steps 1-4 are sketched in code below):
1. Order the rows from largest count to smallest count (not necessary but might help perf)
2. Select two rows that have a "large" overlap
3. Add all other rows that won't reduce the overlap
4. Record that set
5. Add whatever row reduces the overlap by the least
6. Repeat at #3 until the result gets too small
7. Start over at #2 with a different starting pair
8. Continue until you decide the result is good enough
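A minimal sketch of steps 1-4, assuming each row is given as the set of its non-zero column indices; the refinement loop of steps 5-8 is omitted:

    from itertools import combinations

    def greedy_dense(rows, min_cols=2):
        best_rows, best_cols = set(), set()
        order = sorted(range(len(rows)), key=lambda r: -len(rows[r]))  # step 1
        for r1, r2 in combinations(order, 2):                          # step 2
            overlap = rows[r1] & rows[r2]
            if len(overlap) < min_cols:
                continue
            chosen = {r1, r2}
            for r in order:                                            # step 3
                if r not in chosen and overlap <= rows[r]:
                    chosen.add(r)
            if len(chosen) * len(overlap) > len(best_rows) * len(best_cols):
                best_rows, best_cols = chosen, overlap                 # step 4
        return best_rows, best_cols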
I assume you want something like this. You have a matrix like
1100101
1110101
0100101
You want columns 1,2,5,7 and rows 1 and 2, right? That submatrix would be 4x2 with 8 elements. Or you could go with columns 2,5,7 and rows 1,2,3, which would be a 3x3 matrix.
If you want an 'approximate' method, you could start with a single non-zero element, then go on to find another non-zero element and add it to your list of rows and columns. At some point you'll run into a non-zero element that, if its row and column were added to your collection, the collection would no longer be entirely non-zero.
So for the above matrix, if you added 1,1 and 2,2 you would have rows 1,2 and columns 1,2 in your collection. If you tried to add 3,7 it would cause a problem, because 3,1 is zero. So you couldn't add it. You could add 2,5 and 2,7 though, creating the 4x2 submatrix.
You would basically iterate until you can't find any more new rows and columns to add. That would get you to a local maximum. You could store the result and start again from another starting point (perhaps one that didn't fit into your current solution).
Then just stop when you can't find any more after a while.
That, obviously, would take a long time, but I don't know if you'll be able to do it any more quickly.
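A rough sketch of that growth idea, assuming M is a small dense 0/1 list-of-lists for illustration (a real sparse matrix would use per-row sets of column indices instead):

    def grow_submatrix(M, seed_r, seed_c):
        # grow from one non-zero entry, adding rows/columns only while
        # every cell of the implied submatrix stays non-zero
        rows, cols = {seed_r}, {seed_c}
        changed = True
        while changed:
            changed = False
            for r in range(len(M)):
                if r not in rows and all(M[r][c] for c in cols):
                    rows.add(r); changed = True
            for c in range(len(M[0])):
                if c not in cols and all(M[r][c] for r in rows):
                    cols.add(c); changed = True
        return rows, cols

    M = [[1,1,0,0,1,0,1],
         [1,1,1,0,1,0,1],
         [0,1,0,0,1,0,1]]
    print(grow_submatrix(M, 0, 0))  # rows {0, 1}, columns {0, 1, 4, 6}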
I know you aren't working on this anymore, but I thought someone might have the same question as me in the future.
So, after realizing this is an NP-hard problem (by reduction from MAX-CLIQUE), I decided to come up with a heuristic that has worked well for me so far:
Given an N x M binary/boolean matrix, find a large dense submatrix:
Part I: Generate reasonable candidate submatrices
Consider each of the N rows to be an M-dimensional binary vector, v_i, where i = 1 to N
Compute a distance matrix for the N vectors using the Hamming distance
Use the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm to cluster vectors
Initially, each of the v_i vectors is a singleton cluster. Step 3 above (clustering) gives the order that the vectors should be combined into submatrices. So each internal node in the hierarchical clustering tree is a candidate submatrix.
Part II: Score and rank candidate submatrices
For each candidate submatrix, calculate D, the number of elements in its dense subset: eliminate any column with one or more zeros, and let D be the number of rows times the number of remaining columns.
Select the submatrix that maximizes D.
I also had some considerations regarding the minimum number of rows that needed to be preserved from the initial full matrix, and I would discard any candidate submatrices that did not meet this criterion before selecting the submatrix with the max D value.
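A sketch of this heuristic using SciPy's hierarchical clustering, assuming M is a NumPy 0/1 matrix and with min_rows playing the role of the row-count criterion above:

    import numpy as np
    from scipy.cluster.hierarchy import linkage
    from scipy.spatial.distance import pdist

    def best_dense_submatrix(M, min_rows=2):
        n = M.shape[0]
        # Part I: UPGMA ('average' linkage) on Hamming distances between rows;
        # every internal node of the tree is a candidate submatrix
        Z = linkage(pdist(M, metric='hamming'), method='average')
        members = {i: [i] for i in range(n)}          # leaf clusters
        best_D, best = -1, None
        for k, (a, b, _dist, _size) in enumerate(Z):
            rows = members[int(a)] + members[int(b)]
            members[n + k] = rows                     # internal node n+k
            if len(rows) < min_rows:
                continue
            # Part II: drop columns containing zeros, score D, keep the best
            dense_cols = np.all(M[rows] != 0, axis=0)
            D = len(rows) * int(dense_cols.sum())
            if D > best_D:
                best_D, best = D, (rows, np.flatnonzero(dense_cols))
        return best  # (row indices, column indices)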
Is this a Netflix problem?
MATLAB or some other sparse matrix libraries might have ways to handle it.
Is your intent to write your own?
Maybe a 1-D approach for each row would help you. The algorithm might look like this:
Loop over each row
Find the index of the first non-zero element
Find the index of the non-zero element with the largest span between non-zero columns in each row and store both.
Sort the rows from largest to smallest span between non-zero columns.
At this point I start getting fuzzy (sorry, not an algorithm designer). I'd try looping over each row, lining up the indexes of the starting points, looking for the maximum non-zero run of column indexes that I could find.
You don't specify whether or not the dense matrix has to be square. I'll assume not.
I don't know how efficient this is or what its Big-O behavior would be. But it's a brute force method to start with.
EDIT: This is NOT the same as the problem below... My bad.
But based on the last comment below, it might be equivalent to the following:
Find the furthest vertically separated pair of zero points that have no zero point between them.
Find the furthest horizontally separated pair of zero points that have no zeros between them?
Then the horizontal region you're looking for is the rectangle that fits between these two pairs of points?
This exact problem is discussed in a gem of a book called "Programming Pearls" by Jon Bentley, and, as I recall, although there is a solution in one dimension, there is no easy answer for the 2-D or higher-dimensional variants...
The 1-D problem is, effectively, to find the largest sum of a contiguous subset of a set of numbers:
Iterate through the elements, keeping a running total that starts from some previous element, and the maximum subtotal seen so far (along with the start and end elements that generated it). At each element, if the running subtotal is greater than the max total seen so far, the max seen so far and its end element are updated. If the running total goes below zero, the start element is reset to the current element and the running total is reset to zero.
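That scan is what is now commonly called Kadane's algorithm; a minimal sketch:

    def max_subarray(xs):
        # largest sum of a contiguous subarray, with its start/end indices
        best, best_lo, best_hi = xs[0], 0, 0
        run, run_lo = 0, 0
        for i, x in enumerate(xs):
            if run < 0:              # a negative running total can't help; restart
                run, run_lo = 0, i
            run += x
            if run > best:
                best, best_lo, best_hi = run, run_lo, i
        return best, best_lo, best_hi

    print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # (6, 3, 6)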
The 2-D problem came from an attempt to build a visual image-processing algorithm that would find, within a stream of brightness values representing pixels in a 2-color image, the "brightest" rectangular area within the image, i.e., the contained 2-D submatrix with the highest sum of brightness values, where "brightness" was measured by the difference between a pixel's brightness value and the overall average brightness of the entire image (so many elements had negative values).
EDIT: To look up the 1-D solution I dredged up my copy of the 2nd edition of this book, and in it, Jon Bentley says "The 2-D version remains unsolved as this edition goes to print..." which was in 1999.
