determining state space based on area - algorithm

I have been tasked with figuring out a state space for a problem based on the area of a rectangle. It seems that I have made my state space far too large and need some feedback.
So far I have an area that has a value of 600 for the y axis and 300 for the x axis. I determined the number of points to be
(600 x 300) ! or 180,000!
Therefore my robot would need to inspect this many potential spaces, before I apply an algorithm.
This number seems quite high, and if that is the case it would make my problem unsolvable before I die, especially if I implement the algorithm incorrectly. Any help would be greatly appreciated, especially if my math is off in determining the number of points.
I was under the impression that, to see how many pairs of points you would have, you take the Cartesian product of the total available points, which in turn would be (600x300)!. If this is incorrect please let me know.

First of all, the number of "points" (as defined in mathematics - the only relevant definition) in a rectangle of any size (non-zero area) is infinite. Why? Because a point does not necessarily have to have integer coordinates - there can be a point at (0,0), (0,0.1), (0,0.001), (0,0.0001) and so on. I think what you mean by points in your question is that all points must have integer coordinates (i.e. lattice points), or alternatively, "cells" in a rectangular grid (like cells on a chess board). Please let me know if I misunderstood your question.
There are 600 rows and 300 columns. This means that there are 600 * 300 = 180,000 different cells. It follows that there are nCr(180,000,2) = 16,199,910,000 unique pairs in the grid. I am assuming you consider the pairs ((1,1),(2,2)) and ((2,2),(1,1)) equivalent. Otherwise, there are 180,000*180,000 = 32,400,000,000 pairs.
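The counts above are easy to double-check. Here is a small Python sketch (the function name is only illustrative) that computes the number of cells, unordered pairs and ordered pairs for a grid of a given size:

```python
from math import comb

def grid_counts(rows, cols):
    """Return (cells, unordered pairs, ordered pairs) for a rows x cols grid."""
    cells = rows * cols
    unordered_pairs = comb(cells, 2)   # ((1,1),(2,2)) and ((2,2),(1,1)) count once
    ordered_pairs = cells * cells      # Cartesian product of the cells with themselves
    return cells, unordered_pairs, ordered_pairs

cells, unordered_pairs, ordered_pairs = grid_counts(600, 300)
print(cells, unordered_pairs, ordered_pairs)
```

Note that none of these is a factorial: the Cartesian product of the set of cells with itself has 180,000^2 elements, not 180,000!.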


Ripleys K not plotting correctly?

I am doing point pattern analysis using the package spatstat and ran Ripley's K (spatstat::Kest) on my points to see if there is any clustering. However, it appears that not all the lines that should appear in the graph (kFem) have plotted. For example, the red line (Ktrans) stops at around x=12 and the green line (Kbord) doesn't appear at all. I would appreciate any insights as to how to interpret this and if there might be a bug.
Here is my study window. It is an irregular shape because I am analyzing a point pattern along a transect line.
And here is a density plot of my point pattern:
It is unlikely (but not impossible) that there is a simple bug in Kest that causes this, since this particular function has been tested intensively by many users. More likely you have an observation window that is irregular and there is a mathematical reason why the various estimates cannot be calculated at all distances. Please add a plot/summary of your point pattern so we have knowledge of the observation window (or even better give access to the observation window).
Furthermore, to manually inspect the estimates of the K-function you can convert the function value (fv) object to a data.frame and print it:
dat <- as.data.frame(kFem)
head(dat, n = 10)
Your window is indeed very irregular, and that is the explanation of why some corrections are not produced at large distances. I guess your transect is only a few metres wide and you are considering distances up to 50m. The border correction can only be calculated for distances up to something like half the width of the transect.
Using Kest implies that you believe that your transect is a subset of a big homogeneous point process (of equal intensity everywhere and with same correlation structure throughout space). If that is true then Kest provides a sensible estimate of the unknown true homogeneous K-function. However, you provide a plot where you have divided the region into sections of high, medium and low intensity which doesn't agree with the assumption of homogeneity. The deviation from the theoretical Poisson line may just be due to inhomogeneous intensity and not actual correlation between points. You should probably only consider distances that are much smaller than 50 (you can set rmax when you call Kest).

Constrained random solution of an underspecified system of linear equations

I've a system of linear equations
dot(A,x) = y
whose solutions have many degrees of freedom: the number of linearly independent equations (E) is less than the dimension of x, i.e. the number of variables (N).
The remaining degrees of freedom constrain the solutions to an (N-E)-dimensional hyperplane of the overall space R^N. Given the (unimportant) characteristics of A, I am always able to write the solutions x (an N x 1 vector) as
x = dot(B,t) + q
where B is an N x (N-E) matrix, t an (N-E) x 1 vector and q an N x 1 vector. This defines the hyperplane of the solutions of my original problem, A x = y, in parametric form.
I need to extract a random solution, with uniform probability over any possible point of the hyperplane, such that all x are positive (we will refer to it as a positive solution). Note that, for the specific problem I am dealing with, the space of positive solutions of x exists and is bounded (that is why the notion of uniform probability is reasonable for this specific case, to clarify as suggested by @Petr's comment). In the beginning, once I was able to write x=Bt+q, I thought it extremely simple. Now I am starting to doubt it.
Proposed Solution
By now I do something like this:
1. For each dimension i in range(N-E) I compute the minimum and maximum value of t[i]: t_min[i] and t_max[i]. The intervals are big enough not to exclude any possible positive solution. They are computed algebraically, always exist, and define a bounded space.
2. I extract N-E uniform random values t[i], each between t_min[i] and t_max[i].
3. I compute x = dot(B,t)+q
4. If all x[j] are positive, accept the solution. If some x[j] is negative, go back to point 2.
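To make the procedure concrete, here is a minimal pure-Python sketch of that rejection loop. The function name, the max_tries cap and the tiny example system (x1 + x2 + x3 = 1, so N = 3 and E = 1) are my own assumptions for illustration; B, q, t_min and t_max are taken as given, as in the description.

```python
import random

def sample_positive_solution(B, q, t_min, t_max, rng=None, max_tries=100000):
    """Rejection sampling: draw t uniformly in the box, keep it if x = B t + q >= 0."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        # Point 2: one uniform value per free dimension.
        t = [rng.uniform(lo, hi) for lo, hi in zip(t_min, t_max)]
        # Point 3: x = dot(B, t) + q, written out row by row.
        x = [sum(b * ti for b, ti in zip(row, t)) + qj for row, qj in zip(B, q)]
        # Point 4: accept only if every component is non-negative.
        if all(xj >= 0 for xj in x):
            return t, x
    raise RuntimeError("no positive solution found within max_tries")

# Hypothetical example: x1 + x2 + x3 = 1, parametrised as x = (t1, t2, 1 - t1 - t2).
B = [[1, 0], [0, 1], [-1, -1]]
q = [0, 0, 1]
t, x = sample_positive_solution(B, q, [0, 0], [1, 1], rng=random.Random(0))
print(t, x)
```

Accepted points are uniform over the positive region because they are uniform over the box and only filtered by membership. The cost is the acceptance rate, which is exactly the issue raised below when N-E grows.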
An example is visible for a two dimensional space N-E in the next figure.
Caption: A problem in N dimension reduced to a N-E=2 space. The yellow diamond is the space of positive solutions of the N-dimensional problem. I randomly sample points in the orange box between (t1(min),t2(min)) and (t1(max),t2(max)) until I find a point in the yellow box.
I think it is a good enough solution, but...
When N-E is big, the space of the hyperparallelogram bounded inside the hypercube can be small. In general it will be small^(N-E), that can be very small. How small?
While for sure an infinite number of positive solutions to the original problem exist, the space of the solutions can have measure zero in the N-E dimensional space. This can happen if all the positive solutions of the original problem have one dimension of x = 0. The borders of a diamond will make contact, transforming the diamond of solutions to a line. Of course you will never randomly pick EXACTLY a line in 2D, let alone in 5D.
An obvious idea would be to further reduce the dimensionality from N-E to a smaller number, i.e. to extract points directly from the aforementioned line instead of the square. The algebra is not easy, but I'm working on it. I'm not positive I will be able to solve it.
Note that choosing first one dimension (for example t1), computing the new limits of t2 conditional to the value of t1 extracted and then extract a possible value of t2 in this boundary, while much faster, does not give a uniform probability among all the possible solutions.
I know that the problem is very specific, but even some general ideas or thoughts would be gladly received. I am doubtful if there is some computing technique to extract directly the solution in the diamond...

Challenge: Take a 48x48 image, find contiguous areas that result in the cheapest Lego solution to create that image! [closed]

Lego produces the X-Large Gray Baseplate, which is a large building plate that is 48 studs wide and 48 studs tall, resulting in a total area of 2304 studs. Being a Lego fanatic, I've modeled a few mosaic-style designs that can be put onto these baseplates and then perhaps hung on walls or in a display (see: Android, Dream Theater, The Galactic Empire, Pokemon).
The Challenge
My challenge is now to get the lowest cost to purchase these designs. Purchasing 2304 individual 1x1 plates can get expensive. Using BrickLink, essentially an eBay for Lego, I can find data to determine what the cheapest parts are for given colors. For example, a 1x4 plate at $0.10 (or $0.025 per stud) would be cheaper than a 6x6 plate at $2.16 (or $0.06 per stud). We can also determine a list of all possible plates that can be used to assemble an image:
2x2 corner!
4x4 corner!
The Problem
For this problem, let's assume that we have a list of all plates, their color(s), and a "weight" or cost for each plate. For the sake of simplicity, we can even remove the corner pieces, but that would be an interesting challenge to tackle. How would you find the cheapest components to create the 48x48 image? How would you find the solution that uses the fewest components (not necessarily the cheapest)? If we were to add corner pieces as allowable pieces, how would you account for them?
We can assume we have some master list that is obtained by querying BrickLink, getting the average price for a given brick in a given color, and adding that as an element in the list. So, there would be no black 16x16 plate simply because it is not made or for sale. The 16x16 Bright Green plate, however, would have a value of $3.74, going by the current available average price.
I hope that my write-up of the problem is succinct enough. It's something I've been thinking about for a few days now, and I'm curious as to what you guys think. I tagged it as "interview-questions" because it's challenging, not because I got it through an interview (though I think it'd be a fun question!).
Here's a link to the 2x2 corner piece and to the 4x4 corner piece. The answer doesn't necessarily need to take into account color, but it should be expandable to cover that scenario. The scenario would be that not all plates are available in all colors, so imagine that we've got an array of elements that identify a plate, its color, and the average cost of that plate (an example is below). Thanks to Benjamin for providing a bounty!
This list would NOT have the entry:
This is because an 8x8 yellow plate does not exist. The list itself is trivial and should only be thought about as providing references for the solution; it does not impact the solution itself.
Karl's approach is basically sound, but could use some more details. It will find the optimal cost solution, but will be too slow for certain inputs. Large open areas especially will have too many possibilities to search through naively.
Anyways, I made a quick implementation in C++ here:
It solves filling in the empty space (periods), with 4 different kinds of bricks:
0: 1x1 cost = 1000
1: 1x2 cost = 150
2: 2x1 cost = 150
3: 1x3 cost = 250
4: 3x1 cost = 250
5: 3x3 cost = 1
..........      1112222221
...#####..      111#####11
..#....#..      11#2222#13
..####.#..      11####1#13
..#....#..      22#1221#13
..........      1221122555
..##..#...  --> 11##11#555
..#.#.#...      11#1#1#555
..#..##...      11#11##221
..........      1122112211
......#..#      122221#11#
...####.#.      555####1#0
...#..##..      555#22##22
...####...      555####444
total cost = 7352
So, the algorithm fills in a given area. It is recursive (DFS):
- find next empty square
- if no empty square, return 0
- for each piece type available
  - if it's legal to place the piece with upper-left corner on the empty square
    - place the piece
    - total cost = cost to place this piece + FindBestCostToFillInRemainingArea()
    - remove the piece
- return the cheapest "total cost" found
Once we figure out the cheapest way to fill a sub-area, we'll cache the result. To very efficiently identify a sub-area, we'll use a 64-bit integer using Zobrist hashing. Warning: hash collisions may cause incorrect results. Once our routine returns, we can reconstruct the optimal solution based on our cached values.
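For readers who want something runnable, here is a small Python sketch of the recursion with caching. It is not the C++ implementation mentioned above. As a simplifying assumption it keys the cache on an exact bitmask of occupied cells instead of a Zobrist hash, so it cannot suffer hash collisions, and it only returns the cheapest cost without reconstructing the layout. The piece list mirrors the table above; the function name is my own.

```python
from functools import lru_cache

# (height, width, cost), mirroring the piece table above
PIECES = [(1, 1, 1000), (1, 2, 150), (2, 1, 150),
          (1, 3, 250), (3, 1, 250), (3, 3, 1)]

def cheapest_fill(grid, pieces=PIECES):
    """Cheapest cost to cover every '.' cell of grid (a list of equal-length strings)."""
    rows, cols = len(grid), len(grid[0])
    start = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != '.':
                start |= 1 << (r * cols + c)   # blocked cells count as already filled
    full = (1 << (rows * cols)) - 1

    @lru_cache(maxsize=None)                   # cache the best cost per sub-area
    def solve(mask):
        if mask == full:
            return 0                           # no empty square left
        idx = 0
        while (mask >> idx) & 1:               # next empty square, row by row
            idx += 1
        r, c = divmod(idx, cols)
        best = float('inf')
        for h, w, cost in pieces:
            if r + h > rows or c + w > cols:
                continue                       # piece would stick out of the board
            piece = 0
            for dr in range(h):
                for dc in range(w):
                    piece |= 1 << ((r + dr) * cols + (c + dc))
            if mask & piece:
                continue                       # overlaps a filled or blocked cell
            best = min(best, cost + solve(mask | piece))
        return best

    return solve(start)

print(cheapest_fill(["..", ".."]))
```

Because the next empty square is always the first one in row-major order, every rectangular piece covering it must have its upper-left corner there, which is what keeps the search from visiting the same layout in different orders.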
In the example, 41936 nodes (recursive calls) are explored (searching for empty square top-to-bottom). However, if we search for empty squares left-to-right, ~900,000 nodes are explored.
For large open areas: I'd suggest finding the most cost-efficient piece and filling in a lot of the open area with that piece as a pre-process step. Another technique is to divide your image into a few regions, and optimize each region separately.
Good luck! I'll be unavailable until March 26th, so hopefully I didn't miss anything!
Step 1: Iterate through all solutions.
Step 2: Find the cheapest solution.
Create pieces inventory
For an array of possible pieces (include single pieces of each color), make at least n duplicates of each piece, where n = max(board#/piece# of each color). Therefore, at most n of that piece can cover all of the entire board's colors by area.
Now we have a huge collection of possible pieces, bounded because it is guaranteed that a subset of this collection will completely fill the board.
Then it becomes a subset problem, which is NP-Complete.
Solving the subset problem
For each unused piece in the set
  For each possible rotation (e.g. for a square only 1, for a rectangle piece 2, for an elbow piece 4)
    For each possible position in the *remaining* open places on board matching the color and rotation of the piece
      - Put down the piece
      - Mark the piece as used from the set
      - Recursively descend on the board (with already some pieces filled)
Obviously, this being an O(2^n) algorithm, pruning the search tree early is of utmost importance. Optimizations must be done early to avoid long running times. n is a very large number; just consider a 48x48 board -- you have 48x48xc (where c = number of colors) just for single pieces alone.
Therefore, 99% of the search tree must be pruned from the first few hundred plies in order for this algorithm to complete in any time. For example, keep a tally of the lowest cost solution found so far, and just stop searching all lower plies and backtrack whenever the current cost plus (the number of empty board positions x lowest average cost for each color) > current lowest cost solution.
For example, further optimize by always favoring the largest pieces (or the lowest average-cost pieces) first, so as to reduce the baseline lowest cost solution as quickly as possible and to prune as many future cases as possible.
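The pruning test described above can be sketched in a few lines of Python. I am assuming here that "lowest average cost for each color" means the cheapest per-stud price available in that color; the function names and the dictionaries are hypothetical.

```python
def lower_bound(cost_so_far, empty_by_color, min_stud_cost_by_color):
    """Optimistic total: every remaining stud is covered at the cheapest per-stud price."""
    remaining = sum(count * min_stud_cost_by_color[color]
                    for color, count in empty_by_color.items())
    return cost_so_far + remaining

def should_prune(cost_so_far, empty_by_color, min_stud_cost_by_color, best_so_far):
    """True if this branch cannot beat the cheapest complete solution found so far."""
    return lower_bound(cost_so_far, empty_by_color, min_stud_cost_by_color) > best_so_far

# Hypothetical state: 10 red and 4 blue studs still empty, 1.00 already spent.
empty = {"red": 10, "blue": 4}
cheapest = {"red": 0.025, "blue": 0.05}
print(should_prune(1.00, empty, cheapest, best_so_far=1.40))
```

The bound is only safe if it never overestimates, so the per-stud price must really be the minimum over all pieces available in that color.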
Finding the cheapest
Calculate cost of each solution, find the cheapest!
This algorithm is generic. It does not assume a piece is of the same color (you can have multi-colored pieces!). It does not assume that a large piece is cheaper than the sum of smaller pieces. It doesn't really assume anything.
If some assumptions can be made, then this information can be used to further prune the search tree as early as possible. For example, when using only single-colored pieces, you can prune large sections of the board (with the wrong colors) and prune large number of pieces in the set (of the wrong color).
Do not try to do 48x48 at once. Try it on something small, say, 8x8, with a reasonably small set of pieces. Then increase number of pieces and board size progressively. I really have no idea how long the program will take -- but would love for somebody to tell me!
First you use flood fill to break up the problem into filling contiguous regions of lego bricks. Then for each of those you can use a dfs, with memoization if you wish. The flood fill is trivial so I will not describe it further.
Make sure to follow a right hand rule while expanding the search tree to not repeat states.
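For completeness, here is one way the flood fill step could look in Python. This is a sketch under my own assumptions: the image is a list of equal-length strings with one character per stud color, and regions are 4-connected.

```python
from collections import deque

def color_regions(image):
    """Split image into 4-connected regions of uniform color.

    Returns a list of (color, cells) where cells is a list of (row, col)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            color = image[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            cells = []
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append((color, cells))
    return regions

for color, cells in color_regions(["RRG", "RGG", "BBG"]):
    print(color, len(cells))
```

Each region can then be handed to the search separately, since a single-color piece never spans two regions.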
My solution will be:
1. Sort all the pieces by stud cost.
2. For each piece in the sorted list, try to place as many as you can in the plate:
   - Raster a 2D image of your design, looking for regions of the image with uniform color, the shape of the current piece, and a free stud for each stud that the piece will use.
   - If the current piece does not exist in the color of the region found, ignore it and continue searching.
   - If the color exists: tag the studs used by that piece and increment a counter for that kind of piece and that color.
   Step 2 will be done once for square pieces, twice for rectangular pieces (once vertical and once horizontal) and 4 times for corner pieces.
3. Repeat step 2 until the plate is full or no more types of pieces are available.
Once you arrive at the end you will have the number of pieces of each kind and each color that you need, with a minimum cost.
If the cost per stud can change by color, then the original sorted list must include not only the type of piece but also the color.

math: scale coordinate system so that certain points get integer coordinates

this is more a mathematical problem. nonetheless i am looking for the algorithm in pseudocode to solve it.
given is a one dimensional coordinate system, with a number of points. the coordinates of the points may be in floating point.
now i am looking for a factor that scales this coordinate system, so that all points land on whole numbers (i.e. integer coordinates)
if i am not mistaken, there should be a solution for this problem as long as the number of points is not infinite.
if i am wrong and there is no analytical solution for this problem, i am interested in an algorithm that approximates the solution as close as possible. (i.e. the coordinates will look like 15.0001)
if you are interested for the concrete problem:
i would like to overcome the well known pixelsnapping problem in adobe flash, which cuts off half-pixels at the border of bitmaps if the whole stage is scaled. i would like to find an ideal scaling factor for the stage which makes my bitmaps being placed on whole (screen-)pixel coordinates.
since i am placing two bitmaps on the stage, the number of points will be 4 in each direction (x,y).
As suggested, you have to convert your floating point numbers to rational ones. Fix a tolerance epsilon, and for each coordinate, find its best rational approximation within epsilon.
An algorithm and definitions are outlined in this section.
Once you have converted all the coordinates into rational numbers, the scaling is given by the least common multiple of the denominators.
Note that this latter number can become quite huge, so you may want to experiment with epsilon so as to control the denominators.
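As a rough illustration of the two steps (rational approximation within epsilon, then the least common multiple of the denominators), here is a Python sketch. It does not use the algorithm from the linked section. As a stand-in it simply tries denominators 1, 2, 3, ... until one fits within epsilon. The function names are my own.

```python
from fractions import Fraction
from math import gcd

def nearest_simple_fraction(x, eps):
    """Fraction with the smallest denominator that lies within eps of x."""
    d = 1
    while True:
        p = round(x * d)                 # best numerator for this denominator
        if abs(x - p / d) <= eps:
            return Fraction(p, d)
        d += 1                           # terminates once d >= 1 / (2 * eps)

def integer_scale(coords, eps):
    """Least common multiple of the denominators of the approximated coordinates."""
    scale = 1
    for x in coords:
        d = nearest_simple_fraction(x, eps).denominator
        scale = scale * d // gcd(scale, d)
    return scale

coords = [0.5, 0.3333, 1.25]
s = integer_scale(coords, eps=1e-3)
print(s, [x * s for x in coords])
```

Shrinking eps makes the fit tighter but lets the denominators, and therefore the scale factor, grow quickly, which is the trade-off mentioned above.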
My own inclination, if I were in your situation, would be to use rational numbers rather than floating point.
And the algorithm you are looking for is finding the lowest common denominator.
A floating point number is an integer, multiplied by a power of two (the power might be negative).
So, find the largest necessary power of two among your inputs, and that gives you a scale factor that will work. The power of two isn't just -1 times the exponent of the float, it's a few more than that (according to where the least significant 1 bit is in the significand).
It's also optimal, because if x times a power of 2 is an odd integer then x in its float representation was already in simplest rational form, there's no smaller integer that you can multiply x by to get an integer.
Obviously if you have a mixture of large and small values among your input, then the resulting integers will tend to be bigger than 64 bit. So there is an analytical solution, but perhaps not a very good one given what you want to do with the results.
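In Python this exact power-of-two scale is easy to get, because float.as_integer_ratio() returns the float as a reduced fraction whose denominator is a power of two. A minimal sketch (the function name is my own, and it assumes finite inputs):

```python
from fractions import Fraction

def exact_power_of_two_scale(values):
    """Smallest power of two that turns every (finite) float in values into an integer."""
    scale = 1
    for v in values:
        _, denominator = float(v).as_integer_ratio()   # denominator is a power of two
        scale = max(scale, denominator)                # largest one is also the lcm
    return scale

print(exact_power_of_two_scale([0.5, 0.75, 2.0]))
print(exact_power_of_two_scale([0.1, 0.25]))
```

The second call shows the practical problem: a "simple" decimal like 0.1 already needs a scale of 2^55, because its stored binary value is not exactly one tenth.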
Note that this approach treats floats as being precise representations, which they are not. You may get more sensible results by representing each float as a rational number with smaller denominator (within some defined tolerance), then taking the lowest common multiple of all the denominators.
The problem there though is the approximation process - if the input float is 0.334[*] then I can't in general be sure whether the person who gave it to me really mean 0.334, or whether it's 1/3 with some inaccuracy. I therefore don't know whether to use a scale factor of 3 and say the scaled result is 1, or use a scale factor of 500 and say the scaled result is 167. And that's just with 1 input, never mind a bunch of them.
With 4 inputs and an allowed final tolerance of 0.0001, you could perhaps find the 10 closest rationals to each input with a certain maximum denominator, then try 10^4 different possibilities and see whether the resulting scale factor gives you any values that are too far from an integer. Brute force seems nasty, but you might at least be able to bound the search a bit as you go. Also "maximum denominator" might be expressed in terms of the primes present in the factorization, rather than just the number, since if you can find a lot of common factors among them then they'll have a smaller lcm and hence smaller deviation from integers after scaling.
[*] Not that 0.334 is an exact float value, but that sort of thing. Decimal examples are easier.
If you are talking about single precision floating point numbers, then the number can be expressed like this according to Wikipedia:
value = (-1)^s * (1 + m / 2^23) * 2^(e - 127)
where s is the sign bit, m is the 23-bit fraction field read as an integer, and e is the 8-bit biased exponent.
From this formula you can deduce that you always get an integer if you multiply by 2^(127+23). (Actually, when e is 0 you have to use another formula for the special range of "subnormal" numbers, so 2^(126+23) is sufficient. See the linked Wikipedia article for details.)
To do this in code you will probably need to do some bit twiddling to extract the factors in the above formula from the bits in the floating point value. And then you will need some kind of support for unlimited size numbers to express the integer result of the scaling (e.g. BigInteger in .NET). Normal primitive types in most languages/platforms are typically limited to much smaller sizes.
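As a sketch of that bit twiddling, here is a Python version (Python integers are already unlimited in size, so no BigInteger type is needed). The function name is my own; the input is first rounded to single precision by struct.pack, and the result is the value multiplied by 2^(126+23) = 2^149.

```python
import struct
from fractions import Fraction

def float32_scaled_integer(x):
    """Return x (rounded to IEEE 754 single precision) times 2**149, as an exact int."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = -1 if bits >> 31 else 1
    e = (bits >> 23) & 0xFF          # 8-bit biased exponent
    m = bits & 0x7FFFFF              # 23-bit fraction field
    if e == 0xFF:
        raise ValueError("infinity and NaN have no integer scaling")
    if e == 0:
        return sign * m              # subnormal: value = m * 2**-149
    # normal: value = (2**23 + m) * 2**(e - 150), so times 2**149 is a shift by e - 1
    return sign * ((m | (1 << 23)) << (e - 1))

n = float32_scaled_integer(0.15625)
print(n, Fraction(n, 2 ** 149))
```

The resulting integers are enormous for ordinary values, which is why this exact approach is mostly of theoretical interest.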
It's really a problem in statistical inference combined with noise reduction. This is the method I'm going to try out soon. I'm assuming you're trying to get a regularly spaced 2-D grid but a similar method could work on a regularly spaced grid of 3 or more dimensions.
First tabulate all the differences and note that (dx,dy) and (-dx,-dy) denote the same displacement, so there's an equivalence relation. Group those differences that are within a pre-assigned threshold (epsilon) of one another. Epsilon should be large enough to capture measurement errors due to random noise or lack of image resolution, but small enough not to accidentally combine clusters.
Sort the clusters by their average size (dr = root(dx^2 + dy^2)).
If the original grid was, indeed, regularly spaced and generated by two independent basis vectors, then the two smallest linearly independent clusters will indicate so. The smallest cluster is the one centered on (0, 0). The next smallest cluster (dx0, dy0) gives the first basis vector, up to a +/- sign (recall that (-dx0, -dy0) denotes the same displacement).
The next smallest clusters may be linearly dependent on this (up to the threshold epsilon) by virtue of being multiples of (dx0, dy0). Find the smallest cluster which is NOT a multiple of (dx0, dy0). Call this (dx1, dy1).
Now you have enough to tag the original vectors. Sort the vectors in increasing lexicographic order, where (x,y) > (x',y') if x > x', or x = x' and y > y'. Take the smallest, (x0,y0), and assign the integers (0, 0) to it. For every other (x,y), find the decomposition (x,y) = (x0,y0) + M0(x,y) (dx0, dy0) + M1(x,y) (dx1,dy1) and assign it the integers (m0(x,y),m1(x,y)) = (round(M0), round(M1)).
Now do a least-squares fit of the vectors against their integer labels, using the equations (x,y) = (ux,uy) + m0(x,y) (u0x,u0y) + m1(x,y) (u1x,u1y)
to find (ux,uy), (u0x,u0y) and (u1x,u1y). This identifies the grid.
Test this match to determine whether or not all the points are within a given threshold of this fit (maybe using the same threshold epsilon for this purpose).
The 1-D version of this same routine should also work in 1 dimension on a spectrograph to identify the fundamental frequency in a voice print. Only in this case, the assumed value for ux (which replaces (ux,uy)) is just 0 and one is only looking for a fit to the homogeneous equation x = m0(x) u0x.
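Here is a minimal Python sketch of just the last step in the 1-D homogeneous case, x = m0(x) u0x. It assumes a rough spacing estimate is already available (for example from the smallest non-zero cluster of differences); the function name and sample values are hypothetical.

```python
def refine_spacing(xs, rough_spacing):
    """Label each x with the nearest integer multiple, then least-squares fit x = m * u."""
    labels = [round(x / rough_spacing) for x in xs]
    # Minimising sum((x - m*u)**2) over u gives u = sum(m*x) / sum(m*m).
    u = sum(m * x for m, x in zip(labels, xs)) / sum(m * m for m in labels)
    return u, labels

u, labels = refine_spacing([2.02, 3.97, 6.01, 10.0], rough_spacing=2.0)
print(u, labels)
```

The residuals x - m*u can then be compared against epsilon to decide whether the fit is acceptable.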

Averaging a set of points on a Google Map into a smaller set

I'm displaying a small Google map on a web page using the Google Maps Static API.
I have a set of 15 co-ordinates, which I'd like to represent as points on the map.
Due to the map being fairly small (184 x 90 pixels) and the upper limit of 2000 characters on a Google Maps URL, I can't represent every point on the map.
So instead I'd like to generate a small list of co-ordinates that represents an average of the big list.
So instead of having 15 sets, I'd end up with 5 sets, whose positions approximate the positions of the 15. Say there are 3 points that are in closer proximity to each other than to any other point on the map; those points will be collapsed into 1 point.
So I guess I'm looking for an algorithm that can do this.
Not asking anyone to spell out every step, but perhaps point me in the direction of a mathematical principle or general-purpose function for this kind of thing?
I'm sure a similar function is used in, say, graphics software, when pixellating an image.
(If I solve this I'll be sure to post my results.)
I recommend K-means clustering when you need to cluster N objects into a known number K < N of clusters, which seems to be your case. Note that one cluster may end up with a single outlier point and another with say 5 points very close to each other: that's OK, it will look closer to your original set than if you forced exactly 3 points into every cluster!-)
If you are searching for such functions/classes, have a look at MarkerClusterer and MarkerManager utility classes. MarkerClusterer closely matches the described functionality, as seen in this demo.
In general I think the area you need to search around in is "Vector Quantization". I've got an old book titled Vector Quantization and Signal Compression by Allen Gersho and Robert M. Gray which provides a bunch of examples.
From memory, the Lloyd Iteration was a good algorithm for this sort of thing. It can take the input set and reduce it to a fixed sized set of points. Basically, uniformly or randomly distribute your points around the space. Map each of your inputs to the nearest quantized point. Then compute the error (e.g. sum of distances or Root-Mean-Squared). Then, for each output point, set it to the center of the set that maps to it. This will move the point and possibly even change the set that maps to it. Perform this iteratively until no changes are detected from one iteration to the next.
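A small pure-Python sketch of the iteration described above (this is essentially the K-means procedure as well). A few things here are my own assumptions: the output points are initialised from randomly chosen inputs, coordinates are treated as plain planar (x, y) pairs, which is reasonable for points that are close together on a map, and the loop stops when the mapping no longer changes rather than tracking the error explicitly.

```python
import random

def lloyd(points, k, seed=0, max_iter=100):
    """Reduce points (a list of (x, y) tuples) to k representative points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # start from k of the input points
    assignment = None
    for _ in range(max_iter):
        # Map each input to its nearest output point (squared distance suffices).
        new_assignment = [
            min(range(k), key=lambda j: (p[0] - centers[j][0]) ** 2
                                        + (p[1] - centers[j][1]) ** 2)
            for p in points
        ]
        if new_assignment == assignment:
            break                              # no change since the last iteration
        assignment = new_assignment
        # Move each output point to the center of the inputs mapped to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                        # an empty set keeps its old center
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centers, assignment

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = lloyd(points, k=2)
print(sorted(centers))
```

For the 15-to-5 case in the question you would call it with k=5 and put the returned centers in the map URL.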
Hope this helps.
