Which algorithm will be required to do this? - algorithm

I have data of this form:
for x=1, y is one of {1,4,6,7,9,18,16,19}
for x=2, y is one of {1,5,7,4}
for x=3, y is one of {2,6,4,8,2}
....
for x=100, y is one of {2,7,89,4,5}
Only one of the values in each set is the correct value, the rest is random noise.
I know that the correct values describe a sinusoid function whose parameters are unknown. How can I find the correct combination of values, one from each set?
I am looking something like "travelling salesman"combinatorial optimization algorithm

You're trying to do curve fitting, for which there are several algorithms depending on the type of curve you want to fit your curve to (linear, polynomial, etc.). I have no idea whether there is a specific algorithm for sinusoidal curves (Fourier approximations), but my first idea would be to use a polynomial fitting algorithm with a polynomial approximation of the sine.
I wonder whether you need to do this in the course of another larger program, or whether you are trying to do this task on its own. If so, then you'd be much better off using a statistical package, my preferred one being R. It allows you to import your data and fit curves and draw graphs in just a few lines, and you could also use R in batch-mode to call it from a script or even a program (this is what I tend to do).

It depends on what you mean by "exactly", and what you know beforehand. If you know the frequency w, and that the sinusoid is unbiased, you have an equation
a cos(w * x) + b sin(w * x)
with two (x,y) points at different x values you can find a and b, and then check the generated curve against all the other points. Choose the two x values with the smallest number of y observations and try it for all the y's. If there is a bias, i.e. your equation is
a cos(w * x) + b sin(w * x) + c
You need to look at three x values.
If you do not know the frequency, you can try the same technique, unfortunately the solutions may not be unique, there may be more than one w that fits.
Edit As I understand your problem, you have a real y value for each x and a bunch of incorrect ones. You want to find the real values. The best way to do this is to fit curves through a small number of points and check to see if the curve fits some y value in the other sets.
If not all the x values have valid y values then the same technique applies, but you need to look at a much larger set of pairs, triples or quadruples (essentially every pair, triple, or quad of points with different y values)
If your problem is something else, and I suspect it is, please specify it.
Define sinusoid. Most people take that to mean a function of the form a cos(w * x) + b sin(w * x) + c. If you mean something different, specify it.
2 Specify exactly what success looks like. An example with say 10 points instead of 100 would be nice.
It is extremely unclear what this has to do with combinatorial optimization.

Sinusoidal equations are so general that if you take any random value of all y's these values can be fitted in sinusoidal function unless you give conditions eg. Frequency<100 or all parameters are integers,its not possible to diffrentiate noise and data theorotically so work on finding such conditions from your data source/experiment first.

By sinusoidal, do you mean a function that is increasing for n steps, then decreasing for n steps, etc.? If so, you you can model your data as a sequence of nodes connected by up-links and down-links. For each node (possible value of y), record the length and end-value of chains of only ascending or descending links (there will be multiple chain per node). Then you scan for consecutive runs of equal length and opposite direction, modulo some initial offset.

Related

Parametric Scoring Function or Algorithm

I'm trying to come up with a way to arrive at a "score" based on an integer number of "points" that is adjustable using a small number (3-5?) of parameters. Preferably it would be simple enough to reasonably enter as a function/calculation in a spreadsheet for tuning the parameters by the "designer" (not a programmer or mathematician). The first point has the most value and eventually additional points have a fixed or nearly fixed value. The transition from the initial slope of point value to final slope would be smooth. See example shapes below.
Points values are always positive integers (0 pts = 0 score)
At some point, curve is linear (or nearly), all additional points have fixed value
Preferably, parameters are understandable to a lay person, e.g.: "smoothness of the curve", "value of first point", "place where the additional value of points is fixed", etc
For parameters, an example of something ideal would be:
Value of first point: 10
Value of point #: 3 is: 5
Minimum value of additional points: 0.75
Exact shape of curve not too important as long as the corner can be more smooth or more sharp.
This is not for a game but more of a rating system with multiple components (several of which might use this kind of scale) will be combined.
This seems like a non-traditional kind of question for SO/SE. I've done mostly financial software in my career, I'm hoping there some domain wisdom for this kind of thing I can tap into.
Implementation of Prune's Solution:
Google Sheet
Parameters:
Initial value (a)
Second value (b)
Minimum value (z)
Your decay ratio is b/a. It's simple from here: iterate through your values, applying the decay at each step, until you "peg" at the minimum:
x[n] = max( z, a * (b/a)^n )
// Take the larger of the computed "decayed" value,
// and the specified minimum.
The sequence x is your values list.
You can also truncate intermediate results if you want integers up to a certain point. Just apply the floor function to each computed value, but still allow z to override that if it gets too small.
Is that good enough? I know there's a discontinuity in the derivative function, which will be noticeable if the minimum and decay aren't pleasantly aligned. You can adjust this with a relative decay, translating the exponential decay curve from y = 0 to z.
base = z
diff = a-z
ratio = (b-z) / diff
x[n] = z + diff * ratio^n
In this case, you don't need the max function, since the decay has a natural asymptote of 0.

Querying large amount of multidimensional points in R^N

I'm looking at listing/counting the number of integer points in R^N (in the sense of Euclidean space), within certain geometric shapes, such as circles and ellipses, subject to various conditions, for small N. By this I mean that N < 5, and the conditions are polynomial inequalities.
As a concrete example, take R^2. One of the queries I might like to run is "How many integer points are there in an ellipse (parameterised by x = 4 cos(theta), y = 3 sin(theta) ), such that y * x^2 - x * y = 4?"
I could implement this in Haskell like this:
ghci> let latticePoints = [(x,y) | x <- [-4..4], y <-[-3..3], 9*x^2 + 16*y^2 <= 144, y*x^2 - x*y == 4]
and then I would have:
ghci> latticePoints
[(-1,2),(2,2)]
Which indeed answers my question.
Of course, this is a very naive implementation, but it demonstrates what I'm trying to achieve. (I'm also only using Haskell here as I feel it most directly expresses the underlying mathematical ideas.)
Now, if I had something like "In R^5, how many integer points are there in a 4-sphere of radius 1,000,000, satisfying x^3 - y + z = 20?", I might try something like this:
ghci> :{
Prelude| let latticePoints2 = [(x,y,z,w,v) | x <-[-1000..1000], y <- [-1000..1000],
Prelude| z <- [-1000..1000], w <- [-1000..1000], v <-[1000..1000],
Prelude| x^2 + y^2 + z^2 + w^2 + v^2 <= 1000000, x^3 - y + z == 20]
Prelude| :}
so if I now type:
ghci> latticePoints2
Not much will happen...
I imagine the issue is because it's effectively looping through 2000^5 (32 quadrillion!) points, and it's clearly unreasonably of me to expect my computer to deal with that. I can't imagine doing a similar implementation in Python or C would help matters much either.
So if I want to tackle a large number of points in such a way, what would be my best bet in terms of general algorithms or data structures? I saw in another thread (Count number of points inside a circle fast), someone mention quadtrees as well as K-D trees, but I wouldn't know how to implement those, nor how to appropriately query one once it was implemented.
I'm aware some of these numbers are quite large, but the biggest circles, ellipses, etc I'd be dealing with are of radius 10^12 (one trillion), and I certainly wouldn't need to deal with R^N with N > 5. If the above is NOT possible, I'd be interested to know what sort of numbers WOULD be feasible?
There is no general way to solve this problem. The problem of finding integer solutions to algebraic equations (equations of this sort are called Diophantine equations) is known to be undecidable. Apparently, you can write equations of this sort such that solving the equations ends up being equivalent to deciding whether a given Turing machine will halt on a given input.
In the examples you've listed, you've always constrained the points to be on some well-behaved shape, like an ellipse or a sphere. While this particular class of problem is definitely decidable, I'm skeptical that you can efficiently solve these problems for more complex curves. I suspect that it would be possible to construct short formulas that describe curves that are mostly empty but have a huge bounding box.
If you happen to know more about the structure of the problems you're trying to solve - for example, if you're always dealing with spheres or ellipses - then you may be able to find fast algorithms for this problem. In general, though, I don't think you'll be able to do much better than brute force. I'm willing to admit that (and in fact, hopeful that) someone will prove me wrong about this, though.
The idea behind the kd-tree method is that you recursive subdivide the search box and try to rule out whole boxes at a time. Given the current box, use some method that either (a) declares that all points in the box match the predicate (b) declares that no points in the box match the predicate (c) makes no declaration (one possibility, which may be particularly convenient in Haskell: interval arithmetic). On (c), cut the box in half (say along the longest dimension) and recursively count in the halves. Obviously the method can choose (c) all the time, which devolves to brute force; the goal here is to do (a) or (b) as much as possible.
The performance of this method is very dependent on how it's instantiated. Try it -- it shouldn't be more than a couple dozen lines of code.
For nicely connected region, assuming your shape is significantly smaller than your containing search space, and given a seed point, you could do a growth/building algorithm:
Given a seed point:
Push seed point into test-queue
while test-queue has items:
Pop item from test-queue
If item tests to be within region (eg using a callback function):
Add item to inside-set
for each neighbour point (generated on the fly):
if neighbour not in outside-set and neighbour not in inside-set:
Add neighbour to test-queue
else:
Add item to outside-set
return inside-set
The trick is to find an initial seed point that is inside the function.
Make sure your set implementation gives O(1) duplicate checking. This method will eventually break down with large numbers of dimensions as the surface area exceeds the volume, but for 5 dimensions should be fine.

I have a function f(w,x,y,z) and a target value A, how can I discover values for w,x,y,z that produce A?

So I have a function that takes four numerical arguments and produces a numerical argument.
f(w,x,y,z) --> A
If I have the function f and a target result A, is there an iterative method for discovering parameters w,x,y,z that produce a given number A?
If it helps, my function f is a quintic bezier where most of the parameters are determined. I have isolated just these four that are required to fit the value A.
Q(t)=R(1−t)^5+5S(1−t)^4*t+10T(1−t)^3*t^2+10U(1−t)^2*t^3+5V(1−t)t^4+Wt^5
R,S,T,U,V,W are vectors where R and W are known, I have isolated only a single element in each of S,T,U,V that vary as parameters.
The set of solutions of the equation f(w,x,y,z)=A (where all of w, x, y, z and A are scalars) is, in general, a 3 dimensional manifold (surface) in the 4-dimensional space R^4 of (w,x,y,z). I.e., the solution is massively non-unique.
Now, if f is simple enough for you to compute its derivative, you can use the Newton's method to find a root: the gradient is the direction of the fastest change of the function, so you go there.
Specifically, let X_0=(w_0,x_0,y_0,z_0) be your initial approximation of a solution and let G=f'(X_0) be the gradient at X_0.
Then f(X_0+h)=f(X_0)+(G,h)+O(|h|^2) (where (a,b) is the dot product).
Let h=a*G, and solve A=f(X_0)+a*|G|^2 to get a=(A-f(X_0))/|G|^2 (if G=0, change X_0) and X_1=X_0+a*G. If f(X_1) is close enough to A, you are done, otherwise proceed to compute f'(X_1) &c.
If you cannot compute f', you can play with many other methods.
If you can impose 3 (or more) additional equations that you know (or suspect) must be true for your 4-variable solution that gives target value A, then you can try applying Newton's method for solving a system of k equations with k unknowns. Otherwise, without a deeper understanding of the structure of the function you are trying to make equal to A, the only general type of technique I'm aware of that's easy to implement is to simply define the error function as g(w,x,y,z) = |f(w,x,y,z) - A| and search for a minimum of g. Typically the "minimum" found will be a local minimum, so it may require many restarts of the minimization problem with different starting values for your parameters to actually find a solution that gives a local minimum you want of g = 0. This is very easy to implement and try in a few lines e.g. in MATLAB using fminsearch

"Squares" Logic Riddle Solution in Prolog

"Squares"
Input:
Board size m × n (m, n ∈ {1,...,32}), list of triples (i, j, k), where i ∈ {1,...,m}, j ∈ {1,...,n}, k ∈ {0,...,(n-2)·(m-2)}) describing fields with numbers.
Output:
List of triples (i, j, d) showing solved riddle. Triple (i, j, d) describes square with opposite vertices at coordinates (i, j) and (i+d, j+d).
Example:
Input:
7.
7. [(3,3,0), (3,5,0), (4,4,1), (5,1,0), (6,6,3)].
Output:
[(1,1,2), (1,5,2), (2,2,4), (5,1,2), (4,4,3)]
Image:
Explanation:
I have to find placement for x squares(x = fields with numbers). On the
circuit of the each square, exactly at one of his corners should be
only one digit equal to amount of digits inside the square. Sides of
squares can't cover each other, same as corners. Square lines are
"filling fields" so (0,0,1) square fills 4 fields and have 0 fields inside.
I need a little help coding solution to this riddle in Prolog. Could someone direct me in the right direction? What predicates, rules I should use.
Since Naimads life is worth saving(!), here is some more directive help.
Programmers tend to work out their projects either in bottomp-up (starting from "lower" levels of functionality, building up to a complete application) or top-down (starting with a high level "shell" of functionality that provides input/output, but with the solution processing stubbed in for later refinement).
Either way I can recommend an approach for Prolog progrmming that has been championed by the Ruby programming community: Test Driven Development.
Let's pick a couple of functional requirement, one high and one low, and discuss how this bit of philosophy might help.
The high level predicate in Prolog is one which takes all the input and output arguments and semantically defines the problem. Note that we can write down such a predicate without giving any care as to how the solution process will get implemented:
/* squaresRiddle/4 takes inputs of Height and Width of a "board" with
a list of (nonnegative) integers located at cells of the board, and
produces a corresponding list of squares having one corner at each
of those locations "boxing in" exactly the specified counts of them
in their interiors. No edges can overlap nor corners coincide. */
squaresRiddle(Height,Width,BoxedCountList, OutputSquaresList).
So far we have just framed the problem. The useful progress is given by the comment (clearly defining the inputs and outputs at a high level), and the code at this point just guarantees "success" with whatever arguments are passed. So a "test" of this code:
?= squaresRiddle(7,7,ListIn,ListOut).
won't produce anything except free variables.
Next let's give some thought to how the input list and output list ought to be represented. In the Question it is proposed to use triples of integers as entries of these lists. I would recommend instead using a named functor, because the semantics of those input triples (giving an x,y coordinate of a cell with a count) are subtly different from that of the output triples (giving an x,y coordinate of upper left corner and an "extent" (height&width) of the square). Note in particular that an output square may be described by a different corner than the one in the corresponding input item, although the one in the input item needs to be one of the four corners of the output square.
So it tends to make the code more readable if these constructs are distinguished by simple functor names, e.g. BoxedCountList = [box(3,3,0),box(3,5,0),box(4,4,1),box(5,1,0),box(6,6,3)] vs. OutputSquaresList = [sqr(1,1,2), sqr(1,5,2), sqr(2,2,4), sqr(5,1,2), sqr(4,4,3)].
Let's give a little thought to how the search for solutions will proceed. The output items are in 1-to-1 correspondence with the input items, so the lists will have equal length in the end. However choosing the kth output item depends not only on the kth input item, but on all the input items (since we count how many are in the interior of the output square) and on the preceding output items (since we disallow corners that meet and edges that overlap).
There are several ways to manage the flow of decision making consistent with this requirement, and it seems to be the crux of this exercise to pick one approach and make it work. If you are familiar with difference lists or accumulators, you have a head start on ways to do it.
Now let's switch to discussing some lower level functionality. Given an input box there are potentially a number of output sqr that could correspond to it. Given a positive "extent" for the output, the X,Y coordinates of the input can be met with any of the four corners of the output square. While more checking is required to verify the solution, this aspect seems a good place to start testing:
/* check that coordinates X,Y are indeed a corner of candidate square */
hasCorner(sqr(SX,SY,Extent),X,Y) :-
(SX is X ; SX is X + Extent),
(SY is Y ; SY is Y + Extent).
How would we make use of this test? Well we could generate all possible squares (possibly limiting ourselves to those that fit inside the height and width of our "board"), and then check whether any have the required corner. This might be fairly inefficient. A better way (as far as efficiency goes) would use the corner to generate possible squares (by extending from the corner by Extent in any of four directions) subject to the parameters of board size. Implementation of this "optimization" is left as an exercise for the reader.

Biasing random number generator to some integer n with deviation b

Given an integer range R = [a, b] (where a >=0 and b <= 100), a bias integer n in R, and some deviation b, what formula can I use to skew a random number generator towards n?
So for example if I had the numbers 1 through 10 inclusively and I don't specify a bias number, then I should in theory have equal chances of randomly drawing one of them.
But if I do give a specific bias number (say, 3), then the number generator should be drawing 3 a more frequently than the other numbers.
And if I specify a deviation of say 2 in addition to the bias number, then the number generator should be drawing from 1 through 5 a more frequently than 6 through 10.
What algorithm can I use to achieve this?
I'm using Ruby if it makes it any easier/harder.
i think the simplest route is to sample from a normal (aka gaussian) distribution with the properties you want, and then transform the result:
generate a normal value with given mean and sd
round to nearest integer
if outside given range (normal can generate values over the entire range from -infinity to -infinity), discard and repeat
if you need to generate a normal from a uniform the simplest transform is "box-muller".
there are some details you may need to worry about. in particular, box muller is limited in range (it doesn't generate extremely unlikely values, ever). so if you give a very narrow range then you will never get the full range of values. other transforms are not as limited - i'd suggest using whatever ruby provides (look for "normal" or "gaussian").
also, be careful to round the value. 2.6 to 3.4 should all become 3, for example. if you simply discard the decimal (so 3.0 to 3.999 become 3) you will be biased.
if you're really concerned with efficiency, and don't want to discard values, you can simply invent something. one way to cheat is to mix a uniform variate with the bias value (so 9/10 times generate the uniform, 1/10 times return 3, say). in some cases, where you only care about average of the sample, that can be sufficient.
For the first part "But if I do give a specific bias number (say, 3), then the number generator should be drawing 3 a more frequently than the other numbers.", a very easy solution:
def randBias(a,b,biasedNum=None, bias=0):
x = random.randint(a, b+bias)
if x<= b:
return x
else:
return biasedNum
For the second part, I would say it depends on the task. In a case where you need to generate a billion random numbers from the same distribution, I would calculate the probability of the numbers explicitly and use weighted random number generator (see Random weighted choice )
If you want an unimodal distribution (where the bias is just concentrated in one particular value of your range of number, for example, as you state 3), then the answer provided by andrew cooke is good---mostly because it allows you to fine tune the deviation very accurately.
If however you wish to make several biases---for instance you want a trimodal distribution, with the numbers a, (a+b)/2 and b more frequently than others, than you would do well to implement weighted random selection.
A simple algorithm for this was given in a recent question on StackOverflow; it's complexity is linear. Using such an algorithm, you would simply maintain a list, initial containing {a, a+1, a+2,..., b-1, b} (so of size b-a+1), and when you want to add a bias towards X, you would several copies of X to the list---depending on how much you want to bias. Then you pick a random item from the list.
If you want something more efficient, the most efficient method is called the "Alias method" which was implemented very clearly in Python by Denis Bzowy; once your array has been preprocessed, it runs in constant time (but that means that you can't update the biases anymore once you've done the preprocessing---or you would to reprocess the table).
The downside with both techniques is that unlike with the Gaussian distribution, biasing towards X, will not bias also somewhat towards X-1 and X+1. To simulate this effect you would have to do something such as
def addBias(x, L):
L = concatList(L, [x, x, x, x, x])
L = concatList(L, [x+2])
L = concatList(L, [x+1, x+1])
L = concatList(L, [x-1,x-1,x-1])
L = concatList(L, [x-2])

Resources