How to combine various measures into a single measure - algorithm

I have several measures:
Profit and loss (PNL).
Win to loss ratio (W2L).
Avg gain to drawdown ratio (AG2AD).
Max gain to maximum drawdown ratio (MG2MD).
Number of consecutive gains to consecutive losses ratio (NCG2NCL).
If there were only 3 measures (A, B, C), then I could represent the "total" measure as a magnitude of a 3D vector:
R = SQRT(A^2 + B^2 + C^2)
If I want to combine those 5 measures into a single value, would it make sense to represent them as the magnitude of a 5D vector? Is there a way to put more "weight" on certain measures, such as the PNL? Is there a better way to combine them?
Update:
I'm trying to write a function (in C#) that takes in 5 measures and represents them in a linear manner so I can collapse the multidimensional values into a single linear value. The point of this is that it will allow me to only use one variable (save memory) and it will provide a fast method of comparison between two sets of measures. Almost like building a hash value, but each hash can be used for comparison (i.e. >, <, ==).
The statistical significance of the values is the same as the order they're listed: PNL is the most significant while NCG2NCL is the least significant.

If I want to combine those 5 measures into a single value, would it make sense to represent them as the magnitude of a 5D vector?
Absolutely, if the result suits you.
Is there a way to put more "weight" on certain measures, such as the PNL?
You can introduce constant weights
SQRT(wa*A^2 + wb*B^2 + wc*C^2)
Is there a better way to combine them?
That depends on your requirements. In particular, there's nothing wrong with using a simple sum, |A| + |B| + |C|, which treats 'average' profiles more favourably. I.e., with your formula (0, 0, 9) gives a much better total than (3, 3, 3), while with the simple sum they would be equivalent.
Generally speaking Oli is right: you'll have to make the decision yourself, no algorithm book can evaluate the requirements for you.
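For illustration, here is a minimal C# sketch of the two combinations discussed above (weighted magnitude and weighted absolute sum). The weight values are arbitrary placeholders reflecting the stated significance order (PNL largest), not tuned recommendations:

```csharp
using System;

static class MeasureCombiner
{
    // Weighted Euclidean magnitude: SQRT(w1*A^2 + w2*B^2 + ...).
    // Larger weights correspond to more significant measures (PNL first).
    public static double WeightedMagnitude(
        double pnl, double w2l, double ag2ad, double mg2md, double ncg2ncl,
        double wPnl = 5, double wW2l = 4, double wAg2ad = 3, double wMg2md = 2, double wNcg2ncl = 1)
    {
        return Math.Sqrt(
            wPnl * pnl * pnl +
            wW2l * w2l * w2l +
            wAg2ad * ag2ad * ag2ad +
            wMg2md * mg2md * mg2md +
            wNcg2ncl * ncg2ncl * ncg2ncl);
    }

    // Weighted absolute sum, as in |A| + |B| + |C| above.
    public static double WeightedSum(
        double pnl, double w2l, double ag2ad, double mg2md, double ncg2ncl,
        double wPnl = 5, double wW2l = 4, double wAg2ad = 3, double wMg2md = 2, double wNcg2ncl = 1)
    {
        return wPnl * Math.Abs(pnl) + wW2l * Math.Abs(w2l) + wAg2ad * Math.Abs(ag2ad)
             + wMg2md * Math.Abs(mg2md) + wNcg2ncl * Math.Abs(ncg2ncl);
    }
}
```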

Combining measures into a single value is risky at best. However you do it, you lose information. If I have 3 oranges, an apple and a couple of slices of bread, I can combine them in various ways:
Sum (3 + 1 + 2 ) = 6
Weighted sum ( .5 * 3 + 2 * 1 + 1.5 * 2) = 6.5
SQRT( 3 ^ 2 + 1 ^ 2 + 2 ^ 2) = SQRT ( 14 ) ~= 3.7
SQRT( 3 ^ 2 + 2 * 1 ^ 2 + 2 ^ 2) = SQRT ( 15 ) ~= 3.9
and on and on.
Whichever result I get is less meaningful than the original counts. Throw in a steak and a glass of water and the value becomes even less meaningful. The result is always some vague measure of servings of food.
You need to figure out how to convert your various values into values with equivalent scales (linear or log) and equivalent value (1 X ~= 1 Y ~= 1 Z). At that point a simple sum or product may be sufficient. In your case, it appears you are trying to combine various measures of financial return. Some of the measures you are using are not highly comparable.
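One hedged sketch of that "equivalent scales" idea in C#: standardise each measure against its own history (z-score) before summing, so one unit of X is roughly comparable to one unit of Y. The history arrays and method names here are assumptions for illustration only:

```csharp
using System;
using System.Linq;

static class MeasureScaling
{
    // Standardise a value against its own historical distribution.
    public static double ZScore(double value, double[] history)
    {
        double mean = history.Average();
        double std = Math.Sqrt(history.Average(v => (v - mean) * (v - mean)));
        return std == 0 ? 0 : (value - mean) / std;
    }

    // Sum of standardised measures; each value is paired with its history.
    public static double CombinedScore((double Value, double[] History)[] measures)
        => measures.Sum(m => ZScore(m.Value, m.History));
}
```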

As others have noted, there are an infinite number of ways of combining values. You've tagged the question machine-learning and artificial-intelligence, which suggests you might want to find the optimum way of combining them? E.g. come up with a "goodness" metric and try to model it from the others. There is then a range of machine learning algorithms to choose from; e.g. a Bayesian model would be a good start: fast, and it generally performs well, if not necessarily the best.

I would suggest implementing this using principal component analysis. That will give you the weights you need for your coefficients. You can either do this via a stat package or use a packaged C# function.
-Ralph Winters
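As a rough illustration of the PCA suggestion (not Ralph's code), one could estimate the covariance of the five measures from historical rows and use the first principal component's loadings as the combination weights. The power-iteration routine below is a toy stand-in for a proper stats package:

```csharp
using System;
using System.Linq;

static class PcaWeights
{
    // rows: historical observations, each an array of the 5 measures.
    // Returns the loadings of the first principal component (unit length).
    public static double[] FirstComponent(double[][] rows, int iterations = 200)
    {
        int d = rows[0].Length;

        // Column means.
        var mean = new double[d];
        foreach (var row in rows)
            for (int j = 0; j < d; j++) mean[j] += row[j] / rows.Length;

        // Sample covariance matrix.
        var cov = new double[d, d];
        foreach (var row in rows)
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++)
                    cov[i, j] += (row[i] - mean[i]) * (row[j] - mean[j]) / (rows.Length - 1);

        // Power iteration for the dominant eigenvector.
        var v = Enumerable.Repeat(1.0 / Math.Sqrt(d), d).ToArray();
        for (int it = 0; it < iterations; it++)
        {
            var next = new double[d];
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++)
                    next[i] += cov[i, j] * v[j];
            double norm = Math.Sqrt(next.Sum(x => x * x));
            for (int i = 0; i < d; i++) v[i] = next[i] / norm;
        }
        return v;   // combined score = sum_i v[i] * measure[i]
    }
}
```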

Related

Top k precision

I have a database of documents in which I perform searches. For each and every search there are n positives. Now, if I evaluate the performance of the search by precision#k and recall#k, things work out just fine for the latter:
recall#k = true positives / positives = true positives / n
The number of true positives is in the range [0, n], so recall#k is in the range [0, 1] - perfect.
Things get weird concerning precision#k, however. If I calculate
precision#k = tp / (tp + fp) = tp / k
precision#k is in the range [0, n/k], which doesn't make too much sense to me. Think of the edge case n=1, for example. One cannot increase tp because there are no more than n positives, and one cannot decrease k either because, well, it's called precision#k, isn't it?
What am I getting wrong?
An example of what I'm talking about can be found in [1], figure 8b. What you can see there is a precision-recall curve for the top 1..200 query results. Even though there are fewer than 200 positives in the database, the precision is quite high.
[1] https://www.computer.org/csdl/pds/api/csdl/proceedings/download-article/19skfc3ZfKo/pdf
Since precision#k is computed as #num_relevant/k, its max could be 1 (which happens if all the k top-ranked documents in your retrieved list are relevant).
Your argument is correct in the sense that if the #relevant_docs is less than k then you're being wrongly penalized by the P#k metric because in that case even with a perfect retrieval you don't score 1 on the metric.
A standard solution is thus to take both into account and compute precision values not at arbitrary values of k but rather at recall points, i.e. at those positions in your ranked list where a relevant document is retrieved. You then divide the sum by the number of relevant documents. This measure is called average precision (AP); averaged over a set of queries it is known as mean average precision (MAP). An example of the computation follows.
Let's say that you retrieved 10 documents out of which 2 are relevant, at ranks 2 and 5 (and there are 3 relevant docs in total - one of which is not retrieved).
You compute precision#k at the recall points (values of k = 2 and 5).
This gives:
1/2 (at position 2, one is relevant out of 2) +
2/5 (at position 5, two are relevant out of 5)
and then you divide this number by 3 (total number of known rel docs). The last step favours systems that achieve high recall whereas the cut-off point based precisions favour systems that retrieve docs towards top ranks.
Note that a system A which retrieves the relevant docs at better ranks and retrieves a higher number of rel docs would score better than a system which fails to cater to either or both the cases.
Also note that you'll score a perfect 1 on this metric if you retrieve the 3 rel docs at the top 3 ranks out of 10 that you retrieved in total (check this), which addresses your concern that motivated this question.
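A minimal C# sketch of the computation described above, assuming the retrieved list is given as a ranked array of relevance flags plus the total number of relevant documents:

```csharp
using System;
using System.Collections.Generic;

static class RankingMetrics
{
    // Average precision: sum precision@k at each rank k where a relevant
    // document appears, then divide by the total number of relevant docs.
    public static double AveragePrecision(IList<bool> rankedRelevance, int totalRelevant)
    {
        double sum = 0.0;
        int hits = 0;
        for (int i = 0; i < rankedRelevance.Count; i++)
        {
            if (rankedRelevance[i])
            {
                hits++;
                sum += (double)hits / (i + 1);   // precision at this recall point
            }
        }
        return totalRelevant == 0 ? 0.0 : sum / totalRelevant;
    }
}

// Example from the answer: relevant docs at ranks 2 and 5, three relevant in total.
// Result: (1/2 + 2/5) / 3 = 0.3
// var ap = RankingMetrics.AveragePrecision(
//     new[] { false, true, false, false, true, false, false, false, false, false }, 3);
```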

Guidance on Algorithmic Thinking (4 fours equation)

I recently saw a logic/math problem called 4 Fours where you need to use 4 fours and a range of operators to create equations that equal each of the integers 0 to N.
How would you go about writing an elegant algorithm to come up with say the first 100...
I started by creating base calculations like 4-4, 4+4, 4x4, 4/4, 4!, Sqrt 4 and made these values integers.
However, I realized that this was going to be a brute force method, testing the combinations to see if they equal 0, then 1, then 2, then 3, etc...
I then thought of finding all possible combinations of the above values, checking that the result was less than 100, filling an array and then sorting it... again inefficient because it may find 1000s of numbers over 100.
Any help on how to approach a problem like this would be helpful...not actual code...but how to think through this problem
Thanks!!
This is an interesting problem. There are a couple of different things going on here. One issue is how to describe the sequence of operations and operands that go into an arithmetic expression. Using parentheses to establish order of operations is quite messy, so instead I suggest thinking of an expression as a stack of operations and operands, like - 4 4 for 4-4, + 4 * 4 4 for (4*4)+4, * 4 + 4 4 for (4+4)*4, etc. It's essentially Polish (prefix) notation, the operator-first mirror of the RPN used on HP calculators. Then you don't have to worry about parentheses, and having a data structure for expressions will help below when we build up larger and larger expressions.
Now we turn to the algorithm for building expressions. Dynamic programming doesn't work in this situation, in my opinion, because (for example) to construct some numbers in the range from 0 to 100 you might have to go outside of that range temporarily.
A better way to conceptualize the problem, I think, is as breadth first search (BFS) on a graph. Technically, the graph would be infinite (all positive integers, or all integers, or all rational numbers, depending on how elaborate you want to get) but at any time you'd only have a finite portion of the graph. A sparse graph data structure would be appropriate.
Each node (number) on the graph would have a weight associated with it, the minimum number of 4's needed to reach that node, and also the expression which achieves that result. Initially, you would start with just the node (4), with the number 1 associated with it (it takes one 4 to make 4) and the simple expression "4". You can also throw in (44) with weight 2, (444) with weight 3, and (4444) with weight 4.
To build up larger expressions, apply all the different operations you have to those initial nodes. For example, unary negation, factorial, square root; binary operations like * 4 at the bottom of your stack for multiply by 4, + 4, - 4, / 4, ^ 4 for exponentiation, and also + 44, etc. The weight of an operation is the number of 4s required for that operation; unary operations would have weight 0, + 4 would have weight 1, * 44 would have weight 2, etc. You add the weight of the operation to the weight of the node on which it operates to get a new weight, so for example + 4 acting on node (44) with weight 2 and expression "44" would result in a new node (48) with weight 3 and expression "+ 4 44". If that new weight is better than the existing result for 48, replace the node (48) with the new one.
You will have to use some sense when applying functions. factorial(4444) would be a very large number; it would be wise to set a domain for your factorial function which would prevent the result from getting too big or going out of bounds. The same with functions like / 4; if you don't want to deal with fractions, say that non-multiples of 4 are outside of the domain of / 4 and don't apply the operator in that case.
The resulting algorithm is very much like Dijkstra's algorithm for calculating distance in a graph, though not exactly the same.
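Here is a hedged C# sketch of that search. It keeps only a small operation set (+ 4, - 4, * 4 and exact / 4, each spending one extra 4), drops negative intermediates, allows values to wander somewhat above the target range, and uses .NET 6's PriorityQueue; the answer's unary operators and + 44 style steps are left out for brevity:

```csharp
using System;
using System.Collections.Generic;

static class FourFoursSearch
{
    // Maps value -> (fewest 4s found so far, one expression achieving it).
    public static Dictionary<long, (int Fours, string Expr)> Search(long target, int maxFours)
    {
        var best = new Dictionary<long, (int Fours, string Expr)>();
        var queue = new PriorityQueue<(long Value, string Expr), int>();

        // Seed nodes: 4, 44, 444, 4444 with weights 1..4.
        long seed = 0;
        for (int w = 1; w <= 4 && w <= maxFours; w++)
        {
            seed = seed * 10 + 4;
            best[seed] = (w, seed.ToString());
            queue.Enqueue((seed, seed.ToString()), w);
        }

        while (queue.TryDequeue(out var node, out int fours))
        {
            // Skip stale queue entries that have since been improved.
            if (best[node.Value].Fours < fours) continue;

            // Each step spends one more 4; division is only applied when exact.
            var steps = new (long Value, string Expr)[]
            {
                (node.Value + 4, $"(+ 4 {node.Expr})"),
                (node.Value - 4, $"(- {node.Expr} 4)"),
                (node.Value * 4, $"(* 4 {node.Expr})"),
                (node.Value % 4 == 0 ? node.Value / 4 : -1, $"(/ {node.Expr} 4)"),
            };

            foreach (var (value, expr) in steps)
            {
                int weight = fours + 1;
                // Allow temporary excursions above the target range, but keep the search finite.
                if (value < 0 || value > target * 16 || weight > maxFours) continue;
                if (!best.TryGetValue(value, out var current) || weight < current.Fours)
                {
                    best[value] = (weight, expr);
                    queue.Enqueue((value, expr), weight);
                }
            }
        }
        return best;   // filter to keys 0..target to read off the reachable results
    }
}
```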
I think that the brute force solution here is the only way to go.
The reasoning behind this is that each number has a different way to get to it, and getting to a certain x might have nothing to do with getting to x+1.
Having said that, you might be able to make the brute force solution a bit quicker by using obvious moves where possible.
For instance, if I got to 20 using three 4s (4*4+4), it is obvious how to get to 16, 24 and 80 with one more 4. Hold an array of 100 bits and mark the numbers reached.
Similar to the subset sum problem, it can be solved using Dynamic Programming (DP) by following the recursive formulas below, where D(x, i) reads as "x can be produced from i fours using +4, -4, *4 and /4":
D(0, 0) = true
D(x, 0) = false for x != 0
D(x, i) = D(x-4, i-1) OR D(x+4, i-1) OR D(x*4, i-1) OR D(x/4, i-1)
By computing the above using DP technique, it is easy to find out which numbers can be produced using these 4's, and by walking back the solution, you can find out how each number was built.
The advantage of this method (when implemented with DP) is that you do not recalculate the same values more than once. I am not sure it will actually be effective for 4 4's, but I believe theoretically it could be a significant improvement for a less restricted generalization of this problem.
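A direct, forward reading of that recursion in C# might look like the following. As in the formulas, only +4, -4, *4 and exact /4 are used (no concatenation or unary operators); level 1 is seeded with {4}, and the value range is bounded to keep the sets finite:

```csharp
using System;
using System.Collections.Generic;

static class FourFoursDp
{
    // reachable[i] is the set of values x with D(x, i) = true, i.e. values
    // expressible from exactly i fours with the four operations above.
    public static List<HashSet<long>> Build(int maxFours, long bound)
    {
        var reachable = new List<HashSet<long>>
        {
            new HashSet<long>(),          // level 0 left empty; D(0,0) only bootstraps the recursion
            new HashSet<long> { 4 }       // one four: just 4
        };

        for (int i = 2; i <= maxFours; i++)
        {
            var next = new HashSet<long>();
            void Add(long x) { if (Math.Abs(x) <= bound) next.Add(x); }

            foreach (long v in reachable[i - 1])
            {
                Add(v + 4);
                Add(v - 4);
                Add(v * 4);
                if (v % 4 == 0) Add(v / 4);
            }
            reachable.Add(next);
        }
        return reachable;
    }
}

// Example: FourFoursDp.Build(4, 10_000)[3].Contains(20) is true, via 4 -> 16 -> 20.
```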
This answer is just an extension of Amit's.
Essentially, your operations are:
Apply a unary operator to an existing expression to get a new expression (this does not use any additional 4s)
Apply a binary operator to two existing expressions to get a new expression (the new expression has number of 4s equal to the sum of the two input expressions)
For each n from 1..4, calculate Expressions(n) - a list of (Expression, Value) pairs - as follows:
(For a fixed n, only store 1 expression in the list that evaluates to any given value)
1. Initialise the list with the concatenation of n 4s (i.e. 4, 44, 444, 4444)
2. For i from 1 to n-1, and each permitted binary operator op, add an expression (and value) e1 op e2 where e1 is in Expressions(i) and e2 is in Expressions(n-i)
3. Repeatedly apply unary operators to the expressions/values calculated so far in steps 1-3. When to stop (applying 3 recursively) is a little vague; certainly stop if an iteration produces no new values. Potentially limit the magnitude of the values you allow, or the size of the expressions.
Example unary operators are !, Sqrt, -, etc. Example binary operators are +-*/^ etc. You can easily extend this approach to operators with more arguments if permitted.
You could do something a bit cleverer in terms of step 3 never ending for any given n. The simple way (described above) does not start calculating Expressions(i) until Expressions(j) is complete for all j < i. This requires that we know when to stop. The alternative is to build Expressions of a certain maximum length for each n, then if you need to (because you haven't found certain values), extend the maximum length in an outer loop.
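For concreteness, a C# sketch of steps 1-2 (binary operators only; the unary step 3 is omitted to keep it short). The helper names and the use of double dictionary keys are choices made for this sketch, not part of the answer:

```csharp
using System;
using System.Collections.Generic;

static class FourFoursExpressions
{
    // expressions[n] maps a value to one expression that uses exactly n fours.
    public static List<Dictionary<double, string>> Build(int maxFours)
    {
        var expressions = new List<Dictionary<double, string>> { new Dictionary<double, string>() };

        for (int n = 1; n <= maxFours; n++)
        {
            var level = new Dictionary<double, string>();

            // Step 1: the concatenation of n fours (4, 44, 444, ...).
            string digits = new string('4', n);
            level[double.Parse(digits)] = digits;

            // Step 2: combine Expressions(i) and Expressions(n - i) with binary operators.
            for (int i = 1; i < n; i++)
            {
                foreach (var left in expressions[i])
                foreach (var right in expressions[n - i])
                {
                    TryAdd(level, left.Key + right.Key, $"({left.Value}+{right.Value})");
                    TryAdd(level, left.Key - right.Key, $"({left.Value}-{right.Value})");
                    TryAdd(level, left.Key * right.Key, $"({left.Value}*{right.Value})");
                    if (right.Key != 0) TryAdd(level, left.Key / right.Key, $"({left.Value}/{right.Value})");
                }
            }
            expressions.Add(level);
        }
        return expressions;
    }

    // Keep only one expression per value, as the answer suggests.
    private static void TryAdd(Dictionary<double, string> level, double value, string expr)
    {
        if (!level.ContainsKey(value)) level[value] = expr;
    }
}

// Usage: var byCount = FourFoursExpressions.Build(4);
// byCount[4] then tells you which values are reachable with exactly four 4s.
```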

How to generate a function that will algebraically encode a sequence?

Is there any way to generate a function F that, given a sequence, such as:
seq = [1 2 4 3 0 5 4 2 6]
Then F(seq) will return a function that generates that sequence? That is,
F(seq)(0) = 1
F(seq)(1) = 2
F(seq)(2) = 4
... and so on
Also, if it is, what is the function of lowest complexity that does so, and what is the complexity of the generated functions?
EDIT
It seems like I'm not clear, so I'll try to exemplify:
F(seq([1 3 5 7 9]))
# returns something like:
F(x) = 1 + 2*x
# limited to the domain x ∈ [0 1 2 3 4]
In other words, I want to compute a function that can be used to restore a sequence of integers algebraically, using mathematical operations such as +, *, etc., even after the sequence has been cleared from memory. I don't know if it is possible, but, as one could easily code an approximation of such a function for trivial cases, I'm wondering how far it goes and if there is any actual research concerning this.
EDIT 2 Answering another question, I'm only interested in sequences of integers - if that is important.
Please let me know if it is still not clear!
Well, if you just want to know a function with "+ and *", that is to say, a polynomial, you can go and check Wikipedia for Lagrange Polynomial (https://en.wikipedia.org/wiki/Lagrange_polynomial).
It gives you the lowest degree polynomial that encodes your sequence.
Unfortunately, you probably won't be able to store less than before, because the probability of the polynomial being of degree d = n-1, where n is the size of the array, is very high with random integers.
Furthermore, you will have to store rational numbers instead of integers.
And finally, access to any element of the array will be O(d) (using Horner's algorithm for polynomial evaluation), compared to O(1) with the array.
Nevertheless, if you know that your sequences may be very simple and very long, it might be an option.
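For reference, a small C# sketch of Lagrange interpolation for this setting, where the sequence values are taken at x = 0, 1, ..., n-1; it side-steps the rational-number issue mentioned above by evaluating in double precision:

```csharp
using System;

static class LagrangeSequence
{
    // Reconstructs seq[x] for x in 0..seq.Length-1 via the Lagrange form.
    public static double Evaluate(long[] seq, double x)
    {
        double result = 0.0;
        for (int i = 0; i < seq.Length; i++)
        {
            double basis = 1.0;                 // L_i(x) = prod_{j != i} (x - j) / (i - j)
            for (int j = 0; j < seq.Length; j++)
                if (j != i) basis *= (x - j) / (double)(i - j);
            result += seq[i] * basis;
        }
        return result;
    }
}

// LagrangeSequence.Evaluate(new long[] { 1, 3, 5, 7, 9 }, 2) ~= 5
```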
If the sequence comes from a polynomial with a low degree, an easy way to find the unique polynomial that generates it is using Newton's series. Constructing the polynomial for n numbers has O(n²) time complexity, and evaluating it is O(n).
In Newton's series the polynomial is expressed in terms of x, x(x-1), x(x-1)(x-2) etc instead of the more familiar x, x², x³. To get the coefficients, basically you compute the differences between subsequent items in the sequence, then the differences between the differences, until only one is left or you get a sequence of all zeros. The numbers you get along the bottom, divided by factorial of the degree of the term, give you the coefficients. For example with the first sequence you get these differences:
1 2 4 3 0 5 4 2 6
1 2 -1 -3 5 -1 -2 4
1 -3 -2 8 -6 -1 6
-4 1 10 -14 5 7
5 9 -24 19 2
4 -33 43 -17
-37 76 -60
113 -136
-249
The polynomial that generates this sequence is therefore:
f(x) = 1 + x(1 + (x-1)(1/2 + (x-2)(-4/6 + (x-3)(5/24 + (x-4)(4/120
+ (x-5)(-37/720 + (x-6)(113/5040 + (x-7)(-249/40320))))))))
It's the same polynomial you get using other techniques, like Lagrange interpolation; this is just the easiest way to generate it as you get the coefficients for a polynomial form that can be evaluated with Horner's method, unlike the Lagrange form for example.
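A short C# sketch of this construction: the leading entry of each difference row becomes the coefficient of the x(x-1)...(x-k+1)/k! term, and evaluation uses the nested form shown above:

```csharp
using System;

static class NewtonSeries
{
    // Leading entries of each difference row: c[0] = seq[0], c[1] = first difference, ...
    public static double[] Coefficients(long[] seq)
    {
        int n = seq.Length;
        var work = new double[n];
        for (int i = 0; i < n; i++) work[i] = seq[i];

        var c = new double[n];
        for (int k = 0; k < n; k++)
        {
            c[k] = work[0];
            for (int i = 0; i < n - k - 1; i++) work[i] = work[i + 1] - work[i];
        }
        return c;   // division by k! happens during evaluation below
    }

    public static double Evaluate(double[] c, double x)
    {
        int n = c.Length;
        double result = c[n - 1] / Factorial(n - 1);
        for (int k = n - 2; k >= 0; k--)
            result = c[k] / Factorial(k) + (x - k) * result;   // nested form, as in f(x) above
        return result;
    }

    private static double Factorial(int k)
    {
        double f = 1.0;
        for (int i = 2; i <= k; i++) f *= i;
        return f;
    }
}

// NewtonSeries.Evaluate(NewtonSeries.Coefficients(new long[] {1,2,4,3,0,5,4,2,6}), 3.0)
// gives 3 (up to floating-point error), the fourth element of the sequence.
```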
There is no magic if the sequence can be completely random. Interpolation is always possible, but it won't save you memory: any interpolation method requires the same amount of memory in the worst case, because if it didn't, it would be possible to compress everything down to a single bit.
On the other hand, it is sometimes possible to use brute force, some heuristics (like genetic algorithms), or numerical methods to reproduce some kind of mathematical expression of a specified type, but good luck with that :)
Just use an archiving tool instead if the goal is to reduce memory usage.
I think it will be useful for you to read about this: http://en.wikipedia.org/wiki/Entropy_(information_theory)

Segmented Sieve of Atkin, possible?

I am aware of the fact that the Sieve of Eratosthenes can be implemented so that it finds primes continuously without an upper bound (the segmented sieve).
My question is, could the Sieve of Atkin/Bernstein be implemented in the same way?
Related question: C#: How to make Sieve of Atkin incremental
However the related question has only 1 answer, which says "It's impossible for all sieves", which is obviously incorrect.
Atkin/Bernstein give a segmented version in Section 5 of their original paper. Presumably Bernstein's primegen program uses that method.
In fact, one can implement an unbounded Sieve of Atkin (SoA) without using segmentation at all, as I have done here in F#. Note that this is a pure functional version that doesn't even use arrays to combine the solutions of the quadratic equations and the squarefree filter, and thus is considerably slower than a more imperative approach.
Bernstein's optimizations using look-up tables for optimum 32-bit ranges would make the code extremely complex and not suitable for presentation here, but it would be quite easy to adapt my F# code so that the sequences start at a set lower limit and are used only over a range in order to implement a segmented version, and/or to apply the same techniques to a more imperative approach using arrays.
Note that even Bernstein's implementation of the SoA isn't really faster than a Sieve of Eratosthenes with all possible optimizations, as per Kim Walisch's primesieve; it is only faster than an equivalently optimized version of the Sieve of Eratosthenes for the selected range of numbers, as per his implementation.
EDIT_ADD: For those who do not want to wade through Bernstein's pseudo-code and C code, I am adding to this answer a pseudo-code method to use the SoA over a range from LOW to HIGH, where the delta from LOW to HIGH + 1 might be constrained to be evenly divisible by 60 in order to use the modulo (and potential bit packing to only the entries on the 2,3,5 wheel) optimizations.
This is based on a possible implementation using the SoA quadratics (4*x^2 + y^2), (3*x^2 + y^2), and (3*x^2 - y^2), expressed as sequences of numbers with the x value for each sequence fixed. The x ranges are one to SQRT((HIGH - 1) / 4) and one to SQRT((HIGH - 1) / 3) for the first two, and, for the third, the positive root of 2*x^2 + 2*x - HIGH - 1 = 0, i.e. x = (SQRT(1 + 2 * (HIGH + 1)) - 1) / 2, with the sequences expressed in my F# code as per the top link. Optimizations to the sequences there use the fact that, when sieving for only odd composites, the "4x" sequences need only odd values of y, and the "3x" sequences need only odd values of y when x is even and vice versa. A further optimization reduces the number of solutions to the quadratic equations (= elements in the sequences) by observing that the modulo patterns over the above sequences repeat over very small ranges of x and also repeat over ranges of y of only 30; this is used in the Bernstein code but not (yet) implemented in my F# code.
I also do not include the well-known optimizations that could be applied to the prime "square-free" culling to use wheel factorization, nor the calculations for the starting segment address as I use in my implementations of a segmented SoE.
So, for the purposes of calculating the sequence starting segment addresses for the "4x", "3x+", and "3x-" sequences (or with "3x+" and "3x-" combined, as I do in the F# code), and having calculated the ranges of x for each as per the above, the pseudo-code is as follows:
Calculate the range LOW - FIRST_ELEMENT, where FIRST_ELEMENT is the element with the lowest applicable value of y for each given value of x, or y = x - 1 for the case of the "3x-" sequence.
For the job of calculating how many elements are in this range, this boils down to the question of how many of (y1)^2 + (y2)^2 + (y3)^2 ... there are, where each y number is separated by two, to produce even or odd y's as required. As usual in square sequence analysis, we observe that the differences between squares grow by a constant increment: delta(9 - 1) is 8, delta(25 - 9) is 16 for an increase of 8, delta(49 - 25) is 24 for a further increase of 8, etcetera, so that for n elements the last increment is 8 * n in this example. Expressing the sequence of elements using this, we get that it is one (or whatever one chooses as the first element) plus eight times a sequence like (1 + 2 + 3 + ... + n). Now the standard reduction of linear series applies: this sum is (n + 1) * n / 2, or n^2/2 + n/2. We can then solve for how many elements n there are in the range by solving the quadratic equation n^2/2 + n/2 - range = 0, giving n = (SQRT(8*range + 1) - 1) / 2.
Now, if FIRST_ELEMENT + 4 * (n + 1) * n does not equal LOW as the starting address, add one to n and use FIRST_ELEMENT + 4 * (n + 2) * (n + 1) as the starting address. If one uses further optimizations to apply wheel factorization culling to the sequence pattern, look up table arrays can be used to look up the closest value of used n that satisfies the conditions.
The modulus 12 or 60 of the starting element can be calculated directly or can be produced by use of look up tables based on the repeating nature of the modulo sequences.
Each sequence is then used to toggle the composite states up to the HIGH limit. If the additional logic is added to the sequences to jump values between only the applicable elements per sequence, no further use of modulo conditions is necessary.
The above is done for every "4x" sequence followed by the "3x+" and "3x-" sequences (or combine "3x+" and "3x-" into just one set of "3x" sequences) up to the x limits as calculated earlier or as tested per loop.
And there you have it: given an appropriate method of dividing the sieve range into segments (best done with fixed sizes related to the CPU cache sizes for memory access efficiency), you have a method of segmenting the SoA just as used by Bernstein, but somewhat simpler in expression, as it mentions but does not combine the modulo operations and bit packing.
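To make the segmentation idea concrete, here is a deliberately unoptimized C# sketch of sieving one segment [LOW, HIGH) with the plain mod-12 Atkin rules. It omits Bernstein's mod-60 wheel, the sequence jumping and the bit packing discussed above, and assumes the base primes up to SQRT(HIGH) are already known (e.g. from a small unsegmented sieve) and sorted:

```csharp
using System;
using System.Collections.Generic;

static class SegmentedAtkin
{
    // Sieve the segment [low, high). basePrimes must contain all primes <= SQRT(high),
    // in increasing order.
    public static List<long> SieveSegment(long low, long high, IReadOnlyList<long> basePrimes)
    {
        var isPrime = new bool[checked((int)(high - low))];
        long limit = (long)Math.Sqrt(high) + 1;

        // Toggle candidates via the three quadratics; only hits inside the segment matter.
        for (long x = 1; x <= limit; x++)
        for (long y = 1; y <= limit; y++)
        {
            long n = 4 * x * x + y * y;
            if (n >= low && n < high && (n % 12 == 1 || n % 12 == 5)) isPrime[n - low] ^= true;

            n = 3 * x * x + y * y;
            if (n >= low && n < high && n % 12 == 7) isPrime[n - low] ^= true;

            n = 3 * x * x - y * y;
            if (x > y && n >= low && n < high && n % 12 == 11) isPrime[n - low] ^= true;
        }

        // Square-free culling: clear multiples of p^2 for base primes p >= 5.
        foreach (long p in basePrimes)
        {
            if (p < 5) continue;
            long square = p * p;
            if (square >= high) break;
            long start = Math.Max(square, ((low + square - 1) / square) * square);
            for (long m = start; m < high; m += square) isPrime[m - low] = false;
        }

        var primes = new List<long>();
        if (low <= 2 && 2 < high) primes.Add(2);
        if (low <= 3 && 3 < high) primes.Add(3);
        for (long n = Math.Max(low, 5); n < high; n++)
            if (isPrime[n - low]) primes.Add(n);
        return primes;
    }
}
```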

Implementation details of a Bayesian classifier

I've implemented a simple Bayesian classifier, but I'm running into some overflow problems when using it on non-trivial amounts of data.
One strategy I tried in order to keep the numbers small, but still exact, was to keep reducing the numerator and denominator with the greatest common divisor for every part of the equation. This, however, only works when they have a common divisor...
Note, the problem cuts both ways: when I keep the denominators and numerators separate for most of the calculation I run into integer overflow, and when I do most of the calculations on the fly using double arithmetic I hit the various problems/limits that really small double values have (as defined by IEEE 754).
As I'm sure some of you here have implemented this algorithm before, how did you deal with these issues? I'd prefer not to pull in arbitrary precision types as they cost too much and I'm sure there exists a solution which doesn't require them.
Thanks.
Usually the way you handle this is by taking logs and using adds, and then doing an exp if you want to get back into probability space.
p1 * p2 * p3 * ... * pn = exp(log(p1) + log(p2) + log(p3) + ... log(pn))
You avoid underflows by working in log space.
If you're classifying between two categories you can introduce the log ratio of probabilities for each category. So if:
log(Pr(cat1) / Pr(cat2)) <=> 0 # positive would favor cat1 and negative cat2
That is equal to:
log(Pr(cat1)) - log(Pr(cat2)) <=> 0
And if (as in Bayesian classifiers) the category probabilities are themselves products of probabilities given conditions:
log(Pr(cat1|cond1)) + ... <=> log(Pr(cat2|cond1)) + ...
Thus you are dealing with summation rather than multiplication, and you would need a massive data set to run into the same problem.
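A tiny C# sketch of the log-space comparison; the probability arrays stand in for whatever prior and per-feature conditional estimates your classifier produces:

```csharp
using System;
using System.Linq;

static class LogSpaceNaiveBayes
{
    // Returns log(Pr(cat1|x)) - log(Pr(cat2|x)) up to a shared constant:
    // positive favours cat1, negative favours cat2.
    public static double LogRatio(double[] cat1Probs, double[] cat2Probs)
    {
        double logCat1 = cat1Probs.Sum(p => Math.Log(p));
        double logCat2 = cat2Probs.Sum(p => Math.Log(p));
        return logCat1 - logCat2;
    }
}

// Usage: LogRatio(new[] { 0.5, 1e-200, 1e-150 }, new[] { 0.5, 1e-180, 1e-170 })
// works fine in log space, whereas the raw products would both underflow to 0.
```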
