Background:
I have two methods to handle a set of problems. The key performance index is the number of days needed to complete the set, and I want to assess which method is better based upon the overall days required. For example, there are two methods, A and B, and a set of three problems: x, y, and z. Method A needs 3, 5, and 7 days respectively to solve them, whereas method B needs 6, 7, and 8 days. I need to compare whether A or B is better. In this naive example, A is obviously better.
However, there is an edge case: for some problems, one method may take practically forever to finish. For example, there is another method called C. To approach x, y, and z, it needs 1, 1, and 99999 days respectively. My question is how I can compare C to A and B.
There are two ways I am considering. One is the reciprocal of days; the other is truncation. In the previous example, if I use reciprocals, then the score of method A is (1/3 + 1/5 + 1/7)/3 ≈ 0.225, of method B is (1/6 + 1/7 + 1/8)/3 ≈ 0.145, and of method C is (1/1 + 1/1 + 1/99999)/3 ≈ 0.667. Based upon reciprocals (higher is better), C > A > B. If I instead truncate at 30 (any value bigger than 30 becomes 30), then the average number of days needed is 5 for method A, 7 for method B, and 10.7 for method C.
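For concreteness, here is a minimal sketch of the two scoring schemes applied to the example data above:

days = {"A": [3, 5, 7], "B": [6, 7, 8], "C": [1, 1, 99999]}

def reciprocal_score(ds):
    # mean of 1/days: higher is better (problems per day, averaged)
    return sum(1.0 / d for d in ds) / len(ds)

def truncated_mean(ds, cutoff=30):
    # mean of days with runaway values capped at `cutoff`: lower is better
    return sum(min(d, cutoff) for d in ds) / len(ds)

for m, ds in days.items():
    print(m, round(reciprocal_score(ds), 3), round(truncated_mean(ds), 1))
# reciprocal: C (0.667) > A (0.225) > B (0.145)
# truncated:  A (5.0) < B (7.0) < C (10.7)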
My questions are:
Which criterion is better?
Are there any other ways to assess this?
If truncation is better, is there a principled way to set the cutoff point?
If the reciprocal is better, how should I interpret that number? Average efficiency? And how do I explain to my boss what it means?
Is there a more structured or scientific way to think about this kind of problem in general?
Any help or hints are appreciated, thank you very much.
I have a processing logic which has 11 parameters (let's say parameter A through parameter K), and different combinations of these parameters can result in different outcomes.
Processing Logic Example:
if x > A:
    x = B
else:
    x = C
y = math.sin(2*x*x + 1.1416) - D
# other logic involving parameters E, F, G, H, I, J, K
return outcome
Here are some examples of the possible values of the parameters (the others are similar and discrete):
A ∈ [0.01, 0.02, 0.03, ..., 0.2]
E ∈ [1, 2, 3, 4, ..., 200]
I would like to find the combination of these parameters that results in the best outcome.
However, the problem I am facing is that there are in total
10^19 possible combinations, while each combination takes 700 ms of processing time per CPU core. Obviously, the time to process all combinations is unacceptable even if I have a large computing cluster.
Could anyone give some advice on what is the correct methodology to handle this problem?
Here are some of my thoughts:
Step 1. Coarsen the step interval of each parameter so that the total processing time is reduced to an acceptable scope, for example:
A ∈ [0.01, 0.05, 0.09, ..., 0.2]
E ∈ [1, 5, 10, 15, ..., 200]
Step 2. Starting from the best combination found in step 1, do a finer search around that combination to find the best combination.
But I am afraid that the best combination might hide somewhere the coarse grid of step 1 cannot perceive, so step 2 would be in vain.
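For what it's worth, a minimal sketch of this two-step idea might look like the following. The score() function is a hypothetical stand-in for the real 700 ms processing logic, and only two of the eleven parameters are shown:

import itertools

def score(params):
    # hypothetical stand-in for the real processing logic
    return -(params["A"] - 0.12) ** 2 - (params["E"] - 87) ** 2

def grid_search(grids):
    # exhaustively evaluate one grid; grids: {param_name: candidate values}
    names = list(grids)
    combos = itertools.product(*(grids[n] for n in names))
    return max(combos, key=lambda c: score(dict(zip(names, c))))

# Step 1: coarse grid, as in the example above
coarse = {"A": [0.01, 0.05, 0.09, 0.13, 0.17],
          "E": list(range(1, 201, 5))}
best_a, best_e = grid_search(coarse)

# Step 2: finer grid in a small window around the coarse winner
# (clamping to the legal parameter ranges is omitted for brevity)
fine = {"A": [round(best_a + 0.01 * d, 2) for d in range(-3, 4)],
        "E": [best_e + d for d in range(-4, 5)]}
best_a, best_e = grid_search(fine)
print(best_a, best_e)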
This is an optimization problem. However, you have two distinct problems in what you posed:
There are no restrictions or properties on the evaluation function;
You accept only the best solution of 10^19 possibilities.
The field of optimization serves up many possibilities, most of which are one variation or another of hill-climbing search plus random perturbation (to help break out of a local maximum that is not the global optimum). All of these depend on some manner of continuity or predictability in how the evaluation function depends on its inputs.
Without that continuity, there is no shorter path to the sole optimal solution.
If you do have some predictability, then you have some reading to do on various solution methods. Start with Newton-Raphson, move on to Gradient Descent, and continue to other topics, depending on the fabric of your function.
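As an illustration of the hill-climbing family, here is a minimal sketch over discrete parameter grids, with random restarts serving as the perturbation mentioned above (score() is again a hypothetical black-box evaluation passed in by the caller):

import random

def hill_climb(score, grids, restarts=20):
    # grids: {name: list of allowed values}; returns (best combination, value)
    names = list(grids)
    best_combo, best_val = None, float("-inf")
    for _ in range(restarts):
        idx = {n: random.randrange(len(grids[n])) for n in names}
        val = score({n: grids[n][idx[n]] for n in names})
        improved = True
        while improved:
            improved = False
            for n in names:                     # try stepping each index +/-1
                for d in (-1, 1):
                    j = idx[n] + d
                    if 0 <= j < len(grids[n]):
                        trial = dict(idx)
                        trial[n] = j
                        v = score({m: grids[m][trial[m]] for m in names})
                        if v > val:
                            idx, val, improved = trial, v, True
        if val > best_val:
            best_combo = {n: grids[n][idx[n]] for n in names}
            best_val = val
    return best_combo, best_val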
Have you thought about a purely mathematical approach, i.e., trying to find local/global extrema, or exploiting whether the function is monotonic per operation?
There are quite decent numerical methods for derivatives/integrals, even ones usable in a relatively generic manner.
So, in other words, limit the scope instead of computing every single option; whether this works depends on the general character of the operations you have in mind.
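For example, a generic central-difference derivative estimate (a sketch for an arbitrary scalar function f, nothing specific to the question's logic):

def dfdx(f, x, h=1e-6):
    # central difference: error is O(h^2) for smooth f
    return (f(x + h) - f(x - h)) / (2 * h)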
This should be a quite simple problem, but I don't have proper algorithmic training and find myself stuck trying to solve this.
I need to calculate the possible combinations to reach a number by adding a limited set of smaller numbers together.
Imagine that we are playing with LEGO and I have a brick that is 12 units long and I need to list the possible substitutions I can make with shorter bricks. For this example we may say that the available bricks are 2, 4, 6 and 12 units long.
What might be a good approach to building an algorithm that can calculate the substitutions? There are no bounds on how many bricks I can use at a time, so it could be 6x2 as well as 1x12; the important thing is I need to list all of the options.
So the inputs are the target length (in this case 12) and the available bricks (an array of numbers of arbitrary length, in this case [2, 4, 6, 12]).
My approach was to start with the lowest number and add it up until I reach the target, then take the next lowest and so on. But that way I miss out on combinations of multiple numbers, and when I try to factor that in, it gets really messy.
I suggest a recursive approach: given a function f(target, permissibles) that lists all representations of target as a sum of permissibles, you can do this:
def f(target, permissibles):
    if target == 0:
        return [[]]     # exactly one representation of 0: the empty sum
    return [[x] + rest
            for x in permissibles if x <= target
            for rest in f(target - x, permissibles)]
If you do not want to differentiate between 12 = 4+4+2+2 and 12 = 2+4+2+4, restrict each recursive call to bricks no larger than the current x (equivalently, sort permissibles and drop those larger than x):
def f(target, permissibles):
    if target == 0:
        return [[]]
    return [[x] + rest
            for x in permissibles if x <= target
            for rest in f(target - x, [p for p in permissibles if p <= x])]
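For the question's example, the second version yields exactly the distinct partitions (output order may differ):

print(f(12, [2, 4, 6, 12]))
# [[2, 2, 2, 2, 2, 2], [4, 2, 2, 2, 2], [4, 4, 2, 2], [4, 4, 4],
#  [6, 2, 2, 2], [6, 4, 2], [6, 6], [12]]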
Does there exist some algorithm that allows for the creation of a mathematical model given an inclusive set?
I'm not sure I'm asking that correctly... Let me try again...
Given some input set...
int Set[] = { 1, 4, 9, 16, 25, 36 };
Does there exist an algorithm that would be able to deduce the pattern evident in the set? In this case being...
Set[x] = x^2
The only way I can think of doing something like this is some genetic algorithm (GA) where the fitness is how closely the generated model matches the input set.
Edit:
I should add that my problem domain implies that the set is inclusive. Meaning, I am finding the closest possible function for the set and not using the function to extrapolate beyond the set...
The problem of curve fitting might be a reasonable place to start looking. I'm not sure if this is exactly what you're looking for - it won't really identify the pattern so much as just produce a function which follows the pattern as closely as possible.
As others have mentioned, for a simple set there can easily be infinitely many such functions, so something like this may be what you want, rather than exactly what you have described in your question.
Wikipedia seems to indicate that the Gauss-Newton algorithm or the Levenberg–Marquardt algorithm might be a good place to begin your research.
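As a concrete starting point, here is a small least-squares illustration (this assumes NumPy and indexes the set from 1; plain polyfit suffices here, whereas Gauss-Newton and Levenberg-Marquardt matter for non-linear models):

import numpy as np

x = np.arange(1, 7)                 # positions 1..6
y = np.array([1, 4, 9, 16, 25, 36])
for deg in (1, 2, 3):
    coeffs = np.polyfit(x, y, deg)  # least-squares polynomial fit
    resid = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    print(deg, np.round(coeffs, 6), resid)
# degree 2 already fits exactly: coefficients ~ [1, 0, 0], i.e. y = x^2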
A mathematical argument explaining why, in general, this is impossible:
There are only countably many computer programs that can be written at all.
There are uncountably many infinite sequences of integers.
Therefore, there are uncountably many (not merely infinitely many) sequences of integers that no possible computer program can generate.
Accordingly, this is impossible in the general case. Sorry!
Hope this helps!
If you want to know whether the given data fits some polynomial function, you compute successive differences until you reach a constant row. The number of difference steps taken to reach the constant is the degree of the polynomial.
x  | 1   2   3   4
y  | 1   4   9  16
y' |   3   5   7
y" |     2   2
Since y" is the constant 2, y' is 2x + C1, and thus y is x^2 + C1*x + C2. C1 is 0, since the first difference at the midpoint x = 1.5 gives 2 × 1.5 + C1 = 3. C2 is 0 because 1^2 + 0 = 1. So, we have y = x^2.
So, the algorithm is:
Take successive differences.
If it does not converge to a constant, either resort to curve fitting, or report that the data is insufficient to determine a polynomial.
If it does converge to a constant, iteratively integrate the polynomial expression, solving for the trailing constant at each step, until the full degree is reached.
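A minimal sketch of the difference test itself (the function name is illustrative):

def polynomial_degree(ys):
    # ys: values at consecutive integer points; returns the degree of the
    # generating polynomial, or None if differences never become constant
    degree = 0
    while len(set(ys)) > 1:
        ys = [b - a for a, b in zip(ys, ys[1:])]
        degree += 1
        if len(ys) < 2:
            return None     # data exhausted before reaching a constant row
    return degree

print(polynomial_degree([1, 4, 9, 16, 25, 36]))   # -> 2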
How can I determine a good initial guess for the equation Ax + B·sin(x) = C in terms of A, B, and C?
I am trying to solve it using Newton-Raphson. A, B, and C will be given at runtime.
Is there any method more efficient than Newton-Raphson for this purpose?
The optimal initial guess is the root itself, so finding an "optimal" guess isn't really valid.
Any guess will give you a valid solution eventually as long as f'(x_n) != 0 at every step. Here f(x) = A*x + B*sin(x) - C, so f'(x) = A + B*cos(x), which vanishes only where cos(x) = -A/B (possible at all only when |A| <= |B|).
I would try x0 = C * pi, just to see if it works.
Your biggest problem, however, is the periodic term in your function. Newton's method will be slow (if it converges at all) here, because the sin(x) term can push the iterates back and forth over and over.
Precaution:
In Newton's method, notice that f'(x_n) appears in the denominator. f'(x) comes close to 0 infinitely many times, so if f'(x_n) = 0.0001 (or anything else near zero, which has a real chance of happening), x_{n+1} gets thrown far away from x_n.
Worse yet, this can happen over and over because f'(x) is periodic, which means Newton's method might never converge for an arbitrary x0.
The simplest "good" approximation is to just assume that sin(x) is approximately zero, and so set:
x0 = C/A
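Putting that together, a minimal Newton-Raphson sketch with this starting guess (the nudge away from near-zero derivatives is an ad-hoc assumption, not something from the answers above):

import math

def solve(A, B, C, tol=1e-12, max_iter=100):
    x = C / A                        # initial guess: treat sin(x) as ~0
    for _ in range(max_iter):
        f  = A * x + B * math.sin(x) - C
        fp = A + B * math.cos(x)     # f'(x); can vanish only if |A| <= |B|
        if abs(fp) < 1e-12:
            x += 0.5                 # ad-hoc nudge off a flat spot
            continue
        x_new = x - f / fp
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(solve(2.0, 1.0, 3.0))          # |A| > |B|, so f is monotone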
Well, if A, B, and C are real and nonzero, then (B + C)/A is an upper bound on the largest root and (C - B)/A is a lower bound on the smallest root (taking A, B > 0), since -1 <= sin(x) <= 1. You could start with those.
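Those bounds also give a guaranteed fallback via plain bisection (a sketch assuming A, B > 0, so that f is non-positive at the lower bound and non-negative at the upper one):

import math

def bisect_root(A, B, C, tol=1e-12):
    f = lambda x: A * x + B * math.sin(x) - C
    lo, hi = (C - B) / A, (C + B) / A     # bracket from -1 <= sin(x) <= 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2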
Newton's method can work from almost any guess; the point is simple. Suppose the true solution of an equation is 2.34... and I guess x0 = 100: starting from just about any guess in the world, you will eventually get to 2.34... The method asks you to choose a sensible guess only because a poor one takes many more iterations, and nobody wants to repeat the step twenty extra times.
Guessing a solution is not hard: you just find a bracketing point. For example, if 3 is too big and 2 is too small, the answer is between 2 and 3. If instead of guessing 2 you guess 50, you will still get to the right solution; as I said, it will just take much longer.
I tested the method myself: on a random equation whose answer was between 4 and 5 (so the best guess was 4), I deliberately started from 1000. It took much more time, but after a few hours I got down from 1000 to 4.something. So if you somehow can't find a bracketing point, you can put an arbitrary number in as x0 and you will still eventually reach the right solution, no matter what number you guessed.
I have a collection of 43 to 50 numbers ranging from 0.005 to 0.133 (but mostly on the small side). I would like to find, if possible, all combinations that have a sum between L and R, which are very close together.*
The brute-force method takes 2^43 to 2^50 steps, which isn't feasible. What's a good method to use here?
Edit: The combinations will be used in a calculation and discarded. (If you're writing code, you can assume they're simply output; I'll modify as needed.) The number of combinations will presumably be far too large to hold in memory.
* L = 0.5877866649021190081897311406, R = 0.5918521703507438353981412820.
The basic idea is to convert it to an integer knapsack problem (which is easy).
Choose a small real number e and round the numbers in your original problem to multiples k*e with integer k. The smaller e is, the larger the integers will be (an efficiency tradeoff), but the solution of the modified problem will be closer to your original one. An e = d/(4*43), where d is the width of your target interval, should be small enough.
If the modified problem has an exact solution summing to the middle of your target interval (rounded to a multiple of e), then the original problem has one somewhere within the interval.
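A sketch of that reduction (names are illustrative; the set-based loop is a standard pseudo-polynomial subset-sum pass over the scaled integers):

def reachable_near_middle(numbers, L, R):
    d = R - L
    e = d / (4 * len(numbers))           # grid size suggested above
    ints = [round(v / e) for v in numbers]
    target = round(((L + R) / 2) / e)    # interval middle, in grid units
    sums = {0}                           # integer sums achievable so far
    for k in ints:
        sums |= {s + k for s in sums}
    return target in sums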
You haven't given us enough information, but it sounds like you are in trouble if you actually want to OUTPUT every possible combination. For example, it is consistent with what you told us that every number is ~0.027. In that case, every collection of half of the elements will satisfy your criterion, but there are 43 choose 21 such sets, which means you would have to output at least 1052049481860 sets (too many to be feasible).
Certainly the running time will be no better than the length of the required output.
Actually, there is a quicker way around this:
(python)
sums_possible = [(0, [])]
# sums_possible is a list of tuples: (partial_sum, numbers_yielding_this_sum)
for number in numbers:
    sums_for_this_number = []
    for total, used in sums_possible:
        sums_for_this_number.append((total + number, used + [number]))
    sums_possible = sums_possible + sums_for_this_number

results = [used for total, used in sums_possible if L <= total <= R]
Also, Aaron is right, so this may or may not be feasible for you.