Microsoft Solver Foundation for semi-integer variables

Is it possible to use the MSF API to specify a variable as semi-integer (V = 0, or a <= V <= b)?
The following is an example in LP_Solve that uses the "sec" and "int" keywords to indicate the variables are semi-continuous and integer.
max: 0.5 Q1 + 0.55 Q2 ;
Q1 >= 5;
Q1 <= 10 ;
Q2 >= 5;
Q2 <= 10;
Q1 + Q2 <= 10;
sec Q1,Q2 ;
int Q1,Q2 ;
Something similar in MSF would be nice. I note that it is possible to call a Gurobi plug-in DLL within MSF; however, I cannot find anywhere in that API to set the variable type correctly (I think Gurobi calls it the VTYPE), so I assume it is either not exposed in their .NET API or not available in the version of Gurobi that MSF uses. Alternatively, is there a nice way to call LP_Solve from .NET?

You can do this with Solver Foundation but there is no equivalent for the "sec" keyword. Instead you can add a dummy 0-1 decision for each semi-integer variable. For your original example involving "V", here's how you could do it in OML:
Model[
  Decisions[
    Integers[0, 1],
    VPositive
  ],
  Decisions[
    Reals,
    V
  ],
  Constraints[
    constraint -> 10 * VPositive <= V <= 20 * VPositive
  ]
]
If you are using the Solver Foundation API then you would add the analogous decisions, constraints, and goals using the object model. The way to specify the type of a decision is with a Domain, which is passed to the Decision constructor.
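For what it's worth, the dummy-binary (big-M style) trick is solver-agnostic. Here is a minimal sketch of it applied to the LP_Solve example above, written in Python with PuLP purely for illustration (PuLP, the variable names and its bundled CBC solver are my assumptions, not part of Solver Foundation):
from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, LpInteger
prob = LpProblem("semi_integer_example", LpMaximize)
# Q1, Q2: integer quantities; Q1_on, Q2_on: dummy 0-1 decisions ("is the variable switched on?")
q1 = LpVariable("Q1", lowBound=0, upBound=10, cat=LpInteger)
q2 = LpVariable("Q2", lowBound=0, upBound=10, cat=LpInteger)
q1_on = LpVariable("Q1_on", cat=LpBinary)
q2_on = LpVariable("Q2_on", cat=LpBinary)
prob += 0.5 * q1 + 0.55 * q2              # objective: max 0.5 Q1 + 0.55 Q2
prob += q1 >= 5 * q1_on                   # either Q1 = 0 (Q1_on = 0) ...
prob += q1 <= 10 * q1_on                  # ... or 5 <= Q1 <= 10 (Q1_on = 1)
prob += q2 >= 5 * q2_on
prob += q2 <= 10 * q2_on
prob += q1 + q2 <= 10                     # Q1 + Q2 <= 10
prob.solve()
print(q1.value(), q2.value())
The constraints Qi >= 5*Qi_on and Qi <= 10*Qi_on play exactly the role of the VPositive decision in the OML model above.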

Related

How to update the error on a hidden node using back-propagation, given the error on the output nodes and weights

I'm learning neural networks, and I want to manually code back-propagation to understand how it works, but I'm struggling with the algorithm. I'm trying to solve question 30 of this paper (so that I have a worked example to learn from).
The short version of the question is: if someone could show me how to do this to find the error for H2, I would appreciate it (the answer should be A; -0.0660).
The long version of the question is: is my thinking right for finding the error at H2 using back-propagation?
The errors (from question 29) for I1, I2 and I3 are 0.1479, -0.0929 and 0.1054, respectively.
The network architecture is:
The weights are:
So I thought what I had to do:
Find the total sum of the weights that led to each output error (I took the absolute values to get the total sum; is this right?):
E1 = 1.0 + 3.5 => 4.5
E2 = 0.5 + 1.2 => 1.7
E3 = 0.3 + 0.6 => 0.9
So then I worked out the proportion of each weight that came from my node of interest (y2):
E1 = 3.5/4.5 = 0.777
E2 = 1.2/1.7 = 0.71
E3 = 0.6/0.7 = 0.86
And then I worked out the proportion of error that came from that proportion of weight:
E1 => (0.14/100)*14 = 0.01078
E2 => (-0.09/100)*71 = -0.0639
E3 => (0.1054/100)*86 = 0.090644
If someone could show me where I'm going wrong (because, as mentioned above, I know what the right answer should be), I'd appreciate it. Also, as mentioned above, I've added a link to the original question 30 on the original exam, in case it helps (it's from 17 years ago, and not an exam I'm taking; I'm just trying to understand it). I know that I can just use TensorFlow/Keras to implement this automatically, but I'm trying to really get into how it all works.
In the question you mentioned, you are given the error formula for a hidden node:
delta_j = f'(H_j) * sum over k of (delta_k * w_jk)
You need to calculate its value for j = 2. You have the values for all delta_k and w_jk.
You are also given the derivative of the activation function:
f'(H_j) = f(H_j) * (1 - f(H_j))
Finally, you are given the activation of hidden node 2, f(H2) = 0.74. All you need to do is put the values you have into these equations:
f'(H2) = 0.74 * (1 - 0.74) = 0.1924
delta_2 = 0.1924 * ((0.1479 * -3.5) + (-0.0929 * -1.2) + (0.1054 * 0.6)) = -0.06598
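The same arithmetic in a few lines of Python, just to make the substitution explicit (the weights -3.5, -1.2 and 0.6 are the ones used above, from H2 to the three output nodes):
f_H2 = 0.74                                 # activation of hidden node 2, f(H2)
deltas = [0.1479, -0.0929, 0.1054]          # output-node errors from question 29
weights = [-3.5, -1.2, 0.6]                 # weights from H2 to the output nodes
f_prime_H2 = f_H2 * (1 - f_H2)              # 0.1924
delta_2 = f_prime_H2 * sum(d * w for d, w in zip(deltas, weights))
print(round(delta_2, 4))                    # -0.066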

Algorithm to detect unusual growth/fall in my numbers

I have a dataset with the number of visitors to each of my site's pages during the last 30 days; it looks something like this:
Page 1: [1,2,66,2,2,7,8]
Page 2: [3,5,8,3,7,11,45]
The total number of pages is huge. I would like to apply an algorithm to detect pages that had sudden growth, spikes or downfalls during the period. Is there a single algorithm that lets me do that?
int Q = 20; // Q is the difference between two consecutive days
            // that should be considered a spike
for (int i = 0; i < pages.length; i++){
    page p = pages[i];
    for (int j = 0; j < p.visitors.length - 1; j++){
        // big drop from day j to day j+1: day j stands out
        if(p.visitors[j] >= p.visitors[j+1] + Q){
            print("Page " + i + " has spike in day " + j);
        }
        // big rise from day j to day j+1: day j+1 stands out
        else if(p.visitors[j] + Q <= p.visitors[j+1]){
            print("Page " + i + " has spike in day " + (j+1));
        }
    }
}
You can check the Z-score: based on the mean and standard deviation, you can estimate spikes.
For example
In page 1:
Mean: 12.571428571429
Std Dev (sample): 23.719592062661
Z-score (the number of standard deviations a data point lies from the mean) for the values of page 1:
[-0.4878,-0.44568,2.2525,-0.44568,-0.44568,-0.23489,-0.19273]
So you can note that the third value is 2.2525 standard deviations from the mean, which is probably a spike (sudden growth, because it is positive). The other values seem expected.
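A minimal sketch of this in Python, assuming the sample standard deviation (which is what the numbers above use) and a made-up cutoff of 2 standard deviations:
import statistics
def zscore_spikes(visitors, threshold=2.0):
    mean = statistics.mean(visitors)
    std = statistics.stdev(visitors)            # sample standard deviation, as above
    zscores = [(v - mean) / std for v in visitors]
    # large positive z-score: spike; large negative z-score: downfall
    return [i for i, z in enumerate(zscores) if abs(z) >= threshold]
print(zscore_spikes([1, 2, 66, 2, 2, 7, 8]))    # [2] -- day index 2 stands out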
Statistically speaking, a value in a data set is considered an outlier when its distance from Q1 or Q3 is larger than 1.5 * (Q3 - Q1), where Q1 and Q3 are the first and third quartiles respectively.
You could implement this with an algorithm that calculates Q1 and Q3 based on the last n days (e.g. 30) and go from there.
Find Q1 and Q3
bound = 1.5 * (Q3 - Q1), i.e. 1.5 times the interquartile range
Loop through the array
Check page[i] <= Q1 - bound. If true: outlier
Check page[i] >= Q3 + bound. If true: outlier
So far, so good. However, finding Q1 and Q3 is a bit tricky.
You could either A)
Calculate them the easy way (i.e. not technically correct):
Find the average
Divide it by 2. This is Q1
Add Q1 to the average. This is Q3
Or B)
Find some other way of calculating the quartiles (see the sketch below). Visit this for reference.
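A possible sketch of option B in Python, leaning on numpy.percentile for the quartiles (numpy and the function name are my assumptions here; the exact quartile convention is precisely what the linked reference discusses):
import numpy as np
def iqr_outliers(visitors):
    q1, q3 = np.percentile(visitors, [25, 75])
    bound = 1.5 * (q3 - q1)                     # 1.5 times the interquartile range
    return [i for i, v in enumerate(visitors)
            if v <= q1 - bound or v >= q3 + bound]
print(iqr_outliers([1, 2, 66, 2, 2, 7, 8]))     # [2] -- flags the 66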

SCIP infeasibility detection with a MINLP

I'm using SCIPAMPL to solve mixed integer nonlinear programming problems (MINLPs). For the most part it's been working well, but I found an instance where the solver detects infeasibility erroneously.
set K default {};
var x integer >= 0;
var y integer >= 0;
var z;
var v1{K} binary;
param yk{K} integer default 0;
param M := 300;
param eps := 0.5;
minimize upperobjf:
16*x^2 + 9*y^2;
subject to
ll1: 4*x + y <= 50;
ul1: -4*x + y <= 0;
vf1{k in K}: z + eps <= (x + yk[k] - 20)^4 + M*(1 - v1[k]);
vf2: z >= (x + y - 20)^4;
aux1{k in K}: -(4*x + yk[k] - 50) <= M*v1[k] - eps;
# fix1: x = 4;
# fix2: y = 12;
let K := {1,2,3,4,5,6,7,8,9,10,11};
for {k in K} let yk[k] := k - 1;
solve;
display x,y,z,v1;
The solver is detecting infeasibility at the presolve phase. However, if you uncomment the two constraints that fix x and y to 4 and 12, the solver works and outputs the correct v and z values.
I'm curious about why this might be happening and whether I can formulate the problem in a different way to avoid it. One suggestion I got was that infeasibility detection is usually not very good with non-convex problems.
Edit: I should mention that this isn't just a SCIP issue; SCIP just hits it with this particular set K. If, for instance, I use Bonmin, another MINLP solver, I can solve the problem for this particular K, but if you expand K to go up to 15, then Bonmin detects infeasibility even though the problem remains feasible. For that K, I have yet to find a solver that actually works. I've also tried MINLP solvers based on FILTER. I have yet to try BARON, since it only takes GAMS input.
There are very good remarks about modeling issues regarding, e.g., big-M constraints in the comments to your original question. Numerical issues can indeed cause troubles, especially when nonlinear constraints are present.
Depending on how deep you would like to dive into that matter, I see 3 options for you:
You can decrease the numeric precision by tuning the parameters numerics/feastol, numerics/epsilon, and numerics/lpfeastol. Save the following lines in a file "scip.set" in the working directory from which you call scipampl:
# absolute values smaller than this are considered zero
# [type: real, range: [1e-20,0.001], default: 1e-09]
numerics/epsilon = 1e-07
# absolute values of sums smaller than this are considered zero
# [type: real, range: [1e-17,0.001], default: 1e-06]
numerics/sumepsilon = 1e-05
# feasibility tolerance for constraints
# [type: real, range: [1e-17,0.001], default: 1e-06]
numerics/feastol = 1e-05
# primal feasibility tolerance of LP solver
# [type: real, range: [1e-17,0.001], default: 1e-06]
numerics/lpfeastol = 1e-05
You can now test different numerical precisions within scipampl by modifying the file scip.set.
Save the solution you obtain by fixing your x and y variables. If you pass this solution to the model without the fixings, you get a message about what caused the infeasibility. Usually, you will get a message that some variable bound or constraint is violated slightly outside a tolerance.
If you want to know precisely through which presolver a solution becomes infeasible, or if the former approach does not show any violation, SCIP offers the functionality to read in a debug solution. Specify the solution file "debug.sol" by uncommenting the following line in src/scip/debug.h
/* #define SCIP_DEBUG_SOLUTION "debug.sol" */
and recompile SCIP and SCIPAmpl by using
make DBG=true
SCIP checks the debug solution against every presolving reduction and outputs the presolver that causes the trouble.
I hope this is useful for you.
Looking deeper into this instance, SCIP seems to do something wrong in presolve.
In cons_nonlinear.c:7816 (function consPresolNonlinear), remove the line
if( nrounds == 0 )
so that SCIPexprgraphPropagateVarBounds is executed in any case.
That seems to fix the issue.

How does the Random module get tested in OCaml?

OCaml has a Random module, and I am wondering how it tests itself for randomness. However, I don't have a clue what exactly they are doing. I understand it runs a chi-square test plus two more tests for dependencies. Here is the code for the testing part:
chi-square test
(* Return the sum of the squares of v[i0,i1[ *)
let rec sumsq v i0 i1 =
  if i0 >= i1 then 0.0
  else if i1 = i0 + 1 then Pervasives.float v.(i0) *. Pervasives.float v.(i0)
  else sumsq v i0 ((i0+i1)/2) +. sumsq v ((i0+i1)/2) i1
;;

let chisquare g n r =
  if n <= 10 * r then invalid_arg "chisquare";
  let f = Array.make r 0 in
  for i = 1 to n do
    let t = g r in
    f.(t) <- f.(t) + 1
  done;
  let t = sumsq f 0 r
  and r = Pervasives.float r
  and n = Pervasives.float n in
  let sr = 2.0 *. sqrt r in
  (r -. sr, (r *. t /. n) -. n, r +. sr)
;;
Q1: Why do they write the sum of squares like that?
It seems it is just summing up all the squares. Why not write it like this:
let rec sumsq v i0 i1 =
  if i0 >= i1 then 0.0
  else Pervasives.float v.(i0) *. Pervasives.float v.(i0) +. (sumsq v (i0+1) i1)
Q2: Why do they seem to use a different formula for chisquare?
From the chi-squared test wiki, the formula is
chi^2 = sum over i of (O_i - E_i)^2 / E_i, where O_i is the observed count and E_i the expected count.
But it seems they are using a different formula; what's behind the scenes?
The other two dependency tests
(* This is to test for linear dependencies between successive random numbers. *)
let st = ref 0;;
let init_diff r = st := int r;;
let diff r =
  let x1 = !st
  and x2 = int r
  in
  st := x2;
  if x1 >= x2 then
    x1 - x2
  else
    r + x1 - x2
;;

let st1 = ref 0
and st2 = ref 0
;;

(* This is to test for quadratic dependencies between successive random numbers. *)
let init_diff2 r = st1 := int r; st2 := int r;;
let diff2 r =
  let x1 = !st1
  and x2 = !st2
  and x3 = int r
  in
  st1 := x2;
  st2 := x3;
  (x3 - x2 - x2 + x1 + 2*r) mod r
;;
Q3: I don't really know these two tests; can someone enlighten me?
Q1:
It's a question of memory usage. You will notice that for large arrays, your implementation of sumsq will fail with "Stack overflow during evaluation" (on my laptop, it fails for r = 200000). This is because before adding Pervasives.float v.(i0) *. Pervasives.float v.(i0) to (sumsq v (i0+1) i1), you have to compute the latter. So it's not until you have computed the result of the last call of sumsq that you can start "going up the stack" and adding everything. Clearly, sumsq is going to be called r times in your case, so you will have to keep track of r calls.
By contrast, with their approach they only have to keep track of log(r) calls, because once sumsq has been computed for half the array, you only need to keep the result of the corresponding call (you can forget about all the other calls that you had to make to compute it).
However, there are other ways of achieving this result, and I'm not sure why they chose this one (maybe somebody will be able to tell?). If you want to know more about the problems linked to recursion and memory, you should probably check the Wikipedia article on tail recursion. If you want to know more about the technique they used here, you should check the Wikipedia article on divide-and-conquer algorithms -- be careful though, because here we are talking about memory, and that article will probably talk a lot about time complexity (speed).
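To make the depth difference concrete, here is a quick illustration in Python rather than OCaml (the function names are made up; only the shape of the two recursions matters):
def depth_linear(i0, i1):
    # one stack frame per element, like the question's sumsq: depth = i1 - i0
    return 0 if i0 >= i1 else 1 + depth_linear(i0 + 1, i1)
def depth_split(i0, i1):
    # divide and conquer, like the library's sumsq: depth grows like log2(i1 - i0)
    if i1 - i0 <= 1:
        return 1
    mid = (i0 + i1) // 2
    return 1 + max(depth_split(i0, mid), depth_split(mid, i1))
print(depth_linear(0, 900))   # 900
print(depth_split(0, 900))    # 11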
Q2:
You should look more closely at both expressions. Here, all the E_i are equal to n/r. If you substitute this into the expression you gave, you will find the same expression that they use: (r *. t /. n) -. n. I didn't check the values of the bounds, but since you have a chi-squared distribution with r - 1 (or r - 2) degrees of freedom, and r is quite large, it's not surprising to see them use this kind of confidence interval. The Wikipedia article you mentioned should help you figure out fairly easily which confidence interval they use exactly.
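To spell out the substitution (standard algebra, with O_i the observed count f.(i), E_i = n/r, and sum_i O_i = n):
\chi^2 = \sum_{i=1}^{r} \frac{(O_i - n/r)^2}{n/r}
       = \frac{r}{n}\sum_i O_i^2 - 2\sum_i O_i + n
       = \frac{r\,t}{n} - n, \qquad t = \sum_i O_i^2
which is exactly the middle component (r *. t /. n) -. n of the triple returned by chisquare.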
Good luck!
Edit: Oops, I forgot about Q3. I don't know these tests either, but I'm sure you should be able to find more about them by googling something like "linear dependency between consecutive numbers" or something. =)
Edit 2: In reply to Jackson Tale's June 29 question about the confidence interval:
They should indeed test it against the Chi-squared distribution -- or, rather, use the Chi-squared distribution to find a confidence interval. However, because of the central limit theorem, the Chi-squared distribution with k degrees of freedom converges to a normal law with mean k and variance 2k. A classical result is that the 95% confidence interval for the normal law is approximately [μ - 1.96 σ, μ + 1.96 σ], where μ is the mean and σ the standard deviation -- so that's roughly the mean ± twice the standard deviation. Here, the number of degrees of freedom is (I think) r - 1 ~ r (because r is large) so that's why I said I wasn't surprised by a confidence interval of the form [r - 2 sqrt(r), r + 2 sqrt(r)]. Nevertheless, now that I think about it I can't see why they don't use ± 2 sqrt(2 r)... But I might have missed something. And anyway, even if I was correct, since sqrt(2) > 1, they get a more stringent confidence interval, so I guess that's not really a problem. But they should document what they're doing a bit more... I mean, the tests that they're using are probably pretty standard so most likely most people reading their code will know what they're doing, but still...
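Written out, the approximation described above is:
\chi^2_k \approx \mathcal{N}(k,\, 2k) \;\Rightarrow\; \text{95\% interval} \approx \left[\, k - 1.96\sqrt{2k},\; k + 1.96\sqrt{2k} \,\right], \qquad k \approx r,
while the code uses the narrower (hence more stringent) bounds r - 2*sqrt(r) and r + 2*sqrt(r).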
Also, you should note that, as is often the case, this kind of test is not conclusive: generally, you want to show that something has some kind of effect. So you formulate two hypotheses: the null hypothesis, "there is no effect", and the alternative hypothesis, "there is an effect". Then, you show that, given your data, the probability that the null hypothesis holds is very low. So you conclude that the alternative hypothesis is (most likely) true -- i.e. that there is some kind of effect. This is conclusive. Here, what we would like to show is that the random number generator is good. So we don't want to show that the numbers it produces differ from some law, but that they conform to it. The only way to do that is to perform as many tests as possible showing that the numbers produced have the same properties as randomly generated ones. But the only conclusion we can draw is "we were not able to find a difference between the actual data and what we would have observed, had they really been randomly generated". But this is not a lack of rigor from the OCaml developers: people always do that (e.g., a lot of tests require, say, normality, so before performing these tests you try to find a test which would show that your variable is not normally distributed, and when you can't find any, you say "oh well, the normality of this variable is probably sufficient for my subsequent tests to hold") -- simply because there is no other way to do it...
Anyway, I'm no statistician and the considerations above are simply my two cents, so you should be careful. For instance, I'm sure there is a better reason why they're using this particular confidence interval. I also think you should be able to figure it out if you write everything down carefully to make sure about what they're doing exactly.

Performance algorithm - Ordering - Tree (data structure) only solution?

I have a problem at hand; at first sight it looks easy, and it is, but I am looking for some other (maybe easier) solution:
Expressions:
V0
V1
V2
V3
V4
SumA = V1 + V2
SumB = SumA + V3
SumC = SumB + SumA
SumD = SumC + V0
As we can see here, the "base" variables are V0, V1, V2, V3 and V4 (the value of each one of them is returned from a DB query).
The user asks the software to return the result of V1 and SumC.
Solution that I know:
Find all necessary variables: V1, SumC, SumB, SumA, V3, V2
For performance, I want to process the math of each variable just once.
This means that I need to order the expressions from "base expressions" to "top variables".
At this point the only solution I see is of the "tree (data structure)" kind: get V1, V2 and V3,
then get SumA, then SumB, and only at last SumC.
Is there any other way to solve this problem?
The final objective is to use this algorithm with more complex variables and several "middle variables", so performance is critical; I can't perform the same math operation more than once.
I am not sure I completely understand, but I think you are referring to common subexpression elimination [or something similar to it], which is a very common compiler optimization.
One common way of doing this optimization is to use a graph [which is actually a DAG] of the expressions in the program and to iteratively add new expressions. The "sources" in your DAG are all the initial variables [V0, V1, V2, V3, V4 in your example]. You "know" that an expression is redundant if you have already calculated it, and you can avoid recalculating it; a rough sketch of the idea follows below.
These lecture notes seem to be a decent, more detailed explanation [though I admit I did not read all of it].
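A small Python sketch of that idea, purely illustrative (the deps table and evaluation_order are names made up here): list, for each derived variable, the names it depends on, then visit the DAG depth-first so that each expression is scheduled exactly once, base expressions first:
# each derived variable lists the names it is built from
deps = {
    "SumA": ("V1", "V2"),
    "SumB": ("SumA", "V3"),
    "SumC": ("SumB", "SumA"),
    "SumD": ("SumC", "V0"),
}
def evaluation_order(wanted):
    """Return the derived variables to compute, base-first, each exactly once."""
    order, seen = [], set()
    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in deps.get(name, ()):      # base variables have no entry in deps
            visit(dep)
        if name in deps:                    # only derived variables need computing
            order.append(name)
    for name in wanted:
        visit(name)
    return order
print(evaluation_order(["V1", "SumC"]))     # ['SumA', 'SumB', 'SumC']
Evaluating the expressions in that order guarantees every intermediate value (SumA, SumB) is computed exactly once, and SumD is never touched because nobody asked for it.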
First of all, you need to build a tree with all the expressions. Trees are the simplest data structure for this case.
Now let's assume you have these formulas:
SumA = v1 + v2
SumB = v1 + v2 + v3
SumC = ...
and the user asks for SumB (so you know how to calculate SumC but to make the user happy, you don't have to).
In memory, this looks like this:
SumA = Add( v1, v2 )
SumB = Add( Add( v1, v2 ), v3 )
The next step is to define compare operators which tell whether two sub-trees are the same. Running those, you will notice that Add( v1, v2 ) appears twice, so you can optimize:
SumA = Add( v1, v2 )
SumB = Add( SumA, v3 )
This means you can achieve the result with the minimum number of calculations. The next step is to add caching to your operators: when someone asks for their value, they should cache it so that the next getValue() call can return the last result.
That means evaluating either SumA or SumB will fill the cache for SumA. Since you never ask for the value of SumC, it is never calculated and therefore costs nothing.
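A rough Python version of those two steps, with made-up class and method names (structural keys to detect the shared sub-tree, plus a cache in get_value()):
class Var:
    def __init__(self, name, value):
        self.name, self.value = name, value
    def get_value(self):
        return self.value
    def key(self):                          # identity used for structural comparison
        return self.name
class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right
        self._cache = None
    def get_value(self):
        if self._cache is None:             # computed at most once, then cached
            self._cache = self.left.get_value() + self.right.get_value()
        return self._cache
    def key(self):
        return ("Add", self.left.key(), self.right.key())
v1, v2, v3 = Var("v1", 2), Var("v2", 3), Var("v3", 4)
sum_a = Add(v1, v2)
sum_b = Add(Add(v1, v2), v3)
# Add(v1, v2) inside sum_b has the same key as sum_a, so share sum_a instead
if sum_b.left.key() == sum_a.key():
    sum_b = Add(sum_a, v3)
print(sum_b.get_value())                    # 9 -- evaluates sum_a once, fills its cache
print(sum_a.get_value())                    # 5 -- served from sum_a's cache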
Maybe you could simplify it into this and eliminate the middle step:
SumA = (V1 + V2)*2
SumC = V3 + SumA
The only way to speed this up is to use serialisation (pipelining) at a level you can't reach programmatically unless you use your own hardware. Example:
Please ignore note on top right, this is stolen from my script :)
Case A:
100 * 4 cycles
Case B:
The first result takes 3 cycles, each next takes only 1 (serialisation, Ford-factory like): 102 cycles in total.
102 vs 400: roughly 4x the speed.
Modern CPUs can do this to some extent automatically, but it's pretty hard to measure.
I've heard that ICC (the Intel C compiler) optimizes its assembly to exploit this as much as possible; maybe that's partially why it beats everything else on Intel CPUs :)
