FindRoot vs Solve, NSolve and Reduce - wolfram-mathematica

First some non-essential context for fun. My real question is far below. Please don't touch the dial.
I'm playing with the new probabilistic functions of Mathematica 8. Goal is to do a simple power analysis. The power of an experiment is 1 minus the probability of a type II error (i.e., announcing 'no effect' when there is an effect in reality).
As an example I chose an experiment to determine whether a coin is fair. Suppose the probability to throw tails is given by b (a fair coin has b=0.5), then the power to determine that the coin is biased for an experiment with n coin flips is given by
1 - Probability[-in <= x - n/2 <= in, x \[Distributed] BinomialDistribution[n, b]]
with in the size of the deviation from the expected mean for a fair coin that I am willing to call not suspicious (in is chosen so that, for a fair coin flipped n times, the number of tails falls within mean +/- in about 95% of the time; this, BTW, determines the size of the type I error, the probability of incorrectly claiming the existence of an effect).
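(A quick check I added, using the same n and in as below: the probability that a fair coin stays within mean +/- in, i.e. 1 minus the type I error, should come out a bit above 0.95.)
n = 40; in = 6;
Probability[-in <= x - n/2 <= in, x \[Distributed] BinomialDistribution[n, 1/2]] // N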
Mathematica nicely draws a plot of the calculated power:
n = 40;
in = 6;
Plot[1-Probability[-in<=x-n/2<=in,x \[Distributed] BinomialDistribution[n, b]], {b, 0, 1},
Epilog -> Line[{{0, 0.85}, {1, 0.85}}], Frame -> True,
FrameLabel -> {"P(tail)", "Power", "", ""},
BaseStyle -> {FontFamily -> "Arial", FontSize -> 16,
FontWeight -> Bold}, ImageSize -> 500]
I drew a line at a power of 85%, which is generally considered to be a reasonable amount of power. Now, all I want is the points where the power curve intersects with this line. This tells me the minimum bias the coin must have so that I have a reasonable expectation to find it in an experiment with 40 flips.
So, I tried:
In[47]:= Solve[ Probability[-in <= x - n/2 <= in,
x \[Distributed] BinomialDistribution[n, b]] == 0.15 &&
0 <= b <= 1, b]
Out[47]= {{b -> 0.75}}
This fails miserably, because for b = 0.75 the power is:
In[54]:= 1 - Probability[-in <= x - n/2 <= in, x \[Distributed] BinomialDistribution[n, 0.75]]
Out[54]= 0.896768
NSolve finds the same result. Reduce does the following:
In[55]:= res = Reduce[Probability[-in <= x - n/2 <= in,
x \[Distributed] BinomialDistribution[n, b]] == 0.15 &&
0 <= b <= 1, b, Reals]
Out[55]= b == 0.265122 || b == 0.73635 || b == 0.801548 ||
b == 0.825269 || b == 0.844398 || b == 0.894066 || b == 0.932018 ||
b == 0.957616 || b == 0.987099
In[56]:= 1 -Probability[-in <= x - n/2 <= in,
x \[Distributed] BinomialDistribution[n, b]] /. {ToRules[res]}
Out[56]= {0.85, 0.855032, 0.981807, 0.994014, 0.99799, 0.999965, 1., 1., 1.}
So, Reduce manages to find the two solutions, but it finds quite a few others that are dead wrong.
FindRoot works best here:
In[57]:= FindRoot[{Probability[-in <= x - n/2 <= in,
x \[Distributed] BinomialDistribution[n, b]] - 0.15`}, {b, 0.2, 0, 0.5}]
FindRoot[{Probability[-in <= x - n/2 <= in,
x \[Distributed] BinomialDistribution[n, b]] - 0.15`}, {b, 0.8, 0.5, 1}]
Out[57]= {b -> 0.265122}
Out[58]= {b -> 0.734878}
OK, long introduction. My question is: why do Solve, NSolve, and Reduce fail so miserably (and silently!) here? IMHO, it can't be numerical accuracy since the power values found for the various solutions seem to be correct (they lie perfectly on the power curve) and are considerably removed from the real solution.
For the mma8-deprived Mr.Wizard: The expression for the power is a heavy one:
In[42]:= Probability[-in <= x - n/2 <= in,
x \[Distributed] BinomialDistribution[n, b]]
Out[42]= 23206929840 (1 - b)^26 b^14 + 40225345056 (1 - b)^25 b^15 +
62852101650 (1 - b)^24 b^16 + 88732378800 (1 - b)^23 b^17 +
113380261800 (1 - b)^22 b^18 + 131282408400 (1 - b)^21 b^19 +
137846528820 (1 - b)^20 b^20 + 131282408400 (1 - b)^19 b^21 +
113380261800 (1 - b)^18 b^22 + 88732378800 (1 - b)^17 b^23 +
62852101650 (1 - b)^16 b^24 + 40225345056 (1 - b)^15 b^25 +
23206929840 (1 - b)^14 b^26
and I wouldn't have expected Solve to handle this, but I had high hopes for NSolve and Reduce. Note that for n=30, in=5 Solve, NSolve, Reduce and FindRoot all find the same, correct solutions (of course, the polynomial order is lower there).
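(For reference, the lower-order case mentioned above can be reproduced with the same kind of call; this snippet is my addition.)
n = 30; in = 5;
Reduce[Probability[-in <= x - n/2 <= in,
  x \[Distributed] BinomialDistribution[n, b]] == 0.15 && 0 <= b <= 1, b, Reals]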

I think the problem is just the numerical instability of finding roots of high-order polynomials:
In[1]:= n=40; in=6;
p[b_]:= Probability[-in<=x-n/2<=in,
x\[Distributed]BinomialDistribution[n,b]]
In[3]:= Solve[p[b]==0.15 && 0<=b<=1, b, MaxExtraConditions->0]
1-p[b]/.%
Out[3]= {{b->0.75}}
Out[4]= {0.896768}
In[5]:= Solve[p[b]==0.15 && 0<=b<=1, b, MaxExtraConditions->1]
1-p[b]/.%
Out[5]= {{b->0.265122},{b->0.736383},{b->0.801116},{b->0.825711},{b->0.845658},{b->0.889992},{b->0.931526},{b->0.958879},{b->0.986398}}
Out[6]= {0.85,0.855143,0.981474,0.994151,0.998143,0.999946,1.,1.,1.}
In[7]:= Solve[p[b]==3/20 && 0<=b<=1, b, MaxExtraConditions->0]//Short
1-p[b]/.%//N
Out[7]//Short= {{b->Root[-1+<<39>>+108299005920 #1^40&,2]},{b->Root[<<1>>&,3]}}
Out[8]= {0.85,0.85}
In[9]:= Solve[p[b]==0.15`100 && 0<=b<=1, b, MaxExtraConditions->0]//N
1-p[b]/.%
Out[9]= {{b->0.265122},{b->0.734878}}
Out[10]= {0.85,0.85}
(n.b. MaxExtraConditions->0 is actually the default option, so it could have been left out of the above.)
Both Solve and Reduce are simply generating Root objects
and when given inexact coefficients, they are automatically numerically evaluated.
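The same mechanism can be seen on a toy polynomial (my example, not from the original answer):
Solve[x^5 - x - 1 == 0, x]  (* exact coefficients: Root objects are returned *)
Solve[x^5 - x - 1. == 0, x] (* inexact coefficient: the roots come back as machine numbers *)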
If you look at the (shortened) output Out[7] then you'll see the Root of the full 40th order polynomial:
In[12]:= Expand@(20/3 p[b] - 1)
Out[12]= -1 + 154712865600 b^14 - 3754365538560 b^15 + 43996471155000 b^16 -
331267547520000 b^17 + 1798966820560000 b^18 -
7498851167808000 b^19 + 24933680132961600 b^20 -
67846748661120000 b^21 + 153811663157880000 b^22 -
294248399084640000 b^23 + 479379683508726000 b^24 -
669388358063093760 b^25 + 804553314979680000 b^26 -
834351666126339200 b^27 + 747086226686186400 b^28 -
577064755104364800 b^29 + 383524395817442880 b^30 -
218363285636496000 b^31 + 105832631433929400 b^32 -
43287834659596800 b^33 + 14776188957129600 b^34 -
4150451102878080 b^35 + 942502182076000 b^36 -
168946449235200 b^37 + 22970789150400 b^38 -
2165980118400 b^39 + 108299005920 b^40
In[13]:= Plot[%, {b, -1/10, 11/10}, WorkingPrecision -> 100]
From this graph you can confirm that the zeros are at (approx)
{{b -> 0.265122}, {b -> 0.734878}}.
But, to get the flat parts on the right hand side of the bump requires lots of numerical cancellations. Here's what it looks like without the explicit WorkingPrecision option:
This graph makes it clear why Reduce (or Solve with MaxExtraConditions->1, see In[5] above) finds (from left to right) the first solution properly and the second solution almost correctly, followed by a whole load of crud.
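You can also see the cancellation directly by evaluating the expanded polynomial at a point in the flat region, once at machine precision and once at 50 digits (my addition, assuming the definition of p above):
poly = Expand[20/3 p[b] - 1];
{poly /. b -> 0.9, N[poly /. b -> 9/10, 50]}
(* the two numbers should disagree noticeably; the difference is the precision lost to cancellation *)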

Different numeric methods will fare differently when handling this.
(1) The ones that find all polynomial roots have the most difficult job, in that they may need to deal with deflated polynomials. FindRoot is off the hook there.
(2) The polynomial is a perturbation of one with substantial multiplicity. I would expect numeric methods to have trouble.
(3) The roots are all within 1-2 orders of magnitude in size. So this is not so far from generally "bad" polynomials with roots around the unit circle.
(4) Most difficult is handling Solve[numeric eqn and ineq]. This must combine inequality solving methods (i.e. cylindrical decomposition) with machine arithmetic. Expect little mercy. Okay, this is univariate, so it amounts to Sturm sequences or Descartes' Rule of Signs. Still not numerically well behaved.
Here are some experiments using various method settings.
n = 40; in = 6;
p[b_] := Probability[-in <= x - n/2 <= in,
x \[Distributed] BinomialDistribution[n, b]]
r1 = NRoots[p[b] == .15, b, Method -> "JenkinsTraub"];
r2 = NRoots[p[b] == .15, b, Method -> "Aberth"];
r3 = NRoots[p[b] == .15, b, Method -> "CompanionMatrix"];
r4 = NSolve[p[b] == .15, b];
r5 = Solve[p[b] == 0.15, b];
r6 = Solve[p[b] == 0.15 && Element[b, Reals], b];
r7 = N[Solve[p[b] == 15/100 && Element[b, Reals], b]];
r8 = N[Solve[p[b] == 15/100, b]];
Sort[Cases[b /. {ToRules[r1]}, _Real]]
Sort[Cases[b /. {ToRules[r2]}, _Real]]
Sort[Cases[b /. {ToRules[r3]}, _Real]]
Sort[Cases[b /. r4, _Real]]
Sort[Cases[b /. r5, _Real]]
Sort[Cases[b /. r6, _Real]]
Sort[Cases[b /. r7, _Real]]
Sort[Cases[b /. r8, _Real]]
{-0.128504, 0.265122, 0.728, 1.1807, 1.20794, 1.22063}
{-0.128504, 0.265122, 0.736383, 0.801116, 0.825711, 0.845658, \
0.889992, 0.931526, 0.958879, 0.986398, 1.06506, 1.08208, 1.18361, \
1.19648, 1.24659, 1.25157}
{-0.128504, 0.265122, 0.733751, 0.834331, 0.834331, 0.879148, \
0.879148, 0.910323, 0.97317, 0.97317, 1.08099, 1.08099, 1.17529, \
1.17529, 1.23052, 1.23052}
{-0.128504, 0.265122, 0.736383, 0.801116, 0.825711, 0.845658, \
0.889992, 0.931526, 0.958879, 0.986398, 1.06506, 1.08208, 1.18361, \
1.19648, 1.24659, 1.25157}
{-0.128504, 0.265122, 0.736383, 0.801116, 0.825711, 0.845658, \
0.889992, 0.931526, 0.958879, 0.986398, 1.06506, 1.08208, 1.18361, \
1.19648, 1.24659, 1.25157}
{-0.128504, 0.75}
{-0.128504, 0.265122, 0.734878, 1.1285}
{-0.128504, 0.265122, 0.734878, 1.1285}
It looks like NSolve is using NRoots with Aberth's method, and Solve might just be calling NSolve.
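That guess can be checked directly (my addition): the real roots from the Aberth run and from NSolve should coincide.
Sort[Cases[b /. {ToRules[r2]}, _Real]] == Sort[Cases[b /. r4, _Real]]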
The distinct solution sets seem to be all over the map. Actually many of the numeric ones that claim to be real (but aren't) might not be so bad. I'll compare magnitudes of one such set vs a set formed from numericizing exact root objects (a generally safe process).
mags4 = Sort[Abs[b /. r4]]
Out[77]= {0.128504, 0.129867, 0.129867, 0.13413, 0.13413, 0.141881, \
0.141881, 0.154398, 0.154398, 0.174443, 0.174443, 0.209069, 0.209069, \
0.265122, 0.543986, 0.543986, 0.575831, 0.575831, 0.685011, 0.685011, \
0.736383, 0.801116, 0.825711, 0.845658, 0.889992, 0.902725, 0.902725, \
0.931526, 0.958879, 0.986398, 1.06506, 1.08208, 1.18361, 1.19648, \
1.24659, 1.25157, 1.44617, 1.44617, 4.25448, 4.25448}
mags8 = Sort[Abs[b /. r8]]
Out[78]= {0.128504, 0.129867, 0.129867, 0.13413, 0.13413, 0.141881, \
0.141881, 0.154398, 0.154398, 0.174443, 0.174443, 0.209069, 0.209069, \
0.265122, 0.543985, 0.543985, 0.575831, 0.575831, 0.685011, 0.685011, \
0.734878, 0.854255, 0.854255, 0.902725, 0.902725, 0.94963, 0.94963, \
1.01802, 1.01802, 1.06769, 1.06769, 1.10183, 1.10183, 1.12188, \
1.12188, 1.1285, 1.44617, 1.44617, 4.25448, 4.25448}
Chop[mags4 - mags8, 10^(-6)]
Out[82]= {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
0.00150522, -0.0531384, -0.0285437, -0.0570674, -0.0127339, \
-0.0469044, -0.0469044, -0.0864986, -0.0591449, -0.0812974, \
-0.00263812, -0.0197501, 0.0817724, 0.0745959, 0.124706, 0.123065, 0, \
0, 0, 0}
Daniel Lichtblau

Well, not a proper answer, but an interesting observation. Solve[ ] has the same behavior as Reduce[ ] when the magic (aka MaxExtraConditions) option is used:
n=40;
in=6;
Solve[Probability[-in<=x-n/2<=in,
x\[Distributed]BinomialDistribution[n,b]]==0.15 &&
0<=b<=1,b, MaxExtraConditions->1]
{{b -> 0.265122}, {b -> 0.736488}, {b -> 0.80151}, {b -> 0.825884},
{b -> 0.84573}, {b -> 0.890444}, {b -> 0.931972}, {b -> 0.960252},
{b -> 0.985554}}

Related

Hash collision rate calculation of 2-choice hashing

For a hash function, I can calculate its collision rate by simple/brute force math calculation:
We see that the collision probability of 32-bit hashing is quite high. In order to reduce the collision rate, I'm implementing a variant of 2-choice hashing, which calculates the hash key by two hash functions. I want to know how to calculate the collision probability of my new solution.
Thanks in advance.
I think my question falls into the case of the general birthday problem P(k, n, d): the probability that k events collide within n trials in a sample space of size d, where k equals 3 and d equals 2^32.
I did the numerical solving using the equation provided here:
Table[NSolve[
  n E^(-n/(d k)) - (d^(k - 1) k! Log[1/(1 - p)] (1 - n/(d (k + 1))))^(1/k) == 0 /.
    {p -> 0.5, k -> i, d -> 2^32}, n, Reals], {i, 2, 4}]
The results seem convincing.
{{{n -> 77163.2}}, {{n -> 4.25017*10^6}}, {{n -> 3.39364*10^7}}}
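As a rough check (my addition), substituting the k = 3 root back into the left-hand side should give a residual that is tiny compared with the ~10^6 size of the individual terms:
lhs = n E^(-n/(d k)) - (d^(k - 1) k! Log[1/(1 - p)] (1 - n/(d (k + 1))))^(1/k);
lhs /. {p -> 0.5, k -> 3, d -> 2^32, n -> 4.25017*10^6}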
I also tried the answer provided here and got a close enough result:
Block[{
M = 2^32,
sampleCount = 8000000,
options = Sequence @@ {PlotLegends -> "Expressions",
GridLines -> {{0.5}},
MeshFunctions -> {{x, y} |-> y}, Mesh -> {{0.5}},
MeshStyle -> Directive[PointSize[Medium], Red],
PlotRange -> {0, 1}}},
Plot[ 1 - Exp[-Binomial[k, 3]/M^2], {k, 1, sampleCount},
PlotLegends -> "Expressions",
Evaluate @ options]
]

Mathematica, solving non linear system of equations with lot of equations and variables

I need to find a square matrix A satisfying the equation
A.L.A = -17/18A -2(A.L.L + L.A.L + (L.L).A) + 3(A.L + L.A) -4L.L.L + 8L.L - 44/9L + 8/9*(ID)
,where L is a diagonal matrix L = {{2/3,0,0,0},{0,5/12,0,0},{0,0,11/12,0},{0,0,0,2/3}}.
I can find the answers in the case that A is of dimension 2 and 3, but there is a problem with dimension 4 and above.
Actually, the matrix A has to satisfy the equation A.A = A too, but with a suitable matrix L the equation above alone is enough.
This is my code:
A = Table[a[i,j],{i,1,4},{j,1,4}]
B = A.L.A
ID = IdentityMatrix[4]
M = -17/18A -2(A.L.L + L.A.L + (L.L).A) + 3(A.L + L.A) -4L.L.L + 8L.L - 44/9L + 8/9*(ID)
diff = (B - M) // ExpandAll // Flatten  (* this gives a system of 16 nonlinear equations *)
A1 = A/.Solve[diff == 0][[1]]
After running this code for quite some time, an error comes up saying there is not enough memory to compute.
In this case there are 16 equations and 16 variables. Some of the entries are parameters, but I just do not know which ones until I get the result.
I am not sure if there is any way to solve this problem. I need the answer to be rational (probably integer), which is theoretically possible.
Could this problem be solved by matrix equation or any other method? I see one problem for this is there are too many equations and variables.
This evaluates fairly quickly and with modest memory for a problem this size.
L = {{2/3, 0, 0, 0}, {0, 5/12, 0, 0}, {0, 0, 11/12, 0}, {0, 0, 0, 2/3}};
A = {{a, b, c, d}, {e, f, g, h}, {i, j, k, l}, {m, n, o, p}};
Reduce[{A.L.A == -17/18 A - 2 (A.L.L + L.A.L + (L.L).A) + 3 (A.L + L.A) -
4 L.L.L + 8 L.L - 44/9 L + 8/9*IdentityMatrix[4]},
{a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p}, Backsubstitution->True
]
Then you just have to sort through the 143 potential solutions that it returns.
You might be able to Select from those that satisfy your A.A==A. You can also use ToRules on the result returned from Reduce to put this into a form similar to that returned from Solve, but check this carefully to make certain it is doing what you expect.
Check this very carefully to make certain I haven't made any mistakes.
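A hedged sketch of that filtering step (my code; red is a placeholder name I use for the result of the Reduce call above):
sols = {ToRules[red]};                                      (* red = the Reduce output above, as Solve-style rule lists *)
idem = Select[sols, Simplify[(A.A == A) /. #] === True &];  (* keep only provably idempotent A *)
Length[idem]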

Mathematica: using simplify to do common sub-expression elimination and reduction in strength

So lately I have been toying around with how Mathematica's pattern matching and term rewriting might be put to good use in compiler optimizations...trying to highly optimize short blocks of code that are the inner parts of loops. Two common ways to reduce the amount of work it takes to evaluate an expression are to identify sub-expressions that occur more than once, store the result, and reuse it at subsequent points, and to use cheaper operations where possible. For instance, my understanding is that taking a square root takes more clock cycles than an addition or a multiplication. To be clear, I am interested in the cost in terms of floating point operations that evaluating the expression would take, not how long it takes Mathematica to evaluate it.
My first thought was that I would tackle the problem using Mathematica's Simplify function. It is possible to specify a complexity function that compares the relative simplicity of two expressions. I was going to create one using weights for the relevant arithmetic operations and add to this the LeafCount for the expression to account for the assignment operations that are required. That addresses the reduction-in-strength side, but it is the elimination of common subexpressions that has me tripped up.
I was thinking of adding common subexpression elimination to the possible transformation functions that Simplify uses. But for a large expression there could be many possible subexpressions that could be replaced, and it won't be possible to know what they are until you see the expression. I have written a function that gives the possible substitutions, but it seems like the transformation function you specify needs to return just a single possible transformation, at least from the examples in the documentation. Any thoughts on how one might get around this limitation? Does anyone have a better idea of how Simplify uses transformation functions that might hint at a direction forward?
I imagine that behind the scenes Simplify is doing some dynamic programming, trying different simplifications on different parts of the expression and returning the one with the lowest complexity score. Would I be better off trying to do this dynamic programming on my own, using common algebraic simplifications such as Factor and Collect?
EDIT: I added the code that generates possible sub-expressions to remove
(*traverses entire expression tree storing each node*)
AllSubExpressions[x_, accum_] := Module[{result, i, len},
len = Length[x];
result = Append[accum, x];
If[LeafCount[x] > 1,
For[i = 1, i <= len, i++,
result = AllSubExpressions[x[[i]], result];
];
];
Return[Sort[result, LeafCount[#1] > LeafCount[#2] &]]
]
CommonSubExpressions[statements_] := Module[{common, subexpressions},
subexpressions = AllSubExpressions[statements, {}];
(*get the unique set of sub expressions*)
common = DeleteDuplicates[subexpressions];
(*remove constants from the list*)
common = Select[common, LeafCount[#] > 1 &];
(*only keep subexpressions that occur more than once*)
common = Select[common, Count[subexpressions, #] > 1 &];
(*output the list of possible subexpressions to replace with the \
number of occurrences*)
Return[common];
]
Once a common sub-expression is chosen from the list returned by CommonSubExpressions the function that does the replacement is below.
eliminateCSE[statements_, expr_] := Module[{temp},
temp = Unique["r"];
Prepend[ReplaceAll[statements, expr -> temp], temp[expr]]
]
At the risk of this question getting long, I will put a little example code up. I thought a decent expression to try to optimize would be the classical Runge-Kutta method for solving differential equations.
Input:
nextY=statements[y + 1/6 h (f[t, n] + 2 f[0.5 h + t, y + 0.5 h f[t, n]] +
2 f[0.5 h + t, y + 0.5 h f[0.5 h + t, y + 0.5 h f[t, n]]] +
f[h + t,
y + h f[0.5 h + t, y + 0.5 h f[0.5 h + t, y + 0.5 h f[t, n]]]])];
possibleTransformations=CommonSubExpressions[nextY]
transformed=eliminateCSE[nextY, First[possibleTransformations]]
Output:
{f[0.5 h + t, y + 0.5 h f[0.5 h + t, y + 0.5 h f[t, n]]],
y + 0.5 h f[0.5 h + t, y + 0.5 h f[t, n]],
0.5 h f[0.5 h + t, y + 0.5 h f[t, n]],
f[0.5 h + t, y + 0.5 h f[t, n]], y + 0.5 h f[t, n], 0.5 h f[t, n],
0.5 h + t, f[t, n], 0.5 h}
statements[r1[f[0.5 h + t, y + 0.5 h f[0.5 h + t, y + 0.5 h f[t, n]]]],
y + 1/6 h (2 r1 + f[t, n] + 2 f[0.5 h + t, y + 0.5 h f[t, n]] +
f[h + t, h r1 + y])]
Finally, the code to judge the relative cost of different expressions is below. The weights are conceptual at this point as that is still an area I am researching.
Input:
cost[e_] :=
Total[MapThread[
Count[e, #1, Infinity, Heads -> True]*#2 &, {{Plus, Times, Sqrt,
f}, {1, 2, 5, 10}}]]
cost[transformed]
Output:
100
There are also some routines implemented by this author: http://stoney.sb.org/wordpress/2009/06/converting-symbolic-mathematica-expressions-to-c-code/
I packaged it into a *.M file and fixed a bug (if the expression has no repeated subexpressions then it dies), and I am trying to find the author's contact info to see if I can upload the modified code to pastebin or wherever.
EDIT: I have received permission from the author to upload it and have pasted it here: http://pastebin.com/fjYiR0B3
To identify repeating subexpressions, you could use something like this
(*helper functions to add Dictionary-like functionality*)
index[downvalue_,
dict_] := (downvalue[[1]] /. HoldPattern[dict[x_]] -> x) //
ReleaseHold;
value[downvalue_] := downvalue[[-1]];
indices[dict_] :=
Map[#[[1]] /. {HoldPattern[dict[x_]] -> x} &, DownValues[dict]] //
ReleaseHold;
values[dict_] := Map[#[[-1]] &, DownValues[dict]];
items[dict_] := Map[{index[#, dict], value[#]} &, DownValues[dict]];
indexQ[dict_, index_] :=
If[MatchQ[dict[index], HoldPattern[dict[index]]], False, True];
(*count number of times each sub-expressions occurs *)
expr = Cos[x + Cos[Cos[x] + Sin[x]]] + Cos[Cos[x] + Sin[x]];
Map[(counts[#] = If[indexQ[counts, #], counts[#] + 1, 1]; #) &, expr,
Infinity];
items[counts] // Column
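Building on those helpers, one possible next step (my addition, not part of the original answer) is to keep only the compound sub-expressions that were seen more than once:
repeated = Select[items[counts], LeafCount[First[#]] > 1 && Last[#] > 1 &]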
I tried to mimic the dictionary compression function that appears on this blog: https://writings.stephenwolfram.com/2018/11/logic-explainability-and-the-future-of-understanding/
Here is what I made:
DictionaryCompress[expr_, count_, size_, func_] := Module[
{t, s, rule, rule1, rule2},
t = Tally@Level[expr, Depth[expr]];
s = Sort[
  Select[{First@#, Last@#, Depth[First@#]} & /@ t,
   (#[[2]] > count && #[[3]] > size) &],
  #1[[2]]*#1[[3]] < #2[[2]]*#2[[3]] &];
rule = MapIndexed[First[#1] -> func @@ #2 &, s];
rule = (# //. Cases[rule, Except[#]]) & /@ rule;
rule1 = Select[rule, ! FreeQ[#, Plus] &];
rule2 = Complement[rule, rule1];
rule = rule1 //. (Reverse /@ rule2);
rule = rule /. MapIndexed[Last[#1] -> func @@ #2 &, rule];
{
 expr //. rule,
 Reverse /@ rule
}
];
poly = Sum[Subscript[c, k] x^k, {k, 0, 4}];
sol = Solve[poly == 0, x];
expr = x /. sol;
Column[{Column[
MapIndexed[
Style[TraditionalForm[Subscript[x, First[#2]] == #], 20] &, #[[
1]]], Spacings -> 1],
Column[Style[#, 20] & /@ #[[2]], Spacings -> 1, Frame -> All]
}] &@DictionaryCompress[expr, 1, 1,
Framed[#, Background -> LightYellow] &]

In Mathematica, how can I define an arbitrary probability distribution?

I want an arbitrary function p[x] that integrates to 1 and for all x, 0 <= p[x] <= 1. Some kind of transformation rule?
You could use ProbabilityDistribution for this together with an undefined function of x:
dist = ProbabilityDistribution[p[x], {x, -Infinity, Infinity}];
It now knows a few rules to apply:
continuous probability density: probability of a single value is zero
In[26]:= Probability[x == 0, x \[Distributed] dist]
Out[26]= 0
the probability of having a value at all
In[28]:= Probability[x > 0 || x <= 0, x \[Distributed] dist]
Out[28]= 1
The CDF at - infinity
In[29]:= CDF[dist][-\[Infinity]]
Out[29]= 0
The CDF at + infinity
In[30]:= CDF[dist][\[Infinity]]
Out[30]= 1
The PDF
In[32]:= PDF[dist][x]
Out[32]= p[x]
However, it doesn't assume the PDF of the distribution is normalized:
In[33]:= Integrate[PDF[dist][x], {x, -Infinity, Infinity}]
Out[33]= Integrate[p[x], {x, -Infinity, Infinity}]
The latter can be taught, defining an UpValue for p:
p /: Integrate[p[x], {x, -Infinity, Infinity}] = 1;
Now it can integrate the PDF:
In[4]:= Integrate[PDF[dist][x], {x, -Infinity, Infinity}]
Out[4]= 1
You do know that your second requirement, i.e. 0 <= p[x] <= 1, is not generally true for probability density functions, don't you?
In case you're just asking for examples of density functions (PDFs) that match your criteria, here are two (out of uncountably many):
p(x) = 1 if 0 < x < 1
0 otherwise
p(x) = x/2 if 0 < x < 2
0 otherwise
We could even generalize those slightly:
p(x) = 1/k if 0 < x < k
0 otherwise
p(x) = 2x/k^2 if 0 < x < k
0 otherwise
The latter works for k >= 2.
We can even generalize that with another parameter to get a class of such functions with arbitrary exponent
p(x) = (a+1)/k^(a+1)*x^a if 0 < x < k
0 otherwise
which works for all a > 1 and k > a+1.
For more interesting examples I think you'll need to give more criteria.
You mention a transformation rule so perhaps you'd like to take an arbitrary bounded function on R1 and translate/scale it so that it's always between 0 and 1 and integrates to 1.
That will have a straightforward answer as long as you can get the min, max, and integral of the given function.
Go ahead and edit the question to ask that if that's indeed what you're looking for.
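If that is indeed what you want, here is a minimal sketch (entirely mine; the name toDensity is made up, and corner cases such as a constant function are ignored): shift the function to be nonnegative, normalize the area to 1, then stretch the x axis so the peak is exactly 1.
toDensity[f_, {a_, b_}] := Module[{min, z, m},
  min = MinValue[{f[x], a <= x <= b}, x];              (* shift so the function is nonnegative *)
  z = Integrate[f[x] - min, {x, a, b}];                (* area of the shifted function *)
  m = MaxValue[{(f[x] - min)/z, a <= x <= b}, x];      (* peak of the normalized density *)
  Function[t, Evaluate[Piecewise[{{(f[t/m] - min)/(z m), a m <= t <= b m}}, 0]]]]
For example, toDensity[Sin, {0, Pi}] should come out as Sin[2 t] on {0, Pi/2}, which integrates to 1 and never exceeds 1.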

find minimum of a function defined by integration in Mathematica

I need to find the minimum of a function f(t) = int g(t,x) dx over [0,1]. What I did in Mathematica is as follows:
f[t_] = NIntegrate[g[t,x],{x,-1,1}]
FindMinimum[f[t],{t,t0}]
However, Mathematica halts at the first try, because NIntegrate does not work with the symbolic t. It needs a specific value to evaluate. Although Plot[f[t],{t,0,1}] works perfectly, FindMinimum stops at the initial point.
I cannot replace NIntegrate by Integrate, because the function g is a bit complicated, and if you type Integrate, Mathematica just keeps running...
Any way to get around it? Thanks!
Try this:
In[58]:= g[t_, x_] := t^3 - t + x^2
In[59]:= f[t_?NumericQ] := NIntegrate[g[t, x], {x, -1, 1}]
In[60]:= FindMinimum[f[t], {t, 1}]
Out[60]= {-0.103134, {t -> 0.57735}}
In[61]:= Plot[f[t], {t, 0, 1}]
Two relevant changes I made to your code:
Define f with := instead of with =. This effectively gives a definition for f "later", when the user of f has supplied the values of the arguments. See SetDelayed.
Define f with t_?NumericQ instead of t_. This says, t can be anything numeric (Pi, 7, 0, etc). But not anything non-numeric (t, x, "foo", etc).
An ounce of analysis...
You can get an exact answer and completely avoid the heavy lifting of the numerical integration, as long as Mathematica can do symbolic integration of g[t,x] w.r.t x and then symbolic differentiation w.r.t. t. A less trivial example with a more complicated g[t,x] including polynomial products in x and t:
g[t_, x_] := t^2 + (7*t*x - (x^3)/13)^2;
xMax = 1; xMin = -1; f[t_?NumericQ] := NIntegrate[g[t, x], {x, xMin, xMax}];
tMin = 0; tMax = 1;Plot[f[t], {t, tMin, tMax}];
tNumericAtMin = t /. FindMinimum[f[t], {t, tMax}][[2]];
dig[t_, x_] := D[Integrate[g[t, x], x], t];
Print["Differentiated integral is ", dig[t, x]];
digAtXMax = dig[t, x] /. x -> xMax; digAtXMin = dig[t, x] /. x -> xMin;
tSymbolicAtMin = Resolve[digAtXMax - digAtXMin == 0 && tMin ≤ t ≤ tMax, {t}];
Print["Exact: ", tSymbolicAtMin[[2]]];
Print["Numeric: ", tNumericAtMin];
Print["Difference: ", tSymbolicAtMin [[2]] - tNumericAtMin // N];
with the result:
⁃Graphics⁃
Differentiated integral is 2 t x + 98 t x^3 / 3 - 14 x^5 / 65
Exact: 21/3380
Numeric: 0.00621302
Difference: -3.01143 x 10^-9
The minimum of the function can only be at zero points of its derivative, so why integrate in the first place?
You can use FindRoot or Solve to find the roots of the derivative.
Then you can verify that those points are really local minima by checking the second derivative (it should be positive there).
Then you can NIntegrate to find the minimum value of f - only one numerical integration!
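A hedged sketch of that route for the simple g from the accepted answer (names and code are mine, assuming g as defined there):
dg = D[g[t, x], t];                                   (* symbolic t-derivative of the integrand *)
fprime[tt_?NumericQ] := NIntegrate[dg /. t -> tt, {x, -1, 1}];  (* f'(t), differentiating under the integral sign *)
tStar = s /. FindRoot[fprime[s], {s, 0.5}];           (* zero of the derivative *)
{tStar, NIntegrate[g[tStar, x], {x, -1, 1}]}          (* location and value of the minimum *)
If the x-integral of the derivative can be done symbolically, even the NIntegrate inside fprime can be avoided, leaving just the single numerical integration for the minimum value.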
