Hash collision rate calculation of 2-choice hashing - algorithm

For a hash function, I can calculate its collision rate by simple/brute force math calculation:
We see that the collision probability of 32-bit hashing is quite high. In order to reduce the collision rate, I'm implementing a variant of 2-choice hashing, which calculates the hash key by two hash functions. I want to know how to calculate the collision probability of my new solution.
Thanks in advance.

I think my question falls into a case of general birthday problem P(k, n, d) that k events collide within n trials in sample space d where k equals 3 and d equals 2^32.
I did numerical solving using the equation provided by here
Table[NSolve[
n E^(-n/(
d k )) - (d^(k - 1) k! Log[1/(1 - p)] (1 - n/(d (k + 1))))^(1/
k) == 0 /. { p -> 0.5 , k -> i, d -> 2^32}, n, Reals], {i, 2, 4}]
The result seems convencible.
{{{n -> 77163.2}}, {{n -> 4.25017*10^6}}, {{n -> 3.39364*10^7}}}
I also tried the anwser provided here and got a close enough result:
Block[{
M = 2^32,
sampleCount = 8000000,
options = Sequence ## {PlotLegends -> "Expressions",
GridLines -> {{0.5}},
MeshFunctions -> {{x, y} |-> y}, Mesh -> {{0.5}},
MeshStyle -> Directive[PointSize[Medium], Red],
PlotRange -> {0, 1}}},
Plot[ 1 - Exp[-Binomial[k, 3]/M^2], {k, 1, sampleCount},
PlotLegends -> "Expressions",
Evaluate # options ]
]

Related

What is wrong with this implementation of IDA* algorithm in Haskell? Bad heuristic or simply bad code?

I am trying to write a haskell program which can solve rubiks' cube. Firstly I tried this, but did not figure a way out to avoid writing a whole lot of codes, so I tried using an IDA* search for this task.
But I do not know which heuristic is appropriate here: I tried dividing the problem into subproblems, and measuring the distance from being in a reduced state, but the result is disappointing: the program cannot reduce a cube that is three moves from a standard cube in a reasonable amount of time. I tried measuring parts of edges, and then either summing them, or using the maximums... but none of these works, and the result is almost identical.
So I want to know what the problem with the code is: is the heuristic I used non-admissible? Or is my code causing some infinite loops that I did not detect? Or both? And how to fix that? The code (the relevant parts) goes as follows:
--Some type declarations
data Colors = R | B | W | Y | G | O
type R3 = (Int, Int, Int)
type Cube = R3 -> Colors
points :: [R3] --list of coordinates of facelets of a cube; there are 48 of them.
mU :: Cube -> Cube --and other 5 similar moves.
type Actions = [Cube -> Cube]
turn :: Cube -> Actions -> Cube --chains the actions and turns the cube.
edges :: [R3] --The edges of cubes
totheu1 :: Cube -> Int -- Measures how far away the cube is from having the cross of the first layer solved.
totheu1 c = sum $ map (\d -> if d then 0 else 1)
[c (-2, 3, 0) == c (0, 3, 0),
c (2, 3, 0) == c (0, 3, 0),
c (0, 3, -2) == c (0, 3, 0),
c (0, 3, 2) == c (0, 3, 0),
c (0, 2, -3) == c (0, 0, -3),
c (-3, 2, 0) == c (-3, 0, 0),
c (0, 2, 3) == c (0, 0, 3),
c (3, 2, 0) == c (3, 0, 0)]
expandnr :: (Cube -> Cube) -> Cube -> [(Cube, String)] -- Generates a list of tuples of cubes and strings,
-- the result after applying a move, and the string represents that move, while avoiding moving on the same face as the last one,
-- and avoiding repetitions caused by commuting moves, like U * D = D * U.
type StateSpace = (Int, [String], Cube) -- Int -> f value, [String] = actions applied so far, Cube = result cube.
fstst :: StateSpace -> Int
fstst s#(x, y, z) = x
stst :: StateSpace -> [String]
stst s#(x, y, z) = y
cbst :: StateSpace -> Cube
cbst s#(x, y, z) = z
stage1 :: Cube -> StateSpace
stage1 c = (\(x, y, z) -> (x, [sconcat y], z)) t
where
bound = totheu1 c
t = looping c bound
looping c bound = do let re = search (c, [""]) (\j -> j) 0 bound
let found = totheu1 $ cbst re
if found == 0 then re else looping c found
sconcat [] = ""
sconcat (x:xs) = x ++ (sconcat xs)
straction :: String -> Actions -- Converts strings to actions
search :: (Cube, [String]) -> (Cube -> Cube) -> Int -> Int -> StateSpace
search cs#(c, s) k g bound
| f > bound = (f, s, c)
| totheu1 c == 0 = (0, s, c)
| otherwise = ms
where
f = g + totheu1 c
olis = do
(succs, st) <- expandnr k c
let [newact] = straction st
let t = search (succs, s ++ [st]) newact (g + 1) bound
return t
lis = map fstst olis
mlis = minimum lis
ms = olis !! (ind)
Just ind = elemIndex mlis lis
I know that this heuristic is inconsistent, but am not sure if it is really admissible, maybe the problem is its non-admissibility?
Any ideas, hints, and suggestions are well appreciated, thanks in advance.
Your heuristic is inadmissible. An admissible heuristic must be a lower bound on the real cost of a solution.
You are trying to use as a heuristic the number of side pieces of the first layer that aren't correct, or perhaps the number of faces of the side pieces of the first layer that aren't correct, which is what you have actually written. Either way the heuristic is inadmissible.
The following cube is only 1 move away from being solved, but 4 of the pieces in the first layer are in incorrect positions and 4 of the faces have the wrong color. Either heuristic would say that this puzzle will take at least 4 moves to solve when it can be solved in only 1 move. The heuristics are inadmissible, because they are not lower bounds on the real cost of the solution.

Solve linear system of equations with symbolic expressions

Hi I am trying to solve a linear system of equations with mathematica. I have 18 equations and 18 Unknowns and the coefficient matrix has full rank. All entries are symbolic since I am trying to solve the problem analytically. Unfortunately Mathematica never stops the evaluation. I have prepared a minimal working example:
n = 18
A = Table[AA[i, j], {i, 1, n}, {j, 1, n}];
A // MatrixForm
x = Table[xx[i], {i, 1, n}]
b = Table[bb[i], {i, 1, n}]
MatrixRank[A]
sol = Timing[Solve[{A.x == b}, x, Reals]]
A.x == b //. sol[[2]][[1]] // Simplify
For n=2,3,4,.. all works perfectly well. But with n=10... nothing works anymore.
Why has mathematica such problems solving this?
Is there a way to solve this problem?
Thanks for help,
Andreas
You simply need more memory:
The symbolic solution involves n+1 determinants, here is an estimate of the memory needed.
bc[n_] := (A = Det[Array[a, {n, n}]];ByteCount[A])
ListLogPlot[
t = Table[ {n, (n + 1) bc[n] /1024^3 // N} , {n,2,10}], Joined -> True]
extrapolating to n=18 we can see you'll need only about 10^8 Gigabytes..
(thats 1000x more than the largest supercomputers for anyone not getting the point )

quantiles of sums using copula distributions too slow

Trying to create a table for quantiles of the sum of two dependent random variables using built-in copula distributions (Clayton, Frank, Gumbel) with Beta marginals. Tried NProbability and FindRoot with various methods -- not fast enough.
An example of the copula-marginal combinations I need to explore is the following:
nProbClayton[t_?NumericQ, c_?NumericQ] :=
NProbability[ x + y <= t, {x, y} \[Distributed]
CopulaDistribution[{"Clayton", c}, {BetaDistribution[8, 2],
BetaDistribution[8, 2]}]]
For a single evaluation of the numeric probability using
nProbClayton[1.9, 1/10] // Timing // Quiet
I get
{4.914, 0.939718}
on a Vista 64bit Core2 Duo T9600 2.80GHz machine (MMA 8.0.4)
To get a quantile of the sum, using
FindRoot[nProbClayton[q, 1/10] == 1/100, {q, 1, 0, 2}// Timing // Quiet
with various methods
( `Method -> Automatic`, `Method -> "Brent"`, `Method -> "Secant"` )
takes about a minute to find a single quantile: Timings are
{48.781, {q -> 0.918646}}
{50.045, {q -> 0.918646}}
{65.396, {q -> 0.918646}}
For other copula-marginal combinations timings are marginally better.
Need: any tricks/methods to improve timings.
The CDF of a Clayton-Pareto copula with parameter c can be calculated according to
cdf[c_] := Module[{c1 = CDF[BetaDistribution[8, 2]]},
(c1[#1]^(-1/c) + c1[#2]^(-1/c) - 1)^(-c) &]
Then, cdf[c][t1,t2] is the probability that x<=t1 and y<=t2. This means that you can calculate the probability that x+y<=t according to
prob[t_?NumericQ, c_?NumericQ] :=
NIntegrate[Derivative[1, 0][cdf[c]][x, t - x], {x, 0, t}]
The timings I get on my machine are
prob[1.9, .1] // Timing
(* ==> {0.087518, 0.939825} *)
Note that I get a different value for the probability from the one in the original post. However, running nProbClayton[1.9,0.1] produces a warning about slow convergence which could mean that the result in the original post is off. Also, if I change x+y<=t to x+y>t in the original definition of nProbClayton and calculate 1-nProbClayton[1.9,0.1] I get 0.939825 (without warnings) which is the same result as above.
For the quantile of the sum I get
FindRoot[prob[q, .1] == .01, {q, 1, 0, 2}] // Timing
(* ==> {1.19123, {q -> 0.912486}} *)
Again, I get a different result from the one in the original post but similar to before, changing x+y<=t to x+y>t and calculating FindRoot[nProbClayton[q, 1/10] == 1-1/100, {q, 1, 0, 2}] returns the same value for q as above.

find minimum of a function defined by integration in Mathematica

I need to find the minimum of a function f(t) = int g(t,x) dx over [0,1]. What I did in mathematica is as follows:
f[t_] = NIntegrate[g[t,x],{x,-1,1}]
FindMinimum[f[t],{t,t0}]
However mathematica halts at the first try, because NIntegrate does not work with the symbolic t. It needs a specific value to evaluate. Although Plot[f[t],{t,0,1}] works perferctly, FindMinimum stops at the initial point.
I cannot replace NIntegrate by Integrate, because the function g is a bit complicated and if you type Integrate, mathematica just keep running...
Any way to get around it? Thanks!
Try this:
In[58]:= g[t_, x_] := t^3 - t + x^2
In[59]:= f[t_?NumericQ] := NIntegrate[g[t, x], {x, -1, 1}]
In[60]:= FindMinimum[f[t], {t, 1}]
Out[60]= {-0.103134, {t -> 0.57735}}
In[61]:= Plot[f[t], {t, 0, 1}]
Two relevant changes I made to your code:
Define f with := instead of with =. This effectively gives a definition for f "later", when the user of f has supplied the values of the arguments. See SetDelayed.
Define f with t_?NumericQ instead of t_. This says, t can be anything numeric (Pi, 7, 0, etc). But not anything non-numeric (t, x, "foo", etc).
An ounce of analysis...
You can get an exact answer and completely avoid the heavy lifting of the numerical integration, as long as Mathematica can do symbolic integration of g[t,x] w.r.t x and then symbolic differentiation w.r.t. t. A less trivial example with a more complicated g[t,x] including polynomial products in x and t:
g[t_, x_] := t^2 + (7*t*x - (x^3)/13)^2;
xMax = 1; xMin = -1; f[t_?NumericQ] := NIntegrate[g[t, x], {x, xMin, xMax}];
tMin = 0; tMax = 1;Plot[f[t], {t, tMin, tMax}];
tNumericAtMin = t /. FindMinimum[f[t], {t, tMax}][[2]];
dig[t_, x_] := D[Integrate[g[t, x], x], t];
Print["Differentiated integral is ", dig[t, x]];
digAtXMax = dig[t, x] /. x -> xMax; digAtXMin = dig[t, x] /. x -> xMin;
tSymbolicAtMin = Resolve[digAtXMax - digAtXMin == 0 && tMin ≤ t ≤ tMax, {t}];
Print["Exact: ", tSymbolicAtMin[[2]]];
Print["Numeric: ", tNumericAtMin];
Print["Difference: ", tSymbolicAtMin [[2]] - tNumericAtMin // N];
with the result:
⁃Graphics⁃
Differentiated integral is 2 t x + 98 t x^3 / 3 - 14 x^5 / 65
Exact: 21/3380
Numeric: 0.00621302
Difference: -3.01143 x 10^-9
Minimum of the function can be only at zero-points of it's derivate, so why to integrate in the first place?
You can use FindRoot or Solve to find roots of g
Then you can verify that points are really local minimums by checking derivates of g (it should be positive at that point).
Then you can NIntegrate to find minimum value of f - only one numerical integration!

Solving vector equations in Mathematica

I'm trying to figure out how to use Mathematica to solve systems of equations where some of the variables and coefficients are vectors. A simple example would be something like
where I know A, V, and the magnitude of P, and I have to solve for t and the direction of P. (Basically, given two rays A and B, where I know everything about A but only the origin and magnitude of B, figure out what the direction of B must be such that it intersects A.)
Now, I know how to solve this sort of thing by hand, but that's slow and error-prone, so I was hoping I could use Mathematica to speed things along and error-check me. However, I can't see how to get Mathematica to symbolically solve equations involving vectors like this.
I've looked in the VectorAnalysis package, without finding anything there that seems relevant; meanwhile the Linear Algebra package only seems to have a solver for linear systems (which this isn't, since I don't know t or P, just |P|).
I tried doing the simpleminded thing: expanding the vectors into their components (pretend they're 3D) and solving them as if I were trying to equate two parametric functions,
Solve[
{ Function[t, {Bx + Vx*t, By + Vy*t, Bz + Vz*t}][t] ==
Function[t, {Px*t, Py*t, Pz*t}][t],
Px^2 + Py^2 + Pz^2 == Q^2 } ,
{ t, Px, Py, Pz }
]
but the "solution" that spits out is a huge mess of coefficients and congestion. It also forces me to expand out each of the dimensions I feed it.
What I want is a nice symbolic solution in terms of dot products, cross products, and norms:
But I can't see how to tell Solve that some of the coefficients are vectors instead of scalars.
Is this possible? Can Mathematica give me symbolic solutions on vectors? Or should I just stick with No.2 Pencil technology?
(Just to be clear, I'm not interested in the solution to the particular equation at top -- I'm asking if I can use Mathematica to solve computational geometry problems like that generally without my having to express everything as an explicit matrix of {Ax, Ay, Az}, etc.)
With Mathematica 7.0.1.0
Clear[A, V, P];
A = {1, 2, 3};
V = {4, 5, 6};
P = {P1, P2, P3};
Solve[A + V t == P, P]
outputs:
{{P1 -> 1 + 4 t, P2 -> 2 + 5 t, P3 -> 3 (1 + 2 t)}}
Typing out P = {P1, P2, P3} can be annoying if the array or matrix is large.
Clear[A, V, PP, P];
A = {1, 2, 3};
V = {4, 5, 6};
PP = Array[P, 3];
Solve[A + V t == PP, PP]
outputs:
{{P[1] -> 1 + 4 t, P[2] -> 2 + 5 t, P[3] -> 3 (1 + 2 t)}}
Matrix vector inner product:
Clear[A, xx, bb];
A = {{1, 5}, {6, 7}};
xx = Array[x, 2];
bb = Array[b, 2];
Solve[A.xx == bb, xx]
outputs:
{{x[1] -> 1/23 (-7 b[1] + 5 b[2]), x[2] -> 1/23 (6 b[1] - b[2])}}
Matrix multiplication:
Clear[A, BB, d];
A = {{1, 5}, {6, 7}};
BB = Array[B, {2, 2}];
d = {{6, 7}, {8, 9}};
Solve[A.BB == d]
outputs:
{{B[1, 1] -> -(2/23), B[2, 1] -> 28/23, B[1, 2] -> -(4/23), B[2, 2] -> 33/23}}
The dot product has an infix notation built in just use a period for the dot.
I do not think the cross product does however. This is how you use the Notation package to make one. "X" will become our infix form of Cross. I suggest coping the example from the Notation, Symbolize and InfixNotation tutorial. Also use the Notation Palette which helps abstract away some of the Box syntax.
Clear[X]
Needs["Notation`"]
Notation[x_ X y_\[DoubleLongLeftRightArrow]Cross[x_, y_]]
Notation[NotationTemplateTag[
RowBox[{x_, , X, , y_, }]] \[DoubleLongLeftRightArrow]
NotationTemplateTag[RowBox[{ ,
RowBox[{Cross, [,
RowBox[{x_, ,, y_}], ]}]}]]]
{a, b, c} X {x, y, z}
outputs:
{-c y + b z, c x - a z, -b x + a y}
The above looks horrible but when using the Notation Palette it looks like:
Clear[X]
Needs["Notation`"]
Notation[x_ X y_\[DoubleLongLeftRightArrow]Cross[x_, y_]]
{a, b, c} X {x, y, z}
I have run into some quirks using the notation package in the past versions of mathematica so be careful.
I don't have a general solution for you by any means (MathForum may be the better way to go), but there are some tips that I can offer you. The first is to do the expansion of your vectors into components in a more systematic way. For instance, I would solve the equation you wrote as follows.
rawSol = With[{coords = {x, y, z}},
Solve[
Flatten[
{A[#] + V[#] t == P[#] t & /# coords,
Total[P[#]^2 & /# coords] == P^2}],
Flatten[{t, P /# coords}]]];
Then you can work with the rawSol variable more easily. Next, because you are referring the vector components in a uniform way (always matching the Mathematica pattern v_[x|y|z]), you can define rules that will aid in simplifying them. I played around a bit before coming up with the following rules:
vectorRules =
{forms___ + vec_[x]^2 + vec_[y]^2 + vec_[z]^2 :> forms + vec^2,
forms___ + c_. v1_[x]*v2_[x] + c_. v1_[y]*v2_[y] + c_. v1_[z]*v2_[z] :>
forms + c v1\[CenterDot]v2};
These rules will simplify the relationships for vector norms and dot products (cross-products are left as a likely painful exercise for the reader). EDIT: rcollyer pointed out that you can make c optional in the rule for dot products, so you only need two rules for norms and dot products.
With these rules, I was immediately able to simplify the solution for t into a form very close to yours:
In[3] := t /. rawSol //. vectorRules // Simplify // InputForm
Out[3] = {(A \[CenterDot] V - Sqrt[A^2*(P^2 - V^2) +
(A \[CenterDot] V)^2])/(P^2 - V^2),
(A \[CenterDot] V + Sqrt[A^2*(P^2 - V^2) +
(A \[CenterDot] V)^2])/(P^2 - V^2)}
Like I said, it's not a complete way of solving these kinds of problems by any means, but if you're careful about casting the problem into terms that are easy to work with from a pattern-matching and rule-replacement standpoint, you can go pretty far.
I've taken a somewhat different approach to this issue. I've made some definitions that return this output:
Patterns that are known to be vector quantities may be specified using vec[_], patterns that have an OverVector[] or OverHat[] wrapper (symbols with a vector or hat over them) are assumed to be vectors by default.
The definitions are experimental and should be treated as such, but they seem to work well. I expect to add to this over time.
Here are the definitions. The need to be pasted into a Mathematica Notebook cell and converted to StandardForm to see them properly.
Unprotect[vExpand,vExpand$,Cross,Plus,Times,CenterDot];
(* vec[pat] determines if pat is a vector quantity.
vec[pat] can be used to define patterns that should be treated as vectors.
Default: Patterns are assumed to be scalar unless otherwise defined *)
vec[_]:=False;
(* Symbols with a vector hat, or vector operations on vectors are assumed to be vectors *)
vec[OverVector[_]]:=True;
vec[OverHat[_]]:=True;
vec[u_?vec+v_?vec]:=True;
vec[u_?vec-v_?vec]:=True;
vec[u_?vec\[Cross]v_?vec]:=True;
vec[u_?VectorQ]:=True;
(* Placeholder for matrix types *)
mat[a_]:=False;
(* Anything not defined as a vector or matrix is a scalar *)
scal[x_]:=!(vec[x]\[Or]mat[x]);
scal[x_?scal+y_?scal]:=True;scal[x_?scal y_?scal]:=True;
(* Scalars times vectors are vectors *)
vec[a_?scal u_?vec]:=True;
mat[a_?scal m_?mat]:=True;
vExpand$[u_?vec\[Cross](v_?vec+w_?vec)]:=vExpand$[u\[Cross]v]+vExpand$[u\[Cross]w];
vExpand$[(u_?vec+v_?vec)\[Cross]w_?vec]:=vExpand$[u\[Cross]w]+vExpand$[v\[Cross]w];
vExpand$[u_?vec\[CenterDot](v_?vec+w_?vec)]:=vExpand$[u\[CenterDot]v]+vExpand$[u\[CenterDot]w];
vExpand$[(u_?vec+v_?vec)\[CenterDot]w_?vec]:=vExpand$[u\[CenterDot]w]+vExpand$[v\[CenterDot]w];
vExpand$[s_?scal (u_?vec\[Cross]v_?vec)]:=Expand[s] vExpand$[u\[Cross]v];
vExpand$[s_?scal (u_?vec\[CenterDot]v_?vec)]:=Expand[s] vExpand$[u\[CenterDot]v];
vExpand$[Plus[x__]]:=vExpand$/#Plus[x];
vExpand$[s_?scal,Plus[x__]]:=Expand[s](vExpand$/#Plus[x]);
vExpand$[Times[x__]]:=vExpand$/#Times[x];
vExpand[e_]:=e//.e:>Expand[vExpand$[e]]
(* Some simplification rules *)
(u_?vec\[Cross]u_?vec):=\!\(\*OverscriptBox["0", "\[RightVector]"]\);
(u_?vec+\!\(\*OverscriptBox["0", "\[RightVector]"]\)):=u;
0v_?vec:=\!\(\*OverscriptBox["0", "\[RightVector]"]\);
\!\(\*OverscriptBox["0", "\[RightVector]"]\)\[CenterDot]v_?vec:=0;
v_?vec\[CenterDot]\!\(\*OverscriptBox["0", "\[RightVector]"]\):=0;
(a_?scal u_?vec)\[Cross]v_?vec :=a u\[Cross]v;u_?vec\[Cross](a_?scal v_?vec ):=a u\[Cross]v;
(a_?scal u_?vec)\[CenterDot]v_?vec :=a u\[CenterDot]v;
u_?vec\[CenterDot](a_?scal v_?vec) :=a u\[CenterDot]v;
(* Stealing behavior from Dot *)
Attributes[CenterDot]=Attributes[Dot];
Protect[vExpand,vExpand$,Cross,Plus,Times,CenterDot];

Resources