I am interested in simulating the phenomenon of "regression to the mean". Say a 0-1 vector V of length N is "gifted" if the number of 1s in V is greater than N/2 + 5*sqrt(N).
I want Maple to generate a sequence of M 0-1 lists, each of length N, and evaluate each one to determine whether it is gifted.
Then, given that list V[i] is gifted, I want to evaluate the probability that list V[i+1] is gifted.
My code is failing in a strange way. So far, all the code is supposed to do is create the list of sums (called 'total') and the list 'g', which carries a 0 if total[i] <= N/2 + 5*sqrt(N), and a 1 otherwise.
Here is the code:
RS := proc(N) local ra, i:
  ra := rand(0..1):
  [seq(ra(), i=1..N)]:
end:

Gift := proc(N, M) local total, i, g:
  total := [seq(add(RS(N)), i=1..M)]:
  g := [seq(0, i=1..M)]:
  for i from 1 to M do
    if total[i] > (N/2 + 5*(N^(1/2))) then
      g[i] := 1
    fi:
  od:
  print(total, g)
end:
The trouble is that when I try Gift(100,20), Maple responds,
"Error, (in Gift) cannot determine if this expression is true or false: 5*100^(1/2) < -2"
or, when I try Gift(10000,20), "Error, (in Gift) cannot determine if this expression is true or false: 5*10000^(1/2) < -103."
Where are these negative numbers coming from? And why can't Maple tell whether 5*(10000)^(1/2) < -103 or not?
The negative quantities are simply what remains of the inequality after the portion with the radical is moved to one side and the purely rational portion is moved to the other.
Use an appropriate mechanism for the resolution of the conditional test. For example,
if is( total[i] > (N/2 + 5*N^(1/2)) ) then
...etc
or, say,
temp := evalf(N/2 + 5*N^(1/2));
for i from 1 to M do
  if total[i] > temp then
    ...etc
From the Maple online help:
Important: The evalb command does not simplify expressions. It may return false for a relation that is true. In such a case, apply a simplification to the relation before using evalb.
...
You must convert symbolic arguments to floating-point values when using the evalb command for inequalities that use <, <=, >, or >=.
In this particular example, Maple chokes when trying to determine whether the symbolic square root is less than -2, though it tried its best to simplify before giving up.
One fix is to apply evalf to inequalities. Rather than, say, evalb(x < y), you would write evalb(evalf(x < y)).
As to why Maple can't handle these inequalities, I don't know.
I've been trying to solve this question for an hour and just can't find a way to do it.
The question is as follows:
A sorted list of length N. There might be duplicates inside the list.
Given an element x, you need to find the last (highest) index of x in the list.
If x does not exist, return a relevant message.
Note: The model is CREW (Concurrent Read Exclusive Write) - meaning concurrent read is allowed, but write is exclusive meaning concurrent write is not allowed.
1) Describe a parallel algorithm that uses N CPUs and solves the problem in a fixed amount of time (I guess they mean O(1)).
2) Explain why the algorithm described is correct.
I assume the input is a 0-indexed, sorted (increasing) array A[] of length N.
Initialise a shared result variable with the value UNSET:
RESULT := "UNSET"
Start N CPUs with the following program, parameterized by i (from 0 to N-1):
CPU(i):
if i==0 and A[0] > x {
RESULT = "NO SOLUTION"
} else if A[i] == x and (i + 1 == N or A[i+1] > x) {
RESULT = i
} else if A[i] < x and (i + 1 == N or A[i+1] > x) {
RESULT = "NO SOLUTION"
}
The program has terminated when RESULT is updated.
Note that exactly one CPU writes to RESULT (because the input is sorted), so there's never a concurrent write, but each array location except the first is read by two CPUs. Each CPU does a fixed amount of work, so the program terminates in a fixed amount of time.
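For concreteness, here is a minimal C++ sketch (function and variable names are mine) that simulates the N CPU programs one after another; since at most one body ever writes, the order of simulation doesn't matter:

#include <iostream>
#include <string>
#include <vector>

// Sequential simulation of the CREW algorithm: each loop iteration
// plays the role of CPU(i). On a real PRAM all N bodies run in parallel.
std::string lastIndexOf(const std::vector<int>& a, int x) {
    const int n = static_cast<int>(a.size());
    std::string result = "UNSET";
    for (int i = 0; i < n; ++i) {                      // "CPU(i)"
        if (i == 0 && a[0] > x) {
            result = "NO SOLUTION";                    // x is below the whole list
        } else if (a[i] == x && (i + 1 == n || a[i + 1] > x)) {
            result = std::to_string(i);                // last occurrence of x
        } else if (a[i] < x && (i + 1 == n || a[i + 1] > x)) {
            result = "NO SOLUTION";                    // x falls in a gap or above the list
        }
    }
    return result;
}

int main() {
    std::vector<int> a = {1, 2, 2, 2, 5, 7};
    std::cout << lastIndexOf(a, 2) << "\n";            // prints 3
    std::cout << lastIndexOf(a, 4) << "\n";            // prints NO SOLUTION
}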
Given an array a[], each operation picks two elements and an amount x (with x <= a[i]) and replaces a[i] with a[i]-x and a[j] with a[j]+x. After at most K such operations, the goal is to make max(abs(a[i] - a[j])) as small as possible. What is this smallest value?
My solution:
Each time, choose two numbers from the array; the operation keeps their sum constant. After K such operations,
we can get the minimal absolute difference between two elements of the array.
However, I don't know whether my idea is correct; if not, how can it be solved correctly?
If I understand your algorithm/question correctly, there is no need to make any calculations while performing the a[i]-x, a[j]+x operations. So my suggestion is:
1) make required number of a[i]-x, a[j]+x operations
2) do the following procedure (in pseudo-code):
_aSorted[] = sort(_a[])
_dif = max integer value
for (i = 0; i < _aSorted.length - 1; i++) {
    if (abs(_aSorted[i] - _aSorted[i+1]) < _dif)
        _dif = abs(_aSorted[i] - _aSorted[i+1]);
}
So after this procedure, _dif holds the required result.
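For reference, here is a minimal runnable C++ version of that procedure (names mirror the pseudo-code; the input is assumed nonempty):

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <limits>
#include <vector>

// After sorting, the closest pair of values must be adjacent, so one
// linear scan over the sorted copy finds the minimal |a[i] - a[j]|.
int minAdjacentDifference(std::vector<int> a) {        // by value: we sort a copy
    std::sort(a.begin(), a.end());
    int dif = std::numeric_limits<int>::max();
    for (std::size_t i = 0; i + 1 < a.size(); ++i)
        dif = std::min(dif, std::abs(a[i] - a[i + 1]));
    return dif;
}

int main() {
    std::cout << minAdjacentDifference({7, 1, 5, 9, 2}) << "\n";  // prints 1 (from 1 and 2)
}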
Originally, I had basically written an essay with a question at the end, so I'm going to cram it down to this: which is better (being really nit-picky here)?
A)
int min = someArray[0][0];
for (int i = 0; i < someArray.length; i++)
for (int j = 0; j < someArray[i].length; j++)
min = Math.min(min, someArray[i][j]);
-or-
B)
int min = Integer.MAX_VALUE;
for (int i = 0; i < someArray.length; i++)
for (int j = 0; j < someArray[i].length; j++)
min = Math.min(min, someArray[i][j]);
I reckon B is faster, saving an instruction or two by initializing min to a constant value instead of using the indexer. It also feels less redundant: no comparing someArray[0][0] to itself...
As an algorithm, which is better/valid-er?
EDIT: Assume that the array is not null and not empty.
EDIT2: Fixed a couple of careless errors.
Both of these algorithms are correct (assuming, of course, the array is nonempty). I think that version A works more generally, since for some types (strings, in particular) there may not be a well-defined maximum value.
The reason that these algorithms are equivalent has to do with a cool mathematical object called a semilattice. To motivate semilattices, here are a few cool properties of max that happen to hold true:
max is idempotent, so applying it to the same value twice gives back that original value: max(x, x) = x
max is commutative, so it doesn't matter what order you apply it to its arguments: max(x, y) = max(y, x)
max is associative, so when taking the maximum of three or more values it doesn't matter how you group the elements: max(max(x, y), z) = max(x, max(y, z))
These laws also hold for min, as well as for many other structures. For example, if you have a tree structure, the "least upper bound" operator also satisfies these constraints. Similarly, if you have a collection of sets and set union or intersection, you'd find that these constraints hold as well.
If you have a set of elements (for example, integers, strings, etc.) and some binary operator defined over them with the above three properties (idempotency, commutativity, and associativity), then you have found a structure called a semilattice. The binary operator is then called a meet operator (or sometimes a join operator depending on the context).
The reason that semilattices are useful is that if you have a (finite) collection of elements drawn from a semilattice and want to compute their meet, you can do so by using a loop like this:
Element e = data[0];
for (i in data[1 .. n])
e = meet(e, data[i])
The reason that this works is that because the meet operator is commutative and associative, we can apply the meet across the elements in any order we want. Applying it one element at a time as we walk across the elements of the array in order thus produces the same value as if we had shuffled the array elements first, or iterated in reverse order, etc. In your case, the meet operator was "max" or "min," and since they satisfy the laws for meet operators described above, the above code will correctly compute the max or min.
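To illustrate that order-independence concretely, here is a small C++ sketch (names are mine) that folds min across an array and across its reversal and checks that the results agree:

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Folds a meet operator (here: min) across a nonempty array,
// seeding with data[0], exactly as in the loop above.
int foldMeet(const std::vector<int>& data) {
    int e = data[0];
    for (std::size_t i = 1; i < data.size(); ++i)
        e = std::min(e, data[i]);                     // e = meet(e, data[i])
    return e;
}

int main() {
    std::vector<int> v = {3, 1, 4, 1, 5};
    std::vector<int> r(v.rbegin(), v.rend());         // same elements, reverse order
    assert(foldMeet(v) == foldMeet(r));               // both are 1: order doesn't matter
}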
To address your initial question, we need a bit more terminology. You were curious about whether or not it was better or safer to initialize your initial guess of the minimum value to be the maximum possible integer. The reason this works is that we have the cool property that
min(Integer.MAX_VALUE, x) = min(x, Integer.MAX_VALUE) = x
In other words, if you compute the meet of Integer.MAX_VALUE and any other value, you get the second value back. In mathematical terms, this is because Integer.MAX_VALUE is the top element of the meet semilattice. More formally, a top element for a meet semilattice is an element (denoted ⊤) satisfying
meet(⊤, x) = meet(x, ⊤) = x
If you use max instead of min, then the top element would be Integer.MIN_VALUE, since
max(Integer.MIN_VALUE, x) = max(x, Integer.MIN_VALUE) = x
Because applying the meet operator to ⊤ and any other element produces that other element, if you have a meet semilattice with a well-defined top element, you can rewrite the above code to compute the meet of all the elements as
Element e = Element.TOP;
for (i in data[0 .. n])
e = meet(e, data[i])
This works because after the first iteration, e is set to meet(e, data[0]) = meet(Element.TOP, data[0]) = data[0] and the iteration proceeds as usual. Consequently, in your original question, it doesn't matter which of the two loops you use; as long as there is at least one element defined, they produce the same value.
That said, not all semilattices have a top element. Consider, for example, the set of all strings where the meet operator is defined as
meet(x, y) = x if x lexicographically precedes y
= y otherwise
For example, meet("a", "ab") = "a", meet("dog", "cat") = "cat", etc. In this case, there is no string s that satisfies the property meet(s, x) = meet(x, s) = x, and so the semilattice has no top element. In that case, you cannot possibly use the second version of the code, because there is no top element with which to initialize the initial value.
However, there is a very cute technique you can use to fake this, which actually does end up getting used a bit in practice. Given a semilattice with no top element, you can create a new semilattice that does have a top element by introducing a new element ⊤ and arbitrarily defining that meet(⊤, x) = meet(x, ⊤) = x. In other words, this element is specially crafted to be a top element and has no significance otherwise.
In code, you can introduce an element like this implicitly by writing
bool found = false;
Element e;
for (i in data[0 .. n]) {
if (!found) {
found = true;
e = i;
} else {
e = meet(e, i);
}
}
This code works by having an external boolean found keep track of whether or not we have seen the first element yet. If we haven't, then we pretend that the element e is this new top element. Computing the meet of this top element and the array element produces the array element, and so we can just set the element e to be equal to that array element.
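For instance, here is a small C++ sketch of the same trick, with std::optional playing the role of the adjoined top element (lexMin is an illustrative name; it computes the lexicographic meet from the example above):

#include <algorithm>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// The empty optional acts as the artificial top element:
// meet(nullopt, x) = x; otherwise we apply the real meet.
std::optional<std::string> lexMin(const std::vector<std::string>& data) {
    std::optional<std::string> e;                     // starts out as "top"
    for (const auto& s : data)
        e = e ? std::min(*e, s) : s;                  // e = meet(e, s)
    return e;                                         // empty iff the input was empty
}

int main() {
    std::vector<std::string> words = {"dog", "cat", "ant"};
    if (auto m = lexMin(words)) std::cout << *m << "\n";   // prints ant
}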
Hope this helps! Sorry if this is too theoretical... I just happen to like math. :-)
B is better: if someArray happened to be empty, version A would throw a runtime error. But both A and B have a potential issue: if someArray is null (and this wasn't checked in previous lines of code), both will throw exceptions.
From a practical standpoint, I like option A marginally better because if the data type being dealt with changes in the future, changing the initial value is one less thing that needs to be updated (and therefore, one less thing that can go wrong).
From an algorithmic purity standpoint, I have no idea if one is better than the other.
By the way, option A should have its initialization like so:
int min = someArray[0][0];
I will phrase the problem in the precise form that I want below:
Given:
Two floating point lists N and D of the same length k (k is multiple of 2).
It is known that for all i=0,...,k-1, there exists j != i such that D[j]*D[i] == N[i]*N[j]. (I'm using zero-based indexing)
Return:
A (length k/2) list of pairs (i,j) such that D[j]*D[i] == N[i]*N[j].
The pairs returned may not be unique (any valid list of pairs is okay)
The application for this algorithm is to find reciprocal pairs of eigenvalues of a generalized palindromic eigenvalue problem.
The equality condition is equivalent to N[i]/D[i] == D[j]/N[j], but also works when denominators are zero (which is a definite possibility). Degeneracies in the eigenvalue problem cause the pairs to be non-unique.
More generally, the algorithm is equivalent to:
Given:
A list X of length k (k is multiple of 2).
It is known that for all i=0,...,k-1, there exists j != i such that IsMatch(X[i],X[j]) returns true, where IsMatch is a boolean matching function which is guaranteed to return true for at least one j != i for all i.
Return:
A (length k/2) list of pairs (i,j) such that IsMatch(i,j) == true for all pairs in the list.
The pairs returned may not be unique (any valid list of pairs is okay)
Obviously, my first problem can be formulated in terms of the second with IsMatch(u,v) := { (u - 1/v) == 0 }. Now, due to limitations of floating point precision, there will never be exact equality, so I want the solution that minimizes the match error. In other words, assume that IsMatch(u,v) returns the value u - 1/v and I want the algorithm to return a list for which IsMatch returns the minimal set of errors. This is a combinatorial optimization problem. I was thinking I could first naively compute the match error between all possible pairs of indexes i and j, but then I would need to select the set of minimum errors, and I don't know how I would do that.
Clarification
The IsMatch function is symmetric (IsMatch(a,b) implies IsMatch(b,a)), but not transitive. It is, however, 3-transitive: IsMatch(a,b) && IsMatch(b,c) && IsMatch(c,d) implies IsMatch(a,d).
Addendum
This problem is apparently identically the minimum weight perfect matching problem in graph theory. However, in my case I know that there should be a "good" perfect matching, so the distribution of edge weights is not totally random. I feel that this information should be used somehow. The question now is whether there is a good implementation of the min-weight-perfect-matching problem that uses my prior knowledge to arrive at a solution early in the search. I'm also open to pointers towards a simple implementation of any such algorithm.
I hope I've understood your problem.
Well, if IsMatch(i, j) and IsMatch(j, l) then IsMatch(i, l). More generally, the IsMatch relation is transitive, commutative, and reflexive, i.e., it's an equivalence relation. The algorithm then reduces to finding which element appears the most times in the list (using IsMatch instead of =).
(If I understand the problem...)
Here is one way to match each pair of products in the two lists.
Multiply each pair in N and save it to a structure with the product and the subscripts of the elements making up the product.
Multiply each pair in D and save it to a second instance of the structure with the product and the subscripts of the elements making up the product.
Sort both structures on the product.
Make a merge-type pass through both sorted structure arrays. Each time you find a product from one array that is close enough to one from the other, record the two subscripts from each sorted list as a match (see the sketch below).
You can also use one sorted list for an IsMatch function, doing a binary search on the product.
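Here is a rough C++ sketch of that sort-and-merge idea (the struct, the names, and the eps closeness test are my own assumptions, not part of the answer):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

struct Prod { double value; int i, j; };   // a product plus the two source subscripts

// All pairwise products of v, tagged with their index pair and sorted by value.
std::vector<Prod> allPairProducts(const std::vector<double>& v) {
    std::vector<Prod> out;
    const int k = static_cast<int>(v.size());
    for (int i = 0; i < k; ++i)
        for (int j = i + 1; j < k; ++j)
            out.push_back({v[i] * v[j], i, j});
    std::sort(out.begin(), out.end(),
              [](const Prod& a, const Prod& b) { return a.value < b.value; });
    return out;
}

// Merge-type pass: report index pairs whose N-product and D-product agree
// to within eps.
void reportMatches(const std::vector<double>& N, const std::vector<double>& D,
                   double eps) {
    std::vector<Prod> pn = allPairProducts(N), pd = allPairProducts(D);
    std::size_t a = 0, b = 0;
    while (a < pn.size() && b < pd.size()) {
        double diff = pn[a].value - pd[b].value;
        if (std::fabs(diff) <= eps)
            std::printf("N[%d]*N[%d] ~ D[%d]*D[%d]\n",
                        pn[a].i, pn[a].j, pd[b].i, pd[b].j);
        diff < 0 ? ++a : ++b;                 // advance the side with the smaller product
    }
}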
I just asked my CS friend, and he came up with the algorithm below. He doesn't have an account here (and is apparently unwilling to create one), but I think his answer is worth sharing.
// We will find the best match in the minimax sense; we will minimize
// the maximum matching error among all pairs. Alpha maintains a
// lower bound on the maximum matching error. We will raise Alpha until
// we find a solution. We assume MatchError returns an L_1 error.
// This first part finds the set of all possible alphas (which are
// the pairwise errors between all elements larger than the maxi-min
// error).
Alpha = 0
For all i:
min = Infinity
For all j > i:
AlphaSet.Insert(MatchError(i,j))
if MatchError(i,j) < min
min = MatchError(i,j)
If min > Alpha
Alpha = min
Remove all elements of AlphaSet smaller than Alpha
// This next part increases Alpha until we find a solution
While !AlphaSet.Empty()
Alpha = AlphaSet.RemoveSmallest()
sol = GetBoundedErrorSolution(Alpha)
If sol != nil
Return sol
// This is the definition of the helper function. It returns
// a solution with maximum matching error <= Alpha or nil if
// no such solution exists.
GetBoundedErrorSolution(Alpha) :=
MaxAssignments = 0
For all i:
ValidAssignments[i] = empty set;
For all j > i:
if MatchError(i,j) <= Alpha
ValidAssignments[i].Insert(j)
ValidAssignments[j].Insert(i)
// ValidAssignments[i].Size() > 0 due to our choice of Alpha
// in the outer loop
If ValidAssignments[i].Size() > MaxAssignments
MaxAssignments = ValidAssignments[i].Size()
If MaxAssignments = 1
return ValidAssignments
Else
G = graph(ValidAssignments)
// G is an undirected graph whose vertices are all values of i
// and edges between vertices if they have match error less
// than or equal to Alpha
If G has a perfect matching
// Note: this can be done in polynomial time (Edmonds' blossom algorithm).
Return the matching
Else
Return nil
It relies on being able to compute a perfect matching of a graph; this is in fact solvable in polynomial time (e.g., with Edmonds' blossom algorithm), so at least it is reduced to a known problem. Even if the matching step were expensive, that would be acceptable in practice, since the given lists are quite small. I'll wait around for a better answer for a few days, or for someone to expand on how to find the perfect matching in a reasonable way.
You want to find j such that D(i)*D(j) = N(i)*N(j) {I assumed * is ordinary real multiplication}
assuming all N(i) are nonzero, let
Z(i) = D(i)/N(i).
Problem: find j, such that Z(i) = 1/Z(j).
Split the set into positives and negatives and process each separately.
Take logs for clarity: z(i) = log Z(i).
Sort indirectly. Then in the sorted view you should see something like -5 -3 -1 +1 +3 +5, for example. Read off the +/- pairs from the two ends, and that gives you back the original indices.
Am I missing something, or is the problem easy?
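A sketch of that pairing for the all-positive case might look like this in C++ (assuming an even-length list with every Z[i] = D[i]/N[i] finite and positive; zeros and negatives are split off first, as described):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <vector>

// Sort indices by z(i) = log Z(i); reciprocal values have logs of equal
// magnitude and opposite sign, so they end up symmetric about the middle
// and can be read off by pairing the two ends inward.
void pairReciprocals(const std::vector<double>& Z) {
    if (Z.empty()) return;
    std::vector<int> idx(Z.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::sort(idx.begin(), idx.end(),
              [&](int a, int b) { return std::log(Z[a]) < std::log(Z[b]); });
    for (std::size_t lo = 0, hi = Z.size() - 1; lo < hi; ++lo, --hi)
        std::printf("(%d, %d)  z-sum = %g\n", idx[lo], idx[hi],
                    std::log(Z[idx[lo]]) + std::log(Z[idx[hi]]));  // ~0 for a true pair
}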
Okay, I ended up using this ported Fortran code, where I simply specify the dense upper triangular distance matrix using:
// Relative mismatch of the pairing condition N[i]*N[j] == D[i]*D[j],
// normalized by the larger of the two cross products so that zero
// denominators are handled gracefully.
complex_t num = N[i]*N[j] - D[i]*D[j];
complex_t den1 = N[j]*D[i];
complex_t den2 = N[i]*D[j];
if(std::abs(den1) < std::abs(den2)){
    costs[j*(j-1)/2+i] = std::abs(-num/den2);   // den2 is the larger scale
}else if(std::abs(den1) == 0){
    // Both cross products vanish: no meaningful scale, so assign a
    // huge (but finite) cost.
    costs[j*(j-1)/2+i] = std::sqrt(std::numeric_limits<double>::max());
}else{
    costs[j*(j-1)/2+i] = std::abs(num/den1);    // den1 is the larger scale
}
This works great and is fast enough for my purposes.
You should be able to sort the (D[i],N[i]) pairs. You don't need to divide by zero -- you can just multiply out, as follows:
bool order(int i, int j) {
float ni= N[i]; float di= D[i];
if(di<0) { di*=-1; ni*=-1; }
float nj= N[j]; float dj= D[j];
if(dj<0) { dj*=-1; nj*=-1; }
return ni*dj < nj*di;
}
Then, scan the sorted list to find two separation points: (N == D) and (N == -D); you can start matching reciprocal pairs from there, using:
abs(D[i]*D[j]-N[i]*N[j])<epsilon
as a validity check. Leave the (N == 0) and (D == 0) points for last; it doesn't matter whether you consider them negative or positive, as they will all match with each other.
edit: alternately, you could just handle (N==0) and (D==0) cases separately, removing them from the list. Then, you can use (N[i]/D[i]) to sort the rest of the indices. You still might want to start at 1.0 and -1.0, to make sure you can match near-zero cases with exactly-zero cases.
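For what it's worth, here is how that comparator might plug into a standard index sort (a C++ sketch under this answer's assumptions; the epsilon validity check and the zero cases are omitted):

#include <algorithm>
#include <numeric>
#include <vector>

// Index sort by the ratio N[i]/D[i] without ever dividing: normalize the
// sign of D, then compare by cross-multiplication.
std::vector<int> sortByRatio(const std::vector<float>& N,
                             const std::vector<float>& D) {
    std::vector<int> idx(N.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::sort(idx.begin(), idx.end(), [&](int i, int j) {
        float ni = N[i], di = D[i];
        if (di < 0) { di = -di; ni = -ni; }
        float nj = N[j], dj = D[j];
        if (dj < 0) { dj = -dj; nj = -nj; }
        return ni * dj < nj * di;                     // i.e. N[i]/D[i] < N[j]/D[j]
    });
    return idx;   // scan this order, starting near ratios +1 and -1, for reciprocal pairs
}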