Maximizing a Trigonometric Function of Many Variables in Mathematica

Just to give some context, my motivation for this programming question is to understand the derivation of the CHSH inequality, which basically entails maximizing the following function:
Abs[c1 Cos[2(a1-b1)]+ c2 Cos[2(a1-b2)] + c3 Cos[2(a2-b1)] + c4 Cos[2(a2-b2)]]
where a1, b1, a2, and b2 are arbitrary angles and c1, c2, c3, c4 = +/- 1 ONLY. I want to be able to determine the maximum value of this function along with the combination of angles that leads to this maximum.
Eventually, I also want to repeat the calculation for a1, a2, a3, b1, b2, b3 (which will have a total of nine cosine terms).
When I tried putting the following code into Mathematica, it simply spat the input back at me and did not perform any computation. Can someone help me out? (Note that my code doesn't include the c1, c2, c3, c4 parameters; I wasn't quite sure how to incorporate them.)
Maximize[{Abs[Cos[2 (a1 - b1)] - Cos[2 (a1 - b2)] + Cos[2 (a2 - b1)] +
Cos[2 (a2 - b2)]], 0 <= a1 <= 2 \[Pi] , 0 <= b1 <= 2 \[Pi], 0 <= a2 <= 2 \[Pi], 0 <= b2 <= 2 \[Pi]}, {a1, b2, a2, b1}]

The answer is 4. This is because each Cos can be made to equal 1. You have 4 variables a1, a2, b1 and b2, and four cosines, so there are going to be several ways of making the combinations 2(a1-b1), 2(a1-b2), 2(a2-b1) and 2(a2-b2) equal 0 (hence choosing the corresponding c1/c2/c3/c4 to be +1), or equal to pi (hence choosing the corresponding c1/c2/c3/c4 to be -1).
For one set of angles that give the max, the obvious answer is a1=a2=b1=b2=0. For the 9 cosine case, the max will be 9, and one possible answer is a1=a2=a3=b1=b2=b3=0.
Regarding using Mathematica, I think the lesson is that it's always best to think about the maths itself before reaching for tools to help with the maths.
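If you want to check that numerically, here is a small brute-force sketch (in Python/NumPy rather than Mathematica, with a coarse 10-degree angle grid, so purely illustrative) that tries all 16 sign choices for c1..c4:
import itertools
import numpy as np

angles = np.linspace(0, 2 * np.pi, 37)   # 10-degree grid; it includes 0, which is enough here
A1, B1, A2, B2 = np.meshgrid(angles, angles, angles, angles, indexing="ij", sparse=True)

best = -np.inf
for c1, c2, c3, c4 in itertools.product([1, -1], repeat=4):
    f = np.abs(c1 * np.cos(2 * (A1 - B1)) + c2 * np.cos(2 * (A1 - B2))
               + c3 * np.cos(2 * (A2 - B1)) + c4 * np.cos(2 * (A2 - B2)))
    best = max(best, float(f.max()))

print(best)   # 4.0, attained e.g. at a1 = a2 = b1 = b2 = 0 with all c = +1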

Related

Allocating numbers (x1, x2, x3, ...) to each element in a list (a1, a2, a3, ...) so that a1/x1 is similar to a2/x2 and so on

Suppose I have a list of numbers = [3, 10, 20, 1, ...]
How can I assign a number (x1, x2, x3, x4, ...) to each of the elements in the list, so that 3/x1 ~= 10/x2 ~= 20/x3 ~= 1/x4 ~= ... ?
Edit: there are some restrictions on the numbers (x1, x2, x3, ...): they have to be picked from a list of available numbers (which can be floating points as well).
The problem is that the number of elements is not the same: there are more x values than list elements, and each x can be assigned multiple times.
The goal is to minimize the difference between 3/x1, 10/x2, 20/x3, 1/x4
It often helps to develop a mathematical model. E.g.
Let
a(i) >= 0, i = 1,..,m
b(j) > 0, j = 1,..,n with n > m
be the data.
Introduce variables (to be determined by the model):
c = common number for all expressions to be close to
x(i,j) = 1 if a(i) is assigned to b(j), 0 otherwise
Then we can write:
min sum((i,j), (x(i,j)*(a(i)/b(j) - c))^2 )
subject to
sum(j, x(i,j)) = 1 for all i (each a(i) is assigned to exactly one b(j))
x(i,j) in {0,1}
c free
This is a non-linear model. MINLP (Mixed Integer Non-linear Programming) solvers are readily available. You can also choose an objective that can be linearized:
min sum((i,j), abs(x(i,j)*(a(i)/b(j) - y(i,j))) )
subject to
y(i,j) = x(i,j)*c
sum(j, x(i,j)) = 1 for all i
x(i,j) in {0,1}
c free
This can be reformulated as a MIP (Mixed Integer Programming) model. There are many MIP solvers available.
The solution can be displayed as a matrix whose values are a(i)/b(j): each row corresponds to an a(i) and has exactly one matching b(j) selected.
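For a quick feel of the model, here is a small Python sketch (my own illustration with made-up data, not an actual MINLP solve): it alternates between picking, for a fixed c, the b(j) whose ratio is closest to c for each a(i), and then refitting c as the mean of the chosen ratios.
def assign(a_vals, b_vals, iters=50):
    c = 1.0                                  # initial guess for the common ratio
    for _ in range(iters):
        # given c, each a(i) independently picks the b(j) with ratio closest to c
        pick = [min(b_vals, key=lambda b: (a / b - c) ** 2) for a in a_vals]
        # given the assignment, the best c is the mean of the ratios
        c = sum(a / b for a, b in zip(a_vals, pick)) / len(a_vals)
    return pick, c

pick, c = assign([3, 10, 20, 1], [0.5, 1.5, 2.0, 5.0, 10.0])
print(pick, c)   # one b value per a, with the ratios a/b pulled toward the common c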

Understanding Modified Baugh-Wooley multiplication algorithm

For the Modified Baugh-Wooley multiplication algorithm, why is it !(A0*B5) instead of just (A0*B5)?
Same question for !(A1*B5), !(A2*B5), !(A3*B5), !(A4*B5), !(A5*B4), !(A5*B3), !(A5*B2), !(A5*B1) and !(A5*B0).
Besides, why are there two extra '1's?
In signed 6-bit 2s complement notation, the place values of the bits are:
-32 16 8 4 2 1
Notice that the top bit has a negative value. When addition, subtraction, and multiplication are performed mod 64, however, that minus sign makes absolutely no difference to how those operations work, because 32 = -32 mod 64.
Your multiplication is not being performed mod 64, though, so that sign must be taken into account.
One way to think of your multiplication is that the 6-bit numbers are extended to 12 bits, and multiplication is then performed mod 4096. When extending a signed number, the top bit is replicated, so the -32 bit becomes -2048 + 1024 + 512 + ... + 32, which taken together still has the value -32. So extend the signed numbers and multiply. I'll do it with 3 bits, multiplying mod 64:
Given:      Sign-extended:
A2 A1 A0    A2 A2 A2 A2 A1 A0
B2 B1 B0    B2 B2 B2 B2 B1 B0
Multiply:
A0B2 A0B2 A0B2 A0B2 A0B1 A0B0
A1B2 A1B2 A1B2 A1B1 A1B0
A2B2 A2B2 A2B1 A2B0
A2B2 A2B1 A2B0
A2B1 A2B0
A2B0
Since we replicated the same bits in multiple positions, you'll see the same bit products at multiple positions.
A0B2 appears 4 times, with total place value 60 or 15<<2, and so on. Let's write the multipliers in:
A0B2*15 A0B1 A0B0
A1B2*7 A1B1 A1B0
A2B2*5 A2B1*7 A2B0*15
Again, because of modular arithmetic, the *15s and *7s are the same as *-1, and the *5 is the same as *1:
-A0B2 A0B1 A0B0
-A1B2 A1B1 A1B0
A2B2 -A2B1 -A2B0
(reading each row against the place values: the A0 row sits in the 4/2/1 columns, the A1 row in the 8/4/2 columns, and the A2 row in the 16/8/4 columns)
That pattern is starting to look familiar. Now, of course -1 is not a bit value, but ~A0B2 = 1-A0B2, so we can translate -A0B2 into ~A0B2 and then subtract the extra 1 we added. If we do this for all the subtracted products:
~A0B2 A0B1 A0B0
~A1B2 A1B1 A1B0
A2B2 ~A2B1 ~A2B0
-2 -2
(the -2 corrections sit in the 8 and 4 columns, one -1 for each product flipped at that place value)
If we add up the place values of those -2s and expand them into the equivalent bits, we discover the source of the additional 1s in your diagram:
~A0B2 A0B1 A0B0
~A1B2 A1B1 A1B0
A2B2 ~A2B1 ~A2B0
1 1
(the two 1s sit in the 32 and 8 columns, since -2*8 - 2*4 = -24 ≡ 40 = 32 + 8 mod 64)
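To convince yourself that the array above really computes a 3-bit signed product, here is a small Python check of my own (not part of the original answer) that exhaustively tests all 64 operand pairs against the derived bit-level formula:
def bw3(a, b):
    A = [(a >> i) & 1 for i in range(3)]     # A0, A1, A2
    B = [(b >> i) & 1 for i in range(3)]     # B0, B1, B2
    inv = lambda x: 1 - x                    # the "~" (flip) of a single partial-product bit
    s = (A[0] * B[0]
         + 2 * (A[0] * B[1] + A[1] * B[0])
         + 4 * (inv(A[0] * B[2]) + A[1] * B[1] + inv(A[2] * B[0]))
         + 8 * (inv(A[1] * B[2]) + inv(A[2] * B[1]))
         + 16 * (A[2] * B[2])
         + 8 + 32)                           # the two extra '1's, at the 8 and 32 places
    return s % 64

for a in range(-4, 4):
    for b in range(-4, 4):
        assert bw3(a & 7, b & 7) == (a * b) % 64
print("all 64 signed 3-bit products check out")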
Why two extra '1's?
See the explanation in Matt Timmermans's answer.
Note: '-2' in two's complement is 110, and this contributes to the carries, thus the two extra '1's.
Why flip the values of some of the partial-product bits?
It is due to the sign bit in the MSB (A5 and B5).
Besides, please see below the countermeasure for the modified Baugh-Wooley algorithm in the case of A_WIDTH != B_WIDTH, worked out with the help of others.
I have written hardware Verilog code for this algorithm.
Hopefully, this post helps some readers.
The short answer is that it's because of how 2's-complement representation works: the top bit is effectively a sign bit, so a 1 there means minus. In other words, you have to subtract
A5*(B4 B3 B2 B1 B0) << 5
and
B5*(A4 A3 A2 A1 A0) << 5
from the sum (note that A5*B5 is added again because both have the same minus sign). And those two 1s are the result of substituting those two subtractions with additions of -X.
If you need more details, then you probably just need to re-read how 2's-complement work and then the whole math behind the Baugh-Wooley multiplication algorithm. It is not that complicated.
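One way to read that decomposition: the signed product equals the product of the low 5-bit parts, minus the two shifted cross terms, plus A5*B5 shifted up by 10. A quick exhaustive Python check of my own (all 6-bit signed operand pairs):
for a in range(-32, 32):
    for b in range(-32, 32):
        a5, b5 = (a >> 5) & 1, (b >> 5) & 1      # sign bits of the 6-bit values
        a_low, b_low = a & 31, b & 31            # A4..A0 and B4..B0 as unsigned values
        prod = (a_low * b_low
                - ((a5 * b_low) << 5) - ((b5 * a_low) << 5)
                + ((a5 * b5) << 10))
        assert prod == a * b
print("6-bit decomposition verified")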

Finding a solution for a linear equation system which has more variables than equations

Let's divide the problem into 2 parts; the second one is optional.
Part 1
I have 3 linear equations with N variables, where N is usually bigger than 3.
x1*a + x2*b + x3*c + x4*d + [...] + xN*p = B1
y1*a + y2*b + y3*c + y4*d + [...] + yN*p = B2
z1*a + z2*b + z3*c + z4*d + [...] + zN*p = B3
Looking for (a, b, c, d, [...], p); the others are constant.
The standard Gaussian way won't work because the matrix will be wider than it is tall. Of course I can use it to eliminate 2 variables. Do you know an algorithm to find a solution? (I only need one.) More 0s among the solution coefficients are better, but not required.
Part 2
The coefficients in the solution must be non-negative.
Requirements:
The algorithm must be fast enough to run in real time (1800 per second on an average PC), so a trial-and-error method is a no-go.
The algorithm will be implemented in C#, but feel free to use pseudocode if you want to write code.
Set the extra variables to zero. Now we have the matrix equation
A.x = b, where
    x1 x2 x3
A = y1 y2 y3
    z1 z2 z3
x = (a, b, c) and b = (B1, B2, B3), both as column vectors.
Now invert A. The solution is:
x = A^-1 . b
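A small NumPy sketch of that idea (the coefficients here are made up, and it assumes the chosen 3x3 block is non-singular):
import numpy as np

# made-up coefficients: rows are the three equations, columns are the N unknowns
M = np.array([[2.0, 1.0, 0.0, 5.0, 1.0],
              [1.0, 3.0, 1.0, 0.0, 2.0],
              [0.0, 1.0, 4.0, 1.0, 1.0]])
rhs = np.array([3.0, 4.0, 5.0])

A = M[:, :3]                      # square 3x3 block (must be non-singular)
abc = np.linalg.solve(A, rhs)     # the first three unknowns
solution = np.concatenate([abc, np.zeros(M.shape[1] - 3)])
print(solution)                   # one valid solution; the remaining unknowns are 0
If the first three columns happen to form a singular block, pick a different set of three columns to keep.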

Is there an algorithm to multiply square matrices in-place?

The naive algorithm for multiplying 4x4 matrices looks like this:
void matrix_mul(double out[4][4], double lhs[4][4], double rhs[4][4]) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out[i][j] = 0.0;
            for (int k = 0; k < 4; ++k) {
                out[i][j] += lhs[i][k] * rhs[k][j];
            }
        }
    }
}
Obviously, this algorithm gives bogus results if out == lhs or out == rhs (here == means reference equality). Is there a version that allows one or both of those cases that doesn't simply copy the matrix? I'm happy to have different functions for each case if necessary.
I found this paper, but it discusses the Strassen-Winograd algorithm, which is overkill for my small matrices. The answers to this question seem to indicate that if out == lhs && out == rhs (i.e., we're attempting to square the matrix), then it can't be done in place, but even there, there's no convincing evidence or proof.
I'm not thrilled with this answer (I'm posting it mainly to silence the "it obviously can't be done" crowd), but I'm skeptical that it's possible to do much better with a true in-place algorithm (O(1) extra words of storage for multiplying two n x n matrices). Let's call the two matrices to be multiplied A and B. Assume that A and B are not aliased.
If A were upper-triangular, then the multiplication problem would look like this.
[a11 a12 a13 a14] [b11 b12 b13 b14]
[ 0 a22 a23 a24] [b21 b22 b23 b24]
[ 0 0 a33 a34] [b31 b32 b33 b34]
[ 0 0 0 a44] [b41 b42 b43 b44]
We can compute the product into B as follows. Multiply the first row of B by a11. Add a12 times the second row of B to the first. Add a13 times the third row of B to the first. Add a14 times the fourth row of B to the first.
Now, we've overwritten the first row of B with the correct product. Fortunately, we don't need it any more. Multiply the second row of B by a22. Add a23 times the third row of B to the second. (You get the idea.)
Likewise, if A were unit lower-triangular, then the multiplication problem would look like this.
[ 1 0 0 0 ] [b11 b12 b13 b14]
[a21 1 0 0 ] [b21 b22 b23 b24]
[a31 a32 1 0 ] [b31 b32 b33 b34]
[a41 a42 a43 1 ] [b41 b42 b43 b44]
Add a43 times the third row of B to the fourth. Add a42 times the second row of B to the fourth. Add a41 times the first row of B to the fourth. Add a32 times the second row of B to the third. (You get the idea.)
The complete algorithm is to LU-decompose A in place, multiply U B into B, multiply L B into B, and then LU-undecompose A in place (I'm not sure if anyone ever does this, but it seems easy enough to reverse the steps). There are about a million reasons not to implement this in practice, two being that A may not be LU-decomposable and that A won't be reconstructed exactly in general with floating-point arithmetic.
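Here is a rough NumPy sketch (my own illustration) of the two building blocks used above: overwriting B with U·B for an upper-triangular U, and with L·B for a unit lower-triangular L, using O(1) extra storage. It only demonstrates the row-update order, not the full LU-based scheme.
import numpy as np

def upper_times_inplace(U, B):
    n = len(B)
    for i in range(n):                 # top to bottom: row i of B is not needed afterwards
        B[i] *= U[i, i]
        for k in range(i + 1, n):
            B[i] += U[i, k] * B[k]     # rows below i are still the original B

def unit_lower_times_inplace(L, B):
    n = len(B)
    for i in reversed(range(n)):       # bottom to top: rows above i are still the original B
        for k in range(i):
            B[i] += L[i, k] * B[k]

# quick check of both building blocks against ordinary matrix products
rng = np.random.default_rng(0)
U = np.triu(rng.normal(size=(4, 4)))
L = np.tril(rng.normal(size=(4, 4)), -1) + np.eye(4)
B = rng.normal(size=(4, 4))
BU = B.copy(); upper_times_inplace(U, BU)
BL = B.copy(); unit_lower_times_inplace(L, BL)
assert np.allclose(BU, U @ B) and np.allclose(BL, L @ B)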
This answer is more sensible than my other one, though it uses one whole column of additional storage and has the same amount of data movement as the naive copying algorithm. To multiply A with B, storing the product in B (again assuming that A and B are stored separately):
For each column of B,
Copy it into the auxiliary storage column
Compute the product of A and the auxiliary storage column into that column of B
I switched the pseudocode to do the copy first because for large matrices, caching effects may result in it being more efficient to multiply A by the contiguous auxiliary column as opposed to the non-contiguous entries in B.
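A short Python/NumPy sketch of this column-buffer variant (my own illustration; A and B are assumed to be stored separately, as stated above):
import numpy as np

def matmul_into_rhs(A, B):
    tmp = np.empty(len(B))            # the single auxiliary column
    for j in range(B.shape[1]):
        tmp[:] = B[:, j]              # copy column j of B into the buffer
        B[:, j] = A @ tmp             # overwrite it with A times the saved copy

A = np.arange(16.0).reshape(4, 4)
B = np.eye(4)
expected = A @ B
matmul_into_rhs(A, B)
assert np.allclose(B, expected)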
This answer is about 4x4 matrices. Assuming, as you propose, that out may reference either lhs or rhs, and that A and B have cells of uniform bit-length, in order to technically be able to perform the multiplication in place, elements of A and B, as signed integers, generally cannot be greater or smaller than ± floor (sqrt (2 ^ (cellbitlength - 1) / 4)).
In this case, we can hack the elements of A into B (or vice versa) in the form of a bit shift or a combination of bit flags and modular arithmetic, and compute the product into the former matrix. If A and B were tightly packed, save for special cases or limits, we could not admit out to reference either lhs or rhs.
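To make the packing idea concrete, here is a toy Python sketch (my own illustration, non-negative values only, assuming 32-bit cells with the bound above comfortably satisfied): each cell keeps its own value in the low 16 bits and temporarily stashes a second value in the high 16 bits.
SHIFT, MASK = 16, (1 << 16) - 1

def pack(own, stashed):
    # store `own` in the low half of the cell and `stashed` in the high half
    return (own & MASK) | ((stashed & MASK) << SHIFT)

def own_value(cell):
    return cell & MASK

def stashed_value(cell):
    return (cell >> SHIFT) & MASK

cell = pack(1234, 56)
assert own_value(cell) == 1234 and stashed_value(cell) == 56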
Using the naive method now would not be unlike David's second algorithm description, just with the extra column stored in A or B itself. Alternatively, we could implement the Strassen-Winograd algorithm according to the schedule below, again with no storage outside of lhs and rhs. (The formulation of p0,...,p6 and C is taken from page 166 of Jonathan Golan's The Linear Algebra a Beginning Graduate Student Ought to Know.)
p0 = (a11 + a22)(b11 + b22), p1 = (a21 + a22)b11, p2 = a11(b12 - b22),
p3 = (a21 - a11)(b11 + b12), p4 = (a11 + a12)b22, p5 = a22(b21 - b11),
p6 = (a12 - a22)(b21 + b22)
┌ ┐
c = │ p0 + p5 - p4 + p6, p2 + p4 │
│ p1 + p5 , p0 - p1 + p2 + p3 │
└ ┘
Schedule:
Each p below is a 2x2 quadrant; "x" means unassigned; "nc", no change. To compute each p, we use an unassigned 2x2 quadrant to superimpose the (one or) two results of 2x2 block matrix addition or subtraction, using the same bit-shift or modular method above; we then add their product (the seven multiplications resulting in single elements) directly into the target block in any order (note that for the 2x2-sized p2 and p4, we use the southwest quadrant of rhs, which is no longer needed at that point). For example, to write the first 2x2-sized p6, we superimpose the block matrix subtraction, rhs(a12) - rhs(a22), and block matrix addition, rhs(b21) + rhs(b22), onto the lhs21 submatrix; then add each of the seven single-element p's for that block multiplication, (a12 - a22) X (b21 + b22), directly to the lhs11 submatrix.
LHS RHS (contains A and B)
(1)
p6 x
x p3
(2)
+p0 x
p0 +p0
(3)
+p5 x
p5 nc
(4)
nc p1
+p1 -p1
(5)
-p4 p4 p4 (B21)
nc nc
(6)
nc +p2 p2 (B21)
nc +p2
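As a quick sanity check on the seven products p0..p6 and the recombination quoted above, here is a tiny scalar 2x2 sketch (my own, in Python):
import numpy as np

def strassen_2x2(A, B):
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    p0 = (a11 + a22) * (b11 + b22)
    p1 = (a21 + a22) * b11
    p2 = a11 * (b12 - b22)
    p3 = (a21 - a11) * (b11 + b12)
    p4 = (a11 + a12) * b22
    p5 = a22 * (b21 - b11)
    p6 = (a12 - a22) * (b21 + b22)
    return np.array([[p0 + p5 - p4 + p6, p2 + p4],
                     [p1 + p5, p0 - p1 + p2 + p3]])

rng = np.random.default_rng(1)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
assert np.allclose(strassen_2x2(A, B), A @ B)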

How can I make this vector enumeration code faster?

I have three large sets of vectors: A, B1 and B2. These sets are stored in files on disk. For each vector a from A I need to check whether it may be presented as a = b1 + b2, where b1 is from B1 and b2 is from B2. Vectors have 20 components, and all components are non-negative numbers.
How I'm solving this problem now (pseudocode):
foreach a in A
    foreach b1 in B1
        for i = 1 to 20
            bt[i] = a[i] - b1[i]
            if bt[i] < 0 then try next b1
        next i
        foreach b2 in B2
            for i = 1 to 20
                if bt[i] != b2[i] then try next b2
            next i
            num_of_expansions++
        next b2
    next b1
next a
My questions:
1. Any ideas on how to make it faster?
2. How to make it in parallel?
3. Questions 1, 2 for the case when I have B1, B2, ..., Bk, k > 2?
You can sort B1 and B2 by norm. If a = b1 + b2, then ||a|| = ||b1 + b2|| <= ||b1|| + ||b2||, so for any a and b1, you can efficiently eliminate all elements of B2 that have norm < ||a|| - ||b1||. There may also be some way to use the distribution of norms in B1 and B2 to decide whether to switch the roles of the two sets in this. (I don't see off-hand how to do it, but it seems to me that something like this should hold if the distributions of norms in B1 and B2 are significantly different.)
As for making it parallel, it seems that each loop can be turned into a parallel computation, since all computations of one inner iteration are independent of all other iterations.
EDIT
Continuing the analysis: since b2 = a - b1, we also have ||b2|| <= ||a|| + ||b1||. So for any given a and b1, you can restrict the search in B2 to those elements with norms in the range ||a|| ± ||b1||. This suggests that for B1 you should select the set with the smallest average norm.
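A rough Python sketch of the norm-pruning idea (my own illustration with tiny made-up data, not the poster's code): sort B2 by norm once, then for each (a, b1) only scan the elements of B2 whose norm lies between abs(||a|| - ||b1||) and ||a|| + ||b1||.
import bisect
import numpy as np

def count_expansions(A, B1, B2):
    B2 = sorted((np.asarray(v) for v in B2), key=np.linalg.norm)
    norms2 = [float(np.linalg.norm(v)) for v in B2]
    count = 0
    for a in map(np.asarray, A):
        na = np.linalg.norm(a)
        for b1 in map(np.asarray, B1):
            bt = a - b1
            if (bt < 0).any():                      # some component of b1 is already too large
                continue
            nb1 = np.linalg.norm(b1)
            lo = bisect.bisect_left(norms2, abs(na - nb1) - 1e-9)
            hi = bisect.bisect_right(norms2, na + nb1 + 1e-9)
            count += sum(np.array_equal(bt, b2) for b2 in B2[lo:hi])
    return count

A  = [np.array([2, 3, 1]), np.array([5, 0, 0])]
B1 = [np.array([1, 1, 0]), np.array([4, 0, 0])]
B2 = [np.array([1, 2, 1]), np.array([1, 0, 0]), np.array([0, 0, 1])]
print(count_expansions(A, B1, B2))   # 2: each a has exactly one (b1, b2) expansion here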
