Performance problems: Solving an inequality with several assumptions in Mathematica

I need to prove an inequality (or find a counterexample) given several assumptions (also inequalities). Unfortunately, the inequality to prove is a rather long and complicated expression. There are about 15 variables, and FullSimplify's output fills several A4 pages. For examples with fewer variables, FindInstance helps to find a counterexample or returns {} if the inequality is true. I also tried to use Reduce in this way:
Reduce[
  Implies[
    assumption1 && assumption2,
    inequality
  ],
  Reals
]
For simple examples, this outputs "True" if the inequality holds. But in my case, after several hours of running time, Mathematica needed 5-6 GB of RAM (and swap), so I had to abort the process.
Is there anything that I could do with Mathematica to improve the performance?

You will find a very nice paper on Mma CAD algorithms here
The cylindrical algebraic decomposition (CAD) that Mma uses scales doubly exponentially in the number of variables.
Newer methods are doubly exponential only in the number of quantifier alternations.
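As a rough illustration of that scaling (the standard worst-case bound, not specific to Mma's implementation): for a formula in n variables built from s polynomials of maximum degree d, CAD runs in time on the order of (s*d)^(2^O(n)), so going from a handful of variables to about 15 is already enough to blow up both time and memory.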
I think you'll have no luck using only the Mma internal engine, but you may roll your own approach based on the symmetries of your problem (if any).

Related

Ordering equations in the face of sometimes-unordered terms

Superposition calculus is used for reasoning with equations; it reduces the size of the search space by applying an order to equations, based on an ordering of terms.
A suitable ordering of terms, such as Knuth-Bendix, must sometimes answer 'unordered'. For example, consider f(x) vs f(y) where x and y are variables; a suitable order must be stable under substitution of terms for variables, so no matter which answer you might give for f(x) vs f(y) (less, same, greater), some substitution of terms for the two variables would turn out to be inconsistent with the initial answer. In this domain, comparison needs a fourth possible answer, 'unordered'.
Superposition calculus orders equations relative to each other, based on the constituent terms and the polarity. There are ways of constructing this based on the multiset extension of term order, but perhaps the simplest correct algorithm is:
Compare the larger terms; if they are unequal, that's the answer.
Compare the polarities; if they are unequal, negative is greater than positive.
Compare the smaller terms.
It is tempting to implement this by first sorting each equation, larger term first, then implementing the above algorithm directly. The problem is that an equation may not have a larger term; it is quite possible that the component terms of one or both equations are unordered relative to each other, so a correct algorithm for comparing equations must take that into account.
This could be derived from first principles by going through all the possibilities, but it also looks like there would be many opportunities to make a subtle error that would take a while to track down.
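To make the intended case analysis concrete, here is the kind of sketch I have in mind (Python-style and entirely hypothetical: term_cmp is an assumed term order returning "LT", "GT", "EQ" or "UNORDERED", and an equation is a (lhs, rhs, is_positive) triple):

# Hypothetical sketch, not a known/canonical algorithm.
# term_cmp(s, t) -> "LT" | "GT" | "EQ" | "UNORDERED" is assumed to be given.
def max_side(eq):
    # Return (larger, smaller) if the equation's sides are ordered, else None.
    lhs, rhs, _ = eq
    c = term_cmp(lhs, rhs)
    if c == "GT":
        return lhs, rhs
    if c == "LT":
        return rhs, lhs
    if c == "EQ":
        return lhs, rhs  # either orientation will do
    return None  # the sides are unordered relative to each other

def eq_cmp(e1, e2):
    # Larger terms, then polarity (negative > positive), then smaller terms.
    m1, m2 = max_side(e1), max_side(e2)
    if m1 is None or m2 is None:
        # An equation with no identifiable larger side: answering
        # "UNORDERED" is sound but loses precision - this is exactly
        # the gap I am asking about.
        return "UNORDERED"
    c = term_cmp(m1[0], m2[0])
    if c != "EQ":
        return c  # may itself be "UNORDERED"
    if e1[2] != e2[2]:
        return "GT" if not e1[2] else "LT"  # the negative equation is greater
    return term_cmp(m1[1], m2[1])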
Is there a known/canonical algorithm already worked out, for comparing equations in this context?

How to use CNFs to describe addition

Many papers use SAT, but few mention how to convert an addition to CNF.
Since CNF only allows AND, OR, and NOT operations, it is difficult to describe an addition operation. For example,
x1 + x2 + x3 + ... + x1599 < 30, where each xi is binary.
Map these equations into a Boolean circuit.
Apply Tseitin's transformation to the circuit and convert it into DIMACS format.
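For reference, the standard Tseitin clauses for a single AND gate y = x1 AND x2 (with y a fresh variable for the gate output) are (NOT y OR x1), (NOT y OR x2), and (NOT x1 OR NOT x2 OR y); the full circuit is encoded gate by gate in the same way.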
But is there any way to read back the results? I think it is possible to interpret the results if all the variables are defined by ourselves, so figuring out how to convert a linear constraint to a SAT problem is necessary.
If there are only 3 or 4 variables, e.g. x1 + x2 + x3 < 3, we can use a truth table to do the conversion. A direct way would be to choose 29 (or any number smaller than 30) of the 1600 variables to be 1 and the others to be 0, but there are far too many possibilities for that to be practical.
I have used STP, but it can only give one answer, and as the number of variables and clauses grows, it takes a long time to run.
So I tried using a SAT solver on the CNF produced by STP; it gives answers within minutes, but I cannot interpret the results.
In the end, I found some papers that may be helpful:
1. Encoding Linear Constraints with Implication Chains to CNF
2. SAT-Based Techniques for Integer Linear Constraints
What you're describing is known as a cardinality constraint. There are many ways of encoding these in CNF. As a starting point, some of these encodings are explained in
http://www.carstensinz.de/papers/CP-2005.pdf and
https://arxiv.org/pdf/1012.3853.pdf.
Many of them are implemented in the PySAT Python toolkit:
https://pysathq.github.io/docs/html/api/card.html
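For illustration, here is one way the constraint from the question could be encoded and decoded with PySAT (a sketch; the bound and solver choice are just examples):

from pysat.card import CardEnc, EncType
from pysat.solvers import Minisat22

n = 1599                      # binary variables x1..x1599
lits = list(range(1, n + 1))  # DIMACS-style literals 1..1599

# "x1 + ... + x1599 < 30" means at most 29 of the literals are true.
cnf = CardEnc.atmost(lits=lits, bound=29, encoding=EncType.seqcounter)

with Minisat22(bootstrap_with=cnf.clauses) as solver:
    if solver.solve():
        model = solver.get_model()
        # Reading the result: only the first n variables are the original
        # x_i; the remaining ones are auxiliaries added by the encoding.
        xs = [1 if model[i - 1] > 0 else 0 for i in lits]
        print(sum(xs))  # should be at most 29

The key point for reading the results is the last step: the encoding introduces auxiliary variables, so only the assignments to the original variables need to be reported.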

Definition of matrix-vector division operator of Julia

I stumbled upon something, which I consider very strange.
As an example consider the code
A = reshape(1:6, 3,2)
A/[1 1]
which gives
3×1 Array{Float64,2}:
2.5
3.5
4.5
As I understand it, in general such a division gives a weighted average of the columns, where each weight is inversely proportional to the corresponding element of the vector.
So my question is, why is it defined this way?
What is the mathematical justification for this definition?
It's the minimum-error solution to |A - v*[1 1]|₂, which, being overconstrained, has no exact solution in general (i.e. there is no value v such that the norm is precisely zero). The behavior of / and \ is heavily overloaded, solving both under- and overconstrained systems by a variety of techniques and heuristics. Whether this kind of overloading is a good idea or not is debatable, but it's what people have come to expect from these operations in Matlab and Octave, and it's often quite convenient to have so much functionality available in a single operator.
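Worked out explicitly for this example: with b = [1 1], minimizing |v*b - A|₂ over the 3x1 vector v gives the normal-equation solution v = A*bᵀ*(b*bᵀ)⁻¹ = A*bᵀ/2 = [(1+4)/2, (2+5)/2, (3+6)/2]ᵀ = [2.5, 3.5, 4.5]ᵀ, i.e. the row-wise mean of the two columns of A, which is exactly the output shown.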
Let A be an NxN matrix and b an Nx1 column vector. Then \ solves Ax=b, and / solves xA=b.
As Stefan mentions, this is extended to non-square cases as the least-squares solution, computed via the QR or SVD decompositions. See the details of those algorithms to see why this is the case. Hint: the linear form of the OLS estimator can be written as the solution to a matrix decomposition, so it's the same thing.
Now you might ask, how does it actually solve it? That's a complicated question. Essentially, it uses a matrix factorization, but which factorization is used depends on the matrix type. The reason is that Gaussian elimination is O(n^3), so treating the problem fully generally is usually not the best option; whenever you can specialize, you can get speedups. So essentially \ (and /, which transposes and calls \) checks for a bunch of special types and picks a factorization or other algorithm (LU, QR, SVD, Cholesky, etc.) based on the matrix type. The flow chart from MATLAB explains this very well. There are a lot of details here, and even more when the matrix is sparse. IterativeSolvers.jl should also be mentioned, because it's another set of algorithms for solving Ax=b.
Most applied math problems reduce down to linear algebra, with solving Ax=b being one of the most important and difficult problems, which is why there is tons of research on the subject. In fact, you can probably say that the vast majority of the field of numerical linear algebra is devoted to finding fast methods for solving Ax=b on specific matrix types. \ essentially puts all of the direct (non-iterative) methods into one convenient operator.
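Not how Julia implements / internally, but the same least-squares problem can be cross-checked in NumPy (the transpose just rewrites v*b = A as b'*v' = A'):

import numpy as np

# A = reshape(1:6, 3, 2) in Julia is column-major:
A = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])
b = np.array([[1.0, 1.0]])  # the 1x2 matrix [1 1]

# Julia's A / b solves v * b = A in the least-squares sense.
# Transposing gives b.T @ v.T = A.T, which lstsq handles directly.
v = np.linalg.lstsq(b.T, A.T, rcond=None)[0].T
print(v)  # [[2.5], [3.5], [4.5]] -- matches the Julia output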

Create a function for given input and output

Imagine, there are two same-sized sets of numbers.
Is it possible, and if so how, to create a function (an algorithm or a subroutine) which exactly maps input items to output items? Like:
Input = 1, 2, 3, 4
Output = 2, 3, 4, 5
and the function would be:
f(x): return x + 1
And by "function" I mean something slightly more comlex than [1]:
f(x):
if x == 1: return 2
if x == 2: return 3
if x == 3: return 4
if x == 4: return 5
This would be useful for creating special hash functions or function approximations.
Update:
What I am trying to ask is whether there is a way to compress the trivial mapping example [1] from above.
Finding the shortest program that outputs some string (sequence, function etc.) is equivalent to finding its Kolmogorov complexity, which is not computable.
If "impossible" is not a satisfying answer, you have to restrict your problem. In all appropriately restricted cases (polynomials, rational functions, linear recurrences) finding an optimal algorithm will be easy as long as you understand what you're doing. Examples:
polynomial - Lagrange interpolation
rational function - Pade approximation
boolean formula - Karnaugh map
approximate solution - regression, linear case: linear regression
general packing of data - data compression; some techniques, like run-length encoding, are lossless, some not.
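As a concrete illustration of the polynomial case above, applied to the example from the question (np.polyfit is just one of several ways to do it):

import numpy as np

# Input/output pairs from the question.
xs = np.array([1, 2, 3, 4], dtype=float)
ys = np.array([2, 3, 4, 5], dtype=float)

# Fit the unique polynomial of degree n-1 through the n points.
# For this data the higher-order coefficients vanish and we recover x + 1.
coeffs = np.polyfit(xs, ys, deg=len(xs) - 1)
print(np.round(coeffs, 10))   # ~ [0. 0. 1. 1.]  ->  f(x) = x + 1
print(np.polyval(coeffs, 5))  # ~ 6.0, extrapolating beyond the table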
In the case of polynomial sequences, it often helps to consider the difference sequence b_n = a_{n+1} - a_n; this reduces a quadratic relation to a linear one, and a linear one to a constant sequence, etc. But there's no silver bullet. You might build some heuristics (e.g. Mathematica has FindSequenceFunction - check that page to get an impression of how complex this can get) using genetic algorithms, random guesses, checking many built-in sequences and their compositions, and so on. No matter what, any such program is - in theory - infinitely far from perfection due to the uncomputability of Kolmogorov complexity. In practice, you might get satisfactory results, but this requires a lot of man-years.
See also another SO question. You might also implement some wrapper to OEIS in your application.
Fields:
Mostly, the limits of what can be done are described in
complexity theory - describing what problems can be solved "fast", like finding the shortest path in a graph, and what cannot, like playing a generalized version of checkers (which is EXPTIME-complete).
information theory - describing how much "information" is carried by a random variable. For example, take coin tossing. Normally, it takes 1 bit to encode one result, and n bits to encode n results (as a long 0-1 sequence). Suppose now that you have a biased coin that gives tails 90% of the time. Then it is possible to find another way of describing n results that on average gives a much shorter sequence. The number of bits per toss needed for optimal coding (less than 1 in that case; see the formula after this list) is called the entropy; the plot in that article shows how much information is carried (1 bit for a 1/2-1/2 coin, less than 1 bit for a biased coin, 0 bits if the coin always lands on the same side).
algorithmic information theory - that attempts to join complexity theory and information theory. Kolmogorov complexity belongs here. You may consider a string "random" if it has large Kolmogorov complexity: aaaaaaaaaaaa is not a random string, f8a34olx probably is. So, a random string is incompressible (Volchan's What is a random sequence is a very readable introduction.). Chaitin's algorithmic information theory book is available for download. Quote: "[...] we construct an equation involving only whole numbers and addition, multiplication and exponentiation, with the property that if one varies a parameter and asks whether the number of solutions is finite or infinite, the answer to this question is indistinguishable from the result of independent tosses of a fair coin." (in other words no algorithm can guess that result with probability > 1/2). I haven't read that book however, so can't rate it.
Strongly related to information theory is coding theory, which describes error-correcting codes. Example result: it is possible to encode 4 bits into 7 bits such that it will be possible to detect and correct any single error, or detect two errors (Hamming(7,4)).
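To put a number on the biased-coin example: the entropy of a coin that gives tails with probability p is H(p) = -p*log2(p) - (1-p)*log2(1-p), so for p = 0.9 this gives H(0.9) ≈ 0.469 bits per toss.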
The "positive" side are:
symbolic algorithms for Lagrange interpolation and Pade approximation are a part of computer algebra/symbolic computation; von zur Gathen, Gerhard "Modern Computer Algebra" is a good reference.
data compression - here you'd better ask someone else for references :)
Ok, I don't understand your question, but I'm going to give it a shot.
If you only have 2 sets of numbers and you want to find f where y = f(x), then you can try curve-fitting to give you an approximate "map".
In this case, it's linear so curve-fitting would work. You could try different models to see which works best and choose based on minimizing an error metric.
Is this what you had in mind?
It seems to me that you want a hashtable. These are based on hash functions, and there are known hash functions that work better than others depending on the expected input and desired output.
If what you want is an algorithmic way of mapping arbitrary input to arbitrary output, this is not feasible in the general case, as it depends entirely on the input and output sets.
For example, in the trivial sample you have there, the function is immediately obvious: f(x): x + 1. In other cases it may be very hard, or even impossible, to generate an exact function describing the mapping; you would have to approximate it or just use a map directly.
In some cases (such as your example), linear regression or similar statistical models could find the relation between your input and output sets.
Doing this in the general case is arbitrarily difficult. For example, consider a block cipher used in ECB mode: it maps an input integer to an output integer, but - by design - deriving any general mapping from specific examples is infeasible. In fact, for a good cipher, even with the complete set of mappings between input and output blocks, you still couldn't determine how to calculate that mapping on a general basis.
Obviously, a cipher is an extreme example, but it serves to illustrate that there's no (known) general procedure for doing what you ask.
Discerning an underlying map from input and output data is exactly what Neural Nets are about! You have unknowingly stumbled across a great branch of research in computer science.

Efficient evaluation of hypergeometric functions

Does anyone have experience with algorithms for evaluating hypergeometric functions? I would be interested in general references, but I'll describe my particular problem in case someone has dealt with it.
My specific problem is evaluating a function of the form 3F2(a, b, 1; c, d; 1) where a, b, c, and d are all positive reals and c+d > a+b+1. There are many special cases that have a closed-form formula, but as far as I know there are no such formulas in general. The power series centered at zero converges at 1, but very slowly; the ratio of consecutive coefficients goes to 1 in the limit. Maybe something like Aitken acceleration would help?
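To spell out the slow convergence: the k-th term is t_k = (a)_k (b)_k / ((c)_k (d)_k), since the (1)_k factor cancels against k!, and the ratio of consecutive terms is t_{k+1}/t_k = (a+k)(b+k) / ((c+k)(d+k)) = 1 - (c+d-a-b)/k + O(1/k^2), so t_k behaves like a constant times k^(a+b-c-d). The terms decay only polynomially, and the assumption c+d > a+b+1 is exactly what makes the series converge.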
I tested Aitken acceleration and it does not seem to help for this problem (nor does Richardson extrapolation). This probably means Pade approximation doesn't work either. I might have done something wrong though, so by all means try it for yourself.
I can think of two approaches.
One is to evaluate the series at some point such as z = 0.5 where convergence is rapid to get an initial value and then step forward to z = 1 by plugging the hypergeometric differential equation into an ODE solver. I don't know how well this works in practice; it might not, due to z = 1 being a singularity (if I recall correctly).
The second is to use the definition of 3F2 in terms of the Meijer G-function. The contour integral defining the Meijer G-function can be evaluated numerically by applying Gaussian or doubly-exponential quadrature to segments of the contour. This is not terribly efficient, but it should work, and it should scale to relatively high precision.
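If an existing implementation is acceptable, mpmath's hyp3f2 evaluates this case directly at arbitrary precision; for example (parameter values made up, chosen only to satisfy c + d > a + b + 1):

from mpmath import mp, hyp3f2

mp.dps = 30  # working precision in decimal digits

# Illustrative parameters with c + d > a + b + 1.
a, b, c, d = 1.5, 2.25, 3.0, 2.5
val = hyp3f2(a, b, 1, c, d, 1)  # 3F2(a, b, 1; c, d; 1)
print(val)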
Is it correct that you want to sum a series where you know the ratio of successive terms and it is a rational function?
I think Gosper's algorithm and the rest of the tools for proving hypergeometric identities (and finding them) do exactly this, right? (See Wilf and Zeilberger's A=B book online.)

Resources