Why don't some symbolic expressions get simplified?

Hi, I was working on a model for oscillation problems with Lagrangian mechanics for my Classical Mechanics I course.
My problem is the following:
When I try to simplify some expressions, like the one in the image below, SymPy just shows the division and doesn't reduce the expression.
I was wondering whether this is some kind of limitation of SymPy (probably that's not the case) or whether I am just missing something.
[image: the unsimplified expression, a quotient of square roots]

If SymPy doesn't know enough about the variables (like whether they are positive or zero) then it doesn't make a simplification. For sqrt you will get better results if you indicate that the variables are positive. Alternatively, you can use posify on the expression before attempting the simplification.
>>> from sympy import sqrt, symbols
>>> x, y = symbols('x y', positive=True)
>>> sqrt(x/y)/sqrt(y/x)
x/y
This would not be true if x were positive and y were negative (in which case the answer would be -x/y).
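If you prefer not to redeclare the symbols, here is a minimal sketch of the posify route, assuming the same expression with unassumed symbols:
>>> from sympy import sqrt, symbols, posify, simplify
>>> x, y = symbols('x y')  # no positivity assumptions
>>> expr = sqrt(x/y)/sqrt(y/x)
>>> pexpr, reps = posify(expr)  # swap in positive dummy symbols
>>> simplify(pexpr).subs(reps)  # simplify, then restore the originals
x/y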

Related

Control print order of matrix terms in Sympy

I have a matrix addition with several terms that I want to display in a Jupyter Notebook. I need the order of terms to match the standard notation, in my case that of linear regression. But the terms do not, by default, appear in the correct order for my purpose, and I would like to ask how to control the display order of matrices in a matrix addition (MatAdd) term in SymPy. For example, here we see that SymPy selects a particular order for the terms, which appears to be based on the values in the matrices.
from sympy import MatAdd, Matrix
A = Matrix([1])
B = Matrix([0])
print(MatAdd(A, B, evaluate=False))
This gives
Matrix([[0]]) + Matrix([[1]])
Notice the matrix terms follow neither the order of definition nor the variable names.
Is there anything I can do to control the print output order of Matrix terms in a MatAdd expression?
You can use init_printing to choose from a few options. In particular, the order keyword should control how things are shown on the screen, as opposed to how they are stored in SymPy objects.
Now comes the difference: with init_printing(order="none"), the printers behave differently from one another. I believe this is a bug.
For example, I usually use LaTeX rendering when using a Jupyter Notebook:
from sympy import MatAdd, Matrix, init_printing
init_printing(order="none")
A = Matrix([1])
B = Matrix([0])
add = MatAdd(A, B, evaluate=False)
print(add)
# out: Matrix([[0]]) + Matrix([[1]])
display(add)
# out: [1] + [0]
Here you can see that the LaTeX printer displays the elements as they are stored (check add.args), whereas the string printer is not following that convention...
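You can verify the stored order directly (a quick check, using the same add object as above):
print(add.args)
# out: (Matrix([[1]]), Matrix([[0]]))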

Why is my Doc2Vec model in gensim not reproducible?

I have noticed that my gensim Doc2Vec (DBOW) model is sensitive to document tags. My understanding was that these tags are cosmetic and so they should not influence the learned embeddings. Am I misunderstanding something? Here is a minimal example:
from gensim.test.utils import common_texts
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
import numpy as np
import os
os.environ['PYTHONHASHSEED'] = '0'
reps = []
for a in [0, 500]:
    documents = [TaggedDocument(doc, [i + a])
                 for i, doc in enumerate(common_texts)]
    model = Doc2Vec(documents, vector_size=100, window=2, min_count=0,
                    workers=1, epochs=10, dm=0, seed=0)
    reps.append(np.array([model.docvecs[k] for k in range(len(common_texts))]))
reps[0].sum() == reps[1].sum()
This last line returns False. I am working with gensim 3.8.3 and Python 3.5.2. More generally, is there any role that the values of the tags play (assuming they are unique)? I ask because I have found that using different tags for documents in a classification task leads to widely varying performance.
Thanks in advance.
First & foremost, your test isn't even comparing vectors corresponding to the same texts!
In run #1, the vector for the 1st text is in model.docvecs[0]. In run #2, the vector for the 1st text is in model.docvecs[500].
And, in run #2, the vector at model.docvecs[0] is just a randomly-initialized, but never-trained, vector - because none of the training texts had a document tag of (int) 0. (If using pure ints as the doc-tags, Doc2Vec uses them as literal indexes - potentially leaving any unused slots less than your highest tag allocated-and-initialized, but never-trained.)
Since common_texts has only a handful of entries, in run #2 every vector your reps comparison reads back, from position 0 up to len(common_texts)-1, is untrained garbage uncorrelated with any of your texts.
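A corrected comparison (a sketch, keeping everything else from your snippet) reads each run's vectors back using that run's own tag offset:
reps = []
for a in [0, 500]:
    documents = [TaggedDocument(doc, [i + a])
                 for i, doc in enumerate(common_texts)]
    model = Doc2Vec(documents, vector_size=100, window=2, min_count=0,
                    workers=1, epochs=10, dm=0, seed=0)
    # read back with the same offset used when tagging
    reps.append(np.array([model.docvecs[k + a] for k in range(len(common_texts))]))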
However, even after correcting that:
As explained in the Gensim FAQ answer #11, determinism in this algorithm shouldn't generally be expected, given many sources of potential randomness, and the fuzzy/approximate nature of the whole approach. If you're relying on it, or testing for it, you're probably making some unwarranted assumptions.
In general, tests of these algorithms should be evaluating "roughly equivalent usefulness in comparative uses" rather than "identical (or even similar) specific vectors". For example, a test whether apple and orange are roughly at the same positions in each others' nearest-neighbor rankings makes more sense than checking their (somewhat arbitrary) exact vector positions or even cosine-similarity.
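As a sketch of that kind of check (hypothetical helper; model_a and model_b are two separately trained models, and the words are any in-vocabulary pair):
def neighbor_rank(model, w1, w2):
    # position of w2 in w1's full nearest-neighbor list
    neighbors = [w for w, _ in model.wv.most_similar(w1, topn=len(model.wv.vocab))]
    return neighbors.index(w2)

# roughly equal ranks across runs is a more meaningful test than exact vector equality
print(neighbor_rank(model_a, 'graph', 'trees'), neighbor_rank(model_b, 'graph', 'trees'))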
Additionally:
- tiny toy datasets like common_texts won't show the algorithm's usual behavior/benefits
- PYTHONHASHSEED is only consulted by the Python interpreter at startup; setting it from Python can't have any effect. But also, the kind of indeterminism it introduces only comes up across separate interpreter launches: a tight loop within a single interpreter run, like this one, wouldn't be affected by it in any case.
Have you checked the magnitude of the differences?
Just running:
delta = reps[0].sum() - reps[1].sum()
gives an aggregate difference of -1.2598932e-05 when I run it.
Comparing dimension-wise:
diff = reps[0] - reps[1]
eps = 10**-4
over = (np.abs(diff) <= eps).all()
over is True on the vast majority of runs, which means you are getting quite reproducible results given the complexity of the calculations.
I would blame the numerical stability of the calculations or uncontrolled randomness. Even though you do try to control the random seed, NumPy keeps its own random state and the standard-library random module keeps yet another, so you are not controlling all of the sources of randomness. This can also influence the results, but I did not check the actual implementation in gensim and its dependencies.
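If you want to pin those two down as well, a minimal sketch (this only covers the two standard RNGs; gensim may still introduce other nondeterminism, e.g. from threading):
import random
import numpy as np

random.seed(0)     # standard-library RNG
np.random.seed(0)  # NumPy's global RNG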
Change
import os
os.environ['PYTHONHASHSEED'] = '0'
to
import os
import sys

hashseed = os.getenv('PYTHONHASHSEED')
if not hashseed:
    os.environ['PYTHONHASHSEED'] = '0'
    os.execv(sys.executable, [sys.executable] + sys.argv)
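This re-executes the interpreter with PYTHONHASHSEED already set in its environment, which is necessary because, as noted above, the variable is only consulted at interpreter startup.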

Decrease precision of the SymPy Equality class

I am performing some symbolic calculations using SymPy, and the calculations are just too computationally expensive. I was hoping to minimize the number of bytes used per calculation and thus increase processing speed. I am solving two polynomial equations for two unknowns, but whenever I create the equalities using the SymPy Equality class it introduces precision that did not exist in the variables supplied: it pads the numbers out to SymPy's standard 15-digit precision. I was hoping there might be a way to keep the class from doing this, or to limit the overall precision of SymPy for this problem, as this much precision is not necessary for my calculations. I have read through all the documentation I can find on the class, and on precision handling in SymPy, with no luck.
My code looks like this.
import math
import numpy as np
import sympy as sym

# A, a, b, h, k are numeric parameters defined earlier
c0 = np.float16((math.cos(A)**2)/(a**2) + (math.sin(A)**2)/(b**2))
c1 = np.float16((math.cos(A)**2)/(b**2) + (math.sin(A)**2)/(a**2))
c2 = np.float16((math.sin(2*A))/(a**2) - (math.sin(2*A))/(b**2))
c3 = np.float16((k*math.sin(2*A))/(b**2) - (2*h*(math.cos(A))**2)/(a**2) - (k*(math.sin(2*A)))/(a**2) - (2*h*(math.sin(A))**2)/(b**2))
c4 = np.float16((h*math.sin(2*A))/(b**2) - (2*k*(math.cos(A))**2)/(b**2) - (h*(math.sin(2*A)))/(a**2) - (2*k*(math.sin(A))**2)/(a**2))
c5 = np.float16((h**2*(math.cos(A))**2)/(a**2) + (k*h*(math.sin(2*A)))/(a**2) + (k**2*(math.sin(A))**2)/(a**2) + (h**2*(math.sin(A))**2)/(b**2) + (k**2*(math.cos(A))**2)/(b**2) - (k*h*(math.sin(2*A)))/(b**2) - 1)
x = sym.Symbol('x', real=True)
y = sym.Symbol('y', real=True)
e = sym.Eq(c0*x**2 + c1*y**2 + c2*x*y + c3*x + c4*y + c5, 0)
Each coefficient originally evaluates to a double-precision float, as is normal in Python, and since I don't require that precision I just recast it as float16. So the values look like
c0=1.547
c1=15.43
c2=1.55
c3=5.687
c4=7.345
c5=6.433
However, when cast into the equality e, the equation becomes
e = 1.5470203040506025*x**2 + 15.43000345000245*y**2 ... etc.
with the standard SymPy 15-digit precision on every coefficient, even though those digits are not representative of the data.
I'm hoping that by lowering this precision I might decrease my run time; I have a lot of these polynomials to solve. I've already tried using SymPy's Float class, the evalf function, and many other things. Any help would be appreciated.
Give the number of significant figures to Float as the second argument:
>>> from sympy import Float, Eq
>>> c0,c1,c2,c3,c4,c5 = [Float(i,4) for i in (c0,c1,c2,c3,c4,c5)]
>>> Eq(c0*x**2+c1*y**2+c2*x*y+c3*x+c4*y+c5,0)
Eq(1.547*x**2 + 1.55*x*y + 5.687*x + 15.43*y**2 + 7.345*y + 6.433, 0)
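Float's second argument is the number of significant digits, so the coefficients are stored at 4 significant figures and stay that short through the construction of the equality.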

Stop Mathematica rounding small values to zero

I have a simple question: for the specific project I am working on, I would like Mathematica not to evaluate extremely small decimals (of the order of ~10^-90) to zero. I would like the return value in scientific notation. When I evaluate similar expressions in WolframAlpha, I receive a non-zero result.
For an example of a specific evaluation which returns non-zero in wolfram, and zero in mathematica:
Mathematica:
In[219]:= Integrate[dNitrogen, {v, 11000, Infinity}]
Out[219]= 0.
Compared to WolframAlpha, which returns a small non-zero value for the same integral.
I've tried searching around myself but, oddly enough, have only found solutions to the opposite of my problem: people wanting Mathematica to print small numbers as zero, which seems to involve the Chop function.
Thanks for help/suggestions.
You should use NIntegrate instead of Integrate. By default it will give you the precision you're after, and it's also configurable through the PrecisionGoal option (and others; see the NIntegrate docs for details).

Numeric instability

I'm doing some linear programming exercises for my Algorithms course, and in doing so I'm solving many operations with fractions by hand. I realized that a human being doesn't suffer from numeric instability: we just keep values in fractional representation, and only at the end evaluate (possibly using a calculator) the value of the expression.
Is there any technique that does this automatically?
I'm thinking of something which achieves some kind of symbolic computation, keeps the numbers simplified internally, and yields a numeric value only when an expression is finally evaluated.
Boost contains a rational number library (Boost.Rational) which might be of help.
In Python you can have a look at fractions:
import fractions
a = fractions.Fraction(2,3)
a*2
# Fraction(4, 3)
a**2
# Fraction(4, 9)
'Value: %.2f' % a
# 'Value: 0.67'
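To see how this sidesteps float drift, a tiny illustration (standard library only):
import fractions

# exact: three thirds sum to exactly one
print(fractions.Fraction(1, 3) * 3 == 1)  # True
# binary floats accumulate representation error instead
print(0.1 + 0.1 + 0.1 == 0.3)             # False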
