Managing Setup code with TimeIt - performance

As part of a pet project of mine, I need to test the performance of various implementations of my code in Python. I expect to do this a lot, and I want the code I write for it to be as easy to update and modify as possible.
It's still in its infancy at the moment, but I've taken to using strings to manage common setup or testing code, e.g.:
naiveSetup = 'from PerformanceTests.Vectors import NaiveVector\n' \
+ 'left = NaiveVector([1,0,0])\n' \
+ 'right = NaiveVector([0,1,0])'
This allows me to only write the code once, at the expense of making it harder to read and clunky to update.
Is there a better way?

Use a triple-quoted string ("""):
setup_code = """
from PerformanceTests.Vectors import NaiveVector
left = NaiveVector([1,0,0])
right = NaiveVector([0,1,0])
"""
Another interesting method is provided in the docs of timeit:
def test():
    "Stupid test function"
    L = []
    for i in range(100):
        L.append(i)
if __name__ == '__main__':
    from timeit import Timer
    # The setup string imports test() so the timed statement can call it.
    t = Timer("test()", "from __main__ import test")
    print(t.timeit())
Though this isn't suitable for all needs.
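To keep several setups side by side without repeating the timed statement, something like the following might work. It is only a sketch: the SETUPS entries are hypothetical stand-ins (plain lists and tuples rather than the NaiveVector class from the question), and run_benchmarks is a name made up for illustration.

```python
import timeit

# Hypothetical stand-ins for the real vector classes under test.
SETUPS = {
    "list": "left = [1, 0, 0]\nright = [0, 1, 0]",
    "tuple": "left = (1, 0, 0)\nright = (0, 1, 0)",
}

# The timed statement is written once and shared by every setup.
STATEMENT = "sum(a * b for a, b in zip(left, right))"


def run_benchmarks(number=10_000):
    """Return {setup name: seconds taken for `number` runs of STATEMENT}."""
    return {
        name: timeit.timeit(STATEMENT, setup=setup, number=number)
        for name, setup in SETUPS.items()
    }


if __name__ == "__main__":
    for name, seconds in run_benchmarks().items():
        print(f"{name}: {seconds:.4f} s")
```

Adding a new implementation then means adding one entry to the dictionary, and the triple-quoted style from above can be used for longer setups.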

Timing code is fine, but it will still leave you guessing what's going on.
To find out what's actually going on, manually pause it a few random times in the debugger, and examine the call stack.
For example, if one implementation is 30x slower than another, each sample of the stack has a 29/30 (about 96.7%) chance of falling in the extra time it is spending, so you can see why.
No guesswork required.
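The 96.7% figure can be checked with a few lines of arithmetic. This sketch assumes the slow version does the same baseline work plus some extra work, and that pauses land uniformly at random in time; both function names are made up for illustration.

```python
def fraction_in_extra_time(slowdown):
    """Share of the slow version's run time spent on work the fast one avoids."""
    return 1 - 1 / slowdown


def chance_all_samples_miss(slowdown, samples):
    """Probability that every one of `samples` random pauses lands in the baseline work."""
    return (1 / slowdown) ** samples


if __name__ == "__main__":
    print(f"{fraction_in_extra_time(30):.1%} of samples land in the extra work")
    print(f"chance that 3 samples all miss it: {chance_all_samples_miss(30, 3):.6f}")
```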

Related

Python script for Pymol with user input

I have prepared a script for Pymol that works well for evaluating RMSD values for a list of residues of proteins of interest (the targeted residues are generated by commands embedded in the script). However, I would like to extend the script so that the user can select the residues to analyse. I have tried to use the input function, but it fails to work. Since the code is a bit long, I progressively simplified the scripts and ended up with the following two:
"test.py":
import Bio.PDB
import numpy as np
from pymol import cmd
import os,glob
from LFGA_functions import user_entered
List_tot = user_entered()
print (List_tot)
which calls the simple "user_entered" function from inside the "My_function.py" script:
def user_entered():
    List_res = []
    List_tot = []
    a = input("Please indicate the number of residues to analyze/per monomer:")
    numberRes = int(a)
    for i in range(numberRes):
        Res = input("Please provide each residue NUMBER and hit ENTER:")
        Res1 = int(Res)
        Res2 = Res1+2000
        Res3 = Res1+3000
        Res4 = Res1+4000
        Res5 = Res1+5000
        Res6 = Res1+6000
        List_res = (str(Res1),str(Res2),str(Res3),str(Res4),str(Res5),str(Res6))
        List_tot.append(List_res)
    return List_tot
The script "test.py" works well when executed by Python (3.7.5, which is installed with Pymol 2.3.4, under Windows 7 Prof) from a Windows command line. An example result:
[first input to indicate that 2 cases will be treated, followed by the identification number for each case][1]
However, when the script is run from the Pymol GUI, I get the following error message: input():lost sys.stdin
Does anybody know what the problem is? Obviously, I am a very primitive Python user...
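The message suggests that input() has no usable sys.stdin inside the GUI. One possible workaround, which is an assumption and not something confirmed in this thread, is to separate the residue arithmetic from the prompting, so that the numbers can arrive as plain text from any source. Both function names below are hypothetical; the offsets are the ones used in user_entered.

```python
def parse_residue_text(text):
    """Turn text such as "10, 25 31" into a list of integers."""
    return [int(token) for token in text.replace(",", " ").split()]


def expand_residues(residue_numbers, offsets=(0, 2000, 3000, 4000, 5000, 6000)):
    """Build one tuple of residue-number strings per residue, as user_entered does."""
    return [
        tuple(str(number + offset) for offset in offsets)
        for number in residue_numbers
    ]


if __name__ == "__main__":
    # The text could come from a command argument or a file instead of input().
    print(expand_residues(parse_residue_text("10, 25")))
```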
I would also like to take this opportunity to ask something related to this problem. In the original script that works well in Pymol, before trying to implement the "input" option, I temporarily store some data (residue name, type and chain) in a "set". I need a set, and not a list, because it automatically removes repeated data. However, when I try to run it from Python (hoping to get around the input problem described above), it stops with the message: name 'close_to_A' not defined. However, it is defined, as shown in the portion of code below, and it does work in Pymol.
...
# Step 3:
# listing of residues sufficiently close to chainA
# that will be considered in calculation of RMSD during trajectory
cmd.load("%s/%s" %(path3,"Average.pdb"), "Average", quiet=0)
cmd.select("around_A", "Average and chain B+C near_to 5 of chain A")
cmd.select("in_A", "Average and chain A near_to 5 of chain B+C")
close_to_A = set()
cmd.iterate("(around_A)","close_to_A.add((chain,resi,resn))")
cmd.iterate("(in_A)","close_to_A.add((chain,resi,resn))")
cmd.delete("around_A")
cmd.delete("in_A")
...
How should I define a set in Python 3? Why does it work when run from Pymol?
I would really appreciate if you could help me to solve these two problems. Thank you in advance,
Best regards,
Luis

Cross validation of dataset separated on files

The dataset I have is split across several files. Each file groups samples that are related to one another, i.e., they were created under similar conditions at a similar time.
The balance of the train/test split is important, so all samples from a file have to go to either train or test; a file cannot be split between them. This makes KFold awkward to use in my scikit-learn code.
Right now, I am using something similar to LOO, along these lines:
train ~> cat ./dataset/!(1.txt)
test ~> cat ./dataset/1.txt
This is not convenient, and not very useful if I want test folds made of several files in order to do a "real" CV.
How could I set up a proper CV to check for real overfitting?
Looking at this answer, I realized that pandas can concatenate dataframes. I checked, and the process is 15-20% slower than the cat command line, but it lets me build folds as I was expecting.
Anyway, I am quite sure there must be a better way than this one:
import glob
import numpy as np
import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20; KFold now lives in model_selection.
from sklearn.model_selection import KFold
allFiles = glob.glob("./dataset/*.txt")
kf = KFold(n_splits=3, shuffle=True)
# Split the list of files, not the rows, so every file stays whole.
for train_files, cv_files in kf.split(allFiles):
    dataTrain = pd.concat((pd.read_csv(allFiles[idTrain], header=None) for idTrain in train_files))
    dataTest = pd.concat((pd.read_csv(allFiles[idTest], header=None) for idTest in cv_files))
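The file-level splitting itself does not need scikit-learn. Here is a minimal standard-library sketch, assuming each file is one indivisible group; file_level_folds is a hypothetical helper name, and the file names in the usage part are placeholders.

```python
import random


def file_level_folds(files, n_folds=3, seed=0):
    """Yield (train_files, test_files) pairs so that no file is ever split."""
    files = list(files)
    random.Random(seed).shuffle(files)  # reproducible shuffle
    # Deal the shuffled files into n_folds groups of near-equal size.
    folds = [files[i::n_folds] for i in range(n_folds)]
    for i, test_files in enumerate(folds):
        train_files = [f for j, fold in enumerate(folds) if j != i for f in fold]
        yield train_files, test_files


if __name__ == "__main__":
    names = [f"./dataset/{i}.txt" for i in range(1, 8)]  # placeholder file names
    for train_files, test_files in file_level_folds(names):
        print(len(train_files), "train files,", len(test_files), "test files")
```

Each pair of file lists can then be passed to pd.concat as in the loop above. If all rows are already loaded with a per-row group label, scikit-learn's GroupKFold in sklearn.model_selection covers the same need.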

How to implement custom Gibbs sampling scheme in pymc

I have a hidden Markov stochastic volatility model (represented as a linear state space model). I am using a hand-written Gibbs sampling scheme to estimate parameters for the model. The actual sampler requires some fairly sophisticated update rules that I believe I need to write by hand. You can see an example of a Julia version of these update rules here.
My question is the following: how can I specify the model in a custom way and then hand the job of running the sampler and collecting the samples to pymc? In other words, I am happy to provide code to do all the heavy lifting (how to update each block of parameters on each scan -- utilizing full conditionals within each block), but I want to let pymc handle the "accounting" for me.
I realize that I will probably need to provide more information so that others can answer this question. The problem is I am not sure exactly what information will be useful. So, if you feel you can help me out with this, but need more information -- please let me know in a comment and I will update the question.
Here is an example of a custom sampler in PyMC2:
# Imports inferred from the names used below (mc is PyMC2, nx is networkx).
import random
import networkx as nx
import pymc as mc
class BDSTMetropolis(mc.Metropolis):
    def __init__(self, stochastic):
        mc.Metropolis.__init__(self, stochastic, scale=1., proposal_sd='custom',
                               proposal_distribution='custom', verbose=None, tally=False)
    def propose(self):
        # Swap one tree edge for a base-graph edge that is not yet in the tree.
        T = self.stochastic.value
        T.u_new, T.v_new = T.edges()[0]
        while T.has_edge(T.u_new, T.v_new):
            T.u_new, T.v_new = random.choice(T.base_graph.edges())
        T.path = nx.shortest_path(T, T.u_new, T.v_new)
        i = random.randrange(len(T.path)-1)
        T.u_old, T.v_old = T.path[i], T.path[i+1]
        T.remove_edge(T.u_old, T.v_old)
        T.add_edge(T.u_new, T.v_new)
        self.stochastic.value = T
    def reject(self):
        # Undo the swap made in propose().
        T = self.stochastic.value
        T.add_edge(T.u_old, T.v_old)
        T.remove_edge(T.u_new, T.v_new)
        self.stochastic.value = T
It is pretty different from your model, but it should demonstrate all the parts. Does that give you enough to go on?
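To make the split between hand-written update rules and sampler "accounting" concrete, here is a library-free sketch. It is not PyMC's API: run_gibbs, update_x and update_y are hypothetical names, and the target is a toy standard bivariate normal with correlation RHO, whose full conditionals are known in closed form.

```python
import math
import random

RHO = 0.8  # correlation of the toy bivariate normal target


def update_x(state, rng):
    """Full conditional: x | y ~ Normal(RHO * y, 1 - RHO**2)."""
    state["x"] = rng.gauss(RHO * state["y"], math.sqrt(1 - RHO ** 2))


def update_y(state, rng):
    """Full conditional: y | x ~ Normal(RHO * x, 1 - RHO**2)."""
    state["y"] = rng.gauss(RHO * state["x"], math.sqrt(1 - RHO ** 2))


def run_gibbs(initial_state, block_updates, n_scans, seed=0):
    """Apply every block update once per scan and record the state after each scan."""
    rng = random.Random(seed)
    state = dict(initial_state)
    trace = {name: [] for name in state}
    for _ in range(n_scans):
        for update in block_updates:
            update(state, rng)
        for name, value in state.items():
            trace[name].append(value)
    return trace


if __name__ == "__main__":
    trace = run_gibbs({"x": 0.0, "y": 0.0}, [update_x, update_y], n_scans=1000)
    print(len(trace["x"]), "samples collected per variable")
```

In PyMC2 the update functions would instead live in step-method subclasses like the one above, with the library doing the looping and tallying.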

What is the correct syntax for using the max function

Still using bloody OpenOffice Writer to customize my sale_order.rml report.
In my sale order I have 6 order lines with 6 different delivery lead times. I need to show the maximum of the six values.
After many attempts I have abandoned the reduce function, as it works erratically or not at all most of the time. I have never seen anything like this.
So I thought I'd give a try using max encapsulating a loop such as:
[[ max(repeatIn(so.order_line.delay,'d')) ]]
My maximum lead time being 20, I would expect to see 20 (yes well that would be too easy, wouldn't it!).
It returns
{'d': 20.0}
At least it contains the value I am after.
But if I try to manipulate this result, it disappears altogether.
I have tried:
int(re.findall(r'[0-9]+', max(repeatIn(so.order_line.delay,'d')))[0])
which works great from the python window, but returns absolutely nothing in OpenERP.
I import the re from my sale_order.py file, which I have recompiled into sale_order.pyo:
import time
import re
from datetime import datetime, timedelta
from report import report_sxw
class order(report_sxw.rml_parse):
    def __init__(self, cr, uid, name, context=None):
        super(order, self).__init__(cr, uid, name, context=context)
        self.localcontext.update({
            'time': time,
            'datetime': datetime,
            'timedelta': timedelta,
            're': re,
        })
I have of course restarted the server many times. My test install sits on windows.
So can anyone tell me what I am doing wrong, because I can make it work from Python but not from OpenOffice Writer!
Thanks for your help!
EDIT 1:
The format
{'d': 20.0}
is, according to Python, a dictionary. In Python, the value can be extracted from a dictionary like so:
>>> dict={'d': 20.0}
>>> print(dict['d'])
20.0
But how can I transpose this to OpenERP writer???
I have managed to get the result I wanted by importing functools and declaring the reduce function within the parameters of the sale_order.py file.
I then simply used a combination of the reduce and max functions, and it works exactly as expected.
The correct syntax is as follows:
repeatIn(objects,'o')
reduce(lambda x, y: max(x, y.delay), o.order_line, 0)
Nothing else is required.
Enjoy!
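For anyone who wants to try the reduce/max combination outside OpenERP, here is a small plain-Python sketch. OrderLine is a hypothetical stand-in for the real order line records, and the delay values are made up.

```python
from collections import namedtuple
from functools import reduce

# Hypothetical stand-in for sale order lines; only the delay field matters here.
OrderLine = namedtuple("OrderLine", ["delay"])
order_lines = [OrderLine(delay=d) for d in (7.0, 20.0, 3.0, 14.0, 10.0, 5.0)]

# Same expression as in the report: fold the lines, keeping the largest delay seen.
max_delay = reduce(lambda best, line: max(best, line.delay), order_lines, 0)

# Equivalent without reduce, for contexts where generator expressions are available.
same_result = max((line.delay for line in order_lines), default=0)

print(max_delay)  # prints 20.0
```

The initial value 0 means an order with no lines gives 0 instead of raising an error.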

Need to get information from Qt4ruby Form's textedit(textbox) and pass back to string for console

I think this problem is best described in code. I'm sure the solution is close, I just haven't been able to find it. I've been looking over the Qt4 api as well as doing tutorials. Here is my code so far:
require 'Qt4'
class PictureCommentForm < Qt::Widget
  def initialize(parent = nil)
    super()
    #setFixedSize(300, 100)
    #comment_text = nil
    picture = Qt::Label.new()
    image = Qt::Image.new('image.jpeg')
    picture.pixmap = image
    comment = Qt::LineEdit.new()
    layout = Qt::VBoxLayout.new()
    layout.addWidget(picture)
    layout.addWidget(comment)
    setLayout(layout)
    connect(comment, SIGNAL('returnPressed()'), self, setCommentText(comment.text) )
  end
  def setCommentText(text)
    #comment_text = text
    $qApp.quit()
  end
end
app = Qt::Application.new(ARGV)
comment_form = PictureCommentForm.new()
comment_form.show()
app.exec
comment_text = comment_form.comment_text
puts "Comment was:\n #{comment_text}"
EDIT: Thanks for that answer integer. All I want done is a dialog box showing a picture and comment so I can get that data. I do plan on making a full GUI version with qt4, but that's for later.
I don't know Ruby, so bear with me, but I use Qt extensively in Python.
First point is that Qt really, really doesn't want to be used the way you're trying to use it. If you're making some sort of script, then Qt wants you to hand your code over so that it can run it when it feels like it:
"We recommend that you connect clean-up code to the aboutToQuit() signal, instead of putting it in your application's main() function because on some platforms the QCoreApplication::exec() call may not return."
Working with Qt you pretty much have to do event-driven programming and give it control of your program flow / main loop.
If you really just want some "utility" that shows some GUI input box and prints whatever the user inputs to console, consider putting the puts directly in whatever function you connected to the text box. Then you can use that program's output in other console scripts.
