Optimizing Groovy Performance

I'm working on Groovy code performance optimization. I've used jvisualvm to connect to the running application and gather CPU samples. The samples say that org.codehaus.groovy.reflection.CachedMethod.invoke takes the most CPU time. I don't see any other application methods in the samples.
What is the right way to dig into CachedMethod.invoke and understand which lines of code really cause the performance penalty?
Thanks.
UPD:
I do use indy; it didn't help.
I didn't try introducing @CompileStatic, since I want to find my bottlenecks before rewriting the Groovy code in Java.
My problem is a bit similar to this thread: Call site caching faster than invokedynamic?
I have code that dynamically composes a Groovy script. The script template looks like this:
def evaluateExpression(Map context) {
    def user = context.user
    %s
}
where %s is replaced with an expression such as
user.attr1 == '1' || user.attr2 == '2' || user.attr3 == '3'
There is a set of such replacements (20 in total) taken from the database.
The code gets the replacements from the DB, creates a GroovyScript and evaluates it.
I suppose the bottleneck is in the script execution. What is the right way to fix it?

So, I've tried various things:
groovy-indy: didn't work.
groovy-indy with some code "optimization": didn't work. BTW, I started to play around with try/catch, and as a result I made my "hotspot" run 4 times faster. I'm not good at JVM internals, but the internet says try/catch prevents optimizations. I took that as ground truth; I need to dig deeper to understand how it really works.
I gave up, turned off invokedynamic and rewrote my "hottest" code with @CompileStatic. It took about 3-4 hours, and my code runs 100 times faster now.
Here are the initial metrics with invokedynamic support:
count = 83043
mean rate = 395.52 calls/second
1-minute rate = 555.30 calls/second
5-minute rate = 217.78 calls/second
15-minute rate = 82.92 calls/second
min = 0.29 milliseconds
max = 12.98 milliseconds
mean = 1.59 milliseconds
stddev = 1.08 milliseconds
median = 1.39 milliseconds
75% <= 2.46 milliseconds
95% <= 3.14 milliseconds
98% <= 3.44 milliseconds
99% <= 3.76 milliseconds
99.9% <= 12.19 milliseconds
Here are the @CompileStatic metrics with indy turned off. BTW, there is no reason to use @CompileStatic if indy is turned on.
count = 139724
mean rate = 8950.43 calls/second
1-minute rate = 2011.54 calls/second
5-minute rate = 426.96 calls/second
15-minute rate = 143.76 calls/second
min = 0.02 milliseconds
max = 24.18 milliseconds
mean = 0.08 milliseconds
stddev = 0.72 milliseconds
median = 0.06 milliseconds
75% <= 0.08 milliseconds
95% <= 0.11 milliseconds
98% <= 0.15 milliseconds
99% <= 0.20 milliseconds
99.9% <= 1.27 milliseconds


Is it possible to vectorize annotation for matplotlib?

As part of a large QC benchmark I am creating a large number (approx. 100K) of scatter plots in a single PDF using the PdfPages backend. (See further down for the code.)
The issue I am having is that the plotting takes too much time; see the output from a custom profiling/debugging effort:
Checkpoint1: Predictions done in 1.110076904296875 millis
Checkpoint2: df created and correlations calculated in 3.108978271484375 millis
Checkpoint3: plotting and accumulating done in 231.31990432739258 millis
Cycle completed in 0.23553895950317383 secs
----------------------
Checkpoint1: Predictions done in 3.718852996826172 millis
Checkpoint2: df created and correlations calculated in 2.353191375732422 millis
Checkpoint3: plotting and accumulating done in 155.93385696411133 millis
Cycle completed in 0.16200590133666992 secs
----------------------
Checkpoint1: Predictions done in 2.920866012573242 millis
Checkpoint2: df created and correlations calculated in 1.995086669921875 millis
Checkpoint3: plotting and accumulating done in 161.8819236755371 millis
Cycle completed in 0.16679787635803223 secs
The plotting time increases 2-3x if I annotate the points, which is necessary for the use case. As you can see below, I have tried both itertuples() and apply(); switching to apply did not give a significant change in the times as far as I can see.
import matplotlib.pyplot as plt

def annotate(row, ax):
    # Label each point with its index value, offset by (10, 20) points
    ax.annotate(row.name, (row.exp, row.model),
                xytext=(10, 20), textcoords='offset points',
                arrowprops=dict(arrowstyle="-", connectionstyle="arc,angleA=180,armA=10"),
                family='sans-serif', fontsize=8, color='darkslategrey')

def plot2File(df, file, seq, z, p, s):
    """ Plot predictions vs experimental """
    plttitle = f"Correlations for {seq}+{z} \n pearson={p} \n spearman={s}"
    ax = df.plot(x='exp', y='model', kind='scatter', title=plttitle, s=40)
    df.apply(annotate, ax=ax, axis=1)
    # for row in df.itertuples():
    #     ax.annotate(row.Index, (row.exp, row.model),
    #                 xytext=(10, 20), textcoords='offset points',
    #                 arrowprops=dict(arrowstyle="-", connectionstyle="arc,angleA=180,armA=10"),
    #                 family='sans-serif', fontsize=8, color='darkslategrey')
    plt.savefig(file, bbox_inches='tight', format='pdf')
    plt.close()
Given the nice explanation by Jeff on a question regarding iterrows(), I was wondering whether it would be possible to vectorize the annotation process. Or should I ditch the DataFrame altogether?

Whenever I run a JMeter test with fewer than 10 Thread Groups, "Throughput" always shows numbers in "Minutes"

When I execute a test in JMeter with fewer than 10 Thread Groups, the Throughput column in the Summary Report shows the result per minute.
Can anyone please help me?
As per the RateRenderer class source:
String unit = "sec";
if (rate < 1.0) {
    rate *= 60.0;
    unit = "min";
}
if (rate < 1.0) {
    rate *= 60.0;
    unit = "hour";
}
setText(formatter.format(rate) + "/" + unit);
So:
If your throughput is 1 or more, the time unit is "seconds".
If your throughput is less than 1, it is multiplied by 60 and the time unit is set to "minutes".
If, after conversion to "minutes", it is still less than 1, it is multiplied by 60 again and the time unit is set to "hours".
If you need the throughput in hits per second from a per-minute value, just divide the value by 60.
Other options are:
Patch the RateRenderer class and comment out the two "if" clauses above.
Use an external third-party tool like BM.Sense for JMeter results analysis.
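The unit selection above can be reproduced in a few lines. This is an illustrative Python port of the quoted RateRenderer snippet, not JMeter code; format_rate and per_minute_to_per_second are hypothetical names, and the two-decimal formatting is an assumption (JMeter uses its own number formatter).

```python
def format_rate(rate_per_sec: float) -> str:
    """Illustrative port of the unit selection in JMeter's RateRenderer snippet."""
    rate = rate_per_sec
    unit = "sec"
    # Less than one request per second: rescale to requests per minute.
    if rate < 1.0:
        rate *= 60.0
        unit = "min"
    # Still less than one per minute: rescale again to requests per hour.
    if rate < 1.0:
        rate *= 60.0
        unit = "hour"
    # Two-decimal output is an assumption; JMeter uses its own formatter.
    return f"{rate:.2f}/{unit}"


def per_minute_to_per_second(rate_per_min: float) -> float:
    """Convert a '/min' Summary Report figure back to requests per second."""
    return rate_per_min / 60.0


if __name__ == "__main__":
    print(format_rate(5.0))    # 5.00/sec
    print(format_rate(0.5))    # 30.00/min
    print(format_rate(0.01))   # 36.00/hour
    print(per_minute_to_per_second(30.0))  # 0.5
```

So a Summary Report value of 30.00/min corresponds to 0.5 requests per second; the unit only switches to minutes because the rate dropped below 1/sec.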

JMeter: Summary Report: Throughput

Is the total throughput shown in the last row of the Summary Report correct? I'm using JMeter 2.11.
I find it difficult to reproduce the displayed figure by calculation.
I followed the formula (x/sec): number of requests / total response time (in sec),
or 1 / average response time (in sec).
For example: 50 requests with an average response time of 2000 ms each give throughput = 50/(50*2) = 0.5/sec.
But JMeter shows a value different from 0.5/sec (30/min).
Can someone help me here?
I had a similar assumption too, but this is the formula for calculating throughput:
endTime = lastSampleStartTime + lastSampleLoadTime
startTime = firstSampleStartTime
conversion = unit time conversion value
Throughput = NumRequests / ((endTime - startTime) * conversion)
(I got this a few months back from the answer below.)
Calculating throughput from Jmeter jtl log file
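To see why JMeter's figure can differ from the per-request formula in the question, here is a small Python sketch of the wall-clock formula above. It assumes samples are (start_ms, elapsed_ms) pairs and takes endTime as the latest sample end time; the function names are hypothetical. With one thread running 50 back-to-back 2000 ms requests, both formulas give 0.5/sec, but with 50 concurrent threads the wall-clock window shrinks to 2 s, so throughput becomes 25/sec while the naive formula still reports 0.5/sec.

```python
def jmeter_style_throughput(samples):
    """Requests per second over the wall-clock window from first start to last end.

    samples: list of (start_ms, elapsed_ms) pairs (hypothetical input format).
    endTime is taken as the latest sample end, an assumption for this sketch.
    """
    start = min(s for s, _ in samples)
    end = max(s + e for s, e in samples)
    return len(samples) / ((end - start) / 1000.0)


def naive_throughput(samples):
    """The question's formula: requests / sum of response times (in seconds)."""
    total_sec = sum(e for _, e in samples) / 1000.0
    return len(samples) / total_sec


# One thread: 50 back-to-back requests of 2000 ms each.
sequential = [(i * 2000, 2000) for i in range(50)]
# 50 threads: all 50 requests start together and each takes 2000 ms.
concurrent = [(0, 2000) for _ in range(50)]

print(jmeter_style_throughput(sequential), naive_throughput(sequential))  # 0.5 0.5
print(jmeter_style_throughput(concurrent), naive_throughput(concurrent))  # 25.0 0.5
```

The naive formula effectively assumes requests never overlap, which only holds for a single thread.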

CPLEX prints a lot to the terminal although the corresponding parameters are set

I am using CPLEX in C++.
After googling I found out which parameters need to be set to stop CPLEX from printing to the terminal, and I use them like this:
IloCplex cplex(model);
std::ofstream logfile("cplex.log");
cplex.setOut(logfile);
cplex.setWarning(logfile);
cplex.setError(logfile);
cplex.setParam(IloCplex::MIPInterval, 1000);//Controls the frequency of node logging when MIPDISPLAY is set higher than 1.
cplex.setParam(IloCplex::MIPDisplay, 0);//MIP node log display information-No display until optimal solution has been found
cplex.setParam(IloCplex::SimDisplay, 0);//No iteration messages until solution
cplex.setParam(IloCplex::BarDisplay, 0);//No progress information
cplex.setParam(IloCplex::NetDisplay, 0);//Network logging display indicator
if ( !cplex.solve() ) {
....
}
but CPLEX still prints things like:
Warning: Bound infeasibility column 'x11'.
Presolve time = 0.00 sec. (0.00 ticks)
Root node processing (before b&c):
Real time = 0.00 sec. (0.01 ticks)
Parallel b&c, 4 threads:
Real time = 0.00 sec. (0.00 ticks)
Sync time (average) = 0.00 sec.
Wait time (average) = 0.00 sec.
------------
Total (root+branch&cut) = 0.00 sec. (0.01 ticks)
Is there any way to avoid printing them?
Use the setOut method from the IloAlgorithm class (IloCplex inherits from IloAlgorithm). You can pass a null output stream as the parameter to prevent messages from being logged to the screen.
This is what works in C++ according to the CPLEX parameters documentation:
cplex.setOut(env.getNullStream());
cplex.setWarning(env.getNullStream());
cplex.setError(env.getNullStream());

Rolling list over unequal times in XTS

I have stock data at the tick level and would like to create a rolling list of all ticks for the previous 10 seconds. The code below works, but takes a very long time for large amounts of data. I'd like to vectorize this process or otherwise make it faster, but I'm not coming up with anything. Any suggestions or nudges in the right direction would be appreciated.
library(quantmod)
set.seed(150)
# Create five minutes of xts example data at .1 second intervals
mins <- 5
ticks <- mins * 60 * 10 + 1
times <- xts(runif(seq_len(ticks),1,100), order.by=seq(as.POSIXct("1973-03-17 09:00:00"),
as.POSIXct("1973-03-17 09:05:00"), length = ticks))
# Randomly remove some ticks to create unequal intervals
times <- times[runif(seq_along(times))>.3]
# Number of seconds to look back
lookback <- 10
dist.list <- list(rep(NA, nrow(times)))
system.time(
  for (i in 1:length(times)) {
    dist.list[[i]] <- times[paste(strptime(index(times[i])-(lookback-1), format = "%Y-%m-%d %H:%M:%S"), "/",
                                  strptime(index(times[i])-1, format = "%Y-%m-%d %H:%M:%S"), sep = "")]
  }
)
#   user  system elapsed
#   6.12    0.00    5.85
You should check out the window function; it will make your subselection of dates a lot easier. The following code uses lapply to do the work of the for loop.
# Your code
system.time(
  for (i in 1:length(times)) {
    dist.list[[i]] <- times[paste(strptime(index(times[i])-(lookback-1), format = "%Y-%m-%d %H:%M:%S"), "/",
                                  strptime(index(times[i])-1, format = "%Y-%m-%d %H:%M:%S"), sep = "")]
  }
)
#   user  system elapsed
#  10.09    0.00   10.11
# My code
system.time(
  dist.list <- lapply(index(times),
                      function(x) window(times, start = x - lookback - 1, end = x))
)
#   user  system elapsed
#   3.02    0.00    3.03
So it runs in about a third of the time (roughly 3x faster).
But, if you really want to speed things up, and you are willing to forgo millisecond accuracy (which I think your original method implicitly does), you could just run the loop on unique date-hour-second combinations, because they will all return the same time window. This should speed things up roughly twenty or thirty times:
dat.time=unique(as.POSIXct(as.character(index(times)))) # Cheesy method to drop the ms.
system.time(dist.list.2<-lapply(dat.time,function(x) window(times,start=x-lookback-1,end=x)))
# user system elapsed
# 0.37 0.00 0.39
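For comparison, here is the same lookback-window idea sketched in Python rather than R (the language, the function name rolling_windows and the sample data are illustrative assumptions, not part of the answer). Because the timestamps are sorted, each window is a contiguous slice, so binary search via the standard bisect module finds its bounds without any string-based subsetting. This is a swapped-in technique, not a description of what xts's window() does internally, and the window is assumed to be [t - lookback, t] in seconds.

```python
from bisect import bisect_left, bisect_right


def rolling_windows(times, values, lookback):
    """For each tick, return the values whose timestamps fall in [t - lookback, t].

    times: timestamps in seconds, sorted ascending (an assumption of this sketch).
    values: observations aligned with times.
    """
    windows = []
    for t in times:
        lo = bisect_left(times, t - lookback)  # first index with time >= t - lookback
        hi = bisect_right(times, t)            # one past the last index with time <= t
        windows.append(values[lo:hi])
    return windows


# Hypothetical unevenly spaced ticks (seconds) and prices.
times = [0.0, 0.5, 1.2, 3.0, 10.5, 11.0]
values = [1, 2, 3, 4, 5, 6]
for t, w in zip(times, rolling_windows(times, values, lookback=10)):
    print(t, w)
```

Each boundary lookup is O(log n); the remaining cost is copying each slice. The answer's trick of computing one window per distinct second could be layered on top by caching windows keyed on the truncated timestamp.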
