Is there a chart/listener in JMeter that would show data of different runs in same graph? - jmeter

I have measured separately our application server CPU with 5 users and 15 users. Both runs took 5 minutes. I would like to show both runs and CPU in one graph. Currently the only way to do it is to export CSV to Excel and create custom graphs. This is silly because the graphs are quite good in JMeter itself.
JMeter plugin extensions contains a graph called Composite graph, but the use-case seems to be to be able to add many data points to one and the same graph from within the same run. It doesn't seem to be able to merge graphs from two separate thread runs because the time in the two are different (the two runs have been executed in succession and therefore the graphs are shown after one-another, not at the same time).
Any ideas? Thanks!

It looks like it can be done using the information here jmeter_wiki but that it will still require some effort.

Related

OpenMDAO - information on cycles

In OpenMDAO, is there any way to get the analytics about the execution of the nonlinear solvers within a coupled model (containing multiple cycles and subcycles), such as the number of iterations within each of the cycles, and execution time?
Though there is no specific functionality to get this exact data, you should be able to get the information you need from case record data which includes iteration counts and time-stamps. So you'd have to do a bit of analysis on the first/last case of a specific run of a solver to compute the run times. Iteration counts should be very strait forward.
This question seems closely related to another one, recently posted which did identify a bug in OpenMDAO. (Issue #2453). Until that bug is fixed, you'll need to use the case names to separate out which cases belong to which cycles, since you can only currently add the recorders to the components/groups and not to the nested solvers themselves. But the naming of the cases should still allow you to pull the data you need out.

Designing an algorithm for detecting anamoly and statistical significance for ordinal data using python

Firstly, I would like to apologise for the detailed problem statement. Being a novice, I couldn't express it in any lesser words.
Environment Setup Details:
To give some background, I work in a cloud company where we have multiple servers geographically located in all continents. So, we have hierarchy like this:
Several partitions
Each partition has 7 pop's
Each pop has multiple nodes all set up with redundancy.
Turn servers connecting traffic to each node depending on the client location
Actual clients-ios, android, mac, windows,etc.
Now, every time the user uses our product/service, he leaves a rating out of 5, 5 being outstanding. This data is stored in our databases and we mine it and analyse it to pin-point the exact issue on any particular day.
For example, if the users from Asia are giving more bad ratings on Tuesday this week than a usual Tuesday, what factors can cause this - is it something to do with clients app version, or server release , physical factors, loss, increased round trip delay etc.
What we have done:
Till now we have been using visualization tools to track each of these metrics separately per day to see the trends and detect the issues manually.
But, due to growing micr-services, it is becoming difficult day by day. Now, we want to automate it using python/pandas.
What I want to do:
If the ratings drop on a particular day/hour, I run the script and it should do all the manual work by taking all the permutations and combinations of all factors and list out the exact combinations which could have lead to the drop.
The second step would be to check whether the drop was significant due to varying number of ratings.
What I know:
I understand that I can do this using pandas by creating a dataframe for each predictor variable and trying to do it per variable.
And then I can apply tests like whitney test etc for ordinal data.
What I need help with:
But I just wanted to know if there is a better way to do it? It is perfectly fine if there is a learning curve involved. I can learn and do it. I just wanted some help in choosing the right approach for this.

What data structure and algorithms to use to optimize concurrent jobs?

I have a series of file-watchers that trigger jobs. The file-watchers look, every fixed interval of time, in their list and, if they find a file, they trigger a job. If not, they wait, coming back after that mentioned interval.
Some jobs are dependent on others, so running them in a proper order and with proper parallelism would be a good optimization. But I do not want to think about this myself.
What data structure and algorithms should I use to ask a computer to tell me what job to assign to what file-watcher (and in what order to put them)?
As input, I have the dependencies between the jobs, the arrival time of files for each job and a number of watchers. (For starter, I will pretend each jobs takes same amount of time). How do I spread the jobs between the watchers, to avoid unnecessary waiting gaps and to obtain faster run time?
(I am looking forward tackling this optimization in an algorithmic way, but would like to start with some expert advice)
EDIT : so far I understood the fact the I need a DAG (Directed acyclic graph) to represent the dependencies and that I need to play with Topological sorting in order to optimize. But this responds with a one execution line, one thread. What if I have more, say 7?

K-Means on time series data with Apache Spark

I have a data pipeline system where all events are stored in Apache Kafka. There is an event processing layer, which consumes and transforms that data (time series) and then stores the resulting data set into Apache Cassandra.
Now I want to use Apache Spark in order train some machine learning models for anomaly detection. The idea is to run the k-means algorithm on the past data for example for every single hour in a day.
For example, I can select all events from 4pm-5pm and build a model for that interval. If I apply this approach, I will get exactly 24 models (centroids for every single hour).
If the algorithm performs well, I can reduce the size of my interval to be for example 5 minutes.
Is it a good approach to do anomaly detection on time series data?
I have to say that strategy is good to find the Outliers but you need to take care of few steps. First, using all events of every 5 minutes to create a new Centroid for event. I think tahat could be not a good idea.
Because using too many centroids you can make really hard to find the Outliers, and that is what you don't want.
So let's see a good strategy:
Find a good number of K for your K-means.
That is reall important for that, if you have too many or too few you can take a bad representation of the reality. So select a good K
Take a good Training set
So, you don't need to use all the data to create a model every time and every day. You should take a example of what is your normal. You don't need to take what is not your normal because this is what you want to find. So use this to create your model and then find the Clusters.
Test it!
You need to test if it is working fine or not. Do you have any example of what you see that is strange? And you have a set that you now that is not strange. Take this an check if it is working or not. To help with it you can use Cross Validation
So, your Idea is good? Yes! It works, but make sure to not do over working in the cluster. And of course you can take your data sets of every day to train even more your model. But make this process of find the centroids once a day. And let the Euclidian distance method find what is or not in your groups.
I hope that I helped you!

d3js force large number of nodes

pl. help me with this noob questions. I want to show a network with large number (70000) of nodes, and 2.1 million links in force layout. Looking for a good and scalable way to do this.
How do we actually show such large nodes practically, can we do some kind of approximation and show semantically same network (e.g: http://www.visualcomplexity.com/vc/project.cfm?id=76 )
How do we actually reduce such data in back end [ say using KDE ? We cannot afford to use science.js in front end as the volume is large ]
Initial view can be the network with pre-determined locations of the nodes or clusters. How do we predertmine the locations in back end, before sending the data to d3js. Do we have to use topojson ?
Any such examples are available using d3js (and a backend - say java, python etc) ?
Sorry about the question, but do you really need to show all that information in one shot?
If you really need it, have first a look with Gephi and see what it looks like, then pass to the next step.
If you see that you can focus on specific nodes or patterns at the beginning and then explore the result of the chart, probably this is the best solution from a performance point of view.
In case the discovery approach works but you are still having troubles with many items on the screen, just control the force layout with a time based threshold. It's not perfect but it will work for hundred nodes.
Next step
If you decide to go anyway on this path, I would recommend the followings:
Aggregate: that's probably the most useful thing you can do here: let the user interact with the data and dig in it to see more in detail. That is the best solution if you have to serve many clients.
Do not run the force directed layout on the front end with the entire network as is: it will eat all the browser resources for at least tens of minutes in any case.
Compute the layout on the back end - e.g. using JUNG or Gephi core itself in Java or NetworkX in Python - and then just display the result.
Cache the result of the point above as well: they are many even for the server if you have many clients, so cache it.
When the user drag the network, hide the links: it should speed up the computation ( sigmajs uses this trick)

Resources