Exporting time series response data for VS2013 load tests - visual-studio-2013

I am trying to figure out how to export and then analyze the results of a load test, but after the test is over I cannot seem to find the data for each individual request by URL. This data shows during the load test itself, but once it is over that data no longer seems accessible and all I can find are totals. The data I want is what appears under the "Page response time" graph in the graphs window during the test. I know this is not the response time for every single request and is probably averaged, but that would suffice for the calculations I want to make.
I have looked in the database on my local machine (LoadTest2010, where all of the summary data is stored) and I cannot find the data I'm looking for. I am load testing a single-page application, FYI.
My goal is to plot (probably in Excel) each request URL against the user load and analyze the slope of the response time averages to determine which requests scale the worst (and best). During the load test I can see this data and get a visual idea, but once it ends I cannot seem to find it to export.
A) Can this data be exported from within Visual Studio? Is there a setting required to make VS persist this data to the database? Under Run Settings, in the "Results" section, I have "Timing Details Storage" set to "All individual details" and the Storage Type set to "Database".
B) Is this data in any of the tables in the LoadTest2010 database where all of the summary data is stored? It might be easier to query manually if it's not spread out too much, but all I was able to find was summary data.

I was able to find the data I wanted in the database. The tables I needed were WebLoadTestRequestMap (which has the request URIs) and LoadTestPageDetail (which has the individual response times). They can be joined, somewhat unintuitively, on WebLoadTestRequestMap.requestId and LoadTestPageDetail.pageId.
I do have the "Results" section's "Timing Details Storage" set to "All individual details" and the Storage Type set to "Database", yet it did not seem like every load test's results were available, maybe because of this setting.
More detail on the layout of the load test database is available here: http://blogs.msdn.com/b/slumley/archive/2010/02/12/description-of-tables-and-columns-in-vs-2010-load-test-database.aspx
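For reference, here is a minimal Python sketch (assuming pyodbc and a SQL Server ODBC driver) of pulling that join out into a CSV for plotting in Excel. The column names (RequestUri, ResponseTime, LoadTestRunId) are assumptions based on the table descriptions in the blog post linked above, so verify them against your own LoadTest2010 schema before running.

```python
# Hedged sketch: export per-request response times from the LoadTest2010
# results database to CSV for plotting in Excel. Column names are assumptions
# based on the linked table descriptions -- check them against your schema.
import csv
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=LoadTest2010;Trusted_Connection=yes;"
)
cursor = conn.cursor()

sql = """
SELECT m.RequestUri, d.ResponseTime
FROM   LoadTestPageDetail AS d
JOIN   WebLoadTestRequestMap AS m
       ON m.RequestId = d.PageId        -- the (unintuitive) join described above
WHERE  d.LoadTestRunId = ?              -- restrict to the run you want to export
"""

with open("page_response_times.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["RequestUri", "ResponseTime"])
    for row in cursor.execute(sql, 42):  # 42 = hypothetical LoadTestRunId
        writer.writerow([row.RequestUri, row.ResponseTime])
```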

Related

Automating Validation of PDF Prepared Reports

Our team uses Spotfire to host online analyses and also to prepare monthly reports. One pain point we have is around validation. The reports are all prepared reports, and the process for creating them each month is as simple as 1) refresh the data (through an Infolink connected to Oracle) and 2) press a button to export each report. The format of the final product is a PDF.
The issue is that there are a lot of small things that can go wrong with the reports (filter accidentally applied, wrong month selected, data didn't refresh, new department not grouped correctly, etc.) meaning that someone on our team has to manually validate each of the reports. We create almost 20 reports each month and some of them are as many as 100 pages.
We've done a great job automating the creation of the reports, but now we have this weird imbalance where it takes like 25 minutes to create all the reports but 4+ hours to validate each one.
Does anyone know of a good way to automate, or even cut down, the time we have to spend each month validating the reports? I did a brief Google search and all I could find was in the realm of validating reports to meet government regulation standards.
It depends on two factors:
Do your reports have the same template (format) each time you extract them? You said that you pull them out automatically, so I guess the answer is yes.
What exactly are you trying to check/validate? You need a clear list of what you are validating. You mentioned the month, the grouping, and data values (for the refresh). The clearer the picture you have of what to validate, the more likely the process can be fully automated.
There are so-called RPA (robotic process automation) tools that can automate complex workflows.
A "data extract" task, which is part of a workflow, can detect and collect data from documents (PDF for example).
A robot that runs on the validating machine can:
batch read all your PDF reports from specified locations on your computer (or on another computer);
based on predefined templates it can read through the documents for specific fields that you specify (through defined anchors on the templates) and collect the exact data from there;
compare the extracted data with the baseline that you set (compare the month to be correct, compare a data field to confirm proper refresh of the data, another data field to confirm grouping, etc.);
It takes a bit of time to dissect the PDF for each report template and correctly set the anchors, but then it runs seamlessly each time.
One such tool I used is called Atomatik. It has a studio environment where you design the robot (or robots) and run the process.
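For a rough sense of what the "extract and compare against a baseline" step can look like without a full RPA suite, here is a minimal Python sketch using the pypdf library; the report names and expected strings are hypothetical placeholders, and a real RPA template with anchors would be more robust than plain text matching.

```python
# Minimal sketch of "extract text and compare to a baseline" for PDF reports.
# Report file names and the expected strings are hypothetical placeholders.
from pathlib import Path
from pypdf import PdfReader

# Per-report checks: strings that must appear on the first page.
BASELINE = {
    "sales_report.pdf": ["March 2024", "Department: All"],
    "ops_report.pdf": ["March 2024", "Refreshed:"],
}

def validate(report_dir: str) -> list[str]:
    """Return human-readable failures across all reports in report_dir."""
    failures = []
    for name, expected_strings in BASELINE.items():
        path = Path(report_dir) / name
        if not path.exists():
            failures.append(f"{name}: file missing")
            continue
        first_page_text = PdfReader(str(path)).pages[0].extract_text() or ""
        for expected in expected_strings:
            if expected not in first_page_text:
                failures.append(f"{name}: expected '{expected}' not found")
    return failures

if __name__ == "__main__":
    for failure in validate("monthly_reports"):
        print(failure)
```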

Transform data before adding to cache system or after when reading it

I have a situation and I don't know which of two approaches would fit better.
I have a solution where users search for soccer players. I only store their names and teams, but when a user comes to my website and clicks on a player, I show detailed information about that player which I get from various external providers (usually chosen by country).
I know which external provider to use for each call, and I pay the providers each time I fetch data. To mitigate this, I try to fetch as rarely as possible: I fetch once, when a user clicks on the player info, and the next time, if the data is in my database cache, I show the cached info. After 10 days I fetch the specific player again from the external provider, as I want the info to stay reasonably up to date.
I need to transform the data coming from the different providers, which usually arrives as JSON, into my own structure so I can handle it consistently; I have my own object structure, so the fields coming from the external providers always map to the same naming and structure in my code.
So, my problem is deciding when I should map/transform the data coming from the providers:
Option 1: I fetch data from the provider, transform it into my JSON structure, and store it in the database cache once, in my main structure. Then, in my solution code, every time a user clicks on the soccer player details I read this JSON field from the database cache and convert it directly to an object I know how to use.
Option 2: I fetch data from the provider and keep it as-is in my database cache. Then, in my solution code, every time someone clicks to get the soccer player detail info, I read the JSON record from my database cache, transform it to my naming and structure, and convert it to an object.
Notes:
- this is a cache database; records won't be kept forever. If, on a call, I see that a record is more than 10 days old, I will get new data from the appropriate external provider
Deciding the layer to cache data is an art form all its own. The higher the layer you cache data, the more performant it will be (less reprocessing needed), but the lower the re-use potential will be (different parts of the application may use the same cache, and find value if it hasn’t been transformed too much).
Yours is another case of this. If you store it as the provider provides it, and you need to change the way you transform it, you won’t have to pay to re-retrieve it. If on the other hand, you store it as you need it now, you may have to discard it all if you decide to change the transformation method.
Like all architectural design decisions, it's all about trade-offs. You have to decide what is more important to you and your application.
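To make the trade-off concrete, here is a minimal Python sketch of option 1 (transform before caching); the cache and provider objects, their methods, and the field names are hypothetical stand-ins rather than a real API. Option 2 would store the raw payload instead and call the mapping function after every cache read.

```python
# Minimal sketch of option 1 (transform before caching), with a 10-day TTL.
# The cache/provider objects and field names are hypothetical.
import json
from datetime import datetime, timedelta, timezone

CACHE_TTL = timedelta(days=10)

def to_internal(provider_payload: dict) -> dict:
    """Map a provider-specific payload to the internal player structure."""
    return {
        "name": provider_payload.get("fullName") or provider_payload.get("name"),
        "team": provider_payload.get("club", {}).get("name"),
        "position": provider_payload.get("pos"),
    }

def get_player(player_id: str, cache, provider) -> dict:
    row = cache.get(player_id)  # e.g. {"data": "...json...", "stored_at": datetime}
    if row and datetime.now(timezone.utc) - row["stored_at"] < CACHE_TTL:
        return json.loads(row["data"])          # already in the internal structure
    payload = provider.fetch_player(player_id)  # the paid external call
    player = to_internal(payload)               # transform once, on write
    cache.put(player_id, json.dumps(player), datetime.now(timezone.utc))
    return player
```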

How does ssrs cache reports that contain subreports

I'm trying to figure out how SSRS handles caching of subreports. The report I have has a lot going on: dynamic formatting, dynamic links, dynamic graphs, etc. Because of all this, it takes quite a while to load (about 10 seconds), and every time you click on something in the report, it has to reload everything (another 10 seconds). I looked into caching options, but the problem is I need the data shown in the graphs to be the live data (the rest of the report doesn't display any live data).
I came up with an idea to put the graphs in a subreport, so that I could cache the main report, and only the subreport would have to be reprocessed on every load. My thinking was that this would significantly cut down on the processing time for the whole report, and I could schedule the cache to be preloaded every night.
I know that Reporting Services isn't the ideal way to deliver this type of thing, and creating it as a website would yield much better performance, but my company wants to try and make it work with SSRS first.
Does anyone know if this would work? Does SSRS cache subreports separately from their parent reports, or would caching the parent report also cache the subreport?
Subreports are processed at the same time as the main report you are calling, so you would need to set up caching on the main report to cache the contents of the subreport.
For the components that do not require live data, you can cache shared datasets to improve report processing time. Separate the data sources for the items that need live data feeds from the items you can cache.
https://msdn.microsoft.com/en-us/library/ee636149.aspx
To open the Caching properties page for a shared dataset:
1. Open Report Manager, and locate the report for which you want to configure shared dataset properties.
2. Point to the shared dataset, and click the drop-down arrow.
3. In the drop-down list, click Manage. The General properties page for the report opens.
4. Click the Caching tab.
http://i.stack.imgur.com/vFHQP.png

Queries over large data tables

We have a large dataset of unstructured data (Azure Blob) and have started noticing that refreshing our model gets quite slow after a few thousand records are being loaded.
Our current query structure is:
#"Load Data"
Loads data from the Azure Blob, ~1000 files
Parses the files into a table with 3 columns (of list/record types which can be further expanded), ~700k rows
#"Sessions"
Reference #"Load Data"
Expand all 'Session' related columns
#"Users"
Reference #"Load Data"
Expand all 'User' related columns
#"Events"
Reference #"Load Data"
Expand all 'Event' related columns
#"Events By Name"
Reference #"Events"
Groups by 'event.name' and generates a column of tables, one per event type, holding that type's events and properties (these vary between events)
#"Event Name1" (2, 3, etc. one table per event type)
Reference #"Events by Name"
Expands that event name's Table, and generates a table with event.id and each of the properties for that event type
While running this and watching the resource monitor, the memory usage goes through the roof, eventually causing tons of hard faults and heavy disk usage. From looking at the query execution popup, it seems a bunch of queries kick off and run in parallel.
If I load the data from a local folder, the queries all seem to fetch the data, go through the files, and load the referenced common queries in parallel. I believe this is what's causing the memory usage to go haywire, the disk to kick in, and the queries to take hours to run.
I assumed referenced queries would run once first and then have their resulting tables referenced by the individual queries using them, but that doesn't seem to be the case. I've also tried using Table.Buffer as the last step of #"Load Data" and #"Events", in an attempt to have those queries computed once and then shared across dependents, but that only seemed to make things worse. Are there ways to:
Make a query run only once, and have its result passed forward to any queries referencing it
Prevent queries from running in parallel, and run sequentially instead
Am I just looking at this the wrong way? A lot of 'performance' articles I found only mention structuring your queries to allow query folding. However, that is not a possibility in our case, as the Azure Blob storage really just stores 'blob' files which have to be loaded and parsed locally.
It has been a real struggle to get these queries running on our current 700k test events, and we expect that to go up to millions in the real environment. Is our only option to pre-process the blobs, push the data into a SQL database, and link our model to that instead?
Process your data first and store it in a table in your database, then use that table as the data source for your model. Refresh the data in the source table with a job that runs on a scheduled interval and updates the table.
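As an illustration of that approach (not the asker's actual pipeline), here is a small Python sketch that parses exported blob files once and loads them into a database staging table; the file layout, the field names, and the use of sqlite3 are all assumptions for the example, and any relational database would do.

```python
# Sketch: parse raw export files once, load them into a staging table, and
# point the model at that table. File layout and field names are assumptions.
import glob
import json
import sqlite3

def load_events(blob_dir: str, db_path: str) -> None:
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               event_id   TEXT PRIMARY KEY,
               event_name TEXT,
               session_id TEXT,
               user_id    TEXT,
               payload    TEXT)"""
    )
    for path in glob.glob(f"{blob_dir}/*.json"):
        with open(path, encoding="utf-8") as f:
            for record in json.load(f):  # assumes each file holds a list of records
                conn.execute(
                    "INSERT OR REPLACE INTO events VALUES (?, ?, ?, ?, ?)",
                    (record["event"]["id"], record["event"]["name"],
                     record["session"]["id"], record["user"]["id"],
                     json.dumps(record)),
                )
    conn.commit()
    conn.close()

# Run this as a scheduled job, then connect the model to the 'events' table so
# each refresh reads pre-processed rows instead of re-parsing every blob.
```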

Multidimensional data types

So I was thinking... Imagine you have to write a program that would represent a schedule of a whole college.
That schedule has several dimensions (e.g.):
time
location
individual(s) attending it
lecturer(s)
subject
You would have to be able to display the schedule from several standpoints:
everything held in one location in a certain timeframe
everything attended by an individual in a certain timeframe
everything lectured by a certain lecturer in a certain timeframe
etc.
How would you save such data, and yet keep the ability to view it from different angles?
The only way I could think of was to save it in every form you might need it in:
E.g. you have a folder "students" in which each student has a file containing when, why, and where he has to be. However, you also have a folder "locations" in which each location has a file containing who has to be there, when, and why. The more angles you have, the more the size-per-info ratio increases.
But that seems highly inefficient, space-wise.
Is there any other way?
My knowledge of JavaScript is zero, but I wonder if such things would be possible with it, even in this space-inefficient form.
If not that, I wonder if it would work in any other standard (C++, C#, Java, etc.) language, primarily in Java...
EDIT: Could this be done by using MySQL database?
Basically, you are trying to first store data and then present it under different views.
SQL databases were made exactly for that: on one side you build a schema and instantiate it in a database to store your data (the language for this is called the Data Definition Language, DDL), then you make requests on it with the query language (SQL), which gives you what you call "views". There are even "view" objects in SQL databases to build these views inside the database (rather than having to keep the query code in the user code).
MySQL can certainly do that; note that it is also possible to compile an SQL engine to JavaScript (SQLite, for example) and use local web storage to hold the data.
There is another aspect to your question: optimization of the queries. While SQL can do most of the query work for your views, it is sometimes preferable to create actual copies of the query results in so-called "data marts" (this is called de-normalizing a request), so that the hard work of selecting or computing aggregate/group functions and so on is done once per period of time (imagine that a specific view changes only on Mondays), and requesters then just have to read these results. In this case it is important to separate, at least semantically, primary data from secondary data (and for performance/user-rights reasons, physical separation is often a good idea).
Note that since you cited MySQL, I wrote about SQL, but almost any database technology could do what you are trying to do (hierarchical, object-oriented, XML...), as long as the particular implementation you use is flexible enough for your data and requests.
So in short:
I would use a SQL database to store the data
make appropriate views / requests
if I need huge request performance, make appropriate de-normalized data available
the language is not important there, any will do
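As a concrete illustration of those points, here is a small sketch using Python's built-in sqlite3 module (the same schema and view would work in MySQL); the table, column, and view names are made up for the example.

```python
# Illustrative sketch: one relational schema, viewed from several "angles".
# Table, column, and view names are invented for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lecture (
    id         INTEGER PRIMARY KEY,
    subject    TEXT NOT NULL,
    lecturer   TEXT NOT NULL,
    location   TEXT NOT NULL,
    starts_at  TEXT NOT NULL,   -- ISO 8601 timestamps
    ends_at    TEXT NOT NULL
);
CREATE TABLE attendance (        -- the individual(s) dimension
    lecture_id INTEGER REFERENCES lecture(id),
    student    TEXT NOT NULL
);

-- One "angle" on the data: everything happening in a given location.
CREATE VIEW schedule_by_location AS
SELECT location, starts_at, ends_at, subject, lecturer
FROM lecture;
""")

conn.execute("INSERT INTO lecture VALUES (1, 'Algebra', 'Dr. Novak', 'Room 101', "
             "'2024-09-02T09:00', '2024-09-02T10:30')")
conn.execute("INSERT INTO attendance VALUES (1, 'Alice')")

# Everything held in one location in a certain timeframe:
rows = conn.execute(
    "SELECT subject, lecturer, starts_at FROM schedule_by_location "
    "WHERE location = ? AND starts_at BETWEEN ? AND ?",
    ("Room 101", "2024-09-01", "2024-09-30"),
).fetchall()
print(rows)
```

The data is stored once; each additional "angle" (by student, by lecturer, by subject) is just another view or query, so there is no duplication of the underlying records.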
