Visual Studio Load Test - Using Data Source with Multiple Agents

I'm using Visual Studio 2015 Load Test and running a Web Performance test that has a data source connected. The data source contains user login information for 250 users.
Running this in sequential order on a single agent works fine. However, I'm attempting to add 10 test agents to share out the load. By design, the Load Test copies the data source to each agent and runs the test there. What ends up happening is that all 10 agents start the test using the row 1 user from the data source. I'm hoping there's a way to set up the Load Test to run sequentially across all agents (e.g. Agent 1 uses row 1, Agent 2 uses row 2, Agent 3 uses row 3, etc.).
I suspect there's not an option to set this up, but wondered if anyone had run into this and had workarounds to offer. I did find this info via http://vsptqrg.codeplex.com:
Multiple machines running as a rig
Sequential – This works the same as if you are on one machine. Each agent receives a full copy of the data and each starts with row 1 in the data source. Then each agent will run through each row in the data source and continue looping until the load test completes.
Random – This also works the same as if you run the test on one machine. Each agent will receive a full copy of the data source and randomly select rows.
Unique – This one works a little differently. Each row in the data source will be used once. So if you have 3 agents, the data will be spread across the 3 agents and no row will be used more than once. As with one machine, once every row is used, the web test will stop executing.

You can split the data set/CSV and distribute it to each agent, i.e. in your case 25 rows per agent, and execute the test.
Each agent can then use its own data set/CSV.
CSV Split: http://monchito.com/blog/autosplit-csv
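If you prefer to script the split yourself, something along these lines can produce the per-agent files. This is only a rough sketch, not the tool linked above; the file name users.csv, the presence of a header row, and the agent count of 10 are assumptions about your setup.

    using System.IO;
    using System.Linq;

    // Rough sketch: split users.csv into one file per agent, round-robin,
    // so each agent gets a distinct slice (25 rows each for 250 rows / 10 agents).
    // "users.csv", the header row and the agent count are assumptions.
    class SplitCsvPerAgent
    {
        static void Main()
        {
            const int agentCount = 10;
            string[] lines = File.ReadAllLines("users.csv");
            string header = lines[0];
            string[] rows = lines.Skip(1).ToArray();

            for (int agent = 0; agent < agentCount; agent++)
            {
                // Row i goes to the agent numbered (i % agentCount) + 1.
                var slice = rows.Where((row, i) => i % agentCount == agent);
                File.WriteAllLines($"users_agent{agent + 1}.csv",
                                   new[] { header }.Concat(slice));
            }
        }
    }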

The nearest you can get to what you want is to use the unique setting. However, each data source row will only be used once, then the test will stop. With a data source containing 250 lines, only 250 test executions will take place. I do not know the exact distribution of data source rows to agents when unique is specified.
If more than one execution per data source row is wanted, then another approach is to have one data source column per agent. Use the agent_id to select the column, and use sequential data source access. A variation is to have just one set of data in the data source but append the agent_id to some of the values; a sketch of that idea follows. This answer has some variations on these ideas and some code.
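As a simple illustration of the "make values differ per agent" variation, a WebTestPlugin can read $AgentId in PreWebTest and build an agent-specific value that requests then reference as a context parameter. This is only a sketch under assumptions: the context parameter names UserNameBase and UserName are hypothetical, and a plain context parameter is used rather than a data-source-bound value to keep the example short.

    using Microsoft.VisualStudio.TestTools.WebTesting;

    // Sketch only: append the agent id to a base value so each agent uses
    // a different user. "UserNameBase" and "UserName" are hypothetical
    // context parameters; requests would reference {{UserName}}.
    public class AppendAgentIdPlugin : WebTestPlugin
    {
        public override void PreWebTest(object sender, PreWebTestEventArgs e)
        {
            // $AgentId only exists when the test runs under a load test rig,
            // so fall back to "1" for local runs.
            string agentId = e.WebTest.Context.ContainsKey("$AgentId")
                ? e.WebTest.Context["$AgentId"].ToString()
                : "1";

            string baseName = e.WebTest.Context["UserNameBase"].ToString();
            e.WebTest.Context["UserName"] = baseName + agentId;   // e.g. "loaduser" -> "loaduser7"
        }
    }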
Another possibility is to use the MoveDataTableCursor method to set a specific row for each test execution. It could be called in the PreWebTest method of a WebTestPlugin. The code would use the context parameters $AgentId and $WebTestIteration. The call would be based on the following:
MoveDataTableCursor(..., ..., $AgentId * NumberOfAgents + $WebTestIteration);
Notes:
The values of $AgentId and $WebTestIteration from the context are strings; they would need to be converted to numbers before the multiply and add.
You would also need to check whether the two values are zero-based or one-based.
The documentation for MoveDataTableCursor is not very informative.
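Putting those notes together, a minimal sketch of such a plugin might look like the following. The data source name DataSource1, table name Users, a 10-agent rig and 250 rows are assumptions, $AgentId and $WebTestIteration are assumed to be one-based, and the rows are interleaved across agents (a slight variation on the formula above) so that agents start on different rows.

    using Microsoft.VisualStudio.TestTools.WebTesting;

    // Sketch only: pick a data source row per agent and per iteration.
    // "DataSource1"/"Users", NumberOfAgents and RowCount are assumptions.
    public class MoveCursorPerAgentPlugin : WebTestPlugin
    {
        private const int NumberOfAgents = 10;
        private const int RowCount = 250;

        public override void PreWebTest(object sender, PreWebTestEventArgs e)
        {
            // Both values come back as strings and are assumed one-based here;
            // verify that against your rig before relying on this.
            int agentId = int.Parse(e.WebTest.Context["$AgentId"].ToString());
            int iteration = int.Parse(e.WebTest.Context["$WebTestIteration"].ToString());

            // Interleave: iteration 1 spreads agents 1..10 over rows 0..9,
            // iteration 2 over rows 10..19, and so on, wrapping at RowCount.
            int row = ((iteration - 1) * NumberOfAgents + (agentId - 1)) % RowCount;

            // Whether the cursor index is zero- or one-based should be checked
            // against the (sparse) MoveDataTableCursor documentation.
            e.WebTest.MoveDataTableCursor("DataSource1", "Users", row);
        }
    }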

Related

Spring Batch - how to avoid re-loading (writing) data that was loaded in the previous run

I have a basic Spring Batch app which is trying to load data from a CSV file into MySQL. The program does load the file into the DB during the first run. However, when I accidentally re-run the job/app, it throws a primary key violation (for the right reasons).
What is the best way to avoid reloading data that is already present on the target system? When the batch job is scheduled, if for any reason the source file has not changed since the previous run, I want to see a "0 records processed" message rather than a primary key violation error. I hope that makes sense.
More information:
Thanks. I have probably not understood the answer. Let me explain my requirement in a better way. I have a file containing data from an external data source (say, new hire data) with a fixed name of hire.csv. The file should be updated with the delta changes for every run. As there is a possibility of a manual error of not removing all previously loaded rows, some new hires from the previous run would also be present in the current run. Is there a mechanism available within the ItemReader or ItemProcessor to skip records that are already present in the target DB? I could do "insert into tb where not in (select from tb)", but that runs for every row, which I don't want. Hope it is clear now. Thanks again.
However, when I accidentally re-run the job/app, it throws a primary key violation (for the right reasons). What is the best way to avoid reloading data that is already present on the target system?
The file you are ingesting should be an (identifying) job parameter. This way, when the first run succeeds, the job instance is complete and cannot be run again. This is by design in Spring Batch for this very use case: preventing a job from accidentally being run twice.
Edit: Add further options based on comments
If deleting the file is an option, then you can use a job listener or a final step to delete the file after ingesting it. With this option, you need to add a second identifying parameter (since the file name is always hire.csv) to make sure you have a different job instance for each run. This option does not require having a different file name for each run.
If the file can be renamed to hire-${timestamp}.csv and the name will be unique, then deleting the file after ingesting it and using a single job parameter with the file name is enough.
Side note: I have seen people using a business key to identify records in the input file and using an item processor to query the database and filter items that have been already ingested. This works for small datasets but performs poorly with large datasets due to the additional query for each item.

How to run a test with distribution of load

I am new to JMeter and need your help with a problem.
I have 4 test scenarios and I need to run them with a 30-user load, distributed as 30, 10, 30, and 30 percent. Out of the 4 scenarios, 1 scenario creates a customer ID and that ID is used in the rest of the scenarios. To test this, I have created test data of customer IDs with my first scenario and saved it in a CSV file. Now my question is: when I run my test, how would I handle the customer IDs generated at run time, and how do I manage them together with the test data I have already created? Please help me.
With regards to reusing the data generated at run time: you can extract the required data, i.e. the customer ID, using a suitable JMeter Post-Processor and store it in a JMeter Variable. Once done, the variable can be re-used in other scenarios. The process is known as correlation and there is a lot of information on implementing it, with examples, over the web.
With regards to the distribution, there are different approaches as well:
Throughput Controller
Switch Controller
Weighted Switch Controller
With regards to "manage test data you created" - you can read the values from a CSV file using CSV Data Set Config or __CSVRead() function

JMeter for concurrent users

I have been using the JMeter plugin Ultimate Thread Group for concurrent requests.
But now I'm finding it difficult to use because the scenario is:
Each request has a tracking number (the tracking numbers are already generated in the system when a form is submitted, so I have to use the generated tracking numbers from the DB) which is passed as a POST parameter in the HTTP request. These tracking numbers are unique, and I have configured a CSV Data Set Config for passing them. Once a tracking number is used, it can't be used again (it would give me an error message). So can someone please suggest how to stress test this scenario, where I have to hit a particular URL (with a unique tracking number from the CSV file) for approximately 30-60 minutes (with a varying number of threads) until I find the crash point of the system?
1st way:
You can pass the tracking numbers via a CSV file, with these steps:
allocate all the tracking numbers to specific users (this is possible with a database query).
copy-paste those tracking numbers into a CSV file.
pass those tracking numbers as a parameter via the CSV Data Set Config.
2nd way:
fill the form; the generated tracking number can be fetched via a regular expression.
set the allocation logic to a specific user each time (disable other users).
log in with this user and pass the fetched tracking number.
Hope this will be helpful to you.

How to order ETL tasks in SQL Server Data Tools (Integration Services)?

I'm a newbie in ETL processing. I am trying to populate a data mart through ETL and have hit a bump. I have 4 ETL tasks (each task fills a particular table in the mart) and the problem is that I need to perform them in a particular order so as to avoid constraint violations, such as foreign key constraints. How can I achieve this? Any help is really appreciated.
This is a snapshot of my current ETL package.
Create a separate Data Flow Task for each table you're populating in the Control Flow, and then simply connect them together in the order you need them to run in. You should be able to just copy/paste the components from your current Data Flow to the new ones you create.
The connections between Tasks in the Control Flow are called Precedence Constraints, and if you double-click on one you'll see that they give you a number of options on how to control the flow of your ETL package. For now though, you'll probably be fine leaving them on the defaults - this will mean that each Data Flow Task will wait for the previous one to finish successfully. If one fails, the next one won't start and the package will fail.
If you want some tables to load in parallel, but then have some later tables wait for all of those to be finished, I would suggest adding a Sequence Container and putting the ones that need to load in parallel into it. Then connect from the Sequence Container to your next Data Flow Task(s) - or even from one Sequence Container to another. For instance, you might want one Sequence Container holding all of your Dimension loading processes, followed by another Sequence Container holding all of your Fact loading processes.
A common pattern goes a step further than using separate Data Flow Tasks. If you create a separate package for every table you're populating, you can then create a parent package, and use the Execute Package Task to call each of the child packages in the correct order. This is fantastic for reusability, and makes it easy for you to manually populate a single table when needed. It's also really nice when you're testing, as you don't need to keep disabling some Tasks or re-running the entire load when you want to test a single table. I'd suggest adopting this pattern early on so you don't have a lot of re-work to do later.

PDI: Returning the result of a SELECT statement to the data stream

Using PDI (Kettle), I am filling the entry stage of my database using a CSV Input and a Table Output step. This works great; however, I also want to make sure that the data that was just inserted fulfills certain criteria, e.g. fields not being NULL, etc.
Normally this would be a job for database constraints; however, we want to keep the data in the database even if it's faulty (for debugging purposes; it is a pain trying to debug a .csv file...). As it is just a staging table anyway, it doesn't cause any trouble for integrity, etc.
So to do just that, I wrote some SELECT Count(*) as test123 ... statements that instantly show whether something is wrong and are easy to handle (if the value of test123 is 0 all is good, else the job needs to be aborted).
I am executing these statements using an Execute SQL Statements step within a PDI transformation. I expected the result to be automatically passed to my data stream, so I also used a Copy rows to result step to pass it up to the executing job.
This is the point where the problem is most likely located.
I think that the result of the SELECT statement was not automatically passed to my data stream, because when I do a Simple evaluation in the main job using the variable ${test123} (which I thought would be implicitly created by executing SELECT Count(*) as test123 ...), I never get the expected result.
I couldn't really find any clues to this problem in the PDI documentation so I hope that someone here has some experience with PDI and might be able to help. If something is still unclear, just hint at it and I will edit the post with more information.
best regards
Edit:
This is a simple model of my main job:
Start --> Load data (Transformation) --> Check data (Transformation) --> Simple Evaluation --> ...
You are mixing up a few concepts, if I read your post correctly.
You don't need an Execute SQL script step; this is a job for the Table input step.
Just type your query into the Table input step and you can preview your data and see it coming from the step into the data stream by using the preview on a subsequent step. The Execute SQL script step is not an input step, which means it will not add external data to your data stream.
The output fields are not Variables. A Variable is set using the Set Variables step, which takes a single input row and maps a specific field to a variable, which can be persisted at parent job or root job levels. Fields are just that: fields. They are passed from one step to the next through hops and eventually to the parent job if you have a Copy rows to result step, but they are NOT variables.
