Grinder - how to distribute invocation of urls from file - performance

We have a huge file of different urls (~500K - ~1M urls).
We want to use Grinder 3 for distributing these urls to the Workers in a way that every worker will invoke a single and different url.
In the JY script we could:
Read the file one time per Agent
Allocate line-number-ranges per Agent
Every Worker would gets a line/url according to its run-id from its Agent line-number-range.
This still means loading a huge file into memory and writing some code to a problem that might be common to many.
Any ideas to a simpler/ready-made solution?

I used Grinder in a similar fashion a while back, and wrote a utility for multi-threaded, one-time ingestion of URLs from a large file.
See https://bitbucket.org/travis_bear/file_util -- in particular, the sequential reader.
I'd recommend using the split command-line utility (or similar) to give separate chunks of the master file to each agent prior to executing your Grinder run.

I would have taken a different approach if you like since its a huge file ,
How many threads are you planning to spawn . I believe you already know that you can get Grinder.ThreadNo to get the currently executing thread.
You can actually divide the file using a pre-processor with equal number of records into number of thread and name them 0 , 1 ,2 etc which matches with thread name .
Why I am suggesting this is that processing the file looks like a pre task whats important are its contents. File processing should not interfere when threads are executing.
So now each thread will have its own file and no collisions .
for eg 20 threads 20 files however your number of threads should be chosen carefully and may be peak + 50 % .

Related

Run multiple batch jobs in parallel Mule 4

I have a batch job that reads hundreds of images from an SFTP location and then encodes them into base64 and uploads them via API using HTTP connector.
I would like to make the process run quicker and hence trying to split the payload into 2 via scatter-gather and then sending then sending payload1 to one batch job in a subflow and payload2 to another batch job in another subflow.
Is this the right approach?
Or is it possible to split the load in just one batch process, ie for one half of the payload to be processed by batch step 1 and second half will be processed by batch step 2 at the same time?
Thank you
No, it is not a good approach. Batch jobs are always executed asynchronously (ie using different threads) so there is no benefit on using scatter-gather and it has the cons of increasing resource usage.
Splitting the payload in different batch steps doesn't make sense either. You should not try to scale by adding steps.
Batch jobs should be used naturally to work in parallel by iterating on an input. It may be able to handle the splitting itself or you can manually split the input payload before. Then let it handle the concurrency automatically. There are some configurations you can use to tune it, like block sizing.

Jmeter thread is rising very slowly

I have a jmeter script where in I initiate the test with concurrent 300 users(threads), then with every 10sec, 100 more threads are added. My script was running absolutely fine. But suddenly I am noticing that inspite of adding 100 threads, its adding 3 or 4 threads.
I have increased swap memory, still no go.
I am using 4GB machine, in which i have made "Xms=1g Xms=3GB"
And this is how slowly threads are getting added:
Here is how "absolutely fine" execution of your scenario should look like:
Given you have that many threads in the Finished state there is something preventing them from doing their normal job so instead of finishing execution of all Samplers of the current iteration and starting the new one the threads are being terminated somewhere in between and JMeter restarts them.
Your question doesn't contain sufficient information in order to guess the reason, inspect your test plan and make sure you have enough test data, in case of CSV Data Set Config ensure that Stop Thread on EOF is set to False, etc.
If your test plan looks okayish check jmeter.log file for any suspicious entries

Parallel Processing with Starting New Task - front end screen timeout

I am running an ABAP program to work with a huge amount of data. The SAP documentation gives the information that I should use
Remote Function Modules with the addition STARTING NEW TASK to process the data.
So my program first selects all the data, breaks the data into packages and calls a function module with a package of data for further processing.
So that's my pseudo code:
Select KEYFIELD from MYSAP_TABLE into table KEY_TABLE package size 500.
append KEY_TABLE to ALL_KEYS_TABLE.
Endselect.
Loop at ALL_KEYS_TABLE assigning <fs_table> .
call function 'Z_MASS_PROCESSING'
starting new TASK 'TEST' destination in group default
exporting
IT_DATA = <fs_table> .
Endloop .
But I am surprised to see that I am using Dialog Processes instead of Background Process for the call of my function module.
So now I encountered the problem that one of my Dialog Processes were killed after 60 Minutes because of Timeout.
For me, it seems that STARTING NEW TASK is not the right solution for parallel processing of mass data.
What will be the alternative?
As already mentioned, thats not an easy topic that is handled with a few lines of codes. The general steps you have to conduct in a thoughtful way to gain the desired benefit is:
1) Get free work processes available for parallel processing
2) Slice your data in packages to be processed
3) Call an RFC enabled function module asynchronously for each package with the available work processes. Handle waiting for free work processes, if packages > available processes
4) Receive your results asynchronously
5) Wait till everything is processed and merge the data together again and assure that every package was handled properly
Although it is bad practice to just post links, the code is very long and would make this answer very messy, therfore take a look at the following links:
Example1-aRFC
Example2-aRFC
Example3-aRFC
Other RFC variants (e.g. qRFC, tRFC etc.) can be found here with short description but sadly cannot give you further insight on them.
EDIT:
Regarding process type of aRFC:
In parallel processing, a job step is started as usual in a background
processing work process. (...)While the job itself runs in a
background process, the parallel processing tasks that it starts run
in dialog work processes. Such dialog work processes may be located on
any SAP server.
The server is specified with the GROUP (default: parallel_generators) see transaction RZ12 and can have its own ressources just for parallel processing. If your process times out, you have to slice your packages differently in size.
I think, best way for parallel processing in SAP is Bank Parallel Processing framework as Jagger mentioned. Unfortunently its rarerly mentioned in any resource and its not documented well.
Actually, best documentation I found was in this book
https://www.sap-press.com/abap-performance-tuning_2092/
Yes, it's tricky. It costed me about 5 or 6 days to force it going. But results were good.
All stuff is situated in package BANK_PP_JOBCTRL and you can use its name for googling.
Main idea there is to divide all your work into steps (simplified):
Preparation
Parallel processing
2.1. Processing preparation
2.2. Processing
(Actually there are more steps there)
First step is not paralleized. Here you should prepare all you data for parallel processing and devide it into 'piece' which will be processed in parallel.
Content of pieces, in turn, can be ID or preloaded data as well.
After that, you can run step 2 in parallel processing.
Great benefit of all this is that error in one piece of parallel work won't lead to crash of all your processing.
I recomend you check demo in function group BANK_API_PP_DEMO
To implement parallel processing, you need to do a bit more than just add that clause. The information is contained in this help topic. A lot of design effort needs to be devoted to ensure that the communication and result merging overhead of the parallel processing does not negate the performance advantage gained by the parallel processing in the first place and that referential integrity of the data is maintained even when some of the parallel tasks fail. Do not under-estimate the complexity of this task.
You could make use of the bgRFC technique. This is a new method of background processing made by SAP.
BgRFC has, in addition to the already existing IN BACKGROUND TASK, the possibility to configure and monitor all calls which run through this method.
You can read more documentation between the different possibilities here. This is all (of course) depending on your SAP version.

JMeter - Wrong number of users in results from remote load testing

I was using non-GUI mode to perform a remote load testing with Jmeter from master server (Linux) to 5 slave servers (Linux). 5x "n" users have been run, "n" users on each server.
The results have been written to master server.
There are samples from all servers in results file but they relate to the number of active users from particular servers ("n") and not from all servers (5x "n").
There are no information in the result file about the real number of active users on all the servers.
As a result, a maximum number of active users is "n" on generated graphs which does not reflect the real load (5x "n" users).
Has anyone got a similar problem?
Is there anything I can do to correct the results already gathered?
Should I change any JMeter parameter to get the correct results in the next run?
Short Answer:
This is normal and no, there's nothing in JMeter you can do to fix it.
Long Answer:
Each Load Generator creates a number of threads n: the threads will be numbered 1-n. When the Controller collects all of the information, it sees 5 results for Thread 1, 5 results for Thread 2, ... Thread n. The Controller has no way of knowing that each of them are 5 separate concurrent threads and not just the same thread 5 sequential times.
Fixing it:
It depends on what you mean by a maximum number of active users is "n" on generated graphs. If this is something inside JMeter, then no, you can't fix it.
If it's a report-generator that you have created yourself, then yes, you can fix it by passing in the number of load generators.

Simulate millions of users with JMeter

I have a system that should be able to handle millions of users requests concurrently. In order to check how the system handles the load, I setup a cluster of JMeter servers (slaves), and one controller (client).
I have a database of all users (~10M), and I need each request sent to be from a different user.
I am wondering how I can implement such a thing in JMeter. Basically, I thought about dividing a range of users (let's say 100,000) per each slave, and then within a given slave, each request should read a new user from the local 100,000 list, and delete it. Thus, I will eventually send a request from every user.
The thing is while this idea sounds logical theoretically, I do not exactly know how to implement it using the JMeter terms. Also, I am not sure how to read from database in the test, although I could theoretically read it in advance into a text file, and have each slave contain the text file with its 100,000 users portion.
I can setup a very large cluster of machines, so scale will not be the issue here. Just how to set it all up.
The best way to provide Jmeter with a list of parameters is to use a CSV file:
http://jmeter.apache.org/usermanual/component_reference.html#CSV_Data_Set_Config
You can configure the CSV dataset config to do make every thread use a different line in the CSV. Each engine will need to have it’s own unique CSV file, because the sharing mode does nto work between engines in distributed testing (you can try to automate this part, this can be interesting to do :) ).
This is how your script should look like:
1. Thread Group
1.1 HTTP sampler (login)
1.1.1 CSV dataset config
1.2 second http sampler
etc...
The login sampler will use the parameters loaded by from the CSV file, so for every ’login’ it will use a different line.
Distributed testing is pretty simple:
http://jmeter.apache.org/usermanual/remote-test.html
Keep in mind that running 100K concurrent users on a single Jmeter load engine will be hard (Jmeter consumes resources on the server, so you will need lots of CPU and memory). So you should also monitor the engines.
Also 1M users will cause a lot of data that the engines will send back to the console, so you might need to start a bunch of distributed tests in parallel, and at the end aggregate the results.
Cheers,
This is can be implemented by doing the following steps:
Taking the user credential dump and saving it to a csv file
split the csv file. copy 1 file to each Jmeter slave, in the same location on all the
machines e.g. "C:\Loadtest\"
from the controller, give the path of your csv file in "CSV Data Set Config".
Run the Test.
By doing the above steps, Jmeter controller will start execution of the test by pointing all the Jmeter Slave nodes to use the CSV file in the same location "C:\Loadtest\".
But the trick here is that all the machine will be using different set of users.
Hope this will help.

Resources