How to performance test workflow execution?

I have two APIs:
Create a workflow (HTTP POST request)
Check workflow status (HTTP GET request)
I want to measure how long a workflow takes to complete.
Tried two ways:
Option 1: I wrote a Java test that calls the create-workflow API and then polls the status API until the status turns to CREATED. The time this takes gives me the performance result.
Option 2: I tried to do the same with Gatling:
val createWorkflow = http("create")
  .post("")
  .body(ElFileBody("src/main/resources/weather.json")).asJson
  .check(status.is(200))
  .check(jsonPath("$.id").saveAs("id"))
val statusWorkflow = http("status")
  .get("/${id}")
  .check(jsonPath("$.status").saveAs("status")).asJson
  .check(status.is(200))
val scn = scenario("CREATING")
  .exec(createWorkflow)
  .repeat(20) { exec(statusWorkflow) }
The Gatling version didn't really work (or I am doing it the wrong way). Is there a way in Gatling to chain multiple requests and do something similar to Option 1?
Is there some other tool that can help me performance test such scenarios?

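I think something like the following should work using Gatling's tryMax. tryMax retries its block only when a check inside it fails, so the status check has to assert the target value instead of just saving it:
.tryMax(100) {
  pause(1)
    .exec(http("status").get("/${id}").asJson
      .check(status.is(200))
      .check(jsonPath("$.status").is("CREATED")))
}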
Note: I didn't try this out locally. More information about tryMax:
https://medium.com/@vcomposieux/load-testing-gatling-tips-tricks-47e829e5d449 (Polling: waiting for an asynchronous task)
https://gatling.io/docs/current/advanced_tutorial/#step-05-check-and-failure-management
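For comparison, the poll-and-time approach from Option 1 can be sketched outside Gatling as well. This is only a minimal sketch in Python: `create` and `get_status` are hypothetical wrappers around the POST and GET requests from the question, and the interval and timeout values are assumptions.

```python
import time
from typing import Callable


def time_until_status(
    create: Callable[[], str],
    get_status: Callable[[str], str],
    target: str = "CREATED",
    interval_s: float = 1.0,
    timeout_s: float = 120.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Return the seconds between the create call and the first poll that sees `target`."""
    started = clock()
    workflow_id = create()  # hypothetical wrapper around the POST request
    while True:
        # hypothetical wrapper around the GET status request
        if get_status(workflow_id) == target:
            return clock() - started
        if clock() - started >= timeout_s:
            raise TimeoutError(f"workflow {workflow_id} did not reach {target}")
        sleep(interval_s)
```

Note that the measured time is only as precise as the polling interval, so a shorter interval gives tighter numbers at the cost of more status requests.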


Form Recognizer Heavy Workload

My use case is the following:
Once a day I upload 1,000 single-page PDFs to Azure Storage and process them with Form Recognizer via the latest Python azure-form-recognizer client.
So far I'm using the async version of the client and I launch the 1,000 coroutines concurrently.
tasks = {asyncio.create_task(analyze_async(doc)): doc for doc in documents}
pending = set(tasks)
# Handle retries
while pending:
    # Back off in case of 429 without blocking the event loop
    await asyncio.sleep(1)
    # Wait until every pending task has finished
    finished, pending = await asyncio.wait(
        pending, return_when=asyncio.ALL_COMPLETED
    )
    # Re-schedule every task that raised an exception
    for task in finished:
        doc = tasks[task]
        if task.exception():
            new_task = asyncio.create_task(analyze_async(doc))
            tasks[new_task] = doc
            pending.add(new_task)
I'm not really comfortable with this setup. The main reason is that the state of the service is unpredictable within a single iteration: it can be up, then throw 429, then be up again. That is not deterministic enough for me. I was wondering whether another approach is possible. Should I rather increase the number of transactions progressively, starting with 15 (the default TPS), then 50, then 100, until the queue is empty? Or is there another option?
Thx
CORS needs to be enabled and configured on the storage account so that the documents are accessible for the heavy workload.
Follow this procedure to set up the heavy workload in Form Recognizer:
Choose page blobs for higher performance.
Redundancy is also required; choose ZRS.
Create a storage account to upload the files.
Go to CORS and add the required URL.
Set the allowed origins to https://formrecognizer.appliedai.azure.com
Go to Containers and upload the documents.
Use the container and blob information as the input for the recognizer. Form Recognizer Studio takes the total size of the documents into account and also limits the number of characters, so it is better to use the Python code with the created container as the input folder.
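On the client side, one alternative to launching all 1,000 coroutines at once is to cap the number of in-flight calls and retry throttled calls with exponential backoff. The following is a minimal sketch under stated assumptions, not the SDK's own mechanism: `analyze` stands for the asker's `analyze_async`, `Throttled` is a hypothetical stand-in for whatever exception the client raises on HTTP 429, and the default of 15 simply reuses the figure from the question. A concurrency cap is not the same as a TPS limit, so the value would need tuning.

```python
import asyncio
import random
from typing import Awaitable, Callable, Iterable


class Throttled(Exception):
    """Hypothetical stand-in for the client's HTTP 429 error."""


async def run_bounded(
    docs: Iterable[str],
    analyze: Callable[[str], Awaitable[object]],
    max_concurrency: int = 15,
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
) -> dict:
    """Analyze every document with at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(doc: str):
        for attempt in range(max_attempts):
            async with semaphore:
                try:
                    return await analyze(doc)
                except Throttled:
                    if attempt == max_attempts - 1:
                        raise
            # Back off outside the semaphore so a sleeping retry does not hold a slot.
            delay = base_delay_s * 2 ** attempt + random.uniform(0, base_delay_s)
            await asyncio.sleep(delay)

    docs = list(docs)
    # return_exceptions=True keeps one failed document from cancelling the rest.
    results = await asyncio.gather(*(worker(d) for d in docs), return_exceptions=True)
    return dict(zip(docs, results))
```

The returned dictionary maps each document to its result, or to the exception that was raised once the attempts were exhausted, so failed documents can be inspected or re-queued afterwards.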

Get status of a task Elasticsearch for a long running update query

Assume I have a long-running update query that updates ~200k to 500k documents, perhaps even more. Why I need to update so many documents is beyond the scope of the question.
Since the client times out (I use the official ES Python client), I would like a way to check the status of the bulk update request without having to use enormous timeout values.
For a short request, the response of the request can be used. Is there a way to get the response of the long request as well, or to give the request a name or id so that I can reference it later?
For a request that is still running, I can use the tasks API to get the information.
But how do I get it for the other statuses, completed or failed?
If I try to access a task that has already completed, I get "resource not found".
P.S. I am using update_by_query for the update.
With the task id you can look up the task directly:
GET /_tasks/taskId:1
The advantage of this API is that it integrates with wait_for_completion=false to transparently return the status of completed tasks. If the task is completed and wait_for_completion=false was set on it, then it'll come back with a results or an error field. The cost of this feature is the document that wait_for_completion=false creates at .tasks/task/${taskId}. It is up to you to delete that document.
From https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html#docs-update-by-query-task-api
My use case went like this: I needed to do an update_by_query and used Painless as the script language. At first I did a reindex (when testing). Then I tried update_by_query (the two resemble each other a lot). I sent a request to the tasks API while the operation was still running and saw the task being executed. When it finished, I ran a query and the data in the fields I was manipulating had disappeared. The script itself worked, since I had used the same script with the reindex API and everything went as it should. I didn't investigate further for lack of time, but... yeah, test thoroughly...
I find GET /_tasks/taskId:1 confusing. It should be
GET http://localhost:9200/_tasks/taskId
A taskId looks something like this NCvmGYS-RsW2X8JxEYumgA:1204320.
Here is my simple explanation of this topic.
To check a task, you need to know its taskId.
A task id is a string that consists of a node_id, a colon, and a task_sequence_number. An example is taskId = NCvmGYS-RsW2X8JxEYumgA:1204320, where node_id = NCvmGYS-RsW2X8JxEYumgA and task_sequence_number = 1204320. Some people, including myself, thought taskId = 1204320, but that is not how the Elasticsearch developers understand it at the moment.
A taskId can be found in two ways.
Set wait_for_completion = false. When you send a request to ES with this parameter, the response will be {"task" : "NCvmGYS-RsW2X8JxEYumgA:1204320"}. You can then check the status of that task with GET http://localhost:9200/_tasks/NCvmGYS-RsW2X8JxEYumgA:1204320
Call GET http://localhost:9200/_tasks?detailed=false&actions=*/delete/byquery. This example returns the status of all tasks with action = delete_by_query (for update_by_query, filter on */update/byquery instead). If you know that only one such task is running on ES, you can find your taskId in the response.
Once you know the taskId, you can get the status of the task with this:
GET /_tasks/taskId
Notice that you can only check the status of a task while the task is running, or if the task was started with wait_for_completion = false.
In more detail: wait_for_completion is true by default. Based on my understanding, tasks with wait_for_completion = true are "in-memory" only. You can still check the status of such a task while it is running, but it is completely gone after it is completed or canceled, meaning that checking the status will return a 'resource_not_found_exception'. Tasks with wait_for_completion = false are stored in the ES system index .tasks. You can still check their status after they finish. However, you might want to delete the task document from the .tasks index once you are done with it to save some space. The deletion request looks like this:
DELETE http://localhost:9200/.tasks/task/NCvmGYS-RsW2X8JxEYumgA:1204320
You will receive a resource_not_found_exception if the taskId is not present (for example, if you delete a task twice, or you try to delete an in-memory task, whose wait_for_completion == true).
About this confusing taskId thing: I made a pull request, https://github.com/elastic/elasticsearch/pull/31122, to help clarify the Elasticsearch documentation. Unfortunately, they rejected it. Ugh.
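The task id format and the polling flow described above can be sketched in Python. This is a minimal sketch, assuming the task was started with wait_for_completion=false: `get_task` is a hypothetical callable that returns the JSON body of GET /_tasks/<taskId> as a dict, and the interval and timeout defaults are assumptions.

```python
import time
from typing import Callable, Tuple


def split_task_id(task_id: str) -> Tuple[str, int]:
    """Split "node_id:task_sequence_number" into its two parts."""
    node_id, _, sequence = task_id.rpartition(":")
    if not node_id or not sequence.isdigit():
        raise ValueError(f"expected 'node_id:number', got {task_id!r}")
    return node_id, int(sequence)


def wait_for_task(
    get_task: Callable[[str], dict],
    task_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the task until it is completed and return its "response" section."""
    split_task_id(task_id)  # fail early on a bare sequence number such as "1204320"
    deadline = clock() + timeout_s
    while True:
        info = get_task(task_id)  # hypothetical: body of GET /_tasks/<task_id>
        if info.get("completed"):
            if "error" in info:
                raise RuntimeError(f"task {task_id} failed: {info['error']}")
            return info.get("response", {})
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} still running after {timeout_s}s")
        sleep(interval_s)
```

With the official Python client, `get_task` could presumably be something like `lambda tid: es.tasks.get(task_id=tid)`, with the task id taken from the "task" field that `es.update_by_query(..., wait_for_completion=False)` returns. I have not run this against a cluster, so treat that wiring as an assumption.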

Jmeter Test Plan summary report PASS/FAIL

I'm stuck finding a solution to a problem with JMeter. I need to add logic to my Test Plan that produces a simple PASS/FAIL report, calculated from the test case results, and writes it to the generated JTL report afterwards. For instance:
All tests passed - Test Plan result=PASS
One or more tests failed - Test Plan result=FAIL
The majority of suitable options assume using third-party tools, to wit:
you can run the JMeter test in Jenkins and use the Performance plugin, which can conditionally fail the build if the number of failed requests exceeds a specified threshold
you can run the JMeter test with the Taurus tool as a wrapper; it has a flexible and powerful Pass/Fail Criteria Subsystem that lets you define criteria for marking the test as passed or failed. If the test fails, the Taurus process returns a non-zero exit code.
If the above approaches are not suitable for any reason, please elaborate on your question and explain how and where you would like to see this "FAIL" or "PASS" result.
Add one BeanShell Listener and one BeanShell Sampler at the end of your Thread Group and put this in the Listener:
if (sampleEvent.getResult() instanceof org.apache.jmeter.protocol.http.sampler.HTTPSampleResult)
    if (!sampleEvent.getResult().isResponseCodeOK())
        vars.put("res", "-1"); // vars only stores String values
And in the BeanShell Sampler put one of the following.
If you want to store the result as a property:
props.put("testPlanResult", "-1".equals(vars.get("res")) ? "FAIL" : "PASS");
If you want to store the result in a file:
f = new FileOutputStream("/path/to/file.txt", false);
p = new PrintStream(f);
p.println("Result: " + ("-1".equals(vars.get("res")) ? "FAIL" : "PASS"));
p.close();
f.close();
From here you can do whatever you need with the created property or the file containing the result.
Hope this helps you!
EDIT:
You will need to add this import if writing result to file:
import org.apache.jmeter.services.FileServer;
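If the verdict can be computed after the run instead of inside the Test Plan, the JTL file itself already carries what is needed. Here is a minimal Python sketch, assuming the JTL was saved in CSV format with a header row that includes the `success` column; `plan_result` is a hypothetical helper name.

```python
import csv
import io
from typing import TextIO


def plan_result(jtl: TextIO) -> str:
    """Return "PASS" if every sample in the CSV JTL succeeded, otherwise "FAIL"."""
    for row in csv.DictReader(jtl):
        if row["success"].strip().lower() != "true":
            return "FAIL"
    return "PASS"


if __name__ == "__main__":
    # Made-up sample rows; a real JTL has more columns, which DictReader ignores here.
    sample = io.StringIO(
        "timeStamp,elapsed,label,responseCode,success\n"
        "1700000000000,423,login,200,true\n"
        "1700000000500,958,search,500,false\n"
    )
    print(plan_result(sample))  # prints FAIL
```

Note that an empty JTL yields PASS with this logic, so a real script may also want to require a minimum number of samples.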

Jmeter - how to get SLA metric

Is there any way to count the requests that meet an SLA in JMeter from the UI? For example, the number of requests with a response time < 400 ms?
I had a similar problem a while ago and wrote a little tool - see https://github.com/sgoeschl/jmeter-sla-report
The simplest solution is to use a Simple Data Writer to save Label, Elapsed Time and/or Latency to a CSV file, which generates raw output like this:
elapsed,label
423,sampler1
452,sampler2
958,sampler1
152,sampler1
And from here you can take it to any other tool (awk, Excel, etc.) to filter results you want.
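As one example of such post-processing, a few lines of Python can count the samples under the threshold per label. This is a minimal sketch that assumes the two-column `elapsed,label` layout shown above; `count_under_sla` is a hypothetical helper name and 400 ms is the threshold from the question.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO, Tuple


def count_under_sla(results: TextIO, threshold_ms: int = 400) -> Dict[str, Tuple[int, int]]:
    """Return {label: (samples under threshold, total samples)}."""
    counts = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(results):
        entry = counts[row["label"]]
        entry[1] += 1  # total samples for this label
        if int(row["elapsed"]) < threshold_ms:
            entry[0] += 1  # samples that met the SLA
    return {label: (under, total) for label, (under, total) in counts.items()}


if __name__ == "__main__":
    raw = io.StringIO("elapsed,label\n423,sampler1\n452,sampler2\n958,sampler1\n152,sampler1\n")
    for label, (under, total) in count_under_sla(raw).items():
        print(f"{label}: {under}/{total} under 400 ms")
```

For the sample rows above, only the 152 ms request of sampler1 is under 400 ms.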
Another option is to use a BeanShell Listener to generate such a report on the fly. Something like this:
long responseTime = sampleResult.getEndTime() - sampleResult.getStartTime();
if (responseTime < 400) {
    FileOutputStream f = new FileOutputStream("myreport.csv", true);
    PrintStream p = new PrintStream(f);
    this.interpreter.setOut(p);
    print(sampleResult.getSampleLabel() + "," + responseTime);
    f.close();
}
This method, though, may not be performant enough if you are planning to run a stress test with many (more than 200-300) users and many operations that "fit" the filter.
JMeter provides an out-of-the-box web report with plenty of information about your load test, using standard metrics such as APDEX and percentiles.
See this:
http://jmeter.apache.org/usermanual/generating-dashboard.html
If you still want this, do the following:
Add a Duration Assertion as a child of your request and set its duration to the SLA threshold (for example, 400 ms).
Every response that takes longer than this duration will be marked as failed.
In the report, the count of successful requests is then the count of requests that meet this SLA criterion.

How to test our code to send 400 queries per user per minute in Gatling

I want to limit server calls to 400. To verify the limit, I need to check whether sending more than 400 queries returns an error.
How do I write code for 1 user repeated 400 times over a minute?
val UIScenario = scenario("UI Simulation")
  .repeat(400) {
    exec(loginScns).exec(search)
  }
setUp(
  UIScenario.inject(rampUsers(1) over (1 second))
).protocols(httpProtocol)
Please help me sort this out.
Thanks!
For such scenarios you can use the Apache Benchmark tool (ab). Here is its documentation:
http://httpd.apache.org/docs/2.2/en/programs/ab.html
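Whichever tool sends the requests, the pacing arithmetic is the same: 400 requests spread evenly over 60 seconds means one request every 150 ms, and a 401st request inside the same minute should be the one that gets rejected. A minimal Python sketch of that arithmetic follows; the function names are hypothetical and nothing is actually sent.

```python
from typing import List


def pacing_interval_s(requests: int, window_s: float = 60.0) -> float:
    """Seconds between request starts so that `requests` fit evenly into the window."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return window_s / requests


def send_offsets(requests: int, window_s: float = 60.0) -> List[float]:
    """Start time of each request, in seconds from the beginning of the window."""
    interval = pacing_interval_s(requests, window_s)
    return [i * interval for i in range(requests)]


if __name__ == "__main__":
    print(pacing_interval_s(400))   # interval that stays exactly at the limit
    print(send_offsets(401)[-1])    # start time of the request that exceeds the limit
```

A load test would then run once with 400 paced requests, expecting no errors, and once with 401, expecting the last one to fail.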
