Disappointed by Vert.x versus raw servlet performance

Vert.x is way slower than a raw async servlet running exactly the same code.
I have implemented a very basic HTTP GET handler that does nothing but Thread.sleep(30) before returning the response text "ok". I did this with a Jetty servlet and with Vert.x verticles:
Vertx vertx = Vertx.vertx();
DeploymentOptions d = new DeploymentOptions();
d.setInstances(400);
d.setWorker(true);
d.setWorkerPoolSize(400);
vertx.deployVerticle("com.vertx.Ping",d);
public class Ping extends AbstractVerticle {
    private HttpServer server;

    public void start(Future<Void> startFuture) {
        server = vertx.createHttpServer().requestHandler(req -> {
            try {
                Thread.sleep(30);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            req.response().end("ok");
        });
        server.listen(9999);
    }
}
ab -k -c 1 -n 1 => same result for servlet and Vert.x: ~31 ms
Servlet: ab -k -c 1000 -n 100000
Percentage of the requests served within a certain time (ms)
50% 219
66% 224
75% 227
80% 229
90% 235
95% 240
98% 245
99% 247
100% 264 (longest request)
Vert.x: ab -k -c 1000 -n 100000 => after more than 2 minutes I stopped the test.
Let's try something easier:
Vert.x: ab -k -c 1000 -n 10000 (10 times fewer requests)
Percentage of the requests served within a certain time (ms)
50% 3930
66% 3943
75% 3964
80% 3977
90% 3997
95% 4009
98% 4019
99% 4028
100% 4038 (longest request)
Vert.x is so damn slow. What am I doing wrong, folks? Thank you.

You are not using Vert.x the way it is meant to be used. Remember that Vert.x is based on asynchronous operations.
First off, you don't need 400 worker verticle instances. Just 1 regular (non-worker) verticle instance is more than enough. You are doing this because you call Thread.sleep, which causes a thread to block, but there is a much simpler way.
Next: don't use blocking operations! You are running code on an event-loop, so you must not block. Your code should look like:
vertx
    .createHttpServer()
    .requestHandler(req -> {
        // schedule a timer instead of blocking the event loop
        vertx.setTimer(30, tid -> {
            req.response().end("ok");
        });
    })
    .listen(9999);
You should see a drastic improvement in your measurements, and all of that with just 1 thread compared to the 400 you had.
Note: when you need to call a blocking operation, you should have a look at executeBlocking in the Vertx class, which offloads some code to a worker, and then dispatches the result as a new event.
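For illustration, here is a minimal sketch of that pattern, assuming the Vert.x 3.x callback API used in the question; the Thread.sleep(30) simply stands in for the real blocking work:
vertx.createHttpServer().requestHandler(req -> {
    // offload the blocking part to a worker thread...
    vertx.<String>executeBlocking(future -> {
        try {
            Thread.sleep(30); // runs on a worker, not on the event loop
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        future.complete("ok");
    }, res -> {
        // ...and write the response back on the event loop once the result arrives
        req.response().end(res.result());
    });
}).listen(9999);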

Related

Load test using k6 with a fixed request rate

I am trying to run k6 performance test cases using the following scenario: how do I hit x API calls per minute?
For example: produce 500 messages per minute → check how the API behaves.
The result should be the same the next time we run the test case.
This is easily achievable with the constant-arrival-rate executor:
import http from 'k6/http';

export const options = {
  scenarios: {
    '500_mps': {
      executor: 'constant-arrival-rate',
      duration: '10m',
      rate: 500,
      timeUnit: '1m',
      preAllocatedVUs: 10,
      maxVUs: 100,
    },
  },
};

export default function () {
  http.get('your api');
}
The above code will try to run the default function 500 times per minute for 10 minutes. It will use at most 100 VUs – if your API is too slow, k6 will not start more VUs and you will not reach the target load.

Concurrent block in Hacklang

Since Hack is a single-threaded language, what is the benefit of using a concurrent block?
concurrent {
await func_a;
await func_b;
}
My understanding is that one job waits until the other job is over.
Concurrent doesn't mean multithreading.
The concurrent block will wait for all async operations (awaitables) in that block, similar to JavaScript's Promise.all (which is also single-threaded).
Without concurrent:
await func_a; // 1 sec
await func_b; // 2 sec
await func_c; // 3 sec
// will get here after at least 6 seconds (sum of requests time)
With concurrent:
concurrent {
await func_a; // 1 sec
await func_b; // 2 sec
await func_c; // 3 sec
}
// will get here after at least 3 seconds (longest request time)
It fits if you want to make multiple I/O requests concurrently.
It doesn't fit if you want to run multiple CPU-bound jobs.

Backpressure with Reactor's ParallelFlux + timeouts

I'm currently working on using parallelism in a Flux. Right now I'm having problems with backpressure. In our case we have a fast-producing service we want to consume, but we are much slower.
With a normal Flux this works so far, but we want parallelism. What I see when I'm using the approach with
.parallel(2)
.runOn(Schedulers.parallel())
is that there is a big request at the beginning, which takes quite a long time to process. A different problem also occurs here: if we take too long to process, we somehow seem to generate a cancel event in the producer service (we consume it via a WebFlux REST call), but no cancel event is seen in the consumer.
But back to problem 1: how can I bring this back into sync with the producer? I know of the prefetch parameter on the .parallel() method, but it does not work as I expect.
A minimal example would be something like this:
fun main() {
    val atomicInteger = AtomicInteger(0)
    val receivedCount = AtomicInteger(0)
    val processedCount = AtomicInteger(0)

    Flux.generate<Int> {
        it.next(atomicInteger.getAndIncrement())
        println("Emitted ${atomicInteger.get()}")
    }.doOnEach { it.get()?.let { receivedCount.addAndGet(1) } }
        .parallel(2, 1)
        .runOn(Schedulers.parallel())
        .flatMap {
            Thread.sleep(200)
            log("Work on $it")
            processedCount.addAndGet(1)
            Mono.just(it * 2)
        }.subscribe {
            log("Received ${receivedCount.get()} and processed ${processedCount.get()}")
        }

    Thread.sleep(25000)
}
where I can observe logs like this
...
Emitted 509
Emitted 510
Emitted 511
Emitted 512
Emitted 513
2022-02-02T14:12:58.164465Z - Thread[parallel-1,5,main] Work on 0
2022-02-02T14:12:58.168469Z - Thread[parallel-2,5,main] Work on 1
2022-02-02T14:12:58.241966Z - Thread[parallel-1,5,main] Received 513 and processed 2
2022-02-02T14:12:58.241980Z - Thread[parallel-2,5,main] Received 513 and processed 2
2022-02-02T14:12:58.442218Z - Thread[parallel-2,5,main] Work on 3
2022-02-02T14:12:58.442215Z - Thread[parallel-1,5,main] Work on 2
2022-02-02T14:12:58.442315Z - Thread[parallel-2,5,main] Received 513 and processed 3
2022-02-02T14:12:58.442338Z - Thread[parallel-1,5,main] Received 513 and processed 4
So how could I adjust this so that I can use parallelism but stay in backpressure/sync with my producer? The only way I got it to work was with a semaphore acquired before the parallel Flux and released after the work, but that is not really a nice solution.
OK, for this scenario it turned out to be crucial that the prefetch of parallel and runOn be set very low, here to 1.
With the default of 256, we requested too much from our producer, so there was already a cancel event because of the long gap between the first block of requests (which fills the prefetch buffer) and the next one, when the Flux decided to fill the buffer again.
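For illustration, a minimal Java sketch (Reactor 3 assumed; the class and variable names are only illustrative and mirror the Kotlin example above) that passes an explicit prefetch of 1 to both parallel() and runOn():
import java.util.concurrent.atomic.AtomicInteger;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class LowPrefetchDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);

        Flux.<Integer>generate(sink -> sink.next(counter.getAndIncrement()))
                .parallel(2, 1)                  // parallelism = 2, prefetch = 1
                .runOn(Schedulers.parallel(), 1) // prefetch = 1 on the rails as well
                .flatMap(i -> Mono.fromCallable(() -> {
                    Thread.sleep(200);           // simulated slow consumer work, as in the question
                    return i * 2;
                }))
                .subscribe(i -> System.out.println("Processed " + i));

        Thread.sleep(5000);                      // keep the JVM alive long enough to observe output
    }
}
With the small prefetch, each rail only requests one element at a time from the source instead of pulling 256 upfront.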

Why and how is the quota "Critical read requests" exceeded when using batchCreateContacts?

I'm programming a contacts export from our database to Google Contacts using the Google People API. I'm programming the requests over URL via Google Apps Script.
The code below - using https://people.googleapis.com/v1/people:batchCreateContacts - works for 13 to about 15 single requests, but then Google returns this error message:
Quota exceeded for quota metric 'Critical read requests (Contact and Profile Reads)' and limit 'Critical read requests (Contact and Profile Reads) per minute per user' of service 'people.googleapis.com' for consumer 'project_number:***'.
For speed, I send the requests in batches of 10 parallel requests.
I have the following two questions regarding this problem:
Why, for creating contacts, would I hit a quota regarding read requests?
Given the picture link below, why would sending 2 batches of 10 simultaneous requests (more precisely: 13 to 15 single requests) hit that quota limit anyway?
quota limit of 90 read requests per user per minute as displayed on console.cloud.google.com
Thank you for any clarification!
Further reading: https://developers.google.com/people/api/rest/v1/people/batchCreateContacts
let payloads = [];
let lengthPayloads;
let limitPayload = 200;
/*Break up contacts in payload limits*/
contacts.forEach(function (contact, index) /*contacts is an array of objects for the API*/
{
  if(!(index%limitPayload))
  {
    lengthPayloads = payloads.push(
      {
        'readMask': "userDefined",
        'sources': ["READ_SOURCE_TYPE_CONTACT"],
        'contacts': []
      }
    );
  }
  payloads[lengthPayloads-1]['contacts'].push(contact);
});
Logger.log("which makes "+payloads.length+" payloads");
let parallelRequests = [];
let lengthParallelRequests;
let limitParallelRequest = 10;
/*Break up payloads in parallel request limits*/
payloads.forEach(function (payload, index)
{
  if(!(index%limitParallelRequest))
    lengthParallelRequests = parallelRequests.push([]);
  parallelRequests[lengthParallelRequests-1].push(
    {
      'url': "https://people.googleapis.com/v1/people:batchCreateContacts",
      'method': "post",
      'contentType': "application/json",
      'payload': JSON.stringify(payload),
      'headers': { 'Authorization': "Bearer " + token }, /*token is a token of a single user*/
      'muteHttpExceptions': true
    }
  );
});
Logger.log("which makes "+parallelRequests.length+" parallelrequests");
let responses;
parallelRequests.forEach(function (parallelRequest)
{
  responses = UrlFetchApp.fetchAll(parallelRequest); /* error occurs here */
  responses = responses.map(function (response) { return JSON.parse(response.getContentText()); });
  responses.forEach(function (response)
  {
    if(response.error)
    {
      Logger.log(JSON.stringify(response));
      throw response;
    }
    else Logger.log("ok");
  });
});
Output of logs:
which makes 22 payloads
which makes 3 parallelrequests
ok (15 times)
(the error message)
I had raised the same issue in Google's issue tracker.
It seems that a single batchCreateContacts or batchUpdateContacts call consumes six (6) units of the "Critical read requests" quota. That would also explain the numbers above: at 6 units per call, the limit of 90 units per user per minute is exhausted after 90 / 6 = 15 calls. I still did not get an answer as to why, for creating/updating contacts, we are hitting a critical read request limit at all.
Quota exceeded for quota metric 'Critical read requests (Contact and Profile Reads)' and limit 'Critical read requests (Contact and Profile Reads) per minute per user' of service 'people.googleapis.com' for consumer 'project_number:***'.
There are two types of quotas: project-based quotas and user-based quotas. Project-based quotas are limits placed upon your project itself. User-based quotas are more like flood protection: they limit the number of requests a single user can make over a period of time.
When you send a batch request with 10 requests in it, it counts as ten requests, not as a single batch request. If you are trying to run this in parallel, then you are definitely going to overflow the requests-per-minute-per-user quota.
Slow down; this is not a race.
Why, for creating contacts, would I hit a quota regarding read requests?
I would chalk it up to a bad error message.
Given the picture link below, why would sending 13 to 15 requests hit that quota limit anyway? ((there are 3 read requests before this code)) quota limit of 90 read requests per user per minute as displayed on console.cloud.google.com
Well, you are sending 13 * 10 = 130 requests per minute; that would exceed the requests-per-minute quota. There is also no way of knowing how fast your system is actually running; it could be going faster, and it will depend upon what else the server is doing when it gets your requests and which minute they are actually recorded in.
My advice is to just respect the quota limits and not try to understand why; there are too many variables on Google's servers to track down what exactly a minute is. You could send 100 requests in 10 seconds and then try to send another 100 after 55 seconds and you will get the error; you could also get the error after 65 seconds, depending upon when they hit the server and when the server finished processing your initial 100 requests.
Again, slow down.

JMeter 3.0: JSR223 + Groovy vs. BeanShell

Could you please help me understand the difference in response times when I am using JSR223 + Groovy (with caching) versus BeanShell, and what the reason for it is:
The gray line is JSR223, the red one is BeanShell.
Inside them I have code which sends a Protobuf message over HTTP:
import org.apache.http.HttpEntity;
import org.apache.http.HttpHeaders;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ByteArrayEntity;
import org.apache.http.impl.client.DefaultHttpClient;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

byte[] data = protobuf.toByteArray();
String SESSION = vars.get("SESSION");
String url = "http://" + vars.get("SERVER_NAME") + ":8080/messages/list";
HttpClient client = new DefaultHttpClient();
//System.out.println(url);
HttpPost post = new HttpPost(url);
HttpEntity entity = new ByteArrayEntity(data);
post.setEntity(entity);
post.setHeader(HttpHeaders.CONTENT_TYPE, "application/x-protobuf");
post.setHeader(HttpHeaders.ACCEPT,"*/*");
post.setHeader(HttpHeaders.ACCEPT_ENCODING,"identity");
post.setHeader(HttpHeaders.CONNECTION,"Keep-Alive");
post.setHeader(HttpHeaders.USER_AGENT,"UnityPlayer/5.0.3p3 (http://unity3d.com) ");
post.setHeader("Cookie", "SESSION="+SESSION);
HttpResponse response=null;
try {
    response = client.execute(post);
} catch (ClientProtocolException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
ByteArrayOutputStream baos = new ByteArrayOutputStream();
response.getEntity().writeTo(baos);
byte[] b = baos.toByteArray();
I observed the same issue with JMeter 3.0 running on Sun Java 1.8. In my view, Groovy + JSR223 is memory-intensive and creates tons of temporary objects and metaclass objects. This increases the garbage-collection overhead substantially. I was using G1GC as the GC algorithm.
Test #1: Groovy + JSR223 - a throughput of 8 requests/sec was achieved, with 60 to 80% CPU and 3 GB of heap space consumed.
Test #2: JSR223 + Java (BeanShell implementation) - GC overhead was reduced considerably and the CPU was somewhere around 40 to 60%. However, the throughput didn't improve. I observed plenty of thread locks on the bsh.ForName method.
Test #3: plain BeanShell sampler with Java code in it - this was the best! The throughput immediately reached an incredible 15000 per sec with the CPU at 100%. However, the overhead was due to the load; I reduced it down by 10% of the original load and it was able to achieve more than 50 req/sec easily with just 20% CPU.
Based on the experiment conducted above, I would suggest using a plain BeanShell sampler with Java code in it instead of JSR223.
