Load testing microservice - bad results - performance

I am struggling with the results of a load test of my API service.
The service is deployed on GCP. There are 2 instances running behind nginx.
I am using Spring Boot 2.0 with the default Tomcat configuration.
The service is running inside a Docker container with -Xmx768M.
A request to the tested endpoint fetches a single row from a database and returns it.
My results:
ab -c 1 -n 1 /testapi
    Requests per second: 15
    Time per request: 63 ms
ab -c 10 -n 100 /testapi
    Requests per second: 102
    Time per request: 97 ms
ab -c 50 -n 100 /testapi
    Requests per second: 95
    Time per request: 522 ms
ab -c 100 -n 100 /testapi
    Requests per second: 93
    Time per request: 1065 ms
ab -c 200 -n 200 /testapi
    Timeout
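As a sanity check on these numbers, ab's two figures are tied together by Little's law: requests per second is roughly concurrency divided by mean time per request. A minimal sketch in Java (the class and method names are made up for illustration) recomputes the throughput implied by each run above:

```java
// Sanity check of the ab figures using Little's law:
// throughput ~= concurrency / mean time per request.
class AbThroughputCheck {

    /** Requests per second implied by a concurrency level and ab's mean "Time per request" in ms. */
    static double impliedRps(int concurrency, double meanTimePerRequestMs) {
        return concurrency * 1000.0 / meanTimePerRequestMs;
    }

    public static void main(String[] args) {
        // Values copied from the test runs listed above.
        int[] concurrency = {1, 10, 50, 100};
        double[] timePerRequestMs = {63, 97, 522, 1065};

        for (int i = 0; i < concurrency.length; i++) {
            System.out.printf("c=%d, %.0f ms per request -> %.1f requests/s%n",
                    concurrency[i], timePerRequestMs[i],
                    impliedRps(concurrency[i], timePerRequestMs[i]));
        }
    }
}
```

The implied values (about 16, 103, 96 and 94 requests per second) match what ab reported, so throughput stays flat at roughly 95 to 100 requests per second while latency grows in step with concurrency. That pattern usually points at a stage that handles requests one at a time or through a small fixed pool.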
Worrying aspects:
Heap usage rises after each test.
At 99.99% heap usage I get slow responses until the container crashes with java.lang.OutOfMemoryError: Java heap space.
I am able to DoS 2 instances of the API from a single laptop.
What am I doing wrong here? Does the Tomcat configuration need further tweaking? Are the JVM parameters wrong?
Thanks
EDIT: tested code:
@ApiOperation(value = "Get Device", response = Device.class, authorizations = {@Authorization(value = "Authorization")})
@RequestMapping(value = "/device/commnr/{commnr}", method = RequestMethod.GET)
public DeferredResult<ResponseEntity> getDevice(@PathVariable String commnr) {
    final DeferredResult<ResponseEntity> deferred = new DeferredResult<>();
    HttpServletRequest request = getCurrentHttpRequest();
    timer.schedule(new TimerTask() {
        @Override
        public void run() {
            // Already completed or timed out: nothing left to do. An exception
            // thrown from a TimerTask would terminate the Timer's only thread.
            if (deferred.isSetOrExpired()) {
                return;
            }
            User user = authenticationService.loginUser(request);
            Device device = deviceService.findByCommNr(commnr);
            // Check for a missing device before using it, and set exactly one result.
            if (device == null) {
                deferred.setResult(new ResponseEntity<>(HttpStatus.NOT_FOUND));
            } else if (!AuthTools.isUserDevice(device, user)) {
                deferred.setResult(new ResponseEntity<>("Requested Device does not belong to user.", HttpStatus.FORBIDDEN));
            } else {
                deferred.setResult(new ResponseEntity<>(device, HttpStatus.OK));
            }
        }
    }, 10);
    return deferred;
}
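One thing that may explain the flat throughput: a java.util.Timer executes all of its tasks on a single background thread, so the login and the database lookup for every request would run one after another. This assumes `timer` in the code above is one shared java.util.Timer, which the snippet does not show. The sketch below is plain Java with a made-up 20 ms sleep standing in for the real work; it counts how many distinct threads end up running the tasks on a Timer versus on a scheduled thread pool.

```java
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class TimerVsPool {
    static final int TASKS = 8;
    static final long WORK_MS = 20; // hypothetical stand-in for login + database lookup

    // Records which thread ran the task, then simulates blocking work.
    static void simulatedWork(Set<String> threads, CountDownLatch done) {
        threads.add(Thread.currentThread().getName());
        try {
            Thread.sleep(WORK_MS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            done.countDown();
        }
    }

    static void awaitAll(CountDownLatch done) {
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Number of distinct threads that ran the tasks scheduled on one java.util.Timer. */
    static int threadsUsedByTimer() {
        Set<String> threads = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(TASKS);
        Timer timer = new Timer(true);
        for (int i = 0; i < TASKS; i++) {
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    simulatedWork(threads, done);
                }
            }, 10);
        }
        awaitAll(done);
        timer.cancel();
        return threads.size();
    }

    /** Number of distinct threads that ran the same tasks on a scheduled thread pool. */
    static int threadsUsedByPool(int poolSize) {
        Set<String> threads = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(TASKS);
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        for (int i = 0; i < TASKS; i++) {
            pool.schedule(() -> simulatedWork(threads, done), 10, TimeUnit.MILLISECONDS);
        }
        awaitAll(done);
        pool.shutdown();
        return threads.size();
    }

    public static void main(String[] args) {
        System.out.println("Threads used by Timer: " + threadsUsedByTimer());
        System.out.println("Threads used by pool:  " + threadsUsedByPool(4));
    }
}
```

The Timer variant always reports a single thread, so with roughly 10 ms of work per task it could not complete much more than about 100 tasks per second no matter how many requests arrive concurrently. Whether that is the actual bottleneck here would need to be confirmed with a thread dump or profiler.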

Related

SSE connection keeps failing every 5 minutes

I'm exposing a simple SSE endpoint using the Spring SseEmitter API, keeping all the emitters in a ConcurrentHashMap. The timeout for each emitter is set to 24 hours. Every 10 seconds I send a message to all the clients. Clients subscribe with the native EventSource implementation, listening for events with a particular name.
Unfortunately, I've noticed that every 5 minutes the connection is lost and re-established, even though the emitter's timeout was explicitly set to 24 hours. The client logs it as an error, but nothing is logged on the server side. The issue occurs on both Tomcat and Jetty. I'd like to keep the connection open without any interruptions, so resetting it every 5 minutes is unacceptable. Any ideas why this could be happening?
@RestController
@RequestMapping("api/v1/sse")
class SseController {
    private val emitters = ConcurrentHashMap<String, SseEmitter>()

    @GetMapping
    fun initConnection(@RequestParam token: String): SseEmitter {
        logger.info { "Init connection from $token" }
        val emitter = SseEmitter(24 * 60 * 60 * 1000)
        emitter.onCompletion {
            logger.info { "Completion" }
            emitters.remove(token)
        }
        emitter.onTimeout { logger.info { "Timeout " } }
        emitter.onError { logger.error(it) { "Error" } }
        emitters[token] = emitter
        return emitter
    }

    @Scheduled(fixedRate = 10000)
    fun send() {
        emitters.forEach { (k, v) ->
            logger.info { "Sending message to $k" }
            v.send(
                SseEmitter.event()
                    .id(UUID.randomUUID().toString())
                    .name("randomEvent")
                    .data("some data")
            )
        }
    }
}

const eventSource = new EventSource(url);
eventSource.addEventListener('randomEvent', (e) =>
    console.log(e.data)
);
eventSource.onerror = (e) => console.log(e);
Alright, seems it was an issue with Stackblitz's service worker. I've just implemented the same client-side solution in Chrome's plain console and the disconnecting is no longer happening.

Load/stress test in a SPA with Hasura Cloud Graphql as a backend and subscriptions

I'm trying to do a performance test on the following setup:
A SPA with a React frontend, deployed on Netlify.
A Hasura Cloud GraphQL backend (Standard version) https://hasura.io/, where everything from the client goes directly through Hasura to the DB.
A Postgres DB hosted on Heroku (Standard 0 tier).
We're hoping to be able to support around 800 simultaneous users.
The problem is that I'm at a loss about how to do it, or whether I'm doing it correctly, since most of our operations are subscriptions/mutations that I had to transform into queries. I tried running the tests with k6 and JMeter, but I'm not sure I'm doing them properly.
k6 test
First, I did a quick search and collected around 10 subscriptions that are commonly used. Then I tried to create a performance test with k6 https://k6.io/docs/using-k6/http-requests/ but I wasn't able to create a working subscription test, so I transformed each subscription into a query and performed an http.post with this setup:
import http from 'k6/http';
import { sleep } from 'k6';

// prod, params, listaQueries and keys are defined elsewhere in the test script (not shown).
export const options = {
    stages: [
        { duration: '30s', target: 75 },
        { duration: '120s', target: 75 },
        { duration: '60s', target: 50 },
        { duration: '30s', target: 30 },
        { duration: '10s', target: 0 }
    ]
};

export default function () {
    const res = http.post(
        prod,
        JSON.stringify({
            query: listaQueries.GetDesafiosCursosByKey(keys.desafioCursoKey)
        }),
        params
    );
    sleep(1);
}
I did this for every query and ran each test individually. Unfortunately, the numbers I got were bad, and somehow our test environment got better times than production. (The only difference, as far as I know, is that we're using Hasura Cloud for production.)
I tried to use WebSockets, but I couldn't get them to work or configure them for a stress/load test.
[Screenshot: k6 result]
JMeter test
After that, I tried something similar with JMeter, but again I couldn't figure out how to set up a subscription test (after a while, I read in a blog post that JMeter doesn't support it: https://qainsights.com/deep-dive-into-graphql-in-jmeter/), so I simply transformed all subscriptions into queries and tried to do the same, but the numbers I was getting were different and much higher than with k6.
[Screenshots: JMeter query config 1, JMeter query config 2, JMeter thread config]
Questions
I'm not sure if I'm doing it correctly, i.e. whether transforming every subscription into a query and performing an HTTP request is a correct approach. (At least I know that those queries return the data correctly.)
Should I just increase the number of VUs/threads until I get constant timeouts to simulate a stress test? Some tests were causing a GraphQL error on the website, and others were producing
WARN[0059] Request Failed error="Post \"https://xxxxxxx-xxxxx.herokuapp.com/v1/graphql\": EOF"
in the k6 console.
Or should I just give up on k6/JMeter and look for another tool to perform those tests?
Thank you in advance, and sorry for my English and explanation; I'm a complete newbie at this.
"I'm not sure if i'm doing it correctly, if transforming every subscription into a query and perform a http request is a correct approach for it. (At least I know that those queries return the data correctly)."
Ideally you would be using WebSocket as that is what actual clients will most likely be using.
For code samples, check out the answer here.
Here's a more complete example utilizing a main.js entry script with modularized subscription code in subscriptions/bikes-brands.js. It also uses the Httpx library to set a global request header:
// main.js
import { Httpx } from 'https://jslib.k6.io/httpx/0.0.5/index.js';
import { getBikeBrandsByIdSub } from './subscriptions/bikes-brands.js';

const session = new Httpx({
    baseURL: `http://54.227.75.222:8080`
});
const pauseMin = 2;
const pauseMax = 6;

export const options = {};

export default function () {
    session.addHeader('Content-Type', 'application/json');
    getBikeBrandsByIdSub(1);
}

// subscriptions/bikes-brands.js
import ws from 'k6/ws';

// Defined in this module because constants in main.js are not visible here.
const wsUri = 'wss://54.227.75.222:8080/v1/graphql';

/* using a template literal */
export function getBikeBrandsByIdSub(id) {
    const query = `
        subscription getBikeBrandsByIdSub {
            bikes_brands(where: {id: {_eq: ${id}}}) {
                id
                brand
                notes
                updated_at
                created_at
            }
        }
    `;
    const subscribePayload = {
        id: "1",
        payload: {
            extensions: {},
            // must match the operation name used in the query document
            operationName: "getBikeBrandsByIdSub",
            query: query,
            variables: {},
        },
        type: "start",
    };
    const initPayload = {
        payload: {
            headers: {
                "content-type": "application/json",
            },
            lazy: true,
        },
        type: "connection_init",
    };
    console.debug(JSON.stringify(subscribePayload));

    // start a WS connection; the second argument is k6 connection params
    // (e.g. headers, tags), not the init payload
    const res = ws.connect(wsUri, {}, function (socket) {
        socket.on('open', function () {
            console.debug('WS connection established!');
            // send the connection_init:
            socket.send(JSON.stringify(initPayload));
            // send the subscription:
            socket.send(JSON.stringify(subscribePayload));
        });
        socket.on('message', function (message) {
            let messageObj;
            try {
                messageObj = JSON.parse(message);
            } catch (err) {
                console.warn('Unable to parse WS message as JSON: ' + message);
                return; // nothing to inspect if parsing failed
            }
            if (messageObj.type === 'data') {
                console.log(`${messageObj.type} message received by VU ${__VU}: ${Object.keys(messageObj.payload.data)[0]}`);
            }
            console.log(`WS message received by VU ${__VU}:\n` + message);
        });
    });
}
"Should i just increase the number of VUS/threads until i get a constant timeout to simulate a stress test?"
Timeouts and errors that only happen under load are signals that you may be hitting a bottleneck somewhere. Do you only see the EOFs under load? They basically mean the server is sending back incomplete responses or closing connections early, which shouldn't happen under normal circumstances.
My expectation is that your test should replicate real user activity as closely as possible. I doubt that real users will be sending requests to GraphQL directly, and a well-behaved load test must replicate real-life application usage as closely as possible.
So I believe you should move to the HTTP protocol level and mimic the network footprint of a real browser instead of trying to come up with individual GraphQL queries.
With regards to the JMeter and k6 differences, it might be the case that k6 produces higher throughput on the same hardware when running requests at maximum speed, as evidenced by the benchmark in the Open Source Load Testing Tools 2021 article. However, you're trying to simulate real users using real browsers, and real users don't hammer the application non-stop: they need some time to "think" between operations. With think time in place, you should be getting the same number of requests from both load testing tools. If JMeter doesn't give you the load you want, make sure to follow JMeter Best Practices and/or consider running it in distributed mode.
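To see why think time makes the two tools converge, here is a rough closed-loop model in Java: each virtual user sends a request, waits for the response, pauses, and repeats. The 75 users and the 1 second pause come from the k6 script in the question; the response times are hypothetical values chosen only to show the shape of the relationship.

```java
// Closed-loop load model: each virtual user sends a request, waits for the
// response, "thinks", then repeats.
class ThinkTimeModel {

    /** Approximate request rate generated by a fixed number of virtual users. */
    static double expectedRps(int virtualUsers, double responseTimeMs, double thinkTimeMs) {
        return virtualUsers * 1000.0 / (responseTimeMs + thinkTimeMs);
    }

    public static void main(String[] args) {
        int virtualUsers = 75;        // peak target in the k6 stages from the question
        double thinkTimeMs = 1000;    // sleep(1) in the k6 script
        double[] assumedResponseMs = {100, 250, 500}; // hypothetical response times

        for (double responseMs : assumedResponseMs) {
            System.out.printf("response %.0f ms -> about %.1f requests/s%n",
                    responseMs, expectedRps(virtualUsers, responseMs, thinkTimeMs));
        }
    }
}
```

With a pause that is large relative to the response time, the generated rate depends mostly on the user count and the pause, not on how fast the tool itself can issue requests.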

Messages are not moved to DLQ

I'm using ElasticMQ (via docker image v1.3.3) and can't get the DLQ to work.
This is my elasticmq.conf:
include classpath("application.conf")

node-address {
    protocol = http
    host = localhost
    port = 9324
    context-path = ""
}

rest-sqs {
    enabled = true
    bind-port = 9324
    bind-hostname = "0.0.0.0"
    // Possible values: relaxed, strict
    sqs-limits = strict
}

rest-stats {
    enabled = true
    bind-port = 9325
    bind-hostname = "0.0.0.0"
}

queues {
    main {
        defaultVisibilityTimeout = 10 seconds
        delay = 2 seconds
        receiveMessageWait = 0 seconds
        deadLettersQueue {
            name = "retry"
            maxReceiveCount = 1
        }
    }
    retry {
        defaultVisibilityTimeout = 10 seconds
        delay = 2 seconds
        receiveMessageWait = 0 seconds
    }
    deadletter {
        defaultVisibilityTimeout = 10 seconds
        delay = 2 seconds
        receiveMessageWait = 0 seconds
    }
}
I'm sending a message like so (using the AWS CLI):
aws --endpoint-url http://localhost:9324 sqs send-message --queue-url http://localhost:9324/queue/main --message-body "Hello, queue"
And receiving it like so:
aws --endpoint-url http://localhost:9324 sqs receive-message --queue-url http://localhost:9324/queue/main --wait-time-seconds 10
I'm not deleting the message from the queue but the message is being deleted and not being moved to the DLQ (i.e., the retry queue). I'm also trying to receive the message with Java code and getting the same result.
Why is that?

Geocoding requests to HERE API randomly fails

I am trying to geocode addresses with HERE API. I am not free plan. I try following code (Spring Boot in Kotlin):
override fun geocode(address: Address): Coordinate? {
    val uriString = UriComponentsBuilder
        .fromHttpUrl(endpoint)
        .queryParam("app_id", appId)
        .queryParam("app_code", appCode)
        .queryParam("searchtext", addressToSearchText(address))
        .toUriString()
    logger.info("Geocode requested with url {}", uriString)
    val response = restTemplate.getForEntity(uriString, String::class.java)
    return response.body?.let {
        Klaxon().parse<GeocodeResponse>(it)
    }?.let {
        it.Response.View.firstOrNull()?.Result?.firstOrNull()
    }?.let {
        Coordinate(
            latitude = it.Location.DisplayPosition.Latitude,
            longitude = it.Location.DisplayPosition.Longitude
        )
    }.also {
        if (it == null) {
            logger.warn("Geocode failed: {}", response.body)
        }
    }
}
It turned out that when I call this method many times in a row, some requests return empty responses, like this:
{
    "Response": {
        "MetaInfo": {
            "Timestamp": "2019-04-18T11:33:17.756+0000"
        },
        "View": []
    }
}
I could not figure out any rule for why some requests fail. It seems to be just random.
However, when I call the same URLs with curl or in my browser, everything works just fine.
I guess there is some limit on the number of requests per second, but I could not find anything in the HERE documentation.
Does anyone have an idea about the limit? Or could it be something else?
Actually, there was a problem with my code. Requests were failing for addresses containing "special" characters like ü and ö. The problem was in how the request URL was built:
val uriString = UriComponentsBuilder
    .fromHttpUrl(endpoint)
    .queryParam("app_id", appId)
    .queryParam("app_code", appCode)
    .queryParam("searchtext", addressQueryParam(address))
    .build(false) // <= this was missing
    .toUriString()
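The symptom fits double percent-encoding: if the URL string is already encoded and is then encoded a second time before being sent, the server receives a literal "%C3%BC" in place of "ü" and finds no match. That reading is an assumption, since the post does not show the URLs that actually went over the wire. The sketch below uses only java.net.URLEncoder and URLDecoder (the Charset overloads need Java 10 or later) to illustrate the effect; it is not what Spring does internally.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DoubleEncodingDemo {

    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    static String decode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "M\u00fcller";      // "Müller", written with an escape to avoid source-encoding issues
        String once = encode(name);       // ü becomes %C3%BC
        String twice = encode(once);      // the % signs themselves become %25

        System.out.println("encoded once:  " + once);
        System.out.println("encoded twice: " + twice);
        // A server decodes once, so after double encoding it sees the
        // percent-escapes as literal text, not the original character.
        System.out.println("server sees:   " + decode(twice));
    }
}
```

Only addresses with non-ASCII characters change under encoding, which would explain why the failures looked random and why pasting the logged URL into curl or a browser worked.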

Observable vs Future Performance

I am working with Vert.x 2.x (http://vertx.io) which makes extensive use of asynchronous callbacks. These quickly become unwieldy with typical nesting/callback hell issues.
I have considered both Scala Futures/Promises (which I think would be the de facto approach) and Reactive Extensions (RxScala).
From my testing I have found some interesting performance results.
My testing is pretty basic, I'm just issuing a bunch of HTTP requests (via weighttp) to a Vert.x verticle that makes an asynchronous call across the Vert.x eventbus, and processes a response that is then returned in an HTTP 200 response.
What I have found is the following (performance here is measured in terms of HTTP requests per second):
Async Callback performance = 68,305 rps
Rx performance = 64,656 rps
Future/Promises performance = 61,376 rps
The test conditions were:
Mac Pro OS X Yosemite 10.10.2
Oracle JVM 1.8U25
weighttp version 0.3
Vert.x 2.1.5
Scala 2.10.4
RxScala 0.23.0
4 x Web Service Verticle Instances
4 x Backend Service Verticle Instances
The test command was
weighttp -n 1000000 -c 128 -t 8 -k "localhost:8888"
The figures above are the average of five test runs less best and worst result. Note the results are very consistent around the presented average (no more than a few hundred rps deviation).
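For reference, the averaging described above (drop the best and worst run, average the rest) and the relative gap between approaches can be expressed as follows. This is a small Java sketch with made-up class and method names; the only data it uses are the three averages reported above.

```java
import java.util.Arrays;

class BenchmarkStats {

    /** Mean of the runs after dropping the single lowest and single highest value. */
    static double trimmedMean(double[] runs) {
        if (runs.length < 3) {
            throw new IllegalArgumentException("need at least 3 runs");
        }
        double[] sorted = runs.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }

    /** How much slower (in percent) an approach is relative to a baseline throughput. */
    static double slowdownPercent(double baselineRps, double otherRps) {
        return (baselineRps - otherRps) / baselineRps * 100.0;
    }

    public static void main(String[] args) {
        double callbackRps = 68305; // reported averages from the results above
        double rxRps = 64656;
        double futureRps = 61376;

        System.out.printf("Rx vs callbacks:      %.1f%% slower%n", slowdownPercent(callbackRps, rxRps));
        System.out.printf("Futures vs callbacks: %.1f%% slower%n", slowdownPercent(callbackRps, futureRps));
    }
}
```

On the reported averages this works out to roughly 5.3% for Rx and 10.1% for Futures, both well outside the run-to-run deviation of a few hundred rps.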
Is there any known reason why the above might be happening - i.e. Rx > Futures in pure requests per second?
Reactive Extensions in my opinion are superior as they can do so much more but given the standard approach to async callbacks typically seems to go down the Futures/Promises track I'm surprised at the performance hit.
EDIT: Here is the Web Service Verticle
class WebVerticle extends Verticle {
  override def start() {
    val port = container.env().getOrElse("HTTP_PORT", "8888").toInt
    val approach = container.env().getOrElse("APPROACH", "ASYNC")
    container.logger.info("Listening on port: " + port)
    container.logger.info("Using approach: " + approach)
    vertx.createHttpServer.requestHandler { req: HttpServerRequest =>
      approach match {
        case "ASYNC" => sendAsync(req, "hello")
        case "FUTURES" => sendWithFuture("hello").onSuccess { case body => req.response.end(body) }
        case "RX" => sendWithObservable("hello").doOnNext(req.response.end(_)).subscribe()
      }
    }.listen(port)
  }

  // Async callback
  def sendAsync(req: HttpServerRequest, body: String): Unit = {
    vertx.eventBus.send("service.verticle", body, { msg: Message[String] =>
      req.response.end(msg.body())
    })
  }

  // Rx
  def sendWithObservable(body: String) : Observable[String] = {
    val subject = ReplaySubject[String]()
    vertx.eventBus.send("service.verticle", body, { msg: Message[String] =>
      subject.onNext(msg.body())
      subject.onCompleted()
    })
    subject
  }

  // Futures
  def sendWithFuture(body: String) : Future[String] = {
    val promise = Promise[String]()
    vertx.eventBus.send("service.verticle", body, { msg: Message[String] =>
      promise.success(msg.body())
    })
    promise.future
  }
}
EDIT: Here is the Backend Verticle
class ServiceVerticle extends Verticle {
  override def start(): Unit = {
    vertx.eventBus.registerHandler("service.verticle", { msg: Message[String] =>
      msg.reply("Hello Scala")
    })
  }
}

Resources