socket.io server fails in scale test with artillery - socket.io

I am running a scale test against socket.io with Artillery, following the steps in this blog post - https://www.artillery.io/blog/load-testing-socketio-with-artillery - with the following modification (adding more users):
config:
  target: "http://localhost:3000"
  phases:
    - duration: 60
      arrivalRate: 5
    - duration: 60
      arrivalRate: 5
      rampTo: 50
I am executing the test against the code available at https://github.com/socketio/socket.io/tree/main/examples/chat
Here are the findings:
engine.socketio.emit: .......................................................... 1877
engine.socketio.emit_rate: ..................................................... 2/sec
errors.ECONNREFUSED: ........................................................... 111
errors.Error: xhr poll error: .................................................. 591
errors.Error: xhr post error: .................................................. 388
vusers.completed: .............................................................. 860
vusers.created: ................................................................ 1950
vusers.created_by_name.A chatty user: .......................................... 188
vusers.created_by_name.A mostly quiet user: .................................... 282
vusers.created_by_name.A user that just lurks: ................................. 1480
vusers.failed: ................................................................. 1090
vusers.session_length:
min: ......................................................................... 60007.8
max: ......................................................................... 174740.5
median: ...................................................................... 64236
p95: ......................................................................... 161191.7
p99: ......................................................................... 171159.6
Any idea why it is failing for so many users?
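For reference, the phases block above only controls arrival rates; the blog post pairs it with a scenarios section that uses Artillery's socketio engine. A rough sketch of what that looks like (the event names "add user" / "new message" are assumptions based on the socket.io chat example, and the think/loop values are illustrative, not the blog's exact numbers):

```yaml
scenarios:
  - name: "A user that just lurks"
    engine: socketio
    flow:
      # Join the chat, then stay connected without emitting anything.
      - emit:
          channel: "add user"
          data: "lurker"
      - think: 60
  - name: "A chatty user"
    engine: socketio
    flow:
      - emit:
          channel: "add user"
          data: "chatty"
      - loop:
          - emit:
              channel: "new message"
              data: "hello"
          - think: 2
        count: 30
```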


Loki: Ingester high memory usage

Please advise on memory usage of the Loki ingester component.
I have the following setup: Loki distributed v2.6.1, installed through the official Helm chart on K8s.
There are ~1000 promtail client hosts, each generating a heavy load: about 5 million chunks (see screenshot below).
The number of loki_log_messages_total is 235 million per day.
My problem is that the ingester uses about 140 GB of RAM per day, and memory consumption keeps increasing. I want to understand whether this is normal behavior, or whether I can reduce memory usage through the config. I tried adjusting various parameters myself, in particular chunk_idle_period and max_chunk_age, but no matter what values I set, memory consumption stays at 100+ GB.
I also tried reducing the number of labels on the client side; at the moment the labels are as follows:
Here is my config:
auth_enabled: false
chunk_store_config:
  max_look_back_period: 0s
compactor:
  retention_enabled: true
  shared_store: s3
  working_directory: /var/loki/compactor
distributor:
  ring:
    kvstore:
      store: memberlist
frontend:
  compress_responses: true
  log_queries_longer_than: 5s
  tail_proxy_url: http://loki-distributed-querier:3100
frontend_worker:
  frontend_address: loki-distributed-query-frontend:9095
  grpc_client_config:
    max_recv_msg_size: 167772160
    max_send_msg_size: 167772160
ingester:
  autoforget_unhealthy: true
  chunk_block_size: 262144
  chunk_encoding: snappy
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  max_chunk_age: 15m
  max_transfer_retries: 0
  wal:
    enabled: false
ingester_client:
  grpc_client_config:
    max_recv_msg_size: 167772160
    max_send_msg_size: 167772160
limits_config:
  cardinality_limit: 500000
  enforce_metric_name: false
  ingestion_burst_size_mb: 300
  ingestion_rate_mb: 150
  max_cache_freshness_per_query: 10m
  max_entries_limit_per_query: 1000000
  max_global_streams_per_user: 5000000
  max_label_name_length: 1024
  max_label_names_per_series: 300
  max_label_value_length: 8096
  max_query_series: 250000
  per_stream_rate_limit: 150M
  per_stream_rate_limit_burst: 300M
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  retention_period: 72h
  split_queries_by_interval: 30m
memberlist:
  join_members:
    - loki-distributed-memberlist
querier:
  engine:
    timeout: 5m
  query_timeout: 5m
query_range:
  align_queries_with_step: true
  cache_results: true
  max_retries: 5
  results_cache:
    cache:
      enable_fifocache: true
      fifocache:
        max_size_items: 1024
        ttl: 24h
query_scheduler:
  grpc_client_config:
    max_recv_msg_size: 167772160
    max_send_msg_size: 167772160
runtime_config:
  file: /var/loki-distributed-runtime/runtime.yaml
schema_config:
  configs:
    - from: "2022-09-07"
      index:
        period: 24h
        prefix: loki_index_
      object_store: aws
      schema: v12
      store: boltdb-shipper
server:
  grpc_server_max_recv_msg_size: 167772160
  grpc_server_max_send_msg_size: 167772160
  http_listen_port: 3100
  http_server_idle_timeout: 300s
  http_server_read_timeout: 300s
  http_server_write_timeout: 300s
storage_config:
  aws:
    s3: https:/....
    s3forcepathstyle: true
  boltdb_shipper:
    active_index_directory: /var/loki/boltdb_shipper/index
    cache_location: /var/loki/boltdb_shipper/cache
    shared_store: s3
    index_cache_validity: 5m
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
I have not found any examples or guidance for heavy loads in the documentation, so I decided to ask the community. I would be very grateful for any help.
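One note that may help frame the config tuning: ingester memory scales primarily with the number of active streams (each unique label combination keeps its own in-memory chunks), so stream-count limits tend to matter more than chunk_idle_period or max_chunk_age alone. A hedged sketch, not a recommendation - the values below are illustrative assumptions to tune against your own cardinality:

```yaml
limits_config:
  # 5,000,000 global streams per tenant effectively disables the cap; a much
  # lower value surfaces label-cardinality problems instead of absorbing them
  # into RAM. This number is only an illustration.
  max_global_streams_per_user: 50000
ingester:
  # Cut chunks once they reach a target size rather than holding them open
  # until max_chunk_age; the value here is illustrative.
  chunk_target_size: 1572864
```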

Unknown protocol error when testing a website in jmeter

When I tried to test a website, the result was like this:
Thread Name: Test new22 1-1
Sample Start: 1970-01-01 05:30:00 IST
Load time: 0
Connect Time: 0
Latency: 0
Size in bytes: 841
Sent bytes:0
Headers size in bytes: 0
Body size in bytes: 841
Sample Count: 1
Error Count: 1
Data type ("text"|"bin"|""): text
Response code: Non HTTP response code: java.net.MalformedURLException
Response message: Non HTTP response message: unknown protocol: stagingblueridge.com
HTTPSampleResult fields:
ContentType:
DataEncoding: null
Can anybody help me resolve this issue?
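The message "unknown protocol: stagingblueridge.com" comes from Java's URL parser: JMeter assembles the request URL as &lt;protocol&gt;://&lt;server&gt;&lt;path&gt;, so if the server name is typed into the HTTP sampler's Protocol field, the hostname ends up as the URL scheme. A minimal sketch reproducing that exception (the exact URL string JMeter would build is an assumption):

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UnknownProtocolDemo {
    public static void main(String[] args) {
        // With the hostname in the Protocol field, JMeter would build
        // something like "stagingblueridge.com://stagingblueridge.com/".
        // Java finds no URLStreamHandler for that scheme and rejects it.
        try {
            new URL("stagingblueridge.com://stagingblueridge.com/");
            System.out.println("parsed");
        } catch (MalformedURLException e) {
            System.out.println(e.getMessage());
        }
        // The fix in the sampler: Protocol = "https",
        // Server Name or IP = "stagingblueridge.com".
    }
}
```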

Empty karma html report

Here are the facts:
I have a nodejs app
I've written some Jasmine tests
I'm using karma to generate the code coverage report
Here is the karma.conf.js:
module.exports = function (config) {
  config.set({
    frameworks: ['jasmine', 'browserify'],
    files: [
      'utils/dates.js',
      'utils/maths.js',
      'spec/server_utils_spec.js'
    ],
    preprocessors: {
      'utils/*.js': ['coverage', 'browserify'],
      'spec/server_utils_spec.js': ['coverage', 'browserify']
    },
    reporters: ['progress', 'coverage'],
    coverageReporter: {
      type: 'html',
      dir: 'coverage/'
    },
    colors: true,
    logLevel: config.LOG_INFO,
    autoWatch: true,
    singleRun: false
  })
}
These are my packages (I know they are global):
→ npm -g list | grep -i "karma"
├─┬ karma@2.0.4
├─┬ karma-browserify@5.3.0
├─┬ karma-coverage@1.1.2
├─┬ karma-html-reporter@0.2.7
├── karma-jasmine@1.1.2
├─┬ karma-mocha@1.3.0
and despite the terminal log
/usr/bin/karma start server/karma.conf.js
23 07 2018 15:39:36.033:INFO [framework.browserify]: registering rebuild (autoWatch=true)
23 07 2018 15:39:36.817:INFO [framework.browserify]: 3073 bytes written (0.03 seconds)
23 07 2018 15:39:36.817:INFO [framework.browserify]: bundle built
23 07 2018 15:39:36.818:WARN [karma]: No captured browser, open http://localhost:9876/
23 07 2018 15:39:36.822:INFO [karma]: Karma v2.0.4 server started at http://0.0.0.0:9876/
23 07 2018 15:39:39.135:INFO [Chrome 67.0.3396 (Linux 0.0.0)]: Connected on socket H9J96y-fFn-7PfkHAAAA with id manual-9692
Chrome 67.0.3396 (Linux 0.0.0): Executed 0 of 2 SUCCESS (0 secs / 0 secs)
Chrome 67.0.3396 (Linux 0.0.0): Executed 1 of 2 SUCCESS (0 secs / 0.001 secs)
Chrome 67.0.3396 (Linux 0.0.0): Executed 2 of 2 SUCCESS (0 secs / 0.002 secs)
Chrome 67.0.3396 (Linux 0.0.0): Executed 2 of 2 SUCCESS (0.033 secs / 0.002 secs)
TOTAL: 2 SUCCESS
When I open the generated html report, it is empty.
I've tried the solutions to all the other related questions and I've come up with nothing.
My questions:
why is the report empty?
how can I get the report to be generated?
Thank you in advance!
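One commonly cited cause of empty reports with karma-browserify is that the coverage preprocessor instruments the source files before browserify bundles them, so the bundle that actually runs in the browser carries no instrumentation. The usual workaround is to instrument inside the bundle instead; a sketch, which assumes the browserify-istanbul package is installed and is untested here:

```js
// karma.conf.js (relevant parts only)
preprocessors: {
  // Only browserify here; instrumentation happens inside the bundle
  // via the browserify-istanbul transform below.
  'utils/*.js': ['browserify'],
  'spec/server_utils_spec.js': ['browserify']
},
browserify: {
  debug: true,
  transform: [require('browserify-istanbul')({ ignore: ['**/spec/**'] })]
},
reporters: ['progress', 'coverage'],
```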

JMeter StackOverflow

I am using JMeter 2.11. The following parameters are defined in the jmeter.bat file :
set HEAP=-Xms512m -Xmx12144m
set PERM=-XX:PermSize=64m -XX:MaxPermSize=64m
I run my scenario in batch mode with 50 users. It appears some threads are blocked for 20 minutes or an hour and then run again afterwards. For example, we have the following with unit group 6:
<httpSample t="13" lt="13" ts="1410856270124" s="true" lb="/hopex/service.aspx?data=generationType-standard|generator-E98AEA3A4F717715" rc="200" rm="OK" tn="Groupe d'unités 1-6" dt="text" by="412">
<java.net.URL>http://172.16.1.23/hopex/service.aspx?data=generationType-standard|generator-E98AEA3A4F717715</java.net.URL>
</httpSample>
**executed at 16/09/2014 10:31:10**
<httpSample t="0" lt="0" ts="1410856270138" s="true" lb="/hopex/statesessionprovider.aspx" rc="200" rm="OK" tn="Groupe d'unités 1-6" dt="text" by="238">
<java.net.URL>http://172.16.1.23/hopex/statesessionprovider.aspx</java.net.URL>
</httpSample>
**executed at 16/09/2014 10:31:10**
<sample t="0" lt="0" ts="1410856274818" s="true" lb="Timer between steps" rc="200" rm="OK" tn="Groupe d'unités 1-6" dt="text" by="1478"/>
**executed at 16/09/2014 10:31:15**
<httpSample t="3" lt="3" ts="1410860493293" s="false" lb="/Hopex/service.aspx?data=generationType-standard|generator-E98AEA3A4F717715" rc="500" rm="Internal Server Error" tn="Groupe d'unités 1-6" dt="text" by="298">
<java.net.URL>http://172.16.1.23/Hopex/service.aspx?data=generationType-standard|generator-E98AEA3A4F717715</java.net.URL>
</httpSample>
**executed at 16/09/2014 11:41:33**
The timers are executed at 10:31 and the next request is sent at 11:41, that is to say 1h10 after the timers. Our application server log shows that the last request was never handled, because of the IIS web application session timeout. So JMeter paused for more than an hour before sending the request. It should be noted that if we remove the JMeter while statement from our scenario, it works.
I retrieved this information from the JMeter logs. The problem seems to come from JMeter hitting a stack overflow.
2014/09/16 10:30:49 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:30:49 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:30:49 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:30:51 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:31:00 INFO - jmeter.reporters.Summariser: summary + 196 in 30s = 6.5/s Avg: 154 Min: 0 Max: 11347 Err: 0 (0.00%) Active: 50 Started: 50 Finished: 0
2014/09/16 10:31:00 INFO - jmeter.reporters.Summariser: summary = 5974 in 1103s = 5.4/s Avg: 406 Min: 0 Max: 47864 Err: 0 (0.00%)
2014/09/16 10:31:01 WARN - jmeter.control.GenericController: StackOverflowError detected
2014/09/16 10:31:32 INFO - jmeter.reporters.Summariser: summary + 154 in 32s = 4.9/s Avg: 94 Min: 0 Max: 10982 Err: 0 (0.00%) Active: 50 Started: 50 Finished: 0
2014/09/16 10:31:32 INFO - jmeter.reporters.Summariser: summary = 6128 in 1135s = 5.4/s Avg: 399 Min: 0 Max: 47864 Err: 0 (0.00%)
2014/09/16 10:31:37 WARN - jmeter.control.GenericController: StackOverflowError detected
I tried changing the jmeter.bat parameters using -Xss, but we had side effects. I also ran the test with these parameters:
set HEAP=-Xms512m -Xmx12144m
set NEW=-XX:NewSize=128m -XX:MaxNewSize=128m
set SURVIVOR=-XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=50%
set TENURING=-XX:MaxTenuringThreshold=2
set RMIGC=-Dsun.rmi.dgc.client.gcInterval=600000 -Dsun.rmi.dgc.server.gcInterval=600000
set PERM=-XX:PermSize=64m -XX:MaxPermSize=64m
Nothing changed; the same problem occurs.
Does anyone have an idea of how to remove these JMeter errors?
This is quite blocking for us, since it means JMeter cannot correctly handle 50 simultaneous users...
Regards
Sylvie
The issue was due to a known bug in version 2.11:
Listeners don't show iteration counts when an If Controller has a condition which is always false from the first iteration (see Bug 52496). A workaround is to add a sampler at the same level as (or superior to) the If Controller, for example a Test Action sampler with 0 wait time (which doesn't generate a sample), or a Debug Sampler with all fields set to False (to reduce the sample size).
Also opened as:
https://issues.apache.org/bugzilla/show_bug.cgi?id=56160
This bug is now fixed, so the fix will be available in 2.12.
2014/09/16 10:31:37 WARN - jmeter.control.GenericController: StackOverflowError detected
This error can appear when there is a logical error in your test plan. Check carefully by adding a BeanShell Listener to print extra logs inside any logic controller (such as a looping/iterating controller) in your test plan.
I believe that in "set HEAP=-Xms512m -Xmx12144m" both values need to be the same.
I think that if you try "set HEAP=-Xms2048m -Xmx2048m" the error will be gone.
Just increase the JMeter stack size by using the -Xss option.
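If it helps, jmeter.bat passes the JVM_ARGS environment variable through to the JVM, so the stack size can be set next to the existing HEAP line; the 1m value below is an assumption to tune, not a verified fix:

```bat
set HEAP=-Xms2048m -Xmx2048m
rem Larger per-thread stack to absorb deep controller recursion (value illustrative)
set JVM_ARGS=-Xss1m
```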

Definition of Connect, Processing, Waiting in apache bench

When I run apache bench I get results like:
Command: abs.exe -v 3 -n 10 -c 1 https://mysite
Connection Times (ms)
min mean[+/-sd] median max
Connect: 203 213 8.1 219 219
Processing: 78 177 88.1 172 359
Waiting: 78 169 84.6 156 344
Total: 281 389 86.7 391 564
I can't seem to find the definition of Connect, Processing and Waiting. What do those numbers mean?
By looking at the source code we find these timing points:
apr_time_t start, /* Start of connection */
connect, /* Connected, start writing */
endwrite, /* Request written */
beginread, /* First byte of input */
done; /* Connection closed */
And when a request is done, some timings are stored as:
s->starttime = c->start;
s->ctime = ap_max(0, c->connect - c->start);
s->time = ap_max(0, c->done - c->start);
s->waittime = ap_max(0, c->beginread - c->endwrite);
And the 'Processing' time is later calculated as:
s->time - s->ctime;
So if we translate this to a timeline:
t1: Start of connection
t2: Connected, start writing
t3: Request written
t4: First byte of input
t5: Connection closed
Then the definitions would be:
Connect: t1→t2, most typically the network latency
Processing: t2→t5, time to receive the full response after the connection was opened
Waiting: t3→t4, time-to-first-byte after the request was written
Total time: t1→t5
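To make the arithmetic concrete, here is a small sketch applying ab's formulas above to one hypothetical set of timestamps (the millisecond values are made up for illustration):

```java
public class AbTimings {
    public static void main(String[] args) {
        // Hypothetical per-request timestamps in ms, named after ab's
        // own timing points quoted above.
        long start = 0;       // t1: start of connection
        long connect = 210;   // t2: connected, start writing
        long endwrite = 215;  // t3: request written
        long beginread = 380; // t4: first byte of input
        long done = 390;      // t5: connection closed

        long connectTime = Math.max(0, connect - start);      // Connect
        long totalTime   = Math.max(0, done - start);         // Total
        long waitTime    = Math.max(0, beginread - endwrite); // Waiting
        long processing  = totalTime - connectTime;           // Processing

        System.out.println("Connect=" + connectTime
                + " Processing=" + processing
                + " Waiting=" + waitTime
                + " Total=" + totalTime);
    }
}
```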
From http://chestofbooks.com/computers/webservers/apache/Stas-Bekman/Practical-mod_perl/9-1-1-ApacheBench.html:
Connect and Waiting times
The amount of time it took to establish the connection and get the first bits of a response
Processing time
The server response time—i.e., the time it took for the server to process the request and send a reply
Total time
The sum of the Connect and Processing times
I equate this to:
Connect time: the amount of time it took for the socket to open
Processing time: first byte + transfer
Waiting: time till first byte
Total: Sum of Connect + Processing
Connect: time it takes to connect to the remote host
Processing: total time minus the time it takes to connect to the remote host
Waiting: first byte of the response received minus last byte of the request sent
Total: from before connect until after the connection is closed
