Slow network time to pass large data from resolver to client - performance

I am using Apollo and GraphQL, and I have a performance issue passing the data from the resolver to the client. The response time grows in proportion to the size of the data sent. I'm trying to send about 1k nested records, and it takes somewhere in the area of 5-10 seconds depending on the result set. When sending the same data as stringified JSON, the result takes a fraction of a second to reach the client.
I tried to increase node memory, but there was no improvement.
Relevant versions / specs:
"apollo-cache-inmemory": "^1.5.1",
"apollo-client": "^2.5.1",
"apollo-link": "^1.2.11",
"apollo-link-error": "^1.1.10",
"apollo-link-http": "^1.5.14",
"apollo-server": "^2.4.8",
"graphql-tag": "^2.10.1",
"graphql-type-json": "^0.2.4",
"graphql-yoga": "^1.17.4",
How can I reduce the response time from resolver to client?
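Since graphql-type-json is already in the dependency list above, one thing worth trying is to expose the large result through a JSON scalar, so GraphQL treats the whole value as a single leaf instead of resolving ~1k nested objects field by field. A minimal sketch, assuming apollo-server and graphql-type-json from the versions above; the type/field names and loadRecords() are made up for illustration, not taken from the question:

// Minimal sketch: serve the nested records through a JSON scalar.
const { ApolloServer, gql } = require("apollo-server");
const GraphQLJSON = require("graphql-type-json");

const typeDefs = gql`
  scalar JSON

  type Query {
    records: JSON
  }
`;

const resolvers = {
  JSON: GraphQLJSON,
  Query: {
    // Returning the whole nested result as a single JSON value skips
    // per-field resolution for the ~1k records, at the cost of losing
    // client-side field selection for that part of the schema.
    records: () => loadRecords(), // hypothetical data source
  },
};

new ApolloServer({ typeDefs, resolvers }).listen();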

Related

REDIS Consuming 20GB of RAM for 150k keys

I'm using Laravel's out-of-the-box Redis implementation. I'm caching collections of query results that are needed for the site, so I end up with lots of keys holding heavy serialized objects, each stored as a String type.
With this, Redis consumes between 22GB and 25GB (max memory). This sometimes leads to key evictions, which we want to avoid at all costs.
Should this be addressed from the code side through optimization (only storing the query result set), or is there something we're doing wrong on the Redis side?
used_memory:25182306344
used_memory_human:23.45G
used_memory_rss:24106418176
used_memory_rss_human:22.45G
used_memory_peak:25238402912
used_memory_peak_human:23.51G
used_memory_peak_perc:99.78%
used_memory_overhead:14926818
used_memory_startup:508096
used_memory_dataset:25167379526
used_memory_dataset_perc:99.94%
total_system_memory:32899166208
total_system_memory_human:30.64G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:26843545600
maxmemory_human:25.00G
maxmemory_policy:allkeys-lru
mem_fragmentation_ratio:0.96
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0
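If the "address it from code" route is taken, one common option is to cache only the result set the site actually needs and to gzip the serialized payload before writing it, which usually shrinks each key considerably. The question's stack is Laravel/PHP, so the following is only a sketch of the idea in Node, assuming the ioredis client and Node's built-in zlib:

// Illustration only: store a compressed, trimmed result set per key.
const Redis = require("ioredis");
const zlib = require("zlib");

const redis = new Redis(); // assumes a reachable Redis instance

async function cacheResultSet(key, resultSet, ttlSeconds) {
  // Serialize only the plain result set (not whole hydrated objects) and gzip it.
  const payload = zlib.gzipSync(JSON.stringify(resultSet));
  await redis.set(key, payload, "EX", ttlSeconds);
}

async function readResultSet(key) {
  const payload = await redis.getBuffer(key);
  if (!payload) return null;
  return JSON.parse(zlib.gunzipSync(payload).toString("utf8"));
}

Whether this is worth it depends on how compressible the serialized objects are; checking a few keys with MEMORY USAGE <key> before and after gives a quick answer.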

AWS Neptune Performance

I'm working on transferring data from our database, which is an RDF store, to AWS Neptune, and I'm facing some performance issues.
I have a db.r4.large Neptune instance and an EC2 instance in the same VPC as Neptune.
Basically, I'm trying to ingest data into Neptune using HTTP requests to the following endpoint: <myinstance>:8182/sparql.
I send the HTTP requests from my EC2 instance, and Neptune's processing time seems slow. In addition, its processing does not seem to be parallel.
Below are my tests & results:
I sent the following request to Neptune:
time curl -X POST -d @/tmp/my_file_32m.txt http://myneptune-poc.c0zm6uyrnnwp.us-east-1.neptune.amazonaws.com:8182/sparql
/tmp/my_file_32m.txt contains SPARQL INSERT commands. The time for this request is 34.037s, while Neptune reports that it took 21.846s:
{
"type" : "Commit",
"totalElapsedMillis" : 21846
}
real 0m34.037s
user 0m0.044s
sys 0m0.062s
A tcpdump clearly shows that the response from Neptune was received after a delay of 34 seconds.
When I sent 100m of data, it took more than 1 minute.
When I sent the same 32m file twice in parallel, the time roughly doubled:
time xargs -I % -P 8 curl -vX POST -d @/tmp/my_file_32m.txt "http://myneptune-poc.c0zm6uyrnnwp.us-east-1.neptune.amazonaws.com:8182/sparql" < <(printf '%s\n' {1..2})
{
"type" : "Commit",
"totalElapsedMillis" : 29797
}
{
"type" : "Commit",
"totalElapsedMillis" : 30362
}
real 0m57.752s
user 0m0.137s
sys 0m0.101s
I took a tcpdump and can clearly see in Wireshark that the requests were sent in parallel, but there is a delay of ~1 minute until Neptune returns 200 OK for both requests.
It seems that Neptune's processing is not concurrent.
The requests were sent at time 12 and the 200 OK for both requests arrived at time 69, which is exactly 57 seconds of delay.
I tried increasing my Neptune instance size to db.r4.xlarge and also to db.r4.2xlarge, but I got the same performance.
I tried sending the data gzip-compressed to improve times, but it seems that Neptune doesn't support it (checking in Wireshark, the request was sent correctly).
I would like to hear your opinion about my tests and the results:
Why is performance slow for a single HTTP request?
Why is Neptune's processing not parallel?
You are comparing the output of time (client-side round-trip time) with the server-reported totalElapsedMillis. The former includes your network transmission time, whereas the latter is just the time the DB took to compute the query from the moment it accepted the request. Do you have any metrics on the time it took to transmit your 100MB file?
Neptune does process queries in parallel (in fact, the amount of parallelism scales with your instance type). If your queries are really small compared to the time they spend on the wire, it may appear as if the results completed one after the other. I would like to see more granular details of your experiments to check whether there is an issue with your setup.
For starters, what is the network lag between your client and the DB endpoint? (i.e., how long does it take to make a request to the /status API, for example?)
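To put a number on that last question, a quick check is to time a lightweight call to the /status endpoint from the same EC2 instance and compare it with the timings above. A minimal sketch in Node (18+, using the built-in fetch); the endpoint is the one from the question:

// Time a GET against the instance's /status endpoint to estimate network lag
// separately from query processing time.
const endpoint = "http://myneptune-poc.c0zm6uyrnnwp.us-east-1.neptune.amazonaws.com:8182";

async function timeStatus() {
  const start = Date.now();
  const res = await fetch(`${endpoint}/status`);
  await res.text(); // drain the body so the full round trip is measured
  console.log(`/status round trip: ${Date.now() - start} ms`);
}

timeStatus().catch(console.error);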

Response time different in Postman/JMeter and Web API

I have an MVC Web API and I'm having trouble comparing its response time. I added some code to calculate the response time:
In the AuthorizationFilterAttribute's OnAuthorization, I have the code below:
actionContext.Request.Headers.Add("RequestStartTime", DateTime.Now.ToString());
I have an ActionFilterAttribute, and an OnActionExecuted in which I have the below code:
string strRequestStartTime = actionExecutedContext.Request.Headers.GetValues("RequestStartTime").First();
DateTime dtstartTime = DateTime.Parse(strRequestStartTime);
TimeSpan tsTimeTaken = DateTime.Now.Subtract(dtstartTime);
actionExecutedContext.Response.Headers.Add("RequestProcessingTime", tsTimeTaken.TotalMilliseconds + "ms");
The response has the "RequestProcessingTime" header in milliseconds. The issue is that whenever I try the same request using Postman/JMeter, the response time they report is less than what I see in my response header. Why is this happening?
I think this is because the header does not account for the time the request takes to reach the server and the response to travel back; my expectation is that it shows only the time required to process the request on the server side. JMeter, on the other hand, reports the delta between the time the request was sent and the time the last byte was received, which is more accurate in terms of real user experience.
See the definitions of "Elapsed Time", "Connect Time" and "Latency" in the JMeter Glossary. You may also be interested in the How to Analyze the Results of a Load Test article, which demonstrates the impact of network capacity on overall performance.
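As a rough illustration of the two numbers being compared, the snippet below measures the full client-side round trip and reads the server-side RequestProcessingTime header described in the question; the URL is a placeholder, not the author's actual API:

// Compare client round-trip time with the server's own processing-time header.
(async () => {
  const start = Date.now();
  const res = await fetch("https://example.com/api/values"); // placeholder endpoint
  await res.text(); // make sure the whole response body has arrived
  const roundTrip = Date.now() - start;
  const serverTime = res.headers.get("RequestProcessingTime");
  console.log(`round trip: ${roundTrip} ms, server processing: ${serverTime}`);
})();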

es_rejected_execution_exception rejected execution

I'm getting the following error while indexing.
es_rejected_execution_exception rejected execution of org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1@16248886
on EsThreadPoolExecutor[bulk, queue capacity = 50,
org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@739e3764[Running,
pool size = 16, active threads = 16, queued tasks = 51, completed
tasks = 407667]
My current setup:
Two nodes. One is the master (data: true, master: true) while the other is data-only (data: true, master: false). They are both EC2 I2.4XL (16 cores, 122GB RAM, 320GB instance storage). 2 shards, 1 replica.
Those two nodes are fed by our aggregation server, which has 20 separate workers. Each worker makes bulk indexing requests to our ES cluster with 50 items to index. Each item is between 1000-4000 characters.
Current server setup: 4x client facing servers -> aggregation server -> ElasticSearch.
Now, the issue is that this error only started occurring when we introduced the second node. Before, when we had one machine, we got a consistent indexing throughput of 20k requests per second. Now, with two machines, once it hits the 10k mark (~20% CPU usage) we start getting some of the errors outlined above.
But here is the interesting thing I have noticed. We have a mock item generator that produces random documents to be indexed. These documents are generally the same size but have random parameters. We use it for stress testing and to check stability. The mock item generator sends requests to the aggregation server, which in turn passes them to Elasticsearch. The interesting thing is that we are able to index around 40-45k items per second (~80% CPU usage) without getting this error, so it's puzzling why we get it at all. Has anyone seen this error or know what could be causing it?
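For what it's worth, the rejection above means the bulk thread pool's queue (capacity 50 in the error, with 51 tasks queued) is full, so additional bulk requests are refused rather than buffered. A common client-side mitigation, sketched below, is to retry rejected bulks with exponential backoff; sendBulk(items) is a hypothetical wrapper around your ES client's bulk call that is assumed to throw on a rejected bulk:

// Retry rejected bulk requests with exponential backoff so the queue can drain.
async function indexWithBackoff(sendBulk, items, maxRetries = 5) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await sendBulk(items);
    } catch (err) {
      const rejected = /es_rejected_execution_exception/.test(String(err));
      if (!rejected || attempt === maxRetries) throw err;
      // Wait 100ms, 200ms, 400ms, ... before retrying.
      const delayMs = 100 * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}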

AJAX query weird delay between DNS lookup and initial connection on Chrome but not FF, what is it?

I have an AJAX query on my client that passes two parameters to a server:
var url = window.location.origin + "/instanceStats";
$.getJSON(url, { 'unit': unit, "stat": stat }, function (data) {
    instanceData[key] = data;
    var count = showInstanceStats(targetElement, unit, stat, limiter);
});
The server itself is a very simple Python Flask application. On that particular URL, it grabs the "unit" and "stat" parameters from the query to determine the name of a CSV file and line within that file, grabs the line, and sends the data back to the client formatted as JSON (roughly 1KB).
Here is the funny thing: When I measure the time it takes for the data to come back, I observe that some queries are fast (between 20 and 40 ms), and some queries are slow (between 320 and 350 ms). Varying the "stat" parameter (i.e. selecting a different line in the CSV) doesn't seem to have any impact. The fast and slow queries usually switch back and forth (i.e. all even queries are fast, all odd ones are slow). The Python server itself reports roughly the same time for each query.
AJAX itself doesn't seem to have any impact either, as I can take the URL constructed in the JS, paste it into the browser myself, and get the same behavior. Here are some measurements from two consecutive queries:
Fast: http://i.imgur.com/VQ7qopd.png
Slow: http://i.imgur.com/YuG0ROM.png
This seems to be Chrome-specific, as I've tried it on Firefox and the same experiment yields roughly the same query time every time (between 30 and 50 ms). This is unfortunate, as I want to deploy on both Chrome and Firefox.
What's causing this behavior, and how can I fix it?
I've run into this also. It only seems to happen when using localhost. If you use 127.0.0.1 (or even the computer name), it will not have the extra delay.
I'm having it too, and it's exactly the same: my Node.js application serves Ajax requests, and no matter which URL I request, it's either 30ms or 300ms, and it switches back and forth: odd requests are slow, even requests are fast.
The thing I see in Chrome Web Inspector (aka Chrome DevTools) is that there is a long gap between "DNS lookup" and "Initial Connection".
They say it's OCSP related here:
http://www.webpagetest.org/forums/showthread.php?tid=12357
OCSP is some kind of certificate validation protocol:
https://en.wikipedia.org/wiki/Online_Certificate_Status_Protocol
Moving from localhost to 127.0.0.1 seems to fix it: response times are 30ms now.
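For reference, a minimal way to apply that workaround to the question's own snippet is to swap "localhost" for "127.0.0.1" in the request URL; everything else stays the same:

// Workaround sketch: avoid the "localhost" hostname that triggers the delay in Chrome.
var url = window.location.origin.replace("//localhost", "//127.0.0.1") + "/instanceStats";
$.getJSON(url, { "unit": unit, "stat": stat }, function (data) {
    instanceData[key] = data;
    var count = showInstanceStats(targetElement, unit, stat, limiter);
});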
