AWS Neptune Performance

I'm working on transferring data from our existing RDF store to AWS Neptune, and I'm facing some performance issues.
I have a db.r4.large Neptune instance and an EC2 instance in the same VPC as Neptune.
Basically, I'm ingesting data into Neptune by POSTing to the following HTTP endpoint: <myinstance>:8182/sparql.
I send the HTTP requests from my EC2 instance, and Neptune's processing time seems slow. In addition, its processing does not appear to be parallel.
Below are my tests & results:
I sent the following request to Neptune:
time curl -X POST -d @/tmp/my_file_32m.txt http://myneptune-poc.c0zm6uyrnnwp.us-east-1.neptune.amazonaws.com:8182/sparql
/tmp/my_file_32m.txt contains SPARQL INSERT commands. The time for this request is 34.037s, while Neptune reports that it took 21.846s:
{
"type" : "Commit",
"totalElapsedMillis" : 21846
}
real 0m34.037s
user 0m0.044s
sys 0m0.062s
A tcpdump clearly shows that the response from Neptune arrived after a delay of 34 seconds.
When I sent 100 MB of data, it took more than 1 minute.
When I sent the same 32 MB file twice in parallel, the time roughly doubled:
time xargs -I % -P 8 curl -vX POST -d @/tmp/my_file_32m.txt "http://myneptune-poc.c0zm6uyrnnwp.us-east-1.neptune.amazonaws.com:8182/sparql" < <(printf '%s\n' {1..2})
{
"type" : "Commit",
"totalElapsedMillis" : 29797
}
{
"type" : "Commit",
"totalElapsedMillis" : 30362
}
real 0m57.752s
user 0m0.137s
sys 0m0.101s
I took a tcpdump and can clearly see in Wireshark that the requests were sent in parallel, but there is a delay of ~1 minute until Neptune returns 200 OK for both requests.
It seems that Neptune's processing is not concurrent.
The requests were sent at time 12 and the 200 OK for both requests arrived at time 69, which is exactly 57 seconds of delay.
I tried increasing my Neptune instance size to db.r4.xlarge and then to db.r4.2xlarge, but I got the same performance.
I tried sending the data gzip-compressed to improve the times, but it seems that Neptune doesn't support it (Wireshark shows the request itself was sent correctly).
I would like to hear your opinion about my tests and results:
Why is performance slow for a single HTTP request?
Why is Neptune's processing not parallel?

You are comparing the output of time (client-side round-trip time) with the server-reported totalElapsedMillis. The former includes your network transmission time, whereas the latter is just the time the DB took to compute the query from the moment it accepted the request. Do you have any metrics on the time it took to transmit your 100 MB file?
Neptune does process queries in parallel (in fact, the amount of parallelism scales with your instance type). If your queries are really small compared to the time they spend on the wire, it may appear as if the results completed one after the other. I would like to see more granular details of your experiments to check whether there is an issue with your setup.
For starters, what is the network lag between your client and the DB endpoint? (i.e. how long does it take to make a request to the /status API, for example?)
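For example, something along these lines (a rough sketch using curl's timing variables against your endpoint) would separate pure network latency and upload time from the server time:
# Round-trip latency to the lightweight /status API, a few times
for i in 1 2 3; do
  curl -s -o /dev/null -w "/status round trip: %{time_total}s\n" \
    http://myneptune-poc.c0zm6uyrnnwp.us-east-1.neptune.amazonaws.com:8182/status
done

# The same 32 MB POST, with curl's timing breakdown:
#   time_connect       - TCP connection established
#   time_pretransfer   - just before the request body starts uploading
#   time_starttransfer - first byte of the response received
# (starttransfer - pretransfer = body upload + server processing)
curl -s -o /dev/null -X POST -d @/tmp/my_file_32m.txt \
  -w "connect: %{time_connect}s  pretransfer: %{time_pretransfer}s  firstbyte: %{time_starttransfer}s  total: %{time_total}s\n" \
  http://myneptune-poc.c0zm6uyrnnwp.us-east-1.neptune.amazonaws.com:8182/sparql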

Related

Latency shown in JMeter is much lower when I use a faster internet connection to run my test. Is it due to the faster internet connection, or are there other factors too?

I created a test in JMeter:
Add > Sampler > HTTP Request = GET
Server Name = dainikbhaskar.com
No. of threads (users) = 1
Ramp-up period (seconds) = 1
Loop Count = 1
(My internet connection is broadband with a speed of 50 Mbps.)
I ran the test; it was successful, and the latency comes out at 127 ms, sometimes less than 100 ms in subsequent executions.
I switched off my Wi-Fi, connected my laptop to a mobile hotspot, and executed the same test.
Now the latency is 607, 932, 373, 542, 915 ms.
I believe this is happening due to the internet connection speed, as the rest of the inputs are the same.
Please confirm whether my perception is correct. :)
It is correct.
You can also get network latency from https://www.speedtest.net/ or https://fast.com/
Latency is the time from sending the request until the first byte of the response arrives, the so-called "time to first byte".
In JMeter's world:
Latency is Connect Time + Time to send the request + time to get the first byte of response
Elapsed time is Latency + time to get the last byte of the response.
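If you want to see the same split outside of JMeter, curl can print the individual timings (a quick illustration against the same site; the -w variables are curl built-ins):
curl -s -o /dev/null \
  -w "connect: %{time_connect}s  first byte: %{time_starttransfer}s  total: %{time_total}s\n" \
  http://dainikbhaskar.com/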
More information:
JMeter Glossary
Understanding Your Reports: Part 1 - What are KPIs?
If you get 5x higher latency on the other connection, it means that the majority of the time is spent on packets travelling back and forth. You can get a more precise picture by looking at the Network tab of your browser's developer tools, or by using a dedicated tool like Lighthouse.

HOWTO split response time into DNS name lookup, wait, and transfer time in JMeter

I would like to know if I can get a breakdown of response times in JMeter load tests. E.g., when I use curl I can get a breakdown of each response time by specifying a curl format file like so:
\n
time_namelookup: %{time_namelookup}\n
time_connect: %{time_connect}\n
time_appconnect: %{time_appconnect}\n
time_pretransfer: %{time_pretransfer}\n
time_redirect: %{time_redirect}\n
time_starttransfer: %{time_starttransfer}\n
----------\n
time_total: %{time_total}\n
\n
and then making the actual curl call like so:
curl -w "@curl-format.txt" "http://some.api/call"
As you can see, this gives me the breakdown in terms of time spent on DNS name resolution, connecting to the server, transferring the response from the server to the client, etc.
Is it possible to get something similar in JMeter?
So I have at least found a way to get part of what I want.
In JMeter I can collect the Connect Time, which combines the DNS lookup, handshake and connection.
If someone has a better answer, I would be happy to hear it.
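One note in case it is not already enabled in your JMeter version: the results file can be told to record Connect Time as its own column (a sketch, assuming the standard jmeter.save.saveservice.connect_time property and a JMETER_HOME variable pointing at your installation):
# Persist Connect Time as a separate column in the JTL/CSV results file
echo "jmeter.save.saveservice.connect_time=true" >> "$JMETER_HOME/bin/user.properties"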

AJAX query weird delay between DNS lookup and initial connection on Chrome but not FF, what is it?

I have an AJAX query on my client that passes two parameters to a server:
var url = window.location.origin + "/instanceStats"
$.getJSON(url, { 'unit' : unit, "stat" : stat }, function(data) {
instanceData[key] = data;
var count = showInstanceStats(targetElement, unit, stat, limiter);
});
The server itself is a very simple Python Flask application. On that particular URL, it grabs the "unit" and "stat" parameters from the query to determine the name of a CSV file and line within that file, grabs the line, and sends the data back to the client formatted as JSON (roughly 1KB).
Here is the funny thing: When I measure the time it takes for the data to come back, I observe that some queries are fast (between 20 and 40 ms), and some queries are slow (between 320 and 350 ms). Varying the "stat" parameter (i.e. selecting a different line in the CSV) doesn't seem to have any impact. The fast and slow queries usually switch back and forth (i.e. all even queries are fast, all odd ones are slow). The Python server itself reports roughly the same time for each query.
AJAX itself doesn't seem to have any impact either, as I can take the url that is constructed in the JS and paste it into the browser myself and get the same behavior. Here are some measurements from two subsequent queries:
Fast: http://i.imgur.com/VQ7qopd.png
Slow: http://i.imgur.com/YuG0ROM.png
This seems to be Chrome-specific, as I've tried it on Firefox and the same experiment yields roughly the same query time every time (between 30 and 50 ms). This is unfortunate, as I want to deploy on both Chrome and Firefox.
What's causing this behavior, and how can I fix it?
I've run into this also. It only seems to happen when using localhost. If you use 127.0.0.1 (or even the computer name), it will not have the extra delay.
I'm having it too, and it's exactly the same: my Node.js application serves Ajax requests, and no matter which URL I request it's either 30 ms or 300 ms, switching back and forth: odd requests are slow, even requests are fast.
The thing I see in Chrome Web Inspector (aka Chrome DevTools) is that there is a long gap between "DNS lookup" and "Initial Connection".
They say it's OCSP related here:
http://www.webpagetest.org/forums/showthread.php?tid=12357
OCSP is some kind of certificate validation protocol:
https://en.wikipedia.org/wiki/Online_Certificate_Status_Protocol
Moving from localhost to 127.0.0.1 seems to fix it: response times are 30ms now.
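You can reproduce the difference outside the browser as well (a sketch, assuming the Flask dev server on its default port 5000; the unit/stat values are made up):
# Compare total request time via localhost vs 127.0.0.1
for host in localhost 127.0.0.1; do
  curl -s -o /dev/null -w "$host: %{time_total}s\n" \
    "http://$host:5000/instanceStats?unit=cpu&stat=avg"
done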

Strategies in reducing network delay from 500 milliseconds to 60-100 milliseconds

I am building autocomplete functionality and realized that the time taken between the client and server is too high (in the range of 450-700 ms).
My first stop was to check whether this is the result of server delay.
But as you can see, these Nginx logs almost always show a request time of 0.001 seconds (request time is the last column). It's hardly a cause for concern.
So it became very evident that I am losing time between the server and the client. My benchmark is Google Instant's response time, which is often in the range of 30-40 milliseconds. Orders of magnitude lower.
Although it's easy to say that Google has massive infrastructure to deliver at this speed, I wanted to push myself to learn whether this is possible for someone who is not at that level. If not 60 milliseconds, I want to shave off 100-150 milliseconds.
Here are some of the strategies I’ve managed to learn.
Enable httpd slowstart and initcwnd
Ensure SPDY if you are on https
Ensure results are http compressed
Etc.
What are the other things I can do here?
E.g.:
Does having a persistent connection help?
Should I reduce the response size dramatically?
Edit:
Here are the ping and traceroute numbers. The site is served via Cloudflare from a Fremont Linode machine.
mymachine-Mac:c name$ ping site.com
PING site.com (160.158.244.92): 56 data bytes
64 bytes from 160.158.244.92: icmp_seq=0 ttl=58 time=95.557 ms
64 bytes from 160.158.244.92: icmp_seq=1 ttl=58 time=103.569 ms
64 bytes from 160.158.244.92: icmp_seq=2 ttl=58 time=95.679 ms
^C
--- site.com ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 95.557/98.268/103.569/3.748 ms
mymachine-Mac:c name$ traceroute site.com
traceroute: Warning: site.com has multiple addresses; using 160.158.244.92
traceroute to site.com (160.158.244.92), 64 hops max, 52 byte packets
1 192.168.1.1 (192.168.1.1) 2.393 ms 1.159 ms 1.042 ms
2 172.16.70.1 (172.16.70.1) 22.796 ms 64.531 ms 26.093 ms
3 abts-kk-static-ilp-241.11.181.122.airtel.in (122.181.11.241) 28.483 ms 21.450 ms 25.255 ms
4 aes-static-005.99.22.125.airtel.in (125.22.99.5) 30.558 ms 30.448 ms 40.344 ms
5 182.79.245.62 (182.79.245.62) 75.568 ms 101.446 ms 68.659 ms
6 13335.sgw.equinix.com (202.79.197.132) 84.201 ms 65.092 ms 56.111 ms
7 160.158.244.92 (160.158.244.92) 66.352 ms 69.912 ms 81.458 ms
I may well be wrong, but personally I smell a rat. Your times aren't justified by your setup; I believe that your requests ought to run much faster.
If at all possible, generate a short query using curl and intercept it with tcpdump on both the client and the server.
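Something along these lines (a rough sketch; the hostname, port and query path are placeholders) lets you compare the client-side and server-side view of the same request:
# On the server (and, mirror-image, on the client), capture the exchange
sudo tcpdump -i any -w /tmp/autocomplete.pcap port 443

# From the client, in another shell: one short request with curl's timing breakdown
curl -s -o /dev/null \
  -w "connect: %{time_connect}s  first byte: %{time_starttransfer}s  total: %{time_total}s\n" \
  "https://site.com/autocomplete?q=te"

# Then stop tcpdump and compare the two captures in Wireshark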
It could be a bandwidth/concurrency problem on the hosting. Check out its diagnostic panel, or try estimating the traffic.
You can try saving a query response into a static file and then requesting that file (taking care not to hit the local browser cache...), to see whether the problem might be in processing the data (either server- or client-side).
Does this slowness affect every request, or only the autocomplete ones? If the latter, and no matter what nginx says, it might be some inefficiency/delay in retrieving or formatting the autocompletion data for output.
Also, you can try serving a static response bypassing nginx altogether, in case this is an issue with nginx itself (and for that matter: have you checked nginx's error log?).
One approach I didn't see you mention is to use SSL sessions: you can add the following to your nginx conf to make sure that an SSL handshake (a very expensive process) does not happen with every connection request:
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
See "HTTPS server optimizations" here:
http://nginx.org/en/docs/http/configuring_https_servers.html
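Once that is enabled, you can check that sessions are actually being reused (a quick check; -reconnect makes s_client reconnect to the same server several times with the same session):
# The first handshake will be "New"; the reconnects should report "Reused"
openssl s_client -connect site.com:443 -reconnect < /dev/null 2>/dev/null | grep -E "New|Reuse"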
I would recommend using New Relic if you aren't already. It is possible that the server-side code you have could be the issue. If you think that might be the issue, there are quite a few free code profiling tools.
You may want to consider preloading the autocomplete options in the background while the page is rendered, and then saving a trie (or whatever structure you use) on the client in local storage. When the user starts typing in the autocomplete field, you would not need to send any requests to the server; instead, you would query local storage.
Web SQL Database and IndexedDB introduce databases to the clientside.
Instead of the common pattern of posting data to the server via
XMLHttpRequest or form submission, you can leverage these clientside
databases. Decreasing HTTP requests is a primary target of all
performance engineers, so using these as a datastore can save many
trips via XHR or form posts back to the server. localStorage and
sessionStorage could be used in some cases, like capturing form
submission progress, and have seen to be noticeably faster than the
client-side database APIs.
For example, if you have a data grid component or an inbox with
hundreds of messages, storing the data locally in a database will save
you HTTP roundtrips when the user wishes to search, filter, or sort. A
list of friends or a text input autocomplete could be filtered on each
keystroke, making for a much more responsive user experience.
http://www.html5rocks.com/en/tutorials/speed/quick/#toc-databases

Neo4j 2.0.0 - Poor performance for dev/test in a virtual machine

I have a Neo4j server running inside a virtual machine on Ubuntu 13.10, and I am accessing it via REST using Cypher queries. The virtual machine has 4 GB of memory allocated to it.
I've changed the open file count to 40000, set the initial JVM heap to 1 GB, and my neo4j.properties file is as follows:
neostore.nodestore.db.mapped_memory=250M
neostore.relationshipstore.db.mapped_memory=100M
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=100M
neostore.propertystore.db.arrays.mapped_memory=100M
keep_logical_logs=3 days
node_auto_indexing=true
node_keys_indexable=id
I've also updated sysctl based on the Neo4j Linux tuning guide:
vm.dirty_background_ratio = 50
vm.dirty_ratio = 80
Since I am testing queries, the basic routine is to run my suite of tests, delete all of the nodes, and run them all again. At the start of each test run, the database has 0 nodes in it. My suite of about 100 queries takes 22 seconds to run. Basic parameterized creates such as:
CREATE (x:user { email: {param0},
name: {param1},
displayname: {param2},
id: {param3},
href: {param4},
object: {param5} })
CREATE x-[:LOGIN]->(:login { password: {param6},
salt: {param7} } )
are currently taking over 170 ms to execute (and that's the average; the first query takes 700 ms). During a test run, the CPU in the VM never exceeds 50% and memory usage is steady at 1.4 GB.
Why would creating a single node in an empty database take 170ms? At this point unit testing is becoming almost impossible since it is so slow. This is my first time trying to tune Neo4j so I'm not really sure how to figure out where the problem is or what changes should be made.
Additional Details
I'm using Go 1.2 to make REST calls to the Cypher endpoint (http://localhost:7474/db/data/cypher) of a locally installed Neo4j instance. I'm setting the Content-Type request header to "application/json", Accept to "application/json", and X-Stream to true. I always return either an array of maps or nothing, depending on the query.
It seems like the creates are the problem and are taking forever. For example:
2014/01/15 11:35:51 NewUser took 123.314938ms
2014/01/15 11:35:51 NewUser took 156.101784ms
2014/01/15 11:35:52 NewUser took 167.439442ms
2014/01/15 11:35:52 ValidatePassword took 4.287416ms
NewUser creates two new nodes and one relationship and is taking 167ms, while ValidatePassword is a read-only operation and it completes in 4ms. Also note that the three calls to NewUser are identical parameterized queries. While the creates are the big problem, I'm also a little concerned that Neo4j is taking 4ms to just find a labeled node when there are only 100 nodes in the database.
I do not restart the server in between test runs or delete the database. I issue a single delete all nodes query MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r at the end of the test run. Running the same test suite multiple times back to back does not improve the query times.
Are your 100 queries all the same, just with different parameters, or are they actually 100 different queries?
What you see is actually setup work. The parser has to load its parsing rules initially, which takes a few ms. Also, new queries that have not been seen before are compiled, planned and put into the query cache.
So the first query always takes a bit longer. But as you parameterize, all subsequent ones should be fast.
Can you confirm that?
I think you are seeing the transactional overhead of flushing each transaction to disk.
Did you try batching more requests into one, i.e. with the transactional endpoint? Or with /db/data/batch (but I'd rather use the new tx endpoint /db/data/transaction)? A sketch follows below.
Did you create an index for your lookup property for your validate query?
Can you do me a favor and test your create query without a label? I found some perf issues when testing that myself earlier this week.
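For the batching idea, a minimal sketch against the 2.0 transactional endpoint (the parameter values are made up; POSTing to /commit runs and commits all statements in one round trip):
curl -s -H content-type:application/json -H accept:application/json -H X-Stream:true \
  -d '{"statements":[
        {"statement":"CREATE (x:user {email: {email}, id: {id}}) CREATE x-[:LOGIN]->(:login {password: {password}, salt: {salt}})",
         "parameters":{"email":"a@example.com","id":"1","password":"hash1","salt":"salt1"}},
        {"statement":"CREATE (x:user {email: {email}, id: {id}}) CREATE x-[:LOGIN]->(:login {password: {password}, salt: {salt}})",
         "parameters":{"email":"b@example.com","id":"2","password":"hash2","salt":"salt2"}}
      ]}' \
  http://localhost:7474/db/data/transaction/commit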
Just ran a test with curl
for i in `seq 1 10`; do time curl -i -H content-type:application/json -H accept:application/json -H X-Stream:true -d @perf_test.json http://localhost:7474/db/data/cypher; done
I'm getting between 16 and 30 ms per request externally, including starting curl:
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8; stream=true
Access-Control-Allow-Origin: *
Transfer-Encoding: chunked
Server: Jetty(9.0.5.v20130815)
{"columns":[],"data":[]}
real 0m0.016s
user 0m0.005s
sys 0m0.005s
Perhaps it is rather the VM (disk or network) or the cross-VM communication?
I did another test with ab and 1000 requests against both endpoints, and got a mean of about 5 ms both times.
https://gist.github.com/jexp/8452037
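The ab invocation for that kind of test would look roughly like this (a sketch, reusing the perf_test.json payload from above against the cypher endpoint):
# 1000 POSTs of the same payload against the cypher endpoint
ab -n 1000 -p perf_test.json -T application/json http://localhost:7474/db/data/cypher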
