WCF, ThreadPool.QueueUserWorkItem and Windows Azure - performance

I have a weird problem that I hope you can help me out with.
On our development server we are running Windows Server 2008 R2 with IIS 7.5 on a virtual x64 instance with 8 GB RAM.
Here I call a WCF method that uses ThreadPool.QueueUserWorkItem to process a large amount of hierarchical data. This works fine and is rather fast (a 270 MB XML file is read and processed, producing 190.035 records in 379 seconds). The client is done calling the method about 250 seconds in.
The same "workflow" on Windows Azure is a whole other story. Although the setup is similar (Large instances in a round-robin configuration), Windows Azure stops working within seconds of the client disconnecting. This means that only 160.055 records are written, and far more slowly: 917 seconds. The problem is that I am missing around 30.000 records, which should by then be queued on the two Azure instances, but the remaining work seems to be abandoned when the client disconnects.
The client uses HttpWebRequest for communication and both solutions run .NET 4.0.
What is it that I am missing here?
Thanks in advance for any help regarding this issue.

My bad, guys. I simply could not in my wildest imagination foresee that Windows Azure would be so slow, so by increasing the HttpWebRequest timeout from 2 minutes to 30 minutes I was able to achieve the same data volume as in our development environment.
So I will not delete the question, but let it stand as a reference for you soon-to-come Azure guys.
I am positive that Azure (and other cloud providers) is the future, but from Denmark to "North Europe" the latency is high, and SQL Azure has yet to prove it can perform when it comes to OLTP and normalized databases.
DEVELOPMENT (VIRTUAL ENVIRONMENT)
190.335 records from a 299 MB file took 379 seconds on a single instance
WINDOWS AZURE (NORTH EUROPE)
190.335 records from a 299 MB file took 1.400 seconds on two LARGE instances
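To put the two runs side by side, here is a small arithmetic sketch in Python. It assumes the dots in "190.335" and "1.400" are thousands separators (so 190335 records and 1400 seconds), and the variable names are mine, not part of the original setup.

```python
# Figures taken from the two result lines above; dots read as thousands separators.
RECORDS = 190335
runs = {"development": 379, "azure": 1400}  # elapsed seconds per environment

# Records processed per second in each environment.
throughput = {name: RECORDS / secs for name, secs in runs.items()}

# How many times longer the Azure run took than the development run.
slowdown = runs["azure"] / runs["development"]

for name, rate in throughput.items():
    print("%-12s %7.1f records/s" % (name, rate))
print("Azure took %.1fx as long" % slowdown)
```

On these numbers the Azure run took roughly 3.7 times as long, which is why a client timeout tuned for the development server is too short.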
The good news is that WCF and ThreadPool work flawlessly and no special considerations (except a high timeout) are necessary.
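The effect of a client timeout that is shorter than the server's processing time can be reproduced locally. The following is only a sketch in Python, not the original .NET client: the slow handler, the 0.5 second delay and the two timeout values are hypothetical stand-ins for the real import call and the 2 minute / 30 minute HttpWebRequest timeouts.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

SERVER_DELAY_S = 0.5  # hypothetical stand-in for a long-running import call


class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(SERVER_DELAY_S)  # simulate slow server-side work
        body = b"done"
        try:
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # the client already gave up and closed the socket

    def log_message(self, *args):
        pass  # keep the example quiet


# Ignore any proxy settings from the environment for this local test.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, timeout_s):
    """Return the response body, or None if the client gave up first."""
    try:
        with opener.open(url, timeout=timeout_s) as resp:
            return resp.read()
    except OSError:  # covers socket timeouts and urllib's URLError
        return None


server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

short_result = fetch(url, timeout_s=0.1)  # shorter than the server needs
long_result = fetch(url, timeout_s=5.0)   # comfortably longer

server.shutdown()
server.server_close()
print(short_result, long_result)
```

With the short timeout the client walks away before the server answers; with the longer one the same request completes. Nothing on the server side has to change.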
Just to clarify, the 299 MB file is split up into multiple REST calls to the server, in a format similar to this one:
<?xml version="1.0" encoding="UTF-8"?>
<HttpPost absolutePath="A/B/C/D/E/OO">
<Parameters xmlns="http://somenamespace">
<A>Package</A>
<B>100</B>
<C>Generic</C>
<D>ReceiverParty</D>
<E>
<F xmlns="http://somenamespace">
<G xmlns="http://somenamespace/Product">Long Text</G>
<H xmlns="http://somenamespace/Product">1</H>
<I xmlns="http://somenamespace/Product">PK</I>
<J xmlns="http://somenamespace/Product">5995</J>
<K xmlns="http://somenamespace/Product">
<L xmlns="http://somenamespace/P/Q">Discount</L>
<M xmlns="http://somenamespace/P/Q">1000</M>
<N xmlns="http://somenamespace/P/Q">6995</N>
</K>
</F>
</E>
<OO>
<O>
<A>Product</A>
<B>100</B>
<C>Generic</C>
<D>ReceiverParty</D>
<E>
<F xmlns="http://somenamespace">
<G xmlns="http://somenamespace/Product">Long Text</G>
<H xmlns="http://somenamespace/Product">1</H>
<I xmlns="http://somenamespace/Product">PK</I>
<J xmlns="http://somenamespace/Product">5995</J>
<K xmlns="http://somenamespace/Product">
<L xmlns="http://somenamespace/P/Q">Discount</L>
<M xmlns="http://somenamespace/P/Q">1000</M>
<N xmlns="http://somenamespace/P/Q">6995</N>
</K>
</F>
</E>
</O>
</OO>
</Parameters>
</HttpPost>


MongoDB-Java performance with rebuilt Sync driver vs Async

I have been testing MongoDB 2.6.7 for the last couple of months using YCSB 0.1.4. I have captured good data comparing SSD to HDD and am producing engineering reports.
After my testing was completed, I wanted to explore the allanbank async driver. When I got it up and running (I am not a developer, so it was a challenge for me), I first wanted to try the rebuilt sync driver. I found performance improvements of 30-100%, depending on the workload, and was very happy with it.
Next, I tried the async driver. I was not able to see much difference between it and my results with the native driver.
The command I'm running is:
./bin/ycsb run mongodb -s -P workloads/workloadb -p mongodb.url=mongodb://192.168.0.13:27017/ycsb -p mongodb.writeConcern=strict -threads 96
Over the course of my testing (mostly with the native driver), I have experimented with more and fewer threads than 96; turned on "noatime"; tried both xfs and ext4; disabled hyperthreading; disabled half my 12 cores; put the journal on a different drive; changed sync from 60 seconds to 1 second; and checked the network bandwidth between the client and server to ensure it's not oversubscribed (10GbE).
Any feedback or suggestions welcome.
The Async move exceeded my expectations. My experience is with the Python sync driver (pymongo) and async driver (motor), and the async driver achieved more than 10x the throughput. Further, motor still uses pymongo under the hood but adds the async ability. That could easily be the case with your allanbank driver.
Often the dramatic changes come from threading policies and OS configurations.
Async needn't and shouldn't use any more threads than there are cores on the VM or machine. For example, if your server code is spawning a new thread per incoming connection, then all bets are off. Start by looking at the way the driver is being utilized. A 4-core machine should use <= 4 incoming threads.
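As a rough illustration of why async code does not need a thread per connection, here is a minimal Python asyncio sketch. It is not the allanbank driver or motor; `fake_query` is a hypothetical stand-in that just sleeps to imitate waiting on the network.

```python
import asyncio
import threading


async def fake_query(i, delay_s=0.01):
    """Hypothetical stand-in for a non-blocking database call."""
    await asyncio.sleep(delay_s)  # yields to the event loop while "waiting on I/O"
    return i, threading.get_ident()


async def run_many(n):
    # Start n operations concurrently on the same event loop.
    return await asyncio.gather(*(fake_query(i) for i in range(n)))


results = asyncio.run(run_many(200))
thread_ids = {tid for _, tid in results}
print(len(results), "operations ran on", len(thread_ids), "thread(s)")
```

All 200 simulated operations overlap on a single thread, so adding more threads than cores mostly adds scheduling overhead rather than throughput.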
On the OS level, you may have to fine-tune parameters like net.core.somaxconn, net.core.netdev_max_backlog, fs.file-max and the nofile limit in /etc/security/limits.conf. The best place to start is looking at nginx-related performance guides including this one. nginx is the server that spearheaded this kind of tuning, or at least caught the attention of many Linux sysadmin enthusiasts. Contrary to popular lore, you should shorten your keepalive timeout rather than lengthen it. The default keep-alive timeout is some absurd number of seconds (4 hours). You might want to cut the cord after 1 minute. Basically, think of a short, sweet relationship with your clients' connections.
Bear in mind that Mongo is not async, so you can use a Mongo driver pool. Nevertheless, don't let the driver get stalled on slow queries. Cut them off after 5 to 10 seconds using the Java equivalents of the following settings. I'm just cutting and pasting here with no recommendations.
# Specifies a time limit for a query operation. If the specified time is exceeded, the operation will be aborted and ExecutionTimeout is raised. If max_time_ms is None no limit is applied.
# Raises TypeError if max_time_ms is not an integer or None. Raises InvalidOperation if this Cursor has already been used.
CONN_MAX_TIME_MS = None
# socketTimeoutMS: (integer) How long (in milliseconds) a send or receive on a socket can take before timing out. Defaults to None (no timeout).
CLIENT_SOCKET_TIMEOUT_MS=None
# connectTimeoutMS: (integer) How long (in milliseconds) a connection can take to be opened before timing out. Defaults to 20000.
CLIENT_CONNECT_TIMEOUT_MS=20000
# waitQueueTimeoutMS: (integer) How long (in milliseconds) a thread will wait for a socket from the pool if the pool has no free sockets. Defaults to None (no timeout).
CLIENT_WAIT_QUEUE_TIMEOUT_MS=None
# waitQueueMultiple: (integer) Multiplied by max_pool_size to give the number of threads allowed to wait for a socket at one time. Defaults to None (no waiters).
CLIENT_WAIT_QUEUE_MULTIPLY=None
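One driver-independent way to carry such settings is the MongoDB connection string, since socketTimeoutMS, connectTimeoutMS and waitQueueTimeoutMS are standard URI options. The sketch below only builds the string with the Python standard library; `build_mongo_uri` is a hypothetical helper, the host is the one from the question's ycsb command, and the 10 second and 5 second values are illustrative assumptions, not recommendations.

```python
from urllib.parse import urlencode


def build_mongo_uri(host, port, database, **options):
    """Hypothetical helper: assemble a mongodb:// URI, skipping unset (None) options."""
    query = urlencode({k: v for k, v in options.items() if v is not None})
    base = "mongodb://%s:%d/%s" % (host, port, database)
    return base + ("?" + query if query else "")


uri = build_mongo_uri(
    "192.168.0.13", 27017, "ycsb",
    socketTimeoutMS=10000,    # assumed: give up on a stalled send/receive after 10 s
    connectTimeoutMS=20000,   # the default quoted above
    waitQueueTimeoutMS=5000,  # assumed: do not wait forever for a pooled socket
    maxPoolSize=None,         # left unset, so it is omitted from the URI
)
print(uri)
```

The resulting string can be passed to whichever driver is in use, which keeps the timeout policy in one place.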
Hopefully you will have the same success. I was ready to bail on Python prior to async

Strategies in reducing network delay from 500 milliseconds to 60-100 milliseconds

I am building an autocomplete feature and realized that the round-trip time between the client and server is too high (in the range of 450-700 ms).
My first step was to check whether this is the result of server delay.
But as you can see, the request times in these Nginx logs are almost always 0.001 (request time is the last column; nginx reports $request_time in seconds, so that is about 1 millisecond). It’s hardly a cause for concern.
So it became evident that I am losing time between the server and the client. My benchmark is Google Instant's response times, which are almost always in the range of 30-40 milliseconds. That is an order of magnitude lower.
Although it’s easy to say that Google has massive infrastructure to deliver at this speed, I wanted to push myself to learn whether this is possible for someone who is not at that level. If not 60 milliseconds, I want to shave off at least 100-150 milliseconds.
Here are some of the strategies I’ve managed to learn.
Enable httpd slowstart and initcwnd
Ensure SPDY if you are on https
Ensure results are http compressed
Etc.
What are the other things I can do here?
e.g.
Does having a persistent connection help?
Should I reduce the response size dramatically?
Edit:
Here are the ping and traceroute numbers. The site is served via cloudflare from a Fremont Linode machine.
mymachine-Mac:c name$ ping site.com
PING site.com (160.158.244.92): 56 data bytes
64 bytes from 160.158.244.92: icmp_seq=0 ttl=58 time=95.557 ms
64 bytes from 160.158.244.92: icmp_seq=1 ttl=58 time=103.569 ms
64 bytes from 160.158.244.92: icmp_seq=2 ttl=58 time=95.679 ms
^C
--- site.com ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 95.557/98.268/103.569/3.748 ms
mymachine-Mac:c name$ traceroute site.com
traceroute: Warning: site.com has multiple addresses; using 160.158.244.92
traceroute to site.com (160.158.244.92), 64 hops max, 52 byte packets
1 192.168.1.1 (192.168.1.1) 2.393 ms 1.159 ms 1.042 ms
2 172.16.70.1 (172.16.70.1) 22.796 ms 64.531 ms 26.093 ms
3 abts-kk-static-ilp-241.11.181.122.airtel.in (122.181.11.241) 28.483 ms 21.450 ms 25.255 ms
4 aes-static-005.99.22.125.airtel.in (125.22.99.5) 30.558 ms 30.448 ms 40.344 ms
5 182.79.245.62 (182.79.245.62) 75.568 ms 101.446 ms 68.659 ms
6 13335.sgw.equinix.com (202.79.197.132) 84.201 ms 65.092 ms 56.111 ms
7 160.158.244.92 (160.158.244.92) 66.352 ms 69.912 ms 81.458 ms
I may well be wrong, but personally I smell a rat. Your times aren't justified by your setup; I believe that your requests ought to run much faster.
If at all possible, generate a short query using curl and intercept it with tcpdump on both the client and the server.
It could be a bandwidth/concurrency problem on the hosting. Check out its diagnostic panel, or try estimating the traffic.
You can try saving a query response into a static file, then requesting that file (taking care not to trigger the local browser cache...), to see whether the problem might be in processing the data (either server or client side).
Does this slowness affect every request, or only the autocomplete ones? If the latter, and no matter what nginx says, it might be some inefficiency/delay in retrieving or formatting the autocompletion data for output.
Also, you can try serving a static response bypassing nginx altogether, in case this is an issue with nginx (and for that matter: have you checked nginx's error log?).
One approach I didn't see you mention is to use SSL sessions: you can add the following into your nginx conf to make sure that an SSL handshake (very expensive process) does not happen with every connection request:
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
See "HTTPS server optimizations" here:
http://nginx.org/en/docs/http/configuring_https_servers.html
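To get a feel for how much handshakes can cost at the round-trip time measured in the question, here is a back-of-the-envelope sketch in Python. The round-trip counts are assumptions: one round trip for the TCP handshake, two for a full TLS 1.2-style handshake, one for a resumed session, and one for the HTTP request itself. DNS lookups, server time and transfer time are ignored.

```python
RTT_MS = 98.268  # average round-trip time from the ping output in the question

# Assumed number of round trips before the response arrives in each scenario.
SCENARIOS = {
    "new connection, full TLS handshake": 1 + 2 + 1,  # TCP + TLS + HTTP
    "new connection, resumed TLS session": 1 + 1 + 1,
    "persistent (keep-alive) connection": 1,          # HTTP request only
}


def estimate_ms(round_trips, rtt_ms=RTT_MS):
    """Lower bound on latency when every round trip costs one RTT."""
    return round_trips * rtt_ms


estimates = {name: estimate_ms(n) for name, n in SCENARIOS.items()}
for name, ms in estimates.items():
    print("%-38s ~%4.0f ms" % (name, ms))
```

Under these assumptions a fresh HTTPS connection per keystroke would already cost close to 400 ms, while a reused connection would be near a single round trip, which suggests checking whether connections and SSL sessions are actually being reused.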
I would recommend using New Relic if you aren't already. It is possible that the server-side code you have could be the issue. If you think that might be the issue, there are quite a few free code profiling tools.
You may want to consider an option to preload autocomplete options in the background while the page is rendered and then save a trie or whatever structure you use on the client in the local storage. When the user starts typing in the autocomplete field you would not need to send any requests to the server but instead query local storage.
Web SQL Database and IndexedDB introduce databases to the clientside.
Instead of the common pattern of posting data to the server via
XMLHttpRequest or form submission, you can leverage these clientside
databases. Decreasing HTTP requests is a primary target of all
performance engineers, so using these as a datastore can save many
trips via XHR or form posts back to the server. localStorage and
sessionStorage could be used in some cases, like capturing form
submission progress, and have seen to be noticeably faster than the
client-side database APIs.
For example, if you have a data grid component or an inbox with
hundreds of messages, storing the data locally in a database will save
you HTTP roundtrips when the user wishes to search, filter, or sort. A
list of friends or a text input autocomplete could be filtered on each
keystroke, making for a much more responsive user experience.
http://www.html5rocks.com/en/tutorials/speed/quick/#toc-databases
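The trie idea mentioned above can be sketched as follows. This is Python for illustration only (in the browser the same structure would be written in JavaScript and kept in local storage), and the class and method names are my own.

```python
class Trie:
    """Minimal prefix tree for autocomplete lookups."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def complete(self, prefix, limit=10):
        """Return up to `limit` stored words starting with `prefix`, in alphabetical order."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored word has this prefix
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            # Push children in reverse so they are popped in alphabetical order.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


trie = Trie()
for word in ["car", "card", "care", "cat", "dog"]:
    trie.insert(word)

print(trie.complete("ca"))
```

Once the word list has been preloaded, each keystroke is answered from memory and no request has to cross the network at all.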

FastRWeb performance on Ubuntu with built-in web server

I have installed FastRWeb 1.1-0 on an installation of R 2.15.2 (Trick or Treat) running on an Ubuntu 10.04 box. I hope to use the resulting system to run a web service.
I've configured the system by setting http.port to 8181 in rserve.conf and unsetting the socket destination. I've assigned .http.request to FastRWeb::.http.request. I exchange JSON blobs between the client and the server using HTTP POST (the second blob can exceed 150KB in size, and will not fit in an HTTP GET query string.)
Everything works end to end -- I have a little client-side R script which generates JSON RPC calls across the channel. I see the run function invoked, and see it returned.
I've run into a significant performance problem, however: the return path takes in excess of 12 seconds between the time run() returns (including the call to done()) and the time that the R client gets the return value. RCurl doesn't seem to be the culprit; it appears that something is taking twelve seconds to do a return.
Does anybody have any suggestions of where to look? I can easily shift over to using Apache 2.0 and CGI, but, honestly, I'd rather keep everything R centric.
Answering my own question.
I wrapped .http.request with an Rprof()/Rprof(NULL) pair and looked at the time spent in each routine. It turns out that the system spends ~11 seconds inside URLDecode in the standard implementation of .run. This looks like a scaling problem in URLDecode in the core.

Error "Connection reset" in JMeter (SOAP XML web-service)

I have the following test plan in JMeter:
The screenshot shows the settings for the 1st Thread Group, which has 50% of the total number of requests in the test plan (each Thread Group contains 10 different subrequests).
So, on average, +1 request per second is added with these settings.
Then I ran this test and saw this picture (Error % column):
I saved the errors to a file, and all of them have the same text:
<sample t="30129" lt="0" ts="1356710138314" s="false" lb="WebService(SOAP) Request 1" rc="000" rm="Connection reset" tn="jp#gc - Stepping Thread Group1 3-247" dt="text" by="0"/>
Server's cpu screenshot:
and for database:
After the errors appeared, my computer started working more and more slowly (although no further errors appeared)...
At the same time, the server's CPU usage progressively dropped to 0.
Could you tell me, please:
What is the reason for this error?
Have I reached the server timeout? (Because Max is more than 30s in the table.)
UPD. I have rerun the test with the following settings: 1000 users per 02:46:40 (+1 Thread Group per 10 seconds and 10 requests inside each new Thread in the Loop).
I.e. I have halved the test duration and the total number of Thread Groups, but kept the rate at which threads are added.
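As a quick sanity check on the ramp-up figures, the arithmetic can be written out in Python. The helper name is mine; the inputs are the 02:46:40 duration and the 1000 users quoted above.

```python
def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


ramp_seconds = hms_to_seconds("02:46:40")
users = 1000
seconds_per_thread = ramp_seconds / users

print(ramp_seconds, "s ramp-up,", seconds_per_thread, "s between new threads")
```

02:46:40 is exactly 10000 seconds, so 1000 users does work out to one new thread every 10 seconds, matching the rate stated in the update.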
The results are the same (including cpu usage on the server).
I received the «Connection reset» error after 990 threads had started. Here are the screenshots:
Any idea?
First, WebService(SOAP) Request is not the best way to test web services in JMeter; it will be deprecated in the upcoming 2.9 version.
HTTP Sampler is the one to choose, as it performs much better.
Second, Connection Reset means your server has cut the connection. It could be caused by the CPU load, which seems high, but that is not certain.
If what you call "my comp" is the computer hosting JMeter and it started working slowly, then your JMeter instance is overwhelmed by the number of threads (2003 or more?) you've configured. This can come from a lot of factors; read this:
http://www.dzone.com/links/see_how_to_make_jmeter_run_thousands_of_threads_w.html

Webserver Location - How important is it for SEO?

I am based in the UK and have two webservers, one German based (1&1) and the other is UK based (Easyspace).
I recently signed up to the UK easyspace server because it was about the same price I paid for my 1&1 server but also I wanted to see if my sites hosted on a UK server gave better results in terms of UK based traffic.
It seems my traffic is roughly the same for both servers. However, 1&1's server performance and customer service are much better than Easyspace's, so I was thinking about cancelling the Easyspace server and getting another 1&1 server.
I understand that there are latency issues, where servers in the USA/Asia would be much slower for UK traffic, but I am wondering what your thoughts are on traffic, SEO etc., and whether you think I should stick with a UK server or whether it doesn't matter.
Looking forward to your replies.
I have never heard of common search engines ranking sites by their response time as it is highly variable due to the nature of the internet.
If a search engine would penalize you for the subnet you are on then you likely have bigger problems.
I get better results on google.com.au for my sites than on other flavours of Google, even though the sites are not hosted in Australia. So I would suggest that the actual physical location of the servers won't matter so much, and if you want to rank higher on google.co.uk you might want a co.uk domain.
Google associates a region with your site mostly through its suffix (TLD/SLD, eg. .co.uk), but if you create a Google Webmaster Tools account you can tell it otherwise in the odd case it makes a mistake.
As far as traffic is concerned, the site will load fast for UK visitors. I suggest using this server if most of your visitors are from the UK. Server location has nothing to do with SEO.
Stick with your UK server if you think it's better.
My main concern is losing UK based customers if the server is located outside the UK but it appears from the comments that this is probably not the case.
However, my UK server is based in Scotland, my other server is based in Germany and is actually closer to London than Scotland?
Just to compare the speed between Scotland server and Germany server:
=== Germany Based ===
Pinging firststopdigital.com [87.106.101.189]:
Ping #1: Got reply from 87.106.101.189 in 126ms [TTL=46]
Ping #2: Got reply from 87.106.101.189 in 126ms [TTL=46]
Ping #3: Got reply from 87.106.101.189 in 126ms [TTL=46]
Ping #4: Got reply from 87.106.101.189 in 126ms [TTL=46]
Variation: 0.4ms (+/- 0%)
Shortest Time: 126ms
Average: 126ms
Longest Time: 126ms
=== UK Based ===
Pinging pb-net.co.uk [62.233.81.163]:
Ping #1: Got reply from 62.233.81.163 in 120ms [TTL=55]
Ping #2: Got reply from 62.233.81.163 in 119ms [TTL=55]
Ping #3: Got reply from 62.233.81.163 in 119ms [TTL=55]
Ping #4: Got reply from 62.233.81.163 in 119ms [TTL=55]
Variation: 0.3ms (+/- 0%)
Shortest Time: 119ms
Average: 119ms
Longest Time: 120ms
The difference is around 6ms which is not much at all.
Incidentally I just performed a ping to a USA based domain I own:
Pinging pbnetltd.com [74.86.61.36]:
Ping #1: Got reply from 74.86.61.36 in 6.4ms [TTL=121]
Ping #2: Got reply from 74.86.61.36 in 6.3ms [TTL=121]
Ping #3: Got reply from 74.86.61.36 in 6.2ms [TTL=121]
Ping #4: Got reply from 74.86.61.36 in 6.3ms [TTL=121]
Variation: 0.2ms (+/- 3%)
Shortest Time: 6.2ms
Average: 6.3ms
Longest Time: 6.4ms
The USA timings are much quicker considering the extra distance across the Atlantic to NY and back (9am UK time so USA are asleep - will try again tonight).
