OpenFire, HTTP-BIND and performance - performance

I'm looking into getting an openfire server started and setting up a strophe.js client to connect to it. My concern is that using http-bind might be costly in terms of performance versus making a straight on XMPP connection.
Can anyone tell me whether my concern is relevant or not? And if so, to what extend?
The alternative would be to use a flash proxy for all communication with OpenFire.
Thank you

BOSH is more verbose than normal XMPP, especially when idle. An idle BOSH connection might be about 2 HTTP requests per minute, while a normal connection can sit idle for hours or even days without sending a single packet (in theory, in practice you'll have pings and keepalives to combat NATs and broken firewalls).
But, the only real way to know is to benchmark. Depending on your use case, and what your clients are (will be) doing, the difference might be negligible, or not.

Basics:
Socket - zero overhead.
HTTP - requests even on IDLE session.
I doubt that you will have 1M users at once, but if you are aiming for it, then conection-less protocol like http will be much better, as I'm not sure that any OS can support that kind of connected socket volume.
Also, you can tie your OpenFires together, form a farm, and you'll have nice scalability there.

we used Openfire and BOSH with about 400 concurrent users in the same MUC Channel.
What we noticed is that Openfire leaks memory. We had about 1.5-2 GB of memory used and got constant out of memory exceptions.
Also the BOSH Implementation of Openfire is pretty bad. We switched then to punjab which was better but couldn't solve the openfire issue.
We're now using ejabberd with their built-in http-bind implementation and it scales pretty well. Load on the server having the ejabberd running is nearly 0.
At the moment we face the problem that our 5 webservers which we use to handle the chat load are sometimes overloaded at about 200 connected Users.
I'm trying to use websockets now but it seems that it doesn't work yet.
Maybe redirecting the http-bind not via Apache rewrite rule but directly on a loadbalancer/proxy would solve the issue but I couldn't find a way on how to do this atm.
Hope this helps.

I ended up using node.js and http://code.google.com/p/node-xmpp-bosh as I faced some difficulties to connect directly to Openfire via BOSH.
I have a production site running with node.js configured to proxy all BOSH requests and it works like a charm (around 50 concurrent users). The only downside so far: in the Openfire admin console you will not see the actual IP address of the connected clients, only the local server address will show up as Openfire get's the connection from the node.js server.

Related

expensive aws load balancer, perhaps wrong setup

Some time ago, I needed HTTPS support for my express webserver. I found a tutorial that teached me a cool trick to achieve this. They basically explained me that an AWS load balancer can redirect HTTPS to HTTP.
So, I first created a load balancer.
And then redirected HTTPS to HTTP. The traditional HTTP, I just redirected 80 to 80. And I have a websocket (socket io) thing going on port 1337 (which I plan to change to port 1338 in the near future).
Just for clarity. I didn't really need a load balancer, since I actually only have 1 AWS instance. But using this setup, I did not have to go through the trouble of messing around with HTTPS certificate files, neither did I have to upgrade my webserver. It saved me a lot of trouble at first.
Then this morning, I received the bill, and discovered that this load balancing trick has a price tag of roughly 22usd/mo. (an expensive port forwarding trick)
I probably have to get rid of this load balancer. But I am wondering, perhaps I did something wrong in the configuration.
It's strange that charges are so high for a web app that is still in development. So, I am wondering if perhaps there is something wrong with my setup. And that leads me to the following question.
I noticed that I am actually using an old ELB setup: "Classic load balancer". And it actually states that this setup does not support websockets, which is a bit strange.
My web app hosts some static webpages (angular), but once it is downloaded, all traffic uses socket.io websockets. Even though the AWS documentation says that websockets are not supported, it seems to work fine. Unless ...
Now, socket io is a pretty smart thing. When it can't use modern websockets (e.g. because the webbrowser does not support it), it falls back to a kind of HTTP polling. I guess that means that from a load-balancer point of view, it creates 100s of visits per minute. And right now, I am wondering if that has an influence on the charges.
My really long question comes down to a simple one. Do you think upgrading my load balancer would decrease the number of counted "loadbalancer hours" ?
EDIT
Here are some ELB metrics. They are too complicated for me to draw conclusions. But perhaps some of you experts can. :)

how to terminate inactive websocket connection in passenger

Past few days we are struggling with inactive websocket connections. The problem may lay on network level. I would like to ask if there is any switch/configuration option to set timeout for websocket connection for Pushion Passenger in standalone mode.
You should probably solve this at the application level, because solving it in other layers will be more ugly (less knowledge about websocket).
With Passenger Standalone you could try to set max_requests. This should cause application processes to be restarted semi-regularly, and when shutting down a process Passenger should normally abort websocket connections.
If you want more control over the restart period you could also use for example a cron job that executes rolling restarts every so often, which shouldn't be noticeable to users either.
Websockets in Ruby and Passenger (and maybe node.js as well) aren't "native" to the server. Instead, your application "hijacks" the socket and controls all the details (timeouts, parsing etc').
This means that a solution must be implemented in the application layer (or whatever framework you're using), since Passenger doesn't keep any information about the socket any more.
I know this isn't the answer you wanted, but it's the fact of the matter.
Some approaches use native websockets where the server controls websocket connections (timeouts, parsing etc', i.e. the Ruby MRI iodine server), but mostly websockets are "hijacked" from the server and the application takes full control and ownership of the connections.

High Performance Options for Remote services access

I have a service, foo, running on machine A. I need to access that service from machine B. One way is to launch a web server on A and do it via HTTP; code running under web server on A accesses foo and returns the results. Another is to write socket server on A; socket server access service foo and returns the result.
HTTP connection initiation and handshake is expensive; sockets can be written, but I want to avoid that. What other options are available for high performance remote calls?
HTTP is just the protocol over the socket. If you are using TCP/IP networks, you are going to be using a socket. The HTTP connection initiation and handshake are not the expensive bits, it's TCP initiation that's really the expensive bit.
If you use HTTP 1.1, you can use persistent connections (Keep-Alive), which drastically reduces this cost, closer to that of keeping a persistent socket open.
It all depends on whether you want/need the higher-level protocol. Using HTTP means you will be able to do things like consume this service from a lot more clients, while writing much less documentation (if you write your own protocol, you will have to document it). HTTP servers also supports things like authentication, cookies, logging, out of the box. If you don't need these sorts of capabilities, then HTTP might be a waste. But I've seen few projects that don't need at least some of these things.
Adding to the answer of #Rob, as the question is not precisely pointing to an application or performance boundaries, it would be good to look into the options available in a broader context, which is Inter process communication.
The wikipedia page cleanly lists down the options available and would be a good place to start with.
What technology are you going to use? Let me answer for Java world.
If your request rate is below 100/sec, you should not care about optimizations and use most versatile solution - HTTP.
Well-written asynchronous server like Netty-HTTP can easily handle 1000 requests per second on medium-level hardware.
If you need more or have constrained resources, you can go to binary format. Most popular one out there is Google Protobuf(multilanguage) + Netty (Java).
What you should know about HTTP performance:
Http can use Keep-Alive which removes reconnection cost for every request
Http adds traffic overhead for every request and response - around 50-100 bytes.
Http client and server consumes additional CPU for parsing HTTP headers - that is noticeable after abovementioned 100 req/sec
Be careful when selecting technology. Even in 21 century it is hard to find well-written HTTP server and client.

Simulating latency when developing on a local webserver

The Performance Golden Rule from Yahoo's performance best practices is:
80-90% of the end-user response time
is spent downloading all the
components in the page: images,
stylesheets, scripts, Flash, etc.
This means that when I'm developing on my local webserver it's hard to get an accurate idea of what the end user will experience.
How can I simulate latency so that I can understand how my application will perform when I've deployed it on the web?
I develop primarily on Windows, but I would be interested in solutions for other platforms as well.
A laser modem pointed at the mirrors on the moon should give latency that's out of this world.
Fiddler2 can do this very easily. Plus, it does so much more that is useful when doing development.
YSlow might help you out. YSlow analyzes web pages based on Yahoo!'s rules.
Firefox Throttle. This can throttle speed (Windows only).
These are plugins for Firefox.
You can just set up a proxy outside that will tunnel traffic from your web server to it and then back to local browser. It would be quite realistic (of course it depends where you put the proxy).
Otherwise you can find many ways to implement it in software..
Run the web server on a nearby Linux box and configure NetEm to add latency to packets leaving the appropriate interface.
If your web server cannot run under Linux, configure the Linux box as a router between your test client machine and your web server, then use NetEm anyway
While there are many ways to simulate latency, including some very good hardware solutions, one of the easiest for me is to run a TCP proxy in a remote location. The proxy listens and then directs the traffic back to my final destination. On a remote server, I run a unix program called balance. I then point this back to my local server.
If you need to simulate for a just a single server request, a simple way is to simply make the server sleep() for a second before returning.

FTP over Satellite/High Latency connections

I use FTP on a daily basis to work on multiple websites, but when I try to work from home, my darned satellite internet has a latency of about 1000ms. (Its craptastic service, I know, but there are no alternatives where I live.) Thus, I was wondering if there is a way that I can connect to my web server and transfer files that can accomodate this latency.
FTP "works", but it communicates very very slowly, and its a nightmare with multiple files. It takes the connection about 10-15 seconds to start the transfer, and another 5 seconds after the transfer is done. The transfer itself goes very fast as expected, but the handshake process does not, as the server/client seem to need to do a lot of communication to negotiate the transfer. Worse, it seems to need to do this handshake thing for every individual file, which certainly doesn't help.
Is there any way I can modify my FTP to make it work better over a high latency connection? If not, are there any other protocols or transfer services I might be able to use that could handle such an issue? Its the main fault I find with my ISP, and there's not a lot I've been able to find that I can do about it...
Thanks
Sounds like a good case for using UDP rather than TCP-based protocols - e.g. uftp
A quote from the linked site: "especially useful for data distribution over a satellite link (with two way communication), where the inherent delay makes any TCP based communication terribly inefficient".
A few options:
Sneaker-net. Use a USB key.
SCP. I'm almost positive it'll only authenticate/handshake once.
Tunnelling over SSH. The poor man's VPN. You'll be able to tunnel FTP or anything you like over the SSH connection. It'll be as fast as you're going to get and is very secure to boot.

Resources