This question has been asked previously but not recently and not with a clear answer.
Using Socket.io, is there a maximum number of concurrent connections that one can maintain before you need to add another server?
Does anyone know of any active production environments that are using websockets (particularly socket.io) on a massive scale? I'd really like to know what sort of setup is best for maximizing connections.
Because WebSockets are built on top of TCP, my understanding is that unless ports are shared between connections, you are going to be bound by the 64K port limit. But I've also seen reports of 512K connections using Gretty. So I don't know.
This article may help you along the way: http://drewww.github.io/socket.io-benchmarking/
I wondered about the same question, so I ended up writing a small test (using XHR-polling) to see when the connections started to fail (or fall behind). I found (in my case) that the sockets started acting up at around 1400-1800 concurrent connections.
This is a short gist I made, similar to the test I used: https://gist.github.com/jmyrland/5535279
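For a rough idea of what such a test can look like, here is a minimal connection-flood sketch (not the gist itself) using socket.io-client; the server URL and client count are placeholders:

const io = require('socket.io-client');

const TARGET = 'http://localhost:8080'; // hypothetical server under test
const MAX_CLIENTS = 2000;
let connected = 0;

for (let i = 0; i < MAX_CLIENTS; i++) {
  // forceNew gives each client its own connection instead of reusing one manager
  const socket = io(TARGET, { forceNew: true });
  socket.on('connect', () => {
    connected += 1;
    if (connected % 100 === 0) console.log(connected + ' clients connected');
  });
  socket.on('connect_error', (err) => {
    console.log('client ' + i + ' failed (currently ' + connected + ' connected): ' + err.message);
  });
}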
I tried to use socket.io on AWS; I could keep at most around 600 connections stable.
And I found out it was because socket.io used long polling first and upgraded to websocket later.
After I set the config to use websocket only, I could keep around 9,000 connections.
Set this config on the client side:
const socket = require('socket.io-client')
// disable the polling-to-websocket upgrade and use the websocket transport directly
const conn = socket(host, { upgrade: false, transports: ['websocket'] })
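If you also control the server, the same restriction can be applied there. A minimal sketch, assuming a Socket.IO v2-style server and a hypothetical port 3000:

const io = require('socket.io')(3000, {
  // refuse long polling entirely; clients must speak WebSocket
  transports: ['websocket']
});

io.on('connection', (socket) => {
  console.log('client connected over websocket:', socket.id);
});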
GO THROUGH THE COMMENTS ON THIS ANSWER BEFORE PROCEEDING FURTHER
The question asks about socket.io sockets; this answer is about native sockets. These changes are dangerous as they apply to everything on the system, not just socket.io sockets. Besides, today the network is almost never the bottleneck for socket.io. Do not make these changes to your system without understanding the implications first.
For 300k+ concurrent connections:
Set these variables in /etc/sysctl.conf:
fs.file-max = 10000000
fs.nr_open = 10000000
Also, change these variables in /etc/security/limits.conf:
* soft nofile 10000000
* hard nofile 10000000
root soft nofile 10000000
root hard nofile 10000000
And finally, increase TCP buffers in /etc/sysctl.conf, too:
net.ipv4.tcp_mem = 786432 1697152 1945728
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216
For more information, please refer to this.
This guy appears to have succeeded in having over 1 million concurrent connections on a single Node.js server.
http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/
It's not clear to me exactly how many ports he was using though.
After making these configurations, you can verify them by running this command in a terminal:
sysctl -a | grep file
I would like to provide yet another answer, in 2023.
We only use the websocket transport in socket.io-client. We have done two types of performance tests:
My test team uses JMeter to test up to 5,000 concurrent connections. Due to the nature of our product, 5,000 connections is enough for us, so we didn't go higher.
I use https://a.testable.io/ to do another performance test. The reason I use Testable (this is NOT a sales pitch for them, lol) is that I can choose ws clients from different locations; e.g., I chose three different locations in NA and one location in Asia. I believe this is closer to a real-life scenario than just running a test script from my local machine (which I do have too). This kind of test costs money; to quote their technical support after I ran my test: "I see you ran a 20,000 user test successfully today too, great! Less than $20 for a test of this size is by far the best pricing out there :)."
BTW, you can also refer to https://ably.com/topic/scaling-socketio, which is the latest published article about socket.io performance that I can find.
So in summary, I would argue that if you only use the websocket transport, 5,000 to 10,000 concurrent connections should not be too hard to achieve.
Currently I have a REP/REQ model up and running in my code.
However, I do not need either side to send replies, so replies are just wasting time. I don't know if that matters in the real world or not.
Basically it looks like this.
Client PCs - Connect - REQ
These guys all connect to the Server and update it with info they have on a regular basis. They don't care if the Server didn't receive a particular message, nor do they need any info back from the Server.
There are many of these guys, but not an excessive number; let's say between 10 and 100, all hitting the same server. Well, probably not: they will probably be in groups, with one group hitting one server and another group another. Clients would send messages several times a second, but not much more than several. I have not really done any timing, and I don't know how to time things on my computer at less than 1-2 ms resolution, so I really don't know what to expect, what is feasible in terms of performance, or how many REQ clients can be served by one REP server.
Server PC - Bind - REP
This guy sits there running in a loop on his own separate thread, waiting for REQs to come in. He sends replies to the REQs because he has to, not because he really wants to or needs to.
Alternate Models
From some googling, it seems that PUSH/PULL is recommended if you just want to send messages and don't care about replies.
However, I couldn't figure out how to fit that into my architecture, because the binds and connects seem to be reversed from what I need. I would like my Bind to be on the Server, because the Client "Connect" guys are not always available to be reached.
Solutions
1) Good alternate model
A good alternate model that works and is relatively simple would be great. I'm not sure there really is one, but apart from REP/REQ and PUB/SUB I don't really know much about other models.
2) I'm worrying about nothing?
If message replies to REQ by REP are always going to be really fast, and the reception of those replies by REQ from REP is also really fast, then I guess I'm worrying about nothing. That would be good to know, so feel free to let me know if this is the case.
The Connection question
I don't really understand what connecting sockets does.
On a client REQ, should I connect at the start of each loop, before sending that one single message? Or should I connect before the loop, to the socket that I also created before the loop?
I also don't understand what this means in terms of reliability, or whether I have to make special checks about connection status and reconnect, or if that is done automatically.
To sum up
I have a "global" context.. created at the start, disposed of at the end
This daddy context has 1 or 2 sockets (connected to the same address, including port) - I'm still debugging this dual socket on the same address thing so I'm not sure if that is ok or it just doesn't work that way - clarification would be nice
These context(s) are lazy initialized and outside the loop scope, so we are not recreating sockets on a regular basis
connect calls for the sockets occur currently outside of the loop scope, but I'm not sure if it is not better to have them inside the loop scope.
I think I'm getting mixed up here.. I think the dual sockets are on my PUB/SUB model .. 1 PUB with 2 SUB sockets on each client, but anyhow please let me know if that would be a problem as well.
If you do not need Request-Reply, do not use it.
Request-Reply is generally slow because you need a round trip to the server for every message. This means you pay twice the network latency, which is the time a network packet needs to travel over the network. That does not matter if network traffic is low, but it will become a bottleneck when the traffic is high, for example at multiple messages per second.
As you already mentioned, Push-Pull is a valid alternative for one-way traffic. With Push-Pull, you create a Pull socket on the server and bind it to an endpoint (this is similar to the Reply socket). You create a Push socket on the clients and connect it to the server endpoint (this is similar to the Request socket).
If you send multiple messages from the client to the same server, you should connect only once. Setting up a network connection is a costly operation because it requires multiple network round trips, at least for TCP.
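For illustration, here is a minimal Push-Pull sketch using the Node.js zeromq bindings (v5-style API); the question doesn't name a language, so the endpoint, port, and message contents are placeholders, and the same pattern applies in any ZeroMQ binding:

// server: Pull socket, bound once at startup (takes the place of the Reply socket)
const zmq = require('zeromq');
const pull = zmq.socket('pull');
pull.bindSync('tcp://*:5555');
pull.on('message', (msg) => {
  console.log('received:', msg.toString()); // no reply is ever sent
});

// client: Push socket, connected once and reused for every message
const push = zmq.socket('push');
push.connect('tcp://127.0.0.1:5555');
setInterval(() => push.send('status update'), 200); // several messages per second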
The Mongo C Sharp Driver (at least the 1.9.2 version) has a setting for MaxConnectionLifeTime. From looking at the code, it looks like connections are removed from the pool when their age exceeds that lifetime. The default is set to 30 minutes.
Why?
Do connections somehow degrade in performance the more times they are used?
We have received anecdotal reports that in some scenarios connections die after a certain amount of time. This is presumably because some firewall/router along the way is periodically dropping connections that have reached a certain age.
By having the driver periodically close connections and open new ones we can avoid being affected by this.
Most users are not affected by this and could use any value they want for this setting.
I have a Nagios configuration which is performing a number of tests on a few hundred nodes; one of these is a variant of check_http. It's not configured to --enable-embedded-perl (ePN), but we'll be changing that soon. Even with ePN enabled, I'm concerned about the model where each execution of this Perl HTTP+SSL check handles only a single target.
I'd like to write a simple select() (or poll() / epoll()) driven daemon which creates connections to multiple targets concurrently, reads the results and spits out results in a form that's useable to Nagios as if it were results from a passive check.
Is there a guide to how one could accomplish this? What's the interface or API for providing batched check updates to Nagios?
One hack I'm considering would be to have my daemon update a Redis store (with a key for each target and a short expiration time) and replace check_http with a very small, lightweight GET of the local Redis instance on the key (the GET would either get the actual results for Nagios, or a "(nil)" response, which would be treated as if the HTTP connection had timed out).
However, I'm also a bit skeptical of my idea, since I'd think someone would have already built something like this by now.
(BTW: I'm ready to be convinced to switch to something like Icinga or Zabbix or Zenoss or OpenNMS ... pretty much anything that will scale better).
As to whether or not to let Nagios handle the scheduling and checks, I'll leave that to you, as it varies depending on your version of Nagios (newer versions can run these checks concurrently) and on why you want a separate daemon for it. Regarding Nagios versions: version 3, IIRC, runs checks concurrently and thus scales to larger node counts than you report.
However, I can answer the Redis route concept as I've done it with Postfix queue stats and TTFB tracking for web sites.
Setting up the check using Python with the curl and multiprocessing modules is fairly straightforward, as is dumping the results into Redis. I'd recommend the expiration be no more than (or possibly just less than) the check interval, both to keep the DB from growing and to avoid grabbing stale check results: if the currently running check hasn't completed and the Redis-to-Nagios check runs, pulling in the previous check's result, you can miss failed checks.
For the Redis-to-Nagios check, a simple redis-cli+bash script or a Python check that pulls the data for a given host and returns OK or otherwise, depending on your data, is fairly simple and would run quickly enough; see the sketch below.
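As an illustration of that replacement check, here is a sketch in Node.js rather than the redis-cli/bash or Python suggested above; the key scheme http:<hostname> and the stored value format are assumptions:

// check_http_redis.js <hostname>
// Nagios plugin convention: exit 0 = OK, 2 = CRITICAL.
const redis = require('redis');
const client = redis.createClient(); // local Redis, as recommended below

const host = process.argv[2];
client.get('http:' + host, (err, value) => {
  if (err || value === null) {
    // key missing (expired) or Redis error: treat like an HTTP timeout
    console.log('CRITICAL - no recent check result for ' + host);
    process.exit(2);
  }
  console.log('OK - ' + value);
  process.exit(0);
});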
I'd recommend running the Redis instance on the Nagios check server to ensure minimum latency and to avoid a network issue causing false alerts on your checks. I would also recommend Nagios checks on your Redis instance and on the checking daemon. Make the check_http replacement check dependent on the Redis and http_check daemons running. Thus you have a dependency chain as follows:
Redis -> http_checkd -> http_check_replacement
This will prevent false alerts on http_check_replacement by identifying the actual problem. For example, if your redis_checkd dies, you get alerted to that, not to 200+ "failed http_check_replacement" alerts.
Also, since your data in Redis is by definition transient, I would disable disk persistence. There is no need to write to disk when the data is constantly rotating.
On a side note, if you're using libcurl, I would recommend pulling statistics from libcurl about how long it takes to open the connection and how long the server takes to respond (Time To First Byte, TTFB), and taking advantage of Nagios's ability to store check statistics. You may well reach a time when having that data is really handy for troubleshooting and performance analysis.
I have a CLI tool I've written in C which does this and uploads the results into a local Redis instance. It is fast: barely more than the time it takes to get the URL. I'm expecting it to be open sourced this week; I can add Nagios-style output to it fairly easily. In fact, I think I'll do that in the next week or two.
What's the best way to ping a list of 20 websites every 5 minutes (for example) in order to know if the site responds with HTTP 200 or not?
The no-brainer idea is to save the 20 URLs in a database, then run through the database and ping each one. However, what happens when one doesn't answer? What happens to the ones after that?
Also, is there a better but still no-brainer solution for this? I'm afraid the list can grow to 20,000 websites, and then there's not enough time to ping them all in the 5 minutes I need to be pinging.
Basically, I'm describing how PingDom, UptimeRobot, and the like work.
I'm building this system using node.js and Ruby on Rails.
I'm also inclined to use MongoDB to save the history of all the pings and monitoring results.
Suggestions?
Thanks a bunch!
GitHub
I really like node.js, and I would like to tackle this problem and hopefully soon share some code on GitHub to achieve this. Keep in mind that I only have a very basic setup right now, hosted at https://github.com/alfredwesterveld/freakinping
What's the best way to ping a list of 20 websites every 5 minutes (for example) in order to know if the site responds with HTTP 200 or not?
PING (ICMP)
First I would like to know whether you really want to do a ping (ICMP), or whether you just want to know if the website returns code 200 (OK) and to measure the time that takes. I believe from the context that you don't really want a ping, but just an HTTP request and a time measurement. I ask this because (I believe) pinging from node.js/ruby/python can't be done as a normal user: you need raw sockets (root user) to do ICMP pings from a programming language. I, for example, found this ping script in Python (I also believe I saw a simple Ruby script somewhere, although I am not a really big Ruby programmer), but it requires root access. I don't believe there is a ping module out there for node.js yet.
Message Queue
Also, is there a better but still no-brainer solution for this? I'm afraid the list can grow to 20,000 websites, and then there's not enough time to ping them all in the 5 minutes I need to be pinging.
Basically, I'm describing how PingDom, UptimeRobot, and the like work.
What you need to achieve this kind of scale is a message queue, for example redis, beanstalkd, or gearmand. At the scale of PingDom, one worker process is not going to cut it, but in your case (I assume) one worker will do. I think (assume) redis will be the fastest message queue because of its C extension for node.js, but then again I should benchmark it against beanstalkd, which is another popular message queue (but does not yet have a C extension).
I'm afraid the list can grow to 20,000 websites
If you get to that scale, you might have to host multiple boxes (a lot of worker threads/processes) to handle the load, but you aren't at that scale yet, and node.js is insanely fast. It might even be able to handle that load with a single box, although I don't know for sure (you would need to run some benchmarks).
Datastore/Redis
I think this could be achieved pretty easily in node.js (I really like node.js). The way I would do this is to use redis as my datastore, because it is INSANELY FAST!
PING: 20000 ops 46189.38 ops/sec 1/4/1.082
SET: 20000 ops 41237.11 ops/sec 0/6/1.210
GET: 20000 ops 39682.54 ops/sec 1/7/1.257
INCR: 20000 ops 40080.16 ops/sec 0/8/1.242
LPUSH: 20000 ops 41152.26 ops/sec 0/3/1.212
LRANGE (10 elements): 20000 ops 36563.07 ops/sec 1/8/1.363
LRANGE (100 elements): 20000 ops 21834.06 ops/sec 0/9/2.287
using node_redis (with the hiredis C library). I would add the URLs to Redis using SADD.
Run tasks every 5 minutes
This could be achieved with barely any effort. I would use setInterval(callback, delay, [arg], [...]) to repeatedly test the response time of the servers. On each callback, get all URLs from Redis using SMEMBERS and put all the URLs (messages) on the message queue using RPUSH; see the sketch below.
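A minimal sketch of that scheduler, assuming node_redis's callback API and hypothetical key names urls (the set) and urls:queue (the work queue):

const redis = require('redis');
const client = redis.createClient();

// the set is seeded elsewhere, e.g.: client.sadd('urls', 'http://example.com');
setInterval(() => {
  client.smembers('urls', (err, urls) => {
    if (err) return console.error(err);
    // enqueue every URL for the worker(s) to consume
    urls.forEach((url) => client.rpush('urls:queue', url));
  });
}, 5 * 60 * 1000); // every 5 minutes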
Checking Response (Time)
However, what happens when one doesn't answer? What happens to the ones after that?
I might not completely understand this sentence, but here it goes: if one fails, it just fails. You could try to check the response (time) again in 5 seconds or so to see if it is back online; a precise algorithm for this should be devised. The ones after it should not have anything to do with previous URLs, unless they are to the same server. That is also something to think about clearly, I guess, because you should not hit all the URLs on the same server at the same time, but rather queue them up or something.
Processing URL
From the worker process (for now just one should suffice), fetch a message (URL) from Redis using the BRPOP command, check the response time for that URL (message), and then fetch the next URL (message) from the list. I would probably do a couple of requests simultaneously to speed up the process; a single-request sketch follows.
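A single-request version of that worker might look like the following sketch (same hypothetical queue name; a plain HTTP GET stands in for the response-time check, and only http:// URLs are handled):

const http = require('http');
const redis = require('redis');
const client = redis.createClient();

function work() {
  // BRPOP blocks until a URL (message) is available; the reply is [list, value]
  client.brpop('urls:queue', 0, (err, reply) => {
    if (err) return console.error(err);
    const url = reply[1];
    const start = Date.now();
    http.get(url, (res) => {
      res.resume(); // drain the body so the socket is released
      console.log(url, res.statusCode, (Date.now() - start) + 'ms');
      work(); // fetch the next message
    }).on('error', (e) => {
      console.log(url, 'failed:', e.message);
      work();
    });
  });
}

work();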
There is no "basic way", since you must handle a lot of use cases:
http redirects,
https pages,
request timeouts,
the cpu load of the server you use for pinging,
the type of report you need (availability? uptime? responsiveness? downtime?),
how to aggregate QoS measurements by time,
lifetime of the data you collect (pinging dozens of targets every five minutes quickly produces a lot of data)
realtime alerts
etc.
Pingdom and the like are not "basic" tools, and if you want something similar you may want to pay for it or rely on an existing open-source alternative. I know it for sure because I built a remote monitoring application myself. It's called Uptime, it's written in Node.js and MongoDB, and it's hosted on GitHub (https://github.com/fzaninotto/uptime). It took several weeks of hard work to develop it, so believe me: it is NOT a no-brainer.
Use monitoring tools like Zabbix, Nagios, etc., which can measure various parameters of your servers in large numbers.
If you would like to implement it in JS, you can make HTTP requests on a timed interval, then determine the HTTP return status code, and use XPath or a regex to validate that certain elements are correct.
For Ruby: a daemon process with a thread pool (multithreading) that uses URI.open to fetch the HTTP code and the content, and XPath to validate whether the content behaves correctly.
If you're curious, I've created an app called Pinger that does this. It's built on Ruby on Rails and Resque:
https://github.com/austinthecoder/pinger
There are some free, good-quality services that provide very stable website uptime checks and notifications. You can check this guide and review: http://fastjoomlahost.com/how-to-monitor-website-up-time
You can also do this in Node.js using the node-ping-monitor package.
I'm re-building an IM gateway and hope to take advantage of the new performance features of the asynchronous sockets in .NET 3.5.
My existing implementation simply creates packets and forwards IM requests from users to the various IM networks as required, handling request/response streams for each connected user's session (socket).
I presently have to cope with IAsyncResult, and as you know, it's not very pretty or scalable.
My confusion is this basically:
1) In using the new Begin/End and SocketAsyncEventArgs in 3.5, do we still need to create one SocketAsyncEventArgs per socket?
2) Do we gain anything by pre-initializing, say, 20,000 client connections, since we know the expected max_connections per server is 20,000?
3) Do we still need to use an LOH (large object heap) allocated byte[] to handle receive data, as shown in the SocketServers example on MSDN? We are not building a server per se, but we are still handling a lot of independent receives for each connected socket.
4) Maybe there is a better pattern altogether for what I'm trying to achieve?
Thanks in advance.
Charles.
1) IAsyncResult/Begin/End is a completely different system from the "xAsync" methods that use SocketAsyncEventArgs. You're better off using SocketAsyncEventArgs and dropping Begin/End entirely.
2) Not really. Initialize a smaller number (50? 100?) and use an intermediate class (i.e., a "resource pool") to manage them. As more requests come in, grow the pool by another 50 or 100, for example. The tough part is efficiently "scaling down" the number of pooled items as resource requirements drop. A large number of sockets/buffers/etc. will consume a large amount of memory, so it's better to allocate only in batches as the server requires it.
3) You don't need to use it, but it's still a good idea. The buffer will still be "pinned" during each call.