I am currently looking into Akka for a project of mine. I want to build a web server that mainly communicates over websockets (it's for collecting multiple (say, thousands of) streams of data of about 1-5 GB an hour each).
This is going to be a project that needs horizontal scaling, and what I want to do is basically accept an HTTP request but start a remote actor (probably on a different machine) to upgrade the connection to a websocket connection. Is this possible with Akka(-http)? My current understanding is that it shouldn't be possible, but I am not certain.
Also, when I write my actor system in Scala, can I start actors written with Akka in C#/.NET, and the other way around?
I am a backend developer and I would like to know what are the common technologies for building real-time servers. I know I could use a service like Firebase, but I really want to create it. I have some experience using Websockets on Java, but I would like to know more ways to achieve a real-time server. When I say real-time, I mean something like Facebook. I also would like to know how to scale real-time servers.
Thank you all!
I've asked the same in multiple forums. The common answer to this question is, strangely enough, still:
WebSocket
Socket.io
Server-Sent Events (SSE)
But those are mainly ways of transporting or streaming events to the clients. Something needs to be built on top of it. And there are multiple other things to consider, such as:
Considerations for real-time APIs
What events to send to the client
How to send each client only the events they need
How to handle authorization for events
Where to keep state on the event subscriptions (for stateless services)
How to recover from missed events due to lost connections and service crashes
Producing events for search or pagination queries
How to scale
Publish/Subscribe solutions
There are multiple pub/sub solutions out there, such as:
Pusher
PubNub
SocketCluster
etc.
But because of the limitations of a topic-based pub/sub architecture, some of the above questions are still left unanswered and have to be dealt with by yourself. One example is lost connections: Pusher has no fallback, neither does SocketCluster, and PubNub has a limited queue.
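To make the "limited queue" idea concrete, here is a minimal sketch of a bounded replay buffer with sequence numbers, so that a reconnecting client can ask for everything it missed. This is only an illustration under my own assumptions, not how Pusher, PubNub, or SocketCluster work internally; the `ReplayBuffer` class and its method names are hypothetical.

```python
from collections import deque


class ReplayBuffer:
    """Keeps only the most recent events so a reconnecting client can catch up.

    Hypothetical helper for illustration; not part of any library named above.
    """

    def __init__(self, capacity):
        # A bounded deque: once full, the oldest event is dropped on append.
        self._events = deque(maxlen=capacity)
        self._next_seq = 1

    def publish(self, payload):
        """Store an event and return the sequence number assigned to it."""
        seq = self._next_seq
        self._next_seq += 1
        self._events.append((seq, payload))
        return seq

    def events_since(self, last_seen_seq):
        """Return events newer than last_seen_seq, or None if some were dropped."""
        oldest = self._events[0][0] if self._events else self._next_seq
        if last_seen_seq + 1 < oldest:
            # Gap: the buffer no longer holds everything the client missed,
            # so the client has to refetch the full state instead.
            return None
        return [(seq, payload) for seq, payload in self._events if seq > last_seen_seq]
```

A client that reconnects sends the last sequence number it saw. If `events_since` returns `None`, the queue was too short for the outage, which is exactly the case a topic-only pub/sub leaves for you to handle.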
Resgate - Realtime API Gateway
An alternative to the traditional topic based pub/sub pattern is using a resource-aware realtime API Gateway, such as Resgate.
Instead of the client subscribing to topics, the gateway keeps track of which resources (objects or arrays) the client has fetched, keeping the client data up to date until it unsubscribes.
As a developer of Resgate, I can really recommend checking it out, as it addresses all of the above questions, is language agnostic, simple and lightweight, and blazingly fast.
Read more at NATS blog.
Scaling
Let's say you want to scale both in the number of concurrent clients and the number of events that are produced. You will eventually need to ensure each client only gets the data they are interested in, through either traditional topic-based publish/subscribe or resource subscriptions. All of the above solutions handle that.
I also assume all of the above-mentioned solutions scale concurrent clients by allowing you to add more nodes/servers that handle the persistent WebSocket connections.
With Resgate, the first level of scaling is done by simply running multiple instances (it is a simple executable) and adding a load balancer that distributes the connections evenly between them.
Handling 100M concurrent clients
Let's say a single Resgate instance handles 10000 persistent WebSocket connections, and you can add 10000 Resgates (distributed to multiple data centers) to a single NATS Server. This would allow a total of 100M connections. Of course, depending on your data, you might have other scaling issues as well, such as network traffic ;) .
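The arithmetic above, plus a rough feel for the network traffic caveat, can be sketched like this. The message rate and message size in the usage note are made-up illustrative numbers, not measurements, and both function names are hypothetical.

```python
def total_connections(connections_per_instance, instances):
    """Total persistent connections across all gateway instances."""
    return connections_per_instance * instances


def aggregate_bytes_per_second(connections, messages_per_minute, bytes_per_message):
    """Rough outbound traffic if every connection receives the same message rate."""
    return connections * messages_per_minute * bytes_per_message / 60


total = total_connections(10_000, 10_000)
print(f"{total:,} connections")
```

With the figures from the text this gives 100,000,000 connections. If, purely as an assumption, each client received 6 messages per minute of 100 bytes each, `aggregate_bytes_per_second(total, 6, 100)` comes to about 1 GB/s of outbound traffic in total, which is why the traffic side needs its own planning.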
A second layer of scaling (and adding redundancy) would be to replicate the whole setup to different data centers, and have the services synchronize their data between the data centers using other tools like Kafka, CockroachDB, etc.
Scaling data retrieval
With the traditional publish/subscribe solution that only deals with events, you will also have to handle scaling for the HTTP (REST) requests.
With Resgate, this is not required, as resource data is also fetched over the WebSocket connection. This allows Resgate not only to ensure that resource data and events are synchronized (another issue with separate pub/sub solutions), but also to cache the data. If multiple clients request the same data, Resgate only needs to fetch it from the service once, effectively improving scalability.
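The fetch-once idea can be sketched in a few lines. To be clear, this is not Resgate's actual implementation or API, just a minimal in-memory illustration of a gateway that tracks which clients hold which resources and only asks the service for data it does not already have. `ResourceCache` and the `fetch` callable are hypothetical names.

```python
class ResourceCache:
    """Tracks resource subscriptions and caches data while anyone is subscribed.

    Illustrative sketch only; names and behavior are assumptions.
    """

    def __init__(self, fetch):
        self._fetch = fetch        # callable: resource id -> data from the service
        self._data = {}            # resource id -> cached data
        self._subscribers = {}     # resource id -> set of client ids

    def subscribe(self, client_id, resource_id):
        """Return the resource data, fetching from the service only on a cache miss."""
        if resource_id not in self._data:
            self._data[resource_id] = self._fetch(resource_id)
        self._subscribers.setdefault(resource_id, set()).add(client_id)
        return self._data[resource_id]

    def unsubscribe(self, client_id, resource_id):
        """Drop the client; evict the cached data once nobody is subscribed."""
        subscribers = self._subscribers.get(resource_id, set())
        subscribers.discard(client_id)
        if not subscribers:
            self._subscribers.pop(resource_id, None)
            self._data.pop(resource_id, None)
```

If a thousand clients call `subscribe` for the same resource id, `fetch` is called once. A real gateway would also have to apply incoming events to the cached copy so it stays current, which this sketch leaves out.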
Butterfly Server .NET is a real-time server written in C# allowing you to create real-time apps. You can see the source at https://github.com/firesharkstudios/butterfly-server-dotnet.
In the web2py examples there is a websocket example which uses tornado here:
gluon/contrib/websocket_messaging.py, and this requires another server to be started, namely tornado. My question is, do I need another server? Should I only have one server to handle both the websocket stuff and the normal http requests?
Also, it seems tornado is the server of choice for the 2nd server, could that be something different?
I'm a bit of a newbie to websockets (and webapp development) so any comments/links that would help me better understand this would be appreciated.
Python WSGI based frameworks such as web2py are typically served via threaded web servers. A typical HTTP request occupies one of the server threads only very briefly in order to receive the incoming request and deliver the response, then freeing the thread to serve another incoming request.
Websockets (and long polling), on the other hand, require a long-lived connection between the client (i.e., browser) and the web server. A websocket connection will therefore occupy a thread indefinitely, so you can only have as many connections as you have threads, thus limiting the application to a relatively small number of concurrent users.
In order to enable many simultaneous websocket connections, it is therefore best to serve websockets via a server that features non-blocking network I/O, such as Tornado. For more details, see http://www.tornadoweb.org/en/stable/guide/async.html.
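To illustrate what non-blocking network I/O buys you, here is a minimal sketch using Python's standard `asyncio` module rather than Tornado, and plain TCP lines rather than the websocket protocol, so it stays self-contained. The function names are my own. Every connection is a coroutine on a single thread, so an idle connection costs a little memory but no thread.

```python
import asyncio


async def handle_client(reader, writer):
    """Echo each line back; one coroutine per connection, no thread per client."""
    try:
        # Awaiting here hands control back to the event loop, so thousands of
        # idle connections can wait concurrently on one thread.
        while line := await reader.readline():
            writer.write(line)
            await writer.drain()
    finally:
        writer.close()


async def start_echo_server(host="127.0.0.1", port=0):
    """Start listening; port=0 lets the OS pick any free port."""
    return await asyncio.start_server(handle_client, host, port)
```

To run it standalone you would do something like `server = await start_echo_server(port=8888)` followed by `await server.serve_forever()` inside `asyncio.run(...)`. Tornado's websocket handlers follow the same single-threaded event loop model, with the websocket handshake and framing handled for you.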
Another option is to use Gevent with monkey patching, which can be used in the context of a WSGI application as described here. Keep in mind, though, that any libraries you use that involve network I/O (such as database drivers) must be compatible with this approach (either via monkey patching or code explicitly designed for coroutines).
If realtime/server-push functionality is a major aspect of your application, and especially if you are new to web development, you might instead consider a framework built for this specific use case, such as Meteor.
I'm evaluating replacing some HTTP polling features of my production application with the new WebSocket support in JEE7. I'm planning to use WildFly 8 as my next production server environment, and I've migrated some of my WebSocket-compatible modules with good results during development. However, I have doubts about how it will work in production and how the WebSocket implementation will perform in a high-load environment.
I've been searching the documentation of the most widely used JEE servers, but the main vendors don't yet have a production JEE7 environment, and when they do have a JEE7 version, they don't have enough documentation about how the implementation works or any figures for maximum concurrent users. In addition, some unofficial comments say that WebSocket connections are associated "with a server socket", which does not seem very efficient.
We can assume the WebSocket is used only to receive data, from the client's point of view, and that each user will receive, for example, an average of 10 messages per minute, each a small JSON-serialized object (a typical model data class). My requirement is more like a stock market application than a chat application, for example.
What real-world performance can I expect from WebSockets in a production environment on WildFly 8, for example? I'm also interested in comparisons with other JEE7 implementations you are familiar with.
Web sockets are TCP/IP based, no matter the implementation they will always use a socket (and hence an open file).
Performance wise it really depends on what you are doing, basically how many clients, how many requests/sec per client, and obviously on how big your hardware is. Undertow is based on non-blocking IO and is generally pretty fast, so it should be enough for what you need.
If you are testing lots of clients just be aware that you will hit OS based limits (definitely open files, and maybe available ports depending on your test).
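As a quick way to see the open-file limit mentioned above, this sketch reads it with Python's standard `resource` module (Unix only). The `fits_in_fd_limit` helper and its `reserved` headroom value are my own assumptions for illustration.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_in_fd_limit(n_connections, reserved=64):
    """Check whether n_connections sockets fit under the soft limit.

    `reserved` is an assumed headroom for stdio, log files, and so on.
    """
    soft, _hard = open_file_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    return n_connections + reserved <= soft


soft, hard = open_file_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
```

Each open socket counts against this limit, on the server and on the load-generating client alike, so a test with tens of thousands of connections usually needs the limit raised (for example with `ulimit -n`) on both sides.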
I'm looking to prototype a web app that will use sockets to push a gentle stream of messages to mobile web app clients. I want to pick an architecture that will work for a large number of clients if/when it moves to production (so I don't have to change later).
I'd like to start with Rails because it's familiar and has a strong structure from the get-go, which makes it easier to prototype. I think Faye will provide what I need in terms of a pub-sub layer, but am I going to create a bottleneck by using Ruby with a high number of socket connections, or will Faye isolate/protect the Ruby server from that load, if you follow?
At the outset the load will not be significant, so it won't matter. I just don't want to be hobbled later on when there are a lot of socket connections and I wish I had used node.js! Server-side JS would be fairly new to me, but I guess there are benefits in that the JS app can include the client side as well.
Advice appreciated.
You can take a look at https://github.com/faye/faye-redis-node.
This plugin provides a Redis-based backend for the Faye messaging server. It allows a single Faye service to be distributed across many front-end web servers by storing state and routing messages through a Redis database server.
I recently created a turn-based game server that can accept 10s of thousands of simultaneous client connections (long story short - epoll on Linux). Communication is based on a simple, custom, line-based protocol. This server allows clients to connect, seek for other players in game matches, play said games (send moves, chat messages, etc.), and be notified when the game has ended.
What I'm looking to do now is test the server by simulating client connections. I'm hoping to support 10s of thousands of simultaneous connections, so this testing is very important to me. What do you guys use for your own testing?
Some things I'm researching now are: pexpect (python expect lib for the functional testing) and tsung for load testing.
I'd like to be able to just test from my laptop since I do not have a cluster of client machines to connect from. Perhaps I'd need to use ip aliasing or some-such in order to generate 100s of thousands of outbound sockets (limit is 65K per interface AFAIK).
Anyway, it seems to me like I need something fairly custom but I thought I'd ask before I went down that path.
Thanks!
I've used JMeter with custom sampler and assertion components before to do automated regression/load testing for a banking application with a custom protocol (Java RMI based API).
It's not exactly lightweight though, and you'll end up doing a lot of extra coding in the JMeter components to support your custom protocol. I'm guessing you'd have to code your own Java socket based client in this case.
But it gives you a lot of flexibility in defining the logic for testing the components, so you can do whatever you want inside there. It scales nicely as well, and allows you to throw a lot of concurrent connections at the system under test.
I decided it was best to "roll my own" to start with.
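For anyone going the same route, a roll-your-own load generator for a line-based protocol might start out like the sketch below. It is not the code I actually wrote, and it assumes the server sends exactly one reply line per request line, which may not match your protocol. `simulate_client`, `run_load`, and `max_in_flight` are hypothetical names.

```python
import asyncio


async def simulate_client(host, port, client_id, messages):
    """Open one connection, send each line, and collect one reply line per message."""
    reader, writer = await asyncio.open_connection(host, port)
    replies = []
    try:
        for msg in messages:
            writer.write(f"{msg}\n".encode())
            await writer.drain()
            # Assumption: the server answers every line with exactly one line.
            replies.append((await reader.readline()).decode().rstrip("\n"))
    finally:
        writer.close()
    return client_id, replies


async def run_load(host, port, n_clients, messages, max_in_flight=1000):
    """Run n_clients simulated clients, at most max_in_flight connected at once."""
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(i):
        async with gate:
            return await simulate_client(host, port, i, messages)

    # gather returns results in the same order as the client ids.
    return await asyncio.gather(*(guarded(i) for i in range(n_clients)))
```

Usage would be along the lines of `asyncio.run(run_load("127.0.0.1", 4000, 10000, ["seek", "move e4"]))`, where the host, port, and commands are placeholders. The `max_in_flight` cap exists so a single laptop stays under its open-file limit; set it to the number of clients to hold them all open together.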
We are using HP LoadRunner, which is a state-of-the-art load testing product (but also an expensive one). It can simulate thousands of requests to the server and provides metrics on response time, etc.