How to prevent automated requests to the server? - performance

How to prevent automated requests? This is not a duplicate question; the existing answers do not address this scenario.
What if an attacker uses a program that makes a million requests per minute? The program may also route through various proxies or VPNs and send millions of requests to the server. The server will slow down under the heavy load. How can this be prevented?
Can iptables handle millions of requests per second?

You cannot prevent someone from attacking you. You simply have to deal with it.
You either spin up enough server capacity to handle the brute-force traffic, or, if you don't want your servers to carry that load, you deal with the attack closer to the edge of the network instead.
Have a look at the following:
https://www.cloudflare.com/learning/ddos/ddos-mitigation/
https://www.cloudflare.com/learning/ddos/glossary/ddos-blackhole-routing/
https://en.wikipedia.org/wiki/DDoS_mitigation
https://en.wikipedia.org/wiki/Denial-of-service_attack
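Even when the real mitigation happens at the edge, as the links above describe, a cheap per-IP throttle in the application can shed naive single-source floods before they reach your heavier code paths. Below is a minimal token-bucket sketch; the rate, the burst size and the HTTP wiring shown in the comment are illustrative assumptions rather than part of the original answer, and it will not stop a distributed attack on its own.

```typescript
// Minimal per-IP token-bucket throttle (illustrative values, not a recommendation).
type Bucket = { tokens: number; last: number };

const RATE = 10;   // tokens refilled per second, per IP
const BURST = 20;  // maximum bucket size (allowed burst)
const buckets = new Map<string, Bucket>();

function allow(ip: string): boolean {
  const now = Date.now();
  const b = buckets.get(ip) ?? { tokens: BURST, last: now };
  // Refill proportionally to the time elapsed since this IP's last request.
  b.tokens = Math.min(BURST, b.tokens + ((now - b.last) / 1000) * RATE);
  b.last = now;
  const ok = b.tokens >= 1;
  if (ok) b.tokens -= 1;
  buckets.set(ip, b);
  return ok;
}

// Usage inside an HTTP handler: reject with 429 when the bucket is empty, e.g.
// if (!allow(req.socket.remoteAddress ?? "")) { res.writeHead(429); res.end(); return; }
```

In practice you would also evict idle buckets so the map's memory stays bounded, and attackers rotating through proxies or VPNs will still get fresh buckets, which is exactly why the heavy lifting belongs at the edge.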

Related

Is this a correct scenario to use WebSocket?

I have a browser plugin which will be installed on 40,000 desktops.
This plugin will connect to a backend configuration file available over HTTPS, e.g. http://somesite/config_file.js.
The plugin is configured to poll this backend resource once per day.
But there is only one backend server, so if 40,000 endpoints start polling at the same time, the server might crash.
I could randomize the polling frequency from the desktop plugins, but randomization still does not guarantee that there will not be an overload at the server.
Would using WebSockets in this scenario solve the scalability issue?
Polling once a day is very little.
I don't see any upside for Websockets unless you switch to Push and have more notifications.
However, staggering the polling does make a lot of sense, since syncing requests for the same time is like writing a DoS attack against your own server.
Staggering doesn't necessarily have to be random and IMHO, it probably shouldn't.
You could start with a fixed time and add a second per client ID, giving you ~86K one-second slots in 24 hours, which should be easy for any server to handle.
As a side note, 40K concurrent connections might not be as hard to achieve as you imagine.
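As a rough illustration of the fixed-offset idea (deriving each client's daily slot from a stable client property instead of pure randomness), here is a minimal sketch. The hash, the client ID source and the reuse of the question's config URL are assumptions; the point is only that the offset is deterministic and evenly spread.

```typescript
// Deterministic staggering: each client maps its own ID to a fixed one-second
// slot within the day, so 40K clients spread out instead of polling at once.
function dailyPollDelayMs(clientId: string): number {
  // Cheap, stable (non-cryptographic) hash of the client ID.
  let hash = 0;
  for (const ch of clientId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  const slotSeconds = hash % 86_400; // 86,400 one-second slots per day
  return slotSeconds * 1000;
}

// Usage: wait for this client's slot, poll, then repeat every 24 hours.
setTimeout(function poll() {
  fetch("http://somesite/config_file.js").catch(() => { /* try again tomorrow */ });
  setTimeout(poll, 24 * 60 * 60 * 1000);
}, dailyPollDelayMs("desktop-12345"));
```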
EDIT (relating to the comments)
Websockets vs. Server Sent Events:
IMHO, when pushing data (vs. polling), I would prefer Websockets over Server Sent Events (SSE).
Websockets have a few advantages, such as client-side communication, which allows clients to ping the server and confirm that the connection is still alive.
The Specific Use-Case:
From the description in the question and the comments it seems that you're using browser clients with a custom plugin and that the updates you wish to install daily might require the browser to be active.
This raises different questions that affect the implementation (are the client browsers open all day? do you have any control over the client browsers and their environment? can you guarantee installation while the browser is closed?).
...
IMHO, you might consider having the client plugins test for an update each morning as they load for the first time during that day (first access).
People arrive at work at different times and open their browsers for the first time on different schedules, so the 40K requests you're expecting will be naturally scattered across that timeline (probably a 20-30 minute timespan).
This approach makes sure that the browsers and computers are actually on (making the update possible) and that the update requests are staggered over a period of time (about 33.3 requests per second, if my assumption is correct).
If you're serving a pre-written static configuration file (perhaps updated by the server daily), avoiding dynamic content and minimizing any database calls, then 33 req/sec should be very easy to manage.
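A minimal sketch of this first-access-of-the-day check, assuming the plugin runs in the browser and can use localStorage; the storage key is an assumption, and the URL is the one from the question.

```typescript
// On plugin load, fetch the config only if it hasn't been fetched today yet.
async function maybeUpdateConfig(): Promise<void> {
  const today = new Date().toISOString().slice(0, 10); // e.g. "2024-05-01"
  if (localStorage.getItem("configLastFetched") === today) return; // already done today

  const res = await fetch("http://somesite/config_file.js");
  if (res.ok) {
    const config = await res.text();
    // ...apply the configuration here...
    localStorage.setItem("configLastFetched", today);
  }
}

// Run once whenever the browser (and therefore the plugin) starts up.
maybeUpdateConfig();
```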

How is it possible to balance load through one proxy?

After reading about proxies, reverse proxies and load balancing I am left with a question: how is it possible to balance load (via a proxy) if all traffic still has to go through one point - the proxy?
What I understood is that a proxy can distribute requests to different servers, so to a client it seems like all the responses come from the proxy. But if all the responses still have to go through the proxy in the end, how does this help so much? The proxy would need the capacity of all the servers behind it combined! I am probably missing something.
One of the discussions I am referring to is: Difference between proxy server and reverse proxy server
Well, the load-balancing proxy only performs very simple tasks, like rolling a virtual die to pick one of the servers behind it. These tasks take a negligible time to complete, so the throughput of the proxy can be as high as possible.
On the other hand the servers that handle actual users' requests perform many complex tasks (connect to and query the database, parse data, prepare response) which take longer time, therefore their load is higher and the throughput is significantly lower.
Of course, load balancing isn't quite that simple; you can't just pick random numbers, since you also need to handle back-end server downtime, for example. But the point is that the tasks on the load balancer should take much, much less time than the tasks on the servers behind it. :-)
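To make the "cheap work on the balancer, expensive work on the back end" point concrete, here is a minimal round-robin forwarding sketch using only Node's built-in http module. The backend addresses and ports are placeholders, and a real balancer would add health checks, timeouts and connection reuse.

```typescript
import * as http from "http";

// Pool of identical application servers (placeholder addresses).
const backends = [
  { host: "10.0.0.1", port: 8080 },
  { host: "10.0.0.2", port: 8080 },
  { host: "10.0.0.3", port: 8080 },
];
let next = 0;

http.createServer((clientReq, clientRes) => {
  // Picking a backend is the balancer's only "work": a trivial counter update.
  const backend = backends[next];
  next = (next + 1) % backends.length;

  // Forward the request and stream the response back; the expensive part
  // (database queries, rendering, etc.) happens on the chosen backend.
  const proxyReq = http.request(
    {
      host: backend.host,
      port: backend.port,
      path: clientReq.url,
      method: clientReq.method,
      headers: clientReq.headers,
    },
    (backendRes) => {
      clientRes.writeHead(backendRes.statusCode ?? 502, backendRes.headers);
      backendRes.pipe(clientRes);
    }
  );
  proxyReq.on("error", () => {
    clientRes.writeHead(502);
    clientRes.end("Bad gateway");
  });
  clientReq.pipe(proxyReq);
}).listen(80);
```

Note that the proxy still moves every byte of every response, so its network bandwidth must cover the whole pool, but its CPU and memory work per request stays tiny compared with the application servers.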

Notification System - Socket.io or Ajax?

I'm using Laravel 5 and I want to create a notification system for my (web) project. What I want to do is notify the user of new notifications such as:
another user starts following him,
another user writes on his wall,
another user sends him a message, etc,
(possibly by highlighting an icon in the header with a drop-down menu, like the ones on Stack Overflow).
I found the tutorial on Laracasts, Real-time Laravel with Socket.io, where a similar thing is achieved by using Node, Redis and Socket.io.
If I choose Socket.io and I have 5000 users online, I assume I will have to make 5000 connections and 5000 broadcasts plus the notifications, so it will generate a large number of requests. And I would need to start it for every user on login, in the master Blade template; is that true?
Is that a bad way of doing it? I think the same thing can also be achieved with Ajax requests. Should I avoid making too many continuous Ajax requests?
So I want to ask whether Socket.io is a good approach for creating such a system, or whether it is better to use Ajax requests every 5 seconds instead. Or is there a better alternative? Pusher could be an option; however, a free solution is a better fit in my case.
A few thoughts:
Websockets and Socket.io are two different things.
Socket.io might use Websockets and it might fall back to AJAX (among different options).
Websockets are more web friendly and resource efficient, but they require work as far as coding and setup are concerned.
Also, using SSL with Websockets in production is quite important for many reasons, and some browsers require that the SSL certificate be valid... so there could be a price to pay.
Websockets sometimes fail to connect even when supported by the browser (that's one reason using SSL is recommended)... so writing an AJAX fallback for legacy or connectivity issues means that the Websocket code usually doesn't replace the AJAX code.
5000 users polling every 5 seconds means 1000 new connections and requests per second. Some apps can't handle 1000 requests per second. This isn't always the case, but it is a common enough issue.
The more users you have, the closer your AJAX polling gets to a DoS attack against your own server.
On the other hand, Websockets are persistent: there are no new connections, which avoids a big resource cost, especially considering TCP/IP's slow start feature (yes, it's a feature, not a bug).
Existing clients shouldn't experience a DoS even when new clients are refused (server design might affect this).
A Heroku dyno should be able to handle 5000 Websocket connections and still have room for more, while still answering regular HTTP requests.
On the other hand, I think Heroku imposes an active-requests and/or backlog limit per dyno (~50 requests each), meaning that if more than a certain number of requests are waiting for a first response or for your application to accept the connection, new requests will be refused automatically... So you have to make sure you have no more than 100 new requests at a time. For 1000 requests per second, you need your concurrency to allow for 100 simultaneous requests at 10 ms per request as a minimal performance state... This might be easy on your local machine, but when network latency kicks in it's quite hard to achieve.
This means it's quite likely that an application which fits on a single Heroku dyno with Websockets would require a number of dynos when using AJAX.
These are just thoughts of things you might consider when choosing your approach, no matter what gem or framework you use to achieve your approach.
Outsourcing parts of your application, such as push notifications, would require other considerations, such as scalability management (what resources are you saving on?) vs. price, etc.
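For reference, a minimal sketch of the push approach discussed above, assuming a Socket.io v3+ style server: each user joins a private room when connecting, and the application emits notification events only to that room, so nothing is broadcast to all 5000 users. The event name, the auth field and the port are illustrative assumptions.

```typescript
import { Server } from "socket.io";

const io = new Server(3000, { cors: { origin: "*" } });

io.on("connection", (socket) => {
  // In a real app the user id would come from a verified session or token,
  // not straight from the handshake payload.
  const userId = String(socket.handshake.auth.userId);
  socket.join(`user:${userId}`);
});

// Called by the application whenever something notable happens.
function notify(userId: string, payload: { type: string; text: string }): void {
  io.to(`user:${userId}`).emit("notification", payload);
}

// Example: someone starts following user 42.
notify("42", { type: "follow", text: "Alice started following you." });
```

On the browser side the client simply listens for the "notification" event and updates the header icon, which replaces the 5-second Ajax poll entirely.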

How is Ruby Mechanize fast after first get request?

I recently programmed a scraper with Ruby's Mechanize gem for the first time. It had to hit the server at URLs like 'xyz.com/a/number', where the number is generated by the script, e.g. 'xyz.com/a/2' and 'xyz.com/a/3'.
It turned out that the first request took a lot of time, around 1.5 s on a 512 kbps connection, but the next request was done in 0.3 ms.
How could it be done so fast? Does it have some caching mechanism?
There are lots of possible sources for a speed change between requests. A few that immediately spring to mind:
DNS lookup cached on your client. The first call must convert "xyz.com" to "123.45.67.89", involving a DNS lookup which may be slow.
HTTP keep-alive. There is an initial conversation between client and server to start an HTTP data transfer, and on a high-latency connection you will notice this. If server and client both respect HTTP keep-alive, then a connection can be established once and reused across multiple requests (see the sketch after this list).
Server-side caching. The server you are scraping may use caching to speed up multiple similar requests. It might be caching data related to your current session, for example, or it might not have fully compiled the script until your first request.
Server-side VM resource allocation. If the server is sharing space on a virtualised system, and does not serve high traffic, then it may become more responsive after the first request ensures everything is in RAM and has CPU allocated.
This is by no means exhaustive. The above examples are just to illustrate that this behaviour - initial slow response, followed by faster ones - is very common for web services, and has multiple causes.
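To illustrate the keep-alive point from the list above in general terms (not specific to Mechanize), here is a small Node sketch: with a keep-alive agent, the DNS, TCP and TLS setup cost is paid on the first request only, and subsequent requests reuse the open connection. The host and paths are placeholders.

```typescript
import * as https from "https";

// Reusing one agent with keepAlive enabled lets requests share a connection.
const agent = new https.Agent({ keepAlive: true });

function timedGet(path: string): Promise<number> {
  const start = Date.now();
  return new Promise((resolve, reject) => {
    https
      .get({ host: "example.com", path, agent }, (res) => {
        res.resume(); // drain the body; only the timing matters here
        res.on("end", () => resolve(Date.now() - start));
      })
      .on("error", reject);
  });
}

async function main(): Promise<void> {
  console.log("first request :", await timedGet("/a/1"), "ms"); // pays DNS + TCP + TLS
  console.log("second request:", await timedGet("/a/2"), "ms"); // reuses the connection
}
main();
```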

Is using SSL session cache a bad idea for a websocket server?

When a client reconnects to an SSL server, SSL session caches remove the need to recompute the same cryptographic agreement previously used by that client (while also reducing the communication round trips needed from 2 to 1).
However, the websocket protocol states that clients should not disconnect from a websocket server without a good reason (e.g. because an error occurred or the user closed the browser/application tab/window) (?). So when websockets are established on top of an SSL layer, the server can simply assume any websocket connection is alive unless notified otherwise, and during that time the underlying SSL session of the connection can also safely be assumed to remain valid?
Furthermore, a websocket server needs to be able to handle many concurrent long-lived connections, and because SSL session cache entries need to be stored for every single connection (?), implementing such a cache would in this case probably be detrimental to performance because of the large memory overhead, right?
Sorry; this might be more than one simple question, but I wanted to verify if my understanding of these issues is adequate.
Well,
I think it might depend on the software architecture. Let's say you have a site with 100 pages and the user often navigates between them. Many of the pages open a websocket for some special purpose, but it is only kept alive while that page is displayed. In that case caching would make sense, since you're closing and reopening WebSockets often.
On the other hand, you might have a site with only one page where you open a websocket. Content is managed through WebSocket and Ajax requests, but the WebSocket is kept alive for the whole session. In this case, caching SSL sessions for the WebSocket doesn't make much sense.
So, in the end, I would say it depends on the implementation. If you already have a site, you should analyze how it behaves and tune your cache needs accordingly. On the other hand, if you're starting to design a new site, knowing the pros and cons of the different scenarios might help you build a better and more efficient design.
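To make the memory trade-off concrete, here is a minimal sketch of an external TLS session cache on a Node TLS server. Every cached entry costs memory and only pays off if clients actually reconnect, which long-lived websocket connections rarely do; the certificate paths, TTL and cache shape are assumptions.

```typescript
import * as tls from "tls";
import * as fs from "fs";

// In-memory session store: one entry per TLS session, each costing memory.
const sessions = new Map<string, { data: Buffer; expires: number }>();
const TTL_MS = 5 * 60 * 1000; // keep sessions for five minutes (illustrative)

const server = tls.createServer({
  key: fs.readFileSync("server-key.pem"),
  cert: fs.readFileSync("server-cert.pem"),
});

server.on("newSession", (id, data, done) => {
  // Remember the session so a returning client can skip the full handshake.
  sessions.set(id.toString("hex"), { data, expires: Date.now() + TTL_MS });
  done();
});

server.on("resumeSession", (id, done) => {
  const entry = sessions.get(id.toString("hex"));
  const valid = entry !== undefined && entry.expires > Date.now();
  done(null, valid ? entry!.data : null);
});

server.listen(8443);
```

If your clients connect once and then hold the websocket open for the whole session, most of these entries will expire unused, which is the scenario where the cache is mostly overhead.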
Regards
