I am looking for a good way to poll a lot of servers for their status over TCP. I am currently using synchronous code and the Minecraft Query Protocol, but whenever a server is offline the rest of the queue gets held up.
Another problem I am experiencing with my current code is that some servers block the IP of the server I use for polling in their firewall, and thus their servers appear offline on my server list.
I am using a Ruby rake task with an infinite loop in which every Minecraft server in my MongoDB database gets checked and updated roughly every 10 minutes (I try to hit this interval by letting the loop sleep (600 / s.count.to_i).ceil seconds between checks).
Is there any way I can do this task efficiently (and prevent servers from blacklisting my IP in their firewall), preferably with Async code in Ruby?
You need to use either non-blocking sockets or multithreading. The simplest approach is to spawn several threads so that several servers are checked concurrently - that way your main thread won't get held up by a single offline server.
This question contains a lot of information about multithreading in Ruby - you should be able to spawn multiple concurrent threads, or at least use non-blocking sockets.
Another point, given by @Lie Ryan: you can use IO.select to poll an array of sockets all at once. It returns the sockets that are ready for I/O, which can be more elegant than spawning multiple threads.
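For illustration, here is a minimal sketch (not your existing code - the host list, port and timeout are assumptions) that checks many servers concurrently, one thread per server, with a connect timeout so a single dead host can't hold up the rest:
require "socket"
# Returns true if a TCP connection to host:port succeeds within the timeout.
def online?(host, port, timeout = 5)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end
servers = [["mc1.example.com", 25565], ["mc2.example.com", 25565]]
threads = servers.map do |host, port|
  Thread.new { [host, online?(host, port)] }
end
statuses = threads.map(&:value) # e.g. [["mc1.example.com", true], ...]
This only checks TCP reachability; for servers that do accept the connection you would still run your Minecraft Query Protocol exchange inside the block, and write the results back to MongoDB as you do now.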
I've got multiple servers sharing a database. On each of them a cron job fires every 5 minutes that checks whether a text message log entry exists; if it doesn't, it creates the log entry and sends out a text message. I thought there would never be a situation where text messages are sent multiple times, as one server should always be first.
Well - I was wrong and that scenario did happen:
A - check if log exists - it doesn't
B - check if log exists - it doesn't
A - create log
B - create log
A - send message
B - send message
I've changed this behaviour to introduce a queue, which should mitigate the issue. While the crons will still fire and multiple jobs will be queued, workers should pick up the jobs at different times, thus preventing the message from being sent twice. Though it might just as well end up being:
A - pick up job 1
B - pick up job 2
A - check if log exists - it doesn't
B - check if log exists - it doesn't
And so on - or A and B might just as well pick up the same job at exactly the same time.
The solution would be, I guess, to run one worker server. But then I have the situation that jobs from multiple servers are queued many times, and I can't check whether they're already enqueued without ending up with the first scenario again.
I'm at a loss on how to proceed here - while a multiple-server, single-worker-server setup would work, I don't want to end up with multiple instances of the same job (coming from different servers) in the queue.
Maybe the solution to go for is to have one cron/queue/worker server, but I don't have experience with a Laravel multi-server environment to set it up.
The other problematic thing for me is: how do I test this? I can't test it locally, I guess, unless there's a way to spin up VM instances that are synchronized with each other.
The easy answer:
The code that checks the database for the existing log entry could use a database transaction with an isolation level high enough to make sure that everyone else trying to do the same thing at the same time will be blocked and has to wait for the job to finish/commit.
A really naive solution (assuming mysql) would be LOCK TABLES entries WRITE; followed by the logic, then UNLOCK TABLES when you're done.
This also means that no one can access the table while your job is doing the check. I hope the check is really quick, because you'll block all access to the table for a small time period every five minutes.
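A minimal sketch of that naive approach (the entries table, its columns and the timestamp are assumptions; the actual sending happens in your application code between the check and UNLOCK TABLES):
LOCK TABLES entries WRITE;
-- has a log entry already been created for this interval?
SELECT id FROM entries WHERE scheduled_for = '2018-01-01 12:00:00';
-- only if the SELECT returned no row: record the entry, then send the text message
INSERT INTO entries (scheduled_for, created_at) VALUES ('2018-01-01 12:00:00', NOW());
UNLOCK TABLES;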
WRITE lock:
The session that holds the lock can read and write the table.
Only the session that holds the lock can access the table. No other session can access it until the lock is released.
Lock requests for the table by other sessions block while the WRITE lock is held.
Source: https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html
That was a really boring answer, so I'll move on to the answer you're probably more interested in...
The server architecture answer:
Your wish to only have one job per time interval in your queue means that you should only have one machine dispatching the jobs. This is most easily done with one dedicated machine that only dispatches jobs from scheduled commands. (Laravel 5.5 introduced the ability to dispatch jobs directly from the scheduler; see Scheduling Queued Jobs.)
You can then have several worker machines processing the queue, and only one of them will pick up a given job and execute it. Two worker machines will never execute the same job at the same time if everything works as usual*.
I would split the web machines from the worker machines so that they can scale independently. I prefer having my web machines dedicated to web traffic; keeping them out of job processing makes sure that a large number of queued jobs cannot affect my HTTP response times.
So, I recommend the following machine types in your setup:
The scheduler - one single machine that runs the schedule and dispatches jobs.
Worker machines that handle your queue.
Web machines that handle visitors' traffic.
All machines will have identical source code for your Laravel application, and an identical configuration. The only thing that is unique per machine type is the following:
The scheduler has php artisan schedule:run in the crontab.
The workers have supervisor (or something similar) that runs php artisan queue:work (see the sketch after this list).
The web servers have nginx + php-fpm and handle incoming web requests.
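As a rough sketch of those two pieces (paths, process count and options are assumptions, not your setup), the scheduler's crontab entry and the workers' supervisor program could look something like this:
# crontab on the scheduler machine
* * * * * cd /path/to/your/project && php artisan schedule:run >> /dev/null 2>&1

# /etc/supervisor/conf.d/laravel-worker.conf on each worker machine
[program:laravel-worker]
command=php /path/to/your/project/artisan queue:work --tries=3
process_name=%(program_name)s_%(process_num)02d
numprocs=2
autostart=true
autorestart=true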
This setup will make sure that you only get one job per 5 minutes, since there is only one machine pushing it. It will also make sure that the CPU load generated by the workers doesn't affect the web requests.
One issue with my answer is obvious: that single scheduler machine is a single point of failure. If it dies, you will no longer have any of these scheduled jobs dispatched to the queue. That touches areas like server monitoring and health checks, which are out of scope for your question and also highly dependent on your hosting provider.
Regarding that little asterisk: I can make up weird scenarios where a job is executed on several machines. This involves jobs that sleep for longer than the timeout, in an environment without support for terminating the job. This will cause the first worker to keep executing the job (since it cannot be terminated), while a second worker considers the job timed out and retries it.
Since Laravel 5.6 you can ensure your scheduled tasks only run on a single instance using the onOneServer method, e.g.
$schedule->command('loggingTask')
->everyFiveMinutes()
->onOneServer();
This requires an APC or Redis cache to be set up because it seems to use a mutual exclusion lock, probably RedisLock if Redis is set up.
Using a queue you shouldn't really have such a problem because popping a task off a queue should be an atomic operation.
Source
I'm working on a web application frontend to a legacy system which involves a lot of CPU-bound background processing. The application is also stateful on the server side, and the domain objects need to be held in memory across the entire session as the user operates on them via the web-based interface. Think of it as something like a web UI front end to Photoshop where each filter can take 20-30 seconds to execute on the server side, so the app still has to interact with the user in real time while they wait.
The main problem is that each instance of the server can only support around 4-8 instances of each "workspace" at once and I need to support a few hundreds of concurrent users at once. I'm going to be building this on Amazon EC2 to make use of the auto scaling functionality. So to summarize, the system is:
A web application frontend to a legacy backend system
Tasks performed are CPU-bound
Stateful; most calls will be some sort of RPC, and the user will make multiple actions that interact with the stateful objects held in server-side memory
Most tasks are semi-realtime, where they have to execute for 20-30 seconds and return the results to the user in the same session
Uses Amazon AWS auto scaling
I'm wondering what is the best way to make a system like this distributed.
Obviously I will need a web server to interact with the browser and then send the CPU-bound tasks from the web server to a bunch of dedicated servers that do the background processing. The question is how to best hook the two tiers together for my specific needs.
I've been looking at message queue systems such as RabbitMQ, but these seem to be geared towards one-off tasks where any worker node can simply grab a job from a queue, execute it and forget the state. My needs are a little different, since there could be multiple 'tasks' that need to be 'sticky': for example, if step 1 is started on node 1, then step 2 for the same workspace has to go to the same worker process.
Another problem I see is that most worker queue systems seem to be geared towards background tasks that can be processed anytime, rather than a system that has to provide user feedback like the one I'm dealing with.
My question is, is there an off-the-shelf solution for something like this that will allow me to easily build a system that can scale? Would love to hear your thoughts.
RabbitMQ has an RPC tutorial. I haven't used this pattern in particular, but I am running RabbitMQ on a couple of nodes and it can handle hundreds of connections and millions of messages. With a little work in monitoring you can detect when there is more work to do than you have consumers for. Messages can also time out, so queues won't back up too badly. To scale out capacity you can create multiple RabbitMQ nodes/clusters. You could have multiple rounds of RPC, so that after the first response you include the information required to get the second message to the correct destination.
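As a rough sketch of that RPC pattern with the Ruby Bunny client (the queue name and the do_work helper are made up for illustration), the worker side consumes requests and publishes the result back to the client's reply-to queue with the request's correlation id:
require "bunny"
connection = Bunny.new # assumes a RabbitMQ broker on localhost
connection.start
channel = connection.create_channel
queue = channel.queue("rpc_queue")
queue.subscribe(block: true) do |_delivery_info, properties, payload|
  result = do_work(payload) # hypothetical CPU-bound task
  # reply on the queue named by the client, echoing its correlation id
  channel.default_exchange.publish(result,
    routing_key: properties.reply_to,
    correlation_id: properties.correlation_id)
end
Stickiness could then be layered onto the routing: the first reply can tell the client which per-workspace queue to publish step 2 to, which is the "multiple rounds of RPC" idea above.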
0MQ has this as a basic pattern, which will fan out work as needed. I've only played with it, but it is simpler to code and possibly simpler to maintain (as it doesn't need a broker; devices can provide one though). This may not handle stickiness by default, but it should be possible to write your own routing layer to handle it.
Don't discount HTTP for this either. When you want request/reply, strict throughput per backend node, and something that scales well, HTTP is well supported. With AWS you can easily put their ELB in front of an auto-scaling group to provide the routing from frontend to backend. ELB supports sticky sessions as well.
I'm a big fan of RabbitMQ but if this is the whole scope then HTTP would work nicely and have fewer moving parts in AWS than the other solutions.
I'd like to know how to communicate between processes on a Heroku worker dyno.
We want a Resque worker to read off a queue and send the data to another process running on the same dyno. The "other process" is an off-the-shelf piece of software that usually uses TCP sockets (port xyz) to listen for commands. It is set up to run as a background process before the Resque worker starts.
However, when we try to connect locally to that TCP socket, we get nowhere.
Our Rake task for setting up the queue does this:
task "resque:setup" do
# First launch our listener process in the background
`./some_process_that_listens_on_port_12345 &`
# Now get our queue worker ready, set up Redis backing store
port = 12345
ENV['QUEUE'] = '*'
ENV['PORT'] = port.to_s
Resque.redis = ENV['REDISTOGO_URL']
# Start working from the queue
WorkerClass.enqueue
end
And that works -- our listener process runs, and Resque tries to process queued tasks. However, the Resque jobs fail because they can't connect to localhost:12345 (specifically, Errno::ECONNREFUSED).
Possibly, Heroku is blocking TCP socket communication on the same dyno. Is there a way around this?
I tried to take the "code" out of the situation and just executed this on the command line (after the server process claims that it is properly bound to 12345):
nc localhost 12345 -w 1 </dev/null
But this does not connect either.
We are currently investigating changing the client/server code to use UNIXSocket on both sides as opposed to TCPSocket, but as it's an off-the-shelf piece of software, we'd rather avoid our own fork if possible.
Use a message queue Heroku add-on,
like IronMQ for example.
Have you tried a FIFO (named pipe)?
http://www.gnu.org/software/libc/manual/html_node/FIFO-Special-Files.html#FIFO-Special-Files
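For example, a minimal Ruby sketch (the path is an assumption; note that opening a FIFO for writing blocks until the reader has it open, and this would require the listener to read from the pipe instead of a TCP socket):
fifo_path = "/tmp/listener.fifo"
File.mkfifo(fifo_path) unless File.exist?(fifo_path) # Ruby 2.3+
# Writer side (e.g. inside the Resque job):
File.open(fifo_path, "w") { |f| f.puts("some command") }
# Reader side (the listener process), running separately:
File.open(fifo_path, "r") { |f| puts f.gets }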
Reading your question, you've answered it yourself: you cannot connect to localhost:12345.
This way of setting up your processes is a strange one, as you're running two processes within one Heroku dyno, which removes a lot of the benefits of Heroku, i.e. independent process scaling, isolation, and clean dependency declaration.
I would strongly recommend running this as two separate processes that interact via a third-party backing service.
Heroku only lets you listen on a given port ($PORT) per dyno, I think.
I see two solutions here:
Use Redis as a communication middleware, so the worker would write to Redis and the listener process, instead of listening on a port, would poll Redis for new jobs (see the sketch after this list).
Get another Heroku dyno (or better, a completely different application), launch the listening process there (on $PORT), and have the two applications communicate with each other.
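A minimal sketch of the first option with the redis gem (the list name, payload and handler are assumptions):
require "redis"
redis = Redis.new(url: ENV["REDISTOGO_URL"])
# In the Resque job (producer): push a command instead of opening a socket
redis.lpush("commands", "status")
# In the listener process (consumer): block until a command arrives
loop do
  _list, payload = redis.brpop("commands")
  handle_command(payload) # hypothetical handler inside the listener
end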
@makdad, is the "3rd party software" written in Ruby? If so, I would run it with a monkey patch which fakes out TCPSocket or whatever class it is using to access the TCP socket. Put the monkey patch in a file of its own, which will only be required by the Ruby process running the 3rd party software. The monkey patch could even read data directly from the queue and make TCPSocket behave as if that data had been received.
Yes, it's not very elegant, and I'm sure there may be a better way to do it, but when you are trying to get a job done (not spend days doing research), sometimes you just have to bite the bullet and do something which is ugly but works. Whatever solution you choose, make sure to document it for those who work on the project later.
My server process is basically an API that responds to REST requests.
Some of these requests are for starting long running tasks.
Is it a bad idea to do something like this?
get "/crawl_the_web" do
Thread.new do
Crawler.new # this will take many many days to complete
end
end
get "/status" do
"going well" # this can be run while there are active Crawler threads
end
The server won't be handling more than 1000 requests a day.
Not the best idea....
Use a background job runner to run jobs.
POST /crawl_the_web should simply add a job to the job queue. The background job runner will periodically check for new jobs on the queue and execute them in order.
You can use, for example, delayed_job for this, setting up a single separate process to poll for and run the jobs. If you are on Heroku, you can run the delayed_job workers in a separate background worker dyno.
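A minimal sketch with delayed_job (the class names are made up for illustration; the actual crawl then runs in the worker process started with rake jobs:work, not in the web request):
# crawl_job.rb - a plain Ruby job object for delayed_job
class CrawlJob
  def perform
    Crawler.new # the long-running work happens in the worker process
  end
end
# in the Sinatra app
post "/crawl_the_web" do
  Delayed::Job.enqueue(CrawlJob.new)
  status 202 # accepted; poll /status to see how it's going
end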
If you do this, how are you planning to stop/restart your Sinatra app? When you finally deploy, your application is probably going to be served by Unicorn, Passenger/mod_rails, etc. Unicorn will manage the lifecycle of its child processes, and it would have no knowledge of these long-running threads that you might have launched - and that's a problem.
As someone suggested above, use delayed_job, resque or any other queue-based system to run background jobs. You get persistence of the jobs, you get horizontal scalability (just launch more workers on more nodes), etc.
Starting threads during request processing is a bad idea.
Besides the fact that you cannot control your worker threads (start/stop them in a controlled way), you'll quickly get into trouble if you start a thread inside request processing. Think about what happens: the request ends and the process gets ready to serve the next request, while your worker thread still runs and accesses process-global resources like the database connection, open files, the same class variables and global variables, and so on. Sooner or later, your worker thread (or any library used from it) will affect the main thread somehow, break other requests, and be almost impossible to debug.
You're really better off using separate worker processes. delayed_job for example is a really small dependency and easy to use.
I'm working on a consumer web app that needs to do a long running background process that is tied to each customer request. By long running, I mean anywhere between 1 and 3 minutes.
Here is an example flow. The object/widget doesn't really matter.
Customer comes to the site and specifies object/widget they are looking for.
We search/clean/filter for widgets matching some initial criteria. <-- long running process
Customer further configures more detail about the widget they are looking for.
When the long running process is complete the customer is able to complete the last few steps before conversion.
Steps 3 and 4 aren't really important. I just mention them because we can buy some time while we are doing the long running process.
The environment we are working in is a LAMP stack, currently using PHP. It doesn't seem like a good design to have the long-running process take up an Apache thread in mod_php (or a FastCGI process). The Apache layer of our app should be focused on serving up content and not data processing, IMO.
A few questions:
Is our thinking right in that we should separate this "long running" part out of the apache/web app layer?
Is there a standard/typical way to break this out under Linux/Apache/MySQL/PHP (we're open to using a different language for the processing if appropriate)?
Any suggestions on how to go about breaking it out? E.g. do we create a daemon that churns through a FIFO queue?
Edit: Just to clarify, only about 1/4 of the long running process is database centric. We're working on optimizing that part. There is some work that we could potentially do, but we are limited in the amount we can do right now.
Thanks!
Consider providing the search results via AJAX from a web service instead of your application. Presumably you could offload this to another server and let your web application deal with the content as you desire.
Just curious: 1-3 minutes seems like a long time for a lookup query. Have you looked at indexes on the columns you are querying to improve the speed? Or do you need to do some algorithmic process -- perhaps you could perform some of this offline and prepopulate some common searches with hints?
As Jonnii suggested, you can start a child process to carry out background processing. However, this needs to be done with some care:
Make sure that any parameters passed through are escaped correctly
Ensure that no more than one copy of the process runs at once
If several copies of the process run, there's nothing stopping a (not even malicious, just impatient) user from hitting reload on the page which kicks it off, eventually starting so many copies that the machine runs out of RAM and grinds to a halt.
So you can use a subprocess, but do it carefully, in a controlled manner, and test it properly.
Another option is to have a daemon permanently running, waiting for requests, which processes them and then records the results somewhere (perhaps in a database).
This is the poor man's solution:
exec ("/usr/bin/php long_running_process.php > /dev/null &");
Alternatively you could:
Insert a row into your database with details of the background request, which a daemon can then read and process.
Write a message to a message queue which a daemon then reads and processes.
Here's some discussion on the Java version of this problem.
See java: what are the best techniques for communicating with a batch server
Two important things you might do:
Switch to Java and use JMS.
Read up on JMS but use another queue manager. Unix named pipes, for instance, might be an acceptable implementation.
Java servlets can do background processing. You could do something similar in any web technology with threading support. I don't know about PHP though.
Not a complete answer, but I would think of using AJAX and passing the 2nd step to something that's faster than PHP (C, C++, C#), then having a PHP function pick the results off of some stack, most likely just a database.