I'm creating a Rails Engine that has some background workers using Resque. The problem is that if I have two applications running locally that both use this engine, any worker I have running will execute the tasks for both apps.
I want to isolate these applications. Any idea where to start?
Thank you.
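You should be able to build the queue name from some unique identifier wherever the engine enqueues the job. Note that Resque.enqueue takes the job class as its first argument and reads the queue from the class itself, so use Resque.enqueue_to to pass the queue name explicitly:
Resque.enqueue_to "#{Rails.application.class.parent_name}_engine_jobs", EngineJob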
Then you can point each application's worker(s) at its own queue name, for example by setting QUEUE when starting rake resque:work.
I am trying to use a REST service to trigger Spark jobs using the Dataproc API client. However, each job inside the Dataproc cluster takes 10-15 s to initialize the Spark driver and submit the application. Is there an effective way to eliminate this initialization time for Spark Java jobs triggered from a JAR file in a GCS bucket? Some solutions I am thinking of are:
Pooling a single instance of JavaSparkContext that can be used for every Spark job
Starting a single long-running job and running all Spark-based processing inside it
Is there a more effective way? How would I implement the above ways in Google Dataproc?
Instead of writing this logic yourself, you may want to investigate the Spark Job Server (https://github.com/spark-jobserver/spark-jobserver), as it should allow you to reuse Spark contexts.
You can also write a driver program for Dataproc that accepts RPCs from your REST server and reuses its own SparkContext, then submit this driver via the Jobs API, but I would personally look at the job server first.
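The driver-side reuse described above might look roughly like the sketch below. It is written in Python for brevity and does not use the Spark API: make_context is a hypothetical stand-in for whatever creates the expensive object (for example a SparkContext), and ContextHolder and handle_request are illustrative names. The only point is that the context is created once and shared by every later request.

```python
import threading


class ContextHolder:
    """Creates an expensive context once and hands the same instance to every caller."""

    def __init__(self, make_context):
        # Hypothetical factory, e.g. a function that builds a SparkContext.
        self._make_context = make_context
        self._context = None
        self._lock = threading.Lock()

    def get(self):
        # The lock ensures concurrent first requests do not create two contexts.
        with self._lock:
            if self._context is None:
                self._context = self._make_context()
            return self._context


def handle_request(holder, job):
    """Run one job (a callable that takes the context) against the shared context."""
    return job(holder.get())
```

In a real driver, handle_request would be called from whatever RPC endpoint the driver exposes to the REST server, so only the first request pays the initialization cost.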
We are running our production system on Elastic Beanstalk. We want to be able to take advantage of Elastic Beanstalk worker tiers with autoscaling. Unfortunately, Laravel expects all queues to be consumed by a PHP command-line process started on your servers, and Elastic Beanstalk worker tiers don't function that way. AWS installs a listener daemon of its own that pulls jobs off the queue and feeds them to your worker over local HTTP calls. Sounds great. However, I can't figure out how to run a queued job from a route and controller in Laravel instead of using the built-in artisan queue listener task. Any clues as to how to achieve this would be greatly appreciated.
You can use the Artisan::call method to call commands from code.
$exitCode = Artisan::call('queue:work');
You can see more info in the docs
In Controller action method:
I recently came across Apache Mesos and successfully deployed my Storm topology over Mesos.
I want to try running Storm topology/Hadoop jobs over Apache Marathon (had issues running Storm directly on Apache Mesos using mesos-storm framework).
I couldn't find any tutorial or article that lists the steps to launch Hadoop/Spark tasks from Apache Marathon.
It would be great if anyone could provide any help or information on this topic (possibly a JSON job definition for Marathon that launches a Storm/Hadoop job).
Thanks a lot
Thanks for your reply. I went ahead and deployed a Storm-Docker cluster on Apache Mesos with Marathon. For service discovery I used HAProxy. This setup allows services (Nimbus, ZooKeeper, etc.) to talk to each other via ports, so adding multiple instances of a service is not a problem: the cluster finds them through the ports and load-balances requests across all instances of that service. The following GitHub project has the Marathon recipes and Docker images: https://github.com/obaidsalikeen/storm-marathon
Marathon is intended for long-running services, so you could use it to start your JobTracker or Spark scheduler, but you're better off launching the actual batch jobs, such as Hadoop/Spark tasks, on a batch framework like Chronos (https://github.com/airbnb/chronos). Marathon will restart tasks when they complete or fail, whereas Chronos (a distributed cron with dependencies) lets you set up scheduled jobs and complex workflows.
While a little outdated, the following tutorial gives a good example.
I'm deploying my Grails (2.3.6) app with the Grails Standalone App Runner plugin, like so:
grails -Dgrails.env=prod build-standalone myapp.jar --tomcat
Then, my CI build places myapp.jar onto my app server, say, myapp01.
I now want to cluster app sessions when myapp is running on multiple nodes. So if myapp gets deployed to myapp01, myapp02 and myapp03, and one of those instances starts a new session with a user, I want all 3 to be aware of the same session. This is obviously so I can put all the nodes behind a load balanced URL (http://myapp.example.com, etc.) and it doesn't matter what node you get routed to: all nodes share the same sessions.
I googled "grails session clustering" and see a bunch of articles that seem to require terracotta, but I also heard that Grails has built-in session clustering facilities. But any searches I do come back empty-handed.
So I ask: How can I achieve this kind of session clustering with an embedded Tomcat?
Besides the session-cookie plugin that @injecteer proposed, there are several other plugins that let you keep sessions in shared storage (DB, MongoDB, Redis, memcached) that any of your Tomcat instances can access. Take a look at these:
I have never heard of anything like this out of the box. I would give two options a try:
Use a session-cookie plugin, which keeps the session data with the client so that it no longer has to be stored in Tomcat.
Use or implement persistent sessions, which are stored in some sort of DB and are not bound to any Tomcat instance.
You could achieve this with Tomcat's built-in functionality. Each Tomcat node can replicate sessions from the others, so that all sessions are shared between nodes.
You can do this in at least three ways:
Session replication using multicast between instance nodes.
Session replication between a primary node and a single secondary backup node only.
Session replication between static members, which is useful when multicast cannot be enabled or is not supported, such as in an AWS EC2 environment.
My company has thousands of server instances running application code - some instances run databases, others are serving web apps, still others run APIs or Hadoop jobs. All servers run Linux.
In this cloud, developers typically want to do one of two things to an instance:
Upgrade the version of the application running on that instance. Typically this involves a) tagging the code in the relevant subversion repository, b) building an RPM from that tag, and c) installing that RPM on the relevant application server. Note that this operation would touch four instances: the SVN server, the build host (where the build occurs), the YUM host (where the RPM is stored), and the instance running the application.
Today, a rollout of a new application version might be to 500 instances.
Run an arbitrary script on the instance. The script can be written in any language provided the interpreter exists on that instance. E.g. The UI developer wants to run his "check_memory.php" script which does x, y, z on the 10 UI instances and then restarts the webserver if some conditions are met.
What tools should I look at to help build this system? I've seen Celery and Resque and delayed_job, but they seem like they're built for moving through a lot of tasks. This system is under much less load - maybe on a big day a thousand hundred upgrade jobs might run, and a couple hundred executions of arbitrary scripts. Also, they don't support tasks written in any language.
How should the central "job processor" communicate with the instances? SSH, message queues (which one), something else?
Thank you for your help.
NOTE: this cloud is proprietary, so EC2 tools are not an option.
I can think of two approaches:
Set up password-less SSH on the servers, have a file that contains the list of all machines in the cluster, and run your scripts directly using SSH. For example: ssh user@foo.com "ls -la". This is the same approach used by Hadoop's cluster startup and shutdown scripts. If you want to assign tasks dynamically, you can pick nodes at random.
Use something like Torque or Sun Grid Engine to manage your cluster.
The package installation can be wrapped inside a script, so you just need to solve the second problem, and use that solution to solve the first one :)
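The first approach can be sketched in Python roughly as follows. This is a minimal sketch, assuming password-less SSH is already configured; the function names, the host file format (one host per line) and the runner parameter are hypothetical choices made for illustration.

```python
import subprocess


def read_hosts(path):
    """Return host names from a file with one host per line (blank lines and # comments skipped)."""
    with open(path) as handle:
        lines = [line.strip() for line in handle]
    return [line for line in lines if line and not line.startswith("#")]


def build_ssh_command(host, remote_command, user=None):
    """Build the argv list for running remote_command on host over SSH."""
    target = f"{user}@{host}" if user else host
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", target, remote_command]


def run_on_hosts(hosts, remote_command, user=None, runner=subprocess.run):
    """Run remote_command on every host in turn and return {host: exit code}."""
    results = {}
    for host in hosts:
        argv = build_ssh_command(host, remote_command, user)
        completed = runner(argv, capture_output=True, text=True)
        results[host] = completed.returncode
    return results
```

For example, run_on_hosts(read_hosts("hosts.txt"), "ls -la", user="user") would run the command from the answer on every listed machine, one after another. Since the package installation can be wrapped in a script, the same call could cover the upgrade case as well.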