I have a program written in C++11. On the current input it takes too long to run. Luckily, the data can be safely split into chunks for parallel processing, which makes it a good candidate for, say, a Map/Reduce service.
AWS EMR could be a possible solution. However, since my code uses many modern libraries, it's quite a pain to compile it on the instances assigned to Apache Hadoop clusters. For example, I need SOCI (not available at all), Boost 1.58+ (only 1.53 is there), and so on. I also need a modern C++ compiler.
Obviously, all libraries and compilers can be upgraded manually (and the process scripted), but this sounds like a lot of manual work. And what about the slave nodes - will they get all the libraries? Somehow I'm not sure. And the whole process of initializing the environment could now take a very long time - killing much of the performance advantage that distributing the jobs was supposed to bring in the first place.
On the other hand, I don't really need all the advanced functionality that Apache Hadoop provides. And I don't want to set up a personal permanent cluster with my own installation of Hadoop or similar, because I will need to run the tasks only periodically and most of the time the servers will be idle, wasting money.
So, what would be the best product (or overall strategy) that could do the following:
Grab the given binaries + set of input files
Run the binaries on a predefined number of instances, using a recent Linux, ideally Ubuntu 15.10
Put the resulting files in a predefined location (S3 bucket?)
Shut everything down
I am sure I could write a number of scripts using the aws tool to achieve that manually, but I really don't want to reinvent the wheel. Any thoughts?
Thanks in advance!
Honestly, that would be pretty easy to script, and you'll probably need to use scripting to grab the latest code on the servers when they start up anyway. I would suggest looking into defining an Auto Scaling group with scheduled scaling policies. Alternatively, you could have a Lambda function scheduled to run and issue the API commands to create your instances.
You could either have a startup script baked into the server AMI, or simply pass a user-data script when you create the instances, that pulls down the binaries and input files and runs the command. The final step of the script could be to copy the results to S3 and shut down the server.
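If you do end up scripting it yourself, a rough sketch of the user-data approach with boto3 might look like the following; the region, AMI ID, instance type, bucket name, binary path, and IAM role are all placeholders you'd swap for your own:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder

def user_data(chunk):
    # Shell script run at boot: fetch the binary and one input chunk from S3,
    # run the job, upload the result, then power off. Assumes the AMI already
    # has the AWS CLI installed and the instance role grants S3 access.
    return f"""#!/bin/bash
aws s3 cp s3://my-bucket/bin/worker /opt/worker && chmod +x /opt/worker
aws s3 cp s3://my-bucket/input/{chunk} /opt/input.dat
/opt/worker /opt/input.dat /opt/output.dat
aws s3 cp /opt/output.dat s3://my-bucket/output/{chunk}.out
shutdown -h now
"""

# One instance per chunk; each terminates itself once its script shuts it down.
for chunk in ["chunk-000.dat", "chunk-001.dat", "chunk-002.dat"]:
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",    # placeholder: a recent Ubuntu AMI
        InstanceType="c4.xlarge",           # placeholder instance type
        MinCount=1, MaxCount=1,
        UserData=user_data(chunk),
        InstanceInitiatedShutdownBehavior="terminate",
        IamInstanceProfile={"Name": "worker-s3-role"},  # placeholder IAM role
    )
```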
The (relatively new) AWS Batch is built specifically for this purpose.
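If you go that route, the submission itself is small. Here is a minimal boto3 sketch; the queue name, job definition, and array size are placeholders, and the job definition would point at a container image that already has your binary and libraries baked in:

```python
import boto3

batch = boto3.client("batch", region_name="us-east-1")  # region is a placeholder

# Submit one array job with N child jobs; each child sees its index in the
# AWS_BATCH_JOB_ARRAY_INDEX environment variable and can pick its input chunk
# from S3 accordingly.
batch.submit_job(
    jobName="cpp-chunk-run",          # placeholder job name
    jobQueue="my-spot-queue",         # placeholder: must already exist
    jobDefinition="my-cpp-worker:1",  # placeholder: container with your binary
    arrayProperties={"size": 32},     # number of parallel chunks
)
```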
Related
I am aware of the process for installing WAS 8.5.5.x and 9.0.x using Installation Manager response file(s), but I would like to know best practices and recommendations for performing WAS installations and upgrades on more than one server, to avoid manual errors and reduce time.
I am open to using Ansible, Puppet, or any other orchestration tool as well, but would like to know the possible options if we are not allowed to use these tools.
The ultimate goal is to automate most of the setup/upgrade steps, if not all of them, since we are dealing with a bunch of servers.
Thanks
Assuming you are referring to WebSphere Application Server traditional, take a look at the approaches described here, https://www.ibm.com/support/knowledgecenter/SSEQTP_9.0.0/com.ibm.websphere.installation.base.doc/ae/tins_enterprise_install.html, especially if you are working with larger scale deployments.
Consider creating master images and distributing them in a swinging-profile-type setup. This makes it easier and faster to install and apply updates, since you only need to create the images once and can distribute them many times. You also get consistency across systems.
You can then automate with your preferred automation technology.
We use Ansible - simple and effective.
True, you must of course develop a playbook that can do all of this.
I have a custom build and deployment script which works over SSH and deploys to servers (running macOS). The bash script does a lot of simple things like copying files, backing up the old ones, and applying the correct SQL scripts for a forward-moving database. But there are some advanced things, like starting a remote SQL upgrade procedure which can be disconnected from; once the deployment script is started again, it only moves forward if the SQL script has been applied completely (in short, there is some flow control happening, and bash is not really ideal for such stuff).
The script is already huge and is a mess, since bash is not meant for this kind of detailed logic. Can you recommend some tools or libraries which would make things easier?
From what you tell us, I think you need a deployment tool, rather than a configuration management tool.
To simplify, I'll distinguish the two like this:
A deployment tool is a 'push' tool: When you press the button, the required actions are run to make the deployment. It's a one-step process (it can have multiple actions, but it's launched once).
A configuration management tool is usually a 'pull' tool, where your servers periodically check if their configuration is exactly as the CM server tells them to be - and apply changes, if needed. You configure your servers once, and after that the system assures that all is as it should be. It is also a great tool to easily clone systems.
For deployment tools, I personally know Fabric, a great Python tool. But there is also Capistrano in the Ruby world. I don't know of any others.
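To give an idea of the style, here is a minimal Fabric (2.x) sketch of a push deployment; the host, paths, release archive name, and the SQL-upgrade script are made up for illustration:

```python
from fabric import Connection  # Fabric 2.x

def deploy(host, release="app-1.2.3.tar.gz"):
    """Minimal push-style deployment: back up the old release, upload the new one,
    then run the database migration as one tracked step."""
    c = Connection(host)  # connects over SSH using your normal SSH config/keys
    c.run("cp -r /srv/app /srv/app.bak")          # back up the current release
    c.put(release, remote=f"/tmp/{release}")      # upload the new build
    c.run(f"tar xzf /tmp/{release} -C /srv/app")  # unpack it in place
    # Run the SQL upgrade; Fabric raises an exception on a non-zero exit code,
    # so the deployment stops here if the migration fails.
    c.run("/srv/app/bin/run_sql_upgrade.sh")

deploy("deploy@example.com")
```

The point is that the flow control (stop on failure, resume later, branch on state) lives in ordinary Python rather than in increasingly hairy bash.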
For CM tools, Puppet and Chef seem to be the preferred choice of people nowadays. Cfengine is an older tool, which had some problems (I don't know if that has changed).
Here are my recommendations:
Puppet
Chef
cfengine
These are all free (as in beer) and allow you to do what you're wanting. They will require you to adapt your current bash script into modules to fit their design/framework. It's a bit of work, but in the long run it tends to be better since the frameworks take care of error checking, converging configurations and a lot of other things you'd have to manually insert into your own code were you doing this yourself.
I've also used Opsware previously for this sort of thing, but that costs a fair bit of cash and, for what you're trying to do, does not offer significantly more benefit.
In some cases, moving from a bash script to a complete solution is not as straightforward as many cloud services claim.
With "don't try new things when you're on a deadline" in mind:
it could also be a good time to refactor your bash scripts.
I have done automated, repeatable deployments in the past using PaaS or just using Git/SVN hooks with deployogi (which is written in bash): https://github.com/coderofsalvation/deployogi
I understand your situation, but I'm not sure whether it's fair to say that the bash language implies "a mess" and "complex".
Every language allows you to hide complexity, no?
I guess code (in whatever language) gets overly complex when time does not allow us to refactor :)
PaaS is great. But always needed? I think not.
I'm looking for a comparison between Quartz.NET and Windows Scheduled Tasks.
How different are they? What are the pros and cons of each one? How do I choose which one to use?
TIA,
With Quartz.NET I could contrast some of the earlier points:
Code to write - you can express your intent in a .NET language, write unit tests, and debug the logic
Integration with the event log - you have Common.Logging, which allows writing even to a database
Robust and reliable too
Even richer API
It's mostly a question of what you need. Windows Scheduled Tasks might give you all you need. But if you need clustering (distributed workers), fine-grained control over triggering, or misfire-handling rules, you might like to check what Quartz.NET has to offer in these areas.
Take the simplest that fills your requirements, but abstract enough to allow change.
My gut reaction would be to try to get the built-in Windows Task Scheduler to work for your needs first, before installing yet another scheduler - reasoning:
no installation required - installed and enabled by default
no code to write - jobs expressed as metadata
integration with event log etc.
robust and reliable - good enough for MSFT, Google etc.
reasonably rich API - create jobs, check status etc.
integrated with remote management tools
security integration - run jobs in different credentials
monitoring tooling
Then reach for Quartz if it doesn't meet your needs. Quartz certainly has many of these features too, but resist adding yet another service to own and manage if you can.
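To illustrate the "no code to write / rich API" points above: tasks can be created and inspected purely as schtasks metadata. A rough sketch, with a hypothetical task name and program path, driven from Python here only for convenience:

```python
import subprocess

# Create a daily task that runs our executable at 02:00 (name and path are placeholders).
subprocess.run([
    "schtasks", "/Create",
    "/TN", "NightlyReport",               # task name
    "/TR", r"C:\jobs\report_runner.exe",  # program the scheduler will launch
    "/SC", "DAILY", "/ST", "02:00",
], check=True)

# Check the task's status afterwards (verbose output).
subprocess.run(["schtasks", "/Query", "/TN", "NightlyReport", "/V"], check=True)
```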
One important distinction, for me, that is not included in the other answers is what gets executed by the scheduler.
Windows Task Scheduler can only run executable programs and scripts. The code written for use within Quartz can directly interact with your project's .NET components.
With Task Scheduler, you'll have to write a shell executable or script. Inside of that shell, you can interact with your project's components. While writing this shell code is not a difficult process, you do have to consider deploying the extra files.
If you anticipate adding more scheduled tasks over the lifetime of the project, you may end up needing to create additional executable shells or script files, which requires updates to the deployment process. With Quartz, you don't need these files, which reduces the total effort needed to create and deploy additional tasks.
Unfortunately, Quartz.NET job assemblies can't be updated without restarting the process/host/service. That's a pretty big one for some folks (including myself).
It's entirely possible to build a framework for jobs running under Task Scheduler. MEF-based assemblies can be called by a single console app, with everything managed via a configuration UI. Here's a popular managed wrapper:
https://github.com/dahall/taskscheduler
https://www.nuget.org/packages/TaskScheduler
I did enjoy my brief time working with Quartz.NET, but the restart requirement was too big a problem to overcome. Marko has done a great job with it over the years, and he's always been helpful and responsive. Perhaps someday the project will get multiple AppDomain support, which would address this. (That said, it promises to be a lot of work. Kudos to him and his contributors if they decide to take it on.)
To paraphrase Marko, if you need:
Clustering (distributed workers)
Fine-grained control over triggering or misfire handling rules
...then Quartz.NET is what you need.
I'm trying to work out whether we should try out Condor or Sun Grid Engine at work (or possibly something else).
We often have lots of unused Windows XP workstations. The hope is that we could use wake-on-LAN, run all our jobs, and then shut down automatically. We'd mainly be running Matlab, Java, or Python simulations, for either Monte Carlo runs or parameter explorations.
With my limited knowledge of Condor, it sounds like using the VM universe might be a convenient way of taking care of snapshots without having to modify existing code.
Is SGE or something else better than Condor for this kind of work?
SGE doesn't really support Windows; it comes with all kinds of caveats and missing bits there.
I've been running Condor pools for many years now, and it is a superb HTC (high-throughput computing) setup for both cycle-stealing and dedicated, always-on hardware, on Linux and Windows machines. The recent addition of their Rooster daemon lets you put machines to sleep between job cycles and wake them up when new work appears in the pool. They also have an active and very helpful support community. Checkpointing is the only Condor feature not available on Windows; everything else is there. With the addition of the VM universe, checkpointing is getting less and less useful anyway. Really, to use checkpointing successfully you need to be able to relink your entire code stack, so if you're running Matlab jobs, even on Linux, checkpointing isn't going to be possible.
If you have specific questions about getting Condor running on Windows I'd be happy to answer them, share my experiences with it. I run Condor across 4 pools around the globe with a total of about 1500 dedicated machines in all the pools and some 1000 or so additional desktop machines that are available as users care to donate them.
I'd start with Condor. It has good support for Windows, and newer versions have built-in support for sending wake-on-LAN packets, in a very configurable way, when jobs can run on certain machines. It can also shut the machines down based on user-defined policies.
After Oracle's takeover of SGE (Sun Grid Engine), there is the Open Grid Scheduler project that still offers open-source Grid Engine.
http://gridscheduler.sourceforge.net/
For dedicated hardware I'd go with Grid Engine.
For scavenging clock cycles on machines which may be in use I'd go with Condor.
For hardware which you have dedicated access to for fixed periods, such as overnight and at weekends, I'd probably still go with Condor but might be able to persuade myself to use Grid Engine.
I've had to choose between Condor and SGE for a customer project recently. I was favoring SGE (because I was more familiar with that environment), but Condor won in the end because:
the customer infrastructure is Windows-oriented, and the SGE solution requires a Unix or Linux machine for the Central Manager, plus installing MS Services for Unix on the computation hosts
the support and installation process for Condor on Windows was much simpler.
However, you cannot use the most interesting features of Condor on Windows: checkpointing is not available, nor is Condor-specific I/O. I'm not using the VM universe, so I cannot comment on that aspect.
I've only tried Condor, and it was a pain to set up. If you need to fully utilize all the clock cycles you can get, go with Condor.
I'm about to try SGE, and I'll tell you how it goes. However at my company, people have had experience setting up SGE, so I'll probably say SGE is easier.
SGE doesn't exist anymore... it's OGE now, and it's very expensive. Go with Condor.
My objective is to write a program which will call another executable on a separate computer (all running Windows XP) with parameters determined at run-time, then repeat for several more computers, and then collect the results. In short, I'm working on a grid-computing project. The algorithm itself is already coded in FORTRAN, but we are looking for an efficient way to run it on many computers at once.
I suppose one way to accomplish this would be to upload a script to each computer and then run said script on each computer, all automatically and driven by my own parameters. But how can I write a program which will write, upload, and run a script on a separate computer?
I had considered GridGain, but the algorithm is already coded and in a different language, so that is ruled out.
My current guess at accomplishing this task is using Expect (wiki/Expect), but I have no knowledge of the tool.
Any advice appreciated.
You can use PsExec for this:
http://technet.microsoft.com/en-us/sysinternals/bb897553.aspx
You could also look at the open source alternative RemCom:
http://rce.sourceforge.net/
It's actually pretty simple to write your own as well, and RCE will show you how to do it if you want. That said, using PsExec may well suffice for your needs.
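As a rough illustration of what driving PsExec from your own program could look like; the host names, credentials, executable path, and parameter values are placeholders:

```python
import subprocess

HOSTS = ["node01", "node02", "node03"]  # placeholder machine names

def run_remote(host, param):
    """Launch the FORTRAN executable on a remote XP machine via PsExec with a
    run-time parameter; PsExec returns the remote program's exit code."""
    return subprocess.Popen([
        "psexec", rf"\\{host}",
        "-u", "DOMAIN\\user", "-p", "secret",  # placeholder credentials
        r"C:\sim\solver.exe", str(param),      # placeholder path and argument
    ])

# Fan the work out, one parameter set per machine, then wait for all to finish.
procs = [run_remote(h, p) for h, p in zip(HOSTS, [0.1, 0.2, 0.3])]
for p in procs:
    p.wait()
```

Collecting the results could then be as simple as copying the output files back from a shared folder on each host.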
Have a look at PVM; it was made for the type of situation you're describing, but you may have to annotate your existing codebase and/or implement a wrapper application.