How to run a hadoop application automatically? - hadoop

I know that a MapReduce program can be run once from the command line with "hadoop jar *.jar".
But now the program needs to run in the background once every hour.
Is there any way to have the MR program submitted to Hadoop automatically every hour?
Adding a sleep (600 sec) in the program might work, but that keeps the app running all the time.

Your saviour is Oozie: its coordinator jobs exist precisely for this kind of time-based scheduling. Happy learning!
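As a rough sketch of what that looks like, an Oozie coordinator app that triggers a workflow (which in turn runs the MR action) once per hour could be written as below; the app name, start/end window, and ${workflowAppPath} are placeholders you would replace:

```xml
<!-- hypothetical coordinator.xml: trigger the workflow every hour -->
<coordinator-app name="hourly-mr-job" frequency="${coord:hours(1)}"
                 start="2015-01-01T00:00Z" end="2016-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- HDFS path of the workflow app that runs the MapReduce action -->
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

If you don't need Oozie's dependency handling, a plain cron entry that invokes hadoop jar once an hour also works.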

Related

Code updates during submission of condor jobs

When using Condor to distribute jobs across a dedicated computer cluster, one first submits the jobs to the cluster and then waits for them to actually start running. Depending on multiple factors, they might stay in an idle state for quite some time, even hours.
Let's say I just compiled the code that is going to be run in the jobs. I can submit the jobs via a Condor submission file. I then realize I would like to change the original code, either because there is a bug in it, or because I want to try different parameters. If the code finishes compiling while the jobs are still idle, which version is going to run on the cluster? In other words, does Condor store a snapshot of the code when the jobs are submitted, or does it just pick it up when the jobs start running?
Although the first option sounds far more reasonable to me, I have evidence from my own work that the second is what actually happens.
When condor_submit is run, the executable is copied to the spool directory under the scheduler; this is called spooling. If you want to be able to change the executable after submission, probably the best thing to do is to make your executable a shell script that calls the real executable, and put the real executable into the transfer_input_files list: files in that list are transferred when the job starts, not when it is submitted.
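A minimal sketch of that trick, with hypothetical file names (run.sh for the wrapper, my_program for the real binary):

```sh
#!/bin/sh
# run.sh -- thin wrapper that gets spooled at submit time; the real
# binary arrives via input-file transfer when the job actually starts,
# so it can still be rebuilt while the job sits idle
exec ./my_program "$@"
```

And the matching submit description:

```
# job.submit (hypothetical)
executable              = run.sh
transfer_input_files    = my_program
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue
```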

what's the difference between spark-shell and submitted sbt programs

Spark-shell can be used to interact with data in distributed storage, so what is the essential difference between coding in spark-shell and submitting a packaged sbt application to the cluster? (One difference I found is that a job submitted via sbt can be seen in the cluster management interface, while the shell cannot.) After all, sbt is quite troublesome, and the shell is very convenient.
Thanks a lot!
Spark-shell gives you a bare console-like interface in which you can run your code as individual commands. This can be very useful if you're still experimenting with packages or debugging your code.
One difference I found is that a job submitted via sbt can be seen in the cluster management interface, while the shell cannot
Actually, the Spark shell also shows up in the job UI, as "Spark shell" itself, and you can monitor the jobs you are running through it.
Building Spark applications with SBT gives your development process some organization and iterative compilation, which is helpful in day-to-day development, and it avoids a lot of manual work. If you have a constant set of things you always run, you can simply run the same package again instead of going through the trouble of re-entering everything as commands. SBT takes some time to get used to if you are new to the Java style of development, but it helps maintain applications in the long run.
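For comparison, a minimal sketch of the packaged route; the main class, master URL, Scala version, and jar name below are placeholders:

```sh
# build the jar, then hand it to the cluster
sbt package
spark-submit \
  --class com.example.MyApp \
  --master spark://master:7077 \
  target/scala-2.11/myapp_2.11-1.0.jar
```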

Scheduling Cucumber test features to run repeatedly

I have Cucumber tests (feature files) in the RubyMine IDE, and lately I need to execute one of the features repeatedly at a scheduled time.
I haven't found a way to do so. Any ideas or thoughts on scheduling that feature file?
You can create a cron job that executes a rake task (see the sketch after the links below).
The software utility Cron is a time-based job scheduler in Unix-like computer operating systems. People who set up and maintain software environments use cron to schedule jobs (commands or shell scripts) to run periodically at fixed times, dates, or intervals.
These links might help
How to create a cron job using Bash
how to create a cron job to run a ruby script?
http://rake.rubyforge.org/
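As a rough sketch, a crontab entry (added via crontab -e) that runs one feature every day at 02:00; the project path and feature file name here are hypothetical:

```
# min hour dom mon dow  command
0 2 * * * cd /path/to/project && bundle exec cucumber features/nightly.feature >> /tmp/cucumber.log 2>&1
```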
I solved the problem by simply installing Jenkins on my machine from its official site, https://jenkins-ci.org/. I configured the master and slave nodes on my own machine because I only needed to run one feature file (it contains the script I want to run on a daily basis).
That said, it is recommended to configure the slave on a different machine if you have multiple jobs to run and the jobs are resource-intensive.
There is a very good illustration of installing, configuring and running jobs at this link: http://yakiloo.com/setup-jenkins-and-windows/
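For what it's worth, the daily schedule itself is just a cron-style expression under the job's "Build periodically" trigger, for example:

```
# run once a day during hour 0; Jenkins' H token picks a stable pseudo-random minute
H 0 * * *
```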

how to add GUI to a hadoop program?

I use Hadoop to write MapReduce programs that can be deployed to EC2 or a local cluster. I am fine with using the command line to run the programs, but is there any way to add an interface to a Hadoop program, so that users just need to click to run it instead of using the command line? Thanks!
I'm not sure exactly what you want, but I think you're asking whether there is a UI for submitting MapReduce jobs to Hadoop? If so, you should try Hue: http://cloudera.github.com/hue/

how can I run a java program automatically

I have a Java package,
and I want my program to run automatically every night at midnight.
How can I do this?
Generally you have two solutions:
Create an application that runs your code every night, i.e. implement the scheduling yourself. Obviously you can (and should) use tools that help you do the scheduling.
Use OS-specific tools: for example, cron on Unix and Task Scheduler on Windows.
You can either schedule it in your own OS (on *nix there is cron; I'm not sure what is used on Windows),
or you can make your own Java program do the scheduling: when run, it sets a timer to execute your task at a specific time, as sketched below.
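A minimal sketch of that self-scheduling approach using only the standard library (Java 8+); the task body is a placeholder:

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class NightlyRunner {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        // compute the delay from now until the next midnight
        LocalDateTime nextMidnight = LocalDate.now().plusDays(1).atStartOfDay();
        long initialDelaySec = Duration.between(LocalDateTime.now(), nextMidnight).getSeconds();
        // run the (placeholder) task at midnight, then every 24 hours
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("running nightly task"),
                initialDelaySec,
                TimeUnit.DAYS.toSeconds(1),
                TimeUnit.SECONDS);
    }
}
```

Note that a fixed 24-hour period drifts across daylight-saving changes; a proper scheduler handles that for you.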
You could use Thread.sleep(), counting the time from now until midnight, but that's a poor man's solution. Quartz is your man, as it is built for scheduling your tasks.
If you choose the scheduling path, don't forget to run your application at OS startup.
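A rough sketch of the Quartz route (Quartz 2.x API, with a placeholder job body); the cron expression "0 0 0 * * ?" fires every day at midnight:

```java
import org.quartz.*;
import org.quartz.impl.StdSchedulerFactory;

public class NightlyJob implements Job {
    @Override
    public void execute(JobExecutionContext context) {
        System.out.println("running nightly task"); // placeholder work
    }

    public static void main(String[] args) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = JobBuilder.newJob(NightlyJob.class)
                .withIdentity("nightlyJob")
                .build();
        // fire at 00:00:00 every day
        Trigger trigger = TriggerBuilder.newTrigger()
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0 0 * * ?"))
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}
```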
