In Apache NiFi, is there any command to start a particular process group from the command prompt?
I was able to do this using curl commands against the NiFi REST API.
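For example, a minimal sketch using the REST API's schedule-components endpoint (the host, port, and process group ID are placeholders for your environment):

# Start all components in a process group via the NiFi REST API
curl -X PUT -H 'Content-Type: application/json' \
  -d '{"id":"<process-group-id>","state":"RUNNING"}' \
  http://localhost:8080/nifi-api/flow/process-groups/<process-group-id>

Sending "state":"STOPPED" to the same endpoint stops the group. Note that a secured NiFi instance will also require an authentication token on the request.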
I run Apache Storm in a cluster and I am looking for ways to stop and/or restart Nimbus, the Supervisor, and the UI. Would writing a service help? What should I write in this service file, and where should I place it? Thank you in advance.
Yes, writing a service is the recommended way to run Storm. The commands you want are storm nimbus to start Nimbus (minimum 1 per cluster), storm supervisor to run the supervisor (1 per worker machine), storm ui (1 per cluster), and storm logviewer (1 per worker machine). There are other commands as well; running storm with no arguments will print the full list.
Regarding how to write the service, take a look at the upstart cookbook http://upstart.ubuntu.com/cookbook/.
There's an example script you can probably use to get started: https://unix.stackexchange.com/a/84289
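For a concrete starting point, here is a minimal upstart job sketch for Nimbus; the /opt/storm path and the storm user are assumptions, so adjust them for your installation:

# /etc/init/storm-nimbus.conf -- minimal upstart job for Nimbus
description "Apache Storm Nimbus"
start on runlevel [2345]        # start at boot
stop on runlevel [016]          # stop on halt, single-user, reboot
respawn                         # restart the daemon if it dies
setuid storm                    # assumed service account
chdir /opt/storm                # assumed install directory
exec /opt/storm/bin/storm nimbus

Analogous jobs for storm supervisor, storm ui, and storm logviewer differ only in the exec line.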
You can set them up as services that start when the node boots, and the same services can be used to stop them:
/etc/rc.d/SERVICE start|stop|restart
You can also stop them manually: run ps aux | grep nimbus (or supervisor, etc.) to find the process ID, then terminate it with the kill command.
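A quick sketch of that sequence (confirm the pattern matches only the daemon you intend to stop):

ps aux | grep [n]imbus     # the [n] trick keeps grep from matching its own process
kill $(pgrep -f nimbus)    # or kill <pid> using the PID from the line above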
I have to run multiple Spark jobs one by one in a sequence, so I am writing a shell script. One way I can do this is to check for a success file in the output folder to get the job status, but I want to know whether there is any other way to check the status of a spark-submit job from a Unix script on the machine where I run my jobs.
You can use the command
yarn application -status <APPLICATION ID>
where <APPLICATION ID> is your application ID, and check for a line like:
State : RUNNING
This will give you the status of your application.
To list the applications run via YARN, you can use the command
yarn application -list
You can also add -appTypes to limit the listing based on the application type.
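To drive a sequential shell script off that status, one possible sketch is to poll until the application reaches a terminal state (the application ID argument and the exact output format are assumptions; verify them against your YARN version):

#!/usr/bin/env bash
# Poll YARN until the given application reaches a terminal state.
APP_ID="$1"   # e.g. application_1234567890123_0001
while true; do
  STATE=$(yarn application -status "$APP_ID" 2>/dev/null | grep -w 'State' | head -n1 | awk '{print $3}')
  case "$STATE" in
    FINISHED|FAILED|KILLED) break ;;   # terminal YARN application states
    *) sleep 10 ;;                     # still submitted/accepted/running
  esac
done
echo "Application $APP_ID ended in state: $STATE"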
I need to get the ID of a specific Hadoop job.
In my case, I launch a sqoop command remotely and I want to verify the job status with this command:
hadoop job -status job_id | grep -w 'state'
I can get this information from the GUI, but I want to do it from the command line.
Can anyone help me?
You can use the YARN REST APIs, via your browser or curl from the command line. They will list all the currently running and previously run jobs, including sqoop and the MapReduce jobs that sqoop generates and executes. Use the UI first: if you have it up and running, just point your browser to http://<host>:8088/cluster (I'm not sure the port is the same on all Hadoop distributions; I believe 8088 is the default on Apache). Alternatively, you can use yarn commands directly, e.g. yarn application -list.
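As a sketch, the same information is exposed as JSON by the ResourceManager's REST endpoint (the host is a placeholder; 8088 is the Apache default port):

curl -s "http://<host>:8088/ws/v1/cluster/apps?states=RUNNING"   # JSON list of running applications

Dropping the states parameter returns previously run applications as well, which is where you can pick out the sqoop job's application ID.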
I have just installed the Cloudera VM setup for Hadoop. But when I open a terminal and try to start all the Hadoop daemons with the command start-all.sh, I get an error stating "bash: start-all.sh: command not found".
I have tried start-dfs.sh too, yet it still gives the same error. When I use the jps command, I can see that none of the daemons have been started.
You can find the start-all.sh and start-dfs.sh scripts in the bin or sbin folder. You can use the following command to locate them: go to the Hadoop installation folder and run
find . -name 'start-all.sh' # Finds files having name similar to start-all.sh
Then you can start all the daemons by invoking the script with its full path: bash /path/to/start-all.sh
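Putting the two steps together, a small sketch (the installation path is an assumption; adjust it for your distribution):

cd /usr/lib/hadoop                                  # assumed Hadoop install directory
SCRIPT=$(find . -name 'start-all.sh' | head -n1)    # locate the script
bash "$SCRIPT"                                      # run it to start the daemons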
If you're using the QuickStart VM, then the right way to start the cluster (as @cricket_007 hinted) is by restarting it in the Cloudera Manager UI. The start-all.sh scripts will not work, since those only apply to the Hadoop servers (NameNode, DataNode, ResourceManager, NodeManager, ...) but not to all the services in the ecosystem (such as Hive, Impala, Spark, Oozie, Hue, ...).
You can refer to the YouTube video and the official documentation Starting, Stopping, Refreshing, and Restarting a Cluster
I have installed a Cloudera cluster on AWS EC2 instances.
I can easily start or stop it using Cloudera Manager.
But now I want to write a shell script that can start or stop it.
What is the command line to start and stop the cluster and all its services?
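One possible sketch is to call the Cloudera Manager REST API from the script; the API version, credentials, and cluster name below are assumptions, so check them against your CM installation (7180 is the default Cloudera Manager port):

CM_HOST="<cm-host>"       # Cloudera Manager host
CLUSTER="Cluster%201"     # URL-encoded cluster name, e.g. "Cluster 1"
# Stop every service in the cluster
curl -u admin:admin -X POST "http://$CM_HOST:7180/api/v19/clusters/$CLUSTER/commands/stop"
# Start every service in the cluster
curl -u admin:admin -X POST "http://$CM_HOST:7180/api/v19/clusters/$CLUSTER/commands/start"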