Stopping Flume Agent - hadoop

I have a requirement where I want to run a Flume agent with a spooling directory as the source. After all the files from the spool directory are copied to HDFS (the sink), I want the agent to stop, since I know all the files have been pushed to the channel.
Also, I want to run these steps for a different spooling directory each time and stop the agent once all files in the directory are marked as .COMPLETED.
Is there any way to stop the Flume agent?

For now I can only suggest that you go to the terminal where the Flume agent is running, press Ctrl+C, and the agent is gone.

Two ways to stop the Flume agent:
Go to the terminal where the Flume agent is running and press Ctrl+C to forcefully kill the agent.
Run jps from any terminal and look for the 'Application' process. Note down its process ID and then run kill -9 on it to terminate the process.
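A minimal sketch of that second approach, assuming the agent shows up under the usual 'Application' name in jps:
jps | grep Application    # the Flume agent JVM normally appears as 'Application'; note the PID next to it
kill -9 <pid>             # replace <pid> with the PID noted from jps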

Open another session window, then use the command below:
ps -ef | grep flume
Take the process ID from the output, and use the command below to kill it:
kill -9 process_id
This worked for me.

Related

Kill hive queries without exiting from hive shell

Is there any way we can kill a Hive query without exiting from the Hive shell? For example, I mistakenly ran a SELECT statement on a table that has millions of rows of data; I just wanted to stop it without exiting from the shell. When I press Ctrl+Z, it drops out of the shell.
You have two options:
press Ctrl+C and wait until the command terminates; it will not exit the Hive CLI. Pressing Ctrl+C a second time terminates the session immediately, exiting to the shell
from another shell run
yarn application -kill <Application ID> or
mapred job -kill <JOB_ID>
First, look for Job ID by:
hadoop job -list
And then kill it by ID:
hadoop job -kill <JOB_ID>
Go with the second option:
yarn application -kill <Application ID>. Get the application ID from another session.
This is the only way I think you would be able to kill the current query. I use Beeline on the Hortonworks platform.
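A sketch of how you might look the ID up from that other session; the application ID below is a placeholder, copy the real one from the list output:
yarn application -list                                   # shows running YARN applications and their IDs
yarn application -kill application_1234567890123_0042    # hypothetical ID taken from the list above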

How to keep logstash running even when I logout from the remote server

I am connecting to a remote host via an SSH login and running Logstash with the following command:
$./logstash -f first-pipeline.conf
However, after I log out from the server, Logstash stops running. How can I keep it running even after I log out? Thanks.
Another approach is to use the screen command which can be very useful for this.
First you open your SSH session, then type screen at the prompt. That opens a new session in which you can run your logstash command.
When it is running, you simply press Ctrl+a then d to detach yourself from that screen, and you can safely log out.
Whenever you log back into your SSH session, you enter screen -r and you will get back into your previous session where logstash was started.
You can create as many "screens" as you wish to start many different processes at different times.
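Putting that together, the workflow might look like this, assuming the same first-pipeline.conf from the question:
screen                               # start a new screen session
./logstash -f first-pipeline.conf    # run Logstash inside the screen
# press Ctrl+a then d to detach, then log out of SSH
screen -r                            # after logging back in, reattach to the running session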
Also see this comparison between using nohup and screen
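For comparison, a nohup-based sketch would be something like the following; output goes to nohup.out by default, and the trailing & puts the process in the background so it survives logout:
nohup ./logstash -f first-pipeline.conf &
ps -ef | grep logstash    # later, find the PID if you need to stop it
kill <pid>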
Just run it as an agent
$ logstash agent -f ~/logstash/pipeline.conf

How to Kill Hadoop fs -copyToLocal task

I ran the following command on my local filesystem:
hadoop fs -copyToLocal <HDFS Path>
But, in the middle of the task (after issuing the command in the terminal and before it completes), I want to cancel the copy. How can I do this?
Also, is -copyToLocal executed as an MR job internally? Can someone point me to a reference?
Thanks.
It uses the FileSystem API to stream and copy the file to the local filesystem. There is no MR.
You could find the process on the machine and kill it; it is usually a JVM process that gets invoked.
If you started the command with nohup and/or &, you can find it by searching for copyToLocal in the output of ps -eaf; if you ran it in the foreground, you can press Ctrl+C to kill it (Ctrl+Z only suspends the process).
In both cases, any partial or temporary files that were created remain on disk, so after killing the process you have to clear them out before running the same copy again.
It will not create any MR job.
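A sketch of the ps-based route; the grep pattern is just whatever appears on the command line, so adjust it to match your ps output:
ps -eaf | grep copyToLocal    # find the JVM running the copy
kill <pid>                    # kill it using the PID from the output above
# remember to remove any partially copied file from the local destination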

How to kill a NOHUP process launched via SSH

I started a rar archive of a huge folder, forgetting to split it into multiple rar archives.
How can I stop the process?
Log in again, use ps (e.g. ps -ef) to find the relevant process ID, then kill it with kill.
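A sketch, assuming the archiver shows up as rar in the process list:
ps -ef | grep rar    # find the PID of the rar process
kill <pid>           # send SIGTERM; fall back to kill -9 <pid> only if it will not stop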

Kafka in supervisor mode

I'm trying to run Kafka in supervised mode so that it can start automatically in case of a shutdown. But all the examples of running Kafka use shell scripts, and supervisord is not able to tell which PID to monitor. Can anyone suggest how to accomplish auto restart of Kafka?
If you are on a Unix or Linux machine, then this is where /etc/inittab comes in handy. Or you might want to use daemontools. I don't know about Windows, though.
We are running Kafka under Supervisord (http://supervisord.org/), and it works like a charm. The run command looks like this (as specified in the supervisord.conf file):
command=/usr/local/bin/pidproxy /var/run/kafka.pid /usr/lib/kafka/bin/kafka-server.sh -f -p /var/run/kafka.pid
The -f flag tells Kafka to start in the foreground. If the -p flag is set, the Kafka process PID is written to the specified file.
The pidproxy command is part of the Supervisord distribution. Upon receiving a kill signal, it reads the PID from the specified file and forwards the signal to the corresponding process.
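For context, a minimal sketch of the surrounding supervisord.conf entry; the command line is the one above, while the autostart/autorestart settings are the part that gives you the automatic restart:
[program:kafka]
command=/usr/local/bin/pidproxy /var/run/kafka.pid /usr/lib/kafka/bin/kafka-server.sh -f -p /var/run/kafka.pid
autostart=true          ; start Kafka when supervisord starts
autorestart=true        ; restart Kafka if it exits unexpectedly
stopsignal=TERM         ; signal that pidproxy forwards to the Kafka process on stop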
