Hive disable history logs and query logs - hadoop

We are using Hive on our production machines, but it generates a lot of job logs in the /tmp/<user.name>/ directory. We would like to disable this logging as we don't need it, but we can't find any option to do so. Some of the answers we checked required modifying the hive-log4j.properties file, but the only file available in /usr/lib/hive/conf is hive-site.xml.
When starting Hive, it prints the following:
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.7.0.jar!/hive-log4j.properties
Hive history file=/tmp/adqops/hive_job_log_79c7f1c2-b4e5-4b7b-b2d3-72b032697bb5_1000036406.txt
So it seems that the hive-log4j.properties file is bundled inside a jar, and we can't modify it.
Hive Version: hive-hwi-0.10.0-cdh4.7.0.jar
Any help/solution is greatly appreciated.
Thanks.

Since Hive expects a custom properties file name, I guess you cannot use the usual trick of setting -Dlog4j.configuration=my_custom_log4j.properties on the command-line.
So I fear you would have to edit hive-common-xxx.jar with some ZIP utility:
1. extract the default props file into /etc/hive/conf/ (or any other directory that will be at the head of the CLASSPATH)
2. delete the file from the JAR
3. edit the extracted file
Ex:
$ unzip -l /blah/blah/blah/hive-common-*.jar | grep 'log4j\.prop'
3505 12-02-2015 10:31 hive-log4j.properties
$ unzip /blah/blah/blah/hive-common-*.jar hive-log4j.properties -d /etc/hive/conf/
Archive: /blah/blah/blah/hive-common-1.1.0-cdh5.5.1.jar
inflating: /etc/hive/conf/hive-log4j.properties
$ zip -d /blah/blah/blah/hive-common-*.jar hive-log4j.properties
deleting: hive-log4j.properties
$ vi /etc/hive/conf/hive-log4j.properties
NB: proceed at your own risk... 0:-)
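For reference, a minimal sketch of the kind of edit you might make once the file is extracted (assuming the stock file defines hive.root.logger and hive.log.dir, as recent Hive builds do; inspect your extracted copy first):
$ grep -E '^hive\.' /etc/hive/conf/hive-log4j.properties    # inspect the defaults
$ sed -i 's/^hive.root.logger=.*/hive.root.logger=FATAL,DRFA/' /etc/hive/conf/hive-log4j.properties
$ sed -i 's|^hive.log.dir=.*|hive.log.dir=/var/log/hive|' /etc/hive/conf/hive-log4j.properties
Also note that the per-session history files from the question (hive_job_log_*.txt) are placed according to hive.querylog.location, which can be set in hive-site.xml, so relocating those may not require touching the log4j file at all.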

Effectively set the logging level to FATAL
hive --hiveconf hive.root.logger=DRFA --hiveconf hive.log.level=FATAL -e "<query>"
OR redirect logs into another directory and just purge the dir
hive --hiveconf hive.root.logger=DRFA --hiveconf hive.log.dir=./logs --hiveconf hive.log.level=DEBUG -e "<query>"
It will create a log file in the logs folder. Make sure the logs folder exists in the current directory.
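Putting it together, a sketch of the redirect-and-purge approach (the ./logs path is just an example; keep whatever log level you need):
$ mkdir -p ./logs
$ hive --hiveconf hive.root.logger=DRFA --hiveconf hive.log.dir=./logs -e "<query>"
$ rm -rf ./logs/*    # purge the redirected logs whenever convenient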

Related

Suppressing warnings for hadoop fs -get -p command

I am copying a huge number of files using the hadoop fs -get -p command because I want to retain the timestamps and ownerships. Many of the files cannot keep their ownership, as the user IDs do not exist on the local machine, so for these files I get: "get: chown: changing ownership '/a/b/c.txt': Operation not permitted".
Is it possible to suppress only this error? Redirecting with 2>/dev/null would suppress all errors, including other issues I might want to see, so I don't want to use that option. Is there any way to suppress ONLY the permission-related issues?
Any hint would be really helpful.
Not very elegant, but functional: use grep -v your_undesired_pattern
hadoop fs -get -p command 2>&1 | grep -v "changing ownership"
From the Hadoop side, no. The error is printed using System.err.println and is coming from the OS as the command execs chown.
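If you want to keep stdout untouched and filter only stderr, a bash-specific variant is possible (a sketch; the source and destination paths are placeholders):
hadoop fs -get -p /src/dir /local/dir 2> >(grep -v "changing ownership" >&2)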

File not found exception while starting Flume agent

I have installed Flume for the first time. I am using hadoop-1.2.1 and flume 1.6.0
I tried setting up a flume agent by following this guide.
I executed this command: $ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
It says log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: ./logs/flume.log (No such file or directory)
Isn't the flume.log file generated automatically? If not, how can I rectify this error?
Try this:
mkdir -p ./logs
sudo chown `whoami` ./logs
bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
The first line creates the logs directory in the current directory if it does not already exist. The second one sets the owner of that directory to the current user (you) so that flume-ng running as your user can write to it.
Finally, please note that this is not the recommended way to run Flume, just a quick hack to try it.
You are probably getting this error because you are running the command directly from the console in an arbitrary directory; you have to first go to Flume's bin directory and try running your command from there.
As @Botond says, you need to set the right permissions.
However, if you run Flume within a program, like supervisor or with a custom script, you might want to change the default path, as it's relative to the launcher.
This path is defined in your /path/to/apache-flume-1.6.0-bin/conf/log4j.properties. There you can change the line
flume.log.dir=./logs
to an absolute path of your choice - you still need the right permissions there, though.
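For example, a sketch of that change from the shell (assuming /var/log/flume as the new log directory; the install path is the placeholder from above):
$ sed -i 's|^flume.log.dir=.*|flume.log.dir=/var/log/flume|' /path/to/apache-flume-1.6.0-bin/conf/log4j.properties
$ sudo mkdir -p /var/log/flume
$ sudo chown `whoami` /var/log/flume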

How to view FsImage/Edit Logs file in hadoop

I'm a beginner in Hadoop. I want to view the fsimage and edit logs in Hadoop. I have searched many blogs, but nothing is clear. Can anyone please tell me the step-by-step procedure for viewing the edit log/fsimage file in Hadoop?
My version: Apache Hadoop: Hadoop-1.2.1
My install directory is /home/students/hadoop-1.2.1
I'm listing the steps I have tried, based on some blogs.
Ex.1. $ hdfs dfsadmin -fetchImage /tmp
Ex.2. hdfs oiv -i /tmp/fsimage_0000000000000001386 -o /tmp/fsimage.txt
Nothing works for me.
It shows that hdfs is not a directory or a file.
For edit log, navigate to
/var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current
then run:
ls -l
to see the complete name of the edits file you want to convert; then:
hdfs oev -i editFileName -o /home/youraccount/Desktop/edits_your.xml -p XML
For the fsimage:
hdfs oiv -i fsimage -o /home/youraccount/Desktop/fsimage_your.xml
Go to the bin directory and try to execute the same commands
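A sketch for the Hadoop 1.2.1 layout from the question (Hadoop 1.x ships a bin/hadoop launcher but no bin/hdfs script, which is why the shell complains about hdfs; the default path below is an assumption to verify against your configuration):
$ cd /home/students/hadoop-1.2.1
$ grep -A1 'dfs.name.dir' conf/hdfs-site.xml      # shows the namenode directory, if it is set there
$ ls -l /tmp/hadoop-$USER/dfs/name/current        # default location when dfs.name.dir is not set
If your build includes the offline viewers, they are invoked through bin/hadoop (for example bin/hadoop oiv ...); otherwise the hdfs oiv/oev commands above require a Hadoop 2.x client.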

How to import/export hbase data via hdfs (hadoop commands)

I have saved my data, crawled by Nutch, in HBase, whose file system is HDFS. Then I copied my data (one HBase table) from HDFS directly to a local directory with the command
hadoop fs -copyToLocal /hbase/input ~/Documents/output
After that, I copied the data back to another HBase instance (on another system) with the following command
hadoop fs -copyFromLocal ~/Documents/input /hbase/mydata
It is saved in HDFS, and when I use the list command in the hbase shell, it shows up as another table, i.e. 'mydata', but when I run the scan command, it says there is no table named 'mydata'.
What is problem with above procedure?
In simple words:
I want to copy an HBase table to my local file system using a hadoop command.
Then, I want to save it directly to HDFS on another system using a hadoop command.
Finally, I want the table to appear in HBase and display the same data as the original table.
If you want to export the table from one HBase cluster and import it into another, use any one of the following methods:
Using Hadoop
Export
$ bin/hadoop jar <path/to/hbase-{version}.jar> export \
<tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
NOTE: Copy the output directory in HDFS from the source cluster to the destination cluster.
Import
$ bin/hadoop jar <path/to/hbase-{version}.jar> import <tablename> <inputdir>
Note: Both outputdir and inputdir are in hdfs.
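For the copy step in the NOTE above, one option is distcp between the two clusters (a sketch; the NameNode addresses and paths are placeholders):
$ hadoop distcp hdfs://source-nn:8020/user/me/table_export hdfs://dest-nn:8020/user/me/table_export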
Using Hbase
Export
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export \
<tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
Copy the output directory in HDFS from the source cluster to the destination cluster.
Import
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
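For instance, a sketch with placeholder table and path names (the Import tool expects the destination table to already exist, so create it with the same column families first; 'cf' here is a placeholder):
On the source cluster:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export mytable /user/me/mytable_export
On the destination cluster, after copying /user/me/mytable_export across:
$ echo "create 'mytable', 'cf'" | bin/hbase shell
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import mytable /user/me/mytable_export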
Reference: Hbase tool to export and import
Alternatively, if you can use HBase commands to back up HBase tables, you can use the HBase ExportSnapshot tool, which copies the HFiles, logs, and snapshot metadata to another filesystem (local/HDFS/S3) using a MapReduce job.
Take snapshot of the table
$ ./bin/hbase shell
hbase> snapshot 'myTable', 'myTableSnapshot-122112'
Export to the required file system
$ ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to fs://path_to_your_directory
You can export it back from the local file system to hdfs://srv2:8082/hbase and run the restore command from the hbase shell to recover the table from the snapshot.
$ ./bin/hbase shell
hbase> disable 'myTable'
hbase> restore_snapshot 'myTableSnapshot-122112'
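For the "export it back" step mentioned above, a sketch of exporting the snapshot straight to the destination cluster's HDFS (the destination URL and mapper count are placeholders):
$ ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot myTableSnapshot-122112 -copy-to hdfs://srv2:8082/hbase -mappers 16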
Reference: Hbase Snapshots

pig beginner's example [unexpected error]

I am new to Linux and Apache Pig. I am following this tutorial to learn pig:
http://salsahpc.indiana.edu/ScienceCloud/pig_word_count_tutorial.htm
This is a basic word counting example. The data file 'input.txt' and the program file 'wordcount.pig' are in the Wordcount package, linked on the site.
I already have Pig 0.11.1 downloaded on my local machine, as well as Hadoop, and Java 6.
When I downloaded the Wordcount package it took me to a "tar.gz" file. I am unfamiliar with this type, and wasn't sure how to extract it.
It contains the files 'input.txt','wordcount.pig' and a Readme file. I saved 'input.txt' to my Desktop. I wasn't sure where to save wordcount.pig, and decided to just type in the commands line by line in the shell.
I ran pig in local mode as follows: pig -x local
and then I just copy-pasted each line of the wordcount.pig script at the grunt> prompt like this:
A = load '/home/me/Desktop/input.txt';
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
C = group B by word;
D = foreach C generate COUNT(B), group;
dump D;
This generates the following errors:
...
Retrying connect to server: localhost/127.0.0.1:8021. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2043: Unexpected error during execution.
My questions:
1.
Should I be saving 'input.txt' and the original 'wordcount.pig' script to some special folder inside the directory pig-0.11.1? That is, create a folder called word inside pig-0.11.1 and put 'wordcount.pig' and 'input.txt' there and then type in "wordcount.pig" from the grunt> prompt ???
In general, if I have data in say, 'dat.txt', and a script say, 'program.pig', where should I be saving them to run 'program.pig' from the grunt shell??? I think they should both go in pig-0.11.1,so I can do $ pig -x local wordcount.pig, but I am not sure.
2.
Why am I not able to run the script line by line as I tried to?
I have specified the location of the file 'input.txt' in the load statement.
So why does it not just run the commands line by line and dump the contents of D to my screen???
3.
When I try to run Pig in mapreduce mode using $pig, it gives this error:
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-06-03 23:57:06,956 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
This error indicates that Pig is unable to connect to Hadoop to run the job. You say you have downloaded Hadoop -- have you installed it? If you have installed it, have you started it up according to its docs -- have you run the bin/start-all.sh script? Using -x local tells Pig to use the local filesystem instead of HDFS, but it still needs a running Hadoop instance to perform the execution. Before trying to run Pig, follow the Hadoop docs to get your local "cluster" set up and make sure your NameNode, DataNodes, etc. are up and running.
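A quick way to check, assuming a pseudo-distributed Hadoop 1.x install (the install path is a placeholder):
$ cd /path/to/hadoop-1.2.1
$ bin/start-all.sh
$ jps    # expect NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker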
The 2043 error occurs when Hadoop and Pig fail to communicate with each other.
Never do a right click --> Extract Here when dealing with tar.gz files.
You should always run tar -xzvf *.tar.gz in a terminal to extract them.
I noticed that Pig doesn't get installed properly when you right-click the pig..tar.gz file and select Extract Here; it's better to run tar -xzvf pig..tar.gz from the terminal.
Make sure you are running Hadoop before you execute commands like pig -x local.
If you want to run *.pig files from the grunt> prompt, use:
grunt> exec *.pig
If you want to run pig files outside the grunt> prompt, use:
$ pig -x local *.pig
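For example, using the paths from the question (assuming wordcount.pig was saved alongside input.txt on the Desktop):
grunt> exec /home/me/Desktop/wordcount.pig
$ pig -x local /home/me/Desktop/wordcount.pig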
