I'm running a Hadoop job and its output is displayed on the console.
Is there a way for me to redirect the output to a file? I tried the command below to redirect the output, but it does not work.
hduser@vagrant:/usr/local/hadoop$ hadoop jar share/hadoop/mapreduce/hadoop*examples*.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output > joboutput
You can redirect the error stream to a file; that is where Hadoop writes the job's progress output. That is, use:
hadoop jar share/hadoop/mapreduce/hadoop*examples*.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output 2>joboutput
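If you want to capture everything the job prints, both stdout and stderr, while still watching it scroll by on the console, you can merge the streams and tee them to a file (a minimal sketch; joboutput is just an illustrative file name):
hadoop jar share/hadoop/mapreduce/hadoop*examples*.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output 2>&1 | tee joboutput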
If you are running the examples from the Hadoop homepage (https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html), the output will be written to
/user/hduser/gutenberg-output
on HDFS, not on the local file system.
You can see the output via
hadoop fs -text /user/hduser/gutenberg-output/*
And to dump that output to a local file
hadoop fs -text /user/hduser/gutenberg-output/* > local.txt
The -text option will decompress the data so you get textual output in case you have some type of compression enabled.
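Alternatively, if you just want the raw output files locally, -getmerge concatenates all the part files in the output directory into a single local file (note it will not decompress anything, unlike -text):
hadoop fs -getmerge /user/hduser/gutenberg-output local.txt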
Related
I am running a hadoop distcp command as below:
hadoop distcp src-loc target-loc
I want to know the size of the data copied by running this command.
I am planning to run the command on Qubole.
Any help is appreciated.
Run the following command:
hadoop dfs -dus -h target-loc
225.2 G target-loc
It will print a human-readable summary for target-loc.
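Note that -dus (and the hadoop dfs entry point itself) is deprecated in newer Hadoop releases; the modern equivalent is:
hdfs dfs -du -s -h target-loc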
The problem I am facing is that when I run hadoop fs -ls, it throws this message:
ls: `.': No such file or directory
For reference, the output of my jps command is:
18276 SecondaryNameNode
19684 Jps
17942 NameNode
18566 NodeManager
18441 ResourceManager
First you should have a DataNode running to store the data; otherwise you will not be able to work with hadoop fs (the file system shell).
Try to start all services:
$ start-all.sh
$ jps
Ensure that the DataNode is running and that nothing is blocking it.
Then try:
$ hadoop fs -ls /
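Note that the jps listing in your question shows no DataNode at all, which points to this problem. After a successful start, jps on a single-node setup should show something like the following (the PIDs here are just illustrative):
17942 NameNode
18100 DataNode
18276 SecondaryNameNode
18441 ResourceManager
18566 NodeManager
19684 Jps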
When you don't pass any argument to the hadoop fs -ls command, the default HDFS directory it tries to list is /user/{your_user_name}.
The problem in your case could be that this HDFS directory does not exist.
Try running hadoop fs -ls /user/ to see which directories are created for which users.
You can also just create your user's default HDFS directory. Running the command below will fix your error:
hadoop fs -mkdir -p /user/$(whoami)
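After that, a plain hadoop fs -ls will succeed (and print nothing until the directory has contents). A quick illustrative session, where somefile.txt is a hypothetical local file:
hadoop fs -mkdir -p /user/$(whoami)
hadoop fs -put somefile.txt .
hadoop fs -ls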
I want to store some .tbl files in Hadoop.
I am using this command: hadoop fs -put customer.tbl
But I'm getting:
Usage: java FsShell [-put <localsrc> ... <dst>]
If I do hadoop fs -cat customer.tbl, it says that the file does not exist.
It seems like you need to provide both the local source and the HDFS destination.
Can you try adding a destination?
e.g. hadoop fs -put customer.tbl .
Please also try executing ls on HDFS:
hadoop fs -ls
or using the hdfs command, which should be found under hadoop-<version-number>/bin/:
hdfs dfs -ls
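Tying this to the earlier answer about missing home directories: if hadoop fs -put customer.tbl . still fails, your /user/<name> directory may not exist yet. A sketch, assuming the same file name:
hadoop fs -mkdir -p /user/$(whoami)
hadoop fs -put customer.tbl /user/$(whoami)/customer.tbl
hadoop fs -ls /user/$(whoami)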
I was trying to unzip a zip file stored in the Hadoop file system and store it back in the Hadoop file system. I tried the following commands, but none of them worked.
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop fs -put - /tmp/
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop fs -put - /tmp
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop put - /tmp/
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop put - /tmp
I get errors like gzip: stdin has more than one entry--rest ignored, cat: Unable to write to output stream., and Error: Could not find or load main class put on the terminal when I run those commands. Any help?
Edit 1: I don't have access to a UI, so only command lines are allowed. The unzip/gzip utilities are installed on my Hadoop machine. I'm using Hadoop version 2.4.0.
To unzip a gzipped (or bzip2-compressed) file, I use the following (for bzip2, substitute bzip2 -d for gzip -d):
hdfs dfs -cat /data/<data.gz> | gzip -d | hdfs dfs -put - /data/
If the file sits on your local drive, then
zcat <infile> | hdfs dfs -put - /data/
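Note that a .zip archive is not a gzip file, which is why gzip complains about more than one entry. If the funzip utility from the Info-ZIP package happens to be installed (an assumption; check your machine), it can extract the first entry of a zip straight from a pipe:
hadoop fs -cat /tmp/test.zip | funzip | hdfs dfs -put - /tmp/test.txt
This only helps for single-entry archives; multi-entry zips still need to be extracted locally.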
Most of the time I use HDFS fuse mounts for this, so you could just do:
$ cd /hdfs_mount/somewhere/
$ unzip file_in_hdfs.zip
http://www.cloudera.com/content/www/en-us/documentation/archive/cdh/4-x/4-7-1/CDH4-Installation-Guide/cdh4ig_topic_28.html
Edit 1/30/16: In case you use HDFS ACLs: in some cases fuse mounts don't adhere to HDFS ACLs, so you'll be able to do file operations that are permitted by basic unix access privileges. See https://issues.apache.org/jira/browse/HDFS-6255 (comments at the bottom; I recently asked to have it reopened).
To stream the data through a pipe into Hadoop, you need to use the hdfs command:
cat mydatafile | hdfs dfs -put - /MY/HADOOP/FILE/PATH/FILENAME.EXTENSION
gzip can read data from stdin (use -c to keep its output on stdout), and hadoop fs -put doesn't support reading the data from stdin, which is why the hdfs dfs form is used here.
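The same pattern works in the other direction too, compressing on the fly while writing into HDFS; here mydatafile and the target path are just placeholders:
cat mydatafile | gzip -c | hdfs dfs -put - /MY/HADOOP/FILE/PATH/FILENAME.EXTENSION.gz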
I tried a lot of things and nothing would help. I can't find zip input support in Hadoop, so it left me no choice but to download the file to the local file system, unzip it, and upload it to HDFS again.
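For reference, that round trip looks something like the following, reusing /tmp/test.zip from the question (the local scratch paths are arbitrary):
hadoop fs -get /tmp/test.zip /tmp/
unzip /tmp/test.zip -d /tmp/unzipped/
hadoop fs -put /tmp/unzipped/* /tmp/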
I am using HDP Mahout version 0.8. I have set MAHOUT_LOCAL="". When I run Mahout, I see the message HADOOP LOCAL NOT SET RUNNING ON HADOOP, but my program is not writing output to the HDFS directory.
Can anyone tell me how to make my Mahout program take input from HDFS and write output to HDFS?
Did you add $MAHOUT_HOME/bin and $HADOOP_HOME/bin to the PATH?
For example on Linux:
export PATH=$PATH:$MAHOUT_HOME/bin/:$HADOOP_HOME/bin/
export HADOOP_CONF_DIR=$HADOOP_HOME/conf/
Then almost all of Mahout's commands use the options -i (input) and -o (output).
For example:
mahout seqdirectory -i <input_path> -o <output_path> -chunk 64
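Since MAHOUT_LOCAL is unset and Mahout runs on Hadoop, those -i and -o paths are resolved against HDFS, so you can also pass fully qualified URIs; the directory names here are hypothetical:
mahout seqdirectory -i hdfs:///user/hduser/docs -o hdfs:///user/hduser/docs-seq -chunk 64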
Assuming you have built your Mahout jar so that it takes its input from and writes its output to HDFS, do the following from the hadoop bin directory:
./hadoop jar /home/kuntal/Kuntal/BIG_DATA/mahout-recommender.jar mia.recommender.RecommenderIntro --tempDir /home/kuntal/Kuntal/BIG_DATA --recommenderClassName org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender
# Specify the input/output args if required:
-Dmapred.input.dir=./ratingsLess.txt -Dmapred.output.dir=/input/output
Please check this:
http://chimpler.wordpress.com/2013/02/20/playing-with-the-mahout-recommendation-engine-on-a-hadoop-cluster/