Hadoop log files cannot be found - hadoop

I have configured Hadoop 2.7.2 on Windows. I couldn't find any logs for HDFS and YARN in the %HADOOP_HOME%\logs directory.
In Hadoop 2.5.2 there were two log files, hadoop.log and yarn.log, but in the new Hadoop version these log files no longer seem to be generated.
How can I enable these logs again to debug the services?
Thanks,
Kumar

To get hadoop.log and yarn.log you need to enable them as follows.
Open %HADOOP_HOME%\etc\hadoop\log4j.properties
Check the following properties:
hadoop.root.logger=INFO,console
hadoop.log.file=hadoop.log
hadoop.log.maxfilesize=200MB
hadoop.log.maxbackupindex=5
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
Then add set HADOOP_ROOT_LOGGER=INFO,RFA,console to hadoop-env.cmd and set YARN_ROOT_LOGGER=INFO,RFA,console to yarn-env.cmd.
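For reference, a minimal sketch of how the rolling file appender is usually wired up in the stock log4j.properties (the layout pattern shown is the common default, so treat it as an assumption if your file differs):
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
With the root logger including RFA, the daemons should write hadoop.log and yarn.log under ${hadoop.log.dir}, which by default is %HADOOP_HOME%\logs.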

Related

NiFi ListHDFS cannot find directory, FileNotFoundException

I have a pipeline in NiFi of the form listHDFS -> moveHDFS. Attempting to run the pipeline, we see the error log:
13:29:21 HSTDEBUG01631000-d439-1c41-9715-e0601d3b971c
ListHDFS[id=01631000-d439-1c41-9715-e0601d3b971c] Returning CLUSTER State: StandardStateMap[version=43, values={emitted.timestamp=1525468790000, listing.timestamp=1525468790000}]
13:29:21 HSTDEBUG01631000-d439-1c41-9715-e0601d3b971c
ListHDFS[id=01631000-d439-1c41-9715-e0601d3b971c] Found new-style state stored, latesting timestamp emitted = 1525468790000, latest listed = 1525468790000
13:29:21 HSTDEBUG01631000-d439-1c41-9715-e0601d3b971c
ListHDFS[id=01631000-d439-1c41-9715-e0601d3b971c] Fetching listing for /hdfs/path/to/dir
13:29:21 HSTERROR01631000-d439-1c41-9715-e0601d3b971c
ListHDFS[id=01631000-d439-1c41-9715-e0601d3b971c] Failed to perform listing of HDFS due to File /hdfs/path/to/dir does not exist: java.io.FileNotFoundException: File /hdfs/path/to/dir does not exist
Changing the listHDFS path to /tmp seems to run OK, thus making me think that the problem is with my permissions on the directory I'm trying to list. However, changing the NiFi user to a user that can access that directory (e.g. hadoop fs -ls /hdfs/path/to/dir) by setting the bootstrap.properties value run.as=myuser and restarting (see https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#bootstrap_properties) still seems to produce the same problem for the directory. The literal directory string being used that is not working is:
"/etl/ucera_internal/datagov_example/raw-ingest-tracking/version-1/ingest"
Does anyone know what is happening here? Thanks.
** Note: The Hadoop cluster I am accessing does not have Kerberos enabled (it is a secured MapR Hadoop cluster).
Update: It appears that the MapR Hadoop implementation is different enough that it requires special steps for NiFi to work properly on it (see https://community.mapr.com/thread/10484 and http://hariology.com/integrating-mapr-fs-and-apache-nifi/). I may not get a chance to work on this problem for some time to see if it still works (as certain requirements have changed), so I am dumping the links here for others who may have this problem in the meantime.
Could you make sure you have entered the correct path and that the directory actually exists in HDFS?
It seems the ListHDFS processor is not able to find the directory that you have configured in the Directory property, and the logs are not showing any permission-denied issues.
If the logs do show permission denied, then you can change the NiFi run-as user in bootstrap.conf; once you make changes to the NiFi properties, NiFi needs to be restarted to apply them. Alternatively, change the permissions on the directory so that NiFi has access.
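A quick way to check both existence and permissions from the command line is a pair of hadoop fs commands run as the same user NiFi runs as (a sketch, using the literal path from the question):
hadoop fs -ls -d "/etl/ucera_internal/datagov_example/raw-ingest-tracking/version-1/ingest"
hadoop fs -ls "/etl/ucera_internal/datagov_example/raw-ingest-tracking/version-1/ingest"
The first command shows the directory entry itself (owner, group, and permission bits); the second confirms the user can actually list its contents.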

How to change tmp directory in yarn

I have written an MR job and have run it in local mode with the following configuration settings:
mapred.local.dir=<<local directory having good amount of space>>
fs.default.name=file:///
mapred.job.tracker=local
on Hadoop 1.x
Now I am using Hadoop 2.x and running the same job with the same configuration settings, but I am getting the error:
Disk Out of Space
Is it that if I switch from Hadoop 1.x to 2.x (using Hadoop 2.6 jars), the same configuration settings to change the tmp dir no longer work?
What are the new settings to configure the "tmp" directory of MR1 (mapred API) on Hadoop 2.6?
Kindly advise.
Regards
Cheers :))
Many properties in 1.x have been deprecated and replaced with new properties in 2.x.
mapred.child.tmp has been replaced by mapreduce.task.tmp.dir
mapred.local.dir has been replaced by mapreduce.cluster.local.dir
Have a look at the complete list of deprecated properties and their new equivalents on the Apache website.
It can be done by setting
mapreduce.cluster.local.dir=<<local directory having good amount of space>>
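In mapred-site.xml the equivalent override might look like the sketch below (the path /data/mr-local is a placeholder for a local directory with plenty of space):
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>/data/mr-local</value>
</property>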

Technique to know the Default scheduler in hadoop

I have installed a multi-node setup on three Ubuntu 12.04 systems. I am using Hadoop 1.2.1 on all three. Now I want to know which scheduler is running by default.
How can I check the default scheduler running in Hadoop 1.2.1?
The default scheduler in Hadoop is JobQueueTaskScheduler, which is a FIFO scheduler. To confirm the default, refer to the property mapred.jobtracker.taskScheduler in mapred-default.xml. If you want, you can change the default scheduler to either the CapacityScheduler or the FairScheduler based on your requirements, as in the sketch below.
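For example, switching to the fair scheduler in Hadoop 1.x would mean overriding the property in mapred-site.xml roughly as follows (a sketch, assuming the contrib fair-scheduler jar, e.g. hadoop-fairscheduler-1.2.1.jar, is on the JobTracker classpath):
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>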
mapred-site.xml, which can be found inside the configuration directory, is used to override the default values inside mapred-default.xml. You may not find a mapred-default file in the configuration directory of the Hadoop binary distribution (rpm, deb, etc.); instead, mapred-default.xml can be found directly inside the jar file hadoop-core-1.2.1.jar.
hackzon:~/hadoop-1.2.1$ jar -tvf hadoop-core-1.2.1.jar | grep mapred-default.xml
47324 Mon Jul 22 15:12:48 IST 2013 mapred-default.xml
This file is used in the Hadoop source files mentioned below as an argument to the addDefaultResource() method:
addDefaultResource("mapred-default.xml"); // First
addDefaultResource("mapred-site.xml"); // Second
Initially mapred-default.xml is loaded, then mapred-site.xml, so properties that need to be overridden can be specified inside mapred-site.xml.
org.apache.hadoop.conf.Configuration.java
org.apache.hadoop.mapred.JobConf.java
org.apache.hadoop.mapred.TaskTracker.java
org.apache.hadoop.mapred.JobClient.java
org.apache.hadoop.mapred.JobTracker.java
org.apache.hadoop.mapred.JobHistoryServer.java
Have a look at any of these source files.
Go to your ResourceManager UI and, under "Tools", click on "Configuration", or simply type the URL. Replace <resource-manager> with your ResourceManager's domain name.
http://<resource-manager>:8088/conf
Search for any settings that you want.
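The same /conf servlet can also be queried from a shell, which is handy for grepping out one property (a sketch, assuming a YARN cluster with the ResourceManager web UI on the default port 8088; depending on how the XML is line-wrapped you may need the -A 1 to see the value):
curl -s "http://<resource-manager>:8088/conf" | grep -i -A 1 "yarn.resourcemanager.scheduler.class"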
After much hard work I finally figured out how to check the scheduler that is running in Hadoop 1.1.2. After running a word-count job, I went into the JobTracker web interface. There, go to Job History; to the right of the job file there is a link. Click on it and you will get everything, like the scheduler, dfs replication, etc.
Also, in Hadoop 1.1.2 it is the mapred-site.xml file where we need to add some properties, as specified in the Apache documentation for Hadoop 1.1.2.

Apache Hadoop 2.0.0 alpha version installation in a full cluster using federation

I had installed the stable Hadoop version successfully, but I am confused while installing Hadoop 2.0.0.
I want to install hadoop-2.0.0-alpha on two nodes, using federation on both machines. rsi-1 and rsi-2 are the hostnames.
What should be the values of the properties below for implementing federation? Both machines are also used as datanodes.
fs.defaultFS
dfs.federation.nameservices
dfs.namenode.name.dir
dfs.datanode.data.dir
yarn.nodemanager.localizer.address
yarn.resourcemanager.resource-tracker.address
yarn.resourcemanager.scheduler.address
yarn.resourcemanager.address
One more point: in the stable version of Hadoop I have the configuration files under the conf folder in the installation directory.
But in the 2.0.0-alpha version there is an etc/hadoop directory, and it doesn't have mapred-site.xml or hadoop-env.sh. Do I need to copy the conf folder under the share folder into the hadoop home directory, or do I need to copy these files from the share folder into the etc/hadoop directory?
Regards, Rashmi
You can run hadoop-setup-conf.sh in the sbin folder. It walks you step by step through the configuration.
Please remember that when it asks you to input a directory path, you should use the full path;
e.g., when it asks for the conf directory, you should input /home/user/Documents/hadoop-2.0.0/etc/hadoop
After it completes, remember to check every configuration file in etc/hadoop.
In my experience, I had to modify the JAVA_HOME variable in hadoop-env.sh and some properties in core-site.xml and mapred-site.xml.
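As a rough sketch of the kind of edits meant here (the JDK path and port 9000 are placeholder assumptions; rsi-1 is a hostname from the question):
In hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
In core-site.xml:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://rsi-1:9000</value>
</property>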
Regards

Where does HDFS store files locally by default?

I am running Hadoop with the default configuration on a one-node cluster, and would like to find out where HDFS stores files locally.
Any ideas?
Thanks.
You need to look in your hdfs-default.xml configuration file for the dfs.data.dir setting. The default setting is ${hadoop.tmp.dir}/dfs/data, and note that ${hadoop.tmp.dir} is actually defined in core-default.xml, described here.
The configuration options are described here. The description for this setting is:
Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
It seems like for the current version (2.7.1) the dir is
/tmp/hadoop-${user.name}/dfs/data
based on the dfs.datanode.data.dir and hadoop.tmp.dir settings from:
http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/core-default.xml
As "more recent answer" and to clarify hadoop version numbers:
If you use Hadoop 1.2.1 (or something similar), #Binary Nerd's answer is still true.
But if you use Hadoop 2.1.0-beta (or something similar), you should read the configuration documentation here and the option you want to set is: dfs.datanode.data.dir
For Hadoop 3.0.0, the HDFS root path is given by the property dfs.datanode.data.dir; a sketch of overriding it follows.
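To move block storage off the default /tmp location, a minimal hdfs-site.xml override might look like the sketch below (the /data/hdfs/datanode path is a placeholder; restart the DataNode after changing it):
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/hdfs/datanode</value>
</property>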
Run this in the cmd prompt, and you will get the HDFS location:
bin/hadoop fs -ls /
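If you just want to print the effective value of the storage property directly, hdfs getconf can do that on 2.x/3.x installs (a sketch; the output depends on your configuration):
hdfs getconf -confKey dfs.datanode.data.dir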
