Hadoop webuser: No such user - hadoop

While running a Hadoop multi-node cluster, I got the error message below in my master logs. Can someone advise what to do? Do I need to create a new user, or can I give my existing machine user name here?
2013-07-25 19:41:11,765 WARN org.apache.hadoop.security.UserGroupInformation: No groups available for user webuser
2013-07-25 19:41:11,778 WARN org.apache.hadoop.security.ShellBasedUnixGroupsMapping: got exception trying to get groups for user webuser
org.apache.hadoop.util.Shell$ExitCodeException: id: webuser: No such user
hdfs-site.xml file
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
I followed the tutorial at http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
Hadoop 1.2.0
jetty-6.1.26
After adding the dfs.web.ugi property, my hdfs-site.xml looks like this:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.web.ugi</name>
<value>hduser,hadoop</value>
</property>
</configuration>

Edit the dfs.web.ugi property in hdfs-site.xml and set it to a user and group that exist on your machine. Its default value is webuser,webgroup, which is why Hadoop tries to look up a user named webuser.

Related

How to access my Namenode GUI in hadoop outside the GCP instance in browser

I just set up a single-node Hadoop installation on a GCP instance. Running the jps command shows that all the processes are running fine.
I want to access the GUI of my NameNode. I am using http://localhost:50070/ in my laptop browser.
It shows "This site can’t be reached".
core-site.xml
hduser@laptop:~$ vi /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description></description>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>
</description>
</property>
</configuration>
Solution attempted:
I tried replacing the values in the <value> tags with the public DNS name of the GCP instance, but then the NameNode stopped working.
Does anyone have an idea what I am doing wrong?
I found the answer to this problem:
Use the instance's public IP and the port number in the browser instead of localhost.
Check your firewall settings: they must allow the inbound traffic (inbound rules on AWS, firewall rules on GCP).
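For the UI to be reachable from outside, the NameNode web server also has to listen on a non-loopback interface. A minimal sketch of a property for hdfs-site.xml, assuming a Hadoop 2.x installation where the property is named dfs.namenode.http-address. Its default there is already 0.0.0.0:50070, so this only makes the binding explicit:

```xml
<!-- Goes inside the <configuration> element of hdfs-site.xml -->
<property>
  <!-- Assumption: Hadoop 2.x property name; 0.0.0.0 listens on all interfaces -->
  <name>dfs.namenode.http-address</name>
  <value>0.0.0.0:50070</value>
</property>
```

With the firewall open for port 50070, the UI would then be opened from the laptop using the instance's public IP in place of localhost.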

Pseudo Distributed Mode Hadoop

I have installed Hadoop 2.7.3 in pseudo-distributed mode on a Mac and did all the configuration specified on Pluralsight. I copied a CSV file from the local file system to HDFS. But the next day, when I searched for the file, it was no longer present in HDFS; it had been removed automatically. Is there any other configuration setting so that my files are not lost?
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Thanks,
Add these properties to hdfs-site.xml:
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/username/hadoop-dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/username/hadoop-dfs/data</value>
</property>
By default, the metadata and data blocks are stored under hadoop.tmp.dir, which defaults to /tmp/hadoop-${user.name}. The contents of /tmp are deleted on reboot.
After adding these properties, format the NameNode and start the services.
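An alternative sketch, not part of the answer above: moving hadoop.tmp.dir itself out of /tmp in core-site.xml has a similar effect, because the default NameNode and DataNode directories are derived from it. The path below is a hypothetical example.

```xml
<!-- Goes inside the <configuration> element of core-site.xml -->
<property>
  <!-- Hypothetical path: any directory that survives reboots and is writable by the Hadoop user -->
  <name>hadoop.tmp.dir</name>
  <value>/home/username/hadoop-tmp</value>
</property>
```

Setting the two dfs.*.dir properties explicitly is the more targeted option. Either way, formatting the NameNode discards the existing HDFS metadata, so files copied earlier have to be uploaded again.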

ResourceManager got stuck in Accepted State

I am trying to integrate Elasticsearch 2.2.0 with Hadoop HDFS. In my environment, I have 1 master node and 1 data node. Elasticsearch is installed on my master node.
But while integrating it with HDFS, my ResourceManager application jobs get stuck in the ACCEPTED state.
I found a link that suggested changing my yarn-site.xml settings:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2200</value>
<description>Amount of physical memory, in MB, that can be allocated for containers.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>500</value>
</property>
I have done this as well, but it does not give me the expected result.
Configuration:
My core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
my mapred-site.xml,
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task. </description>
</property>
my hdfs-site.xml,
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
Please help me get my ResourceManager job into the RUNNING state so that I can use my Elasticsearch data on HDFS.
If the screenshot is correct, you have 0 NodeManagers, so the application can’t start running. You need to start at least 1 NodeManager so that the ApplicationMaster and, later, the tasks can be started.
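One common reason for a NodeManager not registering is that it cannot find the ResourceManager. A minimal yarn-site.xml sketch for the data node, assuming Hadoop 2.x (2.2 or later for the mapreduce_shuffle value) and that the master is reachable under the hostname master, which is a placeholder and not taken from the question:

```xml
<!-- Goes inside the <configuration> element of yarn-site.xml -->
<property>
  <!-- Assumption: "master" resolves to the host running the ResourceManager -->
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
<property>
  <!-- Shuffle service that MapReduce jobs need on each NodeManager -->
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
```

After restarting YARN, jps on the data node should list a NodeManager process, and the ResourceManager UI should report at least one active node.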

issue with hadoop secondary node

I am new to Hadoop. When I run the wordcount test project, everything works fine. But I can't access the JobTracker at http://localhost:50030. In fact, when I look at my secondary node log file, I see this exception message:
java.io.IOException: Bad edit log manifest (expected txid = 3: [[21,22], [23,24]
[8683,8684], [8685,8686], [8687,8688], [8689,8690], [8691,8692], [8693,8694], [8695,8696], [8697,8698], [8699,8700]]...
....
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:438)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:540)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:395)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357)
at java.lang.Thread.run(Thread.java:745)
By the way, when I run jps, I get:
53745 JobHistoryServer
77259 Jps
UPDATE: here's my config
in core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
in hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9010</value>
</property>
</configuration>
and nothing is set in my yarn-site.xml
If you are using a recent version of Hadoop (2.x or later), the JobTracker is not available. It has been replaced by the ResourceManager and the JobHistoryServer.
If you want to access past job details, go to http://hostname:19888. This is the web UI address of the JobHistoryServer.
Please refer to Hadoop Cluster Setup for further details.
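In a YARN-based setup, mapred.job.tracker no longer selects the execution framework. A minimal mapred-site.xml sketch, assuming Hadoop 2.x; the JobHistoryServer address shown is the documented default and is listed only to make the port explicit:

```xml
<!-- Goes inside the <configuration> element of mapred-site.xml -->
<property>
  <!-- Run MapReduce jobs on YARN instead of the removed JobTracker -->
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <!-- Web UI of the JobHistoryServer; 0.0.0.0:19888 is the default -->
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>0.0.0.0:19888</value>
</property>
```

Running applications are shown in the ResourceManager web UI, which listens on port 8088 by default.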

Failed to get system directory - hadoop

I am using a Hadoop multi-node setup (1 master, 1 slave).
After running start-mapred.sh on the master, I found the error below in the TaskTracker logs (on the slave):
org.apache.hadoop.mapred.TaskTracker: Failed to get system directory
Can someone help me understand what can be done to avoid this error?
I am using
Hadoop 1.2.0
jetty-6.1.26
java version "1.6.0_23"
mapred-site.xml file
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
<description>
define mapred.map tasks to be number of slave hosts
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>1</value>
<description>
define mapred.reduce tasks to be number of slave hosts
</description>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/workspace</value>
</property>
</configuration>
It seems that you just added hadoop.tmp.dir and started the job. You need to restart the Hadoop daemons after adding any property to the configuration files. You mentioned in your comment that you added this property at a later stage. This means that all the data and metadata, along with other temporary files, are still in the /tmp directory. Copy all of them into your /home/hduser/workspace directory, restart Hadoop, and rerun the job.
Do let me know the result. Thank you.
If this is a Windows PC and you are using Cygwin to run Hadoop, then the TaskTracker will not work.

Resources