I can't proceed hadoop example - hadoop

I do put README.txt file and do jar command but hdfs don't proceed anymore This is last terminal screen
I think "SASL encryption trust check" or "Unable to find 'resource-types.xml'" are the problem so I tried to insert
export HADOOP_SECURE_DN_USER=
to HADOOP-env.sh and insert
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
to mapred-site.xml
but It didn't work again
Hadoop version is 3.1.3
Java version is oracle java 1.8.0_212
hdfs-site.xml
core-site.xml
mapred-site.xml
yarn-site.xml
please help me...
This is 8088 page Is it YARN UI?

Related

Hadoop localhost:9870 browser interface is not working

I need to do data analysis using Hadoop. Therefore I have installed Hadoop and configured as below. But localhost:9870 is not working. Even I have format namenode every time I worked with that. Some articles and answers of this forum mentioned that 9870 is the updated one from 50070. I have win 10. I also referred answers in this forum but none of them worked. Java-home and hadoop-home paths are set. Paths to bin and sbin of hadoop are also set up. Can anyone please tell me what I am doing wrong in here?
I referred this site to do the installation and configuration.
https://medium.com/#pedro.a.hdez.a/hadoop-3-2-2-installation-guide-for-windows-10-454f5b5c22d3
core-site.xml
I have set up the Java path in this xml as well.
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9870</value>
</property>
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop-3.2.2\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop-3.2.2\data\datanode</value>
</property>
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
If you look at the namenode logs, it very likely has an error saying something about a port already being in use.
The default fs.defaultFS port should be 9000 - https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html ; you shouldn't change this without good reason.
The Namenode web UI isn't the value in fs.defaultFS. It's default port is 9870, and is defined by dfs.namenode.http-address in hdfs-site.xml
need to do data analysis
You can do analysis on Windows without Hadoop using Spark, Hive, MapReduce, etc. directly and it'll have direct access to your machine without being limited by YARN container sizes.

How to access s3 files using on prem hadoop cluster?

I have a cloudera VM and able to set up aws CLI and set up keys.But, I am not able to read s3 files or access s3 files using hadoop fs -ls s3://gft-ri or any hadoop command. I could see the directory/files using aws CLI.
Snapshot of the commands:
(base) [cloudera#quickstart conf]$ **aws s3 ls s3://gft-risk-aml-market-dev/**
PRE test/
2019-11-27 04:11:26 458 required
(base) [cloudera#quickstart conf]$ **hdfs dfs -ls s3://gft-risk-aml-market-dev/**
19/11/27 05:30:45 WARN fs.FileSystem: S3FileSystem is deprecated and will be removed in future releases. Use NativeS3FileSystem or S3AFileSystem instead.
ls: `s3://gft-risk-aml-market-dev/': No such file or directory
I have put the core-site.xml details.
<property>
<name>fs.s3.impl</name>
<value>org.apache.hadoop.fs.s3.S3FileSystem</value>
</property>
<property>
<name>fs.s3.awsAccessKeyId</name>
<value>ANHS</value>
</property>
<property>
<name>fs.s3.awsSecretAccessKey</name>
<value>EOo</value>
</property>
<property>
<name>fs.s3.path.style.access</name>
<value>true</value>
</property>
<property>
<name>fs.s3.endpoint</name>
<value>s3.us-east-1.amazonaws.com</value>
</property>
<property>
<name>fs.s3.connection.ssl.enabled</name>
<value>false</value>
</property>
Finally. Cloudera Quickstart V13 and below core-site.xml worked.
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
<name>fs.s3a.awsAccessKeyId</name>
<value>AKIAxxxx</value>
</property>
<property>
<name>fs.s3a.awsSecretAccessKey</name>
<value>Xxxxxx</value>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
</property>
<property>
<name>fs.AbstractFileSystem.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3A</value>
<description>The implementation class of the S3A AbstractFileSystem.</description>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>s3.us-east-1.amazonaws.com</value>
</property>
<property>
<name>fs.s3a.connection.ssl.enabled</name>
<value>false</value>
</property>
<property>
<name>fs.s3a.readahead.range</name>
<value>64K</value>
<description>Bytes to read ahead during a seek() before closing and
re-opening the S3 HTTP connection. This option will be overridden if
any call to setReadahead() is made to an open stream.</description>
</property>
<property>
<name>fs.s3a.list.version</name>
<value>2</value>
<description>Select which version of the S3 SDK's List Objects API to use.
Currently support 2 (default) and 1 (older API).</description>
</property>
I would use the Linux console to mount the S3 bucket and then move files from there to HDFS in that fashion. You will probably need to install it on the Cloudera quickstart by sudo'ing into root first, e.g., sudo yum install s3fs-fuse

Cannot create directory /home/hadoop/hadoopinfra/hdfs/namenode/current

I get the error
Cannot create directory /home/hadoop/hadoopinfra/hdfs/namenode/current
While trying to install hadoop on my local Mac.
What could be the reason for this? Just for reference, I'm putting my xml files down below:
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode </value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode </value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
I think my problem lies in my hdfs-site.xml file, but I'm not sure how to pinpoint/change it.
I'm using this tutorial, and "hadoop" in the file path is replaced by my username.
Possible error: misconfiguration of the hdfs-site.xml file
This happened to me when I was following a setup tutorial. The contents of the hdfs-site.xml for me was
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/nameNode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/dataNode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Only then I realized that the text hadoop in the above file corresponds to the user name, where in my case, it had to replaced with hduser. When both occurrences of hadoop was replaced with hduser, the hdfs namenode -format command worked fine.
I had this problem too and it was a permission problem. I just did:
sudo chmod 777 /home/hadoop/hadoopinfra/hdfs/namenode/
and works!
In the step where you need to verify the hadoop installation, instead of 'hdfs namenode -format' use '/usr/local/hadoop/bin/hdfs namenode -format'
Found this answer from:
hadoop java.io.IOException: while running namenode -format
If you are not using any other distro than native hadoop, then add the current user to hadoop group and retry formatting the namenode.
sudo usermod -a -G hadoop <current-username>
In case of using thirdparty hadoop distros such Cloudera, Hortonworks or MapR, switch to root user and again switch to hdfs user then try formatting the namenode will succeed.
$ sudo -i
$ su - hdfs
$ hdfs namenode -format
Try the Hadoop command with sudo

Connection refused in Hbase Shell while Connecting HBase to HDFS

I am trying to connect my HBase to HDFS. I have my hdfs namenode(bin/hdfs namenode) and datnode(/bin/hdfs datanode) running. I can also start my Hbase (sudo ./bin/start-hbase.sh) and local region servers (sudo ./bin/local-regionservers.sh start 1 2). But when I try to execute a command from Hbase shell it gives the following error:
cis655stu#cis655stu-VirtualBox:/teaching/14f-cis655/proj-dtracing/hbase/hbase-0.99.0-SNAPSHOT$ ./bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.99.0-SNAPSHOT, rUnknown, Sat Aug 9 08:59:57 EDT 2014
hbase(main):001:0> list
TABLE
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/teaching/14f-cis655/proj-dtracing/hbase/hbase-0.99.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/teaching/14f-cis655/proj-dtracing/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2015-01-19 13:33:07,179 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ERROR: Connection refused
Here is some help for this command:
List all tables in hbase. Optional regular expression parameter could
be used to filter the output. Examples:
hbase> list
hbase> list 'abc.*'
hbase> list 'ns:abc.*'
hbase> list 'ns:.*'
Below are my configuration files for HBase and Hadoop:
HBase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<!--for psuedo-distributed execution-->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master.wait.on.regionservers.mintostart</name>
<value>1</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/teaching/14f-cis655/tmp/zk-deploy</value>
</property>
<!--for enabling collection of traces
-->
<property>
<name>hbase.trace.spanreceiver.classes</name>
<value>org.htrace.impl.LocalFileSpanReceiver</value>
</property>
<property>
<name>hbase.local-file-span-receiver.path</name>
<value>/teaching/14f-cis655/tmp/server-htrace.out</value>
</property>
</configuration>
Hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/teaching/14f-cis655/proj-dtracing/hadoop-2.6.0/yarn/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/teaching/14f-cis655/proj-dtracing/hadoop-2.6.0/yarn/yarn_data/hdfs/datanode</value>
</property>
<property>
<name>hadoop.trace.spanreceiver.classes</name>
<value>org.htrace.impl.LocalFileSpanReceiver</value>
</property>
<property>
<name>hadoop.local-file-span-receiver.path</name>
<value>/teaching/14f-cis655/proj-dtracing/hadoop-2.6.0/logs/htrace.out</value>
</property>
</configuration>
Core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Please check does you HDFS is available from shell:
$ hdfs dfs -ls /hbase
Also make sure that you've all environment variables in hdfs-env.sh file:
HADOOP_CONF_LIB_NATIVE_DIR="/hadoop/lib/native"
HADOOP_OPTS="-Djava.library.path=/hadoop/lib"
HADOOP_HOME=/hadoop
YARN_HOME=/hadoop
HBASE_HOME=/hbase
HADOOP_HDFS_HOME=/hadoop
HBASE_MANAGES_ZK=true
Do you run Hadoop and HBase using the same OS user? If you use separate users, please check if HBase user is allowed to access HDFS.
Make sure that you have a copy of hdfs-site.xml and core-stie.xml (or symlink) files in ${HBASE_HOME}/conf directory.
Also fs.default.name option is deprecated for YARN (but it must still work), you must consider using fs.defaultFS instead.
Do you use Zookeeper? Because you've specified hbase.zookeeper.property.dataDir option, but there is no hbase.zookeeper.quorum there, and other significant options. Please read http://hbase.apache.org/book.html#zookeeper for more information.
Please add next option to hdfs-site.xml to make HBase work correctly (replace $HBASE_USER variable by your system user, which is used to run HBase):
<property>
<name>hadoop.proxyuser.$HBASE_USER.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.$HBASE_USER.hosts</name>
<value>*</value>
</property>
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>

Hbase master not running

I am trying to run Hbase in a pseudo-distributed mode. I followed this link.
I am using ubuntu version 12.04 Hbase version 0.94.8 Hadoop Version 2.4.0
In hbase/conf/hbase-env.sh, i added the following
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_25
export HBASE_REGIONSERVERS=/usr/lib/hbase/hbase-0.94.8/conf/regionservers
export HBASE_MANAGES_ZK=true
Then I set the HBASE_HOME path in bashrc file
In hbase/conf/hbase-site.xml I added the following,
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/prashasti/Installed/hbase-0.94.8/HBASE/zookeeper</value>
</property>
</configuration>
To prevent version mismatch between hadoop and hbase, I added
hadoop-common-2.4.0.jar
and
hadoop-mapreduce-client-core-2.4.0.jar
in hbase/lib folder
When I start hbase using
$./bin/start-hbase.sh
No error turns up, but the Hmaster doesn't start.
can you pl try removing all the configuration parameters from hbase-site.xml except hbase.rootdir and then try starting the hbase.
Also comment out export HBASE_REGIONSERVERS export HBASE_MANAGES_ZK in hbase-env.xml

Resources