yarn in docker - __spark_libs__.zip does not exist - hadoop

I have looked through this StackOverflow post, but it hasn't helped me much.
I am trying to get YARN working on an existing cluster. So far we have been using the Spark standalone manager as our resource allocator, and it has been working as expected.
This is a basic overview of our architecture. Everything in the white boxes runs in Docker containers.
From the master-machine I can run the following command from within the YARN resource manager container and get a PySpark shell running that uses YARN: ./pyspark --master yarn --driver-memory 1G --executor-memory 1G --executor-cores 1 --conf "spark.yarn.am.memory=1G"
However, if I try to run the same command from the client-machine, within the jupyter container, I get the following error in the YARN UI.
Application application_1512999329660_0001 failed 2 times due to AM
Container for appattempt_1512999329660_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://master-machine:5000/proxy/application_1512999329660_0001/Then, click on links to logs of each attempt.
Diagnostics: File file:/sparktmp/spark-58732bb2-f513-4aff-b1f0-27f0a8d79947/__spark_libs__5915104925224729874.zip does not exist
java.io.FileNotFoundException: File file:/sparktmp/spark-58732bb2-f513-4aff-b1f0-27f0a8d79947/__spark_libs__5915104925224729874.zip does not exist
I can find file:/sparktmp/spark-58732bb2-f513-4aff-b1f0-27f0a8d79947/ on the client-machine, but I am unable to find spark-58732bb2-f513-4aff-b1f0-27f0a8d79947 on the master machine.
As a note, spark-shell works from the client-machine when it points to the standalone spark manager on the master machine.
No logs are printed to the yarn log directories on the worker-machines either.
If I run spark-submit on spark/examples/src/main/python/pi.py, I get the same error as above.
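For reference, the submit command looked roughly like this (the exact flags and path form are assumptions; only the script itself is from above):
./spark-submit --master yarn --deploy-mode client --driver-memory 1G --executor-memory 1G --executor-cores 1 $SPARK_HOME/examples/src/main/python/pi.py 10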
Here is the yarn-site.xml
<configuration>
<property>
<description>YARN hostname</description>
<name>yarn.resourcemanager.hostname</name>
<value>master-machine</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
<!-- <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler</value> -->
<!-- <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value> -->
</property>
<property>
<description>The address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:5000</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<description>Set to false, to avoid ip check</description>
<name>hadoop.security.token.service.use_ip</name>
<value>false</value>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-applications</name>
<value>1000</value>
<description>Maximum number of applications in the system which
can be concurrently active both running and pending</description>
</property>
<property>
<description>Whether to use preemption. Note that preemption is experimental
in the current version. Defaults to false.</description>
<name>yarn.scheduler.fair.preemption</name>
<value>true</value>
</property>
<property>
<description>Whether to allow multiple container assignments in one
heartbeat. Defaults to false.</description>
<name>yarn.scheduler.fair.assignmultiple</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
And here is the spark.conf:
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
# DRIVER PROPERTIES
spark.driver.port 7011
spark.fileserver.port 7021
spark.broadcast.port 7031
spark.replClassServer.port 7041
spark.akka.threads 6
spark.driver.cores 4
spark.driver.memory 32g
spark.master yarn
spark.deploy.mode client
# DRIVER AND EXECUTORS
spark.blockManager.port 7051
# EXECUTORS
spark.executor.port 7101
# GENERAL
spark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory
spark.port.maxRetries 10
spark.local.dir /sparktmp
spark.scheduler.mode FAIR
# SPARK UI
spark.ui.port 4140
# DYNAMIC ALLOCATION AND SHUFFLE SERVICE
# http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation
spark.dynamicAllocation.enabled false
spark.shuffle.service.enabled false
spark.shuffle.service.port 7061
spark.dynamicAllocation.initialExecutors 5
spark.dynamicAllocation.minExecutors 0
spark.dynamicAllocation.maxExecutors 8
spark.dynamicAllocation.executorIdleTimeout 60s
# LOGGING
spark.executor.logs.rolling.maxRetainedFiles 5
spark.executor.logs.rolling.strategy size
spark.executor.logs.rolling.maxSize 100000000
# JMX
# Testing
# spark.driver.extraJavaOptions -Dcom.sun.management.jmxremote.port=8897 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
# Spark Yarn Configs
spark.hadoop.yarn.resourcemanager.address <master-machine IP>:8032
spark.hadoop.yarn.resourcemanager.hostname master-machine
And this shell script is run on all the machines:
# The main ones
export CONDA_DIR=/cluster/conda
export HADOOP_HOME=/usr/hadoop
export SPARK_HOME=/usr/spark
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$SPARK_HOME/bin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$CONDA_DIR/bin:/cluster/libs-python:/cluster/batch
export PYTHONPATH=/cluster/libs-python:$SPARK_HOME/python:$PY4JPATH:$PYTHONPATH
export SPARK_CLASSPATH=/cluster/libs-java/*:/cluster/libs-python:$SPARK_CLASSPATH
# Core spark configuration
export PYSPARK_PYTHON="/cluster/conda/bin/python"
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_PORT=7078
export SPARK_MASTER_WEBUI_PORT=7080
export SPARK_WORKER_WEBUI_PORT=7081
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Duser.timezone=UTC+02:00"
export SPARK_WORKER_DIR="/sparktmp"
export SPARK_WORKER_CORES=22
export SPARK_WORKER_MEMORY=43G
export SPARK_DAEMON_MEMORY=1G
export SPARK_WORKER_INSTANCES=1
export SPARK_EXECUTOR_INSTANCES=2
export SPARK_EXECUTOR_MEMORY=4G
export SPARK_EXECUTOR_CORES=2
export SPARK_LOCAL_IP=$(hostname -I | cut -f1 -d " ")
export SPARK_PUBLIC_DNS=$(hostname -I | cut -f1 -d " ")
export SPARK_MASTER_OPTS="-Duser.timezone=UTC+02:00"
This is the hdfs-site.xml on the master-machine (namenode):
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hdfs</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hdfs/name</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.replication.max</name>
<value>3</value>
</property>
<property>
<name>dfs.replication.min</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>supergroup</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
</property>
<property>
<name>fs.permissions.umask-mode</name>
<value>002</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<!-- 1000Mbit/s -->
<name>dfs.balance.bandwidthPerSec</name>
<value>125000000</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/cluster/config/hadoopconf/namenode/dfs.hosts.exclude</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.replication.work.multiplier.per.iteration</name>
<value>10</value>
</property>
<property>
<name>dfs.namenode.replication.max-streams</name>
<value>50</value>
</property>
<property>
<name>dfs.namenode.replication.max-streams-hard-limit</name>
<value>100</value>
</property>
</configuration>
And this is the hdfs-site.xml on the worker-machines (datanodes):
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hdfs,/hdfs2,/hdfs3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hdfs/name</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.replication.max</name>
<value>3</value>
</property>
<property>
<name>dfs.replication.min</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>supergroup</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
</property>
<property>
<name>fs.permissions.umask-mode</name>
<value>002</value>
</property>
<property>
<!-- 1000Mbit/s -->
<name>dfs.balance.bandwidthPerSec</name>
<value>125000000</value>
</property>
</configuration>
This is the core-site.xml on the worker-machines (datanodes)
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master-machine:54310/</value>
</property>
</configuration>
This is the core-site.xml on the master-machine (name node):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master-machine:54310/</value>
</property>
</configuration>

After a lot of debugging I found that, for some reason, the jupyter container was not looking in the correct Hadoop conf directory, even though the HADOOP_HOME environment variable pointed to the correct location. All I had to do to resolve the problem was point HADOOP_CONF_DIR at the correct directory, and everything started working again.
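Concretely, something along these lines in the jupyter container's environment was enough (the exact conf path is an assumption; point it at wherever your core-site.xml and yarn-site.xml actually live):
export HADOOP_HOME=/usr/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop   # directory containing core-site.xml, hdfs-site.xml and yarn-site.xml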

Related

Hadoop DataNode not starting - Apache Bigtop install using Puppet with Kerberos

I am trying to deploy Hadoop using Apache Bigtop 3.1.1 with Puppet. The Hadoop version is 3.2.4.
The OS I am using is CentOS 7.
Deployment works fine without Kerberos, but with Kerberos the Hadoop datanode does not start up. I tried to start it manually with
sudo systemctl start hadoop-hdfs-datanode.service
and this is the error it gives:
Nov 21 12:15:59 master.local hadoop-hdfs-datanode[13646]: ERROR: You must be a privileged user in order to run a secure service.
Nov 21 12:16:04 master.local hadoop-hdfs-datanode[13646]: Failed to start Hadoop datanode. Return value: 3 [FAILED]
Nov 21 12:16:04 master.local systemd[1]: hadoop-hdfs-datanode.service: control process exited, code=exited status=3
Nov 21 12:16:04 master.local systemd[1]: Failed to start LSB: Hadoop datanode.
This is my site.yaml file
---
bigtop::hadoop_head_node: "master.local"
hadoop::hadoop_storage_dirs:
- /data/1
- /data/2
- /data/3
- /data/4
hadoop_cluster_node::cluster_components:
- hdfs
- spark
- hive
- tez
- sqoop
- zookeeper
- kafka
- livy
- oozie
- zeppelin
- solrcloud
- kerberos
- httpfs
bigtop::bigtop_repo_uri: "http://10.42.65.70:90/bigtop/3.1.1/rpm/"
# - "https://archive.apache.org/dist/bigtop/bigtop-3.1.1/repos/centos-7/"
hadoop::common_hdfs::hadoop_http_authentication_signature_secret: "FaztheBits123!"
# Kerberos
hadoop::hadoop_security_authentication: "kerberos"
kerberos::krb_site::domain: "bigtop.apache.org"
kerberos::krb_site::realm: "BIGTOP.APACHE.ORG"
kerberos::krb_site::kdc_server: "%{hiera('bigtop::hadoop_head_node')}"
kerberos::krb_site::kdc_port: "88"
kerberos::krb_site::admin_port: "749"
kerberos::krb_site::keytab_export_dir: "/var/lib/bigtop_keytabs"
hadoop::common_hdfs::hadoop_http_authentication_type: "%{hiera('hadoop::hadoop_security_authentication')}"
# to enable tez in hadoop, uncomment the lines below
hadoop::common::use_tez: true
hadoop::common_mapred_app::mapreduce_framework_name: "yarn-tez"
# to enable tez in hive, uncomment the lines below
hadoop_hive::common_config::hive_execution_engine: "tez"
I tried changing the workers file in /etc/hadoop/conf/workers to the proper hostnames. My core-site.xml is this:
<configuration>
<property>
<!-- URI of NN. Fully qualified. No IP.-->
<name>fs.defaultFS</name>
<value>hdfs://master.local:8020</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>hudson,testuser,root,hadoop,jenkins,oozie,hive,httpfs,users</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>hudson,testuser,root,hadoop,jenkins,oozie,hive,httpfs,users</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>hudson,testuser,root,hadoop,jenkins,oozie,hive,httpfs,users</value>
</property>
<!-- enable proper authentication instead of static mock authentication as
Dr. Who -->
<property>
<name>hadoop.http.filter.initializers</name>
<value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
</property>
<!-- disable anonymous access -->
<property>
<name>hadoop.http.authentication.simple.anonymous.allowed</name>
<value>false</value>
</property>
<!-- enable kerberos authentication -->
<property>
<name>hadoop.http.authentication.type</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.http.authentication.kerberos.principal</name>
<value>HTTP/_HOST@BIGTOP.APACHE.ORG</value>
</property>
<property>
<name>hadoop.http.authentication.kerberos.keytab</name>
<value>/etc/HTTP.keytab</value>
</property>
<!-- provide secret for cross-service-cross-machine cookie -->
<property>
<name>hadoop.http.authentication.signature.secret.file</name>
<value>/etc/hadoop/conf/hadoop-http-authentication-signature-secret</value>
</property>
<!-- make all services on all hosts use the same cookie domain -->
<property>
<name>hadoop.http.authentication.cookie.domain</name>
<value>local</value>
</property>
<property>
<name>hadoop.security.key.provider.path</name>
<value>kms://http#master.local:9600/kms</value>
</property>
</configuration>
My hdfs-site.xml is:
<configuration>
<!-- non HA -->
<property>
<name>dfs.namenode.rpc-address</name>
<value>master.local:8020</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>master.local:50070</value>
</property>
<property>
<name>dfs.namenode.https-address</name>
<value>master.local:50470</value>
</property>
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<!-- NameNode security config -->
<property>
<name>dfs.https.address</name>
<value>master.local:50475</value>
</property>
<property>
<name>dfs.https.port</name>
<value>50475</value>
</property>
<property>
<name>dfs.namenode.keytab.file</name>
<value>/etc/hdfs.keytab</value> <!-- path to the HDFS keytab -->
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/_HOST@BIGTOP.APACHE.ORG</value>
</property>
<property>
<name>dfs.namenode.kerberos.https.principal</name>
<value>host/_HOST@BIGTOP.APACHE.ORG</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.keytab</name>
<value>/etc/hdfs.keytab</value> <!-- path to the HDFS keytab -->
</property>
<property>
<name>dfs.web.authentication.kerberos.principal</name>
<value>HTTP/_HOST@BIGTOP.APACHE.ORG</value>
</property>
<!-- Secondary NameNode security config -->
<property>
<name>dfs.secondary.http.address</name>
<value>master.local:0</value>
</property>
<property>
<name>dfs.secondary.https.address</name>
<value>master.local:50495</value>
</property>
<property>
<name>dfs.secondary.https.port</name>
<value>50495</value>
</property>
<property>
<name>dfs.secondary.namenode.keytab.file</name>
<value>/etc/hdfs.keytab</value> <!-- path to the HDFS keytab -->
</property>
<property>
<name>dfs.secondary.namenode.kerberos.principal</name>
<value>hdfs/_HOST@BIGTOP.APACHE.ORG</value>
</property>
<property>
<name>dfs.secondary.namenode.kerberos.https.principal</name>
<value>host/_HOST@BIGTOP.APACHE.ORG</value>
</property>
<!-- DataNode security config -->
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>700</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:1004</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:1006</value>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>/etc/hdfs.keytab</value> <!-- path to the HDFS keytab -->
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hdfs/_HOST@BIGTOP.APACHE.ORG</value>
</property>
<property>
<name>dfs.datanode.kerberos.https.principal</name>
<value>host/_HOST@BIGTOP.APACHE.ORG</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/1/hdfs,file:///data/2/hdfs,file:///data/3/hdfs,file:///data/4/hdfs</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/1/namenode,file:///data/2/namenode,file:///data/3/namenode,file:///data/4/namenode</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
<description>The name of the group of super-users.</description>
</property>
<!-- increase the number of datanode transceivers way above the default of 256
- this is for hbase -->
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
<!-- Configurations for large cluster -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
I can't seem to find a log for the datanode. I can find logs for the namenode in
/var/log/hadoop-hdfs, but not datanode logs.
What am I doing wrong?
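For what it's worth, with the privileged DataNode ports (1004/1006) configured above, Hadoop 3.x normally expects the secure DataNode to be started as root with a secure user set in hadoop-env.sh, roughly along these lines (the values here are assumptions, not taken from the Bigtop deployment):
# hadoop-env.sh (illustrative; exact values depend on the Bigtop packaging)
export HDFS_DATANODE_SECURE_USER=hdfs
export JSVC_HOME=/usr/lib/bigtop-utils   # location of jsvc, if jsvc is used to drop privileges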

hadoop's start-dfs not creating datanode on the slave

I am trying to set up a Hadoop cluster over two nodes. start-dfs.sh on my master node opens a window that closes shortly afterwards; when I execute start-dfs it logs that the namenode is correctly launched, but the datanode is not and logs the following:
Problem binding to [slave-VM1:9005] java.net.BindException: Cannot assign requested address: bind; For more details see: http://wiki.apache.org/hadoop/BindException
I have set
ssh-keygen -t rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
(and also set the administrators_authorized_keys file with the right public key; ssh user@remotemachine also works and gives access to the slave)
Here's my full Hadoop configuration set on both master and slave machines (Windows):
hdfs-site.xml :
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/C:/Hadoop/hadoop-3.2.2/data/namenode</value>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>slaveVM1:50475</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/C:/Hadoop/hadoop-3.2.2/data/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
core-site.xml :
<configuration>
<property>
<name>dfs.datanode.http.address</name>
<value>slaveVM1:9005</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://masterVM2:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/C:/Hadoop/hadoop-3.2.2/hadoopTmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://masterVM2:8020</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>masterVM2:9001</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>%HADOOP_HOME%/share/hadoop/mapreduce/*,%HADOOP_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_HOME%/share/hadoop/common/*,%HADOOP_HOME%/share/hadoop/common/lib/*,%HADOOP_HOME%/share/hadoop/yarn/*,%HADOOP_HOME%/share/hadoop/yarn/lib/*,%HADOOP_HOME%/share/hadoop/hdfs/*,%HADOOP_HOME%/share/hadoop/hdfs/lib/*</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
PS: I am administrator on both machines, and I set HADOOP_CONF_DIR to C:\Hadoop\hadoop-3.2.2\etc\hadoop.
I also set the slave IP in the HADOOP_CONF_DIR slaves file.
PS: if I remove this block:
<property>
<name>dfs.datanode.https.address</name>
<value>slave:50475</value>
</property>
from hdfs-site.xml,
then both the datanode and namenode launch on the master node.
hosts :
*.*.*.* slaveVM1
*.*.*.* masterVM2
where *.*.*.* are the IPs of the respective machines; all other entries are commented out.
This error
BindException: Cannot assign requested address: bind;
usually happens when the port is in use. That could mean the application is already running, was started previously and didn't shut down properly, or another application is using that port. Try rebooting (a heavy-handed but reasonably effective way of clearing ports).
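If you would rather not reboot, you can first check what is holding the port (Windows commands run on the machine that fails to bind; the PID placeholder is whatever netstat reports):
netstat -ano | findstr :9005
tasklist /FI "PID eq <pid-from-netstat>"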

Hadoop jobtracker's tracking URL cannot be accessed

I have configured my Hadoop system in WSL and run the wordcount example. But when I want to see the history of the job, I find that the tracking URL cannot be accessed.
The job works well, and the jobhistory server is running as well.
The history tracking URL is my WSL hostname:8088/proxy/application_1585482453915_0002/.
However, I can still access localhost:19888/jobhistory to see my job history.
Why does this problem occur? Is it a configuration problem?
My Hadoop version is 2.7.1.
My core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
My hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop/tmp/dfs/data</value>
</property>
My mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>localhost:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>localhost:19888</value>
</property>
My yarn-site.xml
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
My /etc/hosts
127.0.0.1 localhost
127.0.1.1 DESKTOP-U1EOV4J.localdomain DESKTOP-U1EOV4J
The JobHistoryServer daemon is running on localhost (127.0.0.1), whereas the tracking URL is constructed with the hostname and thus redirects to DESKTOP-U1EOV4J.localdomain (127.0.1.1).
For a pseudo-distributed cluster, it is safer to set the host of the JobHistoryServer to 0.0.0.0.
Update the job history server properties in mapred-site.xml
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
and restart the JobHistoryServer.
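For example, something like this (assuming the standard daemon scripts; the sbin path depends on your install):
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver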

GridGain No FileSystem for scheme: ggfs

Hi everyone,
I want to use GridGain with Hadoop 2.4.0. My Hadoop config is below.
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop-data</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>ggfs://ggfs@R</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/hadoop-data/journal</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>r,host002,host004</value>
</property>
<property>
<name>fs.AbstractFileSystem.ggfs.impl</name>
<value>org.gridgain.grid.ggfs.hadoop.v2.GridGgfsHadoopFileSystem</value>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
<value>NEVER</value>
</property>
</configuration>
After finishing the setup, I start HDFS and run:
hadoop fs -ls /
ls: No FileSystem for scheme: ggfs
What should I do?
Thanks
Add the following to core-site.xml:
<property>
<name>fs.ggfs.impl</name>
<value>org.gridgain.grid.ggfs.hadoop.v1.GridGgfsHadoopFileSystem</value>
</property>
The second version of the Hadoop FileSystem API is rarely used; most of the Hadoop ecosystem works through the first version of the API.
And if you want to use GGFS only, you don't need to start the HDFS services.
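If the error persists after adding the property, it is also worth checking that the GridGain Hadoop jars are actually on the Hadoop classpath (a quick sanity check; the jar naming is an assumption and depends on your GridGain distribution):
hadoop classpath | tr ':' '\n' | grep -i gridgain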

distcp between nameservice1 and nameservice2

We have CDH 5.2 with Cloudera Manager 5.
We want to copy data from nameservice2 to nameservice1.
Both clusters are on the same CDH version.
When I tried hadoop distcp hdfs://nameservice2/foo/bar hdfs://nameservice1/bar/foo
I got the error:
java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice2
So I added the following nameservice2 config to nameservice1's
HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml in Cloudera Manager (Gateway Default Group):
<property>
<name>dfs.nameservices</name>
<value>nameservices2</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.nameservices2</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.nameservices2</name>
<value>namenode36,namenode405</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.cc:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50470</value>
</property>
But I am still getting the same error.
Is there any workaround for this?
Thanks
In HA-enabled HDFS, nameservice1 and nameservice2 are logical names; you cannot use ports along with a logical name.
You have two methods.
The easy method is to find the active namenodes and use active-namenode:port in the distcp command, as follows. The NameNode web UI can be used to find the active namenodes of the two clusters.
hadoop distcp hdfs://hnn001.prod.cc:8020/foo/bar hdfs://<dest-cluster-active-nn-hostname>:8020/bar/foo
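Alternatively, the active namenode can be checked from the command line (assuming the client is configured for that nameservice; the namenode IDs below are the ones from the config above):
hdfs haadmin -getServiceState namenode36     # prints "active" or "standby"
hdfs haadmin -getServiceState namenode405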
The other method is to use the logical names of the two clusters, as follows. But before trying the command below, make sure you have properly configured nameservice1 and nameservice2 in your client hdfs-site.xml.
hadoop distcp hdfs://nameservice2/foo/bar hdfs://nameservice1/bar/foo
Configuring the remote cluster's nameservice in the local cluster:
It looks like nameservice2 is your local cluster and nameservice1 is your remote one. You need to keep all the associated properties of nameservice1 and nameservice2 in the local cluster, i.e. your local cluster's client hdfs-site.xml should look as follows.
<configuration>
<!-- Available nameservices -->
<property>
<name>dfs.nameservices</name>
<value>nameservices1,nameservices2</value>
</property>
<!-- Local nameservice2 properties -->
<property>
<name>dfs.client.failover.proxy.provider.nameservices2</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.nameservices2</name>
<value>namenode36,namenode405</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.cc:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50470</value>
</property>
<!-- Remote nameservice1 properties -->
<!-- You can find these properties in the remote machine's hdfs-site.xml file -->
<property>
<name>dfs.client.failover.proxy.provider.nameservices1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.nameservices1</name>
<value>namenodeXX,namenodeYY</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices1.namenodeXX</name>
<value><Remote-nn1>:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices1.namenodeXX</name>
<value><Remote-nn1>:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices1.namenodeXX</name>
<value><Remote-nn1>:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices1.namenodeXX</name>
<value><Remote-nn1>:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices1.namenodeYY</name>
<value><Remote-nn2>:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices1.namenodeYY</name>
<value><Remote-nn2>:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices1.namenodeYY</name>
<value><Remote-nn2>:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices1.namenodeYY</name>
<value><Remote-nn2>:50470</value>
</property>
<!-- Other properties -->
</configuration>
In the above configuration file, replace all placeholders like XX and YY with the corresponding values from the remote machine's hdfs-site.xml.
