Hadoop can't execute a basic example

The software I'm using:
System: macOS Mojave 10.14.2
Hadoop: 3.1.1
JDK: 10.0.2
I executed this command, and it failed:
hadoop jar /usr/local/Cellar/hadoop/3.1.1/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar pi 2 5
I need help, thank you!!!
In hadoop-env.sh, I only added this line:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-10.0.2.jdk/Contents/Home
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>

I solved it.
The cause was the Java version.
Adding these two lines to yarn-env.sh did not work for me:
export YARN_RESOURCEMANAGER_OPTS="--add-modules=ALL-SYSTEM"
export YARN_NODEMANAGER_OPTS="--add-modules=ALL-SYSTEM"
In the end, I switched the Java version to Java 8 and deleted the two lines above, and it worked for me.
You can set the Java version in hadoop-env.sh.
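For reference, a minimal sketch of that change (assuming a Java 8 JDK is installed on macOS; the dynamic lookup via /usr/libexec/java_home is just one convenient option):
# In hadoop-env.sh: point Hadoop at the installed Java 8 JDK (macOS).
# /usr/libexec/java_home resolves the install path of the requested version.
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)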
Thx

Related

Hadoop job keeps running and no container is allocated

I tried running a MapReduce job in Hadoop 2.8.5, but it just keeps running and never makes progress.
The Application State is as below:
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
RM web UI:
The health-report says: 1/1 local-dirs are bad: /home/hduser/hadooptmpdata/nm-local-dir; 1/1 log-dirs are bad: /home/hduser/hadoop-2.8.5/logs/userlogs
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadooptmpdata</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hduser/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hduser/hdfs/datanode</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>100</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>3</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>3</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/hduser/hadooptmpdata/nm-local-dir</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>2</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>2</value>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>/home/user/hduser/hadooptmpdata/mapred/local</value>
</property>
</configuration>
I am running Hadoop on Ubuntu; my PC has an Intel i7 processor, 16 GB of RAM, and a 256 GB SSD.
YARN's ResourceManager needs compute resources from the NodeManager(s) in order to run anything. Your NodeManager reports that its local directory is bad, which means you have no compute resources available (you can verify this in your cluster metrics: everything is zero). That is why your application is stuck in "ACCEPTED".
Fix your yarn.nodemanager.local-dirs and make sure the user running YARN has full permissions on that directory, then the job can proceed.
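A rough sketch of that fix (assuming the NodeManager runs as the hduser account that appears in the paths above; adjust paths and user to your setup):
# Recreate the NodeManager local and log directories and hand ownership to the YARN user.
sudo mkdir -p /home/hduser/hadooptmpdata/nm-local-dir
sudo mkdir -p /home/hduser/hadoop-2.8.5/logs/userlogs
sudo chown -R hduser:hduser /home/hduser/hadooptmpdata /home/hduser/hadoop-2.8.5/logs
sudo chmod -R 755 /home/hduser/hadooptmpdata/nm-local-dir
# Restart YARN so the NodeManager re-runs its directory health check.
stop-yarn.sh && start-yarn.sh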

Exception from container-launch on a mapreduce job

I've set up a Hadoop cluster with one master node and 3 datanodes. When I try to run a MapReduce job on the master node, I get the following error:
18/05/23 19:22:59 INFO mapreduce.Job: Task Id : attempt_1527096061793_0001_m_000000_0, Status : FAILED
Exception from container-launch.
Container id: container_1527096061793_0001_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
at org.apache.hadoop.util.Shell.run(Shell.java:482)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
I managed to find the same error in the datanodes' log files as well, but they don't say anything more than what is shown in the console. I've been stuck on this for quite some time now and I'm not sure how to approach it. Any suggestions or help are appreciated.
Thanks
core-site.xml
<configuration>
<!-- core-site.xml -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://NameNode:9000/</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>NameNode</value>
</property>
<property>
<name>yarn.resourcemanager.bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.nodemanager.bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:/usr/local/hadoop_work/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:/usr/local/hadoop_work/yarn/log</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://NameNode:9000/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<!-- hdfs-site.xml -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_work/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_work/hdfs/datanode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:/usr/local/hadoop_work/hdfs/namesecondary</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user/app</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Djava.security.egd=file:/dev/../dev/urandom</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2000</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2000</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>1600</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>1600</value>
</property>
</configuration>
Hadoop version is 2.7.6
I found the problem. It was in these lines of the mapred-site.xml configuration file:
<property>
<name>mapreduce.map.java.opts</name>
<value>1600</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>1600</value>
</property>
Every value inside these tags needs an "-Xmx" prefix and an "m" suffix, because the value is passed verbatim to the child JVM as its command-line options.
The right way to configure these properties:
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1600m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx1600m</value>
</property>
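As a quick check after the change (a sketch; the jar path assumes a stock Hadoop 2.7.6 binary layout under $HADOOP_HOME), re-run a small job and confirm the containers now launch instead of exiting with code 1:
# Submit the bundled pi example again; valid -Xmx...m heap settings should let the attempts run.
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar pi 2 5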
Hope this helps someone as it took way too much of my time to figure it out.

Apache Kylin not able to load models/configuration

I'm new to Hadoop, Hive, HBase, and Kylin. I installed the first three, and they seem to be working.
After that I installed Apache Kylin, ran sample.sh, and it succeeded.
After running the script I restarted Kylin and opened the web interface. Some pages cannot be opened, e.g. /cube, /models, /admin/config.
The problem is: I can see that 5 tables were created in Hive and 2 cubes were created, but when I open the web GUI, the models stay in a loading state and I cannot build the cube.
When I try to build the cube, I cannot find any informative log (or maybe there is one, but I don't know about it).
kylin.log
https://pastebin.com/TUZkQepa
hadoop-hadoop-namenode-master.log
https://pastebin.com/T8eNt3PY
hadoop-hadoop-secondarynamenode-master.log
https://pastebin.com/iMJDNFfU
yarn-hadoop-resourcemanager-master.log
https://pastebin.com/TGwJWTRF
hbase-hadoop-zookeeper-master.log
https://pastebin.com/Ym6eky5h
hbase-hadoop-master-master.log
https://pastebin.com/p1ygfw4W
Here is the configuration for hadoop
(yarn-site.xml)
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Configuration for hbase
regionservers
slave2
hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/datadir</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave2</value>
</property>
</configuration>
Configuration for hive
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/metastore?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>user name for connecting to mysql server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>gwudainget</value>
<description>password for connecting to mysql server</description>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
<description>Whether to include the current database in the Hive prompt.</description>
</property>
</configuration>
For Kylin, I use the default configuration, because I don't really know what to change in it.
What I use:
hadoop 2.7.5 binary
hbase 1.2.6 binary
hive 1.2.2 binary
kylin 2.2.0 source (I just added logging)

Hadoop: DataNode directory change not taking effect

We are using Hadoop 2.7.3. We changed hdfs-site.xml to point to a new directory, set permissions on the new directory too, and ran stop-dfs.sh and start-dfs.sh on the name node, but the changes are not taking effect; it still points to the old directory.
Am I missing anything in the configuration changes? And how can we make sure the new directory is used?
It's a multi-node cluster.
This is the hdfs-site.xml on the name node:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///test/hadoop/hadoopinfra/hdfs/namenode</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:///tmp/hadoop/data</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>2368709120</value>
</property>
<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
<value>1.0</value>
</property>
</configuration>
This is the hdfs-site.xml on the data node:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///test/hadoop/hadoopinfra/hdfs/datanode</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:///tmp/hadoop/data</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>2368709120</value>
</property>
<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
<value>1.0</value>
</property>
</configuration>
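One way to see which directory the DataNode has actually picked up is the standard hdfs getconf tool (a sketch; run it on the data node, and note that dfs.data.dir is the deprecated alias of dfs.datanode.data.dir):
# Print the effective data-directory setting as the daemon resolves it.
hdfs getconf -confKey dfs.datanode.data.dir
# A directory the DataNode has adopted contains a current/VERSION file after a restart.
ls /test/hadoop/hadoopinfra/hdfs/datanode/current/VERSION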

Can't copy file into HDFS

I have a problem with HDFS.
I can't copy any files into it, even though I have ample space on the DataNodes.
Maybe I have some bad configuration?
You should provide specific details such as the exception you get, the steps you follow, etc. Since you have not given any information at all, I would say check the config files to make sure you have all the required entries in the corresponding files:
In core-site.xml you should have
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://ipaddress:port</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/$user/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
Similarly hdfs-site.xml should have
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/$user/hdfs/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/$user/hdfs/name</value>
</property>
</configuration>
And finally the mapred-site.xml should have
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>ip:port</value>
</property>
</configuration>
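Once those entries are in place and the daemons have been restarted, a quick way to test the copy (a sketch for a single-node setup; it assumes the Hadoop scripts are already on the PATH):
# Restart HDFS so the new configuration is picked up, then try a small copy.
stop-dfs.sh && start-dfs.sh
hadoop dfsadmin -report          # confirm the DataNodes are registered and report free space
echo "hello" > /tmp/test.txt
hadoop fs -put /tmp/test.txt /tmp/
hadoop fs -ls /tmp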
Hope this helps.
