I installed and set up a 7-node Hadoop (3.3.1) cluster, but when I run the Hadoop word count program it gets stuck at "Running job".
root@hadoop-master:~# bash run-wordcount.sh
2022-07-31 07:00:21,038 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2022-07-31 07:00:21,494 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1659250805263_0001
2022-07-31 07:00:21,927 INFO input.FileInputFormat: Total input files to process : 2
2022-07-31 07:00:22,269 INFO mapreduce.JobSubmitter: number of splits:2
2022-07-31 07:00:22,400 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1659250805263_0001
2022-07-31 07:00:22,400 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-07-31 07:00:22,577 INFO conf.Configuration: resource-types.xml not found
2022-07-31 07:00:22,578 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-07-31 07:00:23,067 INFO impl.YarnClientImpl: Submitted application application_1659250805263_0001
2022-07-31 07:00:23,108 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1659250805263_0001/
2022-07-31 07:00:23,109 INFO mapreduce.Job: Running job: job_1659250805263_0001
My configuration is as follows:
3 VMs ==> 24 cores, 128 GB RAM, 2 TB SSD each
1 VM for the name node
2 VMs for data nodes (each running 3 Docker containers)
yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
</configuration>
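One quick check before digging further into the configs: the submission log says "Connecting to ResourceManager at /0.0.0.0:8032" even though yarn.resourcemanager.hostname is set to hadoop-master, which suggests the client may not be reading this yarn-site.xml at all. A hedged sketch of two things worth verifying (assuming the yarn CLI is on the PATH):
# List the NodeManagers registered with the ResourceManager; an empty or
# unhealthy list would explain a job that never leaves "Running job".
yarn node -list -all
# Confirm which configuration directory the client is actually reading.
echo $HADOOP_CONF_DIR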
mapred-site.xml
<?xml version="1.0"?>
<configuration>
<!-- Run the MapReduce program on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
</configuration>
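If the NodeManagers are registered but the application still never gets an AM container, the NodeManager resource settings in yarn-site.xml are worth checking as well. A hedged example with placeholder values (8192 MB is illustrative, not taken from the cluster above):
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value> <!-- memory each NodeManager offers to YARN; placeholder value -->
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value> <!-- largest single container; must cover the AM container -->
</property>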
What do I need to do to solve this problem? Please help
Related
I have a problem with Sqoop; any help is really appreciated.
I run a Sqoop command from my local computer to export data from HDFS to an Oracle database. I use Hadoop 3.3.0 and Sqoop 1.4.7 on my local computer.
The error is:
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
sqoop command:
sqoop export --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=172.16.49.30)(port=1521))(connect_data=(service_name=stgdb)))" --table CORE_ETL.DEPOSIT_TURNOVER --username username --password password --export-dir /tmp/merged_deposit_turnover/sqoop/ --input-fields-terminated-by "," --input-lines-terminated-by '\n'
yarn-site.xml:
<configuration>
<property>
<name>yarn.acl.enable</name>
<value>true</value>
</property>
<property>
<name>yarn.admin.acl</name>
<value>*</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>cluster.com:8032</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>cluster.com:8033</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>cluster.com:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>cluster.com:8031</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>cluster.com:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>cluster.com:8090</value>
</property>
<property>
<name>yarn.resourcemanager.client.thread-count</name>
<value>50</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.client.thread-count</name>
<value>50</value>
</property>
<property>
<name>yarn.resourcemanager.admin.client.thread-count</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.increment-allocation-mb</name>
<value>512</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.increment-allocation-vcores</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
</property>
<property>
<name>yarn.resourcemanager.amliveliness-monitor.interval-ms</name>
<value>1000</value>
</property>
<property>
<name>yarn.am.liveness-monitor.expiry-interval-ms</name>
<value>600000</value>
</property>
<property>
<name>yarn.resourcemanager.am.max-attempts</name>
<value>2</value>
</property>
<property>
<name>yarn.resourcemanager.container.liveness-monitor.interval-ms</name>
<value>600000</value>
</property>
<property>
<name>yarn.resourcemanager.nm.liveness-monitor.interval-ms</name>
<value>1000</value>
</property>
<property>
<name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
<value>600000</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.client.thread-count</name>
<value>50</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
</property>
<property>
<name>yarn.resourcemanager.max-completed-applications</name>
<value>10000</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
</configuration>
environment variables:
export HADOOP_HOME=/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=/etc/hadoop/etc/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
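Since every classpath entry above is built from HADOOP_MAPRED_HOME=/etc/hadoop, one hedged sanity check is whether that directory really holds a full Hadoop distribution rather than only configuration files:
# MRAppMaster lives in hadoop-mapreduce-client-app-*.jar; if this listing is
# empty, HADOOP_MAPRED_HOME points at a config-only directory and the AM
# container will fail with "Could not find or load main class ... MRAppMaster".
ls /etc/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-*.jar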
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/dfs/nn</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address</name>
<value>cluster.com:8022</value>
</property>
<property>
<name>dfs.https.address</name>
<value>cluster.com:9871</value>
</property>
<property>
<name>dfs.https.port</name>
<value>9871</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>cluster.com:9870</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>67108864</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>fs.permissions.umask-mode</name>
<value>022</value>
</property>
<property>
<name>dfs.client.block.write.locateFollowingBlock.retries</name>
<value>7</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>false</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hdfs-sockets/dn</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.skip.checksum</name>
<value>false</value>
</property>
<property>
<name>dfs.client.domain.socket.data.traffic</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/etc/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/etc/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/etc/hadoop</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/common/*,$HADOOP_MAPRED_HOME/share/hadoop/common/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/yarn/*,$HADOOP_MAPRED_HOME/share/hadoop/yarn/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/hdfs/*,$HADOOP_MAPRED_HOME/share/hadoop/hdfs/lib/*</value>
</property>
</configuration>
sqoop error:
Warning: /usr/lib/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/lib/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/lib/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2020-08-22 17:56:24,879 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2020-08-22 17:56:25,173 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2020-08-22 17:56:25,492 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
2020-08-22 17:56:25,579 INFO manager.SqlManager: Using default fetchSize of 1000
2020-08-22 17:56:25,579 INFO tool.CodeGenTool: Beginning code generation
2020-08-22 17:56:27,694 INFO manager.OracleManager: Time zone has been set to GMT
2020-08-22 17:56:27,883 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM CORE_ETL.DEPOSIT_TURNOVER t WHERE 1=0
2020-08-22 17:56:28,188 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /etc/hadoop
Note: /tmp/sqoop-hatef/compile/dc629ada72d032251eb72d68f8f68c85/CORE_ETL_DEPOSIT_TURNOVER.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2020-08-22 17:56:33,829 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hatef/compile/dc629ada72d032251eb72d68f8f68c85/CORE_ETL.DEPOSIT_TURNOVER.jar
2020-08-22 17:56:33,902 INFO mapreduce.ExportJobBase: Beginning export of CORE_ETL.DEPOSIT_TURNOVER
2020-08-22 17:56:33,902 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2020-08-22 17:56:34,381 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2020-08-22 17:56:36,685 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-08-22 17:56:38,545 INFO manager.OracleManager: Time zone has been set to GMT
2020-08-22 17:56:38,638 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2020-08-22 17:56:38,645 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2020-08-22 17:56:38,647 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2020-08-22 17:56:38,996 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hdp-name1-esxi12.sdb247.com/172.16.49.10:8032
2020-08-22 17:56:40,130 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/airflow/.staging/job_1597060731030_0459
2020-08-22 18:01:01,798 INFO input.FileInputFormat: Total input files to process : 1
2020-08-22 18:01:01,885 INFO input.FileInputFormat: Total input files to process : 1
2020-08-22 18:01:02,817 INFO mapreduce.JobSubmitter: number of splits:4
2020-08-22 18:01:02,999 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2020-08-22 18:01:05,962 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1597060731030_0459
2020-08-22 18:01:05,962 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-08-22 18:01:08,561 INFO conf.Configuration: resource-types.xml not found
2020-08-22 18:01:08,562 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-08-22 18:01:08,901 INFO impl.YarnClientImpl: Submitted application application_1597060731030_0459
2020-08-22 18:01:09,086 INFO mapreduce.Job: The url to track the job: http://hdp-name1-esxi12.sdb247.com:8088/proxy/application_1597060731030_0459/
2020-08-22 18:01:09,088 INFO mapreduce.Job: Running job: job_1597060731030_0459
2020-08-22 18:01:11,442 INFO mapreduce.Job: Job job_1597060731030_0459 running in uber mode : false
2020-08-22 18:01:11,444 INFO mapreduce.Job: map 0% reduce 0%
2020-08-22 18:01:11,671 INFO mapreduce.Job: Job job_1597060731030_0459 failed with state FAILED due to: Application application_1597060731030_0459 failed 2 times due to AM Container for appattempt_1597060731030_0459_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2020-08-22 18:03:19.337]Exception from container-launch.
Container id: container_1597060731030_0459_02_000001
Exit code: 1
[2020-08-22 18:03:19.338]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
[2020-08-22 18:03:19.339]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
For more detailed output, check the application tracking page: http://cluster.com:8088/cluster/app/application_1597060731030_0459 Then click on links to logs of each attempt.
. Failing the application.
2020-08-22 18:01:11,780 INFO mapreduce.Job: Counters: 0
2020-08-22 18:01:11,916 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
2020-08-22 18:01:11,921 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 273.1812 seconds (0 bytes/sec)
2020-08-22 18:01:12,013 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2020-08-22 18:01:12,015 INFO mapreduce.ExportJobBase: Exported 0 records.
2020-08-22 18:01:12,015 ERROR mapreduce.ExportJobBase: Export job failed!
2020-08-22 18:01:12,016 ERROR tool.ExportTool: Error during export:
Export job failed!
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:445)
at org.apache.sqoop.manager.OracleManager.exportTable(OracleManager.java:465)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
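What the diagnostic above is asking for is an absolute path to the Hadoop distribution directory rather than a configuration directory. A hedged example, where /opt/hadoop-3.3.0 is a hypothetical install location:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop-3.3.0</value> <!-- hypothetical path; use your real distribution directory -->
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop-3.3.0</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop-3.3.0</value>
</property>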
You mention you have a cluster installed with Cloudera, but it is not clear where Sqoop is running or where you got those XML files.
If you have a fully installed Cloudera cluster, Sqoop should already be installed and configured there for you to run without many issues (you might need extra JDBC drivers, but that should be it).
Otherwise, if you are trying to set up Sqoop (and Hadoop) externally, you'll want to grab a copy of the $HADOOP_HOME/etc/hadoop configuration folder from a worker node in the Hadoop cluster to make sure all the client configurations are the same.
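A hedged sketch of what copying the cluster's client configuration can look like in practice (host name and paths are placeholders):
# Pull the live *-site.xml files from a worker node so the local client uses
# the same addresses and classpath settings as the cluster.
scp 'hadoop@worker-node:/opt/hadoop/etc/hadoop/*-site.xml' "$HADOOP_CONF_DIR"/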
I'm trying to run a word count program using Hadoop 3.2.1 on my Ubuntu 20.04 virtual machine, but I've been getting a "resource-types.xml not found" message, and although it shows that the job is running, it never produces any output.
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PERPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdoop/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
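As an aside, fs.default.name is the deprecated name of this property; the current equivalent (same value) is fs.defaultFS:
<property>
<name>fs.defaultFS</name>
<value>hdfs://127.0.0.1:9000</value>
</property>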
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/richa/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/richa/hadoop/data/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
When I try to run my hadoop jar command, I get the following:
richa#richa-VirtualBox:~$ hadoop jar /home/richa/wc.jar WordCount /home/richa/input/wc_input.txt /home/richa/output
2020-08-31 08:37:38,144 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
2020-08-31 08:37:39,986 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2020-08-31 08:37:40,049 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/richa/.staging/job_1598842809949_0002
2020-08-31 08:37:40,419 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-08-31 08:37:40,862 INFO input.FileInputFormat: Total input files to process : 1
2020-08-31 08:37:41,020 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-08-31 08:37:41,512 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-08-31 08:37:41,573 INFO mapreduce.JobSubmitter: number of splits:1
2020-08-31 08:37:42,344 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-08-31 08:37:42,506 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1598842809949_0002
2020-08-31 08:37:42,507 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-08-31 08:37:43,130 INFO conf.Configuration: resource-types.xml not found
2020-08-31 08:37:43,131 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-08-31 08:37:43,351 INFO impl.YarnClientImpl: Submitted application application_1598842809949_0002
2020-08-31 08:37:43,462 INFO mapreduce.Job: The url to track the job: http://richa-VirtualBox:8088/proxy/application_1598842809949_0002/
2020-08-31 08:37:43,464 INFO mapreduce.Job: Running job: job_1598842809949_0002
I am unable to understand where I am going wrong. I have included almost all of the jar files. Did I miss something in my mapred-site.xml? Or should I wait longer for it to complete its job? How much time does such a small program even take to run?
All my environment variables are also correct.
Thank you in advance!
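For a job that sits at "Running job" like this, the application state and the AM logs usually say why. A hedged sketch using the application id from the output above (yarn logs needs log aggregation to be enabled):
# ACCEPTED usually means no NodeManager can satisfy the AM container request;
# RUNNING with no progress points at the map/reduce tasks themselves.
yarn application -status application_1598842809949_0002
# Fetch the aggregated container logs once an attempt has started or failed.
yarn logs -applicationId application_1598842809949_0002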
I am trying to teach myself some Hadoop basics and so have built a simple Hadoop cluster. This works and I can put, ls, and cat from the HDFS filesystem without any issues.
So I took the next step and tried to do a wordcount on a file I had put into HDFS, but I get the following error:
$ hadoop jar /home/hadoop/share/hadoop/mapreduce/*examples*.jar wordcount data/sectors.txt results
2018-06-06 07:57:36,936 INFO client.RMProxy: Connecting to ResourceManager at ansdb1/10.49.17.12:8040
2018-06-06 07:57:37,404 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1528191458385_0014
2018-06-06 07:57:37,734 INFO input.FileInputFormat: Total input files to process : 1
2018-06-06 07:57:37,869 INFO mapreduce.JobSubmitter: number of splits:1
2018-06-06 07:57:37,923 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-06-06 07:57:38,046 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528191458385_0014
2018-06-06 07:57:38,048 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-06-06 07:57:38,284 INFO conf.Configuration: resource-types.xml not found
2018-06-06 07:57:38,284 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-06-06 07:57:38,382 INFO impl.YarnClientImpl: Submitted application application_1528191458385_0014
2018-06-06 07:57:38,445 INFO mapreduce.Job: The url to track the job: http://ansdb1:8088/proxy/application_1528191458385_0014/
2018-06-06 07:57:38,446 INFO mapreduce.Job: Running job: job_1528191458385_0014
2018-06-06 07:57:45,499 INFO mapreduce.Job: Job job_1528191458385_0014 running in uber mode : false
2018-06-06 07:57:45,501 INFO mapreduce.Job: map 0% reduce 0%
2018-06-06 07:57:45,521 INFO mapreduce.Job: Job job_1528191458385_0014 failed with state FAILED due to: Application application_1528191458385_0014 failed 2 times due to AM Container for appattempt_1528191458385_0014_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2018-06-06 07:57:43.301]Exception from container-launch.
Container id: container_1528191458385_0014_02_000001
Exit code: 1
[2018-06-06 07:57:43.304]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
[2018-06-06 07:57:43.304]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
For more detailed output, check the application tracking page: http://ansdb1:8088/cluster/app/application_1528191458385_0014 Then click on links to logs of each attempt.
. Failing the application.
2018-06-06 07:57:45,558 INFO mapreduce.Job: Counters: 0
I have searched lots of websites and they seem to say that my environment isn't right. I have tried many of the suggested fixes, but nothing has worked.
Everything is running on both nodes:
$ jps
31858 ResourceManager
31544 SecondaryNameNode
6152 Jps
31275 DataNode
31132 NameNode
$ ssh ansdb2 jps
16615 NodeManager
21290 Jps
16478 DataNode
I can ls hadoop:
$ hadoop fs -ls /
Found 3 items
drwxrwxrwt - hadoop supergroup 0 2018-06-06 07:58 /tmp
drwxr-xr-x - hadoop supergroup 0 2018-06-05 11:46 /user
drwxr-xr-x - hadoop supergroup 0 2018-06-05 07:50 /usr
hadoop version:
$ hadoop version
Hadoop 3.1.0
Source code repository https://github.com/apache/hadoop -r 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d
Compiled by centos on 2018-03-30T00:00Z
Compiled with protoc 2.5.0
From source with checksum 14182d20c972b3e2105580a1ad6990
This command was run using /home/hadoop/share/hadoop/common/hadoop-common-3.1.0.jar
hadoop classpath:
$ hadoop classpath
/home/hadoop/etc/hadoop:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/share/hadoop/common/*:/home/hadoop/share/hadoop/hdfs:/home/hadoop/share/hadoop/hdfs/lib/*:/home/hadoop/share/hadoop/hdfs/*:/home/hadoop/share/hadoop/mapreduce/*:/home/hadoop/share/hadoop/yarn:/home/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/share/hadoop/yarn/*
My environment is set up as follows:
# hadoop
## JAVA env variables
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-7.b10.el7.x86_64
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
## HADOOP env variables
export HADOOP_HOME=/home/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
My Hadoop XML files:
core-site.xml:
$ cat $HADOOP_HOME/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ansdb1:9000/</value>
</property>
</configuration>
hdfs-site.xml:
$ cat $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/data/hadoop/datanode</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/data/hadoop/namenode</value>
</property>
<property>
<name>dfs.checkpoint.dir</name>
<value>/data/hadoop/secondarynamenode</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
yarn-site.xml:
$ cat $HADOOP_HOME/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ansdb1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>ansdb1:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>ansdb1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>ansdb1:8040</value>
</property>
</configuration>
mapred-site.xml:
$ cat $HADOOP_HOME/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
I have checked which jar file contains MRAppMaster:
$ find /home/hadoop -name '*.jar' -exec grep -Hls MRAppMaster {} \;
/home/hadoop/share/hadoop/mapreduce/sources/hadoop-mapreduce-client-app-3.1.0-sources.jar
/home/hadoop/share/hadoop/mapreduce/sources/hadoop-mapreduce-client-app-3.1.0-test-sources.jar
/home/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-3.1.0.jar
Clearly I am missing something, so could somebody please point me in the right direction.
After much googling of the same question asked different ways, I found this https://mathsigit.github.io/blog_page/2017/11/16/hole-of-submitting-mr-of-hadoop300RC0/ (it's in Chinese).
So I set the following properties in mapred-site.xml
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
And everything works
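An equivalent fix that is sometimes used instead of (or alongside) the env properties is putting the MapReduce jars on the job classpath explicitly; a hedged sketch:
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>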
I'm trying to run a wordcount jar on a Hadoop 2.7.1 cluster (one master and 4 slaves), but the MapReduce job gets stuck at:
$ hadoop jar wc.jar WordCount /input /output_hocine
17/03/13 09:41:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/13 09:41:43 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/03/13 09:41:43 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/03/13 09:41:44 INFO input.FileInputFormat: Total input paths to process : 3
17/03/13 09:41:44 INFO mapreduce.JobSubmitter: number of splits:3
17/03/13 09:41:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1489393376058_0003
17/03/13 09:41:44 INFO impl.YarnClientImpl: Submitted application application_1489393376058_0003
17/03/13 09:41:44 INFO mapreduce.Job: The url to track the job: http://ibnbadis21:8088/proxy/application_1489393376058_0003/
17/03/13 09:41:44 INFO mapreduce.Job: Running job: job_1489393376058_0003
The output shown in the browser (the ResourceManager web UI) is in a screenshot that is not reproduced here.
Here is the content of the configuration files:
core-site.xml:
<configuration>
<!-- <property>
<name>fs.defaultFS</name>
<value>hdfs://ibnbadis21:9000</value>
</property>-->
<property>
<name>fs.default.name</name>
<value>hdfs://ibnbadis21:9000</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
yarn-site.xml:
<?xml version="1.0"?> <configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
mapred-site.xml:
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>ibnbadis21:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>ibnbadis21:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user/app</value>
</property>
</configuration>
hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:/usr/local/hadoop_data/hdfs/namesecondary</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_data/hdfs/datanode</value>
</property>
</configuration>
Can anyone tell me how I can solve this problem, please?
Connecting to ResourceManager at /0.0.0.0:8032
0.0.0.0 (the default) is not a valid hostname.
So, add this in yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value> YOUR VALUE HERE </value> <!-- Needs Fully Qualified Domain Name -->
</property>
There are many values that you probably didn't set.
Refer Hadoop | Configuring the Hadoop Daemons
By the way, fs.defaultFS is the correct property to use.
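For this cluster that would look like the following (host name taken from the question's own core-site.xml):
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ibnbadis21</value>
</property>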
Finally, the problem was about access rights. The framework didn't have permission to read my yarn-site.xml file; that's why it fell back to the default 0.0.0.0 address. So when I executed the command with privileges (sudo):
sudo hadoop jar wc.jar WordCount /input /output
My MapReduce job executed successfully!
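Running the job as root works around the symptom; a hedged sketch of fixing the underlying file permissions instead (the path is an assumed install location, adjust as needed):
# Make the Hadoop configuration directory readable by the user submitting jobs.
sudo chmod -R a+rX /usr/local/hadoop/etc/hadoop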
I want to test out a cluster of a few computers, each with 2 cores and 256 MB of RAM. Following Cloudera's tutorial, I've tried telling Hadoop 2.6.0 about my low-resource NodeManagers (Ubuntu 14.04). I have the following configuration:
mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoop-master:54311</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop-master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop-master:19888</value>
</property>
<property>
<name>mapred.task.profile</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>200</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>200</value>
</property>
<property>
<name>mapreduce.map.java.opts.max.heap</name>
<value>160</value>
</property>
<property>
<name>mapreduce.reduce.java.opts.max.heap</name>
<value>160</value>
</property>
</configuration>
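Note that mapreduce.map.java.opts.max.heap and mapreduce.reduce.java.opts.max.heap are Cloudera Manager style names; on a plain Apache install the task heap is normally set with the upstream properties instead. A hedged equivalent of the 160 MB heaps above:
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx160m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx160m</value>
</property>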
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value> org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>200</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>200</value>
</property>
<property>
<name>yarn.scheduler.increment-allocation-mb</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop-master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop-master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop-master:8050</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/app-logs</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///usr/local/hadoop/local</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>200</value>
</property>
</configuration>
But when I try to run a small pi example, I get this error:
yarn jar hadoop-mapreduce-examples-2.6.0.jar pi 1 1
Number of Maps = 1
Samples per Map = 1
Wrote input for Map #0
Starting Job
16/01/28 19:23:24 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/10.0.3.100:8050
16/01/28 19:23:25 INFO input.FileInputFormat: Total input paths to process : 1
16/01/28 19:23:25 INFO mapreduce.JobSubmitter: number of splits:1
16/01/28 19:23:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1454008935455_0001
16/01/28 19:23:26 INFO impl.YarnClientImpl: Submitted application application_1454008935455_0001
16/01/28 19:23:26 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1454008935455_0001/
16/01/28 19:23:26 INFO mapreduce.Job: Running job: job_1454008935455_0001
16/01/28 19:23:34 INFO mapreduce.Job: Job job_1454008935455_0001 running in uber mode : false
16/01/28 19:23:34 INFO mapreduce.Job: map 0% reduce 0%
16/01/28 19:23:34 INFO mapreduce.Job: Job job_1454008935455_0001 failed with state FAILED due to: Application application_1454008935455_0001 failed 2 times due to AM Container for appattempt_1454008935455_0001_000002 exited with exitCode: -103
For more detailed output, check application tracking page:http://hadoop-master:8088/proxy/application_1454008935455_0001/Then, click on links to logs of each attempt.
Diagnostics: Container [pid=847,containerID=container_1454008935455_0001_02_000001] is running beyond virtual memory limits. Current usage: 210.8 MB of 200 MB physical memory used; 1.3 GB of 420.0 MB virtual memory used. Killing container.
Dump of the process-tree for container_1454008935455_0001_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 855 847 847 847 (java) 466 16 1410424832 53695 /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/usr/local/hadoop/logs/userlogs/application_1454008935455_0001/container_1454008935455_0001_02_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster
|- 847 845 847 847 (bash) 0 0 5431296 276 /bin/bash -c /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/usr/local/hadoop/logs/userlogs/application_1454008935455_0001/container_1454008935455_0001_02_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/usr/local/hadoop/logs/userlogs/application_1454008935455_0001/container_1454008935455_0001_02_000001/stdout 2>/usr/local/hadoop/logs/userlogs/application_1454008935455_0001/container_1454008935455_0001_02_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
16/01/28 19:23:34 INFO mapreduce.Job: Counters: 0
Job Finished in 9.962 seconds
java.io.FileNotFoundException: File does not exist: hdfs://hadoop-master:9000/user/hduser/QuasiMonteCarlo_1454009003268_765740795/out/reduce-out
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1750)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Is there an error in this configuration? Or maybe Hadoop isn't made for such low resources. I'm just doing this for learning purposes.
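One detail visible in the process dump above: the AM JVM is launched with -Xmx1024m (the stock default of yarn.app.mapreduce.am.command-opts), which cannot fit inside a 200 MB container. A hedged addition that shrinks the AM heap to match yarn.app.mapreduce.am.resource.mb:
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx160m</value> <!-- keep the AM heap below yarn.app.mapreduce.am.resource.mb (200) -->
</property>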
Yes, you'll get into trouble with low resources. For testing purposes, disable the memory checks:
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
For yarn.scheduler.minimum-allocation-mb you might go even lower, because the memory actually reserved is allocated in incremental steps; i.e. if you set it to 100 and request 101, YARN will round it up to 200.
The vmem check is unreliable and, in my opinion, should really be disabled in YARN by default.
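For reference, the "420.0 MB virtual memory" limit in the diagnostic comes from yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1 (2.1 × 200 MB ≈ 420 MB). Raising the ratio is a milder alternative to disabling the check entirely; a hedged example (NodeManagers need a restart after the change):
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value> <!-- allow 4x the physical allocation as virtual memory; example value -->
</property>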