hadoop wordcount and upload file into hdfs - hadoop

hello everyone i am very new in hadoop and i install hadoop in pseudo mode.
configurations files are here
core-site.xml
<configuration>
<property>
<name>fs.default.name </name>
<value> hdfs://localhost:9000 </value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop_usr/hadoopinfra/hdfs/namenode </value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop_usr/hadoopinfra/hdfs/datanode </value>
</property>
</configuration>
and am successfully start datanode and namenode
Now i want to put my file into hdfs by using following way
what's going wrong why i get error message. Please help me to resolve this problem
If i using following way to put file into hdfs that time command is working fine. now i appand hdfs url.
Please help me why i getting error in first way.
Because when in running my wordcount.jar that time am also getting error message when i mentioned data.txt as input file on which operation sould be performed.
Thanks in advance.

The reason the first put operation to data/data.txt is not working is likely that you do not have a folder data in your hdfs yet.
You can just create it using hadoop fs -mkdir /data.

Related

Hadoop localhost:9870 browser interface is not working

I need to do data analysis using Hadoop. Therefore I have installed Hadoop and configured as below. But localhost:9870 is not working. Even I have format namenode every time I worked with that. Some articles and answers of this forum mentioned that 9870 is the updated one from 50070. I have win 10. I also referred answers in this forum but none of them worked. Java-home and hadoop-home paths are set. Paths to bin and sbin of hadoop are also set up. Can anyone please tell me what I am doing wrong in here?
I referred this site to do the installation and configuration.
https://medium.com/#pedro.a.hdez.a/hadoop-3-2-2-installation-guide-for-windows-10-454f5b5c22d3
core-site.xml
I have set up the Java path in this xml as well.
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9870</value>
</property>
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop-3.2.2\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop-3.2.2\data\datanode</value>
</property>
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
If you look at the namenode logs, it very likely has an error saying something about a port already being in use.
The default fs.defaultFS port should be 9000 - https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html ; you shouldn't change this without good reason.
The Namenode web UI isn't the value in fs.defaultFS. It's default port is 9870, and is defined by dfs.namenode.http-address in hdfs-site.xml
need to do data analysis
You can do analysis on Windows without Hadoop using Spark, Hive, MapReduce, etc. directly and it'll have direct access to your machine without being limited by YARN container sizes.

File is not Loaded Into HDFS from Local Using Streamsets (validated Successfully!)

I just have started using streamsets, and i'm trying to load a text file from local to HDFS.
Please note: I'm using Cloudera Manager, here is a view of "core-site.xml":
<property>
<name>hadoop.ssl.server.conf</name>
<value>ssl-server.xml</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.client.conf</name>
<value>ssl-client.xml</value>
<final>true</final>
</property>
<property>
<name>hadoop.proxyuser.sdc.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.sdc.groups</name>
<value>*</value>
</property>
</configuration>
The local file is a text file stored in "/home/cloudera/Desktop".
Here is a view of the source (Local) configuration in Streamsets:
Here is a view of Hadoop fs configuration in Streamsets:
It was validated successfully!
After I played the pipline, I'm supposed to find the file in HDFS directory that I specified, especially at "/user/cloudera".
But when I run it the file hasn't been loaded.
I'm sure I missed something, and I couldn't find answer for this.
Could you please help!
Thanks,
You need to play the pipeline, not only validate it.

Pseudo Distributed Mode Hadoop

I have installed Pseudo Distributed mode Hadoop 2.7.3 in Mac & did all Configuration which is specified in Plural Sight. I Copied Csv file from Local to hdfs. But next day when i searched for files , it is not present in hdfs and removed automatically. Is there any other conf setting so that my files are not loss?
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Thanks,
Add these properties to hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/username/hadoop-dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/username/hadoop-dfs/data</value>
</property>
The metadata and data blocks are stored under /tmp by default as it is the value of hadoop.tmp.dir. The contents inside /tmp are deleted on reboot.
After adding these properties, format the namenode and start the services.

Cannot create directory /home/hadoop/hadoopinfra/hdfs/namenode/current

I get the error
Cannot create directory /home/hadoop/hadoopinfra/hdfs/namenode/current
While trying to install hadoop on my local Mac.
What could be the reason for this? Just for reference, I'm putting my xml files down below:
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode </value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode </value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
I think my problem lies in my hdfs-site.xml file, but I'm not sure how to pinpoint/change it.
I'm using this tutorial, and "hadoop" in the file path is replaced by my username.
Possible error: misconfiguration of the hdfs-site.xml file
This happened to me when I was following a setup tutorial. The contents of the hdfs-site.xml for me was
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/nameNode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/dataNode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Only then I realized that the text hadoop in the above file corresponds to the user name, where in my case, it had to replaced with hduser. When both occurrences of hadoop was replaced with hduser, the hdfs namenode -format command worked fine.
I had this problem too and it was a permission problem. I just did:
sudo chmod 777 /home/hadoop/hadoopinfra/hdfs/namenode/
and works!
In the step where you need to verify the hadoop installation, instead of 'hdfs namenode -format' use '/usr/local/hadoop/bin/hdfs namenode -format'
Found this answer from:
hadoop java.io.IOException: while running namenode -format
If you are not using any other distro than native hadoop, then add the current user to hadoop group and retry formatting the namenode.
sudo usermod -a -G hadoop <current-username>
In case of using thirdparty hadoop distros such Cloudera, Hortonworks or MapR, switch to root user and again switch to hdfs user then try formatting the namenode will succeed.
$ sudo -i
$ su - hdfs
$ hdfs namenode -format
Try the Hadoop command with sudo

issue with hadoop secondary node

I am new to hadoop. When I run wordcount test project, evrything works fine. But, I can't access the JobTracker at http://localhost:50030. in fact, when I get my secondary node log file, I get exception message :
java.io.IOException: Bad edit log manifest (expected txid = 3: [[21,22], [23,24]
[8683,8684], [8685,8686], [8687,8688], [8689,8690], [8691,8692], [8693,8694], [8695,8696], [8697,8698], [8699,8700]]...
....
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:438)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:540)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:395)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357)
at java.lang.Thread.run(Thread.java:745)
Btw, when I run jps, I get 53745 JobHistoryServer 77259 Jps
UPDATE : here's my config
in core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
in hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9010</value>
</property>
</configuration>
and nothing is set in my yarn-site.xml
If you are using latest version of Hadoop, then Job Tracker will not be available. Job tracker is replaced by Resource Manager and History Server.
If you want to access past job details, go to http://hostname:19888. This is the web UI address for job history server.
Please refer Hadoop Cluster Setup for further details.

Resources