Hadoop not using config files?

I'm deploying Hadoop at work and I've been troubleshooting it for a few days. Yesterday it was working perfectly, but today something strange is happening.
I have hadoop.tmp.dir set in core-site.xml, as well as the other HDFS directories (datanode, namenode and secondarynamenode) in hdfs-site.xml. But today, when I format the filesystem, it creates all the files in /tmp instead of /usr/local/hadoop/tmp, which is the one I have configured.
$ bin/hdfs namenode -format
[...]
INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
[...]
core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/usr/local/hadoop/hdfs/secondname</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hdfs/datanode</value>
</property>
Does anyone have any clue about what's happening?
Thanks!

Make sure this directory exists and has sufficient permissions.
Give the path as file:///usr/local/hadoop/tmp.
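A minimal sketch of both steps, assuming the Hadoop service user and group are both called hadoop (adjust to your setup):
sudo mkdir -p /usr/local/hadoop/tmp
sudo chown -R hadoop:hadoop /usr/local/hadoop/tmp   # assumed user:group
And in core-site.xml, with the explicit scheme as suggested above:
<property>
<name>hadoop.tmp.dir</name>
<value>file:///usr/local/hadoop/tmp</value>
</property>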

Found what was wrong, and it was really embarrassing. My hadoop user had bash as its default shell, but it wasn't loading the profile correctly until I explicitly ran "bash" on the command line.
I spotted it with the printenv command.
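For anyone checking for the same thing, a quick sanity check (the variable names are the usual Hadoop ones and are only an assumption about this setup):
printenv | grep -i -E 'hadoop|java_home'   # should list HADOOP_HOME, HADOOP_CONF_DIR, JAVA_HOME, ...
echo $SHELL                                # confirms which login shell is configured
If the variables are missing, export them from a file your login shell actually sources (e.g. ~/.bashrc or ~/.profile).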

Related

Hadoop-Apache Ranger: StackOverflowError on namenode restart

I am getting this error after enabling the HDFS plugin in Apache Ranger.
When I run enable-hdfs-plugin.sh, Ranger adds the following configuration to hdfs-site.xml:
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.inode.attributes.provider.class</name>
<value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
</property>
But if I remove the properties above and restart my namenode, it starts with no error. Also, when I try to format the namenode, it gives me the same error.
This is the install.properties of Ranger's hdfs-plugin.
Link ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-hdfs-plugin-impl to /var/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib/ranger-hdfs-plugin-impl
Link ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-hdfs-plugin-shim-1.0.0-SNAPSHOT.jar to /var/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib/ranger-hdfs-plugin-shim-1.0.0-SNAPSHOT.jar
Link ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-plugin-classloader-1.0.0-SNAPSHOT.jar to /var/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib/ranger-plugin-classloader-1.0.0-SNAPSHOT.jar
Follow these instructions, adjusting them to your own file paths. The problem is that the Ranger plugin classloader is not found on your Hadoop classpath.
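As a sketch, those links would be created roughly like this (the destination directory is the one from the answer; the Ranger plugin directory is assumed to be wherever you extracted the plugin tarball):
cd /var/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib
ln -s /path/to/ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-hdfs-plugin-impl .
ln -s /path/to/ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-hdfs-plugin-shim-1.0.0-SNAPSHOT.jar .
ln -s /path/to/ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-plugin-classloader-1.0.0-SNAPSHOT.jar .
Then restart the namenode so it picks the jars up from its classpath.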

Hadoop fs -ls outputs current working directory's files rather than hdfs volume's files

Have set up a single pseudo-distributed node (localhost) with Hadoop 2.8.2 on OpenSuse Tumbleweed 20170703. Java version is 1.8.0_151. Generally, it seems to be set up correctly. I can format namenode with no errors etc.
However, when I try hadoop fs -ls, files/dirs from the current working directory are returned rather than the expected behaviour of returning the hdfs volume files (which should be nothing at the moment).
Was originally following this guide for CentOS (making changes as required) and the Apache Hadoop guide.
I'm assuming that it's a config issue, but I can't see why it would be. I've played around with core-site.xml and hdfs-site.xml as per below with no luck.
/opt/hadoop-hdfs-volume/ exists and is assigned to user hadoop in user group hadoop. As is the /opt/hadoop/ directory (for bin stuff).
EDIT:
/tmp/hadoop-hadoop/dfs/name is where the hdfs namenode -format command writes. /tmp/ also seems to hold directories for my user (/tmp/hadoop-dijksterhuis) and the hadoop user.
This seems odd to me considering the *-site.xml config files below.
Have tried restarting the dfs and yarn services with the .sh scripts in the hadoop/sbin/ directory. Have also rebooted. No luck!
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-hdfs-volume/${user.name}</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>${hadoop.tmp.dir}/dfs/data</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>${hadoop.tmp.dir}/dfs/name</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Anyone have any ideas? I can provide more details if needed.
Managed to hack a fix via another SO answer:
Add export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop to the hadoop user's .bashrc.
This overrides the value in etc/hadoop/hadoop-env.sh, which kept pointing the namenode to the default /tmp/hadoop-${user.name} directory (and, presumably, left fs.defaultFS at its local-filesystem default, which is why -ls was listing local files).
source .bashrc et voilà! Problem fixed.
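Concretely, the change is just this (paths taken from the answer above):
# appended to the hadoop user's ~/.bashrc
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
Then reload and re-check:
source ~/.bashrc
hdfs namenode -format    # should now write under /opt/hadoop-hdfs-volume/<user>/dfs/name
hdfs dfs -ls /           # should now list the (empty) HDFS root, not local files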

hadoop wordcount and upload file into hdfs

Hello everyone. I am very new to Hadoop and I have installed it in pseudo-distributed mode.
The configuration files are below.
core-site.xml
<configuration>
<property>
<name>fs.default.name </name>
<value> hdfs://localhost:9000 </value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop_usr/hadoopinfra/hdfs/namenode </value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop_usr/hadoopinfra/hdfs/datanode </value>
</property>
</configuration>
I successfully started the datanode and namenode.
Now I want to put my file into HDFS in the following way.
What's going wrong? Why do I get an error message? Please help me resolve this problem.
If I use the following way to put the file into HDFS, the command works fine; here I append the HDFS URL.
Please help me understand why I get an error the first way.
I also get an error message when I run my wordcount.jar and mention data.txt as the input file on which the operation should be performed.
Thanks in advance.
The reason the first put operation to data/data.txt is not working is likely that you do not have a data folder in HDFS yet.
You can create it with hadoop fs -mkdir /data.
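A minimal sequence, assuming the local file is called data.txt as in the question:
hadoop fs -mkdir -p /data               # create the target directory first
hadoop fs -put data.txt /data/data.txt  # then the put succeeds
hadoop fs -ls /data                     # verify the upload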

Incorrect HDFS File URL

I was trying to access a file present in HDFS (location: /user/input/UsageSummary.txt). I am not sure what the URL for this file should be.
I tried this URL, but it did not work:
hdfs://127.0.0.1:9000/user/input/UsageSummary.txt
I even tried these, but none of them worked:
hdfs://localhost:9000/user/input/UsageSummary.txt
hdfs://localhost/user/input/UsageSummary.txt
Please let me know how to find the correct URL.
EDIT
This is the content of core-site.xml file:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
<!-- HTTPFS proxy user setting -->
<property>
<name>hadoop.proxyuser.tomcat.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.tomcat.groups</name>
<value>*</value>
</property>
</configuration>
Typically, the HDFS resource path is
hdfs://<NameNodeHost>:<port>/path/to/resource
If you just want to print a file content, the below is sufficient.
hadoop fs -cat /user/input/UsageSummary.txt
What output or error are you getting, and what mode are you running Hadoop in: local, fully distributed or pseudo-distributed?
What do you have set as fs.defaultFS in your core-site.xml? If it is set to hdfs://host:port/ you should be able to run something like
hdfs dfs -cat /user/input/UsageSummary.txt
or run
hdfs dfs -ls /
to list the root and make sure the directory structure exists.
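One more thing worth noting from the core-site.xml in the edit: fs.default.name is set to hdfs://localhost:8020, so the URL presumably needs to use port 8020 rather than 9000:
hdfs://localhost:8020/user/input/UsageSummary.txt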

Error in starting hadoop Job Tracker

I tried to run a simple program in Hadoop using Windows-Cygwin.
I am able to start the namenode.
The jobtracker, however, fails to start with the exception:
FATAL mapred.JobTracker: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: local
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2560)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2200)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2192)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2186)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:300)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:291)
at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4978)
I tried all possible methods to resolve this, but in vain. Any pointers would greatly help me.
hdfs-site.xml configuration:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9100</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
The problem is that the following lines should go into mapred-site.xml and NOT hdfs-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
By the way, why are you trying to run Hadoop on Windows? For development? Do you not have a Linux machine, or are you reluctant to install one?
One more thing: you usually put this property in core-site.xml, not hdfs-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9100</value>
</property>
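Put together, a minimal split across the three files would look something like this (host and ports taken from the question; Hadoop 1.x property names, matching the mapred.JobTracker stack trace):
core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9100</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>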
I faced the same issue when working on the "Pseudo-Distributed" examples at this page: http://hadoop.apache.org/docs/r1.1.2/single_node_setup.html#PseudoDistributed
It turned out that Hadoop simply wasn't picking up my conf files. The examples at the link above assume you are running from your Hadoop install directory (i.e. /Usr/jane/hadoop-1.1.2). I was trying to run the examples from another directory. I'm sure you could configure Hadoop to recognize other 'conf' directories, but I took the easy route and just started running from my hadoop directory.
This thread helped me figure it out: https://issues.apache.org/jira/browse/HDFS-2515
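On the "other conf directories" point: the hadoop launcher accepts a --config option (and respects the HADOOP_CONF_DIR environment variable), so a sketch of running from a different directory would be (the conf path and example jar here just follow the install path mentioned above):
hadoop --config /Usr/jane/hadoop-1.1.2/conf jar hadoop-examples-1.1.2.jar wordcount input output
or, equivalently:
export HADOOP_CONF_DIR=/Usr/jane/hadoop-1.1.2/conf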
