How to configure the actual setting for localhost? - hadoop

My core-site.xml is configured like this.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Now, when I enter 'start-all.cmd' in the command prompt, I see the services start up. When I enter 'localhost:9000' into my web browser, I get an error message, but when I enter 'localhost:8088', I see the Hadoop cluster, which is up and running just fine. It seems like core-site.xml is being ignored and 'localhost:8088' is picked up from somewhere else, but I can't find where. Can someone give me a quick and dirty description of how this actually works? I already googled for an answer, but I didn't see anything useful about this.

Format the name node using this:
hdfs namenode -format
For more information, follow the installation steps from this site; they work perfectly fine:
http://pingax.com/install-hadoop2-6-0-on-ubuntu/
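To put the ports in context (a rough sketch of how this fits together, assuming Hadoop 2.x defaults): the localhost:9000 value in fs.default.name is the NameNode's RPC endpoint, not a web page, so a browser pointed at it will only show an error. The ResourceManager web UI listens on 8088 by default, which is why localhost:8088 works, and the NameNode web UI listens on 50070. After formatting the name node and starting the services, you can verify HDFS from the command line (/user/test is just an example path):
hdfs namenode -format
start-all.cmd
hdfs dfs -mkdir -p /user/test
hdfs dfs -ls /
If those commands succeed, core-site.xml is being read correctly and http://localhost:50070 should show the NameNode status page.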

Related

Unable to install hadoop on macosx

I am unable to run Hadoop on my Mac OS X machine.
When I run hadoop version -> nothing happens
When I run sudo hadoop version -> it shows my Hadoop version. I read somewhere that I shouldn't be using sudo to run Hadoop, but anyway this tells me that my Hadoop is installed?
Because hadoop version returns nothing, I am unable to start any nodes. Every time I try to start a node with start-dfs.sh, nothing happens either. Does anyone know what's happening here? I've looked through all the configuration files multiple times to ensure that I have set them right. I'm not sure where the problem lies.
What I did:
Followed the instructions from here
In my config files I edited:
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
etc/hadoop/hadoop-env.sh
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home
In fact, I managed to run everything before.
I am not sure what I did after that, but I am unable to start the nodes again. I think it might be because I failed to stop my nodes before I shut down my computer?
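One quick thing to check (a sketch only; the idea that earlier sudo runs left root-owned files behind is an assumption, not something confirmed in the question, and /usr/local/hadoop is only a guess at the install location):
which hadoop
echo $JAVA_HOME
ls -l "$(which hadoop)"
ls -l /usr/local/hadoop/logs 2>/dev/null
# if the log or pid files are owned by root from an earlier sudo run,
# chown them back to your user before retrying start-dfs.sh
If which hadoop prints nothing, the Hadoop bin directory is not on your normal user's PATH (sudo uses a different PATH), which could also explain why hadoop version behaves differently under sudo.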

how do you enable webhdfs?

I am trying to use webhdfs.
I have placed these lines in my hdfs-site.xml file:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
I did a stop-all.sh and start-all.sh on the Hadoop cluster.
I am trying to see if I can use webhdfs by this url call:
http://<myserver>:50070/webhdfs/v1/user/root/output/?op=LISTSTATUS
if I just do:
http://<myserver>:50070
I get the Hadoop overview page.
So I can remotely access my server at port 50070, but the WebHDFS piece does not seem to be working. I would like to be able to access HDFS using REST API calls to either read or put files.
Any ideas what I am missing here?
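For reference, this is roughly what a working WebHDFS exchange looks like from the command line (a sketch assuming default Hadoop 2.x ports; <myserver> and the /user/root/output path come from the question, and newfile.txt is just an example path):
curl -i "http://<myserver>:50070/webhdfs/v1/user/root/output?op=LISTSTATUS"
curl -i -X PUT "http://<myserver>:50070/webhdfs/v1/user/root/newfile.txt?op=CREATE&overwrite=true"
# the CREATE call answers with a 307 redirect; PUT the file body to the Location URL it returns,
# which points at a DataNode (port 50075 by default)
If LISTSTATUS returns a 404 or connection error while the overview page on :50070 works, double-check that the dfs.webhdfs.enabled property really ended up inside the <configuration> element of hdfs-site.xml on the NameNode and that HDFS was restarted afterwards.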

Namenode not starting -su: /home/hduser/../libexec/hadoop-config.sh: No such file or directory

Installed Hadoop 2.7.1 on Ubuntu 15.10
Everything is working fine, except that when I run jps I can see all the daemons running except the namenode.
At startup it shows: -su: /home/hduser/../libexec/hadoop-config.sh: No such file or directory
When I googled it, I came to know that I can ignore this, as my
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
are set properly and hduser (the user that runs Hadoop) owns these folders.
Any clue?
After spending some time, this simple change worked for me:
Run ifconfig and copy your IP address.
sudo gedit /etc/hosts
Comment out this line:
#127.0.0.1 localhost
Add the following line (replace 10.0.2.15 with your own IP address):
10.0.2.15 Hadoop-NameNode
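To confirm the mapping took effect before restarting Hadoop (a small sketch; Hadoop-NameNode is the hostname from the step above):
ping -c 1 Hadoop-NameNode
hostname
# the name you add here should match the hostname Hadoop resolves for the namenode,
# otherwise the namenode may still bind to the old 127.0.0.1 entry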
This might be a problem due to frequent NameNode formatting. Please check the namenode logs.
Probable solution:
Check your hadoop.tmp.dir in core-site.xml.
Under that location, make sure that the namenode and datanode have the same clusterID (otherwise make them the same).
You can see the clusterID inside the VERSION file in dfs/name/current and dfs/data/current, if that makes sense.
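A rough sketch of that check, assuming the directory layout from the question's dfs.namenode.name.dir and dfs.datanode.data.dir values:
grep clusterID /usr/local/hadoop_store/hdfs/namenode/current/VERSION
grep clusterID /usr/local/hadoop_store/hdfs/datanode/current/VERSION
# if the two IDs differ, copy the namenode's clusterID into the datanode's VERSION file
# (or wipe the datanode directory and let it re-register), then restart HDFS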

Hadoop/MR temporary directory

I've been struggling with getting Hadoop and Map/Reduce to start using a separate temporary directory instead of the /tmp on my root directory.
I've added the following to my core-site.xml config file:
<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp</value>
</property>
I've added the following to my mapreduce-site.xml config file:
<property>
<name>mapreduce.cluster.local.dir</name>
<value>${hadoop.tmp.dir}/mapred/local</value>
</property>
<property>
<name>mapreduce.jobtracker.system.dir</name>
<value>${hadoop.tmp.dir}/mapred/system</value>
</property>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<value>${hadoop.tmp.dir}/mapred/staging</value>
</property>
<property>
<name>mapreduce.cluster.temp.dir</name>
<value>${hadoop.tmp.dir}/mapred/temp</value>
</property>
No matter what job I run though, it's still doing all of the intermediate work out in the /tmp directory. I've been watching it do it via df -h and when I go in there, there are all of the temporary files it creates.
Am I missing something from the config?
This is on a 10 node Linux CentOS cluster running 2.1.0.2.0.6.0 of Hadoop/Yarn Mapreduce.
EDIT:
After some further research, the settings seem to be working on my management and namenode/secondarynamenode boxes. It is only on the data nodes that this is not working, and it is only the mapreduce temporary output files that are still going to /tmp on my root drive instead of the data mount I set in the configuration files.
If you are running Hadoop 2.0, then the proper name of the config file you need to change is mapred-site.xml, not mapreduce-site.xml.
An example can be found on the Apache site: http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
and it uses the mapreduce.cluster.local.dir property name, with a default value of ${hadoop.tmp.dir}/mapred/local
Try renaming your mapreduce-site.xml file to mapred-site.xml in your /etc/hadoop/conf/ directories and see if that fixes it.
If you are using Ambari, you should be able to just use the "Add Property" button in the MapReduce2 / Custom mapred-site.xml section, enter 'mapreduce.cluster.local.dir' for the property name, and a comma-separated list of directories you want to use.
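For illustration, a minimal mapred-site.xml along these lines might look like the following (a sketch only; /data/tmp is the hadoop.tmp.dir from the question and mapreduce.cluster.local.dir is the standard property name from mapred-default.xml):
<configuration>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>${hadoop.tmp.dir}/mapred/local</value>
</property>
<!-- on a YARN cluster the NodeManager scratch space (yarn.nodemanager.local-dirs) also defaults under hadoop.tmp.dir, so check that on the data nodes too -->
</configuration>
The file has to be present on every node, including the data nodes, and the MapReduce/YARN services restarted, for a change to take effect there.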
I think you need to specify this property in hdfs-site.xml rather than core-site.xml. Try setting it there; I hope this will solve your problem.
The mapreduce properties should be in mapred-site.xml.
I was facing a similar issue where some nodes would not honor the hadoop.tmp.dir set in the config.
A reboot of the misbehaving nodes fixed it for me.
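As a quick diagnostic on a node that seems to ignore the setting, you can print what that node actually resolved from its local configuration files (a sketch; run it on the misbehaving data node):
hdfs getconf -confKey hadoop.tmp.dir
If it still prints a path under /tmp, the edited core-site.xml is not the one that node is reading.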

Bigtop Hbase tables disappeared after PC restart

I installed Bigtop 0.7.0 on Ubuntu 12.04 and started the master server without any problem with:
sudo hbase master start
I was able to connect with hbase shell and create a table.
After I restarted the PC, I saw that the table was not there anymore.
I read that the problem is that it stores tables in /tmp, which is cleared on restart, so I tried to change hbase-site.xml to set another folder.
the default hbase-site.xml was:
<configuration/>
(No properties defined)
When I edited hbase-site.xml and then tried to start the HBase master again, I received a ZooKeeper client exception saying it was not possible to connect to the server.
Can you please give me some advice on how to configure this right or if there is maybe some other problem that I'm not aware of?
EDIT (from the comments):
My hbase-site.xml is:
<configuration>
<!--property>
<name>hbase.rootdir</name>
<value>file://app/hadoop/tmp/hbase</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<property-->
</configuration>
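For comparison, a minimal hbase-site.xml that keeps data out of /tmp might look like the following (a sketch only; /app/hadoop/tmp is the directory from the question, and note the three slashes in file:///, since with only two the "app" segment is parsed as the URI's host rather than part of the path). Also, as posted above, the whole property block sits inside a single XML comment (<!-- ... -->), so HBase currently ignores both properties:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///app/hadoop/tmp/hbase</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
</configuration>
Standalone HBase also keeps its embedded ZooKeeper data under hbase.tmp.dir (hbase.zookeeper.property.dataDir defaults to ${hbase.tmp.dir}/zookeeper), so moving hbase.tmp.dir out of /tmp covers that as well.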
