I want to do some computation with hadoop and mahout on my quad core machine, so I am using hadoop in pseudo-distributed mode.
The problem is that the space on my root drive is limited, so how can I configure it to use the space available on another external hard drive?
You can configure where HDFS stores its data. Add the following to your conf/hdfs-site.xml:
<property>
<name>dfs.data.dir</name>
<value>__path_to_where_you_want_to_store_your_data/hdfs/data/</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>__path_to_where_you_want_to_store_your_data/hdfs/name/</value>
</property>
After these changes you will have to format your namenode:
hadoop namenode -format
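For reference, the whole switch might look roughly like this on a pseudo-distributed setup (the mount point /mnt/external is just a placeholder, the start/stop scripts are assumed to be on your PATH, and formatting wipes any data already in HDFS):
# stop HDFS if it is running
stop-dfs.sh
# create the new directories on the external drive
mkdir -p /mnt/external/hdfs/data /mnt/external/hdfs/name
# re-initialize the namenode metadata in the new location
hadoop namenode -format
# start HDFS again
start-dfs.sh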
In Hadoop, is it mandatory that all the slaves in the cluster have the same configuration?
Hadoop datanodes can have different configurations, for example different total memory or different data directory mount points.
Example:
Datanode1's etc/hadoop/hdfs-site.xml can look like:
<property>
<name>dfs.datanode.data.dir</name>
<value>/mount/data1,/mount/data2,/mount/data3</value>
</property>
Datanode2's etc/hadoop/hdfs-site.xml can look like:
<property>
<name>dfs.datanode.data.dir</name>
<value>/mnt/dt1</value>
</property>
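If it helps, once the cluster is running you can confirm that each datanode reports its own configured capacity and usage with the standard report command:
hdfs dfsadmin -report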
According to what I've been reading, you can run Hive without Hadoop or HDFS (e.g. when using Spark or Tez), i.e. in local mode, by setting fs.default.name and hive.metastore.warehouse.dir to local paths. However, when I do this, I get an error:
Starting Hive metastore service.
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
My hive-site.xml file:
<property>
<name>mapred.job.tracker</name>
<value>local</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>file:///tmp/hive/warehouse</value>
</property>
<property>
<name>fs.default.name</name>
<value>file:///tmp/hive</value>
</property>
Does this mean that I still need to have all of the hadoop binaries downloaded and have HADOOP_HOME set to that path? Or does local mode in hive allow me to run without needing all of that content?
Hive doesn't require HDFS or YARN to execute, but, just like Spark, it still requires the Hadoop libraries for their input/output formats, so the Hadoop binaries have to be present and findable.
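In other words, you still need a Hadoop distribution unpacked on disk, even though no HDFS or YARN daemons have to run. A minimal sketch, assuming Hadoop has been unpacked to /opt/hadoop (the path is only an example):
# point Hive at the unpacked Hadoop client libraries
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
# no daemons are started; Hive only needs the jars on the classpath
hive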
I'm currently running Apache Ignite Hadoop accelerator for MapReduce. The jobs run, but I am unable to see them in the JobHistoryServer. I wouldn't expect to see the jobs in Yarn's Resource Manager (and don't).
I'm running my MapReduce jobs like
hadoop --config path/to/config/ jar path/to/jar ....
In the mapred-site.xml, I've added
<property>
<name>mapreduce.framework.name</name>
<value>ignite</value>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>[your_host]:11211</value>
</property>
My mapreduce.jobhistory.* settings have not been changed.
In the core-site.xml I've added
<property>
<name>fs.default.name</name>
<value>igfs://igfs#/</value>
</property>
<property>
<name>fs.igfs.impl</name>
<value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>
<property>
<name>fs.AbstractFileSystem.igfs.impl</name>
<value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
</property>
I've also added ignite-core-1.6.0.jar, ignite-hadoop-1.6.0.jar, and ignite-shmem-1.0.0.jar to the $HADOOP_HOME path. Similarly, I've exported HADOOP_HOME, HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, and HADOOP_MAPRED_HOME.
Is this functionality not supported by Ignite or am I doing something wrong?
Also, is there a way to track the MapReduce job running on Ignite?
Currently Ignite does not integrate with the Hadoop JobHistoryServer in any way; https://issues.apache.org/jira/browse/IGNITE-3766 tracks a request for that integration.
I am trying to install hadoop in a multinode cluster environment. I have installed ubuntu 15.10 on an SSD. I want to install hadoop 2.6.2 on the SSD and keep my HDFS on a separate SATA hard drive. For this, what steps should I follow?
I have installed Hadoop on the SSD with the following configuration in hdfs-site.xml, setting the property dfs.datanode.data.dir to file:///media/coea23/HDFS/hdfs/datanode on the SATA drive. But the datanode does not show up when I run jps, whereas the namenode, whose directory is on the SSD where Hadoop is installed, does show up.
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///media/coea23/HDFS/hdfs/datanode</value>
<description>DataNode directory</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hduser/hdfs/namenode</value>
<description>NameNode directory for namespace and transaction logs storage.</description>
</property>
Please give your valuable suggestion. Thanks in advance.
Kamaruddin
You configure this in hdfs-site.xml on each DataNode:
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value>
</property>
Edit: Note that dfs.data.dir and dfs.name.dir are deprecated; you should use dfs.datanode.data.dir and dfs.namenode.name.dir instead, though the old names still work.
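Since your DataNode is not starting, also make sure the configured directory exists and is writable by the user that runs Hadoop, then check the DataNode log for the real error. A rough sketch for the path in your question (the hduser:hadoop ownership and the log file pattern are assumptions, adjust them to your install):
sudo mkdir -p /media/coea23/HDFS/hdfs/datanode
sudo chown -R hduser:hadoop /media/coea23/HDFS/hdfs/datanode
# restart HDFS and see whether the DataNode comes up
start-dfs.sh
jps
# if it still does not appear, the log usually says why
tail -n 50 $HADOOP_HOME/logs/hadoop-*-datanode-*.log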
What is the right configuration of the hdfs-site.xml file when configuring Hadoop?
On all the websites I see this:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
I used the same configuration but was unable to start the datanode.
Later I changed the datanode configuration to
<name>dfs.datanode.name.dir</name>
instead of
<name>dfs.datanode.data.dir</name>
and it worked.
Which one is right, name.dir or data.dir?
All the websites say data.dir, but that does not work in my case.
Thanks guys.
The options are dfs.namenode.name.dir and dfs.datanode.data.dir. The first defines the directory where the namenode stores its metadata; the second defines where the datanode stores its block data.
If the node is a namenode, you need the namenode folder configuration. If it's a datanode, you need the datanode folder configuration. If it's a single-node cluster, you need both, as the node acts as both namenode and datanode.
If that doesn't work, what is the error message you get? Have you formatted the namenode?
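For a single-node setup, a minimal check sequence might look like this (assuming the Hadoop bin/sbin scripts are on the PATH; note that formatting erases any existing HDFS metadata):
# one-time initialization of the namenode directory
hdfs namenode -format
# start HDFS
start-dfs.sh
# both NameNode and DataNode should now be listed
jps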