Cloudera Manager - dfs.datanode.du.reserved not working - hadoop

I have set the dfs.datanode.du.reserved property to 10 GB using Cloudera Manager. But when I check the map-reduce job.xml file, I find dfs.datanode.du.reserved is still set to 0. How do I verify whether the property is set?
PS: I am using Cloudera Standard 4.7.2 with CDH 4.4.0

This flag is set in hdfs-site.xml, not in mapred-site.xml, which is why it does not show up in the map-reduce job.xml.
You will not be able to see this flag in the client configuration (/etc/hadoop/conf/hdfs-site.xml) without tweaking the configuration.
It is only set in the datanode configuration that is regenerated by Cloudera Manager. This configuration can be found in /var/run/cloudera-scm-agent/process/XXXXXX-hdfs-DATANODE/hdfs-site.xml, where XXXXXX is an incrementing number used internally by Cloudera Manager.
From within Cloudera Manager you can also see this flag: open a DataNode instance, click Processes, then under Configuration Files/Environment click Show, and you will find the hdfs-site.xml for that datanode.
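To check the value from a shell on the DataNode host itself, the regenerated file can be read directly. A minimal sketch, assuming the standard agent layout described above; the get_prop helper is hypothetical and relies on the <name>/<value> pair sitting on adjacent lines, as CM-generated files do:

```shell
# get_prop NAME FILE — extract a property value from a Hadoop *-site.xml
# (hypothetical helper; assumes <name> and <value> are on adjacent lines)
get_prop() {
  grep -A1 "<name>$1</name>" "$2" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Pick the most recently regenerated DataNode process directory
DN_DIR=$(ls -d /var/run/cloudera-scm-agent/process/*-hdfs-DATANODE 2>/dev/null | sort -V | tail -n 1)
if [ -n "$DN_DIR" ]; then
  get_prop dfs.datanode.du.reserved "$DN_DIR/hdfs-site.xml"
fi
```

The property is stored in bytes, so a 10 GB reservation appears as 10737418240.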

Related

Cloudera Manager and hdfs-site.xml

When using Cloudera Manager I can access the hdfs-site.xml file via:
Cloudera Manager > Cluster > HDFS > Instances > (NameNode, for example) > Processes >
Configuration Files > hdfs-site.xml
Then the URL points to :
http://quickstart.cloudera:7180/cmf/process/8/config?filename=hdfs-site.xml
Is this file accessible directly via the file system, and if yes, where is it located?
The configurations set in Cloudera Manager are stored in the Cloudera Manager database. They are not persisted in configuration files as they are in other distributions.
On starting a service, the related configurations are passed as runtime configurations to the Cloudera agent running on the node where the service is to be started. These passed-on configurations are stored in the agent's runtime directory, /var/run/cloudera-scm-agent/.
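So on any node you can list what the agent has materialized for its running roles. A quick sketch; the function name is made up, and the root directory is a parameter only so it can be tried outside a cluster:

```shell
# List every *-site.xml the Cloudera agent has written for its running roles.
# Defaults to the standard agent runtime directory.
list_role_configs() {
  find "${1:-/var/run/cloudera-scm-agent/process}" -maxdepth 2 -name '*-site.xml' 2>/dev/null
}

list_role_configs || true   # e.g. .../1234-hdfs-DATANODE/hdfs-site.xml
```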

CDH 5.9 dfs.datanode.data.dir configuration

I've installed CDH 5.9 using the Cloudera Manager installer, where I specified directories for HDFS metadata (/dfs/nn) and actual data (/dfs/dn).
After installation HDFS works correctly and stores metadata and data in the locations defined in Cloudera Manager, but in /etc/hadoop/hdfs-site.xml there is no setting for the dfs.datanode.data.dir parameter.
Running the following command returns the default location for data.dir:
# hdfs getconf -confKey dfs.datanode.data.dir
file:///tmp/hadoop-root/dfs/data
Can anyone tell me where in CDH 5.9 I can find the configuration for HDFS that reflects my setup?
Regards,
Search for dfs.datanode.data.dir in Cloudera Manager; you will see the configured values there, and from there you can change them. Note that hdfs getconf reads the client configuration in /etc/hadoop/conf, which does not contain datanode-only properties, so it reports the built-in default; the value the DataNode actually uses lives in the Cloudera Manager database and the regenerated process configuration.

How to find installation mode of Hadoop 2.x

What is the quickest way of finding the installation mode of Hadoop 2.x?
I just want to learn the best way to find the mode when I login first time into a Hadoop installed machine.
In Hadoop 2, go to the /etc/hadoop/conf folder and check fs.defaultFS in core-site.xml and the yarn.resourcemanager.hostname property in yarn-site.xml. The values of those properties tell you which mode you are running in.
fs.defaultFS
Standalone mode - file:///
Pseudo-distributed - hdfs://localhost:8020/
Fully distributed - hdfs://namenodehostname:8020/
yarn.resourcemanager.hostname
Standalone mode - not set (no YARN daemons run)
Pseudo-distributed - localhost (or the default 0.0.0.0)
Fully distributed - resourcemanagerhostname
Alternatively you can use the jps command to check the mode: if you see the NameNode/SecondaryNameNode/ResourceManager (or JobTracker in MR1) daemons running, it is pseudo- or fully distributed; in standalone mode no Hadoop daemons run at all.
Similarly, for MR1 go to the /etc/hadoop/conf folder and check fs.default.name in core-site.xml and the mapred.job.tracker property in mapred-site.xml.
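The checks above can be sketched as a small shell helper. This is a sketch under the answer's assumptions; the conf directory defaults to /etc/hadoop/conf but is a parameter so it can be pointed at any copy of the configs:

```shell
# Classify the Hadoop 2 installation mode from fs.defaultFS.
# A missing file or property falls back to the default file:/// (standalone).
hadoop_mode() {
  local fs
  fs=$(grep -A1 '<name>fs.defaultFS</name>' "${1:-/etc/hadoop/conf}/core-site.xml" 2>/dev/null \
       | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
  case "$fs" in
    ""|file://*)       echo "standalone" ;;
    hdfs://localhost*) echo "pseudo-distributed" ;;
    hdfs://*)          echo "fully distributed" ;;
    *)                 echo "unknown ($fs)" ;;
  esac
}

hadoop_mode   # classifies the local /etc/hadoop/conf
```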

Setting up Hadoop Client on Mac OS X

Currently, I have a 3-node cluster running CDH 5.0 using MRv1. I am trying to figure out how to set up Hadoop on my Mac so I can submit jobs to the cluster. According to "Managing Hadoop API Dependencies in CDH 5", you just need the files in /usr/lib/hadoop/client-0.20/*. Do I need the following files too? Does Cloudera have a hadoop-client tarball?
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
Yes, I think you can make use of the Cloudera tarball for setting up a hadoop client; it can be downloaded from the following path. The configuration files are available under the etc/hadoop/ directory of the unpacked tarball; you just need to modify those files according to your environment.
http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.2.0-cdh5.0.0-beta-2.tar.gz
If the above link doesn't match your version, use the following link to see the available hadoop versions:
http://archive-primary.cloudera.com/cdh5/cdh/5/
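After unpacking the tarball, the files the question lists only need cluster-specific values (a client hdfs-site.xml is typically only needed for non-default HDFS settings). A minimal sketch for an MRv1 cluster; nn.example.com and jt.example.com are placeholder hostnames for your NameNode and JobTracker, and CONF_DIR would normally be the tarball's etc/hadoop directory:

```shell
CONF_DIR=${CONF_DIR:-/tmp/hadoop-client-conf}   # e.g. hadoop-2.2.0-cdh5.0.0-beta-2/etc/hadoop
mkdir -p "$CONF_DIR"

# Point the client at the cluster's NameNode
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn.example.com:8020</value>
  </property>
</configuration>
EOF

# MRv1: point job submission at the JobTracker
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jt.example.com:8021</value>
  </property>
</configuration>
EOF
```

With HADOOP_CONF_DIR pointing at this directory, job submissions from the Mac should reach the cluster.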

Best place for json Serde JAR in CDH Hadoop for use with Hive/Hue/MapReduce

I'm using Hive/Hue/MapReduce with a json Serde. To get this working I have copied the json_serde.jar to several lib directories on every cluster node:
/opt/cloudera/parcels/CDH/lib/hive/lib
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib
/opt/cloudera/parcels/CDH/lib/hadoop/lib
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib
...
On every CDH update of the cluster I have to do that again.
Is there a more elegant way where the distribution of the Serde in the cluster would be automatic and resistant to updates?
If you are using HiveServer2 (the default in Cloudera 5.0+), the following configuration will work across your entire cluster without having to copy the jar to each node.
Put it in your hive-site.xml config file or, if you're using Cloudera Manager, in the "HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml" config box:
<property>
  <name>hive.aux.jars.path</name>
  <value>/user/hive/aux_jars/hive-serdes-1.0-snapshot.jar</value>
</property>
Then create the directory in your HDFS filesystem (/user/hive/aux_jars) and place the jar file in it. If you are running HUE you can do this part via the web UI, just click on File Browser at the top right.
It depends on the version of Hue and whether you are using Beeswax or HiveServer2:
Beeswax: there is a workaround with HIVE_AUX_JARS_PATH: https://issues.cloudera.org/browse/HUE-1127
HiveServer2: it supports the hive.aux.jars.path property in hive-site.xml. HiveServer2 does not support a .hiverc, and Hue is looking at providing an equivalent at some point: https://issues.cloudera.org/browse/HUE-1066