In my core-site.xml, I changed hadoop.tmp.dir to a location on another, larger HDD (/data/hadoop_tmp), which is not the Linux /tmp location. I then formatted my namenode and started DFS and YARN, and I believe it worked.
But the default location still appears to be in use: when I use Hive, the Hive jar is loaded into the default location (/tmp). My /tmp is too small, so the Hive job fails.
I don't know why my config does not work.
Related
Every time my Hadoop server reboots, I have to format the namenode to start Hadoop. This removes all of the files in my Hadoop installation.
I need to move my HDFS location from /tmp to a permanent location, so that I don't have to format the namenode whenever the server reboots.
I am very new to hadoop.
How do I create a hdfs file in another directory?
How do I reference this data directory in config file so that I don't have to format the namenode?
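These two properties in hdfs-site.xml determine where HDFS stores its files on the local filesystem:
dfs.namenode.name.dir
dfs.datanode.data.dir
Their defaults are under /tmp (both are derived from hadoop.tmp.dir), which is why the data is lost on systems that clear /tmp at reboot.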
You typically have to format a namenode only when the HDFS processes fail to terminate correctly (for example, after a power failure or forced shutdown). Running a standby NameNode is encouraged to guard against these scenarios.
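As a rough sketch, the two properties could be set in hdfs-site.xml along these lines. The paths /data/hadoop/dfs/name and /data/hadoop/dfs/data are hypothetical; any permanent directory that the HDFS user can write to should do. After pointing the namenode at a new, empty directory, it would need to be formatted one more time (or the existing metadata copied over).

```xml
<configuration>
  <!-- Where the namenode keeps its metadata (hypothetical path outside /tmp) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/dfs/name</value>
  </property>
  <!-- Where each datanode keeps its blocks (hypothetical path outside /tmp) -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/dfs/data</value>
  </property>
</configuration>
```

If the file already has a configuration element, only the two property entries need to be added inside it.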
How can I change the java.io.tmpdir folder for my Hadoop 3 cluster running on YARN?
By default it gets something like /tmp/***, but my /tmp filesystem is too small for everything a YARN job will write there.
Is there a way to change it?
I have also set hadoop.tmp.dir in core-site.xml, but it does not appear to be used.
Perhaps it's a duplicate of "What should be hadoop.tmp.dir?". Also, go through all the config files in /etc/hadoop/conf and search for tmp to see if anything is hardcoded. Also specify:
Whether you see any files being created at the location you specified as hadoop.tmp.dir.
What pattern of files is being created at /tmp/** after your changes are applied.
I have also noticed Hive creating files in /tmp, so you may also want to look at hive-site.xml. The same goes for any other ecosystem product you are using.
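For the Hive part, a sketch of the kind of hive-site.xml entries worth checking is shown below. hive.exec.local.scratchdir and hive.downloaded.resources.dir are Hive properties whose defaults sit under the Java temp directory (their exact default values vary by Hive version); the /data/hive_tmp paths are hypothetical.

```xml
<configuration>
  <!-- Local scratch space for Hive jobs (hypothetical path on a larger disk) -->
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/data/hive_tmp/scratch</value>
  </property>
  <!-- Local directory for resources (such as jars) added to a session (hypothetical path) -->
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/data/hive_tmp/resources</value>
  </property>
</configuration>
```

The directories would need to exist and be writable by the user running Hive.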
I configured the yarn.nodemanager.local-dirs property in yarn-site.xml and restarted the cluster. After that, Spark stopped using the /tmp file system and used the directories configured in yarn.nodemanager.local-dirs instead.
The java.io.tmpdir property for the Spark executors was also set to the directories defined in yarn.nodemanager.local-dirs.
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/somepath1,/anotherpath2</value>
</property>
I added a new disk to the Hortonworks sandbox running on the Oracle VM, following this example:
https://muffinresearch.co.uk/adding-more-disk-space-to-a-linux-virtual-machine/
I recursively set the owner of the mounted disk directory to hdfs:hadoop and gave it 777 permissions.
I added the mounted disk folder to the datanode directories, after a comma, using Ambari. I also tried changing the XML directly.
After the restart, the datanode always crashes with a DiskErrorException: Too many failed volumes.
Any ideas what I am doing wrong?
I found a workaround for this problem. I had mounted the disk at /mnt/sdb and used that folder itself as the datanode directories entry.
But if I create /mnt/sdb/data and use it as the entry instead, the exception disappears and everything works like a charm.
No idea why :(
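For reference, the working setup described above would correspond to an hdfs-site.xml entry roughly like this. /hadoop/hdfs/data stands in for whatever directory was already configured and is an assumption; /mnt/sdb/data is the subdirectory from the workaround.

```xml
<configuration>
  <!-- Comma-separated list of datanode directories, with no spaces around the comma.
       /hadoop/hdfs/data is an assumed existing entry; /mnt/sdb/data is a
       subdirectory created on the newly mounted disk. -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/hdfs/data,/mnt/sdb/data</value>
  </property>
</configuration>
```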
I installed Hadoop via Cloudera Manager cdh3u5 on CentOS 5. When I run the command
hadoop fs -ls /
I expected to see the contents of hdfs://localhost.localdomain:8020/
However, it returned the contents of file:///
It goes without saying that I can access HDFS through
hadoop fs -ls hdfs://localhost.localdomain:8020/
But when it came to installing other applications such as Accumulo, Accumulo would automatically detect the Hadoop filesystem as file:///
The question is: has anyone run into this issue, and how did you resolve it?
I had a look at "HDFS thrift server returns content of local FS, not HDFS", which was a similar issue, but it did not solve this one.
Also, I do not get this issue with Cloudera Manager cdh4.
By default, Hadoop is going to use local mode. You probably need to set fs.default.name to hdfs://localhost.localdomain:8020/ in $HADOOP_HOME/conf/core-site.xml.
To do this, you add this to core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost.localdomain:8020/</value>
</property>
Accumulo is confused because it uses the same default configuration to figure out where HDFS is, and that configuration defaults to file:///
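In later Hadoop releases (2.x and up), fs.default.name is deprecated in favor of fs.defaultFS. On such a version, the equivalent core-site.xml entry would look roughly like the sketch below; the host and port are simply the ones from the question.

```xml
<configuration>
  <!-- Newer name for fs.default.name; host and port are taken from the question -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost.localdomain:8020/</value>
  </property>
</configuration>
```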
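Specify the data node data directory, the name node metadata directory, and the default filesystem using these properties:
dfs.name.dir (older name) or dfs.namenode.name.dir,
dfs.data.dir (older name) or dfs.datanode.data.dir,
fs.default.name
The dfs.* properties conventionally go in hdfs-site.xml and fs.default.name in core-site.xml. Then format the name node.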
To format HDFS Name Node:
hadoop namenode -format
Enter 'Yes' to confirm formatting the name node. Then restart the HDFS service and deploy the client configuration to access HDFS.
If you have already done the above steps, ensure that the client configuration is deployed correctly and that it points to the actual cluster endpoints.
I installed Hive on a 3-node CentOS 7 cluster for the first time, for POC purposes. Hive is installed inside the home folder of a user (hduser1), and its location is specified in the .bashrc file.
export HIVE_HOME=/home/hduser1/hive
I also created an HDFS folder for HIVE warehouse, with the following commands.
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /user/hive/warehouse
Everything works fine. After I created a table, I saw a file appearing in the warehouse folder.
Here is my question - how does HIVE know about this warehouse path, considering that I did not add this path /user/hive/warehouse in any configuration file?
I saw another person's installation, which created the Hive warehouse folder at /user/hive234/warehouse and that installation still worked. Does HIVE figure it out by some naming convention?
The default warehouse location is /user/hive/warehouse, which is why Hive finds it without any configuration. You can change this default by specifying the desired directory in the hive.metastore.warehouse.dir configuration parameter in hive-site.xml.
Here is the example
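A minimal sketch of such an entry in hive-site.xml, assuming the /user/hive234/warehouse path mentioned in the question is the desired location:

```xml
<configuration>
  <!-- HDFS directory where Hive stores managed table data.
       /user/hive234/warehouse is the path from the question, used here for illustration. -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive234/warehouse</value>
  </property>
</configuration>
```

The directory would still need to exist in HDFS and be group-writable, as with the hadoop fs -mkdir and -chmod commands shown in the question.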