I have created a Hive UDF JAR file and I am trying to deploy it. For this, I have put all the files into /opt/hive/jars on my edge node and set this path in the hive-site.xml file:
<property>
<name>hive.aux.jars.path</name>
<value>/opt/hive/jars</value>
</property>
I have restarted my Hive server using the following command:
sudo restart hive-server2
However, when I log in to Beeline I am not able to see the JARs. When I create a function and call it, it gives an error.
Update 1:
I put the file on hdfs and included that location as well. No luck.
I included the same property in /etc/hive/conf/hiveserver2-site.xml but no luck.
The directory where the JARs are located is owned by the hive user and has 777 permissions.
Update 2:
I checked which path the other JAR files are being picked up from.
I put my JAR files into that location and restarted the Hive server. Now it's working.
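For reference, a minimal sketch of registering and testing the UDF from Beeline once the JAR is picked up (the JDBC URL and the class name com.example.udf.MyUpper are placeholders, not from my actual setup):
# register the function and run a quick test query in one Beeline call
beeline -u jdbc:hive2://localhost:10000 -e "CREATE FUNCTION my_upper AS 'com.example.udf.MyUpper'; SELECT my_upper('hello');"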
I've followed the instructions in Hadoop: The Definitive Guide, 4th edition, Appendix A, to configure Hadoop in pseudo-distributed mode. Everything is working well, except when I try to make a directory:
hadoop fs -mkdir -p /user/$USER
The command returns the following message: mkdir: `/user/my_user_name': Input/output error.
However, when I first switch to the root account with sudo -s and then run the hadoop fs -mkdir -p /user/$USER command, the directory /user/root is created (with all directories in the path).
I think I'm having Hadoop permission issues.
Any help would be really appreciated,
Thanks.
It means that you have a mistake in the core-site.xml file. For instance, I had an error in the first property name, where I wrote 'fa.defaultFS' instead of 'fs.defaultFS'.
After that, you have to execute the stop-all.sh script to stop Hadoop. You will probably also have to clear the temporary data and reformat the namenode with the commands rm -Rf /app/tmp/your-username/* and hdfs namenode -format. Next, start Hadoop with the start-all.sh script.
You may also need to reboot the system after executing the stop script.
After these steps, I could run that command again.
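As a sketch, the full sequence (using the /app/tmp/your-username path from above; adjust it to your own hadoop.tmp.dir) would be:
# stop all Hadoop daemons
stop-all.sh
# clear the old HDFS data under hadoop.tmp.dir
rm -Rf /app/tmp/your-username/*
# reformat the namenode and start the daemons again
hdfs namenode -format
start-all.sh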
I corrected the core-site.xml file based on the standard settings and it works fine now:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/your_user_name/hadooptmpdata</value>
<description>Where Hadoop will place all of its working files</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
<description>Where HDFS NameNode can be found on the network</description>
</property>
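As a quick sanity check (assuming the hdfs command is on your PATH), you can confirm that Hadoop actually loads the corrected value:
# should print hdfs://localhost:9000
hdfs getconf -confKey fs.defaultFS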
When I try to run Phoenix's sqlline.py localhost command, I get
WARN util.DynamicClassLoader: Failed to identify the fs of dir hdfs://localhost:54310/hbase/lib, ignored
java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass...
and nothing else happens. I also could not get Squirrel to work (it freezes when I click 'list drivers').
As per these instructions, I have copied phoenix-4.2.1-server.jar to my hbase/lib folder and restarted hbase. I have also copied core-site.xml and hbase-site.xml to my phoenix/bin directory.
I have not added 'the phoenix-[version]-client.jar to the classpath of any Phoenix client'
since I do not know what this refers to.
I am using HBase 0.98.6.1-hadoop2, Phoenix 4.2.1, and Hadoop 2.2.0.
I fixed the same issue by adding the following setting in
${PHOENIX_HOME}/bin/hbase-site.xml
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
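A rough way to double-check that the HDFS implementation class is actually available to HBase/Phoenix (assuming the hbase launcher script is on your PATH):
# the hadoop-hdfs JAR that provides org.apache.hadoop.hdfs.DistributedFileSystem should appear here
hbase classpath | tr ':' '\n' | grep hadoop-hdfs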
I have installed Hadoop in pseudo-distributed mode on my laptop; the OS is Ubuntu.
I have changed the paths where Hadoop stores its data (by default Hadoop stores data in the /tmp folder).
The hdfs-site.xml file looks as below:
<property>
<name>dfs.data.dir</name>
<value>/HADOOP_CLUSTER_DATA/data</value>
</property>
Now whenever I restart the machine and try to start the Hadoop cluster using the start-all.sh script, the data node never starts. I confirmed that the data node does not start by checking the logs and by using the jps command.
Then I:
Stopped the cluster using the stop-all.sh script.
Formatted HDFS using the hadoop namenode -format command.
Started the cluster using the start-all.sh script.
Now everything works fine, even if I stop and start the cluster again. The problem occurs only when I restart the machine and try to start the cluster.
Has anyone encountered a similar problem? Why is this happening, and how can we solve it?
By changing dfs.datanode.data.dir away from /tmp you indeed made the data (the blocks) survive across a reboot. However, there is more to HDFS than just blocks. You need to make sure all the relevant directories point away from /tmp, most notably dfs.namenode.name.dir (I can't tell what other directories you have to change since it depends on your config, but the namenode directory is mandatory, and it could also be sufficient).
I would also recommend using a more recent Hadoop distribution. BTW, the 1.1 namenode dir setting is dfs.name.dir.
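Whichever property names your Hadoop version uses, a hedged extra step is to make sure the target directories exist and are writable by the user running the daemons before reformatting, for example:
# create the block and namenode directories and give them to the daemon user
sudo mkdir -p /HADOOP_CLUSTER_DATA/data /HADOOP_CLUSTER_DATA/name
sudo chown -R $(whoami) /HADOOP_CLUSTER_DATA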
For those who use Hadoop 2.0 or above, the configuration property names may be different.
As this answer points out, go to the /etc/hadoop directory of your Hadoop installation.
Open the file hdfs-site.xml. This user configuration will override the default Hadoop configuration, which is loaded by the Java classloader first.
Add the dfs.namenode.name.dir property and set a new namenode dir (the default is file://${hadoop.tmp.dir}/dfs/name).
Do the same for the dfs.datanode.data.dir property (the default is file://${hadoop.tmp.dir}/dfs/data).
For example:
<property>
<name>dfs.namenode.name.dir</name>
<value>/Users/samuel/Documents/hadoop_data/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/samuel/Documents/hadoop_data/data</value>
</property>
Another property where a tmp dir appears is dfs.namenode.checkpoint.dir. Its default value is file://${hadoop.tmp.dir}/dfs/namesecondary.
If you want, you can also add this property:
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/Users/samuel/Documents/hadoop_data/namesecondary</value>
</property>
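After restarting HDFS (a one-time namenode format is needed when the name directory moves), a quick hedged check that the new locations are actually being used:
# the name directory should now be populated outside /tmp, and the daemons should survive a reboot
ls /Users/samuel/Documents/hadoop_data/name/current
jps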
I just got started with Apache Hive, and I am using my local Ubuntu 12.04 box, with Hive 0.10.0 and Hadoop 1.1.2.
Following the official "Getting Started" guide on the Apache website, I am now stuck at the Hadoop command from the guide that creates the Hive metastore warehouse directory:
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
The error was: mkdir: failed to create /user/hive/warehouse
Does Hive require Hadoop in a specific mode? I know I didn't have to do much to my Hadoop installation other than update JAVA_HOME, so it is in standalone mode. I am sure Hadoop itself is working since I ran the Pi example that comes with the Hadoop installation.
Also, the other command to create /tmp shows that the /tmp directory already exists, so it didn't recreate it, and /bin/hadoop fs -ls lists the current directory.
So, how can I get around it?
Almost all examples in the documentation have this command wrong. Just like in Unix, you will need the "-p" flag to create the parent directories as well, unless you have already created them. This command will work:
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
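If Hive still complains about write access afterwards, the same Getting Started guide also suggests making the warehouse directory group-writable:
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse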
When running Hive on a local system, just add this to ~/.hiverc:
SET hive.metastore.warehouse.dir=${env:HOME}/Documents/hive-warehouse;
You can specify any folder to use as the warehouse. Obviously, any other Hive configuration method will do (hive-site.xml or hive --hiveconf, for example).
That's possibly what Ambarish Hazarnis had in mind when saying "or create the warehouse in your home directory".
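For example, the same override can be passed as a one-off from the shell (assuming the hive CLI is on your PATH):
hive --hiveconf hive.metastore.warehouse.dir=$HOME/Documents/hive-warehouse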
This seems like a permission issue. Do you have access to the root folder /?
Try the following options:
1. Run the command as a superuser (see the sketch below), or
2. Create the warehouse in your home directory.
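For option 1, a rough sketch (assuming an hdfs superuser account exists, as in most distributions):
# create the warehouse as the HDFS superuser, then hand it over to your own user
sudo -u hdfs hadoop fs -mkdir -p /user/hive/warehouse
sudo -u hdfs hadoop fs -chown -R $USER /user/hive/warehouse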
Let us know if this helps. Good luck!
When setting Hadoop properties in the Spark configuration, prefix them with spark.hadoop.
Therefore set
conf.set("spark.hadoop.hive.metastore.warehouse.dir","/new/location")
This works for older versions of Spark. The property has changed in Spark 2.0.0, where the warehouse location is controlled by spark.sql.warehouse.dir instead.
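The same settings can also be passed at submit time; a minimal sketch (com.example.MyApp and my_app.jar are placeholders for your own application):
# pre-2.0 style via the spark.hadoop. prefix, plus the Spark 2.0.0+ spark.sql.warehouse.dir property
spark-submit \
  --class com.example.MyApp \
  --conf spark.hadoop.hive.metastore.warehouse.dir=/new/location \
  --conf spark.sql.warehouse.dir=/new/location \
  my_app.jar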
Adding an answer for reference, for Cloudera CDH users who are seeing this same issue.
If you are using the Cloudera CDH distribution, make sure you have followed these steps:
Launch Cloudera Manager (Express / Enterprise) by clicking on the desktop icon.
Open the Cloudera Manager page in a browser.
Start all services.
Cloudera has the /user/hive/warehouse folder created by default. It's just that YARN and HDFS might not be up and running, which prevents access to this path.
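Once the services are up, a quick check (hedged; path per the CDH default mentioned above) that the warehouse is reachable:
hadoop fs -ls /user/hive/warehouse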
While this is a simple permission issue that was resolved with sudo in my comment above, there are a couple of notes:
Creating it in the home directory should work as well, but then you may need to update the Hive setting for the metastore path, which I think defaults to /user/hive/warehouse.
I ran into another error with a CREATE TABLE statement in the Hive shell; the error was something like this:
hive> CREATE TABLE pokes (foo INT, bar STRING);
FAILED: Error in metadata: MetaException(message:Got exception: java.io.FileNotFoundException File file:/user/hive/warehouse/pokes does not exist.)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
It turns out to be another permission issue: you have to create a group called "hive", add the current user to that group, and change the ownership of /user/hive/warehouse to that group. After that, it works. Details can be found at the link below:
http://mail-archives.apache.org/mod_mbox/hive-user/201104.mbox/%3CBANLkTinq4XWjEawu6zGeyZPfDurQf+j8Bw#mail.gmail.com%3E
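A hedged sketch of the group/ownership fix described above (note the error refers to the local filesystem path file:/user/hive/warehouse, so these are local commands; adjust to your setup):
# create the hive group, add yourself to it, and hand the warehouse directory to that group
sudo groupadd hive
sudo usermod -aG hive $USER
sudo mkdir -p /user/hive/warehouse
sudo chown -R :hive /user/hive/warehouse
sudo chmod -R g+w /user/hive/warehouse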
If you are running Linux, check the data directory and its permissions (in Hadoop's core-site.xml). It looks like you have kept the default, which is /data/tmp, and in most cases that will require root permission.
Change the XML config file, delete /data/tmp, and run the filesystem format (of course, after you have modified the core XML config).
I recommend using a higher version of Hive, i.e. 1.1.0; version 0.10.0 is very buggy.
Run this command and then try to create the directory; it grants the required permissions for the user on the HDFS /user directory.
hadoop fs -chmod -R 755 /user
I am using macOS with Homebrew as the package manager. I had to set the property in hive-site.xml as:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/local/Cellar/hive/2.3.1/libexec/conf/warehouse</value>
</property>
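It may also help to make sure the directory actually exists before starting Hive (a trivial sketch, using the path from the property above):
mkdir -p /usr/local/Cellar/hive/2.3.1/libexec/conf/warehouse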