Snappy compression not working due to tmp folder privileges - hadoop

I have a problem whenever I try to store my data in a compressed format with Pig, Sqoop, or Spark. I know the problem is that our tmp folder is mounted noexec, and this causes Snappy, for instance, to give me this error:
java.lang.IllegalArgumentException: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.2-fe4e30d0-e4a5-4b1a-ae31-fd1861117288-libsnappyjava.so: /tmp/snappy-1.1.2-fe4e30d0-e4a5-4b1a-ae31-fd1861117288-libsnappyjava.so: failed to map segment from shared object: Operation not permitted
The solutions I found on the internet are either to remount the tmp folder as exec, which is not an option for me as the sysadmin won't allow it due to security concerns, or to change the Java opts execution path to some other path instead of tmp.
I have tried the following approach, but it didn't solve the problem.
I added these lines to hadoop-env.sh and sqoop-env.sh:
export HADOOP_OPTS="$HADOOP_OPTS -Dorg.xerial.snappy.tempdir=/newpath"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.io.tmpdir=/newpath"
I would appreciate any other solutions that could resolve the issue.
Thanks

For other users with this issue, try starting Hive with
hive --hiveconf org.xerial.snappy.tempdir=/../
and supply a location from which code is allowed to execute.
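The same system property can be passed to the other tools mentioned in the question. As a rough sketch for Spark, assuming $HOME/snappy-tmp sits on a partition mounted without noexec (and that the same path exists and is executable on the executor nodes); your-app.jar is just a placeholder:
mkdir -p $HOME/snappy-tmp
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dorg.xerial.snappy.tempdir=$HOME/snappy-tmp" \
  --conf "spark.executor.extraJavaOptions=-Dorg.xerial.snappy.tempdir=$HOME/snappy-tmp" \
  your-app.jar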

Related

How does Hive access a Hadoop setup using a different user?

I installed Hadoop using the 'hadoop' user and installed Hive using the 'hive' user on the same node (pseudo-distributed mode).
How can my Hive access Hadoop?
When I run 'hive --version', I receive an error like this:
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path.
The problem is that the hive user has no way to access Hadoop, but I don't know how to fix it.
Thanks a lot.
As the error says, $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path.
So edit /home/hive/.bash_profile (for example, assuming you're on Linux) and set one of those environment variables to point to the unpacked Hadoop installation.
For example
export HADOOP_HOME=/opt/hadoop # example
export PATH=$HADOOP_HOME/bin:$PATH
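A quick way to check, as the hive user, that the variables took effect (the /opt/hadoop path above is only an example):
source /home/hive/.bash_profile
hadoop version     # should print the Hadoop build info
hive --version     # should no longer report the missing installation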

Hive couldn't create directory in HDFS and fails to start?

I am deploying Hive 2.3 in remote mode with a MySQL database on another machine as the metastore.
I am about to finish the whole process, and I am checking whether the deployment works by running bin/hive.
Then I got this error:
Exception in thread "main" java.lang.RuntimeException: Couldn't create directory /user/hive/tmp/54de671c-0236-49e2-b967-7c3da8973f3a_resources
I know this is set by the property hive.downloaded.resources.dir in hive-site.xml, and I set it to /user/hive/tmp/${hive.session_id}_resources.
I have created /user/hive/tmp in HDFS.
I have changed the directory permissions with: hdfs dfs -chmod -R 777 /user/hive/tmp
I recently ran into this problem too, and I solved it. The key is that "/user/hive/tmp" here is not an HDFS path but a local one: you should create "/user/hive/tmp" on the local filesystem and change its permissions. Hope this helps you solve the problem.
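A minimal sketch of that fix on the Hive host, assuming the path from the question and that world-writable permissions are acceptable in your environment:
mkdir -p /user/hive/tmp        # local directory, not HDFS
chmod -R 777 /user/hive/tmp    # or a tighter mode owned by the hive user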

Pig local mode spill data issue

I am trying to solve this issue but am unable to understand it. The Pig script on my development machine ran successfully on a 1.8 GB data file.
When I try to run it on the server, it states that it cannot find a local device to spill data to (spill0.out).
I have modified the pig.temp.dir property in the pig.properties file to point to a location with enough space.
error:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill0.out
So how do I find out where Pig is spilling the data, and can we somehow change the Pig spill directory location as well?
I am using Pig in local mode.
Any ideas, suggestions, or workarounds would be of great help.
Thanks.
I found an answer.
We need to set the following in the $PIG_HOME/conf/pig.properties file (see the example after this list):
mapreduce.jobtracker.staging.root.dir
mapred.local.dir
pig.temp.dir
and then test.
This has helped me solve the problem.
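For illustration, the resulting entries in pig.properties might look like this, where /data/bigdisk is a placeholder for any location on the server with enough free space:
mapreduce.jobtracker.staging.root.dir=/data/bigdisk/staging
mapred.local.dir=/data/bigdisk/mapred-local
pig.temp.dir=/data/bigdisk/pig-temp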
This is not a problem with Pig.
I'm not using Pig and I also have exactly the same error.
The problem seems to be more related to Hadoop. I also use it in local mode. I'm using Hadoop 2.6.0
I had no luck with these answers; Pig (version 0.15.0) was still writing pigbag* files to the /tmp dir, so I just renamed my /tmp dir and created a symbolic link to the desired location like this:
sudo -s #change to root
cd /
mv tmp tmp_local
ln -s /desired/new/tmp/location tmp
chmod 1777 tmp
mv tmp_local/* tmp
Make sure there are no active applications writing to tmp folder at the time of running these commands.
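One way to check that, assuming lsof is installed:
sudo lsof +D /tmp    # prints nothing if no process currently has files open under /tmp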

Hive failed to create /user/hive/warehouse

I am just getting started with Apache Hive, using my local Ubuntu 12.04 box with Hive 0.10.0 and Hadoop 1.1.2.
Following the official "Getting Started" guide on the Apache website, I am now stuck at the Hadoop command from the guide that creates the Hive warehouse directory:
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
The error was: mkdir: failed to create /user/hive/warehouse
Does Hive require Hadoop in a specific mode? I didn't have to do much to my Hadoop installation other than update JAVA_HOME, so it is in standalone mode. I am sure Hadoop itself is working, since I ran the Pi example that comes with the Hadoop installation.
Also, the other command to create /tmp reports that the /tmp directory already exists, so it wasn't recreated, and /bin/hadoop fs -ls lists the current directory.
So, how can I get around it?
Almost all examples in the documentation have this command wrong. Just like in Unix, you need the "-p" flag to create the parent directories as well, unless you have already created them. This command will work:
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
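If I remember the Getting Started guide correctly, it also expects /tmp to exist in HDFS and both directories to be group-writable, so the fuller sequence would be roughly:
$HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse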
When running Hive on a local system, just add this to ~/.hiverc:
SET hive.metastore.warehouse.dir=${env:HOME}/Documents/hive-warehouse;
You can specify any folder to use as a warehouse. Obviously, any other hive configuration method will do (hive-site.xml or hive -hiveconf, for example).
That's possibly what Ambarish Hazarnis had in mind when saying "or create the warehouse in your home directory".
This seems like a permission issue. Do you have access to the root folder /?
Try the following options:
1. Run the command as a superuser (see the sketch below), OR
2. Create the warehouse in your home directory.
Let us know if this helps. Good luck!
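For option 1, a rough sketch, assuming your distribution's HDFS superuser is named hdfs and you want the directory owned by the user who will run Hive:
sudo -u hdfs hadoop fs -mkdir -p /user/hive/warehouse
sudo -u hdfs hadoop fs -chown -R $USER /user/hive/warehouse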
When setting Hadoop properties in the Spark configuration, prefix them with spark.hadoop.
Therefore set:
conf.set("spark.hadoop.hive.metastore.warehouse.dir","/new/location")
This works for older versions of Spark; the property changed in Spark 2.0.0 (to spark.sql.warehouse.dir).
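If you are submitting from the command line rather than setting the conf in code, the equivalent flags would be roughly as follows (your-app.jar is a placeholder for your application):
spark-submit --conf spark.hadoop.hive.metastore.warehouse.dir=/new/location your-app.jar   # pre-2.0
spark-submit --conf spark.sql.warehouse.dir=/new/location your-app.jar                     # Spark 2.0.0+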
Adding an answer for reference for Cloudera CDH users who are seeing this same issue.
If you are using the Cloudera CDH distribution, make sure you have followed these steps:
1. Launch Cloudera Manager (Express / Enterprise) by clicking on the desktop icon.
2. Open the Cloudera Manager page in a browser.
3. Start all services.
Cloudera creates the /user/hive/warehouse folder by default. It's just that YARN and HDFS might not be up and running to access this path.
While this is a simple permission issue that was resolved with sudo in my comment above, there are a couple of notes:
Creating it in the home directory should work as well, but then you may need to update the Hive setting for the metastore path, which I think defaults to /user/hive/warehouse.
I ran into another error with a CREATE TABLE statement in the Hive shell; the error was something like this:
hive> CREATE TABLE pokes (foo INT, bar STRING);
FAILED: Error in metadata: MetaException(message:Got exception: java.io.FileNotFoundException File file:/user/hive/warehouse/pokes does not exist.)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
It turns out to be another permission issue: you have to create a group called "hive", add the current user to that group, and change the ownership of /user/hive/warehouse to that group. After that, it works. Details can be found at the link below:
http://mail-archives.apache.org/mod_mbox/hive-user/201104.mbox/%3CBANLkTinq4XWjEawu6zGeyZPfDurQf+j8Bw#mail.gmail.com%3E
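A minimal sketch of that group change, assuming the warehouse path from the error is on the local filesystem (note the file:/ prefix) and that your account has sudo rights; adjust the user name if you are not fixing it for the current user:
sudo groupadd hive                    # create the "hive" group
sudo usermod -a -G hive $(whoami)     # add the current user to it
sudo chgrp -R hive /user/hive/warehouse
sudo chmod -R g+w /user/hive/warehouse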
If you are running Linux, check the data directory and its permissions in Hadoop's core-site.xml. It looks like you've kept the default, which is /data/tmp, and in most cases that will require root permission.
Change the XML config file, delete /data/tmp, and rerun the filesystem format (of course, after you've modified the core-site.xml config).
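For reference, the directory this answer refers to is usually controlled by the hadoop.tmp.dir property; a hypothetical override in core-site.xml could look like this, with /home/hadoop/tmp standing in for whatever writable path you choose:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp</value>
</property>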
I recommend using a newer version of Hive, i.e. 1.1.0; 0.10.0 is very buggy.
Run this command and then try to create the directory again; it grants the user permissions on the /user directory in HDFS:
hadoop fs -chmod -R 755 /user
I am using macOS with Homebrew as the package manager. I had to set the property in hive-site.xml as:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/usr/local/Cellar/hive/2.3.1/libexec/conf/warehouse</value>
</property>

Cannot find hadoop installation: $HADOOP_HOME must be set or hadoop must be in the path

So, a little background: I've been trying to set up Hive on a CentOS 6 machine. I followed the instructions in this YouTube video: http://www.youtube.com/watch?v=L2lSrHsRpOI
In my case, I'm using Hadoop 1.1.2 and Hive 0.9.0; all the directories labeled "mnt" in the video I replaced with "opt", because that's where all of my Hadoop and Hive packages have been unpacked.
When I reached the portion of the video where I was actually supposed to run Hive via "./hive",
this error popped up:
"Cannot find hadoop installation: $HADOOP_HOME must be set or hadoop must be in the path"
I guess one of the questions I have is: in which directory do I have to edit the ".profile" file? I don't understand why we would have to go to the "home" directory for this change. Also, if it helps, this is what I put in the ".profile" file in my /home/hadoop directory:
export HADOOP_HOME=/opt/hadoop/hadoop
export HIVE_HOME=/opt/hadoop/hive
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin
Thank you so much!
Go to the /etc/profile.d directory and create a hadoop.sh file in there with:
export HADOOP_HOME=/opt/hadoop/hadoop
export HIVE_HOME=/opt/hadoop/hive
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin
After you save the file, make sure to
chmod +x /etc/profile.d/hadoop.sh
source /etc/profile.d/hadoop.sh
This should take care of it.

Resources