Issue loading data in Hive (group related) - hadoop

I have a sample_data.csv file (I created the schema earlier in the Hive table people).
Upon running the following command to load the data into the table people:
LOAD DATA LOCAL INPATH 'sample_data.csv' OVERWRITE INTO TABLE people;
I get the following trace:
Loading data to table default.people Failed with exception Unable to
move source file:/home/hduser1/sample_data.csv to destination
hdfs://hive-master:54310/user/hive/warehouse/people/sample_data.csv
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask
I tried the following, but in vain:
hadoop fs -chmod g+w /user/hive/warehouse
sudo chmod -R 777 /home/hduser1/sample_data.csv
Further analysis shows something interesting:
-rwxrwxrwx 1 hduser1 hadoop_group 2874 Feb 21 09:50 sample_data.csv
Note: the file sample_data.csv is owned by hduser1 in hadoop_group, whereas the following line shows that /user/hive/warehouse/people is owned by hduser1 in supergroup.
drwxrwxrwx - hduser1 supergroup 0 2018-02-21 10:35 /user/hive/warehouse/people
How can I overcome this issue? Am I missing some configuration?

When you use the LOCAL option with LOAD DATA INPATH ..., the file is expected to be on the server running Hive. If you don't have access to that server, the best way is to move the data to HDFS manually and use LOAD DATA INPATH ... without LOCAL.
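For example, a minimal sketch (the staging directory /tmp/hive_staging is an arbitrary choice):

# copy the local file into HDFS first
hdfs dfs -mkdir -p /tmp/hive_staging
hdfs dfs -put /home/hduser1/sample_data.csv /tmp/hive_staging/

# then load from HDFS, without the LOCAL keyword
hive -e "LOAD DATA INPATH '/tmp/hive_staging/sample_data.csv' OVERWRITE INTO TABLE people;"

Note that LOAD DATA INPATH without LOCAL moves the file within HDFS into the table's warehouse directory, so the staging copy disappears after the load.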

Related

unable to create internal hive table

I am getting the below error while trying to create a Hive internal table:
CREATE TABLE employee(id INT,Name STRING);
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/user/hive/warehouse/employee is not a directory or unable to create one)
I have created the /user/hive/warehouse directory in the HDFS location and given the below permissions as well:
hadoop#raja-VirtualBox:~$ hadoop fs -ls /user/hive/
Found 1 items
drwxrwxrwx - hadoop supergroup 0 2022-01-21 16:19 /user/hive/warehouse
I am still getting the error.

hive hadoop permissions not correct

I installed Apache Kylin, which requires Hadoop, Hive, HBase, and Java to work. All of them are installed correctly. Now when I try to run this example, I get an error after the first command, i.e. ${KYLIN_HOME}/bin/sample.sh.
Below is the error I am getting:
Loading data to table default.kylin_sales
Failed with exception Unable to move source file:/usr/lib/kylin/sample_cube/data/DEFAULT.KYLIN_SALES.csv to destination hdfs://localhost:54310/user/hive/warehouse/kylin_sales/DEFAULT.KYLIN_SALES.csv
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
I have set 777 permissions on both of the above paths, and I am operating as root.
Check the HDFS directory permissions. If they are not like the below, change them:
hdfs dfs -chmod g+w /user/hive/warehouse
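To verify the result (a quick check; the path is the one from the error message):

# the group write bit should now be set on the warehouse directory
hdfs dfs -ls /user/hive
# expect something like: drwxrwxr-x ... /user/hive/warehouse

Also make sure the user running Hive belongs to the directory's group, otherwise g+w has no effect for it.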

Not able to create new table in hive from Spark-shell

I am using a single-node setup on Red Hat and have installed Hadoop, Hive, Pig, and Spark. I configured the Hive metastore in Derby. I created a new folder for Hive tables and gave it full privileges (chmod 777). Then I created one table from the Hive CLI, and I am able to select that data in spark-shell and print the values to the console. But from spark-shell/Spark SQL I am not able to create new tables. It throws this error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/2016/hive/test2 is not a directory or unable to create one)
I checked the permissions and the user (I use the same user for the installation of Hadoop, Hive, Spark, etc.).
Is there anything that needs to be done to get full integration of Spark and Hive?
Thanks
Check that the permissions in HDFS are correct (not just on the local filesystem):
hadoop fs -chmod -R 755 /user
If the error message persists afterwards please update the question.
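HDFS permissions are separate from local filesystem permissions, so comparing both views can help (a sketch; /2016/hive/test2 is the path from the error above):

# local filesystem view
ls -ld /2016/hive/test2
# HDFS view; note that chmod 777 on the local folder does not affect this
hadoop fs -ls /user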

Hadoop Hive: How to allow regular user continuously write data and create tables in warehouse directory?

I am running Hadoop 2.2.0.2.0.6.0-101 on a single node.
I am trying to run a Java MRD program that writes data to an existing Hive table from Eclipse under a regular user. I get the exception:
org.apache.hadoop.security.AccessControlException: Permission denied: user=dev, access=WRITE, inode="/apps/hive/warehouse/testids":hdfs:hdfs:drwxr-xr-x
This happens because the regular user has no write permission to the warehouse directory; only the hdfs user does:
drwxr-xr-x - hdfs hdfs 0 2014-03-06 16:08 /apps/hive/warehouse/testids
drwxr-xr-x - hdfs hdfs 0 2014-03-05 12:07 /apps/hive/warehouse/test
To circumvent this, I changed the permissions on the warehouse directory so that everybody now has write permission:
[hdfs#localhost wks]$ hadoop fs -chmod -R a+w /apps/hive/warehouse
[hdfs#localhost wks]$ hadoop fs -ls /apps/hive/warehouse
drwxrwxrwx - hdfs hdfs 0 2014-03-06 16:08 /apps/hive/warehouse/testids
drwxrwxrwx - hdfs hdfs 0 2014-03-05 12:07 /apps/hive/warehouse/test
This helps to some extent, and the MRD program can now write to the warehouse directory as a regular user, but only once. When trying to write data into the same table a second time I get:
ERROR security.UserGroupInformation: PriviledgedActionException as:dev (auth:SIMPLE) cause:org.apache.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : default.testids
Now, if I delete the output table and create it anew in the Hive shell, I again get the default permissions, which do not allow a regular user to write data into this table:
[hdfs#localhost wks]$ hadoop fs -ls /apps/hive/warehouse
drwxr-xr-x - hdfs hdfs 0 2014-03-11 12:19 /apps/hive/warehouse/testids
drwxrwxrwx - hdfs hdfs 0 2014-03-05 12:07 /apps/hive/warehouse/test
Please advise on the correct Hive configuration steps that will allow a program running as a regular user to do the following operations in the Hive warehouse:
Programmatically create / delete / rename Hive tables?
Programmatically read / write data from Hive tables?
Many thanks!
If you maintain the table from outside Hive, then declare the table as external:
An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir.
A Hive administrator can create the table and point it at your own user-owned HDFS storage location, and you grant Hive permission to read from there.
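For example, a minimal sketch (the single id column and the location /user/dev/testids are assumptions for illustration):

CREATE EXTERNAL TABLE testids (id INT)   -- hypothetical schema; use the real column list
LOCATION '/user/dev/testids';            -- user-owned HDFS directory

Dropping an external table removes only the metadata; the files under /user/dev/testids stay in place.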
As a general comment, there is no way for an unprivileged user to perform an unauthorized privileged action. Any such way is technically an exploit, and you should never rely on it: even if it is possible today, it will likely be closed soon. Hive Authorization (and HCatalog authorization) is orthogonal to HDFS authorization.
Your application is also incorrect, irrespective of the authorization issues. You are trying to write 'twice' into the same table, which means your application does not handle partitions correctly. Start from An Introduction to Hive's Partitioning.
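For example, a partitioned sketch (the load_date column and the staging source table are assumptions):

CREATE TABLE testids (id INT)            -- hypothetical schema
PARTITIONED BY (load_date STRING);

-- each run targets its own partition instead of the whole table
INSERT OVERWRITE TABLE testids PARTITION (load_date='2014-03-11')
SELECT id FROM testids_staging;          -- testids_staging is a placeholder source

This way, a second run with a different load_date value no longer collides with existing data.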
You can configure hdfs-site.xml like this:
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
This configuration disables permission checking on HDFS, so a regular user can perform operations on HDFS.
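On Hadoop 2.x (which this question uses) the property is named dfs.permissions.enabled; the equivalent setting would be:

<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>

Keep in mind this turns off HDFS access control entirely, so it is only suitable for development environments.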
I hope this solution helps you.

Hive: No files matching path, but file exists

I'm having a lot of trouble getting Hive to work. I'm running CDH 4.5 with YARN, all installed from Cloudera's yum repo. I followed their instructions to set up Hive, but for some reason it does not recognize legitimate files on my local file system.
[msknapp#localhost data]$ pwd
/home/msknapp/data
[msknapp#localhost data]$ ll | grep county_insurance_pp.txt
-rw-rw-rw- 1 msknapp msknapp 162537 Jan 5 14:58 county_insurance_pp.txt
[msknapp#localhost data]$ sudo -u hive hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/hive/hive_job_log_9e8bf55b-7ec8-4b79-be9b-cc2200a33f91_1795256456.txt
hive> describe count_insurance;
2014-01-08 02:42:59.000 GMT Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
----------------------------------------------------------------
2014-01-08 02:42:59.443 GMT:
Booting Derby version The Apache Software Foundation - Apache Derby - 10.4.2.0 - (689064): instance a816c00e-0143-6fbb-3f3a-000007a1d270
on database directory /var/lib/hive/metastore/metastore_db
Database Class Loader started - derby.database.classpath=''
OK
fips int
st string
stfips int
name string
a int
b int
c int
d int
e int
f int
total int
Time taken: 5.195 seconds
hive> LOAD DATA LOCAL INPATH 'county_insurance_pp.txt' OVERWRITE INTO TABLE count_insurance;
FAILED: SemanticException Line 1:23 Invalid path ''county_insurance_pp.txt'': No files matching path file:/home/msknapp/data/county_insurance_pp.txt
The file I'm trying to load does exist. I get the same exception when I use an absolute path in my load statement.
On a side note, I still don't know why it keeps giving me a FileNotFoundException for the derby log with a permission warning. A long time ago I went to /var/lib/hive and did 'sudo chmod -R 777 ./*', so permissions should not be a problem.
BTW, I am running Hadoop in pseudo-distributed mode and have all three Hive daemons running locally. I used HiveServer2, not 1.
Somebody please let me know what I'm doing wrong here, or how to debug this.
This is Koji. I had the same problem recently.
The Hive script runs on the Hadoop server. If the file county_insurance_pp.txt does not exist on the Hadoop server, it cannot find the file.
You have to send your target file to the Hadoop server before running the script. There are two ways to handle this, as sketched after the list:
use scp
use webhdfs (http://hadoop.apache.org/docs/r1.0.4/webhdfs.html)
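For example (a sketch; host names and target paths are placeholders):

# option 1: scp the file to the machine running Hive, then LOAD DATA LOCAL INPATH there
scp county_insurance_pp.txt msknapp@hadoop-server:/tmp/

# option 2: upload straight into HDFS via WebHDFS (two-step create)
curl -i -X PUT "http://namenode-host:50070/webhdfs/v1/tmp/county_insurance_pp.txt?op=CREATE&user.name=msknapp"
# then PUT the file to the Location header returned by the first request:
curl -i -X PUT -T county_insurance_pp.txt "<Location header from the previous response>"

With the WebHDFS route the table can then be loaded with LOAD DATA INPATH (no LOCAL), since the file is already in HDFS.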
