HDP Sandbox SQOOP failed due to permission error - hadoop

Below is the error message:
Unable to move source
hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/DimDepartmentGroup/part-m-00000
to destination
hdfs://sandbox-hdp.hortonworks.com:8020/warehouse/tablespace/managed/hive/dbodimemployee/delta_0000001_0000001_0000:
Permission denied: user=hive, access=WRITE,
inode="/user/maria_dev/DimDepartmentGroup":maria_dev:hdfs:drwxr-xr-x
I am totally confused. The error message itself shows that maria_dev has write permission on the folder: inode="/user/maria_dev/DimDepartmentGroup":maria_dev:hdfs:drwxr-xr-x
What did I miss?

When you run Sqoop, generally it first loads the data from your external database and stores it as a multi-part file at the given location (--target-dir /goldman/yahoo), and then loads it from that location into the Hive table (--hive-table topclient.mpool).
So you can get access denied at two levels:
1) If you see access denied at the file location /goldman/yahoo, set that location's permissions to 777, running as the hdfs user (a less permissive alternative is sketched after these steps): sudo -u hdfs hadoop fs -chmod 777 /goldman/yahoo
2) If you see access denied while creating the table, run the sqoop command as the hive user, because the hive user has access to the Hive tables, i.e.
sudo -u hive sqoop import --connect 'jdbc:sqlserver://test.goldman-invest.data:1433;databaseName=Investment_Banking' --username user_***_cqe --password ****** --table cases --target-dir /goldman/yahoo --hive-import --create-hive-table --hive-table topclient.mpool
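If opening the staging directory up to 777 is more than you want, a narrower option (a sketch, assuming HDFS ACLs are enabled on the sandbox via dfs.namenode.acls.enabled) is to grant only the hive user write access to the staging directory from the original error:
sudo -u hdfs hdfs dfs -setfacl -m user:hive:rwx /user/maria_dev/DimDepartmentGroup
sudo -u hdfs hdfs dfs -getfacl /user/maria_dev/DimDepartmentGroup
The second command just prints the resulting ACL so you can confirm the hive entry is there.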

Finally, I got it to work. I logged in as root and switched to the hive user using su - hive.
Then I was able to run the Sqoop command successfully. Previously I had logged in as maria_dev and could not use the su command; I do not have the password for the hive user because hive is not a regular user in the HDP sandbox.
Still, it is strange to me that a user needs root access just to load some data into Hive on HDP.

Related

Create database in Hive console without permission in Ranger

I have a non-Kerberized Hadoop cluster. I manage Hive and HDFS permissions via Ranger.
The Resource Path in Ranger for HDFS are:
/user/myLogin
/apps/hive/warehouse/mylogin_*
/apps/hive/warehouse
I can create a database in Hive (via the console) and also in Ambari.
But when I remove the /apps/hive/warehouse permission, I can no longer create a database in Hive from the console, although in Ambari I still can.
This is the error:
hive> create database database_tesst;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.security.AccessControlException:
Permission denied: user=AAAAA, access=EXECUTE,
inode="/apps/hive/warehouse/database_tesst.db":hdfs:hdfs:d---------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)
How can I create a database or run a query in Hive (console) without the /apps/hive/warehouse permission? I need to remove this permission from Ranger so that users can only access their own data.
Thank you
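For reference, the inode in the error (d--------- owned by hdfs:hdfs) shows that Hive creates the new database directory under /apps/hive/warehouse, so the user issuing CREATE DATABASE needs at least EXECUTE there. A quick way to inspect what plain HDFS allows, assuming shell access as the hdfs superuser, is:
sudo -u hdfs hdfs dfs -ls /apps/hive/warehouse
This only shows the HDFS-level permissions; whether they are consulted at all depends on the Ranger HDFS plugin's fallback setting.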

How can we import all tables from an RDBMS into a custom Hive database?

I want to "import-all-tables" using Sqoop from MySQL into a custom Hive database (not the default Hive database).
Steps tried:
Created a custom database in Hive under "/user/hive/warehouse/Custom.db"
Assigned full permissions on this directory, so there should be no issue with Sqoop writing into it.
Used the command below with the "--hive-database" option on the CDH 5.7 VM:
sqoop import-all-tables
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db"
--username retail_dba
--password cloudera
--hive-database "/user/hive/warehouse/sqoop_import_retail.db"
Tables are created in the Hive default database only, not in the custom DB, in this case "sqoop_import_retail.db".
Otherwise it tries to create tables in the previous HDFS directories (/user/cloudera/categories) and errors out stating that the table already exists:
16/08/30 00:07:14 WARN security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists
16/08/30 00:07:14 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists
[cloudera@quickstart etc]$
How do I address these issues?
1. Creating tables in the custom Hive DB
2. Flushing previous directory references with Sqoop
You did not mention --hive-import in your command, so Sqoop imports into HDFS under /user/cloudera/ in your case.
You are executing the query again; that is why you get the exception
Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists
Modify import command:
sqoop import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --hive-database custom --hive-import
It will fetch all the tables from the retail_db MySQL database and create the corresponding tables in the custom database in Hive.
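Sqoop generally expects the target Hive database to exist already, so if custom has not been created yet it may help to create it first (a minimal sketch; the database name custom is taken from the command above):
hive -e "CREATE DATABASE IF NOT EXISTS custom;"
If an earlier run also left the plain HDFS output behind, removing it clears the FileAlreadyExistsException on the next attempt (path taken from the error message):
hdfs dfs -rm -r /user/cloudera/categories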

Import data from Oracle(Windows) to HDFS (CDH3) machine using sqoop

Hi, I am taking a training in Hadoop. I have a task in which I have to import a table's data from Oracle (Windows, 11g XE) into HDFS using Sqoop. I am reading the following article. My question is: how exactly do I import data from Windows to HDFS? Normally I use WinSCP to transfer files from Windows to the HDFS machine. I have imported data from MySQL, which was installed on the HDFS (CDH3) machine, but I don't know how to import data from Oracle on Windows to HDFS. Please help.
Link that I am following
Following is the step-wise process:
1. Connect to the Oracle SQL command line and log in with your credentials:
e.g. username: system, password: system
(make sure this user has all administrative privileges, or connect as sysdba in Oracle and make a new user with all privileges)
Create a user with all privileges in Oracle.
Create tables under that user, insert some values, and commit.
2. Now we need a connector for transferring our data from Oracle to HDFS.
So we need to download the Oracle-Sqoop connector jar file and place it in the following path on CDH3 (use sudo in your commands while copying to the following path, as it requires admin access in Linux):
/usr/lib/sqoop/bin
http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html -- download link -- ojdbc6.jar
Use WinSCP to transfer the downloaded jar from Windows to CDH3, then move it to the above-mentioned path on CDH3.
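A sketch of that copy step, assuming the jar was first transferred to the sandbox user's home directory with WinSCP (note that many Sqoop installs pick up JDBC driver jars from the lib directory rather than bin, so adjust the destination to wherever your Sqoop expects them):
sudo cp ~/ojdbc6.jar /usr/lib/sqoop/lib/
sudo chmod 644 /usr/lib/sqoop/lib/ojdbc6.jar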
3. Command:
sudo bin/sqoop import --connect jdbc:oracle:thin:system/system@192.168.XX.XX:1521:xe --username system -P --table system.emp --columns "ID" --target-dir /sqoopoutput1 -m 1
/sqoopoutput1 is the output directory in HDFS where you will get your data; you can change this as per your requirement.
-m 1: this sets the number of mappers for this Sqoop job; here it is 1.
192.168.XX.XX:1521 -- the IP address and Oracle listener port of your Windows machine
You don't need to import data from Oracle to your local machine, then copy it to the HDFS machine, and then import it into HDFS.
Sqoop is here to import your RDBMS tables into an HDFS directory directly.
Use the command:
sqoop import --connect 'jdbc:oracle:thin:@192.xx.xx.xx:1521:ORCL' --username testuser --password testpassword --table testtable --target-dir /tmp/testdata
Go to the machine on which Sqoop is running, open a terminal (I believe it's Linux), fire the above-mentioned command, and check --target-dir (I used /tmp/testdata in the example command) in HDFS. You will find files corresponding to your Oracle table there.
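For example, a quick way to verify the result (a sketch, using the /tmp/testdata directory from the command above):
hadoop fs -ls /tmp/testdata
hadoop fs -cat /tmp/testdata/part-m-00000 | head
The first command lists the part files Sqoop wrote; the second prints the first few imported rows.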
Check sqoop docs for more details.

hadoop sqoop load csv file into mysql

I am learning Hadoop and Sqoop. I am working on a Hortonworks sandbox (a single-node Hadoop virtual machine: http://hortonworks.com/products/hortonworks-sandbox/#install).
I am trying to load a CSV file via Sqoop into a MySQL table. I have created a database flightinfo and a table weather in it. I created a table in Hive called sqoop_tmp with the file location of that CSV file.
I used following command to load the csv into mysql:
sqoop export --connect jdbc:mysql://localhost/flightinfo –-table weather –-export-dir /apps/hive/warehouse/sqoop_tmp
Here is the error message:
Update @z-1: I tried your code and it returns something different.
You have to provide the username and password of MySQL, and the user must have permission to access the database.
It looks like you did a fresh install of MySQL and did not configure the root account with a root password.
You can do that using the following steps:
$ mysqladmin -u root password "newpassword"
For example: $ mysqladmin -u root password mysql-root-password
Restart the MySQL daemon: $ sudo service mysqld restart or $ sudo /etc/init.d/mysqld restart
Log in to MySQL using $ mysql -u root -p and enter the password when prompted.
Once your database is created as the root user, you can issue the following sqoop command to export data from HDFS to the MySQL database.
$ sqoop export --connect jdbc:mysql://localhost/flightinfo --username root -P --table weather --export-dir /apps/hive/warehouse/sqoop_tmp
Still facing a permission issue? Then the user's permission to access the database is the problem.
You should be able to solve it by granting permissions using the steps below.
Log in to MySQL using $ mysql -u root -p
mysql> GRANT ALL PRIVILEGES ON flightinfo.* TO 'root'@'localhost' IDENTIFIED BY 'password';
Note: The password you set here should be used from sqoop.
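To confirm the grant took effect, you can check from the MySQL prompt (a sketch; adjust the user and host to match the GRANT statement above):
mysql> SHOW GRANTS FOR 'root'@'localhost';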
Try with the username and password of MySQL:
sqoop export --connect jdbc:mysql://localhost/flightinfo --table weather --export-dir /apps/hive/warehouse/sqoop_tmp --username SomeUser -P
Note: the user should have permission on the database.
Try this:
sqoop export --connect jdbc:mysql://localhost/flightinfo --table weather --export-dir /apps/hive/warehouse/sqoop_tmp
You had –- not --. These two are not the same.

Hive External Table

I am trying to import data from Oracle to Hive using Sqoop.
I used the command below once; now I want to overwrite the existing data with new data (daily action).
I ran this command again:
sqoop import --connect jdbc:oracle:thin:@UK01WRS6014:2184:WWSYOIT1
--username HIVE --password hive --table OIDS.ALLOCATION_SESSION_DIMN
--hive-overwrite --hive-database OI_DB --hive-table ALLOCATION_SESSION_DIMN
But I am getting an error File already exists:
14/10/14 07:43:59 ERROR security.UserGroupInformation:
PriviledgedActionException as:axchat
(auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException:
Output directory
hdfs://uslibeiadg004.aceina.com:8020/user/axchat/OIDS.ALLOCATION_SESSION_DIMN
already exists
The tables that I created in Hive were all external tables.
As with MapReduce, do we have to delete that file every time we execute the same command?
Any help would be highly appreciated.
When you delete from an EXTERNAL table you only delete objects in the Hive metastore: you don't delete the files over which that table is superimposed. A non-external table belongs solely to Hive and, when deleted, will result in both the metastore data AND the HDFS data being removed.
So you can either try deleting the HDFS data explicitly, or define the table as internal to Hive.
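If you keep the table external and just want the re-import to succeed, a minimal sketch (path taken from the error message; the --delete-target-dir option exists in later Sqoop 1.4.x releases, so check your version) is either to remove the directory up front:
hdfs dfs -rm -r /user/axchat/OIDS.ALLOCATION_SESSION_DIMN
or to let Sqoop do it by adding --delete-target-dir to the import command.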
