Load local data from file system to Hive table

I am running an insert script in Hive that loads a '.csv' file from the local file system into a Hive table, as below:
load data local inpath 'xxx.csv' into table xxx;
I got an error saying:
Failed with exception Unable to move source file:/home/hadoop/hbase-data/xxx.csv to destination hdfs://xxx.xxx.xxx:8020/test/xxx.csv
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
Can anyone help me out with this?
Thanks so much for your effort.

Related

How to find the location of a dataset in Cloudera

I am getting the error below:
AnalysisException: INPATH location 'hdfs://quickstart.cloudera:8020/home/cloudera/UNSW_NB15.csv' does not exist.
when I run this statement:
LOAD DATA INPATH '/home/cloudera/UNSW_NB15.csv' OVERWRITE INTO TABLE mybigdata_qu

How to delete an external table in Hive when the hdfs path has been deleted?

I removed my HDFS path /user/abc with an rm -R command. Some Hive tables were stored under /user/abc/data/abc.db.
Although my regular tables were dropped correctly with Hive SQL, my external tables could not be dropped and failed with the following error:
[Code: 1, SQL State: 08S01] Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Failed to load storage handler: Error in loading storage handler.org.apache.phoenix.hive.PhoenixStorageHandler)
How can I safely delete the tables?
I tried using:
delete from TBL_COL_PRIVS where TBL_ID=[myexternaltableID];
delete from TBL_PRIVS where TBL_ID=[myexternaltableID];
delete from TBLS where TBL_ID=[myexternaltableID];
But it didn't work, and failed with the following error message:
[Code: 10297, SQL State: 42000] Error while compiling statement: FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table sys.TBLS that is not transactional
Thank you,
NB: I know a schema is supposed to be deleted more safely with HiveQL, but in this particular case it was not done that way.
The solution is to delete the tables directly from the Hive Metastore database (PostgreSQL) with:
delete from "TABLE_PARAMS" where "TBL_ID"='[myexternaltableID]';
delete from "TBL_COL_PRIVS" where "TBL_ID"='[myexternaltableID]';
delete from "TBL_PRIVS" where "TBL_ID"='[myexternaltableID]';
delete from "TBLS" where "TBL_ID"='[myexternaltableID]';
NB: Order is important.
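The deletes above can be wrapped in a single transaction, so that a failure partway through leaves the metastore unchanged. The following is only a sketch, assuming a PostgreSQL-backed metastore with the standard "TBLS" and "DBS" tables. It should be run against the PostgreSQL database itself (for example with psql), not through Hive. The database name abc, the table name my_external_table, and the ID 12345 are placeholders. Other metastore tables may also reference the table depending on the Hive version, so back up the metastore before trying anything like this.

```sql
-- Look up the TBL_ID of the table to remove.
-- 'abc' and 'my_external_table' are placeholder names.
SELECT t."TBL_ID"
FROM "TBLS" t
JOIN "DBS" d ON t."DB_ID" = d."DB_ID"
WHERE d."NAME" = 'abc'
  AND t."TBL_NAME" = 'my_external_table';

-- Delete in the same order as above, inside one transaction.
-- 12345 stands in for the TBL_ID returned by the lookup.
BEGIN;
DELETE FROM "TABLE_PARAMS"  WHERE "TBL_ID" = 12345;
DELETE FROM "TBL_COL_PRIVS" WHERE "TBL_ID" = 12345;
DELETE FROM "TBL_PRIVS"     WHERE "TBL_ID" = 12345;
DELETE FROM "TBLS"          WHERE "TBL_ID" = 12345;
COMMIT;
```

If any statement fails, issuing ROLLBACK instead of COMMIT discards the partial changes.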

Cannot create Hive external table using jdbcStorageHandler

I am running a small cluster in Amazon EMR in order to play with Apache Hive 2.3.5. It is my understanding that Apache Hive can import data from a remote database and have the cluster run queries against it. I was following an example from the Apache Hive web documentation (https://cwiki.apache.org/confluence/display/Hive/JdbcStorageHandler) and wrote the following code:
CREATE EXTERNAL TABLE hive_table
(
col1 int,
col2 string,
col3 date
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
'hive.sql.database.type'='POSTGRES',
'hive.sql.jdbc.driver'='org.postgresql.Driver',
'hive.sql.jdbc.url'='jdbc:postgresql://<url>/<dbname>',
'hive.sql.dbcp.username'='<username>',
'hive.sql.dbcp.password'='<password>',
'hive.sql.table'='<dbtable>',
'hive.sql.dbcp.maxActive'='1'
);
But I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: Property hive.sql.query is required.)
According to the documentation, I need to specify either “hive.sql.table” or “hive.sql.query” to tell Hive how to get data from the JDBC database. But if I replace hive.sql.table with hive.sql.query, I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: No enum constant org.apache.hive.storage.jdbc.conf.DatabaseType.POSTGRES)
I tried searching the web for a solution, and it doesn't look like anyone has experienced the same issues that I am having. Do I need to modify a config file, or am I missing something critical in my code?
I think you are using a version of the JDBC handler jar that doesn't support POSTGRES.
Download the latest jar from this link:
http://repo1.maven.org/maven2/org/apache/hive/hive-jdbc-handler/3.1.2/hive-jdbc-handler-3.1.2.jar
Put the downloaded jar in an HDFS location.
Run Hive normally.
Run the command: add jar ${HDFS_PATH_TO_DOWNLOADED_JAR}
Run your create table command.
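As a rough sketch, the last two steps might look like this in a Hive session. The HDFS path is a hypothetical location for the downloaded jar, and the table definition is the one from the question with its placeholders left as they were. Whether the 3.1.2 handler loads cleanly on a Hive 2.3.5 cluster is not confirmed here.

```sql
-- Register the newer handler jar for this session.
-- hdfs:///tmp/jars/ is an assumed upload location, not a required one.
ADD JAR hdfs:///tmp/jars/hive-jdbc-handler-3.1.2.jar;

-- Optional: confirm the jar is registered in the session.
LIST JARS;

-- Re-run the original statement unchanged.
CREATE EXTERNAL TABLE hive_table
(
  col1 int,
  col2 string,
  col3 date
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  'hive.sql.database.type'='POSTGRES',
  'hive.sql.jdbc.driver'='org.postgresql.Driver',
  'hive.sql.jdbc.url'='jdbc:postgresql://<url>/<dbname>',
  'hive.sql.dbcp.username'='<username>',
  'hive.sql.dbcp.password'='<password>',
  'hive.sql.table'='<dbtable>',
  'hive.sql.dbcp.maxActive'='1'
);
```

ADD JAR only affects the current session, so it has to be repeated in each new session that uses the table unless the jar is installed cluster-wide.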

EXTERNAL TABLE to a file in Hive?

Is it possible to use a file as the LOCATION for an external table in Hive?
CREATE EXTERNAL TABLE table1
(
line string
)
LOCATION '/hdp_in/fd/file.txt.gz';
I ask because I get an error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.fs.FileAlreadyExistsException Parent path is not a directory: /hdp_in/fd/file.txt.gz file.txt.gz
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:1957)
(...)
Do I have to use only directories? I haven't found that information in the reference manual.
Regards
Pawel
Yes, you will have to put this file in a directory and then create an external table on top of it. As per the documentation: An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir
Even when you create an internal table, Hive by default creates a directory for it inside hive.metastore.warehouse.dir. The same behavior applies when creating an external table, except that the default directory is not used.
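For the table in the question, this would mean pointing LOCATION at the parent directory rather than at the file. A minimal sketch, assuming /hdp_in/fd/ holds only the files that belong to this table, since Hive reads every file in that directory:

```sql
-- LOCATION names the directory that contains file.txt.gz,
-- not the file itself.
CREATE EXTERNAL TABLE table1
(
  line string
)
LOCATION '/hdp_in/fd/';
```

If the directory also contains unrelated files, moving file.txt.gz into a dedicated subdirectory and using that as the LOCATION avoids mixing them into the table. Gzip-compressed text files are normally read transparently by Hive's default text format, but that is worth verifying on your cluster.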

Error in metadata: MetaException(message:java.lang.IllegalStateException: Can't overwrite cause)

I have created an external table in Hive, and when I provide the location of the data for this table I get the following error:
FAILED: Error in metadata: MetaException(message:java.lang.IllegalStateException: Can't overwrite cause)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Also, I am able to load the same file with a Pig script using the PigStorage() loader function.
I have the following permissions on the file: rw-rw-r-
and on the folder where this file resides (I give the path of this folder as the LOCATION in the query): drwxrwxr-x
What can be the cause of this, and how can I correct the error?
The solution is to have write permission on the file.
Another possible cause of this issue is having the wrong LOCATION for your Hive table (in case someone else has this issue and can't figure out what is going wrong).
