How to find the location of a dataset in Cloudera - Hadoop

I am getting the error below
AnalysisException: INPATH location 'hdfs://quickstart.cloudera:8020/home/cloudera/UNSW_NB15.csv' does not exist.
when I run the statement
LOAD DATA INPATH '/home/cloudera/UNSW_NB15.csv' OVERWRITE INTO TABLE mybigdata_qu
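If the file is sitting on the local Linux filesystem of the quickstart VM rather than in HDFS, LOAD DATA INPATH will not find it, because INPATH is resolved against HDFS. A minimal sketch of two common workarounds, assuming the file really is at /home/cloudera/UNSW_NB15.csv on the local disk (the /user/cloudera HDFS directory in the second option is only an illustrative choice):
-- Option 1 (Hive): read the file straight from the local filesystem of the client node
LOAD DATA LOCAL INPATH '/home/cloudera/UNSW_NB15.csv' OVERWRITE INTO TABLE mybigdata_qu;
-- Option 2: copy the file into HDFS first (e.g. hdfs dfs -put /home/cloudera/UNSW_NB15.csv /user/cloudera/),
-- then load the HDFS copy; /user/cloudera/ is an assumed target directory
LOAD DATA INPATH '/user/cloudera/UNSW_NB15.csv' OVERWRITE INTO TABLE mybigdata_qu;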

Related

Alter table in hive is not working for serde 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' in Hive "Apache Hive (version 2.1.1-cdh6.3.4)"

Environment:
Apache Hive (version 1.1.0-cdh5.14.2)
I tried creating a table with the DDL below.
create external table test1 (
  v_src_code string,
  d_extraction_date date
)
partitioned by (d_mis_date date)
row format serde 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
with serdeproperties ("field.delim"="~|")
stored as textfile
location '/hdfs_path/test1'
tblproperties ("serialization.null.format"="");
Then I altered this table by adding one extra column, as below.
alter table test1 add columns (n_limit_id bigint);
This worked perfectly fine.
But recently our cluster was upgraded. The new environment is
Apache Hive (version 2.1.1-cdh6.3.4)
The same table was created in this new environment. When I run the same ALTER TABLE statement, I get the error below.
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error: type expected at the position 0 of '<derived from deserializer>:bigint' but '<' is found. (state=08S01,code=1)
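No fix for the ALTER TABLE itself is given above. One possible workaround, sketched below under the assumption that recreating the table is acceptable (it is not a confirmed fix for this serde on CDH 6): because the table is EXTERNAL, dropping it removes only metadata, so it can be recreated with the extra column and the existing partitions re-registered.
-- workaround sketch: dropping an EXTERNAL table leaves the data in '/hdfs_path/test1' untouched
drop table test1;
create external table test1 (v_src_code string, d_extraction_date date, n_limit_id bigint)
partitioned by (d_mis_date date)
row format serde 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
with serdeproperties ("field.delim"="~|")
stored as textfile
location '/hdfs_path/test1'
tblproperties ("serialization.null.format"="");
-- re-register the existing partitions from the directory layout on HDFS
msck repair table test1;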

How to delete an external table in Hive when the HDFS path has been deleted?

I removed my HDFS path /user/abc with an rm -R command; some Hive tables were stored in /user/abc/data/abc.db.
My regular (managed) tables were correctly deleted with Hive SQL, but my external tables did not drop, failing with the following error:
[Code: 1, SQL State: 08S01] Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Failed to load storage handler: Error in loading storage handler.org.apache.phoenix.hive.PhoenixStorageHandler)
How can I safely delete the tables?
I tried using:
delete from TBL_COL_PRIVS where TBL_ID=[myexternaltableID];
delete from TBL_PRIVS where TBL_ID=[myexternaltableID];
delete from TBLS where TBL_ID=[myexternaltableID];
But it didn't work with the following error message:
[Code: 10297, SQL State: 42000] Error while compiling statement: FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table sys.TBLS that is not transactional
Thank you,
NB: I know a schema is supposed to be deleted more safely with HiveQL, but in this particular case it was not done that way.
The solution is to delete the tables from the Hive Metastore (PostgreSQL) with:
delete from "TABLE_PARAMS" where "TBL_ID"='[myexternaltableID]';
delete from "TBL_COL_PRIVS" where "TBL_ID"='[myexternaltableID]';
delete from "TBL_PRIVS" where "TBL_ID"='[myexternaltableID]';
delete from "TBLS" where "TBL_ID"='[myexternaltableID]';
NB: The order is important, because the first three tables reference "TBLS" through its TBL_ID foreign key.
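For reference, the TBL_ID used in the statements above can be looked up in the same PostgreSQL metastore. A minimal sketch, assuming the standard metastore schema ('myexternaltable' is a placeholder for the dropped table's name):
-- find the metastore id of the orphaned external table ('myexternaltable' is a placeholder)
select t."TBL_ID", t."TBL_NAME", d."NAME" as db_name
from "TBLS" t
join "DBS" d on d."DB_ID" = t."DB_ID"
where t."TBL_NAME" = 'myexternaltable';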

Load local data from file system to Hive table

I am loading a '.csv' file from the local file system into a Hive table as below:
load data local inpath 'xxx.csv' into table xxx;
I got an error saying
Failed with exception Unable to move source file:/home/hadoop/hbase-data/xxx.csv to destination hdfs://xxx.xxx.xxx:8020/test/xxx.csv
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
Can anyone help me out with this?
Thanks so much for your effort.
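No answer is recorded above. One common cause, stated here as an assumption rather than a confirmed diagnosis, is that the MoveTask cannot write into the table's HDFS directory (missing permissions, or a file with the same name already there). The destination directory Hive is trying to move the file into can be checked from Hive itself:
-- the Location: field in the output is the HDFS directory the CSV gets moved into
describe formatted xxx;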

Spark SQL failed while reading table format OpenCSVSerde ClassNotFoundException Class org.apache.hadoop.hive.serde2.OpenCSVSerde not found

I am trying to read a Hive table stored with SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' into PySpark using the code below.
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)  # sc is the existing SparkContext
ent = sqlContext.sql("select * from temp.employee")
Error:
MetaException(message:java.lang.ClassNotFoundException Class org.apache.hadoop.hive.serde2.OpenCSVSerde not found)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:290)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:281)
I have tried adding the Hive serde jars to the Spark classpath as below:
spark.driver.extraClassPath=/usr/hdp/2.3.2.0-2950/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/hdp/2.3.2.0-2950/hadoop/lib/hive-serde-1.2.1.2.3.2.0-2950.jar
spark.executor.extraClassPath=/usr/hdp/2.3.2.0-2950/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/hdp/2.3.2.0-2950/hadoop/lib/hive-serde-1.2.1.2.3.2.0-2950.jar
After adding the above jars, reading this table works fine, but reading other normal tables now fails with the error below.
Py4JJavaError: An error occurred while calling o38.sql.
: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/common/type/HiveIntervalYearMonth
at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.<clinit>(PrimitiveObjectInspectorUtils.java:228)
at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:341)
Is there something wrong with how I am adding the jars to the Spark executor classpath?
How can I resolve this?
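One alternative worth noting, as an assumption rather than a confirmed fix: prepending an extra hive-serde jar to the driver and executor classpath can conflict with the Hive classes Spark itself ships, which would be consistent with the NoClassDefFoundError above. Registering the jar only for the current session (run through sqlContext.sql), reusing the jar path from the question, avoids touching the global classpath:
-- register the serde jar for this session only, then retry the query
ADD JAR /usr/hdp/2.3.2.0-2950/hadoop/lib/hive-serde-1.2.1.2.3.2.0-2950.jar;
select * from temp.employee limit 10;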

EXTERNAL TABLE to a file in Hive?

Is it possible to use a file (rather than a directory) as the LOCATION of an external table in Hive?
CREATE EXTERNAL TABLE table1
(
line string
)
LOCATION '/hdp_in/fd/file.txt.gz';
because I get an error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.fs.FileAlreadyExistsException Parent path is not a directory: /hdp_in/fd/file.txt.gz file.txt.gz
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:1957)
(...)
Do I have to use directories only? I haven't found that information in the reference manual...
Regards
Pawel
Yes, you will have to put this file in a directory and then create the external table on top of that directory. As per the documentation: "An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir."
Even if you create an internal table, Hive by default creates a directory for it inside hive.metastore.warehouse.dir; the same behavior is expected when creating an external table, except that the default warehouse directory is not used.
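Following that answer, a minimal sketch of the corrected DDL, assuming file.txt.gz is placed in a directory of its own (here /hdp_in/fd/) and each line of the file should land in the single line column:
CREATE EXTERNAL TABLE table1
(
line string
)
LOCATION '/hdp_in/fd/';  -- a directory: Hive reads every file inside it, including file.txt.gz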
