EXTERNAL TABLE to a file in Hive?

Is it possible to use a file as the LOCATION of an external table in Hive?
CREATE EXTERNAL TABLE table1
(
line string
)
LOCATION '/hdp_in/fd/file.txt.gz';
I ask because I get an error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.fs.FileAlreadyExistsException Parent path is not a directory: /hdp_in/fd/file.txt.gz file.txt.gz
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:1957)
(...)
Do I have to use only directories? I haven't found that information in the reference manual.
Regards
Pawel

Yes, you have to put the file in a directory and then create the external table on top of that directory. As per the documentation: An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir.
Even for an internal table, Hive by default creates a directory inside hive.metastore.warehouse.dir. The same behavior applies when creating an external table, except that the default directory is not used, which is why LOCATION has to be a directory rather than a file.
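As a minimal sketch of the approach described above, the statement from the question can point at a directory that holds the file instead of at the file itself. The directory /hdp_in/fd/table1/ is a hypothetical path that is not in the original question; it assumes file.txt.gz has first been moved there so that the table does not pick up unrelated files from /hdp_in/fd.

```sql
-- Assumption: file.txt.gz was moved into its own directory beforehand,
-- for example /hdp_in/fd/table1/ (hypothetical path, not from the question).
CREATE EXTERNAL TABLE table1
(
  line STRING
)
-- LOCATION names the directory; Hive reads every file it finds inside it.
LOCATION '/hdp_in/fd/table1/';
```

Every file placed in that directory becomes part of the table, so keeping one directory per table avoids mixing data.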

Related

Cannot create Hive external table using jdbcStorageHandler

I am running a small cluster on Amazon EMR in order to play with Apache Hive 2.3.5. It is my understanding that Apache Hive can import data from a remote database and have the cluster run queries against it. I was following an example from the Apache Hive web documentation (https://cwiki.apache.org/confluence/display/Hive/JdbcStorageHandler) and wrote the following code:
CREATE EXTERNAL TABLE hive_table
(
col1 int,
col2 string,
col3 date
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
'hive.sql.database.type'='POSTGRES',
'hive.sql.jdbc.driver'='org.postgresql.Driver',
'hive.sql.jdbc.url'='jdbc:postgresql://<url>/<dbname>',
'hive.sql.dbcp.username'='<username>',
'hive.sql.dbcp.password'='<password>',
'hive.sql.table'='<dbtable>',
'hive.sql.dbcp.maxActive'='1'
);
But I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: Property hive.sql.query is required.)
According to the documentation, I need to specify either “hive.sql.table” or “hive.sql.query” to tell how to get data from jdbc database. But if I replace hive.sql.table with hive.sql.query I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: No enum constant org.apache.hive.storage.jdbc.conf.DatabaseType.POSTGRES)
I tried searching the web for a solution, and it doesn't look like anyone else has experienced the same issues. Do I need to modify a config file, or am I missing something critical in my code?

I think you are using a version of the jar that doesn't support POSTGRES.
Download a newer jar from this link:
http://repo1.maven.org/maven2/org/apache/hive/hive-jdbc-handler/3.1.2/hive-jdbc-handler-3.1.2.jar
Put the downloaded jar in an HDFS location.
Run Hive normally.
Run the command: add jar ${HDFS_PATH_TO_DOWNLOADED_JAR}
Run your create table command.
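The steps above could look roughly like this inside a Hive session. The HDFS path is a hypothetical placeholder for wherever the jar was uploaded, and the source does not confirm that the 3.1.2 handler behaves correctly on a Hive 2.3.5 runtime, so treat this as a sketch to try rather than a guaranteed fix.

```sql
-- Assumption: the jar was uploaded to this hypothetical HDFS path.
ADD JAR hdfs:///user/hadoop/lib/hive-jdbc-handler-3.1.2.jar;

-- Confirm the jar is registered for the current session.
LIST JARS;

-- Then rerun the CREATE EXTERNAL TABLE ... STORED BY
-- 'org.apache.hive.storage.jdbc.JdbcStorageHandler' statement from the question.
```

ADD JAR only affects the current session, so it has to be repeated (or configured permanently) for later sessions.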

What path should be used to locate CSV File used in SQL statement (Load data local infile) When WAR file Deployed on Tomcat

I have been working on a Spring Boot project that uses Flyway for database version control. In the migration folder there are some SQL files containing "Load data local infile" statements that reference CSV files.
Example:
load data local infile 'C:/Program Files (x86)/Apache Software Foundation/Tomcat 8.5/webapps/originator/WEB-INF/classes/insertData/subject.csv' INTO TABLE subject
How can I make this path relative?
I have tried
'./classes/insertData/subject.csv'
'./insertData/subject.csv'
and some other combinations, but could not fix the issue.
Error:
Caused by: java.sql.SQLException: Unable to open file '../../insertData/subject.csv'for 'LOAD DATA LOCAL INFILE' command.Due to underlying IOException:
BEGIN NESTED EXCEPTION java.io.FileNotFoundException MESSAGE:
....\insertData\subject.csv (The system cannot find the path
specified) STACKTRACE: java.io.FileNotFoundException:
....\insertData\subject.csv (The system cannot find the path
specified)

I was able to insert data into tables from CSV files within a Flyway migration using a resource path. Within the migration script I used the full path, written as shown below.
LOAD DATA LOCAL INFILE './src/main/resources/<FOLDER>/<FILE>.csv' INTO TABLE <TABLE_NAME>
FIELDS TERMINATED BY ','
optionally enclosed by '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
The other clauses depend on your file structure; I just wanted to include the entire example.

Instead of writing an SQL script, you can use a Java-based migration to read the data and insert it into a table. You can use the "flyway.locations" property in your application.properties to specify the path for Java-based migrations, since by default Flyway searches "db/migration" under resources.
For further details, see https://flywaydb.org/documentation/migrations#java-based-migrations

Load local data from file system to Hive table

I am running an insert script in Hive that loads a '.csv' file from the local file system into a Hive table, as below:
load data local inpath 'xxx.csv' into table xxx;
I got an error saying:
Failed with exception Unable to move source
file:/home/hadoop/hbase-data/xxx.csv to destination
hdfs://xxx.xxx.xxx:8020/test/xxx.csv FAILED: Execution Error, return
code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
Can anyone help me out with this?
Thanks so much for your effort.

Oracle Data Integrator SQL to HDFS IKM returns error

I am using ODI (12.1.3.0.0). I created a topology for the Oracle DB, which is OK, and a topology for HDFS using the File technology, which is where I think the problem is.
For the HDFS Data Server, I left the JDBC driver empty and filled the JDBC URL with hdfs://remotehostname:port
For the HDFS Physical Schema, I filled both Schema and Work Schema with /my/path
Then I created a Logical Schema and a Model. After that, I created a Datastore under the model with these definitions:
Name: TestName
Resource Name: TESTFILE.txt
File Format: Fixed
After all this, I created a project and a mapping under the project.
Finally, when I run the mapping, I see these errors:
ODI-1217: Session Oracle2HDFSMapping_Physical_SESS (15) fails with return code ODI-1298.
ODI-1226: Step Physical_STEP fails after 1 attempt(s).
ODI-1240: Flow Physical_STEP fails while performing a Add execute to Sqoop script-IKM SQL to HDFS File (Sqoop)- operation. This flow loads target table null.
ODI-1298: Serial task "SERIAL-MAP_MAIN- (10)" failed because child task "SERIAL-EU-GGUSER_UNIT (20)" is in error.
ODI-1298: Serial task "SERIAL-EU-GGUSER_UNIT (20)" failed because child task "Add execute to Sqoop script-IKM SQL to HDFS File (Sqoop)- (40)" is in error.
Caused By: java.io.IOException: Cannot run program "chmod": CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at java.lang.Runtime.exec(Runtime.java:617)
at java.lang.Runtime.exec(Runtime.java:450)
at java.lang.Runtime.exec(Runtime.java:347)
at oracle.odi.runtime.agent.execution.cmd.OSCommandExecutor.execute(OSCommandExecutor.java:54)
at oracle.odi.runtime.agent.execution.cmd.OSCommandExecutor.execute(OSCommandExecutor.java:29)
at oracle.odi.runtime.agent.execution.TaskExecutionHandler.handleTask(TaskExecutionHandler.java:52)
at oracle.odi.runtime.agent.execution.SessionTask.processTask(SessionTask.java:203)
at oracle.odi.runtime.agent.execution.SessionTask.doExecuteTask(SessionTask.java:114)
at oracle.odi.runtime.agent.execution.AbstractSessionTask.execute(AbstractSessionTask.java:886)
at oracle.odi.runtime.agent.execution.SessionExecutor$SerialTrain.runTasks(SessionExecutor.java:2198)
at oracle.odi.runtime.agent.execution.SessionExecutor.executeSession(SessionExecutor.java:591)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor$1.doAction(TaskExecutorAgentRequestProcessor.java:718)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor$1.doAction(TaskExecutorAgentRequestProcessor.java:611)
at oracle.odi.core.persistence.dwgobject.DwgObjectTemplate.execute(DwgObjectTemplate.java:203)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor.doProcessStartAgentTask(TaskExecutorAgentRequestProcessor.java:800)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor.access$1400(StartSessRequestProcessor.java:74)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor$StartSessTask.doExecute(StartSessRequestProcessor.java:702)
at oracle.odi.runtime.agent.processor.task.AgentTask.execute(AgentTask.java:180)
at oracle.odi.runtime.agent.support.DefaultAgentTaskExecutor$2.run(DefaultAgentTaskExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:385)
at java.lang.ProcessImpl.start(ProcessImpl.java:136)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
... 20 more
Where did I go wrong?

For a file Datastore, you need to define the attributes (columns) by opening the Datastore and going to the Attributes tab. If the file already exists, you can reverse-engineer the attributes, then rename them and change their datatypes if needed.
The error message you received for the second task says that the file (generated in the first task) does not exist. So there might be a problem with the first task, probably due to the missing attributes in your Datastore.
Here is a detailed article about the SQL to HDFS File (Sqoop) KM written by the ODI A-Team: http://www.ateam-oracle.com/importing-data-from-sql-databases-into-hadoop-with-sqoop-and-oracle-data-integrator-odi/

Error in metadata: MetaException(message:java.lang.IllegalStateException: Can't overwrite cause)

I have created an external table in Hive, and when I provide the location of the data for this table I get the following error:
FAILED: Error in metadata: MetaException(message:java.lang.IllegalStateException: Can't overwrite cause)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
However, I am able to load the same file from a Pig script using the PigStorage() loader function.
I have the following permissions on the file: rw-rw-r-
and on the folder where this file resides (the folder whose path I give as the LOCATION in the query): drwxrwxr-x
What could be the cause of this error, and how can I correct it?

The solution is to have write permission on the file.

Another possible cause of this issue is a wrong LOCATION for your Hive table (in case someone else hits this and can't figure out what is going wrong).
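For a table that already exists, one way to check which path Hive has recorded is sketched below. The table name my_table and the HDFS URI are hypothetical placeholders, not values from the question.

```sql
-- Assumption: my_table is a hypothetical table name.
-- The output includes a "Location:" row with the path Hive uses for the table.
DESCRIBE FORMATTED my_table;

-- If the path is wrong, point the table at the intended directory
-- (hypothetical HDFS URI shown here).
ALTER TABLE my_table SET LOCATION 'hdfs://namenode:8020/data/my_table';
```

If the error appears during CREATE EXTERNAL TABLE itself, the same check applies to the LOCATION clause of that statement: it should name an existing or creatable directory that the Hive user can access.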
