Create tables in Hive from DDL files - JDBC

I am using JRuby to connect to Hive, which works successfully. Now I want to create tables, but instead of passing the CREATE statement as a parameter to the execute() method, I want to point to a DDL file that contains the table definition.
I cannot simply take the file contents and use them, because the file usually holds more than one statement before the actual table creation (e.g. CREATE DATABASE IF NOT EXISTS ..., then CREATE TABLE IF NOT EXISTS ...).
Is there a command I can use through my JDBC connection that takes the DDL file and executes it?

To my knowledge there is no direct way with the JDBC API to do an operation similar to hive -f.
option 1) Parse your SQL file and write a method that executes the commands sequentially (a sketch follows below), or use third-party code.
Here is one reference: http://www.codeproject.com/Articles/802383/Run-SQL-Script-sql-containing-DDL-DML-SELECT-state
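To make option 1 concrete, here is a minimal sketch in Java (the same JDBC calls port directly to JRuby). It naively splits the script on semicolons, so it assumes the DDL contains no semicolons inside string literals or comments; the connection URL, credentials and file name are placeholders:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DdlRunner {
    // Read a DDL script and execute each semicolon-separated statement in order.
    public static void runDdlFile(Connection conn, String path) throws Exception {
        String script = new String(Files.readAllBytes(Paths.get(path)));
        try (Statement stmt = conn.createStatement()) {
            for (String sql : script.split(";")) {
                sql = sql.trim();
                if (!sql.isEmpty()) {
                    stmt.execute(sql); // CREATE DATABASE IF NOT EXISTS ..., CREATE TABLE ..., etc.
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder connection details; requires the Hive JDBC driver on the classpath.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hiveuser", "")) {
            runDdlFile(conn, "tables.ddl");
        }
    }
}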
option 2) If the client environment where your JRuby code runs also supports Hive, write a shell script that connects to the remote server over JDBC and runs the SQL file with beeline, which makes the remote Thrift calls for you.
Reference: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

Related

db2 load command fails using JDBC

I'm using JDBC to transfer data from a delimited file to a DB2 database table. Initially I encountered SQLCODE=-104, SQLSTATE=42601, so on further debugging I found this, which referred me to the stored procedure SYSPROC.ADMIN_CMD.
I modified the call and tried running the procedure version, but I'm still getting the same error:
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=CLIENT;LOAD;FROM, DRIVER=4.26.14
    at com.ibm.db2.jcc.am.b7.a(b7.java:810)
    at com.ibm.db2.jcc.am.b7.a(b7.java:66)
I'm not sure what exactly I am doing wrong.
Code:
CALL SYSPROC.ADMIN_CMD('LOAD CLIENT FROM "<PATH_TO_FILE>" OF DEL MODIFIED BY COLDEL0X09 INSERT INTO <SCHEMA_NAME>.<TABLE_NAME> NONRECOVERABLE')
I ran the LOAD command on the db2 command prompt and it ran without any issues.
Db2 version: 11.5
The load client command is intended for use at a client workstation, not at a Db2-server, so sysproc.admin_cmd will reject the client keyword.
The stored procedure executes at the Db2-server; it does not have access to files that are stored at the client workstation.
Therefore any file you mention in parameters to sysproc.admin_cmd stored procedure must be relative to the Db2-server file-system and must be accessible (readable) to the Db2-instance owner account.
If your data-file is already located on the Db2-server, just reference its fully qualified filename and run the sysproc.admin_cmd procedure with the load command. Carefully check the documentation for the load command to understand all of the implications of using LOAD, especially if the target Db2-server is highly-available. This is an administration matter.
If your data-file is not already located at the Db2-server, then first copy it to the Db2-server and retry with load (or the slower import).
You can also run the import command via sysproc.admin_cmd when the data file is accessible to the Db2-instance owner on the Db2-server that runs the stored procedure.
If your Db2 version is recent, you can also consider the ingest command; refer to the documentation for details.
If your data file is located on a client workstation, and you are unable or unwilling to transfer it to the Db2-server, then you can use your local workstation Db2-client (if you installed a suitable Db2-client that includes the db2 CLP) to run a script/batch-file that performs the relevant commands. You cannot use JDBC for that specific purpose, although you can exec/shell out from Java to run the required commands in one script (db2 connect to ... user ... using ..., then db2 load client ..., db2 ingest ... or db2 import ...); a sketch follows.
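To make the exec/shell-out idea concrete, here is a hedged Java sketch that drives the db2 CLP with ProcessBuilder. The script name load.clp and its contents are assumptions for the example, a Db2-client that includes the CLP must be installed locally, and on Windows the command has to run inside a db2cmd environment:

import java.io.IOException;

public class Db2LoadClient {
    public static void main(String[] args) throws IOException, InterruptedException {
        // load.clp is a hypothetical CLP script that might contain:
        //   CONNECT TO MYDB USER myuser USING mypassword;
        //   LOAD CLIENT FROM /tmp/data.del OF DEL MODIFIED BY COLDEL0X09
        //     INSERT INTO MYSCHEMA.MYTABLE NONRECOVERABLE;
        Process p = new ProcessBuilder("db2", "-tvf", "load.clp")
                .inheritIO() // stream the CLP output to this process's console
                .start();
        int rc = p.waitFor();
        System.out.println("db2 CLP exit code: " + rc);
    }
}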
If your target Db2-server is already at version 11.5 or higher, then it should support INSERT from an external table, including remote external tables, and since INSERT is plain SQL you can do that via JDBC (see the sketch below).
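The following is only a sketch of that route: the external-table option syntax should be checked against the documentation for your exact 11.5 build, REMOTESOURCE YES is the option that asks Db2 to read the file from the client machine, and all names are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ExternalTableInsert {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:db2://dbhost:50000/MYDB", "myuser", "mypassword");
             Statement st = conn.createStatement()) {
            // Illustrative transient external-table form; verify the USING
            // options against the Db2 11.5 documentation before relying on it.
            int rows = st.executeUpdate(
                "INSERT INTO MYSCHEMA.MYTABLE " +
                "SELECT * FROM EXTERNAL '/tmp/data.csv' LIKE MYSCHEMA.MYTABLE " +
                "USING (DELIMITER ',' REMOTESOURCE YES)");
            System.out.println("Inserted " + rows + " rows");
        }
    }
}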
Apart from the above, most DBAs would arrange a direct Db2-server to Db2-server transfer if both are Db2-LUW, have IP connectivity, and security rules permit; this avoids the slow and insecure business of taking the data outside of Db2. That is not a programming matter so much as an administrative one.

How to handle bad files generated by external tables

I've always developed shell scripts on a Unix server where the script first runs SQL*Loader to load a file into an Oracle table and afterwards checks whether any BAD file was generated, in which case it, for example, sends me a warning e-mail.
By using an external table instead, I get the main advantage of not having to maintain any shell scripts. But since a BAD file might be generated on the server only at the moment I run the SELECT against my external table, how can I automatically check for its existence and handle it from within Oracle?
Oracle version 10g
Thanks!
With external tables, everything you do, you do in Oracle (i.e. within the database).
Therefore, you could:
- create a PL/SQL program (an anonymous PL/SQL block or a stored procedure)
- access the external table
- do whatever you need to do with the data
- after it is finished, use UTL_FILE to check the log/bad file
- use UTL_MAIL to send an e-mail if there's something "wrong"
A minimal sketch of that flow follows.
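Keeping with the JDBC thread running through this digest, the sketch drives an anonymous PL/SQL block from Java. The directory name EXT_DIR, the file, table and e-mail names are all illustrative assumptions; UTL_MAIL must be installed with SMTP_OUT_SERVER configured (on 10g you could fall back to UTL_SMTP); and "accessing the external table" is represented by a simple INSERT ... SELECT:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BadFileCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger")) {
            // 1) Touch the external table; this is the moment a BAD file may appear.
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("INSERT INTO target_tab SELECT * FROM my_ext_tab");
            }
            // 2) Check for the BAD file and send a warning, entirely inside Oracle.
            String plsql =
                "DECLARE\n" +
                "  l_exists    BOOLEAN;\n" +
                "  l_length    NUMBER;\n" +
                "  l_blocksize BINARY_INTEGER;\n" +
                "BEGIN\n" +
                "  UTL_FILE.FGETATTR('EXT_DIR', 'my_ext_tab.bad', l_exists, l_length, l_blocksize);\n" +
                "  IF l_exists THEN\n" +
                "    UTL_MAIL.SEND(sender     => 'db@example.com',\n" +
                "                  recipients => 'dba@example.com',\n" +
                "                  subject    => 'BAD file generated',\n" +
                "                  message    => 'Check my_ext_tab.bad in EXT_DIR');\n" +
                "  END IF;\n" +
                "END;";
            try (CallableStatement cs = conn.prepareCall(plsql)) {
                cs.execute();
            }
        }
    }
}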

Package or automating execution of Hive queries

In Oracle and other databases we have the concept of a PL/SQL package, in which we can bundle multiple queries/procedures and call them from a UNIX script. For Hive queries, what is the process used to package and automate the query processing in actual production environments?
If you are looking to automate the execution of numerous Hive queries, the hive or beeline CLI (think sqlplus with Oracle) allows you to pass a file containing one or more commands, such as multiple inserts, selects, create tables, etc. The contents of that file can be created programmatically using your favorite scripting language, like Python or shell.
See the "-i" option in this documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
In terms of a procedural language, please see:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=59690156
HPL/SQL does have a CREATE PACKAGE option, but if whatever you are trying to achieve is scripted outside of HPL/SQL (e.g. Python, shell), you can 'package' your application in accordance with the scripting best practices of your chosen language.
To run multiple queries, simply write them one after another in a file (say 'hivescript.hql'); the file can then be run from bash by passing it to beeline or the hive shell:
beeline -u "jdbc:hive2://HOST_NAME:10000/DB" -f hivescript.hql
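Both steps, generating the file programmatically and handing it to beeline, can be combined. Here is a minimal Java sketch under the assumptions that beeline is on the PATH and that the host, database and statements are placeholders:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class HiveScriptRunner {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Generate the script: one statement per line, each ending with ';'.
        Files.write(Paths.get("hivescript.hql"), Arrays.asList(
                "CREATE DATABASE IF NOT EXISTS mydb;",
                "USE mydb;",
                "CREATE TABLE IF NOT EXISTS t (id INT, name STRING);",
                "INSERT INTO t VALUES (1, 'a');"));
        // Hand the whole file to beeline, exactly as in the command above.
        Process p = new ProcessBuilder("beeline",
                "-u", "jdbc:hive2://HOST_NAME:10000/DB",
                "-f", "hivescript.hql").inheritIO().start();
        System.out.println("beeline exit code: " + p.waitFor());
    }
}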

Oracle and shell scripts

I'm using Oracle 11g and I'm loading data into the database with SQL*Loader, invoked through Unix scripts. Now I want to select some rows of data and write them to a file. Is it possible to write a shell script for that as well?
Here is an excellent tutorial which clearly explains how to execute a query from UNIX.
Ultimately what it does is log into Oracle (i.e. set up an Oracle session), read the query that needs to be executed, execute it, and perform whatever operation is needed on the result.
Instead of reading the query from the user, we could read it from a file or even hardcode it in the script itself, as needed.
Blog which explains the usage of Oracle queries in shell scripts
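If you would rather stay in JDBC, a hedged alternative to the shell-around-sqlplus approach is to select the rows and spool them to a delimited file yourself. The connection details, query and file name below are placeholders, and the CSV handling is deliberately naive (no quoting of embedded commas):

import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class SpoolToFile {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT empno, ename FROM emp");
             PrintWriter out = new PrintWriter("rows.csv")) {
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    if (i > 1) line.append(',');
                    line.append(rs.getString(i)); // naive: nulls print as "null"
                }
                out.println(line);
            }
        }
    }
}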

Loading data from a web-hosted CSV into Oracle?

I don't have time to write a Perl script or anything like that, nor do I have admin access to the back end. How can I get data from a file on the intranet (over http://) and parse it to create a table? Maybe somehow via PL/SQL? Keep in mind I don't have much admin access.
If you want it to be completely automated:
- Use the UTL_HTTP package to retrieve the data from the HTTP server.
- Either parse the CSV response yourself using INSTR and SUBSTR, or use the UTL_FILE package to write the data to a file on the database server file system and then create an external table that parses the CSV file you just created.
- Insert the parsed data into a table you've already created (I'm assuming that the CSV data is in the same format each time).
- Use the DBMS_SCHEDULER or DBMS_JOB package to schedule the job.
The Oracle database account you're using will need to be granted access to all of these packages. A rough sketch of the first three steps follows.
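The sketch is driven over JDBC to match the rest of this digest: an anonymous PL/SQL block fetches the file with UTL_HTTP, splits each line at the first comma with INSTR/SUBSTR, and inserts into a two-column staging table. The URL, the table stage_tab and its columns are assumptions, and the loop ends when UTL_HTTP raises END_OF_BODY:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class WebCsvLoader {
    public static void main(String[] args) throws Exception {
        // Anonymous PL/SQL block: fetch the CSV over HTTP and insert row by row.
        String plsql =
            "DECLARE\n" +
            "  l_req   UTL_HTTP.REQ;\n" +
            "  l_resp  UTL_HTTP.RESP;\n" +
            "  l_line  VARCHAR2(32767);\n" +
            "  l_comma PLS_INTEGER;\n" +
            "BEGIN\n" +
            "  l_req  := UTL_HTTP.BEGIN_REQUEST('http://intranet/data.csv');\n" +
            "  l_resp := UTL_HTTP.GET_RESPONSE(l_req);\n" +
            "  LOOP\n" +
            "    UTL_HTTP.READ_LINE(l_resp, l_line, TRUE);\n" +
            "    l_comma := INSTR(l_line, ',');\n" +
            "    INSERT INTO stage_tab (col1, col2)\n" +
            "    VALUES (SUBSTR(l_line, 1, l_comma - 1), SUBSTR(l_line, l_comma + 1));\n" +
            "  END LOOP;\n" +
            "EXCEPTION\n" +
            "  WHEN UTL_HTTP.END_OF_BODY THEN\n" +
            "    UTL_HTTP.END_RESPONSE(l_resp);\n" +
            "    COMMIT;\n" +
            "END;";
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger");
             CallableStatement cs = conn.prepareCall(plsql)) {
            cs.execute();
        }
    }
}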
You may download the file onto your host machine and then use SQL*Loader to populate a table.
Alternatively, there are wizards that may be easier than SQL*Loader if you are using IDEs like PL/SQL Developer (Tools -> Text Importer) or SQL Developer (right-click on Tables -> Import Data).
Create an external table that references your CSV file. That means you can run select statements on the file from within Oracle.
