Packaging or automating execution of Hive queries - hadoop

In Oracle and other databases, we have the concept of a PL/SQL package, where we can bundle multiple queries/procedures and call them from a UNIX script. For Hive queries, what process is used to package and automate query processing in real production environments?

If you are looking to automate the execution of numerous Hive queries, the hive or beeline CLI (think sqlplus with Oracle) lets you pass a file containing one or more commands, such as multiple INSERT, SELECT, or CREATE TABLE statements. The contents of that file can be generated programmatically using your favorite scripting language, such as Python or shell.
See the "-f" option (and "-i" for an initialization file) in this documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
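As a minimal sketch of generating that file programmatically, assuming Python 3 and hypothetical table names (`staging_sales`, `sales_summary` are placeholders, not from the answer), the statements can be normalised and written to one `.hql` file that is then passed to `hive -f` or `beeline -f`:

```python
from pathlib import Path


def write_hql(statements, path):
    """Write HiveQL statements to a script file, one statement per line.

    Each statement is normalised to end with exactly one ';' so the file
    can be passed to `hive -f` or `beeline -f` as-is.
    """
    lines = [stmt.strip().rstrip(";").rstrip() + ";" for stmt in statements]
    script = Path(path)
    script.write_text("\n".join(lines) + "\n")
    return script


# Hypothetical example: summarise a staging table into a reporting table.
# Table names are placeholders for illustration only.
STATEMENTS = [
    "CREATE TABLE IF NOT EXISTS sales_summary (region STRING, total DOUBLE)",
    "INSERT OVERWRITE TABLE sales_summary "
    "SELECT region, SUM(amount) FROM staging_sales GROUP BY region;",
]
```

Calling `write_hql(STATEMENTS, "daily_load.hql")` produces a file you could run with `hive -f daily_load.hql`; the file name is again just an example.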
For a procedural language, see HPL/SQL:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=59690156
HPL/SQL does support CREATE PACKAGE, but if what you are trying to achieve is scripted outside of HPL/SQL (e.g. Python or shell), you can 'package' your application according to the scripting best practices of your chosen language.

To run multiple queries, simply write them one after another in a file (say 'hivescript.hql') and then run it from bash through beeline or the hive shell:
beeline -u "jdbc:hive2://HOST_NAME:10000/DB" -f hivescript.hql
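To automate several such files, a small wrapper can run them in order and stop at the first failure. This is a sketch assuming Python 3 and assuming beeline exits with a non-zero status when a statement in the `-f` file fails; the script names in the usage note are hypothetical:

```python
import subprocess
import sys


def beeline_cmd(jdbc_url, script):
    """Argument list equivalent to: beeline -u "<jdbc_url>" -f <script>"""
    return ["beeline", "-u", jdbc_url, "-f", script]


def run_scripts(jdbc_url, scripts, runner=subprocess.run):
    """Run each .hql script in order, stopping at the first failure.

    Returns 0 if every script succeeds, otherwise the failing exit code.
    `runner` defaults to subprocess.run and can be replaced by a stub in tests.
    """
    for script in scripts:
        result = runner(beeline_cmd(jdbc_url, script))
        if result.returncode != 0:
            print(f"{script} failed with exit code {result.returncode}", file=sys.stderr)
            return result.returncode
    return 0
```

A scheduled job could then call `sys.exit(run_scripts("jdbc:hive2://HOST_NAME:10000/DB", ["01_stage.hql", "02_transform.hql"]))`, so the scheduler sees the failing exit code.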

Related

Scheduling Oracle SQL files using a UNIX-based SAS environment

I have a bunch of SQL queries that run against an Oracle database. Is there a way to schedule these .sql files using UNIX-based SAS, so they execute one after another at a certain time of day?
If they are .sql files, why do you want to schedule them using SAS? Are they SAS programs? If not, I would do one of three things, depending on my constraints:
1) Convert the .sql files to stored procedures and call them from DBMS_SCHEDULER within Oracle. Oracle has a fantastic job scheduling subsystem (actually multiple variants) that protects against duplicate jobs, among other issues, and you get transactional control, auditing and logging. http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_sched.htm
2) If converting them to stored procs is too much, then call the .sql scripts directly from DBMS_SCHEDULER with DBMS_SCHEDULER.CREATE_PROGRAM() and then schedule that program with DBMS_SCHEDULER.CREATE_JOB.
3) Use cron or atrun to schedule batch / shell script wrappers that call sqlplus to run .sql files.
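Option 3 can be sketched as follows, assuming Python 3 and the sqlplus client on the PATH. The wrapper feeds each .sql file to `sqlplus -s /nolog` and uses `WHENEVER SQLERROR EXIT FAILURE` so a failing statement produces a non-zero exit status instead of being silently ignored. The connect string and file names are hypothetical:

```python
import subprocess


def sqlplus_input(connect, script_path):
    """Commands fed to `sqlplus -s /nolog` on stdin to run one .sql file.

    WHENEVER SQLERROR EXIT FAILURE makes SQL*Plus exit with a non-zero
    status when a statement fails, so the wrapper can detect the error.
    """
    return (
        "WHENEVER SQLERROR EXIT FAILURE\n"
        f"CONNECT {connect}\n"
        f"@{script_path}\n"
        "EXIT\n"
    )


def run_in_order(connect, scripts, runner=subprocess.run):
    """Run the .sql files one after another, stopping at the first failure.

    Returns (failed_script, exit_code), or (None, 0) if all succeed.
    """
    for path in scripts:
        result = runner(["sqlplus", "-s", "/nolog"],
                        input=sqlplus_input(connect, path), text=True)
        if result.returncode != 0:
            return path, result.returncode
    return None, 0
```

A crontab entry such as `30 2 * * * /usr/bin/python3 /opt/jobs/run_sql.py` (hypothetical paths) would then run a script that calls `run_in_order` every day at 02:30. Passing the credentials on stdin rather than as a sqlplus argument keeps them out of process listings.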
If the question is specifically how to do this with SAS, then DBMS_SCHEDULER can still execute external SAS programs using option (2) above.

Teradata Jobs and KSH

I tried searching online but was unable to find anything pertaining to my requirements.
I am new to Teradata.
In our team, Teradata jobs call a ksh script, which in turn calls a stored procedure to run at a scheduled time.
I want to understand how exactly this calling works. How does a job call a ksh script, and how does the ksh script then call a procedure?
Your help would be much appreciated.
At a very basic level, UNIX has a scheduling mechanism called cron. Users with sufficient privileges on the UNIX server can use cron to run jobs at a scheduled time by defining a crontab. Your crontab can call UNIX commands or, in many cases, a shell script (ksh in your example) that performs a complex set of operations. In many production environments, jobs may be scheduled using an enterprise scheduling platform instead of many independent crontab files spread across many users and servers in the data center.
As this pertains to Teradata, the ksh script is likely invoking a Teradata utility such as BTEQ to log on to the database and execute a stored procedure, macro, or set of SQL statements contained in the BTEQ script. When the BTEQ script completes, an exit code is returned to the ksh script, which can use it for error handling if an error occurred within the BTEQ script or a handled/unhandled error occurred within the stored procedure.
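The ksh-to-BTEQ hand-off described above can be sketched in Python (assuming Python 3 and the `bteq` client on the PATH). The wrapper builds a BTEQ script that logs on, calls a stored procedure with no parameters, and quits non-zero if the call fails, then feeds it to bteq on stdin the way `bteq < job.bteq` would in ksh. The logon string and procedure name are hypothetical, and the exit code 8 is an arbitrary non-zero value chosen here:

```python
import subprocess


def bteq_script(logon, procedure):
    """BTEQ commands that log on, call a parameterless stored procedure,
    and quit with code 8 (an arbitrary choice) if the CALL fails."""
    return (
        f".LOGON {logon};\n"
        f"CALL {procedure}();\n"
        ".IF ERRORCODE <> 0 THEN .QUIT 8;\n"
        ".LOGOFF;\n"
        ".QUIT 0;\n"
    )


def run_bteq(script, runner=subprocess.run):
    """Feed the script to bteq on stdin, like `bteq < job.bteq` in a ksh
    wrapper, and return the exit code for the caller's error handling."""
    result = runner(["bteq"], input=script, text=True)
    return result.returncode
```

The scheduler (cron or an enterprise tool) only sees the wrapper's exit status, which is why passing BTEQ's return code back up the chain matters.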
You can use your search engine of choice to read up on how to develop UNIX shell scripts (Korn, Bash, etc.) and how Teradata utilities such as BTEQ work. If you have a more discrete question about something in your environment feel free to post a separate question here with the appropriate tags in the question to target the audience who can best help you.

Where does Oracle store metadata when queries are run from scripts?

Does Oracle store the queries run inside scripts?
If so, where?
Also, can I find which query came from which script?
I found v$sql, but it doesn't have a link to the PID. It has a "module" field that isn't useful for my purpose.
If you want to provide metadata for code executed in Oracle, whether from scripts, application servers, or PL/SQL code, the best way may be to use DBMS_Application_Info.
This associates the executed code with meaningful module and action names, so you could use the module "SQL*Plus Script" for all of your scripts and actions such as "Export invoices".

Oracle and shell scripts

I'm using Oracle 11g and loading data into the database with SQL*Loader, which is invoked from UNIX scripts. I want to select some rows of data and write them to a file using a shell script. Is it possible to write a shell script for this?
Here is an excellent tutorial that clearly explains how to execute a query from UNIX.
Essentially, the author logs in to Oracle (i.e., sets up an Oracle session), reads the query to be executed, executes it, and performs whatever further operations are needed.
Instead of reading the query from the user, you could read it from a file or even hard-code it in the script, as needed.
Blog explaining the use of Oracle queries in shell scripts
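The same pattern, with the query hard-coded instead of read from the user, can be sketched in Python 3 as follows. This assumes the sqlplus client is on the PATH; the connect string, query and output file are hypothetical. It is the equivalent of a shell here-document piped into `sqlplus -s` with stdout redirected to a file:

```python
import subprocess

# SQL*Plus settings so stdout holds only the data rows.
SETTINGS = "\n".join([
    "SET PAGESIZE 0",    # no page breaks or column headings
    "SET FEEDBACK OFF",  # no 'n rows selected.' message
    "SET TRIMOUT ON",    # strip trailing blanks from each line
])


def query_to_file(connect, query, out_path, runner=subprocess.run):
    """Run one query through `sqlplus -s` and write its output to out_path,
    like `sqlplus -s user/pass@db <<EOF ... EOF > out.txt` in a shell script."""
    stdin = f"{SETTINGS}\n{query.rstrip().rstrip(';')};\nEXIT\n"
    with open(out_path, "w") as out:
        result = runner(["sqlplus", "-s", connect],
                        input=stdin, text=True, stdout=out)
    return result.returncode
```

For simplicity the connect string is passed as a command-line argument here, which makes it visible in process listings on a shared server.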

Getting output in flat file using oracle on UNIX

How to get the output of a query into a flat file using Oracle on UNIX?
For example:
I have a TEST table; I want to get the content of the TEST table into a flat file and then store the output in some other folder in .txt format.
See Creating a Flat File in the SQL*Plus User's Guide and Reference.
In the Oracle SQL*Plus terminal you could type:
spool <filename>;
run your query
spool off;
Now <filename> would contain the results of the query.
In fact, it would contain all output to the terminal from the execution of the spool command until spool off.
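For a repeatable export, the spool approach can be put in a script file and run non-interactively. Here is a sketch, assuming Python 3 is used only to generate the SQL*Plus script; the query, output path and helper name `spool_script` are hypothetical:

```python
def spool_script(query, out_file, delimiter=","):
    """Build a SQL*Plus script that spools one query's rows to a flat file.

    Save the result as e.g. export_test.sql and run it with:
        sqlplus -s user/pass@db @export_test.sql
    """
    return "\n".join([
        "SET TERMOUT OFF",      # don't echo rows to the screen (script mode)
        "SET PAGESIZE 0",       # no headings or page breaks in the file
        "SET FEEDBACK OFF",     # no 'n rows selected.' line
        "SET TRIMSPOOL ON",     # strip trailing blanks from each spooled line
        "SET LINESIZE 32767",   # avoid wrapping long rows
        f"SET COLSEP '{delimiter}'",
        f"SPOOL {out_file}",
        query.rstrip().rstrip(";") + ";",
        "SPOOL OFF",
        "EXIT",
    ]) + "\n"
```

COLSEP still pads each column to its display width, so the file is delimited but not tightly packed; on SQL*Plus 12.2 or later, `SET MARKUP CSV ON` is an alternative. If the spool file name has no extension, SQL*Plus appends a default one (.lst), so give it `.txt` explicitly.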
If you have access to directories on the database server, and authority to create "Directory" objects in Oracle, then you have lots of options.
For example, you can use the UTL_FILE package (part of the PL/SQL built-ins) to read or write files at the operating system level.
Or use the "external table" functionality to define objects that look like single tables to Oracle but are actually flat files at the OS level. This is well documented in the Oracle docs.
Also, for one-time tasks, most tools for working with SQL and PL/SQL provide facilities for moving data to and from the database. In the Windows environment, Toad is good at that. So is Oracle's free SQL Developer, which runs on many platforms. You wouldn't want to use those for a process that runs every day, but they're fine for one-off moves. I've generally found them easier to use than SQL*Plus spooling, which is a more primitive version of the same functionality.
