How to test Hive CRUD queries from shell scripting

I am creating a shell script that should execute basic Hive queries and assert the results against expected output.
Where should I start with the shell scripting?
Thanks in advance.

I have found an answer.
One thing we can do is create an HQL file containing the basic queries to be tested, and trigger that HQL file through Beeline (which I was using) from a bash script.
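As an illustration, a minimal sketch of that approach, assuming a HiveServer2 endpoint at jdbc:hive2://localhost:10000 and file names of my own choosing:

#!/usr/bin/env bash
# Minimal sketch: run an HQL file through beeline and compare its output
# with an expected-results file. The JDBC URL, file names, and output
# format are assumptions -- adjust them for your environment.
set -euo pipefail

JDBC_URL="jdbc:hive2://localhost:10000"   # assumed HiveServer2 endpoint
HQL_FILE="basic_queries.hql"              # file with the CRUD queries to test
EXPECTED="expected_output.csv"            # hand-prepared expected result
ACTUAL="actual_output.csv"

beeline -u "$JDBC_URL" --silent=true --outputformat=csv2 -f "$HQL_FILE" > "$ACTUAL"

if diff -q "$EXPECTED" "$ACTUAL" > /dev/null; then
    echo "PASS: query output matches expected result"
else
    echo "FAIL: query output differs from expected result" >&2
    diff "$EXPECTED" "$ACTUAL" >&2
    exit 1
fi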

Related

Informatica PC restart workflow with different sql query

I am using Informatica PC.
I have a workflow that contains a SQL query.
The query looks like "select t1, t2, t3 from table where t1 between date '2020-01-01' and date '2020-01-31'".
I need to download all data between 2020 and 2022, but I can't put that whole range in the query because I will get an ABORT SESSION from Teradata.
I want to write something that will restart the workflow with different dates automatically.
The first run should take 01.2020, the second run 02.2020, the third run 03.2020, and so on.
How can I solve this problem?
This is a longer solution and can be achieved in two ways. Using only a shell script will give you a lot of flexibility.
First of all, parameterize your mapping with two mapping parameters. Use them in the SQL like below.
select t1, t2, t3 from table where t1 between date '$$START_DT' and date '$$END_DT'
The idea is to change them on each run.
Using only a shell script - This is flexible because you can handle as many runs as you want with this method. You need to call this shell script from a command task.
Create a master file which has data like this:
2020-01-01,2020-01-31
2020-02-01,2020-02-29
2020-03-01,2020-03-31
Create three Informatica parameter files using the above entries. The first file (file1) should look like this:
[folder.workflow.session_name]
$$START_DT=2020-01-01
$$END_DT=2020-01-31
Use the file (file1) in a pmcmd call to kick off the Informatica workflow. Please add the wait option so the command waits for the workflow to complete.
Loop over the above steps until all entries of the master file have been processed (see the sketch below).
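A minimal sketch of that loop, assuming pmcmd is on the PATH and that the integration service, domain, credentials, folder, and workflow names are placeholders to replace with your own:

#!/usr/bin/env bash
# Hedged sketch of the shell-script approach described above; service, domain,
# credential, folder, and workflow names are placeholders.
set -euo pipefail

MASTER_FILE="master_dates.txt"     # lines like: 2020-01-01,2020-01-31
PARAM_FILE="/tmp/wf_param.txt"

while IFS=',' read -r START_DT END_DT; do
    # Build the Informatica parameter file for this run
    {
        echo "[folder.workflow.session_name]"
        echo "\$\$START_DT=$START_DT"
        echo "\$\$END_DT=$END_DT"
    } > "$PARAM_FILE"

    # Kick off the workflow and wait for it to finish before the next run
    pmcmd startworkflow -sv INT_SERVICE -d DOMAIN -u infa_user -p infa_password \
        -f folder -paramfile "$PARAM_FILE" -wait workflow
done < "$MASTER_FILE"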
Using an Informatica-only method - This method is not as flexible as the one above and applies only to your question.
Create a shell script that creates the three parameter files from the above master file.
Create three sessions or three worklets that use the above three parameter files. Be careful to use the correct parameter file for the correct session.
You can attach those sessions/worklets one after another or run them in parallel.

How to speed up this query to retrieve lastUpdateTime of all hive tables?

I have created a bash script (GitHub link) that queries all Hive databases, queries each table within them, parses the lastUpdateTime of those tables, and extracts the values to a CSV with the columns "tablename,lastUpdateTime".
This is slow, however, because in each iteration the call to "hive -e ..." starts a new Hive CLI process, which takes a noticeably long time to load.
Is there a way to speed up either the loading of the Hive CLI or the query itself in some other way that solves the same problem?
I have thought about loading the Hive CLI just once at the start of the script and calling the bash commands from within the CLI using the ! <command> method, but I am not sure how to do loops from within the CLI. Even if I could put the loops inside a bash script file and execute that, I am not sure how to pass the results of queries executed within the Hive CLI as arguments to that script.
Without going into the specifications of the system I am running it on, the script can process about ~10 tables per minute, which I think is really slow considering there can be thousands of tables in the databases we want to apply it to.
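For reference, an illustrative sketch of the per-table loop pattern described above (not the linked script itself); the use of describe formatted and the transient_lastDdlTime property is an assumption about how the last update time is read:

#!/usr/bin/env bash
# Illustrative sketch of the slow pattern: every "hive -e" call below pays the
# full CLI start-up cost, once per table.
set -euo pipefail

OUT_CSV="table_update_times.csv"
echo "tablename,lastUpdateTime" > "$OUT_CSV"

for db in $(hive -S -e "show databases;"); do
    for tbl in $(hive -S -e "use $db; show tables;"); do
        # One new Hive CLI process per table -- this is the slow part.
        ts=$(hive -S -e "describe formatted $db.$tbl;" \
             | grep -m1 "transient_lastDdlTime" | awk '{print $2}')
        echo "$db.$tbl,$ts" >> "$OUT_CSV"
    done
done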

SAS Macro code to Pig/Hive

I am working on converting SAS programs to Hadoop, i.e. Pig or Hive, and I am having trouble converting the macro code in SAS to something in Hive. Is there any equivalent, given that I have already read that Hive does not support stored procedures? I need to write a Hive script that has a macro-like function to declare variables and use them in the script.
I figured out a way to write the macro code as an if...else expression within Hive itself. Thanks, guys, for all the help! I know the question was not put together that well, but I will learn over time.
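The answer does not show the actual code; one common way to get macro-like variables plus if...else logic in Hive is hivevar substitution combined with an IF()/CASE expression, sketched here with made-up table and column names:

#!/usr/bin/env bash
# Hedged sketch (not the poster's actual code): emulating a SAS-macro-style
# parameter with a --hivevar substitution variable and conditional logic with
# Hive's IF() expression. Table and column names are made up.
set -euo pipefail

REGION="EAST"   # value that a SAS macro variable might have carried

hive --hivevar region="$REGION" -e "
  SELECT order_id,
         IF('\${hivevar:region}' = 'EAST', amount * 1.10, amount) AS adj_amount
  FROM   sales_db.orders
  WHERE  region = '\${hivevar:region}';
"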

Can we run queries from the Custom UDF in Hive?

Guys, I am a newbie to Hive and have some doubts about it.
Normally we write a custom UDF in Hive for a particular number of columns (consider the UDF to be in Java), meaning it performs some operation on those particular columns.
I am wondering whether we can write a UDF that takes a particular column as input to some query and then returns that query from the UDF, so that it is executed on the Hive CLI with the column as its input.
Can we do this? If yes, please advise.
Thanks, and sorry for my bad English.
This is not possible out of the box because, by the time the Hive query is running, a plan has already been built and is executing. What you suggest is to dynamically change that plan while it is running, which is hard not only because the plan is already built, but also because the Hadoop MapReduce jobs are already running.
What you can do is have your initial Hive query output new Hive queries to a file, then have some sort of bash/perl/python script that goes through that file, formulates the new Hive queries, and passes them to the CLI.
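A hedged sketch of that two-step pattern in bash; the table, column, and shape of the generated queries are made-up examples:

#!/usr/bin/env bash
# Step 1: one Hive query builds follow-up queries as plain text, one per line.
# Step 2: a shell loop feeds each generated query back into the CLI.
set -euo pipefail

GENERATED="generated_queries.hql"

hive -S -e "
  SELECT CONCAT('SELECT COUNT(*) FROM sales_db.', table_suffix, ';')
  FROM   sales_db.partition_registry;
" > "$GENERATED"

while IFS= read -r query; do
    [ -n "$query" ] && hive -S -e "$query"
done < "$GENERATED"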

Writing to oracle logfile from unix shell script?

I have an Oracle concurrent program that calls a UNIX shell script, which in turn executes a SQL*Loader program. This is used for loading flat files from a legacy system into Oracle base tables.
My question here is:
How do I capture my custom messages, validation error messages, etc. in the Oracle log file of the concurrent program?
All help in this regard is much appreciated.
It looks like you are trying to launch SQL*Loader from Oracle Apps. The simplest way would be to use the SQL*Loader executable type; this way you will get the output and log files right in the concurrent requests window.
If you want to write to the log file and the output file from a Unix script, you can find their paths in the FND_CONCURRENT_REQUESTS table (columns logfile_name and outfile_name). You should have the REQUEST_ID passed as a parameter to your script.
These files should be in $XX_TOP/log and should be called l{REQUEST_ID}.req and o{REQUEST_ID}.out (Apps 11.5.10).
How is your concurrent process defined? If it's using the "Host" execution method then the output should go into the concurrent log file. If it's being executed from a stored procedure, I'm not sure where it goes.
Have your script use sqlplus to sign in to Oracle and insert/update the information you need.
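Putting the last two suggestions together, a hedged sketch (the APPS credentials and the way REQUEST_ID is passed in are assumptions about your setup):

#!/usr/bin/env bash
# Hedged sketch: look up the concurrent request's log file via sqlplus and
# append custom messages to it. Credentials are placeholders.
set -euo pipefail

REQUEST_ID="$1"                      # assumed to be passed by the concurrent program
APPS_CONN="apps/apps_password"       # placeholder credentials

LOG_FILE=$(sqlplus -s "$APPS_CONN" <<EOF
set heading off feedback off pagesize 0
select logfile_name from fnd_concurrent_requests where request_id = $REQUEST_ID;
exit
EOF
)

echo "Custom message: SQL*Loader run started at $(date)" >> "$LOG_FILE"
# ... run sqlldr here, then append validation error messages the same way ...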
