Informatica PC restart workflow with different sql query - etl

I am using Informatica PowerCenter (PC).
I have a workflow whose session runs a SQL query like this:
select t1, t2, t3 from table where t1 between date '2020-01-01' and date '2020-01-31'
I need to download all data between 2020 and 2022, but I can't put the whole range in one query because I will get an ABORT SESSION from Teradata.
I want to write something that will restart the workflow with different dates automatically:
the first run should take 01.2020, the second 02.2020, the third 03.2020, and so on.
How can I solve this problem?

This is a long solution and can be achieved in two ways; using only a shell script gives you the most flexibility.
First of all, parameterize your mapping with two mapping parameters and use them in the SQL like below.
select t1, t2, t3 from table where t1 between date '$$START_DT' and date '$$END_DT'
The idea is to change their values at each run.
Using only a shell script - this is flexible because you can handle as many runs as you want. You call this shell script from a Command task.
Create a master file with entries like this:
2020-01-01,2020-01-31
2020-02-01,2020-02-29
2020-03-01,2020-03-31
Create three Informatica parameter files from the above entries. The first file (file1) should look like this:
[folder.workflow.session_name]
$$START_DT=2020-01-01
$$END_DT=2020-01-31
Use the first file (file1) in a pmcmd call to kick off the Informatica workflow. Please add -wait so pmcmd waits for the workflow to complete.
Loop over the above steps until all entries of the master file have been processed.
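The loop over the master file can be sketched like this - the folder, workflow, and connection names are made up, and the real pmcmd call is left commented out:

```shell
#!/bin/sh
# Master file: one "start,end" pair per line (copied from the answer above)
MASTER=master_dates.csv
PARAM=month_params.txt

cat > "$MASTER" <<'EOF'
2020-01-01,2020-01-31
2020-02-01,2020-02-29
2020-03-01,2020-03-31
EOF

while IFS=, read -r start_dt end_dt; do
    # Rewrite the same parameter file for this run
    cat > "$PARAM" <<EOF
[MyFolder.wf_monthly_load.s_m_monthly_load]
\$\$START_DT=$start_dt
\$\$END_DT=$end_dt
EOF
    # Kick off the workflow and wait for it (hypothetical connection details):
    # pmcmd startworkflow -sv IntSvc -d Domain -u user -p pass \
    #       -f MyFolder -paramfile "$PARAM" -wait wf_monthly_load
done < "$MASTER"
```

Each iteration overwrites the same parameter file, so -wait is essential: without it pmcmd would return immediately and the next iteration would change the dates while the workflow was still running.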
Using Informatica only - this method is not as flexible as the one above and applies only to your question.
Create a shell script that generates three parameter files from the master file above.
Create three sessions or three worklets that use those three parameter files. Be careful to pair the correct parameter file with the correct session.
You can link the sessions/worklets one after another or run them in parallel.

Related

Need to load the excel data to database table on weekly basis automatically

create table LOADER_TAB (
  i_id   NUMBER,
  i_name VARCHAR2(30),
  risk   VARCHAR2(30)
);
CSV file:
portal,,
ex portal,,
,,
i_id,i_name,risk
1,a,aa
2,b,bb
3,c,cc
4,d,dd
5,e,ee
6,f,ff
7,g,gg
8,h,hh
9,i,ii
10,j,jj
I need to load Excel data into a database table on a weekly basis. Currently I am doing it with SQL*Loader every time, but is there another way to automate it? I need to load the data into the table every Monday. I am wondering whether this can be done with dbms_scheduler. How can I load the data into the table automatically when I drop the Excel file into the folder each week?
If you want to keep using SQL*Loader, then you probably wouldn't use DBMS_SCHEDULER (although it is capable of running operating-system-level scripts) - you'd use an operating-system scheduling application (such as Task Scheduler on MS Windows).
Therefore (if on Windows):
Create a batch script (.bat) which would ...
... run sqlldr.exe which ...
... uses your current control file and loads data into the table.
Using Task Scheduler, schedule the .bat script to run on Mondays at desired time
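Given the CSV shown in the question - three junk lines plus a header row before the data - the control file could look roughly like this; file names and credentials below are assumptions, and SKIP=4 jumps over those first four lines:

```
OPTIONS (SKIP=4)
LOAD DATA
INFILE 'weekly_load.csv'
APPEND
INTO TABLE LOADER_TAB
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(i_id, i_name, risk)
```

The scheduled .bat then only needs a single line such as "sqlldr scott/tiger@orcl control=loader_tab.ctl"; use TRUNCATE instead of APPEND if each weekly file should replace the previous load.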

How to speed up this query to retrieve lastUpdateTime of all hive tables?

I have created a bash script (GitHub Link) to query for all hive databases; query each table within them and parse the lastUpdateTime of those tables and extract them to a csv with columns "tablename,lastUpdateTime".
This query is slow, however, because in each iteration the call to "hive -e ..." starts a new hive CLI, which takes a significant amount of time to load.
Is there a way to speed up either loading up the hive cli or speed up the query in some other way to solve the same problem?
I have thought about starting the hive CLI just once at the beginning of the script and calling bash commands from within it using the ! <command> syntax, but I am not sure how to write loops inside the CLI. If I instead keep the loop in a bash script file and execute that, I am not sure how to pass the results of queries executed within the hive CLI to that script as arguments.
Without going into the specifics of the system I am running it on: the script processes about ~10 tables per minute, which I think is really slow considering there can be thousands of tables in the databases we want to apply it to.
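The single-CLI-start idea can be sketched as follows: collect every per-table statement into one .hql file first, then pay the CLI start-up cost only once. The database and table names here are placeholders - the real script would get them from "show databases" / "show tables" - and the hive call itself is commented out:

```shell
#!/bin/sh
QUERIES=all_queries.hql
: > "$QUERIES"            # truncate/create the query file

# Placeholder lists; in the real script these come from hive itself
for db in db1 db2; do
    for t in tab1 tab2; do
        echo "show table extended in $db like '$t';" >> "$QUERIES"
    done
done

# One CLI start-up for all tables instead of one per table:
# hive -f "$QUERIES" > raw_output.txt
# ...then parse lastUpdateTime out of raw_output.txt as before
```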

sqoop oozie write query result to a file

I have a current Oozie job that queries an Oracle table and writes - overwrites - the result into a Hive table.
Now I need to prevent overwriting the Hive table and preserve the existing data in it.
For this I wanted to plan such steps:
1st step: Get record count running a "select count(*) from..." query and write it on a file.
2nd step: Check the count written in file.
3rd step: decision step whether or not 4th step will be applied.
4th step: Run the main query and overwrite the hive table.
My problem is I couldn't find anything in the documentation or in examples about writing a query result to a file (I know import and export are the aim of Sqoop).
Does anyone know how to write the query result to a file?
In theory:
build a Pig job to run the "count(*)" and dump the result to StdOut
as if it was a Java property e.g. my.count=12345
in Oozie, define a Pig action, with the <capture-output/> flag, to run that job
then define a Decision based on the value for key my.count using
the appropriate EL function
In practice, well, have fun!
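For what it's worth, Oozie documents <capture-output/> for the shell, ssh, and java actions rather than the pig action, so the practical route may be a shell action that runs the count and echoes it as a property. A rough workflow-XML fragment - every name here, including get_count.sh and the threshold, is an assumption:

```xml
<action name="count-check">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>get_count.sh</exec>
        <file>get_count.sh</file>
        <capture-output/>
    </shell>
    <ok to="count-decision"/>
    <error to="fail"/>
</action>
<decision name="count-decision">
    <switch>
        <!-- get_count.sh must echo a line like: my.count=12345 -->
        <case to="run-main-query">${wf:actionData('count-check')['my.count'] eq '0'}</case>
        <default to="end"/>
    </switch>
</decision>
```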

Writing autosys job information to Oracle DB

Here's my situation: we have no access to the autosys server other than using the autorep command. We need to keep detailed statistics on each of our jobs. I have written some Oracle database tables that will store start/end times, exit codes, JIL, etc.
What I need to know is: what is the easiest way to output the data we require (which is all available in the AutoSys tables that we do not have access to) to an Oracle database?
Here are the technical details of our system:
autosys version - I cannot figure out how to get this information
Oracle version - 11g
We have two separate environments - one for UAT/QA/IT and several PROD servers
Do something like the below:
Create a table with the columns you want to capture. Add a key column that is auto-generated. The JIL column should be able to hold large values. Also add a column that defaults to sysdate.
Create a shell script. Inside it do as follows
"autorep -j -l0" to get all the jobs you want and put them in a file. -l0 is to ignore duplicate jobs. If a Box contain a job, then without -l0 you will get the job twice.
create a loop and read all the job names one by one.
In the loop, set varaibles for jobname/starttime/endtime/status (which all you can get from autorep -j . Then use a variable to hold jil by autorep -q -j
Append all these variable values in a flat file.
End the loop. After exiting a loop you wil end up with a file with all the job details.
Then use SQL loader to put the data in your oracle table. You can hardcode a control file and use it for every run. But the content of data file will change for every run.
Let me know if any part is not clear.
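The loop above might look like this in shell. All names and the parsed field values are placeholders, and the autorep calls are commented out because they only work where the AutoSys client is installed:

```shell
#!/bin/sh
JOBLIST=jobs.txt
DATAFILE=job_stats.dat
: > "$DATAFILE"

# Real call: autorep -j 'MY_BOX%' -l0 > "$JOBLIST"
printf '%s\n' JOB_A JOB_B > "$JOBLIST"    # stand-in sample data

while read -r job; do
    # Real script: parse these fields out of "autorep -j $job"
    starttime="2020-01-01 00:00"
    endtime="2020-01-01 00:05"
    status="SU"
    # Real call: jil=$(autorep -q -j "$job")
    jil="insert_job: $job"
    # Pipe-delimited so the JIL text can safely contain commas
    printf '%s|%s|%s|%s|%s\n' "$job" "$starttime" "$endtime" "$status" "$jil" >> "$DATAFILE"
done < "$JOBLIST"

# Final step: sqlldr control=job_stats.ctl data="$DATAFILE" ...
```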

Job action string too long

I'm trying to create a job that will sync two databases at midnight. There are 10 tables that need to be synced, and it's a very long PL/SQL script. When I set this script as the JOB_ACTION and try to create the job, I get "string value too long for attribute job_action". What do you suggest I do? Should I separate the script into 10? Isn't there a way to make the job run the code as a script? If I do it manually, all 10 anonymous blocks get executed one after another. I need something that will kind of press F5 for me at midnight.
What you need is a DBMS_Scheduler chain, in which each action is a separate step and they can be executed at the same time.
http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_sched.htm
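A chain along these lines might look as follows, assuming the ten anonymous blocks are first wrapped in stored procedures (every name below is made up):

```sql
BEGIN
  -- One program per table sync; repeat for prog_sync_tab2 .. prog_sync_tab10
  DBMS_SCHEDULER.create_program(
    program_name   => 'prog_sync_tab1',
    program_type   => 'STORED_PROCEDURE',
    program_action => 'sync_pkg.sync_tab1',
    enabled        => TRUE);

  DBMS_SCHEDULER.create_chain(chain_name => 'sync_chain');
  DBMS_SCHEDULER.define_chain_step('sync_chain', 'step1', 'prog_sync_tab1');
  DBMS_SCHEDULER.define_chain_step('sync_chain', 'step2', 'prog_sync_tab2');
  -- ... steps 3 to 10 ...

  -- Start both steps at once; make step2's rule 'step1 COMPLETED' instead
  -- if the tables must load one after another
  DBMS_SCHEDULER.define_chain_rule('sync_chain', 'TRUE', 'START step1, step2');
  DBMS_SCHEDULER.define_chain_rule('sync_chain',
      'step1 COMPLETED AND step2 COMPLETED', 'END');
  DBMS_SCHEDULER.enable('sync_chain');

  -- The job action is now just the chain name, well under the length limit
  DBMS_SCHEDULER.create_job(
    job_name        => 'midnight_sync',
    job_type        => 'CHAIN',
    job_action      => 'sync_chain',
    repeat_interval => 'FREQ=DAILY;BYHOUR=0;BYMINUTE=0',
    enabled         => TRUE);
END;
/
```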
