Invoking a select script through ODI (Oracle Data Integrator) - etl

May I have your opinion on the queries below, please:
Option 1:
I have a select script handy which fetches data by joining many source tables and performs transformations such as aggregations (group by), data conversion, sub-string, etc.
Can I invoke this script through an ODI mapping so that the returned results (the transformed data output) are inserted into the target of the ODI mapping?
Option 2:
Convert the select script into an equivalent ODI mapping by using the equivalent ODI transformations, functions, lookups, etc., and use the various tables (the tables in the join clause) as sources of the mapping.
Basically, develop an ODI mapping that is equivalent to the provided select script, plus a target table to insert the records into.
I need to know the pros and cons of both options above (if option 1 is possible).
Is it still possible to track transformation errors, source table join errors, where-clause condition errors, etc. through ODI with option 1?
Will the log file for a mapping failure have details as granular as those offered by option 2?
Can I still enable Flow Control in the Knowledge Module and redirect select script errors into the E$_ error tables provided by ODI?
Thanks,
Rajneesh

Option 1: ODI 12c includes that concept out of the box. On the physical tab of a mapping, click on the source node (datastore). Then, in the properties pane, there is the CUSTOM_TEMPLATE option under the "Extract Options" menu. This allows you to enter a custom SQL statement that will be used instead of the code generated by ODI.
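For illustration, the custom template could hold a query along these lines (the table and column names here are made up; the statement only needs to return the columns the mapping expects from that source):
SELECT c.customer_id,
       SUBSTR(c.customer_name, 1, 30) AS customer_name,
       SUM(o.order_amount) AS total_amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2020-01-01'
GROUP BY c.customer_id, SUBSTR(c.customer_name, 1, 30);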
However, it is probably less maintainable over time than option 2. SQL is less visual than mapping components. Also, if you need to bulk change it, it will be trickier: changing a component in several mappings can be done with the SDK, whereas changing SQL code would require parsing it. You might indeed have less information in your operator logs, as the SQL would be seen as just one block of code. It also wouldn't provide any lineage.
I believe using Flow Control would work but I haven't tested it.
Option 2 would take more time to complete, but with it you would benefit from all the functionalities of ODI.
My own preference would be to occasionally use option 1 for really complex SQL queries but to use option 2 for most of the normal use cases.

Related

How much of Talend's functionality is translated into SQL queries and how much into Java?

I am starting an internship and have been asked to learn how to use Talend ETL.
I did it; it's not so difficult.
One of the extra tasks assigned to me is to verify how much of the operations I set up in the design workspace are executed in Java and how much is done through queries.
I set up a simple join using the tMap component and monitored the SQL database with SQL Profiler. The result is that only the essential create/drop and the select/insert on the tables are done via SQL, while everything else, such as the actual join, is done on the "Java" side.
For a simple operation like a join, wouldn't it be more convenient to execute it through a query without having to bother Java with it?
For those who also know SAP: in terms of performance, is there much difference between Talend and SAP?
Only operations in tDB components (create, select, insert, etc.) are actually done through SQL. All operations done in other Talend components (tMap, tFilter, aggregate, etc.) are done through Java.
Indeed, you'll get better performance doing operations SQL-side. You then have to find the right balance between an "all-in-SQL" type of job and an "all-Java" one (it could be harder for a Talend developer to debug operations if all the SQL is done through a single query inside a single component...).
You could definitely have your joins inside a tDBInput component, and output the result in a single output flow.
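As a rough sketch (the table names are only examples), the query inside a tDBInput component could push the join and aggregation down to the database, so that only the final rows flow through the Java side:
SELECT c.customer_id,
       c.customer_name,
       COUNT(o.order_id) AS nb_orders
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name;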
You can also check the ELT* components: they let you use the SQL engine instead of the Java engine to perform all operations (join, aggregate, filter) while still using the Talend interface.

How to convert an ODI interface into an SQL query?

I am working on an ODI 10 project which has 153 interfaces divided into a few packages. What I want to do is create a PL/SQL procedure with INSERT statements instead of having 153 interfaces. These interfaces are more or less similar, i.e. they have the same source table and the same target (in my case the target is an Essbase Hyperion cube); only the transformations and filters are different. So any time I have to update something like a column value, I have to open 153 interfaces and update each and every one of them. In a procedure, I could do this easily; I could just replace all the values.
So I feel it's best that I create a PL/SQL procedure, as I can maintain the code better that way.
Is there a way to convert the interfaces into SQL queries? I want a direct data dump; I don't want to do a complex incremental load. I am just looking to truncate the table and load the data.
Thanks in advance.
It is possible to get the SQL code generated by ODI from the Operator in the log tables. It can also be retrieved in the repository.
Here is an example of a query for ODI 12c (10g being out of support for a long time now):
SELECT s.sess_no, s.nno, step_name, scen_task_no, def_txt
FROM SNP_SESS_STEP s, SNP_SESS_TASK_LOG t
WHERE s.sess_no = t.sess_no
AND s.nno = t.nno;
Starting with ODI 11g, it is also possible to simulate the execution instead of doing an actual execution. This functionality will just display the code generated in ODI Studio instead of running it.
Finally, upgrading to a more recent version of ODI would allow you to use the ODI SDK. With it you could programmatically make changes to all the mappings in one go. Reusable mappings could also help, as it sounds like some logic is implemented multiple times. That would make these kinds of changes easier while keeping the benefits of an ELT tool (scheduling, monitoring, visual representation of flows, cross-technology support, ...).
Disclaimer: I'm an Oracle employee

Update functionality to database from reporting tools

Is there any way we can update the values in a BIRT report which, in turn, will update the database? We need to present a report generated from Microsoft SQL Server to the client. We tried providing the report in Excel; however, our client changes the format and it is difficult to consume it again in our proprietary tool
(which is Microsoft SQL based). Is there any way we can achieve this? The client should update the values in the report and the changes should get reflected in the DB.
While it's possible to write back to the DB from BIRT using a servlet (see the Eclipse Community Forum), I don't know of a way for BIRT to track the changed values.
While it's difficult to compare Excel files, it should be simpler to create CSV files from those Excel files and compare the CSV files, independent of Excel formatting changes.
I see the gathering of value changes and writing them back to the DB as a separate workflow, independent of the reporting.
Reporting tools are made for generating output only.
A general automatic mechanism is impossible, if you think about it from a more abstract point of view:
There's data D in the database (usually spread across several tables T1, ..., Tn, and records R1, ..., Rm).
The report output data O = (o1, o2, ...) is the result of a more or less complex (the opposite of trivial) function f(R1, ..., Rm).
An automatic back-propagation of any kind, like the one you dream of, would have to know what changing the value of o1 from "spam" to "eggs" means for R1, ..., Rm.
... Or even for records which were not selected by f, for example if the user changed the value of a primary key column.
This is only possible if the function f is bijective, but usually f isn't. For example, if o1 is a SUM over several records, many different sets of records produce the same total, so the change cannot be mapped back unambiguously. Even if f is bijective, inverting a non-trivial function is very hard.
Thus, if you want to let the user change values and persist the changes inside the DB, you need some kind of database UI or some kind of import interface.
Depending on your database, it might be as trivial as letting the user work with Oracle SQL Developer or similar tools which support importing data from Excel sheets.
However, these tools are intended for SQL developers, as the name implies.
OTOH, if all you want is to perform DML statements in BIRT, this is possible indirectly: You can write stored procedures in the database doing the DML work, and call these procedures from BIRT (use a JDBC Stored Procedure Query instead of JDBC SQL Select Query).
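As a minimal sketch (assuming SQL Server, with made-up table, column, and procedure names), the procedure encapsulates the DML and BIRT simply invokes it through the JDBC call syntax:
CREATE PROCEDURE dbo.update_report_value
    @row_id INT,
    @new_value DECIMAL(18,2)
AS
BEGIN
    UPDATE dbo.report_data
    SET amount = @new_value
    WHERE id = @row_id;
END
-- called from the BIRT data set as: {call dbo.update_report_value(?, ?)}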

hive enable describe statement on restricted tables

When using a big-data tool like Hive, a select * from mytable usually works.
However, for a user who is only allowed to view specific columns, the DESCRIBE TABLE statement, and with it the integration of tools like Tableau, is broken, as these can no longer view all (or all allowed) columns: DESCRIBE TABLE is no longer possible, i.e. it is denied in Ranger.
Is there a workaround to re-enable the DESCRIBE statement?
Currently, the only workaround I see is manually creating a masked view.
It turns out that masking the tables in Ranger is sufficient to restrict access while retaining the ability to execute the DESCRIBE statement.
Currently, this is a bit tedious, as all columns need to be specified manually; Atlas and tag-based policies would probably be a more efficient solution.
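For reference, the manual masked-view workaround mentioned above would look roughly like this (the database, table, and column names are placeholders); users are then granted access to the view and can run DESCRIBE against it:
CREATE VIEW mydb.mytable_restricted AS
SELECT allowed_col1,
       allowed_col2
FROM mydb.mytable;

DESCRIBE mydb.mytable_restricted;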

IBM Data Stage - How to find database tables used in jobs

For a project, we need to investigate an existing installation of IBM Data Stage, which does a whole lot of ETL across loads of jobs.
The job flow diagrams contain lots of tables being used as sources (both in MSSQL and in Oracle), as well as targets (mostly in Oracle).
My question now is:
How can I find all database tables used by all jobs in a certain Data Stage project?
I looked in Tools - Advanced Find, and there I can see all "table definitions". BUT most of the tables actually used in jobs do not show up there, as they are used in what Data Stage calls "Parallel Jobs", which in effect run SQL queries against database tables.
I am particularly interested in locating TARGET tables which are being loaded by a job.
So to put it bluntly, I want to be able to answer the question "Which job loads table XY ?".
If that is not possible, an automated means of extracting all the SQL statements used by the jobs would be an alternative.
We have access to IBM Websphere Data Stage and Quality Stage Designer 8.1
Exporting the jobs creates a text file that details what each job does. Open the export file in a text editor and you should be able to find the SQL inserts with a simple search. Start by searching for SQL keywords like 'INTO' and 'FROM'.
Edit: Alternatively, if every table that was used was defined by importing table definitions, you should be able to find the table definition in the folder for its type. This, however, will not make it apparent where and how the table was used (which job, and whether it is inserted into or selected from), so I would recommend the first method of searching the export files.
