AS400 to Oracle 10g via XML with Informatica PowerCenter

Is the following workflow possible with Informatica Powercenter?
AS400 -> XML (in memory) -> Oracle 10g stored procedure (pass XML as a parameter)
Specifically, I need to take a result set of, say, 100 rows, convert those rows into a single XML document held as a string in memory, and then pass that string as a parameter to an Oracle stored procedure that is called only once. My understanding was that a workflow runs row by row and that this kind of 'batching' is not possible.

Yes, this scenario should be possible.
You can connect to AS/400 sources with native Informatica connector(s), although this might require (expensive) licenses. Another option is to extract the data from the AS/400 source into a text file and use that as a normal file source.
To convert multiple rows into one row, you would use an Aggregator transformation. You may need to create a dummy column (with the same value for all rows) using an Expression transformation and use that column as the grouping key of the Aggregator, to squeeze the input into one single row. Row values would be concatenated together (separated by some special character), and then you would use another Expression to split and parse the data into as many ports (fields) as you need.
Next, with an XML Generator transformation you can create the XML. This transformation can have multiple input ports (fields) and its result will be directed into a single output port.
Finally, you would load the generated XML value into your Oracle target, possibly using a Stored Procedure transformation.
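For illustration only, the Oracle side of that last step might look roughly like the sketch below. The procedure name load_xml_batch and the staging table xml_staging (with an XMLTYPE column named doc) are assumptions, not part of the original answer; taking the parameter as a CLOB avoids the 4000-byte VARCHAR2 limit when the generated XML is large.
CREATE OR REPLACE PROCEDURE load_xml_batch (p_xml IN CLOB) AS
  v_doc XMLTYPE;
BEGIN
  -- Parse the incoming XML string once
  v_doc := XMLTYPE(p_xml);
  -- xml_staging is an assumed staging table with an XMLTYPE column "doc"
  INSERT INTO xml_staging (doc) VALUES (v_doc);
  COMMIT;
END load_xml_batch;
/
The Stored Procedure transformation would then simply map the single output port of the XML Generator to the procedure's parameter.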

Related

Input files with different columns to be loaded in single Nifi flow

I have some input files with different column names. Can I create a common NiFi flow which processes all the types of files? Each file has different columns and a different output table to be loaded. For example, File 1 will have columns A and B, to be loaded into table AB; File 2 will have columns C, D, and E, to be loaded into table CDE. Can I achieve this in a single flow, or should I create different flows for different types of files? I am new to NiFi; please suggest.
You should be able to do this with a single flow, perhaps with RouteOnContent to look for the header so you know which type of file it is. Each outgoing connection would correspond to a different type of file / output table, so you could have an UpdateAttribute on the other end of each outgoing connection to set attributes such as the table name and possibly the record schema (if using record-based processors, which I recommend). Then you can use a funnel to merge the sub-flows, or just connect all the outgoing connections to whatever the next downstream processor is (PutDatabaseRecord, for example).
If you don't want to split the flow at all, you'd probably need to do the same work of identifying the file type and setting the attributes, but from a single script (using ExecuteScript, for example). In either case, the downstream processors should be able to make use of the attributes via NiFi Expression Language, so that the same processor can handle the different file types appropriately.

Read data from multiple tables at a time and combine the data based on a where clause using NiFi

I have a scenario where I need to extract data from multiple database tables (including the schema), combine them, and then write the combined data to an Excel file.
In NiFi the general strategy is to read from something like a fact table with ExecuteSQL or another SQL processor, then use LookupRecord to enrich the data from a lookup table. The catch is that each LookupRecord can only consult one lookup at a time, so you'd need one LookupRecord per enrichment table. You could then write to a CSV file that you can open in Excel. There might be extensions elsewhere that can write directly to Excel, but I'm not aware of any in the standard NiFi distribution.

Change the datasource in the same ETL transformation

I have a transformation to extract data from a database, but the source database has many tables with different names. The tables do have a structure consistent with my transformation, e.g. Events_1, Events_2, Events_3.
Is it possible to change the connection parameters dynamically so that all tables are extracted? I want to extract all the data with just one job, and it should still work when there is a new insert or a new table like Events_600.
You can use variables in the transformation to set the connection and even to change the source table.
You will then need a job that runs through the list of variable values and, once for each row, passes those values as parameters to the transformation.

Can I keep data of different file formats in same hive table?

I am receiving data in formats like CSV, XML, and JSON, and I want to keep all the files in the same Hive table. Is that achievable?
Hive expects all the files for one table to use the same delimiter, the same compression, and so on, so you cannot put a single Hive table on top of files with multiple formats.
The solution you may want to use is:
1) Create a separate table (JSON/XML/CSV) for each of the file formats.
2) Create a view as the UNION of the three tables created above.
This way the consumer of the data has to query only one view/object, if that's what you are looking for.
Yes, you can achieve this through a combination of different external tables.
Because different SerDes, each specifying how to read the columns in its file format, will be needed, you will need to create one external table per type of file (and table). The data from each of these external tables can then be combined into a view with UNION, as suggested by Ramesh. The view could then be used for reading, and you could, for example, insert the data into a managed table.
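A rough HiveQL sketch of this approach follows; the table names, columns, HDFS locations, and the HCatalog JSON SerDe class are assumptions for illustration and will differ in your environment:
-- One external table per file format, each with its own SerDe (paths are assumed)
CREATE EXTERNAL TABLE events_csv (id INT, payload STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/data/events/csv';
CREATE EXTERNAL TABLE events_json (id INT, payload STRING)
  ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
  LOCATION '/data/events/json';
-- Single view for consumers to query; both tables must expose the same columns
CREATE VIEW events_all AS
SELECT id, payload FROM (
  SELECT id, payload FROM events_csv
  UNION ALL
  SELECT id, payload FROM events_json
) u;
Consumers then query only events_all, and you could INSERT its contents into a managed table if you want a single physical copy.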

How can a Microsoft Word binary be stored in Hive?

Question from a relative Hadoop/Hive newbie: How can I pass the contents of a Microsoft Word (binary) document as a parameter to a Hive function?
My goal is to be able to provide the full contents of a binary file (a Microsoft Word document in my particular use case) as a binary parameter to a UDTF. My initial approach has been to slurp the file's contents into a staging table and then provide it to the UDTF in a query later on, and this was how I attempted to build that staging table:
create table worddoc(content BINARY);
load data inpath '/path/to/wordfile' into table worddoc;
Unfortunately, there seem to be newlines in the Word document (or something acting enough like newlines) that result in the staging table having many rows instead of a single comprehensive blob, the latter of which is what I was hoping for. Is there some way of ensuring that the ingest doesn't get exploded into multiple rows? I've seen similar questions here on SO regarding other binary data like image files, which is why I'm guessing it's the newlines that are tripping me up.
Failing all that, is there a way to skip storing the file's contents in an intermediary Hive table and just provide the content directly to the UDTF at invocation time? Nothing obvious jumped out during my search through Hive's built-in functions, but maybe I am missing something.
Version-wise, the environment is Hive 0.13.1 and Hadoop 1.2.1 (although upgrades to both are pending).
This is a hacky workaround, but what I ended up doing is this:
1) base64 encode the binary document and put the encoded file into HDFS
2) In Hive:
-- Staging table holds the base64-encoded document as a single STRING row
CREATE TABLE staging_table (content STRING);
LOAD DATA INPATH '/path/to/base64_encoded_file' INTO TABLE staging_table;
-- Decode back to bytes into a BINARY column (Hive 0.13 requires the TABLE keyword here)
CREATE TABLE target_table (content BINARY);
INSERT INTO TABLE target_table SELECT unbase64(content) FROM staging_table;
Theoretically this should work for any arbitrary binary file that you'd want to squish into Hive this way. A gotcha to watch out for is making sure your base64 implementation produces a single-line file (my OS X base64 utility produces one-line output, while the base64 utility in a CentOS 6 VM I was using produced hundreds of lines); if it doesn't, you can manually glue the lines together before putting the file into HDFS.
