I am trying to map an Oracle DB to an XML file and have hit a blocker. Would appreciate any help. My XML file has the following structure:
<Root>
  <Import>
    <Add-Item1></Add-Item1>
    ...
    <Add-ItemN></Add-ItemN>
  </Import>
</Root>
The ODI 12c XML driver generates a ParentElementFK, CurrentElementPK and CurrentElementOrder column for every tag in the XML.
My issue is that, despite scouring the Oracle forums, I have not found a good definition of what data needs to go into these ODI-generated columns. Are they only for maintaining the hierarchical relationship? If so, wouldn't they be populated automatically on reverse engineering? Suppose the data I fill into this XML structure is an item with the properties brand, description and item id (child tags under the Add-Item element). Do these generated columns play any role in the mapping?
I tried multiple things and found the answer myself. Here is what I understood. Suppose you have an Import complex type and an Add-Item complex type. The driver generates two datastores in the model, one for Import and one for Add-Item. First populate the primary key of the Import datastore. You will then see an Import FK column in the Add-Item datastore; populate it with the same value you gave the Import primary key, and it works. The order columns can be left empty if you don't need the elements in any particular order.
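Expressed as plain SQL rather than an ODI mapping, a minimal sketch of the idea, assuming the reverse-engineered datastores are named IMPORT and ADD_ITEM and the generated columns are IMPORTPK, IMPORTORDER, ADD_ITEMPK and IMPORTFK (check your own model for the exact names):

-- Parent element: give the Import row a primary key value; the order column can stay empty.
INSERT INTO IMPORT (IMPORTPK, IMPORTORDER) VALUES (1, NULL);

-- Child element: point the generated FK column at the parent's PK value,
-- then fill the business columns (brand, description, item id from the question).
INSERT INTO ADD_ITEM (ADD_ITEMPK, IMPORTFK, BRAND, DESCRIPTION, ITEM_ID)
VALUES (1, 1, 'BrandX', 'Sample item', 'ITEM-001');

The same principle applies when the columns are populated from a mapping: map the child datastore's FK column to whatever value you used for the parent's PK.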
I need to migrate data from a DB2 table to an MSSQL table, but one column has a different name (the data type is the same).
DB2 table:
NROCTA,NUMRUT,DIASMORA2
MSSQL table:
NROCTA,NUMRUT,DIAMORAS
As you can see, DIAMORAS is different.
I'm using the following flow:
ExecuteSQL -> SplitAvro -> PutDatabaseRecord
In PutDatabaseRecord I have an AvroReader as the Record Reader, configured in this way:
Schema Access Strategy: Use Embedded Avro Schema.
Schema Text: ${avro.schema}
The flow only inserts the first two columns. How can I map the DIASMORA2 column to the DIASMORAS column?
Thanks in advance!
First thing, you probably don't need SplitAvro in your flow at all, unless there's some logical subset of rows that you are trying to send as individual transactions.
For the column name change, use UpdateRecord and set the field /DIASMORAS to the record path /DIASMORA2 (that is, add a property named /DIASMORAS with the value /DIASMORA2 and set the Replacement Value Strategy to Record Path Value), and change the name of the field in the AvroRecordSetWriter's schema from DIASMORA2 to DIASMORAS.
That last part is a little trickier since you are using the embedded schema in your AvroReader. If the schema will always be the same, you can stop the UpdateRecord processor and put in an ExtractAvroMetadata processor to extract the avro.schema attribute. That will put the embedded schema in the flowfile's avro.schema attribute.
Then, before you start UpdateRecord, start the ExecuteSQL and ExtractAvroMetadata processors and inspect a flow file in the queue to copy the schema out of the avro.schema attribute. In your AvroRecordSetWriter in ConvertRecord, instead of inheriting the schema, choose Use Schema Text and paste in the schema from the attribute, changing DIASMORA2 to DIASMORAS. This approach puts values from the DIASMORA2 field into the DIASMORAS field, but since DIASMORA2 is not in the output schema, it is ignored, thereby effectively renaming the field (although under the hood it is a copy-and-remove).
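For reference, the pasted schema text might end up looking roughly like the sketch below, with only the last field renamed from DIASMORA2 to DIASMORAS. The record name, namespace and field types here are placeholders, so copy them from your actual avro.schema attribute rather than from this example:

{
  "type" : "record",
  "name" : "ExampleRecord",
  "namespace" : "example.namespace",
  "fields" : [
    { "name" : "NROCTA",    "type" : [ "null", "string" ] },
    { "name" : "NUMRUT",    "type" : [ "null", "string" ] },
    { "name" : "DIASMORAS", "type" : [ "null", "int" ] }
  ]
}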
I have a metadata name CONTACTS(SOURCE.CSV|TARGET.CSV). I read this file using a reader and populate the values into a table I created as CONTACT_TABLE(PK NUMBER, SOURCE_NAME VARCHAR2(500), TARGET_NAME VARCHAR2(500)). After that I want to read the source.csv and target.csv files referenced in CONTACT_TABLE and populate the values into another table called SOURCE_COLUMN_TARGET_COLUMN_TABLE(PK, FK referencing the PK of CONTACT_TABLE, SOURCE_COLUMN, TARGET_COLUMN). This table should contain all the columns of source and target, with a one-to-one relationship between them, for example source.csv(fn) ----- target.csv(firstName).
My objective is that whenever we add another attribute to the source or target, I should not have to change the entire mapping. For example, if we add source.csv(email) and target.csv(email), it should map directly.
Thanks!
Please help!
I have to complete this task before Friday. I have searched every source I could find about dynamic mapping and parameters, but it was not very helpful; I want to do it this way.
It's not clear what you are asking, actually. The Source Analyzer uses the source files (.csv) at import time, and the Source Qualifier therefore contains the same format.
So, if any values get added to your existing files (source.csv, target.csv), it becomes a new file for your existing mapping. Hence, you don't need to change the whole mapping; you just need to import it again.
Usually, I have been using the following calculated column when importing the data from an Excel file:
(Sum([Units]) - Sum([Units]) OVER (PreviousPeriod([Axis.Columns]))) / Sum([Units]) OVER (PreviousPeriod([Axis.Columns])) * 100 as [% Difference]
In this scenario, however, the data is coming directly from an Oracle database.
When I try to create the calculated column, I get the error message:
"Could not find function: 'PreviousPeriod' "
I have done some research and found that I should be using the THEN keyword, but I have the same problem when I try to insert it after the aggregated expression.
You need to import that data via an information link or embed the data in your analysis in order to use the majority of the functions in Spotfire. If you must keep your data external, that is, not connected via an information link or embedded, you will not be able to use all the functions within Spotfire.
I'm trying to build a star schema in Oracle 12c. In my case the data source is not a relational database but a single Excel/CSV file which is populated via a Google Form, which means I don't have any sort of reference from a source system such as auto-incremented keys/IDs. What would be the best approach to build a star schema given this condition?
File row sample:
<submitted timestamp>,<submitted by user>,<region>,<country>,<branch>,<branch location>,<branch area>,<branch type>,<branch name>,<branch private? yes/no value>,<the following would be all "fact" values (measurements),...,...,...
In case I wanted to build a "branch" dimension, how would I handle updates/inserts after the first load into the dimension table?
Thoughts on a solution so far:
I had thought of making a concatenated string "key" from the branch values, which would make it unique (an underscore would be the "glue" to concatenate the values), e.g.:
<region>_<country>_<branch>_<branch location> as branch_key
I would insert all the distinct branches into a staging table, including the branch_key column for each one of them. Then, when loading into the dimension, I could compare which keys do not yet exist in my dimension table and insert them. As for updates, I'm a bit stuck on how to handle those; I had thought of having another file mapping which branches are active, with an expiration date column. Basically I am trying to simulate what I could do if the data were in a database instead of CSV files.
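A minimal SQL sketch of that staging-to-dimension step, assuming hypothetical tables STG_BRANCH (loaded from the CSV, for example through an external table or SQL*Loader) and DIM_BRANCH; table and column names are illustrative:

-- DIM_BRANCH can use a 12c identity column for its surrogate key, so no key from the source is needed.
INSERT INTO dim_branch (branch_key, region, country, branch, branch_location)
SELECT DISTINCT
       s.region || '_' || s.country || '_' || s.branch || '_' || s.branch_location,
       s.region, s.country, s.branch, s.branch_location
FROM   stg_branch s
WHERE  NOT EXISTS (
         SELECT 1
         FROM   dim_branch d
         WHERE  d.branch_key = s.region || '_' || s.country || '_' || s.branch || '_' || s.branch_location
       );

For the update/expiration part, the usual pattern is a slowly changing dimension: compare the staged attributes against the current dimension row for the same branch_key, then either update in place (type 1) or close the old row with an end date and insert a new one (type 2).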
This is all I can think of so far. Do you have any other recommendations/ideas on how to implement this? Take into consideration that the data source cannot change, as in I have to read these CSV files, since the data is not stored anywhere else.
Thank you.
What's the difference between ResourceSchema and Schema in Pig?
There is already a Schema class provided, so why does Pig bother to add another Schema-like class called ResourceSchema (it is almost like the Schema API: it needs to set its ResourceFieldSchema's name and type, and it can also have a child ResourceSchema) for storage functions?
The API docs back up @zsxwing's comment:
Schema - The Schema class encapsulates the notion of a schema for a relational operator. A schema is a list of columns that describe the output of a relational operator.
Each column in the relation is represented as a FieldSchema, a static class inside the Schema. A column by definition has an alias, a type and a possible schema (if the column is a bag or a tuple).
In addition, each column in the schema has a unique auto generated name used for tracking the lineage of the column in a sequence of statements. The lineage of the column is tracked using a map of the predecessors' columns to the operators that generate the predecessor columns.
The predecessor columns are the columns required in order to generate the column under consideration. Similarly, a reverse lookup of operators that generate the predecessor column to the predecessor column is maintained.
ResourceSchema - A representation of a schema used to communicate with load and store functions. This is separate from Schema, which is an internal Pig representation of a schema.
So one of the main differences I can see from the API docs is that a Schema is able to track the input columns required to build it, whereas a ResourceSchema is just the schema definition of the field name, type (and an optional sub-schema).