Upsolver Snowflake output creating NULL records in Snowflake child table

We have nested JSON in our input stream which we are writing into normalized Snowflake tables using Upsolver Snowflake outputs. The parent table is fine, but we are seeing NULL records in the child table. Why is this happening and how can we solve it?

This is most likely happening because you have empty child nodes in your input JSON. Since the parent ID is always present in the input record, the child table output record will have this parent ID populated, while the rest of the child table columns will be NULL because there was no data for them in the input event. To account for this, add "WHERE <child_identifier> IS NOT NULL" to your child table Upsolver Snowflake output. This ensures that only valid child nodes are actually written.
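For illustration, a child-table output could look like the sketch below; the stream, table, and column names here are hypothetical, and the key point is the IS NOT NULL filter on the child identifier:
SELECT parent_id,
       child_id,
       child_value
FROM my_input_stream
WHERE child_id IS NOT NULL  -- skip events whose child node was empty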
Note: the preview for the Upsolver output for the child record will show NULLs as well; that is an indicator that the SQL for this output needs correction, in this case the WHERE filter.
If this job has already been run, stop the job, truncate the Snowflake table, edit the job to add the WHERE clause, and re-run the job from the beginning (replay from the beginning).

Related

Distinguish a newly inserted record from a record obtained by query

How can I tell, in Oracle Forms, the difference between a record obtained from the database and one that was just newly inserted?
After a button click I need to requery if the record was queried before. If it is a newly inserted record, I only need to query that new record, because a requery won't contain it. If no query happened before, a requery will do a full query.
I tried :SYSTEM.RECORD_STATUS, but after a commit it also contains QUERY.
Exactly; after a commit and requery, all records look the same and their status is QUERY.
I don't think you can do that unless you mark newly added rows somehow (e.g. by adding a timestamp, so that all rows with the MAX(timestamp) value are considered new).
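A minimal sketch of that idea, assuming a hypothetical insert_ts column that is populated on insert:
select *
from your_table
where insert_ts = (select max(insert_ts) from your_table)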

Deduplication in Oracle

Situation:-
Table 'A' receives data from an Oracle GoldenGate feed as New/Updated/Duplicate records that either create a new record or rewrite the old one based on their characteristic (N/U/D). Every entry in the table has an UpdatedTimeStamp column containing the insertion timestamp.
Scope:-
To write a stored procedure in Oracle that pulls the data for a time period based on the UpdatedTimeStamp column and publishes XML using DBMS_XMLGEN.
How can I ensure that a duplicate entered in the table is not processed again?
FYI: I am currently filtering via a new table I created, named 'A-stg', into which the old data is inserted incrementally.
As far as I understood the question, there are a few ways to avoid duplicates.
The most obvious is to use DISTINCT, e.g.
select distinct data_column from your_table
Another one is to use the timestamp column and get only the last (or the first?) value, e.g.
select data_column, max(timestamp_column)
from your_table
group by data_column
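Note that the GROUP BY form only returns the key and the timestamp. If you need the whole latest row per key, an analytic function works; a sketch reusing the hypothetical column names above:
select data_column, timestamp_column
from (
  select t.*,
         row_number() over (partition by data_column
                            order by timestamp_column desc) as rn
  from your_table t
)
where rn = 1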

Creating a record history table - How do I create a record on creation?

For a project, I want to have a "History" table for my records. I have two tables for this (example) system:
RECORDS
ID
NAME
CREATE_DATE
RECORDS_HISTORY
ID
RECORDS_ID
LOG_DATE
LOG_TYPE
MESSAGE
When I insert a record into RECORDS, how can I automatically create an associated entry in RECORDS_HISTORY where RECORDS_ID is equal to the newly inserted ID in RECORDS?
I currently have a sequence on the ID in RECORDS to automatically increment when a new row is inserted, but I am unsure how to prepopulate a record in RECORDS_HISTORY that will look like this for each newly created (not updated) record.
INSERT INTO RECORDS_HISTORY (RECORDS_ID, LOG_DATE, LOG_TYPE, MESSAGE) VALUES (<records.id>, SYSDATE, 'CREATED', 'Record created')
How can I create this associated _HISTORY record on creation?
You didn't mention the DB you are working with; I assume it's Oracle. The most obvious answer is: use an "on insert" trigger. You can even get back the ID (sequence) from the INSERT statement into table RECORDS. Disadvantages of this solution: triggers are somewhat "hidden" code, they can slow down massive inserts, and storing partially redundant data roughly doubles your disk space. What if RECORDS gets updated or deleted? Can that happen, and do you have to take care of that as well? The big question is: what is your goal?
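If a trigger does fit your case, a minimal sketch could look like this (assuming RECORDS_HISTORY.ID is filled by its own sequence or identity column):
CREATE OR REPLACE TRIGGER records_history_trg
AFTER INSERT ON records
FOR EACH ROW
BEGIN
  -- :NEW.id is the ID just generated for the inserted RECORDS row
  INSERT INTO records_history (records_id, log_date, log_type, message)
  VALUES (:NEW.id, SYSDATE, 'CREATED', 'Record created');
END;
/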
There are proven historization concepts around. Have a look at this: https://en.wikipedia.org/wiki/Slowly_changing_dimension

Why does Phoenix always add an extra column (named _0) to HBase when I execute an UPSERT command?

When I execute an UPSERT command in Apache Phoenix, I always see that Phoenix adds an extra column (named _0) with an empty value in HBase. This column (_0) is auto-generated by Phoenix, but I don't need it. Like this:
ROW  COLUMN+CELL
abc  column=F:A, timestamp=1451305685300, value=123
abc  column=F:_0, timestamp=1451305685300, value=   # I want to avoid generating this row
Could you tell me how to avoid that? Thank you very much!
"At create time, to improve query performance, an empty key value is
added to the first column family of any existing rows or the default
column family if no column families are explicitly defined. Upserts will also add this empty key value. This improves query performance by having a key value column we can guarantee always being there and thus minimizing the amount of data that must be projected and subsequently returned back to the client."
Apache Phoenix Documentation
Regarding your question whether that is avoidable:
You could work around the problem by adding the following statements at the end of your SQL:
ALTER TABLE "<your-table>" ADD "<your-cf>"."_0" VARCHAR(1);
ALTER TABLE "<your-table>" DROP COLUMN "<your-cf>"."_0";
You should only do this if you query the table with Phoenix but then access it with another system that is not aware of this Phoenix-specific dummy value.

Informatica: Delete rows from multiple tables sequentially, then insert

Consider the following scenario:
Main Control Table: 100 rows (denormalized table with multiple processing IDs).
Set of 10 Parent Tables populated based on the Control table.
Set of 10 Child Tables populated based on the Parent tables.
For daily processing:
We need to delete the data from the Child tables first,
then the Parent tables,
and the Control table last.
Then insert data into the Control table using multiple INSERT statements, as it is denormalized.
Is this possible in one mapping?
One suggestion is to use a SQL transformation and just execute the SQL statements one after the other.
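For illustration, the ordered statements could look like the sketch below; the table names are placeholders, and the children go first because of the foreign keys:
DELETE FROM child_table_1;   -- repeat for all 10 child tables
DELETE FROM parent_table_1;  -- repeat for all 10 parent tables
DELETE FROM control_table;
-- multiple INSERTs because the control table is denormalized
INSERT INTO control_table (process_id, process_name) VALUES (1, 'daily_load');
INSERT INTO control_table (process_id, process_name) VALUES (2, 'weekly_load');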
Is there an alternative way of handling this?
