Default values in target tables - Oracle

I have some mappings where business entities are populated after the transformation logic. The row volumes are on the higher side, and there are quite a few business attributes which are defaulted to static values.
Therefore, in order to reduce the data pushed from the mapping, I created a DEFAULT clause on the target table columns and stopped feeding them from the mapping itself. This works out just fine when I run the session in "Normal" mode: I get target table rows with some columns fed by the mapping and the rest taking values from the DEFAULT clause in the table DDL.
However, since we are dealing with the higher end of volumes, I want to run my session in bulk mode (there are no pre-existing indexes on the target tables).
As soon as I switch the session to bulk mode, this particular feature (the default values) stops working. As a result, I get NULL values in the target columns instead of the defined default values.
I wonder:
Is this expected behavior?
If not, am I missing some configuration somewhere?
Should I be raising a ticket with Oracle, or with Informatica?
My configuration:
Informatica 9.5.1 64-bit,
with
Oracle 11g R2 (11.2.0.3)
running on
Solaris (SunOS 5.10)
Looking forward to some help here...

This could be expected behavior.
It seems that bulk mode in Informatica uses Oracle's "Direct Path" API (see for example https://community.informatica.com/thread/23522 ).
From this document ( http://docs.oracle.com/cd/B10500_01/server.920/a96652/ch09.htm , search for "Defaults on the Direct Path") I gather that:
Default column specifications defined in the database are not
available when you use direct path loading. Fields for which default
values are desired must be specified with the DEFAULTIF clause. If a
DEFAULTIF clause is not specified and the field is NULL, then a null
value is inserted into the database.
This could be the reason for this behaviour.
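To make the difference concrete, here is a minimal sketch (table and column names are made up): with a conventional-path insert, columns omitted from the column list pick up their DEFAULT values, whereas a direct-path load writes the data blocks directly and leaves such fields NULL unless the loader itself supplies a value (e.g. via SQL*Loader's DEFAULTIF).

-- Hypothetical target table with defaulted business attributes.
CREATE TABLE target_tbl (
  id         NUMBER        NOT NULL,
  status_cd  VARCHAR2(10)  DEFAULT 'ACTIVE',
  created_dt DATE          DEFAULT SYSDATE
);

-- Conventional path (what "Normal" mode effectively does):
-- omitted columns receive their DEFAULT values.
INSERT INTO target_tbl (id) VALUES (1);
-- => status_cd = 'ACTIVE', created_dt = SYSDATE

-- A direct-path load (the API behind bulk mode) formats the data blocks
-- itself, so fields not supplied by the loader arrive as NULL instead.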

I don't believe that you'll see a great benefit from not including the defaults, particularly in comparison to the benefits of a direct-path load. If the data is going to be read-only, consider compression as well.
You should also note that SQL*Net features compression for repeated values in the same column, so even with conventional-path inserts the network overhead is not as high as you might think.

Related

How to identify if a schema in a database (its structure/metadata) has changed or not

I need to identify whether a schema in a database has any change in metadata, such as changed table columns, changed procedure/package PL/SQL code, added/deleted triggers, etc. I've tried making an expdp with content=metadata_only and calculating a checksum of the dump, but this doesn't work because the checksum changes every time, even though the database is unchanged. How can I identify whether a schema in a database (its structure) has changed or not? Do I have to export the plain-text metadata instead? Thanks.
If you only need to know who did what when, use database auditing.
If you only need to know that something might have changed, but don't care what and are okay with the possibility of the change not being significant, you can use last_ddl_time from dba_objects and compare it to the maximum value you recorded on the previous check. This can be done either at the schema or at the object level.
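A minimal sketch of that check (the schema name is a placeholder):

-- Compare against the value saved from the previous run; if it is newer,
-- something in the schema was created or altered since then.
SELECT MAX(last_ddl_time) AS last_change
  FROM dba_objects
 WHERE owner = 'APP_SCHEMA';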
If you really do need to generate a delta and know for certain that something changed, you have two choices:
Construct data dictionary queries against all application dictionary views (lots of work, because there are a lot of views: columns, tables, partitions, subpartitions, indexes, index partitions, index subpartitions, lobs, lob partitions, etc., etc., etc.)
(Recommended) Use dbms_metadata to extract the DDL of the entire schema. See this answer for a query that will export almost every object you would likely care about.
Using either #1 or #2, you can then compare old/new strings, or use a hash function (e.g. dbms_crypto.hash) to compute a hash value and compare that. I wrote a schema upgrade tool that does exactly this - it surgically identifies and upgrades individual objects that differ from some template source schema. I use dbms_metadata and look for diffs in the hash values. You will, however, need to set certain transforms to omit clauses you don't care about and that could have arbitrary changes, or mask them with regexp_replace after the fact (e.g. a sequence will contain the current value, which will always be different - you don't want to see that as a change). It can be a bit of work.
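As an illustration only (the object and schema names are placeholders, and the chosen transform and hash algorithm are just examples), the core of that approach looks roughly like this:

DECLARE
  l_ddl  CLOB;
  l_hash RAW(20);
BEGIN
  -- Drop storage clauses so they don't show up as spurious differences.
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM,
                                    'SEGMENT_ATTRIBUTES', FALSE);
  l_ddl  := DBMS_METADATA.GET_DDL('TABLE', 'MY_TABLE', 'APP_SCHEMA');
  l_hash := DBMS_CRYPTO.HASH(l_ddl, DBMS_CRYPTO.HASH_SH1);
  DBMS_OUTPUT.PUT_LINE(RAWTOHEX(l_hash));  -- store and compare per object
END;
/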

Dynamically list contents of a table in a database that continuously updates

It's kind of a real-world problem, and I believe a solution exists but couldn't find one.
We have a database called Transactions that contains tables such as Positions, Securities, Bogies, Accounts, Commodities and so on, which are updated continuously, every second, whenever a new transaction happens. For the time being, we have replicated the master database Transactions to a new database named TRN, on which we do all the querying and updating.
We want a sort of monitoring system (like the htop process viewer in Linux) for the database that dynamically lists the updated rows in its tables at any time.
TL;DR Is there any way to get a continuously updating list of rows in any table in the database?
Currently we are working with Sybase & Oracle DBMSs on the Linux (Ubuntu) platform, but we would like to receive generic answers that cover most platforms and DBMSs (including MySQL), as well as any tools, utilities or scripts that can do this, so that it will be easier for us to migrate to other platforms and/or DBMSs in the future.
To list updated rows, you conceptually need one of two things:
The updating statement's effect on the table.
A previous version of the table to compare with.
How you get them and in what form is completely up to you.
The 1st option allows you to list updates with statement granularity while the 2nd is more suitable for time-based granularity.
Some options from the top of my head:
Write to a temporary table
Add a field with a transaction id/timestamp (see the sketch below)
Make clones of the table regularly
AFAICS, Oracle doesn't have built-in facilities to get the affected rows, only their count.
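As a rough, generic sketch of the timestamp idea (Oracle syntax; table and column names are illustrative), you could stamp every row on change and then poll for recently changed rows:

ALTER TABLE positions ADD (last_modified TIMESTAMP);

CREATE OR REPLACE TRIGGER positions_touch
BEFORE INSERT OR UPDATE ON positions
FOR EACH ROW
BEGIN
  :NEW.last_modified := SYSTIMESTAMP;
END;
/

-- "htop-like" polling query: rows changed in the last 5 seconds.
SELECT *
  FROM positions
 WHERE last_modified > SYSTIMESTAMP - INTERVAL '5' SECOND;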
Not a lot of details in the question so not sure how much of this will be of use ...
'Sybase' is mentioned but nothing is said about which Sybase RDBMS product (ASE? SQLAnywhere? IQ? Advantage?)
by 'replicated master database transaction' I'm assuming this means the primary database is being replicated (as opposed to the database called 'master' in a Sybase ASE instance)
no mention is made of what products/tools are being used to 'replicate' the transactions to the 'new database' named 'TRN'
So, assuming part of your environment includes Sybase(SAP) ASE ...
MDA tables can be used to capture counters of DML operations (eg, insert/update/delete) over a given time period
MDA tables can capture some SQL text, though the volume/quality could be in doubt if a) MDA is not configured properly and/or b) the DML operations are wrapped up in prepared statements, stored procs and triggers
auditing could be enabled to capture some commands but again, volume/quality could be in doubt based on how the DML commands are executed
also keep in mind that there's a performance hit for using MDA tables and/or auditing, with the level of performance degradation based on individual config settings and the volume of DML activity
Assuming you're using the Sybase(SAP) Replication Server product, those replicated transactions sent through repserver likely have all the info you need to know which tables/rows are being affected; so you have a couple options:
route a copy of the transactions to another database where you can capture the transactions in whatever format you need [you'll need to design the database and/or any customized repserver function strings]
consider using the Sybase(SAP) Real Time Data Streaming product (yeah, additional li$ence is required) which is specifically designed for scenarios like yours, ie, pull transactions off the repserver queues and format for use in downstream systems (eg, tibco/mqs, custom apps)
I'm not aware of any 'generic' products that work, out of the box, as per your (limited) requirements. You're likely looking at some different solutions and/or customized code to cover your particular situation.

Issue with SSIS Lookup Cache Mode and NULL values

I’m hoping that someone may be able to help me.
My question relates to SSIS, specifically the Lookup Data Flow Item and how it handles NULL values depending on the selected Cache Mode.
I have a very large dataset (72 columns, 37,000,000 records) which uses a Type 2 update methodology.
I use a lookup in the data flow to identify updates to existing records. I match on all of the relevant fields, and if all the fields match then the incoming record matches the existing record in the table and is therefore discarded. If there isn't a match, a Type 2 update is performed.
Due to the large dataset and limited server resources if the Cache Mode of the Lookup is set to Full Cache, it causes the process to fail due to insufficient memory; I have therefore had to switch the Cache Mode to Partial Cache. This resolves the memory issue, but causes another issue. For some reason in Partial Cache mode a NULL value from the table does not match a NULL value in the incoming records, while if the Cache Mode is set to Full Cache then it does.
This behaviour seems quite odd and I am unable to find it documented anywhere. One way around it could be to coalesce the NULL values, but this is something I would like to avoid.
Any help would be much appreciated.
Cheers
Ben
No Cache and Partial Cache modes use the database engine to do the matching. In most database engines (SQL Server included) NULL does not equal NULL; NULL means an unknown value, so you will never get a match. Do an ISNULL on all your nullable columns.
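For example (a sketch only; the table, columns, and the 'N/A' sentinel are made up), wrap the nullable columns on both sides of the match in the same ISNULL expression so the comparison is always between non-NULL values:

-- Lookup query side:
SELECT CustomerKey,
       ISNULL(MiddleName, 'N/A') AS MiddleName,
       ISNULL(Region,     'N/A') AS Region
FROM   dbo.DimCustomer;

-- Apply the same ISNULL(...) expressions to the incoming rows
-- (e.g. in a Derived Column transformation upstream of the Lookup),
-- so NULL/NULL pairs compare as equal.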

Effect of renaming table column on explain/execution plans

I have a table with 300+ columns and hundreds of thousands of records. I need to rename one of the existing columns.
Is there anything that I need to be worried about? Will this operation have any effect on the explain plans etc ?
Notes:
I am working on a live production database on Oracle 11g.
This column is not being used currently. It's not populated for any of the rows and I am 100% sure none of the existing queries refer to this column.
If "working on a live production database" means that you are going to try to do this without testing in lower environments while people are working, I would strongly caution against that plan.
Existing query plans that involve the table you're doing DDL on will be invalidated, so those queries will need to be hard parsed again. That can easily be an expensive operation if there are large numbers of such queries. It is certainly possible that some query plans will change because something else has changed (e.g. statistics are different, settings are different, bind variables are different, etc.). The plans won't change because of the column name change itself, but the re-parse that the rename forces may result in changed plans.
Any queries that you're executing will, obviously, need to use the new name as soon as you rename the column. That generally means that you need to do a coordinated release where you modify the code (including stored procedures) as well as the column name. That, in turn, generally implies that you're doing this as part of a build that includes at least a bit of downtime. You probably could, if you have the enterprise edition, do edition-based redefinition without downtime but that adds complexity to the process and is something that you would absolutely need to test thoroughly before implementing it in prod.
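For reference, the rename itself is a quick, metadata-only DDL (the names below are placeholders), and afterwards you can check for dependent objects that the DDL invalidated:

ALTER TABLE orders RENAME COLUMN legacy_flag TO archive_flag;

-- Dependent PL/SQL and views are flagged for recompilation:
SELECT object_name, object_type
  FROM dba_objects
 WHERE status = 'INVALID';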

What are the consequences of adding a column to an existing HIVE table?

Suppose that, a couple of hundred gigs after starting to use Hive, I want to add a column.
From the various articles & pages I have seen, I cannot understand the consequences in terms of
storage space required (will it double?)
blocking (can I still read the table from other processes)?
time (is it quick, or as slow as a MySQL change?)
underlying storage (do I need to change all the underlying files? How can it be done using RCFile?)
Bonus to whoever can answer the same question on structs in a Hive column.
If you add a column to a Hive table, only the underlying metastore is updated.
The required storage space is not increased as long as you do not add data
The change can be made while other processes are accessing the table
The change is very quick (only the underlying metastore is updated)
You do not have to change the underlying files. Existing records have the value null for the new column
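As a minimal sketch (table and column names are made up), the change is a single metadata operation:

ALTER TABLE events ADD COLUMNS (device_type STRING COMMENT 'added later');
-- Existing data files are untouched; old rows return NULL for device_type.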
I hope this helps.
ALTER TABLE commands modify the metadata only; the underlying data remains untouched. However, it is the user's responsibility to ensure that any alteration does not break data consistency.
Also, any change to the metadata is applied to the metastore - which is most typically MySQL - so the response time is comparable to that of a MySQL change.
Altering the definition will only modify how the files are read, not the contents of the underlying files.
If your files were tab-delimited text with 3 columns, you could create a table that references those files with a schema like new_table(line STRING), which would read the entire line without parsing out columns based on the tab characters.
When you add a column, since there are no more delimiters in the record, it will default to NULL, as Helmut mentioned.
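To make that concrete (a sketch; the location and names are placeholders), the same delimited files can be exposed either as a single raw line or as parsed columns, purely by changing the table definition:

CREATE EXTERNAL TABLE raw_lines (line STRING)
LOCATION '/data/events';

CREATE EXTERNAL TABLE parsed_events (id STRING, ts STRING, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/events';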
