Informatica - failing due to unique constraint violation for the primary key column - informatica-powercenter

A Sequence Generator transformation is used to generate the primary key column: it assigns keys to incoming records in sequential order and, at the end of each session, stores the maximum key value used in the Informatica repository. The issue is likely to occur when two or more workflows start running with the same key value stored in the repository and the workflow that processes more records completes first. The maximum value written by that first workflow is then overwritten with a smaller value (the maximum of the second workflow) when the workflow with fewer records completes. Because of this, in the next ODS run the first workflow is handed sequence values that were already used in the previous run, which causes the unique constraint violation.
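A concrete timeline (hypothetical numbers, assuming the two workflows share the reusable sequence but load different targets) makes the lost update clear: workflows A and B both start with the stored value 100. A generates keys 101-600 for 500 records; B generates keys 101-150 for 50 records. A finishes first and persists 600 to the repository; B finishes later and overwrites it with 150. On the next run, A starts at 151 and collides with the 151-600 range it inserted the run before.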

Related

Hbase auto increment any column/row-key

I am new to HBase.
Is it possible to auto-increment the row key in HBase, and if so, how? (i.e. for each insert, the row key should increment itself.)
Or is it possible to auto-increment any other column? (i.e. for each insert, this column should be incremented by 1.)
Monotonically increasing row keys are not recommended in HBase; see http://hbase.apache.org/book/rowkey.design.html, section 6.3.2, for reference. In fact, using globally ordered row keys would cause all instances of your distributed application to write to the same region, which will become a bottleneck.
If you can avoid using auto-increment IDs and just need unique IDs in a distributed system, you can use something like "hostname" + "PID" + "TIMESTAMP" as a key. This way it is unique for each row, as in the sketch below.
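A minimal Java sketch of such a key (assumes Java 9+ for ProcessHandle; the tie-breaking counter is an addition to handle two inserts in the same millisecond):

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicLong;

public class RowKeys {
    private static final AtomicLong COUNTER = new AtomicLong();

    // "hostname" + "PID" + "TIMESTAMP", plus a local counter to break
    // ties within the same millisecond. Because keys sort by host first,
    // writes from different hosts land in different regions.
    public static String uniqueKey() throws UnknownHostException {
        String host = InetAddress.getLocalHost().getHostName();
        long pid = ProcessHandle.current().pid();
        return host + "-" + pid + "-" + System.currentTimeMillis()
                + "-" + COUNTER.incrementAndGet();
    }
}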
If you are sure you need a global auto-increment in a table (for the key or for some column value), you can use the incrementColumnValue call: keep a separate row in your table (or a dedicated counter table) that stores the current value, and have the process call incrementColumnValue to get the next value before inserting a new row. But this way you cannot guarantee there will be no gaps: if the client fails after calling incrementColumnValue, the counter is incremented but the row is never inserted.
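A minimal sketch with the HBase Java client (the "counters"/"events" tables and the "c"/"d" column families are hypothetical; the increment itself is atomic on the server, but the increment + put pair is not, which is exactly the gap problem just described):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterInsert {
    // Increment the shared counter row, then insert the row under the new ID.
    public static void insert(byte[] payload) throws IOException {
        try (Connection conn = ConnectionFactory.createConnection();
             Table counters = conn.getTable(TableName.valueOf("counters"));
             Table events = conn.getTable(TableName.valueOf("events"))) {
            long nextId = counters.incrementColumnValue(
                    Bytes.toBytes("seq"), Bytes.toBytes("c"),
                    Bytes.toBytes("value"), 1L);
            // A crash between here and the put() leaves a gap in the sequence.
            Put put = new Put(Bytes.toBytes(nextId));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), payload);
            events.put(put);
        }
    }
}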
In short, all of the proposed solutions are client-side, there is no server-side implementation for this feature in HBase

Oracle sequence generator within interval

I'm using Oracle 11gR2, and when a new product is inserted into the product table I need to assign it an auto-increment ID between 1 and 65535. Products can later be deleted.
When I reach the 65535th ID, I need to scan the table to find a free hole to assign as the new ID.
Given this requirement an Oracle sequence cannot be used, so I am using a function (I also tried an on-insert trigger) to generate a free ID...
The problem is that I cannot handle batch inserts, for example, and I have concurrency problems...
How can I solve this? By using some sort of external ID generator?
Sounds like an arbitrary design. Is there a good reason for having a 16-bit max product id or for reusing IDs? Both constraints are bad practice.
I doubt any external generator is going to provide anything that Oracle doesn't already provide. I recommend using sequences for batch inserts. The problem you have is how to recycle the IDs: plain Oracle sequences don't track the primary key, so you need a way to find recycled keys first and then fall back to the sequence.
Product ID Recycling
Batch Inserts - Use a sequence for the keys the first time you load them. For this small range, set NOCACHE on the sequence to eliminate gaps.
Deletes - When a product is deleted, instead of actually deleting the row, set a DELETED = 'Y' flag on it.
Inserts - Claim the first available record with the DELETED flag set, e.g. select the min ID from the product table where DELETED = 'Y'; update that record with the new product info (but the same ID) and set DELETED = 'N'.
This ensures you always recycle before consuming new sequence IDs (see the sketch below).
If you want to implement the logic in the database, you can create a view (VIEW$PRODUCTS) where DELETED = 'N' and an INSTEAD OF INSERT trigger to do the insert.
In any scenario, when you run out of sequence values (or the sequence wraps), you are out of luck for batch inserts. I'd reconsider that part of the design if I were you.
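Putting the scheme together, a minimal JDBC sketch of "recycle first, fall back to the sequence" (the PRODUCT(ID, NAME, DELETED) layout and the PRODUCT_SEQ sequence are assumptions):

import java.sql.*;

public class ProductDao {
    public static void insertProduct(Connection con, String name) throws SQLException {
        con.setAutoCommit(false);
        // Look for the smallest recycled ID.
        Long candidate = null;
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT MIN(id) FROM product WHERE deleted = 'Y'")) {
            if (rs.next()) {
                long v = rs.getLong(1);
                if (!rs.wasNull()) candidate = v;
            }
        }
        boolean claimed = false;
        if (candidate != null) {
            try (PreparedStatement ps = con.prepareStatement(
                    "UPDATE product SET name = ?, deleted = 'N' " +
                    "WHERE id = ? AND deleted = 'Y'")) {
                ps.setString(1, name);
                ps.setLong(2, candidate);
                // The deleted = 'Y' guard makes the claim atomic; 0 rows
                // updated means another session claimed this ID first.
                claimed = ps.executeUpdate() == 1;
            }
        }
        if (!claimed) {
            // No hole available (or we lost the race): fall back to the sequence.
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO product (id, name, deleted) " +
                    "VALUES (product_seq.NEXTVAL, ?, 'N')")) {
                ps.setString(1, name);
                ps.executeUpdate();
            }
        }
        con.commit();
    }
}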

Sqlite appending data performance linear degradation, is this solvable?

I have a test set up to write rows to a database.
Each transaction inserts 10,000 rows, no updates.
Each step takes linearly longer than the last.
The first steps took the following times in ms to perform a commit
568, 772, 942, 1247, 1717, 1906, 2268, 2797, 2922, 3816, 3945
By the time it reaches adding 10,000 rows to a table of 500,000 rows, it takes 37149 ms to commit!
I have no foreign key constraints.
I have found that using WAL improves performance (it gives the figures above), but the degradation is still linear.
PRAGMA Synchronous=OFF has no effect
PRAGMA locking_mode=EXCLUSIVE has no effect
I ran both without the additional indexes and with them; this made a roughly constant difference in time, so the degradation was still linear.
Some other settings I have
setAutocommit(false)
PRAGMA page_size = 4096
PRAGMA journal_size_limit = 104857600
PRAGMA count_changes = OFF
PRAGMA cache_size = 10000
The schema has Id INTEGER PRIMARY KEY ASC; insertion is incremental and the value is generated by SQLite.
The full schema is as follows (I have run both with and without the indexes, but have included them):
create table if not exists [EventLog] (
    Id INTEGER PRIMARY KEY ASC,
    DocumentId TEXT NOT NULL,
    Event TEXT NOT NULL,
    Content TEXT NOT NULL,
    TransactionId TEXT NOT NULL,
    Date INTEGER NOT NULL,
    User TEXT NOT NULL);
create index if not exists DocumentId ON EventLog (DocumentId);
create index if not exists TransactionId ON EventLog (TransactionId);
create index if not exists Date ON EventLog (Date);
This is using sqlite-jdbc-3.7.2 running in a Windows environment.
SQLite tables and indexes are internally organized as B-Trees. In tables, the Rowid is the sorting key. (Your INTEGER PRIMARY KEY is the Rowid.)
If your inserted IDs are not larger than the largest ID already in the table, then the records are not appended, but inserted somewhere in the middle of the tree. When inserting enough records in one transaction, and if the distribution of IDs is random, this means that almost every page in the database must be rewritten.
To avoid this,
insert the IDs in increasing order; or
insert the IDs as NULL so that SQLite chooses the next value; or
prevent SQLite from using your ID field as the Rowid by declaring it as INTEGER UNIQUE (or just INTEGER if you don't need the extra check/index), thus making the table ordering independent of your ID.
In the case of indexes, inserting an indexed field with a random distribution requires that the index is updated at a random position. Like with tables, when inserting enough records in one transaction, this means that almost every page in the index must be rewritten.
When you're loading large amounts of data, it is recommended to do this without any indexes and to recreate them afterwards. (Unlike some other databases, SQLite has no function to temporarily disable indexes; just drop them.)
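For reference, a minimal sqlite-jdbc sketch of the drop/load/recreate pattern against the schema above (the database file name is an assumption):

import java.sql.*;

public class BulkLoad {
    public static void load(java.util.List<Object[]> rows) throws Exception {
        Class.forName("org.sqlite.JDBC");  // older sqlite-jdbc needs explicit registration
        try (Connection con = DriverManager.getConnection("jdbc:sqlite:eventlog.db")) {
            con.setAutoCommit(false);
            // Drop the secondary indexes so the load only appends to the table B-Tree.
            try (Statement st = con.createStatement()) {
                st.execute("DROP INDEX IF EXISTS DocumentId");
                st.execute("DROP INDEX IF EXISTS TransactionId");
                st.execute("DROP INDEX IF EXISTS Date");
            }
            // Id is omitted so SQLite assigns the next rowid (an append,
            // not a mid-tree insert).
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO EventLog " +
                    "(DocumentId, Event, Content, TransactionId, Date, User) " +
                    "VALUES (?, ?, ?, ?, ?, ?)")) {
                for (Object[] r : rows) {
                    for (int i = 0; i < 6; i++) ps.setObject(i + 1, r[i]);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            // Rebuild the indexes once, after the data is in place.
            try (Statement st = con.createStatement()) {
                st.execute("CREATE INDEX IF NOT EXISTS DocumentId ON EventLog (DocumentId)");
                st.execute("CREATE INDEX IF NOT EXISTS TransactionId ON EventLog (TransactionId)");
                st.execute("CREATE INDEX IF NOT EXISTS Date ON EventLog (Date)");
            }
            con.commit();
        }
    }
}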
FYI, although I haven't constrained the structure in terms of key content, in 99.999% of cases it will be a GUID. So to resolve the performance issue I wrote an algorithm that generates sequential GUIDs, using a time-based value for the first 8 hex digits. This worked very well, even when blocks of GUIDs were generated using early time values.
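A rough reconstruction of that sequential-GUID idea in Java (the exact layout is an assumption; the point is only that the first 8 hex digits increase with time, so later keys sort after earlier ones):

import java.util.UUID;

public class SequentialGuid {
    // Overwrite the first 8 hex digits of a random GUID with a time-based
    // value (seconds since the epoch, which fits in 32 bits until 2106).
    public static String next() {
        String hex = UUID.randomUUID().toString();
        long seconds = System.currentTimeMillis() / 1000;
        return String.format("%08x", seconds & 0xFFFFFFFFL) + hex.substring(8);
    }
}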

Data generation plan failed because of duplicate PRIMARY KEY constraints

In a VS2010 database project, I am trying to generate test data for a table that has existing data (by clicking 'No' when prompted). The identity column (which is the primary key) is a SQL computed value, so I cannot change the data generator for that column.
So why doesn't the data generation plan recognize existing primary key values in the database instead of always trying to insert duplicates? It seems the plan always starts from the seed value, not from the next available identity value. Can I force the data generation plan to start from some other seed value for this particular table?
To answer my second question: it seems that from the schema view I can set another seed to start from. Still, I don't understand why the data generation plan doesn't automatically recognize the next available IDENTITY value.

Unique constraint violation during insert: why? (Oracle)

I'm trying to create a new row in a table. There are two constraints on the table: one is on the key field (DB_ID), the other constrains the field ENV to one of several values. When I do an insert, I do not include the key field among the fields I'm trying to insert, yet I'm getting this error:
unique constraint (N390.PK_DB_ID) violated
Here's the SQL that causes the error:
insert into cmdb_db
(narrative_name, db_name, db_type, schema, node, env, server_id, state, path)
values
('Test Database', 'DB', 'TYPE', 'SCH', '', 'SB01', 381, 'TEST', '')
The only thing I've been able to turn up is the possibility that Oracle might be trying to assign a DB_ID that is already in use if rows were inserted manually. The data in this database was somehow restored/moved from a production database, but I don't have the details of how that was done.
Any thoughts?
Presumably, since you're not providing a value for the DB_ID column, that value is being populated by a row-level before-insert trigger defined on the table. That trigger is presumably selecting the value from a sequence.
Since the data was moved (presumably recently) from the production database, my wager would be that when the data was copied, the sequence was not adjusted as well. I would guess that the sequence is generating values that are much lower than the largest DB_ID currently in the table, leading to the error.
You could confirm this suspicion by looking at the trigger to determine which sequence is being used and doing a
SELECT <<sequence name>>.nextval
FROM dual
and comparing that to
SELECT MAX(db_id)
FROM cmdb_db
If, as I suspect, the sequence is generating values that already exist in the database, you could increment the sequence until it was generating unused values or you could alter it to set the INCREMENT to something very large, get the nextval once, and set the INCREMENT back to 1.
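As a sketch, that large-INCREMENT fix could look like this over JDBC (the DB_ID_SEQ sequence name is an assumption; note the initial check itself consumes one sequence value):

import java.sql.*;

public class SequenceFix {
    // Push DB_ID_SEQ past MAX(db_id) using the trick described above.
    public static void bump(Connection con) throws SQLException {
        try (Statement st = con.createStatement()) {
            long max, cur;
            try (ResultSet rs = st.executeQuery("SELECT MAX(db_id) FROM cmdb_db")) {
                rs.next();
                max = rs.getLong(1);
            }
            try (ResultSet rs = st.executeQuery("SELECT db_id_seq.NEXTVAL FROM dual")) {
                rs.next();
                cur = rs.getLong(1);
            }
            if (cur <= max) {
                // One oversized step lands exactly on max + 1, then reset.
                st.execute("ALTER SEQUENCE db_id_seq INCREMENT BY " + (max - cur + 1));
                try (ResultSet rs = st.executeQuery("SELECT db_id_seq.NEXTVAL FROM dual")) {
                    rs.next();
                }
                st.execute("ALTER SEQUENCE db_id_seq INCREMENT BY 1");
            }
        }
    }
}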
Your error looks like you are duplicating an already existing primary key in your DB. You should modify your SQL code to supply its own primary key, using something like an identity column (note that the example below uses SQL Server-style IDENTITY syntax; the Oracle equivalent is a sequence-backed trigger in 11g, or GENERATED AS IDENTITY in 12c and later).
CREATE TABLE [DB] (
    [DBId] bigint NOT NULL IDENTITY,
    ...
    CONSTRAINT [DB_PK] PRIMARY KEY ([DBId] ASC)
);
It looks like you are not providing a value for the primary key field DB_ID. If that is a primary key, you must provide a unique value for that column. The only way not to provide it would be to create a database trigger that, on insert, would provide a value, most likely derived from a sequence.
If this is a restoration from another database and there is a sequence on this new instance, it might be trying to reuse a value. If the old data had unique keys from 1 - 1000 and your current sequence is at 500, it would be generating values that already exist. If a sequence does exist for this table and it is trying to use it, you would need to reconcile the values in your table with the current value of the sequence.
You can use SEQUENCE_NAME.CURRVAL to see the current value of the sequence (if it exists, of course; note that CURRVAL is only defined after NEXTVAL has been referenced at least once in your session).
