SQL identity column insert using Pentaho Data Integration - ETL

I am new to the Pentaho Data Integration tool. I am trying to move data from a source table into a target table ... both are in SQL Server. The tables are identical and each has an identity column.
Tried many options but ... it gives an error every time saying "IDENTITY_INSERT is set to OFF".
Tried introducing a hop in between to execute a SQL statement, "SET IDENTITY_INSERT tblname ON" .. still didn't work.
Any suggestions would be highly appreciated.
Thanks.

Putting that in a hop certainly won't work, because PDI/Kettle uses separate connections per step. You need to put that setting in the advanced options of the database connection and then you should be OK - it will then be used for all instances of that database connection.
Also make sure you "share" your database connections; otherwise, if you create them by hand in every transformation, you'll need to apply that setting to every single database connection in each transformation. (Unless you're using a database or EE repository, in which case the connections are centralised, so you're OK.)
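For example, in the connection's advanced options (the "SQL statements to execute right after connecting" box, or whatever the equivalent field is called in your PDI version), you could enter something like the following (the table name is just a placeholder):

SET IDENTITY_INSERT dbo.tblname ON;

That way every connection PDI opens against that database runs the statement first, so the Table Output step's inserts happen in the same session as the setting - which is exactly what a separate hop cannot guarantee.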

One other thing you can try is to remove the identity column from the select you are using to pass rows from the source to the destination, as sketched below.
This way, you make sure that SQL Server will generate a new identity value for each of the rows instead of trying to insert the old ones.
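For example, if the table has an identity column id plus two data columns (all names hypothetical), the Table Input step's query would list only the non-identity columns:

SELECT col1, col2
FROM dbo.src_table;

and the Table Output step would map just those fields. SQL Server then assigns fresh identity values in the target on its own, and no IDENTITY_INSERT setting is needed. Note that the new values will generally not match the source values, which matters if other tables reference them.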

You should add a command to be executed right after the DB connection is established.

Related

Oracle: How to efficiently copy a table from one schema to another on a different database and server

I have a large table (3.5MM records) that I need to copy from one schema/database to another schema/database. I tried TOAD's "copy data from table" feature, but got errors and it never fully copied, in part because the connection keeps getting dropped. I'm trying the object copy feature of SQL Developer, and after 11 minutes it's still copying. I tried the SQL*Plus COPY statement but got a syntax error (help needed). I'm still open to extracting the data as INSERT statements that I can just run directly.
1) SQL*Plus COPY, as follows:
copy from report_new/mypassword#(DESCRIPTION= (ADDRESS=(PROTOCOL=TCP)(HOST=10.15.15.20)(PORT=1541))(CONNECT_DATA=(SERVICE_NAME=STAGE))) to report/mypassword#(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.18.22.25)(PORT=1550))(CONNECT_DATA=(SERVICE_NAME=DEV))) CREATE USER_USAGE_COUNT USING SELECT * FROM _USER_USAGE_COUNT
The above gives me
SQL> start copy_user_count_table.sql
SP2-0758: FROM clause missing username
2) I tried TOAD
The TOAD "Copy data to another schema" fails due to the connection getting
dropped. I set the commit threshold first to 5000 then to 500.
3) I'm trying SQL Developer's copy function, but I think it's not going to finish anytime soon and it gives me no real progress indication. For all I know, it could be hung and just doesn't want to tell me.
4) I thought about creating a database link, but I don't have the authority to create one, and it's in a corporate environment wherein the DBAs don't respond in under 3 days.
Todo: Should I write my own Java code to just do this one record at a time? I shouldn't have to do this, but somehow it's easier to send a man to the moon than to copy data from one schema to another.
You can use the COPY command of SQLcl, which is part of newer SQL Developer releases. SQLcl is found in the sqldeveloper\bin directory and is named sql.exe (Windows) or sql (Unix/Linux/Mac). The steps to follow are:
Connect to the destination database with SQLcl:
sql username/password@destinationdb
Use the COPY command:
copy from username@sourcedatabase create newtablename using select * from sourcetable;
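A concrete (hypothetical) sketch using the hosts and services from the question, with EZConnect identifiers - connect to DEV, then pull the table over from STAGE:

sql report/mypassword@//10.18.22.25:1550/DEV
copy from report_new/mypassword@//10.15.15.20:1541/STAGE create user_usage_count using select * from user_usage_count;

The COPY command also accepts APPEND, INSERT and REPLACE in place of CREATE, depending on whether the target table already exists.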

db2 9.5: substr function fails but left function works ok

I have this select statement, but it never ends:
select * from table where substr(field,1,3)='001'
but when I change it to:
select * from table where left(field,3)='001'
it works! Thus, I think it's a resources issue. Now, I'll have to modify the statement, but I want to know if it's possible to solve this problem by making changes to the db parameters, maybe via:
db2 get db cfg ...
Additional info:
Database version is 9.5 (Windows).
Field is one of the 3 key fields of the table.
Table content: 863,820 rows
In a comment you ask "I was wondering if it's possible to change a db parameter to allow more resources available to run the first statement".
You could try autoconfigure https://www.ibm.com/support/knowledgecenter/en/SSEPGG_9.5.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0008960.html
e.g. db2 autoconfigure using mem_percent 80 apply none
to see what it would suggest (or change if you say APPLY DB AND DBM and not APPLY NONE) if you asked Db2 to use 80% of your system memory
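For example, to just see the recommendations for a (hypothetical) database MYDB:

db2 connect to MYDB
db2 autoconfigure using mem_percent 80 apply none

and then, if you want Db2 to actually change the database and database manager configuration:

db2 autoconfigure using mem_percent 80 apply db and dbm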

Sequences (using as ID) issue in Oracle SQL Developer

I am using sequences to create IDs, so while executing the insert stored procedure it will create a unique value for the ID. But after some time it loses the definition for the sequence.
Not sure why this is happening again and again, or how to solve the problem.
I am using Oracle SQL Developer, and in the edit-table properties there is an 'Identity Column' setting (screenshot omitted).
The next step is setting up the trigger and sequence (screenshot omitted).
It was working fine for some time until this property defaulted; now it is not there anymore (screenshot omitted).
I still have the trigger and sequence objects in the schema and am able to set them up again, but it will break again later.
How to avoid this problem in future?
I think it is just a bug/limitation in your client software, Oracle SQL Developer. The 'Identity Column' tab is a handy way to create the corresponding sequence and trigger, but it doesn't seem to recognise existing elements. I've just verified this on my own system, and that's exactly what happens.
It makes sense, because adding a new sequence and trigger is a pretty straightforward task (all you need is a template) but displaying current sequence is hard given that a trigger can implement any conceivable logic. Surely it could be done but the cost-benefit ratio probably left things this way.
In short, your app is not broken so nothing needs to be fixed on your side.
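For reference, the kind of template the tab creates is the classic pre-12c sequence-plus-trigger pattern; a minimal sketch with hypothetical names:

CREATE SEQUENCE table1_seq;

CREATE OR REPLACE TRIGGER table1_trg
BEFORE INSERT ON table1
FOR EACH ROW
BEGIN
  IF :NEW.id IS NULL THEN
    SELECT table1_seq.NEXTVAL INTO :NEW.id FROM dual;
  END IF;
END;
/

Anything that drops and recreates the trigger in a different shape can leave the tab unable to match it back to the table, even though inserts keep working.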
This is what I received from IT support regarding the issue:
A few possibilities that might cause this:
1 - Another user with limited privileges might be editing the table using SQL Developer. In this case, if this user's privilege is not enough to obtain the sequence and/or trigger information from the database, the tool might leave the fields blank and disable it when table changes are saved.
2 - The objects are being changed or removed outside of SQL Developer, causing it to lose the information. In my tests I noticed that dropping the trigger and recreating it with the same name caused the identity property information to be lost on SQL Developer.
Even with the trigger enabled and working for inserts, SQL Developer could not retrieve the information.
Then, if I run an ALTER TRIGGER to enable it (even though dba_triggers reports it as already enabled), SQL Developer will list the information again:
ALTER TRIGGER "AWS"."TABLE1_TRG" ENABLE;
So it looks like there are some issues with the SQL Developer, that is causing this behavior.
Next time it happens, please check that the trigger still exists in the database and is enabled, using the query below:
select owner, trigger_name, TRIGGER_TYPE, TRIGGERING_EVENT, TABLE_OWNER, TABLE_NAME, STATUS
from dba_triggers
where trigger_name = 'ENTER_YOUR_TRG_NAME'; --Just change the trigger name in WHERE

Same stored procedure acts differently on two/(three) different IDEs

I just created a stored procedure in MS SQL DB using TOAD.
What it does is accept an ID that some records are associated with, then insert those records into a table.
The next part of the stored procedure uses the ID input to search the table where the items got inserted, and then returns them as the result set to the user, just to confirm that the information got inserted.
In TOAD, it does what is expected. It inserts data and returns information using just the stored procedure.
In Oracle SQL Developer, however, it does the insert and ends at that. It seems not to execute the 2nd part of the stored procedure, which is a SELECT statement.
I have a feeling that this is because of the JDBC adapter. Another reason I'm asking is that I'm using a reporting tool, Pentaho Report Designer, and it would really make things easier if I could do the 2 things at the same time. Pentaho Report Designer also uses JDBC adapters - not a coincidence, maybe?
But if there are other things that I can tweak I'd really appreciate it.
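For reference, a minimal sketch of the shape of procedure described above (T-SQL, all names hypothetical). One commonly suggested tweak for this exact symptom is SET NOCOUNT ON, which stops the INSERT's row count from being returned as the first result to JDBC clients:

CREATE PROCEDURE dbo.usp_insert_and_confirm @id INT
AS
BEGIN
    SET NOCOUNT ON;  -- without this, some JDBC clients stop at the INSERT's row count and never reach the SELECT
    -- part 1: insert the records associated with the ID
    INSERT INTO dbo.target_table (batch_id, payload)
    SELECT batch_id, payload FROM dbo.source_table WHERE batch_id = @id;
    -- part 2: return the inserted rows as a confirmation result set
    SELECT batch_id, payload FROM dbo.target_table WHERE batch_id = @id;
END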
This is a guess, but worth considering...
There are things called "batches", which are sets of SQL statements that are all sent to the server at once and executed by the server as one set of statements, within a single server-side session. Sending a set of SQL statements to the server as a batch will often produce different results than sending them one at a time, where each statement is executed in its own session.
I haven't used Toad (or Oracle) in a while, but as I recall, it dealt with batches differently than the other IDE I used. If the second statement in your set relies on being in the same session as the first, and in one IDE it runs in a separate session, then this might explain what is happening.

How to find out when an Oracle table was updated the last time

Can I find out when the last INSERT, UPDATE or DELETE statement was performed on a table in an Oracle database and if so, how?
A little background: The Oracle version is 10g. I have a batch application that runs regularly, reads data from a single Oracle table and writes it into a file. I would like to skip this if the data hasn't changed since the last time the job ran.
The application is written in C++ and communicates with Oracle via OCI. It logs into Oracle with a "normal" user, so I can't use any special admin stuff.
Edit: Okay, "Special Admin Stuff" wasn't exactly a good description. What I mean is: I can't do anything besides SELECTing from tables and calling stored procedures. Changing anything about the database itself (like adding triggers), is sadly not an option if want to get it done before 2010.
I'm really late to this party but here's how I did it:
SELECT SCN_TO_TIMESTAMP(MAX(ora_rowscn)) FROM myTable;
It's close enough for my purposes.
Since you are on 10g, you could potentially use the ORA_ROWSCN pseudocolumn. That gives you an upper bound of the last SCN (system change number) that caused a change in the row. Since this is an increasing sequence, you could store off the maximum ORA_ROWSCN that you've seen and then look only for data with an SCN greater than that.
By default, ORA_ROWSCN is actually maintained at the block level, so a change to any row in a block will change the ORA_ROWSCN for all rows in the block. If the intention is just to minimize the number of unchanged rows you process multiple times, that is probably quite sufficient for "normal" data access patterns. You can rebuild the table with ROWDEPENDENCIES, which causes ORA_ROWSCN to be tracked at the row level and gives you more granular information, but that requires a one-time effort to rebuild the table.
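A minimal sketch of that pattern (table name and bind variable hypothetical):

SELECT MAX(ora_rowscn) FROM mytable;                      -- store this high-water mark after each run
SELECT * FROM mytable WHERE ora_rowscn > :last_seen_scn;  -- next run: only blocks/rows changed since then
CREATE TABLE mytable_rd ROWDEPENDENCIES AS SELECT * FROM mytable;  -- optional one-time rebuild for row-level SCNs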
Another option would be to configure something like Change Data Capture (CDC) and to make your OCI application a subscriber to changes to the table, but that also requires a one-time effort to configure CDC.
Ask your DBA about auditing. He can start an audit with a simple command like :
AUDIT INSERT ON user.table
Then you can query the table USER_AUDIT_OBJECT to determine if there has been an insert on your table since the last export.
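For example, something along these lines (assuming the standard 10g audit-trail columns; the object name is a placeholder):

SELECT timestamp, action_name
FROM user_audit_object
WHERE obj_name = 'MYTABLE'
ORDER BY timestamp DESC;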
google for Oracle auditing for more info...
SELECT * FROM all_tab_modifications;
Could you run a checksum of some sort on the result and store that locally? Then when your application queries the database, you can compare its checksum and determine if you should import it?
It looks like you may be able to use the ORA_HASH function to accomplish this.
Update: Another good resource: 10g’s ORA_HASH function to determine if two Oracle tables’ data are equal
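A rough sketch of the checksum idea with ORA_HASH (column list hypothetical):

SELECT SUM(ORA_HASH(id || '|' || col1 || '|' || col2)) AS table_checksum
FROM mytable;

Store the result after each export and compare it before the next run; if it is unchanged, skip the file write. Hash collisions are theoretically possible, so treat this as a heuristic rather than a guarantee.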
Oracle can watch tables for changes and, when a change occurs, can execute a callback function in PL/SQL or OCI. The callback gets an object containing a collection of the tables that changed; each of those in turn holds a collection of the rowids that changed, plus the type of action (insert, update, delete).
So you don't even go to the table; you sit and wait to be called. You'll only go if there are changes to write.
It's called Database Change Notification. It's much simpler than CDC, as Justin mentioned, but both require some fancy admin stuff. The good part is that neither of these requires changes to the APPLICATION.
The caveat is that CDC is fine for high-volume tables, but DCN is not.
If the auditing is enabled on the server, simply use
SELECT *
FROM ALL_TAB_MODIFICATIONS
WHERE TABLE_NAME IN ('YOUR_TABLE_NAME'); --Just change the table name
You would need to add a trigger on insert, update, and delete that sets a value in another table to SYSDATE (see the sketch below).
When you run the application, it would read that value and save it somewhere, so that the next time it runs it has a reference to compare against.
Would you consider that "Special Admin Stuff"?
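A minimal sketch of that trigger approach (all names hypothetical):

CREATE TABLE table_change_log (table_name VARCHAR2(30) PRIMARY KEY, last_change DATE);
INSERT INTO table_change_log VALUES ('MYTABLE', SYSDATE);

CREATE OR REPLACE TRIGGER mytable_chg_trg
AFTER INSERT OR UPDATE OR DELETE ON mytable
BEGIN
  UPDATE table_change_log SET last_change = SYSDATE WHERE table_name = 'MYTABLE';
END;
/

The batch job then compares last_change from table_change_log against the timestamp it saved on its previous run.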
It would be better to describe what you're actually doing so you get clearer answers.
How long does the batch process take to write the file? It may be easiest to let it go ahead and then compare the file against a copy of the file from the previous run to see if they are identical.
If anyone is still looking for an answer, they can use the Oracle Database Change Notification feature introduced in Oracle 10g. It requires the CHANGE NOTIFICATION system privilege. You can register listeners that trigger a notification back to the application.
Please use the statement below:
select * from all_objects ao where ao.OBJECT_TYPE = 'TABLE' and ao.OWNER = 'YOUR_SCHEMA_NAME'
