Specifying the number of records to delete in TIBCO JDBC Update activity - tibco

How do I specify the number of records to delete in the TIBCO JDBC Update activity when using batch update mode?
I need to delete 25 million records from the database, so I wrote TIBCO code to do this, but it is taking a lot of time. I am therefore planning to use batch mode for the delete query, but I don't know how to specify the number of records in the JDBC Update activity.
Help me if anyone has any idea. Thanks.

From the docs for the Batch Update checkbox:
This field is only meaningful if there are prepared parameters in the
SQL statement (see Prepared Parameters).
In that case the input will be an array of records, and the activity will execute the statement once for each record.
To avoid running out of memory, you will still need to iterate over the 25 million rows, but you can do so in groups of 1,000 or 10,000 (see the sketch below).
If this is not something you would do often (deleting 25M rows sounds pretty one-off), an alternative is to use BW to create a file containing the delete statements and then give the file to a DBA to execute.
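For reference, here is a minimal plain-JDBC sketch of what that batched, chunked delete amounts to; the table name BIG_TABLE and key column ID are hypothetical placeholders, and in BW the equivalent input would be the array of prepared parameters passed to the JDBC Update activity.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchDelete {

    // Deletes one chunk of primary keys in a single round trip to the database.
    // Call this repeatedly with groups of 1,000-10,000 ids rather than all 25M at once.
    static void deleteChunk(Connection conn, List<Long> ids) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("DELETE FROM BIG_TABLE WHERE ID = ?")) {
            for (Long id : ids) {
                ps.setLong(1, id);
                ps.addBatch();     // one execution of the statement per input record, like BW's batch mode
            }
            ps.executeBatch();
            conn.commit();         // assumes auto-commit is off; committing per chunk keeps undo/redo small
        }
    }
}
```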

Please use the subset feature of the JDBC palette. Let me know if you face any issues.

I would suggest two points:
If this is a one-time activity, then it is not advisable to use TIBCO BW code for it; a SQL script would be the better alternative.
When you say 25 million records, what criteria is that based on? It can be achieved through subset iteration, but there should be proper load testing in the pre-prod environment to confirm that the process does not cause any memory or DB issues.
You can also try using a SQL stored procedure and invoking it through BW.
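To illustrate the stored procedure suggestion, here is a sketch of invoking such a procedure over plain JDBC (in BW you would use the JDBC palette's procedure-call activity rather than hand-written Java). The procedure name PURGE_OLD_RECORDS and its single batch-size argument are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class CallPurgeProcedure {
    public static void main(String[] args) throws Exception {
        // args: JDBC URL, user, password
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
             CallableStatement cs = conn.prepareCall("{ call PURGE_OLD_RECORDS(?) }")) {
            cs.setInt(1, 10_000);  // hypothetical argument: rows to delete per loop iteration inside the procedure
            cs.execute();          // the chunked looping and committing happen inside the database, close to the data
        }
    }
}
```

The advantage is that the 25 million deletes never cross the network; BW only triggers the procedure and waits for it to finish.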

Related

Impala query with LIMIT 0

Being a production support team member, I investigate issues with various Impala queries. While researching an issue, I saw a team submit an Impala query with LIMIT 0, which obviously does not return any rows, and then again without LIMIT 0, which gives them results. I guess they submit these queries from IBM DataStage. Before I ask them why they do so, I wanted to check what could be a reason for someone to run a query with LIMIT 0. Is it just to check the syntax or the connection with Impala? I see a similar question discussed here in the context of SQL, but I thought I would ask anyway from an Impala perspective. Thanks, Neel
I think you are partially correct.
Please note that the query will still process all the data and only then apply the LIMIT clause.
LIMIT 0 is mostly used to:
Check whether the syntax of the SQL is correct. Impala still processes the records before applying the limit, so the SQL is completely validated. Some systems use this to verify automatically generated SQL before actually running it against the server (see the sketch after this list).
Limit fetching lots of rows from a huge table or data set every time you run a SQL statement.
Create an empty table using the structure of some other table when you do not want to copy its storage format, configuration, etc.
Avoid burdening Hue, or whatever interface is interacting with Impala: all the data will be processed, but none of it will be returned.
Run a rough performance test: this gives you some idea of the run time of the SQL. I say "some idea" because it is not the actual time to complete, but an estimate of the time the full SQL would take.
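For the first use case, here is a hedged JDBC sketch of the "validate first" pattern; the connection URL and the generated query text are placeholders, and a tool such as DataStage would do the equivalent internally.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LimitZeroCheck {
    public static void main(String[] args) throws Exception {
        // args[0]: an Impala JDBC URL (placeholder)
        String generatedSql = "SELECT col_a, col_b FROM some_table WHERE col_a > 10"; // hypothetical generated query
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement()) {
            // Dry run: returns no rows, but the statement is still parsed and validated against the
            // metadata, so syntax errors and missing columns/tables surface before the expensive run.
            stmt.executeQuery(generatedSql + " LIMIT 0");
            // Real run.
            stmt.executeQuery(generatedSql);
        }
    }
}
```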

Why does Informatica fetch more records from the source when the source itself has fewer records?

I have an issue in the production environment: one of the workflows has been running for more than one day, inserting records into a SQL Server DB. It is just a direct load mapping; there is no SQ override either. The monitor shows the SQ count as 7 million and the same number of records being inserted into the target, but the source DB shows only around 3 million records. How can this be possible?
Have you checked whether the source qualifier is joining more than one table? A screenshot of the affected mapping pipeline and an obfuscated log file would help.
Another thought... given that your job ran for a day, were any jobs run in that time to purge old records from the source table?
Cases where I have seen this kind of thing happen:
There's a SQL query override doing something different than I thought (e.g. joining some tables).
I'm looking at a different source - verify the connections and make sure to check the same object in the same database on the same server that PowerCenter is connecting to.
It's a reusable session being executed multiple times by different workflows. In that case, the Source/Target Statistics in the Workflow Monitor may refer to another execution.

How to enhance neo4j batch insertion?

I have an Oracle DB of roughly 20 million records. I used the BatchInserter to insert the data into my model.
The problem is that I have to loop over a result set containing all 20 million rows to get the properties needed for the insert, and it takes too long just to run that loop.
Has anyone tried something like this? What is the best way to do it in an optimal amount of time?
Can you share more details? Where do you have to loop?
Check http://neo4j.org/develop/import for some options.
If you have JDBC you can also drive the import directly from your JDBC results.
Just loop twice over the results: once for nodes and once for relationships.
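Here is a minimal sketch of that two-pass, JDBC-driven import (API names as in Neo4j 3.x's org.neo4j.unsafe.batchinsert package); the table and column names (PERSON, FRIENDSHIP, ID, NAME, FROM_ID, TO_ID) are hypothetical.

```java
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

public class OracleToNeo4j {
    public static void main(String[] args) throws Exception {
        // args: Oracle JDBC URL, user, password
        BatchInserter inserter = BatchInserters.inserter(new File("target/graph.db"));
        // Maps Oracle primary keys to Neo4j node ids; for very large imports consider a primitive long-to-long map.
        Map<Long, Long> nodeIds = new HashMap<>();
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
             Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(10_000); // stream rows instead of buffering the whole result set in memory

            // Pass 1: create all nodes.
            try (ResultSet rs = stmt.executeQuery("SELECT ID, NAME FROM PERSON")) {
                while (rs.next()) {
                    Map<String, Object> props = new HashMap<>();
                    props.put("name", rs.getString("NAME"));
                    nodeIds.put(rs.getLong("ID"), inserter.createNode(props, Label.label("Person")));
                }
            }
            // Pass 2: create all relationships between the nodes from pass 1.
            try (ResultSet rs = stmt.executeQuery("SELECT FROM_ID, TO_ID FROM FRIENDSHIP")) {
                while (rs.next()) {
                    inserter.createRelationship(nodeIds.get(rs.getLong("FROM_ID")),
                            nodeIds.get(rs.getLong("TO_ID")),
                            RelationshipType.withName("FRIEND_OF"), null);
                }
            }
        } finally {
            inserter.shutdown(); // flushes the store to disk; nothing is durable before this call
        }
    }
}
```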

Exporting 8 million records from Oracle to MongoDB

I have an Oracle database with 8 million records and I need to move them to MongoDB.
I know how to import some data into MongoDB from a JSON file using the import command, but I want to know whether there is a better way to achieve this, given the following issues.
Given the limit on execution time, how should I handle it?
The database is growing every second, so what is the plan to make sure that every record has been moved?
Given the limit on execution time, how should I handle it?
Don't do it with the JSON export/import. Instead, you should write a script that reads the data, transforms it into the correct format for MongoDB, and then inserts it there.
There are a few reasons for this:
Your tables / collections will not be organized the same way. (If they are, then why are you using MongoDB?)
This will allow you to monitor the progress of the operation. In particular, you can write to a log file every 1,000th entry or so to track progress and be able to recover from failures (see the sketch after this list).
This will test your new MongoDB code.
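Here is a hedged sketch of such a migration script, assuming the MongoDB Java sync driver (MongoClients API) and a hypothetical CUSTOMER table / customers collection: stream rows out of Oracle, reshape each one into a document, insert in batches, and log progress.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class OracleToMongo {
    public static void main(String[] args) throws Exception {
        // args: Oracle JDBC URL, user, password, MongoDB connection string
        try (Connection oracle = DriverManager.getConnection(args[0], args[1], args[2]);
             MongoClient mongo = MongoClients.create(args[3]);
             Statement stmt = oracle.createStatement()) {
            MongoCollection<Document> target = mongo.getDatabase("mydb").getCollection("customers");
            stmt.setFetchSize(5_000); // stream rows instead of loading all 8 million at once

            List<Document> batch = new ArrayList<>();
            long copied = 0;
            try (ResultSet rs = stmt.executeQuery("SELECT ID, NAME, EMAIL FROM CUSTOMER ORDER BY ID")) {
                while (rs.next()) {
                    // Reshape the relational row into the document model you actually want in MongoDB.
                    batch.add(new Document("_id", rs.getLong("ID"))
                            .append("name", rs.getString("NAME"))
                            .append("email", rs.getString("EMAIL")));
                    if (batch.size() == 1_000) {
                        target.insertMany(batch);
                        copied += batch.size();
                        batch.clear();
                        System.out.println("copied " + copied); // progress log doubles as a recovery checkpoint
                    }
                }
            }
            if (!batch.isEmpty()) {
                target.insertMany(batch); // flush the final partial batch
            }
        }
    }
}
```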
The database is growing every second, so what is the plan to make sure that every record has been moved?
There are two strategies here.
Track the entries that are updated and re-run your script on newly updated records until you are caught up.
Write to both databases while you run the script to copy data. Then, once the script is done and everything is up to date, you can cut over to just using MongoDB.
I personally suggest #2; it is the easiest method to manage and test while maintaining uptime. It's still going to be a lot of work, but it will allow the transition to happen.
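Strategy #2 can be as simple as wrapping the existing write path so every save goes to both stores until cut-over. A rough sketch under the same assumptions as above (MongoDB Java sync driver, hypothetical customer fields); the Oracle-side DAO interface is a stand-in for whatever persistence code already exists.

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.Document;

public class DualWriteCustomerDao {

    // Stand-in for the existing Oracle persistence code, which stays authoritative until cut-over.
    public interface OracleCustomerDao {
        void save(long id, String name, String email);
    }

    private final OracleCustomerDao oracleDao;
    private final MongoCollection<Document> mongoCustomers;

    public DualWriteCustomerDao(OracleCustomerDao oracleDao, MongoCollection<Document> mongoCustomers) {
        this.oracleDao = oracleDao;
        this.mongoCustomers = mongoCustomers;
    }

    public void save(long id, String name, String email) {
        oracleDao.save(id, name, email); // existing write path first
        // Idempotent upsert keeps the Mongo copy correct whether or not the bulk-copy script
        // has already reached this record.
        mongoCustomers.replaceOne(Filters.eq("_id", id),
                new Document("_id", id).append("name", name).append("email", email),
                new ReplaceOptions().upsert(true));
    }
}
```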

Hibernate with Oracle JDBC issue

I have a select query that takes 10 minutes to complete, as it runs through 10M records. When I run it through TOAD or a program using a normal JDBC connection I get the results back, but a job that uses Hibernate as the ORM does not return any results. It just hangs, even after 45 minutes. Please help.
Are you saying you are trying to retrieve 10M records using an ORM like Hibernate?
If that is the case you have one big problem: you need to redesign your application, because this is not going to work. And as for why it hangs, well, I bet it is because it runs out of memory.
Have you enabled SQL output for Hibernate? You need to set hibernate.show_sql to true in order to do that.
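For completeness, here is a small sketch of setting that property programmatically (it can equally go in hibernate.cfg.xml or hibernate.properties); the rest of the configuration is assumed to come from your existing mapping files.

```java
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class ShowSqlConfig {
    public static void main(String[] args) {
        SessionFactory sessionFactory = new Configuration()
                .configure()                                  // loads hibernate.cfg.xml from the classpath
                .setProperty("hibernate.show_sql", "true")    // print every generated SQL statement
                .setProperty("hibernate.format_sql", "true")  // make the printed SQL readable
                .buildSessionFactory();
        sessionFactory.close();
    }
}
```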
Once that's done, compare the generated SQL with the one you've been running through TOAD. Are they exactly the same or not?
I'm going to venture a guess here and say they're not, because once the SQL is generated Hibernate does nothing fancy: a connection is taken from a pool, a prepared statement is created and executed, so it should be no different from plain JDBC.
Thus the question most likely is how your HQL can be optimized. If you need any help with that, you'll have to post the HQL in question as well as the relevant mappings / table schemas. Running EXPLAIN on the query would help as well.

Resources