What's the best way to load huge volume tables using Informatica? [closed] - oracle

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Currently, in our project, we are using Informatica for data loading.
We have a requirement to load 100 tables (the count will grow over time), each with around 100 million records, and we need to perform delta operations on them. What would be the most efficient way to do this?

If possible, use truncate and load. That way, after each run you have a full, fresh copy of the data.
If you can't truncate the targets and need a delta, find some timestamp or counter that lets you read only the modified rows — new and updated — such as an "updated date" column. This limits the amount of data being read. It will not catch deletes, though. So...
Create a separate flow for detecting deleted rows that reads only the IDs, not the full rows. It still has to check all rows, but limited to a single column, so it should be quite efficient. Use it to delete the corresponding rows in the target, or just to mark them as deleted.
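The two reads described above can be sketched as source-qualifier queries. This is only an illustration: the table names (src_orders, tgt_orders, ctrl_last_run) and the updated_date audit column are hypothetical and depend on your schema.

```sql
-- Delta read: only rows changed since the last successful run,
-- assuming an UPDATED_DATE audit column and a run-control table.
SELECT *
FROM   src_orders
WHERE  updated_date > (SELECT last_run_ts
                       FROM   ctrl_last_run
                       WHERE  table_name = 'SRC_ORDERS');

-- Separate ID-only flow for detecting deletes: keys present in the
-- target but no longer present in the source.
SELECT t.order_id
FROM   tgt_orders t
LEFT JOIN src_orders s ON s.order_id = t.order_id
WHERE  s.order_id IS NULL;
```

The delete-detection query still scans both key sets, but reading a single indexed column is far cheaper than re-reading full rows.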

Related

Data structure for dealing with millions of records [closed]

Which data structure is appropriate for operating over millions of records, which I later need to iterate over?
While a simple linked list might be sufficient for your needs, if you also need to maintain records in sorted order and efficiently access records or begin iteration at an arbitrary point, I would recommend looking into a B-tree.
If you want to persist the data to disk, you should use a key-value store; these often use B-trees (or LSM trees) under the hood and also provide ACID guarantees. Examples include LMDB, BerkeleyDB, and LevelDB.
In short, use a database.
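The "use a database" advice can be made concrete. Most engines (SQLite, Oracle, and the stores named above) back their indexes with B-trees, which is exactly what gives you sorted order plus cheap seeks to an arbitrary starting point. The table and column names below are made up for illustration.

```sql
-- Hypothetical table; the PRIMARY KEY gives a B-tree keyed on id.
CREATE TABLE records (
    id      INTEGER PRIMARY KEY,
    payload TEXT
);

-- Seek to an arbitrary point and iterate in sorted order from there.
-- The B-tree makes the initial seek O(log n) instead of a full scan.
SELECT id, payload
FROM   records
WHERE  id >= 500000
ORDER  BY id
LIMIT  1000;
```

With millions of rows, the difference between this and walking a linked list from the head is the difference between milliseconds and seconds.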

What are the performance improving techniques in HBASE? [closed]

These can be techniques applied while creating a table, or while running other operations such as inserts, updates, and deletes.
I understand that options like BloomFilter and BlockCache can have an impact, but I would like to know what other techniques improve overall throughput. Also, can anyone show how to add a Bloom filter to an HBase table? I'd like to try it for practice.
Any help is appreciated.
Your question is too general. To properly build your data store in HBase, you should understand its internal storage logic and how data is distributed across the regions — that is probably the place to start. I would recommend getting acquainted with the LSM-tree and how HBase implements it. After that, read about proper design of the data schema, as it plays the main role in your performance: a correct schema with a good row key distributes your data evenly across the nodes and keeps you from ending up with hotspotting. Then you can start looking at optimization techniques such as Bloom filters, BlockCache, custom secondary indexes, and so on.
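Since the question specifically asks how to add a Bloom filter: in the HBase shell it is set per column family, at creation time or afterwards with alter. The table and column-family names here ('mytable', 'cf') are placeholders; valid BLOOMFILTER values are NONE, ROW, and ROWCOL.

```
hbase> create 'mytable', {NAME => 'cf', BLOOMFILTER => 'ROW', BLOCKCACHE => true}
hbase> alter  'mytable', {NAME => 'cf', BLOOMFILTER => 'ROWCOL'}
hbase> describe 'mytable'
```

ROW filters on the row key only; ROWCOL also includes the column qualifier, which helps Gets for specific columns but costs more space.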

Collection of stats in oracle [closed]

Collecting stats in Oracle: how does performance get improved?
When you collect stats on columns/indexes, the system gathers information such as: the total row count of the table, how many distinct values a column has, how many rows there are per value, whether the column is indexed, and if so whether the index is unique or non-unique.
This information is known as statistics.
1. How does performance get improved?
2. How does the parsing engine / cost-based optimizer (CBO) use the statistics to improve query performance?
3. Why do I need to collect stats on indexed columns, despite the fact that using indexed columns in WHERE clauses/joins already gives better performance?
How does performance get improved? Because more, and more accurate, information lets the optimizer choose a better execution plan.
For example:
When you travel to a destination for the first time, you gather information about routes, directions, landmarks, and so on. Once you have arrived, you have all that information, and the next time you can take the shortest path, or the route that takes the least time.
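In Oracle, the statistics described above are gathered with the DBMS_STATS package. A typical call looks like the following; the schema and table names are placeholders for your own objects.

```sql
-- Gather table statistics (row counts, distinct values, histograms);
-- CASCADE => TRUE also gathers statistics on the table's indexes,
-- which answers question 3: the CBO needs index stats (size, clustering)
-- to decide whether using the index is actually cheaper than a scan.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'MYSCHEMA',                    -- placeholder
    tabname          => 'MYTABLE',                     -- placeholder
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
    cascade          => TRUE
  );
END;
/
```

With fresh statistics, the CBO can estimate how many rows each predicate will return and cost the candidate plans accordingly; stale statistics can make it pick an index when a full scan would be cheaper, or vice versa.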

Oracle - Intervals timestamp [closed]

What is the best practice for managing time intervals in Oracle? For example: I have a room that will be rented from 8:15 till 9:00, so I have at least two fields, dt_start and dt_end, I suppose. I must not allow a rental to be entered from 8:45 till 9:20, since it overlaps. What would be the best table structure for that? Thanks.
There is no clear consensus on the best way to implement this. The answer certainly depends a great deal on your exact situation. The options are:
Table with unique constraint on ROOM_ID and a block of time. This is only realistic if the application allocates a reasonably small amount of time using reasonably large blocks. For example, if a room can only be allocated for at most a week, 5 minutes at a time. But if reservations are to the second, and can span over a year, this would require 31 million rows for one reservation.
Trigger. Avoid this solution if possible. The chance of implementing this logic in a trigger that is both consistent and concurrent is very low.
Materialized view. This is my preferred approach. For example, see my answer here.
Enforced by the application. This only works if the application can serialize access and if no ad hoc SQL is allowed.
Commercial Tool. For example, RuleGen.
A BEFORE INSERT trigger is the best way to accomplish what you need.
In the trigger, check that the new time range does not conflict with the existing bookings for that particular room; if it does, raise an error, otherwise let the insert happen.
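Whichever enforcement mechanism you choose (trigger, materialized view, or application check), the core test is the standard interval-overlap predicate: two intervals overlap if and only if each one starts before the other ends. Table and bind-variable names below are hypothetical.

```sql
-- Conflict check for a proposed booking (:new_start, :new_end).
-- b.dt_start < :new_end AND b.dt_end > :new_start is the classic
-- overlap predicate; a count > 0 means the slot is already taken.
SELECT COUNT(*)
FROM   bookings b
WHERE  b.room_id  = :room_id
AND    b.dt_start < :new_end
AND    b.dt_end   > :new_start;
```

For example, a proposed 8:45–9:20 booking conflicts with an existing 8:15–9:00 one: 8:15 < 9:20 and 9:00 > 8:45. Note that running this check in a trigger does not by itself protect against two concurrent sessions booking the same slot, which is why the other answer recommends caution with the trigger approach.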

Disk / Data read increase after putting on an index [closed]

I have a small query that runs pretty fast. I thought adding an index to an unindexed column would make it faster, but it turned out it didn't; in fact, it increased my disk reads and execution time. Can someone explain in detail how the index works and why it could decrease performance rather than increase it?
Thanks in advance!
PS: My RDBMS is Oracle.
Entirely possible on a small table. If the table is truly small, it may fit entirely in memory after a single read, and a full table scan can then be performed entirely in memory. Adding an index here requires reading at least one index page, followed by the data page, doubling the I/Os. This is an unusual case, but not unheard of.
However, this is just guesswork on my part. To truly find out what's going on grab the execution plan for your query with the index on, drop the index, and grab the execution plan without the index. Compare the plans, and decide if you want to re-add the index.
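In Oracle, the plan comparison described above can be done with EXPLAIN PLAN and DBMS_XPLAN. The query and index name below are placeholders for your own.

```sql
-- Capture the plan with the index in place...
EXPLAIN PLAN FOR
  SELECT * FROM mytable WHERE mycol = :val;   -- placeholder query
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- ...then drop the index and capture the plan without it.
DROP INDEX mytable_mycol_idx;                 -- placeholder index name
EXPLAIN PLAN FOR
  SELECT * FROM mytable WHERE mycol = :val;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Compare the operations (INDEX RANGE SCAN plus TABLE ACCESS BY INDEX ROWID versus TABLE ACCESS FULL) and the cost and buffer figures; on a table small enough to read in one go, the full scan will often win.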
Share and enjoy.
