Partitions in Informatica / Oracle tables - etl

In my project they talk about an Informatica job that drops and then re-creates 'weekly/monthly partitions' every time it runs. Does that mean the weekly/monthly partitions need to be created in Informatica or in the Oracle tables?

Yes, it means you need a partitioned table in Oracle. You can use range partitioning; in your case that could be daily, weekly, or monthly partitions. When you create a new partition, you can drop an older one if its data is no longer required.
Informatica reference
Oracle reference
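As a rough sketch of the statements such a job might issue against the Oracle table, the Python helper below builds the DDL for one step of a weekly rolling window: add the next week's range partition and drop an expired one. The table name SALES_FACT, the P_<YYYYMMDD> naming scheme, and the assumption that the table is RANGE partitioned on a DATE column with no MAXVALUE partition are all hypothetical.

```python
from datetime import date, timedelta


def weekly_partition_ddl(table: str, new_week_start: date, expired_week_start: date) -> list[str]:
    """Build Oracle DDL for one step of a weekly rolling window (illustrative sketch).

    Hypothetical assumptions: `table` is RANGE partitioned on a DATE column,
    has no MAXVALUE partition, and names each partition P_<week start as YYYYMMDD>.
    """
    # Range bounds are exclusive: the week starting on day D holds rows < D + 7 days.
    upper_bound = new_week_start + timedelta(days=7)
    add_stmt = (
        f"ALTER TABLE {table} ADD PARTITION P_{new_week_start:%Y%m%d} "
        f"VALUES LESS THAN (TO_DATE('{upper_bound:%Y-%m-%d}', 'YYYY-MM-DD'))"
    )
    # Dropping a partition also removes its rows, so only drop data you no longer need.
    drop_stmt = f"ALTER TABLE {table} DROP PARTITION P_{expired_week_start:%Y%m%d}"
    return [add_stmt, drop_stmt]


for stmt in weekly_partition_ddl("SALES_FACT", date(2024, 1, 15), date(2023, 10, 16)):
    print(stmt)
```

The ADD has to run before that week's rows are loaded; otherwise Oracle rejects them with ORA-14400 because no partition covers their key.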

Related

How to add new partitions automatically to a partitioned table based on data inserted in Greenplum?

I have a partitioned table in Greenplum (which is based on PostgreSQL), partitioned by specific ranges of values.
Now I have to insert data into the same table again. The ranges for the new partitions might overlap with existing ones. I wrote an ALTER command with new start and end dates, but if there is any overlap, the command fails. So I need to create a partition for each date, to avoid the whole command failing.
I'm wondering whether there is a way in Greenplum to create partitions automatically based on the inserted data, just like Hive does.
Thanks for your help.
Greenplum does not (currently) create additional partitions for data which does not fit into an existing partition.
If you have a default partition on the table it will receive all the records which do not fit into one of the specified partitions. You can then use ALTER TABLE ... SPLIT DEFAULT PARTITION (see the documentation if required) to create the new partitions for any new dates at the end of the load batch.
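The default-partition workflow can be illustrated with a small in-memory model. This is a hypothetical toy, not Greenplum code: rows that match no range land in DEFAULT so the load never fails, and afterwards one daily partition is carved out per date found there (in Greenplum each of these would be an ALTER TABLE ... SPLIT DEFAULT PARTITION).

```python
from datetime import date, timedelta


class ToyRangeTable:
    """In-memory toy model (hypothetical, not Greenplum code) of a table that is
    range partitioned by date and has a DEFAULT partition."""

    def __init__(self) -> None:
        self.partitions: dict[str, tuple[date, date, list[date]]] = {}  # name -> (start, end, rows)
        self.default_rows: list[date] = []

    def add_partition(self, name: str, start: date, end: date) -> None:
        # Like a real ADD PARTITION, refuse a range that overlaps an existing one.
        for s, e, _ in self.partitions.values():
            if start < e and s < end:
                raise ValueError(f"{name} overlaps an existing partition")
        self.partitions[name] = (start, end, [])

    def insert(self, d: date) -> None:
        for s, e, rows in self.partitions.values():
            if s <= d < e:  # START inclusive, END exclusive
                rows.append(d)
                return
        self.default_rows.append(d)  # no range matches: the row goes to DEFAULT

    def split_default_by_day(self) -> list[str]:
        """After the load batch, create one daily partition per date held in DEFAULT."""
        created = []
        for d in sorted(set(self.default_rows)):
            name = f"p_{d:%Y%m%d}"
            self.add_partition(name, d, d + timedelta(days=1))
            self.partitions[name][2].extend(r for r in self.default_rows if r == d)
            created.append(name)
        self.default_rows.clear()
        return created


t = ToyRangeTable()
t.add_partition("p_week1", date(2024, 1, 1), date(2024, 1, 8))
for d in [date(2024, 1, 3), date(2024, 1, 10), date(2024, 1, 10), date(2024, 1, 11)]:
    t.insert(d)
print(t.split_default_by_day())  # ['p_20240110', 'p_20240111']
```

Because new partitions are only created for dates actually present in DEFAULT, they can never overlap the existing ranges, which avoids the failing ALTER described in the question.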

How are partitions generated on Oracle Exadata?

I want to know how Oracle creates them, because I want to keep all partitions for a given month in a single tablespace (TBS) for backup purposes. An example of a partition name that Oracle generated automatically is 'SYS_P321847'.

Refresh one Hive table from another Hive table

I have a few Hive tables that I bring in from an RDBMS every hour using Sqoop incremental imports and stage them. I join these tables to create new dimension tables. Whenever new rows arrive from the RDBMS in the Hive staging tables, I have to refresh the dimension tables. If there are no new rows, the dimension tables should not be refreshed. The Hive version I'm using does not have ACID features.
I need some advice on how to achieve this in Hive.
You can INSERT new data into existing Hive tables, as in any other database. Hive also supports the WHERE NOT EXISTS clause.
INSERT INTO TABLE MyDim
SELECT Id, Blah1, Blah2
FROM MySource s
WHERE NOT EXISTS
  (SELECT 1 FROM MyDim z WHERE z.Id = s.Id);
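The query above can be tried outside Hive. The sketch below runs the same anti-join with Python's built-in sqlite3 (SQLite standing in for Hive, an assumption for illustration only) and adds one idea that fits the question's "no refresh when there are no new rows" requirement: count the new rows first and skip the INSERT when there are none. In Hive the COUNT query is itself a job, so this trades an extra query for avoiding an empty INSERT.

```python
import sqlite3

NEW_ROWS = "FROM MySource s WHERE NOT EXISTS (SELECT 1 FROM MyDim z WHERE z.Id = s.Id)"


def refresh_dim(conn: sqlite3.Connection) -> int:
    """Insert source rows whose Id is not yet in MyDim; skip the INSERT entirely
    when nothing is new. Returns the number of rows inserted."""
    new_count = conn.execute(f"SELECT COUNT(*) {NEW_ROWS}").fetchone()[0]
    if new_count == 0:
        return 0  # nothing new: no INSERT runs (in Hive, no new file would be written)
    conn.execute(f"INSERT INTO MyDim SELECT s.Id, s.Blah1, s.Blah2 {NEW_ROWS}")
    conn.commit()
    return new_count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MySource (Id INTEGER, Blah1 TEXT, Blah2 TEXT)")
conn.execute("CREATE TABLE MyDim (Id INTEGER, Blah1 TEXT, Blah2 TEXT)")
conn.executemany("INSERT INTO MySource VALUES (?, ?, ?)", [(1, "a", "x"), (2, "b", "y")])
print(refresh_dim(conn))  # 2
print(refresh_dim(conn))  # 0: the second run finds no new Ids
```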
But there is a catch: each INSERT will create a new HDFS file, even when there are zero records involved. Too much fragmentation will reduce performance over time.
A weekly "compaction" job would help (e.g. rename the fragmented table, re-create it, INSERT OVERWRITE it from the renamed table, then drop the renamed table).
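The four compaction steps from the answer can be generated as a HiveQL script, for example to write into a .hql file and run with hive -f. This is a sketch under hypothetical assumptions: the table is managed and unpartitioned, and the temporary name <table>_frag is not already in use.

```python
def compaction_statements(table: str) -> list[str]:
    """Return HiveQL for the rename / re-create / INSERT OVERWRITE / drop compaction.

    Hypothetical assumptions: `table` is a managed, unpartitioned table and
    the temporary name <table>_frag is free.
    """
    tmp = f"{table}_frag"
    return [
        f"ALTER TABLE {table} RENAME TO {tmp};",                 # fragmented files now belong to tmp
        f"CREATE TABLE {table} LIKE {tmp};",                     # empty table, same definition
        f"INSERT OVERWRITE TABLE {table} SELECT * FROM {tmp};",  # rewrite into fewer, larger files
        f"DROP TABLE {tmp};",                                    # remove the fragmented copy
    ]


print("\n".join(compaction_statements("MyDim")))
```

Run it when no hourly load is active, since the table briefly does not exist between the first two statements.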

Hadoop & Hive as warehouse: daily data deliveries

I am evaluating the combination of Hadoop & Hive (& Impala) as a replacement for a large data warehouse. I have already set up a version, and read performance is great.
Can somebody give me a hint about which approach to use for daily data deliveries into a table?
I have a table in Hive based on a file I put into HDFS. But now new transactional data comes in every day.
How do I add it to the table in Hive?
Inserts are not possible, and HDFS cannot append. So what's the general concept I need to follow?
Any advice or direction to documentation is appreciated.
Best regards!
Hive allows data to be appended to a table; the underlying implementation of how this happens in HDFS doesn't matter. There are a number of things you can do to append data:
1. INSERT - You can just append rows to an existing table.
2. INSERT OVERWRITE - If you have to process data, you can perform an INSERT OVERWRITE to rewrite a table or partition.
3. LOAD DATA - You can use this to bulk insert data into a table and, optionally, use the OVERWRITE keyword to wipe out any existing data.
4. Partition your data.
5. Load data into a new table and swap the partition in.
Partitioning is great if you know you're going to perform date-based searches, and it gives you the ability to use options 1, 2, & 3 at either the table or the partition level.
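Combining options 3 and 4 for daily deliveries could look like the sketch below: one partition per delivery day, loaded with LOAD DATA ... OVERWRITE ... PARTITION so that re-running a failed day replaces that day instead of duplicating it. The table name transactions, its columns, and the landing layout <landing_dir>/<YYYYMMDD> are hypothetical.

```python
from datetime import date

# Hypothetical table: one Hive partition per delivery day, keyed by a string column dt.
CREATE_TABLE = (
    "CREATE TABLE transactions (tx_id BIGINT, amount DECIMAL(12,2)) "
    "PARTITIONED BY (dt STRING) "
    "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
)


def daily_load_statement(table: str, landing_dir: str, day: date) -> str:
    """Build a LOAD DATA statement for one day's delivery (illustrative sketch).

    OVERWRITE only replaces the dt=<day> partition, so other days are untouched
    and a re-run of the same day does not duplicate rows.
    """
    return (
        f"LOAD DATA INPATH '{landing_dir}/{day:%Y%m%d}' "
        f"OVERWRITE INTO TABLE {table} PARTITION (dt='{day.isoformat()}')"
    )


print(daily_load_statement("transactions", "/landing/transactions", date(2024, 1, 15)))
```

A date-based query such as SELECT SUM(amount) FROM transactions WHERE dt >= '2024-01-01' then only reads the matching partitions.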
You wrote "Inserts are not possible", but inserts are possible: for example, you can create a new table for the new data and insert from the new table into the old one.
But the simplest solution is to load the file's data into the Hive table with the command below:
load data inpath '/filepath' [overwrite] into table tablename;
If you use OVERWRITE, the existing data is replaced with the new data; otherwise, the new data is appended.
You can even schedule the load by putting the command in a shell script.
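The append-versus-overwrite behaviour of that command can be modeled with a toy in-memory "file system" (hypothetical paths, not Hive code). It also shows a detail that often surprises people: with a non-LOCAL path, LOAD DATA INPATH moves the files into the table's directory rather than copying them, so the landing directory ends up empty.

```python
def load_data_inpath(fs: dict[str, list[str]], src_dir: str, table_dir: str,
                     overwrite: bool = False) -> int:
    """Toy model (not Hive code) of LOAD DATA INPATH on HDFS.

    `fs` maps directory paths to file names; all paths here are hypothetical.
    Returns the number of files moved.
    """
    moved = fs.get(src_dir, [])
    fs[src_dir] = []                             # files are moved, not copied
    if overwrite:
        fs[table_dir] = []                       # OVERWRITE: existing table files are removed first
    fs.setdefault(table_dir, []).extend(moved)   # otherwise new files sit beside the old ones
    return len(moved)


fs = {
    "/landing/day1": ["tx_day1.csv"],
    "/landing/day2": ["tx_day2.csv"],
    "/warehouse/tx": ["initial.csv"],
}
load_data_inpath(fs, "/landing/day1", "/warehouse/tx")
print(fs["/warehouse/tx"])  # ['initial.csv', 'tx_day1.csv']
load_data_inpath(fs, "/landing/day2", "/warehouse/tx", overwrite=True)
print(fs["/warehouse/tx"])  # ['tx_day2.csv']
```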

How to change GLOBAL_STATS to YES for a particular partition

I have a partitioned SQL table.
I recently created a new partition. Even when I insert data that falls into that partition, its number of records (NUM_ROWS) stays null; it doesn't increase.
The only difference I found between the existing partitions and the newly created one is that GLOBAL_STATS (in ALL_TAB_PARTITIONS) is YES for all of them except the one I just created.
Please guide me on this.
Oracle doesn't recompute statistics automatically for every row inserted. You can analyze (gather statistics for) the partition manually. There is also no relation to GLOBAL_STATS.
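Analyzing a single partition manually is normally done with DBMS_STATS.GATHER_TABLE_STATS and its partname parameter. The sketch below only builds the PL/SQL call with bind variables, plus a follow-up check against ALL_TAB_PARTITIONS; executing them (for example with python-oracledb's cursor.execute(sql, binds)) is left out. The owner, table, and partition names are hypothetical, and upper-casing assumes unquoted identifiers.

```python
# PL/SQL block that gathers statistics for one partition only.
GATHER_PARTITION_STATS = """BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => :owner_name,
    tabname     => :table_name,
    partname    => :partition_name,
    granularity => 'PARTITION');
END;"""

# Follow-up query to confirm NUM_ROWS is populated afterwards.
CHECK_PARTITION_STATS = """SELECT partition_name, num_rows, global_stats, last_analyzed
FROM all_tab_partitions
WHERE table_owner = :owner_name AND table_name = :table_name"""


def gather_partition_stats_call(owner: str, table: str, partition: str) -> tuple[str, dict[str, str]]:
    """Return (plsql, binds) for gathering one partition's statistics.

    Unquoted Oracle identifiers are stored in upper case, hence .upper().
    """
    binds = {
        "owner_name": owner.upper(),
        "table_name": table.upper(),
        "partition_name": partition.upper(),
    }
    return GATHER_PARTITION_STATS, binds


sql, binds = gather_partition_stats_call("app_owner", "sales", "p_2024_01")
print(binds)
```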
