Difference between a materialized view and a table which is refreshed incrementally - Oracle

There is a question of why we create materialized views. I have a table, and to refresh the table incrementally I have a DBMS job which merges data from a different table into it. So it is equivalent to a materialized view with fast refresh. Is there any difference? Which implementation is better of the above two?

A materialized view can be refreshed on demand or on a defined refresh frequency, and it can consist of a join of two or more tables.
When creating a materialized view, you have the option of specifying whether the refresh occurs ON DEMAND or ON COMMIT. In the case of ON COMMIT, the materialized view is changed every time a transaction commits, thus ensuring that the materialized view always contains the latest data. Alternatively, you can control the time when refresh of the materialized views occurs by specifying ON DEMAND. In the case of ON DEMAND materialized views, the refresh can be performed with refresh methods provided in either the DBMS_SYNC_REFRESH or the DBMS_MVIEW packages:
Here's the documentation link
Also materialized views can be refreshed incrementally.
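For example, here is a minimal sketch of an on-demand refresh call via DBMS_MVIEW (the view name CUSTOMERS_MW is illustrative):
BEGIN
  -- method => 'F' requests a fast (incremental) refresh; 'C' forces a complete one
  DBMS_MVIEW.REFRESH(list => 'CUSTOMERS_MW', method => 'F');
END;
/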
Your Custom Solution
Takes a lot of code and does not scale well.

Materialized views following certain conventions can be fast refreshed using a materialized view log. That means Oracle will only have to process actual changes in order to refresh the materialized view, where a merge operation would have to compare all rows every time. Therefore a materialized view would allow for much faster and more frequent refreshes especially with larger tables.
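As a rough sketch of that setup, using the sg.customers table that appears elsewhere in this thread (the log columns and view name are assumptions):
-- The log captures row-level changes on the source table
CREATE MATERIALIZED VIEW LOG ON sg.customers
  WITH ROWID, SEQUENCE (state_id)
  INCLUDING NEW VALUES;
-- Refreshing this view now reads the log instead of re-scanning the table
CREATE MATERIALIZED VIEW customers_mv
  REFRESH FAST ON DEMAND
AS
SELECT state_id, COUNT(*) c
FROM sg.customers
GROUP BY state_id;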

After a lot of searching I found something a materialized view can do that a normal table cannot: query rewrite. Below is my finding.
SQL> GRANT GLOBAL QUERY REWRITE to mydbdba;
SQL> CREATE MATERIALIZED VIEW customers_mw ENABLE QUERY REWRITE
AS
SELECT COUNT(*) c,state_id FROM sg.customers GROUP BY state_id;
SQL> alter session set QUERY_REWRITE_ENABLED=TRUE;
Session altered.
SQL> SELECT COUNT(*) c,state_id FROM sg.customers GROUP BY state_id;
Execution Plan
Plan hash value: 799451518
----------------------------------------------------------------------------------------------
| Id | Operation                    | Name         | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT             |              |   52 |   364 |     3  (0)| 00:00:01 |
|  1 | MAT_VIEW REWRITE ACCESS FULL | CUSTOMERS_MW |   52 |   364 |     3  (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
SQL> alter session set QUERY_REWRITE_ENABLED=FALSE;
Session altered.
SQL> SELECT COUNT(*) c,state_id FROM sg.customers GROUP BY state_id;
Execution Plan
Plan hash value: 1577413243
------------------------------------------------------------------------------------
| Id | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT   |           |    52 |   156 |   327  (1)| 00:00:01 |
|  1 |  HASH GROUP BY     |           |    52 |   156 |   327  (1)| 00:00:01 |
|  2 |   TABLE ACCESS FULL| CUSTOMERS | 50000 |  146K |   326  (1)| 00:00:01 |
------------------------------------------------------------------------------------
You can see in the above example how the materialized view is used instead of scanning the whole table again.

Related

How do I add a column to a database table using a NiFi Processor?

I have a table that consists of two columns:
cattlegendernm | price
---------------+-------
Female         | 10094
Female         | 12001
Male           | 12704
I would like to add another column, filename, to this table using a NiFi processor.
Which processor and SQL query should I use for this?
A PutSQL processor would be suitable for this. Put a DDL statement such as
ALTER TABLE T ADD COLUMN filename VARCHAR(150)
into the SQL Statement attribute on the Properties tab, where PostGRES_DB represents a pre-configured Controller Service used to interact with the database.
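If the new column should then be populated per flow file, PutSQL can also run parameterized statements: each ? placeholder is bound from the flow file attributes sql.args.N.type and sql.args.N.value. A rough sketch, assuming the table T above and an id key column (both illustrative):
-- SQL Statement property; the two ? markers are bound from flow file attributes
UPDATE T SET filename = ? WHERE id = ?
-- e.g. sql.args.1.type = 12 (VARCHAR), sql.args.1.value set from ${filename}
--      sql.args.2.type = 4  (INTEGER), sql.args.2.value set to the row key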

Check constraint to enforce referential integrity?

Can we use a check constraint to enforce a referential constraint? Let's say I have a column that contains a UUID; this UUID can reference either table A or table B depending on the value of a second column.
| ID | Type     | PK in Other Table |
|----|----------|-------------------|
| 1  | Employee | 500               |
| 2  | Store    | 7000              |
So record #1 points to a record in the employee table, and #2 points to a record in the store table, each with the respective PK. The goal is to enforce referential integrity based on "Type".
Not with this data model, no.
You could have separate columns, i.e. employee_id and store_id, with foreign key constraints to the appropriate tables and a check constraint that ensures that only the correct column for the particular type is entered.
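A minimal sketch of that approach (table and column names are illustrative, not from the question):
CREATE TABLE reference_row (
    id          INTEGER PRIMARY KEY,
    type        VARCHAR(20) NOT NULL,
    employee_id INTEGER REFERENCES employee (id),
    store_id    INTEGER REFERENCES store (id),
    -- exactly one FK column may be filled in, and it must match the declared type
    CONSTRAINT fk_matches_type CHECK (
           (type = 'Employee' AND employee_id IS NOT NULL AND store_id IS NULL)
        OR (type = 'Store'    AND store_id    IS NOT NULL AND employee_id IS NULL)
    )
);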
There are potentially other ways to set up the data model depending on what you're actually modeling. I'm a bit hard-pressed to think of employees and stores as separate subtypes of some higher level type. But if your actual use case is something else, it potentially makes sense to have a supertype table that is the actual parent that all the tables are children of.

How can I check the database of a table?

I have a table named SOME_TABLE but I don't know the database it belongs to.
How do I check it?
I suppose it's impossible to do from the Hive level, because you need to select a database first...
Pawel
Query the metastore directly
Demo
Hive
create table SOME_TABLE (i int);
Metastore (MySQL)
use metastore;
select d.name
from TBLS as t
join DBS as d
  on d.DB_ID = t.DB_ID
where t.TBL_NAME = 'some_table'
;
+----------+
| name |
+----------+
| local_db |
+----------+
You can use the Hive command SHOW DATABASES; to list all databases and then use the SHOW TABLES IN database_name LIKE 'table_name'; command to see if the table exists in that database.
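For example, a quick sketch using the database from the demo above:
SHOW DATABASES;
SHOW TABLES IN local_db LIKE 'some_table';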

Can oracle merge bitmap indexes during fast full scan?

I have a large fact table with 300M rows and 50 columns. There are multiple reports over this table, and each report uses only a couple of the 50 columns.
Each column in the fact table is indexed with a BITMAP INDEX. The idea is to use these indexes as one-column versions of the original table, assuming that Oracle can merge BITMAP INDEXes easily.
If I use several columns from the table in the WHERE clause, I can see that Oracle is able to merge these indexes effectively: there is a BITMAP AND operation in the execution plan, as expected.
If I use several columns from the table in the SELECT list, I can see that, depending on column selectivity, Oracle either performs an unneeded TABLE ACCESS or a BITMAP CONVERSION [to rowids] followed by a HASH JOIN of these conversions.
Is there any way to eliminate the HASH JOIN when joining several BITMAP INDEXes? Is there any hint in Oracle to force a BITMAP MERGE when columns appear in the SELECT list rather than the WHERE clause?
Intuitively the HASH JOIN of BITMAP INDEXes seems like an unneeded operation in the SELECT case, given that it is indeed unneeded in the WHERE case. But I couldn't find any evidence that Oracle can avoid it.
Here are some examples:
SELECT a, b, c /* 3 BITMAP CONVERSIONs [to rowids] and then 2 unneeded HASH JOINS */
FROM fact;
SELECT a, b, c, d, e /* TABLE ACCESS [full] instead of reading all the data from indexes */
FROM fact;
SELECT a /* BITMAP INDEX [fast full scan] as expected*/
FROM fact
WHERE b = 1 and c = 2; /* BITMAP AND over two BITMAP INDEX [single value] as expected */
Are there any hints to optimize examples #1 and #2?
In production I use Oracle 11g, but I tried similar queries on Oracle 12c and it looks like both versions behave the same.
After some research it looks like Oracle 12c is incapable of efficiently joining BITMAP INDEXes when they are used in the SELECT clause.
There is no dedicated access path for joining BITMAP INDEXes referenced in the SELECT clause, so a HASH JOIN is used in this case.
Oracle cannot use the BITMAP MERGE access path in this case, because a merge performs an OR operation between two bitmaps:
How Bitmap Merge Works
A merge uses an OR operation between two bitmaps. The resulting bitmap selects all rows from the first bitmap, plus all rows from every subsequent bitmap.
Detailed analysis showed that only a HASH JOIN was considered by the cost optimizer in my case. I wasn't able to find any evidence that BITMAP INDEXes can be used efficiently in the SELECT list. The Oracle documentation suggests using BITMAP INDEXes only in the WHERE clause or for joining the fact table to dimensions:
And either of the following are true:
The indexed column will be restricted in queries (referenced in the WHERE clause).
or
The indexed column is a foreign key for a dimension table. In this case, such an index will make star transformation more likely.
In my case it is neither of the two.
I think what you are seeing is essentially the "index join" access path in action :) Oracle needs to join the data from both scans on ROWID to stitch the rows together, and the hash join is the only method open to it. The fact that you are using bitmap indexes is actually irrelevant; you see the same behaviour with b-tree indexes:
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1973K| 43M| 137K (30)| 00:00:06 |
| 1 | VIEW | index$_join$_001 | 1973K| 43M| 137K (30)| 00:00:06 |
|* 2 | HASH JOIN | | | | | |
|* 3 | INDEX FAST FULL SCAN| IO | 1973K| 43M| 17201 (78)| 00:00:01 |
|* 4 | INDEX FAST FULL SCAN| IT | 1973K| 43M| 17201 (78)| 00:00:01 |
-------------------------------------------------------------------------------------------
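For what it's worth, a plan along those lines can be reproduced with two single-column b-tree indexes; whether the optimizer picks the index join depends on statistics, and the INDEX_JOIN hint can encourage it. Index and column names here are illustrative:
CREATE INDEX io ON fact (a);
CREATE INDEX it ON fact (b);

-- The IS NOT NULL predicates let Oracle answer the query from the indexes
-- alone, since b-tree entries are not created for all-NULL keys
SELECT /*+ INDEX_JOIN(fact) */ a, b
FROM fact
WHERE a IS NOT NULL AND b IS NOT NULL;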

When does the VIEW calculate or change the value?

I created a view, like:
CREATE VIEW a AS SELECT b.kcu_id, sum(b.price) FROM b GROUP BY b.kcu_id;
I created the view because my table contains too many rows, 10,000 or more, and it is too costly to sum that many rows every time a get operation is called.
I use Spring Data JPA to get the data from the view. What I want to ask is: when I use the getPrice method to get the sum of prices, is the sum calculated when I call the get method, or does the database calculate it when the price column changes in table b?
For your info, the price column rarely changes in my case.
If it's just a "regular" view like you have in your example, the data will be calculated anew every time you query it. A view, after all, is just a slightly modified view of the data in the table at any given point.
You can have what they call "materialised views", which are more like a physical table that's updated from the underlying table periodically, but you generally have to do that differently than with a normal "create view" command.
With PostgreSQL, the commands you're looking for are:
create materialized view
refresh materialized view
The former creates a materialised view in pretty much the same way as your create view and also populates the view with data (unless you've used the with no data clause). It also remembers the underlying query used to create the view (like any view does) so that you can update the data at some later point (which is what the refresh command above does).
By way of example, the following PostgreSQL code:
create table below (val integer);
insert into below values (42);
create materialized view above as select * from below;
insert into below values (99);
select * from below;
select * from above;
refresh materialized view above;
select * from above;
will materialise the view when the table contains only the 42 and later refresh it to include the 99 as well:
Underlying table 'below' with both data items:
| val |
|-----|
| 42 |
| 99 |
Materialised view 'above', created before the insert of 99:
| val |
|-----|
| 42 |
Materialised view 'above' after refreshing the view:
| val |
|-----|
| 42 |
| 99 |
Provided you're willing to live with the possibility that the data may be a little out of date, that's probably the best way to do it. Given your comment that the "price column is rarely change[d]", that may not be an issue.
However, I'm actually quite surprised that 10,000 rows is causing you a problem; it's not really that big a table. Hence you may want to look at other possible fixes, such as ensuring you have an index on the kcu_id column.
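For example, a one-line sketch assuming the table and column names from the question:
create index b_kcu_id_idx on b (kcu_id);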
