We have a transactional database that holds just two weeks of data and an archive database that holds everything older than two weeks. Both databases share the same schema and sit on separate servers. A reporting application queries both databases, and the user picks which database to query from a dropdown. To improve the user experience we want to do away with the dropdown and make the database selection transparent in the background. These are the options we have in mind:
Use a UNION of the two SELECT queries via a database link
Query DB1 first and, if no records are found, query DB2
Since the data volume is large, we are apprehensive about both choices.
We would appreciate any other suggestions on how to approach this.
In my opinion, the two best choices are:
always give the user the data newer than a relative date (e.g. the last three months of data).
always give the user the newest n items (e.g. the newest 250 rows).
Returning all the data will be inefficient when you have a big dataset.
But if you want to strengthen user autonomy and protect the user's work (two important design principles in user interfaces), then let the user configure the relative time or the number of items desired. You can also let the user explore all/older data in particular situations (e.g. via a special window, a pagination system, a dedicated interface, or a completely new use case).
Let's see two examples. (I assume the user is querying the server with the newest data and that OLD is the name of the dblink that references the server holding data older than two weeks. I also assume the target table is named DATATABLE and the date column is called DATADATE.)
To retrieve the last three months (first choice):
SELECT * FROM DATATABLE
UNION ALL
SELECT * FROM DATATABLE@OLD WHERE MONTHS_BETWEEN(SYSDATE, DATADATE) <= 3;
And, to retrieve the last 250 rows (second choice):
SELECT *
FROM (SELECT * FROM DATATABLE ORDER BY DATADATE DESC)
WHERE ROWNUM <= 250
UNION ALL
SELECT *
FROM (SELECT * FROM DATATABLE@OLD ORDER BY DATADATE DESC)
WHERE ROWNUM <= (250 - (SELECT COUNT(*) FROM DATATABLE));
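Either statement can also be wrapped in a view so the reporting application only ever references a single object, which makes the database selection fully transparent. A minimal sketch for the first choice (the view name is illustrative; it assumes the same DATATABLE/OLD names as above):
CREATE OR REPLACE VIEW DATATABLE_LAST_3_MONTHS AS
SELECT * FROM DATATABLE
UNION ALL
SELECT * FROM DATATABLE@OLD
WHERE MONTHS_BETWEEN(SYSDATE, DATADATE) <= 3;
The application then simply selects from DATATABLE_LAST_3_MONTHS.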
There is a query with multiple inner joins. It involves two views, one of which is based on four tables; in total the query joins four sources (including the two views).
The same query, with the same amount of data in the source tables, runs in both Oracle and DB2. In DB2 it takes 2 minutes to load 3 million records, while in Oracle it takes two hours. The same indexes exist on all source tables in both environments. Is the behaviour of views (when used in joins) different between Oracle and DB2?
Here is a dummy query:
INSERT INTO TABLE_A
SELECT adf.column1,
       adf.column2,
       dd.column3,
       SUM(otl.column4) AS column4,
       SUM(otl.column5) AS column5,
       (CASE WHEN SUM(otl.column5) = 0 THEN 0
             ELSE ROUND(CAST(SUM(otl.column4) AS DECIMAL(19,2))
                        / ABS(CAST(SUM(otl.column5) AS DECIMAL(19,2))), 4)
        END) AS taxl_unrlz_cgl_pct
FROM view_a adf
INNER JOIN table_b hr ON hr.hh_ref_id = adf.hh_ref_id
       AND hr.col_typ_cd = 'FIRM'
       AND hr.col_end_dt = TO_DATE('1/1/2900', 'MM/DD/YYYY')
INNER JOIN dw.table_c ar ON ar.colb_id = adf.colb_id
       AND ar.col_cd = '#'
       AND ar.col_num BETWEEN 10000000 AND 89999999
       AND ar.col_dt IS NULL
INNER JOIN table_d dd ON dd.col_id = adf.col_id
INNER JOIN view2 otl ON otl.cola_id = ar.cola_id
GROUP BY adf.column1, adf.column2, dd.column3;
Technically, both DB2 and Oracle will try to rewrite the query in the most efficient way possible, using the base query you have coded. But one of the common (though not frequent) issues I have seen with multi-table views is the DBMS being unable to rewrite the query against the underlying tables. Depending on the complexity of the view itself, and sometimes on the additional joins, the DBMS may not be able to merge the view properly, and therefore cannot use the indexes on the tables underneath it. When this happens, the view is materialized into a work table and the query performs a table scan on that materialized result.
There is no consistent pattern for when this happens, so you need to check on a case-by-case basis.
Since you are mentioning 2 hours vs 2 minutes, that is in all probability what is going on here. Check the access path on both Oracle and DB2, but also make sure statistics are up to date and the access path is based on the latest stats; otherwise it won't be an apples-to-apples comparison.
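A minimal sketch of how that check might look on the Oracle side (the exact statement is an assumption; on the DB2 side the explain facility, e.g. db2exfmt over the explain tables, serves the same purpose). If the plan shows VIEW_A being materialized and scanned instead of index access on its underlying tables, the view was not merged:
BEGIN
  -- refresh optimizer statistics on one of the joined tables (names from the question)
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE_D');
END;
/
EXPLAIN PLAN FOR
SELECT adf.column1, dd.column3
FROM view_a adf
INNER JOIN table_d dd ON dd.col_id = adf.col_id;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);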
I am a SQL Server guy who has just started working on Netezza. One thing that stands out to me is a daily query to find the size of a table filtered by year: 2016, 2015, 2014, ...
What I am using now is something like the query below, and it works for me, but I wonder if there is a better way to do it:
select count(1)
from table
where extract(year from datecolumn) = 2016
extract is a built-in function, and applying a function across a table with 10 billion+ rows would be unthinkable in SQL Server, to my knowledge.
Thank you for your advice.
The only problem I see with the query is the WHERE clause, which applies a function to the column side of the comparison. That effectively disables zone maps and forces Netezza to scan all data pages, not only those holding data from that year.
Instead, write something like:
select count(1)
from table
where datecolumn between '2016-01-01' and '2016-12-31'
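If datecolumn is actually a timestamp rather than a date, a half-open range is safer, because it also catches times after midnight on the last day (same illustrative names as above):
select count(1)
from table
where datecolumn >= '2016-01-01' and datecolumn < '2017-01-01'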
A more generic alternative is to create a 'date dimension' table with one row per day covering the span of your data (and a couple of years into the future).
This is an example for Postgres: https://medium.com/@duffn/creating-a-date-dimension-table-in-postgresql-af3f8e2941ac
This enables you to write code like this:
select count(1)
from table t
join d_date d on t.datecolumn = d.date_actual
where d.year_actual = 2016
You may not have the generate_series() function on your system, but a 'select row_number()...' over any sufficiently large table can do the same trick. A download is available here: https://www.ibm.com/developerworks/community/wikis/basic/anonymous/api/wiki/76c5f285-8577-4848-b1f3-167b8225e847/page/44d502dd-5a70-4db8-b8ee-6bbffcb32f00/attachment/6cb02340-a342-42e6-8953-aa01cbb10275/media/generate_series.tgz
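A minimal sketch of the row_number() approach (some_big_table and some_column are illustrative; it relies on Netezza's date + integer arithmetic and builds roughly 20 years of days):
create table d_date as
select date_actual
     , extract(year from date_actual) as year_actual
from (
    select cast('2010-01-01' as date) + (rn - 1) as date_actual
    from (
        select row_number() over (order by some_column) as rn
        from some_big_table
    ) x
    where rn <= 7300
) d;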
A couple of further notes on 'date interval' WHERE clauses:
Those columns are the most likely candidates for zone-map optimization. Add an 'organize on (datecolumn)' clause at the bottom of your table DDL and groom the table. That causes Netezza to move records onto pages holding similar dates, and query times will improve.
Furthermore, if the table is big, you should ensure that the 'distribute on' clause results in an even distribution across data slices: the query will never be faster than the slowest data slice.
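A minimal DDL sketch showing both clauses (the sales table and its columns are illustrative, not from the question):
CREATE TABLE sales
(
    sales_id    BIGINT,
    datecolumn  DATE,
    amount      NUMERIC(18,2)
)
DISTRIBUTE ON (sales_id)   -- even spread across data slices
ORGANIZE ON (datecolumn);  -- zone-map friendly clustering on the date
GROOM TABLE sales RECORDS ALL;  -- reorder existing rows by the organizing key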
I hope this helps
I am using Oracle Express Edition. I want to know how to select only the user-created tables in an Oracle DB.
I am using this query:
select * from user_tables;
But it shows 24 rows, and I have only created 6 tables. I don't know why, and from where, the other tables (like APEX$_WS_FILES, DEPT, DEMO_USERS, APEX$_ACL, APEX$_WS_HISTORY, etc.) are showing up.
How can I exclude those unwanted tables?
These tables were presumably created during an Oracle APEX-related installation. You can use the steps below to filter them out.
SELECT * FROM ALL_OBJECTS WHERE OBJECT_TYPE = 'TABLE' AND OWNER = 'your_user' ORDER BY created;
As these tables were installed by an application, they were most probably created within a small, coherent time window, say 30 minutes or an hour. So if you order by creation time, they will all appear in consecutive rows in the output of the above query. Identify the time frame in which their installation started and finished, then run the query again with that time frame filtered out. You should then get only your own tables.
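A minimal sketch of that second query (the timestamps are purely illustrative placeholders for whatever window you identify):
SELECT object_name, created
FROM ALL_OBJECTS
WHERE object_type = 'TABLE'
AND owner = 'your_user'
AND created NOT BETWEEN TO_DATE('2015-01-10 09:00', 'YYYY-MM-DD HH24:MI')
                    AND TO_DATE('2015-01-10 09:30', 'YYYY-MM-DD HH24:MI')
ORDER BY created;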
I'm using Oracle SQL Developer version 4.02.15.21.
I need to write a query that accesses multiple databases. All I'm trying to do is get a list of all the IDs present in "TableX" in each database (there is an instance of TableX in each of these databases, but with different values) and union all of the results together into one big list.
My problem comes when accessing more than 4 databases -- I get this error: ORA-02020: too many database links in use. I cannot change the open_links maximum in the INIT.ORA file.
So I've tried dynamically opening/closing these links:
SELECT Local.PUID FROM TableX Local
UNION ALL
----
SELECT Xdb1.PUID FROM TableX@db1 Xdb1;
ALTER SESSION CLOSE DATABASE LINK db1
UNION ALL
----
SELECT Xdb2.PUID FROM TableX@db2 Xdb2;
ALTER SESSION CLOSE DATABASE LINK db2
UNION ALL
----
SELECT Xdb3.PUID FROM TableX@db3 Xdb3;
ALTER SESSION CLOSE DATABASE LINK db3
UNION ALL
----
SELECT Xdb4.PUID FROM TableX@db4 Xdb4;
ALTER SESSION CLOSE DATABASE LINK db4
UNION ALL
----
SELECT Xdb5.PUID FROM TableX@db5 Xdb5;
ALTER SESSION CLOSE DATABASE LINK db5
However, this produces 'ORA-02081: database link is not open' for whichever link is closed last.
Can someone please suggest an alternative or adjustment to the above?
Please provide a small sample of your suggestion with syntactically correct SQL if possible.
If you can't change the open_links setting, you cannot have a single query that selects from all the databases you want to query.
If your requirement is to query a large number of databases via database links, it seems highly reasonable to change the open_links setting. If one set of people tells you that you need to do X (query data from a large number of databases) and another set tells you that you cannot do X, it almost always makes sense to have those two sets of people talk and figure out which imperative wins.
If the problem does not have to be solved with a single query, you have options. You can write a bit of PL/SQL, for example, that selects the data from each table in turn and does something with it. Depending on the number of database links involved, it may make sense to write a loop that generates a dynamic SQL statement for each database link, executes it, and then closes the link.
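A minimal sketch of that loop (the staging table puid_results and the link names are illustrative; the COMMIT ends the distributed transaction so the link can actually be closed):
DECLARE
  TYPE t_links IS TABLE OF VARCHAR2(30);
  v_links t_links := t_links('db1', 'db2', 'db3', 'db4', 'db5');
BEGIN
  FOR i IN 1 .. v_links.COUNT LOOP
    -- pull the remote IDs into a local staging table
    EXECUTE IMMEDIATE
      'INSERT INTO puid_results (puid) SELECT puid FROM TableX@' || v_links(i);
    COMMIT;
    -- release the link before opening the next one
    EXECUTE IMMEDIATE 'ALTER SESSION CLOSE DATABASE LINK ' || v_links(i);
  END LOOP;
END;
/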
If you need to provide a user with the ability to run a single query that returns all the data, you can write a pipelined table function that implements this sort of loop with dynamic SQL, and then let the user query the pipelined table function. This isn't really a single query that fetches the data from all the tables, but it is as close as you're likely to get without raising the open_links limit.
Is it possible to determine the average number of concurrent connections on a large 10g database installation?
Any ideas?
This is probably more of a ServerFault question.
On a basic level, you could do this by regularly querying v$session to count the current sessions, storing that number somewhere, and averaging it over time.
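A minimal sketch of that idea (the session_samples table is illustrative, the INSERT would be run on a schedule such as a database job, and querying v$session requires the appropriate privilege):
CREATE TABLE session_samples
(
  sample_time  DATE,
  session_cnt  NUMBER
);
INSERT INTO session_samples
SELECT SYSDATE, COUNT(*) FROM v$session WHERE type = 'USER';
COMMIT;
SELECT AVG(session_cnt) FROM session_samples;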
But there are already good utilities available to help with this. Look into STATSPACK. Then look at the scripts shown here to get you started.
Alternatively you could install a commercial monitoring application like Spotlight on Oracle.
If you have Oracle Enterprise Manager set up you can create a User Defined Metric which records SELECT COUNT(*) FROM V$SESSION. Select Related Links -> User Defined Metrics to set up a new User Defined Metric. Once it collects some data you can get the data out in raw form or it will do some basic graphing for you. As a bonus you can also set up alerting if you want to be e-mailed when the metric reaches a certain value.
The tricky bit is recording the connections. Oracle doesn't do this by default, so if you haven't got anything in place then you won't have a historical record.
The easiest way to start recording connections is with Oracle's built-in audit functionality. It's as simple as:
audit session
/
We can see the records of each connection in a view called dba_audit_session.
Now what? The following query uses a Common Table Expression to generate a range of datetime values spanning 8th July 2009 in five-minute chunks. The output of the CTE is joined to the audit view for that date; a count is calculated, for each five-minute increment, of the connections that span it.
with t as
     ( select to_date('08-JUL-2009','DD-MON-YYYY') + ((level-1) * (300/86400)) as five_mins
       from dual connect by level <= 288)
select to_char(t.five_mins, 'HH24:MI') as five_mins
     , sum(case when t.five_mins between ssn.timestamp and ssn.logoff_time
                then 1
                else 0 end) as connections
from t
   , dba_audit_session ssn
where trunc(ssn.timestamp) = to_date('08-JUL-2009','DD-MON-YYYY')
group by t.five_mins
order by t.five_mins
/
You can then use this query as the input to a query that calculates the average number of connections.
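A minimal sketch of that outer query (same illustrative date and the same assumptions as above):
with t as
     ( select to_date('08-JUL-2009','DD-MON-YYYY') + ((level-1) * (300/86400)) as five_mins
       from dual connect by level <= 288)
select avg(connections) as avg_connections
from (
       select t.five_mins
            , sum(case when t.five_mins between ssn.timestamp and ssn.logoff_time
                       then 1
                       else 0 end) as connections
       from t
          , dba_audit_session ssn
       where trunc(ssn.timestamp) = to_date('08-JUL-2009','DD-MON-YYYY')
       group by t.five_mins
     );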
This is a fairly crude implementation: I chose five-minute increments for display reasons, but obviously the finer-grained the increment, the more accurate the measure. Be warned: if you make the increments too fine-grained and you have a lot of connections, the resulting cross join will take a long time to run!