I have a query like this:
select PROMOTER_DSMID,
PROMOTER_NAME,
PROMOTER_MSISDN,
RETAILER_DSMID,
RETAILER_MSISDN,
RETAILER_NAME ,
ATTENDANCE_FLAG,
ATTENDANCE_DATE
from PROMO_ATTENDANCE_DETAILS
where PROMOTER_DSMID not in
(SELECT PROMOTER_DSMID
FROM PROMO_ATTENDANCE_DETAILS
WHERE PROMOTERS_ASM_DSMID='ASM123'
AND ATTENDANCE_FLAG='TRUE'
AND TRUNC(ATTENDANCE_DATE) ='16-07-17')
and PROMOTERS_ASM_DSMID='ASM123'
AND ATTENDANCE_FLAG='FALSE'
AND TRUNC(ATTENDANCE_DATE) ='16-07-17';
This query is taking too much time when I run this in PROD database because of large number of records.
I need to write a procedure for this but am not able to get the correct approach of how to write a procedure. Somebody please guide me.
"was thinking to write a proc in which inner select statement can put the data in some temporary table and then from that temporary table I can run the outer select statement"
No need for that. Use a WITH clause to select the data once and use it twice.
with cte as (
select PROMOTER_DSMID,
PROMOTER_NAME,
PROMOTER_MSISDN,
RETAILER_DSMID,
RETAILER_MSISDN,
RETAILER_NAME ,
ATTENDANCE_FLAG,
ATTENDANCE_DATE
from PROMO_ATTENDANCE_DETAILS
where PROMOTERS_ASM_DSMID='ASM123'
AND TRUNC(ATTENDANCE_DATE) ='16-07-17'
)
select *
from cte
where ATTENDANCE_FLAG='FALSE'
AND PROMOTER_DSMID not in
(SELECT PROMOTER_DSMID
FROM cte
where ATTENDANCE_FLAG='TRUE')
;
This will perform better than a temporary table, which involve a lot of disk I/O.
There are other possible performance improvements, depending on the usual tuning
considerations: data volume and skew, indexes, etc
Related
All the Oracles out here,
I have an Oracle PL/SQL procedure but very small data that can run on the query. I suspect that when the data gets large, the query might start performing back. Are there ways in which I can check for performance and take corrective measure even before the data build up? If I wait for the data buildup, it might get too late.
Do you have any general & practical suggestions for me? Searching the internet did not get me anything convincing.
Better to build yourself some test data to get an idea of how things will perform. Its easy to get started, eg
create table MY_TEST as select * from all_objects;
gives you approx 50,000 rows typically. You can scale that easily with
create table MY_TEST as select a.* from all_objects a ,
( select 1 from dual connect by level <= 10);
Now you have 500,000 rows
create table MY_TEST as select a.* from all_objects a ,
( select 1 from dual connect by level <= 10000);
Now you have 500,000,000 rows!
If you want unique values per row, then add rownum eg
create table MY_TEST as select rownum r, a.* from all_objects a ,
( select 1 from dual connect by level <= 10000);
If you want (say) 100,000 distinct values in a column then TRUNC or MOD. You can also use DBMS_RANDOM to generate random numbers, strings etc.
Also check out Morten's test data generator
https://github.com/morten-egan/testdata_ninja
for some domain specific data, and also the Oracle sample schemas on github which can also be scaled using techniques above.
https://github.com/oracle/db-sample-schemas
I created a table in oracle like
CREATE TABLE suppliers AS (SELECT * FROM companies WHERE id > 1000);
I would like to know the complete select statement which was used to create this table.
I have already tried get_ddl but it is not giving the select statement. Can you please let me know how to get the select statement?
If you're lucky one of these statements will show the DDL used to generate the table:
select *
from gv$sql
where lower(sql_fulltext) like '%create table suppliers%';
select *
from dba_hist_sqltext
where lower(sql_text) like '%create table%';
I used the word lucky because GV$SQL will usually only have results for a few hours or days, until the data is purged from the shared pool. DBA_HIST_SQLTEXT will only help if you have AWR enabled, the statement was run in the last X days that AWR is configured to hold data (the default is 8), the statement was run after the last snapshot collection (by default it happens every hour), and the statement ran long enough for AWR to think it's worth saving.
And for each table Oracle does not always store the full SQL. For security reasons, DDL statements are often truncated in the data dictionary. Don't be surprised if the text suddenly cuts off after the first N characters.
And depending on how the SQL is called the case and space may be different. Use lower and lots of wildcards to increase the chance of finding the statement.
TRY THIS:
select distinct table_name
from
all_tab_columns where column_name in
(
select column_name from
all_tab_columns
where table_name ='SUPPLIERS'
)
you can find table which created from table
Some tables' data need to be updated(or deleted , inserted) in my system.
But I want to know which data are updated,deleted and inserted.
So before the data are changed ,I will backup the table in different schema
just like this:
create table backup_table as select * from schema1.testtable
and after the data are changed,I want to find the difference between backup_table
and testtable ,and I want to save the difference into a table in the backup schema.
the sql I will run is like this:
CREATE TABLE TEST_COMPARE_RESULT
AS
SELECT 'BEFORE' AS STATUS, T1.*
FROM (
SELECT * FROM backup_table
MINUS
SELECT * FROM schema1.testtable
) T1
UNION ALL
SELECT 'AFTER' AS STATUS, T2.*
FROM (
SELECT * FROM schema1.testtable
MINUS
SELECT * FROM backup_table
) T2
What I am worried about is that I heared about that the minus operation will use
a lot of system resource.In my sysytem, some table size will be over 700M .So I want to
know how oracle will read the 700M data in memory (PGA??) or the temporary tablespace?
and How I should make sure that the resource are enough to to the compare operation?
Minus is indeed a resource intensive task. It need to read both tables and do sorts to compare the two tables. However, Oracle has advanced techniques to do this. It won't load the both tables in memory(SGA) if can't do it. It will use, yes, temporary space for sorts. But I would recommend you to have a try. Just run the query and see what happens. The database won't suffer and allways you can stop the execution of statement.
What you can do to improve the performance of the query:
First, if you have columns that you are sure that won't changed, don't include them.
So, is better to write:
select a, b from t1
minus
select a, b from t2
than using a select * from t, if there are more than these two columns, because the work is lesser.
Second, if the amount of data to compare si really big for your system(too small temp space), you should try to compare them on chunks:
select a, b from t1 where col between val1 and val2
minus
select a, b from t2 where col between val1 and val2
Sure, another possibility than minus is to have some log columns, let's say updated_date. selecting with where updated_date greater than start of process will show you updated records. But this depends on how you can alter the database model and etl code.
I have a requirement to do a nested select within a where clause in a Hive query. A sample code snippet would be as follows;
select *
from TableA
where TA_timestamp > (select timestmp from TableB where id="hourDim")
Is this possible or am I doing something wrong here, because I am getting an error while running the above script ?!
To further elaborate on what I am trying to do, there is a cassandra keyspace that I publish statistics with a timestamp. Periodically (hourly for example) this stats will be summarized using hive, once summarized that data will be stored separately with the corresponding hour. So when the query runs for the second time (and consecutive runs) the query should only run on the new data (i.e. - timestamp > previous_execution_timestamp). I am trying to do that by storing the latest executed timestamp in a separate hive table, and then use that value to filter out the raw stats.
Can this be achieved this using hive ?!
Subqueries inside a WHERE clause are not supported in Hive:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
However, often you can use a JOIN statement instead to get to the same result:
https://karmasphere.com/hive-queries-on-table-data#join_syntax
For example, this query:
SELECT a.KEY, a.value
FROM a
WHERE a.KEY IN
(SELECT b.KEY FROM B);
can be rewritten to:
SELECT a.KEY, a.val
FROM a LEFT SEMI JOIN b ON (a.KEY = b.KEY)
Looking at the business requirements underlying your question, it occurs that you might get more efficient results by partitioning your Hive table using hour. If the data can be written to use this factor as the partition key, then your query to update the summary will be much faster and require fewer resources.
Partitions can get out of hand when they reach the scale of millions, but this seems like a case that will not tease that limitation.
It will work if you put in :
select *
from TableA
where TA_timestamp in (select timestmp from TableB where id="hourDim")
EXPLANATION : As > , < , = need one exact figure in the right side, while here we are getting multiple values which can be taken only with 'IN' clause.
Why inline views are used..??
There are many different reasons for using inline views. Some things can't be done without inline views, for example:
1) Filtering on the results of an analytic function:
select ename from
( select ename, rank() over (order by sal desc) rnk
from emp
)
where rnk < 4;
2) Using ROWNUM on ordered results:
select ename, ROWNUM from
( select ename
from emp
order by ename
);
Other times they just make it easier to write the SQL you want to write.
The inline view is a construct in Oracle SQL where you can place a query in the SQL FROM, clause, just as if the query was a table name.
Inline views provide
Bind variables can be introduced inside the statement to limit the data
Better control over the tuning
Visibility into the code
To get top N ordered rows.
SELECT name, salary,
FROM (SELECT name, salary
FROM emp
ORDER BY salary DESC)
WHERE rownum <= 10;
An inline view can be regarded as an intermediate result set that contributes to the required data set in some way. Sometimes it is entirely a matter of improving maintainability of the code, and sometimes it is logically neccessary.
From the Oracle Database Concepts document there are the inline view concept definition:
An inline view is not a schema object.
It is a subquery with an alias
(correlation name) that you can use
like a view within a SQL statement.
About the subqueries look in Using Subqueries from the Oracle SQL Reference manual. It have a very nice pedagogic information.
Anyway, today is preferred to use the Subquery Factoring Clause that is a more powerfull way of use inline views.
As an example of all together:
WITH
dept_costs AS (
SELECT department_name, SUM(salary) dept_total
FROM employees e, departments d
WHERE e.department_id = d.department_id
GROUP BY department_name),
avg_cost AS
SELECT * FROM dept_costs
WHERE dept_total >
(SELECT avg FROM (SELECT SUM(dept_total)/COUNT(*) avg
FROM dept_costs)
)
ORDER BY department_name;
There you can see one of all:
An inline view query: SELECT SUM...
A correlated subquery: SELECT avg FROM...
A subquery factoring: dept_costs AS (...
What are they used for?:
To avoid creating an intermediate view object: CREATE VIEW ...
To simplify some queries that a view cannot be helpfull. For instance, when the view need to filter from the main query.
You will often use inline views to break your query up into logical parts which helps both readability and makes writing more complex queries a bit easier.
Jva and Tony Andrews provided some good examples of simple cases where this is useful such as Top-N or Pagination queries where you may want to perform a query and order its results before using that as a part of a larger query which in turn might feed a query doing some other processing, where the logic for these individual queries would be difficult to achieve in a single query.
Another case they can be very useful is if you are writing a query that joins various tables together and want to perform aggregation on some of the tables, separating group functions and the processing into different inline views before performing the joins makes managing cardinality a lot easier. If you want some examples, I would be happy to provide them to make it more clear.
Factored subqueries (where you list your queries in the WITH clause at the start of the query) and inline views also often bring performance benefits. If you need to access the results of the subquery multiple times, you only need to run it once and it can be materialized as a Global Temporary Table (how the optimizer acts isn't totally black and white so I won't go into it here but you can do your own research - for example, see http://jonathanlewis.wordpress.com/2007/07/26/subquery-factoring-2/)