The view has several joins, but no WHERE clauses. It helped our developers to have all the data they needed in one Appian object that could easily be used in the "low code" later on. In most cases, Appian adds conditions when querying the view, in a subsequent WHERE clause like the one below:
query: [Report on Record Type], order by: [[Sort[histoDateAction desc], Sort[id asc]]],
filters:[((histoDateAction >= TypedValue[it=9,v=2022-10-08 22:00:00.0])
AND (histoDateAction < TypedValue[it=9,v=2022-10-12 22:00:00.0])
AND (histoUtilisateur = TypedValue[it=3,v=miwem6]))
]) (APNX-1-4198-000) (APNX-1-4205-031)
Now that we are starting to have data in the database, performance is getting poor. From the execution plan, the reason seems to be that the query does not use indexes when the data is queried.
Here is how the query for view VIEW_A looks:
SELECT
<columns> (not much transformation here)
FROM A
LEFT JOIN R R1 on R1.id=A.id_type1
LEFT JOIN R R2 on R2.id=A.id_type2
LEFT JOIN R R3 on R3.id=A.id_type3
LEFT JOIN U on U.id=A.id_user          -- <500>
LEFT JOIN C on C.id=A.id_customer      -- <50000>
LEFT JOIN P on P.id=A.id_prestati      -- <100000>
and at runtime, Appian added the clauses below:
where A.DATE_ACTION < to_date('2022-10-12 22:00:00', 'YYYY-MM-DD HH24:MI:SS')
and A.DATE_ACTION >= to_date('2022-10-08 22:00:00', 'YYYY-MM-DD HH24:MI:SS')
and A.USER_ACTION = 'miwem6'
Typically, when I show the explain plan for a query on VIEW_A WHERE <conditions>, I get a cost around 6,000, and when I show the explain plan for the <code of the view> WHERE <clause>, the cost is 30.
Is it possible to use some Oracle hint to tell it: "Some day, someone will query this with an added WHERE clause on some columns, so don't be a stupid engine and use the indexes when that time comes"?
First, this isn't a great architecture. I can't tell you how many times folks have pulled me in to diagnose performance problems caused by unpredictably dynamic queries that add an assortment of unforeseeable WHERE predicates.
But if you have to do this, you can increase your likelihood of using indexes by lowering their cost. Like this:
SELECT /*+ opt_param('optimizer_index_cost_adj',1) */
<columns> (not much transformation here)
FROM A . . .
If you know for sure that nested loops + index use is the way you want to access everything, you can even disable the CBO entirely:
SELECT /*+ rule */
<columns> (not much transformation here)
FROM A . . .
But of course it's on you to ensure that there's an index on every high-cardinality column that your system may use to significantly filter the desired rows. That's not every column, but it sounds like it may be quite a few.
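For example, for the filters Appian generated above, the index DDL might look like the sketch below (the index names are made up; DATE_ACTION and USER_ACTION are the columns from your WHERE clause):

CREATE INDEX a_date_action_ix ON A (DATE_ACTION);
CREATE INDEX a_user_action_ix ON A (USER_ACTION);

-- or a single composite index, if the two predicates usually arrive together:
CREATE INDEX a_user_date_ix ON A (USER_ACTION, DATE_ACTION);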
Oh, and one more thing... please ignore COST. By definition, Oracle always chooses what it computes as the lowest-cost plan. When it makes wrong choices, it's because its computation of cost is incorrect. Therefore, by definition, if you are having problems, the COST numbers you see are wrong. Ignore them.
I need someone who can explain the error "group function is not allowed here" to me, because I don't understand it and I would like to.
I have to get the product name and the unit price of the products that have a price above the average.
I initially tried this, but Oracle quickly told me that it was wrong:
SELECT productname,unitprice
FROM products
WHERE unitprice>(AVG(unitprice));
I searched for more information and found that I could get it this way:
SELECT productname,unitprice FROM products
WHERE unitprice > (SELECT AVG(unitprice) FROM products);
What I want to know is: why do you need two SELECTs?
What does "group function is not allowed here" mean?
I have encountered this error more than once, and I would like to understand what to do when it appears.
Thank you very much for your time
The phrase "group function not allowed here" is referring to anything that is in some way an "aggregation" of data, eg SUM, MIN, MAX, etc et. These functions must operate on a set of rows, and to operate on a set of rows you need to do a SELECT statement. (I'm leaving out UPDATE/DELETE here)
If this were not the case, you would end up with ambiguities. For example, let's say we allowed this:
select *
from products
where region = 'USA'
and avg(price) > 10
Does this mean you want the average price across all products, or just the average price for those products in the USA? The syntax is no longer deterministic.
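For contrast, here is a sketch of the unambiguous way to express that kind of filter: the aggregate is computed over explicit groups and then tested in HAVING (reusing the made-up products/region/price columns from the example above):

SELECT region, AVG(price) AS avg_price
FROM products
WHERE region = 'USA'        -- rows are filtered first
GROUP BY region             -- then grouped
HAVING AVG(price) > 10;     -- and only then can the aggregate be tested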
Here's another option:
SELECT *
FROM (
SELECT productname,unitprice,AVG(unitprice) OVER (PARTITION BY 1) avg_price
FROM products)
WHERE unitprice > avg_price
The reason your original SQL doesn't work is because you didn't tell Oracle how to compute the average. What table should it find it in? What rows should it include? What, if any, grouping do you wish to apply? None of that is communicated with "WHERE unitprice>(AVG(unitprice))".
Now, as a human, I can make a pretty educated guess that you intend the averaging to happen over the same set of rows you select from in the main query, with the same granularity (no grouping). We can accomplish that either by using a sub-query to make a second pass on the table, as your second SQL did, or by using the newer windowing capabilities of aggregate functions to internally make a second pass on your query block's results, as I did in my answer. Using the OVER clause, you can tell Oracle exactly which rows to include (ROWS BETWEEN ...) and how to group them (PARTITION BY ...).
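For instance, here is a sketch of a partitioned version (categoryid is a hypothetical column, not something from your table definition): each product is compared against the average of its own category instead of the global average:

SELECT productname, unitprice
FROM (
  SELECT productname,
         unitprice,
         AVG(unitprice) OVER (PARTITION BY categoryid) AS avg_cat_price
  FROM products
)
WHERE unitprice > avg_cat_price;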
I am trying to optimize a query where I am using a function() call in the where clause.
The function() simply changes the timezone of the date.
When I call the function as part of the SELECT, it executes extremely fast (< 0.09 sec against a table of many hundreds of thousands of rows):
select
id,
fn_change_timezone (date_time, 'UTC', 'US/Central') AS tz_date_time,
value
from a_table_view
where id = 'keyvalue'
and date_time = to_date('01-10-2014','mm-dd-yyyy')
However, this version runs "forever" [meaning I stop it after umpteen minutes]
select id, date_time, value
from a_table_view
where id = 'keyvalue'
and fn_change_timezone (date_time, 'UTC', 'US/Central') = to_date('01-10-2014','mm-dd-yyyy')
(I know I'd have to change the date being compared; it's just an example.)
So my question is two-fold:
If the function is so fast outside of the WHERE clause, why is it so much slower than, say, TRUNC() or other functions? (Obviously TRUNC() doesn't do a table lookup like my function does, but still, the function is very, very fast outside the WHERE clause.)
What are alternate ways of accomplishing this outside of the WHERE clause?
I tried this as an alternative, which did not seem any better, it still ran until I stopped the query:
select
tz.date_time,
v.id,
v.value
from
(select
fn_change_timezone(to_date('01/10/2014-00:00:00', 'mm/dd/yyyy-hh24:mi:ss'), 'UTC', 'US/Central') as date_time
from dual
) tz
inner join
(
select
id,
fn_change_timezone (date_time, 'UTC', 'US/Central') AS tz_date_time,
value
from a_table_view
where id = 'keyvalue'
) v ON
v.tz_date_time = tz.date_time
Hopefully I am explaining the issue well.
There are at least four potential issues with using functions in the WHERE clause:
Functions may prevent indexes from being used. A function-based index can solve this issue (see the sketch below).
Functions may prevent partition pruning. Hard-coding values or maybe virtual column partitioning are possible solutions, although neither is likely helpful in this case.
Functions may run slowly. Even if the function is cheap, it is often very expensive to switch between SQL and PL/SQL. Some possible solutions are DETERMINISTIC, PARALLEL_ENABLE, function result caching, defining the logic purely in SQL, or (with 12c) defining the function in the SQL statement itself.
Functions may cause bad cardinality estimates. It's hard enough for the optimizer to guess the result of normal conditions; adding procedural code makes it even more difficult. Using ASSOCIATE STATISTICS, it is possible to give the optimizer some information about the cost and cardinality of the function.
Without more information, such as an explain plan, it is difficult to know what the specific issue is with this query.
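That said, if a blocked index is the culprit, a function-based index is one avenue. A rough sketch, assuming fn_change_timezone can be declared DETERMINISTIC and that a_table_view is backed by a base table (called a_table here purely for illustration):

-- the function must be declared DETERMINISTIC for this index to be allowed
CREATE INDEX a_table_tz_ix
  ON a_table (fn_change_timezone(date_time, 'UTC', 'US/Central'));

-- the predicate then has to match the indexed expression exactly
SELECT id, date_time, value
FROM   a_table_view
WHERE  id = 'keyvalue'
AND    fn_change_timezone(date_time, 'UTC', 'US/Central') = to_date('01-10-2014', 'mm-dd-yyyy');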
Function calls in the WHERE clause are a Bad Thing. The problem is that the function may be called for every row in the table, which may be many more than the selected set. This can be a real performance killer (don't ask me how I know :-). In the first version, with the function call in the SELECT list, the function will only be called when a row has been chosen and is being added to the result set; in the second version the function may well be called for every row in the table. Also, depending on the version of Oracle you're using, there may be significant overhead to calling a user function from SQL, but I think this penalty has been largely eliminated in versions since 10g.
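One common way to apply that advice here, sketched under the assumption that the inverse conversion ('US/Central' back to 'UTC') expresses the same filter, is to move the conversion onto the literal so the column itself is left untouched:

SELECT id, date_time, value
FROM   a_table_view
WHERE  id = 'keyvalue'
AND    date_time = fn_change_timezone(to_date('01-10-2014', 'mm-dd-yyyy'), 'US/Central', 'UTC');

With the raw date_time column on the left-hand side, any index on it remains usable.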
Best of luck.
Share and enjoy.
I have two tables that I need to inner join; one table has a relatively small number of records compared to the other one. I need to apply some string manipulation to the smaller table, and my question is: can I apply the string function after the join, or should I apply it in a subquery and then join the subselect to the bigger table?
An example would be something like this:
Option 1:
SELECT SUBSTR("SMALL_TABLE"."COL_NAME",x,y) "NEW_COL" FROM "BIG_TABLE"
JOIN "SMALL_TABLE" ON ...
Option 2:
SELECT "NEW_COL"
FROM "BIG_TABLE"
JOIN
(
SELECT SUBSTR("SMALL_TABLE"."COL_NAME",x,y) "NEW_COL" FROM "SMALL_TABLE"
) "T"
ON ...
Which is better for performance, option 1 or option 2?
I am using Oracle 11g.
Regardless of how you structure the query, Oracle's optimizer is free to evaluate the function before or after the join. Assuming that the string manipulation is only done as part of the projection step (i.e. it appears only in the SELECT clause and is not used as a predicate in the WHERE clause), I would expect Oracle to apply the SUBSTR before joining the tables in either formulation, because it would then have to apply the function to fewer rows (though it can probably treat the SUBSTR as a deterministic call and cache the results if it applies the function after the join).
As with any query optimization question, the first step is always to generate a query plan and see if the different queries actually produce different plans. I would expect the plans to be identical and, thus, the performance to be identical. But there are any number of reasons that one of the two options might produce different plans on your system given your optimizer statistics, initialization parameters, etc.
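For example, the comparison could be run along these lines (the join condition and SUBSTR arguments here are placeholders, since the real ones weren't shown):

EXPLAIN PLAN FOR
SELECT SUBSTR(s.col_name, 1, 10) AS new_col
FROM   big_table b
JOIN   small_table s ON s.id = b.small_id;   -- placeholder join condition

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- then repeat with the option 2 (subquery) formulation and compare the two plans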
It is better to apply the operations before doing the join, and then join and query for the final result. This is part of query optimization.
By doing so for your question, you will perform fewer operations when joining, as you will have eliminated the useless rows beforehand.
Lots of examples here : http://beginner-sql-tutorial.com/sql-query-tuning.htm
and this is the best one I could find : http://www.cse.iitb.ac.in/~sudarsha/db-book/slide-dir/ch14.ppt
I am trying to display volunteer information together with each duty and the performance it is allocated to.
I want to display this information. However, when I run the query, it does not gather the different dates for the same performance, and availability_date is mixed up. Is this the right query for it? I am not sure that it is.
Could you give me some feedback?
Thanks.
Here is the query:
SELECT Production.name, performance.performance_date, volunteer_duty.availability_date,
       customer.name "Customer volunteer", volunteer.volunteerid,
       membership.name "Member volunteer", membership.membershipid
FROM Customer, Membership, Volunteer, volunteer_duty, duty, performance_duty, performance, production
WHERE
Customer.customerId (+) = Membership.customerId AND
Membership.membershipId = Volunteer.membershipId AND
volunteer.volunteerid = volunteer_duty.volunteerid AND
duty.dutyid = volunteer_duty.dutyid AND
volunteer_duty.dutyId = performance_duty.dutyId AND
volunteer_duty.volunteerId = performance_duty.volunteerId AND
performance_duty.performanceId = performance.performanceId AND
Performance.productionId = production.productionId
Result: (attached as an image in the original post)
The query seems reasonable, in terms of it having what appear to be the appropriate join conditions between all the tables. It's not clear to me what issue you are having with the results; it might help if you explained in more detail and/or showed a relevant subset of the data.
However, since you say there is some issue related to availability_date, my first thought is that you want to have some condition on that column, to ensure that a volunteer is available for a given duty on the date of a given performance. This might mean simply adding volunteer_duty.availability_date = performance.performance_date to the query conditions.
My more general recommendation is to start writing the query from scratch, adding one table at a time, and using ANSI join syntax. This will make it clearer which conditions are related to which joins, and if you add one table at a time hopefully you will see the point at which the results are going wrong.
For instance, I'd probably start with this:
SELECT production.name, performance.performance_date
FROM production
JOIN performance ON production.productionid = performance.productionid
If that gives results that make sense, then I would go on to add a join to performance_duty and run that query. Et cetera.
I suggest that you write explicit JOINs instead of using the WHERE syntax.
Using INNER JOINs, the query you are describing could look like this:
SELECT *
FROM volunteer v
INNER JOIN volunteer_duty vd ON (v.volunteerId = vd.volunteerId)
INNER JOIN performance_duty pd ON (vd.dutyId = pd.dutyId AND vd.volunteerId = pd.volunteerId)
INNER JOIN performance p ON (pd.performanceId = p.performanceId)
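Extending that idea, a fuller sketch covering all of the tables from the original query might look like the following. It translates the (+) on customer into a LEFT JOIN and adds the availability_date = performance_date condition suggested in the other answer; whether that condition matches your data model is an assumption:

SELECT production.name, performance.performance_date, volunteer_duty.availability_date,
       customer.name "Customer volunteer", volunteer.volunteerid,
       membership.name "Member volunteer", membership.membershipid
FROM production
INNER JOIN performance      ON performance.productionid = production.productionid
INNER JOIN performance_duty ON performance_duty.performanceid = performance.performanceid
INNER JOIN volunteer_duty   ON volunteer_duty.dutyid = performance_duty.dutyid
                           AND volunteer_duty.volunteerid = performance_duty.volunteerid
INNER JOIN duty             ON duty.dutyid = volunteer_duty.dutyid
INNER JOIN volunteer        ON volunteer.volunteerid = volunteer_duty.volunteerid
INNER JOIN membership       ON membership.membershipid = volunteer.membershipid
LEFT JOIN customer          ON customer.customerid = membership.customerid
WHERE volunteer_duty.availability_date = performance.performance_date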
I have heard that I should avoid using 'order by rand()', but I really need to use it. Contrary to what I have been hearing, the following query comes back very fast:
select
cp1.img_id as left_id,
cp1.img_filename as left_filename,
cp1.facebook_name as left_facebook_name,
cp2.img_id as right_id,
cp2.img_filename as right_filename,
cp2.facebook_name as right_facebook_name
from
challenge_photos as cp1
cross join
challenge_photos as cp2
where
(cp1.img_id < cp2.img_id)
and
(cp1.img_id,cp2.img_id) not in ((0,0))
and
(cp1.img_status = 1 and cp2.img_status = 1)
order by rand() limit 1
Is this query considered 'okay', or should I use one of the queries I can find by searching for "alternative to rand()"?
It's usually a performance thing. You should avoid, as much as possible, per-row functions since they slow down your queries.
That means things like uppercase(name), salary * 1.1 and so on. It also includes rand(). It may not be an immediate problem (at 10,000 rows) but, if you ever want your database to scale, you should keep it in mind.
The two main issues are that you're performing a per-row function and then having to do a full sort on the output before selecting the first row. The DBMS cannot use an index if you sort on a random value.
But, if you need to do it (and I'm not making judgement calls there), then you need to do it. Pragmatism often overcomes dogmatism in the real world :-)
A possibility, if performance ever becomes an issue, is to get a count of the records with something like:
select count(*) from ...
then choose a random value on the client side and use a:
limit <start>, <count>
clause in another select, adjusting for the syntax used by your particular DBMS. This should remove the sorting issue and the transmission of unneeded data across the wire.
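A sketch of that idea against the table from the question (MySQL syntax; the offset value 42 is a placeholder for the number the client picks at random between 0 and count - 1):

-- step 1: count the candidate rows
SELECT COUNT(*) FROM challenge_photos WHERE img_status = 1;

-- step 2: fetch exactly one row at the client-chosen offset
SELECT img_id, img_filename, facebook_name
FROM challenge_photos
WHERE img_status = 1
ORDER BY img_id          -- a stable order makes the offset meaningful
LIMIT 42, 1;

Picking a random pair, as in the original cross-join query, would need this adapted, but the principle (count once, then offset) stays the same.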