Oracle/PLSQL. Select only best match between two tables - oracle

I have two tables:
Vehicles
make | model | modification
Audi | A 5 | A 5 2010 Sportsback 2.8
Audi | A 5 | A 5 2012 Quattro L
Audi | A 5 | A 5 Cabriolet
and
matchingModel
make | model | modContain | modEnd | finalModel
Audi | A 5 | Sportback | | A5 Sportback
Audi | A 5 | | L | A5 L
Audi | A 5 | | | A5
My task is to get only the best-fitting finalModel for each vehicle by finding matches (as in the select below).
First I tried to join the tables:
(SELECT
matchingModel.finalModel
FROM vehicles
LEFT OUTER JOIN matchingModel ON
matchingModel.make = vehicles.make
AND vehicles.model = nvl(matchingModel.model,vehicles.model)
AND vehicles.modification LIKE decode(matchingModel.modContain, NULL, vehicles.modification, '%'||matchingModel.modContain||'%')
AND vehicles.modification LIKE decode(matchingModel.modEnd, NULL, vehicles.modification, '%'||' '||matchingModel.modEnd)
)
AS bestMatch
but that did not work: a vehicle is first matched to the Sportback row, but the plain A5 row matches too, so the result is later overwritten with a simple A5.
So next I made it work by simply nesting NVL calls over all possible options: nvl(nvl(nvl((select where make and model fit, modContain is in the middle of modification, and the option cell is empty), (select where make and model fit, modEnd matches the end of modification, and modEnd is not empty)), (select where make and model fit, and so on)), ...) AS bestMatch
This works, but it is very slow (both tables have more than 500k records).
This is just one part of a very large select, so it is difficult to rewrite it the normal way.
Anyway, the question is: are there any best practices for getting the best match, only once and fast, in Oracle? The problems I have run into are performance, values that fit twice, or a WHERE clause that does not work because I cannot know whether modContain or modEnd is empty.
Thank you in advance.
Sorry for my English.

It is not quite there yet but I worked out an example you can continue to work out for yourself: SQL Fiddle Demo
select * from (
select
-- test for NULL first: '%'||NULL||'%' is '%%', which matches any non-null value
case when m.modcontain is null then 1
when v.modification like '%'||m.modContain||'%' then 2
else 0 end m1,
case when m.modend is null then 1
when v.modification like '%'||m.modend then 2
else 0 end m2
, m.make mmake, m.model mmodel, modcontain, modend, finalmodel
, v.make vmake, v.model vmodel, modification
from vehicles v, matchingmodel m
where
v.make = m.make
and soundex(v.model) = soundex(m.model) ) x
order by m1+m2 desc
So the query scores each match and orders by the total, and the row with the highest total should be your best match. I also used SOUNDEX, which may help you because Sportback and Sportsback are not quite the same; it also let me treat A5 and A 5 as the same. To make it fast you will have to put a lot of work into choosing good indexes and watching the explain plan, especially with 500k records. That is not an easy undertaking.
As for the idea of writing a procedure (which is a good idea), an untested sketch might look like this:
create or replace function vehicle_matching(i_vehicles vehicles%rowtype,
i_matchingmodel matchingmodel%rowtype)
return number
is
l_return number;
begin
if i_vehicles.modification like '%'||i_matchingmodel.modContain||'%' then
l_return := 3;
elsif soundex(i_vehicles.modification) like '%'||soundex(i_matchingmodel.modContain)||'%' then
l_return := 2;
...
if i_vehicles.modification like '%'||i_matchingmodel.modend then
l_return := l_return + 1; -- there is no i++ in PL/SQL
elsif
...
return l_return;
end vehicle_matching;
I also wondered whether working with INSTR and SUBSTR would be more efficient than LIKE with %, but I do not really think that is the case.

You may consider something like this:
Write a query that returns 1 on any partial match.
Then write another query that returns another 1 on another partial match, and so on.
Repeat this for every column that counts towards your 'similarity'.
In the end, the row with the highest sum (or count) of 1's is the closest match.
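One way to turn this scoring idea into a single pass is to rank the candidate rows per vehicle and keep only the first one. The following is a sketch only, not tested against the real tables. It assumes that vehicles has a unique key column, here called id (hypothetical, it is not shown in the question), and that a matchingModel row with more non-empty conditions counts as the better match.

```sql
-- Sketch: keep the most specific matching row per vehicle.
-- "id" is an assumed unique key on vehicles; it is not part of the question.
select id, make, model, modification, finalModel
from (
  select v.id, v.make, v.model, v.modification, m.finalModel,
         row_number() over (
           partition by v.id
           order by
               case when m.modContain is not null then 1 else 0 end
             + case when m.modEnd     is not null then 1 else 0 end desc
         ) as rn
  from vehicles v
  left join matchingModel m
    on  m.make = v.make
    and (m.model is null or m.model = v.model)
    -- an empty condition matches everything; a filled one must really match
    and (m.modContain is null or v.modification like '%' || m.modContain || '%')
    and (m.modEnd     is null or v.modification like '% ' || m.modEnd)
)
where rn = 1;
```

If two candidate rows tie on the score, ROW_NUMBER picks one of them arbitrarily, so a further tie-breaker may be needed in the ORDER BY. The LIKE conditions with a leading % cannot use a normal index, so most of the tuning effort would still go into the join on make and model.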

Related

Select from a loop in Oracle

In Oracle 11g, I want to execute a query like the one below.
In this case, I am not allowed to use a function or procedure.
I tried to Google it, but I couldn't find a good solution. Almost everything I found shows how to do it with a function or stored procedure.
Table X with columns (A,B,C)
For each row in table X I want to compute:
Count = B - A;
for(i=0;i<Count;i++)
{
C++;
D = C * A;
}
Expected result: table Y with columns (A,B,C,D)
You are thinking like a 3GL developer. Java (or whatever) only has arrays, so everything is an iteration. But SQL is a set-oriented language: we don't need loops to work on sets of data. Oracle SQL has built-in aggregation functions which allow us to compute values from sets of records.
For instance, this query calculates total remuneration (salary plus commission), number of employees and average salary:
select sum(sal + nvl(comm,0)) as total_renum
, count(*) as total_emps
, avg(sal) as average_salary
from emp
/
Oracle has a comprehensive range of such functions, some of them are really powerful. Find out more. Be sure to check out analytic functions too.
Hmmm, so you subsequently posted a cryptic snippet of code. It's still not clear exactly what you want, but this might produce the outcome for your table Y:
select a
, b
, c
, 0 + ((c+level) * a) as d
from x
connect by level <= (b-a)
/
For each row in table X it will generate (b-a) rows, with a derived value of d. I have assumed a start of 0 for d.
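One caveat: a bare CONNECT BY LEVEL only behaves this way when X holds a single row. With several rows, every row connects to every other row and the result multiplies. A common workaround is to tie each generated row back to its own source row. The sketch below assumes that the combination (a, b, c) identifies a row uniquely and contains no NULLs; with a real key column, that column would be used instead.

```sql
-- Sketch: generate (b - a) rows per source row, even when x has many rows.
select a, b, c, (c + level) * a as d
from x
where b > a                               -- a source row with b <= a would otherwise still yield one row
connect by level <= (b - a)
       and prior a = a                    -- stay on the same source row
       and prior b = b
       and prior c = c
       and prior sys_guid() is not null;  -- non-deterministic term, so Oracle does not report a loop
```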

Trying to figure out top 5 land areas of the 50 states in the U.S

I have created a table with one column named states and another column called land area. I am using Oracle 11g. I have looked at various questions on here and cannot find a solution. Here is what I have tried so far:
SELECT LandAreas, State
FROM ( SELECT LandAreas, State, DENSE_RANK() OVER (ORDER BY State DESC) sal_dense_rank
FROM Map )
WHERE sal_dense_rank >= 5;
This does not return the 5 largest land areas.
I have also tried this one, but no go either:
SELECT * FROM (SELECT * FROM Map order by State desc)
where rownum < 5;
Does anyone have any suggestions to get me on the right track?
Here is a sample of the table:
states land areas
michagan 15000
florida 25000
tennessee 10000
alabama 80000
new york 150000
california 20000
oregon 5000
texas 6000
utah 3000
nebraska 1000
Desired output from query:
States land area
new york 150000
alabama 80000
florida 25000
california 20000
Try:
Select * from
(SELECT State, LandAreas FROM Map ORDER BY LandAreas DESC)
where rownum < 6
Link to Fiddle
Use a HAVING clause and count the number of states that are larger:
SELECT m.state, m.landArea
FROM Map m
LEFT JOIN Map m2 on m2.landArea > m.landArea
GROUP BY m.state, m.landArea
HAVING count(*) < 5
ORDER BY m.landArea DESC
See SQLFiddle
This joins each state to every state whose area is greater, then uses a HAVING clause to return only those states where the number of larger states was less than 5.
Ties are all returned, leading to more than 5 rows in the case of a tie for 5th.
The left join is needed for the case of the largest state, which has no other larger state to join to.
The ORDER BY is optional.
Try something like this:
select m.states,m.landarea
from map m
where (select count('x') from map m2 where m2.landarea > m.landarea) < 5
order by m.landarea desc
There are two bloomers in your posted code.
You need to use landarea in the DENSE_RANK() call. At the moment you're ordering the states in reverse alphabetical order.
Your filter in the outer query is the wrong way around: you're excluding the top four results.
Here is what you need ...
SELECT LandArea, State
FROM ( SELECT LandArea
, State
, DENSE_RANK() OVER (ORDER BY landarea DESC) as area_dr
FROM Maps )
WHERE area_dr <= 5
order by area_dr;
... and here is the SQL Fiddle to prove it. (I'm going with the statement in the question that you want the top 5 biggest states and ignoring the fact that your desired result set has only four rows. But adjust the outer filter as you will).
There are three different functions for deriving top-N result sets: DENSE_RANK, RANK and ROW_NUMBER.
Using ROW_NUMBER will always guarantee you 5 rows in the result set, but you may get the wrong result if there are several states with the same land area (unlikely in this case, but other data sets will produce such clashes). So: 1,2,3,4,5
The difference between RANK and DENSE_RANK is how they handle ties. DENSE_RANK always produces a series of consecutive numbers, regardless of how many rows there are in each rank. So: 1,2,2,3,3,3,4,5
RANK on the other hand will produce a sparse series if a given rank has more than one hit. So: 1,2,2,4,4,4.
Note that each of the example result sets has a different number of rows. Which one is correct? It depends on the precise question you want to ask.
Using a sorted sub-query with the ROWNUM pseudo-column will work like the ROW_NUMBER function, but I prefer using ROW_NUMBER because it is more powerful and more error-proof.
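To see how the three functions differ on your own data, it can help to compute them side by side before choosing a filter. This is a small sketch; the table name map and the column names state and landarea are assumptions taken from the examples above, so adjust them to the real table.

```sql
-- Sketch: compare the three ranking functions on the same ordering.
select state,
       landarea,
       row_number() over (order by landarea desc) as rn,    -- always 1,2,3,... with ties broken arbitrarily
       rank()       over (order by landarea desc) as rnk,   -- ties share a rank, followed by a gap
       dense_rank() over (order by landarea desc) as drnk   -- ties share a rank, no gap
from map
order by landarea desc;
```

Wrapping this in an outer query and filtering on rn, rnk or drnk with <= 5 then gives the three different top-5 interpretations described above.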

Selecting data from one table or another in multiple queries PL/SQL

The easiest way to ask my question is with a Hypothetical Scenario.
Let's say we have 3 tables: Singapore_Prices, Produce_val, and Bosses_unreasonable_demands.
So Prices is a pretty simple table. Item column containing a name, and a Price column containing a number.
Produce_Val is also simple 2 column table. Type column containing what type the produce is (Fruit or veggie) and then Name column (Tomato, pineapple, etc.)
The Bosses_unreasonable_demands only contains one column, Fruit, which CAN contain the names of some fruits.
OK? Ok.
SO, My boss wants me to write a query that returns the prices for every fruit in his unreasonable demands table. Simple enough. BUT, if he doesn't have any entries in his table, he just wants me to output the prices of ALL fruits that exist in produce_val.
Now, assuming I don't know where the DBA who designed this silly hypothetical system lives (and therefore can't get him to fix this), our query would look like this:
if <Logic to determine if Bosses demands are empty>
Then
select Item, Price
from Singapore_Prices
where Item in (select Fruit from Bosses_Unreasonable_demands)
Else
select Item, Price
from Singapore_Prices
where Item in (select Name from Produce_val where type = 'Fruit')
end if;
(Well, we'd select those into a variable, and then output the variable, probably with bulk-collect shenanigans, but that's not important)
Which works. It is entirely functional, and won't be slow, even if we extend it out to 2000 other stores other than Singapore. (Well, no slower than anything else that touches 2000 some tables) BUT, I'm still doing two different select statements that are practically identical. My Comp Sci teacher rolls in their grave every time my fingers hit ctrl-V. I can cut this code in half and only do one select statement. I KNOW I can.
I just have no earthly idea how. I can't use cursors as an in statement, I can't use nested tables or varrays, I can't use cleverly crafted strings, I... I just... I don't know. I don't know how to do this. Is there a way? Does it exist?
Or do I have to copy/paste forever?
Your best bet would be dynamic SQL, because you can't parameterize table or column names.
You would have a SQL query template and some logic to determine which tables and columns to query, then blend them together and execute the result.
Another approach (still with a lot of ctrl-V-like code) is to use the set operator UNION ALL:
select 1st query where boss_condition
union all
select 2nd query where not boss_condition
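Spelled out with the tables from the question, the UNION ALL idea might look like the sketch below. It is untested and uses only the table and column names given in the question. The first branch needs no explicit condition, because it returns nothing when the boss's table is empty.

```sql
-- Sketch: exactly one of the two branches returns rows.
select item, price
from singapore_prices
where item in (select fruit from bosses_unreasonable_demands)
union all
select item, price
from singapore_prices
where item in (select name from produce_val where type = 'Fruit')
  and not exists (select 1 from bosses_unreasonable_demands);  -- only when the boss has no demands
```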
Try this:
SELECT f.ITEM, f.PRICE
FROM (SELECT s.ITEM, s.PRICE, 'BOSS' AS FRUIT_SOURCE
FROM BOSSES_UNREASONABLE_DEMANDS b
INNER JOIN SINGAPORE_PRICES s
ON s.ITEM = b.FRUIT
UNION ALL
SELECT s.ITEM, s.PRICE, 'NORMAL' AS FRUIT_SOURCE
FROM PRODUCE_VAL p
INNER JOIN SINGAPORE_PRICES s
ON s.ITEM = p.NAME
WHERE p.TYPE = 'Fruit') f
-- the one-row count decides which source is allowed through
CROSS JOIN (SELECT COUNT(*) AS BOSS_COUNT
FROM BOSSES_UNREASONABLE_DEMANDS) c
WHERE (c.BOSS_COUNT > 0 AND f.FRUIT_SOURCE = 'BOSS') OR
(c.BOSS_COUNT = 0 AND f.FRUIT_SOURCE = 'NORMAL')
Share and enjoy.
I think you can use nested tables. Assume you have a schema-level nested table type FRUIT_NAME_LIST (defined using CREATE TYPE).
SELECT fruit
BULK COLLECT INTO my_fruit_name_list
FROM bosses_unreasonable_demands
;
IF my_fruit_name_list.count = 0 THEN
SELECT name
BULK COLLECT INTO my_fruit_name_list
FROM produce_val
WHERE type='Fruit'
;
END IF;
SELECT item, price
FROM singapore_prices
WHERE item MEMBER OF my_fruit_name_list
;
(or, WHERE item IN (SELECT column_value FROM TABLE(CAST(my_fruit_name_list AS fruit_name_list))) if you like that better)

How can I limit the numbers of results being grouped in my Group By in Oracle?

I've got a table of parameters, values, and the times at which those values were recorded.
I've got a procedure which takes in a time and needs to get the average of each parameter's values in the window from 15 seconds before to 5 seconds after that time. On top of that, I want to make sure that I take no more than 15 records before the passed-in time, and no more than 5 records after it.
For example, maybe I'm recording values of some parameters every second. If I passed in the time 21:30:30, I'd want to get the values between 21:30:15 and 21:30:35. But if I was recording every half second, I'd actually have more parameters that fit in that time frame than I want, and that's where my need to limit my results comes in.
I've read this question and this article which seem pretty related to what I'm trying to do, but unfortunately I'm dealing with Oracle and not MySQL, so I can't use "limit".
I've currently got something that looks like this:
with std_values as
(
select
V.ParameterId,
V.NumericValue
from
ValuesTable V
where
V.ValueSource = pValueSource
and V.Time >= pSummaryTime - 15/86400
and V.Time <= pSummaryTime + 5/86400
)
select
ParameterId,
avg(NumericValue) as NumericValue
from
std_values
group by
ParameterId
pValueSource is just something that lets me filter down which value types I'm looking at, and pSummaryTime is the input time that I'm basing my time frame around. The goal here is to get the 15 records before pSummaryTime that falls within that window, and the 5 after that falls within that window, and use those for the average. Currently I'm not limiting the number of "before" and "after" results though, so I'm ending up with the average of everything that falls into that time window. And without something like "limit", I'm not sure how to do this in Oracle.
Sounds like you want a moving window aggregate function. This is part of the Analytical functions feature of Oracle.
It's not my strong suit, and since you didn't include sample tables/data to build a test case, I'll just point you to the Oracle documentation, here:
http://docs.oracle.com/cd/B14117_01/server.101/b10736/analysis.htm#i1006709
You probably want something like:
AVG(NumericValue) over (order by pSummaryTime RANGE BETWEEN 15 PRECEDING AND 5 FOLLOWING)
but, like I said, not my strong suit, and totally untested, but, I hope it gets the idea across.
Hope that helps.
Thanks to Mark Bobak's answer getting me on the right track, I ended up with this solution.
with
values_before as
(
select
V.ParameterId,
V.NumericValue,
row_number() over (Partition by V.ParameterId order by V.Time desc) as RowNumber
from
ValuesTable V
where
V.ValueSource = pValueSource
and V.Time >= pSummaryTime - 15/86400
and V.Time <= pSummaryTime
),
values_after as
(
select
V.ParameterId,
V.NumericValue,
row_number() over (Partition by V.ParameterId order by V.Time asc) as RowNumber -- ascending, so the 5 records closest after pSummaryTime are kept
from
ValuesTable V
where
V.ValueSource = pValueSource
and V.Time <= pSummaryTime + 5/86400
and V.Time > pSummaryTime
),
values_all as
(
select * from values_before where RowNumber <= 15
union all
select * from values_after where RowNumber <= 5
)
select ParameterId, avg(NumericValue) from values_all group by ParameterId
No doubt there's a better way to do this, but it at least seems to be giving the correct result. The key was using an analytical function to set the row number and order for the 15 before and 5 after, and then filtering my results down to just those.
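For comparison, the same filtering can probably be done with a single scan of the table by ranking the rows on each side of pSummaryTime by their distance from it. This is an untested sketch and, like the query above, a fragment meant to sit inside the procedure. It assumes that Time is a DATE column, so that subtracting two values yields a number of days; the column alias side_flag is made up for the example.

```sql
-- Sketch: rank rows before ('B') and after ('A') the summary time by closeness.
with ranked as
(
  select V.ParameterId,
         V.NumericValue,
         case when V.Time <= pSummaryTime then 'B' else 'A' end as side_flag,
         row_number() over (
           partition by V.ParameterId,
                        case when V.Time <= pSummaryTime then 'B' else 'A' end
           order by abs(V.Time - pSummaryTime)   -- closest records first
         ) as rn
  from ValuesTable V
  where V.ValueSource = pValueSource
    and V.Time between pSummaryTime - 15/86400 and pSummaryTime + 5/86400
)
select ParameterId, avg(NumericValue) as NumericValue
from ranked
where (side_flag = 'B' and rn <= 15)
   or (side_flag = 'A' and rn <= 5)
group by ParameterId
```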

How to write the following pl/sql block without using Cursor?

I have written a cursor in a PL/SQL block. The block takes a lot of time when there are many records.
How can I write this without a cursor, or is there any alternative that will reduce the time?
Is there a way to insert into one table and delete from another using a single query?
DECLARE
MDLCursor SYS_REFCURSOR;
BEGIN
open MDLCursor for
select dc.dest_id, dc.digits, dc.Effectivedate, dc.expirydate
from DialCodes dc
INNER JOIN MDL d
ON dc.Dest_ID = d.Dest_ID
AND d.PriceEntity = 1
join sysmdl_calltypes s
on s.call_type_id = v_CallType_ID
and s.dest_id = dc.Dest_ID
and s.call_type_id not in
(select calltype_id from ignore_calltype_for_routing)
order by length(dc.digits) desc, dc.digits desc;
loop
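fetch MDLCursor
into v_mdldest_id, v_mdldigits, v_mdlEffectiveDate, v_mdlExpDate;
exit when MDLCursor%NOTFOUND; -- check right after the fetch so the last row is not processed twice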
insert into tt_pendingcost_temp
(Dest_ID,
Digits,
CCASDigits,
Destination,
tariff_id,
NewCost,
Effectivedate,
ExpiryDate,
previous,
Currency)
select v_mdldest_id,
Digits,
v_mdldigits,
Destination,
tariff_id,
NewCost,
Effectivedate,
ExpiryDate,
previous,
Currency
FROM tt_PendingCost
where substr(Digits, 1, 2) = substr(v_MDLDigits, 1, 2)
and instr(Digits, v_MDLDigits) = 1
and v_mdlEffectiveDate <= effectivedate
and (v_mdlExpDate > effectivedate or v_mdlExpDate is null);
if SQL%ROWCOUNT > 0 then
delete FROM tt_PendingCost
where substr(Digits, 1, 2) = substr(v_MDLDigits, 1, 2)
and instr(Digits, v_MDLDigits) = 1
and v_mdlEffectiveDate <= effectivedate
and (v_mdlExpDate > effectivedate or v_mdlExpDate is null);
end if;
exit when MDLCursor%NOTFOUND;
end loop;
close MDLCursor;
END;
I don't have your tables and your data so I can only guess at a couple of things that would be slowing you down.
Firstly, the query used in your cursor has an ORDER BY clause in it. If this query returns a lot of rows, Oracle has to fetch them all and sort them all before it can return the first row. If this query typically returns a lot of results, and you don't particularly need it to return sorted results, you may find your PL/SQL block speeds up a bit if you drop the ORDER BY. That way, you can start getting results out of the cursor without needing to fetch all the results, store them somewhere and sort them first.
Secondly, the following is the WHERE clause used in your INSERT INTO ... SELECT ... and DELETE FROM ... statements:
where substr(Digits, 1, 2) = substr(v_MDLDigits, 1, 2)
and instr(Digits, v_MDLDigits) = 1
and v_mdlEffectiveDate <= effectivedate
and (v_mdlExpDate > effectivedate or v_mdlExpDate is null);
I don't see how Oracle can make effective use of indexes with any of these conditions. It would therefore have to do a full table scan each time.
The last two conditions seem reasonable and there doesn't seem a lot that can be done with them. I'd like to focus on the first two conditions as I think there's more scope for improvement with them.
The second of the four conditions is
instr(Digits, v_MDLDigits) = 1
This condition holds if and only if Digits starts with the contents of v_MDLDigits. A better way of writing this would be
Digits LIKE v_MDLDigits || '%'
The advantage of using LIKE in this situation instead of INSTR is that Oracle can make use of indexes when using LIKE. If you have an index on the Digits column, Oracle will be able to use it with this query. Oracle would then be able to focus in on those rows that start with the digits in v_MDLDigits instead of doing a full table scan.
The first of the four conditions is:
substr(Digits, 1, 2) = substr(v_MDLDigits, 1, 2)
If v_MDLDigits has length at least 2, and all entries in the Digits columns also have length at least 2, then this condition is redundant since it is implied by the previous one we looked at.
I'm not sure why you would have a condition like this. The only reason I can think why you might have this condition is if you have a functional index on substr(Digits, 1, 2). If not, I would be tempted to remove this substr condition altogether.
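Putting the two suggestions together, the statements inside the loop might end up looking like the sketch below. It is untested. The index name is made up, and the rewrite assumes that v_MDLDigits is never NULL and contains only digits (no % or _ characters), because otherwise LIKE would not behave the same way as the original INSTR test.

```sql
-- Hypothetical index name; skip this if Digits is already indexed.
create index tt_pendingcost_digits_ix on tt_PendingCost (Digits);

-- Inside the PL/SQL loop, the DELETE would then become:
delete from tt_PendingCost
 where Digits like v_MDLDigits || '%'          -- replaces both the SUBSTR and the INSTR condition
   and v_mdlEffectiveDate <= effectivedate
   and (v_mdlExpDate > effectivedate or v_mdlExpDate is null);
```

The WHERE clause of the INSERT INTO ... SELECT would change in the same way.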
I don't think the cursor is what is making this procedure run slowly, and there's no single statement I know of that can insert into one table and delete from another. To make this procedure speed up I think you just need to tune the queries a bit.

Resources