Oracle create operator - oracle

I've recently run into a case where a fuzzy match was useful for categorizing historical unstructured string data. UTL_MATCH is great and has worked well when wrapped in a truthy fuzzy-match function. But I wanted to join ad hoc, so I took the route of building a new function-based operator.
OPERATOR creation: (function-based, FUZZY_MATCH returns 0 for unmatched, 1 for match. A dummy version is included here)
CREATE OR REPLACE FUNCTION FUZZY_MATCH(
  LEFT_ITEM  VARCHAR2,
  RIGHT_ITEM VARCHAR2 )
RETURN NUMBER AS
BEGIN
  -- Dummy body: always reports a match
  RETURN 1;
END;
/
CREATE OR REPLACE OPERATOR RESEMBLES
  BINDING (VARCHAR2, VARCHAR2)
  RETURN NUMBER USING FUZZY_MATCH
;
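For illustration, a non-dummy FUZZY_MATCH might look like the sketch below. It assumes UTL_MATCH.JARO_WINKLER_SIMILARITY (which returns an integer from 0 to 100) is the measure you want; the threshold of 90 is an arbitrary assumption, not something from my real code.

```sql
-- Sketch only: a truthy wrapper around UTL_MATCH.
-- SIMILARITY_THRESHOLD is a hypothetical cut-off; tune it for your data.
CREATE OR REPLACE FUNCTION FUZZY_MATCH(
  LEFT_ITEM  VARCHAR2,
  RIGHT_ITEM VARCHAR2 )
RETURN NUMBER AS
  SIMILARITY_THRESHOLD CONSTANT PLS_INTEGER := 90;
BEGIN
  -- JARO_WINKLER_SIMILARITY returns 0 (no match) .. 100 (identical)
  IF UTL_MATCH.JARO_WINKLER_SIMILARITY(LEFT_ITEM, RIGHT_ITEM) >= SIMILARITY_THRESHOLD THEN
    RETURN 1;
  END IF;
  -- Below the threshold, or the comparison was not true (e.g. NULL input)
  RETURN 0;
END;
/
```

Because the function keeps the same name and signature, the RESEMBLES operator binding does not need to change.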
Creating a new operator works fine, but I've been a little dissatisfied with the resulting SQL syntax (example below).
CREATE TABLE LEFT_TABLE(ARBITRARY_DATA VARCHAR2(200) NOT NULL);
CREATE TABLE RIGHT_TABLE(ARBITRARY_DATA VARCHAR2(200) NOT NULL);
INSERT INTO LEFT_TABLE VALUES ('In a hole in the ground there lived a hobbit.');
INSERT INTO RIGHT_TABLE VALUES ('In the ground there lived a hobbit.');
SELECT
LEFT_TABLE.ARBITRARY_DATA LEFT_DATA,
RIGHT_TABLE.ARBITRARY_DATA RIGHT_DATA
FROM
LEFT_TABLE
INNER JOIN
RIGHT_TABLE
ON 1 = RESEMBLES ( LEFT_TABLE.ARBITRARY_DATA, RIGHT_TABLE.ARBITRARY_DATA )
;
Is there an alternative OPERATOR definition to make the truthiness of the operator implicit, allowing for more natural syntax like the following? I'm on 11gR2
Thanks
SELECT
LEFT_TABLE.ARBITRARY_DATA LEFT_DATA,
RIGHT_TABLE.ARBITRARY_DATA RIGHT_DATA
FROM
LEFT_TABLE
INNER JOIN
RIGHT_TABLE
ON LEFT_TABLE.ARBITRARY_DATA RESEMBLES RIGHT_TABLE.ARBITRARY_DATA;

Related

PL/SQL issue concerning Frequent Itemset

I'm trying to build a PL/SQL application to mine frequent item sets out of a set of given data and I've run into a bit of a snag. My PL/SQL skills aren't as good as I'd like them to be, so perhaps one of you can help me understand this a bit better.
So to begin, I'm using the Oracle data mining procedure DBMS_FREQUENT_ITEMSET.FI_TRANSACTIONAL.
While reading the documentation, I came across the following example which I have manipulated to query over my data set:
CREATE OR REPLACE TYPE FI_VARCHAR_NT AS TABLE OF NUMBER;
/
CREATE TYPE fi_res AS OBJECT (
itemset FI_VARCHAR_NT,
support NUMBER,
length NUMBER,
total_tranx NUMBER
);
/
CREATE TYPE fi_coll AS TABLE OF fi_res;
/
create or replace
PROCEDURE freq_itemset_test is
cursor freqC is
SELECT itemset
FROM table(
CAST(DBMS_FREQUENT_ITEMSET.FI_TRANSACTIONAL(CURSOR(SELECT sale.customerid, sale.productid FROM Sale INNER JOIN Customer ON customer.customerid = sale.customerid WHERE customer.region = 'Canada' )
,0,2, 2, NULL, NULL) AS fi_coll));
coll_nt FI_VARCHAR_NT;
num_rows int;
num_itms int;
BEGIN
num_rows := 0;
num_itms := 0;
OPEN freqC;
LOOP
FETCH freqC INTO coll_nt;
EXIT WHEN freqC%NOTFOUND;
num_rows := num_rows + 1;
num_itms := num_itms + coll_nt.count;
END LOOP;
DBMS_OUTPUT.PUT_LINE('Rows: ' || num_rows || ' Columns: ' || num_itms);
CLOSE freqC;
END;
My reasoning for using the Oracle FI_TRANSACTIONAL over straight SQL is that I will need to repeat this analysis for multiple dynamic values of K, so why reinvent the wheel? Ultimately, my goal is to reference each individual item set returned by the procedure and return the set with the highest support based on some query logic. I will be incorporating this block of PL/SQL into another that basically changes the literal in the query from 'Canada' to multiple other regions based on the content of the data.
My question is: How can I actually get a programmatic reference to the data returned by the cursor (freqC)? Obviously I do not need to count the rows and columns, but that was part of the example. I'd like to print out the item sets with DBMS_OUTPUT.PUT_LINE after I've found the most frequently occurring item set. When I view this in a debugger, I see that each fetch of the cursor actually returns an item set (in this case, k=2, so two items). But how do I actually touch them programmatically? I'd like to grab the sets themselves as well as fi_res.support.
As always, thanks to everyone for sharing their brilliance!
You are fetching your data into a nested table. So to see the data in there, you would need to loop over the nested table:
FOR i IN coll_nt.FIRST .. coll_nt.LAST
LOOP
dbms_output.put_line(i||': '||coll_nt(i));
END LOOP;
For much more information on nested tables and other types of collections, see the presentation at:
http://www.toadworld.com/platforms/oracle/w/wiki/8253.everything-you-need-to-know-about-collections-but-were-afraid-to-ask.aspx
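To also get at the support value, one option is to select it alongside the itemset and fetch both. The following is a minimal sketch that reuses the types and the query from the question; the variable v_support and the ORDER BY are my additions, on the assumption that "highest support" means the first row when sorted by support descending.

```sql
DECLARE
  CURSOR freqC IS
    SELECT itemset, support
    FROM TABLE(CAST(DBMS_FREQUENT_ITEMSET.FI_TRANSACTIONAL(
           CURSOR(SELECT sale.customerid, sale.productid
                  FROM Sale
                  INNER JOIN Customer ON customer.customerid = sale.customerid
                  WHERE customer.region = 'Canada'),
           0, 2, 2, NULL, NULL) AS fi_coll))
    ORDER BY support DESC;
  coll_nt   FI_VARCHAR_NT;
  v_support NUMBER;  -- hypothetical variable, not in the original procedure
BEGIN
  OPEN freqC;
  -- Because of the ORDER BY, the first row is the itemset with the highest support
  FETCH freqC INTO coll_nt, v_support;
  IF freqC%FOUND THEN
    DBMS_OUTPUT.PUT_LINE('Support: ' || v_support);
    -- 1 .. COUNT is safe even for an empty collection, unlike FIRST .. LAST
    FOR i IN 1 .. coll_nt.COUNT LOOP
      DBMS_OUTPUT.PUT_LINE(i || ': ' || coll_nt(i));
    END LOOP;
  END IF;
  CLOSE freqC;
END;
/
```

Looping from 1 to COUNT relies on the collection being dense, which holds for a nested table that has just been fetched from a query.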

Oracle and possible constant predicates in "WHERE" clause

I have a common problem with Oracle in the following example code:
create or replace procedure usp_test
(
p_customerId number,
p_eventTypeId number,
p_out OUT SYS_REFCURSOR
)
as
begin
open p_out for
select e.Id from eventstable e
where
(p_customerId is null or e.CustomerId = p_customerId)
and
(p_eventTypeId is null or e.EventTypeId = p_eventTypeId)
order by Id asc;
end usp_test;
The "OR" in "(p_customerId is null or e.CustomerId = p_customerId)" kills the procedure's performance, because the optimizer does not use the index on the "CustomerId" column optimally: I was hoping for an index seek, but I get a scan instead. The index on "CustomerId" has plenty of distinct values.
When working with MSSQL 2008 R2 (latest SP) or MSSQL 2012, I can hint the query with "option(recompile)", which will:
Recompile just this query
Resolve values for all variables (they are known after the sproc is called)
Replace all resolved variables with constants and eliminate constant predicate parts
For example: if I pass p_customerId = 1000, then the "1000 is null" expression will always be false, so the optimizer will ignore it.
This will add some CPU overhead, but it is used mostly for rarely called massive reports procedures, so no problems here.
Is there any way to do that in Oracle? Dynamic-SQL is not an option.
Addendum
The same procedure without "p_customerId is null" and "p_eventTypeId is null" runs in ~0.041 seconds, while the one above runs in ~0.448 seconds (I have ~5,000,000 rows).
-- Function-based indexes must reference table columns, not procedure parameters.
-- -1 is an assumed sentinel that never occurs as a real id; the columns are numeric,
-- so a string sentinel such as 'x' would not convert.
CREATE INDEX IX_YOURNAME1 ON eventstable (NVL(CustomerId, -1));
CREATE INDEX IX_YOURNAME2 ON eventstable (NVL(EventTypeId, -1));
create or replace procedure usp_test
(
p_customerId number,
p_eventTypeId number,
p_out OUT SYS_REFCURSOR
)
as
begin
open p_out for
select e.Id from eventstable e
where
(NVL(p_customerId, -1) = NVL(e.CustomerId, -1) OR NVL(p_customerId, -1) = -1)
AND (NVL(p_eventTypeId, -1) = NVL(e.EventTypeId, -1) OR NVL(p_eventTypeId, -1) = -1)
order by Id asc;
end usp_test;
A single-column index can't help on its own, because the other columns the query needs are not stored in that index.
Are you allowed to create an index on (CustomerId, EventTypeId, Id)? That way all the needed columns are in the index...
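A sketch of what such a covering index could look like, assuming the column names from the query in the question (the index name is made up):

```sql
-- Hypothetical index name; columns taken from the query in the question.
-- With all three columns in the index, the query can be answered
-- from the index alone without visiting the table.
CREATE INDEX ix_events_cust_type_id
  ON eventstable (CustomerId, EventTypeId, Id);
```

Whether the optimizer actually picks it for the "is null or" predicates is something to confirm with the execution plan.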

Oracle pipelined function cannot access remote table (ORA-12840) when used in a union

I have created a pipelined function which returns a table. I use this function like a dynamic view in another function, in a with clause, to mark certain records. I then use the results from this query in an aggregate query, based on various criteria. What I want to do is union all these aggregations together (as they all use the same source data, but show aggregations at different hierarchical levels).
When I produce the data for individual levels, it works fine. However, when I try to combine them, I get an ORA-12840 error: cannot access a remote table after parallel/insert direct load txn.
(I should note that my function and queries are looking at tables on a remote server, via a DB link).
Any ideas what's going on here?
Here's an idea of the code:
function getMatches(criteria in varchar2) return myTableType pipelined;
...where this function basically executes some dynamic SQL, which references remote tables, as a reference cursor and spits out the results.
Then the factored queries go something like:
with marked as (
select id from table(getMatches('OK'))
),
fullStats as (
select mainTable.id,
avg(nvl2(marked.id, 1, 0)) isMarked,
sum(mainTable.val) total
from mainTable
left join marked
on marked.id = mainTable.id
group by mainTable.id
)
The reason for the first factor is speed -- if I inline it, in the join, the query goes really slowly -- but either way, it doesn't alter the status of whatever's causing the exception.
Then, say for a complete overview, I would do:
select sum(total) grandTotal
from fullStats
...or for an overview by isMarked:
select sum(total) grandTotal
from fullStats
where isMarked = 1
These work fine individually (my pseudocode may be wrong or overly simplistic, but you get the idea), but as soon as I union all them together, I get the ORA-12840 error :(
EDIT By request, here is an obfuscated version of my function:
function getMatches(
search in varchar2)
return idTable pipelined
as
idRegex varchar2(20) := '(05|10|20|32)\d{3}';
searchSQL varchar2(32767);
type rc is ref cursor;
cCluster rc;
rCluster idTrinity;
BAD_CLUSTER exception;
begin
if regexp_like(search, '^L\d{3}$') then
searchSQL := 'select distinct null id1, id2_link id2, id3_link id3 from anotherSchema.linkTable@my.remote.link where id2 = ''' || search || '''';
elsif regexp_like(search, '^' || idRegex || '(,' || idRegex || ')*$') then
searchSQL := 'select distinct null id1, id2, id3 from anotherSchema.idTable@my.remote.link where id2 in (' || regexp_replace(search, '(\d{5})', '''\1''') || ')';
else
raise BAD_CLUSTER;
end if;
open cCluster for searchSQL;
loop
fetch cCluster into rCluster;
exit when cCluster%NOTFOUND;
pipe row(rCluster);
end loop;
close cCluster;
return;
exception
when BAD_CLUSTER then
raise_application_error(-20000, 'Invalid Cluster Search');
return;
when others then
raise_application_error(-20999, 'API' || sqlcode || chr(10) || sqlerrm);
return;
end getMatches;
It's very simple, designed for an API with limited access to the database, in terms of sophistication (hence passing a comma delimited string as a possible valid argument): If you supply a grouping code, it returns linked IDs (it's a composite, 3-field key); however, if you supply a custom list of codes, it just returns those instead.
I'm on Oracle 10gR2; not sure which version exactly, but I can look it up when I'm back in the office :P
To be honest, I have no idea where the issue comes from, but the simplest way to solve it is to create a temporary table, populate it with the values from your pipelined function, and use that table inside the WITH clause. The temp table does have to be created first, but I'm pretty sure you will see a serious performance gain, because dynamic sampling isn't applied to pipelined functions without tricks.
P.S. The issue can also be fixed with: with marked as ( select /*+ INLINE */ id from table(getMatches('OK')) ), but that surely isn't what you're looking for. It does confirm my suspicion, though: WITH does something like 'insert /*+ APPEND */' internally.
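The temporary-table approach could be sketched roughly as follows. The table name marked_ids, the NUMBER column type, and the lvl label column are assumptions for illustration; the rest follows the pseudocode in the question. If getMatches lives in a package, qualify it with the package name.

```sql
-- One-time setup. Table name and column type are assumptions.
CREATE GLOBAL TEMPORARY TABLE marked_ids (
  id NUMBER
) ON COMMIT DELETE ROWS;

-- Per transaction: materialize the pipelined function's output first...
INSERT INTO marked_ids (id)
SELECT id FROM TABLE(getMatches('OK'));

-- ...then run the combined query against the temporary table
-- instead of calling the function inside the WITH clause.
WITH fullStats AS (
  SELECT mainTable.id,
         AVG(NVL2(marked_ids.id, 1, 0)) isMarked,
         SUM(mainTable.val) total
  FROM mainTable
  LEFT JOIN marked_ids
    ON marked_ids.id = mainTable.id
  GROUP BY mainTable.id
)
SELECT 'all' lvl, SUM(total) grandTotal
FROM fullStats
UNION ALL
SELECT 'marked' lvl, SUM(total) grandTotal
FROM fullStats
WHERE isMarked = 1;
```

With ON COMMIT DELETE ROWS the rows vanish at commit, so the insert and the query need to run in the same transaction.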

Functional Where-In Clause - Oracle PL/SQL

I've got an association table that groups accounts together.
I'm trying to select a subset of table 'target'
p_group_id := 7;
select *
from target t
where t.account_id in get_account_ids(p_group_id);
Is it possible to write a function that returns a list of account_ids (as some form of collection) that would facilitate the above code?
I've looked at pipelined functions, however I want to stay away from loops and cursors. Also, custom types / table also get into casting that I'd like to avoid.
For reference, here's some pseudocode for what the function 'get_account_ids' would do, hypothetically:
function get_account_ids(p_group_id)
insert into v_ret
select aa.account_id
from assoc_account aa
where aa.groupid = p_group_id;
return v_ret;
You simply need:
select *
from target t
where t.account_id in
( select aa.account_id
from assoc_account aa
where aa.groupid = 7
);
The above will work, assuming that assoc_account.account_id is never NULL.
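If you really do want the function form, here is a minimal sketch that avoids explicit loops, cursors, and custom types by using BULK COLLECT into the built-in SYS.ODCINUMBERLIST type. It assumes account_id is numeric; note that SYS.ODCINUMBERLIST is a varray, so the fetch fails if a group has more than 32767 accounts.

```sql
CREATE OR REPLACE FUNCTION get_account_ids(p_group_id IN NUMBER)
  RETURN SYS.ODCINUMBERLIST
AS
  v_ret SYS.ODCINUMBERLIST;
BEGIN
  -- BULK COLLECT fills the collection in one statement, no loop needed
  SELECT aa.account_id
  BULK COLLECT INTO v_ret
  FROM assoc_account aa
  WHERE aa.groupid = p_group_id;
  RETURN v_ret;
END get_account_ids;
/

-- The collection still has to be unnested with TABLE() to be used in IN
SELECT *
FROM target t
WHERE t.account_id IN (SELECT column_value FROM TABLE(get_account_ids(7)));
```

The plain subquery above is still likely the better choice, since it lets the optimizer see the real table instead of an opaque function call.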

Oracle function to return a table included in a where clause

I would like to create a function which returns a table of results. Something like
select * from address where zipcode in (f_zips_in_radius(45.123,-93.123,50));
I have been working on that function but don't have anything working so will exclude my attempts so as not to muddle the question.
Assuming that it is possible, How would I implement it?
A combination of comments led me to this answer. Thanks @a_horse_with_no_name and @Ray Toal
Here was my final solution
CREATE OR REPLACE PACKAGE pkg_distance AS
TYPE vcharset_t IS TABLE OF VARCHAR2(20);
FUNCTION zips_in_radius(i_lat number, i_lon number, i_radius NUMBER) RETURN vcharset_t PIPELINED;
END;
/
CREATE OR REPLACE PACKAGE BODY pkg_distance AS
FUNCTION zips_in_radius(i_lat number, i_lon number, i_radius NUMBER) RETURN vcharset_t PIPELINED IS
BEGIN
for r in (
select zipcode from zipdata z where f_distance(i_lat, i_lon, z.lat, z.lon) <= i_radius
)
loop
pipe row ( r.zipcode);
end loop;
return;
END;
END;
Then to use it I am using a query like:
--Boston City Centered on 42.360637,-71.0587120
select * from address a
where substr(a.zipcode,1,5) in
(select * from TABLE(pkg_distance.zips_in_radius(42.360637,-71.0587120,60)))
Which in my opinion still has an extra "select * from TABLE(" for my comfort, but is still manageable.
The problem is that a zipcode can be anything, so if you are treating it like a number it becomes easier. You have to compute every possible permutation between x and y and then pass it back.
Since I do not know your app and your case, how many results can you expect? That IN clause can be very costly, and you might have to materialize the result set, or do some clever tricks.
Also, I am guessing you have to cast this into a table...
