Google-like query for Oracle database - oracle

I want to do a lookup on a column in an Oracle table that contains a company name.
If I ask for "Jennifer's Dry Cleaners" I'd like to return not only that exact match but also "close matches" like (but not limited to):
Jennifer's Pipe Cleaners
Jessica's Dry Cleaners
Jennifer Pipe Cleanup
Pipe Cleaning by Jennifer
Is there a way to accomplish this?

You might be able to use the LIKE operator on each word and combine that with UTL_MATCH.EDIT_DISTANCE to order the results by closeness:
SELECT
company_name,
utl_match.edit_distance(lower(company_name), 'jennifers dry cleaners') edit_distance
FROM company_names
WHERE
lower(company_name) LIKE '%jennifer%' OR
lower(company_name) LIKE '%dry%' OR
lower(company_name) LIKE '%cleaner%'
ORDER BY edit_distance asc;
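As a variation on the query above, UTL_MATCH also offers JARO_WINKLER_SIMILARITY, which returns an integer from 0 to 100 and can serve as both filter and sort key. This is only a sketch: it assumes the same company_names table, and the cutoff of 80 is an arbitrary illustration rather than a recommended value.

```sql
-- Sketch only: rank rows by Jaro-Winkler similarity to the search string.
-- The cutoff (80) is an assumed value; tune it against real data.
SELECT company_name, similarity
FROM (
    SELECT company_name,
           utl_match.jaro_winkler_similarity(
               lower(company_name), 'jennifer''s dry cleaners') AS similarity
    FROM company_names
)
WHERE similarity >= 80
ORDER BY similarity DESC;
```

Because the comparison is made on the whole string, reordered names such as "Pipe Cleaning by Jennifer" will likely score low, and the function is evaluated for every row, so this suits small tables better than large ones.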

Use Oracle Text indexes. You can either index a particular column of data, or use a stored procedure to associate a group of terms with a particular record.
Here's an example of indexing a particular column of a table. See documentation on CTX_DDL for how to index using a "Data Store", for indexing multiple columns, even from different sources.
9:28:34 AM > create table NOTE_TABLE (NOTE_ID NUMBER constraint
NOTE_TABLE_PK primary key, NOTE_TEXT VARCHAR2(4000));
Table created
Executed in 0.045 seconds
9:28:34 AM > insert into NOTE_TABLE values(1, 'This is the contents
of a note that has the word "glory"');
1 row inserted
Executed in 0.023 seconds
9:28:34 AM > insert into NOTE_TABLE values(2, 'Mine eyes have see the
Glory of the coming of the LORD');
1 row inserted
Executed in 0.01 seconds
9:51:15 AM> insert into NOTE_TABLE values(3, 'This is Jennifer''s Pipe cleaning');
1 row inserted
Executed in 0.008 seconds
9:28:34 AM > commit;
Commit complete
Executed in 0.014 seconds
9:28:34 AM > begin
ctx_ddl.create_preference('my_word_list', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('my_word_list','FUZZY_MATCH','ENGLISH');
ctx_ddl.set_attribute('my_word_list','FUZZY_SCORE','65');
ctx_ddl.set_attribute('my_word_list','FUZZY_NUMRESULTS','5000');
ctx_ddl.set_attribute('my_word_list','SUBSTRING_INDEX','TRUE');
ctx_ddl.set_attribute('my_word_list','PREFIX_INDEX','YES');
ctx_ddl.set_attribute('my_word_list','PREFIX_MIN_LENGTH', 1);
ctx_ddl.set_attribute('my_word_list','PREFIX_MAX_LENGTH', 64);
ctx_ddl.set_attribute('my_word_list','STEMMER','ENGLISH');
end;
/
PL/SQL procedure successfully completed
Executed in 0.024 seconds
9:28:34 AM > CREATE INDEX NOTE_TABLE_TEXT_IX on NOTE_TABLE(NOTE_TEXT) indextype is
ctxsys.CONTEXT parameters ('wordlist my_word_list');
Index created
Executed in 0.321 seconds
9:28:35 AM > select NOTE_ID from NOTE_TABLE t where contains(NOTE_TEXT, '{glory}', 10) > 0;
NOTE_ID
----------
1
2
Executed in 0.101 seconds
9:28:35 AM > select NOTE_ID from NOTE_TABLE t where contains(NOTE_TEXT, '{glory} and {lord}', 10) > 0;
NOTE_ID
----------
2
Executed in 0.055 seconds
9:51:16 AM> select NOTE_ID from NOTE_TABLE t where contains(NOTE_TEXT, 'fuzzy(cleaners, 1, 3)', 10) > 0;
NOTE_ID
----------
3
Executed in 0.056 seconds
9:28:35 AM > drop table NOTE_TABLE;
Table dropped
Executed in 0.283 seconds
9:28:35 AM > begin
ctx_ddl.drop_preference('my_word_list');
end;
/
PL/SQL procedure successfully completed
Executed in 0.012 seconds
9:28:36 AM >
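Applied to the original question, the same technique might look like the sketch below. It assumes a company_names table with a CONTEXT index on company_name, created with a wordlist like my_word_list above; the table, the score threshold of 60 and the expansion limit of 100 are illustrative assumptions, not values taken from the session.

```sql
-- Sketch only: assumes a CONTEXT index already exists on
-- company_names(company_name), built like NOTE_TABLE_TEXT_IX above.
-- ACCUM adds up the scores of the individual terms, so rows that
-- match more of the words rank higher.
SELECT company_name, SCORE(1) AS relevance
FROM company_names
WHERE CONTAINS(company_name,
               'fuzzy(jennifer, 60, 100, weight) ACCUM '
            || 'fuzzy(dry, 60, 100, weight) ACCUM '
            || 'fuzzy(cleaners, 60, 100, weight)', 1) > 0
ORDER BY SCORE(1) DESC;
```

The label (1) ties SCORE to the matching CONTAINS call. A row only needs to match one of the fuzzy terms to be returned, which is what allows "close matches" such as "Jessica's Dry Cleaners" to appear lower in the list.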

simple random sampling while pulling data from warehouse(oracle engine) using proc sql in sas

I need to pull a huge amount of data, say 600-700 variables from different tables in a data warehouse. The dataset in its raw form will easily reach 150 GB (79 MM rows), and for my analysis I need only a million rows. How can I pull the data with proc sql directly from the warehouse while doing simple random sampling on the rows?
The code below won't work because ranuni is not supported by Oracle:
proc sql outobs =1000000;
select * from connection to oracle(
select * from tbl1 order by ranuni(12345));
quit;
How do you propose I do it?
Use the DBMS_RANDOM Package to Sort Records and Then Use A Row Limiting Clause to Restrict to the Desired Sample Size
The dbms_random.value function returns a random number between 0 and 1 for each row in the table, and we sort in ascending order of that random value.
Here is how to produce the sample set you identified (the FETCH FIRST row limiting clause requires Oracle 12c or later):
SELECT
*
FROM
(
SELECT
*
FROM
tbl1
ORDER BY dbms_random.value
)
FETCH FIRST 1000000 ROWS ONLY;
To demonstrate with the sample schema table, emp, we sample 4 records:
SCOTT@DEV> SELECT
empno,
rnd_val
FROM
(
SELECT
empno,
dbms_random.value rnd_val
FROM
emp
ORDER BY rnd_val
)
FETCH FIRST 4 ROWS ONLY;
EMPNO RND_VAL
7698 0.06857749035643605682648168347885993709
7934 0.07529612360785920635181751566833986766
7902 0.13618520865865754766175030040204331697
7654 0.14056380246495282237607922497308953768
SCOTT@DEV> SELECT
empno,
rnd_val
FROM
(
SELECT
empno,
dbms_random.value rnd_val
FROM
emp
ORDER BY rnd_val
)
FETCH FIRST 4 ROWS ONLY;
EMPNO RND_VAL
7839 0.00430658806761508024693197916281775492
7499 0.02188116061148367312927392115186317884
7782 0.10606515700372416131060633064729870016
7788 0.27865276349549877512032787966777990909
With the example above, notice that the set of empno values returned differs between the two executions of the same query.
The performance might be an issue with the row counts you are describing.
EDIT:
With a table on the order of 150 GB and 79 MM rows, any sort would be painful.
If the table had a surrogate key based on a sequence incremented by 1, we could take the approach of selecting every nth record based on the key.
e.g.
--scenario n = 3000
SELECT
*
FROM
tbl1
WHERE
mod(table_id, 3000) = 0;
This approach would not use an index (unless a function based index is created), but at least we are not performing a sort on a data set of this size.
I performed an explain plan with a table that has close to 80 million records and it does perform a full table scan (the condition forces this without a function based index) but this looks tenable.
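Another option that avoids both the sort and the need for a sequential key is Oracle's SAMPLE clause, which returns an approximate percentage of a table's rows. The sketch below assumes tbl1 is a plain table queried on its own; the percentage and the seed are illustrative values chosen so that about 1 million of 79 million rows come back.

```sql
-- Sketch only: ask Oracle for roughly 1.3% of the rows
-- (about 1 million of 79 million). SAMPLE works on an approximate
-- percentage, so the exact row count is not guaranteed.
-- SEED makes the sample repeatable while the data is unchanged;
-- leave it out to get a different sample on each run.
SELECT *
FROM tbl1 SAMPLE (1.3) SEED (12345);
```

If an exact row count matters, one approach is to sample slightly more than needed and trim the result afterwards, for example with outobs on the SAS side.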
None of the posted answers or comments solved my problem. They might have, but we have 87 MM rows.
I wanted a solution using SAS. Here is what I did, and it works. Thanks all!
libname dwh path username pwd;
proc sql;
create table sample as
select
<all the variables>, ranuni(any arbitrary seed)
from dwh.<all the tables>
<bunch of where conditions goes here>;
quit;

Nested PL/SQL cursors and For loop work OK but "Connecting to the database xxx"

Intro:
I'm a PL/SQL beginner on 11g Express, so please show mercy. This procedure has two nested cursors and one FOR loop; it is meant to duplicate order rows based on their Num value for further processing.
e.g. If Num is greater than 1, duplicate the whole row Num times and divide the transaction amount by Num.
Raw data:
Table:D_MP
BARCODE Product Amount ID Num
76Q7Q7 Water 10.00 20160601 2
8JJ1NK Apple 5.50 20160601 1
8JJ1YK Orange 4.50 20160608 1
8JJ1CK Banana 4.00 20160608 2
Result:
Table:D_MP_1
BARCODE Product Amount ID Num
76Q7Q7 Water 5.00 20160601 1
76Q7Q7 Water 5.00 20160601 1
8JJ1NK Apple 5.50 20160601 1
8JJ1YK Orange 4.50 20160608 1
8JJ1CK Banana 2.00 20160608 1
8JJ1CK Banana 2.00 20160608 1
Problem:
The output of this procedure looks all right, but every time I run it, the SQL Developer log keeps showing "Connecting to the database xxx" and never shows that it disconnected from database xxx, so I have to stop the procedure myself. Since I have only 10+ rows of raw data for testing, I wonder what prevents it from disconnecting/stopping.
Here's the code
Create or replace PROCEDURE MULTIPLE_CURSORS_TEST is
pac_row D_MP%ROWTYPE;
v_barcode D_MP.BARCODE%TYPE;
v_Num D_MP.Num%TYPE;
v_Amount D_MP.Amount%TYPE;
CURSOR mul_pa_LOOP_CURSOR
IS
Select *
from D_MP;
CURSOR mul_pa_ID_CURSOR
IS
Select distinct ID
from D_MP;
BEGIN
OPEN mul_pa_loop_CURSOR;
LOOP
OPEN mul_pa_ID_CURSOR;
LOOP
FETCH mul_pa_loop_cursor INTO pac_row;
exit when mul_pa_loop_cursor%notfound;
exit when mul_pa_ID_cursor%notfound;
v_Num:= pac_row.Num;
v_Amount:=pac_row.Amount/pac_row.Num;
FOR i IN 1..v_Num
LOOP
INSERT INTO D_MP_1 ("BARCODE","Product", "Amount","ID","Num")
VALUES (pac_row."BARCODE",pac_row."Product",v_Amount,pac_row."ID",'1');
END LOOP;
COMMIT;
END LOOP;
CLOSE mul_pa_ID_CURSOR;
END LOOP;
CLOSE mul_pa_loop_CURSOR;
EXCEPTION
WHEN OTHERS
THEN
raise_application_error(-20001,'An error was encountered - '||SQLCODE||' -ERROR- '||SQLERRM);
END MULTIPLE_CURSORS_TEST;
Actually, one cursor is sufficient for this duplicating function alone. But I want to do more at the transaction level for promotional activities based on this procedure, so I am sticking with two cursors: mul_pa_LOOP_CURSOR for the entire rows and mul_pa_ID_CURSOR for each distinct transaction ID.
Feel free to comment on this procedure.
Assuming that num is always an integer, it can be done with a single SQL query:
select barcode, product, amount/num as amount, 1 as num
from table1 t1
join (
select level l from dual
connect by level <= ( select max(num) from table1 )
) x
on x.l <= t1.num
order by 1,2
BARCOD PRODUC AMOUNT NUM
------ ------ ---------- ----------
76Q7Q7 Water 5 1
76Q7Q7 Water 5 1
8JJ1CK Banana 2 1
8JJ1CK Banana 2 1
8JJ1NK Apple 5,5 1
8JJ1YK Orange 4,5 1
On Oracle 12c it is even easier with the help of a LATERAL join:
select barcode, product, amount/num as amount, 1 as num
from table1 t1,
lateral (
SELECT null FROM dual
CONNECT BY LEVEL <= t1.num
) x
order by 1,2
;
BARCOD PRODUC AMOUNT NUM
------ ------ ---------- ----------
76Q7Q7 Water 5 1
76Q7Q7 Water 5 1
8JJ1CK Banana 2 1
8JJ1CK Banana 2 1
8JJ1NK Apple 5,5 1
8JJ1YK Orange 4,5 1
I don't understand why you have three loops.
First loop is for "mul_pa_loop_CURSOR"
Second loop is for "mul_pa_ID_CURSOR"
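Third loop is a FOR loop on numbers
You are fetching records only from the first cursor. At least in the code you have shared, the second loop seems redundant. Note also that the outer loop has no EXIT of its own: once the inner loop exits, the outer loop reopens mul_pa_ID_CURSOR and starts over, so the procedure never finishes. Either remove the second loop, or move the FETCH and the first EXIT statement into the first loop, since they are currently in the second loop.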
OPEN mul_pa_loop_CURSOR;
LOOP
OPEN mul_pa_ID_CURSOR;
LOOP
FETCH mul_pa_loop_cursor INTO pac_row; -- move this in first loop
exit when mul_pa_loop_cursor%notfound; -- move this in first loop
exit when mul_pa_ID_cursor%notfound;
Like this
OPEN mul_pa_loop_CURSOR;
LOOP
FETCH mul_pa_loop_cursor INTO pac_row;
exit when mul_pa_loop_cursor%notfound;
OPEN mul_pa_ID_CURSOR;
LOOP
-- There should be a separate FETCH statement for this loop
exit when mul_pa_ID_cursor%notfound;
Hope this helps.
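For the duplication step alone, a cursor FOR loop sidesteps the problem because it opens, fetches from and closes the cursor implicitly and ends after the last row. The following is a minimal sketch, not the asker's final procedure: it assumes D_MP and D_MP_1 have ordinary (unquoted) column names, and it leaves out the per-ID processing that the second cursor was meant for.

```sql
-- Sketch only: assumes D_MP and D_MP_1 have ordinary (unquoted) columns
-- BARCODE, PRODUCT, AMOUNT, ID and NUM; add double quotes if your
-- column names are case-sensitive.
CREATE OR REPLACE PROCEDURE multiple_cursors_test IS
BEGIN
    -- The cursor FOR loop opens, fetches and closes the cursor implicitly
    -- and stops by itself after the last row.
    FOR pac_row IN (SELECT barcode, product, amount, id, num FROM d_mp) LOOP
        -- Insert the row NUM times, splitting the amount evenly.
        FOR i IN 1 .. pac_row.num LOOP
            INSERT INTO d_mp_1 (barcode, product, amount, id, num)
            VALUES (pac_row.barcode, pac_row.product,
                    pac_row.amount / pac_row.num, pac_row.id, 1);
        END LOOP;
    END LOOP;
    COMMIT;
END multiple_cursors_test;
/
```

The WHEN OTHERS handler from the original is omitted here on purpose, so that any error surfaces with its original error stack.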

Can a sequence guarantee the given order?

Using this Oracle sequence definition:
CREATE SEQUENCE MY_SPECIAL_COUNTER
START WITH 100
INCREMENT BY -1
MAXVALUE 100
MINVALUE 0
NOCYCLE
NOCACHE
ORDER
;
Is it guaranteed that this sequence will ALWAYS return
each number once from 100 to 0,
in the given order from 100 to 0,
with no single number omitted, and
the correct number regardless of multiple concurrent sessions at the very moment of the request?
Yes, the sequence will return each number once, starting with 100, progressively down to 0.
Yes, it will return them in order from 100 down to 0.
Correct, it will not skip any numbers.
Multiple sessions? That depends.
Consider the following:
create table junk ( id number );
CREATE SEQUENCE MY_SPECIAL_COUNTER
START WITH 100
INCREMENT BY -1
MAXVALUE 100
MINVALUE 0
NOCYCLE
NOCACHE
ORDER
;
-- in session 1 do the following:
insert into junk
select my_special_counter.nextval from dual;
insert into junk
select my_special_counter.nextval from dual;
SQL> select * from junk;
ID
----------
100
99
SQL>
-- in session 2 do the following:
insert into junk
select my_special_counter.nextval from dual;
SQL> select * from junk;
ID
----------
98
SQL> commit;
-- in session 1 do the following:
rollback;
-- in session 3 do the following:
SQL> select * from junk;
ID
----------
98
SQL>
As you can see, once you introduce multiple sessions, all kinds of fun things can happen. Numbers can be "lost" or "skipped". Note that it is NOT the sequence doing it, but rather the session that pulled the sequence value and then discarded it (e.g. a job abends and rolls back, a logic error, etc.).
Also, if session 1 pulls a sequence value first but session 2 commits first, others will "think" session 2 inserted a number "out of order". So to advise further, we really need to understand your requirements in detail.
However, that should help you understand sequences properly. :)
Good luck!

PL/SQL select returns more than one line

I'm embarrassed to admit this is a totally noob question - but I take shelter in the fact that I come from a T-SQL world and this is a totally new territory for me
This is a simple table I have with 4 records only
ContractorID ProjectID Cost
1 100 1000
2 100 800
3 200 1005
4 300 2000
This is my PL/SQL function, which should take a contractor ID and a project ID and return the cost (1000 in this case):
create or replace FUNCTION GetCost(contractor_ID IN NUMBER,
project_ID in NUMBER)
RETURN NUMBER
IS
ContractorCost NUMBER;
BEGIN
Select Cost INTO ContractorCost
from Contractor_Project_Table
where ContractorID= contractor_ID and ProjectID =project_ID ;
return ContractorCost;
END;
But then using
select GetCost(1,100) from Contractor_Project_Table;
This returns same row 4 times
1000
1000
1000
1000
What is wrong here? Why is this returning 4 rows instead of 1?
Thank you for
As @a_horse_with_no_name points out, the problem is that Contractor_Project_Table has (presumably) 4 rows, so any SELECT against Contractor_Project_Table with no WHERE clause will always return 4 rows. Your function is getting called 4 times, once for each row in the table.
If you want to call the function with a single set of parameters and return a single row of data, you probably want to select from the dual table:
SELECT GetCost( 1, 100 )
FROM dual
Because you have 4 rows in Contractor_Project_Table table. Use this query to get one record.
select GetCost(1,100) from dual;

getting last record of a cursor

If I have a cursor cur which contains these records
DEPTNO ENAME ORD CNT
10 KING 1 3
10 CLARK 2 3
10 MILLER 3 3
20 JONES 1 5
I get my cursor record like this :
FOR i IN cur LOOP
--Process
END LOOP;
Now I need to enhance my process and add a check: if the value of the CNT column of the last record is equal to 5, I don't need to loop through this cursor at all.
Is there a way to get the last record of the cursor directly, to test the CNT column without looping?
No. A cursor is a pointer to a program that executes the query. You can only fetch from the cursor. Oracle itself has no idea what the last row the cursor will return until you attempt to fetch a row and find there are no more rows to return.
You could, of course, modify the query so that the CNT of the last row is returned in a separate column (assuming that you have some way to order the rows so that the "last row" is a meaningful concept).
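One way to do that is with an analytic function, so that every row, including the first one fetched, carries the CNT of the final row. This is a sketch under assumptions: emp_counts is a hypothetical stand-in for whatever the cursor really selects from, and (DEPTNO, ORD) is assumed to define the row order.

```sql
-- Sketch only: expose the CNT of the final row (ordered by DEPTNO, ORD)
-- on every row. emp_counts is a hypothetical source; substitute the
-- cursor's real query.
SELECT deptno, ename, ord, cnt,
       LAST_VALUE(cnt) OVER (
           ORDER BY deptno, ord
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_cnt
FROM emp_counts
ORDER BY deptno, ord;
```

With this query the process still has to fetch once, but it can test last_cnt on the first row and exit the loop immediately when the value is 5. The explicit window frame matters: with only an ORDER BY, the default frame ends at the current row, so LAST_VALUE would return the current row's own CNT (or that of a tied row) instead of the final one.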
