utl_match comparing many records - oracle

I have two tables: one with 1 million records and the other with 40,000 records.
For each record in one table, I need to check whether there is a similar string in the other table.
The problem is that this procedure is very slow.
I need to optimize it:
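declare
  num_coincidencias pls_integer;
begin
  for tablea in
  ( select first_name||' '||last_name as fullname from employee )
  loop
    -- count the rows in the other table whose name is similar to this full name
    SELECT COUNT(*)
    INTO num_coincidencias
    FROM tableb b
    WHERE utl_match.jaro_winkler_similarity(b.name, tablea.fullname) > 98;
    dbms_output.put_line(num_coincidencias);
  end loop;
end;
/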

You do realize you are doing 40 billion comparisons (1,000,000 x 40,000)? This is going to take a long time no matter what method you use.
Turning this into a single SQL statement will eliminate the context switches between PL/SQL and SQL, although I don't know whether your computer has the resources to do it all in one statement:
SELECT COUNT (*) c, a.first_name || ' ' || a.last_name full_name
FROM employee a CROSS JOIN tableb b
WHERE UTL_MATCH.jaro_winkler_similarity (b.name, a.first_name || ' ' || a.last_name) > 98
GROUP BY a.first_name || ' ' || a.last_name
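If the result has to mirror the original loop (one count per employee row, including employees with zero matches), a scalar subquery is one possible shape. This is only a sketch: it assumes the second table is called tableb and has a name column, as in the question, and it still performs the same 40 billion comparisons.

```sql
-- One output row per employee row; employees without any match get 0,
-- just as the PL/SQL loop would print 0 for them.
SELECT a.first_name || ' ' || a.last_name AS full_name,
       (SELECT COUNT(*)
          FROM tableb b
         WHERE UTL_MATCH.jaro_winkler_similarity(
                 b.name,
                 a.first_name || ' ' || a.last_name) > 98) AS num_coincidencias
  FROM employee a;
```

Unlike the GROUP BY version, this does not merge two employees who happen to share the same full name.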

Related

Subquery as CASE WHEN condition

In the query below, a syntax error is raised at AS PQ_COUNT:
SELECT CASE WHEN
RESULTS LIKE '%PQ - Duplicate%' AND
(SELECT COUNT(*) FROM MY_TABLE WHERE ID = '998877'AND FINAL_RESULTS='FL_57') AS PQ_COUNT >= 1
THEN 'PQ count = '|| PQ_COUNT
ELSE RESULTS END AS RESULTS
If I move AS PQ_COUNT inside the subquery,
(SELECT COUNT(*) AS PQ_COUNT FROM MY_TABLE WHERE ID = '998877'AND FINAL_RESULTS='FL_57') >= 1
the reference to PQ_COUNT in the THEN branch becomes an invalid identifier (ORA-00904).
What is going wrong when I use a subquery as a CASE WHEN condition?
One option is to use a subquery (or a CTE, as in my example) to calculate number of rows that satisfy condition, and then - as it contains only one row - cross join it to my_table. Something like this:
WITH
  my_table (id, final_results, results) AS
    -- sample data
    (SELECT '998877', 'FL_57', 'PQ - Duplicate' FROM DUAL),
  cnt AS
    -- calculate COUNT first ...
    (SELECT COUNT (*) pq_count                --> pq_count
       FROM MY_TABLE
      WHERE ID = '998877'
        AND FINAL_RESULTS = 'FL_57')
-- ... then re-use it in "main" query
SELECT CASE
         WHEN a.results LIKE '%PQ - Duplicate%'
              AND b.pq_count >= 1             --> reused here
         THEN
           'PQ count = ' || b.PQ_COUNT        --> and here
         ELSE
           a.results
       END AS results
  FROM my_table a CROSS JOIN cnt b;

RESULTS
---------------------------------------------------
PQ count = 1
You cannot refer to an alias in the same sub-query where you create it; you need to nest sub-queries (or use a sub-query factoring clause; also called a CTE or WITH clause) and refer to it in the outer one:
SELECT CASE
         WHEN results LIKE '%PQ - Duplicate%'
              AND pq_count >= 1
         THEN 'PQ count = ' || pq_count
         ELSE results
       END AS RESULTS
FROM   (
  SELECT results,
         ( SELECT COUNT(*)
           FROM   MY_TABLE
           WHERE  ID = '998877'
           AND    FINAL_RESULTS = 'FL_57'
         ) AS pq_count
  FROM   your_table
);

PL/SQL Oracle procedure does not return any value

I have this Oracle procedure that reads a parameter with a varchar value, and when I use this parameter value inside the procedure it does not work. Everything is explained below.
CREATE OR REPLACE procedure test_pro(read_batch in varchar2 )
as
v_read_batches varchar2(500);
begin
v_read_batches := '''' || replace(read_batch, ',', ''',''') || '''';
--v_read_batches VALUE IS '100','1000','11','9200'
SELECT CODE,BANK_NAME_ARABIC,BANK_CODE,to_number(BATCH_ID)BATCH_ID FROM (select 1 CODE,PB.BANK_NAME_ARABIC ,to_char(PB.BANK_CODE)BANK_CODE,
CASE PB.BANK_CODE
WHEN 1000
THEN 1000
WHEN 100
THEN 100
ELSE 9200
END batch_id
from BANKS PB
WHERE PB.BANK_CODE IN (1000,100,11200)
union
SELECT 2 CODE,'Other Banks' other_banks,listagg(PB.BANK_CODE , ', ')
within group(order by PB.BANK_CODE ) as BANK_CODE, 11 batch_id
FROM BANKS PB
WHERE PB.BANK_CODE NOT IN (1000,100,9200))
WHERE to_char(BATCH_ID) IN (v_read_batches)
end test_pro;
The problem is that when I put v_read_batches inside the SQL condition, it does not return any value. When I execute
the SQL below on its own, with the same value that is in the v_read_batches variable, it works and returns the values!
SELECT CODE,BANK_NAME_ARABIC,BANK_CODE,to_number(BATCH_ID)BATCH_ID
FROM (select 1 CODE,PB.BANK_NAME_ARABIC
,to_char(PB.BANK_CODE)BANK_CODE, CASE PB.BANK_CODE
WHEN 1000
THEN 1000
WHEN 100
THEN 100
ELSE 9200 END batch_id from BANKS PB WHERE PB.BANK_CODE IN (1000,100,11200)
union SELECT 2 CODE,'Other Banks' other_banks,listagg(PB.BANK_CODE ,
', ') within group(order by PB.BANK_CODE ) as BANK_CODE, 11 batch_id
FROM BANKS PB WHERE PB.BANK_CODE NOT IN (1000,100,9200))
WHERE to_char(BATCH_ID) IN ('100','1000','11','9200')
You cannot build a string like this and hope to use it in an IN clause. The elements in an IN clause are static, i.e., if you code
col in ('123,456')
then we are looking for COL to match the string '123,456' not the elements 123 and 456.
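The same effect can be reproduced with literals. This is a small sketch against DUAL; the quoted string is written out by hand to stand in for the value that v_read_batches holds in the procedure.

```sql
-- A single string that merely looks like a list: the IN list has ONE element,
-- so '100' is compared with the whole text '100','1000','11','9200'.
SELECT COUNT(*) AS matches
  FROM dual
 WHERE '100' IN ('''100'',''1000'',''11'',''9200''');   -- returns 0

-- A real list of four separate string literals.
SELECT COUNT(*) AS matches
  FROM dual
 WHERE '100' IN ('100', '1000', '11', '9200');          -- returns 1
```

A PL/SQL variable behaves like the first query: it is bound as one value, however many commas and quotes it contains.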
You can convert your input string to rows with some SQL, e.g. by using REGEXP_SUBSTR to pick out the element at each LEVEL:
create table t as select '123,456,789' acct from dual;
select regexp_substr(acct, '[^,]+', 1, level) val
from t
connect by level <= length(acct) - length(replace(acct, ',')) + 1;
Having done this, you could alter your procedure so that your
WHERE to_char(BATCH_ID) IN (v_read_batches)
becomes
WHERE to_char(BATCH_ID) IN (select regexp_substr(read_batch, '[^,]+', 1, level)
from dual
connect by level <= length(read_batch) - length(replace(read_batch, ',')) + 1
)
The quoted v_read_batches string is then no longer needed, because the subquery works directly on the read_batch parameter.
In the general sense, never let an input coming from the outside world be folded directly into a SQL statement. You create the risk of "SQL Injection" which is the most common way people get hacked.
Full video demo on the string-to-rows technique here:
https://youtu.be/cjvpXL3H64c?list=PLJMaoEWvHwFIUwMrF4HLnRksF0H8DHGtt

What is wrong with my SQL Query for the Cursor

Here is what I am supposed to do. Assume that the tables are created, and all the columns are correctly named
"Using stored procedures and cursors, display the location (including street, zip code, city and country) of the managers with job id of either IT_PROG or SA_MAN and with salary greater than 3000".
Here is the code I have written so far but the sql statement for the cursor doesn't seem to want to work. For the DEPARTMENTS Table the FK's are MANAGER_ID and LOCATION_ID, for the EMPLOYEES Table the FK is JOB_ID and the LOCATIONS table has no FK. All the primary keys are set
Here is the code:
create or replace procedure mgtLocation
is
cursor getLoc is
select LOCATIONS.STREET_ADDRESS, LOCATIONS.POSTAL_CODE, LOCATIONS.CITY,
LOCATIONS.COUNTRY, LOCATIONS.LOCATIONS_ID, LOCATIONS.LOCATIONS_ID
from LOCATIONS
inner join DEPARTMENTS on DEPARTMENTS.MANAGER_ID = EMPLOYEES.EMPLOYEE_ID
inner join LOCATIONS on LOCATIONS.LOCATION_ID = DEPARTMENTS.LOCATION_ID
where EMPLOYEES.Job_ID in (select Job_ID from EMPLOYEES where Job_ID = 'IT_PROG' or Job_ID = 'SA_MAN' and SALARY > 3000);
EmpLoc getLoc%rowtype;
begin
dbms_output.put_line('=================');
open getLoc;
loop
fetch getLoc into EmpLoc;
EXIT WHEN getLoc%NOTFOUND;
dbms_output.put_line('Street: ' || EmpLoc.STREET_ADDRESS ||
' Zip Code: ' || EmpLoc.POSTAL_CODE ||
' City: ' || EmpLoc.CITY ||
' Country: ' || EmpLoc.COUNTRY);
end loop;
dbms_output.put_line('=================');
close getLoc;
end;
/
execute mgtLocation;
I get an error for the inner joins and I cannot seem to figure out how to fix them in order for this to work.
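You could try the query below. The original references EMPLOYEES without ever joining it and joins LOCATIONS to itself. The job filter also needs parentheses, because AND binds more tightly than OR, and each column should be selected only once, since a cursor used with %rowtype cannot contain duplicate column names:
cursor getLoc is
select LOCATIONS.STREET_ADDRESS,
       LOCATIONS.POSTAL_CODE, LOCATIONS.CITY,
       LOCATIONS.COUNTRY, LOCATIONS.LOCATION_ID
from LOCATIONS
inner join DEPARTMENTS on
      DEPARTMENTS.LOCATION_ID = LOCATIONS.LOCATION_ID
inner join EMPLOYEES on
      EMPLOYEES.EMPLOYEE_ID = DEPARTMENTS.MANAGER_ID
where (EMPLOYEES.JOB_ID = 'IT_PROG' or
       EMPLOYEES.JOB_ID = 'SA_MAN')
and EMPLOYEES.SALARY > 3000;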
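The parentheses around the two JOB_ID tests matter, because AND is evaluated before OR. The sketch below uses three made-up rows (the emp data is purely illustrative, not taken from the EMPLOYEES table) to compare the filter with and without parentheses.

```sql
WITH emp (job_id, salary) AS (
  -- hypothetical sample rows
  SELECT 'IT_PROG', 2000 FROM dual UNION ALL
  SELECT 'SA_MAN',  2000 FROM dual UNION ALL
  SELECT 'SA_MAN',  5000 FROM dual
)
SELECT job_id,
       salary,
       -- parsed as: IT_PROG OR (SA_MAN AND salary > 3000)
       CASE WHEN job_id = 'IT_PROG' OR job_id = 'SA_MAN' AND salary > 3000
            THEN 'Y' ELSE 'N' END AS without_parens,
       -- intended: (IT_PROG OR SA_MAN) AND salary > 3000
       CASE WHEN (job_id = 'IT_PROG' OR job_id = 'SA_MAN') AND salary > 3000
            THEN 'Y' ELSE 'N' END AS with_parens
  FROM emp;
-- IT_PROG 2000 -> without_parens = Y, with_parens = N
-- SA_MAN  2000 -> without_parens = N, with_parens = N
-- SA_MAN  5000 -> without_parens = Y, with_parens = Y
```

Only the first row differs: without parentheses, an IT_PROG manager passes the filter regardless of salary.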

Oracle Number type without precision - how do I know if it is a whole number or not

Our vendor's database has Number types for all numbers including whole numbers and decimal numbers. Literally, every numeric type column is created as NUMBER without precision and scale.
This is a big problem because we need to map these columns to proper data types on the target system into which we are loading the data from these tables.
We need to know if a number is an integer or decimal.
Other than doing a random sampling/data profiling, is it possible to infer proper data types?
UPDATE:
I accepted the answer below and the suggestion from @Bohemian. In addition, I will use the SAMPLE clause to take a random sample of each table, since my source tables are huge (many billions of rows).
SELECT
-- 1 means the sampled rows contain at least one non-integer value in that column
MAX(CASE WHEN col1 IS NOT NULL AND col1 <> round(col1, 0) then 1 else 0 end) as col1,
MAX(CASE WHEN col2 IS NOT NULL AND col2 <> round(col2, 0) then 1 else 0 end) as col2
FROM my_table
SAMPLE (0.05)
SAMPLE(N) takes a percentage, so to sample roughly X rows, compute N with the formula below:
N = X * 100 / total_rows_in_table
For example, about 1,000,000 rows from a table of 2,000,000,000 rows gives SAMPLE(0.05).
You can try selecting each FIELD, and seeing if all values of FIELD are equal to ROUND(FIELD, 0). If they are, then that field should be integer. If not, decimal.
I answered a similar question in another post; the query for finding the maximum number of decimal places in each numeric column is the same as the one there.
To find the maximum number of decimal places in each column, run the SQL below after substituting MY_SCHEMA and MY_TABLE. Change the number 10 to, say, 25 to look only at values with more than 25 decimal places. This SQL generates a second SQL statement, which you then run to get your result.
SELECT 'SELECT ' || LISTAGG('MAX(LENGTH(TO_CHAR(ABS(' || column_name || ') - FLOOR(ABS(' || column_name || '))))) - 1 AS decimals_' || column_name || CHR(13)
, CHR(9)|| ', ') WITHIN GROUP (ORDER BY rn) ||
' FROM ' || owner || '.' || table_name || CHR(13) ||
' WHERE ' || CHR(13) ||
LISTAGG('(LENGTH(TO_CHAR(ABS(' || column_name || ') - FLOOR(ABS(' || column_name || ')))) - 1) > 10 ' || CHR(13)
, CHR(9)|| ' OR ')
WITHIN GROUP (ORDER BY rn) AS Nasty_Numbers_Finder_Query
FROM
(
SELECT owner, table_name, column_name,
row_number() OVER ( PARTITION BY table_name ORDER BY rownum) rn
FROM dba_tab_columns
WHERE
OWNER = 'MY_SCHEMA'
AND table_name = 'MY_TABLE'
AND (data_type LIKE '%FLOAT%'
OR data_type LIKE '%NUMBER%')
) a
GROUP BY owner, table_name
For more information, I have blogged about it here.
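The expression that the generated query applies to each column can be checked on a few literal values first. This is a sketch with made-up sample numbers; it relies on TO_CHAR without a format mask rendering a fraction below 1 without a leading zero (for example '.25').

```sql
WITH samples (n) AS (
  -- hypothetical test values
  SELECT 42    FROM dual UNION ALL
  SELECT -3.25 FROM dual UNION ALL
  SELECT 0.125 FROM dual
)
SELECT n,
       -- fractional part as text ('0', '.25', '.125'), minus one character
       LENGTH(TO_CHAR(ABS(n) - FLOOR(ABS(n)))) - 1 AS decimals
  FROM samples;
-- 42    -> 0
-- -3.25 -> 2
-- 0.125 -> 3
```

A column whose maximum is 0 holds only whole numbers in the rows examined.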

pl/sql SQL Statement ignored and missing right parenthesis

This code is supposed to sum the salaries of the employees in department_id 100, but it raises the error "missing right parenthesis":
DECLARE
v_department_name VARCHAR(100);
v_department_manager VARCHAR(100);
v_totalsalary NUMBER(30);
BEGIN
SELECT departments.department_name, concat(employees.first_name,
employees.last_name), employees.salary INTO v_department_name,
v_department_manager, v_totalsalary
FROM employees JOIN departments ON employees.department_id =
departments.department_id
WHERE employees.salary = (SELECT departments.department_id,
sum(employees.salary)
FROM EMPLOYEES
where departments.department_id=100
GROUP BY DEPARTMENT_ID
ORDER BY DEPARTMENT_ID);
DBMS_OUTPUT.PUT_LINE ('Department Name is : ' || v_department_name || 'And
Department Manager is : ' || v_department_manager || 'Total Amount of Salary
is : ' || v_totalsalary );
END;
The "missing right parenthesis" error is clearly caused by the ORDER BY clause in the subquery (where it is not allowed).
Once you clear that error, you get the "too many values" error, because you are comparing a single variable (salary) to the output from a subquery that returns two values (department_id AND sum(salary)). Not sure why you thought you need to include the department_id in the SELECT clause of the subquery.
When you include error messages in your question, include the full text of the message (which shows the line number and position at which the error occurred - a crucial detail!)
Take it one small step at a time. Forget for the moment PL/SQL; are you able to write the correct query in SQL, which will return the department name, the manager's name and the sum of the salaries of all the employees in the department? If you can do that, then the PL/SQL around it is easy.
Here is one way to get all the values in one SQL statement:
select d.department_name,
m.first_name || ' ' || m.last_name as manager_name,
sum(e.salary) as sum_salary
from departments d
join
employees m on d.manager_id = m.employee_id
join
employees e on d.department_id = e.department_id
where d.department_id = 100
group by d.department_id, d.department_name, m.first_name, m.last_name
;
DEPARTMENT_NAME MANAGER_NAME SUM_SALARY
--------------- --------------- ----------
Finance Nancy Greenberg 51608
Perhaps 80% of writing good PL/SQL code is simply writing good, efficient SQL statements. If you have any difficulty with this query, you should probably spend the majority of your time writing SQL statements, for the next few days or weeks; return to PL/SQL when you feel this query (in my answer) is "simple", "easy", "standard" (which it is!)
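Once the query works in plain SQL, the PL/SQL around it could look roughly like the block below. This is a sketch, not the only way to write it: the variable names follow the question, the NO_DATA_FOUND handler and its message are an addition, and it assumes the same departments and employees tables as the query above.

```sql
DECLARE
  v_department_name    departments.department_name%TYPE;
  v_department_manager VARCHAR2(100);
  v_totalsalary        NUMBER;
BEGIN
  -- the GROUP BY query returns at most one row for department 100
  SELECT d.department_name,
         m.first_name || ' ' || m.last_name,
         SUM(e.salary)
    INTO v_department_name, v_department_manager, v_totalsalary
    FROM departments d
    JOIN employees m ON d.manager_id = m.employee_id
    JOIN employees e ON d.department_id = e.department_id
   WHERE d.department_id = 100
   GROUP BY d.department_id, d.department_name, m.first_name, m.last_name;

  DBMS_OUTPUT.PUT_LINE('Department Name is : ' || v_department_name ||
                       ' And Department Manager is : ' || v_department_manager ||
                       ' Total Amount of Salary is : ' || v_totalsalary);
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    -- raised when the department does not exist or has no manager or employees
    DBMS_OUTPUT.PUT_LINE('No data found for department 100.');
END;
/
```

In SQL*Plus or SQL Developer, SET SERVEROUTPUT ON is needed to see the DBMS_OUTPUT lines.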
