I have a table which has SDO_Geometries and I query all the geometries to find their start and end point, then I insert these points to another table called ORAHAN. Now my main purpose is for each point in orahan I must find if it is intersects with another point in orahan when giving 2 cm buffer to points.
So I write some pl sql using Relate and Bufer functions but when I check some records in Map Info, I saw there is points within 1 cm area from itself but no record in intersections table called ORAHANCROSSES.
Am I use these functions wrongly or what?
Note: I am using Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production and
PL/SQL Release 11.2.0.1.0 - and SDO_PACKAGE
ORAHAN has approximately 400 thousands records.(points and other columns.)
declare
BEGIN
for curs in (select * from ORAHAN t) loop
for curs2 in (select *
from ORAHAN t2
where SDO_RELATE(t2.geoloc,SDO_GEOM.SDO_BUFFER(curs.geoloc,0.02,0.5) ,
'mask=ANYINTERACT') = 'TRUE'
and t2.mi_prinx <> curs.mi_prinx) loop
Insert INTO ORAHANCROSSES
values
(curs.Mip, curs.Startmi, curs2.Mip, curs2.Startmi);
commit;
end loop;
end loop;
END;
And this is MapInfo map image that shows 3 points which are close to each other aproximately 1 centimeter. But in the orahancrosses there is no record matching these 3.
Note: 0,00001000km equals 1cm
Orahan Metadata:
select * from user_sdo_geom_metadata where table_name = 'ORAHAN';
And diminfo:
What is the coordinate system of your data ? And, most important, what tolerance have you set in your metadata ?
Some other comments:
1) Don't use a relate with buffer approach. Just use a within-distance approach.
2) You don't need a PL/SQL loop for that sort of query just use a simple CTAS:
create table orahancrosses as
select c1.mip mip_1, c1.startmi startmi_1, c2.mip mip_2, c2.startmi startmi_2
from orahan c1, orahan c2
where sdo_within_distance (c2.geoloc, c1.geoloc, 'distance=2 unit=cm') = 'TRUE'
and c2.mi_prinx <> c1.mi_prinx;
3) As written, couples of points A and B that are within 2 cm will be returned twice: once as (A,B) and once again as (B,A). To avoid that (and only return one of the cases), then write the query like this:
create table orahancrosses as
select c1.mip mip_1, c1.startmi startmi_1, c2.mip mip_2, c2.startmi startmi_2
from orahan c1, orahan c2
where sdo_within_distance (c2.geoloc, c1.geoloc, 'distance=2 unit=cm') = 'TRUE'
and c1.rowid < c2.rowid;
3) Processing the number of points you mention (400000+) should run better using the SDO_JOIN technique, like this:
create table orahancrosses as
select c1.mip mip_1, c1.startmi startmi_1, c2.mip mip_2, c2.startmi startmi_2
from table (
sdo_join (
'ORAHAN','GEOLOC',
'ORAHAN','GEOLOC',
'DISTANCE=2 UNIT=CM'
)
) j,
orahan c1,
orahan c2
where j.rowid1 < j.rowid2
and c1.rowid = j.rowid1
and c2.rowid = j.rowid2;
This will probably still take time to process - depending on the capacity of your database server. If you are licences for Oracle Enterprise Edition and your hardware has the proper capacity (# of cores) then parallelism can reduce the elapsed time.
4) You say you are using Oracle 11g. What exact version ? Version 11.2.0.4 is the terminal release for 11gR2. Anything older is no longer supported. By now you should really be on 12cR1 (12.1.0.2). The major benefit of 12.1.0.2 in your case s the Vector Performance Accelerator feature that speeds up a number of spatial functions and operators (only if you own the proper Oracle Spatial licenses - it is not available with the free Oracle Locator feature).
======================================
Using the two points in your example. Let's compute the distance:
select sdo_geom.sdo_distance(
sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.08336913,null),null,null),
sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.07336716,null),null,null),
0.005
) distance
from dual;
DISTANCE
----------
.01000197
1 row selected.
Notice I don't specify any SRID. Assuming the coordinates are expressed in meters, the distance between them is indeed a little more than 1 cm.
======================================
The reason why your original syntax does not work is, as you noticed, because of the tolerance you specify for the SDO_BUFFER() call. You pass it as 0.5 (=50cm) to produce a buffer with a radius of 0.02 (2cm). The effect is that the buffer produced effectively dissolves into the point itself.
For example at tolerance 0.5:
select sdo_geom.sdo_buffer(sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.08336913,null),null,null),0.02,0.5) from dual;
Produces:
SDO_GEOMETRY(2001, NULL, SDO_POINT_TYPE(521554.782, 4230983.08, NULL), NULL, NULL)
At tolerance 0.005:
select sdo_geom.sdo_buffer(sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.08336913,null),null,null),0.02,0.005) from dual;
You get the proper buffer:
SDO_GEOMETRY(2003, NULL, NULL, SDO_ELEM_INFO_ARRAY(1, 1003, 2), SDO_ORDINATE_ARRAY(521554.782, 4230983.06, 521554.802, 4230983.08, 521554.782, 4230983.1, 521554.762, 4230983.08, 521554.782, 4230983.06))
And the very close point now matches with that buffer:
select sdo_geom.relate(
sdo_geom.sdo_buffer(sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.08336913,null),null,null),0.02,0.005),
'determine',
sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.07336716,null),null,null),
0.005
) relation
from dual;
RELATION
-------------------------
CONTAINS
1 row selected.
======================================
Now the fact that your data does not have a proper explicit SRID means that the use of explicit units in measurements or distance-based searches will not work. Because the database does not know what coordinate system your data is in, it does not know how to determine that two points are less than a set number of cm or m apart. All you can do is assume the coordinates are in meters.
So in the examples I give above, replace 'DISTANCE=2 UNIT=CM' with 'DISTANCE=0.02'
Related
I need to calculate the stddev function in plsql, but when I compared the values with java program, it is different.
I need to get the stddev function for this set of values(100,104,105,103,110,115,130,95,91,105,106,101,65,91,95), when I used plsql the value is : 14.032 and using java it is 13.557. could you please help in getting the correct value using oracle plsql
Here is an explicit SQL query which takes your numbers and steps them through the Standard Deviation calculation.
with t23 as (
select column_value as val
from table (sys.odcinumberlist(100,104,105,103,110,115,130,95,91,105,106,101,65,91,95))
)
, mn as (
select avg(val) as avg_val
, count(*) as cnt
from t23
) , inp as (
select t23.val
, t23.val - mn.avg_val as diff
, power(t23.val - mn.avg_val , 2) diff_sq
, mn.cnt
from t23 cross join mn
)
select sum(diff_sq) as sum_diff_sq
, sqrt( sum(diff_sq) / (cnt-1) ) as sd
from inp
group by cnt
;
The value of SD is 14.0329544118054 which suggests the Oracle stddev() function is correct and whatever you're running in Java is incorrect.
Great code there #APC - I like the look through nature of the code :)
So I came across this same issue today and the answer is Oracle has two built-in functions for Standard Deviation according to the two distinct calculations:
a) stddev - for standard deviation sample; and
b) stddev_pop standard deviation population
select stddev(column_value), stddev_pop(column_value)
from table (sys.odcinumberlist(100,104,105,103,110,115,130,95,91,105,106,101,65,91,95))
Example
STDDEV(COLUMN_VALUE) STDDEV_POP(COLUMN_VALUE)
-------------------- ------------------------
14.0329544 13.5571219
So use the one that suits your needs...
Some info on standard deviations and which one to use here.
In oracle 11g, I want to execute a query like that :
In this case, I didn't allowed use Function or Procedure.
I tried to Google it, but i couldn't find a good solution. Almost show me the way to use Function or Store Procedure.
Table X with columns (A,B,C)
With a row in table X i want to select :
Count = B - A;
for(i=0;i<Count;i++)
{
C++;
D = C * A;
}
Expect result : table Y with columns (A,B,C,D)
You are thinking like a 3GL developer. Java (or whatever) only has arrays, so everything is an iteration. But SQL is a set-oriented language: we don't need loops to work on sets of data. Oracle SQL has built-in aggregation functions which allow us to compute values from sets of records.
For instance, this query calculates total remuneration (salary plus commission), number of employees and average salary:
select sum(sal + nvl(comm,0)) as total_renum
, count(*) as total_emps
, avg(sal) as average_salary
from emp
/
Oracle has a comprehensive range of such functions, some of them are really powerful. Find out more. Be sure to check out analytic functions too.
Hmmm, so you subsequently posted a cryptic snippet of code. It's still not clear exactly what you want, but this might produce the outcome for your tab;e Y:
select a
, b
, c
, 0 + ((c+level) * a) as d
from x
connect by level <= (b-a)
/
For each row in table X it will generate (b-a) rows, with a derived value of d. I have assumed a start of 0 for d.
Person table have 2 Million rows and 3 Million rows in Finance and on HR schema respectively. Now I'm going to update the Person status on Finance schema with the person_status on the HR schema. Query is running on M3000 Server with 32 GB RAM with Solaris 10 and Oracle 11.2.2. It took 117 hours and still runnning. How can I optimize this query. I have created index on person_no.
DECLARE
CURSOR c IS
SELECT p1.person_no AS "PersonNo1"
, p2.person_no AS "PersonNo2"
, p2.person_status AS "P2_PERSON_STATUS"
FROM person p1
, hr.person p2
WHERE (lower(p1.person_no) = lower(p2.person_no)
OR substr(p1.person_no, instr(p1.person_no, '-') + 1, length(p1.person_no))
= substr(p2.person_no, instr(p2.person_no, '-') + 1, length(p2.person_no)));
BEGIN
FOR i IN c LOOP
UPDATE person
SET person_status_id = decode(lower(i.P2_PERSON_STATUS), 'y', 1, 'n', 2)
WHERE (lower(person_no) = lower(i.PersonNo2)
OR substr(person_no, instr(person_no, '-') + 1, length(person_no))
= substr(i.PersonNo2, instr(i.PersonNo2, '-') + 1, length(i.PersonNo2)));
COMMIT;
END LOOP;
END;
/
There is several things you can do to optimize this. It would be helpful to see a query plan.
You can use a merge statement. Then you have only one statement and you can work optimizing this statement. Something like this
merge into person
using hr.person person_hr
on (person.person_no=person_hr.person_no)
when matched then
update set person_status_id=decode(lower(person_status),'y',1,'n',2);
You have to adjust the on part to match your where statement.
This might then be the source of best optimization. Maybe you have to create an index for lower and also for substr.
Something like
CREATE INDEX person_idx
ON person (lower(person_no))
Also of course for the substrings etc. Hope this helps.
In the code provided as question i can see there are two aspects.
1) Query Optimization
2) ROW BY ROW Update --> Never Recommended for such a huge records count..
Approach to Optmization.
1) Use Merge Statement as it will Bundle up the block into pure SQL thus will enhance your query.
2) Use Function based Index as i can see there are lot of functions like "LOWER" is used in the query "WHERE" conditions.
NOTE : [Over INDEXING may also cause deterioration in the performance}
3) Last but not the least if you have to go by Anonymous block only then avoid using ROW BY ROW Update. Try using Bulk collect Options as illustrated below.
Since i do not have workstation handy with me please bear with any Syntax Errors.
Let me know if this helps.
DECLARE
TYPE p1
IS
TABLE OF <table_name>.<COLUMN_NAME>%TYPE;
PersonNo1 p1;
TYPE p2
IS
TABLE OF <table_name>.<COLUMN_NAME>%TYPE;
PersonNo2 p2;
TYPE status
IS
TABLE OF <table_name>.<COLUMN_NAME>%TYPE;
P2_PERSON_STATUS status;
BEGIN
SELECT p1.person_no AS "PersonNo1" ,
p2.person_no AS "PersonNo2" ,
p2.person_status AS "P2_PERSON_STATUS" BULK COLLECT
INTO PersonNo1,
PersonNo2,
P2_PERSON_STATUS
FROM person p1 ,
hr.person p2
WHERE (lower(p1.person_no) = lower(p2.person_no)
OR SUBSTR(p1.person_no, instr(p1.person_no, '-') + 1, LENGTH(p1.person_no)) = SUBSTR(p2.person_no, instr(p2.person_no, '-') + 1, LENGTH(p2.person_no)));
FORALL I IN PersonNo1.FIRST..PersonNo1.LAST
UPDATE person
SET person_status_id = DECODE(lower(P2_PERSON_STATUS(I)), 'y', 1, 'n', 2)
WHERE (lower(person_no) = lower(PersonNo2(I))
OR SUBSTR(person_no, instr(person_no, '-') + 1, LENGTH(person_no)) = SUBSTR(PersonNo2(I), instr(PersonNo2(I), '-') + 1, LENGTH(PersonNo2(I))));
COMMIT;
END;
this is probably simple to those who know (I hope!)
I have an Oracle spatial database with a geometry column containing a list of node points, in northing/easting format (if it's relevent!)
I need to select those objects which fall within a given radius of a given point.
Northings and Eastings are 1 meter apart which makes it a bit easier.
Ideally this should include objects which cross the area even if their node points fall outside it.
Is this an easy-ish query? Maybe using SDO_WITHIN_DISTANCE?
The table looks like this:
MyTable
ID NUMBER
NAME VARCHAR2(20)
DESC VARCHAR2(50)
GEOM SDO_GEOMETRY
Thanks for any help!
You can do this one of two ways. First, as you mentioned, SDO_WITHIN_DISTANCE is a valid approach.
select
*
from center_point a
inner join target_points b
on a.id = 1
and sdo_within_distance( b.shape, a.shape, 'distance = 10' ) = 'TRUE'
;
In this case, the distance is in linear units defined by a's spatial reference. Oracle treats the coordinates as Cartesian so you will need to make sure you have a linear coordinate system before using this operator (as opposed to angular lat/lon units). Since you are working with northings/eastings, I think you'll be okay as long as the points you are comparing against are in the same spatial reference.
This approach uses an inner-loop to solve the query so not very efficient if you have a lot of points to compare against. Also, Oracle Spatial is VERY picky about the order of operands in the SDO functions so you might need to play around with parameter order to find the sweetspot. If your query runs for a long period, try switching the first and second parameter of your sdo operator. You can also play with the order of the 'from' and 'inner join' tables using the /*+ ORDERED */ hind after SELECT.
Another approach is to buffer the geometry and compare against the buffer.
select
*
from center_point a
inner join target_points b
on a.id = 1
and sdo_relate( b.shape, sdo_buffer(a.shape, 0.05 ), 'mask=anyinteract' ) = 'TRUE'
;
Keep in mind that whatever is in the second parameter of the SDO_RELATE (called the window) will not have a spatial index if you transform it like we are here with the buffer.
If you plan on doing this with several points, it is recommended to build a table where all of the source points are buffered. Then create a spatial index against the buffered areas and compare that to your target points.
For example:
create table point_bufs unrecoverable as
select sdo_buffer (a.shape, b.diminfo, 1.35)
from centerpoint a, user_sdo_geom_metadata b
where table_name='CENTERPOINT'
and column_name='SHAPE';
select
a.gif,
b.gid
from target_points a,
point_bufs b
where sdo_relate(a.shape, b.shape, 'mask=anyinteract querytype=join') = 'TRUE'
;
NOTE: When intersecting points with polygons, you always want to polygon to be in the window position of the sdo_relate (which is the second parameter). This will ensure your spatial index is used correctly.
The proper way is to use SDO_WITHIN_DISTANCE, and that is the case irrespective of the coordinate systems used, i.e. whether they are projected or geodetic:
select b.*
from my_table a, my_table b
where a.id = 1
and sdo_within_distance( b.shape, a.shape, 'distance=10 unit=meter' ) = 'TRUE';
The order of the arguments to the spatial predicates is important: the first one is the points you are searching, the second is the "query window", i.e. the point you are searching for. Notice that you should always specify the unit of your distance - here 10 meters. If you don't then it will default to the unit of the coordinate system of the table you search. For geodetic data, that will always be meters. For projected data it will be the unit of your coordinate system - generally meters too, but not always. Explicitly specifying a unit lifts all ambiguities.
You could also use the buffer approach, but that makes no difference here, and is actually slower. It does not matter that the second argument to a spatial predicate is indexed or not: that index is not used. Only the index on the first argument is required and used.
To perform the operation on a collection of geometries - i.e. for a set of points, find the points within a set distance of each of them, then consider using the SDO_JOIN() function instead, like this to find all couple of points that are within 10 meters of each other:
SELECT a.id, b.id
FROM my_table a,
my_table b,
TABLE(SDO_JOIN(
'MY_TABLE', 'SHAPE',
'MY_TABLE', 'SHAPE',
'DISTANCE=10 UNIT=METER')
) j
WHERE j.rowid1 = a.rowid
AND j.rowid2 = a.rowid
AND a.rowid < a.rowid;
We just converted our sql server stored procedures to oracle procedures. Sql Server SP's were highly dependent on session tables (INSERT INTO #table1...) these tables got converted as global temporary tables in oracle. We ended up with aroun 500 GTT's for our 400 SP's
Now we are finding out that working with GTT's in oracle is considered a last option because of performance and other issues.
what other alternatives are there? Collections? Cursors?
Our typical use of GTT's is like so:
Insert into GTT
INSERT INTO some_gtt_1
(column_a,
column_b,
column_c)
(SELECT someA,
someB,
someC
FROM TABLE_A
WHERE condition_1 = 'YN756'
AND type_cd = 'P'
AND TO_NUMBER(TO_CHAR(m_date, 'MM')) = '12'
AND (lname LIKE (v_LnameUpper || '%') OR
lname LIKE (v_searchLnameLower || '%'))
AND (e_flag = 'Y' OR
it_flag = 'Y' OR
fit_flag = 'Y'));
Update the GTT
UPDATE some_gtt_1 a
SET column_a = (SELECT b.data_a FROM some_table_b b
WHERE a.column_b = b.data_b AND a.column_c = 'C')
WHERE column_a IS NULL OR column_a = ' ';
and later on get the data out of the GTT. These are just sample queries, in actuality the queries are really complext with lot of joins and subqueries.
I have a three part question:
Can someone show how to transform
the above sample queries to
collections and/or cursors?
Since
with GTT's you can work natively
with SQL...why go away from the
GTTs? are they really that bad.
What should be the guidelines on
When to use and When to avoid GTT's
Let's answer the second question first:
"why go away from the GTTs? are they
really that bad."
A couple of days ago I was knocking up a proof of concept which loaded a largish XML file (~18MB) into an XMLType. Because I didn't want to store the XMLType permanently I tried loading it into a PL/SQL variable (session memory) and a temporary table. Loading it into a temporary table took five times as long as loading it into an XMLType variable (5 seconds compared to 1 second). The difference is because temporary tables are not memory structures: they are written to disk (specifically your nominated temporary tablespace).
If you want to cache a lot of data then storing it in memory will stress the PGA, which is not good if you have lots of sessions. So it's a trade-off between RAM and time.
To the first question:
"Can someone show how to transform the
above sample queries to collections
and/or cursors?"
The queries you post can be merged into a single statement:
SELECT case when a.column_a IS NULL OR a.column_a = ' '
then b.data_a
else column_a end AS someA,
a.someB,
a.someC
FROM TABLE_A a
left outer join TABLE_B b
on ( a.column_b = b.data_b AND a.column_c = 'C' )
WHERE condition_1 = 'YN756'
AND type_cd = 'P'
AND TO_NUMBER(TO_CHAR(m_date, 'MM')) = '12'
AND (lname LIKE (v_LnameUpper || '%') OR
lname LIKE (v_searchLnameLower || '%'))
AND (e_flag = 'Y' OR
it_flag = 'Y' OR
fit_flag = 'Y'));
(I have simply transposed your logic but that case() statement could be replaced with a neater nvl2(trim(a.column_a), a.column_a, b.data_a) ).
I know you say your queries are more complicated but your first port of call should be to consider rewriting them. I know how seductive it is to break a gnarly query into lots of baby SQLs stitched together with PL/SQL but pure SQL is way more efficient.
To use a collection it is best to define the types in SQL, because it gives us the flexibility to use them in SQL statements as well as PL/SQL.
create or replace type tab_a_row as object
(col_a number
, col_b varchar2(23)
, col_c date);
/
create or replace type tab_a_nt as table of tab_a_row;
/
Here's a sample function, which returns a result set:
create or replace function get_table_a
(p_arg in number)
return sys_refcursor
is
tab_a_recs tab_a_nt;
rv sys_refcursor;
begin
select tab_a_row(col_a, col_b, col_c)
bulk collect into tab_a_recs
from table_a
where col_a = p_arg;
for i in tab_a_recs.first()..tab_a_recs.last()
loop
if tab_a_recs(i).col_b is null
then
tab_a_recs(i).col_b := 'something';
end if;
end loop;
open rv for select * from table(tab_a_recs);
return rv;
end;
/
And here it is in action:
SQL> select * from table_a
2 /
COL_A COL_B COL_C
---------- ----------------------- ---------
1 whatever 13-JUN-10
1 12-JUN-10
SQL> var rc refcursor
SQL> exec :rc := get_table_a(1)
PL/SQL procedure successfully completed.
SQL> print rc
COL_A COL_B COL_C
---------- ----------------------- ---------
1 whatever 13-JUN-10
1 something 12-JUN-10
SQL>
In the function it is necessary to instantiate the type with the columns, in order to avoid the ORA-00947 exception. This is not necessary when populating a PL/SQL table type:
SQL> create or replace procedure pop_table_a
2 (p_arg in number)
3 is
4 type table_a_nt is table of table_a%rowtype;
5 tab_a_recs table_a_nt;
6 begin
7 select *
8 bulk collect into tab_a_recs
9 from table_a
10 where col_a = p_arg;
11 end;
12 /
Procedure created.
SQL>
Finally, guidelines
"What should be the guidelines on When
to use and When to avoid GTT's"
Global temp tables are very good when we need share cached data between different program units in the same session. For instance if we have a generic report structure generated by a single function feeding off a GTT which is populated by one of several procedures. (Although even that could also be implemented with dynamic ref cursors ...)
Global temporary tables are also good if we have a lot of intermediate processing which is just too complicated to be solved with a single SQL query. Especially if that processing must be applied to subsets of the retrieved rows.
But in general the presumption should be that we don't need to use a temporary table. So
Do it in SQL unless it is too hard it which case ...
... Do it in PL/SQL variables (usually collections) unless it takes too much memory it which case ...
... Do it with a Global Temporary Table
Generally I'd use a PL/SQL collection for storing small volumes of data (maybe a thousand rows). If the data volumes were much larger, I'd use a GTT so that they don't overload the process memory.
So I might select a few hundred rows from the database into a PL/SQL collection, then loop through them to do some calculation/delete a few or whatever, then insert that collection into another table.
If I was dealing with hundreds of thousands of rows, I would try to push as much of the 'heavy lifting' processing into large SQL statements. That may or may not require GTT.
You can use SQL level collection objects as something that translates quite easily between SQL and PL/SQL
create type typ_car is object (make varchar2(10), model varchar2(20), year number(4));
/
create type typ_coll_car is table of typ_car;
/
select * from table (typ_coll_car(typ_car('a','b',1999), typ_car('A','Z',2000)));
MAKE MODEL YEAR
---------- -------------------- ---------------
a b 1,999.00
A Z 2,000.00
declare
v_car1 typ_car := typ_car('a','b',1999);
v_car2 typ_car := typ_car('A','Z',2000);
t_car typ_coll_car := typ_coll_car();
begin
t_car := typ_coll_car(v_car1, v_car2);
FOR i in (SELECT * from table(t_car)) LOOP
dbms_output.put_line(i.year);
END LOOP;
end;
/