Nearest neighbor and distance between points and lines

In Oracle Spatial I have two tables (AVALREGULACAO and ATROCOADUTOR) representing points and lines, respectively.
The structure of both tables is as follows:
AVALREGULACAO (295 point records)
IPID [number(10)]
GEOMETRY [MDSYS.SDO_GEOMETRY]
ATROCOADUTOR (12536 line records)
IPID [number(10)]
GEOMETRY [MDSYS.SDO_GEOMETRY]
I need to find the nearest ATROCOADUTOR neighbor of each AVALREGULACAO and calculate the distance between them, with output in this form:
AVALREGULACAO_IPID | ATROCOADUTOR_IPID | DISTANCE
I’ve tried two options:
1
SELECT /*+ ORDERED */ A.IPID, B.IPID, MIN(SDO_GEOM.SDO_DISTANCE(sdo_cs.make_2d(A.GEOMETRY), sdo_cs.make_2d(B.GEOMETRY), 0.005)) as DISTANCE
FROM AVALREGULACAO A, ATROCOADUTOR B
GROUP BY A.IPID, B.IPID;
It takes quite a long time to compute, because it generates a huge output of 295 x 12536 = 3 698 120 possible combinations (Cartesian product). Furthermore, the CSV output file cannot accommodate all these records (1 048 576 row limit).
I only need 295 records corresponding to the 295 AVALREGULACAO.
2
I’ve also tried/adapted another query with the nearest neighbor (nn) operator
PROMPT IPID, nearest_IPID, distance
select /*+ ORDERED USE_NL(s,s2)*/
s.IPID,
s2.IPID as nearest_IPID,
TO_CHAR(REPLACE(mdsys.sdo_geom.sdo_distance(sdo_cs.make_2d(s.GEOMETRY),sdo_cs.make_2d(s2.GEOMETRY),0.05), ',','.')) as distance
from AVALREGULACAO s,
ATROCOADUTOR s2
where s2.IPID in (select IPID
from AVALREGULACAO s3
where sdo_nn(s3.GEOMETRY,s.GEOMETRY,'sdo_batch_size=10',1) = 'TRUE'
and s3.IPID <> s.IPID
and rownum < 2)
order by 1,2;
This query takes forever - I need to shut down the process before it ends.
I guess I'm missing the point on how to optimize/filter the desired results.
Any tips on how to efficiently solve this would be much appreciated.
Thanks in advance,
Pedro
PS:
@Boneist, thanks a lot for the input.
Unfortunately I got an error after applying your query (I'm still trying to work out the semantics/syntax of the KEEP and DENSE_RANK constructs, which are new to me):
SELECT a.ipid a_ipid,
MIN(b.ipid) KEEP (dense_rank FIRST order by sdo_nn(a.GEOMETRY,b.GEOMETRY,'sdo_batch_size=10',1)) b_ipid,
MIN(sdo_geom.sdo_distance(sdo_cs.make_2d(a.geometry), sdo_cs.make_2d(b.geometry), 0.005)) AS distance
FROM avalregulacao a
INNER JOIN atrocoadutor b ON sdo_nn(a.GEOMETRY,b.GEOMETRY,'sdo_batch_size=10',1) = 'TRUE'
GROUP BY a.ipid;
Error
Error starting at line : 1 in command -
SELECT a.ipid a_ipid,
MIN(b.ipid) KEEP (dense_rank FIRST order by sdo_nn(a.GEOMETRY,b.GEOMETRY,'sdo_batch_size=10',1)) b_ipid,
MIN(sdo_geom.sdo_distance(sdo_cs.make_2d(a.geometry), sdo_cs.make_2d(b.geometry), 0.005)) AS distance
FROM avalregulacao a
INNER JOIN atrocoadutor b ON sdo_nn(a.GEOMETRY,b.GEOMETRY,'sdo_batch_size=10',1) = 'TRUE'
GROUP BY a.ipid
Error at Command Line : 2 Column : 45
Error report -
SQL Error: ORA-29907: foram encontradas etiquetas em duplicado em invocações primárias
29907. 00000 - "found duplicate labels in primary invocations"
*Cause: There are multiple primary invocations of operators with
the same number as the label.
*Action: Use distinct labels in primary invocations.

I think you're probably after something like:
SELECT a.ipid a_ipid,
MIN(b.ipid) KEEP (dense_rank FIRST order by sdo_nn(a.GEOMETRY,b.GEOMETRY,'sdo_batch_size=10',1)) b_ipid,
MIN(sdo_geom.sdo_distance(sdo_cs.make_2d(a.geometry), sdo_cs.make_2d(b.geometry), 0.005)) AS distance
FROM avalregulacao a
INNER JOIN atrocoadutor b ON sdo_nn(a.GEOMETRY,b.GEOMETRY,'sdo_batch_size=10',1) = 'TRUE'
GROUP BY a.ipid;
This joins both tables on the nearest neighbour function, which should reduce the number of rows being returned.
The MIN(b.ipid) KEEP (dense_rank first order by sdo_nn(a.GEOMETRY,b.GEOMETRY,'sdo_batch_size=10',1)) simply returns the lowest b.ipid value for the lowest difference.
(I think this query will work as is, but I can't test it. You might have to do the join and have sdo_nn(a.GEOMETRY,b.GEOMETRY,'sdo_batch_size=10',1) as a column in a subquery and then do the group by in the outer query.)
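For checking results on a small sample, the target output (exactly one row per point, holding the closest line and its distance) can be reproduced by brute force outside the database. The following is a minimal Python sketch, not the Oracle solution: it assumes planar (projected) coordinates and lines given as lists of at least two (x, y) vertices, and the function names and sample data are made up for illustration.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment (a == b): fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    # Project p onto the infinite line, then clamp to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_line_distance(p, vertices):
    """Distance from p to a polyline (assumed to have >= 2 vertices)."""
    return min(point_segment_distance(p, a, b)
               for a, b in zip(vertices, vertices[1:]))

def nearest_lines(points, lines):
    """Return one (point_id, line_id, distance) row per point."""
    rows = []
    for pid, p in points.items():
        lid, dist = min(
            ((lid, point_line_distance(p, verts)) for lid, verts in lines.items()),
            key=lambda pair: pair[1],
        )
        rows.append((pid, lid, dist))
    return rows

# Hypothetical sample data: two points, two polylines.
points = {1: (0.0, 0.0), 2: (10.0, 5.0)}
lines = {
    100: [(0.0, 3.0), (4.0, 3.0)],
    200: [(10.0, 0.0), (10.0, 4.0), (14.0, 4.0)],
}
for row in nearest_lines(points, lines):
    print(row)
```

This scans every point/line pair, so it is only suitable for validating a handful of rows; it does not replace an index-based nearest-neighbour search.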

Subtraction between two columns of different SELECTs

I have two different SELECTs with different conditions but the same columns.
Imagine that I have this:
So what I want to do is subtract the amount in the first table from the amount in the second table (where both IDs match and the dates differ by one month) to get the delta between them.
This is a very simplified explanation, but the data model and the extraction are fairly complex, so this is the only way I can approach the problem.

You can use the left join as follows:
select s1.id, s1.date d1, s2.date d2, s1.value - s2.value as delta
from subquery1 s1
left join subquery2 s2
on s1.id = s2.id
and add_months(s1.date,-1) = s2.date
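
The matching rule in this left join can be sketched outside SQL as well. The Python below is only an illustration under assumptions not stated in the question: each input holds one row per id and calendar month, the day of the month is ignored, and the names monthly_deltas, current and previous are hypothetical. Like the left join, it keeps rows that have no match one month earlier and reports their delta as None.

```python
from datetime import date

def month_index(d):
    """Months counted from year 0, so consecutive months differ by 1."""
    return d.year * 12 + d.month

def monthly_deltas(current, previous):
    """Pair each (id, date, value) row with the same id one month earlier."""
    lookup = {(rid, month_index(d)): v for rid, d, v in previous}
    result = []
    for rid, d, v in current:
        earlier = lookup.get((rid, month_index(d) - 1))
        # No row one month earlier: keep the row, but with an empty delta.
        delta = None if earlier is None else v - earlier
        result.append((rid, d, delta))
    return result

# Hypothetical sample rows: (id, date, value).
current = [(1, date(2020, 2, 1), 150), (2, date(2020, 1, 1), 80)]
previous = [(1, date(2020, 1, 1), 100), (2, date(2019, 11, 1), 70)]
for row in monthly_deltas(current, previous):
    print(row)
```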

Assuming the date difference can go either way:
select t1.id,t2.id, T1.DT, t2.dt, t1.amt,t2.amt, t1.amt-t2.amt diff
from t1, t2
where t1.id=t2.id
and abs(MONTHS_BETWEEN(t1.dt, t2.dt)) = 1

Oracle Parameter Issue (Using OR Operator)

If I execute each query individually, it takes less than 4 seconds to get the data, but when I combine them the query is dead slow. Any help is much appreciated.
Query1:
Select Med_Number,Med_Code,Member_Name,DOB FROM Med
WHERE Med.Med_Code=:Med_Code
Query2:
Select Red_Number,Red_Name,Red_Code FROM Red
WHERE Red.Red_Code =:Red_Code
Final one (I'm passing one value at a time):
Select Med_Number,Member_Name,Red_Number,Red_Name FROM Med M
LEFT JOIN Red R ON M.Med_Number=R.Red_Number
Where (M.Med_Code=:Med_Code) OR (R.Red_Code=:Red_Code)

If you look at the execution plans for all 3 statements, you'll figure it out. If you're not interested in figuring it out and you must execute only one query, you can run the following. It uses src to show which row source each row belongs to, assuming you need to tell them apart and that the numbers and names have suitably equivalent data types:
Select 1 src, Med_Number,Member_Name,DOB
FROM Med
WHERE Med.Med_Number=:Med_Number
UNION ALL
Select 2 src, Red_Number,Red_Name, null
FROM Red
WHERE Red.Red_Number=:Red_Number
Of course, if the data types are equivalent and DOB is not allowed to be null, a null DOB already marks the Red rows, so this would suffice:
Select Med_Number,Member_Name,DOB
FROM Med
WHERE Med.Med_Number=:Med_Number
UNION ALL
Select Red_Number,Red_Name, null
FROM Red
WHERE Red.Red_Number=:Red_Number

As you join your two tables on m.Med_number = R.Red_number, you don't need 2 parameters.
Select Med_Number,Member_Name,Red_Number,Red_Name FROM Med M
LEFT JOIN Red R ON M.Med_Number=R.Red_Number
Where M.Med_Number=:Number;

Add indicator to top and bottom 10%

I'm trying to capture the average of FIRST_CONTACT_CAL_DAYS but what I would like to do is create an indicator for the top and bottom 10% of values so I can exclude those (outliers) from my average calculation.
I'm not sure how to go about doing this. Any thoughts?
SELECT DISTINCT
TO_CHAR(A.FIRST_ASSGN_DT,'DAY') AS DAY_NUMBER,
A.FIRST_ASSGN_DT,
A.FIRST_CONTACT_DT,
TO_CHAR(A.FIRST_CONTACT_DT,'DAY') AS DAY_NUMBER2,
A.FIRST_CONTACT_DT AS FIRST_PHONE_CONTACT,
A.ID,
ABS(TO_DATE(A.FIRST_CONTACT_DT, 'DD/MM/YYYY') - TO_DATE(A.FIRST_ASSGN_DT, 'DD/MM/YYYY')) AS FIRST_CONTACT_CAL_DAYS
FROM HIST A
LEFT JOIN CONTACTS D ON A.ID = D.ID
WHERE 1=1

You may be looking for something like this. Please adapt to your situation.
I assume you may have more than one "group" or "partition" and you need to compute the average for each group separately, after throwing out the outliers in each partition. (An alternative, which can be easily accommodated by adapting the query below, is to throw out the outliers at the global level, and only then to group and take the average for each group.)
If you don't have any groups, and everything is one big pile of data, it's even easier - you don't need GROUP BY and PARTITION BY.
Then: the function NTILE assigns a bucket number, in this example between 1 and 10, to each row, based on where it falls (first decile, i.e. first 10%, next decile, ... all the way to the last decile). I do this in a subquery. In the outer query, filter out the first and last buckets before grouping and computing the average.
For testing purposes I create three groups with 10,000 random numbers each in a WITH clause - no need to spend any time on that portion of the code, since it is not part of the solution (the SQL code to solve your problem) - it's just a dirty trick to create test data on the fly.
with
inputs ( grp, val ) as (
select ceil(level/10000), dbms_random.value(0, 150)
from dual
connect by level <= 30000
)
select grp, avg(val) as avg_val
from (
select grp, val, ntile(10) over (partition by grp order by val) as bkt
from inputs
)
where bkt between 2 and 9
group by grp
;
GRP AVG_VAL
--- -----------------------
1 75.021614866547043734458
2 74.286117923344418598032
3 75.437412573353736953791
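
To see what NTILE(10) and the bkt filter do without a database, here is a small Python sketch of the same idea for a single group. It is an illustration rather than a replacement for the query: the helper names and sample values are hypothetical, and the bucket sizing follows the documented NTILE behaviour, where leftover rows go one each to the lowest-numbered buckets.

```python
from statistics import mean

def ntile(row_count, buckets):
    """Bucket numbers (1-based) for rows already in sort order."""
    base, extra = divmod(row_count, buckets)
    numbers = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets each receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        numbers.extend([bucket] * size)
    return numbers

def trimmed_average(values, buckets=10):
    """Average after dropping the lowest and highest bucket."""
    ordered = sorted(values)
    tiles = ntile(len(ordered), buckets)
    kept = [v for v, b in zip(ordered, tiles) if 2 <= b <= buckets - 1]
    return mean(kept)

# Hypothetical sample: 1..19 plus one extreme outlier.
values = list(range(1, 20)) + [1000]
print(mean(values))             # plain average, pulled up by the outlier: 59.5
print(trimmed_average(values))  # average of buckets 2..9 only: 10.5
```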

Trying to figure out top 5 land areas of the 50 states in the U.S

I have created a table with one column named states and another column called land area. I am using Oracle 11g. I have looked at various questions on here and cannot find a solution. Here is what I have tried so far:
SELECT LandAreas, State
FROM ( SELECT LandAreas, State, DENSE_RANK() OVER (ORDER BY State DESC) sal_dense_rank
FROM Map )
WHERE sal_dense_rank >= 5;
This does not return the 5 numerically largest land areas.
I have also tried this one, but no luck either:
SELECT * FROM (SELECT * FROM Map order by State desc)
where rownum < 5;
Does anyone have any suggestions to get me on the right track?
Here is a sample of the table:
states land areas
michagan 15000
florida 25000
tennessee 10000
alabama 80000
new york 150000
california 20000
oregon 5000
texas 6000
utah 3000
nebraska 1000
Desired output from query:
States land area
new york 150000
alabama 80000
florida 25000
california 20000

Try:
Select * from
(SELECT State, LandAreas FROM Map ORDER BY LandAreas DESC)
where rownum < 6

Use a HAVING clause and count the number of larger states:
SELECT m.state, m.landArea
FROM Map m
LEFT JOIN Map m2 on m2.landArea > m.landArea
GROUP BY m.state, m.landArea
HAVING count(*) < 5
ORDER BY m.landArea DESC
This joins each state to every state whose area is greater, then uses a HAVING clause to return only those states where the number of larger states was less than 5.
Ties are all returned, leading to more than 5 rows in the case of a tie for 5th.
The left join is needed for the case of the largest state, which has no other larger state to join to.
The ORDER BY is optional.

Try something like this:
select m.states,m.landarea
from map m
where (select count('x') from map m2 where m2.landarea > m.landarea) < 5
order by m.landarea desc

There are two bloomers in your posted code.
You need to use landarea in the DENSE_RANK() call. At the moment you're ordering the states in reverse alphabetical order.
Your filter in the outer query is the wrong way around: you're excluding the top four results.
Here is what you need ...
SELECT LandArea, State
FROM ( SELECT LandArea
, State
, DENSE_RANK() OVER (ORDER BY landarea DESC) as area_dr
FROM Maps )
WHERE area_dr <= 5
order by area_dr;
... and here is the SQL Fiddle to prove it. (I'm going with the statement in the question that you want the top 5 biggest states and ignoring the fact that your desired result set has only four rows. But adjust the outer filter as you will).
There are three different functions for deriving top-N result sets: DENSE_RANK, RANK and ROW_NUMBER.
Using ROW_NUMBER will always guarantee you 5 rows in the result set, but you may get the wrong result if there are several states with the same land area (unlikely in this case, but other data sets will produce such clashes). So: 1,2,3,4,5
The difference between RANK and DENSE_RANK is how they handle ties. DENSE_RANK always produces a series of consecutive numbers, regardless of how many rows there are in each rank. So: 1,2,2,3,3,3,4,5
RANK on the other hand will produce a sparse series if a given rank has more than one hit. So: 1,2,2,4,4,4.
Note that each of the example result sets has a different number of rows. Which one is correct? It depends on the precise question you want to ask.
Using a sorted sub-query with the ROWNUM pseudo-column will work like the ROW_NUMBER function, but I prefer using ROW_NUMBER because it is more powerful and more error-proof.
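
The three numbering schemes can be compared side by side with a few lines of Python. This is a plain re-implementation for illustration, not Oracle code; the function names mirror the SQL functions, and the sample areas (with deliberate ties) are hypothetical.

```python
def row_number(values):
    """1, 2, 3, ... regardless of ties."""
    return list(range(1, len(values) + 1))

def rank(values):
    """Tied rows share a rank; the next rank skips ahead (sparse)."""
    ranks = []
    for i, v in enumerate(values):
        if i > 0 and v == values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(i + 1)
    return ranks

def dense_rank(values):
    """Tied rows share a rank; the next rank is always +1 (no gaps)."""
    ranks = []
    for i, v in enumerate(values):
        if i == 0:
            ranks.append(1)
        elif v == values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(ranks[-1] + 1)
    return ranks

# Hypothetical land areas, already sorted descending, with ties.
areas = [150000, 80000, 80000, 25000, 25000, 25000, 20000, 15000]
for name, fn in (("ROW_NUMBER", row_number), ("RANK", rank), ("DENSE_RANK", dense_rank)):
    numbers = fn(areas)
    top5 = [n for n in numbers if n <= 5]
    print(name, numbers, "rows kept by <= 5:", len(top5))
```

With this sample, the filter "<= 5" keeps 5 rows under ROW_NUMBER, 6 under RANK and all 8 under DENSE_RANK, which is why the choice of function depends on how ties should count.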

Oracle Spatial - select objects falling within area

This is probably simple to those who know (I hope!).
I have an Oracle Spatial database with a geometry column containing a list of node points, in northing/easting format (if it's relevant!).
I need to select those objects which fall within a given radius of a given point.
Northings and Eastings are 1 meter apart which makes it a bit easier.
Ideally this should include objects which cross the area even if their node points fall outside it.
Is this an easy-ish query? Maybe using SDO_WITHIN_DISTANCE?
The table looks like this:
MyTable
ID NUMBER
NAME VARCHAR2(20)
DESC VARCHAR2(50)
GEOM SDO_GEOMETRY
Thanks for any help!

You can do this one of two ways. First, as you mentioned, SDO_WITHIN_DISTANCE is a valid approach.
select
*
from center_point a
inner join target_points b
on a.id = 1
and sdo_within_distance( b.shape, a.shape, 'distance = 10' ) = 'TRUE'
;
In this case, the distance is in linear units defined by a's spatial reference. Oracle treats the coordinates as Cartesian so you will need to make sure you have a linear coordinate system before using this operator (as opposed to angular lat/lon units). Since you are working with northings/eastings, I think you'll be okay as long as the points you are comparing against are in the same spatial reference.
This approach uses an inner loop to solve the query, so it is not very efficient if you have a lot of points to compare against. Also, Oracle Spatial is VERY picky about the order of operands in the SDO functions, so you might need to play around with parameter order to find the sweet spot. If your query runs for a long time, try switching the first and second parameters of your SDO operator. You can also play with the order of the 'from' and 'inner join' tables using the /*+ ORDERED */ hint after SELECT.
Another approach is to buffer the geometry and compare against the buffer.
select
*
from center_point a
inner join target_points b
on a.id = 1
and sdo_relate( b.shape, sdo_geom.sdo_buffer( a.shape, 10, 0.05 ), 'mask=anyinteract' ) = 'TRUE'
;
Keep in mind that whatever is in the second parameter of the SDO_RELATE (called the window) will not have a spatial index if you transform it like we are here with the buffer.
If you plan on doing this with several points, it is recommended to build a table where all of the source points are buffered. Then create a spatial index against the buffered areas and compare that to your target points.
For example:
create table point_bufs unrecoverable as
select a.id as gid, sdo_geom.sdo_buffer (a.shape, b.diminfo, 1.35) as shape
from center_point a, user_sdo_geom_metadata b
where b.table_name='CENTER_POINT'
and b.column_name='SHAPE';
select
a.gid,
b.gid
from target_points a,
point_bufs b
where sdo_relate(a.shape, b.shape, 'mask=anyinteract querytype=join') = 'TRUE'
;
NOTE: When intersecting points with polygons, you always want the polygon to be in the window position of the sdo_relate (which is the second parameter). This will ensure your spatial index is used correctly.

The proper way is to use SDO_WITHIN_DISTANCE, and that is the case irrespective of the coordinate systems used, i.e. whether they are projected or geodetic:
select b.*
from my_table a, my_table b
where a.id = 1
and sdo_within_distance( b.shape, a.shape, 'distance=10 unit=meter' ) = 'TRUE';
The order of the arguments to the spatial predicates is important: the first one is the set of points you are searching, the second is the "query window", i.e. the point you are searching around. Notice that you should always specify the unit of your distance, here 10 meters. If you don't, it will default to the unit of the coordinate system of the table you search. For geodetic data, that will always be meters. For projected data it will be the unit of your coordinate system, which is generally meters too, but not always. Explicitly specifying a unit removes all ambiguity.
You could also use the buffer approach, but that makes no difference here, and is actually slower. It does not matter whether the second argument to a spatial predicate is indexed or not: that index is not used. Only the index on the first argument is required and used.
To perform the operation on a collection of geometries (i.e. for a set of points, find the points within a set distance of each of them), consider using the SDO_JOIN() function instead. For example, this finds all pairs of points that are within 10 meters of each other:
SELECT a.id, b.id
FROM my_table a,
my_table b,
TABLE(SDO_JOIN(
'MY_TABLE', 'SHAPE',
'MY_TABLE', 'SHAPE',
'DISTANCE=10 UNIT=METER')
) j
WHERE j.rowid1 = a.rowid
AND j.rowid2 = b.rowid
AND a.rowid < b.rowid;
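
The role of the final rowid comparison is easier to see in a small model. The sketch below is plain Python, not Oracle code: it assumes planar coordinates and a hypothetical pairs_within helper, and it uses ids in place of rowids. Listing each unordered pair exactly once, without self-matches or mirrored duplicates, is what the final rowid comparison in the query is for.

```python
import math
from itertools import combinations

def pairs_within(points, distance):
    """Return (id1, id2) with id1 < id2 for points at most `distance` apart."""
    pairs = []
    # combinations() over the id-sorted items yields each unordered pair once.
    for (id1, p), (id2, q) in combinations(sorted(points.items()), 2):
        if math.hypot(p[0] - q[0], p[1] - q[1]) <= distance:
            pairs.append((id1, id2))
    return pairs

# Hypothetical sample points keyed by id.
points = {1: (0.0, 0.0), 2: (6.0, 9.0), 3: (3.0, 4.0), 4: (100.0, 100.0)}
print(pairs_within(points, 10))
```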
