Cassandra range query on clustering key - time

Let's say I have this table: Table1(PartKey, Cluster_Time DESC) with the following tuples:
(p1 , 6.0)
(p1 , 5.3)
(p1 , 4.1)
(p1 , 3.3)
(p1 , 2.3)
(p1 , 1.2)
(p1 , 0.1)
Now suppose I make this query:
SELECT *
FROM Table1
WHERE PartKey = 'p1' AND Cluster_Time >= 2.0 AND Cluster_Time <= 4.0;
What I would like to understand is the following.
Does Cassandra:
1) start scanning partition p1 from the beginning and stop after reaching tuple (p1,2)
OR
2) have a mechanism to start the scan directly at around time 4.1?
If such a mechanism is not available, would an index be appropriate for this range query?
Thanks for any hint!

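I am assuming your key is PRIMARY KEY (PartKey, Cluster_Time); providing your actual schema would help.
Partition p1 on disk will have all of its rows in clustering order. The read uses the index component of the sstable, which has a marker for every 64 KB (default) of clustering keys, to get as close as it can to the start of the requested range. Because the clustering order is DESC, the start of the range is Cluster_Time = 4.0. It skips rows until it reaches the first one in range, then just continues reading and returning rows until it passes Cluster_Time = 2.0. In other words, it is your option 2, so no extra index is needed for this range query.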
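To make the seek-then-scan behaviour concrete, here is a minimal Python sketch of the idea, not Cassandra's actual implementation. The function name range_scan and the index_interval parameter are made up for illustration; index_interval stands in for the 64 KB index granularity by indexing every n-th row.

```python
# Rows of partition p1 in on-disk order (Cluster_Time DESC), taken from the question.
rows = [6.0, 5.3, 4.1, 3.3, 2.3, 1.2, 0.1]


def range_scan(rows_desc, low, high, index_interval=2):
    """Return (matching rows, rows touched) for low <= t <= high.

    index_interval is a hypothetical stand-in for the 64 KB index
    granularity: every index_interval-th row gets an index entry.
    """
    # Sparse index: (clustering value, position) for every n-th row.
    index = [(rows_desc[i], i) for i in range(0, len(rows_desc), index_interval)]

    # Pick the last index entry that is still at or before the range start.
    # With DESC order the range starts at `high`.
    start = 0
    for value, pos in index:
        if value >= high:
            start = pos
        else:
            break

    result, touched = [], 0
    for t in rows_desc[start:]:
        touched += 1
        if t > high:
            continue  # row sits between the index marker and the range start
        if t < low:
            break     # past the end of the range, stop reading
        result.append(t)
    return result, touched


matches, touched = range_scan(rows, 2.0, 4.0)
print(matches, touched)
```

With these seven rows the scan starts at the marker for 4.1 rather than at 6.0, and stops as soon as it sees 1.2.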

Oracle Spatial - SDO_BUFFER does not work?

I have a table which has SDO_GEOMETRY values. I query all the geometries to find their start and end points, then I insert these points into another table called ORAHAN. My main goal is, for each point in ORAHAN, to find whether it intersects another point in ORAHAN when a 2 cm buffer is applied to the points.
So I wrote some PL/SQL using the Relate and Buffer functions, but when I checked some records in MapInfo, I saw points within 1 cm of each other with no record in the intersections table called ORAHANCROSSES.
Am I using these functions wrongly?
Note: I am using Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production and
PL/SQL Release 11.2.0.1.0 - and SDO_PACKAGE
ORAHAN has approximately 400 thousand records (points and other columns).
declare
BEGIN
for curs in (select * from ORAHAN t) loop
for curs2 in (select *
from ORAHAN t2
where SDO_RELATE(t2.geoloc,SDO_GEOM.SDO_BUFFER(curs.geoloc,0.02,0.5) ,
'mask=ANYINTERACT') = 'TRUE'
and t2.mi_prinx <> curs.mi_prinx) loop
Insert INTO ORAHANCROSSES
values
(curs.Mip, curs.Startmi, curs2.Mip, curs2.Startmi);
commit;
end loop;
end loop;
END;
And this is the MapInfo map image that shows 3 points which are approximately 1 centimeter from each other. But in ORAHANCROSSES there is no record matching these 3.
Note: 0,00001000km equals 1cm
Orahan Metadata:
select * from user_sdo_geom_metadata where table_name = 'ORAHAN';
And diminfo:
What is the coordinate system of your data? And, most importantly, what tolerance have you set in your metadata?
Some other comments:
1) Don't use a relate-with-buffer approach. Just use a within-distance approach.
2) You don't need a PL/SQL loop for that sort of query; just use a simple CTAS (CREATE TABLE ... AS SELECT):
create table orahancrosses as
select c1.mip mip_1, c1.startmi startmi_1, c2.mip mip_2, c2.startmi startmi_2
from orahan c1, orahan c2
where sdo_within_distance (c2.geoloc, c1.geoloc, 'distance=2 unit=cm') = 'TRUE'
and c2.mi_prinx <> c1.mi_prinx;
3) As written, pairs of points A and B that are within 2 cm will be returned twice: once as (A,B) and once again as (B,A). To avoid that and return each pair only once, write the query like this:
create table orahancrosses as
select c1.mip mip_1, c1.startmi startmi_1, c2.mip mip_2, c2.startmi startmi_2
from orahan c1, orahan c2
where sdo_within_distance (c2.geoloc, c1.geoloc, 'distance=2 unit=cm') = 'TRUE'
and c1.rowid < c2.rowid;
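The effect of the c1.rowid < c2.rowid condition can be sketched outside the database. The following Python snippet is only an illustration under stated assumptions: coordinates are planar and in metres, close_pairs is a made-up helper, and point C is invented; A and B are the two points used in the distance example further down. It is a brute-force comparison, so it shows the logic and is not meant for 400,000 points.

```python
from itertools import combinations
from math import hypot


def close_pairs(points, max_distance):
    """Return each pair of ids within max_distance exactly once.

    points maps an id to planar (x, y) coordinates in metres (assumption).
    """
    pairs = []
    # combinations() yields every unordered pair once, which plays the
    # same role as the c1.rowid < c2.rowid condition in the query.
    for (id1, (x1, y1)), (id2, (x2, y2)) in combinations(sorted(points.items()), 2):
        if hypot(x2 - x1, y2 - y1) <= max_distance:
            pairs.append((id1, id2))
    return pairs


points = {
    "A": (521554.782174622, 4230983.08336913),
    "B": (521554.782174622, 4230983.07336716),
    "C": (521560.0, 4230990.0),  # hypothetical far-away point
}
print(close_pairs(points, 0.02))
```

Only the pair (A, B) is reported, and it is reported once rather than as both (A, B) and (B, A).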
4) Processing the number of points you mention (400000+) should run better using the SDO_JOIN technique, like this:
create table orahancrosses as
select c1.mip mip_1, c1.startmi startmi_1, c2.mip mip_2, c2.startmi startmi_2
from table (
sdo_join (
'ORAHAN','GEOLOC',
'ORAHAN','GEOLOC',
'DISTANCE=2 UNIT=CM'
)
) j,
orahan c1,
orahan c2
where j.rowid1 < j.rowid2
and c1.rowid = j.rowid1
and c2.rowid = j.rowid2;
This will probably still take time to process, depending on the capacity of your database server. If you are licensed for Oracle Enterprise Edition and your hardware has the proper capacity (# of cores), then parallelism can reduce the elapsed time.
5) You say you are using Oracle 11g. What exact version? Version 11.2.0.4 is the terminal release for 11gR2. Anything older is no longer supported. By now you should really be on 12cR1 (12.1.0.2). The major benefit of 12.1.0.2 in your case is the Vector Performance Accelerator feature, which speeds up a number of spatial functions and operators (only if you own the proper Oracle Spatial licenses; it is not available with the free Oracle Locator feature).
======================================
Using the two points in your example, let's compute the distance:
select sdo_geom.sdo_distance(
sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.08336913,null),null,null),
sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.07336716,null),null,null),
0.005
) distance
from dual;
DISTANCE
----------
.01000197
1 row selected.
Notice I don't specify any SRID. Assuming the coordinates are expressed in meters, the distance between them is indeed a little more than 1 cm.
======================================
The reason why your original syntax does not work is, as you noticed, because of the tolerance you specify for the SDO_BUFFER() call. You pass it as 0.5 (=50cm) to produce a buffer with a radius of 0.02 (2cm). The effect is that the buffer produced effectively dissolves into the point itself.
For example at tolerance 0.5:
select sdo_geom.sdo_buffer(sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.08336913,null),null,null),0.02,0.5) from dual;
Produces:
SDO_GEOMETRY(2001, NULL, SDO_POINT_TYPE(521554.782, 4230983.08, NULL), NULL, NULL)
At tolerance 0.005:
select sdo_geom.sdo_buffer(sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.08336913,null),null,null),0.02,0.005) from dual;
You get the proper buffer:
SDO_GEOMETRY(2003, NULL, NULL, SDO_ELEM_INFO_ARRAY(1, 1003, 2), SDO_ORDINATE_ARRAY(521554.782, 4230983.06, 521554.802, 4230983.08, 521554.782, 4230983.1, 521554.762, 4230983.08, 521554.782, 4230983.06))
And the very close point now matches with that buffer:
select sdo_geom.relate(
sdo_geom.sdo_buffer(sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.08336913,null),null,null),0.02,0.005),
'determine',
sdo_geometry (2001,null,sdo_point_type(521554.782174622,4230983.07336716,null),null,null),
0.005
) relation
from dual;
RELATION
-------------------------
CONTAINS
1 row selected.
======================================
Now the fact that your data does not have a proper explicit SRID means that the use of explicit units in measurements or distance-based searches will not work. Because the database does not know what coordinate system your data is in, it does not know how to determine that two points are less than a set number of cm or m apart. All you can do is assume the coordinates are in meters.
So in the examples I give above, replace 'DISTANCE=2 UNIT=CM' with 'DISTANCE=0.02'

Select from a loop in Oracle

In Oracle 11g, I want to execute a query like the one below.
In this case, I am not allowed to use a function or procedure.
I tried to Google it, but I couldn't find a good solution. Almost everything shows how to do it with a function or stored procedure.
Table X with columns (A,B,C)
For a row in table X I want to select:
Count = B - A;
for(i=0;i<Count;i++)
{
C++;
D = C * A;
}
Expected result: table Y with columns (A,B,C,D)
You are thinking like a 3GL developer. Java (or whatever) only has arrays, so everything is an iteration. But SQL is a set-oriented language: we don't need loops to work on sets of data. Oracle SQL has built-in aggregation functions which allow us to compute values from sets of records.
For instance, this query calculates total remuneration (salary plus commission), number of employees and average salary:
select sum(sal + nvl(comm,0)) as total_renum
, count(*) as total_emps
, avg(sal) as average_salary
from emp
/
Oracle has a comprehensive range of such functions, some of which are really powerful. Be sure to check out analytic functions too.
Hmmm, so you subsequently posted a cryptic snippet of code. It's still not clear exactly what you want, but this might produce the outcome for your table Y:
select a
, b
, c
, 0 + ((c+level) * a) as d
from x
connect by level <= (b-a)
/
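For each row in table X it will generate (b-a) rows, with a derived value of d. I have assumed a start of 0 for d. Note that this holds when X contains a single row; with several rows, CONNECT BY LEVEL without a PRIOR condition also connects rows to one another and returns far more rows than intended.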
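To check that the set-based expression really matches the loop from the question, here is a small Python sketch. Both function names are made up for illustration; loop_rows is a literal translation of the pseudocode, and set_based_rows computes d per level the way (c+level) * a does.

```python
def loop_rows(a, b, c):
    """Literal translation of the loop: one (A, B, C, D) row per iteration."""
    rows = []
    for _ in range(b - a):
        c += 1
        d = c * a
        rows.append((a, b, c, d))
    return rows


def set_based_rows(a, b, c):
    """One row per level 1..(b-a), with d computed as (c + level) * a.

    As in the query, the c column keeps its original value here.
    """
    return [(a, b, c, (c + level) * a) for level in range(1, b - a + 1)]


loop_d = [row[3] for row in loop_rows(2, 5, 10)]
set_d = [row[3] for row in set_based_rows(2, 5, 10)]
print(loop_d, set_d)
```

The d values agree. The only difference is that the loop also increments c, whereas the query reports the original c on every generated row.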

Find next id from varchar in Oracle

I have a column that is a varchar(50) with a unique constraint, and I would like to get the next unique number for a new insert, but with a given prefix.
My rows could look like this:
ID (varchar)
00010001
00010002
00010003
00080001
So if I would like to get the next unique number for the prefix "0001" it would be "00010004", but if I wanted it for the prefix "0008" it would be "00080002".
There will be more than 1 million entries in this table. Is there a way with Oracle 11 to perform this kind of operation that is fairly fast?
I know that this setup is totally insane, but this is what I have to work with. I can't create any new tables etc.
You can search for the max value of the specified prefix and increment it:
WITH DATA AS (
SELECT '00010001' id FROM DUAL UNION ALL
SELECT '00010002' id FROM DUAL UNION ALL
SELECT '00010003' id FROM DUAL UNION ALL
SELECT '00080001' id FROM DUAL
)
SELECT :prefix || to_char(MAX(to_number(substr(id, 5)))+1, 'fm0000') nextval
FROM DATA
WHERE ID LIKE :prefix || '%';
NEXTVAL
---------
00010004
I'm sure you're aware that this is an inefficient method to generate a primary key. Furthermore it won't play nicely in a multi-user environment and thus won't scale. Concurrent inserts will wait then fail since there is a UNIQUE constraint on the column.
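The max-plus-one logic can be sketched in a few lines of Python to make the edge cases visible. This is an illustration only: next_id is a made-up helper, a fixed 4-digit suffix is assumed from the sample data, and starting an unseen prefix at 0001 and raising on overflow are my assumptions rather than something the query does.

```python
def next_id(existing_ids, prefix, width=4):
    """Return prefix + (max suffix + 1), zero-padded to `width` digits."""
    suffixes = [int(i[len(prefix):]) for i in existing_ids if i.startswith(prefix)]
    # Assumption: a prefix with no rows yet starts at 1.
    nxt = max(suffixes, default=0) + 1
    if nxt >= 10 ** width:
        # Assumption: fail loudly once the 4-digit suffix is exhausted.
        raise OverflowError(f"no free {width}-digit suffix left for prefix {prefix}")
    return f"{prefix}{nxt:0{width}d}"


ids = ["00010001", "00010002", "00010003", "00080001"]
print(next_id(ids, "0001"), next_id(ids, "0008"))
```

Like the query, this does nothing to protect against two sessions computing the same value at the same time.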
If the prefix is always the same length, you can reduce the workload somewhat: you could create a specialized index that would find the max value in a minimum number of steps:
CREATE INDEX ix_fetch_max ON your_table (substr(id, 1, 4),
substr(id, 5) DESC);
Then the following query could use the index and will stop at the first row retrieved:
SELECT id
FROM (SELECT substr(id, 1, 4) || substr(id, 5) id
FROM your_table
WHERE substr(id, 1, 4) = :prefix
ORDER BY substr(id, 5) DESC)
WHERE rownum = 1
If you need to do simultaneous inserts with the same prefix, I suggest you use DBMS_LOCK to request a lock on the specified newID. If the call fails because someone is already inserting this value, try with newID+1. Although this involves more work than a traditional sequence, at least your inserts won't wait on each other (potentially leading to deadlocks).
This is a very unsatisfactory situation for you. As other posters have pointed out - if you don't use sequences then you will almost certainly have concurrency issues. I mentioned in a comment the possibility that you live with big gaps. This is the simplest solution but you will run out of numbers after 9999 inserts.
Perhaps an alternative would be to create a separate sequence for each prefix. This would only really be practical if the number of prefixes is fairly low but it could be done.
ps - your requirement that > 1000000 records should be possible may, in fact, mean you have no choice but to redesign the database.
SELECT to_char(to_number(max(id)) + 1, 'fm00000000')
FROM mytable
WHERE id LIKE '0001%'
SQLFiddle demo here http://sqlfiddle.com/#!4/4f543/5/0

Oracle Spatial : Find the neighboring buildings

I'm creating an application which finds the neighboring buildings/ hydrants of a building on fire. I've created the tables:
CREATE TABLE building (
buildingno VARCHAR(40) PRIMARY KEY
, buildingname VARCHAR2(32),noofvertices INT
, shape MDSYS.SDO_GEOMETRY)
and
CREATE TABLE hydrant (hydrantno VARCHAR(40) PRIMARY KEY
, point MDSYS.SDO_GEOMETRY)
and
CREATE TABLE firebuilding(hydrantno VARCHAR(40) PRIMARY KEY)
I want to find the nearest neighbors of a particular building (both hydrants and buildings). Can I do this without creating a spatial index on the column name?
I am learning spatial querying and the dataset I'm working on is small (about 20 entries in each table and won't grow).
Do you have a good reason not to create a spatial index?
If you do, and if the number of shapes is small, you might get acceptable results and performance with a "brute force" approach that uses SDO_GEOM.SDO_DISTANCE to calculate the distance between the given shape and each of the other shapes and then picks the smallest distance. For example, if firebuilding identifies the given building (the query assumes it has a BUILDINGNO column), the following query identifies the closest building(s) using a tolerance of 1 metre (if the coordinates are geodetic) or 1 coordinate unit (if the coordinates are non-geodetic):
SELECT
  B.*
FROM
  (
    SELECT
      A.*,
      DENSE_RANK () OVER (ORDER BY A.DISTANCE) AS RANKING
    FROM
      (
        SELECT
          OTHER_BUILDINGS.*,
          SDO_GEOM.SDO_DISTANCE(BUILDING.SHAPE, OTHER_BUILDINGS.SHAPE, 1) DISTANCE
        FROM
          FIREBUILDING,
          BUILDING,
          BUILDING OTHER_BUILDINGS
        WHERE
          BUILDING.BUILDINGNO = FIREBUILDING.BUILDINGNO
        AND
          OTHER_BUILDINGS.BUILDINGNO <> BUILDING.BUILDINGNO
      ) A
  ) B
WHERE
  B.RANKING = 1;
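The same brute-force idea, including the way DENSE_RANK keeps ties, can be sketched in Python. This is a simplification under loud assumptions: each building is reduced to a single planar point (real building shapes are polygons), the ids and coordinates are invented, and nearest is a made-up helper.

```python
from math import hypot


def nearest(target_id, shapes):
    """Return the ids of all shapes tied for the smallest distance to target_id.

    shapes maps an id to one planar (x, y) point (assumption: buildings are
    reduced to points for this sketch).
    """
    tx, ty = shapes[target_id]
    distances = {
        other_id: hypot(x - tx, y - ty)
        for other_id, (x, y) in shapes.items()
        if other_id != target_id  # same role as BUILDINGNO <> BUILDINGNO
    }
    if not distances:
        return []
    best = min(distances.values())
    # Keep every shape at the minimum distance, like RANKING = 1 with DENSE_RANK.
    return sorted(other_id for other_id, d in distances.items() if d == best)


shapes = {"b1": (0, 0), "b2": (3, 4), "b3": (-3, -4), "b4": (10, 0)}
print(nearest("b1", shapes))
```

With about 20 rows per table, comparing the burning building against every other row like this is cheap, which is why the approach can be acceptable without a spatial index.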

SQL/Oracle: when indexes on multiple columns can be used

If I create an index on columns (A, B, C), in that order, my understanding is that the database will be able to use it even if I search only on (A), or (A and B), or (A and B and C), but not if I search only on (B), or (C), or (B and C). Is this correct?
There are actually three index-based access methods that Oracle can use when a predicate is placed on a non-leading column of an index.
i) Index skip-scan: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#PFGRF10105
ii) Fast full index scan: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#i52044
iii) Index full scan: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#i82107
I've most often seen the fast full index scan "in the wild", but all are possible.
That is not correct. It is always best to come up with a test case that represents your data and see for yourself. If you want to really understand the Oracle SQL optimizer, google Jonathan Lewis: read his books, read his blog, check out his website. The guy is amazing, and he always generates test cases.
create table mytab nologging as (
select mod(rownum, 3) x, rownum y, mod(rownum, 3) z from all_objects, (select 'x' from user_tables where rownum < 4)
);
create index i on mytab (x, y, z);
exec dbms_stats.gather_table_stats(ownname=>'DBADMIN',tabname=>'MYTAB', cascade=>true);
set autot trace exp
select * from mytab where y=5000;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=10)
1 0 INDEX (SKIP SCAN) OF 'I' (INDEX) (Cost=1 Card=1 Bytes=10)
Up to and including Oracle 8, an index was never used unless its first column was included in the SQL.
In Oracle 9i the Skip Scan Index Access feature was introduced, which lets the Oracle CBO attempt to use indexes even when the prefix column is not available.
Good overview of how skip scan works here: http://www.quest-pipelines.com/newsletter-v5/1004_C.htm
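As a rough illustration of the skip scan idea, and not of Oracle's internals, the Python sketch below treats an index on (x, y, z) as a sorted list of tuples and probes it once per distinct leading value x. The function name skip_scan is made up, and the jump to the next x assumes integer x values. The sample data mirrors mytab from the test case above on a smaller scale.

```python
from bisect import bisect_left


def skip_scan(index_entries, y):
    """Find entries with the given y in an index sorted on (x, y, z).

    Probes once per distinct leading value x instead of reading every entry.
    Assumes x values are integers so that (x + 1,) marks the next sub-index.
    """
    matches = []
    pos = 0
    while pos < len(index_entries):
        x = index_entries[pos][0]
        # Seek to the first entry >= (x, y) inside this x's sub-index.
        i = bisect_left(index_entries, (x, y), pos)
        while i < len(index_entries) and index_entries[i][:2] == (x, y):
            matches.append(index_entries[i])
            i += 1
        # Skip to the first entry of the next distinct x.
        pos = bisect_left(index_entries, (x + 1,), pos)
    return matches


# Small stand-in for mytab: x = n mod 3, y = n, z = n mod 3.
entries = sorted((n % 3, n, n % 3) for n in range(1, 31))
print(skip_scan(entries, 5))
```

The fewer distinct values the leading column has, the fewer probes are needed, which is why the plan above can use index I for a predicate on y alone when x has only three distinct values.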
