Cassandra - filter rows based on a range

I am using Cassandra and Spark with DataStax's spark-cassandra-connector.
The spark-cassandra-connector documentation gives a filter example like this:
sc.cassandraTable("test", "cars").select("id", "model").where("color = ?", "black").toArray.foreach(println)
Basically it filters the color column by the value black. However, can I filter rows based on a range? Say I want to filter on a column of type long whose value falls between 100000 and 200000. Does CQL support such a range filter?

CQL supports range queries only on clustering columns. As in SQL, a range query is expressed with two bounding conditions on the same field; in spark-cassandra-connector you would write:
.where("my_long >= ? and my_long < ?", 1L, 100L)
This works as long as the my_long column is the first clustering column. Clustering columns are the columns that follow the partition key columns in the primary key declaration.
For instance, you can run range queries on my_long column if the primary key is declared as follows:
PRIMARY KEY (pk1, my_long)
PRIMARY KEY (pk1, my_long, pk3)
PRIMARY KEY ((pk1, pk2), my_long)
PRIMARY KEY ((pk1, pk2), my_long, pk4)
...
As you can see, in all the preceding cases my_long follows the declaration of the partition key in the primary key.
If the column belongs to the clustering columns but is not the first one, you have to provide an equality condition for each preceding clustering column.
For example:
PRIMARY KEY (pk1, pk2, my_long) --> .where("pk2=? and my_long>? and my_long<?", ...)
PRIMARY KEY (pk1, pk2, pk3, my_long) --> .where("pk2=? and pk3=? and my_long>? and my_long<?", ...)
PRIMARY KEY ((pk1, pk2), pk3, my_long) --> .where("pk3=? and my_long>? and my_long<?", ...)
PRIMARY KEY ((pk1, pk2), pk3, my_long, pk5) --> .where("pk3=? and my_long>? and my_long<?", ...)
Note: spark-cassandra-connector adds the ALLOW FILTERING clause to all queries by default. If you want to run the examples above in cqlsh, you have to add that clause manually.
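For illustration, here is a minimal sketch in cqlsh (the table and column names are made up for this example, assuming my_long is the first clustering column as described above):
CREATE TABLE test.events (
    pk1 text,
    my_long bigint,
    payload text,
    PRIMARY KEY (pk1, my_long)
);
-- Range query within one partition; no ALLOW FILTERING needed here,
-- because the partition key is restricted:
SELECT * FROM test.events
WHERE pk1 = 'p1' AND my_long >= 100000 AND my_long < 200000;
-- Without a partition key restriction (which is how the connector
-- generates the query), the clause must be added manually in cqlsh:
SELECT * FROM test.events
WHERE my_long >= 100000 AND my_long < 200000 ALLOW FILTERING;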

Related

Do tables in Vertica have primary and secondary keys?

Do projections in Vertica have primary keys or secondary keys? How can I find out what the key of a projection is?
You had best go into Vertica's documentation on projections:
https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/SQLReferenceManual/Statements/CREATEPROJECTION.htm
Primary and foreign keys exist, as do unique constraints - as constraints; but these constraints are usually disabled, because they slow down a load / insert process considerably.
Even if you choose not to specify the segmentation and ordering clauses of a projection, each projection is either unsegmented or segmented by a value that depends on the contents of one or more non-nullable columns (usually a HASH() of one or more columns), and ORDERed by one or more columns. The ORDER BY clause in a projection definition constitutes the data access path used in that projection. It can loosely be compared to indexing in classical databases.
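As a sketch, a projection tailored to queries filtering on lname could be declared like this (the projection name and the column choice are made up; the table is the one exported below):
CREATE PROJECTION dbadmin.example_by_lname AS
SELECT fname, lname, hdate, salary
FROM dbadmin.example
ORDER BY lname, fname
SEGMENTED BY HASH(lname, fname) ALL NODES;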
To find out what the access path of a projection is, the quickest way is to fire a SELECT EXPORT_OBJECTS('','<tablename>', FALSE) at it. In the example below, you can see that the projection is ordered by all four of its columns and segmented by the HASH() of all four columns, as we created the table with no primary or foreign key:
$ vsql -Atc "SELECT EXPORT_OBJECTS('','example',FALSE)"
CREATE TABLE dbadmin.example
(
fname varchar(4),
lname varchar(5),
hdate date,
salary numeric(7,2)
);
CREATE PROJECTION dbadmin.example_super /*+basename(example),createtype(L)*/
(
fname,
lname,
hdate,
salary
)
AS
SELECT example.fname,
example.lname,
example.hdate,
example.salary
FROM dbadmin.example
ORDER BY example.fname,
example.lname,
example.hdate,
example.salary
SEGMENTED BY hash(example.fname, example.lname, example.hdate, example.salary) ALL NODES OFFSET 0;

Why does SQLite not use an index for queries on my many-to-many relation table?

It's been a while since I've written code, and I have never used SQLite before, but many-to-many relationships used to be so fundamental that there must be a way to make them fast...
This is an abstracted version of my database:
CREATE TABLE a (_id INTEGER PRIMARY KEY, a1 TEXT NOT NULL);
CREATE TABLE b (_id INTEGER PRIMARY KEY, fk INTEGER NOT NULL REFERENCES a(_id));
CREATE TABLE d (_id INTEGER PRIMARY KEY, d1 TEXT NOT NULL);
CREATE TABLE c (_id INTEGER PRIMARY KEY, fk INTEGER NOT NULL REFERENCES d(_id));
CREATE TABLE b2c (fk_b NOT NULL REFERENCES b(_id), fk_c NOT NULL REFERENCES c(_id), CONSTRAINT PK_b2c_desc PRIMARY KEY (fk_b, fk_c DESC), CONSTRAINT PK_b2c_asc UNIQUE (fk_b, fk_c ASC));
CREATE INDEX a_a1 on a(a1);
CREATE INDEX a_id_and_a1 on a(_id, a1);
CREATE INDEX b_fk on b(fk);
CREATE INDEX b_id_and_fk on b(_id, fk);
CREATE INDEX c_id_and_fk on c(_id, fk);
CREATE INDEX c_fk on c(fk);
CREATE INDEX d_id_and_d1 on d(_id, d1);
CREATE INDEX d_d1 on d(d1);
I have put in every index I could think of, just to make sure (more than is reasonable, but that is not a problem, since the data is read-only). And yet on this query
SELECT count(*)
FROM a, b, b2c, c, d
WHERE a.a1 = 'A'
AND a._id = b.fk
AND b._id = b2c.fk_b
AND c._id = b2c.fk_c
AND d._id = c.fk
AND d.d1 = 'D';
the relation table b2c does not use any indexes:
0|0|2|SCAN TABLE b2c
0|1|1|SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)
0|2|0|SEARCH TABLE a USING INTEGER PRIMARY KEY (rowid=?)
0|3|3|SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?)
0|4|4|SEARCH TABLE d USING INTEGER PRIMARY KEY (rowid=?)
The query is about two orders of magnitude too slow to be usable. Is there any way to make SQLite use an index on b2c?
Thanks!
In a nested loop join, the outermost table does not use an index for the join (because the database just goes through all of its rows anyway).
To be able to use an index for a join, the index and the other column must have the same affinity, which usually means that both columns must have the same type.
Change the types of the b2c columns to INTEGER.
If the lookups on a1 or d1 are very selective, using a or d as the outermost table might make sense, and would then allow an index to be used for the filter.
Try running ANALYZE.
If that does not help, you can force the join order with CROSS JOIN or INDEXED BY.
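A minimal sketch of those suggestions combined (the extra index name is made up; the reverse-order index lets joins driven from the c side be satisfied as well):
CREATE TABLE b2c (
    fk_b INTEGER NOT NULL REFERENCES b(_id),
    fk_c INTEGER NOT NULL REFERENCES c(_id),
    PRIMARY KEY (fk_b, fk_c)
);
CREATE INDEX b2c_by_c ON b2c(fk_c, fk_b);
-- Refresh the optimizer's statistics after loading the data:
ANALYZE;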

Secondary indexes on composite keys in cassandra

I have this table in cassandra
CREATE TABLE global_product_highlights (
deal_id text,
product_id text,
highlight_strength double,
category_id text,
creation_date timestamp,
rank int,
PRIMARY KEY (deal_id, product_id, highlight_strength)
)
When I fire the below query in Golang
err = session.Query("select product_id from global_product_highlights where category_id=? order by highlight_strength DESC",default_category).Scan(&prodId_array)
I get ERROR: ORDER BY with 2ndary indexes is not supported.
I have a secondary index on category_id.
I don't completely understand how a secondary index is applied to a composite key in Cassandra.
I would appreciate it if anyone could explain and rectify this one.
The ORDER BY clause in Cassandra only works on your first clustering column (2nd column in the primary key), which in this case is your product_id. This DataStax doc states that:
Querying compound primary keys and sorting results ORDER BY clauses
can select a single column only. That column has to be the second
column in a compound PRIMARY KEY.
So, if you want to have your table sorted by highlight_strength, then you'll need to make that field the first clustering column.
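A sketch of such a redefinition (keeping the original columns; including product_id as a second clustering column to preserve row uniqueness is an assumption):
CREATE TABLE global_product_highlights (
    deal_id text,
    highlight_strength double,
    product_id text,
    category_id text,
    creation_date timestamp,
    rank int,
    PRIMARY KEY (deal_id, highlight_strength, product_id)
) WITH CLUSTERING ORDER BY (highlight_strength DESC, product_id ASC);
-- Rows within each deal_id partition now come back ordered by
-- highlight_strength, highest first:
SELECT product_id, highlight_strength
FROM global_product_highlights
WHERE deal_id = 'deal-42';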

How to add composite primary keys?

I have a table with three columns, [Id, QTY, Date]. Out of these three, two columns, [Id and Date], should be set as a composite primary key, because I need to fetch the records one by one from this table into a reference.
The data to be inserted into this table is:
101,10,NULL
101,20,201220
101,7,201440
102,5,NULL
102,8,201352
The date is in yyyyww format.
How do I define two columns as a composite primary key when they contain NULL values and duplicates?
alter table abc add constraint pk primary key (ID, DATE);
If I try to alter the table, this error appears:
Error report:
SQL Error: ORA-01449: column contains NULL values; cannot alter to NOT NULL
01449. 00000 - "column contains NULL values; cannot alter to NOT NULL"
*Cause:
*Action:
Using a table-level constraint, you can use this query:
alter table your_table add constraint pkc_Name primary key (column1, column2)
but first you need to declare the columns NOT NULL. All parts of a primary key need to be NOT NULL.
Your ID column is nullable and non-unique, so on its own it cannot be a primary key. If it is the primary key of another table, try adding a surrogate key column to this table and making that the primary key.
In the case of a composite primary key, no column in the combination may hold a NULL value (in any row), and the combination of columns must be unique in every row. For example, if a composite primary key is made up of 3 columns:
(1,2,2)
(1,2,1)
(2,2,1)
(1,2,2) - not valid (duplicate of the first combination)
For further details check http://docs.oracle.com/cd/B10500_01/server.920/a96524/c22integ.htm
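A minimal sketch of one way forward, assuming it is acceptable to replace the NULL dates with a sentinel week such as 190001 (the sentinel is an assumption, not part of the question; names follow the question):
-- Replace the NULLs so the column can be made NOT NULL:
update abc set date = 190001 where date is null;
alter table abc modify (date not null);
alter table abc add constraint pk primary key (id, date);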

Does making a primary key in multiple columns generate indexes for all of them?

If I set a primary key on multiple columns in Oracle, do I also need to create indexes on those columns if I need them?
I believe that when you set a primary key on one column, that column is indexed; is it the same with multi-column PKs?
Thanks
No, indexes will not be created for the individual fields.
If you have a composite key FieldA, FieldB, FieldC and you run
select * from MyTable where FieldA = :a
or
select * from MyTable where FieldA = :a and FieldB = :b
Then it will use this index (because they are the first two fields in the key).
If you have
select * from MyTable where FieldB = :b and FieldC = :c
Where you are using parts of the index, but not the full index, the index will be used less efficiently through an index skip scan, full index scan, or fast full index scan.
(Thanks to David Aldridge for the correction)
If you create a primary key on columns (A, B, C) then Oracle will by default create a unique index on (A, B, C). You can tell Oracle to use a different (not necessarily unique) existing index like this:
alter table mytable add constraint mytable_pk
primary key (a, b, c)
using index mytable_index;
You will get one index across multiple columns, which is not the same as having an index on each column.
A primary key implies creating a composite unique index on the primary key columns.
You can use a special access path called INDEX SKIP SCAN to use this index with predicates that do not include the first indexed column:
SQL> CREATE TABLE t_multiple (mul_first INTEGER NOT NULL, mul_second INTEGER NOT NULL, mul_data VARCHAR2(200))
2 /
Table created
SQL> ALTER TABLE t_multiple ADD CONSTRAINT pk_mul_first_second PRIMARY KEY (mul_first, mul_second)
2 /
Table altered
SELECT /*+ INDEX_SS (m pk_mul_first_second) */
*
FROM t_multiple m
WHERE mul_second = :test
SELECT STATEMENT, GOAL = ALL_ROWS
TABLE ACCESS BY INDEX ROWID SCOTT T_MULTIPLE
INDEX SKIP SCAN SCOTT PK_MUL_FIRST_SECOND
A primary key is only one (unique) index, possibly containing multiple columns.
For a select on B, the index will be used only if column A has low cardinality (e.g. A has only 2 distinct values).
In general you could have guessed this answer if you imagined that the columns are not indexed separately; rather, the concatenation of the columns is indexed (this is not completely true, but it works as a first approximation).
So it's not an a, b index; it's more like an a||b index.
You may need to create individual indexes on the columns, depending on your primary key structure.
Composite primary keys and indexes will create indexes in the following manner. Say I have columns A, B, C and I create the primary key on (A, B, C). This will result in the indexes
(A, B, C)
(A, B)
(A)
Oracle actually creates an index on any of the left-most column groupings. So... if you want an index on just the column B, you will have to create one for it as well as the primary key.
P.S. I know MySQL exhibits this left-most behaviour and I think SQL Server is also left-most.
In Oracle, that's not an accurate statement. It creates only 1 index, on (A, B, C); it does not create (A, B) and (A) indexes.
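You can check this yourself in the data dictionary; a minimal sketch (the table name is made up):
CREATE TABLE pk_demo (
    a NUMBER,
    b NUMBER,
    c NUMBER,
    CONSTRAINT pk_demo_pk PRIMARY KEY (a, b, c)
);
-- Lists exactly one index, covering all three columns in order:
SELECT index_name, column_name, column_position
FROM user_ind_columns
WHERE table_name = 'PK_DEMO'
ORDER BY column_position;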
