For sharding in Tarantool I'm using the https://github.com/tarantool/shard rock and it works perfectly for searching by the primary index.
I have a space events with a primary key and a secondary index.
box.schema.create_space('events')
box.space.events:create_index(
"id", {type = 'primary', parts = {1, 'unsigned'}}
)
box.space.events:create_index(
"secondary", {type = "tree", unique=true, parts = {2, 'str'}}
)
shard.events:insert{1, "pv.1", 3, 12345671, "uuid1"}
shard.events:insert{2, "pv.2", 3, 12345672, "uuid2"}
-- query by primary index works and returns the tuple
shard.events:select{2}
-- query by secondary index does NOT work
shard.events.index.secondary:select('pv.2', {iterator = box.index.EQ})
My question is:
What do I have to use, or what do I have to do, to query shards by a secondary index?
In PostgreSQL we can create a JSONB column that can be indexed and accessed something like this:
CREATE TABLE foo (
  id BIGSERIAL PRIMARY KEY,
  -- createdAt, updatedAt, deletedAt, createdBy, updatedBy, restoredBy, deletedBy
  data JSONB
);
CREATE INDEX ON foo((data->>'email'));
INSERT INTO foo(data) VALUES('{"name":"yay","email":"a#1.com"}');
SELECT data->>'name' FROM foo WHERE id = 1;
SELECT data->>'name' FROM foo WHERE data->>'email' = 'a#1.com';
Which is very beneficial in the prototyping phase (no need for migrations at all, nor for locking when adding a column).
Can we do a similar thing in Tarantool?
Sure, Tarantool supports JSON path indices. Here is an example:
-- Initialize / load the database.
tarantool> box.cfg{}
-- Create a space with two columns: id and obj.
-- The obj column is supposed to contain dictionaries with nested data.
tarantool> box.schema.create_space('s',
> {format = {[1] = {'id', 'unsigned'}, [2] = {'obj', 'any'}}})
-- Create primary and secondary indices.
-- The secondary index looks at the nested field obj.timestamp.
tarantool> box.space.s:create_index('pk',
> {parts = {[1] = {field = 1, type = 'unsigned'}}})
tarantool> box.space.s:create_index('sk',
> {parts = {[1] = {field = 2, path = 'timestamp', type = 'number'}}})
-- Insert three tuples: first, third and second.
tarantool> clock = require('clock')
tarantool> box.space.s:insert({1, {text = 'first', timestamp = clock.time()}})
tarantool> box.space.s:insert({3, {text = 'third', timestamp = clock.time()}})
tarantool> box.space.s:insert({2, {text = 'second', timestamp = clock.time()}})
-- Select at most 1000 tuples whose timestamp falls within the last hour.
-- Sort them by timestamp.
tarantool> box.space.s.index.sk:select(
> clock.time() - 3600, {iterator = box.index.GT, limit = 1000})
---
- - [1, {'timestamp': 1620820764.1213, 'text': 'first'}]
- [3, {'timestamp': 1620820780.4971, 'text': 'third'}]
- [2, {'timestamp': 1620820789.5737, 'text': 'second'}]
...
JSON path indices are available since Tarantool 2.1.2.
I have a field in a Tarantool space that I no longer need.
local space = box.schema.space.create('my_space', {if_not_exists = true})
space:format({
{'field_1', 'unsigned'},
{'field_2', 'unsigned'},
{'field_3', 'string'},
})
How do I remove field_2, both when it's indexed and when it's not?
There is no really convenient way to do it.
The first way: declare the field as nullable and simply store NULL in it. Yes, the value will still be stored physically, but you can hide it from users.
It's simple and cheap.
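A minimal sketch of this nullable-field approach, assuming field_2 is not indexed (an index over it would also need is_nullable = true in its parts):
-- Mark field_2 as nullable in the format.
box.space.my_space:format({
    {'field_1', 'unsigned'},
    {name = 'field_2', type = 'unsigned', is_nullable = true},
    {'field_3', 'string'},
})
-- New tuples can now carry NULL in the unused field.
box.space.my_space:replace({1, box.NULL, 'some value'})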
The second way is to write an in-place migration. It's not possible if you have indexed fields after the field you want to drop (in your example, field_3).
And it's dangerous if you have a huge amount of data in this space.
local space = box.schema.space.create('my_space', {if_not_exists = true})
space:create_index('id', {parts = {{field = 1, type = 'unsigned'}}})
space:format({
{'field_1', 'unsigned'},
{'field_2', 'unsigned'},
{'field_3', 'string'},
})
-- Create key_def instance to simplify primary key extraction
local key_def = require('key_def').new(space.index[0].parts)
-- drop previous format
space:format({})
-- Migrate your data
for _, tuple in space:pairs() do
space:delete(key_def:extract_key(tuple))
space:replace({tuple[1], tuple[3]})
end
-- Setup new format
space:format({
{'field_1', 'unsigned'},
{'field_3', 'string'},
})
The third way is to create a new space, migrate the data into it, and drop the previous one.
Still, it's quite dangerous.
local space = box.schema.space.create('new_my_space', {if_not_exists = true})
space:create_index('id', {parts = {{field = 1, type = 'unsigned'}}})
space:format({
{'field_1', 'unsigned'},
{'field_3', 'string'},
})
-- Migrate your data
for _, tuple in box.space['my_space']:pairs() do
space:replace({tuple[1], tuple[3]})
end
-- Drop the old space
box.space['my_space']:drop()
-- Rename new space
local space_id = box.space._space.index.name:get({'new_my_space'}).id
-- In newer versions of Tarantool (2.6+) the space:alter() method is available,
-- but in older versions you can update the name via the system "_space" space
box.space._space:update({space_id}, {{'=', 'name', 'my_space'}})
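For completeness, a sketch of the 2.6+ alternative mentioned in the comment above (space:alter() accepts a name option, making the manual _space update unnecessary):
-- Tarantool 2.6+ only: rename the space directly.
box.space.new_my_space:alter({name = 'my_space'})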
I am using DocumentDB with partition key = "deviceId".
Is there any difference between the two pieces of code below:
var fo = new FeedOptions { PartitionKey = new PartitionKey("A1234") };
var partitionKeyInQuery = dbClient.CreateDocumentQuery(d => d.deviceId == "A1234" && d.type == 1, fo);
var noPartitionKeyInQuery = dbClient.CreateDocumentQuery(d => d.type == 1, fo);
When PartitionKey is set in FeedOptions, should I still add "deviceId" to the WHERE clause?
I believe there is no difference in performance. The RequestCharge is the same, and the WHERE clause makes the query partition-specific, i.e. it eliminates a cross-partition query.
From the documentation:
Querying partitioned containers
When you query data in partitioned containers, Cosmos DB automatically routes the query to the partitions corresponding to the partition key values specified in the filter (if there are any). For example, this query is routed to just the partition containing the partition key "XMS-0001".
// Query using partition key
IQueryable<DeviceReading> query = client.CreateDocumentQuery<DeviceReading>(
UriFactory.CreateDocumentCollectionUri("db", "coll"))
.Where(m => m.MetricType == "Temperature" && m.DeviceId == "XMS-0001");
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-partition-data
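For contrast, when neither a PartitionKey in FeedOptions nor a partition-key filter is supplied, the SDK requires you to opt in to fanning the query out across all partitions. A minimal sketch, reusing the names from the documentation snippet above:
// No partition key anywhere: cross-partition fan-out must be enabled explicitly.
var crossPartitionOptions = new FeedOptions { EnableCrossPartitionQuery = true };
IQueryable<DeviceReading> allTemperatures = client.CreateDocumentQuery<DeviceReading>(
    UriFactory.CreateDocumentCollectionUri("db", "coll"), crossPartitionOptions)
    .Where(m => m.MetricType == "Temperature");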
I have persisted a sorted list of ids into the database. Now I want to get the records back in the order of the persisted ids.
However, the records are coming back in the order of the primary keys, not the order of the int[] I persisted. I'm not quite sure how this can be achieved.
I currently have the following:
int[] ids = {8, 1, 5};
var items = from i in ContentPage.All()
where ids.Contains(i.ContentPageId)
select i;
Currently the records are coming out in the order 1, 5, 8, whereas I actually want 8, 1, 5.
A database is not obliged to return items in any particular order. You can reorder them on the client side; in LINQ to Objects, Join preserves the order of the outer sequence (ids here), so joining from ids yields the 8, 1, 5 order:
int[] ids = {8, 1, 5};
var items = (from i in ContentPage.All()
where ids.Contains(i.ContentPageId)
select i).ToList();
var answer = (from id in ids
join item in items
on id equals item.ContentPageId
select item).ToList();
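An equivalent sketch that avoids the join (assuming every persisted id has a matching record; the byId name is just for illustration):
int[] ids = {8, 1, 5};
// Index the fetched records by their id...
var byId = (from i in ContentPage.All()
            where ids.Contains(i.ContentPageId)
            select i).ToDictionary(i => i.ContentPageId);
// ...then project them in the persisted order: 8, 1, 5.
var answer = ids.Select(id => byId[id]).ToList();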
I have elements in the DB with the following structure:
{
"id": 324214,
"modDate": "2014-10-01",
"otherInfo": {
..
..
}
}
Let's suppose that I have a list of [id, modDate] pairs:
Map<String, String> idAndModDate
which contains, e.g., (324214, "2014-10-01"), (3254757, "2015-10-04")..
Now, I would like to use the Elasticsearch Java API QueryBuilder to build a query which gives me the list of all ids that are present in the system and whose modDate equals the given one.
Suppose that the database contains elements with the following id/date pairs:
id, date
1, 2015-01-01
2, 2014-03-02
3, 2000-01-22
4, 2020-09-01
Now, I want to create a query for a Map with the following data:
Map<String, String> idDataPairs =[
(1, 2015-01-01)
(2, 2014-03-03)
(3, 2000-01-22)
(7, 2020-09-01)]
Now I want to create a function like
List<String> ids = search(Map<String, String> idDataPairs) {
    QueryBuilder.(sth).(sth) <--- that's what I'm asking about
}
which will return ids 1 and 3, because those ids exist in the DB and their dates in the query are equal to the dates in the DB.
This is what you are looking for, more or less.
import java.util.HashMap;
import java.util.Map;

import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

//build the test data in the map
Map<String, String> idDataPairs = new HashMap<String, String>();
idDataPairs.put("1", "2015-01-01");
idDataPairs.put("2", "2014-03-03");
idDataPairs.put("3", "2000-01-22");
idDataPairs.put("4", "2020-09-01");
//construct the query
BoolQueryBuilder should = QueryBuilders.boolQuery();
for(String id : idDataPairs.keySet()){
BoolQueryBuilder bool = QueryBuilders.boolQuery();
bool.must(QueryBuilders.termQuery("id", id));
bool.must(QueryBuilders.termQuery("modDate", idDataPairs.get(id)));
should.should(bool);
}
should.minimumNumberShouldMatch(1);
What I am doing is this:
For each of the pairs, I construct a BooleanQuery called bool. This boolean query has two must conditions: both the id and the date MUST match the document.
After constructing one such bool query, I add it to a parent BooleanQuery. This time, I say that the inner bool query should match, but it's not required to. The final line says that at least one of these should clauses has to match if we want the document to match.
This structure is easy to understand, because must functions like AND and should functions like OR. Another way to do this is to construct several TermsQuerys and add them to a parent BooleanQuery using should, as sketched below.
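A sketch of that terms-based variant (one termsQuery per field); note that it matches id IN (...) OR modDate IN (...), so unlike the per-pair bool queries above it does not tie each id to its own date:
//Looser alternative: two terms queries combined with should.
BoolQueryBuilder parent = QueryBuilders.boolQuery();
parent.should(QueryBuilders.termsQuery("id", idDataPairs.keySet()));
parent.should(QueryBuilders.termsQuery("modDate", idDataPairs.values()));
parent.minimumNumberShouldMatch(1);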
So, for the data
id, date
1, 2015-01-01
2, 2014-03-02
3, 2000-01-22
4, 2020-09-01
the above code will return the documents with ids 1 and 3 (id 2's date differs, and id 7 is not in the DB).