Cosmos DB Collection not using _id index when querying by _id? - performance

I have a CosmosDb - MongoDb collection that I'm using purely as a key/value store for arbitrary data where the _id is the key for my collection.
When I run the query below:
globaldb:PRIMARY> db.FieldData.find({_id : new BinData(3, "xIAPpVWVkEaspHxRbLjaRA==")}).explain(true)
I get this result:
{
"_t" : "ExplainResponse",
"ok" : 1,
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "data.FieldData",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [ ]
},
"winningPlan" : {
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 106,
"totalKeysExamined" : 0,
"totalDocsExamined" : 3571,
"executionStages" : {
},
"allPlansExecution" : [ ]
},
"serverInfo" : #REMOVED#
}
Notice that the totalKeysExamined is 0 and the totalDocsExamined is 3571 and the query took over 106ms. If i run without .explain() it does find the document.
I would have expected this query to be lightning quick given that the _id field is automatically indexed as a unique primary key on the collection. As this collection grows in size, I only expect this problem to get worse.
I'm definitely not understanding something about the index and how it works here. Any help would be most appreciated.
Thanks!

Related

Differentiating _delete_by_query tasks in a multi-tenant index

Scenario:
I have an index with a bunch of multi-tenant data in Elasticsearch 6.x. This data is frequently deleted (via _delete_by_query) and populated by the tenants.
When issuing a _delete_by_query request with wait_for_completion=false, supplying a query JSON to delete a tenants' data, I am able to see generic task information via the _tasks API. Problem is, with a large number of tenants, it is not actively clear who is deleting data at any given time.
My question is this:
Is there a way I can view the query for which the _delete_by_query task is operating on? Or can I attach an additional param to the URL that is cached in the task to differentiate them?
Side note: looking at the docs: https://www.elastic.co/guide/en/elasticsearch/reference/6.6/tasks.html I see there is a description field in the _tasks API response that has the query as a String, however, I do not see that level of detail in my description field:
"description" : "delete-by-query [myindex]"
Thanks in advance
One way to identify queries is to add the X-Opaque-Id HTTP header to your queries:
For instance, when deleting all tenant data for (e.g.) User 3, you can issue the following command:
curl -XPOST -H 'X-Opaque-Id: 3' -H 'Content-type: application/json' http://localhost:9200/my-index/_delete_by_query?wait_for_completion=false -d '{"query":{"term":{"user": 3}}}'
You then get a task ID, and when checking the related task document, you'll be able to identify which task is/was deleting which tenant data thanks to the headers section which contains your HTTP header:
"_source" : {
"completed" : true,
"task" : {
"node" : "DB0GKYZrTt6wuo7d8B8p_w",
"id" : 20314843,
"type" : "transport",
"action" : "indices:data/write/delete/byquery",
"status" : {
"total" : 3,
"updated" : 0,
"created" : 0,
"deleted" : 3,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0
},
"description" : "delete-by-query [deletes]",
"start_time_in_millis" : 1570075424296,
"running_time_in_nanos" : 4020566,
"cancellable" : true,
"headers" : {
"X-Opaque-Id" : "3" <--- user 3
}
},

Firebase Realtime Database: Do I need extra container for my queries?

I am developing an app, where I retrieve data from a firebase realtime database.
One the one hand, I got my objects. There will be around 10000 entries when it is finished. A user can select for each property like "Blütenfarbe" (flower color) 1 (or more) characteristics, where he will then get the result plants, on which these constraints are true. Each property has 2-10 characteristics.
Is querying here powerful enough, to get fast results ? If not, my thought would be to also setup container for each characteristic and put every ID in that, when it is a characteristic of that plant.
This is my first project, so any tip for better structure is welcome. I don't want to create this database and realize afterwards, that it is not well enough structured.
Thanks for your help :)
{
"Pflanzen" : {
"Objekt" : {
"00001" : {
"Belaubung" : "Sommergrün",
"Blütenfarbe" : "Gelb",
"Blütezeit" : "Februar",
"Breite" : 20,
"Duftend" : "Ja",
"Frosthärte" : "Ja",
"Fruchtschmuck" : "Nein",
"Herbstfärbung" : "Gelb",
"Höhe" : 20,
"Pflanzengruppe" : "Laubgehölze",
"Standort" : "Sonnig",
"Umfang" : 10
},
"00002" : {
"Belaubung" : "Sommergrün",
"Blütenfarbe" : "Gelb",
"Blütezeit" : "März",
"Breite" : 25,
"Duftend" : "Nein",
"Frosthärte" : "Ja",
"Fruchtschmuck" : "Nein",
"Herbstfärbung" : "Ja",
"Höhe" : 10,
"Pflanzengruppe" : "Nadelgehölze",
"Standort" : "Schatten",
"Umfang" : 10
},
"Eigenschaften" : {
"Belaubung" : {
"Sommergrün" : [ "00001", "00002" ],
"Wintergrün" : ["..."]
},
"Blütenfarbe" : {
"Braun": ["00002"],
"Blau" : [ "00001" ]
},
}
}
}
}

How to store nested document as String in elastic search

Context:
1) We are building a CDC pipeline (using kafka & connect framework)
2) We are using debezium for capturing mysql Tx logs
3) We are using Elastic Search connector to add documents to ES index
Sample change event generated by Debezium:
{
"source" : {
"before" : {
"Id" : 97,
"name" : "Northland",
"code" : "NTL",
"country_id" : 6,
"is_business_mapped" : 0
},
"after" : {
"Id" : 97,
"name" : "Northland",
"code" : "NTL",
"country_id" : 6,
"is_business_mapped" : 1
},
"source" : {
"version" : "0.7.5",
"name" : "__",
"server_id" : 252639387,
"ts_sec" : 1547805940,
"gtid" : null,
"file" : "mysql-bin-changelog.000570",
"pos" : 236,
"row" : 0,
"snapshot" : false,
"thread" : 614,
"db" : "bazaarify",
"table" : "state"
},
"op" : "u",
"ts_ms" : 1547805939683
}
What we want :
We want to visualize only 3 columns in kibana :
1) before - containing the nested JSON as string
2) after - containing the nested JSON as string
3) source - containing the nested JSON as string
I can think below possibilities here :
a) Either converting nested JSON as string
b) Combining column data in elastic search
I am a newbie to elastic search . Can someone please guide me how to do that.
I tried defining custom mapping as well but it is giving me exception.
You can always view your document as a Raw JSON in Kibana.
You don't need to manipulate it before indexing in elastic.
As this is related to visualization, handle this in Kibana only.
Check this link for a screenshot.
Refer this to add the columns which you want to see onto the results
I don't fully understand your use case, but if you would like to turn some json's to their representing strings, then you can use logstash for that, or even Elasticsearch ingest capabilities to convert an object (json) to a string.
From the link above, an example:
PUT _ingest/pipeline/my-pipeline-id { "description": "converts the
content of the id field to an integer", "processors" : [
{
"convert" : {
"field" : "source",
"type": "string"
}
} ] }

OrOperator in MongoTemplate not returning values?

I am working on Spring MVC with MongoDB.
I have added criteria for some fields with orOperator, Each Fields has value in database, but List returning is Empty.
criteriaQuery.addCriteria(Criteria.where("First_Name").is(docName.trim())
.orOperator(Criteria.where("Middle_Name").is(docName.trim())
.orOperator(Criteria.where("Last_Name").is(docName.trim()))));
Query printing in console is
Query: { "First_Name" : "ESTHER" , "$or" : [ { "Middle_Name" : "ESTHER" , "$or" : [ { "Last_Name" : "ESTHER"}]}]}, Fields: null, Sort: null
I tried the Like query, but that also results the same
criteriaQuery.addCriteria(Criteria.where("First_Name").regex(docName)
.orOperator(Criteria.where("Middle_Name").regex(docName))
.orOperator(Criteria.where("Last_Name").regex(docName)));
Mongo Query
db.doctor_details.find({ "$or" : {"First_Name" : { "$regex" : "ESTHER"} }, "$or" : [ { "Middle_Name" : { "$regex" : "ESTHER"} , "$or" : [ { "Last_Name" : { "$regex" : "ESTHER"}}]}]})
Help me to get the result, I dont know where my code goes wrong. Help is appreciated!
Thanks
As far as I remember in order to use orOperator you should do:
Query query = new Query(new Criteria().orOperator(criteriaV1,criteriaV2));
it is based on this answer

elastic search filter by documents count in nested document

I have this schema in elastic search.
79[
'ID' : '1233',
Geomtries:[{
'doc1' : 'F1',
'doc2' : 'F2'
},
(optional for some of the documents)
{
'doc2' : 'F1',
'doc3' : 'F2'
}]
]
the Geometries is a nested element.
I want to get all of the documents that have one object inside Geometries.
Tried so far :
"script" : {"script" : "if (Geomtries.size < 2) return true"}
But i get exceptions : no such property GEOMTRIES
If you have the field as type nested in the mapping, the typical doc[fieldkey].values.size() approached does not seem to work. I found the following script to work:
{
"from" : 0,
"size" : <SIZE>,
"query" : {
"filtered" : {
"filter" : {
"script" : {
"script" : "_source.containsKey('Geomtries') && _source['Geomtries'].size() == 1"
}
}
}
}
}
NB: You must use _source instead of doc.
The problem is in the way you access fields in your script, use:
doc['Geometry'].size()
or
_source.Geometry.size()
By the way for performance reasons, I would denormalize and add GeometryNumber field. You can use the transform mapping to compute size at index time.

Resources