Find all documents in MongoDB where an array fulfills a compound criterion - Spring

I want to find all documents in a collection where the deliveryPrograms array fulfills a given criterion, e.g. contains (01A or 07A) and 02A ...
My data structure looks something like this:
_id: (ObjectId...)
dealerId: "1"
name: "Test 1"
programs: Array
0: "01A"
1: "07A"
2: "02A"
3: "99"
4: "01B"
_id: (ObjectId...)
dealerId: "2"
name: "Test 2"
programs: Array
0: "07A"
1: "02A"
2: "03C"
3: "99"
4: "01B"
The criteria are flexible: there can be a simple filter on a single program such as 01A, or a compound filter such as (01A or 07A) and 02A, etc.
Is this possible with org.springframework.data.mongodb.core.query.Criteria?
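Not an authoritative answer, just a minimal sketch of what such a filter could look like as a plain MongoDB filter document, written in Python so it can be tried without a database. It assumes the array field is called programs, as in the sample data. The helper matches and the names compound_filter and simple_filter are hypothetical and only mimic the $and, $or and $in operators in memory. In Spring Data the same shape should be expressible with Criteria.where("programs").in(...) and Criteria.where("programs").is(...) combined through andOperator(...), though that is untested here.

```python
# Filter document for "(01A or 07A) and 02A" on the programs array.
# $in matches when the array holds any of the listed values; a plain
# value matches when the array contains it.
compound_filter = {
    "$and": [
        {"programs": {"$in": ["01A", "07A"]}},
        {"programs": "02A"},
    ]
}
simple_filter = {"programs": "01A"}


def matches(doc, flt):
    """Hypothetical in-memory check mimicking the operators used above."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in cond):
                return False
        else:
            values = doc.get(key, [])
            if isinstance(cond, dict):
                # Only $in is modelled in this sketch.
                if not any(v in cond["$in"] for v in values):
                    return False
            elif cond not in values:
                return False
    return True


# Sample documents taken from the question.
docs = [
    {"dealerId": "1", "programs": ["01A", "07A", "02A", "99", "01B"]},
    {"dealerId": "2", "programs": ["07A", "02A", "03C", "99", "01B"]},
]

compound_hits = [d["dealerId"] for d in docs if matches(d, compound_filter)]
simple_hits = [d["dealerId"] for d in docs if matches(d, simple_filter)]
print(compound_hits, simple_hits)
```

Because the filter is just nested data, a flexible criterion can be assembled at runtime by nesting further $and / $or lists.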

Related

Adding fuzziness to an ElasticSearch prefix query

I have two documents:
{id: 1, name: "james"}
{id: 2, name: "james kennedy"}
I am using the match_bool_prefix API for autocomplete, and I would like to be able to match the document with id: 1 even if I incorrectly spell james.
Query: jamis
Desired output: finding document with id: 1.

elasticsearch_dsl response multiple bucket aggregations

I found this thread on how to frame nested aggregations using elasticsearch_dsl: Generate multiple buckets in aggregation
Can someone show how to iterate through the response to get the second bucket results?
for i in s.aggregations.clients.buckets.num_servers.buckets:
does not work. How else can I get to the content in num_servers or server_list?
You need two loops if you want to loop through a second-level aggregation. Here is an example, assuming 'label' and 'number' fields in your index:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, A
client = Elasticsearch()
# Build a two-level aggregation
my_agg = A('terms', field='label')
my_agg.bucket('number', A('terms', field='number'))
# Build and submit the query
s = Search(using=client, index="stackoverflow")
s.aggs.bucket('label', my_agg)
response = s.execute()
# Loop through the first level of the aggregation
for label_bucket in response.aggregations.label.buckets:
    print("Label: {}, {}".format(label_bucket.key, label_bucket.doc_count))
    # Loop through the second level of the aggregation
    for number_bucket in label_bucket.number.buckets:
        print(" Number: {}, {}".format(number_bucket.key, number_bucket.doc_count))
Which would print something like this:
Label: A, 3
 Number: 2, 2
 Number: 1, 1
Label: B, 3
 Number: 3, 2
 Number: 1, 1
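The same two-loop traversal can be tried without a cluster on a plain dict shaped like the aggregations part of a search response. This is only a sketch: the sample data is made up to mirror the output above, and the names raw_response and lines are hypothetical.

```python
# Hand-written stand-in for the "aggregations" section of a search response.
raw_response = {
    "aggregations": {
        "label": {
            "buckets": [
                {"key": "A", "doc_count": 3, "number": {"buckets": [
                    {"key": 2, "doc_count": 2},
                    {"key": 1, "doc_count": 1},
                ]}},
                {"key": "B", "doc_count": 3, "number": {"buckets": [
                    {"key": 3, "doc_count": 2},
                    {"key": 1, "doc_count": 1},
                ]}},
            ]
        }
    }
}

lines = []
# Outer loop: first-level buckets. The sub-aggregation lives inside each
# bucket, not on the list of buckets itself.
for label_bucket in raw_response["aggregations"]["label"]["buckets"]:
    lines.append("Label: {}, {}".format(label_bucket["key"], label_bucket["doc_count"]))
    # Inner loop: second-level buckets of this particular label bucket.
    for number_bucket in label_bucket["number"]["buckets"]:
        lines.append(" Number: {}, {}".format(number_bucket["key"], number_bucket["doc_count"]))

print("\n".join(lines))
```

This also shows why clients.buckets.num_servers fails in the question: num_servers would be a key of each individual bucket, so it has to be read inside the outer loop.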

Find docs whose id field does not exist in a given document's nested field of ids

I have an int id field and a nested int array of ids on my docs. I want to find docs whose id field does not appear in a given document's id list.
Example:
[{
  my_id: 1,
  other_ids: [2,3]
},{
  my_id: 2,
  other_ids: [1]
},{
  my_id: 3,
  other_ids: [2]
}]
ty.

Do you get the same performance using index prefixes?

Say I have a collection containing documents like the one below:
{
  _id: ObjectId(),
  myValue: 123,
  otherValue: 456
}
I then create an index like the one below:
{myValue: 1, otherValue: 1}
If I execute the following query:
db.myCollection.find({myValue: 123})
will I get the same performance with my index as I would if I had an index on only the myValue field? Or is the performance degraded somehow since it is using an index prefix?
A "compound index", which is the correct term for your "link", does not create any more performance problems on "read" than an index on just the single field used in the query (writing new entries obviously involves more information), with one exception.
If you use a "multikey" index, which means an "array" field is part of the index, then you effectively create n index entries per document, one for each array element. As in:
{ "a": 1, "b": [ 1, 2, 3 ] }
An index on { "a": 1, "b": 1 } means this in basic terms:
{ "a": 1, "b": 1 },
{ "a": 1, "b": 2 },
{ "a": 1, "b": 3 }
So basically one index entry per array element to be scanned.
Otherwise the extra field does not affect read performance, apart from the "obvious" need to load into memory an index structure that holds more data per entry than what you "need to use".
So if you don't need it then don't use it. Creating "two" indexes (one compound, one on the single field) might save you memory, but it will "cost" you in write performance and storage space in general.
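To make the "one index entry per array element" point concrete, here is a toy model in Python. It is only an illustration under stated assumptions, not how MongoDB stores its indexes internally; the function index_entries is hypothetical.

```python
from itertools import product


def index_entries(doc, fields):
    """Toy model: expand a document into the entries a compound index
    on `fields` would hold, one per array element (scalars count once)."""
    columns = [doc[f] if isinstance(doc[f], list) else [doc[f]] for f in fields]
    return [dict(zip(fields, combo)) for combo in product(*columns)]


# Document with an array field: three entries, as listed in the answer.
multikey = index_entries({"a": 1, "b": [1, 2, 3]}, ["a", "b"])

# Document from the question with only scalar fields: a single entry.
scalar = index_entries({"myValue": 123, "otherValue": 456}, ["myValue", "otherValue"])

print(len(multikey), len(scalar))
```

With only scalar fields the compound index holds the same number of entries as a single-field index would, which is why a query on the prefix alone behaves much the same.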

Aggregate Terms Usage Count

I'm trying to work out a way of finding the most popular terms and their usage in ElasticSearch. The Terms Aggregation is very close but returns the count of documents that the term appeared in, rather than how many times the term appeared.
For example, imagine an appropriate index has been created to index these example documents:
{ text: 'one two two' }
{ text: 'two three' }
Then executing the following search:
{
  aggregations: {
    popular_terms: {
      terms: {
        field: 'text'
      }
    }
  }
}
Will return:
... {
  buckets: [
    { key: 'two', value: 2 },
    { key: 'one', value: 1 },
    { key: 'three', value: 1 }
  ]
}
Is it possible to search with an aggregation counting instances of the terms in a similar way? So in this example returning 3 for the value 'two' as it appears twice in the first document?
An aggregation counts the number of documents that match a criterion (e.g. terms), so it won't return what you are expecting.
For your use case you can probably use term vectors.
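The difference between the two kinds of count can be sketched in plain Python on the two example documents. This assumes simple whitespace tokenization, which a real analyzer may not match; the names doc_counts and term_counts are hypothetical.

```python
from collections import Counter

texts = ["one two two", "two three"]

# What a terms aggregation reports: number of documents containing the term.
doc_counts = Counter()
# What the question asks for: total number of occurrences of the term.
term_counts = Counter()

for text in texts:
    tokens = text.split()
    doc_counts.update(set(tokens))   # each term counted once per document
    term_counts.update(tokens)       # every occurrence counted

print(doc_counts["two"], term_counts["two"])
```

Per-document term frequencies of this kind are the information term vectors expose, so summing them across documents would give the occurrence totals.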
