Elasticsearch query and aggregation on indices

I have multiple Elasticsearch indices, and I would like to get the doc_count of documents that matched the query for each index (even when no documents matched in an index). I tried this (using elasticsearch.js):
{
  index: 'one,or,many,indices,here',
  query: {
    query_string: {
      query: 'hello',
    },
  },
  aggs: {
    group_by_index: {
      terms: {
        field: '_index',
        min_doc_count: 0,
      },
    },
  },
  from: 0,
  size: 20,
};
This only works if I specify all indices in the index key. However, I don't always want my search hits to include documents from every index.
So I came up with:
{
  index: 'all,indices,here',
  query: {
    bool: {
      must: [
        {
          query_string: {
            query: 'hello',
          },
        },
        {
          // the terms query expects an array of values
          terms: {
            _index: ['only', 'search', 'hits', 'indices', 'here'],
          },
        },
      ],
    },
  },
  aggs: {
    group_by_index: {
      terms: {
        field: '_index',
        min_doc_count: 0,
      },
    },
  },
  from: 0,
  size: 20,
};
But then I get doc_count: 0 for indices that do have matching documents, because those indices are excluded by the terms clause on _index in the bool query.
How can I get the doc_count for all indices but not get search hits from unwanted indices?

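You need to move your index constraint to post_filter. Aggregations are computed from the documents matched by query, and post_filter is applied to the search hits only afterwards, so group_by_index still counts every index while the hits come only from the indices you list: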
{
  index: 'all,indices,here',
  query: {
    bool: {
      must: [
        {
          query_string: {
            query: 'hello',
          },
        },
      ],
    },
  },
  aggs: {
    group_by_index: {
      terms: {
        field: '_index',
        min_doc_count: 0,
      },
    },
  },
  // applied after the aggregations, so it only narrows the hits
  post_filter: {
    terms: {
      _index: ['only', 'search', 'hits', 'indices', 'here'],
    },
  },
  from: 0,
  size: 20,
};
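A minimal sketch of the same idea as two helpers (buildIndexCountSearch and countsByIndex are hypothetical names, not part of elasticsearch.js). The first builds the request with the index lists as arrays; the second reads the group_by_index buckets from a response body and fills in 0 for any index without a bucket. It assumes the usual aggregations.group_by_index.buckets response shape, and it sets the terms size to the number of indices because the terms aggregation returns only 10 buckets by default.

```javascript
// Hypothetical helper: aggregations see every index in allIndices, while the
// search hits are narrowed to hitIndices afterwards by post_filter.
function buildIndexCountSearch(allIndices, hitIndices, queryText) {
  return {
    index: allIndices.join(','),
    query: { query_string: { query: queryText } },
    aggs: {
      group_by_index: {
        terms: {
          field: '_index',
          min_doc_count: 0,
          // Assumption: one bucket per index (the terms default is 10).
          size: allIndices.length,
        },
      },
    },
    // Runs after the aggregations are computed, so only the hits change.
    post_filter: { terms: { _index: hitIndices } },
    from: 0,
    size: 20,
  };
}

// Hypothetical helper: turns the group_by_index buckets of a response body
// into an { indexName: docCount } map, reporting 0 for indices with no bucket.
function countsByIndex(responseBody, allIndices) {
  const counts = Object.fromEntries(allIndices.map((name) => [name, 0]));
  for (const bucket of responseBody.aggregations.group_by_index.buckets) {
    counts[bucket.key] = bucket.doc_count;
  }
  return counts;
}

// Usage with a made-up response body:
const params = buildIndexCountSearch(['one', 'two', 'three'], ['one'], 'hello');
const counts = countsByIndex(
  { aggregations: { group_by_index: { buckets: [{ key: 'two', doc_count: 4 }] } } },
  ['one', 'two', 'three'],
);
console.log(counts); // { one: 0, two: 4, three: 0 }
```

Depending on the client version, the query, aggs and post_filter keys may need to go inside a body property when passed to client.search().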

Related

Distinct records with geo_distance sort on aggregation ES

I'm working on a nearby API using Elasticsearch.
I'm trying to do four things in one ES query:
1. match conditions (here a script keeps only records within the search radius)
2. get distinct records based on the company's key (one record per company)
3. sort the records by geo_distance
4. add a Distance field with the distance between the user and the location
Here is my code:
const query = {
  query: {
    bool: {
      must: [
        customQuery,
        {
          term: {
            "schedule.isShopOpen": true,
          },
        },
        {
          term: {
            isBranchAvailable: true,
          },
        },
        {
          term: {
            branchStatus: "active",
          },
        },
        {
          match: {
            shopStatus: "active",
          },
        },
        {
          script: {
            script: {
              params: {
                lat: parseFloat(req.lat),
                lon: parseFloat(req.lon),
              },
              source:
                "doc['location'].arcDistance(params.lat, params.lon) / 1000 <= doc['searchRadius'].value",
              lang: "painless",
            },
          },
        },
      ],
    },
  },
  aggs: {
    duplicateCount: {
      terms: {
        field: "companyKey",
        size: 10000,
      },
      aggs: {
        duplicateDocuments: {
          top_hits: {
            sort: [
              {
                _geo_distance: {
                  location: {
                    lat: parseFloat(req.lat),
                    lon: parseFloat(req.lon),
                  },
                  order: "asc",
                  unit: "km",
                  mode: "min",
                  distance_type: "arc",
                  ignore_unmapped: true,
                },
              },
            ],
            script_fields: {
              distance: {
                script: {
                  params: {
                    lat: parseFloat(req.lat),
                    lon: parseFloat(req.lon),
                  },
                  // "source" replaces the deprecated "inline" key
                  source: "doc['location'].arcDistance(params.lat, params.lon) / 1000",
                },
              },
            },
            stored_fields: ["_source"],
            size: 1,
          },
        },
      },
    },
  },
};
Here's the output:
data: [
  {
    companyKey: "1234",
    companyName: "Floward",
    branchKey: "3425234",
    branch: "Mursilat",
    distance: 1.810064121687324,
  },
  {
    companyKey: "0978",
    companyName: "Dkhoon",
    branchKey: "352345",
    branch: "Wahah blue branch ",
    distance: 0.08931851500047634,
  },
  {
    companyKey: "567675",
    companyName: "Abdulaziz test",
    branchKey: "53425",
    branch: "Jj",
    distance: 0.011447273197846672,
  },
  {
    companyKey: "56756",
    companyName: "Mouj",
    branchKey: "345345",
    branch: "King fahad",
    distance: 5.822936713752124,
  },
];
I have two issues:
1. How do I sort the records by geo_distance?
2. Do the query clauses (match, script) also apply to the aggregation data?
Can you please help me solve these issues?
This would be a more appropriate query for your use case:
{
  "query": {
    "bool": {
      "filter": [
        {
          "geo_distance": {
            "distance": "200km",
            "distance_type": "arc",
            "location": {
              "lat": 40,
              "lon": -70
            }
          }
        },
        {
          "match": {
            "shopStatus": "active"
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "companyKey"
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 40,
          "lon": -70
        },
        "order": "asc",
        "unit": "km",
        "mode": "min",
        "distance_type": "arc",
        "ignore_unmapped": true
      }
    }
  ],
  "_source": ["*"],
  "script_fields": {
    "distance_in_m": {
      "script": "doc['location'].arcDistance(40, -70)"
    }
  }
}
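- filter instead of must: since you are only filtering documents, filter is faster because, unlike must, it does not score them.
- collapse: "You can use the collapse parameter to collapse search results based on field values. The collapsing is done by selecting only the top sorted document per collapse key." Because the hits are sorted by distance, the document kept for each companyKey is its nearest branch.
- geo_distance query instead of a script, to find documents within the distance.
- script_fields to return the distance. arcDistance returns meters, so convert to the unit you need (for example, divide by 1000 for kilometers).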
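A sketch of the answer's request as a JavaScript builder, assuming location is mapped as a geo_point and companyKey as a keyword field (collapse needs a single-valued keyword or numeric field with doc values). buildNearbyQuery and maxDistanceKm are hypothetical names. Reusing one point object for the filter, the sort and the script field keeps the coordinates consistent; unlike the answer, the script field here returns kilometers.

```javascript
// Hypothetical builder for the collapse-based "nearby" request.
// maxDistanceKm is an assumed fixed radius, as in the answer's 200km.
function buildNearbyQuery(lat, lon, maxDistanceKm) {
  const point = { lat, lon };
  return {
    query: {
      bool: {
        // filter clauses are not scored
        filter: [
          {
            geo_distance: {
              distance: `${maxDistanceKm}km`,
              distance_type: 'arc',
              location: point,
            },
          },
          { match: { shopStatus: 'active' } },
        ],
      },
    },
    // One hit per company: the top-sorted (nearest) document per companyKey.
    collapse: { field: 'companyKey' },
    sort: [
      {
        _geo_distance: {
          location: point,
          order: 'asc',
          unit: 'km',
          mode: 'min',
          distance_type: 'arc',
          ignore_unmapped: true,
        },
      },
    ],
    _source: ['*'],
    script_fields: {
      distance_km: {
        script: {
          // arcDistance returns meters; divide by 1000 for kilometers.
          source: "doc['location'].arcDistance(params.lat, params.lon) / 1000",
          params: point,
        },
      },
    },
  };
}

const nearby = buildNearbyQuery(40, -70, 200);
```

In the question's code this would be called as buildNearbyQuery(parseFloat(req.lat), parseFloat(req.lon), 200). Each hit's sort array should also carry the computed distance in the requested unit, so the script field may be optional.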

Elasticsearch multi-match not returning all results when providing empty string

I have a total of 1783 records, and I want ES to return all of them when no multi_match query text is provided (searchObject.query = '').
I managed to do so by passing an empty array to query.bool.should, so in theory I could build the ES object below depending on the searchObject.query value, but I'm not sure that's a good idea.
{
  _source: [
    'id',
    'event',
    'description',
    'element',
    'date'
  ],
  track_total_hits: true,
  query: {
    bool: {
      should: [{
        multi_match: {
          query: searchObject.query,
          fields: ["element", "description", "nar.*", "title", "identifier"]
        }
      }],
      filter: []
    }
  },
  highlight: { fields: { '*': {} } },
  sort: [],
  from: 0,
  size: 10
}
Any suggestions?
You can append a match_all to the should:
{
  "_source": [
    "id",
    "event",
    "description",
    "element",
    "date"
  ],
  "track_total_hits": true,
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "",
            "fields": [
              "line",
              "element",
              "description",
              "nar.*",
              "title",
              "identifier"
            ]
          }
        },
        {
          "match_all": {}
        }
      ],
      "filter": []
    }
  },
  "highlight": {
    "fields": {
      "*": {}
    }
  },
  "sort": [],
  "from": 0,
  "size": 10
}
That's what match_all is usually for. IMHO, though, the empty string should be checked before you send the ES request; I'm assuming it comes from an autocomplete or similar.
This is controlled by the zero_terms_query parameter of the match query, which multi_match also supports. Just add "zero_terms_query": "all" to your multi_match block.
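Combining both suggestions, a sketch (buildEventSearch is a hypothetical helper) that sends match_all when the search text is blank and otherwise a multi_match with zero_terms_query set to all, so text that analyzes to no tokens (for example, only punctuation with the standard analyzer) still returns every document:

```javascript
// Hypothetical helper: checks for blank input before building the request,
// and lets zero_terms_query cover text that analyzes to zero tokens.
function buildEventSearch(searchText, from = 0, size = 10) {
  const text = (searchText ?? '').trim();
  const query =
    text === ''
      ? { match_all: {} }
      : {
          multi_match: {
            query: text,
            fields: ['element', 'description', 'nar.*', 'title', 'identifier'],
            zero_terms_query: 'all',
          },
        };
  return {
    _source: ['id', 'event', 'description', 'element', 'date'],
    track_total_hits: true,
    query,
    highlight: { fields: { '*': {} } },
    from,
    size,
  };
}
```

With this, buildEventSearch(searchObject.query) needs no separate branch in the calling code.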

ElasticSearch: Sorting a complex query

I have a complex query in Elasticsearch (below), and I need to sort by date_creation ascending within the "Activité" (activity) bucket. The query works, but the basic sort I have on date_creation does not. I'm looking for a way to sort the activities by date_creation in ascending order. I've seen some posts on nested queries here on Stack Overflow and in the Elasticsearch documentation, but they don't seem to address the complexity of my query.
I am using ElasticSearch 2.3.5 with Lucene 5.5.0.
var searchQuery = {
  index: "resultats_" + env,
  body: {
    size: 0,
    sort: [{ date_creation: { order: "asc", mode: "min" } }],
    query: {
      filtered: {
        query: {
          match_all: {}
        },
        filter: {
          query: {
            bool: {
              should: [{}],
              must: [
                {
                  term: {
                    player_id: {
                      value: params.player_id
                    }
                  }
                },
                {
                  term: {
                    classes: {
                      value: params.grade
                    }
                  }
                }
              ],
              must_not: [{}]
            }
          }
        }
      }
    },
    aggs: {
      Matière: {
        terms: {
          field: "id_matiere",
          size: 10
        },
        aggs: {
          "Titre matière": {
            top_hits: {
              _source: {
                include: ["titre_matiere"]
              },
              size: 1
            }
          },
          PP: {
            terms: {
              field: "id_point_pedago",
              size: 10
            },
            aggs: {
              "Titre PP": {
                top_hits: {
                  _source: {
                    include: ["titre_point_pedago"]
                  },
                  size: 1
                }
              },
              Compétence: {
                terms: {
                  field: "id_competence",
                  size: 10
                },
                aggs: {
                  "Titre compétence": {
                    top_hits: {
                      _source: {
                        include: ["titre_competence"]
                      },
                      size: 1
                    }
                  },
                  Activité: {
                    terms: {
                      field: "id_activite",
                      size: 10
                    },
                    aggs: {
                      "Titre activité": {
                        top_hits: {
                          _source: {
                            include: [
                              "titre_activite",
                              "nombre_perimetre_occurrence"
                            ]
                          },
                          size: 1
                        }
                      },
                      Trimestres: {
                        filters: {
                          filters: {
                            T1: {
                              range: {
                                date_creation: {
                                  gte: params.t1_start,
                                  lte: params.t1_end
                                }
                              }
                            },
                            T2: {
                              range: {
                                date_creation: {
                                  gte: params.t2_start,
                                  lte: params.t2_end
                                }
                              }
                            },
                            T3: {
                              range: {
                                date_creation: {
                                  gte: params.t3_start,
                                  lte: params.t3_end
                                }
                              }
                            }
                          }
                        },
                        aggs: {
                          Moyenne: {
                            avg: {
                              field: "resultat"
                            }
                          },
                          Occurrences: {
                            cardinality: {
                              field: "id_occurrence",
                              precision_threshold: 1000
                            }
                          },
                          Résultat: {
                            terms: {
                              field: "resultat",
                              size: 10,
                              min_doc_count: 0
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
};
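You can do it like this: move the sort into the top_hits of the Activité aggregation. The top-level sort only orders the search hits (and with size: 0 there are none), so it has no effect on the documents chosen by top_hits: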
Activité: {
  terms: {
    field: "id_activite",
    size: 10
  },
  aggs: {
    "Titre activité": {
      top_hits: {
        _source: {
          include: [
            "titre_activite",
            "nombre_perimetre_occurrence"
          ]
        },
        size: 1,
        // added: sort each bucket's top hit by date_creation, ascending
        sort: [{ date_creation: { order: "asc", mode: "min" } }]
      }
    },
    // Trimestres: { ... } stays exactly as in the question
  }
}
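A sketch of the Activité aggregation as a builder (buildActivityAgg and premiere_date are hypothetical names). It applies the answer's top_hits sort, which orders the documents inside each bucket. If the goal is instead to order the activity buckets themselves by date, one option not covered by the answer is a min sub-aggregation on date_creation used as the terms order key; the second argument switches that on.

```javascript
// Hypothetical builder for the "Activité" terms aggregation.
// subAggs carries the existing sub-aggregations (e.g. Trimestres).
function buildActivityAgg(subAggs = {}, orderBucketsByDate = false) {
  const agg = {
    terms: { field: 'id_activite', size: 10 },
    aggs: {
      'Titre activité': {
        top_hits: {
          _source: { include: ['titre_activite', 'nombre_perimetre_occurrence'] },
          // Orders the documents inside each bucket (the answer's fix).
          sort: [{ date_creation: { order: 'asc' } }],
          size: 1,
        },
      },
      // Earliest date_creation per bucket, usable as a bucket sort key.
      premiere_date: { min: { field: 'date_creation' } },
      ...subAggs,
    },
  };
  if (orderBucketsByDate) {
    // Orders the activity buckets themselves by their earliest document.
    agg.terms.order = { premiere_date: 'asc' };
  }
  return agg;
}
```

buildActivityAgg({ Trimestres: trimestresAgg }, true) would slot in where Activité sits in the question's query, with trimestresAgg standing for the existing Trimestres definition.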

Perform nested sort without inner_hits in ElasticSearch

I need some help querying records from Elasticsearch (1.7.3). We get a list of the evaluations performed and need to display only the last evaluation, as shown below:
evaluation: [
  {
    id: 2,
    breaches: null
  },
  {
    id: 6,
    breaches: null
  },
  {
    id: 7,
    breaches: null
  },
  {
    id: 15,
    breaches: null
  },
  {
    id: 18,
    breaches: [
      "rule_one",
      "rule_two",
      "rule_three"
    ]
  },
  {
    id: 19,
    breaches: [
      "rule_one",
      "rule_two",
      "rule_three"
    ]
  }
]
Now we need to query records based on the latest evaluation performed, that is, query only on the last object of the evaluation array. We found that inner_hits supports sorting and limiting the nested records, so we wrote a query that sorts by evaluation id in descending order and limits the size to 1, as shown below:
{
  "query": {
    "bool": {
      "must": {
        "nested": {
          "path": "evaluation",
          "query": {
            "bool": {
              "must": {
                "term": {
                  "evaluation.breaches": "rule_one"
                }
              }
            }
          },
          "inner_hits": {
            "sort": {
              "evaluation.id": {
                "order": "desc"
              }
            },
            "size": 1
          }
        }
      }
    }
  }
}
Please find the mapping below:
evaluation: {
  type: "nested",
  properties: {
    id: {
      type: "long"
    },
    breaches: {
      type: "string"
    }
  }
}
We tried sorting the records, but it did not work. Can you suggest other ways to search on just the last object of the nested records?
Thanks.
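inner_hits only controls which nested objects are returned with each hit; the nested query still matches a document if any of its evaluations matches. One common alternative, not from this thread, is to denormalize at index time: copy the evaluation with the highest id into its own non-nested field and query that field. A minimal sketch, where latest_evaluation is a hypothetical field name that would need its own mapping:

```javascript
// Hypothetical index-time step: attach the evaluation with the highest id
// as a separate latest_evaluation field (null when there are none).
function withLatestEvaluation(doc) {
  const evaluations = doc.evaluation ?? [];
  const latest = evaluations.reduce(
    (best, ev) => (best === null || ev.id > best.id ? ev : best),
    null,
  );
  return { ...doc, latest_evaluation: latest };
}

const doc = withLatestEvaluation({
  evaluation: [
    { id: 2, breaches: null },
    { id: 19, breaches: ['rule_one', 'rule_two', 'rule_three'] },
    { id: 18, breaches: ['rule_one', 'rule_two', 'rule_three'] },
  ],
});
console.log(doc.latest_evaluation.id); // 19
```

The search then becomes a plain term query on latest_evaluation.breaches (for example "rule_one"), with no nested query or inner_hits involved. The field has to be recomputed whenever a document's evaluations change.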

Inner hits on grandparents still not working

I have problems retrieving the inner_hits of my "grandparent" items.
Retrieving the parents from a child query works fine, but I can't get it to also return the documents one level further up.
Any ideas about this?
The known issue for this should be fixed by now (2.3), and the workarounds are written for nested objects, not parent/child hierarchy data, so I can't get them to work for me.
Code in Sense format:
PUT /test
{
  "mappings": {
    "grandparent": {},
    "parent": {
      "_parent": {
        "type": "grandparent"
      }
    },
    "child": {
      "_parent": {
        "type": "parent"
      }
    }
  }
}

PUT /test/grandparent/1
{
  "id": 1,
  "name": "grandparent"
}

PUT /test/parent/2?parent=1&routing=1
{
  "id": 2,
  "name": "parent",
  "parentid": 1
}

PUT /test/child/3?parent=2&routing=1
{
  "id": 3,
  "name": "child",
  "parentid": 2
}

POST /test/child/_search
{
  "query": {
    "has_parent": {
      "type": "parent",
      "query": {
        "has_parent": {
          "type": "grandparent",
          "query": {
            "match_all": {}
          },
          "inner_hits": {}
        }
      },
      "inner_hits": {}
    }
  }
}
Here is sample code for finding grandparents; it nests has_parent queries and declares inner_hits at each level:
// joined with commas so no newline or indentation ends up inside filter_path
const filterPath = [
  'hits.hits.inner_hits.activity.hits.hits.inner_hits.user.hits.hits._source*',
  'hits.hits.inner_hits.activity.hits.hits.inner_hits.user.hits.hits.inner_hits.fofo.hits.hits._source*',
].join(',');
const source = ['id', 'name', 'thumbnail'];
const { body } = await elasticWrapper.client.search({
  index: ElasticIndex.UserDataFactory,
  filter_path: filterPath,
  _source: source,
  body: {
    from,
    size,
    query: {
      bool: {
        must: [
          {
            match: {
              relation_type: ElasticRelationType.Like,
            },
          },
          {
            has_parent: {
              parent_type: ElasticRelationType.Post,
              query: {
                bool: {
                  must: [
                    {
                      term: {
                        id: {
                          value: req.params.id,
                        },
                      },
                    },
                    {
                      has_parent: {
                        parent_type: ElasticRelationType.User,
                        query: {
                          bool: {
                            must: [
                              {
                                exists: {
                                  field: 'id',
                                },
                              },
                            ],
                            should: [
                              {
                                has_child: {
                                  type: ElasticRelationType.Follower,
                                  query: {
                                    bool: {
                                      minimum_should_match: 1,
                                      should: [
                                        {
                                          match: {
                                            follower: req.currentUser?.id,
                                          },
                                        },
                                        {
                                          match: {
                                            following: req.currentUser?.id,
                                          },
                                        },
                                      ],
                                    },
                                  },
                                  inner_hits: {
                                    _source: ['follower', 'following', 'status'],
                                  },
                                },
                              },
                            ],
                          },
                        },
                        inner_hits: {
                          _source: ['id', 'name', 'thumbnail'],
                        },
                      },
                    },
                  ],
                },
              },
              inner_hits: {},
            },
          },
        ],
      },
    },
    sort: [
      {
        createdAt: {
          order: 'desc',
        },
      },
    ],
  },
});
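With the response trimmed by filter_path, the grandparent sources sit two inner_hits levels deep. A sketch of a helper (collectInnerSources is hypothetical) that follows a list of inner_hits names down from the top-level hits and returns the _source of the deepest level. The names depend on the query: has_parent and has_child inner_hits default to the relation or type name unless a name is set, so 'post' and 'user' below are placeholders.

```javascript
// Hypothetical helper: walks a chain of named inner_hits and returns the
// _source of every hit at the deepest level; missing levels yield [].
function collectInnerSources(hits, names) {
  if (names.length === 0) {
    return hits.map((hit) => hit._source);
  }
  const [name, ...rest] = names;
  return hits.flatMap((hit) => {
    const inner = hit.inner_hits?.[name]?.hits?.hits ?? [];
    return collectInnerSources(inner, rest);
  });
}

// Made-up hits in the shape of a has_parent -> has_parent response.
const sample = [
  {
    _source: { id: 'like-1' },
    inner_hits: {
      post: {
        hits: {
          hits: [
            {
              _source: { id: 'post-1' },
              inner_hits: {
                user: { hits: { hits: [{ _source: { id: 'user-1', name: 'A' } }] } },
              },
            },
          ],
        },
      },
    },
  },
];
console.log(collectInnerSources(sample, ['post', 'user'])); // [ { id: 'user-1', name: 'A' } ]
```

In the code above it would be called as collectInnerSources(body.hits.hits, [...]) with the actual inner_hits names from the response.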
