Aggregate results from several indices in Elasticsearch - elasticsearch

I have an Elasticsearch query as shown below.
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "Netherlands"
}
}
]
}
},
"sort": [
{
"file.created": {
"order": "asc"
}
}
]
}
When I query several indices and sort my results (in ascending or descending order) as shown below, the results are ordered only within each individual index: I get the trial2 results in order, then the trial3 results in order.
http://localhost:9200/trial2,trial3/_doc/_search?pretty
What I am looking for, since I am querying several indices and sorting by date, is to get the results from all the indices in a single ascending or descending order. If a document in trial3 is more recent than the one in trial2, it should appear higher regardless of the order of the indices in the query.
Kindly advise.

If you are working with multiple indices that share the same structure, it makes sense to create an alias that covers all of them. You can then run your queries against this virtual combined index. The sorted results are then in the correct order across indices, while each result document still references its original index.
POST /_aliases
{
"actions": [
{
"add": {
"index": "trial2",
"alias": "my-alias"
}
},
{
"add": {
"index": "trial3",
"alias": "my-alias"
}
}
]
}
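The alias request and the follow-up search can also be assembled programmatically. The Python sketch below is only an illustration: the helper names `build_alias_actions` and `build_sorted_search` are made up for this example, and it only builds the request bodies (to be sent to `POST /_aliases` and `GET /my-alias/_search` with whatever HTTP client you use).

```python
import json


def build_alias_actions(indices, alias):
    """Return a body for POST /_aliases that adds every index to one alias."""
    return {
        "actions": [
            {"add": {"index": index, "alias": alias}}
            for index in indices
        ]
    }


def build_sorted_search(text, order="asc"):
    """Return the search body from the question: match on content, sort by file.created."""
    return {
        "query": {"bool": {"must": [{"match": {"content": text}}]}},
        "sort": [{"file.created": {"order": order}}],
    }


# Hypothetical usage with the index and alias names from this thread.
alias_body = build_alias_actions(["trial2", "trial3"], "my-alias")
search_body = build_sorted_search("Netherlands", order="desc")

print(json.dumps(alias_body, indent=2))
print(json.dumps(search_body, indent=2))
```

Building the action list from a Python list keeps the alias definition in one place when more indices (trial4, trial5, ...) are added later.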

Related

Elasticsearch collapse not working with search_after with single sort field and PIT

I have an Elastic query that initially returns results. When I attempt the query again using search_after for paging, I am getting the error: Cannot use [collapse] in conjunction with [search_after] unless the search is sorted on the same field. Multiple sort fields are not allowed. So far as I can tell, I am sorting and collapsing using just a single field per_id. Is my query structured incorrectly or is there something else I need to do to get this query to run?
GET /_search
{
"query": {
"bool": {
"must": [{
"term": {
"pform": "iphone"
}
}]
}
},
"collapse": {
"field": "per_id"
},
"pit": {
"id": "g-ABCDDEFG12345678ABCDDEFG12345678==",
"keep_alive": "5m"
},
"sort": [
{"per_id": "asc"}
],
"search_after" : [
"ABCDDEFG12345678",
123456
]
}
I needed to exclude the tiebreaker from my search_after. It shouldn't cause duplicates because I am using a PIT and sorting on the collapse field, meaning duplicates shouldn't exist in my result set.
"search_after" : [
"ABCDDEFG12345678"
]
So I needed to remove the tiebreaker returned with the previous result before passing it into the next request.
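In client code, that step amounts to truncating the `sort` array of the last hit to the number of sort fields you requested yourself. A minimal Python sketch, assuming the hit is a plain dict as parsed from the JSON response (the helper names `next_search_after` and `next_page_body` are hypothetical):

```python
def next_search_after(last_hit, sort_field_count=1):
    """Keep only the sort values for the explicitly requested sort fields.

    Any extra trailing value (the tiebreaker) is dropped.
    """
    return last_hit["sort"][:sort_field_count]


def next_page_body(previous_body, last_hit):
    """Copy the previous request body and set search_after for the next page."""
    body = dict(previous_body)  # shallow copy, the original body is not modified
    body["search_after"] = next_search_after(last_hit)
    return body


# Example values taken from the question above.
last_hit = {"_id": "1", "sort": ["ABCDDEFG12345678", 123456]}
first_page = {
    "collapse": {"field": "per_id"},
    "sort": [{"per_id": "asc"}],
}

second_page = next_page_body(first_page, last_hit)
print(second_page["search_after"])
```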

Elastic search multi index query

I am building an app where I need to match users based on several parameters. I have two elastic search indexes, one with the user's likes and dislikes, one with some metadata about the user.
/user_profile/abc12345
{
"userId": "abc12345",
"likes": ["chocolate", "vanilla", "strawberry"]
}
/user_metadata/abc12345
{
"userId": "abc12345",
"seenBy": ["aaa123","bbb123", "ccc123"] // Potentially hundreds of thousands of userIds
}
I was advised to make these separate indexes and cross reference them, but how do I do that? For example I want to search for a user who likes chocolate and has NOT been seen by user abc123. How do I write this query?
If this is a frequent query in your use case, I would recommend merging the indices (always design your indices based on your queries).
Anyhow, a possible workaround for your current scenario is to exploit the fact that both indices store the user identifier in a field with the same name (userId). Then, you can (1) issue a boolean query over both indices, to match documents from one index based on the likes field, and documents from the other index based on the seenBy field, (2) use the terms bucket aggregation to get the list of unique userIds that satisfy your conditions.
For example
GET user_*/_search
{
"size": 0,
"query": {
"bool": {
"should": [
{
"match": {
"likes": "chocolate"
}
},
{
"match": {
"seenBy": "abc123"
}
}
]
}
},
"aggs": {
"by_userId": {
"terms": {
"field": "userId.keyword",
"size": 100
}
}
}
}
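Note that the question asks for users who have NOT been seen by abc123, so the two kinds of matches still have to be told apart. One way to do that, which is an assumption of this sketch and not part of the answer above, is to request the hits themselves (a non-zero `size`, paging through all results) and take a set difference on the client using each hit's `_index`. The function name and sample data below are hypothetical:

```python
def users_liking_but_not_seen(hits,
                              profile_index="user_profile",
                              metadata_index="user_metadata"):
    """Return userIds matched in the profile index but not in the metadata index."""
    liked = set()  # users whose profile matched the likes clause
    seen = set()   # users whose metadata matched the seenBy clause
    for hit in hits:
        user_id = hit["_source"]["userId"]
        if hit["_index"] == profile_index:
            liked.add(user_id)
        elif hit["_index"] == metadata_index:
            seen.add(user_id)
    return sorted(liked - seen)


# Hand-written sample hits in the shape of response["hits"]["hits"].
sample_hits = [
    {"_index": "user_profile", "_source": {"userId": "u1", "likes": ["chocolate"]}},
    {"_index": "user_profile", "_source": {"userId": "u2", "likes": ["chocolate", "vanilla"]}},
    {"_index": "user_metadata", "_source": {"userId": "u2", "seenBy": ["abc123"]}},
    {"_index": "user_metadata", "_source": {"userId": "u3", "seenBy": ["abc123"]}},
]

print(users_liking_but_not_seen(sample_hits))  # ['u1']
```

With a seenBy list of hundreds of thousands of entries this client-side step can get expensive, which is another argument for merging the indices as suggested above.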

Elasticsearch group sorted combined query

I am faced with the following issue. I have a few sorted queries against a specific group of records in my index, similar to the one below, where the term1 matching values vary per query, while for term2 they remain static for all queries.
{
"query": {
"bool": {
"must": {
"terms": {
"term1": [ "val1", "val2" ]
}
},
"must_not": {
"terms": {
"term2": [ "val3", "val4" ]
}
}
}
},
"sort": [
{ "sort_term": "desc" }
],
"from": 0,
"size": 10
}
Right now, I'm performing all these queries separately and then combining and shuffling their results in code, which, as you can probably tell, is not ideal. I was wondering if there's a way to combine these queries in Elasticsearch while maintaining the group-based sorting.
The reason I want to maintain the sorting order of each individual query is because the sorting values are not uniform and I don't want results from different groups to be buried down the result set.
The only solution I could think of would be to somehow re-process all records and compute a relative sorting value based on the sorting values of all the records in a given group, but these values change very regularly and the index has a lot of records, so that would probably be overkill.
Any ideas would be greatly appreciated!
You can list multiple fields in the sort array. If you combine the queries, I would sort by _type first, which keeps the groups from being mixed together in the results. The sort section of your query would look like this:
"sort": [
{ "_type": "desc" },
{ "sort_term_query1": "desc" },
{ "sort_term_query2": "desc" }
]

Can _score from different queries be compared?

In my application, I issue multiple queries, each to a different index. Then I merge the results from these queries and sort them by the _score attribute, in order to rank them according to their relevance. But I wonder if this makes sense at all, since the results came from different queries.
I guess my question is: can _scores from different queries be compared?
Instead of issuing multiple queries, it would be a good idea to combine them into a single query.
You can use the indices query to apply index-specific clauses.
Something like:
{
"bool": {
"should": [
{
"indices": {
"indices": [
"index1"
],
"query": {
"term": {
"tag": "wow"
}
}
}
},
{
"indices": {
"indices": [
"index2"
],
"query": {
"term": {
"name": "laptop"
}
}
}
}
]
}
}
Once this is done, the results will be sorted by _score.
Hope that helps.
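The indices query was deprecated in Elasticsearch 5.0 and, as far as I know, is not available in later major versions. On newer clusters the same per-index restriction can be expressed with a term filter on the `_index` metadata field. A Python sketch that builds such a request body (the helper name `per_index_clause` is made up for this example):

```python
import json


def per_index_clause(index_name, query):
    """Restrict `query` to documents from one index via the _index metadata field."""
    return {
        "bool": {
            "filter": [{"term": {"_index": index_name}}],  # filter does not affect _score
            "must": [query],                               # the scoring part
        }
    }


# Same two clauses as in the answer above.
combined = {
    "query": {
        "bool": {
            "should": [
                per_index_clause("index1", {"term": {"tag": "wow"}}),
                per_index_clause("index2", {"term": {"name": "laptop"}}),
            ]
        }
    }
}

print(json.dumps(combined, indent=2))
```

The body would be sent to a search over both indices, for example `GET index1,index2/_search`.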

Is there a way to have elasticsearch return a hit per generated bucket during an aggregation?

Right now I have a query like this:
{
"query": {
"bool": {
"must": [
{
"match": {
"uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx"
}
},
{
"range": {
"date": {
"from": "now-12h",
"to": "now"
}
}
}
]
}
},
"aggs": {
"query": {
"terms": [
{
"field": "query",
"size": 3
}
]
}
}
}
The aggregation works perfectly well, but I can't find a way to control the hit data that is returned. I can use the size parameter at the top of the DSL, but the hits are not returned in the same order as the buckets, so the bucket results do not line up with the hit results. Is there any way to correct this, or do I have to issue two separate queries?
To expand on Filipe's answer, it seems like the top_hits aggregation is what you are looking for, e.g.
{
"query": {
... snip ...
},
"aggs": {
"query": {
"terms": {
"field": "query",
"size": 3
},
"aggs": {
"top": {
"top_hits": {
"size": 42
}
}
}
}
}
}
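With this request, the hits for each bucket arrive nested inside the bucket itself, so they can be read out together. A small Python sketch of that step, assuming the response has been parsed into a dict and the aggregations are named `query` and `top` as above (the function name and the sample response are illustrative only):

```python
def hits_per_bucket(response, terms_agg="query", top_hits_agg="top"):
    """Return (key, doc_count, sources) for each terms bucket, in bucket order."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    return [
        (
            bucket["key"],
            bucket["doc_count"],
            [hit["_source"] for hit in bucket[top_hits_agg]["hits"]["hits"]],
        )
        for bucket in buckets
    ]


# Hand-written sample in the shape of a terms + top_hits response.
sample_response = {
    "aggregations": {
        "query": {
            "buckets": [
                {
                    "key": "a",
                    "doc_count": 2,
                    "top": {"hits": {"hits": [
                        {"_source": {"query": "a", "date": "2014-07-01"}},
                        {"_source": {"query": "a", "date": "2014-07-02"}},
                    ]}},
                },
                {
                    "key": "b",
                    "doc_count": 1,
                    "top": {"hits": {"hits": [
                        {"_source": {"query": "b", "date": "2014-07-03"}},
                    ]}},
                },
            ]
        }
    }
}

for key, count, sources in hits_per_bucket(sample_response):
    print(key, count, sources)
```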
Your query uses exact matches (match and range) and binary logic (must, bool) and thus should probably be converted to use filters instead:
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx"
}
},
{
"range": {
"date": {
"from": "now-12h",
"to": "now"
}
}
}
]
}
}
}
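The filtered query shown here belongs to the 1.x era of this answer; it was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions the usual equivalent is a bool query with a filter clause. A Python sketch of that request body, using `gte`/`lte` for the range bounds and keeping the placeholder uuid from the question:

```python
import json

# Filter-only query: both clauses run in filter context, so no scoring is computed.
filter_query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx"}},
                {"range": {"date": {"gte": "now-12h", "lte": "now"}}},
            ]
        }
    }
}

print(json.dumps(filter_query, indent=2))
```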
As for the aggregations,
The hits that are returned do not represent all the buckets that were returned. So if I have buckets for terms 'a', 'b', and 'c', I want to have hits that represent those buckets as well.
Perhaps you are looking to control the scope of the buckets? You can make an aggregation bucket global so that it will not be influenced by the query or filter.
Keep in mind that Elasticsearch will not "group" hits in any way -- it is always a flat list ordered according to score and additional sorting options.
Aggregations can be organized in a nested structure and return computed or extracted values, in a specific order. In the case of terms aggregation, it is in descending count (highest number of hits first). The hits section of the response is never influenced by your choice of aggregations. Similarly, you cannot find hits in the aggregation sections.
If your goal is to group documents by a certain field, yes, you will need to run multiple queries in the current Elasticsearch release.
I'm not 100% sure, but I think there's no way to do that in the current version of Elasticsearch (1.2.x). The good news is that there will be when version 1.3.x gets released:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
