Terms aggregation across two fields in Elasticsearch

I'm not sure what I want to do is possible. I have data that looks like this:
{
"Actor1Name": "PERSON",
"Actor2Name": "OTHERPERSON"
}
I use copy_to in order to populate a secondary field, ActorNames, with both values.
I am trying to build a typeahead feature: a user starts typing a name and the box is populated with the top hits for that prefix. I want it to search across both actor fields. The problem is that when I search across ActorNames, I get both values even if only one of them matches. For example, if I search for the prefix O, the document above gives me both OTHERPERSON (desired) and PERSON (undesired) in my results.
My current solution is to run 2 aggregations and combine them client side, but is it possible to do this purely in ES?
Current query:
{
"query": {
"prefix": {
"ActorNames": "O"
}
},
"aggs": {
"actor1": {
"filter": {
"prefix": {
"Actor1Name": "O"
}
},
"aggs": {
"actor1": {
"terms": {
"field": "Actor1Name"
}
}
}
},
"actor2": {
"filter": {
"prefix": {
"Actor2Name": "O"
}
},
"aggs": {
"actor2": {
"terms": {
"field": "Actor2Name"
}
}
}
}
}
}
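For reference, the client-side combination mentioned above takes only a few lines. This is a minimal Python sketch, assuming the response has already been parsed into a dict and that the aggregations are named as in the query (a filter aggregation actor1 wrapping a terms aggregation actor1, and likewise for actor2). The bucket values in the sample are made up.

```python
from collections import Counter


def merge_actor_buckets(response, top_n=10):
    """Merge the actor1/actor2 terms buckets into one ranked typeahead list."""
    aggs = response["aggregations"]
    totals = Counter()
    # Each filter agg ("actor1"/"actor2") wraps a terms agg with the same name.
    for name in ("actor1", "actor2"):
        for bucket in aggs[name][name]["buckets"]:
            totals[bucket["key"]] += bucket["doc_count"]
    # Highest combined doc_count first.
    return totals.most_common(top_n)


# Hypothetical response fragment shaped like the query above.
sample = {
    "aggregations": {
        "actor1": {"doc_count": 3, "actor1": {"buckets": [
            {"key": "OTHERPERSON", "doc_count": 2},
            {"key": "OLIVER", "doc_count": 1},
        ]}},
        "actor2": {"doc_count": 3, "actor2": {"buckets": [
            {"key": "OSCAR", "doc_count": 2},
            {"key": "OTHERPERSON", "doc_count": 1},
        ]}},
    }
}

print(merge_actor_buckets(sample))
# [('OTHERPERSON', 3), ('OSCAR', 2), ('OLIVER', 1)]
```

Two caveats apply. Each terms aggregation returns only its own top 10 buckets by default, so a name just outside one list can be under-counted in the merged ranking. Also, a document whose Actor1Name and Actor2Name are the same matching name is counted twice.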

If you want to check the prefix condition on both fields, why not AND the two prefix queries together with a bool must? Like this:
GET /my_index/_search
{
"query": {
"bool": {
"must": [
{
"prefix": {
"Actor1Name": "O"
}
},
{
"prefix": {
"Actor2Name": "O"
}
}
]
}
}
}
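A quick way to check what this bool/must query returns is to mirror its prefix clauses in Python. The sketch assumes the actor fields are keyword (not analyzed) fields, so the prefix is compared case-sensitively against the whole value. The helper and its mode switch are hypothetical, for illustration only. With must, a document matches only when both fields start with the prefix; with should, one match is enough.

```python
def matches_prefixes(doc, fields, prefix, mode="must"):
    """Mirror a bool query made of one prefix clause per field (keyword fields)."""
    # A prefix clause never matches a document that lacks the field.
    hits = [f in doc and doc[f].startswith(prefix) for f in fields]
    if mode == "must":
        return all(hits)   # every clause has to match
    if mode == "should":
        return any(hits)   # at least one clause has to match
    raise ValueError("mode must be 'must' or 'should'")


doc = {"Actor1Name": "PERSON", "Actor2Name": "OTHERPERSON"}
fields = ["Actor1Name", "Actor2Name"]
print(matches_prefixes(doc, fields, "O", "must"))    # False
print(matches_prefixes(doc, fields, "O", "should"))  # True
```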

Related

How to search for an array of terms, in elasticsearch?

Context: I have a query that searches for a term in two fields, and the results should be the items that match the pattern given in the wildcard. But eventually I'll get a list of search terms...
I use this query when I get only one string:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"should": [
{
"wildcard": {
"shortName": "BAN*"
}
},
{
"wildcard": {
"name": "BAN*"
}
}
]
}
},
{
"range": {
"dhCot": {
"gte": "2022-04-11T00:00:00.000Z",
"lt": "2022-04-12T00:00:00.000Z"
}
}
}
]
}
},
"aggs": {
"articles_over_time": {
"date_histogram": {
"field": "dtBuy",
"interval": "1H",
"format": "yyyy-MM-dd:HH:mm:ssZ"
},
"aggs": {
"documents": {
"top_hits": {
"size": 100
}
}
}
}
}
}
But sometimes I will get an array of strings instead, like this: ["BANANA","APPLE","ORANGE"]
So, how do I search for items that exactly match the items within the array? Is it possible?
The document indexed in Elasticsearch looks like this:
{
"name": "BANANA",
"priceDay": 1,
"priceWeek": 3,
"variation": 2,
"dataBuy":"2022-04-11T11:01:00.585Z",
"shortName": "BAN"
}
If you want to search for items that exactly match the items within the array, you can use the terms query:
{
"query": {
"terms": {
"name": ["BANANA","APPLE","ORANGE"]
}
}
}
You can include the terms query in your existing query, in either the should clause or the must clause, depending on your use case.
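Here is a minimal Python sketch of that combination, which builds the request body as a dict (for instance, to send with any HTTP client). In this sketch the terms query goes into the existing filter array, which, like must, requires a match but skips scoring. One assumption to check: with dynamic mapping, a string field such as name is indexed as analyzed text with a name.keyword sub-field. A terms query against the text field compares against the lowercased tokens, so "BANANA" would not match. For that reason the sketch takes the field name as a parameter and passes name.keyword. The wildcard and date_histogram parts of the original query are left out for brevity.

```python
import json


def build_exact_match_query(names, start, end, field="name"):
    """Bool query: exact match on any of `names`, plus the original date range."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # terms matches a document if `field` equals ANY of the values.
                    {"terms": {field: list(names)}},
                    # Same range filter as in the question.
                    {"range": {"dhCot": {"gte": start, "lt": end}}},
                ]
            }
        }
    }


body = build_exact_match_query(
    ["BANANA", "APPLE", "ORANGE"],
    "2022-04-11T00:00:00.000Z",
    "2022-04-12T00:00:00.000Z",
    field="name.keyword",  # assumption: default dynamic mapping with a keyword sub-field
)
print(json.dumps(body, indent=2))
```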

Elasticsearch return unique string from array field after a given filter

How can I get all the ids that start with a given prefix from the Elasticsearch records, without duplicates?
Records
PUT items/_doc/1
{ "ids" : [ "apple_A", "orange_B" ] }
PUT items/_doc/2
{ "ids" : [ "apple_A", "apple_B" ] }
PUT items/_doc/3
{ "ids" : [ "apple_C", "banana_A" ] }
What I need is to find all the unique ids for a given prefix. For example, if the input is apple, the output should be ["apple_A", "apple_B", "apple_C"].
So far I have tried a terms aggregation. With the following query I was able to filter down to the documents that have ids with the given prefix, but the aggregation returns all the ids of those documents.
{
"aggregations": {
"filterIds": {
"filter": {
"bool": {
"filter": [
{
"prefix": {
"ids.keyword": {
"value": "apple"
}
}
}
]
}
},
"aggregations": {
"uniqueIds": {
"terms": {
"field": "ids.keyword"
}
}
}
}
}
}
It returns the aggregation list ["apple_A", "orange_B", "apple_B", "apple_C", "banana_A"] for the prefix input apple. In other words, it returns every id of every document that matches the filter.
Is there a way to get only the ids in the array that match the prefix, and not all the ids in the document's array?
You can limit the returned values using the include parameter:
POST items/_search
{
"size": 0,
"aggregations": {
"filterIds": {
"filter": {
"bool": {
"filter": [
{
"prefix": {
"ids.keyword": {
"value": "apple"
}
}
}
]
}
},
"aggregations": {
"uniqueIds": {
"terms": {
"field": "ids.keyword",
"include": "apple.*"
}
}
}
}
}
}
Also check this other thread, which deals with using a regex within include; it's very similar to your use case.
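To make the effect of include concrete, here is a small Python simulation of the aggregation on the three example documents. It assumes, as Elasticsearch does for include regexes, that the pattern must match the whole term, so re.fullmatch stands in for it. Lucene's regex syntax is not identical to Python's in general, but apple.* means the same in both. The prefix is escaped so that characters such as . are taken literally. In the real query you would need to escape Lucene regex characters in user input yourself.

```python
import re
from collections import Counter

# The three example documents from the question.
DOCS = [
    {"ids": ["apple_A", "orange_B"]},
    {"ids": ["apple_A", "apple_B"]},
    {"ids": ["apple_C", "banana_A"]},
]


def simulate_filtered_terms(docs, prefix):
    """Mimic filter(prefix) -> terms(include=prefix.*) over in-memory docs."""
    include = re.compile(re.escape(prefix) + ".*")
    counts = Counter()
    for doc in docs:
        ids = set(doc["ids"])  # a document counts once per distinct id
        # Step 1: the filter agg keeps docs with at least one id starting with prefix.
        if not any(i.startswith(prefix) for i in ids):
            continue
        # Step 2: without include, every id of a kept doc would become a bucket
        # (orange_B, banana_A, ...); include keeps only full matches of the pattern.
        for i in ids:
            if include.fullmatch(i):
                counts[i] += 1
    # doc_count descending, ties broken by key.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(simulate_filtered_terms(DOCS, "apple"))
# [('apple_A', 2), ('apple_B', 1), ('apple_C', 1)]
```

In this simulation, dropping step 1 yields the same buckets, because include alone already restricts the keys. The filter mainly narrows which documents are scanned.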

How do I refer to multiple nesting levels in an Elastic Search's Filter Aggregation?

Let's call my root level foo and my child level events. I want to aggregate on the events level, but with a filter that EITHER the event has color "orange" OR the parent foo has customer_id 35.
So I want a filter aggregation inside a nested aggregation. In this filter's query clause, one child refers to a field on foo and the other refers to a field on events. However, the first child has no way to reference the parent from inside the nested context. I can't use a reverse_nested aggregation, because it can't be placed as a child of a compound query, and I can't filter before nesting, because I'd lose the OR semantics that way. How do I reference the field on foo?
Concrete example if it helps. Mapping:
{
"foo": {
"properties": {
"customer_id": { "type": "long" },
"events": {
"type": "nested",
"properties": {
"color": { "type": "keyword" },
"coord_y": { "type": "double" }
}
}
}
}
}
(update for clarity: that's an index named foo with the root mapping named foo)
The query I want to be able to make:
{
"aggs": {
"OP0_nest": {
"nested": { "path": "events" },
"aggs": {
"OP0_custom_filter": {
"filter": {
"bool": {
"should": [
{ "term": { "events.color": "orange" } },
{ "term": { "customer_id": 35 } }
]
}
},
"aggs": {
"OP0_op": {
"avg": { "field": "events.coord_y" }
}
}
}
}
}
}
}
Of course, this does not work: the term query on customer_id is always false, because customer_id can't be accessed inside the nested aggregation.
Thanks in advance!
Since the fields you want to filter on are at different levels, you need a separate query for each level. Put both queries in the should clause of a bool query, and use that bool query as the filter of the filter aggregation. Inside this aggregation, add a nested aggregation to get the avg of coord_y.
The aggregation will be (UPDATED: since foo is the index name, I removed foo from the field names):
{
"aggs": {
"OP0_custom_filter": {
"filter": {
"bool": {
"should": [
{
"term": {
"customer_id": 35
}
},
{
"nested": {
"path": "events",
"query": {
"term": {
"events.color": "orange"
}
}
}
}
]
}
},
"aggs": {
"OP0_op": {
"nested": {
"path": "events"
},
"aggs": {
"OP0_op_avg": {
"avg": {
"field": "events.coord_y"
}
}
}
}
}
}
}
}
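To see what this aggregation computes, here is a small Python simulation on hypothetical data. The filter aggregation works at the root level: it keeps a foo document when either should clause matches. The nested avg then averages coord_y over every event of the kept documents. So a document matched only through an orange event also contributes its non-orange events, which can give a different number than the per-event condition stated in the question. The sketch computes both values so the difference is visible.

```python
# Hypothetical sample data shaped like the foo mapping (not from the question).
DOCS = [
    {"customer_id": 35, "events": [{"color": "blue", "coord_y": 1.0},
                                   {"color": "orange", "coord_y": 3.0}]},
    {"customer_id": 7, "events": [{"color": "orange", "coord_y": 5.0},
                                  {"color": "green", "coord_y": 9.0}]},
    {"customer_id": 8, "events": [{"color": "red", "coord_y": 100.0}]},
]


def avg(values):
    # Elasticsearch's avg returns null for an empty set; None here.
    return sum(values) / len(values) if values else None


def answer_avg(docs, customer_id=35, color="orange"):
    """filter (root level) -> nested -> avg, as in the answer."""
    kept = [d for d in docs
            if d.get("customer_id") == customer_id
            or any(e.get("color") == color for e in d.get("events", []))]
    # The nested agg then sees EVERY event of each kept root document.
    return avg([e["coord_y"] for d in kept for e in d.get("events", [])])


def per_event_avg(docs, customer_id=35, color="orange"):
    """The per-event condition from the question, for comparison."""
    return avg([e["coord_y"] for d in docs for e in d.get("events", [])
                if e.get("color") == color or d.get("customer_id") == customer_id])


print(answer_avg(DOCS), per_event_avg(DOCS))  # 4.5 3.0
```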

ElasticSearch multi_match if field exists apply filter otherwise dont worry about it?

We have an Elasticsearch instance, and a job requires a "combo search" (a single search field, with checkboxes for the types in a specific index).
This is fine, I simply apply this kind of search to my index (for brevity: /posts):
{
"query": {
"multi_match": {
"query": querystring,
"type":"cross_fields",
"fields":["title","name"]
}
}
}
As you may guess from the need for multi_match here, the schemas of these types differ in one way or another, and that's my challenge right now.
In one of the types (just one), there is a field that doesn't exist in the other types: it's called active, and it's a basic boolean, 0 or 1.
We want to index inactive items of this type for administrative search purposes, but we don't want them exposed to the public when searching.
As far as I understand, I want to use a filter. But when I supply a filter requiring active to be 1, I only get results from that type and nothing else, because the filter now explicitly requires the field to exist and equal 1.
How can I do a conditional "if field exists, make sure it equals 1, otherwise ignore this condition"? Can this even be achieved?
if field exists, make sure it equals 1, otherwise ignore this condition
I think it can be implemented like this:
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"exists": {
"field": "active"
}
},
{
"term": {
"active": 1
}
}
]
}
},
{
"missing": {
"field": "active"
}
}
]
}
}
}
}
}
and the complete query:
{
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "whatever",
"type": "cross_fields",
"fields": [
"title",
"name"
]
}
},
"filter": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"exists": {
"field": "active"
}
},
{
"term": {
"active": 1
}
}
]
}
},
{
"missing": {
"field": "active"
}
}
]
}
}
}
}
}
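The filtered query and the missing filter were removed in Elasticsearch 5.0. This is a minimal sketch of the same logic for newer versions, built as a Python dict: the multi_match goes into bool.must, the condition goes into bool.filter, and must_not plus exists stands in for missing. The inner bool has only should clauses, so at least one of them has to match, which gives the "equals 1 or missing" OR. A plain-Python mirror of the condition is included for checking expectations against sample documents. Both functions are hypothetical helpers, not part of the original answer.

```python
import json


def build_public_search(querystring):
    """The complete query above in bool syntax (assumes Elasticsearch 5.0+)."""
    active_or_missing = {
        "bool": {
            "should": [
                # active exists and equals 1 (a term query never matches a
                # document that lacks the field, so exists is implied here)
                {"term": {"active": 1}},
                # replacement for the removed "missing" filter
                {"bool": {"must_not": {"exists": {"field": "active"}}}},
            ]
        }
    }
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": querystring,
                        "type": "cross_fields",
                        "fields": ["title", "name"],
                    }
                },
                # filter context: restricts hits without affecting the score
                "filter": active_or_missing,
            }
        }
    }


def passes_active_filter(doc):
    """Plain-Python mirror of the should clause above."""
    value = doc.get("active")  # absent key or null both count as missing
    return value is None or value == 1


print(json.dumps(build_public_search("whatever"), indent=2))
```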

Elasticsearch: how to extract possible pattern over multiple events with same session id

Summary:
I use Elasticsearch for my weblogs. I want to answer the question: how many clients requested both page A and page B within one session?
Details:
My Elasticsearch node contains the events that are logged on my website. Each event contains, among other fields, a timestamp, url, referrer and session id. At the moment I know how to find, e.g., how many sessions requested url xyz. But I don't know how to find out whether, within a session, both page A and page B were requested (as requested urls, not merely as the referrer).
Is this something that is somehow supported within elasticsearch?
The query should look something like this (assuming your url and session_id are not_analyzed):
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"url": "[Page A URL]"
}
},
{
"term": {
"url": "[Page B URL]"
}
}
]
}
}
}
},
"aggs": {
"requested_both_pages": {
"terms": {
"field": "session_id"
}
}
}
}
The doc_count in the response will be the number you're looking for.
Keep in mind that if your url is analyzed and you need fuzzy matching, you'll have to use a match query instead of the term filter. I generally wouldn't recommend an analyzed referrer. Instead, I would break it down into its parts, create a nested url object with each string not_analyzed, and then use a terms filter. You can still run a wildcard query on not_analyzed fields if you need some fuzziness.
I figured out a query that at least returns how many times url A and url B are requested per session. I was not aware that I could use this style of aggregation. It is still not a perfect solution, because it can return sessions where url A has counts and url B has none, so I will not mark the answer as solved unless an expert can tell me that my request is not possible at all.
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"term": {
"Url": "[Page A URL]"
}
},
{
"term": {
"Url": "[Page B URL]"
}
}
]
}
}
}
},
"aggs": {
"sessions_all": {
"terms": {
"field": "session_id",
"size": 100
},
"aggs": {
"Page_A_URL": {
"filter": {
"term": {
"Url": "[Page A URL]"
}
}
},
"Page_B_URL": {
"filter": {
"term": {
"Url": "[Page B URL]"
}
}
}
}
}
}
}
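Each event is a separate document holding a single url, so a bool must with two different url terms cannot match any one event. The "both pages in one session" check has to happen across documents, per session_id bucket. The per-session aggregation above gets the counts but still needs one more step: drop the sessions where either count is zero. The Python sketch below does that step client-side on hypothetical events (field names url and session_id assumed). On the server side, Elasticsearch 2.0 and later provide the bucket_selector pipeline aggregation for filtering buckets like this, with a script that checks both sub-aggregations' doc counts; its exact syntax varies by version, so it is not shown here.

```python
from collections import defaultdict


def sessions_with_both(events, url_a, url_b):
    """Session ids in which both url_a and url_b were requested."""
    # Same idea as terms(session_id) with two filter sub-aggs:
    # count per session how often each page was requested.
    counts = defaultdict(lambda: {"a": 0, "b": 0})
    for e in events:
        if e["url"] == url_a:
            counts[e["session_id"]]["a"] += 1
        elif e["url"] == url_b:
            counts[e["session_id"]]["b"] += 1
    # The missing step: keep only sessions where BOTH counts are non-zero.
    return sorted(s for s, c in counts.items() if c["a"] > 0 and c["b"] > 0)


# Hypothetical weblog events; the referrer is deliberately ignored.
EVENTS = [
    {"session_id": "s1", "url": "/a", "referrer": "/"},
    {"session_id": "s1", "url": "/b", "referrer": "/a"},
    {"session_id": "s2", "url": "/a", "referrer": "/"},
    {"session_id": "s3", "url": "/b", "referrer": "/a"},
]
print(sessions_with_both(EVENTS, "/a", "/b"))  # ['s1']
```

The length of the returned list is the number of sessions asked about. Note that a terms aggregation with "size": 100 returns at most 100 session buckets, so a client-side step like this only sees those sessions.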
