Elasticsearch search for attributes matching query over multiple documents

My data is modeled so that multiple documents with different attributes are logically connected by a chainID. The documents are indexed with an undefined amount of time between them, i.e. each one is indexed after its step has executed in the backend. All documents are stored in the same index. Example documents:
Doc 1:
{
"att1": "a",
"att2": "b",
"chainID": "123"
}
Doc 2:
{
"att3": "c",
"att4": "d",
"chainID": "123"
}
Doc 3:
{
"att1": "x",
"att2": "y",
"chainID": "678"
}
Doc 4:
{
"att3": "z",
"att4": "u",
"chainID": "678"
}
Mapping:
{
"properties": {
"att1": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"att2": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"att3": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"att4": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"chainID": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
I want to group the documents by chainID and search through the aggregated results so that a query with att1=a AND att3=c would have chainID=123 as a result.
I tried the following query, which returned no matching documents:
{
"query": {
"bool": {
"must": [
{
"term": {
"att1.keyword": "a"
}
},
{
"term": {
"att3.keyword": "c"
}
}
]
}
},
"aggs": {
"chainIDs": {
"terms": {
"field": "chainID.keyword"
},
"aggs": {
"docs": {
"top_hits": {
"_source": [
"chainID"
]
}
}
}
}
}
}
It seems that the aggregation runs after the query has been processed. What I would like to do is aggregate the documents by their chainID and run the query against the aggregated documents. Is this possible with Elasticsearch, or do I need to adjust my mappings/data model?

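Try replacing "must" with "should" (logical OR). "must" requires the same document to have both att1=a and att3=c (logical AND), and no single document here holds both attributes. Note that "should" also returns chains where only one of the two conditions matches.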
{
"query": {
"bool": {
"should": [
{
"term": {
"att1.keyword": "a"
}
},
{
"term": {
"att3.keyword": "c"
}
}
]
}
},
"aggs": {
"chainIDs": {
"terms": {
"field": "chainID.keyword"
},
"aggs": {
"docs": {
"top_hits": {
"_source": [
"chainID"
]
}
}
}
}
}
}
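Because "should" matches a document when either term is present, the buckets still have to be checked for both conditions. A minimal sketch of one way to do that, assuming the field names from the question: `build_chain_query` adds one `filter` sub-aggregation per condition plus a `bucket_selector` that drops chainID buckets where any of the counts is zero. The helper names and the aggregation names (`cond_0`, `all_present`) are hypothetical, and `chains_matching_all` only simulates the grouping locally on the example documents; it does not contact a cluster.

```python
from collections import defaultdict

# The four example documents from the question.
DOCS = [
    {"att1": "a", "att2": "b", "chainID": "123"},
    {"att3": "c", "att4": "d", "chainID": "123"},
    {"att1": "x", "att2": "y", "chainID": "678"},
    {"att3": "z", "att4": "u", "chainID": "678"},
]


def build_chain_query(conditions, size=100):
    """Build a search body keeping only chainID buckets that satisfy every condition.

    `conditions` maps a plain field name (e.g. "att1") to the required value.
    The ".keyword" suffix follows the mapping shown in the question.
    """
    clauses = [
        {"term": {f"{field}.keyword": value}} for field, value in conditions.items()
    ]
    sub_aggs = {}
    paths = {}
    for i, clause in enumerate(clauses):
        name = f"cond_{i}"  # hypothetical aggregation name
        sub_aggs[name] = {"filter": clause}
        paths[name] = f"{name}>_count"
    # Keep a bucket only if every per-condition filter matched at least one doc.
    sub_aggs["all_present"] = {
        "bucket_selector": {
            "buckets_path": paths,
            "script": " && ".join(f"params.{name} > 0" for name in paths),
        }
    }
    return {
        "size": 0,
        "query": {"bool": {"should": clauses, "minimum_should_match": 1}},
        "aggs": {
            "chainIDs": {
                "terms": {"field": "chainID.keyword", "size": size},
                "aggs": sub_aggs,
            }
        },
    }


def chains_matching_all(docs, conditions):
    """Local simulation: chainIDs for which every condition is met by some document."""
    seen = defaultdict(set)
    for doc in docs:
        for field, value in conditions.items():
            if doc.get(field) == value:
                seen[doc["chainID"]].add(field)
    return sorted(cid for cid, fields in seen.items() if len(fields) == len(conditions))
```

Since the `bucket_selector` runs on the buckets the `terms` aggregation has already produced, the `size` parameter (an assumption here) needs to be large enough to cover the candidate chains.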

Related

Aggregation based off of nested document field with filters on nested document and parent

I have the following mapping:
{
"accountId": {
"type": "long"
},
"storeProductId": {
"type": "long"
},
"storeSchemaId": {
"type": "long"
},
"yoyoValues": {
"type": "nested",
"properties": {
"yoyoNameId": {
"type": "long"
},
"dataType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"id": {
"type": "long"
},
"languageId": {
"type": "long"
},
"value_Number": {
"type": "float"
},
"value_Raw": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
and I'm trying to get the max and min values of value_Number for all nested documents with a yoyoNameId of 3 whose parent document has an accountId of 1285 and a storeSchemaId of 241.
Every time I've tried, I've been unable to filter the nested documents properly, so I end up with the min and max values across all nested documents under the matching parent documents.
I've tried several different queries but my most recent one is as follows:
{
"size": 0,
"aggs": {
"filter-layer": {
"filters": {
"filters": [
{
"term": {
"accountId": 1285
}
},
{
"term": {
"storeSchemaId": 241
}
},
{
"nested": {
"path": "yoyoValues",
"query": {
"bool": {
"filter": [
{
"term": {
"yoyoValues.yoyoNameId": 3
}
}
]
}
}
}
}
]
},
"aggs": {
"yoyoValues": {
"nested": {
"path": "yoyoValues"
},
"inner": {
"filter": {
"term": {
"yoyoValues.yoyoNameId": 3
}
},
"aggs": {
"min_value": {
"min": {
"field": "yoyoValues.value_Number"
}
},
"max_value": {
"max": {
"field": "yoyoValues.value_Number"
}
}
}
}
}
}
}
}
}
Can someone please help me correct this query? I'm limited to elastic v7.13.
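A sketch of one possible restructuring, not a confirmed fix: in the query above, `inner` sits beside `nested` instead of under an `aggs` key, and a `filters` aggregation produces one bucket per filter rather than combining them. The version below moves the parent conditions into the `query`, then applies the `yoyoNameId` filter inside the nested aggregation so that min and max only see the matching nested documents. The function name is hypothetical, and `storeSchemaId` is taken from the mapping.

```python
import json


def build_min_max_query(account_id, store_schema_id, yoyo_name_id):
    """Build a search body: parent filters in the query, nested filter inside the nested agg."""
    return {
        "size": 0,
        # Restrict the parent documents first.
        "query": {
            "bool": {
                "filter": [
                    {"term": {"accountId": account_id}},
                    {"term": {"storeSchemaId": store_schema_id}},
                ]
            }
        },
        "aggs": {
            "yoyoValues": {
                # Step into the nested documents of the remaining parents.
                "nested": {"path": "yoyoValues"},
                "aggs": {
                    "inner": {
                        # Keep only the nested documents with the wanted yoyoNameId.
                        "filter": {"term": {"yoyoValues.yoyoNameId": yoyo_name_id}},
                        "aggs": {
                            "min_value": {"min": {"field": "yoyoValues.value_Number"}},
                            "max_value": {"max": {"field": "yoyoValues.value_Number"}},
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_min_max_query(1285, 241, 3), indent=2))
```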

Elastic: Nested Query or Query of elastic Documents by document key whose values are arrays of key, value pair

I am trying to build a query in Elasticsearch. The documents are stored like this:
{
"data": [
{
"value": "Lorem ipsum, Lorem ipsum",
"source": [
"abc.com", "xyz.com"
]
},
{
"value": "Lorem ipsum Lorem ipsum",
"source": [
"wxy.com", "osa.com"
]
}
]
}
I want to filter the documents by a certain source.
I've got some idea from here but it's not working: every query I run against this field returns all the records.
I've exhausted every combination of queries I know of so far. Any idea how I can achieve this?
Mapping:
{
"mappings": {
"_document_name_": {
"properties": {
"data": {
"properties": {
"source": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
Query:
{
"query": {
"nested": {
"path": "address",
"query": {
"bool": {
"must_not": [
{ "match": {"data.source": "abc.com"}}
]
}
}
}
}
}
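One thing that stands out, offered as a hedged sketch: the mapping above declares `data` as a plain object (there is no `"type": "nested"`), and the query points at a path called `address` that does not appear in the mapping. With an object field, the `source` values of all array entries are flattened into one multi-valued field, so a plain `term` query on `data.source.keyword` may be enough to filter by source. The helper names below are hypothetical; `doc_sources` only mimics that flattening locally.

```python
def build_source_filter(source, exclude=False):
    """Build a search body that keeps (or excludes) documents containing `source`.

    Uses the `data.source.keyword` sub-field from the mapping for exact matching.
    """
    clause = {"term": {"data.source.keyword": source}}
    occur = "must_not" if exclude else "filter"
    return {"query": {"bool": {occur: [clause]}}}


def doc_sources(doc):
    """Flatten data[*].source into one set, the way a non-nested object field is indexed."""
    return {s for item in doc.get("data", []) for s in item.get("source", [])}


# The example document from the question.
EXAMPLE_DOC = {
    "data": [
        {"value": "Lorem ipsum, Lorem ipsum", "source": ["abc.com", "xyz.com"]},
        {"value": "Lorem ipsum Lorem ipsum", "source": ["wxy.com", "osa.com"]},
    ]
}
```

If the pairing between `value` and `source` inside one array entry matters, the field would have to be remapped as `nested` and the index rebuilt; for filtering on `source` alone the flattened form is sufficient.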

Boost score based on integer value - Elasticsearch

I'm not very experienced with Elasticsearch and would like to know how to boost a search based on a certain integer value.
This is an example of a document:
{
"_index": "links",
"_type": "db1",
"_id": "mV32vWcBZsblNn1WqTcN",
"_score": 8.115617,
"_source": {
"url": "example.com",
"title": "Example website",
"description": "This is an example website, used for various of examples around the world",
"likes": 9,
"popularity": 543,
"tags": [
{
"name": "example",
"votes": 5
},
{
"name": "test",
"votes": 2
},
{
"name": "testing",
"votes": 1
}
]
}
}
In this particular search the focus is on the tags, and I would like to know how to boost the _score by multiplying it by the votes integer under tags.
If this is not possible (or very hard to achieve), I would simply like to know how to boost the _score by the votes (not under tags).
For example, add 0.1 to the _score for each vote.
This is the current search query I'm using (for searching tags only):
{
"query": {
"nested": {
"path": "tags",
"query": {
"bool":{
"should":{
"match":{
"tags.name":"example,testing,something else"
}
}
}
}
}
}
}
I couldn't find much online, and hope someone can help me out.
How do I boost the _score with an integer value?
Update
For more info, here is the mapping:
{
"links": {
"mappings": {
"db1": {
"properties": {
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"likes": {
"type": "long"
},
"popularity": {
"type": "long"
},
"tags": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"votes": {
"type": "long"
}
}
}
}
}
}
}
}
Update 2
Changed the tags.likes/tags.dislikes to tags.votes, and added a nested property to the tags

This took a long time to figure out. I have learnt so much on my way there.
Here is the final result:
{
"query": {
"nested": {
"path": "tags",
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"tags.name": "example"
}
},
{
"match": {
"tags.name": "testing"
}
},
{
"match": {
"tags.name": "test"
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": "tags.votes"
}
}
],
"boost_mode": "multiply"
}
}
}
}
}
Using an array in should helped a lot, and I was glad I could combine it with function_score.

You are looking for the function score query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
and its field value factor function: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-field-value-factor.
Snippet from documentation:
GET /_search
{
"query": {
"function_score": {
"field_value_factor": {
"field": "tags.dislikes",
"factor": 1.2,
"modifier": "sqrt",
"missing": 1
}
}
}
}
Alternatively, use script score, because your tags field is nested (I'm not sure whether field value factor works with a nested structure).
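To make the arithmetic behind `field_value_factor` concrete, here is a small local model of it: the field value is multiplied by `factor`, passed through `modifier`, and then combined with the query score according to `boost_mode`. This is a sketch for reasoning about scores, not Elasticsearch code; the function names are hypothetical and only a few modifiers and boost modes are covered.

```python
import math

# A subset of the documented modifiers, applied to factor * field_value.
MODIFIERS = {
    "none": lambda v: v,
    "sqrt": math.sqrt,
    "log1p": lambda v: math.log10(v + 1),
    "square": lambda v: v * v,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Return modifier(factor * value); `missing` stands in when the field has no value."""
    if value is None:
        if missing is None:
            raise ValueError("field has no value and no 'missing' default was given")
        value = missing
    return MODIFIERS[modifier](factor * value)


def combine(query_score, function_score, boost_mode="multiply"):
    """Combine the query score with the function score as boost_mode would."""
    if boost_mode == "multiply":
        return query_score * function_score
    if boost_mode == "sum":
        return query_score + function_score
    if boost_mode == "replace":
        return function_score
    raise ValueError(f"boost_mode not covered by this sketch: {boost_mode}")


if __name__ == "__main__":
    # Multiply a tag's match score by its votes (votes = 5).
    print(combine(2.0, field_value_factor(5)))
    # Add 0.1 per vote instead (votes = 9): factor 0.1 with boost_mode "sum".
    print(combine(8.0, field_value_factor(9, factor=0.1), boost_mode="sum"))
```

Under this model, "add 0.1 to the _score for each vote" corresponds to `"factor": 0.1` together with `"boost_mode": "sum"`, while the accepted query above uses the default factor with `"boost_mode": "multiply"`.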

Elasticsearch min price of a month

I would like to get the lowest prices for the 15 days before and after a chosen date in my products index.
How can I get these prices in ES? What kind of query should I write?
My mapping:
{
"product-data": {
"mappings": {
"mine-apple": {
"properties": {
"date": {
"type": "date"
},
"productName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"productDescription": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"price": {
"type": "long"
},
"query": {
"properties": {
"match_all": {
"type": "object"
}
}
}
}
}
}
}
}
Thanks in advance.

The solution I found: I added a date_histogram aggregation to my query, which groups the matching documents by day. Inside each bucket, a min aggregation then returns the lowest price.
{
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "2017-05-11",
"lte": "2017-05-14"
}
}
}
]
}
},
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"format": "YYYY-MM-dd",
"field": "date",
"interval": "day"
},
"aggs": {
"sales": {
"min": {
"field": "price"
}
}
}
}
}
}
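The query above uses a fixed date range. A minimal sketch of how the range could be derived from a chosen date, assuming a 15 day window on each side as described in the question: the function name and the aggregation names are hypothetical, and `"interval": "day"` mirrors the answer above (newer Elasticsearch versions use `calendar_interval` instead).

```python
from datetime import date, timedelta


def build_min_price_query(chosen, days=15):
    """Build a search body for the daily minimum price within +/- `days` of `chosen`."""
    start = chosen - timedelta(days=days)
    end = chosen + timedelta(days=days)
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {
                        "range": {
                            "date": {
                                "gte": start.isoformat(),
                                "lte": end.isoformat(),
                            }
                        }
                    }
                ]
            }
        },
        "aggs": {
            # One bucket per day, each carrying that day's lowest price.
            "min_price_per_day": {
                "date_histogram": {"field": "date", "interval": "day"},
                "aggs": {"min_price": {"min": {"field": "price"}}},
            },
            # Lowest price over the whole window.
            "min_price_overall": {"min": {"field": "price"}},
        },
    }
```

For example, `build_min_price_query(date(2017, 5, 12))` covers 2017-04-27 through 2017-05-27.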

Elastic search top_hits aggregation on nested

I have an index which contains CustomerProfile documents. In each of these documents, the CustomerInsightTargets property (with the properties Source and Value) can be an array with x items. What I am trying to achieve is an autocomplete (top 5) on CustomerInsightTargets.Value grouped by CustomerInsightTargets.Source.
It would be helpful if anyone could give me a hint about how to select only a subset of the nested objects from each document and use those nested objects in aggregations.
{
"customerinsights": {
"aliases": {},
"mappings": {
"customerprofile": {
"properties": {
"CreatedById": {
"type": "long"
},
"CreatedDateTime": {
"type": "date"
},
"CustomerInsightTargets": {
"type": "nested",
"properties": {
"CustomerInsightSource": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"CustomerInsightValue": {
"type": "text",
"term_vector": "yes",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "ngram_tokenizer_analyzer"
},
"CustomerProfileId": {
"type": "long"
},
"Guid": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"Id": {
"type": "long"
}
}
},
"DisplayName": {
"type": "text",
"term_vector": "yes",
"analyzer": "ngram_tokenizer_analyzer"
},
"Email": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"Id": {
"type": "long"
},
"ImageUrl": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
},
"settings": {
"index": {
"number_of_shards": "1",
"provided_name": "customerinsights",
"creation_date": "1484860145041",
"analysis": {
"analyzer": {
"ngram_tokenizer_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer"
}
},
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": "1",
"max_gram": "10"
}
}
},
"number_of_replicas": "2",
"uuid": "nOyI0O2cTO2JOFvqIoE8JQ",
"version": {
"created": "5010199"
}
}
}
}
}
An example document:
{
"Id": 9072856,
"CreatedDateTime": "2017-01-12T11:26:58.413Z",
"CreatedById": 9108469,
"DisplayName": "valentinos",
"Email": "valentinos#mail.com",
"CustomerInsightTargets": [
{
"Id": 160,
"CustomerProfileId": 9072856,
"CustomerInsightSource": "Tags",
"CustomerInsightValue": "Tag1",
"Guid": "00000000-0000-0000-0000-000000000000"
},
{
"Id": 160,
"CustomerProfileId": 9072856,
"CustomerInsightSource": "ProfileName",
"CustomerInsightValue": "valentinos",
"Guid": "00000000-0000-0000-0000-000000000000"
},
{
"Id": 160,
"CustomerProfileId": 9072856,
"CustomerInsightSource": "Playground",
"CustomerInsightValue": "Wiki",
"Guid": "00000000-0000-0000-0000-000000000000"
}
]
}
If I run a top_hits aggregation, the result includes all targets from a document as soon as one of them matches my search text.
Example
GET customerinsights/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "CustomerInsightTargets",
"query": {
"bool": {
"must": [
{
"match": {
"CustomerInsightTargets.CustomerInsightValue": {
"query": "2017",
"operator": "AND",
"fuzziness": 2
}
}
}
]
}
}
}
}
]
}
} ,
"aggs": {
"root": {
"nested": {
"path": "CustomerInsightTargets"
},
"aggs": {
"top_tags": {
"terms": {
"field": "CustomerInsightTargets.CustomerInsightSource.keyword"
},
"aggs": {
"top_tag_hits": {
"top_hits": {
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"size": 5,
"_source": "CustomerInsightTargets"
}
}
}
}
}
}
},
"size": 0,
"_source": "CustomerInsightTargets"
}
My question is how I should use the aggregation to get the "autocomplete" values grouped by Source and ordered by _score. I tried a significant_terms aggregation, but it doesn't work very well. The terms aggregation sorts by _count rather than by score, and the fuzzy matching adds further complexity.
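On the narrower point of selecting only a subset of nested objects, one commonly used pattern is to repeat the match as a `filter` aggregation inside the `nested` aggregation, so that only the matching targets reach the `terms` and `top_hits` levels. The sketch below builds such a request body under that assumption; the function name and the aggregation names `matching_targets`, `by_source` and `top_values` are hypothetical, and it does not address ordering by _score, since a filter does not contribute a score.

```python
PATH = "CustomerInsightTargets"


def build_autocomplete_query(text, size=5):
    """Build a search body that aggregates only the nested targets matching `text`."""
    match = {
        "match": {
            f"{PATH}.CustomerInsightValue": {
                "query": text,
                "operator": "AND",
                "fuzziness": 2,
            }
        }
    }
    return {
        "size": 0,
        # Select parent documents that have at least one matching target.
        "query": {"nested": {"path": PATH, "query": match}},
        "aggs": {
            "root": {
                "nested": {"path": PATH},
                "aggs": {
                    # Drop the non-matching targets of those parents.
                    "matching_targets": {
                        "filter": match,
                        "aggs": {
                            "by_source": {
                                "terms": {
                                    "field": f"{PATH}.CustomerInsightSource.keyword"
                                },
                                "aggs": {
                                    "top_values": {"top_hits": {"size": size}}
                                },
                            }
                        },
                    }
                },
            }
        },
    }
```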
