Elasticsearch nested type select query - elasticsearch

I have a field with mapping:
"my_field_name": {
"type": "nested",
"properties": {
"labels": {
"type": "text"
},
"source": {
"type": "text"
}
}
}
and when I put data, format is like:
"my_field_name": {
"url_of_an_image": [
"tag1",
"tag2",
"tag3",
"tag4",
"tag5"
],
"url_of_an_image2": [
"tag4",
"tag5",
"tag6",
"tag7",
"tag8"
]
}
I want to write an elasticsearch query to gather all the documents that contains given tag.
How can I search a tag in all documents and select the ones that contains given tag?
Thank you

(EDITED AFTER COMMENT)
As a first consideration, the mapping doesn't match the structure of your document, e.g., my_field_name has neither labels nor source fields. Which is the one that better represents your data?
As a second consideration, in your comment you said that
while querying I don't know "url_of_an_image" since it is different for each document
However, from your example, it looks like url_of_an_image is a property of the items of my_field_name rather than of the document. Therefore, I'd suggest doing something like:
"my_field_name": [
{
"image_url": "url_of_an_image",
"tags": [
"tag1",
"tag2",
"tag3",
"tag4",
"tag5"
]
},
{
"image_url": "url_of_an_image2",
"tags": [
"tag4",
"tag5",
"tag6",
"tag7",
"tag8"
]
}
]
And this would be the mapping:
"my_field_name": {
"type": "nested",
"properties": {
"image_url": {
"type": "text"
},
"tags": {
"type": "text"
}
}
}
This way, you can run queries like:
{
"query": {
"nested" : {
"path" : "my_field_name",
"query" : {
"bool" : {
"must" : [
{ "match" : {"my_field_name.image_url" : "url_of_an_image2"} },
{ "match" : {"my_field_name.tags" : "tag3"} }
]
}
}
}
}
}

You can achieve this by using the Nested Query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html

Related

ElasticSearch DSL Matching all elements of query in list of list of strings

I'm trying to query ElasticSearch to match every document that in a list of list contains all the values requested, but I can't seem to find the perfect query.
Mapping:
"id" : {
"type" : "keyword"
},
"mainlist" : {
"properties" : {
"format" : {
"type" : "keyword"
},
"tags" : {
"type" : "keyword"
}
}
},
...
Documents:
doc1 {
"id" : "abc",
"mainlist" : [
{
"type" : "big",
"tags" : [
"tag1",
"tag2"
]
},
{
"type" : "small",
"tags" : [
"tag1"
]
}
]
},
doc2 {
"id" : "abc",
"mainlist" : [
{
"type" : "big",
"tags" : [
"tag1"
]
},
{
"type" : "small",
"tags" : [
"tag2"
]
}
]
},
doc3 {
"id" : "abc",
"mainlist" : [
{
"type" : "big",
"tags" : [
"tag1"
]
}
]
}
The query I've tried that got me closest to the result is:
GET /index/_doc/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"mainlist.tags": "tag1"
}
},
{
"term": {
"mainlist.tags": "tag2"
}
}
]
}
}
}
although I get as result doc1 and doc2, while I'd only want doc1 as contains tag1 and tag2 in a single list element and not spread across both sublists.
How would I be able to achieve that?
Thanks for any help.
As mentioned by #caster, you need to use the nested data type and query as in normal way Elasticsearch treats them as object and relation between the elements are lost, as explained in offical doc.
You need to change both mapping and query to achieve the desired output as shown below.
Index mapping
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"mainlist" :{
"type" : "nested"
}
}
}
}
Sample Index doc according to your example, no change there
Query
{
"query": {
"nested": {
"path": "mainlist",
"query": {
"bool": {
"must": [
{
"term": {
"mainlist.tags": "tag1"
}
},
{
"match": {
"mainlist.tags": "tag2"
}
}
]
}
}
}
}
}
And result
hits": [
{
"_index": "71519931_new",
"_id": "1",
"_score": 0.9139043,
"_source": {
"id": "abc",
"mainlist": [
{
"type": "big",
"tags": [
"tag1",
"tag2"
]
},
{
"type": "small",
"tags": [
"tag1"
]
}
]
}
}
]
use nested field type,this is work for it
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/nested.html

Search by slug in Elasticsearch

I have an index named homes. Here is the simplified mapping of it:
{
"template": "homes",
"index_patterns": "homes",
"settings": {
"index.refresh_interval": "60s"
},
"mappings": {
"properties": {
"status": {
"type": "keyword"
},
"address": {
"type": "keyword",
"fields": {
"suggest": {
"type": "search_as_you_type"
},
"search": {
"type": "text"
}
}
}
}
}
}
As you can see, there is an address field which I query this way:
{
"query": {
"bool": {
"filter": [
{
"term": {
"status": "sale"
}
},
{
"term": {
"address": "406 - 533 Richmond St W"
}
}
]
}
}
}
Now my problem is that I need to be able to query with slugyfied version of the address field as well. For example, I need to query like this:
{
"query": {
"bool": {
"filter": [
{
"term": {
"status": "sale"
}
},
{
"term": {
"address": "406-533-richmond-st-w"
}
}
]
}
}
}
So, instead of 406 - 533 Richmond St W I need to query 406-533-richmond-st-w. How can I do that? I was thinking of adding a new field address_slug which is the slugyfied version of address but I need it to be auto populated so I don't need to manually fill this field every time that I insert or update a document in the index.
If you create a custom analyzer with the token filters below and another field for search that uses the custom analyzer, you can achieve this. Here is an example analyze result and output:
GET {index}/_analyze
{
"tokenizer": "keyword",
"filter": [
{
"type": "lowercase"
},
{
"type": "pattern_replace",
"pattern": """[^A-Za-z0-9]+""",
"replacement": "-"
}
],
"text": "406 - 533 Richmond St W"
}
Output:
{
"tokens" : [
{
"token" : "406-533-richmond-st-w",
"start_offset" : 0,
"end_offset" : 23,
"type" : "word",
"position" : 0
}
]
}

Place an Analyzer on a a specific array item in a nested object

I have the following mapping
"mappings":{
"properties":{
"name": {
"type": "text"
},
"age": {
"type": "integer"
},
"customProps":{
"type" : "nested",
"properties": {
"key":{
"type": "keyword"
},
"value": {
"type" : "keyword"
}
}
}
}
}
example data
{
"name" : "person1",
"age" : 10,
"customProps":[
{"hairColor":"blue"},
{"height":"120"}
]
},
{
"name" : "person2",
"age" : 30,
"customProps":[
{"jobTitle" : "software engineer"},
{"salaryAccount" : "AvGhj90AAb"}
]
}
so i want to be able to search for document by salary account case insensitive, i am also searching using wild card
example query is
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "customProps",
"query": {
"bool": {
"must": [
{ "match": { "customProps.key": "salaryAccount" } },
{ "wildcard": { "customProps.value": "*AvG*"
}
}
]}}}}]}}}
i tried adding analyzer with PUT using the following syntax
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"analyzer_case_insensitive" : {
"tokenizer":"keyword",
"filter":"lowercase"
}
}
}
}
},
"mappings":{
"people":{
"properties":{
"customProps":{
"properties":{
"value":{
"type": "keyword",
"analyzer": "analyzer_case_insensitive"
}
}
}
}
}
}
}
im getting the following error
"type" : "mapper_parsing_exception",
"reason" : "Root mapping definition has unsupported parameters: [people: {properties={customProps={properties={value={analyzer=analyzer_case_insensitive, type=keyword}}}}}]"
any idea how to do the analyzer for the salary account object in the array when it exists?
Your use case is quite clear, that you want to search on the value of salaryAccount only when this key exists in customProps array.
There are some issues with your mapping definition :
You cannot define a custom analyzer for keyword type field, instead you can use a normalizer
Based on the mapping definition you added at the beginning of the question, it seems that you are using elasticsearch version 7.x. But the second mapping definition that you provided, in that you have added mapping type also (i.e people), which is deprecated in 7.x
There is no need to add the key and value fields in the index mapping.
Adding a working example with index mapping, search query, and search result
Index Mapping:
PUT myidx
{
"mappings": {
"properties": {
"customProps": {
"type": "nested"
}
}
}
}
Search Query:
You need to use exists query, to check whether a field exists or not. And case_insensitive param in Wildcard query is available since elasticsearch version 7.10. If you are using a version below this, then you need to use a normalizer, to achieve case insensitive scenarios.
POST myidx/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "customProps",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "customProps.salaryAccount"
}
},
{
"wildcard": {
"customProps.salaryAccount.keyword": {
"value": "*aVg*",
"case_insensitive": true
}
}
}
]
}
}
}
}
]
}
}
}
Search Result:
"hits" : [
{
"_index" : "myidx",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.0,
"_source" : {
"name" : "person2",
"age" : 30,
"customProps" : [
{
"jobTitle" : "software engineer"
},
{
"salaryAccount" : "AvGhj90AAb"
}
]
}
}
]

How to Query elasticsearch index with nested and non nested fields

I have an elastic search index with the following mapping:
PUT /student_detail
{
"mappings" : {
"properties" : {
"id" : { "type" : "long" },
"name" : { "type" : "text" },
"email" : { "type" : "text" },
"age" : { "type" : "text" },
"status" : { "type" : "text" },
"tests":{ "type" : "nested" }
}
}
}
Data stored is in form below:
{
"id": 123,
"name": "Schwarb",
"email": "abc#gmail.com",
"status": "current",
"age": 14,
"tests": [
{
"test_id": 587,
"test_score": 10
},
{
"test_id": 588,
"test_score": 6
}
]
}
I want to be able to query the students where name like '%warb%' AND email like '%gmail.com%' AND test with id 587 have score > 5 etc. The high level of what is needed can be put something like below, dont know what would be the actual query, apologize for this messy query below
GET developer_search/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "abc"
}
},
{
"nested": {
"path": "tests",
"query": {
"bool": {
"must": [
{
"term": {
"tests.test_id": IN [587]
}
},
{
"term": {
"tests.test_score": >= some value
}
}
]
}
}
}
}
]
}
}
}
The query must be flexible so that we can enter dynamic test Ids and their respective score filters along with the fields out of nested fields like age, name, status
Something like that?
GET student_detail/_search
{
"query": {
"bool": {
"must": [
{
"wildcard": {
"name": {
"value": "*warb*"
}
}
},
{
"wildcard": {
"email": {
"value": "*gmail.com*"
}
}
},
{
"nested": {
"path": "tests",
"query": {
"bool": {
"must": [
{
"term": {
"tests.test_id": 587
}
},
{
"range": {
"tests.test_score": {
"gte": 5
}
}
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
Inner hits is what you are looking for.
You must make use of Ngram Tokenizer as wildcard search must not be used for performance reasons and I wouldn't recommend using it.
Change your mapping to the below where you can create your own Analyzer which I've done in the below mapping.
How elasticsearch (albiet lucene) indexes a statement is, first it breaks the statement or paragraph into words or tokens, then indexes these words in the inverted index for that particular field. This process is called Analysis and that this would only be applicable on text datatype.
So now you only get the documents if these tokens are available in inverted index.
By default, standard analyzer would be applied. What I've done is I've created my own analyzer and used Ngram Tokenizer which would be creating many more tokens than just simply words.
Default Analyzer on Life is beautiful would be life, is, beautiful.
However using Ngrams, the tokens for Life would be lif, ife & life
Mapping:
PUT student_detail
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "ngram",
"min_gram": 3,
"max_gram": 4,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings" : {
"properties" : {
"id" : {
"type" : "long"
},
"name" : {
"type" : "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"email" : {
"type" : "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"age" : {
"type" : "text" <--- I am not sure why this is text. Change it to long or int. Would leave this to you
},
"status" : {
"type" : "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"tests":{
"type" : "nested"
}
}
}
}
Note that in the above mapping I've created a sibling field in the form of keyword for name, email and status as below:
"name":{
"type":"text",
"analyzer":"my_analyzer",
"fields":{
"keyword":{
"type":"keyword"
}
}
}
Now your query could be as simple as below.
Query:
POST student_detail/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "war" <---- Note this. This would even return documents having "Schwarb"
}
},
{
"match": {
"email": "gmail" <---- Note this
}
},
{
"nested": {
"path": "tests",
"query": {
"bool": {
"must": [
{
"term": {
"tests.test_id": 587
}
},
{
"range": {
"tests.test_score": {
"gte": 5
}
}
}
]
}
}
}
}
]
}
}
}
Note that for exact matches I would make use of Term Queries on keyword fields while for normal searches or LIKE in SQL I would make use of simple Match Queries on text Fields provided they make use of Ngram Tokenizer.
Also note that for >= and <= you would need to make use of Range Query.
Response:
{
"took" : 233,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 3.7260926,
"hits" : [
{
"_index" : "student_detail",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.7260926,
"_source" : {
"id" : 123,
"name" : "Schwarb",
"email" : "abc#gmail.com",
"status" : "current",
"age" : 14,
"tests" : [
{
"test_id" : 587,
"test_score" : 10
},
{
"test_id" : 588,
"test_score" : 6
}
]
}
}
]
}
}
Note that I observe the document you've mentioned in your question, in my response when I run the query.
Please do read the links I've shared. It is vital that you understand the concepts. Hope this helps!

Elasticsearch nested geo-shape query

Suppose I have the following mapping:
"mappings": {
"doc": {
"properties": {
"name": {
"type": "text"
},
"location": {
"type": "nested",
"properties": {
"point": {
"type": "geo_shape"
}
}
}
}
}
}
}
There is one document in the index:
POST /example/doc?refresh
{
"name": "Wind & Wetter, Berlin, Germany",
"location": {
"type": "point",
"coordinates": [13.400544, 52.530286]
}
}
How can I make a nested geo-shape query?
Example of usual geo-shape query from the documentation (the "bool" block can be skipped):
{
"query":{
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_shape": {
"location": {
"shape": {
"type": "envelope",
"coordinates" : [[13.0, 53.0], [14.0, 52.0]]
},
"relation": "within"
}
}
}
}
}
}
Example of a nested query is:
{
"query": {
"nested" : {
"path" : "obj1",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{ "match" : {"obj1.name" : "blue"} },
{ "range" : {"obj1.count" : {"gt" : 5}} }
]
}
}
}
}
}
Now how to combine them? In the documentation it is mentioned that nested filter has been replaced by nested query. And that it behaves as a query in “query context” and as a filter in “filter context”.
If I try query for intersect with the point:
{
"query": {
"nested": {
"path": "location",
"query": {
"geo_shape": {
"location.point": {
"shape": {
"type": "point",
"coordinates": [
13.400544,
52.530286
]
},
"relation": "disjoint"
}
}
}
}
}
}
I still get back the document even if relation is "disjoint", so it's not correct. I tried different combinations, with "bool" and "filter", etc. but query is ignored, returning the whole index. Maybe it's impossible with this type of mapping?
Clearly I am missing something here. Can somebody help me out with that, please? Any help is greatly appreciated.

Resources