How to Query elasticsearch index with nested and non nested fields - elasticsearch

I have an elastic search index with the following mapping:
PUT /student_detail
{
"mappings" : {
"properties" : {
"id" : { "type" : "long" },
"name" : { "type" : "text" },
"email" : { "type" : "text" },
"age" : { "type" : "text" },
"status" : { "type" : "text" },
"tests":{ "type" : "nested" }
}
}
}
Data stored is in form below:
{
"id": 123,
"name": "Schwarb",
"email": "abc#gmail.com",
"status": "current",
"age": 14,
"tests": [
{
"test_id": 587,
"test_score": 10
},
{
"test_id": 588,
"test_score": 6
}
]
}
I want to be able to query the students where name like '%warb%' AND email like '%gmail.com%' AND test with id 587 have score > 5 etc. The high level of what is needed can be put something like below, dont know what would be the actual query, apologize for this messy query below
GET developer_search/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "abc"
}
},
{
"nested": {
"path": "tests",
"query": {
"bool": {
"must": [
{
"term": {
"tests.test_id": IN [587]
}
},
{
"term": {
"tests.test_score": >= some value
}
}
]
}
}
}
}
]
}
}
}
The query must be flexible so that we can enter dynamic test Ids and their respective score filters along with the fields out of nested fields like age, name, status

Something like that?
GET student_detail/_search
{
"query": {
"bool": {
"must": [
{
"wildcard": {
"name": {
"value": "*warb*"
}
}
},
{
"wildcard": {
"email": {
"value": "*gmail.com*"
}
}
},
{
"nested": {
"path": "tests",
"query": {
"bool": {
"must": [
{
"term": {
"tests.test_id": 587
}
},
{
"range": {
"tests.test_score": {
"gte": 5
}
}
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
Inner hits is what you are looking for.

You must make use of Ngram Tokenizer as wildcard search must not be used for performance reasons and I wouldn't recommend using it.
Change your mapping to the below where you can create your own Analyzer which I've done in the below mapping.
How elasticsearch (albiet lucene) indexes a statement is, first it breaks the statement or paragraph into words or tokens, then indexes these words in the inverted index for that particular field. This process is called Analysis and that this would only be applicable on text datatype.
So now you only get the documents if these tokens are available in inverted index.
By default, standard analyzer would be applied. What I've done is I've created my own analyzer and used Ngram Tokenizer which would be creating many more tokens than just simply words.
Default Analyzer on Life is beautiful would be life, is, beautiful.
However using Ngrams, the tokens for Life would be lif, ife & life
Mapping:
PUT student_detail
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "ngram",
"min_gram": 3,
"max_gram": 4,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings" : {
"properties" : {
"id" : {
"type" : "long"
},
"name" : {
"type" : "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"email" : {
"type" : "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"age" : {
"type" : "text" <--- I am not sure why this is text. Change it to long or int. Would leave this to you
},
"status" : {
"type" : "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"tests":{
"type" : "nested"
}
}
}
}
Note that in the above mapping I've created a sibling field in the form of keyword for name, email and status as below:
"name":{
"type":"text",
"analyzer":"my_analyzer",
"fields":{
"keyword":{
"type":"keyword"
}
}
}
Now your query could be as simple as below.
Query:
POST student_detail/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "war" <---- Note this. This would even return documents having "Schwarb"
}
},
{
"match": {
"email": "gmail" <---- Note this
}
},
{
"nested": {
"path": "tests",
"query": {
"bool": {
"must": [
{
"term": {
"tests.test_id": 587
}
},
{
"range": {
"tests.test_score": {
"gte": 5
}
}
}
]
}
}
}
}
]
}
}
}
Note that for exact matches I would make use of Term Queries on keyword fields while for normal searches or LIKE in SQL I would make use of simple Match Queries on text Fields provided they make use of Ngram Tokenizer.
Also note that for >= and <= you would need to make use of Range Query.
Response:
{
"took" : 233,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 3.7260926,
"hits" : [
{
"_index" : "student_detail",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.7260926,
"_source" : {
"id" : 123,
"name" : "Schwarb",
"email" : "abc#gmail.com",
"status" : "current",
"age" : 14,
"tests" : [
{
"test_id" : 587,
"test_score" : 10
},
{
"test_id" : 588,
"test_score" : 6
}
]
}
}
]
}
}
Note that I observe the document you've mentioned in your question, in my response when I run the query.
Please do read the links I've shared. It is vital that you understand the concepts. Hope this helps!

Related

Elasticsearch Multi-Term Auto Completion

I'm trying to implement the Multi-Term Auto Completion that's presented here.
Filtering down to the correct documents works, but when aggregating the completion_terms they are not filtered to those that match the current partial query, but instead include all completion_terms from any matched documents.
Here are the mappings:
{
"mappings": {
"dynamic" : "false",
"properties" : {
"completion_ngrams" : {
"type" : "text",
"analyzer" : "completion_ngram_analyzer",
"search_analyzer" : "completion_ngram_search_analyzer"
},
"completion_terms" : {
"type" : "keyword",
"normalizer" : "completion_normalizer"
}
}
}
}
Here are the settings:
{
"settings" : {
"index" : {
"analysis" : {
"filter" : {
"edge_ngram" : {
"type" : "edge_ngram",
"min_gram" : "1",
"max_gram" : "10"
}
},
"normalizer" : {
"completion_normalizer" : {
"filter" : [
"lowercase",
"german_normalization"
],
"type" : "custom"
}
},
"analyzer" : {
"completion_ngram_search_analyzer" : {
"filter" : [
"lowercase"
],
"tokenizer" : "whitespace"
},
"completion_ngram_analyzer" : {
"filter" : [
"lowercase",
"edge_ngram"
],
"tokenizer" : "whitespace"
}
}
}
}
}
}
}
I'm then indexing data like this:
{
"completion_terms" : ["Hammer", "Fortis", "Tool", "2000"],
"completion_ngrams": "Hammer Fortis Tool 2000"
}
Finally, the autocomplete search looks like this:
{
"query": {
"bool": {
"must": [
{
"term": {
"completion_terms": "fortis"
}
},
{
"term": {
"completion_terms": "hammer"
}
},
{
"match": {
"completion_ngrams": "too"
}
}
]
}
},
"aggs": {
"autocomplete": {
"terms": {
"field": "completion_terms",
"size": 100
}
}
}
}
This correctly returns documents matching the search string "fortis hammer too", but the aggregations include ALL completion terms that are included in any of the matched documents, e.g. for the query above:
"buckets": [
{ "key": "fortis" },
{ "key": "hammer" },
{ "key": "tool" },
{ "key": "2000" },
]
Ideally, I'd expect
"buckets": [
{ "key": "tool" }
]
I could filter out the terms that are already covered by the search query ("fortis" and "hammer" in this case) in the app, but the "2000" doesn't make any sense from a user's perspective, because it doesn't partially match any of the provided search terms.
I understand why this is happening, but I can't think of a solution. Can anyone help?
try filters agg please
{
"query": {
"bool": {
"must": [
{
"term": {
"completion_terms": "fortis"
}
},
{
"term": {
"completion_terms": "hammer"
}
},
{
"match": {
"completion_ngrams": "too"
}
}
]
}
},
"aggs": {
"findOuthammerAndfortis": {
"filters": {
"filters": {
"fortis": {
"term": {
"completion_terms": "fortis"
}
},
"hammer": {
"term": {
"completion_terms": "hammer"
}
}
}
}
}
}
}

Place an Analyzer on a a specific array item in a nested object

I have the following mapping
"mappings":{
"properties":{
"name": {
"type": "text"
},
"age": {
"type": "integer"
},
"customProps":{
"type" : "nested",
"properties": {
"key":{
"type": "keyword"
},
"value": {
"type" : "keyword"
}
}
}
}
}
example data
{
"name" : "person1",
"age" : 10,
"customProps":[
{"hairColor":"blue"},
{"height":"120"}
]
},
{
"name" : "person2",
"age" : 30,
"customProps":[
{"jobTitle" : "software engineer"},
{"salaryAccount" : "AvGhj90AAb"}
]
}
so i want to be able to search for document by salary account case insensitive, i am also searching using wild card
example query is
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "customProps",
"query": {
"bool": {
"must": [
{ "match": { "customProps.key": "salaryAccount" } },
{ "wildcard": { "customProps.value": "*AvG*"
}
}
]}}}}]}}}
i tried adding analyzer with PUT using the following syntax
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"analyzer_case_insensitive" : {
"tokenizer":"keyword",
"filter":"lowercase"
}
}
}
}
},
"mappings":{
"people":{
"properties":{
"customProps":{
"properties":{
"value":{
"type": "keyword",
"analyzer": "analyzer_case_insensitive"
}
}
}
}
}
}
}
im getting the following error
"type" : "mapper_parsing_exception",
"reason" : "Root mapping definition has unsupported parameters: [people: {properties={customProps={properties={value={analyzer=analyzer_case_insensitive, type=keyword}}}}}]"
any idea how to do the analyzer for the salary account object in the array when it exists?
Your use case is quite clear, that you want to search on the value of salaryAccount only when this key exists in customProps array.
There are some issues with your mapping definition :
You cannot define a custom analyzer for keyword type field, instead you can use a normalizer
Based on the mapping definition you added at the beginning of the question, it seems that you are using elasticsearch version 7.x. But the second mapping definition that you provided, in that you have added mapping type also (i.e people), which is deprecated in 7.x
There is no need to add the key and value fields in the index mapping.
Adding a working example with index mapping, search query, and search result
Index Mapping:
PUT myidx
{
"mappings": {
"properties": {
"customProps": {
"type": "nested"
}
}
}
}
Search Query:
You need to use exists query, to check whether a field exists or not. And case_insensitive param in Wildcard query is available since elasticsearch version 7.10. If you are using a version below this, then you need to use a normalizer, to achieve case insensitive scenarios.
POST myidx/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "customProps",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "customProps.salaryAccount"
}
},
{
"wildcard": {
"customProps.salaryAccount.keyword": {
"value": "*aVg*",
"case_insensitive": true
}
}
}
]
}
}
}
}
]
}
}
}
Search Result:
"hits" : [
{
"_index" : "myidx",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.0,
"_source" : {
"name" : "person2",
"age" : 30,
"customProps" : [
{
"jobTitle" : "software engineer"
},
{
"salaryAccount" : "AvGhj90AAb"
}
]
}
}
]

Elasticsearch preform "OR" search on groups of filtering criteria

Overview: I have a situation where within a single index I want to preform a search to return results based on 2 different sets of criteria. Imagine a scenario where where I have an index with a data structure like what is outlined below
I want to preform some sort of query that looks at different "blocks" of criteria. Meaning I want to search by both of the following categories in a single query (if possible):
Category One:
Distance / location
Public = true
--OR--
Category Two:
Distance / location
Public = false
category = "specific category"
(although this is not my exact scenario it is an illustration of what I am facing):
{
"mappings" : {
"properties" : {
"category" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"completed" : {
"type" : "boolean"
},
"deleted" : {
"type" : "boolean"
},
"location" : {
"type" : "geo_point"
},
"public" : {
"type" : "boolean"
},
"uuid" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
Please note: I am rather new to Elastic and would appreciate any help with this. I have attempted to search for this question but was not able to find what I was looking for. Please let me know if there is any missing information here I should include.
Sure, you can combine bool queries
POST test_user2737876/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"filter": [
{
"geo_distance": {
"distance": "200km",
"location": {
"lat": 41.12,
"lon": -71.34
}
}
},
{
"term": {
"public": false
}
}
]
}
},
{
"bool": {
"filter": [
{
"geo_distance": {
"distance": "200km",
"location": {
"lat": 41.12,
"lon": -71.34
}
}
},
{
"term": {
"category.keyword": "specific category"
}
},
{
"term": {
"public": false
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}

Range query on Nested Documents for Keyword Type

I have a document structure like this. For this below two documents, we have nested documents called interaction info. I just need to get only the documents that have title duration and their value is greater than 60
Here whats the catch the value field is keyword, not an integer. I know that only for the Integer Range query will get executed. Is there any possible way to find the documents that have a duration greater than 60 ( Painless Query or Script Query ). Like converting the value Field into Integer and then searching the document.
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": "11"
},
{
"title": "timetaken",
"value": "9"
},
{
"title": "talk_time",
"value": "145"
}
]
},
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": "120"
},
{
"title": "timetaken",
"value": "9"
},
{
"title": "talk_time",
"value": "60"
}
]
}
I have added script to get interactionInfo.value>"somevalue". Scripts are slow and it is better to resolve this at index time and use a range query.
Index:
{
"index15" : {
"mappings" : {
"properties" : {
"interactionInfo" : {
"type" : "nested",
"properties" : {
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"value" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"key" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
Query:
{
"query": {
"nested": {
"path": "interactionInfo",
"query": {
"bool": {
"must": [
{
"term": {
"interactionInfo.title.keyword": {
"value": "duration"
}
}
},
{
"script": {
"script": {
"source":"def val=Integer.parseInt(doc['interactionInfo.value.keyword'].value); if(val>params.value) return true; else return false;",
"params": {
"value":10
}
}
}
}
]
}
},
"inner_hits": {}
}
}
}

Elasticsearch - How to Generate Facets for Doubly Nested Objects

Using elasticsearch 7, I am trying to build facets for doubly nested objects.
So in the example below I would like to pull out the artist id codes from the artistMakerPerson field. I can pull out the association which is nested at a single depth but I can't get the syntax for the nested nested objects.
You could use the following code in Kibana to recreate an example.
My mapping looks like this:
PUT test_artist
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"properties": {
"object" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
},
"copy_to" : [
"global_search"
]
},
"uniqueID" : {
"type" : "keyword",
"copy_to" : [
"global_search"
]
},
"artistMakerPerson" : {
"type" : "nested",
"properties" : {
"association" : {
"type" : "keyword"
},
"name" : {
"type" : "nested",
"properties" : {
"id" : {
"type" : "keyword"
},
"text" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
},
"copy_to" : [
"gs_authority"
]
}
}
},
"note" : {
"type" : "text"
}
}
}
}
}
}
Index a document with:
PUT /test_artist/_doc/123
{
"object": "cup",
"uniquedID": "123",
"artistMakerPerson" : [
{
"name" : {
"text" : "Johann Kandler",
"id" : "A6734"
},
"association" : "modeller",
"note" : "probably"
},
{
"name" : {
"text" : "Peter Reinicke",
"id" : "A27702"
},
"association" : "designer",
"note" : "probably"
}
]
}
I am using this query to pull out facets or aggregations for artistMakerPerson.association
GET test_artist/_search
{
"size": 0,
"aggs": {
"artists": {
"nested": {
"path": "artistMakerPerson"
},
"aggs": {
"kinds": {
"terms": {
"field": "artistMakerPerson.association",
"size": 10
}
}
}
}
}
}
and I am rewarded with buckets for designer and modeller but I get nothing when I try to pull out the deeper artist id:
GET test_artist/_search
{
"size": 0,
"aggs": {
"artists": {
"nested": {
"path": "artistMakerPerson"
},
"aggs": {
"kinds": {
"terms": {
"field": "artistMakerPerson.name.id",
"size": 10
}
}
}
}
}
}
What am I doing wrong?
Change the path from artistMakerPerson to artistMakerPerson.name.
GET test_artist/_search
{
"size": 0,
"aggs": {
"artists": {
"nested": {
"path": "artistMakerPerson.name"
},
"aggs": {
"kinds": {
"terms": {
"field": "artistMakerPerson.name.id",
"size": 10
}
}
}
}
}
}

Resources