ElasticSearch: Multi_fields parameter not working

I have a multi-field "startTime"; its mapping is below:
"startTime" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
},
"raw" : {
"type" : "date",
"format" : "dd-MM-yyyy HH:mm:ss||dd-MM-yyyy||hour_minute_second"
}
}
}
I inserted a few documents:
{
"orgId" => "backendorg",
"startTime" => "01-01-1980 06:32:51"
}
{
"orgId" => "backendorg",
"startTime" => "01-01-1980 06:35:51"
}
{
"orgId" => "backendorg",
"startTime" => "01-01-1980 06:39:51"
}
When I try to filter on startTime with the query below, it returns an empty result:
{
"query": {
"bool": {
"must": [
{
"term": {
"orgId": {
"value": "backendorg",
"boost": 1
}
}
},
{
"bool": {
"should": [
{
"range": {
"startTime": {
"from": "01-01-1980 06:32:51",
"to": "01-01-1980 06:39:51",
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
}
]
}
}
]
}
}
}
Could someone tell me what is wrong with my query or mapping?

Since your date field is a sub-field called startTime.raw, you need to use that sub-field in your range query:
{
"range": {
"startTime.raw": { <----- change this
"from": "01-01-1980 06:32:51",
"to": "01-01-1980 06:39:51",
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
}
]
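For illustration, here is a minimal Python sketch that builds the corrected request body as a plain dictionary. The helper name build_start_time_query is hypothetical, and gte/lte are my substitution for from/to with both include flags set to true; the field names and values come from the question.

```python
import json

def build_start_time_query(org_id, start, end, date_field="startTime.raw"):
    """Build a bool query filtering on orgId and on a date range.

    date_field defaults to the date sub-field, not the text parent field.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"orgId": {"value": org_id}}},
                    # Inclusive bounds, written with gte/lte (an assumption of this sketch).
                    {"range": {date_field: {"gte": start, "lte": end}}},
                ]
            }
        }
    }

query = build_start_time_query(
    "backendorg", "01-01-1980 06:32:51", "01-01-1980 06:39:51"
)
print(json.dumps(query, indent=2))
```

The resulting dictionary can be sent as the body of a _search request with whatever client you already use.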

Related

Elasticsearch Multi-Term Auto Completion

I'm trying to implement the Multi-Term Auto Completion that's presented here.
Filtering down to the correct documents works, but when aggregating the completion_terms they are not filtered to those that match the current partial query, but instead include all completion_terms from any matched documents.
Here are the mappings:
{
"mappings": {
"dynamic" : "false",
"properties" : {
"completion_ngrams" : {
"type" : "text",
"analyzer" : "completion_ngram_analyzer",
"search_analyzer" : "completion_ngram_search_analyzer"
},
"completion_terms" : {
"type" : "keyword",
"normalizer" : "completion_normalizer"
}
}
}
}
Here are the settings:
{
"settings" : {
"index" : {
"analysis" : {
"filter" : {
"edge_ngram" : {
"type" : "edge_ngram",
"min_gram" : "1",
"max_gram" : "10"
}
},
"normalizer" : {
"completion_normalizer" : {
"filter" : [
"lowercase",
"german_normalization"
],
"type" : "custom"
}
},
"analyzer" : {
"completion_ngram_search_analyzer" : {
"filter" : [
"lowercase"
],
"tokenizer" : "whitespace"
},
"completion_ngram_analyzer" : {
"filter" : [
"lowercase",
"edge_ngram"
],
"tokenizer" : "whitespace"
}
}
}
}
}
}
}
I'm then indexing data like this:
{
"completion_terms" : ["Hammer", "Fortis", "Tool", "2000"],
"completion_ngrams": "Hammer Fortis Tool 2000"
}
Finally, the autocomplete search looks like this:
{
"query": {
"bool": {
"must": [
{
"term": {
"completion_terms": "fortis"
}
},
{
"term": {
"completion_terms": "hammer"
}
},
{
"match": {
"completion_ngrams": "too"
}
}
]
}
},
"aggs": {
"autocomplete": {
"terms": {
"field": "completion_terms",
"size": 100
}
}
}
}
This correctly returns documents matching the search string "fortis hammer too", but the aggregations include ALL completion terms that are included in any of the matched documents, e.g. for the query above:
"buckets": [
{ "key": "fortis" },
{ "key": "hammer" },
{ "key": "tool" },
{ "key": "2000" }
]
Ideally, I'd expect
"buckets": [
{ "key": "tool" }
]
I could filter out the terms that are already covered by the search query ("fortis" and "hammer" in this case) in the app, but the "2000" doesn't make any sense from a user's perspective, because it doesn't partially match any of the provided search terms.
I understand why this is happening, but I can't think of a solution. Can anyone help?
Try a filters aggregation:
{
"query": {
"bool": {
"must": [
{
"term": {
"completion_terms": "fortis"
}
},
{
"term": {
"completion_terms": "hammer"
}
},
{
"match": {
"completion_ngrams": "too"
}
}
]
}
},
"aggs": {
"findOuthammerAndfortis": {
"filters": {
"filters": {
"fortis": {
"term": {
"completion_terms": "fortis"
}
},
"hammer": {
"term": {
"completion_terms": "hammer"
}
}
}
}
}
}
}
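A different option, not taken from the answer above, might be to restrict the terms aggregation with its include parameter so that only terms starting with the partial word are returned. The Python sketch below builds such a request. The helper names are hypothetical, the reserved-character list is an assumption, and lowercasing the partial term only approximates the completion_normalizer from the question.

```python
# Characters escaped in the include pattern; this list is an assumption of the sketch.
_RESERVED = set('.?+*|{}[]()"\\#@&<>~')

def escape_regex(text):
    """Backslash-escape characters that have a meaning in the include pattern."""
    return "".join("\\" + ch if ch in _RESERVED else ch for ch in text)

def build_autocomplete_query(complete_terms, partial_term):
    """Complete words become term clauses; the partial word drives both
    the ngram match and the include pattern of the terms aggregation."""
    must = [{"term": {"completion_terms": t}} for t in complete_terms]
    must.append({"match": {"completion_ngrams": partial_term}})
    return {
        "query": {"bool": {"must": must}},
        "aggs": {
            "autocomplete": {
                "terms": {
                    "field": "completion_terms",
                    "size": 100,
                    # Keep only buckets whose key starts with the partial word.
                    "include": escape_regex(partial_term.lower()) + ".*",
                }
            }
        },
    }

body = build_autocomplete_query(["fortis", "hammer"], "too")
print(body["aggs"]["autocomplete"]["terms"]["include"])
```

With the sample document from the question, such a pattern would be expected to leave only the "tool" bucket, but I have not run it against a cluster.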

Elastic search dynamic field mapping with range query on price field

I have two fields in my Elasticsearch index: lowest_local_price and lowest_global_price.
I want to map a dynamic value to a third field, price, at run time, based on whether the country is local or global.
If the local country matches, I want to map the lowest_local_price value to the price field.
If a global country matches, I want to map the lowest_global_price value to the price field.
If either the local or a global country matches, I want to apply a range query on the price field and boost that document by 2.0.
Note: This is not a compulsory filter or query; if it matches, I just want to boost the document.
I have tried the solutions below, but they do not work for me.
Query 1:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price]]
],
"boost" => 2.0
]
]
];
Query 2:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price, "boost" => 2.0]]
],
]
]
];
Neither of them works for me, because neither boosts the document. I know that filter does not work with boost, so what is the solution for dynamic field mapping with a range query and a boost?
Please help me solve this.
Thank you in advance!
You can (most likely) achieve what you want without runtime_mappings by using a combination of bool queries, here's how.
Let's define a test mapping
We need to clarify what mapping we are working with, because different field types require different query types.
Let's assume that your mapping looks like this:
PUT my-index-000001
{
"mappings": {
"dynamic": "runtime",
"properties": {
"country_en_name": {
"type": "text"
},
"lowest_local_price": {
"type": "float"
},
"global_rates": {
"properties": {
"UK": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"FR": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"US": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
}
}
}
}
}
}
Note that country_en_name is of type text. In general such fields should be indexed as keyword, but to demonstrate the use of runtime_mappings I kept it as text and will show later how to overcome this limitation.
bool is the same as if for Elasticsearch
The query without runtime mappings might look like this:
POST my-index-000001/_search
{
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"country_en_name": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
This can be interpreted as the following:
Any document
OR (
(document with country_en_name=UK AND lowest_local_price >= X)
OR
(document with global_rates.UK.lowest_global_price >= X)
)[boost this part of OR]
The match_all is needed to return also documents that do not match the other queries.
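To make the country and the minimum price parameters rather than hard-coded values, the same structure can be generated in code. This is only a sketch in Python; the function name build_price_boost_query and its arguments are hypothetical, while the field names follow the test mapping above.

```python
def build_price_boost_query(country_code, min_price, boost=2.0):
    """Return every document, boosting those whose local or global price
    for the given country is at least min_price."""
    local_match = {
        "bool": {
            "must": [
                {"match": {"country_en_name": country_code}},
                {"range": {"lowest_local_price": {"gte": min_price}}},
            ]
        }
    }
    global_match = {
        "range": {
            f"global_rates.{country_code}.lowest_global_price": {"gte": min_price}
        }
    }
    return {
        "query": {
            "bool": {
                "should": [
                    # Keeps non-matching documents in the result set.
                    {"match_all": {}},
                    # Only this inner bool carries the boost.
                    {"bool": {"should": [local_match, global_match], "boost": boost}},
                ]
            }
        }
    }

body = build_price_boost_query("UK", 1000)
```

Calling build_price_boost_query("UK", 1000) reproduces the query shown above, with the boost written as 2.0.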
What will the response of the query look like?
Let's index some documents:
POST my-index-000001/_doc/1
{
"country_en_name": "UK",
"lowest_local_price": 1500,
"global_rates": {
"FR": {
"lowest_global_price": 1000
},
"US": {
"lowest_global_price": 1200
}
}
}
POST my-index-000001/_doc/2
{
"country_en_name": "FR",
"lowest_local_price": 900,
"global_rates": {
"UK": {
"lowest_global_price": 950
},
"US": {
"lowest_global_price": 1500
}
}
}
POST my-index-000001/_doc/3
{
"country_en_name": "US",
"lowest_local_price": 950,
"global_rates": {
"UK": {
"lowest_global_price": 1100
},
"FR": {
"lowest_global_price": 1000
}
}
}
Now the result of the search query above will be something like:
{
...
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 4.9616585,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 4.9616585,
"_source" : {
"country_en_name" : "UK",
"lowest_local_price" : 1500,
...
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "3",
"_score" : 3.0,
"_source" : {
"country_en_name" : "US",
"lowest_local_price" : 950,
"global_rates" : {
"UK" : {
"lowest_global_price" : 1100
},
...
}
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"country_en_name" : "FR",
"lowest_local_price" : 900,
"global_rates" : {
"UK" : {
"lowest_global_price" : 950
},
...
}
}
}
]
}
}
Note that the document with _id:2 is at the bottom because it didn't match any of the boosted queries.
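As a sanity check on that ordering, the boosted condition can be replayed over the three sample documents in plain Python. This sketch only mirrors the boolean logic of the query for country UK and a minimum price of 1000; it does not reproduce Elasticsearch scoring, and the function name is hypothetical.

```python
docs = {
    1: {"country_en_name": "UK", "lowest_local_price": 1500,
        "global_rates": {"FR": {"lowest_global_price": 1000},
                         "US": {"lowest_global_price": 1200}}},
    2: {"country_en_name": "FR", "lowest_local_price": 900,
        "global_rates": {"UK": {"lowest_global_price": 950},
                         "US": {"lowest_global_price": 1500}}},
    3: {"country_en_name": "US", "lowest_local_price": 950,
        "global_rates": {"UK": {"lowest_global_price": 1100},
                         "FR": {"lowest_global_price": 1000}}},
}

def matches_boosted_clause(doc, country, min_price):
    """True if the local branch or the global branch of the boosted bool matches."""
    local = (doc["country_en_name"] == country
             and doc["lowest_local_price"] >= min_price)
    global_price = (doc.get("global_rates", {})
                       .get(country, {})
                       .get("lowest_global_price"))
    return local or (global_price is not None and global_price >= min_price)

boosted_ids = [doc_id for doc_id, doc in docs.items()
               if matches_boosted_clause(doc, "UK", 1000)]
print(boosted_ids)  # [1, 3]
```

Documents 1 and 3 satisfy the boosted clause, and document 2 is returned only because of match_all.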
Will runtime_mappings be of any use?
Runtime mappings are useful when an existing mapping uses data types that do not allow a certain type of query. In previous versions (before 7.11) one would have to reindex in such cases, but now it is possible to use runtime mappings (though the query is more expensive).
In our case, country_en_name is indexed as text, which is suited to full-text search and not to exact lookups. We should use keyword instead. This is what the query may look like with the help of runtime_mappings:
POST my-index-000001/_search
{
"runtime_mappings": {
"country_en_name_keyword": {
"type": "keyword",
"script": {
"source": "emit(params['_source']['country_en_name'])"
}
}
},
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"country_en_name_keyword": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
Notice how we created a new runtime field country_en_name_keyword with type keyword and used a term lookup instead of match query.

ElasticSearch Aggregation Filter (not nested) Array

I have a mapping like this:
PUT myindex1/_mapping
{
"properties": {
"program":{
"properties":{
"rounds" : {
"properties" : {
"id" : {
"type" : "keyword"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
And my example docs:
POST myindex1/_doc
{
"program": {
"rounds":[
{"id":"00000000-0000-0000-0000-000000000000", "name":"Test1"},
{"id":"00000000-0000-0000-0000-000000000001", "name":"Fact2"}
]
}
}
POST myindex1/_doc
{
"program": {
"rounds":[
{"id":"00000000-0000-0000-0000-000000000002", "name":"Test3"},
{"id":"00000000-0000-0000-0000-000000000003", "name":"Fact4"}
]
}
}
POST myindex1/_doc
{
"program": {
"rounds":[
{"id":"00000000-0000-0000-0000-000000000004", "name":"Test5"},
{"id":"00000000-0000-0000-0000-000000000005", "name":"Fact6"}
]
}
}
Purpose: get only the names of rounds that match a wildcard supplied by the user.
Aggregation query:
GET myindex1/_search
{
"aggs": {
"result": {
"aggs": {
"names": {
"terms": {
"field": "program.rounds.name.keyword",
"size": 10000,
"order": {
"_key": "asc"
}
}
}
},
"filter": {
"bool": {
"must":[
{
"wildcard": {
"program.rounds.name": "*test*"
}
}
]
}
}
}
},
"size": 0
}
This aggregation returns all 6 names, but I need only Test1, Test3 and Test5. I also tried the "include": "/tes.*/i" regex pattern for terms, but ignore-case does not work.
Note: I'm not sure about the nested type, because I'm not interested in the association between id and name (at least for now).
ElasticSearch version: 7.7.0
If you want to only aggregate specific rounds based on a condition on the name field, then you need to make rounds nested, otherwise all name values end up in the same field.
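The difference can be simulated in plain Python. This is a simplified model of the behaviour, not Elasticsearch code: with object arrays the filter is applied per document and then all names of a matching document are counted, whereas a nested-style filter is applied per round. The function names are hypothetical and the ids are omitted for brevity.

```python
from collections import Counter

docs = [
    {"program": {"rounds": [{"name": "Test1"}, {"name": "Fact2"}]}},
    {"program": {"rounds": [{"name": "Test3"}, {"name": "Fact4"}]}},
    {"program": {"rounds": [{"name": "Test5"}, {"name": "Fact6"}]}},
]

def names(doc):
    return [r["name"] for r in doc["program"]["rounds"]]

def object_style_buckets(docs, needle):
    """A document passes if ANY of its names matches; the aggregation then
    counts ALL names of that document."""
    counts = Counter()
    for doc in docs:
        if any(needle in n.lower() for n in names(doc)):
            counts.update(names(doc))
    return sorted(counts)

def nested_style_buckets(docs, needle):
    """Each round is filtered on its own, so only matching names are counted."""
    counts = Counter()
    for doc in docs:
        counts.update(n for n in names(doc) if needle in n.lower())
    return sorted(counts)

print(object_style_buckets(docs, "test"))
print(nested_style_buckets(docs, "test"))
```

The first call returns all six names, the second only Test1, Test3 and Test5.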
Your mapping needs to be changed to this:
PUT myindex1/
{
"mappings": {
"properties": {
"program": {
"properties": {
"rounds": {
"type": "nested", <--- add this
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
And then your query needs to change to this:
GET myindex1/_search
{
"size": 0,
"query": {
"nested": {
"path": "program.rounds",
"query": {
"bool": {
"must": [
{
"wildcard": {
"program.rounds.name": "*Test*"
}
}
]
}
}
}
},
"aggs": {
"rounds": {
"nested": {
"path": "program.rounds"
},
"aggs": {
"name_filter": {
"filter": {
"wildcard": {
"program.rounds.name": "*Test*"
}
},
"aggs": {
"names": {
"terms": {
"field": "program.rounds.name.keyword",
"size": 10000,
"order": {
"_key": "asc"
}
}
}
}
}
}
}
}
}
And the result will be:
"aggregations" : {
"rounds" : {
"doc_count" : 6,
"name_filter" : {
"doc_count" : 3,
"names" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Test1",
"doc_count" : 1
},
{
"key" : "Test3",
"doc_count" : 1
},
{
"key" : "Test5",
"doc_count" : 1
}
]
}
}
}
}
UPDATE:
Actually, you can achieve what you want without introducing nested types by using the following query. You were close, but the include pattern was wrong:
GET myindex1/_search
{
"aggs": {
"result": {
"aggs": {
"names": {
"terms": {
"field": "program.rounds.name.keyword",
"size": 10000,
"include": "[Tt]est.*",
"order": {
"_key": "asc"
}
}
}
},
"filter": {
"bool": {
"must": [
{
"wildcard": {
"program.rounds.name": "*Test*"
}
}
]
}
}
}
},
"size": 0
}
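To see which keys a pattern such as [Tt]est.* would keep, and to generate a fully case-insensitive variant from user input, something like the Python sketch below could be used. It approximates the matching with Python's re module, which is an assumption: the include parameter uses Lucene regular expression syntax, which differs from Python's in some details but agrees for simple patterns like these. Both helper names are hypothetical.

```python
import re

names = ["Test1", "Fact2", "Test3", "Fact4", "Test5", "Fact6"]

def included(terms, pattern):
    """Emulate the include filter: the pattern must match the whole term."""
    return sorted(t for t in terms if re.fullmatch(pattern, t))

def case_insensitive_pattern(prefix):
    """Expand each letter into a two-character class, e.g. t -> [Tt],
    and allow any suffix."""
    parts = []
    for ch in prefix:
        if ch.isalpha():
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts) + ".*"

print(included(names, "[Tt]est.*"))
print(case_insensitive_pattern("test"))
```

The first call keeps Test1, Test3 and Test5; the second yields [Tt][Ee][Ss][Tt].*, which would also match keys such as TEST7.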

How to use transform {"type": "flatten"}, and get data buckets in vega-lite

I am trying to create a vega-lite/v2.json visualization with a transform ("type": "flatten"),
but it's not working.
data
code:
{
$schema: https://vega.github.io/schema/vega-lite/v2.json
data: {
url: {
index: elk_map_statistic-*
body: {
size: 0,
"query": {
"bool": {
"must": [
{"range": {
"timestamp": {
"gte": "now-6m",
"lt": "now"
}
}
}
]
}
},
"aggs": {
"group_by_host": {
"terms": {
"field": "DestBuilding.keyword",
"size": 20000
},
"aggs" : {
"DestRoom" : {
"terms" : {
"field" : "DestRoom.keyword",
"size" : 500
}
},
"avg" : { "avg" : { "field" : "avg" } },
"types_count" : { "value_count" : { "field" : "count" } }
}
}
}
}
}
format: {property: "aggregations.group_by_host.buckets"}
},
"transform": [
{
"type": "flatten",
"fields": ["brand.buckets"],
"as": ["val"]
}
],
mark: rect
"encoding": {
"x": {"field": "val.key", "type": "ordinal"},
"y": {"field": "key", "type": "ordinal"}
},
}
How do I use "type": "flatten" if I want to get the data in the buckets?
I found an example for Vega: https://github.com/vega/vega/issues/1155
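For reference, this is roughly what a flatten step is expected to do with the outer buckets, sketched in plain Python rather than Vega. The sample buckets are made up, and the sketch assumes the array to flatten is DestRoom.buckets, since that is the sub-aggregation named in the request body.

```python
def flatten(rows, field, as_name):
    """Emit one output row per element of the array found at `field`,
    copying the parent row and storing the element under `as_name`."""
    out = []
    for row in rows:
        value = row
        # Walk a dotted path such as "DestRoom.buckets".
        for part in field.split("."):
            value = value[part]
        for item in value:
            new_row = dict(row)
            new_row[as_name] = item
            out.append(new_row)
    return out

# Hypothetical outer buckets in the shape a terms aggregation returns.
buckets = [
    {"key": "BuildingA",
     "DestRoom": {"buckets": [{"key": "101", "doc_count": 3},
                              {"key": "102", "doc_count": 1}]}},
    {"key": "BuildingB",
     "DestRoom": {"buckets": [{"key": "201", "doc_count": 2}]}},
]

rows = flatten(buckets, "DestRoom.buckets", "val")
pairs = [(r["key"], r["val"]["key"]) for r in rows]
print(pairs)
```

Each output row then has key for the y axis and val.key for the x axis, which is the shape the encoding block above refers to.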

How to Query elasticsearch index with nested and non nested fields

I have an Elasticsearch index with the following mapping:
PUT /student_detail
{
"mappings" : {
"properties" : {
"id" : { "type" : "long" },
"name" : { "type" : "text" },
"email" : { "type" : "text" },
"age" : { "type" : "text" },
"status" : { "type" : "text" },
"tests":{ "type" : "nested" }
}
}
}
The data is stored in the form below:
{
"id": 123,
"name": "Schwarb",
"email": "abc#gmail.com",
"status": "current",
"age": 14,
"tests": [
{
"test_id": 587,
"test_score": 10
},
{
"test_id": 588,
"test_score": 6
}
]
}
I want to query the students where name LIKE '%warb%' AND email LIKE '%gmail.com%' AND the test with id 587 has a score > 5, and so on. At a high level, what I need looks something like the query below; I don't know what the actual query would be, so apologies for the messy pseudo-query.
GET developer_search/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "abc"
}
},
{
"nested": {
"path": "tests",
"query": {
"bool": {
"must": [
{
"term": {
"tests.test_id": IN [587]
}
},
{
"term": {
"tests.test_score": >= some value
}
}
]
}
}
}
}
]
}
}
}
The query must be flexible, so that we can enter dynamic test ids and their respective score filters along with non-nested fields such as age, name and status.
Something like this?
GET student_detail/_search
{
"query": {
"bool": {
"must": [
{
"wildcard": {
"name": {
"value": "*warb*"
}
}
},
{
"wildcard": {
"email": {
"value": "*gmail.com*"
}
}
},
{
"nested": {
"path": "tests",
"query": {
"bool": {
"must": [
{
"term": {
"tests.test_id": 587
}
},
{
"range": {
"tests.test_score": {
"gte": 5
}
}
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
Inner hits is what you are looking for.
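Since the question asks for dynamic test ids with their own score thresholds, one way to generate such a query is sketched below in Python. The function name and its arguments are hypothetical. Each test gets its own nested clause so that the id and the score are checked on the same nested object; inner_hits is left out of the sketch.

```python
def build_student_query(name_part, email_part, test_filters):
    """test_filters maps a test id to the minimum score required for it."""
    must = [
        {"wildcard": {"name": {"value": f"*{name_part}*"}}},
        {"wildcard": {"email": {"value": f"*{email_part}*"}}},
    ]
    for test_id, min_score in test_filters.items():
        # One nested clause per test: id and score must match in the same object.
        must.append({
            "nested": {
                "path": "tests",
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"tests.test_id": test_id}},
                            {"range": {"tests.test_score": {"gte": min_score}}},
                        ]
                    }
                },
            }
        })
    return {"query": {"bool": {"must": must}}}

body = build_student_query("warb", "gmail.com", {587: 5, 588: 6})
```

Further clauses for non-nested fields such as status or age could be appended to the same must list.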
You should use an Ngram Tokenizer here; wildcard search should be avoided for performance reasons, and I wouldn't recommend it.
Change your mapping to the one below, in which I've defined a custom analyzer.
When Elasticsearch (or rather Lucene) indexes a statement, it first breaks the statement or paragraph into words (tokens) and then indexes these tokens in the inverted index for that particular field. This process is called analysis, and it applies only to the text datatype.
So you only get a document back if the tokens you search for are present in the inverted index.
By default, the standard analyzer is applied. I've created my own analyzer that uses the Ngram Tokenizer, which creates many more tokens than just whole words.
The default analyzer would turn Life is beautiful into life, is, beautiful.
With the Ngram Tokenizer configured below (min_gram 3, max_gram 4), the tokens for Life would be Lif, Life and ife; case is preserved because the analyzer below has no lowercase filter.
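To get a feel for which tokens such a tokenizer produces, here is a small Python emulation. It is a sketch of the idea with min_gram 3, max_gram 4 and splitting on anything that is not a letter or digit; it is not the Lucene implementation, and the function name is hypothetical. Case is preserved, as it would be with an analyzer that defines no lowercase filter.

```python
import re

def ngram_tokens(text, min_gram=3, max_gram=4):
    """Split on non-letter/non-digit characters, then emit every substring
    of length min_gram..max_gram for each piece."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size])
    return tokens

print(ngram_tokens("Life"))             # ['Lif', 'Life', 'ife']
print("war" in ngram_tokens("Schwarb")) # True
print(ngram_tokens("abc#gmail.com"))
```

Because "war" is one of the tokens of Schwarb, a plain match query for war can find that document without any wildcard.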
Mapping:
PUT student_detail
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "ngram",
"min_gram": 3,
"max_gram": 4,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings" : {
"properties" : {
"id" : {
"type" : "long"
},
"name" : {
"type" : "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"email" : {
"type" : "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"age" : {
"type" : "text" <--- I am not sure why this is text. Change it to long or int. Would leave this to you
},
"status" : {
"type" : "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"tests":{
"type" : "nested"
}
}
}
}
Note that in the above mapping I've created a keyword sub-field for name, email and status, as below:
"name":{
"type":"text",
"analyzer":"my_analyzer",
"fields":{
"keyword":{
"type":"keyword"
}
}
}
Now your query could be as simple as below.
Query:
POST student_detail/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "war" <---- Note this. This would even return documents having "Schwarb"
}
},
{
"match": {
"email": "gmail" <---- Note this
}
},
{
"nested": {
"path": "tests",
"query": {
"bool": {
"must": [
{
"term": {
"tests.test_id": 587
}
},
{
"range": {
"tests.test_score": {
"gte": 5
}
}
}
]
}
}
}
}
]
}
}
}
Note that for exact matches I would use Term Queries on the keyword fields, while for normal searches (LIKE in SQL) I would use simple Match Queries on the text fields, provided they use the Ngram Tokenizer.
Also note that for >= and <= you need to use a Range Query.
Response:
{
"took" : 233,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 3.7260926,
"hits" : [
{
"_index" : "student_detail",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.7260926,
"_source" : {
"id" : 123,
"name" : "Schwarb",
"email" : "abc#gmail.com",
"status" : "current",
"age" : 14,
"tests" : [
{
"test_id" : 587,
"test_score" : 10
},
{
"test_id" : 588,
"test_score" : 6
}
]
}
}
]
}
}
Note that the document from your question appears in the response when I run the query.
Please do read the links I've shared. It is vital that you understand the concepts. Hope this helps!

Resources