Issues with null value in Elasticsearch

Here's an example of my data:
{
"MOD_DATE_START": "2010-04-20T15:05:49Z",
"MOD_DATE_END": null,
"MOD_ID": "123456789"
}
I'm having some issues with my Elasticsearch query. I have a couple of date fields on which I do range-based filtering, to make sure that my date is between the start and end dates.
My first query (which works well) filters on MOD_ID and MOD_DATE_START:
curl -s -XPOST http://server:9200/myindex/mytype/_search?pretty=true -d '
{
"fields": ["MOD_ID", "MOD_DATE_START", "MOD_DATE_END"],
"query": {
"bool": {
"must": [
{"term": {"MOD_ID": "123456789"}},
{"range": {"MOD_DATE_START": {"lte": "2012-04-20T15:05:49Z"}}}
]
}
}
}
'
The MOD_DATE_START field always contains a value, so the first query works well.
Since the second date field, MOD_DATE_END, is null in most cases, I would like to modify my query to add the following test:
IF "MOD_DATE_END" NOT NULL then
{"range": {"MOD_DATE_END": {"gte": "2012-04-20T15:05:49Z"}}}
ELSE skip "MOD_DATE_END"
However, I can't quite figure out how to add this third condition to my query so that the gte test works.
Thanks in advance for your help.

One way to achieve this is by using a missing filter in a filtered query.
Example below:
curl -s -XPOST http://server:9200/myindex/mytype/_search?pretty=true -d '
{
"fields": ["MOD_ID", "MOD_DATE_START", "MOD_DATE_END"],
"query": {
"filtered": {
"filter": {
"bool": {
"must": {
"range": {
"MOD_DATE_START": {
"lte": "2012-04-20T15:05:49Z"
}
}
},
"should": [
{
"missing": {
"field": "MOD_DATE_END",
"null_value": true,
"existence": true
}
},
{
"range": {
"MOD_DATE_END": {
"gte": "2012-04-20T15:05:49Z"
}
}
}
]
}
},
"query": {
"term": {
"MOD_ID": "123456789"
}
}
}
}
}
'
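The answer above relies on the missing filter and the filtered query, both of which later Elasticsearch versions dropped (as far as I know, both were gone by 5.0). As a rough sketch under that assumption, the Python below builds an equivalent request body in which "MOD_DATE_END is null or not earlier than the date" becomes a nested bool whose should clauses are a must_not/exists check and the range; a bool that has only should clauses needs at least one of them to match. It also emulates the intended logic locally so the null and non-null cases can be checked without a cluster. The helper names build_validity_query and matches are hypothetical, and sending the body is not shown.

```python
import json

REF_DATE = "2012-04-20T15:05:49Z"


def build_validity_query(mod_id, ref_date):
    """MOD_ID matches AND MOD_DATE_START <= ref_date AND
    (MOD_DATE_END is missing/null OR MOD_DATE_END >= ref_date)."""
    end_is_null_or_later = {
        "bool": {
            # Only should clauses here, so at least one of them must match.
            "should": [
                # null values are not indexed, so exists is false for them.
                {"bool": {"must_not": {"exists": {"field": "MOD_DATE_END"}}}},
                {"range": {"MOD_DATE_END": {"gte": ref_date}}},
            ]
        }
    }
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"MOD_ID": mod_id}},
                    {"range": {"MOD_DATE_START": {"lte": ref_date}}},
                    end_is_null_or_later,
                ]
            }
        }
    }


def matches(doc, mod_id, ref_date):
    """Local emulation of the same logic, for illustration only.
    ISO-8601 UTC strings in one identical format compare correctly as strings."""
    if doc.get("MOD_ID") != mod_id:
        return False
    start = doc.get("MOD_DATE_START")
    if start is None or start > ref_date:
        return False
    end = doc.get("MOD_DATE_END")
    return end is None or end >= ref_date


if __name__ == "__main__":
    print(json.dumps(build_validity_query("123456789", REF_DATE), indent=2))
```

Putting the OR in its own bool keeps it from interacting with the surrounding filter clauses, whatever the default minimum_should_match is for the outer bool.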

Related

elasticsearch must query combine OR?

I have been trying to use a must query with bool but I am failing to get the results.
In pseudo-SQL:
SELECT * FROM info WHERE (ulevel = '1.3.10' OR ulevel = '1.3.6') AND (@timestamp BETWEEN '2017-06-05T07:00:00.000Z' AND '2017-06-05T07:20:00.000Z')
Here is what I have:
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "_all",
"query": "*"
},
"range": {
"@timestamp": {
"from": "2017-06-05T07:00:00.000Z",
"to": "2017-06-05T07:20:00.000Z"
}
},
"bool": {
"should": [
{"term": { "ulevel": "1.3.10"}},
{"term": { "ulevel": "1.3.6"}}
]
}
}
]
}
}
Does anyone have a solution?
Thank you so much.
You can use a terms query for the first part and a range query for the second part:
GET _search
{
"query": {
"bool": {
"must": [
{
"terms": {
"ulevel": [
"1.3.10",
"1.3.6"
]
}
},
{
"range": {
"@timestamp": {
"gte": "2017-06-05T07:00:00.000Z",
"lte": "2017-06-05T07:20:00.000Z"
}
}
}
]
}
},
"from": 0,
"size": 20
}
Some notes:
The terms query matches documents whose field contains any of the provided terms (the terms are not analyzed).
You can also use date-specific syntax with the range query. See the range query page https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#ranges-on-dates for more information.
Update:
Added from and size in response to a question in the comments.
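A small sketch of how that body maps back to the pseudo-SQL: the terms clause plays the role of the OR over ulevel values, and the inclusive gte/lte range plays the role of BETWEEN. The helper names are hypothetical, and the time field defaults to @timestamp, which I am assuming is the intended Logstash-style name; the emulate function only illustrates the boolean logic locally and is not how Elasticsearch evaluates the query.

```python
import json


def build_ulevel_query(levels, start, end, time_field="@timestamp", size=20):
    """Rough equivalent of:
    WHERE ulevel IN (levels) AND time_field BETWEEN start AND end"""
    return {
        "query": {
            "bool": {
                "must": [
                    # terms: the field must equal ANY of the listed values (OR).
                    {"terms": {"ulevel": list(levels)}},
                    # range with gte/lte: both bounds inclusive, like BETWEEN.
                    {"range": {time_field: {"gte": start, "lte": end}}},
                ]
            }
        },
        "from": 0,
        "size": size,
    }


def emulate(doc, levels, start, end, time_field="@timestamp"):
    """Local illustration of the same AND/OR logic on one document."""
    return doc["ulevel"] in levels and start <= doc[time_field] <= end


if __name__ == "__main__":
    body = build_ulevel_query(
        ["1.3.10", "1.3.6"], "2017-06-05T07:00:00.000Z", "2017-06-05T07:20:00.000Z"
    )
    print(json.dumps(body, indent=2))
```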

Elasticsearch combining queries

I've looked at the other examples on StackOverflow and on the Elastic site, but I cannot get this combined Elasticsearch query to work.
The projectName and timestamp queries work individually, but this combined query does not:
curl -XGET "http://localhost/jenkins/_search/exists" -d'{"query" : {"bool": {"must": [{"match": {"data.projectName": {"query": "QA_Deployment","type": "phrase"}}}]},{"range": {"@timestamp": {"gte": "now-30d","lte": "now"}}}}}'
I changed two things. First, there was a space missing after -d; I'm not sure that is a problem, though. Second, and more importantly, the range query must also be inside the bool > must array. This should work:
curl -XGET "http://localhost/jenkins/_search/exists" -d '
{
"query" : {
"bool": {
"must": [
{
"match": {
"data.projectName": {
"query": "QA_Deployment",
"type": "phrase"
}
}
},
{
"range": {
"@timestamp": {
"gte": "now-30d",
"lte": "now"
}
}
}
]
}
}
}'
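For later versions, a sketch of the same combination built in Python: both clauses sit in one bool.must array, so both must match. Assumptions, not confirmed by the thread: Elasticsearch 5.0 or later, where as far as I know the _search/exists endpoint was removed (a normal _search with size 0 and terminate_after 1 is the usual replacement), and match_phrase replaces the "type": "phrase" option of match. The time field is assumed to be Logstash's @timestamp, and the helper name is hypothetical.

```python
import json


def build_project_exists_query(project, since="now-30d", until="now"):
    """Body for a plain _search that only checks whether any document matches."""
    return {
        # Existence check: no hits returned, stop after the first match per shard.
        "size": 0,
        "terminate_after": 1,
        "query": {
            "bool": {
                "must": [
                    # Phrase match on the project name.
                    {"match_phrase": {"data.projectName": project}},
                    # Same bool.must array, so this clause must match as well.
                    {"range": {"@timestamp": {"gte": since, "lte": until}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_project_exists_query("QA_Deployment"), indent=2))
```

A non-zero hits.total in the response would then mean at least one matching document exists.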

How to dynamically assign value in elasticsearch query?

I have a parent-child mapping. Can I dynamically assign a value in an Elasticsearch query? I want max_children and min_children to be evaluated per parent. I've tried something like this, but it is not working:
"min_children": {
"script": "doc['boxes_count'].value"
},
"max_children": {
"script": "doc['boxes_count'].value"
}
My current query looks like this, and it works:
curl -XPOST 'localhost:9200/star_cars_development_car_washes/_search?pretty' -d '
{
"query": {
"has_child" : {
"type" : "reservation",
"min_children": 10,
"max_children": 10,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"range": {
"date": {"lte": "2015-07-21T03:36:00.000+03:00"}
}
},
{
"range": {
"ends_at": {"gte": "2015-07-21T03:36:00.000+03:00"}
}
}
]
}
}
}
}
}
}
}'

How to include a combination of term and terms filters inside a single bool filter in elastic search?

I am using Logstash to store logs in Elasticsearch. I want to get logs that have a particular severity label, fall between certain timestamps, and match a specific message. The curl query I wrote is:
curl -XPOST 'localhost:9200/logstash-2015.06.19/_search/?pretty' -d '{
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"@message": "session"
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"gte": "2015-06-19T10:11:44.000Z",
"lte": "2015-06-19T11:11:44.000Z"
}
}
},
{
"term": {
"@app": "sparta"
}
},
{
"terms": {
"@severityLabel": [
"INFO",
"WARN",
"ERROR",
"FATAL",
"OFF"
]
}
}
]
}
}
}
} } '
It always shows zero matched documents. I am using a term filter as a sibling of a terms filter; is that a problem?
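This question has no answer here. One likely cause, offered only as an assumption because the mapping is not shown: if the severity label field is an analyzed string, the default standard analyzer lowercases INFO to info at index time, while term and terms filters compare their values verbatim, so the uppercase values never match. Sibling term and terms filters inside one bool must are fine in themselves. The sketch below imitates the mismatch with a crude lowercase tokenizer, which is an approximation and not the real standard analyzer; LEVELS copies the values from the query above.

```python
import re

LEVELS = ["INFO", "WARN", "ERROR", "FATAL", "OFF"]


def crude_standard_analyze(text):
    """Very rough stand-in for the standard analyzer: split on non-word
    characters and lowercase each token."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def terms_filter_matches(indexed_tokens, values):
    """term/terms filters look up the given values as-is (no analysis)."""
    return any(v in indexed_tokens for v in values)


if __name__ == "__main__":
    tokens = crude_standard_analyze("INFO")  # what an analyzed field would store
    print(tokens)                                                     # ['info']
    print(terms_filter_matches(tokens, LEVELS))                       # False
    print(terms_filter_matches(tokens, [v.lower() for v in LEVELS]))  # True
```

If this is the cause, lowercasing the filter values, or mapping the field as not_analyzed (keyword in later versions), should make the filter match.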

Find documents with empty string value on elasticsearch

I've been trying to use Elasticsearch to filter only those documents that contain an empty string in their body. So far I'm having no luck.
Before I go on, I should mention that I've already tried the many "solutions" spread around the Interwebz and StackOverflow.
So, below is the query that I'm trying to run, followed by its variations:
{
"query": {
"filtered":{
"filter": {
"bool": {
"must_not": [
{
"missing":{
"field":"_textContent"
}
}
]
}
}
}
}
}
I've also tried the following:
{
"query": {
"filtered":{
"filter": {
"bool": {
"must_not": [
{
"missing":{
"field":"_textContent",
"existence":true,
"null_value":true
}
}
]
}
}
}
}
}
And the following:
{
"query": {
"filtered":{
"filter": {
"missing": {"field": "_textContent"}
}
}
}
}
None of the above worked. I get an empty result set, even though I know for sure that there are records that contain an empty string field.
If anyone can provide me with any help at all, I'll be very grateful.
Thanks!
If you are using the default (standard) analyzer, an empty string produces no tokens, so there is nothing in the index to match. You need to index the field verbatim (not analyzed). Here is an example:
Add a mapping that indexes the field untokenized. If you also need a tokenized copy of the field, you can use a multi-field type.
PUT http://localhost:9200/test/_mapping/demo
{
"demo": {
"properties": {
"_content": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
Next, index a couple of documents.
POST http://localhost:9200/test/demo/1/
{
"_content": ""
}
POST http://localhost:9200/test/demo/2
{
"_content": "some content"
}
Execute a search:
POST http://localhost:9200/test/demo/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"_content": ""
}
}
}
}
}
Returns the document with the empty string.
{
took: 2,
timed_out: false,
_shards: {
total: 5,
successful: 5,
failed: 0
},
hits: {
total: 1,
max_score: 0.30685282,
hits: [
{
_index: test,
_type: demo,
_id: 1,
_score: 0.30685282,
_source: {
_content: ""
}
}
]
}
}
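To make the reasoning above concrete: an analyzed field turns "" into zero tokens, so a term query has nothing to hit, while a not_analyzed field stores the whole value, "" included, as a single token. The Python below imitates both behaviours with simplified stand-ins (the tokenizer is a crude approximation); it illustrates the idea and is not Elasticsearch's actual analysis code. DOCS mirrors the two documents indexed above.

```python
import re


def analyzed_tokens(value):
    """Simplified analyzed field: word tokens, lowercased. "" yields []."""
    return [t.lower() for t in re.findall(r"\w+", value)]


def not_analyzed_tokens(value):
    """not_analyzed (keyword) field: the whole value is one token."""
    return [value]


def term_query_hits(docs, field, term, tokenizer):
    """Ids of docs whose indexed tokens for `field` contain `term` exactly."""
    return [doc_id for doc_id, src in docs.items()
            if field in src and term in tokenizer(src[field])]


DOCS = {"1": {"_content": ""}, "2": {"_content": "some content"}}

if __name__ == "__main__":
    print(term_query_hits(DOCS, "_content", "", analyzed_tokens))      # []
    print(term_query_hits(DOCS, "_content", "", not_analyzed_tokens))  # ['1']
```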
I found a solution here: https://github.com/elastic/elasticsearch/issues/7515
It works without reindexing.
PUT t/t/1
{
"textContent": ""
}
PUT t/t/2
{
"textContent": "foo"
}
GET t/t/_search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "textContent"
}
}
],
"must_not": [
{
"wildcard": {
"textContent": "*"
}
}
]
}
}
}
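Why this works, as I understand it: exists matches any document that has a non-null value for the field (an empty string still counts as a value), while wildcard * only matches documents that have at least one indexed term, and an analyzed "" has none. So "exists but not *" leaves the empty-string documents. The local sketch below mirrors that combination with a simplified tokenizer; the helper names are hypothetical, and it illustrates the logic rather than how Elasticsearch evaluates queries internally.

```python
import re


def tokens(value):
    """Simplified analyzed text field: word tokens, lowercased."""
    return [t.lower() for t in re.findall(r"\w+", value)]


def exists(src, field):
    """exists: the field is present with a non-null value ("" counts)."""
    return src.get(field) is not None


def wildcard_star(src, field):
    """wildcard "*": true only if the field has at least one indexed term."""
    return exists(src, field) and len(tokens(src[field])) > 0


def empty_string_docs(docs, field):
    """bool: must exists AND must_not wildcard "*"."""
    return [doc_id for doc_id, src in docs.items()
            if exists(src, field) and not wildcard_star(src, field)]


DOCS = {"1": {"textContent": ""}, "2": {"textContent": "foo"}, "3": {}}

if __name__ == "__main__":
    print(empty_string_docs(DOCS, "textContent"))  # ['1']
```

Document 3 has no textContent at all, so the exists clause keeps it out even though it also has no terms.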
Even with the default analyzer you can do this kind of search: use a script filter, which is slower but can handle the empty string:
curl -XPOST 'http://localhost:9200/test/demo/_search' -d '
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "_source._content.length() == 0"
}
}
}
}
}'
It returns the document with an empty string as _content, without a special mapping.
As pointed out by @js_gandalf, this no longer works in ES 5.0 and later, where the filtered query is gone. Instead, you should use query -> bool -> filter -> script, as in https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
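A sketch of that query -> bool -> filter -> script shape, assuming Elasticsearch 6.x or later with Painless and a keyword sub-field. The field name content.keyword is hypothetical (dynamic mapping typically adds such a .keyword sub-field to strings); adjust it to your mapping. The Python only builds the request body, and the field name is passed through params so the script text stays fixed.

```python
import json


def build_empty_string_script_query(field="content.keyword"):
    """bool -> filter -> script: keep docs whose keyword value is ""."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "script": {
                        "script": {
                            "lang": "painless",
                            # Check size() first: reading .value on a document
                            # without a value raises an error in recent versions.
                            "source": "doc[params.field].size() > 0 && "
                                      "doc[params.field].value.isEmpty()",
                            "params": {"field": field},
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_empty_string_script_query(), indent=2))
```

Script filters run per document, so like the original answer this is slower than a term query on a keyword field.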
For those of you using Elasticsearch 5.2 or above and still stuck: the easiest way is to reindex your data with the keyword type. After that, all searches for empty values worked. Like this:
"query": {
"term": {"MY_FIELD_TO_SEARCH": ""}
}
After I reindexed my database and reran the query, it worked =)
The problem was that my field was of type text, NOT keyword. I changed the mapping to keyword and reindexed:
curl -X PUT https://username:password@host.io:9200/mycoolindex
curl -X PUT https://user:pass@host.io:9200/mycoolindex/_mapping/mycooltype -d '{
"properties": {
"MY_FIELD_TO_SEARCH": {
"type": "keyword"
}
}
}'
curl -X POST https://username:password@host.io:9200/_reindex -d '{
"source": {
"index": "oldindex"
},
"dest": {
"index": "mycoolindex"
}
}'
I hope this helps someone who was as stuck as I was finding those empty values.
Or, using Lucene query string syntax:
q=yourfield.keyword:""
See the Elasticsearch Reference: https://www.elastic.co/guide/en/elasticsearch/reference/6.5/query-dsl-query-string-query.html#query-string-syntax
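When that q= form is sent from a script or a shell, the colon and quotes need URL-encoding. A small sketch using Python's standard urllib.parse.urlencode; the helper name, host, and index name my_index are placeholders.

```python
from urllib.parse import urlencode


def search_url(base, index, lucene_query):
    """Build a URI-search URL; urlencode percent-escapes ':' and '"'."""
    return f"{base}/{index}/_search?{urlencode({'q': lucene_query})}"


if __name__ == "__main__":
    print(search_url("http://localhost:9200", "my_index", 'yourfield.keyword:""'))
    # http://localhost:9200/my_index/_search?q=yourfield.keyword%3A%22%22
```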
Whether you can find an empty string in a field depends heavily on the field's mapping, in other words its index/analyzer setting.
If the field is not_analyzed, which means its token is just the empty string, you can simply use a term query to find it, as follows:
{"from": 0, "size": 100, "query":{"term": {"name":""}}}
Otherwise, if the field is analyzed, I believe most analyzers treat an empty string as a null value, so you can use the missing filter to find the empty string:
{"filter": {"missing": {"existence": true, "field": "name", "null_value": true}}, "query": {"match_all": {}}}
Here is a gist you can reference: https://gist.github.com/hxuanji/35b982b86b3601cb5571
BTW, I checked the commands you provided, and it seems you DON'T want the empty-string documents.
All of my commands above find those documents, so just put them into the must_not part of a bool query.
My ES is 1.0.1.
For ES 1.3.0, the gist I provided currently cannot find the empty string. It seems this has been reported: https://github.com/elasticsearch/elasticsearch/issues/7348 . Let's wait and see how it goes.
Anyway, that issue also provides another command to find them:
{ "query": {
"filtered": {
"filter": {
"not": {
"filter": {
"range": {
"name": {
}
}
}
}
}
} } }
name is the field name to find the empty-string. I've tested it on ES 1.3.2.
I'm using Elasticsearch 5.3 and was having trouble with some of the above answers.
The following body worked for me.
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"inline": "doc['city'].empty",
"lang": "painless"
}
}
}
}
}
}
Note: you might need to enable fielddata for text fields, since it is disabled by default. However, I would read https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html before doing so.
To enable fielddata for a field, e.g. 'city' on index 'business' with type name 'record', you need:
PUT business/_mapping/record
{
"properties": {
"city": {
"type": "text",
"fielddata": true
}
}
}
If you don't want to or can't re-index there is another way. :-)
You can use the negation operator together with the wildcard *, which matches any non-blank value, so the query returns documents where the field is blank or missing:
GET /my_index/_search?q=!(fieldToLookFor:*)
For nested fields use:
curl -XGET "http://localhost:9200/city/_search?pretty=true" -d '{
"query" : {
"nested" : {
"path" : "country",
"score_mode" : "avg",
"query" : {
"bool": {
"must_not": {
"exists": {
"field": "country.name"
}
}
}
}
}
}
}'
NOTE: path and field together determine what is searched. Change them as required for your data.
For regular fields:
curl -XGET 'http://localhost:9200/city/_search?pretty=true' -d'{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "name"
}
}
}
}
}'
I didn't manage to search for empty strings in a text field. However it seems to work with a field of type keyword. So I suggest the following:
DELETE /test_idx
PUT /test_idx
{
"mappings" : {
"testMapping": {
"properties" : {
"tag" : {"type":"text"},
"content" : {"type":"text",
"fields" : {
"x" : {"type" : "keyword"}
}
}
}
}
}
}
PUT /test_idx/testMapping/1
{
"tag": "null"
}
PUT /test_idx/testMapping/2
{
"tag": "empty",
"content": ""
}
GET /test_idx/testMapping/_search
{
"query": {
"match": {"content.x": ""}
}
}
You need to trigger the keyword indexer by adding .content to your field name. Depending on how the original index was set up, the following "just works" for me using AWS Elasticsearch v6.x:
GET /my_idx/_search?q=my_field.content:""
I was trying to find empty fields (in indices with dynamic mapping) and set them to a default value, and the following worked for me.
Note that this is Elasticsearch 7.x:
POST <index_name|pattern>/_update_by_query
{
"script": {
"lang": "painless",
"source": """
if (ctx._source.<field_name> == "") {
ctx._source.<field_name> = "0";
} else {
ctx.op = "noop";
}
"""
}
}
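A variation on the script above, sketched in Python: instead of editing the &lt;field_name&gt; placeholders by hand, the field name and default value are passed as script params, and json.dumps yields a plain JSON body (the triple-quoted """ string above is Kibana Console syntax, which curl will not accept). Assumptions: a top-level field (a dotted name would not traverse nested objects via ctx._source[...]), and the helper name is hypothetical. The body would go to POST &lt;index_name|pattern&gt;/_update_by_query.

```python
import json


def build_set_default_body(field, default="0"):
    """_update_by_query body: replace "" in `field` with `default`;
    every other document is marked as a no-op."""
    return {
        "script": {
            "lang": "painless",
            # A missing field reads as null, which is not '', so it becomes a noop.
            "source": (
                "if (ctx._source[params.field] == '') {"
                " ctx._source[params.field] = params.value; "
                "} else { ctx.op = 'noop'; }"
            ),
            "params": {"field": field, "value": default},
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_set_default_body("field_name"), indent=2))
```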
I followed one of the responses in this thread and came up with the following, which does the same thing:
POST index_pattern*/_update_by_query
{
"script": {
"source": "ctx._source.field_name='0'",
"lang": "painless"
},
"query": {
"bool": {
"must": [
{
"exists": {
"field": "field_name"
}
}
],
"must_not": [
{
"wildcard": {
"field_name": "*"
}
}
]
}
}
}
I was also trying to find the documents in the index that don't have the field and give them a value.
One of the responses in this thread helped me come up with the following:
POST index_pattern*/_update_by_query
{
"script": {
"source": "ctx._source.field_name='0'",
"lang": "painless"
},
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "field_name"
}
}
]
}
}
}
Thanks to every one who contributed to this thread I am able to solve my problem
