Find documents with empty string value on Elasticsearch

I've been trying to get Elasticsearch to return only those documents that contain an empty string in their body. So far I'm having no luck.
Before I go on, I should mention that I've already tried the many "solutions" spread around the Interwebz and StackOverflow.
So, below is the query that I'm trying to run, followed by its variants:
{
"query": {
"filtered":{
"filter": {
"bool": {
"must_not": [
{
"missing":{
"field":"_textContent"
}
}
]
}
}
}
}
}
I've also tried the following:
{
"query": {
"filtered":{
"filter": {
"bool": {
"must_not": [
{
"missing":{
"field":"_textContent",
"existence":true,
"null_value":true
}
}
]
}
}
}
}
}
And the following:
{
"query": {
"filtered":{
"filter": {
"missing": {"field": "_textContent"}
}
}
}
}
None of the above worked. I get an empty result set even though I know for sure that there are records containing an empty-string field.
If anyone can provide me with any help at all, I'll be very grateful.
Thanks!

If you are using the default analyzer (standard), an empty string gives it nothing to analyze, so no term is indexed for the field. You therefore need to index the field verbatim (not analyzed). Here is an example:
First, add a mapping that indexes the field untokenized. If you also need a tokenized copy of the field, you can use a multi-field type.
PUT http://localhost:9200/test/_mapping/demo
{
"demo": {
"properties": {
"_content": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
Next, index a couple of documents.
POST http://localhost:9200/test/demo/1
{
"_content": ""
}
POST http://localhost:9200/test/demo/2
{
"_content": "some content"
}
Execute a search:
POST http://localhost:9200/test/demo/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"_content": ""
}
}
}
}
}
Returns the document with the empty string.
{
took: 2,
timed_out: false,
_shards: {
total: 5,
successful: 5,
failed: 0
},
hits: {
total: 1,
max_score: 0.30685282,
hits: [
{
_index: test,
_type: demo,
_id: 1,
_score: 0.30685282,
_source: {
_content: ""
}
}
]
}
}
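Why the not_analyzed mapping matters can be illustrated without a cluster. The sketch below is only a crude stand-in, not Elasticsearch's real analysis chain: approx_analyzed_terms is a hypothetical helper that lowercases and splits on non-word characters, while verbatim_terms keeps the whole value as one term. Under these assumptions an empty string produces no terms when analyzed, so a term filter has nothing to match.

```python
import re


def approx_analyzed_terms(value):
    """Crude approximation of an analyzed string field: lowercase word tokens.

    This is an assumption for illustration, not the real standard analyzer.
    """
    return [token.lower() for token in re.findall(r"\w+", value)]


def verbatim_terms(value):
    """Model of a not_analyzed field: the whole value is the single indexed term."""
    return [value]


def term_matches(indexed_terms, wanted):
    """A term filter matches only if the exact term was indexed."""
    return wanted in indexed_terms


if __name__ == "__main__":
    for content in ["", "some content"]:
        print(repr(content),
              "analyzed:", approx_analyzed_terms(content),
              "verbatim:", verbatim_terms(content))
    # Searching for "" only succeeds against the verbatim terms.
    print(term_matches(approx_analyzed_terms(""), ""))
    print(term_matches(verbatim_terms(""), ""))
```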

Found a solution here: https://github.com/elastic/elasticsearch/issues/7515
It works without reindexing.
PUT t/t/1
{
"textContent": ""
}
PUT t/t/2
{
"textContent": "foo"
}
GET t/t/_search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "textContent"
}
}
],
"must_not": [
{
"wildcard": {
"textContent": "*"
}
}
]
}
}
}
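If you build request bodies in code, the same exists + must_not wildcard shape can be generated for any field. This is a minimal sketch; empty_string_query is a made-up helper name, and it only builds the body, it does not talk to a cluster.

```python
import json


def empty_string_query(field):
    """Build the exists + must_not wildcard body for the given field name."""
    return {
        "query": {
            "bool": {
                # The field must be present in the document ...
                "must": [{"exists": {"field": field}}],
                # ... but must not contain any non-empty term.
                "must_not": [{"wildcard": {field: "*"}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(empty_string_query("textContent"), indent=2))
```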

Even with the default analyzer you can do this kind of search: use a script filter, which is slower but can handle the empty string:
curl -XPOST 'http://localhost:9200/test/demo/_search' -d '
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "_source._content.length() == 0"
}
}
}
}
}'
It will return the document with an empty string as _content, without a special mapping.
As pointed out by @js_gandalf, this is deprecated for ES > 5.0. Instead you should use query->bool->filter->script, as in https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
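The query->bool->filter->script nesting can be sketched as a small builder. Assumptions: script_filter_query is a hypothetical helper; the script text is passed through unchanged because whether a given expression is accepted depends on your version and script language; and the key holding the script text is a parameter, since another answer in this thread uses "inline" on 5.x while, as far as I know, newer releases call it "source".

```python
def script_filter_query(script_source, lang="painless", source_key="source"):
    """Wrap a caller-supplied script in query -> bool -> filter -> script.

    source_key is "source" by default; pass "inline" for older 5.x clusters
    (an assumption based on the other answers in this thread).
    """
    return {
        "query": {
            "bool": {
                "filter": {
                    "script": {
                        "script": {source_key: script_source, "lang": lang}
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # The script expression here is only a placeholder taken from this thread.
    print(script_filter_query("doc['city'].empty", source_key="inline"))
```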

For those of you using Elasticsearch 5.2 or above and still stuck: the easiest way is to reindex your data correctly with the keyword type. After that, all my searches for empty values worked. Like this:
"query": {
"term": {"MY_FIELD_TO_SEARCH": ""}
}
Actually, once I reindexed my database and reran the query, it worked =)
The problem was that my field was of type text and NOT keyword. I changed the mapping to keyword and reindexed:
curl -X PUT https://username:password@host.io:9200/mycoolindex
curl -X PUT https://user:pass@host.io:9200/mycoolindex/_mapping/mycooltype -d '{
"properties": {
"MY_FIELD_TO_SEARCH": {
"type": "keyword"
}
}
}'
curl -X POST https://username:password@host.io:9200/_reindex -d '{
"source": {
"index": "oldindex"
},
"dest": {
"index": "mycoolindex"
}
}'
I hope this helps someone who was as stuck as I was finding those empty values.

Or use the Lucene query string syntax:
q=yourfield.keyword:""
See the Elasticsearch reference: https://www.elastic.co/guide/en/elasticsearch/reference/6.5/query-dsl-query-string-query.html#query-string-syntax
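When the q= form is sent from code rather than typed into a console, the quotes and colon need URL-encoding. A small sketch using only the Python standard library; search_url is a hypothetical helper, and the host and index names are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def search_url(base, index, query_string):
    """Return a _search URL with the query string safely URL-encoded."""
    encoded = urlencode({"q": query_string})
    return base + "/" + index + "/_search?" + encoded


if __name__ == "__main__":
    url = search_url("http://localhost:9200", "my_index", 'yourfield.keyword:""')
    print(url)
    # Decoding the URL again gives back the original query string.
    print(parse_qs(urlsplit(url).query)["q"][0])
```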

Whether you can find an empty string in a field depends heavily on the field's mapping, in other words on its index/analyzer settings.
If its index setting is not_analyzed, the indexed token is just the empty string, so you can find it with a term query, as follows:
{"from": 0, "size": 100, "query":{"term": {"name":""}}}
Otherwise, if the index setting is analyzed, I believe most analyzers will treat the empty string as a null value, so you can use the missing filter to find it:
{"filter": {"missing": {"existence": true, "field": "name", "null_value": true}}, "query": {"match_all": {}}}
Here is a gist script you can refer to: https://gist.github.com/hxuanji/35b982b86b3601cb5571
BTW, I checked the commands you provided, and it seems you DON'T want the empty-string documents.
All my commands above find exactly those documents, so just put them into the must_not part of a bool query.
My ES is 1.0.1.
On ES 1.3.0, the gist I provided currently cannot find the empty string. It seems this has been reported: https://github.com/elasticsearch/elasticsearch/issues/7348 . Let's wait and see how it goes.
Anyway, that issue also provides another command to find them:
{ "query": {
"filtered": {
"filter": {
"not": {
"filter": {
"range": {
"name": {
}
}
}
}
}
} } }
Here name is the field to check for an empty string. I've tested it on ES 1.3.2.

I'm using Elasticsearch 5.3 and was having trouble with some of the above answers.
The following body worked for me.
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"inline": "doc['city'].empty",
"lang": "painless"
}
}
}
}
}
}
Note: you might need to enable the fielddata for text fields, it is disabled by default. Although I would read this: https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html before doing so.
To enable the fielddata for a field e.g. 'city' on index 'business' with type name 'record' you need:
PUT business/_mapping/record
{
"properties": {
"city": {
"type": "text",
"fielddata": true
}
}
}

If you don't want to or can't re-index there is another way. :-)
You can use the negation operator together with a wildcard (*), which matches any non-blank string:
GET /my_index/_search?q=!(fieldToLookFor:*)

For nested fields use:
curl -XGET "http://localhost:9200/city/_search?pretty=true" -d '{
"query" : {
"nested" : {
"path" : "country",
"score_mode" : "avg",
"query" : {
"bool": {
"must_not": {
"exists": {
"field": "country.name"
}
}
}
}
}
}
}'
NOTE: path and field together determine what is searched. Change them as required for your data.
For regular fields:
curl -XGET 'http://localhost:9200/city/_search?pretty=true' -d'{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "name"
}
}
}
}
}'
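The two request bodies above differ only in the nested wrapper, so they can come from one builder. A minimal sketch; missing_field_query is a made-up name, and score_mode is simply copied from the nested example above.

```python
def missing_field_query(field, nested_path=None):
    """Build a must_not/exists query, wrapped in nested when a path is given."""
    must_not_exists = {"bool": {"must_not": {"exists": {"field": field}}}}
    if nested_path is None:
        # Regular (non-nested) field.
        return {"query": must_not_exists}
    # Nested field: the field name must include the path prefix.
    return {
        "query": {
            "nested": {
                "path": nested_path,
                "score_mode": "avg",
                "query": must_not_exists,
            }
        }
    }


if __name__ == "__main__":
    print(missing_field_query("name"))
    print(missing_field_query("country.name", nested_path="country"))
```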

I didn't manage to search for empty strings in a text field. However it seems to work with a field of type keyword. So I suggest the following:
DELETE /test_idx
PUT /test_idx
{
"mappings" : {
"testMapping": {
"properties" : {
"tag" : {"type":"text"},
"content" : {"type":"text",
"fields" : {
"x" : {"type" : "keyword"}
}
}
}
}
}
}
PUT /test_idx/testMapping/1
{
"tag": "null"
}
PUT /test_idx/testMapping/2
{
"tag": "empty",
"content": ""
}
GET /test_idx/testMapping/_search
{
"query" : {
"match" : {"content.x" : ""}
}
}

You need to trigger the keyword indexer by adding .content to your field name. Depending on how the original index was set up, the following "just works" for me using AWS ElasticSearch v6.x.
GET /my_idx/_search?q=my_field.content:""

I was trying to find the empty fields (in indexes with dynamic mapping) and set them to a default value, and the following worked for me.
Note this is on Elasticsearch 7.x.
POST <index_name|pattern>/_update_by_query
{
"script": {
"lang": "painless",
"source": """
if (ctx._source.<field_name> == "") {
ctx._source.<field_name> = "0";
} else {
ctx.op = "noop";
}
"""
}
}
Following one of the responses in this thread, I came up with the request below, which does the same thing:
POST index_pattern*/_update_by_query
{
"script": {
"source": "ctx._source.field_name='0'",
"lang": "painless"
},
"query": {
"bool": {
"must": [
{
"exists": {
"field": "field_name"
}
}
],
"must_not": [
{
"wildcard": {
"field_name": "*"
}
}
]
}
}
}
I was also trying to find the documents in the index that don't have the field at all and add it with a value.
One of the responses in this thread helped me come up with the following:
POST index_pattern*/_update_by_query
{
"script": {
"source": "ctx._source.field_name='0'",
"lang": "painless"
},
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "field_name"
}
}
]
}
}
}
Thanks to everyone who contributed to this thread, I was able to solve my problem.
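To see which documents each of the requests above would touch, the script logic can be mirrored on plain dictionaries. This is only a model for reasoning about the requests, not something that runs against a cluster; the helper names and the "update"/"noop" labels are made up for this sketch.

```python
def default_if_empty(doc, field, default="0"):
    """Mirror of the first script: replace "" with a default, otherwise no-op."""
    if doc.get(field) == "":
        updated = dict(doc)
        updated[field] = default
        return updated, "update"
    return doc, "noop"


def default_if_absent(doc, field, default="0"):
    """Mirror of the last request: add the field only when it is absent."""
    if field not in doc:
        updated = dict(doc)
        updated[field] = default
        return updated, "update"
    return doc, "noop"


if __name__ == "__main__":
    docs = [{"field_name": ""}, {"field_name": "foo"}, {"other": 1}]
    for doc in docs:
        print(doc, default_if_empty(doc, "field_name"),
              default_if_absent(doc, "field_name"))
```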

Related

Elasticsearch: Exists Query on Binary Field

I am looking for a way to count the documents that have a certain binary field "not set" in a mapped index. However, the standard "Exists" query does not seem to work. Example:
{
"some-index": {
"mappings": {
"some-type": {
"properties": {
"data": {
"type": "binary"
}
}
}
}
}
}
Query:
POST http://.../some-index/some-type/_search?size=1
{
"query":{
"exists":{
"field":"data"
}
}
}
The query above returns 0 results no matter what. My guess is that this is because Elasticsearch does not store binary fields in source by default, and the "Exists" query only looks up the source.
Is there an alternative to using Exists query, ideally without using extra boolean field in mapping?
Does the following do what you want? I'm creating a template with field1 set as binary type, then indexing a document with just field2 (which I didn't bother defining), then searching for docs without field1. You can run these in the Dev Console in Kibana:
PUT _template/binary
{
"template": "binary",
"mappings": {
"binary": {
"properties": {
"field1": {
"type": "binary"
}
}
}
}}
PUT /binary/type/1
{
"field2":"abc"
}
GET binary/_search
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "field1"
}
}
}
}
}
That should return the doc you just indexed. If you change it to the following, it shouldn't return anything, because field2 is present!
GET binary/_search
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "field2"
}
}
}
}
}

query must match 2 fields exactly, don't analyze

I tried a few different ways of doing a simple GET request that filters on two different attributes, for example:
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"email": "erik.landvall@example.com"
}
},
{
"term": {
"password": "bb3810356e9b60cf6..."
}
}
]
}
},
"query": {
"match_all": []
}
}
}
The problem is that I get nothing back in return. As I understand it, this is because Elasticsearch analyzes the email field, which makes the query fail. Indeed, if I use the term erik.landvall instead of the complete email address, it matches the document, which confirms that's what's going on.
I can define the attribute as type:string and index:not_analyzed when I create the index. But what if I want to be able to search on the email attribute in a different context? So there should, to my mind, be a way to specify in a query that I want to filter on the actual value of the attribute. I cannot, however, find what such a query would look like.
Is it possible to force Elasticsearch to use "not_analyzed" when querying? If so, then how?
You can use scripting for this purpose. You would have to directly access the JSON you have stored with _source. Try the following query:
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"inline" : "_source.email==param1 && _source.password==param2",
"params" : {
"param1" : "erik.landvall@example.com",
"param2" : "bb3810356e9b60cf6"
}
}
}
}
}
}
}
You would need to enable dynamic scripting. Add script.inline: on to your yml file and restart the node.
If this kind of query is fairly regular, then it would be much better to reindex the data, as others have suggested in the comments.
It's not possible to switch analysis on or off at query time. The way to do it is to index the field a second time with the analysis you need, by using fields:
curl -XPUT 'localhost:9200/my_index?pretty' -d'
{
"mappings": {
"my_type": {
"properties": {
"city": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
curl -XPUT 'localhost:9200/my_index/my_type/1?pretty' -d'
{
"city": "New York"
}'
curl -XPUT 'localhost:9200/my_index/my_type/2?pretty' -d'
{
"city": "York"
}'
curl -XGET 'localhost:9200/my_index/_search?pretty' -d'
{
"query": {
"match": {
"city": "york"
}
},
"sort": {
"city.raw": "asc"
},
"aggs": {
"Cities": {
"terms": {
"field": "city.raw"
}
}
}
}'
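Applied to the email field from the question, the multi-field idea looks roughly like this. The sketch below is a toy model, not the real analyzer: approx_analyzed_terms is a hypothetical helper that just lowercases and splits on whitespace and "@", which is enough to reproduce the behaviour described in the question (erik.landvall matches, the full address does not), and the ".raw" sub-field keeps the value verbatim.

```python
import re


def approx_analyzed_terms(value):
    """Toy analyzer (an assumption): lowercase, split on whitespace and '@'."""
    return [part.lower() for part in re.split(r"[\s@]+", value) if part]


def index_with_raw(doc, field):
    """Model a string field with an extra not_analyzed 'raw' sub-field."""
    value = doc[field]
    return {
        field: approx_analyzed_terms(value),  # analyzed terms
        field + ".raw": [value],              # verbatim term
    }


def term_matches(indexed, field, wanted):
    """A term filter matches only an exactly indexed term."""
    return wanted in indexed.get(field, [])


if __name__ == "__main__":
    indexed = index_with_raw({"email": "erik.landvall@example.com"}, "email")
    print(indexed)
    print(term_matches(indexed, "email", "erik.landvall@example.com"))
    print(term_matches(indexed, "email.raw", "erik.landvall@example.com"))
```

With a mapping like this, the exact-match filter goes against email.raw while full-text searches keep using email.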

filter empty array fields in elasticsearch

My document structure is something like:
{
title: string,
description: string,
privacy_mode: string,
hidden: boolean,
added_by: string,
topics: array
}
I am trying to query Elasticsearch. However, I don't want any documents with an empty topics array field.
Below is a function which builds the query object:
function getQueryObject(data) {
var orList = [{ "term": {"privacy_mode": "public", "hidden": false} }]
if (data.user) {
orList.push({ "term": {"added_by": data.user} });
}
var queryObj = {
"fields": ["title", "topics", "added_by", "img_url", "url", "type"],
"query": {
"filtered" : {
"query" : {
"multi_match" : {
"query" : data.query + '*',
"fields" : ["title^4", "topics", "description^3", "tags^2", "body^2", "keywords",
"entities", "_id"]
}
},
"filter" : {
"or": orList
},
"filter" : {
"limit" : {"value" : 15}
},
"filter": {
"script": {
"script": "doc['topics'].values.length > 0"
}
}
}
}
}
return queryObj;
};
This still gives me elements with an empty topics array. Wondering what's wrong!
Thanks for the help.
You probably want the missing filter. Your script approach will load all the values of topics into memory, which will be very wasteful if you are not also e.g. faceting on them.
Also, the structure of your filter is wrong. You cannot repeat the filter key; instead, wrap the individual filters in a bool filter. (Here is why you usually want to use bool and not and|or|not: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/)
Lastly, you probably want to specify the size on the search object, instead of using the limit filter.
I made a runnable example you can play with: https://www.found.no/play/gist/aa59b987269a24feb763
#!/bin/bash
export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"privacy_mode":"public","topics":["foo","bar"]}
{"index":{"_index":"play","_type":"type"}}
{"privacy_mode":"private","topics":[]}
'
# Do searches
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"privacy_mode": "public"
}
}
],
"must_not": [
{
"missing": {
"field": "topics"
}
}
]
}
}
}
}
}
'
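Putting the three points together (one bool filter, the missing filter, size on the search object), the query builder from the question could be restructured roughly as below. This is an untested sketch in the same ES 1.x-era syntax as the example above. Assumptions: build_search_body is a made-up name, the field list is shortened, and the original or-list is read as "public and not hidden, or added by the given user".

```python
def build_search_body(query_text, user=None, size=15):
    """Build one search body with a single bool filter (ES 1.x-style syntax)."""
    # Visibility: public and not hidden ...
    visibility = [
        {"bool": {"must": [
            {"term": {"privacy_mode": "public"}},
            {"term": {"hidden": False}},
        ]}}
    ]
    # ... or added by the current user, if one is given.
    if user:
        visibility.append({"term": {"added_by": user}})
    return {
        "size": size,  # replaces the limit filter
        "query": {
            "filtered": {
                "query": {
                    "multi_match": {
                        "query": query_text,
                        "fields": ["title^4", "topics", "description^3"],
                    }
                },
                "filter": {
                    "bool": {
                        "should": visibility,
                        # Exclude documents whose topics field is missing/empty.
                        "must_not": [{"missing": {"field": "topics"}}],
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(build_search_body("elastic", user="alice"))
```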
The missing query has been removed since ES 5.0; the suggested replacement is exists inside must_not:
curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "topics"
}
}
}
}
}'
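If you have old missing filters lying around in code, the rewrite is mechanical. A minimal sketch with made-up helper names; it only handles the plain {"missing": {"field": ...}} form, not the extra existence/null_value options.

```python
def missing_to_exists(missing_filter):
    """Rewrite {"missing": {"field": f}} as bool/must_not/exists."""
    field = missing_filter["missing"]["field"]
    return {"bool": {"must_not": {"exists": {"field": field}}}}


def not_missing_to_exists(missing_filter):
    """A negated missing filter is simply an exists query on the same field."""
    field = missing_filter["missing"]["field"]
    return {"exists": {"field": field}}


if __name__ == "__main__":
    old = {"missing": {"field": "topics"}}
    print(missing_to_exists(old))      # documents without topics
    print(not_missing_to_exists(old))  # documents with topics
```

For the original question (skip documents with an empty topics array), the second form is the one to put in the filter.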

Find empty strings in elasticsearch

I'm trying to _search for documents that have a specific value in a field.
{
"query": {
"bool": {
"must": [
{"field": {"advs.status": "warn"}}
]
}
}
}
That works fine. But when I try to find documents that have an empty string in that field, I get this error:
ParseException[Cannot parse '' ...
and then - long list of what was expected instead of empty string.
I tried this query:
{
"query": {
"bool": {
"must": [
{"term": {"advs.status": ""}}
]
}
}
}
It doesn't fail, but it finds nothing, although it does work for non-empty strings. How am I supposed to do this?
My mapping for this type looks exactly like this:
{
"reports": {
"dynamic": "false",
"_ttl": {
"enabled": true,
"default": 7776000000
},
"properties": {
"@fields": {
"dynamic": "true",
"properties": {
"upstream_status": {
"type": "string"
}
}
},
"advs": {
"properties": {
"status": {
"type": "string",
"store": "yes"
}
}
},
"advs.status": {
"type": "string",
"store": "yes"
}
}
}
}
Or another way to do the same thing more efficiently is to use the exists filter:
"exists" : {
"field" : "advs.status"
}
Both are valid, but this one is better :)
You can try this temporary solution which works but isn't optimal - https://github.com/elastic/elasticsearch/issues/7515
PUT t/t/1
{
"textContent": ""
}
PUT t/t/2
{
"textContent": "foo"
}
GET t/t/_search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "textContent"
}
}
],
"must_not": [
{
"wildcard": {
"textContent": "*"
}
}
]
}
}
}
Try using must_not with missing in your bool:
"must_not":{
"missing":{
"field":"advs.status",
"existence":true,
"null_value":true
}
}
If you want to search for fields containing an empty string, either change your mapping to set this particular field to not_analyzed, or use a script filter:
"filter": {
"script": {
"script": "_source.advs.status.length() == 0"
}
}
I generally use a term filter if the field is not analyzed. Here is a snippet:
{
"filtered": {
"filter": {
"term": {
"field": ""
}
}
}
},
The "missing" filter only works for null values or for fields that are not there at all. Matching an empty string was already answered here: https://stackoverflow.com/a/25562877/155708
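The distinction between "missing" and "empty string" can be written down as a tiny model. This is a sketch of the behaviour described in this thread for a field indexed verbatim, not a reimplementation of Elasticsearch: is_missing is a hypothetical helper, and analyzed fields may behave differently, as noted in other answers.

```python
def is_missing(doc, field):
    """Model of missing: absent, null, or an array with no non-null values."""
    if field not in doc:
        return True
    value = doc[field]
    if value is None:
        return True
    if isinstance(value, list):
        # An empty list, or a list of only nulls, has no value to index.
        return all(item is None for item in value)
    # Any other value, including the empty string, counts as present.
    return False


if __name__ == "__main__":
    samples = [{}, {"status": None}, {"status": []}, {"status": ""}, {"status": "warn"}]
    for doc in samples:
        print(doc, "missing" if is_missing(doc, "status") else "present")
```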

Elasticsearch: Facet script and facet_filter

Is it possible to use facet script and facet filter in elasticsearch like this?
{
"facets": {
"judges": {
"terms": {
"field": "judges.untouched",
"size": 10,
"all_terms": false,
"script": { "script": "...", "params": { }}
},
"global_facets": false,
"facet_filter": {
"and": [
{
"query": {
"query_string": {
"query": "..... ",
"fields": [
"judges.analyzed"
],
"default_operator": "and",
"analyze_wildcard": true
}
}
}
]
}
}
}
}
When I run this query, Elasticsearch raises the error: Parse Failure [No facet type found for [and]]]; }.
Thanks
EDIT Incorrect answer. I'm leaving it because of context.
To clarify: and is an appropriate filter and should be accepted by facet_filter. Not sure what's up.
Untested, but from the docs: (http://www.elasticsearch.org/guide/reference/api/search/facets/)
All facets can be configured with an additional filter (explained in the Query DSL section)
So you need to put an appropriate query in facet_filter. And is NOT an appropriate filter (the error you receive could be clearer)
e.g:
"facet_filter" : {
"term" : { "user" : "kimchy"}
}
You'd probably want something like:
"facet_filter" : {
"query_string": {
"query": "..... ",
"fields": [
"judges.analyzed"
],
"default_operator": "and",
"analyze_wildcard": true
}
}
The syntax for an and filter is:
"facet_filter": {
"and": {
"filters": [
{
// filter definition
},
{
// another filter definition
}
]
}
}
But you're using only a single condition, so there's no need of an and filter.
You should just have:
"facet_filter": {
"query": {
"query_string": {
"query": "..."
}
}
}
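The "no need for an and filter with a single condition" rule is easy to encode when the filter is built in code. A minimal sketch; combine_filters is a made-up name, and the output uses the old and-filter syntax shown above.

```python
def combine_filters(filters):
    """Return a single filter as-is, or wrap several in an and filter."""
    filters = list(filters)
    if not filters:
        raise ValueError("at least one filter is required")
    if len(filters) == 1:
        # A lone condition needs no wrapper.
        return filters[0]
    return {"and": {"filters": filters}}


if __name__ == "__main__":
    query_filter = {"query": {"query_string": {"query": "..."}}}
    term_filter = {"term": {"user": "kimchy"}}
    print({"facet_filter": combine_filters([query_filter])})
    print({"facet_filter": combine_filters([query_filter, term_filter])})
```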

Resources