Find all documents where field exists literally: either it has a value or it is null in Elasticsearch

I know how to find the documents in Elasticsearch 6.8 where a field is non-empty, e.g.:
GET grch38_test__wes__grch38__variants__20210222/_search
{
  "query": {
    "bool": {
      "must": [{
        "exists": {
          "field": "hgmd_accession"
        }
      }]
    }
  }
}
But how can I return documents with existing (non-null) values together with empty ones in one query? I need to find the documents where the field literally exists, whether it holds a value, is empty, or is set to null. Some documents in my index do not have the field at all, and I need to _reindex only the ones where the field is present in any form.

I don't think null values can be searched, because Elasticsearch does not index them.
If you can change your index mapping, look into the null_value mapping parameter, which replaces explicit null values with a placeholder value at index time so that they can be indexed and searched.
See: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/null-value.html
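As a rough sketch of the null_value approach, the Python snippet below builds the request bodies one might send. The placeholder string, the `_doc` type name, and the idea that hgmd_accession is a keyword field are assumptions for illustration, not details from the question. Under such a mapping, documents indexed with an explicit null should match the exists query along with documents that hold real values, while documents that lack the field (or hold an empty array) still would not match. The placeholder only applies to documents indexed under this mapping.

```python
import json

# Hypothetical placeholder; it must have the field's type and never occur as a real value.
NULL_PLACEHOLDER = "__NULL__"


def mapping_with_null_value(field, doc_type="_doc"):
    """Index-creation body (6.x style, with a mapping type) for a keyword field
    whose explicit nulls are indexed as NULL_PLACEHOLDER."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    field: {"type": "keyword", "null_value": NULL_PLACEHOLDER}
                }
            }
        }
    }


def exists_query(field):
    """Search body matching every document that has an indexed value for `field`."""
    return {"query": {"bool": {"must": [{"exists": {"field": field}}]}}}


mapping_body = mapping_with_null_value("hgmd_accession")
search_body = exists_query("hgmd_accession")
print(json.dumps(mapping_body, indent=2))
print(json.dumps(search_body, indent=2))
```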

Related

Elasticsearch: Query to search if field not exists at all, should not match [ ] (empty array field)

I have some documents with field links : [] while other documents don't have the field links at all.
I want to get documents which don't have the field links at all.
I have tried the following query:
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "links"
        }
      }
    }
  }
}
But this query also returns the documents with links: [].
Your best bet is to modify the mapping of the field to handle null values; refer to the documentation.
You could use a wildcard query (*) inside a bool query to see whether the field has any terms, but that's a very inefficient and slow way to query and may not be practical, depending on the cardinality of that field.
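Since the index holds no values for either links: [] or a missing links field, one option is to tell them apart on the client from the returned _source, which keeps the document as it was sent. The sketch below is only an illustration: the helper names and the sample hits are made up, and the hits are assumed to be shaped like the entries of a search response's hits.hits array.

```python
def field_state(source, field):
    """Classify how `field` appears in a document's _source dict."""
    if field not in source:
        return "absent"
    value = source[field]
    if value is None:
        return "null"
    if isinstance(value, list) and len(value) == 0:
        return "empty_array"
    return "value"


def ids_without_field(hits, field):
    """IDs of hits whose _source lacks `field` entirely."""
    return [hit["_id"] for hit in hits if field_state(hit["_source"], field) == "absent"]


# Made-up hits for illustration only.
sample_hits = [
    {"_id": "1", "_source": {"title": "a", "links": []}},
    {"_id": "2", "_source": {"title": "b"}},
    {"_id": "3", "_source": {"title": "c", "links": ["http://example.com"]}},
    {"_id": "4", "_source": {"title": "d", "links": None}},
]
print(ids_without_field(sample_hits, "links"))  # ['2']
```

The must_not/exists query from the question could still serve as a server-side prefilter, with this check dropping the [] and null cases afterwards.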

Multi_match elasticsearch on all fields with boost to specific fields

I am using Elasticsearch 6.1+.
I have created an index and added some values to it; the mapped fields are text and numbers.
I want to run a multi_match on all of the fields in the index, query for a text or a number, and get the results back.
I would also like the score of field1 to be boosted.
For some reason, once I add the fields array, it only searches those fields (I added it so I could define which field to boost and by how much), and if I add "*" to the fields array, it returns an error.
GET MyIndex/_search
{
  "query": {
    "multi_match": {
      "query": "test1",
      "fields": [
        "field1^3",
        "*"
      ]
    }
  }
}
Thank you
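Apparently adding
"lenient": true
to the query solved the problem. The lenient option ignores format-based failures, such as running a text query against a numeric field, which is presumably what "*" triggered here.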
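For reference, here is a small Python sketch that assembles the corrected request body. The helper name and its `boosts` argument are hypothetical; the field names and the query text are the ones from the question.

```python
import json


def multi_match_all_fields(text, boosts, lenient=True):
    """Build a multi_match body over all fields ("*") with per-field boosts.

    `boosts` maps field name -> boost factor, e.g. {"field1": 3}.
    """
    fields = ["{}^{}".format(name, boost) for name, boost in boosts.items()]
    fields.append("*")
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
                "lenient": lenient,
            }
        }
    }


body = multi_match_all_fields("test1", {"field1": 3})
print(json.dumps(body, indent=2))
```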

Find documents in Elasticsearch where `ignore_malformed` was triggered

By default, Elasticsearch throws an exception when a document contains a value that does not fit the field's existing type. For example, if a field has been created with a numeric type, inserting a document with a string value for that field causes an error.
This behavior can be changed by enabling the ignore_malformed setting, which means such fields are silently ignored for indexing purposes but retained in the _source document. The invalid values cannot be searched or aggregated, but they are still included in the returned document.
This is the preferable behavior in our use case, but we would like to be able to locate such documents somehow so we can fix them in the future.
Is there any way to flag documents for which some malformed fields were ignored? We control the document insertion process fully, so we can modify all insertion flags, or do a trial insert, or anything else, to reach our goal.
You can use the exists query to find documents where this field does not exist; see this example:
PUT foo
{
  "mappings": {
    "bar": {
      "properties": {
        "baz": {
          "type": "integer",
          "ignore_malformed": true
        }
      }
    }
  }
}
PUT foo/bar/1
{
  "baz": "field"
}
GET foo/bar/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must_not": [
            {
              "exists": {
                "field": "baz"
              }
            }
          ]
        }
      }
    }
  }
}
There is no dedicated mechanism, though, so this search also finds documents where the field was intentionally left unset.
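On Elasticsearch 6.4 and later, the _ignored metadata field records the names of fields that were skipped because of ignore_malformed, so it may be worth trying if your version has it. The sketch below only builds the two query bodies; the helper names are made up, and "baz" is the field from the example above.

```python
import json


def any_ignored_query():
    """Match documents in which at least one field was ignored at index time."""
    return {"query": {"exists": {"field": "_ignored"}}}


def ignored_field_query(field):
    """Match documents in which the given field was ignored at index time."""
    return {"query": {"term": {"_ignored": field}}}


any_body = any_ignored_query()
baz_body = ignored_field_query("baz")
print(json.dumps(any_body))
print(json.dumps(baz_body))
```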
You cannot. When you search in Elasticsearch, you don't search the document source but the inverted index, which contains the analyzed data.
The ignore_malformed flag says "always store the document, analyze it if possible".
You can try it: create a malformed document and use the _termvectors API to see how the document is analyzed and stored in the inverted index. In the case of a string field, you can see that an "Array" is stored as an empty string, etc., but the field will exist.
So forget the inverted index, let's use the source!
Scroll through all your data until you find the anomaly. I use a small Python script that runs a scroll search, deserializes the hits, and tests the field type for every document. It takes very long, but I end up with a list of wrong document IDs.
A script query can take a very long time and may crash your cluster, so use it with caution, perhaps as a post_filter.
Here I want to retrieve the documents where country_name is not a string:
{
  "_source": false,
  "timeout" : "30s",
  "query" : {
    "query_string" : {
      "query" : "locale:de_ch"
    }
  },
  "post_filter": {
    "script": {
      "script": "!(_source.country_name instanceof String)"
    }
  }
}
"_source": false => I only want the document IDs
"timeout" => to prevent a crash
As you noticed, this is a missing feature. I know Logstash tags documents that fail, so Elasticsearch could implement the same thing.
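The type check in the scrolling script described above could look roughly like this. It is a sketch, not the answerer's actual script: the function name and the sample hits are invented, and it works on plain dicts so that it runs without a cluster. In practice the hits might come from the scroll API, for example through elasticsearch.helpers.scan in the official Python client.

```python
def ids_with_wrong_type(hits, field, expected_type):
    """Return IDs of hits whose _source has `field` set to a value that is not
    an instance of `expected_type`. Hits without the field are skipped; an
    explicit null is reported, since None is not an instance of the type."""
    bad_ids = []
    for hit in hits:
        source = hit.get("_source", {})
        if field in source and not isinstance(source[field], expected_type):
            bad_ids.append(hit["_id"])
    return bad_ids


# Made-up hits shaped like the entries of a search response's hits.hits array.
sample_hits = [
    {"_id": "a", "_source": {"locale": "de_ch", "country_name": "Switzerland"}},
    {"_id": "b", "_source": {"locale": "de_ch", "country_name": ["Switzerland"]}},
    {"_id": "c", "_source": {"locale": "de_ch", "country_name": 42}},
    {"_id": "d", "_source": {"locale": "de_ch"}},
]
print(ids_with_wrong_type(sample_hits, "country_name", str))  # ['b', 'c']
```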

In Elasticsearch match query how to deal with slash

I have a match query searching for a type of doc:
{
  "query": {
    "bool": {
      "should": {
        "match": {
          "ph1_enc": "EAAQnb1kMr/e2/ADqo"
        }
      }
    }
  }
}
"EAAQnb1kMr/e2/ADqo" is the string I'm trying to match; however, in the search results I can see that multiple records containing the substring "/e2/" are also returned.
It looks like "/e2/" is indexed separately, which would explain this. I thought the match query does a full-text match... Did I miss something when creating the template? Any ideas?
Addendum: instead of reindexing, how can I modify the query to match the exact value?
Which analyzer did you set in the mapping to index your data?
If you are using the default one (the standard analyzer), then according to the documentation it uses the standard tokenizer, which also splits text on the slash ('/'). The documentation redirects here for more information about the tokenizer.
So the words 'EAAQnb1kMr', 'e2', and 'ADqo' are indexed (lowercased). Your query value is analyzed the same way the field was indexed, which is why documents containing 'e2' are also returned.
If you don't need to tokenize the 'ph1_enc' field, you can just set its type to 'keyword' in the mapping:
"properties": {
  "ph1_enc": {
    "type": "keyword"
  }
}
The field will then not be analyzed, and queries will match only the exact value.
I hope that helps.
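To illustrate why the "/e2/" records come back, here is a small Python simulation. It is only a rough stand-in for the standard analyzer (a regular expression that splits on non-alphanumeric characters and lowercases), not the real tokenizer, and the second sample value is made up.

```python
import re


def approx_standard_tokens(text):
    """Very rough stand-in for the standard analyzer: split on anything that is
    not an ASCII letter or digit, then lowercase. The real analyzer follows
    Unicode text segmentation rules and handles many more cases."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def match_any_token(query_text, field_text):
    """Mimic a match query with the default OR operator: any shared token is a hit."""
    shared = set(approx_standard_tokens(query_text)) & set(approx_standard_tokens(field_text))
    return bool(shared)


def keyword_equals(query_text, field_text):
    """Mimic an exact lookup on a keyword field: the whole string must be identical."""
    return query_text == field_text


query = "EAAQnb1kMr/e2/ADqo"
other = "XyZ123/e2/QQQ"  # made-up value that only shares the "/e2/" part

print(approx_standard_tokens(query))   # ['eaaqnb1kmr', 'e2', 'adqo']
print(match_any_token(query, other))   # True
print(keyword_equals(query, other))    # False
```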

Why can't I retrieve records in Elasticsearch using a bool query?

I've inserted a record in Elasticsearch and I can see it here:
But this query returns nothing:
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": {
            "term": {
              "name": "Ehsanl"
            }
          }
        }
      }
    }
  }
}
I send this query using the POST method to this URL: http://127.0.0.1:9200/mydb/customers2/_search
What's wrong with that?
Try giving the name as "ehsanl", all in lower case.
What you see in your screenshot is the original document as you indexed it (the _source field).
However, by default, string fields are analyzed (see this answer for more detail about analysis).
With the standard analyzer, your name value was lowercased to ehsanl and stored that way in the index. A term query searches the index for the exact value Ehsanl, which doesn't exist.
You can either:
use the value ehsanl with a term query, or
use the value Ehsanl with a match query, which applies the same analyzer to the query text before searching.
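The difference between the two options can be mimicked in a few lines of Python. This is a toy model, not Elasticsearch itself: `analyze` is a crude stand-in for the standard analyzer (whitespace split plus lowercasing), and the "index" is just a set of terms.

```python
def analyze(text):
    """Crude stand-in for the standard analyzer: split on whitespace and lowercase."""
    return [token.lower() for token in text.split()]


def build_index(values):
    """Collect the terms an inverted index would hold for the given field values."""
    terms = set()
    for value in values:
        terms.update(analyze(value))
    return terms


def term_query(index_terms, value):
    """A term query looks up the value exactly as given, without analysis."""
    return value in index_terms


def match_query(index_terms, text):
    """A match query analyzes the text first, then looks up the resulting terms."""
    return any(token in index_terms for token in analyze(text))


index_terms = build_index(["Ehsanl"])
print(term_query(index_terms, "Ehsanl"))   # False
print(term_query(index_terms, "ehsanl"))   # True
print(match_query(index_terms, "Ehsanl"))  # True
```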
