ElasticSearch get all fields even if their value is null

I want to search ElasticSearch and retrieve specific fields from all records, no matter their value. However, for each record the response contains only the fields whose value is not null. Is there a way to force ElasticSearch to return the exact same set of fields for all records?
Example Request:
{
"fields" : ["Field1","Field2","Field3"],
"query" : {
"match_all" : {}
}
}
Example Response:
{
"hits": [
{
"fields": {
"Field1": [
"bla"
],
"Field2": [
"test"
]
}
},
{
"fields": {
"Field1": [
"bla"
],
"Field2": [
"test"
],
"Field3": [
"somevalue"
]
}
}
]
}
My goal is to get something for "Field3" in the first hit.

According to the guide linked below, any field whose value is null, [] or "" is not stored or indexed in the document. This follows from how the inverted index works, so the missing fields have to be handled explicitly in your program.
link - http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_dealing_with_null_values.html
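Handling this "in the program" could look like the following minimal sketch, which pads every hit with the fields that were requested but not returned. The helper name fill_missing_fields and the None default are assumptions made for illustration, and the response shape is the simplified one shown in the question:

```python
from typing import Any


def fill_missing_fields(hits: list, requested: list, default: Any = None) -> list:
    """Return one dict per hit that contains every requested field.

    Fields that Elasticsearch left out of a hit are set to `default`.
    """
    normalized = []
    for hit in hits:
        fields = hit.get("fields", {})
        normalized.append({name: fields.get(name, default) for name in requested})
    return normalized


# Simplified response, copied from the question.
response = {
    "hits": [
        {"fields": {"Field1": ["bla"], "Field2": ["test"]}},
        {"fields": {"Field1": ["bla"], "Field2": ["test"], "Field3": ["somevalue"]}},
    ]
}

rows = fill_missing_fields(response["hits"], ["Field1", "Field2", "Field3"])
```

With this, the first hit carries an explicit placeholder for "Field3" instead of omitting it.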

Related

How to formulate an and condition for a nested object in Opensearch?

Opensearch ingests documents similar to this example (it's just a minimal example):
PUT nested_test/_doc/4
{
"log": "This is a fourth log message",
"function": "4 test function",
"related_objects": [
{ "type": "user", "id": "10" },
{ "type": "offer", "id": "120" }
]
}
PUT nested_test/_doc/5
{
"log": "This is a fifth log message",
"function": "5 test function",
"related_objects": [
{ "type": "user", "id": "120" },
{ "type": "offer", "id": "90" }
]
}
With many of these documents, I'd like to filter those which have a specific related object (e.g. type=user and id=120). With the example data above, this should only return the document with id 5. Using simple filters (DQL syntax) as follows does not work:
related_objects.type:user and related_objects.id:120
This would also match document 4, because it contains a related object with type user and a related object with id 120, although the object with id 120 is not the related user but the related offer.
If the field holds an array of objects (Array[object]), its type should be nested; see the documentation for the nested field type.
Elasticsearch query example:
{
"query" : {
"nested" : {
"path" : "related_objects",
"query" : {
"bool" : {
"must" : [
{
"term" : {"related_objects.type" : "MYTYPE"}
},
{
"term" : {"related_objects.id" : "MYID"}
}
]
}
}
}
}
}
Basically just go into a nested query and specify all your AND conditions as MUST clauses inside a bool query.
As soon as the field is declared as nested field, it is possible to run a simple DQL query to get the desired information:
related_objects:{type:"user" and id:120}
This requires that the field has been defined as nested before:
PUT my-index-000001
{
"mappings": {
"properties": {
"related_objects": {
"type": "nested"
}
}
}
}
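To see why the nested mapping matters, here is a rough Python simulation of the two matching behaviours. It is only a model of the idea (a plain object array behaves like one multi-valued field per key, while nested objects are matched one at a time), not how Opensearch is implemented, and the function names are made up for this sketch:

```python
def flatten(doc):
    """Model a plain object array: one multi-valued field per key."""
    flat = {}
    for obj in doc["related_objects"]:
        for key, value in obj.items():
            flat.setdefault("related_objects." + key, set()).add(value)
    return flat


def matches_flattened(doc, wanted):
    """Each condition may be satisfied by a different object in the array."""
    flat = flatten(doc)
    return all(
        value in flat.get("related_objects." + key, set())
        for key, value in wanted.items()
    )


def matches_nested(doc, wanted):
    """All conditions must be satisfied by the same object."""
    return any(
        all(obj.get(key) == value for key, value in wanted.items())
        for obj in doc["related_objects"]
    )


doc4 = {"related_objects": [{"type": "user", "id": "10"}, {"type": "offer", "id": "120"}]}
doc5 = {"related_objects": [{"type": "user", "id": "120"}, {"type": "offer", "id": "90"}]}
wanted = {"type": "user", "id": "120"}

print(matches_flattened(doc4, wanted), matches_nested(doc4, wanted))
print(matches_flattened(doc5, wanted), matches_nested(doc5, wanted))
```

In this model document 4 is a false positive for the flattened match and is correctly rejected by the nested match, while document 5 matches in both cases.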

How to get the best matching document in Elasticsearch?

I have an index where I store all the places used in my documents. I want to use this index to see if the user mentioned one of the places in the text query I receive.
Unfortunately, I have two documents whose name is similar enough to trick Elasticsearch scoring: Stockholm and Stockholm-Arlanda.
My test phrase is intyg stockholm and this is the query I use to get the best matching document.
{
"size": 1,
"query": {
"bool": {
"should": [
{
"match": {
"name": "intyg stockholm"
}
}
],
"must": [
{
"term": {
"type": {
"value": "4"
}
}
},
{
"terms": {
"name": [
"intyg",
"stockholm"
]
}
},
{
"exists": {
"field": "data.coordinates"
}
}
]
}
}
}
As you can see, I use a terms query to find the interesting documents and I use a match query in the should part of the root bool query to use scoring to get the document I want (Stockholm) on top.
This code worked locally (where I run ES in a container), but it broke when I started testing on a cluster hosted in AWS (where I have the exact same dataset). I found an explanation of what happens, and adding the search type argument does fix the issue.
Since that workaround is best not used in production, I'm looking for other ways to get the expected result.
Here are the two documents:
// Stockholm
{
"type" : 4,
"name" : "Stockholm",
"id" : "42",
"searchableNames" : [
"Stockholm"
],
"uniqueId" : "Place:42",
"data" : {
"coordinates" : "59.32932349999999,18.0685808"
}
}
// Stockholm-Arlanda
{
"type" : 4,
"name" : "Stockholm-Arlanda",
"id" : "1832",
"searchableNames" : [
"Stockholm-Arlanda"
],
"uniqueId" : "Place:1832",
"data" : {
"coordinates" : "59.6497622,17.9237807"
}
}

Enriching the Data in Elastic Search

We will be ingesting data into an index (Index1). One of the fields in each document (field1) holds an ENUM value, which needs to be converted into a string value through a lookup against a REST API.
The REST API call returns a JSON response like this, which has the string values for all the ENUMs:
{
"values" : {
"ENUMVALUE1" : "StringValue1",
"ENUMVALUE2" : "StringValue2"
}
}
I am thinking of building an index from this response document and using it for the lookup.
The incoming document has either ENUMVALUE1 or ENUMVALUE2 (only one of them) in field1, and we eventually want to save StringValue1 or StringValue2 under field1 instead of the ENUM value.
I went through the documentation of the enrich processor, but I am not sure it is the right approach for this scenario.
While forming the match enrich policy, I am not sure how match_field and enrich_fields should be configured.
Could you please advise whether this can be done in Elasticsearch and, if so, what other options I have in case the above approach is not optimal?
OK, 150-200 enums might not be enough to warrant an enrich index, but here is a potential solution.
You first need to build the source index containing all enum mappings, it would look like this:
POST enums/_doc/_bulk
{"index":{}}
{"enum_id": "ENUMVALUE1", "string_value": "StringValue1"}
{"index":{}}
{"enum_id": "ENUMVALUE2", "string_value": "StringValue2"}
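If you would rather generate that bulk body from the REST API response than write it by hand, a small script along these lines could do it. This is only a sketch: it assumes the response has the "values" shape shown in the question, and build_bulk_body is a hypothetical helper name:

```python
import json


def build_bulk_body(lookup_response: dict) -> str:
    """Turn {"values": {enum: label}} into an NDJSON body for the bulk API."""
    lines = []
    for enum_id, string_value in lookup_response["values"].items():
        # Action line, followed by the document itself.
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps({"enum_id": enum_id, "string_value": string_value}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


lookup_response = {
    "values": {"ENUMVALUE1": "StringValue1", "ENUMVALUE2": "StringValue2"}
}
bulk_body = build_bulk_body(lookup_response)
print(bulk_body, end="")
```

The printed text can then be sent as the body of the bulk request above.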
Then you need to create an enrich policy out of this index:
PUT /_enrich/policy/enum-policy
{
"match": {
"indices": "enums",
"match_field": "enum_id",
"enrich_fields": [
"string_value"
]
}
}
POST /_enrich/policy/enum-policy/_execute
Once it's built (with 200 values it should take a few seconds), you can start building your ingest pipeline using an enrich processor:
PUT _ingest/pipeline/enum-pipeline
{
"description": "Enum enriching pipeline",
"processors": [
{
"enrich" : {
"policy_name": "enum-policy",
"field" : "field1",
"target_field": "tmp"
}
},
{
"set": {
"if": "ctx.tmp != null",
"field": "field1",
"value": "{{tmp.string_value}}"
}
},
{
"remove": {
"if": "ctx.tmp != null",
"field": "tmp"
}
}
]
}
Testing this pipeline, we get this:
POST _ingest/pipeline/enum-pipeline/_simulate
{
"docs": [
{
"_source": {
"field1": "ENUMVALUE1"
}
},
{
"_source": {
"field1": "ENUMVALUE4"
}
}
]
}
Results =>
{
"docs" : [
{
"doc" : {
"_source" : {
"field1" : "StringValue1" <--- value has been replaced
}
}
},
{
"doc" : {
"_source" : {
"field1" : "ENUMVALUE4" <--- value has NOT been replaced
}
}
}
]
}
For the sake of completeness, I'm sharing the other solution without enrich index, so you can test both and use whichever makes most sense for you.
In this second option, we're simply going to use an ingest pipeline with a script processor whose parameters contain a map of your enums. field1 will be replaced by whatever value is mapped to the enum value it contains, or will keep its value if there's no corresponding enum value.
PUT _ingest/pipeline/enum-pipeline
{
"description": "Enum enriching pipeline",
"processors": [
{
"script": {
"source": """
ctx.field1 = params.getOrDefault(ctx.field1, ctx.field1);
""",
"params": {
"ENUMVALUE1": "StringValue1",
"ENUMVALUE2": "StringValue2",
... // add all your enums here
}
}
}
]
}
Testing this pipeline, we get this:
POST _ingest/pipeline/enum-pipeline/_simulate
{
"docs": [
{
"_source": {
"field1": "ENUMVALUE1"
}
},
{
"_source": {
"field1": "ENUMVALUE4"
}
}
]
}
Results =>
{
"docs" : [
{
"doc" : {
"_source" : {
"field1" : "StringValue1" <--- value has been replaced
}
}
},
{
"doc" : {
"_source" : {
"field1" : "ENUMVALUE4" <--- value has NOT been replaced
}
}
}
]
}
So both solutions would work for your case; you just need to pick the one that fits best. Just know that with the first option, if your enums change, you'll need to rebuild your source index and enrich policy, while with the second you just need to modify the parameters map of your pipeline.
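If you want to sanity-check the expected outcome outside the cluster, the lookup-with-fallback behaviour that both pipelines aim for can be mimicked in a few lines of Python. This is just a local model of the idea, not what the ingest node runs; ENUM_LABELS and replace_enum are names invented for the sketch:

```python
# Same mapping as the pipeline parameters above (only the two known values).
ENUM_LABELS = {"ENUMVALUE1": "StringValue1", "ENUMVALUE2": "StringValue2"}


def replace_enum(doc: dict, field: str = "field1") -> dict:
    """Return a copy of doc with the enum replaced by its label.

    Unknown enum values are left untouched, like getOrDefault in the script.
    """
    updated = dict(doc)
    if field in updated:
        updated[field] = ENUM_LABELS.get(updated[field], updated[field])
    return updated


print(replace_enum({"field1": "ENUMVALUE1"}))
print(replace_enum({"field1": "ENUMVALUE4"}))
```

The two calls correspond to the two documents used in the _simulate requests.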

Elastic Search how to select columns and pass filters?

I can connect to elastic search http://1.2.3.4:8888/index1/ and query it with e.g. (PUT in Body of request):
{
"key_u" : "u",
"key_p" : "p",
"zip": [ "1234"]
}
Response is:
{
"network_level": {
"0": [
"12",
"23",
"45"
],
"6": [
"660771009"
]
},
"tin": {
"123": {
"name": "mike",
"latlon": [
""
]
},
"456": {
"name": "john",
"latlon": [
""
]
}
}
}
How do I select only the network_level entry with key 6? (I tried adding it as an additional line in the query, but the result does not change; it still shows all of them.)
How do I show only specific fields in the result?
Thanks.
You can run a match_all query against the _search endpoint to see all the documents. To return only network_level.6 from each document, add source filtering with _source to that match_all query, as below:
{
"_source": "network_level.6",
"query": {
"match_all": {}
}
}
Let me know if you have any questions.
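To illustrate what that dotted _source path is expected to keep, here is a simplified Python emulation. It assumes the response body in the question is the stored document, it handles a single exact path only (no wildcards or lists of paths), and select_path is a made-up helper, not part of any Elasticsearch client:

```python
def select_path(source: dict, path: str) -> dict:
    """Keep only the branch of `source` addressed by a dotted path."""
    keys = path.split(".")
    node = source
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return {}  # path not present in this document
        node = node[key]
    # Rebuild the enclosing objects around the selected value.
    result = node
    for key in reversed(keys):
        result = {key: result}
    return result


document = {
    "network_level": {"0": ["12", "23", "45"], "6": ["660771009"]},
    "tin": {"123": {"name": "mike", "latlon": [""]}},
}

filtered = select_path(document, "network_level.6")
print(filtered)
```

Only the "6" entry under network_level survives; "0" and the whole tin object are dropped.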

ElasticSearch filter by array item

I have the following record in ES:
"authInput" : {
"uID" : "foo",
"userName" : "asdfasdfasdfasdf",
"userType" : "External",
"clientType" : "Unknown",
"authType" : "Redemption_regular",
"uIDExtensionFields" :
[
{
"key" : "IsAccountCreation",
"value" : "true"
}
],
"externalReferences" : []
}
"uIDExtensionFields" is an array of key/value pairs. I want to query ES to find all records where:
"uIDExtensionFields.key" = "IsAccountCreation"
AND "uIDExtensionFields.value" = "true"
This is the filter that I think I should be using but it never returns any data.
GET devdev/authEvent/_search
{
"size": 10,
"filter": {
"and": {
"filters": [
{
"term": {
"authInput.uIDExtensionFields.key" : "IsAccountCreation"
}
},
{
"term": {
"authInput.uIDExtensionFields.value": "true"
}
}
]
}
}
}
Any help you guys could give me would be much appreciated.
Cheers!
UPDATE: With the help of the responses below, here is how I solved my problem:
1. Lowercased the value that I was searching for (changed "IsAccountCreation" to "isaccountcreation").
2. Updated the mapping so that "uIDExtensionFields" is a nested type.
3. Updated my filter to the following:
GET devhilden/authEvent/_search
{
"size": 10,
"filter": {
"nested": {
"path": "authInput.uIDExtensionFields",
"query": {
"bool": {
"must": [
{
"term": {
"authInput.uIDExtensionFields.key": "isaccountcreation"
}
},
{
"term": {
"authInput.uIDExtensionFields.value": "true"
}
}
]
}
}
}
}
}
There are a few things probably going wrong here.
First, as mconlin points out, you probably have a mapping with the standard analyzer for your key field. It'll lowercase the key. You probably want to specify "index": "not_analyzed" for the field.
Secondly, you'll have to use nested mappings for this document structure and specify the key and the value in a nested filter. That's because otherwise, you'll get a match for the following document:
"uIDExtensionFields" : [
{
"key" : "IsAccountCreation",
"value" : "false"
},
{
"key" : "SomeOtherField",
"value" : "true"
}
]
Thirdly, you'll want to use the bool filter's must rather than the and filter, to ensure proper cacheability.
Lastly, you'll want to put your filter in the filtered-query. The top-level filter is for when you want hits to be filtered, but facets/aggregations to not be. That's why it's renamed to post_filter in 1.0.
Here are a few resources you'll want to check out:
Troubleshooting Elasticsearch searches, for Beginners covers the first two issues.
Managing Relations in ElasticSearch covers nested docs (and parent/child)
all about elasticsearch filter bitsets covers and vs. bool.
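The first point (the analyzer lowercasing the key while a term filter looks for the exact text) can be illustrated with a toy model in Python. The toy_standard_analyzer below is a deliberately rough stand-in written for this sketch, not the real standard analyzer:

```python
import re


def toy_standard_analyzer(text: str) -> list:
    """Rough stand-in: split on non-alphanumerics and lowercase each token."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def term_matches(indexed_tokens: list, term: str) -> bool:
    """A term filter compares the search term to indexed tokens as-is."""
    return term in indexed_tokens


indexed = toy_standard_analyzer("IsAccountCreation")

print(indexed)
print(term_matches(indexed, "IsAccountCreation"))
print(term_matches(indexed, "isaccountcreation"))
```

In this model the indexed token is the lowercased key, so only the lowercased search term matches, which is consistent with the fix described in the question's update.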
