ElasticSearch doesn't seem to support array lookups - elasticsearch

I currently have a fairly simple document stored in ElasticSearch that I generated with an integration test:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "unit-test_project600",
"_type" : "recordDefinition505",
"_id" : "400",
"_score" : 1.0, "_source" : {
"field900": "test string",
"field901": "500",
"field902": "2050-01-01T00:00:00",
"field903": [
"Open"
]
}
} ]
}
}
I would like to filter for specifically field903 and a value of "Open", so I perform the following query:
{
query: {
filtered: {
filter: {
term: {
field903: "Open",
}
}
}
}
}
This returns no results. However, I can use this with other fields and it will return the record:
{
query: {
filtered: {
filter: {
term: {
field901: "500",
}
}
}
}
}
It would appear that I'm unable to search in arrays with ElasticSearch. I have read a few instances of people with a similar problem, but none of them appear to have solved it. Surely this isn't a limitation of ElasticSearch?
I thought that it might be a mapping problem. Here's my mapping:
{
"unit-test_project600" : {
"recordDefinition505" : {
"properties" : {
"field900" : {
"type" : "string"
},
"field901" : {
"type" : "string"
},
"field902" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"field903" : {
"type" : "string"
}
}
}
}
}
However, the ElasticSearch docs indicate that there is no difference between a string or an array mapping, so I don't think I need to make any changes here.

Try searching for "open" rather than "Open." By default, Elasticsearch uses a standard analyzer when indexing fields. The standard analyzer uses a lowercase filter, as described in the example here. From my experience, Elasticsearch does search arrays.

Related

Elasticsearch: how to find a document by number in logs

I have an error in kibana
"The length [2658823] of field [message] in doc[235892]/index[mylog-2023.02.10] exceeds the [index.highlight.max_analyzed_offset] limit [1000000]. To avoid this error, set the query parameter [max_analyzed_offset] to a value less than index setting [1000000] and this will tolerate long field values by truncating them."
I know how to deal with it (change "index.highlight.max_analyzed_offset" for an index, or set the query parameter), but I want to find the document with long field and examine it.
If i try to find it by id, i get this:
q:
GET mylog-2023.02.10/_search
{
"query": {
"terms": {
"_id": [ "235892" ]
}
}
}
a:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
q:
GET mylog-2023.02.10/_doc/235892
a:
{ "_index" : "mylog-2023.02.10", "_type" : "_doc", "_id" :
"235892", "found" : false }
Maybe this number (doc[235892]) is not id? How can i find this document?
try use Query IDs:
GET /_search
{
"query": {
"ids" : {
"values" : ["1", "4", "100"]
}
}
}

Enriching documents in ElasticSearch with only matching nested elements by ID

We're creating some packages, but that process is currently rather slow, because of the sheer amount of data being sent between microservices. Therefore, I have pruned the information being sent between those microservices and instead want to enrich the documents with the necessary information directly from within ElasticSearch. This gives documents of the following shape:
{
"_index" : "packages-2022.02.28",
"_type" : "_doc",
"_id" : "SG_DH-8019-ao-74783-20220315-12",
"_score" : 1.0,
"_source" : {
"id" : "SG_DH-8019-ao-74783-20220315-12",
"updatedOn" : "2022-02-28T14:45:57.7511562+01:00",
"code" : "SG",
"createdDate" : "2022-02-28T15:17:48.2571391+01:00",
"content" : {
"contentId" : "74783",
"units" : [
{
"id" : "HB_DBL.ST_RO_NFP",
"globalId" : "74783_HB_DBL.ST_RO_NFP",
"globalIntId" : -592692223,
"forPackaging" : false
},
{
"id" : "HB_DBL.ST_BB_NFP",
"globalId" : "74783_HB_DBL.ST_BB_NFP",
"globalIntId" : 446952442,
"forPackaging" : false
},
{
"id" : "HB_DBL.ST_AI_NFP",
"globalId" : "74783_HB_DBL.ST_AI_NFP",
"globalIntId" : -1174348304,
"forPackaging" : false
},
{
"id" : "HB_DBL.SU_RO_NFP",
"globalId" : "74783_HB_DBL.SU_RO_NFP",
"globalIntId" : -2111509049,
"forPackaging" : false
},
{
"id" : "HB_DBL.SU_BB_NFP",
"globalId" : "74783_HB_DBL.SU_BB_NFP",
"globalIntId" : 307969427,
"forPackaging" : false
},
{
"id" : "HB_DBL.SU_AI_NFP",
"globalId" : "74783_HB_DBL.SU_AI_NFP",
"globalIntId" : 1418623211,
"forPackaging" : false
},
{
"id" : "HB_DBL.PO-1_RO_NFP",
"globalId" : "74783_HB_DBL.PO-1_RO_NFP",
"globalIntId" : 1328251159,
"forPackaging" : false
},
{
"id" : "HB_DBL.PO-1_BB_NFP",
"globalId" : "74783_HB_DBL.PO-1_BB_NFP",
"globalIntId" : -1228155826,
"forPackaging" : false
},
{
"id" : "HB_DBL.PO-1_AI_NFP",
"globalId" : "74783_HB_DBL.PO-1_AI_NFP",
"globalIntId" : 749215308,
"forPackaging" : false
},
{
"id" : "HB_DBL.OF_RO_NFP",
"globalId" : "74783_HB_DBL.OF_RO_NFP",
"globalIntId" : 1981865239,
"forPackaging" : false
},
{
"id" : "HB_DBL.OF_BB_NFP",
"globalId" : "74783_HB_DBL.OF_BB_NFP",
"globalIntId" : 545563435,
"forPackaging" : false
},
{
"id" : "HB_DBL.OF_AI_NFP",
"globalId" : "74783_HB_DBL.OF_AI_NFP",
"globalIntId" : -481310774,
"forPackaging" : false
}
]
"duration" : {
"value" : 12,
"durationType" : "Day"
}
},
"generatedInfo" : {
"productGroupName" : null,
"subProductGroupName" : "Foo",
"version" : 0
}
}
}
]
with information from an enrich policy's index of the shape (when queried):
{
"_index" : ".enrich-package-enrich-1646044129711",
"_type" : "_doc",
"_id" : "zt_gP38BZeMUiw0-LxLa",
"_score" : 1.0,
"_source" : {
"contentId" : "365114",
"name" : "PackageName",
"board" : [
"B1",
"B2"
],
"units" : [
{
"price" : [
{
"margin" : 0,
"combination" : 10000,
"value" : 189030,
"currency" : "EUR"
}
],
"id" : "W2M_AX2_SC_NFP",
"globalId" : "365114_W2M_AX2_SC_NFP",
"globalIntId" : -988330164,
"name" : "UnitName",
"prop1": "Foo",
"prop2": "Bar"
}
]
}
}
]
I originally could get this working. However, when enriching, I only want to keep the units with the same global ID as those in the document to save. To this end, I have tried also enriching each unit with a simple Enrich processor and a ForEach processor referencing the enrich policy, matching on globalId and have even attempted matching on its hash code globalIntId (although in even in the latter case I would often get the error that it 'is not an integer', even though it clearly is one). This separate enrich-policy index has a shape similar to the following:
{
"_index" : ".enrich-package-unit-enrich-1646044158417",
"_type" : "_doc",
"_id" : "dN_gP38BZeMUiw0-t2Io",
"_score" : 1.0,
"_source" : {
"units" : [
{
"price" : [
{
"margin" : 0,
"combination" : 10000,
"value" : 189030,
"currency" : "EUR"
}
],
"globalId" : "365114_W2M_AX2_SC_NFP",
"globalIntId" : -988330164,
"name" : "UnitName",
"prop1": "Foo",
"prop2": "Bar",
"id" : "W2M_AX2_SC_NFP"
}
]
}
}
]
I have also tried to use Painless script, but so far my experience hasn't been exactly painless (pun intended). Every time I would try to access any data (I've tried various ways I encountered), I would get nothing but compilation errors. Also, given that I'm working on making this process faster, I'm a bit worried about performance here if I were to get it to work. I've read that Painless is fast, yet I've also heard it's actually fairly slow (I think compared to using processors, not necessarily other scripts).
Now, I'm at a loss about how to get this to work. I would prefer to do this without scripting if possible. However, if it is only possible using scripting, that's okay as long as the performance is acceptable. I'm using Elastic 7.12.
Update 1:
I'm creating the enrich policy from C# using Nest like so:
var enrichPolicyRequest = new PutEnrichPolicyRequest(enrichPolicyName)
{
Match = new MyPackageBedEnrichPolicy(index)
};
var putEnrichPolicyResponse = await elasticClient.Enrich.PutPolicyAsync(enrichPolicyRequest);
var executeEnrichPolicyResponse = await elasticClient.Enrich.ExecutePolicyAsync(enrichPolicyName);
...
public class MyPackageBedEnrichPolicy : IEnrichPolicy
{
public MyPackageBedEnrichPolicy(string index)
{
Indices = index;
MatchField = "contentId";
EnrichFields = new[] { "name", "board", "units" };
}
public Indices Indices { get; set; }
public Field MatchField { get; set; }
public Fields EnrichFields { get; set; }
public string Query { get; set; }
}
and the index for the units very similarly, but with
public class MyPackageUnitEnrichPolicy : IEnrichPolicy
{
public MyPackageUnitEnrichPolicy(string index)
{
Indices = index;
MatchField = "units.globalId";
EnrichFields = new[] { "units" };
}
...
For now, I have created the ingest processors in Kibana for easier prototyping, though I will have take care of that using Nest later as well. I have defined them basically as follows:
This is the definition of the ingest pipeline in JSON:
[
{
"enrich": {
"field": "content.contentId",
"policy_name": "enrichPolicyName",
"target_field": "enrichTest"
}
},
{
"foreach": {
"field": "content.units.globalId",
"processor": {
"enrich": {
"field": "content.units.globalId",
"policy_name": "unitEnrichPolicyName",
"target_field": "enrichTest.units",
"tag": "enrich-units-on-globalId-processor"
}
}
}
}
]

How to make Elastic Engine understand a field is not to be analyzed for an exact match?

The question is based on the previous post where the Exact Search did not work either based on Match or MatchPhrasePrefix.
Then I found a similar kind of post here where the search field is set to be not_analyzed in the mapping definition (by #Russ Cam).
But I am using
package id="Elasticsearch.Net" version="7.6.0" targetFramework="net461"
package id="NEST" version="7.6.0" targetFramework="net461"
and might be for that reason the solution did not work.
Because If I pass "SOME", it matches with "SOME" and "SOME OTHER LOAN" which should not be the case (in my earlier post for "product value").
How can I do the same using NEST 7.6.0?
Well I'm not aware of how your current mapping looks. Also I don't know about NEST as well but I will explain
How to make Elastic Engine understand a field is not to be analyzed for an exact match?
by an example using elastic dsl.
For exact match (case sensitive) all you need to do is to define the field type as keyword. For a field of type keyword the data is indexed as it is without applying any analyzer and hence it is perfect for exact matching.
PUT test
{
"mappings": {
"properties": {
"field1": {
"type": "keyword"
}
}
}
}
Now lets index some docs
POST test/_doc/1
{
"field1":"SOME"
}
POST test/_doc/2
{
"field1": "SOME OTHER LOAN"
}
For exact matching we can use term query. Lets search for "SOME" and we should get document 1.
GET test/_search
{
"query": {
"term": {
"field1": "SOME"
}
}
}
O/P that we get:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931472,
"_source" : {
"field1" : "SOME"
}
}
]
}
}
So the crux is make the field type as keyword and use term query.

Do query results impact elasticsearch phrase suggestions?

I'd like to know whether Elasticsearch users query results to populate phrase suggestions for direct generator or not?
Or it simply picks tokens from given index?
My queries are based on some permission sets.
So for instance, that'd be my query:
{
"size" : 0,
"query" : {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"bool" : {
"must" : [{
"terms" : {
"Permissions" : ["permission1", "permission2", "permission3"
]
}
}
]
}
}
}
},
"suggest" : {
"DidYouMean" : {
"text" : "{{SearchPhrase}}",
"phrase" : {
"field" : "_all",
"analyzer" : "simple",
"size" : 1,
"real_word_error_likelihood" : 0.96,
"max_errors" : 5,
"gram_size" : 3,
"direct_generator" : [{
"field" : "_all",
"suggest_mode" : "popular",
"min_word_length" : 3
}
]
}
}
}
}
How would I ensure that direct generator creates suggestions and doesn't violate my permissions clause?
Is this even possible?
The term suggester and phrase suggester feeds on the tokens for generating suggest results. The query does not affect the suggest results. The suggester directly works on the reverse index and get the tokens from them. So its scope is global and never the query

Elasticsearch not storing field, what am I doing wrong?

I have something like the following template in my Elasticsearch. I just want certain part of the data returned, so I turn the source off, and explicitly stated store for the fields I want.
{
"template_1" : {
"order" : 20,
"template" : "test*",
"settings" : { },
"mappings" : {
"_default_" : {
"_source" : {
"enabled" : false
}
},
"type_1" : {
"mydata" :
"store" : "yes",
"type" : "string"
}
}
}
}
}
However, when I query the data, I don't get the fields back. The query works, however, if I enable the _source field. I am just starting with Elasticsearch, so I am not quite sure what I am doing wrong. Any help would be appreciated.
Field definitions should be wrapped in properties section of your mapping:
"type_1" : {
"properties": {
"mydata" :
"store" : "yes",
"type" : "string"
}
}
}

Resources