Elasticsearch query to remove an delete value from inconsistent array of comma-separated values - elasticsearch

Recently, I posted the following about adding a string to existing (inconsistent) arrays in documents: ElasticSearch query to populate or append a value to a field
The marked solution is working perfectly.
But now I need to understand how to delete one of the 5-character codes from the arrays. Assuming I now need to delete the code 'ABCDE' from the documents, while leaving the other codes in the array untouched, what would that query look like?

In below script I am looping through array and creating a list by removing the given value.
Please test before running on actual data.
{
"script": {
"source": "ctx._source.customCategories.removeAll(Collections.singleton(params.catg))",
"lang": "painless",
"params": {
"catg": "c"
}
}
}

Related

Terms query does not work on keyword field which contains an array of values

I am a beginner in Elasticsearch. I recently added a new field jc_job_meta_field which is of keyword type (see image 1 below as I output the mapping of all my fields) and my index is en-gb. I expect it to be an array to hold a bunch of values. And I now have a document with ["Virtual", "Hybrid"] in that field. I wanted to have the ability to search all entries with Virtual in the field jc_job_meta_field. But now when I do a term query search like this
{
"query": {
"terms": {
"jc_job_meta_field": ["Virtual"]
}
}
}
Nothing returned (see image 2 below). Shouldn't it at least return that exact document with [Virtual, Hybrid]? I checked a similar post here and it seems like I am doing exactly what's supposed to work. What went wrong here? Thanks in advance!
My Mapping and field values:
My query:

How to search for an exact value in a keyword array field in ElasticSearch?

I have a Keyword field that is an array and am trying to search for documents that contain one of the values in the array.
A document contains a field called allowed_groups:
"allow_groups" : [
"c4e3f246-0b1f-43cc-831e-37ca620bf083"
],
I have tried a match query
...
"must":[
{
"match": {
"allow_groups": {
"query": "c4e3f246-0b1f-43cc-831e-37ca620bf083"
}
}
},
...
This returns results but as soon as I change a single character (3 at the end of the value to a 4), the documents still return. I need to change the value to something much different for the documents to not return.
I have also tried a term query in its place but cant get documents to come back at all.
I also should mention that in the end I am trying to pass an array of values to be matched against the allow_groups keyword array and return a document when theres at least 1 single exact match. Im just testing with 1 value currently.

How to combine 2 fiels with different names but same value in Kibana

I set up a ELK environment with 2 different indices(index) streams. both streams have a field with the same value but the filed name is different.
is there a possibility to merge them or something like that so when i use the Kibana filter it shows me the value from both filed.
so i can set up visualizations, but when i filter on stream 1, the visualization of stream 2 is empty.
i also tried to name the indedx the same, but did not help.
Example:
index1 fieldname information.ID = 123
index2 fieldname ID = 123
i want to use the filter on both streams
You can create an alias which contains both indexes. Write a query against the alias using script fields.
In script fields you can define the new field and its source logic from underlying documents.
"script_fields": {
"my_script_field": {
"script": {
"lang": "painless",
"inline": "doc['some_field'].value + doc['another_field'].value"
}
In this way the resultset will have single field “my_script_field”

Kibana scripted field which loops through an array

I am trying to use the metricbeat http module to monitor F5 pools.
I make a request to the f5 api and bring back json, which is saved to kibana. But the json contains an array of pool members and I want to count the number which are up.
The advice seems to be that this can be done with a scripted field. However, I can't get the script to retrieve the array. eg
doc['http.f5pools.items.monitor'].value.length()
returns in the preview results with the same 'Additional Field' added for comparison:
[
{
"_id": "rT7wdGsBXQSGm_pQoH6Y",
"http": {
"f5pools": {
"items": [
{
"monitor": "default"
},
{
"monitor": "default"
}
]
}
},
"pool.MemberCount": [
7
]
},
If I try
doc['http.f5pools.items']
Or similar I just get an error:
"reason": "No field found for [http.f5pools.items] in mapping with types []"
Googling suggests that the doc construct does not contain arrays?
Is it possible to make a scripted field which can access the set of values? ie is my code or the way I'm indexing the data wrong.
If not is there an alternative approach within metricbeats? I don't want to have to make a whole new api to do the calculation and add a separate field
-- update.
Weirdly it seems that the number values in the array do return the expected results. ie.
doc['http.f5pools.items.ratio']
returns
{
"_id": "BT6WdWsBXQSGm_pQBbCa",
"pool.MemberCount": [
1,
1
]
},
-- update 2
Ok, so if the strings in the field have different values then you get all the values. if they are the same you just get one. wtf?
I'm adding another answer instead of deleting my previous one which is not the actual question but still may be helpful for someone else in future.
I found a hint in the same documentation:
Doc values are a columnar field value store
Upon googling this further I found this Doc Value Intro which says that the doc values are essentially "uninverted index" useful for operations like sorting; my hypotheses is while sorting you essentially dont want same values repeated and hence the data structure they use removes those duplicates. That still did not answer as to why it works different for string than number. Numbers are preserved but strings are filters into unique.
This “uninverted” structure is often called a “column-store” in other
systems. Essentially, it stores all the values for a single field
together in a single column of data, which makes it very efficient for
operations like sorting.
In Elasticsearch, this column-store is known as doc values, and is
enabled by default. Doc values are created at index-time: when a field
is indexed, Elasticsearch adds the tokens to the inverted index for
search. But it also extracts the terms and adds them to the columnar
doc values.
Some more deep-dive into doc values revealed it a compression technique which actually de-deuplicates the values for efficient and memory-friendly operations.
Here's a NOTE given on the link above which answers the question:
You may be thinking "Well that’s great for numbers, but what about
strings?" Strings are encoded similarly, with the help of an ordinal
table. The strings are de-duplicated and sorted into a table, assigned
an ID, and then those ID’s are used as numeric doc values. Which means
strings enjoy many of the same compression benefits that numerics do.
The ordinal table itself has some compression tricks, such as using
fixed, variable or prefix-encoded strings.
Also, if you dont want this behavior then you can disable doc-values
OK, solved it.
https://discuss.elastic.co/t/problem-looping-through-array-in-each-doc-with-painless/90648
So as I discovered arrays are prefiltered to only return distinct values (except in the case of ints apparently?)
The solution is to use params._source instead of doc[]
The answer for why doc doesnt work
Quoting below:
Doc values are a columnar field value store, enabled by default on all
fields except for analyzed text fields.
Doc-values can only return "simple" field values like numbers, dates,
geo- points, terms, etc, or arrays of these values if the field is
multi-valued. It cannot return JSON objects
Also, important to add a null check as mentioned below:
Missing fields
The doc['field'] will throw an error if field is
missing from the mappings. In painless, a check can first be done with
doc.containsKey('field')* to guard accessing the doc map.
Unfortunately, there is no way to check for the existence of the field
in mappings in an expression script.
Also, here is why _source works
Quoting below:
The document _source, which is really just a special stored field, can
be accessed using the _source.field_name syntax. The _source is loaded
as a map-of-maps, so properties within object fields can be accessed
as, for example, _source.name.first.
.
Responding to your comment with an example:
The kyeword here is: It cannot return JSON objects. The field doc['http.f5pools.items'] is a JSON object
Try running below and see the mapping it creates:
PUT t5/doc/2
{
"items": [
{
"monitor": "default"
},
{
"monitor": "default"
}
]
}
GET t5/_mapping
{
"t5" : {
"mappings" : {
"doc" : {
"properties" : {
"items" : {
"properties" : {
"monitor" : { <-- monitor is a property of items property(Object)
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
}

Select and Update all matching Documents

We are trying to do the following and any help would be appreciated.
Say you make a search and 100,000 documents match.
We would like to increment a counter in each document that matched. Then at the same time select the first page say the first 50.
Can this be done in one operation or may be a parallel scenario.
You could try a multi search query for such kind of maneuver:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html
Basically you add several queries in a single multi search which is parallelized on the ES side, and returns a list of responses per each query.
You Can use Update by query using NEST.Let me know if you still facing any issues.
You can use update by query to do this :
{
"script": {
// fieldName is field you want to increment in document
"source": "ctx._source.fieldName=params.val", // counter increment by when query match
"lang": "painless",
"params": {
"i": 0,
"val": i+1,
}
},
"query": {
// your match condition
}
}

Resources