Elasticsearch Set Processor

I am trying to use the Elasticsearch set processor to add a queue-wise constant field to an index that contains data from multiple queues. The Elasticsearch documentation is really sparse in this respect.
I am trying to use the code below to create a set processor for the index pattern default-*, but somehow it's not working:
PUT /_ingest/pipeline/set_aht
{
  "description": "sets queue wise AHT constants",
  "processors": [
    {
      "set": {
        "field": "queueAHTVal",
        "value": "10",
        "if": "queueName == 'A'"
      }
    }
  ]
}
Looking for some how-to guidance from anyone who has previously worked with the set processor in Elasticsearch.

I tried to work out a possible suggestion. If I understood your issue correctly, you want to add a new field based on a field value (queueName) when it equals 'A'?
If so, I modified your pipeline and tested it locally. The if condition is a Painless script, so document fields must be accessed through the ctx map, which is why queueName == 'A' on its own does not work.
Here is the updated pipeline code:
PUT _ingest/pipeline/set_aht
{
  "processors": [
    {
      "set": {
        "field": "queueAHTVal",
        "value": "10",
        "if": "ctx.queueName.equals('A')"
      }
    }
  ]
}
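As a quick sanity check (a minimal sketch, with document shapes assumed from your question), you can dry-run the pipeline with the simulate API before touching real data; only the first document should pick up queueAHTVal:
POST _ingest/pipeline/set_aht/_simulate
{
  "docs": [
    { "_source": { "queueName": "A" } },
    { "_source": { "queueName": "B" } }
  ]
}
If some documents have no queueName at all, a null-safe condition such as "ctx.queueName != null && ctx.queueName == 'A'" avoids a null pointer error.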
I used the _reindex API to ingest the data into another index:
POST _reindex
{
  "source": {
    "index": "espro"
  },
  "dest": {
    "index": "espro-v2",
    "pipeline": "set_aht"
  }
}
The response is:
"hits" : [
{
"_index" : "espro-v2",
"_type" : "_doc",
"_id" : "7BErVHQB3IIDvL59miT1",
"_score" : 1.0,
"_source" : {
"queueName" : "A",
"queueAHTVal" : "10"
}
},
{
"_index" : "espro-v2",
"_type" : "_doc",
"_id" : "IBEsVHQB3IIDvL59iien",
"_score" : 1.0,
"_source" : {
"queueName" : "B"
}
}
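If new documents keep arriving in the index, you might also attach the pipeline as the index's default pipeline (a sketch, assuming Elasticsearch 6.5 or later, where the index.default_pipeline setting exists), so that writes are processed without naming the pipeline on each request:
PUT /espro-v2/_settings
{
  "index.default_pipeline": "set_aht"
}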
Let me know if you need more help, or if I misunderstood your issue; I will try to help. Thank you.

Related

Convert two repeated values in array into a string

I have some old documents where a field has an array with the same value repeated twice, something like this:
"task" : [
  "first_task",
  "first_task"
],
I'm trying to convert this array into a string because it's the same value. I've seen a similar question (Convert array with 2 equal values to single value), but in my case this can't be fixed through Logstash, because it affects only old documents that are already stored.
I was thinking to do something like this:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "description": "Change task field from array to first element of this one",
          "lang": "painless",
          "source": """
            if (ctx['task'][0] == ctx['task'][1]) {
              ctx['task'] = ctx['task'][0];
            }
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_index" : "tasks",
      "_type" : "_doc",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "@timestamp" : "2022-05-03T07:33:44.652Z",
        "task" : ["first_task", "first_task"]
      }
    }
  ]
}
The result document is the following:
{
  "docs" : [
    {
      "doc" : {
        "_index" : "tasks",
        "_type" : "_doc",
        "_id" : "1",
        "_source" : {
          "@timestamp" : "2022-05-03T07:33:44.652Z",
          "task" : "first_task"
        },
        "_ingest" : {
          "timestamp" : "2022-05-11T09:08:48.150815183Z"
        }
      }
    }
  ]
}
We can see the task field is reassigned and we have the first element of the array as a value.
Is there a way to manipulate the data already stored in Elasticsearch and convert all documents with this characteristic using DSL queries?
Thanks.
You can achieve this with the _update_by_query endpoint. Here is an example:
POST tasks/_update_by_query
{
  "script": {
    "source": """
      // Guard against documents where task is missing or not a two-element list
      if (ctx._source['task'] instanceof List && ctx._source['task'].size() == 2
          && ctx._source['task'][0] == ctx._source['task'][1]) {
        ctx._source['task'] = ctx._source['task'][0];
      }
    """,
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}
The match_all query touches every document; you can narrow the scope by changing the query to match only the documents you want to fix.
Keep in mind that running a script across all documents in the index may cause performance issues while the update is in progress.
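To soften that impact (a sketch using documented _update_by_query options; the exact numbers are arbitrary), you can throttle the request, proceed past version conflicts, run it as a background task, and restrict it to documents that actually have the field:
POST tasks/_update_by_query?conflicts=proceed&requests_per_second=500&wait_for_completion=false
{
  "script": {
    "source": """
      // Same guarded script as above
      if (ctx._source['task'] instanceof List && ctx._source['task'].size() == 2
          && ctx._source['task'][0] == ctx._source['task'][1]) {
        ctx._source['task'] = ctx._source['task'][0];
      }
    """,
    "lang": "painless"
  },
  "query": {
    "exists": { "field": "task" }
  }
}
With wait_for_completion=false the call returns a task ID that you can poll through the Tasks API.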

Is there a way to enable _source on existing data?

I created an index without the _source field (for storage considerations).
I want to enable this field on the existing data. Is there a way to do that?
For example:
I will create dummy-index:
PUT /dummy-index?pretty
{
  "mappings": {
    "_doc": {
      "_source": {
        "enabled": false
      }
    }
  }
}
and I will add the following document:
PUT /dummy-index/_doc/1?pretty
{
  "name": "CoderIl"
}
When I search, I get only the hit metadata, without the name field:
{
  "_index" : "dummy-index",
  "_type" : "_doc",
  "_id" : "1",
  "_score" : 1.0
}
The question is whether I could change _source to enabled so that when I search again I get the missing data (in this example, the name field):
{
  "_index" : "dummy-index",
  "_type" : "_doc",
  "_id" : "1",
  "_score" : 1.0,
  "_source" : {
    "name" : "CoderIl"
  }
}
As clarified in the chat, the issue is that the _source field is disabled.
In the search results, he wants the stored field values, which are returned as part of _source when it is enabled, like below:
"_source" : {
  "name" : "CoderIl"
}
To achieve this, the store option must be enabled on the field. Please note this can't be changed dynamically; you have to re-index the data with the updated mapping.
Example
Index mapping
{
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "name": {
        "type": "text"
      },
      "title": {
        "type": "text",
        "store": true
      }
    }
  }
}
Index sample docs
{
  "name": "coderIL"
}
{
  "name": "coderIL",
  "title": "seconds docs"
}
Search the docs and retrieve the stored content using stored_fields:
{
  "stored_fields": [
    "title"
  ],
  "query": {
    "match": {
      "name": "coderIL"
    }
  }
}
And the search result:
"hits": [
{
"_index": "without_source",
"_type": "_doc",
"_id": "1",
"_score": 0.18232156
},
{
"_index": "without_source",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"fields": {
"title": [
"seconds docs"
]
}
}
]
The store option on a field controls this; from the same official doc:
By default, field values are indexed to make them searchable, but they
are not stored. This means that the field can be queried, but the
original field value cannot be retrieved.
Usually this doesn’t matter. The field value is already part of the
_source field, which is stored by default. If you only want to retrieve the value of a single field or of a few fields, instead of
the whole _source, then this can be achieved with source filtering.
As mentioned in the doc, store is disabled by default; if you want to save space but still retrieve a few specific fields, you can enable it on just those fields, and you need to re-index the data again.
Edit: the index option (enabled by default) controls whether a field is indexed, which is required for searching on it. The store option controls whether the original value is stored; it is used when you want the non-analyzed value, i.e. exactly what you sent to ES in your index request (the indexed value, depending on the field type, goes through text analysis). Refer to this SO question for more info.
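Note that _reindex cannot recover the old values here: it reads each document's _source, which was never stored. A sketch of an updated mapping (the index name without_source_v2 is hypothetical) into which the data would have to be re-ingested from its original external source:
PUT /without_source_v2
{
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "name": {
        "type": "text",
        "store": true
      }
    }
  }
}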

Elasticsearch sort results from several indexes so that one index has priority

I have 6 websites, let's call them A, B, C, D, E & M. M is the master website: from it you can search the contents of the others, which I've done easily by putting all the indexes, separated by commas, in the search query.
However, I have a new requirement: from every website you can search all websites (easy to do, apply the solution from M to all), BUT results from the current website should get priority.
So if I'm searching from C, the first results should be from C, and then from the others based on score.
Now, how do I give results from one index priority over the rest?
A boosting query serves this purpose well:
Sample data
POST /_bulk
{"index":{"_index":"a"}}
{"message":"First website"}
{"index":{"_index":"b"}}
{"message":"Second website"}
{"index":{"_index":"c"}}
{"message":"Third website"}
{"index":{"_index":"d"}}
{"message":"Something irrelevant"}
Query
POST /a,b,c,d/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "message": "website"
        }
      },
      "negative": {
        "terms": {
          "_index": ["b", "c", "d"]
        }
      },
      "negative_boost": 0.2
    }
  }
}
Response
{
  ...
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "a",
        "_type" : "_doc",
        "_id" : "sx-DkWsBHWmGEbsYwViS",
        "_score" : 0.2876821,
        "_source" : {
          "message" : "First website"
        }
      },
      {
        "_index" : "b",
        "_type" : "_doc",
        "_id" : "tB-DkWsBHWmGEbsYwViS",
        "_score" : 0.05753642,
        "_source" : {
          "message" : "Second website"
        }
      },
      {
        "_index" : "c",
        "_type" : "_doc",
        "_id" : "tR-DkWsBHWmGEbsYwViS",
        "_score" : 0.05753642,
        "_source" : {
          "message" : "Third website"
        }
      }
    ]
  }
}
Notes
The smaller you make the negative_boost, the more likely it is that results from the "active" index will win out over the other indices.
If you set negative_boost to 0, you guarantee that the active site's results sort first, but you discard all scores for the other sites, so the remaining sort will be arbitrary.
I reckon something like negative_boost: 0.1, which is an order-of-magnitude adjustment on relevance, should get you what you're looking for.
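For example, if the search originates from site C, the same query simply swaps which indices land in the negative clause (a sketch reusing the sample indices above, with the suggested negative_boost of 0.1):
POST /a,b,c,d/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "message": "website"
        }
      },
      "negative": {
        "terms": {
          "_index": ["a", "b", "d"]
        }
      },
      "negative_boost": 0.1
    }
  }
}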

ElasticSearch Bulk with ingest plugin

I am using the Attachment Processor in a pipeline.
All works fine, but I wanted to do multiple posts, so I tried to use the bulk API.
Bulk works fine too, but I can't find how to send the URL parameter pipeline=attachment.
This single-document request works:
POST testindex/type1/1?pipeline=attachment
{
  "data": "Y291Y291",
  "name": "Marc",
  "age": 23
}
This bulk request works:
POST _bulk
{ "index" : { "_index" : "testindex", "_type" : "type1", "_id" : "2" } }
{ "name" : "jean", "age" : 22 }
But how can I index Marc, with his data field, in bulk so that the ingest pipeline processes it?
Thanks to Val's comment, I did the following and it works fine:
POST _bulk
{ "index" : { "_index" : "testindex", "_type" : "type1", "_id" : "2", "pipeline": "attachment" } }
{ "data": "Y291Y291", "name" : "jean", "age" : 22 }

How do you bulk index documents into the default mapping of ElasticSearch?

The documentation for Elasticsearch 5.5 offers no examples of how to use the bulk operation to index documents into the default mapping of an index. It also gives no indication of why this should not be possible, unless I'm missing that somewhere else in the documentation.
The ES 5.5 documentation gives one explicit example of bulk indexing:
POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
But it also says that
The endpoints are /_bulk, /{index}/_bulk, and {index}/{type}/_bulk.
When the index or the index/type are provided, they will be used by
default on bulk items that don’t provide them explicitly.
So, the middle endpoint is valid, and it implies to me that either a) you have to explicitly provide a type in the metadata for each document indexed, or b) you can index documents into the default mapping ("_default_").
But I can't get this to work.
I've tried the /myindex/_bulk endpoint with no type specified in the metadata.
I've tried it with "_type": "_default_" specified.
I've tried /myindex/_default_/_bulk.
This has nothing to do with the _default_ mapping. This is about falling back to the default type that you specify in the URL. You can do the following:
POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
However, the following snippet is exactly equivalent:
POST /test/type1/_bulk
{ "index" : { "_id" : "1" } }
{ "field1" : "value1" }
And you can mix the two:
POST foo/bar/_bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "index" : { "_id" : "1" } }
{ "field1" : "value1" }
In this example, the first document would be indexed into test (its action line overrides the URL defaults) and the second into foo (falling back to them).
Hope this makes sense.
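For completeness (a sketch following the quoted documentation; types are still mandatory in 5.5), the index-only endpoint behaves the same way, with the type carried on each action line:
POST /test/_bulk
{ "index" : { "_type" : "type1", "_id" : "2" } }
{ "field1" : "value2" }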
