Elasticsearch: run a script on document insertion (Index API)

Is it possible to specify a script to be executed when inserting a document into Elasticsearch using its Index API? This functionality exists when updating an existing document with new information using its Update API, by passing a script attribute in the HTTP request body. It would be useful in the Index API too: there may be fields the user wants auto-calculated and populated during insertion, without having to send an additional Update request afterwards just to have the script executed.

Elasticsearch 1.3
If you just need to search/filter on the fields that you'd like to add, the mapping transform capabilities that were added in 1.3.0 could possibly work for you:
The document can be transformed before it is indexed by registering a script in the transform element of the mapping. The result of the transform is indexed but the original source is stored in the _source field.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-transform.html
You can also have the same transformation run when you get a document, by adding the _source_transform URL parameter to the request:
The get endpoint will retransform the source if the _source_transform parameter is set. The transform is performed before any source filtering but it is mostly designed to make it easy to see what was passed to the index for debugging.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_get_transformed.html
However, I don't think the _search endpoint accepts the _source_transform URL parameter, so the transformation apparently can't be applied to search results. That would be a nice feature request.
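As a concrete illustration, here is a minimal sketch of registering such a transform script at index-creation time through the Python client; the index, type, and field names are hypothetical, and note that mapping transforms were deprecated in 2.0 and removed in later versions:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical index: the transform script adds a computed field before
# indexing, while _source keeps the original document untouched.
es.indices.create(index="my_index", body={
    "mappings": {
        "my_type": {
            "transform": {
                "script": "ctx._source['title_upper'] = ctx._source['title'].toUpperCase()",
                "lang": "groovy"
            },
            "properties": {
                "title": {"type": "string"}
            }
        }
    }
})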
Elasticsearch 1.4
Elasticsearch 1.4 added a couple of features which make all this much nicer. As you mentioned, the update API allows you to specify a script to be executed. The update API in 1.4 can also accept a default document to be used in the case of an upsert. From the 1.4 docs:
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : "ctx._source.counter += count",
    "params" : {
        "count" : 4
    },
    "upsert" : {
        "counter" : 1
    }
}'
In the example above, if the document doesn't exist, the contents of the upsert key are used to initialize it; the script is not run in that case. So in the case above, the counter key in the newly created document will have a value of 1.
Now, if we set scripted_upsert to true (scripted_upsert is another new option in 1.4), our script will run against the newly initialized document:
curl -XPOST 'localhost:9200/test/type1/2/_update' -d '{
    "script": "ctx._source.counter += count",
    "params": {
        "count": 4
    },
    "upsert": {
        "counter": 1
    },
    "scripted_upsert": true
}'
In this example, if the document didn't exist, the counter key would have a value of 5 (initialized to 1 by the upsert document, then incremented by 4 by the script).
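For completeness, here is a sketch of the same scripted upsert through the Python client of that era (doc_type was still required then; newer clients drop it and expect the script as an object):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# scripted_upsert: the script also runs when the upsert document
# is used to initialize a brand-new document (1 + 4 = 5).
es.update(index="test", doc_type="type1", id=2, body={
    "script": "ctx._source.counter += count",
    "params": {"count": 4},
    "upsert": {"counter": 1},
    "scripted_upsert": True,
})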
Full documentation from Elasticsearch site.

Related

Updating document in Elastic App Search with script not working

I am fairly new to Elasticsearch, and I am using Elastic App Search.
I am trying to update data in Elastic App Search through a MongoDB Realm App, which basically provides triggers on CRUD operations.
I am able to add documents or update existing fields.
But the problem is that I am unable to add elements to an array field. I want to add or delete elements from the array; after some research I found out that it can be done using a script:
"script": {
"source": "ctx._source.fieldToUpdate.add(elementToAdd);",
"lang": "painless"
}
But it's just not working. I am using REST APIs to add or update data in Elastic App Search, and I am on the Elastic Cloud managed service.
UPDATE - 1
I was using App Search, and I created an engine named "articles"; when I tried to run queries using Kibana, I had to use the weird name ".ent-search-engine-documents-article".
So I tried using the same name in the Elasticsearch REST API:
POST /.ent-search-engine-documents-article/_update/docid
And it worked perfectly fine, but I want to do the same using the App Search REST API only.
To perform CRUD operations on your data stored through AppSearch, you should use the Documents API.
AppSearch does not handle nested objects and only provides 4 field types: text, number, date and geolocation. If you are posting objects, it will flatten and stringify them as you described in your comment.
It's also the case for arrays, so you can't just add elements to a field that holds an array, as it's really just a text field; you need to re-write the whole field (though AppSearch does detect arrays and handles each element separately if you use that field as a facet, for instance).
As for how to patch with the AppSearch REST API, here's a small example inspired by the official documentation:
curl -X PATCH 'https://[instance id].ent-search.[region].[provider].cloud.es.io:443/api/as/v1/engines/articles/documents' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer private-xxxxxxxxxxxxxxxxxxxx' \
-d '[
    {
        "id": "your_article_id",
        "source": "your article source",
        "lang": "painless"
    }
]'
There are also clients for several programming languages that you may find helpful or more intuitive to use.
The weird names you see for your engines, like ".ent-search-engine-documents-article", are the underlying indices on Elasticsearch, and you normally should not manipulate them directly.

Elasticsearch in-place update like Solr

In Solr I can use an In-Place Update to update the value of any field. Here the value of the popularity field is incremented by 20 each time, added onto its current value, without considering anything else:
{
    "id": "mydoc",
    "price": {"set": 99},
    "popularity": {"inc": 20}
}
For Elasticsearch, I can also use the _update API with a script to update in place:
POST /website/blog/1/_update
{
    "script": "ctx._source.popularity += 20"
}
But my problem is that I want to use the _bulk API from Python to update multiple documents in place at once with some incremental values. I've seen the documentation on how to use the _bulk endpoint to set different values with an update action payload; I'm just having some difficulty building the same POST JSON payload for _bulk with the Python elasticsearch client for the scripted-update case.
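A minimal sketch of what that could look like with the Python client's bulk helper; the index, type, and ids are the hypothetical ones from the example above, and on newer versions you would drop _type and pass the script as {"source": ..., "lang": "painless"}:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("http://localhost:9200")

# One "update" action per document; each action carries the increment script.
actions = [
    {
        "_op_type": "update",   # bulk action type
        "_index": "website",
        "_type": "blog",        # only needed in the old doc-type era
        "_id": doc_id,
        "script": "ctx._source.popularity += 20",
    }
    for doc_id in ["1", "2", "3"]
]

success, errors = bulk(es, actions)  # returns (succeeded count, error list)
print(success, errors)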

Elasticsearch query analyzing

I want to see all the tokens that are generated from the 'match' text.
Is there any specific file or capability that shows the details of query execution in Elasticsearch, or another way to see what sequence of tokens is generated when I am using 'match level' queries?
I did not use the log file to see which tokens are generated at the time of the 'match' operation. Instead, I used the _analyze endpoint.
Better to say: if you want to use the analyser of a specific index (in the case of different indices, each using its own customised analyser), put the name of the index in the URL:
POST /index_name/_analyze
{
    "text": "1174HHA8285M360"
}
This will use the default analyser defined in that index. And if we have more than one analyser in an index, we can specify it in the query as follows:
POST /index_name/_analyze
{
    "text": "1174HHA8285M360",
    "analyzer": "analyser_name"
}
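The same check can be scripted if you need it repeatedly; a small sketch with the Python client, reusing the placeholder index and analyser names from the examples above:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Ask the index's analyser how it tokenizes the sample text.
resp = es.indices.analyze(index="index_name", body={
    "text": "1174HHA8285M360",
    "analyzer": "analyser_name"
})
for token in resp["tokens"]:
    print(token["token"], token["start_offset"], token["end_offset"])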

Logstash replace old index

I'm using Logstash to create an Elasticsearch index. The steps are:
1. Logstash starts
2. Data is retrieved with a jdbc input plugin
3. Data is indexed with an elasticsearch output plugin (with a template that includes an alias)
4. Logstash stops
The first time, I get an index called myindex-1, which can be queried through the alias myindex.
The second time, I get an index called myindex-2, which can be queried through the alias myindex. The first index is now deprecated and I need to delete it just before (or after) step 4.
Do you know how to do this?
First things first, if you know the deprecated index name, then it's just a question of adding a step 5:
curl -XDELETE 'http://localhost:9200/myindex-1'
So you'd wrap your Logstash run in a script with this additional step; to my knowledge there is no option for Logstash to delete an index, as that's simply not its purpose.
But from the way you describe your situation, it seems you're trying to keep the data available while the new index is created. Could you elaborate a bit on your use case?
The reason for asking is that with the current procedure, you're likely to end up with duplicate data (old and new versions) during the indexing period.
If there is indeed a need to refresh the data, and assuming that you have an id in the data retrieved from the DB, you might consider another approach: configure two elasticsearch outputs in your Logstash, the first with action set to "delete", targeting the old entry in the previous index, and the second being your standard create into the new index. Depending on the nature of your data, there might also be other possibilities.
1. Create and populate myindex-2, but don't alias it yet
2. Simultaneously add the alias to myindex-2 and remove it from myindex-1
REST request for step 2:
POST /_aliases
{
    "actions": [
        { "remove": { "index": "myindex-1", "alias": "myindex" } },
        { "add": { "index": "myindex-2", "alias": "myindex" } }
    ]
}
Documentation here
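Putting the two answers together, a minimal sketch with the Python client (index and alias names as above) that swaps the alias atomically and then drops the deprecated index:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Atomically move the alias from the old index to the new one...
es.indices.update_aliases(body={
    "actions": [
        {"remove": {"index": "myindex-1", "alias": "myindex"}},
        {"add": {"index": "myindex-2", "alias": "myindex"}}
    ]
})

# ...then delete the deprecated index (the "step 5" from the first answer).
es.indices.delete(index="myindex-1")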

Elasticsearch find documents by another document

I want to search for documents in Elasticsearch which have exactly the same fields as a given document with id docId. E.g. the user calls the API with a docId, and I want to filter docs such that all the docs returned fulfil some parameters of docId.
For example, can I query Elasticsearch like this:
POST similarTerms/_search
{
    "fields": [
        "_id", "title"
    ],
    "filter": {
        "query": {
            "match": {
                "title": doc[docId].title
            }
        }
    },
    "size": 30
}
I know I can fetch the document with docId and then prepare the above query, but can I somehow avoid the network hop, as even milliseconds of improvement are of great concern for my app?
Thanks
This is a text-book scenario for the "more like this" API. Quote from the docs:
The more like this (mlt) API allows to get documents that are "like" a specified document. Here is an example:

$ curl -XGET 'http://localhost:9200/twitter/tweet/1/_mlt?mlt_fields=tag,content&min_doc_freq=1'

The API simply results in executing a search request with moreLikeThis query (http parameters match the parameters to the more_like_this query). This means that the body of the request can optionally include all the request body options in the search API (aggs, from/to and so on). Internally, the more like this API is equivalent to performing a boolean query of more_like_this_field queries, with one query per specified mlt_fields.
If you plan on testing this (like I did) with only one document in the index, make sure you also set min_term_freq=0 and min_doc_freq=0: GET /my_index/locations/1/_mlt?min_term_freq=0&min_doc_freq=0
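The query form also answers the original network-hop concern: a more_like_this query can reference the source document by id, so you never have to fetch it yourself. A sketch against a more recent Elasticsearch with the Python client, with index and id as placeholders:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# "like" points at an existing document by id; Elasticsearch expands its
# terms internally, so there is no extra round trip from the application.
resp = es.search(index="similarTerms", body={
    "_source": ["title"],
    "size": 30,
    "query": {
        "more_like_this": {
            "fields": ["title"],
            "like": [{"_index": "similarTerms", "_id": "docId"}],
            "min_term_freq": 1,
            "min_doc_freq": 1
        }
    }
})
for hit in resp["hits"]["hits"]:
    print(hit["_id"], hit["_source"].get("title"))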
