Elasticsearch script_fields to update another field?

Is there a way to use the output of an Elasticsearch script_fields to update another field in the index?
I have an index in Elasticsearch 1.x which has _timestamp enabled, but not stored (see below for the mapping).
This means that the timestamp can be accessed in searches, or using script_fields like this:
GET twitter/_search
{
  "script_fields": {
    "script1": {
      "script": "_fields['_timestamp']"
    }
  }
}
I need to extract this timestamp field and store it in the index. It is easy enough to write a script that copies any other field via the update API, e.g.
ctx._source.t1=ctx._source.message
But how can I use the value from the script_fields output to update another field in the index? I want the field 'tcopy' to get the value of the timestamp for each document.
Further, I tried to use Java to get the values as below, but it returned null.
SearchResponse response = client.prepareSearch("twitter")
.setQuery(QueryBuilders.matchAllQuery())
.addScriptField("test", "doc['_timestamp'].value")
.execute().actionGet();
The mapping:
{
  "mappings": {
    "tweet": {
      "_timestamp": {
        "enabled": true,
        "doc_values": true
      },
      "properties": {
        "message": {
          "type": "string"
        },
        "user": {
          "type": "string"
        },
        "tcopy": {
          "type": "long"
        }
      }
    }
  }
}

You need to do this in two passes:
1) Run the query and get an id <-> timestamp mapping, and
2) Then run a bulk update with those timestamps.
So to extract the timestamp data from your twitter index you can, for instance, use elasticdump like this:
elasticdump \
--input=http://localhost:9200/twitter \
--output=$ \
--searchBody '{"script_fields": {"ts": {"script": "doc._timestamp.value"}}}' > twitter.json
This will produce a file called twitter.json having the following content:
{"_index":"twitter","_type":"tweet","_id":"1","_score":1,"fields":{"ts":[1496806671021]}}
{"_index":"twitter","_type":"tweet","_id":"2","_score":1,"fields":{"ts":[1496807154630]}}
{"_index":"twitter","_type":"tweet","_id":"3","_score":1,"fields":{"ts":[1496807161591]}}
You can then easily use that file to update your documents. First, create a shell script named read.sh:
#!/bin/sh
# Read the elasticdump output line by line and write each timestamp
# into the tcopy field of the corresponding document.
while read LINE; do
  INDEX=$(echo "${LINE}" | jq '._index' | sed "s/\"//g");
  TYPE=$(echo "${LINE}" | jq '._type' | sed "s/\"//g");
  ID=$(echo "${LINE}" | jq '._id' | sed "s/\"//g");
  TS=$(echo "${LINE}" | jq '.fields.ts[0]');
  curl -XPOST "http://localhost:9200/$INDEX/$TYPE/$ID/_update" -d "{\"doc\":{\"tcopy\":"$TS"}}"
done
And finally you can run it like this:
./read.sh < twitter.json
After the script has finished running, your documents will have a tcopy field with the _timestamp value.
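To spot-check the result, you can fetch one of the updated documents (a quick verification sketch, assuming the same local cluster and document id 1):
curl -XGET "http://localhost:9200/twitter/tweet/1?pretty"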

The _timestamp field can be accessed using Java. Then we can use the Update API to set the new field. The request would look like:
SearchResponse response = client.prepareSearch("twitter2")
.setQuery(QueryBuilders.matchAllQuery())
.addScriptField("test", "doc['_timestamp'].value")
.execute().actionGet();
Then you can use an UpdateRequestBuilder with the value returned in the "test" script field to update each document, as sketched below.
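For one document, the update the builder issues is equivalent to the following REST call (a sketch with a hypothetical document id and the timestamp value taken from the search response; in the Java client this is what a request built via client.prepareUpdate(...) would send):
curl -XPOST "http://localhost:9200/twitter2/tweet/1/_update" -d '{
  "doc": { "tcopy": 1496806671021 }
}'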

Related

Get data from only the latest Elastic index in Grafana

I have a series of indexes in Elastic, myindex-YYYY.MM.DD. In a Grafana panel, I want to read data only from the latest such index each time. I have created a datasource [myindex-]YYYY.MM.DD with pattern Daily, but this reads from all indexes. I can't find out whether limiting to the latest index should be done in the data source or in the panel options.
An alternative could be to filter the documents so that I get only those whose #timestamp equals the max #timestamp, but I can't figure out this either. I can get the max #timestamp with this:
GET /myindex-*/_search
{
"size": 0,
"aggs": {
"max_timestamp": { "max": { "field": "#timestamp" } }
}
}
I’d need to save the result in a variable and use it in another query, but I can’t find a way to do this in Grafana.
My conclusion (from reading whatever I could find and from the absence of answers to this question) is that what I want is not possible to do directly. I ended up creating a myindex-latest alias to the latest of the myindex-YYYY.MM.DD series. I did this by running a script similar to the following (in my case it's being run by Logstash after creation of myindex-YYYY.MM.DD finishes):
#!/bin/bash
#
# This script creates elastic alias myindex-latest for the index
# myindex-YYYY.MM.DD, where YYYY.MM.DD is the current date.
curdate=`date +%Y.%m.%d`
read -r -d '' JSON <<EOF1
{
"actions": [
{
"remove": {
"index": "*",
"alias": "myindex-latest"
}
},
{
"add": {
"index": "myindex-$curdate",
"alias": "myindex-latest"
}
}
]
}
EOF1
curl -X POST \
-H "Content-Type: application/json" \
"http://es01:9200/_aliases" \
-d "$JSON"

Trino/presto with elastic : how to search nested objects?

I'm new to Trino and I'm trying to use it to query nested objects in Elasticsearch.
This is my mapping in Elasticsearch:
{
  "product_index": {
    "mappings": {
      "properties": {
        "id": { "type": "keyword" },
        "name": { "type": "keyword" },
        "linked_products": {
          "type": "nested",
          "properties": {
            "id": { "type": "keyword" }
          }
        }
      }
    }
  }
}
I need to perform a query on the id field under linked_products.
What is the syntax in Trino to perform a query on the id field?
Do I need special definitions on the target index mapping in Elasticsearch to map the nested section for Trino?
=========================================================
Hi,
I will try to add some clarifications to my question.
We are trying to query the data according to the id field.
This is the query in Elastic:
GET product_index/_search
{
  "query": {
    "nested": {
      "path": "linked_products",
      "query": {
        "bool": {
          "should": [
            { "match": { "linked_products.id": 123 } }
          ]
        }
      }
    }
  }
}
We tried to query the id field in 2 ways:
1) A Trino query:
select count(*)
from es_table aaa
where any_match(aaa.linked_products, x-> x.id=123)
When we query on the id field, the pushdown to Elasticsearch doesn't happen and the connector retrieves all the documents into Trino (this only happens with queries on nested documents).
2) Sending a raw ES query from Trino to Elasticsearch:
SELECT * FROM es.default."$query:"
It works, but when we try to retrieve ids that match many documents we get a timeout from the Elasticsearch client.
I can't tell from the documentation whether it is possible to use scrolling with the raw ES query to avoid the timeout problem.
Trino maps nested object type to a ROW the same way that it maps a standard object type during a read. The nested designation itself serves no purpose to Trino since it only determines how the object is stored in Elasticsearch.
Assume we push the following document to your index.
curl -X POST "localhost:9200/product_index/_doc?pretty"
-H 'Content-Type: application/json' -d'
{
"id": "1",
"name": "foo",
"linked_products": {
"id": "123"
}
}
'
The way you would read this out in Trino would just be to use the standard ROW syntax.
SELECT
id,
name,
linked_products.id
FROM elasticsearch.default.product_index;
Result:
|id |name|id |
|---|----|---|
|1 |foo |123|
This is fine and well, but judging from the fact that the name of your nested object is plural, I'll assume you want to store an array of objects like so.
curl -X POST "localhost:9200/product_index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
"id": "2",
"name": "bar",
"linked_products": [
{
"id": "123"
},
{
"id": "456"
}
]
}
'
If you run the same query as above, with the second document inserted, you'll get the following error.
SQL Error [58]: Query failed (#20210604_202723_00009_nskc4): Expected object for field 'linked_products' of type ROW: [{id=123}, {id=456}] [ArrayList]
This is because Trino has no way of knowing which fields are arrays from the default Elasticsearch mapping. So to enable querying over this array, you'll need to follow the instructions in the docs to explicitly identify that field as an Array type in Trino using the _meta field. Here is the command that would be used in this example to identify linked_products as an ARRAY.
curl --request PUT \
--url localhost:9200/product_index/_mapping \
--header 'content-type: application/json' \
--data '
{
  "_meta": {
    "presto": {
      "linked_products": {
        "isArray": true
      }
    }
  }
}'
Now you will need to account, in the SELECT statement, for the fact that linked_products is an ARRAY of type ROW. Not all array indexes will have values, so you should use the index-safe element_at function to avoid errors.
SELECT
id,
name,
element_at(linked_products, 1).id AS id1,
element_at(linked_products, 2).id AS id2
FROM elasticsearch.default.product_index;
Result:
|id |name|id1|id2 |
|---|----|---|----|
|1 |foo |123|NULL|
|2 |bar |123|456 |
=========================================================
Update to answer #gil bob's updated question.
There is currently no support for pushdown aggregates in the Elasticsearch connector, but this is getting added in PR 7131.
As a workaround until that pushdown lands, you can set the elasticsearch.request-timeout property in your elasticsearch.properties catalog file to increase the request timeout. If it's taking Elasticsearch this long to return the data, this will need to be set whether you run the aggregation in Trino or in Elasticsearch.
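For reference, the property lives in the connector's catalog file (commonly etc/catalog/elasticsearch.properties); the exact value below is only an illustration:
# etc/catalog/elasticsearch.properties
elasticsearch.request-timeout=2m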

Elasticsearch 5.4.0 - How to add new field to existing document

In production we already have 2000+ documents. We need to add a new field to the existing documents. Is it possible to add a new field? How can I add a new field to the existing documents?
You can use the update by query API in order to add a new field to all your existing documents:
POST your_index/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"inline": "ctx._source.new_field = 0",
"lang": "painless"
}
}
Note: if your new field is a string, change 0 to '' instead
We can also add the new field to the mapping using curl, by directly running the following command in the terminal:
curl -X PUT "localhost:9200/your_index/_mapping/defined_mapping" -H 'Content-Type: application/json' -d '{ "properties": {"field_name": {"type": "<type_of_data>"}} }'

ElasticSearch - Reindexing your data with zero downtime

https://www.elastic.co/blog/changing-mapping-with-zero-downtime/
I'm trying to create a new index and reindex my data with zero downtime following this guide.
Now I have an index called "photoshooter" and I follow these steps:
1) Create new index "photoshooter_v1" with the new mapping... (Done)
2) Create alias...
curl -XPOST localhost:9200/_aliases -d '
{
"actions": [
{ "add": {
"alias": "photoshooter",
"index": "photoshooter_v1"
}}
]
}'
and I get this error...
{
"error": "InvalidAliasNameException[[photoshooter_v1] Invalid alias name [photoshooter], an index exists with the same name as the alias]",
"status": 400
}
I think I'm missing something in the logic.
Let's say your current index is named "photoshooter", if I'm guessing it right.
Now create an alias for this index first:
curl -XPOST localhost:9200/_aliases -d '
{
  "actions": [
    { "add": {
        "alias": "photoshooter_docs",
        "index": "photoshooter"
    }}
  ]
}'
Test it: curl -XGET 'localhost:9200/photoshooter_docs/_search'
Note: from now on you will use 'photoshooter_docs' as the index name to interact with your index, which is actually 'photoshooter'.
Now create a new index with your new mapping, let's say we name it 'photoshooter_v2', and copy your 'photoshooter' index data to the new index (photoshooter_v2).
Once you have copied all your data, simply switch the alias from the previous index to the new one:
curl -XPOST localhost:9200/_aliases -d '
{
"actions": [
{ "remove": {
"alias": "photoshooter_docs",
"index": "photoshooter"
}},
{ "add": {
"alias": "photoshooter_docs",
"index": "photoshooter_v2"
}}
]
}'
Test it again: curl -XGET 'localhost:9200/photoshooter_docs/_search'
Congrats, you have changed your mapping with zero downtime.
And to copy the data you can use a tool like this:
https://github.com/mallocator/Elasticsearch-Exporter
Note: this tool also copies the mapping from the old index to the new one, which you might not want to do, so read its documentation or configure it according to your use case.
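If your cluster is on 2.3 or newer, the built-in _reindex API is an alternative to an external copy tool for this step (a minimal sketch, assuming photoshooter_v2 already exists with the new mapping):
curl -XPOST 'localhost:9200/_reindex' -d '
{
  "source": { "index": "photoshooter" },
  "dest": { "index": "photoshooter_v2" }
}'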
Thanks
Hope this helps
It's very simple: you cannot create an alias with the name of an index that already exists.
You'll need to consider a new name for the new index, re-index the data in the new one and then remove the old one to be able to give it the same name.
If you want to do that on a daily basis, you might consider adding, say, the date to your index's name and switching to it every day.

How to copy some ElasticSearch data to a new index

Let's say I have movie data in my ElasticSearch and I created them like this:
curl -XPUT "http://192.168.0.2:9200/movies/movie/1" -d'
{
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972
}'
And I have a bunch of movies from different years. I want to copy all the movies from a particular year (so, 1972) to a new index, "70sMovies", but I couldn't see how to do that.
Since Elasticsearch 2.3 you can use the built-in _reindex API,
for example:
POST /_reindex
{
"source": {
"index": "twitter"
},
"dest": {
"index": "new_twitter"
}
}
Or only a specific part by adding a filter/query
POST /_reindex
{
"source": {
"index": "twitter",
"query": {
"term": {
"user": "kimchy"
}
}
},
"dest": {
"index": "new_twitter"
}
}
Read more: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
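Applied to the movies example from the question, a sketch could look like this (note that Elasticsearch index names must be lowercase, so "70smovies" rather than "70sMovies"):
POST /_reindex
{
  "source": {
    "index": "movies",
    "query": {
      "term": { "year": 1972 }
    }
  },
  "dest": {
    "index": "70smovies"
  }
}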
The best approach would be to use the elasticsearch-dump tool: https://github.com/taskrabbit/elasticsearch-dump.
A real-world example I used:
elasticdump \
--input=http://localhost:9700/.kibana \
--output=http://localhost:9700/.kibana_read_only \
--type=mapping
elasticdump \
--input=http://localhost:9700/.kibana \
--output=http://localhost:9700/.kibana_read_only \
--type=data
Check out knapsack:
https://github.com/jprante/elasticsearch-knapsack
Once you have the plugin installed and working, you could export part of your index via query. For example:
curl -XPOST 'localhost:9200/test/test/_export' -d '{
"query" : {
"match" : {
"myfield" : "myvalue"
}
},
"fields" : [ "_parent", "_source" ]
}'
This will create a tarball with only your query results, which you can then import into another index.
To reindex a specific type from a source index to a destination index, the syntax is:
POST _reindex/
{
"source": {
"index": "source_index",
"type": "source_type",
"query": {
// add filter criteria
}
},
"dest": {
"index": "dest_index",
"type": "dest_type"
}
}
If the intent is to copy some portion of the data, or all of it, to an index with the same settings/mappings as the original index, one can use the clone API. Something like below:
POST /<index>/_clone/<target-index>
OR
PUT /<index>/_clone/<target-index>
However, if the intent is to copy the data to a new index with different settings/mappings than the original index, one can use the reindex API. Something like below:
POST _reindex/
{
"source": {
"index": "source_index",
"type": "source_type",
"query": {
// add filter criteria
}
},
"dest": {
"index": "dest_index",
"type": "dest_type"
}
}
*Note: In the case of the reindex API, the target index has to be created prior to the actual API call.
For further reading on difference between clone and reindex refer What's the difference between cloning and reindexing an index in Elasticsearch?
You can do it easily with elasticsearch-dump (https://github.com/taskrabbit/elasticsearch-dump) in three steps. In the following example I copy the index "thor" to "thor2"
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=analyzer
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=mapping
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=data
Well, the straightforward way to do this is to write code, with the API of your choice, querying for "year": 1972 and then indexing that data into a new index. You would use the Search API or the Scan and Scroll API to get all the documents and then either index them one by one or use the Bulk API (see the sketch after the links below):
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-search.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
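A minimal sketch of that approach with curl and jq (assumptions: an Elasticsearch 2.x-or-later cluster on localhost that still uses mapping types, jq installed, and a lowercase target index named "70smovies" that already exists):
#!/bin/sh
# Scroll over all 1972 movies and bulk-index them into the new index.
ES=http://localhost:9200

# Open a scroll over the matching documents
RESP=$(curl -s -XPOST "$ES/movies/_search?scroll=1m&size=500" \
       -H 'Content-Type: application/json' -d '{
  "query": { "term": { "year": 1972 } }
}')

while true; do
  # Turn each hit into a bulk "index" action line plus its _source line
  echo "$RESP" | jq -c '.hits.hits[]
    | {index: {_index: "70smovies", _type: ._type, _id: ._id}}, ._source' > bulk.json
  [ -s bulk.json ] || break

  curl -s -XPOST "$ES/_bulk" -H 'Content-Type: application/x-ndjson' \
       --data-binary @bulk.json > /dev/null

  # Continue the scroll with the id returned by the previous page
  SCROLL_ID=$(echo "$RESP" | jq -r '._scroll_id')
  RESP=$(curl -s -XPOST "$ES/_search/scroll" -H 'Content-Type: application/json' \
         -d "{\"scroll\":\"1m\",\"scroll_id\":\"$SCROLL_ID\"}")
done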
Assuming you don't want to do this via code but are looking for a direct way of doing this, I suggest the Elasticsearch Snapshot and Restore. Basically you would take a snapshot of your existing index, restore it into a new index and then use the Delete command to delete all documents with a year other than 1972.
Snapshot And Restore
The snapshot and restore module allows to create snapshots of
individual indices or an entire cluster into a remote repository. At
the time of the initial release only shared file system repository was
supported, but now a range of backends are available via officially
supported repository plugins.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html
Delete By Query API
The delete by query API allows to delete documents from one or more
indices and one or more types based on a query. The query can either
be provided using a simple query string as a parameter, or using the
Query DSL defined within the request body.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
Since v7.4 the _clone API was introduced and can easily satisfy your need (check the docs for the relevant prerequisites and the monitoring involved):
POST /<index>/_clone/<target-index>
Or:
PUT /<index>/_clone/<target-index>
You can use elasticdump --searchBody:
# Copy documents from movies to 70sMovies (filtering using query)
elasticdump \
--input=http://localhost:9200/movies \
--output=http://localhost:9200/70sMovies \
--type=data \
--searchBody="{\"query\":{\"term\":{\"username\": \"admin\"}}}" # <--- Your query here
more on elasticdump options here.
