Elasticsearch Bulk Index - Update only if exists

Elasticsearch Bulk Index - Update only if exists - elasticsearch

I'm using Elasticsearch Bulk Index to update some stats of a documents, but it may happen the document I am trying to update does not exist - in this case I want it to do nothing.
I don't want it to create the document in this case.
I haven't found anything in the docs, or perhaps missed it.
My current actions (In this case it creates the document):
{
update: {
_index: "index1",
_type: "interaction",
_id: item.id
}
},
{
script: {
file: "update-stats",
lang: "groovy",
params: {
newCommentsCount: newRetweetCount,
}
},
upsert: normalizedItem
}
How do I update the document only if it exists, otherwise nothing?
Thank you

Dont use upsert and use a normal update.
Also if the document does not exist while updating , the update will fail.
There by it should work well for you.

Following worked for me with elasticsearch 7.15.2 (need to check lowest supported version for this, ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#update-api-example)
curl --location --request POST 'http://127.0.0.1:9200/exp/_update/8' \
--header 'Content-Type: application/json' \
--data-raw '
{
"scripted_upsert": true,
"script": {
"source": "if ( ctx.op == \"create\" ) {ctx.op=\"noop\"} else {ctx._source.name=\"updatedName\"} ",
"params": {
"count": 4
}
},
"upsert": {}
}
'
If ES is about to create a new record (ctx.op is "create" then we change the op to "noop" and nothing is done, otherwise we do the normal update through the script.

Related

Get data from only the latest Elastic index in Grafana

I have a series of indexes in Elastic, myindex-YYYY.MM.DD. In a Grafana panel, I want to read data only from the latest such index each time. I have created a datasource [myindex-]YYYY.MM.DD with pattern Daily, but this reads from all indexes. I can't find out whether limiting to the latest index should be done in the data source or in the panel options.
An alternative could be to filter the documents so that I get only those whose #timestamp equals the max #timestamp, but I can't figure out this either. I can get the max #timestamp with this:
GET /myindex-*/_search
{
"size": 0,
"aggs": {
"max_timestamp": { "max": { "field": "#timestamp" } }
}
}
I’d need to save the result in a variable and use it in another query, but I can’t find a way to do this in Grafana.

My conclusion (from reading whatever I could find and from the absence of answers to this question) is that what I want is not possible to do directly. I ended up creating a myindex-latest alias to the latest of the myindex-YYYY.MM.DD series. I did this by running a script similar to the following (in my case it's being run by Logstash after creation of myindex-YYYY.MM.DD finishes):
#!/bin/bash
#
# This script creates elastic alias myindex-latest for the index
# myindex-YYYY.MM.DD, where YYYY.MM.DD is the current date.
curdate=`date +%Y.%m.%d`
read -r -d '' JSON <<EOF1
{
"actions": [
{
"remove": {
"index": "*",
"alias": "myindex-latest"
}
},
{
"add": {
"index": "myindex-$curdate",
"alias": "myindex-latest"
}
}
]
}
EOF1
curl -X POST \
-H "Content-Type: application/json" \
"http://es01:9200/_aliases" \
-d "$JSON"

How to create a duplicate index in ElasticSearch from existing index?

I have an existing index with mappings and data in ElasticSearch which I need to duplicate for testing new development. Is there anyway to create a temporary/duplicate index from the already existing one?
Coming from an SQL background, I am looking at something equivalent to
SELECT *
INTO TestIndex
FROM OriginalIndex
WHERE 1 = 0
I have tried the Clone API but can't get it to work.
I'm trying to clone using:
POST /originalindex/_clone/testindex
{
}
But this results in the following exception:
{
"error": {
"root_cause": [
{
"type": "invalid_type_name_exception",
"reason": "Document mapping type name can't start with '_', found: [_clone]"
}
],
"type": "invalid_type_name_exception",
"reason": "Document mapping type name can't start with '_', found: [_clone]"
},
"status": 400
}
I know someone would guide me quickly. Thanks in advance all you wonderful folks.

First you have to set the source index to be read-only
PUT /originalindex/_settings
{
"settings": {
"index.blocks.write": true
}
}
Then you can clone
POST /originalindex/_clone/testindex
If you need to copy documents to a new index, you can use the reindex api
curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type:
application/json' -d'
{
"source": {
"index": "someindex"
},
"dest": {
"index": "someindex_copy"
}
}
'
(See: https://wrossmann.medium.com/clone-an-elasticsearch-index-b3e9b295d3e9)

Shortly after posting the question, I figured out a way.
First, get the properties of original index:
GET originalindex
Copy the properties and put to a new index:
PUT /testindex
{
"aliases": {...from the above GET request},
"mappings": {...from the above GET request},
"settings": {...from the above GET request}
}
Now I have a new index for testing.

Elasticsearch 5.4.0 - How to add new field to existing document

In Production, we already had 2000+ documents. we need to add new field into existing document. is it possible to add new field ? How can i add new field to exisitng field

You can use the update by query API in order to add a new field to all your existing documents:
POST your_index/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"inline": "ctx._source.new_field = 0",
"lang": "painless"
}
}
Note: if your new field is a string, change 0 to '' instead

We can also add the new field using curl and directly running the following command in the terminal.
curl -X PUT "localhost:9200/you_index/_mapping/defined_mapping" -H 'Content-Type: application/json' -d '{ "properties":{"field_name" : {"type" : type_of_data}} }'

how insert data to Elasticsearch without id

I insert data to Elasticsearch with id 123
localhost:9200/index/type/123
but I do not know what will next id inserted
how insert data to Elasticsearch without id in localhost:9200/index/type?

The index operation can be executed without specifying the id. In such a case, an id will be generated automatically. In addition, the op_type will automatically be set to create. Here is an example (note the POST used instead of PUT):
$ curl -XPOST 'http://localhost:9200/twitter/tweet/' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'

In my case, using nodejs and the elasticsearch package I did it this way using the client:
client.index ()
var elasticsearch = require ('elasticsearch');
let client = new elasticsearch.Client ({
host: '127.0.0.1: 9200'
});
client.index ({
index: 'myindex'
type: 'mytype',
body: {
properti1: 'val 1',
properti2: ['y', 'z'],
properti3: true,
}
}, function (error, response) {
if (error) {
console.log("error: ", error);
} else {
console.log("response: ", response);
}
});
if an id is not specified, elasticsearch will generate one automatically

In my case, I was trying to add a document directly to an index, e.g. localhost:9200/messages, as opposed to localhost:9200/someIndex/messages.
I had to append /_doc to the URL for my POST to succeed: localhost:9200/messages/_doc. Otherwise, I was getting an HTTP 405:
{"error":"Incorrect HTTP method for uri [/messages] and method [POST], allowed: [GET, PUT, HEAD, DELETE]","status":405}
Here's my full cURL request:
$ curl -X POST "localhost:9200/messages/_doc" -H 'Content-Type:
application/json' -d'
{
"user": "Jimmy Doe",
"text": "Actually, my only brother!",
"timestamp": "something"
}
'
{"_index":"messages","_type":"_doc","_id":"AIRF8GYBjAnm5hquWm61","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":2,"_primary_term":3}

You can use POST request to create a new document or data object without specifying id property in the path.
curl -XPOST 'http://localhost:9200/stackoverflow/question' -d '
{
title: "How to insert data to elasticsearch without id in the path?"
}

If our data doesn’t have a natural ID, we can let Elasticsearch autogenerate one for us. The structure of the request changes: instead of using the PUT verb ("store this document at this URL"), we use the POST verb ("store this document under this URL").
The URL now contains just the _index and the _type:
curl -X POST "localhost:9200/website/blog/" -H 'Content-Type: application/json' -d'
{
"title": "My second blog entry",
"text": "Still trying this out...",
"date": "2014/01/01"
}
'
The response is similar to what we saw before, except that the _id field has been generated for us:
{
"_index": "website",
"_type": "blog",
"_id": "AVFgSgVHUP18jI2wRx0w",
"_version": 1,
"created": true
}
Autogenerated IDs are 20 character long, URL-safe, Base64-encoded GUID strings. These GUIDs are generated from a modified FlakeID scheme which allows multiple nodes to be generating unique IDs in parallel with essentially zero chance of collision.
https://www.elastic.co/guide/en/elasticsearch/guide/current/index-doc.html

It's possible to leave the ID field blank and elasticsearch will assign it one. For example a _bulk insert will look like
{"create":{"_index":"products","_type":"product"}}\n
{JSON document 1}\n
{"create":{"_index":"products","_type":"product"}}\n
{JSON document 2}\n
{"create":{"_index":"products","_type":"product"}}\n
{JSON document 3}\n
...and so on
The IDs will look something like 'AUvGyJMOOA8IPUB04vbF'

Delete all documents from index/type without deleting type

I know one can delete all documents from a certain type via deleteByQuery.
Example:
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"term" : { "user" : "kimchy" }
}
}'
But i have NO term and simply want to delete all documents from that type, no matter what term. What is best practice to achieve this? Empty term does not work.
Link to deleteByQuery

I believe if you combine the delete by query with a match all it should do what you are looking for, something like this (using your example):
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"match_all" : {}
}
}'
Or you could just delete the type:
curl -XDELETE http://localhost:9200/twitter/tweet
Note: XDELETE is deprecated for later versions of ElasticSearch

The Delete-By-Query plugin has been removed in favor of a new Delete By Query API implementation in core. Read here
curl -XPOST 'localhost:9200/twitter/tweet/_delete_by_query?conflicts=proceed&pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}'

From ElasticSearch 5.x, delete_by_query API is there by default
POST: http://localhost:9200/index/type/_delete_by_query
{
"query": {
"match_all": {}
}
}

You can delete documents from type with following query:
POST /index/type/_delete_by_query
{
"query" : {
"match_all" : {}
}
}
I tested this query in Kibana and Elastic 5.5.2

Torsten Engelbrecht's comment in John Petrones answer expanded:
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d
'{
"query":
{
"match_all": {}
}
}'
(I did not want to edit John's reply, since it got upvotes and is set as answer, and I might have introduced an error)

Starting from Elasticsearch 2.x delete is not anymore allowed, since documents remain in the index causing index corruption.

Since ElasticSearch 7.x, delete-by-query plugin was removed in favor of new Delete By Query API.
The curl option:
curl -X POST "localhost:9200/my-index/_delete_by_query" -H 'Content-Type: application/json' -d' { "query": { "match_all":{} } } '
Or in Kibana
POST /my-index/_delete_by_query
{
"query": {
"match_all":{}
}
}

The above answers no longer work with ES 6.2.2 because of Strict Content-Type Checking for Elasticsearch REST Requests. The curl command which I ended up using is this:
curl -H'Content-Type: application/json' -XPOST 'localhost:9200/yourindex/_doc/_delete_by_query?conflicts=proceed' -d' { "query": { "match_all": {} }}'

In Kibana Console:
POST calls-xin-test-2/_delete_by_query
{
"query": {
"match_all": {}
}
}

(Reputation not high enough to comment)
The second part of John Petrone's answer works - no query needed. It will delete the type and all documents contained in that type, but that can just be re-created whenever you index a new document to that type.
Just to clarify:
$ curl -XDELETE 'http://localhost:9200/twitter/tweet'
Note: this does delete the mapping! But as mentioned before, it can be easily re-mapped by creating a new document.

Note for ES2+
Starting with ES 1.5.3 the delete-by-query API is deprecated, and is completely removed since ES 2.0
Instead of the API, the Delete By Query is now a plugin.
In order to use the Delete By Query plugin you must install the plugin on all nodes of the cluster:
sudo bin/plugin install delete-by-query
All of the nodes must be restarted after the installation.
The usage of the plugin is the same as the old API. You don't need to change anything in your queries - this plugin will just make them work.
*For complete information regarding WHY the API was removed you can read more here.

You have these alternatives:
1) Delete a whole index:
curl -XDELETE 'http://localhost:9200/indexName'
example:
curl -XDELETE 'http://localhost:9200/mentorz'
For more details you can find here -https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html
2) Delete by Query to those that match:
curl -XDELETE 'http://localhost:9200/mentorz/users/_query' -d
'{
"query":
{
"match_all": {}
}
}'
*Here mentorz is an index name and users is a type

I'm using elasticsearch 7.5 and when I use
curl -XPOST 'localhost:9200/materials/_delete_by_query?conflicts=proceed&pretty' -d'
{
"query": {
"match_all": {}
}
}'
which will throw below error.
{
"error" : "Content-Type header [application/x-www-form-urlencoded] is not supported",
"status" : 406
}
I also need to add extra -H 'Content-Type: application/json' header in the request to make it works.
curl -XPOST 'localhost:9200/materials/_delete_by_query?conflicts=proceed&pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}'
{
"took" : 465,
"timed_out" : false,
"total" : 2275,
"deleted" : 2275,
"batches" : 3,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}

Just to add couple cents to this.
The "delete_by_query" mentioned at the top is still available as a plugin in elasticsearch 2.x.
Although in the latest upcoming version 5.x it will be replaced by
"delete by query api"

Elasticsearch 2.3 the option
action.destructive_requires_name: true
in elasticsearch.yml do the trip
curl -XDELETE http://localhost:9200/twitter/tweet

For future readers:
in Elasticsearch 7.x there's effectively one type per index - types are hidden
you can delete by query, but if you want remove everything you'll be much better off removing and re-creating the index. That's because deletes are only soft deletes under the hood, until the trigger Lucene segment merges*, which can be expensive if the index is large. Meanwhile, removing an index is almost instant: remove some files on disk and a reference in the cluster state.
* The video/slides are about Solr, but things work exactly the same in Elasticsearch, this is Lucene-level functionality.

If you want to delete document according to a date.
You can use kibana console (v.6.1.2)
POST index_name/_delete_by_query
{
"query" : {
"range" : {
"sendDate" : {
"lte" : "2018-03-06"
}
}
}
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Elasticsearch Bulk Index - Update only if exists - elasticsearch

Dont use upsert and use a normal update. Also if the document does not exist while updating , the update will fail. There by it should work well for you.

Related

Get data from only the latest Elastic index in Grafana

How to create a duplicate index in ElasticSearch from existing index?

Elasticsearch 5.4.0 - How to add new field to existing document

how insert data to Elasticsearch without id

Delete all documents from index/type without deleting type

Categories

Resources