Elasticsearch deleted document reappears using logstash


I am running ES on a single-node cluster for development.
I am deleting a document using the Delete API from Kibana. It is deleted for a second and then immediately reappears. Any help would be appreciated.
Here is the API command I use:
DELETE test/_doc/12345
{
"_index" : "test",
"_type" : "_doc",
"_id" : "12345",
"_version" : 231,
"result" : "deleted",
"_shards" : {
"total" : 3,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 899,
"_primary_term" : 1
}
GET test/_count
{
"count" : 3,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
Immediately afterwards, the deleted document is re-indexed:
GET test/_count
{
"count" : 4,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}

According to the documentation:
...If clean_run is set to true, this value will be ignored and
sql_last_value will be set to Jan 1, 1970
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#_state
That may explain why all your data is added again every 10 minutes. Remove clean_run and test again, or check whether the _version field is updated.

I found that it was a data issue. My Logstash JDBC statement checks for modificationdate greater than sql_last_value, and the scheduler is set to run every 10 seconds. The documents that reappeared had a modificationdate in the future; changing it to the current date solved the problem.
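To see why such a row keeps coming back, here is a small Python simulation (not Logstash itself) of an incremental query of the form "modificationdate > :sql_last_value". It assumes the default behaviour where use_column_value is false, so sql_last_value becomes the time of the previous run. The row ids, dates and the helper name run_incremental are made up for illustration. A row dated in the future satisfies the condition on every 10-second tick, so it is selected and sent to Elasticsearch again each time.

```python
from datetime import datetime, timedelta


def run_incremental(rows, sql_last_value, run_time):
    """Mimic 'WHERE modificationdate > :sql_last_value' (hypothetical helper).

    Assumes the default mode, where the new sql_last_value is simply
    the time of this run rather than a tracked column value.
    """
    selected = [r["id"] for r in rows if r["modificationdate"] > sql_last_value]
    return selected, run_time


now = datetime(2020, 1, 1, 12, 0, 0)
rows = [
    {"id": 1, "modificationdate": now - timedelta(days=1)},       # normal row
    {"id": 12345, "modificationdate": now + timedelta(days=30)},  # future date
]

last_value = now - timedelta(days=2)  # hypothetical value saved by an earlier run
history = []
for tick in range(3):  # three scheduler runs, 10 seconds apart
    run_time = now + timedelta(seconds=10 * tick)
    selected, last_value = run_incremental(rows, last_value, run_time)
    history.append(selected)

print(history)
```

The normal row is picked up once, while the future-dated row is re-selected on every run until the clock passes its modificationdate.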

Related

How to find execution time of elasticsearch query

I am using Elasticsearch to get fast results from a Rails app. I want to know how much time a particular query took to execute. Is there any tool where I can find and compare execution times so that I can optimize my queries?

The 'took' attribute in the response object is the time, in milliseconds, that Elasticsearch took to execute the request. For example:
{
"took" : 17,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
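Because took is measured inside the cluster, it does not include the time needed to send the request, serialize the JSON response, or send it back to your Rails app. That makes it a fair number for comparing query variants. A minimal Python sketch that ranks several responses by took follows. The helper names and the two sample responses are hypothetical. In practice you would run each query several times, because a single measurement is noisy.

```python
import json


def took_ms(response_text):
    """Return the server-side 'took' value (milliseconds) of a search response."""
    return json.loads(response_text)["took"]


def rank_by_took(responses):
    """Sort (label, response_text) pairs from fastest to slowest by 'took'."""
    return sorted(
        ((label, took_ms(text)) for label, text in responses),
        key=lambda pair: pair[1],
    )


# Hypothetical responses for two variants of the same query
responses = [
    ("match_query", '{"took": 17, "timed_out": false, "hits": {"total": 0, "hits": []}}'),
    ("term_filter", '{"took": 4, "timed_out": false, "hits": {"total": 0, "hits": []}}'),
]
print(rank_by_took(responses))
```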

Detect changes during bulk indexing

We are using Elasticsearch v5.6.12 as our database and update it frequently using the bulk REST API. Sometimes an individual request doesn't change anything (i.e. the document in Elasticsearch is already up to date). How can I detect these cases?
I saw this (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html) but I'm not sure it's applicable in our situation.

You can use noop detection when checking the results of your bulk requests.
When the bulk request returns, you can iterate over each update result and check whether the result field has the value noop (as opposed to updated).
# Say the document is indexed
PUT test/doc/1
{
"test": "123"
}
# Now you want to bulk update it
POST test/doc/_bulk
{"update":{"_id": "1"}}
{"doc":{"test":"123"}} <-- this will yield `result: noop`
{"update":{"_id": "1"}}
{"doc":{"test":"1234"}} <-- this will yield `result: updated`
{"update":{"_id": "2"}}
{"doc":{"test":"3456"}, "doc_as_upsert": true} <-- this will yield `result: created`
Result:
{
"took" : 6,
"errors" : false,
"items" : [
{
"update" : {
"_index" : "test",
"_type" : "doc",
"_id" : "1",
"_version" : 2,
"result" : "noop", <-- see "noop"
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"status" : 200
}
},
{
"update" : {
"_index" : "test",
"_type" : "doc",
"_id" : "1",
"_version" : 3,
"result" : "updated", <-- see "updated"
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1,
"status" : 200
}
},
{
"update" : {
"_index" : "test",
"_type" : "doc",
"_id" : "2",
"_version" : 1,
"result" : "created", <-- see "created"
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
}
]
}
As you can see, when doc_as_upsert: true is specified for the document with id 2, the document is created and the result field has the value created.
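Noop detection applies to update actions and is controlled by the detect_noop option, which defaults to true. Below is a minimal Python sketch that walks the items array of a parsed bulk response and groups document ids by their result value. The helper name is hypothetical. Items without a result field (failed actions) are grouped under "error".

```python
import json
from collections import defaultdict


def group_bulk_results(bulk_response):
    """Group (action, _id) pairs from a parsed _bulk response by 'result'.

    Each entry in 'items' has exactly one key, the action type
    (index, create, update or delete), wrapping the per-document result.
    """
    groups = defaultdict(list)
    for item in bulk_response["items"]:
        action, info = next(iter(item.items()))
        groups[info.get("result", "error")].append((action, info.get("_id")))
    return dict(groups)


# A trimmed response in the same shape as the bulk result above
response = json.loads("""
{"took": 6, "errors": false, "items": [
  {"update": {"_id": "1", "result": "noop"}},
  {"update": {"_id": "1", "result": "updated"}},
  {"update": {"_id": "2", "result": "created"}}
]}
""")
print(group_bulk_results(response))
```

Anything under "noop" was already up to date in Elasticsearch, which is exactly the case the question asks to detect.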

Elasticsearch _reindex API not copying documents

I'm trying to upgrade an old Elasticsearch 1.5 index to 6.0. According to the docs (https://www.elastic.co/guide/en/elasticsearch/reference/6.0/reindex-upgrade.html),
I can create a new index in 6.0 and then use reindex from remote (https://www.elastic.co/guide/en/elasticsearch/reference/6.0/reindex-upgrade-remote.html).
Both instances are running inside Docker containers; I just wanted to test this locally before doing it in production.
I can see there are documents indexed in my old index.
curl -XGET 'http://localhost:9200/old_index/_search?pretty'
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "old_index",
"_type" : "item",
"_id" : "92",
"_score" : 1.0,
"_source":{"user_id":3,"slug":"asdfaisjeilej","name":"lake.jpgasdad","item_type":"image","created_at":"2018-01-23T18:11:30Z","deleted_at":null,"content_length":1252171}
}]}
}
After creating a new index (new_index) in my Elasticsearch 6.0 instance with a slightly different mapping (string types changed to text), I reindex from remote using the following command (note that the 6.0 instance is running on port 9400):
curl -XPOST 'localhost:9400/_reindex?pretty' -H 'Content-Type: application/json' -d'
{
"source": {
"remote": {
"host": "http://localhost:9200"
},
"index": "old_index"
},
"dest": {
"index": "new_index"
}
}'
I get the following response:
{
"took" : 136,
"timed_out" : false,
"total" : 0,
"updated" : 0,
"created" : 0,
"deleted" : 0,
"batches" : 0,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
So basically, documents from old_index are not being copied to new_index, and I have no idea why this is happening. Is there a step I'm missing? As far as I can tell, I'm following the Elasticsearch docs exactly.
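A response with an empty failures array but total 0 and batches 0, as above, means nothing was read from the source at all, so the problem is on the read side rather than in the new mapping. A small Python helper can encode that reading of the response fields. It is hypothetical and not part of any Elasticsearch client. It uses only the fields _reindex returns: total, created, updated and failures.

```python
def diagnose_reindex(resp):
    """Summarise a parsed _reindex response body (a dict).

    Hypothetical helper: a rough heuristic over the returned counters.
    """
    if resp.get("failures"):
        return "failed: inspect the failures array"
    if resp.get("total", 0) == 0:
        # Nothing was read from the source, so nothing could be written.
        return "no documents read from source"
    written = resp.get("created", 0) + resp.get("updated", 0)
    return f"copied {written} of {resp['total']} documents"


# The response from the question, trimmed to the relevant fields
response = {"total": 0, "updated": 0, "created": 0, "batches": 0, "failures": []}
print(diagnose_reindex(response))
```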

As I mentioned, I also had the same issue while migrating from Elasticsearch 2 to Elasticsearch 6, until I tested reindex from remote in a staging environment without Docker.
My workaround was to create an instance of the old version (not on Docker), load it from a backup, and reindex from it into an Elasticsearch 6 instance that was also not running on Docker.
If you still want to run Elasticsearch 6 on Docker, you can always mount the data into your container.
Hope you find it helpful.

Elasticsearch CouchDB River no hit

I have a problem with CouchDB and Elasticsearch. I use Docker to set them up. I have a working CouchDB container on the default port. Now I use this container:
registry.hub.docker.com/u/jeko/elasticsearch-river-couchdb/
And I register a new CouchDB river connection with this:
curl -X PUT '127.0.0.1:9200/_river/testdb/_meta' -d ' { "type" : "couchdb", "couchdb" : { "host" : "couchdb", "port" : 5984, "db" : "articles", "filter" : null }, "index" : { "index" : "articles", "type" : "articles", "bulk_size" : "100", "bulk_timeout" : "10ms" } }'
to get Elasticsearch working with the CouchDB river. Then I checked the documents with curl host/articles/articles/_search?pretty=true, but the hits are empty:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
I turned on debug logging and checked the log file. The output is here: http://pastebin.com/ETkNmJzT
The only conspicuous thing I found is this line: [2015-02-20 14:04:24,554][DEBUG][plugins ] [Arc] [/elasticsearch/plugins/river-couchdb/_site] directory does not exist.
But I don't understand why it doesn't work. I can curl the IP.

Elastic Search Index Status

I am trying to set up a scripted reindex operation as suggested in: http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/
To follow the suggestion of creating a new index, aliasing it, and then deleting the old index, I need a way to tell when the indexing operation on the new index is complete, ideally via the REST interface.
It has 80 million rows to index, which can take a few hours.
I can't find anything helpful in the docs.

You can try the _stats API: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-stats.html
For example:
{
"_shards" : {
"total" : 10,
"successful" : 5,
"failed" : 0
},
"_all" : {
"primaries" : {
"docs" : {
"count" : 0,
"deleted" : 0
},
"store" : {
"size_in_bytes" : 575,
"throttle_time_in_millis" : 0
},
"indexing" : {
"index_total" : 0,
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time_in_millis" : 0,
"delete_current" : 0,
"noop_update_total" : 0,
"is_throttled" : false,
"throttle_time_in_millis" : 0
},
I think you can compare _all.total.docs.count with _all.total.indexing.index_current.
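That comparison can be turned into a polling check. Below is a minimal Python sketch that reads a parsed _stats response and reports whether indexing looks finished. The function name and the expected_docs input (for example, the 80 million source rows) are hypothetical. index_current only counts operations in flight at that instant and can briefly drop to 0 between bulk batches, so the sketch also requires docs.count to have caught up. Treat the result as a heuristic: docs.count only reflects refreshed documents and may count nested objects separately.

```python
def indexing_looks_finished(stats, expected_docs):
    """Heuristic completion check on a parsed GET <index>/_stats response.

    expected_docs is a hypothetical input, e.g. the number of source rows.
    """
    total = stats["_all"]["total"]
    docs_count = total["docs"]["count"]             # refreshed documents
    in_flight = total["indexing"]["index_current"]  # indexing ops right now
    # index_current alone can read 0 between bulk batches, so also
    # require that the document count has reached the expected total.
    return in_flight == 0 and docs_count >= expected_docs


stats = {"_all": {"total": {"docs": {"count": 80, "deleted": 0},
                            "indexing": {"index_current": 0}}}}
print(indexing_looks_finished(stats, expected_docs=80))
```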
