Elastic update a field with json data - elasticsearch

POST cars/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"inline": "ctx._source.addresses = [{country:'Country', countryCode : 'cr'}]",
"lang": "painless"
}
}
The script run successfully, no error raised, the output is bellow, but nothing gets updated.
{
"took" : 18092,
"timed_out" : false,
"total" : 400000,
"updated" : 400000,
"deleted" : 0,
"batches" : 400,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
Thanks

Your script needs to look like this instead:
"inline": "ctx._source.addresses = [['country':'Country', 'countryCode' : 'cr']]",
Note that the Painless doesn't handle JSON directly, you need to go through Arrays and Maps instead. As a proof, running your query above, I get the following error:
"script" : "ctx._source.addresses = [{country:'Country', countryCode : 'cr'}]",
"lang" : "painless",
"position" : {
"offset" : 25,
"start" : 0,
"end" : 50
},
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "invalid sequence of tokens near ['{'].",
"caused_by" : {
"type" : "no_viable_alt_exception",
"reason" : null
}
}

Related

ElasticSearch, simple two fields comparison with painless

I'm trying to run a query such as SELECT * FROM indexPeople WHERE info.Age > info.AgeExpectancy
Note the two fields are NOT nested, they are just json object
POST /indexPeople/_search
{
"from" : 0,
"size" : 200,
"query" : {
"bool" : {
"filter" : [
{
"bool" : {
"must" : [
{
"script" : {
"script" : {
"source" : "doc['info.Age'].value > doc['info.AgeExpectancy'].value",
"lang" : "painless"
},
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
"_source" : {
"includes" : [
"info"
],
"excludes" : [ ]
}
}
However this query fails as
{
"error" : {
"root_cause" : [
{
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"org.elasticsearch.index.fielddata.ScriptDocValues$Longs.get(ScriptDocValues.java:121)",
"org.elasticsearch.index.fielddata.ScriptDocValues$Longs.getValue(ScriptDocValues.java:115)",
"doc['info.Age'].value > doc['info.AgeExpectancy'].value",
" ^---- HERE"
],
"script" : "doc['info.Age'].value > doc['info.AgeExpectancy'].value",
"lang" : "painless",
"position" : {
"offset" : 22,
"start" : 0,
"end" : 70
}
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "indexPeople",
"node" : "c_Dv3IrlQmyvIVpLoR9qVA",
"reason" : {
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"org.elasticsearch.index.fielddata.ScriptDocValues$Longs.get(ScriptDocValues.java:121)",
"org.elasticsearch.index.fielddata.ScriptDocValues$Longs.getValue(ScriptDocValues.java:115)",
"doc['info.Age'].value > doc['info.AgeExpectancy'].value",
" ^---- HERE"
],
"script" : "doc['info.Age'].value > doc['info.AgeExpectancy'].value",
"lang" : "painless",
"position" : {
"offset" : 22,
"start" : 0,
"end" : 70
},
"caused_by" : {
"type" : "illegal_state_exception",
"reason" : "A document doesn't have a value for a field! Use doc[<field>].size()==0 to check if a document is missing a field!"
}
}
}
]
},
"status" : 400
}
Is there a way to achieve this?
What is the best way to debug it? I wanted to print the objects or look at the logs (which aren't there), but I couldn't find a way to do neither.
The mapping is:
{
"mappings": {
"_doc": {
"properties": {
"info": {
"properties": {
"Age": {
"type": "long"
},
"AgeExpectancy": {
"type": "long"
}
}
}
}
}
}
}
perhaps you already solved the issue. The reason why the query failed is clear:
"caused_by" : {
"type" : "illegal_state_exception",
"reason" : "A document doesn't have a value for a field! Use doc[<field>].size()==0 to check if a document is missing a field!"
}
Basically there is one or more document that do not have one of the queried fields. So you can achieve the result you need by using an if to check if the fields do indeed exists. If they do not exist, you can simply return false as follows:
{
"script": """
if (doc['info.Age'].size() > 0 && doc['info.AgeExpectancy'].size() > 0) {
return doc['info.Age'].value > doc['info.AgeExpectancy'].value
}
return false;
}
"""
I tested it with an Elasticsearch 7.10.2 and it works.
What is the best way to debug it
That is a though question, perhaps someone has a better answer for it. I try to list some options. Obviously, debugging requires to read carefully the error messages.
PAINLESS LAB
If you have a pretty recent version of Kibana, you can try to use the painless lab to simulate your documents and get the errors quicker and in a more focused environment.
KIBANA Scripted Field
You can try to create a bolean scripted field in the index pattern named condition. Before clicking create remember to click "preview result":
MINIMAL EXAMPLE Create a minimal example to reduce the complexity.
For this answer I used a sample index with four documents with all possible cases.
No info: { "message": "ok"}
Info.Age but not AgeExpectancy: {"message":"ok","info":{"Age":14}}
Info.AgeExpectancy but not Age: {"message":"ok","info":{"AgeExpectancy":12}}
Info.Age and AgeExpectancy: {"message":"ok","info":{"Age":14, "AgeExpectancy": 12}}

Number of records processed in logstash

We're using logstash to sync Elastic search and we've around 3 million documents. It takes 3 to 4 hours to sync. Currently all we get is, it is started and stopped. Is there any way to see how many records processed in logstash ?
If you're using Logstash 5 and higher, the Logstash Monitoring API can help you. You can see and monitor what's happening inside Logstash as it processes events. If you hit the Pipeline stats API you'll get the total number of processed events per stage and plugin (input/filter/output):
curl -XGET 'localhost:9600/_node/stats/pipelines?pretty'
You'll get this type of response in which you can clearly see at any time how many events have been processed:
{
"pipelines" : {
"test" : {
"events" : {
"duration_in_millis" : 365495,
"in" : 216485,
"filtered" : 216485,
"out" : 216485,
"queue_push_duration_in_millis" : 342466
},
"plugins" : {
"inputs" : [ {
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-1",
"events" : {
"out" : 216485,
"queue_push_duration_in_millis" : 342466
},
"name" : "beats"
} ],
"filters" : [ {
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-2",
"events" : {
"duration_in_millis" : 55969,
"in" : 216485,
"out" : 216485
},
"failures" : 216485,
"patterns_per_field" : {
"message" : 1
},
"name" : "grok"
}, {
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-3",
"events" : {
"duration_in_millis" : 3326,
"in" : 216485,
"out" : 216485
},
"name" : "geoip"
} ],
"outputs" : [ {
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-4",
"events" : {
"duration_in_millis" : 278557,
"in" : 216485,
"out" : 216485
},
"name" : "elasticsearch"
} ]
},
"reloads" : {
"last_error" : null,
"successes" : 0,
"last_success_timestamp" : null,
"last_failure_timestamp" : null,
"failures" : 0
},
"queue" : {
"type" : "memory"
}
}
}

Elasticsearch get snapshot size

I'm looking for a way to get the storage size of an specific Elasticsearch snapshot? The snapshots are located on a shared filesystem.
It seems there is no API for this?
In order to get the size or status of the elasticsearch snapshot, run snapshot status API snapshot status API
curl -X GET "localhost:9200/_snapshot/my_repository/my_snapshot/_status?pretty"
Note: Mention appropriate values in the above curl.
Sample Output:
"snapshots" : [
{
"snapshot" : "index-01",
"repository" : "my_repository",
"uuid" : "OKHNDHSKENGHLEWNALWEERTJNS",
"state" : "SUCCESS",
"include_global_state" : true,
"shards_stats" : {
"initializing" : 0,
"started" : 0,
"finalizing" : 0,
"done" : 2,
"failed" : 0,
"total" : 2
},
"stats" : {
"incremental" : {
"file_count" : 149,
"size_in_bytes" : 8229187919
},
"total" : {
"file_count" : 463,
"size_in_bytes" : 169401330819
},
"start_time_in_millis" : 1631622333285,
"time_in_millis" : 208851,
"number_of_files" : 149,
"processed_files" : 149,
"total_size_in_bytes" : 8229187919,
"processed_size_in_bytes" : 8229187919
},
"indices" : {
"graylog_130" : {
"shards_stats" : {
"initializing" : 0,
"started" : 0,
"finalizing" : 0,
"done" : 2,
"failed" : 0,
"total" : 2
},
"stats" : {
"incremental" : {
"file_count" : 149,
"size_in_bytes" : 8229187919
},
"total" : {
"file_count" : 463,
"size_in_bytes" : 169401330819
},
"start_time_in_millis" : 1631622333285,
"time_in_millis" : 208851,
"number_of_files" : 149,
"processed_files" : 149,
"total_size_in_bytes" : 8229187919,
"processed_size_in_bytes" : 8229187919
},
"shards" : {
"0" : {
"stage" : "DONE",
"stats" : {
"incremental" : {
"file_count" : 97,
"size_in_bytes" : 1807163337
},
"total" : {
"file_count" : 271,
"size_in_bytes" : 84885391182
},
"start_time_in_millis" : 1631622334048,
"time_in_millis" : 49607,
"number_of_files" : 97,
"processed_files" : 97,
"total_size_in_bytes" : 1807163337,
"processed_size_in_bytes" : 1807163337
}
},
"1" : {
"stage" : "DONE",
"stats" : {
"incremental" : {
"file_count" : 52,
"size_in_bytes" : 6422024582
},
"total" : {
"file_count" : 192,
"size_in_bytes" : 84515939637
},
"start_time_in_millis" : 1631622333285,
"time_in_millis" : 208851,
"number_of_files" : 52,
"processed_files" : 52,
"total_size_in_bytes" : 6422024582,
"processed_size_in_bytes" : 6422024582
}
}
}
}
In the above output, look for
"total" : {
"file_count" : 463,
"size_in_bytes" : 169401330819
}
Now convert size_in_bytes to GB, you will get the exact size of the snapshot in GB's Convert bytes to GB
You could get storage used by index using _cat API ( primary store size). First snapshot should be around index size.
For Incremental snapshots, it depends . This is because snapshots are taken in a segment level ( index-.. ) so it may be much smaller depending your indexing. Merges could cause new segments to form etc..
https://www.elastic.co/blog/found-elasticsearch-snapshot-and-restore Gives a nice overview
I need an exact solution of the used size on the storage.
Now I use the following approach: separate directories on index/snapshot level and so I can get the used storage size on system level (du command) for a specific index or snapshot.

elasticsearch doesn't update documents

I'm facing up with a trouble related with document updatings.
I'm able to index(create) documents and they are correctly added on index.
Nevertheless, when I'm trying to update one of them, the operation is not made, the document is not updated.
When I first time add the document it's like:
{
"user" : "user4",
"timestamp" : "2016-12-16T15:00:22.645Z",
"startTimestamp" : "2016-12-16T15:00:22.645Z",
"dueTimestamp" : null,
"closingTimestamp" : null,
"matter" : "F1",
"comment" : null,
"status" : 0,
"backlogStatus" : 20,
"metainfos" : {
"ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa" : [ "FZ11" ]
},
"resources" : [ ],
"notes" : null
}
This is the code I'm using in order to build UpdateRequest:
this.elasticsearchResources.getElasticsearchClient()
.prepareUpdate()
.setIndex(this.user.getMe().getUser())
.setType(type)
.setId(id.toString())
.setDoc(source)
.setUpsert(source)
.setDetectNoop(true);
I've also been able to debug which's the content of this request begore sending it to elasticsearch. The document is:
{
"user":"user4",
"timestamp":"2016-12-16T15:00:22.645Z",
"startTimestamp":"2016-12-16T15:00:22.645Z",
"dueTimestamp":null,
"closingTimestamp":null,
"matter":"F1",
"comment":null,
"status":0,
"backlogStatus":20,
"metainfos":{
},
"resources":[
],
"notes":null
}
As you can see the only difference is metainfos is empty when I try to update the document.
After having performed this update request the document is not updated. I mean the content of metainfos keeps as before:
#curl -XGET 'http://localhost:9200/user4/fuas/_search?pretty'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "living_v1",
"_type" : "fuas",
"_id" : "327c9435-c394-11e6-aa90-02420a011808",
"_score" : 1.0,
"_routing" : "user4",
"_source" : {
"user" : "user4",
"timestamp" : "2016-12-16T15:00:22.645Z",
"startTimestamp" : "2016-12-16T15:00:22.645Z",
"dueTimestamp" : null,
"closingTimestamp" : null,
"matter" : "F1",
"comment" : null,
"status" : 0,
"backlogStatus" : 20,
"metainfos" : {
>>>>>>>> "ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa" : [ "FZ11" ]
},
"resources" : [ ],
"notes" : null
}
} ]
}
}
I don't quite figure out what's wrong. Any ideas?
ElasticSearch will not update an empty object. You can try with:
null "metainfos":null
or
"metainfos":"ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa":[]
to clean the field.

Update an existing collection mongodb

I have a collection
{
"_id" : 100000001,
"horses" : []
"race" : {
"date" : ISODate("2014-06-05T00:00:00.000Z"),
"time" : ISODate("2014-06-05T02:40:00.000Z"),
"type" : "Flat",
"name" : "Hindwoods Maiden Stakes (Div I)",
"run_befor" : 11,
"finish" : null,
"channel" : "ATR",
},
"track" : {
"fences" : 0,
"omitted" : 0,
"hdles" : 0,
"name" : "Lingfield",
"country" : "GB",
"type" : "",
"going" : "Good"
}
}
I'm trying to update it
#result value
{
"race":{
"run_after":"10",
"finish":{
"time":152.34,
"slow":1,
"fast":0,
"gap":5.34
}
},
"track":{
"name":"Lingfield",
"country":"GB",
"type":"",
"going":"Good",
"fences":0,
"omitted":0,
"hdles":0
}
}
Card.where(_id:100000001).update(#result)
When I update the collection of all data is deleted and inserted new
If do set() same
How do to upgrade the existing collection records are updated and not existing added?

Resources