Elasticsearch doesn't update documents

I'm having trouble updating documents.
I'm able to index (create) documents, and they are correctly added to the index.
However, when I try to update one of them, the operation has no effect: the document is not updated.
When I first add the document, it looks like this:
{
"user" : "user4",
"timestamp" : "2016-12-16T15:00:22.645Z",
"startTimestamp" : "2016-12-16T15:00:22.645Z",
"dueTimestamp" : null,
"closingTimestamp" : null,
"matter" : "F1",
"comment" : null,
"status" : 0,
"backlogStatus" : 20,
"metainfos" : {
"ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa" : [ "FZ11" ]
},
"resources" : [ ],
"notes" : null
}
This is the code I'm using in order to build UpdateRequest:
this.elasticsearchResources.getElasticsearchClient()
.prepareUpdate()
.setIndex(this.user.getMe().getUser())
.setType(type)
.setId(id.toString())
.setDoc(source)
.setUpsert(source)
.setDetectNoop(true);
I've also been able to inspect the content of this request before sending it to Elasticsearch. The document is:
{
"user":"user4",
"timestamp":"2016-12-16T15:00:22.645Z",
"startTimestamp":"2016-12-16T15:00:22.645Z",
"dueTimestamp":null,
"closingTimestamp":null,
"matter":"F1",
"comment":null,
"status":0,
"backlogStatus":20,
"metainfos":{
},
"resources":[
],
"notes":null
}
As you can see, the only difference is that metainfos is empty when I try to update the document.
After performing this update request, the document is not updated; that is, the content of metainfos stays as it was before:
#curl -XGET 'http://localhost:9200/user4/fuas/_search?pretty'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "living_v1",
"_type" : "fuas",
"_id" : "327c9435-c394-11e6-aa90-02420a011808",
"_score" : 1.0,
"_routing" : "user4",
"_source" : {
"user" : "user4",
"timestamp" : "2016-12-16T15:00:22.645Z",
"startTimestamp" : "2016-12-16T15:00:22.645Z",
"dueTimestamp" : null,
"closingTimestamp" : null,
"matter" : "F1",
"comment" : null,
"status" : 0,
"backlogStatus" : 20,
"metainfos" : {
>>>>>>>> "ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa" : [ "FZ11" ]
},
"resources" : [ ],
"notes" : null
}
} ]
}
}
I can't figure out what's wrong. Any ideas?

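Elasticsearch will not change a field when the partial document contains an empty object for it: a partial update merges objects recursively, so an empty object adds nothing and removes nothing. You can try with:
"metainfos": null
or
"metainfos": { "ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa": [] }
to clean the field.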
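To see why the empty object is ignored, here is a minimal Python sketch of a recursive partial-document merge. It is only a model of the behaviour described in this answer, not Elasticsearch's actual code; the function name `merge_partial_doc` and the trimmed sample document are assumptions made for illustration.

```python
from copy import deepcopy

# Metainfo key copied from the question.
KEY = "ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa"


def merge_partial_doc(source, changes):
    """Merge `changes` into a copy of `source`.

    Nested objects (dicts) are merged key by key; any other value
    (scalar, list, None) replaces the stored value.
    Returns (merged_document, changed_flag).
    """
    merged = deepcopy(source)
    changed = False
    for key, new_value in changes.items():
        old_value = merged.get(key)
        if key in merged and isinstance(old_value, dict) and isinstance(new_value, dict):
            # Both sides are objects: recurse instead of replacing.
            merged[key], sub_changed = merge_partial_doc(old_value, new_value)
            changed = changed or sub_changed
        elif key not in merged or old_value != new_value:
            merged[key] = deepcopy(new_value)
            changed = True
    return merged, changed


# Trimmed, hypothetical version of the stored document.
stored = {"matter": "F1", "metainfos": {KEY: ["FZ11"]}}

# Empty object: nothing to merge, so the update is a no-op.
after_empty, changed_empty = merge_partial_doc(stored, {"matter": "F1", "metainfos": {}})

# null replaces the whole object.
after_null, changed_null = merge_partial_doc(stored, {"metainfos": None})

# An explicit empty list replaces the list stored under that key.
after_clear, changed_clear = merge_partial_doc(stored, {"metainfos": {KEY: []}})
```

Under this model, the first call leaves the document untouched, which is what a no-op update looks like, while the other two actually change `metainfos`.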

Related

Elasticsearch: how to find a document by number in logs

I get this error in Kibana:
"The length [2658823] of field [message] in doc[235892]/index[mylog-2023.02.10] exceeds the [index.highlight.max_analyzed_offset] limit [1000000]. To avoid this error, set the query parameter [max_analyzed_offset] to a value less than index setting [1000000] and this will tolerate long field values by truncating them."
I know how to deal with it (change "index.highlight.max_analyzed_offset" for the index, or set the query parameter), but I want to find the document with the long field and examine it.
If I try to find it by id, I get this:
q:
GET mylog-2023.02.10/_search
{
"query": {
"terms": {
"_id": [ "235892" ]
}
}
}
a:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
q:
GET mylog-2023.02.10/_doc/235892
a:
{ "_index" : "mylog-2023.02.10", "_type" : "_doc", "_id" : "235892", "found" : false }
Maybe this number (doc[235892]) is not the id? How can I find this document?

Try using the ids query:
GET /_search
{
"query": {
"ids" : {
"values" : ["1", "4", "100"]
}
}
}

Update re-adds the same fields with names starting with a lowercase letter

I am running into a problem when updating documents. For example, I want to update the Title field, but after the update I see that a new field has been created in the document for it (a title field that starts with a lowercase letter). I'm doing the update with NEST. Can anyone share an idea? Thank you in advance for your help.
The state of the document before the update:
{
"_index" : "my_test_index",
"_type" : "_doc",
"_id" : "uPggFnoBChFNLIc8qdjW",
"_score" : 31.908756,
"_source" : {
"RelatedPassiveCompanyId" : "0d075c1681106286cfe9f31999f8247c",
"CreateTime" : "2021-06-16T21:41:17.2697847+03:00",
"Title" : "FE NEW CENTURY INDUSTRY(SINGAPORE)PTE LTD",
"IsBannedFromOpenCorpCompanies" : false,
"CreatedBy" : 1,
"IsActivated" : false,
"IsCancelled" : false,
"IsMembershipTypeBought" : false
}
}
The state of the document after the update:
{
"_index" : "my_test_index",
"_type" : "_doc",
"_id" : "uPggFnoBChFNLIc8qdjW",
"_score" : 26.380388,
"_source" : {
"RelatedPassiveCompanyId" : "0d075c1681106286cfe9f31999f8247c",
"CreateTime" : "2021-06-16T21:41:17.2697847+03:00",
"Title" : "FE NEW CENTURY INDUSTRY(SINGAPORE)PTE LTD",
"IsBannedFromOpenCorpCompanies" : false,
"CreatedBy" : 1,
"IsActivated" : false,
"IsCancelled" : false,
"IsMembershipTypeBought" : false,
"isBannedFromOpenCorpCompanies" : false,
"contactInformations" : {
"contactPerson" : { },
"phones" : [ ]
},
"isCancelled" : false,
"dnbInformation" : {
"processId" : "fba921ee-493d-4f12-aa0a-0a432b9e8b3a",
"requestLogs" : [
{
"requestTime" : "2021-11-23T10:03:09.8302661+03:00",
"message" : "Company not found on Dnb",
"resultType" : 2
}
]
},
"createTime" : "2021-06-16T21:41:17.2697847+03:00",
"createdBy" : 1,
"isMembershipTypeBought" : false,
"isActivated" : false,
"title" : "FE NEW CENTURY INDUSTRY(SINGAPORE)PTE LTD",
"relatedPassiveCompanyId" : "0d075c1681106286cfe9f31999f8247c"
}
},
My update function:
public bool UpdateDocuments(IHit<MyESModel> documentHitItem)
{
var response = elasticClient.Update<MyESModel, object>(DocumentPath<MyESModel>
.Id(documentHitItem.Id), u => u
.Index("my_test_index")
.Doc(documentHitItem.Source)
.DocAsUpsert(true)
.RetryOnConflict(8)
);
return response.IsValid;
}
I would be very happy if anyone has any idea what the problem could be.

Detect changes during bulk indexing

We are using Elasticsearch v5.6.12 for our database. We update it frequently using the bulk REST API. Some of the time, the individual requests won't change anything (i.e. the document in Elasticsearch is already up to date). How can I detect these instances?
I saw this (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html), but I'm not sure it's applicable in our situation.

You can use noop detection when checking the results of your bulk requests.
When the bulk request returns, you can iterate over each update result and check whether the result field has a value of noop (vs. updated).
# Say the document is indexed
PUT test/doc/1
{
"test": "123"
}
# Now you want to bulk update it
POST test/doc/_bulk
{"update":{"_id": "1"}}
{"doc":{"test":"123"}} <-- this will yield `result: noop`
{"update":{"_id": "1"}}
{"doc":{"test":"1234"}} <-- this will yield `result: updated`
{"update":{"_id": "2"}}
{"doc":{"test":"3456"}, "doc_as_upsert": true} <-- this will yield `result: created`
Result:
{
"took" : 6,
"errors" : false,
"items" : [
{
"update" : {
"_index" : "test",
"_type" : "doc",
"_id" : "1",
"_version" : 2,
"result" : "noop", <-- see "noop"
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"status" : 200
}
},
{
"update" : {
"_index" : "test",
"_type" : "doc",
"_id" : "1",
"_version" : 3,
"result" : "updated", <-- see "updated"
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1,
"status" : 200
}
},
{
"update" : {
"_index" : "test",
"_type" : "doc",
"_id" : "2",
"_version" : 1,
"result" : "created", <-- see "created"
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
}
]
}
As you can see, when doc_as_upsert: true is specified for the document with id 2, the document is created and the value of the result field is created.
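The check on each item's result field could be sketched in Python roughly as follows. This is only an illustration: it assumes the bulk response has already been parsed into a dict and that each item is keyed by its action name (here `update`). The helper names and the trimmed sample response are made up for this example.

```python
from collections import Counter


def summarize_bulk_results(bulk_response):
    """Count how many bulk items ended as noop, updated, created, etc."""
    counts = Counter()
    for item in bulk_response.get("items", []):
        # Each item is a one-key dict: {action_name: details}.
        for details in item.values():
            counts[details.get("result", "unknown")] += 1
    return counts


def noop_ids(bulk_response):
    """Return the ids of the items whose update changed nothing."""
    return [
        details["_id"]
        for item in bulk_response.get("items", [])
        for details in item.values()
        if details.get("result") == "noop"
    ]


# Hand-written sample, trimmed to the fields used here.
response = {
    "errors": False,
    "items": [
        {"update": {"_id": "1", "result": "noop"}},
        {"update": {"_id": "1", "result": "updated"}},
        {"update": {"_id": "2", "result": "created"}},
    ],
}

summary = summarize_bulk_results(response)
unchanged = noop_ids(response)
```

Here `summary` holds one count per result value and `unchanged` lists the ids that were already up to date.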

ElasticSearch: Check if a field is informed

I'm using Elasticsearch as my primary backend infrastructure.
Currently, I need to know whether a field is informed (i.e. has a value); that is, I need to know whether the number of documents with a field 'x' is greater than 0.
Imagine a collection with these two documents:
[ {
"_index" : "living_v1",
"_type" : "fuas",
"_id" : "58fb4509-9452-11e6-a361-02420a016207",
"_score" : 1.0,
"_routing" : "living_team",
"_source" : {
"user" : "living_team",
"timestamp" : "2016-10-17T10:29:27.037Z",
"startTimestamp" : "2016-10-17T10:29:27.037Z",
"dueTimestamp" : null,
"closingTimestamp" : null,
"matter" : "FUA1",
"comment" : null,
"status" : 0,
"backlogStatus" : 20,
"metainfos" : {
"cabeaacaceaacadeaacaeeaacafeaa" : [ "s11" ],
"cdbccaeacdbccaeacdbccaeacdbccaeacdbccaea_ldate" : [ "2016-10-19T07:08:23.130Z" ]
},
"resources" : [ ],
"notes" : null
}
}, {
"_index" : "living_v1",
"_type" : "fuas",
"_id" : "2298eab3-9a8a-11e6-8f4a-02420a010a07",
"_score" : 1.0,
"_routing" : "living_team",
"_source" : {
"user" : "living_team",
"timestamp" : "2016-10-25T09:53:23.078Z",
"startTimestamp" : "2016-10-25T09:53:23.078Z",
"dueTimestamp" : null,
"closingTimestamp" : null,
"matter" : "FUA2",
"comment" : null,
"status" : 0,
"backlogStatus" : 20,
"metainfos" : {
"aecfacebfaaecfcebfaaecfcebfaaecfcebfaaecfcebfa" : [ "s22" ]
},
"resources" : [ ],
"notes" : null
}
} ]
I'd like to know how many documents have a given field (for example, status) informed. For example, if I want to know whether a field named exfield is informed, the response will be NO; however, if I want to know whether a field named matter is informed, the response will be YES.
Is there any way to do that?
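One possible approach, which is not taken from this thread, is the `exists` query: sent to the `_count` endpoint, it returns the number of documents that have a non-null value for a field, and a count above 0 answers YES. The Python sketch below only builds that request body and mimics the check locally on trimmed copies of the two documents above. The helper names are hypothetical and the local check is a rough approximation of what Elasticsearch treats as a missing field.

```python
def exists_count_body(field):
    """Request body for a _count call that matches documents having `field`."""
    return {"query": {"exists": {"field": field}}}


def is_informed(source, field):
    """Rough local approximation: null and empty arrays count as missing."""
    value = source.get(field)
    return value is not None and value != []


def count_informed(sources, field):
    """Number of documents in `sources` where `field` is informed."""
    return sum(1 for source in sources if is_informed(source, field))


# Trimmed copies of the two _source objects from the question.
sources = [
    {"matter": "FUA1", "status": 0, "comment": None, "resources": []},
    {"matter": "FUA2", "status": 0, "comment": None, "resources": []},
]

body = exists_count_body("matter")
matter_informed = count_informed(sources, "matter") > 0    # YES
exfield_informed = count_informed(sources, "exfield") > 0  # NO
```

Note that a value of 0, as in status, still counts as informed; only null and empty arrays are treated as missing in this sketch.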

Elasticsearch CouchDB River no hit

I have a problem with CouchDB and Elasticsearch. I use Docker to set it up. I have a working CouchDB container on the default port. Now I use this container:
registry.hub.docker.com/u/jeko/elasticsearch-river-couchdb/
And I add a new CouchDB connection with this:
curl -X PUT '127.0.0.1:9200/_river/testdb/_meta' -d ' { "type" : "couchdb", "couchdb" : { "host" : "couchdb", "port" : 5984, "db" : "articles", "filter" : null }, "index" : { "index" : "articles", "type" : "articles", "bulk_size" : "100", "bulk_timeout" : "10ms" } }'
to get a working Elasticsearch with the CouchDB river. Then I checked the documents with curl host/articles/articles/_search?pretty=true. The hits are empty.
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
I turned debug logging on and checked the log file. The output is here: http://pastebin.com/ETkNmJzT
The only conspicuous thing I found is this line: [2015-02-20 14:04:24,554][DEBUG][plugins ] [Arc] [/elasticsearch/plugins/river-couchdb/_site] directory does not exist.
But I don't understand why it doesn't work. I can curl the IP
