Elasticsearch: how to find a document by number in logs - elasticsearch

I have an error in kibana
"The length [2658823] of field [message] in doc[235892]/index[mylog-2023.02.10] exceeds the [index.highlight.max_analyzed_offset] limit [1000000]. To avoid this error, set the query parameter [max_analyzed_offset] to a value less than index setting [1000000] and this will tolerate long field values by truncating them."
I know how to deal with it (change "index.highlight.max_analyzed_offset" for an index, or set the query parameter), but I want to find the document with long field and examine it.
If i try to find it by id, i get this:
q:
GET mylog-2023.02.10/_search
{
"query": {
"terms": {
"_id": [ "235892" ]
}
}
}
a:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
q:
GET mylog-2023.02.10/_doc/235892
a:
{ "_index" : "mylog-2023.02.10", "_type" : "_doc", "_id" :
"235892", "found" : false }
Maybe this number (doc[235892]) is not id? How can i find this document?

try use Query IDs:
GET /_search
{
"query": {
"ids" : {
"values" : ["1", "4", "100"]
}
}
}

Related

Elasic Existed Query Not working About Empty String

According to Official 7.x document Link
While a field is deemed non-existent if the JSON value is null or [],
these values will indicate the field does exist:
Empty strings, such as "" or "-" Arrays containing null and another
value, such as [null, "foo"] A custom null-value, defined in field
mapping
However, My es not consider "" as not existed.
Here is my Data:
"_source" : {
"chat_msg" : {
"action" : "send",
"from" : "t",
"msgid" : "6505946507184390735_161_external",
"msgtime" : 1623396135015,
"msgtype" : "text",
"roomid" : "",
Now, When I do Query As :
GET enterprise_chat_data/_search
{
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "chat_msg.roomid"
}
}
]
}
}
}
Result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
It Hit Nothing. Am I Wrong About Something?
I think you have misunderstood the documentation. In the documentation, it is written that if a field have value as empty strings, such as "" or "-", then that field will be considered to be existing.
Due to this when you are querying for must_not exists query for "chat_msg.roomid" field, you are getting empty results, as in the data you have indexed the value of "chat_msg.roomid" field as ""
Update 1:
You can use term query to search for documents having field value of chat_msg.roomid as ""
{
"query": {
"term": {
"chat_msg.roomid.keyword": ""
}
}
}

Elasticsearch max of field combined with a unique field

I have an index with two fields:
name: uuid
version: long
I now only want to count the documents (on a very large index [1 million+ entries]) where the version of the name is the highest. For e.g. a query on an index with the following documents:
{name="a", version=1}
{name="a", version=2}
{name="a", version=3}
{name="b", version=1}
... would return:
count=2
Is this somehow possible? I can not find a solution for this particular problem.
You are effectively describing a count of distinct names, which you can do with a cardinality aggregation.
Request:
GET test1/_search
{
"aggs" : {
"distinct_count" : {
"cardinality" : {
"field" : "name.keyword"
}
}
},
"size": 0
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"distinct_count" : {
"value" : 2
}
}
}

Detect changes during bulk indexing

We are using Elasticsearch v5.6.12 for our database. We update this frequently using the bulk REST api. Some of the time the individual requests won't change anything (i.e. the value of the document that Elasticsearch is already up to date). How can I detect these instances?
I saw this (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html) but I'm not sure it's applicable in our situation.
You can use the noop detection when checking the result of your bulk queries.
When the bulk query returns, you can iterate over each update result and check if the result field has a value of noop (vs updated)
# Say the document is indexed
PUT test/doc/1
{
"test": "123"
}
# Now you want to bulk update it
POST test/doc/_bulk
{"update":{"_id": "1"}}
{"doc":{"test":"123"}} <-- this will yield `result: noop`
{"update":{"_id": "1"}}
{"doc":{"test":"1234"}} <-- this will yield `result: updated`
{"update":{"_id": "2"}}
{"doc":{"test":"3456"}, "doc_as_upsert": true} <-- this will yield `result: created`
Result:
{
"took" : 6,
"errors" : false,
"items" : [
{
"update" : {
"_index" : "test",
"_type" : "doc",
"_id" : "1",
"_version" : 2,
"result" : "noop", <-- see "noop"
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"status" : 200
}
},
{
"update" : {
"_index" : "test",
"_type" : "doc",
"_id" : "1",
"_version" : 3,
"result" : "updated", <-- see "updated"
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1,
"status" : 200
}
},
{
"_index" : "test",
"_type" : "doc",
"_id" : "2",
"_version" : 1,
"result" : "created", <-- see "created"
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
]
}
As you can see, when specifying doc_as_upsert: true for document with id 2, the document will be created and the result field value will be created

elasticsearch doesn't update documents

I'm facing up with a trouble related with document updatings.
I'm able to index(create) documents and they are correctly added on index.
Nevertheless, when I'm trying to update one of them, the operation is not made, the document is not updated.
When I first time add the document it's like:
{
"user" : "user4",
"timestamp" : "2016-12-16T15:00:22.645Z",
"startTimestamp" : "2016-12-16T15:00:22.645Z",
"dueTimestamp" : null,
"closingTimestamp" : null,
"matter" : "F1",
"comment" : null,
"status" : 0,
"backlogStatus" : 20,
"metainfos" : {
"ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa" : [ "FZ11" ]
},
"resources" : [ ],
"notes" : null
}
This is the code I'm using in order to build UpdateRequest:
this.elasticsearchResources.getElasticsearchClient()
.prepareUpdate()
.setIndex(this.user.getMe().getUser())
.setType(type)
.setId(id.toString())
.setDoc(source)
.setUpsert(source)
.setDetectNoop(true);
I've also been able to debug which's the content of this request begore sending it to elasticsearch. The document is:
{
"user":"user4",
"timestamp":"2016-12-16T15:00:22.645Z",
"startTimestamp":"2016-12-16T15:00:22.645Z",
"dueTimestamp":null,
"closingTimestamp":null,
"matter":"F1",
"comment":null,
"status":0,
"backlogStatus":20,
"metainfos":{
},
"resources":[
],
"notes":null
}
As you can see the only difference is metainfos is empty when I try to update the document.
After having performed this update request the document is not updated. I mean the content of metainfos keeps as before:
#curl -XGET 'http://localhost:9200/user4/fuas/_search?pretty'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "living_v1",
"_type" : "fuas",
"_id" : "327c9435-c394-11e6-aa90-02420a011808",
"_score" : 1.0,
"_routing" : "user4",
"_source" : {
"user" : "user4",
"timestamp" : "2016-12-16T15:00:22.645Z",
"startTimestamp" : "2016-12-16T15:00:22.645Z",
"dueTimestamp" : null,
"closingTimestamp" : null,
"matter" : "F1",
"comment" : null,
"status" : 0,
"backlogStatus" : 20,
"metainfos" : {
>>>>>>>> "ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa" : [ "FZ11" ]
},
"resources" : [ ],
"notes" : null
}
} ]
}
}
I don't quite figure out what's wrong. Any ideas?
ElasticSearch will not update an empty object. You can try with:
null "metainfos":null
or
"metainfos":"ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa":[]
to clean the field.

Count the number of duplicates in elasticsearch

I have an application inserting a numbered sequence of logs into elasticsearch.
Under certain conditions, after stopping my application, I find that in elasticsearch there are more logs than I have actually generated.
This simple aggregation helped me find out that a few duplicates are present:
curl /logstash-*/_search?pretty -d '{
size: 0,
aggs: {
msgnum_terms: {
terms: {
field: "msgnum.raw",
min_doc_count: 2,
size: 0
}
}
}
}'
msgnum is the field containing the numeric sequence. Normally it should be unique and the resulting doc_counts never exceed 1. Instead I get something like:
{
"took" : 33,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 100683,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"msgnum_terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "4097",
"doc_count" : 2
}, {
"key" : "4099",
"doc_count" : 2
...
...
...
}, {
"key" : "5704",
"doc_count" : 2
} ]
}
}
}
How can I count the exact number of duplicates in order to make sure that they are the only cause of mismatch between number of generated log lines and number of hits in elasticsearch?

Resources