ElasticSearch errors in deleting records by query

I am trying to delete a large number of documents in ES via _delete_by_query, but I am seeing the following errors.
Query
POST indexName/typeName/_delete_by_query
{
  "size": 100000,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "CREATED_TIME": {
              "gte": 0,
              "lte": 1507316563000
            }
          }
        }
      ]
    }
  }
}
Result
{
"took": 50489,
"timed_out": false,
"total": 100000,
"deleted": 0,
"batches": 1,
"version_conflicts": 1000,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1,
"throttled_until_millis": 0,
"failures": [
{
"index": "indexName",
"type": "typeName",
"id": "HVBLdzwnImXdVbq",
"cause": {
"type": "version_conflict_engine_exception",
"reason": "[typeName][HVBLdzwnImXdVbq]: version conflict, current version [2] is different than the one provided [1]",
"index_uuid": "YPJcVQZqQKqnuhbC9R7qHA",
"shard": "1",
"index": "indexName"
},
"status": 409
},....

These are version conflicts raised by _delete_by_query. You have two ways of handling the issue: set a URL parameter to ignore version conflicts, or set an option in the request body. As the documentation puts it: if you'd like to count version conflicts rather than cause them to abort, then set conflicts=proceed on the URL or "conflicts": "proceed" in the request body.
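For example, the request from the question with conflict handling enabled in the body would look like this (a sketch reusing the placeholder index and type names from above):
POST indexName/typeName/_delete_by_query
{
  "conflicts": "proceed",
  "size": 100000,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "CREATED_TIME": {
              "gte": 0,
              "lte": 1507316563000
            }
          }
        }
      ]
    }
  }
}
Equivalently, append ?conflicts=proceed to the URL and leave the body unchanged.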

Related

Lucene vs Elasticsearch query syntax

I can see that Elasticsearch supports both the Lucene syntax and its own query language. You can use either and get the same kind of results.
Example (it might be done differently, but this shows what I mean): both of these queries produce the same result, one using the Lucene syntax and the other the Elastic query syntax.
GET /index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "field101:Denmark"
          }
        }
      ]
    }
  }
}
GET /index/_search
{
  "query": {
    "match": {
      "field101": {
        "query": "Denmark"
      }
    }
  }
}
I was wondering, are there any implications when choosing one approach over the other (like performance or some kind of optimization)? Or is the Elastic query syntax just translated to a Lucene query somewhere, since Elastic runs Lucene as its underlying search engine?
I was wondering, are there any implications when choosing one approach over the other (like performance or some kind of optimization)?
The Elasticsearch DSL is converted into a Lucene query under the hood; you can set "profile": true in the query to see how that works and exactly how much time the conversion takes.
I would say there are no important performance implications, and you should always use the DSL, because in many cases Elasticsearch will do optimizations for you. Also, query_string expects well-formed Lucene queries, so you can run into syntax errors (try "Denmark AND" as a query_string).
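For example, a malformed Lucene expression like the following is rejected with a parse error instead of returning hits (the index name here is just illustrative):
GET /index/_search
{
  "query": {
    "query_string": {
      "query": "Denmark AND"
    }
  }
}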
Or is the Elastic query syntax just translated to a Lucene query somewhere, since Elastic runs Lucene as its underlying search engine?
Yes. You can try it yourself:
GET test_lucene/_search
{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "field101:Denmark"
          }
        }
      ]
    }
  }
}
will produce:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"profile": {
"shards": [
{
"id": "[KGaFbXIKTVOjPDR0GrI4Dw][test_lucene][0]",
"searches": [
{
"query": [
{
"type": "TermQuery",
"description": "field101:denmark",
"time_in_nanos": 3143,
"breakdown": {
"set_min_competitive_score_count": 0,
"match_count": 0,
"shallow_advance_count": 0,
"set_min_competitive_score": 0,
"next_doc": 0,
"match": 0,
"next_doc_count": 0,
"score_count": 0,
"compute_max_score_count": 0,
"compute_max_score": 0,
"advance": 0,
"advance_count": 0,
"score": 0,
"build_scorer_count": 0,
"create_weight": 3143,
"shallow_advance": 0,
"create_weight_count": 1,
"build_scorer": 0
}
}
],
"rewrite_time": 2531,
"collector": [
{
"name": "SimpleTopScoreDocCollector",
"reason": "search_top_hits",
"time_in_nanos": 1115
}
]
}
],
"aggregations": []
}
]
}
}
And
GET /test_lucene/_search
{
  "profile": true,
  "query": {
    "match": {
      "field101": {
        "query": "Denmark"
      }
    }
  }
}
will produce the same result:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"profile": {
"shards": [
{
"id": "[KGaFbXIKTVOjPDR0GrI4Dw][test_lucene][0]",
"searches": [
{
"query": [
{
"type": "TermQuery",
"description": "field101:denmark",
"time_in_nanos": 3775,
"breakdown": {
"set_min_competitive_score_count": 0,
"match_count": 0,
"shallow_advance_count": 0,
"set_min_competitive_score": 0,
"next_doc": 0,
"match": 0,
"next_doc_count": 0,
"score_count": 0,
"compute_max_score_count": 0,
"compute_max_score": 0,
"advance": 0,
"advance_count": 0,
"score": 0,
"build_scorer_count": 0,
"create_weight": 3775,
"shallow_advance": 0,
"create_weight_count": 1,
"build_scorer": 0
}
}
],
"rewrite_time": 3483,
"collector": [
{
"name": "SimpleTopScoreDocCollector",
"reason": "search_top_hits",
"time_in_nanos": 1780
}
]
}
],
"aggregations": []
}
]
}
}
As you can see, the times are in nanoseconds, not even milliseconds, which shows that the conversion is fast.
You can read more about it here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html

Can't delete item in Elasticsearch with _delete_by_query

I would like to delete some items in an Elasticsearch database according to a simple condition. I am trying to do it via the Postman app. So I have a POST request to the URL localhost:9200/newlocalsearch/_delete_by_query with this JSON query:
{
  "query": {
    "bool": {
      "must_not": [
        {"exists": {"field": "ico"}}
      ]
    }
  }
}
But when I send the request to the database, it returns this error response:
{
"took": 51,
"timed_out": false,
"total": 1,
"deleted": 0,
"batches": 1,
"version_conflicts": 1,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1,
"throttled_until_millis": 0,
"failures": [
{
"index": "newlocalsearch",
"type": "doc",
"id": "0",
"cause": {
"type": "version_conflict_engine_exception",
"reason": "[doc][0]: version conflict, current version [-1] is different than the one provided [1]",
"index_uuid": "jZbdUfqwSAqtFELXB2Z2AQ",
"shard": "0",
"index": "newlocalsearch"
},
"status": 409
}
]
}
I don't understand what is happening. Is there anybody out there :) who knows what this means? Thanks a lot.
It could be that you need to refresh your index first:
Send a POST request to localhost:9200/newlocalsearch/_refresh
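A minimal sketch of the full sequence, assuming the same index and reusing the conflicts=proceed option discussed in the question above:
POST localhost:9200/newlocalsearch/_refresh
POST localhost:9200/newlocalsearch/_delete_by_query?conflicts=proceed
{
  "query": {
    "bool": {
      "must_not": [
        {"exists": {"field": "ico"}}
      ]
    }
  }
}
The refresh makes recently indexed documents visible to the delete, and conflicts=proceed keeps a stray version conflict from aborting the whole operation.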

index_out_of_bounds_exception error after upgrade from Elasticsearch 5.6.x to 6.4.0

I recently upgraded my Elasticsearch version from 5.6.x to 6.4.0 and since then, my query returns the following error:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 3,
"skipped": 0,
"failed": 2,
"failures": [
{
"shard": 2,
"index": "my_index",
"node": "node_name",
"reason": {
"type": "index_out_of_bounds_exception",
"reason": null
}
}
]
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
},
"aggregations": {
"filtered-brands": {
"meta": {},
"doc_count": 0,
"attributes": {
"doc_count": 0,
"filtered-ids": {
"doc_count": 0,
"ids": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
}
}
}
}
Here is the complete stack trace:
play.api.Application$$anon$1: Execution exception[[SearchPhaseExecutionException: all shards failed]]
at play.api.Application$class.handleError(Application.scala:296)
at play.api.DefaultApplication.handleError(Application.scala:402)
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320)
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320)
at scala.Option.map(Option.scala:145)
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3.applyOrElse(PlayDefaultUpstreamHandler.scala:320)
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3.applyOrElse(PlayDefaultUpstreamHandler.scala:316)
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:344)
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:343)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at play.api.libs.iteratee.Execution$trampoline$.execute(Execution.scala:46)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:23)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:293)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:133)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:254)
at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:101)
at org.elasticsearch.action.search.InitialSearchPhase.access$100(InitialSearchPhase.java:48)
at org.elasticsearch.action.search.InitialSearchPhase$2.lambda$onFailure$1(InitialSearchPhase.java:222)
at org.elasticsearch.action.search.InitialSearchPhase.maybeFork(InitialSearchPhase.java:176)
at org.elasticsearch.action.search.InitialSearchPhase.access$000(InitialSearchPhase.java:48)
at org.elasticsearch.action.search.InitialSearchPhase$2.onFailure(InitialSearchPhase.java:222)
at org.elasticsearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:73)
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:51)
at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:526)
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1068)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1165)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1149)
at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:66)
at org.elasticsearch.action.search.SearchTransportService$6$1.onFailure(SearchTransportService.java:384)
at org.elasticsearch.search.SearchService$2.onFailure(SearchService.java:341)
at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:335)
at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:329)
at org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1019)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:844)
Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: : null
at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:657)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:131)
... 26 common frames omitted
Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: index_out_of_bounds_exception: null
at java.nio.Buffer.checkIndex(Buffer.java:669)
at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:265)
at org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:118)
at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:385)
at org.apache.lucene.codecs.lucene70.Lucene70NormsProducer$7.longValue(Lucene70NormsProducer.java:263)
at org.apache.lucene.search.similarities.BM25Similarity$BM25DocScorer.score(BM25Similarity.java:257)
at org.apache.lucene.search.TermScorer.score(TermScorer.java:65)
at org.apache.lucene.search.DisjunctionSumScorer.score(DisjunctionSumScorer.java:39)
at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:180)
at org.apache.lucene.search.FilterScorer.score(FilterScorer.java:59)
at org.elasticsearch.common.lucene.search.function.FunctionScoreQuery$FunctionFactorScorer.score(FunctionScoreQuery.java:370)
at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:59)
at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:49)
at org.elasticsearch.script.SearchScript.getScore(SearchScript.java:110)
at org.elasticsearch.painless.ScriptImpl.run(ScriptImpl.java:105)
at org.elasticsearch.search.aggregations.support.values.ScriptDoubleValues.advanceExact(ScriptDoubleValues.java:47)
at org.elasticsearch.search.aggregations.metrics.avg.AvgAggregator$1.collect(AvgAggregator.java:83)
at org.elasticsearch.search.aggregations.LeafBucketCollector$2.collect(LeafBucketCollector.java:67)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:84)
at org.elasticsearch.search.aggregations.bucket.terms.LongTermsAggregator$1.collect(LongTermsAggregator.java:91)
at org.elasticsearch.search.aggregations.AggregatorFactory$MultiBucketAggregatorWrapper$1.collect(AggregatorFactory.java:140)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:84)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucket(BucketsAggregator.java:76)
at org.elasticsearch.search.aggregations.bucket.filter.FilterAggregator$1.collect(FilterAggregator.java:66)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:84)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucket(BucketsAggregator.java:76)
at org.elasticsearch.search.aggregations.bucket.filter.FilterAggregator$1.collect(FilterAggregator.java:66)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:84)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucket(BucketsAggregator.java:76)
at org.elasticsearch.search.aggregations.bucket.nested.NestedAggregator$BufferingNestedLeafBucketCollector.processBufferedChildBuckets(NestedAggregator.java:183)
at org.elasticsearch.search.aggregations.bucket.nested.NestedAggregator.processBufferedDocs(NestedAggregator.java:121)
at org.elasticsearch.search.aggregations.bucket.nested.NestedAggregator.preGetSubLeafCollectors(NestedAggregator.java:111)
at org.elasticsearch.search.aggregations.AggregatorBase.getLeafCollector(AggregatorBase.java:165)
at org.elasticsearch.search.aggregations.AggregatorBase.getLeafCollector(AggregatorBase.java:166)
at org.elasticsearch.search.aggregations.BucketCollector$2.getLeafCollector(BucketCollector.java:75)
at org.elasticsearch.search.aggregations.BucketCollector$2.getLeafCollector(BucketCollector.java:69)
at org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:121)
at org.apache.lucene.search.FilterCollector.getLeafCollector(FilterCollector.java:40)
at org.elasticsearch.search.query.CancellableCollector.getLeafCollector(CancellableCollector.java:51)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:653)
at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:463)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:266)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:107)
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:324)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:357)
at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:333)
... 9 common frames omitted
After some investigation, I noticed that the problem comes from one of my aggregations, which is the following (if I remove it, the query works):
"aggregations": {
"filtered-brands": {
"filter": {
"bool": {
"adjust_pure_negative": true,
"boost": 1
}
},
"aggregations": {
"attributes": {
"nested": {
"path": "attributes"
},
"aggregations": {
"filtered-ids": {
"filter": {
"term": {
"attributes.id": {
"value": "brand",
"boost": 1
}
}
},
"aggregations": {
"ids": {
"terms": {
"field": "attributes.id",
"size": 100,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"types": {
"terms": {
"field": "attributes.type",
"size": 100,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"avg_score": {
"avg": {
"script": {
"source": "_score",
"lang": "painless"
}
}
}
}
}
}
}
}
}
}
}
}
}
}
If I remove the "avg_score" block, the query works fine. However, if I change the aggregation into a much simpler one that still includes the "avg_score" block, it also works fine, so I don't really know whether that block is the root of the problem.
Has anyone experienced the same issue after upgrading to ES 6.x? If so, any clues as to why this is happening?

Filter aggregated data in Elasticsearch

For the last two days my team has been dealing with an issue when querying data from an Elasticsearch database (ES). Our goal is to get data aggregated by one field, with two accumulated values.
Translated into a SQL query, we need something like this:
SELECT MAX(FIELD1) AS F1, MAX(FIELD2) AS F2 FROM ES GROUP BY FIELD3 HAVING F1 = 'SOME_TEXT'
Note that F1 is a text field.
The only solution we have found so far is:
{
  "size": 0,
  "aggs": {
    "flowId": {
      "terms": {
        "field": "flowId.keyword"
      },
      "aggs": {
        "scenario": { "terms": { "field": "scnName.keyword" } },
        "max_time": { "max": { "field": "inFlowTimeNsec" } },
        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "totalSales": "scenario"
            },
            "script": "params.totalSales != null && params.totalSales == 'Test' "
          }
        }
      }
    }
  }
}
The issue we encountered is:
{
"error": {
"root_cause": [],
"type": "search_phase_execution_exception",
"reason": "",
"phase": "fetch",
"grouped": true,
"failed_shards": [],
"caused_by": {
"type": "aggregation_execution_exception",
"reason": "buckets_path must reference either a number value or a single value numeric metric aggregation, got: org.elasticsearch.search.aggregations.bucket.terms.StringTerms"
}
},
"status": 503
}
As far as I understand, this issue has already been raised: https://github.com/elastic/elasticsearch/issues/23874
The output of the above query, without the bucket_selector part, looks as follows:
{
"took": 52,
"timed_out": false,
"_shards": {
"total": 480,
"successful": 480,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 15657901,
"max_score": 0,
"hits": []
},
"aggregations": {
"flowId": {
"doc_count_error_upper_bound": 4104,
"sum_other_doc_count": 9829317,
"buckets": [
{
"key": "0_66718_31120bfd_39ae_4258_81e8_08abd89a81bf",
"doc_count": 107816,
"scenario": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "GetPop",
"doc_count": 12
}
]
},
"max_time": {
"value": 121244876800
}
},
{
"key": "0_67116_31120bfd_39ae_4258_81e8_08abd89a81bf",
"doc_count": 107752,
"scenario": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "GetPop",
"doc_count": 12
}
]
},
"max_time": {
"value": 120955101184
}
},
…
}
The question: is there any other way to achieve what we need? I mean, we need to filter the result of the aggregated data...
Thanks a lot,
EG
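One possible workaround, sketched here rather than taken from the thread: since buckets_path can only reference numeric values, move the string condition out of the bucket_selector and into the query itself. This filters documents by scnName before grouping, which matches the SQL HAVING only under the assumption (suggested by the sample output, where each flowId bucket contains a single scenario) that each flowId maps to one scenario:
{
  "size": 0,
  "query": {
    "term": { "scnName.keyword": "Test" }
  },
  "aggs": {
    "flowId": {
      "terms": { "field": "flowId.keyword" },
      "aggs": {
        "max_time": { "max": { "field": "inFlowTimeNsec" } }
      }
    }
  }
}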

Storing JSON array string in Elasticsearch bug

I am observing some strange behavior in Elasticsearch 5.2 that is impossible to debug, as no errors are thrown and I am unable to find similar issues or documentation online.
I'm storing a JSON array as a "string" in Elasticsearch (using Python's json.dumps()) -- long story short, I have to do it this way. However, when I run a DSL query, only the JSON arrays (each stored as a single string) that contain one object show up. If there is more than one object, the query just returns an empty bucket with 0 objects. I'm storing these strings in a field called "metadata".
I'm very confused as to why only a subset of the data is shown while other data (strings with more than one object in the JSON array) is ignored. The data is encoded as a string, and I know for a fact it is stored in the index; I can see it in the Kibana "Discover" view as large JSON strings with multiple objects.
Example 1 (JSON string with 1 object):
[{"score": 0.8829717636108398, "height": 0.875460147857666, "width": 0.3455989360809326, "y": 0.08105117082595825, "x": 0.5616265535354614, "note": "box1"}]
Example 2:
[{"score": 0.8829717636108398, "height": 0.875460147857666, "width": 0.3455989360809326, "y": 0.08105117082595825, "x": 0.5616265535354614, "note": "box1"}, {"score": 0.6821991136108398, "height": 0.875460147857666, "width": 0.3455989360809326, "y": 0.08105117082595825, "x": 0.5616265535354614, "note": "box2"}]
Here is my query:
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*"
          }
        },
        {
          "range": {
            "created_at": {
              "gte": 1508012482796,
              "lte": 1508014282797,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  },
  "aggs": {
    "5": {
      "terms": {
        "field": "metadata.keyword",
        "size": 31,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
This query only returns the strings with one object, as shown below:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4214,
"max_score": 0,
"hits": []
},
"aggregations": {
"5": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 35,
"buckets": [
{
"key": "[]",
"doc_count": 102
},
{
"key": "{}",
"doc_count": 8
},
{
"key": "[{\"score\": 0.9015679955482483, \"height\": 0.8632315695285797, \"width\": 0.343660831451416, \"y\": 0.08102986216545105, \"x\": 0.5559845566749573, \"note\": \"box11\"}]",
"doc_count": 6
},
{
"key": "[{\"score\": 0.6365205645561218, \"height\": 0.9410756528377533, \"width\": 0.97696852684021, \"y\": 0.04701271653175354, \"x\": 0.013666868209838867, \"note\": \"box17\"}]",
"doc_count": 4
},
...
}
As observed, only data with JSON strings containing one object (i.e. [{..}]) is returned/visible. The strings with multiple objects (i.e. [{...},{...}]) are completely ignored.
More clarifications:
The index uses the default mappings.
I am able to get the JSON string (regardless of the number of objects) when querying by document id, or when using "match" on exact field values.
If you're using the default mapping, this is most probably because your keyword mapping has an ignore_above: 256 setting and looks like this:
{
  "mappings": {
    "my_type": {
      "properties": {
        "metadata": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}
You can increase that limit in order to index JSON strings longer than 256 characters.
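For example, a mapping update along these lines raises the limit (a sketch: the index and type names here are hypothetical, and the new limit of 8191 is an arbitrary choice well below Lucene's hard term-length limit). Note that only documents indexed after the change pick up the new limit, so existing documents need reindexing; if your ES version does not allow updating the parameter in place, recreate the index with the new mapping instead:
PUT my_index/_mapping/my_type
{
  "properties": {
    "metadata": {
      "type": "keyword",
      "ignore_above": 8191
    }
  }
}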
