Elasticsearch: only return specific fields without _source?

I've found some answers such as
Make elasticsearch only return certain fields?
but they all require the _source field.
In my system, disk and network are both scarce resources.
I can't store the _source field, and I don't need the _index or _score fields.
Elasticsearch version: 5.5
The index mapping looks like this:
{
"index_2020-04-08": {
"mappings": {
"type1": {
"_all": {
"enabled": false
},
"_source": {
"enabled": false
},
"properties": {
"rank_score": {
"type": "float"
},
"first_id": {
"type": "keyword"
},
"second_id": {
"type": "keyword"
}
}
}
}
}
}
My query:
GET index_2020-04-08/type1/_search
{
"query": {
"bool": {
"filter": {
"term": {
"first_id": "hello"
}
}
}
},
"size": 1000,
"sort": [
{
"rank_score": {
"order": "desc"
}
}
]
}
The search results I got:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "index_2020-04-08",
"_type": "type1",
"_id": "id_1",
"_score": null,
"sort": [
0.06621722
]
},
{
"_index": "index_2020-04-08",
"_type": "type1",
"_id": "id_2",
"_score": null,
"sort": [
0.07864579
]
}
]
}
}
The results I want:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_id": "id_1"
},
{
"_id": "id_2"
}
]
}
}
Is there a way to do this?

To return specific fields of a document, you must do one of two things:
Include the _source field in your documents, which is enabled by default.
Store specific fields with the stored fields feature, which must be enabled manually.
Because you want little more than the document IDs and some metadata, you can use the filter_path feature instead.
Here's an example that's close to what you want (just change the field list):
$ curl -X GET "localhost:9200/metricbeat-7.6.1-2020.04.02-000002/_search?filter_path=took,timed_out,_shards,hits.total,hits.max_score,hits.hits._id&pretty"
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 1.0,
"hits" : [
{
"_id" : "8SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "8iEGSHEBzNscjCyQ18cg"
},
{
"_id" : "8yEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9CEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9iEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9yEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-CEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-iEGSHEBzNscjCyQ18cg"
}
]
}
}
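Applied to the query from the question, the same parameter might be used as in the sketch below. It only builds the request (URL and body) with the Python standard library and does not send it. The host `http://localhost:9200` is an assumption, and the helper name `build_search_request` is made up for illustration; the index, type, and field names are taken from the question.

```python
import json
from urllib.parse import urlencode

# Response keys to keep; everything else is left out of the response body.
FILTER_PATH = ",".join([
    "took", "timed_out", "_shards",
    "hits.total", "hits.max_score", "hits.hits._id",
])


def build_search_request(first_id, host="http://localhost:9200"):
    """Return (url, body) for the question's query, keeping only _id per hit.

    The host is an assumed default, not something stated in the question.
    """
    # safe="," keeps the comma-separated list readable in the URL.
    params = urlencode({"filter_path": FILTER_PATH}, safe=",")
    url = f"{host}/index_2020-04-08/type1/_search?{params}"
    body = {
        "query": {"bool": {"filter": {"term": {"first_id": first_id}}}},
        "size": 1000,
        "sort": [{"rank_score": {"order": "desc"}}],
    }
    return url, json.dumps(body)


url, body = build_search_request("hello")
print(url)
print(body)
```

Since `hits.hits.sort` is not in the list, the per-hit sort values should be dropped as well, while the hits still arrive in rank_score order.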

Just to clarify based on the SO question you linked: you're not storing the _source there, you're requesting it from ES. The _source parameter is usually used to limit what gets retrieved, i.e.
...
"_source": ["only", "fields", "I", "need"]
...
_score, _index etc. are meta fields that are going to be retrieved no matter what. You can "hack" it a bit by setting the size to 0 and aggregating, i.e.
{
"size": 0,
"aggs": {
"by_ids": {
"terms": {
"field": "_id"
}
}
}
}
which will save you a few bytes:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"by_ids" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Ac76WXEBnteqn982smh_",
"doc_count" : 1
},
{
"key" : "As77WXEBnteqn982EGgq",
"doc_count" : 1
}
]
}
}
}
but performing aggregations has a cost of its own.
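If you go the aggregation route, two details are worth keeping in mind: a terms aggregation returns only the top 10 buckets unless you set its `size`, and buckets are ordered by doc_count by default, not by the rank_score sort from the question. The sketch below builds such a request for up to 1000 IDs and pulls the IDs out of a response. The helper names are made up for illustration, and the sample response is hand-written, assuming the result is keyed by the aggregation name from the request (`by_ids`).

```python
def build_ids_agg_body(max_ids=1000):
    """Request body: no hits, one terms bucket per _id (up to max_ids)."""
    return {
        "size": 0,
        "aggs": {"by_ids": {"terms": {"field": "_id", "size": max_ids}}},
    }


def ids_from_terms_agg(response, agg_name="by_ids"):
    """Collect the bucket keys (document IDs) from a terms aggregation result."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [bucket["key"] for bucket in buckets]


# Hand-written sample in the shape of the response shown above.
sample_response = {
    "hits": {"hits": []},
    "aggregations": {
        "by_ids": {
            "buckets": [
                {"key": "Ac76WXEBnteqn982smh_", "doc_count": 1},
                {"key": "As77WXEBnteqn982EGgq", "doc_count": 1},
            ]
        }
    },
}

body = build_ids_agg_body()
ids = ids_from_terms_agg(sample_response)
print(ids)
```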

Related

Get elasticsearch to ignore diacritics and accents in search hit

I want to search data in Elasticsearch in different languages, and expect the data to be retrieved regardless of diacritics or accents.
For example, I have this data:
POST ابجد/_doc/31
{
"name":"def",
"city":"Tulkarem"
}
POST ابجٌد/_doc/31
{
"name":"def",
"city":"Tulkarem"
}
PUT /abce
{
"settings" : {
"analysis" : {
"analyzer" : {
"default" : {
"tokenizer" : "standard",
"filter" : ["my_ascii_folding"]
}
},
"filter" : {
"my_ascii_folding" : {
"type" : "asciifolding",
"preserve_original" : true
}
}
}
}
}
The difference between the two indexes is the diacritics.
Trying to get data:
GET ابجد/_search
I need it to retrieve from both indexes, but currently it returns only this:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "ابجد",
"_id": "31",
"_score": 1,
"_source": {
"name": "def",
"city": "Tulkarem"
}
}
]
}
}

Elasticsearch high level client search fails occasionally

When I submit an asyncSearch with the Elasticsearch High Level Client, I occasionally get a wrong response: the shard total is > 0 but successful and failed are both 0, and I can't find any log about this search. For example, the searchBuilder log is as follows:
{
"size": 0,
"query": {...},
"aggregations": {
"term0": {
"filter": {
"match_all": {
"boost": 1
}
},
"aggregations": {
"countCOUNT_DISTINCTdid": {
"cardinality": {
"field": "did",
"precision_threshold": 40000
}
}
}
}
}
}
Then I get this wrong response content:
{
"took": 1002,
"timed_out": false,
"terminated_early": false,
"num_reduce_phases": 0,
"_shards": {
"total": 20,
"successful": 0,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "gte"
},
"max_score": null,
"hits": []
}
}
But when I run the same query in Kibana, the correct result is:
{
"took" : 231,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"term0" : {
"doc_count" : 8526098,
"countCOUNT_DISTINCTdid" : {
"value" : 3929368
}
}
}
}
By the way, other search requests made at the same time with the same client are fine.
Why does this happen and how can I avoid it? Thanks a lot for any hints.

Elasticsearch bool must query

I am trying to write an Elasticsearch bool query. I am having an issue querying a field (DATE) using a bool must query.
The Elasticsearch data looks like this:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 15,
"successful": 15,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 11.519888,
"hits": [
{
"_index": "test-2019.06.27",
"_type": "test",
"_id": "pa6gmGsByDlvLvAyiRF-",
"_score": 11.519888,
"_source": {
"DATE": "01/06/19"
}
}
]
}
}
The Elasticsearch query looks like this:
{
"query":
{
"bool" : {
"must" : [
{
"match" : {
"DATE" : {
"query" : "01/06/19",
"operator" : "AND",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
}
The query is not working.
Any ideas, please?

For date-typed fields, I usually write a range query.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
Try the code below:
{
"query": {
"range" : {
"DATE" : {
"gte" : "now-1d/d",
"lt" : "now/d"
}
}
}
}
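To target the specific day from the question rather than "yesterday", the range bounds can be given as explicit dates together with the range query's `format` parameter. The sketch below only builds the query body. It assumes that DATE is mapped as a date field and that "01/06/19" means day/month/two-digit year (`dd/MM/yy`); neither is confirmed by the question, and if DATE is mapped as a string the range would compare text instead of dates. The helper name `single_day_range` is made up for illustration.

```python
import json
from datetime import datetime, timedelta


def single_day_range(field, day, py_fmt="%d/%m/%y", es_fmt="dd/MM/yy"):
    """Build a range query matching one calendar day: [day, day + 1).

    py_fmt and es_fmt must describe the same layout; day/month/year is assumed.
    """
    start = datetime.strptime(day, py_fmt)
    end = start + timedelta(days=1)
    return {
        "query": {
            "range": {
                field: {
                    "gte": start.strftime(py_fmt),
                    "lt": end.strftime(py_fmt),
                    "format": es_fmt,
                }
            }
        }
    }


query = single_day_range("DATE", "01/06/19")
print(json.dumps(query, indent=2))
```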

Search data in Elasticsearch based on some fields

I am new to Elasticsearch and want to search this data based on "type:": "load".
Please help.
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1109,
"max_score": 1,
"hits": [
{"_index": "4",
"_type": "aa",
"_id": "xx",
"_score": 1,
"_source": {
"useRange": false,
"Blueprint": 4,
"standardDeviation": 0,
"occurrences": 0,
"type:": "load",
}...
{
}

The Elasticsearch documentation will help you:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
EDIT
The query is curl -XGET 'localhost:9200/sample/_search?q=type:load&pretty'
and the output will be
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.30685282,
"hits" : [ {
"_index" : "sample",
"_type" : "data",
"_id" : "1",
"_score" : 0.30685282,
"_source" : {
"useRange" : false,
"Blueprint" : 4,
"standardDeviation" : 0,
"occurrences" : 0,
"type" : "load"
}
} ]
}
}
The issue was with the field name 'type'. We changed the name to typemetrics, and the query below works.
I think type might be acting as a keyword.
GET /4/_search
{
"query": {
"term" : { "typemetrics" : "load"}
}
}

Elasticsearch Prefix query not working on nested documents

I'm using a prefix query in Elasticsearch. It works fine on top-level data, but once applied to nested data no results are returned. The data I'm trying to query looks as follows:
Here the prefix query works fine:
Query:
{ "query": { "prefix" : { "duration": "7"} } }
Result:
{
"took": 25, ... },
"hits": {
"total": 6,
"max_score": 1,
"hits": [
{
"_index": "itemresults",
"_type": "itemresult",
"_id": "ITEM_RESULT_7c8649c2-6cb0-487e-bb3c-c4bf0ad28a90_8bce0a3f-f951-4a01-94b5-b55dea1a2752_7c965241-ad0a-4a83-a400-0be84daab0a9_61",
"_score": 1,
"_source": {
"score": 1,
"studentId": "61",
"timestamp": 1377399320017,
"groupIdentifiers": {},
"assessmentItemId": "7c965241-ad0a-4a83-a400-0be84daab0a9",
"answered": true,
"duration": "7.078",
"metadata": {
"Korrektur": "a",
"Matrize12_13": "MA.1.B.1.d.1",
"Kompetenz": "ZuV",
"Zyklus": "Z2",
"Schwierigkeit": "H",
"Handlungsaspekt": "AuE",
"Fach": "MA",
"Aufgabentyp": "L"
},
"assessmentSessionId": "7c8649c2-6cb0-487e-bb3c-c4bf0ad28a90",
"assessmentId": "8bce0a3f-f951-4a01-94b5-b55dea1a2752"
}
},
Now, applying the prefix query to the nested structure 'metadata' doesn't return any results:
{ "query": { "prefix" : { "metadata.Fach": "M"} } }
Result:
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 15,
"successful": 15,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
What am I doing wrong? Is it at all possible to apply a prefix query to nested data?

It does not depend on whether the field is nested or not. It depends on your mapping: whether or not you analyze the string at index time.
Here is an example.
I've created an index with the following mapping:
curl -XPUT 'http://localhost:9200/test/' -d '
{
"mappings": {
"test" : {
"properties" : {
"text_1" : {
"type" : "string",
"index" : "analyzed"
},
"text_2" : {
"index": "not_analyzed",
"type" : "string"
}
}
}
}
}'
Basically two text fields, one analyzed and the other not_analyzed. Now I index the following document:
curl -XPUT 'http://localhost:9200/test/test/1' -d '
{
"text_1" : "Hello world",
"text_2" : "Hello world"
}'
text_1 query
As text_1 is analyzed, one of the things Elasticsearch does is convert the field's terms to lower case. So the following query doesn't find any document:
curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
{ "query": { "prefix" : { "text_1": "H"} } }
'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
But if I use lower case in the query, it matches:
curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
{ "query": { "prefix" : { "text_1": "h"} } }
'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "1",
"_score" : 1.0, "_source" :
{
"text_1" : "Hello world",
"text_2" : "Hello world"
}
} ]
}
}
text_2 query
As text_2 is not analyzed, when I make the original query it matches:
curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
{ "query": { "prefix" : { "text_2": "H"} } }
'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "1",
"_score" : 1.0, "_source" :
{
"text_1" : "Hello world",
"text_2" : "Hello world"
}
} ]
}
}
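The behaviour above can be reproduced with a toy model in plain Python. This is only an illustration, not Elasticsearch's actual analysis chain: `analyze` is a made-up stand-in that splits on non-word characters and lowercases, `not_analyzed` keeps the whole value as one term, and `prefix_match` compares the query prefix with the indexed terms exactly as given, on the understanding that a prefix query does not analyze its input.

```python
import re


def analyze(text):
    """Toy stand-in for an analyzed field: split into words and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def not_analyzed(text):
    """A not_analyzed field indexes the whole value as a single, unchanged term."""
    return [text]


def prefix_match(indexed_terms, prefix):
    """The prefix is compared as-is, so case must match the indexed terms."""
    return any(term.startswith(prefix) for term in indexed_terms)


text_1_terms = analyze("Hello world")        # ['hello', 'world']
text_2_terms = not_analyzed("Hello world")   # ['Hello world']

print(prefix_match(text_1_terms, "H"))  # False: indexed terms are lower case
print(prefix_match(text_1_terms, "h"))  # True
print(prefix_match(text_2_terms, "H"))  # True: original value kept intact

# The same reasoning applied to the question's metadata.Fach value "MA",
# assuming that field is analyzed:
print(prefix_match(analyze("MA"), "M"))  # False
print(prefix_match(analyze("MA"), "m"))  # True
```

So in the question, either querying with the lower-case prefix "m" or mapping metadata.Fach as not_analyzed would be the two options to try.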

Resources