Elasticsearch prefix query not working on nested documents

I'm using a prefix query in Elasticsearch. It works fine on top-level fields, but once applied to nested data it returns no results. The data I am trying to query is shown in the result below.
Here the prefix query works fine:
Query:
{ "query": { "prefix" : { "duration": "7"} } }
Result:
{
"took": 25, ... },
"hits": {
"total": 6,
"max_score": 1,
"hits": [
{
"_index": "itemresults",
"_type": "itemresult",
"_id": "ITEM_RESULT_7c8649c2-6cb0-487e-bb3c-c4bf0ad28a90_8bce0a3f-f951-4a01-94b5-b55dea1a2752_7c965241-ad0a-4a83-a400-0be84daab0a9_61",
"_score": 1,
"_source": {
"score": 1,
"studentId": "61",
"timestamp": 1377399320017,
"groupIdentifiers": {},
"assessmentItemId": "7c965241-ad0a-4a83-a400-0be84daab0a9",
"answered": true,
"duration": "7.078",
"metadata": {
"Korrektur": "a",
"Matrize12_13": "MA.1.B.1.d.1",
"Kompetenz": "ZuV",
"Zyklus": "Z2",
"Schwierigkeit": "H",
"Handlungsaspekt": "AuE",
"Fach": "MA",
"Aufgabentyp": "L"
},
"assessmentSessionId": "7c8649c2-6cb0-487e-bb3c-c4bf0ad28a90",
"assessmentId": "8bce0a3f-f951-4a01-94b5-b55dea1a2752"
}
},
Applying the prefix query to the nested structure 'metadata', however, doesn't return any result:
{ "query": { "prefix" : { "metadata.Fach": "M"} } }
Result:
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 15,
"successful": 15,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
What am I doing wrong? Is it at all possible to apply prefix on nested data?

It does not depend on whether the field is nested or not. It depends on your mapping: whether the string is analyzed at index time or not.
Here is an example.
I've created an index with the following mapping:
curl -XPUT 'http://localhost:9200/test/' -d '
{
"mappings": {
"test" : {
"properties" : {
"text_1" : {
"type" : "string",
"index" : "analyzed"
},
"text_2" : {
"index": "not_analyzed",
"type" : "string"
}
}
}
}
}'
Basically 2 text fields, one analyzed and the other not_analyzed. Now I index the following document:
curl -XPUT 'http://localhost:9200/test/test/1' -d '
{
"text_1" : "Hello world",
"text_2" : "Hello world"
}'
text_1 query
As text_1 is analyzed, one of the things Elasticsearch does is convert the field value to lowercase. So the following query doesn't find any document:
curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
{ "query": { "prefix" : { "text_1": "H"} } }
'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
But if I use lowercase in the query, it matches:
curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
{ "query": { "prefix" : { "text_1": "h"} } }
'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "1",
"_score" : 1.0, "_source" :
{
"text_1" : "Hello world",
"text_2" : "Hello world"
}
} ]
}
}
text_2 query
As text_2 is not analyzed, when I make the original query it matches:
curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
{ "query": { "prefix" : { "text_2": "H"} } }
'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "1",
"_score" : 1.0, "_source" :
{
"text_1" : "Hello world",
"text_2" : "Hello world"
}
} ]
}
}
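The difference between the two fields can be reproduced without a cluster. The sketch below is a simplified model, not Elasticsearch's actual implementation: `analyze` is a hypothetical stand-in for an analyzed string field (it only splits on non-word characters and lowercases), and `prefix_matches` mimics how a prefix query compares the prefix, exactly as typed, against the indexed terms.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzed string field: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def not_analyzed(text):
    """A not_analyzed field is indexed as a single, unchanged term."""
    return [text]


def prefix_matches(indexed_terms, prefix):
    """The prefix is not analyzed; it is compared to the indexed terms as typed."""
    return any(term.startswith(prefix) for term in indexed_terms)


text_1_terms = analyze("Hello world")       # ['hello', 'world']
text_2_terms = not_analyzed("Hello world")  # ['Hello world']

print(prefix_matches(text_1_terms, "H"))  # False: only lowercase terms are indexed
print(prefix_matches(text_1_terms, "h"))  # True
print(prefix_matches(text_2_terms, "H"))  # True: the original casing is preserved
```

Under the same assumption, a value such as "MA" in metadata.Fach would be indexed as ma if that field is analyzed, so the prefix "m" should match where "M" does not.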

Related

Why can't Elasticsearch find a document that contains a given word?

I am using the default settings for an index. The following DSL shows how I create the documents and search them.
### create index
PUT /mk_test
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"_doc": {
"properties": {
"nickName": {
"type": "text"
}
}
}
}
}
### get index
GET /mk_test/_mapping
### create document
POST /mk_test/_doc
{
"nickName": "C.BP"
}
### create document
POST /mk_test/_doc
{
"nickName": "BP"
}
### create document
POST /mk_test/_doc
{
"nickName": "C.B"
}
### create document
POST /mk_test/_doc
{
"nickName": "你好,中国"
}
Now I have 4 documents in the mk_test index,
and I have 2 search queries that give me different results.
First, I want to find docs that contain "中国":
GET /mk_test/_search
{
"query": {
"bool": {
"must": [
{"match_phrase": {"nickName": "中国"}}
]
}
}
}
The server responds:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.5779729,
"hits" : [
{
"_index" : "mk_test",
"_type" : "_doc",
"_id" : "c2gwwX0BTkUG9klh1b8k",
"_score" : 1.5779729,
"_source" : {
"nickName" : "你好,中国"
}
}
]
}
}
Next, I want to find docs that contain "BP", but I can't get "C.BP":
GET /mk_test/_search
{
"query": {
"bool": {
"must": [
{"match_phrase": {"nickName": "BP"}}
]
}
}
}
The server returns only "BP"; "C.BP" is not found:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.4599355,
"hits" : [
{
"_index" : "mk_test",
"_type" : "_doc",
"_id" : "TmguwX0BTkUG9klhAJ_S",
"_score" : 1.4599355,
"_source" : {
"nickName" : "BP"
}
}
]
}
}
How can I find both "BP" and "C.BP" ?
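One likely explanation, offered as an assumption rather than a confirmed diagnosis: the default standard analyzer follows Unicode word-boundary rules, under which a period between two letters does not split the word. "C.BP" would then be indexed as the single term c.bp, and a phrase query for bp has nothing to match. The sketch below is a rough plain-Python approximation of that behaviour; `approx_standard_tokens` is a hypothetical helper, not an Elasticsearch API, and it ignores CJK text entirely.

```python
import re

# Approximation only: runs of ASCII letters/digits, optionally joined by
# single periods (so "C.BP" stays together as one token).
_TOKEN = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")


def approx_standard_tokens(text):
    """Lowercased tokens under the simplified word-boundary rule above."""
    return [m.group(0).lower() for m in _TOKEN.finditer(text)]


def phrase_matches(doc_text, query_text):
    """True if the query tokens occur consecutively, in order, in the document."""
    doc = approx_standard_tokens(doc_text)
    query = approx_standard_tokens(query_text)
    n = len(query)
    return n > 0 and any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


for nick in ["C.BP", "BP", "C.B"]:
    print(nick, approx_standard_tokens(nick), phrase_matches(nick, "BP"))
```

The _analyze API can be used to check how the analyzer on your own cluster actually tokenizes "C.BP".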

Querying consecutive words with match_phrase in Elasticsearch works unexpectedly

I have the field name mapped as text:
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
}
}
}
}
Because of the nature of the text type in Elasticsearch, a match query matches on every word of the phrase. That's why in some cases I get results like the following:
POST /example-tags/_search
{
"query": {
"match": {
"name": "Jordan Rudess was born in 1956"
}
}
}
// Results
{
"took" : 28,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 2.1596613,
"hits" : [
{
"_index" : "example-tags",
"_type" : "_doc",
"_id" : "6101e538bc8ec610aff699e4",
"_score" : 4.1596613,
"_source" : {
"name" : "Jordan Rudess"
}
},
{
"_index" : "example-tags",
"_type" : "_doc",
"_id" : "610123538bc8ec61034ff699e4",
"_score" : 4.1796613,
"_source" : {
"name" : "Alice in Chains"
}
},
]
}
}
As you can see, for the text "Jordan Rudess was born in 1956" I get the result "Alice in Chains" just because of the word "in". I want to avoid this behaviour.
If I try:
POST /example-tags/_search
{
"query": {
"match_phrase": {
"name": "Dream Theater keyboardist's Jordan Rudess was born in 1956"
}
}
}
// Results
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
So, in this last example I was expecting to get the Jordan Rudess tag name, but I get empty results.
I need to get the maximum occurrences in tag.name of consecutive words from a phrase. How can I achieve that?
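Both results follow from how the two queries combine terms. As a minimal sketch, assuming a crude lowercase-and-split tokenizer in place of the real analyzer: match with its default or operator accepts a document that shares any single term with the query, while match_phrase requires every query term to appear in the document, consecutively and in order. The function names are hypothetical.

```python
def tokens(text):
    """Crude stand-in for the analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match_any(query, doc):
    """Models `match` with the default `or` operator: any shared term is a hit."""
    return bool(set(tokens(query)) & set(tokens(doc)))


def match_phrase(query, doc):
    """Models `match_phrase`: all query terms, consecutive and in order."""
    q, d = tokens(query), tokens(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


sentence = "Jordan Rudess was born in 1956"
print(match_any(sentence, "Alice in Chains"))   # True: both contain "in"
print(match_phrase(sentence, "Jordan Rudess"))  # False: the tag lacks "was born in 1956"
print(match_phrase("Jordan Rudess", sentence))  # True: the short phrase occurs in the longer text
```

In this model the phrase query only succeeds when the shorter text is used as the query, which is the reverse of what the search above does.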

Delete all documents where _id starts with a number in Elasticsearch

What is the fastest way to get all _ids?
I need a query to delete all documents whose _id starts with a number in Elasticsearch.
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "_2432475",
"_score" : 1.0,
"_source" : {
"name" : "999",
"file" : null,
"age" : null,
}
},
Your best bet is to first copy the internal _id into a doc-level field (let's call it internal_id):
POST myindex/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"source": "ctx._source.internal_id = ctx._id",
"lang": "painless"
}
}
and then use a match_phrase_prefix query like so:
GET myindex/_search
{
"query": {
"match_phrase_prefix": {
"internal_id": "_24"
}
}
}
POST /myindex/_delete_by_query
{
"query": {
"terms": {
"_id": [ "1", "2" ]
}
}
}
Wildcards on _id are not supported in Elasticsearch. Either index a similar key explicitly into the document,
or update the documents using _update_by_query to add the _id value as a field.
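If the ids can be listed on the client side, the terms form of _delete_by_query can be driven by a small filter. This is a sketch under that assumption: `ids_starting_with_digit` and `delete_by_query_bodies` are hypothetical helpers that only build request bodies and send nothing, and the batch size is an arbitrary choice.

```python
import json


def ids_starting_with_digit(ids):
    """Keep only the ids whose first character is a digit."""
    return [doc_id for doc_id in ids if doc_id[:1].isdigit()]


def delete_by_query_bodies(ids, batch_size=1000):
    """Yield one _delete_by_query body per batch of ids."""
    for start in range(0, len(ids), batch_size):
        yield {"query": {"terms": {"_id": ids[start:start + batch_size]}}}


all_ids = ["1", "2", "_2432475", "abc", "42x"]
numeric = ids_starting_with_digit(all_ids)
for body in delete_by_query_bodies(numeric, batch_size=2):
    print(json.dumps(body))
```

Each printed body could then be sent to POST /myindex/_delete_by_query with whatever client you already use.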

elasticsearch - only return specific fields without _source?

I've found some answers, like
Make elasticsearch only return certain fields?
but they all need the _source field.
In my system, disk and network are both scarce resources.
I can't store the _source field, and I don't need the _index and _score fields.
Elasticsearch version: 5.5
The index mapping looks like this:
{
"index_2020-04-08": {
"mappings": {
"type1": {
"_all": {
"enabled": false
},
"_source": {
"enabled": false
},
"properties": {
"rank_score": {
"type": "float"
},
"first_id": {
"type": "keyword"
},
"second_id": {
"type": "keyword"
}
}
}
}
}
}
My query:
GET index_2020-04-08/type1/_search
{
"query": {
"bool": {
"filter": {
"term": {
"first_id": "hello"
}
}
}
},
"size": 1000,
"sort": [
{
"rank_score": {
"order": "desc"
}
}
]
}
The search results I got :
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "index_2020-04-08",
"_type": "type1",
"_id": "id_1",
"_score": null,
"sort": [
0.06621722
]
},
{
"_index": "index_2020-04-08",
"_type": "type1",
"_id": "id_2",
"_score": null,
"sort": [
0.07864579
]
}
]
}
}
The results I want:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_id": "id_1"
},
{
"_id": "id_2"
}
]
}
}
Can I implement it?
To return specific fields of a document, you must do one of two things:
Include the _source field in your documents, which is enabled by default.
Store specific fields with the stored fields feature, which must be enabled manually.
Because you mostly want the document ids and some metadata, you can use the filter_path feature.
Here's an example that's close to what you want (just change the field list):
$ curl -X GET "localhost:9200/metricbeat-7.6.1-2020.04.02-000002/_search?filter_path=took,timed_out,_shards,hits.total,hits.max_score,hits.hits._id&pretty"
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 1.0,
"hits" : [
{
"_id" : "8SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "8iEGSHEBzNscjCyQ18cg"
},
{
"_id" : "8yEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9CEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9iEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9yEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-CEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-iEGSHEBzNscjCyQ18cg"
}
]
}
}
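The filter_path value is just a comma-separated list of response paths, so it can be assembled programmatically. A minimal sketch, assuming a node at localhost:9200 and the index name from the question; `search_url` is a hypothetical helper and no request is sent.

```python
from urllib.parse import urlencode


def search_url(base, index, paths):
    """Build a _search URL that asks the server to return only the given paths."""
    # safe="," keeps the commas readable instead of encoding them as %2C.
    query = urlencode({"filter_path": ",".join(paths)}, safe=",")
    return f"{base}/{index}/_search?{query}"


url = search_url(
    "http://localhost:9200",
    "index_2020-04-08",
    ["took", "timed_out", "_shards", "hits.total", "hits.max_score", "hits.hits._id"],
)
print(url)
```

Dropping took, timed_out and _shards from the list would shrink the response further if those are not needed either.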
Just to clarify, based on the SO question you linked: you're not storing the _source, you're requesting it from ES. It's usually used to limit what you want to have retrieved, i.e.
...
"_source": ["only", "fields", "I", "need"]
...
_score, _index etc. are meta fields that are going to be retrieved no matter what. You can "hack" it a bit by setting the size to 0 and aggregating, i.e.
{
"size": 0,
"aggs": {
"by_ids": {
"terms": {
"field": "_id"
}
}
}
}
which will save you a few bytes
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Ac76WXEBnteqn982smh_",
"doc_count" : 1
},
{
"key" : "As77WXEBnteqn982EGgq",
"doc_count" : 1
}
]
}
}
}
but performing aggregations has a cost of its own.

Search data in Elasticsearch based on some fields

I am new to Elasticsearch and want to search this data based on "type:": "load".
Please help.
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1109,
"max_score": 1,
"hits": [
{"_index": "4",
"_type": "aa",
"_id": "xx",
"_score": 1,
"_source": {
"useRange": false,
"Blueprint": 4,
"standardDeviation": 0,
"occurrences": 0,
"type:": "load",
}...
{
}
The Elasticsearch documentation will help you:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
EDIT
The query is curl -XGET 'localhost:9200/sample/_search?q=type:load&pretty'
and the output will be
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.30685282,
"hits" : [ {
"_index" : "sample",
"_type" : "data",
"_id" : "1",
"_score" : 0.30685282,
"_source" : {
"useRange" : false,
"Blueprint" : 4,
"standardDeviation" : 0,
"occurrences" : 0,
"type" : "load"
}
} ]
}
}
The issue was with the field name 'type'. We changed the name to typemetrics, and the query below works.
I think type might be acting as a keyword.
GET /4/_search
{
"query": {
"term" : { "typemetrics" : "load"}
}
}
