How to index percolator queries containing filters on inner objects? - elasticsearch

Using Elasticsearch 2.1.1
I have documents with inner objects:
{
"level1": {
"level2": 42
}
}
I want to register percolator queries applying filters on the inner property:
$ curl -XPUT http://localhost:9200/myindex/.percolator/myquery?pretty -d '{
"query": {
"filtered": {
"filter": {
"range": {
"level1.level2": {
"gt": 10
}
}
}
}
}
}'
It fails because I don't have a mapping:
{
"error" : {
"root_cause" : [ {
"type" : "query_parsing_exception",
"reason" : "Strict field resolution and no field mapping can be found for the field with name [level1.level2]",
"index" : "myindex",
"line" : 1,
"col" : 58
} ],
"type" : "percolator_exception",
"reason" : "failed to parse query [myquery]",
"index" : "myindex",
"caused_by" : {
"type" : "query_parsing_exception",
"reason" : "Strict field resolution and no field mapping can be found for the field with name [level1.level2]",
"index" : "myindex",
"line" : 1,
"col" : 58
}
},
"status" : 500
}
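(Note: creating the index with the mapping up front would presumably also avoid this error; a sketch, not verified:

curl -XPUT http://localhost:9200/myindex?pretty -d '{
  "mappings": {
    "mytype": {
      "properties": {
        "level1": {
          "properties": {
            "level2": { "type": "long" }
          }
        }
      }
    }
  }
}'
)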
So I start again, but this time I add a mapping template first:
curl -XDELETE http://localhost:9200/_template/myindex
curl -XDELETE http://localhost:9200/myindex
curl -XPUT http://localhost:9200/_template/myindex?pretty -d '
{
"template": "myindex",
"mappings" : {
"mytype" : {
"properties" : {
"level1" : {
"properties" : {
"level2" : {
"type" : "long"
}
}
}
}
}
}
}
'
I try to register my percolator query again:
curl -XPUT http://localhost:9200/myindex/.percolator/myquery?pretty -d '{
"query": {
"filtered": {
"filter": {
"range": {
"level1.level2": {
"gt": 10
}
}
}
}
}
}'
And now it succeeds:
{
"_index" : "myindex",
"_type" : ".percolator",
"_id" : "myquery",
"_version" : 1,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"created" : true
}
And I can see the mapping that has been created:
curl http://localhost:9200/myindex/_mapping?pretty
{
"myindex" : {
"mappings" : {
".percolator" : {
"properties" : {
"query" : {
"type" : "object",
"enabled" : false
}
}
},
"mytype" : {
"properties" : {
"level1" : {
"properties" : {
"level2" : {
"type" : "long"
}
}
}
}
}
}
}
}
Now my problem is that I also need to perform searches on my percolator queries and the default percolate mapping doesn’t index the query field.
So I start again, this time specifying in my mapping template that I want percolator queries to be indexed (note "enabled": true):
curl -XPUT http://localhost:9200/_template/myindex?pretty -d '
{
"template": "myindex",
"mappings" : {
".percolator" : {
"properties" : {
"query" : {
"type" : "object",
"enabled" : true
}
}
},
"mytype" : {
"properties" : {
"level1" : {
"properties" : {
"level2" : {
"type" : "long"
}
}
}
}
}
}
}
'
I try to register my percolator query again:
curl -XPUT http://localhost:9200/myindex/.percolator/myquery?pretty -d '{
"query": {
"filtered": {
"filter": {
"range": {
"level1.level2": {
"gt": 10
}
}
}
}
}
}'
But now I get an error:
{
"error" : {
"root_cause" : [ {
"type" : "mapper_parsing_exception",
"reason" : "Field name [level1.level2] cannot contain '.'"
} ],
"type" : "mapper_parsing_exception",
"reason" : "Field name [level1.level2] cannot contain '.'"
},
"status" : 400
}
How can I create and index a percolator query matching an inner property?
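One approach I am considering (untested, and only as far as I understand the percolator): a .percolator document can carry extra metadata fields next to query, and those fields are indexed normally even while the query object itself stays at enabled: false, so the information I need to search on could be duplicated into such a field:

# "tags" is an arbitrary metadata field name, used here only for illustration
curl -XPUT http://localhost:9200/myindex/.percolator/myquery?pretty -d '{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "level1.level2": { "gt": 10 }
        }
      }
    }
  },
  "tags": ["level2 gt 10"]
}'

# the metadata field can then be searched like any other field
curl -XGET http://localhost:9200/myindex/.percolator/_search?pretty -d '{
  "query": {
    "match": { "tags": "level2" }
  }
}'

That would make the registered queries searchable through their metadata, but I would still prefer a way to index the query field itself.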

Related

ELK bool query with match and prefix

I'm new to ELK. I have a problem with the following search query:
curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/commsrch/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"should" : [
{"match" : {"cn" : "franc"}},
{"prefix" : {"srt" : "99889300200"}}
]
}
}
}
'
I need to find all documents that satisfy the condition: field "cn" contains "franc" OR field "srt" starts with "99889300200".
Index mapping:
{
"commsrch" : {
"mappings" : {
"properties" : {
"addr" : {
"type" : "text",
"index" : false
},
"cn" : {
"type" : "text",
"analyzer" : "compname"
},
"srn" : {
"type" : "text",
"analyzer" : "srnsrt"
},
"srt" : {
"type" : "text",
"analyzer" : "srnsrt"
}
}
}
}
}
Index settings:
{
"commsrch" : {
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "commsrch",
"creation_date" : "1675079141160",
"analysis" : {
"filter" : {
"ngram_filter" : {
"type" : "ngram",
"min_gram" : "3",
"max_gram" : "4"
}
},
"analyzer" : {
"compname" : {
"filter" : [
"lowercase",
"stop",
"ngram_filter"
],
"type" : "custom",
"tokenizer" : "whitespace"
},
"srnsrt" : {
"type" : "custom",
"tokenizer" : "standard"
}
}
},
"number_of_replicas" : "1",
"uuid" : "C15EXHnaTIq88JSYNt7GvA",
"version" : {
"created" : "8060099"
}
}
}
}
}
The query works properly with just one condition: if it has only the "match" condition, the result has the proper document count, and the same is true if it has only the "prefix" condition.
With both the "match" and "prefix" conditions, I only see documents in the result that correspond to the "prefix" condition.
I can't find anything in the ELK docs about limitations on mixing "prefix" and "match", but as far as I can see some problem exists. Please help me find where the problem is.
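For reference, the tokens that the compname analyzer produces for the search text can be inspected with the _analyze API (a match query analyzes its input with the field's analyzer, while prefix is a term-level query and does not):

curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/commsrch/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "compname",
  "text": "franc"
}
'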
Continuing my experiments, I have one more problem.
Example:
Source data:
1st document cn field: "put stone is done"
2nd document cn field: "job one or two"
Mapping and index settings are the same as described in my first post.
Request:
{
"query": {
"bool": {
"should" : [
{"match" : {"cn" : "one"}},
{"prefix" : {"cn" : "one"}}
]
}
}
}
As I understand it, the first document gets the higher score because it has more repeats of "one". But I need high scores for documents that have at least one word in the "cn" field starting with the string "one". I have experimented with this query:
{
"query": {
"bool": {
"should": [
{"match": {"cn": "one"}},
{
"constant_score": {
"filter": {
"prefix": {
"cn": "one"
}
},
"boost": 100
}
}
]
}
}
}
But it doesn't work properly. What's wrong with my query?
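A possibly relevant detail: with the 3-4 ngram filter in the compname analyzer, "stone" and "done" also produce the token "one", so both documents contain the term "one" in the index and the prefix clause presumably matches both of them. The indexed tokens can be checked with the _analyze API:

curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/commsrch/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "compname",
  "text": "put stone is done"
}
'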

match_only_text fields do not support sorting and aggregations elasticsearch

I would like to count and sort the number of occurrences of each message in a field of type match_only_text. Using a DSL query, the output needs to look like this:
{" Text message 1":615
" Text message 2":568
....}
So I tried this in Kibana:
GET my_index_name/_search?size=0
{
"aggs": {
"type_promoted_count": {
"cardinality": {
"field": "message"
}
}
}
}
However, I get this error:
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "match_only_text fields do not support sorting and aggregations"
}
I am interested in the field "message"; this is its mapping:
"message" : {
"type" : "match_only_text"
}
This is a part of the index mapping:
"mappings" : {
"_meta" : {
"package" : {
"name" : "system"
},
"managed_by" : "ingest-manager",
"managed" : true
},
"_data_stream_timestamp" : {
"enabled" : true
},
"dynamic_templates" : [
{
"strings_as_keyword" : {
"match_mapping_type" : "string",
"mapping" : {
"ignore_above" : 1024,
"type" : "keyword"
}
}
}
],
"date_detection" : false,
"properties" : {
"#timestamp" : {
"type" : "date"
}
.
.
.
"message" : {
"type" : "match_only_text"
},
"process" : {
"properties" : {
"name" : {
"type" : "keyword",
"ignore_above" : 1024
},
"pid" : {
"type" : "long"
}
}
},
"system" : {
"properties" : {
"syslog" : {
"type" : "object"
}
}
}
}
}
}
}
Please help.
Yes, by design, match_only_text is of the text field type family, hence you cannot aggregate on it.
You need to:
A. create a message.keyword sub-field in your mapping of type keyword:
PUT my_index_name/_mapping
{
"properties": {
"message" : {
"type" : "match_only_text",
"fields": {
"keyword": {
"type" : "keyword"
}
}
}
}
}
B. update the whole index (using _update_by_query) so the sub-field gets populated:
POST my_index_name/_update_by_query?wait_for_completion=false
Then, depending on the size of your index, call GET _tasks?actions=*byquery&detailed regularly to check the progress of the task.
C. run the aggregation on that sub-field.
POST my_index_name/_search
{
"size": 0,
"aggs": {
"type_promoted_count": {
"cardinality": {
"field": "message.keyword"
}
}
}
}
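Side note: if the goal is a count per distinct message (as in the sample output at the top) rather than the number of distinct messages, a terms aggregation on the same sub-field is probably closer to what you need, for example:

POST my_index_name/_search
{
  "size": 0,
  "aggs": {
    "messages_count": {
      "terms": {
        "field": "message.keyword",
        "size": 50
      }
    }
  }
}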

Not able to get any results when using bucket aggregations

I have some PR data in my Elasticsearch index. This is how the documents are modelled:
{
"Author" : "dheerajrav",
"Date" : "2012-10-05T10:16:49Z",
"Number" : 2554441,
"IsMerged" : false,
"MergedBy" : "",
"Body" : ""
},
{
"Author" : "dheerajrav",
"Date" : "2012-10-05T09:11:35Z",
"Number" : 2553883,
"IsMerged" : false,
"MergedBy" : "",
"Body" : ""
},
{
"Author" : "crodjer",
"Date" : "2012-10-04T15:40:22Z",
"Number" : 2544540,
"IsMerged" : false,
"MergedBy" : "",
"Body" : ""
},
{
"Author" : "crodjer",
"Date" : "2012-10-04T07:52:20Z",
"Number" : 2539410,
"IsMerged" : false,
"MergedBy" : "",
"Body" : ""
}
.
.
.
]
}
I am trying the following terms aggregation on my index, but I get no results:
curl -X GET "localhost:9200/newidx/_search?pretty" -H 'Content-Type: application/json' -d'
{
"aggs" : {
"contributors" : {
"terms" : {
"field" : "Author",
"size" : 100
}
}
}
}
'
The desired result would have been separate buckets for each PR author. This is the response:
"aggregations" : {
"contributors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
Am I modeling my data wrong?
This is the mapping for my index
{
"newidx" : {
"mappings" : {
"properties" : {
"Stats" : {
"properties" : {
"Author" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"Body" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"Date" : {
"type" : "date"
},
"IsMerged" : {
"type" : "boolean"
},
"MergedBy" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"Number" : {
"type" : "long"
}
}
}
}
}
}
}
I generate a JSON file in my code and index it into Elasticsearch using elasticsearch_loader; here is the command:
elasticsearch_loader --es-host 'localhost' --index org-skills --type incident json --lines processed.json
Based on your mapping:
The Author field is declared as text (used for full-text search) with a keyword sub-field (used for matching whole values). Read up on the difference between text and keyword.
The parent mapping name is Stats.
You should therefore use Stats.Author.keyword in your aggregation query, i.e.:
curl -X GET "localhost:9200/newidx/_search?pretty" -H 'Content-Type: application/json' -d'
{
"aggs" : {
"contributors" : {
"terms" : {
"field" : "Stats.Author.keyword",
"size" : 100
}
}
}
}
'
It needs to be:
curl -X GET "localhost:9200/newidx/_search?pretty" -H 'Content-Type: application/json' -d'
{
"aggs" : {
"contributors" : {
"terms" : {
"field" : "Stats.Author.keyword",
"size" : 100
}
}
}
}
'
Your field Stats.Author is of type text. To run aggregations on a text field, you need a keyword sub-field. Therefore you need to use the field Stats.Author.keyword.

Elasticsearch - Conditional nested fetching

I have index mapping:
{
"dev.directory.3" : {
"mappings" : {
"profile" : {
"properties" : {
"email" : {
"type" : "string",
"index" : "not_analyzed"
},
"events" : {
"type" : "nested",
"properties" : {
"id" : {
"type" : "integer"
},
"name" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}
}
}
}
}
with data:
"hits" : [ {
"_index" : "dev.directory.3",
"_type" : "profile",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"email" : "test#dummy.com",
"events" : [
{
"id" : 111,
"name" : "ABC",
},
{
"id" : 222,
"name" : "DEF",
}
],
}
}]
I'd like to get back only the matched nested elements instead of the whole events array - is this possible in ES?
Example query:
{
"nested" : {
"path" : "events",
"query" : {
"bool" : {
"filter" : [
{ "match" : { "events.id" : 222 } },
]
}
}
}
}
E.g. if I query for events.id=222, only a single element should be returned in the result list.
What strategy would be best to achieve this kind of requirement?
You can use inner_hits to only get the nested records which matched the query.
{
"query": {
"nested": {
"path": "events",
"query": {
"bool": {
"filter": [
{
"match": {
"events.id": 222
}
}
]
}
},
"inner_hits": {}
}
},
"_source": false
}
I am also excluding the source to get only the nested hits.
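With the sample document above, the matching nested element then comes back under inner_hits, roughly like this (response trimmed for brevity):

"hits" : [ {
  "_index" : "dev.directory.3",
  "_type" : "profile",
  "_id" : "1",
  "inner_hits" : {
    "events" : {
      "hits" : {
        "hits" : [ {
          "_nested" : { "field" : "events", "offset" : 1 },
          "_source" : { "id" : 222, "name" : "DEF" }
        } ]
      }
    }
  }
} ]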

Boolean query does not return expected data in Elasticsearch

I have the following document in Elasticsearch as reported by Kibana:
{"deviceId":"C1976429369BFE063ED8B3409DB7C7E7D87196D9","appId":"DisneyDigitalBooks.PlanesAdventureAlbum","ostype":"iOS"}
Why does the following query not find that document?
[root@myvm elasticsearch-1.0.0]# curl -XGET 'http://localhost:9200/unique_app_install/_search?pretty=1' -d '
{
"query" : {
"bool" : {
"must" : [ {
"term" : {
"deviceId" : "C1976429369BFE063ED8B3409DB7C7E7D87196D9"
}
}, {
"term" : {
"appId" : "DisneyDigitalBooks.PlanesAdventureAlbum"
}
}, {
"term" : {
"ostype" : "iOS"
}
} ]
}
}
}'
Here is the response from Elasticsearch:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
As a side question, is this the fastest way to query the data in my case?
Thanks in advance.
UPDATE:
Could it be related to the fact that I used the following mapping for this index?
curl -XPOST localhost:9200/unique_app_install -d '{
"settings" : {
"number_of_shards" : 5
},
"mappings" : {
"sdk_sync" : {
"properties" : {
"deviceId" : { "type" : "string" , "index": "not_analyzed"},
"appId" : { "type" : "string" , "index": "not_analyzed"},
"ostype" : { "type" : "string" , "index": "not_analyzed"}
}
}
}
}'
Check that the type of your document was right when indexing: sdk_sync.
I have used your items and it works for me. The following curl requests give the right response for me:
curl -XPOST localhost:9200/unique_app_install -d '{
"settings" : {
"number_of_shards" : 5
},
"mappings" : {
"sdk_sync" : {
"properties" : {
"deviceId" : { "type" : "string" , "index": "not_analyzed"},
"appId" : { "type" : "string" , "index": "not_analyzed"},
"ostype" : { "type" : "string" , "index": "not_analyzed"}
}
}
}
}'
curl -XPOST localhost:9200/unique_app_install/sdk_sync/1 -d '{
"deviceId":"C1976429369BFE063ED8B3409DB7C7E7D87196D9",
"appId":"DisneyDigitalBooks.PlanesAdventureAlbum",
"ostype":"iOS"
}'
curl -XGET 'http://localhost:9200/unique_app_install/_search?pretty=1' -d '
{
"query" : {
"bool" : {
"must" : [ {
"term" : {
"deviceId" : "C1976429369BFE063ED8B3409DB7C7E7D87196D9"
}
}, {
"term" : {
"appId" : "DisneyDigitalBooks.PlanesAdventureAlbum"
}
}, {
"term" : {
"ostype" : "iOS"
}
} ]
}
}
}'
Unless you specify that a field is NOT to be analyzed, every field is analyzed by default.
It means that the deviceId "C1976429369BFE063ED8B3409DB7C7E7D87196D9" will be indexed as "c1976429369bfe063ed8b3409db7c7e7d87196d9" (lower case).
You have to use a term query or term filter with the string in LOWER CASE.
That is the reason why you should specify {"index": "not_analyzed"} in the mapping.
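For example, with the default (analyzed) mapping, a term query only matches the lowercased tokens; a sketch, assuming the values were indexed with the standard analyzer:

curl -XGET 'http://localhost:9200/unique_app_install/_search?pretty=1' -d '
{
  "query" : {
    "bool" : {
      "must" : [ {
        "term" : { "deviceId" : "c1976429369bfe063ed8b3409db7c7e7d87196d9" }
      }, {
        "term" : { "ostype" : "ios" }
      } ]
    }
  }
}'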
