Boolean query does not return expected data in Elasticsearch

I have the following document in Elasticsearch as reported by Kibana:
{"deviceId":"C1976429369BFE063ED8B3409DB7C7E7D87196D9","appId":"DisneyDigitalBooks.PlanesAdventureAlbum","ostype":"iOS"}
Why does the following query not return the document?
[root@myvm elasticsearch-1.0.0]# curl -XGET 'http://localhost:9200/unique_app_install/_search?pretty=1' -d '
{
"query" : {
"bool" : {
"must" : [ {
"term" : {
"deviceId" : "C1976429369BFE063ED8B3409DB7C7E7D87196D9"
}
}, {
"term" : {
"appId" : "DisneyDigitalBooks.PlanesAdventureAlbum"
}
}, {
"term" : {
"ostype" : "iOS"
}
} ]
}
}
}'
Here is the response from Elasticsearch:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
As a side question, is this the fastest way to query the data in my case?
Thanks in advance.
UPDATE:
Could it be related to the fact that I used the following mapping for this index?
curl -XPOST localhost:9200/unique_app_install -d '{
"settings" : {
"number_of_shards" : 5
},
"mappings" : {
"sdk_sync" : {
"properties" : {
"deviceId" : { "type" : "string" , "index": "not_analyzed"},
"appId" : { "type" : "string" , "index": "not_analyzed"},
"ostype" : { "type" : "string" , "index": "not_analyzed"}
}
}
}
}'

Check that the type of your document was right when inserting: sdk_sync.
I used your data and it works for me. The following curl requests give the right response:
curl -XPOST localhost:9200/unique_app_install -d '{
"settings" : {
"number_of_shards" : 5
},
"mappings" : {
"sdk_sync" : {
"properties" : {
"deviceId" : { "type" : "string" , "index": "not_analyzed"},
"appId" : { "type" : "string" , "index": "not_analyzed"},
"ostype" : { "type" : "string" , "index": "not_analyzed"}
}
}
}
}'
curl -XPOST localhost:9200/unique_app_install/sdk_sync/1 -d '{
"deviceId":"C1976429369BFE063ED8B3409DB7C7E7D87196D9",
"appId":"DisneyDigitalBooks.PlanesAdventureAlbum",
"ostype":"iOS"
}'
curl -XGET 'http://localhost:9200/unique_app_install/_search?pretty=1' -d '
{
"query" : {
"bool" : {
"must" : [ {
"term" : {
"deviceId" : "C1976429369BFE063ED8B3409DB7C7E7D87196D9"
}
}, {
"term" : {
"appId" : "DisneyDigitalBooks.PlanesAdventureAlbum"
}
}, {
"term" : {
"ostype" : "iOS"
}
} ]
}
}
}'

Unless you specify a field as not_analyzed, every field is analyzed by default.
That means the deviceId "C1976429369BFE063ED8B3409DB7C7E7D87196D9" is indexed as "c1976429369bfe063ed8b3409db7c7e7d87196d9" (lower case).
You then have to use a term query or term filter with the string in lower case.
That is the reason you should specify {"index": "not_analyzed"} in the mapping.
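If the fields stay analyzed, a term query can still match, but only against the lowercased tokens the standard analyzer indexed. A minimal sketch (the lowercased values below illustrate what the analyzer stores; this is a workaround, not the mapping-based fix above):
curl -XGET 'http://localhost:9200/unique_app_install/_search?pretty=1' -d '
{
"query" : {
"bool" : {
"must" : [ {
"term" : {
"deviceId" : "c1976429369bfe063ed8b3409db7c7e7d87196d9"
}
}, {
"term" : {
"ostype" : "ios"
}
} ]
}
}
}'
With the not_analyzed mapping in place and the documents reindexed, the original mixed-case query works as-is.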

Related

Not able to get any results on using bucket aggregations

I have some PR data in my ES. This is how the documents are modelled
{
"Author" : "dheerajrav",
"Date" : "2012-10-05T10:16:49Z",
"Number" : 2554441,
"IsMerged" : false,
"MergedBy" : "",
"Body" : ""
},
{
"Author" : "dheerajrav",
"Date" : "2012-10-05T09:11:35Z",
"Number" : 2553883,
"IsMerged" : false,
"MergedBy" : "",
"Body" : ""
},
{
"Author" : "crodjer",
"Date" : "2012-10-04T15:40:22Z",
"Number" : 2544540,
"IsMerged" : false,
"MergedBy" : "",
"Body" : ""
},
{
"Author" : "crodjer",
"Date" : "2012-10-04T07:52:20Z",
"Number" : 2539410,
"IsMerged" : false,
"MergedBy" : "",
"Body" : ""
}
.
.
.
]
}
I am running the following terms aggregation on my index, but I get no results:
curl -X GET "localhost:9200/newidx/_search?pretty" -H 'Content-Type: application/json' -d'
{
"aggs" : {
"contributors" : {
"terms" : {
"field" : "Author",
"size" : 100
}
}
}
}
'
The desired result would have been separate buckets for each PR author. This is the response:
"aggregations" : {
"contributors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
Am I modeling my data wrong?
This is the mapping for my index
{
"newidx" : {
"mappings" : {
"properties" : {
"Stats" : {
"properties" : {
"Author" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"Body" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"Date" : {
"type" : "date"
},
"IsMerged" : {
"type" : "boolean"
},
"MergedBy" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"Number" : {
"type" : "long"
}
}
}
}
}
}
}
I generate a JSON file in my code and index it into Elasticsearch using elasticsearch_loader; here is the command:
elasticsearch_loader --es-host 'localhost' --index org-skills --type incident json --lines processed.json
Based on your mapping:
The Author field is declared as text (used for full-text search) with a keyword sub-field (used for matching whole values). Read up on the difference between text vs keyword.
The parent mapping name is Stats.
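For that field path to exist, the indexed documents must nest the PR fields under a top-level Stats object, e.g. (a hypothetical document shape implied by the mapping, unlike the flat documents shown above):
{
"Stats" : {
"Author" : "dheerajrav",
"Date" : "2012-10-05T10:16:49Z",
"Number" : 2554441,
"IsMerged" : false,
"MergedBy" : "",
"Body" : ""
}
}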
You should therefore use Stats.Author.keyword in your aggregation query, i.e.:
curl -X GET "localhost:9200/newidx/_search?pretty" -H 'Content-Type: application/json' -d'
{
"aggs" : {
"contributors" : {
"terms" : {
"field" : "Stats.Author.keyword",
"size" : 100
}
}
}
}
'
It needs to be:
curl -X GET "localhost:9200/newidx/_search?pretty" -H 'Content-Type: application/json' -d'
{
"aggs" : {
"contributors" : {
"terms" : {
"field" : "Stats.Author.keyword",
"size" : 100
}
}
}
}
'
Your field Stats.Author is of type text. For aggregations, text fields also need a keyword sub-field, so you have to aggregate on the field Stats.Author.keyword.

Elasticsearch range query does not work with icu_collation for Turkish words

I have documents whose customer_address property contains Turkish words like "şa, za, sb, şc, sd, şe".
I indexed my documents as described below because I want to order them by the customer_address field. Sorting works well.
Sorting and Collations
Now I am trying to apply a range query over the customer_address field. When I send the query below, I get an empty result. (Expected result: sb, sd, şa, şd.)
curl -XGET http://localhost:9200/sampleindex/_search?pretty -d '{"query":{"bool":{"filter":[{"range":{"customer_address.sort":{"from":"plaj","to":"şcam","include_lower":true,"include_upper":true,"boost":1.0}}}],"disable_coord":false,"adjust_pure_negative":true,"boost":1.0}}}'
When I queried, I saw that my field values are stored in encoded form (collation keys), as described in the documentation:
curl -XGET http://localhost:9200/sampleindex/_search?pretty -d '{"aggs":{"myaggregation":{"terms":{"field":"customer_address.sort","size":10000}}},"size":0}'
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"myaggregation" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "⚕䁁䀠怀\u0001",
"doc_count" : 1
},
{
"key" : "⚗䁁䀠怀\u0001",
"doc_count" : 1
},
{
"key" : "✁ੀ⃀ၠ\u0000\u0000",
"doc_count" : 1
},
{
"key" : "✁ୀ⃀ၠ\u0000\u0000",
"doc_count" : 1
},
{
"key" : "✁ీ⃀ၠ\u0000\u0000",
"doc_count" : 1
},
{
"key" : "ⶔ䁁䀠怀\u0001",
"doc_count" : 1
}
]
}
}
}
So, how should I send the parameters in my range query to get the expected result?
Thanks in advance.
My Mapping:
curl -XGET http://localhost:9200/sampleindex?pretty
{
"sampleindex" : {
"aliases" : { },
"mappings" : {
"invoice" : {
"properties" : {
"customer_address" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
},
"sort" : {
"type" : "text",
"analyzer" : "turkish",
"fielddata" : true
}
}
}
}
},
"settings" : {
"index" : {
"number_of_shards" : "5",
"provided_name" : "sampleindex",
"max_result_window" : "2147483647",
"creation_date" : "1521732167023",
"analysis" : {
"filter" : {
"turkish_phonebook" : {
"variant" : "#collation=phonebook",
"country" : "TR",
"language" : "tr",
"type" : "icu_collation"
},
"turkish_lowercase" : {
"type" : "lowercase",
"language" : "turkish"
}
},
"analyzer" : {
"turkish" : {
"filter" : [
"turkish_lowercase",
"turkish_phonebook"
],
"tokenizer" : "keyword"
}
}
},
"number_of_replicas" : "1",
"uuid" : "ChNGX459TUi8VnBLTMn-Ng",
"version" : {
"created" : "5020099"
}
}
}
}
}
I solved my problem by defining an analyzer with a char filter at index creation. I don't know whether it is a good solution, but I could not solve it with the ICU "turkish_phonebook" collation, and this approach works for now.
First, I created an index with a "turkish_collation_analyzer". Then, for each property that needs it, I added a field "property.tr" that uses this analyzer. Finally, during range queries I converted my values to the form this field expects, as sketched after the settings below.
"settings": {
"index": {
"number_of_shards": "5",
"provided_name": "sampleindex",
"max_result_window": "2147483647",
"creation_date": "1522050241730",
"analysis": {
"analyzer": {
"turkish_collation_analyzer": {
"char_filter": [
"turkish_char_filter"
],
"tokenizer": "keyword"
}
},
"char_filter": {
"turkish_char_filter": {
"type": "mapping",
"mappings": [
"a => x01",
"b => x02",
.,
.,
.,
]
}
}
},
"number_of_replicas": "1",
"uuid": "hiEqIpjYTLePjF142B8WWQ",
"version": {
"created": "5020099"
}
}
}
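As a sketch of how this fits together (the field name customer_address.tr and the placeholder boundary values are illustrative assumptions, not taken from the original index), the property gets a sub-field that uses the new analyzer:
curl -XPUT http://localhost:9200/sampleindex/_mapping/invoice -d '{
"properties" : {
"customer_address" : {
"type" : "text",
"fields" : {
"tr" : {
"type" : "text",
"analyzer" : "turkish_collation_analyzer",
"fielddata" : true
}
}
}
}
}'
Because range queries are term-level and do not analyze their input, the boundary values are converted through the same character mapping before being sent:
curl -XGET http://localhost:9200/sampleindex/_search?pretty -d '{
"query" : {
"bool" : {
"filter" : [
{ "range" : { "customer_address.tr" : { "from" : "<converted form of plaj>", "to" : "<converted form of şcam>" } } }
]
}
}
}'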

How to index percolator queries containing filters on inner objects?

Using Elasticsearch 2.1.1
I have documents with inner objects:
{
"level1": {
"level2": 42
}
}
I want to register percolator queries applying filters on the inner property:
$ curl -XPUT http://localhost:9200/myindex/.percolator/myquery?pretty -d '{
"query": {
"filtered": {
"filter": {
"range": {
"level1.level2": {
"gt": 10
}
}
}
}
}
}'
It fails because I don't have a mapping:
{
"error" : {
"root_cause" : [ {
"type" : "query_parsing_exception",
"reason" : "Strict field resolution and no field mapping can be found for the field with name [level1.level2]",
"index" : "myindex",
"line" : 1,
"col" : 58
} ],
"type" : "percolator_exception",
"reason" : "failed to parse query [myquery]",
"index" : "myindex",
"caused_by" : {
"type" : "query_parsing_exception",
"reason" : "Strict field resolution and no field mapping can be found for the field with name [level1.level2]",
"index" : "myindex",
"line" : 1,
"col" : 58
}
},
"status" : 500
}
So I start again, but this time I add a mapping template before:
curl -XDELETE http://localhost:9200/_template/myindex
curl -XDELETE http://localhost:9200/myindex
curl -XPUT http://localhost:9200/_template/myindex?pretty -d '
{
"template": "myindex",
"mappings" : {
"mytype" : {
"properties" : {
"level1" : {
"properties" : {
"level2" : {
"type" : "long"
}
}
}
}
}
}
}
'
I try to register my percolator query again:
curl -XPUT http://localhost:9200/myindex/.percolator/myquery?pretty -d '{
"query": {
"filtered": {
"filter": {
"range": {
"level1.level2": {
"gt": 10
}
}
}
}
}
}'
And now it succeeds:
{
"_index" : "myindex",
"_type" : ".percolator",
"_id" : "myquery",
"_version" : 1,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"created" : true
}
And I can see the mapping that has been created:
curl http://localhost:9200/myindex/_mapping?pretty
{
"myindex" : {
"mappings" : {
".percolator" : {
"properties" : {
"query" : {
"type" : "object",
"enabled" : false
}
}
},
"mytype" : {
"properties" : {
"level1" : {
"properties" : {
"level2" : {
"type" : "long"
}
}
}
}
}
}
}
}
Now my problem is that I also need to perform searches on my percolator queries and the default percolate mapping doesn’t index the query field.
So I start again, this time specifying in my mapping template that I want percolator queries to be indexed (note "enabled": true):
curl -XPUT http://localhost:9200/_template/myindex?pretty -d '
{
"template": "myindex",
"mappings" : {
".percolator" : {
"properties" : {
"query" : {
"type" : "object",
"enabled" : true
}
}
},
"mytype" : {
"properties" : {
"level1" : {
"properties" : {
"level2" : {
"type" : "long"
}
}
}
}
}
}
}
'
I try to register my percolator query again:
curl -XPUT http://localhost:9200/myindex/.percolator/myquery?pretty -d '{
"query": {
"filtered": {
"filter": {
"range": {
"level1.level2": {
"gt": 10
}
}
}
}
}
}'
But now I get an error:
{
"error" : {
"root_cause" : [ {
"type" : "mapper_parsing_exception",
"reason" : "Field name [level1.level2] cannot contain '.'"
} ],
"type" : "mapper_parsing_exception",
"reason" : "Field name [level1.level2] cannot contain '.'"
},
"status" : 400
}
How can I create and index a percolator query matching an inner property?

How to get Elasticsearch boolean match working for multiple fields

I need some expert guidance on trying to get a bool match working. I'd like the query to only return a successful search result if both 'message' matches 'Failed password for', and 'path' matches '/var/log/secure'.
This is my query:
curl -s -XGET 'http://localhost:9200/logstash-2015.05.07/syslog/_search?pretty=true' -d '{
"filter" : { "range" : { "#timestamp" : { "gte" : "now-1h" } } },
"query" : {
"bool" : {
"must" : [
{ "match_phrase" : { "message" : "Failed password for" } },
{ "match_phrase" : { "path" : "/var/log/secure" } }
]
}
}
} '
Here is the start of the output from the search:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 46,
"max_score" : 13.308596,
"hits" : [ {
"_index" : "logstash-2015.05.07",
"_type" : "syslog",
"_id" : "AU0wzLEqqCKq_IPSp_8k",
"_score" : 13.308596,
"_source":{"message":"May 7 16:53:50 s_local#logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","#version":"1","#timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
}, ...
The problem is that if I change '/var/log/secure' to just 'var', say, and run the query, I still get a result, just with a lower score. I understood the bool/must construct to mean that both match clauses would need to succeed. What I'm after is no result unless 'path' exactly matches '/var/log/secure'...
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 46,
"max_score" : 10.354593,
"hits" : [ {
"_index" : "logstash-2015.05.07",
"_type" : "syslog",
"_id" : "AU0wzLEqqCKq_IPSp_8k",
"_score" : 10.354593,
"_source":{"message":"May 7 16:53:50 s_local#logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","#version":"1","#timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
},...
I checked the mappings to verify that these fields are not analyzed:
curl -X GET 'http://localhost:9200/logstash-2015.05.07/_mapping?pretty=true'
I think these fields are not analyzed, and so I believe the search terms will not be analyzed either (based on some training documentation I recently read from Elasticsearch). Here is a snippet of the _mapping output for this index:
....
"message" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
},
"path" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
},
....
Where am I going wrong, or what am I misunderstanding here?
As mentioned in the OP, you would need to use the not_analyzed view of the fields, but per the OP's mapping the non-analyzed versions of the fields are message.raw and path.raw.
Example:
{
"filter" : { "range" : { "#timestamp" : { "gte" : "now-1h" } } },
"query" : {
"bool" : {
"must" : [
{ "match_phrase" : { "message.raw" : "Failed password for" } },
{ "match_phrase" : { "path.raw" : "/var/log/secure" } }
]
}
}
}
The link alongside gives more insight into multi-fields.
To expand further:
The mapping in the OP for path is as follows:
"path" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
}
This specifies that the path field uses the default analyzer and path.raw is not analyzed.
If you instead want the path field itself to be not analyzed, it would be something along these lines:
"path" : {
"type" : "string",
"index" : "not_analyzed",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : <whatever analyzer you want>,
"ignore_above" : 256
}
}
}
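One variant worth sketching (assuming the message.raw and path.raw sub-fields shown above): since message.raw stores the entire log line as a single untouched value, a phrase such as "Failed password for" can only match the analyzed message field, while the exact path is best filtered with a term query on path.raw:
curl -s -XGET 'http://localhost:9200/logstash-2015.05.07/syslog/_search?pretty=true' -d '{
"filter" : { "range" : { "@timestamp" : { "gte" : "now-1h" } } },
"query" : {
"bool" : {
"must" : [
{ "match_phrase" : { "message" : "Failed password for" } },
{ "term" : { "path.raw" : "/var/log/secure" } }
]
}
}
}'
This returns no hits unless path is exactly /var/log/secure.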

elasticsearch - complex querying

I am using the jdbc river and I can create the following index:
curl -XPUT 'localhost:9201/_river/email/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"strategy":"simple",
"poll":"10",
"driver" : "org.postgresql.Driver",
"url" : "jdbc:postgresql://localhost:5432/api_development",
"username" : "paulcowan",
"password" : "",
"sql" : "SELECT id, subject, body, personal, sent_at, read_by, account_id, sender_user_id, sender_contact_id, html, folder, draft FROM emails"
},
"index" : {
"index" : "email",
"type" : "jdbc"
},
"mappings" : {
"email" : {
"properties" : {
"account_id" : { "type" : "integer" },
"subject" : { "type" : "string" },
"body" : { "type" : "string" },
"html" : { "type" : "string" },
"folder" : { "type" : "string", "index" : "not_analyzed" },
"id" : { "type" : "integer" }
}
}
}
}'
I can run basic queries using curl like this:
curl -XGET 'http://localhost:9201/email/jdbc/_search?pretty&q=fullcontact'
I get results back.
But I want to restrict the results to a particular email account_id and folder, so I run the following query:
curl -XGET 'http://localhost:9201/email/jdbc/_search' -d '{
"query": {
"filtered": {
"filter": {
"and": [
{
"term": {
"folder": "INBOX"
}
},
{
"term": {
"account_id": 1
}
}
]
},
"query": {
"query_string": {
"query": "fullcontact*"
}
}
}
}
}'
I get the following results:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Can anyone tell me what is wrong with my query?
It turns out that in the jdbc river you need to use the type_mapping section to specify that a field is not_analyzed; the normal mappings node is ignored.
Below is how it turned out:
curl -XPUT 'localhost:9200/_river/your_index/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"strategy":"simple",
"poll":"10",
"driver" : "org.postgresql.Driver",
"url" : "jdbc:postgresql://localhost:5432/api_development",
"username" : "user",
"password" : "your_password",
"sql" : "SELECT field_one, field_two, field_three, the_rest FROM blah"
},
"index" : {
"index" : "your_index",
"type" : "jdbc",
"type_mapping": "{\"your_index\" : {\"properties\" : {\"field_two\":{\"type\":\"string\",\"index\":\"not_analyzed\"}}}}"
}
}'
Strangely or annoyingly, the type_mapping section takes a JSON-encoded string and not a normal JSON node.
I can check the mappings by running:
# check mappings
curl -XGET 'http://localhost:9200/your_index/jdbc/_mapping?pretty=true'
Which should give something like:
{
"jdbc" : {
"properties" : {
"field_one" : {
"type" : "long"
},
"field_two" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"index_options" : "docs"
},
"field_three" : {
"type" : "string"
}
}
}
}
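With field_two no longer analyzed, the original style of filtered query should now match exact values. A minimal sketch reusing the placeholder names above (the value "INBOX" is illustrative):
curl -XGET 'http://localhost:9200/your_index/jdbc/_search' -d '{
"query" : {
"filtered" : {
"filter" : {
"term" : { "field_two" : "INBOX" }
},
"query" : {
"query_string" : { "query" : "fullcontact*" }
}
}
}
}'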
