Not able to get any results when using bucket aggregations - Elasticsearch

I have some PR data in my ES index. This is how the documents are modelled:
{
"Stats" : [
{
"Author" : "dheerajrav",
"Date" : "2012-10-05T10:16:49Z",
"Number" : 2554441,
"IsMerged" : false,
"MergedBy" : "",
"Body" : ""
},
{
"Author" : "dheerajrav",
"Date" : "2012-10-05T09:11:35Z",
"Number" : 2553883,
"IsMerged" : false,
"MergedBy" : "",
"Body" : ""
},
{
"Author" : "crodjer",
"Date" : "2012-10-04T15:40:22Z",
"Number" : 2544540,
"IsMerged" : false,
"MergedBy" : "",
"Body" : ""
},
{
"Author" : "crodjer",
"Date" : "2012-10-04T07:52:20Z",
"Number" : 2539410,
"IsMerged" : false,
"MergedBy" : "",
"Body" : ""
}
.
.
.
]
}
I am running the following terms aggregation on my index, but I get no results:
curl -X GET "localhost:9200/newidx/_search?pretty" -H 'Content-Type: application/json' -d'
{
"aggs" : {
"contributors" : {
"terms" : {
"field" : "Author",
"size" : 100
}
}
}
}
'
The desired result would have been a separate bucket for each PR author. This is the response:
"aggregations" : {
"contributors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
Am I modeling my data wrong?
This is the mapping for my index:
{
"newidx" : {
"mappings" : {
"properties" : {
"Stats" : {
"properties" : {
"Author" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"Body" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"Date" : {
"type" : "date"
},
"IsMerged" : {
"type" : "boolean"
},
"MergedBy" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"Number" : {
"type" : "long"
}
}
}
}
}
}
}
I generate a JSON file in my code and index it into Elasticsearch using elasticsearch_loader. Here is the command:
elasticsearch_loader --es-host 'localhost' --index org-skills --type incident json --lines processed.json

Based on your mapping:
The Author field is declared both as text (used for full-text search) and as keyword (used for matching whole values).
Read up on the difference between text and keyword.
The parent object in the mapping is Stats.
You should therefore use Stats.Author.keyword in your aggregation query, i.e.:
curl -X GET "localhost:9200/newidx/_search?pretty" -H 'Content-Type: application/json' -d'
{
"aggs" : {
"contributors" : {
"terms" : {
"field" : "Stats.Author.keyword",
"size" : 100
}
}
}
}
'
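As a rough local illustration (plain Python, standard library only; nothing here talks to Elasticsearch), the sketch below mimics a terms aggregation by counting exact values at a dotted path, most frequent first. It assumes each document nests the PR fields under Stats, as the mapping implies, and it breaks ties by key, which is this sketch's own choice. A path that does not exist in the documents produces no buckets, which is consistent with the empty result for "Author" in the question. get_path and simulate_terms_agg are hypothetical helpers.

```python
from collections import Counter

# The four sample PRs from the question, reduced to the aggregated field and
# nested under "Stats" (an assumption based on the mapping).
docs = [
    {"Stats": {"Author": "dheerajrav"}},
    {"Stats": {"Author": "dheerajrav"}},
    {"Stats": {"Author": "crodjer"}},
    {"Stats": {"Author": "crodjer"}},
]


def get_path(doc, dotted):
    """Follow a dotted path such as 'Stats.Author'; return None if any part is missing."""
    node = doc
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def simulate_terms_agg(docs, path, size=100):
    """Count exact values at `path`, highest count first, ties broken by key."""
    values = (get_path(d, path) for d in docs)
    counts = Counter(v for v in values if v is not None)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return [{"key": k, "doc_count": n} for k, n in ordered]


print(simulate_terms_agg(docs, "Author"))  # [] (no top-level Author field)
print(simulate_terms_agg(docs, "Stats.Author"))
# [{'key': 'crodjer', 'doc_count': 2}, {'key': 'dheerajrav', 'doc_count': 2}]
```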

It needs to be:
curl -X GET "localhost:9200/newidx/_search?pretty" -H 'Content-Type: application/json' -d'
{
"aggs" : {
"contributors" : {
"terms" : {
"field" : "Stats.Author.keyword",
"size" : 100
}
}
}
}
'
Your field Stats.Author is of type text. Text fields cannot be used in aggregations by default (they have no doc values, and fielddata is disabled), so for aggregations you need to use its keyword sub-field, Stats.Author.keyword.
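Which fields can back a terms aggregation can be read off the mapping. The sketch below (plain Python; field_paths and aggregatable_fields are hypothetical helpers, not Elasticsearch APIs) walks an abbreviated copy of the question's mapping, expands object children and multi-fields into dotted paths, and keeps only types that have doc values by default. text is excluded because aggregating on it needs fielddata, which is disabled by default. The type set is an illustrative subset, not a complete list.

```python
# Abbreviated copy of the "newidx" mapping from the question (Body and MergedBy omitted).
MAPPING = {
    "properties": {
        "Stats": {
            "properties": {
                "Author": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "Date": {"type": "date"},
                "IsMerged": {"type": "boolean"},
                "Number": {"type": "long"},
            }
        }
    }
}

# Illustrative subset of field types that have doc values by default.
# "text" is deliberately absent: it needs fielddata, which is disabled by default.
AGGREGATABLE = {"keyword", "long", "integer", "double", "date", "boolean"}


def field_paths(properties, prefix=""):
    """Yield (dotted_path, type) for every mapped field, including multi-fields."""
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:  # object field: recurse into its children
            yield from field_paths(spec["properties"], path + ".")
        if "type" in spec:
            yield path, spec["type"]
        for sub, sub_spec in spec.get("fields", {}).items():  # e.g. Author.keyword
            yield f"{path}.{sub}", sub_spec["type"]


def aggregatable_fields(mapping):
    """Sorted dotted paths whose type is in AGGREGATABLE."""
    return sorted(p for p, t in field_paths(mapping["properties"]) if t in AGGREGATABLE)


print(aggregatable_fields(MAPPING))
# ['Stats.Author.keyword', 'Stats.Date', 'Stats.IsMerged', 'Stats.Number']
```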

Elasticsearch Suggestions Multi Index and Multi Fields

I have different indexes that contain different fields, and I am trying to figure out how to get suggestions from all indexes and all fields. I know that with GET /_all/_search I can search for results across all indexes. But how can I get suggestions from all indexes and all fields? I want a feature like Google's "Did you mean: ..." suggestions.
So, I tried this out:
GET /_all/_search
{
"query" : {
"multi_match" : {
"query" : "berlin"
}
},
"suggest" : {
"text" : "berlin",
"my-suggest-1" : {
"term" : {
"field" : "street"
}
},
"my-suggest-2" : {
"term" : {
"field" : "city"
}
},
"my-suggest-3" : {
"term" : {
"field" : "description"
}
}
}
}
"my-suggest-1" and "my-suggest-2" belong to the address index (see below) and "my-suggest-3" belongs to the product index. I get the following error:
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [street]"
},
{
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [city]"
},
{
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [description]"
}
]
}
But if I use only the fields of one index, I get suggestions:
GET /_all/_search
{
"query" : {
"multi_match" : {
"query" : "berlin"
}
},
"suggest" : {
"text" : "berlin",
"my-suggest-1" : {
"term" : {
"field" : "street"
}
},
"my-suggest-2" : {
"term" : {
"field" : "city"
}
}
}
}
Response
...
"failures" : {
...
},
"hits" : {
...
},
"suggest" : {
"my-suggest-1" : [
{
"text" : "berlin",
"offset" : 0,
"length" : 10,
"options" : [
{
"text" : "berliner",
"score" : 0.9,
"freq" : 12
},
{
"text" : "berlinger",
"score" : 0.9,
"freq" : 1
}
]
}
],
"my-suggest-2" : [
{
"text" : "berlin",
"offset" : 0,
"length" : 10,
"options" : []
}
]
...
How can I get suggestions from both the address and product indexes? I would be happy if someone could help me.
Index 1 - Address:
"address" : {
"aliases" : {
....
},
"mappings" : {
"dynamic" : "strict",
"properties" : {
"_entity_type" : {
"type" : "keyword",
"index" : false
},
"street" : {
"type" : "text"
},
"city" : {
"type" : "text"
}
}
},
"settings" : {
...
}
}
Index 2 - Product:
"product" : {
"aliases" : {
...
},
"mappings" : {
"dynamic" : "strict",
"properties" : {
"_entity_type" : {
"type" : "keyword",
"index" : false
},
"description" : {
"type" : "text"
}
}
},
"settings" : {
...
}
}
You can add multiple indices to your search. In that case, you need to search over fields that exist in all of the indices, so in your case you need to define all three fields in both indices. The fields "street" and "city" are filled in the first index, and the field "description" is filled only in the second index. This will be your mapping for the "Address" index. In this index, the "description" field exists but has no data. In the second index, "street" and "city" exist but have no data.
"address" : {
"aliases" : {
....
},
"mappings" : {
"dynamic" : "strict",
"properties" : {
"_entity_type" : {
"type" : "keyword",
"index" : false
},
"street" : {
"type" : "text"
},
"city" : {
"type" : "text"
},
"description" : {
"type" : "text"
}
}
},
"settings" : {
...
}
}
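The rule in this answer (every field used by a term suggester must be mapped in every searched index) can be expressed as a small set computation. The sketch below uses the field names from the two mappings in the question; missing_fields and suggest_body are hypothetical helpers that only build data, and sending the body to GET /_all/_search is left to curl or an HTTP client.

```python
import json

# Text fields per index, taken from the two mappings in the question.
INDEX_FIELDS = {
    "address": {"street", "city"},
    "product": {"description"},
}


def missing_fields(index_fields):
    """For each index, the (sorted) fields it must also map so all suggester fields exist everywhere."""
    all_fields = set().union(*index_fields.values())
    return {index: sorted(all_fields - fields) for index, fields in index_fields.items()}


def suggest_body(text, fields):
    """One term suggester per field, named my-suggest-1, my-suggest-2, ... as in the question."""
    body = {"suggest": {"text": text}}
    for i, field in enumerate(fields, start=1):
        body["suggest"][f"my-suggest-{i}"] = {"term": {"field": field}}
    return body


print(missing_fields(INDEX_FIELDS))  # {'address': ['description'], 'product': ['city', 'street']}
print(json.dumps(suggest_body("berlin", ["street", "city", "description"]), indent=2))
```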

How to sort an Elasticsearch query result by a specific field in descending order?

Let's say I have the following query:
curl -XGET 'localhost:9200/library/document/_search?pretty=true'
That returns the following example results:
{
"took" : 108,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [
{
"_index" : "library",
"_type" : "document",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"page content" : [
"Page 0:",
"Page 1: something"
],
"publish date" : "2015-12-05",
"keywords" : "sample, example, article, alzheimer",
"author" : "Author name",
"language" : "",
"title" : "Sample article",
"number of pages" : 2
}
},
{
"_index" : "library",
"_type" : "document",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"page content" : [
"Page 1: eBay",
"Page 2: Paypal",
"Page 3: Google"
],
"publish date" : "2017-08-03",
"keywords" : "something, another, thing",
"author" : "Alex",
"language" : "english",
"title" : "Microsoft Word - TL0032.doc",
"number of pages" : 21
}
},
...
I want to order by publish date and by id (in separate queries) so that the most recent one shows first in the list. Is this possible? I know I have to use Elasticsearch's sort together with the desc order parameter, but somehow it is not working for me.
EDIT: mapping of the fields:
curl -XGET 'localhost:9200/library/_mapping/document?pretty'
{
"library" : {
"mappings" : {
"document" : {
"properties" : {
"author" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"keywords" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"language" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"number of pages" : {
"type" : "long"
},
"page content" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"publish date" : {
"type" : "date"
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
First you need a proper mapping, like this:
PUT my_index
{
"mappings": {
"documents": {
"properties": {
"post_date" : {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
}
}
}
And then the search:
GET my_index/_search
{
"sort": [
{
"post_date": {
"order": "desc"
}
}
]
}
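For reference, the same kind of request body can be built and sanity-checked locally. The sketch below (plain Python; sort_desc_body is a hypothetical helper) produces a match_all plus descending sort body for the "publish date" field from the question's mapping, then sorts the two sample hits from the question the same way to show the expected order. Comparing the dates as strings only works because they are zero-padded yyyy-MM-dd values.

```python
import json


def sort_desc_body(field):
    """match_all query plus a single descending sort clause on `field`."""
    return {"query": {"match_all": {}}, "sort": [{field: {"order": "desc"}}]}


# The two sample hits from the question, reduced to _id and publish date.
hits = [
    {"_id": "5", "publish date": "2015-12-05"},
    {"_id": "2", "publish date": "2017-08-03"},
]

# Zero-padded yyyy-MM-dd strings sort chronologically, so a plain string sort works here.
newest_first = sorted(hits, key=lambda h: h["publish date"], reverse=True)

print(json.dumps(sort_desc_body("publish date")))
print([h["_id"] for h in newest_first])  # ['2', '5']
```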
Thank you, everyone. I managed to get it working with this query:
curl -XGET 'localhost:9200/library/document/_search?pretty=true' -H 'Content-Type: application/json' -d '{"query": {"match_all": {}},"sort": [{"publish date": {"order": "desc"}}]}'
I didn't need any additional mapping.
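A likely reason no extra mapping was needed: the existing mapping above already declares "publish date" as a date, and without an explicit format Elasticsearch's default date format accepts plain dates such as "2015-12-05". The custom format in the earlier answer, "yyyy-MM-dd HH:mm:ss", would require a time part. The sketch below checks this with Python's strptime; translating the Java-style patterns into strptime directives by hand is an assumption of the sketch, not something Elasticsearch does.

```python
from datetime import datetime

# Hand-translated strptime equivalents of the date patterns involved.
ANSWER_FORMAT = "%Y-%m-%d %H:%M:%S"  # "yyyy-MM-dd HH:mm:ss" from the answer's mapping
DATE_ONLY = "%Y-%m-%d"               # the shape of "publish date" values in the question


def parses(value, fmt):
    """True if `value` matches `fmt` exactly."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


print(parses("2015-12-05", ANSWER_FORMAT))           # False
print(parses("2015-12-05 10:30:00", ANSWER_FORMAT))  # True
print(parses("2015-12-05", DATE_ONLY))               # True
```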

How to index percolator queries containing filters on inner objects?

Using Elasticsearch 2.1.1
I have documents with inner objects:
{
"level1": {
"level2": 42
}
}
I want to register percolator queries applying filters on the inner property:
$ curl -XPUT http://localhost:9200/myindex/.percolator/myquery?pretty -d '{
"query": {
"filtered": {
"filter": {
"range": {
"level1.level2": {
"gt": 10
}
}
}
}
}
}'
It fails because I don't have a mapping:
{
"error" : {
"root_cause" : [ {
"type" : "query_parsing_exception",
"reason" : "Strict field resolution and no field mapping can be found for the field with name [level1.level2]",
"index" : "myindex",
"line" : 1,
"col" : 58
} ],
"type" : "percolator_exception",
"reason" : "failed to parse query [myquery]",
"index" : "myindex",
"caused_by" : {
"type" : "query_parsing_exception",
"reason" : "Strict field resolution and no field mapping can be found for the field with name [level1.level2]",
"index" : "myindex",
"line" : 1,
"col" : 58
}
},
"status" : 500
}
So I start again, but this time I add a mapping template first:
curl -XDELETE http://localhost:9200/_template/myindex
curl -XDELETE http://localhost:9200/myindex
curl -XPUT http://localhost:9200/_template/myindex?pretty -d '
{
"template": "myindex",
"mappings" : {
"mytype" : {
"properties" : {
"level1" : {
"properties" : {
"level2" : {
"type" : "long"
}
}
}
}
}
}
}
'
I try to register my percolator query again:
curl -XPUT http://localhost:9200/myindex/.percolator/myquery?pretty -d '{
"query": {
"filtered": {
"filter": {
"range": {
"level1.level2": {
"gt": 10
}
}
}
}
}
}'
And now it succeeds:
{
"_index" : "myindex",
"_type" : ".percolator",
"_id" : "myquery",
"_version" : 1,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"created" : true
}
And I can see the mapping that has been created:
curl http://localhost:9200/myindex/_mapping?pretty
{
"myindex" : {
"mappings" : {
".percolator" : {
"properties" : {
"query" : {
"type" : "object",
"enabled" : false
}
}
},
"mytype" : {
"properties" : {
"level1" : {
"properties" : {
"level2" : {
"type" : "long"
}
}
}
}
}
}
}
}
Now my problem is that I also need to perform searches on my percolator queries, and the default percolator mapping doesn’t index the query field.
So I start again, this time specifying in my mapping template that I want percolator queries to be indexed (note "enabled": true):
curl -XPUT http://localhost:9200/_template/myindex?pretty -d '
{
"template": "myindex",
"mappings" : {
".percolator" : {
"properties" : {
"query" : {
"type" : "object",
"enabled" : true
}
}
},
"mytype" : {
"properties" : {
"level1" : {
"properties" : {
"level2" : {
"type" : "long"
}
}
}
}
}
}
}
'
I try to register my percolator query again:
curl -XPUT http://localhost:9200/myindex/.percolator/myquery?pretty -d '{
"query": {
"filtered": {
"filter": {
"range": {
"level1.level2": {
"gt": 10
}
}
}
}
}
}'
But now I get an error:
{
"error" : {
"root_cause" : [ {
"type" : "mapper_parsing_exception",
"reason" : "Field name [level1.level2] cannot contain '.'"
} ],
"type" : "mapper_parsing_exception",
"reason" : "Field name [level1.level2] cannot contain '.'"
},
"status" : 400
}
How can I create and index a percolator query matching an inner property?
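A possible reading of the last error, offered as an assumption rather than a fix: with "enabled": true the query object is mapped like ordinary document fields, so every JSON key inside it, including "level1.level2" in the range clause, is treated as a field name, and the 2.x release line used here does not accept dots in field names. The plain-Python helper below (dotted_keys is hypothetical) lists such keys in a query body before you register it.

```python
def dotted_keys(obj, path=""):
    """Return the JSON paths of every object key that contains a '.'."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            here = f"{path}/{key}"
            if "." in key:
                found.append(here)
            found.extend(dotted_keys(value, here))  # keep scanning nested objects
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            found.extend(dotted_keys(item, f"{path}[{i}]"))
    return found


# The percolator query from the question.
percolator_doc = {
    "query": {"filtered": {"filter": {"range": {"level1.level2": {"gt": 10}}}}}
}

print(dotted_keys(percolator_doc))  # ['/query/filtered/filter/range/level1.level2']
```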

How to get Elasticsearch boolean match working for multiple fields

I need some expert guidance on trying to get a bool match working. I'd like the query to only return a successful search result if both 'message' matches 'Failed password for', and 'path' matches '/var/log/secure'.
This is my query:
curl -s -XGET 'http://localhost:9200/logstash-2015.05.07/syslog/_search?pretty=true' -d '{
"filter" : { "range" : { "@timestamp" : { "gte" : "now-1h" } } },
"query" : {
"bool" : {
"must" : [
{ "match_phrase" : { "message" : "Failed password for" } },
{ "match_phrase" : { "path" : "/var/log/secure" } }
]
}
}
} '
Here is the start of the output from the search:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 46,
"max_score" : 13.308596,
"hits" : [ {
"_index" : "logstash-2015.05.07",
"_type" : "syslog",
"_id" : "AU0wzLEqqCKq_IPSp_8k",
"_score" : 13.308596,
"_source":{"message":"May 7 16:53:50 s_local#logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
}, ...
The problem is that if I change '/var/log/secure' to just 'var', say, and run the query, I still get a result, just with a lower score. I understood that the bool...must construct meant both match clauses here would need to succeed. What I'm after is no result if 'path' doesn't exactly match '/var/log/secure'...
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 46,
"max_score" : 10.354593,
"hits" : [ {
"_index" : "logstash-2015.05.07",
"_type" : "syslog",
"_id" : "AU0wzLEqqCKq_IPSp_8k",
"_score" : 10.354593,
"_source":{"message":"May 7 16:53:50 s_local#logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
},...
I checked the mappings for these fields to confirm that they are not analyzed:
curl -X GET 'http://localhost:9200/logstash-2015.05.07/_mapping?pretty=true'
I think these fields are not analyzed, so I believe the search will not be analyzed either (based on some Elasticsearch training documentation I read recently). Here is a snippet of the _mapping output for this index:
....
"message" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
},
"path" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
},
....
Where am I going wrong, or what am I misunderstanding here?
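As mentioned in the OP, you need to use the "not_analyzed" version of the fields, and according to the OP's mapping the non-analyzed versions are message.raw and path.raw. Note that only path needs an exact match: "Failed password for" is a phrase inside a longer log line, and message.raw holds the whole line as a single value, so the message clause should stay on the analyzed field.
Example:
{
"filter" : { "range" : { "@timestamp" : { "gte" : "now-1h" } } },
"query" : {
"bool" : {
"must" : [
{ "match_phrase" : { "message" : "Failed password for" } },
{ "match_phrase" : { "path.raw" : "/var/log/secure" } }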
]
}
}
}
The Elasticsearch documentation on multi-fields gives more insight.
To expand further:
The mapping in the OP for path is as follows:
"path" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
}
This specifies that the path field uses the default analyzer and path.raw is not analyzed.
If you want the path field itself to be not analyzed instead of path.raw, it would be something along these lines:
"path" : {
"type" : "string",
"index" : "not_analyzed",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "analyzed",
"analyzer" : "<whatever analyzer you want>",
"ignore_above" : 256
}
}
}
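To see why a match_phrase for 'var' still hit the analyzed path field while path.raw behaves as an exact match, here is a toy model in plain Python. approx_standard_analyzer is a rough, hypothetical stand-in for the standard analyzer (lowercase, then split on anything that is not an ASCII letter or digit); the real tokenizer follows Unicode word-break rules and differs in edge cases. keyword_analyzer models a not_analyzed field, which keeps the whole value as one token.

```python
import re


def approx_standard_analyzer(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumeric ASCII."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_analyzer(text):
    """not_analyzed: the whole value is a single token."""
    return [text]


def phrase_matches(field_tokens, query_tokens):
    """True if query_tokens occur as a contiguous run inside field_tokens."""
    n = len(query_tokens)
    return n > 0 and any(
        field_tokens[i:i + n] == query_tokens for i in range(len(field_tokens) - n + 1)
    )


def match_phrase(value, query, analyzer):
    """In this toy model the same analyzer is used at index time and at search time."""
    return phrase_matches(analyzer(value), analyzer(query))


path = "/var/log/secure"
print(approx_standard_analyzer(path))                          # ['var', 'log', 'secure']
print(match_phrase(path, "var", approx_standard_analyzer))     # True  (analyzed "path")
print(match_phrase(path, "var", keyword_analyzer))             # False ("path.raw")
print(match_phrase(path, "/var/log/secure", keyword_analyzer)) # True  ("path.raw")
```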

Boolean query does not return expected data in Elasticsearch

I have the following document in Elasticsearch as reported by Kibana:
{"deviceId":"C1976429369BFE063ED8B3409DB7C7E7D87196D9","appId":"DisneyDigitalBooks.PlanesAdventureAlbum","ostype":"iOS"}
Why does the following query not return any results?
[root@myvm elasticsearch-1.0.0]# curl -XGET 'http://localhost:9200/unique_app_install/_search?pretty=1' -d '
{
"query" : {
"bool" : {
"must" : [ {
"term" : {
"deviceId" : "C1976429369BFE063ED8B3409DB7C7E7D87196D9"
}
}, {
"term" : {
"appId" : "DisneyDigitalBooks.PlanesAdventureAlbum"
}
}, {
"term" : {
"ostype" : "iOS"
}
} ]
}
}
}'
Here is the response from Elasticsearch:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
As a side question, is this the fastest way to query the data in my case?
Thx in advance.
UPDATE:
Could it be related to the fact that I used the following mapping for this index?
curl -XPOST localhost:9200/unique_app_install -d '{
"settings" : {
"number_of_shards" : 5
},
"mappings" : {
"sdk_sync" : {
"properties" : {
"deviceId" : { "type" : "string" , "index": "not_analyzed"},
"appId" : { "type" : "string" , "index": "not_analyzed"},
"ostype" : { "type" : "string" , "index": "not_analyzed"}
}
}
}
}'
Check whether the document type you used when inserting was the right one: sdk_sync.
I used your items and it works for me. The following curl requests give the right response:
curl -XPOST localhost:9200/unique_app_install -d '{
"settings" : {
"number_of_shards" : 5
},
"mappings" : {
"sdk_sync" : {
"properties" : {
"deviceId" : { "type" : "string" , "index": "not_analyzed"},
"appId" : { "type" : "string" , "index": "not_analyzed"},
"ostype" : { "type" : "string" , "index": "not_analyzed"}
}
}
}
}'
curl -XPOST localhost:9200/unique_app_install/sdk_sync/1 -d '{
"deviceId":"C1976429369BFE063ED8B3409DB7C7E7D87196D9",
"appId":"DisneyDigitalBooks.PlanesAdventureAlbum",
"ostype":"iOS"
}'
curl -XGET 'http://localhost:9200/unique_app_install/_search?pretty=1' -d '
{
"query" : {
"bool" : {
"must" : [ {
"term" : {
"deviceId" : "C1976429369BFE063ED8B3409DB7C7E7D87196D9"
}
}, {
"term" : {
"appId" : "DisneyDigitalBooks.PlanesAdventureAlbum"
}
}, {
"term" : {
"ostype" : "iOS"
}
} ]
}
}
}'
Unless you specify a field as NOT analyzed, every string field is analyzed by default.
It means that the deviceId "C1976429369BFE063ED8B3409DB7C7E7D87196D9" will be indexed as "c1976429369bfe063ed8b3409db7c7e7d87196d9" (lower case).
You have to use a term query or term filter with the string in LOWER CASE.
That is the reason why you should specify {"index": "not_analyzed"} in the mapping.
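A toy model of this point in plain Python: analysis lowercases the value at index time, while a term query is not analyzed, so an upper-case term cannot equal the stored token. approx_standard_analyzer is a rough, hypothetical stand-in for the standard analyzer, good enough for this hex string but not a faithful reimplementation.

```python
import re


def approx_standard_analyzer(text):
    """Rough stand-in for default string analysis: lowercase, split on non-alphanumeric ASCII."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_query_matches(indexed_tokens, term):
    """A term query is not analyzed: the term must equal an indexed token exactly."""
    return term in indexed_tokens


device_id = "C1976429369BFE063ED8B3409DB7C7E7D87196D9"

analyzed = approx_standard_analyzer(device_id)  # field left with default analysis
not_analyzed = [device_id]                      # "index": "not_analyzed"

print(term_query_matches(analyzed, device_id))          # False
print(term_query_matches(analyzed, device_id.lower()))  # True
print(term_query_matches(not_analyzed, device_id))      # True
```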
