Elasticsearch term facet not showing negative terms - elasticsearch

I am using Elasticsearch term facets. My field contains some negative values, but the facet is ignoring the negative sign.
The facet query is:
http://myserver.com:9200/index/type/_search
GET/POST body:
{
  "facets" : {
    "school.id" : {
      "terms" : {
        "field" : "school.id",
        "size" : 10
      }
    }
  }
}
Response
{
  "took": 281,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "facets": {
    "school.id": {
      "_type": "terms",
      "missing": 302,
      "total": 4390,
      "other": 0,
      "terms": [
        {
          "term": "1113515007867355135",
          "count": 4390
        }
      ]
    }
  }
}
The actual value of the id is -1113515007867355135. Am I doing something wrong, or do I need to pass something to include the negative sign (is this a stemming issue)?

The negative sign is a special character in Lucene (and Elasticsearch), so you need to escape it while indexing and searching.
Try adding a \ before the - character in your index; that should bring it up in the facet as well.

I got the answer from the Elasticsearch Google Group: the mapping of the field needs to be updated.
Possible solution: update the mapping and use
"index" : "analyzed", "analyzer" : "keyword"
or
"index" : "not_analyzed"
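A sketch of what the updated mapping could look like (assuming school is an object field whose id was indexed as an analyzed string; pre-1.x mapping syntax as in the question). Note that changing the index setting of an existing field requires reindexing the documents:

```
PUT index/type/_mapping
{
  "type" : {
    "properties" : {
      "school" : {
        "properties" : {
          "id" : { "type" : "string", "index" : "not_analyzed" }
        }
      }
    }
  }
}
```

With the keyword analyzer or not_analyzed, the whole value (including the leading -) is kept as a single term instead of being split by the standard analyzer.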

Related

Issue with date histogram aggregation with filter

I am trying to get a date histogram for a timestamp field over a specific period. I am using the following query:
{
  "aggs": {
    "dataRange": {
      "filter": {
        "range": { "#timestamp": { "gte": "2020-02-28T17:20:10Z", "lte": "2020-03-01T18:00:00Z" } }
      },
      "aggs": {
        "severity_over_time": {
          "date_histogram": { "field": "#timestamp", "interval": "28m" }
        }
      }
    }
  },
  "size": 0
}
This is the result I got:
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 32,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "dataRange": {
      "doc_count": 20,
      "severity_over_time": {
        "buckets": [
          {
            "key_as_string": "2020-02-28T17:04:00.000Z",
            "key": 1582909440000,
            "doc_count": 20
          }
        ]
      }
    }
  }
}
The start of the histogram range ("key_as_string") falls outside my filter criteria! My filter starts at "2020-02-28T17:20:10Z", but the key_as_string in the result is "2020-02-28T17:04:00.000Z", which is outside the range filter!
I tried looking at the docs, but to no avail. Am I missing something here?
I guess this has to do with the way buckets are calculated. My understanding is that the 28m interval has to be maintained throughout, i.e. the bucket size must be consistent.
Notice that the 28m difference between bucket boundaries is maintained exactly; the first and last buckets are, in a way, stretched just to accommodate this 28m interval.
Notice also that, logically, your result documents all land in the right buckets, and documents outside the filter range are never considered by the aggregation, regardless of whether the key_as_string appears to cover them.
Basically, ES doesn't guarantee that the bucket boundaries (i.e. key_as_string, the start and end values of the buckets created) fall exactly within the scope of the filter you've provided, but it does guarantee that only the documents matched by that range filter are considered for evaluation.
You can think of the bucket keys as the nearest possible boundary values, i.e. approximations.
If you want to be sure about the filtered documents, remove the filter from the aggregation, use it in the query as below, and remove "size": 0.
Notice I've made use of offset, which shifts the start value of each bucket. Perhaps that is what you are looking for.
One more thing: I've made use of min_doc_count so that you can filter out empty buckets.
POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "#timestamp": {
              "gte": "2020-02-28T17:20:10Z",
              "lte": "2020-03-01T18:00:01Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "severity_over_time": {
      "date_histogram": {
        "field": "#timestamp",
        "interval": "28m",
        "offset": "+11h",
        "min_doc_count": 1
      }
    }
  }
}
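The rounding behaviour described above can be sketched outside Elasticsearch: each timestamp is floored to a multiple of the interval counted from the Unix epoch (shifted by the optional offset), which is why a bucket key can precede the filter's gte bound. A minimal illustration in Python (the bucket_key function is a hypothetical stand-in, not an Elasticsearch API):

```python
from datetime import datetime, timezone

def bucket_key(ts_ms, interval_ms, offset_ms=0):
    # Round the timestamp down to the nearest interval boundary,
    # measured from the Unix epoch plus the optional offset.
    return ((ts_ms - offset_ms) // interval_ms) * interval_ms + offset_ms

interval_ms = 28 * 60 * 1000  # "28m" as milliseconds

# A document timestamped at the filter's lower bound, 2020-02-28T17:20:10Z.
doc_ms = int(datetime(2020, 2, 28, 17, 20, 10,
                      tzinfo=timezone.utc).timestamp() * 1000)

key = bucket_key(doc_ms, interval_ms)
print(datetime.fromtimestamp(key / 1000, tz=timezone.utc).isoformat())
# -> 2020-02-28T17:04:00+00:00, the "early" bucket key from the response
```

Passing a non-zero offset_ms shifts every boundary by that amount, which is what the "offset": "+11h" parameter in the query above does.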

elasticsearch, multi_match and filter by date

It seems I followed every similar answer I found, but I just can't figure out what is wrong...
This is a "match all" query:
{
  "query": {
    "match_all": {}
  }
}
...and the results:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "unittest_index_unittestdocument",
        "_type": "unittestdocument",
        "_id": "a.b",
        "_score": 1,
        "_source": {
          "id": "a.b",
          "docdate": "2018-01-24T09:45:44.4168345+02:00",
          "primarykeywords": [
            "keyword"
          ],
          "primarytitles": [
            "the title of a document"
          ]
        }
      }
    ]
  }
}
But when I try to filter that with a date like this:
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "type": "most_fields",
          "query": "document",
          "fields": [ "primarytitles", "primarykeywords" ]
        }
      },
      "filter": [
        { "range": { "docdate": { "gte": "1900-01-23T15:17:12.7313261+02:00" } } }
      ]
    }
  }
}
I get zero hits...
I tried to follow this https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html and this filtering by date in elasticsearch, with no success at all.
Is there any difference that I cannot see?
Please note that when I remove the date filter and add a term filter on "primarykeywords", I get the results I want. The only problem is the range filter.
Apparently there was no error in my query; the problem was that the docdate field wasn't indexed... :/
I don't know why I initially skipped indexing that field (my mistake), but I do believe Elasticsearch should warn me that I am trying to query something that has "index": false.
The fact that Elasticsearch just returns no results without indicating what is going on is, in my opinion, a major issue. I lost a day reading everything I could find on the web, just because I didn't have proper feedback from the engine.
Fail-safe died for this reason...
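For reference, a sketch of the corrected mapping (the field name is taken from the question; the rest is assumed). Since the index option cannot be changed on an existing field, the index has to be re-created, or the data reindexed, with something like:

```
PUT unittest_index_unittestdocument
{
  "mappings": {
    "unittestdocument": {
      "properties": {
        "docdate": { "type": "date", "index": true }
      }
    }
  }
}
```

With docdate indexed, the range filter in the query above can match documents again.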

Elasticsearch - More Like This query returning empty result

I'm using Elasticsearch to create a sort of tag engine. I'm inserting a document, and I can't retrieve it. My steps to reproduce the issue:
1) Create index:
PUT index
{
  "mappings": {
    "taggeable" : {
      "_all" : { "enabled" : false },
      "properties" : {
        "id" : {
          "type" : "string",
          "index" : "no"
        },
        "tags" : {
          "type" : "text"
        }
      }
    }
  }
}
2) Insert document:
POST index/taggeable
{
  "id" : "1",
  "tags" : "tag1 tag2"
}
3) Query using More like This:
GET index/_search
{
  "query": {
    "more_like_this" : {
      "fields" : ["tags"],
      "like" : ["tag1"],
      "min_term_freq" : 1
    }
  }
}
But I'm receiving:
{
  "_shards": {
    "failed": 0,
    "skipped": 0,
    "successful": 5,
    "total": 5
  },
  "hits": {
    "hits": [],
    "max_score": null,
    "total": 0
  },
  "timed_out": false,
  "took": 1
}
Does anyone know what I'm doing wrong? I should be retrieving the document I inserted.
You set the parameter
min_term_freq
The minimum term frequency below which the terms will be ignored from
the input document. Defaults to 2.
which is good, since it would otherwise default to 2. There is also the parameter
min_doc_freq
The minimum document frequency below which the terms will be ignored
from the input document. Defaults to 5.
In your case, if you have just 1 document, the terms will be ignored, so you either need to add more docs or set min_doc_freq to 1.
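Applied to the query from the question, the adjustment would be just one extra parameter (a sketch):

```
GET index/_search
{
  "query": {
    "more_like_this" : {
      "fields" : ["tags"],
      "like" : ["tag1"],
      "min_term_freq" : 1,
      "min_doc_freq" : 1
    }
  }
}
```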

Using Kibana4 Tile Map with Geo-Points

I'm using Logstash to insert a location attribute into my logs that are going into Elasticsearch:
"location" : {
  "lat" : 31.07,
  "lon" : -82.09
}
I'm then setting up a mapping to tell ElasticSearch it's a Geo-Point. I'm not exactly sure how this call should look. This is what I've been using so far:
PUT logstash-*/_mapping/latlon
{
  "latlon" : {
    "properties" : {
      "location" : {
        "type" : "geo_point",
        "lat_lon" : true,
        "geohash" : true
      }
    }
  }
}
When I query for matching records in Kibana 4, the location field is annotated with a small globe. So far, so good.
When I move to the Tile Map visualization, bring up matching records, bucket by Geo Coordinates, select Geohash from the 'Aggregation' drop down, and then select location from the Field drop down, and press 'Apply', no results are returned.
The aggregations part of the request looks like this:
"aggs": {
  "2": {
    "geohash_grid": {
      "field": "location",
      "precision": 3
    }
  }
}
And the response:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "hits": {
    "total": 689,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "2": {
      "buckets": []
    }
  }
}
For some reason, Elasticsearch isn't returning any results, even though the geo_point mapping seems to be recognized. Any tips on how to troubleshoot from here?

Elasticsearch highlighting - not working

I have the following problem:
I'm doing some tests with faceting.
My script is as follows:
https://gist.github.com/nayelisantacruz/6610862
The result I get is as follows:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": []
  },
  "facets": {
    "title": {
      "_type": "terms",
      "missing": 0,
      "total": 2,
      "other": 0,
      "terms": [
        {
          "term": "JavaScript",
          "count": 1
        },
        {
          "term": "Java Platform, Standard Edition",
          "count": 1
        }
      ]
    }
  }
}
which is fine, but the problem is that I cannot get the highlighting to show up.
I was expecting a result like the following:
..........
..........
..........
"facets": {
  "title": {
    "_type": "terms",
    "missing": 0,
    "total": 2,
    "other": 0,
    "terms": [
      {
        "term": "<b>Java</b>Script",
        "count": 1
      },
      {
        "term": "<b>Java</b> Platform, Standard Edition",
        "count": 1
      }
    ]
  }
}
..........
..........
..........
Can anyone help me and tell me what I'm doing wrong or what I'm missing, please?
Thank you very much for your attention.
Faceting and highlighting are two completely different things. Highlighting works together with search, in order to return highlighted snippets for each of the search results.
Faceting is a completely different story, as a facet effectively looks at all the terms that have been indexed for a specific field, throughout all the documents that match the main query. In that respect, the query only controls the documents that are going to be taken into account to perform faceting. Only the top terms (by default with higher count) are going to be returned. Those terms are not only related to the search results (by default 10) but to all the documents that match the query.
That said, the terms returned with the facets are never highlighted.
If you use highlighting, you should see in your response, as mentioned in the reference, a new section that contains the highlighted snippets for each of your search results. The reason why you don't see it is that you are querying the title.autocomplete field, but you highlight the title field with require_field_match enabled. You either have to set require_field_match to false or highlight the same field that you are querying on. But again, this is not related to faceting whatsoever.
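For example, highlighting the same field that the query targets might look like this (a sketch; the field name title.autocomplete is taken from the gist referenced above):

```
"highlight": {
  "fields": {
    "title.autocomplete": {}
  }
}
```

Alternatively, keeping the highlight on title and adding "require_field_match": false to the highlight section would also return snippets.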
Note the use of * instead of _all. This works like a charm at all levels of nesting:
POST 123821/Encounters/_search
{
  "query": {
    "query_string": {
      "query": "Aller*"
    }
  },
  "highlight": {
    "fields": {
      "*": {}
    }
  }
}
