Elasticsearch geospatial queries returning no hits - elasticsearch

I'm using Kibana to look at a geospatial dataset in Elasticsearch for a feature currently under development. There is an index of positions that contains the field "loc.coordinates", which is a geo_point and holds data such as:
loc.coordinates 25.906958000000003, 51.776407000000006
However when I run the following query I get no results:
Query
GET /positions/_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "2000km",
"loc.coordinates" : {
"lat" : 25,
"lon" : 51
}
}
}
}
}
}
Response
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
I'm trying to understand why this is, as there are over 250,000 datapoints in the index, and I'm getting no hits regardless of how big the search area is. When I look in the position index mapping I see the following:
"loc": {
"type": "nested",
"properties": {
"coordinates": {
"type": "geo_point"
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
I'm new to Elasticsearch and have been making my way through the documentation, but so far I don't see why my geo queries aren't working as expected. What am I doing wrong?

Your loc field is of type nested, so you need to query that field accordingly with a nested query:
GET /positions/_search
{
"query": {
"bool" : {
"filter" : {
"nested": {
"path": "loc",
"query": {
"geo_distance" : {
"distance" : "2000km",
"loc.coordinates" : {
"lat" : 25,
"lon" : 51
}
}
}
}
}
}
}
}
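
The wrapping step in the answer above can be sketched in Python as a small helper that builds the request body. This is only an illustration: the function name nested_geo_distance is hypothetical, the nested path is assumed to be everything before the last dot of the field name, and the resulting dict would still have to be sent to Elasticsearch with a client of your choice.

```python
import json


def nested_geo_distance(field, lat, lon, distance):
    """Build a search body that wraps geo_distance in a nested query.

    Hypothetical helper: assumes the nested path is the part of the
    field name before the last dot ("loc.coordinates" -> "loc").
    """
    path = field.rsplit(".", 1)[0]
    geo_filter = {
        "geo_distance": {
            "distance": distance,
            field: {"lat": lat, "lon": lon},
        }
    }
    return {
        "query": {
            "bool": {
                "filter": {
                    "nested": {"path": path, "query": geo_filter}
                }
            }
        }
    }


body = nested_geo_distance("loc.coordinates", 25, 51, "2000km")
print(json.dumps(body, indent=2))
```

The printed JSON has the same shape as the corrected query, so the helper is mostly useful when several nested geo fields need the same treatment.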

How to use value of nested documents in script scoring

Schema looks like this:
"mappings": {
"_doc": {
"_all": {
"enabled": false
},
"properties": {
"category_boost": {
"type": "nested",
"properties" : {
"category": {
"type": "text",
"index": false
},
"boost": {
"type": "integer",
"index": false
}
}
}
}
}
}
The document in elastic does have data:
"category_boost": [
{
"category": "A",
"boost": 98
},
{
"category": "B",
"boost": 96
},
{
"category": "C",
"boost": 94
}
],
Inside scoring function:
for (int i=0; i<doc['category_boost.boost'].size(); ++i) {
if (doc['category_boost.category'][i].value.equals(params.category)) {
boost = doc['category_boost.boost'][i].value;
}
}
I also tried length to get the size of the array, but it did not help. Since the loop does not affect the results, I tried dividing by size() and got a division-by-zero error, so I conclude the size is 0.
Overall problem: I have a category->boost map that is dynamic, so I cannot hardcode it into the schema. I tried the object type with a JSON object, but it turned out you cannot access those objects in scoring functions, so I went with arrays of defined types.
The nested datatype creates sub-documents to represent the items of your collection. Accessing their doc values in a script is possible, but only from inside a nested query.
Here is one way of doing it; I hope it fulfills your requirements. This example only returns the document, with a score that depends on the chosen category.
NB: I used Elasticsearch 7 locally, so you will have to modify the mapping to add your "_doc" entry, etc.
Here is the modified mapping. I removed index: false from the nested properties since we now use them in queries:
PUT test-score_nested
{
"mappings": {
"properties": {
"category_boost": {
"type": "nested",
"properties": {
"category": {
"type": "keyword"
},
"boost": {
"type": "integer"
}
}
}
}
}
}
Then I add your sample data :
POST test-score_nested/_doc
{
"category_boost": [
{
"category": "A",
"boost": 98
},
{
"category": "B",
"boost": 96
},
{
"category": "C",
"boost": 94
}
]
}
And then the query:
We go one level deep into the nested collection.
Inside the collection, we use a function_score query with boost_mode set to replace.
Inside the function score, we use a term query to select the right category and use its boost as the score.
POST test-score_nested/_search
{
"query": {
"nested": {
"path": "category_boost",
"query": {
"function_score": {
"boost_mode": "replace",
"query": {
"term": {
"category_boost.category": {
"value": "A"
}
}
},
"functions": [
{
"field_value_factor": {
"field": "category_boost.boost"
}
}
]
}
}
}
}
}
returns
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 98.0,
"hits" : [
{
"_index" : "test-score_nested",
"_type" : "_doc",
"_id" : "v3Smqm0BZ7nyeX7PPevA",
"_score" : 98.0,
"_source" : {
"category_boost" : [
{
"category" : "A",
"boost" : 98
},
{
"category" : "B",
"boost" : 96
},
{
"category" : "C",
"boost" : 94
}
]
}
}
]
}
}
I hope it will help you!
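
To make the scoring in the query above easier to follow, here is a plain-Python imitation of what it computes for a single document. It is a simplified model and not Elasticsearch code: the function name nested_category_score is made up, and it assumes at most one nested entry per category.

```python
def nested_category_score(doc, category):
    """Return the boost of the nested entry whose category matches.

    Simplified model of the query above: the term query selects the
    nested entry, and boost_mode "replace" makes its boost the score.
    Returns None when no entry matches, i.e. the document is not a hit.
    """
    for entry in doc.get("category_boost", []):
        if entry["category"] == category:
            return float(entry["boost"])
    return None


doc = {
    "category_boost": [
        {"category": "A", "boost": 98},
        {"category": "B", "boost": 96},
        {"category": "C", "boost": 94},
    ]
}

print(nested_category_score(doc, "A"))  # prints 98.0
```

Changing the category to "B" gives 96.0, which is how a different value in the term query would change the score.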

ElasticSearch search for part of url

I'm working with ElasticSearch 5 and can't find a solution for the following:
I want to search for a string with slashes (part of a URL) in a document, but the search won't return matching documents.
I've read that strings with slashes are split by ES, and that's not what I want for this field. I've tried to set "not_analyzed" on the field with a mapping, but I can't get it to work.
"Create index":
PUT http://localhost:9200/test
{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"type1" : {
"properties" : {
"field1" : { "type" : "text","index": "not_analyzed" }
}
}
}
}
"Add document":POST http://localhost:9200/test/type1/
{
"field1" : "this/is/a/url/test"
}
"Search document" POST http://localhost:9200/test/type1/_search
{
"size" : 1000,
"query" : {
"bool" : {
"must" : [{
"term" : {
"field1" : {
"value" : "this/is/a/url/test"
}
}
}
]
}
}
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
"The mapping response": GET http://localhost:9200/test/_mapping?pretty
{
"test": {
"mappings": {
"type1": {
"properties": {
"field1": {
"type": "text"
}
}
}
}
}
}
Using a term query for getting an exact match is correct. However, your initial mapping is wrong.
"type" : "text", "index": "not_analyzed"
should be this instead
"type": "keyword"
(Note: The keyword type in ES5 is equivalent to a not_analyzed string in ES 2.x)
You need to delete your index and re-create it with the corrected mapping. Then your term query will work.
I suspect what you need is a match query, not a term query. A term query looks for a single exact term and does not run your search string through an analyzer.
{
"size" : 1000,
"query" : {
"bool" : {
"must" : [{
"match" : {
"field1" : "this/is/a/url/test"
}
}
]
}
}
}
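
The difference between the two field types can be illustrated with a small Python sketch. It is only a rough model: the regular expression is a stand-in for the standard analyzer (an assumption, the real analyzer follows Unicode text segmentation rules), and all function names are made up for the example.

```python
import re


def analyze_text(value):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def analyze_keyword(value):
    """A keyword field is indexed as a single, unmodified term."""
    return [value]


def term_query_matches(indexed_terms, search_term):
    """A term query matches only if the exact term was indexed."""
    return search_term in indexed_terms


url = "this/is/a/url/test"
print(analyze_text(url))                                   # ['this', 'is', 'a', 'url', 'test']
print(term_query_matches(analyze_text(url), url))          # False
print(term_query_matches(analyze_keyword(url), url))       # True
```

With a text field the full URL is never stored as one term, so the term query has nothing to match. With a keyword field the whole string is the term.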

How to round up double to 2 decimal point - elasticsearch

I have documents in Elasticsearch that look like this:
{
"entityFk": 0,
"entityCode": "ADM",
"entityObj": {
"id": 0,
"code": "ADM",
"description": "ADM - FIRSTCOM"
},
"practiceFk": 54745,
"practiceObj": {
"id": 54745,
"code": "33.04.01.32",
"description": "Artrotomia ou artroscopia com tratamento de lesões articulares circunscritas ",
"practiceValue": 23.5
}
}
I want to sum all "practiceValue" values (not null) whose entityObj.description matches "FIRST", so I wrote this query:
{
"size" : 0,
"query" : {
"bool" : {
"must_not" : [
{
"missing" : { "field" : "practiceObj.practiceValue" }
}
],
"must" : [
{
"match" : { "entityObj.description" : "FIRST" }
}
]
}
},
"aggs" : {
"total" : {
"sum" : { "field" : "practiceObj.practiceValue"}
}
}
}
Here is the result I obtained :
{
"took": 26,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 11477,
"max_score": 0,
"hits": []
},
"aggregations": {
"total": {
"value": 1593598.7499999984
}
}
}
The question is: how can I round the value to 2 decimal places?
Can someone help?
Thanks.
EDIT:
Here's my mapping:
{
"index_practice_entities": {
"mappings": {
"practice_entities_search": {
"properties": {
"entityCode": {
"type": "string"
},
"entityFk": {
"type": "long"
},
"entityObj": {
"properties": {
"code": {
"type": "string"
},
"description": {
"type": "string"
},
"id": {
"type": "long"
}
}
},
"practiceFk": {
"type": "long"
},
"practiceObj": {
"properties": {
"code": {
"type": "string"
},
"description": {
"type": "string"
},
"id": {
"type": "long"
},
"practiceValue": {
"type": "double"
}
}
}
}
}
}
}
}
Please try the script below. Note that the script runs per document, so it rounds each value to 2 decimal places before the values are summed.
"aggs" : {
"total" : {
"sum" : {
"script" : "Math.round(doc['practiceObj.practiceValue'].value*100)/100.0"
}
}
}
I wrote an aggregation that handles the sum of floating-point values more precisely in Elasticsearch:
{
"query":{
/* your filters */
},
"aggs":{
"price":{
"sum":{
"field":"price",
"script":"BigDecimal.valueOf(_value).setScale(4, RoundingMode.HALF_UP)",
"missing":0
}
}
}
}
In this example you can control the scale by changing the .setScale argument to 2.
You can try a script to achieve this:
"aggs" : {
"total" : {
"sum" : { "script" : "(doc['practiceObj.practiceValue'].value).round(2)" // practiceValue should be double or float
}
}
}
Make sure to enable scripting.
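
If scripting is not an option, the total can also be rounded on the client after the response arrives. Here is a minimal Python sketch of that idea, assuming half-up rounding is what you want. The function name round_total is made up for the example, and the input is the value from the response in the question.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_total(value, places=2):
    """Round a float total to `places` decimals using half-up rounding."""
    quantum = Decimal(10) ** -places  # places=2 -> Decimal("0.01")
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


# Value of aggregations.total.value from the response in the question.
print(round_total(1593598.7499999984))  # prints 1593598.75
```

Rounding once at the end avoids accumulating per-document rounding differences, which is the main difference from rounding inside a sum script.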

Elasticsearch Filtering Parents by Filtered Child Document Count

I'm attempting to do some elasticsearch query fu on a set of data I have.
I have a user document that is the parent of many child page view documents. I'm looking to return all users that have viewed a specific page an arbitrary number of times (defined by a user input box). So far, I have a has_child query that returns all the users that have a page view with certain ids. However, this returns those parents with all their children. Next, I tried to write an aggregation on those query results that essentially does the same has_child query in aggregation form. Now I have the right document count for my filtered child documents, and I need to use this count to go back and filter the parents. To explain the query in words: "return all the users that have viewed a specific page more than 4 times". It's possible that I may need to restructure my data. Any thoughts?
Here is my query thus far:
curl -XGET 'http://localhost:9200/development_users/_search?pretty=true' -d '
{
"query" : {
"has_child" : {
"type" : "page_view",
"query" : {
"terms" : {
"viewed_id" : [175,180]
}
}
}
},
"aggs" : {
"to_page_view": {
"children": {
"type" : "page_view"
},
"aggs" : {
"page_views_that_match" : {
"filter" : { "terms": { "viewed_id" : [175,180] } }
}
}
}
}
}'
This returns me a response like:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "development_users",
"_type" : "user",
"_id" : "22548",
"_score" : 1.0,
"_source":{"id":22548,"account_id":1009}
} ]
},
"aggregations" : {
"to_page_view" : {
"doc_count" : 53,
"page_views_that_match" : {
"doc_count" : 2
}
}
}
}
Associated Mappings:
{
"development_users" : {
"mappings" : {
"page_view" : {
"dynamic" : "false",
"_parent" : {
"type" : "user"
},
"_routing" : {
"required" : true
},
"properties" : {
"created_at" : {
"type" : "date",
"format" : "date_time"
},
"id" : {
"type" : "integer"
},
"viewed_id" : {
"type" : "integer"
},
"time_on_page" : {
"type" : "integer"
},
"title" : {
"type" : "string"
},
"type" : {
"type" : "string"
},
"updated_at" : {
"type" : "date",
"format" : "date_time"
},
"url" : {
"type" : "string"
}
}
},
"user" : {
"dynamic" : "false",
"properties" : {
"account_id" : {
"type" : "integer"
},
"id" : {
"type" : "integer"
}
}
}
}
}
}
Okay, so this is kind of involved. I made a few simplifications to keep it straight in my head. First, I used this mapping:
PUT /test_index
{
"mappings": {
"page_view": {
"_parent": {
"type": "development_user"
},
"properties": {
"viewed_id": {
"type": "string"
}
}
},
"development_user": {
"properties": {
"id": {
"type": "string"
}
}
}
}
}
Then I added some data. In this little universe, I have three users and two pages. I want to find users who have viewed "page_a" at least twice, so if I construct the correct query only user 3 will be returned.
POST /test_index/development_user/_bulk
{"index":{"_type":"development_user","_id":1}}
{"id":"user_1"}
{"index":{"_type":"page_view","_parent":1}}
{"viewed_id":"page_a"}
{"index":{"_type":"development_user","_id":2}}
{"id":"user_2"}
{"index":{"_type":"page_view","_parent":2}}
{"viewed_id":"page_b"}
{"index":{"_type":"development_user","_id":3}}
{"id":"user_3"}
{"index":{"_type":"page_view","_parent":3}}
{"viewed_id":"page_a"}
{"index":{"_type":"page_view","_parent":3}}
{"viewed_id":"page_a"}
{"index":{"_type":"page_view","_parent":3}}
{"viewed_id":"page_b"}
To get that answer we'll use aggregations. Notice that I don't want documents returned (the normal way), but I do want to filter down the documents we analyze, because it will make things more efficient. So I use the same basic filter you had before.
The aggregation tree starts with terms_parent_id, which separates the parent documents into one bucket each. Inside that I have children_page_view, which filters the child documents down to the ones I want ("page_a"). Next to it in the hierarchy is bucket_selector_page_id_term_count, which uses a bucket selector (you'll need ES 2.x) to keep only the parent buckets that meet the criterion. Finally, a top hits aggregation shows the documents that match the requirements.
POST /test_index/development_user/_search
{
"size": 0,
"query": {
"has_child": {
"type": "page_view",
"query": {
"terms": {
"viewed_id": [
"page_a"
]
}
}
}
},
"aggs": {
"terms_parent_id": {
"terms": {
"field": "id"
},
"aggs": {
"children_page_view": {
"children": {
"type": "page_view"
},
"aggs": {
"filter_page_ids": {
"filter": {
"terms": {
"viewed_id": [
"page_a"
]
}
}
}
}
},
"bucket_selector_page_id_term_count": {
"bucket_selector": {
"buckets_path": {
"children_count": "children_page_view>filter_page_ids._count"
},
"script": "children_count >= 2"
}
},
"top_hits_users": {
"top_hits": {
"_source": {
"include": [
"id"
]
}
}
}
}
}
}
}
which returns:
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"terms_parent_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "user_3",
"doc_count": 1,
"children_page_view": {
"doc_count": 3,
"filter_page_ids": {
"doc_count": 2
}
},
"top_hits_users": {
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "development_user",
"_id": "3",
"_score": 1,
"_source": {
"id": "user_3"
}
}
]
}
}
}
]
}
}
}
Here's all the code I used:
http://sense.qbox.io/gist/43f24461448519dc884039db40ebd8e2f5b7304f
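
For readers who find the aggregation tree hard to follow, the same logic can be written out in plain Python over the sample data from this answer. It is a sketch of what the aggregations compute and not of how Elasticsearch executes them. The function name and the in-memory data layout are assumptions made for the example.

```python
from collections import Counter

# Sample data from the bulk request: parent id -> user id,
# and one (parent id, viewed_id) pair per child page view.
users = {1: "user_1", 2: "user_2", 3: "user_3"}
page_views = [
    (1, "page_a"),
    (2, "page_b"),
    (3, "page_a"),
    (3, "page_a"),
    (3, "page_b"),
]


def users_with_min_views(users, page_views, page, min_count):
    """Return users whose matching child count is at least min_count."""
    # Equivalent of the children + filter aggregations: count the
    # matching page views per parent.
    counts = Counter(parent for parent, viewed in page_views if viewed == page)
    # Equivalent of the bucket selector: keep buckets meeting the threshold.
    return [users[parent] for parent in sorted(counts) if counts[parent] >= min_count]


result = users_with_min_views(users, page_views, "page_a", 2)
print(result)  # prints ['user_3']
```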

ElasticSearch 'range' query returns inappropriate results

Let's take this query:
{
"timeout": 10000,
"from": 0,
"size": 21,
"sort": [
{
"view_avg": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"range": {
"price": {
"from": 10,
"to": 20
}
}
},
{
"terms": {
"category_ids": [
16405
]
}
}
]
}
}
}
This query, run against my data set, should return no results (all prices are in the 100s-1000s range). However, it returns results with prices such as:
"price": "1399.00"
"price": "1299.00"
"price": "1089.00"
And so on. Any ideas how I could modify the query so that it returns the correct results?
I'm 99% sure your mapping is wrong and price is declared as a string. Elasticsearch uses different Lucene range queries depending on the field type, as you can see in the documentation. The TermRangeQuery used for the string type explains your output: it uses lexicographical ordering (i.e. "1100" is between "10" and "20").
To test it you can try the following mapping/search:
PUT tests/
PUT tests/test/_mapping
{
"test": {
"_source" : {"enabled" : false},
"_all" : {"enabled" : false},
"properties" : {
"num" : {
"type" : "float", // <-- HERE IT'S A FLOAT
"store" : "no",
"index" : "not_analyzed"
}
}
}
}
PUT tests/test/1
{
"test" : {
"num" : 100
}
}
POST tests/test/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"num": {
"from": 10,
"to": 20
}
}
}
]
}
}
}
Result:
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
If you delete the index and recreate it, changing the num type to a string:
PUT tests/test/_mapping
{
"test": {
"_source" : {"enabled" : false},
"_all" : {"enabled" : false},
"properties" : {
"num" : {
"type" : "string", // <-- HERE IT'S A STRING
"store" : "no",
"index" : "not_analyzed"
}
}
}
}
You'll see a different result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "tests",
"_type": "test",
"_id": "1",
"_score": 1
}
]
}
}
price needs to be a numeric field for that range clause to work. If it's a string, those documents will be returned. Make sure the mapping is correct; if the field had been a float, the query would have worked.
You can check the mapping of the index with GET /index_name/_mapping.
If you had the following (with price as a string):
"range": {
"price": {
"from": 30,
"to": 40
}
}
that shouldn't return the docs, because "1" (string) sorts before "3" or "4" (strings), even though numerically 30 is smaller than 1399.
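
The effect is easy to reproduce outside Elasticsearch. The following Python sketch compares the prices from the question once as strings and once as numbers. Python's string comparison is used here as a stand-in for the lexicographical term ordering (an assumption that holds for plain ASCII digits), and the helper names are made up for the example.

```python
def in_range_as_string(value, low, high):
    """Range check using lexicographical (string) ordering."""
    return str(low) <= value <= str(high)


def in_range_as_number(value, low, high):
    """Range check using numeric ordering."""
    return low <= float(value) <= high


prices = ["1399.00", "1299.00", "1089.00"]

print([p for p in prices if in_range_as_string(p, 10, 20)])  # all three prices
print([p for p in prices if in_range_as_number(p, 10, 20)])  # []
print([p for p in prices if in_range_as_string(p, 30, 40)])  # []
```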
