Having issues searching polygons that intersect with other polygons with elasticsearch

In our app, ES holds objects with an areas field, where the areas field is of type MultiPolygon (basically, an array of polygons).
Now we need to search for all the objects in which at least one of their polygons falls at least partially within a given polygon (in our case, the current viewport of the map).
The query that we are currently experimenting with is the following:
$params = [
    'index' => self::CrimeIndex,
    'body' => [
        'size' => 10000,
        'query' => [
            'bool' => [
                'filter' => [
                    'geo_bounding_box' => [
                        'areas' => [
                            "top_left" => [
                                "lat" => $neLat,
                                "lon" => $neLng
                            ],
                            "bottom_right" => [
                                "lat" => $swLat,
                                "lon" => $swLng
                            ]
                        ],
                    ]
                ]
            ]
        ]
    ],
];
The problem is that this query gets all the polygons that touch the edges of the bounding box. (see picture).
How can we get all the polygons that are at least partially within the bounding box?

Related

Failing to search polygons that intersect with other polygons with elasticsearch

In our app, ES holds objects with an areas field, where the areas field is of type MultiPolygon (basically, an array of polygons).
Now we need to search for all the objects in which at least one of their polygons falls at least partially within a given polygon (in our case, the current viewport of the map).
The query that we are currently experimenting with is the following:
$params = [
    'index' => self::CrimeIndex,
    'body' => [
        'size' => 10000,
        'query' => [
            'bool' => [
                'filter' => [
                    'geo_bounding_box' => [
                        'areas' => [
                            "top_left" => [
                                "lat" => $neLat,
                                "lon" => $neLng
                            ],
                            "bottom_right" => [
                                "lat" => $swLat,
                                "lon" => $swLng
                            ]
                        ],
                    ]
                ]
            ]
        ]
    ],
];
The problem is that this query gets all the polygons that touch the edges of the bounding box. (see picture). How can we get all the polygons that are at least partially within the bounding box?
Mappings are done as follows:
$params = [
    'index' => CrimeService::CrimeIndex,
    'body' => [
        "mappings" => [
            'properties' => [
                'areas' => [
                    'type' => 'geo_shape'
                ]
            ],
        ],
    ],
];
$client->indices()->create($params);
Based on the docs, geo_shape can be MultiPolygon.
https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-shape.html
And here is an example of how it looks when populated:
GET crimes/_mapping/field/areas provides the following:
UPDATE - More Detailed Steps to reproduce
The dump of the collection/index is attached: https://www.dropbox.com/s/8inavsvcrnuozw1/dump-2021-12-29t21_54_04.639z.json.zip?dl=0
The query that is executed with elasticsearch-php is:
$params = [
    'index' => 'crime',
    'body' => [
        'size' => 10000,
        'query' => [
            'bool' => [
                'filter' => [
                    'geo_bounding_box' => [
                        'areas' => [
                            "top_left" => [
                                "lat" => $neLat,
                                "lon" => $neLng
                            ],
                            "bottom_right" => [
                                "lat" => $swLat,
                                "lon" => $swLng
                            ]
                        ],
                    ]
                ],
            ]
        ]
    ],
];
If we execute it with the parameters:
49.29366604017385,-123.00491857934166,49.19709977562233,-123.26617317321401
We get the following:
If the viewport is changed slightly, so that the polygons touch its borders (49.28031011582358,-122.92300503734472,49.18371770837152,-123.18425963121705),
we get the rest of the polygons:
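Your query coordinates are wrong: instead of top_left + bottom_right, you have bottom_left + top_right (see image below). $neLat/$neLng is the top-right (north-east) corner and $swLat/$swLng is the bottom-left (south-west) corner, so top_left should pair $neLat with $swLng, and bottom_right should pair $swLat with $neLng.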
I think that pretty much explains why you're seeing what you're seeing.
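To make the corner mix-up concrete, here is a minimal sketch of how the filter could be built from the map's north-east and south-west corners. The helper name viewportFilter is hypothetical (it is not part of elasticsearch-php); the field name areas and the corner variables are taken from the question, and the sample coordinates are the first set of parameters quoted above.

```php
<?php
// Hypothetical helper (not from the original post): builds the
// geo_bounding_box filter from the viewport's NE and SW corners.
function viewportFilter(float $neLat, float $neLng, float $swLat, float $swLng): array
{
    return [
        'geo_bounding_box' => [
            'areas' => [
                // top-left = northern latitude + western longitude
                'top_left' => ['lat' => $neLat, 'lon' => $swLng],
                // bottom-right = southern latitude + eastern longitude
                'bottom_right' => ['lat' => $swLat, 'lon' => $neLng],
            ],
        ],
    ];
}

// First viewport from the question: NE lat, NE lng, SW lat, SW lng.
$filter = viewportFilter(
    49.29366604017385,
    -123.00491857934166,
    49.19709977562233,
    -123.26617317321401
);
```

The returned array would take the place of the current value under 'filter' in the request body.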

Elasticsearch Collapsing Based on Text Similarity

I'm working with Elasticsearch and trying to come up with a way to filter a text field based on a phrase. I have basic searching working, but I also want to collapse "similar" results rather than duplicating them.
For example, given 5 objects with text content as
Buy 1 car get one car free until March
Buy 1 car get one car free until April
50% off your car insurance when you buy through us
Get 50% off your oven
If searching for car then I'd be looking for 2 results:
50% off your car insurance [...]
EITHER of the 1st or 2nd one (with both showing in inner_hits)
I've tried to do this using collapse on the content field but that will only collapse on exact matches.
'query' => [
    'match' => [
        'content' => 'car',
    ],
],
'collapse' => [
    'field' => 'content',
    'inner_hits' => [
        'name' => 'recently_seen_on',
        'size' => 3,
        'sort' => [['seen_on' => 'desc']],
    ],
],
I've also tried adding a similarity property to the content field, but I couldn't figure out whether it's possible to collapse using that.
I also came across this https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-significanttext-aggregation.html but when I tried something similar I got 0 results. I set the content type to keyword in the mappings:
[
'content' => ['type' => 'keyword'],
]
And then using:
'query' => [
    'match' => [
        'content' => 'car',
    ],
],
'aggs' => [
    'keywords' => [
        'significant_text' => [
            'field' => 'content',
            'filter_duplicate_text' => true,
        ],
    ],
],
Is it possible to achieve something like this without manually adding a field that groups documents based on their content?
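For comparison, here is a rough sketch of what grouping near-duplicates on the client side could look like, after the search results have come back. It is not an Elasticsearch feature: the helper groupSimilar and the 80 percent threshold are assumptions made for illustration, and it relies only on PHP's built-in similar_text() function.

```php
<?php
// Hypothetical post-processing helper: puts each text into the first group
// whose representative (first member) is at least $threshold percent similar
// according to PHP's similar_text(); otherwise it starts a new group.
function groupSimilar(array $texts, float $threshold = 80.0): array
{
    $groups = [];
    foreach ($texts as $text) {
        $placed = false;
        foreach ($groups as $i => $group) {
            similar_text($group[0], $text, $percent);
            if ($percent >= $threshold) {
                $groups[$i][] = $text;
                $placed = true;
                break;
            }
        }
        if (!$placed) {
            $groups[] = [$text];
        }
    }
    return $groups;
}

// The three documents from the example that match "car".
$groups = groupSimilar([
    'Buy 1 car get one car free until March',
    'Buy 1 car get one car free until April',
    '50% off your car insurance when you buy through us',
]);
```

Each inner array would play the role of one collapsed result, with the remaining members standing in for inner_hits. The comparison is pairwise, so this is only reasonable for a page of results, not for a whole index.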

Elastic Search - sort on should and must

I have this problem: I want to sort on 2 matches, and my code currently looks like this (it's a PHP array):
$data = [
    'query' => [
        'bool' => [
            'must' => [
                'multi_match' => [
                    'query' => $q,
                    'fields' => ['Fulltext'],
                    'type' => 'cross_fields',
                    'operator' => 'and'
                ]
            ],
            'should' => [
                'match' => [
                    'Title' => $q,
                ]
            ]
        ]
    ],
    'aggregations' => [
        'categorys' => [
            'terms' => [
                'field' => 'Categorys.value',
                'size' => $category_size
            ]
        ]
    ],
    'size' => $product_size,
    'sort' => [
        'Popular' => [
            'order' => 'desc'
        ]
    ]
];
The problem with this sort is that I want to sort by "Popular", but the results are not sorted first on the match in "should" and then on the match in "must". So my question is:
How can I sort first on the score of the first match, and then on the score of the second match together with the Popular field?
The point is that I want to search products where a query match in Title is more important than a match in the Fulltext field.
So if I get 10 results based on Title whose scores are higher than the next 10 results based on Fulltext, but 3 of the Title results and 2 of the Fulltext results have Popular greater than 0, then these matches should be shown and sorted by (_score, Popular) if Popular is greater than zero, and otherwise sorted by (_score) only.
Can anybody help me with this question?
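One possible starting point, sketched under assumptions: give the Title match a boost so that it outweighs the Fulltext match, and sort on _score first with Popular as the second sort key. The helper name buildSearchBody and the boost value of 3.0 are made up for illustration; the field names come from the question. The aggregations part is left out to keep the sketch short.

```php
<?php
// Hypothetical helper: the query from the question, with a boost on the
// Title match and a two-level sort (relevance first, then Popular).
function buildSearchBody(string $q, int $productSize, float $titleBoost = 3.0): array
{
    return [
        'query' => [
            'bool' => [
                'must' => [
                    'multi_match' => [
                        'query' => $q,
                        'fields' => ['Fulltext'],
                        'type' => 'cross_fields',
                        'operator' => 'and',
                    ],
                ],
                'should' => [
                    'match' => [
                        // The boost value is an arbitrary assumption; tune it.
                        'Title' => ['query' => $q, 'boost' => $titleBoost],
                    ],
                ],
            ],
        ],
        'size' => $productSize,
        'sort' => [
            ['_score' => ['order' => 'desc']],
            ['Popular' => ['order' => 'desc']],
        ],
    ];
}

$body = buildSearchBody('red shoes', 20);
```

Note that the second sort key only decides between documents with exactly the same _score, so Popular acts purely as a tie-breaker here; whether that is enough depends on how often scores tie in practice.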

Elastic Search - Order / Scoring for document of the same user

I have a question about how I can accomplish something.
I have my search algorithm for user documents ready.
I get the list of documents, but I don't want the list to contain consecutive documents from the same user.
Eg:
doc1: user-1
doc2: user-2
doc3: user-2
doc4: user-3
doc5: user-4
Change to:
doc1: user-1
doc2: user-2
doc4: user-3
doc5: user-4
doc3: user-2
Kind of sorting/randomising...
Any tips, ideas for what I can search?
Or much better, some examples.
I'm quite new to Elasticsearch. The documentation about custom scoring and ordering is great, but it is not giving me the right answer.
Thanks a million
Stefan
Update 18.08.:
As requested, here is my current query.
'query' => [
    'filtered' => [
        'query' => [
            'bool' => [
                'must' => [
                    'multi_match' => [
                        'query' => $q,
                        'fields' => [ 'title^6', 'description^1', 'tags^3']
                    ]
                ],
                'should' => [
                    [
                        'match' => [
                            'isTopDocument' => [
                                'query' => 'true',
                                'boost' => 2,
                            ]
                        ]
                    ],
                    [
                        'range' => [
                            'online_start' => [
                                'boost' => 1.8,
                                'gte' => 'now-7d/d'
                            ]
                        ]
                    ],
                    [
                        'range' => [
                            'online_start' => [
                                'boost' => 1.4,
                                'gte' => 'now-14d/d'
                            ]
                        ]
                    ],
                    [
                        // This is to include all available jobs, at least one should must be true if a must is set
                        // https://www.elastic.co/guide/en/elasticsearch/guide/current/bool-query.html#_controlling_precision
                        'range' => [
                            'online_start' => [
                                'gte' => 'now-61d/d'
                            ]
                        ]
                    ]
                ]
            ]
        ],
        'filter' => [
            'bool' => [
                // Some term filters
                'should' => $filter_should,
                'must' => $filter_must,
            ]
        ]
    ]
],
'size' => $perPage,
'from' => $from
Even if you find a search trick to score this particular use-case, you probably want to consider just post-processing the search results to get what you need.
Just loop through the list, keeping a reference to the previous user seen, and if you see the same user in the next result, just remove it from the results, and append it to the end of the list.
Generally speaking you'll get your "shuffled" users as desired, with an occasional pileup of your most prolific user at the very end of the list.
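The loop described above could be sketched like this. The function name spreadUsers and the shape of each hit (an array with id and user keys) are assumptions for illustration; real hits from elasticsearch-php would need their own way of reading the user.

```php
<?php
// Hypothetical post-processing helper: whenever a hit has the same user as
// the previously kept hit, it is moved to the end of the list.
function spreadUsers(array $hits): array
{
    $kept = [];
    $deferred = [];
    $previousUser = null;
    foreach ($hits as $hit) {
        if ($previousUser !== null && $hit['user'] === $previousUser) {
            $deferred[] = $hit; // same user as the previous result: push back
            continue;
        }
        $kept[] = $hit;
        $previousUser = $hit['user'];
    }
    // Deferred hits keep their relative order and may still sit next to
    // each other at the end (the "pileup" mentioned above).
    return array_merge($kept, $deferred);
}

$ordered = spreadUsers([
    ['id' => 'doc1', 'user' => 'user-1'],
    ['id' => 'doc2', 'user' => 'user-2'],
    ['id' => 'doc3', 'user' => 'user-2'],
    ['id' => 'doc4', 'user' => 'user-3'],
    ['id' => 'doc5', 'user' => 'user-4'],
]);
```

For the example from the question, this moves doc3 behind doc5. Because it reorders only one page of results at a time, it interacts with size/from paging, which is worth keeping in mind.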

Elasticsearch aggregations multifaceted navigation, excluding facets by group

I'm pretty new to ES and have been trying to implement faceted navigation in the same way most of the large e-commerce stores do.
Part of my products mapping looks like this:
'name' => [
    'type' => 'string',
    'analyzer' => 'english_analyzer',
],
'range' => [
    'type' => 'string',
    'analyzer' => 'english_analyzer',
],
'filters' => [
    'type' => 'nested',
    'properties' => [
        'name' => [
            'type' => 'string',
            'index' => 'not_analyzed',
        ],
        'values' => [
            'type' => 'object',
            'properties' => [
                'name' => [
                    'type' => 'string',
                    'index' => 'not_analyzed',
                ]
            ]
        ]
    ]
],
As you can see I'm storing the facets in "filters" as a nested object.
In my search query I can then add this aggregation to my query:
'aggs' => [
    'facets' => [
        'nested' => ['path' => 'filters'],
        'aggs' => [
            'filters' => [
                'terms' => [
                    'field' => 'filters.name', //facet group name
                    'size' => 0
                ],
                'aggs' => [
                    'id' => [
                        'terms' => [
                            'field' => 'filters.id' //facet group ID
                        ]
                    ],
                    'values' => [
                        'terms' => [
                            'field' => 'filters.values.name', //facet name
                            'size' => 0
                        ],
                        'aggs' => [
                            'id' => [
                                'terms' => [
                                    'field' => 'filters.values.id' //facet id
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ]
    ]
]
This gives me a nice list of facets to stick in my navigation. I can apply selected facets to my document results in 2 ways: Using the "post filter" or the "filtered query"
The post filter is applied after the aggregations have been calculated, so it gives me facet counts regardless of which facets have been selected by the user. In contrast, the filtered query calculates facet counts based on the selected facets; however, it hides facets with no matching documents.
What I need to do - and what most of the big e-commerce stores do - is a hybrid of the 2. I want the facet counts to be based on what is selected but ignore any facets within the same group.
If I have these facets:
Colours:
Red (1)
Blue (2)
Green (3)
Brands:
Audi (1)
Ford (2)
BMW (3)
If somebody selects Blue, the counts should remain the same for Red and Green, but the selection would affect the counts for Brands.
I have found a similar question on Stack Overflow:
ElasticSearch aggregation: exclude one filter per aggregation
From what I can gather, I need to provide a predefined list of facet groups (from my relational DB) and add them to my aggregations. So I have a manual list of my facet groups, and I add a filter bucket (https://www.elastic.co/guide/en/elasticsearch/guide/current/_filter_bucket.html) to each of these. Within each filter, I need to add a bool query containing all the user-selected facets, leaving out any facets which belong to that group.
So now I have a huge list of aggregations, grouped/bucketed by their facet group, each of these has a filter containing a bool query which may have a dozen selected fields. The query now is so large it probably would not fit on one page if I posted it here!
This to me just seems like a crazy addition to my query considering what I had to do before was almost what I needed. Is this the ONLY way I can achieve this?
Any help is greatly appreciated, I hope my question is clear enough.
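For reference, the approach described above does not have to be written out by hand; the per-group filter buckets can be generated in a loop. The following is only a sketch under assumptions: the helper name buildFacetAggs and the shape of $selected are made up, the field names come from the mapping above, and the clauses assume a query DSL in which bool accepts a filter clause (Elasticsearch 2.x or later).

```php
<?php
// Hypothetical helper: builds one filter aggregation per facet group.
// $selected maps a group name to the facet values the user picked,
// e.g. ['Colours' => ['Blue'], 'Brands' => ['Audi']].
function buildFacetAggs(array $groupNames, array $selected): array
{
    $aggs = [];
    foreach ($groupNames as $group) {
        // Apply every selection except the ones made in this group.
        $filters = [];
        foreach ($selected as $otherGroup => $values) {
            if ($otherGroup === $group || $values === []) {
                continue;
            }
            $filters[] = [
                'nested' => [
                    'path' => 'filters',
                    'query' => [
                        'bool' => [
                            'filter' => [
                                ['term' => ['filters.name' => $otherGroup]],
                                ['terms' => ['filters.values.name' => $values]],
                            ],
                        ],
                    ],
                ],
            ];
        }
        $aggs[$group] = [
            // stdClass encodes to {} so match_all is serialised correctly.
            'filter' => $filters === []
                ? ['match_all' => new stdClass()]
                : ['bool' => ['filter' => $filters]],
            'aggs' => [
                'facets' => [
                    'nested' => ['path' => 'filters'],
                    'aggs' => [
                        'group' => [
                            'filter' => ['term' => ['filters.name' => $group]],
                            'aggs' => [
                                'values' => ['terms' => ['field' => 'filters.values.name']],
                            ],
                        ],
                    ],
                ],
            ],
        ];
    }
    return $aggs;
}

$aggs = buildFacetAggs(['Colours', 'Brands'], ['Colours' => ['Blue']]);
```

With Blue selected, the Colours bucket is left unfiltered while the Brands bucket is restricted to documents that have the Blue facet. The selected facets would still be applied to the hits themselves through a post filter. The request stays large on the wire, but the code that produces it stays small.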
