Why is the max_score higher than the _score-sorted first hit's _score in Elasticsearch?

I have an Elasticsearch (8.1) index on which I run a simple match or multi_match query (I tried both, both show the same behavior, even the simplest ones).
In the result it is always the case that max_score is higher than the first hit's _score.
If I add a terms aggregation (on a keyword field) with a top_hits sub-aggregation (with sorting on _score) then the first hit from the first bucket actually has _score == max_score (but it is obviously also a different hit compared to the "main" hits). So, the top_hits aggregation actually does what I want ("fetch all matching documents and sort by _score"). The "main" hits seem to miss some results, however.
How can I make sure that the "main" hits do not "drop" documents? What are the internal mechanics behind this?
Here is the PHP array that gets JSON-encoded and produces the Elasticsearch query:
[
    'size' => 10,
    'query' => [
        // the result of this does not have all documents
        // that appear in the aggregation
        // and the highest ranked doc has a lower score than max_score
        'bool' => [
            'must' => [
                [
                    'match' => [
                        'my_text_field' => [
                            'query' => 'searchword'
                        ]
                    ]
                ],
                ['term' => ['my_other_field' => ['value' => 3]]],
                // plus some more simple term conditions
                // on other simple integral fields, but no scripts or similar,
                // just simple "WHERE a = 5" conditions
            ]
        ]
    ],
    // this aggregation has other/more hits than the directly retrieved docs, matching the max_score
    // If I remove the aggregation nothing changes for the actual result
    'aggs' => [
        'my_agg' => [
            'terms' => ['field' => 'my_agg_field', 'order' => ['score' => 'desc']],
            'aggs' => [
                'score' => ['max' => ['script' => '_score']],
                'filteredHits' => [
                    'top_hits' => [
                        'size' => 10
                    ]
                ]
            ]
        ]
    ]
]
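As a side note, one way to inspect how each returned hit's _score is computed is to send the same body with the standard explain option enabled. A minimal sketch, reusing the placeholder field name from above:

[
    'size' => 10,
    'explain' => true, // ask Elasticsearch to return a per-hit scoring explanation
    'query' => [
        'match' => [
            'my_text_field' => ['query' => 'searchword']
        ]
    ]
]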

Related

Query one field with multiple values in elasticsearch nest

I have a combination of two queries with Elasticsearch and NEST. The first one is a full-text search for a specific term, and the second one filters or queries another field, file-path. It should work for many file paths, and each path could be a partial or full path. I can query one file-path, but I couldn't manage to do it for many file paths. Any suggestion?
Search<SearchResults>(s => s
    .Query(q => q
        .Match(m => m.Field(f => f.Description).Query("Search_term"))
        && q
        .Prefix(t => t.Field(f => f.FilePath).Value("file_Path"))
    )
);
For searching for more than one path you can use a bool query in Elasticsearch and then use the Should occur to combine clauses like a logical OR, so your code should look like this:
Search<SearchResults>(s => s
    .Query(q => q
        .Bool(b => b
            .Should(
                bs => bs.Wildcard(p => p.FilePath, "*file_Path*"),
                bs => bs.Wildcard(p => p.FilePath, "*file_Path*")
                // ... one Wildcard clause per file path
            )
        )
        && q.Match(m => m.Field(f => f.Description).Query("Search_term"))
    )
);
Also, you should use a Wildcard query to get results for paths that could be partial or full paths. For more information, check the official Elasticsearch documentation about the Wildcard query and Bool query below:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/bool-queries.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html

How to force Elastic to keep more decimals from a float

I have some coordinates that I pass to Elasticsearch from Logstash, but Elastic keeps only 3 decimals, so coordinate-wise I completely lose the location.
When I send the data from Logstash, I can see it got the right value:
{
    "nasistencias" => 1,
    "tiempo_demora" => "15",
    "path" => "/home/elk/data/visits.csv",
    "menor" => "2",
    "message" => "5,15,Parets del Vallès,76,0,8150,41.565505,2.234999575,LARINGITIS AGUDA,11/3/17 4:20,1,38,1,2,POINT(2.2349995750000695 41.565505000000044)",
    "id_poblacion" => 76,
    "@timestamp" => 2017-03-11T04:20:00.000Z,
    "poblacion" => "Parets del Vallès",
    "edad_valor" => 0,
    "patologia" => "LARINGITIS AGUDA",
    "host" => "elk",
    "@version" => "1",
    "Geopoint_corregido" => "POINT(2.2349995750000695 41.565505000000044)",
    "id_tipo" => 1,
    "estado" => "5",
    "cp" => 8150,
    "location" => {
        "lon" => 2.234999575,  <- HERE
        "lat" => 41.565505     <- AND HERE
    },
    "id_personal" => 38,
    "Fecha" => "11/3/17 4:20"
}
But then, what I get in Kibana keeps only three decimals.
I do the conversion as follows:
mutate {
    convert => { "longitud_corregida" => "float" }
    convert => { "latitude_corregida" => "float" }
}
mutate {
    rename => {
        "longitud_corregida" => "[location][lon]"
        "latitude_corregida" => "[location][lat]"
    }
}
How can I keep all the decimals? With geolocation, one decimal can return the wrong city.
Another question (related)
I add the data to the csv document as follows:
# echo "5,15,Parets del Vallès,76,0,8150,"41.565505","2.234999575",LARINGITIS AGUDA,11/3/17 4:20,1,38,1,2,POINT(2.2349995750000695 41.565505000000044)" >> data/visits.csv
But in the original document, the coordinates use commas instead of dots, like this:
# echo "5,15,Parets del Vallès,76,0,8150,"41,565505","2,234999575",LARINGITIS AGUDA,11/3/17 4:20,1,38,1,2,POINT(2.2349995750000695 41.565505000000044)" >> data/visits.csv
But the problem was that the comma was being taken as the field separator, and all the data was sent to Elasticsearch wrong: the latitude was 41,565505, but that comma made it read 41 as the latitude and 565505 as the longitude. I changed the comma to a dot, but I am not sure whether the float conversion understands both commas and dots, or only one of them. My question is: did I do wrong by changing the comma to a dot? Is there a better way to correct this?
Create a geo_point mapping for the lat/lon fields (a minimal sketch is shown after the steps below). This leads to more precise, internally optimized storage in ES and allows you more sophisticated geo queries.
Please keep in mind that you'll need to reindex the data, as mapping changes are not possible afterwards (if there are already docs present that have the fields to change).
Zero-downtime approach:
1. Create a new index with an optimized mapping (derive it from the current one and make your changes manually)
2. Reindex the data (at least some docs for verification)
3. Empty the new index again
4. Change the Logstash destination to the new index (consider using aliases)
5. Reindex the old data into the new index
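A minimal sketch of such a mapping, shown here through the elasticsearch-php client (the same request can be sent as a raw PUT); the index name visits and the field name location are assumptions, adjust them to your setup:

// create the new index with location mapped as geo_point
// (index and field names are placeholders)
$client->indices()->create([
    'index' => 'visits',
    'body'  => [
        'mappings' => [
            'properties' => [
                'location' => ['type' => 'geo_point'],
            ],
        ],
    ],
]);

The Logstash output can keep sending [location][lon] and [location][lat] as before; only the mapping of the target index changes.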

ElasticSearch _count and _search apply same query

I'm trying to get the count of total results for a paginator, but the problem is that _search and _count take different arguments, for example:
_search:
    'from' => $offset,
    'size' => $limit,
    'query' => [
        'match_phrase' => $q,
    ]
_count:
    'index' => $index,
    'type' => $type,
    'q' => $q,
I need to apply match_phrase inside _count as well, because if I only pass q there the count is not correct... but does _count accept match_phrase?
Is _count the right way, or must I use a different approach? I searched for long hours, but only found Elastic Search _search vs. _count syntax, which tells me nothing.
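For what it's worth, the _count API does accept the same query DSL in its request body, so a match_phrase can be passed to it. A minimal sketch with the elasticsearch-php client, reusing the same $q structure as in the _search request:

// count documents using the same match_phrase as the search
$countParams = [
    'index' => $index,
    'body'  => [
        'query' => [
            'match_phrase' => $q,   // same match_phrase body as in _search
        ],
    ],
];
$total = $client->count($countParams)['count'];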

elasticsearch regroup aggs by prefix

Hi, I'm building search facets with Elasticsearch, so I use aggs to get the facets, but I would like to regroup all terms that start with the same text up to the first '/'.
For example, some values are indexed like 'levelone/leveltwo',
but I would like to regroup all values that share the same 'levelone' prefix.
I tried this, but it does not work:
'aggs' => array(
    'tags' => array(
        'terms' => array(
            'field' => $filter,
            'size' => 0,
            'include' => '.*/.*',
        )
    )
)
Is it possible?
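One approach that is sometimes used for this (not from the original post) is a terms aggregation whose key is computed by a script that cuts each value at the first '/'. A rough sketch in the same PHP array style, where 'my_field' stands for the keyword field behind $filter:

'aggs' => array(
    'tags' => array(
        'terms' => array(
            'script' => array(
                'lang'   => 'painless',
                // keep only the part of the keyword before the first '/'
                'source' => "def v = doc['my_field'].value; int i = v.indexOf('/'); return i > -1 ? v.substring(0, i) : v;",
            ),
        ),
    ),
)

Indexing the 'levelone' prefix into its own keyword field at index time would be cheaper than a script, but the script variant avoids a reindex.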

ElasticSearch in e-commerce: multiple categories for a product

How do you index products like this in ElasticSearch? We've separated all documents based on the attributes (colour, brand, size, whatever users input), but all of them belong to a set of categories. Maybe one, maybe 15.
[0] => Array
    (
        [product_id] => 123456
        [product_name] => Shirt 1
        [filter_name] => Colour
        [filter_value] => Blue
        [product_parent_id] => 111111
        [product_has_discount] => 0
        [product_price] => 19.99
        [product_stock] => 1
    )
[1] => Array
    (
        [product_id] => 123457
        [product_name] => Shirt 1
        [filter_name] => Colour
        [filter_value] => Red
        [product_parent_id] => 111111
        [product_has_discount] => 0
        [product_price] => 19.99
        [product_stock] => 1
    )
How would we tag categories onto this? Would it be as simple as saying
[product_categories] => ;4750;4834;4835;4836;
and then querying ElasticSearch with a match against category with the value ;4836;? Is that possible? Recommended?
You can define product_categories as an integer field in your mapping and pass in the category values as an array, like
[product_categories] => array(4750,4834,4835,4836)
EDIT: You can read more about mapping in the Elasticsearch documentation, including how array values are handled (any field can hold a list of values without a special mapping).
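A rough sketch of what that could look like with the elasticsearch-php client; the index name products is an assumption:

// map product_categories as an integer field (arrays need no special mapping in ES)
$client->indices()->create([
    'index' => 'products',
    'body'  => [
        'mappings' => [
            'properties' => [
                'product_categories' => ['type' => 'integer'],
            ],
        ],
    ],
]);

// index a product with several categories
$client->index([
    'index' => 'products',
    'id'    => '123456',
    'body'  => [
        'product_name'       => 'Shirt 1',
        'product_categories' => [4750, 4834, 4835, 4836],
    ],
]);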
Once your data is indexed like that, you can query, filter, and aggregate on the field product_categories easily in all combinations.
For example, to match products in category 4750 or 4836:
{
    "filter": {
        "terms": {
            "product_categories": [
                4750,
                4836
            ]
        }
    }
}
