Elasticsearch filter logic - elasticsearch

I can't find results when filtering by category. Removing the category filter works.
After much experimentation, this is my query:
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "*",
"zero_terms_query": "all",
"operator": "and",
"fields": [
"individual_name.name^1.3",
"organisation_name.name^1.8",
"profile",
"accreditations"
]
}
},
"filter": {
"bool": {
"must": [{
"term": { "categories" : "9" }
}]
}
}
}
}
This is some sample data:
{
_index: providers
_type: provider
_id: 3
_version: 1
_score: 1
_source: {
locations:
id: 3
profile: <p>Dr Murray is a (blah blah)</p>
cost_id: 3
ages: null
nationwide: no
accreditations: null
service_types: null
individual_name: Dr Linley Murray
organisation_name: Crawford Medical Centre
languages: {"26":26}
regions: {"1":"Auckland"}
districts: {"8":"Manukau City"}
towns: {"2":"Howick"}
categories: {"10":10}
sub_categories: {"47":47}
funding_regions: {"7":7}
}
}
These are my indexing settings:
$index_settings = array(
'number_of_shards' => 5,
'number_of_replicas' => 1,
'analysis' => array(
'char_filter' => array(
'wise_mapping' => array(
'type' => 'mapping',
'mappings' => array('\'=>', '.=>', ',=>')
)
),
'filter' => array(
'wise_ngram' => array(
'type' => 'edgeNGram',
'min_gram' => 5,
'max_gram' => 10
)
),
'analyzer' => array(
'index_analyzer' => array(
'type' => 'custom',
'tokenizer' => 'standard',
'char_filter' => array('html_strip', 'wise_mapping'),
'filter' => array('standard', 'wise_ngram')
),
'search_analyzer' => array(
'type' => 'custom',
'tokenizer' => 'standard',
'char_filter' => array('html_strip', 'wise_mapping'),
'filter' => array('standard', 'wise_ngram')
),
)
)
);
Is there a better way to filter/search this? The filter worked when I used snowball instead of nGram. Why is this?

You are querying the categories field for the term 9, but the categories field is actually an object:
{ "categories": { "10": 10 }}
So your filter would have to look like this instead:
{ "term": { "categories.9": 9 }}
Why are you specifying the category in this way? You'll end up with a new field for every category, which you don't want.
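To make the "new field for every category" point concrete, here is a minimal Python sketch of how an object value ends up as dotted field paths. The `flatten` helper is hypothetical and only a rough approximation of how Elasticsearch maps object fields. The array form shown next to it is one possible alternative, not something taken from the question.

```python
from typing import Any, Dict


def flatten(source: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted field paths (rough model of object mapping)."""
    flat: Dict[str, Any] = {}
    for key, value in source.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Each key of a nested object becomes its own field path.
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


# Category stored as an object: one field per category id, nothing under "categories".
as_object = flatten({"categories": {"10": 10}})

# Category stored as a plain array: a single "categories" field holding all ids.
as_array = flatten({"categories": [10]})

print(as_object)
print(as_array)
```

With the array form, a term filter on categories can match directly, and the mapping does not grow with every new category.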
There's another problem with the query part. You are querying multiple fields with multi_match and setting operator to and. A query for "brown fox":
{ "multi_match": {
"query": "brown fox",
"operator": "and",
"fields": [ "foo", "bar"]
}}
would be rewritten as:
{ "dis_max": {
"queries": [
{ "match": { "foo": { "query": "brown fox", "operator": "and" }}},
{ "match": { "bar": { "query": "brown fox", "operator": "and" }}}
]
}}
In other words: all terms must be present in the same field, not in any of the listed fields! This is clearly not what you are after.
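The rewrite described above can be sketched in a few lines of Python. This is only an illustration of the expansion for the default multi_match type. `rewrite_multi_match` is a hypothetical helper, not part of any client library, and it ignores details such as per-field boosts.

```python
from typing import Any, Dict


def rewrite_multi_match(multi_match: Dict[str, Any]) -> Dict[str, Any]:
    """Expand a multi_match body into a dis_max of per-field match queries."""
    # Everything except "fields" (query text, operator, ...) is copied to each field,
    # which is why "operator": "and" ends up applying per field.
    shared = {key: value for key, value in multi_match.items() if key != "fields"}
    return {
        "dis_max": {
            "queries": [
                {"match": {field: dict(shared)}} for field in multi_match["fields"]
            ]
        }
    }


rewritten = rewrite_multi_match(
    {"query": "brown fox", "operator": "and", "fields": ["foo", "bar"]}
)
print(rewritten)
```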
This is quite a hard problem to solve. In fact, in v1.1.0 we will be adding new functionality to the multi_match query which will greatly help in this situation.
You can read about the new functionality on this page.

Related

How can I apply filters in elasticsearch NEST functionscore function

To control the score of documents based on a field value, I am using filter with function_score in my DSL query and this gets the results ordered as I expect.
However, when I implement this in NEST the results are different: the score is not applied to the filter value. From further investigation, it seems that in some versions of the C# NEST client, Filter is not supported with ScoreFunctionsDescriptor. Is this still the case?
Can you please help with a working way to implement this with NEST? (I am new to Elasticsearch and C#, so please excuse me if it's a noob question.)
I am currently using Elasticsearch v7.6, and NEST v7.5.1.
Thanks!
DSL query
GET /help/_search
{
"query":{
"function_score": {
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "AGC",
"fields": [
"title^2",
"description^1"
],
"type":"most_fields",
"fuzziness": "AUTO:4,8",
"prefix_length": 2,
"boost": 5
}
}
]
}
},
"functions": [
{
"filter": {
"term":{
"product":"A"
}
},
"weight": 45
},
{
"filter": {
"term":{
"product":"B"
}
},
"weight": 20
},
{
"filter": {
"term":{
"product":"C"
}
},
"weight": 10
}
],
"score_mode": "max",
"boost_mode": "multiply"
}
}
}
From further investigation I find that some versions of C# NEST Filter is not supported with ScoreFunctionsDescriptor. Is this still the case?
The function you're looking for is a weight function with a filter applied. This has been supported for a long time in the client.
The client equivalent would be something like
var response = client.Search<object>(s => s
.Query(q => q
.FunctionScore(fs => fs
.Query(fsq => fsq
.Bool(b => b
.Must(mu => mu
.MultiMatch(mm => mm
.Query("AGC")
.Fields(new[] { "title^2", "description^1" })
.Type(TextQueryType.MostFields)
.Fuzziness(Fuzziness.AutoLength(4,8))
.PrefixLength(2)
.Boost(5)
)
)
)
)
.Functions(fu => fu
.Weight(w => w
.Filter(f => f
.Term("product", "A")
)
.Weight(45)
)
.Weight(w => w
.Filter(f => f
.Term("product", "B")
)
.Weight(20)
)
.Weight(w => w
.Filter(f => f
.Term("product", "C")
)
.Weight(10)
)
)
.ScoreMode(FunctionScoreMode.Max)
.BoostMode(FunctionBoostMode.Multiply)
)
)
);
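As a sanity check on what the weight functions do to the score, here is a small Python sketch of how I understand score_mode max and boost_mode multiply to combine. The `combined_score` function and the `WEIGHTS` table are made up for illustration. The fallback factor of 1 when no filter matches is my understanding of the default behaviour and is worth verifying against your version.

```python
from typing import Dict

# Weights taken from the three filter functions in the query (product -> weight).
WEIGHTS: Dict[str, float] = {"A": 45.0, "B": 20.0, "C": 10.0}


def combined_score(query_score: float, product: str,
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Model score_mode=max over matching weight functions, boost_mode=multiply."""
    matching = [weight for value, weight in weights.items() if value == product]
    # Assumption: when no function filter matches, the function factor is neutral (1).
    function_factor = max(matching) if matching else 1.0
    return query_score * function_factor


print(combined_score(2.0, "A"))
print(combined_score(2.0, "B"))
print(combined_score(2.0, "Z"))
```

If the scores you see from NEST do not follow this pattern, comparing the JSON the client actually sends with the hand-written DSL is usually the quickest way to find the difference.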
There are a couple of things that you may want to look at:
description^1 can be replaced with simply description, as the default boost is 1.
This may be better expressed as a bool query with should clauses for each of the queries in each weight function, with a boost applied similar to how weight is being used. Something like:
var response = client.Search<object>(s => s
.Query(q => q
.Bool(b => b
.Must(mu => mu
.MultiMatch(mm => mm
.Query("AGC")
.Fields(new[] { "title^2", "description^1" })
.Type(TextQueryType.MostFields)
.Fuzziness(Fuzziness.AutoLength(4, 8))
.PrefixLength(2)
.Boost(5)
)
)
.Should(
sh => sh.Term(t => t
.Field("product")
.Value("A")
.Boost(45)
),
sh => sh.Term(t => t
.Field("product")
.Value("B")
.Boost(20)
),
sh => sh.Term(t => t
.Field("product")
.Value("C")
.Boost(10)
)
)
)
)
);

how to implement Elastic Search in Laravel for auto complete

I followed this wonderful tutorial, Made with love ES laravel tutorial, to implement ES in the Laravel ecommerce app I am building. I got it to work, but I want to tweak it a little because as of now I only get results when my query term exactly matches an entire word in one of my products.
I have a health nut bar called bear naked almond bar. When I type "bea" it doesn't match anything, but when I type "bear" it works.
This is my search query as of now.
$model = new ProductsListing;
$items = $this->elasticsearch->search([
'index' => $model->getSearchIndex(),
'type' => $model->getSearchType(),
'body' => [
'query' => [
'multi_match' => [
'fields' => ['brand', 'name^5', 'description'],
'query' => $query,
],
],
],
]);
I need some help adjusting the query so that I get results as I type.
I am not familiar with PHP and Laravel, but the reason "bea" returns nothing is that you are using a match query, which analyzes the search text with the same analyzer that was used at index time. Your text bear naked almond bar is indexed as the tokens bear, naked, almond and bar, and bea doesn't match any of these tokens.
You can change your query as below (I'm not sure about the correct syntax for the prefix query in Laravel).
$model = new ProductsListing;
$items = $this->elasticsearch->search([
'index' => $model->getSearchIndex(),
'type' => $model->getSearchType(),
'body' => [
'query' => [
'prefix' => [ // changed to a prefix query, which targets a single field
'name' => $query, // prefix input is not analyzed, so pass it lowercased
],
],
],
]);
I am assuming that you are using a text field with the default standard analyzer. Using the analyze API, you can check the tokens generated for your text; the request below is followed by the response it returns.
POST your-index/_analyze
{
"text": "bear naked almond bar",
"analyzer" : "standard"
}
{
"tokens": [
{
"token": "bear",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "naked",
"start_offset": 5,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "almond",
"start_offset": 11,
"end_offset": 17,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "bar",
"start_offset": 18,
"end_offset": 21,
"type": "<ALPHANUM>",
"position": 3
}
]
}
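The difference between whole-token matching and prefix matching on these tokens can be sketched in Python. `simple_tokens` is a deliberately crude stand-in for the standard analyzer (lowercase, split on non-alphanumerics), and the two match helpers are hypothetical names used only for illustration.

```python
import re
from typing import List


def simple_tokens(text: str) -> List[str]:
    """Crude stand-in for the standard analyzer: lowercase and split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def matches_whole_token(tokens: List[str], term: str) -> bool:
    """Roughly what a match query needs: the analyzed term equals an indexed token."""
    return term.lower() in tokens


def matches_prefix(tokens: List[str], term: str) -> bool:
    """Roughly what a prefix query needs: some indexed token starts with the term."""
    return any(token.startswith(term.lower()) for token in tokens)


tokens = simple_tokens("bear naked almond bar")
print(tokens)
print(matches_whole_token(tokens, "bea"), matches_prefix(tokens, "bea"))
```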
Try match_phrase_prefix as a must clause inside a bool query instead of multi_match. You will start getting results as you type.
E.g.
// "size" limits the number of hits returned
$query = '{
"size": 10,
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"tag": ' . json_encode($filter_name) . '
}
}
]
}
}
}';
// Now pass this $query variable to your ES request
There are two methods: 1. prefix, 2. match_phrase_prefix.
Please see the documentation for prefix and match_phrase_prefix.
I was able to get help here: github
and this was the perfect solution. Thank you Enricco Zimuel.
$elasticsearch = ClientBuilder::create()
->setHosts(config('services.search.hosts'))
->build();
$model = new ProductsListing;
$items = $elasticsearch->search([
'index' => $model->getSearchIndex(),
'type' => $model->getSearchType(),
'body' => [
//this helped a lot
// https://stackoverflow.com/questions/60716639/how-to-implement-elastic-search-in-laravel-for-auto-complete
//using query_string
'query' => [
'query_string' => [
// 'fields' => ['brand', 'name^5', 'description'], //notice the weight operator ^5
'fields' => ['name'],
'query' => $query.'*',
],
],
],
]);

Unable to create nested json output (aggregated) from CSV input

The issue I am facing is that I need to aggregate CSV inputs by ID, and the output contains multiple levels of nesting. I am able to produce a single level of nesting, but I cannot work out the correct syntax for the deeper levels.
INPUT:
input {
generator {
id => "first"
type => 'csv'
message => '829cd0e0-8d24-4f25-92e1-724e6bd811e0,GSIH1,2017-10-10 00:00:00.000,HCC,0.83,COMMUNITYID1'
count => 1
}
generator {
id => "second"
type => 'csv'
message => '829cd0e0-8d24-4f25-92e1-724e6bd811e0,GSIH1,2017-10-10 00:00:00.000,LACE,12,COMMUNITYID1'
count => 1
}
generator {
id => "third"
type => 'csv'
message => '829cd0e0-8d24-4f25-92e1-724e6bd811e0,GSIH1,2017-10-10 00:00:00.000,CCI,0.23,COMMUNITYID1'
count => 1
}
}
filter
{
csv {
columns => ['id', 'reference', 'occurrenceDateTime', 'code', 'probabilityDecimal', 'comment']
}
mutate {
rename => {
"reference" => "[subject][reference]"
"code" => "[prediction][outcome][coding][code]"
"probabilityDecimal" => "[prediction][probabilityDecimal]"
}
}
mutate {
add_field => {
"[resourceType]" => "RiskAssessment"
"[prediction][outcome][text]" => "Member HCC score based on CMS HCC V22 risk adjustment model"
"[status]" => "final"
}
}
mutate {
update => {
"[subject][reference]" => "Patient/%{[subject][reference]}"
"[comment]" => "CommunityId/%{[comment]}"
}
}
mutate {
remove_field => [ "@timestamp", "sequence", "@version", "message", "host", "type" ]
}
}
filter {
aggregate {
task_id => "%{id}"
code => "
map['resourceType'] = event.get('resourceType')
map['id'] = event.get('id')
map['status'] = event.get('status')
map['occurrenceDateTime'] = event.get('occurrenceDateTime')
map['comment'] = event.get('comment')
map['[reference]'] = event.get('[subject][reference]')
map['[prediction]'] ||= []
map['[prediction]'] << {
'code' => event.get('[prediction][outcome][coding][code]'),
'text' => event.get('[prediction][outcome][text]'),
'probabilityDecimal'=> event.get('[prediction][probabilityDecimal]')
}
event.cancel()
"
push_previous_map_as_event => true
timeout => 3
}
mutate {
remove_field => [ "@timestamp", "tags", "@version"]
}
}
output{
elasticsearch {
template => "templates/riskFactor.json"
template_name => "riskFactor"
action => "index"
hosts => ["localhost:9201"]
index => ["deepak"]
}
stdout {
codec => json{}
}
}
OUTPUT:
{
"reference": "Patient/GSIH1",
"comment": "CommunityId/COMMUNITYID1",
"id": "829cd0e0-8d24-4f25-92e1-724e6bd811e0",
"status": "final",
"resourceType": "RiskAssessment",
"occurrenceDateTime": "2017-10-10 00:00:00.000",
"prediction": [
{
"probabilityDecimal": "0.83",
"code": "HCC",
"text": "Member HCC score based on CMS HCC V22 risk adjustment model"
},
{
"probabilityDecimal": "0.23",
"code": "CCI",
"text": "Member HCC score based on CMS HCC V22 risk adjustment model"
},
{
"probabilityDecimal": "12",
"code": "LACE",
"text": "Member HCC score based on CMS HCC V22 risk adjustment model"
}
]
}
REQUIRED OUTPUT:
{
"resourceType": "RiskAssessment",
"id": "829cd0e0-8d24-4f25-92e1-724e6bd811e0",
"status": "final",
"subject": {
"reference": "Patient/GSIH1"
},
"occurrenceDateTime": "2017-10-10 00:00:00.000",
"prediction": [
{
"outcome": {
"coding": [
{
"code": "HCC"
}
],
"text": "Member HCC score based on CMS HCC V22 risk adjustment model"
},
"probabilityDecimal": 0.83
},
{
"outcome": {
"coding": [
{
"code": "CCI"
}
],
"text": "Member HCC score based on CMS HCC V22 risk adjustment model"
},
"probabilityDecimal": 0.23
}
],
"comment": "CommunityId/COMMUNITYID1"
}
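For reference, the shape asked for in REQUIRED OUTPUT can be sketched outside Logstash in plain Python. This is not a Logstash answer, only an illustration of the grouping and nesting. The `aggregate` function is a hypothetical helper, and the column names and constant values are copied from the config above.

```python
from typing import Any, Dict, List

COLUMNS = ["id", "reference", "occurrenceDateTime", "code", "probabilityDecimal", "comment"]
OUTCOME_TEXT = "Member HCC score based on CMS HCC V22 risk adjustment model"

ROWS = [
    "829cd0e0-8d24-4f25-92e1-724e6bd811e0,GSIH1,2017-10-10 00:00:00.000,HCC,0.83,COMMUNITYID1",
    "829cd0e0-8d24-4f25-92e1-724e6bd811e0,GSIH1,2017-10-10 00:00:00.000,LACE,12,COMMUNITYID1",
    "829cd0e0-8d24-4f25-92e1-724e6bd811e0,GSIH1,2017-10-10 00:00:00.000,CCI,0.23,COMMUNITYID1",
]


def aggregate(rows: List[str]) -> Dict[str, Dict[str, Any]]:
    """Group CSV rows by id and nest each row's code and score under prediction."""
    docs: Dict[str, Dict[str, Any]] = {}
    for line in rows:
        row = dict(zip(COLUMNS, line.split(",")))
        # The first row for an id creates the document; later rows only add predictions.
        doc = docs.setdefault(row["id"], {
            "resourceType": "RiskAssessment",
            "id": row["id"],
            "status": "final",
            "subject": {"reference": f"Patient/{row['reference']}"},
            "occurrenceDateTime": row["occurrenceDateTime"],
            "prediction": [],
            "comment": f"CommunityId/{row['comment']}",
        })
        doc["prediction"].append({
            "outcome": {"coding": [{"code": row["code"]}], "text": OUTCOME_TEXT},
            "probabilityDecimal": float(row["probabilityDecimal"]),
        })
    return docs


result = aggregate(ROWS)["829cd0e0-8d24-4f25-92e1-724e6bd811e0"]
print(result["prediction"][0])
```

In the aggregate filter, the same nesting would presumably go into the hash that is pushed onto map['[prediction]'], with outcome holding a coding array and the text.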

Conditional ElasticSearch sorting by different fields if in range

I have products, and some of them are reduced in price for a specific date range.
(simplified) example products:
{
"id": 1,
"price": 2.0,
"specialPrice": {
"fromDate": null,
"tillDate": null,
"value": 0
}
},
{
"id": 2,
"price": 4.0,
"specialPrice": {
"fromDate": 1540332000,
"tillDate": 1571781600,
"value": 2.5
}
},
{
"id": 3,
"price": 3.0,
"specialPrice": {
"fromDate": null,
"tillDate": null,
"value": 0
}
}
Filtering by price was no problem. That I could do with a simple bool query.
But I could not yet find a good example for ElasticSearch scripts that could point me in the right direction, even though it should be quite simple, given you know the syntax.
My pseudocode: price = ('now' between specialPrice.fromDate and specialPrice.tillDate) ? specialPrice.value : price
Is there a way to translate this into something that would work in an ElasticSearch sorting?
To clarify further: By default, all products are already sorted by several conditions. The user can also search for any terms and filter the results while also being able to select multiple sorting parameters. Items can for example be sorted by tags and then by price, it's all very dynamic and it does still sort those results by some other properties (including the _score) afterwards.
So just changing the _score would be bad, since that is already calculated in a complex manner to show the best results for the given search terms.
Here is my current script, which does fail at the first params.currentDate:
"sort": {
"_script": {
"type": "number",
"script": {
"source": "if(doc['specialPrice.tillDate'] > params.currentDate) {params.currentPrice = doc['specialPrice.value']} return params.currentPrice",
"params": {
"currentDate": "now",
"currentPrice": "doc['price']"
}
}
}
How it works now:
One problem was the nesting of some of the properties.
So one of my steps was to copy their content to new top-level fields on the product (which I'm not that happy about, but whatever).
So in my mapping, I created new properties for products (specialFrom, specialTill, specialValue) and gave the corresponding fields in my specialPrice "copy_to" properties pointing at the new property names.
The part is in php array syntax, since I'm using ruflin/elastica:
'specialPrice' => [
'type' => 'nested',
'properties' => [
'fromDate' => [
'type' => 'date',
'format' => 'epoch_second',
'copy_to' => 'specialFrom',
],
'tillDate' => [
'type' => 'date',
'format' => 'epoch_second',
'copy_to' => 'specialTill',
],
'value' => [
'type' => 'float',
'copy_to' => 'specialValue',
],
],
],
'specialFrom' => [
'type' => 'date',
'format' => 'epoch_second',
],
'specialTill' => [
'type' => 'date',
'format' => 'epoch_second',
],
'specialValue' => [
'type' => 'float',
],
Now my sorting script looks like this (in my testing client, still working on implementing it within elastica):
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "params.param = ((doc['specialTill'].value - new Date().getTime()) > 0 && (new Date().getTime() - doc['specialFrom'].value) > 0) ? doc['specialValue'].value : doc['price'].value; return params.param;",
"params": {
"param": 0.0
}
}
}
}
I'm not 100% happy with this because I have redundant data and scripts (calling new Date().getTime() twice in the script), but it does work and that is the most important thing for now :)
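The selection logic from the pseudocode can be checked outside Elasticsearch with a small Python sketch. `effective_price` is a hypothetical helper. It treats the date range as inclusive and falls back to the regular price when either date is null, which is an assumption on my part rather than something the script above does explicitly.

```python
from typing import Any, Dict, List


def effective_price(product: Dict[str, Any], now: int) -> float:
    """Return the special price while `now` (epoch seconds) is inside the range, else the price."""
    special = product.get("specialPrice") or {}
    start, end = special.get("fromDate"), special.get("tillDate")
    # Assumption: a missing start or end date means there is no active special price.
    if start is not None and end is not None and start <= now <= end:
        return special["value"]
    return product["price"]


products: List[Dict[str, Any]] = [
    {"id": 1, "price": 2.0, "specialPrice": {"fromDate": None, "tillDate": None, "value": 0}},
    {"id": 2, "price": 4.0,
     "specialPrice": {"fromDate": 1540332000, "tillDate": 1571781600, "value": 2.5}},
    {"id": 3, "price": 3.0, "specialPrice": {"fromDate": None, "tillDate": None, "value": 0}},
]

# Sort order while the special price of product 2 is active, and after it has expired.
during = [p["id"] for p in sorted(products, key=lambda p: effective_price(p, 1550000000))]
after = [p["id"] for p in sorted(products, key=lambda p: effective_price(p, 1600000000))]
print(during, after)
```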
I've updated the below query post your clarifications. Let me know if that works!
POST dateindex/_search
{
"query":{
"match_all":{ // you can ignore this, I used this to test at my end
}
},
"sort":{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":" params.param = ((doc['specialPrice.tillDate'].value - new Date().getTime()) > 0) ? doc['specialPrice.value'].value : doc['price'].value; return params.param;",
"params":{
"param":0.0
}
},
"order":"asc"
}
}
}
I tested this on ES 5.x, where the script body key is inline. On newer versions, try using source instead of inline in the above query.
Hope it helps!

How to query array of objects as part of term query

I am using elasticsearch 5.5.0.
In my index I have data of type attraction. Part of the JSON in Elasticsearch looks like:
"directions": "Exit the M4 at Junction 1",
"phoneNumber": "03333212001",
"website": "https://www.londoneye.com/",
"postCode": "SE1 7PB",
"categories": [
{
"id": "ce4cf4d0-6ddd-49fd-a8fe-3cbf7be9b61d",
"name": "Theater"
},
{
"id": "5fa1a3ce-fd5f-450f-92b7-2be6e3d0df90",
"name": "Family"
},
{
"id": "ed492986-b8a7-43c3-be3d-b17c4055bfa0",
"name": "Outdoors"
}
],
"genres": [],
"featuredImage": "https://www.daysoutguide.co.uk/media/1234/london-eye.jpg",
"images": [],
"region": "London",
My NEST query looks like:
var query2 = Query<Attraction>.Bool(
bq => bq.Filter(
fq => fq.Terms(t => t.Field(f => f.Region).Terms(request.Region.ToLower())),
fq => fq.Terms(t => t.Field(f => f.Categories).Terms(request.Category.ToLower()))));
The query generated looks like:
{
"query": {
"bool": {
"filter": [
{
"terms": {
"region": [
"london"
]
}
},
{
"terms": {
"categories": [
"family"
]
}
}
]
}
}
}
That returns no results. If I take out the categories part, I get results. So I am trying to do a terms filter on categories, which is an array of objects. It looks like I am writing this query wrong. Does anyone have any hints on how to get this to work?
Regards
Ismail
You can still use strongly typed properties access by using:
t.Field(f => f.Categories.First().Name)
NEST's property inferrer will read over .First() and yield categories.name.
t.Field(f => f.Categories[0].Name) works as well.
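With the field inferred as categories.name, the generated request body would presumably look like the one built below. This is a Python sketch with a hypothetical `build_attraction_filter` helper. It assumes name is an analyzed text field, so the lowercased value matches the indexed token; for a keyword field the original casing would be needed.

```python
from typing import Any, Dict


def build_attraction_filter(region: str, category: str) -> Dict[str, Any]:
    """Build the bool/filter body with the category matched on categories.name."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"region": [region.lower()]}},
                    # The terms filter targets the name inside each categories object,
                    # not the categories array itself.
                    {"terms": {"categories.name": [category.lower()]}},
                ]
            }
        }
    }


body = build_attraction_filter("London", "Family")
print(body)
```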

Resources