Elasticsearch boosting slowing query

This is a very novice question, but I'm trying to understand how
boosting certain fields in a document works.
I started with this query:
{
  "from": 0,
  "size": 6,
  "fields": [
    "_id"
  ],
  "sort": {
    "_score": "desc",
    "vendor.name.stored": "asc",
    "item_name.stored": "asc"
  },
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "fields": [
            "_all"
          ],
          "query": "Calprotectin",
          "default_operator": "AND"
        }
      },
      "filter": {
        "and": [
          {
            "query": {
              "query_string": {
                "fields": [
                  "targeted_countries"
                ],
                "query": "All US"
              }
            }
          }
        ]
      }
    }
  }
}
Then I needed to boost certain fields in the document more than the others,
so I did this:
{
  "from": 0,
  "size": 60,
  "fields": [
    "_id"
  ],
  "sort": {
    "_score": "desc",
    "vendor.name.stored": "asc",
    "item_name.stored": "asc"
  },
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "fields": [
            "item_name^4",
            "vendor^4",
            "id_plus_name",
            "category_name^3",
            "targeted_countries",
            "vendor_search_name^4",
            "AdditionalProductInformation^0.5",
            "AskAScientist^0.5",
            "BuyNowURL^0.5",
            "Concentration^0.5",
            "ProductLine^0.5",
            "Quantity^0.5",
            "URL^0.5",
            "Activity^1",
            "Form^1",
            "Immunogen^1",
            "Isotype^1",
            "Keywords^1",
            "Matrix^1",
            "MolecularWeight^1",
            "PoreSize^1",
            "Purity^1",
            "References^1",
            "RegulatoryStatus^1",
            "Specifications/Features^1",
            "Speed^1",
            "Target/MoleculeDescriptor^1",
            "Time^1",
            "Description^2",
            "Domain/Region/Terminus^2",
            "Method^2",
            "NCBIGeneAliases^2",
            "Primary/Secondary^2",
            "Source/ExpressionSystem^2",
            "Target/MoleculeSynonym^2",
            "Applications^3",
            "Category^3",
            "Conjugate/Tag/Label^3",
            "Detection^3",
            "GeneName^3",
            "Host^3",
            "ModificationType^3",
            "Modifications^3",
            "MoleculeName^3",
            "Reactivity^3",
            "Species^3",
            "Target^3",
            "Type^3",
            "AccessionNumber^4",
            "Brand/Trademark^4",
            "CatalogNumber^4",
            "Clone^4",
            "entrezGeneID^4",
            "GeneSymbol^4",
            "OriginalItemName^4",
            "Sequence^4",
            "SwissProtID^4",
            "option.AntibodyProducts^4",
            "option.AntibodyRanges&Modifications^1",
            "option.Applications^4",
            "option.Conjugate^3",
            "option.GeneID^4",
            "option.HostSpecies^3",
            "option.Isotype^3",
            "option.Primary/Secondary^2",
            "option.Reactivity^4",
            "option.Search^1",
            "option.TargetName^1",
            "option.Type^4"
          ],
          "query": "Calprotectin",
          "default_operator": "AND"
        }
      },
      "filter": {
        "and": [
          {
            "query": {
              "query_string": {
                "fields": [
                  "targeted_countries"
                ],
                "query": "All US"
              }
            }
          }
        ]
      }
    }
  }
}
The query slowed down considerably. Am I doing this correctly? Is there a
way to speed it up? I'm currently working on applying the boosts when I index the document, but boosting in the query like this suits the way my application works best. Any help is much appreciated.

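Query-time boosting assigns a larger weight to matches in particular fields, for that request only. It has a cost, though: with more than 60 fields listed, query_string builds a separate clause for every field, whereas your original query searched the single _all field, so a slowdown is expected. If you want to boost a field permanently, use index-time boosting instead (note that index-time boosting was deprecated in Elasticsearch 5.0, so this applies only to older versions). If you don't want this boosting all the time, it makes sense to create a separate mapping just for it, with "store": "no" set.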
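One practical way to experiment with fewer boosted fields is to keep the weights in one place and generate the "fields" list from them. A minimal Python sketch, assuming the query body is assembled as a plain dict before it is sent; boosted_fields and boosted_query are hypothetical helper names:

```python
# Hypothetical helpers: keep the per-field boosts in one dict and turn them
# into the "field^boost" strings that query_string's "fields" option accepts.
def boosted_fields(weights):
    fields = []
    for name, boost in weights.items():
        # A boost of 1 is the default, so the bare field name is equivalent.
        fields.append(name if boost == 1 else f"{name}^{boost}")
    return fields


def boosted_query(text, weights):
    """Build a query_string clause over the boosted fields."""
    return {
        "query_string": {
            "fields": boosted_fields(weights),
            "query": text,
            "default_operator": "AND",
        }
    }


# A few of the weights from the question.
weights = {"item_name": 4, "category_name": 3, "Description": 2, "Keywords": 1, "URL": 0.5}
print(boosted_query("Calprotectin", weights)["query_string"]["fields"])
```

Dropping an entry from the dict removes its per-field clause from the generated query, which makes it easy to measure how much each group of fields costs.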

Related

Why does Elasticsearch score these documents the way it does?

I have a query where I'm trying to pull documents out of my index and sort them by date. Additionally, if a document's ID matches a provided one, I boost that result.
When I run my query, I notice that some documents with a more recent sort date are not at the top of the results, because Elasticsearch gives them a different score from other documents. As a result, my results come back in the wrong order. I don't see anything in my query that could be affecting the score. Does anyone have any idea what's happening?
Here's the query I'm using:
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "language.keyword": {
                  "query": "english",
                  "operator": "OR",
                  "boost": 1
                }
              }
            }
          ],
          "adjust_pure_negative": true,
          "boost": 1
        }
      },
      "functions": [
        {
          "filter": {
            "match": {
              "id": {
                "query": "ID1",
                "operator": "OR",
                "boost": 1
              }
            }
          },
          "weight": 10
        }
      ],
      "score_mode": "multiply",
      "boost_mode": "multiply",
      "boost": 1
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "sortDate": {
        "order": "desc"
      }
    }
  ]
}
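Because _score is the first sort key, sortDate is only consulted when two documents have exactly the same score. A toy Python model of the ranking, assuming boost_mode multiply turns the score into base score × 10 for the document matching ID1 and leaves every other document at base score × 1 (an assumption about documents that no function matches); the documents and base scores are hypothetical:

```python
# Toy model of the ranking. Assumes boost_mode "multiply", a weight of 10 for
# the document whose id matches, and a factor of 1 for every other document.
from datetime import date


def final_score(base_score, doc_id, boosted_id="ID1", weight=10):
    return base_score * (weight if doc_id == boosted_id else 1)


def rank(docs):
    # Sort by _score descending first; sortDate only breaks exact ties.
    return sorted(
        docs,
        key=lambda d: (-final_score(d["score"], d["id"]), -d["sortDate"].toordinal()),
    )


# Hypothetical base scores from the "language.keyword" match clause.
docs = [
    {"id": "A", "score": 0.2876, "sortDate": date(2020, 5, 1)},
    {"id": "B", "score": 0.2877, "sortDate": date(2019, 1, 1)},
    {"id": "ID1", "score": 0.2876, "sortDate": date(2018, 1, 1)},
]
print([d["id"] for d in rank(docs)])
```

Document B is older than A but ranks above it, because its base score is slightly higher; any difference in _score, however small, outranks the date.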

Elasticsearch Query NOT searching in the specified fields

I am struggling with an Elasticsearch query. In the fields option we have specified "*", which means it should search all fields, and we have also given higher weights to a few fields. But it isn't working as it should.
This query was written by my colleague; it'd be great if you could explain it and point out the solution. Here's my query:
{
  "query": {
    "bool": {
      "must": [
        {
          "simple_query_string": {
            "query": "Atoms for Peace",
            "default_operator": "AND",
            "flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
            "fields": [
              "*",
              "systemNumber^5",
              "global_search",
              "objectType^2",
              "partTypes.text",
              "partTypes.id",
              "gs_am_people^2",
              "gs_am_person^2",
              "gs_am_org^2",
              "gs_title^2",
              "_currentLocation.displayName",
              "briefDescription",
              "physicalDescription",
              "summaryDescription",
              "_flatPersonsNameId",
              "_flatPeoplesNameId",
              "_flatOrganisationsNameId",
              "_primaryDate",
              "_primaryDateEarliest",
              "_primaryDateLatest"
            ]
          }
        }
      ]
    }
  }
}
Your query is fine, but it will not search fields of the "nested" data type.
From the docs:
"Searching across all eligible fields does not include nested documents. Use a nested query to search those documents."
You need to add a nested query for the nested field (record below is the nested path; use the one from your mapping):
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "simple_query_string": {
            "query": "Atoms for Peace",
            "default_operator": "AND",
            "flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
            "fields": [
              "*",
              "systemNumber^5",
              "global_search",
              "objectType^2",
              "partTypes.text",
              "partTypes.id",
              "gs_am_people^2",
              "gs_am_person^2",
              "gs_am_org^2",
              "gs_title^2",
              "_currentLocation.displayName",
              "briefDescription",
              "physicalDescription",
              "summaryDescription",
              "_flatPersonsNameId",
              "_flatPeoplesNameId",
              "_flatOrganisationsNameId",
              "_primaryDate",
              "_primaryDateEarliest",
              "_primaryDateLatest"
            ]
          }
        },
        {
          "nested": {
            "path": "record",
            "query": {
              "simple_query_string": {
                "query": "Atoms for Peace",
                "default_operator": "AND",
                "flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
                "fields": [
                  "*"
                ]
              }
            }
          }
        }
      ]
    }
  }
}
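If the mapping has more than one nested field, each nested path needs its own nested clause. A minimal Python sketch that builds the same bool/should structure for a list of paths; the helper name and any path other than record are hypothetical, and the nested clause targets the nested fields with a path.* pattern (an assumption, a narrower alternative to the bare "*" used above):

```python
import json


def search_with_nested(text, fields, nested_paths):
    """Build a bool query over top-level fields plus one clause per nested path.

    nested_paths is a hypothetical list of nested field paths from your mapping.
    """
    def sqs(field_list):
        # Same simple_query_string settings as the query above.
        return {
            "simple_query_string": {
                "query": text,
                "default_operator": "AND",
                "flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
                "fields": field_list,
            }
        }

    should = [sqs(fields)]
    for path in nested_paths:
        # A top-level "*" does not reach nested documents, so each nested
        # path gets its own nested query over that path's fields.
        should.append({"nested": {"path": path, "query": sqs([f"{path}.*"])}})
    return {"query": {"bool": {"minimum_should_match": 1, "should": should}}}


body = search_with_nested("Atoms for Peace", ["*", "systemNumber^5", "gs_title^2"], ["record"])
print(json.dumps(body, indent=2))
```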

Query String in Elasticsearch 6.4.3

I want to know the difference between these two queries:
{
  "size": "1",
  "from": "0",
  "track_scores": true,
  "sort": [
    {
      "employee_id": "asc"
    }
  ],
  "query": {
    "query_string": {
      "fields": [
        "content",
        "title"
      ],
      "query": "\"Macro Medium\""
    }
  }
}
compared with this one:
{
  "size": "1",
  "from": "0",
  "track_scores": true,
  "sort": [
    {
      "employee_id": "asc"
    }
  ],
  "query": {
    "query_string": {
      "fields": [
        "content",
        "title"
      ],
      "query": "Macro Medium"
    }
  }
}
I want to know the difference between "query": "\"Macro Medium\"" and "query": "Macro Medium" in Elasticsearch 6.4.3. Any feedback would be appreciated.
Thanks
query_string analyzes your string with the field's analyzer (the standard analyzer by default) and breaks it into two terms, macro and medium. By default these terms are combined with OR, so a document matching either one is returned; you can change this with "default_operator": "AND".
With \"Macro Medium\" (escaped double quotes) the text becomes a phrase query: it is still analyzed, but both terms must appear next to each other, in that order.
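The difference can be illustrated with a toy model in Python. It assumes a lowercase whitespace split in place of the real standard analyzer and checks token lists directly instead of querying an index, so it shows only the matching rules, not scoring; the sample documents are hypothetical:

```python
# Toy model only: a lowercase whitespace split stands in for the standard
# analyzer, and matching is done on token lists, not on a real index.
def tokens(text):
    return text.lower().split()


def matches_terms(doc, query, operator="OR"):
    doc_tokens, query_tokens = set(tokens(doc)), tokens(query)
    hits = [t in doc_tokens for t in query_tokens]
    return all(hits) if operator == "AND" else any(hits)


def matches_phrase(doc, phrase):
    # A phrase query needs the terms next to each other, in order.
    d, p = tokens(doc), tokens(phrase)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))


docs = ["Macro Medium lens", "Medium format macro lens", "Macro photography"]
print([matches_terms(d, "Macro Medium") for d in docs])         # "Macro Medium", default OR
print([matches_terms(d, "Macro Medium", "AND") for d in docs])  # with "default_operator": "AND"
print([matches_phrase(d, "Macro Medium") for d in docs])        # "\"Macro Medium\""
```

The second document contains both words but not side by side, so it matches with AND yet fails the phrase.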

Need Some Help in Query String in ElasticSearch 6.4.3

Suppose I want to count the number of matching results with
POST /_count
using the following JSON body:
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "fields": [
            "content",
            "title"
          ],
          "query": "Winter is coming"
        }
      },
      "filter": {
        "range": {
          "employee_id": {
            "gte": "34222232"
          }
        }
      }
    }
  }
}
What do the following parts of the query mean?
"query_string": {
  "fields": [
    "content",
    "title"
  ],
  "query": "Winter is coming"
}
and this one:
"filter": {
  "range": {
    "employee_id": {
      "gte": "34222232"
    }
  }
}
Any comment would be appreciated. Thanks
The query_string query lets you search for text in multiple fields. In this case, you're searching for the tokens winter, is and coming in the content and title fields; with the default OR operator, a document needs to contain only one of them to match.
"query_string": {
  "fields": [
    "content",
    "title"
  ],
  "query": "Winter is coming"
}
The range query is a term-level query that lets you filter on the value of a field. In this case, you're considering only documents whose employee_id field is greater than or equal to (gte) 34222232.
"filter": {
  "range": {
    "employee_id": {
      "gte": "34222232"
    }
  }
}
Together, they mean that you're looking for documents with employee_id >= 34222232 whose title or content fields contain the tokens of Winter is coming.
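A toy Python model of the same bool logic, under stated simplifications: lowercase whitespace tokenizing stands in for the standard analyzer, title and content are treated as one bag of tokens, and the documents and the count_matches helper are hypothetical:

```python
# Toy model of the bool query: the range filter only includes or excludes
# documents, while the query_string part decides which of those match.
def count_matches(docs, text, min_employee_id):
    query_tokens = set(text.lower().split())
    count = 0
    for doc in docs:
        if doc["employee_id"] < min_employee_id:  # "range": {"gte": ...}
            continue
        # Searching "title" and "content" is approximated by one token set.
        doc_tokens = set((doc["title"] + " " + doc["content"]).lower().split())
        if query_tokens & doc_tokens:  # default_operator OR: any token is enough
            count += 1
    return count


# Hypothetical documents.
docs = [
    {"employee_id": 34222232, "title": "Winter", "content": "snow"},
    {"employee_id": 34222240, "title": "Summer", "content": "sun is out"},
    {"employee_id": 10, "title": "Winter is coming", "content": ""},
    {"employee_id": 34222300, "title": "Autumn", "content": "leaves"},
]
print(count_matches(docs, "Winter is coming", 34222232))
```

The second document is counted only because it contains "is", which shows how permissive the default OR operator is; the third is excluded by the range filter even though it matches every token.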

Elasticsearch Multiple queries with limit

I am trying to write an Elasticsearch query that matches multiple search terms in my Title and Description fields. The code below works, but it returns all the articles matching those terms. My aim is to get 4 articles per search term, e.g. 4 results for Tim Cook and 4 articles for Steve Jobs.
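{
  "query": {
    "bool": {
      "should": [
        { "multi_match": { "query": "Tim Cook", "fields": ["Title", "Description"], "operator": "AND" } },
        { "multi_match": { "query": "Steve Jobs", "fields": ["Title", "Description"], "operator": "AND" } }
      ]
    }
  }
}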
A top_hits aggregation is what you are looking for.
Define two filter aggregations, one per search term, and nest a named top_hits aggregation with "size": 4 inside each one.
Something like the following should work:
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        { "multi_match": { "query": "Tim Cook", "fields": ["Title", "Description"], "operator": "AND" } },
        { "multi_match": { "query": "Steve Jobs", "fields": ["Title", "Description"], "operator": "AND" } }
      ]
    }
  },
  "aggs": {
    "tim": {
      "filter": {
        "multi_match": {
          "query": "Tim Cook",
          "fields": ["Title", "Description"],
          "operator": "AND"
        }
      },
      "aggs": {
        "top_articles": {
          "top_hits": { "size": 4 }
        }
      }
    },
    "steve": {
      "filter": {
        "multi_match": {
          "query": "Steve Jobs",
          "fields": ["Title", "Description"],
          "operator": "AND"
        }
      },
      "aggs": {
        "top_articles": {
          "top_hits": { "size": 4 }
        }
      }
    }
  }
}
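If the number of search terms varies, the body can be generated instead of written by hand. A sketch in Python, assuming the body is sent as-is to the _search endpoint; per_term_top_hits, hits_per_term and the term_0/term_1/top_articles aggregation names are hypothetical:

```python
# Sketch: one filter aggregation plus a nested top_hits per search term.
# All aggregation names here are hypothetical.
def per_term_top_hits(terms, fields, per_term=4):
    def clause(term):
        return {"multi_match": {"query": term, "fields": fields, "operator": "AND"}}

    aggs = {}
    for i, term in enumerate(terms):
        aggs[f"term_{i}"] = {
            "filter": clause(term),
            "aggs": {"top_articles": {"top_hits": {"size": per_term}}},
        }
    return {
        "size": 0,  # only the per-term buckets are needed, not the main hits
        "query": {"bool": {"should": [clause(t) for t in terms]}},
        "aggs": aggs,
    }


def hits_per_term(response, terms):
    # top_hits nests its documents under hits.hits, like a normal search response.
    return {
        term: response["aggregations"][f"term_{i}"]["top_articles"]["hits"]["hits"]
        for i, term in enumerate(terms)
    }


terms = ["Tim Cook", "Steve Jobs"]
body = per_term_top_hits(terms, ["Title", "Description"])
print(sorted(body["aggs"]))
```

Keying the aggregations by position keeps their names simple regardless of what the search terms contain; hits_per_term maps them back to the original terms.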
