ElasticSearch: Ignore frequency in multi_match query - elasticsearch

I have query like this:
"body": {
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"multi_match": {
"query": "iphone",
"fields": [
"primary.titles^2",
"primary.descriptions^1"
],
"fuzziness": 1,
"prefix_length": 5,
"max_expansions": 25,
"operator": "and"
}
}
],
"must": []
}
}
}
The problem is that docs with title:"iphone/iphone" are scoring much higher than with title:"iphone", is there any way to ignore repeating of term in search scoring?

Related

Elasticsearch wildcard search: when I add space to a query everything falls apart

I have a English dictionary Index, I search fields with the following JSON
GET /words/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"filter": [
{
"bool": {
"should": [
{
"query_string": {
"default_field": "text",
"query": "*orhan*"
}
}
]
}
}
]
}
},
"track_total_hits": true
}
My query purpose is to get all records if they include orhan name, and after running the query I get the results as expected;
_id
_idex
_score
_type
text
B6F1eoQBu3ncIuw4CyKL
words
0.0
_doc
orhan
cKN5eoQBu3ncIuw4JgxK
words
0.0
_doc
vorhand
drDzzYQBu3ncIuw4vn10
words
0.0
_doc
orhan second word
I modify my query and I try the search orhan s but everything falls apart, the whole 54665 records were shown to me.
###
"bool": {
"should": [
{
"query_string": {
"default_field": "text",
"query": "*orhan s*" // <- modified
}
###
I can't add the whole response :) But I can provide,
Response of total value:
"total": {
"value": 54665,
"relation": "eq"
},
My response shouldn't include the whole record just related records shown to me
Query: "query": "*orhan s*"
Response:
_id
_idex
_score
_type
text
drDzzYQBu3ncIuw4vn10
words
0.0
_doc
orhan second word
That's because the default operator is an OR, so you are catching all the words finishing with orhan OR starting with s.
You can change the operator:
GET /words/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"filter": [
{
"bool": {
"should": [
{
"query_string": {
"default_field": "text",
"query": "*orhan s*",
"default_operator": "AND"
}
}
]
}
}
]
}
},
"track_total_hits": true
}
Or add the operator to the query directly:
GET /test_words/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"filter": [
{
"bool": {
"should": [
{
"query_string": {
"default_field": "text",
"query": "*orhan AND s*"
}
}
]
}
}
]
}
},
"track_total_hits": true
}

Elasticsearch Query NOT searching in the specified fields

I am struggling with an elasticsearch query. In the fields option, we have specified '*' which means it should look in all fields as well as given the higher weights to a few fields. But it isn't working as it should.
This query was written by my colleague, it'd be great if you could explain it as well as point out the solution. Here's my query:
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Atoms for Peace",
"default_operator": "AND",
"flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
"fields": [
"*",
"systemNumber^5",
"global_search",
"objectType^2",
"partTypes.text",
"partTypes.id",
"gs_am_people^2",
"gs_am_person^2",
"gs_am_org^2",
"gs_title^2",
"_currentLocation.displayName",
"briefDescription",
"physicalDescription",
"summaryDescription",
"_flatPersonsNameId",
"_flatPeoplesNameId",
"_flatOrganisationsNameId",
"_primaryDate",
"_primaryDateEarliest",
"_primaryDateLatest"
]
}
}
]
}
}
Your query is fine but it will not work on field with "nested" data type.
From doc
Searching across all eligible fields does not include nested documents. Use a nested query to search those documents.
You need to use nested query
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"simple_query_string": {
"query": "Atoms for Peace",
"default_operator": "AND",
"flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
"fields": [
"*",
"systemNumber^5",
"global_search",
"objectType^2",
"partTypes.text",
"partTypes.id",
"gs_am_people^2",
"gs_am_person^2",
"gs_am_org^2",
"gs_title^2",
"_currentLocation.displayName",
"briefDescription",
"physicalDescription",
"summaryDescription",
"_flatPersonsNameId",
"_flatPeoplesNameId",
"_flatOrganisationsNameId",
"_primaryDate",
"_primaryDateEarliest",
"_primaryDateLatest"
]
}
},
{
"nested": {
"path": "record",
"query": {
"simple_query_string": {
"query": "Atoms for Peace",
"default_operator": "AND",
"flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
"fields": [
"*"
]
}
}
}
}
]
}
}
}

elasticsearch cross fields query alternative for fuzziness?

I have a cross-fields query, and I understand already that you cant use fuzziness with cross-fields queries, but I dont understand the alternative...
this is my simple query:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "John Legend",
"fields": [
"fname^-4.0",
"lname^-1.0",
"city^-1.0",
],
"type": "cross_fields",
"lenient": "true",
"operator": "AND"
}
}
],
"minimum_should_match": "1"
}
},
"from": 0,
"size": 20
}
I want to be able to find:
John Legend
Joh
John Lege
is that possible?

Is it possible to use fuzziness for only one field in a multi_match query?

I am using the following multi_match query in Elasticsearch and I am wondering if I can use fuzziness only for "friendly_name field". I have tried different things but doesn't seem to work. I am also wondering if it possible to use an analyzer to get a similar result as the fuzziness does:
"query": {
"multi_match": {
"query": "input query",
"fields": ["code_short", "code_word","friendly_name"],
"minimum_should_match": "2"
} }, "_source": ["code", "friendly_name"]
Any help would be appreciated. Thanks.
If you only need query on one field , you don't need multi match
"match": {
"name": {
"query": "your query",
"fuzziness": "1.5",
"prefix_length": 0,
"max_expansions": 100,
"minimum_should_match": "80%"
}
}
I don't believe that you can fully replace fuzziness, but you have 2 options to explore that might work for you. ngram filter or stemmer filter.
======
Well it wasn't very clear to me what you've intended. But you can do your query that way:
"query": {
"bool": {
"should": [
{
"match": {
"friendly_name": {
"query": "text",
"fuzziness": "1.5",
"prefix_length": 0,
"max_expansions": 100
}
}
},
{
"match": {
"code_word": {
"query": "text"
}
}
},
{
"match": {
"code_short": {
"query": "text"
}
}
}
],
"minimum_should_match" : 2
}
}

elasticsearch boosting slowing query

this is a very novice question but I'm trying to understand how
boosting certain elements in a document works.
I started with this query,
{
"from": 0,
"size": 6,
"fields": [
"_id"
],
"sort": {
"_score": "desc",
"vendor.name.stored": "asc",
"item_name.stored": "asc"
},
"query": {
"filtered": {
"query": {
"query_string": {
"fields": [
"_all"
],
"query": "Calprotectin",
"default_operator": "AND"
}
},
"filter": {
"and": [
{
"query": {
"query_string": {
"fields": [
"targeted_countries"
],
"query": "All US"
}
}
}
]
}
}
}
}
then i needed to boost certain elements in the document more than the others
so I did this
{
"from": 0,
"size": 60,
"fields": [
"_id"
],
"sort": {
"_score": "desc",
"vendor.name.stored": "asc",
"item_name.stored": "asc"
},
"query": {
"filtered": {
"query": {
"query_string": {
"fields": [
"item_name^4",
"vendor^4",
"id_plus_name",
"category_name^3",
"targeted_countries",
"vendor_search_name^4",
"AdditionalProductInformation^0.5",
"AskAScientist^0.5",
"BuyNowURL^0.5",
"Concentration^0.5",
"ProductLine^0.5",
"Quantity^0.5",
"URL^0.5",
"Activity^1",
"Form^1",
"Immunogen^1",
"Isotype^1",
"Keywords^1",
"Matrix^1",
"MolecularWeight^1",
"PoreSize^1",
"Purity^1",
"References^1",
"RegulatoryStatus^1",
"Specifications/Features^1",
"Speed^1",
"Target/MoleculeDescriptor^1",
"Time^1",
"Description^2",
"Domain/Region/Terminus^2",
"Method^2",
"NCBIGeneAliases^2",
"Primary/Secondary^2",
"Source/ExpressionSystem^2",
"Target/MoleculeSynonym^2",
"Applications^3",
"Category^3",
"Conjugate/Tag/Label^3",
"Detection^3",
"GeneName^3",
"Host^3",
"ModificationType^3",
"Modifications^3",
"MoleculeName^3",
"Reactivity^3",
"Species^3",
"Target^3",
"Type^3",
"AccessionNumber^4",
"Brand/Trademark^4",
"CatalogNumber^4",
"Clone^4",
"entrezGeneID^4",
"GeneSymbol^4",
"OriginalItemName^4",
"Sequence^4",
"SwissProtID^4",
"option.AntibodyProducts^4",
"option.AntibodyRanges&Modifications^1",
"option.Applications^4",
"option.Conjugate^3",
"option.GeneID^4",
"option.HostSpecies^3",
"option.Isotype^3",
"option.Primary/Secondary^2",
"option.Reactivity^4",
"option.Search^1",
"option.TargetName^1",
"option.Type^4"
],
"query": "Calprotectin",
"default_operator": "AND"
}
},
"filter": {
"and": [
{
"query": {
"query_string": {
"fields": [
"targeted_countries"
],
"query": "All US"
}
}
}
]
}
}
}
}
the query slowed down considerably, am I doing this correctly? Is there a
way to speed it up? I'm currently in the process of doing the boosting when I index the document, but using it in the query that way is best for the way my application runs. Any help is much appreciated
Query time boosting is used for assigning larger weight to a term. If you want to permanently boost a field, use index time boosting. If you don't want to use this boosting all the time, then it makes sense to create a separate mapping just for it with store: "no" set.

Resources