elasticsearch cross fields query alternative for fuzziness?

I have a cross_fields query, and I already understand that you can't use fuzziness with cross_fields queries, but I don't understand the alternative.
This is my simple query:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "John Legend",
"fields": [
"fname^-4.0",
"lname^-1.0",
"city^-1.0"
],
"type": "cross_fields",
"lenient": "true",
"operator": "AND"
}
}
],
"minimum_should_match": "1"
}
},
"from": 0,
"size": 20
}
I want to be able to find:
John Legend
Joh
John Lege
Is that possible?
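One possible route, offered as a sketch rather than a confirmed answer: copy the name fields into a single combined field at index time, then query that field with match_bool_prefix, which treats the last term as a prefix and accepts fuzziness for the terms before it. The field name full_name is hypothetical, and the sketch assumes an Elasticsearch version that has match_bool_prefix and typeless mappings (7.2 or later). The Python below only builds the request bodies as dicts; sending them to a cluster is left out.

```python
import json

# Hypothetical combined field; fname, lname and city come from the question.
COMBINED_FIELD = "full_name"


def build_mapping():
    """Mapping that copies each source field into one combined text field."""
    source_fields = ["fname", "lname", "city"]
    properties = {
        name: {"type": "text", "copy_to": COMBINED_FIELD} for name in source_fields
    }
    properties[COMBINED_FIELD] = {"type": "text"}
    return {"mappings": {"properties": properties}}


def build_query(text, size=20):
    """Fuzzy, prefix-tolerant query on the combined field.

    match_bool_prefix treats the final term as a prefix (e.g. "Lege")
    and applies fuzziness only to the terms before it.
    """
    return {
        "query": {
            "match_bool_prefix": {
                COMBINED_FIELD: {
                    "query": text,
                    "operator": "and",
                    "fuzziness": "AUTO",
                }
            }
        },
        "from": 0,
        "size": size,
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
    for text in ["John Legend", "Joh", "John Lege"]:
        print(json.dumps(build_query(text), indent=2))
```

Because all terms live in one field, the AND operator no longer needs cross_fields to span fname and lname. Per-field boosts are lost in this approach, so it is a trade-off rather than a drop-in replacement.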

Related

Elasticsearch Query NOT searching in the specified fields

I am struggling with an Elasticsearch query. In the fields option we have specified '*', which means it should search all fields, and we have also given higher weights to a few fields. But it isn't working as expected.
This query was written by my colleague, so it would be great if you could explain it as well as point out the solution. Here's my query:
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Atoms for Peace",
"default_operator": "AND",
"flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
"fields": [
"*",
"systemNumber^5",
"global_search",
"objectType^2",
"partTypes.text",
"partTypes.id",
"gs_am_people^2",
"gs_am_person^2",
"gs_am_org^2",
"gs_title^2",
"_currentLocation.displayName",
"briefDescription",
"physicalDescription",
"summaryDescription",
"_flatPersonsNameId",
"_flatPeoplesNameId",
"_flatOrganisationsNameId",
"_primaryDate",
"_primaryDateEarliest",
"_primaryDateLatest"
]
}
}
]
}
}
}
Your query is fine, but it will not search fields with the "nested" data type.
From the docs:
Searching across all eligible fields does not include nested documents. Use a nested query to search those documents.
You need to add a nested query:
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"simple_query_string": {
"query": "Atoms for Peace",
"default_operator": "AND",
"flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
"fields": [
"*",
"systemNumber^5",
"global_search",
"objectType^2",
"partTypes.text",
"partTypes.id",
"gs_am_people^2",
"gs_am_person^2",
"gs_am_org^2",
"gs_title^2",
"_currentLocation.displayName",
"briefDescription",
"physicalDescription",
"summaryDescription",
"_flatPersonsNameId",
"_flatPeoplesNameId",
"_flatOrganisationsNameId",
"_primaryDate",
"_primaryDateEarliest",
"_primaryDateLatest"
]
}
},
{
"nested": {
"path": "record",
"query": {
"simple_query_string": {
"query": "Atoms for Peace",
"default_operator": "AND",
"flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
"fields": [
"*"
]
}
}
}
}
]
}
}
}
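If the index has more than one nested path, the same pattern can be generated rather than written by hand. This is a minimal sketch that only builds the request body as a Python dict; the helper names are made up for illustration, and "record" is the path assumed in the answer above, not something confirmed by the question.

```python
import json


def simple_query(text, fields):
    """simple_query_string clause with the options used in the answer above."""
    return {
        "simple_query_string": {
            "query": text,
            "default_operator": "AND",
            "flags": "PREFIX|PHRASE|NOT|AND|OR|FUZZY|WHITESPACE",
            "fields": fields,
        }
    }


def search_with_nested(text, top_level_fields, nested_paths):
    """Match on top-level fields OR inside any of the given nested paths."""
    should = [simple_query(text, top_level_fields)]
    for path in nested_paths:
        # One nested clause per path, searching all fields as in the answer.
        should.append({"nested": {"path": path, "query": simple_query(text, ["*"])}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    body = search_with_nested("Atoms for Peace", ["*", "systemNumber^5"], ["record"])
    print(json.dumps(body, indent=2))
```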

elastic search query result sorting

I have the query below, which returns the correct data. But I would like the results sorted so that exact matches come first, followed by the rest.
{
"query": {
"query_string": {
"query": "Han* Ol*",
"fields": [
"firstName",
"lastName",
"customerNo"
],
"default_operator": "and",
"analyze_wildcard":true
}
}
}
I want exact matches at the top, then the others, like this:
Hanry Oliver,
M Hanry Oliver,
Hanry Tran Oliver
The term "exact matches" in the context of wildcard queries is ambiguous, if not meaningless. But if I understand you correctly, you could boost the "exact matches" through a phrase_prefix and by adjusting the query string to (Han).* (Ol).*:
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "(Han).* (Ol).*",
"fields": [
"firstName",
"lastName",
"customerNo"
],
"default_operator": "and",
"type": "phrase_prefix",
"analyze_wildcard": true,
"boost": 10
}
},
{
"query_string": {
"query": "Han* Ol*",
"fields": [
"firstName",
"lastName",
"customerNo"
],
"default_operator": "and",
"analyze_wildcard": true
}
}
]
}
}
}

Is it possible to use fuzziness for only one field in a multi_match query?

I am using the following multi_match query in Elasticsearch, and I am wondering whether I can use fuzziness only for the "friendly_name" field. I have tried different things, but none of them seem to work. I am also wondering whether it is possible to use an analyzer to get a result similar to what fuzziness gives:
{
"query": {
"multi_match": {
"query": "input query",
"fields": ["code_short", "code_word", "friendly_name"],
"minimum_should_match": "2"
}
},
"_source": ["code", "friendly_name"]
}
Any help would be appreciated. Thanks.
If you only need to query one field, you don't need multi_match:
"match": {
"name": {
"query": "your query",
"fuzziness": "1.5",
"prefix_length": 0,
"max_expansions": 100,
"minimum_should_match": "80%"
}
}
I don't believe you can fully replace fuzziness, but there are two options worth exploring that might work for you: an ngram filter or a stemmer filter.
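To make the ngram option concrete, here is a rough sketch of index settings and a matching query, built as Python dicts. The filter and analyzer names (name_ngrams, name_ngram_analyzer) are made up for illustration, the typeless mappings layout assumes Elasticsearch 7 or later, and the gram sizes are a starting point to tune, not a recommendation.

```python
import json


def ngram_index_body(field="friendly_name", min_gram=2, max_gram=3):
    """Index settings and mapping that analyze `field` into character n-grams.

    Assumption: max_gram - min_gram stays within the index's max_ngram_diff
    setting (1 by default in recent versions).
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "name_ngrams": {
                        "type": "ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                    }
                },
                "analyzer": {
                    "name_ngram_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "name_ngrams"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": "name_ngram_analyzer"}
            }
        },
    }


def ngram_match(field, text, minimum_should_match="80%"):
    """Plain match query; a misspelled word still shares most of its n-grams."""
    return {
        "query": {
            "match": {
                field: {"query": text, "minimum_should_match": minimum_should_match}
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(ngram_index_body(), indent=2))
    print(json.dumps(ngram_match("friendly_name", "input query"), indent=2))
```

The typo tolerance here comes from overlapping n-grams rather than edit distance, so results will differ from true fuzziness, and the index grows because every word is stored as many short tokens.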
======
It wasn't very clear to me what you intended, but you can write your query this way:
"query": {
"bool": {
"should": [
{
"match": {
"friendly_name": {
"query": "text",
"fuzziness": "1.5",
"prefix_length": 0,
"max_expansions": 100
}
}
},
{
"match": {
"code_word": {
"query": "text"
}
}
},
{
"match": {
"code_short": {
"query": "text"
}
}
}
],
"minimum_should_match" : 2
}
}

Elasticsearch multiple queries with limit

I am trying to write an Elasticsearch query that matches multiple phrases in my title and description. The code below works, but it returns all the articles matching those words. My aim is to get 4 articles per query phrase, e.g. 4 results for Tim Cook and 4 articles for Steve Jobs.
{
"query": {
"multi_match": {
"query": ["Tim Cook","Steve Jobs"],
"fields": ["Title", "Description" ],
"operator":"AND"
}
}
}
A top hits aggregation is what you are looking for.
Basically, define 2 filter aggregations and nest a top hits aggregation inside each of them.
Something like the query below should work:
{
"size": 0,
"query": {
"multi_match": {
"query": [
"Tim Cook",
"Steve Jobs"
],
"fields": [
"Title",
"Description"
],
"operator": "AND"
}
},
"aggs": {
"tim": {
"aggs": {
"top_articles": {
"top_hits": {
"size": 4
}
}
},
"filter": {
"query": {
"multi_match": {
"query": [
"Tim Cook"
],
"fields": [
"Title",
"Description"
],
"operator": "AND"
}
}
}
},
"steve": {
"aggs": {
"top_articles": {
"top_hits": {
"size": 4
}
}
},
"filter": {
"query": {
"multi_match": {
"query": [
"Steve Jobs"
],
"fields": [
"Title",
"Description"
],
"operator": "AND"
}
}
}
}
}
}
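The same request can be generated for any list of phrases. The sketch below builds the body as a Python dict under a few assumptions: the helper names and the top_articles aggregation name are made up, the filter aggregation takes the query clause directly (the form used by newer Elasticsearch versions, unlike the "filter": {"query": ...} wrapper above), and the top-level query is written as a bool of one clause per phrase.

```python
import json
import re


def phrase_clause(phrase, fields):
    """multi_match clause requiring every word of `phrase` to match."""
    return {"multi_match": {"query": phrase, "fields": fields, "operator": "AND"}}


def top_hits_per_phrase(phrases, fields, per_phrase=4):
    """One filter aggregation per phrase, each returning its own top hits."""
    aggs = {}
    for phrase in phrases:
        # Aggregation name derived from the phrase, e.g. "Tim Cook" -> "tim_cook".
        name = re.sub(r"\W+", "_", phrase.lower()).strip("_")
        aggs[name] = {
            "filter": phrase_clause(phrase, fields),
            "aggs": {"top_articles": {"top_hits": {"size": per_phrase}}},
        }
    return {
        "size": 0,  # hits are read from the aggregations, not the main result list
        "query": {
            "bool": {
                "should": [phrase_clause(p, fields) for p in phrases],
                "minimum_should_match": 1,
            }
        },
        "aggs": aggs,
    }


if __name__ == "__main__":
    body = top_hits_per_phrase(["Tim Cook", "Steve Jobs"], ["Title", "Description"])
    print(json.dumps(body, indent=2))
```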

elasticsearch boosting slowing query

This is a very novice question, but I'm trying to understand how
boosting certain elements in a document works.
I started with this query:
{
"from": 0,
"size": 6,
"fields": [
"_id"
],
"sort": {
"_score": "desc",
"vendor.name.stored": "asc",
"item_name.stored": "asc"
},
"query": {
"filtered": {
"query": {
"query_string": {
"fields": [
"_all"
],
"query": "Calprotectin",
"default_operator": "AND"
}
},
"filter": {
"and": [
{
"query": {
"query_string": {
"fields": [
"targeted_countries"
],
"query": "All US"
}
}
}
]
}
}
}
}
Then I needed to boost certain elements in the document more than the others,
so I did this:
{
"from": 0,
"size": 60,
"fields": [
"_id"
],
"sort": {
"_score": "desc",
"vendor.name.stored": "asc",
"item_name.stored": "asc"
},
"query": {
"filtered": {
"query": {
"query_string": {
"fields": [
"item_name^4",
"vendor^4",
"id_plus_name",
"category_name^3",
"targeted_countries",
"vendor_search_name^4",
"AdditionalProductInformation^0.5",
"AskAScientist^0.5",
"BuyNowURL^0.5",
"Concentration^0.5",
"ProductLine^0.5",
"Quantity^0.5",
"URL^0.5",
"Activity^1",
"Form^1",
"Immunogen^1",
"Isotype^1",
"Keywords^1",
"Matrix^1",
"MolecularWeight^1",
"PoreSize^1",
"Purity^1",
"References^1",
"RegulatoryStatus^1",
"Specifications/Features^1",
"Speed^1",
"Target/MoleculeDescriptor^1",
"Time^1",
"Description^2",
"Domain/Region/Terminus^2",
"Method^2",
"NCBIGeneAliases^2",
"Primary/Secondary^2",
"Source/ExpressionSystem^2",
"Target/MoleculeSynonym^2",
"Applications^3",
"Category^3",
"Conjugate/Tag/Label^3",
"Detection^3",
"GeneName^3",
"Host^3",
"ModificationType^3",
"Modifications^3",
"MoleculeName^3",
"Reactivity^3",
"Species^3",
"Target^3",
"Type^3",
"AccessionNumber^4",
"Brand/Trademark^4",
"CatalogNumber^4",
"Clone^4",
"entrezGeneID^4",
"GeneSymbol^4",
"OriginalItemName^4",
"Sequence^4",
"SwissProtID^4",
"option.AntibodyProducts^4",
"option.AntibodyRanges&Modifications^1",
"option.Applications^4",
"option.Conjugate^3",
"option.GeneID^4",
"option.HostSpecies^3",
"option.Isotype^3",
"option.Primary/Secondary^2",
"option.Reactivity^4",
"option.Search^1",
"option.TargetName^1",
"option.Type^4"
],
"query": "Calprotectin",
"default_operator": "AND"
}
},
"filter": {
"and": [
{
"query": {
"query_string": {
"fields": [
"targeted_countries"
],
"query": "All US"
}
}
}
]
}
}
}
}
The query slowed down considerably. Am I doing this correctly? Is there a
way to speed it up? I'm currently in the process of doing the boosting when I index the document, but doing it in the query is best for the way my application runs. Any help is much appreciated.
Query-time boosting is used for assigning a larger weight to a term. If you want to permanently boost a field, use index-time boosting. If you don't want to use this boosting all the time, then it makes sense to create a separate mapping just for it with store: "no" set.
