"match_phrase" hit with no highlights returned - elasticsearch

I have an index that contains the full text of books belonging to a specific series. Each document represents one volume in a series, and each volume has a set of nested documents, each corresponding to a section of text in that book. This is the query we use to get highlights matching a specific phrase across all the books of a given series:
{
"from": 0,
"size": 3,
"query": {
"bool": {
"must": [
{
"nested": {
"query": {
"bool": {
"must": [
{
"match": {
"sections.content.phrase": {
"query": "theory legal",
"type": "phrase",
"slop": X
}
}
}
]
}
},
"path": "sections",
"inner_hits": {
"highlight": {
"order": "score",
"fields": {
"sections.content.phrase": {}
}
},
"_source": {
"include": [
"title",
"id"
]
}
}
}
}
],
"filter": [
{
"term": {
"series": "00410"
}
}
]
}
}
}
Normally this query works fine, but for some series we get hits in books with no highlighted text returned. For example, with the above phrase query and series and a slop value of 1, we correctly get a single hit for one book in the series: "(each allegation of discrimination or each <em>theory</em> of <em>legal</em> recovery not required to be set forth in separate". If we take the same query and raise the slop value to 3, we suddenly get hits in 5 different books, each with no matching highlights. Not even the original hit from the slop value of 1 is returned. Why are we getting these results?

Related

Elasticsearch: alternative to cross_fields with fuzziness

I have an Elasticsearch index with the standard analyzer. I would like to perform search queries containing multiple words, e.g. human anatomy. The search should be performed across several fields:
Title
Subject
Description
All the words in the query should be present across the fields (e.g. 'human' in title and 'anatomy' in description). If not all the words are present across these fields, the result shouldn't be returned.
More importantly, I want fuzzy matches. For example, these queries should return approximately the same results as human anatomy:
human anatom
human anatomic
humanic anatomic
etc.
So fuzziness should apply to every word in the query.
As Elasticsearch doesn't support fuzziness for the multi-match cross-fields queries, I have been trying to achieve the desired behaviour this way:
{
"query": {
"bool" : {
"must": [
{
"query": {
"bool":
{
"should": [
{
"match": {
"title": {
"query": "human",
"fuzziness": 2,
}
}
},
{
"match": {
"description": {
"query": "human",
"fuzziness": 2,
}
}
},
{
"match": {
"subject": {
"query": "human",
"fuzziness": 2,
}
}
},
]
}
}
},
{
"query": {
"bool":
{
"should": [
{
"match": {
"title": {
"query": "anatomy",
"fuzziness": 2,
}
}
},
{
"match": {
"description": {
"query": "anatomy",
"fuzziness": 2,
}
}
},
{
"match": {
"subject": {
"query": "anatomy",
"fuzziness": 2,
}
}
},
]
}
}
},
]
}
}
}
The idea behind this code is the following: find the results where
either of the fields contains human (with 2-letter edit distance, e.g.: humane, humon, humanic, etc.)
and
either of the fields contains anatomy (with 2-letter edit distance, e.g.: anatom, anatomic, etc.).
Unfortunately, this code does not work and fails to retrieve a great number of relevant results. For example (the edit distance between corresponding words of the two queries is <= 2):
human anatomic – 0 results
humans anatomy – 21 results
How can I make fuzziness work within the given conditions? Recreating the index with n-gram is currently not an option, so I would like to make fuzziness work.
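The intended logic described above (every word must fuzzily match at least one of the fields) can be sketched as a small request-body builder. This is only a sketch: the field names come from the question, the helper names `levenshtein` and `build_fuzzy_query` are made up for illustration, and placing each nested `bool` directly inside `must` (without the extra `"query"` wrapper) is an assumption about the intended structure. It does not claim to explain the missing results; it only makes the structure and the edit-distance claim easy to check.

```python
from typing import Any, Dict, List

FIELDS = ["title", "subject", "description"]  # field names taken from the question


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete a character from a
                cur[j - 1] + 1,            # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def build_fuzzy_query(text: str, fields: List[str] = FIELDS,
                      fuzziness: int = 2) -> Dict[str, Any]:
    """Build one `must` entry per word; each entry is a `should` over all fields."""
    must = []
    for word in text.split():
        should = [
            {"match": {field: {"query": word, "fuzziness": fuzziness}}}
            for field in fields
        ]
        must.append({"bool": {"should": should}})
    return {"query": {"bool": {"must": must}}}


body = build_fuzzy_query("human anatomy")
```

Serializing `body` with `json.dumps` also avoids the trailing commas present in the hand-written request, which strict JSON parsers reject.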

prefixQuery in Elastic search not working

{
"from": 0,
"size": 100,
"timeout": "10m",
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"input.custom_attrs.index": {
"value": "1",
"boost": 1
}
}
}
]
}
},
{
"bool": {
"must": [
{
"prefix": {
"input.custom_attrs.value": {
"value": "An*",
"boost": 1
}
}
}
]
}
}
]
}
}
]
}
}
]
}
}
}
Explanation -
I want to search the field with "An" as a prefix.
I am also sure that there is data with the values "Annual" and "Annual Fund", which should appear in the search results.
But these records do not appear with the prefix query given above. I also tried the regexp and wildcard queries, but they do not work either. Please suggest how to make the query work.
Possible cause why it's not working
It looks like the data was indexed with the default mapping, i.e. as a text field. A text field uses the default standard analyzer, which converts the generated tokens to lowercase.
Prefix queries, on the other hand, are not analyzed: the search term does not go through any analyzer and is not lowercased.
In your case you are searching for An (note the capital A), while the indexed tokens for Annual and Annual Fund are annual, and annual and fund, so nothing matches.
Solution:
Use an as the prefix value and you should get your search results.
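The mismatch described above can be reproduced outside Elasticsearch with a rough model. This is a sketch only: `standard_like_tokens` is a hypothetical stand-in that splits on non-word characters and lowercases, which is just the part of the standard analyzer's behaviour that matters here, and `prefix_matches` compares the prefix as-is, the way a non-analyzed query would.

```python
import re
from typing import List


def standard_like_tokens(text: str) -> List[str]:
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def prefix_matches(prefix: str, indexed_values: List[str]) -> List[str]:
    """Return the values that have at least one indexed token starting with `prefix`.

    The prefix is used exactly as given (no lowercasing), mirroring a
    query whose search term is not analyzed.
    """
    return [
        value for value in indexed_values
        if any(token.startswith(prefix) for token in standard_like_tokens(value))
    ]


values = ["Annual", "Annual Fund"]
print(prefix_matches("An", values))  # [] : no indexed token starts with capital "An"
print(prefix_matches("an", values))  # ['Annual', 'Annual Fund']
```

In this model the trailing `*` in "An*" is treated as a literal character too, so it would also prevent a match; it is worth checking whether the same holds for the real prefix query before keeping it in the value.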

Query on multiple range of document

I want to run a query against a known subset of documents rather than the whole index. I already know the IDs of those documents. For example, I want to match some sentences against the field 'pLabel', but only among documents whose IDs I obtained through a separate process. My attempt is below, but it returns many documents that I did not expect.
In other words, given a group of documents eid1, eid2, eid3, ..., I want only the documents in that group that match the query. The query is shown below.
How do I fix the query to get the right search result?
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "pLabel" ,
"query": "search words here"
}
}
] ,
"must_not": [] ,
"should": [
{
"term": {
"eid": "eid1"
}
} ,
{
"term": {
"eid": "eid2"
}
}
]
}
} ,
"size": 0 ,
"_source": [
"eid"
] ,
"aggs": {
"eids": {
"terms": {
"field": "eid" ,
"size": 1000
}
}
}
}
You need to move the document ID condition out of the should clause and into the must clause.
Right now the query can return any document that matches the query_string clause; the should clauses only make it prefer documents that also match the document IDs.
Also, you should use a terms query:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "pLabel",
"query": "search words here"
}
},
{
"terms": {
"eid": ["eid1", "eid2"]
}
}
]
}
},
"size": 0,
"_source": [
"eid"
],
"aggs": {
"eids": {
"terms": {
"field": "eid",
"size": 1000
}
}
}
}
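The difference between the two queries can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the three sample documents are invented, `text_matches` is a crude stand-in for `query_string` (any query word appearing in pLabel counts as a match), and scoring is ignored entirely.

```python
from typing import Dict, List, Set

# Hypothetical sample documents, not taken from the question.
docs: List[Dict[str, str]] = [
    {"eid": "eid1", "pLabel": "search words here"},
    {"eid": "eid2", "pLabel": "unrelated text"},
    {"eid": "eid9", "pLabel": "some other search"},
]


def text_matches(doc: Dict[str, str], query: str) -> bool:
    """Crude stand-in for query_string: any query word present in pLabel."""
    label_words = set(doc["pLabel"].lower().split())
    return any(word in label_words for word in query.lower().split())


def must_with_should(docs: List[Dict[str, str]], query: str, eids: Set[str]) -> List[str]:
    """Original shape: with a must clause present, should does not filter."""
    return [d["eid"] for d in docs if text_matches(d, query)]


def must_with_terms(docs: List[Dict[str, str]], query: str, eids: Set[str]) -> List[str]:
    """Fixed shape: both the text match and the ID membership are required."""
    return [d["eid"] for d in docs if text_matches(d, query) and d["eid"] in eids]


wanted = {"eid1", "eid2"}
print(must_with_should(docs, "search words here", wanted))  # ['eid1', 'eid9']
print(must_with_terms(docs, "search words here", wanted))   # ['eid1']
```

The first function lets eid9 through because the ID condition only influences ranking; the second restricts the result to the known group.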

Elasticsearch Remove duplicate results if greater than some value

I have news articles from multiple sources saved, and each source has a different category. I need to write a query that sorts the articles in reverse chronological order and returns them in chunks of 15 at a time. I also don't want more than 3 articles from any particular source. I am using the query below, but the results are wrong. Can anyone tell me what I am doing wrong?
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"category": "Digital"
}
},
{
"match_phrase": {
"type": "Local"
}
}
]
}
},
"collapse": {
"field": "source.keyword",
"max_concurrent_group_searches": 3
},
"sort": [
{
"pub_date": {
"order": "desc"
}
}
]
}
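To make the requirement concrete, here is the desired result expressed in plain Python rather than as an Elasticsearch query. It is a sketch under assumptions: the cap of 3 per source is applied across the whole result set (not per page), `pub_date` values are ISO-formatted strings so they sort correctly as text, and the function name `capped_feed` is made up. As far as I understand the collapse documentation, `max_concurrent_group_searches` only controls how many inner-hit lookups run concurrently and does not cap the number of results per group, so it would not express this rule by itself.

```python
from collections import defaultdict
from typing import Dict, List


def capped_feed(articles: List[Dict[str, str]], per_source: int = 3,
                page_size: int = 15) -> List[List[Dict[str, str]]]:
    """Newest first, at most `per_source` articles per source, split into pages."""
    ordered = sorted(articles, key=lambda a: a["pub_date"], reverse=True)
    taken: Dict[str, int] = defaultdict(int)
    kept = []
    for article in ordered:
        if taken[article["source"]] < per_source:
            taken[article["source"]] += 1
            kept.append(article)
    # Split the capped list into chunks of `page_size`.
    return [kept[i:i + page_size] for i in range(0, len(kept), page_size)]


# Hypothetical data: five articles from source A, two from source B.
sample = (
    [{"source": "A", "pub_date": f"2020-01-0{d}"} for d in range(1, 6)]
    + [{"source": "B", "pub_date": f"2020-01-0{d}"} for d in (2, 4)]
)
pages = capped_feed(sample)
```

With this sample data the single page holds the three newest articles from A and both articles from B, ordered by date descending.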

Minimum should match on filtered query

Is it possible to have a query like this
"query": {
"filtered": {
"filter": {
"terms": {
"names": [
"Anna",
"Mark",
"Joe"
],
"execution" : "and"
}
}
}
}
With the "minimum_should_match": "2" statement?
I know that I can use a simple query (I've tried it, and it works), but I don't need the score to be computed. My goal is just to filter documents that contain 2 of the values.
Does scoring generally have a heavy impact on the time needed to retrieve documents?
Using this query:
"query": {
"filtered": {
"filter": {
"terms": {
"names": [
"Anna",
"Mark",
"Joe"
],
"execution" : "and",
"minimum_should_match": "2"
}
}
}
}
I got this error:
QueryParsingException[[my_db] [terms] filter does not support [minimum_should_match]]
Minimum should match is not a parameter for the terms filter. If that is the functionality you are looking for, I might rewrite your query like this, to use the bool query wrapped in a query filter:
{
"filter": {
"query": {
"bool": {
"should": [
{
"term": {
"names": "Anna"
}
},
{
"term": {
"names": "Mark"
}
},
{
"term": {
"names": "Joe"
}
}
],
"minimum_should_match": 2
}
}
}
}
This matches documents containing all three terms, and it also matches documents containing exactly two of the three. A must clause, by contrast, is an implicit and. No score is computed, because the query is executed as a filter.
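The "at least two of three" rule is simple enough to state in a few lines of Python, which can help when checking what the filter should return. This is a sketch: `matches_at_least` is a hypothetical helper, and `body` shows the same idea written with a `bool` query in filter context, which is an assumption for Elasticsearch versions where the `filtered` query is no longer available.

```python
from typing import Any, Dict, Iterable, Set

WANTED: Set[str] = {"Anna", "Mark", "Joe"}  # values from the question


def matches_at_least(doc_names: Iterable[str], wanted: Set[str] = WANTED,
                     minimum: int = 2) -> bool:
    """True if the document contains at least `minimum` of the wanted names."""
    return len(set(doc_names) & wanted) >= minimum


# Same rule as a request body: should clauses inside a bool filter,
# so no score is computed for them (assumed layout for newer versions).
body: Dict[str, Any] = {
    "query": {
        "bool": {
            "filter": {
                "bool": {
                    "should": [{"term": {"names": name}} for name in sorted(WANTED)],
                    "minimum_should_match": 2,
                }
            }
        }
    }
}

print(matches_at_least(["Anna", "Mark", "Zoe"]))  # True
print(matches_at_least(["Joe", "Zoe"]))           # False
```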

Resources