ElasticSearch - Phrase match on whole document? Not just one specific field - elasticsearch

Is there a way I can use elastic match_phrase on an entire document? Not just one specific field.
We want the user to be able to enter a search term with quotes, and do a phrase match anywhere in the document.
{
"size": 20,
"from": 0,
"query": {
"match_phrase": {
"my_column_name": "I want to search for this exact phrase"
}
}
}
Currently, I have only found phrase matching for specific fields. I must specify the fields to do the phrase matching within.
Our document has hundreds of fields, so I don't think it's feasible to manually enter the 600+ fields into every match_phrase query. The resultant JSON would be huge.

You can use a multi_match query with type phrase, which runs a match_phrase query on each field and uses the _score from the best field. See phrase and phrase_prefix.
If no fields are provided, the multi_match query defaults to the
index.query.default_field index settings, which in turn defaults to *.
This extracts all fields in the mapping that are eligible to term queries and filters the metadata fields. All extracted fields are then
combined to build a query.
Adding a working example with index data, search query and search result
Index data:
{
"name":"John",
"cost":55,
"title":"Will Smith"
}
{
"name":"Will Smith",
"cost":55,
"title":"book"
}
Search Query:
{
"query": {
"multi_match": {
"query": "Will Smith",
"type": "phrase"
}
}
}
Search Result:
"hits": [
{
"_index": "64519840",
"_type": "_doc",
"_id": "1",
"_score": 1.2199391,
"_source": {
"name": "Will Smith",
"cost": 55,
"title": "book"
}
},
{
"_index": "64519840",
"_type": "_doc",
"_id": "2",
"_score": 1.2199391,
"_source": {
"name": "John",
"cost": 55,
"title": "Will Smith"
}
}
]
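The question mentions users entering a search term in quotes. A minimal sketch of how a client could turn that input into the request body is shown below. The function name build_query and the rule "surrounding double quotes mean phrase search" are assumptions made for illustration, not part of Elasticsearch.

```python
import json


def build_query(user_input: str, size: int = 20, offset: int = 0) -> dict:
    """Build a multi_match request body from raw search-box input.

    Hypothetical helper: input wrapped in double quotes becomes a phrase
    search, anything else a regular multi_match. No "fields" key is sent,
    so the index's default fields are searched.
    """
    text = user_input.strip()
    is_quoted = len(text) >= 2 and text[0] == '"' and text[-1] == '"'
    multi_match = {"query": text[1:-1] if is_quoted else text}
    if is_quoted:
        multi_match["type"] = "phrase"
    return {"size": size, "from": offset, "query": {"multi_match": multi_match}}


body = build_query('"Will Smith"')
print(json.dumps(body, indent=2))
```

The printed body can be sent to the index's _search endpoint as-is.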

You can use * in the fields parameter of a multi_match query, which will search all the eligible fields in the document. But it will reduce your query speed, since you are searching the whole document.

Related

ElasticSearch - Multiple query on one call (with sub limit)

I have a problem with ElasticSearch, I need you :)
Today I have an index in which I have my documents. These documents represent either Products or Categories.
The structure is this:
{
"_index": "documents-XXXX",
"_type": "_doc",
"_id": "cat-31",
"_score": 1.0,
"_source": {
"title": "Category A",
"type": "category",
"uniqId": "cat-31",
[...]
}
},
{
"_index": "documents-XXXX",
"_type": "_doc",
"_id": "prod-1",
"_score": 1.0,
"_source": {
"title": "Product 1",
"type": "product",
"uniqId": "prod-1",
[...]
}
},
What I'd like to do, in one call, is:
Have 5 documents whose type is "Product" and 2 documents whose type is "Category". Do you think it's possible?
That is, two queries in a single call with query-level limits.
Also, wouldn't it be better to use two different indexes, one for the products and one for the categories?
If so, I have the same question: how do I run both queries in a single call?
Thanks in advance
If product and category are different contexts I would try to separate them into different indices. Is this type used in all your queries to filter results? Ex: I want to search for the term xpto in docs with type product or do you search without applying any filter?
As for your other question, you can run two queries in one request. The Multi search API can help with this.
You get two responses, one for each query.
GET my-index-000001/_msearch
{ }
{"query": { "term": { "type": { "value": "product" } }}}
{"index": "my-index-000001"}
{"query": { "term": { "type": { "value": "category" } }}}
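To get the per-query limits the question asks for (5 products and 2 categories), each search body in the multi search request can carry its own size. The sketch below builds the newline-delimited body in Python. The helper names msearch_body and typed_search are made up for illustration, and the term values assume the type field is indexed exactly as shown in the question.

```python
import json


def typed_search(doc_type: str, limit: int) -> dict:
    """One search body: documents of a given type, capped at `limit` hits."""
    return {"size": limit, "query": {"term": {"type": {"value": doc_type}}}}


def msearch_body(searches) -> str:
    """Serialize (header, body) pairs as newline-delimited JSON.

    Each search is a header line followed by a body line, and the payload
    ends with a newline, as the _msearch endpoint expects.
    """
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"


payload = msearch_body([
    ({}, typed_search("product", 5)),
    ({}, typed_search("category", 2)),
])
print(payload, end="")
```

The payload would be sent to my-index-000001/_msearch with the Content-Type header set to application/x-ndjson. The response contains one entry per search, in the same order.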

Is there a difference between "match" and "simple_query_string" if no special characters?

Elasticsearch 7.9
I'm searching a single field with a textbox exposed to users through a web UI.
{
match: {
body: {
query: 'beer pretzels',
}
}
}
I'm debating whether to use simple_query_string instead.
{
simple_query_string: {
query: 'beer pretzels',
}
}
My initial thought was to switch to simple_query_string if I detect special characters in the keywords. But now I wonder why I'd use match at all.
My questions:
Are there any differences between match and simple_query_string for the simple case where the keywords contain no special characters?
Any reason why I would not use simple_query_string all the time?
Simple Query string returns documents based on a provided query
string, using a parser with a limited but fault-tolerant syntax.
Refer to this for a detailed explanation, which states:
The simple_query_string query is a version of the query_string query
that is more suitable for use in a single search box that is exposed
to users because it replaces the use of AND/OR/NOT with +/|/-,
respectively, and it discards invalid parts of a query instead of
throwing an exception if a user makes a mistake.
It interprets the text with its own simplified operator syntax, not the full Lucene syntax that query_string supports. You can refer to this article, which gives detailed information about how the simple query string works.
Match Query returns documents that match a provided text, number, date
or boolean value. The provided text is analyzed before matching.
Refer to this ES documentation part and this blog, to understand how the match query works
I ran the search below using both the simple query string and the match query:
Index Data
{
"content":"foo bar -baz"
}
Search Query using simple query string:
{
"query": {
"simple_query_string": {
"fields": [ "content" ],
"query": "foo bar -baz"
}
}
}
Search Result:
"hits": [
{
"_index": "stof_63937563",
"_type": "_doc",
"_id": "1",
"_score": 0.5753642, <-- note this
"_source": {
"content": "foo bar -baz"
}
}
]
Search Query using match query:
{
"query": {
"match": {
"content": {
"query": "foo bar -baz"
}
}
}
}
Search Result:
"hits": [
{
"_index": "stof_63937563",
"_type": "_doc",
"_id": "1",
"_score": 0.8630463, <-- note this
"_source": {
"content": "foo bar -baz"
}
}
]
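The scores differ because simple_query_string parses -baz as a negation (NOT baz), so only foo and bar contribute to the score, whereas match analyzes the whole string and scores all three terms foo, bar and baz. The numbers are consistent with this: 0.5753642 is two thirds of 0.8630463.
Please refer to this SO answer, which explains the difference between multi_match and query_string.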
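The asker's original idea of switching to simple_query_string only when the input contains special characters can be sketched as follows. This is a deliberately naive check, written as an assumption about how one might implement it: it treats any occurrence of an operator character as an operator, even where the parser would read it as plain text (for example a hyphen in the middle of a word). The names has_operators and build_query are hypothetical.

```python
# Characters that simple_query_string treats as operators in some position.
SIMPLE_QUERY_OPERATORS = set('+|-"*()~')


def has_operators(text: str) -> bool:
    """Naive check: does the input contain any operator character at all?"""
    return any(ch in SIMPLE_QUERY_OPERATORS for ch in text)


def build_query(field: str, text: str) -> dict:
    """Pick simple_query_string when operators are present, else match."""
    if has_operators(text):
        return {"simple_query_string": {"fields": [field], "query": text}}
    return {"match": {field: {"query": text}}}


print(build_query("body", "beer pretzels"))
print(build_query("content", "foo bar -baz"))
```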

Is there any way to match similar match in Elastic Search

I have a large Elasticsearch index.
I am searching it with the query below:
{"size": 1000, "query": {"query_string": {"query": "( string1 )"}}}
Let's say string1 = Product. Someone might accidentally type prduct, or forget the o.
Is there any way to search for that as well?
{"size": 1000, "query": {"query_string": {"query": "( prdct )"}}} should also return the results for both prdct and product.
You can use a fuzzy query, which returns documents that contain terms similar to the search term. Refer to this blog for a detailed explanation of fuzzy queries.
Since prdct is two edits away from product (the o and the u are missing), you need a fuzziness of 2. The fuzziness parameter can be defined as:
0, 1, 2 = The maximum allowed edit distance
AUTO = An edit distance generated from the length of the term:
0..2 = Must match exactly
3..5 = One edit allowed
More than 5 = Two edits allowed
Index Data:
{
"title":"product"
}
{
"title":"prdct"
}
Search Query:
{
"query": {
"fuzzy": {
"title": {
"value": "prdct",
"fuzziness":2,
"transpositions":true,
"boost": 5
}
}
}
}
Search Result:
"hits": [
{
"_index": "my-index1",
"_type": "_doc",
"_id": "2",
"_score": 3.465736,
"_source": {
"title": "prdct"
}
},
{
"_index": "my-index1",
"_type": "_doc",
"_id": "1",
"_score": 2.0794415,
"_source": {
"title": "product"
}
}
]
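To see why this search needs an edit distance of 2, the sketch below computes the plain Levenshtein distance between the two terms and applies the length rule that AUTO fuzziness uses. It is only an illustration: Elasticsearch additionally counts a transposition of adjacent characters as a single edit when transpositions is true, which this simple function does not model. Both function names are made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed by AUTO for a term: 0 up to length 2, 1 up to 5, else 2."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


print(levenshtein("prdct", "product"))  # 2
print(auto_fuzziness("prdct"))          # 1
```

Because prdct has five characters, AUTO would allow only one edit and would not reach product, so the fuzziness has to be set to 2 explicitly.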
There are many solutions to this problem:
Suggestions (did you mean X instead).
Fuzziness (edits from your original search term).
Partial matching with autocomplete (if someone types "pr" and you provide the available search terms, they can click on the correct results right away) or n-grams (matching groups of letters).
All of those have tradeoffs in index / search overhead as well as the classic precision / recall problem.

Nested attribute term Query

I have documents like the one below:
{
"_index": "lines",
"_type": "lineitems",
"_id": "4002_11",
"_score": 2.6288738,
"_source": {
"data": {
"type": "Shirt"
}
}
}
I want to get a count based on the type attribute value. Any suggestions?
I tried a term query but had no luck with that.
You should use the terms aggregation. It returns the number of documents for each value of the "type" field.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
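A minimal sketch of such a request and of reading its response is shown below. The field name data.type.keyword is an assumption: it presumes the default dynamic mapping of recent versions, where a string field gets a keyword sub-field. Depending on your mapping you may need data.type itself, or a nested aggregation if data is mapped as nested. The aggregation name by_value, both helper names and the sample response are invented for the example.

```python
def count_by_field(field: str, max_buckets: int = 10) -> dict:
    """Request body: no hits, just a terms aggregation on `field`."""
    return {
        "size": 0,  # skip the hits, only the buckets are needed
        "aggs": {"by_value": {"terms": {"field": field, "size": max_buckets}}},
    }


def bucket_counts(response: dict) -> dict:
    """Turn the aggregation part of a search response into {value: count}."""
    buckets = response["aggregations"]["by_value"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


body = count_by_field("data.type.keyword")  # assumed field name
print(body)

# Made-up response fragment, shaped like a terms aggregation result.
sample_response = {
    "aggregations": {
        "by_value": {
            "buckets": [
                {"key": "Shirt", "doc_count": 3},
                {"key": "Trouser", "doc_count": 1},
            ]
        }
    }
}
print(bucket_counts(sample_response))
```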

Unexpected Match query scoring on a FirstMiddleLast field

I am using a match query to search a fullName field which contains names in (first [middle] last) format. I have two documents, one with "Brady Holt" as the fullName and the other as "Brad von Holdt". When I search for "brady holt", the document with "Brad von Holdt" is scored higher than the document with "Brady Holt" even though it is an exact match. I would expect the document with "Brady Holt" to have the highest score. I am guessing it has something to do with the 'von' middle name causing the score to be higher?
These are my documents:
[
{
"id": 509631,
"fullName": "Brad von Holdt"
},
{
"id": 55425,
"fullName": "Brady Holt"
}
]
This is my query:
{
"query": {
"match": {
"fullName": {
"query": "brady holt",
"fuzziness": 1.0,
"prefix_length": 3,
"operator": "and"
}
}
}
}
This is the query result:
"hits": [
{
"_index": "demo",
"_type": "person",
"_id": "509631",
"_score": 2.4942014,
"_source": {
"id": 509631,
"fullName": "Brad von Holdt"
}
},
{
"_index": "demo",
"_type": "person",
"_id": "55425",
"_score": 2.1395948,
"_source": {
"id": 55425,
"fullName": "Brady Holt"
}
}
]
A good read on how Elasticsearch does scoring, and how to manipulate relevancy, can be found in the Elasticsearch Guide: What is Relevance?. In particular, you may want to experiment with the explain functionality of a search query.
The shortest answer for you here is that the score of a hit is the product of its best-matching term according to a TF/IDF calculation. The number of matching terms will affect which documents are matched, but it's the "best" term that determines a document's score. Your query doesn't have an "exact" match, per se: it has multiple matching terms, the scores of which are calculated independently.
Tuning relevancy can be a bit of a subtle art, and depends a lot on how the fields are being analyzed, the overall frequency distributions of various terms, the queries you're running, and even how you're sharding and distributing the index within a cluster (different shards will have different term frequencies).
(It may also be relevant, so to speak, that your example has two different spellings: "Holt" and "Holdt".)
In any case, getting familiar with explain functionality and the underlying scoring mechanics is a helpful next step for you here.
Also, if you want an exact phrase match, you should read the ES guide on Phrase Matching.
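One way to combine both suggestions is sketched below: keep the fuzzy match as the required clause, add an optional match_phrase clause so that documents containing the exact phrase get extra score, and turn on explain to inspect how each hit was scored. This is an illustration under assumptions, not a guaranteed fix. The helper name name_query is made up, and whether the resulting order suits you depends on your analyzer and term statistics.

```python
import json


def name_query(name: str, field: str = "fullName") -> dict:
    """Fuzzy match as before, plus an optional phrase clause and explain output."""
    return {
        "explain": True,  # each hit carries a score breakdown
        "query": {
            "bool": {
                # Required: the original fuzzy match from the question.
                "must": [{
                    "match": {
                        field: {
                            "query": name,
                            "fuzziness": 1,
                            "prefix_length": 3,
                            "operator": "and",
                        }
                    }
                }],
                # Optional: adds score when the exact phrase is present.
                "should": [{"match_phrase": {field: {"query": name}}}],
            }
        },
    }


print(json.dumps(name_query("brady holt"), indent=2))
```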
