Is there any way to match similar match in Elastic Search - elasticsearch

I have a elastic search big document
I am searching with below query
{"size": 1000, "query": {"query_string": {"query": "( string1 )"}}}
Let say my string1 = Product, If some one accident type prduct some one forgot to o
Is there any way to search for that also
{"size": 1000, "query": {"query_string": {"query": "( prdct )"}}} also has to return result of prdct + product

You can use fuzzy query that returns documents that contain terms similar to the search term. Refer this blog to get detailed explanation of fuzzy queries.
Since,you have more edit distance to match prdct. Fuzziness parameter can be defined as :
0, 1, 2
0..2 = Must match exactly
3..5 = One edit allowed
More than 5 = Two edits allowed
Index Data:
{
"title":"product"
}
{
"title":"prdct"
}
Search Query:
{
"query": {
"fuzzy": {
"title": {
"value": "prdct",
"fuzziness":15,
"transpositions":true,
"boost": 5
}
}
}
}
Search Result:
"hits": [
{
"_index": "my-index1",
"_type": "_doc",
"_id": "2",
"_score": 3.465736,
"_source": {
"title": "prdct"
}
},
{
"_index": "my-index1",
"_type": "_doc",
"_id": "1",
"_score": 2.0794415,
"_source": {
"title": "product"
}
}
]

There are many solutions to this problem:
Suggestions (did you mean X instead).
Fuzziness (edits from your original search term).
Partial matching with autocomplete (if someone types "pr" and you provide the available search terms, they can click on the correct results right away) or n-grams (matching groups of letters).
All of those have tradeoffs in index / search overhead as well as the classic precision / recall problem.

Related

ElasticSearch - Multiple query on one call (with sub limit)

I have a problem with ElasticSearch, I need you :)
Today I have an index in which I have my documents. These documents represent either Products or Categories.
The structure is this:
{
"_index": "documents-XXXX",
"_type": "_doc",
"_id": "cat-31",
"_score": 1.0,
"_source": {
"title": "Category A",
"type": "category",
"uniqId": "cat-31",
[...]
}
},
{
"_index": "documents-XXXX",
"_type": "_doc",
"_id": "prod-1",
"_score": 1.0,
"_source": {
"title": "Product 1",
"type": "product",
"uniqId": "prod-1",
[...]
}
},
What I'd like to do, in one call, is:
Have 5 documents whose type is "Product" and 2 documents whose type is "Category". Do you think it's possible?
That is, two queries in a single call with query-level limits.
Also, isn't it better to make two different indexes, one for the products, the other for the categories?
If so, I have the same question, how, in a single call, do both queries?
Thanks in advance
If product and category are different contexts I would try to separate them into different indices. Is this type used in all your queries to filter results? Ex: I want to search for the term xpto in docs with type product or do you search without applying any filter?
About your other question, you can apply two queries in a request. The Multi search API can help with this.
You would have two answers one for each query.
GET my-index-000001/_msearch
{ }
{"query": { "term": { "type": { "value": "product" } }}}
{"index": "my-index-000001"}
{"query": { "term": { "type": { "value": "category" } }}}

ElasticSearch - Phrase match on whole document? Not just one specific field

Is there a way I can use elastic match_phrase on an entire document? Not just one specific field.
We want the user to be able to enter a search term with quotes, and do a phrase match anywhere in the document.
{
"size": 20,
"from": 0,
"query": {
"match_phrase": {
"my_column_name": "I want to search for this exact phrase"
}
}
}
Currently, I have only found phrase matching for specific fields. I must specify the fields to do the phrase matching within.
Our document has hundreds of fields, so I don't think its feasible to manually enter the 600+ fields into every match_phrase query. The resultant JSON would be huge.
You can use a multi-match query with type phrase that runs a match_phrase query on each field and uses the _score from the best field. See phrase and phrase_prefix.
If no fields are provided, the multi_match query defaults to the
index.query.default_field index settings, which in turn defaults to *.
This extracts all fields in the mapping that are eligible to term queries and filters the metadata fields. All extracted fields are then
combined to build a query.
Adding a working example with index data, search query and search result
Index data:
{
"name":"John",
"cost":55,
"title":"Will Smith"
}
{
"name":"Will Smith",
"cost":55,
"title":"book"
}
Search Query:
{
"query": {
"multi_match": {
"query": "Will Smith",
"type": "phrase"
}
}
}
Search Result:
"hits": [
{
"_index": "64519840",
"_type": "_doc",
"_id": "1",
"_score": 1.2199391,
"_source": {
"name": "Will Smith",
"cost": 55,
"title": "book"
}
},
{
"_index": "64519840",
"_type": "_doc",
"_id": "2",
"_score": 1.2199391,
"_source": {
"name": "John",
"cost": 55,
"title": "Will Smith"
}
}
]
You can use * in match query field parameter which will search all the available field in the document. But it will reduce your query speed since you are searching the whole document

SuggestionBuilder with BoolQueryBuilder in Elasticsearch

I am currently using BoolQueryBuilder to build a text search. I am having an issue with wrong spellings. When someone searches for a "chiar" instead of "chair" I have to show them some suggestions.
I have gone through the documentation and observed that the SuggestionBuilder is useful to get the suggestions.
Can I send all the requests in a single query, so that I can show the suggestions if the result is zero?
No need to send different search terms ie chair, chiar to get suggestions, it's not efficient and performant and you don't know all the combinations which user might misspell.
Instead, Use the fuzzy query or fuzziness param in the match query itself, which can be used in the bool query.
Let me show you an example, using the match query with the fuzziness parameter.
index def
{
"mappings": {
"properties": {
"product": {
"type": "text"
}
}
}
}
Index sample doc
{
"product" : "chair"
}
Search query with wrong term chiar
{
"query": {
"match" : {
"product" : {
"query" : "chiar",
"fuzziness" : "4" --> control it according to your application
}
}
}
}
Search result
"hits": [
{
"_index": "so_fuzzy",
"_type": "_doc",
"_id": "1",
"_score": 0.23014566,
"_source": {
"product": "chair"
}
}

Find most similar documents in Elasticsearch

How do I find the top 100 most similar documents between two indices in Elasticsearch?
Document #1 is in index1, type11, field111.
Document #2 is in index2, type21, field211
Edit: Both fields are strings.
I looked at the documentation for More Like This query. But it doesn't tell me how I can quickly compare the results for different kinds of similarity metrics and look at the top results.
Try this query, but substitute the id values for your documents:
GET index1,index2/_search
{
"query": {
"more_like_this": {
"fields": [
"field111",
"field211"
],
"like": [
{
"_index": "index1",
"_id": "DOC_1_ID"
},
{
"_index": "index2",
"_id": "DOC_2_ID"
}
],
"min_term_freq": 1,
"max_query_terms": 12
}
}
}

Elasticsearch match phrase prefix not matching all terms

I am having an issue where when I use the match_phrase_prefix query in Elasticsearch, it is not returning all the results I would expect it to, particularly when the query is one word followed by one letter.
Take this index mapping (this is a contrived example to protect sensitive data):
http://localhost:9200/test/drinks/_mapping
returns:
{
"test": {
"mappings": {
"drinks": {
"properties": {
"name": {
"type": "text"
}
}
}
}
}
}
And amongst millions of other records are these:
{
"_index": "test",
"_type": "drinks",
"_id": "2",
"_score": 1,
"_source": {
"name": "Johnnie Walker Black Label"
}
},
{
"_index": "test",
"_type": "drinks",
"_id": "1",
"_score": 1,
"_source": {
"name": "Johnnie Walker Blue Label"
}
}
The following query, which is one word followed by two letters:
POST http://localhost:9200/test/drinks/_search
{
"query": {
"match_phrase_prefix" : {
"name" : "Walker Bl"
}
}
}
returns this:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.5753642,
"hits": [
{
"_index": "test",
"_type": "drinks",
"_id": "2",
"_score": 0.5753642,
"_source": {
"name": "Johnnie Walker Black Label"
}
},
{
"_index": "test",
"_type": "drinks",
"_id": "1",
"_score": 0.5753642,
"_source": {
"name": "Johnnie Walker Blue Label"
}
}
]
}
}
Whereas this query with one word and one letter:
POST http://localhost:9200/test/drinks/_search
{
"query": {
"match_phrase_prefix" : {
"name" : "Walker B"
}
}
}
returns no results. What could be happening here?
I will assume that you are working with Elasticsearch 5.0 and above.
I think it might have to be because of the max_expansions default value.
As seen in the documentation here, the max_expansions parameters is used to control how many prefixes the last term will be expanded with. The default value is 50 and it might explain why you find "black" and "blue" with the two first letters B and L, but not with the B only.
The documentation is pretty clear about it:
The match_phrase_prefix query is a poor-man’s autocomplete. It is very easy to use, which let’s you get started quickly with search-as-you-type but it’s results, which usually are good enough, can sometimes be confusing.
Consider the query string quick brown f. This query works by creating a phrase query out of quick and brown (i.e. the term quick must exist and must be followed by the term brown). Then it looks at the sorted term dictionary to find the first 50 terms that begin with f, and adds these terms to the phrase query.
The problem is that the first 50 terms may not include the term fox so the phase quick brown fox will not be found. This usually isn’t a problem as the user will continue to type more letters until the word they are looking for appears
I wouldn't be able to tell you if it's ok to increase this parameter above 50 if you are looking for good performances since I never tried myself.

Resources