Elasticsearch Partial Phrase Match


In Elasticsearch, I would like to match the record "John Oxford" when searching for "John Ox". I'm currently using a match_phrase_prefix query like this:
{
"query": {
"match_phrase_prefix":{
"SearchName": {
"query": "John Ox"
}
}
}
}
I know this doesn't work because, as the docs state:
While easy to set up, using the match_phrase_prefix query for search autocompletion can sometimes produce confusing results.
For example, consider the query string quick brown f. This query works by creating a phrase query out of quick and brown (i.e. the term quick must exist and must be followed by the term brown). Then it looks at the sorted term dictionary to find the first 50 terms that begin with f, and adds these terms to the phrase query.
The problem is that the first 50 terms may not include the term fox so the phrase quick brown fox will not be found. This usually isn’t a problem as the user will continue to type more letters until the word they are looking for appears.
For better solutions for search-as-you-type see the completion suggester and the search_as_you_type field type.
Is there another way to achieve this, then, without changing the way the data is stored in ES?
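The limitation quoted from the docs can be illustrated with a small toy model. This is only a sketch of the described behaviour, not how Lucene implements it: the term dictionary is a hypothetical sorted Python list, and expand_prefix is an illustrative helper that mimics the max_expansions cap (50 by default).

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix, max_expansions=50):
    """Return at most `max_expansions` terms starting with `prefix`, in sorted order."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(term)
        if len(matches) == max_expansions:
            break  # cap reached, remaining candidates are ignored
    return matches


# Hypothetical term dictionary: 60 terms starting with "f" sort before "fox".
terms = sorted([f"fa{i:02d}" for i in range(60)] + ["fox", "quick", "brown"])

limited = expand_prefix(terms, "f")    # prefix "f": the cap cuts off before "fox"
narrowed = expand_prefix(terms, "fo")  # one more typed letter narrows the candidates

print("fox" in limited)
print(narrowed)
```

In this toy dictionary the single-letter prefix never reaches "fox", while the longer prefix does, which matches the docs' remark that typing more letters usually resolves the problem.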

You can use the match_bool_prefix query. Here is a working example with index data, search query and search result.
Index Data:
{
"name":"John Oxford"
}
Search Query:
{
"query": {
"match_bool_prefix" : {
"name" : "John Ox"
}
}
}
Search Result:
"hits" : [
{
"_index" : "idx",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.287682,
"_source" : {
"name" : "John Oxford"
}
}
]
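As far as I understand the documentation, match_bool_prefix analyzes the input and builds a bool query in which every term except the last becomes a term clause and the last term becomes a prefix clause. The sketch below builds that equivalent query body in Python. The helper name bool_prefix_equivalent is made up for illustration, and lowercasing plus whitespace splitting is only an assumed stand-in for the field's real analyzer.

```python
import json


def bool_prefix_equivalent(field, text):
    """Build the bool query that match_bool_prefix is documented to expand into."""
    # Assumption: lowercase + whitespace split approximates the analyzer.
    tokens = text.lower().split()
    if not tokens:
        return {"match_none": {}}
    *complete, last = tokens
    # Every complete term is matched exactly; the last one is matched as a prefix.
    should = [{"term": {field: token}} for token in complete]
    should.append({"prefix": {field: last}})
    return {"bool": {"should": should}}


query = {"query": bool_prefix_equivalent("name", "John Ox")}
print(json.dumps(query, indent=2))
```

Because the clauses are combined with should, a document that contains only "john" can also match; unlike match_phrase_prefix, term order and adjacency are not required.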

Related

Change compound token default behaviour in lucene/elasticsearch

Lucene/elasticsearch provide a possibility of compound tokens / subtokens. This is an important feature for e.g. German with composed words. The default behaviour of lucene is to combine the subtokens with an OR in order to not hurt recall and exclude documents from being returned. In specific situations, however, the opposite is required.
Assume that I want to index the following two documents:
Document 1:
PUT /idxwith/_doc/1
{
"name": "stockfisch"
}
Document 2:
PUT /idxwith/_doc/2
{
"name" : "laufstock"
}
Where the words will be decomposed as follows:
stockfisch ==> stock, fisch
laufstock ==> lauf, stock
Now with the following search query:
POST /idxwith/_search
{
"query": {
"match": {
"name": {
"query": "stockfisch"
}
}
}
}
I'd expect only the first document to be returned - which is not the case. As the subtokens are combined with OR, both documents will be returned (hurting the precision of my search):
"hits" : [
{
"_index" : "idxwith",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.3287766,
"_source" : {
"name" : "stockfisch"
}
},
{
"_index" : "idxwith",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.241631,
"_source" : {
"name" : "laufstock"
}
}
]
I'm looking for hints on how to adapt lucene (or elastic) to make this behaviour configurable, i.e. to be able to define that subtokens are combined with an AND if necessary.
Thanks!

To solve this problem you can use a match_phrase query like this:
POST /idxwith/_search
{
"query": {
"match_phrase": {
"name": {
"query": "stockfisch"
}
}
}
}
A phrase query matches terms up to a configurable slop (which defaults to 0) in any order; transposed terms have a slop of 2. See the match_phrase documentation for more details.
It is also possible to set the operator parameter of the match query to and, which requires all terms to be present in the field; see the match query documentation for more details.
In your specific case I think match_phrase is the better option, since the order of the terms matters.
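The operator variant mentioned above has no example, so here is a minimal sketch. The first part builds the request body with the operator parameter of the match query. The second part is a toy model of OR versus AND over the decomposed tokens given in the question; it is not Elasticsearch itself, and whether the real decompounder output is affected by operator depends on the token positions it emits, which is not verified here. The names match_all_terms and toy_search are hypothetical.

```python
import json


def match_all_terms(field, text):
    """Request body for a match query that requires every query term."""
    return {"query": {"match": {field: {"query": text, "operator": "and"}}}}


# Subtokens per document id, copied from the decomposition in the question.
DECOMPOSED = {
    "1": {"stock", "fisch"},  # stockfisch
    "2": {"lauf", "stock"},   # laufstock
}


def toy_search(query_tokens, operator="or"):
    """Toy model: ids of documents containing any (or) or all (and) query tokens."""
    combine = all if operator == "and" else any
    return sorted(
        doc_id
        for doc_id, tokens in DECOMPOSED.items()
        if combine(token in tokens for token in query_tokens)
    )


body = match_all_terms("name", "stockfisch")
print(json.dumps(body, indent=2))
print(toy_search(["stock", "fisch"], "or"))
print(toy_search(["stock", "fisch"], "and"))
```

In the toy model, OR returns both documents while AND keeps only the first one, which is the behaviour the question asks for.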

How do I combine different indexes in a "more like this" query?

The docs of the MLT query give the following example (abbreviated by me) to retrieve documents similar to an existing document:
"query": {
"more_like_this" : {
"fields" : ["title", "description"],
"like" : [
{
"_index" : "imdb",
"_id" : "1"
}],
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
This seems to compare the "title" and "description" fields of all movie titles to those of the one movie with ID 1. Suppose, though, that I have a separate index for people's comments, and I would like to get all movie titles whose "title" or "description" is similar to one particular comment.
I know that I could provide free text as a value for the "like" field - the document (comment) is already part of another index though, so I would like to use that one. Just not based on the "title" and "description" fields (which would not exist on a comment), but let's say its "body" field. How would I do that?

You can add the same alias to both indexes (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html)
and run the query against the alias.
Note: this will put a higher load on your Elasticsearch cluster.
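As a rough sketch of the alias step, the body below could be sent to POST /_aliases to put one alias on both indexes. The index name imdb comes from the question; the comments index name and the alias name movies_and_comments are assumptions made up for illustration, as is the helper add_alias_actions.

```python
import json


def add_alias_actions(alias, indices):
    """Build a POST /_aliases body that adds `alias` to every index in `indices`."""
    return {
        "actions": [{"add": {"index": index, "alias": alias}} for index in indices]
    }


# "comments" and "movies_and_comments" are hypothetical names.
body = add_alias_actions("movies_and_comments", ["imdb", "comments"])
print(json.dumps(body, indent=2))
```

The more_like_this search would then be run against the alias instead of a single index.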

Search part of string with Elasticsearch

In Elasticsearch, I am trying to search for part of a word.
My example data is
{
"_id" : "1",
"name" : "Nopales"
}
{
"_id" : "2",
"name" : "ginger ale"
}
{
"_id" : "3",
"name" : "Kale"
}
{
"_id" : "4",
"name" : "Triticale Flour Whole Grain"
}
and the request to find the names matching the word "ale" is
GET /_search
{
"query": {
"query_string" : {"default_field" : "name", "query" : "*ale*"}
}
}
and it returns the results in this order:
Kale, Triticale Flour Whole Grain, ginger ale, Nopales
But the best match should be ginger ale, which contains the exact word, then Kale, then Nopales, and finally Triticale Flour Whole Grain, based on word count.
Does anyone have an idea how to achieve this?

I think you should use *ale. Your query *ale* looks for ale anywhere inside the terms of the name field, and Kale, Triticale Flour Whole Grain, ginger ale and Nopales all contain ale. If you query with *ale instead, it will only match terms ending with ale. You can also combine different wildcards, for example "query" : "(*ale)OR(ale*)".
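To see which of the example names each wildcard would hit, here is a small local simulation using Python's fnmatch. It is only an approximation: it assumes the name field is lowercased and split on whitespace (a stand-in for the standard analyzer) and that the wildcard is tested against each resulting term. The helper wildcard_hits is hypothetical, and scoring or result order is not modelled.

```python
from fnmatch import fnmatchcase

# Example names from the question.
NAMES = ["Nopales", "ginger ale", "Kale", "Triticale Flour Whole Grain"]


def wildcard_hits(pattern, names=NAMES):
    """Return the names with at least one lowercased term matching `pattern`."""
    hits = []
    for name in names:
        tokens = name.lower().split()  # assumed stand-in for the analyzer
        if any(fnmatchcase(token, pattern) for token in tokens):
            hits.append(name)
    return hits


print(wildcard_hits("*ale*"))  # contains "ale" anywhere in a term
print(wildcard_hits("*ale"))   # term ends with "ale"
print(wildcard_hits("ale*"))   # term starts with "ale"
```

Under these assumptions, *ale* hits all four names, *ale drops Nopales, and ale* keeps only ginger ale.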

Given a Document ID find the matching Document in Elasticsearch

I have indexed some articles in Elasticsearch. Now suppose a user likes an article and I want to recommend similar articles to them. Assume the articles are precise, well written and to the point, and that all articles are of the same type.
I know this amounts to extracting the tokens of that article and searching all other articles for them. Is there anything in Elasticsearch that does this for me?
Or is there any other way of doing this?

You can use the More Like This query.
According to the docs, it selects a set of representative terms of these input documents, forms a query using these terms, executes the query and returns the results.
Usage:
{
"query": {
"more_like_this" : {
"fields" : ["title", "description"],
"like" : [
{
"_index" : "your index",
"_type" : "articles",
"_id" : "1" # your document id
}
],
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
}

Similar searching with elastic search

Suppose I have a table that contains a lot of persons. Each person has attributes such as name, social id, age, sex, number of children, and so on.
Given a person A who is a 40-year-old male with 2 children, I want to find all persons that are similar to person A.
Is this something I can do with Elasticsearch? I'm thinking about the More Like This query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html
Thank you very much.

Yes, MLT should work. You can specify the fields to which the more_like_this query is applied; in your case that would be age and number of children, plus any other field you want to match on. Note that, according to the documentation, MLT fields must be of type text or keyword, so numeric attributes may need to be indexed as keyword. Here is an example query:
GET /_search
{
"query": {
"more_like_this" : {
"fields" : ["age", "number_of_children"],
"like" :
{
"_index" : "people",
"_type" : "person",
"_id" : "1"
},
"min_term_freq" : 1
}
}
}
