ElasticSearch match a word without compound words - elasticsearch

I have to search on Elastic a full word like home but not compound words like homeland.
Using this query:
{
"query": {
"match": {
"text": {
"query": term,
"minimum_should_match": "100%"
}
}
}
}
}
will both match home and homeland or hometown etc. How to restrict the match to the input word token only?
NOTE. The word maybe in a different language, so I cannot specify something like a english analyzer, because I do not know the language of the word a priori.

Related

Elatisearch match_phrase_prefix query, with exact prefix match

I have a match_phrase_prefix query, which works as expected. But when the users passes any special characters at the end of the keyword, ES ignores these characters, and still returns the result.
query{ match_phrase_prefix:{ content: { query: searchTerm } } }
I am using this query to search for prefix. If i pass a term like overflow####!! ES is returning me all the results with the word overflow in it. But instead i want to make an exact prefix match, where the special characters are not ignored. The search term could be of multiple words as well stack overflow search.
How could i make ES search of prefix_match without ignoring the special_chars.
You can use keyword analyzer when defining your query.
{
"query": {
"match_phrase_prefix": {
"content": {
"query": "overflow####!!",
"analyzer": "keyword"
}
}
}
}

How can I achieve this type of queries in ElasticSearch?

I have added a document like this to my index
POST /analyzer3/books
{
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
And then I do queries like this
GET /analyzer3/_analyze
{
"analyzer": "english",
"text": "\"The * day I went with my * to the\""
}
And it successfully returns the previously added document.
My idea is to have quotes so that the query becomes exact, but also wildcards that can replace any word. Google has this exact functionality, where you can search queries like this, for instance "I'm * the university" and it will return page results that contain texts like I'm studying in the university right now, etc.
However I want to know if there's another way to do this.
My main concern is that this doesn't seem to work with other languages like Japanese and Chinese. I've tried with many analyzers and tokenizers to no avail.
Any answer is appreciated.
Exact matches on the tokenized fields are not that straightforward. Better save your field as keyword if you have such requirements.
Additionally, keyword data type support wildcard query which can help you in your wildcard searches.
So just create a keyword type subfield. Then use the wildcard query on it.
Your search query will look something like below:
GET /_search
{
"query": {
"wildcard" : {
"title.keyword" : "The * day I went with my * to the"
}
}
}
In the above query, it is assumed that title field has a sub-field named keyword of data type keyword.
More on wildcard query can be found here.
If you still want to do exact searches on text data type, then read this
Elasticsearch doesn't have Google like search out of the box, but you can build something similar.
Let's assume when someone quotes a search text what they want is a match phrase query. Basically remove the \" and search for the remaining string as a phrase.
PUT test/_doc/1
{
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
GET test/_search
{
"query": {
"match_phrase": {
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
}
}
For the * it's getting a little more interesting. You could just make multiple phrase searches out of this and combine them. Example:
GET test/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"title": "The"
}
},
{
"match_phrase": {
"title": "day I went with my"
}
},
{
"match_phrase": {
"title": "to the"
}
}
]
}
}
}
Or you could use slop in the phrase search. All the terms in your search query have to be there (unless they are being removed by the tokenizer or as stop words), but the matched phrase can have additional words in the phrase. Here we can replace each * with 1 other words, so a slop of 2 in total. If you would want more than 1 word in the place of each * you will need to pick a higher slop:
GET test/_search
{
"query": {
"match_phrase": {
"title": {
"query": "The * day I went with my * to the",
"slop": 2
}
}
}
}
Another alternative might be shingles, but this is a more advanced concept and I would start off with the basics for now.

how to allow exact word match to have higher score

In elastic search, I have defined two synonyms for "swim", like "play", "walk".
Say if I have only two values in elastic search with value "I like to swim", "I like to play".
If the user enter a query "I hope to play" I want it to match "I like to play" with a higher score than "I like to swim" (exact word matches (in this case play) to have higher score), is there a way to achieve that?
Yes you can !
You can acheive any boost you want with the boolean query
The global concept and that you will define must clauses that need to be matched, and should that could match and thus add a boost in your document score.
So for your need, document needs to match with synonyms, and should match without it.
You just need to index your searchable fields with two properties, one with synonyms and another without synonyms.
and example could be
{
"query": {
"bool": {
"must": [
{
"match": {
"text.withSynonyms": "I hope to play"
}
}
],
"should": [
{
"match": {
"text.withoutSynonyms": "I hope to play"
}
}
]
}
}
}

Elasticsearch wildcard query not honoring the analyzer of the field

I have a field named "tag" which is analyzed(default behavior) in elasticsearch. The "tag" field can have a single word or a comma separated string to store multiple tags. For eg. "Festive, Fast, Feast".
Now for example if a tag is "Festive", before indexing I am converting it to small case(to ignore case sensitivity) and indexing it as "festive".
Now if I search using a match query with all caps letters as mentioned below I get results fine(as expected).
{
"query": {
"match": {
"tag": "FESTIVE"
}
}
}
But if I do a wildcard query as mentioned below I don't get results :(
{
"query": {
"wildcard": {
"tag": {
"value": "F*"
}
}
}
}
If I change the value field in wildcard search to "f*" instead of "F*" then I get results.
Does anyone have any clue why is wildcard query behaving case sensitive?
Wildcard queries, fall under term level queries and hence not analyzed. From the Docs
Matches documents that have fields matching a wildcard expression (not
analyzed)
You will get expected results with query string query, it will lowercase the terms because by default as lowercase_expanded_terms is true. Try this
GET your_index/_search
{
"query": {
"query_string": {
"default_field": "tag",
"query": "F*"
}
}
}
Hope this helps!

How to deal with punctuation in an ElasticSearch field

I have a field in a document stored in Elastic Search, which I want to be analyzed as a full text field. In one case, it contains a value for the name field like this:
A&B Corp
I want to be able to search the documents for an auto-complete widget, using a query like this (suppose the user typed A&B into the autocomplete field). The intention is to match documents that contain the any terms with the typed prefix.
{ "query": {
"filtered": {
"query": {
"query_string": {
"query": "A&B*",
"fields": [
"firstName",
"lastName",
"name",
"key",
"email"
]
}
},
"filter": {
"terms": {
"environmentId": [
"foo"
]
}
}
}
}
}
```
My mapping for the name field looks like this:
"name": {
"type": "string"
},
But, I get no results. The query structure works for documents that don't have & in the field, so I'm pretty sure that is part of the problem.
But, I'm not sure how to deal with this. I am pretty sure I still want to analyze the field for full text search.
In addition, if I add a space before the * in the query (ie, "query": "A&B *",) then I get results including A&B, so I don't think it is just discarding the ampersand and treating the A and B as separate terms.
Should I change my mapping? The query?
The Query_string query has a set of reserved characters that needs to be escaped.
query_string : Read the reserved characters section
So to search for
'A&B' (or) 'A&B Corp' (or) 'A&B....'
Your query must be "A&B\\*" such that the query_string parser treats
it as a * wildcard operator.
While currently your query is searching for exact match of
"A&B*" it expects asterik to be part of your data.
And when you search "A&B *" the whitespace is a reserved
character so its
now searching for "A&B" (or) "*" and hence you get a match in this
case.

Resources