Elasticsearch: how to find phrases when the query has no spaces

For example, I have a document with a phrase "Star wars" in the name field.
I would like to run a DSL search with the query "starwars" and get this document back.
I am trying something like this:
GET _search
{
    "query" : {
        "match_phrase" : {
            "name" : {
                "query" : "starwars"
            }
        }
    }
}
How can I do this with Elasticsearch?

I think you would need to update the analyzer on the name field to a custom analyzer that includes the synonym token filter, with a synonym for starwars.
The docs on creating a custom analyzer should help you out. If you did not specify an analyzer for the name field in your mapping, the standard analyzer is applied by default; you can base your custom analyzer on it and add the synonym token filter to its array of filters. It is also worth thinking about how the other analysis requirements you have interact with this one.
With this analyzer update you should be able to use that query and get the result you expect.
Example:
{
    "filter" : {
        "my_synonym" : {
            "type" : "synonym",
            "synonyms" : [
                "star wars => starwars"
            ]
        }
    },
    "analyzer" : {
        "standard_with_synonym" : {
            "tokenizer" : "standard",
            "filter" : ["standard", "lowercase", "my_synonym", "stop"]
        }
    }
}
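For completeness, here is a rough sketch of how the settings, mapping, and query could fit together when creating the index. The index name, type name, and mapping below are invented for illustration, and the syntax targets the same older Elasticsearch version used elsewhere on this page, so treat it as a starting point rather than a drop-in solution:
PUT /my_index
{
    "settings" : {
        "analysis" : {
            "filter" : {
                "my_synonym" : {
                    "type" : "synonym",
                    "synonyms" : [ "star wars => starwars" ]
                }
            },
            "analyzer" : {
                "standard_with_synonym" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "lowercase", "my_synonym", "stop"]
                }
            }
        }
    },
    "mappings" : {
        "my_type" : {
            "properties" : {
                "name" : {
                    "type" : "string",
                    "analyzer" : "standard_with_synonym"
                }
            }
        }
    }
}
At index time the rule "star wars => starwars" collapses the two tokens produced from "Star wars" into the single token starwars, so the match_phrase query from the question should then find the document.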

Related

Achieve same search result for synonym in elasticsearch

For example, I have two entities called Project and Technology. Each Project instance has a ManyToOne relationship with the Technology entity. Some projects have JavaScript, some have javascript, and some have JS. I am searching projects using Elasticsearch.
What would be a feasible way to ensure that when a user searches with any of javascript, JavaScript, or JS, they get the same search results?
This is a task for synonyms; you need to apply a synonym filter.
It could be done with something like this:
PUT /test_index
{
    "settings": {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "synonym" : {
                        "tokenizer" : "whitespace",
                        "filter" : ["synonym"]
                    }
                },
                "filter" : {
                    "synonym" : {
                        "type" : "synonym",
                        "synonyms_path" : "analysis/synonym.txt"
                    }
                }
            }
        }
    }
}
The synonym.txt file should contain the synonym data; in your case:
javascript, JavaScript, JS
This means that these words are synonyms; when a user searches the field for any of them, the query will be expanded, provided you are using a match query.
After these changes, I would recommend reindexing your data.
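To make the synonym expansion take effect, the analyzer also has to be attached to the field you search on. A hedged sketch, where the type name project and field name technology are assumptions about your mapping:
PUT /test_index/_mapping/project
{
    "project" : {
        "properties" : {
            "technology" : {
                "type" : "string",
                "analyzer" : "synonym"
            }
        }
    }
}

GET /test_index/project/_search
{
    "query" : {
        "match" : {
            "technology" : "JS"
        }
    }
}
A match query for JS (or javascript, or JavaScript) is then expanded to all three terms, so all matching projects come back regardless of which spelling was indexed.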

Stemming in Elasticsearch replacing the original string

I used the following settings to create the ES index:
"settings": {
"analysis" : {
"analyzer" : {
"my_analyzer" : {
"tokenizer" : "standard",
"filter" : ["standard", "lowercase", "my_stemmer"]
}
},
"filter" : {
"my_stemmer" : {
"type" : "stemmer",
"name" : "english"
}
}
}
}
I noticed that while analyzing, the stemmer replaces the original token with the stemmed word. Is there a way to index both the original string and the stemmed token?
What you are asking for is a "preserve_original" parameter for the stemmer token filter.
You will find "preserve_original" on, e.g., the word delimiter token filter, but not on the stemmer token filter.
If you need the original word, e.g. for aggregations, you can copy the field to another one with a suitable analyzer.
If you need the original at the same position in your index, you have to wrap the stemmer and build your own analyzer as a plugin.
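A rough sketch of the "copy the field" approach using a multi-field, so that one sub-field stays unstemmed. The field names content and content.unstemmed are invented for illustration, and the syntax follows the older string mapping used elsewhere on this page:
"mappings" : {
    "my_type" : {
        "properties" : {
            "content" : {
                "type" : "string",
                "analyzer" : "my_analyzer",
                "fields" : {
                    "unstemmed" : {
                        "type" : "string",
                        "analyzer" : "standard"
                    }
                }
            }
        }
    }
}
Queries and aggregations against content then see the stemmed tokens, while content.unstemmed keeps the original word forms.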

How to build an Elasticsearch phrase query that matches text with special characters?

During the last few days I've been playing around with Elasticsearch indexing and searching, and I've been able to build the different queries that I intended to. My problem right now is building a query that can match text with special characters even if I don't type them in the "search bar". I'll give an example to explain what I mean.
Imagine you have a document indexed that contains a field called page content. Inside this field, you can have a part of the text such as
"O carro do João é preto." (means João's car is black in portuguese)
What I want to be able to do is type something like:
O carro do joao e preto
and still be able to get the proper match.
What I've tried so far:
I've been using the match phrase query provided in the documentation of elasticsearch (here) such as the example below:
GET _search
{
    "query": {
        "match_phrase": {
            "page content": {
                "query": "o carro do joao e preto"
            }
        }
    }
}
This query gives me 0 hits, which is perfectly understandable given that the query text differs from what has been stored in that document.
I've tried setting up the ASCII folding token filter (here) but I'm not sure how to use it. What I've basically done is create a new index with this request:
PUT /newindex '
{
    "page content": "O carro do João é preto",
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "default" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "my_ascii_folding"]
                }
            },
            "filter" : {
                "my_ascii_folding" : {
                    "type" : "asciifolding",
                    "preserve_original" : true
                }
            }
        }
    }
}'
Then if I try to query, using the match_phrase query provided above, like this:
O carro do joao e preto
it should show me the correct result, as I wanted. But it isn't working for me. Am I forgetting something? I've been at this for the last two days without success and I feel like I'm missing something.
So, the question: what do I have to do to get the desired matching?
I managed to find the answer to my own question. I had to change the analyzer a little bit when I created the index. Further details are in this previous answer.
My code now:
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "default" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "lowercase", "asciifolding"]
                },
                "text" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "lowercase"],
                    "char_filter" : "html_strip"
                },
                "sortable" : {
                    "tokenizer" : "keyword",
                    "filter" : ["lowercase"],
                    "char_filter" : "html_strip"
                }
            }
        }
    }
}
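With lowercase and asciifolding in the default analyzer, "João" and "joao" are indexed and searched as the same token, so a phrase query without accents should now match. An untested sketch, reusing the index and field names from the question:
GET /newindex/_search
{
    "query" : {
        "match_phrase" : {
            "page content" : {
                "query" : "o carro do joao e preto"
            }
        }
    }
}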

How to make sure elasticsearch is using the analyzers defined on the mappings?

I have an index in Elasticsearch with several custom analyzers for specific fields. Example:
"titulo" : {
"type" : "string",
"index_analyzer" : "analyzer_titulo",
"search_analyzer" : "analyzer_titulo"
}
analyzer_titulo is this:
"analyzer_titulo":{
"filter":[
"standard",
"lowercase",
"asciifolding",
"stop_filter",
"filter_shingle",
"stemmer_filter"
],
"char_filter":[
"html_strip"
],
"tokenizer":"standard"
}
However, when I try to use the _analyze API to test the analyzer for this field, Elasticsearch seems to ignore the custom analyzer.
As you can see, the two results are different but, if my understanding is correct, they should be the same.
What am I missing here? Is there a way to use the _explain API to see which analyzer is used?
PS: unfortunately I can't post my full mappings (company policy), but I only have one index and one type.
Thanks
I'm not familiar with the tool you're using to test your analyser (I don't know why it's not working), but what you can do is run a query that returns the values sitting in the index:
curl 'http://localhost:9200/myindex/livros/_search?pretty=true' -d '{
    "query" : {
        "match_all" : { }
    },
    "script_fields" : {
        "terms" : {
            "script" : "doc[field].values",
            "params" : {
                "field" : "titulo"
            }
        }
    }
}'
If your type has many documents in it, you'll want to change the match_all: {} to something more specific.
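As a side note, the _analyze API can also be pointed at a field instead of a named analyzer, in which case Elasticsearch resolves the analyzer from the mapping; that is a useful cross-check that analyzer_titulo is actually wired up. A sketch, using an arbitrary sample text:
# field-aware _analyze call; the text after -d is just arbitrary sample input
curl 'http://localhost:9200/myindex/_analyze?field=titulo&pretty=true' -d 'Os livros da biblioteca'
If the tokens returned here differ from what a plain _analyze call (without the field parameter) returns, the custom analyzer is being picked up from the mapping.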

Elasticsearch phrase prefix query on multiple fields

I'm new to ES and I'm trying to build a query that uses phrase_prefix on multiple fields so I don't have to search more than once.
Here's what I've got so far:
{
    "query" : {
        "text" : {
            "first_name" : {
                "query" : "Gustavo",
                "type" : "phrase_prefix"
            }
        }
    }
}
Does anybody know how to search on more than one field, say "last_name"?
The text query that you are using was deprecated (effectively renamed) a while ago in favour of the match query. The match query supports a single field, but you can use the multi_match query, which supports the very same options and allows you to search on multiple fields. Here is an example that should be helpful to you:
{
    "query" : {
        "multi_match" : {
            "fields" : ["title", "subtitle"],
            "query" : "trying out ela",
            "type" : "phrase_prefix"
        }
    }
}
You can achieve the same using the Java API like this:
QueryBuilders.multiMatchQuery("trying out ela", "title", "subtitle")
.type(MatchQueryBuilder.Type.PHRASE_PREFIX);
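Applied to the fields from the question, the query would look roughly like this (an untested sketch; the shortened query term is made up just to show the prefix behaviour):
{
    "query" : {
        "multi_match" : {
            "fields" : ["first_name", "last_name"],
            "query" : "Gus",
            "type" : "phrase_prefix"
        }
    }
}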
