Achieve same search result for synonym in elasticsearch - elasticsearch

For example, I have two entities called Project and Technology. Each instance of Project has a ManyToOne relationship with the Technology entity. Now some Projects have JavaScript, some have javascript, and some have JS, and I am searching projects using elasticsearch.
What would be a feasible way to ensure that a user searching with any of javascript, JavaScript, or JS gets the same search results?

This is a task for synonyms; you need to apply a synonym filter.
It could be done with something like this:
PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": ["synonym"]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "analysis/synonym.txt"
          }
        }
      }
    }
  }
}
The synonym.txt file should contain the data, in your case:
javascript, JavaScript, JS
This means these words are treated as synonyms: when a user searches by any of them in the field, the query will be expanded, provided you're using a match query.
After these changes, I would recommend reindexing your data.
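If shipping a file to every data node is inconvenient, the same setup can be sketched with inline synonyms instead of synonyms_path (index name as above; behaviour assumed for a recent ES version), and the result can be checked with the _analyze API:

```json
PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": ["synonym"]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": ["javascript, JavaScript, JS"]
          }
        }
      }
    }
  }
}

GET /test_index/_analyze
{
  "analyzer": "synonym",
  "text": "JS"
}
```

The _analyze response should show the token expanded to all three variants, which confirms the filter is wired in before you reindex.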

Related

Liferay portal 7.3.7 case insensitive, diacritics free with ElasticSearch

I am having a dilemma on Liferay Portal 7.3.7 with case-insensitive and diacritics-free search through elasticsearch in JournalArticles with custom ddm fields. Liferay generated field mappings in Configuration->Search like this:
...
},
"localized_name_sk_SK_sortable" : {
  "store" : true,
  "type" : "keyword"
},
...
I would like to have these *_sortable fields usable for case-insensitive and diacritics-free searching, so I tried to add an analyzer and a normalizer to the Liferay search advanced configuration in System Settings->Search->Elasticsearch 7 like this:
{
  "analysis": {
    "analyzer": {
      "ascii_analyzer": {
        "tokenizer": "standard",
        "filter": ["asciifolding", "lowercase"]
      }
    },
    "normalizer": {
      "ascii_normalizer": {
        "type": "custom",
        "char_filter": [],
        "filter": ["lowercase", "asciifolding"]
      }
    }
  }
}
After that, I overrode the mapping for template_string_sortable:
{
  "template_string_sortable": {
    "mapping": {
      "analyzer": "ascii_analyzer",
      "normalizer": "ascii_normalizer",
      "store": true,
      "type": "keyword"
    },
    "match_mapping_type": "string",
    "match": "*_sortable"
  }
}
After reindexing, my sortable fields look like this:
...
},
"localized_name_sk_SK_sortable" : {
  "normalizer" : "ascii_normalizer",
  "store" : true,
  "type" : "keyword"
},
...
Next, I tried to create new content for my ddm structure, but all my sortable fields look the same, like this:
"localized_title_sk_SK": "test diakrity časť 1 ľščťžýáíéôň title",
"localized_title_sk_SK_sortable": "test diakrity časť 1 ľščťžýáíéôň title",
but I need that sortable field without national characters, so that, for example, I can match "cast 1" through a wildcardQuery on localized_title_sk_SK_sortable, and so on... Thanks for any advice (maybe I just have the wrong approach to the whole problem? I am really new to ES).
First of all, it would be better to apply the asciifolding filter and then the lowercase filter. But keep in mind that these filters only affect the indexed tokens: your _source data won't be changed just because you applied an analyzer or normalizer to the field.
If you need to manipulate the data before ingesting it, you can use the ingest pipeline feature in Elasticsearch; see the documentation for more information.
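A minimal ingest-pipeline sketch (the pipeline name is made up, the field name is taken from the question): the built-in lowercase processor changes the value before it is stored, so the change is visible in _source; folding diacritics at ingest time would additionally require something like a script processor, since there is no built-in asciifolding processor.

```json
PUT _ingest/pipeline/lowercase_sortable
{
  "description": "Lowercase the sortable field before it is stored in _source",
  "processors": [
    {
      "lowercase": {
        "field": "localized_title_sk_SK_sortable"
      }
    }
  ]
}

PUT my_index/_doc/1?pipeline=lowercase_sortable
{
  "localized_title_sk_SK_sortable": "Test Diakrity ČASŤ 1"
}
```

With this, the stored _source value is already lowercased, independently of any analyzer or normalizer on the field.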

Elasticsearch stem_override filter with big inline list of rules

I want to add a big inline list of rules to the stemmer_override filter (see https://www.elastic.co/guide/en/elasticsearch/reference/5.6/analysis-stemmer-override-tokenfilter.html).
My index settings look like this:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "custom_stems", "porter_stem"]
        }
      },
      "filter": {
        "custom_stems": {
          "type": "stemmer_override",
          "rules": [
            "running => run",
            "stemmer => stemmer",
            ... // 200 000 rules
          ]
        }
      }
    }
  }
}
When I send this request to ES, it runs for so long that I never receive a response, and no index is ever created.
There is a solution with the rules stored in a file in the ES config folder. The filter configuration is then:
"filter": {
  "custom_stems": {
    "type": "stemmer_override",
    "rules_path": "analysis/stemmer_override.txt"
  }
}
In this case everything works fine, but this is not what I need: I will not have access to the ES filesystem, so I need to be able to create new settings via a request only.
Are there any solutions for making ES process such huge requests (around 4 MB) quickly?
Thanks

Elasticsearch. How to find phrases if query has no spaces

For example, I have a document with the phrase "Star wars" in the name field.
I would like to make a search with the DSL using the query "starwars" and get this document back.
I am trying something like this:
GET _search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "starwars"
      }
    }
  }
}
How can I do it with elasticsearch?
I think you would need to update the analyzer on that name field with a custom analyzer that includes the synonym token filter, with a synonym for starwars.
The docs on creating a custom analyzer should help you out. Additionally, the standard analyzer is applied by default if you did not specify any analyzer for that name field in your mapping. You can base your custom analyzer on that and add the synonym token filter to its array of filters. Perhaps give some more thought to how you want the content to be analyzed for your other requirements as well.
With this analyzer update you should be able to use that query and get the result you expect.
Example:
{
  "filter": {
    "my_synonym": {
      "type": "synonym",
      "synonyms": [
        "star wars => starwars"
      ]
    }
  },
  "analyzer": {
    "standard_with_synonym": {
      "tokenizer": "standard",
      "filter": ["standard", "lowercase", "my_synonym", "stop"]
    }
  }
}
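Wired together, a minimal full index definition might look like this (the index name is made up; the deprecated standard token filter is omitted, assuming a newer ES version). With it, the match_phrase query from the question should return the document:

```json
PUT /movies
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "synonyms": ["star wars => starwars"]
        }
      },
      "analyzer": {
        "standard_with_synonym": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonym", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "standard_with_synonym"
      }
    }
  }
}
```

At index time "Star wars" is rewritten to the single token "starwars", so the query "starwars" matches as a phrase.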

How to make sure elasticsearch is using the analyzers defined on the mappings?

I have an index in elasticsearch with several custom analyzers for specific fields. Example:
"titulo": {
  "type": "string",
  "index_analyzer": "analyzer_titulo",
  "search_analyzer": "analyzer_titulo"
}
analyzer_titulo is this:
"analyzer_titulo": {
  "filter": [
    "standard",
    "lowercase",
    "asciifolding",
    "stop_filter",
    "filter_shingle",
    "stemmer_filter"
  ],
  "char_filter": [
    "html_strip"
  ],
  "tokenizer": "standard"
}
However, when I try to use the _analyze API to test the analyzer for this field, elasticsearch seems to ignore the custom analyzer.
As you can see, both results are different, but if my understanding is correct, they should be the same.
What am I missing here? Is there a way to use the _explain API to see which analyzer is used?
PS: unfortunately I can't post my full mappings (company policy), but I only have one index and one type.
Thanks
I'm not familiar with the tool you're using to test your analyser (so I don't know why it's not working), but what you can do is run a query that returns the values sitting in the index:
curl 'http://localhost:9200/myindex/livros/_search?pretty=true' -d '{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "terms": {
      "script": "doc[field].values",
      "params": {
        "field": "titulo"
      }
    }
  }
}'
If your type has many documents in it, you'll want to change the match_all: {} to something more specific.
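Another option, since the question is about the _analyze API: passing a field parameter makes elasticsearch pick up the analyzer assigned to that field in the mapping, rather than the default standard analyzer. A sketch against the 1.x-era API the question appears to use (the sample text is arbitrary):

```shell
curl 'http://localhost:9200/myindex/_analyze?field=titulo&pretty=true' \
  -d 'Um título de exemplo'
```

If the output tokens here match the terms returned by the script_fields query above, the mapping's custom analyzer is indeed being applied at index time.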

How to create alias for dynamic fields in elasticsearch dynamic templates?

I am using elasticsearch 1.0.2 with a sample dynamic template in my index. Is there any way we can derive the field index name from a part of the dynamic field name?
This is my template
{
  "dynamic_templates": [
    {
      "dyn_string_fields": {
        "match": "dyn_string_*",
        "match_mapping_type": "string",
        "mapping": {
          "type": "string",
          "index": "analyzed",
          "index_name": "{name}"
        }
      }
    }
  ]
}
The dynamic templates work and I am able to add fields. Our goal is to add fields with the "dyn_string_" prefix, but while searching it should be just the field name without the "dyn_string_" prefix. I tested using match_mapping_type to add fields, but that allows any field to be added. Does someone have any suggestions?
I looked at the Elasticsearch API; there is a transform feature in 1.3 which allows modifying the document before insertion (unfortunately I will not be able to upgrade to that version).
Several aliases can be set in a single template. For a quick look, consider this dummy example:
curl -XPUT localhost:9200/_template/test_template -d '
{
  "template": "test_*",
  "settings": {
    "number_of_shards": 4
  },
  "aliases": {
    "name_for_alias": {}
  },
  "mappings": {
    "type": {
      "properties": {
        "id": {
          "type": "integer",
          "include_in_all": false
        },
        "test_user_id": {
          "type": "integer",
          "include_in_all": false
        }
      }
    }
  }
}
'
Here, "name_for_alias" is your simple alias. An alias can also define a preset filter if you want to use it for filtering data.
More information can be found here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html
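For example, a filtered alias could be sketched like this inside the template's aliases section (the alias name and the value 42 are arbitrary; test_user_id is the field mapped above):

```json
"aliases": {
  "name_for_alias": {},
  "one_users_entries": {
    "filter": {
      "term": { "test_user_id": 42 }
    }
  }
}
```

Searching through one_users_entries would then transparently apply the term filter to every query.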