I need to remove stop words from the query in Elasticsearch. I am able to apply an analyzer at index level, but how do I apply an analyzer at query (search) level in Elasticsearch?
You have to configure your Elasticsearch mappings to add a search_analyzer to the fields whose query terms you want analyzed at search time.
For example:
{
  "service": {
    "_source": { "enabled": true },
    "properties": {
      "name": { "type": "string", "index": "not_analyzed" },
      "name_snow": {
        "type": "string",
        "search_analyzer": "simple_analyzer",
        "index_analyzer": "snowball_analyzer"
      }
    }
  }
}
When you query on this field, the entered terms will be analyzed first, and only then will the query be run against the shards.
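To make the stop-word part concrete: a minimal sketch, assuming a legacy (pre-2.x) cluster to match the index_analyzer syntax above, with a made-up my_stop_analyzer that strips stop words from query terms:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "service": {
      "properties": {
        "name_snow": {
          "type": "string",
          "index_analyzer": "snowball",
          "search_analyzer": "my_stop_analyzer"
        }
      }
    }
  }
}

With this, a query like "the red car" is analyzed into the terms red and car before the search runs, while indexing still uses the snowball analyzer.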
I am having a dilemma on Liferay Portal 7.3.7 with case-insensitive and diacritics-free search through Elasticsearch in JournalArticles with custom DDM fields. Liferay generated field mappings in Configuration->Search like this:
...
},
"localized_name_sk_SK_sortable": {
  "store": true,
  "type": "keyword"
},
...
I would like to have these *_sortable fields usable for case-insensitive and diacritics-free searching, so I tried to add an analyzer and a normalizer to the Liferay advanced search configuration in System Settings->Search->Elasticsearch 7, like this:
{
  "analysis": {
    "analyzer": {
      "ascii_analyzer": {
        "tokenizer": "standard",
        "filter": ["asciifolding", "lowercase"]
      }
    },
    "normalizer": {
      "ascii_normalizer": {
        "type": "custom",
        "char_filter": [],
        "filter": ["lowercase", "asciifolding"]
      }
    }
  }
}
After that, I overrode the mapping for template_string_sortable:
{
  "template_string_sortable": {
    "mapping": {
      "analyzer": "ascii_analyzer",
      "normalizer": "ascii_normalizer",
      "store": true,
      "type": "keyword"
    },
    "match_mapping_type": "string",
    "match": "*_sortable"
  }
}
After reindexing, my sortable fields look like this:
...
},
"localized_name_sk_SK_sortable": {
  "normalizer": "ascii_normalizer",
  "store": true,
  "type": "keyword"
},
...
Next, I tried to create new content for my DDM structure, but all my sortable fields look the same, like this:
"localized_title_sk_SK": "test diakrity časť 1 ľščťžýáíéôň title",
"localized_title_sk_SK_sortable": "test diakrity časť 1 ľščťžýáíéôň title",
but I need that sortable field without the national characters, so that I can, for example, find "cast 1" through a wildcardQuery on localized_title_sk_SK_sortable, and so on... Thanks for any advice (maybe I just have the wrong approach to the whole problem? I am really new to ES).
First of all, it would be better to apply the asciifolding filter first and then the lowercase filter, but keep in mind that these filters only affect the indexed tokens; your _source data won't change just because you applied an analyzer or normalizer to the field.
If you need to manipulate the data before ingesting it, you can use the ingest pipeline feature of Elasticsearch; for more information, check here.
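A minimal sketch of that idea, with a hypothetical pipeline name and the field from the question (as far as I know there is no built-in asciifolding processor, so folding diacritics in _source would need a script processor or pre-processing outside Elasticsearch):

PUT _ingest/pipeline/sortable_normalize
{
  "description": "lowercase the sortable field before it is stored in _source",
  "processors": [
    {
      "lowercase": {
        "field": "localized_title_sk_SK_sortable",
        "ignore_missing": true
      }
    }
  ]
}

Indexing documents with ?pipeline=sortable_normalize then changes the stored _source value itself, which a normalizer alone never does.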
Mapping:
{
  "articles": {
    "mappings": {
      "data": {
        "properties": {
          "author": {
            "type": "text",
            "analyzer": "standard"
          },
          "content": {
            "type": "text",
            "analyzer": "english"
          },
          "tags": {
            "type": "keyword"
          },
          "title": {
            "type": "text",
            "analyzer": "english"
          }
        }
      }
    }
  }
}
Example data:
{
  "author": "John Smith",
  "title": "Hello world",
  "content": "This is some example article",
  "tags": ["programming", "life"]
}
So, as you see, I have a mapping with different analysers on different fields. Now I want to search across those fields in the following way:
only documents matching all search keywords are returned (like multi_match with type cross_fields and operator and)
the query should be fuzzy so it can tolerate some typos
different fields should have different boost values (e.g. title more important than content)
For example following query should match above document:
programing worlds john examlpe
How can I do it? According to the documentation, fuzziness works neither with cross_fields nor across fields with different analysers.
One way of doing it would be to implement a custom _all field and copy all values there using copy_to, but with this approach I can't assign different weights or use different analysers.
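One workaround that is sometimes used (a sketch, not something from the original post): split the query into terms on the client side and AND together one fuzzy multi_match per term. This emulates cross_fields with the and operator while keeping fuzziness and per-field boosts:

GET articles/_search
{
  "query": {
    "bool": {
      "must": [
        { "multi_match": { "query": "programing", "fields": ["title^3", "content", "author^2"], "fuzziness": "AUTO" } },
        { "multi_match": { "query": "worlds", "fields": ["title^3", "content", "author^2"], "fuzziness": "AUTO" } },
        { "multi_match": { "query": "john", "fields": ["title^3", "content", "author^2"], "fuzziness": "AUTO" } },
        { "multi_match": { "query": "examlpe", "fields": ["title^3", "content", "author^2"], "fuzziness": "AUTO" } }
      ]
    }
  }
}

Each clause uses the default best_fields type, which does accept fuzziness; the boost values (^3, ^2) are only illustrative.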
Say that I have a field "productTitle" which I want my users to use to search for products.
I also want to apply autocomplete functionality, so I'm using an autocomplete_analyzer with the following filter:
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10
}
However, when users actually make a search I don't want the edge_ngram to be applied, since it produces a lot of irrelevant results.
For example, when users want to search for "mi" and start typing "m", "mi", ... they should get the results starting with m and mi as autocomplete options. However, when they actually run the query, they should only get results containing the word "mi". Currently they also see results with "mini", etc.
Therefore, is it possible to have "productTitle" indexed using two different analyzers? Is the multi-field type an option for me?
EDIT: Mapping for productTitle
"productTitle" : {
"type" : "string",
"index_analyzer" : "second",
"search_analyzer" : "standard",
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
,
"second" analyzer
"analyzer": {
"second": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"trim",
"autocomplete_filter"
]
}
So when I'm querying with:
"filtered" : {
"query" : {
"match" : {
"productTitle" : {
"query" : "mi",
"type" : "boolean",
"minimum_should_match" : "2<75%"
}
}
}
}
I also get results like "mini", but I need to get only the results containing just "mi".
Thank you.
Hmm... as far as I know, there is no way to apply multiple analyzers to the same field... what you can do is use "Multi Fields".
Here is an example of how to apply different analyzers to subfields:
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html#_multi_fields_with_multiple_analyzers
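In short, the linked approach looks roughly like this (a sketch in the legacy string syntax this thread uses; the subfield name is made up):

"productTitle": {
  "type": "string",
  "analyzer": "standard",
  "fields": {
    "autocomplete": {
      "type": "string",
      "analyzer": "second"
    }
  }
}

You would then query productTitle.autocomplete while the user is typing, and plain productTitle for the final search.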
The correct way of preventing what you describe in your question is to specify both analyzer and search_analyzer in your field mapping, like this:
"productTitle": {
"type": "string",
"analyzer": "autocomplete_analyzer",
"search_analyzer": "standard"
}
The autocomplete analyzer will kick in at indexing time and tokenize your title according to your edge_ngram configuration, while the standard analyzer will kick in at search time without applying any edge_ngram.
In this context, there is no need for multi-fields unless you need to tokenize the productTitle field in different ways.
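For completeness, the analyzer referenced above must also be defined in the index settings; a sketch reusing the asker's own autocomplete_filter (the surrounding settings block is assumed):

"settings": {
  "analysis": {
    "filter": {
      "autocomplete_filter": {
        "type": "edge_ngram",
        "min_gram": 2,
        "max_gram": 10
      }
    },
    "analyzer": {
      "autocomplete_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "autocomplete_filter"]
      }
    }
  }
}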
I'm trying to perform wildcard queries over the _all field. An example query could be:
GET index/type/_search
{
  "from": 0,
  "size": 1000,
  "query": {
    "bool": {
      "must": {
        "wildcard": {
          "_all": "*tito*"
        }
      }
    }
  }
}
The thing is that, to use a wildcard query this way, the _all field needs to be not_analyzed, otherwise the query won't work. See the ES documentation for more info.
I tried to set the mappings over the _all field using this request:
PUT index
{
  "mappings": {
    "type": {
      "_all": {
        "enabled": true,
        "index_analyzer": "not_analyzed",
        "search_analyzer": "not_analyzed"
      },
      "_timestamp": {
        "enabled": "true"
      },
      "properties": {
        "someProp": {
          "type": "date"
        }
      }
    }
  }
}
But I'm getting the error "analyzer [not_analyzed] not found for field [_all]".
I want to know what I'm doing wrong, and whether there is another (better) way to perform this kind of query.
Thanks.
Have you tried removing:
"search_analyzer": "not_analyzed"
Also, I wonder how well a wildcard across all properties will scale. Have you looked into NGrams? See the docs here.
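To sketch the nGram idea (the analyzer and filter names are made up; indexing substrings turns the wildcard into a plain match):

"analysis": {
  "filter": {
    "substring_filter": {
      "type": "nGram",
      "min_gram": 3,
      "max_gram": 8
    }
  },
  "analyzer": {
    "substring_analyzer": {
      "type": "custom",
      "tokenizer": "standard",
      "filter": ["lowercase", "substring_filter"]
    }
  }
}

A field indexed with substring_analyzer would then match a simple match query for "tito" without any wildcard.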
Most probably you wanted to set the option
"index": "not_analyzed"
The index attribute of a string field (and _all is a string field) determines whether that field is analyzed or not.
"search_analyzer" determines which analyzer is used for the user-entered query; it only applies if the index attribute is set to analyzed.
"index_analyzer" determines which analyzer is used for documents; again, it only applies if the index attribute is set to analyzed.
I have some data that I'm aggregating with Elasticsearch 1.5.2, and when I do a terms aggregation on a field like city, the buckets don't match the full strings from the field. E.g., if city is St. Louis, then one bucket is St. and the other Louis. Does anyone know how to make sure that when it aggregates it goes into a single St. Louis bucket?
Note: this may be caused by the data being analyzed, which I'm pretty sure breaks up strings when comparing, searching, etc.
You're correct. So you simply need to expose a not_analyzed version of your city field, using a multi-field mapping like this:
{
  "your_type": {
    "properties": {
      "city": {
        "type": "string",
        "index": "analyzed",
        "fields": {
          "raw": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
}
And then you can simply run your aggregation on the city.raw field (which contains the un-analyzed value, i.e. St. Louis) instead of city, which is analyzed and breaks up the content into several tokens (i.e. st and louis).
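For example (the index name is assumed):

GET your_index/_search
{
  "size": 0,
  "aggs": {
    "cities": {
      "terms": { "field": "city.raw" }
    }
  }
}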
If you know in advance that you're never going to need the analyzed field, you can simply store the not_analyzed field like this (i.e. with no need for the fields part declaring a multi-field):
{
  "your_type": {
    "properties": {
      "city": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}