Multi-field search for synonym in the query string - elasticsearch

It looks like Elasticsearch does not take field analyzers into account for a multi-field search with a query string query when no field is specified.
Can this be configured for the index, or specified in the query?
Here is a hands-on example.
Given the files from a commit (spring-data-elasticsearch):
There is a test, SynonymRepositoryTests, which will pass with the QueryBuilders.queryStringQuery("text:british") and QueryBuilders.queryStringQuery("british").analyzer("synonym_analyzer") queries.
Is it possible to make it pass with a QueryBuilders.queryStringQuery("british") query, without specifying a field or an analyzer for the query?

You could query without specifying fields or analyzers. By default, the query string query runs against the _all field, which is a combination of all fields and uses the standard analyzer, so QueryBuilders.queryStringQuery("british") will work.
You can exclude some fields from _all when creating the index, and you can also create a custom all-field with the help of the copy_to functionality.
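For illustration, a minimal sketch of the copy_to idea (index, type, and field names here are hypothetical): every value of text is copied into custom_all, which carries the synonym analyzer and can serve as the default field for query string queries.
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "string",
          "copy_to": "custom_all"
        },
        "custom_all": {
          "type": "string",
          "analyzer": "synonym_analyzer"
        }
      }
    }
  }
}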
UPDATE
You would have to use your custom analyzer on the _all field when creating the index.
PUT text_index
{
  "settings": {
    "analysis": {
      "filter": {
        "edge_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "prefix_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "trim",
            "edge_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "test_type": {
      "_all": {
        "enabled": true,
        "analyzer": "prefix_analyzer" <---- your synonym analyzer
      },
      "properties": {
        "name": {
          "type": "string"
        },
        "tag": {
          "type": "string",
          "analyzer": "simple"
        }
      }
    }
  }
}
You can replace prefix_analyzer with your synonym_analyzer and then it should work.
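For reference, a synonym_analyzer definition could look roughly like this (a sketch; the synonym entry is made up for illustration):
"analysis": {
  "filter": {
    "synonym_filter": {
      "type": "synonym",
      "synonyms": [
        "british, english"
      ]
    }
  },
  "analyzer": {
    "synonym_analyzer": {
      "tokenizer": "standard",
      "filter": [
        "lowercase",
        "synonym_filter"
      ]
    }
  }
}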

Related

How to use a custom analyser on specific elasticsearch documents

Suppose I have a custom analyser that I want to use only on specific documents whose entity_type is table. How would I go about that?
Document I want to match:
{
  ... other keys
  "_source": {
    "entity_type": "table" // <-- I want to match this and use the custom analyser on this entire document
  }
}
Custom analyser (currently just set as the default, but I want it to affect only tables):
elasticsearch.indices.create(
    index="myIndex",
    body={
        "settings": {
            "analysis": {
                "char_filter": {
                    "underscore_to_dash": {
                        "type": "mapping",
                        "mappings": ["_ => -"],
                    }
                },
                "analyzer": {
                    "default": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                        "char_filter": ["underscore_to_dash"],
                    }
                },
            },
        }
    })
An analyzer can only be applied to a specific field, not conditionally per document. So in your case it might make sense to have two fields: one used for documents with "entity_type": "table" and another for the other docs, roughly like the sketch below.
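A minimal sketch of that idea, assuming a recent ES version without mapping types and the custom analyzer registered under a hypothetical name (underscore_analyzer) instead of default:
PUT myIndex
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "fields": {
          "table": {
            "type": "text",
            "analyzer": "underscore_analyzer" // hypothetical name for the custom analyser
          }
        }
      }
    }
  }
}
At query time you would then match against content.table for documents whose entity_type is table, and against content otherwise.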

Search for parts of a string in _id field in an existing Elasticsearch index

Hi,
I am working with an existing Elasticsearch index, trying to search for a string in the _id field.
The _id in this index consists of two concatenated strings, and I need to be able to search for the second part of that string.
After reading the documentation I found that I should probably use an ngram filter to search for a substring, but I can't make this work properly.
I found an example online from someone who was trying to do the same, so I updated my index with the following:
PUT /"myIndex"
{"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"partial_filter": {
"type": "ngram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"partial": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"partial_filter"
]
}
}
}
}}
I then tried to add this mapping:
PUT /"index"/_mapping/type2
{
"type2": {
"properties": {
"_id": {
"type": "string",
"analyzer": "partial"
}
}
}
}
That gives me an exception: "Rejecting mapping update to [bci_report_provider_s_dev-"myIndex"] as the final mapping would have more than 1 type: [type2, bci-report]"
How can I resolve this, and is there another way to do a partial search on the _id field?
Thanks a lot in advance!
Bjørn Olav Berg

Why is my Elasticsearch prefix query case-sensitive despite using lowercase filters on both index and search?

The Problem
I am working on an autocompleter using Elasticsearch 6.2.3. I would like my query results (a list of pages with a Name field) to be ordered using the following priority:
Prefix match at start of "Name" (Prefix query)
Any other exact (whole word) match within "Name" (Term query)
Fuzzy match (this is currently done on a different field from Name using an ngram tokenizer ... so I assume it cannot be relevant to my problem, but I would like to apply this to the Name field as well)
My Attempted Solution
I will be using a Bool/Should query consisting of three queries (corresponding to the three priorities above), using boost to define relative importance.
The issue I am having is with the Prefix query - it appears to not be lowercasing the search query despite my search analyzer having the lowercase filter. For example, the below query returns "Harry Potter" for 'harry' but returns zero results for 'Harry':
{ "query": { "prefix": { "Name.raw" : "Harry" } } }
I have verified using the _analyze API that both my analyzers do indeed lowercase the text "Harry" to "harry". Where am I going wrong?
From the ES documentation I understand I need to analyze the Name field in two different ways to enable use of both Prefix and Term queries:
using the "keyword" tokenizer to enable the Prefix query (I have applied this on a .raw field)
using a standard analyzer to enable the Term (I have applied this on the Name field)
I have checked duplicate questions such as this one but the answers have not helped
My mapping and settings are below
ES Index Mapping
{
  "myIndex": {
    "mappings": {
      "pages": {
        "properties": {
          "Id": {},
          "Name": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "text",
                "analyzer": "keywordAnalyzer",
                "search_analyzer": "pageSearchAnalyzer"
              }
            },
            "analyzer": "pageSearchAnalyzer"
          },
          "Tokens": {} // Other fields not important for this question
        }
      }
    }
  }
}
ES Index Settings
{
  "myIndex": {
    "settings": {
      "index": {
        "analysis": {
          "filter": {
            "ngram": {
              "type": "edgeNGram",
              "min_gram": "2",
              "max_gram": "15"
            }
          },
          "analyzer": {
            "keywordAnalyzer": {
              "filter": [
                "trim",
                "lowercase",
                "asciifolding"
              ],
              "type": "custom",
              "tokenizer": "keyword"
            },
            "pageSearchAnalyzer": {
              "filter": [
                "trim",
                "lowercase",
                "asciifolding"
              ],
              "type": "custom",
              "tokenizer": "standard"
            },
            "pageIndexAnalyzer": {
              "filter": [
                "trim",
                "lowercase",
                "asciifolding",
                "ngram"
              ],
              "type": "custom",
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "l2AXoENGRqafm42OSWWTAg",
        "version": {}
      }
    }
  }
}
Prefix queries don't analyze the search term, so the text you pass in bypasses whatever would be used as the search analyzer (in your case, the configured search_analyzer: pageSearchAnalyzer). Harry is therefore evaluated as-is, directly against the keyword-tokenized, custom-filtered harry potter that the keywordAnalyzer produced at index time.
In your case, you'll need to do one of two things:
Since you're using a lowercase filter on the field, you could always use lowercase terms in your prefix query, lowercasing application-side if necessary (see the sketch after this list)
Run a match query against an edge_ngram-analyzed field instead of a prefix query, as described in the ES search_analyzer docs
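For the first option, the query from the question would simply become (with the term lowercased before the query is built):
{ "query": { "prefix": { "Name.raw" : "harry" } } }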
Here's an example of the latter:
1) Create the index w/ ngram analyzer and (recommended) standard search analyzer
PUT my_index
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "ngram": {
            "type": "edgeNGram",
            "min_gram": "2",
            "max_gram": "15"
          }
        },
        "analyzer": {
          "pageIndexAnalyzer": {
            "filter": [
              "trim",
              "lowercase",
              "asciifolding",
              "ngram"
            ],
            "type": "custom",
            "tokenizer": "keyword"
          }
        }
      }
    }
  },
  "mappings": {
    "pages": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "pageIndexAnalyzer",
              "search_analyzer": "standard"
            }
          }
        }
      }
    }
  }
}
2) Index some sample docs
POST my_index/pages/_bulk
{"index":{}}
{"name":"Harry Potter"}
{"index":{}}
{"name":"Hermione Granger"}
3) Run a match query against the ngram field
POST my_index/pages/_search
{
  "query": {
    "match": {
      "name.ngram": {
        "query": "Har",
        "operator": "and"
      }
    }
  }
}
I think it is better to use a match_phrase_prefix query without the .keyword suffix. Check the docs here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase-prefix.html
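A minimal sketch of that approach, run against the Name field from the question:
POST my_index/pages/_search
{
  "query": {
    "match_phrase_prefix": {
      "Name": "harry pot"
    }
  }
}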

Finding exact value with synonym applied in elasticsearch

I have synonyms set up for certain fields; however, I would like them to be applied to fields I map as not_analyzed.
For example, I have a field foo storing seksyen 10 which is not_analyzed, and a synonym entry is added for seksyen, section (dealing with mixed languages within a document).
'foo': {'type': 'string', 'index': 'not_analyzed'}
Suppose a user issues the query
{"term": {"foo": "section 10"}}
expecting to get back foo with both seksyen 10 and section 10. However, with the current mapping I can't return seksyen 10 for that query. Also, I am doing a filtered query here because I don't want these to be returned:
whatever seksyen 10
seksyen 10, something
whatever section 10 something
I just want synonym expansion to be applied to the query, without it being specified in the query. How should I do that?
First of all, using term will not do any analysis on the searched text, so you need a different type of query.
You can do it like the following:
{
  "mappings": {
    "test": {
      "properties": {
        "foo": {
          "type": "string",
          "index": "not_analyzed",
          "search_analyzer": "synonym"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "synonym": {
          "tokenizer": "whitespace",
          "filter": [
            "synonym"
          ]
        }
      },
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms": [
            "seksyen, section"
          ]
        }
      }
    }
  }
}
So, you define a search_analyzer to be used at search time only. And then you need to give up on the term filter, otherwise it will not work:
{
  "query": {
    "match": {
      "foo": "section"
    }
  }
}
The solution above works in ES 1.x. In ES 2.x, combining a search_analyzer with a not_analyzed field is no longer possible.

How to implement case-sensitive search in elasticsearch?

I have a field in my indexed documents where I need searches to be case-sensitive. I am using the match query to fetch the results.
An example of my data document is:
{
  "name": "binoy",
  "age": 26,
  "country": "India"
}
Now when I give the following query:
{
  "query": {
    "match": {
      "name": "Binoy"
    }
  }
}
It gives me a match for "binoy" against "Binoy". I want the search to be case-sensitive. It seems that, by default, Elasticsearch is case-insensitive. How do I make the search case-sensitive in Elasticsearch?
In the mapping you can define the field as not_analyzed.
curl -X PUT "http://localhost:9200/sample" -d '{
  "index": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'
echo
curl -X PUT "http://localhost:9200/sample/data/_mapping" -d '{
  "data": {
    "properties": {
      "name": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}'
Now if you index and search as normal, the field won't be analyzed, which ensures an exact, case-sensitive match.
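To verify (a sketch, using the sample document from the question): a term query with the exact casing matches, while "Binoy" would return no hits:
curl -X POST "http://localhost:9200/sample/data/_search" -d '{
  "query": {
    "term": { "name": "binoy" }
  }
}'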
It depends on the mapping you have defined for your field name. If you haven't defined any mapping, then Elasticsearch will treat it as a string and use the standard analyzer (which lowercases the tokens) to generate tokens. Your query uses the same analyzer for search, hence matching is done on the lowercased input. That's why "Binoy" matches "binoy".
To solve it, you can define a custom analyzer without the lowercase filter and use it for your field name. You can define the analyzer as below:
"analyzer": {
"casesensitive_text": {
"type": "custom",
"tokenizer": "standard",
"filter": ["stop", "porter_stem" ]
}
}
You can define the mapping for name as below
"name": {
"type": "string",
"analyzer": "casesensitive_text"
}
Now you can do the search on name.
Note: the analyzer above is for example purposes; you may need to change it to suit your needs.
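For instance (a sketch), with this mapping a match query becomes case-sensitive: searching for binoy matches the sample document, while Binoy does not:
{
  "query": {
    "match": {
      "name": "binoy"
    }
  }
}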
Have your mapping like:
PUT /whatever
{
  "settings": {
    "analysis": {
      "analyzer": {
        "mine": {
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "mine"
        }
      }
    }
  }
}
meaning, no lowercase filter for that custom analyzer.
Here is the full index template which worked for my Elasticsearch 5.6:
{
  "template": "logstash-*",
  "settings": {
    "analysis": {
      "analyzer": {
        "case_sensitive": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["stop", "porter_stem"]
        }
      }
    },
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "fluentd": {
      "properties": {
        "message": {
          "type": "text",
          "fields": {
            "case_sensitive": {
              "type": "text",
              "analyzer": "case_sensitive"
            }
          }
        }
      }
    }
  }
}
As you can see, the logs come from Fluentd and are saved into a time-based index logstash-*. To make sure I can still execute wildcard queries on the message field, I put a multi-field mapping on that field. Wildcard/analyzed queries can be run against the message field, and the case-sensitive ones against the message.case_sensitive field.
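A case-sensitive search then targets the sub-field explicitly, for example (the search term here is illustrative):
GET logstash-*/_search
{
  "query": {
    "match": {
      "message.case_sensitive": "Error"
    }
  }
}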
