Overriding default keyword analysis for Elasticsearch

I am trying to configure an Elasticsearch index whose default indexing policy analyzes fields with the keyword analyzer, and then to override it on some fields so that they are free-text analyzed. So effectively opt-in free-text analysis, where I explicitly specify in the mapping which fields are analyzed for free-text matching. My mapping definition looks like this:
PUT test_index
{
  "mappings": {
    "test_type": {
      "index_analyzer": "keyword",
      "search_analyzer": "standard",
      "properties": {
        "standard": {
          "type": "string",
          "index_analyzer": "standard"
        },
        "keyword": {
          "type": "string"
        }
      }
    }
  }
}
So standard should be an analyzed field, and keyword should be exact-match only. However, when I insert some sample data with the following command:
POST test_index/test_type
{
  "standard": "a dog in a rug",
  "keyword": "sheepdog"
}
I am not getting any matches against the following query:
GET test_index/test_type/_search?q=dog
However I do get matches against:
GET test_index/test_type/_search?q=*dog*
Which makes me think that the standard field is not being analyzed. Does anyone know what I am doing wrong?

Nothing's wrong with the index you created. A query-string search without a field name runs against the _all field, which was indexed with your type-level default index_analyzer (keyword), so each document's _all content became a single token. Change your query to GET test_index/test_type/_search?q=standard:dog and it should return the expected results.
If you do not want to specify the field name in the query, update your mapping so that you provide the index_analyzer and search_analyzer values explicitly for each field, with no type-level defaults. See below:
PUT test_index
{
  "mappings": {
    "test_type": {
      "properties": {
        "standard": {
          "type": "string",
          "index_analyzer": "standard",
          "search_analyzer": "standard"
        },
        "keyword": {
          "type": "string",
          "index_analyzer": "keyword",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
Now if you try GET test_index/test_type/_search?q=dog, you'll get the desired results.
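To double-check what actually got indexed, the _analyze API is useful. A minimal sketch (the request-body form shown here works on recent versions; on older 1.x clusters the same analyzer and text parameters can be passed as query-string arguments):
GET test_index/_analyze
{
  "analyzer": "keyword",
  "text": "a dog in a rug"
}
With the keyword analyzer this returns the whole string as a single token, which is why q=dog found nothing while the wildcard q=*dog* matched inside that one token.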

Related

Cannot retrieve data which includes specific symbols in Kibana

I am trying to use Kibana to retrieve comment data that includes some specific symbols like ? and 。. They are not ordinary symbols.
I tried to use the escape character \ for them; the KQL looks like comment:\? or comment:\\?, but it doesn't work. Can anyone help?
When you create a sample doc and let ES auto-generate the mapping for you,
POST comments/_doc
{
  "comment": "?"
}
running
GET comments/_mapping
will get you
"comment": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
Now, the text type's analyzer is standard by default.
When we attempt to see how our non-standard chars got analyzed,
GET comments/_analyze
{
  "text": "?",
  "analyzer": "standard"
}
the result is
{
  "tokens" : [ ]
}
meaning we cannot search for its contents using the standard-analyzed text field. We need to:
either define a different default analyzer,
or define this analyzer in one of the comment's fields.
Going with the 2nd approach (since it's good practice to keep differently-analyzed fields separate),
PUT comments2
{
  "mappings": {
    "properties": {
      "comment": {
        "type": "text",
        "fields": {
          "whitespace_analyzed": {
            "type": "text",
            "analyzer": "whitespace"
          }
        }
      }
    }
  }
}
POST comments2/_doc
{
  "comment": "?"
}
After verifying
GET comments2/_analyze
{
  "text": "?",
  "analyzer": "whitespace"
}
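this should now return the character as a single token. A sketch of the expected response (the whitespace analyzer splits only on whitespace, so nothing gets stripped):
{
  "tokens" : [
    {
      "token" : "?",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    }
  ]
}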
we can do the following in KQL
comment.whitespace_analyzed:"?"
Note that there are a bunch of built-in analyzers to choose from but you're more than welcome to create your own.
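If you do roll your own, here is a minimal sketch of a custom analyzer definition (the index name comments3 and the analyzer name my_symbol_analyzer are hypothetical):
PUT comments3
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_symbol_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "comment": {
        "type": "text",
        "analyzer": "my_symbol_analyzer"
      }
    }
  }
}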

Full Text Search as well as Terms Search on same field of Elasticsearch

I'm from a MySQL background, so I don't know much about Elasticsearch and how it works.
Here are my requirements:
There will be a table of resulting records with a sorting option on all the columns. There will be a filter option where the user will select multiple values for multiple columns (e.g., City should be from City1, City2, City3 and Category should be from Cat2, Cat22, Cat6). There will also be a search bar where the user will enter some text, and a full-text search will be applied on some fields (i.e., City, Area, etc.).
Where I'm facing a problem is full-text search. I have tried some mappings, but every time I have to compromise either on full-text search or on terms search. So I think there is no way to apply both kinds of search to the same field. But as I said, I don't know much about Elasticsearch, so if anyone has a solution, it will be appreciated.
Here is what I have applied currently, which enables sorting and terms search, but full-text search is not working:
{
  "mappings": {
    "my_type": {
      "properties": {
        "city": {
          "type": "string",
          "index": "not_analyzed"
        },
        "category": {
          "type": "string",
          "index": "not_analyzed"
        },
        "area": {
          "type": "string",
          "index": "not_analyzed"
        },
        "zip": {
          "type": "string",
          "index": "not_analyzed"
        },
        "state": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
You can update the mapping with multi-fields: two mappings, one for full-text search and another for terms search. Here's a sample mapping for city:
{
  "city": {
    "type": "string",
    "index": "not_analyzed",
    "fields": {
      "fulltext": {
        "type": "string"
      }
    }
  }
}
The default mapping is for terms search, so when a terms search is required, you can simply query the "city" field. But when you need full-text search, the query must be performed on "city.fulltext". Hope this helps.
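For illustration, a quick sketch of both kinds of query against that mapping (the values are made up):
Terms search on the exact, not_analyzed field:
{
  "query": {
    "terms": {
      "city": ["City1", "City2", "City3"]
    }
  }
}
Full-text search on the analyzed sub-field:
{
  "query": {
    "match": {
      "city.fulltext": "new york"
    }
  }
}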
Full-text search won't work on not_analyzed fields and sorting won't work on analyzed fields.
You need to use multi-fields.
It is often useful to index the same field in different ways for different purposes. This is the purpose of multi-fields. For instance, a string field could be mapped as a text field for full-text search, and as a keyword field for sorting or aggregations:
For example :
{
  "mappings": {
    "my_type": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        },
        ...
      }
    }
  }
}
Use the dot notation to sort by city.raw:
{
  "query": {
    "match": {
      "city": "york"
    }
  },
  "sort": {
    "city.raw": "asc"
  }
}
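Putting it all together for the original requirement, a sketch of a single request that filters on several exact values, runs full-text search, and sorts (it assumes category is mapped with a raw keyword sub-field just like city):
{
  "query": {
    "bool": {
      "must": {
        "match": { "city": "york" }
      },
      "filter": [
        { "terms": { "city.raw": ["City1", "City2", "City3"] } },
        { "terms": { "category.raw": ["Cat2", "Cat22", "Cat6"] } }
      ]
    }
  },
  "sort": {
    "city.raw": "asc"
  }
}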

How to use dot notation in geo percolator

I use Elasticsearch 2.2.1 to search for documents that relate to a specific geographic location (within a bounding box). I want to create a percolator that I can use to check whether a new document matches an existing query.
This works fine if I put the percolator into the index containing the documents, but because of the issue mentioned in this document and the workaround mentioned here, I need to put the percolate queries into a dedicated percolator index.
When I try to put a percolator into this index:
PUT /mypercindex/.percolator/1
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            {
              "geo_bounding_box": {
                "location.coordinates": {
                  "bottom_right": { "lat": 50.0, "lon": 8.0 },
                  "top_left": { "lat": 54.0, "lon": 3.0 }
                }
              }
            }
          ]
        }
      }
    }
  }
}
I get an error message saying that:
Strict field resolution and no field mapping can be found for the field with name [location.coordinates]
The percolator documentation mentions that, in the case of a dedicated percolator index, you need to:
make sure that the mappings from the normal index are also available on the percolate index
This may be the cause of my issue, but I cannot find documentation about how to make the mapping from one index available in the other. I tried to create the dedicated percolator index with the same mapping as my document index, but when I do this I still get the same error message.
The mapping of my document index resembles this:
{"my_mapping": {
"dynamic":"strict",
"properties":{
"body":{
"properties":{
"author":{
"type":"string",
"index":"not_analyzed"
},
"hashtags":{
"type":"string",
"index":"not_analyzed"
},
"language":{
"type":"string",
"index":"not_analyzed"
}
,"text":{
"type":"string",
"analyzer":"stopwords"
},
"title":{
"type":"string",
"analyzer":"stopwords"
}
}
},
"location":{
"properties":{
"coordinates":{
"type":"geo_point"
},
"names":{
"type":"string",
"analyzer":"standard"
}
}
}
}
}}
Any help would be greatly appreciated!
Adding a .percolator mapping to the mapping, as mentioned in the GitHub issue that addresses this workaround, fixed the issue for me:
".percolator": {
"dynamic": true,
"properties": {
"id": {
"type": "integer"
}
}
}
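In other words, the dedicated percolator index needs to carry both the document mapping and the .percolator mapping. A minimal sketch (abbreviating my_mapping to just the field the geo query references):
PUT /mypercindex
{
  "mappings": {
    "my_mapping": {
      "properties": {
        "location": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    },
    ".percolator": {
      "dynamic": true,
      "properties": {
        "id": {
          "type": "integer"
        }
      }
    }
  }
}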

Keep non-stemmed tokens on Elasticsearch

I'm using a stemmer (for Brazilian Portuguese) when I index documents in Elasticsearch. This is what my default analyzer looks like (never mind minor mistakes here, because I've copied this by hand from my code on the server):
{
  "analysis": {
    "filter": {
      "my_asciifolding": {
        "type": "asciifolding",
        "preserve_original": true
      },
      "stop_pt": {
        "type": "stop",
        "ignore_case": true,
        "stopwords": "_brazilian_"
      },
      "stemmer_pt": {
        "type": "stemmer",
        "language": "brazilian"
      }
    },
    "analyzer": {
      "default": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "my_asciifolding",
          "stop_pt",
          "stemmer_pt"
        ]
      }
    }
  }
}
I haven't really touched my type mappings (apart from a few numeric fields I've declared as "type": "long"), so I expect most fields to be using the default analyzer I've specified above.
This works as expected, but some users are frustrated because (since tokens are being stemmed) the query "vulnerabilities" and the query "vulnerable" return the same results. This is misleading, because they expect the results with an exact match to be ranked first.
What is the standard way (if any) to do this in Elasticsearch? (Maybe keep the unstemmed tokens in the index as well as the stemmed tokens?) I'm using version 1.5.1.
I ended up using the "fields" property (multi-fields) to index my attributes in different ways. Not sure whether this is optimal, but this is the way I'm handling it right now:
Add another analyzer (I called it "no_stem_analyzer") with all the filters that the "default" analyzer has, minus the stemmer ("stemmer_pt").
For each attribute where I want to keep both the non-stemmed and stemmed variants, I did this (example for the field "DESCRIPTION"):
"mappings":{
"_default_":{
"properties":{
"DESCRIPTION":{
"type"=>"string",
"fields":{
"no_stem":{
"type":"string",
"index":"analyzed",
"analyzer":"no_stem_analyzer"
},
"stemmed":{
"type":"string",
"index":"analyzed",
"analyzer":"default"
}
}
}
},//.. other attributes here
}
}
At search time (using a query_string query) I must also indicate, via the "fields" parameter, that I want to search all the sub-fields (e.g. "DESCRIPTION.*").
I also based my approach on this answer: elasticsearch customize score for synonyms/stemming.
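A sketch of such a search-time query, with a boost on the non-stemmed sub-field so exact matches rank above stemmed ones (the index name my_index and the ^2 boost value are illustrative):
GET my_index/_search
{
  "query": {
    "query_string": {
      "query": "vulnerable",
      "fields": ["DESCRIPTION.no_stem^2", "DESCRIPTION.stemmed"]
    }
  }
}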

elasticsearch search query for exact match not working

I am using query_string for search. Searching is working fine, but it matches records with both small letters and capital letters. I want an exact, case-sensitive match.
For example:
Search field: "title"
Current output:
title
Title
TITLE
I want only the first one (title). How can I resolve this issue?
My code in Java:
QueryBuilder qbString = QueryBuilders.queryString("title").field("field_name");
You need to configure your mappings / text processing so tokens are indexed without being lowercased.
The "standard"-analyzer lowercases (and removes stopwords).
Here's an example that shows how to configure an analyzer and a mapping to achieve this: https://www.found.no/play/gist/7464654
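In case the link goes stale, the idea is a custom analyzer that tokenizes like "standard" but omits the lowercase filter. A minimal sketch (the index and analyzer names are made up; on older versions the field mapping is nested under a type name):
PUT caseindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_sensitive": {
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
Then map the relevant field with "analyzer": "case_sensitive" so its tokens keep their original case.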
With version 5+ of Elasticsearch, there is no concept of analyzed vs. not-analyzed for an index; it is driven by the field type!
The string data type is deprecated and replaced with text and keyword, so if your data type is text it will behave like the old string type and will be analyzed and tokenized.
But if the data type is defined as keyword, then it is automatically NOT analyzed and returns full exact matches.
So you should remember to map the type as keyword when you want to do exact, case-sensitive matching.
A code example for creating an index with such a definition is below:
PUT testindex
{
  "mappings": {
    "original": {
      "properties": {
        "#timestamp": {
          "type": "date"
        },
        "#version": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "APPLICATION": {
          "type": "text",
          "fields": {
            "exact": { "type": "keyword" }
          }
        },
        "type": {
          "type": "text",
          "fields": {
            "exact": { "type": "keyword" }
          }
        }
      }
    }
  }
}
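And a sketch of an exact, case-sensitive lookup against one of those keyword sub-fields (the value is illustrative):
GET testindex/_search
{
  "query": {
    "term": {
      "APPLICATION.exact": "Title"
    }
  }
}
A term query bypasses analysis entirely, so "Title" here matches only documents indexed with exactly that casing.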
