Elasticsearch Mapping and Settings definitions

I tried setting up a new index by configuring its mappings and settings.
Here is the code I used:
POST /test/text
{
  "settings": {
    "analysis": {
      "filter": {
        "greek_stop": {
          "type": "stop",
          "stopwords": "_greek_"
        },
        "greek_lowercase": {
          "type": "lowercase",
          "language": "greek"
        },
        "greek_stemmer": {
          "type": "stemmer",
          "language": "greek"
        }
      },
      "analyzer": {
        "greek": {
          "tokenizer": "standard",
          "filter": [
            "greek_lowercase",
            "greek_stop",
            "greek_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "article": {
      "properties": {
        "title": {
          "type": "string",
          "fields": {
            "greek": {
              "type": "string",
              "analyzer": "greek"
            }
          }
        },
        "content": {
          "type": "string",
          "fields": {
            "greek": {
              "type": "string",
              "analyzer": "greek"
            }
          }
        },
        "indexed_date": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}
I used POST because I didn't care much about the _id: that way ES assigns a random value instead of me having to specify one with PUT. The reason I created two fields for title and content is that I want to keep both the raw version of the text and the 'stop words removed, stemmed' version, so that I can weight a hit higher when the term is found exactly as the user entered it (instead of storing only the stemmed version of a word).
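For illustration, that extra weight can be expressed with per-field boosts in the multi_match query; a minimal sketch (the boost factor of 2 is arbitrary, and note the plain query further below does not yet apply any boost):
GET /test/_search
{
  "query": {
    "multi_match": {
      "query": "όμορφος",
      "type": "most_fields",
      "fields": ["title^2", "title.greek", "content^2", "content.greek"]
    }
  }
}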
After populating the index with data, e.g.:
POST /test/text
{
  "title": " ",
  "content": " ",
  "date": " "
}
I tried doing a search query like this:
GET /test/text/_search
{
  "query": {
    "multi_match": {
      "query": "όμορφος",
      "type": "most_fields",
      "fields": ["content", "content.greek", "title", "title.greek"]
    }
  }
}
Then I change the query to "όμορφη". Both words have the same stemmed form, "όμορφ", so given ES's Greek language analyzer I should get the same entry back, which I don't.
Any idea why? Should I be doing something more while indexing my documents? After reading the documentation I was under the impression that, once the mapping is defined, a piece of text would automatically get indexed both ways, and the query would each time be analysed with the appropriate analyzer automatically.
If I am under the right impression, why doesn't my query return the same results? Any ideas?
Thank you in advance.

To create the index in the first place you need to call PUT /test, not POST /test/text. The latter simply creates a new document of type text in a new index called test, but with the default settings and mappings.
So first:
PUT /test
{
"settings": {
...
},
"mappings": {
...
}
}
Then you can create new documents with the following (note that your mapping type was called article, not text):
POST /test/article
{
  "title": " ",
  "content": " ",
  "date": " "
}
Only then will your search query work.
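Once the index exists you can also sanity-check the analysis chain with the _analyze API; a quick sketch (on older 2.x-era clusters the same check is done with query-string parameters instead of a JSON body):
GET /test/_analyze
{
  "analyzer": "greek",
  "text": "όμορφος όμορφη"
}
If both words reduce to the same stem (όμορφ), the multi_match on the .greek subfields should return the document for either query.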

Related

Custom stopword analyzer is not working properly

I have created an index with a custom analyzer for stop words. I want Elasticsearch to ignore these words at search time. Then I added one document to the Elasticsearch mapping.
But when I query in Kibana for the keyword "the", there should be no successful match, because in my_analyzer I have put "the" in the my_stop_word section. Yet it shows a match. I have read that if you specify an analyzer on a field in the mapping at index time, then it is taken as the default analyzer at query time.
Please help!
PUT /pandey
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_stemmer",
            "english_stop",
            "my_stop_word",
            "lowercase"
          ]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        },
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "my_stop_word": {
          "type": "stop",
          "stopwords": ["robot", "love", "affection", "play", "the"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "dialog": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
PUT pandey/_doc/1
{
  "dailog" : "the boy is a robot. he is in love. i play cricket"
}
GET pandey/_search
{
  "query": {
    "match": {
      "dailog": "the"
    }
  }
}
A small spelling mistake can lead to this.
You defined a mapping for dialog but added the document with the field name dailog. Elasticsearch's dynamic field mapping behaviour will index it without error; we can disable that, though, as shown in the sketch below.
So the query on "dailog": "the" gets a result, because that field was indexed with the default analyzer.
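A minimal sketch of disabling it, reusing the (typeless) mapping from the question, with the settings block omitted for brevity:
PUT /pandey
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "dialog": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
With "dynamic": "strict", the PUT pandey/_doc/1 above would be rejected with a strict_dynamic_mapping_exception instead of silently indexing the misspelled dailog field.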

How to create and add values to a standard lowercase analyzer in elastic search

I've been around the houses with this for the past few days, trying things in various orders, but can't figure out why it's not working.
I am trying to create an index in Elasticsearch with an analyzer which is the same as the "standard" analyzer but retains upper-case characters when records are stored.
I create my analyzer and index as follows:
PUT /upper
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "rebuilt_standard": {
            "tokenizer": "standard",
            "filter": [
              "standard"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "rebuilt_standard"
        }
      }
    }
  }
}
Then add two records to test like this...
POST /upper/doc
{
  "text" : "TEST"
}
Add a second record...
POST /upper/doc
{
  "text" : "test"
}
Using /upper/_settings gives the following:
{
  "upper": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "upper",
        "creation_date": "1537788581060",
        "analysis": {
          "analyzer": {
            "rebuilt_standard": {
              "filter": [
                "standard"
              ],
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "s4oDgdsFTxOwsdRuPAWEkg",
        "version": {
          "created": "6030299"
        }
      }
    }
  }
}
But when I search with the following query I still get two matches! Both the upper- and lower-case documents are returned, which must mean the analyzer is not applied when I store the records.
Search like so...
GET /upper/_search
{
  "query": {
    "term": {
      "text": {
        "value": "test"
      }
    }
  }
}
Thanks in advance!
First things first: you set your analyzer on the title field instead of on the text field (since your search is on the text property, and since you are indexing docs with only a text property).
"properties": {
"title": {
"type": "text",
"analyzer": "rebuilt_standard"
}
}
try
"properties": {
"text": {
"type": "text",
"analyzer": "rebuilt_standard"
}
}
and keep us posted ;)
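A quick way to verify the fix is to run the custom analyzer through the _analyze API; a small sketch (the JSON body form shown here matches the ES 6.x cluster visible in the settings output above):
GET /upper/_analyze
{
  "analyzer": "rebuilt_standard",
  "text": "TEST"
}
If the token comes back as TEST rather than test, case is preserved at index time, and the term query for test will then match only the lower-case document.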

Empty value generates mapper_parsing_exception for Elasticsearch completion suggester field

I have a name field which is a completion suggester, and indexing generates a mapper_parsing_exception error, stating value must have a length > 0.
There are indeed some empty values in this field. How do I accommodate them?
ignore_malformed had no effect, either at the properties or index level.
I tried filtering out empty strings in the analyzer, setting a min length:
PUT /genes
{
  "settings": {
    "analysis": {
      "filter": {
        "remove_empty": {
          "type": "length",
          "min": 1
        }
      },
      "analyzer": {
        "keyword_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "remove_empty"
          ]
        }
      }
    }
  },
  "mappings": {
    "gene": {
      "properties": {
        "name": {
          "type": "completion",
          "analyzer": "keyword_lowercase"
        }
      }
    }
  }
}
Or filter empty strings as a stopword:
"remove_empty": {
"type": "stop",
"stopwords": [""]
}
Attempting to apply a filter to the name mapping generates an unsupported parameter error:
"mappings": {
"gene": {
"name": {
"type": "completion",
"analyzer": "keyword_lowercase",
"filter": "remove_empty"
}
}
}
}
This sure feels like it ought to be simple. Is there a way to do this?
Thanks!
I have faced the same issue. After some research it seems to me that currently the only option is to change the data (e.g. replace empty values with some dummy non-empty value) before indexing.
But there is also good news: this issue exists on GitHub and was resolved about a month ago. It is planned to be released in version 6.4.0.
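In the meantime, if you'd rather not touch the source data, the clean-up can also be done inside Elasticsearch with an ingest pipeline (available since 5.x) that removes the empty field before the mapping sees it; a minimal sketch with a hypothetical pipeline id (the source parameter is the 6.x spelling; 5.x uses inline):
PUT _ingest/pipeline/drop_empty_name
{
  "processors": [
    {
      "script": {
        "source": "if (ctx.name != null && ctx.name.isEmpty()) { ctx.remove('name') }"
      }
    }
  ]
}
PUT /genes/gene/1?pipeline=drop_empty_name
{
  "name": ""
}
The document is then indexed without a name value instead of being rejected.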

How to force a terms filter to ignore stopwords?

I have an Elasticsearch index with a bunch of fields, some of which I want to use along with the default stopword list. On the other hand, I have a username field which should return results for users called the, be etc.
Of course, when I run the following query:
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "username": [
            "be"
          ]
        }
      }
    }
  }
}
nothing is returned. I have seen various solutions for changing the standard analyzer to remove stopwords, but am struggling to find how I would do so for this one field only. Thanks for any pointers.
You can do it like the following: add a custom analyzer that doesn't use stopwords and then explicitly specify this analyzer just for those fields where stopwords should be kept searchable (like your username field).
PUT /stopwords
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_english": {
          "type": "english",
          "stopwords": "_none_"
        }
      }
    }
  },
  "mappings": {
    "text": {
      "properties": {
        "title": {
          "type": "string"
        },
        "content": {
          "type": "string"
        },
        "username": {
          "type": "string",
          "analyzer": "my_english"
        }
      }
    }
  }
}
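To confirm a stopword now survives analysis on that field, you can analyze against the field directly; a quick sketch (on the older 2.x-style cluster implied by the string mappings, pass these as query-string parameters instead: GET /stopwords/_analyze?field=username&text=be):
GET /stopwords/_analyze
{
  "field": "username",
  "text": "be"
}
If a be token is returned, it was indexed, and the terms filter from the question will match. Note that the english analyzer still applies stemming; _none_ only disables the stopword list.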

Why prefix returns documents without the specific prefix?

I want to return only documents whose name starts with "pizza". This is what I've done:
{
  "query": {
    "filtered": {
      "filter": {
        "prefix": {
          "name": "pizza"
        }
      }
    }
  }
}
But I've got these 3 documents:
{
  "name": "Viana Pizza",
  "city": "Mashhad",
  "address": "Vakil abad",
  "foods": ["Pizza"],
  "salad": true,
  "rate": 5.0
}
{
  "name": "Pizza Pizza",
  "city": "Mashhad",
  "address": "Bahar st",
  "foods": ["Pizza"],
  "salad": true,
  "rate": 8.5
}
{
  "name": "Reza Pizza",
  "city": "Tehran",
  "address": "Vali Asr",
  "foods": ["Pizza"],
  "salad": true,
  "rate": 7.5
}
As you can see, only one of them has "pizza" at the beginning of the name field.
What's wrong?
Probably the simplest explanation, given that you didn't provide the actual mapping, is that you have the "name" field as "string" and "analyzed" (the default). That means "Reza Pizza" is transformed into the terms "reza" and "pizza".
Your filter matches against terms, not against entire field values, because ES analyzes the fields and forms terms when the standard mapping is used.
You need to either change your "name" field to "not_analyzed" or add another field to mirror "name", with the mirror field being "not_analyzed". Also, for the lowercase text "pizza" to match in that case, you need a custom analyzer.
Below you have the solution with the mirror field:
PUT /pizza
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_keyword_lowercase_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "restaurant": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "analyzer": "my_keyword_lowercase_analyzer"
            }
          }
        }
      }
    }
  }
}
And in searching you need to use the mirror field:
GET /pizza/restaurant/_search
{
  "query": {
    "filtered": {
      "filter": {
        "prefix": {
          "name.raw": "pizza"
        }
      }
    }
  }
}
That's all about Elasticsearch analyzers. Let's read the documentation on the prefix filter:
Filters documents that have fields containing terms with a specified prefix (not analyzed).
Here we can see that this filter matches terms, not the whole field value. When you index a document, ES splits your field values into terms using analyzers. The default analyzer splits the value on whitespace and converts the parts to lowercase. So all three results have the term pizza in the name field, and that pizza term perfectly matches the pizza prefix. If you want to match the field value as-is, I'd suggest you map the name field as not_analyzed.
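You can watch this happen with the _analyze API; a small illustrative check (JSON body form shown; older clusters take query-string parameters instead):
GET /_analyze
{
  "analyzer": "standard",
  "text": "Reza Pizza"
}
The standard analyzer returns the tokens reza and pizza, which is why the prefix filter on pizza matches all three documents.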
