How to define the shards number on ElasticSearch? - elasticsearch

I've installed, a elasticsearch cluster with three nodes, i pretend to use for search emails.
On my platform i'll have 40k mails per month. So my question is How to define the shards number and replicas number on ElasticSearch?
Are there best practices for the measure?
Thanks in advance.

You can define it in the elasticsearch.yml file or when you create a template for an index.

The best would be to have an index per timeframe like one index per month or week. This will help because we cant increase the number of shards per index once its created.
Now you need to define the maping and settings of the indices that are going to be created in future. So you need to define an index template which will in turn define these.
The best practice would be to define the shard number and replica number in the template.
Following is a sample template which would be applied to any index who is not created and has a name starting with te.
curl -XPUT localhost:9200/_template/te_prefix -d '{
"template" : "te*",
"settings" : {
"number_of_shards" : 3,
"number_of_replicas" : 1
},
"mappings" : {
"_default_" : {
"_all" : { "enabled" : false }
}
}
}'
Or if you want to create just one index and settings , you can use the following -
Here index name is stats.
curl -X PUT "http://localhost/stats" -d '{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"analysis": {
"analyzer": {
"flat": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
},
"mappings": {
"stats": {
"properties": {
"DocCount": {
"type": "long"
},
"Query": {
"type": "string",
"analyzer": "flat"
},
"ResponseTime": {
"type": "long"
}
}
}
}
}'

Related

Elasticsearch create multi index on a daily bases

I load this index in elasticsearch
curl -XPUT 'localhost:9200/filebeat?pretty' -H 'Content-Type: application/json' -d'
{
"mappings": {
"_default_": {
"_all": {
"enabled": true,
"norms": {
"enabled": false
}
},
"dynamic_templates": [
{
"template1": {
"mapping": {
"doc_values": true,
"ignore_above": 50000,
"index": "not_analyzed",
"type": "{dynamic_type}"
},
"match": "*"
}
}
],
"properties": {
"#timestamp": {
"type": "date"
},
"message": {
"type": "string",
"index": "analyzed"
},
"offset": {
"type": "long",
"doc_values": "true"
},
"geoip" : {
"type" : "object",
"dynamic": true,
"properties" : {
"location" : { "type" : "geo_point" }
}
}
}
}
},
"settings": {
"index.refresh_interval": "2s"
},
"template": "filebeat-*"
}
'
and the result of curl 'localhost:9200/_cat/indices?v' is filebeat-2018-02-05
and every day on index add to the list of elasticsearch on a daily basies and I have to add it in kibana if I want to search on my latest log file. why elasticsearch add multiple index on a daily bases. and how can I solve this problem (just have my own indexes)
thank you.
I assume you're pushing data to elasticsearch using filebeat.
Elasticsearch doesnt decide what index your data should be written to. It is filebeat that tells elasticsearch where the data should be written. And the default behaviour of filebeat/logstash is to create a new new index every day.
If you want to visualize data for a range of index patterns, you can use the wildcard symbol in your kibana index pattern, say filebeat-*. And all visualizations created against filebeat-* should have aggregated data from all your filebeat- indices.
The reason to have a new index everyday is to help with the logging use case, where new data is more valuable than old data. Hence, this gives an opportunity to easily retire old data, or to move old indices to a less performant elasticsearch node etc.
If you still need a new pattern, you should be able to modify your filebeat config file and specify the new index_pattern value. Document

Elasticsearch Automatic Synonyms

I am looking at Elasticsearch to handle search queries made by users in on my website.
Say that I have a document person with field vehicles_owned which is a list of strings. For example:
{
"name":"james",
"surname":"smith",
"vehicles_owned":["car","bike","ship"]
}
I would like to query which people own a certain vehicle. I understand it's possible to configure ES so that boat is treated as a synonym of ship and if I query with boat I am returned the user james who owns a ship.
What I don't understand is whether this is done automatically, or if I have to import lists of synonyms.
The idea is to create a custom analyzer for the vehicles_owned field which leverages the synonym token filter.
So you first need to define your index like this:
curl -XPUT localhost:9200/your_index -d '{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt" <-- your synonym file
}
}
}
}
},
"mappings": {
"syn": {
"properties": {
"name": {
"type": "string"
},
"surname": {
"type": "string"
},
"vehicles_owned": {
"type": "string",
"index_analyzer": "synonym" <-- use the synonym analyzer here
}
}
}
}
}'
Then you can add all the synonyms you want to handle in the $ES_HOME/config/synonyms.txt file using the supported formats, for instance:
boat, ship
Next, you can index your documents:
curl -XPUT localhost:9200/your_index/your_type/1 -d '{
"name":"james",
"surname":"smith",
"vehicles_owned":["car","bike","ship"]
}'
And finally searching for either ship or boat will get you the above document we just indexed:
curl -XGET localhost:9200/your_index/your_type/_search?q=vehicles_owned:boat
curl -XGET localhost:9200/your_index/your_type/_search?q=vehicles_owned:ship

How to implement case sensitive search in elasticsearch?

I have a field in my indexed documents where i need to search with case being sensitive. I am using the match query to fetch the results.
An example of my data document is :
{
"name" : "binoy",
"age" : 26,
"country": "India"
}
Now when I give the following query:
{
“query” : {
“match” : {
“name” : “Binoy"
}
}
}
It gives me a match for "binoy" against "Binoy". I want the search to be case sensitive. It seems by default,elasticsearch seems to go with case being insensitive. How to make the search case sensitive in elasticsearch?
In the mapping you can define the field as not_analyzed.
curl -X PUT "http://localhost:9200/sample" -d '{
"index": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}'
echo
curl -X PUT "http://localhost:9200/sample/data/_mapping" -d '{
"data": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}'
Now if you can do normal index and do normal search , it wont analyze it and make sure it deliver case insensitive search.
It depends on the mapping you have defined for you field name. If you haven't defined any mapping then elasticsearch will treat it as string and use the standard analyzer (which lower-cases the tokens) to generate tokens. Your query will also use the same analyzer for search hence matching is done by lower-casing the input. That's why "Binoy" matches "binoy"
To solve it you can define a custom analyzer without lowercase filter and use it for your field name. You can define the analyzer as below
"analyzer": {
"casesensitive_text": {
"type": "custom",
"tokenizer": "standard",
"filter": ["stop", "porter_stem" ]
}
}
You can define the mapping for name as below
"name": {
"type": "string",
"analyzer": "casesensitive_text"
}
Now you can do the the search on name.
note: the analyzer above is for example purpose. You may need to change it as per your needs
Have your mapping like:
PUT /whatever
{
"settings": {
"analysis": {
"analyzer": {
"mine": {
"type": "custom",
"tokenizer": "standard"
}
}
}
},
"mappings": {
"type": {
"properties": {
"name": {
"type": "string",
"analyzer": "mine"
}
}
}
}
}
meaning, no lowercase filter for that custom analyzer.
Here is the full index template which worked for my ElasticSearch 5.6:
{
"template": "logstash-*",
"settings": {
"analysis" : {
"analyzer" : {
"case_sensitive" : {
"type" : "custom",
"tokenizer": "standard",
"filter": ["stop", "porter_stem" ]
}
}
},
"number_of_shards": 5,
"number_of_replicas": 1
},
"mappings": {
"fluentd": {
"properties": {
"message": {
"type": "text",
"fields": {
"case_sensitive": {
"type": "text",
"analyzer": "case_sensitive"
}
}
}
}
}
}
}
As you see, the logs are coming from FluentD and are saved into a timebased index logstash-*. To make sure, I can still execute wildcard queries on the message filed, I put a multi-field mapping on that field. Wildcard/analyzed queries can be done on message field and the case sensitive one on the message.case_sensitive field.

Changing elasticsearch index's shard-count on the next index-rotation

I have an ELK (Elasticsearch-Kibana) stack wherein the elasticsearch node has the default shard value of 5. Logs are pushed to it in logstash format (logstash-YYYY.MM.DD), which - correct me if I am wrong - are indexed date-wise.
Since I cannot change the shard count of an existing index without reindexing, I want to increase the number of shards to 8 when the next index is created. I figured that the ES-API allows on-the-fly persistent changes.
How do I go about doing this?
You can use the "Template Management" features in Elasticsearch: http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/indices-templates.html
Create a new logstash template by using:
curl -XPUT localhost:9200/_template/logstash -d '
{
"template": "logstash-*",
"settings": {
"number_of_replicas": 1,
"number_of_shards": 8,
"index.refresh_interval": "5s"
},
"mappings": {
"_default_": {
"_all": {
"enabled": true
},
"dynamic_templates": [
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "analyzed",
"omit_norms": true,
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
],
"properties": {
"#version": {
"type": "string",
"index": "not_analyzed"
},
"geoip": {
"type": "object",
"dynamic": true,
"path": "full",
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
}
}'
The next time the index that matches your pattern is created, it will be created with your new settings.
The setting is on your elasticsearch. You need to change to config file config/elasticsearch.yml
Change the index.number_of_shards: 8. and restart elasticsearch. The new configuration will set and the new index will use the new configuration, which create 8 shard as you want.
Best would be to use templates and to add one I would recommend Kopf pluin found here: https://github.com/lmenezes/elasticsearch-kopf
You can ofcourse use the API:
curl -XPUT $ELASTICSEARCH-MASTER$:9200/_template/$TEMPLATE-NAME$ -d '$TEMPLATE-CONTENT$'
In the plugin: on the top left corner click on more -> Index templates and then create a new template and make sure you have the following settings as part of your template:
{
"order": 0,
"template": "logstash*",
"settings": {
"index": {
"number_of_shards": "5",
"number_of_replicas": "1"
}
},
"mappings": {### your mapping ####},
"aliases": {}
}
The above setting will make sure that if a new new index with name logstash* is created it would have 5 number of shards and 1 replica.

Searching synonyms in elasticsearch

I'm trying to create synonym search over languages indexed in ES.
For example,
Indexed document -> name: German
Synonyms: German, Deutsch, XYZ
What I want to make is, when I type either German or Deutsch or XYZ, that ES returns me German...
Is that possible at all?
Yes very much so. ElasticSearch handles synonyms very well. Here is an example of how I configured synonyms on my cluster -
curl -XPOST localhost:9200/**new-index** -d '{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 0,
"analysis": {
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms/synonyms.txt"
}
},
"analyzer": {
"synonym": {
"tokenizer": "lowercase",
"filter": [
"synonym"
]
}
}
}
},
"mappings": {
"**new-type**": {
"_all": {
"enabled": false
},
"properties": {
"Title": {
"type": "multi_field",
"store": "yes",
"fields": {
"Title": {
"type": "string",
"analyzer": "synonym"
}
}
}
}
}
}
}'
The path for the synonym file looks inside the config folder for the synonym folder and locates the text file. An example of the contents of the synonyms.txt for your requirements would be -
German, Deutsch, XYZ
REMEMBER - if you have a lower case filter at index time, the synonyms need to be in lower case. Restart nodes if not working.

Resources