How to compress Elasticsearch data with 'best_compression' - elasticsearch

How can I compress all Elasticsearch data (existing data as well as new data) with the "best_compression" option?
I know that since version 5.0 I can't put "index.codec: best_compression" in the elasticsearch.yml file. I've read the log, which indicates that this is deprecated and that I should use
curl -XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true' -d '{"index.codec" : "best_compression"}'
But when I use it, I get the following error:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Can't update non dynamic settings [[index.codec]] for open indices [[logstash-dns-2018.07.30/xHq6UfgsSD2M1dBZhV3cOg], [logstash-2018.07.27/7U7uUsEORFqXtJtrk4KvDw], [logstash-dns-2018.07.27/Xbx15QXOQ5KJAK7iop_54Q], [logstash-http-2018.07.27/q0Rs65a3TjW4NJfcljUHEw], [logstash-flow-2018.07.30/0Erbh2TcRgmFJLMLr8Ka8w], [logstash-2018.07.30/boOd8BdrQV2QoziKaZ_2lw], [logstash-alert-2018.07.27/o5yqwdNqR5yAcbJ-HCNVHw], [logstash-alert-2018.07.30/pp6ZWKLISECVzUiCDDeydQ], [logstash-tls-2018.07.30/rZi6KfC7RtqOVjUt7CCqDQ], [logstash-ssh-2018.07.27/wKi-p6slSqO0-vbwRqS1ZA], [.kibana/XaFQRcEXTW6jLUCmBijzKQ], [logstash-tls-2018.07.27/hbiXYCzjRumh3ND6up9vNw], [logstash-flow-2018.07.27/XfspJr1TS4y6MnCgAmRq1g], [logstash-fileinfo-2018.07.27/9VWyBHsqRmO4QsnN-gdt_w], [logstash-http-2018.07.30/U9JO9Cp-QQO7gvRNoHt7FQ], [logstash-fileinfo-2018.07.30/nlwHeDOsQ3ii8CLxcgE3Ag]]"}],"type":"illegal_argument_exception","reason":"Can't update non dynamic settings [[index.codec]] for open indices [[logstash-dns-2018.07.30/xHq6UfgsSD2M1dBZhV3cOg], [logstash-2018.07.27/7U7uUsEORFqXtJtrk4KvDw], [logstash-dns-2018.07.27/Xbx15QXOQ5KJAK7iop_54Q], [logstash-http-2018.07.27/q0Rs65a3TjW4NJfcljUHEw], [logstash-flow-2018.07.30/0Erbh2TcRgmFJLMLr8Ka8w], [logstash-2018.07.30/boOd8BdrQV2QoziKaZ_2lw], [logstash-alert-2018.07.27/o5yqwdNqR5yAcbJ-HCNVHw], [logstash-alert-2018.07.30/pp6ZWKLISECVzUiCDDeydQ], [logstash-tls-2018.07.30/rZi6KfC7RtqOVjUt7CCqDQ], [logstash-ssh-2018.07.27/wKi-p6slSqO0-vbwRqS1ZA], [.kibana/XaFQRcEXTW6jLUCmBijzKQ], [logstash-tls-2018.07.27/hbiXYCzjRumh3ND6up9vNw], [logstash-flow-2018.07.27/XfspJr1TS4y6MnCgAmRq1g], [logstash-fileinfo-2018.07.27/9VWyBHsqRmO4QsnN-gdt_w], [logstash-http-2018.07.30/U9JO9Cp-QQO7gvRNoHt7FQ], [logstash-fileinfo-2018.07.30/nlwHeDOsQ3ii8CLxcgE3Ag]]"},"status":400}

Solved:
Close all indices:
curl -XPOST 'http://localhost:9200/_all/_close'
Apply best_compression to all indices:
curl -XPUT 'http://localhost:9200/_all/_settings' -d '{"index.codec" : "best_compression"}'
Open all indices:
curl -XPOST 'http://localhost:9200/_all/_open'
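Note that best_compression only affects segments written after the setting is applied; existing segments keep the old codec until they are rewritten. If the already-indexed data should be recompressed too, a force merge rewrites the segments. A minimal sketch (the merge can be quite I/O-heavy on large indices):
curl -XPOST 'http://localhost:9200/_all/_forcemerge?max_num_segments=1'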

Related

Restore elasticsearch cluster onto another cluster

Hello, I have a 3-node Elasticsearch cluster (source) and a snapshot called snapshot-1 which was taken from the source cluster.
I also have another 6-node Elasticsearch cluster (destination).
When I restore my destination cluster from snapshot-1 using this command
curl -X POST -u elastic:321 "192.168.2.15:9200/_snapshot/snapshot_repository/snapshot-1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": ".security(.+)",
  "rename_replacement": "delete_$1",
  "include_aliases": false
}
'
and I got this error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "snapshot_restore_exception",
        "reason" : "[snapshot:snapshot-1 yjg/mHsYhycHQsKiEhWVhBywxQ] cannot restore index [.ilm-history-0003] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
      }
So as you can see, the index .ilm-history-0003 already exists in the cluster. But how can I do the rename replacement for the .security, .ilm, .slm and .transform indices using only one rename_pattern?
like this one
"rename_pattern": ".security(.+)",
From my experience the rename pattern doesn't need to be super fancy, because you will probably either
a) delete the index (as your renaming pattern suggests) or
b) reindex data from the restored index to new indices. In this case the naming of the restored index is insignificant.
So this is what I would suggest:
Use the following renaming pattern to include all indices. Again, from my experience, your first aim is to get the old data restored. After that you have to manage the reindexing etc.
POST /_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME/_restore
{
"indices": "*",
"ignore_unavailable": true,
"include_aliases": false,
"include_global_state": false,
"rename_pattern": "(.+)",
"rename_replacement": "restored_$1"
}
This will prepend restored_ to the actual index name resulting in the following restored indices:
restored_.security*
restored_.ilm*
restored_.slm*
restored_.transform*
I hope I could help you.
Solved it this way:
curl -X POST -u elastic:321 "192.168.2.15:9200/_snapshot/snapshot_repository/snapshot-1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "*,-.slm*,-.ilm*,-.transform*,-.security*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "include_aliases": false
}
'
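To double-check which indices actually came back (and that the excluded system indices stayed out), a quick listing helps; this is just a sketch assuming the same host and credentials as above:
curl -X GET -u elastic:321 "192.168.2.15:9200/_cat/indices?v"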

Getting error index.max_inner_result_window during rolling upgrade of ES from 5.6.10 to 6.8.10

I have 2 data nodes and 3 master nodes in an ES cluster. I was doing a rolling upgrade, as ES suggests, moving from 5.6.10 to 6.8.10.
As there should be zero downtime, I was testing that and ran into one error.
I upgraded the first data node and did basic search testing; it worked fine. When I upgraded the second data node, search broke with the error below.
java.lang.IllegalArgumentException: Top hits result window is too large, the top hits aggregator [top]'s from + size must be less than or equal to: [100] but was [999]. This limit can be set by changing the [index.max_inner_result_window] index level setting.
index.max_inner_result_window -- this property was introduced in version 6.x, and the master nodes are still on 5.6.10. So what is the solution with zero downtime?
Note: My indexing is stopped completely. My 2 data nodes are now on 6.8.10 and master nodes are on 5.6.
Thanks
1 - Change the parameter on current indexes:
curl -X PUT "http://localhost:9200/_all/_settings?pretty" -H 'Content-Type: application/json' -d'
{
"index.max_inner_result_window": "2147483647"
}
'
2 - Create a template for future indexes:
curl -X PUT "http://localhost:9200/_index_template/template_max_inner_result?pretty" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["*"],
  "template": {
    "settings": {
      "index": {
        "max_inner_result_window": 2147483647
      }
    }
  }
}
'
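One caveat: the composable _index_template API used above was only added in Elasticsearch 7.8. If the cluster is still on 6.8, the legacy template endpoint is the one to use; a rough equivalent (note that the settings sit at the top level of the body there) would be:
curl -X PUT "http://localhost:9200/_template/template_max_inner_result?pretty" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["*"],
  "settings": {
    "index": {
      "max_inner_result_window": 2147483647
    }
  }
}
'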

Backup and restore some records of an elasticsearch index

I wish to take a backup of some records(eg latest 1 million records only) of an Elasticsearch index and restore this backup on a different machine. It would be better if this could be done using available/built-in Elasticsearch features.
I've tried Elasticsearch snapshot and restore (following code), but looks like it takes a backup of the whole index, and not selective records.
curl -H 'Content-Type: application/json' -X PUT "localhost:9200/_snapshot/es_data_dump?pretty=true" -d '
{
"type": "fs",
"settings": {
"compress" : true,
"location": "es_data_dump"
}
}'
curl -H 'Content-Type: application/json' -X PUT "localhost:9200/_snapshot/es_data_dump/snapshot1?wait_for_completion=true&pretty=true" -d '
{
"indices" : "index_name",
"type": "fs",
"settings": {
"compress" : true,
"location": "es_data_dump"
}
}'
The format of backup could be anything, as long as it can be successfully restored on a different machine.
You can use the _reindex API. It can take any query, and after the reindex you have a new index as a backup which contains the requested records. You can then easily copy it wherever you want.
complete information is here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
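For example, reindex-from-remote lets you pull only a matching subset straight onto the other machine. This is only a sketch: the remote host, the @timestamp field and the range are placeholders for your setup, the source cluster has to be whitelisted via reindex.remote.whitelist in the destination's elasticsearch.yml, and on newer versions a top-level max_docs (size on older ones) can cap the number of copied documents:
curl -H 'Content-Type: application/json' -X POST "localhost:9200/_reindex?pretty" -d '
{
  "source": {
    "remote": { "host": "http://old-machine:9200" },
    "index": "index_name",
    "query": {
      "range": { "@timestamp": { "gte": "now-7d" } }
    }
  },
  "dest": {
    "index": "index_name_backup"
  }
}'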
In the end, I fetched the required data using python driver because that is what I found the easiest for the given use case.
For that, I ran an Elasticsearch query and stored its response in a file in newline-separated format, and I later restored the data from it using another Python script. A maximum of 10000 entries are returned this way, along with a scroll ID that is used to fetch the next 10000 entries, and so on.
from elasticsearch import Elasticsearch

# Connect to the cluster (add hosts/credentials as needed)
es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)
# First page: at most 10000 hits, keeping a 5-minute scroll context open
page = es.search(index=['ct_analytics'], body={'size': 10000, 'query': _query, 'stored_fields': '*'}, scroll='5m')
while len(page['hits']['hits']) > 0:
    es_data = page['hits']['hits']  # Store this as you like
    scrollId = page['_scroll_id']
    # Fetch the next page using the scroll ID
    page = es.scroll(scroll_id=scrollId, scroll='5m')

How to delete all attributes from the schema in solr?

Deleting all documents from solr is
curl http://localhost:8983/solr/trans/update?commit=true -d "<delete><query>*:*</query></delete>"
Adding a (static) attribute to the schema is
curl -X POST -H 'Content-type:application/json' --data-binary '{ "add-field":{"name":"trans","type":"string","stored":true, "indexed":true},}' http://localhost:8983/solr/trans/schema
Deleting one attribute is
curl -X POST -H 'Content-type:application/json' -d '{ "delete-field":{"name":"trans"}}' http://arteika:8983/solr/trans/schema
Is there a way to delete all attributes from the schema?
At least in version 6.6 of the Schema API, and up to the current version 7.5 of it, you can pass multiple commands in a single POST (see the 6.6 and 7.5 documentation, respectively). There are multiple accepted formats, but the most intuitive one (I think) is just passing an array for the action you want to perform:
curl -X POST -H 'Content-type: application/json' -d '{
"delete-field": [
{"name": "trans"},
{"name": "other_field"}
]
}' 'http://arteika:8983/solr/trans/schema'
So. How do we obtain the names of the fields we want to delete? That can be done by querying the Schema:
curl -X GET -H 'Content-type: application/json' 'http://arteika:8983/solr/trans/schema'
In particular, the copyFields, dynamicFields and fields keys in the schema object in the response.
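If you only need those names, the Schema API also exposes each section as its own subresource, which keeps the responses small; for example:
curl -X GET 'http://arteika:8983/solr/trans/schema/fields'
curl -X GET 'http://arteika:8983/solr/trans/schema/dynamicfields'
curl -X GET 'http://arteika:8983/solr/trans/schema/copyfields'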
I automated clearing all copy field rules, dynamic field rules and fields as follows. You can of course use any kind of script that is available to you. I used Python 3 (might work with Python 2, I did not test that).
import json
import requests

# load schema information
api = 'http://arteika:8983/solr/trans/schema'
r = requests.get(api)

# delete copy field rules
names = [(o['source'], o['dest']) for o in r.json()['schema']['copyFields']]
payload = {'delete-copy-field': [{'source': name[0], 'dest': name[1]} for name in names]}
requests.post(api, data=json.dumps(payload),
              headers={'Content-type': 'application/json'})

# delete dynamic fields
names = [o['name'] for o in r.json()['schema']['dynamicFields']]
payload = {'delete-dynamic-field': [{'name': name} for name in names]}
requests.post(api, data=json.dumps(payload),
              headers={'Content-type': 'application/json'})

# delete fields
names = [o['name'] for o in r.json()['schema']['fields']]
payload = {'delete-field': [{'name': name} for name in names]}
requests.post(api, data=json.dumps(payload),
              headers={'Content-type': 'application/json'})
Just a note: I received status 400 responses at first, with null error messages. Had a bit of a hard time figuring out how to fix those, so I'm sharing what worked for me. Changing the default of updateRequestProcessorChain in solrconfig.xml to false (default="${update.autoCreateFields:false}") and restarting the Solr service made those errors go away for me. The fields I was deleting were created automatically, that may have something to do with that.

Can't get geo_point to work with Bonsai on Heroku

I'm trying to use a geo_point field on Heroku/Bonsai but it just doesn't want to work.
It works in local, but whenever I check the mapping for my index on Heroku/Bonsai it says my field is a string: "coordinates":{"type":"string"}
My mapping looks like this:
tire.mapping do
...
indexes :coordinates, type: "geo_point", lat_lon: true
...
end
And my to_indexed_json like this:
def to_indexed_json
{
...
coordinates: map_marker.nil? ? nil : [map_marker.latitude, map_marker.longitude].join(','),
...
}.to_json
end
In the console on Heroku I tried MyModel.mapping and MyModel.index.mapping and the first one correctly has :coordinates=>{:type=>"geo_point", :lat_lon=>true}.
Here's how I got this to work, with index name 'myindex' and type name 'myindextype'.
On the local machine
curl -XGET https://[LOCAL_ES_URL]/myindex/myindextype/_mapping
Save the output to a .json file, for example typedefinition.json (or hand-build one):
{
  "myindextype": {
    "properties": {
      "dataone": {"type": "string"},
      "datatwo": {"type": "double"},
      "location": {"type": "geo_point"},
      "datathree": {"type": "long"},
      "datafour": {"type": "string"}
    }
  }
}
On Heroku, enter the command
heroku config
and get the BONSAI_URL. Put it in the following commands in place of [BONSAI_URL]. (https://asdfasdfdsf:asdfadf#asdfasdfasdf.us-east-1.bonsai.io/myindex)
curl -XDELETE https://[BONSAI_URL]/myindex
curl -XPOST https://[BONSAI_URL]/myindex
curl -XPUT -d#typedefinition.json https://[BONSAI_URL]/myindex/myindextype/_mapping
curl -XGET https://[BONSAI_URL]/myindex/myindextype/_mapping
Deletes the index if it exists.
Creates an empty index.
Uses the .json file as a definition for the mapping.
Gets the new mapping to make sure it worked.
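As a purely illustrative sanity check (the coordinates below are made up), you can index one document into the location field from the mapping above and confirm that a geo_distance filter finds it:
curl -XPUT "https://[BONSAI_URL]/myindex/myindextype/1" -d '{"location": "40.12,-71.34"}'
curl -XGET "https://[BONSAI_URL]/myindex/myindextype/_search" -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "geo_distance": { "distance": "10km", "location": "40.12,-71.34" }
      }
    }
  }
}'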
