Unable to back up Elasticsearch data? - elasticsearch

Hi, I have Elasticsearch (version 6.6.0) running on a machine. It has some indices:
curl -X GET "10.10.9.1:9200/_cat/indices/mep-reports*?v&s=index&pretty"
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open mep-reports-2019.09.11 l6iFm9fSTp6Q07Qa8BsB-w 1 1 149002 1065 13.6mb 13.6mb
yellow open mep-reports-2019.09.13 lX3twLgnThKUcOoF3B1vbw 1 1 80079 3870 10.1mb 10.1mb
yellow open mep-reports-2019.09.18 NzHFBXIASIifRpmlrWQmmg 1 1 283066 164 25.9mb 25.9mb
yellow open mep-reports-2019.09.20 UB3uCEouSAOAsy96AVz__g 1 1 22002 2 1.8mb 1.8mb
yellow open mep-reports-2019.09.23 VXI7K7SFS-Ol_FoHinuY3A 1 1 269836 2632 19.8mb 19.8mb
yellow open mep-reports-2019.09.25 yd6PUSA2Snug-1BAUZICzw 1 1 200001 1972 13.5mb 13.5mb
yellow open mep-reports-2019.10.01 ji0BqsTQRmm-rIKCd2pg_Q 1 1 5000 790 467kb 467kb
yellow open mep-reports-2019.10.10 rt3kb2VFTH6XLiqrIvpEow 1 1 5000 790 450.6kb 450.6kb
yellow open mep-reports-2019.10.17 ws3zILaySwu69U16dKSQlw 1 1 27 9 24.4kb 24.4kb
yellow open mep-reports-2019.10.24 iKc8ruqWTBCsYz83k6NpHg 1 1 2500 540 276.8kb 276.8kb
close mep-reports-2019.10.30 Qrq98yUeS_yvCwzDoQHb3A
yellow open mep-reports-2020.02.10 upBGvHxnTdaxHP52N8fEPg 1 1 56000 3260 5.3mb 5.3mb
yellow open mep-reports-2020.02.11 GfTOrlHBSJKKToHh3u4jnQ 1 1 500 0 43.4kb 43.4kb
I would like to take a backup of this data and load it into my local Elasticsearch instance. For that I have tried the following:
curl -X PUT "10.10.9.1:9200/_snapshot/my_backup?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/tmp/es-backup"
  }
}'
It then returns:
{
"acknowledged" : true
}
However, when I list the backup folder, it is empty:
ls -ltra /tmp/es-backup
total 4
drwxrwxrwt 1 root root 4096 Feb 12 11:11 ..
drwxrwxr-x 2 omn omn 6 Feb 12 11:46 .
Really appreciate any help. Thank you.

Try using elasticdump to transfer data from one Elasticsearch instance to another.
You can also output your data into a .json file.
A working example:
Export your data from the remote Elasticsearch server to a .json file:
elasticdump --input=http://10.10.9.1:9200 --output=data.json --type=data
Import data.json (located on your local machine) into your local Elasticsearch server:
elasticdump --input=data.json --output=http://localhost:9200 --type=data
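Note that elasticdump generally works one index at a time, so you may need the index name in the URL, and you can copy the mapping across before the data. A quick sketch under that assumption, using one of the index names from your listing:
elasticdump --input=http://10.10.9.1:9200/mep-reports-2019.09.11 --output=mapping.json --type=mapping
elasticdump --input=http://10.10.9.1:9200/mep-reports-2019.09.11 --output=data.json --type=data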
Try it.
Hope this helps
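Alternatively, if you want to stay with the snapshot approach: registering a repository only tells Elasticsearch where snapshots will be written; the folder stays empty until you actually create a snapshot in it. A minimal sketch (the snapshot name snapshot_1 is arbitrary, and /tmp/es-backup must be listed under path.repo in elasticsearch.yml on the server):
curl -X PUT "10.10.9.1:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true&pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "mep-reports-*"
}'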

Related

Trying to create a snapshot of an index, status is "empty_store"

I am using ElasticSearch 7.17.
I am trying to create a snapshot of an index (I know I shouldn't have a single shard, but for now, that's how it is):
$ curl -s -k "http://localhost:9200/_cat/indices"
yellow open myIndex vVr6ojDCQTi9ASOUGkkRBA 1 1 679161903 0 140.8gb 140.8gb
I have already registered an S3 bucket for snapshots, which I named backups.
I ran the following command:
$ curl -s -k -X PUT "http://localhost:9200/_snapshot/backups/myIndex?pretty&wait_for_completion=false" -H "content-type:application/json" -d'{"indices": "myIndex"}'
{
"accepted" : true
}
Now, I want to have a look at the progress of that backup's upload:
$ curl -s -k "http://localhost:9200/_cat/snapshots/backups/myIndex"
myIndex IN_PROGRESS 1676385605 14:40:05 0 00:00:00 8.6m 1 0 0 0
$ curl -s -k "http://localhost:9200/_cat/recovery"
myIndex 0 37ms empty_store done n/a n/a 172.24.0.3 7529c7447620 n/a n/a 0 0 0.0% 0 0 0 0.0% 0 0 0 100.0%
It's been in this state, with no change, for the past hour.
I don't understand why 0 bytes are transferred. Am I missing something obvious?
I don't know what empty_store refers to - shouldn't it be existing_store?
Other people were right - it just took its time.
The snapshot ended in "SUCCESS" status, but the _cat/recovery entry still shows empty_store.
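For what it's worth, the _status endpoint gives a more detailed view of an in-flight snapshot than _cat/snapshots, including per-shard file and byte counts; a sketch using the repository and snapshot names above:
curl -s -k "http://localhost:9200/_snapshot/backups/myIndex/_status?pretty"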

Elasticsearch indices take more space on disk than they appear to

I have several indices in Elasticsearch; one of them has only around 100 documents in it, but it must be updated every other second.
Result of GET _cat/indices is as follows:
green open index1 8naYU5e-R-iHvfSKnrEiGw 1 0 2 9 25.5kb 25.5kb
yellow open index2 ZPQWzY7VRYGnBG0i6AL5ag 5 1 5658 89 1.2mb 1.2mb
yellow open index3 MTIDbt4uQbOv4K-0uuyOKA 5 1 0 0 1.1kb 1.1kb
yellow open index4 laF0UcIYTFKQQ6bB9dtQyw 5 1 0 0 1.1kb 1.1kb
yellow open index5 d5SYGXhYTPiVH_GKSA47lQ 5 1 0 0 1.1kb 1.1kb
yellow open index6 nIiNMwNWRZu-aISdLWa8ZA 5 1 110964 61 16.1mb 16.1mb
yellow open index7 g492XL4ZRKy4NOIBwF1yzA 5 1 111054 352 12.5mb 12.5mb
yellow open index8 C2g2RI_oQaOxUvpbzSnVIQ 5 1 123 400 484.8kb 484.8kb
As you can see, index8 has only 123 documents in it and should not take more than 500kb on disk.
But result of du -sh ./* is like this:
128K ./8naYU5e-R-iHvfSKnrEiGw
1.5G ./C2g2RI_oQaOxUvpbzSnVIQ
172K ./d5SYGXhYTPiVH_GKSA47lQ
1.1G ./g492XL4ZRKy4NOIBwF1yzA
172K ./laF0UcIYTFKQQ6bB9dtQyw
172K ./MTIDbt4uQbOv4K-0uuyOKA
424M ./nIiNMwNWRZu-aISdLWa8ZA
276M ./ZPQWzY7VRYGnBG0i6AL5ag
It's taking more than 1GB on disk.
My question is: why, and how can I fix it?
I'm using Elasticsearch 6.2.4 on Ubuntu 16.04.
UPDATE
result of du -sh ./g492XL4ZRKy4NOIBwF1yzA/*
3.2M ./indices/g492XL4ZRKy4NOIBwF1yzA/0/index
8.0K ./indices/g492XL4ZRKy4NOIBwF1yzA/0/_state
241M ./indices/g492XL4ZRKy4NOIBwF1yzA/0/translog
3.1M ./indices/g492XL4ZRKy4NOIBwF1yzA/1/index
8.0K ./indices/g492XL4ZRKy4NOIBwF1yzA/1/_state
238M ./indices/g492XL4ZRKy4NOIBwF1yzA/1/translog
3.2M ./indices/g492XL4ZRKy4NOIBwF1yzA/2/index
8.0K ./indices/g492XL4ZRKy4NOIBwF1yzA/2/_state
241M ./indices/g492XL4ZRKy4NOIBwF1yzA/2/translog
3.1M ./indices/g492XL4ZRKy4NOIBwF1yzA/3/index
8.0K ./indices/g492XL4ZRKy4NOIBwF1yzA/3/_state
241M ./indices/g492XL4ZRKy4NOIBwF1yzA/3/translog
3.1M ./indices/g492XL4ZRKy4NOIBwF1yzA/4/index
8.0K ./indices/g492XL4ZRKy4NOIBwF1yzA/4/_state
241M ./indices/g492XL4ZRKy4NOIBwF1yzA/4/translog
4.0K ./indices/g492XL4ZRKy4NOIBwF1yzA/_state/state-4.st
The size you measured using du -h on the index folder doesn't only include the space taken by the documents stored in the index; it also contains the translog files, which by default can grow up to 512mb per shard.
In your case, _cat/indices reveals that your index7 index is 12.5mb big, and when running du -h on your index folder you can see that each index sub-folder located within each shard folder is approximately 3.1mb, i.e. about the same magnitude as reported by _cat/indices. The rest is translog: five shards at roughly 241mb of translog each add up to about 1.2gb, which matches the 1.1G you measured.
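If retained translog is what is eating your disk, one option is to lower the translog retention limits and flush; a sketch assuming the Elasticsearch 6.x dynamic settings index.translog.retention.size and index.translog.retention.age (the values here are illustrative, not recommendations):
curl -X PUT "localhost:9200/index7/_settings" -H 'Content-Type: application/json' -d'
{
  "index.translog.retention.size": "64mb",
  "index.translog.retention.age": "30m"
}'
curl -X POST "localhost:9200/index7/_flush"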

Elasticsearch repository shows no snapshots after upgrade from 2.x to 5.x

On a RHEL6 system, I followed the steps laid out here to create a repository and capture a snapshot prior to my upgrade. I verified the existence of the snapshot:
curl 'localhost:9200/_snapshot/_all?pretty=true'
Which gave me the following result:
{ "upgrade_backup" : {
"type" : "fs",
"settings" : {
"compress" : "true",
"location" : "/tmp/elasticsearch-backup"
} } }
After upgrading Elasticsearch via yum, I went to restore my snapshot, but none are showing up:
curl 'localhost:9200/_snapshot/_all?pretty=true'
{ }
I checked on the file system and see the repository files:
ls -lrt /tmp/elasticsearch-backup
total 24
-rw-r--r--. 1 elasticsearch elasticsearch 121 Apr 7 14:42 meta-snapshot-number-one.dat
drwxr-xr-x. 3 elasticsearch elasticsearch 21 Apr 7 14:42 indices
-rw-r--r--. 1 elasticsearch elasticsearch 191 Apr 7 14:42 snap-snapshot-number-one.dat
-rw-r--r--. 1 elasticsearch elasticsearch 37 Apr 7 14:42 index
-rw-r--r--. 1 elasticsearch elasticsearch 188 Apr 7 14:51 index-0
-rw-r--r--. 1 elasticsearch elasticsearch 8 Apr 7 14:51 index.latest
-rw-r--r--. 1 elasticsearch elasticsearch 29 Apr 7 14:51 incompatible-snapshots
I made sure elasticsearch.yml still has the "data.repo" tag, so I'm not sure where to look or what to do to determine what happened, but somehow my snapshots vanished!
You need to add the following line to elasticsearch.yml:
path.repo: ["/tmp/elasticsearch-backup"]
Then restart the Elasticsearch service and register the snapshot repository again:
curl -XPUT "http://localhost:92000/_snapshot/backup" -H 'Content-Type: application/json' -d '{
"type": "fs",
"settings": {
"location": "/tmp/elasticsearch-backup",
"compress": true
}
}'
Now you should be able to list all snapshots in your repository and then restore them:
curl -s -XGET "localhost:9200/_snapshot/backup/_all" | jq .
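If the old snapshot shows up again after re-registering the repository (the file names above suggest it was called snapshot-number-one), restoring it would be a sketch along these lines:
curl -XPOST "localhost:9200/_snapshot/backup/snapshot-number-one/_restore?pretty"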

Elasticsearch 1.7.4 log rotation

I'm trying to use logging.yml (the Elasticsearch file) plus a logrotate configuration for Elasticsearch log rotation.
Information:
1. Elasticsearch version: 1.7.4
2. I don't want to keep any rotated files.
Configuration:
logging.yml configuration:
file:
  type: org.apache.log4j.rolling.RollingFileAppender
  file: ${path.logs}/${cluster.name}.log
  rollingPolicy: org.apache.log4j.rolling.TimeBasedRollingPolicy
  rollingPolicy.FileNamePattern: ${path.logs}/${cluster.name}.log.%d{yyyy-MM-dd}.gz
  layout:
    type: pattern
    conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
Logrotate configuration:
/var/log/elasticsearch/*.log {
    daily
    rotate 0
    copytruncate
    compress
    delaycompress
    missingok
    notifempty
    maxage 0
    create 644 elasticsearch elasticsearch
}
More details:
Running ls on /var/log/elasticsearch:
total 20K
-rw-r--r-- 1 elasticsearch elasticsearch 18763 Jul 4 08:46 dba01es.d1.log
-rw-r--r-- 1 elasticsearch elasticsearch 0 Jun 19 10:01 dba01es.d1_index_indexing_slowlog.log
-rw-r--r-- 1 elasticsearch elasticsearch 0 Jun 19 10:01 dba01es.d1_index_search_slowlog.log
Running logrotate manually:
logrotate -fv /etc/logrotate.d/elasticsearch
logrotate output:
reading config file /etc/logrotate.d/elasticsearch
reading config info for /var/log/elasticsearch/*.log
Handling 1 logs
rotating pattern: /var/log/elasticsearch/*.log forced from command line (no old logs will be kept)
empty log files are not rotated, old logs are removed
considering log /var/log/elasticsearch/dba01es.d1.log
log needs rotating
considering log /var/log/elasticsearch/dba01es.d1_index_indexing_slowlog.log
log does not need rotating
considering log /var/log/elasticsearch/dba01es.d1_index_search_slowlog.log
log does not need rotating
rotating log /var/log/elasticsearch/dba01es.d1.log, log->rotateCount is 0
dateext suffix '-20160704'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
previous log /var/log/elasticsearch/dba01es.d1.log.1 does not exist
renaming /var/log/elasticsearch/dba01es.d1.log.1.gz to /var/log/elasticsearch/dba01es.d1.log.2.gz (rotatecount 1, logstart 1, i 1),
old log /var/log/elasticsearch/dba01es.d1.log.1.gz does not exist
renaming /var/log/elasticsearch/dba01es.d1.log.0.gz to /var/log/elasticsearch/dba01es.d1.log.1.gz (rotatecount 1, logstart 1, i 0),
old log /var/log/elasticsearch/dba01es.d1.log.0.gz does not exist
log /var/log/elasticsearch/dba01es.d1.log.2.gz doesn't exist -- won't try to dispose of it
copying /var/log/elasticsearch/dba01es.d1.log to /var/log/elasticsearch/dba01es.d1.log.1
truncating /var/log/elasticsearch/dba01es.d1.log
Running ll after running logrotate manually:
total 32K
-rw-r--r-- 1 elasticsearch elasticsearch 0 Jul 4 08:48 dba01es.d1.log
-rw-r--r-- 1 elasticsearch elasticsearch 28937 Jul 4 08:48 dba01es.d1.log.1
-rw-r--r-- 1 elasticsearch elasticsearch 0 Jun 19 10:01 dba01es.d1_index_indexing_slowlog.log
-rw-r--r-- 1 elasticsearch elasticsearch 0 Jun 19 10:01 dba01es.d1_index_search_slowlog.log
My questions are:
Why is the dba01es.d1.log.1 file not compressed?
Why is rotate 0 not working here? logrotate keeps saving the rotated file...
Thanks a lot!
Amit
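A likely explanation, though not verified against this exact logrotate version: delaycompress defers compression of the most recently rotated file to the next rotation cycle, so dba01es.d1.log.1 only gets compressed on the following run; and because copytruncate must first copy the live log aside before truncating it, a .log.1 copy appears even with rotate 0 (as the verbose output above shows). A sketch of the config without delaycompress, if no uncompressed rotated files should linger:
/var/log/elasticsearch/*.log {
    daily
    rotate 0
    copytruncate
    compress
    missingok
    notifempty
}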

ElasticSearch find disk space usage

How can I find the amount of disk space that Elasticsearch is using for my indexes? I'm currently running it locally and I'm trying to see how much disk space I will need on the VM that I'll be spinning up.
The Elasticsearch way to do this would be to use _cat/shards and look at the store column:
curl -XGET "http://localhost:9200/_cat/shards?v"
index shard prirep state docs store ip node
myindex_2014_12_19 2 r STARTED 76661 415.6mb 192.168.1.1 Georgianna Castleberry
myindex_2014_12_19 2 p STARTED 76661 417.3mb 192.168.1.2 Frederick Slade
myindex_2014_12_19 2 r STARTED 76661 416.9mb 192.168.1.3 Maverick
myindex_2014_12_19 0 r STARTED 76984 525.9mb 192.168.1.1 Georgianna Castleberry
myindex_2014_12_19 0 r STARTED 76984 527mb 192.168.1.2 Frederick Slade
myindex_2014_12_19 0 p STARTED 76984 526mb 192.168.1.3 Maverick
myindex_2014_12_19 3 r STARTED 163 208.5kb 192.168.1.1 Georgianna Castleberry
myindex_2014_12_19 3 p STARTED 163 191.4kb 192.168.1.2 Frederick Slade
myindex_2014_12_19 3 r STARTED 163 181.6kb 192.168.1.3 Maverick
myindex_2014_12_19 1 p STARTED 424923 2.1gb 192.168.1.1 Georgianna Castleberry
myindex_2014_12_19 1 r STARTED 424923 2.1gb 192.168.1.2 Frederick Slade
myindex_2014_12_19 1 r STARTED 424923 2.1gb 192.168.1.3 Maverick
myindex_2014_12_19 4 r STARTED 81020 435.9mb 192.168.1.1 Georgianna Castleberry
myindex_2014_12_19 4 p STARTED 81020 437.8mb 192.168.1.2 Frederick Slade
myindex_2014_12_19 4 r STARTED 81020 437.8mb 192.168.1.3 Maverick
Otherwise in Linux to view the space by folder use:
du -hs /myelasticsearch/data/folder
or to view the space by filesystem:
df -h
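For a package install, the data directory typically defaults to /var/lib/elasticsearch (an assumption; check path.data in elasticsearch.yml for your setup):
du -hs /var/lib/elasticsearch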
In case you don't need per-shard statistics returned by /_cat/shards you can use
curl -XGET 'http://localhost:9200/_cat/allocation?v'
to get used and available disk space for each node.
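In recent versions, _cat/allocation also accepts explicit column headers if you only want the disk figures; a sketch using the standard column names:
curl -XGET "http://localhost:9200/_cat/allocation?v&h=node,disk.indices,disk.used,disk.avail,disk.total,disk.percent"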
To view the overall disk usage/available space on ES cluster you can use the following command:
curl -XGET 'localhost:9200/_cat/allocation?v&pretty'
Hope this helps.
You can use the nodes stats REST API.
See: https://www.elastic.co/guide/en/elasticsearch/reference/1.6/cluster-nodes-stats.html
Make a request for the fs stats like so:
http://<host>:9200/_nodes/stats/fs?pretty=1
and you will see:
{
  "cluster_name" : "<cluster>",
  "nodes" : {
    "pEO34wutR7qk3Ix8N7MgyA" : {
      "timestamp" : 1438880525206,
      "name" : "<name>",
      "transport_address" : "inet[/10.128.37.111:9300]",
      "host" : "<host>",
      "ip" : [ "inet[/10.128.37.111:9300]", "NONE" ],
      "fs" : {
        "timestamp" : 1438880525206,
        "total" : {
          "total_in_bytes" : 363667091456,
          "free_in_bytes" : 185081352192,
          "available_in_bytes" : 166608117760,
          "disk_reads" : 154891,
          "disk_writes" : 482628039,
          "disk_io_op" : 482782930,
          "disk_read_size_in_bytes" : 6070391808,
          "disk_write_size_in_bytes" : 1989713248256,
          "disk_io_size_in_bytes" : 1995783640064,
          "disk_queue" : "0",
          "disk_service_time" : "0"
        },
        "data" : [ {
          "path" : "/data1/elasticsearch/data/<cluster>/nodes/0",
          "mount" : "/data1",
          "dev" : "/dev/sda4",
          "total_in_bytes" : 363667091456,
          "free_in_bytes" : 185081352192,
          "available_in_bytes" : 166608117760,
          "disk_reads" : 154891,
          "disk_writes" : 482628039,
          "disk_io_op" : 482782930,
          "disk_read_size_in_bytes" : 6070391808,
          "disk_write_size_in_bytes" : 1989713248256,
          "disk_io_size_in_bytes" : 1995783640064,
          "disk_queue" : "0",
          "disk_service_time" : "0"
        } ]
      }
    }
  }
}
The space for the data drive is listed:
"total" : {
  "total_in_bytes" : 363667091456,
  "free_in_bytes" : 185081352192,
  "available_in_bytes" : 166608117760,
A more concise solution to find the size of indices is to use
curl -XGET 'localhost:9200/_cat/indices?v'
The output has a 'store.size' column that tells you exactly the size of an index.
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open logstash-2017.03.01 TfraFM8TQkSXdxjx13CnpQ 5 1 33330000 0 1gb 1gb
yellow open .monitoring-es-2-2017.03.02 10YscrcfQuGny5wMxeb0TA 1 1 68834 88 30.3mb 30.3mb
yellow open .kibana GE6xXV7QT-mNbX7xTPbZ4Q 1 1 3 0 14.5kb 14.5kb
yellow open .monitoring-es-2-2017.03.01 SPeQNnPlRB6y7G6w1Axokw 1 1 29441 108 14.7mb 14.7mb
yellow open .monitoring-data-2 LLeWqsD-QE-rPFblwu5K_Q 1 1 3 0 6.9kb 6.9kb
yellow open .monitoring-kibana-2-2017.03.02 l_MAPERUTmSbq0xbhpnf2Q 1 1 5320 0 1.1mb 1.1mb
yellow open .monitoring-kibana-2-2017.03.01 UFVg9c7TTA-nbsEd2d4oFw 1 1 2699 0 763.4kb 763.4kb
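Note that store.size includes replica copies while pri.store.size counts primaries only; to see both side by side you can pick the columns explicitly (a sketch with standard _cat/indices column names):
curl -XGET 'localhost:9200/_cat/indices?v&h=index,pri,rep,store.size,pri.store.size'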
In addition you can find out about available disk space by using
curl -XGET 'localhost:9200/_nodes/_local/stats/fs'
Look up the disk space information under the 'fs' key
{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "elasticsearch",
  "nodes": {
    "MfgVaoRQT9iRAZtAvO549Q": {
      "fs": {
        "timestamp": 1488466297268,
        "total": {
          "total_in_bytes": 29475753984,
          "free_in_bytes": 18352095232,
          "available_in_bytes": 18352095232
        }
      }
    }
  }
}
I've tested this for ElasticSearch version 5.2.1
You may want to use the _cat API for node-wise disk space usage:
curl http://host:9200/_cat/nodes?h=h,diskAvail
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-nodes.html
Run the command below to find out the disk space used by each Elasticsearch shard:
# FOR SHARDS
curl "http://host:9200/_cat/shards?v&pretty"
# OR
GET _cat/shards?v&pretty
Run the command below to find out the disk space used by each Elasticsearch index:
# FOR INDICES
curl -XGET 'host:9200/_cat/indices?v&pretty'
# SORT INDICES BY STORE SIZE
curl -XGET 'host:9200/_cat/indices/_all?v&s=store.size'
Output:
# GET /_cat/indices/_all?v&s=store.size
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open sync-rails-logs sSIBqr2iQHG8TGeKFozTpQ 5 1 0 0 1.2kb 1.2kb
yellow open web-nginx-logs iTV-xFFBSdy-C2-NTuEwqQ 5 1 0 0 1.2kb 1.2kb
yellow open web-rails-logs BYD_qHS8SguZvBuGpNvCwA 5 1 0 0 1.2kb 1.2kb
yellow open sync-nginx-logs XAI1hsxlT6qBYN4Ql36lbg 5 1 0 0 1.2kb 1.2kb
green open .tasks XGrMZiqCR0Wr33cCG1u0VQ 1 0 1 0 6.2kb 6.2kb
green open .kibana_1 -g0ztoGWQnuOXnP6di7OYQ 1 0 13 0 100.6kb 100.6kb
green open .kibana_2 eAxt-LXbQyybCyp_6ZYNZg 1 0 14 5 432.2kb 432.2kb
green open sync-nginx-logs-2019-09-13 Q_Ki0dvXQEiuqiGCd10hRg 1 0 144821 0 28.8mb 28.8mb
green open sync-nginx-logs-2019-08-31 m7-oi7ZTSM6ZH_wPDWwbdw 1 0 384954 0 76.4mb 76.4mb
yellow open sync-nginx-logs-2019-08-26 gAvOPNhMRZK6fjAazpzPQQ 5 1 354260 0 76.5mb 76.5mb
green open sync-nginx-logs-2019-09-01 vvgysMB_SqGDFegF6_wOEQ 1 0 400248 0 79.5mb 79.5mb
green open sync-nginx-logs-2019-09-02 8yHv66FuTE6A8L5GgnEl3g 1 0 416184 0 84.8mb 84.8mb
green open sync-nginx-logs-2019-09-07 iZCX1A3fRMaglOCHFLaFsA 1 0 436122 0 86.7mb 86.7mb
green open sync-nginx-logs-2019-09-08 4Y9rA_1cSlGJ9KADmickQQ 1 0 446164 0 88.3mb 88.3mb
Run the command below to find the overall disk space available on each Elasticsearch node:
GET _cat/nodes?h=h,diskAvail
OR
curl http://host:9200/_cat/nodes?h=h,diskAvail
Output:
148.3gb
Or you may also query the disk directly, measuring each directory under /var/lib/elasticsearch/[environment name]/nodes/0/indices on the Elasticsearch nodes.
$ du -b --max-depth=1 /var/lib/elasticsearch/[environment name]/nodes/0/indices \
| sort -rn | numfmt --to=iec --suffix=B --padding=5
> 17GB /var/lib/elasticsearch/env1/nodes/0/indices
3.8GB /var/lib/elasticsearch/env1/nodes/0/indices/index1
2.1GB /var/lib/elasticsearch/env1/nodes/0/indices/index2
1.2GB ...
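Since the directories under .../nodes/0/indices are named by index UUID rather than index name, you can map them back with _cat/indices (a sketch; uuid and store.size are standard column names in recent versions):
curl -s "http://host:9200/_cat/indices?v&h=index,uuid,store.size&s=store.size:desc"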
