I need to keep Elasticsearch data in sync across 3 servers using elasticsearch-curator. All I want is to update the data on one server and have the others update themselves using the snapshot and restore method.
I was able to create a snapshot using Curator on the first server, but I couldn't restore it on another.
Snapshot
While taking the snapshot, the host entry in curator.yml on Server 1 is hosts: ["localhost"]. I can easily restore the snapshot on Server 1 itself.
But the problem arises when I try to restore it on Server 2.
There, the host entry in curator.yml is hosts: ["localhost","Server 1 IP"].
It generates this error message:
2017-02-27 10:39:58,927 INFO Preparing Action ID: 1, "restore"
2017-02-27 10:39:59,145 INFO Trying Action ID: 1, "restore": Restore all indices in the most recent curator-* snapshot with state SUCCESS. Wait for the restore to complete before continuing. Do not skip the repository filesystem access check. Use the other options to define the index/shard settings for the restore.
2017-02-27 10:39:59,399 INFO Restoring indices "['test_sec']" from snapshot: curator-20170226143036
2017-02-27 10:39:59,409 ERROR Failed to complete action: restore. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: TransportError(500, u'snapshot_restore_exception', u'[all_index:curator-20170226143036]snapshot does not exist')
This is somewhat related to the answer at "how to restore elasticsearch indices from S3 to blank cluster using curator?"
How did you add the repository to the original (source) cluster? You need to use the exact same steps to add the repository to the new (target) cluster. Only then will the repository be readable by the new cluster.
Without more information it's hard to pinpoint, but the "snapshot does not exist" message seems clear in this regard: it indicates that the repository is not backed by the same shared file system as the one the source cluster snapshotted to.
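For a shared-filesystem repository, that means registering the same location on the target cluster (and whitelisting that path via path.repo in each node's elasticsearch.yml). A minimal sketch, where the repository name my_backup and the mount point /mnt/es-backups are placeholders for illustration:
# register the same fs repository on the target cluster (run against Server 2)
curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -H 'Content-Type: application/json' -d '
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es-backups"
  }
}'
# verify the curator-* snapshots are visible here before running the restore action
curl 'http://localhost:9200/_snapshot/my_backup/_all?pretty'
Once the target cluster can list the snapshot, the Curator restore action should no longer fail with "snapshot does not exist".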
Looking at the logs in one of the Filebeat pods, I can see this:
2021-01-04T10:10:52.754Z DEBUG [add_cloud_metadata] add_cloud_metadata/providers.go:129 add_cloud_metadata: fetchMetadata ran for 2.351101ms
2021-01-04T10:10:52.754Z INFO [add_cloud_metadata] add_cloud_metadata/add_cloud_metadata.go:93 add_cloud_metadata: hosting provider type detected as openstack, metadata={"availability_zone":"us-east-1c","instance":{"id":"i-08f536567bd9945df","name":"ip-10-101-2-178.ec2.internal"},"machine":{"type":"m5.2xlarge"},"provider":"openstack"}
2021-01-04T10:10:52.755Z DEBUG [processors] processors/processor.go:120 Generated new processors: add_cloud_metadata={"availability_zone":"us-east-1c","instance":{"id":"i-08f536567bd9945df","name":"ip-10-101-2-178.ec2.internal"},"machine":{"type":"m5.2xlarge"},"provider":"openstack"}, add_docker_metadata=[match_fields=[] match_pids=[process.pid, process.ppid]]
2021-01-04T10:10:52.755Z INFO instance/beat.go:392 filebeat stopped.
2021-01-04T10:10:52.755Z ERROR instance/beat.go:956 Exiting: data path already locked by another beat. Please make sure that multiple beats are not sharing the same data path (path.data).
Exiting: data path already locked by another beat. Please make sure that multiple beats are not sharing the same data path (path.data).
As you can see, Filebeat stopped with an error:
data path already locked by another beat. Please make sure that multiple beats are not sharing the same data path (path.data).
After searching for the problem on GitHub and the forums, I found this:
https://discuss.elastic.co/t/data-path-already-locked-by-another-beat/219852/4
which looks like my problem.
I'm using the default filebeat-kubernetes.yaml, and there is no information in the ELK/Filebeat docs on how to add unique paths in filebeat-kubernetes.yaml.
Where do I add them, and how do I make them unique?
Thanks
I had the same problem. It means that your data path (/var/lib/filebeat) is locked by another Filebeat instance, so execute sudo systemctl stop filebeat (in my case) to ensure that you don't have a Filebeat already running,
and then run Filebeat with sudo filebeat -e, which prints its logs to the console.
I also tried the link that you shared, but it didn't help me. Here is another solution that may help you: https://discuss.elastic.co/t/data-path-already-locked-by-another-beat/219852/2
In addition to @Anton's answer: in one scenario, I had a lock file in the data path. This could be /var/lib/filebeat/filebeat.lock, depending on the configuration. Delete the file and run sudo filebeat -e, for example as sketched below.
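A minimal sketch, assuming the default data path (adjust the lock file location to wherever your path.data points):
# remove the stale lock file left behind by a previous Filebeat instance
sudo rm /var/lib/filebeat/filebeat.lock
# run Filebeat in the foreground and watch the logs for errors
sudo filebeat -e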
If you want to run the Elastic Stack as a service, the solution is just to restart the whole stack in this order:
Elasticsearch
Kibana
Logstash
Filebeat(s)
which is already suggested in the link shared above.
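A minimal sketch of that restart order, assuming the default systemd service names from the package installs:
sudo systemctl restart elasticsearch
sudo systemctl restart kibana
sudo systemctl restart logstash
sudo systemctl restart filebeat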
We are facing the same issue as described in "Artifactory: java.io.IOException: Failed to deploy file. Status code: 404 Response message" when running our deployment via Bitbucket Pipelines.
This happens on Artifactory Cloud for all pipelines, from one day to the next.
Execution failed for task ':artifactoryDeploy'.
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Failed to deploy file. Status code: 404 Response message: Artifactory returned the following errors:
Failed to persist file with sha1: 0fexxxxxxxxxxxxxxxx Status code: 404
In the Artifactory system logs I get the following warning all the time, but I'm not sure whether it is connected to this issue. Besides the following message, there are no errors in the logs:
2020-08-25T16:26:43.889Z [jfrt ] [WARN ] [c19ba246224f712c] [ntuallyPersistedAddFileTask:96] [al-binary-provider-2] - Failed to delete 'add file' after completing eventually persisted task '/storage/eventual/_add/a3/a396fb897aXXXXXXXXXXXXXXXXXXXXXXXX'
Errors in request.log:
2020-08-26T07:05:43.041Z|1765ac2ce37a6ffc|34.232.119.183|gradle-build|PUT|/gradle-dev-local/app/app-front/1.0.1.418_dev/app-front-1.0.1.418_dev.war;build.timestamp=1598425011065;build.name=app;build.number=1598425011337|404|0|0|9|ArtifactoryBuildClient/2.18.0
2020-08-26T07:05:44.014Z|e62cf9a7063d3fff|34.232.119.183|gradle-build|PUT|/gradle-dev-local/com/customer/app/app-core/1.0.1.418_dev/app-core-1.0.1.418_dev.pom;build.timestamp=1598425011065;build.name=app;build.number=1598425011337|404|4474|0|184|ArtifactoryBuildClient/2.18.0
Does anyone have an idea what the reason could be and what else could be checked?
We are deploying via the Artifactory Gradle plugin (https://bintray.com/jfrog/jfrog-jars/build-info-extractor-gradle#release).
We use a fixed version, but I also updated the plugin to 4.17.1 (before that we used 4.9.8).
Thanks in advance!
That sounds like more of an internal issue than something with your client.
It sounds like you may be using some sort of cloud storage, which in turn is using eventual storage. I can imagine a situation like this arising from using a mounted eventual directory over a sharded one in an HA setup.
I'd recommend checking whether that file still exists in the filestore, or whether it has odd permissions that prevented it from being removed. If it is indeed a mounted eventual directory, it would also be worth checking whether the request to upload that artifact came in multiple times; perhaps it was a collision of some sort.
Along those lines, since it's a 404 (not found) and it couldn't delete that file, I'm wondering whether it simply couldn't write it to _add in the first place.
To summarize, with the information so far it could be one of two things in my opinion:
You are using a mounted eventual directory, which may be causing issues
The permissions on the filestore are not correct, affecting the filestore operations
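If you have shell access to the Artifactory host (i.e. not a fully managed cloud instance), a minimal check along those lines, using the path from the warning above as an example (your filestore root may differ):
# inspect the eventual 'add' queue directory referenced in the warning
ls -l /storage/eventual/_add/a3/
# check ownership and permissions on the eventual directories themselves
ls -ld /storage/eventual /storage/eventual/_add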
I have just installed ICp CE edition 2.1.0 on Ubuntu 16.04 (one cluster, one master, one worker node, very basic installation). When opening the 'catalog' page (https://..........:8443/catalog/), I get the message 'Error loading Charts'.
On the 'admin > repositories' page I can see ibm-charts https://blablabla and local-charts https://blablabla/helm-repo/....
The 'admin > metering' dashboard displays the error 'E_DATA_QUERY_ERROR: The query for loginbootstrap failed with the response '500 Internal Server Error''.
I have made very few modifications to the config.yaml (and hosts) files in the cluster directory (I just configured password authentication). Maybe some more custom configuration is required.
I'm still discovering/learning this product; maybe there is an obvious explanation for this kind of behavior to an expert.
Thanks
Regarding the "error loading charts", check the following:
Deployments > helm-api > {click the pod name at the bottom} > logs.
Then, in another tab, open the Admin > Repositories page, click Sync Repositories, and watch the log in the other tab. Attempt to open the Catalog as well and watch the same log.
If you are seeing any Cloudant-related error, one possible way to resolve it is to delete the helm-api pod; it will reinitialize with the view and the error should go away.
There was possibly an issue connecting to Cloudant when the connection to it was set up, so the helm-api pod needs a restart in order to add some files to Cloudant now that it has been initialized.
My understanding is that a fix will be going in to help automate this recovery step in the next release.
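A minimal sketch of that restart, assuming the pod lives in kube-system as shown later in this thread (the exact pod name will differ in your cluster):
kubectl -n kube-system get pods | grep helm-api
# delete the pod; its Deployment will recreate it and reinitialize the Cloudant views
kubectl -n kube-system delete pod <helm-api-pod-name>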
As for the 'E_DATA_QUERY_ERROR: The query for loginbootstrap failed with the response '500 Internal Server Error'' error, that was supposedly fixed in the GA release. Are you certain that you have installed the latest ICP CE release from Docker Hub?
https://www.ibm.com/support/knowledgecenter/SSBS6K_2.1.0/installing/install_containers_CE.html
The two problems, the chart loading error and the metering 'loginbootstrap' error, are likely to have the same root cause: a problem communicating with the Cloudant database at first startup, when the databases would be initialized. Restarting the helm-api pod would help the charts, and restarting the metering-server and then the metering-ui pods should resolve the metering error.
Today I saw the same issue on ICp 2.1.0.1 EE when navigating to the Catalog -> Helm charts page. The page loaded for a while and then ended with "error loading charts". The weird thing is that I didn't do anything; I just left it, revisited it after several hours, and it worked.
Next time, I will first try syncing the repositories (Manage -> Helm Repositories -> Sync repositories), then check the helm-api pod (kubectl is on Windows):
kubectl -n kube-system get pods |findstr helm-api
Then kill the pod if it is not running.
These are my files:
Nodes.pp file
site.pp file
I need to set up the infrastructure in the diagram, and I would like to use Puppet automation to do so. I would need to:
Create 4 VMs: one for the DB, 1 web server, 1 load balancer, 1 master
Set them up with the Puppet agent
Find the appropriate modules/cookbooks from the community sites (Puppet Forge / Chef Supermarket)
Configure the nodes using recipes/classes fetched from the community sites
Provide configuration parameters in order to have all these nodes connect to each other
The end goal is to have a working WordPress setup.
I got stuck in the master-agent configuration process. I have a Puppet master and 3 agents up and running, but whenever I run puppet agent --test on an agent, it throws an error. I look forward to the community's help.
The error I am getting is:
[root@agent1 vagrant]# puppet agent --noop --test
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
First, take a look at the Puppet master logs.
Second: the error message is too short. Something is missing after
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could
The text after the "Could" can be helpful ;)
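A minimal sketch of how to dig further, assuming a default Puppet Server install that logs under /var/log/puppetlabs (adjust paths for your setup):
# on the master: follow the Puppet Server log while the agent runs
sudo tail -f /var/log/puppetlabs/puppetserver/puppetserver.log
# on the agent: re-run with debug output to capture the full 400 response
sudo puppet agent --test --debug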
I am trying to set up a RabbitMQ cluster on Windows servers, and this requires using a shared Erlang cookie file. According to the documentation, all I need to do is ensure that the root directories on the different machines contain the same .erlang.cookie file. So what I did was find these files on both machines and overwrite them with the same shared version.
After that, all rabbitmqctl commands failed on the machine with the new file version, with an "unable to connect to node..." error message. I tried restarting the RabbitMQ Windows service, but rabbitmqctl still complained. I even reinstalled RabbitMQ on that machine, but then .erlang.cookie was reset back to the old version. Whenever I tried to use the new version of the cookie file, rabbitmqctl failed. When I restored the old version, it worked fine.
Basically, I am stuck and cannot proceed with the cluster setup until I resolve this issue. Any help is appreciated.
UPDATE: Received an answer from RabbitMQ:
"rabbitmqctl will pick up the cookie from the user home directory while the service will pick it up from C:\windows. So you will need to synchronise those with each other, as well as with the other machine."
This basically means that the cookie file needs to be replaced in two places: C:\Windows and the current user's home directory.
You have the above correct. The service will use the cookie at C:\Windows and when you use rabbitmqctl.bat to query the status it is using the cookie in your user directory (%USERPROFILE%).
When the cookies don't match, the error looks like this:
C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-2.8.2\sbin>rabbitmqctl.bat status
Status of node 'rabbit@PC-FOOBAR' ...
Error: unable to connect to node 'rabbit@PC-FOOBAR': nodedown
DIAGNOSTICS
===========
nodes in question: ['rabbit@PC-FOOBAR']
hosts, their running nodes and ports:
- PC-FOOBAR: [{rabbit,49186},{rabbitmqctl30566,63150}]
current node details:
- node name: 'rabbitmqctl30566@pc-foobar'
- home dir: U:\
- cookie hash: Vp52cEvPP1PukagWi5S/fQ==
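A minimal sketch of bringing the two copies in line on one machine, assuming the default locations (if your home directory is redirected as described in the note below, copy the cookie there instead):
rem run from an elevated prompt: copy the service's cookie over the per-user one
copy /Y C:\Windows\.erlang.cookie %USERPROFILE%\.erlang.cookie
rem then re-check that rabbitmqctl can reach the local node
rabbitmqctl.bat status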
There is one more gotcha for RabbitMQ cookies on Windows: if you have %HOMEDRIVE% and %HOMEPATH% environment variables set (as we do in our current test environment, which is what sets the home dir above to U:\), then RabbitMQ will look for the cookie there, and if there isn't one it makes one up and writes it there. This left me banging my head on my desk for quite a while when trying to get this working. Once I found this gotcha, it was obvious the cookie files were the problem (as documented); they were just in an odd location (not documented, as far as I know).
Hope this saves someone some pain setting up RabbitMQ clustering on Windows.