Hope everyone is keeping safe!
I have a Ruby on Rails application hosted on AWS Elastic Beanstalk. I am using a CloudFormation template to update the stack, e.g. for Ruby version or Linux platform upgrades.
I was trying to upgrade the Linux platform to 2.11.7, Ruby to 2.6.6, and then Elasticsearch to 7.4.
I made these changes in the CloudFormation YAML template and then ran the aws cloudformation update-stack command to apply them.
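For reference, the command is roughly the following; the stack name and template path are placeholders, and --capabilities is only needed if the template manages IAM resources:
aws cloudformation update-stack \
  --stack-name my-eb-stack \
  --template-body file://template.yml \
  --capabilities CAPABILITY_IAM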
While the changes were being applied, I accidentally clicked Rebuild Environment in the AWS web console; as a result, all the previously configured settings (SQS, load balancer, etc.) were replaced by new settings.
Now, whenever I try to execute the update-stack command, it fails with the errors below:
2020-06-09 15:25:44 UTC+0530
WARN
Environment health has transitioned from Info to Degraded. Command failed on all instances.
Incorrect application version found on all instances. Expected version "code-pipeline-xxxxxxxxxx" (deployment 2377). Application update failed 40 seconds ago and took 79 seconds.
2020-06-09 15:25:03 UTC+0530
INFO
The environment was reverted to the previous configuration setting.
2020-06-09 15:24:44 UTC+0530
INFO
Environment health has transitioned from Ok to Info. Application update in progress on 1 instance. 0 out of 1 instance completed (running for 39 seconds).
2020-06-09 15:24:30 UTC+0530
ERROR
During an aborted deployment, some instances may have deployed the new application version.
To ensure all instances are running the same version, re-deploy the appropriate application version.
2020-06-09 15:24:30 UTC+0530
ERROR
Failed to deploy application.
2020-06-09 15:24:30 UTC+0530
ERROR
Unsuccessful command execution on instance id(s) 'i-xxxxxxxxxx'. Aborting the operation.
2020-06-09 15:24:30 UTC+0530
INFO
Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
2020-06-09 15:24:30 UTC+0530
ERROR
[Instance: i-xxxxxxxxxx] Command failed on instance. Return code: 18 Output: (TRUNCATED)...g: the running version of Bundler (1.16.0) is older than the version that created the lockfile (1.17.3). We suggest you upgrade to the latest version of Bundler by running `gem install bundler`. Your Ruby version is 2.6.6, but your Gemfile specified 2.6.5. Hook /opt/elasticbeanstalk/hooks/appdeploy/pre/10_bundle_install.sh failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
2020-06-09 15:24:19 UTC+0530
INFO
Deploying new version to instance(s).
2020-06-09 15:23:45 UTC+0530
INFO
Updating environment developWeb's configuration settings.
2020-06-09 15:23:36 UTC+0530
INFO
Environment update is starting.
I can confirm that I have Ruby 2.6.6 set. I am not sure where it is picking up the old version of Ruby from.
Is there any way I can fix this, or forcefully apply the template changes?
Any help on this would be highly appreciated.
[UPDATE]: When I try to connect to Elasticsearch from the Rails console, I get:
Faraday::ConnectionFailed: Failed to open TCP connection to old-elasticsearch-host-name.es.amazonaws.com:80 (Hostname not known: old-elasticsearch-host-name.es.amazonaws.com)
from /opt/rubies/ruby-2.6.6/lib/ruby/2.6.0/net/http.rb:949:in `rescue in block in connect'
Caused by SocketError: Failed to open TCP connection to old-elasticsearch-host-name.es.amazonaws.com:80 (Hostname not known: old-elasticsearch-host-name.es.amazonaws.com)
from /opt/rubies/ruby-2.6.6/lib/ruby/2.6.0/net/http.rb:949:in `rescue in block in connect'
Caused by Resolv::ResolvError: no address for old-elasticsearch-host-name.es.amazonaws.com
from /opt/rubies/ruby-2.6.6/lib/ruby/2.6.0/resolv.rb:94:in `getaddress'
The URL of the new Elasticsearch instance is different, but the application is still picking up the old URL from the ELASTICSEARCH_HOST environment variable.
This was a configuration issue.
Whenever I ran the aws cloudformation update-stack command, it pulled the source code zip from S3, and the Gemfile inside that zip had the Ruby version set to 2.6.5.
So I uploaded a fresh copy of the source code and then executed the update-stack command, and it worked.
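For anyone hitting the same thing: the line that has to match the platform's Ruby version is the ruby declaration in the Gemfile, e.g.
ruby '2.6.6'
and the refreshed source bundle then has to be re-uploaded to the S3 location the CloudFormation template points at before running update-stack again.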
I have set the proxy on the command line as follows:
set HTTP_PROXY=http://user:passowrd#host.com:8080
set HTTPS_PROXY=https://user:passowrd#host.com:8080
set HTTP_USER=myuser
set HTTP_PASSWORD=mypwd
and furthermore I have set the environment variables HTTP_PROXY, HTTPS_PROXY, HTTP_USER, and HTTP_PASSWORD.
Somehow I am still getting the following error:
>terraform init
Initializing the backend...
Initializing provider plugins...
- Checking for available provider plugins...
Registry service unreachable.
This may indicate a network issue, or an issue with the requested Terraform Registry.
Error: registry service is unreachable, check https://status.hashicorp.com/ for status updates
Please note that https://status.hashicorp.com/ is accessible from behind the proxy,
but I am not sure which URL/service API terraform init is actually accessing.
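From what I could find, terraform init apparently needs to reach the provider registry at https://registry.terraform.io (service discovery via /.well-known/terraform.json) and downloads provider plugins from releases.hashicorp.com, so one way to test reachability through the proxy would be something like this (proxy address is a placeholder):
curl -I -x http://host.com:8080 https://registry.terraform.io/.well-known/terraform.json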
Working for me with proxy:
C:\Users\xxxx\Desktop\VMWare_Scripts\Terraform>set HTTP_PROXY=http://xxxx:8080
C:\Users\xxxx\Desktop\VMWare_Scripts\Terraform>terraform init
Initializing the backend...
Initializing provider plugins...
Finding latest version of hashicorp/vsphere...
Installing hashicorp/vsphere v1.24.1...
Installed hashicorp/vsphere v1.24.1 (signed by HashiCorp)
The following providers do not have any version constraints in configuration,
so the latest version was installed.
To prevent automatic upgrades to new major versions that may contain breaking
changes, we recommend adding version constraints in a required_providers block
in your configuration, with the constraint strings suggested below.
hashicorp/vsphere: version = "~> 1.24.1"
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
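As the init output above suggests, it's worth pinning the provider so a later init can't silently pull a breaking major version; a minimal required_providers block for this setup (Terraform 0.13+ syntax) would look like:
terraform {
  required_providers {
    vsphere = {
      source  = "hashicorp/vsphere"
      version = "~> 1.24.1"
    }
  }
}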
I have just upgraded to Kafka 1.0 and ZooKeeper 3.4.10. At first, it all started fine; the stand-alone producer and consumer worked as expected. After my code had run for about 10 minutes, Kafka failed with this error:
[2017-11-07 16:48:01,304] INFO Stopping serving logs in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.log.LogManager)
[2017-11-07 16:48:01,320] FATAL Shutdown broker because all log dirs in C:\Kafka\kafka_2.12-1.0.0\kafka-logs have failed (kafka.log.LogManager)
I have reinstalled and reconfigured Kafka 1.0, and the same thing happened. If I try to restart, the same error occurs.
Deleting the log files lets Kafka start, but it fails again after a short run.
I had been running version 0.10.2 for a long while and never encountered anything like this; it was very stable over long periods of time.
I have tried to find a solution and followed instructions in the documentation.
This is not yet a production environment; it is a fairly simple setup: one producer and one consumer reading from one topic.
I am not sure if this could have anything to do with ZooKeeper.
**Update:** the issue has been posted to the Apache JIRA board.
The consensus so far seems to be that it is a Windows issue.
I ran into this issue as well, and clearing only the kafka-logs did not work. You'll also have to clear the ZooKeeper data.
Steps to resolve:
Make sure to stop zookeeper.
Take a look at your server.properties file and locate the logs directory under the following entry.
Example:
log.dirs=/tmp/kafka-logs/
Delete the log directory and its contents (see the command sketch after these steps). Kafka will recreate the directory once it's started again.
Take a look at the zookeeper.properties file and locate the data directory under the following entry.
Example:
dataDir=/tmp/zookeeper
Delete the data directory and its contents. Zookeeper will recreate the directory once it's started again.
Start zookeeper.
<KAFKA_HOME>/bin/zookeeper-server-start.sh -daemon <KAFKA_HOME>/config/zookeeper.properties
Start the Kafka broker.
<KAFKA_HOME>/bin/kafka-server-start.sh -daemon <KAFKA_HOME>/config/server.properties
Verify the broker has started with no issues by looking at the logs/kafkaServer.out log file.
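Assuming the default paths from the examples above, the two delete steps boil down to the following (adjust the paths to whatever your properties files actually point at):
rm -rf /tmp/kafka-logs /tmp/zookeeper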
I've tried all the solutions, like:
Clearing the Kafka logs and ZooKeeper data (the issue reoccurred after creating a new topic).
Changing the log.dirs path from forward slashes "/" to backslashes "\" (like log.dirs=C:\kafka_2.12-2.1.1\data\kafka); a folder named C:\kafka_2.12-2.1.1\kafka_2.12-2.1.1datakafka was created, the issue stopped, and it was resolved.
Finally I found this link; you'll get it if you google "kafka log.dirs windows".
Just clean the logs in C:\Kafka\kafka_2.12-1.0.0\kafka-logs and restart Kafka.
If you are running on a Windows machine, try changing the log.dirs parameter to a Windows-style path (like log.dirs=C:\some_path\some_path_kafLogs) in server.properties in the /config folder.
By default, this path is in the Unix style (like /unix/path/).
This worked for me on a Windows machine.
So this seems to be a Windows issue.
https://issues.apache.org/jira/browse/KAFKA-6188
The JIRA is resolved, and there is an unmerged patch attached to it.
https://github.com/apache/kafka/pull/6403
So your options are:
get it running on Windows by building it with the patch
run it on a Unix-style filesystem (Linux or Mac)
perhaps running it in Docker on Windows is worth a shot (see the sketch below)
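If you go the Docker route, a minimal sketch would be something like the following; the wurstmeister images, environment variables, and ports are assumptions based on commonly used community images, not a tested setup:
docker network create kafka-net
docker run -d --network kafka-net --name zookeeper wurstmeister/zookeeper
docker run -d --network kafka-net --name kafka -p 9092:9092 -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 -e KAFKA_ADVERTISED_HOST_NAME=localhost wurstmeister/kafka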
The problem is concurrent access to Kafka's log files; the idea is to delay external changes (such as deletion) to the log files until all Kafka threads are done with them.
Topic configuration can help:
import java.util.HashMap;
import java.util.Map;
import static org.apache.kafka.common.config.TopicConfig.*;

Map<String, String> config = new HashMap<>();
config.put(CLEANUP_POLICY_CONFIG, CLEANUP_POLICY_COMPACT); // compact the log instead of deleting segments
config.put(FILE_DELETE_DELAY_MS_CONFIG, "3600000");        // wait 1 hour before deleting a file
config.put(DELETE_RETENTION_MS_CONFIG, "864000000");       // keep delete markers for 10 days
config.put(RETENTION_MS_CONFIG, "86400000");               // retain data for 1 day
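For context, a topic configuration like the one above is typically applied when the topic is created, for example via the Kafka AdminClient; the broker address, topic name, partition count, and replication factor below are placeholders:
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // placeholder topic name, 1 partition, replication factor 1
            NewTopic topic = new NewTopic("my-topic", 1, (short) 1)
                    .configs(Map.of("cleanup.policy", "compact",
                                    "file.delete.delay.ms", "3600000"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}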
What worked for me was deleting both the Kafka and ZooKeeper log/data directories, then changing the log directory paths in both the Kafka and ZooKeeper properties files (found under kafka/config/) from the usual forward slash '/' to backslashes '\'.
On Windows, changing the path separators resolved the issue; each separator required a double backslash, e.g. C:\\path\\logs.
Simply delete all the logs from:
C:\tmp\kafka-logs
and restart the ZooKeeper and Kafka servers.
I need to keep Elasticsearch data in sync across 3 servers using elasticsearch-curator. All I want is to update the data on one server and have the others update themselves using the snapshot and restore method.
I was able to create a snapshot using Curator on the first server but couldn't restore it on another.
Snapshot
While taking the snapshot, the host entry in curator.yml on Server 1 is hosts: ["localhost"]. I can easily restore the snapshot on Server 1 itself.
But the problem arises when I try to restore it on Server 2.
There, the host entry in curator.yml is hosts: ["localhost","Server 1 IP"].
It generates this error message:
2017-02-27 10:39:58,927 INFO Preparing Action ID: 1, "restore"
2017-02-27 10:39:59,145 INFO Trying Action ID: 1, "restore": Restore all indices in the most recent curator-* snapshot with state SUCCESS. Wait for the restore to complete before continuing. Do not skip the repository filesystem access check. Use the other options to define the index/shard settings for the restore.
2017-02-27 10:39:59,399 INFO Restoring indices "['test_sec']" from snapshot: curator-20170226143036
2017-02-27 10:39:59,409 ERROR Failed to complete action: restore. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: TransportError(500, u'snapshot_restore_exception', u'[all_index:curator-20170226143036]snapshot does not exist')
This is somewhat related to the answer at "how to restore elasticsearch indices from S3 to blank cluster using curator?".
How did you add the repository to the original (source) cluster? You need to use the exact same steps to add the repository to the new (target) cluster. Only then will the repository be readable by the new cluster.
Without more information it's hard to pinpoint, but the snapshot does not exist message seems clear in this regard: it indicates that the repository is not the same shared file system as the source cluster's.
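For illustration, a shared-filesystem repository has to be registered with identical settings on both clusters before the snapshots become visible to Curator; the repository name and path below are placeholders, and the path must also be listed under path.repo in each node's elasticsearch.yml:
PUT _snapshot/my_backup_repo
{
  "type": "fs",
  "settings": {
    "location": "/mnt/shared/es-snapshots"
  }
}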
I am using rsyslog (rsyslog-7.4.7-7.el7_0.x86_64) on CentOS 7 (CentOS Linux release 7.1.1503 (Core)). We have some applications on it which use the syslog framework for logging. We have a lot of logs; at peak, it can be up to 50,000 logs per second.
Our system was earlier running on CentOS 6.2 (and rsyslog 5.8) and we never observed any drops. After doing some searching, we found that there is rate limiting. We are getting messages like "imjournal: begin to drop messages due to rate-limiting" in /var/log/messages and then "imjournal: 130886 messages lost due to rate-limiting". We tried different ways to disable or tune it, without success. We tried the following.
1) Changes in /etc/rsyslog.conf
$ModLoad imjournal # provides access to the systemd journal
$imjournalRatelimitInterval 1
$imjournalRatelimitBurst 50000
Some other info from rsyslog.conf follows; we didn't change anything here:
$OmitLocalLogging on
$IMJournalStateFile imjournal.state
We also saw that there is some rate limiting with imuxsock, but we understand that it won't be used when OmitLocalLogging is on.
2) Changes in /etc/systemd/journald.conf
Storage=auto
RateLimitInterval=1s
RateLimitBurst=100000
Our application has modules in Java (using SLF4J and Log4j) and modules in C/C++ (using the syslog() call). For the C/C++ modules, we are missing DEBUG logs most of the time, but the DEBUG logs of the Java modules always seem to be fine.
The version of systemd is "systemd-208-20.el7.x86_64". The application and rsyslogd are on the same machine.
With the latest update to systemd (219-19) on CentOS 7, the only way we were able to get our logging working again, without any rate limiting by journald or imuxsock, was with the config changes below. This also decreased, but didn't completely eliminate, the excessive CPU consumption by journald.
Add the following to /etc/rsyslog.conf after '$ModLoad imuxsock' and '$ModLoad imjournal':
$IMUXSockRateLimitInterval 0
$IMJournalRatelimitInterval 0
Set the following in /etc/systemd/journald.conf:
Storage=volatile
Compress=no
RateLimitInterval=0
MaxRetentionSec=5s
Restart journald and rsyslog to pick up the changes with:
systemctl restart systemd-journald.service
systemctl restart rsyslog.service
Prior to this last update to systemd, you could comment out '$ModLoad imjournal' in /etc/rsyslog.conf to resolve this, but that doesn't work any longer.