Riak node no longer working after changing IP address - amazon-ec2

I'm using an Amazon EC2 instance running Ubuntu 12.04 as my single Riak node. I've gone through all the proper stages of setting up Riak on the instance using the guide on the Basho website here. Where x.x.x.x is the private IP address of the instance, this included:
Installation
Using sudo su - to gain root privileges (EC2 logs me in as 'ubuntu').
Installing the SSL Lib with:
sudo apt-get install libssl0.9.8
Downloading the 64-bit package for 12.04:
wget http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/CURRENT/ubuntu/precise/riak_1.2.1-1_amd64.deb
Then unpacking via:
sudo dpkg -i riak_1.2.1-1_amd64.deb
As instructed in the Basho guide, I updated these two files (using vi):
vm.args
Changing -name riak@x.x.x.x to the private IP of my instance.
app.config
Changing {http, [ {"x.x.x.x", 8098 } ]} to the private IP of my instance.
Changing {pb_ip, "x.x.x.x"} to the private IP of my instance.
The Riak node was working fine when I first set up the server and performed the above: I could connect to the node, and running riak start followed by riak-admin test returned successfully with:
>Attempting to restart script through sudo -H -u riak
>Successfully completed 1 read/write cycle to 'riak@x.x.x.x'
The next day I fired up the instance, repeated the above process (skipping installation) with the instance's new IP address y.y.y.y (the private IP of the instance changes every time it stops/starts) and typed riak start into the terminal, only to be greeted with:
>Attempting to restart script through sudo -H -u riak
>Riak failed to start within 15 seconds,
>see the output of 'riak console' for more information.
>If you want to wait longer, set the environment variable
>WAIT_FOR_ERLANG to the number of seconds to wait
In the riak console the error given is:
>gen_server riak_core_capability terminated with reason: no function clause matching orddict:fetch('riak@y.y.y.y', [{'riak@x.x.x.x',[{{riak_core,staged_joins},[true,false]},{{riak_core,vnode_routing},[proxy,...]},...]}])
Where y.y.y.y is the new instance IP address and x.x.x.x was the old one.
I've been scratching my head over this for a while now and can't find anything on the topic. The only solution I can think of is to re-install Riak on the off chance my PATH directories have gone wrong. If that fails, my last resort would be to terminate the instance and reconfigure Riak on a new instance. So before I jump the gun, what I would like to ask is:
After updating the fields in app.config and vm.args with the new instance IP address, why is the riak start command no longer successful?
Is there any way for an Ubuntu EC2 instance to be assigned a static private IP? Not only would this help solve the problem, but it would also save me from having to update app.config and vm.args every time I start/stop the instance.

So after some more digging around and intense reading, I've found a solution:
You need to remove the Riak ring and start Riak again to reset riak_core.
You can do this by using this command in the terminal:
rm -rf /var/lib/riak/ring/*
NOTE: This should be done after you've updated app.config and vm.args with the new server IP; nasty side-effects can occur otherwise.
Then
riak start
I was no longer thrown the 'failed to start' error, and after issuing a riak-admin test command I pleasantly received (where y.y.y.y is my instance's private IP):
>Attempting to restart script through sudo -H -u riak
>Successfully completed 1 read/write cycle to 'riak@y.y.y.y'
I should note that this solution applies to virtual servers as well as physical ones, although I would imagine IP reassignment is a much rarer occurrence on physical servers.
Now while that solves the issue, it still means that whenever I need to reboot the instance I have to edit the app.config and vm.args files to change the private IP address (remember, the private IP changes every time an Ubuntu instance is started/stopped) and then clear the Riak ring using the command above, so it's not exactly an elegant solution.
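For what it's worth, those manual steps could at least be scripted. Here's a rough sketch of the idea, assuming the default Ubuntu package paths (/etc/riak/vm.args and /etc/riak/app.config) and that the old IP is passed as the first argument; treat it as an illustration rather than a tested tool:
#!/bin/bash
# Hypothetical helper: swap the old private IP for the current one, clear the ring, restart Riak.
OLD_IP="$1"                      # the IP currently sitting in the config files
NEW_IP="$(hostname -i)"          # this instance's current private IP
riak stop
sudo sed -i "s/$OLD_IP/$NEW_IP/g" /etc/riak/vm.args /etc/riak/app.config
sudo rm -rf /var/lib/riak/ring/*
riak start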
If anyone knows a way to assign a static private IP to an EC2 instance (or another solution that tackles both hurdles?) it would solve this problem outright.
EDIT: 14/12/12
A limited solution to assigning a static IP to an EC2 instance:
Amazon Web Services allows the association of Elastic IPs with EC2 instances (of any kind). Therefore, if an instance has an Elastic IP associated with it, that IP will remain associated with the instance even if it is rebooted. You can find the documentation on Elastic IPs here.
If you're under Amazon's free usage tier, creating an Elastic IP shouldn't cost you anything as long as it's associated with a running instance. If an Elastic IP is disassociated, you will be charged for each hour that the unused Elastic IP remains disassociated. For example, terminating an instance disassociates its Elastic IP; unless that Elastic IP is re-associated or released, the above charges apply. Stopping your instance entirely and then starting it at a later time will also disassociate an Elastic IP.
You can have a maximum of one Elastic IP per instance; any more will incur charges.
For those interested, you can find more information on Elastic IP pricing here, under 'Elastic IP Addresses'.
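For reference, the same allocation and association can also be done from the AWS CLI (a newer tool than this post, but the same Elastic IP mechanism); a minimal sketch, where both IDs are placeholders:
# allocate an Elastic IP for use in a VPC and note the returned AllocationId
aws ec2 allocate-address --domain vpc
# associate it with your instance (both IDs below are placeholders)
aws ec2 associate-address --instance-id i-0123456789abcdef0 --allocation-id eipalloc-0123456789abcdef0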

As of Riak 1.3, riak-admin reip is deprecated and riak-admin cluster force-replace is the recommended way of replacing a node's name.
These are the commands I had to issue:
riak stop # stop the node
riak-admin down riak@127.0.0.1 # take it down
sudo rm -rf /var/lib/riak/ring/* # delete the riak ring
sudo sed -i "s/127.0.0.1/`hostname -i`/g" /etc/riak/vm.args # Change the name in config
riak-admin cluster force-replace riak@127.0.0.1 riak@"`hostname -i`" # replace the name
riak start # start the node
That should set the node's name to riak@[your EC2 internal IP address].

As well as changing the PB and HTTP IPs in app.config and the IP in vm.args, I also had to run riak-admin reip, documented here:
http://docs.basho.com/riak/1.2.0/references/Command-Line-Tools---riak-admin/#reip
Without doing this, the old IP was still present in the error log when running riak console and looking at the output.
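From memory of that page, the invocation takes the old node name followed by the new one and is run while the node is stopped; roughly:
riak stop
riak-admin reip riak@x.x.x.x riak@y.y.y.y   # old node name, then new node name
riak start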

Related

MongoDB no suitable servers found

I'm having trouble connecting to a replica set.
[MongoDB\Driver\Exception\ConnectionTimeoutException]
No suitable servers found (`serverSelectionTryOnce` set):
[Server closed connection. calling ismaster on 'a.mongodb.net:27017']
[Server closed connection. calling ismaster on 'b.mongodb.net:27017']
[Server closed connection. calling ismaster on 'c.mongodb.net:27017']
However, I can connect using MongoChef.
Switching any localhost references to 127.0.0.1 helped me. There is a difference between localhost and 127.0.0.1.
See: localhost vs. 127.0.0.1
MongoDB can be set to run on a UNIX socket or TCP/IP.
If all else fails, what I've found that works most consistently across all situations is the following:
In your hosts file, make sure you have a name assigned to the IP address you want to use (other than 127.0.0.1).
192.168.0.101 coolname
or
192.168.0.101 coolname.somedomain.com
In mongodb.conf:
bind_ip = 192.168.0.101
Restart Mongo
NOTE1: When accessing mongo from the command line, you now have to specify the host.
mongo --host=coolname
NOTE2: You'll also have to change any references to either localhost or 127.0.0.1 to your new name.
$client = new MongoDB\Client("mongodb://coolname:27017");
I had the same error in a Docker-based setup:
container1: nginx listening on port 80
container2: php-fpm listening on port 9000
container3: mongodb listening on port 27017
nginx forwarding php to php-fpm
Trying to access mongodb from php gave this error.
In the mongodb Dockerfile, the culprit was:
CMD ["mongod", "--bind_ip", "127.0.0.1"]
Needed to change it to:
CMD ["mongod", "--bind_ip", "0.0.0.0"]
And the error went away. Hope this helps somebody.
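For anyone starting the container by hand rather than baking the flag into a Dockerfile, a rough equivalent would be (the container name is arbitrary):
# run the official mongo image with mongod listening on all interfaces
docker run -d --name mongodb -p 27017:27017 mongo mongod --bind_ip 0.0.0.0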
The IP address of your home network may have changed, which would lead to MongoDB locking you out.
I solved this problem for myself by going to MongoDB Atlas and changing which IP address is allowed to connect to my data. Originally, I'd set it up to only allow connections from my home network. But my home network IP address changed, and I started getting the same error message as you.
To check whether this is the same issue for you, go to MongoDB Atlas, go into your project, and click "Network Access" on the left-hand side of the screen. That's where you can update your IP address; it shows you which IP address(es) it's allowing in. To find out your current IP address, go to whatismyipaddress.com and update MongoDB if it's different.
In my case, I am temporarily coding PHP from Windows 7 against MongoDB on my VPS running Debian 9 Linux. The PHP will eventually be running on the same Linux box to provide an API to the MongoDB data.
Incidentally, this local Composer install does not appear to be doing me any good; it's pure ugliness. After the fix below, my PHP works without the require line require_once 'C:\Users\<Windows User Name>\vendor\autoload.php'.
My fix is different from the accepted answer, which to me did not make sense.
I did not have to touch any hosts file.
So edit the bindIp setting in /etc/mongod.conf to your target machine's IP and restart with sudo systemctl restart mongod. That's it.
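As a sketch of what that edit looks like (the address below is a placeholder; on Debian 9 the file uses the YAML-style net.bindIp key):
# replace the loopback-only bind address with the VPS's address (placeholder IP)
sudo sed -i 's/^  bindIp: 127\.0\.0\.1/  bindIp: 203.0.113.10/' /etc/mongod.conf
sudo systemctl restart mongod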
I don't know what to blame:
the PHP and MongoDB sites, for terrible documentation and skimpy, incomplete PHP examples, or
the MongoDB installation on Linux failing to mention this bindIp setting.
My startup experience with MongoDB has so far been very negative: given all the changes that have occurred, nothing matches what I expected from the videos I watched. I can't seem to find any that reflect what I am going through, like:
$DB_CONNECTION_STRING="mongodb://user:password@164.152.09.84:27017"
$m = new MongoDB\Driver\Manager( $DB_CONNECTION_STRING )
instead of
$m = new MongoClient()
Hope this helps someone
PS. Always say NO to semicolons, camelCAsE and anything case-sensitive... absurdity at its best.

Installing Kubernetes on Mac with Vagrant and VirtualBox

This is my first attempt to install and use Kubernetes. I am trying to set up an environment on my Mac for developing my own apps and deploying them locally for testing with Kubernetes. I am familiar with using Vagrant, VirtualBox and Docker for the same purpose. When I saw this page https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/vagrant.md I assumed it would be trivial. I executed these lines:
export KUBERNETES_PROVIDER=vagrant
curl -sS https://get.k8s.io | bash
This created a master VM and a minion, but Kubernetes seems to have failed to start on the master. On the master, /var/log/salt/master is full of Python traceback errors, like this:
2015-07-17 22:14:42,629 [cherrypy.error ][INFO ][3252] [17/Jul/2015:22:14:42] ENGINE Started monitor thread '_TimeoutMonitor'.
2015-07-17 22:14:42,736 [cherrypy.error ][ERROR ][3252] [17/Jul/2015:22:14:42] ENGINE Error in HTTP server: shutting down
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 187, in _start_http_thread
self.httpserver.start()
File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", line 1824, in start
raise socket.error(msg)
error: No socket could be created
Vagrant is version 1.7.3. VirtualBox is version 4.3.30
Have I made an obvious stupid mistake?
I don't yet know the fix but I know what is going wrong since it happens to me as well:
OS X 10.10.3
Vagrant 1.7.4
VirtualBox 4.3.30
Kubernetes 1.0.1
When I run the default configuration of this (which creates one "master" and one "minion" VM) I see that the static IP address is not being assigned to the "eth1" interface, and I also see that the Salt API server is sitting in what appears to be an infinite retry loop because it is trying to listen on that IP address.
Also, the following message happened during boot:
[vagrant#kubernetes-master ~]$ dmesg | grep eth1
[ 9.321496] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
So basically, the static IP address didn't get assigned because eth1 wasn't ready when the system first booted, and Salt is waiting for it to get assigned.
I could fix this after boot by sshing to the box using "vagrant ssh" and running the command:
sudo /etc/init.d/network restart
on each host.
This "fixes" eth1 by assigning the static IP address, and after that Salt begins to do its thing, installs Docker, boots various containers, and so on.
What I don't know is how to make this work every time without manual intervention. It appears to be some sort of a race condition between Vagrant and VirtualBox.
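Until the race itself is fixed, the workaround can at least be scripted from the host after vagrant up. A rough sketch (check vagrant status for your actual VM names; the ones below are assumptions):
# restart networking inside each VM so eth1 picks up its static IP
for vm in master minion-1; do
  vagrant ssh "$vm" -c 'sudo /etc/init.d/network restart'
done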
If you just want to kick the tires with Kubernetes, I'd recommend installing boot2docker and then following the Running kubernetes locally via Docker getting started guide. Once you are comfortable interacting with the Kubernetes API and want a more complex local setup, you can then work on installing Vagrant.
If the Vagrant instructions aren't working, you should also feel free to file a bug in the github repository.
The tutorial pointed to by Robert is really easy to run. Just change the version to 0.21.2 (maybe 0.21.3 works too).
Otherwise, if you prefer a Vagrant solution, try pires' cluster on Vagrant. It runs with almost nothing to change.
Running Kubernetes inside VirtualBox requires 4 networks and some adjustments to the configuration:
The VirtualBox HOST ONLY network will be the network used to access the Kubernetes master and nodes from the Mac or PC.
The NAT Network to download packages from the Internet.
The internal connections between Kubernetes pods use a TUN tunnel network.
The Kubernetes Cluster IP Network is a private IP range used inside the cluster to give each Kubernetes service a dedicated IP.
The Vagrantfile needs to pass the node public IPs to the Ansible roles that configure Kubernetes, to set the KUBELET_EXTRA_ARGS environment variable with the public IP of each node (required for reading logs using kubectl).
NodePort needs to be used to publish applications running inside the Kubernetes cluster, as Load Balancers are not available in VirtualBox (see the sketch below).
See the full example and download the code at Building a Kubernetes Cluster with Vagrant and Ansible (without Minikube); it has been tested on Ubuntu but should work on a Mac as well.
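As an illustration of the NodePort point above, exposing a deployment inside such a cluster might look like this (the deployment name and port are placeholders):
# expose port 80 of a deployment on a high port of every node
kubectl expose deployment my-app --type=NodePort --port=80
# look up which node port was assigned
kubectl get service my-app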

Unable to get Mesos to run from tutorial: Setting up a Single Node Mesosphere Cluster

I have been following this tutorial to try and set up a single-node Mesosphere cluster, from their official tutorial:
http://mesosphere.com/docs/getting-started/developer/single-node-install/
I followed all the commands without any issues, and I also added ports 5050 and 8080 to my security group. When I try to access the console for Mesos/Marathon, I get an "Internet Explorer cannot display the webpage" message.
They also recommend checking it the following way:
MASTER=$(mesos-resolve `cat /etc/mesos/zk`)
mesos-execute --master=$MASTER --name="cluster-test" --command="sleep 5"
But that comes up with an error:
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0106 17:03:08.126703 20993 process.cpp:1561] Failed to initialize, gethostbyname2: Unknown host
*** Check failure stack trace: ***
I am not really sure how to troubleshoot this either, and there are not many tutorials I could find on how to install Mesos on Ubuntu.
I checked the contents of the zk file, seems to be the default value.
$ cat /etc/mesos/zk
zk://localhost:2181/mesos
I would really appreciate any clues on how to go about this one.
Edit: The process is definitely running too, just an FYI:
root 31545 8.5 5.9 187464 35604 ? Ssl 17:28 0:00 /usr/local/sbin/mesos-slave --master=zk://localhost:2181/mesos --log_dir=/var/log/mesos
root 31563 28.5 2.1 116304 12856 ? Rs 17:28 0:00 /usr/local/sbin/mesos-master --zk=zk://localhost:2181/mesos --port=5050 --log_dir=/var/log/mesos --quorum=1 --wo
Mesos uses gethostbyname2 to resolve hostnames to IPs. The first thing I would recommend, is to try "ping localhost" and "ping hostname", and verify that there are no strange settings in /etc/hosts. If you're doing a multi-node cluster, I'd recommend that hostname map to the public IP address (not 127.0.x.1).
If that doesn't help, you can try setting the --ip and --hostname flags when starting mesos-master and mesos-slave, to bypass the gethostbyname2 resolution. These can also be set by writing to the file-based parameters, e.g. /etc/mesos/mesos-master/ip
For additional troubleshooting, try running wget http://localhost:5050 (or curl -L) from the mesos master, to verify that it is locally visible. Also try wget http://<public_ip>:5050 to verify that the web server is up and serving to the public IP. Depending on how your (EC2?) node is setup, you may need to expose/forward the port, or connect to a VPN.
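As a sketch of the file-based form (the address is a placeholder, and the exact path depends on how Mesos was installed, so adjust it to match the path above):
# pin the master's advertised IP and hostname so gethostbyname2 is never consulted
echo "10.0.0.5" | sudo tee /etc/mesos/mesos-master/ip
echo "10.0.0.5" | sudo tee /etc/mesos/mesos-master/hostname
# then restart mesos-master however it is managed on your system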
Thanks Adam. I ran the wget and curl commands, and nothing was actually listening on port 8080 or 5050, even though I did open those ports in EC2. A simple reboot did the trick, however; once I SSH'ed into the EC2 instance after the reboot, both Mesos and Marathon were running, and both ports now show up after I ran:
netstat -ntln

How to read the Elastic IP of an AWS instance when created through Vagrant and chef-solo

I am using Vagrant with chef-solo to create and provision a test environment with an Elastic IP assigned. I want to read the Elastic IP of the test environment and return it to Jenkins, which then uses this IP to deploy the WAR onto the machine for functional testing.
Is this possible to do?
Yes, it's possible to get the system's public IP. You can get the information by accessing the instance metadata. Here is the request that returns the public IP associated with your instance:
GET http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:29:96:8f:6a:2d/public-ipv4s
Change the MAC address to your own.
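Equivalently, from a shell on the instance itself (the top-level public-ipv4 key is usually the simplest):
# simplest form: the top-level metadata key for the instance's public address
curl -s http://169.254.169.254/latest/meta-data/public-ipv4
# per-interface form, discovering the MAC instead of hard-coding it
MAC=$(curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/ | head -n 1)
curl -s "http://169.254.169.254/latest/meta-data/network/interfaces/macs/${MAC}public-ipv4s"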
Thanks linuxnewbee, but I found another way:
$vagrant ssh
$ ec2metadata | grep public-hostname
This command returns the public hostname, e.g. ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com
Now I have to pass this IP to Jenkins, which is yet to be done.

Install Chef-Server 11 on EC2 Instance

I have been using hosted Chef for quite some time and wanted to explore the open-source Chef server, hence I am trying to set up Chef Server 11 on an EC2 instance.
I have the Chef server running and I can access its web GUI. I have the Chef workstation configured on another EC2 instance, which is also working fine.
Problem: I am not able to upload any cookbook.
I get below error when I try uploading the cookbook:
# knife cookbook upload getting-started
Uploading getting-started [0.4.0]
/opt/chef/embedded/lib/ruby/1.9.1/net/http.rb:763:in `initialize': Connection refused - connect(2) (Errno::ECONNREFUSED)
However, other knife list commands work fine.
I did my homework and came across the links below:
http://www.opscode.com/blog/2013/03/11/chef-11-server-up-and-running/
http://www.curvve.com/blog/servers/2013/script-to-configure-and-set-your-hostname-and-fqdn-on-ec2-instances/
So, it is mentioned that the Chef server needs a working FQDN. I set my public EC2 hostname as the hostname of the server and added it to /etc/hosts, rebooted the instance, and ran chef-server-ctl reconfigure again. I am still facing the same error.
QUESTION: How do I figure out the FQDN part of the EC2 instance for the Chef server to work? If anyone has set up a Chef server successfully on EC2 and was able to upload cookbooks, please share your steps for working out the FQDN.
I was having a hard time with this but this solution worked!
Edit /etc/chef-server/chef-server.rb and add these lines (create the file if it doesn't exist):
server_name = "THE PUBLIC IP OF YOUR INSTANCE"
api_fqdn server_name
nginx['url'] = "https://#{server_name}"
nginx['server_name'] = server_name
lb['fqdn'] = server_name
bookshelf['vip'] = server_name
I found the solution here
http://sahebjade.blogspot.com/2013/05/check-your-knife-configuration-and.html
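As mentioned in the question, a reconfigure is needed for the change to take effect:
sudo chef-server-ctl reconfigure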
This is how I got it working: I updated the public DNS name of my EC2 instance (the chef-server) in /etc/sysconfig/network and ran service network restart. Now I am able to upload cookbooks fine.
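In shell terms, that change was roughly the following (the hostname value is a placeholder for your instance's public DNS name):
# set HOSTNAME in /etc/sysconfig/network to the instance's public DNS name
sudo sed -i 's/^HOSTNAME=.*/HOSTNAME=ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com/' /etc/sysconfig/network
sudo service network restart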
I need to think about an Elastic IP as a potential option for my chef-server.
Edit /etc/chef-server/chef-server.rb and add these lines (create the file if it doesn't exist):
bookshelf["vip"] = node["ipaddress"]
bookshelf["url"] = "https://#{node["ipaddress"]}"
erchef['s3_url_ttl'] = 3600
The first two lines will point your chef-server URL to the machine's IP, and the third will solve a timeout issue that apparently always exists when the Chef server is on EC2.
I wanted to expand some on the answers since they don't give a complete picture. This applies to Chef 11 (hopefully Chef 12 is smarter).
In my case I rolled a master up under VPC #1, which gave it an internal address like this:
ip-10-0-0-10.ec2.internal
Because I was only playing with the VPC initially, I had misconfigured some things I needed, so I had to drop it and create a new scheme. Thankfully, I was able to snapshot the old Chef master and bring it up under the new VPC, but I found that I couldn't log into Chef anymore. It took some digging, but I found in my /var/log/chef-server/chef-server-webui/current log that the install had glommed onto the old hostname and set that as the internal URL for... everything. This caused problems after the internal hostname change:
2014-12-24_16:19:09.46680 SocketError: Error connecting to https://ip-10-0-0-10.ec2.internal/users/admin - getaddrinfo: Name or service not known
Now, to the OP's remark:
I need to think about an Elastic IP as a potential option for my chef-server.
In my case, I just added a CNAME in CloudFlare and set that as my permanent address. Since I can set CloudFlare to a low TTL on that one address, it's easy to move it around between IP changes (I don't need an Elastic IP while I'm just getting things configured). This way I could tell Chef to always look for the same URL and not worry about an EIP.
Once that was done, I had to update Chef. I don't know what changed (this is 11.16.4), but I found the configs live in /var/opt/chef-server/chef-server-webui/etc/chefserver.rb as opposed to the chef-server.rb listed in some of the other answers. Not sure if that's a YMMV thing or not.
I changed the following towards the bottom of that file
# Environment specific application configuration.
# These values override the ones set in 'RAILS_ROOT/config/application.rb'
#config.chef_server_url = "https://ip-10-0-0-10.ec2.internal"
config.chef_server_url = "https://chef.mydomain.com"
I also changed /var/opt/chef-server/nginx/etc/chef_https_lb.conf
server_name chef.mydomain.com;
Finally I restarted Chef
chef-server-ctl restart
That seems to have done the trick. Logins work again.
