I am having a problem: I am trying to get the IP of the node that runs PostgreSQL as the replication master, so that I can pass that IP to the repmgr cookbook and have it written into the SSH config file automatically. I need this because I am not running SSH on port 22, so I want to automate the process.
This is the configure.rb file:
template File.join(node[:repmgr][:pg_home], '.ssh/config') do
source 'ssh_config.erb'
mode 0644
owner node[:repmgr][:system_user]
group node[:repmgr][:system_user]
variables( :hosts => Array(master_node[:ipaddress]) )
end
directory File.dirname(node[:repmgr][:config_file_path])
template node[:repmgr][:config_file_path] do
source 'repmgr.conf.erb'
mode 0644
end
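A minimal sketch of an ssh_config.erb that consumes the :hosts variable might look like this (the layout and the port value below are assumptions for illustration, not the cookbook's actual template):
<%# ssh_config.erb -- hypothetical sketch; @hosts comes from the
    :hosts variable passed by the template resource above. %>
<% @hosts.each do |host| %>
Host <%= host %>
  Port 2222
  User <%= node[:repmgr][:system_user] %>
  StrictHostKeyChecking no
<% end %>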
The master node IP address is taken from the attributes file (default.rb):
default[:repmgr][:addressing][:master] = nil
I need to change the nil to something else so that I can get the IP of the master server; then the slave (aka standby) server can add that IP to its SSH config and replicate over my custom SSH port rather than the default port 22.
I hope someone can help me, because I am really new to Ruby and only know the basics of it.
Thank you. I hope you understand my question.
If you are using chef-server, then you can find other provisioned nodes through search in a recipe. You can search for nodes by different properties, such as role, recipe, environment, name, and so on. I hope your replication master node has some attribute that makes it unique, for example a postgres recipe or a replication-master role in its run_list.
nodes = search( :node, 'recipes:postgres' )
or
nodes = search( :node, 'role:replication-master' )
Search returns an array of nodes that have the corresponding attributes.
And then:
node.default[:repmgr][:addressing][:master] = nodes.first[:ipaddress]
The code should be written in a recipe file, not in the attributes file.
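Putting it together, a minimal recipe sketch (the role name, the attribute fallback, and the guard for empty search results are assumptions; search only works against a Chef server):
# Hypothetical sketch for configure.rb: look up the master via search,
# fall back to the attribute if nothing is found yet.
masters = search(:node, 'role:replication-master')

master_ip = if masters && !masters.empty?
              masters.first[:ipaddress]
            else
              node[:repmgr][:addressing][:master]
            end

node.default[:repmgr][:addressing][:master] = master_ip

template File.join(node[:repmgr][:pg_home], '.ssh/config') do
  source 'ssh_config.erb'
  mode 0644
  owner node[:repmgr][:system_user]
  group node[:repmgr][:system_user]
  variables(:hosts => Array(master_ip))
end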
My Setup
3 nodes running ceph + cephfs
2 of these nodes running CTDB & Samba
1 client (not one of the 3 servers)
It is a lab setup, so there is only one NIC per server (= node), a single subnet, and all Ceph components plus Samba running on the same servers. I'm aware that this is not the way to go.
The problem
I want to host a clustered Samba file share on top of Ceph with ctdb. I followed the CTDB documentation (https://wiki.samba.org/index.php/CTDB_and_Clustered_Samba#Configuring_Clusters_with_CTDB) and parts of this: https://wiki.samba.org/index.php/Samba_CTDB_GPFS_Cluster_HowTo.
The cluster is running and a share is reachable, readable, and writeable on both nodes. My smb.conf looks as follows:
[global]
netbios name = CEPHFS
workgroup = SIMPLE
clustering = yes
idmap config * : backend = autorid
idmap config * : range = 1000000-1999999
log file = /var/log/samba/smb.log
# Set files creation permissions
create mask = 664
force create mode = 664
# Set directory creation mask
directory mask = 2775
force directory mode = 2775
[public]
comment = public share
path = /mnt/mycephfs/testshare
public = yes
writeable = yes
only guest = yes
ea support = yes
CTDB manages Samba and reports both nodes as OK.
But when I read or write to one of the nodes via the public IP and let that node fail (by restarting ctdb), the in-flight read or write fails. A second write attempt succeeds (the public IP is taken over by the other host successfully).
But according to https://ctdb.samba.org/, CTDB's IP takeover should be able to handle this?
I have a tcpdump showing the new server (the one taking over the public IP) sending a TCP RST to my client after the client sends retransmissions to the server.
Any idea what the problem could be?
PS: I'm more than happy to provide you with more information (ctdb config file, firewall configuration, pcap, whatever ;) ), but this is long enough already ...
I tried to use Ambari to manage the installation and maintenance of a Hadoop cluster.
After I started the Ambari server, I used the web UI to set up the Hadoop cluster.
But at the third step, Confirm Hosts, the bootstrap fails with an error.
I checked the log at /var/log/ambari-server and found:
INFO:root:BootStrapping hosts ['qiao'] using /usr/lib/python2.6/site-packages/ambari_server cluster primary OS: redhat6 with user 'root' sshKey File /var/run/ambari-server/bootstrap/1/sshKey password File null using tmp dir /var/run/ambari-server/bootstrap/1 ambari: master; server_port: 8080; ambari version: 1.4.1.25
INFO:root:Executing parallel bootstrap
ERROR:root:ERROR: Bootstrap of host qiao fails because previous action finished with non-zero exit code (1)
INFO:root:Finished parallel bootstrap
Did you provide the SSH RSA private key file, or paste its contents?
Also, from the host you are installing from, make sure you can ssh to every host without typing a password.
If you still get the same error, try:
ambari-server reset
ambari-server setup
Then restart the Ambari server:
ambari-server restart
and then try accessing Ambari again. It should work.
Make sure you can ssh to every single host on the list, including all master hosts.
To do this, ensure that the Ambari host's .ssh/id_rsa.pub entry is included in every host's .ssh/authorized_keys file. Then ssh from the Ambari host to every single server and check whether it asks for a password. You can use a tutorial like http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/ to check that everything has been done properly.
You need to do the same for the Ambari host itself, if you added it to the hosts list.
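If you want to script that check, here is a small sketch (the host list is a placeholder; BatchMode makes ssh fail instead of prompting, so a password prompt shows up as a failure):
# check_ssh.rb -- hypothetical helper: verify passwordless SSH from
# the Ambari host to every cluster host.
hosts = %w[qiao master1 slave1]  # placeholder host names

hosts.each do |host|
  # BatchMode=yes disables interactive prompts, so the command fails
  # if key-based login is not set up for this host.
  ok = system('ssh', '-o', 'BatchMode=yes', '-o', 'ConnectTimeout=5',
              host, 'true')
  puts "#{host}: #{ok ? 'passwordless SSH OK' : 'FAILED'}"
end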
I'm using an Amazon EC2 virtual Ubuntu 12.04 server instance as my single Riak node. I've gone through all the proper stages of setting up Riak on the instance using the guide on the Basho website here. With x.x.x.x standing for the private IP address of the instance, this included:
Installation
Using sudo su - to gain root privileges (EC2 logs me in as 'Ubuntu').
Installing the SSL Lib with:
sudo apt-get install libssl0.9.8
Downloading the 64-bit package for 12.04:
wget http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/CURRENT/ubuntu/precise/riak_1.2.1-1_amd64.deb
Then installing the package via:
sudo dpkg -i riak_1.2.1-1_amd64.deb
As instructed in the basho guide, I updated these two files (using vi):
vm.args
Changing -name riak@x.x.x.x to the private IP of my instance.
app.config
Changing {http, [ {"x.x.x.x", 8098 } ]} to the private IP of my instance.
Changing {pb_ip, "x.x.x.x"} to the private IP of my instance.
The Riak node was working fine when I first set up the server and performed the above: I could connect to the node, and running riak start then riak-admin test returned successfully with:
>Attempting to restart script through sudo -H -u riak
>Successfully completed 1 read/write cycle to 'riak@x.x.x.x'
The next day I fired up the instance, repeated the above process (ignoring installation) with the instance's new IP address y.y.y.y (the private IP of the instance changes every time it stops/starts) and typed riak start into the terminal, only to be greeted with:
>Attempting to restart script through sudo -H -u riak
>Riak failed to start within 15 seconds,
>see the output of 'riak console' for more information.
>If you want to wait longer, set the environment variable
>WAIT_FOR_ERLANG to the number of seconds to wait
In the riak console the error given is:
>gen_server riak_core_capability terminated with reason: no function clause matching orddict:fetch('riak@y.y.y.y', [{'riak@x.x.x.x',[{{riak_core,staged_joins},[true,false]},{{riak_core,vnode_routing},[proxy,...]},...]}])
Where y.y.y.y is the new instance IP address and x.x.x.x was the old one.
I've been scratching my head over this for a while now and can't find anything on the topic. The only solution I can think of is to re-install Riak, on the off chance my PATH directories have gone wrong. If that fails, my last resort would be to terminate the instance and reconfigure Riak on a new one. So before I jump the gun, what I would like to ask is:
After updating the fields in app.config and vm.args with the new instance IP address, why is the riak start command no longer successful?
Is there any way for an Ubuntu EC2 instance to be assigned a static private IP? Not only would this help solve the problem, it would also save me the time of having to update app.config and vm.args every time I start/stop the instance.
So after some more digging around and intense reading, I've found a solution:
You need to remove the Riak ring and start Riak again to reset riak_core.
You can do this by using this command in the terminal:
rm -rf /var/lib/riak/ring/*
NOTE: This should be done after you've updated app.config and vm.args with the new server IP; nasty side effects can occur otherwise.
Then
riak start
I was no longer thrown a 'failed to connect' error, and after issuing a riak-admin test command I pleasantly received (where y.y.y.y is my instance's private IP):
>Attempting to restart script through sudo -H -u riak
>Successfully completed 1 read/write cycle to 'riak@y.y.y.y'
I should note that this solution applies to virtual servers as well as physical ones, although I would imagine the reassignment of IPs is a much rarer occurrence on physical servers.
Now while that solves the issue, it still means that whenever I stop and start the instance I have to edit the app.config and vm.args files to change the private IP address (remember, the private IP changes every time an Ubuntu instance is stopped and started) and then clear the Riak ring using the command above, so it's not exactly an elegant solution.
If anyone knows a way to assign a static private IP to an EC2 instance (or another solution that tackles both hurdles?), it would solve this problem outright.
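In the meantime, the re-pointing can at least be scripted. A rough sketch (the paths match the Ubuntu package defaults; recovering the old IP from the -name line is an assumption about your vm.args), to be run as root after each start and before riak start:
# update_riak_ip.rb -- hypothetical helper for rewriting the Riak
# config after the instance's private IP has changed.
require 'fileutils'

new_ip = `hostname -i`.strip

# Recover the previous IP from the -name line in vm.args.
vm_args = File.read('/etc/riak/vm.args')
old_ip = vm_args[/^-name\s+riak@(\S+)/, 1]
abort 'could not find the old node name in vm.args' unless old_ip

['/etc/riak/app.config', '/etc/riak/vm.args'].each do |path|
  text = File.read(path).gsub(old_ip, new_ip)
  File.open(path, 'w') { |f| f.write(text) }
end

# Clear the old ring so riak_core forgets the previous node name.
FileUtils.rm_rf(Dir.glob('/var/lib/riak/ring/*'))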
EDIT: 14/12/12
A limited solution to assigning a static IP to an EC2 instance:
Amazon Web Services allows the association of Elastic IPs with EC2 instances (of any kind). Therefore, if an instance has an Elastic IP associated with it, that IP will remain associated with the instance even if it is rebooted. You can find the documentation on Elastic IPs here.
If you're under Amazon's free usage tier, creating an Elastic IP shouldn't cost you anything as long as it's associated with a running instance. If an Elastic IP is disassociated, you will be charged for each hour that the unused Elastic IP remains disassociated. For example, terminating an instance disassociates its Elastic IP; unless that Elastic IP is re-associated or released, the above charge applies. Stopping your instance entirely and then starting it at a later time will also disassociate the Elastic IP.
You can have a maximum of one Elastic IP per instance; any more will incur charges.
For those interested, you can find more information on Elastic IP pricing here, under 'Elastic IP Addresses'.
As of Riak 1.3, riak-admin reip is deprecated, and riak-admin cluster replace is the recommended way of renaming a node.
These are the commands I had to issue:
riak stop # stop the node
riak-admin down riak@127.0.0.1 # take it down
sudo rm -rf /var/lib/riak/ring/* # delete the riak ring
sudo sed -i "s/127.0.0.1/`hostname -i`/g" /etc/riak/vm.args # Change the name in config
riak-admin cluster force-replace riak@127.0.0.1 riak@"`hostname -i`" # replace the name
riak start # start the node
That should set the node's name to riak@[your EC2 internal IP address].
As well as changing the PB and HTTP IPs in app.config and the IP in vm.args, I also had to run riak-admin reip:
http://docs.basho.com/riak/1.2.0/references/Command-Line-Tools---riak-admin/#reip
Without doing this, the old IP was still present in the error log when I ran riak console and looked at the output.
I am currently converting a large number of EC2 instances to a series of puppet scripts and then using Vagrant to virtualise the rig for local development.
I have gotten stuck on managing the network in vagrant and mapping that onto production.
First I have something like this:
# Main Web Server
config.vm.define :app do |app_config|
app_config.vm.host_name = "web1"
app_config.vm.network :hostonly, "10.0.0.2"
app_config.vm.box = "precise64"
...etc
puppet.manifest_file = "persist/web.pp"
end
# First DB server
config.vm.define :db1 do |db1_config|
db1_config.vm.host_name = "db1"
db1_config.vm.network :hostonly, "10.0.0.3"
db1_config.vm.box = "precise64"
...etc
puppet.manifest_file = "persist/db.pp"
end
etc.
In this case web.pp and db.pp are reasonably simple; they just set up Python, uWSGI, nginx, MySQL, etc.
So, the problem: inside, for example, puppet/modules/hosts/files/hosts I have to specify something like:
10.0.0.3 db1.X.com
10.0.0.4 db2.X.com
etc.
In production we use a combination of our site's DNS and the EC2 instances' DNS records (which I can't put into hosts). Typically our HAProxy hosts have a public DNS record and hold the EC2 names in their config (which makes using a hosts file impossible).
So how can I build a file that both Puppet and Vagrant can import, providing a global mapping such as:
hosts = {
  'web' => '10.0.0.2',
  'db1' => '10.0.0.3',
  'db2' => '10.0.0.4',
}
I would access this from within Puppet templates (for example haproxy.cfg), but also from within the Vagrantfile so I can set vm.network from it as well.
Restrictions:
Cannot be IP only, must be symbolic name to either IP or DNS
Cannot use a puppet master (unfortunately not flexible on this one).
Virtualising the DNS server as well seems messy, so I'd rather not.
Also, I am really new to Ruby; if you provide examples (which would be great), please describe them well enough for me to find my way through the Ruby docs.
2 Options:
1)
You can have Vagrant read a hash like the one you've shown from a file (placed in its directory, a DB, whatever) and then generate the Vagrantfile configuration from that content. A Vagrantfile just runs Ruby code (I think), so it can load the data directly. If not:
2)
Have a shell script generate the Vagrantfile and/or the Puppet config / static files.
The implementation details are fairly trivial either way.
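As illustration, a minimal sketch of option 1, assuming a hosts.yaml file kept next to the Vagrantfile (the file name, the manifest-name derivation, and the puppet.facter hand-off are assumptions; facter support depends on your Vagrant version):
# hosts.yaml (assumed contents):
#   web1: 10.0.0.2
#   db1: 10.0.0.3
#   db2: 10.0.0.4
require 'yaml'

hosts = YAML.load_file(File.expand_path('../hosts.yaml', __FILE__))

Vagrant::Config.run do |config|
  hosts.each do |name, ip|
    config.vm.define name.to_sym do |cfg|
      cfg.vm.host_name = name
      cfg.vm.network :hostonly, ip
      cfg.vm.box = "precise64"
      cfg.vm.provision :puppet do |puppet|
        # web1 -> persist/web.pp, db1/db2 -> persist/db.pp (assumed convention)
        puppet.manifest_file = "persist/#{name.sub(/\d+$/, '')}.pp"
        # Expose the whole mapping to Puppet as a fact, e.g. for a
        # haproxy.cfg template, so there is a single source of truth.
        puppet.facter = {
          'cluster_hosts' => hosts.map { |n, i| "#{n}=#{i}" }.join(',')
        }
      end
    end
  end
end
The same YAML file can then feed both the Vagrant network setup and the Puppet-managed hosts file, with no puppet master involved.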
I want to use the parallel capabilities of ipython on a remote computer cluster. Only the head node is accessible from the outside. I have set up ssh keys so that I can connect to the head node with e.g. ssh head and from there I can also ssh into any node without entering a password, e.g. ssh node3. So I can basically run any commands on the nodes by doing:
ssh head ssh node3 command
Now what I really want to do is to be able to run jobs on the cluster from my own computer from ipython. The way to set up the hosts to use in ipcluster is:
send_furl = True
engines = { 'host1.example.com' : 2,
'host2.example.com' : 5,
'host3.example.com' : 1,
'host4.example.com' : 8 }
But since I only have a host name for the head node, I don't think I can do this. One option is to set up SSH tunneling on the head node, but I cannot do this in my case, since that would require enough open ports to accommodate all the nodes (and that is not the case). Are there any alternatives?
I use ipcluster on the NERSC clusters by using the PBS queue:
http://ipython.org/ipython-doc/stable/parallel/parallel_process.html#using-ipcluster-in-pbs-mode
In summary, you submit jobs that run mpiexec ipengine (after having launched ipcontroller on the login node). Do you have PBS on your cluster?
This was working fine with IPython 0.10; it is currently broken in the 0.11 alpha.
I would set up a VPN server on the master, and connect to that with a VPN client on my local machine. Once established, the virtual private network will allow all of the slaves to appear as if they're on the same LAN as my local machine (on a "virtual" network interface, in a "virtual" subnet), and it should be possible to ssh to them.
You could possibly establish that VPN over SSH ("ssh tunneling", as you mention); other options are OpenVPN and IPsec.
I don't understand what you mean by "this requires enough ports to be open to accommodate all the nodes". You will need: (i) one inbound port on the master, to provide the VPN/tunnel, (ii) inbound SSH on each slave, accessible from the master, (iii) another inbound port on each slave, over which the master drives the IPython engines. Wouldn't (ii) and (iii) be required in any setup? So all we've added is (i).