I'm using weed-fs 0.7 beta.
I'm having an issue where the master server always does not have any free volume servers while i have 2 of them.
I have 2 servers in Linode, i used one of them to create a master, volume and filer server using this command.
./weed server -ip.bind="192.168.139.166" -master.port=9333 -volume.port=8080 -volume.max="7" -dir="./data" -master.dir="./master" -filer=true -filer.dir="./filer"
The 3 systems starts properly. But when i check the master server using this command:
curl "http://{IP-ADDRESS}:9333/dir/status?pretty=y"
This is the result:
{
"Topology": {
"DataCenters": null,
"Free": 0,
"Max": 0,
"layouts": null
},
"Version": "0.70 beta"
}
I can add in file into the volume server directly using this:
curl -F file=help.txt http://{IP-ADDRESS}:8080/3,01637037d6
When i attempt to add the above file, this is the response on the console of the server:
I0512 08:30:06 20079 store.go:346] volume 3 size 20 will exceed limit 0
I0512 08:30:06 20079 store.go:348] error when reporting size: No master node available!
My best guess is that somehow the Master server does not seems to be able to detect the volume server, while both of them are on the same server.
I tried using my 2nd server to run volume server and point the master server's IP address using private IP, it does not work either.
But it seems like the volume servers are able to work without the master server.
Use -ip="192.168.139.166", instead of -ip.bind
Related
We want to migrate an ElasticSearch v7 index from server 1 to server 2.
How can we copy and restore a snapshot from one server to another?
The offical documentation seems to assume both servers share the same physical storage. These servers do not "know" each other and have completely seperate disks. In our case we:
Create the same repository foo on both servers
Run a snapshot foo1 on server 1
Create the /snapshots/foo directory on server 2
Verify the repo on server 2
Run a snapshot footest on server 2 to verify files are written to /snapshots/foo
Copy across the /snapshots/foo directory from server 1 to server 2
Run the restore of foo1:
POST /_snapshot/foo/foo1/_restore
{
"indices": "foo"
}
The error is:
{
"error": {
"reason": "[foo:foo1] snapshot does not exist",
"root_cause": [
{
"reason": "[foo:foo1] snapshot does not exist",
"type": "snapshot_restore_exception"
}
],
"type": "snapshot_restore_exception"
},
"status": 500
}
When we check the repo:
GET /_snapshot/foo/*
We can see only one snapshot (which we created on server 2). We cannot see the snapshot we copied across from server 1.
So how does one transfer a snapshot to server 2?
#Marc there are few options for you to copy data from one server to another in terms of indices:
Configure CCR (cross cluster replication) for your index and let it sync between the two servers (which can be on separate machines. network/ etc... they should just be able to communicate)
You can have a single repository created which can be accessed by both servers (using shared IAM user or role assuming it's on AWS). You can then take snapshot of your index from server1 and restore the snapshot in server2.
Not recommended but you can copy ${path.data} files from server1 to ${path.data} in server2.
However, the main question here would be: are your server1 and server2 running on same ES version ?
I have two Windows Server 2012 R2 machines located in one of the client's datacenters. Both servers are domain-joined. They both have RabbitMQ 3.6.0. installed on them. RabbitMQ is running as Windows Service on both machines. I've tried to cluster these two machines for a long time now without success. I always get the following error when I try to cluster them.
One the first machine nodeA I run the command 'rabbitmqctl join_cluster rabbit#nodeB'. This is what I get:
Clustering node 'rabbit#nodeA' with 'rabbit#nodeB' ...
Error: unable to connect to nodes ['rabbit#nodeB']: nodedown
`DIAGNOSTICS`
===========
attempted to contact: ['rabbit#nodeB']
rabbit#nodeB:
* connected to epmd (port 4369) on nodeB
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
* suggestion: is the Erlang distribution using TLS?
current node details:
- node name: 'rabbitmq-cli-3892#nodeA'
- home dir: C:\Users\mydirectory
- cookie hash: l+SSu57+cRyAQ03AJdwAbQ==
I've tried this setup with Azure Virtual Machines within Azure Virtual Network and I succeeded to cluster the two VM's, however it seems I cannot connect these two (customer's machines) together.
This is what I have done and ensured:
There isn't any firewall blocking connections
Added host names to hosts file located on C:\Windows\system32\drivers\etc
Tried to refer to host names as FQDN without adding anything to hosts file
Tried to refer to host names with CAPITAL letters and without
Copied the same exact .erlang.cookie to C:\Windows and C:\Users\mydirectory on both machines.
I've read, understood and applied RabbitMQ Clustering Guide https://www.rabbitmq.com/clustering.html
Stopped, restarted, reinstalled RabbitMQ on both machines.
It seems I can't get it to work. On Azure machines, which were not domain-joined clustering worked beautifully. I am really running out of options... Any help?
i had the same problem you need to install rabbitmq as a admin. uninstall then reinstall as admin and it should work fine
Try to connect to each of RabbitMQ nodes via remote shell and check if value of cookie is the same (cookie can be set in 3 different ways: .erlang.cookie is one of them).
erl -remsh 'rabbitmq-cli-3892#nodeA' -name 'test#nodeA'
erlang:get_cookie().
Previous night I was tinkering with Elixir running code on my both machines at home, but when I woke up, I asked myself Can I actually do the same using heroku run command?
I think theoretically it should be entirely possible if setup properly. Obviously heroku run iex --sname name executes and gives me access to shell (without functioning backspace which is irritating) but i haven't accessed my app yet.
Each time I executed the command it gave me different machine. I guess it's how Heroku achieve sandbox. I also was trying to find a way to determine address of my app's machine but haven't got any luck yet.
Can I actually connect with the dyno running the code to evaluate expressions on it like you would do iex -S mix phoenix.server locally ?
Unfortunately it's not possible.
To interconnect Erlang VM nodes you'd need EPMD port (4369) to be open.
Heroku doesn't allow opening custom ports so it's not possible.
In case You'd want to establish a connection between your Phoenix server and Elixir node You'd have to:
Two nodes on the same machine:
Start Phoenix using iex --name phoenix#127.0.0.1 -S mix phoenix.server
Start iex --name other_node#127.0.0.1
Establish a connection using Node.ping from other_node:
iex(other_node#127.0.0.1)1> Node.ping(:'phoenix#127.0.0.1')
(should return :pong not :pang)
Two nodes on different machines
Start Phoenix using some external address
iex --name phoenix#195.20.2.2 --cookie someword -S mix phoenix.server
Start second node
iex --name other_node#195.20.2.10 --cookie someword
Establish a connection using Node.ping from other_node:
iex(other_node#195.20.2.10)1> Node.ping(:'phoenix#195.20.2.2')
(should return :pong not :pang)
Both nodes should contact each other on the addresses they usually see each other on the network. (Full external IP when different networks, 192.168.X.X when in the same local network, 127.0.0.1 when on the same machine)
If they're on different machines they also must have set the same cookie value, because by default it takes automatically generated cookie in your home directory. You can check it out by running:
cat ~/.erlang.cookie
What's last you've got to make sure that your EPMD port 4369 is open, because Erlang VM uses it for internode data exchange.
As a sidenote if you will leave it open make sure to make your cookie as private as possible, because if someone knows it, he can have absolute power over your machine.
When you execute heroku run it will start a new one-off dyno which is a temporary instance that is deprovisioned when you finish the heroku run session. This dyno is not a web dyno and cannot receive inbound HTTP requests through Heroku's routing layer.
From the docs:
One-off dynos can never receive HTTP traffic, since the routers only route traffic to dynos named web.N.
https://devcenter.heroku.com/articles/one-off-dynos#formation-dynos-vs-one-off-dynos
If you want your phoenix application to receive HTTP requests you will have to set it up to run on a web dyno.
It has been a while since you've asked the question, but someone might find this answer valuable, though.
As of 2021 Heroku allows forwarding multiple ports, which allows to remsh into a running ErlangVM node. It depends on how you deploy your application, but in general, you will need to:
Give your node a name and a cookie (i.e. --name "myapp#127.0.0.1" --cookie "secret")
Tell exactly which port a node should bind to, so you know which pot to forward (i.e. --erl "-kernel inet_dist_listen_min 9000 -kernel inet_dist_listen_max 9000")
Forward EPMD and Node ports by running heroku ps:forward 9001:4369,9000
Remsh into your node: ERL_EPMD_PORT=9001 iex --cookie "secret" --name console#127.0.0.1 --remsh "myapp#127.0.0.1"
Eventually you should start your server with something like this (if you are still using Mix tool): MIX_ENV=prod elixir --name "myapp#127.0.0.1" --cookie "secret" --erl "-kernel inet_dist_listen_min 9000 -kernel inet_dist_listen_max 9000" -S mix phx.server --no-halt
If you are using Releases, most of the setup has already been done for you by the Elixir team.
To verify that EPMD port has been forwarded correctly, try running epmd -port 9001 -names. The output should be:
epmd: up and running on port 4369 with data:
name myapp#127.0.0.1 at port 9000
You may follow my notes on how I do it for Dockerized releases (there is a bit more hustle): https://paveltyk.medium.com/elixir-remote-shell-to-a-dockerized-release-on-heroku-cc6b1196c6ad
I'm using an instanced Amazon EC2 virtual Ubuntu 12.04 server as my single Riak node. I've gone through all the proper stages of setting up Riak on the instance using the guide on the basho website here. Where x.x.x.x is the private IP address of the instance, this included:
Installation
Using sudo su - to gain root privileges (EC2 logs me in as 'Ubuntu').
Installing the SSL Lib with:
sudo apt-get install libssl0.9.8
Downloading the 64-bit package for 12.04:
wget http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/CURRENT/ubuntu/precise/riak_1.2.1-1_amd64.deb
Then unpacking via:
sudo dpkg -i riak_1.2.1-1_amd64.deb
As instructed in the basho guide, I updated these two files (using vi):
vm.args
Changing -name riak#x.x.x.x to the private IP of my instance.
app.config
Changing {http, [ {"x.x.x.x", 8098 } ]} to the private IP of my instance.
Changing {pb_ip, "x.x.x.x"} to the private IP of my instance.
The Riak node was working fine when I first setup the server and performed the above, I could connect to the node, and using riak start then riak-admin test returned successfully with:
>Attempting to restart script through sudo -H -u riak
>Successfully completed 1 read/write cycle to 'riak#x.x.x.x'
The next day I fired up the instance, repeated the above process (ignoring installation) with the instance's new IP address y.y.y.y (the private IP of the instance changes every time it stops/starts) and typed riak start into the terminal, only to be greeted with:
>Attempting to restart script through sudo -H -u riak
>Riak failed to start within 15 seconds,
>see the output of 'riak console' for more information.
>If you want to wait longer, set the environment variable
>WAIT_FOR_ERLANG to the number of seconds to wait
In the riak console the error given is:
>gen_server riak_core_capability terminated with reason: no function clause matching orddict:fetch('riak#y.y.y.y', [{'riak#x.x.x.x',[{{riak_core,staged_joins},[true,false]},{{riak_core,vnode_routing},[proxy,...]},...]}])
Where y.y.y.y is the new instance IP address and x.x.x.x was the old one.
I've been scratching my head over this for a while now and can't find anything on the topic, the only solution I can think of is to re-install Riak on the off chance my PATH directories have gone wrong. If that fails my last resort would be to terminate the instance and reconfigure Riak on a new instance. So before I jump the gun, what I would like to ask is:
After updating the fields in app.config and vm.args with the new instance IP address, why is the riak start command no longer successful?
Is there any way for an Ubuntu EC2 instance to be assigned with a static private IP? Not only would this help solve the problem, but saves me time having to update app.config and vm.args every time I start/stop the instance.
So after some more digging around and intense reading, I've found a solution:
You need to remove the Riak ring and start Riak again to reset riak_core.
You can do this by using this command in the terminal:
rm -rf /var/lib/riak/ring/*
NOTE: This should be done after you've updated app.config and vm.args with the new server IP, nasty side-effects can occur otherwise.
Then
riak start
I was no longer thrown a 'failed to connect' error, and after issuing a riak-admin test command I pleasantly received (where y.y.y.y is my instance's private IP):
>Attempting to restart script through sudo -H -u riak
>Successfully completed 1 read/write cycle to 'riak#y.y.y.y'
I should note that this solution applies to virtual servers as well as physical ones. Although I would imagine the reassigning of IP's would be a much rarer occurrence in physical servers.
Now while that solves the issue, it still means whenever I need to reboot the instance I have to go through editing the app.config and vm.args files to change the private IP address (remember the private IP changes every time an Ubuntu instance is started/stopped) and then clear the Riak ring using the command above, so it's not exactly an elegant solution.
If anyone knows a way to set a static private IP to an EC2 instance (or another solution that tackles both hurdles?) it would solve this problem outright.
EDIT: 14/12/12
A limited solution to assigning a static IP to an EC2 instance:
Amazon Web Services allows the association of Elastic IP's to EC2 instances (of any kind). Therefore, if an instance has an elastic IP associated with it, even if it is rebooted, that IP will remain associated with that instance. You can find the documentation on elastic IP's here.
If you're under Amazon's free usage tier, creating an Elastic IP shouldn't charge you as long as it's associated with a running instance. If an elastic IP is disassociated, Amazon will incur charges for each running hour of an unused Elastic IP for as long as that Elastic IP remains disassociated. For example, terminating an instance will disassociate an elastic IP, unless that elastic IP is re-associated or released, the above applies. Stopping your instance entirely then starting it at a later time will also disassociate an elastic IP.
You can have a maximum of one elastic IP per an instance, any more and this will incur charges.
For those interested, you can find more information Elastic IP's pricing here under Elastic IP Addresses.
As of Riak 1.3, riak-admin reip is deprecated and the use of riak-admin cluster replace is the recomended way of replacing a cluster's name.
These are the commands I had to issue:
riak stop # stop the node
riak-admin down riak#127.0.0.1 # take it down
sudo rm -rf /var/lib/riak/ring/* # delete the riak ring
sudo sed -i "s/127.0.0.1/`hostname -i`/g" /etc/riak/vm.args # Change the name in config
riak-admin cluster force-replace riak#127.0.0.1 riak#"`hostname -i`" # replace the name
riak start # start the node
That should set the node's name to riak#[your EC2 internal IP address].
As well as changing the PB and HTTP IP's in the app.config, and the vm.args IP I also had to run:
http://docs.basho.com/riak/1.2.0/references/Command-Line-Tools---riak-admin/#reip
Without doing this, running riak console and looking in the output, the old IP is still present in the error log.
I want to use the parallel capabilities of ipython on a remote computer cluster. Only the head node is accessible from the outside. I have set up ssh keys so that I can connect to the head node with e.g. ssh head and from there I can also ssh into any node without entering a password, e.g. ssh node3. So I can basically run any commands on the nodes by doing:
ssh head ssh node3 command
Now what I really want to do is to be able to run jobs on the cluster from my own computer from ipython. The way to set up the hosts to use in ipcluster is:
send_furl = True
engines = { 'host1.example.com' : 2,
'host2.example.com' : 5,
'host3.example.com' : 1,
'host4.example.com' : 8 }
But since I only have a host name for the head node, I don't think I can do this. One option is to set us ssh tunneling on the head node, but I cannot do this in my case, since this requires enough ports to be open to accommodate all the nodes (and this is not the case). Are there any alternatives?
I use ipcluster on the NERSC clusters by using the PBS queue:
http://ipython.org/ipython-doc/stable/parallel/parallel_process.html#using-ipcluster-in-pbs-mode
in summary you submit jobs which runs mpiexec ipengine, (after having launched ipcontroller on the login node). Do you have PBS on your cluster?
this was working fine with ipython .10, it is now broken in .11 alpha.
I would set up a VPN server on the master, and connect to that with a VPN client on my local machine. Once established, the virtual private network will allow all of the slaves to appear as if they're on the same LAN as my local machine (on a "virtual" network interface, in a "virtual" subnet), and it should be possible to ssh to them.
You could possibly establish that VPN over SSH ("ssh tunneling", as you mention); other options are OpenVPN and IPsec.
I don't understand what you mean by "this requires enough ports to be open to accommodate all the nodes". You will need: (i) one inbound port on the master, to provide the VPN/tunnel, (ii) inbound SSH on each slave, accessible from the master, (iii) another inbound port on each slave, over which the master drives the IPython engines. Wouldn't (ii) and (iii) be required in any setup? So all we've added is (i).