I've been trying to install Redis on my Raspberry Pi Zero W running Raspbian Buster. I downloaded the latest stable release and ran make, which went fine, but when I run make test it fails under "Testing integration/replication" and then crashes my Pi, so I have to reboot it. I've tried this several times and it always goes the same way. I'm nervous about running make install at this point because this test keeps failing. 😬
I am also running MariaDB, WordPress, PostgreSQL, and TimeTrex with nginx, if that gives any sense of my current RAM usage.
Is there any way to get it to pass the test?
This is where it stops, during test 27:
[26/64 done]: integration/block-repl (27 seconds)
Testing integration/replication
[ok]: Slave enters handshake
[ok]: Slave is able to detect timeout during handshake
[ok]: Set instance A as slave of B
[ok]: INCRBYFLOAT replication, should not remove expire
[ok]: GETSET replication
[ok]: BRPOPLPUSH replication, when blocking against empty list
[ok]: BRPOPLPUSH replication, list exists
[ok]: BLMOVE (left, left) replication, when blocking against empty list
[ok]: BLMOVE (left, left) replication, list exists
[ok]: BLMOVE (left, right) replication, when blocking against empty list
[ok]: BLMOVE (left, right) replication, list exists
[ok]: BLMOVE (right, left) replication, when blocking against empty list
[ok]: BLMOVE (right, left) replication, list exists
[ok]: BLMOVE (right, right) replication, when blocking against empty list
[ok]: BLMOVE (right, right) replication, list exists
[ok]: BLPOP followed by role change, issue #2473
[ok]: Second server should have role master at first
[ok]: SLAVEOF should start with link status "down"
[ok]: The role should immediately be changed to "replica"
[ok]: Sync should have transferred keys from master
[ok]: The link status should be up
[ok]: SET on the master should immediately propagate
[ok]: FLUSHALL should replicate
[ok]: ROLE in master reports master with a slave
[ok]: ROLE in slave reports slave in connected state
[ok]: Connect multiple replicas at the same time (issue #141), master diskless=no, replica diskless=disabled
[ok]: Connect multiple replicas at the same time (issue #141), master diskless=no, replica diskless=swapdb
[ok]: Connect multiple replicas at the same time (issue #141), master diskless=yes, replica diskless=disabled
[ok]: Connect multiple replicas at the same time (issue #141), master diskless=yes, replica diskless=swapdb
[ok]: Master stream is correctly processed while the replica has a script in -BUSY state
[ok]: slave fails full sync and diskless load swapdb recovers it
[ok]: diskless loading short read
[TIMEOUT]: clients state report follows.
sockd58378 => (SPAWNED SERVER) pid:24500
Killing still running Redis server 24477
Killing still running Redis server 24489
Killing still running Redis server 24500
I looked for an error code but found only this:
!!! WARNING The following tests failed:
*** [TIMEOUT]: clients state report follows.
Cleanup: may take some time... OK
make[1]: *** [Makefile:391: test] Error 1
make[1]: Leaving directory '/home/pi/redis-stable/src'
make: *** [Makefile:6: test] Error 2
It doesn't say which tests failed, so I don't even know what to Google at this point.
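In case it helps to narrow this down: the suite can also be invoked directly through the runtest script in the source tree, which accepts options to run a single test unit and to lower the number of parallel test clients (the default parallelism is a lot for the 512 MB of RAM on a Pi Zero W). A sketch, using the path from the log above:

cd /home/pi/redis-stable
# re-run only the replication unit, with a single test client instead of the default parallel clients
./runtest --clients 1 --single integration/replication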
Related
I am trying to set up a small cluster managed with SLURM. The controller is also a compute node. The config in /etc/slurm/slurm.conf is:
NodeName=controller,node[01-02] RealMemory=250000 Sockets=1 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN
PartitionName=compute Nodes=ALL Default=YES MaxTime=INFINITE State=UP
When running sinfo I get:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
compute* up infinite 2 unk* node[01-02]
compute* up infinite 1 idle controller
However, when running slurmd -C on each node I get:
NodeName=node01 CPUs=64 Boards=1 SocketsPerBoard=1 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=257655
UpTime=0-00:30:44
The same on the other node. I have allowed ports 6817 and 6818 (the default Slurm ports) on all machines (for TCP, which I assume is the protocol). I have also checked that /etc/slurm/slurm.conf and /etc/slurm/slurmdbd.conf are identical across machines, along with the munge keys (this works).
Is there any way to debug the connection to a given machine?
Thanks in advance for any help.
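For reference, a few generic ways to check the controller-to-node connection (hostnames and ports are the ones above; slurmd, scontrol and nc are standard tools):

# run slurmd in the foreground with verbose logging on the suspect node
sudo slurmd -D -vvvv
# from the controller, see what slurmctld currently knows about the node
scontrol show node node01
# confirm the slurmd port is reachable from the controller
nc -zv node01 6818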
I was able to go through the log files and found that the connections were being blocked. The cluster is running Fedora, so I added each machine to the firewall's trusted list following this link - whitelist source ip addresses in centos 7
The updated firewall settings did not seem to be applied straight away, so I had to restart all machines; SLURM is now functioning correctly.
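For reference, a sketch of the firewalld commands that approach boils down to (the subnet is illustrative; run on each machine):

# trust traffic from the cluster subnet, then reload so the rules take effect
sudo firewall-cmd --permanent --zone=trusted --add-source=192.168.1.0/24
sudo firewall-cmd --reload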
Consider a Redis Sentinel setup with 5 machines. Each machine runs a Sentinel process (s1, s2, s3, s4, s5) and a Redis instance (r1, r2, r3, r4, r5). One is the master (r1) and the others are slaves (r2...r5). During failover of master r1, the slaveof configuration of the slaves must be overridden to point at the new master, r3.
Who overrides the Redis configuration of the remaining slaves (r2, r4, r5)? Does the Sentinel elected to lead the failover (assume s2 is the elected Sentinel) override the Redis configuration at r2, r4 and r5, or does the Sentinel running on each respective machine override its local Redis configuration (sn overrides the configuration of rn)?
The elected Sentinel updates the configuration. This is the full list of Sentinel capabilities at a high level:
Monitoring: Sentinel constantly checks if your master and slave instances are working as expected.
Notification: Sentinel can notify the system administrator, or other computer programs, via an API, that something is wrong with one of the monitored Redis instances.
Automatic failover: If a master is not working as expected, Sentinel can start a failover process where a slave is promoted to master, the other slaves are reconfigured to use the new master, and the applications using the Redis server are informed about the new address to use when connecting.
Configuration provider: Sentinel acts as a source of authority for client service discovery: clients connect to Sentinels in order to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels will report the new address.
For more details, refer to docs
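For illustration, a minimal sentinel.conf sketch for a 5-Sentinel setup like the one described (master name, address and timeouts are placeholders):

port 26379
# watch the current master; 3 of the 5 Sentinels must agree it is down before a failover starts
sentinel monitor mymaster 192.168.1.11 6379 3
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
# after a failover, reconfigure the remaining slaves one at a time
sentinel parallel-syncs mymaster 1

Each Sentinel also rewrites its own configuration file as it learns about the new master, so the monitored state survives a Sentinel restart.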
I am trying to adopt a service discovery mechanism for my system. I have a bunch of nodes that will communicate with each other via gRPC. Because in some frameworks like Mesos a node that is brought back up after a failure can have a different IP address and a different port, I am thinking of using service discovery so that each node can have a cluster config that is agnostic to node failure.
My current options are DNS or a strongly consistent key-value store like etcd or ZooKeeper. My problem is understanding how the cached name mappings on healthy nodes get invalidated and updated when a node goes down and comes back up.
The possible ways I can think of are:
When healthy nodes detect a connection problem, they invalidate their cache entry immediately and keep polling the DNS registry until the node is connectable again.
When a node goes down and comes back up, the DNS registry broadcasts the event to all healthy nodes. This seems to require heartbeats from the DNS registry.
The cache in each node has a TTL field, and within a TTL interval each node has to live with the node failure until the cache entry expires and it pulls from the DNS registry again.
My question is: which option (you can name more) is used in reality, and why is it better than the other alternatives?
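For illustration, a minimal sketch of option 3 combined with option 1, written against the plain JDK (the registry lookup is just a pluggable function here, so it could be backed by DNS, etcd or ZooKeeper; all names are made up):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical TTL-based address cache: an entry is trusted until it expires,
// after which the next lookup falls through to the registry again.
class TtlAddressCache {
    private static final class Entry {
        final String address;
        final long expiresAtMillis;
        Entry(String address, long expiresAtMillis) {
            this.address = address;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final Function<String, String> registryLookup; // e.g. a DNS or etcd query

    TtlAddressCache(long ttlMillis, Function<String, String> registryLookup) {
        this.ttlMillis = ttlMillis;
        this.registryLookup = registryLookup;
    }

    String resolve(String serviceName) {
        long now = System.currentTimeMillis();
        Entry e = cache.get(serviceName);
        if (e == null || e.expiresAtMillis < now) {
            // cache miss or expired entry: go back to the registry
            String address = registryLookup.apply(serviceName);
            e = new Entry(address, now + ttlMillis);
            cache.put(serviceName, e);
        }
        return e.address;
    }

    // option 1 above: evict immediately when a connection attempt fails
    void invalidate(String serviceName) {
        cache.remove(serviceName);
    }
}

In practice the two mechanisms are often combined: the TTL bounds how stale an entry can get, while invalidation on connection errors shortens the window after a failure.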
I want to replace the current 3 ZooKeeper servers with 3 new ZooKeeper servers. I have:
added the new ZooKeeper servers to Ambari,
added the new ZooKeeper servers to these variables:
hbase.zookeeper.quorum
ha.zookeeper.quorum
zookeeper.connect
hadoop.registry.zk.quorum
yarn.resourcemanager.zk-address
I restarted the services and the ResourceManager, and I still can't connect to any new ZooKeeper when I turn off all the old ZooKeeper servers.
zookeeper-client -server zoo-new1
I get the following error:
"Unable to read additional data from server sessionid 0x0, likely server has closed socket"
And on new Zoo server in logs (zookeeper.out):
"Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running"
When I start one of the old ZooKeepers, everything works, and I can also connect to the new ZooKeeper servers.
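For reference, an individual ZooKeeper server can be probed with the four-letter-word commands over the client port (2181 by default; newer ZooKeeper versions require these commands to be whitelisted):

echo ruok | nc zoo-new1 2181   # answers "imok" if the server process is responsive
echo stat | nc zoo-new1 2181   # shows the mode (leader/follower) once the ensemble has a quorum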
My best guess is that this has to do with one of the most important properties of ZooKeeper, namely leader election. If you start with a ZooKeeper quorum of 3 servers and add 3 more servers to it, you need at least 4 servers running (a majority of 6) for the quorum to be available. When a ZooKeeper node is unable to elect a leader, it will look as if it is down.
This is also the reason your setup works when you start one of the old ZooKeepers: 4 of the 6 possible servers are then alive. If you want the new setup to work, you need to remove the old servers from the config so that the quorum only knows about the three new ones. Simply shutting a ZooKeeper server down will not remove it from the quorum.
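In other words, the server list in zoo.cfg (or the corresponding Ambari setting) should end up listing only the three new servers, for example (hostnames and ports are illustrative):

# zoo.cfg on every new node: only the new servers form the ensemble
server.1=zoo-new1:2888:3888
server.2=zoo-new2:2888:3888
server.3=zoo-new3:2888:3888

After changing the list, restart all three new servers so they can form a quorum among themselves.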
In an Amazon VPC, I have installed RabbitMQ on two nodes.
On Node 1, I ran the following commands
#Node 1
/etc/init.d/rabbitmq-server stop
rabbitmq-server -detached
rabbitmqctl start_app
rabbitmqctl set_policy HA '^(?!amq\.).*' '{"ha-mode": "all"}'
On Node 2, I ran the following commands to setup the cluster
/etc/init.d/rabbitmq-server stop
rabbitmq-server -detached
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@<PrivateIP>
rabbitmqctl start_app
rabbitmqctl set_policy HA '^(?!amq\.).*' '{"ha-mode": "all"}'
The RabbitMQ nodes are behind an Elastic Load Balancer. I ran a Java program to keep pushing messages into RabbitMQ.
Case 1: rabbitmqctl list_queues showed the queue name and message count, and the count was the same on both nodes while the Java program was pushing messages to the queue.
Case 2: I stopped RabbitMQ on node 2 and then started it again. I checked the cluster status and queue message counts. The message count was correct (3330 on both node 1 and node 2).
Case 3: I stopped RabbitMQ on node 1 while the Java program was pushing messages to the queue.
I checked the queue message count on node 2; the count was 70.
I started RabbitMQ on node 1 again, and then the queue count was 75.
I want to set up a RabbitMQ high availability cluster and ensure no message loss. I have enabled sync_queue on RabbitMQ start in /etc/init.d/rabbitmq-server.
I'd appreciate it if you could point out why the message count dropped from approximately 3330 to 70, and also what the best way is to set up and ensure HA.
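For reference, one way to check the cluster and whether a mirrored queue's replicas are synchronised (the extra columns are queue info items for classic mirrored queues):

rabbitmqctl cluster_status
rabbitmqctl list_queues name messages slave_pids synchronised_slave_pids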
A few tips:
Does your app use publisher confirms? If you don't want to lose messages, it should (see the sketch after this list).
Is automatic syncing of queues enabled? If not, you have to manually initiate queue syncing for any queue.
You should not restart any node while queues are being synced, or messages might be lost.
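A minimal sketch of publisher confirms with the RabbitMQ Java client, assuming an open Channel as in the transaction example below (queue name and timeout are illustrative). Automatic syncing can be requested by adding "ha-sync-mode": "automatic" to the HA policy from the question:

channel.confirmSelect();                          // put the channel into confirm mode once, up front
channel.queueDeclare("my-queue", true, false, false, null);   // durable queue (illustrative name)
channel.basicPublish("", "my-queue",
        MessageProperties.PERSISTENT_TEXT_PLAIN, message.getBytes());
channel.waitForConfirmsOrDie(5000);               // throws if the broker does not confirm in time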
If you don't want to lose messages, you could also consider using transactions (tx):
channel.txSelect();
channel.basicPublish("", yourQueue, MessageProperties.PERSISTENT_TEXT_PLAIN,
        message.getBytes());
channel.txCommit();
This can kill performance, though, if you have a high message rate.
Visit
http://www.rabbitmq.com/blog/2011/02/10/introducing-publisher-confirms/